Corpus and Baseline Model for Domain-Specific Entity Recognition in German
Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed
Contributors
Abstract
Transfer learning approaches are a promising means of analyzing low-resource, domain-specific texts. The German SmartData corpus is the first German corpus annotated with entities from different domains, and it thus makes it possible to investigate transfer learning approaches for Named Entity Recognition (NER) across domains. To prepare such investigations, this work includes a thorough analysis of the SmartData corpus and a revision of its annotations and of its split into training and test data, taking into account the distribution of document and entity types. On this basis, a baseline model for NER using BiLSTM-CRF neural networks, including hyperparameter optimization, is presented.
Details
| Original language | English |
|---|---|
| Title | 6th International IEEE Congress on Information Science and Technology, CiSt 2020 - Proceeding |
| Editors | Mohammed El Mohajir, Mohammed Al Achhab, Badr Eddine El Mohajir, Bernadetta Kwintiana Ane, Ismail Jellouli |
| Publisher | Wiley-IEEE Press |
| Pages | 314-320 |
| Number of pages | 7 |
| ISBN (electronic) | 9781728166469 |
| ISBN (print) | 978-1-7281-6647-6 |
| Publication status | Published - 12 June 2021 |
| Peer-reviewed | Yes |
Conference
| Title | 2020 6th IEEE Congress on Information Science and Technology (CiSt) |
|---|---|
| Duration | 5 - 12 June 2021 |
| Location | Agadir - Essaouira, Morocco |
External IDs
| Scopus | 85103811992 |
|---|---|
| IEEE | 10.1109/CiSt49399.2021.9357189 |
| ORCID | /0000-0001-9756-6390/work/142250120 |
Subject terms
ASJC Scopus subject areas
Keywords
- Annotations, Information science, Neural networks, Optimization, Training, Training data, Transfer learning, NER, Named Entity Recognition, Natural language processing, Domain-specific, Hyperparameter optimization, BiLSTM-CRF, German