Corpus and Baseline Model for Domain-Specific Entity Recognition in German

Sunna Torge; Waldemar Hahn; René Jäkel; Wolfgang E. Nagel

doi:10.1109/CiSt49399.2021.9357189

Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review

Contributors

Sunna Torge, Center for Information Services and High Performance Computing (ZIH) (Author)
Waldemar Hahn, Center for Information Services and High Performance Computing (ZIH) (Author)
René Jäkel, Center for Information Services and High Performance Computing (ZIH) (Author)
Wolfgang E. Nagel, Center for Information Services and High Performance Computing (ZIH) (Author)

Abstract

Transfer learning approaches are a promising means of analyzing low-resource, domain-specific texts. The German SmartData corpus is the first German corpus annotated with entities from different domains, and thus allows the investigation of transfer learning approaches for Named Entity Recognition (NER) across domains. To prepare such investigations, this work includes a thorough analysis of the SmartData corpus and a revision of its annotations and of its split into training and test data, taking into account the distribution of document and entity types. On that basis, a baseline model for NER using BiLSTM-CRF neural networks, including hyperparameter optimization, is presented.

Details

Original language	English
Title of host publication	2020 6th IEEE Congress on Information Science and Technology (CiSt)
Publisher	Wiley-IEEE Press
Pages	314-320
Number of pages	7
ISBN (electronic)	9781728166469
ISBN (print)	978-1-7281-6647-6
Publication status	Published - 12 Jun 2021
Peer-reviewed	Yes

Conference

Title	2020 6th IEEE Congress on Information Science and Technology (CiSt)
Duration	5 - 12 June 2021
Location	Agadir - Essaouira, Morocco

External IDs

Scopus	85103811992
IEEE	10.1109/CiSt49399.2021.9357189
ORCID	/0000-0001-9756-6390/work/142250120

Keywords

Annotations, Information science, Neural networks, Optimization, Training, Training data, Transfer learning, NER, Named Entity Recognition, natural language processing
