Corpus and Baseline Model for Domain-Specific Entity Recognition in German

Sunna Torge; Waldemar Hahn; René Jäkel; Wolfgang E. Nagel

doi:10.1109/CiSt49399.2021.9357189

Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed

Contributors

Sunna Torge, Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH) (Author)
Waldemar Hahn, Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH) (Author)
René Jäkel, Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH) (Author)
Wolfgang E. Nagel, Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH) (Author)

Abstract

Transfer learning approaches are a promising means of analyzing low-resource, domain-specific texts. The German SmartData corpus is the first German corpus annotated with entities from different domains, and thus makes it possible to investigate transfer learning approaches for Named Entity Recognition (NER) across domains. To prepare such investigations, this work includes a thorough analysis of the SmartData corpus and a revision of its annotations and of its split into training and test data, taking into account the distribution of document and entity types. Based on this, a baseline model for NER using BiLSTM-CRF neural networks, including hyperparameter optimization, is presented.

Details

Original language	English
Title	6th International IEEE Congress on Information Science and Technology, CiSt 2020 - Proceeding
Editors	Mohammed El Mohajir, Mohammed Al Achhab, Badr Eddine El Mohajir, Bernadetta Kwintiana Ane, Ismail Jellouli
Publisher	Wiley-IEEE Press
Pages	314-320
Number of pages	7
ISBN (electronic)	9781728166469
ISBN (print)	978-1-7281-6647-6
Publication status	Published - 12 June 2021
Peer review status	Yes

Conference

Title	6th IEEE Congress on Information Science and Technology
Short title	CiSt 2020
Event number	6
Duration	5 - 12 June 2021
Website	http://www.ieee.ma/cist20/index.php
City	Agadir - Essaouira
Country	Morocco

External IDs

Scopus	85103811992
IEEE	10.1109/CiSt49399.2021.9357189
ORCID	/0000-0001-9756-6390/work/142250120

Keywords

Annotations, Information science, Neural networks, Optimization, Training, Training data, Transfer learning, NER, Named Entity Recognition, natural language processing, transfer learning, Domain-specific, Hyperparameter Optimization, BiLSTM-CRF, German

Research portal of TU Dresden