From web tables to concepts: A semantic normalization approach
Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed
Abstract
Relational Web tables, embedded in HTML or published on data platforms, have become an important resource for many applications, such as question answering and entity augmentation. To use this data, we need some understanding of what the tables are about. Previous research on recovering Web table semantics has largely focused on simple tables that describe only a single semantic concept. However, there is also a significant number of de-normalized multi-concept tables on the Web. Treating these as single-concept tables results in many incorrect relations being extracted. In this paper, we propose a normalization approach that decomposes multi-concept tables into smaller single-concept tables. First, we identify columns that represent keys or identifiers of entities. Then, we use the table schema as well as intrinsic data correlations to identify concept boundaries and split the tables accordingly. Experimental results on real Web tables show that our approach is feasible and effectively identifies semantic concepts.
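The abstract only outlines the two steps (find key columns, then split along concept boundaries). The Python sketch below illustrates the general idea of such a decomposition under simplifying assumptions that are ours, not the authors': the subject key is taken to be the column with the most distinct values (leftmost on ties), and a concept boundary is assumed wherever a repeating column exactly functionally determines other columns. The function names (`split_table`, `determines`, etc.) and the toy table are hypothetical, and the sketch uses only data dependencies, ignoring the schema information the abstract mentions; it is not the paper's algorithm.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def distinct_count(rows: List[Row], col: str) -> int:
    """Number of distinct values in one column."""
    return len({r[col] for r in rows})


def determines(rows: List[Row], lhs: str, rhs: str) -> bool:
    """True if the exact functional dependency lhs -> rhs holds in the rows."""
    seen: Dict[str, str] = {}
    for r in rows:
        # The first rhs value seen for an lhs value is kept; any later mismatch breaks the FD.
        if seen.setdefault(r[lhs], r[rhs]) != r[rhs]:
            return False
    return True


def project_distinct(rows: List[Row], cols: List[str]) -> List[Row]:
    """Project rows onto cols and drop duplicate tuples, keeping first-seen order."""
    out: List[Row] = []
    seen = set()
    for r in rows:
        t = tuple(r[c] for c in cols)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(cols, t)))
    return out


def split_table(rows: List[Row], columns: List[str]) -> Dict[str, List[Row]]:
    """Hypothetical heuristic: split a de-normalized table into single-concept tables."""
    n = len(rows)
    # Assumption: the subject key is the most distinct column; max() keeps the leftmost on ties.
    subject = max(columns, key=lambda c: distinct_count(rows, c))
    assigned = {subject}
    concepts: List[Tuple[str, List[str]]] = []
    for k in columns:
        # Skip placed columns and unique columns (a unique column trivially determines everything).
        if k in assigned or distinct_count(rows, k) == n:
            continue
        deps = [c for c in columns if c not in assigned and c != k and determines(rows, k, c)]
        if deps:
            # k becomes the key of a secondary concept and stays in the main table as a reference.
            concepts.append((k, deps))
            assigned.add(k)
            assigned.update(deps)
    moved = {c for _, ds in concepts for c in ds}
    tables = {subject: project_distinct(rows, [c for c in columns if c not in moved])}
    for k, ds in concepts:
        tables[k] = project_distinct(rows, [k] + ds)
    return tables


# Hypothetical toy multi-concept table: players mixed with attributes of their teams.
rows = [
    {"Player": "Ann", "Team": "Reds", "City": "Boston", "Goals": "3"},
    {"Player": "Bob", "Team": "Reds", "City": "Boston", "Goals": "5"},
    {"Player": "Cid", "Team": "Blues", "City": "Leeds", "Goals": "2"},
]
tables = split_table(rows, ["Player", "Team", "City", "Goals"])
print(tables["Team"])
# [{'Team': 'Reds', 'City': 'Boston'}, {'Team': 'Blues', 'City': 'Leeds'}]
```

Here the player table keeps `Player`, `Team` and `Goals`, while `City` moves into a separate team table keyed by `Team`. Because only exact dependencies are checked, small tables easily produce spurious splits (any repeating column determines a constant column), and column order decides which of two one-to-one columns becomes the key; a practical version would need approximate dependencies and further evidence such as header labels.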
Details
| Original language | English |
|---|---|
| Title of host publication | Conceptual Modeling |
| Editors | Óscar Pastor López, Mong Li Lee, Stephen W. Liddle, Paul Johannesson, Andreas L. Opdahl |
| Publisher | Springer-Verlag |
| Pages | 247-260 |
| Number of pages | 14 |
| ISBN (electronic) | 978-3-319-25264-3 |
| ISBN (print) | 978-3-319-25263-6 |
| Publication status | Published - 2015 |
| Peer-reviewed | Yes |
Publication series
| Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
|---|---|
| Volume | 9381 |
| ISSN | 0302-9743 |
Conference
| Title | 34th International Conference on Conceptual Modeling, ER 2015 |
|---|---|
| Period | 19 - 22 October 2015 |
| City | Stockholm |
| Country | Sweden |
External IDs
| ORCID | /0000-0001-8107-2775/work/199215561 |
|---|---|
Keywords
- Conceptualization, Normalization, Semantics, Web tables