From web tables to concepts: A semantic normalization approach
Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review
Abstract
Relational Web tables, embedded in HTML or published on data platforms, have become an important resource for many applications, including question answering or entity augmentation. To utilize the data, we require some understanding of what the tables are about. Previous research on recovering Web table semantics has largely focused on simple tables, which only describe a single semantic concept. However, there is also a significant number of de-normalized multi-concept tables on the Web. Treating these as single-concept tables results in many incorrect relations being extracted. In this paper, we propose a normalization approach to decompose multi-concept tables into smaller single-concept tables. First, we identify columns that represent keys or identifiers of entities. Then, we utilize the table schema as well as intrinsic data correlations to identify concept boundaries and split the tables accordingly. Experimental results on real Web tables show that our approach is feasible and effectively identifies semantic concepts.
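The two steps named in the abstract (finding key columns, then using data correlations to split the table) can be illustrated with a rough sketch. This is not the authors' algorithm: the uniqueness test, the functional-dependency check, and the names `is_unique`, `determines`, and `split_concepts` are all assumptions made for illustration, and the table contents are toy values rather than real statistics.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def is_unique(rows: List[Row], col: str) -> bool:
    """True if every row has a different, non-empty value in `col`."""
    values = [row[col] for row in rows]
    return all(values) and len(set(values)) == len(values)


def determines(rows: List[Row], lhs: str, rhs: str) -> bool:
    """True if each value of `lhs` co-occurs with exactly one value of `rhs`."""
    seen: Dict[str, str] = {}
    for row in rows:
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


def split_concepts(rows: List[Row], columns: List[str]) -> Tuple[List[Row], List[List[Row]]]:
    """Split a table into a main table and one table per detected extra concept."""
    # Step 1 (assumed heuristic): the first fully unique column is the table key.
    key = next((c for c in columns if is_unique(rows, c)), None)
    if key is None:
        return rows, []

    remaining = [c for c in columns if c != key]
    assigned = set()  # columns moved out of the main table
    concepts: List[List[Row]] = []

    # Step 2 (assumed heuristic): a repeating column that determines other
    # columns is treated as the key of a second concept.
    for col in remaining:
        if col in assigned or is_unique(rows, col):
            continue
        dependents = [
            d for d in remaining
            if d != col and d not in assigned and determines(rows, col, d)
        ]
        if not dependents:
            continue
        assigned.update(dependents)
        group = [col] + dependents
        # Deduplicate the projected rows while preserving their order.
        distinct = list({tuple(r[c] for c in group): None for r in rows})
        concepts.append([dict(zip(group, values)) for values in distinct])

    # The main table keeps the key, plain attributes, and the secondary keys.
    main_cols = [c for c in columns if c not in assigned]
    main = [{c: r[c] for c in main_cols} for r in rows]
    return main, concepts


# Toy multi-concept table: cities and countries mixed together.
COLUMNS = ["city", "population_k", "country", "currency"]
ROWS = [
    {"city": "Dresden", "population_k": "550", "country": "Germany", "currency": "Euro"},
    {"city": "Leipzig", "population_k": "600", "country": "Germany", "currency": "Euro"},
    {"city": "Stockholm", "population_k": "980", "country": "Sweden", "currency": "Krona"},
    {"city": "Uppsala", "population_k": "180", "country": "Sweden", "currency": "Krona"},
]

main, concepts = split_concepts(ROWS, COLUMNS)
```

Here `main` keeps `city`, `population_k`, and `country`, while `concepts` holds a single two-row table of `country` and `currency`. On small tables a dependency check like this can fire by coincidence, which is one reason an approach that also uses the table schema, as the abstract describes, would be more robust than this sketch.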
Details
| Original language | English |
|---|---|
| Title of host publication | Conceptual Modeling |
| Editors | Óscar Pastor López, Mong Li Lee, Stephen W. Liddle, Paul Johannesson, Andreas L. Opdahl |
| Publisher | Springer-Verlag |
| Pages | 247-260 |
| Number of pages | 14 |
| ISBN (electronic) | 978-3-319-25264-3 |
| ISBN (print) | 978-3-319-25263-6 |
| Publication status | Published - 2015 |
| Peer-reviewed | Yes |
Publication series
| Series | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
|---|---|
| Volume | 9381 |
| ISSN | 0302-9743 |
Conference
| Title | 34th International Conference on Conceptual Modeling, ER 2015 |
|---|---|
| Duration | 19 - 22 October 2015 |
| City | Stockholm |
| Country | Sweden |
External IDs
| ORCID | /0000-0001-8107-2775/work/199215561 |
|---|---|
Keywords
- Conceptualization, Normalization, Semantics, Web tables