From web tables to concepts: A semantic normalization approach

Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review

Abstract

Relational Web tables, embedded in HTML or published on data platforms, have become an important resource for many applications, including question answering and entity augmentation. To utilize the data, we require some understanding of what the tables are about. Previous research on recovering Web table semantics has largely focused on simple tables, which only describe a single semantic concept. However, there is also a significant number of de-normalized multi-concept tables on the Web. Treating these as single-concept tables results in many incorrect relations being extracted. In this paper, we propose a normalization approach to decompose multi-concept tables into smaller single-concept tables. First, we identify columns that represent keys or identifiers of entities. Then, we utilize the table schema as well as intrinsic data correlations to identify concept boundaries and split the tables accordingly. Experimental results on real Web tables show that our approach is feasible and effectively identifies semantic concepts.
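
The two steps named in the abstract (find key columns, then split along concept boundaries) can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes the key columns are already known, and it substitutes a plain functional-dependency check for the schema and correlation signals the abstract mentions. The function names `determines` and `split_by_keys`, and the toy rows, are hypothetical and made up for illustration.

```python
def determines(rows, lhs, rhs):
    """Return True if every value of column `lhs` maps to exactly one value of `rhs`."""
    seen = {}
    for row in rows:
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


def split_by_keys(rows, key_columns):
    """Assign each non-key column to one key column and project one table per key.

    Assumption (not from the paper): keys with fewer distinct values are tried
    first, so an attribute attaches to the coarsest key that still determines it.
    Columns that no key determines are dropped in this sketch.
    """
    ordered_keys = sorted(key_columns, key=lambda k: len({r[k] for r in rows}))
    groups = {k: [k] for k in ordered_keys}
    for col in rows[0]:
        if col in groups:
            continue
        for k in ordered_keys:
            if determines(rows, k, col):
                groups[k].append(col)
                break

    tables = {}
    for k, cols in groups.items():
        seen, projected = set(), []
        for r in rows:
            values = tuple(r[c] for c in cols)
            if values not in seen:  # deduplicate the projected rows
                seen.add(values)
                projected.append(dict(zip(cols, values)))
        tables[k] = projected
    return tables


# Toy de-normalized table mixing a "city" concept and a "country" concept.
rows = [
    {"city": "c1", "area": 10, "country": "X", "currency": "xd"},
    {"city": "c2", "area": 20, "country": "X", "currency": "xd"},
    {"city": "c3", "area": 30, "country": "Y", "currency": "yk"},
]
tables = split_by_keys(rows, ["city", "country"])
```

Here `tables["country"]` holds one row per country with its currency, and `tables["city"]` holds one row per city with its area. The sketch does not preserve the link from a city to its country; a fuller version would keep the country column in the city table as a reference.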

Details

Original language: English
Title of host publication: Conceptual Modeling
Editors: Óscar Pastor López, Mong Li Lee, Stephen W. Liddle, Paul Johannesson, Andreas L. Opdahl
Publisher: Springer-Verlag
Pages: 247-260
Number of pages: 14
ISBN (electronic): 978-3-319-25264-3
ISBN (print): 978-3-319-25263-6
Publication status: Published - 2015
Peer-reviewed: Yes

Publication series

Series: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9381
ISSN: 0302-9743

Conference

Title: 34th International Conference on Conceptual Modeling, ER 2015
Duration: 19 - 22 October 2015
City: Stockholm
Country: Sweden

External IDs

ORCID /0000-0001-8107-2775/work/199215561

Keywords

  • Conceptualization, Normalization, Semantics, Web tables