Top-k entity augmentation using consistent set covering

Research output: Contribution to book/Conference proceedings/Anthology/Report, Conference contribution, Contributed, peer-review

Contributors

Abstract

Entity augmentation is a query type in which, given a set of entities and a large corpus of possible data sources, the values of a missing attribute are to be retrieved. State-of-the-art methods return a single result that, to cover all queried entities, is fused from a potentially large set of data sources. We argue that queries on large corpora of heterogeneous sources using information retrieval and automatic schema matching methods cannot easily return a single result that the user can trust, especially if the result is composed from a large number of sources that the user has to verify manually. We therefore propose to process these queries in a Top-k fashion, in which the system produces multiple minimal consistent solutions from which the user can choose to resolve the uncertainty of the data sources and methods used. In this paper, we introduce and formalize the problem of consistent, multi-solution set covering, and present algorithms based on a greedy and a genetic optimization approach. We then apply these algorithms to Web table-based entity augmentation. The publication further includes a Web table corpus with 100M tables, and a Web table retrieval and matching system in which these algorithms are implemented. Our experiments show that the consistency and minimality of the augmentation results can be improved using our set covering approach, without loss of precision or coverage and while producing multiple alternative query results.
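
To make the idea of consistent, multi-solution set covering more concrete, the following is a minimal illustrative sketch in Python. It is not the paper's algorithm: the scoring rule (coverage gain times average similarity to already chosen sources, divided by a reuse penalty), the function names `greedy_cover` and `top_k_covers`, and the toy data with its `group` labels are all assumptions made for this example. The sketch only shows how a greedy strategy might build several alternative covers that each draw on mutually similar sources; it does not guarantee minimality.

```python
from typing import Callable, Dict, List, Set


def greedy_cover(
    entities: Set[str],
    sources: Dict[str, Set[str]],
    similarity: Callable[[str, str], float],
    penalty: Dict[str, int],
) -> List[str]:
    """Greedily pick sources until every coverable entity is covered.

    Hypothetical scoring rule (an assumption of this sketch):
    newly covered entities * average similarity to the sources chosen
    so far, divided by (1 + number of earlier covers using the source).
    """
    uncovered = set(entities)
    chosen: List[str] = []
    while uncovered:
        best, best_score = None, -1.0
        for name in sorted(sources):  # sorted for deterministic tie-breaking
            if name in chosen:
                continue
            gain = len(sources[name] & uncovered)
            if gain == 0:
                continue
            if chosen:
                cons = sum(similarity(name, c) for c in chosen) / len(chosen)
            else:
                cons = 1.0  # the first pick has nothing to be consistent with
            score = gain * cons / (1 + penalty.get(name, 0))
            if score > best_score:
                best, best_score = name, score
        if best is None:  # remaining entities are not covered by any source
            break
        chosen.append(best)
        uncovered -= sources[best]
    return chosen


def top_k_covers(
    entities: Set[str],
    sources: Dict[str, Set[str]],
    similarity: Callable[[str, str], float],
    k: int,
) -> List[List[str]]:
    """Produce up to k distinct covers by penalising already used sources."""
    penalty: Dict[str, int] = {}
    covers: List[List[str]] = []
    for _ in range(k):
        cover = greedy_cover(entities, sources, similarity, penalty)
        if not cover or any(set(cover) == set(c) for c in covers):
            break  # no new alternative was found
        covers.append(cover)
        for name in cover:
            penalty[name] = penalty.get(name, 0) + 1
    return covers


# Toy data (entirely made up): each source covers some queried entities
# and belongs to a group; sources in the same group count as consistent.
entities = {"a", "b", "c", "d"}
sources = {
    "s1": {"a", "b"},
    "s2": {"c", "d"},
    "s3": {"a", "b", "c"},
    "s4": {"d"},
    "s5": {"c"},
}
group = {"s1": "X", "s2": "X", "s3": "Y", "s4": "Y", "s5": "X"}


def same_group(x: str, y: str) -> float:
    return 1.0 if group[x] == group[y] else 0.5


covers = top_k_covers(entities, sources, same_group, k=3)
print(covers)
```

On this toy input the sketch yields two alternative covers, each built from sources of a single group, and stops early when the third run would only repeat an earlier cover. A user could then inspect both alternatives instead of verifying one result fused from many unrelated sources.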

Details

Original language: English
Title of host publication: SSDBM 2015 - Proceedings of the 27th International Conference on Scientific and Statistical Database Management
Editors: Amarnath Gupta, Susan Rathbun
Publisher: Association for Computing Machinery
ISBN (electronic): 9781450337090
Publication status: Published - 29 Jun 2015
Peer-reviewed: Yes

Publication series

Series: ACM International Conference Proceeding Series
Volume: 29-June-2015

Conference

Title: 27th International Conference on Scientific and Statistical Database Management, SSDBM 2015
Duration: 29 June - 1 July 2015
City: San Diego
Country: United States of America

External IDs

ORCID /0000-0001-8107-2775/work/198592326