Top-k entity augmentation using consistent set covering
Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review
Abstract
Entity augmentation is a query type in which, given a set of entities and a large corpus of possible data sources, the values of a missing attribute are to be retrieved. State-of-the-art methods return a single result that, to cover all queried entities, is fused from a potentially large set of data sources. We argue that queries on large corpora of heterogeneous sources using information retrieval and automatic schema matching methods cannot easily return a single result that the user can trust, especially if the result is composed from a large number of sources that the user has to verify manually. We therefore propose to process these queries in a Top-k fashion, in which the system produces multiple minimal consistent solutions from which the user can choose to resolve the uncertainty of the data sources and methods used. In this paper, we introduce and formalize the problem of consistent, multi-solution set covering, and present algorithms based on a greedy and a genetic optimization approach. We then apply these algorithms to Web table-based entity augmentation. The publication further includes a Web table corpus with 100M tables, and a Web table retrieval and matching system in which these algorithms are implemented. Our experiments show that the consistency and minimality of the augmentation results can be improved using our set covering approach, without loss of precision or coverage and while producing multiple alternative query results.
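To illustrate the idea of covering the queried entities with several alternative sets of sources, the following is a minimal sketch of textbook greedy set covering, re-run to obtain more than one cover. It is not the algorithm from the paper: the consistency criterion from the abstract is not modeled, and the function names `greedy_cover` and `top_k_covers`, the source names `t1` to `t4`, and the strategy of dropping the first chosen source between runs are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def greedy_cover(entities: Set[str], sources: Dict[str, Set[str]]) -> List[str]:
    """Textbook greedy set cover: pick the source that covers the most
    still-uncovered entities until everything is covered or no source helps."""
    uncovered = set(entities)
    chosen: List[str] = []
    while uncovered and sources:
        # Iterate over sorted names so that ties are broken deterministically.
        best = max(sorted(sources), key=lambda name: len(sources[name] & uncovered))
        gained = sources[best] & uncovered
        if not gained:
            break  # the remaining entities cannot be covered by any source
        chosen.append(best)
        uncovered -= gained
    return chosen


def top_k_covers(entities: Set[str], sources: Dict[str, Set[str]], k: int) -> List[List[str]]:
    """Hypothetical multi-solution heuristic (an assumption, not the paper's
    method): after each complete cover, drop its first source and run greedy again."""
    covers: List[List[str]] = []
    remaining = dict(sources)
    for _ in range(k):
        cover = greedy_cover(entities, remaining)
        covered = set().union(*(remaining[name] for name in cover))
        if not cover or not entities <= covered:
            break  # no further complete cover can be found this way
        covers.append(cover)
        del remaining[cover[0]]
    return covers


# Hypothetical toy input: each source (e.g. a Web table) lists the entities it has values for.
entities = {"a", "b", "c", "d"}
sources = {
    "t1": {"a", "b"},
    "t2": {"c", "d"},
    "t3": {"a", "b", "c"},
    "t4": {"d"},
}
alternatives = top_k_covers(entities, sources, k=3)
```

Each entry of `alternatives` is one complete set of sources that a user could inspect and choose from. A faithful implementation would additionally need the consistency measure and the genetic variant that the abstract mentions.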
Details
| Original language | English |
|---|---|
| Title of host publication | SSDBM 2015 - Proceedings of the 27th International Conference on Scientific and Statistical Database Management |
| Editors | Amarnath Gupta, Susan Rathbun |
| Publisher | Association for Computing Machinery |
| ISBN (electronic) | 9781450337090 |
| Publication status | Published - 29 Jun 2015 |
| Peer-reviewed | Yes |
Publication series
| Series | ACM International Conference Proceeding Series |
|---|---|
| Volume | 29-June-2015 |
Conference
| Title | 27th International Conference on Scientific and Statistical Database Management, SSDBM 2015 |
|---|---|
| Duration | 29 June - 1 July 2015 |
| City | San Diego |
| Country | United States of America |
External IDs
| ORCID | /0000-0001-8107-2775/work/198592326 |
|---|---|