Exploratory ad-hoc analytics for big data

Publication: Contribution to book/conference proceedings/anthology/report > Contribution to book/anthology/report > Contributed > Peer-reviewed

Contributors

Abstract

In a traditional relational database management system, queries can only be defined over attributes defined in the schema, but are guaranteed to give a single, definitive answer structured exactly as specified in the query. In contrast, an information retrieval system allows the user to pose queries without knowledge of a schema, but the result will be a top-k list of possible answers, with no guarantees about the structure or content of the retrieved documents. In this chapter, we present Drill Beyond, a novel IR/RDBMS hybrid system in which the user seamlessly queries a relational database together with a large corpus of tables extracted from a web crawl. The system allows full SQL queries over a relational database, but additionally enables the user to use arbitrary additional attributes in the query that need not be defined in the schema. The system then processes this semi-specified query by computing a top-k list of possible query evaluations, each based on different candidate web data sources, thus mixing properties of the two worlds of RDBMS and IR systems.

Details

Original language: English
Title: Handbook of Big Data Technologies
Publisher: Springer International Publishing AG
Pages: 365-407
Number of pages: 43
ISBN (electronic): 9783319493404
ISBN (print): 9783319493398
Publication status: Published - 25 Feb. 2017
Peer-review status: Yes

External IDs

ORCID /0000-0001-8107-2775/work/142253527