A Technical Perspective of DataCalc - Ad-hoc Analyses on Heterogeneous Data Sources
Publication: Contribution to book/conference report/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed
Contributors
Abstract
Many organizations store and process data at different locations using a heterogeneous set of formats and data management systems. However, data analyses can often provide better insight when data from several sources is integrated into a combined perspective. DataCalc is an extensible data integration platform that executes ad-hoc analytical queries on a set of heterogeneous data processors. The platform uses an expressive function-shipping interface that promotes local computation and reduces data movement between processors. In this paper, we provide a detailed discussion of the architecture and implementation of DataCalc. We introduce data processors for plain files, JDBC, the MongoDB document store, and a custom in-memory system. Finally, we discuss the cost of integrating additional processors and evaluate the overall performance of the platform. Our main contribution is the specification and evaluation of the DataCalc code delegation interface.
Details
Original language | English |
---|---|
Title | Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019 |
Editors | Chaitanya Baru, Jun Huan, Latifur Khan, Xiaohua Tony Hu, Ronay Ak, Yuanyuan Tian, Roger Barga, Carlo Zaniolo, Kisung Lee, Yanfang Fanny Ye |
Publisher | IEEE, New York [etc.] |
Pages | 3864-3873 |
Number of pages | 10 |
ISBN (electronic) | 9781728108582 |
Publication status | Published - Dec 2019 |
Peer-review status | Yes |
Publication series
Series | 2019 IEEE International Conference on Big Data (Big Data) |
---|---|
Conference
Title | 2019 IEEE International Conference on Big Data, Big Data 2019 |
---|---|
Duration | 9 - 12 December 2019 |
City | Los Angeles |
Country | USA/United States |
External IDs
Scopus | 85081317576 |
---|---|
ORCID | /0000-0001-8107-2775/work/142253465 |