Supporting fine-grained dataflow parallelism in big data systems

Sebastian Ertel; Justus Adam; Jeronimo Castrillon

doi:10.1145/3178442.3178447

Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed

Contributors

Sebastian Ertel, Chair for Compiler Construction (cfaed) (Author)
Justus Adam, Chair for Compiler Construction (cfaed) (Author)
Jeronimo Castrillon, Chair for Compiler Construction (cfaed) (Author)

Abstract

Big data systems scale with the number of cores in a cluster for the parts of an application that can be executed in data parallel fashion. It has been recently reported, however, that these systems fail to translate hardware improvements, such as increased network bandwidth, into a higher throughput. This is particularly the case for applications that have inherent sequential, computationally intensive phases. In this paper, we analyze the data processing cores of state-of-the-art big data systems to find the cause for these scalability problems. We identify design patterns in the code that are suitable for pipeline and task-level parallelism, potentially increasing application performance. As a proof of concept, we rewrite parts of the Hadoop MapReduce framework in an implicit parallel language that exploits this parallelism without adding code complexity. Our experiments on a data analytics workload show throughput speedups of up to 3.5x.

Details

Original language	English
Title	PMAM'18: Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores
Editors	Quan Chen, Zhiyi Huang, Pavan Balaji
Pages	41-50
Number of pages	10
ISBN (electronic)	978-1-4503-5645-9
Publication status	Published - 24 Feb 2018
Peer-review status	Yes

Publication series

Series	PPoPP: Principles and Practice of Parallel Programming

Conference

Title	23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Short title	PPoPP 2018
Event number	23
Description	co-located with CGO 2018 and HPCA 2018
Duration	24 - 28 February 2018
City	Vienna
Country	Austria

External IDs

ORCID	/0000-0002-5007-445X/work/141545622

