Supporting fine-grained dataflow parallelism in big data systems

Sebastian Ertel; Justus Adam; Jeronimo Castrillon

doi:10.1145/3178442.3178447

Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review

Contributors

Sebastian Ertel, Chair of Compiler Construction (cfaed) (Author)
Justus Adam, Chair of Compiler Construction (cfaed) (Author)
Jeronimo Castrillon, Chair of Compiler Construction (cfaed) (Author)

Abstract

Big data systems scale with the number of cores in a cluster for the parts of an application that can be executed in data-parallel fashion. It has recently been reported, however, that these systems fail to translate hardware improvements, such as increased network bandwidth, into higher throughput. This is particularly the case for applications that have inherently sequential, computationally intensive phases. In this paper, we analyze the data processing cores of state-of-the-art big data systems to find the cause of these scalability problems. We identify design patterns in the code that are suitable for pipeline and task-level parallelism, potentially increasing application performance. As a proof of concept, we rewrite parts of the Hadoop MapReduce framework in an implicitly parallel language that exploits this parallelism without adding code complexity. Our experiments on a data analytics workload show throughput speedups of up to 3.5x.

Details

Original language	English
Title of host publication	PMAM'18: Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores
Editors	Quan Chen, Zhiyi Huang, Pavan Balaji
Pages	41-50
Number of pages	10
ISBN (electronic)	978-1-4503-5645-9
Publication status	Published - 24 Feb 2018
Peer-reviewed	Yes

Publication series

Series	PPoPP: Principles and Practice of Parallel Programming

Conference

Title	PPoPP '18: 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Duration	24 - 28 February 2018
Location
City	Vienna
Country	Austria

External IDs

ORCID	/0000-0002-5007-445X/work/141545622

Research Portal of the TU Dresden
