Job Performance Overview of Apache Flink and Apache Spark Applications

Jan Frenzel; René Jäkel

Publication: Contribution to conferences › Poster › Contributed › Peer-reviewed

Contributors

Jan Frenzel, Department of Distributed and Data Intensive Computing (VDR) (Author)
René Jäkel, Department of Distributed and Data Intensive Computing (VDR) (Author)

Abstract

Apache Spark and Apache Flink are two Big Data frameworks used for fast data exploration and analysis. Both frameworks report the runtime of program sections and performance metrics, such as the number of bytes read or written, via an integrated dashboard. However, the performance metrics available in the dashboard lack temporal information and are shown only in aggregated form in a separate part of the dashboard. Performance investigations and optimizations would benefit from an integrated view with detailed performance metric events. We therefore propose a system that samples metrics at runtime and collects information about the program sections after the execution finishes. The performance data is stored in an established format that is independent of Spark and Flink versions and can be viewed with state-of-the-art performance tools such as Vampir. The overhead depends on the sampling interval and was below 10% in our experiments.
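
The runtime sampling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use any Spark or Flink API: the class `MetricSampler`, the `read_metrics` callback, the helper `collect_for`, and the `bytes_read` metric are all hypothetical names chosen for illustration. The sketch only shows how a background thread might poll a metric source at a fixed interval and record timestamped events, which is the kind of data an aggregated dashboard view does not retain.

```python
import threading
import time
from typing import Callable, Dict, List, Tuple

# One sample: (monotonic timestamp in seconds, metric name -> value).
Sample = Tuple[float, Dict[str, float]]


class MetricSampler:
    """Hypothetical sampler: polls a metric source at a fixed interval."""

    def __init__(self, read_metrics: Callable[[], Dict[str, float]],
                 interval_s: float) -> None:
        self._read_metrics = read_metrics
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.samples: List[Sample] = []

    def _run(self) -> None:
        # Event.wait() returns False on timeout and True once stop() is called,
        # so the loop takes one sample per interval until it is stopped.
        while not self._stop.wait(self._interval_s):
            self.samples.append((time.monotonic(), self._read_metrics()))

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()


def collect_for(duration_s: float, interval_s: float) -> List[Sample]:
    """Run the sampler against a fake metric source (an assumption for the demo)."""
    counter = {"bytes_read": 0.0}

    def read_metrics() -> Dict[str, float]:
        # Stand-in for querying a framework metric; grows by 1 KiB per poll.
        counter["bytes_read"] += 1024.0
        return dict(counter)

    sampler = MetricSampler(read_metrics, interval_s)
    sampler.start()
    time.sleep(duration_s)
    sampler.stop()
    return sampler.samples
```

Calling `collect_for(0.3, 0.02)` returns a list of timestamped samples. A shorter `interval_s` yields more samples in the same time span and more polling work, which matches the abstract's remark that the overhead depends on the sampling interval.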

Details

Original language	English
Publication status	Published - 2019
Peer review status	Yes

Conference

Title	2019 International Conference for High Performance Computing, Networking, Storage, and Analysis
Short title	SC 19
Duration	17 - 22 November 2019
Website	https://sc19.supercomputing.org
Degree of recognition	International event
Location	Colorado Convention Center
City	Denver
Country	USA/United States

External IDs

ORCID	/0009-0007-5755-1427/work/142250922
