Bridging between Data Science and Performance Analysis: Tracing of Jupyter Notebooks

Elias Werner; Sunna Torge; Lalith Manjunath; Jan Frenzel

doi:10.1145/3486001.3486249

Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review

Contributors

Elias Werner, Center for Information Services and High Performance Computing (ZIH) (Author)
Sunna Torge, Center for Information Services and High Performance Computing (ZIH) (Author)
Lalith Manjunath, Center for Information Services and High Performance Computing (ZIH) (Author)
Jan Frenzel, Center for Information Services and High Performance Computing (ZIH) (Author)

Abstract

In recent years, the growing amount of available data has led to new application approaches and to an application field now called data science (DS). Such applications often require low runtimes while having to cope with restricted compute resources. So far, we perceive that the DS community lacks tool support for investigating runtime and resource usage. We therefore present an approach that combines DS with performance analysis from the High Performance Computing domain. Our concept integrates the measurement framework Score-P into Jupyter, a popular editor for developing DS applications. We designed and implemented a custom Jupyter kernel that collects runtime data and applied it to a natural language processing application. The measurement overhead was 12.55 seconds. The benefit is that the collected data can then be visualised using established performance analysis tools.

Details

Original language	English
Title of host publication	1st International Conference on AI-ML-Systems, AIMLSystems 2021
Pages	1-7
ISBN (electronic)	9781450385947
Publication status	Published - 21 Oct 2021
Peer-reviewed	Yes

External IDs

Scopus	85118298825
ORCID	/0000-0001-9756-6390/work/142250107
ORCID	/0009-0007-5755-1427/work/142250920

Keywords

data science, jupyter notebook, performance analysis
