Bridging between Data Science and Performance Analysis: Tracing of Jupyter Notebooks
Research output: Contribution to book/conference proceedings/anthology/report › Conference contribution › Contributed › peer-review
Abstract
In recent years, the growing amount of available data has led to new application approaches and to an application field now called data science (DS). Such applications often require low runtimes while having to cope with restricted compute resources. So far, we perceive that the DS community lacks tool support for investigating runtime and resource usage. We therefore present an approach that combines DS with performance analysis from the High Performance Computing domain. Our concept integrates the measurement framework Score-P into Jupyter, a popular editor for developing DS applications. We designed and implemented a custom Jupyter kernel that collects runtime data and applied it to a natural language processing application. The measurement overhead was 12.55 seconds. The benefit is that the collected data can then be visualised using established performance analysis tools.
Details
Original language | English |
---|---|
Title of host publication | The First International Conference on AI-ML-Systems |
Pages | 1-7 |
ISBN (electronic) | 9781450385947 |
Publication status | Published - 21 Oct 2021 |
Peer-reviewed | Yes |
External IDs
Scopus | 85118298825 |
---|---|
ORCID | /0000-0001-9756-6390/work/142250107 |
ORCID | /0009-0007-5755-1427/work/142250920 |
Keywords
- data science, jupyter notebook, performance analysis