FAIRness Along the Machine Learning Lifecycle Using Dataverse in Combination with MLflow

Lincoln Sherpa; Valentin Khaydarov; Ralph Müller-Pfefferkorn

doi:10.5334/dsj-2024-055

Research output: Contribution to journal › Research article › Contributed › peer-review

Contributors

Lincoln Sherpa, Department of Distributed and Data Intensive Computing (VDR) (Author)
Valentin Khaydarov, Process Systems Engineering Group (Author)
Ralph Müller-Pfefferkorn, Department of Distributed and Data Intensive Computing (VDR) (Author)

Abstract

Typical Machine Learning (ML) approaches are characterized by their iterative and exploratory nature: continuously refining and adapting not only code but also ML models to optimize the results and the performance on new data. This poses novel challenges related to keeping the trained model Findable, Accessible, Interoperable and Reusable (FAIR), especially for the automation of the entire machine learning lifecycle within the concept of Machine Learning Operations (MLOps). The article introduces a comprehensive integration of a data repository (based on the software Dataverse) and an ML platform (based on the MLflow framework) that enables seamless sharing and publishing of data, experiments and models, ensuring FAIRness. The presented solution is evaluated using an ML use case scenario with model training, hyper-parameter optimization, and model sharing via the data platform.

Details

Original language	English
Article number	55
Journal	Data Science Journal
Volume	23
Issue number	1
Publication status	Published - 1 Dec 2024
Peer-reviewed	Yes

External IDs

ORCID	/0000-0001-8719-5741/work/173516468
unpaywall	10.5334/dsj-2024-055
Scopus	85212217613

Keywords

FAIR data, FAIR principles, Machine Learning (ML), Research data management, Competing Interests, Database management system, FAIR data principles

Research Portal of the TU Dresden
