Impact of Data Sampling on Performance and Robustness of Machine Learning Models in Production Engineering

F. Conrad; E. Boos; M. Mälzer; H. Wiemer; S. Ihlenfeldt

doi:10.1007/978-3-031-18318-8_47


Publication: Contribution to book/conference proceedings/anthology/report › Contribution to book/anthology/report › Contributed › Peer-reviewed

Contributors

F. Conrad, Chair of Machine Tools Development and Adaptive Controls (Author)
E. Boos, Chair of Machine Tools Development and Adaptive Controls (Author)
M. Mälzer, Chair of Machine Tools Development and Adaptive Controls (Author)
H. Wiemer, Chair of Machine Tools Development and Adaptive Controls (Author)
S. Ihlenfeldt, Chair of Machine Tools Development and Adaptive Controls, Fraunhofer Institute for Machine Tools and Forming Technology (Author)

Abstract

The application of machine learning models in production systems is continuously growing. Hence, a reliable estimate of model performance is crucial, as all subsequent decisions regarding the deployment of the models are based on it. Especially when modelling with datasets of small sample size, commonly used train-test split techniques and model evaluation strategies yield performance estimates with high variance. This difficulty arises because the amount of meaningful data available in production engineering is severely limited, and it can lead to the model's actual performance being greatly over- or underestimated. This work provides an experimental overview of different train-test splitting techniques and model evaluation strategies. Sophisticated statistical sampling methods are compared to simple random sampling, and their impact on performance evaluation on production datasets is analysed. The aim is to ensure a highly robust model performance evaluation, even when working with small datasets, and thereby to improve the decision process for deploying machine learning models in production systems.
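The variance problem described in the abstract can be illustrated with a minimal sketch. It is not the authors' experimental setup: the synthetic data (a noisy line), the single-feature least-squares model, and all function names (`make_data`, `fit_line`, `random_split_scores`, `score_spread`) are assumptions introduced only for illustration. The sketch repeats a simple random train-test split many times and reports how much the test error fluctuates from split to split.

```python
import random
import statistics


def make_data(n, rng):
    """Synthetic 1-D regression data: y = 2x + 1 + Gaussian noise (assumed, not from the paper)."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 2.0) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Closed-form ordinary least squares for one feature; returns (slope, intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(model, xs, ys):
    """Root mean squared error of the fitted line on the given points."""
    slope, intercept = model
    squared = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(squared)) ** 0.5


def random_split_scores(xs, ys, n_repeats, test_fraction, rng):
    """Test RMSE for repeated simple random train-test splits of the same dataset."""
    n = len(xs)
    n_test = max(1, round(n * test_fraction))
    scores = []
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(rmse(model, [xs[i] for i in test], [ys[i] for i in test]))
    return scores


def score_spread(n_samples, n_repeats=200, test_fraction=0.2, seed=0):
    """Mean and standard deviation of the test RMSE across repeated random splits."""
    rng = random.Random(seed)
    xs, ys = make_data(n_samples, rng)
    scores = random_split_scores(xs, ys, n_repeats, test_fraction, rng)
    return statistics.fmean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    for n in (20, 2000):
        mean, spread = score_spread(n)
        print(f"n={n}: mean test RMSE={mean:.2f}, std across splits={spread:.2f}")
```

Under these assumptions, the spread of the test error across splits is considerably larger for the small dataset than for the large one, which is the effect that makes a single random split an unreliable basis for a deployment decision.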

Details

Original language	English
Title	Lecture Notes in Production Engineering
Publisher	Springer Nature
Pages	463-472
Number of pages	10
Publication status	Published - 2023
Peer-reviewed	Yes

Publication series

Series	Lecture Notes in Production Engineering
Volume	Part F1163
ISSN	2194-0525

External IDs

ORCID	/0000-0001-7540-4235/work/160952791
ORCID	/0000-0002-6593-4678/work/173054389

Keywords

ASJC Scopus subject areas

Keywords

Data sampling, Performance evaluation, Train-test-split, Usable artificial intelligence

TU Dresden Research Portal