Noise-Aware Undersampling for imbalanced medical data (NAUS)

Zholdas Buribayev; Ainur Yerkos; Zhibek Zhetpisbay; Markus Wolfien

doi:10.1016/j.imu.2026.101731

Research output: Contribution to journal › Research article › Contributed › Peer-reviewed

Contributors

Zholdas Buribayev, Al Farabi Kazakh National University (Author)
Ainur Yerkos, Al Farabi Kazakh National University (Author)
Zhibek Zhetpisbay, Al Farabi Kazakh National University (Author)
Markus Wolfien, Institut für Medizinische Informatik und Biometrie, Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI Dresden) (Author)

Abstract

Advances in medical research increasingly rely on robust data analytics to support diagnostic and treatment decisions. However, data analysis remains difficult for datasets with severe class imbalance, which often stems from the rarity of certain conditions and uneven disease distributions. To address this issue, we propose the Noise-Aware Undersampling with Subsampling (NAUS) algorithm. NAUS integrates clustering, noise removal, and Tomek-link identification to create refined subsamples that assess the significance of individual observations while systematically removing redundant and noisy data. The proposed approach was evaluated on datasets related to chronic kidney disease, liver disease, and heart disease, and its performance was compared with that of traditional oversampling methods (e.g., SMOTE, ADASYN, LoRAS) and undersampling techniques (e.g., random undersampling, Tomek-links). The experimental evaluation was based on machine learning classifiers (e.g., Random Forest, LightGBM, and Multilayer Perceptron). Data visualization confirmed that NAUS effectively mitigates class imbalance, making it a promising tool for enhancing the reliability of medical data analysis.
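
For readers unfamiliar with one of the building blocks named in the abstract, the following is a minimal sketch of Tomek-link identification only. It is not the NAUS algorithm and does not reproduce the authors' implementation: the clustering, noise-removal, and subsampling steps are omitted, and the function names (`nearest_neighbor`, `tomek_links`, `drop_majority_in_links`), the Euclidean distance, and the policy of dropping only the majority-class member of each link are assumptions made for illustration. A Tomek link is a pair of observations from different classes that are each other's nearest neighbor.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_neighbor(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Return pairs (i, j), i < j, that are mutual nearest neighbors
    and carry different class labels."""
    nn = [nearest_neighbor(points, i) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


def drop_majority_in_links(points, labels, majority_label):
    """Illustrative undersampling policy (an assumption, not taken from
    the paper): remove the majority-class member of every Tomek link."""
    drop = {
        k
        for pair in tomek_links(points, labels)
        for k in pair
        if labels[k] == majority_label
    }
    kept = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in kept], [labels[k] for k in kept]


# Toy one-dimensional data: class 0 is the majority class.
points = [(0.0,), (1.0,), (2.5,), (5.0,), (5.4,), (9.0,)]
labels = [0, 0, 0, 0, 1, 1]

links = tomek_links(points, labels)
new_points, new_labels = drop_majority_in_links(points, labels, majority_label=0)
```

On this toy data the only Tomek link is the pair of indices (3, 4), so the majority-class point at 5.0 is removed and five observations remain. The pairwise search is quadratic in the number of observations, which is acceptable for a sketch but would need a neighbor index on larger datasets.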

Details

Original language	English
Article number	101731
Journal	Informatics in Medicine Unlocked : IMU
Volume	60
Publication status	Published - Jan. 2026
Peer-review status	Yes

External IDs

Scopus	105027254171

Keywords

Data analysis, Data balancing, Noise removal, Tomek-link, Undersampling

TU Dresden Research Portal