Artificial intelligence-generated synthetic data for cancer research and clinical trials

Jan-Niklas Eckardt; Waldemar Hahn; Arsela Prelaj; Martin Bornhäuser; Jan Moritz Middeke; Jakob Nikolas Kather

doi:10.1038/s41568-026-00912-4

Publication: Contribution to journal › Review article › Contributed › Peer-reviewed

Contributors

Jan-Niklas Eckardt, Medizinische Klinik und Poliklinik I, Else Kröner Fresenius Zentrum für Digitale Gesundheit (Author)
Waldemar Hahn, Institut für Medizinische Informatik und Biometrie, Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI Dresden) (Author)
Arsela Prelaj, IRCCS Fondazione Istituto Nazionale per lo studio e la cura dei tumori - Milano (Author)
Martin Bornhäuser, Medizinische Klinik und Poliklinik I, Nationales Centrum für Tumorerkrankungen Dresden, Deutsches Konsortium für Translationale Krebsforschung (DKTK) - Dresden, Deutsches Krebsforschungszentrum (DKFZ) (Author)
Jan Moritz Middeke, Medizinische Klinik und Poliklinik I, Else Kröner Fresenius Zentrum für Digitale Gesundheit (Author)
Jakob Nikolas Kather, Else Kröner Fresenius Zentrum für Digitale Gesundheit, Medizinische Klinik und Poliklinik I, Nationales Zentrum für Tumorerkrankungen (NCT) Heidelberg (Author)

Abstract

Synthetic data, generated through advanced artificial intelligence models, are gaining traction in healthcare research, particularly in high-stakes fields such as haematology and oncology. By replicating statistical properties, intervariable relationships and behaviours of real-world data, synthetic data sets can serve as valuable supplements or substitutes for conventional medical data. They offer the potential to overcome barriers to data access and sharing, democratize scientific discovery, and reduce the costs and failure rates of clinical trials. However, the lack of standardization in training data selection, model evaluation, bias mitigation, privacy preservation and quality assurance remains a major challenge, limiting their reliability and safe application. In this Review, we explore the role of synthetic data in cancer research and clinical trials, present real-world examples of their use, critically examine limitations and pitfalls, and propose best practices to enhance fidelity, validity, fairness and utility. Although synthetic data are not a 'silver bullet' for the challenges of clinical research, with rigorous validation and oversight, they have the potential to transform data sharing, scientific collaboration and clinical trial design.

Details

Original language	English
Pages (from - to)	351–363
Number of pages	13
Journal	Nature reviews : Cancer
Volume	26
Issue number	5
Early online date	20 Feb 2026
Publication status	Published - May 2026
Peer-reviewed	Yes

External IDs

Scopus	105030682415
ORCID	/0000-0002-3730-5348/work/209584206

Keywords

Sustainable Development Goals

SDG 3 – Good Health and Well-being