Mimicking clinical trials with synthetic acute myeloid leukemia patients using generative artificial intelligence

Jan-Niklas Eckardt; Waldemar Hahn; Christoph Röllig; Sebastian Stasik; Uwe Platzbecker; Carsten Müller-Tidow; Hubert Serve; Claudia D Baldus; Christoph Schliemann; Kerstin Schäfer-Eckart; Maher Hanoun; Martin Kaufmann; Andreas Burchert; Christian Thiede; Johannes Schetelig; Martin Sedlmayr; Martin Bornhäuser; Markus Wolfien; Jan Moritz Middeke

doi:10.1038/s41746-024-01076-x

Research output: Contribution to journal › Research article › Contributed › peer-review

Contributors

Jan-Niklas Eckardt, Department of Internal Medicine I, Else Kröner Fresenius Center for Digital Health (Author)
Waldemar Hahn, Institute for Medical Informatics and Biometry, Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresden/Leipzig (Author)
Christoph Röllig, Department of Internal Medicine I (Author)
Sebastian Stasik, Department of Internal Medicine I (Author)
Uwe Platzbecker, University Hospital Leipzig (Author)
Carsten Müller-Tidow, University Hospital Heidelberg (Author)
Hubert Serve, Goethe University Frankfurt a.M. (Author)
Claudia D Baldus, University Hospital Schleswig-Holstein Campus Kiel (Author)
Christoph Schliemann, University Hospital Münster (Author)
Kerstin Schäfer-Eckart, Paracelsus Medical University Nuremberg, Nuremberg Clinic (Author)
Maher Hanoun, University Hospital Essen (Author)
Martin Kaufmann, Robert Bosch Krankenhaus Stuttgart (Author)
Andreas Burchert, University of Marburg (Author)
Christian Thiede, Department of Internal Medicine I (Author)
Johannes Schetelig, Department of Internal Medicine I (Author)
Martin Sedlmayr, Institute for Medical Informatics and Biometry (Author)
Martin Bornhäuser, Department of Internal Medicine I, National Center for Tumor Diseases Dresden (Author)
Markus Wolfien, Institute for Medical Informatics and Biometry, Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresden/Leipzig (Author)
Jan Moritz Middeke, Department of Internal Medicine I, Else Kröner Fresenius Center for Digital Health (Author)

Abstract

Clinical research relies on high-quality patient data; however, obtaining big data sets is costly, and access to existing data is often hindered by privacy and regulatory concerns. Synthetic data generation holds the promise of effectively bypassing these barriers, allowing for simplified data accessibility and the prospect of synthetic control cohorts. We employed two different methodologies of generative artificial intelligence, CTAB-GAN+ and normalizing flows (NFlow), to synthesize patient data derived from 1606 patients with acute myeloid leukemia, a heterogeneous hematological malignancy, who were treated within four multicenter clinical trials. Both generative models accurately captured the distributions of demographic, laboratory, molecular and cytogenetic variables, as well as patient outcomes, yielding high performance scores for fidelity and usability of both synthetic cohorts (n = 1606 each). Survival analysis demonstrated close resemblance of survival curves between the original and synthetic cohorts. Inter-variable relationships were preserved in univariable outcome analysis, enabling explorative analysis of our synthetic data. Additionally, training sample privacy is safeguarded, mitigating possible patient re-identification, which we quantified using Hamming distances. We provide not only a proof of concept for synthetic data generation in multimodal clinical data for rare diseases, but also full public access to the synthetic data sets to foster further research.
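The Hamming-distance privacy check can be illustrated with a minimal sketch. It assumes that each patient record is encoded as a fixed-length tuple of categorical values, with any continuous variables binned beforehand. Under that encoding, the Hamming distance between two records counts the variables on which they differ. A synthetic record at distance 0 from some training record would be an exact copy of a real patient. The function names and toy records below are hypothetical. The abstract does not specify how the authors encoded variables or aggregated the distances.

```python
from typing import Hashable, List, Sequence

# Hypothetical helpers for illustration; not the authors' published code.

def hamming(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Count the positions (variables) at which two equal-length records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of variables")
    return sum(x != y for x, y in zip(a, b))


def min_hamming_to_training(
    synthetic: Sequence[Sequence[Hashable]],
    training: Sequence[Sequence[Hashable]],
) -> List[int]:
    """For each synthetic record, return the distance to its closest training record.

    A value of 0 means the synthetic record reproduces a real training
    record exactly; larger values mean no real patient is copied verbatim.
    """
    return [min(hamming(s, t) for t in training) for s in synthetic]


# Hypothetical toy records: (sex, age, NPM1 status, FLT3-ITD status).
# Exact-equality comparison of age is a simplification (assumption).
training = [("F", 65, "NPM1+", "FLT3-ITD-"), ("M", 52, "NPM1-", "FLT3-ITD+")]
synthetic = [("F", 65, "NPM1+", "FLT3-ITD+"), ("M", 70, "NPM1-", "FLT3-ITD-")]

distances = min_hamming_to_training(synthetic, training)
exact_copies = sum(d == 0 for d in distances)
print(distances, exact_copies)
```

Here the first synthetic record differs from its nearest real record in one variable and the second in two, so no exact copies are found. For a normalized variant, SciPy's `scipy.spatial.distance.hamming` returns the fraction of disagreeing components instead of a count.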

Details

Original language	English
Article number	76
Journal	npj Digital Medicine
Volume	7
Issue number	1
Publication status	Published - 20 Mar 2024
Peer-reviewed	Yes

External IDs

PubMedCentral	PMC10954666
Scopus	85188234036
ORCID	/0000-0002-1887-4772/work/164198987
ORCID	/0000-0002-9888-8460/work/164199199

Research Portal of the TU Dresden