Utility of disease probability scores to guide decision-making during screening for phaeochromocytoma and paraganglioma: a machine learning modelling cross sectional study

Research output: Contribution to journal > Research article > Contributed > peer-review

Contributors

  • Christina Pamporaki, Department of Internal Medicine III, University Hospital Carl Gustav Carus Dresden (Author)
  • Georg Pommer, Institute of Clinical Genetics, University Hospital Carl Gustav Carus Dresden (Author)
  • Ioannis D. Apostolopoulos, University of Thessaly (Author)
  • Angelos Filippatos, University of Patras (Author)
  • Mirko Peitzsch, Institute of Clinical Chemistry and Laboratory Medicine, University Hospital Carl Gustav Carus Dresden (Author)
  • Hanna Remde, University of Würzburg (Author)
  • Georgiana Constantinescu, Department of Internal Medicine III, University Hospital Carl Gustav Carus Dresden (Author)
  • Annika M.A. Berends, University of Groningen (Author)
  • Matthew A. Nazari, Eunice Kennedy Shriver National Institute of Child Health and Human Development (Author)
  • Felix Beuschlein, Ludwig Maximilian University of Munich, University of Zurich, The LOOP Zurich Medical Research Center (Author)
  • Martin Fassnacht, University of Würzburg (Author)
  • Aleksander Prejbisz, Cardinal Stefan Wyszynski Institute of Cardiology (Author)
  • Karel Pacak, Eunice Kennedy Shriver National Institute of Child Health and Human Development (Author)
  • Graeme Eisenhofer, Department of Internal Medicine III, University Hospital Carl Gustav Carus Dresden (Author)

Abstract

BACKGROUND: Interpretation of plasma metanephrines and methoxytyramine to assess likelihood of phaeochromocytoma/paraganglioma (PPGL) during screening can be challenging. This study (study period: 2021-2023) introduces new methods to select machine-learning (ML) models and evaluate derived probability-scores to better interpret laboratory results.

METHODS: ML models were trained and internally tested using data from 2046 patients with and without PPGL, with the following features: age, pre-test risk of PPGL, and plasma metanephrines and methoxytyramine. External validation involved a second cohort of 1641 patients with and without PPGL. The study employed several processes to select and evaluate the best model: concordance of models with human intelligence; intra- and inter-laboratory variability in derived probability-scores; and comparison of scores of the selected model to predictions of ten clinical care specialists before and after provision of those scores.

FINDINGS: External validation established equally excellent diagnostic performance for all five best ML models according to areas under ROC curves (0.988-0.994) and balanced accuracies (0.958-0.981). Probability-scores of the models, however, varied widely and were poorly correlated. The additional selection processes identified an artificial-network model as superior to and more robust than the others. Predictions of disease likelihood by specialists, according to six categories from "disease highly unlikely" to "disease clear", varied widely for individual patients. Within each of the six predictive categories, median probability-scores of the artificial-network model were 70-, 175-, 59-, 15-, 3.5- and 1.7-fold higher (P < 0.0001) in patients with PPGL than in those without. This superiority of probability-scores over variable predictions by specialists remained evident after specialists were tasked to modify their predictions according to those scores.
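
As an illustration of the metrics named in the findings, the following minimal Python sketch shows how an area under the ROC curve, a balanced accuracy and a fold-difference in median probability-scores can be computed from labels and scores. It is not the study's code: the function names, the 0.5 decision threshold and the eight toy patients are assumptions made purely for illustration and do not reproduce any reported result.

```python
from statistics import median


def roc_auc(labels, scores):
    """AUC as the fraction of (diseased, non-diseased) pairs in which the
    diseased patient has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_accuracy(labels, scores, threshold=0.5):
    """Mean of sensitivity and specificity at a given score threshold.
    The default threshold of 0.5 is an assumption, not a study value."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def median_fold_difference(labels, scores):
    """Ratio of the median score in patients with disease to the median
    score in patients without disease."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    return median(pos) / median(neg)


# Hypothetical toy data: 1 = PPGL, 0 = no PPGL; scores are made-up
# probability-scores, not values from the study.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.4, 0.2, 0.1, 0.05]

auc = roc_auc(labels, scores)
bal_acc = balanced_accuracy(labels, scores)
fold = median_fold_difference(labels, scores)
print(f"AUC={auc:.4f} balanced accuracy={bal_acc:.3f} fold={fold:.2f}")
```

The toy data also show why a ranking metric such as the AUC can be high while scores differ between models: the AUC depends only on the ordering of scores, whereas balanced accuracy and the median fold-difference depend on their actual values.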

INTERPRETATION: This study employed several novel processes to establish an ML model with probability-scores superior to predictions of disease likelihood by specialists. However, the negligible improvement in interpretations by specialists after provision of probability-scores indicates this alone is insufficient to improve decision-making.

FUNDING: Deutsche Forschungsgemeinschaft.

Details

Original language: English
Article number: 103181
Journal: EClinicalMedicine
Volume: 82
Publication status: Published - Apr 2025
Peer-reviewed: Yes

External IDs

ORCID /0000-0003-0311-1745/work/203070942
ORCID /0000-0003-0772-1604/work/203072436
PubMed 40224674
PubMedCentral PMC11992530

Keywords

  • Clinical decision support system, Machine learning, Metanephrines, Paraganglioma, Phaeochromocytoma