The impact of site-specific digital histology signatures on deep learning model accuracy and bias

Frederick M Howard; James Dolezal; Sara Kochanny; Jefree Schulte; Heather Chen; Lara Heij; Dezheng Huo; Rita Nanda; Olufunmilayo I Olopade; Jakob N Kather; Nicole Cipriani; Robert L Grossman; Alexander T Pearson

doi:10.1038/s41467-021-24698-1

Publication: Contribution to journal › Research article › Contributed › Peer-reviewed

Contributors

Frederick M Howard, The University of Chicago (Author)
James Dolezal, The University of Chicago (Author)
Sara Kochanny, The University of Chicago (Author)
Jefree Schulte, The University of Chicago (Author)
Heather Chen, The University of Chicago (Author)
Lara Heij, Universitätsklinikum Aachen (Author)
Dezheng Huo, The University of Chicago (Author)
Rita Nanda, The University of Chicago (Author)
Olufunmilayo I Olopade, The University of Chicago (Author)
Jakob N Kather, Nationales Centrum für Tumorerkrankungen Dresden, Else Kröner Fresenius Zentrum für Digitale Gesundheit, Universitätsklinikum Aachen, Leeds Teaching Hospitals NHS Trust, University of Leeds, Universitätsklinikum Heidelberg (Author)
Nicole Cipriani, The University of Chicago (Author)
Robert L Grossman, The University of Chicago (Author)
Alexander T Pearson, The University of Chicago (Author)

Abstract

The Cancer Genome Atlas (TCGA) is one of the largest biorepositories of digital histology. Deep learning (DL) models have been trained on TCGA to predict numerous features directly from histology, including survival, gene expression patterns, and driver mutations. However, we demonstrate that these features vary substantially across tissue submitting sites in TCGA for over 3,000 patients with six cancer subtypes. Additionally, we show that histologic image differences between submitting sites can easily be identified with DL. Site detection remains possible despite commonly used color normalization and augmentation methods, and we quantify the image characteristics constituting this site-specific digital histology signature. We demonstrate that these site-specific signatures lead to biased accuracy for prediction of features including survival, genomic mutations, and tumor stage. Furthermore, ethnicity can also be inferred from site-specific signatures, which must be accounted for to ensure equitable application of DL. These site-specific signatures can lead to overoptimistic estimates of model performance, and we propose a quadratic programming method that abrogates this bias by ensuring models are not trained and validated on samples from the same site.
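
The constraint named at the end of the abstract, that no submitting site contributes samples to both training and validation, can be illustrated with a short sketch. This is not the quadratic programming method the authors propose: it swaps in a simple greedy heuristic, and the function name `site_preserved_folds`, its parameters, and the assumption of binary 0/1 labels are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def site_preserved_folds(sites, labels, n_folds=3):
    """Assign every sample to a fold so that no site spans two folds.

    Greedy heuristic (an assumption, NOT the paper's quadratic program):
    sites are placed largest first into the fold that currently holds the
    fewest samples, with ties broken by the fewest positive labels.
    Labels are assumed to be binary (0 or 1).
    """
    # Count samples per label within each site.
    per_site = defaultdict(Counter)
    for site, label in zip(sites, labels):
        per_site[site][label] += 1

    # Process larger sites first; the site name makes the order deterministic.
    order = sorted(per_site, key=lambda s: (-sum(per_site[s].values()), s))

    fold_size = [0] * n_folds  # samples currently in each fold
    fold_pos = [0] * n_folds   # positive labels currently in each fold
    site_to_fold = {}
    for site in order:
        k = min(range(n_folds), key=lambda i: (fold_size[i], fold_pos[i], i))
        site_to_fold[site] = k
        fold_size[k] += sum(per_site[site].values())
        fold_pos[k] += per_site[site][1]

    # Every sample inherits the fold of its submitting site.
    return [site_to_fold[s] for s in sites]
```

Holding out one fold at a time then guarantees that the validation samples come from sites the model never saw during training. A greedy pass like this only roughly balances fold sizes and class counts, which is presumably the gap an optimization-based assignment is meant to close.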

Details

Original language	English
Article number	4423
Pages (from - to)	4423
Journal	Nature Communications
Volume	12
Issue number	1
Publication status	Published - 20 July 2021
Peer review status	Yes

External IDs

PubMedCentral	PMC8292530
Scopus	85111167302

Keywords

Sustainable Development Goals

SDG 3 – Good Health and Well-being

Keywords

Biomarkers, Tumor/analysis, DNA Mutational Analysis/methods, Data Accuracy, Deep Learning, Gene Expression Profiling/methods, Humans, Image Processing, Computer-Assisted/methods, Mutation, Neoplasm Staging, Neoplasms/diagnosis, Risk Assessment/methods, Specimen Handling/methods

TU Dresden Research Portal