Different scaling of linear models and deep learning in UKBiobank brain images versus machine-learning datasets

Marc-Andre Schulz; B T Thomas Yeo; Joshua T Vogelstein; Janaina Mourao-Miranada; Jakob N Kather; Konrad Kording; Blake Richards; Danilo Bzdok

doi:10.1038/s41467-020-18037-z


Publication: Contribution to journal › Research article › Contributed › Peer-reviewed

Contributors

Marc-Andre Schulz, Universitätsklinikum Aachen (Author)
B T Thomas Yeo, Nanyang Technological University (Author)
Joshua T Vogelstein, Johns Hopkins University (Author)
Janaina Mourao-Miranada, University College London (Author)
Jakob N Kather, Else Kröner Fresenius Zentrum für Digitale Gesundheit, Universitätsklinikum Aachen, Deutsches Krebsforschungszentrum (DKFZ), Deutsches Konsortium für Translationale Krebsforschung (DKTK) - Dresden (Author)
Konrad Kording, University of Pennsylvania (Author)
Blake Richards, McGill University (Author)
Danilo Bzdok, Mila - Quebec Artificial Intelligence Institute (Author)

Abstract

Recently, deep learning has unlocked unprecedented success in various domains, especially using images, text, and speech. However, deep learning is only beneficial if the data have nonlinear relationships and if these nonlinearities are exploitable at available sample sizes. We systematically profiled the performance of deep, kernel, and linear models as a function of sample size on UKBiobank brain images against established machine learning references. On MNIST and Zalando Fashion, prediction accuracy consistently improves when escalating from linear models to shallow-nonlinear models, and further improves with deep-nonlinear models. In contrast, using structural or functional brain scans, simple linear models perform on par with more complex, highly parameterized models in age/sex prediction across increasing sample sizes. In sum, linear models keep improving as the sample size approaches ~10,000 subjects. Yet, nonlinearities for predicting common phenotypes from typical brain scans remain largely inaccessible to the examined kernel and deep learning methods.
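The core procedure described in the abstract, measuring prediction accuracy as a function of training-set size on a fixed held-out test set, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic (a noisy linear rule, not UKBiobank images), and a nearest-centroid classifier and a k-nearest-neighbour classifier are hypothetical stand-ins for the linear and nonlinear models (the study itself compared linear, kernel, and deep models). All function names are illustrative.

```python
import random


def make_data(n, dim, rng):
    """Hypothetical synthetic binary task (NOT UKBiobank data): the label is a
    noisy linear function of the features, with all true weights equal to 1."""
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        score = sum(x) + rng.gauss(0.0, 1.0)
        X.append(x)
        y.append(1 if score > 0 else 0)
    return X, y


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def fit_centroids(X, y):
    """Linear stand-in: nearest-centroid classifier (hyperplane decision boundary)."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        if rows:  # skip a class that is absent from a tiny subsample
            centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    return min(centroids, key=lambda label: sq_dist(centroids[label], x))


def predict_knn(X, y, x, k=5):
    """Nonlinear stand-in: k-nearest-neighbour majority vote."""
    k = min(k, len(X))
    nearest = sorted(range(len(X)), key=lambda i: sq_dist(X[i], x))[:k]
    votes = sum(y[i] for i in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(predict, X_test, y_test):
    hits = sum(predict(x) == t for x, t in zip(X_test, y_test))
    return hits / len(y_test)


def learning_curve(sizes, n_test=300, dim=10, seed=0):
    """Train both models on nested subsamples of increasing size and score
    them on one fixed held-out test set; returns (n, linear_acc, knn_acc)."""
    rng = random.Random(seed)
    X_pool, y_pool = make_data(max(sizes), dim, rng)
    X_test, y_test = make_data(n_test, dim, rng)
    curve = []
    for n in sorted(sizes):
        X_tr, y_tr = X_pool[:n], y_pool[:n]
        centroids = fit_centroids(X_tr, y_tr)
        acc_lin = accuracy(lambda x: predict_centroid(centroids, x), X_test, y_test)
        acc_knn = accuracy(lambda x: predict_knn(X_tr, y_tr, x), X_test, y_test)
        curve.append((n, acc_lin, acc_knn))
    return curve


if __name__ == "__main__":
    for n, acc_lin, acc_knn in learning_curve([20, 50, 100, 200, 500]):
        print(f"n={n:4d}  linear={acc_lin:.3f}  kNN={acc_knn:.3f}")
```

Because the labels in this toy setting come from a linear rule, there is no nonlinearity for the k-NN model to exploit; the sketch only shows the bookkeeping of a sample-size scaling analysis (nested training subsamples, one fixed test set, one accuracy per model per sample size).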

Details

Original language	English
Article number	4238
Journal	Nature Communications
Volume	11
Publication status	Published - 25 Aug 2020
Peer-review status	Yes

External IDs

PubMedCentral	PMC7447816
Scopus	85089774011
ORCID	/0000-0002-3730-5348/work/198594447

Keywords

Biological Specimen Banks, Brain/diagnostic imaging, Deep Learning, Humans, Linear Models, Machine Learning, Neuroimaging/methods, Phenotype, Sample Size, United Kingdom

Library subject headings

500 Natural sciences and mathematics
