Effects of Label Noise on Deep Learning-Based Skin Cancer Classification

Research output: Contribution to journal, Research article, Contributed, peer-review

Contributors

  • Achim Hekler, German Cancer Research Center (DKFZ) (Author)
  • Jakob N. Kather, German Cancer Research Center (DKFZ), RWTH Aachen University (Author)
  • Eva Krieghoff-Henning, German Cancer Research Center (DKFZ) (Author)
  • Jochen S. Utikal, Heidelberg University, German Cancer Research Center (DKFZ) (Author)
  • Friedegund Meier, Department of Dermatology, National Center for Tumor Diseases Dresden, Skin Tumor Center, University Hospital Carl Gustav Carus Dresden (Author)
  • Frank F. Gellrich, Department of Dermatology, National Center for Tumor Diseases Dresden, Skin Tumor Center, University Hospital Carl Gustav Carus Dresden (Author)
  • Julius Upmeier zu Belzen, Berlin Institute of Health at Charité (Author)
  • Lars French, Ludwig Maximilian University of Munich (Author)
  • Justin G. Schlager, Ludwig Maximilian University of Munich (Author)
  • Kamran Ghoreschi, Charité – Universitätsmedizin Berlin (Author)
  • Tabea Wilhelm, Charité – Universitätsmedizin Berlin (Author)
  • Heinz Kutzner, Medical care center Dermapathology Friedrichshafen/Lake Constance PartG (Author)
  • Carola Berking, Friedrich-Alexander University Erlangen-Nürnberg (Author)
  • Markus V. Heppt, Friedrich-Alexander University Erlangen-Nürnberg (Author)
  • Sebastian Haferkamp, University of Regensburg (Author)
  • Wiebke Sondermann, University of Duisburg-Essen (Author)
  • Dirk Schadendorf, University of Duisburg-Essen (Author)
  • Bastian Schilling, University of Würzburg (Author)
  • Benjamin Izar, Dana-Farber Cancer Institute (Author)
  • Roman Maron, German Cancer Research Center (DKFZ) (Author)
  • Max Schmitt, German Cancer Research Center (DKFZ) (Author)
  • Stefan Fröhling, German Cancer Research Center (DKFZ) (Author)
  • Daniel B. Lipka, German Cancer Research Center (DKFZ), Otto von Guericke University Magdeburg (Author)
  • Titus J. Brinker, German Cancer Research Center (DKFZ) (Author)

Abstract

Recent studies have shown that deep learning is capable of classifying dermatoscopic images at least as well as dermatologists. However, many studies in skin cancer classification use training images whose labels are not biopsy-verified. This imperfect ground truth introduces a systematic error, but its effects on classifier performance are currently unknown. Here, we systematically examine the effects of label noise by training and evaluating convolutional neural networks (CNNs) on 804 images of melanoma and nevi labeled either by dermatologists or by biopsy. The CNNs are evaluated on a test set of 384 images using 4-fold cross-validation, comparing their outputs with either the corresponding dermatological or the biopsy-verified diagnosis. When training and test labels share the same ground truth, high accuracies of 75.03% (95% CI: 74.39–75.66%) for dermatological and 73.80% (95% CI: 73.10–74.51%) for biopsy-verified labels can be achieved. However, if the CNN is trained and tested with different ground truths, accuracy drops significantly, to 64.53% (95% CI: 63.12–65.94%, p < 0.01) on a non-biopsy-verified test set and to 64.24% (95% CI: 62.66–65.83%, p < 0.01) on a biopsy-verified test set. In conclusion, deep learning methods for skin cancer classification are highly sensitive to label noise, and future work should use biopsy-verified training images to mitigate this problem.
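The mechanism behind the accuracy drop can be illustrated with a toy sketch. It scores one set of predictions against two ground truths and shows that a classifier which perfectly reproduces the dermatologists' labels can agree with the biopsy labels only as often as the two label sets agree with each other. Everything here is an assumption for illustration: the 0/1 label encoding, the synthetic labels, the hypothetical helper names (`accuracy`, `wald_ci`, `k_fold_indices`, `cross_ground_truth_report`), and the Wald interval, since the abstract does not state how its confidence intervals were computed. It is not the authors' CNN pipeline or evaluation code.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical encoding for this sketch: 1 = melanoma, 0 = nevus.


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that agree with one chosen ground truth."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equally long")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def wald_ci(acc: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation (Wald) interval for a proportion.

    Assumption: the abstract does not say how its 95% CIs were computed.
    """
    half_width = z * math.sqrt(acc * (1.0 - acc) / n)
    return max(0.0, acc - half_width), min(1.0, acc + half_width)


def k_fold_indices(n: int, k: int = 4, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them round-robin into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_ground_truth_report(
    predictions: Sequence[int],
    derm_labels: Sequence[int],
    biopsy_labels: Sequence[int],
) -> dict:
    """Score the same predictions against both ground truths."""
    return {
        "acc_vs_derm": accuracy(predictions, derm_labels),
        "acc_vs_biopsy": accuracy(predictions, biopsy_labels),
        # Share of lesions where the two ground truths themselves disagree.
        "label_disagreement": 1.0 - accuracy(derm_labels, biopsy_labels),
        "n": len(predictions),
    }


if __name__ == "__main__":
    # Synthetic toy data (not from the study): 100 lesions; the hypothetical
    # dermatologist label differs from the biopsy label on every 5th lesion.
    biopsy = [i % 2 for i in range(100)]
    derm = [1 - y if i % 5 == 0 else y for i, y in enumerate(biopsy)]

    # A classifier that has perfectly learned the dermatologist labels.
    preds = list(derm)
    report = cross_ground_truth_report(preds, derm, biopsy)
    print(report)
    print("95% Wald CI vs biopsy:", wald_ci(report["acc_vs_biopsy"], report["n"]))
    print("fold sizes:", [len(f) for f in k_fold_indices(len(preds), k=4)])
```

In this toy case the accuracy against the other ground truth is exactly one minus the label disagreement rate (0.8 here). Real CNNs reproduce neither label set perfectly, so the sketch only illustrates why mismatched training and test ground truths cap measured accuracy; it does not model the reported numbers.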

Details

Original language: English
Article number: 177
Journal: Frontiers in Medicine
Volume: 7
Publication status: Published, 6 May 2020
Peer-reviewed: Yes

External IDs

Scopus: 85085189217
ORCID: /0000-0003-4340-9706/work/157319236

Keywords

  • artificial intelligence, dermatology, label noise, melanoma, nevi, skin cancer