Smarter Self-distillation: Optimizing the Teacher for Surgical Video Applications

Research output: Contribution to book/Conference proceedings/Anthology/Report, Conference contribution, Contributed, peer-reviewed

Contributors

  • Amine Yamlahi, German Cancer Research Center (DKFZ), Heidelberg University (Author)
  • Piotr Kalinowski, German Cancer Research Center (DKFZ), HIDSS4Health – Helmholtz Information and Data Science School for Health, Heidelberg University (Author)
  • Patrick Godau, German Cancer Research Center (DKFZ), Heidelberg University, HIDSS4Health – Helmholtz Information and Data Science School for Health (Author)
  • Rayan Younis, Department of Visceral, Thoracic and Vascular Surgery, University Hospital Carl Gustav Carus Dresden (Author)
  • Martin Wagner, Department of Visceral, Thoracic and Vascular Surgery, University Hospital Carl Gustav Carus Dresden (Author)
  • Beat Müller, University Digestive Healthcare Center Basel - Clarunis (Author)
  • Lena Maier-Hein, German Cancer Research Center (DKFZ), Heidelberg University, HIDSS4Health – Helmholtz Information and Data Science School for Health (Author)

Abstract

Surgical workflow analysis poses significant challenges due to complex imaging conditions, annotation ambiguities, and the large number of classes in tasks such as action recognition. Self-distillation (SD) has emerged as a promising technique to address these challenges by leveraging soft labels, but little is known about how to optimize the quality of these labels for surgical scene analysis. In this work, we thoroughly investigate this issue. First, we show that the quality of soft labels is highly sensitive to several design choices and that relying on a single top-performing teacher selected based on validation performance often leads to suboptimal results. Second, as a key technical innovation, we introduce a multi-teacher distillation strategy that ensembles checkpoints across seeds and epochs within a training phase where soft labels maintain an optimal balance—neither underconfident nor overconfident. By ensembling at the teacher level rather than the student level, our approach reduces computational overhead during inference. Finally, we validate our approach on three benchmark datasets, where it demonstrates consistent improvements over existing SD methods. Notably, our method sets a new state-of-the-art (SOTA) performance on the CholecTriplet benchmark, achieving a 43.1% mean Average Precision (mAP) score and real-time inference time, thereby establishing a new standard for surgical video analysis in challenging and ambiguous environments. Code available at https://github.com/IMSY-DKFZ/self-distilled-swin.
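To make the abstract's core idea concrete, below is a minimal, self-contained sketch of multi-teacher soft-label ensembling for self-distillation. It is an illustrative assumption of how such a pipeline can look, not the authors' released implementation (see the linked repository for that); all names and values (NUM_CLASSES, alpha, the linear stand-ins for the teacher and student networks) are hypothetical.

```python
# Minimal sketch (not the authors' code): ensemble soft labels from several
# teacher checkpoints (e.g. different seeds / mid-training epochs), then train
# a single student on a blend of soft and hard labels.
import torch
import torch.nn.functional as F

NUM_CLASSES = 100  # illustrative; e.g. the number of action-triplet classes


@torch.no_grad()
def ensemble_soft_labels(teachers, features):
    """Average per-class probabilities over all teacher checkpoints.

    Ensembling happens at the teacher level, so only one student model
    is needed at inference time."""
    probs = []
    for teacher in teachers:
        teacher.eval()
        probs.append(torch.sigmoid(teacher(features)))  # multi-label setting
    return torch.stack(probs).mean(dim=0)


def distillation_loss(student_logits, soft_labels, hard_labels, alpha=0.5):
    """Blend the loss against the ensembled soft labels with the ordinary
    supervised loss on the annotated hard labels."""
    distill = F.binary_cross_entropy_with_logits(student_logits, soft_labels)
    supervised = F.binary_cross_entropy_with_logits(student_logits, hard_labels)
    return alpha * distill + (1.0 - alpha) * supervised


if __name__ == "__main__":
    # Dummy linear "teachers" standing in for checkpoints from different seeds/epochs.
    teachers = [torch.nn.Linear(512, NUM_CLASSES) for _ in range(3)]
    student = torch.nn.Linear(512, NUM_CLASSES)

    features = torch.randn(8, 512)  # stand-in for video-frame features
    hard_labels = torch.randint(0, 2, (8, NUM_CLASSES)).float()

    soft_labels = ensemble_soft_labels(teachers, features)
    loss = distillation_loss(student(features), soft_labels, hard_labels)
    loss.backward()
    print(f"distillation loss: {loss.item():.4f}")
```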

Details

Original language: English
Title of host publication: Medical Image Computing and Computer Assisted Intervention, MICCAI 2025
Editors: James C. Gee, Jaesung Hong, Carole H. Sudre, Polina Golland, Jinah Park, Daniel C. Alexander, Juan Eugenio Iglesias, Archana Venkataraman, Jong Hyo Kim
Publisher: Springer Science and Business Media B.V.
Pages: 522-531
Number of pages: 10
ISBN (electronic): 978-3-032-05114-1
ISBN (print): 978-3-032-05113-4
Publication status: Published - 2026
Peer-reviewed: Yes

Publication series

Series: Lecture Notes in Computer Science
Volume: 15968 LNCS
ISSN: 0302-9743

Conference

Title: 28th International Conference on Medical Image Computing and Computer Assisted Intervention
Abbreviated title: MICCAI 2025
Conference number: 28
Duration: 23 - 27 September 2025
Location: Daejeon Convention Center
City: Daejeon
Country: Korea, Republic of

Keywords

  • Self-Distillation, Soft labels optimization, Surgical Action Recognition