Smarter Self-distillation: Optimizing the Teacher for Surgical Video Applications

Publication: Conference paper in proceedings (contributed, peer-reviewed)

Contributors

  • Amine Yamlahi, Deutsches Krebsforschungszentrum (DKFZ), Universität Heidelberg (Author)
  • Piotr Kalinowski, Deutsches Krebsforschungszentrum (DKFZ), HIDSS4Health – Helmholtz Information and Data Science School for Health, Universität Heidelberg (Author)
  • Patrick Godau, Deutsches Krebsforschungszentrum (DKFZ), Universität Heidelberg, HIDSS4Health – Helmholtz Information and Data Science School for Health (Author)
  • Rayan Younis, Klinik und Poliklinik für Viszeral-, Thorax- und Gefäßchirurgie, Universitätsklinikum Carl Gustav Carus Dresden (Author)
  • Martin Wagner, Klinik und Poliklinik für Viszeral-, Thorax- und Gefäßchirurgie, Universitätsklinikum Carl Gustav Carus Dresden (Author)
  • Beat Müller, University Digestive Healthcare Center Basel - Clarunis (Author)
  • Lena Maier-Hein, Deutsches Krebsforschungszentrum (DKFZ), Universität Heidelberg, HIDSS4Health – Helmholtz Information and Data Science School for Health (Author)

Abstract

Surgical workflow analysis poses significant challenges due to complex imaging conditions, annotation ambiguities, and the large number of classes in tasks such as action recognition. Self-distillation (SD) has emerged as a promising technique to address these challenges by leveraging soft labels, but little is known about how to optimize the quality of these labels for surgical scene analysis. In this work, we thoroughly investigate this issue. First, we show that the quality of soft labels is highly sensitive to several design choices and that relying on a single top-performing teacher selected based on validation performance often leads to suboptimal results. Second, as a key technical innovation, we introduce a multi-teacher distillation strategy that ensembles checkpoints across seeds and epochs within a training phase where soft labels maintain an optimal balance, neither underconfident nor overconfident. By ensembling at the teacher level rather than the student level, our approach reduces computational overhead during inference. Finally, we validate our approach on three benchmark datasets, where it demonstrates consistent improvements over existing SD methods. Notably, our method sets a new state-of-the-art (SOTA) performance on the CholecTriplet benchmark, achieving a 43.1% mean Average Precision (mAP) score with real-time inference, thereby establishing a new standard for surgical video analysis in challenging and ambiguous environments. Code available at https://github.com/IMSY-DKFZ/self-distilled-swin.
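The core idea of the multi-teacher strategy described above (ensembling teacher checkpoints, rather than student models, to produce soft labels) can be sketched as follows. This is an illustrative sketch only, not the authors' released implementation; the function names, the temperature value, and the use of NumPy are assumptions for the example.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_soft_labels(teacher_logits, temperature=2.0):
    """Average the softened predictions of several teacher checkpoints
    (e.g. collected across seeds and epochs) into one soft-label target
    per sample. Ensembling happens at the teacher level, so the student
    trained on these targets remains a single model at inference time."""
    probs = np.stack([softmax(l, temperature) for l in teacher_logits])
    return probs.mean(axis=0)

# Hypothetical example: 3 teacher checkpoints, 2 samples, 4 classes.
rng = np.random.default_rng(0)
teachers = [rng.normal(size=(2, 4)) for _ in range(3)]
soft = ensemble_soft_labels(teachers)
# Each row is a probability distribution usable as a distillation target.
```

Averaging the softened probabilities tends to smooth out individual checkpoints that are over- or underconfident, which is the balance the abstract refers to.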

Details

Original language: English
Title: Medical Image Computing and Computer Assisted Intervention, MICCAI 2025
Editors: James C. Gee, Jaesung Hong, Carole H. Sudre, Polina Golland, Jinah Park, Daniel C. Alexander, Juan Eugenio Iglesias, Archana Venkataraman, Jong Hyo Kim
Publisher: Springer Science and Business Media B.V.
Pages: 522-531
Number of pages: 10
ISBN (electronic): 978-3-032-05114-1
ISBN (print): 978-3-032-05113-4
Publication status: Published - 2026
Peer-reviewed: Yes

Publication series

Series: Lecture Notes in Computer Science
Volume: 15968 LNCS
ISSN: 0302-9743

Conference

Title: 28th International Conference on Medical Image Computing and Computer Assisted Intervention
Short title: MICCAI 2025
Event number: 28
Duration: 23-27 September 2025
Location: Daejeon Convention Center
City: Daejeon
Country: South Korea

Keywords

  • Self-Distillation, Soft labels optimization, Surgical Action Recognition