Aligning machine and human visual representations across abstraction levels

Publication: Contribution to journal › Research article › Contributed › Peer-reviewed

Contributors

  • Lukas Muttenthaler, Alphabet Inc., Technische Universität Berlin, BIFOLD, Max-Planck-Institut für Kognitions- und Neurowissenschaften (Author)
  • Klaus Greff, Alphabet Inc. (Author)
  • Frieda Born, Technische Universität Berlin, BIFOLD, Max Planck Institute for Human Development (Author)
  • Bernhard Spitzer, Chair of Biopsychology, Max Planck Institute for Human Development (Author)
  • Simon Kornblith, Anthropic, Google DeepMind Inc. (Author)
  • Michael C. Mozer, Google DeepMind Inc. (Author)
  • Klaus-Robert Müller, Alphabet Inc., Technische Universität Berlin, BIFOLD, Korea University, Max-Planck-Institut für Informatik (Author)
  • Thomas Unterthiner, Alphabet Inc. (Author)
  • Andrew K. Lampinen, Google DeepMind Inc. (Author)

Abstract

Deep neural networks have achieved success across a wide range of applications, including as models of human behaviour and neural representations in vision tasks1,2. However, neural network training and human learning differ in fundamental ways, and neural networks often fail to generalize as robustly as humans do3,4, raising questions regarding the similarity of their underlying representations. We need to determine what is missing for modern learning systems to exhibit more human-aligned behaviour. Here we highlight a key misalignment between vision models and humans: whereas human conceptual knowledge is hierarchically organized from fine- to coarse-scale distinctions (for example, ref. 5), model representations do not accurately capture all these levels of abstraction. To address this misalignment, we first train a teacher model to imitate human judgements, then transfer human-aligned structure from its representations to refine the representations of pretrained state-of-the-art vision foundation models via fine-tuning. These human-aligned models more accurately approximate human behaviour and uncertainty across a wide range of similarity tasks, including a dataset of human judgements spanning multiple levels of semantic abstraction. They also perform better on a diverse set of machine learning tasks, increasing generalization and out-of-distribution robustness. Thus, infusing neural networks with additional human knowledge yields a best-of-both-worlds representation that is both more consistent with human cognitive judgements and more practically useful, paving the way towards more robust, interpretable and human-aligned artificial intelligence systems.
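The core idea of the abstract — transferring similarity structure from a human-aligned teacher representation into a pretrained model's representation — can be sketched in a few lines. The following is a minimal, hedged illustration, not the authors' method: it stands in for the paper's fine-tuning objective with a simple least-squares linear map, and all names (`teacher`, `student`, `alignment`) and the random data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embeddings: rows are items, columns are features.
n_items, d_teacher, d_student = 50, 16, 32
teacher = rng.normal(size=(n_items, d_teacher))   # human-aligned "teacher" representation
student = rng.normal(size=(n_items, d_student))   # pretrained "student" representation

def similarity_matrix(x):
    """Cosine similarity between all pairs of item embeddings."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T

def alignment(a, b):
    """Correlation between the off-diagonal entries of two similarity matrices."""
    iu = np.triu_indices(a.shape[0], k=1)
    return np.corrcoef(a[iu], b[iu])[0, 1]

# Fit a linear map W so that student @ W approximates the teacher embeddings;
# a least-squares stand-in for fine-tuning towards human-aligned structure.
W, *_ = np.linalg.lstsq(student, teacher, rcond=None)
aligned = student @ W

before = alignment(similarity_matrix(student), similarity_matrix(teacher))
after = alignment(similarity_matrix(aligned), similarity_matrix(teacher))
assert after > before  # the mapped representation shares more similarity structure
```

In the paper this transfer happens by fine-tuning deep vision foundation models rather than by a closed-form linear fit, but the quantity being improved is analogous: agreement between the model's pairwise similarity structure and the human-derived one.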

Details

Original language: English
Pages (from-to): 349-355
Number of pages: 7
Journal: Nature
Volume: 647
Issue number: 8089
Publication status: Published - 13 Nov 2025
Peer-review status: Yes

External IDs

PubMed 41224979
ORCID /0000-0001-9752-932X/work/199218215

Keywords

ASJC Scopus subject areas