Aligning machine and human visual representations across abstraction levels

Research output: Contribution to journal › Research article › Contributed › peer-review

Contributors

  • Lukas Muttenthaler - Alphabet Inc., Technical University of Berlin, BIFOLD, Max Planck Institute for Human Cognitive and Brain Sciences (Author)
  • Klaus Greff - Alphabet Inc. (Author)
  • Frieda Born - Technical University of Berlin, BIFOLD, Max Planck Institute for Human Development (Author)
  • Bernhard Spitzer - Chair of Biopsychology, Max Planck Institute for Human Development, TUD Dresden University of Technology (Author)
  • Simon Kornblith - Anthropic, Google DeepMind (Author)
  • Michael C. Mozer - Google DeepMind (Author)
  • Klaus-Robert Müller - Alphabet Inc., Technical University of Berlin, BIFOLD, Korea University, Max Planck Institute for Informatics (Author)
  • Thomas Unterthiner - Alphabet Inc. (Author)
  • Andrew K. Lampinen - Google DeepMind (Author)

Abstract

Deep neural networks have achieved success across a wide range of applications, including as models of human behaviour and neural representations in vision tasks (refs. 1, 2). However, neural network training and human learning differ in fundamental ways, and neural networks often fail to generalize as robustly as humans do (refs. 3, 4), raising questions regarding the similarity of their underlying representations. We need to determine what is missing for modern learning systems to exhibit more human-aligned behaviour. Here we highlight a key misalignment between vision models and humans: whereas human conceptual knowledge is hierarchically organized from fine- to coarse-scale distinctions (for example, ref. 5), model representations do not accurately capture all of these levels of abstraction. To address this misalignment, we first train a teacher model to imitate human judgements, then transfer human-aligned structure from its representations to refine the representations of pretrained state-of-the-art vision foundation models via fine-tuning. These human-aligned models more accurately approximate human behaviour and uncertainty across a wide range of similarity tasks, including a dataset of human judgements spanning multiple levels of semantic abstraction. They also perform better on a diverse set of machine learning tasks, increasing generalization and out-of-distribution robustness. Thus, infusing neural networks with additional human knowledge yields a best-of-both-worlds representation that is both more consistent with human cognitive judgements and more practically useful, paving the way towards more robust, interpretable and human-aligned artificial intelligence systems.
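
As a concrete illustration of the two-stage approach the abstract describes, the sketch below shows one plausible reading in PyTorch: a linear teacher transform fitted on human odd-one-out triplet judgements, followed by distilling the teacher's similarity structure into a student via a KL objective. The triplet scoring rule, the linear transform and all function names here are assumptions made for illustration, not the paper's exact training recipe.

```python
import torch
import torch.nn.functional as F

def odd_one_out_logits(z):
    # z: (B, 3, D) L2-normalised features for a batch of image triplets.
    sim = torch.einsum('bid,bjd->bij', z, z)               # (B, 3, 3) pairwise similarities
    total = sim.sum(dim=2) - sim.diagonal(dim1=1, dim2=2)  # similarity to the other two items
    return -total                                          # least-similar item = odd one out

def teacher_loss(W, feats, human_choice):
    # Stage 1 (assumed): fit a linear transform W of shape (D, D) on frozen
    # backbone features so that model odd-one-out predictions match human
    # triplet judgements (human_choice: (B,) with values in {0, 1, 2}).
    z = F.normalize(feats @ W, dim=-1)
    return F.cross_entropy(odd_one_out_logits(z), human_choice)

def alignment_loss(student_z, teacher_z, tau=1.0):
    # Stage 2 (assumed): fine-tune the student so that its triplet similarity
    # structure matches the human-aligned teacher's soft labels (KL divergence).
    p_teacher = F.softmax(odd_one_out_logits(teacher_z) / tau, dim=-1)
    log_p_student = F.log_softmax(odd_one_out_logits(student_z) / tau, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction='batchmean')
```

In this reading, stage 1 only learns a small alignment head on top of a frozen model, and stage 2 propagates that human-aligned similarity structure into the full student during fine-tuning; the temperature tau controls how sharply the teacher's uncertainty is transferred.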

Details

Original language: English
Pages (from-to): 349-355
Number of pages: 7
Journal: Nature
Volume: 647
Issue number: 8089
Publication status: Published - 13 Nov 2025
Peer-reviewed: Yes

External IDs

PubMed: 41224979
