Simulation of talking faces in the human brain improves auditory speech recognition

Katharina Von Kriegstein; Özgür Dogan; Martina Grüter; Anne Lise Giraud; Christian A. Kell; Thomas Grüter; Andreas Kleinschmidt; Stefan J. Kiebel

doi:10.1073/pnas.0710826105

Publication: Contribution to journal › Research article › Contributed › Peer-reviewed

Contributors

Katharina Von Kriegstein - University College London, Newcastle University (Author)
Özgür Dogan - Universitätsklinikum Frankfurt (Author)
Martina Grüter - Universität Wien (Author)
Anne Lise Giraud - Ecole Normale Superieure (Author)
Christian A. Kell - Universitätsklinikum Frankfurt, Ecole Normale Superieure (Author)
Thomas Grüter - Universität Wien (Author)
Andreas Kleinschmidt - Commissariat à l’énergie atomique et aux énergies alternatives (CEA), INSERM - Institut national de la santé et de la recherche médicale (Author)
Stefan J. Kiebel - University College London (Author)

Abstract

Human face-to-face communication is essentially audiovisual. Typically, people talk to us face-to-face, providing concurrent auditory and visual input. Understanding someone is easier when there is visual input, because visual cues like mouth and tongue movements provide complementary information about speech content. Here, we hypothesized that, even in the absence of visual input, the brain optimizes both auditory-only speech and speaker recognition by harvesting speaker-specific predictions and constraints from distinct visual face-processing areas. To test this hypothesis, we performed behavioral and neuroimaging experiments in two groups: subjects with a face recognition deficit (prosopagnosia) and matched controls. The results show that observing a specific person talking for 2 min improves subsequent auditory-only speech and speaker recognition for this person. In both prosopagnosics and controls, behavioral improvement in auditory-only speech recognition was based on an area typically involved in face-movement processing. Improvement in speaker recognition was only present in controls and was based on an area involved in face-identity processing. These findings challenge current unisensory models of speech processing, because they show that, in auditory-only speech, the brain exploits previously encoded audiovisual correlations to optimize communication. We suggest that this optimization is based on speaker-specific audiovisual internal models, which are used to simulate a talking face.

Details

Original language	English
Pages (from - to)	6747-6752
Number of pages	6
Journal	Proceedings of the National Academy of Sciences of the United States of America : PNAS
Volume	105
Issue number	18
Publication status	Published - 6 May 2008
Peer-review status	Yes
Externally published	Yes

External IDs

PubMed	18436648
ORCID	/0000-0001-7989-5860/work/142244401

Subject headings

ASJC Scopus subject areas

General

Keywords

fMRI, Multisensory, Predictive coding, Prosopagnosia

Library keywords

610 Medicine and health

TU Dresden research portal