An Investigation of Acoustic Features of the Lower Vocal Tract for Speaker Recognition

Publication: Contribution to book/conference proceedings/anthology/report > Contribution to conference proceedings > Contributed > Peer-reviewed

Abstract

Speaker recognition systems often use mel-frequency cepstral coefficients (MFCCs) as their main features. In contrast to MFCCs, Godoy et al. (2015) proposed a different type of short-term spectral analysis that provides features related to the lower vocal tract (LVT). These features are calculated as the ratio of the acoustic short-time spectra during the closed and open phases of the glottal oscillation cycles, based on a pitch-synchronous analysis. They were suggested to be particularly speaker-specific and might therefore be suitable to substitute or complement MFCCs in speaker recognition systems. The present study investigated the benefit of these features in an i-vector-based speaker recognition system. Using the LVT features alone, the system achieved a speaker recognition rate of 92.3% with 63 enrolled speakers. When the LVT features were fused with conventional MFCC features, the recognition rate was about equal to the recognition rate using MFCC features alone (> 98%).
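
The closed-phase to open-phase spectral ratio described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and not the exact procedure of Godoy et al. (2015). It assumes that the glottal closure and opening instants are already known as sample indices (`gci` and `goi`, hypothetical inputs that would normally come from a separate pitch-synchronous analysis). The function names, the Hann window, the FFT length, and the use of a log ratio are illustrative choices as well.

```python
import numpy as np

def phase_spectrum(segment, n_fft):
    """Magnitude spectrum of one Hann-windowed, zero-padded segment."""
    windowed = segment * np.hanning(len(segment))
    return np.abs(np.fft.rfft(windowed, n=n_fft))

def lvt_log_ratio(signal, gci, goi, n_fft=512, eps=1e-10):
    """Per-cycle log ratio of closed-phase to open-phase magnitude spectra.

    gci, goi: sample indices of glottal closure / opening instants.
    Assumption: they are given and ordered as gci[k] < goi[k] < gci[k + 1].
    """
    ratios = []
    for k in range(len(gci) - 1):
        closed = signal[gci[k]:goi[k]]        # glottis closed
        opened = signal[goi[k]:gci[k + 1]]    # glottis open
        if len(closed) < 4 or len(opened) < 4:
            continue                          # too short for a usable spectrum
        s_closed = phase_spectrum(closed, n_fft)
        s_open = phase_spectrum(opened, n_fft)
        # Log of the spectral ratio; eps guards against log(0).
        ratios.append(np.log(s_closed + eps) - np.log(s_open + eps))
    return np.array(ratios)

# Synthetic example: noise stands in for speech, instants are made up.
rng = np.random.default_rng(0)
x = rng.standard_normal(1600)
features = lvt_log_ratio(x, gci=[0, 160, 320, 480], goi=[80, 240, 400])
```

Each row of `features` holds one glottal cycle and each column one frequency bin (`n_fft // 2 + 1` bins). In a recognition system such rows would still need further processing before being passed to an i-vector back end; that step is outside the scope of this sketch.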

Details

Original language: English
Title: Elektronische Sprachsignalverarbeitung 2024
Editors: Timo Baumann
Publisher: Dresden: TUDpress
Pages: 108-115
Number of pages: 8
ISBN (Print): 978-3-95908-325-6
Publication status: Published - 1 March 2024
Peer-reviewed: Yes

Publication series

Series: Studientexte zur Sprachkommunikation
Volume: 107
ISSN: 0940-6832

External IDs

ORCID: /0000-0003-0167-8123/work/168716968

Keywords

  • Paralinguistic analyses