An Investigation of Acoustic Features of the Lower Vocal Tract for Speaker Recognition

Research output: Contribution to book/Conference proceedings/Anthology/Report, Conference contribution, Contributed, peer-reviewed

Abstract

Speaker recognition systems often use mel-frequency cepstral coefficients (MFCCs) as their main features. In contrast to MFCCs, Godoy et al. (2015) proposed a different type of short-term spectral analysis that provides features related to the lower vocal tract (LVT). These features are calculated as the ratio of the acoustic short-time spectra during the closed and open phases of the glottal oscillation cycles, based on a pitch-synchronous analysis. They were suggested to be particularly speaker-specific and might therefore be suitable to substitute or complement MFCCs in speaker recognition systems. The present study investigated the benefit of these features in an i-vector-based speaker recognition system. Using the LVT features alone, the system achieved a speaker recognition rate of 92.3% with 63 enrolled speakers. When the LVT features were fused with conventional MFCC features, the recognition rate was about equal to the recognition rate using MFCC features alone (> 98%).
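
The closed-phase/open-phase spectral ratio described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `log_spectrum` and `lvt_log_ratio`, the Hann window, the FFT size, and the assumption that each cycle starts at glottal closure with a fixed closed-phase fraction are all hypothetical choices made for illustration only.

```python
import numpy as np


def log_spectrum(segment, n_fft=512):
    """Log-magnitude spectrum of a Hann-windowed segment (illustrative choice)."""
    windowed = segment * np.hanning(len(segment))
    magnitude = np.abs(np.fft.rfft(windowed, n=n_fft))
    # Small offset avoids log(0) for empty spectral bins.
    return np.log(magnitude + 1e-10)


def lvt_log_ratio(cycle, closed_fraction=0.5, n_fft=512):
    """Hypothetical helper: log ratio of closed-phase to open-phase spectra.

    Assumes `cycle` holds one glottal cycle that starts at the instant of
    glottal closure, and that the closed phase occupies the first
    `closed_fraction` of its samples. A real system would estimate both
    from the signal (pitch-synchronous analysis).
    """
    split = int(round(len(cycle) * closed_fraction))
    closed_phase, open_phase = cycle[:split], cycle[split:]
    # A ratio of magnitude spectra becomes a difference in the log domain.
    return log_spectrum(closed_phase, n_fft) - log_spectrum(open_phase, n_fft)
```

In practice, such per-cycle ratios would presumably be averaged or compressed into a compact feature vector before being passed to the i-vector back end; that step is not specified here and is left out of the sketch.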

Details

Original language: English
Title of host publication: Elektronische Sprachsignalverarbeitung 2024
Editors: Timo Baumann
Publisher: TUDpress, Dresden
Pages: 108-115
Number of pages: 8
ISBN (print): 978-3-95908-325-6
Publication status: Published - 1 Mar 2024
Peer-reviewed: Yes

Publication series

Series: Studientexte zur Sprachkommunikation
Volume: 107
ISSN: 0940-6832

External IDs

ORCID /0000-0003-0167-8123/work/168716968

Keywords

  • Paralinguistic analyses