An Investigation of Acoustic Features of the Lower Vocal Tract for Speaker Recognition
Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed
Contributors
Abstract
Speaker recognition systems often use mel-scaled cepstral coefficients (MFCCs) as their main features. In contrast to MFCCs, Godoy et al. (2015) proposed a different type of short-term spectral analysis that provides features related to the lower vocal tract (LVT). These features are calculated as the ratio of the acoustic short-time spectra during the closed and open phases of the glottal oscillation cycles, based on a pitch-synchronous analysis. They were suggested to be particularly speaker-specific and might therefore be suitable to substitute or complement MFCCs in speaker recognition systems. The present study investigated the benefit of these features in an i-vector-based speaker recognition system. Using the LVT features alone, the system achieved a speaker recognition rate of 92.3% with 63 enrolled speakers. When the LVT features were fused with conventional MFCC features, the recognition rate was about equal to the recognition rate using MFCC features alone (> 98%).
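The closed-to-open spectral ratio described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Hann window, the FFT length, the decibel scaling, and the assumption that the closed-phase and open-phase sample boundaries of each glottal cycle are already known (for example from a separate glottal closure and opening detector) are all hypothetical choices made for illustration.

```python
import numpy as np


def phase_spectrum(segment, n_fft=512):
    """Magnitude spectrum of one Hann-windowed glottal-phase segment.

    The segment is zero-padded (or truncated) to n_fft samples by rfft.
    """
    windowed = segment * np.hanning(len(segment))
    return np.abs(np.fft.rfft(windowed, n=n_fft))


def closed_open_log_ratio(signal, closed_bounds, open_bounds, n_fft=512, eps=1e-10):
    """Per-cycle ratio of closed-phase to open-phase spectra, in dB.

    closed_bounds and open_bounds are lists of (start, end) sample indices,
    one pair per glottal cycle. They are assumed to be given; estimating
    them is outside the scope of this sketch.
    """
    ratios = []
    for (c_start, c_end), (o_start, o_end) in zip(closed_bounds, open_bounds):
        closed_mag = phase_spectrum(signal[c_start:c_end], n_fft)
        open_mag = phase_spectrum(signal[o_start:o_end], n_fft)
        # eps guards against division by zero and log of zero (assumption).
        ratios.append(20.0 * np.log10((closed_mag + eps) / (open_mag + eps)))
    return np.array(ratios)  # shape: (number of cycles, n_fft // 2 + 1)


# Usage with a synthetic signal and made-up phase boundaries.
rng = np.random.default_rng(0)
x = rng.standard_normal(800)
closed = [(0, 40), (100, 140)]
opened = [(40, 100), (140, 200)]
features = closed_open_log_ratio(x, closed, opened, n_fft=256)
```

Each row of `features` holds one spectral-ratio vector per glottal cycle. How such vectors would be smoothed, reduced in dimension, or passed to an i-vector system is not specified in the abstract and is left open here.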
Details
Original language | English |
---|---|
Title | Elektronische Sprachsignalverarbeitung 2024 |
Editors | Timo Baumann |
Publisher | Dresden : TUDpress |
Pages | 108-115 |
Number of pages | 8 |
ISBN (Print) | 978-3-95908-325-6 |
Publication status | Published - 1 March 2024 |
Peer-review status | Yes |
Publication series
Series | Studientexte zur Sprachkommunikation |
---|---|
Volume | 107 |
ISSN | 0940-6832 |
External IDs
ORCID | /0000-0003-0167-8123/work/168716968 |
---|
Keywords
- Paralinguistic analyses