On the Optimal Set of Features and the Robustness of Classifiers in Radar-based Silent Phoneme Recognition

Pouriya Amini Digehsara; Christoph Wagner; Petr Schaffer; Michael Bärhold; Simon Stone; Dirk Plettemeier; Peter Birkholz


Publication type: Contribution to conference proceedings; contributed; peer-reviewed

Contributors

Pouriya Amini Digehsara, Junior Professorship of Cognitive Systems (author)
Christoph Wagner, Chair of Speech Technology and Cognitive Systems; Junior Professorship of Cognitive Systems (author)
Petr Schaffer, Chair of Radio-Frequency Engineering (author)
Michael Bärhold, Chair of Radio-Frequency Engineering (author)
Simon Stone, Chair of Speech Technology and Cognitive Systems (author)
Dirk Plettemeier, Chair of Radio-Frequency Engineering (author)
Peter Birkholz, Chair of Speech Technology and Cognitive Systems (author)

Abstract

Silent speech recognition (SSR) is an active area of research with applications ranging from speech restoration to speech enhancement. Radar-based SSR has been proposed and investigated as a non-invasive method to infer vocal tract states and articulatory movements from measured changes in scattering parameters. One of the challenges in developing a radar-based SSR system is determining the optimal set of features from these measurements. In this study, we therefore investigated the following questions: (a) which features play the most significant role in classification; (b) how much each reflection and transmission spectrum contributes, and which frequencies are most important; (c) how the classifiers perform when fewer features are used; and (d) how robust the classifiers are against different noise levels. The data consisted of 230 samples of 25 German phonemes (15 vowels, each in 10 contexts, and 10 consonants, each in 8 contexts) produced by two native speakers of German. Using the full feature set, a Linear Discriminant Analysis (LDA) classifier achieved up to 94 % classification accuracy for speaker 1 and 84 % for speaker 2. Using only the most important features, as identified by a decision tree, the classification accuracy decreased slightly in most conditions, but in one case it improved from 73.5 % to 81 %. Regarding robustness against noise, the accuracy of the LDA dropped sharply with increasing noise levels, whereas the accuracy of the support vector machine (SVM) decreased less steeply.
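The kind of experiment summarized in the abstract (questions (c) and (d)) can be sketched as follows. This is a hypothetical illustration, not the authors' code: it uses scikit-learn, replaces the radar measurements with synthetic data from `make_classification` (230 samples, 25 classes, 200 features), keeps an arbitrary top k = 30 features ranked by decision-tree importance, and assumes additive Gaussian noise scaled by each feature's standard deviation. The abstract does not state the noise model, the SVM kernel, or the number of retained features, so `noise_levels`, `kernel="rbf"` and `k=30` are assumptions.

```python
"""Hypothetical sketch: decision-tree feature selection and a noise-robustness
sweep for LDA and SVM classifiers, on synthetic stand-in data."""
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the radar features (assumption): 230 samples,
# 25 phoneme classes, 200 features such as points of reflection and
# transmission spectra.
X, y = make_classification(n_samples=230, n_features=200, n_informative=20,
                           n_redundant=0, n_classes=25, n_clusters_per_class=1,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)


def top_k_features(X, y, k):
    """Return the indices of the k features with the highest decision-tree importance."""
    tree = DecisionTreeClassifier(random_state=0).fit(X, y)
    return np.argsort(tree.feature_importances_)[::-1][:k]


def accuracy(model, X_tr, y_tr, X_te, y_te):
    """Fit on the training split and return accuracy on the test split."""
    return model.fit(X_tr, y_tr).score(X_te, y_te)


# (c) LDA on the full feature set versus the k most important features.
selected = top_k_features(X_train, y_train, k=30)
acc_full = accuracy(LinearDiscriminantAnalysis(), X_train, y_train, X_test, y_test)
acc_reduced = accuracy(LinearDiscriminantAnalysis(), X_train[:, selected], y_train,
                       X_test[:, selected], y_test)

# (d) Train on clean data, then test on data with additive Gaussian noise whose
# standard deviation is a multiple ("level") of each feature's training std.
rng = np.random.default_rng(0)
feature_std = X_train.std(axis=0)
noise_levels = [0.0, 0.5, 1.0, 2.0]
results = {}
for name, model in [("LDA", LinearDiscriminantAnalysis()), ("SVM", SVC(kernel="rbf"))]:
    model.fit(X_train, y_train)
    results[name] = []
    for level in noise_levels:
        noisy = X_test + rng.normal(size=X_test.shape) * level * feature_std
        results[name].append(model.score(noisy, y_test))

if __name__ == "__main__":
    print(f"LDA, all features: {acc_full:.2f}; top-30 features: {acc_reduced:.2f}")
    for name, accs in results.items():
        print(name, [round(a, 2) for a in accs])
```

Training on clean data and corrupting only the test split isolates how each fitted decision boundary tolerates perturbed inputs. The accuracies printed for the synthetic data say nothing about the radar results reported above.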

Details

Original language	English
Proceedings title	Studientexte zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung 2021
Editors	Stefan Hillmann, Benjamin Weiss, Thilo Michael, Sebastian Möller
Publisher	Dresden: TUDpress
Pages	112-119
Number of pages	8
ISBN (Print)	978-3-959082-27-3
Publication status	Published, 1 March 2021
Peer-reviewed	Yes

External IDs

ORCID	/0000-0003-0167-8123/work/168716961

Keywords

Automatic speech recognition
