On the Optimal Set of Features and the Robustness of Classifiers in Radar-based Silent Phoneme Recognition

Pouriya Amini Digehsara; Christoph Wagner; Petr Schaffer; Michael Bärhold; Simon Stone; Dirk Plettemeier; Peter Birkholz

Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review

Contributors

Pouriya Amini Digehsara, Junior Professorship in Cognitive Systems (Author)
Christoph Wagner, Chair of Speech Technology and Cognitive Systems, Junior Professorship in Cognitive Systems (Author)
Petr Schaffer, Chair of Radio Frequency and Photonics Engineering (Author)
Michael Bärhold, Chair of Radio Frequency and Photonics Engineering (Author)
Simon Stone, Chair of Speech Technology and Cognitive Systems (Author)
Dirk Plettemeier, Chair of Radio Frequency and Photonics Engineering (Author)
Peter Birkholz, Chair of Speech Technology and Cognitive Systems (Author)

Abstract

Silent speech recognition (SSR) is an active area of research with applications ranging from speech restoration to speech enhancement. Radar-based SSR has been proposed and investigated as a non-invasive method to infer vocal tract states and articulatory movements from measured changes in scattering parameters. One of the challenges in developing a radar-based SSR system is determining the optimal set of features from these measurements. In this study, we therefore investigated four questions: (a) which features play the most significant role in classification; (b) how much each reflection and transmission spectrum contributes, and which frequencies are most important; (c) how the classifiers perform when using fewer features; and (d) how robust the classifiers are against different noise levels. The data consisted of 230 samples of 25 German phonemes (15 vowels, each in 10 contexts, and 10 consonants, each in 8 contexts) produced by two native German speakers. Using the full feature set, a Linear Discriminant Analysis (LDA) classifier achieved up to 94 % classification accuracy for speaker 1 and 84 % for speaker 2. When only the most important features, as identified by a decision tree, were used, the classification accuracy deteriorated slightly in most conditions, but in one case it improved from 73.5 % to 81 %. Regarding robustness against noise, the accuracy of the LDA classifier dropped sharply with increasing noise levels, while the accuracy of the support vector machine (SVM) decreased less steeply.
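
The noise-robustness question in (d) can be illustrated with a minimal sketch. It is not the study's method: the data are synthetic Gaussian clusters rather than radar spectra, the classifier is a simple nearest-centroid rule standing in for the LDA and SVM classifiers, and additive Gaussian noise is an assumption, since the abstract does not state how the noise was applied. All function names and parameter values are hypothetical.

```python
import random


def make_dataset(n_classes, n_per_class, n_features, spread, rng):
    """Synthetic stand-in for radar features: one Gaussian cluster per class."""
    means = [[rng.uniform(-10.0, 10.0) for _ in range(n_features)]
             for _ in range(n_classes)]
    samples = []
    for label, mean in enumerate(means):
        for _ in range(n_per_class):
            samples.append(([rng.gauss(m, spread) for m in mean], label))
    return samples


def fit_centroids(train):
    """Return the mean feature vector of each class in the training set."""
    sums, counts = {}, {}
    for x, y in train:
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))


def accuracy_under_noise(centroids, test, noise_std, rng):
    """Add zero-mean Gaussian noise to each test feature, then score."""
    correct = 0
    for x, y in test:
        noisy = [v + rng.gauss(0.0, noise_std) for v in x]
        correct += predict(centroids, noisy) == y
    return correct / len(test)


def noise_sweep(noise_levels, seed=0):
    """Train once on clean data, then evaluate at each noise level."""
    rng = random.Random(seed)
    data = make_dataset(n_classes=5, n_per_class=40, n_features=8,
                        spread=1.0, rng=rng)
    rng.shuffle(data)
    split = len(data) // 2
    train, test = data[:split], data[split:]
    centroids = fit_centroids(train)
    return [accuracy_under_noise(centroids, test, s, rng) for s in noise_levels]


if __name__ == "__main__":
    levels = [0.0, 1.0, 5.0, 20.0]
    for level, acc in zip(levels, noise_sweep(levels)):
        print(f"noise std {level:5.1f}: accuracy {acc:.2f}")
```

Training on clean data and perturbing only the test set isolates how a fixed decision rule degrades as the measurements get noisier. Comparing two classifiers, as the study does, would amount to running the same sweep once per classifier.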

Details

Original language	English
Title of host publication	Studientexte zur Sprachkommunikation: Elektronische Sprachsignalverarbeitung 2021
Editors	Stefan Hillmann, Benjamin Weiss, Thilo Michael, Sebastian Möller
Publisher	Dresden : TUDpress
Pages	112-119
Number of pages	8
ISBN (print)	978-3-959082-27-3
Publication status	Published - 1 Mar 2021
Peer-reviewed	Yes

External IDs

ORCID	/0000-0003-0167-8123/work/168716961

Keywords

Automatic speech recognition
