Articulatory Synthesis for Data Augmentation in Phoneme Recognition

P.K. Krug; P. Birkholz; B. Gerazov; D.R. van Niekerk; A. Xu; Y. Xu

doi:10.21437/Interspeech.2022-10874

Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed

Contributors

P.K. Krug, Professur für Sprachtechnologie und Kognitive Systeme (author)
P. Birkholz, Professur für Sprachtechnologie und Kognitive Systeme (author)
B. Gerazov (author)
D.R. van Niekerk (author)
A. Xu (author)
Y. Xu (author)

Abstract

While numerous recent studies on automatic speech recognition have described data augmentation strategies based on time- or frequency-domain signal processing, few works have explored artificially extending training datasets with purely synthetic speech data. In this work, the German KIEL corpus was augmented with synthetic data generated by the state-of-the-art articulatory synthesizer VOCALTRACTLAB. The results show that the additional synthetic data can significantly improve single-phoneme recognition in certain cases, while in other cases performance can decrease, depending on how acoustically natural the synthetic phonemes are. Through this link between synthetic speech production and automatic speech recognition, the work can potentially guide future studies aimed at improving the quality of articulatory synthesis.

Details

Original language	English
Title	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Pages	1228-1232
Number of pages	5
Volume	2022-September
Publication status	Published - 2022
Peer-reviewed	Yes

External IDs

Scopus	85137197320

Keywords

articulatory speech synthesis, automatic speech recognition, data augmentation, phoneme recognition
