Glottal inverse filtering based on articulatory synthesis and deep learning

Research output: Conference contribution in conference proceedings (contributed, peer-reviewed)

Abstract

We propose a new method to estimate the glottal excitation of the vocal tract from speech signals based on deep learning. To that end, a bidirectional recurrent neural network with long short-term memory units was trained to predict the glottal airflow derivative from the speech signal. Since natural reference data for this task is unobtainable at the required scale, we used the articulatory speech synthesizer VocalTractLab to generate a large dataset containing synchronous connected speech and glottal airflow signals for training. The trained model's performance was objectively evaluated by means of stationary synthetic signals from the OPENGLOT glottal inverse filtering benchmark dataset and by using our dataset of connected synthetic speech. Compared to the state of the art, the proposed model produced more accurate estimates on OPENGLOT's physically synthesized signals but was less accurate for its computationally simulated signals. However, our model was much more accurate and plausible on the connected speech signals, especially for sounds with mixed excitation (e.g. fricatives) or sounds with pronounced zeros in their transfer function (e.g. nasals). Future work will introduce more variety into the training data (e.g. regarding pitch and phonation) and focus on estimating features of the glottal flow instead of the entire waveform.
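
To make the approach concrete, the sketch below shows the kind of model the abstract describes: a bidirectional LSTM that maps framed speech features to the glottal airflow derivative, trained on synchronous speech/airflow pairs such as those generated with VocalTractLab. The use of PyTorch, the feature dimensionality, the layer sizes, the learning rate, and the name GlottalFlowEstimator are all illustrative assumptions, not details taken from the paper.

    # Minimal sketch of a bidirectional LSTM regressor for the glottal
    # airflow derivative. All hyperparameters here are assumed values.
    import torch
    import torch.nn as nn

    class GlottalFlowEstimator(nn.Module):
        def __init__(self, input_dim: int = 80, hidden_dim: int = 256):
            super().__init__()
            # Bidirectional recurrent network with LSTM units, as in the
            # abstract; two layers and 256 units per direction are assumed.
            self.blstm = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim,
                                 num_layers=2, batch_first=True,
                                 bidirectional=True)
            # Linear readout to one airflow-derivative value per frame.
            self.readout = nn.Linear(2 * hidden_dim, 1)

        def forward(self, speech_feats: torch.Tensor) -> torch.Tensor:
            # speech_feats: (batch, time, input_dim) framed speech features
            hidden, _ = self.blstm(speech_feats)
            return self.readout(hidden).squeeze(-1)  # (batch, time)

    # One training step on synchronous (speech, glottal flow derivative)
    # pairs, e.g. synthesized with an articulatory synthesizer; dummy
    # tensors stand in for real data here.
    model = GlottalFlowEstimator()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    speech = torch.randn(8, 200, 80)   # 8 utterances, 200 frames each
    target = torch.randn(8, 200)       # frame-wise airflow derivative

    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(speech), target)
    loss.backward()
    optimizer.step()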

Details

Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Pages: 1327-1331
Number of pages: 5
Volume: 2022-September
Publication status: Published - 2022
Peer-reviewed: Yes

External IDs

Scopus: 85140055042

Keywords

  • Glottal inverse filtering, glottal source estimation, source-filter separation, speech synthesis