Self-Supervised Solution to the Control Problem of Articulatory Synthesis

P.K. Krug; P. Birkholz; B. Gerazov; D.R. van Niekerk; A. Xu; Y. Xu

doi:10.21437/Interspeech.2023-2173

Publication: Contribution to book/conference proceedings/anthology/report › Contribution to conference proceedings › Contributed › Peer-reviewed

Contributors

P.K. Krug, Chair of Speech Technology and Cognitive Systems (Author)
P. Birkholz, Chair of Speech Technology and Cognitive Systems (Author)
B. Gerazov (Author)
D.R. van Niekerk (Author)
A. Xu (Author)
Y. Xu (Author)

Abstract

Given an articulatory-to-acoustic forward model, it is a priori unknown how its motor control must be operated to achieve a desired acoustic result. This control problem is a fundamental issue of articulatory speech synthesis and the cradle of acoustic-to-articulatory inversion, a discipline that attempts to address the issue by means of various methods. This work presents an end-to-end solution to the articulatory control problem, in which synthetic motor trajectories of Monte-Carlo-generated artificial speech are linked to input modalities (such as natural speech recordings or phoneme sequences) via speaker-independent latent representations of a vector-quantized variational autoencoder. The proposed method is self-supervised and thus, in principle, independent of the synthesizer and speaker model.
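
The vector-quantization step that gives a VQ-VAE its discrete latent representation can be illustrated with a minimal sketch. This is a generic nearest-codebook lookup, not the authors' implementation: the function names `quantize` and `lookup`, the two-dimensional latent frames, and the three-entry codebook are all hypothetical toy values chosen for readability.

```python
import math

def quantize(frames, codebook):
    """Return, for each latent frame, the index of its nearest codebook vector."""
    indices = []
    for frame in frames:
        # Euclidean distance from this frame to every codebook entry
        distances = [math.dist(frame, code) for code in codebook]
        indices.append(distances.index(min(distances)))
    return indices

def lookup(indices, codebook):
    """Replace each index by its codebook vector (the quantized latent)."""
    return [codebook[i] for i in indices]

# Hypothetical toy data: a 3-entry codebook and 4 encoder output frames
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]
frames = [(0.1, -0.2), (0.9, 1.2), (-0.8, 0.7), (0.4, 0.3)]

indices = quantize(frames, codebook)
quantized = lookup(indices, codebook)
print(indices)
```

In a trained model the codebook would be learned jointly with the encoder and decoder; here it is fixed by hand so that only the lookup itself is shown. The resulting index sequence is the kind of discrete representation to which different input modalities could, under the abstract's description, be mapped.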

Details

Original language	English
Title	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Pages	4329-4333
Number of pages	5
Volume	2023-August
Publication status	Published - 2023
Peer-review status	Yes

External IDs

Scopus	85171564576

Keywords

Acoustic-to-articulatory inversion, VQ-VAE

TU Dresden Research Portal