A Comparative Study of 3D and 1D Acoustic Simulations of the Higher Frequencies of Speech

R. Blandin; S. Stone; A. Remacle; V. Didone; P. Birkholz

doi:10.1109/TASLP.2023.3313423

Publication: Journal contribution › Research article › Contributed › Peer-reviewed

Contributors

R. Blandin - Chair of Speech Technology and Cognitive Systems (Author)
S. Stone - Chair of Speech Technology and Cognitive Systems (Author)
A. Remacle (Author)
V. Didone (Author)
P. Birkholz - Chair of Speech Technology and Cognitive Systems (Author)

Abstract

Articulatory synthesis generates speech sounds by simulating the physical phenomena involved in speech production. The accuracy of the physical modelling is expected to affect the naturalness of the synthesis: the more realistic the description is, the greater the naturalness is expected to be. In this work, the accuracy of acoustic wave propagation in the vocal tract was evaluated with two perceptual experiments. Sustained vowels generated using a one-dimensional acoustic model, a three-dimensional acoustic model and an artificial bandwidth extension algorithm (without a physical basis) were compared. Since the difference between the acoustic methods tested affects mainly the frequencies above 4 kHz, we ensured that the low-frequency part of the stimuli, up to 4 kHz, was similar. Thus, the participants' responses were based only on the differences at high frequencies. The first experiment was a pair comparison, in which the participants had to select the more natural-sounding stimulus. In the second experiment, the participants had to rate the naturalness of the stimuli on a linear scale. The results confirmed that more accurate physical modelling leads to greater naturalness. However, this was limited to the phonemes /o/ and /u/, for which transverse resonances in the anterior vocal tract may play an important role that only a 3D acoustic simulation can accurately represent. It was also found that male stimuli were perceived as significantly more natural than female ones. However, voice quality did not affect naturalness.

Details

Original language	English
Pages (from - to)	3837-3847
Number of pages	11
Journal	IEEE/ACM Transactions on Audio Speech and Language Processing
Volume	31
Publication status	Published - 2023
Peer-review status	Yes

External IDs

Scopus	85171529740

TU Dresden Research Portal
