Pitch-Scaled Spectrum based Excitation Model for HMM-based Speech Synthesis

Zhengqi Wen; Jianhua Tao

Publication: Contribution to journal › Conference article › Contributed › Peer-reviewed

Contributors

Zhengqi Wen, Chinese Academy of Sciences (Author)
Jianhua Tao, Chinese Academy of Sciences (Author)

Siemens AG

Abstract

The quality of speech generated by the Hidden Markov Model (HMM)-based Speech Synthesis System (HTS) suffers from a 'buzzing' problem caused by an oversimplified vocoding technique. This paper proposes an excitation model to improve the parametric representation of speech in HTS. The residual obtained from inverse filtering retains some detailed harmonic structure of speech that is not captured by the linear prediction (LP) spectrum. The pitch-scaled spectrum can therefore be used to supplement the LP spectrum in speech reconstruction. This spectrum is compressed by principal component analysis (PCA), and the eigenvalues are used as the periodic parameter. An aperiodic measure is also extracted from the pitch-scaled spectrum, and a sigmoid function fitted to this measure serves as the aperiodic parameter. These two parameters are integrated into HTS training as excitation parameters. Listening tests showed that the proposed technique generates better-sounding speech than the pulse-train excitation model and achieves results comparable to STRAIGHT.
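
The sigmoid-fitting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the parametrization or the fitting procedure, so everything here is an assumption: a two-parameter logistic curve (`slope`, `midpoint`), a measure taking values in (0, 1) over frequency, and a least-squares fit in the logit domain. The names `sigmoid`, `fit_sigmoid`, `freqs_khz`, and `measure` are hypothetical, and the data is synthetic.

```python
import math


def sigmoid(x, slope, midpoint):
    """Two-parameter logistic curve (assumed parametrization)."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))


def fit_sigmoid(xs, ys, eps=1e-6):
    """Fit slope and midpoint by a straight-line fit in the logit domain.

    logit(y) = slope * x - slope * midpoint is linear in x, so ordinary
    least squares on (x, logit(y)) recovers both parameters.
    """
    # Clamp values into (0, 1) so the logit stays finite.
    clipped = [min(max(y, eps), 1.0 - eps) for y in ys]
    zs = [math.log(y / (1.0 - y)) for y in clipped]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_z = sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    if sxx == 0.0 or sxz == 0.0:
        raise ValueError("measure is flat or x has no spread; no sigmoid to fit")

    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    midpoint = -intercept / slope
    return slope, midpoint


# Synthetic aperiodic measure over 0..8 kHz (illustrative data only).
freqs_khz = [0.5 * i for i in range(17)]
measure = [sigmoid(f, 1.5, 4.0) for f in freqs_khz]
slope, midpoint = fit_sigmoid(freqs_khz, measure)
```

With this parametrization, one frame's aperiodic measure is summarized by two numbers, which is the kind of compact representation that can be modeled as a stream in HMM training. A real measure would be noisy, in which case a nonlinear least-squares fit in the original domain may be preferable to the logit-domain fit used here.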

Details

Original language	English
Pages (from - to)	609-+
Number of pages	2
Journal	International Conference on Signal Processing
Publication status	Published - 2012
Peer review status	Yes
Externally published	Yes

Conference

Title	IEEE 11th International Conference on Signal Processing (ICSP)
Duration	21 - 25 October 2012
City	Beijing

Keywords

HMM-based Speech Synthesis, excitation model, pitch-scaled spectrum, linear prediction, principal component analysis