speech synthesis as a statistical machine learning problem slides
TRANSCRIPT
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
1/66
Speech Synthesis as A Statistical
Keiichi Tokuda
Nagoya Institute of Technology
ASRU2011, HawaiiDecember 14, 2011
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
2/66
Introduction
Rule-based, formant synthesis ~90s Hand-crafting each phonetic units by rules
Corpus-based, concatenative synthesis (90s~)
Concatenate speech units (waveform) from a database
Single inventory: diphone synthesis
Multiple inventory: unit selection synthesisorpus- ase , s a s ca parame r c syn es s
Source-filter model + statistical acoustic model
Hidden Markov model: HMM-based synthesis
How we can formulate and understand the whole
corpus-based speech synthesis process in aunified statistical framework?
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
3/66
Problem of speech synthesis
We have a speech database, i.e., a set of texts and
corresponding speech waveforms.
Given a text to be synthesized, what is the speech
: s eech waveforms
: textsdatabase
Given
W
: text to be synthesizedw
: speec wave orm
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
4/66
Statistical formulation of speech synthesis (1/8)
ayes an ramewor or pre c on
: set of texts
database Given
: text to be synthesized
1. Estimate redictive distribution iven variables
: speech waveform unknown
3
2. Draw sample from the distribution
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
5/66
Statistical formulation of speech synthesis (2/8)
1. Estimating predictive distribution is hard Introduce acoustic model parameters
: acoustic model (e.g. HMM )
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
6/66
Statistical formulation of speech synthesis (3/8)
2. Using speech waveform directly is difficult Introduce parametric its representation
: parametric representation of speech waveform
(e.g., cepstrum, LPC, LSP, F0, aperiodicity)
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
7/66
Statistical formulation of speech synthesis (4/8)
3. Same texts can have multiple pronunciations, POS, etc. Introduce labels
: labels derived from text
e.g. prons, , ex ca s ress, grammar, pause
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
8/66
Statistical formulation of speech synthesis (5/8)
4. Difficult to perform integral & sum over auxiliary variables Approximated by joint max
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
9/66
Statistical formulation of speech synthesis (6/8)
5. Joint maximization is hard Approximated by step-by-step maximizations
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
10/66
Statistical formulation of speech synthesis (7/8)
6. Training also requires parametric form of wav & labels Introduce them & approx by step-by-step maximizations
: parametric representation of speech waveforms
: a e s er ve rom ex s
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
11/66
Statistical formulation of speech synthesis (8/8)
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
12/66
Overview of this talk
1. Mathematical formulation2. Implementation of individual components
3. Examples demonstrating its flexibility
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
13/66
HMM-based speech synthesis system
SPEECH
DATABASEExcitation Spectral
Speech signal Training part
Parameter
extraction
Parameter
ExtractionSpectralparameters
Excitation
parameters
Text analysis
Training HMMsLabels
TEXTContext-dependent HMMs
Text analysisParameter generation
from HMMs
LabelsExcitation Spectral
Excitation
generation
Synthesis
Filter
SYNTHESIZED
SPEECH
parame ersExcitation
parame ers
Synthesis part
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
14/66
HMM-based speech synthesis system
SPEECH
DATABASEExcitation Spectral
Speech signal Training part
Parameterextraction
Parameter
ExtractionSpectralparameters
Excitation
parameters
Text analysis
,maxargL
Training HMMsLabels ),|(maxarg LO
P
TEXTContext-dependent HMMs),|(maxarg loo
oP
Text analysisParameter generation
from HMMs
LabelsExcitation Spectral,maxar wll P
Excitation
generation
Synthesis
Filter
SYNTHESIZED
SPEECH
parame ersExcitation
parame ers
Synthesis part
l
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
15/66
HMM-based speech synthesis system
Training partSPEECHDATABASE
Excitation Spectral
Speech signal
Text analysis
Parameter
extraction
Parameter
ExtractionSpectralparameters
Excitation
parameters
Training HMMsLabels
TEXTContext-dependent HMMs
Text analysisParameter generation
from HMMs
LabelsExcitation Spectral
Synthesis part
Excitation
generation
Synthesis
Filter
SYNTHESIZED
SPEECH
parame ersExcitation
parame ers
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
16/66
Human speech production
Modulation of carrier wave
by speech information
SpeechFrequency
transfer
characteristicsSound source
Voiced: pulse
Unvoiced: noise
Magnitude
start--end
air flow
frequency
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
17/66
Source-filter model
LinearExcitationPulse train
Source excitation part Vocal tract resonance part
time-invariant
system
zH
)(ne
)(*)()( nenhnx White noise
Speech
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
18/66
ML estimation of spectral parameter
Mel-cepstral representation of speech spectra)
m
m
zmczH 0 )(exp)(ncy
(ra
M
m
m
zmczH 0
~)(exp)(
rped
frequ
warped frequencymel-scale frequency
~
1
11
1
~ jez
zz
Wa
Frequency (rad)
0 2/
-est mat on o me -cepstrum
x : speech waveform (Gaussian process)
c
c : mel-cepstrum
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
19/66
Waveform reconstruction
Original speech
Mel-cepstrumUnvoiced / voicedF0
u se ra n
Synthesis filter
H)(ne )(nxWhite noise
These speech parameters should be modeled by HMM
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
20/66
HMM-based speech synthesis system
SPEECH
DATABASEExcitation Spectral
Speech signal Training part
Parameter
extraction
Parameter
ExtractionText analysis Spectralparameters
Excitation
parameters
Training HMMsLabels
),
|(maxarg
LO P
TEXTContext-dependent HMMs
Text analysisParameter generation
from HMMsLabelsExcitation Spectral
Excitation
generation
Synthesis
Filter
SYNTHESIZED
SPEECH
parame ersExcitation
parame ers
Synthesis part
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
21/66
Hidden Markov model (HMM)
11a 22a 33a
12a 23a
ij
)( tqb o
: output probability
b o b o b o
1o 2o 3o 4o 5o To oObservation
sequence
1 1 1 1 2 2 3 3 qState
sequence
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
22/66
Structure of state output (observation) vector
Spectral parameters
(e.g., mel-cepstrum, LSPs)
Spectrum part
log F0 with V/UV
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
23/66
Observation of F0
1R
voiced0
2 Runvoiced
req
uenc
LogF
Time
Multi-space distribution HMM (MSD-HMM)
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
24/66
Multi-space probablity distribution HMM(MSD-HMM)
1 2 3
1,1w 1,2w 1,3w1
1n
R
)(11 xN )(21 xN )(31 xN
2,1w 2,2w 2,3w2
2n
R)(12 xN )(22 xN )(32 xN
Mw ,1 Mw ,2 Mw ,3
M
M
)(1 xMN )(2 xMN )(3 xMN
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
25/66
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
26/66
Structure of state-output distributions
t
Mel-cepstrumtc
Voiced
pe
ctru
tc
Unvoiced
Voiced
tc2
Unvoiced
Voiced
itation
tptp
Unvoiced
Ex tp2
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
27/66
Contextual factors
Phoneme {preceding, succeeding} two phonemes
current phoneme
# of phonemes in {preceding, current, succeeding} syllable
{accent, stress} of {preceding, current, succeeding} syllable Position of current syllable in current word
o prece ng, succee ng accen e , s resse sy a e n curren p rase
# of syllables {from previous, to next} {accented, stressed} syllable
Vowel within current syllable
Word Part of speech of {preceding, current, succeeding} word
# of syllables in {preceding, current, succeeding} word
Position of current word in current phrase
, # of words {from previous, to next} content word
Phrase # of syllables in {preceding, current, succeeding} phrase
Huge # of combinations Difficult to have all possible models
..
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
28/66
Decision tree-based state clustering [Odell; 95]
k-a+b yes noR=silence?
L=voice?
L=w?t-a+h
yes
eses
yes no
no
no
no
L=gy?R=silence?s-i+n
C=unvoice?
C=vowel?
R=silence?
R=cl?
L=gy?
L=voice?
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
29/66
Stream-dependent tree-based clustering (1)
pectrum excitation have di erent contextdependency Build decision trees separately
for
mel-cepstrum
Decision trees
for F0
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
30/66
State duration modeling
HMM (Hidden Markov Model) State duration prob. depends only on transition prob.
State duration probability exponentially decreases
HSMM (Hidden Semi Markov Model)
HMM + explicit duration model HSMM
3-demensional Gaussian
1q 2q 3q
K
1O 2O 3O 4O 5O 6O 7O 9O8O
1d 2d 3d iii
1,
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
31/66
Stream-dependent tree-based clustering (2)
State durationThree dimensional Gaussian
Decision treeec s on rees
for
mel-cepstrum
for state dur.
models
Decision trees
for F0
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
32/66
HMM-based speech synthesis system
SPEECHDATABASE
Excitation Spectral
Speech signal Training part
Parameter
extraction
Parameter
ExtractionSpectral
parameters
Excitation
parameters
Text analysis
Training HMMsLabels
TEXTContext-dependent HMMs
)
,
|(maxarg loo P
Text analysisParameter generation
from HMMs
LabelsExcitation Spectral
o
Excitation
generation
Synthesis
Filter
SYNTHESIZED
SPEECH
ExcitationSynthesis part
parame ers parame ers
C i i f HMM f i
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
33/66
Composition of sentence HMM for given text
TEXTG2P
w
Text analysisText normalization
Pause prediction
context-dependent label
sequence l
sentence HMM given labels
,
S h t ti l ith
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
34/66
Speech parameter generation algorithm [Tokuda; 00]
For given sentence HMM, determine a speech parametervector sequence which maximizes
TTTT ],,,[ 21 Toooo
),|(),|(),|( lqqolo PPP
)
,
|()
,|(max lqqoq
q
PP
),|(maxarg lqqq
P
),|(maxarg qooo
P
D t i ti f t t
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
35/66
Determination of state sequence
K
ii dpP ),|( lq
i 1
ip
id
: state-duration distribution of -th state
: state duration of -th state
i
i
K : # of states in a sentence HMM for l
Gaussian
iiiiiii mdmdNdp ,|
S h t ti l ith
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
36/66
Speech parameter generation algorithm
For given HMM ,determine a speech parameter vectorSequence which maximizes
TTTT ],,,[ 21 Toooo
),|(),|(),|( lqqolo PPP
)
,
|()
,|(max lqqoq
q
PP
),|(maxarg lqqq
P
),(maxarg qoo o P
With t d i f t
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
37/66
Without dynamic feature
becomes a sequence of mean vectors
D i f t
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
38/66
Dynamic features
ct
2
.
ttt
t
c112
ttttt
cccc
tc1tc 1tc tc
1tc 1tc
5.0 5.0 1 12
tc tc1tc 1tc 1t 1t
I t ti f d i f t
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
39/66
Integration of dynamic features
Relationship between speech parameter vectors &static feature vectors
o c
T
TTT
tttt ccco 2,, M M
M3
M
S l ti f th bl (1/2)
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
40/66
Solution for the problem (1/2)
By setting O
,,
O
c
,
1
1
qqq
WWcW
TT
TTTT ],,,[ 21 Tcccc w ere
TTTT ],,,[ 21 Tqqqq
TTTT ,,, 21 Tqqqq
S l ti f th bl (2/2)
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
41/66
Solution for the problem (2/2)
1qTW W c
TW 1
q
q
Generated speech parameter trajector
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
42/66
Generated speech parameter trajectory
Trajectory HMM
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
43/66
Trajectory HMM
is not a proper distribution of c),|(),|( lWclo PP
Conventional HMM Trajectory HMM
Training ),|(maxarg LO
P ),|(maxarg LC
P
Synthesis
maxar
,maxarg
lc
o Wcoo
P
,
maxar lcP
Solve inconsistenc between trainin & s nthesis
c
improving the model accuracy
Generated spectra
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
44/66
Generated spectra
a i silsil
0123
45
(kHz)
w/o dyn
01
(345
Hz)
w dyn
Spectra changing smoothly between phonemes
Generated F0
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
45/66
Generated F0
uency
100150200
natural speech
Fre 50
0 1 time (s)2
equency
100150200
without dynamic features
Fr 50
0 1 2 time (s)
requency
100150200
F
0 1 2 time (s)
Effect of dynamic features
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
46/66
Effect of dynamic features
Mel-cepstrum
static+ static2
static+
Smooth!2log F0static
Overview of this talk
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
47/66
Overview of this talk
1. Mathematical formulation2. Implementation of individual components
3. Examples demonstrating its flexibility
Emotional speech synthesis
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
48/66
Emotional speech synthesis
text neutral angry
class! Turn off it!
You must attend the weekly meeting!
trained with 200 utterances
Speaker adaptation (mimicking voices)
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
49/66
Speaker adaptation (mimicking voices)
MLLR-based adaptation
Adaptation
data
speaker A
Speaker-independent Adapted model
adaptation
w/o adaptation (initial model)
Adapted with 50 utterances
pea er s spea er- epen en sys em
Speaker interpolation (mixing voices)
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
50/66
Speaker interpolation (mixing voices)
Linear combination of two speaker-dependent models
Model A Model BInterpolated model
1.00 0.75 0.50 0.25 0.00A:
0.00 0.25 0.50 0.75 1.00B:
Voice morphing
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
51/66
Voice morphing
Two voices:
Four voices:
Interpolation of speaking styles
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
52/66
Interpolation of speaking styles
Base model A Base model B
Interpolation extrapolation
Neutral High Tension
Eigenvoice (producing voices) [Shichiri; 02]
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
53/66
Eigenvoice (producing voices) [Shichiri; 02]
Speaker 1 Speaker 2 Speaker S
)1(e ( )ke ( )Ke
System OverviewClick here for a demo
Multilingual speech synthesis
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
54/66
Multilingual speech synthesis
apanese
American English
Chinese (Mandarin) (by ATR)
Brazilian Portuguese (by Nitech, and )
European Portuguese (by Nitech, Univ of Porto, and UFRJ)
(by Bostjan Vesnicer, University of Ljubljana, Slovenia)
Swedish (by Anders Lundgren, KTH, Sweden) ,
Korean (by Sang-Jin Kim, ETRI, Korea)
Finish (by TKK, Finland)
Baby English (by Univ of Edinburgh, UK)
Polish, Slovak, Arabic, Farsi, Croatian, Polyglot, etc.
Singing voice synthesis [Oura; 10] (1/2)
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
55/66
Singing voice synthesis [Oura; 10] (1/2)
Trained HMMs Male:
Female:
Musical score Singing voicedatabase
Synthesized
singing voice
Male:
Female:
ap a on:
Overview of this talk
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
56/66
Overview of this talk
1. Mathematical formulation2. Implementation of individual components
3. Examples demonstrating its flexibility
Inclusion of all components
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
57/66
Inclusion of all components
Problem of statistical parametric speech synthesis
WXw ,,|P
lcP wlP c)(P
Draw from
Waveform generation Parameter generation Text processing
Ll,
LC ,|P
os er or o mo e parame er
Cc ddddP ,|WLP XC|P
Text processing PriorSpeech analysis
Relaxing approximations
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
58/66
Relaxing approximations
Marginalizing model parametersVariational Bayesian acoustic modeling for speech synthesis
an a u;
Marginalizing labelso n ron -en ac -en mo e ra n ng ura;
Inclusion of waveform generation partave orm- eve s a s ca mo e a a;
Summary
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
59/66
Summary
Statistical approach to speech synthesis
Whole speech synthesis process is described in astatistical framework
It gives a unified view and reveals what is correct
and what is wron
Importance of the database
Still we have many problems should be solved:
Combination with text processing part, etc.
Final message
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
60/66
Final message
s speec syn es s a messy pro em
Let us join speech synthesis research!
Thanks!
References (1/4)
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
61/66
References (1/4)
ag sa a; - nu- speec syn es s sys em, , .
Black;'96 - "Automatically clustering similar units...," Euro speech, '97.Beutnagel;'99 - "The AT&T Next-Gen TTS system," Joint ASA, EAA, & DAEA meeting, '99.
Yoshimura;'99 - "Simultaneous modeling of spectrum ...," Eurospeech, '99.
' - " " - ' ..., . , , .
Imai;'88 - "Unbiased estimator of log spectrum and its application to speech signal...," EURASIP, '88.
Kobayashi;'84 - "Spectral analysis using generalized cepstrum," IEEE Trans. ASSP, 32, '84.Tokuda;'94 - "Mel-generalized cepstral analysis -- A unified approach to speech spectral...," ICSLP, '94.
Imai;'83 - "Ce stral anal sis s nthesis on the mel fre uenc scale," ICASSP, '83.
Fukada;'92 - "An adaptive algorithm for mel-cepstral analysis of speech," ICASSP, '92.
Itakura;'75 - "Line spectrum representation of linear predictive coefficients of speech...," J. ASA (57), '75.
Tokuda;'02 - "Multi-space probability distribution HMM," IEICE Trans. E85-D(3), '02.
Odell;'95 - "The use of context in large vocaburary...," PhD thesis, University of Cambridge, '95.Shinoda;'00 - "MDL-based context-dependent subword modeling...," Journal of ASJ(E) 21(2), '00.
Yoshimura;'98 - "Duration modeling for HMM-based speech synthesis," ICSLP, '98.
Tokuda;'00 - "Speech parameter generation algorithms for HMM-based speech synthesis," ICASSP, '00.
Kobayashi;'85 - "Mel generalized-log spectrum approximation...," IEICE Trans. J68-A (6), '85.
' " " '- ..., , .Donovan;'95 - "Improvements in an HMM-based speech synthesiser," Eurospeech, '95.
Kawai;'04 - "XIMERA: A new TTS from ATR based on corpus-based technologies," ISCA SSW5, '04.
Hirai;'04 - "Using 5 ms segments in concatenative speech synthesis," Proc. ISCA SSW5, '04.
References (2/4)
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
62/66
References (2/4)
ou a; - n se ec on or speec syn es s ase on a new acous c arge cos , n erspeec , .
Huang;'96 - "Whistler: A trainable text-to-speech system," ICSLP, '96.Mizutani;'02 - "Concatenative speech synthesis based on HMM," ASJ autumn meeting, '02.
Ling;'07 - "The USTC and iFlytek speech synthesis systems...," Blizzard Challenge workshop, 07.
' - " - " ..., , .
Plumpe;'98 - "HMM-based smoothing for concatenative speech synthesis," ICSLP, '98.
Wouters;'00 - "Unit fusion for concatenative speech synthesis," ICSLP, '00.Okubo;'06 - "Hybrid voice conversion of unit selection and generation...," IEICE Trans. E89-D(11), '06.
A lett;'08 - "Combinin statistical arametric s eech s nthesis and unit selection..." Lan Tech, '08.
Pollet;'08 - "Synthesis by generation and concatenation of multiform segments," Interspeech, '08.
Yamagishi;'06 - "Average-voice-based speech synthesis," PhD thesis, Tokyo Inst. of Tech., '06.
Yoshimura;'97 - "Speaker interpolation in HMM-based speech synthesis system," Eurospeech, '97.
Tachibana;'05 - "Speech synthesis with various emotional expressions...," IEICE Trans. E88-D(11), '05.Kuhn;'00 - "Rapid speaker adaptation in eigenvoice space," IEEE Trans. SAP 8(6), '00.
Shichiri;'02 - "Eigenvoices for HMM-based speech synthesis," ICSLP, '02.
Fujinaga;'01 - "Multiple-regression hidden Markov model," ICASSP, '01.
Nose;'07 - "A style control technique for HMM-based expressive speech...," IEICE Trans. E90-D(9), '07.
' " " '- - , , .Kawahara;'97 - "Restructuring speech representations using a ...", Speech Communication, 27(3), '97.
Zen;'07 - "Details of the Nitech HMM-based speech synthesis system...", IEICE Trans. E90-D(1), '07.
Abdl-Hamid;'06 - "Improving Arabic HMM-based speech synthesis quality," Interspeech, '06.
References (3/4)
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
63/66
References (3/4)
emp nne; - n egra on o e armon c p us no se mo e n o e..., as er es s, , .
Banos;'08 - "Flexible harmonic/stochastic modeling...," V. Jornadas en Tecnologias del Habla, '08.Cabral;'07 - "Towards an improved modeling of the glottal source in...," ISCA SSW6, '07.
Maia;'07 - "An excitation model for HMM-based speech synthesis based on ...," ISCA SSW6, '07.
' - " - - - " ' , , .
Drugman;'09 - "Using a pitch-synchronous residual codebook for hybrid HMM/frame...", ICASSP, '09.
Dines;'01 - "Trainable speech synthesis with trended hidden Markov models," ICASSP, '01.Sun;'09 - "Polynomial segment model based statistical parametric speech synthesis...," ICASSP, '09.
Bul ko;'02 - "Robust s licin costs and efficient search with BMM models for...," ICASSP, '02.
Shannon;'09 - "Autoregressive HMMs for speech synthesis," Interspeech, '09.
Zen;'06 - "Reformulating the HMM as a trajectory model...", Computer Speech & Language, 21(1), '06.
Wu;'06 - "Minimum generation error training for HMM-based speech synthesis," ICASSP, '06.
Hashimoto;'09 - "A Bayesian approach to HMM-based speech synthesis," ICASSP, '09.Wu;'08 - "Minimum generation error training with log spectral distortion for...," Interspeech, '08.
Toda;'08 - "Statistical approach to vocal tract transfer function estimation based on...," ICASSP, '08.
Oura;'08 - "Simultaneous acoustic, prosodic, and phrasing model training for TTS...," ISCSLP, '08.
Ferguson;'80 - "Variable duration models...," Symposium on the application of HMM to text speech, '80.
' " " '- ..., , , .Beal;'03 - "Variational algorithms for approximate Bayesian inference," PhD thesis, Univ. of London, '03.
Masuko;'03 - "A study on conditional parameter generation from HMM...," Autumn meeting of ASJ, '03.
Yu;'07 - "A novel HMM-based TTS system using both continuous HMMs and discrete...," ICASSP, '07.
References (4/4)
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
64/66
References (4/4)
an; - enera ng na ura ra ec ory w a ve rees, n erspeec , .
Latorre;'08 - "Multilevel parametric-base F0 model for speech synthesis," Interspeech, '08.Tiomkin;'08 - "Statistical text-to-speech synthesis with improved dynamics," Interspeech, '08.
Toda;'07 - "A speech parameter generation algorithm considering global...," IEICE Trans. E90-D(5), '07.
' - " " ' ..., , .
Toda;'09 - "Trajectory training considering global variance for HMM-based speech...," ICASSP, '09.
Saino;'06 - "An HMM-based singing voice synthesis system," Interspeech, '06.
Tsuzuki;'04 - "Constructing emotional speech synthesizers with limited speech...," Interspeech, '04.
Sako;'00 - "HMM-based text-to-audio-visual s eech s nthesis," ICSLP, '00.
Tamura;'98 - "Text-to-audio-visual speech synthesis based on parameter generation...," ICASSP, '98.
Haoka;'02 - "HMM-based synthesis of hand-gesture animation," IEICE Technical report,102(517), '02.
Niwase;'05 - "Human walking motion synthesis with desired pace and...," IEICE Trans. E88-D(11), '05.
Hofer;'07 - "Speech driven head motion synthesis based on a trajectory model," SIGGRAPH, '07.Ma;'07 - "A MSD-HMM approach to pen trajectory modeling for online handwriting...," ICDAR, '07.
Morioka;'04 - "Miniaturization of HMM-based speech synthesis," Autumn meeting of ASJ, '04.
Kim;'06 - "HMM-Based Korean speech synthesis system for...," IEEE Trans. Consumer Elec., 52(4), '06.
Klatt;'82 - "The Klatt-Talk text-to-speech system," ICASSP, '82.
Acknowledgement
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
65/66
Acknowledgement
Keiichi Tokuda would like to thank HTS working
rou members, includin Hei a Zen, Keiichiro
Oura, Junichi Yamagichi, Tomoki Toda, Yoshihiko
Nankaku, Kei Hahimoto, and Sayaka Shiota fortheir help.
-
8/12/2019 Speech Synthesis as a Statistical Machine Learning Problem Slides
66/66
HTS Slidesreleased by HTS Working Group
htt ://hts.s .nitech.ac. /
Copyright (c) 1999 - 2011
Department of Computer Science
Some rights reserved.
This work is licensed under the Creative Commons Attribution 3.0 license.
See http://creativecommons.org/ for details.