Řízení prozodie řečového syntezátoru václav Šebesta, Ústav informatiky av Čr, v.v.i....

24
Řízení prozodie řečového syntezátoru Václav Šebesta, Ústav informatiky AV ČR, v.v.i. [email protected] MFF UK 11.12.2009

Upload: ramon-marner

Post on 14-Dec-2015

232 views

Category:

Documents


0 download

TRANSCRIPT

Řízení prozodie řečového syntezátoru

Václav Šebesta, Ústav informatiky AV ČR, v.v.i.

[email protected]

MFF UK 11.12.2009

MFF UK 11.12.2009

Content of contribution

1. Introduction

2. Speech synthesizer

3. Phonetic and phonologic features of speech units

4. GUHA method

5. Sensitivity approach

6. Weight approach

7. Combination of above mentioned approaches

8. Experimental results

9. Listening tests

MFF UK 11.12.2009

• The text-to-speech synthesizer of the Czech language The text-to-speech synthesizer of the Czech language is based on the is based on the concatenation of the elementary speech concatenation of the elementary speech unitsunits (phonemes, diphones or triphones)(phonemes, diphones or triphones). .

• The speech output from such a synthesizer, The speech output from such a synthesizer, represented by represented by fundamental frequencyfundamental frequency F F00 and duration and duration DD

of each of each speech unitspeech unit, is monotonous and sounds very , is monotonous and sounds very synthetically.synthetically.

• The naturalness of speech is considerably dependent on The naturalness of speech is considerably dependent on the implementation of the prosodic features.the implementation of the prosodic features.

• In this contribution In this contribution one can seeone can see severalseveral possible ways possible ways for for selection of suitable parameters for the training of Fselection of suitable parameters for the training of F

0 0

and Dand D and we will compare these ways.and we will compare these ways.

Introduction

MFF UK 11.12.2009

Phonetic transcription

Search of F0 in databaseDatabase of speech units

Speech production

FFUN

F0 PROS

Duration

Prosody control speech

Written text

F0

MFF UK 11.12.2009

Speech signal Database

Segmentation to speech units

F0, D(target values)

TextDatabase

SegmentationTranscription

Phonetic andphonological

properties

Input data for ANN

SegmentationTranscription

Synthesizer

Input text

ANNF0

D

Speech

Training of ANN

MFF UK 11.12.2009

Speech signal Database

Segmentation to speech units

F0, D(target values)

TextDatabase

SegmentationTranscription

Phonetic andphonological

properties

Input data for ANN

SegmentationTranscription

Synthesizer

Input processingtext

ANNF0

D

Speech

Prosody control by ANN

MFF UK 11.12.2009

P1 P11 P21 Identifikace mezery

P2 P12 P22 Identifikace přízvučné jednotky

P3 P13 P23 Identifikace středu slabiky

P4 P14 P24 Identifikace oddělovacího znaménka

P5 P15 P25 Identifikace fonému

P6 P16 P26 Výška samohlásky

P7 P17 P27 Délka samohlásky

P8 P18 P28 Znělost souhlásky

P9 P19 P29 Způsob vytváření souhlásky

P10 P20 P30 Počet řečových jednotek ve slově

Fonetické a fonologické vlastnosti řečových jednotek a vliv jejich kontextu

Pruning according to the Pruning according to the GUHA Method.GUHA Method. • The GUHA method may be used for the determination of relations The GUHA method may be used for the determination of relations

in experimental data. The processed data form a rectangular binary in experimental data. The processed data form a rectangular binary matrix of atmatrix of atttribribuutes, where the rows correspond to the different tes, where the rows correspond to the different objectsobjects (speech units) (speech units). .

• Columns correspond to the different investigated parameters Columns correspond to the different investigated parameters ((phonetic and phonological parameters, e.g.phonetic and phonological parameters, e.g. type of phonemetype of phoneme, , length of world, pause identificationlength of world, pause identification, , stress unit identificationstress unit identification, etc.), etc.). .

SS Non Non SS

AA aa bb a + b a + b

Non ANon A cc dd c + dc + d

a + ca + c     b + db + d n = a + b + c + dn = a + b + c + d  

MFF UK 11.12.2009

GUHA MethodGUHA Method

Quantifier FIMPLEQuantifier FIMPLE (Found almost implication quantifier) is valid iff (Found almost implication quantifier) is valid iff

a / (a + b ) a / (a + b ) FBOUND.FBOUND.

Quantifier Quantifier LLIMPLEIMPLE ( (Lower criticalLower critical implication quantifier) is valid iff implication quantifier) is valid iff

ppii (1 – (1 –

p)p)a+b-ia+b-i << LBOUND LBOUND

Quantifier FISCHERQuantifier FISCHER (Fischer test) is valid iff a (Fischer test) is valid iff a a aminmin, a . d > b . c , a . d > b . c and and   

a + c

i

b + d

a + b – i

n

a + b i = a

min (a + b, a + c)

Fisch

a + b

i

MFF UK 11.12.2009

GUHA MethodGUHA Method

It is possible to determine different levels of hypotheses, It is possible to determine different levels of hypotheses, e.g.:e.g.:

• „„weak hypotheses“ when p = 0.8, LBOUND = weak hypotheses“ when p = 0.8, LBOUND = 0.5, a0.5, aFisch Fisch = 10= 10-20-20

• „„stronger hypotheses“ when p = 0.9, LBOUND stronger hypotheses“ when p = 0.9, LBOUND = 0.1, a= 0.1, aFisch Fisch = 10= 10--2525

• „„strong hypotheses“ when p = 0.9strong hypotheses“ when p = 0.955, LBOUND = , LBOUND = 0.0.001, a1, aFisch Fisch = 10= 10-30-30

• „„extremely strong hypotheses“ when p = 0.extremely strong hypotheses“ when p = 0.9999, , LBOUND = 0.LBOUND = 0.0101, a, aFisch Fisch = 10= 10--3535

MFF UK 11.12.2009

MFF UK 11.12.2009

Pruning according to the sensitivity approach

The determination of input parameters, which can be omitted according to the comparison of sums of absolute values of output signals derivatives :

k

n

xknnk x

yxySS

k

0

lim,

N

nkn

kprunedkn xySxyS

1_ ),(min,

MFF UK 11.12.2009

Standard pruning approach according to the weights

The determination of input parameters, which can be omitted according to the comparison of sums of absolute values of weights :

iik

kprunedk wsum ||min_

all inputs of neuronneuron

MFF UK 11.12.2009

Combination of above mentioned 3 approaches:

If some attributes of proposed ANN (for F0 and D) for 2 from 3 approaches give recommendation for pruning, then input parameter have been pruned.

We can prune the ANN by a lot of known methods:• Mozer, Smolensky (1989) „sensitivity calculation“• Karnin (1990) „sensitivity of error function“• Le Cun et all (1990) „saliency“• Chauvin (1989) „penalty terms“• Weigend, Rumelhard, Hubermann (1991)• Sietsma, Dow (1991) „heuristics“• Šebesta (1993) „statistical derivatives“

## Parameter Parameter CriterionCriterion GUHAGUHA Sensitivity Sensitivity

WeightsWeights

P1P1 Silent pause identificationSilent pause identification ++   

P2P2 Stress unit identificationStress unit identification X +X + X +X + X +X +

P3P3 Syllable nucleus identificationSyllable nucleus identification X +X +    X +X +

P4P4 Punctuation mark identificationPunctuation mark identification ++ X +X +   

P5P5 Phoneme identificationPhoneme identification       X +X +

P6P6 Height of vowelHeight of vowel    X +X + X +X +

P7P7 Length of vowelLength of vowel XX      

P8P8 Voice of consonantVoice of consonant X +X +      

P9P9 Creation mode for consonantCreation mode for consonant X +X + ++

P10P10 Number of phonemesNumber of phonemes    X +X + X +X +

MFF UK 11.12.2009

X..... F0 +.......D can be omitted

## Parameter Parameter CriterionCriterion GUHAGUHA Sensitivity Sensitivity

WeightsWeights

P1P111 Silent pause identificationSilent pause identification    X +X + X +X +

PP1122 Stress unit identificationStress unit identification       X +X +

PP1133 Syllable nucleus identificationSyllable nucleus identification    X +X + ++

PP1144 Punctuation mark identificationPunctuation mark identification X +X + X +X + X +X +

PP1155 Phoneme identificationPhoneme identification         

PP1166 Height of vowelHeight of vowel    X +X + X +X +

PP1177 Length of vowelLength of vowel    XX X +X +

PP1188 Voice of consonantVoice of consonant         

PP1199 Creation mode for consonantCreation mode for consonant    X +X +   

PP2200 Number of phonemesNumber of phonemes         

MFF UK 11.12.2009

X..... F0 +.......D *........common can be omitted

## Parameter Parameter CriterionCriterion GUHAGUHA Sensitivity Sensitivity

WeightsWeights

PP2211 Silent pause identificationSilent pause identification    X +X +   

PP2222 Stress unit identificationStress unit identification         

PP2233 Syllable nucleus identificationSyllable nucleus identification ++ ++   

PP2244 Punctuation mark identificationPunctuation mark identification X +X + X +X +   

PP2255 Phoneme identificationPhoneme identification         

PP2266 Height of vowelHeight of vowel XX XX

PP2277 Length of vowelLength of vowel XX ++ X +X +

PP2288 Voice of consonantVoice of consonant    + + X +X +

PP2299 Creation mode for consonantCreation mode for consonant       X +X +

PP3300 Number of phonemesNumber of phonemes    X +X + X +X +

MFF UK 11.12.2009

X..... F0 +.......D *........common can be omitted

 

 

 

0

0,02

0,04

0,06

0,08

0,1

0,12

0,14

0,16

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Sensitivities of F0 (black), D (gray) and total sensitivity (white)

MFF UK 11.12.2009

Comparison of the pruning methods for F0

0

1

2

3

4

5

6

7

8

9

v1b1tr v6b1tr v5b2tr v5b16ts v8b16ts v9b16ts suma

sentences

erro

rs

|tar-k0| |tar-w| |tar-s| |tar-G| |tar-c|

MFF UK 11.12.2009

MFF UK 11.12.2009

0

20

40

60

80

100

120

140

sentence 5 sentence 8 sentence 10 sentence 20 sentence 21 sentence 23 suma-test suma

30-25-2 20-25-2 19-25-2 18-25-2 17-25-2

The histogram of the numbers of sentences with the best prosody in dependency on the number of input neurons

MFF UK 11.12.2009

Sentence: Sentence: Repeat entire operation once again! Repeat entire operation once again! ((test sentencetest sentence))

F0 contour

0

50

100

150

200

250

phonemes

F0

[Hz]

rules target K0 K4-3

MFF UK 11.12.2009

Sentence: Sentence: Really do you want to finish? Really do you want to finish? ((training training sentencesentence))

Duration contour

0

20

40

60

80

100

120

140

160

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

phonemes

du

rati

on

rules target K0 K4-3

MFF UK 11.12.2009

Comparison of the pruning methods for D - sentence v1b1 - trein

0

0,02

0,04

0,06

0,08

0,1

0,12

0,14

0,16

0,18

0,21 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33

speech units

D [

s]

target k0 weight sens guha22 comb

Listening test Listening test ofof synthetic speech synthetic speech

1. Prosody controlled by non-pruned neural net.2. Prosody controlled by pruned neural net trained

according the weights.3. Prosody controlled by pruned neural net trained

according the GUHA method.4. Prosody controlled by pruned neural net trained

according the sensitivities.5. Prosody controlled by optimal values F0 and

durations.6. Direct speech.

MFF UK 11.12.2009

MFF UK 11.12.2009

1 2 3 4 5 6

1. Weather forecasting for the night and tomorrow. Předpověď počasí na noc a zítřek.

2. Eastern wind from two to five meters per second.

Východní vítr dva až pět metrů za sekundu.

3. Pressure tendency: week increase, afternoon week decrease.

Tlaková tendence: slabý vzestup, odpoledne slabý pokles.

4. Maximal value of ultra-violet index is five point eight.

Maximální hodnota UV-indexu pět celých osm.

5. There was the news of the Czech radio one-Radiojournal.

To byly zprávy Českého rozhlasu 1 – Radiožurnálu.