04/08/04
Why Speech Synthesis is Hard
Chris Brew
The Ohio State University
Issues for text-to-speech
It should sound like a person
AND it should sound like a person who can read
AND it should sound like a person who understands what they are reading
Credits
FESTIVAL: Alan W. Black, Paul Taylor, Simon King, Kevin Lenzo
Huang, Acero and Hon: Spoken Language Processing
Many web-based demos
– http://www.ims.uni-stuttgart.de/~moehler/synthspeech/examples.html
– http://www.icsi.berkeley.edu/eecs225d/klatt.html
Text-to-speech
Text and Phonetic Analysis: What to say
Prosody: How to say it
Waveform synthesis: Making it sound right
Text and phonetic processing
Homographs
Letter-to-sound
Abbreviations
Prosody
Pauses
Pitch
Speech rate / relative duration
Waveform generation
Articulatory Synthesis
– Simulation of the mechanics of speech production
Formant Synthesis
– Source/filter model
Concatenative synthesis
– Limited-domain waveform concatenation
– No waveform modification
– With waveform modification
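To make the source/filter idea behind formant synthesis concrete, here is a minimal sketch, not taken from the slides: an impulse train stands in for the voice source, and a cascade of two-pole resonators stands in for the vocal tract. The function names, the formant frequencies and bandwidths, and the 8 kHz sample rate are all illustrative assumptions.

```python
import math

def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2], unity gain at DC."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def impulse_train(f0_hz, duration_s, sample_rate):
    """Crude voice source: one unit impulse per pitch period."""
    period = int(round(sample_rate / f0_hz))
    n = int(duration_s * sample_rate)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def synthesize_vowel(formants, f0_hz=100.0, duration_s=0.25, sample_rate=8000):
    """Pass the source through one resonator per (frequency, bandwidth) pair."""
    signal = impulse_train(f0_hz, duration_s, sample_rate)
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal

def magnitude_at(signal, freq_hz, sample_rate):
    """Magnitude of the signal's Fourier transform at a single frequency."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    re = sum(x * math.cos(w * n) for n, x in enumerate(signal))
    im = sum(x * math.sin(w * n) for n, x in enumerate(signal))
    return math.hypot(re, im)

# Illustrative formant values only; real systems vary these over time.
vowel = synthesize_vowel([(700.0, 80.0), (1200.0, 90.0)])
```

The output has far more energy near the first resonator frequency than away from it, which is the sense in which the filter shapes the source.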
Waveform generation
Use linear predictive coding (LPC) to analyse the signal into a filter and a residual, then excite the filter with the appropriate residual. Main benefit: compression.
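The analysis/resynthesis loop can be sketched as follows. This is a minimal illustration assuming the autocorrelation method with the Levinson-Durbin recursion, which is one common way to fit the filter; the slides do not say which method was meant. All function names and the test signal are hypothetical.

```python
import math

def autocorrelation(signal, max_lag):
    """Return r[0..max_lag], where r[k] = sum over n of signal[n] * signal[n - k]."""
    n = len(signal)
    return [sum(signal[i] * signal[i - k] for i in range(k, n))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Fit predictor coefficients a[1..order] so that x[n] ~ sum_k a[k] * x[n - k]."""
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:]

def residual(signal, coeffs):
    """Inverse filter: e[n] = x[n] - sum_k a[k] * x[n - k], with zero history."""
    out = []
    for n, x in enumerate(signal):
        pred = sum(a * signal[n - k]
                   for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(x - pred)
    return out

def synthesize(excitation, coeffs):
    """Excite the all-pole filter: y[n] = e[n] + sum_k a[k] * y[n - k]."""
    out = []
    for n, e in enumerate(excitation):
        pred = sum(a * out[n - k]
                   for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(e + pred)
    return out

# Hypothetical test signal: a decaying sinusoid, analysed with a 2nd-order filter.
signal = [0.95 ** n * math.sin(0.3 * n) for n in range(200)]
coeffs = levinson_durbin(autocorrelation(signal, 2), 2)
excitation = residual(signal, coeffs)
rebuilt = synthesize(excitation, coeffs)
```

Exciting the filter with the exact residual rebuilds the original signal. The residual carries less energy than the signal, which is roughly where the compression benefit comes from: it can be coded more cheaply than the waveform itself.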
One slide of speech acoustics
Formants - bands of strong energy in the speech signal
Spectrogram - representation of relation between time (x), frequency (y) and intensity
The speech organs consist of a noise source and some resonant cavities. We speak by changing the shape of the cavities, making some parts of the source come out strong, others weaker.
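As a rough illustration of what a spectrogram computes, the sketch below slices a signal into overlapping frames, applies a Hann window, and takes the magnitude of a naive discrete Fourier transform per frame. The frame length, hop size, and function names are assumptions chosen for readability; a practical system would use an FFT library.

```python
import cmath
import math

def spectrogram(signal, frame_len=64, hop=32):
    """Return one list of magnitudes (bins 0..frame_len//2) per time frame."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

def peak_bin(frame):
    """Index of the strongest frequency bin in one frame."""
    return max(range(len(frame)), key=frame.__getitem__)

# Hypothetical input: a 1000 Hz tone sampled at 8000 Hz, so bins are 125 Hz apart.
SAMPLE_RATE = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * n / SAMPLE_RATE) for n in range(256)]
frames = spectrogram(tone)
```

Each inner list is one column of the picture: time runs across frames, frequency across bins, and the magnitude is the intensity. In real speech, formants show up as bins that stay strong over many consecutive frames.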
Sound like a person
Get a person to record the whole vocabulary, then splice together the words to make sentences.
But: speech is hard to cut up in such a way that it sews back together nicely.
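One narrow piece of the splicing problem can be shown in code. The sketch below joins two recorded units either end to end or with a linear crossfade, and measures the largest sample-to-sample jump. The two sinusoidal "units" and all function names are invented for illustration. A crossfade only hides the waveform discontinuity; it does nothing about mismatched pitch, duration, or coarticulation, which are the harder problems.

```python
import math

def butt_join(a, b):
    """Concatenate two units with no smoothing."""
    return a + b

def crossfade_join(a, b, overlap):
    """Overlap the end of a with the start of b, fading linearly from a to b."""
    if not 0 < overlap <= min(len(a), len(b)):
        raise ValueError("overlap must be between 1 and the shorter unit length")
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)
        mixed.append((1.0 - w) * a[len(a) - overlap + i] + w * b[i])
    return a[:-overlap] + mixed + b[overlap:]

def max_jump(signal):
    """Largest absolute difference between neighbouring samples."""
    return max(abs(y - x) for x, y in zip(signal, signal[1:]))

# Hypothetical units: same pitch, but the second starts at a different phase.
RATE = 8000
unit_a = [math.sin(2.0 * math.pi * 100.0 * n / RATE) for n in range(800)]
unit_b = [math.cos(2.0 * math.pi * 100.0 * n / RATE) for n in range(800)]

rough = butt_join(unit_a, unit_b)
smooth = crossfade_join(unit_a, unit_b, overlap=80)
```

Comparing `max_jump(rough)` with `max_jump(smooth)` shows the click at the butt join and its reduction under the crossfade.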
Sound like a person who can read
Grapheme to phoneme conversion.
Input: text
Output: phoneme string + annotations for stress and intonation.
Spelling rules get you some of the way, but even in languages with regular spelling (English is not among these), exceptions require the use of a dictionary.
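The dictionary-plus-rules arrangement can be sketched as below. The tiny lexicon, the letter-to-sound table, and the ARPAbet-style phoneme labels are all made up for illustration and are far too small for real use; the point is only the lookup order, with the dictionary consulted first and the spelling rules used as a fallback.

```python
# Hypothetical exception dictionary: words whose spelling misleads the rules.
LEXICON = {
    "cough": ["K", "AO1", "F"],
    "though": ["DH", "OW1"],
    "through": ["TH", "R", "UW1"],
}

# Hypothetical letter-to-sound rules: digraphs first, then single letters.
RULES = {
    "sh": ["SH"], "ch": ["CH"], "th": ["TH"], "ng": ["NG"],
    "a": ["AE"], "b": ["B"], "c": ["K"], "d": ["D"], "e": ["EH"], "f": ["F"],
    "g": ["G"], "h": ["HH"], "i": ["IH"], "j": ["JH"], "k": ["K"], "l": ["L"],
    "m": ["M"], "n": ["N"], "o": ["AA"], "p": ["P"], "q": ["K"], "r": ["R"],
    "s": ["S"], "t": ["T"], "u": ["AH"], "v": ["V"], "w": ["W"],
    "x": ["K", "S"], "y": ["Y"], "z": ["Z"],
}

def letter_to_sound(word):
    """Greedy left-to-right match, preferring two-letter rules over one-letter rules."""
    phones = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in RULES:
            phones.extend(RULES[pair])
            i += 2
        else:
            phones.extend(RULES.get(word[i], []))  # unknown characters are skipped
            i += 1
    return phones

def grapheme_to_phoneme(word):
    """Look the word up in the dictionary; fall back to rules if it is missing."""
    key = word.lower()
    if key in LEXICON:
        return LEXICON[key]
    return letter_to_sound(key)
```

For a regularly spelled word such as "ship" the rules are enough. Running `letter_to_sound("cough")` directly shows why the dictionary entry is needed.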
Text Normalization
Henry V Part I, Act II scene 11, Mr. X is, I believe V.I. Lenin and not Charles I.
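The example sentence is hard because the same characters must be read differently depending on context: "V" after a king's name is an ordinal, "I" after "Part" is a cardinal, "I" before "believe" is a pronoun, and "V.I." is a pair of initials. The sketch below handles just these cases with hand-written context sets. Every word list and name in it is a hypothetical stand-in, and it only looks one token back, so it is easy to fool.

```python
import re

ROMAN = re.compile(r"^[IVX]+$")
CARDINALS = ["one", "two", "three", "four", "five",
             "six", "seven", "eight", "nine", "ten"]
ORDINALS = ["first", "second", "third", "fourth", "fifth",
            "sixth", "seventh", "eighth", "ninth", "tenth"]
SECTION_WORDS = {"part", "act", "scene", "chapter"}   # assumed context cues
MONARCH_NAMES = {"henry", "charles", "elizabeth"}     # assumed context cues
ABBREVIATIONS = {"Mr.": "Mister"}

def roman_to_int(numeral):
    """Convert a numeral made of I, V and X to an integer."""
    values = {"I": 1, "V": 5, "X": 10}
    total = 0
    for ch, nxt in zip(numeral, numeral[1:] + " "):
        v = values[ch]
        total += -v if nxt in values and values[nxt] > v else v
    return total

def normalize(text):
    """Expand abbreviations and Roman numerals using the previous word as context."""
    out = []
    prev = ""
    for token in text.split():
        core = token.rstrip(".,;:")
        tail = token[len(core):]
        if token in ABBREVIATIONS:
            out.append(ABBREVIATIONS[token])
        elif ROMAN.match(core) and 1 <= roman_to_int(core) <= 10:
            n = roman_to_int(core)
            if prev in SECTION_WORDS:
                out.append(CARDINALS[n - 1] + tail)
            elif prev in MONARCH_NAMES:
                out.append("the " + ORDINALS[n - 1] + tail)
            else:
                out.append(token)  # pronoun "I", the letter "X", and so on
        else:
            out.append(token)
        prev = core.lower()
    return " ".join(out)

SENTENCE = ("Henry V Part I, Act II scene 11, Mr. X is, "
            "I believe V.I. Lenin and not Charles I.")
spoken = normalize(SENTENCE)
```

On the slide's sentence this yields "Henry the fifth Part one, Act two scene 11, Mister X is, I believe V.I. Lenin and not Charles the first." The digits and the initials are left alone here; a fuller normalizer would expand those too.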
Specialized text types
Smith,Bobbie Q,3337 St Laurence St, Fort Worth,TX 71611-5484 (817) 839-3689
Anderson, W, 445 Sycamore Way NE, Lincoln, NE 98125-5108,(212)404-9998
Raw
Address
SABLE
See rinss-slides
Sound like you understand
Lexical stress and intonation matter very much, and tie in with pragmatics.
The system doesn’t in fact understand enough to get this right.
Best you can do is fake it. There are lots of cues available in the text, but mistakes are inevitable.
Rumpke Advert
Rhetorical Systems
Definitely wrong
Possibly good enough
Multilingual and flexible
Festival is open-architecture, and has been extended by lots of people
It can even (easily) be made to speak in your voice.
Prosody
Boston
It will be rainy today in Boston
Challenges for speech synthesis
Improve overall speech quality
Refine ways of organizing and collecting speech databases
Improve the quality of the control signal
Sounds