Towards Synthesis of Focus in Mandarin Text-to-speech System

Dr. Dezhi HUANG

dezhi.huang@francetelecom.com.cn

SNLP Unit, FTRD Beijing

2005/11/2 V1.1

Table of Contents

1 Synthesis of focus

2 Proposal for SSML

3 Examples with <focus>

Humans have a strong ability to reconstruct information

Evidence from music perception

The “Butterfly Lovers” violin concerto

Humans have a strong ability to reconstruct information (Cont.)

Evidence from human vision

Application model of Mandarin Text-to-speech (Cont.)

[Diagram labels: Spoken dialog system; PSTN/Wireless; Mandarin Voice-enabled Service Gateway; Mandarin TTS Engine; Information query by the side of the road; Angry; Environment Noise]

Why do we fail?

The important content is not as prominent as we expect

Weaken the background noise (noise reduction)

Improve the prominence of the information that we need

Utilize the human ability to reconstruct information

What do we need in speech communication?

The key information of a sentence is always carried by one of its words or phrases

Do you often see Prof. Zhao? No, I have seen him only once.

The container of key information is called the focus.

The semantic centre of a sentence

The value of synthesis of focus

It is helpful for

Analyzing the syntax of a sentence

Understanding the meaning of an utterance

Capturing turn-taking

Comprehending the intent and emotion of the speaker

Improving the acceptance of TTS

Key challenges in synthesis of focus

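It is difficult to locate a focus in a sentence

Some focuses can be found from the syntactic structure

明天你准备去买什么?我要去买红色的帽子。 ("What are you going to buy tomorrow? I am going to buy a red hat.")

The other focuses are decided by the context of a sentence

老王去年退休了。 老王去年退休了。 老王去年退休了。 ("Lao Wang retired last year." The same sentence, whose focus shifts with the context.)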

Lack of appropriate acoustic model to realize a focus

Pitch accent Duration Energy Pause Weakness

Markup Language for Focus

Table of Contents

1 Synthesis of focus

2 Proposal for SSML

3 Examples with <focus>

Make the synthesized speech clear

Improve the validity of speech communication with TTS

What is SSML?

It is designed to provide a rich, XML-based markup language for assisting the generation of synthetic speech in Web and other applications

[Diagram labels: Natural Language Processing and Understanding; SSML; Speech Synthesis]

<EMPHASIS> in SSML

The emphasis element requests that the contained text be spoken with emphasis (also referred to as prominence or stress)

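Level: strong, moderate, none and reduced

With this element, the synthesizer can easily tell which word carries the sentence stress

老王买了车。 老王买了车。 ("Lao Wang bought a car.", shown twice with different sentence stress)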
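
As a rough illustration of the standard `<emphasis>` element described above, the Python sketch below wraps one word of a sentence in `<emphasis>`. The helper name `emphasize`, the choice of which word to stress, and the omission of the SSML namespace declaration on `<speak>` are assumptions made for brevity; they are not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Emphasis levels defined for the SSML 1.0 <emphasis> element.
LEVELS = {"strong", "moderate", "none", "reduced"}

def emphasize(before: str, word: str, after: str, level: str = "strong") -> str:
    """Return a small SSML fragment with `word` wrapped in <emphasis>.

    `before` and `after` are the unstressed parts of the sentence.
    The xmlns declaration is left out to keep the sketch short.
    """
    if level not in LEVELS:
        raise ValueError(f"unknown emphasis level: {level!r}")
    speak = ET.Element("speak", {"version": "1.0", "xml:lang": "zh-CN"})
    speak.text = before
    emph = ET.SubElement(speak, "emphasis", {"level": level})
    emph.text = word
    emph.tail = after
    return ET.tostring(speak, encoding="unicode")

# Two hypothetical readings of the same sentence: stress on the buyer,
# then stress on the thing bought.
print(emphasize("", "老王", "买了车。"))
print(emphasize("老王买了", "车", "。"))
```

The same text yields two different requests to the synthesizer, which is all `<emphasis>` can express: where the stress goes, not why.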

The proposed <focus> element

The focus element indicates that the contained text is the semantic centre of a sentence and the carrier of its important information

From the perspective of pragmatics

Contrastive focus (also referred to as identificational focus)

Informational focus (also referred to as the presentational focus, natural focus)

Samples of focus

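(1) 你经常见赵教授吗? 我见过他一次。 ("Do you often see Prof. Zhao? I have seen him once.")

(2) 昨天老张干什么了? 昨天老张去看病。 ("What did Lao Zhang do yesterday? Yesterday Lao Zhang went to see a doctor.")

(3) 是老张帮我修了车。 ("It was Lao Zhang who helped me fix the car.")

(4) 他连我也不相信。 ("He does not trust even me.")

(5) 他经常和我打球。 ("He often plays ball with me.")

(6) 他居然卖了房子。 ("He actually sold the house.")

(7) 我们去钓鱼吧。 ("Let's go fishing.")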

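A focus in Mandarin does not correspond one-to-one with an emphasis

Most focuses are realized by stress

是老张退休了。 ("It is Lao Zhang who retired.")

明天最高气温多少度?明天最高气温 30度。 ("What is tomorrow's highest temperature? Tomorrow's highest temperature is 30 degrees.")

Some of them are realized by a pause or by intonation

你常常见赵老师吗?我见过他一次。 ("Do you often see Teacher Zhao? I have seen him once.")

我们下象棋吧。 ("Let's play Chinese chess.")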

Differences between focus and emphasis

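Focus is a concept of semantics and pragmatics

We can mark up the focus without the speech signal

国家工商总局昨天发出紧急通知强调,全国大中城市、边境地区、发生过疫情的地区、养殖大省四类区域必须建立健全禽类产品“挂牌经营”制度,市场内禽类产品要标明禽类生产地、动物检验检疫证明及销售承诺。 ("The State Administration for Industry and Commerce issued an urgent notice yesterday stressing that four kinds of areas, namely large and medium-sized cities, border areas, areas where outbreaks have occurred, and major poultry-farming provinces, must establish and improve a 'labelled trading' system for poultry products, and that poultry products on the market must be marked with their place of production, the animal inspection and quarantine certificate, and the seller's commitment.")

Emphasis is a concept of psychoacoustics

Consistent emphasis labeling is relatively difficult to achieve without the speech signal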

Differences between focus and emphasis (Cont.)

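Focus always carries the purpose of the utterance

We can know exactly what the sentence means

Emphasis is not directly linked to the purpose of the utterance

The emphasized word may be trivial

黄菊强调,认真学习贯彻五中全会精神,继续推进国有商业银行改革。 ("Huang Ju stressed the need to earnestly study and implement the spirit of the Fifth Plenary Session and to keep advancing the reform of the state-owned commercial banks.")

他经常和我打球。 ("He often plays ball with me.")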

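How can we benefit from focus labeling?

Improve the intelligibility of synthesized speech, especially in noisy communication environments

Q: 明天最晚一班到北京的飞机是几点? ("What time is the last flight to Beijing tomorrow?")

A: 在晚上 9 点钟有一班 CZ8071 的飞机飞往北京。 ("There is a flight, CZ8071, to Beijing at 9 in the evening.")

Q: 几点钟? ("What time?")

A: 是 9 点。 ("At 9 o'clock.")

Q: 哪一班? ("Which flight?")

A: 是 CZ8071 。 ("CZ8071.")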
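
The dialog above suggests that one answer sentence should carry a different focus depending on what the user asked. The Python sketch below illustrates that idea under stated assumptions: the slot names (`time`, `flight`), the template format, and the function `render_answer` are hypothetical, and the bare `<focus>` tag is the element proposed in this talk, not part of standard SSML.

```python
from xml.sax.saxutils import escape

def render_answer(slots: dict, template: list, asked: str) -> str:
    """Join the template parts, wrapping the slot the user asked about in <focus>.

    `template` mixes literal text with slot names; a part is treated as a
    slot whenever it is a key of `slots`.
    """
    parts = []
    for part in template:
        if part in slots:
            text = escape(slots[part])
            # Only the slot that answers the question becomes the focus.
            parts.append(f"<focus>{text}</focus>" if part == asked else text)
        else:
            parts.append(escape(part))
    return "".join(parts)

# Hypothetical slot values taken from the flight dialog.
slots = {"time": "晚上9点钟", "flight": "CZ8071"}
template = ["在", "time", "有一班", "flight", "的飞机飞往北京。"]

print(render_answer(slots, template, asked="time"))    # user asked "what time?"
print(render_answer(slots, template, asked="flight"))  # user asked "which flight?"
```

If the first answer is lost in noise, a dialog manager built this way could repeat the full sentence with the focus moved to the slot named in the follow-up question, instead of replaying the same prosody.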

How can we benefit from focus labeling? (Cont.)

Focus labeling can be applied directly to text information processing

The next generation of search engines will need to know

which is the topic of a paragraph

which are the focuses of a sentence

Text highlighting is an important step in information retrieval

Keywords in an automatic digest are always focuses

Table of Contents

1 Synthesis of focus

2 Proposal for SSML

3 Examples with <focus>

<focus> indicates what the semantic centre is

<focus> solves the problem of focus location

Attributes of <focus>

Type: informational, contrastive

Method: StrongStress, ModerateStress, None, Pause, Intonation
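
One way to make these attribute values concrete is a small validating data type. The Python sketch below is an illustration only: the class name `Focus` and its `to_markup` method are hypothetical, the lower-case attribute names `type` and `method` follow the samples in this deck, and `<focus>` itself is the proposed element, not part of standard SSML.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr

# Allowed values, copied from the attribute list above.
FOCUS_TYPES = {"informational", "contrastive"}
FOCUS_METHODS = {"StrongStress", "ModerateStress", "None", "Pause", "Intonation"}

@dataclass(frozen=True)
class Focus:
    """A focused span of text with the two proposed attributes."""
    text: str
    type: str
    method: str

    def __post_init__(self) -> None:
        # Reject values outside the proposed vocabulary early.
        if self.type not in FOCUS_TYPES:
            raise ValueError(f"unknown focus type: {self.type!r}")
        if self.method not in FOCUS_METHODS:
            raise ValueError(f"unknown focus method: {self.method!r}")

    def to_markup(self) -> str:
        """Serialize the span as a <focus> element with escaped content."""
        return (f"<focus type={quoteattr(self.type)} "
                f"method={quoteattr(self.method)}>{escape(self.text)}</focus>")

print(Focus("一次", "informational", "Pause").to_markup())
```

Keeping `type` (why the span is in focus) separate from `method` (how it is realized) mirrors the distinction the talk draws between focus and emphasis.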

Samples of <focus>

(1) 你经常见<focus type="informational" method="StrongStress">赵教授</focus>吗?

我见过他<focus type="informational" method="Pause">一次</focus>。

(2) 昨天老张干什么了?

昨天老张<focus type="informational" method="ModerateStress">去看病</focus>。

(3) 是<focus type="contrastive" method="StrongStress">老张</focus>帮我修了车。

Samples of <focus> (Cont.)

(4) 他连<focus type="contrastive" method="StrongStress">我</focus>也不相信。

(5) 他经常<focus type="informational" method="Pause">和我打球</focus>。

(6) 他居然<focus type="informational" method="ModerateStress">卖了房子</focus>。

(7) 我们<focus type="informational" method="Intonation">去钓鱼</focus>吧。
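
Sentences marked up this way can be read back by any XML parser. The Python sketch below recovers the plain text and the focus spans from one such sentence; the function name `extract_focus` and the wrapper element `<s>` are assumptions for illustration, and the input is assumed to use straight ASCII quotes around attribute values, as XML requires.

```python
import xml.etree.ElementTree as ET

def extract_focus(marked_up: str):
    """Return (plain_text, spans) for a sentence containing <focus> elements.

    Each span is (start, end, type, method), with character offsets
    into plain_text.
    """
    # Wrap the sentence so the parser sees a single root element.
    root = ET.fromstring(f"<s>{marked_up}</s>")
    plain = root.text or ""
    spans = []
    for child in root:
        if child.tag != "focus":
            raise ValueError(f"unexpected element: {child.tag}")
        text = "".join(child.itertext())
        spans.append((len(plain), len(plain) + len(text),
                      child.get("type"), child.get("method")))
        plain += text + (child.tail or "")
    return plain, spans

sentence = '是<focus type="contrastive" method="StrongStress">老张</focus>帮我修了车。'
plain, spans = extract_focus(sentence)
print(plain)
print(spans)
```

A front end could hand `plain` to ordinary text analysis and use the spans to drive prosody, or to highlight the focus in a text-processing application.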

Thank you!