SSML Extensions for TTS in Indian Languages

II Workshop on Internationalizing SSML, 30-31 May 2006, Greece

Nixon Patel and Kishore Prahallad

Bhrigus Inc., Hyderabad, India
IIIT Hyderabad, India

Page 2: Topics

About Bhrigus

Collaborative Efforts between Bhrigus and IIIT Hyderabad

Nature of Indian language scripts – convergence and divergence

Issues across TTS rendering in all these languages

Proposed solutions/tags:

  Syllable Element

  Alien Element

  Dialect Element

© Copyright 2006, Bhrigus Software Private Limited.

Page 3: Bhrigus

voice & data solutions

http://www.bhrigus.com

Page 4: About Bhrigus

Established: 2002

Business: Providing IVR, speech and enterprise solutions to BFSI, telcos, contact centers and manufacturing companies.

Key customers: Hewitt Associates, AT&T, Pfizer, Merrill Lynch, Union Pacific Railroad, CDIA, Southwestern Energy, Orange County, Stryker

SEI CMM Level 4 process implementation under way; ISO 9001:2000 – KPMG certified.

Page 5: Speech and Language Technology Lab @ Bhrigus

Playing a leadership role in the development of ASR and TTS for all official Indian languages, to provide voice solutions for the Indian market

Collaborations: IIIT Hyderabad and Carnegie Mellon University

10-member team plus board of advisors

  3 PhDs and 4 Masters

  Synthesis team, recognition team, linguist team and language resources team

Initiating SSML and VXML chapters in India

Page 6: Collaborative Efforts

Bhrigus Inc., Hyderabad – voice-based solution provider

IIIT Hyderabad – one of the leading universities in India doing speech research

Telugu TTS – a collaborative effort between Bhrigus Inc. and IIIT Hyderabad

Goal: develop ASR and TTS for all official Indian languages

Page 7: Nature of Indian Language (IL) Scripts

The basic units of the writing system are Aksharas

An Akshara is an orthographic representation of a speech sound

An Akshara is syllabic in nature; typical forms are V, CV, CCV and CCCV (C – consonant, V – vowel)

An Akshara always ends with a vowel (or nasalized vowel) in written form

~1652 dialects/native languages

22 languages officially recognized
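
The typical Akshara forms listed above (V, CV, CCV, CCCV) amount to zero to three consonants followed by exactly one vowel. A minimal Python sketch of that shape check follows; the name `is_akshara_shape` is illustrative and not from the slides, and it covers only the typical forms named here.

```python
import re

# Typical akshara shapes from the slide: V, CV, CCV, CCCV,
# i.e. zero to three consonants (C) followed by exactly one vowel (V).
AKSHARA_SHAPE = re.compile(r"C{0,3}V")


def is_akshara_shape(pattern: str) -> bool:
    """Return True if a C/V pattern string is one of the typical akshara forms."""
    return AKSHARA_SHAPE.fullmatch(pattern.upper()) is not None


if __name__ == "__main__":
    for shape in ["V", "CV", "CCV", "CCCV", "VC", "CVC"]:
        print(shape, is_akshara_shape(shape))
```

Shapes such as VC or CVC are rejected because they do not end in a vowel.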

Page 8: Convergence of IL Scripts

Aksharas are syllabic in nature

Common phonetic base

  A common set of speech sounds is shared across all languages

Fairly good (though not exact) correspondence between a sequence of Aksharas and the corresponding sequence of sounds

  Often referred to as letter-to-sound rules

Written from left to right, as in European languages

Words are separated by spaces, as in European languages

Page 9: Divergence of IL Scripts

Each IL has its own script

All ILs share a common phonetic base; however, the phonotactics of each IL differ

ILs are non-tonal languages, unlike eastern languages such as Chinese

Page 10: How to Represent Indian Language Scripts

Unicode

  Useful for *rendering* Indian language scripts

  Not suitable for keying in through a QWERTY keyboard

  Not suitable for building modules such as text normalization (Unicode characters cannot be seen in many editors)

Itrans-3 / OM – a transliteration scheme by IISc Bangalore, India and Carnegie Mellon University

  Useful for *keying in and storing* Indian language scripts using QWERTY keyboards

  Useful for processing and for writing modules/rules for letter-to-sound conversion, text normalization, etc.

Page 11: Itrans-3 / OM Notation

Page 12: Why Itrans-3/OM?

Designed for user readability: easier to read and type

It is case-insensitive

The scheme is phonetic in nature: each character corresponds to the sound actually spoken

A single transliteration scheme is therefore used for all Indian languages, since they share the same set of sounds

Each character (corresponding to a phone/sound) is at most three letters long

Adopted across universities in India and abroad, and by some industrial labs such as Bhrigus Inc.
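
Because the scheme is case-insensitive and each phone is at most three letters long, a transliterated word can plausibly be split into phones by greedy longest-match. The sketch below assumes exactly that. The `PHONES` set is a small illustrative subset, not the actual Itrans-3/OM inventory, and `tokenize` is a hypothetical helper rather than part of any published tool.

```python
from typing import List

# Illustrative subset only: NOT the real Itrans-3/OM phone inventory.
PHONES = {
    "a", "aa", "i", "ii", "u", "uu", "e", "ee", "o", "oo",
    "b", "k", "l", "n", "t", "v", "y",
}
MAX_PHONE_LEN = 3  # from the slide: no phone is longer than three letters


def tokenize(word: str) -> List[str]:
    """Split a transliterated word into phones using greedy longest-match."""
    text = word.lower()  # the scheme is case-insensitive
    phones = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(MAX_PHONE_LEN, len(text) - pos), 0, -1):
            chunk = text[pos:pos + size]
            if chunk in PHONES:
                phones.append(chunk)
                pos += size
                break
        else:
            raise ValueError(f"no phone matches at position {pos} in {word!r}")
    return phones


if __name__ == "__main__":
    print(tokenize("naatoo"))  # ['n', 'aa', 't', 'oo']
```

Trying the longest candidate first is what keeps "aa" from being read as two separate "a" phones.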

Page 13: Issues in TTS Rendering in IL

TTS should be able to pronounce words Akshara by Akshara (syllable by syllable)

The languages are heavily influenced by English (alien) words

Alien words occur within sentences

Each language has its own dialects

Page 14: SSML Tag: Phoneme Element <phoneme>

<phoneme alphabet="itrans-3" ph="n aa t oo"> naatoo </phoneme>

The ph attribute specifies the phoneme/phone string

Rendering “n”, “aa”, “t”, “oo” individually does not make sense to native speakers of Indian languages

Sounds need to be rendered in terms of syllables

Page 15: Syllable Element <syllable>

<syllable alphabet="itrans-3" syl="naa too"> naatoo </syllable>

Renders “naa” and “too”, which are Aksharas (syllables)
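
One way to derive the `syl` value from a phone string is to close an Akshara at every vowel, following the consonants-then-vowel shape of Aksharas. The sketch below assumes an illustrative vowel subset and two hypothetical helpers, `to_aksharas` and `syllable_element`. The `<syllable>` element is the proposal made on this slide and is not part of standard SSML.

```python
import xml.etree.ElementTree as ET

# Illustrative vowel subset; the real Itrans-3/OM vowel set is larger.
VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "ee", "o", "oo"}


def to_aksharas(phones):
    """Group phones into aksharas by closing a unit at every vowel."""
    aksharas, current = [], []
    for phone in phones:
        current.append(phone)
        if phone in VOWELS:
            aksharas.append("".join(current))
            current = []
    if current:
        # Trailing consonants with no vowel: kept as their own unit (an assumption).
        aksharas.append("".join(current))
    return aksharas


def syllable_element(phones, alphabet="itrans-3"):
    """Serialize the proposed <syllable> element for a list of phones."""
    elem = ET.Element("syllable", {"alphabet": alphabet, "syl": " ".join(to_aksharas(phones))})
    elem.text = "".join(phones)
    return ET.tostring(elem, encoding="unicode")


if __name__ == "__main__":
    print(syllable_element(["n", "aa", "t", "oo"]))
    # <syllable alphabet="itrans-3" syl="naa too">naatoo</syllable>
```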

Page 16: Motivation for Loan Word <alien>

Informal experiments suggested that 33% of the errors in IL TTS occur while rendering alien (non-native) words

Such alien words could be detected automatically, owing to the syllabic properties of Indian languages
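
The slides do not say how the automatic detection works. One possible heuristic, assumed here purely for illustration, is to flag any word whose phone sequence cannot be split into Aksharas of at most three consonants followed by a vowel. The vowel set is an illustrative subset and `looks_alien` is a hypothetical name.

```python
# Illustrative vowel subset; the real phone set is larger.
VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "ee", "o", "oo"}
MAX_CLUSTER = 3  # typical aksharas carry at most three consonants before the vowel


def looks_alien(phones):
    """Return True if the phones cannot be grouped into C{0,3}V aksharas."""
    cluster = 0  # consonants seen since the last vowel
    for phone in phones:
        if phone in VOWELS:
            cluster = 0
        else:
            cluster += 1
            if cluster > MAX_CLUSTER:
                return True  # consonant cluster too long for a typical akshara
    # Leftover consonants mean the word does not end in a vowel.
    return cluster > 0


if __name__ == "__main__":
    print(looks_alien(["n", "aa", "t", "oo"]))  # False
    print(looks_alien(["b", "aa", "n", "k"]))   # True
```

A real detector would likely need language-specific phonotactics as well, since these differ from one Indian language to another.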

Page 17: Example of Loan Word

BANK has to be pronounced as /B/ /AE/ /N/ /K/

The /AE/ phoneme does not exist in the Indian language phone set

<alien> baank </alien>

Alien (non-native) words could be rendered using different pronunciation dictionaries or letter-to-sound rules
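
A sketch of how a front end might route words inside `<alien>` to a separate pronunciation dictionary is given below. The two lexicons and the `pronounce` helper are hypothetical; only the BANK pronunciation and the `<alien>` markup come from the slide, and a letter-to-sound fallback is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical lexicons for illustration only.
NATIVE_LEXICON = {"naatoo": ["n", "aa", "t", "oo"]}
ALIEN_LEXICON = {"baank": ["B", "AE", "N", "K"]}  # BANK, as given on the slide


def pronounce(fragment):
    """Return (word, phones) pairs, using the alien lexicon inside <alien>."""
    root = ET.fromstring("<speak>" + fragment + "</speak>")
    result = []

    def add(text, lexicon):
        for word in (text or "").split():
            result.append((word, lexicon.get(word)))  # None if the word is unknown

    add(root.text, NATIVE_LEXICON)  # text before the first child element
    for child in root:
        inner = ALIEN_LEXICON if child.tag == "alien" else NATIVE_LEXICON
        add(child.text, inner)
        add(child.tail, NATIVE_LEXICON)  # text following the child element
    return result


if __name__ == "__main__":
    print(pronounce("naatoo <alien> baank </alien>"))
```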

Page 18: Dialect Element <dialect>

Each language has its own dialects

TTS should be able to handle dialects without unloading the language resources

Page 19: Dialect Element <dialect>

<?xml version="1.0"?>
<speak version="1.0" xml:lang="tel-in">
  <voice gender="female">
    <dialect name="andhra"> yekkad'iki vel'laali </dialect>
    <dialect name="telengana" pro="yaad'iki poovaale"> yekkad'iki vel'laali </dialect>
  </voice>
</speak>
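
To show how a processor might consume this markup, the sketch below parses the example with Python's standard `xml.etree.ElementTree` and picks the string to speak for each dialect. It assumes that `pro`, when present, carries the dialect-specific pronunciation and overrides the element text; the slide does not spell out this rule, and `text_to_render` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SSML = """<?xml version="1.0"?>
<speak version="1.0" xml:lang="tel-in">
  <voice gender="female">
    <dialect name="andhra"> yekkad'iki vel'laali </dialect>
    <dialect name="telengana" pro="yaad'iki poovaale"> yekkad'iki vel'laali </dialect>
  </voice>
</speak>"""


def text_to_render(ssml):
    """Return (dialect name, text to speak) pairs for every <dialect> element."""
    root = ET.fromstring(ssml)
    rendered = []
    for dialect in root.iter("dialect"):
        written = " ".join((dialect.text or "").split())  # normalize whitespace
        # Assumption: the pro attribute, if given, replaces the written form.
        spoken = dialect.get("pro") or written
        rendered.append((dialect.get("name"), spoken))
    return rendered


if __name__ == "__main__":
    for name, spoken in text_to_render(SSML):
        print(name, "->", spoken)
```

Both dialects are handled in a single pass over one document, which matches the stated aim of switching dialects without unloading language resources.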

Page 20: Conclusions

Bhrigus Inc., Hyderabad is taking a lead position in developing ASR and TTS for Indian languages

Proposed <syllable>, <alien> and <dialect> elements as SSML extensions

Page 21: References

1. Prahallad Lavanya, Prahallad Kishore and GanapathiRaju Madhavi, A Simple Approach for Building Transliteration Editors for Indian Languages, Journal of Zhejiang University Science, vol. 6A, no. 11, pp. 1354-1361, Oct 2005.