Accent Modeling An Overview

Upload: alberta-peters

Post on 29-Jan-2016


Page 1: Accent Modeling An Overview. 02/09/07iCONS Group Presentation2 Prologue  Our Initial Effort  Enhancement of speaker recognition through score level

Accent Modeling

An Overview


02/09/07 iCONS Group Presentation 2

Prologue

Our initial effort: enhancement of speaker recognition through score-level fusion of Arithmetic Harmonic Sphericity (AHS) and Hidden Markov Model (HMM) techniques.

Performance improvements of 22% and 6% in true acceptance rate (at a 5% false acceptance rate) on the YOHO and USF multi-modal biometric datasets, respectively.
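The slides do not say which fusion rule was used. As a minimal sketch, assume min-max score normalization followed by a weighted sum, which is one common score-level fusion rule. Also assume the AHS measure acts as a distance (smaller means a better match) and the HMM score is a log-likelihood (larger is better). The function names and the `weight` parameter are hypothetical.

```python
def min_max_normalize(scores):
    """Map raw scores linearly onto [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(ahs_distances, hmm_loglikes, weight=0.5):
    """Weighted-sum score-level fusion for the same list of verification trials.

    AHS values are assumed to be distances (smaller = better), so they are
    negated before normalization; HMM log-likelihoods are used as-is.
    `weight` (hypothetical) is the share given to the AHS score.
    """
    if len(ahs_distances) != len(hmm_loglikes):
        raise ValueError("both systems must score the same trials")
    ahs = min_max_normalize([-d for d in ahs_distances])
    hmm = min_max_normalize(hmm_loglikes)
    return [weight * a + (1.0 - weight) * h for a, h in zip(ahs, hmm)]


# Two toy trials: the first looks genuine to both systems, the second does not.
print(fuse_scores([0.2, 1.4], [-35.0, -60.0]))  # [1.0, 0.0]
```

In practice you would estimate the normalization bounds and the weight on held-out development data rather than on the trials being scored. Normalizing over the batch here only keeps the sketch self-contained.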


Prologue…contd

[Figure: "Enhanced Recognition at Various FARs", two panels (YOHO and USF data). True Acceptance Rate (%) vs. False Acceptance Rate (3% and 5%) for AHS, HMM and HF.]
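The metric reported on these slides is true acceptance rate at a fixed false acceptance rate. You can compute it from lists of genuine-trial and impostor-trial scores. This is a minimal sketch that assumes higher scores mean a better match and rejects scores that exactly equal the threshold. `tar_at_far` is a hypothetical name.

```python
def tar_at_far(genuine, impostor, far):
    """True acceptance rate at a target false acceptance rate (both as fractions).

    Scores are assumed to be "higher = more likely the claimed speaker". The
    threshold is chosen so that at most `far` of the impostor trials score
    strictly above it; trials equal to the threshold are rejected.
    """
    ranked = sorted(impostor, reverse=True)
    # Number of impostor acceptances we may tolerate; the epsilon guards
    # against float round-off such as 0.29 * 100 == 28.999999999999996.
    allowed = int(far * len(ranked) + 1e-9)
    if allowed >= len(ranked):
        return 1.0  # the FAR budget allows every trial to be accepted
    threshold = ranked[allowed]
    accepted = sum(1 for g in genuine if g > threshold)
    return accepted / len(genuine)
```

Reading the TAR at 3% and 5% FAR, as in the charts, just means calling this with `far=0.03` and `far=0.05` on each system's scores.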


Prologue – what next

Further improvement of the recognition rate through speaker accent.

Speaker accent will play a critical role in the evaluation of biometric systems, since users will be international.

Incorporating an accent model into the speaker recognition/verification system will be a key focus of our study.


Accent

What is accent? The cumulative auditory effect of those features of pronunciation that identify where a person is from, regionally and socially.

Difference between accent and dialect: accent is the negative (or rather, colorful) influence of a speaker's first language (L1) on a second language. Dialects of a given language are differences in speaking style within that language (all belonging to L1) that arise from geographical and ethnic differences.


Accent

Factors affecting the level of accent:
- Age at which the speaker learns the second language.
- Nationality of the speaker's language instructor.
- Grammatical and phonological differences between the primary and secondary languages.
- Amount of interaction the speaker has with native speakers of the language.


Applications of Accent Modeling

Accent knowledge can be used in speech recognition to select alternative pronunciations or to provide information for biasing a language model.

Accent can be useful in profiling speakers for call routing in a call centre.

Document retrieval systems.

Speaker recognition systems.


Examples of Accent

- Native American English
- Indian
- Chinese
- British
- Japanese
- Russian
- Arabic
- Greek


World’s Major Languages


Accent Classification System

[Figure: block diagram. Training: for each accent 1 to N, speech data → extract accent features → reference accent model. Testing: speech data → extract accent features → classification against reference accent models 1 to N → score.]
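The training and testing flow of the accent classification system can be sketched with a deliberately simple reference model. The assumption here is that each reference accent model is just the mean feature vector of its training data, and the score is the negative mean Euclidean distance. The systems surveyed in these slides use richer models (GMMs, HMMs, STMs, neural networks). All names and the toy vectors are hypothetical.

```python
import math


def train_reference_model(feature_vectors):
    """Hypothetical reference accent model: the mean of the training feature vectors."""
    n = len(feature_vectors)
    return [sum(column) / n for column in zip(*feature_vectors)]


def score(model, feature_vectors):
    """Negative mean Euclidean distance to the model (higher = closer match)."""
    return -sum(math.dist(v, model) for v in feature_vectors) / len(feature_vectors)


def classify(models, test_vectors):
    """Score the test features against every reference model and pick the best."""
    scores = {accent: score(model, test_vectors) for accent, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores


models = {
    "accent_1": train_reference_model([[0.0, 0.0], [0.0, 2.0]]),
    "accent_N": train_reference_model([[10.0, 10.0], [10.0, 12.0]]),
}
print(classify(models, [[1.0, 1.0]])[0])  # accent_1
```

You can swap a GMM or HMM scorer in for `score` without changing the surrounding train-then-argmax structure.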


Accent – Research Work

M. V. Chan et al., "Classification of speech accents with neural networks," IEEE World Congress on Computational Intelligence, vol. 7, pp. 4483-4486, 27 Jun-2 Jul 1994.

L. M. Arslan, "Foreign Accent Classification in American English," Ph.D. Dissertation, Duke University, 1996.

C. Teixeira, I. Trancoso, and A. Serralheiro, "Accent identification," in Proc. International Conference on Spoken Language Processing, vol. 3, pp. 1784-1787, 1996.

P. Fung and W. K. Liu, "Fast Accent Identification and Accented Speech Recognition," in Proc. ICASSP'99, vol. 1, pp. 221-224, 1999.

T. Chen et al., "Automatic accent identification using Gaussian mixture models," ASRU '01, pp. 343-346, 9-13 Dec. 2001.

P. Angkititrakul and J. H. L. Hansen, "Stochastic Trajectory Model Analysis for Accent Classification," International Conference on Spoken Language Processing, vol. 1, pp. 493-496, Sept. 2002.

X. Lin and S. Simske, "Phoneme-less hierarchical accent classification," Signals, Systems and Computers, vol. 2, pp. 1801-1804, 7-10 Nov. 2004.


Research Work … Contd

F. Farahani et al., "Speaker identification using supra-segmental pitch pattern dynamics," in Proc. ICASSP'04, vol. 1, pp. I-89-92, 17-21 May 2004.

M. M. Tanabian et al., "Automatic speaker recognition with formant trajectory tracking using CART and neural networks," Canadian Conference on Electrical and Computer Engineering, pp. 1225-1228, 1-4 May 2005.

S. Gray and J. H. L. Hansen, "An integrated approach to the detection and classification of accents/dialects for a spoken document retrieval system," ASRU '05, pp. 35-40, 27 Nov-1 Dec. 2005.

P. Angkititrakul and J. H. L. Hansen, "Advances in Phone-based Modeling for Automatic Accent Classification," IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, pp. 634-646, March 2006.

K. Bartkova and D. Jouvet, "Using Multilingual Units for Improved Modeling of Pronunciation Variants," in Proc. ICASSP'06, vol. 5, pp. V-1037-V-1040, 14-19 May 2006.

A. Ikeno and J. H. L. Hansen, "Perceptual Recognition Cues in Native English Accent Variation: Listener Accent, Perceived Accent, and Comprehension," in Proc. ICASSP'06, vol. 1, pp. I-401-I-404, 14-19 May 2006.


Accent Classification Tree

Speech Dataset → Accent Features → Modeling → Classification/Decision

Accent Features: Pitch, Formants, Formant Trajectories, Energy, MFCCs, Delta MFCCs

Modeling: Stochastic Trajectory Models, Artificial Neural Networks, Gaussian Mixture Models, Hidden Markov Models


Foreign Accent Classification in American English - Dataset

The dataset consists of neutral American English along with German, Spanish, Chinese, Turkish, French, Italian, Hindi, Romanian, Japanese, Persian and Greek accents.

All speech was sampled at 8000 Hz.

In total, 43 speakers used microphone input and 68 speakers used telephone input, in a quiet office environment.


Formant Frequency Analysis

Formants are the frequencies at which most of the acoustic energy passes from the source to the output when the vocal tract is modeled as an acoustic tube (that is, the resonances of the tube).

The second and third formants are particularly useful for accent classification.
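A concrete case makes the acoustic tube model easier to picture. Take a uniform tube closed at one end (the glottis) and open at the other (the lips). It resonates at odd multiples of the quarter-wavelength frequency, $F_n = (2n-1)c/(4L)$. The sketch below assumes a 17.5 cm tract and a speed of sound of 350 m/s, which are textbook values for an adult male producing a neutral vowel. It illustrates the tube model only and is not a formant tracker. The function name is hypothetical.

```python
def uniform_tube_formants(length_m=0.175, speed_of_sound=350.0, count=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    F_n = (2n - 1) * c / (4 * L): odd multiples of the quarter-wavelength frequency.
    """
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m) for n in range(1, count + 1)]


print([round(f) for f in uniform_tube_formants()])  # [500, 1500, 2500]
```

During real articulation the tract is not uniform, so F2 and F3 move away from these neutral values. Those positions are what a formant-based accent feature measures.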


Mel Scale Vs Accent Scale
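The transcript keeps only the title of this slide, so the definition of the accent-sensitive scale cannot be recovered here. For reference, one widely used form of the mel scale is $m = 2595 \log_{10}(1 + f/700)$. Below is a sketch of the conversion in both directions. The function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Frequency in Hz to mels, using m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


print(round(hz_to_mel(1000.0)))  # 1000
```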


Accent Classifier

The features consisted of 8-dimensional ASCCs and energy, along with their delta features.

The IW-FS, CS-FS and CS-PS configurations achieved classification rates of 74.5%, 61.3% and 68.3%, respectively.

With 7-8 test words, accent classification accuracy among 4 accents is 93%.
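Delta features are commonly computed by regression over neighboring frames:

$d_t = \sum_{n=1}^{W} n (c_{t+n} - c_{t-n}) / (2\sum_{n=1}^{W} n^2)$

This is a minimal sketch that assumes a window of W = 2 and repeats the edge frames at the boundaries. The slides do not give the exact delta formula used with the ASCCs. The function names are hypothetical.

```python
def delta(frames, window=2):
    """Regression-based delta coefficients for a list of feature vectors.

    d_t = sum_{n=1..W} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..W} n^2),
    with the first and last frames repeated beyond the segment edges.
    """
    last = len(frames) - 1
    denom = 2 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(len(frames)):
        acc = [0.0] * len(frames[t])
        for n in range(1, window + 1):
            ahead = frames[min(t + n, last)]
            behind = frames[max(t - n, 0)]
            acc = [a + n * (f - b) for a, f, b in zip(acc, ahead, behind)]
        deltas.append([a / denom for a in acc])
    return deltas


def append_deltas(frames, window=2):
    """Static features followed by their deltas, e.g. 8 ASCCs + energy -> 18 values."""
    return [f + d for f, d in zip(frames, delta(frames, window))]
```

On a linear ramp such as `[[0.0], [1.0], [2.0], [3.0], [4.0]]`, the interior delta is exactly 1.0, the ramp's slope per frame.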


Computer Vs Humans


Conclusions about specific features

Word-final stop release time is longer for foreign-accented speakers.

The slope of the intonation contour for isolated words is more negative for Chinese speakers, and more positive for German speakers, than for native speakers.

Voice onset time for unvoiced stops is not a significant contributor for the accents considered in this study.

Second and third formant positions differ between native and non-native speakers.
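One way to estimate the intonation-contour slope is as the least-squares slope of the pitch (F0) track over a word. This is a minimal sketch that assumes F0 values have already been extracted and unvoiced frames removed. `contour_slope` is a hypothetical name.

```python
def contour_slope(times_s, f0_hz):
    """Least-squares slope of an F0 contour in Hz per second (negative = falling)."""
    n = len(times_s)
    if n < 2 or n != len(f0_hz):
        raise ValueError("need at least two matching (time, F0) pairs")
    mean_t = sum(times_s) / n
    mean_f = sum(f0_hz) / n
    num = sum((t - mean_t) * (f - mean_f) for t, f in zip(times_s, f0_hz))
    den = sum((t - mean_t) ** 2 for t in times_s)
    if den == 0:
        raise ValueError("time stamps must not all be equal")
    return num / den


print(round(contour_slope([0.0, 0.1, 0.2, 0.3], [200.0, 190.0, 180.0, 170.0])))  # -100
```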


Accent Classification/Detection using ANN

The neural network was given demographic data as inputs: the speaker's age, the percentage of the day in which English is used for communication, and the number of years English has been spoken. It also received speech features: the average pitch frequency and the averaged first three formant frequencies.

A dataset of 10 native and 12 non-native speakers was used. The F2 and F3 distributions of the native and non-native groups are highly dissimilar.

Three neural network classification techniques, namely competitive learning, counter propagation and back propagation, were compared.

Back propagation gave a detection rate of 100% on training data and 90.9% on testing data.
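As a minimal sketch of the back-propagation idea, the code below trains a single sigmoid unit on standardized input vectors. This is logistic regression, the one-layer special case of back propagation. It is not one of the multi-layer networks compared in the study. The feature order (for example age, percentage of English use, years of English, mean pitch, F1-F3) and all names are assumptions.

```python
import math


def standardize(rows):
    """Z-score each input column so ages, percentages and Hz values share a scale."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(columns, means)]
    scaled = [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]
    return scaled, means, stds


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_detector(features, labels, lr=0.5, epochs=2000):
    """Stochastic gradient descent on cross-entropy for one sigmoid unit.

    labels: 1 = non-native (accented), 0 = native. For this loss the error
    signal back-propagated to the pre-activation is simply (prediction - label).
    """
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """1 if the unit's output is at least 0.5 (non-native), else 0."""
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5)
```

At test time, standardize new speakers with the training `means` and `stds` rather than recomputing them on the test set.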


Phoneme-less Hierarchical Accent Classification

WSJCAM0 and TIDIGITS were used to train the British and American accent models, respectively.

IViE and Voicemail were used to test the British and American accents, respectively.

13-dimensional MFCCs were used as features, and a 64-component Gaussian Mixture Model was used for modeling.
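To classify with per-accent GMMs, you score the test MFCC frames under each accent's model and pick the highest average log-likelihood. This minimal sketch assumes diagonal covariances and parameters that are already trained. EM training is omitted, and a 64-component, 13-dimensional model would simply have longer lists. The hierarchical structure of the cited method is not reproduced. Names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - mi) ** 2 / v
                      for xi, mi, v in zip(x, mean, var))


def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood; gmm is a list of (weight, mean, variance)."""
    total = 0.0
    for x in frames:
        parts = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        top = max(parts)  # log-sum-exp keeps the mixture sum from underflowing
        total += top + math.log(sum(math.exp(p - top) for p in parts))
    return total / len(frames)


def classify_accent(frames, accent_models):
    """Return the accent whose GMM gives the test frames the highest score."""
    return max(accent_models, key=lambda name: gmm_log_likelihood(frames, accent_models[name]))
```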


Results show an average relative error-rate reduction of 7.1% compared with direct accent classification.
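A relative reduction is measured against the direct-classification error rate, not in absolute percentage points. As a worked illustration, take a hypothetical direct-classification error of 20% (not a number from the study):

$$\text{relative reduction} = \frac{E_{\text{direct}} - E_{\text{hier}}}{E_{\text{direct}}} = 0.071 \;\Rightarrow\; E_{\text{hier}} = 20\% \times (1 - 0.071) = 18.58\%$$

In this example, a 7.1% relative reduction is an absolute drop of 1.42 percentage points.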


Accent Classification Application


Advances in Phone Based Modeling

Conventional HMMs assume that the feature sequence is produced by a piecewise stationary process.

Hidden Markov modeling assumes that adjacent frames are acoustically uncorrelated (conditionally independent given the state).

It also implies that state-dependent duration distributions decrease exponentially.
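The exponential duration claim follows from the self-transition. A state with self-loop probability $a$ is occupied for exactly $d$ frames with probability $P(d) = a^{d-1}(1-a)$. This is a geometric distribution with mean $1/(1-a)$. A short sketch, with hypothetical function names:

```python
def state_duration_pmf(self_loop, max_frames):
    """P(d) = a**(d - 1) * (1 - a) for d = 1..max_frames, with a the self-loop probability."""
    if not 0.0 <= self_loop < 1.0:
        raise ValueError("self-loop probability must be in [0, 1)")
    return [self_loop ** (d - 1) * (1.0 - self_loop) for d in range(1, max_frames + 1)]


def expected_duration(self_loop):
    """Mean of the geometric duration distribution, 1 / (1 - a) frames."""
    return 1.0 / (1.0 - self_loop)


print(state_duration_pmf(0.5, 4))  # [0.5, 0.25, 0.125, 0.0625]
```

For example, a = 0.9 gives an expected stay of about 10 frames, yet a one-frame stay remains the single most likely duration. This mismatch with real phone durations is one motivation for segment-based models.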


Why Phone Based Modeling?

Capturing the temporal variation of the acoustic signal is an important aspect of speech recognition.

A better framework for modeling the evolution of the spectral dynamics of speech.

Flexibility and power due to whole-segment classification, in contrast to frame-by-frame classification.


Trajectories of the phoneme sequence /aa/-/r/ from the word 'Target'


Stochastic Trajectory Model

An STM represents the acoustic observations of a phoneme as clusters of trajectories in a parametric space.

Let X be a sequence of N points,

$$X = (x_0, x_1, \ldots, x_{N-1}),$$

where each point $x_i$ is a D-dimensional vector. X is obtained by resampling a sequence of d frames along the linear time scale.
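Below is a minimal sketch of resampling a d-frame segment to N points along the linear time scale. It assumes linear interpolation between neighboring frames, since the slides do not say which interpolation is used. `resample_trajectory` is a hypothetical name.

```python
def resample_trajectory(frames, n_points):
    """Resample d frames (each a list of D floats) to n_points by linear interpolation."""
    d = len(frames)
    if d == 0 or n_points < 1:
        raise ValueError("need at least one frame and one output point")
    out = []
    for i in range(n_points):
        # Fractional position of output point i on the original frame axis.
        pos = i * (d - 1) / (n_points - 1) if n_points > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, d - 1)
        frac = pos - lo
        out.append([(1.0 - frac) * a + frac * b for a, b in zip(frames[lo], frames[hi])])
    return out


print(resample_trajectory([[0.0, 0.0], [2.0, 4.0], [4.0, 8.0]], 5))
# [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
```

Because every segment ends up with the same N points regardless of its duration d, trajectories of different lengths can be compared point by point.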


Stochastic Trajectory Model

The resampled N-frame vector X is considered to be the underlying trajectory of the original X with d frames. The pdf of a segment X, given a duration d and the segment symbol s, is:

$$p(X \mid d, s) = \sum_{t_k \in T_s} p(X \mid t_k, d, s) \Pr(t_k \mid s)$$

The terms are defined as follows:
- $T_s$ is the set of all trajectory components associated with s.
- $\Pr(t_k \mid s)$ is the probability of observing trajectory $t_k$ given that the segment is s, with the constraint

$$\sum_{t_k \in T_s} \Pr(t_k \mid s) = 1$$

- $p(X \mid t_k, d, s)$ is the pdf of the vector sequence X, given component trajectory $t_k$, duration d and symbol s.


Stochastic Trajectory Model

The distribution assigned to each of the sample points on a trajectory is a multivariate Gaussian with mean vector $m^s_{k,i}$ and covariance matrix $\Sigma^s_{k,i}$. Assuming frame-independent trajectories, the pdf is modeled as

$$p(X \mid t_k, d, s) = \prod_{i=0}^{N-1} \text{Gaussian}(x_i; m^s_{k,i}, \Sigma^s_{k,i})$$

The training algorithm performs maximum likelihood estimation of the parameters of the Gaussian distributions.
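Combining the two STM equations gives the segment log-likelihood. It is a log-sum over trajectory components, where each term adds the component weight to the per-point Gaussian log-densities. This minimal sketch rests on several assumptions:
- the covariance matrices are diagonal;
- a small variance floor applies;
- segments have already been resampled to N points and assigned to a component. The clustering of trajectories into components is not shown.

All names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - mi) ** 2 / v
                      for xi, mi, v in zip(x, mean, var))


def fit_component(segments, var_floor=1e-3):
    """ML mean and diagonal variance for each of the N points of one trajectory component.

    segments: resampled segments assigned to this component, each N points x D values.
    var_floor (an assumption) keeps variances from collapsing to zero.
    """
    count = len(segments)
    means, variances = [], []
    for i in range(len(segments[0])):
        columns = list(zip(*[seg[i] for seg in segments]))  # D columns of `count` values
        mean = [sum(c) / count for c in columns]
        var = [max(sum((v - m) ** 2 for v in c) / count, var_floor)
               for c, m in zip(columns, mean)]
        means.append(mean)
        variances.append(var)
    return means, variances


def stm_log_likelihood(X, components):
    """log p(X | d, s) for components given as (Pr(t_k | s), means, variances)."""
    terms = []
    for weight, means, variances in components:
        ll = math.log(weight)  # log Pr(t_k | s)
        for x, m, v in zip(X, means, variances):  # frame-independent product over i
            ll += log_gaussian_diag(x, m, v)
        terms.append(ll)
    top = max(terms)  # log-sum-exp over trajectory components
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

The component weights passed in should sum to 1, matching the constraint on $\Pr(t_k \mid s)$.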


Accent Classification System


Performance – Male and Female Chinese vs American-English


Further Investigation

Further study of accent classification and detection.

Study of accent from a linguistic point of view.

Experimentation and formulation of accent modeling and classification.

Combination of accent information with my previous work to achieve speaker recognition enhancement.


Questions

Thank You