  • 1. Analysis, Modelling and Synthesis of British, Australian and American Accents. Qin Yan, Saeed Vaseghi. Multimedia Communication Signal Processing Lab, Department of Electronic and Computer Engineering, Brunel University. Supported by EPSRC.

2. Content

  • 1- Introduction to Phonetics and Acoustics of Accents
  • 2- Research Issues in Modelling Acoustics of Accents of English
  • 3- Current Research Problems
  • 4- Accent Analysis and Models
  • 5- Accent Morphing
  • 6- Audio Demo


  • 1.1 Background
  • Accents are acoustic manifestations of differences in pronunciation and intonation within a community of people from a national, regional or socio-economic grouping.
  • Accents are dynamic processes: they evolve over time, influenced by large-scale immigration, socio-economic changes and cultural trends.
  • Applications of accent models include:
  • - speech recognition,
  • - text to speech synthesis,
  • - voice editing,
  • - accent morphing in broadcasting and films,
  • - toys and computer games,
  • - accent coaching, education.

1. Introduction to Phonetics and Acoustics of Accents

  • The importance of an accent feature depends on its distance from that of the standard or received pronunciation and the frequency with which that feature occurs in the acoustics of speech.
  • 1.2 Basic Structure of Accents
  • Generally the structural differences between accents can be divided into two broad parts:
  • (a) Differences in phonetic transcriptions.
  • (b) Differences in acoustic correlates and intonation of accents.


  • 1.3 Phonetics of Accents
  • A dominant aspect of accents is the difference in pronunciation as transcribed by a phonetic dictionary.
  • The differences in phonetic transcription can be categorized into two classes:
  • a) Differences in the number and identity of the phonemes.
  • For example, British English, as transcribed by Cambridge University's BEEP dictionary, has five extra vowels, /ax ( ) ea ( ) ia ( i ) ua ( u ) ah ( )/, compared with American English as transcribed by Carnegie Mellon University's CMU dictionary. / i u / are allophones of / i u /. American / / is merged with / a / compared with the British accent.
  • American transcription has three different levels of stress for vowels and diphthongs. Australian English also has distinctive vowels such as /i/ instead of /ei/ and / / for /au/.
  • b) Differences in phonetic realizations: phoneme substitution, deletion, insertion.
  • For example, JOHN is pronounced as / n/ in American but as / n/ in British and Australian English. The word SAY is pronounced as /sei/ in British and American but as /s i/ in Australian.


  • 1.4 Acoustics of Accents
  • Perceived acoustic differences between accents are due to differences, during the production of sound, in the configuration, positioning, tension and movement of the laryngeal and supra-laryngeal articulatory parameters, namely the vocal folds, vocal tract, tongue and lips.
  • Four aspects of the acoustic correlates of accents are considered essential for accent models and accent synthesis. These are:
  • (a) Formant (i.e. frequency of vocal tract resonance) correlates of accents, including:
  • (i) Formant trajectories F_kj(t), where k is the formant index and j is the phoneme index.
  • (ii) Timing and magnitude of the formant target point(s) in formant space for each phonetic unit.

  • (b) Pitch prosody correlates of accents, including:
  • (i) Pitch trajectory at various linguistic contexts and positions, e.g. pitch rise at the beginning of a voiced group or phrase, and pitch fall at the end of a phrase.
  • (ii) Pitch nucleus, i.e. the timing and magnitude of the prominent pitch event in a voiced group.
  • (c) Duration and timing correlates of accents:
  • (i) Duration of vowels and diphthongs.
  • (ii) Relative duration and timing of the two constituent vowels of diphthongs.
  • (d) Laryngeal (glottal) correlates of accents, i.e. the voice quality of speech segments in certain contexts as a function of accent.

  • 2. Research Issues in Modelling Acoustics of Accents of English
  • Definition of an accent feature set composed of formant trajectories, formant target points, pitch trajectory, power trajectory and duration.
  • Separation, normalisation or averaging out of speaker characteristics from accent characteristics. This is required for modelling the parameters of an accent.
  • Modelling formants of vowels and diphthongs. A diphthong is composed of two connected elementary sounds.
  • Modelling the duration of vowels and diphthongs and the relative duration of the two halves of diphthongs.
  • Modelling pitch trajectory in different phonetic/linguistic positions and contexts.
  • Modelling voice quality correlates of an accent in different phonetic/linguistic positions and contexts.
  • Integration of all accent features within a coherent generative model.

Accent Profile (AP). Each entry gives: parameter, comments, rank.

  • Phonetic Parameters
  • - Substitution, insertion, deletion: Pronunciation differences obtained from phonetic transcription dictionaries. Rank: *****
  • Supra-laryngeal and Laryngeal Correlates
  • - Formants & their trajectories: 2nd formant with largest variance is most sensitive to accent. Rank: ****
  • - Glottal pulse (Voice Quality): Durations and shapes of opening and closing of glottal folds. Rank: **
  • Prosody Correlates
  • - F0 mean: Average of pitch. Rank: *
  • - F0 range: Range of pitch. Rank: *
  • - Pitch Nucleus: Prominent point (stressed) within an intonation group (Tone Unit). Rank: ***
  • - Initial Pitch Rise: First pitch slope of a narrative utterance. Rank: ***
  • - Final Pitch Lowering: Final fall pitch slope of a narrative utterance. Rank: ***
  • - Final Pitch Rise: Final rise pitch slope of a narrative utterance. Rank: ***
  • Timing and Delivery Correlates
  • - Speaking Rate: Phonemes or words per second. Rank: *
  • - Phoneme Duration: Vowel duration elongation and complete pronunciation all affect. Rank: ***
  • - Excessive Co-articulation: Clipped or short duration sounds. Rank: ****

Speech Accent Feature Analysis Method

  • The basic processes involved in accent analysis include:
  • Speech phonetic labelling and boundary segmentation using HMMs
  • Pitch trajectory and pitch nucleus estimation
  • Formant models and formant track estimation
  • Duration and power trajectory analysis
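As a rough illustration of the pitch analysis step listed above, the following is a minimal sketch that estimates F0 for a single voiced frame from the peak of its autocorrelation, and then derives the F0 mean and F0 range from a contour. The autocorrelation method, the function names and the search limits `fmin` and `fmax` are assumptions chosen for illustration; the slides do not say which pitch tracker was actually used.

```python
import numpy as np


def estimate_f0_autocorr(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate F0 (Hz) of one voiced frame from its autocorrelation peak.

    Illustrative stand-in only; fmin and fmax are assumed search limits.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    # One-sided autocorrelation: element 0 corresponds to lag 0.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(ac) - 1)
    best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs / best_lag


def f0_mean_and_range(f0_contour):
    """Return (mean, max - min) of a sequence of voiced-frame F0 values."""
    f0 = np.asarray(f0_contour, dtype=float)
    return f0.mean(), f0.max() - f0.min()


# Synthetic check: a 40 ms frame of a 200 Hz sinusoid sampled at 16 kHz.
fs = 16000
t = np.arange(640) / fs
frame = np.sin(2 * np.pi * 200.0 * t)
f0 = estimate_f0_autocorr(frame, fs)
```

A real tracker would also need voicing detection and smoothing across frames, which this sketch leaves out.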

Figure: Block diagram illustrating the processes involved in accent analysis. Blocks: Input Speech; HMM Training; Labelling & Segmentation; Formants & Trajectories; Pitch Contour Tracker; Pitch Marker; Tone Nucleus Features; F0 Range/Mean; Pitch Accents; Speaking Rate & Durations; Accent Profile.

Analysis of Duration Correlate of AU, US and UK Accent Speech

Figure: Comparison of speaking rates of British, Australian and American accents.

Figure: Comparison of phoneme durations of British, Australian and American accents. Phonemes shown: aa ae ah ao aw ay eh er ey ih iy ow oy uh uw. Vertical axis: duration (sec). Series: Australian, British, American.

Table: Comparison of speaking rates (number/sec) of British, American and Australian accents.

Accent        Phone    Word
British       12.1     3.64
American      11.6     3.1
Australian    10.8     2.8

  • Australian speaking (word) rate is 23% slower than British.
  • American speaking (word) rate is 15% slower than British.

Table: Word error rate (%) of speech recognition across British, American and Australian accents.

Input         British Model    American Model    Australian Model
British       12.8             29.3              34.9
American      30.6             8.8               29.94
Australian    33.1             27.3              7.28

  • There is an apparent correlation between automatic speech recognition accuracy and speaking rate.
  • Australian, with the slowest speaking rate, obtains the best recognition results, followed by American and British.
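The percentages quoted for the word rates can be checked directly from the speaking-rate figures (British 3.64, American 3.1, Australian 2.8 words per second). The short sketch below does that arithmetic; the helper name `percent_slower` is hypothetical.

```python
# Word rates (words/sec) as given in the speaking-rate table.
word_rate = {"British": 3.64, "American": 3.1, "Australian": 2.8}


def percent_slower(rate, reference):
    """Relative slowdown of `rate` with respect to `reference`, in percent."""
    return 100.0 * (reference - rate) / reference


slowdown = {
    accent: percent_slower(rate, word_rate["British"])
    for accent, rate in word_rate.items()
}
```

Rounded to whole percentages, the Australian and American values come out at 23 and 15, in agreement with the figures on the slide.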

Formant Estimation with 2D-HMM

Figure: LP-based formant-candidate feature extraction method. Blocks: Speech; Segmentation & window; LPC Model; Polynomial roots; Frequency, Bandwidth, Intensity Calculation; Formant candidate feature vector.

  • Formant feature extraction consists of three main functions:
  • (1) an LP model,
  • (2) a polynomial root finder, and
  • (3) a contour trend estimator.
  • Consider the z-transfer function of an LP model with K real poles, I complex pole pairs and a gain factor G,
  • where A_k is the pole radius, F_i the pole frequency and F_s the sampling frequency.
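The equation itself did not survive in this transcript. One standard way to write an all-pole transfer function that matches the definitions above is sketched below. This is an assumed reconstruction, not the authors' exact notation: r_k is a hypothetical symbol for the real poles, and the radius of each complex pole pair is indexed as A_i so that it pairs with its frequency F_i.

```latex
H(z) = \frac{G}{\displaystyle \prod_{k=1}^{K}\left(1 - r_k z^{-1}\right)
       \prod_{i=1}^{I}\left(1 - 2A_i\cos\!\left(\frac{2\pi F_i}{F_s}\right)z^{-1} + A_i^{2}z^{-2}\right)}
```

Each second-order factor is the product of a conjugate pole pair at radius A_i and angle 2πF_i/F_s.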


Figure: Illustration of the LP spectrum and the modelling of 6 complex pole pairs of a speech segment with an HMM composed of 4 formant-states. Axes: frequency (Hz) and time (s).
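The complex pole pairs mentioned in the caption can be turned into formant candidates by rooting the LP polynomial. The following is a minimal sketch, assuming the usual mapping from a pole at radius r and angle θ to a frequency θ·F_s/(2π) and an approximate bandwidth −(F_s/π)·ln r. The function names are hypothetical, and the LP coefficients in the example are synthesised from known poles rather than estimated from speech.

```python
import numpy as np


def lp_from_poles(radii, freqs_hz, fs):
    """Build LP denominator coefficients [1, a1, ..., aP] from complex pole pairs."""
    poles = [r * np.exp(1j * 2 * np.pi * f / fs) for r, f in zip(radii, freqs_hz)]
    poles = poles + [np.conj(p) for p in poles]
    return np.real(np.poly(poles))


def formant_candidates(lp_coeffs, fs):
    """Return (frequencies, bandwidths) in Hz, sorted by frequency.

    lp_coeffs is the denominator of the LP model in powers of z^-1.
    Only one pole of each conjugate pair is kept; real poles are dropped.
    """
    roots = np.roots(lp_coeffs)
    roots = roots[np.imag(roots) > 1e-9]
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi
    order = np.argsort(freqs)
    return freqs[order], bandwidths[order]


# Example with two assumed pole pairs at 500 Hz and 1500 Hz, fs = 8 kHz.
fs = 8000
coeffs = lp_from_poles([0.98, 0.95], [500.0, 1500.0], fs)
freqs, bandwidths = formant_candidates(coeffs, fs)
```

With a 12th-order LP model this procedure yields up to 6 complex pole pairs per frame, which is the number of candidates shown in the figure.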

  • 2D HMMs span time and frequency dimensions
  • Left-right HMM states across frequency model formants such that the first state models the first formant, the second state the second formant, and so on.
  • The distribution of formants in each state is modelled by a mixture Gaussian density.
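To make the frequency-dimension idea concrete, here is a simplified sketch of how sorted formant candidates could be aligned to left-right formant states with a Viterbi-style search. It is not the authors' 2D-HMM: it uses a single Gaussian per state instead of a mixture, ignores transition probabilities and the time dimension, and the state means and standard deviations in the example are made-up illustrative values.

```python
import numpy as np


def gaussian_loglik(x, mean, std):
    """Log-density of a univariate Gaussian."""
    return -0.5 * np.log(2 * np.pi * std ** 2) - 0.5 * ((x - mean) / std) ** 2


def align_to_formant_states(candidates_hz, means_hz, stds_hz):
    """Assign sorted candidates to left-right states (stay or advance by one).

    Every state receives at least one candidate. Returns (sorted candidates, states).
    """
    c = np.sort(np.asarray(candidates_hz, dtype=float))
    n_cand, n_states = len(c), len(means_hz)
    if n_cand < n_states:
        raise ValueError("need at least one candidate per state")
    emit = np.array([[gaussian_loglik(x, m, s) for m, s in zip(means_hz, stds_hz)]
                     for x in c])
    score = np.full((n_cand, n_states), -np.inf)
    back = np.zeros((n_cand, n_states), dtype=int)
    score[0, 0] = emit[0, 0]  # the lowest candidate must start in the first state
    for t in range(1, n_cand):
        for s in range(n_states):
            stay = score[t - 1, s]
            move = score[t - 1, s - 1] if s > 0 else -np.inf
            if move > stay:
                score[t, s], back[t, s] = move + emit[t, s], s - 1
            else:
                score[t, s], back[t, s] = stay + emit[t, s], s
    # Backtrack from the last state, which the highest candidate must occupy.
    states = [n_states - 1]
    for t in range(n_cand - 1, 0, -1):
        states.append(int(back[t, states[-1]]))
    return c, np.array(states[::-1])


# Six candidates aligned to four formant states (all values are assumptions).
cands, states = align_to_formant_states(
    [300.0, 520.0, 1450.0, 2500.0, 2650.0, 3500.0],
    means_hz=[500.0, 1500.0, 2500.0, 3500.0],
    stds_hz=[150.0, 300.0, 300.0, 400.0],
)
```

The monotonic stay-or-advance constraint is what makes the states left-right across frequency: a higher candidate can never be assigned to a lower formant state than the one before it.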

Figure: Comparison of histograms (thin solid line) and Gaussian HMMs (bold dashed line) of formants of Australian English. X axis: frequency (Hz); Y axis: probability. The figures show that HMMs are excellent models of the distribution of the formants.

Comparison of Formants S