Acoustic Phonetics - TUT


Upload: lekhue

Post on 19-May-2018



SGN-14006 Audio and Speech Processing

Acoustic Phonetics

Slides for this lecture are based on those created by Katariina Mahkonen for the TUT course ”Puheenkäsittelyn menetelmät” (Speech Processing Methods).

Other sources: Quatieri: Discrete-Time Speech Signal Processing – Principles and Practice.

K. Koppinen: ”Puheenkäsittelyn menetelmät”, lecture notes, TTY, http://www.cs.tut.fi/courses/SGN-4010/sgn4010.pdf

Acoustic Phonetics

•  Acoustically, a speech signal, like any sound, can be viewed as a variation in air pressure level

•  Acoustic phonetics studies the acoustic characteristics of speech and their relationship to speech production

Link: Longitudinal waves: http://www.acs.psu.edu/drussell/Demos/waves/wavemotion.html

Speech waveform/spectrum is quasi-stationary only over 5–20 ms

•  Speech is therefore processed in short frames (frame-by-frame)

•  The length of the frame/window in speech processing is usually 10–20 ms

•  Hanning/Hamming-type windows are commonly used

•  Remember how windowing works:
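The frame-by-frame processing described above can be sketched in Python with NumPy. This is a minimal illustration, not the course's reference code. The helper name `frame_signal`, the 20 ms frame length and the 10 ms hop are assumptions chosen within the range given on the slide. Trailing samples that do not fill a whole frame are simply dropped.

```python
import numpy as np

def frame_signal(x, fs, frame_ms=20.0, hop_ms=10.0):
    """Split x into overlapping frames and multiply each frame by a Hamming window.

    Returns an array of shape (num_frames, frame_len). Samples at the end that
    do not fill a complete frame are dropped (hypothetical helper).
    """
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    if len(x) < frame_len:
        return np.empty((0, frame_len))
    num_frames = 1 + (len(x) - frame_len) // hop
    window = np.hamming(frame_len)  # tapers the frame edges to reduce spectral leakage
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(num_frames)])
    return frames * window

# Usage: 1 s of a 200 Hz tone at 16 kHz -> 320-sample frames every 160 samples
fs = 16000
x = np.sin(2 * np.pi * 200 * np.arange(fs) / fs)
frames = frame_signal(x, fs)
spectra = np.abs(np.fft.rfft(frames, axis=1))  # one magnitude spectrum per frame
print(frames.shape)  # (99, 320)
```

Overlapping frames, with a hop shorter than the frame, are a common choice because the window attenuates the samples near each frame's edges.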

What are speech signals like?

•  In the time domain:

    •  Which phonemes have a lot of energy?

    •  How does the voiced/unvoiced difference appear in the signal?

    •  What do plosives look like?

•  In the frequency domain:

    •  How does the voiced nature of a phoneme appear in the frequency domain?

    •  How do you find the pitch from the speech spectrum?

    •  Which phoneme is easiest to recognize from the spectrogram?

    •  What special feature is visible in the spectrum of a nasal?

•  Software for speech signal visualization: Audacity, Wavesurfer, Praat, RTgram (Windows), Baudline (Linux)

Audacity download

Windows: RTgram download / Linux: Baudline download


Example sentence: ”He knew what taboos…” (arctic_b0510.wav)

[Figure: the example sentence with phoneme labels h, e, (k)n, ew, w(h), a, t, t, a, b, oo, s]

Modeling speech

Quatieri: Discrete-Time Speech Signal Processing – Principles and Practice

Larynx excitation (glottal signal)

Function of the vocal folds

A: the glottis is closed when swallowing
B: in voiced phonemes, the vocal cords vibrate periodically
C: when whispering, airflow passes only through the interarytenoid space
D: in glottal fricatives (/h/), the vocal folds are narrowly open
E: rest/breathing position
F: yawning

What kind of signal (and spectrum) is produced by each of these?


Electroglottograph (EGG)

→ Measures the vocal-fold contact area

Inverse filtering

What does the glottal signal look like?

•  In voiced phonemes, the vibrating vocal folds cause pressure pulses at the vibration rate, i.e. the fundamental frequency F0 (position B above)

•  In unvoiced phonemes, the narrow opening in the larynx causes turbulence in the airstream from the lungs (position C).
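The two excitation types can be mimicked crudely in a sketch. It assumes a unit-impulse train at F0 for the voiced case and white Gaussian noise for turbulence. Both are simplifications: real glottal pulses have smooth waveshapes. The function names are hypothetical. The spectrum of the impulse train shows components only at multiples of F0, which is how voicing shows up in the frequency domain.

```python
import numpy as np

def voiced_excitation(f0, fs, duration):
    """Unit impulse train with one pulse per pitch period (fs / f0 samples)."""
    n = int(duration * fs)
    e = np.zeros(n)
    idx = np.round(np.arange(0, n, fs / f0)).astype(int)  # nearest sample to each period
    e[idx[idx < n]] = 1.0
    return e

def unvoiced_excitation(fs, duration, seed=0):
    """White Gaussian noise as a stand-in for turbulent airflow."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(int(duration * fs))

fs = 8000
v = voiced_excitation(f0=100, fs=fs, duration=0.5)  # one pulse every 80 samples
u = unvoiced_excitation(fs, duration=0.5)

V = np.abs(np.fft.rfft(v))
freqs = np.fft.rfftfreq(len(v), d=1 / fs)
harmonics = freqs[V > 0.5 * V.max()]  # multiples of F0: 0, 100, 200, 300, ... Hz
print(harmonics[:4])
```

The noise excitation `u`, by contrast, has no harmonic structure: its spectrum is flat on average.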

Waveshapes of the periodic glottal signal

Spectra of the periodic glottal signal

More on this topic on the University of New South Wales (Australia) web pages:

http://www.phys.unsw.edu.au/jw/glottis-vocal-tract-voice.html

Other sources of sound energy in speech

1.  A constriction (narrow point) in the vocal tract causes turbulence in the air stream that passes through it

    •  Difference from whispering: here the turbulence occurs in the vocal tract, not in the glottis

2.  The vocal tract is closed to build up pressure, which is then released so that the air ”explodes” out (plosives)

Modeling speech – influence of the vocal tract on the speech spectrum

Quatieri: Discrete-Time Speech Signal Processing – Principles and Practice

Amazing Grace (overtone singing)

Spectrograms of overtone singing

Formants: resonances of the vocal tract

•  The most important characteristics of the vocal tract are its resonances (formants)

•  They are due to standing waves in the vibrating air column

•  Formants (F1, F2, ...) can usually be seen in the spectrum as boosted frequency regions

•  In addition to its frequency, a formant is characterized by its intensity and bandwidth

•  Different vocal tract configurations correspond to different formant frequencies → all vowels can be classified based on their formants
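A single formant with a given frequency and bandwidth can be illustrated with a two-pole digital resonator. This is a standard construction assumed here, not something defined on the slides. The poles are placed at angle 2πF/fs and radius exp(−πB/fs), so a narrower bandwidth B gives a sharper and higher spectral peak. The 700 Hz / 80 Hz values and the function names are illustrative.

```python
import numpy as np

def formant_resonator(freq, bandwidth, fs):
    """Coefficients [1, a1, a2] of A(z) for the two-pole resonator 1 / A(z).

    Poles at r * exp(+-j*theta), theta = 2*pi*freq/fs, r = exp(-pi*bandwidth/fs):
    a narrower bandwidth moves the poles closer to the unit circle.
    """
    r = np.exp(-np.pi * bandwidth / fs)
    theta = 2 * np.pi * freq / fs
    return np.array([1.0, -2.0 * r * np.cos(theta), r ** 2])

def magnitude_response(a, fs, n_points=4096):
    """Evaluate |1 / A(e^{jw})| on a frequency grid from 0 Hz to fs/2."""
    freqs = np.linspace(0.0, fs / 2, n_points)
    w = 2 * np.pi * freqs / fs
    A = sum(ak * np.exp(-1j * w * k) for k, ak in enumerate(a))  # A(e^{jw})
    return freqs, 1.0 / np.abs(A)

fs = 8000
freqs, H = magnitude_response(formant_resonator(700, 80, fs), fs)
print(freqs[np.argmax(H)])  # peak lies close to the 700 Hz formant frequency
```

Halving the bandwidth to 40 Hz raises and narrows the peak. This is the "intensity and bandwidth" trade-off mentioned in the bullets.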

[Figure: spectrum of phoneme /a/]

Standing waves

Vowels – formants

Mathematical modeling of the vocal tract

•  Calculating the vocal tract resonances from the shape of the vocal tract is analytically intractable (numerical solutions exist)

    –  A realistic model should take into account:

        •  different larynx excitations,
        •  time-varying and location-dependent changes in the vocal tract shape,
        •  nasal tract opening/closing,
        •  sound radiation from the lips,
        •  energy losses,
        •  turbulence,
        •  etc.

•  However, by studying simplified models we can gain a fair amount of understanding of speech production

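Acoustics of simple tubes

•  Resonances of the vocal tract are due to standing waves in the vibrating air column (similar to, e.g., wind instruments)

•  In a tube of uniform cross-sectional area S and length l, closed at one end (glottis) and open at the other (lips), the wavelengths λ of the standing waves are

$$\lambda_k = \frac{4l}{2k-1}, \qquad f_k = \frac{c}{\lambda_k} = \frac{(2k-1)\,c}{4l}, \qquad k = 1, 2, 3, \ldots$$

•  Substituting the typical vocal tract length (male 0.17 m, female 0.15 m) and c ≈ 340 m/s, the resonance frequencies for the male length would be 500 Hz, 1500 Hz, 2500 Hz, … (proportionally higher for the shorter female tract)

•  In the vocal tract, the cross-sectional area varies and thus the resonance frequencies vary, but as a rule of thumb there is roughly 1 resonance per 1 kHz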
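The resonance frequencies of a uniform tube can be checked numerically. The sketch assumes c = 340 m/s and the quarter-wavelength standing waves of a uniform tube closed at the glottis and open at the lips. That is, λ_k = 4l/(2k − 1), so f_k = (2k − 1)c/(4l). The function name is hypothetical.

```python
import numpy as np

def uniform_tube_resonances(length_m, num=3, c=340.0):
    """Resonance frequencies f_k = (2k - 1) * c / (4 * length), k = 1..num, in Hz."""
    k = np.arange(1, num + 1)
    return (2 * k - 1) * c / (4.0 * length_m)

print(uniform_tube_resonances(0.17))  # male: approx. 500, 1500, 2500 Hz
print(uniform_tube_resonances(0.15))  # female: approx. 567, 1700, 2833 Hz
# Consecutive resonances are c / (2 * length) apart: about 1 kHz for a 0.17 m tract
print(340.0 / (2 * 0.17))
```

The constant spacing c/(2l) is where the "roughly one resonance per kHz" rule of thumb comes from.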

Acoustics of simple tubes

•  The acoustics of a uniform tube can be solved exactly, and this helps us in what follows (notation x, l, S as on the previous slide)

•  The acoustically interesting variables are the particle velocity v(x,t) in the tube at point x and time t, and the pressure p(x,t)

•  For simplicity, we assume planar pressure waves that travel along the tube. For convenience, we use the volume velocity u(x,t) instead of the particle velocity: u(x,t) = S v(x,t)

•  The relationship between pressure and volume velocity is governed by the so-called wave equations:

$$-\frac{\partial p}{\partial x} = \frac{\rho}{S}\,\frac{\partial u}{\partial t}, \qquad -\frac{\partial u}{\partial x} = \frac{S}{\rho c^2}\,\frac{\partial p}{\partial t}$$

where ρ denotes the density of air and c is the speed of sound

→ Intuition: if the particle at point x is not moving but the pressure is higher on its right side, the pressure difference causes the particle to accelerate to the left.

→ Intuition: if the (excess) pressure at point x is zero but the particle velocity is higher on the left-hand side, particles ”pile up” at point x and the pressure increases.


Solution to the wave equation

•  It is quite easy to see that, given an arbitrary function f(y), the following is a solution to the wave equations:

$$u(x,t) = f(t - x/c), \qquad p(x,t) = \frac{\rho c}{S}\, f(t - x/c)$$

Substituting into the first wave equation gives −∂p/∂x = (ρ/S) f′(t − x/c) = (ρ/S) ∂u/∂t, and the second one checks the same way. This is simply a forward-traveling wave at speed c. That can be verified by substituting t → t+1 and x → x+c and noting that the function takes the same values (in one second the wave travels c meters).

•  Similarly, a backward-traveling wave is a solution to the wave equations.

•  Altogether, we can write the solution in generic form as

$$u(x,t) = f(t - x/c) - b(t + x/c), \qquad p(x,t) = \frac{\rho c}{S}\left[f(t - x/c) + b(t + x/c)\right]$$

where f and b are arbitrary forward- and backward-traveling waves, respectively
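The claim that any forward plus backward traveling wave solves the wave equations can be checked numerically with central finite differences. In this sketch the pulse shapes f and b and the constants ρ = 1.2 kg/m³, c = 340 m/s and S = 5 cm² are arbitrary assumptions. The check uses u = f(t − x/c) − b(t + x/c) and p = (ρc/S)[f(t − x/c) + b(t + x/c)]. It evaluates both −∂p/∂x = (ρ/S)∂u/∂t and −∂u/∂x = (S/(ρc²))∂p/∂t.

```python
import numpy as np

# Assumed illustrative constants (not from the slides)
rho = 1.2   # density of air, kg/m^3
c = 340.0   # speed of sound, m/s
S = 5e-4    # cross-sectional area of the tube, m^2

def f(y):  # arbitrary forward wave shape: a smooth pulse
    return np.exp(-((y - 0.0020) / 0.0005) ** 2)

def b(y):  # arbitrary backward wave shape: a different pulse
    return 0.5 * np.exp(-((y - 0.0025) / 0.0003) ** 2)

def u(x, t):  # volume velocity
    return f(t - x / c) - b(t + x / c)

def p(x, t):  # pressure
    return rho * c / S * (f(t - x / c) + b(t + x / c))

x = np.linspace(0.0, 0.17, 50)  # points along a 17 cm tube
t = 0.0022                       # time instant (s)
dx, dt = 1e-6, 1e-9              # step sizes for central differences

dp_dx = (p(x + dx, t) - p(x - dx, t)) / (2 * dx)
dp_dt = (p(x, t + dt) - p(x, t - dt)) / (2 * dt)
du_dx = (u(x + dx, t) - u(x - dx, t)) / (2 * dx)
du_dt = (u(x, t + dt) - u(x, t - dt)) / (2 * dt)

# Residuals of the two wave equations, relative to the size of the terms
rel1 = np.max(np.abs(-dp_dx - (rho / S) * du_dt)) / np.max(np.abs(dp_dx))
rel2 = np.max(np.abs(-du_dx - S / (rho * c**2) * dp_dt)) / np.max(np.abs(du_dx))
print(rel1, rel2)  # both tiny: only discretization and rounding error remain
```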

Modeling the vocal tract with simple tubes

•  The vocal tract is straightened and modeled using slices of constant length and uniform cross-sectional area

On the shape of the vocal tract “tube”: http://www.davidmhoward.com/voiceSoundModifiers.htm

Reflection of the pressure wave

[Figure: at the junction of tubes n and n+1, an incident wave f_n is split into a reflected wave k_n f_n and a transmitted wave (1 − k_n) f_n]

•  When two simple tubes are joined, reflections occur at the boundary

•  The reflection coefficient k_n indicates how large a part of the volume-velocity wave traveling from one tube to the next is reflected back (tube cross-sectional areas S_n and S_{n+1}):

$$k_n = \frac{S_n - S_{n+1}}{S_n + S_{n+1}}$$

Reflection of the pressure wave

•  Areas are positive, therefore −1 < k_n < 1

•  If S_{n+1} = 0 (limiting case of a closed end), then k_n = 1 and the wave is reflected back as it is

•  If S_{n+1} is large (S_{n+1} ≫ S_n), then k_n ≈ −1 and the wave is reflected in its entirety, but in opposite phase

•  If S_n = S_{n+1}, no reflection occurs
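The reflection coefficients can be computed from an area profile in one line. The sketch assumes the sign convention k_n = (S_n − S_{n+1})/(S_n + S_{n+1}). This gives k_n = 1 for S_{n+1} = 0 and k_n ≈ −1 for a large S_{n+1}, as in the bullets above. The area values and the function name are hypothetical.

```python
import numpy as np

def reflection_coefficients(areas):
    """k_n = (S_n - S_{n+1}) / (S_n + S_{n+1}) for every junction of consecutive tubes."""
    S = np.asarray(areas, dtype=float)
    return (S[:-1] - S[1:]) / (S[:-1] + S[1:])

# Hypothetical area profile (cm^2); the final area 0 models a closed end
areas = [2.0, 2.0, 1.0, 4.0, 0.0]
print(reflection_coefficients(areas))  # 0, 1/3, -0.6, 1 for the four junctions
```

With this convention, the reflected part k_n f_n and the transmitted part (1 − k_n) f_n keep both volume velocity and pressure continuous across the junction.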

Reflection of waves: www.acs.psu.edu/drussell/Demos/reflect/reflect.html


Modeling wave reflections in the z-plane – lattice structure (Kelly-Lochbaum structure)

→ Length of one model slice = (340 m/s) / f_s

•  f_n is the forward-traveling sound wave in the tube and b_n is the backward-traveling wave

•  Let's model the wave propagation and reflection using the Kelly-Lochbaum lattice structure shown in the figure on the right

•  Let's sample the behaviour of the structure so that the wave is delayed by one time unit (z^{-1}) when it travels the length of one tube section
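If the wave travels one tube section per sample, the section length follows directly from the sampling rate, as noted on the slide (length = c / f_s). Below is a small sketch assuming c = 340 m/s and a 17 cm vocal tract. The function names are hypothetical.

```python
c = 340.0  # speed of sound (m/s), as on the slide

def section_length(fs):
    """Length (m) of one tube section when a one-way trip through it takes one sample."""
    return c / fs

def num_sections(tract_length_m, fs):
    """Number of equal sections needed to cover the vocal tract at sampling rate fs."""
    return round(tract_length_m * fs / c)

print(section_length(16000))      # 0.02125 m, i.e. about 2.1 cm
print(num_sections(0.17, 16000))  # 8
print(num_sections(0.17, 8000))   # 4
```

Higher sampling rates therefore mean shorter sections and more lattice stages for the same tract length.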

Kelly-Lochbaum equations

•  Wave behaviour can be described using the following equations.

•  From the figure we obtain:

$$f_{n+1}(z) = (1 - k_n)\, z^{-1} f_n(z) - k_n\, b_{n+1}(z)$$

$$b_n(z) = k_n\, z^{-2} f_n(z) + (1 + k_n)\, z^{-1} b_{n+1}(z)$$

Kelly-Lochbaum equations

•  Let's solve f_{n+1}(z) and b_{n+1}(z) as functions of f_n(z) and b_n(z)

•  Solution in matrix form:

$$\begin{bmatrix} f_{n+1}(z) \\ b_{n+1}(z) \end{bmatrix} = \frac{1}{1+k_n}\begin{bmatrix} z^{-1} & -k_n z \\ -k_n z^{-1} & z \end{bmatrix}\begin{bmatrix} f_n(z) \\ b_n(z) \end{bmatrix} = H_n \begin{bmatrix} f_n(z) \\ b_n(z) \end{bmatrix}$$
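The matrix form can be sanity-checked numerically against the Kelly-Lochbaum junction equations at a single point z on the unit circle. The sketch assumes:

- f_{n+1}(z) = (1 − k_n)z^{−1}f_n(z) − k_n b_{n+1}(z)
- b_n(z) = k_n z^{−2} f_n(z) + (1 + k_n)z^{−1} b_{n+1}(z)
- H_n = [[z^{−1}, −k_n z], [−k_n z^{−1}, z]] / (1 + k_n)

The wave values, k_n = 0.4 and the function names are arbitrary.

```python
import numpy as np

def kl_junction(f_n, b_next, k, z):
    """Kelly-Lochbaum equations: outputs f_{n+1} and b_n from inputs f_n and b_{n+1}."""
    f_next = (1 - k) * z**-1 * f_n - k * b_next
    b_n = k * z**-2 * f_n + (1 + k) * z**-1 * b_next
    return f_next, b_n

def H_matrix(k, z):
    """H_n mapping [f_n, b_n] to [f_{n+1}, b_{n+1}]."""
    return np.array([[z**-1, -k * z],
                     [-k * z**-1, z]]) / (1 + k)

# Arbitrary frequency on the unit circle and arbitrary complex wave values
z = np.exp(1j * 0.3)
k = 0.4
f_n, b_next = 1.0 + 0.5j, -0.2 + 0.7j

f_next, b_n = kl_junction(f_n, b_next, k, z)
lhs = np.array([f_next, b_next])            # [f_{n+1}, b_{n+1}]
rhs = H_matrix(k, z) @ np.array([f_n, b_n])  # H_n [f_n, b_n]
print(np.allclose(lhs, rhs))  # True
```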

Vocal tract model using the lattice structure

•  We obtain a discrete-time representation of the tube-segment model of the vocal tract by concatenating lattice elements


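Lattice structure

•  The model has been found to work sufficiently well also with the simplified assumption b_0 = 0 and b_N = 0

•  The transfer function of the entire lattice structure is obtained as:

$$\begin{bmatrix} f_N(z) \\ b_N(z) \end{bmatrix} = H_{N-1}\begin{bmatrix} f_{N-1}(z) \\ b_{N-1}(z) \end{bmatrix} = H_{N-1} H_{N-2}\begin{bmatrix} f_{N-2}(z) \\ b_{N-2}(z) \end{bmatrix} = \cdots = H_{N-1} H_{N-2} \cdots H_0 \begin{bmatrix} f_0(z) \\ b_0(z) \end{bmatrix}$$

•  So:

$$\begin{bmatrix} S(z) \\ 0 \end{bmatrix} = \begin{bmatrix} f_N(z) \\ b_N(z) \end{bmatrix} = H_{N-1} H_{N-2} \cdots H_0 \begin{bmatrix} f_0(z) \\ b_0(z) \end{bmatrix} = \mathbf{H}\begin{bmatrix} G(z) \\ 0 \end{bmatrix}$$

where S(z) denotes the output at the lips and G(z) the glottal excitation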

The filter used to model the vocal tract

•  We observe that the filter H(z) is of all-pole type

•  Thus the vocal tract is modeled using the above-described all-pole structure, also known as the autoregressive (AR) model, that is, a system whose transfer function is of the form

$$H(z) = \frac{1}{A(z)}$$

•  The next lecture will discuss a mathematical technique called LINEAR PREDICTION (LP) that allows us to determine the coefficients of the function A(z). In practice, linear prediction can be computed using the (fast) Levinson-Durbin algorithm.
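The all-pole model H(z) = 1/A(z) can be tried out with a minimal source-filter sketch. A 100 Hz impulse train, a crude stand-in for the glottal excitation, is filtered through 1/A(z). Here A(z) is assumed to be a product of hypothetical two-pole formant resonators, not coefficients obtained by linear prediction. The formant values and function names are illustrative.

```python
import numpy as np

def all_pole_filter(a, e):
    """Filter e through H(z) = 1 / A(z), A(z) = a[0] + a[1] z^-1 + ... + a[p] z^-p.

    Difference equation: s[n] = e[n] - a[1] s[n-1] - ... - a[p] s[n-p], with a[0] = 1.
    """
    a = np.asarray(a, dtype=float)
    if a[0] != 1.0:
        raise ValueError("A(z) must be normalized so that a[0] == 1")
    s = np.zeros(len(e))
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, min(len(a), n + 1)):  # only past outputs that exist
            acc -= a[k] * s[n - k]
        s[n] = acc
    return s

def resonator(freq, bw, fs):
    """Two-pole resonator polynomial [1, a1, a2] (poles at radius exp(-pi*bw/fs))."""
    r = np.exp(-np.pi * bw / fs)
    return np.array([1.0, -2.0 * r * np.cos(2 * np.pi * freq / fs), r ** 2])

fs = 8000
A = np.array([1.0])
for F, B in [(700, 80), (1200, 90), (2500, 120)]:  # hypothetical formants / bandwidths
    A = np.convolve(A, resonator(F, B, fs))        # cascade = product of polynomials

e = np.zeros(int(0.1 * fs))  # 0.1 s of voiced excitation
e[::80] = 1.0                # impulse train at F0 = fs / 80 = 100 Hz
s = all_pole_filter(A, e)    # synthetic vowel-like signal
```

In practice the explicit loop would typically be replaced by a library routine such as SciPy's `scipy.signal.lfilter([1.0], A, e)`, which implements the same difference equation.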

But before that…

Some vocal talent: http://www.youtube.com/watch?v=ZxcnloCzxq4