Transcript
Page 1: Acoustic Phonetics - TUT

SGN-14006 Audio and Speech Processing

Acoustic Phonetics

Slides for this lecture are based on those created by Katariina Mahkonen for the TUT course ”Puheenkäsittelyn menetelmät”.

Other sources: Quatieri: Discrete-Time Speech Signal Processing – Principles and Practice.

K. Koppinen: ”Puheenkäsittelyn menetelmät”, lecture notes, TTY, http://www.cs.tut.fi/courses/SGN-4010/sgn4010.pdf

Acoustic Phonetics

•  Acoustically, a speech signal, like any sound, can be viewed as a variation in air pressure level

•  Acoustic phonetics studies the acoustic characteristics of speech and their relationship to speech production

Link: Longitudinal waves: http://www.acs.psu.edu/drussell/Demos/waves/wavemotion.html

Speech waveform/spectrum is quasi-stationary only for 5-20 ms

•  Speech is processed in short frames (frame-by-frame)

•  The length of the frame/window in speech processing is usually 10-20 ms

•  Hanning/Hamming-type windows are commonly used

•  Remember how windowing works:
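The frame-by-frame processing described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the course's own code: the helper name `frame_signal`, the 16 kHz sampling rate, the 20 ms frame with a 10 ms hop, and the sine wave used in place of real speech are all assumptions made for the example.

```python
import numpy as np

def frame_signal(x, fs, frame_ms=20.0, hop_ms=10.0):
    """Split x into overlapping frames and apply a Hamming window to each."""
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop_len = int(round(fs * hop_ms / 1000.0))
    n_frames = 1 + (len(x) - frame_len) // hop_len
    window = np.hamming(frame_len)
    frames = np.stack([
        x[i * hop_len : i * hop_len + frame_len] * window
        for i in range(n_frames)
    ])
    return frames

fs = 16000                                      # assumed sampling rate in Hz
t = np.arange(fs) / fs                          # one second of samples
x = np.sin(2 * np.pi * 200 * t)                 # stand-in for a speech signal
frames = frame_signal(x, fs)                    # shape: (number of frames, frame length)
spectra = np.abs(np.fft.rfft(frames, axis=1))   # one short-time magnitude spectrum per frame
```

Each row of `spectra` is the spectrum of one windowed frame; stacking these rows over time is what a spectrogram displays.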

What are speech signals like?

•  In the time domain

•  Which phonemes have a lot of energy?

•  How does the voiced/unvoiced difference appear in the signal?

•  What do plosives look like?

•  In the frequency domain

•  How does the voiced nature of a phoneme appear in the f-domain?

•  How do you find the pitch from the speech spectrum?

•  Which phoneme is easiest to recognize from the spectrogram?

•  What special feature is visible in the spectrum of a nasal?

•  Software for speech signal visualization: Audacity, Wavesurfer, Praat, RTgram (Windows), Baudline (Linux)



Example sentence: ”He knew what taboos…” (arctic_b0510.wav)

[Figure: the example sentence with segment labels h, e, (k)n, ew, w(h), a, t, t, a, b, oo, s]

Modeling speech

Quatieri: Discrete-Time Speech Signal Processing - Principles and Practice

Larynx excitation (glottis signal)

Function of the vocal folds

A  The glottis is closed when swallowing

B  In voiced phonemes, the vocal cords vibrate periodically

C  When whispering, airflow passes only through the interarytenoid space

D  In glottal fricatives (/h/), the vocal folds are narrowly open

E  Rest/breathing position

F  Yawning

What kind of signal (and spectrum) is produced from these?


Electroglottograph (EGG)

→ Measures vocal fold contact area

Inverse filtering

What does the glottal signal look like?

•  In voiced phonemes, the vibrating vocal folds cause pressure pulses at the vibration rate (the fundamental frequency F0) (position B above)

•  In unvoiced phonemes, the narrow opening in the larynx causes turbulence in the airstream from the lungs (position C).
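The difference between the two excitation types can be illustrated with a crude sketch. The code below is only an assumption-laden stand-in: a unit pulse train replaces real glottal pulses, white noise replaces turbulence, and the values `fs = 8000` and `f0 = 100` are chosen so that the pitch period is a whole number of samples.

```python
import numpy as np

fs = 8000          # assumed sampling rate in Hz
f0 = 100           # assumed fundamental frequency in Hz
n = fs             # one second of signal, so the DFT bins are 1 Hz apart

# Voiced excitation: one pulse per pitch period (crude stand-in for glottal pulses)
voiced = np.zeros(n)
voiced[:: fs // f0] = 1.0

# Unvoiced excitation: white noise (crude stand-in for turbulence)
rng = np.random.default_rng(0)
unvoiced = rng.standard_normal(n)

freqs = np.fft.rfftfreq(n, d=1 / fs)
voiced_mag = np.abs(np.fft.rfft(voiced))
unvoiced_mag = np.abs(np.fft.rfft(unvoiced))

# Frequencies where the periodic excitation carries energy: multiples of F0
harmonic_freqs = freqs[voiced_mag > 0.5 * voiced_mag.max()]
```

In this idealized case the voiced spectrum is non-zero only at multiples of F0, so the spacing between neighbouring harmonics gives the pitch, while the noise spectrum has no such line structure.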

Waveshapes of periodic glottal signal


Spectra of periodic glottal signal

More on this topic on the pages of the University of New South Wales (Australia):

http://www.phys.unsw.edu.au/jw/glottis-vocal-tract-voice.html

Other sources of sound energy in speech

1.  A constriction (narrow point) in the vocal tract causes turbulence in the air stream that passes through it

•  Difference from whispering: here the turbulence occurs in the vocal tract and not in the glottis

2.  The vocal tract is closed to build up pressure, which is then released so that the air ”explodes” out (plosives)

Modeling speech – influence of vocal tract on speech spectrum

Quatieri: Discrete-Time Speech Signal Processing - Principles and Practice

Amazing Grace (overtone singing)

Spectrograms of overtone singing

Formants: resonances of vocal tract

•  The most important characteristics of the vocal tract are its resonances (formants)

•  They are due to standing waves in the vibrating air column

•  Formants (F1, F2, ...) can usually be seen in the spectrum as boosted frequency regions

•  In addition to its frequency, a formant is characterized by its intensity and bandwidth

•  Different vocal tract configurations correspond to different formant frequencies → all vowels can be classified based on formants

[Figure: spectrum of phoneme /a/]

Standing waves


Vowels - formants

Mathematical modeling of the vocal tract

•  Calculating the vocal tract resonances from the shape of the vocal tract is analytically intractable (numerical solutions exist)

-  The calculation should take into account

•  Different larynx excitations,

•  Time-varying and location-dependent changes in the vocal tract shape,

•  Nasal tract opening/closing,

•  Sound radiation from the lips,

•  Energy losses,

•  Turbulence,

•  etc.

•  However, by studying simplified models we can gain a fair amount of understanding of speech production

Acoustics of simple tubes

•  Resonances of the vocal tract are due to standing waves in the vibrating air column (similar to e.g. wind instruments)

•  In a tube of uniform cross-sectional area and length l that is closed at one end (glottis) and open at the other (lips), the wavelengths λ of the standing waves are

$$\lambda_k = \frac{4l}{2k-1}, \qquad F_k = \frac{c}{\lambda_k} = \frac{(2k-1)\,c}{4l}, \qquad k = 1, 2, 3, \ldots$$

•  Substituting a typical vocal tract length (male 0.17 m, female 0.15 m) and c = 340 m/s, the resonance frequencies for l = 0.17 m would be 500 Hz, 1500 Hz, 2500 Hz, …

•  In the vocal tract, the cross-sectional area varies and thus the resonance frequencies vary, but as a rule of thumb, there is roughly 1 resonance per 1 kHz
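The resonance figures quoted above can be reproduced with a short calculation. The sketch assumes the standard quarter-wavelength model of a uniform tube closed at one end and open at the other, F_k = (2k - 1) c / (4 l), with c = 340 m/s; the function name `tube_resonances` is made up for this example.

```python
def tube_resonances(length_m, n_resonances=3, c=340.0):
    """Resonances F_k = (2k - 1) c / (4 l) of a uniform tube closed at one end."""
    return [(2 * k - 1) * c / (4.0 * length_m) for k in range(1, n_resonances + 1)]

male = tube_resonances(0.17)     # close to 500, 1500, 2500 Hz
female = tube_resonances(0.15)   # somewhat higher, since the tube is shorter
```

For l = 0.17 m the neighbouring resonances are 1000 Hz apart, which matches the rule of thumb of roughly one resonance per kHz.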

Acoustics of simple tubes

•  The acoustics of a uniform tube can be solved exactly, which helps us in the following (notation x, l, S on the previous slide)

•  The acoustically interesting variables are the particle velocity v(x,t) in the tube at point x and time t, and the pressure p(x,t)

•  For simplicity, we assume planar pressure waves that travel along the tube. For convenience we use the volume velocity u(x,t) instead of the particle velocity: u(x,t) = S v(x,t)

•  The relationship between pressure and volume velocity is governed by the so-called wave equations:

$$-\frac{\partial p}{\partial x} = \frac{\rho}{S}\,\frac{\partial u}{\partial t}$$

$$-\frac{\partial u}{\partial x} = \frac{S}{\rho c^{2}}\,\frac{\partial p}{\partial t}$$

where ρ denotes the density of air and c is the speed of sound

→ Intuition: if the particle at point x is not moving but the pressure is higher on its right side, the pressure difference causes the particle to accelerate to the left.

→ Intuition: if the pressure at point x is zero but the particle velocity is higher on the left-hand side than on the right-hand side, particles ”pile up” at point x and the pressure increases.


Solution to the wave equation

•  It is quite easy to see that, given an arbitrary function f(y), the following is a solution to the wave equations:

     u(x,t) = f(t – x/c)
     p(x,t) = (ρc/S) f(t – x/c)

which is simply a forward-traveling wave at speed c. This can be verified by substituting t → t + 1 and x → x + c and noting that the function gets the same values (the wave travels a distance c in one second).

•  Similarly, a backward-traveling wave is a solution to the wave equations.

•  Altogether, we can write the solution in generic form as

     u(x,t) = f(t – x/c) – b(t + x/c)
     p(x,t) = (ρc/S) [ f(t – x/c) + b(t + x/c) ]

where f and b are arbitrary forward- and backward-traveling waves, respectively
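The claim that these waves solve the wave equations can be checked by direct substitution. The short derivation below takes the wave equations as −∂p/∂x = (ρ/S) ∂u/∂t and −∂u/∂x = (S/(ρc²)) ∂p/∂t, reads ρ as the density of air, and assumes f and b are differentiable; f′ and b′ denote their derivatives.

```latex
% Forward wave: u = f(t - x/c), p = (\rho c / S) f(t - x/c)
\begin{align*}
-\frac{\partial p}{\partial x}
  &= \frac{\rho c}{S}\cdot\frac{1}{c}\,f'(t - x/c)
   = \frac{\rho}{S}\,f'(t - x/c)
   = \frac{\rho}{S}\,\frac{\partial u}{\partial t},\\
-\frac{\partial u}{\partial x}
  &= \frac{1}{c}\,f'(t - x/c)
   = \frac{S}{\rho c^{2}}\cdot\frac{\rho c}{S}\,f'(t - x/c)
   = \frac{S}{\rho c^{2}}\,\frac{\partial p}{\partial t}.
\end{align*}
% Backward wave: u = -b(t + x/c), p = (\rho c / S) b(t + x/c)
\begin{align*}
-\frac{\partial p}{\partial x}
  &= -\frac{\rho}{S}\,b'(t + x/c)
   = \frac{\rho}{S}\,\frac{\partial u}{\partial t},\\
-\frac{\partial u}{\partial x}
  &= \frac{1}{c}\,b'(t + x/c)
   = \frac{S}{\rho c^{2}}\,\frac{\partial p}{\partial t}.
\end{align*}
```

Because both equations are linear, the sum of the forward and backward solutions is a solution as well, which gives the generic form.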

Modeling the vocal tract with simple tubes

•  The vocal tract is straightened and modeled using slices of constant length and uniform cross-sectional area

On the shape of the vocal tract “tube”: http://www.davidmhoward.com/voiceSoundModifiers.htm

Reflection of the pressure wave

[Figure: at the junction between tubes n and n+1, the incoming wave f_n splits into a reflected part k_n f_n and a transmitted part (1 − k_n) f_n]

•  When two simple tubes are joined, reflections occur at the boundary

•  The reflection coefficient k_n indicates how large a part of the volume-velocity wave traveling from one tube to the next is reflected back (tube cross-sectional areas S_n and S_{n+1}):

$$k_n = \frac{S_n - S_{n+1}}{S_n + S_{n+1}}$$

Reflection of the pressure wave

•  Areas are positive, therefore −1 < k_n < 1

•  If S_{n+1} = 0, then k_n = 1, and the wave is reflected back as it is

•  If S_{n+1} is large, k_n ≈ −1, and the wave is reflected in its entirety, but in opposite phase

•  If S_n = S_{n+1}, no reflection occurs
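The special cases listed above follow directly from the definition k_n = (S_n − S_{n+1}) / (S_n + S_{n+1}), as the following small sketch shows. The function name `reflection_coefficient` and the area profile `areas` are hypothetical and serve only as illustration; the profile is not measured data.

```python
def reflection_coefficient(s_n, s_next):
    """k_n = (S_n - S_{n+1}) / (S_n + S_{n+1}) for areas given in a common unit."""
    return (s_n - s_next) / (s_n + s_next)

closed = reflection_coefficient(1.0, 0.0)   # next tube closed: full reflection
wide = reflection_coefficient(1.0, 1e6)     # next tube very wide: close to -1
same = reflection_coefficient(2.0, 2.0)     # equal areas: no reflection

# Hypothetical area profile (cm^2) from glottis to lips, for illustration only
areas = [2.6, 8.0, 10.5, 10.5, 8.0, 4.8, 1.0, 2.0]
ks = [reflection_coefficient(a, b) for a, b in zip(areas, areas[1:])]
```

A tube model with N sections has N − 1 internal junctions, so the eight areas above give seven reflection coefficients.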

Reflection of waves: www.acs.psu.edu/drussell/Demos/reflect/reflect.html


Modeling wave reflections in z-plane – lattice structure (Kelly-Lochbaum structure)

•  f_n is the forward-traveling sound wave in the tube and b_n is the backward-traveling wave

•  Let’s model the wave propagation and reflection using the Kelly-Lochbaum lattice structure shown in the figure on the right

•  Let’s sample the behaviour of the structure so that the wave is delayed by one time unit (z^{-1}) when it travels the length of one tube section

→ length of one model slice = 340 m/s / fs
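The link between the sampling rate and the tube-section length can be made concrete with a small calculation. It assumes, as stated on the slide, a delay of one sample per section and c = 340 m/s; the helper names `slice_length` and `n_sections` and the 0.17 m tract length are assumptions for the example.

```python
def slice_length(fs_hz, c=340.0):
    """Distance sound travels during one sample period, in metres."""
    return c / fs_hz

def n_sections(tract_length_m, fs_hz, c=340.0):
    """Number of one-sample tube sections that fit into the tract (rounded)."""
    return round(tract_length_m / slice_length(fs_hz, c))

len_8k = slice_length(8000)        # 0.0425 m per section
n_8k = n_sections(0.17, 8000)      # 4 sections for a 0.17 m tract
n_16k = n_sections(0.17, 16000)    # 8 sections
```

Under these assumptions, doubling the sampling rate halves the section length and doubles the number of sections needed to cover the same tract.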

Kelly-Lochbaum equations

•  Wave behaviour can be described using the following equations.

•  From the figure we obtain:

$$f_{n+1}(z) = (1 - k_n)\,z^{-1} f_n(z) - k_n\,b_{n+1}(z)$$

$$b_n(z) = k_n\,z^{-2} f_n(z) + (1 + k_n)\,z^{-1}\,b_{n+1}(z)$$

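Kelly-Lochbaum equations

•  Let’s solve f_{n+1}(z) and b_{n+1}(z) as a function of f_n(z) and b_n(z)

•  Solution in matrix format:

$$\begin{bmatrix} f_{n+1}(z) \\ b_{n+1}(z) \end{bmatrix} = \begin{bmatrix} \dfrac{z^{-1}}{1+k_n} & \dfrac{-k_n z}{1+k_n} \\[2ex] \dfrac{-k_n z^{-1}}{1+k_n} & \dfrac{z}{1+k_n} \end{bmatrix} \begin{bmatrix} f_n(z) \\ b_n(z) \end{bmatrix}$$

$$\begin{bmatrix} f_{n+1}(z) \\ b_{n+1}(z) \end{bmatrix} = H_n \begin{bmatrix} f_n(z) \\ b_n(z) \end{bmatrix}$$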
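The matrix solution can be checked numerically against the junction equations. The sketch below assumes the two Kelly-Lochbaum equations read f_{n+1} = (1 − k_n) z^{-1} f_n − k_n b_{n+1} and b_n = k_n z^{-2} f_n + (1 + k_n) z^{-1} b_{n+1}; the function names `section_matrix` and `chain` and the test values are made up for the example.

```python
import numpy as np

def section_matrix(k, z):
    """Matrix H_n mapping (f_n, b_n) to (f_{n+1}, b_{n+1}) for reflection coefficient k."""
    return np.array([[1 / z, -k * z],
                     [-k / z, z]], dtype=complex) / (1 + k)

def chain(ks, z):
    """Product H_{N-1} ... H_1 H_0 for junction coefficients ks = [k_0, ..., k_{N-1}]."""
    total = np.eye(2, dtype=complex)
    for k in ks:
        total = section_matrix(k, z) @ total
    return total

# Check one junction against the two equations at an arbitrary point on the unit circle.
k, z = 0.3, np.exp(1j * 0.7)
f_n, b_n = 1.0 + 0.5j, -0.2 + 0.1j
f_next, b_next = section_matrix(k, z) @ np.array([f_n, b_n])
residual_f = f_next - ((1 - k) / z * f_n - k * b_next)
residual_b = b_n - (k / z**2 * f_n + (1 + k) / z * b_next)
```

Both residuals vanish up to rounding error, and with all reflection coefficients set to zero the chained matrix reduces to pure delays and advances, as expected for a uniform tube.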

Vocal tract model using the lattice structure

•  We obtain a discrete-time representation for the tube-segment model of the vocal tract by concatenating lattice elements


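Lattice structure

•  The model has been found to work sufficiently well also with the simplifying assumption b_0 = 0 and b_N = 0

•  The transfer function of the entire lattice structure is obtained as:

$$\begin{bmatrix} f_N(z) \\ b_N(z) \end{bmatrix} = H_{N-1} \begin{bmatrix} f_{N-1}(z) \\ b_{N-1}(z) \end{bmatrix} = H_{N-1} H_{N-2} \begin{bmatrix} f_{N-2}(z) \\ b_{N-2}(z) \end{bmatrix} = \ldots = H_{N-1} H_{N-2} \cdots H_0 \begin{bmatrix} f_0(z) \\ b_0(z) \end{bmatrix}$$

•  So:

$$\begin{bmatrix} S(z) \\ 0 \end{bmatrix} = \begin{bmatrix} f_N(z) \\ b_N(z) \end{bmatrix} = H_{N-1} \cdots H_0 \begin{bmatrix} f_0(z) \\ b_0(z) \end{bmatrix} = H(z) \begin{bmatrix} G(z) \\ 0 \end{bmatrix}$$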

The filter used to model the vocal tract

•  We observe that the filter H(z) is of all-pole type.

•  Thus the vocal tract is modeled using the above-described all-pole structure, also known as an autoregressive (AR) model. That is, a system whose transfer function is of the form

$$H(z) = \frac{1}{A(z)}$$

•  The next lecture will discuss a mathematical technique called LINEAR PREDICTION (LP) that allows us to determine the coefficients of the function A(z). In practice, linear prediction can be computed using the (fast) Levinson-Durbin algorithm.
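What an all-pole system H(z) = 1/A(z) does to a signal can be sketched with a direct implementation of the AR recursion. Everything specific here is an assumption for illustration: the function name `all_pole_filter`, the 8 kHz sampling rate, and an A(z) with a single conjugate pole pair placed to give one resonance near 500 Hz. A real vocal tract model would use more poles, with coefficients estimated from speech.

```python
import numpy as np

def all_pole_filter(a, x):
    """Run x through H(z) = 1 / A(z), with A(z) = a[0] + a[1] z^-1 + ... and a[0] = 1."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]   # feedback from past outputs only
        y[n] = acc
    return y

fs = 8000                                   # assumed sampling rate in Hz
f_res, r = 500.0, 0.97                      # assumed resonance frequency and pole radius
theta = 2 * np.pi * f_res / fs
a = [1.0, -2 * r * np.cos(theta), r * r]    # A(z) with one conjugate pole pair

impulse = np.zeros(2048)
impulse[0] = 1.0
h = all_pole_filter(a, impulse)             # impulse response of 1 / A(z)
freqs = np.fft.rfftfreq(len(h), d=1 / fs)
peak_hz = freqs[np.argmax(np.abs(np.fft.rfft(h)))]
```

The magnitude response peaks close to the chosen resonance frequency, which is how a pole pair of A(z) produces a formant-like boost in the spectrum.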

But before that…

Some vocal talent: http://www.youtube.com/watch?v=ZxcnloCzxq4

