Analysing voice quality - Trinity College Dublin
Analysing voice quality
John Kane
April 30, 2010
John Kane () Analysing voice quality April 30, 2010 1 / 18
Voice quality (VQ)
Mainly a consequence of the vibration of the vocal folds.
Overall timbre of a person’s voice (organic setting and dynamic shifts).
VQ not limited to pitch and loudness.
Voice quality
Laver sought to provide quantitative physiological/acoustic descriptions of VQ (1980).
VQs: breathy, whispery, creaky, harsh, falsetto, modal.
In real speech these VQs exist on continuous scales and in combination with others.
Voice quality examples
Voice quality (VQ)
Reveals information on speaker’s state and attitude.
Infants already sensitive to different VQs.
Mackenzie Beck (2005): VQ is used before linguistic content is understood.
Voice quality in speech communication
Contrastive linguistic purpose in some languages.
Gujarati “twelve” vs “outside” (breathy)
Danish “hun” vs “hund” (creaky)
Status, popular trends.
Extralinguistic, active listening (grunts etc.).
Prosodic component in neutral running speech.
Potentials of VQ/glottal source in speech technology
Improvement of naturalness in parametric speech synthesis (Cabral 2008, Raitio 2008).
Potential for more flexible/expressive speech synthesis
Ability to aid emotion detection and paralinguistic annotation.
Difficulties measuring VQ
As listeners we are very sensitive to variation in VQ.
Difficult job for computers.
Hidden position of the vocal folds.
Vocal Folds
Robust extraction of the glottal source is a difficult job for signal processing.
Electroglottography (EGG)
Inverse filtering
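The slide names the technique without detail, so the following is only a minimal sketch of the general idea: estimate an all-pole vocal tract filter by linear prediction, then pass the speech through its inverse so that what remains approximates the source. It is a plain single-pass LPC inverse filter, not the automatic method of Alku cited later in the talk, and it ignores lip radiation. The function names and the toy signal in the usage note are assumptions made for illustration.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a real-valued sequence."""
    n = len(x)
    return [sum(x[i] * x[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns the inverse-filter coefficients [1, a1, ..., ap] of A(z).
    Assumes r[0] > 0, i.e. the analysed frame is not silent.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)                # updated prediction error
    return a


def inverse_filter(x, a):
    """Apply the FIR filter A(z) to x, giving the prediction residual."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]
```

As a sanity check, a one-pole "vocal tract" excited by a single impulse, `x = [0.9 ** n for n in range(200)]`, should give a first-order coefficient close to -0.9 and a residual that is close to the original impulse.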
Parameterisation
Time based measurements (LF model)
[Figure: LF model waveform; axes: Time (ms), Amplitude]
Adv: Related to physiology
DisAdv: Sensitive to noise and phase.
Frequency domain measurements
Adv: Avoids phase issues
DisAdv: Existing parameters strongly correlated.
Our parameterisation system
[Block diagram: recorded speech; automatic inverse filtering (Alku 2002); codebook search for initial values; two-part optimisation; model parameters]
[Figure panels: speech waveform, Time (ms); voice source waveform, Time (ms); spectral optimisation showing the voice source spectrum and the fitted model, Amplitude (dB) against Frequency (Hz)]
Parameter labels: Rk, Rg, Ra, EE, F0
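The parameter labels above are the usual LF-model shorthand. The slides do not spell the definitions out, so the following assumes the conventional ones, with T_0 the fundamental period, T_p the time of peak glottal flow, T_e the time of the main excitation, and T_a the effective duration of the return phase:

```latex
\begin{align}
F_0 &= \frac{1}{T_0}, &
R_g &= \frac{T_0}{2\,T_p}, &
R_k &= \frac{T_e - T_p}{T_p}, &
R_a &= \frac{T_a}{T_0}
\end{align}
```

Under the same convention, EE is the excitation strength, i.e. the magnitude of the negative peak of the glottal flow derivative at T_e.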
Our Interspeech/Speech Communication submission
Description of our frequency domain parameterisation approach.
Finnish vowels: /A e i o u y æ ø/, 11 speakers
BREATHY
MODAL
PRESSED
Evaluation
Robustness against simulations of difficult conditions.
Relative change: sensitivity of parameters
Coefficient of variation: pulse-to-pulse variation
Test conditions:
Clean signal
Signal with additive noise (SNR = 45 dB)
Signal with additive noise (SNR = 30 dB)
Signal with recording system distortion
Ability to discriminate voice qualities.
Explained variance: regression analysis.
Classification: linear discriminant analysis.
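The robustness measures above can be sketched in a few lines. This is a hedged illustration rather than the evaluation code behind the talk. It assumes that relative change is the absolute difference between a parameter estimated from the degraded and the clean signal, as a percentage of the clean value, and that the coefficient of variation is the sample standard deviation over the mean of the per-pulse estimates. The function names are hypothetical, and white Gaussian noise is an assumption about the noise type.

```python
import math
import random
import statistics


def add_noise_at_snr(signal, snr_db, rng):
    """Return signal plus white Gaussian noise scaled to the requested SNR (dB)."""
    signal_power = sum(s * s for s in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(v * v for v in noise) / len(noise)
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * v for s, v in zip(signal, noise)]


def relative_change(clean_value, degraded_value):
    """Absolute change of a parameter estimate, in percent of the clean value."""
    return 100.0 * abs(degraded_value - clean_value) / abs(clean_value)


def coefficient_of_variation(values):
    """Pulse-to-pulse variation: sample standard deviation over mean, in percent."""
    return 100.0 * statistics.stdev(values) / abs(statistics.mean(values))
```

For instance, `add_noise_at_snr(signal, 30, random.Random(0))` would produce the 30 dB condition for a list of samples, and `relative_change` would then compare a parameter such as Ra estimated from the clean and the noisy version.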
Overall results
Clearly better robustness against distortions imposed by the recording system.
[Figure: relative change (%) by voice quality (breathy, modal, pressed), one panel each for Ra, Rk and Rg]
Overall results
Generally less sensitive to moderate levels of additive noise imposed on signals.
High noise levels at times affected robustness.
Overall results
Clearly higher R² scores for individual parameters.
[Figure: R-squared values (%) for the parameters Rg, Rk and Ra; legend: new system, time system]
Table 1: Confusion matrix of classification scores (%) of the three voice qualities using the two systems.
            Spec              Time
       Bre  Neu  Pre     Bre  Neu  Pre
Bre     79   20    1      76   22    2
Neu     32   47   21      42   43   15
Pre      6   24   70       8   28   64
Higher classification scores.
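One way to put a single number on "higher classification scores" is to average the diagonal of each half of Table 1. The sketch below does this under two assumptions that the slide does not state: that the diagonal entries are the correct-classification rates, and that the three voice qualities are weighted equally. `mean_diagonal` is a hypothetical helper name.

```python
LABELS = ["Bre", "Neu", "Pre"]

# Values copied from Table 1 (percentages); each row sums to 100.
SPEC = [[79, 20, 1], [32, 47, 21], [6, 24, 70]]
TIME = [[76, 22, 2], [42, 43, 15], [8, 28, 64]]


def mean_diagonal(matrix):
    """Average of the diagonal entries: mean per-class score in percent."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)


for name, matrix in (("Spec", SPEC), ("Time", TIME)):
    print(name, round(mean_diagonal(matrix), 1))
# prints:
# Spec 65.3
# Time 61.0
```

On this reading the spectral system is ahead by roughly four percentage points overall, with the neutral (modal) class the hardest to classify in both systems.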
Some thoughts
New method may overcome some of the issues which have hampered automated glottal source analysis.
Produced vowels vs running speech.
Criteria to be defined to maximise the probability of robust parameter extraction.
Extension of islands of reliability (Mokhtari & Campbell 2002)
Future work
Applying the new method to analysis of glottal source dynamics with Dr. Yanushevskaya.
Further work on HMM-based classification of voice qualities with Mark Kane.
Possible collaboration with Catharine Oertel and Prof. Campbell in analysis of voice quality from naturalistic speech recordings.
Open to other collaborations!