Trinity College Dublin

Analysing voice quality

John Kane

April 30, 2010


Voice quality (VQ)

Mainly a consequence of the vibration of the vocal folds.

Overall timbre of a person’s voice (organic setting and dynamic shifts).

VQ is not limited to pitch and loudness.

Voice quality (VQ)


Laver (1980) sought to provide quantitative physiological/acoustic descriptions of VQ.

VQs: breathy, whispery, creaky, harsh, falsetto, modal.

In real speech these VQs exist on continuous scales and in combination with others.


Voice quality (VQ)

Reveals information on speaker’s state and attitude.

Infants already sensitive to different VQs.

Mackenzie Beck (2005): VQ is used before the linguistic content is understood.


Voice quality in speech communication

Contrastive linguistic purpose in some languages.

Gujarati “twelve” vs “outside” (breathy).

Danish “hun” vs “hund” (creaky).

Status, popular trends.

Extralinguistic, active listening (grunts etc.).

Prosodic component in neutral running speech.

Potential of VQ/glottal source analysis in speech technology

Improved naturalness in parametric speech synthesis (Cabral 2008; Raitio 2008).

Potential for more flexible/expressive speech synthesis.

Ability to aid emotion detection and paralinguistic annotation.

Difficulties measuring VQ

As listeners, we are very sensitive to variation in VQ.

Measuring it is a difficult job for computers.

The vocal folds are hidden from direct view.

[Figure: the vocal folds]

Robust extraction of the glottal source is a difficult job for signal processing.


Electroglottography (EGG)

Inverse filtering
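Inverse filtering estimates the glottal source by removing the vocal tract resonances from the speech signal. As a rough illustration only, the sketch below does single-pass LPC inverse filtering with NumPy: it fits an all-pole vocal tract model by the autocorrelation method, filters the speech with the inverse polynomial A(z), and integrates the residual with a leaky integrator to approximately cancel lip radiation. This is a simplified textbook technique, not the automatic inverse filtering method (Alku 2002) used in this work; the function names, the LPC order of 18 and the leak factor of 0.99 are illustrative assumptions.

```python
import numpy as np


def lpc_autocorrelation(frame: np.ndarray, order: int) -> np.ndarray:
    """Return A(z) = [1, a1, ..., ap] estimated with the autocorrelation method."""
    windowed = np.asarray(frame, dtype=float) * np.hanning(len(frame))
    n = len(windowed)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(windowed[: n - k], windowed[k:]) for k in range(order + 1)])
    # Normal equations: sum_j a_j * r[|i - j|] = -r[i] for i = 1..order.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1 : order + 1])
    return np.concatenate(([1.0], a))


def lpc_inverse_filter(speech: np.ndarray, order: int = 18, leak: float = 0.99):
    """Hypothetical helper: single-pass LPC inverse filtering (not Alku's method).

    Returns the prediction residual (a crude glottal flow derivative estimate)
    and a leaky-integrated version of it (a crude glottal flow estimate).
    """
    speech = np.asarray(speech, dtype=float)
    a = lpc_autocorrelation(speech, order)
    # FIR inverse filter: e[n] = sum_k a[k] * s[n - k]
    residual = np.convolve(speech, a)[: len(speech)]
    # Leaky integrator y[n] = e[n] + leak * y[n - 1] roughly undoes
    # the differentiating effect of lip radiation.
    flow = np.empty_like(residual)
    acc = 0.0
    for i, value in enumerate(residual):
        acc = value + leak * acc
        flow[i] = acc
    return residual, flow
```

In practice the analysis would run frame by frame on voiced speech, and a single LPC pass also absorbs part of the glottal spectral tilt, which is one reason more elaborate automatic methods exist.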

Parameterisation

Time-based measurements (LF model)

[Figure: LF model pulse; x-axis: time (ms)]

Advantage: related to physiology. Disadvantage: sensitive to noise and phase.

Frequency-domain measurements

Advantage: avoids phase issues. Disadvantage: existing parameters are strongly correlated.
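As an illustration of a frequency-domain measure, the sketch below computes H1-H2, the level difference in dB between the first and second harmonics of a voiced signal. H1-H2 is not named on these slides; it is simply a widely used example of this class of parameter. The function names, the Hann window and the ±10% search band around each harmonic are assumptions, and F0 is taken as known.

```python
import numpy as np


def harmonic_level_db(mag: np.ndarray, freqs: np.ndarray, target_hz: float,
                      tolerance: float = 0.1) -> float:
    """Peak magnitude (dB) within +/- tolerance * target_hz of a harmonic."""
    band = (freqs >= target_hz * (1 - tolerance)) & (freqs <= target_hz * (1 + tolerance))
    if not np.any(band):
        raise ValueError("no FFT bins fall inside the search band")
    return 20.0 * np.log10(np.max(mag[band]))


def h1_minus_h2(signal: np.ndarray, fs: float, f0: float) -> float:
    """Hypothetical helper: H1-H2 in dB for a voiced segment with known F0."""
    signal = np.asarray(signal, dtype=float)
    windowed = signal * np.hanning(len(signal))
    mag = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return harmonic_level_db(mag, freqs, f0) - harmonic_level_db(mag, freqs, 2 * f0)


# Synthetic check: second harmonic at half the amplitude of the first.
fs = 16000
t = np.arange(fs) / fs
voiced = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
print(round(h1_minus_h2(voiced, fs, 100.0), 2))  # 6.02
```

Because H1-H2 is measured on magnitudes only, it ignores phase, which matches the advantage listed above; it also tends to co-vary with other spectral-tilt measures.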


Our parameterisation system

Pipeline: recorded speech → automatic inverse filtering (Alku 2002) → codebook search for initial values → two-part optimisation → model parameters.
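A codebook search followed by optimisation can be illustrated generically: evaluate a set of candidate parameter vectors, keep the one whose model spectrum is closest to the measured voice source spectrum, then refine it locally. The sketch below is a hypothetical stand-in, not the two-part optimisation used here: the two-parameter spectral model is a toy (it is not the LF model), the error is a mean squared difference in dB, and the refinement is a simple compass (pattern) search with step halving. All names and values are illustrative assumptions.

```python
import itertools

import numpy as np


def toy_source_spectrum_db(freqs: np.ndarray, params) -> np.ndarray:
    """Hypothetical 2-parameter model (NOT the LF model): a level in dB plus a
    first-order roll-off above a corner frequency."""
    level_db, corner_hz = params
    return level_db - 10.0 * np.log10(1.0 + (freqs / corner_hz) ** 2)


def spectral_error(params, freqs, target_db) -> float:
    """Mean squared difference (dB^2) between model and measured spectra."""
    return float(np.mean((toy_source_spectrum_db(freqs, params) - target_db) ** 2))


def codebook_search(codebook, freqs, target_db) -> np.ndarray:
    """Stage 1: pick the codebook entry with the smallest spectral error."""
    errors = [spectral_error(np.asarray(p, float), freqs, target_db) for p in codebook]
    return np.asarray(codebook[int(np.argmin(errors))], dtype=float)


def pattern_search(start, freqs, target_db, steps, iterations=500, shrink=0.5):
    """Stage 2: compass search; try +/- steps on each parameter, shrink on failure."""
    best = np.asarray(start, dtype=float).copy()
    best_err = spectral_error(best, freqs, target_db)
    steps = np.asarray(steps, dtype=float).copy()
    for _ in range(iterations):
        improved = False
        for i in range(len(best)):
            for sign in (1.0, -1.0):
                candidate = best.copy()
                candidate[i] += sign * steps[i]
                if candidate[1] <= 0.0:  # toy model: corner frequency must stay positive
                    continue
                err = spectral_error(candidate, freqs, target_db)
                if err < best_err:
                    best, best_err, improved = candidate, err, True
        if not improved:
            steps *= shrink
    return best, best_err


# Synthetic "measured" spectrum and a coarse codebook grid (illustrative values).
freqs = np.linspace(50.0, 4000.0, 200)
target_db = toy_source_spectrum_db(freqs, np.array([3.0, 730.0]))
codebook = list(itertools.product(np.arange(-10.0, 10.1, 5.0), np.arange(200.0, 2000.1, 200.0)))
start = codebook_search(codebook, freqs, target_db)
refined, err = pattern_search(start, freqs, target_db, steps=(2.0, 100.0))
```

The codebook step gives the local search a starting point near the right basin, which matters for models whose error surface has several minima.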

[Figures: speech waveform and voice source waveform (x-axis: time, ms); voice source spectrum with the fitted model from spectral optimisation (x-axis: frequency, Hz); model parameters Rk, Rg, Ra, EE and F0.]
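The LF-model parameters above can be related to timing points of a single glottal pulse. As a sketch, the function below computes F0 and the R-parameters from LF timing instants using the definitions commonly attributed to Fant: Rg = T0/(2Tp), Rk = (Te - Tp)/Tp and Ra = Ta/T0, where T0 is the period, Tp the instant of peak glottal flow, Te the instant of the main excitation (where EE, the negative peak of the flow derivative, occurs) and Ta the effective duration of the return phase. The class and function names are hypothetical, and EE is omitted because it is an amplitude rather than a timing value. This is not how the system above estimates the parameters, since it fits them in the frequency domain.

```python
from dataclasses import dataclass


@dataclass
class LFTiming:
    """Hypothetical container for LF timing instants of one pulse (milliseconds)."""
    t0: float  # fundamental period T0
    tp: float  # instant of maximum glottal flow Tp
    te: float  # instant of main excitation Te (where EE occurs)
    ta: float  # effective duration of the return phase Ta


def r_parameters(timing: LFTiming) -> dict:
    """Return F0 (Hz) and the dimensionless R-parameters Rg, Rk, Ra."""
    if not (0.0 < timing.tp < timing.te <= timing.t0) or timing.ta < 0.0:
        raise ValueError("expected 0 < Tp < Te <= T0 and Ta >= 0")
    return {
        "F0": 1000.0 / timing.t0,  # T0 is in ms, so this gives Hz
        "Rg": timing.t0 / (2.0 * timing.tp),
        "Rk": (timing.te - timing.tp) / timing.tp,
        "Ra": timing.ta / timing.t0,
    }


print(r_parameters(LFTiming(t0=10.0, tp=4.0, te=5.5, ta=0.3)))
```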

Our Interspeech/Speech Communication submission

Description of our frequency domain parameterisation approach.

Finnish vowels: /ɑ e i o u y æ ø/, 11 speakers

[Figure: three panels contrasting breathy and pressed voice]

Evaluation

Robustness against simulations of difficult conditions.

Relative change: sensitivity of parameters. Coefficient of variation: pulse-to-pulse variation.

Test conditions: clean signal; signal with additive noise (SNR = 45 dB and SNR = 30 dB); signal with recording system distortion.
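The two robustness measures can be written out explicitly. The slides do not give the exact formulas, so the sketch below assumes the usual definitions: relative change as |p_distorted - p_clean| / |p_clean| for each parameter estimate, and the coefficient of variation as the sample standard deviation divided by the mean of a parameter over consecutive pulses. The function names are hypothetical.

```python
import numpy as np


def relative_change(clean, distorted) -> np.ndarray:
    """|distorted - clean| / |clean|, element-wise (assumed definition)."""
    clean = np.asarray(clean, dtype=float)
    distorted = np.asarray(distorted, dtype=float)
    return np.abs(distorted - clean) / np.abs(clean)


def coefficient_of_variation(per_pulse_values) -> float:
    """Sample standard deviation over mean, across consecutive pulses."""
    values = np.asarray(per_pulse_values, dtype=float)
    return float(np.std(values, ddof=1) / np.mean(values))


# Made-up Rk estimates for the same three pulses, clean vs. noisy, for illustration.
rk_clean = [0.30, 0.32, 0.31]
rk_noisy = [0.33, 0.30, 0.31]
print(relative_change(rk_clean, rk_noisy))
print(coefficient_of_variation(rk_clean))
```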

Ability to discriminate voice qualities.

Explained variance: regression analysis. Classification: linear discriminant analysis.
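Linear discriminant analysis assigns each token to the class with the largest linear discriminant score, assuming Gaussian classes that share one covariance matrix. The NumPy sketch below implements that textbook classifier; the class name, the two-dimensional features and the synthetic data in the usage lines are assumptions for illustration, not the study's features or setup.

```python
import numpy as np


class SimpleLDA:
    """Textbook LDA: class means, pooled covariance and class priors."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        n, d = X.shape
        means, priors = [], []
        pooled = np.zeros((d, d))
        for label in self.classes_:
            Xc = X[y == label]
            mu = Xc.mean(axis=0)
            means.append(mu)
            priors.append(len(Xc) / n)
            pooled += (Xc - mu).T @ (Xc - mu)
        self.means_ = np.array(means)
        # Unbiased pooled within-class covariance, inverted once.
        self.cov_inv_ = np.linalg.inv(pooled / (n - len(self.classes_)))
        self.log_priors_ = np.log(priors)
        return self

    def decision_function(self, X):
        """delta_k(x) = x^T S^-1 mu_k - 0.5 mu_k^T S^-1 mu_k + log(pi_k)."""
        X = np.asarray(X, dtype=float)
        W = self.means_ @ self.cov_inv_  # row k is mu_k^T S^-1
        b = -0.5 * np.sum(W * self.means_, axis=1) + self.log_priors_
        return X @ W.T + b

    def predict(self, X):
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]


# Synthetic, well-separated 2-D features for three voice-quality labels.
rng = np.random.default_rng(1)
centres = {"breathy": (0.0, 0.0), "modal": (4.0, 0.0), "pressed": (0.0, 4.0)}
X = np.vstack([rng.normal(c, 0.5, size=(100, 2)) for c in centres.values()])
y = np.repeat(list(centres), 100)
model = SimpleLDA().fit(X, y)
accuracy = float(np.mean(model.predict(X) == y))
```

Evaluating on the training data, as in these usage lines, overstates accuracy; a held-out or cross-validated split would be needed for real scores.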


Overall results

Clearly better robustness against distortions imposed by the recording system.

[Figure: results for breathy, modal and pressed voice qualities]

Overall results

Generally less sensitive to moderate levels of additive noise imposed on the signals.

High noise levels sometimes affected robustness.


Overall results

Clearly higher R² scores for individual parameters.

[Figure: R² scores for the parameters Rg, Rk and Ra, new system vs time-based system]
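One plausible reading of "explained variance" per parameter is the R² of a regression of that parameter on the voice quality category. The slides do not specify the regression design, so this is an assumption: with an intercept and dummy-coded categories, ordinary least squares reproduces the category means, and R² reduces to the between-category sum of squares divided by the total sum of squares. The function name is hypothetical.

```python
import numpy as np


def r_squared_by_category(values, categories) -> float:
    """R^2 of OLS on dummy-coded categories = SS_between / SS_total."""
    values = np.asarray(values, dtype=float)
    categories = np.asarray(categories)
    grand_mean = values.mean()
    ss_total = np.sum((values - grand_mean) ** 2)
    if ss_total == 0.0:
        raise ValueError("parameter has zero variance")
    ss_between = 0.0
    for label in np.unique(categories):
        group = values[categories == label]
        ss_between += len(group) * (group.mean() - grand_mean) ** 2
    return float(ss_between / ss_total)


# Illustrative values only: a parameter that separates two categories fairly well.
print(r_squared_by_category([1.0, 2.0, 3.0, 4.0], ["breathy", "breathy", "pressed", "pressed"]))  # 0.8
```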


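Table 1: Confusion matrix of classification scores (%) for the three voice qualities using the two systems (Spec = new spectral system, Time = time-based system; Bre = breathy, Neu = neutral, Pre = pressed; rows = actual, columns = classified).

              Spec                 Time
        Bre   Neu   Pre      Bre   Neu   Pre
Bre      79    20     1       76    22     2
Neu      32    47    21       42    43    15
Pre       6    24    70        8    28    64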

Higher classification scores with the new (Spec) system for all three voice qualities.
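The diagonal of Table 1 can be summarised directly. The sketch below, assuming rows are the actual voice quality and columns the classified voice quality (each row sums to 100%), extracts the per-class recall and its unweighted mean for both systems. The numbers are copied from Table 1; the variable and function names are hypothetical.

```python
import numpy as np

LABELS = ("Bre", "Neu", "Pre")
# Table 1 (%): rows = actual voice quality, columns = classified voice quality.
SPEC = np.array([[79, 20, 1], [32, 47, 21], [6, 24, 70]], dtype=float)
TIME = np.array([[76, 22, 2], [42, 43, 15], [8, 28, 64]], dtype=float)


def per_class_recall(confusion_pct: np.ndarray) -> np.ndarray:
    """Diagonal of the row-normalised confusion matrix."""
    rows = confusion_pct / confusion_pct.sum(axis=1, keepdims=True)
    return np.diag(rows)


for name, matrix in (("Spec", SPEC), ("Time", TIME)):
    recall = per_class_recall(matrix)
    print(name, [round(float(r), 2) for r in recall], "mean:", round(float(recall.mean()), 3))
```

The unweighted mean recall works out to about 0.653 (65.3%) for Spec and 0.610 (61.0%) for Time, and Spec is higher for each class individually (79 vs 76, 47 vs 43, 70 vs 64).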

Some thoughts

New method may overcome some of the issues which have hampered automated glottal source analysis.

Produced vowels vs running speech.

Criteria need to be defined to maximise the probability of robust parameter extraction.

Extension of islands of reliability (Mokhtari & Campbell 2002)


Future work

Applying the new method to the analysis of glottal source dynamics with Dr. Yanushevskaya.

Further work on HMM-based classification of voice qualities with Mark Kane.

Possible collaboration with Catharine Oertel and Prof. Campbell on the analysis of voice quality from naturalistic speech recordings.

Open to other collaborations!
