Introduction SC/¬SC Prediction Experiments Conclusions

Computing the Fundamental Frequency Variation Spectrum in Conversational Spoken Dialogue Systems

Kornel Laskowski (a, b), Matthias Wolfel (b), Mattias Heldner (c) & Jens Edlund (c)

(a) CMU, Pittsburgh PA, USA

(b) UKA(TH), Karlsruhe, Germany

(c) KTH, Stockholm, Sweden

2 July, 2008

K. Laskowski, M. Wolfel, M. Heldner, J. Edlund Acoustics 2008, Paris, France

Fundamental Frequency (F0) Variation (FFV)

how does F0 vary in time?

FFV: ongoing work, building on ICASSP 2008 and SpeechProsody 2008

OUR ULTIMATE GOAL: ability to automatically learn prosodic sequences characterizing various phenomena

Canonical Measurement of F0 Variation

1. estimate frame-level autocorrelation

2. find local maxima

3. identify best maximum via dynamic programming across multiple frames

4. median filter maxima across multiple frames

5. syllabify speech via ASR or landmark detection

6. fit linear model across multiple frames in same syllable

7. estimate speaker’s baseline pitch across multiple frames

8. normalize out baseline
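
A minimal sketch of part of this canonical chain, assuming NumPy. It covers only steps 1, 2, 4, 7 and 8 in a simplified form: the single largest autocorrelation peak stands in for the dynamic-programming search of step 3, and syllabification and the per-syllable linear fit (steps 5 and 6) are left out. Function names, the F0 search range and the sampling rate are illustrative assumptions.

```python
import numpy as np

def frame_f0_autocorr(frame, fs, fmin=60.0, fmax=400.0):
    """Steps 1-2 (simplified): F0 from the largest autocorrelation peak in a lag range."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    # Autocorrelation at non-negative lags.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return fs / lag

def median_filter(track, width=3):
    """Step 4: running median over `width` frames, edges padded by repetition."""
    track = np.asarray(track, dtype=float)
    half = width // 2
    padded = np.pad(track, half, mode="edge")
    return np.array([np.median(padded[i:i + width]) for i in range(len(track))])

def normalize_baseline(track_hz):
    """Steps 7-8: express F0 in octaves relative to the track's median."""
    log_f0 = np.log2(np.asarray(track_hz, dtype=float))
    return log_f0 - np.median(log_f0)

fs = 16000                                   # assumed sampling rate (Hz)
n = np.arange(512)
tone = np.sin(2 * np.pi * 250.0 * n / fs)    # toy 250 Hz frame
f0 = frame_f0_autocorr(tone, fs)

smoothed = median_filter([100.0, 100.0, 300.0, 100.0, 100.0])  # removes the spike
relative = normalize_baseline(smoothed)                        # octaves re. baseline
```

Even in this reduced form the estimate depends on multi-frame smoothing and a per-speaker baseline, which is the kind of long-distance constraint the wish list that follows tries to avoid.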

Wish List

Would like a representation which is:

continuous: not undefined in unvoiced regions

instantaneous: no long-distance constraints

distributed: vector-valued rather than scalar-valued

sparse: minimally redundant

and which:

exhibits speaker-independence: no normalization necessary

enjoys perceptual relevance: variation in octaves per time

lends itself to a wealth of ASR HMM modeling techniques

FFV appears to satisfy all these constraints/requirements

Applications in Speech Technology

identification of places to use back-channel feedback

classification of rhetorical relations

interpretation of discourse markers

dialogue act tagging

identification of speech repairs

here, prediction of speaker change in conversational spoken dialogue systems

Outline

1. Introduction & Motivation

2. Speaker-Change Prediction

3. Windowing Experiments

4. Conclusions

Speaker-Change Prediction in Dialogue Systems

in other words: is the speaker finished?

study how humans behave, towards humans

learn from what actually happens: no need to label data

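[Figure: timeline showing f and g around instant t, marking a 500 ms span and the intervals T^t_{g,V}, T^t_{g,N} and T^t_{f,N}]

$$
L_t =
\begin{cases}
\text{SC} & \text{if } T^t_{f,N} - T^t_{g,N} < 0 \\
\neg\text{SC} & \text{otherwise}
\end{cases}
\qquad (1)
$$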
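
A small sketch of how the labeling rule (1) could be applied automatically, assuming the two quantities T^t_{f,N} and T^t_{g,N} have already been measured for the instant t. The function and argument names are hypothetical, and how the quantities are measured is not spelled out in this transcript.

```python
def label_instant(t_f_next, t_g_next):
    """Rule (1): SC when T_{f,N} - T_{g,N} is negative, otherwise ¬SC."""
    return "SC" if t_f_next - t_g_next < 0 else "¬SC"

# Toy values only (same time unit for both arguments).
labels = [label_instant(0.3, 1.2), label_instant(1.2, 0.3), label_instant(0.5, 0.5)]
```

Because the label comes straight from what happened next in the recording, no manual annotation is involved, which matches the "no need to label data" point above.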

Assessing Performance

[Figure: ROC plot; horizontal axis: false positive rate; vertical axis: true positive rate]

receiver operating characteristic (ROC) curves: true vs false positive rate

performance of random guessing: line of no discrimination

discrimination: area A below the ROC curve, 0 ≤ A ≤ 1

in this work: area A between the ROC curve and the line of no discrimination, 0 ≤ A ≤ 1/2
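
A minimal sketch of this measure, assuming NumPy: it builds the ROC curve by sweeping a threshold over classifier scores, integrates it with the trapezoid rule, and subtracts the 0.5 that lies under the line of no discrimination. Function names and the toy scores are illustrative, and tied scores are not given special treatment.

```python
import numpy as np

def roc_curve(scores, labels):
    """labels: 1 for SC, 0 for ¬SC. Returns (fpr, tpr), running from (0, 0) to (1, 1)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="stable")   # highest score first
    labels = labels[order]
    tp = np.cumsum(labels)
    fp = np.cumsum(1 - labels)
    tpr = np.concatenate(([0.0], tp / tp[-1]))
    fpr = np.concatenate(([0.0], fp / fp[-1]))
    return fpr, tpr

def discrimination(scores, labels):
    """Area between the ROC curve and the diagonal (0.5 = perfect, 0 = chance)."""
    fpr, tpr = roc_curve(scores, labels)
    area_below = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0)
    return area_below - 0.5

perfect = discrimination([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
inverted = discrimination([0.9, 0.8, 0.2, 0.1], [0, 0, 1, 1])
```

A classifier that ranks worse than chance gives a negative value under this definition; the range 0 ≤ A ≤ 1/2 applies to classifiers at or above chance.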

Data

interactive human-human dialogues

Swedish Map Task Corpus:

Data Set   Duration (mn:ss)   Dialogue role g speakers   # EOTs   # SCs

DevSet     77:40              F4, F5, M2, M3             480      222

EvalSet    60:39              F1, F2, F3, M1             317      149

System Architecture

[Figure: block diagram of the processing chain, with steps numbered 0 to 7. Block labels: SAD, AUDIO CAPTURE, PREEMPHASIS, WHITENING, FILTERBANK, F0 VARIATION, WINDOWING, LIKELIHOOD, CLASSIFICATION]

Step 2: Windowing (& FFT Computation)

1. Spectral estimation over left and right portions of analysis frame.

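[Figure: left and right window shapes over the analysis frame; horizontal axis ticks at −0.016, −0.004, −0.002, 0, 0.002, 0.004, 0.016; vertical axis from 0 to 1]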
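
A rough sketch of the windowing step, assuming NumPy: two mirror-image asymmetric windows are applied to the same analysis frame and each product is passed through an FFT. The raised-cosine window shape, the 16 kHz sampling rate and the start/peak/end times (loosely patterned on the tick marks of the figure and read as seconds) are assumptions made for illustration, not the windows used in the talk.

```python
import numpy as np

def asymmetric_window(t, start, peak, end):
    """Raised-cosine rise from `start` to `peak`, fall from `peak` to `end`, zero elsewhere."""
    t = np.asarray(t, dtype=float)
    w = np.zeros_like(t)
    rise = (t >= start) & (t <= peak)
    fall = (t > peak) & (t <= end)
    w[rise] = 0.5 - 0.5 * np.cos(np.pi * (t[rise] - start) / (peak - start))
    w[fall] = 0.5 + 0.5 * np.cos(np.pi * (t[fall] - peak) / (end - peak))
    return w

fs = 16000                         # assumed sampling rate (Hz)
t = np.arange(-256, 256) / fs      # 512 samples, roughly -0.016 .. 0.016 s

# Hypothetical window pair: long outer slope, short inner slope.
w_left = asymmetric_window(t, start=-0.016, peak=-0.004, end=0.002)
w_right = asymmetric_window(t, start=-0.002, peak=0.004, end=0.016)

frame = np.sin(2 * np.pi * 200.0 * t)            # toy 200 Hz tone
F_left = np.abs(np.fft.rfft(frame * w_left))     # magnitude spectrum, left portion
F_right = np.abs(np.fft.rfft(frame * w_right))   # magnitude spectrum, right portion
```

In this parameterization the distance between the two `peak` values plays the role of the window separation, and the interval between the right window's `start` and the left window's `end` is the overlap in support.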

Step 3: F0 Variation (FFV) Computation

1. Dilate left FFT, dot product with right FFT; and vice versa. (ICASSP’2008)

2. Maximum over resulting spectrum represents change in octaves per second.

[Figure: example FFV spectrum; horizontal axis ticks at −2, −1, 0, +1, +2; vertical axis from 0 to 1]
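
A simplified sketch of the dilate-and-dot-product idea, assuming NumPy. It stretches one magnitude spectrum by a factor 2^|ρ| using linear interpolation, takes a normalized dot product with the other spectrum, and repeats this over a grid of ρ values. The interpolation scheme, the normalization and the toy harmonic spectra are assumptions for illustration; the exact formulation is in the ICASSP 2008 paper cited on the slide.

```python
import numpy as np

def dilate(spectrum, factor):
    """Stretch a magnitude spectrum along frequency: a peak at bin p moves to bin p * factor."""
    k = np.arange(len(spectrum), dtype=float)
    return np.interp(k / factor, k, spectrum)

def ffv_spectrum(mag_left, mag_right, rhos):
    """Normalized dot product of the two spectra after dilating one of them by 2**|rho|."""
    out = np.zeros(len(rhos))
    for i, rho in enumerate(rhos):
        if rho >= 0:
            a, b = dilate(mag_left, 2.0 ** rho), mag_right
        else:
            a, b = mag_left, dilate(mag_right, 2.0 ** (-rho))
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        if denom > 0:
            out[i] = np.dot(a, b) / denom
    return out

def harmonic_spectrum(f0_bin, n_bins=512, width=1.0):
    """Toy magnitude spectrum: Gaussian bumps at multiples of f0_bin (illustration only)."""
    k = np.arange(n_bins, dtype=float)
    harmonics = np.arange(f0_bin, n_bins, f0_bin)
    return np.exp(-0.5 * ((k[:, None] - harmonics[None, :]) / width) ** 2).sum(axis=1)

rhos = np.linspace(-0.5, 0.5, 101)
left = harmonic_spectrum(20.0)
right = harmonic_spectrum(20.0 * 2.0 ** 0.1)   # F0 a tenth of an octave higher on the right
g = ffv_spectrum(left, right, rhos)
peak_rho = rhos[np.argmax(g)]                  # location of the maximum, in octaves
```

Here ρ measures the F0 change between the two windows in octaves; turning it into octaves per second would additionally require the time separation between the windows, which this toy example does not model.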

Step 4: Application of Filterbank

1. Compress spectral representation to 7-element vector. (SpeechProsody’2008)

[Figure: filterbank shapes; horizontal axis ticks as extracted: −5.4, −3.4, 2.4, −1.0, +1.0, 2.4, +3.4, +5.4; vertical axis from 0 to 0.3; legend: "FBold and FBnew", "FBnew only"]
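
A sketch of the compression step as a matrix product, assuming NumPy. The seven triangular filters, their centers and their widths are made up for illustration; the actual filter shapes are those of the figure and the SpeechProsody 2008 paper, not these.

```python
import numpy as np

def triangular_filterbank(rhos, centers, half_width):
    """One row per filter: a triangle peaking at its center, normalized to unit sum."""
    rhos = np.asarray(rhos, dtype=float)
    centers = np.asarray(centers, dtype=float)
    fb = np.maximum(0.0, 1.0 - np.abs(rhos[None, :] - centers[:, None]) / half_width)
    sums = fb.sum(axis=1, keepdims=True)
    return fb / np.where(sums > 0, sums, 1.0)

rhos = np.linspace(-2.0, 2.0, 401)          # grid on which the FFV spectrum is sampled
centers = np.linspace(-1.5, 1.5, 7)         # hypothetical filter centers
fb = triangular_filterbank(rhos, centers, half_width=0.5)

# Toy FFV spectrum with a single peak at +0.5.
g = np.exp(-0.5 * ((rhos - 0.5) / 0.1) ** 2)
features = fb @ g                           # 7-element feature vector
```

Each frame then contributes one short vector rather than a full spectrum, which is what makes the representation convenient for HMM modeling.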

Step 6: Modeling

1. For each class (SC/¬SC), train 10 HMMs.

2. Maximum likelihood classification.

3. → 100 candidate dividing hyperplanes.

4. Compute the mean/min/max discrimination over these 100.

5. Compute the single hyperplane (“prod”) between 2 class products of 10 models each.
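
One possible reading of steps 2, 3 and 5, sketched with NumPy and random numbers standing in for the log-likelihoods of trained HMMs. Pairing each of the 10 SC models with each of the 10 ¬SC models gives 100 log-likelihood-ratio scorers, and "prod" is taken here to mean multiplying the 10 likelihoods within each class, that is, summing their log-likelihoods. Array names, shapes and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_examples = 10, 50

# Stand-ins for per-example log-likelihoods under each model
# (row = model, column = test example); real values would come from the HMMs.
ll_sc = rng.normal(size=(n_models, n_examples))
ll_nsc = rng.normal(size=(n_models, n_examples))

# Steps 2-3: each (SC model, ¬SC model) pair gives one log-likelihood-ratio
# score per example; thresholding it at 0 is maximum likelihood classification.
pair_scores = (ll_sc[:, None, :] - ll_nsc[None, :, :]).reshape(-1, n_examples)

# Step 5 ("prod"): product of likelihoods within a class = sum of log-likelihoods.
prod_score = ll_sc.sum(axis=0) - ll_nsc.sum(axis=0)

decisions = np.where(prod_score > 0, "SC", "¬SC")
```

Step 4 would then score each of the 100 rows of `pair_scores` with the ROC discrimination measure and report the mean, minimum and maximum.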

Focus of This Work

In this work: investigate sensitivity of speaker-change prediction performance to windowing policy

Two Experiments

Observation: Baseline asymmetric windows are known to have poor frequency resolution.

1. Keep window separation fixed; increase overlap to symmetrize.

2. Keep overlap fixed; increase window separation to symmetrize.

Experiment 1

keep window maxima a constant tsep apart

less asymmetry ↔ more window support overlap

[Figure sequence, five panels of left and right window shapes; horizontal axis from −0.016 to 0.016 with ticks at ±0.004 in every panel and a second tick pair at ±0.002 in the first panel, absent in the second, then at ±0.006, ±0.008 and ±0.01; vertical axis from 0 to 1]

Experiment 1: Results

[Figure: two panels, devSet and evalSet; horizontal axis: window shape, from baseline to symmetric; vertical axis: ROC discrimination above 50%, ticks from 0 to 25; legend: mean, prod]

symmetric windows appear to lead to:

lower ROC discrimination than baseline, in all cases

Experiment 2

keep window support overlap constant

less asymmetry ↔ window maxima further apart

[Figure sequence, four panels of left and right window shapes; horizontal axis from −0.016 to 0.016 with ticks at ±0.004 in every panel and a second tick pair absent in the first panel, then at ±0.005, ±0.006 and ±0.007; vertical axis from 0 to 1]

Experiment 2: Results

[Figure: two panels, devSet and evalSet; horizontal axis: window shape, from baseline through "more sym" to symmetric; vertical axis: ROC discrimination above 50%, ticks from 0 to 25; legend: mean, prod]

symmetric windows appear to lead to:

higher ROC discrimination than baseline, in all cases

smaller variability between best and worst partitions

Conclusions

tsep: separation between window maxima

tfra: duration of analysis frame

1. when tsep > (1/3) tfra, symmetric-support windows appear best

2. when tsep < (1/3) tfra, first priority should be to limit overlap in support to a maximum of tsep, at the expense of symmetry if necessary

3. results suggest that better ROC discrimination may be possible when symmetric-support windows are placed even further apart in time than tried here
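
The first two conclusions amount to a rule of thumb, sketched below under the assumption that the threshold reads as one third of tfra. The function name and the example numbers are hypothetical; the numbers are only loosely based on the figure tick marks, read as seconds.

```python
def recommend_window_policy(tsep, tfra):
    """Rule of thumb from conclusions 1 and 2; the comparison point is tfra / 3."""
    threshold = tfra / 3.0
    if tsep > threshold:
        return "symmetric-support windows"
    if tsep < threshold:
        return "limit support overlap to at most tsep, even at the cost of symmetry"
    return "boundary case: not covered by the conclusions"

wide = recommend_window_policy(tsep=0.014, tfra=0.032)
narrow = recommend_window_policy(tsep=0.008, tfra=0.032)
```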

Thanks for attending.

(kornel@cs.cmu.edu)
