Clap Detection and Discrimination for Rhythm Therapy · dpwe/talks/claps-2005-03.pdf · 2005. 3....


Clap Detection - Lesser & Ellis 2005-03-22

1. “Rhythm Therapy”
2. Clap Range Estimation
3. Experiments
4. Conclusions

Clap Detection and Discrimination for Rhythm Therapy

Nathan Lesser & Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio

Dept. Electrical Engineering, Columbia University, NY USA

{nathan,dpwe}@ee.columbia.edu


1. “Rhythm Therapy”

• Rhythmic clapping may help
  - neural development
  - sensori-motor planning
  - focus and attention
• “Interactive metronome” devices
  - give feedback on synchrony
  - sensor-based
• Classroom deployment?
  - acoustic-based?
  - for multiple simultaneous users??

[Figure: image from interactivemetronome.com]


Clap Discrimination

• Scenario: Many students in same classroom
  - each clapping in time to their own laptop
  - students wear headphones (but no sensor)
  - computer hears neighbors
• Goal: Discriminate between ‘near-field’ and ‘far-field’ claps
  - ‘near-field’ = ~1 meter, on-axis
  - ‘far-field’ = > 2 meters, maybe off-axis


Data Collection

• Record isolated claps at various locations
  - can superimpose them later...
• Grid of seats:
  - claps from locations 0..9
  - record at locations 5 & 9 only
• Multiple rooms
  - pilot: 1 room, 2 x 5 claps/location
  - main data: 2 (+2) rooms, 1 x 50 far-field claps/location + 300 near-field claps/rec. loc. = 1500 claps/room

[Figure: Classroom Plan View. Clapping locations 0..9 on a seat grid (rows: 0 / 1 2 3 / 4 5 6 / 7 8 9); recording locations 5 and 9; “Front of Room” marked.]
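The 1500 claps/room total can be checked with a few lines of arithmetic. This is a sketch under one assumed reading of the slide: each of the two recording locations captures 50 far-field claps from each of the other 9 grid positions, plus 300 near-field claps of its own. The variable names are illustrative only.

```python
# Assumed reading of the slide, not stated explicitly in the source:
# every recording location hears far-field claps from the 9 other positions.
recording_locations = 2          # locations 5 and 9
far_field_positions = 9          # 10 grid positions minus the recording one
far_claps_per_position = 50
near_claps_per_recording = 300

claps_per_recording = (far_field_positions * far_claps_per_position
                       + near_claps_per_recording)
claps_per_room = recording_locations * claps_per_recording
print(claps_per_recording, claps_per_room)  # 750 1500
```

Under that reading the count matches the slide's figure of 1500 claps per room.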


2. Clap Range Estimation

• Task: Discriminate claps from in front of rig from all others (more distant)
  - main perceptual cue to distance (range): direct-to-reverberant ratio (DRR)
  - how to differentiate direct and reverb?
• Novel problem: Acoustic range estimation
  - define correlates of DRR
  - exploit properties of claps (wideband, compact)
  - ... then just feed to classifier
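For reference, DRR is commonly written as an energy ratio over the room impulse response. The form below is the textbook definition, not one given on the slides; h(t) (impulse response from clap position to microphone) and t_d (the boundary between direct sound and reverberation) are symbols introduced here for illustration.

```latex
\mathrm{DRR} = 10 \log_{10}
  \frac{\displaystyle\int_{0}^{t_d} h^2(t)\,dt}
       {\displaystyle\int_{t_d}^{\infty} h^2(t)\,dt}
  \quad \text{[dB]}
```

Because a clap is short and wideband, the recorded clap itself roughly resembles h(t), which is presumably why fixed early and late time windows can serve as correlates of DRR.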


Clap Examples

• Absolute level varies
• Decay slopes ~ same
  - reverberation (RT60 ~ 900 ms)
• Initial burst for near-field
  - “direct sound”

[Figure: Near-field (327MUDD nf50:4) and far-field (327MUDD ff50:4) clap examples. For each: waveform (amplitude, -0.5 to 0.5), energy in 4 ms windows (dB, -80 to 0), and spectrogram (freq / kHz, 0 to 10), all vs. time / s (0 to 0.7).]
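As a rough worked number, RT60 is the time for the reverberant level to fall by 60 dB, so the RT60 of about 900 ms quoted above implies the following decay slope. The 100 ms span is chosen here only because it matches the longest feature window used later in the talk.

```latex
\text{decay slope} \approx \frac{-60\ \text{dB}}{T_{60}}
  = \frac{-60\ \text{dB}}{0.9\ \text{s}}
  \approx -67\ \text{dB/s}
  \quad\Longrightarrow\quad
  \approx -6.7\ \text{dB over } 100\ \text{ms}
```

A near-field clap whose level drops much faster than this in its first few milliseconds is showing the direct-sound burst rather than room decay.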


Processing

• Detection → Features → Classifier

[Figure: two-stage block diagram; labels recovered from garbled font encoding, layout not recoverable. Stage I, Clap Detection (front end): audio, windowing, filter bank, onset energy, thresholding, clap indices. Stage II, Categorization: get features (slope, center of mass, cross-correlation), training/test sets with training labels, RLSC (feature vector to model), result.]


Clap Detection

• Simple transient detector limits feature calculation to ‘clap events’
• Adjust threshold on Δ(Energy_20ms) to get desired number of claps
  - known for our data
• Back up from maxima to find precise onset
  - Fielded system will need to adapt threshold and reject non-claps

[Figure: Far-field (327MUDD ff50:1-5). Spectrogram (freq / kHz, 0 to 10) and detector ratio / dB (0 to 40) vs. time / s (0 to 5).]
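The detector described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the 5 ms hop, the minimum gap between claps, and the 10%-of-peak onset rule are assumptions, and the function names are hypothetical. Picking the N largest well-separated energy rises stands in for lowering a threshold until the known number of claps is found.

```python
import numpy as np

def energy_db(x, sr, win_s=0.020, hop_s=0.005):
    """Short-time energy in dB over win_s windows, one value every hop_s."""
    win, hop = int(round(win_s * sr)), int(round(hop_s * sr))
    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack([x[i * hop : i * hop + win] for i in range(n_frames)])
    return 10.0 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12), win, hop

def detect_claps(x, sr, n_claps, min_gap_s=0.25):
    """Return onset times (s) of the n_claps strongest energy rises."""
    e_db, win, hop = energy_db(x, sr)
    delta = np.diff(e_db, prepend=e_db[0])   # frame-to-frame rise in dB
    min_gap = int(round(min_gap_s * sr / hop))
    picked = []
    # Keep the largest rises that are at least min_gap frames apart; this
    # acts like a threshold tuned to give exactly n_claps detections.
    for i in np.argsort(delta)[::-1]:
        if all(abs(int(i) - j) >= min_gap for j in picked):
            picked.append(int(i))
        if len(picked) == n_claps:
            break
    onsets = []
    for i in sorted(picked):
        seg = np.abs(x[i * hop : i * hop + win])
        # Assumed onset rule: first sample reaching 10% of the frame's peak.
        first = int(np.argmax(seg >= 0.1 * seg.max()))
        onsets.append((i * hop + first) / sr)
    return onsets
```

A fielded version would need an adaptive threshold instead of a known clap count, as the slide notes.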


Range Features

• Paper: Ctr. of Mass, Slope in 0..20 ms, 0..100 ms
• New: Slope in 0..20 ms, 20..100 ms + Energy Ratio 0..20 ms / 20..100 ms

[Figure: energy envelopes (dB vs. time / s, 0 to 0.15) for near-field (327MUDD nf50:4) and far-field (327MUDD ff50:4) claps. One pair of panels is annotated with CoM20ms, CoM100ms, slope20ms and slope100ms; the other with energy ratio and slope20:100ms.]
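One way the listed features could be computed from a detected onset is sketched below. The exact definitions are not given on the slides, so everything here is an assumption: the envelope uses 4 ms energy windows, slope is a least-squares line fit to the dB envelope, and center of mass is the energy-weighted mean time. The function name and dictionary keys are hypothetical, and the sketch works on a single band rather than the per-band features used in the talk.

```python
import numpy as np

def range_features(x, sr, onset, env_win_s=0.004):
    """Assumed range features for one clap starting at sample index `onset`."""
    n20, n100 = int(round(0.020 * sr)), int(round(0.100 * sr))
    seg = np.asarray(x[onset : onset + n100], dtype=float)
    if len(seg) < n100:
        raise ValueError("need at least 100 ms of signal after the onset")

    # Energy ratio: 0..20 ms against 20..100 ms, in dB.
    e_early = np.sum(seg[:n20] ** 2)
    e_late = np.sum(seg[n20:] ** 2)
    eratio_db = 10.0 * np.log10((e_early + 1e-12) / (e_late + 1e-12))

    # Energy envelope in non-overlapping env_win_s windows.
    w = int(round(env_win_s * sr))
    n_env = len(seg) // w
    env = np.sum(seg[: n_env * w].reshape(n_env, w) ** 2, axis=1)
    t = (np.arange(n_env) + 0.5) * env_win_s      # window centers, seconds
    env_db = 10.0 * np.log10(env + 1e-12)

    def slope(t0, t1):
        m = (t >= t0) & (t < t1)
        return float(np.polyfit(t[m], env_db[m], 1)[0])   # dB per second

    def center_of_mass(t1):
        m = t < t1
        return float(np.sum(t[m] * env[m]) / np.sum(env[m]))  # seconds

    return {
        "eratio_db": float(eratio_db),
        "slope20": slope(0.0, 0.020),
        "slope100": slope(0.0, 0.100),
        "slope20_100": slope(0.020, 0.100),
        "com20": center_of_mass(0.020),
        "com100": center_of_mass(0.100),
    }
```

With a strong direct burst, the early window dominates and the energy ratio rises; with reverberation only, the ratio is set by the room decay alone.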


Range Feature Behavior

• Original 4 features
  - good separation except CoM20
• New features
  - Eratio excellent
  - slope20:100 useless...
• Range estimation?
  - CoM20, slope20 show promise

(each plot shows 4-8 kHz band vs. 2-4 kHz band)

[Figure: scatter plots for 327MUDD loc 5, one panel per feature (CoM20, CoM100, slope20, slope100, Eratio, slope20:100); x-axis 4-8 kHz band, y-axis 2-4 kHz band; points labeled by source position (NF, p0-p4, p6-p9).]


3. Experiments

• Build and test actual near/far-field classifier
• Feature experiments
  - quantitative feature comparison
  - best combinations
• Data experiments
  - training data: amount, locations
  - test data: same/different room/location
• Regularized Least-Squares Classifier (RLSC)
  - find a hyperplane in (expanded) feature space
  - ~ simplified Support Vector Machine - no QP
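A kernel RLSC can be sketched in a few lines: training reduces to solving one linear system, which is the sense in which no quadratic program is needed. This is a generic illustration, not the configuration used in the talk; the RBF kernel, the values of `gamma` and `lam`, and the function names are assumptions, and some formulations scale the regularizer by the number of training points.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel matrix between rows of a and rows of b."""
    d2 = (np.sum(a ** 2, axis=1)[:, None]
          + np.sum(b ** 2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def rlsc_fit(x, y, lam=1e-2, gamma=1.0):
    """Solve (K + lam * I) c = y for labels y in {-1, +1}."""
    k = rbf_kernel(x, x, gamma)
    return np.linalg.solve(k + lam * np.eye(len(x)), np.asarray(y, dtype=float))

def rlsc_margin(x_train, coef, x_test, gamma=1.0):
    """Signed margin; the predicted class is the sign of this value."""
    return rbf_kernel(x_test, x_train, gamma) @ coef
```

Usage would be `coef = rlsc_fit(features, labels)` followed by `np.sign(rlsc_margin(features, coef, new_features))`, with +1 standing for near-field and -1 for far-field (a labeling chosen here for illustration).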


Feature Comparisons

• Train on room 327Mudd; Test on 627Mudd

• Eratio alone (9/1500 = 0.6% errors) beats best combination of rest: (CoM20 + CoM100 + slo20 = 0.9% errors)
  - difference of ~0.5% required for significance

[Figure: bar chart, “Feature comparison: All 3 bands, train on all M327, test on all M627”; clap error rate / % (0 to 50) for feature sets CoM_20, CoM_100, slo_20, slo_100, slo_20:100, Eratio.]
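The significance margin quoted above can be roughly reproduced with a binomial approximation. The slides do not say which test was used, so this is only one plausible reading: it treats the two error rates as independent binomial proportions over 1500 test claps and uses a 95% (1.96 sigma) margin on their difference.

```python
import math

n = 1500                  # test claps (from the slide)
p_eratio = 9 / n          # 0.6% errors with Eratio alone
p_rest = 0.009            # 0.9% errors with CoM20 + CoM100 + slo20

def binomial_se(p, n):
    """Standard error of an error rate p estimated from n trials."""
    return math.sqrt(p * (1.0 - p) / n)

# Assumption: the two results are treated as independent samples.
se_diff = math.sqrt(binomial_se(p_eratio, n) ** 2 + binomial_se(p_rest, n) ** 2)
margin_95 = 1.96 * se_diff
observed = p_rest - p_eratio

print(f"margin: {100 * margin_95:.2f}%  observed difference: {100 * observed:.2f}%")
```

Under these assumptions the margin comes out at roughly 0.6%, the same order as the ~0.5% on the slide, while the observed gap between the two feature sets is 0.3%.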


Generalizing Location, Room

• Matrix of 2 rooms x 2 recording locations

  - 627Mudd loc5 is hard data; 327Mudd loc9 is easy!
  - Cross-room (shaded) cases generalize better !?
  - Plenty of data: 5 claps/loc (20%) just as good

CER % (rows: training set; columns: test set)

Train \ Test   M627L5   M627L9   M327L5   M327L9
M627L5           2.0      0.5      0.4      0.0
M627L9           3.7      0.4      0.7      0.0
M327L5           1.5      0.5      0.4      0.0
M327L9           0.1      0.7      0.4      0.0
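The observations about the matrix can be checked directly from its entries. The sketch below copies the CER values from the table and averages same-room and cross-room cells; treating the first four characters of each label as the room name is a convention introduced here.

```python
import numpy as np

conds = ["M627L5", "M627L9", "M327L5", "M327L9"]
# Rows: training condition; columns: test condition (CER %, from the slide).
cer = np.array([
    [2.0, 0.5, 0.4, 0.0],
    [3.7, 0.4, 0.7, 0.0],
    [1.5, 0.5, 0.4, 0.0],
    [0.1, 0.7, 0.4, 0.0],
])

room = np.array([c[:4] for c in conds])          # "M627" or "M327"
same_room = room[:, None] == room[None, :]       # True where train/test rooms match

same_mean = float(cer[same_room].mean())
cross_mean = float(cer[~same_room].mean())
per_test = {c: float(v) for c, v in zip(conds, cer.mean(axis=0))}

print("same-room mean CER %:", same_mean)
print("cross-room mean CER %:", cross_mean)
print("mean CER % per test condition:", per_test)
```

The per-test-condition averages show M627L5 as the hardest test set and M327L9 as the easiest, and the cross-room average is lower than the same-room one, matching the slide's remarks.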


4. Conclusions

• Discriminating isolated near- and far-field claps is feasible (use Eratio 0..20 / 20..100 ms)
• Detection of candidate claps likely to limit accuracy in practice
  - but have ‘rhythmic’ expectations...
• Applicability to general range estimation?
  - Eratio relies on short-duration direct sound...
  - but other sounds have clicks (e.g. speech bursts)
  - CoM20, slope20 closer to proportional to range


Azimuth Features

• Cross-correlation of L and R for azimuth:

  - nearby locations distinguished - useful
  - distant locations (p2) give random results
  - needs nonlinear feature space expansion!

[Figure: “ITD scatter vs. source (for MUDD327 pos 5)”; 4-8 kHz band vs. 2-4 kHz band, axes spanning -1.5 to 1.5; points labeled by source position (NF, p0-p4, p6-p9).]
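An inter-channel time difference (ITD) from left/right cross-correlation can be sketched as below. This is a generic illustration rather than the method used for the plot: the 1 ms lag limit, the sign convention (positive when the right channel lags the left), and the function name are assumptions, and a per-band version would first filter each channel.

```python
import numpy as np

def itd_seconds(left, right, sr, max_lag_s=0.001):
    """Lag (s) of the cross-correlation peak; positive if right lags left."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    if len(left) != len(right):
        raise ValueError("channels must have equal length")
    n = len(left)
    # xc[k] corresponds to lag (k - (n - 1)) samples of right relative to left.
    xc = np.correlate(right, left, mode="full")
    lags = np.arange(-(n - 1), n)
    keep = np.abs(lags) <= int(round(max_lag_s * sr))
    return float(lags[keep][np.argmax(xc[keep])] / sr)
```

For a distant, reverberant source the correlation peak is weak, which is consistent with the slide's remark that distant locations give random results.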


Error Analysis

• 627Mudd (record loc 5) is the tough set; look at classifier margins:

[Figure: classifier margins (-2 to 2) per clap for “m4[14:16] vs f627” and “m6[14:16] vs f627”, grouped by recording location and source (50-59 and 5n for recording location 5; 90-98 and 9n for recording location 9), with a seat-grid inset (0 / 1 2 3 / 4 5 6 / 7 8 9). Annotations: “false accepts for loc 6”; “a few solid false rejects... really look like far-field??? (ambiguous Eratio)”. Detail panels: margins for claps 0 to 50, plus waveform and spectrogram (time 0 to 2, frequency 0 to 2 x 10^4) of “Claps 33 and 34 from 627M:nf90”.]


Usefulness of Each Position

• Train on 50 near-field claps + 50 far-field claps from a single location:

  - all recorded at location 5
  - ‘behind’ (p7-p9) less useful
  - right-side (p3, p6) most useful !?

[Figure: bar chart, “Location comparison (Erat ftrs): train M327L5 one loc, test on all M627L5”; clap error rate / % (0 to 7) vs. far-field training examples location (p0-p4, p6-p9, all). Inset: seat grid p0 / p1 p2 p3 / p4 p5 p6 / p7 p8 p9.]