Sound Detection
Derek Hoiem, Rahul Sukthankar (mentor)
August 24, 2004
Objective
Learn a model of a sound object from a few (10-20) examples and distinguish it from all other sounds
Examples of sound classes: gunshots, screams, laughter, car horns, meows, dog barks, etc.
Applications
“Tell me if you hear a gunshot.” (monitoring)
“Get me video clips containing dogs barking.” (search and retrieval)
“What’s going on?” (scene understanding)
Why it's difficult
Sound classes have large variations
Sounds are often ambiguous without context
Overlaid “noise” obscures sound
Previous work
Sound Classification (Wold 1996, Casey 2001, etc.)
Categorize short sound clips
Reasonable accuracy (5-20% error)
Sound Detection (Dufaux 2000, Piamsa-nga 1999)
Localize and recognize sound objects in long clips
Poor performance, or assumption of unrealistic conditions (e.g., a very quiet background)
Detection via Windowed Search
[Figure: a long track is divided into Clip 1, Clip 2, …, Clip N, and each clip is passed to a clip classifier]
Break the audio track into short, overlapping clips
Independently classify each clip as object or non-object
Return the locations of detected sound objects
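The windowed search above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the window length `win`, the hop size `hop`, the toy classifier callback, and the hypothetical `merge_detections` helper (which joins overlapping positive windows into one reported location) are all assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) sample indices, end exclusive

def sliding_windows(n_samples: int, win: int, hop: int) -> List[Span]:
    """Overlapping windows of length `win`, advanced by `hop` samples."""
    if win <= 0 or hop <= 0:
        raise ValueError("win and hop must be positive")
    return [(s, s + win) for s in range(0, n_samples - win + 1, hop)]

def detect(track: Sequence[float], win: int, hop: int,
           classify_clip: Callable[[Sequence[float]], bool]) -> List[Span]:
    """Classify each short clip independently; keep the ones labeled 'object'."""
    return [(s, e) for s, e in sliding_windows(len(track), win, hop)
            if classify_clip(track[s:e])]

def merge_detections(hits: List[Span]) -> List[Span]:
    """Hypothetical post-processing: merge overlapping positive windows."""
    merged: List[Span] = []
    for s, e in sorted(hits):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged

def is_loud(clip: Sequence[float]) -> bool:
    """Toy stand-in for a trained clip classifier: 'loud' means object."""
    return max(abs(x) for x in clip) > 0.5

if __name__ == "__main__":
    # Toy track: silence, a short loud burst, silence.
    track = [0.0] * 10 + [1.0] * 4 + [0.0] * 10
    hits = detect(track, win=4, hop=2, classify_clip=is_loud)
    print(hits)                    # [(8, 12), (10, 14), (12, 16)]
    print(merge_detections(hits))  # [(8, 16)]
```

Because every clip is classified independently, the clip classifier can be swapped out (for example, for the boosted cascade described later) without changing the search loop.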
Representation
[Figure: example representations of meows and phone rings]
Raw representation: time-frequency analysis with a windowed Fourier transform
Extract the percentage of power in each frequency band over time, and the total power over time
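One way to compute this raw representation is sketched below, under several assumptions the slides do not state: a Hann window, equal-width frequency bands, and illustrative values for the frame length, hop, and number of bands. The function returns band shares as fractions (multiply by 100 for percentages). The direct O(N²) DFT keeps the example dependency-free; a real implementation would use an FFT.

```python
import cmath
import math
from typing import List, Sequence, Tuple

def hann(n: int) -> List[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def power_spectrum(frame: Sequence[float]) -> List[float]:
    """|X_k|^2 for k = 0..N/2 via a direct O(N^2) DFT (use an FFT in practice)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]

def band_power_features(signal: Sequence[float], frame_len: int, hop: int,
                        n_bands: int) -> Tuple[List[List[float]], List[float]]:
    """Per frame: share of power in each equal-width band, and total power."""
    window = hann(frame_len)
    fractions: List[List[float]] = []
    totals: List[float] = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [x * w for x, w in zip(signal[start:start + frame_len], window)]
        spec = power_spectrum(frame)
        # Split the spectrum bins into n_bands contiguous, roughly equal groups.
        edges = [round(b * len(spec) / n_bands) for b in range(n_bands + 1)]
        bands = [sum(spec[edges[b]:edges[b + 1]]) for b in range(n_bands)]
        total = sum(spec)
        fractions.append([p / total if total > 0 else 0.0 for p in bands])
        totals.append(total)
    return fractions, totals

if __name__ == "__main__":
    # A sine completing 4 cycles per 64-sample frame lands in the lowest band.
    sig = [math.sin(2 * math.pi * 4 * t / 64) for t in range(256)]
    fr, tot = band_power_features(sig, frame_len=64, hop=32, n_bands=4)
    print(len(fr), [round(p, 3) for p in fr[0]])  # 7 [1.0, 0.0, 0.0, 0.0]
```

Normalizing each frame by its total power separates spectral shape (the band shares) from loudness (the total power), which matches the two quantities listed on the slide.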
Features
Compute features used for classification
Classification Features
Diverse feature set: different sound classes are distinctive in different ways
Means and standard deviations of power at different frequencies
Bandwidth, peaks, loudness, etc.
138 features in all
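A few summary features of the kind listed above can be computed from one clip's band-power trajectories. This is a hypothetical subset for illustration, not the 138 features used in this work: the feature names, the use of decibels for loudness and power range, and defining "mid-time" as the middle third of the clip are all assumptions.

```python
import math
import statistics
from typing import Dict, Sequence

def clip_features(fractions: Sequence[Sequence[float]],
                  totals: Sequence[float]) -> Dict[str, float]:
    """Hypothetical summary features for one clip.

    fractions[t][b]: share of power in band b at frame t (rows sum to 1).
    totals[t]:       total power at frame t.
    """
    n_frames, n_bands = len(fractions), len(fractions[0])
    feats: Dict[str, float] = {}
    # Means and standard deviations of power at different frequencies.
    for b in range(n_bands):
        series = [row[b] for row in fractions]
        feats[f"band{b}_mean"] = statistics.fmean(series)
        feats[f"band{b}_std"] = statistics.pstdev(series)
    # Loudness and power amplitude range in dB (small offset avoids log(0)).
    db = [10.0 * math.log10(p + 1e-12) for p in totals]
    feats["loudness_mean_db"] = statistics.fmean(db)
    feats["power_range_db"] = max(db) - min(db)
    # Assumed "mid-time" feature: mean lowest-band share over the middle third.
    lo, hi = n_frames // 3, max(n_frames // 3 + 1, 2 * n_frames // 3)
    feats["band0_mid_mean"] = statistics.fmean(row[0] for row in fractions[lo:hi])
    return feats

if __name__ == "__main__":
    f = clip_features([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]], [1.0, 10.0, 100.0])
    print(round(f["power_range_db"], 3), f["band0_mid_mean"])  # 20.0 0.5
```

Each feature is a single number per clip, which is what a threshold test in a decision tree operates on.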
Classification by Decision Trees
Try to find simple rules that discriminate object from non-object
Each decision is based on a threshold on a feature value
At each leaf node, assign a confidence based on the likelihood of the data for the object and non-object classes
[Figure: example tree with labeled decision nodes and leaf nodes]
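The tree structure described above (threshold tests at decision nodes, confidences at leaves) can be sketched for weighted training data as follows. Assumptions not taken from the slides: labels are +1 (object) and -1 (non-object); the split criterion (sum over children of the square root of object weight times non-object weight) and the leaf confidence (half the log-ratio of weighted object to non-object mass) follow the confidence-rated boosting formulation of Schapire and Singer; `eps` is an illustrative smoothing constant.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

@dataclass
class Node:
    feature: Optional[int] = None   # None marks a leaf node
    threshold: float = 0.0          # decision: x[feature] <= threshold goes left
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    confidence: float = 0.0         # leaf output; > 0 favors "object"

def leaf_confidence(w_pos: float, w_neg: float, eps: float = 1e-6) -> float:
    """Half the smoothed log-ratio of object vs. non-object weight at a leaf."""
    return 0.5 * math.log((w_pos + eps) / (w_neg + eps))

def class_weights(y: Sequence[int], w: Sequence[float]) -> Tuple[float, float]:
    w_pos = sum(wi for yi, wi in zip(y, w) if yi == 1)
    w_neg = sum(wi for yi, wi in zip(y, w) if yi != 1)
    return w_pos, w_neg

def fit_tree(X: List[List[float]], y: List[int], w: List[float], depth: int) -> Node:
    w_pos, w_neg = class_weights(y, w)
    if depth == 0 or w_pos == 0 or w_neg == 0:
        return Node(confidence=leaf_confidence(w_pos, w_neg))

    def impurity(idx: List[int]) -> float:
        wp, wn = class_weights([y[i] for i in idx], [w[i] for i in idx])
        return math.sqrt(wp * wn)  # 0 when the child is pure

    best = None  # (score, feature, threshold, left indices, right indices)
    for f in range(len(X[0])):
        # Skip the largest value so both children are non-empty.
        for thr in sorted({x[f] for x in X})[:-1]:
            left = [i for i, x in enumerate(X) if x[f] <= thr]
            right = [i for i, x in enumerate(X) if x[f] > thr]
            score = impurity(left) + impurity(right)
            if best is None or score < best[0]:
                best = (score, f, thr, left, right)
    if best is None:  # every feature is constant: no split possible
        return Node(confidence=leaf_confidence(w_pos, w_neg))
    _, f, thr, left, right = best

    def child(idx: List[int]) -> Node:
        return fit_tree([X[i] for i in idx], [y[i] for i in idx],
                        [w[i] for i in idx], depth - 1)

    return Node(feature=f, threshold=thr, left=child(left), right=child(right))

def tree_confidence(node: Node, x: Sequence[float]) -> float:
    """Follow threshold decisions down to a leaf and return its confidence."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.confidence

if __name__ == "__main__":
    X, y = [[0.1], [0.2], [0.8], [0.9]], [-1, -1, 1, 1]
    tree = fit_tree(X, y, [0.25] * 4, depth=2)
    print(tree_confidence(tree, [0.85]) > 0, tree_confidence(tree, [0.15]) > 0)  # True False
```

Passing explicit per-example weights `w` is what lets the same learner be reused inside boosting, where the weights change from round to round.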
Boosted Trees
Problem: One decision tree by itself may not be a great classifier
Solution: Use several trees, with each one focusing on the mistakes of previously learned trees
AdaBoost:
Weight the training data uniformly
Learn a decision tree classifier on the weighted data
Re-weight the data, giving more weight to incorrectly classified examples (repeat these two steps for each new tree)
Final classification is based on a linear combination of the confidences from all learned decision trees
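The AdaBoost steps above can be sketched as follows. To keep the sketch short, each "tree" is a decision stump (a one-split tree), and the confidence-rated variant of Schapire and Singer is assumed: each stump outputs half the log-ratio of weighted object to non-object mass on its side of the threshold, weights are multiplied by exp(-y·h(x)) and renormalized, and the final score is the plain sum of stump confidences. Names such as `fit_stump` and `score` are illustrative, and the number of rounds is a free parameter.

```python
import math
from typing import List, Sequence, Tuple

# A stump: (feature index, threshold, confidence if x[f] <= thr, confidence otherwise).
Stump = Tuple[int, float, float, float]

def _confidence(w_pos: float, w_neg: float, eps: float = 1e-6) -> float:
    """Half the smoothed log-ratio of object vs. non-object weight."""
    return 0.5 * math.log((w_pos + eps) / (w_neg + eps))

def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int],
              w: Sequence[float]) -> Stump:
    """Pick the split minimizing Z = sum over both sides of sqrt(W+ * W-)."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            sides = {True: [0.0, 0.0], False: [0.0, 0.0]}  # (x[f] <= thr) -> [W+, W-]
            for xi, yi, wi in zip(X, y, w):
                sides[xi[f] <= thr][0 if yi == 1 else 1] += wi
            z = sum(math.sqrt(wp * wn) for wp, wn in sides.values())
            if best is None or z < best[0]:
                best = (z, (f, thr, _confidence(*sides[True]), _confidence(*sides[False])))
    return best[1]

def stump_output(s: Stump, x: Sequence[float]) -> float:
    f, thr, c_le, c_gt = s
    return c_le if x[f] <= thr else c_gt

def adaboost(X: Sequence[Sequence[float]], y: Sequence[int], rounds: int) -> List[Stump]:
    n = len(X)
    w = [1.0 / n] * n                     # 1. weight training data uniformly
    model: List[Stump] = []
    for _ in range(rounds):
        s = fit_stump(X, y, w)            # 2. learn a weak tree on weighted data
        model.append(s)
        # 3. re-weight: examples with y * h(x) < 0 (misclassified) gain weight
        w = [wi * math.exp(-yi * stump_output(s, xi)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def score(model: Sequence[Stump], x: Sequence[float]) -> float:
    """4. Final score: sum of the stumps' confidences; > 0 means object."""
    return sum(stump_output(s, x) for s in model)

if __name__ == "__main__":
    # Objects occupy a middle range of feature values, so no single stump separates them.
    X = [[0.1], [0.2], [0.4], [0.5], [0.6], [0.8], [0.9]]
    y = [-1, -1, 1, 1, 1, -1, -1]
    model = adaboost(X, y, rounds=2)
    print([score(model, x) > 0 for x in X])  # [False, False, True, True, True, False, False]
```

In the toy example, the second stump concentrates on the examples the first one got wrong, which is exactly the "focus on previous mistakes" behavior described above.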
Examples of Decision Trees
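[Figure: example learned trees for the meow and gunshot classes. Decision-node labels include "Low percentage of power in low frequencies in mid-time of sound", "Very high power amplitude range", and "High power amplitude range". A second, more complex gunshot tree focuses on the examples misclassified by the tree above it.]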
Cascade of Classifiers
Goal: eliminate false positives in the early stages while producing few false negatives
Advantages: allows use of a large set of negative training examples; improves classification speed
Danger: the cascade cannot recover from false negatives, since a clip rejected at any stage is never seen by later stages
[Figure: Sound Clip → Stage 1 → Stage 2 → Stage 3 → Pass; a clip that fails any stage is rejected (Fail). Pass labels: Stage 1 (5%), Stage 2 (2%), Stage 3 (0.005%)]
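A cascade like the one in the figure can be sketched as a list of (score function, threshold) stages evaluated in order, with an early exit on the first failure. The helper `threshold_for_detection_rate` illustrates one common way, assumed here rather than taken from the slides, to keep false negatives low: set each stage's threshold so that at least a chosen fraction of positive validation clips still pass.

```python
import math
from typing import Callable, Sequence, Tuple

# One stage = (scoring function, acceptance threshold).
Stage = Tuple[Callable[[Sequence[float]], float], float]

def cascade_classify(stages: Sequence[Stage], x: Sequence[float]) -> Tuple[bool, int]:
    """Return (accepted, stages evaluated). A rejection at any stage is final."""
    for i, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(x) < threshold:
            return False, i   # Fail: later stages never see this clip
    return True, len(stages)  # Passed every stage

def threshold_for_detection_rate(pos_scores: Sequence[float], min_detect: float) -> float:
    """Highest threshold that still passes at least `min_detect` of the positive clips."""
    ranked = sorted(pos_scores, reverse=True)
    k = max(1, math.ceil(min_detect * len(ranked)))
    return ranked[k - 1]

if __name__ == "__main__":
    stages = [(lambda x: x[0], 0.5), (lambda x: x[1], 0.5)]
    print(cascade_classify(stages, [0.2, 0.9]))  # (False, 1)
    print(cascade_classify(stages, [0.7, 0.9]))  # (True, 2)
    print(threshold_for_detection_rate([0.9, 0.8, 0.7, 0.6, 0.5], 0.99))  # 0.5
```

The early return is where the speed advantage comes from: most non-object clips are rejected after evaluating only the first, cheapest stage. It is also why a false negative at any stage cannot be recovered.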
Results: Classification Error
[Figure: Average Error vs. Stages in Cascade; positive and negative error (0% to 10%) after stages 1, 2, and 3]
pos = error on positive (object) clips; neg = error on negative (non-object) clips. Rows run from best performance (top) to worst performance (bottom).

              stage 1         stage 2         stage 3
class         pos     neg     pos     neg     pos     neg
meow          0.0%    1.4%    0.0%    1.2%    2.2%    0.8%
phone         0.0%    0.4%    4.3%    0.1%    5.9%    0.0%
car horn      0.0%    3.9%    0.6%    2.2%    3.6%    1.3%
door bell     1.4%    2.1%    2.1%    0.4%    6.3%    0.1%
swords        6.1%    1.3%    6.7%    0.1%    6.7%    0.0%
scream        0.3%    5.5%    2.7%    1.4%    5.3%    1.1%
dog bark      0.7%    1.0%    6.0%    0.3%    7.7%    0.2%
laser gun     0.0%    6.8%    4.4%    5.1%    6.7%    0.9%
explosion     4.1%    5.2%    7.5%    1.5%    12.0%   0.5%
light saber   4.8%    6.8%    9.7%    1.0%    13.9%   0.2%
gunshot       8.1%    6.1%    12.5%   2.3%    14.5%   1.1%
close door    7.9%    7.8%    14.5%   4.8%    17.6%   2.3%
male laugh    4.3%    14.7%   9.5%    9.7%    13.3%   7.0%
average       2.9%    4.4%    6.0%    2.2%    8.5%    1.1%