hands on: multimedia methods for large scale video ...fractor/fall2012/cs294-7-2012.pdf · 4 power...
Hands On: Multimedia Methods for Large Scale Video Analysis (Lecture)
Dr. Gerald Friedland
Today
• More on Audio Features
• Recap: Some Basic Machine Learning
• Some Error Metrics
More on Features
• Mel-Frequency Cepstral Coefficients (MFCC)
Other (not explained here):
• LPC (Linear Prediction Coefficients)
• PLP (Perceptual Linear Predictive) Features
• RASTA (see Morgan et al.)
• MSG (Modulation Spectrogram)
MFCC: Idea
Power cepstrum of the signal:
Audio Signal → Pre-emphasis → Windowing → FFT → Mel-Scale Filterbank → Log-Scale → DCT → MFCC
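The pipeline above can be sketched for a single frame in plain Python. This is a minimal illustration, not the implementation used in the lecture: the function names, the pre-emphasis factor 0.97, the Hamming window, the 20 triangular filters, the mel formula and the choice to skip coefficient 0 are all assumptions, and a naive DFT stands in for the FFT.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """Common mel-scale approximation (one of several in use; an assumption here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=12, alpha=0.97):
    n = len(frame)
    # 1. Pre-emphasis: first-order difference that boosts high frequencies.
    emphasized = [frame[0]] + [frame[t] - alpha * frame[t - 1] for t in range(1, n)]
    # 2. Windowing: Hamming window to reduce spectral leakage.
    windowed = [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * t / (n - 1)))
                for t, s in enumerate(emphasized)]
    # 3. FFT stage: naive O(n^2) DFT, power spectrum for bins 0..n/2.
    n_bins = n // 2 + 1
    power = []
    for k in range(n_bins):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(acc) ** 2 / n)
    # 4. Mel-scale filterbank: triangular filters equally spaced in mel.
    top_mel = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top_mel * i / (n_filters + 1)) * n / sample_rate
             for i in range(n_filters + 2)]  # filter edges as fractional bin indices
    log_energies = []
    for j in range(1, n_filters + 1):
        left, center, right = edges[j - 1], edges[j], edges[j + 1]
        energy = 0.0
        for k in range(n_bins):
            if left < k <= center:
                energy += power[k] * (k - left) / (center - left)
            elif center < k < right:
                energy += power[k] * (right - k) / (right - center)
        # 5. Log-scale (the small floor avoids log(0) for empty filters).
        log_energies.append(math.log(energy + 1e-10))
    # 6. DCT-II of the log filterbank energies; coefficient 0 is skipped here.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energies))
            for c in range(1, n_coeffs + 1)]


# One 32 ms frame of a 440 Hz tone sampled at 8 kHz.
frame = [math.sin(2.0 * math.pi * 440.0 * t / 8000.0) for t in range(256)]
coeffs = mfcc_frame(frame, sample_rate=8000)
```

In practice a library FFT and DCT would replace the two explicit loops; the structure of the six stages stays the same.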
MFCC: Mel Scale
MFCC: Result
MFCC Variants and Derivatives
Derivatives:
• LFCC (no Mel scale)
• AMFCC (anti Mel scale)
Parameters:
• MFCC12 (often used for ASR)
• MFCC19 (often used in speaker ID, diarization)
• “delta”: coefficients subtracted (“first derivative”)
• “deltadelta”: “second derivative”
• Short term: usually calculated on a 10-50 ms window
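The "delta" and "deltadelta" features can be illustrated by subtracting the coefficient vectors of neighboring frames. This sketch assumes the simplest possible definition, a difference between successive frames; real front ends often use a regression over several frames instead, and the function name `delta` is hypothetical.

```python
def delta(frames):
    """Frame-to-frame difference of coefficient vectors ("first derivative")."""
    return [[cur - prev for prev, cur in zip(previous, current)]
            for previous, current in zip(frames, frames[1:])]


# Three frames with two coefficients each (toy values).
mfcc = [[1.0, 2.0], [1.5, 1.0], [3.0, 1.0]]
d = delta(mfcc)    # "delta"
dd = delta(d)      # "deltadelta": the same operation applied to the deltas
```

Each application shortens the sequence by one frame; padding at the edges is a design choice left out here.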
Typical Machine Learning for Audio Analysis
Today:
• Gaussian Mixture Models
• Bayesian Information Criterion
Later:
• HMMs/FSAs
@home:
• Supervector Approaches
Recap: Architecture of Content Analysis Algorithms
The Data...
• ...should be plenty (there is no data like more data).
• Training set and test set must be different.
• The training set should consist of a representative sample for good results.
• If there is not enough data, significance must be tested.
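Two of the points above lend themselves to a small sketch: keeping training and test data disjoint, and testing significance when the test set is small. The exact McNemar test used below is one possible choice and is an assumption, not something the slides prescribe; the function names are hypothetical.

```python
import random
from math import comb


def split(items, test_fraction=0.2, seed=0):
    """Shuffle once and cut, so training and test sets cannot overlap."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value for two systems on the same test items.

    b: items system A classified correctly and system B incorrectly.
    c: items system B classified correctly and system A incorrectly.
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


train, test = split(range(100))
p_value = mcnemar_exact_p(10, 0)
```

A small p-value suggests the difference between the two systems is unlikely to be due to chance on this test set.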
Test/train data mismatch that will deteriorate accuracy
• Channel mismatch
• Domain mismatch
• Unseen test data
• Too many parameters in the training model (overfitting)
Types of Algorithms
• Classification/Identification
• Verification/Detection
• Estimation/Regression
Ground Truth
• Is never 100% accurate.
• Annotator agreement should be measured for high-accuracy tasks and low-confidence annotators.
Reminder: K-Means
Algorithm Outline (Expectation Maximization)
Choose k initial means µi at random
loop
    for all samples xj:
        assign membership of each element to a mean (closest mean)
    for all means µi:
        calculate a new µi by averaging all values xj that were assigned as members
until means µi are not updated significantly anymore
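The outline above translates almost line by line into Python. This is a minimal sketch under a few assumptions: samples are tuples of floats, distance is squared Euclidean, an empty cluster keeps its old mean, and the `init` parameter (not part of the outline) allows fixed starting means for reproducibility.

```python
import random


def kmeans(samples, k, init=None, max_iter=100, tol=1e-9, seed=0):
    # Choose k initial means at random unless explicit ones are given.
    means = list(init) if init is not None else random.Random(seed).sample(samples, k)
    for _ in range(max_iter):
        # Assignment step: each sample becomes a member of its closest mean.
        clusters = [[] for _ in range(k)]
        for x in samples:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2 for a, b in zip(x, means[i])))
            clusters[nearest].append(x)
        # Update step: each mean becomes the average of its members.
        new_means = []
        for i, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_means.append(tuple(sum(m[d] for m in members) / len(members)
                                       for d in range(dim)))
            else:
                new_means.append(means[i])  # empty cluster: keep the old mean
        # Stop when the means are not updated significantly anymore.
        shift = max(sum((a - b) ** 2 for a, b in zip(old, new))
                    for old, new in zip(means, new_means))
        means = new_means
        if shift < tol:
            break
    return means


samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
means = kmeans(samples, k=2, init=[(0.0, 0.0), (0.1, 0.0)])
```

Even with both starting means inside the same group, the two means end up near the centers of the two groups for this toy data; on harder data the result depends on the initialization.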
Reminder: Gaussian Mixtures
Reminder: Training of Mixture Models
Goal: Find ai for
Expectation:
Maximization:
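As an illustration of the two steps, here is a sketch of EM for a one-dimensional Gaussian mixture. It assumes that ai denotes the mixture weights and that the means and variances are re-estimated together with them; the slide's own formulas are not reproduced here, and all function names and the variance floor are assumptions.

```python
import math


def gaussian(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    return sum(math.log(sum(w * gaussian(x, m, v)
                            for w, m, v in zip(weights, means, variances)))
               for x in data)


def em_step(data, weights, means, variances):
    k = len(weights)
    # Expectation: responsibility of each component for each sample.
    resp = []
    for x in data:
        joint = [weights[i] * gaussian(x, means[i], variances[i]) for i in range(k)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # Maximization: re-estimate weights, means and variances from the responsibilities.
    new_w, new_m, new_v = [], [], []
    for i in range(k):
        n_i = sum(r[i] for r in resp)
        mean = sum(r[i] * x for r, x in zip(resp, data)) / n_i
        var = sum(r[i] * (x - mean) ** 2 for r, x in zip(resp, data)) / n_i
        new_w.append(n_i / len(data))
        new_m.append(mean)
        new_v.append(max(var, 1e-6))  # floor keeps a component from collapsing
    return new_w, new_m, new_v


data = [-1.1, -1.0, -0.9, 0.9, 1.0, 1.1]
params = ([0.5, 0.5], [-0.5, 0.5], [1.0, 1.0])
history = [log_likelihood(data, *params)]
for _ in range(30):
    params = em_step(data, *params)
    history.append(log_likelihood(data, *params))
weights, means, variances = params
```

The log-likelihood recorded in `history` should not decrease from one iteration to the next, which is a useful sanity check when implementing EM.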
Magic Duo
~ 90% of audio papers use the combination of MFCCs and Gaussian Mixture Models to model audio signals!
Bayesian Information Criterion = “Acoustic Edge Detector”
$$\mathrm{BIC} = \log p(X \mid \Theta) - \frac{\lambda}{2} K \log N$$
where X is the sequence of features for a segment, Θ are the parameters of the statistical model for the segment, K is the number of parameters for the model, N is the number of frames in the segment, and λ is an optimization parameter.
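To show why BIC can act as an "acoustic edge detector", the sketch below compares one Gaussian for a whole segment against one Gaussian per half. It assumes the usual penalized log-likelihood form of BIC, one-dimensional features and single Gaussians with K = 2 parameters each; real systems typically use multivariate MFCC vectors, and the function names are hypothetical.

```python
import math


def gaussian_log_likelihood(x):
    """Log-likelihood of x under its own maximum-likelihood 1-D Gaussian."""
    n = len(x)
    mean = sum(x) / n
    var = max(sum((v - mean) ** 2 for v in x) / n, 1e-6)
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def delta_bic(x, split, lam=1.0, n_params=2):
    """Positive values favor a change point at index `split`.

    Splitting the segment adds one extra Gaussian, i.e. n_params extra
    parameters, which is penalized by (lam / 2) * n_params * log(N).
    """
    gain = (gaussian_log_likelihood(x[:split]) + gaussian_log_likelihood(x[split:])
            - gaussian_log_likelihood(x))
    return gain - 0.5 * lam * n_params * math.log(len(x))


quiet = [0.0, 0.1, -0.1, 0.05, -0.05] * 4
loud = [5.0, 5.1, 4.9, 5.05, 4.95] * 4
with_edge = delta_bic(quiet + loud, split=20)
without_edge = delta_bic(quiet + quiet, split=20)
```

Sliding the candidate split over a stream and looking for positive peaks is one way to use this; λ trades missed changes against false alarms.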
Bayesian Information Criterion: Explanation
• BIC penalizes the complexity of the model (measured by the number of parameters in the model).
• BIC measures the efficiency of the parameterized model in terms of predicting the data.
Bayesian Information Criterion: Properties
• BIC is a minimum description length criterion.
• BIC is independent of the prior.
• It is closely related to other penalized likelihood criteria such as RIC and the Akaike information criterion.
Some Error Metrics
• Classification error
• The types of errors
• ROC/DET Curve
• Precision/Recall, F-Measure
• Word Error Rate
Classification Error
error = (wrong classifications) / (total classifications)
• Usually expressed in %
• Simplest and most popular metric
Types of Errors
ROC Curve
• Invented in the 1940s (radar detection accuracy)
• Said to have become very popular after the Pearl Harbor incident
Receiver-Operator Characteristics: True Positive Rate vs. False Positive Rate
True Positive Rate (TPR) = TP / P = TP / (TP + FN)
False Positive Rate (FPR) = FP / N = FP / (FP + TN)
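The points of an ROC curve can be computed by sweeping a decision threshold over the detector scores. A minimal sketch, assuming higher scores mean "more likely positive" and that both classes occur in the labels; `roc_points` is a hypothetical name.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per threshold, starting at (0, 0)."""
    p = sum(1 for label in labels if label)   # number of positives
    n = len(labels) - p                       # number of negatives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, label in zip(scores, labels) if s >= threshold and label)
        fp = sum(1 for s, label in zip(scores, labels) if s >= threshold and not label)
        points.append((fp / n, tp / p))       # (FP / N, TP / P)
    return points


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [True, True, False, True, False, False]
points = roc_points(scores, labels)
```

Plotting TPR against FPR for these points gives the curve; a perfect detector passes through (0, 1).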
DET Curve
• Detection Error Tradeoff: Miss (= FN) vs. False Alarm (= FP), non-linearly scaled
• Very useful for detection tasks (threshold tuning)
• Very popular in the retrieval community
• Equal Error Rate: point at which the miss rate equals the false alarm rate (FN = FP)
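The Equal Error Rate can be approximated from a finite set of scores by sweeping the threshold and picking the point where miss rate and false alarm rate are closest. This is a sketch under assumptions: higher scores mean "target", both classes are present, and the EER is reported as the average of the two rates at the closest threshold (interpolation between thresholds is omitted).

```python
def equal_error_rate(scores, labels):
    """Return (approximate EER, threshold) for detector scores and boolean labels."""
    p = sum(1 for label in labels if label)
    n = len(labels) - p
    best = None
    for threshold in sorted(set(scores)):
        # Miss: a target scored below the threshold (false negative).
        miss = sum(1 for s, label in zip(scores, labels) if label and s < threshold) / p
        # False alarm: a non-target scored at or above the threshold (false positive).
        false_alarm = sum(1 for s, label in zip(scores, labels)
                          if not label and s >= threshold) / n
        gap = abs(miss - false_alarm)
        if best is None or gap < best[0]:
            best = (gap, (miss + false_alarm) / 2.0, threshold)
    return best[1], best[2]


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [True, True, False, True, False, False]
eer, threshold = equal_error_rate(scores, labels)
```

The same sweep yields all (false alarm, miss) pairs needed for a DET plot; only the non-linear axis scaling is left out.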
Precision/Recall
• Recall = True Positive Rate = TP / (TP + FN)
• Precision = TP / (TP + FP)
• Became popular because of
F-Measure
• Two numbers are hard to compare => F-Measure
• Harmonic mean of Precision and Recall: F = 2 · Precision · Recall / (Precision + Recall)
• Highly debated
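A small worked example shows how the harmonic mean combines the two numbers. The sketch assumes the standard definitions Precision = TP / (TP + FP) and Recall = TP / (TP + FN), and returns 0 where a ratio would be undefined, which is a convention chosen here.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# 10 items returned, 8 of them relevant; 16 relevant items exist in total.
precision, recall, f_measure = precision_recall_f(tp=8, fp=2, fn=8)
```

Here precision is 0.8 and recall is 0.5; the harmonic mean (about 0.62) sits closer to the lower of the two than the arithmetic mean (0.65) would.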
Word Error Rate
Metric for comparing speech recognizers:
$$\mathrm{WER} = \frac{S + D + I}{N}$$
where:
• S is the number of substitutions,
• D is the number of deletions,
• I is the number of insertions,
• N is the number of words in the reference.
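The sum S + D + I is the word-level edit distance between reference and hypothesis, so the metric can be sketched with the classic dynamic-programming alignment. The sketch assumes whitespace tokenization, equal cost for all three error types and a non-empty reference; `word_error_rate` is a hypothetical name.

```python
def word_error_rate(reference, hypothesis):
    ref = reference.split()
    hyp = hypothesis.split()
    # dist[i][j]: minimum number of edits turning ref[:i] into hyp[:j].
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,                  # deletion
                             dist[i][j - 1] + 1,                  # insertion
                             dist[i - 1][j - 1] + substitution)   # match or substitution
    return dist[-1][-1] / len(ref)          # (S + D + I) / N


wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

In this example one word is substituted ("sat" becomes "sit") and one is deleted ("the"), giving 2 errors over 6 reference words. Because insertions count as errors, the rate can exceed 100%.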
Next Week (Project Meeting)
• SeJITs
• Project Idea Sketches (from groups)
Next Week (Lecture)
• Visual Content Analysis