Lecture 6 - 4-May-15 - Fei-Fei Li, Alexandre Alahi, Vignesh Ramanathan
Project 2 Q&A
Alexandre Alahi Vignesh Ramanathan
Outline
• TLD Review
• Error metrics
• Code Overview
• Project 2 Report
• Project 2 Presentations
TLD review
• Tracker & Detector (T&D) run in parallel
• Both contribute to the output
• "Not visible" is a possible output
• Updates of T&D depend on the Learning module (L)
TLD: Tracking
• Median-shift tracker: estimates translation & scale
• Tracker validation: the detector is updated if the track is forward-backward consistent
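The forward-backward consistency test can be sketched as follows. This is a minimal illustration, not the starter code: `track_fwd` and `track_bwd` are hypothetical callables standing in for one LK step from frame t to t+1 and back, and the median cutoff is an assumed filtering rule.

```python
import math
from statistics import median

def forward_backward_errors(points, track_fwd, track_bwd):
    """Return the forward-backward error of each point.

    track_fwd and track_bwd are hypothetical stand-ins for one LK step
    from frame t to t+1 and from frame t+1 back to t.
    """
    errors = []
    for p in points:
        q = track_fwd(p)        # position in frame t+1
        p_back = track_bwd(q)   # tracked back into frame t
        errors.append(math.dist(p, p_back))
    return errors

def reliable_points(points, track_fwd, track_bwd):
    """Keep points whose error is at or below the median error (assumed rule)."""
    errors = forward_backward_errors(points, track_fwd, track_bwd)
    cutoff = median(errors)
    return [p for p, e in zip(points, errors) if e <= cutoff]
```

A point that returns close to where it started after the round trip is treated as reliably tracked; points with a large round-trip error are dropped before the box motion is estimated.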
TLD: Detection
• Three stages:
– 1st stage: filtering (patch variance)
– 2nd stage: detection model
– 3rd stage: classifier (NN, NCC); confidence = d⁻/(d⁻ + d⁺)
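The third-stage confidence can be sketched as below. This is a sketch under stated assumptions: patches are flat lists of gray values, d⁺ and d⁻ are taken as one minus the best similarity to the stored positive and negative templates, and NCC is mapped from [-1, 1] to [0, 1]. The function names are illustrative and are not part of the starter code.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equal-length patches (flat lists)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [x - mean_b for x in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(x * x for x in db))
    if denom == 0.0:
        return 0.0  # flat patch: correlation undefined, treat as no match
    return sum(x * y for x, y in zip(da, db)) / denom

def nn_confidence(patch, positives, negatives):
    """confidence = d- / (d- + d+), with d = 1 - best similarity (assumed)."""
    def distance(templates):
        # Map NCC from [-1, 1] to a similarity in [0, 1]; assumed convention.
        best = max(0.5 * (ncc(patch, t) + 1.0) for t in templates)
        return 1.0 - best
    d_pos = distance(positives)
    d_neg = distance(negatives)
    if d_pos + d_neg == 0.0:
        return 0.5  # equally close to both sets
    return d_neg / (d_neg + d_pos)
```

A patch identical (up to brightness and contrast) to a positive template gets a confidence near 1, and one that matches a negative template gets a confidence near 0.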
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 6, NO. 1, JANUARY 2010 8
[Figure: cascade of (1) patch variance, (2) ensemble classifier, (3) 1-NN classifier; each stage splits patches into accepted and rejected.]
Fig. 9. Block diagram of the object detector.
5.3 Object detector
The detector scans the input image by a scanning-window and for each patch decides about presence or absence of the object.
Scanning-window grid. We generate all possible scales and shifts of an initial bounding box with the following parameters: scales step = 1.2, horizontal step = 10% of width, vertical step = 10% of height, minimal bounding box size = 20 pixels. This setting produces around 50k bounding boxes for a QVGA image (240x320), the exact number depends on the aspect ratio of the initial bounding box.
Cascaded classifier. As the number of bounding boxes to be evaluated is large, the classification of every single patch has to be very efficient. A straightforward approach of directly evaluating the NN classifier is problematic as it involves evaluation of the Relative similarity (i.e. search for two nearest neighbours). As illustrated in figure 9, we structure the classifier into three stages: (i) patch variance, (ii) ensemble classifier, and (iii) nearest neighbor. Each stage either rejects the patch in question or passes it to the next stage. In our previous work [59] we used only the first two stages. Later on, we observed that the performance improves if the third stage is added. Templates allowed us to better estimate the reliability of the detection.
5.3.1 Patch variance
Patch variance is the first stage of our cascade. This stage rejects all patches, for which gray-value variance is smaller than 50% of variance of the patch that was selected for tracking. The stage exploits the fact that gray-value variance of a patch p can be expressed as $E(p^2) - E^2(p)$, and that the expected value $E(p)$ can be measured in constant time using integral images [35]. This stage typically rejects more than 50% of non-object patches (e.g. sky, street). The variance threshold restricts the maximal appearance change of the object. However, since the parameter is easily interpretable, it can be adjusted by a user for particular application. In all of our experiments we kept it constant.
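The constant-time variance test can be sketched with two integral images, one over the pixel values and one over their squares. This is an illustrative sketch only: images are lists of rows, and the function names and the `ratio` parameter are assumptions rather than part of any released code.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, x, y, w, h):
    """Sum over the w-by-h box whose top-left pixel is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def patch_variance(ii, ii_sq, x, y, w, h):
    """Variance as E(p^2) - E(p)^2 from two integral images."""
    n = w * h
    mean = box_sum(ii, x, y, w, h) / n
    mean_sq = box_sum(ii_sq, x, y, w, h) / n
    return mean_sq - mean * mean

def passes_variance_filter(var, init_var, ratio=0.5):
    """Stage 1: keep a patch only if its variance reaches ratio * init_var."""
    return var >= ratio * init_var
```

Each box sum needs four table lookups regardless of box size, which is what makes this stage cheap enough to run on every box of the scanning grid.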
5.3.2 Ensemble classifier
Ensemble classifier is the second stage of our detector. The input to the ensemble is an image patch that was not rejected by the variance filter. The ensemble consists of n base classifiers. Each base classifier i performs a number of pixel comparisons on the patch resulting in a binary code x, which indexes to an array of posteriors $P_i(y|x)$, where $y \in \{0, 1\}$. The posteriors of individual base classifiers are averaged and the ensemble classifies the patch as the object if the average posterior is larger than 50%.
[Figure: the input image and bounding box are blurred, pixel comparisons are measured on the blurred patch, and the outputs form a binary code.]
Fig. 10. Conversion of a patch to a binary code.
Pixel comparisons. Every base classifier is based on a set of pixel comparisons. Similarly as in [60], [61], [62], the pixel comparisons are generated offline at random and stay fixed in run-time. First, the image is convolved with a Gaussian kernel with standard deviation of 3 pixels to increase the robustness to shift and image noise. Next, the predefined set of pixel comparison is stretched to the patch. Each comparison returns 0 or 1 and these measurements are concatenated into x.
Generating pixel comparisons. The vital element of ensemble classifiers is the independence of the base classifiers [63]. The independence of the classifiers is in our case enforced by generating different pixel comparisons for each base classifier. First, we discretize the space of pixel locations within a normalized patch and generate all possible horizontal and vertical pixel comparisons. Next, we permutate the comparisons and split them into the base classifiers. As a result, every classifier is guaranteed to be based on a different set of features and all the features together uniformly cover the entire patch. This is in contrast to standard approaches [60], [61], [62], where every pixel comparison is generated independent of other pixel comparisons.
Posterior probabilities. Every base classifier i maintains a distribution of posterior probabilities $P_i(y|x)$. The distribution has $2^d$ entries, where d is the number of pixel comparisons. We use 13 comparison, which gives 8192 possible codes that index to the posterior probability. The probability is estimated as $P_i(y|x) = \frac{\#p}{\#p + \#n}$, where #p and #n correspond to number of positive and negative patches, respectively, that were assigned the same binary code.
Initialization and update. In the initialization stage, all base posterior probabilities are set to zero, i.e. vote for negative class. During run-time the ensemble classifier is updated as follows. The labeled example is classified by the ensemble and if the classification is incorrect, the corresponding #p and #n are updated which consequently updates $P_i(y|x)$.
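The ensemble described above can be sketched as a small set of count tables. This is a minimal sketch, not the authors' implementation: the `Fern` class name, the pixel-pair representation, and the omission of the Gaussian blur are all simplifying assumptions.

```python
class Fern:
    """One base classifier: d pixel comparisons index a table of counts."""

    def __init__(self, comparisons):
        # comparisons: list of ((x1, y1), (x2, y2)) pixel pairs, fixed at run-time.
        self.comparisons = comparisons
        size = 2 ** len(comparisons)
        self.pos = [0] * size  # #p per binary code
        self.neg = [0] * size  # #n per binary code

    def code(self, patch):
        """Concatenate the 0/1 comparison results into an integer code x."""
        x = 0
        for (x1, y1), (x2, y2) in self.comparisons:
            bit = 1 if patch[y1][x1] > patch[y2][x2] else 0
            x = (x << 1) | bit
        return x

    def posterior(self, patch):
        """P(y=1|x) = #p / (#p + #n); zero before any example is seen."""
        x = self.code(patch)
        total = self.pos[x] + self.neg[x]
        return self.pos[x] / total if total else 0.0

    def update(self, patch, is_object):
        """Increment #p or #n for the code of this patch."""
        x = self.code(patch)
        if is_object:
            self.pos[x] += 1
        else:
            self.neg[x] += 1

def ensemble_posterior(ferns, patch):
    """Average the posteriors of the base classifiers."""
    return sum(f.posterior(patch) for f in ferns) / len(ferns)

def classify(ferns, patch):
    """Object if the average posterior is larger than 50%."""
    return ensemble_posterior(ferns, patch) > 0.5

def train(ferns, patch, is_object):
    """Update counts only when the ensemble misclassifies the labeled patch."""
    if classify(ferns, patch) != is_object:
        for f in ferns:
            f.update(patch, is_object)
```

With d = 13 comparisons each table would have 8192 entries; the sketch works for any d.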
5.3.3 Nearest neighbor classifier
After filtering the patches by the variance filter and the ensemble classifier, we are typically left with several bounding boxes that are not decided yet (≈ 50). Therefore, we can use the online model and classify the patch using a NN classifier. A patch is classified as the object if $S^r(p, M) > \theta_{NN}$, where $\theta_{NN} = 0.6$. This parameter has been set empirically and its value is not critical. We observed that similar performance is achieved in the range (0.5-0.7). The positively classified patches represent the responses of the object detector. When the number of templates in NN classifier exceeds some threshold (given by memory), we use random forgetting of templates. We observed that the number of templates stabilizes around
Slide credit from D. Capel (http://vision.cse.psu.edu/seminars/talks/2009/random_tff/ForestsAndFernsTalk.pdf)
TLD: Learning
• P-constraints: patches close to the trajectory update the detector with a Positive label
• N-constraints: non-maximally confident detections update the detector with a Negative label
• Both constraints make errors.
TLD: Integrator
Tracker | Detector | Integrator
Found box | Found box | ?
No box | Found box | ?
Found box | No box | ?
No box | No box | ?
You need to implement the output.
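One possible way to fill in the four tracker/detector cases is sketched below. This is only an illustration of the shape of an integrator, not the provided one: the function name, the box format, and the rule of trusting the more confident source are all assumptions, and a better strategy is part of the assignment.

```python
def integrate(tracker_box, tracker_conf, detector_box, detector_conf):
    """Return (box, source) for one frame; box is None when not visible.

    Boxes are (x1, y1, x2, y2) tuples or None. The tie-breaking rule is an
    assumption made for illustration only.
    """
    if tracker_box is None and detector_box is None:
        return None, "none"
    if detector_box is None:
        return tracker_box, "tracker"
    if tracker_box is None:
        return detector_box, "detector"
    # Both found a box: trust whichever is more confident.
    if detector_conf > tracker_conf:
        return detector_box, "detector"
    return tracker_box, "tracker"
```

Returning the source alongside the box is a design choice that lets the learning step know whether the output relied on the tracker.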
TLD: Learning (init)
• For 1st frame:
– Sample 200 P
• For other frames:
– Sample 100 P
TLD: Learning (model update)
• Augment both P & N when:
– the patch is wrongly classified by the NN ⇔ the integrator relies on the tracker response
• The NN uses a threshold to determine P & N patches

Integrator | NN | Retain or discard
P | N | Retain as P
N | P | Retain as N
P | P | Discard
N | N | Discard
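The retain/discard table can be written as a short function. This is a sketch under assumptions: labels are the strings "P" and "N", and `pex` and `nex` are plain lists standing in for the stored positive and negative patches.

```python
def update_nn_model(pex, nex, patch, integrator_label, nn_label):
    """Apply the retain/discard table to one patch.

    pex and nex are lists of stored positive and negative patches
    (hypothetical stand-ins for the model); labels are "P" or "N".
    """
    if integrator_label == nn_label:
        return "discard"        # the NN already agrees: nothing to learn
    if integrator_label == "P":
        pex.append(patch)
        return "retain as P"
    nex.append(patch)
    return "retain as N"
```

Only patches on which the NN disagrees with the integrator are stored, which keeps the model from growing with redundant examples.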
TLD Questions?
Deviation from ground-truth
[Figure: three frames, each showing the ground-truth bounding box and the bounding box from TLD with its confidence.]
• Compute overlap as (Intersection area)/(Union area)
• Conf=0.9: Overlap = 0.7
• Conf=0.2: Overlap = 0.55
• Conf=0.7: Overlap = 0.15
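The overlap measure can be sketched as below. The box format (x1, y1, x2, y2) with continuous coordinates is an assumption; the provided bb_overlap.cpp may use a different convention.

```python
def overlap(box_a, box_b):
    """Intersection area over union area for boxes given as (x1, y1, x2, y2)."""
    ix = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    iy = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    if ix <= 0 or iy <= 0:
        return 0.0  # boxes do not intersect
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def average_overlap(tracked, ground_truth):
    """Mean overlap over all frames."""
    total = sum(overlap(t, g) for t, g in zip(tracked, ground_truth))
    return total / len(ground_truth)
```

For example, two 10-by-10 boxes shifted by half their width share an intersection of 50 and a union of 150, giving an overlap of 1/3.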
Metric 1: Average Overlap
$\text{Average overlap} = \frac{1}{N} \sum_{i=1}^{N} o_i$, where $o_i$ is the overlap between the ground-truth and tracked bounding box in frame #i and N is the number of frames
Deviation from ground-truth
• Overlap = 0.7, Overlap = 0.55, Overlap = 0.15
• Average Overlap = 0.467
Problem with average overlap
• Doesn't account for the confidence score from the tracking algorithm (here Conf=0.9, Conf=0.2, Conf=0.7).
• More confident boxes should be weighted higher.
Metric 2: Mean Average Precision
1. Sort frames by confidence of bounding box from TLD algorithm
2. A bounding box from TLD is said to be tracked correctly if the overlap > 0.5
3. Compute precision at different values of recall
4. Compute average precision

Example with three frames, sorted by decreasing confidence:
Conf | Overlap | Tracked | Recall | Precision
0.9 | 0.7 | Correct | 0.33 | 1.0
0.7 | 0.15 | Wrong | - | -
0.2 | 0.55 | Correct | 0.67 | 0.67
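The four steps can be sketched as follows. Two assumptions are made here and may differ from the provided tldEvaluate.m: recall is measured against the total number of frames (the object is taken to be visible in every frame), and average precision is the sum of the precision at each correctly tracked frame divided by the number of frames.

```python
def precision_recall(confidences, overlaps, threshold=0.5):
    """Return (recall, precision) at each correctly tracked frame."""
    # Step 1: visit frames in order of decreasing confidence.
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    points, correct = [], 0
    for rank, i in enumerate(order, start=1):
        # Step 2: a frame is tracked correctly if its overlap exceeds the threshold.
        if overlaps[i] > threshold:
            correct += 1
            # Step 3: precision at this recall level.
            points.append((correct / len(order), correct / rank))
    return points

def average_precision(confidences, overlaps, threshold=0.5):
    """Step 4: sum of precision at correct frames over the number of frames (assumed)."""
    points = precision_recall(confidences, overlaps, threshold)
    return sum(p for _, p in points) / len(confidences)
```

Called with confidences [0.9, 0.2, 0.7] and overlaps [0.7, 0.55, 0.15], `precision_recall` yields the two points from the slides: recall 0.33 at precision 1.0, and recall 0.67 at precision 0.67.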
What We Provide
• TLD_project_starter_codes_release.tar.gz
– Matlab wrapper with various utility functions and display methods for TLD tracking
– Also includes evaluation code
– Modified from the original implementation of TLD by Zdenek Kalal
• tiny_tracking_data.tar.gz
– 4 validation video sequences (sequence of image frames)
– 5 test video sequences (sequence of image frames)
– Initializing bounding box on first frame + ground-truth bounding box in each frame
– All videos less than 200 frames
Starter code: Param. Initialization
(A modified version of the original Matlab implementation from Zdenek Kalal)
• run_TLD_on_video.m
– Sets up tld parameters, calls tldExample and saves tracking results to a text file
– TODO: Set all the parameters for the TLD algorithm
  • Minimal window size of object bbox
  • Patch size to resize every patch before learning/detection
  • Parameters specific to your learning algo (such as regularization constant)
  • Parameters for selecting positive and negative patches for learning
Starter code: Wrapper
• tldExample.m (Nothing to do)
– Initializes with tldInit
– Calls the tldProcessFrame function on every frame
– Also saves the output images with tracked bbox to the output directory
Starter code: Initialization
• tldInit.m
– Initializes the LK tracker and also chooses positive and negative examples from the first frame for initializing the detector and Nearest Neighbor (NN) method
– TODO: Initialize your detector based on the positive and negative examples from the first frame
Starter code: Process frame
• tldProcessFrame.m
– Calls the LK tracker tldTracking.m to track densely sampled keypoints from the bounding box
– Calls the trained detector to identify potential object boxes in the frame
– Integrates detection and tracking bounding boxes
  • TODO: Modify the integrator to improve performance. The provided integrator might not be a good strategy for all video sequences
– Calls tldLearning to update the detector and NN model
Starter code: Detection
• tldDetection.m
– Calls the detector to identify candidate bounding boxes in the current frame
– TODO: Run your detection method on provided image patches
• tldNN.m
– Runs the Nearest Neighbor model on the patches selected by the detector in the previous step
– TODO: Compute a confidence measure to determine how confident the NN is about each patch being a bbox
  • Use Normalized Cross Correlation
Starter code: Learning
• tldLearning.m
– Updates the detection model
– Calls tldTrainNN to update the NN model
– TODO: Train your detection method
• tldTrainNN.m
– TODO: Update stored positive and negative patches tld.pex and tld.nex based on newly seen positive and negative patches
Starter code: +ve & -ve examples
• tldGeneratePositiveData.m
– Called by tldLearning.m and tldInit.m
– TODO: Choose positive examples from current image based on overlap of the grid boxes with the tracked box from frame
• tldGenerateNegativeData.m
– Called by tldLearning.m and tldInit.m
– TODO: Choose negative examples from current image based on overlap of the grid boxes with the tracked box from frame
• tldPatch2Pattern.m
– TODO: Compute features from the given patches to be used by learning/detection/NN
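Choosing examples by overlap with the tracked box can be sketched as a simple split of the grid. This is an illustration only: the function name, the `overlap_fn` callback, and the two thresholds are assumptions, not values from the starter code, and choosing a sampling strategy is part of the assignment.

```python
def split_grid_by_overlap(grid, tracked_box, overlap_fn, pos_thr=0.6, neg_thr=0.2):
    """Split grid boxes into positive and negative candidates.

    overlap_fn(box, tracked_box) returns intersection over union; pos_thr and
    neg_thr are illustrative values, not settings from the starter code.
    """
    positives, negatives = [], []
    for box in grid:
        o = overlap_fn(box, tracked_box)
        if o > pos_thr:
            positives.append(box)
        elif o < neg_thr:
            negatives.append(box)
        # Boxes in between are ambiguous and left out of both sets.
    return positives, negatives
```

Leaving a gap between the two thresholds avoids training on boxes that only partly cover the object.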
Starter code: Other Utils (Nothing to do)
• tldDisplay.m
– Plots the tracked bounding box on each image
– Shows the points tracked by the LK tracker in blue
– Shows center points of patches selected by your detector in grey
• tldEvaluate.m
– Evaluates tracking by computing avg. overlap and avg. precision
• mex/bb_overlap.cpp: Computes overlap between bboxes
• mex/lk.cpp: Lucas-Kanade tracker
• bbox/bb_cluster.m: Clusters bounding boxes
• bbox/bb_scan.m: Generates a dense grid of bounding boxes in the image
What You Need To Do
1. Implement the TODO sections in code
  1. Learning / Detection method
  2. Features used
  3. Positive and Negative sampling strategy
  4. Integrator to combine detection and tracking results
2. Measure performance with provided ground-truth for all videos (main.m)
  • Sanity check: Our baseline TLD has average overlap=0.68, average precision=0.78. You should be able to get better performance.
What You Need To Do
Q: Do I have to use the Matlab starter code?
A: No! But ask the TAs if you want to use another language. You might have to be careful about the LK tracking implementation and integration.
Q: Do I need to turn in my code?
A: Yes. There should be a script we can call that'll e.g. run your method on an image without any/much modification.
Project 2 Report
• Write-up template provided on website (link)
• Use CVPR LaTeX template
• No more than 5 pages
Project 2 Report
Rough sections:
1. Overview of the field (online single object tracking)
2. The algorithm overview
3. Components implemented by you (Your contribution)
  1. Learning / Detection method
  2. Features used
  3. Positive and Negative sampling strategy
  4. Integrator to combine detection and tracking results
4. Code README
5. Results
  1. Quantitative result for each sequence (Validation + Test)
    1. Avg. overlap, Avg. precision and time taken/frame
  2. Qualitative result with analysis
  3. Error analysis for difficult sequences
Project 2 Report
Overview of the field
• What is the problem
• What is the general scope of methods we've talked about in class
• Mini-summary of class papers
• Cite papers!
Project 2 Report
The algorithm overview
• Your understanding of how TLD works
• Why would just using a LK tracker fail?
• Why is using only detection/learning prohibitive?
• How do the tracker (T) and learning/detection (LD) interact?
• Suggestions for improving the method!
Project 2 Report
Components implemented by you:
• Learning / Detection method
• Features used
• Positive and Negative sampling strategy
• Integrator to combine detection and tracking results
• Motivate your choice for each component!
• Provide a quantitative/qualitative comparison with other possible model choices
Project 2 Report
Code
• A README for your code
• What are the key files/functions (if you added additional files, explain them too)
• How can the TAs reproduce your results?
Project 2 Report
Results
• Quantitative results
– Average precision per video sequence
– Average overlap per video sequence
– Time taken per frame to track object in the video
• For project 2, provide results separately for the validation and test sets.
• Qualitative results
– 2 interesting examples where your detection method succeeded and 2 examples where it failed
– Detailed error analysis for cases where it failed
Extensions
• The assignment is open ended in terms of the features/learning/detection methods you choose to use
• Plenty of possibilities to try different methods
Possible Extensions
• Comparison of different features (patch2pattern)
– Binary features are usually fast to compute and give reasonable performance
– Try openly available implementations of BRIEF, LBP, FREAK
– Dense features give good performance but are slower
  • Resized patch after mean subtraction (or whitening)
  • HOG from resized patch
• Try different sampling and pooling strategies for features
– Densely sample the entire patch or use keypoints
– Spatial pyramids for pooling
Possible Extensions
• Comparison of different learning methods
– Slower batch trained classifiers such as linear SVM
– Faster online SVM, random ferns
– Detection strategy
  • Run classifier densely on all grid bounding boxes
  • Pre-select a smaller subset of good candidates to run classifier
• Data augmentation for learning
– Warping/shifting/noise addition to positives and negatives
– Mine only hard negatives for training classifiers
Possible Extensions
• Experimenting with the integrator
– When to restart the LK tracker?
– How to weight the tracker and detection results?
– Adapting the integrator method based on video properties
• Introducing priors to regularize the tracking
– Penalize sudden and large bbox transitions between frames
– Penalize sudden change in direction of motion
Project 2 Presentations
• These happen the day before the report/code is due.
• Every team should submit 4-5 slides to Alex ([email protected]) by 5 pm the day before (Sun May 10)
• Reminder: Teams of 1 or 2 people
• If two people, make sure both present!
• We will randomly pick ~10 teams to present.
Things to include in presentation
• Important contributions in your implementation
• Subtleties/things you didn't expect
• Important: 2 video results for your tracking
– Provide result on one video which is not from the provided dataset
– (Note: You may use ffmpeg to combine the output frames generated by the tracking method into a video)
• Any insights!
Grading
• 35%: Technical Approach and Code
– Is your code correct? Do anything cool?
• 35%: Experimental Evaluation
– Performance, insights, thorough evaluation
• 20%: Write-up
– Contains everything, formatted well, etc.
• 10%: Project Presentation
– Clarity, Content.
– Not counted if no presentation in a week.
Submitting
• Submit via CourseWork
• One submission per team
• We'll use cheating-detection software
– Do not use the openly available TLD code!
– Cite any external code/library you use!
– Please don't make this an issue!
Late Days
• You have 7 late days, split between the three projects any way you want
• But your project presentation itself still needs to be on time (in class). Late days only apply to write-up/code submission
Working in Groups
• You can work with up to one other person
• Shared code/report.
• We'll grade fairly regardless of team size
Important Dates
• May 11 (in class): Presentations
• May 12 (5 pm): Reports due
Other Questions?