Paper Overviews
3 types of descriptors:
SIFT / PCA-SIFT (Ke, Sukthankar)
GLOH (Mikolajczyk, Schmid)
DAISY (Tola et al.; Winder et al.)
Comparison of descriptors (Mikolajczyk, Schmid)
Paper Overviews
PCA-SIFT: SIFT-based but with a smaller descriptor
GLOH: modifies the SIFT descriptor for robustness and distinctiveness
DAISY: novel descriptor that uses graph cuts for matching and depth map estimation
SIFT
• “Scale Invariant Feature Transform”
• 4 stages:
1. Peak selection
2. Keypoint localization
3. Keypoint orientation
4. Descriptors
SIFT
• 1. Peak Selection
• Make Gaussian pyramid
http://www.cra.org/Activities/craw_archive/dmp/awards/2006/Bolan/DMP_Pages/filters.html
SIFT
• 1. Peak Selection
• Find local peaks using difference of Gaussians
– Peaks are found at different scales
http://www.cra.org/Activities/craw_archive/dmp/awards/2006/Bolan/DMP_Pages/filters.html
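A minimal sketch of the peak-selection idea, assuming NumPy is available: blur the image at several scales, subtract adjacent levels to get difference-of-Gaussians (DoG) layers, and keep pixels that are the unique extremum of their 3x3x3 neighbourhood in space and scale. The function names, the sigma values and the contrast threshold are illustrative assumptions, not values from the slides, and no image downsampling (the pyramid part) is done here.

```python
import numpy as np

def gaussian_blur(image, sigma):
    """Separable Gaussian blur with edge padding (kernel truncated at 3 sigma)."""
    radius = max(1, int(3 * sigma + 0.5))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-offsets ** 2 / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(image, radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def dog_peaks(image, sigmas=(1.0, 1.6, 2.56, 4.1), contrast=0.01):
    """Return (x, y, dog_level) for extrema of the DoG stack (assumed parameters)."""
    blurred = [gaussian_blur(image.astype(float), s) for s in sigmas]
    # One DoG layer per pair of adjacent blur levels.
    dog = np.stack([b - a for a, b in zip(blurred, blurred[1:])])
    peaks = []
    # Only interior levels and pixels have a full 3x3x3 neighbourhood.
    for s in range(1, dog.shape[0] - 1):
        for y in range(1, dog.shape[1] - 1):
            for x in range(1, dog.shape[2] - 1):
                value = dog[s, y, x]
                cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                is_extremum = value == cube.max() or value == cube.min()
                is_unique = np.count_nonzero(cube == value) == 1
                if abs(value) > contrast and is_extremum and is_unique:
                    peaks.append((x, y, s))
    return peaks
```

The triple loop is written for readability rather than speed; a real detector would vectorise it and work on a downsampled pyramid.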
SIFT
• 2. Keypoint Localization
– Remove peaks that are “unstable”:
» Peaks in low-contrast areas
» Peaks along edges
» Features not distinguishable
SIFT
• 3. Keypoint Orientation
• Make histogram of gradients for a patch of pixels
• Orient all patches so the dominant gradient direction is vertical
http://www.inf.fu-berlin.de/lehre/SS09/CV/uebungen/uebung09/SIFT.pdf
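The orientation step can be sketched as follows: build a magnitude-weighted histogram of gradient angles over a patch and take the strongest bin as the dominant direction. This is a simplified sketch; the function name and the choice of 36 bins are assumptions for illustration (the slides do not give a bin count), and the Gaussian weighting and peak interpolation of a full implementation are left out.

```python
import math

def dominant_orientation(patch, num_bins=36):
    """Return the centre angle (degrees) of the strongest gradient-orientation bin.

    patch is a list of rows of grey values; border pixels are skipped because
    central differences need both neighbours.
    """
    hist = [0.0] * num_bins
    bin_width = 360.0 / num_bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            # Each pixel votes with its gradient magnitude.
            hist[int(angle // bin_width) % num_bins] += math.hypot(dx, dy)
    best = max(range(num_bins), key=hist.__getitem__)
    return (best + 0.5) * bin_width
```

Rotating the patch by the negative of this angle (plus a fixed offset) would bring the dominant gradient to the chosen reference direction.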
SIFT
• 4. Descriptors
• Ideal descriptor:
– Compact
– Distinctive from other descriptors
– Robust against lighting / viewpoint changes
SIFT
• 4. Descriptors
• A SIFT descriptor is a 128-element vector:
– 4x4 array of 8-bin histograms (4 x 4 x 8 = 128)
– Each histogram is a smoothed representation of gradient orientations of the patch
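To make the 4 x 4 x 8 = 128 layout concrete, here is a simplified sketch that bins the gradients of a 16x16 patch into a 4x4 grid of 8-bin orientation histograms. It uses hard binning, so it omits the smoothing (interpolation and Gaussian weighting) that the slide mentions; the function name and the 18x18 input convention are assumptions for illustration.

```python
import math

def sift_like_descriptor(patch):
    """Build a 128-element vector from an 18x18 patch.

    The outer 1-pixel border is only used for central differences, so the
    descriptor covers the inner 16x16 pixels: 4x4 cells of 4x4 pixels each.
    """
    cells, bins, cell_size = 4, 8, 4
    desc = [0.0] * (cells * cells * bins)
    for y in range(16):
        for x in range(16):
            dx = patch[y + 1][x + 2] - patch[y + 1][x]
            dy = patch[y + 2][x + 1] - patch[y][x + 1]
            angle = math.atan2(dy, dx) % (2.0 * math.pi)
            b = int(angle / (2.0 * math.pi) * bins) % bins
            cell = (y // cell_size) * cells + (x // cell_size)
            # Each pixel adds its gradient magnitude to one orientation bin.
            desc[cell * bins + b] += math.hypot(dx, dy)
    norm = math.sqrt(sum(v * v for v in desc)) or 1.0
    return [v / norm for v in desc]
```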
PCA-SIFT
• Changes step 4 of the SIFT process to create different descriptors
• Rationale:
– Construction of SIFT descriptors is complicated
– Reason for constructing them that way is unclear
– Is there a simpler alternative?
PCA-SIFT
• “Principal Component Analysis” (PCA)
• A widely-used method of dimensionality reduction
• Used with SIFT to make a smaller feature descriptor
– By projecting the gradient patch into a smaller space
PCA-SIFT
– Creating a descriptor for keypoints:
1. Create patch eigenspace
2. Create projection matrix
3. Create feature vector
PCA-SIFT
– 1. Create patch eigenspace
– For each keypoint:
• Take a 41x41 patch around the keypoint
• Compute horizontal / vertical gradients
– Put all gradient vectors for all keypoints into a matrix
PCA-SIFT
– 1. Create patch eigenspace
– M = matrix of gradients for all keypoints
– Calculate covariance of M
– Calculate eigenvectors of covariance(M)
PCA-SIFT
– 2. Create projection matrix
– Choose the first n eigenvectors (those with the largest eigenvalues)
– This paper uses n = 20
– This is the projection matrix
– Store for later use, no need to re-compute
PCA-SIFT
– 3. Create feature vector
– For a single keypoint:
• Take its gradient vector, project it with the projection matrix
• Feature vector is of size n
– This is called “Grad PCA” in the paper
– “Img PCA”: use image patch instead of gradient
– Size difference: 128 elements (SIFT) vs. n = 20
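The three PCA-SIFT steps above can be sketched with NumPy as below. This is a sketch under stated assumptions, not the authors' code: the function names are hypothetical, and subtracting the training mean before projecting is an assumption (the slides only say "project").

```python
import numpy as np

def build_projection(gradient_vectors, n=20):
    """Steps 1 and 2: eigenspace of the training gradients, top-n eigenvectors.

    gradient_vectors has shape (num_keypoints, d), one flattened gradient
    patch per row.
    """
    mean = gradient_vectors.mean(axis=0)
    covariance = np.cov(gradient_vectors, rowvar=False)        # (d, d)
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)      # ascending order
    order = np.argsort(eigenvalues)[::-1][:n]                   # largest first
    return mean, eigenvectors[:, order]                         # (d,), (d, n)

def project(gradient_vector, mean, projection):
    """Step 3: n-element feature vector for a single keypoint."""
    return (gradient_vector - mean) @ projection
```

The projection matrix is computed once on training patches and reused for every new keypoint, which is why the slide says it can be stored rather than re-computed.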
PCA-SIFT
– Results
– Tested SIFT vs. “Grad PCA” and “Img PCA” on a series of image variations:
• Gaussian noise
• 45° rotation followed by 50% scaling
• 50% intensity scaling
• Projective warp
PCA-SIFT
– Results (Precision-recall curves)
– Grad PCA (black) generally outperforms Img PCA (pink) and SIFT (purple) except when brightness is reduced
– Both PCA methods outperform SIFT with illumination changes
PCA-SIFT
– Results
– PCA-SIFT also gets more matches correct on images taken at different viewpoints
A Performance Evaluation of Local Descriptors
Krystian Mikolajczyk and Cordelia Schmid
Problem Setting for Comparison
Matching Problem
(From a slide of David G. Lowe, IJCV 2004)
As we did in Project 2: Panorama, we want to find correct pairs of points in two images.
Overview of Compared Methods
• Region Detector: detects interest points
• Region Descriptor: describes the points
• Matching Strategy: how to find pairs of points in two images?
Region Detector
• Harris Points
• Blob Structure Detector
1. Harris-Laplace Regions (similar to DoG)
2. Hessian-Laplace Regions
3. Harris-Affine Regions
4. Hessian-Affine Regions
• Edge Detector
– Canny Detector
Region Descriptors
Descriptor | Dimension | Category / Notes | Distance Measure
SIFT | 128 | SIFT-based descriptor | Euclidean
PCA-SIFT | 36 | SIFT-based descriptor | Euclidean
GLOH | 128 | SIFT-based descriptor | Euclidean
Shape Context | 36 | Similar to SIFT, but focuses on edge locations found with the Canny detector |
Spin | 50 | A sparse set of affine-invariant local patches is used |
Steerable Filters | 14 | Differential descriptor: focuses on the properties of local derivatives (local jet) | Mahalanobis
Differential Invariants | 14 | Differential descriptor: focuses on the properties of local derivatives (local jet) | Mahalanobis
Complex Filters | 1681 | Consists of many filters |
Gradient Moments | 20 | Moment-based descriptor |
Cross Correlation | 81 | Uniformly sampled locations |
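Matching Strategy
• Threshold-Based Matching
– Match if $\|D_A - D_B\| < \text{threshold}$
• Nearest Neighbor Matching – Threshold
– Match if $D_B$ is the first neighbor of $D_A$ and $\|D_A - D_B\| < \text{threshold}$
• Nearest Neighbor Matching – Distance Ratio
– Match if $\|D_A - D_B\| / \|D_A - D_C\| < \text{threshold}$
– $D_B$: the first neighbor
– $D_C$: the second neighbor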
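The three matching strategies differ only in which candidate pairs they accept. A minimal sketch, assuming descriptors are plain lists of numbers compared with Euclidean distance; the function names are hypothetical and the brute-force search is for illustration only.

```python
import math

def distance(a, b):
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def threshold_matches(descs_a, descs_b, threshold):
    """Every pair closer than the threshold matches (one point may match many)."""
    return [(i, j) for i, da in enumerate(descs_a)
            for j, db in enumerate(descs_b) if distance(da, db) < threshold]

def nearest_neighbor_matches(descs_a, descs_b, threshold):
    """Only the first neighbor can match, and only if it is close enough."""
    matches = []
    for i, da in enumerate(descs_a):
        j = min(range(len(descs_b)), key=lambda k: distance(da, descs_b[k]))
        if distance(da, descs_b[j]) < threshold:
            matches.append((i, j))
    return matches

def distance_ratio_matches(descs_a, descs_b, ratio_threshold):
    """Accept the first neighbor only if it is clearly closer than the second.

    Needs at least two descriptors in descs_b.
    """
    matches = []
    for i, da in enumerate(descs_a):
        ranked = sorted(range(len(descs_b)), key=lambda k: distance(da, descs_b[k]))
        first, second = ranked[0], ranked[1]
        d_second = distance(da, descs_b[second])
        if d_second > 0 and distance(da, descs_b[first]) / d_second < ratio_threshold:
            matches.append((i, first))
    return matches
```

The ratio test rejects ambiguous points whose two best candidates are almost equally close, which plain thresholding cannot do.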
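Performance Measurements
• Repeatability rate, ROC
• Recall-Precision
$\text{Recall} = \frac{\text{number of correct matches}}{\text{total number of correct pairs in the ground truth}} = \frac{\text{TP (true positives)}}{\text{actual positives}}$
$\text{Precision} = \frac{\text{number of correct matches}}{\text{number of correct matches} + \text{number of false matches}} = \frac{\text{TP (true positives)}}{\text{predicted positives}}$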
Example of Recall-Precision
Let's say that our method detected:
* A = 50 corresponding pairs were extracted (predicted positives)
* C = 40 of the detected pairs were correct pairs
* B = 200 correct pairs exist in the ground truth (actual positives)
Then,
Recall = C/B = 40/200 = 20%
Precision = C/A = 40/50 = 80%
The perfect descriptor gives 100% recall for any value of precision.
[Diagram: A = predicted positives, B = actual positives, C = their overlap]
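The same arithmetic as a small function, using the numbers from the example; the function and parameter names are illustrative.

```python
def recall_precision(num_correct, num_extracted, num_ground_truth):
    """Return (recall, precision) from the match counts.

    num_correct:      detected pairs that are correct (C)
    num_extracted:    all pairs the method extracted (A)
    num_ground_truth: all correct pairs that exist (B)
    """
    recall = num_correct / num_ground_truth
    precision = num_correct / num_extracted
    return recall, precision

recall, precision = recall_precision(40, 50, 200)
print(f"recall = {recall:.0%}, precision = {precision:.0%}")
# prints "recall = 20%, precision = 80%"
```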
DataSet
6 different image transformations:
• Rotation
• Image Blur
• Zoom + Rotation
• Viewpoint Change
• Light Change
• JPEG Compression
Matching Strategies (Hessian-Affine Regions)
[Plots: Threshold-Based Matching; Nearest Neighbor Matching – Threshold; Nearest Neighbor Matching – Distance Ratio]
Viewpoint Change
[Plots: with Hessian-Affine Regions; with Harris-Affine Regions]
Scale Change with Rotation
[Plots: Hessian-Laplace Regions; Harris-Laplace Regions]
Image Rotation of 30 to 45 degrees
[Plot: Harris Points]
Image Blur
[Plot: Hessian-Affine Regions]
JPEG Compression
[Plot: Hessian-Affine Regions]
Illumination Changes
[Plot: Hessian-Affine Regions]
Ranking of Descriptors (listed from high performance to low performance)
1. SIFT-based descriptors, 128 dimensions: GLOH, SIFT
2. Shape Context, 36 dimensions
3. PCA-SIFT, 36 dimensions
4. Gradient Moments (20 dimensions) & Steerable Filters (14 dimensions)
5. Other descriptors
Note: This ranking is for the matching problem. It is not a measure of general performance.
Ranking of Difficult Image Transformations (listed from easy to difficult)
1. Scale & rotation & illumination
2. JPEG compression
3. Image blur
4. Viewpoint change
Scene type (listed from easy to difficult)
1. Structured scene
2. Textured scene
[Images: two textured scenes]
Other Results
• Hessian regions are better than Harris regions
• Nearest neighbor based matching is better than simple threshold based matching
• SIFT becomes better when the nearest neighbor distance ratio is used
• Robust region descriptors perform better than point-wise descriptors
• Image rotation does not have a big impact on the accuracy of descriptors
A Fast Local Descriptor for Dense Matching
Engin Tola, Vincent Lepetit, Pascal Fua
Ecole Polytechnique Federale de Lausanne, Switzerland
Paper novelty
• Introduces DAISY local image descriptor
– much faster to compute than SIFT for dense point matching
– performs on par with or better than SIFT
• DAISY descriptors are fed into an expectation-maximization (EM) algorithm which uses graph cuts to estimate the scene’s depth.
– works on low-quality images such as the ones captured by video streams
SIFT local image descriptor
• SIFT descriptor is a 3-D histogram in which two dimensions correspond to image spatial dimensions and the additional dimension to the image gradient direction (normally discretized into 8 bins)
SIFT local image descriptor
• Each bin contains a weighted sum of the norms of the image gradients around its center, where the weights roughly depend on the distance to the bin center
DAISY local image descriptor
• Gaussian convolved orientation maps are calculated for every direction:
$G_o^{\Sigma} = G_{\Sigma} * \left(\frac{\partial I}{\partial o}\right)^{+}$
– $G_{\Sigma}$: Gaussian convolution filter with variance $\Sigma$
– $\frac{\partial I}{\partial o}$: image gradient in direction $o$
– $(\cdot)^{+}$: operator $(a)^{+} = \max(a, 0)$
– $G_o^{\Sigma}$: orientation maps
• Every location in $G_o^{\Sigma}$ contains a value very similar to what a bin in SIFT contains: a weighted sum of gradient norms computed over an area
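A minimal NumPy sketch of the orientation maps: take the directional derivative of the image for each quantized direction, keep only its positive part, and smooth the result with a Gaussian. The function name, the 8 directions and the `sigma` parameter (used here as a standard deviation) are assumptions for illustration.

```python
import numpy as np

def orientation_maps(image, num_directions=8, sigma=1.0):
    """Return an array of shape (num_directions, H, W) of smoothed, rectified gradients."""
    grad_y, grad_x = np.gradient(image.astype(float))   # d/drow, d/dcolumn
    radius = max(1, int(3 * sigma + 0.5))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-offsets ** 2 / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    maps = []
    for k in range(num_directions):
        theta = 2.0 * np.pi * k / num_directions
        # Image gradient in direction theta, then the (a)+ = max(a, 0) operator.
        directional = np.cos(theta) * grad_x + np.sin(theta) * grad_y
        rectified = np.maximum(directional, 0.0)
        # Separable Gaussian convolution with edge padding.
        padded = np.pad(rectified, radius, mode="edge")
        rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
        maps.append(np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid"))
    return np.stack(maps)
```

Because the maps are computed once for the whole image, reading one value per direction at a pixel gives a histogram without any per-pixel accumulation, which is the source of the speed advantage for dense matching.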
DAISY local image descriptor
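DAISY local image descriptor
I. Histograms at every pixel location are computed:
$h_{\Sigma}(u, v) = [G_1^{\Sigma}(u, v), \ldots, G_H^{\Sigma}(u, v)]^{T}$
– $h_{\Sigma}(u, v)$: histogram at location $(u, v)$
– $G_1^{\Sigma}, \ldots, G_H^{\Sigma}$: Gaussian convolved orientation maps, one per quantized direction
II. Histograms are normalized to unit norm
III. Local image descriptor is computed as the concatenation of the normalized histograms at $(u, v)$ and at the surrounding locations $l_j(u, v, R)$
– $l_j(u, v, R)$: the location at distance $R$ from $(u, v)$ in the direction given by $j$, when the directions are quantized into $N$ values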
From Descriptor to Depth Map
• The model uses EM to estimate depth map $Z$ and occlusion map $O$ by maximizing the posterior $p(Z, O \mid D_1, \ldots, D_N)$
– $D_n$: descriptor of image $n$
Results
Picking the Best Daisy
Simon Winder, Gang Hua, Matthew Brown
Paper Contribution
• Utilize novel ground-truth training set
• Test multiple configurations of low-level filters and DAISY pooling and optimize over their parameters
• Investigate the effects of robust normalization
• Apply PCA dimension reduction and dynamic range reduction to compress the representation of descriptors
• Discuss computational efficiency and provide a list of recommendations for descriptors that are useful in different scenarios
Descriptor Pipeline
• T-block takes the pixels from the image patch and transforms them to produce a vector of k non-linear filter responses at each pixel.
– Block T1 computes gradients at each pixel and bilinearly quantizes the gradient angle into k orientation bins, as in SIFT
– Block T2 rectifies the x and y components of the gradient to produce a vector of length 4: $\{|\nabla_x| - \nabla_x;\ |\nabla_x| + \nabla_x;\ |\nabla_y| - \nabla_y;\ |\nabla_y| + \nabla_y\}$
– Block T3 uses steerable filters evaluated at a number of different orientations
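As a small illustration of block T2, the sketch below rectifies one pixel's gradient into four non-negative responses. It assumes the rectification takes the form (|gx| - gx, |gx| + gx, |gy| - gy, |gy| + gy); the function name is hypothetical.

```python
def t2_block(grad_x, grad_y):
    """Rectify the x and y gradient components into a length-4 vector.

    Each component is split into a negative-part and a positive-part
    response, so every output is non-negative.
    """
    return (abs(grad_x) - grad_x, abs(grad_x) + grad_x,
            abs(grad_y) - grad_y, abs(grad_y) + grad_y)
```

For a gradient of (3, -2) this gives (0, 6, 4, 0): the positive x response and the negative y response fire, the other two stay at zero.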
Descriptor Pipeline
• S-block spatially accumulates weighted filter vectors to give N linearly summed vectors of length k and these are concatenated to form a descriptor of kN dimensions.
Descriptor Pipeline
• N-block normalizes the complete descriptor to provide invariance to lighting changes. It uses a form of threshold normalization with the following stages:
– Normalize the descriptor to a unit vector
– Clip all the elements of the vector that are above a threshold $\kappa$ by computing $v_i' = \min(v_i, \kappa)$
– Scale the vector to a byte range
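The three normalization stages can be sketched as below. The clip threshold value and the exact byte scaling (mapping the threshold to 255) are assumptions for illustration; the slides do not state them.

```python
import math

def threshold_normalize(descriptor, clip_threshold=0.2):
    """Unit-normalize, clip large elements, then scale to the range 0..255.

    clip_threshold is a placeholder value; the byte scaling maps the clip
    threshold to 255, which is one possible choice.
    """
    norm = math.sqrt(sum(v * v for v in descriptor)) or 1.0
    unit = [v / norm for v in descriptor]
    clipped = [min(v, clip_threshold) for v in unit]
    return [min(255, max(0, int(round(255.0 * v / clip_threshold))))
            for v in clipped]
```

Clipping limits the influence of a few very strong gradients, which is what makes the normalization robust to non-linear lighting effects.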
Descriptor Pipeline
• Dimension reduction. Apply principal components analysis to compress the descriptor.
– First optimize the parameters of the descriptor and then compute the matrix of principal components based on all descriptors computed on the training set.
– Next find the best dimensionality for reduction by computing the error rate on random subsets of the training data.
– Progressively increase the dimensionality by adding PCA bases until the minimum error is found.
Descriptor Pipeline
• Quantization further compresses the descriptor to reduce the memory requirement for large databases of descriptors by quantizing descriptor elements into L levels.
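A minimal sketch of quantizing byte-range descriptor elements into L levels. Uniform bins over 0..255 are assumed here for illustration; the slides do not say how the levels are spaced, and the function name is hypothetical.

```python
def quantize(descriptor_bytes, levels):
    """Map each element in 0..255 to an integer level in 0..levels-1."""
    step = 256 / levels
    return [min(int(v / step), levels - 1) for v in descriptor_bytes]
```

With L = 4 each element needs only 2 bits instead of 8, so a descriptor takes a quarter of the memory.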
Training
• Use 3D reconstructions as a source of training data.
• Use machine learning approach to optimize parameters.
Results
• Gradient-based descriptor
Results
• Dimension Reduction
Results
• Descriptor Quantization