k-Nearest Neighbors Search in High Dimensions
Tomer Peled
Dan Kushnir
Tell me who your neighbors are, and I'll know who you are.
Outline
• Problem definition and flavors
• Algorithms overview - low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
Nearest Neighbor Search: Problem definition
• Given a set P of n points in R^d, over some distance metric
• Find the nearest neighbor p of q in P
Applications
• Classification
• Clustering
• Segmentation
• Indexing
• Dimension reduction (e.g. LLE)
[Figure: points plotted by Weight vs. Color, with a query point q]
Naïve solution
• No preprocessing
• Given a query point q: go over all n points and do the comparison in R^d
• Query time = O(nd)
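As a concrete baseline, the naive O(nd) query above can be sketched as follows (function and variable names are illustrative, not from the talk):

```python
import numpy as np

def knn_brute_force(P, q, k=1):
    """Naive k-NN: compare q against all n points in R^d -> O(nd) per query."""
    dists = np.linalg.norm(P - q, axis=1)   # n distances, each costing O(d)
    idx = np.argsort(dists)[:k]             # indices of the k closest points
    return idx, dists[idx]

P = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 2.0]])
q = np.array([0.9, 1.2])
idx, d = knn_brute_force(P, q, k=1)
```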
Keep this baseline in mind.
Common solution
• Use a data structure for acceleration
• Scalability with n and with d is important
When to use nearest neighbor?
High-level algorithms (assuming no prior knowledge about the underlying probability structure):
• Parametric: probability-distribution estimation
• Non-parametric: density estimation; nearest neighbors are the choice for complex models, sparse data, and high dimensions
Nearest Neighbor
Find the closest point to q: argmin_{p_i in P} dist(q, p_i)
r - Nearest Neighbor (approximate)
• dist(q, p1) ≤ r: p1 must be reported
• dist(q, p2) ≥ (1 + ε)·r: p2 may be rejected
• r2 = (1 + ε)·r1
Outline
• Problem definition and flavors
• Algorithms overview - low dimensions  <- (this section)
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
The simplest solution
• "Lion in the desert" (bisection)
Quadtree
• Split the first dimension into 2
• Repeat iteratively, on each dimension in turn
• Stop when each cell has no more than 1 data point
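A minimal 2-D sketch of the construction just described (split each cell at its midpoint, recurse until every cell holds at most one point; all names are illustrative):

```python
# Minimal 2-D quadtree sketch (assumes distinct points inside the root cell).
def build(points, xlo, xhi, ylo, yhi):
    if len(points) <= 1:
        return points                      # leaf: at most one data point
    xm, ym = (xlo + xhi) / 2, (ylo + yhi) / 2
    quads = [[], [], [], []]               # SW, NW, SE, NE sub-cells
    for (x, y) in points:
        quads[2 * (x >= xm) + (y >= ym)].append((x, y))
    return [build(quads[0], xlo, xm, ylo, ym),
            build(quads[1], xlo, xm, ym, yhi),
            build(quads[2], xm, xhi, ylo, ym),
            build(quads[3], xm, xhi, ym, yhi)]

tree = build([(0.1, 0.2), (0.9, 0.8), (0.2, 0.9)], 0, 1, 0, 1)
```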
Quadtree - structure
[Figure: splitting at (X1, Y1) yields four cells: {P<X1, P<Y1}, {P<X1, P≥Y1}, {P≥X1, P<Y1}, {P≥X1, P≥Y1}]
Quadtree - Query
In many cases it works: the query falls in a cell whose point is indeed the nearest neighbor.
[Figure: splitting at (X1, Y1); the query lands in one of the four cells]
Quadtree - Pitfall 1
In some cases it doesn't: the nearest neighbor lies in a neighboring cell, so the search must back-track across cell boundaries.
[Figure: the query falls near the boundary at (X1, Y1)]
Quadtree - Pitfall 1 (cont.)
In some cases nothing works.
Quadtree - Pitfall 2
The number of cells is O(2^d), which could result in query time exponential in the dimension.
Space-partition based algorithms
(Survey: "Multidimensional Access Methods", Volker Gaede and Oliver Günther)
Could be improved.
Outline
• Problem definition and flavors
• Algorithms overview - low dimensions
• Curse of dimensionality (d > 10..20)  <- (this section)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
Curse of dimensionality
• Query time or space O(n^d); naive scan is O(min(nd, n^d))
• For d > 10..20, worse than a sequential scan for most geometric distributions
• Techniques specific to high dimensions are needed
• Proved in theory and in practice by Barkol & Rabani (2000) and Beame & Vee (2002)
Some intuition: splitting each axis in two gives 2, 2^2, 2^3, …, 2^d cells, exponential in the dimension.
Outline
• Problem definition and flavors
• Algorithms overview - low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)  <- (this section)
• l2 extension
• Applications (Dan)
Preview
• General solution: locality sensitive hashing
• Implementation for Hamming space
• Generalization to l1 & l2
Hash function
A hash function maps a Data_Item to a Key, which selects a Bin/Bucket (a storage address) in the data structure.
Example: h(X) = X modulo 3, where X is a number in the range 0..n; the keys are in {0, 1, 2}.
Usually we would like related data items to be stored in the same bin.
Recall: r - Nearest Neighbor
• dist(q, p1) ≤ r
• dist(q, p2) ≥ (1 + ε)·r
• r2 = (1 + ε)·r1
Locality sensitive hashing
A hash family is (r, (1 + ε)r, P1, P2)-sensitive if:
• Pr[I(p) = I(q)] ≥ P1 ("high") when p is "close" to q (dist ≤ r)
• Pr[I(p) = I(q)] ≤ P2 ("low") when p is "far" from q (dist ≥ (1 + ε)r = r2)
Preview
• General solution: locality sensitive hashing
• Implementation for Hamming space  <- (next)
• Generalization to l1 & l2
Hamming Space
• Hamming space = the 2^N binary strings of length N
• Hamming distance = the number of digits in which two strings differ (a.k.a. signal distance; Richard Hamming)
Example: 010100001111 vs. 010010000011 → distance = 4
• Hamming distance = SUM(X1 XOR X2)
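The two equivalent definitions above can be checked directly; a minimal sketch (the function name is illustrative):

```python
def hamming(x1, x2):
    """Hamming distance = number of differing digits = SUM(X1 XOR X2)."""
    assert len(x1) == len(x2)
    return sum(b1 != b2 for b1, b2 in zip(x1, x2))

# The example from the slide: the two strings differ in 4 positions.
d = hamming("010100001111", "010010000011")
```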
L1 to Hamming Space Embedding
Write each coordinate in unary: for a point p = (8, 2) with coordinates bounded by C = 11, 8 becomes 11111111000 and 2 becomes 11000000000, so p maps to 1111111100011000000000, a binary string of length d' = C·d.
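The embedding can be sketched as follows (the function name is an assumption); L1 distance between points becomes Hamming distance between their embeddings:

```python
def l1_to_hamming(p, C):
    """Unary embedding: coordinate x -> x ones followed by C - x zeros,
    so L1 distance in {0..C}^d equals Hamming distance in {0,1}^(C*d)."""
    return "".join("1" * x + "0" * (C - x) for x in p)

s = l1_to_hamming((8, 2), C=11)   # the slide's example point p = (8, 2)
t = l1_to_hamming((5, 4), C=11)   # another point; L1 distance = 3 + 2 = 5
```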
Hash function
G_j(p) = p|I_j, for p ∈ H^d' and j = 1..L: sample k bits of p (e.g. k = 3 digits) at the positions in I_j, and store p in bucket p|I_j; there are 2^k buckets (e.g. the sampled bits give bucket 101).
Construction
• Build L tables (j = 1, 2, …, L); store each point p in bucket G_j(p) of table j
Query
• Probe the bucket G_j(q) in each of the L tables and check the retrieved candidates
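The construction and query steps above can be sketched as a small bit-sampling index; parameter values and names are illustrative:

```python
import random
from collections import defaultdict

def build_index(points, d, k, L, seed=0):
    """L tables; table j hashes a string by k randomly chosen bit positions I_j."""
    rng = random.Random(seed)
    samplings = [rng.sample(range(d), k) for _ in range(L)]   # I_1 .. I_L
    tables = [defaultdict(list) for _ in range(L)]
    for p in points:
        for I, table in zip(samplings, tables):
            key = "".join(p[i] for i in I)                    # G_j(p) = p|I_j
            table[key].append(p)
    return samplings, tables

def query(q, samplings, tables):
    """Probe the bucket of q in each of the L tables; return the candidates."""
    candidates = set()
    for I, table in zip(samplings, tables):
        candidates.update(table["".join(q[i] for i in I)])
    return candidates

pts = ["1111111100011000000000", "1100000000011111111000"]
samplings, tables = build_index(pts, d=22, k=3, L=4)
cand = query("1111111100011000000000", samplings, tables)
```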
Alternative intuition: random projections
In the embedded space (p = (8, 2), C = 11, p → 1111111100011000000000, d' = C·d), each sampled bit acts as a random threshold on one coordinate. Sampling k bits maps each point into one of 2^k buckets (k = 3: buckets 000, 001, …, 111; here p lands in bucket 101).
• k samplings per table
• Repeating L times (L independent tables)
Secondary hashing
Supports volume tuning: dataset size vs. storage volume. The 2^k buckets (e.g. bucket 011, size = B) are mapped by a simple hash into M buckets, with M·B = αn, α = 2.
The above hashing is locality-sensitive
• Probability(p, q in the same bucket) = (1 - Distance(p, q)/d')^k
[Plots: collision probability vs. Distance(q, p_i), for k = 1 and k = 2]
(Adapted from Piotr Indyk's slides)
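The collision probability above can be verified empirically. This sketch samples the k positions with replacement, which matches the formula exactly (all names are illustrative):

```python
import random

def collision_rate(p, q, k, trials=200000, seed=1):
    """Empirical Pr[p and q share a bucket] under k sampled bit positions."""
    rng = random.Random(seed)
    d = len(p)
    hits = 0
    for _ in range(trials):
        I = [rng.randrange(d) for _ in range(k)]     # k sampled positions
        hits += all(p[i] == q[i] for i in I)
    return hits / trials

p = "1111100000"
q = "1111000000"              # Hamming distance 1, d' = 10
rate = collision_rate(p, q, k=2)
expected = (1 - 1 / 10) ** 2  # = 0.81 by the slide's formula
```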
Preview
• General solution: locality sensitive hashing
• Implementation for Hamming space
• Generalization to l2  <- (next)
Direct L2 solution
• New hashing function
• Still based on sampling
• Uses a mathematical trick: a p-stable distribution for Lp distance; the Gaussian distribution for L2 distance
Central limit theorem
A sum of weighted Gaussians is a weighted Gaussian:
v1·X1 + v2·X2 + … + vn·Xn, where
• v1, …, vn are real numbers
• X1, …, Xn are independent, identically distributed (i.i.d.) Gaussian variables
Dot product and norm:
Σ_i v_i·X_i  ~  ||v||_2 · X, a Gaussian scaled by the L2 norm of v.
Norm and distance (features vector 1 = u, features vector 2 = v):
Σ_i u_i·X_i - Σ_i v_i·X_i = Σ_i (u_i - v_i)·X_i  ~  ||u - v||_2 · X
So the difference between the two dot products is a Gaussian scaled by the L2 distance between the feature vectors.
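The 2-stable property above can be illustrated numerically (an illustrative sketch, not from the talk): the projection of v onto a Gaussian vector has standard deviation ||v||_2.

```python
import numpy as np

# For i.i.d. Gaussian X_i, sum_i v_i * X_i is Gaussian with standard
# deviation ||v||_2, so a random projection "measures" the L2 norm.
rng = np.random.default_rng(0)
v = np.array([3.0, 4.0])                   # ||v||_2 = 5
X = rng.standard_normal((200000, 2))       # many draws of (X_1, X_2)
proj = X @ v                               # v_1*X_1 + v_2*X_2 per draw
sigma = proj.std()                         # should be close to 5
```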
The full Hashing
h_{a,b}(v) = floor((a·v + b) / w)
• v: the features vector (e.g. [34 82 21])
• a: d random numbers, i.i.d. from a p-stable distribution
• b: a random phase in [0, w]
• w: the discretization step
Example: with b = 34, w = 100, and a·v + b = 7944, v falls into bucket floor(7944/100) = 79 (the interval [7900, 8000) on the line 7800, 7900, 8000, 8100, 8200).
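The hash above can be sketched as follows, assuming a Gaussian (2-stable) distribution for a; all names and values are illustrative:

```python
import numpy as np

def make_hash(d, w, seed=0):
    """h_{a,b}(v) = floor((a . v + b) / w), with a ~ N(0, I_d), b ~ U[0, w]."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(d)            # d random numbers, i.i.d. Gaussian
    b = rng.uniform(0, w)                 # random phase in [0, w]
    return lambda v: int(np.floor((a @ v + b) / w))

h = make_hash(d=3, w=100.0)
v1 = np.array([34.0, 82.0, 21.0])         # the slide's example feature vector
v2 = v1 + 0.5                             # a nearby point: likely same bucket
far = v1 + 1000.0                         # a far-away point: different bucket
```

Nearby points collide with high probability (they miss only when a bucket boundary falls between their projections), while distant points land in distant buckets.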
Generalization: P-Stable distribution
• L2: Central Limit Theorem → Gaussian (normal) distribution
• Lp, 0 < p ≤ 2: Generalized Central Limit Theorem → p-stable distribution (e.g. Cauchy for L1)
P-Stable summary
• Works for the r-nearest-neighbor problem; generalizes to 0 < p ≤ 2
• Improves query time: from O(d·n^(1/(1+ε))·log n) to O(d·n^(1/(1+ε)^2)·log n)
(Latest results, reported by email by Alexander Andoni)
Parameters selection
For Euclidean space, aim at a 90% success probability with the best query-time performance:
• A single projection hits an r-nearest neighbor with Pr = p1
• k projections hit an r-nearest neighbor with Pr = p1^k
• L hashings all fail to collide with Pr = (1 - p1^k)^L
• To ensure a collision (e.g. 1 - δ ≥ 90%): 1 - (1 - p1^k)^L ≥ 1 - δ, i.e. L ≥ log(δ) / log(1 - p1^k)
[Plot: reject non-neighbors / accept neighbors]
Choosing k trades off query time between candidate extraction (grows with k) and candidate verification (shrinks with k).
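The bound on L can be computed directly; a sketch with illustrative numbers (p1 = 0.9 is an assumed value, not from the talk):

```python
import math

def tables_needed(p1, k, delta):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta."""
    return math.ceil(math.log(delta) / math.log(1 - p1 ** k))

# e.g. per-projection collision probability 0.9, k = 18 bits, 90% target
L = tables_needed(p1=0.9, k=18, delta=0.1)
```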
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance
(From Piotr Indyk's slides)
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
  - Visit http://web.mit.edu/andoni/www/LSH/index.html
  - Email Alex Andoni (andoni@mit.edu)
  - Test over your own data (C code, under Red Hat Linux)
LSH - Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression: vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever k-nearest neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash-function construction and parameter tuning
Outline
• "Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola and T. Darrell: finding sensitive hash functions
• "Mean Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni and P. Meer: tuning LSH parameters; the LSH data structure is used for algorithm speedups
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola and T. Darrell
Given an image x, what are the parameters θ in this image? I.e. the angles of the joints, the orientation of the body, etc.
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor: edge detector
• Distance metric in feature space: d_x
• Distance metric in angle space: d_θ(θ1, θ2) = Σ_{i=1..m} (1 - cos(θ1,i - θ2,i))
Example based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
The algorithm flow:
Input query → features extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match (the average angles of the KNN)
The image features
Image features are multi-scale edge histograms, computed over image sub-windows (A, B in the figure).
(Pipeline: Feature Extraction → PSH → LWR)
PSH: The basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ). We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: parameter space (angles) vs. feature space, with query q]
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ. The hash functions are applied in feature space, but the KNN are valid in angle space.
• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h
PSH as a classification problem
Labels (r = 0.25): a pair of examples (x_i, x_j) is labeled
y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
y_ij = -1 if d_θ(θ_i, θ_j) ≥ (1 + ε)·r
A binary hash function on the features:
h_T(x) = +1 if the selected feature of x exceeds the threshold T, -1 otherwise
Predict the labels:
ŷ_ij = +1 if h_T(x_i) = h_T(x_j), -1 otherwise
(h_T will place both examples in the same bin, or separate them.)
Find the best T that predicts the true labeling, subject to the probability constraints.
Local Weighted Regression (LWR)
• Given a query image, PSH returns KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:
β0 = argmin_β Σ_{x_i ∈ N(x)} d_θ(g(x_i; β), θ_i) · K(d_x(x_i, x))
where g is the regression model and the kernel K(d_x(x_i, x)) weights each neighbor by its feature-space distance to the query.
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (L)
• Test on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without the selection, 40 bits and 1,000 hash tables would have been needed
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, and B is the max number of points in a bucket.
Results - real data
• 800 images
• Processed by a segmentation algorithm
• 1/3 of the data were searched
Interesting mismatches.
Fast pose estimation - summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn
• Goal: given a query q, preprocess the points in P so as to find a point p_i whose sphere 'covers' the query q
(Courtesy of Mohamad Hegaze)
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni and P. Meer
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions, using LSH
• Speedups: 1. finding optimal LSH parameters; 2. data-driven partitions into buckets; 3. additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[Figure: a window of a given bandwidth around a point shifts toward the local mean]
(Roadmap: Mean-shift → LSH: optimal k, l → LSH: data partition → LSH: data structure)
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region: high density → small bandwidth; low density → large bandwidth. The bandwidth of each point is based on its k-th nearest neighbor.
Adaptive mean-shift vs. non-adaptive.
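The adaptive bandwidth rule can be sketched as follows (a brute-force illustration with assumed names; in the paper this is exactly the step LSH accelerates):

```python
import numpy as np

def adaptive_bandwidths(X, k):
    """Bandwidth h_i = distance from point i to its k-th nearest neighbor,
    so dense regions get small bandwidths and sparse regions large ones."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise dists
    D.sort(axis=1)            # column 0 is each point's distance to itself (0)
    return D[:, k]

X = np.array([[0.0], [0.1], [0.2], [5.0]])   # three dense points, one outlier
h = adaptive_bandwidths(X, k=1)              # distance to 1st nearest neighbor
```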
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering
("Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02)
[Figures: original, filtered, and segmented images; mean-shift trajectories. Filtering: each pixel takes the value of its nearest mode.]
Filtering examples
• Original squirrel → filtered
• Original baboon → filtered
Segmentation examples
("Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02)
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH
• Statistical curse of dimensionality: sparseness of the data, handled with variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point x, check whether x_{d_k} ≤ v_k; the K boolean answers form the cell label
• This partitions the data into cells
Choosing the optimal K and L
• For a query q, we want to compute the smallest number of distances to points in its buckets
• Large K → a smaller number of points in a cell
• If L is too small, points might be missed; if L is too big, extra points might be included
• As L increases, the union cell C∪ increases but the intersection cell C∩ decreases; C∩ determines the resolution of the data structure
Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points; let h be the resulting distance (bandwidth)
• Choose an error threshold ε; the optimal K and L should make the approximate distance stay within (1 + ε) of h
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K); then minimize the running time t(K, L(K))
[Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)]; the chosen minimum]
Data driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[Figure: bucket distribution, uniform vs. data-driven cut points]
Additional speedup
• Assume that all the points in C∩ will converge to the same mode (C∩ acts like a type of aggregate)
Speedup results
65,536 points; 1,638 points sampled; k = 100.
Food for thought
• Low dimension vs. high dimension
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30: cookies…
Summary
• LSH suggests a compromise on accuracy for a gain in complexity
• Applications that involve massive data in high dimension require the fast performance of LSH
• Extension of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
  - Visit http://web.mit.edu/andoni/www/LSH/index.html
  - Email Alex Andoni (andoni@mit.edu)
  - Test over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Outline
bullProblem definition and flavorsProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
bull Given a set P of n points in Rd
Over some metric
bull find the nearest neighbor p of q in P
Nearest Neighbor SearchProblem definition
Distance metric
Applications
bullClassification bullClustering
bullSegmentation
q
bullIndexingbullDimension reduction
(eg lle)
color
Weight
Naiumlve solution
bullNo preprocess
bullGiven a query point qndashGo over all n pointsndashDo comparison in Rd
bullquery time = O(nd)
Keep in mind
Common solution
bullUse a data structure for acceleration
bullScale-ability with n amp with d is important
When to use nearest neighbor
High level algorithms
Assuming no prior knowledge about the underlying probability structure
complex models Sparse data High dimensions
Parametric Non-parametric
Density estimation
Probability distribution estimation
Nearest neighbors
Nearest Neighbor
min pi P dist(qpi)
Closestqq
r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
The simplest solution
bullLion in the desert
Quadtree
Split the first dimension into 2
Repeat iteratively
Stop when each cell has no more than 1 data point
Quadtree - structure
X
Y
X1Y1 PgeX1PgeY1
PltX1PltY1
PgeX1PltY1
PltX1PgeY1
X1Y1
Quadtree - Query
X
Y
In many cases works
X1Y1PltX1PltY1 PltX1
PgeY1
X1Y1
PgeX1PgeY1
PgeX1PltY1
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH: the basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ).
We want similarity to be measured in the angle space, whereas LSH works in the feature space.
• Assumption: the feature space is closely related to the parameter space
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: parameter space (angles) vs. feature space, with a query q]
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick:
Estimate the performance of different hash functions on examples, and select those sensitive to d_θ.
The hash functions are applied in feature space, but the KNN are valid in angle space.
• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h
PSH as a classification problem
A pair of examples (x_i, x_j) is labeled (here r = 0.25):
  y_ij = +1  if d_θ(θ_i, θ_j) ≤ r
  y_ij = −1  if d_θ(θ_i, θ_j) > (1 + ε) r
A binary hash function on the features:
  h_T(x) = +1 if x ≥ T, −1 otherwise
Predict the labels:
  ŷ_ij = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
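The selection criterion above can be sketched as follows: score a candidate threshold hash by how often its predicted pair labels agree with the true angle-space labels. Scalar features and all function names here are illustrative assumptions, not the paper's code.

```python
def true_label(d_theta, r):
    """Slide's y_ij: +1 for pairs close in angle space, -1 otherwise."""
    return 1 if d_theta <= r else -1

def h_T(x, T):
    """Binary hash on a scalar feature: which side of threshold T."""
    return 1 if x >= T else -1

def predicted_label(xi, xj, T):
    """+1 if both examples hash to the same bin, -1 if separated."""
    return 1 if h_T(xi, T) == h_T(xj, T) else -1

def agreement(pairs, r, T):
    """Fraction of labeled pairs on which h_T predicts the true label.

    pairs: list of (xi, xj, d_theta) with scalar features xi, xj and
    angle-space distance d_theta.  PSH keeps hash functions whose
    agreement is high, i.e. that are parameter sensitive.
    """
    hits = sum(true_label(d, r) == predicted_label(xi, xj, T)
               for xi, xj, d in pairs)
    return hits / len(pairs)
```

In this toy, sweeping T and keeping the thresholds with the highest agreement mimics "accept h, else change h".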
Find the best threshold T that predicts the true labeling (subject to the probability constraints): h_T(x) will either place both examples in the same bin or separate them.
Local Weighted Regression (LWR)
• Given a query image x_0, PSH returns its KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:
  β̂ = argmin_β Σ_{x_i ∈ N(x_0)} d_θ(g(x_i; β), θ_i) · K(d_x(x_i, x_0))
  (angle-distance term × feature-distance kernel weight), and the estimate is θ̂ = g(x_0; β̂)
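The weighted-average step can be illustrated with a zeroth-order version of LWR: a plain kernel-weighted mean rather than a fitted regression model g. The Gaussian kernel and the names are assumptions made for this sketch.

```python
import math

def lwr_angle(neighbors, query_feat, bandwidth=1.0):
    """Zeroth-order locally weighted regression: a kernel-weighted
    average of the neighbors' angles, weighted by feature distance.

    neighbors: list of (feature_vector, angle).  A Gaussian kernel
    K(d) = exp(-(d/h)^2) stands in for the paper's kernel choice.
    """
    wsum = 0.0
    asum = 0.0
    for feat, angle in neighbors:
        d = math.dist(feat, query_feat)
        w = math.exp(-(d / bandwidth) ** 2)
        wsum += w
        asum += w * angle
    return asum / wsum
```

Neighbors far from the query in feature space get exponentially small weight, so a few bad KNN matches barely move the estimate.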
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, facial expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without the feature selection, 40 bits and 1,000 hash tables were needed
Recall: P1 is the probability of a positive hash, P2 the probability of a bad hash, and B the max number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1/3 of the data was searched
Interesting mismatches
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d, centered at P = {p_1, …, p_n}, with radii r_1, …, r_n
• Goal: given a query q, preprocess the points in P to find a point p_i whose sphere 'covers' the query q
Courtesy of Mohamad Hegaze
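For contrast with the open question above, the naive answer is a linear scan over all spheres; the sketch below (`covering_sphere` is an illustrative name) shows the O(nd) baseline that a PLDS structure would have to beat.

```python
import math

def covering_sphere(centers, radii, q):
    """Return the index of a sphere whose ball covers the query q,
    or None.  A naive O(nd) scan; the open question on the slide is
    how to preprocess P to answer this faster in high dimension."""
    for i, (p, r) in enumerate(zip(centers, radii)):
        if math.dist(p, q) <= r:
            return i
    return None
```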
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[Figure: a point repeatedly shifted toward the local mean within a bandwidth window]
(Roadmap: Mean-shift → LSH: optimal k, l → LSH: data partition → LSH: data structure)
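The nutshell can be made concrete with a minimal flat-kernel mean-shift in Python. This is illustrative only: it uses a fixed bandwidth and a brute-force range query, whereas the paper's version is adaptive and uses LSH for exactly this range-query step.

```python
import math

def mean_shift_step(x, points, bandwidth):
    """One mean-shift step with a flat kernel: move x to the mean of
    all points falling inside the bandwidth window."""
    window = [p for p in points if math.dist(p, x) <= bandwidth]
    n = len(window)
    return tuple(sum(p[i] for p in window) / n for i in range(len(x)))

def mean_shift(x, points, bandwidth, tol=1e-6, max_iter=100):
    """Iterate until the shift is below tol: x converges to a mode."""
    for _ in range(max_iter):
        nx = mean_shift_step(x, points, bandwidth)
        if math.dist(nx, x) < tol:
            return nx
        x = nx
    return x
```

Each starting point climbs to a mode of the density; points that reach the same mode form one cluster.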
KNN in mean-shift
• The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth
• It is based on the kth nearest neighbor of the point: the bandwidth is the distance from x_i to its kth neighbor, h_i = ||x_i − x_{i,k}||
Adaptive mean-shift vs. non-adaptive
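The adaptive-bandwidth rule can be sketched directly, assuming h_i is simply the distance to the k-th nearest neighbor. This brute-force version is for illustration; obtaining these KNNs cheaply is exactly what the paper uses LSH for.

```python
import math

def adaptive_bandwidths(points, k):
    """Per-point bandwidth h_i = distance from x_i to its k-th nearest
    neighbor: small in dense regions, large in sparse ones."""
    hs = []
    for i, x in enumerate(points):
        dists = sorted(math.dist(x, p)
                       for j, p in enumerate(points) if j != i)
        hs.append(dists[k - 1])
    return hs
```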
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering: each pixel gets the value of its nearest mode
[Figure: original, filtered, and segmented images; mean-shift trajectories in 3D]
Mean-shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Filtering examples
[Figure: original vs. filtered squirrel; original vs. filtered baboon]
Segmentation examples
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point x_i we check the K inequalities x_i[d_k] ≤ v_k; the outcomes determine its cell
• This partitions the data into cells
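The data structure described above can be sketched directly: K coordinate cuts define a cell key, and L such partitions give L hash tables whose buckets are unioned at query time. The function names and the uniform cut values are illustrative assumptions.

```python
import random
from collections import defaultdict

def make_partition(dim, K, lo=0.0, hi=1.0, rng=random):
    """One random partition: K (coordinate, cut-value) pairs."""
    return [(rng.randrange(dim), rng.uniform(lo, hi)) for _ in range(K)]

def cell_key(x, partition):
    """The K boolean tests 'x[d_k] <= v_k' give the point's cell id."""
    return tuple(x[d] <= v for d, v in partition)

def build_tables(points, partitions):
    """L hash tables mapping cell keys to point indices."""
    tables = []
    for part in partitions:
        table = defaultdict(list)
        for i, x in enumerate(points):
            table[cell_key(x, part)].append(i)
        tables.append(table)
    return tables

def candidates(q, partitions, tables):
    """Union of the query's buckets over the L partitions."""
    out = set()
    for part, table in zip(partitions, tables):
        out.update(table.get(cell_key(q, part), ()))
    return out
```

Only the candidates, not all n points, are then checked by exact distance computations.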
Choosing the optimal K and L
• For a query q, the cost is the number of distance computations to the points in its buckets; we want it small
• Large K → a smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, extra points might be included
• The expected number of points in a cell is N_cell ≈ n / (1 + K/d)^d, and over the union C̄ of the query's L buckets, N_C̄ ≤ L · N_cell
• As L increases, C̄ increases but the resolution decreases; C̄ determines the resolution of the data structure
Choosing optimal K and L
• Determine accurately the KNN (bandwidth) distance for m randomly-selected data points
• Choose an error threshold ε; the optimal K and L should keep the approximate distance within ε of it
• For each K, estimate the error
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
[Figure: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] and its minimum]
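The selection loop above can be written as a skeleton, assuming `error(K, L)` and `runtime(K, L)` are measured on the m sample queries as the slide describes (both are stand-in callables here, and the name `select_parameters` is illustrative).

```python
def select_parameters(Ks, Ls, error, runtime, eps):
    """For each K, take the smallest L whose approximation error is
    within eps, then keep the (K, L) pair with the smallest measured
    running time.  Returns (K, L) or None if nothing is feasible."""
    best = None
    for K in Ks:
        feasible = [L for L in Ls if error(K, L) <= eps]
        if not feasible:
            continue
        L = min(feasible)  # minimal L satisfying the constraint: L(K)
        t = runtime(K, L)
        if best is None or t < best[0]:
            best = (t, K, L)
    return None if best is None else (best[1], best[2])
```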
Data driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[Figure: bucket distribution with uniform cuts vs. data-driven cuts]
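The two cut-value strategies can be placed side by side as a minimal sketch (function names are illustrative); the data-driven cut puts more cuts where the data is dense, so buckets end up more evenly filled.

```python
import random

def uniform_cut(lo, hi, rng=random):
    """Original LSH: a cut value uniform over the data range."""
    return rng.uniform(lo, hi)

def data_driven_cut(points, coord, rng=random):
    """Suggested variant: take the cut value from the data itself, by
    sampling a random point and reading off one of its coordinates."""
    return rng.choice(points)[coord]
```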
Additional speedup
Assume that all points in C̄ will converge to the same mode (C̄ is like a type of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
Low dimension vs. high dimension
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30 – cookies…
Summary
• LSH suggests a compromise: accuracy is traded for lower complexity
• Applications that involve massive data in high dimensions require LSH's fast performance
• Extension of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data
(C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naïve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree – Pitfall 1
- Slide 16
- Quadtree – pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x, what are the parameters θ in this image? I.e. the angles of the joints, the orientation of the body, etc.
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results – real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Applications
bullClassification bullClustering
bullSegmentation
q
bullIndexingbullDimension reduction
(eg lle)
color
Weight
Naiumlve solution
bullNo preprocess
bullGiven a query point qndashGo over all n pointsndashDo comparison in Rd
bullquery time = O(nd)
Keep in mind
Common solution
bullUse a data structure for acceleration
bullScale-ability with n amp with d is important
When to use nearest neighbor
High level algorithms
Assuming no prior knowledge about the underlying probability structure
complex models Sparse data High dimensions
Parametric Non-parametric
Density estimation
Probability distribution estimation
Nearest neighbors
Nearest Neighbor
min pi P dist(qpi)
Closestqq
r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Outline
• Problem definition and flavors
• Algorithms overview – low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
The simplest solution
• "Lion in the desert"
Quadtree
• Split the first dimension into 2
• Repeat iteratively
• Stop when each cell has no more than 1 data point
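The construction and query described above can be sketched in code (a minimal 2-D point quadtree; it assumes distinct points, and the class and method names are illustrative):

```python
import math

class QuadTree:
    """Minimal 2-D point quadtree: split each dimension in 2 until every cell holds <= 1 point."""
    def __init__(self, points, x0, y0, x1, y1):
        self.bounds = (x0, y0, x1, y1)
        self.children = None
        self.point = None
        if len(points) <= 1:
            self.point = points[0] if points else None
            return
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2   # split each dimension into 2
        quads = [[], [], [], []]
        for p in points:
            quads[(p[0] >= cx) * 2 + (p[1] >= cy)].append(p)
        boxes = [(x0, y0, cx, cy), (x0, cy, cx, y1), (cx, y0, x1, cy), (cx, cy, x1, y1)]
        self.children = [QuadTree(q, *b) for q, b in zip(quads, boxes)]

    def nearest(self, q, best=None):
        # Depth-first search, pruning cells that cannot beat the current best distance.
        if self.point is not None:
            d = math.dist(q, self.point)
            if best is None or d < best[0]:
                best = (d, self.point)
        if self.children:
            for c in self.children:
                x0, y0, x1, y1 = c.bounds
                # lower bound on the distance from q to the cell
                dx = max(x0 - q[0], 0, q[0] - x1)
                dy = max(y0 - q[1], 0, q[1] - y1)
                if best is None or math.hypot(dx, dy) < best[0]:
                    best = c.nearest(q, best)
        return best
```

The pruning step is exactly where the pitfalls below bite: when the query sits near a cell boundary, many sibling cells survive the bound and must be visited.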
Quadtree - structure
Splitting at (X1, Y1) creates four children, one per quadrant:
• P < X1, P < Y1
• P ≥ X1, P < Y1
• P < X1, P ≥ Y1
• P ≥ X1, P ≥ Y1
Quadtree - Query
In many cases this works.
[figure: the query descends to the quadrant of (X1, Y1) that contains it]
Quadtree – Pitfall 1
In some cases it doesn't: the nearest neighbor may lie in a neighboring quadrant.
[figure: query near the cell boundary at (X1, Y1)]
Quadtree – Pitfall 1
In some cases nothing works.
[figure: a configuration where many cells must be inspected]
Quadtree – Pitfall 2
Each node has O(2^d) children, which can make the query time exponential in the dimension.
Space partition based algorithms
"Multidimensional Access Methods", Volker Gaede, O. Günther
Could be improved
Outline
• Problem definition and flavors
• Algorithms overview – low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
Curse of dimensionality
• Query time or space is O(n^d)
• For d > 10..20, worse than sequential scan for most geometric distributions
• Techniques specific to high dimensions are needed
• Proved in theory and in practice by Barkol & Rabani 2000 and Beame & Vee 2002
Naive: O( min(nd, n^d) )
Curse of dimensionality – some intuition
The number of cells grows as 2, 2^2, 2^3, …, 2^d.
Outline
• Problem definition and flavors
• Algorithms overview – low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
Preview
• General solution – Locality Sensitive Hashing
• Implementation for Hamming space
• Generalization to l1 & l2
Hash function
A hash function maps a Data_Item, via a Key, to a Bin/Bucket.
Example: h(X) = X modulo 3, where X is a number in the range 0..n; the result, in 0..2, is the storage address in the data structure.
Usually we would like related data items to be stored in the same bin.
Recall: r - Nearest Neighbor
• dist(q, p1) ≤ r
• dist(q, p2) ≥ (1 + ε)·r
• r2 = (1 + ε)·r1
Locality sensitive hashing
A family is (r, ε, p1, p2)-sensitive, with r2 = (1 + ε)·r1, if:
• Pr[I(p) = I(q)] ≥ p1 ("high") when p is "close" to q (dist ≤ r1)
• Pr[I(p) = I(q)] ≤ p2 ("low") when p is "far" from q (dist ≥ r2)
Preview
• General solution – Locality Sensitive Hashing
• Implementation for Hamming space
• Generalization to l1 & l2
Hamming Space
• Hamming space = the 2^N binary strings of length N
• Hamming distance = number of digits that differ (a.k.a. signal distance, after Richard Hamming)
Example (N = 12):
010100001111
010010000011   Distance = 4
• Hamming distance = SUM(X1 XOR X2)
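The XOR formulation above in code (a minimal sketch):

```python
def hamming_distance(x1: int, x2: int) -> int:
    """Hamming distance = number of differing digits = popcount(x1 XOR x2)."""
    return bin(x1 ^ x2).count("1")

a = int("010100001111", 2)
b = int("010010000011", 2)
assert hamming_distance(a, b) == 4  # the slide's example
```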
L1 to Hamming Space Embedding
Encode each coordinate in unary with C bits. For example, p = (8, 2) with C = 11 becomes
11111111000 11000000000
so the embedded dimension is d' = C·d, and L1 distance becomes Hamming distance.
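The unary embedding can be sketched as follows (illustrative; it assumes non-negative integer coordinates bounded by C):

```python
def l1_to_hamming(p, C):
    """Unary-embed each coordinate of p into C bits; L1 distance -> Hamming distance."""
    return "".join("1" * v + "0" * (C - v) for v in p)

s1 = l1_to_hamming((8, 2), C=11)   # the slide's example string
s2 = l1_to_hamming((6, 5), C=11)
ham = sum(c1 != c2 for c1, c2 in zip(s1, s2))
# L1 distance |8-6| + |2-5| = 5 equals the Hamming distance of the embeddings
```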
Hash function
p ∈ H^d' (the embedded Hamming space)
Gj(p) = p|Ij — bits sampling from p: for j = 1..L, sample k digits (here k = 3)
Store p into bucket p|Ij, one of 2^k buckets (e.g. bucket 101)
(example strings: 11000000000, 111111110000, 111000000000, 111111110001)
Construction: insert each point p into its bucket in each of the tables 1, 2, …, L.
Query: look up q in its bucket in each of the tables 1, 2, …, L.
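The construction and query steps can be sketched as follows (a minimal bit-sampling LSH for Hamming strings; the function names are illustrative):

```python
import random
from collections import defaultdict

def build_tables(points, d, k, L, seed=0):
    """Bit-sampling LSH for Hamming space: L tables, each keyed by k sampled bit positions."""
    rng = random.Random(seed)
    samplings = [rng.sample(range(d), k) for _ in range(L)]   # the index sets I_j
    tables = [defaultdict(list) for _ in range(L)]
    for p in points:
        for I, table in zip(samplings, tables):
            key = "".join(p[i] for i in I)    # G_j(p) = p|I_j
            table[key].append(p)
    return samplings, tables

def query(q, samplings, tables):
    """Return the union of candidates colliding with q in any of the L tables."""
    candidates = set()
    for I, table in zip(samplings, tables):
        candidates.update(table["".join(q[i] for i in I)])
    return candidates
```

Candidates returned by `query` are then verified with exact Hamming distances.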
Alternative intuition: random projections
p = (8, 2), C = 11, unary embedding 11111111000 11000000000 (d' = C·d).
Sampling one bit of the unary code asks whether a coordinate of p exceeds a random threshold, i.e. it is a random axis-parallel cut of the space.
Alternative intuition: random projections
k sampled bits (here k = 3) map p into one of the 2^3 buckets: 000, 001, 100, 101, 110, 111, …; here p falls into bucket 101.
(example strings: 11000000000, 111111110000, 111000000000, 111111110001)
Repeating L times
Secondary hashing (skip):
The 2^k buckets (e.g. 011), each of size B, are mapped by a simple hash into M buckets with M·B = α·n, α = 2; this supports tuning storage volume against dataset size.
The above hashing is locality-sensitive
• Probability(p, q in same bucket) = (1 − Distance(q, p) / dimensions)^k
[plots: probability Pr vs. Distance(q, pi), for k = 1 and k = 2]
Adopted from Piotr Indyk's slides
Preview
• General solution – Locality Sensitive Hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• New hashing function
• Still based on sampling
• Uses a mathematical trick:
  – p-stable distribution for Lp distance
  – Gaussian distribution for L2 distance
Central limit theorem
v1·X1 + v2·X2 + … + vn·Xn
where v1..vn are real numbers and X1..Xn are independent, identically distributed (i.i.d.) Gaussians:
a weighted sum of Gaussians is again a Gaussian.
Central limit theorem
Dot product → norm:  Σi vi·Xi ≈ ||v||2 · X
Norm → distance: for feature vector 1 (u) and feature vector 2 (v),
Σi ui·Xi − Σi vi·Xi = Σi (ui − vi)·Xi ≈ ||u − v||2 · X
so the difference of the two dot products distributes like a Gaussian scaled by the L2 distance.
The full hashing
h_{a,b}(v) = ⌊(a·v + b) / w⌋
• v — the features vector, e.g. [34, 82, 21, …]
• a — d random numbers, i.i.d. from a p-stable distribution
• b — random phase in [0, w]
• w — discretization step
Example: a·v = 7944, b = 34, w = 100 → h = ⌊7978 / 100⌋ = 79
(bins at …, 7800, 7900, 8000, 8100, 8200, …)
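A minimal sketch of such a hash function, assuming the 2-stable (Gaussian) case for L2; the names are illustrative:

```python
import math
import random

def make_hash(d, w, seed=0):
    """One p-stable LSH function for L2: h(v) = floor((a.v + b) / w),
    with a ~ N(0,1)^d (the 2-stable case) and b ~ Uniform[0, w)."""
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(d)]          # d random numbers
    b = rng.uniform(0, w)                            # random phase in [0, w)
    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
    return h

h = make_hash(d=3, w=100, seed=1)
bucket = h([34, 82, 21])   # nearby vectors tend to share a bucket
```

As with the Hamming scheme, k such functions are concatenated per table and L tables are kept.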
Generalization: p-stable distribution
• Lp, 0 < p ≤ 2: generalized Central Limit Theorem → p-stable distribution (e.g. Cauchy for L1)
• L2: Central Limit Theorem → Gaussian (normal) distribution
P-stable summary
• Works for, and generalizes to, Lp with 0 < p ≤ 2
• Improves the r-nearest-neighbor query time: from O(d·n^(1/(1+ε))·log n) to O(d·n^(1/(1+ε)^2)·log n)
(latest results, reported in an email by Alexander Andoni)
Parameters selection
• 90% probability ⇒ best query time performance
For Euclidean space
Parameters selection…
For Euclidean space:
• A single projection hits an r-nearest neighbor with Pr = p1
• k projections hit an r-nearest neighbor with Pr = p1^k
• All L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure a collision (e.g. 1 − δ ≥ 90%):
  1 − (1 − p1^k)^L ≥ 1 − δ  ⇒  L ≥ log(δ) / log(1 − p1^k)
Reject non-neighbors / accept neighbors
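The collision bound above gives a direct recipe for choosing L (a sketch; the values p1 = 0.9, k = 18, and δ = 0.1 are illustrative):

```python
import math

def tables_needed(p1: float, k: int, delta: float) -> int:
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta,
    i.e. L = ceil(log(delta) / log(1 - p1**k))."""
    return math.ceil(math.log(delta) / math.log(1 - p1 ** k))

# e.g. p1 = 0.9, k = 18 sampled bits, 90% success probability (delta = 0.1)
L = tables_needed(0.9, 18, 0.1)
```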
…Parameters selection
[plot: query time vs. k, split into candidate extraction and candidate verification]
Larger k means fewer candidates to verify but more tables for extraction; choose k to minimize the total time.
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance
From Piotr Indyk's slides
Conclusion
• …but in the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data
    (C code, under Red Hat Linux)
LSH - Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash function construction and parameter tuning
Outline
• "Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola and T. Darrell
  – Finding sensitive hash functions
• "Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni and P. Meer
  – Tuning LSH parameters
  – The LSH data structure is used for algorithm speedups
The Problem
Given an image x, what are the parameters θ in this image, i.e. angles of joints, orientation of the body, etc.?
"Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola and T. Darrell
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: dx
• Distance metric in angle space:
  dθ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))
Example based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
Input: query → find KNN in the database of examples → output: average angles of the KNN
The algorithm flow
Input query → features extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms.
[figure: edge-direction histograms over sub-windows A, B at several scales]
(Feature extraction → PSH → LWR)
PSH: the basic assumption
There are two metric spaces here: feature space (dx) and parameter space (dθ).
We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space
(Feature extraction → PSH → LWR)
Insight: manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[figure: query q mapped between the parameter space (angles) and the feature space — is this magic?]
(Feature extraction → PSH → LWR)
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to dθ.
The hash functions are applied in feature space, but the KNN are valid in angle space.
1. Label pairs of examples with similar angles
2. Define hash functions h on the feature space
3. Predict the labeling of similar/non-similar examples by using h
4. Compare the labelings
5. If the labeling by h is good, accept h; else change h
(Feature extraction → PSH → LWR)
PSH as a classification problem
A pair of examples (xi, xj) is labeled (with r = 0.25):
  yij = +1 if dθ(θi, θj) ≤ r
  yij = −1 if dθ(θi, θj) ≥ (1 + ε)·r
A binary hash function over the features:
  h_T(x) = +1 if x ≥ T, −1 otherwise
Predict the labels:
  ŷ_h(xi, xj) = +1 if h_T(xi) = h_T(xj), −1 otherwise
h_T will place both examples in the same bin, or separate them.
Find the best T that predicts the true labeling, with probability constraints.
(Feature extraction → PSH → LWR)
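Selecting a threshold T that best predicts the pair labels can be sketched as follows (an illustrative brute-force sketch over a single feature, not the authors' exact training procedure):

```python
def best_threshold(feature_vals, pairs, labels):
    """Pick the threshold T on one feature that best predicts pair labels.
    Prediction is +1 when both examples fall on the same side of T.
    feature_vals: this feature's value per example; pairs: (i, j) index pairs;
    labels: +1 (similar angles) / -1 (dissimilar)."""
    candidates = sorted(set(feature_vals))
    def accuracy(T):
        ok = 0
        for (i, j), y in zip(pairs, labels):
            same = (feature_vals[i] >= T) == (feature_vals[j] >= T)
            ok += (1 if same else -1) == y
        return ok / len(labels)
    return max(candidates, key=accuracy)
```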
Local Weighted Regression (LWR)
• Given a query image, PSH returns its KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:
  β0 = argmin_β Σ_{xi ∈ N(x0)} dθ(g(xi; β), θi)² · K(dx(xi, x0))
  where g is the local model and K is a distance-based weight kernel.
(Feature extraction → PSH → LWR)
Results
Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Tested on 1,000 synthetic examples; PSH searched only 3.4% of the data per query
• Without the feature selection, 40 bits and 1,000 hash tables would have been needed
Recall: P1 is the probability of a positive hash, P2 the probability of a bad hash, and B the max number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
Interesting mismatches
Fast pose estimation - summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in Rd, centered at P = {p1, …, pn}, with radii r1, …, rn
• Goal: given a query q, preprocess the points in P to find a point pi whose sphere 'covers' the query q
Courtesy of Mohamad Hegaze
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
"Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[figure: a point shifted toward the local mean within its bandwidth window]
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region: high density → small bandwidth; low density → large bandwidth.
It is based on the kth nearest neighbor of the point: the bandwidth is the distance from the point to its kth nearest neighbor, hi = ||xi − xi,k||.
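The adaptive scheme above can be sketched as follows (a minimal brute-force sketch of adaptive mean-shift with a flat kernel; this is not the paper's LSH-accelerated implementation):

```python
import math

def adaptive_mean_shift(points, k, iters=30, tol=1e-6):
    """Adaptive mean-shift sketch: each point's bandwidth h_i is its distance to
    its k-th nearest neighbor; modes are found by iterating the local mean."""
    # per-point bandwidth from the k-th nearest neighbor
    h = []
    for p in points:
        ds = sorted(math.dist(p, q) for q in points if q is not p)
        h.append(ds[k - 1])
    modes = []
    for p in points:
        y = p
        for _ in range(iters):
            # flat kernel: average the points whose adaptive window contains y
            near = [q for q, hq in zip(points, h) if math.dist(y, q) <= hq]
            y_new = tuple(sum(c) / len(near) for c in zip(*near))
            if math.dist(y, y_new) < tol:
                break
            y = y_new
        modes.append(y)
    return modes
```

Points converging to (nearly) the same mode belong to the same cluster.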
Adaptive mean-shift vs. non-adaptive
[comparison figures]
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths hs (spatial) and hr (color)
3. Apply filtering: each pixel takes the value of its nearest mode
[figures: original / filtered / segmented images; mean-shift trajectories in 3D]
Filtering examples: original squirrel → filtered; original baboon → filtered
Segmentation examples
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH
• Statistical curse of dimensionality: sparseness of the data, handled by variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (dk, vk)
• For each point x, check whether x_{dk} ≤ vk; the K boolean results define the point's cell
• This partitions the data into cells
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets
• If L is too small, points might be missed; if L is too big, extra points might be included
• A large K means a smaller number of points in a cell
• As L increases, the union of cells C increases, but so does the number of distance computations
• K determines the resolution of the data structure
Choosing optimal K and L
• Accurately determine the KNN distance (bandwidth) for m randomly-selected data points
• Choose an error threshold ε
• The optimal K and L should satisfy that the approximate distance is within the threshold of the true one
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K); then minimize the running time t(K, L(K))
[plots: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)] with its minimum marked]
Data driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[plot: points-per-bucket distribution, uniform vs. data-driven cuts]
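The two cut-value strategies can be sketched side by side (an illustrative sketch, not the authors' code; K = 8 and the Gaussian toy data are assumptions):

```python
import random

def uniform_cuts(data, K, rng):
    """Cut values drawn uniformly from each chosen dimension's range."""
    dims = [rng.randrange(len(data[0])) for _ in range(K)]
    return [(d, rng.uniform(min(x[d] for x in data), max(x[d] for x in data)))
            for d in dims]

def data_driven_cuts(data, K, rng):
    """Cut values taken from coordinates of randomly chosen data points,
    so dense regions receive more cuts and buckets stay balanced."""
    dims = [rng.randrange(len(data[0])) for _ in range(K)]
    return [(d, rng.choice(data)[d]) for d in dims]

def cell_of(x, cuts):
    # the K boolean tests define the cell key
    return tuple(x[d] <= v for d, v in cuts)

rng = random.Random(0)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]
cuts = data_driven_cuts(data, K=8, rng=rng)
buckets = {}
for x in data:
    buckets.setdefault(cell_of(x, cuts), []).append(x)
```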
Additional speedup
Assume that all the points in a cell C will converge to the same mode (C is like a type of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
Low dimension vs. high dimension
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• A thought for food: does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30 cookies…
Summary
• LSH trades accuracy for a gain in complexity
• Applications that involve massive data in high dimension require the fast performance of LSH
• Extension of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• …but in the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data
    (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Naiumlve solution
bullNo preprocess
bullGiven a query point qndashGo over all n pointsndashDo comparison in Rd
bullquery time = O(nd)
Keep in mind
Common solution
bullUse a data structure for acceleration
bullScale-ability with n amp with d is important
When to use nearest neighbor
High level algorithms
Assuming no prior knowledge about the underlying probability structure
complex models Sparse data High dimensions
Parametric Non-parametric
Density estimation
Probability distribution estimation
Nearest neighbors
Nearest Neighbor
min pi P dist(qpi)
Closestqq
r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
The simplest solution
bullLion in the desert
Quadtree
Split the first dimension into 2
Repeat iteratively
Stop when each cell has no more than 1 data point
Quadtree - structure
X
Y
X1Y1 PgeX1PgeY1
PltX1PltY1
PgeX1PltY1
PltX1PgeY1
X1Y1
Quadtree - Query
X
Y
In many cases works
X1Y1PltX1PltY1 PltX1
PgeY1
X1Y1
PgeX1PgeY1
PgeX1PltY1
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
Interesting mismatches
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn.
• Goal: given a query q, preprocess the points in P to find a point p_i whose sphere 'covers' the query q.
(Figure: the query q inside the sphere of radius r_i around p_i.)
Courtesy of Mohamad Hegaze
Motivation
• Clustering high-dimensional data by using local density measurements (e.g., in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[Pipeline: Mean-shift → LSH: optimal k, l → LSH data partition → LSH data structure]
(Figure: a window of a given bandwidth is shifted to the mean of the points it contains.)
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region:
high density → small bandwidth; low density → large bandwidth.
It is based on the k-th nearest neighbor of the point: the bandwidth is the distance from the point to its k-th nearest neighbor, h_i = ‖x_i − x_{i,k}‖.
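A minimal sketch of the adaptive idea, assuming a flat kernel and brute-force distances; `adaptive_bandwidths` and `mean_shift_step` are illustrative names, not the paper's implementation.

```python
import numpy as np

def adaptive_bandwidths(X, k):
    """Per-point bandwidth = distance to the k-th nearest neighbor,
    so dense regions get small windows and sparse regions large ones."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    D.sort(axis=1)          # row-wise ascending; column 0 is the point itself
    return D[:, k]

def mean_shift_step(y, X, h):
    """One mean-shift update: move y to the mean of the points
    inside a flat kernel of radius h around it."""
    mask = np.linalg.norm(X - y, axis=1) <= h
    return X[mask].mean(axis=0)
```

Iterating `mean_shift_step` until convergence moves each point to a mode; points sharing a mode form one cluster.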
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y).
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color).
3. Apply filtering.
Mean-shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
(Figure: original, filtered, and segmented images, with the mean-shift trajectories.)
Filtering: each pixel gets the value of the nearest mode.
Filtering examples
(Figures: original vs. filtered squirrel; original vs. filtered baboon.)
Segmentation examples
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH.
• Statistical curse of dimensionality: sparseness of the data, handled with variable bandwidth.
LSH-based data structure
• Choose L random partitions. Each partition includes K pairs (d_k, v_k).
• For each point we check whether x_{d_k} ≤ v_k for k = 1, …, K.
This partitions the data into cells.
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets.
• Large K → a smaller number of points in a cell.
• If L is too small, points might be missed; but if L is too big, extra points might be included.
As L increases, C∪ increases but C∩ decreases; C∩ determines the resolution of the data structure.
Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points, and their distance (bandwidth).
• Choose an error threshold ε.
• The optimal K and L should satisfy the threshold on the approximate distance.
Choosing optimal K and L
• For each K, estimate the error for L(K).
• In one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)).
(Plots: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)], whose minimum gives the chosen parameters.)
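The three-step rule above can be sketched as a small grid search. A hedged illustration: `choose_parameters` and the `error` / `time_cost` callbacks (standing in for the sampled-KNN error estimate and the measured running time) are invented names, not the paper's code.

```python
def choose_parameters(Ks, max_L, error, time_cost, eps=0.05):
    """For each K, take the smallest L whose approximation error
    (vs. exact KNN on a random sample) is below eps, then keep the
    (K, L) pair with the smallest estimated running time."""
    best = None
    for K in Ks:
        L = next((L for L in range(1, max_L + 1) if error(K, L) <= eps), None)
        if L is None:
            continue                      # no feasible L for this K
        t = time_cost(K, L)
        if best is None or t < best[0]:
            best = (t, K, L)
    return best and best[1:]
```

With toy callbacks where error falls as 1/(K·L) and cost grows as K+L, the search balances the two exactly as the slide's plots suggest.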
Data-driven partitions
• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
(Figure: bucket distribution for uniform vs. data-driven cut points.)
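The suggestion can be sketched as below (hypothetical helper name; cut dimensions are chosen uniformly at random, as in the original scheme, and only the cut values change):

```python
import random

def data_driven_cuts(points, K, seed=1):
    """K (dimension, value) pairs where each cut value is a coordinate
    of a randomly selected data point, so cuts concentrate where the
    data does and buckets end up more evenly populated."""
    rng = random.Random(seed)
    dim = len(points[0])
    cuts = []
    for _ in range(K):
        d = rng.randrange(dim)
        p = rng.choice(points)
        cuts.append((d, p[d]))
    return cuts
```

Every cut is guaranteed to actually split some of the data, unlike a uniform draw that may land in an empty stretch of the range.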
Additional speedup
Assume that all points in C∩ will converge to the same mode (C∩ acts like a type of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100.
Food for thought
(Figure: behavior in low dimension vs. high dimension.)
A thought for food…
• Choose K, L by sample learning, or take the traditional values.
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.
15:30 — cookies…
Summary
• LSH suggests a compromise on accuracy in exchange for lower complexity.
• Applications that involve massive data in high dimension require the fast performance of LSH.
• Extensions of LSH to different spaces (PSH).
• Learning the LSH parameters and hash functions for different applications.
Conclusion
• But at the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data
(C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naïve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree – Pitfall 1
- Slide 16
- Quadtree – pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection …
- … Parameters selection
- Pros & Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x, what are the parameters θ in this image, i.e., angles of joints, orientation of the body, etc.?
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results – real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
When to use nearest neighbor
High level algorithms
Assuming no prior knowledge about the underlying probability structure
complex models Sparse data High dimensions
Parametric Non-parametric
Density estimation
Probability distribution estimation
Nearest neighbors
Nearest Neighbor
min pi P dist(qpi)
Closestqq
r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
The simplest solution
bullLion in the desert
Quadtree
Split the first dimension into 2
Repeat iteratively
Stop when each cell has no more than 1 data point
Quadtree - structure
X
Y
X1Y1 PgeX1PgeY1
PltX1PltY1
PgeX1PltY1
PltX1PgeY1
X1Y1
Quadtree - Query
X
Y
In many cases works
X1Y1PltX1PltY1 PltX1
PgeY1
X1Y1
PgeX1PgeY1
PgeX1PltY1
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition: random projections
p = (8, 2), C = 11 → 11111111000 11000000000, d' = C·d
Sampling a bit of the unary code tests whether a coordinate exceeds a threshold, i.e. an axis-parallel random projection
Alternative intuition: random projections
The k sampled bits map p into one of the 2^k = 2^3 buckets: 000, 100, 110, 001, 101, 111, ...
Example key: 101; stored points: 11000000000, 111111110000, 111000000000, 111111110001
k samplings, repeated L times
Secondary hashing
Supports volume tuning: dataset size vs. storage volume
The 2^k buckets (e.g. key 011, size B) are mapped by simple hashing into M buckets, with M·B = αn, α = 2
(Skip)
The above hashing is locality-sensitive
•Probability(p, q in same bucket) = (1 − Distance(p,q)/d')^k
(figure: the probability Pr as a function of Distance(q,pi), for k = 1 and k = 2; larger k makes it fall off faster)
Adopted from Piotr Indyk's slides
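The collision-probability formula can be checked numerically (a sketch, not from the slides; d, r, k and the trial count are illustrative, and the k bits are sampled with replacement so the formula holds exactly):

```python
import random

# Monte Carlo check of Pr[same bucket] = (1 - r/d')**k for two strings
# at Hamming distance r in d' bits, with k bit positions sampled per table.
rng = random.Random(0)
d, r, k = 100, 10, 5
p_formula = (1 - r / d) ** k

x = [0] * d
y = x[:]                          # y differs from x in exactly r positions
for i in rng.sample(range(d), r):
    y[i] = 1

trials, hits = 20000, 0
for _ in range(trials):
    idx = [rng.randrange(d) for _ in range(k)]   # k bits, with replacement
    hits += all(x[i] == y[i] for i in idx)

print(round(p_formula, 3), round(hits / trials, 3))  # the two agree closely
```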
Preview
•General solution - Locality sensitive hashing
•Implementation for Hamming space
•Generalization to l2
Direct L2 solution
•New hashing function
•Still based on sampling
•Using a mathematical trick
•P-stable distribution for Lp distance; Gaussian distribution for L2 distance
Central limit theorem
v1·X1 + v2·X2 + … + vn·Xn = (weighted Gaussians) = a weighted Gaussian
v1..vn = real numbers
X1..Xn = independent, identically distributed (i.i.d.) Gaussians
Central limit theorem
Σi vi·Xi ~ ||v||2 · X    (dot product → norm)
Norm → Distance:
Σi ui·Xi − Σi vi·Xi = Σi (ui − vi)·Xi ~ ||u − v||2 · X
So the difference between the projections of features vector 1 and features vector 2 is distributed as their L2 distance times a standard Gaussian.
The full Hashing
h_{a,b}(v) = ⌊(a·v + b) / w⌋
a: d random numbers, i.i.d. from a p-stable distribution
v: features vector, e.g. [34 82 21]
b: phase, random in [0, w]
w: discretization step
Example: a·v = 7944, b = 34, w = 100 → ⌊(7944 + 34)/100⌋ lands in the cell 7900..8000
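A minimal Python sketch of this hash (not from the slides; the dimension, w, and the example vector are illustrative, and a ~ N(0, 1) i.i.d. gives the 2-stable case):

```python
import math
import random

# p-stable hash of the slides: h_{a,b}(v) = floor((a . v + b) / w),
# with a drawn i.i.d. from a p-stable distribution (Gaussian for L2),
# b uniform in [0, w), and w the discretization step.
def make_hash(d, w, rng):
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = rng.uniform(0.0, w)
    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
    return h

rng = random.Random(0)
h = make_hash(3, 100.0, rng)
v = [34.0, 82.0, 21.0]     # the slide's example features vector
print(h(v))                # deterministic once (a, b) are fixed
```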
Generalization: P-Stable distribution
•Lp, 0 < p ≤ 2
•Generalized Central Limit Theorem
•p-stable distribution (e.g. Cauchy for L1)
•L2
•Central Limit Theorem
•Gaussian (normal) distribution
P-Stable summary
•Generalizes to 0 < p ≤ 2
•Improves query time for r - Nearest Neighbor:
Query time = O(d·n^(1/(1+ε))·log n) → O(d·n^(1/(1+ε)^2)·log n)
(latest results, reported in an email by Alexander Andoni)
Parameters selection
(for Euclidean space)
•90% probability ⇔ best query time performance
•A single projection hits an r - Nearest Neighbor with Pr = p1
•k projections hit an r - Nearest Neighbor with Pr = p1^k
•All L hashings fail to collide with Pr = (1 − p1^k)^L
•To ensure collision (e.g. with probability 1 − δ ≥ 90%):
1 − (1 − p1^k)^L ≥ 1 − δ  ⇒  L ≥ log(δ) / log(1 − p1^k)
Reject non-neighbors / Accept neighbors
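The rule for L can be turned into a one-line calculation (a sketch; the values p1 = 0.9, k = 18, δ = 0.1 are illustrative, with k = 18 echoing the pose-estimation setup later in the talk):

```python
import math

# With single-bit collision probability p1, k bits per table and L tables,
# Pr[at least one collision] = 1 - (1 - p1**k)**L.
# Solving 1 - (1 - p1**k)**L >= 1 - delta gives the minimal L.
def tables_needed(p1, k, delta):
    return math.ceil(math.log(delta) / math.log(1 - p1 ** k))

L = tables_needed(p1=0.9, k=18, delta=0.1)
print(L)
assert 1 - (1 - 0.9 ** 18) ** L >= 0.9   # collision w.p. >= 90%
```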
…Parameters selection
Query time = candidates extraction + candidates verification; the optimal k balances the two curves (extraction cost falls with k, verification cost rises).
Pros & Cons
Pros:
•Better query time than spatial data structures
•Scales well to higher dimensions and larger data sizes (sub-linear dependence)
•Predictable running time
Cons:
•Extra storage overhead
•Inefficient for data with distances concentrated around the average
•Works best for Hamming distance (although it can be generalized to Euclidean space)
•In secondary storage, a linear scan is pretty much all we can do (for high dim)
•Requires the radius r to be fixed in advance
From Piotr Indyk's slides
Conclusion
•...but in the end, everything depends on your data set
•Try it at home
–Visit http://web.mit.edu/andoni/www/LSH/index.html
–Email Alex Andoni (andoni@mit.edu)
–Test over your own data
(C code, under Red Hat Linux)
LSH - Applications
• Searching video clips in databases ("Hierarchical, Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression - vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short, whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash function construction and parameter tuning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
• Finding sensitive hash functions
Mean Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer
• Tuning LSH parameters
• LSH data structure is used for algorithm speedups
The Problem
Given an image x, what are the parameters θ in this image?
(i.e. angles of joints, orientation of the body, etc.)
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor - edge detector
• Distance metric in feature space: d_x
• Distance metric in angle space:
d_θ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))
Example based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute KNN from the database
• Use these KNNs to compute the average angles of the query
Input query → Find KNN in the database of examples → Output: average angles of the KNN
The algorithm flow
Input Query → Features extraction → Processed query → PSH (LSH) against the database of examples → LWR (Regression) → Output: Match
The image features
Image features are multi-scale edge histograms
(figure residue: example edge maps A and B at several scales)
Feature Extraction PSH LWR
PSH: The basic assumption
There are two metric spaces here: feature space (d_x) and parameter space (d_θ).
We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
Parameter space (angles) vs. feature space (figure: a query q maps between them)
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick:
Estimate the performance of different hash functions on examples, and select those sensitive to d_θ.
The hash functions are applied in feature space, but the KNN are valid in angle space.
Label pairs of examples with similar angles
Define hash functions h on the feature space
Predict the labeling of similar/non-similar examples by using h
Compare the labelings
If the labeling by h is good, accept h; else change h
PSH as a classification problem
Labels (r = 0.25):
A pair of examples (x_i, θ_i), (x_j, θ_j) is labeled
y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
y_ij = −1 if d_θ(θ_i, θ_j) > (1 + ε) r
A binary hash function on the features:
h_T(x) = +1 if x ≥ T, −1 otherwise
Predict the labels:
ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the best T that predicts the true labeling with the probability constraints;
h_T(x) will place both examples in the same bin, or separate them.
Local Weighted Regression (LWR)
• Given a query image, PSH returns KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:
θ̂ = g(x; β̂), with β̂ = argmin_β Σ_{x_i ∈ N(x)} d_θ(g(x_i; β), θ_i)² · K(d_x(x_i, x))
(K is a distance-weighting kernel over the neighborhood N(x))
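A hedged Python sketch of the idea (zeroth-order LWR: a kernel-weighted mean of the neighbors' angles; the paper fits a local regression model, and the kernel width h is an assumption):

```python
import math

# Zeroth-order LWR: average the neighbors' angle vectors, weighting each
# neighbor by a Gaussian kernel of its feature-space distance to the query.
# `h` (the kernel width) is an illustrative assumption, not from the paper.
def lwr_estimate(neighbors, h=1.0):
    # neighbors: list of (feature_distance, angle_vector) pairs from PSH
    weights = [math.exp(-(d * d) / (2 * h * h)) for d, _ in neighbors]
    total = sum(weights)
    dim = len(neighbors[0][1])
    return [sum(w * th[i] for w, (_, th) in zip(weights, neighbors)) / total
            for i in range(dim)]

# Two illustrative neighbors: the closer one should dominate the average.
est = lwr_estimate([(0.0, [10.0, 20.0]), (1.0, [30.0, 40.0])])
print(est)
```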
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
• Selected 137 out of 5123 meaningful features (how?)
• 18-bit hash functions (k = 18), 150 hash tables (L = 150)
• Test on 1000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without feature selection, 40 bits and 1000 hash tables were needed
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, B is the max number of points in a bucket
Results - real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
Results - real data
Interesting mismatches
Fast pose estimation - summary
• A fast way to compute the angles of a human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
• Given n spheres in R^d, centered at P = {p1,…,pn}, with radii r1,…,rn
• Goal: given a query q, preprocess the points in P to find a point pi whose sphere 'covers' the query q, i.e. ||q − pi|| ≤ ri
Courtesy of Mohamad Hegaze
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
• Speedups:
1. Finding optimal LSH parameters
2. Data-driven partitions into buckets
3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
(figure: a window of radius "bandwidth" around a point is shifted toward the local mean)
[Roadmap: Mean-shift | LSH: optimal k,l | LSH: data partition | LSH data structure]
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region:
high density → small bandwidth; low density → large bandwidth
Based on the k-th nearest neighbor of the point: the bandwidth is h_i = ||x_i − x_{i,k}||
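The adaptive-bandwidth rule can be sketched in Python (not from the paper; a 1-D toy data set is used for illustration):

```python
# Adaptive bandwidth: each point's bandwidth h_i is its distance to its
# k-th nearest neighbor, so dense regions get small h and sparse regions
# get large h. Brute-force 1-D version for illustration.
def adaptive_bandwidths(points, k):
    hs = []
    for i, p in enumerate(points):
        dists = sorted(abs(p - q) for j, q in enumerate(points) if j != i)
        hs.append(dists[k - 1])   # distance to the k-th nearest neighbor
    return hs

# Toy data: a tight cluster and one far outlier
pts = [0.0, 0.1, 0.2, 10.0]
hs = adaptive_bandwidths(pts, k=2)
print(hs)  # the outlier gets a much larger bandwidth than the cluster
```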
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths: hs (spatial), hr (color)
3. Apply filtering
3D
Mean-shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Segmentation examples
Mean-shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
Computational curse of dimensionality: expensive range queries → implemented with LSH
Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point we check whether x_{d_k} ≤ v_k; the resulting K-bit vector partitions the data into cells
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets
• Large K → a smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, extra points might be included
The expected number of points in a cell, N̄_C, decreases with K; the number in the union of the L cells, N̄_{C∪}, increases with L.
As L increases, N̄_{C∪} increases but the error decreases.
K determines the resolution of the data structure.
Choosing optimal K and L
Determine accurately the KNN for m randomly-selected data points; let d_k be the distance to the k-th neighbor (the bandwidth).
Choose an error threshold ε; the optimal K and L should satisfy that the approximate distance is within (1 + ε) of d_k.
Choosing optimal K and L
• For each K, estimate the error
• In one run, for all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
(figure: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)] with the minimum marked)
Data driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
(figure: bucket distribution, uniform vs. data-driven cut points)
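The data-driven cut can be sketched in a few lines (not from the paper; the point set is illustrative):

```python
import random

# Data-driven cut: instead of a uniform random threshold over the data
# range, pick a random data point and use one of its coordinates as the
# cut value, so the buckets adapt to the empirical distribution.
def data_driven_cut(points, rng):
    p = rng.choice(points)           # a random data point
    dim = rng.randrange(len(p))      # a random coordinate
    return dim, p[dim]               # the cut: x[dim] <= value

rng = random.Random(0)
pts = [(1.0, 5.0), (2.0, 6.0), (3.0, 7.0)]
dim, val = data_driven_cut(pts, rng)
print(dim, val)  # the cut value is always an actual data coordinate
```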
Additional speedup
Assume that all points in C will converge to the same mode (C is like a type of an aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
Low dimension High dimension
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30 — cookies…
Summary
• LSH trades a little accuracy for a large gain in complexity
• Applications that involve massive data in high dimensions require LSH's fast performance
• Extension of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• ...but in the end, everything depends on your data set
• Try it at home
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni (andoni@mit.edu)
– Test over your own data
(C code, under Red Hat Linux)
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Nearest Neighbor
min pi P dist(qpi)
Closestqq
r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
The simplest solution
bullLion in the desert
Quadtree
Split the first dimension into 2
Repeat iteratively
Stop when each cell has no more than 1 data point
Quadtree - structure
X
Y
X1Y1 PgeX1PgeY1
PltX1PltY1
PgeX1PltY1
PltX1PgeY1
X1Y1
Quadtree - Query
X
Y
In many cases works
X1Y1PltX1PltY1 PltX1
PgeY1
X1Y1
PgeX1PgeY1
PgeX1PltY1
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
As L increases, the union of cells C∪ increases but their intersection C∩ decreases;
C∩ determines the resolution of the data structure.
Choosing optimal K and L
Determine accurately the KNN (and the corresponding bandwidth distance) for m randomly-selected data points.
Choose an error threshold ε.
The optimal K and L should satisfy: the approximate distance is within a factor (1+ε) of the true distance.
Choosing optimal K and L
• For each K, estimate the error for every L.
• In one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)).
[Figure: approximation error for (K, L); L(K) for ε=0.05; running time t[K, L(K)] with the minimum marked]
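The selection procedure above can be sketched as a small loop; here `error(K, L)` and `time(K, L)` are hypothetical stand-ins for measurements taken on the m sampled points with known exact KNN, not an API from the paper:

```python
def choose_k_l(k_values, l_values, error, time, eps=0.05):
    """For each K, find the minimal L whose approximation error is within
    eps; among those (K, L(K)) pairs keep the one with the smallest
    measured running time."""
    best = None
    for K in k_values:
        L_K = next((L for L in sorted(l_values) if error(K, L) <= eps), None)
        if L_K is None:
            continue  # no L meets the error constraint for this K
        t = time(K, L_K)
        if best is None or t < best[2]:
            best = (K, L_K, t)
    return best

# Toy stand-ins: error falls as K*L grows, time grows as K*L.
best = choose_k_l(
    k_values=[2, 4, 8],
    l_values=[1, 2, 4, 8],
    error=lambda K, L: 1.0 / (K * L),
    time=lambda K, L: K * L,
)
```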
Data driven partitions
• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
[Figure: points-per-bucket distribution, uniform vs. data-driven cuts]
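A minimal sketch of the suggestion above; the function name and seeding are illustrative:

```python
import random

def data_driven_cut(points, seed=None):
    """Pick the cut from the data itself: a random coordinate of a randomly
    chosen data point, instead of a uniform value over the data range.
    Cuts then fall where the points actually are, which balances the
    points-per-bucket distribution."""
    rng = random.Random(seed)
    p = rng.choice(points)       # a random data point
    d = rng.randrange(len(p))    # one of its coordinates
    return d, p[d]

pts = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
d, v = data_driven_cut(pts, seed=1)
```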
Additional speedup
Assume that all points in C∩ will converge to the same mode
(C∩ is like a type of an aggregate).
Speedup results
65,536 points; 1,638 points sampled; k=100
Food for thought
[Figure: space partitions in low dimension vs. high dimension]
A thought for food…
• Choose K, L by sample learning, or take the traditional values.
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.
15:30 cookies…
Summary
• LSH offers a compromise: give up some accuracy for a large gain in complexity.
• Applications that involve massive data in high dimension require the fast performance of LSH.
• LSH extends to different spaces (PSH).
• The LSH parameters and hash functions can be learned for different applications.
Conclusion
• But at the end, everything depends on your data set.
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni: andoni@mit.edu
– Test over your own data
(C code, under Red Hat Linux)
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naïve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree – Pitfall 1
- Slide 16
- Quadtree – Pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection …
- … Parameters selection
- Pros & Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x, what are the parameters θ in this image, i.e. angles of joints, orientation of the body, etc.
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results – real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for food…
- Summary
- Slide 110
- Thanks
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
The simplest solution
bullLion in the desert
Quadtree
Split the first dimension into 2
Repeat iteratively
Stop when each cell has no more than 1 data point
Quadtree - structure
X
Y
X1Y1 PgeX1PgeY1
PltX1PltY1
PgeX1PltY1
PltX1PgeY1
X1Y1
Quadtree - Query
X
Y
In many cases works
X1Y1PltX1PltY1 PltX1
PgeY1
X1Y1
PgeX1PgeY1
PgeX1PltY1
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash-function construction and parameter tuning
Outline
• "Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola and T. Darrell
– Finding sensitive hash functions
• "Mean Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni and P. Meer
– Tuning LSH parameters
– The LSH data structure is used for algorithm speedups
Fast Pose Estimation with Parameter Sensitive Hashing
G. Shakhnarovich, P. Viola and T. Darrell
The Problem: given an image x, what are the parameters θ in this image, i.e., the angles of the joints, the orientation of the body, etc.?
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: d_x
• Distance metric in angle space: d_θ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))
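The angle-space metric above is a one-liner. A sketch, with angle vectors as plain Python lists:

```python
import math

def d_theta(theta1, theta2):
    """Angle-space distance: sum_i (1 - cos(theta1_i - theta2_i)).
    Zero for identical poses; each flipped joint contributes up to 2."""
    return sum(1.0 - math.cos(a - b) for a, b in zip(theta1, theta2))

print(d_theta([0.0, math.pi], [0.0, 0.0]))  # -> 2.0 (one joint flipped)
```

Unlike a plain Euclidean distance on angles, this metric is insensitive to the 2π wrap-around.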
Example-based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
Input: query → find KNN in the database of examples → output: average angles of the KNN
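The steps above can be sketched in a few lines. A toy implementation, assuming the database is a list of (features, angles) pairs and `dist` is any feature-space metric; the function name is illustrative, not the paper's:

```python
def estimate_pose(query_feat, examples, dist, k=3):
    """Example-based learning: find the k examples whose features are
    nearest to the query, then average their known angle vectors.
    `examples` = [(feature_vector, angle_vector), ...]."""
    nearest = sorted(examples, key=lambda e: dist(query_feat, e[0]))[:k]
    m = len(nearest[0][1])
    return [sum(ang[i] for _, ang in nearest) / k for i in range(m)]

# toy run: the two nearest examples have angles [1.0] and [2.0]
examples = [([0, 0], [1.0]), ([0, 1], [2.0]), ([9, 9], [8.0])]
l1 = lambda u, v: sum(abs(a - b) for a, b in zip(u, v))
print(estimate_pose([0, 0], examples, l1, k=2))  # -> [1.5]
```

The full paper replaces the exhaustive `sorted` scan with PSH and the plain average with locally weighted regression.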
The algorithm flow:
Input query → features extraction → processed query → PSH (LSH) over the database of examples → LWR (regression) → output: match
The image features
[Figure: edge histograms over image sub-windows at several scales.]
Image features are multi-scale edge histograms.
Feature Extraction → PSH → LWR
PSH: The basic assumption
There are two metric spaces here: feature space (with metric d_x) and parameter space (with metric d_θ).
We want similarity to be measured in angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: the query q and its neighbors shown in both parameter (angle) space and feature space.]
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ.
The hash functions are applied in feature space, but the KNN are valid in angle space.
• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h
PSH as a classification problem
A pair of examples (x_i, θ_i), (x_j, θ_j) is labeled:
y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
y_ij = −1 if d_θ(θ_i, θ_j) > (1 + ε) r
[Figure: example pairs labeled +1, +1, −1, −1 (r = 0.25).]
A binary hash function on features:
h_T(x) = +1 if x ≥ T, −1 otherwise
Predict the labels:
ŷ_ij^h = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the best T(x) that predicts the true labeling subject to the probability constraints: h_T will place both examples of a pair in the same bin, or separate them.
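The selection loop can be made concrete with single-feature threshold stumps. A sketch under that reading of the slide; all names below are illustrative, not the paper's:

```python
def pair_label(d, r, eps):
    """y_ij: +1 if the angle distance is below r, -1 if clearly above
    (1 + eps) * r, None for the ignored in-between zone."""
    if d <= r:
        return 1
    if d > (1 + eps) * r:
        return -1
    return None

def stump(i, T):
    """Binary hash on one feature: h_T(x) = +1 if x[i] >= T else -1."""
    return lambda x: 1 if x[i] >= T else -1

def accuracy(h, labeled_pairs):
    """Share of pairs where y_hat = (+1 iff h(x_i) == h(x_j)) matches y."""
    hits = sum((1 if h(xi) == h(xj) else -1) == y
               for xi, xj, y in labeled_pairs)
    return hits / len(labeled_pairs)

h = stump(0, 0.5)
pairs = [([0.9], [0.8], 1), ([0.1], [0.9], -1), ([0.2], [0.3], 1)]
print(accuracy(h, pairs))  # -> 1.0
```

Hash functions whose accuracy on the labeled pairs clears the probability constraints are accepted; the rest are changed.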
Local Weighted Regression (LWR)
• Given a query image, PSH returns KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:
θ_0 = argmin_θ Σ_{x_i ∈ N(x)} d_θ(g(x_i), θ_i) · K(d_x(x_i, x)), where K is a distance-based weighting kernel
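In its simplest (zeroth-order) form, the locally weighted step is just a kernel-weighted average of the neighbors' angles. A sketch of that simplification, not the paper's full argmin solver:

```python
import math

def lwr_angles(query_feat, knn, dx, h=1.0):
    """Weighted average of the KNN's angle vectors, with Gaussian
    weights on the feature-space distance dx; `knn` is a list of
    (feature_vector, angle_vector) pairs returned by PSH."""
    w = [math.exp(-(dx(query_feat, f) / h) ** 2) for f, _ in knn]
    total = sum(w)
    m = len(knn[0][1])
    return [sum(wi * ang[i] for wi, (_, ang) in zip(w, knn)) / total
            for i in range(m)]

# two equally distant neighbors -> plain average of their angles
knn = [([0.0], [2.0]), ([0.0], [4.0])]
print(lwr_angles([0.0], knn, lambda u, v: abs(u[0] - v[0])))  # -> [3.0]
```

Neighbors that are closer in feature space dominate the estimate; far neighbors decay exponentially.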
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, facial expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (L)
• Tested on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without feature selection, 40 bits and 1,000 hash tables would have been needed
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, and B is the maximum number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
Results – real data
Interesting mismatches
Fast pose estimation - summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter are problematic
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in R^d, centered at P = p1,…,pn, with radii r1,…,rn
• Goal: given a query q, preprocess the points in P to find a point p_i whose sphere 'covers' the query q
Courtesy of Mohamad Hegaze
Motivation
• Clustering high-dimensional data by using local density measurements (e.g., in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
1. Finding optimal LSH parameters
2. Data-driven partitions into buckets
3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell (bandwidth)
[Progress bar: Mean-shift | LSH: optimal k,l | LSH: data partition | LSH data struct]
KNN in mean-shift
• The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth
• Based on the k-th nearest neighbor of the point: the bandwidth is h_i = ||x_i − x_{i,k}||, the distance from x_i to its k-th nearest neighbor
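The per-point bandwidth rule is easy to state in code. A brute-force sketch (the paper replaces this exhaustive scan with LSH; the function name is illustrative):

```python
def adaptive_bandwidths(points, k, dist):
    """h_i = distance from point i to its k-th nearest neighbor:
    dense regions get small bandwidths, sparse regions large ones."""
    hs = []
    for i, p in enumerate(points):
        ds = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        hs.append(ds[k - 1])
    return hs

pts = [0, 1, 2, 50]  # three clustered points and one outlier
print(adaptive_bandwidths(pts, 1, lambda a, b: abs(a - b)))  # -> [1, 1, 1, 48]
```

The isolated point gets a bandwidth of 48 while the clustered points get 1, which is exactly the adaptive behavior the slide describes.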
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Filtering: each pixel takes the value of its nearest mode.
[Figures: original, filtered, and segmented images; mean-shift trajectories.]
Filtering examples
[Figures: squirrel and baboon images, original vs. filtered.]
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Segmentation examples
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH
• Statistical curse of dimensionality: sparseness of the data, handled with variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k): a coordinate index and a cut value
• For each point x, check whether x_{d_k} ≤ v_k for k = 1,…,K; the resulting K bits partition the data into cells
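The cell assignment is just a K-bit key. A sketch under that reading of the slide; the helper names are illustrative:

```python
import random

def random_partition(dim, K, lo, hi, rng=random):
    """One partition: K (d_k, v_k) pairs - coordinate index + cut value."""
    return [(rng.randrange(dim), rng.uniform(lo, hi)) for _ in range(K)]

def cell_key(x, partition):
    """K-bit key: bit k is 1 iff x[d_k] <= v_k; equal keys = same cell."""
    return tuple(1 if x[d] <= v else 0 for d, v in partition)

part = [(0, 0.5), (1, 0.5)]           # a fixed partition for illustration
print(cell_key([0.2, 0.9], part))     # -> (1, 0)
print(cell_key([0.3, 0.7], part))     # -> (1, 0)  same cell
```

In the full scheme, L independent partitions are drawn and a query is compared only against points sharing a cell key in at least one of them.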
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets
• Large K ⇒ a smaller number of points in each cell C
• If L is too small, points might be missed; but if L is too big, the union of cells ∪C_l might include extra points
• As L increases, the union ∪C_l increases but the intersection ∩C_l decreases; ∩C_l determines the resolution of the data structure
Choosing optimal K and L
• Determine accurately the KNN (and hence the true distance/bandwidth) for m randomly-selected data points
• Choose an error threshold ε
• The optimal K and L should keep the approximate distance within the error threshold of the true one
Choosing optimal K and L
• For each K, estimate the error for each L
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
[Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)], with its minimum marked.]
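The tuning procedure can be written down directly. A sketch where `error` and `time` stand for measurements on the m sampled points (hypothetical callables, not part of the paper's code):

```python
def choose_k_l(ks, ls, error, time, eps):
    """For each K, take the minimal L whose measured approximation
    error is within eps, then keep the (K, L(K)) pair with the
    smallest measured running time."""
    best = None
    for K in ks:
        for L in ls:                      # ls assumed sorted ascending
            if error(K, L) <= eps:
                t = time(K, L)
                if best is None or t < best[0]:
                    best = (t, K, L)
                break                     # minimal L for this K found
    return None if best is None else best[1:]

# synthetic stand-ins: error falls as K*L grows, time grows as K*L
print(choose_k_l([2, 4], range(1, 21),
                 lambda K, L: 1 / (K * L),
                 lambda K, L: K * L, 0.05))  # -> (2, 10)
```

In practice `error` and `time` would be measured empirically on the sample, exactly as the slide's plots show.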
Data-driven partitions
• In the original LSH, cut values are chosen at random over the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[Histogram: points-per-bucket distribution, uniform vs. data-driven cut values.]
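The suggested data-driven variant is a one-line change to how cut values are drawn. A sketch (illustrative helper name):

```python
import random

def data_driven_partition(points, K, rng=random):
    """Pick each cut value from the data itself: choose a random point
    and use one of its coordinates, so buckets follow the density."""
    dim = len(points[0])
    return [(d, rng.choice(points)[d])
            for d in (rng.randrange(dim) for _ in range(K))]

pts = [[1.0, 2.0], [3.0, 4.0]]
part = data_driven_partition(pts, 5, random.Random(0))
# every cut value is an actual data coordinate
print(all(v in [p[d] for p in pts] for d, v in part))  # -> True
```

Because cuts land where points are, dense regions are split more finely, which flattens the points-per-bucket distribution.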
Additional speedup
Assume that all points in the intersection cell C∩ will converge to the same mode (C∩ is like a type of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
[Figure: low dimension vs. high dimension.]
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning itself requires KNN
15:30: cookies…
Summary
• LSH trades some accuracy for a large gain in complexity
• Applications that involve massive data in high dimensions require LSH's fast performance
• The LSH framework extends to different spaces (PSH)
• The LSH parameters and hash functions can be learned for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Quadtree
Split the first dimension into 2
Repeat iteratively
Stop when each cell has no more than 1 data point
Quadtree - structure
[Figure: the X-Y plane is split at (X1, Y1); the root stores (X1, Y1) and has four children for the quadrants P≥X1,P≥Y1 / P<X1,P<Y1 / P≥X1,P<Y1 / P<X1,P≥Y1]
Quadtree - Query
In many cases it works
[Figure: the query descends the tree, at each node entering the quadrant that contains it, e.g. P<X1,P≥Y1]
Quadtree – Pitfall 1
In some cases it doesn't
[Figure: the query falls in one quadrant, e.g. P≥X1,P<Y1, while its true nearest neighbor lies across the split in P<X1,P<Y1, so neighboring cells must also be searched]
Quadtree – Pitfall 1
In some cases nothing works
[Figure: the query is nearly equidistant from the splitting lines, so many cells must be visited]
Quadtree – Pitfall 2
Each cell has O(2^d) neighboring cells
Could result in query time exponential in the dimension
Space-partition based algorithms
"Multidimensional access methods", Volker Gaede, O. Günther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
• Query time or space: O(n^d)
• For d>10..20: worse than a sequential scan
  – For most geometric distributions
• Techniques specific to high dimensions are needed
• Proved in theory and in practice by Barkol & Rabani 2000 and Beame & Vee 2002
• Naive: O(min(n·d, n^d))
Curse of dimensionality: Some intuition
[Figure: the number of cells grows as 2, 2^2, 2^3, …, 2^d as dimensions are added]
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
• General solution – Locality sensitive hashing
• Implementation for Hamming space
• Generalization to l1 & l2
Hash function
[Diagram: Data_Item → Key → Bin/Bucket]
Hash function
X modulo 3
X = a number in the range 0..n
Key in the range 0..2 → storage address in the data structure
Usually we would like related data items to be stored in the same bin
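The modulo hash above can be sketched in a few lines; the values and modulus below are illustrative. The point of the example is that an ordinary hash scatters nearby items, which is exactly what locality-sensitive hashing will fix:

```python
# A minimal sketch of the bucket idea from the slide: hash X to X mod 3.
# Note: this toy hash is NOT locality sensitive -- 2 and 3 are close but
# land in different buckets, while 2 and 5 are farther apart yet collide.
def bucket(x, m=3):
    """Map a number in 0..n to one of m bins."""
    return x % m

table = {}
for x in [0, 1, 2, 3, 4, 5]:
    table.setdefault(bucket(x), []).append(x)

# bucket 2 holds [2, 5]: related (close) items are not kept together
```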
Recall: r - Nearest Neighbor
dist(q,p1) ≤ r
dist(q,p2) > (1 + ε) r,  r2 = (1 + ε) r1
Locality sensitive hashing
A hash family is (r1, r2, p1, p2)-sensitive if:
≡ Pr[I(p)=I(q)] is "high" (≥ p1) if p is "close" to q (dist ≤ r1)
≡ Pr[I(p)=I(q)] is "low" (≤ p2) if p is "far" from q (dist ≥ r2 = (1 + ε) r1)
Preview
• General solution – Locality sensitive hashing
• Implementation for Hamming space
• Generalization to l1 & l2
Hamming Space
• Hamming space = the 2^N binary strings of length N
• Hamming distance = the number of changed digits
a.k.a. Signal distance (Richard Hamming)
Example (N = 12):
010100001111
010010000011   Distance = 4
• Hamming distance = SUM(X1 XOR X2)
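The slide's formula, distance = SUM(X1 XOR X2), is a one-liner on bit strings; a small sketch using the slide's own example:

```python
# Hamming distance between two equal-length bit strings:
# XOR each pair of digits and sum the mismatches.
def hamming(x1, x2):
    """Number of positions at which the two bit strings differ."""
    assert len(x1) == len(x2)
    return sum(int(a) ^ int(b) for a, b in zip(x1, x2))

d = hamming("010100001111", "010010000011")  # the slide's example: 4
```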
L1 to Hamming Space Embedding
p = (8, 2), C = 11 (the largest coordinate value)
Each coordinate is written in unary with C digits:
8 → 11111111000
2 → 11000000000
so p → 1111111100011000000000
d' = C·d
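The unary embedding can be sketched directly; the point pairs below are illustrative. Each coordinate v in 0..C becomes C bits (v ones then C−v zeros), so d dimensions become d' = C·d bits, and Hamming distance on the embeddings equals L1 distance on the originals:

```python
# Unary (thermometer) embedding of integer coordinates into Hamming space.
def embed(p, C):
    return "".join("1" * v + "0" * (C - v) for v in p)

def hamming(x1, x2):
    return sum(int(a) ^ int(b) for a, b in zip(x1, x2))

e = embed((8, 2), C=11)   # '1111111100011000000000', as on the slide
# Hamming distance of embeddings = L1 distance: |8-5| + |2-4| = 5
d = hamming(embed((8, 2), 11), embed((5, 4), 11))
```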
Hash function
L hash functions: for p ∈ H^d',
Gj(p) = p|Ij — the bits of p sampled at a random index set Ij
j = 1..L, here k = 3 digits
Store p into bucket p|Ij, one of the 2^k buckets, e.g. 101
[Figure: embedded points 11000000000, 111111110000, 111000000000, 111111110001 are hashed by their sampled bits]
Construction
Insert each point p into its bucket in each of the tables 1, 2, …, L
Query
Look up q's bucket in each of the tables 1, 2, …, L and check the points found there
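The construction and query steps above can be sketched as follows; the bit strings, k, L, and the seed are illustrative, not from the talk:

```python
import random

# Bit-sampling LSH for Hamming space: L tables, each sampling k random
# bit positions I_j; G_j(p) = p | I_j is the bucket key.
def build(points, k, L, seed=0):
    rng = random.Random(seed)
    n_bits = len(points[0])
    tables = []
    for _ in range(L):
        I = sorted(rng.sample(range(n_bits), k))    # random index set I_j
        table = {}
        for p in points:
            key = "".join(p[i] for i in I)          # G_j(p) = p | I_j
            table.setdefault(key, []).append(p)
        tables.append((I, table))
    return tables

def query(tables, q):
    # Union of the buckets q falls into: the candidate near neighbors.
    candidates = set()
    for I, table in tables:
        key = "".join(q[i] for i in I)
        candidates.update(table.get(key, []))
    return candidates

points = ["11000000000", "11100000000", "00000001111"]
tables = build(points, k=3, L=5)
cands = query(tables, "11000000000")   # the point itself is always a candidate
```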
Alternative intuition: random projections
p = (8, 2), C = 11, embedded as 1111111100011000000000, d' = C·d
Sampling k bits of the embedded string is equivalent to projecting the point onto k random axis-parallel thresholds.
[Figure: the k = 3 sampled bits map each point to one of the 2^3 buckets 000, 100, 110, 001, 101, 111, …; here p falls into bucket 101]
k samplings
Repeating L times
Secondary hashing
Supports volume tuning: dataset size vs. storage volume
[Figure: the 2^k buckets (e.g. 011, each of size B) are mapped by a simple hash into M buckets, with M·B = αn, α = 2]
The above hashing is locality-sensitive
• Probability(p, q in the same bucket) = (1 − Distance(q,p)/d')^k
[Figure: probability Pr vs. Distance(q,p) for k=1 and k=2 — larger k makes the drop sharper]
Adopted from Piotr Indyk's slides
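The collision probability above is easy to check numerically; the distances and dimension below are illustrative. Raising k sharpens the separation between close and far pairs:

```python
# Collision probability of bit-sampling LSH: all k sampled bits must agree,
# each agreeing with probability (1 - dist/d').
def collision_prob(dist, d_prime, k):
    return (1.0 - dist / d_prime) ** k

p1 = collision_prob(2, 22, 1)   # close pair, k=1
p2 = collision_prob(2, 22, 2)   # close pair, k=2: a bit lower, but...
f1 = collision_prob(11, 22, 1)  # far pair, k=1
f2 = collision_prob(11, 22, 2)  # far pair, k=2: drops much faster
```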
Preview
• General solution – Locality sensitive hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• New hashing function
• Still based on sampling
• Using a mathematical trick
• P-stable distribution for Lp distance; Gaussian distribution for L2 distance
Central limit theorem
[Figure: v1·(Gaussian) + v2·(Gaussian) + … + vn·(Gaussian) = Gaussian]
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1, …, vn = real numbers
X1, …, Xn = independent, identically distributed (i.i.d.)
v1·X1 + v2·X2 + … + vn·Xn
Central limit theorem
Σᵢ vᵢXᵢ ~ ||v||₂ · X, with X drawn from the same distribution
Dot product ⇒ norm
Norm ⇒ Distance
Σᵢ uᵢXᵢ − Σᵢ vᵢXᵢ = Σᵢ (uᵢ − vᵢ)Xᵢ ~ ||u − v||₂ · X
Features vector 1, features vector 2 ⇒ distance
Norm ⇒ Distance
The difference of the two dot products with the same random vector is a Gaussian scaled by the L2 distance between the feature vectors.
The full Hashing
h_{a,b}(v) = ⌊(a·v + b) / w⌋
• v — the features vector, e.g. [34 82 21 …], of dimension d
• a — d random numbers, i.i.d. from a p-stable distribution
• b — a random phase in [0, w]
• w — the discretization step
Example: a·v + b = 7910 + 34 = 7944; with w = 100 this falls in the cell [7900, 8000), i.e. bucket 79
[Figure: the real line discretized into cells of width w = 100: 7800, 7900, 8000, 8100, 8200]
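The hash h_{a,b}(v) = ⌊(a·v + b)/w⌋ can be sketched with Gaussian (2-stable) entries in a; the dimension, w, seed, and vectors below are illustrative:

```python
import math
import random

# p-stable LSH hash for L2: project onto a random Gaussian vector a,
# add a random phase b, and discretize into cells of width w.
def make_hash(d, w, seed=0):
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]  # i.i.d. from a 2-stable law
    b = rng.uniform(0.0, w)                      # random phase in [0, w)
    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
    return h

h = make_hash(d=3, w=4.0)
# nearby vectors usually (not always) land in the same cell:
same = h([34.0, 82.0, 21.0]) == h([34.1, 82.0, 21.1])
```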
Generalization: P-Stable distribution
• L2:
  • Central Limit Theorem
  • Gaussian (normal) distribution
• Lp, 0 < p ≤ 2:
  • Generalized Central Limit Theorem
  • p-stable distribution (Cauchy for L1, Gaussian for L2)
P-Stable summary
• Generalizes to 0 < p ≤ 2
• Improves query time for r - Nearest Neighbor:
  Query time = O(d·n^(1/(1+ε)) · log n)  →  O(d·n^(1/(1+ε)²) · log n)
Latest results reported in an e-mail by Alexander Andoni
Parameters selection
• 90% probability ⇒ best query time performance
For Euclidean space
Parameters selection …
For Euclidean space:
• A single projection hits an r - Nearest Neighbor with Pr = p1
• k projections hit an r - Nearest Neighbor with Pr = p1^k
• All L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure collision (e.g. with probability 1 − δ ≥ 90%):
  1 − (1 − p1^k)^L ≥ 1 − δ  ⇒  L ≥ log(δ) / log(1 − p1^k)
Reject non-neighbors / accept neighbors
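The bound on L above is a one-liner; the values of p1, k, and δ below are illustrative, not from the talk:

```python
import math

# Smallest number of hash tables L so that a true neighbor collides in at
# least one table with probability >= 1 - delta: L >= log(delta)/log(1 - p1**k).
def required_L(p1, k, delta):
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

L = required_L(p1=0.9, k=18, delta=0.1)    # → 15 tables
success = 1.0 - (1.0 - 0.9 ** 18) ** L     # >= 0.9 by construction
```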
… Parameters selection
[Figure: running time vs. k — candidates-verification time falls with k while candidates-extraction time grows; the optimal k balances the two]
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data size (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dim.)
• Requires the radius r to be fixed in advance
From Piotr Indyk's slides
Conclusion
• … but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data
  (C code, under Red Hat Linux)
LSH - Applications
• Searching video clips in databases ("Hierarchical, Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short, whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash function construction and parameter tuning
Outline
"Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola and T. Darrell
• Finding sensitive hash functions
"Mean Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni and P. Meer
• Tuning LSH parameters
• LSH data structure is used for algorithm speedups
The Problem
Given an image x, what are the parameters θ in this image, i.e. angles of joints, orientation of the body, etc.?
Fast Pose Estimation with Parameter Sensitive Hashing
G. Shakhnarovich, P. Viola and T. Darrell
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: d_x
• Distance metric in angles (parameter) space:
  d_θ(θ1, θ2) = Σᵢ₌₁ᵐ (1 − cos(θ1,i − θ2,i))
Example based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute KNN from the database
• Use these KNNs to compute the average angles of the query
Input query → find KNN in the database of examples → output: average angles of KNN
The algorithm flow
Input Query → Features extraction → Processed query → PSH (LSH) over the database of examples → LWR (Regression) → Output: Match
The image features
Image features are multi-scale edge histograms
[Figure: edge-direction histograms computed over image sub-windows A, B at several scales]
Feature Extraction → PSH → LWR
PSH: The basic assumption
There are two metric spaces here: feature space (d_x) and parameter space (d_θ).
We want similarity to be measured in the angles space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space
Feature Extraction → PSH → LWR
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
Feature Extraction → PSH → LWR
[Figure: nearby points in parameter space (angles) correspond to nearby points in feature space; the query q finds its neighbors through the feature space]
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick:
Estimate the performance of different hash functions on examples and select those sensitive to d_θ:
the hash functions are applied in feature space, but the KNN are valid in angle space.
Feature Extraction → PSH → LWR
Label pairs of examples with similar angles
Define hash functions h on the feature space
Predict the labeling of similar/non-similar examples by using h
Compare the labelings
If the labeling by h is good, accept h; else change h
PSH as a classification problem
[Figure: example pairs labeled +1, +1, -1, -1 (r = 0.25)]
Labels: a pair of examples (x_i, x_j) is labeled
  y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
  y_ij = −1 if d_θ(θ_i, θ_j) > (1 + ε) r
Feature Extraction → PSH → LWR
A binary hash function on the features:
  h_T(x) = +1 if x ≥ T, −1 otherwise
Predict the labels:
  ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Feature Extraction → PSH → LWR
[Figure: the distribution of a single feature, split by a threshold T(x)]
Feature Extraction → PSH → LWR
Find the best T that predicts the true labeling within the probability constraints:
h_T will place both examples in the same bin, or separate them.
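Selecting the threshold can be sketched as a tiny classification problem; the feature values, pair labels, and candidate thresholds below are synthetic, not from the paper:

```python
# Score each candidate threshold T by how often the induced hash h_T
# agrees with the pair labels y_ij (+1 = similar angles, -1 = dissimilar).
def h(x, T):
    return 1 if x >= T else -1

def pair_accuracy(T, pairs):
    """pairs: list of ((xi, xj), y)."""
    ok = sum(1 for (xi, xj), y in pairs
             if (1 if h(xi, T) == h(xj, T) else -1) == y)
    return ok / len(pairs)

pairs = [((0.1, 0.2), +1), ((0.8, 0.9), +1),
         ((0.1, 0.9), -1), ((0.2, 0.8), -1)]
best_T = max([0.0, 0.5, 1.0], key=lambda T: pair_accuracy(T, pairs))
```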
Local Weighted Regression (LWR)
• Given a query image, PSH returns KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:
  θ̂ = argmin_θ Σ_{x_i ∈ N(x)} d_θ(g(x_i), θ) · K(d_x(x_i, x)),
  with K a distance-weighting kernel and N(x) the retrieved neighbors
Feature Extraction → PSH → LWR
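The weighted-average step can be sketched as follows; the Gaussian kernel, bandwidth, and neighbor values are illustrative assumptions, not the paper's exact choices:

```python
import math

# Kernel-weighted average of the neighbors' known angles, with weights
# decaying in the feature-space distance to the query.
def weighted_angles(neighbors, bandwidth=1.0):
    """neighbors: list of (feature_distance, angle). Returns the weighted mean."""
    ws = [math.exp(-(d / bandwidth) ** 2) for d, _ in neighbors]
    return sum(w * a for w, (_, a) in zip(ws, neighbors)) / sum(ws)

theta = weighted_angles([(0.0, 10.0), (0.0, 20.0)])   # equal weights → 15.0
theta2 = weighted_angles([(0.0, 10.0), (10.0, 20.0)]) # near neighbor dominates
```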
Results
Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without selection, 40 bits and 1,000 hash tables would have been needed
Recall: P1 is the prob. of a positive hash, P2 the prob. of a bad hash, B the max number of points in a bucket
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
Results – real data
Interesting mismatches
Fast pose estimation - summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in R^d centered at P = {p1, …, pn}, with radii r1, …, rn
• Goal: given a query q, preprocess the points in P to find a point pi whose sphere 'covers' the query q: ||q − pi|| ≤ ri
Courtesy of Mohamad Hegaze
Motivation
• Clustering high dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[Figure: a point is shifted toward the local mean of the points inside its bandwidth window]
[Progress bar: Mean-shift → LSH → optimal k,l → LSH data partition → LSH data struct]
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region:
high density → small bandwidth; low density → large bandwidth
Based on the kth nearest neighbor of the point, the bandwidth is h_i = ||x_i − x_{i,k}||
Adaptive mean-shift vs. non-adaptive
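The adaptive bandwidth rule can be sketched by brute force; the 1-D points and k below are illustrative:

```python
# Per-point bandwidth h_i = distance from x_i to its k-th nearest neighbor.
# Dense regions get small bandwidths, sparse regions large ones.
def kth_nn_bandwidth(points, k):
    hs = []
    for i, x in enumerate(points):
        dists = sorted(abs(x - y) for j, y in enumerate(points) if j != i)
        hs.append(dists[k - 1])
    return hs

# the dense cluster 0,1,2 gets small bandwidths; the outlier 10 a large one
hs = kth_nn_bandwidth([0.0, 1.0, 2.0, 10.0], k=2)  # → [2.0, 1.0, 2.0, 9.0]
```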
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths: hs (spatial) and hr (color)
3. Apply filtering
[Figure, 3D: "Mean-shift: A Robust Approach Towards Feature Space Analysis", D. Comaniciu et al., TPAMI '02]
Image segmentation algorithm
[Figure: original, filtered, and segmented images; mean-shift trajectories]
Filtering: pixel value of the nearest mode
Filtering examples
[Figure: squirrel — original vs. filtered; baboon — original vs. filtered]
"Mean-shift: A Robust Approach Towards Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Segmentation examples
"Mean-shift: A Robust Approach Towards Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point x, check whether x_{d_k} ≤ v_k for each of the K pairs; the K boolean outcomes determine its cell
• It partitions the data into cells
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets
• If L is too small, points might be missed; but if L is too big, extra points might be included
• Large K → a smaller number of points in a cell
[Figure: the union of cells C̄ and their intersection C̄∩; the expected number of points in a cell depends on n, K, L and d]
• As L increases, the union C̄ increases but the intersection C̄∩ decreases
• K determines the resolution of the data structure
Choosing optimal K and L
Determine accurately the KNN for m randomly-selected data points, and note their distance (bandwidth).
Choose an error threshold ε. The optimal K and L should satisfy that the approximate distance is within the threshold of the true one.
Choosing optimal K and L
• For each K, estimate the error
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K)) → minimum
[Figure: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)]]
Data driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[Figure: bucket distribution — uniform cut points vs. data-driven cut points]
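The data-driven cut suggestion can be sketched in a few lines; the points, K, and seed below are illustrative:

```python
import random

# Instead of a uniform random cut value, take a coordinate of a randomly
# chosen data point, so cuts (and hence buckets) follow the data density.
def data_driven_cuts(points, K, seed=0):
    rng = random.Random(seed)
    d = len(points[0])
    cuts = []
    for _ in range(K):
        p = rng.choice(points)       # a random data point
        dim = rng.randrange(d)       # a random coordinate
        cuts.append((dim, p[dim]))   # one (d_k, v_k) pair
    return cuts

points = [(0.0, 1.0), (0.2, 0.9), (5.0, 5.0)]
cuts = data_driven_cuts(points, K=4)
# every cut value is an actual data coordinate
```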
Additional speedup
Assume that all points in the intersection cell C̄∩ will converge to the same mode (C̄∩ is like a type of an aggregate), so the mean-shift iterations can be run once per cell instead of once per point.
Speedup results
[Table: 65,536 points; 1,638 points sampled; k = 100]
Food for thought
[Figure: low dimension vs. high dimension]
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• A thought for food: does it help to know the data dimensionality or the data manifold?
• Intuitively, dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30 cookies…
Summary
• LSH suggests a compromise on accuracy for the gain of complexity
• Applications that involve massive data in high dimension require the LSH fast performance
• Extension of the LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• … but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data
  (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Quadtree - structure
X
Y
X1Y1 PgeX1PgeY1
PltX1PltY1
PgeX1PltY1
PltX1PgeY1
X1Y1
Quadtree - Query
X
Y
In many cases works
X1Y1PltX1PltY1 PltX1
PgeY1
X1Y1
PgeX1PgeY1
PgeX1PltY1
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshell
[Figure: a point and its bandwidth window]
[Roadmap: Mean-shift → LSH: optimal k,l → LSH: data partition → LSH: data struct]
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region:
high density – small bandwidth; low density – large bandwidth.
The bandwidth of a point is based on its kth nearest neighbor: the distance to that neighbor.
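A minimal sketch of this adaptive-bandwidth rule (my own illustration, not the paper's code; the `adaptive_bandwidths` helper and the brute-force distance computation are assumptions — in the paper, exactly this KNN step is what LSH replaces):

```python
import numpy as np

def adaptive_bandwidths(points, k):
    """Per-point bandwidth = distance to the k-th nearest neighbor.

    Brute-force O(n^2) pairwise distances; LSH replaces this step at scale.
    """
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(-1))   # pairwise Euclidean distances
    dists.sort(axis=1)                      # column 0 is the point itself (0.0)
    return dists[:, k]                      # distance to the k-th neighbor

rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.1, size=(50, 2))   # tight cluster near the origin
sparse = rng.normal(5.0, 2.0, size=(50, 2))  # spread-out cluster
h = adaptive_bandwidths(np.vstack([dense, sparse]), k=5)
# dense-region points get small bandwidths, sparse-region points large ones
```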
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths hs (spatial) and hr (color)
3. Apply filtering
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
original segmented
filtered
Filtering: pixel value of the nearest mode
Mean-shift trajectories
original squirrel filtered
original baboon filtered
Filtering examples
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Segmentation examples
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions. Each partition includes K pairs (dk, vk).
• For each point x we check whether x_dk ≤ vk, k = 1…K; the resulting K-bit pattern partitions the data into cells.
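The partition scheme above can be sketched as follows (a toy illustration under my own naming — `build_partitions`, `cell_key` and the random-in-range cut values are assumptions, not the authors' code):

```python
import numpy as np

def build_partitions(data, K, L, rng):
    """L random partitions; each holds K (coordinate, cut-value) pairs."""
    d = data.shape[1]
    parts = []
    lo, hi = data.min(0), data.max(0)
    for _ in range(L):
        dims = rng.integers(0, d, size=K)          # which coordinate to test
        cuts = rng.uniform(lo[dims], hi[dims])     # random cut in the data range
        parts.append((dims, cuts))
    return parts

def cell_key(x, dims, cuts):
    """Bit pattern of the K tests x[d_k] <= v_k identifies the cell."""
    return tuple(x[dims] <= cuts)

rng = np.random.default_rng(1)
data = rng.random((200, 5))
parts = build_partitions(data, K=8, L=4, rng=rng)
tables = []
for dims, cuts in parts:
    t = {}
    for i, x in enumerate(data):
        t.setdefault(cell_key(x, dims, cuts), []).append(i)
    tables.append(t)

# query: union of the L cells the query falls into
q = data[0]
candidates = set()
for (dims, cuts), t in zip(parts, tables):
    candidates.update(t.get(cell_key(q, dims, cuts), []))
```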
Choosing the optimal K and L
• For a query q, we want to compute distances only to the points in its buckets – and as few of them as possible.
• Large K → a smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, extra points might be included
[Garbled formulas omitted: the expected number of points in the query's cells as a function of n, K, L and d]
• As L increases, the retrieved neighborhood C̄ grows, but the chance of missing a neighbor decreases
• K determines the resolution of the data structure
Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points; record the distance (bandwidth) to each
• Choose an error threshold ε
• The optimal K and L should satisfy that the approximate distance is within the threshold of the true one
Choosing optimal K and L
• For each K, estimate the error
• In one run, for all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
[Plots: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)] with its minimum marked]
Data driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[Plot: bucket occupancy for uniform vs data-driven cut points]
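A small experiment illustrating why the suggestion helps (my own sketch; the exponential toy data and the `imbalance` measure are assumptions): a cut drawn from the data's own coordinates splits skewed data far more evenly than a cut drawn uniformly from its range.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.exponential(1.0, size=10_000)   # skewed: most mass near 0, long tail

def imbalance(cut):
    """|left - right| / n for the two buckets induced by a single cut value."""
    left = (data <= cut).sum()
    return abs(2 * left - data.size) / data.size

# original LSH: cut uniform in the data range; suggestion: coordinate of a random point
uniform_cuts = rng.uniform(data.min(), data.max(), size=200)
driven_cuts = rng.choice(data, size=200)
u = np.mean([imbalance(c) for c in uniform_cuts])
d = np.mean([imbalance(c) for c in driven_cuts])
# d comes out well below u: data-driven cuts give far more balanced buckets
```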
Additional speedup
Assume that all points in C̄ will converge to the same mode (C̄ is like a type of an aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
Low dimension High dimension
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30 – cookies…
Summary
• LSH suggests a compromise: accuracy traded for a gain in complexity
• Applications that involve massive data in high dimension require the fast performance of LSH
• Extension of the LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni (andoni@mit.edu)
– Test over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Quadtree - Query
In many cases it works.
[Figure: X–Y plane split at (X1,Y1) into cells P<X1,P<Y1; P<X1,P≥Y1; P≥X1,P<Y1; P≥X1,P≥Y1]
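A toy sketch of the quadtree just described (my own illustration; the `QuadTree` class, its centroid-based split and the `nearest_cell` descent are assumptions made for brevity):

```python
class QuadTree:
    """Minimal point quadtree: split the cell, recurse until each cell
    holds at most one point (2-D only)."""
    def __init__(self, points):
        self.point, self.children = None, None
        if len(points) == 1:
            self.point = points[0]
        elif points:
            cx = sum(p[0] for p in points) / len(points)
            cy = sum(p[1] for p in points) / len(points)
            self.center = (cx, cy)
            quads = [[], [], [], []]
            for p in points:
                quads[(p[0] >= cx) * 2 + (p[1] >= cy)].append(p)
            if max(len(q) for q in quads) == len(points):
                self.point = points[0]   # degenerate split: keep one point
            else:
                self.children = [QuadTree(q) for q in quads]

    def nearest_cell(self, q):
        """Descend to the leaf cell containing q: the 'in many cases works' query."""
        if self.children is None:
            return self.point
        cx, cy = self.center
        return self.children[(q[0] >= cx) * 2 + (q[1] >= cy)].nearest_cell(q)

pts = [(1, 1), (2, 9), (8, 2), (9, 9)]
t = QuadTree(pts)
```

The pitfall slides that follow are exactly where this cell-descent answer fails: the true nearest neighbor may sit in an adjacent cell.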
Quadtree – Pitfall 1
In some cases it doesn't.
[Figure: the same X–Y split at (X1,Y1)]
Quadtree – Pitfall 1
In some cases nothing works.
[Figure: X–Y plane]
Quadtree – Pitfall 2
A query may have to inspect O(2^d) neighboring cells, which could result in query time exponential in the dimension.
Space partition based algorithms
"Multidimensional access methods", Volker Gaede, O. Günther
Could be improved.
Outline
• Problem definition and flavors
• Algorithms overview – low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
Curse of dimensionality
• Query time or space: O(n^d)
• For d > 10..20: worse than a sequential scan, for most geometric distributions
• Techniques specific to high dimensions are needed
• Proved in theory and in practice by Barkol & Rabani 2000 and Beame & Vee 2002
Naive: O(min(nd, n^d))
Curse of dimensionality: some intuition
Splitting each dimension doubles the number of cells: 2, 2^2, 2^3, …, 2^d.
Outline
• Problem definition and flavors
• Algorithms overview – low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
Preview
• General solution – Locality sensitive hashing
• Implementation for Hamming space
• Generalization to l1 & l2
Hash function
Data_Item → [Hash function] → Key → Bin/Bucket
Example: X = a number in the range 0…n; hash function: X modulo 3; storage address 0…2 in the data structure.
Usually we would like related data items to be stored in the same bin.
Recall: r - Nearest Neighbor
dist(q,p1) ≤ r
dist(q,p2) ≥ (1 + ε) r,  r2 = (1 + ε) r1
Locality sensitive hashing
(r, ε, P1, P2)-sensitive ≡
• Pr[I(p)=I(q)] is "high" (≥ P1) if p is "close" to q (dist ≤ r)
• Pr[I(p)=I(q)] is "low" (≤ P2) if p is "far" from q (dist ≥ r2 = (1 + ε) r1)
Preview
• General solution – Locality sensitive hashing
• Implementation for Hamming space
• Generalization to l1 & l2
Hamming Space
• Hamming space = the 2^N binary strings of length N
• Hamming distance = number of differing digits (aka signal distance; Richard Hamming)
Example (N = 12): 010100001111 vs 010010000011 → distance = 4
• Hamming distance = SUM(X1 XOR X2)
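The XOR formulation maps directly to code (a trivial sketch of my own, using Python ints as bit strings):

```python
def hamming(x: int, y: int) -> int:
    """Hamming distance between two bit strings stored as ints: popcount(x XOR y)."""
    return bin(x ^ y).count("1")

a = 0b010100001111
b = 0b010010000011
# the two strings from the slide differ in exactly 4 positions
```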
L1 to Hamming Space Embedding
Each coordinate in [0, C] is written in unary: for p = (8, 2) with C = 11,
8 → 11111111000 and 2 → 11000000000, concatenated into 1111111100011000000000.
The embedded dimension is d' = C·d.
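A sketch of the embedding (helper names `unary_embed` and `hamming_seq` are my own); it checks the key property that L1 distance is preserved as Hamming distance:

```python
def unary_embed(point, C):
    """Embed integer coordinates in [0, C] into Hamming space:
    coordinate x becomes x ones followed by C - x zeros."""
    bits = []
    for x in point:
        bits += [1] * x + [0] * (C - x)
    return bits

def hamming_seq(u, v):
    return sum(a != b for a, b in zip(u, v))

p, q = (8, 2), (5, 5)
e_p, e_q = unary_embed(p, C=11), unary_embed(q, C=11)
# L1 distance |8-5| + |2-5| = 6 equals the Hamming distance of the embeddings
```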
Hash function
The j-th hash function: for p ∈ H^d', Gj(p) = p|Ij – bits sampling from p (j = 1…L; here k = 3 digits).
Store p into bucket p|Ij, one of the 2^k buckets; e.g. p|Ij = 101.
[Bucket contents: 11000000000, 111111110000, 111000000000, 111111110001]
Construction: insert each point p into its bucket in every table 1, 2, …, L.
Query: probe the bucket of q in every table 1, 2, …, L.
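The construction and query steps above can be sketched as follows (my own toy code; `make_tables` and `query` are invented names, and points are tuples of bits rather than embedded strings):

```python
import random

def make_tables(points, d, k, L, rng):
    """L hash tables; table j keys points by k randomly sampled bit positions I_j."""
    tables = []
    for _ in range(L):
        I = rng.sample(range(d), k)               # G_j(p) = p restricted to I_j
        table = {}
        for idx, p in enumerate(points):
            key = tuple(p[i] for i in I)
            table.setdefault(key, []).append(idx)
        tables.append((I, table))
    return tables

def query(q, tables):
    """Union of the buckets q falls into across the L tables."""
    cand = set()
    for I, table in tables:
        cand.update(table.get(tuple(q[i] for i in I), []))
    return cand

rng = random.Random(0)
d = 16
points = [tuple(rng.randint(0, 1) for _ in range(d)) for _ in range(100)]
tables = make_tables(points, d, k=6, L=8, rng=rng)
cand = query(points[0], tables)   # candidates for the first point's neighbors
```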
Alternative intuition: random projections
p = (8, 2), C = 11, embedded as 1111111100011000000000 (d' = C·d): each sampled bit of the unary code acts as a random threshold on a coordinate.
Alternative intuition: random projections
The k sampled bits (e.g. 101) select one of the 2^3 buckets 000, 100, 110, 001, 101, 111, …
[Bucket contents: 11000000000, 111111110000, 111000000000, 111111110001]
k samplings
Repeating
Repeating L times
Secondary hashing
Support volume tuning: dataset size vs storage volume.
The 2^k buckets (e.g. 011) of size B are mapped by a simple secondary hashing into M buckets, with M·B = αn, α = 2.
The above hashing is locality-sensitive
• Probability(p, q in same bucket) = (1 − Distance(p,q)/d')^k
[Plots: probability vs distance(q, pi) for k = 1 and k = 2 – a larger k sharpens the drop-off]
Adopted from Piotr Indyk's slides
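The collision probability above is easy to tabulate (a sketch of my own; `collision_prob` is an invented name), and it shows why raising k sharpens the separation between near and far points:

```python
def collision_prob(dist, d_prime, k):
    """Pr[all k sampled bits agree] = (1 - dist/d')^k."""
    return (1.0 - dist / d_prime) ** k

# raising k lowers every collision probability, but the ratio between
# a near point's and a far point's probability grows sharply
near = (collision_prob(5, 100, 1), collision_prob(5, 100, 10))
far = (collision_prob(50, 100, 1), collision_prob(50, 100, 10))
```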
Preview
• General solution – Locality sensitive hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• A new hashing function
• Still based on sampling
• Using a mathematical trick:
• a p-stable distribution for the Lp distance – the Gaussian distribution for the L2 distance
Central limit theorem
v1·X1 + v2·X2 + … + vn·Xn — a sum of weighted Gaussians is a weighted Gaussian.
v1…vn are real numbers; X1…Xn are independent, identically distributed (i.i.d.) Gaussians.
Then the dot product Σi vi·Xi is distributed as ||v||2 · X, with X Gaussian: a dot product yields the norm.
Norm → Distance
Σi ui·Xi − Σi vi·Xi = Σi (ui − vi)·Xi, distributed as ||u − v||2 · X.
So the difference of the dot products of two feature vectors with the same Gaussian vector is distributed as their distance times a Gaussian.
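This 2-stability property can be checked numerically (my own Monte Carlo sketch; the vectors and sample size are arbitrary): the projections a·(u − v) with Gaussian a have standard deviation equal to ||u − v||2.

```python
import numpy as np

rng = np.random.default_rng(3)
u = np.array([1.0, 2.0, 3.0, 4.0])
v = np.array([0.0, 2.0, 1.0, 4.0])

# By 2-stability of the Gaussian: a.(u - v) ~ N(0, ||u - v||_2^2)
a = rng.normal(size=(100_000, 4))      # 100k random Gaussian projection vectors
proj = a @ (u - v)
empirical_std = proj.std()
true_norm = np.linalg.norm(u - v)      # sqrt(1 + 4) = sqrt(5)
```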
The full Hashing
h_a,b(v) = ⌊(a·v + b) / w⌋
• v – the features vector (e.g. [34, 82, 21, …])
• a – d random numbers, i.i.d. from a p-stable distribution
• b – a random phase in [0, w]
• w – the discretization step
Example: a·v = 7944, b = 34, w = 100 → h = ⌊(7944 + 34)/100⌋ = 79 (cut points …7800, 7900, 8000, 8100, 8200…)
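A sketch of this hash family (my own code; `make_hash` and the test vectors are assumptions): with Gaussian a, nearby vectors land in the same slot far more often than distant ones.

```python
import numpy as np

def make_hash(d, w, rng):
    """One L2 LSH function h_{a,b}(v) = floor((a.v + b)/w):
    a ~ N(0, I_d) is 2-stable, b ~ Uniform[0, w] is the random phase,
    w is the discretization step."""
    a = rng.normal(size=d)
    b = rng.uniform(0.0, w)
    return lambda v: int(np.floor((a @ v + b) / w))

rng = np.random.default_rng(4)
hashes = [make_hash(3, 4.0, rng) for _ in range(500)]
v = np.array([3.4, 8.2, 2.1])
close = v + np.array([0.1, 0.0, 0.0])    # distance 0.1 << w
far = v + np.array([30.0, -16.0, 10.0])  # distance >> w
coll_close = sum(h(v) == h(close) for h in hashes)
coll_far = sum(h(v) == h(far) for h in hashes)
```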
Generalization: P-Stable distribution
• L2: Central Limit Theorem → the Gaussian (normal) distribution
• Lp, p ∈ (0, 2]: Generalized Central Limit Theorem → a p-stable distribution (e.g. the Cauchy distribution for L1)
P-Stable summary
• Works for r - Nearest Neighbor; generalizes to 0 < p ≤ 2
• Improves query time: O(d·n^(1/(1+ε)) · log n) → O(d·n^(1/(1+ε)^2) · log n)
(Latest results reported in an email by Alexander Andoni)
Parameters selection
• 90% probability ⇒ best query time performance (for Euclidean space)
• A single projection hits an ε-nearest neighbor with Pr = p1
• k projections hit an ε-nearest neighbor with Pr = p1^k
• L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure collision (e.g. 1 − δ ≥ 90%): 1 − (1 − p1^k)^L ≥ 1 − δ, i.e. L ≥ log(δ) / log(1 − p1^k)
Reject non-neighbors / accept neighbors.
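The bound above gives the number of tables directly (a one-liner of my own; `tables_needed` is an invented name):

```python
import math

def tables_needed(p1, k, delta):
    """Smallest L with 1 - (1 - p1^k)^L >= 1 - delta,
    i.e. L = ceil(log(delta) / log(1 - p1^k))."""
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

# e.g. p1 = 0.9, k = 10 bits, 90% success target (delta = 0.1)
L = tables_needed(p1=0.9, k=10, delta=0.1)
```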
…Parameters selection
[Plot: running time vs k – the candidate-extraction cost grows with k while the candidate-verification cost shrinks; choose k at the minimum of their sum]
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data size (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dim.)
• Requires the radius r to be fixed in advance
From Piotr Indyk's slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applications
• Searching video clips in databases ("Hierarchical, Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash function construction and parameter tuning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola and T. Darrell
• Finding sensitive hash functions
Mean Shift Based Clustering in High Dimensions: A Texture Classification Example, B. Georgescu, I. Shimshoni and P. Meer
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups
The Problem
Given an image x, what are the parameters θ in this image, i.e. the angles of joints, the orientation of the body, etc.?
Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola and T. Darrell
Ingredients
• Input: query image with unknown angles (parameters)
• A database of human poses with known angles
• An image feature extractor – edge detector
• A distance metric in feature space: dx
• A distance metric in angle space: d(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))
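The angle-space metric above is a few lines of code (my own sketch; `angle_dist` is an invented name). Note how the cosine handles the wrap-around at 2π:

```python
import math

def angle_dist(t1, t2):
    """Pose distance: sum over joints of 1 - cos(angle difference).
    Zero for identical poses; robust to wrap-around at 2*pi."""
    return sum(1.0 - math.cos(a - b) for a, b in zip(t1, t2))

same = angle_dist([0.1, 1.0], [0.1, 1.0])
wrap = angle_dist([0.0], [2.0 * math.pi])   # wrap-around: distance ~ 0
opposite = angle_dist([0.0], [math.pi])     # maximal per-joint distance = 2
```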
Example based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
Input query → find KNN in the database of examples → output: average angles of the KNN
The algorithm flow
Input query → features extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms.
[Garbled formula and figure labels omitted]
[Pipeline: Feature Extraction → PSH → LWR]
PSH: The basic assumption
There are two metric spaces here: feature space (dx) and parameter space (dθ). We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: parameter space (angles) vs feature space, with a query q. Is this magic?]
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to dθ. The hash functions are applied in feature space, but the KNN are valid in angle space.
• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h
PSH as a classification problem
Labels (r = 0.25): a pair of examples (xi, xj) is labeled
yij = +1 if dθ(θi, θj) ≤ r
yij = −1 if dθ(θi, θj) ≥ (1 + ε) r
A binary hash function on features:
hT(x) = +1 if the selected feature of x is above the threshold T, −1 otherwise
Predict the labels:
ŷh(xi, xj) = +1 if hT(xi) = hT(xj), −1 otherwise
Find the best threshold T that predicts the true labeling subject to the probability constraints: hT will place both examples in the same bin, or separate them.
Local Weighted Regression (LWR)
• Given a query image, PSH returns its KNNs
• LWR uses the KNNs to compute a weighted average of the estimated angles of the query: each neighbor's known angles are weighted by a kernel K of its feature-space distance dx to the query (dist → weight), roughly θ̂ = argmin_θ Σ_{xi ∈ N(x)} dθ(g(xi), θ) · K(dx(xi, x))
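A zeroth-order version of this weighted averaging can be sketched as follows (my own simplification, not the paper's regression; `lwr_estimate`, the Gaussian kernel and the toy numbers are assumptions):

```python
import math

def lwr_estimate(neighbors, query_dists, kernel_width=1.0):
    """Zeroth-order locally weighted estimate: average the neighbors' known
    angle vectors, weighted by a Gaussian kernel of their feature-space
    distance to the query."""
    weights = [math.exp(-(d / kernel_width) ** 2) for d in query_dists]
    total = sum(weights)
    m = len(neighbors[0])
    return [sum(w * th[i] for w, th in zip(weights, neighbors)) / total
            for i in range(m)]

# two neighbors; the nearer one (distance 0.1) dominates the estimate
est = lwr_estimate([[1.0, 0.0], [3.0, 2.0]], query_dists=[0.1, 2.0])
```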
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
• …but in the end, everything depends on your data set
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– E-mail Alex Andoni (andoni@mit.edu)
– Test over your own data
(C code under Red Hat Linux)
LSH – Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash-function construction and parameter tuning
Outline
• "Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola, T. Darrell — finding sensitive hash functions
• "Mean Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni, P. Meer — tuning LSH parameters; the LSH data structure is used for algorithm speedups
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, T. Darrell)
Given an image x, what are the parameters θ in this image, i.e. angles of joints, orientation of the body, etc.?
Ingredients
• Input: a query image with unknown angles (parameters)
• A database of human poses with known angles
• An image feature extractor – edge detector
• A distance metric in feature space, d_x
• A distance metric in angle space:
  d_θ(θ₁, θ₂) = Σᵢ₌₁ᵐ (1 − cos(θ₁,ᵢ − θ₂,ᵢ))
Example-based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
Input: query → find KNN in the database of examples → output: average angles of the KNN
The algorithm flow
Input query → features extraction → processed query → PSH (LSH) over the database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms
[figure: edge histograms for example images A and B]
Feature Extraction → PSH → LWR
PSH: The basic assumption
There are two metric spaces here: feature space (d_x) and parameter space (d_θ).
We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space
Feature Extraction → PSH → LWR
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ.
The hash functions are applied in feature space, but the KNN are valid in angle space.
• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h
Feature Extraction → PSH → LWR
PSH as a classification problem
A pair of examples (xᵢ, xⱼ) is labeled:
  yᵢⱼ = +1 if d_θ(θᵢ, θⱼ) ≤ r
  yᵢⱼ = −1 if d_θ(θᵢ, θⱼ) ≥ (1 + ε)·r      (r = 0.25)
A binary hash function on the features:
  h_T(x) = +1 if the feature value exceeds the threshold T, −1 otherwise
Predict the labels:
  ŷᵢⱼ(h) = +1 if h_T(xᵢ) = h_T(xⱼ), −1 otherwise
Find the best T that predicts the true labeling, subject to the probability constraints: h_T will place both examples of a pair in the same bin, or separate them.
Feature Extraction → PSH → LWR
Local Weighted Regression (LWR)
• Given a query image, PSH returns KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:
  β* = argmin_β Σ_{xᵢ ∈ N(x₀)} d_θ(g(xᵢ; β), θᵢ) · K(d_x(xᵢ, x₀))
  where K is a distance-based weight kernel over the neighborhood N(x₀)
Feature Extraction → PSH → LWR
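In its simplest (order-0) form, the weighted fit above reduces to a kernel-weighted average of the neighbors' known angle vectors. The sketch below assumes that simplification plus a Gaussian kernel; the name weighted_pose and the kernel choice are assumptions for illustration, not the paper's exact procedure:

```python
import math

def weighted_pose(neighbors, query_dist):
    """Order-0 locally-weighted fit: kernel-weighted average of the KNNs'
    angle vectors.  neighbors: list of angle vectors with known poses;
    query_dist: their feature-space distances to the query."""
    weights = [math.exp(-d * d) for d in query_dist]   # assumed Gaussian kernel
    total = sum(weights)
    m = len(neighbors[0])                              # number of angles
    return [sum(w * nb[j] for w, nb in zip(weights, neighbors)) / total
            for j in range(m)]

# two neighbors at equal distance -> plain average of their angles
print(weighted_pose([[10.0, 20.0], [30.0, 40.0]], [0.0, 0.0]))
```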
Results
Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (L)
• Tested on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without feature selection, 40 bits and 1,000 hash tables were needed
Recall: P1 is the probability of a positive hash, P2 the probability of a bad hash, B the max number of points in a bucket
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
Interesting mismatches
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in Rᵈ, centered at P = p₁,…,pₙ, with radii r₁,…,rₙ
• Goal: given a query q, preprocess the points in P to find a point pᵢ whose sphere covers the query q
Courtesy of Mohamad Hegaze
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, P. Meer)
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[figure: the mean-shift vector for a point, within its bandwidth window]
(progress: Mean-shift → LSH: optimal k, l → LSH: data partition → LSH → LSH data structure)
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region:
high density – small bandwidth; low density – large bandwidth.
Based on the kth nearest neighbor of the point, the bandwidth is hᵢ = ‖xᵢ − xᵢ,ₖ‖.
Adaptive mean-shift vs. non-adaptive
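The adaptive-bandwidth rule can be sketched as follows; the brute-force neighbor search, the L1 norm, and the name adaptive_bandwidths are assumptions for illustration (the whole point of the paper is to replace the brute-force part with LSH):

```python
def adaptive_bandwidths(points, k):
    """Per-point bandwidth h_i = distance (L1 here) to the point's k-th
    nearest neighbor: dense regions get small h, sparse regions large h."""
    hs = []
    for i, p in enumerate(points):
        dists = sorted(sum(abs(a - b) for a, b in zip(p, q))
                       for j, q in enumerate(points) if j != i)
        hs.append(dists[k - 1])
    return hs

# two close points and one outlier: the outlier gets a large bandwidth
print(adaptive_bandwidths([(0.0,), (1.0,), (10.0,)], k=1))
```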
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering: each pixel gets the value of its nearest mode
[figure: original → filtered → segmented, with the 3D feature space]
"Mean-shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Mean-shift trajectories
Filtering examples: original squirrel → filtered; original baboon → filtered
Segmentation examples
("Mean-shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02)
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries — implemented with LSH
• Statistical curse of dimensionality: sparseness of the data — variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (dₖ, vₖ)
• For each point x we check whether x_{dₖ} ≤ vₖ; the K boolean results select the point's cell
• This partitions the data into cells
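The (dₖ, vₖ) cut scheme above can be sketched as follows; the names make_partition and cell_of are hypothetical, and the uniform cut range is an assumption (the data-driven variant later in the talk draws cut values from the data instead):

```python
import random

def make_partition(dim, K, lo=0.0, hi=1.0, rng=random):
    """One random partition: K (d_k, v_k) pairs, each a coordinate index
    and a cut value drawn uniformly from [lo, hi]."""
    return [(rng.randrange(dim), rng.uniform(lo, hi)) for _ in range(K)]

def cell_of(x, partition):
    """A point's cell is the K-bit vector of tests x[d_k] <= v_k."""
    return tuple(x[d] <= v for d, v in partition)

part = [(0, 0.5), (1, 0.25)]        # fixed cuts for illustration
print(cell_of((0.1, 0.9), part))    # -> (True, False)
```

Points sharing a cell in any of the L partitions become candidate neighbors, which is how the range queries of mean-shift get approximated.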
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets
• Large K ⇒ a smaller number of points in each cell
• If L is too small, points might be missed; but if L is too big, extra points might be included
• As L increases, C∪ (the union of the query's cells over the L partitions) increases, but C∩ (their intersection) decreases
• K determines the resolution of the data structure
Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points, giving the true (bandwidth) distances
• Choose an error threshold ε
• The optimal K and L should satisfy: the approximate distance is within (1 + ε) of the true KNN distance
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K)
• Minimize the running time t(K, L(K))
[plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] — take its minimum]
Data-driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[figure: bucket distribution — uniform vs. data-driven cut points]
Additional speedup
Assume that all points in C∩ will converge to the same mode (C∩ acts like a type of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
Low dimension vs. high dimension
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30 – cookies…
Summary
• LSH suggests a compromise on accuracy for a gain in complexity
• Applications that involve massive data in high dimensions require LSH's fast performance
• Extension of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• …but in the end, everything depends on your data set
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– E-mail Alex Andoni (andoni@mit.edu)
– Test over your own data
(C code under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Quadtree – Pitfall 1
[figure: X–Y plane]
In some cases nothing works
Quadtree – Pitfall 2
[figure: X–Y plane] O(2ᵈ)
Could result in query time exponential in the dimension
Space-partition based algorithms
[table of methods, from "Multidimensional Access Methods", Volker Gaede, O. Günther]
Could be improved
Outline
• Problem definition and flavors
• Algorithms overview – low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
Curse of dimensionality
• Query time or space: O(nᵈ)
• For d > 10..20, worse than a sequential scan for most geometric distributions
• Techniques specific to high dimensions are needed
• Proved in theory and in practice by Barkol & Rabani (2000) and Beame & Vee (2002)
• Naive: O(min(n·d, nᵈ))
Curse of dimensionality: some intuition
The number of cells grows exponentially with the dimension: 2, 2², 2³, …, 2ᵈ
Outline
• Problem definition and flavors
• Algorithms overview – low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
Preview
• General solution – locality sensitive hashing
• Implementation for Hamming space
• Generalization to l1 & l2
Hash function
A hash function maps a Data_Item through a key to a bin/bucket.
Example: X = a number in the range 0..n; h(X) = X modulo 3 gives a storage address in 0..2 of the data structure.
Usually we would like related data items to be stored in the same bin.
Recall: r-Nearest Neighbor
dist(q, p₁) ≤ r
dist(q, p₂) ≥ (1 + ε)·r, with r₂ = (1 + ε)·r₁
Locality sensitive hashing
A family is (r, ε, p₁, p₂)-sensitive ≡
Pr[I(p) = I(q)] is "high" (≥ p₁) if p is "close" to q
Pr[I(p) = I(q)] is "low" (≤ p₂) if p is "far" from q
r₂ = (1 + ε)·r₁
Preview
• General solution – locality sensitive hashing
• Implementation for Hamming space
• Generalization to l1 & l2
Hamming Space
• Hamming space = the 2ᴺ binary strings of length N
• Hamming distance = the number of changed digits, a.k.a. signal distance (Richard Hamming)
Example: 010100001111 vs. 010010000011 → distance = 4
• Hamming distance = SUM(X1 XOR X2)
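The XOR definition translates directly to code (the function name is illustrative):

```python
def hamming(x1, x2):
    """Hamming distance between equal-length bit strings:
    number of changed digits = popcount(x1 XOR x2)."""
    return bin(int(x1, 2) ^ int(x2, 2)).count("1")

print(hamming("010100001111", "010010000011"))   # -> 4, as on the slide
```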
L1 to Hamming Space Embedding
Each coordinate of p (an integer in 0..C, e.g. C = 11) is written in unary: v ones followed by C − v zeros.
Example: p = (8, 2) → 1111111100011000000000, so d′ = C·d.
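A minimal sketch of the unary embedding, reproducing the slide's example (the name unary_embed is an assumption). L1 distances between coordinate vectors become Hamming distances between the embedded strings:

```python
def unary_embed(p, C):
    """Embed an integer vector p (coords in 0..C) into Hamming space:
    each coordinate v becomes v ones followed by C - v zeros (d' = C*d bits)."""
    return "".join("1" * v + "0" * (C - v) for v in p)

print(unary_embed((8, 2), 11))   # -> 1111111100011000000000, as on the slide
```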
Hash function
For p ∈ H^d′, the j-th hash samples k bits of p: Gⱼ(p) = p|Iⱼ (j = 1, …, L; e.g. k = 3 digits gives a key such as 101).
Store p into the bucket p|Iⱼ of table j (2ᵏ buckets), e.g. for 11000000000, 111111110000, 111000000000, 111111110001.
Construction
Insert every point p into its bucket in each of the tables 1, 2, …, L.
Query
Look up q's bucket in each of the tables 1, 2, …, L.
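The construction and query steps above can be sketched as follows; the function names and the dictionary-based bucket layout are assumptions for illustration (a real implementation would add the secondary hashing described later):

```python
import random
from collections import defaultdict

def build_tables(points, nbits, k, L, seed=0):
    """Bit-sampling LSH for Hamming space: table j stores each point p
    under the key G_j(p) = p restricted to k random bit positions I_j."""
    rng = random.Random(seed)
    samplings = [rng.sample(range(nbits), k) for _ in range(L)]
    tables = [defaultdict(list) for _ in range(L)]
    for p in points:
        for I, table in zip(samplings, tables):
            table["".join(p[i] for i in I)].append(p)
    return samplings, tables

def query(q, samplings, tables):
    """Candidate neighbors: the union of q's buckets over the L tables."""
    cands = set()
    for I, table in zip(samplings, tables):
        cands.update(table.get("".join(q[i] for i in I), []))
    return cands

pts = ["0000", "0001", "1111"]
S, T = build_tables(pts, nbits=4, k=2, L=3, seed=1)
print(query("0000", S, T))   # "0000" always collides with itself
```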
Alternative intuition: random projections
[figure: the embedded point p = 1111111100011000000000 (C = 11); the k sampled bits form a key, e.g. 101, into the 2³ buckets 000, 100, 110, 001, 101, 111, …]
k samplings; repeating L times
Secondary hashing
Supports volume tuning: dataset size vs. storage volume. A simple hash maps the 2ᵏ buckets (each of size B) into M buckets, with M·B = α·n, α = 2. (Skip)
The above hashing is locality-sensitive
• Pr(p and q land in the same bucket) = (1 − Distance(p, q)/dimensions)ᵏ, for k sampled dimensions
[plot: probability vs. distance(q, pᵢ), for k = 1 and k = 2]
Adapted from Piotr Indyk's slides
Preview
• General solution – locality sensitive hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• A new hashing function
• Still based on sampling
• Using a mathematical trick: a p-stable distribution for the Lp distance — the Gaussian distribution for the L2 distance
Central limit theorem
v₁X₁ + v₂X₂ + … + vₙXₙ = a weighted sum of Gaussians, which is again a (weighted) Gaussian
where v₁, …, vₙ are real numbers and X₁, …, Xₙ are independent, identically distributed (i.i.d.) random variables
Quadtree - pitfall 2
O(2^d)
Could result in query time exponential in the dimension.
Space partition based algorithms
"Multidimensional access methods", Volker Gaede & O. Günther
Could be improved
Outline
• Problem definition and flavors
• Algorithms overview - low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high dimension, approximate solutions)
• l2 extension
• Applications (Dan)
Curse of dimensionality
• Query time or space: O(n^d); naive: O(min(n·d, n^d))
• For d > 10..20, worse than sequential scan for most geometric distributions
• Techniques specific to high dimensions are needed
• Proved in theory and in practice by Barkol & Rabani 2000 and Beame & Vee 2002
Curse of dimensionality: some intuition
The number of cells grows exponentially with the dimension: 2, 2^2, 2^3, …, 2^d
Outline
• Problem definition and flavors
• Algorithms overview - low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high dimension, approximate solutions)
• l2 extension
• Applications (Dan)
Preview
• General solution - Locality sensitive hashing
• Implementation for Hamming space
• Generalization to l1 & l2
Hash function
Data_Item → Hash function → Key → Bin/Bucket
Example: h(X) = X modulo 3, where X is a number in the range 0..n; the key (0..2) is the storage address in the data structure.
Usually we would like related data items to be stored at the same bin.
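As a toy sketch of the bucketing idea above (the function and values are made up for illustration):

```python
# Toy hash table: X modulo 3 maps numbers in 0..n to one of 3 bins,
# and the key doubles as the storage address.
def bucket(x: int, m: int = 3) -> int:
    return x % m

bins = {0: [], 1: [], 2: []}
for x in [0, 4, 7, 9, 12, 14]:
    bins[bucket(x)].append(x)

print(bins)  # {0: [0, 9, 12], 1: [4, 7], 2: [14]}
```

A plain modulo hash scatters related items arbitrarily; the point of the next slides is to build hash functions where nearby items tend to share a bin.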
Recall: r-Nearest Neighbor
dist(q, p1) ≤ r
dist(q, p2) ≥ (1 + ε)·r,  r2 = (1 + ε)·r1
Locality sensitive hashing
A hash family is (r1, r2, P1, P2)-sensitive, with r2 = (1 + ε)·r1, if:
• Pr[I(p) = I(q)] is "high" (≥ P1) when p is "close" to q (dist ≤ r1)
• Pr[I(p) = I(q)] is "low" (≤ P2) when p is "far" from q (dist ≥ r2)
Preview
• General solution - Locality sensitive hashing
• Implementation for Hamming space
• Generalization to l1 & l2
Hamming Space
• Hamming space = the 2^N binary strings of length N
• Hamming distance = number of changed digits, a.k.a. signal distance (Richard Hamming)
Example (N = 12):
010100001111
010010000011    Distance = 4
• Hamming distance = SUM(X1 XOR X2)
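The two equivalent definitions above can be sketched directly (using the slide's example strings):

```python
# Hamming distance = number of differing digits, or equivalently the
# popcount of the XOR of the two bit strings viewed as integers.
def hamming(a: str, b: str) -> int:
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

def hamming_xor(a: str, b: str) -> int:
    return bin(int(a, 2) ^ int(b, 2)).count("1")

s1 = "010100001111"
s2 = "010010000011"
print(hamming(s1, s2))  # 4, matching the slide's example
assert hamming(s1, s2) == hamming_xor(s1, s2) == 4
```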
L1 to Hamming Space Embedding
p = (8, 2), coordinates in the range 0..C, C = 11
Each coordinate x is written in unary - x ones followed by C − x zeros:
(8, 2) → 11111111000 11000000000
d' = C·d
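A minimal sketch of the unary embedding, checking the property it is built for - Hamming distance of the embeddings equals L1 distance of the points (the second point q is made up for the check):

```python
# Unary (L1 -> Hamming) embedding: coordinate x in 0..C becomes x ones
# followed by C - x zeros; d coordinates concatenate to d' = C*d bits.
def embed(point, C):
    return "".join("1" * x + "0" * (C - x) for x in point)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

p, q, C = (8, 2), (5, 6), 11
ep, eq = embed(p, C), embed(q, C)
print(ep)  # 1111111100011000000000  (the slide's string)
l1 = abs(8 - 5) + abs(2 - 6)          # L1 distance = 7
assert hamming(ep, eq) == l1          # Hamming distance equals L1 distance
```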
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
Store each point p in its bucket in each of the tables 1, 2, …, L.
Query
Probe the bucket of q in each of the tables 1, 2, …, L.
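The construction and query steps above can be sketched as follows (toy sizes; the table count L = 4 and k = 3 are just for illustration):

```python
import random

# Bit-sampling LSH over Hamming strings: each of the L tables samples
# k bit positions I_j, and G_j(p) = p restricted to I_j.
random.seed(0)
d_prime, k, L = 22, 3, 4
tables = [(random.sample(range(d_prime), k), {}) for _ in range(L)]

def key(p, I):
    return "".join(p[i] for i in I)

def insert(p):
    # Construction: store p into its bucket p|I_j in every table.
    for I, buckets in tables:
        buckets.setdefault(key(p, I), []).append(p)

def query(q):
    # Query: the union of the L buckets q falls into is the candidate set.
    cands = set()
    for I, buckets in tables:
        cands.update(buckets.get(key(q, I), []))
    return cands

p = "1111111100011000000000"   # the slides' embedded point (8, 2)
insert(p)
assert p in query(p)           # an identical point collides in every table
```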
Alternative intuition: random projections
p = (8, 2), C = 11, unary embedding 11111111000 11000000000, d' = C·d.
Sampling a bit of the unary code is the same as comparing a coordinate against a threshold, so the k sampled bits act as random axis-parallel cuts that partition the space into 2^3 cells (buckets 000, 100, 110, 001, 101, 111, …); here p falls into bucket 101.
k samplings
Repeating L times
Secondary hashing
Supports volume tuning: dataset size vs. storage volume.
The 2^k primary buckets (e.g. 011) are mapped by simple hashing into M buckets of size B, with M·B = α·n, α = 2.
The above hashing is locality-sensitive
• Probability(p, q in same bucket) = (1 − Distance(q, p)/dimensions)^k
The collision probability decays with Distance(q, pi); a larger k (plots for k = 1 vs. k = 2) sharpens the drop.
Adopted from Piotr Indyk's slides
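The collision-probability formula above can be tabulated to see the k = 1 vs. k = 2 behaviour (d' = 100 is an arbitrary choice):

```python
# Pr[same bucket] for two strings at Hamming distance D out of d' bits:
# one random bit agrees with prob 1 - D/d', so k sampled bits give (1 - D/d')**k.
def p_collide(D, d_prime, k):
    return (1 - D / d_prime) ** k

d_prime = 100
for k in (1, 2):
    print(k, [round(p_collide(D, d_prime, k), 2) for D in (0, 10, 50, 90)])
# k=1: [1.0, 0.9, 0.5, 0.1]   -- linear decay
# k=2: [1.0, 0.81, 0.25, 0.01] -- sharper drop for far points
```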
Preview
• General solution - Locality sensitive hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• New hashing function
• Still based on sampling
• Using a mathematical trick:
  P-stable distribution for Lp distance; Gaussian distribution for L2 distance
Central limit theorem
A weighted sum of Gaussians is a Gaussian:
v1·X1 + v2·X2 + … + vn·Xn,  with v1..vn real numbers and X1..Xn independent identically distributed (i.i.d.) Gaussians.
Dot Product → Norm:
⟨v, X⟩ = Σᵢ vᵢ·Xᵢ ~ ‖v‖₂ · X
Norm → Distance
⟨u, X⟩ − ⟨v, X⟩ = Σᵢ (uᵢ − vᵢ)·Xᵢ ~ ‖u − v‖₂ · X
The difference between the dot products of features vector 1 and features vector 2 distributes as the distance between them times a standard Gaussian.
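The identity above is easy to check numerically (the vectors u, v below are arbitrary test values):

```python
import math, random

# Check: for Gaussian X, <u,X> - <v,X> = <u-v,X> has standard deviation ||u-v||_2.
random.seed(1)
u, v = [3.0, 1.0, 2.0], [1.0, 0.0, 0.0]
dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))   # ||u - v|| = 3

samples = []
for _ in range(20000):
    X = [random.gauss(0, 1) for _ in range(3)]
    samples.append(sum((a - b) * x for a, b, x in zip(u, v, X)))

std = math.sqrt(sum(s * s for s in samples) / len(samples))
assert abs(std - dist) < 0.1   # empirical std matches the L2 distance
```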
The full Hashing
h_{a,b}(v) = ⌊(⟨a, v⟩ + b) / w⌋
• v - features vector, e.g. [3.4, 8.2, 2.1, …] (d dimensions)
• a - d random numbers, i.i.d. from a p-stable distribution
• b - random phase in [0, w]
• w - discretization step
Example: ⟨a, v⟩ = 7944, b = 34, w = 100 → bucket boundaries at …, 7800, 7900, 8000, 8100, 8200, …
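A minimal sketch of this hash for L2, assuming the Gaussian (2-stable) choice of a; the vectors and parameters are illustrative:

```python
import math, random

# h_{a,b}(v) = floor((<a,v> + b) / w), with a ~ N(0,1)^d and b ~ Uniform[0, w].
random.seed(2)
d, w = 3, 4.0
a = [random.gauss(0.0, 1.0) for _ in range(d)]
b = random.uniform(0.0, w)

def h(v):
    return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)

v = [3.4, 8.2, 2.1]
near = [3.5, 8.1, 2.2]   # small L2 perturbation - likely, not certain, to share a bucket
print(h(v), h(near))
assert h([0.0] * d) == math.floor(b / w)   # the zero vector lands in the phase-only bucket
```

In the full scheme, k such hashes are concatenated per table and L tables are kept, exactly as in the Hamming construction.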
Generalization: p-stable distributions
• L2: Central Limit Theorem → Gaussian (normal) distribution (2-stable)
• Lp, 0 < p ≤ 2: Generalized Central Limit Theorem → p-stable distribution (e.g. Cauchy for L1)
P-Stable summary
• Works for the r-nearest-neighbor problem; generalizes to 0 < p ≤ 2
• Improves query time: O(d·n^(1/(1+ε))·log n) → O(d·n^(1/(1+ε)²)·log n)
(latest results reported by e-mail by Alexander Andoni)
Parameters selection
For Euclidean space: 90% collision probability with the best query-time performance.
• A single projection hits an ε-nearest neighbor with Pr = p1
• k projections hit it with Pr = p1^k
• All L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure collision (e.g. with probability 1 − δ ≥ 90%):
  1 − (1 − p1^k)^L ≥ 1 − δ  ⇒  L ≥ log(δ) / log(1 − p1^k)
Reject non-neighbors, accept neighbors.
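The table-count bound above is a one-liner to evaluate; the values of p1, k and δ below are hypothetical:

```python
import math

# L >= log(delta) / log(1 - p1**k) guarantees collision with prob >= 1 - delta.
def tables_needed(p1, k, delta):
    return math.ceil(math.log(delta) / math.log(1 - p1 ** k))

p1, k, delta = 0.8, 10, 0.1   # delta = 0.1 gives >= 90% success probability
L = tables_needed(p1, k, delta)
print(L)
assert 1 - (1 - p1 ** k) ** L >= 1 - delta   # the chosen L meets the guarantee
```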
… Parameters selection
Plot: query time vs. k - candidates-extraction time grows with k while candidates-verification time shrinks, so the best k sits at the minimum of their sum.
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data size (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance
From Piotr Indyk's slides
Conclusion
• … but at the end, everything depends on your data set
• Try it at home:
  - Visit http://web.mit.edu/andoni/www/LSH/index.html
  - Email Alex Andoni, andoni@mit.edu
  - Test over your own data
  (C code, under Red Hat Linux)
LSH - Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression - vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short, whenever k-nearest neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash-function construction and parameter tuning
Outline
"Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola and T. Darrell
• Finding sensitive hash functions
"Mean Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni and P. Meer
• Tuning LSH parameters
• LSH data structure is used for algorithm speedups
The Problem
Given an image x, what are the parameters θ in this image, i.e. angles of joints, orientation of the body, etc.?
"Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola and T. Darrell
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor - edge detector
• Distance metric in feature space: d_x
• Distance metric in angles space:
  d_θ(θ1, θ2) = Σᵢ₌₁ᵐ (1 − cos(θ1,i − θ2,i))
Example based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute KNN from the database
• Use these KNNs to compute the average angles of the query
Input: query → find KNN in the database of examples → output: average angles of the KNN.
The algorithm flow:
Input query → features extraction → processed query → PSH (LSH) over the database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms, computed between image regions A and B.
Pipeline: Feature Extraction → PSH → LWR
PSH: The basic assumption
There are two metric spaces here: feature space (d_x) and parameter space (d_θ).
We want similarity to be measured in the angles space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space
Pipeline: Feature Extraction → PSH → LWR
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
A query q maps between the parameters space (angles) and the feature space. Is this magic?
Pipeline: Feature Extraction → PSH → LWR
Parameter Sensitive Hashing (PSH)
The trick:
Estimate the performance of different hash functions on examples, and select those sensitive to d_θ:
the hash functions are applied in feature space, but the KNN are valid in angle space.
Procedure:
1. Label pairs of examples with similar angles
2. Define hash functions h on the feature space
3. Predict the labeling of similar/non-similar examples by using h
4. Compare the labelings
5. If the labeling by h is good, accept h; else change h
PSH as a classification problem
Example pairs labeled +1, +1, −1, −1 (r = 0.25).
Labels:
A pair of examples (xi, θi), (xj, θj) is labeled
y_ij = +1 if d_θ(θi, θj) ≤ r
y_ij = −1 if d_θ(θi, θj) ≥ (1 + ε)·r
A binary hash function on features:
h_T(x) = +1 if the feature value ≥ T, −1 otherwise
Predict the labels:
ŷ_h(xi, xj) = +1 if h_T(xi) = h_T(xj), −1 otherwise
Pipeline: Feature Extraction → PSH → LWR
Feature selection:
Find the best T that predicts the true labeling under the probability constraints:
h_T will place both examples of a pair in the same bin, or separate them.
Pipeline: Feature Extraction → PSH → LWR
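The selection step above can be sketched as scoring candidate thresholds by label agreement; the 1-D features and pair labels below are made up for illustration:

```python
# Score a candidate threshold hash h_T(x) = +1 if x >= T else -1 by how well
# same-bucket agreement predicts the +1/-1 pose-similarity labels on pairs.
pairs = [   # (feature_i, feature_j, y): y = +1 similar poses, -1 dissimilar
    (0.1, 0.2, +1), (0.8, 0.9, +1), (0.1, 0.9, -1), (0.2, 0.8, -1),
]

def score(T):
    ok = 0
    for xi, xj, y in pairs:
        hi = 1 if xi >= T else -1
        hj = 1 if xj >= T else -1
        y_hat = 1 if hi == hj else -1   # same bucket -> predict "similar"
        ok += (y_hat == y)
    return ok / len(pairs)

best_T = max([0.05, 0.5, 0.95], key=score)
assert best_T == 0.5 and score(0.5) == 1.0   # the mid cut separates the pose clusters
```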
Local Weighted Regression (LWR)
• Given a query image x, PSH returns its KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query (dist → weight):
θ(x) = g(x; β₀),  β₀ = argmin_β Σ_{xᵢ ∈ N(x)} d_θ(g(xᵢ; β), θᵢ) · K(d_x(xᵢ, x))
Pipeline: Feature Extraction → PSH → LWR
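A much-simplified zeroth-order version of this step (a kernel-weighted mean rather than a fitted regression model; the neighbor distances, angles and kernel bandwidth are made up):

```python
import math

# Estimate the query's angle as a kernel-weighted average of its PSH
# neighbors' known angles, weighting by feature-space distance d_x.
neighbors = [(0.2, 30.0), (0.5, 40.0), (1.5, 90.0)]   # (d_x to query, angle)

def K(d, h=1.0):                     # Gaussian kernel on d_x
    return math.exp(-((d / h) ** 2))

wsum = sum(K(d) for d, _ in neighbors)
theta = sum(K(d) * ang for d, ang in neighbors) / wsum
print(theta)
assert 30.0 < theta < 50.0   # dominated by the two close neighbors
```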
Results
Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (L)
• Test on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without selection, 40 bits and 1,000 hash tables would have been needed
Recall: P1 is the probability of a positive hash, P2 the probability of a bad hash, B the maximum number of points in a bucket.
Results - real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
Results - real data
Interesting mismatches
Fast pose estimation - summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn
• Goal: given a query q, preprocess the points in P to find a point pi whose sphere covers the query q
Courtesy of Mohamad Hegaze
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
"Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions - using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
(bandwidth per point; roadmap: mean-shift → LSH: optimal k, l → LSH: data partition → LSH: data structure)
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region:
high density - small bandwidth; low density - large bandwidth.
It is based on the kth nearest neighbor of the point.
Adaptive mean-shift vs. non-adaptive.
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering
"Mean-shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Image segmentation algorithm: original → filtered → segmented.
Filtering: pixel value of the nearest mode.
Mean-shift trajectories.
Filtering examples
original squirrel → filtered; original baboon → filtered
"Mean-shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Segmentation examples
"Mean-shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH
• Statistical curse of dimensionality: sparseness of the data, handled with variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point we check whether x_{d_k} ≤ v_k; the K results define the point's cell
• This partitions the data into cells
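A sketch of this cut-based structure (toy data; the cut values are drawn from data coordinates, anticipating the data-driven variant suggested later in the talk):

```python
import random

# L random partitions, each a list of K (dimension, cut-value) pairs; a point's
# cell in a partition is the K-bit vector of tests x[d_k] <= v_k.
random.seed(3)
data = [[random.random() for _ in range(5)] for _ in range(200)]
K_cuts, L = 4, 3

partitions = []
for _ in range(L):
    cuts = []
    for _ in range(K_cuts):
        dim = random.randrange(5)
        v = random.choice(data)[dim]     # data-driven cut value
        cuts.append((dim, v))
    partitions.append(cuts)

def cell(x, cuts):
    return tuple(x[d] <= v for d, v in cuts)

# Bucket the data; candidates for q are points sharing its cell in any partition.
buckets = [{} for _ in range(L)]
for x in data:
    for j, cuts in enumerate(partitions):
        buckets[j].setdefault(cell(x, cuts), []).append(x)

q = data[0]
cands = {id(x) for j, cuts in enumerate(partitions)
         for x in buckets[j].get(cell(q, cuts), [])}
assert id(q) in cands   # a stored point is always found in its own cells
```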
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets
• Large K → smaller number of points in a cell C∩
• If L is too small, points might be missed; but if L is too big, C∪ might include extra points
• As L increases, C∪ increases but C∩ decreases; this determines the resolution of the data structure
Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points → distance (bandwidth)
• Choose an error threshold ε
• The optimal K and L should satisfy: the approximate KNN distance ≤ (1 + ε) × the true KNN distance
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K)
• Minimize the running time t(K, L(K))
Plots: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)] - pick the minimum.
Data driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
(bucket distribution: uniform vs. data-driven points)
Additional speedup
Assume that all points in C∩ will converge to the same mode (C∩ acts like a type of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
Low dimension vs. high dimension
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• A thought for food: does it help to know the data dimensionality or the data manifold?
• Intuitively, dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30 - cookies…
Summary
• LSH suggests a compromise on accuracy for the gain of complexity
• Applications that involve massive data in high dimension require the fast performance of LSH
• Extension of the LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• … but at the end, everything depends on your data set
• Try it at home:
  - Visit http://web.mit.edu/andoni/www/LSH/index.html
  - Email Alex Andoni, andoni@mit.edu
  - Test over your own data
  (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths: h_s (spatial), h_r (color)
3. Apply filtering
[Figure: 3D feature space]
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
[Figures: original → filtered → segmented]
Filtering: each pixel takes the value of the nearest mode
[Figure: mean-shift trajectories]
Filtering examples
[Figures: squirrel and baboon, original vs. filtered]
Segmentation examples
(Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02)
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k) of a coordinate index and a cut value
• For each point x we check whether x_{d_k} ≤ v_k for k = 1..K
• This partitions the data into cells
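A minimal sketch of such a partition structure (the function names and the uniform cut range are our assumptions, not the paper's):

```python
import random

def make_partition(dim, K, rng, lo=0.0, hi=1.0):
    """One random partition: K pairs (d_k, v_k) of a coordinate index and a cut value."""
    return [(rng.randrange(dim), rng.uniform(lo, hi)) for _ in range(K)]

def cell_of(point, partition):
    """A point's cell in a partition: the K-bit vector of tests point[d_k] <= v_k."""
    return tuple(point[d] <= v for d, v in partition)

rng = random.Random(0)
parts = [make_partition(dim=3, K=4, rng=rng) for _ in range(5)]  # L = 5 partitions
a, b = (0.1, 0.1, 0.1), (0.9, 0.9, 0.9)
cells_a = [cell_of(a, p) for p in parts]
# Nearby points tend to share cells in many partitions; distant points rarely do.
```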
Choosing the optimal K and L
• For a query q, distances are computed only to the points in its buckets; we want that number to be as small as possible
• Large K → a smaller number of points in each cell; K determines the resolution of the data structure
• If L is too small, points might be missed; but if L is too big, extra points might be included
• As L increases, coverage increases but efficiency decreases
Choosing optimal K and L
• Determine accurately the KNN distance (the bandwidth) for m randomly selected data points
• Choose an error threshold ε for the approximate distance
• The optimal K and L should satisfy that the approximate KNN distance is within ε of the true one
Choosing optimal K and L
• For each K, estimate the error
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
[Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)], picked at its minimum]
Data-driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[Figure: bucket distribution of points, uniform vs. data-driven cuts]
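The data-driven variant is essentially a one-line change (a sketch; the helper name is ours):

```python
import random

def data_driven_cut(data, coord, rng):
    """Pick the cut value as an actual data point's coordinate, so the cuts
    follow the data distribution instead of being uniform over its range."""
    return rng.choice(data)[coord]

rng = random.Random(1)
data = [(x / 10.0, x * x / 100.0) for x in range(10)]
cut = data_driven_cut(data, coord=1, rng=rng)
# Every cut value is guaranteed to come from the data itself.
```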
Additional speedup
Assume that all points in a cell C will converge to the same mode (C acts as a kind of aggregate), so the mode found for one point can be assigned to the whole cell.
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
[Figure: low dimension vs. high dimension]
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning itself requires KNN
15:30: cookies…
Summary
• LSH trades some accuracy for a large gain in complexity
• Applications that involve massive data in high dimensions require LSH's fast performance
• Extension of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• But in the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test it over your own data (C code, under Red Hat Linux)
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Outline
• Problem definition and flavors
• Algorithms overview – low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
Curse of dimensionality
• Query time or space: O(n^d)
• For d > 10..20: worse than a sequential scan for most geometric distributions
• Techniques specific to high dimensions are needed
• Proved in theory and in practice by Barkol & Rabani 2000 and Beame & Vee 2002
Naive: O(min(n·d, n^d))
Curse of dimensionality: some intuition
[Figure: the number of cells grows as 2, 2², 2³, …, 2^d with the dimension d]
Outline
• Problem definition and flavors
• Algorithms overview – low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
Preview
• General solution – Locality sensitive hashing
• Implementation for Hamming space
• Generalization to l1 & l2
Hash function
[Figure: a hash function maps a Data_Item via a Key to a Bin/Bucket]
Example: h(X) = X modulo 3, where X is a number in the range 0..n; the result (0..2) is used as a storage address in the data structure.
Usually we would like related data items to be stored at the same bin.
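The slide's modulo example as a toy program. Note that X mod 3 does not put related (nearby) numbers in the same bin, which is exactly the property LSH is designed to add:

```python
def bucket_of(x, num_buckets=3):
    """Toy hash from the slide: map a number to a storage address by X modulo 3."""
    return x % num_buckets

table = {b: [] for b in range(3)}
for x in [4, 7, 9, 11]:
    table[bucket_of(x)].append(x)
# 4 and 7 share bucket 1, while the close pair 9 and 11 end up in different buckets.
```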
Recall: r-Nearest Neighbor
[Figure: rings of radius r and (1 + ε)·r around q]
dist(q, p1) ≤ r
dist(q, p2) ≥ (1 + ε)·r,  with r2 = (1 + ε)·r1
Locality sensitive hashing
A hash family is (r1, r2, P1, P2)-sensitive, with r2 = (1 + ε)·r1, if:
• Pr[I(p) = I(q)] is 'high' (≥ P1) when p is 'close' to q (dist ≤ r)
• Pr[I(p) = I(q)] is 'low' (≤ P2) when p is 'far' from q (dist ≥ (1 + ε)·r)
Preview
• General solution – Locality sensitive hashing
• Implementation for Hamming space
• Generalization to l1 & l2
Hamming Space
• Hamming space = the 2^N binary strings of length N
• Hamming distance = the number of changed digits, a.k.a. signal distance (after Richard Hamming)
Hamming space, N = 12:
010100001111
010010000011  → Distance = 4
• Hamming distance = SUM(X1 XOR X2)
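The distance on the slide, SUM(X1 XOR X2), in a few lines:

```python
def hamming(x1, x2):
    """Hamming distance between equal-length bit strings: count differing digits."""
    assert len(x1) == len(x2)
    return sum(b1 != b2 for b1, b2 in zip(x1, x2))

d = hamming("010100001111", "010010000011")  # the slide's example pair
```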
L1 to Hamming Space Embedding
Example: p = (8, 2) with C = 11 embeds as 1111111100011000000000 (each coordinate v becomes v ones followed by C − v zeros)
d' = C·d
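The unary embedding behind the example (p = (8, 2), C = 11), which turns L1 distances into Hamming distances:

```python
def unary_embed(point, C):
    """Each integer coordinate v in [0, C] becomes v ones followed by C - v zeros,
    so a d-dimensional point maps to a d' = C*d bit string."""
    return "".join("1" * v + "0" * (C - v) for v in point)

def hamming(x1, x2):
    return sum(b1 != b2 for b1, b2 in zip(x1, x2))

e1 = unary_embed((8, 2), C=11)  # the slide's point
e2 = unary_embed((7, 5), C=11)
# L1 distance |8-7| + |2-5| = 4 equals the Hamming distance of the embeddings.
```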
Hash function
p ∈ H^d'
G_j(p) = p|I_j,  j = 1..L  (bit sampling from p; here k = 3 digits)
Store p into bucket p|I_j; there are 2^k buckets
[Figure: the sampled key 101 indexes a bucket holding the stored strings]
Construction: store each point p in its bucket in each of the tables 1, 2, …, L
Query: look up q's bucket in each of the tables 1, 2, …, L
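Construction and query can be sketched together (bit-sampling LSH over binary strings; the function names are ours):

```python
import random

def build_lsh(points, d, k, L, seed=0):
    """Build L tables; table j samples k bit positions I_j and stores every
    binary string p under the key p|I_j (its bits restricted to I_j)."""
    rng = random.Random(seed)
    samplings = [rng.sample(range(d), k) for _ in range(L)]
    tables = [{} for _ in range(L)]
    for p in points:
        for I, table in zip(samplings, tables):
            table.setdefault("".join(p[i] for i in I), []).append(p)
    return samplings, tables

def query_lsh(q, samplings, tables):
    """Collect candidates from the L buckets that q falls into."""
    cands = set()
    for I, table in zip(samplings, tables):
        cands.update(table.get("".join(q[i] for i in I), []))
    return cands

pts = ["1111111100011000000000", "1111111000011100000000", "0000000011100000011111"]
samplings, tables = build_lsh(pts, d=22, k=3, L=4)
cands = query_lsh("1111111100011000000000", samplings, tables)
# A stored point always collides with an identical query in every table.
```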
Alternative intuition: random projections
The same example point p = (8, 2) with C = 11 embeds as 1111111100011000000000 (d' = C·d).
Sampling k bits is a random projection onto a k-dimensional subcube: the sampled key (e.g. 101) selects one of the 2³ corner buckets (000, 100, 110, 001, 101, 111, …), each holding the strings that project there.
k samplings, repeated L times.
Secondary hashing
Supports volume tuning: dataset size vs. storage volume.
The 2^k sparse buckets are re-hashed by simple hashing into M buckets of size B, with M·B = αn, α = 2.
The above hashing is locality-sensitive
• Probability(p, q in the same bucket) = (1 − Distance(q, p)/dimensions)^k
[Plots: collision probability vs. Distance(q, p) for k = 1 and k = 2]
Adopted from Piotr Indyk's slides
Preview
• General solution – Locality sensitive hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• New hashing function
• Still based on sampling
• Using a mathematical trick:
• a p-stable distribution for the Lp distance; the Gaussian distribution for the L2 distance
Central limit theorem
v1..vn = real numbers; X1..Xn = independent, identically distributed (i.i.d.) Gaussians
v1·X1 + v2·X2 + … + vn·Xn = a sum of weighted Gaussians = a weighted Gaussian
Central limit theorem
Σᵢ vᵢ·Xᵢ = ‖v‖₂ · X  (a dot product with i.i.d. Gaussians yields a Gaussian scaled by the norm)
Norm = Distance
Σᵢ uᵢ·Xᵢ − Σᵢ vᵢ·Xᵢ = Σᵢ (uᵢ − vᵢ)·Xᵢ = ‖u − v‖₂ · X
So the dot products of two feature vectors with the same Gaussian vector differ by a Gaussian scaled by their L2 distance.
The full hashing
h_{a,b}(v) = ⌊(a·v + b) / w⌋
v: features vector (e.g. [34, 82, 21]); a: d random numbers; b: random phase in [0, w]; w: discretization step
The full hashing, numeric example: with w = 100 (discretization step) and random phase b = 34, a projection value a·v + b = 7944 falls in the bin [7900, 8000).
The full hashing
h_{a,b}(v) = ⌊(a·v + b) / w⌋
a = (a1, …, ad), i.i.d. from a p-stable distribution; v: features vector; b: random phase in [0, w]; w: discretization step
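A sketch of one such hash function, with Gaussian entries for the L2 case (the choice of w and the example vectors are arbitrary):

```python
import math
import random

def make_l2_hash(d, w, seed=0):
    """h_{a,b}(v) = floor((a . v + b) / w): a has d i.i.d. Gaussian entries
    (the 2-stable distribution), b is a random phase in [0, w]."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = rng.uniform(0.0, w)
    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
    return h

h = make_l2_hash(d=3, w=100.0)
v = [34.0, 82.0, 21.0]   # the slide's example features vector
u = [34.5, 82.0, 21.0]   # a nearby point lands in the same or an adjacent bin
```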
Generalization: p-stable distributions
• Lp, 0 < p ≤ 2: generalized central limit theorem → a p-stable distribution (Cauchy for L1)
• L2: central limit theorem → the Gaussian (normal) distribution
p-stable summary
• Works for r-nearest neighbor; generalizes to 0 < p ≤ 2
• Improves query time: from O(d·n^(1/(1+ε)) · log n) to O(d·n^(1/(1+ε)²) · log n)
(latest results reported by email by Alexander Andoni)
Parameters selection (for Euclidean space)
• Target: ≥ 90% success probability with the best query-time performance
• A single projection hits an r-nearest neighbor with Pr = p1
• k projections hit it with Pr = p1^k
• All L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure a collision with probability ≥ 1 − δ (e.g. 90%):
  1 − (1 − p1^k)^L ≥ 1 − δ,  i.e.  L ≥ log(δ) / log(1 − p1^k)
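The resulting rule for choosing L can be computed directly (a sketch; the 90% target corresponds to δ = 0.1):

```python
import math

def tables_needed(p1, k, delta):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta: a true near neighbor
    then collides with the query in at least one of the L tables."""
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

L = tables_needed(p1=0.9, k=10, delta=0.1)  # -> 6 tables suffice here
```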
…Parameters selection
Accept neighbors, reject non-neighbors.
[Plot: running time vs. k, split into candidate-extraction and candidate-verification time; the best k balances the two]
Pros & Cons (from Piotr Indyk's slides)
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dim)
• Requires the radius r to be fixed in advance
Conclusion
• But in the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test it over your own data (C code, under Red Hat Linux)
LSH – Applications
• Searching video clips in databases ("Hierarchical, Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short, whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash-function construction and parameter tuning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, and T. Darrell)
• Finding sensitive hash functions
Mean Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, and P. Meer)
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, and T. Darrell)
Given an image x, what are the parameters θ in this image, i.e. the angles of the joints, the orientation of the body, etc.?
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: d_x
• Distance metric in angle space: d_θ(θ¹, θ²) = Σ_{i=1..m} (1 − cos(θ¹ᵢ − θ²ᵢ))
Example based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
Input: query → find KNN in the database of examples → output: average angles of the KNN
The algorithm flow
Input query → features extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms.
[Figure: edge histograms over image sub-windows A, B at several scales]
Feature Extraction → PSH → LWR
PSH: the basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ).
We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space
Insight: manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: corresponding neighborhoods of q in the parameter space (angles) and the feature space]
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ.
The hash functions are applied in the feature space, but the KNN are valid in the angle space.
PSH as a classification problem
1. Label pairs of examples with similar angles
2. Define hash functions h on the feature space
3. Predict the labeling of similar/non-similar examples by using h
4. Compare the labelings: if the labeling by h is good, accept h; else change h
Labels (r = 0.25):
A pair of examples (x_i, x_j) is labeled
y(i, j) = +1 if d_θ(θ_i, θ_j) ≤ r
y(i, j) = −1 if d_θ(θ_i, θ_j) ≥ (1 + ε)·r
[Figure: example pairs labeled +1, +1, −1, −1]
A binary hash function on features:
h_T(x) = +1 if x ≥ T, −1 otherwise
Predict the labels:
ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the best threshold T that predicts the true labeling within the probability constraints: h_T will place both examples of a pair in the same bin, or separate them.
Local Weighted Regression (LWR)
• Given a query image, PSH returns its KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:
  θ(x₀) = argmin Σ_{x_i ∈ N(x₀)} d_θ(g(x_i), θ_i) · K(d_x(x_i, x₀)),  with K a distance-based weight kernel
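A zeroth-order sketch of this idea: average the neighbors' known parameters with kernel weights derived from their feature-space distances. This is a simplification of the paper's LWR; the names and the Gaussian kernel are our choices, and naive averaging ignores angle wrap-around:

```python
import math

def weighted_param_estimate(neighbor_params, query_dists, bandwidth):
    """Kernel-weighted average of the KNN's known parameter vectors:
    neighbors nearer to the query in feature space get larger weights."""
    weights = [math.exp(-(d / bandwidth) ** 2) for d in query_dists]
    total = sum(weights)
    m = len(neighbor_params[0])
    return [sum(w * th[j] for w, th in zip(weights, neighbor_params)) / total
            for j in range(m)]

# Two KNN poses (angles in degrees) at feature distances 1 and 3 from the query:
est = weighted_param_estimate([(10.0, 20.0), (30.0, 40.0)], [1.0, 3.0], bandwidth=2.0)
# The closer neighbor dominates, so the estimate lies nearer (10, 20).
```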
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples; PSH searched only 3.4% of the data per query
• Without selection, 40 bits and 1,000 hash tables would have been needed
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
- "Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola and T. Darrell
  - Finding sensitive hash functions
- "Mean Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni and P. Meer
  - Tuning LSH parameters
  - The LSH data structure is used for algorithm speedups
Fast Pose Estimation with Parameter Sensitive Hashing
G. Shakhnarovich, P. Viola and T. Darrell

The Problem
Given an image x, what are the parameters θ in this image, i.e. the angles of the joints, the orientation of the body, etc.?
Ingredients
- Input: query image with unknown angles (parameters)
- Database of human poses with known angles
- Image feature extractor - edge detector
- Distance metric in feature space: d_x
- Distance metric in angle space:
  d_θ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))
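The angle-space metric (one reading of the garbled slide formula: a sum of 1 − cos over the m joint-angle differences) can be sketched as follows; the function name is illustrative, not from the paper:

```python
import math

def angle_distance(theta1, theta2):
    # Sum over joints of 1 - cos(angle difference): zero for identical
    # poses, and insensitive to full 2*pi wrap-arounds of any joint.
    return sum(1 - math.cos(a - b) for a, b in zip(theta1, theta2))
```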
Example-based learning
- Construct a database of example images with their known angles
- Given a query image, run your favorite feature extractor
- Compute the KNN from the database
- Use these KNNs to compute the average angles of the query
Input: query → find KNN in the database of examples → output: average angles of the KNN
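A toy sketch of this example-based pipeline, assuming features are plain vectors compared with L2; brute-force search stands in for PSH, and all names are hypothetical:

```python
import math

def knn_average_angles(query_feat, db_feats, db_angles, k=3):
    # Rank database examples by L2 distance in feature space,
    # then average the stored angle vectors of the k nearest.
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    ranked = sorted(range(len(db_feats)), key=lambda i: dist(query_feat, db_feats[i]))
    nn = ranked[:k]
    m = len(db_angles[0])
    return [sum(db_angles[i][j] for i in nn) / k for j in range(m)]
```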
The algorithm flow:
Input query → feature extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms, computed over sub-windows (A, B) of the image at several scales.
PSH: The basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ).
We want similarity to be measured in the angle space, whereas LSH works on the feature space.
- Assumption: the feature space is closely related to the parameter space
Insight: Manifolds
- A manifold is a space in which every point has a neighborhood resembling Euclidean space
- But the global structure may be complicated: curved
- For example: lines are 1D manifolds, planes are 2D manifolds, etc.
Figure: parameter space (angles) vs. feature space, with the query q mapped between them.
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ.
The hash functions are applied in feature space, but the KNN are valid in angle space.
PSH as a classification problem
- Label pairs of examples with similar angles
- Define hash functions h on the feature space
- Predict the labeling of similar/non-similar examples by using h
- Compare the labelings
- If the labeling by h is good, accept h; else change h

Labels (r = 0.25):
A pair of examples (x_i, x_j) is labeled
  y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
  y_ij = −1 if d_θ(θ_i, θ_j) ≥ (1 + ε) r

A binary hash function on a feature value:
  h_T(x) = +1 if x ≥ T, −1 otherwise

Predict the labels:
  ŷ_ij^h = +1 if h_T(x_i) = h_T(x_j), −1 otherwise

Find the best T that predicts the true labeling within the probability constraints:
h_T will place both examples in the same bin, or separate them.
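The pair labeling and the threshold ("decision stump") hash can be sketched as follows, under the assumed reading that pairs falling in the middle band are ignored during training:

```python
def pair_label(d_theta, r, eps=1.0):
    # True label: +1 if the angles are close (d <= r), -1 if far
    # (d >= (1+eps)*r); None for the ignored middle band.
    if d_theta <= r:
        return +1
    if d_theta >= (1 + eps) * r:
        return -1
    return None

def stump(x, T):
    # Binary hash h_T on a single feature value: +1 if x >= T else -1.
    return 1 if x >= T else -1

def predicted_label(xi, xj, T):
    # +1 when the hash puts both examples in the same bucket.
    return 1 if stump(xi, T) == stump(xj, T) else -1
```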
Local Weighted Regression (LWR)
- Given a query image, PSH returns KNNs
- LWR uses the KNN to compute a weighted average of the estimated angles of the query:
  θ_0 = argmin_β Σ_{x_i ∈ N(x)} d_θ(g(x_i; β), θ_i) · K(d_x(x_i, x))
  where the kernel K turns feature-space distance into a weight.
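A simplified, zeroth-order stand-in for LWR: a kernel-weighted average of the neighbors' angles, with weights decaying in feature-space distance. The Gaussian kernel and the function names are assumptions; the paper fits a local regression model rather than a plain weighted mean:

```python
import math

def lwr_angles(query_feat, nn_feats, nn_angles, h=1.0):
    # Weighted average of neighbor angle vectors; weights decay with
    # feature-space distance via a Gaussian kernel of bandwidth h.
    def d2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    w = [math.exp(-d2(query_feat, f) / (2 * h * h)) for f in nn_feats]
    s = sum(w)
    m = len(nn_angles[0])
    return [sum(wi * ang[j] for wi, ang in zip(w, nn_angles)) / s for j in range(m)]
```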
Results
Synthetic data were generated:
- 13 angles: 1 for rotation of the torso, 12 for joints
- 150,000 images
- Nuisance parameters added: clothing, illumination, face expression
- 1,775,000 example pairs
- Selected 137 out of 5,123 meaningful features (how?)
- 18-bit hash functions (k), 150 hash tables (L)
- Test on 1,000 synthetic examples; PSH searched only 3.4% of the data per query
- Without feature selection, 40 bits and 1,000 hash tables would be needed
Recall: P1 is the probability of a positive hash, P2 the probability of a bad hash, and B the maximum number of points in a bucket.
Results - real data
- 800 images
- Processed by a segmentation algorithm
- 1.3% of the data were searched
Some interesting mismatches were observed.
Fast pose estimation - summary
- A fast way to compute the angles of a human body figure
- Moving from one representation space to another
- Training a sensitive hash function
- KNN smart averaging

Food for Thought
- The basic assumption may be problematic (distance metric, representations)
- The training set should be dense
- Texture and clutter
- In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
- Given: n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn
- Goal: given a query q, preprocess the points in P to find a point pi whose sphere covers the query q
Courtesy of Mohamad Hegaze.
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni and P. Meer

Motivation
- Clustering high-dimensional data by using local density measurements (e.g. in feature space)
- Statistical curse of dimensionality: sparseness of the data
- Computational curse of dimensionality: expensive range queries
- LSH parameters should be adjusted for optimal performance
Outline
- Mean-shift in a nutshell + examples
Our scope:
- Mean-shift in high dimensions - using LSH
- Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
Each point is shifted toward the local mean of its neighborhood, controlled by a bandwidth.

KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region: high density - small bandwidth; low density - large bandwidth.
Based on the kth nearest neighbor of the point, the bandwidth is h_i = ||x_i − x_{i,k}||, the distance to that neighbor.
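The adaptive-bandwidth rule can be sketched with a brute-force k-th-nearest-neighbor distance (O(n²) here; the point of the paper is to replace this search with LSH):

```python
import math

def adaptive_bandwidths(points, k):
    # Per-point bandwidth = distance to the k-th nearest neighbor:
    # dense regions get small bandwidths, sparse regions large ones.
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    out = []
    for p in points:
        ds = sorted(dist(p, q) for q in points if q is not p)
        out.append(ds[k - 1])
    return out
```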
Adaptive mean-shift vs. non-adaptive (figure)

Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering
Image segmentation algorithm (cont.)
Figures: original → filtered → segmented; mean-shift trajectories in 3D.
Filtering: each pixel takes the value of the nearest mode.
"Mean-shift: A Robust Approach Towards Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Filtering examples: original squirrel → filtered; original baboon → filtered.
Segmentation examples.
"Mean-shift: A Robust Approach Towards Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
- Computational curse of dimensionality: expensive range queries, implemented with LSH
- Statistical curse of dimensionality: sparseness of the data, handled by the variable bandwidth
LSH-based data structure
- Choose L random partitions; each partition includes K pairs (d_k, v_k)
- For each point, test whether x_{d_k} ≤ v_k for each of the K pairs; the K boolean results determine the point's cell
- This partitions the data into cells
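A sketch of the cell computation: the K (dimension, cut-value) tests packed into a K-bit key (names are illustrative):

```python
def cell_key(point, partition):
    # partition: K pairs (dim, cut_value). The K boolean tests
    # point[dim] <= cut form a K-bit key identifying the point's cell.
    bits = 0
    for dim, cut in partition:
        bits = (bits << 1) | (1 if point[dim] <= cut else 0)
    return bits
```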
Choosing the optimal K and L
- For a query q, compute the smallest number of distances to points in its buckets
- Large K → a smaller number of points in each cell
- If L is too small, points might be missed; but if L is too big, extra points might be included
- As L increases, the union of cells grows, but each intersection cell shrinks
- K determines the resolution of the data structure
Choosing optimal K and L
- Determine accurately the KNN (the bandwidth distance) for m randomly selected data points
- Choose an error threshold ε for the approximate distance
- The optimal K and L should satisfy: approximate distance ≤ (1 + ε) × true distance
- For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K)
- Minimize the running time t(K, L(K)) to find the minimum
Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)]
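The selection loop can be sketched abstractly, with the measured error(K, L) and time(K, L) supplied as callables; this is a hypothetical harness, not the paper's code:

```python
def choose_k_l(error, time, k_values, l_values, eps=0.05):
    # For each K, find the minimal L whose approximation error on the
    # sampled queries stays below eps, then keep the (K, L) with the
    # lowest measured running time. l_values is assumed sorted ascending.
    best = None
    for k in k_values:
        for l in l_values:
            if error(k, l) <= eps:      # minimal feasible L for this K
                cand = (time(k, l), k, l)
                if best is None or cand < best:
                    best = cand
                break
    return (best[1], best[2]) if best else None
```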
Data-driven partitions
- In the original LSH, cut values are drawn uniformly at random over the range of the data
- Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
- The bucket distribution then follows the data (uniform vs. data-driven cut points)
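A data-driven cut is then just a coordinate of a randomly chosen data point, e.g.:

```python
import random

def data_driven_cut(points, dim, rng=random):
    # Pick the cut value from the data itself: a random point's coordinate.
    # Buckets then follow the empirical distribution instead of being
    # uniform over the data range (fewer near-empty cells).
    return rng.choice(points)[dim]
```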
Additional speedup
Assume that all points in a cell C will converge to the same mode (C acts like a type of aggregate); the mean-shift iterations can then be shared by the whole cell rather than run per point.
Speedup results
65,536 points, 1,638 points sampled, k = 100.

Food for thought
Low dimension vs. high dimension.

A thought for food…
- Choose K, L by sample learning, or take the traditional values
- Can one estimate K, L without sampling?
- Does it help to know the data dimensionality or the data manifold?
- Intuitively, the dimensionality implies the number of hash functions needed
- The catch: efficient dimensionality learning requires KNN
15:30 - cookies…
Summary
- LSH suggests a compromise on accuracy for a gain in complexity
- Applications that involve massive data in high dimension require LSH's fast performance
- Extension of LSH to different spaces (PSH)
- Learning the LSH parameters and hash functions for different applications

Conclusion
- ...but at the end, everything depends on your data set
- Try it at home:
  - Visit http://web.mit.edu/andoni/www/LSH/index.html
  - Email Alex Andoni (andoni@mit.edu)
  - Test over your own data
  (C code, under Red Hat Linux)
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Curse of dimensionality: some intuition
Splitting each dimension in two gives 2, 2^2, 2^3, …, 2^d cells: the number of cells grows exponentially with the dimension d.
Outline
- Problem definition and flavors
- Algorithms overview - low dimensions
- Curse of dimensionality (d > 10..20)
- Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
- l2 extension
- Applications (Dan)
Preview
- General solution - locality sensitive hashing
- Implementation for Hamming space
- Generalization to l1 & l2
Hash function
A hash function maps a Data_Item via a Key to a Bin/Bucket.
Example: h(X) = X modulo 3, where X is a number in the range 0..n; the result (0..2) is the storage address in the data structure.
Usually we would like related data items to be stored in the same bin.
Recall: r-Nearest Neighbor
dist(q, p1) ≤ r
dist(q, p2) ≥ (1 + ε) r, with r2 = (1 + ε) r1

Locality sensitive hashing
A family is (r, ε, P1, P2)-sensitive if:
- Pr[I(p) = I(q)] is "high" (≥ P1) when p is "close" to q (within r1)
- Pr[I(p) = I(q)] is "low" (≤ P2) when p is "far" from q (beyond r2 = (1 + ε) r1)
Hamming Space
- Hamming space = the 2^N binary strings of length N
- Hamming distance = the number of differing bits (a.k.a. signal distance; Richard Hamming)
Example: 010100001111 vs. 010010000011 → distance = 4
- Hamming distance = SUM(X1 XOR X2)
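Hamming distance as a one-liner over bit strings, matching the slide's example:

```python
def hamming(x, y):
    # Hamming distance between equal-length bit strings: the count of
    # positions where the bits differ, i.e. SUM(x XOR y).
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y))
```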
L1 to Hamming Space Embedding
Each coordinate v (an integer in 0..C) is written in unary: v ones followed by C − v zeros.
Example: p = (8, 2) with C = 11 → 11111111000 11000000000.
The embedded dimension is d' = C·d, and L1 distance becomes Hamming distance.
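The unary embedding can be sketched as follows, assuming integer coordinates in [0, C]:

```python
def unary_embed(p, C):
    # Each coordinate v becomes v ones followed by C - v zeros, so the
    # L1 distance between points equals the Hamming distance between codes.
    return "".join("1" * v + "0" * (C - v) for v in p)
```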
Hash function
G_j(p) = p|I_j: the bits of p ∈ H^d' sampled at the index set I_j, for j = 1..L, with k sampled digits (e.g. k = 3).
Store p into the bucket keyed by p|I_j; there are 2^k buckets.
Example key: 101.
Construction
Insert every point p into its bucket in each of the tables 1, 2, …, L.
Query
Look up q in tables 1, 2, …, L and take the union of the matching buckets as candidates.
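Construction and query together, as a minimal bit-sampling LSH sketch; the fixed RNG seed and string keys are implementation choices, not part of the scheme:

```python
import random

def build_tables(points, k, L, dim, rng=random.Random(0)):
    # L hash tables; table j keys each point by the k sampled bit
    # positions I_j (G_j(p) = p restricted to I_j).
    index_sets = [rng.sample(range(dim), k) for _ in range(L)]
    tables = [{} for _ in range(L)]
    for p in points:
        for I, tab in zip(index_sets, tables):
            key = "".join(p[i] for i in I)
            tab.setdefault(key, []).append(p)
    return index_sets, tables

def query(q, index_sets, tables):
    # Union of the buckets q falls into across all L tables.
    cand = set()
    for I, tab in zip(index_sets, tables):
        cand.update(tab.get("".join(q[i] for i in I), []))
    return cand
```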
Alternative intuition: random projections
Each sampled bit is a random axis-parallel "projection" of the unary embedding: with k = 3 bits, a point such as p = (8, 2) (C = 11, d' = C·d) is mapped to one of the 2^3 = 8 buckets 000, 001, …, 111 (e.g. 101).
k samplings, repeated L times.

Secondary hashing
Supports volume tuning: dataset size vs. storage volume. The 2^k primary buckets are hashed again (simple hashing) into M buckets of size B, with M·B = αn, α = 2.
The above hashing is locality-sensitive
Probability(p, q in the same bucket) = (1 − Distance(p, q)/d')^k
The probability falls off with distance, more sharply as k grows (plots for k = 1 and k = 2).
Adopted from Piotr Indyk's slides.
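The collision probability is then a one-line function of the Hamming distance, d' and k:

```python
def collision_prob(dist, d_prime, k):
    # Pr[p and q share a bucket] when k bits are sampled uniformly from a
    # d'-bit embedding: each sampled bit agrees with probability 1 - dist/d'.
    return (1 - dist / d_prime) ** k
```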
Preview
- General solution - locality sensitive hashing
- Implementation for Hamming space
- Generalization to l2

Direct L2 solution
- New hashing function
- Still based on sampling
- Using a mathematical trick: a p-stable distribution for the Lp distance - the Gaussian distribution for L2
Central limit theorem
A weighted sum of Gaussians is a Gaussian: for real numbers v1..vn and independent identically distributed (i.i.d.) Gaussians X1..Xn, v1·X1 + v2·X2 + … + vn·Xn is again a (scaled) Gaussian.

Dot product → Norm:
Σ v_i X_i ∼ ||v||_2 · X, where X is a standard Gaussian.

Norm → Distance: for feature vectors u and v,
Σ u_i X_i − Σ v_i X_i = Σ (u_i − v_i) X_i ∼ ||u − v||_2 · X,
so the difference of the two dot products is distributed according to the distance between the feature vectors.
The full Hashing
h_{a,b}(v) = ⌊(a·v + b)/w⌋
- v: the features vector (e.g. [34 82 21])
- a: d random numbers, i.i.d. from a p-stable distribution
- b: a random phase in [0, w]
- w: the discretization step
Example: with a·v = 7944, b = 34 and w = 100, the value lands in the cell [7900, 8000).
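A sketch of the L2 (2-stable) hash family, using the standard Gaussian for a; one function corresponds to one (a, b) draw, and the seeded RNG is only for reproducibility:

```python
import math
import random

def make_l2_hash(d, w, rng=random.Random(0)):
    # h_{a,b}(v) = floor((a . v + b) / w): a has d i.i.d. N(0,1) entries
    # (a 2-stable distribution), b is a random phase in [0, w),
    # and w is the bucket width.
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = rng.uniform(0.0, w)
    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
    return h
```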
Generalization: P-Stable distribution
- L2: Central Limit Theorem → Gaussian (normal) distribution
- Lp, 0 < p ≤ 2: Generalized Central Limit Theorem → p-stable distribution (Cauchy for L1)

P-Stable summary
- Works for any 0 < p ≤ 2
- Improves the r-nearest-neighbor query time: O(d·n^{1/(1+ε)}·log n) → O(d·n^{1/(1+ε)²}·log n)
Latest results reported by email by Alexander Andoni.
Parameters selection
For Euclidean space, choose the parameters for 90% success probability and the best query-time performance:
- A single projection hits an (r, ε)-nearest neighbor with Pr = p1
- k projections hit it with Pr = p1^k
- L hashings fail to collide with Pr = (1 − p1^k)^L
- To ensure a collision with probability ≥ 1 − δ (e.g. 90%): 1 − (1 − p1^k)^L ≥ 1 − δ, i.e. L ≥ log(δ) / log(1 − p1^k)
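Solving the collision constraint for the number of tables L gives a small helper (names are illustrative):

```python
import math

def tables_needed(p1, k, delta=0.1):
    # Smallest L with 1 - (1 - p1**k)**L >= 1 - delta: collision with a
    # true near neighbor is then ensured with probability at least 1 - delta.
    return math.ceil(math.log(delta) / math.log(1 - p1 ** k))
```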
…Parameters selection
- Reject non-neighbors, accept neighbors
- Trade-off in k: candidate-extraction time vs. candidate-verification time
- Better query time than spatial data structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
The k sampled bits define a single hash table; the sampling is repeated L times, giving L independent tables.
Secondary hashing
The 2^k buckets of each table are mapped by a simple secondary hash into M buckets of size B, supporting volume tuning (dataset size vs. storage volume) with M·B = α·n, α = 2.
The above hashing is locality-sensitive
• Probability(p, q in same bucket) = (1 − Distance(q, p)/d')^k
The curves of Pr vs. Distance(q, pi) for k = 1 and k = 2 show that a larger k makes the collision probability fall off faster with distance. (Adapted from Piotr Indyk's slides.)
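The collision probability above is easy to evaluate directly; this sketch reproduces the k = 1 vs. k = 2 behavior:

```python
def collision_prob(distance, d_prime, k):
    """Pr[p and q share a bucket] = (1 - distance/d')**k for bit sampling."""
    return (1.0 - distance / d_prime) ** k

# At half the maximal distance, larger k sharpens the drop (the k=1 vs k=2 curves).
print(collision_prob(11, 22, 1))  # -> 0.5
print(collision_prob(11, 22, 2))  # -> 0.25
```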
Preview
• General solution – Locality Sensitive Hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• New hashing function
• Still based on sampling
• Uses a mathematical trick: a p-stable distribution for Lp distance; the Gaussian distribution for L2 distance
Central limit theorem
A weighted sum of Gaussians is again a (weighted) Gaussian:
v1·X1 + v2·X2 + … + vn·Xn,
where v1, …, vn are real numbers and X1, …, Xn are independent, identically distributed (i.i.d.) standard Gaussians. Consequently the dot product distributes as the norm:
Σi vi·Xi ~ ‖v‖2 · X
Norm → Distance
For two features vectors u and v:
⟨u, X⟩ − ⟨v, X⟩ = Σi (ui − vi)·Xi ~ ‖u − v‖2 · X
so the difference of the two dot products (two random projections) is distributed as the L2 distance between the vectors times a Gaussian.
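The 2-stability property above can be checked empirically: the spread of the projected gap ⟨u, X⟩ − ⟨v, X⟩ approaches ‖u − v‖2 (a sketch; vector values are illustrative only):

```python
import math
import random

def projected_gap_std(u, v, trials=20000, seed=1):
    """Empirical std of <u, X> - <v, X> with X i.i.d. standard normal.
    By 2-stability of the Gaussian it should approach ||u - v||_2."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(len(u))]
        samples.append(sum((ui - vi) * xi for ui, vi, xi in zip(u, v, x)))
    mean = sum(samples) / trials
    return math.sqrt(sum((s - mean) ** 2 for s in samples) / trials)

u, v = [34.0, 82.0, 21.0], [30.0, 85.0, 21.0]
true_dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))  # 5.0
print(true_dist, projected_gap_std(u, v))  # empirical std is close to 5.0
```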
The full Hashing
h_{a,b}(v) = ⌊(a·v + b) / w⌋
• v – the features vector (d-dimensional, e.g. [34, 82, 21, …])
• a – d random numbers, drawn i.i.d. from a p-stable distribution
• b – a random phase in [0, w]
• w – the discretization step
Example: a·v = 7944; adding the phase b = 34 and discretizing with w = 100 places v in bucket ⌊7978/100⌋ = 79.
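A minimal sketch of one such hash function, using the Gaussian (2-stable, for L2) as the p-stable distribution (function names are my own):

```python
import math
import random

def make_hash(d, w, seed=0):
    """One h_{a,b}(v) = floor((a . v + b) / w): a has d i.i.d. N(0,1) entries,
    b is a random phase uniform in [0, w], w is the discretization step."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = rng.uniform(0.0, w)
    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
    return h

h = make_hash(d=3, w=100.0)
v = [34.0, 82.0, 21.0]
# Nearby vectors usually land in the same bucket; far ones usually don't.
print(h(v), h([35.0, 82.0, 21.0]))
```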
Generalization: p-stable distributions
• Lp, 0 < p ≤ 2: by the Generalized Central Limit Theorem, use a p-stable distribution (e.g. the Cauchy distribution for L1)
• L2: by the Central Limit Theorem, use the Gaussian (normal) distribution
P-stable summary
• Works for r-Nearest Neighbor; generalizes to 0 < p ≤ 2
• Improves query time: from O(d·n^(1/(1+ε))·log n) to O(d·n^(1/(1+ε)²)·log n)
(Latest results reported by email by Alexander Andoni.)
Parameters selection
For Euclidean space, choose k and L for 90% success probability and the best query-time performance:
• A single projection hits an ε-Nearest Neighbor with Pr = p1
• k projections hit an ε-Nearest Neighbor with Pr = p1^k
• All L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure a collision with probability ≥ 1 − δ (e.g. 90%), require 1 − (1 − p1^k)^L ≥ 1 − δ, i.e. L ≥ log(δ) / log(1 − p1^k)
This rejects non-neighbors while accepting neighbors.
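The bound above yields the number of tables directly (the p1 value in the example is illustrative only):

```python
import math

def tables_needed(p1, k, delta=0.1):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta,
    i.e. L >= log(delta) / log(1 - p1**k)."""
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

# E.g. if a single projection collides with a near neighbor with p1 = 0.9,
# then with k = 18 bits we need this many tables for 90% success:
print(tables_needed(p1=0.9, k=18, delta=0.1))  # -> 15
```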
…Parameters selection
Query time splits into candidates extraction plus candidates verification; as k grows, extraction returns fewer candidates but more tables are needed, so the total time is minimized at an intermediate k.
Pros & Cons (from Piotr Indyk's slides)
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code, under Red Hat Linux)
LSH – Applications
• Searching video clips in databases ("Hierarchical, Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever k-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash-function construction and parameter tuning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, T. Darrell)
• Finding sensitive hash functions
Mean Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, P. Meer)
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups
The Problem (Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola, T. Darrell)
Given an image x, what are the parameters θ in this image, i.e. the angles of the joints, the orientation of the body, etc.?
Ingredients
• Input: a query image with unknown angles (parameters)
• A database of human poses with known angles
• An image feature extractor – an edge detector
• A distance metric in feature space: d_x
• A distance metric in angle space: d_θ(θ1, θ2) = Σ_{i=1}^{m} (1 − cos(θ1,i − θ2,i))
Example-based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
Input: query → find the KNN in the database of examples → output: average angles of the KNN
The algorithm flow
Input query → features extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms, computed over image regions (figure: regions A and B with their histograms).
[Pipeline: Feature Extraction → PSH → LWR]
PSH: the basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ). We want similarity to be measured in the angle space, whereas LSH works in the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
(Figure: a query q shown both in the parameter space (angles) and in the feature space. Is this magic?)
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ. The hash functions are applied in the feature space, but the KNN are valid in the angle space.
• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples using h
• Compare the labelings: if the labeling by h is good, accept h; else change h
PSH as a classification problem
A pair of examples (xi, θi), (xj, θj) is labeled (with r = 0.25):
  y_ij = +1 if d_θ(θi, θj) ≤ r
  y_ij = −1 if d_θ(θi, θj) ≥ (1 + ε)·r
A binary hash function on the features:
  h_T(x) = +1 if x ≥ T, −1 otherwise
Predict the labels:
  ŷ_h(xi, xj) = +1 if h_T(xi) = h_T(xj), −1 otherwise
Find the best T that predicts the true labeling within the probability constraints: h_T will place both examples in the same bin, or separate them.
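The threshold selection described above can be sketched on a toy 1-D feature (data and candidate thresholds are invented for illustration):

```python
def best_threshold(pairs, labels, candidates):
    """Pick the threshold T whose hash h_T(x) = +1 if x >= T else -1
    best predicts the pair labels: predicted +1 iff h_T(xi) == h_T(xj)."""
    def accuracy(T):
        correct = 0
        for (xi, xj), y in zip(pairs, labels):
            same = ((xi >= T) == (xj >= T))
            correct += (1 if same else -1) == y
        return correct / len(labels)
    return max(candidates, key=accuracy)

# Toy 1-D feature: pairs with close parameters labeled +1, far pairs -1.
pairs  = [(0.1, 0.2), (0.8, 0.9), (0.1, 0.9), (0.2, 0.8)]
labels = [+1, +1, -1, -1]
T = best_threshold(pairs, labels, candidates=[0.05, 0.5, 0.95])
print(T)  # -> 0.5, the cut that separates the far pairs but not the close ones
```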
Local Weighted Regression (LWR)
• Given a query image, PSH returns its KNNs
• LWR uses the KNNs to compute a weighted average of the estimated angles of the query:
  θ(x0) = argmin_θ Σ_{xi ∈ N(x0)} K(d_x(xi, x0)) · d_θ(θi, θ)
where K is a kernel that weights each neighbor by its feature-space distance to the query.
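A simplified sketch of the weighted objective above, using a Gaussian kernel and plain linear averaging of angles (a hypothetical helper; the real objective handles the angular metric d_θ, which linear averaging ignores):

```python
import math

def lwr_angle_estimate(neighbors, x0, bandwidth=1.0):
    """Kernel-weighted average of neighbor angles: a closed-form stand-in
    for the weighted argmin objective, with a Gaussian kernel on d_x."""
    num = den = 0.0
    for x_feat, theta in neighbors:
        w = math.exp(-((x_feat - x0) ** 2) / (2 * bandwidth ** 2))
        num += w * theta
        den += w
    return num / den

# Closer neighbors (in feature space) pull the estimate harder.
nbrs = [(0.0, 10.0), (0.1, 12.0), (2.0, 40.0)]
est = lwr_angle_estimate(nbrs, x0=0.0, bandwidth=0.5)
print(est)  # between 10 and 12: the far neighbor barely moves the estimate
```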
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (L)
• Tested on 1,000 synthetic examples; PSH searched only 3.4% of the data per query
• Without feature selection, 40 bits and 1,000 hash tables were needed
Recall: P1 is the probability of a positive hash, P2 the probability of a bad hash, and B the maximal number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 13% of the data were searched
(Some interesting mismatches occur.)
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN with smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn
• Goal: given a query q, preprocess the points in P so as to find a point pi whose sphere 'covers' the query q
(Courtesy of Mohamad Hegaze.)
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, P. Meer)
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
Each point is iteratively moved to the (weighted) mean of the points inside its bandwidth window.
[Section tracker: Mean-shift | LSH: optimal k,l | LSH: data partition | LSH data struct]
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth. Based on the kth nearest neighbor of the point, the bandwidth is taken as the distance from the point to its kth nearest neighbor.
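A 1-D sketch of the adaptive bandwidth and one mean-shift step (flat kernel; data values invented for illustration):

```python
def kth_nn_bandwidth(points, x, k):
    """Adaptive bandwidth: distance from x to its k-th nearest neighbor,
    so dense regions get small windows and sparse regions large ones."""
    dists = sorted(abs(p - x) for p in points)
    return dists[k]  # dists[0] is x itself when x is in the data set

def mean_shift_step(points, x, h):
    """One mean-shift step with a flat kernel: move to the mean of the
    points inside the window of radius h."""
    window = [p for p in points if abs(p - x) <= h]
    return sum(window) / len(window)

pts = [0.0, 1.0, 2.0, 10.0, 50.0]
h = kth_nn_bandwidth(pts, 1.0, k=2)   # window radius 1.0 around the dense cluster
print(mean_shift_step(pts, 1.0, h))   # -> 1.0, the mean of the window [0, 1, 2]
```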
Adaptive mean-shift vs. non-adaptive
(Figure comparing the two.)
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths: hs (spatial) and hr (color)
3. Apply filtering: each pixel takes the value of its nearest mode
(Figures: original → filtered → segmented images and mean-shift trajectories; filtering and segmentation examples on the squirrel and baboon images. From "Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02.)
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (dk, vk)
• For each point x, check the K inequalities x_{dk} < vk; the resulting answers partition the data into cells
Choosing the optimal K and L
• For a query q, we want to compute distances only to the points in its buckets, and to as few of them as possible
• A large K gives a smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, extra points might be included
• As L increases, the recall of true neighbors increases but the work per query grows; together K and L determine the resolution of the data structure
Choosing optimal K and L (procedure)
• Determine accurately the KNN for m randomly selected data points, giving the true bandwidth distances and the approximate distances returned by the LSH structure
• Choose an error threshold ε; the optimal K and L should keep the approximation error within ε
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K)
• Minimize the running time t(K, L(K))
(Figure: approximation error for K, L; L(K) for ε = 0.05; the running time t[K, L(K)] and its minimum.)
Data-driven partitions
• In the original LSH, the cut values are chosen at random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
(Figure: bucket distribution, uniform cuts vs. data-driven points.)
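A sketch of the two cut-selection strategies on a 1-D coordinate (function names and data are my own; the point is that data-driven cuts concentrate where the data does):

```python
import random

def random_cuts(data, K, seed=0):
    """Original LSH: cut values uniform over the range of the data."""
    rng = random.Random(seed)
    lo, hi = min(data), max(data)
    return [rng.uniform(lo, hi) for _ in range(K)]

def data_driven_cuts(data, K, seed=0):
    """Suggested variant: each cut value is a coordinate of a randomly
    selected data point, so cuts land where the data mass is."""
    rng = random.Random(seed)
    return [rng.choice(data) for _ in range(K)]

data = [0.0, 0.1, 0.2, 0.3, 9.9]  # skewed: most mass near 0
print(sorted(data_driven_cuts(data, K=3)))  # all cuts are actual data values
```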
Additional speedup
Assume that all the points in a cell C will converge to the same mode (C acts like a type of aggregate), so the mode need only be computed once per cell.
Speedup results
65,536 points; 1,638 points sampled; k = 100.
Food for thought
• Low dimension vs. high dimension
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30: cookies…
Summary
• LSH trades accuracy for a gain in complexity
• Applications that involve massive data in high dimensions require LSH's fast performance
• LSH extends to different spaces (PSH)
• The LSH parameters and hash functions can be learned for different applications
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree – Pitfall 1
- Slide 16
- Quadtree – Pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection …
- … Parameters selection
- Pros & Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x, what are the parameters θ in this image (i.e., angles of joints, orientation of the body, etc.)?
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results – real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for food…
- Summary
- Slide 110
- Thanks
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Filtering examples
[Figures: original vs. filtered squirrel; original vs. filtered baboon]
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Segmentation examples
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH
• Statistical curse of dimensionality: sparseness of the data, handled with variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition consists of K pairs (d_k, v_k), a cut dimension and a cut value
• For each point x we check whether x_{d_k} ≤ v_k for k = 1…K; the resulting K-bit vector selects the point's cell
• This partitions the data into cells
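A minimal sketch of how one such partition maps a point to its cell (function names are illustrative):

```python
import random

def make_partition(dim, K, lo=0.0, hi=1.0):
    """One partition: K (cut-dimension, cut-value) pairs drawn at random."""
    return [(random.randrange(dim), random.uniform(lo, hi)) for _ in range(K)]

def cell_key(x, partition):
    """K-bit key: bit k records whether x[d_k] <= v_k."""
    return tuple(x[d] <= v for d, v in partition)

random.seed(0)
part = make_partition(dim=2, K=4)
# nearby points usually share a cell key; distant points rarely do
key_a = cell_key([0.10, 0.11], part)
key_b = cell_key([0.11, 0.10], part)
```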
Choosing the optimal K and L
• Goal: for a query q, compute the smallest possible number of distances to points in its buckets
• Large K: smaller number of points in a cell
• If L is too small, neighbor points might be missed; but if L is too big, extra points might be included
• As L increases, the union of cells C∪ increases but the intersection C∩ decreases; K determines the resolution of the data structure
Choosing optimal K and L
• Determine accurately the KNN distance (bandwidth) for m randomly-selected data points
• Choose an error threshold ε
• The optimal K and L should satisfy: the approximate distance returned by the LSH structure stays within the error threshold of the true KNN distance
Choosing optimal K and L
• For each K, estimate the error
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
[Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] with its minimum marked]
Data driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[Figure: points-per-bucket distribution for uniform vs. data-driven cut values]
Additional speedup
• Assume that all points in C∩ will converge to the same mode (C∩ is like a type of aggregate)
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
[Figure: low dimension vs. high dimension]
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30: cookies…
Summary
• LSH trades some accuracy for a large gain in complexity (speed)
• Applications that involve massive data in high dimension require the fast performance of LSH
• LSH extends to different spaces (PSH)
• LSH parameters and hash functions can be learned for different applications
Conclusion
• But at the end, everything depends on your data set
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni (andoni@mit.edu)
– Test over your own data
(C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Hash function
Example: h(X) = X modulo 3, where X is a number in the range 0…n; the hash value (0…2) is used as a storage address into the data structure.
Usually we would like related data items to be stored in the same bin.
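A toy illustration of the idea, and of why an arbitrary hash does not keep related items together (values are made up):

```python
def mod_hash(x, n_bins=3):
    """The hash value doubles as the storage address (bin index)."""
    return x % n_bins

bins = {0: [], 1: [], 2: []}
for x in [0, 1, 2, 9, 10, 200]:
    bins[mod_hash(x)].append(x)

# 9 and 10 are close in value yet land in different bins (0 and 1):
# an arbitrary hash function is not locality-sensitive
```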
Recall: r - Nearest Neighbor
[Figure: query q with radii r and (1 + ε)r]
dist(q, p1) ≤ r
dist(q, p2) ≥ (1 + ε)r, where r2 = (1 + ε)r1
Locality sensitive hashing
A hash family is (r1, r2, P1, P2)-sensitive if:
≡ Pr[I(p) = I(q)] is "high" (≥ P1) if p is "close" to q (dist ≤ r1)
≡ Pr[I(p) = I(q)] is "low" (≤ P2) if p is "far" from q (dist ≥ r2 = (1 + ε)r1)
Preview
• General solution – locality sensitive hashing
• Implementation for Hamming space
• Generalization to l1 & l2
Hamming Space
• Hamming space = the 2^N binary strings of length N
• Hamming distance = number of changed digits, a.k.a. signal distance (Richard Hamming)
Example (N = 12):
010100001111
010010000011   Distance = 4
• Hamming distance = SUM(X1 XOR X2)
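A sketch of this distance on integer-encoded bit strings (popcount of the XOR):

```python
def hamming(x1, x2):
    """Number of differing bit positions = popcount(x1 XOR x2)."""
    return bin(x1 ^ x2).count("1")

# the slide's example pair differs in 4 positions
a = 0b010100001111
b = 0b010010000011
assert hamming(a, b) == 4
```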
L1 to Hamming Space Embedding
Embed a point p ∈ {0…C}^d into Hamming space of dimension d' = Cd by writing each coordinate in unary.
Example, C = 11: p = (8, 2) → 11111111000 11000000000
The L1 distance between points then equals the Hamming distance between their codes.
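A minimal sketch of the unary embedding, assuming non-negative integer coordinates bounded by C:

```python
def unary_embed(p, C):
    """Each coordinate v in 0..C becomes v ones followed by C - v zeros,
    so the L1 distance equals the Hamming distance between the codes."""
    return "".join("1" * v + "0" * (C - v) for v in p)

code = unary_embed([8, 2], C=11)
assert code == "1111111100011000000000"   # the slide's example

# L1 distance 3 between (8, 2) and (5, 2) shows up as Hamming distance 3
d_ham = sum(c1 != c2 for c1, c2 in zip(code, unary_embed([5, 2], 11)))
assert d_ham == 3
```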
Hash function
The j-th hash function, for p ∈ H^{d'}: G_j(p) = p|I_j, the bits of p sampled at a random index set I_j (j = 1…L; here k = 3 digits).
Store p into bucket p|I_j, one of 2^k buckets.
Example: p = 1111111100011000000000 → sampled bits 101.
Construction
Insert every point p into its bucket in each of the L hash tables (1, 2, …, L).
Query
Probe the bucket of q in each of the L tables and check the points found there.
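The construction and query steps can be sketched end to end as a toy bit-sampling LSH (names and parameters are illustrative):

```python
import random

def build_tables(points, d, k, L, rng):
    """L tables; table j keys each point by k randomly sampled bit positions I_j."""
    index_sets = [rng.sample(range(d), k) for _ in range(L)]
    tables = [{} for _ in range(L)]
    for p in points:
        for I, table in zip(index_sets, tables):
            key = "".join(p[i] for i in I)          # G_j(p) = p|I_j
            table.setdefault(key, []).append(p)
    return index_sets, tables

def query(q, index_sets, tables):
    """Union of the L buckets that q falls into: the candidate set."""
    cands = set()
    for I, table in zip(index_sets, tables):
        cands.update(table.get("".join(q[i] for i in I), []))
    return cands

rng = random.Random(42)
pts = ["1111111100011000000000", "0000000000110000011111"]
idx, tabs = build_tables(pts, d=22, k=3, L=5, rng=rng)
# a query identical to a stored point collides with it in every table
assert pts[0] in query(pts[0], idx, tabs)
```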
Alternative intuition: random projections
[Figure: the same unary codes (C = 11, coordinates 8 and 2, d' = Cd) viewed geometrically; each sampled bit acts as a random axis-parallel cut of the space]
Alternative intuition: random projections
[Figure: k = 3 sampled bits act as 3 random cuts, mapping each point p to one of 2^3 buckets: 000, 100, 110, 001, 101, 111, …]
k samplings
[Figure: one hash table built from k sampled bits]
Repeating L times
[Figure: the construction repeated for L independent tables]
Secondary hashing
Support volume tuning: dataset size vs. storage volume.
The 2^k sparse buckets (e.g., bucket 011) are mapped by a simple secondary hash into M buckets of size B, with M·B = αn, α = 2.
The above hashing is locality-sensitive
• Probability(p, q in same bucket) = (1 − Distance(p, q)/d')^k
A single sampled bit agrees with probability 1 − Distance(p, q)/d', and all k sampled bits must agree.
[Plots: collision probability vs. distance(q, pi) for k = 1 and k = 2; larger k sharpens the drop-off]
Adopted from Piotr Indyk's slides
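The collision probability can be checked empirically (an illustrative simulation; it samples bit positions with replacement for simplicity, which makes the (1 − dist/d)^k formula exact):

```python
import random

def collision_prob(d, dist, k, trials=20000, rng=None):
    """Empirical Pr[all k sampled bits agree] for two strings at Hamming
    distance `dist` in d dimensions; theory: (1 - dist/d)**k."""
    rng = rng or random.Random(0)
    diff = set(range(dist))     # WLOG the strings differ in the first `dist` bits
    hits = 0
    for _ in range(trials):
        sample = [rng.randrange(d) for _ in range(k)]
        hits += all(i not in diff for i in sample)
    return hits / trials

p_emp = collision_prob(d=100, dist=20, k=2)
p_theory = (1 - 20 / 100) ** 2   # 0.64
```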
Preview
• General solution – locality sensitive hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• New hashing function
• Still based on sampling
• Uses a mathematical trick:
• P-stable distribution for Lp distance; Gaussian distribution for L2 distance
Central limit theorem
(Weighted Gaussians) = Weighted Gaussian
v1, …, vn = real numbers; X1, …, Xn = independent identically distributed (i.i.d.) Gaussians
v1·X1 + v2·X2 + … + vn·Xn = Σi vi·Xi is distributed as ||v||2 · X, with X ~ N(0,1)
Dot product ↔ Norm
Norm ↔ Distance
For two feature vectors u and v: Σi ui·Xi − Σi vi·Xi = Σi (ui − vi)·Xi, which is distributed as ||u − v||2 · X, with X ~ N(0,1).
Dot Product ↔ Distance: the difference of two dot products with the same Gaussian vector behaves like the L2 distance between the feature vectors, times a standard Gaussian.
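This distributional fact can be verified numerically (an illustrative sketch, not part of the original slides):

```python
import numpy as np

rng = np.random.default_rng(0)
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 0.0, 1.0])

# project both points onto many i.i.d. Gaussian directions
X = rng.standard_normal((100000, 3))
proj_diff = X @ u - X @ v           # = X @ (u - v)

# the differences are Gaussian with standard deviation ||u - v||_2
emp_std = proj_diff.std()
true_dist = np.linalg.norm(u - v)   # sqrt(9 + 4 + 4) = sqrt(17)
```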
The full hashing
h_{a,b}(v) = ⌊(a·v + b) / w⌋
• v: the features vector (d entries), e.g. [34 82 21 …]
• a: d random numbers, i.i.d. from a p-stable distribution
• b: a random phase in [0, w]
• w: the discretization step
In the slide's example (b = 34, w = 100), a·v + b = 7944, which falls in the cell [7900, 8000), so the hash value is 79.
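A compact sketch of this hash family, using a Gaussian a for the L2 case (parameters and names are illustrative):

```python
import numpy as np

def make_pstable_hash(d, w, rng):
    """h_{a,b}(v) = floor((a . v + b) / w), with a ~ N(0, I_d) (2-stable
    for L2) and b uniform in [0, w); w is the discretization step."""
    a = rng.standard_normal(d)
    b = rng.uniform(0.0, w)
    def h(v):
        return int(np.floor((a @ v + b) / w))
    return h

# deterministic worked instance: a = [2, 1], b = 0.5, w = 1.0, v = [1, 1]
a, b, w = np.array([2.0, 1.0]), 0.5, 1.0
val = int(np.floor((a @ np.array([1.0, 1.0]) + b) / w))   # floor(3.5) = 3

h = make_pstable_hash(d=3, w=4.0, rng=np.random.default_rng(1))
bucket = h(np.array([1.0, 2.0, 3.0]))   # an integer bucket id
```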
Generalization: P-stable distribution
• Lp, 0 < p ≤ 2:
• Generalized Central Limit Theorem
• P-stable distribution (e.g., Cauchy for L1)
• L2:
• Central Limit Theorem
• Gaussian (normal) distribution
P-Stable summary
• Works for the r-Nearest Neighbor problem; generalizes to 0 < p ≤ 2
• Improves query time: from O(d·n^(1/(1+ε))·log n) to O(d·n^(1/(1+ε)²)·log n)
(Latest results, reported in email by Alexander Andoni)
Parameters selection
• 90% probability ⇒ best query time performance
For Euclidean space
Parameters selection…
For Euclidean space:
• A single projection hits an r-Nearest Neighbor with Pr = p1
• k projections hit an r-Nearest Neighbor with Pr = p1^k
• L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure collision (e.g., with 1 − δ ≥ 90%), require 1 − (1 − p1^k)^L ≥ 1 − δ, i.e. L ≥ log(δ) / log(1 − p1^k)
[Figure: reject non-neighbors, accept neighbors]
…Parameters selection
[Plot: query time vs. k, split into candidate extraction and candidate verification; the total time has a minimum at an intermediate k]
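The collision constraint above can be turned into a small calculation for L (values illustrative):

```python
import math

def tables_needed(p1, k, delta):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta,
    i.e. L >= log(delta) / log(1 - p1**k)."""
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

L = tables_needed(p1=0.9, k=10, delta=0.1)
# p1^k ≈ 0.349, so one table succeeds ~35% of the time;
# a handful of tables already push the success rate past 90%
```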
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data size (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, linear scan is pretty much all we can do (for high dim)
• Requires the radius r to be fixed in advance
From Piotr Indyk's slides
Conclusion
• But at the end, everything depends on your data set
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni (andoni@mit.edu)
– Test over your own data
(C code, under Red Hat Linux)
LSH - Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash-function construction and parameter tuning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola, and T. Darrell
• Finding sensitive hash functions
Mean Shift Based Clustering in High Dimensions: A Texture Classification Example, B. Georgescu, I. Shimshoni, and P. Meer
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups
The Problem
Given an image x, what are the parameters θ in this image, i.e., angles of joints, orientation of the body, etc.?
Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola, and T. Darrell
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: d_x
• Distance metric in angle space: d_θ(θ1, θ2) = Σ_{i=1}^{m} (1 − cos(θ1,i − θ2,i))
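A minimal sketch of a Σ(1 − cos Δθ) angle-space metric of this kind (an illustration, assuming per-joint angle differences in radians):

```python
import math

def angle_distance(theta1, theta2):
    """d(θ1, θ2) = Σ_i (1 - cos(θ1_i - θ2_i)): 0 for identical poses,
    2 per joint for opposite angles, insensitive to 2π wraparound."""
    return sum(1.0 - math.cos(a - b) for a, b in zip(theta1, theta2))

assert angle_distance([0.0, 0.0], [0.0, 0.0]) == 0.0
assert abs(angle_distance([0.0], [math.pi]) - 2.0) < 1e-9
# wraparound: 0 and 2π describe the same joint angle
assert abs(angle_distance([0.0], [2 * math.pi])) < 1e-9
```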
Example based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
Input query → find KNN in database of examples → output: average angles of the KNN
The algorithm flow
Input query → features extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms.
[Figure: edge-direction histograms computed over image sub-windows A, B at several scales]
Feature Extraction → PSH → LWR
PSH: The basic assumption
There are two metric spaces here: feature space (d_x) and parameter space (d_θ).
We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: a query q mapped between the parameter space (angles) and the feature space]
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ.
The hash functions are applied in feature space, but the KNN are valid in angle space.
PSH as a classification problem
• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples by using h
• Compare the labelings: if the labeling by h is good, accept h, else change h
Labels (e.g., r = 0.25): a pair of examples (x_i, θ_i), (x_j, θ_j) is labeled
y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
y_ij = −1 if d_θ(θ_i, θ_j) ≥ (1 + ε)r
A binary hash function on features:
h_T(x) = +1 if the selected feature of x exceeds the threshold T, −1 otherwise
Predict the labels:
ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the best T that predicts the true labeling within the probability constraints: h_T will place both examples in the same bin, or separate them.
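A sketch of evaluating one candidate threshold as a classifier over labeled pairs (names illustrative; the angle metric follows the Σ(1 − cos) form used for poses):

```python
import math

def pair_label(theta_i, theta_j, r, eps):
    """+1 for similar poses, -1 for clearly dissimilar, None in the gray zone."""
    d = sum(1.0 - math.cos(a - b) for a, b in zip(theta_i, theta_j))
    if d <= r:
        return +1
    if d >= (1.0 + eps) * r:
        return -1
    return None

def stump_accuracy(pairs, labels, feature, T):
    """Fraction of labeled pairs on which the hash h_T (a threshold on one
    feature) predicts the similar/dissimilar label correctly."""
    correct = 0
    for (xi, xj), y in zip(pairs, labels):
        pred = +1 if (feature(xi) >= T) == (feature(xj) >= T) else -1
        correct += (pred == y)
    return correct / len(pairs)
```

In PSH, thresholds scoring high on such pairs are accepted as parameter-sensitive hash bits.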
Local Weighted Regression (LWR)
• Given a query image, PSH returns KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query, roughly:
β0 = argmin_β Σ_{x_i ∈ N(x)} K(d_x(x_i, x)) · ||g(x_i; β) − θ_i||², where the kernel weight K decays with feature-space distance
Results
Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without selection, 40 bits and 1,000 hash tables would have been needed
Recall: P1 is the probability of a positive hash, P2 the probability of a bad hash, B the max number of points in a bucket
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 13% of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• General: some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Find the best threshold T that predicts the true labeling under the probability constraints: h_T should place both examples of a similar pair in the same bin, and separate dissimilar pairs.
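The labeling and selection steps above can be made concrete. A hypothetical sketch with scalar angles and scalar features (the data, thresholds, and function names are illustrative, not the paper's): pairs are labeled by angle distance, and a candidate threshold hash h_T is scored by how often its same-bucket prediction matches the labels.

```python
from itertools import combinations

def pair_labels(thetas, r, eps):
    """y_ij = +1 for pairs with angle distance <= r, -1 for pairs with
    distance >= (1+eps)*r; in-between pairs are left unlabeled."""
    labeled = []
    for i, j in combinations(range(len(thetas)), 2):
        d = abs(thetas[i] - thetas[j])
        if d <= r:
            labeled.append((i, j, +1))
        elif d >= (1 + eps) * r:
            labeled.append((i, j, -1))
    return labeled

def hash_accuracy(features, T, labeled):
    """h_T(x) = +1 iff x >= T; the predicted pair label is +1 iff both
    examples fall on the same side of the threshold (same bucket)."""
    h = lambda x: 1 if x >= T else -1
    hits = sum(1 for i, j, y in labeled
               if (1 if h(features[i]) == h(features[j]) else -1) == y)
    return hits / len(labeled)

thetas = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]  # two tight clusters of angles
features = thetas                            # a feature well correlated with the angle
labeled = pair_labels(thetas, r=2.0, eps=1.0)
best_T = max(range(13), key=lambda T: hash_accuracy(features, T, labeled))
```

A threshold between the two clusters labels every pair correctly, while a degenerate threshold (everything in one bucket) only gets the positive pairs right.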
Local Weighted Regression (LWR)
• Given a query image x_0, PSH returns its KNNs.
• LWR uses the KNN to compute a distance-weighted average of the estimated angles of the query:

    β* = argmin_β Σ_{x_i ∈ N(x_0)} d_θ(g(x_i; β), θ_i) · K(d_x(x_i, x_0))

where K is a distance-based weight kernel.
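In the simplest (zeroth-order) case, with a constant model g and squared angle error, the LWR step reduces to a kernel-weighted mean of the neighbors' angles. A sketch under those assumptions (the Gaussian kernel and bandwidth h are illustrative choices, not the paper's):

```python
import numpy as np

def lwr_average(x0, neigh_feats, neigh_angles, h=1.0):
    """Zeroth-order LWR: minimizing sum_i K(d_x(x_i, x0)) * (g - theta_i)^2
    over a constant g yields the kernel-weighted mean of the angles."""
    d = np.linalg.norm(neigh_feats - x0, axis=1)
    w = np.exp(-(d / h) ** 2)          # closer neighbors get larger weights
    return (w[:, None] * neigh_angles).sum(axis=0) / w.sum()

x0 = np.zeros(2)
neigh_feats = np.array([[1.0, 0.0], [0.0, 1.0]])  # two equidistant neighbors
neigh_angles = np.array([[10.0], [20.0]])
theta = lwr_average(x0, neigh_feats, neigh_angles)  # equidistant -> plain mean
```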
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, facial expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Tested on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without feature selection, 40 bits and 1,000 hash tables were needed
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, and B is the maximum number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
Results – real data
Interesting mismatches
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations).
• The training set should be dense.
• Texture and clutter.
• In general, some features are more important than others and should be weighted.
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d, centered at P = {p_1, …, p_n}, with radii r_1, …, r_n.
• Goal: given a query q, preprocess the points in P so as to find a point p_i whose sphere covers the query q.
Courtesy of Mohamad Hegaze.
Motivation
• Clustering high-dimensional data by using local density measurements (e.g., in feature space).
• Statistical curse of dimensionality: sparseness of the data.
• Computational curse of dimensionality: expensive range queries.
• LSH parameters should be adjusted for optimal performance.
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell (a bandwidth window shifted toward the local mean)
Mean-shift → LSH: optimal k,l → LSH: data partition → LSH data structure
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region: high density → small bandwidth; low density → large bandwidth.
It is based on the kth nearest neighbor of the point: the bandwidth is h_i = ||x_i − x_{i,k}||, the distance from x_i to its kth neighbor.
Adaptive mean-shift vs non-adaptive
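The adaptive scheme can be sketched directly: each point gets its own bandwidth, the distance to its kth nearest neighbor, and the iteration shifts a point to the mean of the data inside its window. This toy version uses a flat kernel and brute-force distances in place of the LSH range query.

```python
import numpy as np

def adaptive_mean_shift(pts, k=5, iters=30):
    """Per-point bandwidth h_i = distance from x_i to its k-th nearest
    neighbor: small in dense regions, large in sparse ones."""
    pts = np.asarray(pts, dtype=float)
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    h = np.sort(dists, axis=1)[:, k]       # k-th NN distance (column 0 is self)
    modes = pts.copy()
    for _ in range(iters):
        for i in range(len(modes)):
            in_window = np.linalg.norm(pts - modes[i], axis=1) <= h[i]
            if in_window.any():
                modes[i] = pts[in_window].mean(axis=0)  # shift to the window mean
    return modes

rng = np.random.default_rng(0)
blob_a = rng.normal([0.0, 0.0], 0.1, size=(20, 2))   # dense blob at the origin
blob_b = rng.normal([5.0, 5.0], 0.1, size=(20, 2))   # dense blob at (5, 5)
modes = adaptive_mean_shift(np.vstack([blob_a, blob_b]))
```

Every point's mode converges inside its own blob: the adaptive bandwidths are far smaller than the gap between the clusters.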
Image segmentation algorithm:
1. Input: data in 5D (3 color + 2 spatial x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Image segmentation algorithm: original → filtered → segmented
Filtering: each pixel takes the value of its nearest mode.
Mean-shift trajectories
Filtering examples (squirrel and baboon: original vs. filtered)
Segmentation examples
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH.
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth.
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k).
• For each point x, check whether x_{d_k} ≤ v_k for each of the K cuts.
• This partitions the data into cells.
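A minimal sketch of this structure, with variable names of my own choosing: each of the L partitions stores K (dimension, value) cuts, a point's cell is the K-bit key of the tests x[d_k] ≤ v_k, and a query gathers the union of its cells over the L partitions.

```python
import numpy as np

def build_lsh(data, K, L, seed=0):
    """L random partitions; each is K (dimension d_k, cut value v_k) pairs
    plus a table from cell keys to point indices."""
    rng = np.random.default_rng(seed)
    structure = []
    for _ in range(L):
        dims = rng.integers(0, data.shape[1], size=K)            # d_k
        vals = np.array([rng.uniform(data[:, d].min(), data[:, d].max())
                         for d in dims])                          # v_k
        table = {}
        for idx, x in enumerate(data):
            key = tuple(x[dims] <= vals)                          # the cell key
            table.setdefault(key, []).append(idx)
        structure.append((dims, vals, table))
    return structure

def query_candidates(q, structure):
    """Union over the L partitions of the points sharing q's cell."""
    cands = set()
    for dims, vals, table in structure:
        cands.update(table.get(tuple(q[dims] <= vals), []))
    return cands

rng = np.random.default_rng(1)
data = rng.normal(size=(200, 8))
lsh = build_lsh(data, K=4, L=6)
cands = query_candidates(data[0], lsh)   # point 0 always shares its own cell
```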
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets.
• Large K → a smaller number of points in a cell.
• If L is too small, points might be missed; but if L is too big, extra points might be included.
• As L increases, the number of candidate points in the union of cells C̄ increases, but the chance of missing a neighbor decreases; K determines the resolution of the data structure.
Choosing optimal K and L
• Determine accurately the KNN, and the corresponding distance (bandwidth), for m randomly selected data points.
• Choose an error threshold ε on the approximate distance.
• The optimal K and L should satisfy this error constraint.
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K); then minimize the running time t(K, L(K)).
[Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] and its minimum]
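The selection rule can be illustrated with a toy analytic model standing in for the measured error and time (the authors measure both on sample points; p1, the cost model, and all constants below are made-up assumptions): for each K take the smallest L meeting the error constraint, then keep the (K, L(K)) pair with the smallest modeled query time.

```python
def tune_lsh(Ks, Ls, eps=0.05, p1=0.9, n=100_000):
    """Return (time, K, L) minimizing a modeled query time subject to a
    modeled miss-probability constraint."""
    best = None
    for K in Ks:
        for L in Ls:                        # L(K): first L meeting the constraint
            miss = (1.0 - p1 ** K) ** L     # modeled prob. all L tables miss a true neighbor
            if miss <= eps:
                t = K * L + n * (0.5 ** K) * L   # modeled hashing + candidate verification
                if best is None or t < best[0]:
                    best = (t, K, L)
                break
    return best

t_best, K_best, L_best = tune_lsh(range(1, 21), range(1, 201))
```

Small K needs few tables but floods the query with candidates; large K needs many tables; the minimum sits in between.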
Data-driven partitions
• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
[Plot: points-per-bucket distribution, uniform vs. data-driven cuts]
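A quick numerical illustration of why data-driven cuts help, on synthetic skewed 1-D data (the balance metric here is my own): cutting at the coordinate of a random data point splits buckets far more evenly than cutting uniformly over the data range.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.exponential(size=10_000)   # skewed: most mass near 0, long tail

def split_balance(cut):
    """min(left, right) share of points: 0.5 is a perfectly even split."""
    left = np.count_nonzero(data <= cut)
    return min(left, data.size - left) / data.size

# uniform cuts in the data range vs. cuts at random data coordinates
uniform = [split_balance(rng.uniform(data.min(), data.max())) for _ in range(200)]
driven = [split_balance(data[rng.integers(data.size)]) for _ in range(200)]
```

On this data a uniform cut usually lands deep in the sparse tail, leaving almost everything in one bucket, while a data-driven cut tracks the density.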
Additional speedup
Assume that all points in a cell C will converge to the same mode (C acts like a type of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
Low dimension vs. high dimension

A thought for food…
• Choose K and L by sample learning, or take the traditional values.
• Can one estimate K and L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.
15:30 cookies…
Summary
• LSH offers a compromise: some accuracy is traded for a gain in complexity.
• Applications that involve massive data in high dimension require LSH's fast performance.
• LSH extends to different spaces (PSH).
• The LSH parameters and hash functions can be learned for different applications.
Conclusion
• …but in the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test it on your own data
  (C code, under Red Hat Linux)
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naïve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree – Pitfall 1
- Slide 16
- Quadtree – Pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection …
- … Parameters selection
- Pros & Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x, what are the parameters θ in this image, i.e., angles of joints, orientation of the body, etc.?
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results – real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
• Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, T. Darrell)
– Finding sensitive hash functions
• Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, P. Meer)
– Tuning LSH parameters
– The LSH data structure is used for algorithm speedups
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, T. Darrell)
Given an image x, what are the parameters θ in this image? I.e., angles of joints, orientation of the body, etc.
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: d_x
• Distance metric in angle space: d_θ(θ¹, θ²) = Σ_{i=1}^{m} (1 − cos(θ¹_i − θ²_i))
Example-based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query

Input: query → find KNN in the database of examples → output: average angles of the KNN
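The steps above can be sketched as a brute-force baseline (PSH replaces the linear search; the data layout and function names here are my assumptions, not the paper's code):

```python
def estimate_pose(query_features, database, k, dist):
    """Example-based estimate: average the known angles of the k database
    entries whose features are closest to the query. `database` is a list
    of (feature_vector, angle_vector) pairs; search is brute force here,
    whereas the paper replaces it with PSH."""
    neighbors = sorted(database, key=lambda ex: dist(query_features, ex[0]))[:k]
    m = len(neighbors[0][1])
    # Component-wise average of the neighbors' angle vectors
    return [sum(ang[j] for _, ang in neighbors) / k for j in range(m)]

db = [([0.0], [10.0, 20.0]), ([1.0], [20.0, 40.0]), ([5.0], [99.0, 99.0])]
est = estimate_pose([0.0], db, 2, lambda a, b: abs(a[0] - b[0]))
# The two nearest examples are averaged: [15.0, 30.0]
```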
The algorithm flow
Input query → feature extraction → processed query → PSH (LSH) over the database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms.
[Figure: edge-direction histograms computed over image sub-windows at several scales]
(Pipeline: Feature Extraction → PSH → LWR)
PSH: The basic assumption
There are two metric spaces here: feature space (d_x) and parameter space (d_θ). We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: a query q and its neighbors shown both in parameter space (angles) and in feature space]
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples and select those sensitive to d_θ. The hash functions are applied in feature space, but the KNN are valid in angle space.
1. Label pairs of examples with similar angles
2. Define hash functions h on the feature space
3. Predict the labeling of similar/non-similar examples by using h
4. Compare the labelings
5. If the labeling by h is good, accept h; else change h
PSH as a classification problem
Labels (r = 0.25):
A pair of examples (x_i, x_j) is labeled
  y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
  y_ij = −1 if d_θ(θ_i, θ_j) > (1 + ε)·r
[Figure: example pairs labeled +1, +1, −1, −1]
A binary hash function on features:
  h_{φ,T}(x) = +1 if φ(x) ≥ T, −1 otherwise

Predict the labels:
  ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the best threshold T that predicts the true labeling subject to the probability constraints: h_T will place both examples of a pair in the same bin, or separate them.
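Selecting a hash function by its agreement with the pair labels can be sketched as follows (a simplified accuracy criterion; the paper's actual selection uses separate sensitivity/specificity constraints, and the names here are mine):

```python
def hash_accuracy(h, pairs):
    """Fraction of labeled pairs a binary hash h classifies correctly:
    h should put similar pairs (y = +1) in the same bucket and
    dissimilar pairs (y = -1) in different ones."""
    correct = 0
    for x_i, x_j, y in pairs:
        y_hat = 1 if h(x_i) == h(x_j) else -1
        correct += (y_hat == y)
    return correct / len(pairs)

# A threshold hash on a single feature, in the spirit of the paper's
# axis-parallel decision stumps
h = lambda x: 1 if x[0] >= 0.5 else -1
pairs = [((0.9,), (0.8,), 1), ((0.9,), (0.1,), -1), ((0.2,), (0.3,), 1)]
# This h labels all three pairs correctly
```

In the selection loop, one would evaluate many candidate (feature, threshold) pairs this way and keep the most label-consistent ones.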
Local Weighted Regression (LWR)
• Given a query image x₀, PSH returns KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:
  θ̂ = g(x₀; β̂), where β̂ = argmin_β Σ_{x_i ∈ N(x₀)} K(d_x(x_i, x₀)) · (g(x_i; β) − θ_i)²
with K a distance-weighting kernel and N(x₀) the neighborhood returned by PSH.
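For the special case of a constant local model g(x; β) = β, the argmin above has a closed form: the kernel-weighted mean of the neighbors' angles. A sketch under that simplifying assumption (kernel and names are illustrative):

```python
def lwr_constant(query_x, neighbors, kernel, dist):
    """Locally weighted estimate with a constant local model g(x) = beta.
    The least-squares solution is then the kernel-weighted mean of the
    neighbors' angles. `neighbors` is a list of (x, theta) pairs."""
    ws = [kernel(dist(x, query_x)) for x, _ in neighbors]
    total = sum(ws)
    return sum(w * th for w, (_, th) in zip(ws, neighbors)) / total

nbrs = [(0.0, 10.0), (1.0, 30.0)]
est = lwr_constant(0.0, nbrs,
                   kernel=lambda d: 1.0 / (1.0 + d),
                   dist=lambda a, b: abs(a - b))
# weights 1 and 0.5 -> (1*10 + 0.5*30) / 1.5 = 50/3
```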
Results
Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Tested on 1,000 synthetic examples; PSH searched only 3.4% of the data per query
• Without feature selection, 40 bits and 1,000 hash tables were needed
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, and B is the maximum number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 13% of the data were searched
(Also shown: interesting mismatches)
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging

Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in R^d, centered at P = {p₁, …, p_n} with radii r₁, …, r_n
• Goal: given a query q, preprocess the points in P to find a point p_i whose sphere 'covers' the query q, i.e., ‖q − p_i‖ ≤ r_i
Courtesy of Mohamad Hegaze.
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, P. Meer)

Motivation
• Clustering high-dimensional data by using local density measurements (e.g., in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
1. Finding optimal LSH parameters
2. Data-driven partitions into buckets
3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[Figure: a point moved toward the local mean of the data within its bandwidth window]
(Section breadcrumb: Mean-shift | LSH: optimal k,l | LSH: data partition | LSH data structure)
KNN in mean-shift
• The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth
• Based on the kth nearest neighbor of the point: the bandwidth is h_i = ‖x_i − x_{i,k}‖, the distance from x_i to its kth nearest neighbor
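The kth-neighbor bandwidth rule can be sketched as follows (a brute-force illustration; `dist` and the sample points are placeholders):

```python
def adaptive_bandwidths(points, k, dist):
    """Per-point bandwidth h_i = distance from x_i to its k-th nearest
    neighbor: dense regions get a small h, sparse regions a large h."""
    hs = []
    for i, x in enumerate(points):
        # Sorted distances to all other points; the (k-1) index is the k-th NN
        ds = sorted(dist(x, y) for j, y in enumerate(points) if j != i)
        hs.append(ds[k - 1])
    return hs

pts = [0.0, 0.1, 0.2, 5.0]
hs = adaptive_bandwidths(pts, 2, lambda a, b: abs(a - b))
# The isolated point at 5.0 gets a much larger bandwidth than the cluster
```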
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering: each pixel takes the value of its nearest mode
(Mean-Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02)
[Figure: original → filtered → segmented]
Mean-shift trajectories
[Figure: mean-shift trajectories converging to the modes]

Filtering examples
• original squirrel → filtered
• original baboon → filtered
(Mean-Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02)
Segmentation examples
(Mean-Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02)
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k) of a coordinate index and a cut value
• For each point x, check whether x_{d_k} ≤ v_k for k = 1, …, K
• This partitions the data into cells
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets
• Large K → a smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, extra points might be included
• As L increases, the number of candidates (and the accuracy) increases, but the efficiency decreases
• K determines the resolution of the data structure
Choosing optimal K and L
• Determine accurately the KNN distance (bandwidth) for m randomly-selected data points
• Choose an error threshold ε
• The optimal K and L should satisfy: the approximate distance is within a factor (1 + ε) of the true distance
Choosing optimal K and L
• For each K, estimate the error for L(K)
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
[Figure: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)], whose minimum marks the chosen K]
Data-driven partitions
• In the original LSH, cut values are chosen uniformly at random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[Figure: bucket-occupancy distribution — uniform cuts vs. data-driven cuts]
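A sketch of the suggested data-driven cuts and the resulting cell key of a point (the names and the boolean cell encoding are my choices):

```python
import random

def data_driven_cuts(points, K, rng):
    """Pick K (dimension, value) cut pairs where each value is a coordinate
    of a randomly chosen data point, so cell boundaries follow the data
    distribution instead of being uniform over its range."""
    d = len(points[0])
    cuts = []
    for _ in range(K):
        dim = rng.randrange(d)
        val = rng.choice(points)[dim]
        cuts.append((dim, val))
    return cuts

def cell_key(x, cuts):
    # A point's cell is the K-bit vector of cut tests x[dim] <= val
    return tuple(x[dim] <= val for dim, val in cuts)
```

Points sharing the same key land in the same bucket, and dense regions get proportionally more cut values.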
Additional speedup
• Assume that all points in a cell C will converge to the same mode (C acts like a type of aggregate), so the mean-shift iterations need not be repeated for every point of C
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
Low dimension vs. high dimension

A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30 cookies…
Summary
• LSH suggests a compromise on accuracy for the gain in complexity
• Applications that involve massive data in high dimensions require LSH's fast performance
• Extension of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• ...but in the end, everything depends on your data set
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni (andoni@mit.edu)
– Test over your own data
(C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naïve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree – Pitfall 1
- Slide 16
- Quadtree – pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection …
- … Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x, what are the parameters θ in this image? I.e., angles of joints, orientation of the body, etc.
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
Hamming Space
• Hamming space = the 2^N binary strings of length N
• Hamming distance = the number of changed digits, a.k.a. the signal distance (Richard Hamming)
Example:
  010100001111
  010010000011
  Distance = 4
• Hamming distance: d_H(X1, X2) = SUM(X1 XOR X2)
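The XOR formula above can be checked in a couple of lines (an illustrative sketch; the function name is mine):

```python
def hamming(x: int, y: int) -> int:
    # Hamming distance = number of differing bits = popcount of XOR
    return bin(x ^ y).count("1")

a = 0b010100001111
b = 0b010010000011
# The slide's example: the two strings differ in 4 positions
```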
L1 to Hamming Space Embedding
Write each coordinate of p (an integer in [0, C]) in unary: value v becomes v ones followed by C − v zeros. A d-dimensional point becomes a d′ = C·d bit string, and the L1 distance between points becomes the Hamming distance between their embeddings.
Example (C = 11): p = (8, 2) → 11111111000 11000000000
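Here is a minimal sketch of the unary embedding, assuming integer coordinates in [0, C] (function names are illustrative):

```python
def unary_embed(p, C):
    """Embed an integer vector p (coords in 0..C) into the Hamming cube
    {0,1}^(C*len(p)) via unary coding: value v -> v ones, C-v zeros.
    L1 distance in the original space equals Hamming distance here."""
    bits = []
    for v in p:
        bits.extend([1] * v + [0] * (C - v))
    return bits

def hamming_list(x, y):
    return sum(a != b for a, b in zip(x, y))

p, q = [8, 2], [6, 5]
C = 11
# |8-6| + |2-5| = 5, and the embeddings differ in exactly 5 bits
```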
Hash function
For p ∈ H^d′, the j-th hash function (j = 1, …, L) is bit sampling from p:
  G_j(p) = p|I_j
i.e., p restricted to a random index set I_j of k bit positions (k = 3 digits in the example), giving 2^k buckets.
Store p in the bucket indexed by p|I_j; e.g., p = 1111111100011000000000 with a 3-bit sample 101 goes to bucket 101.
Construction: each point p is stored, via its L hash keys, into one bucket in each of the L tables (1, 2, …, L).
Query: q is hashed by the same L functions, and the points found in its L buckets become the candidate neighbors.
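The construction/query scheme above can be sketched as a toy in-memory version (names and the use of Python dicts are my assumptions, not the original C implementation):

```python
import random

def build_bit_sampling_lsh(points, k, L, dprime):
    """Bit-sampling LSH over the Hamming cube: L hash tables, each keyed
    by k randomly chosen bit positions (G_j(p) = p restricted to I_j)."""
    tables = []
    for _ in range(L):
        I = random.sample(range(dprime), k)          # index set I_j
        table = {}
        for p in points:
            key = tuple(p[i] for i in I)             # G_j(p) = p|I_j
            table.setdefault(key, []).append(p)
        tables.append((I, table))
    return tables

def query(tables, q):
    """Collect candidate neighbors from the bucket q falls into in each table."""
    candidates = []
    for I, table in tables:
        candidates.extend(table.get(tuple(q[i] for i in I), []))
    return candidates
```

The candidates would then be verified by exact distance computations, as the following slides discuss.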
Alternative intuition: random projections
With the unary embedding (C = 11, p = (8, 2) → 1111111100011000000000), each sampled bit i of coordinate c acts as an axis-parallel cut asking "is p_c ≥ i?". Sampling k bits therefore partitions the space into cells, and each point falls into one of the 2^k buckets (k = 3: buckets 000, 100, 110, 001, 101, 111, …); e.g., p lands in bucket 101.
k samplings, repeated L times.

Secondary hashing
The 2^k buckets (e.g., 011) of each table are mapped by a simple hash into M buckets of size B, with M·B = αn (α = 2): support-volume tuning, trading dataset size against storage volume.
The above hashing is locality-sensitive
• Probability that p and q fall in the same bucket:
  Pr[G_j(p) = G_j(q)] = (1 − d_H(p, q)/d′)^k
[Figure: collision probability vs. Distance(q, p_i), for k = 1 and k = 2]
(Adapted from Piotr Indyk's slides)
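The collision-probability formula can be tabulated directly (a small sketch reproducing the k = 1 vs. k = 2 behavior shown in the figure):

```python
def collision_prob(dist, dprime, k):
    """Pr[p and q share a bucket] under bit sampling: each of the k
    sampled bits agrees with probability 1 - dist/d'."""
    return (1.0 - dist / dprime) ** k

# Larger k sharpens the fall-off with distance, as in the slide's curves
p_k1 = collision_prob(50, 100, 1)   # 0.5
p_k2 = collision_prob(50, 100, 2)   # 0.25
```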
Preview
• General solution – locality-sensitive hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• New hashing function
• Still based on sampling
• Using a mathematical trick: a p-stable distribution for the Lp distance; the Gaussian distribution for the L2 distance
Central limit theorem
A sum of weighted Gaussians is a weighted Gaussian:
  v₁X₁ + v₂X₂ + … + v_nX_n
with v₁, …, v_n real numbers and X₁, …, X_n independent, identically distributed (i.i.d.) standard Gaussians:
  v·X = Σᵢ vᵢXᵢ ∼ ‖v‖₂ · N(0, 1)
so a dot product with a Gaussian vector encodes the norm.
Norm → Distance
For two feature vectors u and v and the same Gaussian vector X:
  u·X − v·X = (u − v)·X = Σᵢ (uᵢ − vᵢ)Xᵢ ∼ ‖u − v‖₂ · N(0, 1)
so the difference of the two dot products is distributed according to the L2 distance between the feature vectors.
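A quick Monte-Carlo check of this fact, assuming standard Gaussian X (the sample count and tolerance are arbitrary choices of mine):

```python
import math, random

def project(v, X):
    # Dot product v . X
    return sum(vi * xi for vi, xi in zip(v, X))

random.seed(0)
u, v = [3.0, 4.0], [0.0, 0.0]                 # ||u - v||_2 = 5
diff = [u[i] - v[i] for i in range(2)]
samples = [project(diff, [random.gauss(0, 1) for _ in diff])
           for _ in range(20000)]
# Empirical std of (u - v) . X should approach ||u - v||_2 = 5
std = math.sqrt(sum(s * s for s in samples) / len(samples))
```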
The full Hashing
  h_{a,b}(v) = ⌊(a·v + b) / w⌋
• v: the features vector (e.g., [34, 82, 21])
• a: d random numbers, drawn i.i.d. from a p-stable distribution
• b: a random phase in [0, w]
• w: the discretization step
In the slide's example, the projection a·v + b ≈ 79.44 falls, with step w = 1.00, into the cell [79.00, 80.00) of the grid …, 78.00, 79.00, 80.00, 81.00, 82.00, …
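A sketch of one such hash function for L2, assuming Gaussian entries for a (the 2-stable case; names and parameter values are illustrative):

```python
import math, random

def make_hash(d, w, rng):
    """One p-stable LSH function for L2: h(v) = floor((a.v + b) / w),
    with a ~ N(0, I_d) (Gaussian is 2-stable) and b ~ Uniform[0, w)."""
    a = [rng.gauss(0, 1) for _ in range(d)]
    b = rng.uniform(0, w)
    return lambda v: math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)

rng = random.Random(1)
h = make_hash(d=3, w=4.0, rng=rng)
v = [3.4, 8.2, 2.1]
# Identical vectors always share a cell; nearby ones usually do
```

In the full scheme, k such functions are concatenated per table and L tables are built, exactly as in the Hamming construction.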
Generalization: P-stable distributions
• L2: Central Limit Theorem → Gaussian (normal) distribution
• Lp, 0 < p ≤ 2: Generalized Central Limit Theorem → p-stable distribution (e.g., Cauchy for L1)

P-Stable summary
• Works for the r-nearest neighbor problem; generalizes to 0 < p ≤ 2
• Improves query time: from O(d·n^{1/(1+ε)}·log n) to O(d·n^{1/(1+ε)²}·log n)
(latest results, reported by email by Alexander Andoni)
Parameters selection
For Euclidean space, aim at a 90% success probability with the best query-time performance:
• A single projection hits an r-nearest neighbor with probability p1
• k projections hit it simultaneously with probability p1^k
• All L hashings fail to collide with probability (1 − p1^k)^L
• To ensure a collision (e.g., 1 − δ ≥ 90%):
  1 − (1 − p1^k)^L ≥ 1 − δ  ⟹  L ≥ log(δ) / log(1 − p1^k)
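The bound on L can be computed directly (a sketch; the example values of p1, k, and δ are mine, not from the slides):

```python
import math

def num_tables(p1, k, delta):
    """Smallest L with 1 - (1 - p1^k)^L >= 1 - delta, i.e. the number of
    hash tables needed so a true near neighbor collides with probability
    at least 1 - delta."""
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

# e.g. p1 = 0.9 per projection, k = 10 bits, 90% success target
L = num_tables(0.9, 10, 0.1)   # -> 6
```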
Accept neighbors, reject non-neighbors.
[Figure: query time vs. k — candidate-extraction time grows with k while candidate-verification time shrinks, so an intermediate k minimizes the total]
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selection…
For Euclidean space:
• A single projection hits an r-nearest neighbor with Pr = p1
• k projections hit an r-nearest neighbor with Pr = p1^k
• All L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure a collision (e.g. with probability 1 − δ ≥ 90%):
  1 − (1 − p1^k)^L ≥ 1 − δ  ⇒  L ≥ log(δ) / log(1 − p1^k)
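The bound above pins down the number of tables; a small helper (our own naming, with made-up example numbers) computes the minimal L:

```python
import math

def tables_needed(p1, k, delta):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta (the slide's bound),
    i.e. L = ceil(log(delta) / log(1 - p1**k))."""
    return math.ceil(math.log(delta) / math.log(1 - p1 ** k))

# e.g. single-projection hit probability p1 = 0.9, k = 10 bits,
# 90% overall success probability (delta = 0.1):
L = tables_needed(0.9, 10, 0.1)
```

With these numbers L = 6 tables suffice: 1 − (1 − 0.9¹⁰)⁶ ≈ 0.92 ≥ 0.9.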
…Parameters selection
[Plot: running time vs. k — candidate-extraction time decreases with k while candidate-verification time increases; the hash should accept neighbors and reject non-neighbors]
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance
From Piotr Indyk's slides
Conclusion
• …but in the end, everything depends on your data set
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– E-mail Alex Andoni (andoni@mit.edu)
– Test it over your own data (C code, under Red Hat Linux)
LSH – Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash-function construction and parameter tuning
Outline
• Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, T. Darrell) – finding sensitive hash functions
• Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, P. Meer) – tuning LSH parameters; the LSH data structure is used for algorithm speedups
The Problem
Given an image x, what are the parameters θ in this image — i.e. the angles of the joints, the orientation of the body, etc.?
Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, T. Darrell)
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: d_x
• Distance metric in angle space: d_θ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))
Example-based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
Input: query → find the KNN in the database of examples → output: average angles of the KNN
The algorithm flow
Input query → feature extraction → processed query → PSH (LSH, against the database of examples) → LWR (regression) → output match
The image features
Image features are multi-scale edge histograms.
[Figure: edge-histogram features computed over image regions A, B]
(Pipeline: Feature Extraction → PSH → LWR)
PSH: the basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ).
We want similarity to be measured in angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: manifolds
• A manifold is a space in which every point has a neighborhood resembling Euclidean space.
• But the global structure may be complicated: curved.
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: a query q mapped between the parameter space (angles) and the feature space — is this magic?]
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ.
The hash functions are applied in feature space, but the KNN are valid in angle space.
Training loop:
• Label pairs of examples with similar angles.
• Define hash functions h on the feature space.
• Predict the labeling of similar/non-similar examples by using h.
• Compare the labelings.
• If the labeling by h is good, accept h; else change h.
PSH as a classification problem
A pair of examples (x_i, x_j) is labeled
 y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
 y_ij = −1 if d_θ(θ_i, θ_j) > (1 + ε)·r
(r = 0.25; example pairs labeled +1, +1, −1, −1)
A binary hash function on features:
 h_T(x) = +1 if x ≥ T, −1 otherwise
Predict the labels:
 ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the best T that predicts the true labeling, subject to the probability constraints:
h_T(x) will place both examples in the same bin, or separate them.
Local Weighted Regression (LWR)
• Given a query image x0, PSH returns its KNNs.
• LWR uses the KNNs to compute a distance-weighted estimate of the query's angles:
 β0 = argmin_β Σ_{x_i ∈ N(x0)} d_θ(g(x_i; β), θ_i) · K(d_x(x_i, x0)),
where g is the local model and K a distance-weighting kernel.
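As a simplified sketch of this step (zeroth-order regression with made-up names and data — the paper fits a local model g(x; β) rather than a plain average), the weighted averaging over KNNs looks like:

```python
import math

def weighted_angle_estimate(neighbors, query_feat, h=1.0):
    """Estimate the query's angle as a kernel-weighted average of its
    KNNs' known angles, weighted by feature-space distance d_x."""
    num = den = 0.0
    for feat, angle in neighbors:
        d = math.dist(feat, query_feat)      # d_x(x_i, x0)
        w = math.exp(-(d / h) ** 2)          # Gaussian kernel K(d_x)
        num += w * angle
        den += w
    return num / den

# Two hypothetical neighbors: (feature vector, known angle)
nbrs = [((0.0, 0.0), 10.0), ((1.0, 0.0), 20.0)]
est = weighted_angle_estimate(nbrs, (0.0, 0.0))
# the closer neighbor (angle 10) dominates: 10 < est < 15
```

Note this naive averaging ignores angle wraparound, which the 1 − cos distance on the earlier slide is designed to handle.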
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples; PSH searched only 3.4% of the data per query
• Without selection, 40 bits and 1,000 hash tables were needed
Recall: p1 is the probability of a positive hash, p2 the probability of a bad hash, and B the maximum number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1/3 of the data were searched
Interesting mismatches were observed.
Fast pose estimation – summary
• A fast way to compute the angles of the human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN + smart averaging
Food for thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn
• Goal: given a query q, preprocess the points in P so as to find a point p_i whose sphere 'covers' the query q
Courtesy of Mohamad Hegaze
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, P. Meer)
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
1. Finding optimal LSH parameters
2. Data-driven partitions into buckets
3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
(Roadmap: mean-shift → LSH: optimal k, l → LSH: data partition → LSH: data structure)
KNN in mean-shift: the bandwidth of a point should be inversely proportional to the density in its region:
high density → small bandwidth; low density → large bandwidth.
The bandwidth is based on the kth nearest neighbor of the point.
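For concreteness, one flat-kernel mean-shift step can be sketched as follows (our own minimal version with made-up data; the talk's adaptive variant uses a per-point bandwidth derived from the kth nearest neighbor):

```python
import math

def mean_shift_step(x, points, h):
    """Move x to the mean of the points within bandwidth h (flat kernel)."""
    near = [p for p in points if math.dist(p, x) <= h]
    if not near:
        return x
    return tuple(sum(coord) / len(near) for coord in zip(*near))

pts = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0)]
m = mean_shift_step((0.0, 0.0), pts, h=1.0)
# m is pulled toward the dense cluster near the origin; (5, 5) is ignored
```

Iterating this step until convergence moves each point to a mode of the density, which is what the filtering and segmentation slides below rely on.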
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering
[Figure: 3D view]
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
[Figure: original → filtered → segmented]
Filtering: pixel value of the nearest mode
[Figure: mean-shift trajectories]
Filtering examples
[Figures: original squirrel → filtered; original baboon → filtered]
Segmentation examples
(Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02)
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries — implemented with LSH
• Statistical curse of dimensionality: sparseness of the data — handled with a variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k).
• For each point x we check the K inequalities x_{d_k} ≤ v_k; the resulting K boolean values determine the point's cell.
• This partitions the data into cells.
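A minimal sketch of one such partition (function names ours), already using the data-driven cut values suggested later in the talk:

```python
import random

def make_partition(data, K, rng):
    """One of the L partitions: K (dimension, cut-value) pairs, where each
    cut value is a coordinate of a randomly chosen data point."""
    dims = [rng.randrange(len(data[0])) for _ in range(K)]
    cuts = [rng.choice(data)[d] for d in dims]
    return list(zip(dims, cuts))

def cell_of(x, partition):
    """The K boolean tests x[d_k] <= v_k identify the point's cell."""
    return tuple(x[d] <= v for d, v in partition)

rng = random.Random(0)
data = [(1.0, 2.0), (3.0, 1.0), (2.0, 4.0), (0.5, 0.5)]
part = make_partition(data, K=3, rng=rng)
cells = {p: cell_of(p, part) for p in data}   # points sharing a cell collide
```

With L independent partitions, a query's neighbor candidates are the points sharing its cell in at least one partition.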
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets.
• A large K gives a smaller number of points in a cell.
• If L is too small, neighbor points might be missed; but if L is too big, extra points might be included.
• As L increases, the union of the query's cells C∪ grows while their intersection C∩ shrinks; together, K and L determine the resolution of the data structure.
Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points, recording the distance (bandwidth) of each.
• Choose an error threshold ε for the approximate distance.
• The optimal K and L should satisfy the error constraint:
 – for each K, estimate the error;
 – in one run over all L's, find the minimal L satisfying the constraint, L(K);
 – minimize the running time t(K, L(K)).
[Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] with its minimum]
Data-driven partitions
• In the original LSH, cut values are drawn uniformly at random over the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
[Figure: bucket distribution — uniform cuts vs. data-driven cuts]
Additional speedup
Assume that all points in C∩ will converge to the same mode (C∩ acts like a type of aggregate).
Speedup results: 65,536 points; 1,638 points sampled; k = 100.
Food for thought
[Figure: low dimension vs. high dimension]
• Choose K, L by sample learning, or take the traditional values.
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.
15:30 — cookies…
Summary
• LSH trades a small loss of accuracy for a large gain in complexity.
• Applications that involve massive data in high dimensions require LSH's fast performance.
• LSH extends to different spaces (PSH).
• The LSH parameters and hash functions can be learned for different applications.
Conclusion
• …but in the end, everything depends on your data set
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– E-mail Alex Andoni (andoni@mit.edu)
– Test it over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
L1 to Hamming space embedding
Example: p = (8, 2), C = 11
Each coordinate v becomes C bits in unary (v ones followed by C − v zeros):
8 → 11111111000, 2 → 11000000000
Concatenated: 1111111100011000000000
The embedded dimension is d' = C·d
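A sketch of this embedding in code (function names ours; the second point is made up), checking that the Hamming distance of the embeddings equals the original L1 distance:

```python
def unary_embed(p, C):
    """Each coordinate v in [0, C] becomes v ones followed by C - v zeros."""
    return "".join("1" * v + "0" * (C - v) for v in p)

def hamming(s, t):
    return sum(a != b for a, b in zip(s, t))

e1 = unary_embed((8, 2), 11)   # the slide's example point
e2 = unary_embed((5, 4), 11)   # a second, made-up point
# hamming(e1, e2) == |8 - 5| + |2 - 4| == 5
```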
Hash function
For p ∈ H^{d'}: G_j(p) = p|_{I_j}, j = 1…L (in the example, k = 3 digits)
Bits are sampled from p; p is stored into bucket p|_{I_j} (one of 2^k buckets).
Construction: insert each point p into its bucket in each of the L tables (1, 2, …, L).
Query: look up q's bucket in each of the L tables (1, 2, …, L).
Alternative intuition: random projections
With the unary embedding (p = (8, 2), C = 11 → 1111111100011000000000), each sampled bit acts as a random axis-parallel threshold on a coordinate; k sampled bits (here k = 3, e.g. key 101) project each point into one of the 2³ = 8 buckets 000, 001, …, 111.
k samplings, repeated L times.
Secondary hashing
Support-volume tuning: dataset size vs. storage volume. The 2^k buckets (e.g. key 011) are mapped by a simple secondary hash into M buckets of size B, with M·B = α·n, α = 2. (Skip)
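Putting the pieces of this Hamming-space scheme together, a compact sketch (class and variable names ours; the secondary hashing is omitted — Python dicts play that role here):

```python
import random
from collections import defaultdict

class HammingLSH:
    """L tables; table j keys points by k sampled bit positions I_j."""
    def __init__(self, d, k, L, rng):
        self.I = [[rng.randrange(d) for _ in range(k)] for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]

    def _key(self, p, j):
        return "".join(p[i] for i in self.I[j])      # G_j(p) = p restricted to I_j

    def insert(self, p):
        for j in range(len(self.tables)):
            self.tables[j][self._key(p, j)].append(p)

    def query(self, q):
        cand = set()
        for j in range(len(self.tables)):
            cand.update(self.tables[j].get(self._key(q, j), []))
        return cand        # candidates; verify with exact Hamming distance

rng = random.Random(0)
lsh = HammingLSH(d=22, k=3, L=4, rng=rng)
lsh.insert("1111111100011000000000")   # the embedded point from the slides above
```

A query probes one bucket per table, so the work is L lookups plus verification of the candidates, rather than a scan of all n points.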
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
• General solution: locality-sensitive hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• New hashing function
• Still based on sampling
• Using a mathematical trick:
• p-stable distribution for Lp distance; Gaussian distribution for L2 distance
Central limit theorem
A sum of weighted Gaussians is again a weighted Gaussian:
for real numbers v1 … vn and independent identically distributed (i.i.d.) random variables X1 … Xn,
v1·X1 + v2·X2 + … + vn·Xn behaves like a single Gaussian.
For Gaussian Xi this is exact:  Σi vi·Xi ~ ||v||2 · X   (a dot product turns into the norm).

Norm -> Distance
Σi ui·Xi - Σi vi·Xi = Σi (ui - vi)·Xi ~ ||u - v||2 · X
So for two feature vectors u and v, the difference of their dot products with the same random vector is distributed like their L2 distance (dot-product distance -> norm distance).
The full hashing

h_{a,b}(v) = floor( (a·v + b) / w )

• v: the features vector, e.g. [34, 82, 21]
• a: d random numbers, i.i.d. from a p-stable distribution
• b: a random phase in [0, w]
• w: the discretization step

Worked example from the slide (b = 34, w = 100): a·v + b = 7944, which falls in the bin [7900, 8000), so the hash value is 79.
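The full hash h_{a,b}(v) = floor((a·v + b)/w) can be sketched directly. This is a minimal illustration (the factory name `make_hash` is mine), using the Gaussian as the 2-stable distribution for L2, as the slides describe:

```python
import math
import random

def make_hash(d, w, seed=0):
    """Build one L2 LSH function h_{a,b}(v) = floor((a.v + b) / w):
    a ~ N(0,1)^d (2-stable), b ~ Uniform[0, w), w = discretization step."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = rng.uniform(0.0, w)
    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
    return h
```

Because a·v concentrates around a Gaussian with spread ||v||2, nearby vectors usually land in the same width-w bin; in practice several such functions are concatenated (k) and repeated (L), exactly as in the Hamming-space scheme.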
Generalization: p-stable distributions
• Lp, 0 < p ≤ 2 -> generalized central limit theorem -> p-stable distribution (e.g. the Cauchy distribution for L1)
• L2 -> central limit theorem -> Gaussian (normal) distribution
P-stable summary
• Generalizes to any 0 < p ≤ 2
• Improves query time for the r-nearest-neighbor problem: from O(d·n^(1/(1+ε))·log n) to O(d·n^(1/(1+ε)^2)·log n)
(latest results, reported by e-mail by Alexander Andoni)
Parameters selection
• For a 90% success probability: the best query-time performance (for Euclidean space)

Parameters selection …
For Euclidean space:
• A single projection hits an r-nearest neighbor with probability p1
• k projections hit it simultaneously with probability p1^k
• All L hashings fail to collide with probability (1 - p1^k)^L
• To ensure a collision (e.g. with probability 1 - δ ≥ 90%):
  1 - (1 - p1^k)^L ≥ 1 - δ,  hence  L ≥ log(δ) / log(1 - p1^k)
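The bound L ≥ log(δ)/log(1 - p1^k) above translates into a one-line calculation. A small sketch (the name `tables_needed` is mine) for picking the number of tables:

```python
import math

def tables_needed(p1, k, delta):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta,
    i.e. L >= log(delta) / log(1 - p1**k)."""
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))
```

For example, with p1 = 0.5, k = 2, and δ = 0.1, nine tables are needed; raising k (to reject more non-neighbors) drives p1^k down and L up, which is exactly the trade-off the next slide plots.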
Reject non-neighbors / accept neighbors

… Parameters selection
As k grows, candidate-verification time shrinks (fewer false candidates) while candidate-extraction time grows (more tables are needed); k is chosen where the total query time is minimal.
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance
(From Piotr Indyk's slides)
Conclusion
• …but in the end, everything depends on your data set
• Try it at home:
  - Visit http://web.mit.edu/andoni/www/LSH/index.html
  - E-mail Alex Andoni (andoni@mit.edu)
  - Test over your own data (C code, under Red Hat Linux)
LSH - Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression - vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash-function construction and parameter tuning
Outline
• Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, and T. Darrell): finding sensitive hash functions
• Mean Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, and P. Meer): tuning LSH parameters; the LSH data structure is used for algorithm speedups
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, and T. Darrell)
Given an image x, what are the parameters θ in this image, i.e. the angles of the joints, the orientation of the body, etc.?
Ingredients
• Input: a query image with unknown angles (parameters)
• A database of human poses with known angles
• An image feature extractor: an edge detector
• A distance metric in feature space, d_x
• A distance metric in angle space:
  d_θ(θ1, θ2) = Σ_{i=1}^{m} ( 1 - cos(θ1,i - θ2,i) )
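The angle-space metric above is straightforward to implement; a minimal sketch (the name `angle_dist` is mine) summing 1 - cos over corresponding joint angles:

```python
import math

def angle_dist(t1, t2):
    """Distance in parameter (angle) space: sum over joints of 1 - cos(gap).
    Zero for identical poses, maximal (2 per joint) for opposite angles."""
    return sum(1.0 - math.cos(a - b) for a, b in zip(t1, t2))
```

Each term lies in [0, 2] and is insensitive to full-turn wrap-around, which is why it is preferred over a plain squared difference of angles.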
Example-based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query

Input: query -> find the KNN in the database of examples -> output: the average angles of the KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
Image features are multi-scale edge histograms.
[Figure: edge histograms computed for image regions A and B at several scales]
Feature Extraction -> PSH -> LWR
PSH: the basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ).
We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: a query q mapped between the parameters space (angles) and the feature space - is this magic?]
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those that are sensitive to d_θ.
The hash functions are applied in feature space, but the KNN are valid in angle space.
• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h
PSH as a classification problem
[Example pairs labeled +1, +1, -1, -1 (r = 0.25)]

Labels: a pair of examples (x_i, x_j) is labeled
  y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
  y_ij = -1 if d_θ(θ_i, θ_j) > (1 + ε)·r

A binary hash function on features:
  h_T(x) = +1 if the selected feature of x exceeds the threshold T, and -1 otherwise

Predict the labels:
  ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), and -1 otherwise

Find the best T that predicts the true labeling within the probability constraints: h_T will place both examples in the same bin, or separate them.
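The selection of a sensitive hash function can be sketched as scoring candidate (feature, threshold) pairs on the labeled example pairs. This is a hypothetical illustration (the names `h` and `pair_accuracy` are mine, and the scheme thresholds a single feature coordinate, as in the slides):

```python
def h(x, feature, T):
    """Binary hash: +1 if the chosen feature coordinate exceeds threshold T."""
    return 1 if x[feature] >= T else -1

def pair_accuracy(pairs, labels, feature, T):
    """Fraction of labeled pairs whose same-bucket prediction y_hat
    (+1 if both hash alike, -1 otherwise) matches the true label y."""
    hits = 0
    for (xi, xj), y in zip(pairs, labels):
        y_hat = 1 if h(xi, feature, T) == h(xj, feature, T) else -1
        hits += (y_hat == y)
    return hits / len(pairs)
```

Scanning features and thresholds and keeping those with the highest pair accuracy is a crude stand-in for the probability-constrained selection the paper performs.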
Local Weighted Regression (LWR)
• Given a query image x0, PSH returns its KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query, roughly
  argmin_β Σ_{x_i ∈ N(x0)} d_θ( g(x_i; β), θ_i ) · K( d_x(x_i, x0) )
  where K is a distance-weighting kernel (dist -> weight) and g a local model
Feature Extraction -> PSH -> LWR
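A zeroth-order version of the regression above reduces to a kernel-weighted average of the neighbors' known angles. This simplified sketch (the name `lwr_estimate` and the Gaussian kernel choice are my assumptions; the paper fits a richer local model g):

```python
import math

def lwr_estimate(query_x, neighbors, bandwidth):
    """Zeroth-order LWR: kernel-weighted average of the KNN's known angles.
    neighbors: list of (feature_vector, angle) pairs, e.g. as returned by PSH."""
    def dx(u, v):
        # Euclidean distance in feature space
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    weights = [math.exp(-(dx(query_x, x) / bandwidth) ** 2) for x, _ in neighbors]
    total = sum(weights)
    return sum(w * theta for w, (_, theta) in zip(weights, neighbors)) / total
```

Closer neighbors in feature space dominate the estimate; equidistant neighbors are averaged evenly.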
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples; PSH searched only 3.4% of the data per query
• Without the feature selection, 40 bits and 1,000 hash tables would have been needed
Recall: P1 is the probability of a positive hash, P2 the probability of a bad hash, and B the maximum number of points in a bucket.
Results - real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched

Results - real data: interesting mismatches
Fast pose estimation - summary
• A fast way to compute the angles of the human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn
• Goal: given a query q, preprocess the points in P so as to find a point p_i whose sphere covers the query q
(Courtesy of Mohamad Hegaze)
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance

Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions - using LSH
• Speedups:
  1. Finding the optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
Each point is iteratively shifted to the weighted mean of the points that fall inside its bandwidth window.
[Progress bar: Mean-shift | LSH | optimal k,l | LSH data partition | LSH data struct]
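The shift step can be sketched as follows. This is a generic illustration (the name `mean_shift_step` and the Gaussian kernel are my assumptions, not specific to the paper): one step moves a point to the kernel-weighted mean of its neighborhood.

```python
import math

def mean_shift_step(x, points, h):
    """One mean-shift step with a Gaussian kernel of bandwidth h:
    move x to the kernel-weighted mean of the data points."""
    def w(p):
        d2 = sum((a - b) ** 2 for a, b in zip(x, p))
        return math.exp(-d2 / (2 * h * h))
    ws = [w(p) for p in points]
    total = sum(ws)
    return [sum(wi * p[i] for wi, p in zip(ws, points)) / total
            for i in range(len(x))]
```

Iterating this step drives each point toward a mode of the estimated density; a point already at a symmetric density peak does not move.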
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region:
high density -> small bandwidth; low density -> large bandwidth.
It is based on the kth nearest neighbor of the point: the bandwidth h_i is the distance from x_i to its kth nearest neighbor.
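The adaptive-bandwidth rule above (distance to the kth nearest neighbor) can be sketched directly; a brute-force illustration (the name `adaptive_bandwidths` is mine, and the Euclidean metric is an assumption):

```python
import math

def adaptive_bandwidths(points, k):
    """Per-point bandwidth h_i = distance from x_i to its k-th nearest neighbor,
    so dense regions get small bandwidths and sparse regions large ones."""
    hs = []
    for i, x in enumerate(points):
        dists = sorted(math.dist(x, y) for j, y in enumerate(points) if j != i)
        hs.append(dists[k - 1])
    return hs
```

In practice this is exactly where LSH comes in: the brute-force kth-NN search here is the expensive step that the paper accelerates.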
Adaptive mean-shift vs. non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering
[3D visualization of mean-shift trajectories - from Mean-Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02]

Image segmentation algorithm
[Figures: original, filtered, segmented]
Filtering: each pixel takes the value of the nearest mode
[Mean-shift trajectories]

[Figures: original squirrel vs. filtered; original baboon vs. filtered]
Filtering examples
(Mean-Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02)

Segmentation examples
(Mean-Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02)
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries -> implemented with LSH
• Statistical curse of dimensionality: sparseness of the data -> variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point x we check whether x_{d_k} ≤ v_k; the K boolean results define the point's cell
• This partitions the data into cells
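The cell index described above is just K inequality tests. A minimal sketch (the name `cell_key` is mine): each partition is a list of (dimension, cut-value) pairs, and a point's key is the tuple of test outcomes.

```python
def cell_key(x, cuts):
    """Cell index for one partition: K inequality tests (d_k, v_k) on x,
    returned as a K-bit tuple (1 means x[d_k] <= v_k)."""
    return tuple(int(x[d] <= v) for d, v in cuts)
```

Points sharing a key lie in the same cell of that partition; with L partitions, a query's neighbors are gathered from its L cells, as in ordinary LSH.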
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets
• Large K -> a smaller number of points in each cell
• If L is too small, points might be missed; but if L is too big, extra points might be included
• As L increases, the union of the query's cells grows while each individual cell shrinks; together K and L determine the resolution of the data structure
Choosing optimal K and L
Determine accurately the KNN for m randomly-selected data points, giving the true kth-neighbor distance (bandwidth) for each.
Choose an error threshold ε; the optimal K and L should keep the approximate distance within the threshold of the true one.

Choosing optimal K and L
• For each K, estimate the error
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K)) to find the minimum
[Plots: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)]]
Data driven partitions
• In the original LSH, cut values are chosen at random within the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[Plot: bucket distribution - uniform vs. data-driven cut points]
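The data-driven suggestion above is a one-line change to how cuts are drawn. A small sketch (the name `data_driven_cuts` is mine): each cut takes a random coordinate of a random data point, so cuts concentrate where the data does and buckets are more evenly filled.

```python
import random

def data_driven_cuts(points, K, seed=0):
    """K data-driven cuts: pick a random point and a random coordinate,
    and use that coordinate's value as the cut value."""
    rng = random.Random(seed)
    d = len(points[0])
    cuts = []
    for _ in range(K):
        p = rng.choice(points)
        dim = rng.randrange(d)
        cuts.append((dim, p[dim]))
    return cuts
```

Every cut value is guaranteed to be an actual data coordinate, unlike uniform cuts that may fall in empty regions of the range.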
Additional speedup
Assume that all points in C will converge to the same mode (C acts as a kind of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
[Low dimension vs. high dimension]

A thought for food…
• Choose K and L by sample learning, or take the traditional values
• Can one estimate K and L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning itself requires KNN
15:30 - cookies…
Summary
• LSH trades accuracy for a gain in complexity
• Applications that involve massive data in high dimensions require the fast performance of LSH
• Extension of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• …but in the end, everything depends on your data set
• Try it at home:
  - Visit http://web.mit.edu/andoni/www/LSH/index.html
  - E-mail Alex Andoni (andoni@mit.edu)
  - Test over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for food…
- Summary
- Slide 110
- Thanks
Query
[Figure: the query q is hashed into each of the L tables (1, 2, …, L) and the points in its buckets are checked]
Alternative intuition: random projections
The point p = (8, 2) with C = 11 becomes the unary bit string
11111111000 11000000000
of length d′ = C·d.
Alternative intuition: random projections
Sample k = 3 of the d′ bits: each binary string is reduced to a 3-bit code (000, 100, 110, 001, 101, 111, …), so the points fall into 2³ buckets.
k samplings
Repeating
Repeating L times
Secondary hashing
Support-volume tuning: dataset size vs. storage volume.
The 2^k sparse buckets are mapped by a simple hash into M physical buckets of size B, with M·B = α·n (e.g. α = 2).
The above hashing is locality-sensitive
• Probability that p, q fall in the same bucket:
  Pr = (1 − Distance(p, q) / dimensions)^k
[Plot: collision probability Pr vs. Distance(q, pᵢ), for k = 1 and k = 2 — larger k makes the probability decay faster with distance]
Adopted from Piotr Indyk's slides
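As a sanity check on the collision probability Pr = (1 − Distance(p, q)/d′)^k, here is a small Python simulation; the sizes are made up for illustration, and bit positions are sampled with replacement so the formula holds exactly:

```python
import random

def collision_prob(dist, d_prime, k):
    """Pr[p and q share a bucket] when k bit positions are sampled
    independently: (1 - dist / d_prime) ** k."""
    return (1.0 - dist / d_prime) ** k

rng = random.Random(0)
d_prime = 100
p = [1] * d_prime
q = [1] * d_prime
for i in range(10):          # make Hamming(p, q) = 10
    q[i] = 0

k, trials, hits = 5, 20000, 0
for _ in range(trials):
    positions = [rng.randrange(d_prime) for _ in range(k)]
    if all(p[i] == q[i] for i in positions):
        hits += 1

empirical = hits / trials
theory = collision_prob(10, d_prime, k)   # = 0.9 ** 5, about 0.59
```

The empirical collision rate matches the closed form, and raising k sharpens the gap between near and far points.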
Preview
• General solution – locality sensitive hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• New hashing function
• Still based on sampling
• Using a mathematical trick:
• p-stable distribution for Lp distance
• Gaussian distribution for L2 distance
Central limit theorem
(Weighted Gaussians) = weighted Gaussian

v1, …, vn = real numbers
X1, …, Xn = independent, identically distributed (i.i.d.)
v1·X1 + v2·X2 + … + vn·Xn

For i.i.d. Gaussian Xi:
  Σi vi·Xi ~ ||v||2 · X
Dot product → norm
Norm → distance
For two feature vectors u and v:
  Σi ui·Xi − Σi vi·Xi = Σi (ui − vi)·Xi ~ ||u − v||2 · X
Features vector 1, features vector 2 → distance
Dot product → dot-product distance: projecting both vectors onto the same Gaussian vector preserves their L2 distance in distribution.
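The distributional identity above (a Gaussian projection of u − v has standard deviation ||u − v||2) is easy to check numerically; this is an illustrative sketch with arbitrary vectors of our choosing:

```python
import math
import random

rng = random.Random(42)
u = [3.0, 1.0, -2.0, 0.5, 4.0]
v = [1.0, 2.0,  0.0, 0.5, 1.0]
true_dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))  # ||u - v||_2

n = 50000
samples = []
for _ in range(n):
    # One random projection: a has i.i.d. N(0, 1) entries (2-stable).
    a = [rng.gauss(0.0, 1.0) for _ in range(len(u))]
    samples.append(sum(ai * (ui - vi) for ai, ui, vi in zip(a, u, v)))

mean = sum(samples) / n
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / n)
# mean is near 0 and std is near ||u - v||_2: the projection preserves
# the L2 distance in distribution.
```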
The full Hashing
  h_{a,b}(v) = ⌊(a·v + b) / w⌋
• v: features vector, e.g. [34 82 21 …]
• a: d random numbers, i.i.d. from a p-stable distribution
• b: random phase in [0, w]
• w: discretization step

Worked example: a·v = 7944, b = 34, w = 100:
  ⌊(7944 + 34) / 100⌋ = 79
[Number line: 7800 7900 8000 8100 8200, with 7944 falling in the bucket starting at 7900]
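A minimal Python sketch of h_{a,b}(v) = ⌊(a·v + b)/w⌋ for L2, using Gaussian entries for a; the helper name and the near/far test points are ours, not from the talk:

```python
import math
import random

def make_hash(d, w, rng):
    """One hash h_{a,b}(v) = floor((a . v + b) / w), with a ~ N(0, 1)^d
    (the 2-stable choice for L2) and b uniform in [0, w]."""
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = rng.uniform(0.0, w)
    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
    return h

# The slide's worked example: a.v = 7944, b = 34, w = 100 -> bucket 79.
bucket = math.floor((7944 + 34) / 100)

# Nearby points collide far more often than distant ones:
rng = random.Random(1)
v, near, far = [34.0, 82.0, 21.0], [34.5, 81.5, 21.0], [10.0, 50.0, 90.0]
same_near = same_far = 0
for _ in range(2000):
    h = make_hash(3, 4.0, rng)
    same_near += h(v) == h(near)
    same_far += h(v) == h(far)
```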
Generalization: p-stable distributions
• L2: Central Limit Theorem → Gaussian (normal) distribution
• Lp, 0 < p ≤ 2: Generalized Central Limit Theorem → p-stable distribution (e.g. the Cauchy distribution, which is 1-stable, for L1)
P-Stable summary
• Works for Lp; generalizes to 0 < p ≤ 2
• Improves query time for the r-nearest-neighbor problem:
  O(d·n^(1/(1+ε)) · log n)  →  O(d·n^(1/(1+ε)²) · log n)
(Latest results, reported in e-mail by Alexander Andoni)
Parameters selection
• 90% probability ⇒ best query-time performance
For Euclidean space

Parameters selection …
For Euclidean space:
• A single projection hits an ε-nearest neighbor with Pr = p1
• k projections hit an ε-nearest neighbor with Pr = p1^k
• All L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure a collision (e.g. with 1 − δ ≥ 90%):
  1 − (1 − p1^k)^L ≥ 1 − δ  ⇒  L ≥ log(δ) / log(1 − p1^k)
Reject non-neighbors / accept neighbors
… Parameters selection
[Chart: running time vs. k — candidate-extraction time grows with k while candidate-verification time shrinks; the optimal k balances the two]
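Solving the collision inequality for the smallest integer L gives a one-liner; a sketch (the function name is ours):

```python
import math

def num_tables(p1, k, delta):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta,
    i.e. L >= log(delta) / log(1 - p1**k)."""
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

# e.g. p1 = 0.9, k = 10 projections, 90% success probability (delta = 0.1):
L = num_tables(0.9, 10, 0.1)
```

With p1 = 0.9 and k = 10 this gives L = 6: six tables already guarantee a 90% chance that a true ε-nearest neighbor collides with the query in at least one of them.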
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance
From Piotr Indyk's slides
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – E-mail Alex Andoni (andoni@mit.edu)
  – Test over your own data
  (C code, under Red Hat Linux)
LSH - Applications
• Searching video clips in databases ("Hierarchical, Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short, whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash function construction and parameter tuning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola and T. Darrell:
• Finding sensitive hash functions
Mean Shift Based Clustering in High Dimensions: A Texture Classification Example, B. Georgescu, I. Shimshoni and P. Meer:
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups
The Problem
Given an image x, what are the parameters θ in this image, i.e. angles of joints, orientation of the body, etc.?

Fast Pose Estimation with Parameter Sensitive Hashing
G. Shakhnarovich, P. Viola and T. Darrell
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space, d_x
• Distance metric in angle space:
  d_θ(θ¹, θ²) = Σ_{i=1..m} (1 − cos(θ¹_i − θ²_i))
Example based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute KNN from the database
• Use these KNNs to compute the average angles of the query

Input: query → find KNN in the database of examples → output: average angles of the KNN
The algorithm flow
Input query → features extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms, computed over image regions (A, B, …).
(Feature Extraction → PSH → LWR)
PSH: The basic assumption
There are two metric spaces here: feature space (d_x) and parameter space (d_θ).
We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: the mapping between parameter space (angles) and feature space around a query q — is this magic?]
Parameter Sensitive Hashing (PSH)
The trick:
Estimate the performance of different hash functions on examples, and select those sensitive to d_θ.
The hash functions are applied in feature space, but the KNN are valid in angle space.
• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h
PSH as a classification problem
Labels (r = 0.25):
A pair of examples (x_i, θ_i), (x_j, θ_j) is labeled
  y_ij = +1  if d_θ(θ_i, θ_j) ≤ r
  y_ij = −1  if d_θ(θ_i, θ_j) > (1 + ε)·r

A binary hash function on features:
  h_T(x) = +1 if x > T, −1 otherwise

Predict the labels:
  ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the best T that predicts the true labeling under the probability constraints: h_T(x) will place both examples of a pair in the same bin, or separate them.
Local Weighted Regression (LWR)
• Given a query image x, PSH returns its KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query (distance → weight):
  θ0(x) = argmin Σ_{x_i ∈ N(x)} d_θ(g(x_i), θ_i) · K(d_x(x_i, x))
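A zeroth-order simplification of LWR (a kernel-weighted average of the neighbors' angles, rather than a fitted regression model) can be sketched as follows; the names and toy data are ours, and note that plain averaging ignores angle wrap-around:

```python
import math

def weighted_angle_estimate(query, neighbors, bandwidth):
    """Kernel-weighted average of the neighbors' angle vectors, with a
    Gaussian kernel on the feature-space distance to the query.
    neighbors: list of (feature_vec, angle_vec) pairs."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    weights = [math.exp(-(dist(query, f) / bandwidth) ** 2) for f, _ in neighbors]
    total = sum(weights)
    m = len(neighbors[0][1])
    return [sum(w * th[i] for w, (_, th) in zip(weights, neighbors)) / total
            for i in range(m)]

# Toy example: the first neighbor sits exactly at the query, so the
# estimate is pulled toward its angles (10, 20).
neighbors = [([0.0, 0.0], [10.0, 20.0]),
             ([1.0, 0.0], [30.0, 40.0])]
est = weighted_angle_estimate([0.0, 0.0], neighbors, bandwidth=1.0)
```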
Results
Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (L)
• Test on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without the selection, 40 bits and 1,000 hash tables would have been needed
Recall: P1 is the probability of a positive hash, P2 the probability of a bad hash, B the maximum number of points in a bucket
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched

Results – real data
Interesting mismatches
Fast pose estimation - summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging

Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
• Given n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn
• Goal: preprocess the points in P so that, given a query q, we can find a point p_i whose sphere 'covers' the query q
[Figure: q covered by the sphere of radius r_i around p_i]
Courtesy of Mohamad Hegaze
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[Figure: the mean-shift window (bandwidth) around a point]
[Mean-shift | LSH: optimal k,l | LSH: data partition | LSH: data struct]

KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region:
high density → small bandwidth; low density → large bandwidth.
The bandwidth is based on the kth nearest neighbor of the point.
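The adaptive-bandwidth rule (h_i = distance from x_i to its kth nearest neighbor) and a single flat-kernel mean-shift step can be sketched in Python; the toy points and function names are ours:

```python
import math

def kth_nn_bandwidth(points, k):
    """Adaptive bandwidth: h_i = distance from x_i to its k-th nearest
    neighbor, so dense regions get small windows and sparse ones large."""
    hs = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        hs.append(dists[k - 1])
    return hs

def mean_shift_step(x, points, h):
    """One mean-shift step with a flat kernel: move x to the mean of the
    points inside the window of radius h."""
    window = [p for p in points if math.dist(x, p) <= h]
    n = len(window)
    return [sum(p[d] for p in window) / n for d in range(len(x))]

# Three points in a dense cluster near the origin, one isolated outlier.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
h = kth_nn_bandwidth(points, k=2)         # small near the cluster, large at the outlier
x1 = mean_shift_step((0.2, 0.2), points, h=0.5)   # shifts toward the dense mode
```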
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering
Image segmentation algorithm
[Figure: 3D feature space with mean-shift trajectories; original → filtered → segmented]
Filtering: pixel value of the nearest mode
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Filtering examples
[Figure: original squirrel → filtered; original baboon → filtered]
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Segmentation examples
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point x, we check whether x_{d_k} ≤ v_k; the K boolean results determine its cell
• This partitions the data into cells
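A sketch of that data structure in Python, with uniform random cut values (the data-driven variant of a later slide would draw the cuts from the data instead); the names and sizes are ours:

```python
import random
from collections import defaultdict

def build_partitions(data, K, L, rng):
    """L random partitions; each is K (dimension, cut-value) pairs.
    A point's cell key in one partition is the K-bit vector of tests
    x[dim] <= cut."""
    d = len(data[0])
    partitions = []
    for _ in range(L):
        cuts = [(rng.randrange(d), rng.uniform(0.0, 1.0)) for _ in range(K)]
        table = defaultdict(list)
        for idx, x in enumerate(data):
            key = tuple(x[dim] <= cut for dim, cut in cuts)
            table[key].append(idx)
        partitions.append((cuts, table))
    return partitions

def query_union(q, partitions):
    """Union of the query's cells over the L partitions (the candidate set)."""
    out = set()
    for cuts, table in partitions:
        key = tuple(q[dim] <= cut for dim, cut in cuts)
        out.update(table[key])
    return out

rng = random.Random(0)
data = [[rng.random() for _ in range(5)] for _ in range(200)]
parts = build_partitions(data, K=6, L=4, rng=rng)
cands = query_union(data[0], parts)   # always contains the point itself
```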
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets
• Large K → a smaller number of points in a cell C
• If L is too small, points might be missed; but if L is too big, the union of cells ∪ C_l might include extra points
• As L increases, the union ∪ C_l increases, but the intersection cell C∩ decreases
• K and L determine the resolution of the data structure
Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points, giving the exact distance (bandwidth) for each
• Choose an error threshold ε; the optimal K and L should satisfy it for the approximate distance
• For each K, estimate the error
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
[Plots: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)] and its minimum]
Data driven partitions
• In the original LSH, cut values are chosen at random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[Figure: bucket distribution for uniform vs. data-driven cut points]
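The suggestion above can be illustrated with skewed 1-D data: cuts drawn from the data land in dense regions in proportion to the data's own mass, while uniform cuts only split the range. A toy sketch (names and data are ours):

```python
import random

def data_driven_cut(data, dim, rng):
    """Pick a random data point and use its coordinate as the cut value,
    so dense regions receive proportionally more cuts."""
    return rng.choice(data)[dim]

rng = random.Random(0)
# Skewed 1-D data: 90% of the mass near 0, 10% near 100.
data = [[rng.gauss(0.0, 1.0)] for _ in range(900)] + \
       [[rng.gauss(100.0, 1.0)] for _ in range(100)]

cuts = [data_driven_cut(data, 0, rng) for _ in range(1000)]
frac_low = sum(c < 50.0 for c in cuts) / 1000

# Uniform cuts over the data range split the range, not the mass:
lo = min(x[0] for x in data)
hi = max(x[0] for x in data)
u_cuts = [rng.uniform(lo, hi) for _ in range(1000)]
frac_low_uniform = sum(c < 50.0 for c in u_cuts) / 1000
# frac_low is near 0.9 (mass-proportional); frac_low_uniform is near 0.5
```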
Additional speedup
Assume that all points in C∩ will converge to the same mode (C∩ acts like a type of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
Low dimension High dimension
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• A thought for food: does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30 — cookies…
Summary
• LSH suggests a compromise: a little accuracy is traded for a large gain in complexity (speed)
• Applications that involve massive data in high dimensions require LSH's fast performance
• Extension of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – E-mail Alex Andoni (andoni@mit.edu)
  – Test over your own data
  (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection …
- … Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x, what are the parameters θ in this image, i.e. angles of joints, orientation of the body, etc.?
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results – real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for food…
- Summary
- Slide 110
- Thanks
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naïve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree – Pitfall 1
- Slide 16
- Quadtree – pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x, what are the parameters θ in this image, i.e. angles of joints, orientation of the body, etc.?
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results – real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Alternative intuition random projections
(Figure: binary vectors such as 11000000000, 111111110000, 111000000000 and 111111110001 are projected onto k sampled bits and fall into the 2^3 buckets 000, 100, 110, 001, 101, 111, …; a point p lands in one of the buckets.)
k samplings
Repeating
Repeating L times
Secondary hashing
Support volume tuning: dataset size vs. storage volume. The 2^k primary buckets (each key is a k-bit string such as 011) are hashed again by a simple hash into M buckets of size B, with M·B = αn, α = 2.
The above hashing is locality-sensitive
•Probability(p, q in same bucket) = (1 − Distance(p, q)/d)^k, where d is the number of dimensions.
(Figure: collision probability vs. Distance(q, p_i) for k = 1 and k = 2; a larger k makes the falloff sharper.)
Adopted from Piotr Indyk's slides
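To make the sampling construction concrete, here is a minimal Python sketch of bit-sampling LSH for Hamming space; the toy vectors, parameters and seeds are invented for illustration. It empirically checks the collision probability (1 − dist/d)^k:

```python
import random

def make_bit_sampler(k, dim, rng):
    # g(v): concatenate k bits of v, coordinates drawn with replacement.
    coords = [rng.randrange(dim) for _ in range(k)]
    return lambda v: tuple(v[i] for i in coords)

p = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # toy vectors, Hamming distance 5
q = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
dim, k, trials = 12, 3, 20000
hits = 0
for t in range(trials):
    g = make_bit_sampler(k, dim, random.Random(t))
    hits += g(p) == g(q)
print(hits / trials)  # close to (1 - 5/12)**3 ≈ 0.198
```

Averaged over many random samplers, the collision rate matches the formula on the slide.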
Preview
•General solution – locality sensitive hashing
•Implementation for Hamming space
•Generalization to l2
Direct L2 solution
•New hashing function
•Still based on sampling
•Using a mathematical trick
•P-stable distribution for the Lp distance; Gaussian distribution for the L2 distance
Central limit theorem
v1·X1 + v2·X2 + … + vn·Xn = (weighted Gaussians) = a weighted Gaussian
where v1, …, vn are real numbers and X1, …, Xn are independent, identically distributed (i.i.d.) Gaussian variables. More precisely,
Σ_i v_i·X_i ~ (Σ_i |v_i|²)^(1/2) · X = ‖v‖₂ · X
so a dot product with a Gaussian vector behaves like the norm of v times a single Gaussian X.
Dot Product → Norm → Distance
For two feature vectors u (features vector 1) and v (features vector 2):
Σ_i u_i·X_i − Σ_i v_i·X_i = Σ_i (u_i − v_i)·X_i ~ ‖u − v‖₂ · X
so the difference of the two dot products is a single Gaussian scaled by the L2 distance between the vectors.
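This 2-stability argument can be checked numerically; the sketch below (toy vectors, standard library only) verifies that a Gaussian projection of u − v has standard deviation ‖u − v‖₂:

```python
import math, random

rng = random.Random(0)
u = [1.0, 2.0, 3.0]          # toy feature vectors
v = [0.5, 0.0, 1.0]
diff = [ui - vi for ui, vi in zip(u, v)]
norm = math.sqrt(sum(d * d for d in diff))   # ||u - v||_2

samples = []
for _ in range(50000):
    X = [rng.gauss(0.0, 1.0) for _ in range(len(u))]
    samples.append(sum(d * x for d, x in zip(diff, X)))  # (u - v) . X

mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(norm, std)  # the two agree: (u - v) . X ~ N(0, ||u - v||_2**2)
```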
The full Hashing
h_{a,b}(v) = ⌊(a·v + b) / w⌋
where
•v is the features vector (e.g. [3.4, 8.2, 2.1]),
•a is a vector of d random numbers drawn i.i.d. from a p-stable distribution,
•b is a random phase in [0, w],
•w is the discretization step.
Example: with discretization step w = 1.00 and random phase b = 0.34 ∈ [0, w], a features vector v with a·v = 79.44 is mapped to the bin ⌊(79.44 + 0.34)/1.00⌋ = 79. (Figure: the axis 78.00, 79.00, 80.00, 81.00, 82.00 cut into bins of width w.)
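A minimal sketch of this hash family for the Gaussian (2-stable) case; a, b and w play exactly the roles above, while the seed and the example vectors are invented for illustration:

```python
import math, random

def make_l2_hash(dim, w, rng):
    # a: dim i.i.d. N(0,1) entries (2-stable); b: random phase in [0, w); w: bin width.
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)
    return lambda v: math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)

rng = random.Random(1)
h = make_l2_hash(dim=3, w=4.0, rng=rng)
print(h([3.4, 8.2, 2.1]), h([3.5, 8.1, 2.0]))  # nearby vectors usually share a bin
```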
Generalization: P-Stable distribution
•L2: Central Limit Theorem → Gaussian (normal) distribution.
•Lp, p ∈ (0, 2]: Generalized Central Limit Theorem → p-stable distribution (e.g. Cauchy for L1).
P-Stable summary
•Works for Lp and generalizes to 0 < p ≤ 2.
•Improves the query time for the (1+ε)·r - Nearest Neighbor problem: from O(d·n^(1/(1+ε))·log n) to O(d·n^(1/(1+ε)²)·log n).
(Latest results reported in e-mail by Alexander Andoni.)
Parameters selection
•Require a 90% success probability, then take the best query-time performance.
For Euclidean space
Parameters selection…
For Euclidean space:
•A single projection hits an ε-Nearest Neighbor with Pr = p1.
•k projections hit an ε-Nearest Neighbor with Pr = p1^k.
•L hashings fail to collide with Pr = (1 − p1^k)^L.
•To ensure a collision (e.g. with probability 1 − δ ≥ 90%): 1 − (1 − p1^k)^L ≥ 1 − δ, which gives
L ≥ log(δ) / log(1 − p1^k).
(Accept neighbors, reject non-neighbors.)
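The collision bound gives a direct recipe for the number of hash tables L; a small sketch (the values p1 = 0.9, k = 10 and δ = 0.1 are invented for illustration):

```python
import math

def tables_needed(p1, k, delta):
    # Smallest integer L with 1 - (1 - p1**k)**L >= 1 - delta.
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

# per-projection collision probability p1 = 0.9, k = 10 bits, 90% recall target
print(tables_needed(0.9, 10, 0.10))  # -> 6
```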
…Parameters selection
(Figure: query time vs. k, split into candidates-extraction time and candidates-verification time; the optimal k balances the two.)
Pros & Cons
Pros:
•Better query time than spatial data structures
•Scales well to higher dimensions and larger data sizes (sub-linear dependence)
•Predictable running time
Cons:
•Extra storage overhead
•Inefficient for data with distances concentrated around the average
•Works best for Hamming distance (although it can be generalized to Euclidean space)
•In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
•Requires the radius r to be fixed in advance
From Piotr Indyk's slides
Conclusion
•…but at the end, everything depends on your data set.
•Try it at home:
–Visit http://web.mit.edu/andoni/www/LSH/index.html
–E-mail Alex Andoni (andoni@mit.edu)
–Test over your own data (C code, under Red Hat Linux)
LSH - Applications
•Searching video clips in databases ("Hierarchical, Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
•Searching image databases (see the following)
•Image segmentation (see the following)
•Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
•Texture classification (see the following)
•Clustering (see the following)
•Embedding and manifold learning (LLE and many others)
•Compression – vector quantization
•Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
•Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
•In short, whenever K-Nearest Neighbors (KNN) are needed
Motivation
•A variety of procedures in learning require KNN computation.
•KNN search is a computational bottleneck.
•LSH provides a fast approximate solution to the problem.
•LSH requires hash-function construction and parameter tuning.
Outline
•"Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola and T. Darrell – finding sensitive hash functions.
•"Mean Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni and P. Meer – tuning the LSH parameters; the LSH data structure is used for algorithm speedups.
The Problem
Given an image x, what are the parameters θ in this image, i.e. angles of joints, orientation of the body, etc.?
Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola and T. Darrell
Ingredients
•Input: query image with unknown angles (parameters)
•Database of human poses with known angles
•Image feature extractor – edge detector
•Distance metric in feature space, d_x
•Distance metric in angle space:
d_θ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))
Example based learning
•Construct a database of example images with their known angles.
•Given a query image, run your favorite feature extractor.
•Compute the KNN from the database.
•Use these KNNs to compute the average angles of the query.
Input: query → find the KNN in a database of examples → output: average angles of the KNN.
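The recipe above can be sketched as brute-force KNN plus angle averaging; this is a stand-in for the paper's pipeline, with toy features and "angles", and math.dist playing the role of d_x:

```python
import math

def knn_pose_estimate(query_feat, examples, k=3):
    """examples: list of (feature_vec, angle_vec) pairs with known poses.
    Returns the mean angle vector of the k nearest examples in feature space."""
    ranked = sorted(examples,
                    key=lambda e: math.dist(query_feat, e[0]))[:k]
    m = len(ranked[0][1])
    return [sum(e[1][j] for e in ranked) / k for j in range(m)]

# toy database: 1-D "angles", 2-D features (stand-ins for edge histograms)
db = [([0.0, 0.0], [10.0]), ([0.1, 0.0], [12.0]),
      ([0.0, 0.2], [14.0]), ([5.0, 5.0], [90.0])]
print(knn_pose_estimate([0.05, 0.05], db, k=3))  # -> [12.0]
```

In the paper, the brute-force search is replaced by PSH and the plain mean by locally weighted regression.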
The algorithm flow
Input query → features extraction → processed query → PSH (LSH) lookup in the database of examples → LWR (regression) → output: match.
The image features
Image features are multi-scale edge histograms. (Figure: edge maps of two example images A and B at several scales.)
PSH: The basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ). We want similarity to be measured in the angle space, whereas LSH works on the feature space.
•Assumption: the feature space is closely related to the parameter space.
Insight: Manifolds
•A manifold is a space in which every point has a neighborhood resembling a Euclidean space.
•But the global structure may be complicated: curved.
•For example, lines are 1D manifolds, planes are 2D manifolds, etc.
(Figure: a query q mapped between the parameter space (angles) and the feature space. Is this magic?)
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples and select those sensitive to d_θ. The hash functions are applied in feature space, but the KNN are valid in angle space.
1. Label pairs of examples with similar angles.
2. Define hash functions h on the feature space.
3. Predict the labeling of similar/non-similar examples by using h.
4. Compare the labelings.
5. If the labeling by h is good, accept h; else change h.
PSH as a classification problem
Labels (r = 0.25): a pair of examples (x_i, θ_i), (x_j, θ_j) is labeled
•y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
•y_ij = −1 if d_θ(θ_i, θ_j) > (1 + ε)·r
(Figure: example pairs labeled +1, +1, −1, −1.)
A binary hash function on features:
h_T(x) = +1 if x > T, −1 otherwise.
Predict the labels:
ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise.
Find the best T that predicts the true labeling under the probability constraints: h_T(x) will place both examples in the same bin, or separate them.
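Selecting a hash function can then be scored like a classifier; a minimal sketch, where the thresholded feature phi, the threshold T and the labeled pairs are all invented for illustration:

```python
def hash_accuracy(T, phi, pairs):
    """pairs: ((x_i, x_j), y) with y=+1 for similar angles, -1 for dissimilar.
    h_T puts x in bin +1 iff feature phi exceeds threshold T; a pair is
    predicted +1 iff both points land in the same bin."""
    h = lambda x: 1 if x[phi] > T else -1
    correct = sum((1 if h(xi) == h(xj) else -1) == y for (xi, xj), y in pairs)
    return correct / len(pairs)

pairs = [(((0.2,), (0.3,)), +1),   # similar pair, both below T
         (((0.9,), (0.8,)), +1),   # similar pair, both above T
         (((0.2,), (0.9,)), -1)]   # dissimilar pair, split by T
print(hash_accuracy(T=0.5, phi=0, pairs=pairs))  # -> 1.0
```

Candidate (phi, T) pairs that score well on the labeled pairs are the "parameter sensitive" hash functions kept for the tables.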
Local Weighted Regression (LWR)
•Given a query image x₀, PSH returns its KNNs.
•LWR uses the KNN to compute a weighted average of the estimated angles of the query:
β̂ = argmin_β Σ_{x_i ∈ N(x₀)} d_θ(g(x_i; β), θ_i) · K(d_x(x_i, x₀))
where the kernel K converts distance into weight and g(·; β) is the local model whose value at x₀ gives the estimated angles.
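A zeroth-order variant of this idea (a kernel-weighted mean instead of a fitted local model g) can be sketched as follows; the Gaussian kernel, the bandwidth and the toy neighbors are assumptions:

```python
import math

def lwr_angle(query_feat, neighbors, bandwidth=1.0):
    # neighbors: (feature_vec, angle) pairs returned by PSH.
    # Gaussian kernel K turns feature-space distance into a weight.
    K = lambda d: math.exp(-(d / bandwidth) ** 2)
    w = [K(math.dist(query_feat, f)) for f, _ in neighbors]
    return sum(wi * a for wi, (_, a) in zip(w, neighbors)) / sum(w)

nbrs = [([0.0], 10.0), ([0.1], 12.0), ([2.0], 80.0)]  # toy neighbors
print(round(lwr_angle([0.0], nbrs, bandwidth=0.5), 2))  # -> 10.98
```

The distant outlier neighbor (angle 80.0) gets a near-zero weight, so the estimate stays close to the nearby examples.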
Results
Synthetic data were generated:
•13 angles: 1 for rotation of the torso, 12 for the joints
•150,000 images
•Nuisance parameters added: clothing, illumination, face expression
•1,775,000 example pairs
•Selected 137 out of 5,123 meaningful features (how?)
•18-bit hash functions (k), 150 hash tables (L)
•Test on 1,000 synthetic examples
•PSH searched only 3.4% of the data per query
•Without the selection, 40 bits and 1,000 hash tables would have been needed
Recall: P1 is the probability of a positive hash, P2 the probability of a bad hash, and B the maximum number of points in a bucket.
Results – real data
•800 images
•Processed by a segmentation algorithm
•1.3% of the data were searched
Results – real data: interesting mismatches.
Fast pose estimation - summary
•A fast way to compute the angles of a human body figure
•Moving from one representation space to another
•Training a sensitive hash function
•KNN + smart averaging
Food for Thought
•The basic assumption may be problematic (distance metric, representations).
•The training set should be dense.
•Texture and clutter.
•In general, some features are more important than others and should be weighted.
Food for Thought: Point Location in Different Spheres (PLDS)
•Given n spheres in R^d centered at P = {p1, …, pn}, with radii r1, …, rn.
•Goal: given a query q, preprocess the points in P to find a point p_i whose sphere 'covers' the query q.
(Figure: the query q inside the sphere of radius r_i around p_i.)
Courtesy of Mohamad Hegaze
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example, B. Georgescu, I. Shimshoni and P. Meer
Motivation
•Clustering high dimensional data by using local density measurements (e.g. in feature space).
•Statistical curse of dimensionality: sparseness of the data.
•Computational curse of dimensionality: expensive range queries.
•LSH parameters should be adjusted for optimal performance.
Outline
•Mean-shift in a nutshell + examples
Our scope:
•Mean-shift in high dimensions – using LSH
•Speedups:
1. Finding optimal LSH parameters
2. Data-driven partitions into buckets
3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
(Figure: the mean-shift window of a given bandwidth around a point.)
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth. Based on the kth nearest neighbor x_{i,k} of the point, the bandwidth is h_i = ‖x_i − x_{i,k}‖.
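The adaptive bandwidth rule can be sketched directly with brute-force kth-NN distances on toy points (the paper uses an LSH approximation of the kth neighbor instead):

```python
import math

def adaptive_bandwidths(points, k):
    """h_i = distance from x_i to its k-th nearest neighbor: small in dense
    regions, large in sparse ones."""
    hs = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        hs.append(dists[k - 1])
    return hs

pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]  # dense cluster + outlier
print(adaptive_bandwidths(pts, k=2))  # cluster points get small h, the outlier a large one
```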
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y).
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color).
3. Apply filtering.
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
(Figures: original, filtered, and segmented images; mean-shift trajectories in feature space.)
Filtering: each pixel gets the value of its nearest mode.
Filtering examples
(Figures: original and filtered squirrel; original and filtered baboon.)
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Segmentation examples
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
•Computational curse of dimensionality: expensive range queries – implemented with LSH.
•Statistical curse of dimensionality: sparseness of the data – variable bandwidth.
LSH-based data structure
•Choose L random partitions; each partition includes K pairs (d_k, v_k).
•For each point we check whether x_{d_k} ≤ v_k; the K boolean results define the point's cell.
It partitions the data into cells.
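A minimal sketch of such a partition structure with uniform random cut values; the dimensions, value ranges, seed and query points are invented for illustration:

```python
import random

def make_partition(dim, K, lo=0.0, hi=1.0, rng=random):
    # One partition = K (coordinate d_k, cut value v_k) pairs.
    return [(rng.randrange(dim), rng.uniform(lo, hi)) for _ in range(K)]

def cell(x, partition):
    # A point's cell is the K-bit key of the tests "x[d_k] <= v_k".
    return tuple(x[d] <= v for d, v in partition)

rng = random.Random(0)
parts = [make_partition(dim=2, K=3, rng=rng) for _ in range(4)]  # L = 4 partitions
p, q = (0.20, 0.21), (0.22, 0.19)
same = sum(cell(p, P) == cell(q, P) for P in parts)
print(same, "of", len(parts), "partitions put p and q in the same cell")
```

Nearby points tend to share cells in most partitions, which is exactly what the union over the L partitions exploits.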
Choosing the optimal K and L
•For a query q, we want to compute the smallest possible number of distances to the points in its buckets.
•Large K → a smaller number of points in a cell C.
•If L is too small, points might be missed; but if L is too big, the union of cells C∪ might include extra points.
•As L increases, the union C∪ increases but the precision decreases; K determines the resolution of the data structure.
(The expected numbers of points per cell and per union, N̄_C and N̄_{C∪}, depend on n, K and L.)
Choosing optimal K and L
•Determine accurately the KNN for m randomly-selected data points; the distance to the kth neighbor gives the bandwidth.
•Choose an error threshold ε on the approximate distance.
•The optimal K and L should satisfy the error constraint:
–For each K, estimate the error.
–In one run over all L's, find the minimal L satisfying the constraint, L(K).
–Minimize the running time t(K, L(K)).
(Figures: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)] with its minimum marked.)
Data driven partitions
•In the original LSH, cut values are random in the range of the data.
•Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
(Figure: points-per-bucket distribution for uniform vs. data-driven cut values.)
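The data-driven variant changes only how the cut values are drawn; a sketch under the same kind of toy setup (points, dimensions and seed are invented for illustration):

```python
import random

def data_driven_cuts(points, dim, K, rng):
    # Instead of uniform cuts, draw each cut value from a coordinate of a
    # randomly chosen data point, so the cuts follow the data density.
    cuts = []
    for _ in range(K):
        d = rng.randrange(dim)
        cuts.append((d, rng.choice(points)[d]))
    return cuts

rng = random.Random(0)
pts = [(0.01, 0.02), (0.02, 0.01), (0.03, 0.02), (0.9, 0.95)]  # skewed data
print(data_driven_cuts(pts, dim=2, K=3, rng=rng))
```

Because most points sit in the dense region, most cuts land there too, giving a more even points-per-bucket distribution than uniform cuts.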
Additional speedup
Assume that all points in a cell C will converge to the same mode (C acts like a type of aggregate).
Speedup results
(65,536 points; 1,638 points sampled; k = 100)
Food for thought
Low dimension vs. high dimension.
A thought for food…
•Choose K, L by sample learning, or take the traditional values.
•Can one estimate K, L without sampling?
•Does it help to know the data dimensionality or the data manifold? Intuitively, the dimensionality implies the number of hash functions needed. The catch: efficient dimensionality learning itself requires KNN.
15:30 – cookies…
Summary
•LSH trades a compromise on accuracy for a gain in complexity.
•Applications that involve massive data in high dimension require the fast performance of LSH.
•Extensions of LSH to different spaces (PSH).
•Learning the LSH parameters and hash functions for different applications.
Conclusion
•…but at the end, everything depends on your data set.
•Try it at home:
–Visit http://web.mit.edu/andoni/www/LSH/index.html
–E-mail Alex Andoni (andoni@mit.edu)
–Test over your own data (C code, under Red Hat Linux)
Thanks
•Ilan Shimshoni (Haifa)
•Mohamad Hegaze (Weizmann)
•Alex Andoni (MIT)
•Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search: Problem definition
- Applications
- Naïve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r-Nearest Neighbor
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree - Pitfall 1
- Quadtree - Pitfall 2
- Space partition based algorithms
- Curse of dimensionality
- Curse of dimensionality: Some intuition
- Preview
- Hash function
- Recall: r-Nearest Neighbor
- Locality sensitive hashing
- Hamming Space
- L1 to Hamming Space Embedding
- Construction
- Query
- Alternative intuition: random projections
- k samplings
- Repeating
- Repeating L times
- Secondary hashing
- The above hashing is locality-sensitive
- Direct L2 solution
- Central limit theorem
- Norm Distance
- The full Hashing
- Generalization: P-stable distribution
- P-stable summary
- Parameters selection
- Pros & Cons
- Conclusion
- LSH - Applications
- Motivation
- The Problem: given an image x, what are the parameters θ?
- Ingredients
- Example based learning
- The image features
- PSH: The basic assumption
- Insight: Manifolds
- Parameter Sensitive Hashing (PSH)
- Local Weighted Regression (LWR)
- Results
- Results - real data
- Fast pose estimation - summary
- Food for Thought
- Food for Thought: Point Location in Different Spheres (PLDS)
- Motivation
- Image segmentation algorithm
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for food…
- Summary
- Thanks
Repeating L times
Secondary hashing: support-volume tuning (dataset size vs. storage volume). The 2^k primary buckets (keys such as 011…) of size B are mapped by a simple hash into M buckets, with M·B = αn, α = 2.
The above hashing is locality-sensitive:
• Probability (p, q in same bucket) = [1 − Distance(p, q)/d]^k, where d is the number of dimensions.
(Plots for k = 1 and k = 2: collision probability Pr vs. Distance(q, pi). Adopted from Piotr Indyk's slides.)
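The collision probability above can be checked numerically. The following is a minimal sketch (not from the slides): it compares the closed form (1 − D/d)^k against the empirical rate at which two binary vectors at Hamming distance D agree on k coordinates sampled with replacement. All names and parameter values are illustrative.

```python
import random

def collide_prob(distance: int, d: int, k: int) -> float:
    """Probability that two points at Hamming distance `distance` in {0,1}^d
    agree on all k randomly sampled coordinates: (1 - distance/d) ** k."""
    return (1.0 - distance / d) ** k

def sampled_key(point, coords):
    """Hash a binary vector by reading out the sampled coordinate subset."""
    return tuple(point[i] for i in coords)

# Monte Carlo check that the empirical collision rate matches the closed form.
random.seed(0)
d, k, distance = 100, 5, 20
p = [0] * d
q = p[:]
for i in random.sample(range(d), distance):  # flip `distance` bits to build q
    q[i] = 1
trials = 20000
hits = 0
for _ in range(trials):
    coords = [random.randrange(d) for _ in range(k)]  # sample with replacement
    hits += sampled_key(p, coords) == sampled_key(q, coords)
empirical = hits / trials
```

With D = 20, d = 100, k = 5 the closed form gives 0.8^5 ≈ 0.328, and the empirical rate lands close to it.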
Preview
• General solution - Locality sensitive hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• New hashing function
• Still based on sampling
• Using a mathematical trick
• P-stable distribution for Lp distance; Gaussian distribution for L2 distance
Central limit theorem
• v1, …, vn = real numbers; X1, …, Xn = independent, identically distributed (i.i.d.) Gaussians.
• v1·X1 + v2·X2 + … + vn·Xn is again Gaussian: a weighted sum of Gaussians is a weighted Gaussian.
• Dot product → norm: Σi vi·Xi ~ ||v||2 · X, with X ~ N(0, 1).
• Norm → distance: for features vectors u and v, Σi ui·Xi − Σi vi·Xi = Σi (ui − vi)·Xi ~ ||u − v||2 · X, so the gap between the two projections is distributed as the distance between the vectors times a standard Gaussian.
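The distance-preservation claim above is easy to verify empirically. A minimal sketch (my own illustration, not from the slides): project two feature vectors onto many random Gaussian directions and check that the standard deviation of the projection gap estimates ||u − v||2. The vectors u and v are arbitrary examples.

```python
import random
import statistics

def project(v, x):
    """Dot product <v, x> of a feature vector with a random Gaussian vector."""
    return sum(vi * xi for vi, xi in zip(v, x))

random.seed(1)
u = [3.0, 1.0, 4.0, 1.0, 5.0]
v = [2.0, 7.0, 1.0, 8.0, 2.0]
dist = sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5  # ||u - v||_2

# <u,X> - <v,X> = <u-v, X> ~ ||u-v||_2 * N(0,1),
# so the std of the gap over many draws of X estimates the distance.
gaps = []
for _ in range(20000):
    x = [random.gauss(0.0, 1.0) for _ in u]
    gaps.append(project(u, x) - project(v, x))
estimate = statistics.stdev(gaps)
```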
The full hashing:
h_{a,b}(v) = ⌊(a·v + b) / w⌋
• v - features vector (e.g. [34 82 21 …]).
• a - d random numbers, i.i.d. from a p-stable distribution.
• b - random phase in [0, w].
• w - discretization step.
(Worked example from the slides: with b = 34 and w = 100, a·v + b = 7944 falls in bucket 79.)
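The h_{a,b} formula above can be sketched directly. This is a minimal illustration under my own choices (d = 3, w = 100.0, a fixed RNG seed); the vector [34, 82, 21] is borrowed from the slide, and the 2-stable distribution is the standard Gaussian:

```python
import math
import random

def make_hash(d, w, rng):
    """Draw one L2 LSH function h_{a,b}(v) = floor((a . v + b) / w):
    a has d i.i.d. N(0,1) entries (2-stable), b is uniform in [0, w)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = rng.uniform(0.0, w)
    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
    return h

rng = random.Random(42)
h = make_hash(d=3, w=100.0, rng=rng)
bucket = h([34.0, 82.0, 21.0])   # feature vector borrowed from the slide
```

In a full index one would draw k such functions per table and L tables, exactly as in the Hamming construction earlier.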
Generalization: p-stable distributions
• L2: Central Limit Theorem → Gaussian (normal) distribution.
• Lp (0 < p ≤ 2): Generalized Central Limit Theorem → p-stable distribution (e.g. Cauchy for L1).
P-stable summary
• Works for, and generalizes to, all 0 < p ≤ 2.
• Improves query time: O(d·n^{1/(1+ε)}·log n) → O(d·n^{1/(1+ε)²}·log n).
r-Nearest Neighbor: latest results, reported by email by Alexander Andoni.
Parameters selection (for Euclidean space)
• Target: ≥90% collision probability with the best query-time performance.
Parameters selection… (for Euclidean space)
• A single projection hits an ε-nearest neighbor with Pr = p1.
• k projections hit an ε-nearest neighbor with Pr = p1^k.
• All L hashings fail to collide with Pr = (1 − p1^k)^L.
• To ensure collision (e.g. 1 − δ ≥ 90%): 1 − (1 − p1^k)^L ≥ 1 − δ, i.e. L ≥ log(δ) / log(1 − p1^k).
(Larger k rejects more non-neighbors; larger L accepts more true neighbors.)
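The bound on L above is directly computable. A minimal sketch (example values p1 = 0.9, k = 10, δ = 0.1 are my own):

```python
import math

def choose_L(p1: float, k: int, delta: float) -> int:
    """Smallest number of hash tables L with 1 - (1 - p1**k)**L >= 1 - delta,
    i.e. L >= log(delta) / log(1 - p1**k)."""
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

L = choose_L(p1=0.9, k=10, delta=0.1)
```

For p1 = 0.9 and k = 10, a single table collides with probability 0.9^10 ≈ 0.35, so several tables are needed to reach a 90% overall collision guarantee.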
…Parameters selection
(Plot: query time vs. k - candidate-extraction time grows with k while candidate-verification time shrinks.)
Pros & Cons (from Piotr Indyk's slides)
Pros:
• Better query time than spatial data structures.
• Scales well to higher dimensions and larger data sizes (sub-linear dependence).
• Predictable running time.
Cons:
• Extra storage overhead.
• Inefficient for data with distances concentrated around the average.
• Works best for Hamming distance (although it can be generalized to Euclidean space).
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions).
• Requires the radius r to be fixed in advance.
Conclusion
• …but at the end, everything depends on your data set.
• Try it at home:
 - Visit http://web.mit.edu/andoni/www/LSH/index.html
 - Email Alex Andoni (andoni@mit.edu)
 - Test it over your own data (C code, under Red Hat Linux).
LSH - Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun).
• Searching image databases (see the following).
• Image segmentation (see the following).
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani).
• Texture classification (see the following).
• Clustering (see the following).
• Embedding and manifold learning (LLE and many others).
• Compression - vector quantization.
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan).
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler).
• In short: whenever K-Nearest Neighbors (KNN) are needed.
Motivation
• A variety of procedures in learning require KNN computation.
• KNN search is a computational bottleneck.
• LSH provides a fast approximate solution to the problem.
• LSH requires hash-function construction and parameter tuning.
Outline
• Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola and T. Darrell) - finding sensitive hash functions.
• Mean Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni and P. Meer) - tuning LSH parameters; the LSH data structure is used for algorithm speedups.
The Problem (Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola and T. Darrell)
Given an image x, what are the parameters θ in this image, i.e. angles of joints, orientation of the body, etc.?
Ingredients
• Input: query image with unknown angles (parameters).
• Database of human poses with known angles.
• Image feature extractor - edge detector.
• Distance metric in feature space: d_x.
• Distance metric in angle space: d_θ(θ1, θ2) = Σ_{i=1}^{m} (1 − cos(θ1,i − θ2,i)).
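The angle-space metric d_θ above is a one-liner. A minimal sketch (angles in radians, one entry per joint):

```python
import math

def d_theta(t1, t2):
    """Distance in angle space: sum over joints of 1 - cos(theta1_i - theta2_i).
    Zero for identical poses; each joint contributes at most 2 (opposite angles)."""
    return sum(1.0 - math.cos(a - b) for a, b in zip(t1, t2))

zero = d_theta([0.0, 0.5], [0.0, 0.5])       # identical poses
flip = d_theta([0.0], [math.pi])             # one joint rotated by pi
```

This form is insensitive to the 2π wrap-around of angles, which is why it is used instead of a plain squared difference.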
Example based learning
• Construct a database of example images with their known angles.
• Given a query image, run your favorite feature extractor.
• Compute the KNN from the database.
• Use these KNNs to compute the average angles of the query.
Input: query → find KNN in the database of examples → output: average angles of the KNN.
The algorithm flow: input query → features extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output match.
The image features are multi-scale edge histograms. [Pipeline: Feature Extraction → PSH → LWR]
PSH: the basic assumption
• There are two metric spaces here: feature space (d_x) and parameter space (d_θ).
• We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space.
• But the global structure may be complicated: curved.
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
(Figure: a query q mapped between the parameter space (angles) and the feature space - is this magic?)
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ. The hash functions are applied in feature space, but the KNN are valid in angle space.
• Label pairs of examples with similar angles.
• Define hash functions h on the feature space.
• Predict the labeling of similar/non-similar examples by using h.
• Compare the labelings: if the labeling by h is good, accept h; else change h.
PSH as a classification problem
Labels: a pair of examples (x_i, x_j) is labeled
y_ij = +1 if d_θ(θ_i, θ_j) ≤ r,
y_ij = −1 if d_θ(θ_i, θ_j) ≥ (1 + ε)·r   (r = 0.25).
A binary hash function on features:
h_T(x) = +1 if x ≥ T, −1 otherwise.
Predict the labels:
ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise.
Find the best threshold T that predicts the true labeling within the probability constraints: h_T will place both examples of a pair in the same bin, or separate them.
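The threshold search described above can be sketched as a tiny classification loop. This is my own toy illustration (the pair data, labels and candidate thresholds are invented), not the paper's training procedure:

```python
def best_threshold(feature_pairs, labels, candidates):
    """Pick the T maximizing agreement between the pair labels (+1 similar,
    -1 dissimilar) and the prediction 'same side of T' on a single feature."""
    def predict(xi, xj, T):
        return 1 if (xi >= T) == (xj >= T) else -1
    best, best_correct = None, -1
    for T in candidates:
        correct = sum(predict(xi, xj, T) == y
                      for (xi, xj), y in zip(feature_pairs, labels))
        if correct > best_correct:
            best, best_correct = T, correct
    return best, best_correct

# Toy data: similar pairs (+1) have nearby feature values,
# dissimilar pairs (-1) straddle the value 5.
pairs = [(1.0, 2.0), (8.0, 9.0), (1.0, 9.0), (2.0, 8.0)]
labels = [1, 1, -1, -1]
T, correct = best_threshold(pairs, labels, candidates=[0.0, 5.0, 10.0])
```

A real run would score thousands of candidate (feature, threshold) pairs and keep the most sensitive ones.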
Local Weighted Regression (LWR)
• Given a query image, PSH returns its KNNs.
• LWR uses the KNN to compute a distance-weighted estimate of the angles of the query:
θ_0 = argmin_g Σ_{x_i ∈ N(x_0)} K(d_x(x_i, x_0)) · d_θ(g(x_i), θ_i),
where K is a distance-weighting kernel over the neighborhood N(x_0).
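A drastically simplified stand-in for the LWR step is a kernel-weighted mean of the neighbors' angles (a zeroth-order fit). This sketch is my own illustration under that simplifying assumption; the Gaussian weight kernel and bandwidth are also my choices, not the paper's:

```python
import math

def lwr_estimate(neighbors, bandwidth):
    """Kernel-weighted average of the neighbors' angle estimates.
    Each neighbor is a (feature_distance, theta) pair; closer neighbors
    in feature space get larger Gaussian weights."""
    weights = [math.exp(-(d / bandwidth) ** 2) for d, _ in neighbors]
    total = sum(weights)
    return sum(w * theta for w, (_, theta) in zip(weights, neighbors)) / total

# A neighbor at distance 0 dominates one at distance 1.
theta0 = lwr_estimate([(0.0, 10.0), (1.0, 20.0)], bandwidth=1.0)
```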
Results
Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints.
• 150,000 images.
• Nuisance parameters added: clothing, illumination, face expression.
• 1,775,000 example pairs.
• Selected 137 out of 5,123 meaningful features (how?).
• 18-bit hash functions (k), 150 hash tables (l).
• Tested on 1,000 synthetic examples; PSH searched only 3.4% of the data per query.
• Without selection, 40 bits and 1,000 hash tables would have been needed.
Recall: P1 is the probability of a positive hash, P2 the probability of a bad hash, B the maximal number of points in a bucket.
Results - real data
• 800 images, processed by a segmentation algorithm.
• 1.3% of the data were searched.
• Interesting mismatches shown.
Fast pose estimation - summary
• A fast way to compute the angles of a human body figure.
• Moving from one representation space to another.
• Training a sensitive hash function.
• KNN smart averaging.
Food for Thought
• The basic assumption may be problematic (distance metric, representations).
• The training set should be dense.
• Texture and clutter.
• In general, some features are more important than others and should be weighted.
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn.
• Goal: given a query q, preprocess the points in P to find a point pi whose sphere 'covers' the query q.
(Courtesy of Mohamad Hegaze.)
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni and P. Meer)
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space).
• Statistical curse of dimensionality: sparseness of the data.
• Computational curse of dimensionality: expensive range queries.
• LSH parameters should be adjusted for optimal performance.
Outline
• Mean-shift in a nutshell + examples.
Our scope:
• Mean-shift in high dimensions - using LSH.
• Speedups: 1. finding optimal LSH parameters; 2. data-driven partitions into buckets; 3. additional speedup by using the LSH data structure.
Mean-Shift in a Nutshell
(Figure: the bandwidth window around a point.)
[Mean-shift | LSH optimal k,l | LSH data partition | LSH data struct]
KNN in mean-shift
• The bandwidth should be inversely proportional to the density in the region: high density - small bandwidth; low density - large bandwidth.
• The bandwidth is based on the kth nearest neighbor of the point.
(Figure: adaptive mean-shift vs. non-adaptive.)
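The mean-shift iteration behind these slides can be sketched in a few lines. This is a minimal 1-D illustration with a flat kernel and a fixed bandwidth (an adaptive version would set h per point from its kth nearest neighbor, as described above); the data and parameters are invented:

```python
def mean_shift_mode(x, data, h, eps=1e-6, max_iter=100):
    """Iterate the flat-kernel mean-shift update: move x to the mean of the
    data points within bandwidth h, until the shift falls below eps."""
    for _ in range(max_iter):
        window = [p for p in data if abs(p - x) <= h]
        m = sum(window) / len(window)
        if abs(m - x) < eps:
            return m
        x = m
    return x

data = [0.0, 0.1, 0.2, 10.0, 10.1]   # two well-separated clusters
mode = mean_shift_mode(0.1, data, h=1.0)
```

Starting points converge to the mode of their own cluster, which is exactly what the filtering step of the segmentation algorithm exploits.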
Image segmentation algorithm (Mean-shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02)
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y).
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color).
3. Apply filtering.
(Figure: original → filtered → segmented; filtering assigns each pixel the value of its nearest mode.)
Mean-shift trajectories (figure).
Filtering examples: original squirrel → filtered; original baboon → filtered.
Segmentation examples.
(Mean-shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02.)
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries - implemented with LSH.
• Statistical curse of dimensionality: sparseness of the data - variable bandwidth.
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k).
• For each point x, check whether x_{d_k} ≤ v_k for each of the K cuts.
• This partitions the data into cells.
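The coordinate-cut structure above can be sketched directly, including the data-driven variant discussed later (cut values taken from randomly chosen data points rather than uniformly from the range). All data and parameter values here are illustrative:

```python
import random

def make_partition(data, K, rng):
    """One random partition: K (coordinate, cut-value) pairs, with each cut
    value taken from a randomly selected data point (data-driven variant)."""
    d = len(data[0])
    cuts = []
    for _ in range(K):
        coord = rng.randrange(d)
        point = rng.choice(data)
        cuts.append((coord, point[coord]))
    return cuts

def cell_of(x, cuts):
    """The cell (bucket key) of x: which side of each cut it falls on."""
    return tuple(x[c] <= v for c, v in cuts)

rng = random.Random(7)
data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9)]
partitions = [make_partition(data, K=2, rng=rng) for _ in range(3)]  # L = 3
keys = [cell_of((0.05, 0.1), partitions[i]) for i in range(3)]
```

A query's candidate set is the union of its L cells; the intersection of the cells (C∩ in the paper's notation) is used for the additional speedup below.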
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets.
• Large K - smaller number of points in a cell.
• If L is too small, points might be missed; but if L is too big, extra points might be included.
• As L increases, the union of the cells C∪ increases but their intersection C∩ decreases; K determines the resolution of the data structure.
Choosing the optimal K and L
• Determine accurately the KNN for m randomly-selected data points; this gives the true distance (bandwidth).
• Choose an error threshold ε for the approximate distance.
• The optimal K and L should satisfy the error constraint:
 - For each K, estimate the error.
 - In one run, for all L's, find the minimal L satisfying the constraint: L(K).
 - Minimize the running time t(K, L(K)).
(Plots: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)] and its minimum.)
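The selection procedure above is a small constrained grid search. The following sketch uses toy stand-ins for the measured error and running time (my assumptions, not the paper's models): error falls as K·L grows, time grows with both.

```python
def optimal_kl(Ks, Ls, error, time_cost, eps):
    """For each K, find the minimal L whose estimated error is within eps,
    then return the (K, L(K)) pair with the smallest running-time estimate."""
    candidates = []
    for K in Ks:
        for L in Ls:                      # Ls assumed sorted ascending
            if error(K, L) <= eps:
                candidates.append((K, L))
                break
    return min(candidates, key=lambda kl: time_cost(*kl))

# Hypothetical models: in practice both are measured on the m sample points.
err = lambda K, L: 1.0 / (K * L)
t = lambda K, L: K * L + 3 * K
best = optimal_kl([1, 2, 4], range(1, 101), err, t, eps=0.05)
```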
Data driven partitions
• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
(Figure: uniform vs. data-driven distribution of points per bucket.)
Additional speedup
• Assume that all points in C∩ will converge to the same mode (C∩ acts like a type of aggregate).
Speedup results
• 65,536 points; 1,638 points sampled; k = 100.
Food for thought
(Figure: low dimension vs. high dimension.)
A thought for food…
• Choose K, L by sample learning, or take the traditional values.
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.
15:30 - cookies…
Summary
• LSH suggests a compromise: trade accuracy for a gain in complexity.
• Applications that involve massive data in high dimensions require LSH's fast performance.
• Extension of LSH to different spaces (PSH).
• Learning the LSH parameters and hash functions for different applications.
Conclusion
• …but at the end, everything depends on your data set.
• Try it at home:
 - Visit http://web.mit.edu/andoni/www/LSH/index.html
 - Email Alex Andoni (andoni@mit.edu)
 - Test it over your own data (C code, under Red Hat Linux).
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension vs. high dimension
A thought for food…
• Choose K, L by sample learning, or take the traditional values?
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality, or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.
15:30 cookies…
Summary
• LSH trades some accuracy for a large gain in complexity.
• Applications that involve massive data in high dimension require the fast performance of LSH.
• LSH extends to different spaces (PSH).
• The LSH parameters and hash functions can be learned for different applications.
Conclusion
• But in the end, everything depends on your data set.
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni, andoni@mit.edu
– Test over your own data (C code under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naïve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree – Pitfall 1
- Slide 16
- Quadtree – Pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection …
- … Parameters selection
- Pros & Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x, what are the parameters θ in this image, i.e. angles of joints, orientation of the body, etc.?
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results – real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension vs. high dimension
A thought for food…
• Choose K, L by sample learning, or take the traditional values.
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning itself requires KNN.
15:30: cookies…
Summary
• LSH compromises on accuracy for a gain in complexity.
• Applications that involve massive data in high dimension require LSH's fast performance.
• LSH can be extended to different spaces (PSH).
• The LSH parameters and hash functions can be learned for different applications.
Conclusion
• But at the end, everything depends on your data set.
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni (andoni@mit.edu)
– Test it over your own data
(C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naïve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree – Pitfall 1
- Slide 16
- Quadtree – Pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection …
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x, what are the parameters θ in this image? (i.e., angles of joints, orientation of the body, etc.)
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Direct L2 solution
• New hashing function
• Still based on sampling
• Uses a mathematical trick
• P-stable distributions for Lp distances; the Gaussian distribution for L2
Central limit theorem
v1·X1 + v2·X2 + … + vn·Xn
(a sum of weighted Gaussians is itself a Gaussian)
Central limit theorem
v1, …, vn = real numbers
X1, …, Xn = independent, identically distributed (i.i.d.) random variables
v1·X1 + v2·X2 + … + vn·Xn = a Gaussian
Central limit theorem
Σi vi·Xi = ⟨v, X⟩ ~ ‖v‖2 · X,  where X ~ N(0, 1)
(dot product → norm)
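The 2-stability claim above can be checked numerically: projecting a fixed vector v onto i.i.d. standard-Gaussian vectors yields a scalar whose spread is ‖v‖2. A self-contained sketch (the vector and sample count are illustrative, not from the talk):

```python
import math
import random

random.seed(0)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

v = [3.0, 4.0]                      # ||v||_2 = 5
norm_v = math.sqrt(dot(v, v))

# Project v onto many i.i.d. standard-Gaussian vectors X and
# measure the spread of the resulting scalar <v, X>.
samples = [dot(v, [random.gauss(0.0, 1.0) for _ in v]) for _ in range(20000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))

print(norm_v)  # 5.0
print(std)     # close to 5.0: the projection is Gaussian with std ||v||_2
```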
Norm Distance
Σi ui·Xi − Σi vi·Xi = Σi (ui − vi)·Xi ~ ‖u − v‖2 · X
(features vector 1, features vector 2 → the distance between them)
The full Hashing
h_{a,b}(v) = ⌊(a·v + b) / w⌋
• v – features vector, e.g. [34, 82, 21, …]
• a – d random numbers
• b – random phase in [0, w]
• w – discretization step
The full Hashing – worked example: with discretization step w = 100 and phase b = 34, a projection value of 7944 falls into bucket ⌊7944 / 100⌋ = 79, i.e. the interval [7900, 8000).
The full Hashing
h_{a,b}(v) = ⌊(a·v + b) / w⌋
• v – features vector
• a = (a1, …, ad), i.i.d. from a p-stable distribution
• b – random phase in [0, w]
• w – discretization step
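The hash family h_{a,b}(v) = ⌊(a·v + b)/w⌋ sketched above can be written in a few lines; the dimension, the step w, and the test points below are illustrative choices, not values from the talk:

```python
import math
import random

random.seed(1)

def make_hash(d, w):
    """One p-stable LSH function for L2: h(v) = floor((a . v + b) / w),
    with a drawn i.i.d. Gaussian (2-stable) and b uniform in [0, w)."""
    a = [random.gauss(0.0, 1.0) for _ in range(d)]
    b = random.uniform(0.0, w)
    return lambda v: math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)

p = [1.0, 2.0, 3.0]
q = [1.1, 2.0, 3.1]      # close to p
r = [40.0, -7.0, 12.0]   # far from p

hs = [make_hash(d=3, w=4.0) for _ in range(200)]
close = sum(h(p) == h(q) for h in hs)
far = sum(h(p) == h(r) for h in hs)
print(close > far)  # True: close points collide far more often than far ones
```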
Generalization P-Stable distribution
• L2:
 • Central Limit Theorem
 • Gaussian (normal) distribution
• Lp, 0 < p ≤ 2:
 • Generalized Central Limit Theorem
 • p-stable distribution (e.g. the Cauchy distribution for L1)
P-Stable summary
• Works for the r-nearest neighbor problem
• Generalizes to 0 < p ≤ 2
• Improves query time: O(d·n^(1/(1+ε))·log n) → O(d·n^(1/(1+ε)²)·log n)
 (latest results, reported by e-mail by Alexander Andoni)
Parameters selection
• 90% success probability; best query-time performance
For Euclidean Space
Parameters selection…
For Euclidean Space
• A single projection hits an ε-nearest neighbor with Pr = p1
• k projections all hit with Pr = p1^k
• All L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure a collision (e.g. with probability 1 − δ ≥ 90%):
 1 − (1 − p1^k)^L ≥ 1 − δ  ⇒  L ≥ log(δ) / log(1 − p1^k)
(Accept neighbors, reject non-neighbors)
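The bound above gives a direct recipe for the number of hash tables. A small sketch (the values of p1, k and δ are illustrative):

```python
import math

def tables_needed(p1, k, delta):
    """Minimal L with 1 - (1 - p1**k)**L >= 1 - delta,
    i.e. L >= log(delta) / log(1 - p1**k)."""
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

# e.g. p1 = 0.9 per projection, k = 10 projections, 90% success target
L = tables_needed(p1=0.9, k=10, delta=0.1)
print(L)  # 6
```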
…Parameters selection
(Figure: query time as a function of k, split into candidate-extraction time and candidate-verification time.)
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance
(From Piotr Indyk's slides)
Conclusion
• …but in the end, everything depends on your data set
• Try it at home
 – Visit http://web.mit.edu/andoni/www/LSH/index.html
 – E-mail Alex Andoni: andoni@mit.edu
 – Test it over your own data
 (C code, under Red Hat Linux)
LSH – Applications
• Searching video clips in databases ("Hierarchical, Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash-function construction and parameter tuning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola and T. Darrell)
• Finding sensitive hash functions
Mean Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni and P. Meer)
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups
The Problem
Given an image x, what are the parameters θ in this image, i.e. the angles of the joints, the orientation of the body, etc.?
Fast Pose Estimation with Parameter Sensitive Hashing
G. Shakhnarovich, P. Viola and T. Darrell
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: d_x
• Distance metric in angle space:
 dθ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1i − θ2i))
Example-based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
Input: query → find the KNN in the database of examples → output: the average angles of the KNN
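The example-based recipe can be sketched end to end. The toy "edge histogram" features and angle values below are invented for illustration; the per-joint averaging uses a circular mean so that angles near ±π average sensibly:

```python
import math

def feature_dist(x1, x2):
    return sum(abs(a - b) for a, b in zip(x1, x2))  # L1 on edge histograms

def estimate_pose(query_feat, database, k=3):
    """database: list of (features, angles). Return the per-joint circular
    mean of the angles of the k nearest examples in feature space."""
    knn = sorted(database, key=lambda ex: feature_dist(query_feat, ex[0]))[:k]
    m = len(knn[0][1])
    est = []
    for j in range(m):  # circular mean of joint j over the neighbors
        s = sum(math.sin(ex[1][j]) for ex in knn)
        c = sum(math.cos(ex[1][j]) for ex in knn)
        est.append(math.atan2(s, c))
    return est

# Toy database: features are fake 2-bin "edge histograms" (illustrative only).
db = [([0.1, 0.9], [0.0, 1.0]),
      ([0.2, 0.8], [0.1, 1.1]),
      ([0.9, 0.1], [3.0, -1.0])]
pose = estimate_pose([0.15, 0.85], db, k=2)
print([round(a, 2) for a in pose])  # [0.05, 1.05]
```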
The algorithm flow:
input query → features extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms.
(Pipeline: Feature Extraction → PSH → LWR)
PSH: the basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (dθ). We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space
Insight: manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated and curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
(Figure: a query q and its neighbors shown both in the parameter space of angles and in the feature space.)
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick:
• Estimate the performance of different hash functions on examples, and select those sensitive to dθ
• The hash functions are applied in feature space, but the KNN are valid in angle space
• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h
PSH as a classification problem
Labels (with r = 0.25): a pair of examples (x_i, x_j) is labeled
 y_ij = +1 if dθ(θ_i, θ_j) ≤ r
 y_ij = −1 if dθ(θ_i, θ_j) > (1 + ε)·r
A binary hash function on the features:
 h_T(x) = +1 if x ≥ T, −1 otherwise
Predict the labels:
 ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the best T that predicts the true labeling with the probability constraints: h_T will place both examples in the same bin, or separate them.
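Selecting a threshold hash by its agreement with the true pair labels can be sketched as follows; the features, thresholds and labels are made up for illustration:

```python
def hash_label(h, xi, xj):
    """+1 if the binary hash puts both examples in the same bin."""
    return 1 if h(xi) == h(xj) else -1

def make_threshold_hash(feature_idx, T):
    return lambda x: 1 if x[feature_idx] >= T else -1

def best_hash(pairs, candidates):
    """pairs: list of (xi, xj, y) with y = +1 for similar poses, -1 otherwise.
    Pick the candidate hash that agrees with the true labels most often."""
    def accuracy(h):
        return sum(hash_label(h, xi, xj) == y for xi, xj, y in pairs) / len(pairs)
    return max(candidates, key=accuracy)

# Toy data: feature 0 tracks pose similarity, feature 1 is noise.
pairs = [([0.1, 0.9], [0.2, 0.1], +1),
         ([0.8, 0.5], [0.9, 0.9], +1),
         ([0.1, 0.4], [0.9, 0.5], -1),
         ([0.2, 0.6], [0.8, 0.6], -1)]
cands = [make_threshold_hash(0, 0.5), make_threshold_hash(1, 0.5)]
h = best_hash(pairs, cands)
print(h is cands[0])  # True: the pose-sensitive feature wins
```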
Local Weighted Regression (LWR)
• Given a query image x0, PSH returns its KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query (distance → weight):
 β0 = argmin_β Σ_{x_i ∈ N(x0)} K(d_x(x_i, x0)) · dθ(g(x_i; β), θ_i)
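A zeroth-order version of this step, i.e. a kernel-weighted average of the neighbors' angles (a constant local fit rather than the paper's full local regression), can be sketched as:

```python
import math

def lwr_estimate(query_feat, knn, bandwidth=1.0):
    """knn: list of (features, angles) returned by PSH.
    Zeroth-order LWR: a kernel-weighted average of the neighbors' angles.
    (A linear average of angles; fine when the neighbor angles are close.)"""
    def k(d):  # Gaussian kernel on the feature-space distance
        return math.exp(-(d / bandwidth) ** 2)
    ws = [k(sum(abs(a - b) for a, b in zip(query_feat, f))) for f, _ in knn]
    total = sum(ws)
    m = len(knn[0][1])
    return [sum(w * ang[j] for w, (_, ang) in zip(ws, knn)) / total
            for j in range(m)]

# Toy neighbors: 1-D features, 2 joint angles each (illustrative values).
knn = [([0.0], [1.0, 2.0]), ([1.0], [3.0, 4.0])]
est = lwr_estimate([0.0], knn, bandwidth=1.0)
print([round(a, 2) for a in est])  # [1.54, 2.54]
```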
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (L)
• Tested on 1,000 synthetic examples; PSH searched only 3.4% of the data per query
• Without the feature selection, 40 bits and 1,000 hash tables were needed
Recall: p1 is the probability of a positive hash, p2 the probability of a bad hash, and B the maximum number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 13% of the data were searched
Results – real data
Interesting mismatches
Fast pose estimation - summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN + smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d, centered at P = {p1, …, pn} with radii r1, …, rn
• Goal: given a query q, preprocess the points in P to find a point p_i whose sphere 'covers' the query q
(Figure: q falls inside the sphere of radius r_i around p_i.)
Courtesy of Mohamad Hegaze
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
 1. Finding optimal LSH parameters
 2. Data-driven partitions into buckets
 3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
(Figure: a window of radius equal to the bandwidth around a point is shifted toward the local mean.)
(Progress: Mean-shift | LSH: optimal k, l | LSH: data partition | LSH data struct)
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region: high density → small bandwidth; low density → large bandwidth.
Based on the kth nearest neighbor of the point, the bandwidth is the distance to that kth neighbor.
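Both ideas (the mean-shift iteration and the kth-neighbor bandwidth) fit in a few lines. A 1-D Gaussian-kernel sketch on toy data, not the paper's implementation:

```python
import math

def mean_shift_mode(points, start, bandwidth, iters=100):
    """1-D mean shift with a Gaussian kernel: repeatedly move x to the
    kernel-weighted mean of the data until it settles on a density mode."""
    x = start
    for _ in range(iters):
        w = [math.exp(-((p - x) / bandwidth) ** 2) for p in points]
        x_new = sum(wi * p for wi, p in zip(w, points)) / sum(w)
        if abs(x_new - x) < 1e-12:
            break
        x = x_new
    return x

def knn_bandwidth(points, x, k):
    """Adaptive bandwidth: distance from x to its kth nearest neighbor."""
    return sorted(abs(p - x) for p in points)[k]

data = [0.9, 1.0, 1.1, 5.0, 5.1]
h = knn_bandwidth(data, 0.9, k=2)   # dense region -> small bandwidth
mode = mean_shift_mode(data, 0.9, h)
print(round(mode, 2))  # 1.0: converges to the local density mode
```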
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 spatial x, y) or 3D (1 gray + 2 spatial x, y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering
From "Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
(Figures: original, filtered, and segmented images; mean-shift trajectories.)
Filtering: each pixel takes the value of its nearest mode.
Filtering examples
(Figures: original vs. filtered squirrel and baboon images.)
From "Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Segmentation examples
From "Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point x_i we check whether x_{i,d_k} ≤ v_k
• This partitions the data into cells
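The cell structure can be sketched as follows; K, L and the toy points are illustrative, and the cut values here are drawn uniformly over each dimension's range (as in the original LSH):

```python
import random

random.seed(3)

def cell_key(x, cuts):
    # K answers to "is x[d] <= v?" name the point's cell in this partition
    return tuple(x[d] <= v for d, v in cuts)

def build_tables(points, K, L):
    """L random partitions, each defined by K (dimension, cut-value) pairs."""
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]
    tables = []
    for _ in range(L):
        cuts = [(d, random.uniform(lo[d], hi[d]))
                for d in [random.randrange(dims) for _ in range(K)]]
        cells = {}
        for p in points:
            cells.setdefault(cell_key(p, cuts), []).append(p)
        tables.append((cuts, cells))
    return tables

def candidates(tables, q):
    """Union over the L partitions of the query's cell = candidate neighbors."""
    out = set()
    for cuts, cells in tables:
        out.update(cells.get(cell_key(q, cuts), []))
    return out

pts = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (9.0, 9.0), (9.1, 8.9)]
cand = candidates(build_tables(pts, K=4, L=8), (0.05, 0.05))
print((0.1, 0.1) in cand)  # the near point shares a cell in some table
```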
Choosing the optimal K and L
• For a query q, distances are computed only to the points in its buckets; choose K and L so that this number is smallest
• Large K → a smaller number of points in each cell
• If L is too small, points might be missed; but if L is too big, the union of cells C∪ = ∪_l C_l might include extra points
• As L increases, the union C∪ increases but the intersection C∩ decreases
• K determines the resolution of the data structure
Choosing optimal K and L
• Determine accurately the KNN distance (bandwidth) for m randomly-selected data points
• Choose an error threshold ε
• The optimal K and L should satisfy: the approximate distance is within a factor (1 + ε) of the true KNN distance
Choosing optimal K and L
• For each K, estimate the error for every L
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
(Figures: the approximation error for K, L; L(K) for ε = 0.05; the running time t[K, L(K)] and its minimum.)
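The selection procedure reduces to a small search. In the sketch below, the error and time functions are arbitrary stand-ins for quantities that would be measured on the m sampled points:

```python
def pick_parameters(error, time, Ks, Ls, eps):
    """For each K, find the minimal L with error(K, L) <= eps, then return
    the (K, L) pair minimizing time(K, L)."""
    best = None
    for K in Ks:
        L_ok = next((L for L in Ls if error(K, L) <= eps), None)
        if L_ok is None:
            continue  # no L meets the error constraint for this K
        t = time(K, L_ok)
        if best is None or t < best[0]:
            best = (t, K, L_ok)
    return best and (best[1], best[2])

# Illustrative stand-ins: error falls with L and rises with K; time grows
# with both. Real code would measure these on the m sampled points.
error = lambda K, L: K / (L + 1)
time = lambda K, L: K + 2 * L
result = pick_parameters(error, time, Ks=range(1, 6), Ls=range(1, 30), eps=0.5)
print(result)  # (1, 1)
```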
Data driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
(Figure: points-per-bucket distribution for uniform vs. data-driven cut values.)
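The effect of the suggestion is easy to demonstrate on skewed data: cut values drawn from the data itself produce far better-balanced buckets than cuts drawn uniformly over the range. A toy 1-D sketch (all values illustrative):

```python
import random

random.seed(7)

# Skewed 1-D data: a tight cluster plus a few distant outliers.
data = [random.gauss(0.0, 0.1) for _ in range(990)] + [100.0 * i for i in range(1, 11)]
lo, hi = min(data), max(data)

def min_bucket(cut):
    left = sum(x <= cut for x in data)
    return min(left, len(data) - left)  # size of the smaller bucket

# Original LSH: cut values uniform over the data range.
u_cuts = [random.uniform(lo, hi) for _ in range(200)]
# Data-driven: cut values are coordinates of randomly chosen points.
d_cuts = [random.choice(data) for _ in range(200)]

avg_u = sum(map(min_bucket, u_cuts)) / 200
avg_d = sum(map(min_bucket, d_cuts)) / 200
print(avg_u < avg_d)  # True: data-driven cuts give far better-balanced buckets
```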
Additional speedup
Assume that all points in C∩ will converge to the same mode (C∩ is like a type of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
(Figure: low dimension vs. high dimension.)
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning itself requires KNN
15:30 – cookies…
Summary
• LSH offers a compromise: some accuracy is traded for a large gain in complexity
• Applications that involve massive data in high dimensions require LSH's fast performance
• LSH extends to different spaces (PSH)
• The LSH parameters and hash functions can be learned for different applications
Conclusion
• …but in the end, everything depends on your data set
• Try it at home
 – Visit http://web.mit.edu/andoni/www/LSH/index.html
 – E-mail Alex Andoni: andoni@mit.edu
 – Test it over your own data
 (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naïve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree – Pitfall 1
- Slide 16
- Quadtree – pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection …
- … Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x, what are the parameters θ in this image? (i.e. angles of joints, orientation of the body, etc.)
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results – real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search: problem definition
- Applications
- Naïve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r-Nearest Neighbor
- The simplest solution
- Quadtree
- Quadtree – structure
- Quadtree – query
- Quadtree – pitfall 1
- Quadtree – pitfall 2
- Space-partition-based algorithms
- Curse of dimensionality
- Curse of dimensionality: some intuition
- Preview
- Hash function
- Recall: r-Nearest Neighbor
- Locality sensitive hashing
- Hamming space
- L1 to Hamming space embedding
- Construction
- Query
- Alternative intuition: random projections
- k samplings
- Repeating
- Repeating L times
- Secondary hashing
- The above hashing is locality-sensitive
- Direct L2 solution
- Central limit theorem
- Norm / Distance
- The full hashing
- Generalization: p-stable distribution
- p-stable summary
- Parameters selection
- Parameters selection …
- … Parameters selection
- Pros & cons
- Conclusion
- LSH – applications
- Motivation
- The problem: given an image x, what are the parameters θ in this image (angles of joints, orientation of the body, etc.)?
- Ingredients
- Example-based learning
- The image features
- PSH: the basic assumption
- Insight: manifolds
- Parameter-Sensitive Hashing (PSH)
- Local Weighted Regression (LWR)
- Results
- Results – real data
- Fast pose estimation – summary
- Food for thought
- Food for thought: Point Location in Different Spheres (PLDS)
- Motivation
- Image segmentation algorithm
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Data-driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for food…
- Summary
- Thanks
Norm / Distance
• Norm distance between two features vectors u and v:
  \|u - v\|_2 = ( \sum_i |u_i - v_i|^2 )^{1/2}
• Dot-product distance:
  \langle u, v \rangle = \sum_i u_i v_i
The full hashing
• h_{a,b}(v) = \lfloor (a \cdot v + b) / w \rfloor
• v: the features vector, e.g. v = [34, 82, 21, …]
• a = (a_1, …, a_d): d random numbers, drawn i.i.d. from a p-stable distribution
• b: a random phase, drawn uniformly from [0, w]
• w: the discretization step
• Example: with w = 100, a projected value a·v + b = 7944 falls between the cut points …, 7800, 7900, 8000, 8100, 8200, … into the bucket [7900, 8000), i.e. h_{a,b}(v) = 79.
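The hash above can be sketched in a few lines. This is a minimal illustration, not the E2LSH implementation: it draws a from N(0, 1) (the 2-stable distribution, so a·v preserves L2 distances in expectation) and b uniformly from [0, w); the vector sizes and w = 100 follow the slide's example.

```python
import numpy as np

def make_pstable_hash(d, w, rng):
    """One L2 (2-stable) LSH function: h(v) = floor((a . v + b) / w)."""
    a = rng.standard_normal(d)       # d i.i.d. samples from a 2-stable (Gaussian) law
    b = rng.uniform(0.0, w)          # random phase in [0, w)
    return lambda v: int(np.floor((a @ v + b) / w))

rng = np.random.default_rng(0)
h = make_pstable_hash(d=3, w=100.0, rng=rng)
v1 = np.array([34.0, 82.0, 21.0])
v2 = v1 + 0.1          # a nearby point: usually lands in the same bucket
v3 = v1 + 1000.0       # a far-away point: usually lands in a different bucket
bucket = h(v1)         # an integer bucket index
```

Collisions are probabilistic, which is why k such functions are concatenated and L tables are used, as discussed below.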
Generalization: p-stable distributions
• For L2: the Central Limit Theorem gives the Gaussian (normal) distribution, which is 2-stable.
• For Lp, 0 < p ≤ 2: the Generalized Central Limit Theorem gives a p-stable distribution (e.g. the Cauchy distribution, which is 1-stable, for L1).

p-stable summary
• Works for, and generalizes LSH to, every Lp with 0 < p ≤ 2
• Improves the query time for the r-nearest-neighbor problem:
  from O(d · n^{1/(1+ε)} · log n) to O(d · n^{1/(1+ε)^2} · log n)
  (latest results, reported by email by Alexander Andoni)
Parameters selection
• For Euclidean space: choose k and L for, e.g., 90% success probability at the best query-time performance.

Parameters selection…
• A single projection hits an ε-nearest neighbor with probability p1.
• A concatenation of k projections hits it with probability p1^k.
• All L hash tables fail to collide with probability (1 − p1^k)^L.
• To ensure a collision with probability at least 1 − δ (e.g. 1 − δ ≥ 90%):
  1 − (1 − p1^k)^L ≥ 1 − δ  ⇒  L ≥ log(δ) / log(1 − p1^k)
• Small k accepts neighbors (but also non-neighbors); large k rejects non-neighbors.

…Parameters selection
• As k grows, the candidate-extraction time falls while the candidate-verification time rises; the total query time is minimized at an intermediate k.
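The bound above translates directly into code. A small helper, with illustrative numbers (p1 = 0.8, k = 10, δ = 0.1) that are assumptions, not values from the slides:

```python
import math

def tables_needed(p1, k, delta):
    """Minimal number of hash tables L so that an eps-NN (per-projection
    collision probability p1, k concatenated projections) collides in at
    least one table with probability >= 1 - delta:
    L >= log(delta) / log(1 - p1**k)."""
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

# e.g. p1 = 0.8, k = 10 bits per table, 90% success probability (delta = 0.1)
L = tables_needed(0.8, 10, 0.1)
```

Note how quickly L grows with k: each extra bit lowers p1^k geometrically, which is the extraction/verification trade-off described above.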
Pros & cons
+ Better query time than spatial data structures
+ Scales well to higher dimensions and larger data sizes (sub-linear dependence)
+ Predictable running time
− Extra storage overhead
− Inefficient for data with distances concentrated around the average
− Works best for Hamming distance (although it can be generalized to Euclidean space)
− In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
− Requires the radius r to be fixed in advance
(From Piotr Indyk's slides)
Conclusion
• …but in the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test it on your own data
  (C code, under Red Hat Linux)
LSH – applications
• Searching video clips in databases ("Hierarchical, Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever k-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash-function construction and parameter tuning
Outline
• Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, T. Darrell): finding sensitive hash functions
• Mean Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, P. Meer): tuning LSH parameters; the LSH data structure is used for algorithm speedups
The Problem
Given an image x, what are the parameters θ in this image, i.e. the angles of the joints, the orientation of the body, etc.?
(Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola, T. Darrell)

Ingredients
• Input: a query image with unknown angles (parameters)
• A database of human poses with known angles
• An image feature extractor – edge detector
• A distance metric in feature space, d_x
• A distance metric in angles space:
  d_θ(θ_1, θ_2) = \sum_{i=1}^{m} (1 - \cos(θ_{1,i} - θ_{2,i}))
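The angle-space metric above is easy to state in code; this sketch just transcribes the formula:

```python
import math

def angle_distance(theta1, theta2):
    """Distance in parameter (angles) space:
    d_theta = sum_i (1 - cos(theta1_i - theta2_i)).
    Each term is 0 for matching joint angles and 2 for opposite ones,
    and the metric is insensitive to 2*pi wrap-around."""
    return sum(1.0 - math.cos(a - b) for a, b in zip(theta1, theta2))
```

For instance, a joint rotated by π contributes 2 to the distance, while a full 2π rotation contributes nothing.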
Example-based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query

The algorithm flow
Input query → features extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match (the average angles of the KNN)
The image features
Image features are multi-scale edge histograms.
[Figure: edge-direction histograms computed over image windows A and B at several scales.]
(Pipeline: Feature Extraction → PSH → LWR)
PSH: the basic assumption
• There are two metric spaces here: the feature space (with metric d_x) and the parameter space (with metric d_θ).
• We want similarity to be measured in the angles space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.

Insight: manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space.
• But the global structure may be complicated: curved.
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: a query q mapped between the parameters space (angles) and the feature space. Is this magic?]
Parameter-Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those that are sensitive to d_θ. The hash functions are applied in feature space, but the KNN are valid in angle space.

PSH as a classification problem
• Label pairs of examples by their distance in angle space (e.g. r = 0.25):
  y_{ij} = +1 if d_θ(θ_i, θ_j) ≤ r
  y_{ij} = −1 if d_θ(θ_i, θ_j) ≥ (1 + ε) r
• Define binary hash functions h on the feature space, each thresholding a feature:
  h_T(x) = +1 if x ≥ T, −1 otherwise
• Predict the labeling of similar/non-similar examples by using h:
  ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
  (h_T will place both examples in the same bin, or separate them)
• Compare the predicted labeling with the true one; if the labeling by h is good, accept h, else change h.
• Find the best T: the threshold that predicts the true labeling subject to the probability constraints.
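The selection loop above can be sketched as follows. This is a deliberately simplified stand-in: plain pair-classification accuracy replaces the paper's probability constraints, and candidate thresholds are sampled at random, so treat the whole function as a hypothetical illustration.

```python
import numpy as np

def select_sensitive_hashes(X, pairs, n_hashes, rng, n_cand=50):
    """Pick single-feature threshold hashes h_T(x) = [x[f] >= T] that best
    reproduce the angle-space pair labels y_ij (+1 similar, -1 dissimilar).
    A pair is predicted 'similar' when both points fall on the same side
    of the threshold (same bin)."""
    chosen = []
    for _ in range(n_hashes):
        best = None
        for _ in range(n_cand):                      # random candidates (f, T)
            f = int(rng.integers(X.shape[1]))
            T = rng.uniform(X[:, f].min(), X[:, f].max())
            bits = X[:, f] >= T
            acc = np.mean([(bits[i] == bits[j]) == (y > 0)
                           for i, j, y in pairs])    # agreement with true labels
            if best is None or acc > best[0]:
                best = (acc, f, T)
        chosen.append(best)
    return chosen  # list of (accuracy, feature index, threshold)
```

On data where one feature tracks the pose parameters, the selected thresholds concentrate on that feature, which is exactly the "sensitivity" PSH is after.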
Local Weighted Regression (LWR)
• Given a query image x, PSH returns its KNNs.
• LWR uses the KNN to compute a weighted average of the estimated angles of the query: fit a local model g(x; β) to the neighbors, weighting each neighbor by a kernel K of its feature-space distance to the query (distance → weight):
  β_0 = argmin_β \sum_{x_i ∈ N(x)} d_θ(g(x_i; β), θ_i) · K(d_x(x_i, x))
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Tested on 1,000 synthetic examples; PSH searched only 3.4% of the data per query
• Without the feature selection, 40 bits and 1,000 hash tables would have been needed
(Recall: p1 is the probability of a positive hash, p2 the probability of a bad hash, and B the maximum number of points in a bucket.)
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
• Interesting mismatches occur

Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN + smart averaging
Food for thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted

Food for thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in R^d, centered at P = {p_1, …, p_n}, with radii r_1, …, r_n
• Goal: given a query q, preprocess the points in P so as to find a point p_i whose sphere covers the query q
(Courtesy of Mohamad Hegaze)
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance

Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer

Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
Each point is repeatedly shifted toward the mean of the data inside a window of a given bandwidth, until it converges to a mode of the density.
(Progress: mean-shift → LSH → optimal k, l → LSH data partition → LSH data structure)

KNN in mean-shift
• The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth.
• Base it on the k-th nearest neighbor of the point: the bandwidth is the distance to that neighbor, h_i = ||x_i − x_{i,k}||.
• Adaptive mean-shift vs. non-adaptive.
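The adaptive scheme above fits in a few lines. This sketch uses brute-force neighbor search where the paper uses the LSH structure, and a Gaussian kernel as an assumption; it is meant only to show how the per-point bandwidth plugs into the mean-shift iteration.

```python
import numpy as np

def adaptive_mean_shift(X, k=5, n_iter=30):
    """Adaptive mean-shift: each data point's bandwidth h_i is its distance
    to its k-th nearest neighbor (small windows in dense regions, large in
    sparse ones); each point is then iteratively shifted to the weighted
    mean of the data seen through those windows until it reaches a mode."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    h = np.sort(np.sqrt(d2), axis=1)[:, k]          # per-point bandwidths
    modes = X.copy()
    for _ in range(n_iter):
        for i in range(n):
            w = np.exp(-((modes[i] - X) ** 2).sum(-1) / (h ** 2))
            modes[i] = (w[:, None] * X).sum(0) / w.sum()
    return modes
```

On well-separated clusters, every point's trajectory stays inside its own cluster and converges to that cluster's mode.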
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths: hs (spatial) and hr (color)
3. Apply filtering: each pixel takes the value of the nearest mode (mean-shift trajectories lead pixels to their modes)
("Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02)

Filtering examples
• original squirrel → filtered
• original baboon → filtered

Segmentation examples
• original → filtered → segmented
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries – implemented with LSH
• Statistical curse of dimensionality: sparseness of the data – handled with variable bandwidth

LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k).
• For each point x we check, for each of the K pairs, whether x_{d_k} ≤ v_k; the K outcomes define a cell, so each partition cuts the data into cells.
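The (d_k, v_k) tests above amount to a K-bit cell key per partition; a minimal sketch, with the uniform cut-value choice of the "original LSH" described later:

```python
import numpy as np

def make_partition(X, K, rng):
    """One random partition: K (coordinate, cut-value) pairs, with each cut
    value drawn uniformly over the range of that coordinate in the data."""
    dims = rng.integers(X.shape[1], size=K)
    cuts = np.array([rng.uniform(X[:, d].min(), X[:, d].max()) for d in dims])
    return dims, cuts

def cell_of(x, partition):
    """The K boolean tests x[d_k] <= v_k form the point's cell key."""
    dims, cuts = partition
    return tuple(x[dims] <= cuts)
```

Repeating this for L independent partitions and taking the union of a query's cells gives the candidate set whose size the K/L tuning below controls.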
Choosing the optimal K and L
• For a query q, we want to compute the smallest possible number of distances to the points in its buckets.
• Large K: a smaller number of points in each cell.
• If L is too small, points might be missed; but if L is too big, extra points might be included.
• As L increases, the union of cells around the query grows; together, K and L determine the resolution of the data structure.
Choosing optimal K and L
• Determine accurately the KNN distance (the bandwidth) for m randomly-selected data points.
• Choose an error threshold ε on the approximate distance.
• The optimal K and L should satisfy the error constraint at minimal cost:
  for each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K); then minimize the running time t(K, L(K)).
[Plots: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)], which has a clear minimum.]
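The tuning procedure above is a small constrained grid search. In this sketch, `error` and `time` are hypothetical callables standing in for the empirical estimates measured on the m sampled points:

```python
def tune_k_l(error, time, k_values, l_values, eps=0.05):
    """For each K, take the minimal L meeting the error constraint
    error(K, L) <= eps, then keep the (K, L) pair with the smallest
    running time time(K, L)."""
    best = None
    for k in k_values:
        l_ok = next((l for l in l_values if error(k, l) <= eps), None)
        if l_ok is None:
            continue                      # no L satisfies the constraint for this K
        t = time(k, l_ok)
        if best is None or t < best[0]:
            best = (t, k, l_ok)
    return best  # (time, K, L) or None
```

With toy models error(K, L) = 1/(K·L) and time(K, L) = K + 2L, the search trades a larger K against the extra tables L it forces, just as the plots described above do.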
Data-driven partitions
• In the original LSH, the cut values are drawn at random over the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
[Plot: bucket distribution of the points, uniform cuts vs. data-driven cuts; data-driven cuts yield more evenly populated buckets.]
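The data-driven variant changes only one line of the partition construction; a minimal sketch:

```python
import numpy as np

def data_driven_partition(X, K, rng):
    """Data-driven partition: each cut value is a coordinate of a randomly
    chosen data point, so cuts fall where the data actually lie and the
    resulting buckets are more balanced than with uniform random cuts."""
    dims = rng.integers(X.shape[1], size=K)
    picks = rng.integers(len(X), size=K)
    cuts = X[picks, dims]                 # sampled from the data itself
    return dims, cuts
```

Because each cut is an empirical quantile of the data rather than a uniform draw over the range, sparse tails of a coordinate no longer eat most of the cut positions.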
Additional speedup
• Assume that all the points in a cell intersection C will converge to the same mode (C acts as a type of aggregate); a single mean-shift trajectory can then stand in for all of them.

Speedup results
• 65,536 points; 1,638 points sampled; k = 100
Food for thought
• Low dimension vs. high dimension

A thought for food…
• Choose K, L by sample learning, or take the traditional values.
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning itself requires KNN.
15:30 – cookies…

Summary
• LSH suggests a compromise on accuracy for the gain in complexity.
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selection…
For Euclidean space:
- A single projection hits an ε-nearest neighbor with Pr = p1
- k projections hit an ε-nearest neighbor with Pr = p1^k
- All L hashings fail to collide with Pr = (1 − p1^k)^L
- To ensure a collision with probability at least 1 − δ (e.g. 1 − δ ≥ 90%):
  1 − (1 − p1^k)^L ≥ 1 − δ, i.e. L ≥ log(δ) / log(1 − p1^k)
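Solving that last inequality for L gives the number of hash tables directly; a small sketch (the values of p1, k and δ below are illustrative assumptions, not numbers from the slides):

```python
import math

def tables_needed(p1, k, delta):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta,
    i.e. L >= log(delta) / log(1 - p1**k)."""
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

# e.g. p1 = 0.8 per projection, k = 10 projections, 90% success (delta = 0.1):
L = tables_needed(p1=0.8, k=10, delta=0.1)  # -> 21 tables
```

Note how larger k sharpens each table (fewer false candidates) but drives L up, which is exactly the extraction-versus-verification trade-off discussed next.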
…Parameters selection
The hash must accept neighbors while rejecting non-neighbors. The total query time splits into candidate extraction plus candidate verification: as k grows, fewer candidates are extracted so verification gets cheaper, while the extraction work itself grows; the optimal k balances the two.
Pros & Cons
Pros:
- Better query time than spatial data structures
- Scales well to higher dimensions and larger data sizes (sub-linear dependence)
- Predictable running time
Cons:
- Extra storage overhead
- Inefficient for data with distances concentrated around the average
- Works best for Hamming distance (although it can be generalized to Euclidean space)
- In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
- Requires the radius r to be fixed in advance
(From Piotr Indyk's slides)
Conclusion
- ...but at the end, everything depends on your data set
- Try it at home:
  - Visit http://web.mit.edu/andoni/www/LSH/index.html
  - Email Alex Andoni (andoni@mit.edu)
  - Test over your own data (C code under Red Hat Linux)
LSH - Applications
- Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
- Searching image databases (see the following)
- Image segmentation (see the following)
- Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
- Texture classification (see the following)
- Clustering (see the following)
- Embedding and manifold learning (LLE and many others)
- Compression: vector quantization
- Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
- Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
- In short: whenever k-Nearest Neighbors (KNN) are needed
Motivation
- A variety of procedures in learning require KNN computation
- KNN search is a computational bottleneck
- LSH provides a fast approximate solution to the problem
- LSH requires hash-function construction and parameter tuning
Outline
- "Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola and T. Darrell
  - Finding sensitive hash functions
- "Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni and P. Meer
  - Tuning the LSH parameters
  - The LSH data structure is used for algorithm speedups
The Problem
"Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola and T. Darrell
Given an image x, what are the parameters θ in this image, i.e. the angles of the joints, the orientation of the body, etc.?
Ingredients
- Input: a query image with unknown angles (parameters)
- A database of human poses with known angles
- An image feature extractor: an edge detector
- A distance metric in feature space: d_x
- A distance metric in angle space: d_θ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))
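The angle-space metric can be written out directly (a minimal sketch; the joint angles are assumed to be plain radians):

```python
import math

def angle_dist(theta1, theta2):
    """d_theta(t1, t2) = sum_i (1 - cos(t1_i - t2_i)):
    0 for identical poses, up to 2 per joint for opposite angles."""
    return sum(1.0 - math.cos(a - b) for a, b in zip(theta1, theta2))

same = angle_dist([0.0, 0.5], [0.0, 0.5])  # -> 0.0
flip = angle_dist([0.0], [math.pi])        # -> 2.0
```

The 1 − cos form makes the metric insensitive to the 2π wrap-around of each joint angle, which a plain squared difference would not be.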
Example-based learning
- Construct a database of example images with their known angles
- Given a query image, run your favorite feature extractor
- Compute the KNN from the database
- Use these KNNs to compute the average angles of the query
Input: query → find the KNN in the database of examples → output: the average angles of the KNN
The algorithm flow
Input query → feature extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms, accumulated over image sub-windows (such as the regions A and B in the slide's figure).
(Pipeline stage: Feature Extraction → PSH → LWR)
PSH: The basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ). We want similarity to be measured in the angle space, whereas LSH works on the feature space.
- Assumption: the feature space is closely related to the parameter space
Insight: Manifolds
- A manifold is a space in which every point has a neighborhood resembling a Euclidean space
- But the global structure may be complicated: curved
- For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: a query q and its neighborhood mapped between the parameter space (angles) and the feature space. Is this magic?]
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples and select those sensitive to d_θ.
The hash functions are applied in feature space, but the KNN are valid in angle space.
1. Label pairs of examples with similar angles
2. Define hash functions h on the feature space
3. Predict the labeling of similar/non-similar examples by using h
4. Compare the labelings
5. If the labeling by h is good, accept h; else change h
PSH as a classification problem
A pair of examples (x_i, x_j) is labeled (with r = 0.25):
  y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
  y_ij = −1 if d_θ(θ_i, θ_j) ≥ (1 + ε)·r
so truly-similar pairs get +1 and clearly-dissimilar pairs get −1.
A binary hash function on features:
  h_T(x) = +1 if the tested feature of x is at least the threshold T, −1 otherwise
Predict the labels:
  ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the best T that predicts the true labeling within the probability constraints: h_T will place both examples of a pair in the same bin, or separate them.
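One way to picture this search is as a decision-stump scan over candidate thresholds; a hypothetical sketch (the feature vectors, pairs and labels below are made-up toy data, and plain pair accuracy stands in for the paper's probability constraints):

```python
def stump_accuracy(T, feat, pairs, labels):
    """Fraction of labeled pairs whose predicted label
    yhat = +1 if h_T puts both examples in the same bin, else -1,
    matches the true label; h_T(x) = +1 if x[feat] >= T else -1."""
    correct = 0
    for (xi, xj), y in zip(pairs, labels):
        hi = 1 if xi[feat] >= T else -1
        hj = 1 if xj[feat] >= T else -1
        yhat = 1 if hi == hj else -1
        correct += (yhat == y)
    return correct / len(labels)

def best_threshold(feat, pairs, labels):
    """Scan candidate thresholds (the observed feature values)."""
    cands = {x[feat] for pair in pairs for x in pair}
    return max(cands, key=lambda T: stump_accuracy(T, feat, pairs, labels))

# Toy 1-D examples: two similar pairs (+1) and one dissimilar pair (-1).
pairs = [([0.2], [0.3]), ([0.8], [0.9]), ([0.2], [0.9])]
labels = [1, 1, -1]
T = best_threshold(0, pairs, labels)
```

Repeating the scan over many features yields the pool of sensitive hash functions from which the k-bit PSH keys are assembled.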
Local Weighted Regression (LWR)
- Given a query image x0, PSH returns its KNNs
- LWR uses the KNN to compute a weighted average of the estimated angles of the query:
  β0 = argmin_β Σ_{x_i ∈ N(x0)} d_θ(g(x_i, β), θ_i) · K(d_x(x_i, x0))
  where K maps distance to weight and g is the local model.
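A zeroth-order version of this weighted averaging can be sketched as follows (the exponential kernel and the (distance, angles) input format are illustrative assumptions, not the paper's exact choices):

```python
import math

def lwr_estimate(neighbors):
    """Kernel-weighted average of the neighbors' known angle vectors;
    each neighbor is (feature-space distance to the query, angle vector)
    and the weight is K(d) = exp(-d), so nearer neighbors count more."""
    wsum, acc = 0.0, None
    for dist, angles in neighbors:
        w = math.exp(-dist)
        wsum += w
        term = [w * a for a in angles]
        acc = term if acc is None else [s + t for s, t in zip(acc, term)]
    return [s / wsum for s in acc]

est = lwr_estimate([(0.0, [0.0, 1.0]), (0.0, [2.0, 1.0])])  # -> [1.0, 1.0]
```

The paper's full LWR fits a local model g rather than a constant, but the distance-to-weight kernel plays the same role.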
Results
Synthetic data were generated:
- 13 angles: 1 for the rotation of the torso, 12 for the joints
- 150,000 images
- Nuisance parameters added: clothing, illumination, face expression
- 1,775,000 example pairs
- Selected 137 out of 5,123 meaningful features (how?)
- 18-bit hash functions (k), 150 hash tables (l)
- Tested on 1,000 synthetic examples; PSH searched only 3.4% of the data per query
- Without feature selection, 40 bits and 1,000 hash tables would have been needed
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, B is the maximum number of points in a bucket.
Results: real data
- 800 images
- Processed by a segmentation algorithm
- 1.3% of the data were searched
- Interesting mismatches were observed
Fast pose estimation: summary
- A fast way to compute the angles of a human body figure
- Moving from one representation space to another
- Training a sensitive hash function
- KNN with smart averaging

Food for Thought
- The basic assumption may be problematic (distance metric, representations)
- The training set should be dense
- Texture and clutter
- In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
- Given: n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn
- Goal: given a query q, preprocess the points in P to find a point p_i whose sphere (of radius r_i) covers the query q
(Courtesy of Mohamad Hegaze)
"Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni and P. Meer

Motivation
- Clustering high-dimensional data by using local density measurements (e.g. in feature space)
- Statistical curse of dimensionality: sparseness of the data
- Computational curse of dimensionality: expensive range queries
- The LSH parameters should be adjusted for optimal performance
Outline
- Mean-shift in a nutshell + examples
Our scope:
- Mean-shift in high dimensions, using LSH
- Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure

Mean-Shift in a Nutshell
Each point is iteratively shifted toward the weighted mean of the samples inside its bandwidth window.
(Pipeline: Mean-shift → LSH: optimal k,l → LSH: data partition → LSH data structure)
KNN in mean-shift
- The bandwidth should be inversely proportional to the density in the region: high density means a small bandwidth, low density a large bandwidth
- It is based on the kth nearest neighbor of the point: the bandwidth is the distance from the point to its kth nearest neighbor
Adaptive mean-shift vs. non-adaptive
[Figure: comparison of adaptive and non-adaptive mean-shift clustering results]
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. The resolution is controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering: each pixel takes the value of its nearest mode
[Figures: original, filtered, and segmented images; mean-shift trajectories in the 3D feature space]
("Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02)
Filtering examples
[Figures: original squirrel / filtered; original baboon / filtered]
Segmentation examples
("Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02)
Mean-shift in high dimensions
- Computational curse of dimensionality: expensive range queries, implemented with LSH
- Statistical curse of dimensionality: sparseness of the data, handled with a variable bandwidth
LSH-based data structure
- Choose L random partitions; each partition includes K pairs (d_k, v_k)
- For each point x we check the K inequalities x_{d_k} ≤ v_k; the results form the point's cell key
- This partitions the data into cells
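A minimal sketch of one such structure (plain Python; drawing the cut value from a random data point follows the data-driven suggestion made later in the talk, and everything else here is illustrative):

```python
import random

def build_partitions(X, K, L, rng):
    """L random partitions; each is K (dimension, cut-value) pairs,
    with the cut value taken as a coordinate of a random data point."""
    parts = []
    for _ in range(L):
        cuts = []
        for _ in range(K):
            d = rng.randrange(len(X[0]))
            v = rng.choice(X)[d]  # data-driven cut value
            cuts.append((d, v))
        parts.append(cuts)
    return parts

def cell_key(x, cuts):
    """A point's cell: the K-bit vector of tests x[d] <= v."""
    return tuple(x[d] <= v for d, v in cuts)

rng = random.Random(0)
X = [[0.0, 1.0], [2.0, 3.0], [0.1, 1.1]]
parts = build_partitions(X, K=2, L=3, rng=rng)
key = cell_key(X[0], parts[0])
```

Hashing each key into a dictionary of buckets then makes extracting a query's L cells an O(K·L) operation.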
Choosing the optimal K and L
- For a query q, we want to compute the smallest number of distances to points in its buckets
- A large K means a smaller number of points in a cell
- If L is too small, neighbor points might be missed; but if L is too big, extra points might be included
- As L increases, the union of cells C∪ increases but the intersection C∩ decreases
- K determines the resolution of the data structure
Choosing optimal K and L
- Determine accurately the KNN, and hence the KNN distance (the bandwidth), for m randomly selected data points
- Choose an error threshold ε on the approximate distance returned by the LSH structure
- The optimal K and L should satisfy this error constraint
- For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K)
- Minimize the running time t(K, L(K))
[Figure: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] and its minimum]
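The selection loop can be sketched as follows (error_of and time_of are hypothetical stand-ins for measurements against the m pilot points; they are not part of the paper's code):

```python
def tune(Ks, max_L, err, time_of, error_of):
    """For each K, find the minimal L whose approximation error meets
    the threshold, then keep the (K, L) pair with the smallest running
    time; returns (time, K, L) or None if no pair satisfies err."""
    best = None
    for K in Ks:
        for L in range(1, max_L + 1):
            if error_of(K, L) <= err:  # minimal L satisfying the constraint
                t = time_of(K, L)
                if best is None or t < best[0]:
                    best = (t, K, L)
                break  # larger L only adds time for this K
    return best
```

With measured error and time curves plugged in, this is the one-run-over-all-L's procedure the slide describes.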
Data-driven partitions
- In the original LSH, cut values are chosen at random in the range of the data
- Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[Figure: bucket point-distribution, uniform cuts vs. data-driven cuts]
Additional speedup
Assume that all points in C∩ will converge to the same mode (C∩ is like a type of aggregate).
Speedup results
[Table: 65,536 points, 1,638 points sampled, k = 100]
Food for thought
[Figure: behavior in low dimension vs. high dimension]

A thought for food…
- Choose K, L by sample learning, or take the traditional values
- Can one estimate K, L without sampling?
- Does it help to know the data dimensionality or the data manifold?
- Intuitively, the dimensionality implies the number of hash functions needed
- The catch: efficient dimensionality learning requires KNN
15:30, cookies…
Summary
- LSH trades a little accuracy for a large gain in complexity
- Applications that involve massive data in high dimensions require LSH's fast performance
- The LSH scheme extends to different spaces (PSH)
- The LSH parameters and hash functions can be learned for different applications
Conclusion
- ...but at the end, everything depends on your data set
- Try it at home:
  - Visit http://web.mit.edu/andoni/www/LSH/index.html
  - Email Alex Andoni (andoni@mit.edu)
  - Test over your own data (C code under Red Hat Linux)
Thanks
- Ilan Shimshoni (Haifa)
- Mohamad Hegaze (Weizmann)
- Alex Andoni (MIT)
- Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
“Fast Pose Estimation with Parameter Sensitive Hashing”, G. Shakhnarovich, P. Viola, and T. Darrell:
• Finding sensitive hash functions.
“Mean Shift Based Clustering in High Dimensions: A Texture Classification Example”, B. Georgescu, I. Shimshoni, and P. Meer:
• Tuning LSH parameters.
• The LSH data structure is used for algorithm speedups.
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing — G. Shakhnarovich, P. Viola, and T. Darrell
Given an image x, what are the parameters θ in this image? I.e., the angles of the joints, the orientation of the body, etc.
Ingredients
• Input: a query image with unknown angles (parameters).
• A database of human poses with known angles.
• An image feature extractor – an edge detector.
• A distance metric in feature space: d_x.
• A distance metric in angle space:
  d_θ(θ¹, θ²) = Σ_{i=1..m} (1 − cos(θ¹_i − θ²_i))
Example-based learning
• Construct a database of example images with their known angles.
• Given a query image, run your favorite feature extractor.
• Compute the KNN from the database.
• Use these KNNs to compute the average angles for the query.
Input: query → find the KNN in the database of examples → output: the average angles of the KNN.
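The recipe above can be sketched as a naive linear-scan baseline. The function and data names are hypothetical, and plain angle averaging is only a stand-in for the LWR step described later (it ignores angle wrap-around):

```python
import math

def estimate_pose(query, examples, k=3):
    """Naive example-based pose estimation (toy sketch).
    `examples` is a list of (feature_vector, angle_vector) pairs.
    Find the k nearest examples in feature space and average their
    angles. PSH replaces this linear scan with approximate hashing."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(examples, key=lambda e: dist(e[0], query))[:k]
    m = len(nearest[0][1])
    # Plain averaging ignores angle wrap-around; the paper uses LWR instead.
    return [sum(e[1][i] for e in nearest) / k for i in range(m)]

# Toy database: 2-D features, 2 joint angles per example
db = [([0.0, 0.0], [10.0, 0.0]), ([0.1, 0.0], [14.0, 0.0]),
      ([5.0, 5.0], [90.0, 45.0]), ([5.0, 4.0], [80.0, 40.0])]
estimate_pose([0.05, 0.0], db, k=2)  # -> [12.0, 0.0]
```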
The algorithm flow
Input query → feature extraction → processed query → PSH (LSH) over the database of examples → LWR (regression) → output: match.
The image features
[Figure: edge-histogram features computed over image regions A and B.]
Image features are multi-scale edge histograms.
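A minimal sketch of what a multi-scale edge-histogram feature might look like; this is an illustrative toy, not the paper's exact descriptor (which bins edge directions over image regions at several scales):

```python
import math

def edge_histogram(img, bins=4):
    """Histogram of edge orientations (quantized into `bins` directions)
    over a grayscale image given as a list of rows. A toy stand-in for
    the paper's multi-scale edge-direction histogram features."""
    hist = [0.0] * bins
    for y in range(len(img) - 1):
        for x in range(len(img[0]) - 1):
            gx = img[y][x + 1] - img[y][x]          # horizontal gradient
            gy = img[y + 1][x] - img[y][x]          # vertical gradient
            mag = math.hypot(gx, gy)
            if mag > 0:
                ang = math.atan2(gy, gx) % math.pi  # direction-invariant
                hist[int(ang / math.pi * bins) % bins] += mag
    return hist

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (one coarser scale)."""
    return [[(img[y][x] + img[y][x+1] + img[y+1][x] + img[y+1][x+1]) / 4.0
             for x in range(0, len(img[0]) - 1, 2)]
            for y in range(0, len(img) - 1, 2)]

def multiscale_features(img, scales=2):
    """Concatenate edge histograms computed at successive scales."""
    feats = []
    for _ in range(scales):
        feats += edge_histogram(img)
        img = downsample(img)
    return feats
```

On a 4x4 image with a single vertical edge, the histogram concentrates all mass in the horizontal-gradient bin at both scales.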
(Pipeline: Feature Extraction → PSH → LWR)
PSH: The basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ). We want similarity to be measured in the angle space, whereas LSH works in the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space.
• But the global structure may be complicated: curved.
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: a query q and its neighbors, shown both in parameter space (angles) and in feature space.]
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ.
The hash functions are applied in feature space, but the KNN are valid in angle space.
1. Label pairs of examples with similar angles.
2. Define hash functions h on the feature space.
3. Predict the labeling of similar/non-similar examples by using h.
4. Compare the labelings.
5. If the labeling by h is good, accept h; else change h.
PSH as a classification problem
Labels (here r = 0.25): a pair of examples (x_i, x_j) is labeled
  y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
  y_ij = −1 if d_θ(θ_i, θ_j) ≥ (1 + ε) r
A binary hash function on features:
  h_T(x) = +1 if x ≥ T, −1 otherwise
Predict the labels:
  ŷ_ij(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the best T that predicts the true labeling, subject to the probability constraints: such a T will either place both examples of a pair in the same bin, or separate them.
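The selection loop above can be sketched as evaluating threshold "stumps" on labeled pairs. The accuracy test here is a simplified stand-in for the paper's probability constraints, and all names are hypothetical:

```python
def stump_accuracy(pairs, feature, t):
    """Fraction of labeled pairs on which the stump h(x) = +1 iff
    x[feature] >= t predicts the pair label: y_hat = +1 iff both
    examples fall on the same side of the threshold."""
    ok = 0
    for xi, xj, y in pairs:            # y = +1 similar angles, -1 dissimilar
        hi = 1 if xi[feature] >= t else -1
        hj = 1 if xj[feature] >= t else -1
        y_hat = 1 if hi == hj else -1
        ok += (y_hat == y)
    return ok / len(pairs)

def select_stumps(pairs, candidates, min_acc=0.75):
    """Keep only candidate (feature, threshold) stumps whose pairwise
    accuracy clears min_acc (a toy stand-in for the paper's selection)."""
    return [(f, t) for f, t in candidates if stump_accuracy(pairs, f, t) >= min_acc]

# Toy pairs over a 1-D feature: threshold 0.5 separates the clusters.
pairs = [((0.1,), (0.2,), +1),   # both below 0.5 -> same bin -> +1
         ((0.9,), (0.8,), +1),   # both above 0.5 -> same bin -> +1
         ((0.1,), (0.9,), -1),   # straddles 0.5  -> split    -> -1
         ((0.2,), (0.8,), -1)]
select_stumps(pairs, [(0, 0.5), (0, 0.05)])  # -> [(0, 0.5)]
```

The stump at 0.5 labels every pair correctly and is kept; the one at 0.05 puts everything in the same bin and is rejected.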
Local Weighted Regression (LWR)
• Given a query image, PSH returns its KNNs.
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:
  θ̂₀ = argmin_β Σ_{x_i ∈ N(x)} d_θ(g(x_i; β), θ_i) · K(d_x(x_i, x))
where K is a kernel that converts a feature-space distance into a weight.
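When g is a constant model, the minimization reduces to a kernel-weighted average of the neighbors' angles. A sketch under that assumption, with a Gaussian kernel chosen purely for illustration:

```python
import math

def lwr_constant(neighbors, query_feat, bandwidth=1.0):
    """Kernel-weighted average of the neighbors' angles: the constant-model
    special case of the slide's locally weighted regression. `neighbors`
    is a list of (feature_vector, angle) pairs returned by PSH."""
    def kernel(d):                      # Gaussian weight; the choice is ours
        return math.exp(-(d / bandwidth) ** 2)
    def dist(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    w = [kernel(dist(f, query_feat)) for f, _ in neighbors]
    return sum(wi * a for wi, (_, a) in zip(w, neighbors)) / sum(w)

# Two neighbors at equal distance from the query -> a plain average (15.0);
# an unequal split pulls the estimate toward the closer neighbor.
lwr_constant([([0.0], 10.0), ([2.0], 20.0)], [1.0])
```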
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints.
• 150,000 images.
• Nuisance parameters added: clothing, illumination, facial expression.
• 1,775,000 example pairs.
• 137 out of 5,123 features were selected as meaningful (how?).
• 18-bit hash functions (k), 150 hash tables (l).
• Tested on 1,000 synthetic examples; PSH searched only 3.4% of the data per query.
• Without the feature selection, 40 bits and 1,000 hash tables would have been needed.
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, and B is the maximum number of points in a bucket.
Results – real data
• 800 images.
• Processed by a segmentation algorithm.
• 1.3% of the data were searched.
Interesting mismatches [figure].
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure.
• Moving from one representation space to another.
• Training a sensitive hash function.
• Smart averaging of the KNN.
Food for Thought
• The basic assumption may be problematic (distance metric, representations).
• The training set should be dense.
• Texture and clutter remain open issues.
• In general, some features are more important than others and should be weighted.
Food for Thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in R^d, centered at P = p1, …, pn, with radii r1, …, rn.
• Goal: preprocess the points in P so that, given a query q, we can find a point p_i whose sphere (of radius r_i) 'covers' the query q.
(Courtesy of Mohamad Hegaze)
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example — B. Georgescu, I. Shimshoni, and P. Meer
Motivation
• Clustering high-dimensional data by using local density measurements (e.g., in feature space).
• Statistical curse of dimensionality: sparseness of the data.
• Computational curse of dimensionality: expensive range queries.
• The LSH parameters should be adjusted for optimal performance.
Outline
• Mean-shift in a nutshell + examples.
Our scope:
• Mean-shift in high dimensions – using LSH.
• Speedups:
  1. Finding optimal LSH parameters.
  2. Data-driven partitions into buckets.
  3. Additional speedup by using the LSH data structure.
Mean-Shift in a Nutshell
[Figure: a window of a given bandwidth around a point shifts toward the local mean.]
(Section roadmap: Mean-shift → LSH: optimal k, l → LSH: data partition → LSH data structure)
KNN in mean-shift
• The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth.
• It is based on the k-th nearest neighbor of the point: the bandwidth is h_i = ||x_i − x_{i,k}||, the distance to that neighbor.
Adaptive mean-shift vs. non-adaptive [figure].
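A 1-D sketch of the two ingredients: a per-point bandwidth taken from the k-th nearest neighbor, and the plain mean-shift iteration that moves a window to its local mean (flat kernel; the helper names are hypothetical):

```python
def adaptive_bandwidths(points, k):
    """Per-point bandwidth = distance to the point's k-th nearest neighbor,
    as on the slide: dense regions get small windows, sparse regions large."""
    hs = []
    for p in points:
        ds = sorted(abs(p - q) for q in points if q is not p)
        hs.append(ds[k - 1])
    return hs

def mean_shift_mode(points, start, h, iters=50):
    """Plain 1-D mean-shift with a flat kernel of radius h: repeatedly
    move to the mean of the points inside the current window."""
    x = start
    for _ in range(iters):
        window = [p for p in points if abs(p - x) <= h]
        x_new = sum(window) / len(window)
        if abs(x_new - x) < 1e-9:
            break
        x = x_new
    return x

pts = [0.0, 0.1, 0.2, 5.0, 5.1]       # a dense cluster and a sparse one
mean_shift_mode(pts, 0.0, h=0.5)      # converges to ~0.1, the left mode
```

Note how `adaptive_bandwidths(pts, 2)` gives a small window (0.2) for the dense cluster and a large one for the sparse pair, exactly the inverse-density behavior the slide describes.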
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x, y) or 3D (1 gray + 2 x, y).
2. Resolution is controlled by the bandwidths h_s (spatial) and h_r (color).
3. Apply filtering: each pixel takes the value of its nearest mode.
[Figures: original, filtered, and segmented images; a 3D view of the feature space.]
(“Mean-Shift: A Robust Approach Toward Feature Space Analysis”, D. Comaniciu et al., TPAMI '02)
Mean-shift trajectories [figure].
Filtering examples: original squirrel → filtered; original baboon → filtered.
(“Mean-Shift: A Robust Approach Toward Feature Space Analysis”, D. Comaniciu et al., TPAMI '02)
Segmentation examples
(“Mean-Shift: A Robust Approach Toward Feature Space Analysis”, D. Comaniciu et al., TPAMI '02)
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH.
• Statistical curse of dimensionality: sparseness of the data, handled with a variable bandwidth.
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k).
• For each point, check the K inequalities x_{d_k} ≤ v_k.
• This partitions the data into cells.
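A sketch of the structure as described: L partitions, each defined by K axis-parallel cuts, with a query's candidates taken as the union of its cells. Cut values here are uniform over the data range, as in the original LSH; the data-driven variant appears two slides later.

```python
import random

def build_lsh(points, K, L, seed=0):
    """L random partitions; each uses K (dimension, cut-value) pairs,
    and a point's cell is the K-bit pattern of tests x[d] <= v."""
    rng = random.Random(seed)
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]
    tables = []
    for _ in range(L):
        cuts = []
        for _ in range(K):
            d = rng.randrange(dim)                       # coordinate to cut on
            cuts.append((d, rng.uniform(lo[d], hi[d])))  # cut value in data range
        cells = {}
        for i, p in enumerate(points):
            key = tuple(p[d] <= v for d, v in cuts)
            cells.setdefault(key, []).append(i)
        tables.append((cuts, cells))
    return tables

def query_lsh(tables, q):
    """Candidates = union of the query's cell over the L partitions."""
    cand = set()
    for cuts, cells in tables:
        key = tuple(q[d] <= v for d, v in cuts)
        cand.update(cells.get(key, []))
    return cand

pts = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.2, 4.9)]
tables = build_lsh(pts, K=2, L=3)
query_lsh(tables, (0.1, 0.05))  # candidate indices near the query
```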
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets.
• If L is too small, points might be missed; but if L is too big, extra points might be included.
• Large K ⇒ a smaller number of points in a cell.
• As L increases, the union of cells C∪ increases but the intersection C∩ decreases; this determines the resolution of the data structure.
Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points, recording both the true distance (bandwidth) and the approximate distance.
• Choose an error threshold ε; the optimal K and L should keep the approximation error within ε.
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K).
• Minimize the running time t(K, L(K)); its minimum gives the optimal pair.
[Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)].]
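The tuning procedure above can be sketched as a small grid search. The error and cost models below are toy stand-ins for the quantities one would measure on the m sample points:

```python
def tune_parameters(Ks, Ls, error, cost, eps=0.05):
    """Grid search following the slide: for each K, find the minimal L
    whose approximation error is within eps, then keep the (K, L(K))
    pair with the smallest running-time estimate. `error(K, L)` and
    `cost(K, L)` are supplied by measurement (here: toy callables)."""
    best = None
    for K in Ks:
        for L in Ls:                       # Ls sorted ascending
            if error(K, L) <= eps:         # minimal L meeting the constraint
                if best is None or cost(K, L) < best[2]:
                    best = (K, L, cost(K, L))
                break
    return best[:2] if best else None

# Toy model: error falls with L and rises with K; cost grows with both.
err = lambda K, L: K / (L + K)
cost = lambda K, L: K + 2 * L
tune_parameters([1, 2, 4], [1, 2, 4, 8, 16, 32, 64], err, cost)  # -> (1, 32)
```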
Data-driven partitions
• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
[Figure: bucket-occupancy distribution for uniform vs. data-driven cut points.]
Additional speedup
Assume that all points in C∩ will converge to the same mode (C∩ acts like a type of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100.
Food for thought
[Figure: behavior in low dimension vs. high dimension.]
A thought for food…
• Choose K, L by sample learning, or take the traditional values.
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning itself requires KNN.
15:30: cookies…
Summary
• LSH trades some accuracy for a large gain in complexity.
• Applications that involve massive data in high dimensions need LSH's fast performance.
• LSH extends to different spaces (PSH).
• The LSH parameters and hash functions can be learned for different applications.
Conclusion
• ...but in the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test it over your own data
  (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naïve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree – Pitfall 1
- Slide 16
- Quadtree – Pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection …
- … Parameters selection
- Pros & Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x, what are the parameters θ in this image, i.e. angles of joints, orientation of the body, etc.?
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results – real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for food…
- Summary
- Slide 110
- Thanks
P-Stable summary
• Works for the lp norm; generalizes to 0 < p ≤ 2
• Improves the query time from O(d·n^(1/(1+ε)) · log n) to O(d·n^(1/(1+ε)²) · log n)

r-Nearest Neighbor: latest results, reported by e-mail by Alexander Andoni
Parameters selection
• 90% success probability, best query-time performance
For Euclidean space:
• A single projection hits an ε-nearest neighbor with Pr = p1
• k projections hit an ε-nearest neighbor with Pr = p1^k
• All L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure a collision with probability at least 1 − δ (e.g. 1 − δ ≥ 90%):
  1 − (1 − p1^k)^L ≥ 1 − δ  ⇒  L ≥ log(δ) / log(1 − p1^k)
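The number of tables L implied by this bound can be computed directly. A minimal sketch; the values of p1, k, and δ below are illustrative, not from the slides:

```python
import math

def tables_needed(p1: float, k: int, delta: float) -> int:
    """Smallest L such that 1 - (1 - p1**k)**L >= 1 - delta."""
    p_miss = 1.0 - p1 ** k   # probability that a single table misses the neighbor
    return math.ceil(math.log(delta) / math.log(p_miss))

# e.g. p1 = 0.9, k = 18 bits, 90% success probability (delta = 0.1)
L = tables_needed(0.9, 18, 0.1)
```

Larger k makes each table more selective (p1^k shrinks), so more tables are needed to keep the collision guarantee.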
… Parameters selection
[Plot: query time as a function of k, split into candidate-extraction time (reject non-neighbors) and candidate-verification time (accept neighbors); the optimal k balances the two.]
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance
From Piotr Indyk's slides
Conclusion
• … but in the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – E-mail Alex Andoni (andoni@mit.edu)
  – Test it over your own data
  (C code, under Red Hat Linux)
LSH – Applications
• Searching video clips in databases ("Hierarchical, Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash-function construction and parameter tuning
Outline
• "Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola, and T. Darrell
  – Finding sensitive hash functions
• "Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni, and P. Meer
  – Tuning LSH parameters
  – The LSH data structure is used for algorithm speedups
"Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola, and T. Darrell

The Problem
Given an image x, what are the parameters θ in this image, i.e. the angles of the joints, the orientation of the body, etc.?
Ingredients
• Input: a query image with unknown angles (parameters)
• A database of human poses with known angles
• An image feature extractor – an edge detector
• A distance metric in feature space, d_x
• A distance metric in angle space:
  d_θ(θ¹, θ²) = Σ_{i=1..m} (1 − cos(θ¹_i − θ²_i))
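The angle-space metric above is simple to implement; a minimal sketch, with joint angles in radians:

```python
import math

def d_theta(theta1, theta2):
    """Angle-space distance: sum over the m joints of 1 - cos(theta1_i - theta2_i)."""
    return sum(1.0 - math.cos(a - b) for a, b in zip(theta1, theta2))

# identical poses are at distance 0; a joint rotated by pi contributes the maximum, 2
```

The 1 − cos term is insensitive to the 2π wraparound of angles, which a plain squared difference would not be.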
Example-based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query

Input: query → find the KNN in the database of examples → output: average angles of the KNN
The algorithm flow
Input query → feature extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms.
[Figure: edge-direction histograms computed over image regions A and B at several scales.]
(Pipeline: Feature Extraction → PSH → LWR)
PSH: The basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ).
We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space.
• But the global structure may be complicated: curved.
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: a query q mapped between the parameter space (angles) and the feature space.]
Is this magic?

Parameter Sensitive Hashing (PSH)
The trick:
• Estimate the performance of different hash functions on examples, and select those sensitive to d_θ.
• The hash functions are applied in feature space, but the KNN are valid in angle space.
• Label pairs of examples with similar angles.
• Define hash functions h on the feature space.
• Predict the labeling of similar/non-similar examples by using h.
• Compare the labelings.
• If the labeling by h is good, accept h; else change h.

PSH as a classification problem
[Figure: example pairs labeled +1, +1, −1, −1 (r = 0.25).]
Labels
A pair of examples (x_i, θ_i), (x_j, θ_j) is labeled

  y_ij = +1 if d_θ(θ_i, θ_j) < r
  y_ij = −1 if d_θ(θ_i, θ_j) > (1 + ε)·r
A binary hash function on features: a threshold T on one feature coordinate,

  h_T(x) = +1 if x ≥ T, −1 otherwise

Predict the labels:

  ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the best T that predicts the true labeling, subject to the probability constraints; h_T(x) will place both examples in the same bin, or separate them.
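The selection loop above can be sketched as a brute-force scan over candidate cuts. This illustrates the idea only, not the authors' exact training procedure; `best_threshold` and the toy data are hypothetical:

```python
def best_threshold(feature_vals, pairs, labels):
    """Pick the cut T on one feature that best reproduces the true pair labels.

    feature_vals[i] is one feature coordinate of example i; pairs is a list of
    (i, j) index pairs; labels holds +1 for angle-similar pairs, -1 otherwise."""
    def h(x, T):                          # binary hash: which side of the cut
        return 1 if x >= T else -1

    best_T, best_agree = None, -1
    for T in sorted(set(feature_vals)):   # candidate cuts at the data values
        agree = sum(
            1
            for (i, j), y in zip(pairs, labels)
            if (1 if h(feature_vals[i], T) == h(feature_vals[j], T) else -1) == y
        )
        if agree > best_agree:
            best_T, best_agree = T, agree
    return best_T

# toy data: the cut at 0.9 keeps the similar pair (0, 1) together
# and separates the dissimilar pairs (0, 2) and (1, 3)
T = best_threshold([0.1, 0.2, 0.9, 1.0], [(0, 1), (0, 2), (1, 3)], [1, -1, -1])
```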
Local Weighted Regression (LWR)
• Given a query image, PSH returns its KNNs.
• LWR uses the KNNs to compute a weighted average of the estimated angles of the query:

  θ(x₀) = argmin Σ_{x_i ∈ N(x₀)} d_θ(g(x_i), θ_i) · K(d_x(x_i, x₀))

  where K is a kernel that down-weights neighbors by their feature-space distance to the query x₀.
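The averaging step can be sketched as a kernel-weighted circular mean, which is consistent with the 1 − cos angle distance; the Gaussian kernel and bandwidth h here are assumptions:

```python
import math

def weighted_angle_mean(angles, feat_dists, h=1.0):
    """Average neighbor angles (one joint), weighting each neighbor by a kernel
    of its feature-space distance to the query; averaging sin/cos components
    handles the wraparound at 2*pi."""
    w = [math.exp(-(d / h) ** 2) for d in feat_dists]
    s = sum(wi * math.sin(a) for wi, a in zip(w, angles))
    c = sum(wi * math.cos(a) for wi, a in zip(w, angles))
    return math.atan2(s, c)

# the much closer first neighbor dominates the estimate
theta = weighted_angle_mean([0.0, 1.0], [0.1, 10.0])
```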
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, facial expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Tested on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without feature selection, 40 bits and 1,000 hash tables were needed
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, and B is the maximum number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• Only 1.3% of the data were searched
[Figure: interesting mismatches.]
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN + smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in R^d, centered at P = p1, …, pn, with radii r1, …, rn
• Goal: given a query q, preprocess the points in P to find a point pi whose sphere 'covers' the query q, i.e. ‖q − pi‖ ≤ ri
Courtesy of Mohamad Hegaze
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance

"Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni, and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[Figure: a window of radius `bandwidth` around a point is shifted to the mean of the points falling inside it, and the step is repeated until it converges at a density mode.]
(Progress bar: Mean-shift → LSH → optimal k, l → LSH data partition → LSH data structure)
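In code, the mean-shift iteration looks roughly like this; a 1-D sketch with a flat kernel and a fixed bandwidth (the paper's variant is adaptive and high-dimensional):

```python
def mean_shift_1d(points, start, bandwidth, iters=100):
    """Move a window center to the mean of the points within `bandwidth` of it,
    and repeat; the fixed point is a mode of the local density."""
    x = start
    for _ in range(iters):
        window = [p for p in points if abs(p - x) <= bandwidth]
        if not window:              # start too far from the data: give up
            break
        x_new = sum(window) / len(window)
        if abs(x_new - x) < 1e-9:   # converged to a mode
            break
        x = x_new
    return x

data = [1.0, 1.1, 1.2, 5.0, 5.1]    # two clusters
mode = mean_shift_1d(data, start=1.5, bandwidth=1.0)
```

Starting the same procedure near 5 converges to the other cluster's mode, which is how mean-shift clusters points: by the mode they flow to.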
KNN in mean-shift
• The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth.
• It is based on the kth nearest neighbor of the point: the bandwidth is the distance from the point to its kth nearest neighbor.

Adaptive mean-shift vs. non-adaptive [figure]
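The adaptive bandwidth rule above is easy to state in code; a 1-D sketch (the paper works in high dimensions and its exact norm may differ):

```python
def knn_bandwidths(points, k):
    """h_i = distance from x_i to its k-th nearest neighbor: dense regions
    get small windows, sparse regions get large ones."""
    hs = []
    for i, x in enumerate(points):
        dists = sorted(abs(x - y) for j, y in enumerate(points) if j != i)
        hs.append(dists[k - 1])
    return hs

# the isolated point at 5.0 gets a much larger bandwidth than the tight cluster
h = knn_bandwidths([0.0, 0.1, 0.2, 5.0], k=2)
```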
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x, y) or 3D (1 gray + 2 x, y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering
[3D figure]
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
Filtering: each pixel takes the value of its nearest mode.
[Figure: original, filtered, and segmented images; mean-shift trajectories.]
Filtering examples
[Figures: original vs. filtered squirrel; original vs. filtered baboon.]

Segmentation examples
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH
• Statistical curse of dimensionality: sparseness of the data, handled with a variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k).
• For each point, check whether x_{d_k} ≤ v_k for each of the K pairs.
• This partitions the data into cells.
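The cell assignment above can be sketched as follows (`make_partition` and `cell_key` are hypothetical helper names; cuts are drawn uniformly over the data's range, as in the original LSH):

```python
import random

def make_partition(data, K, rng):
    """One partition: K (dimension, cut-value) pairs."""
    d = len(data[0])
    cuts = []
    for _ in range(K):
        dim = rng.randrange(d)
        lo = min(p[dim] for p in data)
        hi = max(p[dim] for p in data)
        cuts.append((dim, rng.uniform(lo, hi)))
    return cuts

def cell_key(point, partition):
    """K-bit key for a point: for each (dim, v) pair, is x_dim <= v?"""
    return tuple(point[dim] <= v for dim, v in partition)

rng = random.Random(0)
data = [(0.0, 0.0), (0.1, 0.2), (0.9, 0.8)]
partitions = [make_partition(data, K=4, rng=rng) for _ in range(3)]  # L = 3
keys = [cell_key(data[0], p) for p in partitions]
```

Points sharing a key in any of the L partitions land in the same bucket, which is what makes the union-of-cells query cheap.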
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets.
• If L is too small, points might be missed; but if L is too big, the buckets might include extra points.
• A large K gives a smaller number of points in a cell.
[Garbled formulas: the expected number of distance computations per query as a function of n, d, K, and L.]
• As L increases, the union of the query's L cells grows, but their intersection shrinks; this intersection determines the resolution of the data structure.
Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points; let d̄ be the resulting KNN distance (bandwidth).
• Choose an error threshold ε.
• The optimal K and L should keep the approximate distance within the threshold: d_LSH ≤ (1 + ε)·d̄.
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K); then minimize the running time t(K, L(K)).
[Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)], whose minimum marks the optimal pair.]
Data-driven partitions
• In the original LSH, cut values are chosen at random within the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
[Figure: bucket-occupancy distribution, uniform cuts vs. data-driven cuts.]
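Against the uniform scheme, the suggested data-driven cut is a small change; a sketch (`data_driven_cut` is a hypothetical helper name):

```python
import random

def data_driven_cut(data, rng):
    """Pick a random data point and use one of its coordinates as the cut value,
    so cuts land where the data actually is and buckets stay balanced."""
    p = rng.choice(data)
    dim = rng.randrange(len(p))
    return dim, p[dim]

rng = random.Random(1)
data = [(0.0, 10.0), (0.2, 11.0), (0.4, 12.0)]
dim, value = data_driven_cut(data, rng)   # the cut is guaranteed to hit the data's range
```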
Additional speedup
• Assume that all points in the intersection cell C will converge to the same mode (C acts like a type of aggregate).
Speedup results
[Table: 65,536 points, 1,638 points sampled, k = 100.]
Food for thought
[Figure: low dimension vs. high dimension.]

A thought for food…
• Choose K, L by sample learning, or take the traditional values.
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.
15:30: cookies…
Summary
• LSH suggests a compromise on accuracy for a gain in complexity.
• Applications that involve massive data in high dimensions require LSH's fast performance.
• Extension of LSH to different spaces (PSH).
• Learning the LSH parameters and hash functions for different applications.
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Parameters selection …
For Euclidean space:
• A single projection hits an ε-nearest neighbor with probability Pr = p1.
• k projections hit an ε-nearest neighbor with probability Pr = p1^k.
• All L hashings fail to collide with probability Pr = (1 − p1^k)^L.
• To ensure a collision with probability at least 1 − δ (e.g., 90%):
  1 − (1 − p1^k)^L ≥ 1 − δ, which gives L = log(δ) / log(1 − p1^k).
[Plot: reject non-neighbors, accept neighbors]
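The bound above can be evaluated directly. A minimal sketch; the values of p1, k, and δ in the usage line are illustrative, not taken from the slides:

```python
import math

def tables_needed(p1: float, k: int, delta: float) -> int:
    """Smallest L such that 1 - (1 - p1**k)**L >= 1 - delta."""
    # Probability that one k-bit hash of a near neighbor collides with the query.
    p_k = p1 ** k
    # Solve (1 - p_k)**L <= delta for the number of tables L.
    return math.ceil(math.log(delta) / math.log(1.0 - p_k))

# e.g. per-projection collision probability 0.8, 10-bit hashes, 90% success target
L = tables_needed(p1=0.8, k=10, delta=0.1)
```

With these illustrative numbers, 21 tables suffice, while 20 do not: L grows only logarithmically in the failure probability δ.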
… Parameters selection
[Plot: query time vs. k, split into candidate-extraction time and candidate-verification time]
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance
(From Piotr Indyk's slides)
Conclusion
• … but at the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data
    (C code, under Red Hat Linux)
LSH – Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever k-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation.
• KNN search is a computational bottleneck.
• LSH provides a fast approximate solution to the problem.
• LSH requires hash-function construction and parameter tuning.
Outline
Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, and T. Darrell)
• Finding sensitive hash functions
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, and P. Meer)
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups
The Problem
Given an image x, what are the parameters θ in this image, i.e., the angles of the joints, the orientation of the body, etc.?
Fast Pose Estimation with Parameter Sensitive Hashing
G. Shakhnarovich, P. Viola, and T. Darrell
Ingredients
• Input: a query image with unknown angles (parameters)
• A database of human poses with known angles
• An image feature extractor – edge detector
• A distance metric in feature space, d_x
• A distance metric in angle space:
  d_θ(θ1, θ2) = Σ_{i=1}^{m} (1 − cos(θ1,i − θ2,i))
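The angle-space metric above is a few lines of code; the joint-angle vectors in the usage comments are made-up examples:

```python
import math

def angle_distance(theta1, theta2):
    """d_theta from the slide: sum of 1 - cos(difference) over joint angles."""
    return sum(1.0 - math.cos(a - b) for a, b in zip(theta1, theta2))

# Identical poses are at distance 0; a joint rotated by pi contributes the
# maximal per-joint term, 2.
d_same = angle_distance([0.0, 1.0], [0.0, 1.0])
d_far = angle_distance([0.0], [math.pi])
```

Note the metric is insensitive to full 2π rotations, which is why it is preferred over a plain L2 distance on raw angles.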
Example-based learning
• Construct a database of example images with their known angles.
• Given a query image, run your favorite feature extractor.
• Compute the KNN from the database.
• Use these KNNs to compute the average angles of the query.
Input: query → find KNN in the database of examples → output: average angles of the KNN
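The steps above can be sketched as follows. The tiny database, the L1 feature metric, and the plain (non-circular) averaging are illustrative simplifications, not the paper's exact choices:

```python
def knn_pose_estimate(query_feat, examples, k=3):
    """examples: list of (feature_vector, angles) pairs.
    Brute-force KNN in feature space, then average the neighbors' angles
    (the paper refines this averaging step with LWR)."""
    def dx(a, b):  # stand-in feature-space metric (L1)
        return sum(abs(u - v) for u, v in zip(a, b))
    nearest = sorted(examples, key=lambda ex: dx(query_feat, ex[0]))[:k]
    m = len(nearest[0][1])  # number of angle parameters
    return [sum(ang[j] for _, ang in nearest) / k for j in range(m)]
```

LSH/PSH replaces the brute-force `sorted` scan; the averaging logic stays the same.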
The algorithm flow:
Input query → feature extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output match
The image features
Image features are multi-scale edge histograms.
[Figure: edge-histogram features computed over image regions A and B]
(Pipeline stage: Feature Extraction → PSH → LWR)
PSH: the basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ).
We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space.
• But the global structure may be complicated: curved.
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: a query q mapped between the parameter space (angles) and the feature space]
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick:
Estimate the performance of different hash functions on examples, and select those sensitive to d_θ.
The hash functions are applied in feature space, but the KNN are valid in angle space.
PSH as a classification problem:
1. Label pairs of examples with similar angles.
2. Define hash functions h on the feature space.
3. Predict the labeling of similar/non-similar examples by using h.
4. Compare the labelings.
5. If the labeling by h is good, accept h; else change h.
Labels (with r = 0.25):
A pair of examples (x_i, θ_i), (x_j, θ_j) is labeled
  y_ij = +1 if d_θ(θ_i, θ_j) < r
  y_ij = −1 if d_θ(θ_i, θ_j) > (1 + ε) r
[Figure: example pairs labeled +1, +1, −1, −1]
A binary hash function on features:
  h_{φ,T}(x) = +1 if φ(x) ≥ T, −1 otherwise
Predict the labels:
  ŷ_h(x_i, x_j) = +1 if h_{φ,T}(x_i) = h_{φ,T}(x_j), −1 otherwise
Find the best feature φ and threshold T that predict the true labeling within the probability constraints: h_{φ,T} places both examples of a pair in the same bin, or separates them.
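The select-or-reject loop above amounts to scoring candidate thresholds against the pair labels. A hedged sketch, using a decision stump on a single scalar feature value; the candidate thresholds and pairs in the usage are made up:

```python
def stump_accuracy(threshold, pairs):
    """pairs: list of ((f_i, f_j), y) where f are scalar feature values and
    y = +1 for similar-angle pairs, -1 for dissimilar ones. A stump h_T sends
    two points to the same bucket iff both values fall on the same side of T."""
    correct = 0
    for (fi, fj), y in pairs:
        same_side = (fi >= threshold) == (fj >= threshold)
        y_hat = 1 if same_side else -1
        correct += (y_hat == y)
    return correct / len(pairs)

def best_threshold(candidates, pairs):
    # Keep the threshold that best predicts the similarity labels.
    return max(candidates, key=lambda t: stump_accuracy(t, pairs))
```

Repeating this over many features yields the pool of parameter-sensitive hash functions PSH draws from.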
Local Weighted Regression (LWR)
• Given a query image x, PSH returns its KNNs.
• LWR uses the KNN to compute a weighted average of the estimated angles of the query; the kernel K turns distance into weight:
  β0 = argmin_β Σ_{x_i ∈ N(x)} d_θ(g(x_i; β), θ_i) · K(d_x(x_i, x))
  and the estimate is θ̂ = g(x; β0).
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Tested on 1,000 synthetic examples; PSH searched only 3.4% of the data per query
• Without selection, 40 bits and 1,000 hash tables were needed
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, and B is the max number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 13% of the data were searched
[Figure: interesting mismatches]
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn.
• Goal: given a query q, preprocess the points in P to find a point pi whose sphere 'covers' the query q: ‖q − pi‖ ≤ ri.
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
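The kth-nearest-neighbor bandwidth rule can be sketched with a brute-force scan; 1-D points are used for brevity, real data would be d-dimensional:

```python
def adaptive_bandwidths(points, k):
    """h_i = distance from x_i to its k-th nearest neighbor (brute force,
    1-D for illustration): dense regions get small bandwidths, sparse ones
    large, which is what adaptive mean-shift needs."""
    hs = []
    for i, p in enumerate(points):
        dists = sorted(abs(p - q) for j, q in enumerate(points) if j != i)
        hs.append(dists[k - 1])
    return hs
```

In the paper's setting these KNN queries are exactly what LSH accelerates.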
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm:
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering
(Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02)
Image segmentation algorithm (cont.)
Filtering: each pixel takes the value of the nearest mode.
[Figures: original, filtered, and segmented images; mean-shift trajectories]
Filtering examples
[Figures: original vs. filtered squirrel; original vs. filtered baboon]
(Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02)
Segmentation examples
(Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02)
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH.
• Statistical curse of dimensionality: sparseness of the data, handled with variable bandwidth.
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k).
• For each point x_i, we check whether x_i[d_k] ≤ v_k for k = 1, …, K.
• This partitions the data into cells.
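A minimal sketch of this structure, assuming coordinates in [0, 1]. The cut values here are uniform over the data range, as in the original LSH (the data-driven variant appears later in the slides):

```python
import random
from collections import defaultdict

def build_lsh_tables(points, K, L, rng=random.Random(0)):
    """L random partitions; each uses K (coordinate, cut-value) pairs (d_k, v_k).
    A point's key is the K-bit pattern of the tests x[d_k] <= v_k, i.e. its cell."""
    dim = len(points[0])
    tables = []
    for _ in range(L):
        cuts = [(rng.randrange(dim), rng.uniform(0.0, 1.0)) for _ in range(K)]
        buckets = defaultdict(list)
        for idx, x in enumerate(points):
            key = tuple(x[d] <= v for d, v in cuts)
            buckets[key].append(idx)
        tables.append((cuts, buckets))
    return tables

def query_candidates(q, tables):
    # Union of the query's buckets over all L tables.
    out = set()
    for cuts, buckets in tables:
        out.update(buckets[tuple(q[d] <= v for d, v in cuts)])
    return out
```

A query point always lands in its own cell, so any point identical to it is always retrieved; nearby points are retrieved with high probability as L grows.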
Choosing the optimal K and L
• Goal: for a query q, compute the smallest number of distances to points in its buckets.
• Large K: a smaller number of points in each cell.
• If L is too small, points might be missed; but if L is too big, extra points might be included.
As L increases, the union of cells C∪ increases but the intersection C∩ decreases; together, K and L determine the resolution of the data structure.
Choosing optimal K and L
• Determine accurately the KNN for m randomly selected data points; record each point's KNN distance (bandwidth).
• Choose an error threshold ε.
• The optimal K and L should satisfy that the approximate distance returned by LSH stays within the threshold of the true distance.
Choosing optimal K and L
• For each K, estimate the error for L.
• In one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)).
[Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] and its minimum]
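The selection procedure above can be sketched as a loop. The cost model t = K·L and the synthetic error function in the test are stand-ins for the measured quantities the slides describe:

```python
def choose_parameters(Ks, max_L, error, eps):
    """For each K, find the minimal L whose estimated error meets eps, then
    pick the (K, L) minimizing a simple time model. `error(K, L)` is assumed
    to be measured on m sampled points with exact KNN as ground truth."""
    best, best_time = None, float("inf")
    for K in Ks:
        # Minimal L satisfying the error constraint, i.e. L(K).
        L_ok = next((L for L in range(1, max_L + 1) if error(K, L) <= eps), None)
        if L_ok is None:
            continue  # this K cannot meet the error budget
        t = K * L_ok  # stand-in cost model: hashing work grows with K and L
        if t < best_time:
            best, best_time = (K, L_ok), t
    return best
```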
Data-driven partitions
• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
[Figure: bucket distribution, uniform cuts vs. data-driven points]
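The suggested data-driven cut is essentially a one-liner; a hedged sketch:

```python
import random

def data_driven_cut(points, rng=random.Random(1)):
    """Instead of a uniform random cut over the data range, pick a random data
    point and use one of its coordinates as the cut value: cuts then follow
    the data density, giving more evenly populated buckets."""
    p = rng.choice(points)
    d = rng.randrange(len(p))
    return d, p[d]
```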
Additional speedup
Assume that all points in a cell C will converge to the same mode (C acts like a type of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
[Figure: low dimension vs. high dimension]
A thought for food…
• Choose K, L by sample learning, or take the traditional values.
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.
15:30 – cookies…
Summary
• LSH suggests a compromise on accuracy for a gain in complexity.
• Applications that involve massive data in high dimensions require LSH's fast performance.
• Extensions of LSH to different spaces (PSH).
• Learning the LSH parameters and hash functions for different applications.
Conclusion
• … but at the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data
    (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Pros & Cons (from Piotr Indyk's slides)
Pros:
• Better query time than spatial data structures.
• Scales well to higher dimensions and larger data sizes (sub-linear dependence).
• Predictable running time.
Cons:
• Extra storage overhead.
• Inefficient for data with distances concentrated around the average.
• Works best for Hamming distance (although it can be generalized to Euclidean space).
• In secondary storage, linear scan is pretty much all we can do (for high dimensions).
• Requires the radius r to be fixed in advance.
LSH applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun).
• Searching image databases (see the following).
• Image segmentation (see the following).
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani).
• Texture classification (see the following).
• Clustering (see the following).
• Embedding and manifold learning (LLE and many others).
• Compression: vector quantization.
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan).
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler).
• In short: whenever k-nearest neighbors (KNN) are needed.
Motivation
• A variety of procedures in learning require KNN computation.
• KNN search is a computational bottleneck.
• LSH provides a fast approximate solution to the problem.
• LSH requires hash-function construction and parameter tuning.
Outline
• Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, T. Darrell): finding sensitive hash functions.
• Mean Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, P. Meer): tuning the LSH parameters; the LSH data structure is used for algorithm speedups.
The problem
Given an image x, what are the parameters θ in this image, i.e. the angles of the joints, the orientation of the body, etc.?
(Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola, and T. Darrell.)
Ingredients
• Input: a query image with unknown angles (parameters).
• A database of human poses with known angles.
• An image feature extractor: an edge detector.
• A distance metric in feature space, $d_x$.
• A distance metric in angle space:
  $d_\theta(\theta^1, \theta^2) = \sum_{i=1}^{m} \bigl(1 - \cos(\theta^1_i - \theta^2_i)\bigr)$
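The angle-space metric above is easy to compute directly; a small sketch:

```python
import math

def angle_distance(theta1, theta2):
    """d(theta1, theta2) = sum_i (1 - cos(theta1_i - theta2_i)).

    Zero for identical poses, maximal for opposite angles, and naturally
    insensitive to 2*pi wrap-around -- which a plain L2 distance on raw
    angle values is not.
    """
    return sum(1.0 - math.cos(a - b) for a, b in zip(theta1, theta2))

d_same = angle_distance([0.0, 1.0], [0.0, 1.0])     # identical pose
d_wrap = angle_distance([0.0], [2.0 * math.pi])     # same angle, wrapped
```

The wrap-around invariance is the reason for the cosine form: joint angles near 0 and near 2π describe the same pose.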
Example-based learning
• Construct a database of example images with their known angles.
• Given a query image, run your favorite feature extractor.
• Compute the KNN from the database.
• Use these KNN to compute the average angles for the query.
Input: a query; find its KNN in the database of examples; output: the average angles of the KNN.
The algorithm flow:
input query → feature extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match.
The image features
Image features are multi-scale edge histograms, computed over image regions (A, B, …).
[Figure: example image regions and their edge-direction histograms.]
PSH: the basic assumption
There are two metric spaces here: the feature space ($d_x$) and the parameter space ($d_\theta$). We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space.
• But the global structure may be complicated: curved.
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: a query q mapped between parameter space (angles) and feature space.]
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to $d_\theta$. The hash functions are applied in feature space, but the KNN are valid in angle space.
1. Label pairs of examples with similar angles.
2. Define hash functions h on the feature space.
3. Predict the labeling of similar/non-similar examples by using h.
4. Compare the labelings.
5. If the labeling by h is good, accept h; else change h.
PSH as a classification problem
A pair of examples $(x_i, x_j)$ is labeled
  $y_{ij} = +1$ if $d_\theta(\theta_i, \theta_j) < r$,
  $y_{ij} = -1$ if $d_\theta(\theta_i, \theta_j) > (1+\epsilon)\,r$
(e.g. $r = 0.25$).
A binary hash function on features:
  $h_T(x) = +1$ if $x > T$, $-1$ otherwise.
Predict the labels:
  $\hat{y}_h(x_i, x_j) = +1$ if $h_T(x_i) = h_T(x_j)$, $-1$ otherwise.
Find the threshold T that best predicts the true labeling, subject to the probability constraints: $h_T$ will place both examples of a pair in the same bin, or separate them.
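A hedged sketch of this selection step (the threshold form and all names are mine; the paper scores many candidate feature/threshold pairs in this spirit):

```python
def pair_label(d_theta, r, eps):
    """True label: +1 if the angles are within r, -1 if beyond (1+eps)*r,
    None in the gray zone in between (such pairs are not used)."""
    if d_theta < r:
        return +1
    if d_theta > (1.0 + eps) * r:
        return -1
    return None

def predicted_label(fi, fj, T):
    """h_T(x) = +1 if x > T else -1; the pair is predicted 'similar'
    iff the hash puts both examples on the same side of T."""
    hi = +1 if fi > T else -1
    hj = +1 if fj > T else -1
    return +1 if hi == hj else -1

def threshold_accuracy(pairs, T, r=0.25, eps=1.0):
    """pairs: (feature_i, feature_j, d_theta) triples. Fraction of
    labeled pairs on which h_T agrees with the true label."""
    labeled = [(pair_label(d, r, eps), predicted_label(fi, fj, T))
               for fi, fj, d in pairs]
    labeled = [(y, yh) for y, yh in labeled if y is not None]
    return sum(y == yh for y, yh in labeled) / len(labeled)

# This feature correlates with angle, so a cut at T = 0.5 separates poses well.
pairs = [(0.1, 0.2, 0.10),   # similar angles, both features below T
         (0.8, 0.9, 0.05),   # similar angles, both features above T
         (0.1, 0.9, 1.00)]   # dissimilar angles, features straddle T
acc = threshold_accuracy(pairs, T=0.5)
```

A hash function is kept when this accuracy is high enough, i.e. when it is sensitive to the angle-space similarity.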
Local Weighted Regression (LWR)
• Given a query image, PSH returns its KNNs.
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:
  $\hat{\beta} = \arg\min_{\beta} \sum_{x_i \in N(x_0)} K\bigl(d_x(x_i, x_0)\bigr)\,\bigl(g(x_i; \beta) - \theta_i\bigr)^2$
  where $K$ is a distance-based weighting kernel and $g(\cdot\,; \beta)$ is the local model.
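A minimal numerical sketch of that objective with a linear local model and a Gaussian kernel (both concrete choices are assumptions; the slide does not fix them):

```python
import numpy as np

def lwr_estimate(query, neighbors, thetas, bandwidth=1.0):
    """Locally-weighted linear regression over the KNN returned by PSH.

    Minimizes sum_i K(d(x_i, x0)) * (g(x_i; beta) - theta_i)^2 with
    g(x; beta) = beta0 + beta . x and K(d) = exp(-(d/bandwidth)^2),
    then evaluates the fitted model at the query x0.
    """
    nb = np.asarray(neighbors, dtype=float)
    th = np.asarray(thetas, dtype=float)
    X = np.hstack([np.ones((len(nb), 1)), nb])              # design matrix
    w = np.exp(-(np.linalg.norm(nb - query, axis=1) / bandwidth) ** 2)
    sw = np.sqrt(w)[:, None]                                # weight the rows
    beta, *_ = np.linalg.lstsq(X * sw, th * sw.ravel(), rcond=None)
    return float(np.concatenate([[1.0], query]) @ beta)

# On noiseless linear data the local fit recovers the trend exactly.
est = lwr_estimate(np.array([1.5]), [[0.0], [1.0], [2.0]], [0.0, 2.0, 4.0])
```

Unlike a plain average of the neighbors' angles, the regression extrapolates the local trend, so neighbors that are slightly off the query still vote correctly.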
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints.
• 150,000 images.
• Nuisance parameters added: clothing, illumination, facial expression.
• 1,775,000 example pairs.
• Selected 137 out of 5,123 meaningful features (how?).
• 18-bit hash functions (k), 150 hash tables (l).
• Tested on 1,000 synthetic examples; PSH searched only 3.4% of the data per query.
• Without feature selection, 40 bits and 1,000 hash tables were needed.
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, and B is the maximum number of points in a bucket.
Results: real data
• 800 images.
• Processed by a segmentation algorithm.
• 13% of the data were searched.
Interesting mismatches.
Fast pose estimation: summary
• A fast way to compute the angles of a human body figure.
• Moving from one representation space to another.
• Training a sensitive hash function.
• KNN with smart averaging.
Food for thought
• The basic assumption may be problematic (distance metric, representations).
• The training set should be dense.
• Texture and clutter.
• In general, some features are more important than others and should be weighted.
Food for thought: Point Location in Different Spheres (PLDS)
• Given n spheres in $R^d$, centered at $P = \{p_1, \dots, p_n\}$, with radii $r_1, \dots, r_n$.
• Goal: preprocess the points in P so that, given a query q, we can find a point $p_i$ whose sphere 'covers' the query, i.e. $\|q - p_i\| \le r_i$.
(Courtesy of Mohamad Hegaze.)
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example, B. Georgescu, I. Shimshoni, and P. Meer.
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space).
• Statistical curse of dimensionality: sparseness of the data.
• Computational curse of dimensionality: expensive range queries.
• The LSH parameters should be adjusted for optimal performance.
Outline
• Mean-shift in a nutshell + examples.
Our scope:
• Mean-shift in high dimensions, using LSH.
• Speedups:
  1. Finding optimal LSH parameters.
  2. Data-driven partitions into buckets.
  3. Additional speedup by using the LSH data structure.
Mean-shift in a nutshell
[Figure: a kernel window of a given bandwidth around a point shifts toward the local mean.]
KNN in mean-shift
• The bandwidth should be inversely proportional to the density in the region: high density, small bandwidth; low density, large bandwidth.
• Base it on the k-th nearest neighbor of the point: the bandwidth of $x_i$ is its distance to its k-th nearest neighbor, $h_i = \|x_i - x_{i,k}\|$.
Adaptive mean-shift vs. non-adaptive.
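A brute-force sketch of the per-point bandwidth (the paper replaces the O(n²) neighbor search below with the LSH structure):

```python
import numpy as np

def adaptive_bandwidths(points, k):
    """Bandwidth of each point = distance to its k-th nearest neighbor:
    small in dense regions, large in sparse ones."""
    pts = np.asarray(points, dtype=float)
    # full pairwise distance matrix (fine for a sketch, O(n^2) in general)
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    dists.sort(axis=1)          # column 0 is the point itself (distance 0)
    return dists[:, k]

# Three clustered points and one isolated point: the isolated point
# gets a much larger bandwidth.
h = adaptive_bandwidths([[0.0], [0.1], [0.2], [10.0]], k=1)
```

This is exactly the inverse-density behavior the slide asks for: the window widens wherever the data thins out.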
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 spatial x, y) or 3D (1 gray + 2 spatial x, y).
2. Resolution is controlled by the bandwidths $h_s$ (spatial) and $h_r$ (color).
3. Apply filtering: each pixel takes the value of its nearest mode.
[Figure: original, filtered, and segmented images; mean-shift trajectories in 3D.]
(Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02.)
Filtering examples
[Figure: squirrel and baboon images, original vs. filtered.]
Segmentation examples
[Figure: segmentation results.]
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH.
• Statistical curse of dimensionality: sparseness of the data, handled with a variable bandwidth.
LSH-based data structure
• Choose L random partitions; each partition includes K pairs $(d_k, v_k)$.
• For each point we check whether $x_{d_k} \le v_k$; the resulting K boolean outcomes determine its cell.
• This partitions the data into cells.
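A compact sketch of this structure (cut values are drawn uniformly here; the data-driven variant above can be swapped in):

```python
import numpy as np

def build_partitions(data, K, L, seed=0):
    """L random partitions; each is K (dimension, cut-value) pairs."""
    rng = np.random.default_rng(seed)
    lo, hi = data.min(axis=0), data.max(axis=0)
    parts = []
    for _ in range(L):
        dims = rng.integers(0, data.shape[1], size=K)
        vals = rng.uniform(lo[dims], hi[dims])
        parts.append((dims, vals))
    return parts

def cell_id(x, partition):
    """A point's cell = the K boolean tests x[d_k] <= v_k."""
    dims, vals = partition
    return tuple(bool(b) for b in (x[dims] <= vals))

def build_tables(data, parts):
    """One hash table per partition: cell id -> indices of points in it."""
    tables = []
    for p in parts:
        t = {}
        for i, x in enumerate(data):
            t.setdefault(cell_id(x, p), []).append(i)
        tables.append(t)
    return tables

def candidates(q, parts, tables):
    """Union of the query's buckets over the L partitions."""
    out = set()
    for p, t in zip(parts, tables):
        out.update(t.get(cell_id(q, p), []))
    return out

rng = np.random.default_rng(42)
data = rng.normal(size=(200, 5))
parts = build_partitions(data, K=6, L=4)
tables = build_tables(data, parts)
near = candidates(data[0], parts, tables)
```

A query equal to a stored point always shares all of its cells, so the point itself is guaranteed to be among the candidates; the K/L trade-off below controls how many extra points come with it.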
Choosing the optimal K and L
• For a query q, we want to compute the smallest possible number of distances to the points in its buckets.
• Large K: a smaller number of points in a cell.
• If L is too small, points might be missed; if L is too big, extra points might be included.
• As L increases, the number of retrieved neighbors $\bar{N}_C$ increases but the distance $\bar{d}_C$ decreases: L determines the resolution of the data structure.
Choosing the optimal K and L
• Determine accurately the KNN for m randomly selected data points; their distance gives the bandwidth.
• Choose an error threshold $\epsilon$; the optimal K and L should keep the approximate distance within this threshold of the true one.
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K).
• Minimize the running time t(K, L(K)).
[Figure: approximation error over (K, L); L(K) for eps = 0.05; running time t[K, L(K)] and its minimum.]
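The tuning loop above can be sketched generically; the callables below are placeholders standing in for the real LSH structure and timing harness:

```python
def tune_k_l(sample_queries, exact_dist, approx_dist, runtime,
             K_grid, L_grid, eps=0.05):
    """Sketch of the slides' tuning procedure (all callables hypothetical):

    exact_dist(q)        -- true k-NN distance for sample query q
    approx_dist(q, K, L) -- k-NN distance the LSH structure returns
    runtime(K, L)        -- measured/estimated query time for (K, L)

    For each K, find the minimal L (L_grid ascending) that keeps the
    approximate distance within (1 + eps) of the exact one on all
    samples; return the (K, L(K)) pair with the smallest runtime.
    """
    best = None
    for K in K_grid:
        for L in L_grid:
            if all(approx_dist(q, K, L) <= (1.0 + eps) * exact_dist(q)
                   for q in sample_queries):
                if best is None or runtime(K, L) < best[0]:
                    best = (runtime(K, L), K, L)
                break                      # minimal L for this K found
    return None if best is None else (best[1], best[2])

# Toy model: error shrinks like 0.2/L, cost grows like K + L,
# so the cheapest feasible pair is K = 1 with L = 4.
choice = tune_k_l(sample_queries=[0],
                  exact_dist=lambda q: 1.0,
                  approx_dist=lambda q, K, L: 1.0 + 0.2 / L,
                  runtime=lambda K, L: K + L,
                  K_grid=[1, 2, 3], L_grid=[1, 2, 4, 8])
```

The one-run-over-all-L trick from the slide corresponds to the inner `break`: scanning L in ascending order yields L(K) the first time the constraint holds.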
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Motivation
• A variety of procedures in learning require KNN computation.
• KNN search is a computational bottleneck.
• LSH provides a fast approximate solution to the problem.
• LSH requires hash-function construction and parameter tuning.
Outline
"Fast Pose Estimation with Parameter-Sensitive Hashing", G. Shakhnarovich, P. Viola, and T. Darrell
• Finding sensitive hash functions
"Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni, and P. Meer
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups
The Problem
Given an image x, what are the parameters θ in this image, i.e., angles of joints, orientation of the body, etc.?
"Fast Pose Estimation with Parameter-Sensitive Hashing", G. Shakhnarovich, P. Viola, and T. Darrell
Ingredients
• Input: a query image with unknown angles (parameters)
• A database of human poses with known angles
• An image feature extractor – an edge detector
• A distance metric in feature space, d_x
• A distance metric in angle space:
  d_θ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))
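This angle metric can be computed directly; a minimal sketch (the function name is ours):

```python
import math

def d_theta(theta1, theta2):
    """Angle-space distance: sum over the m joint angles of 1 - cos(difference).
    Zero for identical poses; each joint contributes at most 2."""
    return sum(1 - math.cos(a - b) for a, b in zip(theta1, theta2))
```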
Example-based learning
• Construct a database of example images with their known angles.
• Given a query image, run your favorite feature extractor.
• Compute the KNN from the database.
• Use these KNNs to compute the average angles of the query.
Input: query → find KNN in the database of examples → output: average angles of the KNN
The algorithm flow
Input query → feature extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output match
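With brute-force KNN standing in for PSH, the whole flow fits in a few lines. A sketch (the names and the distance callback are our choices, not the paper's API):

```python
def knn(query_feat, database, k, dist):
    """The k database entries (features, angles) closest to the query features."""
    return sorted(database, key=lambda entry: dist(entry[0], query_feat))[:k]

def estimate_pose(query_feat, database, k, dist):
    """Average the known angles of the k nearest examples."""
    neighbors = knn(query_feat, database, k, dist)
    m = len(neighbors[0][1])  # number of angles per pose
    return [sum(angles[j] for _, angles in neighbors) / k for j in range(m)]
```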
The image features
[Figure: multi-scale edge maps of regions A and B]
Image features are multi-scale edge histograms.
PSH: The basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ).
We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space.
• But the global structure may be complicated: curved.
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: a query q mapped between the parameter space (angles) and the feature space]
Is this magic?
Parameter-Sensitive Hashing (PSH)
The trick:
• Estimate the performance of different hash functions on examples, and select those sensitive to d_θ.
• The hash functions are applied in feature space, but the KNN are valid in angle space.
PSH as a classification problem
• Label pairs of examples with similar angles.
• Define hash functions h on the feature space.
• Predict the labeling of similar/non-similar examples by using h.
• Compare the labelings: if the labeling by h is good, accept h; else change h.
Labels (r = 0.25): a pair of examples (x_i, θ_i), (x_j, θ_j) is labeled
  y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
  y_ij = −1 if d_θ(θ_i, θ_j) > (1 + ε) r
A binary hash function on a feature:
  h_T(x) = +1 if φ(x) ≥ T, −1 otherwise   (φ(x) is a single feature of x)
Predict the labels:
  ŷ_ij(h) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the feature φ(x) and threshold T that best predict the true labeling, subject to the probability constraints: h_T will place both examples in the same bin, or separate them.
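The selection loop amounts to scoring each candidate (feature, threshold) hash by how often its same-bucket prediction agrees with the angle-space labels, then keeping the most sensitive candidates. A sketch (the candidate and pair formats are our own illustration):

```python
def hash_bit(x, feat, T):
    """Binary hash on one feature: +1 if x[feat] >= T, else -1."""
    return 1 if x[feat] >= T else -1

def sensitivity(candidate, labeled_pairs):
    """Fraction of labeled pairs (x_i, x_j, y_ij) whose same-bucket
    prediction by this hash matches the angle-space label y_ij."""
    feat, T = candidate
    hits = sum(
        (1 if hash_bit(xi, feat, T) == hash_bit(xj, feat, T) else -1) == y
        for xi, xj, y in labeled_pairs)
    return hits / len(labeled_pairs)

def select_hashes(candidates, labeled_pairs, n_keep):
    """Keep the n_keep candidates most sensitive to the labels."""
    return sorted(candidates, key=lambda c: sensitivity(c, labeled_pairs),
                  reverse=True)[:n_keep]
```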
Local Weighted Regression (LWR)
• Given a query image x, PSH returns its KNNs.
• LWR uses the KNN to compute a weighted average of the estimated angles of the query, with each neighbor x_i ∈ N(x) weighted by a kernel of its feature-space distance (distance → weight):
  θ̂ = argmin_θ Σ_{x_i ∈ N(x)} K(d_x(x_i, x)) · d_θ(g(x_i), θ)
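A minimal version of the neighbor-weighted estimate (a locally constant simplification of full LWR; the Gaussian kernel and the `bandwidth` parameter are our choices):

```python
import math

def lwr_angles(query_feat, neighbors, d_x, bandwidth=1.0):
    """Kernel-weighted mean of the neighbors' known angles:
    closer neighbors (in feature space) get larger weights."""
    weights = [math.exp(-(d_x(f, query_feat) / bandwidth) ** 2)
               for f, _ in neighbors]
    total = sum(weights)
    m = len(neighbors[0][1])  # number of angles per pose
    return [sum(w * angles[j] for w, (_, angles) in zip(weights, neighbors)) / total
            for j in range(m)]
```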
Results
Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without selection, 40 bits and 1,000 hash tables were needed
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, and B is the max number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1/3 of the data were searched
Interesting mismatches
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• Smart averaging of the KNN
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d centered at P = {p1, …, pn}, with radii r1, …, rn.
• Goal: given a query q, preprocess the points in P to find a point p_i whose sphere 'covers' the query q, i.e., ||q − p_i|| ≤ r_i.
Courtesy of Mohamad Hegaze
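Without preprocessing, a PLDS query is just a linear coverage scan; the open problem is to beat this O(nd) sketch:

```python
import math

def plds_query(q, centers, radii):
    """Index of some sphere covering q (||q - p_i|| <= r_i), else None."""
    for i, (p, r) in enumerate(zip(centers, radii)):
        if math.dist(q, p) <= r:
            return i
    return None
```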
Motivation
• Clustering high-dimensional data by using local density measurements (e.g., in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
"Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni, and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[Figure: a window of a given bandwidth around a point is shifted toward the local mean]
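For a single 1-D point with a flat kernel, the nutshell procedure looks like this (a sketch; real mean-shift runs the iteration from every data point, in R^d):

```python
def mean_shift_point(x, data, bandwidth, tol=1e-6, max_iter=100):
    """Repeatedly move x to the mean of the points within `bandwidth`
    of it; the iteration converges to a local density mode."""
    for _ in range(max_iter):
        window = [p for p in data if abs(p - x) <= bandwidth]
        new_x = sum(window) / len(window)
        if abs(new_x - x) < tol:
            break
        x = new_x
    return x
```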
KNN in mean-shift
• The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth.
• Based on the kth nearest neighbor of the point: the bandwidth is the distance from the point to its kth nearest neighbor.
Adaptive mean-shift vs. non-adaptive
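The adaptive rule in code (1-D for brevity):

```python
def adaptive_bandwidth(i, data, k):
    """Bandwidth for data[i]: the distance to its kth nearest neighbor.
    Small in dense regions, large in sparse ones."""
    x = data[i]
    dists = sorted(abs(p - x) for j, p in enumerate(data) if j != i)
    return dists[k - 1]
```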
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths hs (spatial) and hr (color)
3. Apply filtering
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
[Figures: original, filtered, and segmented images; mean-shift trajectories]
Filtering: each pixel takes the value of its nearest mode.
Filtering examples
[Figures: original vs. filtered "squirrel" and "baboon" images]
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Segmentation examples
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH.
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth.
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k).
• For each point x_i, check the K inequalities x_i[d_k] ≤ v_k.
• This partitions the data into cells.
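A sketch of this structure (assuming data scaled to [0, 1]^dim; the cut-sampling details are our own):

```python
import random
from collections import defaultdict

def build_lsh(data, K, L, dim, seed=0):
    """L random partitions; each holds K cuts (d_k, v_k). A point's cell
    key in a partition is the tuple of K booleans [x[d_k] <= v_k]."""
    rng = random.Random(seed)
    tables = []
    for _ in range(L):
        cuts = [(rng.randrange(dim), rng.random()) for _ in range(K)]
        cells = defaultdict(list)
        for i, x in enumerate(data):
            cells[tuple(x[d] <= v for d, v in cuts)].append(i)
        tables.append((cuts, cells))
    return tables

def query_union(q, tables):
    """C_union: indices of all points sharing a cell with q in any partition."""
    out = set()
    for cuts, cells in tables:
        out.update(cells.get(tuple(q[d] <= v for d, v in cuts), []))
    return out
```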
Choosing the optimal K and L
• Goal: for a query q, compute the smallest number of distances to points in its buckets.
• Large K → a smaller number of points N_C in each cell.
• If L is too small, nearest neighbors might be missed; but if L is too big, the union of cells C_∪ might include extra points.
• As L increases, C_∪ grows (fewer points are missed), but the query cost increases.
• K determines the resolution of the data structure.
Choosing optimal K and L
• Determine accurately the KNN distance (bandwidth) for m randomly selected data points.
• Choose an error threshold ε.
• The optimal K and L should satisfy: the approximate distance returned by the LSH structure is within ε of the true distance.
Choosing optimal K and L
• For each K, estimate the approximation error for each L.
• In one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)) to find the minimum.
[Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)]]
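The tuning procedure can be sketched as a grid search: for each K, find the minimal L meeting the error constraint, then pick the feasible (K, L(K)) pair with the lowest measured query time (`measure_error` and `measure_time` are hypothetical callbacks backed by the m sampled points):

```python
def tune_kl(K_values, L_values, measure_error, measure_time, eps):
    """For each K, take the minimal L whose measured error is <= eps
    (the constraint L(K)); among those, return the (K, L) pair with
    the lowest measured running time."""
    feasible = []
    for K in K_values:
        for L in sorted(L_values):
            if measure_error(K, L) <= eps:
                feasible.append((measure_time(K, L), K, L))
                break
    _, K, L = min(feasible)
    return K, L
```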
Data-driven partitions
• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
[Figure: bucket distribution of points, uniform vs. data-driven cuts]
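The data-driven variant changes only how a cut value is drawn (a sketch; function names are ours):

```python
import random

def uniform_cut(dim, lo, hi, rng):
    """Original LSH: cut value drawn uniformly over the data range."""
    return rng.randrange(dim), rng.uniform(lo, hi)

def data_driven_cut(data, dim, rng):
    """Data-driven variant: the cut value is a coordinate of a randomly
    chosen data point, so cuts follow the data distribution and the
    resulting buckets are more evenly filled."""
    d = rng.randrange(dim)
    return d, rng.choice(data)[d]
```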
Additional speedup
• Assume that all points in C_∪ will converge to the same mode (C_∪ acts like a kind of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
[Figure: low dimension vs. high dimension]
A thought for food…
• Choose K and L by sample learning, or take the traditional values.
• Can one estimate K and L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.
15:30 – cookies…
Summary
• LSH offers a compromise: it trades accuracy for a gain in complexity.
• Applications involving massive data in high dimensions require LSH's fast performance.
• The LSH idea extends to different spaces (PSH).
• The LSH parameters and hash functions can be learned per application.
Conclusion
• But in the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test on your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
The Problem
Given an image x, what are the parameters θ in this image, i.e. the angles of the joints, the orientation of the body, etc.?
Fast Pose Estimation with Parameter Sensitive Hashing
G. Shakhnarovich, P. Viola and T. Darrell
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: d_x
• Distance metric in angle space: d_θ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))
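The angle-space metric above can be written out directly; a minimal sketch (the function name is mine, not the paper's):

```python
import math

def d_theta(t1, t2):
    """Angle-space distance: sum over the m joints of (1 - cos(theta1_i - theta2_i)).

    Zero for identical poses, grows smoothly with joint-angle differences,
    and is insensitive to the 2*pi wrap-around of each angle.
    """
    return sum(1.0 - math.cos(a - b) for a, b in zip(t1, t2))
```

Note that a half-turn in a single joint contributes the maximal per-joint value of 2, while a full turn contributes nothing, which is exactly the behavior wanted for angular parameters.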
Example based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query

Input: query → find KNN in the database of examples → output: average angles of the KNN
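The steps above can be sketched as a brute-force baseline (names and the Euclidean default for d_x are my assumptions; the paper replaces the linear scan with PSH):

```python
import math

def estimate_pose(query_features, database, k=3, dist=None):
    """Example-based pose estimation: find the k nearest database entries in
    feature space and average their known joint angles.

    `database` is a list of (feature_vector, angles) pairs; `dist` is the
    feature-space metric d_x (Euclidean by default).
    """
    if dist is None:
        dist = lambda a, b: math.dist(a, b)
    # Brute-force KNN: sort all examples by feature-space distance to the query.
    neighbors = sorted(database, key=lambda e: dist(e[0], query_features))[:k]
    m = len(neighbors[0][1])
    # Plain average of the neighbors' angles (LWR below refines this).
    return [sum(n[1][j] for n in neighbors) / k for j in range(m)]
```

This linear scan is exactly the O(nd) naïve search from the opening slides; the rest of the section is about replacing it with hashing.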
The algorithm flow: input query → feature extraction → processed query → PSH (LSH, against the database of examples) → LWR (regression) → output: match
The image features
Image features are multi-scale edge histograms.
Feature Extraction → PSH → LWR
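A simplified sketch of such a multi-scale edge-direction histogram feature (my own minimal version; the paper pools edge maps over image sub-windows, which this does not reproduce):

```python
import math

def edge_direction_histograms(image, scales=(1, 2, 4), bins=4):
    """Multi-scale edge-direction histograms over a 2D list of intensities.

    At each scale the image is downsampled by block-averaging, finite-difference
    gradients are taken, and gradient orientations are histogrammed into
    `bins` orientation bins; the per-scale histograms are concatenated.
    """
    feats = []
    for s in scales:
        h, w = len(image) // s, len(image[0]) // s
        # Block-average downsampling by a factor of s.
        small = [[sum(image[y * s + dy][x * s + dx] for dy in range(s)
                      for dx in range(s)) / (s * s)
                  for x in range(w)] for y in range(h)]
        hist = [0] * bins
        for y in range(h - 1):
            for x in range(w - 1):
                gx = small[y][x + 1] - small[y][x]
                gy = small[y + 1][x] - small[y][x]
                if gx == 0 and gy == 0:
                    continue  # no edge at this pixel
                ang = math.atan2(gy, gx) % math.pi  # orientation, not direction
                hist[min(int(ang / math.pi * bins), bins - 1)] += 1
        feats.extend(hist)
    return feats
```

Each coordinate of the resulting vector is a simple scalar count, which is what makes the threshold-based hash functions below applicable.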
PSH: The basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ). We want similarity to be measured in the angle space, whereas LSH works in the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space.
• But the global structure may be complicated: curved.
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.

Parameter space (angles) vs. feature space, with a query q: is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ. The hash functions are applied in feature space, but the KNN are valid in angle space.
• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h
PSH as a classification problem

Labels (r = 0.25): a pair of examples (x_i, x_j) is labeled
  y_ij = +1 if d_θ(θ_i, θ_j) < r
  y_ij = −1 if d_θ(θ_i, θ_j) > (1 + ε) r

A binary hash function on a feature:
  h_T(x) = +1 if x > T, −1 otherwise

Predict the labels:
  ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the best T that predicts the true labeling, subject to the probability constraints: h_T will either place both examples in the same bin or separate them.
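The selection step above amounts to scoring each candidate threshold hash against the pose-similarity labels; a minimal sketch with hypothetical names (the paper additionally enforces probability constraints on each class, which are omitted here):

```python
def hash_accuracy(pairs, labels, feature_idx, T):
    """Score one candidate hash h_T(x) = +1 if x[feature_idx] > T else -1.

    `pairs` is a list of (x_i, x_j) feature-vector pairs and `labels` the
    corresponding +1 (similar pose) / -1 (dissimilar pose) ground truth.
    The predicted label is +1 when both points fall in the same bin.
    """
    h = lambda x: 1 if x[feature_idx] > T else -1
    correct = sum(1 for (xi, xj), y in zip(pairs, labels)
                  if (1 if h(xi) == h(xj) else -1) == y)
    return correct / len(pairs)

def select_hashes(pairs, labels, candidates, k):
    """Keep the k candidate (feature_idx, T) hashes that best predict labels."""
    return sorted(candidates,
                  key=lambda c: hash_accuracy(pairs, labels, *c),
                  reverse=True)[:k]
```

Training thus reduces to a one-dimensional classifier per feature, which is what makes screening thousands of candidate features feasible.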
Local Weighted Regression (LWR)
• Given a query image, PSH returns the KNNs.
• LWR uses the KNNs to compute a weighted average of the estimated angles of the query, with weights K(d_x(x_i, x)) that decrease with the feature-space distance of each neighbor.
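The zeroth-order case of this weighted regression (a kernel-weighted average; the paper fits a local model, so this is only the constant/average special case, with a Gaussian kernel as my assumption):

```python
import math

def lwr_pose(query, neighbors, bandwidth=1.0, dist=None):
    """Kernel-weighted average of the neighbors' known angles.

    `neighbors` is a list of (feature_vector, angles) pairs returned by PSH;
    weights shrink with the feature-space distance d_x to the query.
    """
    if dist is None:
        dist = lambda a, b: math.dist(a, b)
    # Gaussian kernel weight per neighbor.
    w = [math.exp(-(dist(f, query) / bandwidth) ** 2) for f, _ in neighbors]
    total = sum(w)
    m = len(neighbors[0][1])
    return [sum(wi * ang[j] for wi, (_, ang) in zip(w, neighbors)) / total
            for j in range(m)]
```

With equidistant neighbors this reduces to the plain KNN average of the earlier slide; closer neighbors otherwise dominate the estimate.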
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (K), 150 hash tables (L)
• Test on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without feature selection, 40 bits and 1,000 hash tables were needed
Recall: P1 is the probability of a positive hash, P2 the probability of a bad hash, and B the maximum number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
• Interesting mismatches occurred
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d centered at P = p1, …, pn with radii r1, …, rn.
• Goal: given a query q, preprocess the points in P to find a point p_i whose sphere covers the query q.
Courtesy of Mohamad Hegaze
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance

Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
Mean-shift → LSH: optimal k, l → LSH: data partition → LSH: data structure
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth. The bandwidth is based on the k-th nearest neighbor of the point.
Adaptive mean-shift vs. non-adaptive.
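The adaptive-bandwidth rule above (bandwidth = distance to the k-th nearest neighbor) can be sketched as a tiny flat-kernel mean-shift; this is an illustrative sketch of mine, not the paper's implementation, which answers the neighborhood queries with LSH:

```python
import math

def adaptive_mean_shift(points, k=3, iters=20, tol=1e-6):
    """Adaptive mean-shift: each point's bandwidth h_i is its distance to its
    k-th nearest neighbor (large in sparse regions, small in dense ones), and
    each point is iteratively moved to the mean of its flat-kernel window.

    Returns the mode each input point converges to.
    """
    def knn_dist(p):
        # Sorted distances include the 0 to p itself, so index k is the k-th NN.
        return sorted(math.dist(p, q) for q in points)[k]
    h = [knn_dist(p) for p in points]
    modes = [list(p) for p in points]
    for i, m in enumerate(modes):
        for _ in range(iters):
            close = [q for q in points if math.dist(m, q) <= h[i]]
            new = [sum(c[d] for c in close) / len(close) for d in range(len(m))]
            if math.dist(new, m) < tol:
                break
            m = new
        modes[i] = m
    return modes
```

Points that converge to (numerically) the same mode form one cluster; the expensive part is the range query inside the loop, which motivates the LSH structure below.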
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering

Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
Filtering: each pixel takes the value of its nearest mode (original → filtered → segmented); mean-shift trajectories shown.

Filtering examples
Original squirrel → filtered; original baboon → filtered.

Segmentation examples
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries – implemented with LSH
• Statistical curse of dimensionality: sparseness of the data – variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k).
• For each point x, check whether x_{d_k} ≤ v_k; the K boolean results define its cell.
• This partitions the data into cells.
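A minimal sketch of this structure, with hypothetical names `build_lsh`/`query_lsh` (a real implementation would store point indices rather than the points themselves):

```python
import random
from collections import defaultdict

def build_lsh(points, K, L, seed=0):
    """L random partitions, each defined by K (coordinate, cut-value) pairs.

    A point's cell within one partition is the K-bit key of tests
    x[d_k] <= v_k; points sharing a key share a bucket in that partition.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]
    tables = []
    for _ in range(L):
        cuts = []
        for _ in range(K):
            d = rng.randrange(dim)
            cuts.append((d, rng.uniform(lo[d], hi[d])))  # random cut in data range
        table = defaultdict(list)
        for p in points:
            key = tuple(p[d] <= v for d, v in cuts)
            table[key].append(p)
        tables.append((cuts, table))
    return tables

def query_lsh(tables, q):
    """Union of the query's buckets over all L partitions (candidate set)."""
    out = set()
    for cuts, table in tables:
        key = tuple(q[d] <= v for d, v in cuts)
        out.update(table.get(key, ()))
    return out
```

The range query of mean-shift is then answered approximately by scanning only the candidate set instead of all n points.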
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets.
• Large K – a smaller number of points per cell.
• If L is too small, points might be missed; but if L is too big, extra points might be included.
• K determines the resolution of the data structure; as L increases, the union of cells C∪ grows while the intersection C∩ shrinks.
Choosing optimal K and L
• Determine accurately the KNN distance (bandwidth) for m randomly selected data points.
• Choose an error threshold ε; the optimal K and L should keep the approximate distance within that threshold of the exact one.
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K).
• Minimize the running time t(K, L(K)).
(Plots: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)], with the minimum marked.)
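The selection procedure above can be sketched as follows; `error(K, L)` and `cost(K, L)` are caller-supplied stand-ins (hypothetical names) for the sampled error measurement and the running-time model, with error assumed non-increasing in L:

```python
def choose_k_l(error, cost, k_values, l_values, eps=0.05):
    """Pick (K, L): for each K, take the minimal L whose approximation error
    on a sample of exact-KNN queries stays below eps, then return the pair
    minimizing the running-time model t(K, L)."""
    best = None
    for K in k_values:
        # Minimal L meeting the error constraint, i.e. L(K).
        L = next((L for L in sorted(l_values) if error(K, L) <= eps), None)
        if L is None:
            continue  # no L meets the error constraint for this K
        t = cost(K, L)
        if best is None or t < best[2]:
            best = (K, L, t)
    return best[:2] if best else None
```

With real measurements, `error` would compare LSH-reported neighbor distances against the exact KNN distances on the m sampled points.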
Data driven partitions
• In the original LSH, cut values are drawn uniformly at random over the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value (uniform vs. data-driven bucket distribution).
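The suggested cut-value rule in one helper (name and interface are mine): instead of `uniform(lo, hi)`, sample a data point and reuse its coordinate, so cuts land where the data is dense and the buckets come out more balanced.

```python
import random

def data_driven_cuts(points, K, rng=None):
    """K (coordinate, cut-value) pairs where each cut value is the coordinate
    of a randomly sampled data point rather than a uniform draw over the range."""
    rng = rng or random.Random(0)
    dims = [rng.randrange(len(points[0])) for _ in range(K)]
    return [(d, rng.choice(points)[d]) for d in dims]
```

Dropping this in place of the uniform draw in the partition builder changes nothing else in the structure; only the distribution of points per bucket improves.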
Additional speedup
Assume that all points in a cell C will converge to the same mode (C acts like a type of aggregate).
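Under that assumption, the expensive mean-shift trajectory need only be run once per cell and its mode propagated to the cell's members; a minimal sketch with hypothetical names:

```python
def propagate_modes(cells, find_mode):
    """Run the per-point mode-seeking procedure `find_mode` once per cell and
    assign that mode to every point in the cell, assuming all points of a
    cell converge to the same mode."""
    modes = {}
    for cell_points in cells:
        mode = find_mode(cell_points[0])  # one representative per cell
        for p in cell_points:
            modes[p] = mode
    return modes
```

The reported speedup comes precisely from replacing |C| trajectories with one; the risk is that a cell straddling a density valley gets a single wrong mode.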
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
Low dimension vs. high dimension.

A thought for food…
• Choose K, L by sample learning, or take the traditional values.
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold? Intuitively, dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.

15:30 – cookies…
Summary
• LSH suggests a compromise: a little accuracy is traded for a large gain in complexity.
• Applications that involve massive data in high dimension require the fast performance of LSH.
• Extension of LSH to different spaces (PSH).
• Learning the LSH parameters and hash functions for different applications.
Conclusion
• …but in the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naïve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree – Pitfall 1
- Slide 16
- Quadtree – pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x, what are the parameters θ in this image, i.e. angles of joints, orientation of the body, etc.?
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results – real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
The algorithm flow
Input query → feature extraction → processed query → PSH (LSH) lookup in a database of examples → LWR (regression) → output: match
The image features
[Figure: image regions A and B with their corresponding feature values]
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH: The basic assumption
There are two metric spaces here: the feature space (with metric d_x) and the parameter space (with metric d_θ).
We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: a query q mapped between the feature space and the parameter (angle) space]
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick:
Estimate the performance of different hash functions on examples, and select those sensitive to d_θ.
The hash functions are applied in feature space, but the KNN are valid in angle space.
Label pairs of examples with similar angles.
Define hash functions h on the feature space.
Predict the labeling of similar/non-similar examples by using h.
Compare the labeling.
If the labeling by h is good, accept h; else change h.
PSH as a classification problem
Labels (r = 0.25):
A pair of examples (x_i, θ_i), (x_j, θ_j) is labeled
  y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
  y_ij = −1 if d_θ(θ_i, θ_j) ≥ (1 + ε)r
A binary hash function on the features:
  h_T(x) = +1 if x ≥ T, −1 otherwise
Predict the labels:
  ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the best T that predicts the true labeling subject to the probability constraints: h_T will place both examples in the same bin, or separate them.
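As a concrete sketch of this selection step, the toy Python below (function names, thresholds, and data are illustrative assumptions, not the talk's code) scores candidate threshold hashes h_T against angle-space pair labels and keeps the accurate ones:

```python
def h(x, feature, T):
    """Threshold hash on one feature: +1 if x[feature] >= T, else -1."""
    return 1 if x[feature] >= T else -1

def pair_accuracy(pairs, labels, feature, T):
    """Fraction of pairs whose predicted label (same bucket -> +1,
    different buckets -> -1) matches the angle-space label y_ij."""
    correct = 0
    for (xi, xj), y in zip(pairs, labels):
        y_hat = 1 if h(xi, feature, T) == h(xj, feature, T) else -1
        correct += (y_hat == y)
    return correct / len(pairs)

def select_hashes(pairs, labels, dim, thresholds, min_acc=0.6):
    """Keep the (feature, T) candidates that predict pair labels well."""
    good = []
    for f in range(dim):
        for T in thresholds:
            if pair_accuracy(pairs, labels, f, T) >= min_acc:
                good.append((f, T))
    return good
```

On two labeled pairs with one feature, a cut at 0.5 that separates the dissimilar pair and keeps the similar pair together scores 1.0 and is selected.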
Local Weighted Regression (LWR)
• Given a query image, PSH returns its KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:

  beta_0 = argmin_beta  sum_{x_i in N(x)}  K(d(x_i, x)) * (g(x_i; beta) - theta_i)^2

where K(d(x_i, x)) is the distance weight and N(x) is the neighborhood of x.
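A minimal sketch of the weighted-average step (a zeroth-order simplification of LWR; the Gaussian kernel and all names here are illustrative assumptions):

```python
import math

def lwr_estimate(query, neighbors, angles, bandwidth=1.0):
    """Weighted average of the neighbors' angle vectors, weighted by a
    Gaussian kernel on the feature-space distance to the query.
    This is the constant (zeroth-order) local model."""
    weights = []
    for x in neighbors:
        d2 = sum((a - b) ** 2 for a, b in zip(query, x))
        weights.append(math.exp(-d2 / (2 * bandwidth ** 2)))
    total = sum(weights)
    n_angles = len(angles[0])
    return [sum(w * th[j] for w, th in zip(weights, angles)) / total
            for j in range(n_angles)]
```

With equidistant neighbors the estimate is their plain mean; a farther neighbor's angles are down-weighted by the kernel.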
Results
Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k = 18), 150 hash tables (l = 150)
• Test on 1,000 synthetic examples
• PSH searched only 34 of the data per query
• Without selection, needed 40 bits and 1,000 hash tables
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, B is the max number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 13 of the data were searched

Results – real data: interesting mismatches
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging

Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d centered at P = p_1, …, p_n with radii r_1, …, r_n
• Goal: given a query q, preprocess the points in P to find a point p_i whose sphere covers the query q
[Figure: query q covered by the sphere of radius r_i around p_i]
Courtesy of Mohamad Hegaze
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[Figure: a window of radius "bandwidth" around the current point]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region:
high density → small bandwidth; low density → large bandwidth.
Based on the kth nearest neighbor of the point, the bandwidth is h_i = ||x_i − x_{i,k}||, the distance to that neighbor.
Adaptive mean-shift vs non-adaptive
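The adaptive-bandwidth idea can be sketched in a few lines of toy Python (1-D points and a flat kernel for brevity; all names are illustrative, not the paper's code):

```python
def knn_bandwidths(points, k):
    """Per-point adaptive bandwidth h_i: the distance from point i
    to its k-th nearest neighbor among the other points."""
    hs = []
    for i, p in enumerate(points):
        dists = sorted(abs(p - q) for j, q in enumerate(points) if j != i)
        hs.append(dists[k - 1])
    return hs

def mean_shift_step(x, points, h):
    """One mean-shift step with a flat kernel of radius h:
    move x to the mean of the points inside the bandwidth window."""
    window = [p for p in points if abs(p - x) <= h]
    return sum(window) / len(window)
```

In a dense cluster the k-th neighbor is close, so h_i is small; in a sparse region it is far, so h_i is large, which is exactly the inverse-density behavior described above.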
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering
Mean-shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
[Figures: original, filtered, and segmented images; mean-shift trajectories]
Filtering: pixel value of the nearest mode
Filtering examples
[Figures: squirrel and baboon images, original vs filtered]
Segmentation examples
[Figures from Mean-shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02]
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH
• Statistical curse of dimensionality: sparseness of the data, handled with variable bandwidth
LSH-based data structure
• Choose L random partitions. Each partition includes K pairs (d_k, v_k).
• For each point x, we check whether x_{d_k} ≤ v_k for each of the K pairs.
This partitions the data into cells.
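A toy sketch of this structure (Python; the uniform cut values and all names are assumptions for illustration, not the paper's code):

```python
import random

def build_partitions(points, K, L, rng):
    """L random partitions; each is K (dimension, cut value) pairs.
    A point's K-bit key in a partition records, for each cut,
    whether point[dim] <= value."""
    dim = len(points[0])
    tables = []
    for _ in range(L):
        cuts = [(rng.randrange(dim), rng.uniform(0.0, 1.0)) for _ in range(K)]
        table = {}
        for idx, p in enumerate(points):
            key = tuple(p[d] <= v for d, v in cuts)
            table.setdefault(key, []).append(idx)
        tables.append((cuts, table))
    return tables

def query_candidates(q, tables):
    """Union of the buckets containing q across the L partitions."""
    cand = set()
    for cuts, table in tables:
        key = tuple(q[d] <= v for d, v in cuts)
        cand.update(table.get(key, []))
    return cand
```

A query identical to a stored point always lands in that point's bucket in every partition, so the point is guaranteed to be among the candidates.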
Choosing the optimal K and L
• For a query q, we want the smallest number of distance computations to points in its buckets.
Large K: a smaller number of points in a cell.
If L is too small, points might be missed; but if L is too big, extra points might be included.
As L increases, coverage of the true neighbors increases but efficiency decreases; K determines the resolution of the data structure.
Choosing optimal K and L
Determine accurately the KNN distance (bandwidth) for m randomly selected data points.
Choose an error threshold ε.
The optimal K and L should satisfy: the approximate distance is within (1 + ε) of the true KNN distance.
Choosing optimal K and L
• For each K, estimate the error
• In one run, for all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
[Figures: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)], with its minimum marked]
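The selection procedure might be sketched as follows (toy 1-D Python; K·L stands in for measured running time, and K = 0 is the degenerate "no cuts" case used only to keep the demo deterministic; all names are illustrative assumptions):

```python
import random

def true_knn_dist(points, q, k):
    """Exact k-th nearest-neighbor distance by brute force."""
    return sorted(abs(p - q) for p in points)[k - 1]

def approx_knn_dist(points, q, k, K, L, rng):
    """k-th NN distance using only points that share a bucket with q
    in at least one of L partitions, each made of K random 1-D cuts."""
    cand = set()
    for _ in range(L):
        cuts = [rng.uniform(0.0, 1.0) for _ in range(K)]
        key_q = tuple(q <= c for c in cuts)
        for p in points:
            if tuple(p <= c for c in cuts) == key_q:
                cand.add(p)
    d = sorted(abs(p - q) for p in cand)
    return d[k - 1] if len(d) >= k else float("inf")

def choose_parameters(points, sample, k, eps, Ks, Ls, rng):
    """For each K, find the smallest L meeting the (1 + eps) error
    bound on the sample queries; among those, return the (K, L) with
    the cheapest proxy cost K * L."""
    best = None
    for K in Ks:
        for L in Ls:
            ok = all(
                approx_knn_dist(points, q, k, K, L, rng)
                <= (1 + eps) * true_knn_dist(points, q, k)
                for q in sample
            )
            if ok:
                cost = K * L
                if best is None or cost < best[0]:
                    best = (cost, K, L)
                break  # minimal L for this K found
    return (best[1], best[2]) if best else None
```

In practice one would measure actual query time t(K, L(K)) instead of the K·L proxy, as the slide describes.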
Data driven partitions
• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
[Figure: bucket distribution of points, uniform vs data-driven cuts]
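The two cut-selection rules, side by side (toy Python sketch; names are illustrative assumptions):

```python
import random

def uniform_cut(points, dim_index, rng):
    """Original LSH: cut value uniform over the data range in that
    dimension, regardless of where the points actually lie."""
    vals = [p[dim_index] for p in points]
    return rng.uniform(min(vals), max(vals))

def data_driven_cut(points, dim_index, rng):
    """Suggested variant: pick a random data point and use its
    coordinate as the cut value, so cuts follow the data density."""
    return rng.choice(points)[dim_index]
```

Because data-driven cuts fall preferentially inside dense regions, they tend to split crowded buckets and leave empty regions alone, balancing bucket occupancy.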
Additional speedup
Assume that all points in C will converge to the same mode (C is like a type of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
[Figure: low dimension vs high dimension]
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30 cookies…
Summary
• LSH trades some accuracy for a gain in complexity
• Applications that involve massive data in high dimension require the fast performance of LSH
• Extension of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test on your own data (C code, under Red Hat Linux)
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and L
• For each K, estimate the error.
• In one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)).
[Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] with its minimum marked.]
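The selection procedure above can be sketched as follows (an illustrative reimplementation under my own naming, not the paper's code; the brute-force exact-KNN check and the candidate count as a time proxy are assumptions):

```python
import random
from collections import defaultdict

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def build(points, K, L, rng):
    """Same cell structure as before: L partitions of K (dim, cut) pairs."""
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]
    tables = []
    for _ in range(L):
        pairs = [(d, rng.uniform(lo[d], hi[d]))
                 for d in [rng.randrange(dim) for _ in range(K)]]
        buckets = defaultdict(list)
        for i, p in enumerate(points):
            buckets[tuple(p[d] <= v for d, v in pairs)].append(i)
        tables.append((pairs, buckets))
    return tables

def approx_nn(tables, points, q):
    """Approximate NN distance of q and the number of candidates scanned."""
    cand = set()
    for pairs, buckets in tables:
        cand.update(buckets.get(tuple(q[d] <= v for d, v in pairs), []))
    ds = [dist(points[i], q) for i in cand if points[i] != q]
    return (min(ds) if ds else float("inf")), len(cand)

def choose_K_L(points, sample, Ks, Ls, eps, seed=1):
    """For each K, find the minimal L meeting the (1+eps) error constraint
    on the sample; among feasible (K, L(K)), pick the cheapest one."""
    rng = random.Random(seed)
    exact = [min(dist(p, q) for p in points if p != q) for q in sample]
    best = None
    for K in Ks:
        for L in sorted(Ls):            # first feasible L is the minimal L(K)
            tables = build(points, K, L, rng)
            scanned, ok = 0, True
            for q, d_true in zip(sample, exact):
                d_apx, n = approx_nn(tables, points, q)
                scanned += n
                if d_apx > (1 + eps) * d_true:
                    ok = False
                    break
            if ok:
                if best is None or scanned < best[0]:
                    best = (scanned, K, L)
                break
    return best   # (work, K, L), or None if no setting met the constraint
```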
Data driven partitions
• In the original LSH, cut values are drawn at random from the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
[Figure: bucket distribution of points, uniform vs. data-driven partitions.]
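The two ways of drawing a cut value can be sketched side by side (a minimal illustration; function names are mine):

```python
import random

def uniform_cut(points, d, rng):
    """Original LSH: cut value drawn uniformly over the data range in dim d."""
    vals = [p[d] for p in points]
    return rng.uniform(min(vals), max(vals))

def data_driven_cut(points, d, rng):
    """Suggested variant: coordinate d of a randomly chosen data point."""
    return rng.choice(points)[d]
```

Because data-driven cuts follow the empirical distribution, buckets come out more evenly populated when the data is skewed.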
Additional speedup
• Assume that all points in C will converge to the same mode (C is like a type of an aggregate).
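Under that assumption, mean-shift needs to be run only once per cell, with the resulting mode cached and reused; a minimal sketch (hypothetical helper names, with the cell map and mean-shift routine passed in):

```python
def cluster_with_cell_cache(points, cell_of, mean_shift_to_mode):
    """Assume all points in one LSH cell converge to the same mode:
    run mean-shift once per cell and reuse the cached mode."""
    cache = {}
    modes = []
    for p in points:
        c = cell_of(p)
        if c not in cache:
            cache[c] = mean_shift_to_mode(p)
        modes.append(cache[c])
    return modes
```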
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
[Figure: low dimension vs. high dimension]
A thought for food…
• Choose K, L by sample learning, or take the traditional values.
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.
15:30 cookies…
Summary
• LSH trades some accuracy for a large gain in complexity.
• Applications that involve massive data in high dimensions require the fast performance of LSH.
• Extension of LSH to different spaces (PSH).
• Learning the LSH parameters and hash functions for different applications.
Conclusion
• But in the end, everything depends on your data set.
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni: andoni@mit.edu
– Test it on your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naïve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree – Pitfall 1
- Slide 16
- Quadtree – pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection …
- … Parameters selection
- Pros & Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x, what are the parameters θ in this image, i.e., angles of joints, orientation of the body, etc.?
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results – real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
Insight Manifolds
• A manifold is a space in which every point has a neighborhood resembling Euclidean space.
• But the global structure may be complicated: curved.
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
Pipeline: Feature Extraction → PSH → LWR
[Figure: parameter space (angles) vs. feature space; a query q maps between them.]
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick:
• Estimate the performance of different hash functions on examples, and select those sensitive to d_θ, the distance in parameter space.
• The hash functions are applied in feature space, but the KNN are valid in angle space.
• Label pairs of examples with similar angles.
• Define hash functions h on the feature space.
• Predict the labeling of similar/non-similar examples by using h.
• Compare the labelings.
• If the labeling by h is good, accept h; else change h.
PSH as a classification problem
A pair of examples (x_i, x_j) is labeled
y_ij = +1 if d_θ(θ_i, θ_j) < r,
y_ij = −1 if d_θ(θ_i, θ_j) > (1 + ε) r
(r = 0.25)
[Figure: example pairs labeled +1, +1, −1, −1.]
A binary hash function on features:
h_T(x) = +1 if x ≥ T, −1 otherwise.
Predict the labels:
ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise.
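The labeling and scoring just described can be sketched in a few lines (an illustration with 1-D angles and a single thresholded feature; all names are mine, and the gray zone between r and (1+ε)r is left unlabeled):

```python
def pair_label(theta_i, theta_j, r, eps):
    """+1 for similar parameters, -1 for clearly dissimilar, None in the gap."""
    d = abs(theta_i - theta_j)
    if d < r:
        return +1
    if d > (1 + eps) * r:
        return -1
    return None

def h(x, feat, T):
    """Binary hash: threshold a single feature coordinate."""
    return 1 if x[feat] >= T else -1

def hash_accuracy(labeled_pairs, feat, T):
    """Fraction of labeled pairs whose label h predicts (same bucket -> +1)."""
    hits = 0
    for xi, xj, y in labeled_pairs:
        pred = 1 if h(xi, feat, T) == h(xj, feat, T) else -1
        hits += (pred == y)
    return hits / len(labeled_pairs)
```

Hash functions scoring well on the labeled pairs are the "parameter-sensitive" ones kept for the tables.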
Find the best T that predicts the true labeling, subject to the probability constraints: h_T(x) will place both examples in the same bin, or separate them.
Local Weighted Regression (LWR)
• Given a query image x₀, PSH returns its KNNs.
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:
β₀ = argmin_β Σ_{x_i ∈ N(x₀)} K(d(x_i, x₀)) (g(x_i; β) − θ_i)²
where K(·) is a distance-based weight kernel and g is the local model.
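The simplest instance of this is a constant local model (zeroth-order LWR), where the estimate reduces to a kernel-weighted average of the neighbors' angles; a minimal sketch under that assumption, with my own function name:

```python
import math

def lwr_angle(query, neighbors, bandwidth):
    """Zeroth-order LWR: Gaussian-weighted average of the KNN angles.
    neighbors: list of (feature_vector, angle) returned by PSH."""
    def d(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    w = [math.exp(-(d(query, x) / bandwidth) ** 2) for x, _ in neighbors]
    return sum(wi * ang for wi, (_, ang) in zip(w, neighbors)) / sum(w)
```

Neighbors close to the query in feature space dominate the average; far neighbors are effectively ignored.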
Results
Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints.
• 150,000 images.
• Nuisance parameters added: clothing, illumination, face expression.
• 1,775,000 example pairs.
• Selected 137 out of 5,123 meaningful features (how?): 18-bit hash functions (k), 150 hash tables (l).
• Tested on 1,000 synthetic examples; PSH searched only 3.4% of the data per query.
• Without feature selection, 40 bits and 1,000 hash tables were needed.
Recall: P1 is the probability of a positive hash, P2 the probability of a bad hash, B the max number of points in a bucket.
Results – real data
• 800 images.
• Processed by a segmentation algorithm.
• 1.3% of the data were searched.
Results – real data
Interesting mismatches
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure.
• Moving from one representation space to another.
• Training a sensitive hash function.
• KNN smart averaging.
Food for Thought
• The basic assumption may be problematic (distance metric, representations).
• The training set should be dense.
• Texture and clutter.
• In general, some features are more important than others and should be weighted.
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d centered at P = {p₁, …, p_n}, with radii r₁, …, r_n.
• Goal: preprocess the points in P so that, given a query q, we can find a point p_i whose sphere 'covers' the query, i.e., ‖q − p_i‖ ≤ r_i.
Courtesy of Mohamad Hegaze
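The covering condition defines the baseline any PLDS structure must beat, a linear scan (a minimal sketch; the function name is mine):

```python
def covering_point(centers, radii, q):
    """Linear-scan baseline: index of a point whose sphere covers q, else None."""
    for i, (p, r) in enumerate(zip(centers, radii)):
        # compare squared distance against r^2 to avoid the square root
        if sum((a - b) ** 2 for a, b in zip(p, q)) <= r * r:
            return i
    return None
```

This costs O(nd) per query; the open question is how to preprocess P so the answer comes faster in high dimensions.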
Motivation
• Clustering high-dimensional data by using local density measurements (e.g., feature space).
• Statistical curse of dimensionality: sparseness of the data.
• Computational curse of dimensionality: expensive range queries.
• LSH parameters should be adjusted for optimal performance.
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions, using LSH
• Speedups:
1. Finding optimal LSH parameters
2. Data-driven partitions into buckets
3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[Figure: a bandwidth window around a point, shifted toward the local mean.]
KNN in mean-shift
• The bandwidth should be inversely proportional to the density in the region: high density → small bandwidth; low density → large bandwidth.
• Based on the kth nearest neighbor of the point, the bandwidth is h_i = ‖x_i − x_{i,k}‖, the distance from x_i to its kth nearest neighbor.
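The adaptive bandwidth and one mean-shift update can be sketched together (a minimal illustration with a flat kernel; the paper's kernel choice and norm may differ, and the function names are mine):

```python
def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def knn_bandwidth(points, i, k):
    """Adaptive bandwidth of point i: distance to its kth nearest neighbor."""
    ds = sorted(dist(points[i], p) for j, p in enumerate(points) if j != i)
    return ds[k - 1]

def mean_shift_step(x, points, h):
    """One mean-shift iteration with a flat kernel of radius h:
    move x to the mean of the points inside its window."""
    window = [p for p in points if dist(x, p) <= h]
    return tuple(sum(p[d] for p in window) / len(window) for d in range(len(x)))
```

Iterating `mean_shift_step` until the point stops moving yields the mode; sparse regions get a large h, dense regions a small one.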
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y).
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color).
3. Apply filtering.
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search: Problem definition
- Applications
- Naïve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r-Nearest Neighbor
- The simplest solution
- Quadtree
- Quadtree – structure
- Quadtree – Query
- Quadtree – Pitfall 1
- Quadtree – Pitfall 2
- Space partition based algorithms
- Curse of dimensionality
- Curse of dimensionality: Some intuition
- Preview
- Hash function
- Recall: r-Nearest Neighbor
- Locality sensitive hashing
- Hamming Space
- L1 to Hamming Space Embedding
- Construction
- Query
- Alternative intuition: random projections
- k samplings
- Repeating
- Repeating L times
- Secondary hashing
- The above hashing is locality-sensitive
- Direct L2 solution
- Central limit theorem
- Norm Distance
- The full Hashing
- Generalization: P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection …
- … Parameters selection
- Pros & Cons
- Conclusion
- LSH – Applications
- Motivation
- Given an image x, what are the parameters θ in this image, i.e. angles of joints, orientation of the body, etc.
- Ingredients
- Example based learning
- The image features
- PSH: The basic assumption
- Insight: Manifolds
- Parameter Sensitive Hashing (PSH)
- Local Weighted Regression (LWR)
- Results
- Results – real data
- Fast pose estimation – summary
- Food for Thought
- Food for Thought: Point Location in Different Spheres (PLDS)
- Motivation
- Image segmentation algorithm
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Choosing optimal K and L
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for food…
- Summary
- Thanks
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a parameter-sensitive hash function
• KNN with smart averaging

Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter are an issue
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d, centered at P = {p1, …, pn} with radii r1, …, rn
• Goal: preprocess the points in P so that, given a query q, we can find a point pi whose sphere 'covers' the query q (i.e., ‖q − pi‖ ≤ ri)
Courtesy of Mohamad Hegaze
Motivation
• Clustering high-dimensional data by using local density measurements (e.g., in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
Roadmap: Mean-shift → LSH: optimal k, l → LSH data partition → LSH data structure
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region: high density → small bandwidth; low density → large bandwidth. Based on the kth nearest neighbor of the point, the bandwidth is h_i = ‖x_i − x_{i,k}‖, the distance from x_i to its kth neighbor x_{i,k}.
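A minimal sketch of this rule (brute-force neighbor search for illustration; in the paper this is exactly the query LSH accelerates, and the paper's distance metric may differ):

```python
import math

def adaptive_bandwidths(points, k):
    """Per-point bandwidth h_i = distance to the point's k-th nearest
    neighbor (brute force, O(n^2))."""
    hs = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        hs.append(dists[k - 1])
    return hs

# Dense points get small bandwidths; isolated points get large ones.
h = adaptive_bandwidths([[0.0], [1.0], [3.0], [7.0]], k=2)
```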
Adaptive mean-shift vs non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths: hs (spatial), hr (color)
3. Apply filtering
"Mean-Shift: A Robust Approach Toward Feature Space Analysis," D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
[Panels: original, filtered, segmented]
Filtering: each pixel is replaced by the value of the nearest mode.
[Mean-shift trajectories]
Filtering examples
[Panels: squirrel – original vs filtered; baboon – original vs filtered]
"Mean-Shift: A Robust Approach Toward Feature Space Analysis," D. Comaniciu et al., TPAMI '02
Segmentation examples
"Mean-Shift: A Robust Approach Toward Feature Space Analysis," D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k).
• For each point x we check whether x_{d_k} ≤ v_k for each of the K pairs; the K boolean outcomes partition the data into cells.
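A sketch of this partition scheme (function names are hypothetical): each of L partitions draws K (dimension, cut value) pairs, and a point's cell is the K-bit vector of inequality outcomes.

```python
import random

def build_partitions(points, K, L, seed=0):
    """L random partitions; each draws K (dimension, cut value) pairs.
    A point's cell key is the tuple of booleans x[d] <= v, so each
    partition splits the data into at most 2**K cells."""
    rng = random.Random(seed)
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]
    partitions = []
    for _ in range(L):
        dims = [rng.randrange(dim) for _ in range(K)]
        cuts = [(d, rng.uniform(lo[d], hi[d])) for d in dims]
        buckets = {}
        for i, p in enumerate(points):
            key = tuple(p[d] <= v for d, v in cuts)
            buckets.setdefault(key, []).append(i)
        partitions.append((cuts, buckets))
    return partitions

def query_union(q, partitions):
    """Indices of all points sharing a cell with q in any partition."""
    out = set()
    for cuts, buckets in partitions:
        key = tuple(q[d] <= v for d, v in cuts)
        out.update(buckets.get(key, []))
    return out

pts = [[0.0, 0.0], [0.1, 0.1], [5.0, 5.0]]
cand = query_union([0.0, 0.0], build_partitions(pts, K=4, L=5))
```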
Choosing the optimal K and L
• Goal: for a query q, compute the smallest possible number of distances to points in its buckets.
If L is too small, points might be missed; but if L is too big, extra points might be included. Large K → smaller number of points in a cell.
K determines the resolution of the data structure. As L increases, the union of cells C∪ increases but the intersection C∩ decreases.
Choosing optimal K and L
• Determine accurately the KNN distance (bandwidth) for m randomly selected data points.
• Choose an error threshold ε.
• The optimal K and L should satisfy that the approximate distance is within ε of the true KNN distance.
Choosing optimal K and L
• For each K, estimate the error for every L.
• In one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)).
[Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] and its minimum]
Data driven partitions
• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
[Plot: bucket distribution of points – uniform vs data-driven cuts]
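A sketch of the two cut-value rules (helper names are hypothetical): the uniform rule can land cuts in empty regions of skewed data, while the data-driven rule always cuts where points actually lie, giving more balanced buckets.

```python
import random

def uniform_cut(points, dim, rng):
    """Original LSH: cut value drawn uniformly over the data range."""
    vals = [p[dim] for p in points]
    return rng.uniform(min(vals), max(vals))

def data_driven_cut(points, dim, rng):
    """Variant: use a random data point's coordinate as the cut value."""
    return rng.choice(points)[dim]

rng = random.Random(0)
pts = [[0.0], [0.1], [0.2], [100.0]]   # heavily skewed data
cut = data_driven_cut(pts, 0, rng)     # always one of the coordinates
u = uniform_cut(pts, 0, rng)           # usually lands in the empty gap
```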
Additional speedup
• Assume that all points in the intersection cell C will converge to the same mode (C acts as a type of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
Low dimension vs. high dimension
A thought for food…
• Choose K, L by sample-based learning, or take the traditional values.
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning itself requires KNN.
15:30: cookies…
Summary
• LSH trades some accuracy for a large gain in complexity.
• Applications that involve massive data in high dimensions require LSH's fast performance.
• LSH extends to different spaces (PSH).
• The LSH parameters and hash functions can be learned for different applications.
Conclusion
• …but in the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test it over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree – Pitfall 1
- Slide 16
- Quadtree – Pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection …
- … Parameters selection
- Pros & Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x, what are the parameters θ in this image, i.e. angles of joints, orientation of the body, etc.?
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for food…
- Summary
- Slide 110
- Thanks
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naïve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree – Pitfall 1
- Slide 16
- Quadtree – pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection …
- … Parameters selection
- Pros & Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x, what are the parameters θ in this image, i.e. angles of joints, orientation of the body, etc.
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results – real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for food…
- Summary
- Slide 110
- Thanks
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
• The bandwidth should be inversely proportional to the density in the region: high density, small bandwidth; low density, large bandwidth.
• Base it on the kth nearest neighbor of the point: the bandwidth is the distance from the point to its kth nearest neighbor.
Adaptive mean-shift vs non-adaptive
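The adaptive-bandwidth rule above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a small point set, brute-force O(n²) Euclidean distances, and a hypothetical `knn_bandwidths` helper.

```python
import numpy as np

def knn_bandwidths(points, k):
    """Per-point bandwidth: distance from x_i to its k-th nearest neighbor.

    Dense points get small bandwidths, sparse points large ones,
    matching the adaptive mean-shift idea described above.
    """
    # Pairwise Euclidean distances (O(n^2 d); fine for a sketch).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Sort each row; index 0 is the point itself (distance 0),
    # so the k-th nearest neighbor sits at index k.
    return np.sort(dists, axis=1)[:, k]

# Tight cluster plus one outlier: the outlier gets a larger bandwidth.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
h = knn_bandwidths(pts, k=2)
```

With these toy points, the three clustered points receive a small bandwidth (0.1) while the outlier's bandwidth is on the order of its distance to the cluster.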
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths: hs (spatial), hr (color)
3. Apply filtering
(Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02)
Image segmentation algorithm
[Figure: original, filtered, and segmented images; mean-shift trajectories]
Filtering: pixel value of the nearest mode
Filtering examples
[Figures: squirrel and baboon images, original vs. filtered]
(Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02)

Segmentation examples
(Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02)
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH.
• Statistical curse of dimensionality: sparseness of the data calls for variable bandwidth.
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k).
• For each point x, check whether x_{d_k} ≤ v_k for each of the K pairs.
• This partitions the data into cells.
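A minimal sketch of this cell structure, following the stated scheme (L partitions, each defined by K random (d_k, v_k) tests). The uniform cut values, tuple-of-booleans cell keys, and helper names are assumptions for illustration, not the paper's implementation.

```python
import random
from collections import defaultdict

def build_lsh(points, K, L, rng):
    """Build L hash tables; each hashes a point by K random (dim, cut) tests."""
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]
    tables = []
    for _ in range(L):
        # One partition: K pairs (d_k, v_k) of a dimension and a cut value.
        cuts = []
        for _ in range(K):
            d = rng.randrange(dim)
            cuts.append((d, rng.uniform(lo[d], hi[d])))
        table = defaultdict(list)
        for i, p in enumerate(points):
            # Cell key: the bit vector of tests p[d_k] <= v_k.
            key = tuple(p[d] <= v for d, v in cuts)
            table[key].append(i)
        tables.append((cuts, table))
    return tables

def query(tables, q):
    """Candidate neighbors: union of q's cells across the L partitions."""
    out = set()
    for cuts, table in tables:
        key = tuple(q[d] <= v for d, v in cuts)
        out.update(table.get(key, []))
    return out

rng = random.Random(0)
pts = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (9.0, 9.0), (9.1, 8.9)]
tables = build_lsh(pts, K=4, L=8, rng=rng)
cand = query(tables, pts[0])
```

Querying with a stored point always returns at least that point, since it hashes to its own cell in every partition; nearby points tend to share cells and so appear among the candidates.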
Choosing the optimal K and L
• For a query q, distances are computed only to the points in its buckets; we want that number to be as small as possible.
• Large K gives a smaller number of points in a cell.
• If L is too small, points might be missed; but if L is too big, extra points might be included.
• As L increases, the union of cells C∪ increases but the intersection C∩ decreases; together K and L determine the resolution of the data structure.
[Equation residue lost in extraction: expressions relating the expected number of neighbors to n, d, K, and L.]
Choosing optimal K and L
• Determine accurately the KNN distance (bandwidth) for m randomly selected data points.
• Choose an error threshold ε.
• The optimal K and L should satisfy that the approximate distance is within the threshold of the true distance.
Choosing optimal K and L
• For each K, estimate the approximation error.
• In one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)).
[Figure: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)], with the minimum marked.]
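The selection logic above (for each K, find the minimal L meeting the error constraint, then minimize running time) can be sketched with synthetic stand-ins for the measured quantities. `approx_error` and `run_time` below are invented toy models, not the paper's measurements; in the real procedure they would come from running the LSH structure on m sample points with exact KNN as ground truth.

```python
# Toy stand-ins (assumptions): error falls as L grows and rises with K
# (finer cells miss neighbors); cost grows with L (more tables) and
# shrinks with K (smaller cells to scan).
def approx_error(K, L):
    return K / (10.0 * L)

def run_time(K, L):
    return L * (1.0 + 10.0 / K)

def choose_params(Ks, Ls, eps):
    """Pick (K, L) minimizing run time subject to error <= eps."""
    best = None
    for K in Ks:
        # Minimal L meeting the error constraint: L(K).
        feasible = [L for L in Ls if approx_error(K, L) <= eps]
        if not feasible:
            continue
        L = min(feasible)
        t = run_time(K, L)
        if best is None or t < best[2]:
            best = (K, L, t)
    return best

K, L, t = choose_params(range(1, 11), range(1, 51), eps=0.05)
```

Under these toy models the constraint gives L(K) = 2K, so t(K, L(K)) = 2K + 20 and the search settles on the smallest feasible K; with measured curves the trade-off would of course look different.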
Data driven partitions
• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
[Figure: bucket point distribution, uniform vs. data-driven cuts]
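The contrast between uniform and data-driven cut values can be illustrated on a skewed 1-D coordinate; the toy data and helper names here are assumptions for illustration.

```python
import random

def uniform_cut(values, rng):
    """Original LSH: cut value drawn uniformly over the data range."""
    return rng.uniform(min(values), max(values))

def data_driven_cut(values, rng):
    """Suggested variant: a randomly chosen data point's coordinate,
    so cuts concentrate where the data does."""
    return rng.choice(values)

rng = random.Random(1)
# Skewed coordinate: most mass near 0, one far outlier.
coord = [0.0, 0.1, 0.2, 0.3, 100.0]
u = [uniform_cut(coord, rng) for _ in range(1000)]
d = [data_driven_cut(coord, rng) for _ in range(1000)]
# Fraction of cuts landing in the dense region [0, 0.3].
frac_u = sum(c <= 0.3 for c in u) / 1000
frac_d = sum(c <= 0.3 for c in d) / 1000
```

On skewed data, uniform cuts almost never fall in the dense region, so most cells are empty; data-driven cuts land there most of the time, giving a more balanced bucket distribution.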
Additional speedup
• Assume that all points in C∩ will converge to the same mode (C∩ acts like a kind of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
Low dimension vs. high dimension

A thought for food…
• Choose K, L by sample learning, or take the traditional values.
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.

15:30 cookies…
Summary
• LSH trades some accuracy for a gain in complexity.
• Applications that involve massive data in high dimensions require LSH's fast performance.
• Extension of LSH to different spaces (PSH).
• Learning the LSH parameters and hash functions for different applications.

Conclusion
• But at the end, everything depends on your data set.
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni: andoni@mit.edu
– Test over your own data (C code under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull Goal: for a query q, compute distances to the smallest possible number of points in its buckets.
bull Large K rarr a smaller number of points in each cell C.
bull If L is too small, near points might be missed; but if L is too big, the buckets might include extra points.
bull As L increases, the number of retrieved near neighbors increases, but the speedup decreases; K determines the resolution of the data structure.
bull Query cost is roughly the hashing cost, proportional to dKL, plus the d-dimensional distance computations to the candidate points retrieved.
Choosing optimal K and L
bull Accurately determine the kNN distance (bandwidth) for m randomly selected data points.
bull Choose an error threshold ε.
bull The optimal K and L should satisfy the error constraint: the approximate distance is within ε of the true kNN distance.
Choosing optimal K and L
bull For each K, estimate the approximation error.
bull In one run over all L's, find the minimal L satisfying the constraint: L(K).
bull Minimize the running time t(K, L(K)).
[Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] with its minimum marked]
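The selection loop above can be sketched as follows. Here `estimate_error` and `timed_query` are illustrative stand-ins for the sampling-based measurements (error on the m sampled queries, and measured query time), not part of the original method's code.

```python
def select_K_L(estimate_error, timed_query, K_grid, L_max, eps):
    """For each K, find the minimal L = L(K) whose estimated error is
    within eps, then return the (K, L(K)) pair with the smallest
    measured query time (plus that time)."""
    best = None
    for K in K_grid:
        # minimal L meeting the error constraint, scanning L = 1..L_max
        L_K = next((L for L in range(1, L_max + 1)
                    if estimate_error(K, L) <= eps), None)
        if L_K is None:
            continue  # no L satisfies the constraint for this K
        t = timed_query(K, L_K)
        if best is None or t < best[2]:
            best = (K, L_K, t)
    return best
```

With toy monotone models (error falling in L, time growing with K·L) the loop picks the cheapest feasible pair, mirroring the minimum marked in the running-time plot.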
Data driven partitions
bull In the original LSH, cut values are chosen at random in the range of the data.
bull Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
[Histogram: bucket distribution of points, uniform vs. data-driven cuts]
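A sketch of the suggested rule (names illustrative): the cut value is a coordinate of a randomly chosen data point, so cuts concentrate where the data does instead of being uniform over its range.

```python
import random

def data_driven_cut(points, rng=None):
    """Pick a cut (dimension, value) by sampling a data point and using
    one of its coordinates, instead of a uniform value in the data range."""
    rng = rng or random.Random(0)
    d = rng.randrange(len(points[0]))  # coordinate to cut on
    p = rng.choice(points)             # data point supplying the value
    return d, p[d]
```

Dense regions then receive more cuts, which evens out the bucket occupancy shown in the histogram.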
Additional speedup
bull Assume that all points in a cell C will converge to the same mode (C is like a type of aggregate), so the iterative procedure need only be run once per cell.
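The slide's assumption — all points in a cell C converge to the same mode — suggests running mean-shift once per cell and sharing the result. A hypothetical sketch (`mean_shift_mode` stands in for a full mean-shift run to convergence, and starting from the cell centroid is an illustrative choice):

```python
def modes_by_cell(cells, mean_shift_mode):
    """Run mean-shift once per cell and share the resulting mode among
    all of the cell's points (assumes each cell converges to one mode)."""
    assignment = {}
    for key, members in cells.items():
        # one representative per cell: here, the centroid
        centroid = [sum(c) / len(members) for c in zip(*members)]
        mode = mean_shift_mode(centroid)
        for p in members:
            assignment[tuple(p)] = mode
    return assignment
```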
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
Low dimension vs. high dimension
A thought for food…
bull Choose K, L by sample learning, or take the traditional defaults.
bull Can one estimate K, L without sampling?
bull A thought for food: does it help to know the data dimensionality or the data manifold? Intuitively, the dimensionality implies the number of hash functions needed.
bull The catch: efficient dimensionality learning itself requires kNN.
15:30 cookies…
Summary
bull LSH trades some accuracy for a large gain in complexity.
bull Applications that involve massive data in high dimension require LSH's fast performance.
bull The LSH idea extends to different spaces (PSH).
bull The LSH parameters and hash functions can be learned for different applications.
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
• LSH trades some accuracy for a gain in complexity.
• Applications involving massive data in high dimensions require LSH's fast performance.
• LSH extends to different spaces (e.g., PSH).
• The LSH parameters and hash functions can be learned for different applications.
Conclusion
• ...but in the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni: andoni@mit.edu
  – Test it over your own data
    (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis