
k-Nearest Neighbors Search in High Dimensions

Tomer Peled

Dan Kushnir

Tell me who your neighbors are, and I'll know who you are

Outline

• Problem definition and flavors
• Algorithms overview – low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)

Nearest Neighbor Search: Problem definition
• Given a set P of n points in R^d, over some distance metric
• Find the nearest neighbor p of q in P

Applications
• Classification
• Clustering
• Segmentation
• Indexing
• Dimension reduction (e.g. LLE)
[Figure: query point q in a feature space with axes "color" and "weight"]

Naïve solution
• No preprocessing
• Given a query point q:
  – Go over all n points
  – Do the comparison in R^d
• Query time = O(nd)

Keep in mind
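For reference, a minimal brute-force query might look like the sketch below (illustrative Python/NumPy; function and variable names are ours, not from the slides). It makes the O(nd) cost visible: every query touches all n points in R^d.

```python
import numpy as np

def nn_query(points, q):
    """Brute-force nearest neighbor: O(n*d) per query, no preprocessing."""
    diffs = points - q                      # compare q against all n points in R^d
    dists = np.linalg.norm(diffs, axis=1)   # Euclidean distance to every point
    i = int(np.argmin(dists))
    return i, float(dists[i])

# Example usage
rng = np.random.default_rng(0)
P = rng.random((1000, 64))                  # n = 1000 points, d = 64
q = rng.random(64)
idx, dist = nn_query(P, q)
```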

Common solution
• Use a data structure for acceleration
• Scalability with n and with d is important

When to use nearest neighbor?
High-level algorithms (assuming no prior knowledge about the underlying probability structure):
• Parametric: probability distribution estimation
• Non-parametric: density estimation, or nearest neighbors
• Complex models, sparse data, high dimensions → nearest neighbors

Nearest Neighbor
The closest point to q: min_{pi ∈ P} dist(q, pi)

r - Nearest Neighbor
dist(q, p1) ≤ r
dist(q, p2) ≥ (1 + ε) r
r2 = (1 + ε) r1
[Figure: query q with inner radius r and outer radius (1 + ε) r]

Outline

• Problem definition and flavors
• Algorithms overview – low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)

The simplest solution

• Lion in the desert

Quadtree

Split the first dimension into 2

Repeat iteratively

Stop when each cell has no more than 1 data point
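A minimal 2D sketch of this split-and-recurse construction (illustrative Python; class and variable names are ours, and it assumes distinct points so the recursion terminates):

```python
class QuadtreeNode:
    """Minimal 2D quadtree: split the cell at its midpoint and recurse
    until each cell holds at most one data point (assumes distinct points)."""
    def __init__(self, points, xmin, xmax, ymin, ymax):
        self.bounds = (xmin, xmax, ymin, ymax)
        self.points = points
        self.children = []
        if len(points) > 1:
            xm, ym = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
            quads = {(False, False): (xmin, xm, ymin, ym),   # lower-left
                     (True,  False): (xm, xmax, ymin, ym),   # lower-right
                     (False, True):  (xmin, xm, ym, ymax),   # upper-left
                     (True,  True):  (xm, xmax, ym, ymax)}   # upper-right
            groups = {key: [] for key in quads}
            for (x, y) in points:
                groups[(x >= xm, y >= ym)].append((x, y))    # assign by midpoint test
            for key, pts in groups.items():
                if pts:
                    self.children.append(QuadtreeNode(pts, *quads[key]))

# Example usage
root = QuadtreeNode([(1, 1), (2, 7), (6, 3), (7, 8)], 0, 8, 0, 8)
```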

Quadtree - structure
[Figure: the plane is split at (X1, Y1) into four cells: P<X1,P<Y1; P≥X1,P<Y1; P<X1,P≥Y1; P≥X1,P≥Y1, forming a tree rooted at (X1, Y1)]

Quadtree - Query
In many cases this works.
[Figure: the query descends the tree through the cells that contain it]

Quadtree – Pitfall 1
In some cases it doesn't: the query's cell may not contain its nearest neighbor, so neighboring cells must also be checked.
[Figure: quadtree split at (X1, Y1) with the nearest neighbor falling in an adjacent cell]

Quadtree – Pitfall 1
In some cases nothing works.

Quadtree – pitfall 2
Could result in query time exponential in the dimension: O(2^d).

Space-partition based algorithms
("Multidimensional Access Methods", Volker Gaede, O. Günther)
Could be improved.

Outline

• Problem definition and flavors
• Algorithms overview – low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)

Curse of dimensionality
• Query time or space: O(n^d)
• For d > 10..20, worse than a sequential scan for most geometric distributions
• Techniques specific to high dimensions are needed
• Proved in theory and in practice by Barkol & Rabani 2000 and Beame & Vee 2002
Naive: O( min(nd, n^d) )

Curse of dimensionality – some intuition
The number of cells grows as 2, 2^2, 2^3, …, 2^d.

Outline

• Problem definition and flavors
• Algorithms overview – low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)

Preview

• General solution – Locality Sensitive Hashing
• Implementation for Hamming space
• Generalization to l1 & l2

Hash function
Data_Item → Hash function → Key → Bin/Bucket

Hash function example
h(X) = X modulo 3, where X is a number in the range 0..n.
The key (0..2) is used as a storage address into the data structure.
Usually we would like related data items to be stored in the same bin.

Recall: r - Nearest Neighbor
dist(q, p1) ≤ r
dist(q, p2) ≥ (1 + ε) r
r2 = (1 + ε) r1

Locality sensitive hashing
A family of hash functions is (r, (1 + ε)r, p1, p2)-sensitive if:
• Pr[I(p) = I(q)] is "high" (≥ p1) if p is "close" to q (dist ≤ r)
• Pr[I(p) = I(q)] is "low" (≤ p2) if p is "far" from q (dist ≥ (1 + ε)r = r2)

Preview

• General solution – Locality Sensitive Hashing
• Implementation for Hamming space
• Generalization to l1 & l2

Hamming Space

• Hamming space = the 2^N binary strings of length N
• Hamming distance = number of differing digits; a.k.a. signal distance (Richard Hamming)

Example (N = 12):
010100001111
010010000011
Distance = 4

• Hamming distance = SUM(X1 XOR X2)
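As a small illustration (not from the slides), the Hamming distance between two equal-length bit strings is just the popcount of their XOR:

```python
def hamming_distance(x1: int, x2: int) -> int:
    """Hamming distance = number of differing bits = popcount(x1 XOR x2)."""
    return bin(x1 ^ x2).count("1")

# The slide's example: 010100001111 vs 010010000011 -> distance 4
assert hamming_distance(0b010100001111, 0b010010000011) == 4
```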

L1 to Hamming Space Embedding

Example: p = (8, 2), C = 11. Each coordinate is written in unary with C bits:
8 → 11111111000, 2 → 11000000000
Concatenated: 1111111100011000000000
d' = C·d
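A sketch of this unary embedding (illustrative Python; assumes non-negative integer coordinates bounded by C): each coordinate v becomes v ones followed by C − v zeros, so Hamming distance on the codes equals L1 distance on the points.

```python
def l1_to_hamming(point, C):
    """Unary embedding: each coordinate v (0 <= v <= C) becomes v ones followed
    by C - v zeros; Hamming distance of the codes equals L1 distance of the points."""
    return "".join("1" * v + "0" * (C - v) for v in point)

# The slide's example: p = (8, 2) with C = 11
code = l1_to_hamming((8, 2), C=11)   # -> '1111111100011000000000', d' = C*d = 22 bits
```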

Hash function
p ∈ H^d'
G_j(p) = p|I_j, for j = 1..L (here k = 3 sampled digits)
Bits are sampled from p; p is stored into bucket p|I_j (one of 2^k buckets, e.g. key 101).

Construction: each point p is inserted into its bucket in each of the L tables (1, 2, …, L).
Query: q is looked up in its bucket in each of the L tables.
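A minimal sketch of the construction and query steps just described (illustrative Python; function names are ours): L tables, each keyed by k randomly sampled bit positions, with candidates from all matching buckets verified by exact Hamming distance.

```python
import random
from collections import defaultdict

def build_tables(codes, k, L, seed=0):
    """codes: equal-length bit strings (embedded points). Builds L hash tables;
    table j maps the k sampled bits G_j(p) = p|I_j to the ids stored in that bucket."""
    rng = random.Random(seed)
    d_prime = len(codes[0])
    samplings = [rng.sample(range(d_prime), k) for _ in range(L)]   # I_1..I_L
    tables = [defaultdict(list) for _ in range(L)]
    for idx, code in enumerate(codes):
        for table, I in zip(tables, samplings):
            table["".join(code[i] for i in I)].append(idx)
    return samplings, tables

def query(q_code, codes, samplings, tables):
    """Collect candidates from q's bucket in every table, then verify them
    by exact Hamming distance and return the closest candidate (or None)."""
    candidates = set()
    for table, I in zip(tables, samplings):
        candidates.update(table.get("".join(q_code[i] for i in I), []))
    return min(candidates,
               key=lambda i: sum(a != b for a, b in zip(q_code, codes[i])),
               default=None)
```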

Alternative intuition: random projections
Each G_j can be seen as a random projection of the embedded string onto k of its coordinates.
With k = 3 sampled bits, every point falls into one of 2^3 buckets (000, 100, 110, 001, 101, 111, …).
[Figure: the embedded example p = (8, 2), C = 11, hashed by three sampled bits into bucket 101]

k samplings, repeated L times.

Secondary hashing
Supports volume tuning: dataset size vs. storage volume.
The 2^k buckets are mapped by a simple secondary hash into M buckets of size B, with M·B = αn, α = 2.

The above hashing is locality-sensitive
• Probability(p, q in the same bucket) = (1 − Distance(p, q) / dimensions)^k
[Figure: collision probability vs. distance for k = 1 and k = 2]
Adopted from Piotr Indyk's slides.

Preview

• General solution – Locality Sensitive Hashing
• Implementation for Hamming space
• Generalization to l2

Direct L2 solution

• New hashing function
• Still based on sampling
• Using a mathematical trick
• P-stable distribution for Lp distance; Gaussian distribution for L2 distance

Central limit theorem
v1·X1 + v2·X2 + … + vn·Xn:  a weighted sum of Gaussians is again a (scaled) Gaussian.
v1..vn = real numbers; X1..Xn = independent, identically distributed (i.i.d.)

For i.i.d. Gaussian Xi and a single Gaussian X:
  Σi vi·Xi  ≈  ||v||2 · X        (dot product ↔ norm)
  Σi ui·Xi − Σi vi·Xi = Σi (ui − vi)·Xi  ≈  ||u − v||2 · X   (dot-product difference ↔ distance)

So projecting two feature vectors onto the same random Gaussian vector preserves their L2 distance (in distribution).

The full Hashing

h_{a,b}(v) = ⌊(a·v + b) / w⌋

• v – features vector of dimension d (e.g. [3.4, 8.2, 2.1, …])
• a – d random numbers, i.i.d. from a p-stable distribution (Gaussian for L2)
• b – random phase, uniform in [0, w]
• w – discretization step

[Figure: the value a·v + b (e.g. 79.44) falls on the real line, which is cut into bins of width w; the bin index is the hash value]
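A sketch of this hash family for L2 (illustrative Python; class and parameter names are ours): a is drawn from a Gaussian, which is 2-stable, so dot products with a behave like scaled L2 distances.

```python
import numpy as np

class PStableHash:
    """h_{a,b}(v) = floor((a.v + b) / w): a ~ N(0,1)^d (2-stable, respects L2),
    b uniform in [0, w), w = discretization step."""
    def __init__(self, d, w, rng=None):
        rng = rng or np.random.default_rng()
        self.a = rng.normal(size=d)       # d i.i.d. Gaussian numbers
        self.b = rng.uniform(0.0, w)      # random phase
        self.w = w                        # bin width

    def __call__(self, v):
        return int(np.floor((np.dot(self.a, v) + self.b) / self.w))

# Nearby vectors tend to land in the same bin
h = PStableHash(d=3, w=4.0, rng=np.random.default_rng(1))
print(h([3.4, 8.2, 2.1]), h([3.5, 8.1, 2.0]))
```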

Generalization: P-Stable distribution
• Lp, 0 < p ≤ 2 → Generalized Central Limit Theorem → p-stable distribution (e.g. Cauchy for L1)
• L2 → Central Limit Theorem → Gaussian (normal) distribution

P-Stable summary
• Works for L2; generalizes to 0 < p ≤ 2
• Improves query time for r - Nearest Neighbor:
  Query time = O(d·n^{1/(1+ε)}·log n)  →  O(d·n^{1/(1+ε)^2}·log n)
Latest results reported by email by Alexander Andoni.

Parameters selection (for Euclidean space)
• 90% success probability
• Best query time performance

Parameters selection… (for Euclidean space)
• A single projection hits an r-nearest neighbor with Pr = p1
• k projections hit it with Pr = p1^k
• L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure collision (e.g. with probability 1 − δ ≥ 90%):
  1 − (1 − p1^k)^L ≥ 1 − δ   ⇒   L ≥ log(δ) / log(1 − p1^k)
[Figure: accept neighbors, reject non-neighbors]

…Parameters selection
[Figure: running time vs. k for candidate extraction and candidate verification]
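The collision bound above translates directly into a rule for choosing L; a small sketch (illustrative, assuming p1 and k are known for the chosen hash family):

```python
import math

def tables_needed(p1, k, delta=0.1):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta,
    i.e. L >= log(delta) / log(1 - p1**k)."""
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

# e.g. per-projection collision probability p1 = 0.9, k = 10 bits, 90% success
print(tables_needed(0.9, 10, delta=0.1))   # -> 6
```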

Pros & Cons (from Piotr Indyk's slides)

Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data size (sub-linear dependence)
• Predictable running time

Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance

Conclusion
• …but at the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code, under Red Hat Linux)

LSH - Applications
• Searching video clips in databases ("Hierarchical, Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short, whenever k-Nearest Neighbors (KNN) are needed

Motivation

• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash function construction and parameter tuning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola and T. Darrell
• Finding sensitive hash functions

Mean Shift Based Clustering in High Dimensions: A Texture Classification Example, B. Georgescu, I. Shimshoni and P. Meer
• Tuning LSH parameters
• LSH data structure is used for algorithm speedups

The Problem (Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola and T. Darrell)
Given an image x, what are the parameters θ in this image, i.e. the angles of the joints, the orientation of the body, etc.?

Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: d_x
• Distance metric in angle space:
  d_θ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))

Example based learning

• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query

Input query → Find KNN in the database of examples → Output: average angles of the KNN

The algorithm flow
Input query → Features extraction → Processed query → PSH (LSH) against the database of examples → LWR (regression) → Output match

The image features
Image features are multi-scale edge histograms.
(Pipeline: Feature Extraction → PSH → LWR)

PSH: The basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ).
We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.

Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space.
• But the global structure may be complicated: curved.
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: query q mapped between the parameter space (angles) and the feature space]

Is this magic?

Parameter Sensitive Hashing (PSH)
The trick:
• Estimate the performance of different hash functions on examples and select those sensitive to d_θ.
• The hash functions are applied in feature space, but the KNN are valid in angle space.
• Label pairs of examples with similar angles.
• Define hash functions h on the feature space.

• Predict the labeling of similar/non-similar examples by using h.
• Compare the labelings.
• If the labeling by h is good, accept h; else change h.

PSH as a classification problem
Labels: +1, +1, −1, −1 (r = 0.25)
A pair of examples (x_i, x_j) is labeled:
  y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
  y_ij = −1 if d_θ(θ_i, θ_j) ≥ (1 + ε) r

A binary hash function on features:
  h_T(x) = +1 if x ≥ T, −1 otherwise

Predict the labels:
  ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise

Find the best T that predicts the true labeling with the probability constraints.
h_T(x) will either place both examples in the same bin or separate them.
[Figure: feature histogram with the threshold T]

Local Weighted Regression (LWR)
• Given a query image x0, PSH returns its KNNs.
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:
  β0 = argmin_β Σ_{xi ∈ N(x0)} d_θ(g(xi, β), θi) · K(d_x(xi, x0))
  where K is a weight kernel over feature-space distance and N(x0) are the retrieved neighbors.

Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (L)
• Tested on 1,000 synthetic examples
• PSH searched only 34 of the data per query
• Without selection, 40 bits and 1,000 hash tables would have been needed
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, B is the max number of points in a bucket.

Results – real data
• 800 images
• Processed by a segmentation algorithm
• 13 of the data were searched

Interesting mismatches.

Fast pose estimation - summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging

Food for Thought

• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

• Given n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn
• Goal: given a query q, preprocess the points in P so as to find a point pi whose sphere 'covers' the query q
[Figure: query q covered by the sphere of radius ri around pi]
Courtesy of Mohamad Hegaze.

Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example, B. Georgescu, I. Shimshoni and P. Meer

Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance

Outline

• Mean-shift in a nutshell + examples

Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure

Mean-Shift in a Nutshell
[Figure: a point and the mean-shift window of a given bandwidth]

KNN in mean-shift
• The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth.
• It is based on the kth nearest neighbor of the point: the bandwidth is h_i = ||x_i − x_{i,k}||, the distance from x_i to its kth nearest neighbor x_{i,k}.
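A sketch of this adaptive-bandwidth rule (illustrative Python using brute-force neighbor search; the paper computes these neighbors approximately with LSH):

```python
import numpy as np

def adaptive_bandwidths(X, k):
    """Per-point bandwidth h_i = distance from x_i to its k-th nearest neighbor:
    small in dense regions, large in sparse regions."""
    h = np.empty(len(X))
    for i, x in enumerate(X):
        d = np.linalg.norm(X - x, axis=1)   # distances to all points (brute force)
        h[i] = np.sort(d)[k]                # index 0 is the point itself
    return h
```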

Adaptive mean-shift vs. non-adaptive
[Figure: comparison of adaptive and non-adaptive mean-shift results]

Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths: hs (spatial), hr (color)
3. Apply filtering

[Figure: original, filtered, and segmented images]
Filtering: pixel value of the nearest mode.
("Mean-shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02)

Mean-shift trajectories

Filtering examples: original squirrel → filtered; original baboon → filtered
Segmentation examples
("Mean-shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02)

Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth

LSH-based data structure
• Choose L random partitions. Each partition includes K pairs (d_k, v_k).
• For each point x we check whether x_{d_k} ≤ v_k for each of the K pairs.
• This partitions the data into cells.
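A minimal sketch of this partition scheme (illustrative Python; names are ours): each of the L partitions is a list of K (coordinate, cut value) pairs, and a point's cell is the K-bit pattern of threshold tests.

```python
import random

def make_partitions(d, K, L, low=0.0, high=1.0, seed=0):
    """L random partitions; each is K pairs (d_k, v_k): a coordinate index
    and a cut value drawn uniformly from the data range [low, high]."""
    rng = random.Random(seed)
    return [[(rng.randrange(d), rng.uniform(low, high)) for _ in range(K)]
            for _ in range(L)]

def cell_key(x, partition):
    """The cell of point x in one partition: the K boolean tests x[d_k] <= v_k."""
    return tuple(x[dk] <= vk for dk, vk in partition)
```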

Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets.
• Large K → a smaller number of points in a cell.
• If L is too small, points might be missed; but if L is too big, extra points might be included.
• As L increases, the union cell C∪ increases but the intersection cell C∩ decreases; C∩ determines the resolution of the data structure.

Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points; take the distance to the kth neighbor (the bandwidth).
• Choose an error threshold ε.
• The optimal K and L should satisfy that the approximate distance (returned by the LSH structure) stays within the error threshold of the true distance.

Choosing optimal K and L
• For each K, estimate the error.
• In one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)).
[Figure: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] and its minimum]

Data-driven partitions
• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
[Figure: points-per-bucket distribution for uniform vs. data-driven cuts]
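The data-driven variant changes only how cut values are chosen; a sketch under the same assumptions as the partition code above (illustrative):

```python
import random

def make_data_driven_partitions(data, K, L, seed=0):
    """Like make_partitions, but every cut value is a coordinate of a randomly
    chosen data point, so bucket occupancy follows the data distribution."""
    rng = random.Random(seed)
    d = len(data[0])
    partitions = []
    for _ in range(L):
        pairs = []
        for _ in range(K):
            dk = rng.randrange(d)
            p = data[rng.randrange(len(data))]   # pick a random data point
            pairs.append((dk, p[dk]))            # use its dk-th coordinate as the cut
        partitions.append(pairs)
    return partitions
```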

Additional speedup
• Assume that all points in C∩ will converge to the same mode (C∩ acts like a type of aggregate).

Speedup results
[Table: 65,536 points; 1,638 points sampled; k = 100]

Food for thought
[Figure: low dimension vs. high dimension]

A thought for food…
• Choose K, L by sample learning, or take the traditional values.
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.
15:30 cookies…

Summary

• LSH trades a little accuracy for a gain in complexity.
• Applications that involve massive data in high dimensions require LSH's fast performance.
• Extension of LSH to different spaces (PSH).
• Learning the LSH parameters and hash functions for different applications.

Conclusion

• …but at the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code, under Red Hat Linux)

Thanks

• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis

Page 2: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Outline

bullProblem definition and flavorsProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

bull Given a set P of n points in Rd

Over some metric

bull find the nearest neighbor p of q in P

Nearest Neighbor SearchProblem definition

Distance metric

QQ

Applications

bullClassification bullClustering

bullSegmentation

q

bullIndexingbullDimension reduction

(eg lle)

color

Weight

Naiumlve solution

bullNo preprocess

bullGiven a query point qndashGo over all n pointsndashDo comparison in Rd

bullquery time = O(nd)

Keep in mind

Common solution

bullUse a data structure for acceleration

bullScale-ability with n amp with d is important

When to use nearest neighbor

High level algorithms

Assuming no prior knowledge about the underlying probability structure

complex models Sparse data High dimensions

Parametric Non-parametric

Density estimation

Probability distribution estimation

Nearest neighbors

Nearest Neighbor

min pi P dist(qpi)

Closestqq

r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

The simplest solution

bullLion in the desert

Quadtree

Split the first dimension into 2

Repeat iteratively

Stop when each cell has no more than 1 data point

Quadtree - structure

X

Y

X1Y1 PgeX1PgeY1

PltX1PltY1

PgeX1PltY1

PltX1PgeY1

X1Y1

Quadtree - Query

X

Y

In many cases works

X1Y1PltX1PltY1 PltX1

PgeY1

X1Y1

PgeX1PgeY1

PgeX1PltY1

Quadtree ndash Pitfall1

X

Y

In some cases doesnrsquot

X1Y1PgeX1PgeY1

PltX1

PltX1PltY1 PgeX1

PltY1PltX1PgeY1

X1Y1

Quadtree ndash Pitfall1

X

Y

In some cases nothing works

Quadtree ndash pitfall 2X

Y

O(2d)

Could result in Query time Exponential in dimensions

Space partition based algorithms

Multidimensional access methods Volker Gaede O Gunther

Could be improved

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Curse of dimensionality

bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan

ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed

bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002

O( min(nd nd) )Naive

Curse of dimensionalitySome intuition

2

22

23

2d

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse

Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hash function

Hash function

Hash function

Data_Item

Key

BinBucket

Hash function

X modulo 3

X=Number in the range 0n

02

Storage Address

Data structure

0

Usually we would like related Data-items to be stored at the same bin

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 3: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

bull Given a set P of n points in Rd

Over some metric

bull find the nearest neighbor p of q in P

Nearest Neighbor SearchProblem definition

Distance metric

QQ

Applications

bullClassification bullClustering

bullSegmentation

q

bullIndexingbullDimension reduction

(eg lle)

color

Weight

Naiumlve solution

bullNo preprocess

bullGiven a query point qndashGo over all n pointsndashDo comparison in Rd

bullquery time = O(nd)

Keep in mind

Common solution

bullUse a data structure for acceleration

bullScale-ability with n amp with d is important

When to use nearest neighbor

High level algorithms

Assuming no prior knowledge about the underlying probability structure

complex models Sparse data High dimensions

Parametric Non-parametric

Density estimation

Probability distribution estimation

Nearest neighbors

Nearest Neighbor

min pi P dist(qpi)

Closestqq

r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

The simplest solution

bullLion in the desert

Quadtree

Split the first dimension into 2

Repeat iteratively

Stop when each cell has no more than 1 data point

Quadtree - structure

X

Y

X1Y1 PgeX1PgeY1

PltX1PltY1

PgeX1PltY1

PltX1PgeY1

X1Y1

Quadtree - Query

X

Y

In many cases works

X1Y1PltX1PltY1 PltX1

PgeY1

X1Y1

PgeX1PgeY1

PgeX1PltY1

Quadtree ndash Pitfall1

X

Y

In some cases doesnrsquot

X1Y1PgeX1PgeY1

PltX1

PltX1PltY1 PgeX1

PltY1PltX1PgeY1

X1Y1

Quadtree ndash Pitfall1

X

Y

In some cases nothing works

Quadtree ndash pitfall 2X

Y

O(2d)

Could result in Query time Exponential in dimensions

Space partition based algorithms

Multidimensional access methods Volker Gaede O Gunther

Could be improved

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Curse of dimensionality

bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan

ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed

bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002

O( min(nd nd) )Naive

Curse of dimensionalitySome intuition

2

22

23

2d

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse

Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hash function

Hash function

Hash function

Data_Item

Key

BinBucket

Hash function

X modulo 3

X=Number in the range 0n

02

Storage Address

Data structure

0

Usually we would like related Data-items to be stored at the same bin

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

[Figure: threshold T placed on a feature axis]

Find the best T that predicts the true labeling subject to the probability constraints.

h_T will place both examples in the same bin, or separate them.
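A minimal sketch of how one candidate hash function h_T (here a threshold T on a single feature coordinate phi, which is an assumption about its exact form) can be scored against the pair labels defined above:

```python
import numpy as np

def pair_agreement(X, pairs, y, phi, T):
    """X: (n, d) features; pairs: list of (i, j); y: +/-1 labels from angle space."""
    h = np.where(X[:, phi] >= T, 1, -1)                        # h_T applied in feature space
    y_hat = np.array([1 if h[i] == h[j] else -1 for i, j in pairs])
    return float(np.mean(y_hat == np.asarray(y)))              # fraction of pairs labeled correctly

# One would sweep (phi, T), keep the functions whose collision rates on similar /
# dissimilar pairs meet the required probability constraints, and discard the rest.
```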

Local Weighted Regression (LWR)

• Given a query image, PSH returns KNNs

• LWR uses the KNN to compute a weighted average of the estimated angles of the query:

  β̂ = argmin_β Σ_{x_i ∈ N(x_0)} d_θ(g(x_i; β), θ_i) · K(d_x(x_i, x_0)),   θ̂ = g(x_0; β̂)

  where K(·) is a distance-based weight and N(x_0) are the neighbors returned by PSH.
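A zeroth-order version of this step, assuming a simple Gaussian weight K and a caller-supplied feature distance dx (both assumptions; the paper fits a low-order regression model g):

```python
import numpy as np

def lwr_pose(x_query, nn_features, nn_angles, dx, bandwidth=1.0):
    """Weighted average of the neighbors' known angles, weighted by feature-space distance."""
    d = np.array([dx(x_query, xi) for xi in nn_features])
    w = np.exp(-(d / bandwidth) ** 2)            # kernel K(d_x(x_i, x_0))
    w /= w.sum()
    return (w[:, None] * np.asarray(nn_angles)).sum(axis=0)
```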

Results

Synthetic data were generated:

• 13 angles: 1 for rotation of the torso, 12 for joints

• 150,000 images

• Nuisance parameters added: clothing, illumination, face expression

• 1,775,000 example pairs

• Selected 137 out of 5,123 meaningful features (how?)

• 18-bit hash functions (k), 150 hash tables (l)

• Test on 1,000 synthetic examples
• PSH searched only 34 of the data per query

• Without selection, 40 bits and 1,000 hash tables were needed

Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, B is the max number of points in a bucket.

Results – real data

• 800 images

• Processed by a segmentation algorithm

• 1/3 of the data were searched

Results – real data

Interesting mismatches

Fast pose estimation - summary

• Fast way to compute the angles of a human body figure

• Moving from one representation space to another

• Training a sensitive hash function

• KNN smart averaging

Food for Thought

• The basic assumption may be problematic (distance metric, representations)

• The training set should be dense

• Texture and clutter

• In general, some features are more important than others and should be weighted

Food for Thought: Point Location in Different Spheres (PLDS)

• Given n spheres in R^d centered at P = {p1, …, pn} with radii r1, …, rn

• Goal: given a query q, preprocess the points in P to find a point pi whose sphere 'covers' the query q

[Figure: query q inside the sphere of radius ri around pi]

Courtesy of Mohamad Hegaze

Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer

Motivation

• Clustering high-dimensional data by using local density measurements (e.g., in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance

Outline

• Mean-shift in a nutshell + examples

Our scope:

• Mean-shift in high dimensions – using LSH

• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure

Mean-Shift in a Nutshell

[Figure: a data point and its bandwidth window; the mean-shift vector moves toward the local mean]
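A minimal sketch of the mean-shift iteration itself, using a flat kernel of fixed radius h (the adaptive, per-point bandwidth version follows):

```python
import numpy as np

def mean_shift_mode(x, X, h, max_iter=100, tol=1e-5):
    """Follow the mean-shift trajectory from x until it converges to a mode."""
    y = np.asarray(x, dtype=float).copy()
    for _ in range(max_iter):
        inside = np.linalg.norm(X - y, axis=1) <= h    # points inside the bandwidth window
        if not inside.any():
            break
        y_new = X[inside].mean(axis=0)                 # shift to the local mean
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y
```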

KNN in mean-shift

The bandwidth should be inversely proportional to the density in the region:
high density – small bandwidth; low density – large bandwidth.

It is based on the kth nearest neighbor of the point:
the bandwidth is h_i = ||x_i − x_{i,k}||, the distance from x_i to its kth nearest neighbor x_{i,k}.
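A brute-force sketch of this per-point bandwidth; the O(n²) distance computation here is exactly the step that LSH is brought in to replace at scale:

```python
import numpy as np

def adaptive_bandwidths(X, k=100):
    """h_i = distance from x_i to its k-th nearest neighbor (brute force)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # all pairwise distances
    d.sort(axis=1)                                              # column 0 is the point itself
    return d[:, k]
```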

Adaptive mean-shift vs non-adaptive


Image segmentation algorithm

1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths hs (spatial) and hr (color)
3. Apply filtering

Mean-shift: A Robust Approach Towards Feature Space Analysis, D. Comaniciu et al., TPAMI '02

Image segmentation algorithm

[Figure: original, filtered, and segmented images; mean-shift trajectories]

Filtering: pixel value of the nearest mode


Filtering examples
[Figures: squirrel and baboon images, original vs. filtered]

Segmentation examples

Mean-shift: A Robust Approach Towards Feature Space Analysis, D. Comaniciu et al., TPAMI '02

Mean-shift in high dimensions

• Computational curse of dimensionality: expensive range queries → implemented with LSH

• Statistical curse of dimensionality: sparseness of the data → variable bandwidth


LSH-based data structure

• Choose L random partitions. Each partition includes K pairs (d_k, v_k)

• For each point x we check whether x_{d_k} ≤ v_k for each of the K pairs

• This partitions the data into cells
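A sketch of the cell label implied by this structure: in each of the L partitions, the K boolean tests x_{d_k} ≤ v_k form a K-bit signature (the bit-packing is an implementation choice, not taken from the paper):

```python
import numpy as np

def cell_signatures(X, partitions):
    """partitions: list of L arrays of shape (K, 2) holding (dimension, cut value) pairs."""
    sigs = []
    for cuts in partitions:
        dims = cuts[:, 0].astype(int)
        vals = cuts[:, 1]
        bits = (X[:, dims] <= vals).astype(np.int64)              # n x K boolean tests
        sigs.append(bits @ (1 << np.arange(bits.shape[1])))       # pack each row into one cell id
    return np.stack(sigs, axis=1)                                 # n x L cell labels per point
```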


Choosing the optimal K and L

• For a query q, distances are computed only to the points in its buckets; we want that number to be as small as possible

• Large K – smaller number of points in a cell C
• If L is too small, points might be missed; but if L is too big, extra points might be included

(The slide quantifies this with the expected number of points falling in a single cell C and in the union of the L cells ∪C_l.)


As L increases, the union ∪C_l increases but the intersection C decreases;
K and L determine the resolution of the data structure.

Choosing optimal K and L

• Determine accurately the KNN for m randomly-selected data points → the true distance (bandwidth)
• Choose an error threshold ε
• The optimal K and L should keep the approximate (LSH-based) distance within the error threshold of the true one


Choosing optimal K and L

• For each K, estimate the error for every L
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))

[Figures: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] with its minimum]
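A sketch of this search loop; eval_error and eval_time stand in for the sampling-based estimates described above (they are placeholders, not functions from the paper):

```python
def choose_K_L(K_candidates, L_max, eps, eval_error, eval_time):
    """Pick the (K, L) with minimal running time among the pairs meeting the error threshold."""
    best = None
    for K in K_candidates:
        L = next((L for L in range(1, L_max + 1) if eval_error(K, L) <= eps), None)
        if L is None:
            continue                      # no L satisfies the constraint for this K
        t = eval_time(K, L)
        if best is None or t < best[2]:
            best = (K, L, t)
    return best                           # (K, L, time) or None
```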


Data-driven partitions

• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value

[Figure: points-per-bucket distribution, uniform cuts vs. data-driven cuts]
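A small sketch contrasting the two ways of drawing a cut value (illustrative only; rng is a numpy Generator, e.g. np.random.default_rng(0)):

```python
import numpy as np

def draw_cut(X, rng):
    """Return a cut dimension plus a uniform-random and a data-driven cut value."""
    dim = int(rng.integers(X.shape[1]))
    uniform_cut = rng.uniform(X[:, dim].min(), X[:, dim].max())   # original LSH: uniform over the data range
    data_cut = X[int(rng.integers(X.shape[0])), dim]              # suggestion: a random data point's coordinate
    return dim, uniform_cut, data_cut
```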


Additional speedup

Assume that all points in C will converge to the same mode (C is like a type of an aggregate).
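A sketch of this speedup under the stated assumption: run the full mean-shift trajectory once per cell and copy the resulting mode to every point of that cell (find_mode could be the mean_shift_mode sketch above):

```python
import numpy as np

def modes_per_cell(X, cell_ids, find_mode):
    """Assign to every point the mode computed from one representative of its cell."""
    cell_ids = np.asarray(cell_ids)
    cell_mode = {c: find_mode(X[np.flatnonzero(cell_ids == c)[0]])   # one trajectory per cell C
                 for c in np.unique(cell_ids)}
    return np.stack([cell_mode[c] for c in cell_ids])                # per-point modes
```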

Speedup results

65,536 points; 1,638 points sampled; k = 100

Food for thought

[Figure: low dimension vs. high dimension]

A thought for food…

• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN

15:30 – cookies…

Summary

• LSH suggests a compromise: trade some accuracy for a gain in complexity

• Applications that involve massive data in high dimensions require the fast performance of LSH

• Extension of LSH to different spaces (PSH)

• Learning the LSH parameters and hash functions for different applications

Conclusion

• …but at the end, everything depends on your data set

• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data

  (C code, under Red Hat Linux)

Thanks

• Ilan Shimshoni (Haifa)

• Mohamad Hegaze (Weizmann)

• Alex Andoni (MIT)

• Mica and Denis

Page 4: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Applications

bullClassification bullClustering

bullSegmentation

q

bullIndexingbullDimension reduction

(eg lle)

color

Weight

Naiumlve solution

bullNo preprocess

bullGiven a query point qndashGo over all n pointsndashDo comparison in Rd

bullquery time = O(nd)

Keep in mind

Common solution

bullUse a data structure for acceleration

bullScale-ability with n amp with d is important

When to use nearest neighbor

High level algorithms

Assuming no prior knowledge about the underlying probability structure

complex models Sparse data High dimensions

Parametric Non-parametric

Density estimation

Probability distribution estimation

Nearest neighbors

Nearest Neighbor

min pi P dist(qpi)

Closestqq

r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

The simplest solution

bullLion in the desert

Quadtree

Split the first dimension into 2

Repeat iteratively

Stop when each cell has no more than 1 data point

Quadtree - structure

X

Y

X1Y1 PgeX1PgeY1

PltX1PltY1

PgeX1PltY1

PltX1PgeY1

X1Y1

Quadtree - Query

X

Y

In many cases works

X1Y1PltX1PltY1 PltX1

PgeY1

X1Y1

PgeX1PgeY1

PgeX1PltY1

Quadtree ndash Pitfall1

X

Y

In some cases doesnrsquot

X1Y1PgeX1PgeY1

PltX1

PltX1PltY1 PgeX1

PltY1PltX1PgeY1

X1Y1

Quadtree ndash Pitfall1

X

Y

In some cases nothing works

Quadtree ndash pitfall 2X

Y

O(2d)

Could result in Query time Exponential in dimensions

Space partition based algorithms

Multidimensional access methods Volker Gaede O Gunther

Could be improved

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Curse of dimensionality

bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan

ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed

bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002

O( min(nd nd) )Naive

Curse of dimensionalitySome intuition

2

22

23

2d

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse

Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hash function

Hash function

Hash function

Data_Item

Key

BinBucket

Hash function

X modulo 3

X=Number in the range 0n

02

Storage Address

Data structure

0

Usually we would like related Data-items to be stored at the same bin

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 5: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Naiumlve solution

bullNo preprocess

bullGiven a query point qndashGo over all n pointsndashDo comparison in Rd

bullquery time = O(nd)

Keep in mind

Common solution

bullUse a data structure for acceleration

bullScale-ability with n amp with d is important

When to use nearest neighbor

High level algorithms

Assuming no prior knowledge about the underlying probability structure

complex models Sparse data High dimensions

Parametric Non-parametric

Density estimation

Probability distribution estimation

Nearest neighbors

Nearest Neighbor

min pi P dist(qpi)

Closestqq

r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

The simplest solution

bullLion in the desert

Quadtree

Split the first dimension into 2

Repeat iteratively

Stop when each cell has no more than 1 data point

Quadtree - structure

X

Y

X1Y1 PgeX1PgeY1

PltX1PltY1

PgeX1PltY1

PltX1PgeY1

X1Y1

Quadtree - Query

X

Y

In many cases works

X1Y1PltX1PltY1 PltX1

PgeY1

X1Y1

PgeX1PgeY1

PgeX1PltY1

Quadtree ndash Pitfall1

X

Y

In some cases doesnrsquot

X1Y1PgeX1PgeY1

PltX1

PltX1PltY1 PgeX1

PltY1PltX1PgeY1

X1Y1

Quadtree ndash Pitfall1

X

Y

In some cases nothing works

Quadtree ndash pitfall 2X

Y

O(2d)

Could result in Query time Exponential in dimensions

Space partition based algorithms

Multidimensional access methods Volker Gaede O Gunther

Could be improved

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Curse of dimensionality

bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan

ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed

bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002

O( min(nd nd) )Naive

Curse of dimensionalitySome intuition

2

22

23

2d

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse

Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hash function

Hash function

Hash function

Data_Item

Key

BinBucket

Hash function

X modulo 3

X=Number in the range 0n

02

Storage Address

Data structure

0

Usually we would like related Data-items to be stored at the same bin

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 6: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Common solution

bullUse a data structure for acceleration

bullScale-ability with n amp with d is important

When to use nearest neighbor

High level algorithms

Assuming no prior knowledge about the underlying probability structure

complex models Sparse data High dimensions

Parametric Non-parametric

Density estimation

Probability distribution estimation

Nearest neighbors

Nearest Neighbor

min pi P dist(qpi)

Closestqq

r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

The simplest solution

bullLion in the desert

Quadtree

Split the first dimension into 2

Repeat iteratively

Stop when each cell has no more than 1 data point

Quadtree - structure

X

Y

X1Y1 PgeX1PgeY1

PltX1PltY1

PgeX1PltY1

PltX1PgeY1

X1Y1

Quadtree - Query

X

Y

In many cases works

X1Y1PltX1PltY1 PltX1

PgeY1

X1Y1

PgeX1PgeY1

PgeX1PltY1

Quadtree ndash Pitfall1

X

Y

In some cases doesnrsquot

X1Y1PgeX1PgeY1

PltX1

PltX1PltY1 PgeX1

PltY1PltX1PgeY1

X1Y1

Quadtree ndash Pitfall1

X

Y

In some cases nothing works

Quadtree ndash pitfall 2X

Y

O(2d)

Could result in Query time Exponential in dimensions

Space partition based algorithms

Multidimensional access methods Volker Gaede O Gunther

Could be improved

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Curse of dimensionality

bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan

ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed

bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002

O( min(nd nd) )Naive

Curse of dimensionalitySome intuition

2

22

23

2d

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse

Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hash function

Hash function

Hash function

Data_Item

Key

BinBucket

Hash function

X modulo 3

X=Number in the range 0n

02

Storage Address

Data structure

0

Usually we would like related Data-items to be stored at the same bin

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

[Figure: the unary-embedded point p = (8, 2), C = 11, viewed on the d' = C·d Hamming axes.]

Sampling k = 3 of the d' bits acts like taking 3 random axis-parallel cuts; together they map each point into one of 2³ buckets (000, 100, 110, 001, 101, 111, …), and p falls into bucket 101.

k samplings, repeated L times.

Secondary hashing

Supports volume tuning: dataset size vs. storage volume.

The 2^k buckets of each table are mapped by a standard secondary hash into M buckets of size B, with M·B = α·n, α = 2 (simple hashing).

Skip

The above hashing is locality-sensitive

• Probability(p, q in the same bucket) = (1 − Distance(q, p) / dimensions)^k

[Figure: Pr vs. Distance(q, pi) for k = 1 and k = 2; larger k makes the collision probability fall off faster with distance.]

Adopted from Piotr Indyk's slides
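A tiny sketch of the collision probability quoted above (names are mine): one sampled bit agrees with probability 1 − dist/dims, so k sampled bits all agree with probability (1 − dist/dims)^k, and the curve sharpens as k grows.

def p_collision(dist, dims, k):
    return (1.0 - dist / dims) ** k

for k in (1, 2, 8):
    print(k, [round(p_collision(d, 100, k), 3) for d in (5, 20, 50)])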

Preview

• General solution – Locality sensitive hashing
• Implementation for Hamming space
• Generalization to l2

Direct L2 solution

• New hashing function
• Still based on sampling
• Uses a mathematical trick
• P-stable distribution for Lp distance; Gaussian distribution for L2 distance

Central limit theorem

v1·N(0,1) + v2·N(0,1) + … + vn·N(0,1) = (a sum of weighted Gaussians) = a weighted Gaussian

Central limit theorem

v1, …, vn = real numbers
X1, …, Xn = independent, identically distributed (i.i.d.)
v1·X1 + v2·X2 + … + vn·Xn = ?

Central limit theorem

Σ_i v_i·X_i ≈ ||v||_2 · X,  where X ~ N(0,1)

(a dot product on the left; the norm of v times a single Gaussian on the right)

Norm Distance

Σ_i u_i·X_i − Σ_i v_i·X_i = Σ_i (u_i − v_i)·X_i ≈ ||u − v||_2 · X

(u and v are features vectors 1 and 2; the difference of their dot products with X behaves like their L2 distance times a single Gaussian)

The full Hashing

h_{a,b}(v) = ⌊ (a·v + b) / w ⌋

a = vector of d random numbers, v = features vector of dimension d (e.g., [34, 82, 21, …]), b = random phase in [0, w], w = discretization step

The full Hashing

Example: with discretization step w = 100 and phase b = 34, the projection a·v + b = 7944 falls into the bucket [7900, 8000).

[Figure: the projection axis discretized into steps of w = 100: 7800, 7900, 8000, 8100, 8200.]


The full Hashing

h_{a,b}(v) = ⌊ (a·v + b) / w ⌋

a = d-dimensional vector with entries drawn i.i.d. from a p-stable distribution, v = features vector, b = random phase in [0, w], w = discretization step
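A minimal sketch of this hash family, h_{a,b}(v) = ⌊(a·v + b)/w⌋, with a drawn i.i.d. from a Gaussian (the 2-stable distribution) and b uniform in [0, w); parameter names follow the slides, the class name and the rest are assumptions.

import numpy as np

class PStableHash:
    def __init__(self, dim, w, seed=0):
        rng = np.random.default_rng(seed)
        self.a = rng.standard_normal(dim)   # 2-stable: Gaussian entries
        self.b = rng.uniform(0.0, w)        # random phase in [0, w)
        self.w = w                          # discretization step

    def __call__(self, v):
        return int(np.floor((self.a @ v + self.b) / self.w))

h = PStableHash(dim=3, w=100.0)
print(h(np.array([34.0, 82.0, 21.0])))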

Generalization P-Stable distribution

• L2: Central Limit Theorem → Gaussian (normal) distribution
• Lp, 0 < p ≤ 2: Generalized Central Limit Theorem → p-stable distribution (e.g., Cauchy for L1)

P-Stable summary

• Works for the r-Nearest Neighbor problem
• Generalizes to 0 < p ≤ 2
• Improves query time: O(d·n^{1/(1+ε)}·log n) → O(d·n^{1/(1+ε)²}·log n)
  (latest results, reported in e-mail by Alexander Andoni)

Parameters selection

• 90% probability of success, best query time performance

For Euclidean space

Parameters selection…

For Euclidean space:
• A single projection hits an ε-Nearest Neighbor with Pr = p1
• k projections hit an ε-Nearest Neighbor with Pr = p1^k
• L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure a collision (e.g., with 1 − δ ≥ 90%):
  1 − (1 − p1^k)^L ≥ 1 − δ  ⟹  L ≥ log(δ) / log(1 − p1^k)
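A short sketch of this rule (function name assumed): given the single-projection collision probability p1, the number of sampled bits k, and the allowed failure rate δ, compute the smallest L with 1 − (1 − p1^k)^L ≥ 1 − δ.

import math

def tables_needed(p1, k, delta):
    # smallest integer L satisfying the collision guarantee above
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

print(tables_needed(p1=0.9, k=18, delta=0.1))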

Accept neighbors, reject non-neighbors.

…Parameters selection

[Figure: running time vs. k. Candidate-verification time shrinks as k grows while candidate-extraction time grows, so the total query time has a minimum at some k.]

Pros & Cons (from Piotr Indyk's slides)

Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time

Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance

Conclusion

• …but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – E-mail Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code, under Red Hat Linux)

LSH - Applications

• Searching video clips in databases ("Hierarchical, Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed

Motivation

• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash function construction and parameter tuning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing – G. Shakhnarovich, P. Viola, and T. Darrell
• Finding sensitive hash functions

Mean Shift Based Clustering in High Dimensions: A Texture Classification Example – B. Georgescu, I. Shimshoni, and P. Meer
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups

The Problem (Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola, and T. Darrell)

Given an image x, what are the parameters θ in this image, i.e., angles of joints, orientation of the body, etc.?

Ingredients

• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: d_x
• Distance metric in angle space:
  d_θ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))

Example based learning

• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query

Input query → find KNN in the database of examples → output: average angles of the KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

Image features are multi-scale edge histograms.

[Figure: edge-direction histograms computed over sub-windows of the image at several scales.]

PSH: the basic assumption

There are two metric spaces here: feature space (d_x) and parameter space (d_θ).

We want similarity to be measured in the angle space, whereas LSH works on the feature space.

• Assumption: the feature space is closely related to the parameter space.

Insight: manifolds

• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.

[Figure: the pose examples trace out corresponding manifolds in parameter space (angles) and in feature space; the query q lands near both.]

Is this magic?

Parameter Sensitive Hashing (PSH)

The trick: estimate the performance of different hash functions on examples, and select those that are sensitive to d_θ.

The hash functions are applied in feature space, but the KNN are valid in angle space.

• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h

PSH as a classification problem

[Figure: example pairs labeled +1 (similar pose) and −1 (dissimilar), with r = 0.25]

A pair of examples (x_i, θ_i), (x_j, θ_j) is labeled
  y_ij = +1  if d_θ(θ_i, θ_j) ≤ r
  y_ij = −1  if d_θ(θ_i, θ_j) ≥ (1 + ε)·r

A binary hash function on features:
  h_T(x) = +1 if the selected feature of x is ≥ T, −1 otherwise

Predict the labels:
  ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise

Find the best threshold T that predicts the true labeling, subject to the probability constraints: h_T(x) will either place both examples of a pair in the same bin or separate them.

Local Weighted Regression (LWR)

• Given a query image, PSH returns its KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query; schematically,

  θ̂ = argmin_θ Σ_{x_i ∈ N(x)} K(d_x(x_i, x)) · d_θ(θ_i, θ)   (kernel weight × angle distance)
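A simplified stand-in for this step (a kernel-weighted average rather than a full local regression, and it averages angles linearly, ignoring wrap-around; all names are assumptions): neighbors returned by PSH are weighted by a kernel on their feature-space distance to the query.

import numpy as np

def weighted_angle_estimate(query_feat, neigh_feats, neigh_angles, bandwidth):
    d = np.linalg.norm(neigh_feats - query_feat, axis=1)
    w = np.exp(-(d / bandwidth) ** 2)            # kernel weight K(d_x(x_i, x))
    return (w[:, None] * neigh_angles).sum(0) / w.sum()

feats = np.random.default_rng(0).normal(size=(5, 10))
angles = np.random.default_rng(1).uniform(0, np.pi, size=(5, 13))
print(weighted_angle_estimate(feats[0], feats, angles, bandwidth=2.0))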

Results

Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k = 18), 150 hash tables (L = 150)
• Test on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without feature selection, 40 bits and 1,000 hash tables were needed

Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, B is the max number of points in a bucket.

Results – real data

• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched

Results – real data

Interesting mismatches

Fast pose estimation - summary

• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging

Food for Thought

• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted

Food for Thought: Point Location in Different Spheres (PLDS)

• Given n spheres in Rd, centered at P = {p1, …, pn} with radii r1, …, rn
• Goal: given a query q, preprocess the points in P to find a point pi whose sphere covers the query q

Courtesy of Mohamad Hegaze

Motivation
• Clustering high-dimensional data by using local density measurements (e.g., in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance

Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example – B. Georgescu, I. Shimshoni, and P. Meer

Outline

• Mean-shift in a nutshell + examples

Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure

Mean-Shift in a Nutshell

[Figure: a window of radius = bandwidth around a point is repeatedly shifted to the local mean until it converges on a mode.]

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region:
high density → small bandwidth; low density → large bandwidth.

Based on the kth nearest neighbor of the point: the bandwidth is the distance from the point to its kth nearest neighbor.

Adaptive mean-shift vs. non-adaptive
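A hedged sketch of this adaptive bandwidth rule (brute-force distances, names assumed): each point's bandwidth is set by its distance to its k-th nearest neighbor, so dense regions get small bandwidths and sparse regions large ones.

import numpy as np

def knn_bandwidths(X, k):
    # pairwise squared distances; column 0 of each sorted row is the point itself
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    d2.sort(axis=1)
    return np.sqrt(d2[:, k])        # distance to the k-th nearest neighbor

X = np.random.default_rng(0).normal(size=(200, 5))
h = knn_bandwidths(X, k=10)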

Image segmentation algorithm:
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths hs (spatial) and hr (color)
3. Apply filtering

[Figure: the 3D (gray + x,y) feature space of an image.]

"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02

Image segmentation algorithm

[Figure: original, filtered, and segmented image; mean-shift trajectories in feature space.]

Filtering: pixel value of the nearest mode.

Filtering examples: original vs. filtered (squirrel, baboon).

"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02

Segmentation examples

"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02

Mean-shift in high dimensions

• Computational curse of dimensionality: expensive range queries, implemented with LSH
• Statistical curse of dimensionality: sparseness of the data, handled with variable bandwidth

LSH-based data structure

• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point x we check whether x_{d_k} ≤ v_k; the K boolean results form the point's key in that partition
• This partitions the data into cells
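A small sketch of this cell test (all names are assumptions): one partition holds K pairs (d_k, v_k), and a point's key in that partition is the tuple of boolean tests x[d_k] <= v_k.

import numpy as np

def cell_key(x, pairs):
    return tuple(x[d] <= v for d, v in pairs)   # K boolean coordinates

rng = np.random.default_rng(1)
X = rng.uniform(size=(1000, 5))
pairs = [(rng.integers(5), rng.uniform()) for _ in range(10)]   # one partition, K = 10
keys = {cell_key(x, pairs) for x in X}
print(len(keys), "occupied cells out of", 2 ** 10)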

Choosing the optimal K and L

• For a query q, compute the smallest number of distances to points in its buckets.

• Large K → smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, extra points might be included
• As L increases, the union of the L cells containing q grows while their intersection shrinks; the intersection determines the resolution of the data structure

Choosing optimal K and L:
• Determine accurately the KNN for m randomly-selected data points; record their distances (bandwidths)
• Choose an error threshold ε
• The optimal K and L should satisfy: the approximate (LSH) distance is within (1 + ε) of the true distance

• For each K, estimate the error as a function of L
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))

[Figure: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] with its minimum marked.]

Data driven partitions

• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value

[Figure: bucket-occupancy distribution for uniform vs. data-driven cuts; data-driven cuts follow the point density and give more balanced buckets.]
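A sketch of this data-driven variant (names assumed): instead of drawing a uniform cut in the range of the data, pick a random data point and use one of its coordinates as the cut value.

import numpy as np

def data_driven_pairs(X, K, rng):
    idx = rng.integers(len(X), size=K)          # random data points
    dims = rng.integers(X.shape[1], size=K)     # random coordinates
    return [(int(d), float(X[i, d])) for i, d in zip(idx, dims)]

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 5))
print(data_driven_pairs(X, K=4, rng=rng))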

Additional speedup

Assume that all points in the intersection cell (which acts like a type of aggregate) will converge to the same mode.

Speedup results

65,536 points; 1,638 points sampled; k = 100

Food for thought

Low dimension vs. high dimension

A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN

15:30 – cookies…

Summary

• LSH suggests a compromise on accuracy for a gain in complexity
• Applications that involve massive data in high dimension require LSH's fast performance
• Extension of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications

Conclusion

• …but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – E-mail Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code, under Red Hat Linux)

Thanks

• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 7: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

When to use nearest neighbor

High level algorithms

Assuming no prior knowledge about the underlying probability structure

complex models Sparse data High dimensions

Parametric Non-parametric

Density estimation

Probability distribution estimation

Nearest neighbors

Nearest Neighbor

min pi P dist(qpi)

Closestqq

r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

The simplest solution

bullLion in the desert

Quadtree

Split the first dimension into 2

Repeat iteratively

Stop when each cell has no more than 1 data point

Quadtree - structure

X

Y

X1Y1 PgeX1PgeY1

PltX1PltY1

PgeX1PltY1

PltX1PgeY1

X1Y1

Quadtree - Query

X

Y

In many cases works

X1Y1PltX1PltY1 PltX1

PgeY1

X1Y1

PgeX1PgeY1

PgeX1PltY1

Quadtree ndash Pitfall1

X

Y

In some cases doesnrsquot

X1Y1PgeX1PgeY1

PltX1

PltX1PltY1 PgeX1

PltY1PltX1PgeY1

X1Y1

Quadtree ndash Pitfall1

X

Y

In some cases nothing works

Quadtree ndash pitfall 2X

Y

O(2d)

Could result in Query time Exponential in dimensions

Space partition based algorithms

Multidimensional access methods Volker Gaede O Gunther

Could be improved

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Curse of dimensionality

bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan

ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed

bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002

O( min(nd nd) )Naive

Curse of dimensionalitySome intuition

2

22

23

2d

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse

Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hash function

Hash function

Hash function

Data_Item

Key

BinBucket

Hash function

X modulo 3

X=Number in the range 0n

02

Storage Address

Data structure

0

Usually we would like related Data-items to be stored at the same bin

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 8: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Nearest Neighbor

min pi P dist(qpi)

Closestqq

r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

The simplest solution

bullLion in the desert

Quadtree

Split the first dimension into 2

Repeat iteratively

Stop when each cell has no more than 1 data point

Quadtree - structure

X

Y

X1Y1 PgeX1PgeY1

PltX1PltY1

PgeX1PltY1

PltX1PgeY1

X1Y1

Quadtree - Query

X

Y

In many cases works

X1Y1PltX1PltY1 PltX1

PgeY1

X1Y1

PgeX1PgeY1

PgeX1PltY1

Quadtree ndash Pitfall1

X

Y

In some cases doesnrsquot

X1Y1PgeX1PgeY1

PltX1

PltX1PltY1 PgeX1

PltY1PltX1PgeY1

X1Y1

Quadtree ndash Pitfall1

X

Y

In some cases nothing works

Quadtree ndash pitfall 2X

Y

O(2d)

Could result in Query time Exponential in dimensions

Space partition based algorithms

Multidimensional access methods Volker Gaede O Gunther

Could be improved

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Curse of dimensionality

bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan

ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed

bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002

O( min(nd nd) )Naive

Curse of dimensionalitySome intuition

2

22

23

2d

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse

Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hash function

Hash function

Hash function

Data_Item

Key

BinBucket

Hash function

X modulo 3

X=Number in the range 0n

02

Storage Address

Data structure

0

Usually we would like related Data-items to be stored at the same bin

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

Page 9: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

The simplest solution

bullLion in the desert

Quadtree

Split the first dimension into 2

Repeat iteratively

Stop when each cell has no more than 1 data point

Quadtree - structure

X

Y

X1Y1 PgeX1PgeY1

PltX1PltY1

PgeX1PltY1

PltX1PgeY1

X1Y1

Quadtree - Query

X

Y

In many cases works

X1Y1PltX1PltY1 PltX1

PgeY1

X1Y1

PgeX1PgeY1

PgeX1PltY1

Quadtree ndash Pitfall1

X

Y

In some cases doesnrsquot

X1Y1PgeX1PgeY1

PltX1

PltX1PltY1 PgeX1

PltY1PltX1PgeY1

X1Y1

Quadtree ndash Pitfall1

X

Y

In some cases nothing works

Quadtree ndash pitfall 2X

Y

O(2d)

Could result in Query time Exponential in dimensions

Space partition based algorithms

Multidimensional access methods Volker Gaede O Gunther

Could be improved

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Curse of dimensionality

bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan

ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed

bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002

O( min(nd nd) )Naive

Curse of dimensionalitySome intuition

2

22

23

2d

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse

Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hash function

Hash function

Hash function

Data_Item

Key

BinBucket

Hash function

X modulo 3

X=Number in the range 0n

02

Storage Address

Data structure

0

Usually we would like related Data-items to be stored at the same bin

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 10: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

The simplest solution

bullLion in the desert

Quadtree

Split the first dimension into 2

Repeat iteratively

Stop when each cell has no more than 1 data point

Quadtree - structure

X

Y

X1Y1 PgeX1PgeY1

PltX1PltY1

PgeX1PltY1

PltX1PgeY1

X1Y1

Quadtree - Query

X

Y

In many cases works

X1Y1PltX1PltY1 PltX1

PgeY1

X1Y1

PgeX1PgeY1

PgeX1PltY1

Quadtree ndash Pitfall1

X

Y

In some cases doesnrsquot

X1Y1PgeX1PgeY1

PltX1

PltX1PltY1 PgeX1

PltY1PltX1PgeY1

X1Y1

Quadtree ndash Pitfall1

X

Y

In some cases nothing works

Quadtree ndash pitfall 2X

Y

O(2d)

Could result in Query time Exponential in dimensions

Space partition based algorithms

Multidimensional access methods Volker Gaede O Gunther

Could be improved

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Curse of dimensionality

bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan

ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed

bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002

O( min(nd nd) )Naive

Curse of dimensionalitySome intuition

2

22

23

2d

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse

Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hash function

Hash function

Hash function

Data_Item

Key

BinBucket

Hash function

X modulo 3

X=Number in the range 0n

02

Storage Address

Data structure

0

Usually we would like related Data-items to be stored at the same bin

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 11: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

The simplest solution

bullLion in the desert

Quadtree

Split the first dimension into 2

Repeat iteratively

Stop when each cell has no more than 1 data point

Quadtree - structure

X

Y

X1Y1 PgeX1PgeY1

PltX1PltY1

PgeX1PltY1

PltX1PgeY1

X1Y1

Quadtree - Query

X

Y

In many cases works

X1Y1PltX1PltY1 PltX1

PgeY1

X1Y1

PgeX1PgeY1

PgeX1PltY1

Quadtree ndash Pitfall1

X

Y

In some cases doesnrsquot

X1Y1PgeX1PgeY1

PltX1

PltX1PltY1 PgeX1

PltY1PltX1PgeY1

X1Y1

Quadtree ndash Pitfall1

X

Y

In some cases nothing works

Quadtree ndash pitfall 2X

Y

O(2d)

Could result in Query time Exponential in dimensions

Space partition based algorithms

Multidimensional access methods Volker Gaede O Gunther

Could be improved

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Curse of dimensionality

bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan

ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed

bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002

O( min(nd nd) )Naive

Curse of dimensionalitySome intuition

2

22

23

2d

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse

Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hash function

Hash function

Hash function

Data_Item

Key

BinBucket

Hash function

X modulo 3

X=Number in the range 0n

02

Storage Address

Data structure

0

Usually we would like related Data-items to be stored at the same bin

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivation

• Clustering high dimensional data by using local density measurements (e.g. in feature space)

• Statistical curse of dimensionality: sparseness of the data

• Computational curse of dimensionality: expensive range queries

• LSH parameters should be adjusted for optimal performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

• Mean-shift in a nutshell + examples

Our scope:

• Mean-shift in high dimensions – using LSH

• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure

Mean-Shift in a Nutshell

[Illustration: the mean-shift window (bandwidth) around a point, shifted toward the local mean]

KNN in mean-shift

The bandwidth should be inversely proportional to the density in the region:

high density - small bandwidth; low density - large bandwidth.

It is based on the kth nearest neighbor of the point: the bandwidth of x_i is the distance to its kth nearest neighbor, h_i = ||x_i − x_{i,k}|| (a short sketch follows).
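As a rough illustration, here is a minimal adaptive mean-shift sketch in Python: the per-point bandwidth is the distance to the k-th nearest neighbor, and a flat (uniform) kernel is assumed inside each point's window. The brute-force neighbor computation is only for clarity; replacing it is exactly what the LSH machinery below is for.

```python
import numpy as np

def adaptive_bandwidths(X, k):
    """h_i = distance from x_i to its k-th nearest neighbor (brute force, for clarity)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return np.sort(D, axis=1)[:, k]

def mean_shift_mode(x, X, h, n_iter=50, tol=1e-5):
    """Follow the adaptive mean-shift trajectory of a single starting point x.
    Flat kernel: the next location is the mean of all data points whose own
    bandwidth window (h_i) contains the current location."""
    y = x.copy()
    for _ in range(n_iter):
        d = np.linalg.norm(X - y, axis=1)
        inside = d <= h                      # adaptive: each point uses its own h_i
        if not inside.any():
            break
        y_new = X[inside].mean(axis=0)
        if np.linalg.norm(y_new - y) < tol:
            break
        y = y_new
    return y
```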

Adaptive mean-shift vs non-adaptive


Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths hs (spatial) and hr (color)
3. Apply filtering

[3D feature-space illustration]

Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02

Image segmentation algorithm

[Figures: original, filtered, and segmented images; mean-shift trajectories]

Filtering: each pixel takes the value of the nearest mode.


Filtering examples

[Figures: squirrel and baboon images, original vs. filtered]

Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02

Segmentation examples

Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02

Mean-shift in high dimensions

Computational curse of dimensionality: expensive range queries, implemented with LSH.

Statistical curse of dimensionality: sparseness of the data, handled with a variable bandwidth.


LSH-based data structure

• Choose L random partitions. Each partition includes K pairs (d_k, v_k).

• For each point x we check, for each of the K pairs, whether x_{d_k} ≤ v_k; the K boolean results determine the point's cell.

• This partitions the data into cells (a sketch follows).
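A minimal sketch of this structure in Python. Representing a cell by the tuple of K boolean test results, and taking the union of buckets over the L partitions at query time, follow the description above; the class and method names are assumptions for illustration.

```python
import numpy as np
from collections import defaultdict

class LSHPartitions:
    """L random partitions; each is defined by K (coordinate, cut value) pairs.
    A point's cell within a partition is the K-bit vector of tests x[d_k] <= v_k."""

    def __init__(self, X, K, L, rng=None):
        rng = np.random.default_rng(rng)
        n, d = X.shape
        self.X = X
        self.cuts = []          # per partition: (dims, vals)
        self.tables = []        # per partition: cell key -> list of point indices
        lo, hi = X.min(axis=0), X.max(axis=0)
        for _ in range(L):
            dims = rng.integers(0, d, size=K)
            vals = rng.uniform(lo[dims], hi[dims])   # random cuts in the data range
            table = defaultdict(list)
            for i, key in enumerate(self._keys(X, dims, vals)):
                table[key].append(i)
            self.cuts.append((dims, vals))
            self.tables.append(table)

    @staticmethod
    def _keys(X, dims, vals):
        bits = (X[:, dims] <= vals).astype(np.uint8)  # K boolean tests per point
        return [b.tobytes() for b in bits]

    def query(self, q):
        """Indices of all points sharing a cell with q in at least one partition."""
        candidates = set()
        for (dims, vals), table in zip(self.cuts, self.tables):
            key = ((q[dims] <= vals).astype(np.uint8)).tobytes()
            candidates.update(table.get(key, []))
        return candidates
```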


Choosing the optimal K and L

• For a query q, we want the smallest possible number of distance computations to the points in its buckets.

• Large K → a smaller number of points in a cell C (the expected cell size N̄_C decreases as K grows).

• If L is too small, points might be missed; but if L is too big, the union of cells C∪ might include extra points (roughly L · N̄_C of them).

• As L increases, C∪ increases but the number of missed points decreases; C∪ determines the resolution of the data structure.

Choosing optimal K and L

• Determine accurately the KNN for m randomly-selected data points, and record the exact k-th-neighbor distance (the bandwidth) of each.

• Choose an error threshold ε.

• The optimal K and L should satisfy: the approximate distance returned by the LSH structure stays within the error threshold of the exact one.

Choosing optimal K and L

• For each K, estimate the error for the candidate values of L.

• In one run over all L's, find the minimal L satisfying the constraint: L(K).

• Minimize the running time t(K, L(K)) over K (a schematic sketch follows).

[Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)], with the minimum marked]
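A schematic version of this tuning loop in Python, reusing the LSHPartitions sketch from above. The error measure (mean relative difference between the approximate and exact k-th-neighbor distance over m sample points), the candidate grids for K and L, and the timing-based final choice are written out under the assumptions stated here and are not the paper's exact procedure.

```python
import time
import numpy as np

def knn_dist(X, q, k):
    """Exact distance to the k-th nearest neighbor (brute force)."""
    return np.sort(np.linalg.norm(X - q, axis=1))[k]

def tune_K_L(X, k, K_grid, L_grid, m=50, eps=0.05, rng=None):
    """Pick (K, L): smallest L meeting the error constraint for each K, then fastest overall."""
    rng = np.random.default_rng(rng)
    sample = X[rng.choice(len(X), size=m, replace=False)]
    exact = np.array([knn_dist(X, q, k) for q in sample])
    best = None
    for K in K_grid:
        for L in L_grid:                      # ascending order: first L that passes is L(K)
            lsh = LSHPartitions(X, K, L)      # structure from the earlier sketch
            t0 = time.perf_counter()
            approx = []
            for q in sample:
                idx = list(lsh.query(q))
                d = np.sort(np.linalg.norm(X[idx] - q, axis=1))
                approx.append(d[k] if len(d) > k else np.inf)
            t = time.perf_counter() - t0
            err = np.mean(np.abs(np.array(approx) - exact) / exact)
            if err <= eps:                    # constraint satisfied for this K
                if best is None or t < best[0]:
                    best = (t, K, L)
                break
    return best                               # (time, K, L)
```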


Data driven partitions

• In the original LSH, cut values are random in the range of the data.

• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value (see the small helpers below).

[Histograms: points-per-bucket distribution for uniform vs. data-driven cuts]
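The change relative to the earlier sketch is a single line; shown here as two small helpers for comparison (a sketch, assuming the same construction as in the LSHPartitions class above).

```python
import numpy as np

def uniform_cut(X, dim, rng):
    """Original LSH: cut value drawn uniformly over the data range of the coordinate."""
    return rng.uniform(X[:, dim].min(), X[:, dim].max())

def data_driven_cut(X, dim, rng):
    """Data-driven variant: use the coordinate of a randomly selected data point,
    so dense regions get proportionally more cuts and buckets stay balanced."""
    return X[rng.integers(len(X)), dim]
```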


Additional speedup

Assume that all points in C will converge to the same mode (C is like a type of aggregate), so the mean-shift trajectory needs to be run only once per cell rather than once per point.

Speedup results

65,536 points; 1,638 points sampled; k = 100

Food for thought

Low dimension High dimension

A thought for food…

• Choose K, L by sample learning, or take the traditional values.

• Can one estimate K, L without sampling?

• A thought for food: does it help to know the data dimensionality or the data manifold?

• Intuitively, dimensionality implies the number of hash functions needed.

• The catch: efficient dimensionality learning requires KNN.

15:30 cookies…

Summary

• LSH trades some accuracy for a gain in complexity.

• Applications that involve massive data in high dimensions require LSH's fast performance.

• Extension of LSH to different spaces (PSH).

• Learning the LSH parameters and hash functions for different applications.

Conclusion

• But at the end, everything depends on your data set.

• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni: andoni@mit.edu
  – Test over your own data (C code, under Red Hat Linux)

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 12: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Quadtree

Split the first dimension into 2

Repeat iteratively

Stop when each cell has no more than 1 data point

Quadtree - structure

X

Y

X1Y1 PgeX1PgeY1

PltX1PltY1

PgeX1PltY1

PltX1PgeY1

X1Y1

Quadtree - Query

X

Y

In many cases works

X1Y1PltX1PltY1 PltX1

PgeY1

X1Y1

PgeX1PgeY1

PgeX1PltY1

Quadtree ndash Pitfall1

X

Y

In some cases doesnrsquot

X1Y1PgeX1PgeY1

PltX1

PltX1PltY1 PgeX1

PltY1PltX1PgeY1

X1Y1

Quadtree ndash Pitfall1

X

Y

In some cases nothing works

Quadtree ndash pitfall 2X

Y

O(2d)

Could result in Query time Exponential in dimensions

Space partition based algorithms

Multidimensional access methods Volker Gaede O Gunther

Could be improved

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Curse of dimensionality

bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan

ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed

bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002

O( min(nd nd) )Naive

Curse of dimensionalitySome intuition

2

22

23

2d

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse

Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hash function

Hash function

Hash function

Data_Item

Key

BinBucket

Hash function

X modulo 3

X=Number in the range 0n

02

Storage Address

Data structure

0

Usually we would like related Data-items to be stored at the same bin

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 13: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Quadtree - structure

X

Y

X1Y1 PgeX1PgeY1

PltX1PltY1

PgeX1PltY1

PltX1PgeY1

X1Y1

Quadtree - Query

X

Y

In many cases works

X1Y1PltX1PltY1 PltX1

PgeY1

X1Y1

PgeX1PgeY1

PgeX1PltY1

Quadtree ndash Pitfall1

X

Y

In some cases doesnrsquot

X1Y1PgeX1PgeY1

PltX1

PltX1PltY1 PgeX1

PltY1PltX1PgeY1

X1Y1

Quadtree ndash Pitfall1

X

Y

In some cases nothing works

Quadtree ndash pitfall 2X

Y

O(2d)

Could result in Query time Exponential in dimensions

Space partition based algorithms

Multidimensional access methods Volker Gaede O Gunther

Could be improved

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Curse of dimensionality

bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan

ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed

bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002

O( min(nd nd) )Naive

Curse of dimensionalitySome intuition

2

22

23

2d

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse

Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hash function

Hash function

Hash function

Data_Item

Key

BinBucket

Hash function

X modulo 3

X=Number in the range 0n

02

Storage Address

Data structure

0

Usually we would like related Data-items to be stored at the same bin

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 14: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Quadtree - Query

X

Y

In many cases works

X1Y1PltX1PltY1 PltX1

PgeY1

X1Y1

PgeX1PgeY1

PgeX1PltY1

Quadtree ndash Pitfall1

X

Y

In some cases doesnrsquot

X1Y1PgeX1PgeY1

PltX1

PltX1PltY1 PgeX1

PltY1PltX1PgeY1

X1Y1

Quadtree ndash Pitfall1

X

Y

In some cases nothing works

Quadtree ndash pitfall 2X

Y

O(2d)

Could result in Query time Exponential in dimensions

Space partition based algorithms

Multidimensional access methods Volker Gaede O Gunther

Could be improved

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Curse of dimensionality

bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan

ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed

bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002

O( min(nd nd) )Naive

Curse of dimensionalitySome intuition

2

22

23

2d

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse

Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hash function

Hash function

Hash function

Data_Item

Key

BinBucket

Hash function

X modulo 3

X=Number in the range 0n

02

Storage Address

Data structure

0

Usually we would like related Data-items to be stored at the same bin

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

• Input: query image with unknown angles (parameters)

• Database of human poses with known angles

• Image feature extractor – edge detector

• Distance metric in feature space: d_x

• Distance metric in angles space (a small sketch follows):

  d_θ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))
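A minimal sketch of this angle-space metric (Python/NumPy; purely illustrative, angles assumed to be in radians):

```python
import numpy as np

def angle_distance(theta1, theta2):
    """d_theta(theta1, theta2) = sum_i (1 - cos(theta1_i - theta2_i))."""
    theta1, theta2 = np.asarray(theta1), np.asarray(theta2)
    return float(np.sum(1.0 - np.cos(theta1 - theta2)))

print(angle_distance([0.0, np.pi / 2], [0.1, np.pi / 2]))  # small for similar poses
```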

Example based learning

• Construct a database of example images with their known angles.

• Given a query image, run your favorite feature extractor.

• Compute the KNN from the database.

• Use these KNNs to compute the average angles of the query.

Input: query → find the KNN in the database of examples → output: average angles of the KNN.

The algorithm flow (a toy version is sketched below):

Input query → feature extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output match.
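A toy, brute-force version of this flow (plain KNN instead of PSH and a plain average instead of LWR; all names and array shapes here are made up for illustration):

```python
import numpy as np

def estimate_pose(query_feat, db_feats, db_angles, k=5):
    """Average the angles of the k nearest database examples in feature space."""
    dists = np.linalg.norm(db_feats - query_feat, axis=1)   # d_x: Euclidean on features
    knn = np.argsort(dists)[:k]
    return db_angles[knn].mean(axis=0)

rng = np.random.default_rng(1)
db_feats = rng.random((1000, 16))    # 1000 example images, 16-D features
db_angles = rng.random((1000, 13))   # 13 joint angles per example
print(estimate_pose(rng.random(16), db_feats, db_angles))
```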

The image features

Image features are multi-scale edge histograms.

(Figure: example images A, B and their multi-scale edge histograms.)

PSH: the basic assumption

There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ).

We want similarity to be measured in the angles space, whereas LSH works on the feature space.

• Assumption: the feature space is closely related to the parameter space.

Insight: Manifolds

• A manifold is a space in which every point has a neighborhood resembling a Euclidean space.

• But the global structure may be complicated, curved.

• For example, lines are 1D manifolds, planes are 2D manifolds, etc.

(Figure: the parameters space (angles) and the feature space, with a query q mapped between them.)

Is this Magic?

Parameter Sensitive Hashing (PSH)

The trick:

Estimate the performance of different hash functions on examples, and select those that are sensitive to d_θ.

The hash functions are applied in feature space, but the KNN are valid in angle space.

• Label pairs of examples with similar angles.

• Define hash functions h on the feature space.

• Predict the labeling of similar/non-similar examples by using h.

• Compare the predicted labeling with the true one.

• If the labeling by h is good, accept h; else change h.

PSH as a classification problem

Labels (e.g., r = 0.25): a pair of examples (x_i, x_j) is labeled

  y_ij = +1  if  d_θ(θ_i, θ_j) < r

  y_ij = −1  if  d_θ(θ_i, θ_j) > (1 + ε)·r

(Figure: example image pairs labeled +1, +1, −1, −1.)

A binary hash function on the features:

  h_T(x) = +1  if the selected feature of x is above the threshold T, and −1 otherwise.

Predict the labels:

  ŷ_h(x_i, x_j) = +1  if  h_T(x_i) = h_T(x_j),  and −1 otherwise.

Find the best T that predicts the true labeling, subject to the probability constraints: h_T will place both examples in the same bin, or separate them. (A toy sketch of this selection step follows.)
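The selection step can be pictured with a small sketch (my own simplification: candidate hashes are thresholds on single features, scored by how often they agree with the pair labels; all data below is synthetic):

```python
import numpy as np

def hash_accuracy(feats, pairs, labels, feature_idx, threshold):
    """Score h_T(x) = +1 if x[feature_idx] > threshold, -1 otherwise, on labeled pairs."""
    correct = 0
    for (i, j), y in zip(pairs, labels):
        same_bucket = (feats[i, feature_idx] > threshold) == (feats[j, feature_idx] > threshold)
        y_hat = 1 if same_bucket else -1
        correct += int(y_hat == y)
    return correct / len(labels)

rng = np.random.default_rng(2)
feats = rng.random((20, 5))                  # 20 examples, 5 features
pairs = [(0, 1), (2, 3), (4, 5), (6, 7)]     # labeled pairs of examples
labels = [1, -1, 1, -1]                      # +1 = similar angles, -1 = dissimilar
print(hash_accuracy(feats, pairs, labels, feature_idx=0, threshold=0.5))
```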

Local Weighted Regression (LWR)

• Given a query image, PSH returns its KNNs.

• LWR uses the KNN to compute a weighted average of the estimated angles of the query: each neighbor x_i is weighted by a kernel of its feature-space distance to the query,

  θ̂(x) = argmin_θ Σ_{x_i ∈ N(x)} d_θ(θ, θ_i) · K(d_x(x, x_i)).

(A simple stand-in for this step is sketched below.)
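A simple stand-in for this step (a kernel-weighted mean of the neighbors' angles rather than the full robust regression; names, shapes, and the Gaussian kernel are my own illustrative choices):

```python
import numpy as np

def lwr_angles(query_feat, knn_feats, knn_angles, bandwidth=1.0):
    """Weight each neighbor's angles by a kernel of its feature-space distance to the query."""
    d = np.linalg.norm(knn_feats - query_feat, axis=1)
    w = np.exp(-(d / bandwidth) ** 2)        # K(d_x): Gaussian kernel
    w /= w.sum()
    return (w[:, None] * knn_angles).sum(axis=0)

rng = np.random.default_rng(3)
print(lwr_angles(rng.random(16), rng.random((5, 16)), rng.random((5, 13))))
```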

Results

Synthetic data were generated:

• 13 angles: 1 for rotation of the torso, 12 for joints

• 150,000 images

• Nuisance parameters added: clothing, illumination, face expression

• 1,775,000 example pairs

• Selected 137 out of 5,123 meaningful features (how?)

• 18-bit hash functions (k), 150 hash tables (L)

• Test on 1,000 synthetic examples; PSH searched only 3.4% of the data per query

• Without feature selection, 40 bits and 1,000 hash tables were needed

Recall: p1 is the probability of a positive hash, p2 is the probability of a bad hash, and B is the maximum number of points in a bucket.

Results – real data

• 800 images

• Processed by a segmentation algorithm

• 1.3% of the data were searched

Results – real data: interesting mismatches (example images).

Fast pose estimation – summary

• A fast way to compute the angles of a human body figure

• Moving from one representation space to another

• Training a sensitive hash function

• Smart averaging of the KNN

Food for Thought

• The basic assumption may be problematic (distance metric, representations).

• The training set should be dense.

• Texture and clutter.

• In general, some features are more important than others and should be weighted.

Food for Thought: Point Location in Different Spheres (PLDS)

• Given: n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn.

• Goal: given a query q, preprocess the points in P so as to find a point pi whose sphere 'covers' the query q (i.e., ||q − pi|| ≤ ri).

Courtesy of Mohamad Hegaze.

Motivation

• Clustering high-dimensional data by using local density measurements (e.g., in feature space).

• Statistical curse of dimensionality: sparseness of the data.

• Computational curse of dimensionality: expensive range queries.

• LSH parameters should be adjusted for optimal performance.

Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni and P. Meer)

Outline

• Mean-shift in a nutshell + examples

Our scope:

• Mean-shift in high dimensions – using LSH

• Speedups:

1. Finding optimal LSH parameters

2. Data-driven partitions into buckets

3. Additional speedup by using the LSH data structure

Mean-Shift in a Nutshell

(Figure: the mean-shift window of a given bandwidth around a point.)

KNN in mean-shift

The bandwidth should be inversely proportional to the density in the region:

high density – small bandwidth; low density – large bandwidth.

Based on the kth nearest neighbor of the point, the bandwidth is the distance from the point to its kth nearest neighbor, h_i = ||x_i − x_{i,k}||. (A toy sketch follows.)
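For illustration, a toy adaptive mean-shift iteration with a flat kernel and per-point bandwidths set to the distance to the k-th nearest neighbor (a sketch of the idea, not the paper's implementation; the data and k below are made up):

```python
import numpy as np

def adaptive_mean_shift(x, data, bandwidths, n_iter=30):
    """Iterate x towards a mode; each data point x_i contributes through its own bandwidth h_i."""
    for _ in range(n_iter):
        d = np.linalg.norm(data - x, axis=1)
        w = (d <= bandwidths).astype(float)            # flat kernel, per-point bandwidth
        if w.sum() == 0:
            break
        x = (w[:, None] * data).sum(axis=0) / w.sum()
    return x

rng = np.random.default_rng(4)
data = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
pairwise = np.linalg.norm(data[:, None] - data[None], axis=2)
h = np.sort(pairwise, axis=1)[:, 10]                   # h_i = distance to the 10th nearest neighbor
print(adaptive_mean_shift(data[0], data, h))           # converges near one of the two modes
```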

Adaptive mean-shift vs. non-adaptive

Image segmentation algorithm

1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y).

2. Resolution controlled by the bandwidths hs (spatial) and hr (color).

3. Apply filtering.

("Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02)

Image segmentation algorithm

(Figure: original, filtered, and segmented images; mean-shift trajectories.)

Filtering: pixel value of the nearest mode.

Filtering examples: original vs. filtered (squirrel, baboon).

Segmentation examples.

("Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02)

Mean-shift in high dimensions

• Computational curse of dimensionality: expensive range queries → implemented with LSH.

• Statistical curse of dimensionality: sparseness of the data → variable bandwidth.

LSH-based data structure

• Choose L random partitions. Each partition includes K pairs (d_k, v_k).

• For each point x we check whether x_{d_k} ≤ v_k for each of the K pairs; the K boolean outcomes define the point's cell.

• This partitions the data into cells. (A compact sketch follows.)
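A compact sketch of such a structure (illustrative Python; the cut values here follow the original uniform scheme, refined further below by the data-driven variant):

```python
import numpy as np

def build_partitions(data, K, L, rng):
    """L random partitions; each uses K (dimension, cut value) pairs to map a point to a cell."""
    n, dim = data.shape
    lo, hi = data.min(axis=0), data.max(axis=0)
    tables = []
    for _ in range(L):
        dims = rng.integers(0, dim, size=K)
        cuts = rng.uniform(lo[dims], hi[dims])        # random cut values in the data range
        cells = {}
        for i, x in enumerate(data):
            key = tuple(x[dims] <= cuts)              # K boolean tests define the cell
            cells.setdefault(key, []).append(i)
        tables.append((dims, cuts, cells))
    return tables

rng = np.random.default_rng(5)
data = rng.random((500, 5))
tables = build_partitions(data, K=4, L=3, rng=rng)
print(len(tables[0][2]), "cells in the first partition")
```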

Choosing the optimal K and L

• For a query q, compute the smallest number of distances to points in its buckets.

• Large K: a smaller number of points in a cell.

• If L is too small, points might be missed; but if L is too big, extra points might be included.

• As L increases, the union of cells C̄ grows and fewer points are missed, but more candidates must be checked; together, K and L determine the resolution of the data structure.

Choosing optimal K and L

• Determine accurately the KNN (and the corresponding bandwidth distance) for m randomly-selected data points.

• Choose an error threshold ε.

• The optimal K and L should satisfy: the approximate distance returned by the LSH structure is within the error threshold of the true one.

Choosing optimal K and L

• For each K, estimate the error.

• In one run over all L's, find the minimal L satisfying the constraint: L(K).

• Minimize the running time t(K, L(K)). (A sketch of this search is given below.)

(Figures: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)] and its minimum.)
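The search itself is a small loop; a sketch (with made-up error and time surfaces standing in for the measured ones) of picking the fastest (K, L) pair that meets the error bound:

```python
def choose_k_l(error_fn, time_fn, k_values, l_values, eps=0.05):
    """For each K find the smallest L with error_fn(K, L) <= eps, then keep the fastest pair."""
    best = None
    for k in k_values:
        for l in l_values:                        # assumed sorted in increasing order
            if error_fn(k, l) <= eps:
                t = time_fn(k, l)
                if best is None or t < best[0]:
                    best = (t, k, l)
                break
    return best                                    # (running time, K, L)

# toy error/time surfaces, only to show the shape of the search
err = lambda k, l: max(0.0, 0.2 - 0.01 * l - 0.002 * k)
tim = lambda k, l: 0.5 * l + 0.1 * k
print(choose_k_l(err, tim, k_values=range(4, 31, 2), l_values=range(1, 60)))
```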

Data-driven partitions

• In the original LSH, cut values are chosen at random in the range of the data.

• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value (see the small sketch below).

(Figure: points-per-bucket distribution, uniform cuts vs. data-driven cuts.)
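The suggested change is tiny; a sketch of the data-driven cut (which would replace the uniform draw in the partition sketch above):

```python
import numpy as np

def data_driven_cut(data, dim_idx, rng):
    """Use a coordinate of a randomly chosen data point as the cut value for dimension dim_idx."""
    return data[rng.integers(0, len(data)), dim_idx]

rng = np.random.default_rng(6)
data = rng.random((500, 5))
print(data_driven_cut(data, dim_idx=2, rng=rng))
```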

Additional speedup

Assume that all the points in a cell C will converge to the same mode (C acts like a type of aggregate).

Speedup results

65,536 points; 1,638 points sampled; k = 100.

Food for thought

(Figure: low dimension vs. high dimension.)

A thought for food…

• Choose K, L by sample learning, or take the traditional values.

• Can one estimate K, L without sampling?

• Does it help to know the data dimensionality or the data manifold?

• Intuitively, the dimensionality implies the number of hash functions needed.

• The catch: efficient dimensionality learning requires KNN.

15:30 cookies…

Summary

• LSH suggests a compromise on accuracy for a gain in complexity.

• Applications that involve massive data in high dimensions require the fast performance of LSH.

• Extension of LSH to different spaces (PSH).

• Learning the LSH parameters and hash functions for different applications.

Conclusion

• …but at the end, everything depends on your data set.

• Try it at home:

– Visit http://web.mit.edu/andoni/www/LSH/index.html

– E-mail Alex Andoni (Andoni@mit.edu)

– Test over your own data (C code, under Red Hat Linux)

Thanks

• Ilan Shimshoni (Haifa)

• Mohamad Hegaze (Weizmann)

• Alex Andoni (MIT)

• Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 15: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Quadtree ndash Pitfall1

X

Y

In some cases doesnrsquot

X1Y1PgeX1PgeY1

PltX1

PltX1PltY1 PgeX1

PltY1PltX1PgeY1

X1Y1

Quadtree ndash Pitfall1

X

Y

In some cases nothing works

Quadtree ndash pitfall 2X

Y

O(2d)

Could result in Query time Exponential in dimensions

Space partition based algorithms

Multidimensional access methods Volker Gaede O Gunther

Could be improved

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Curse of dimensionality

bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan

ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed

bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002

O( min(nd nd) )Naive

Curse of dimensionalitySome intuition

2

22

23

2d

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse

Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hash function

Hash function

Hash function

Data_Item

Key

BinBucket

Hash function

X modulo 3

X=Number in the range 0n

02

Storage Address

Data structure

0

Usually we would like related Data-items to be stored at the same bin

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 16: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Quadtree ndash Pitfall1

X

Y

In some cases nothing works

Quadtree ndash pitfall 2X

Y

O(2d)

Could result in Query time Exponential in dimensions

Space partition based algorithms

Multidimensional access methods Volker Gaede O Gunther

Could be improved

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Curse of dimensionality

bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan

ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed

bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002

O( min(nd nd) )Naive

Curse of dimensionalitySome intuition

2

22

23

2d

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse

Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hash function

Hash function

Hash function

Data_Item

Key

BinBucket

Hash function

X modulo 3

X=Number in the range 0n

02

Storage Address

Data structure

0

Usually we would like related Data-items to be stored at the same bin

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 17: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Quadtree ndash pitfall 2X

Y

O(2d)

Could result in Query time Exponential in dimensions

Space partition based algorithms

Multidimensional access methods Volker Gaede O Gunther

Could be improved

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Curse of dimensionality

bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan

ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed

bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002

O( min(nd nd) )Naive

Curse of dimensionalitySome intuition

2

22

23

2d

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse

Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hash function

Hash function

Hash function

Data_Item

Key

BinBucket

Hash function

X modulo 3

X=Number in the range 0n

02

Storage Address

Data structure

0

Usually we would like related Data-items to be stored at the same bin

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

•Works for the r - Nearest Neighbor problem
•Generalizes to 0 < p ≤ 2
•Improves query time:
 Query time = O(d·n^(1/(1+ε))·log n)  →  O(d·n^(1/(1+ε)^2)·log n)

(Latest results, reported in e-mail by Alexander Andoni.)

Parameters selection

For Euclidean Space:
•90% Probability ⇒ best query time performance

Parameters selection…

For Euclidean Space:
•A single projection hits an r - Nearest Neighbor with Pr = p1
•k projections hit an r - Nearest Neighbor with Pr = p1^k
•L hashings fail to collide with Pr = (1 − p1^k)^L
•To ensure a collision (e.g. 1 − δ ≥ 90%):
 1 − (1 − p1^k)^L ≥ 1 − δ   ⇒   L ≥ log(δ) / log(1 − p1^k)

(Figure: Accept Neighbors / Reject Non-Neighbors.)

…Parameters selection

(Figure: query time vs. k; candidate-extraction time grows with k while candidate-verification time shrinks, so the total is minimized at an intermediate k.)
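As a sanity check of the last inequality (the values of p1, k and δ below are illustrative only):

    import math

    def tables_needed(p1, k, delta):
        # smallest L with 1 - (1 - p1**k)**L >= 1 - delta
        return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

    print(tables_needed(p1=0.9, k=18, delta=0.1))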

Pros & Cons

+ Better query time than spatial data structures
+ Scales well to higher dimensions and larger data size (sub-linear dependence)
+ Predictable running time

- Extra storage overhead
- Inefficient for data with distances concentrated around the average
- Works best for Hamming distance (although it can be generalized to Euclidean space)
- In secondary storage, linear scan is pretty much all we can do (for high dim.)
- Requires the radius r to be fixed in advance

From Piotr Indyk's slides

Conclusion

•…but at the end, everything depends on your data set

•Try it at home
 – Visit http://web.mit.edu/andoni/www/LSH/index.html
 – Email Alex Andoni (andoni@mit.edu)
 – Test over your own data
   (C code, under Red Hat Linux)

LSH - Applications

• Searching video clips in databases ("Hierarchical, Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short, whenever K-Nearest Neighbors (KNN) are needed

Motivation

• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash function construction and parameter tuning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing. G. Shakhnarovich, P. Viola, and T. Darrell
• Finding sensitive hash functions

Mean Shift Based Clustering in High Dimensions: A Texture Classification Example. B. Georgescu, I. Shimshoni, and P. Meer
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing
G. Shakhnarovich, P. Viola, and T. Darrell

Given an image x, what are the parameters θ_i in this image, i.e. angles of joints, orientation of the body, etc.?

Ingredients

• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: d_x
• Distance metric in angles (parameter) space:
 d_θ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1_i − θ2_i))
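A small sketch of this angle-space distance (angles assumed to be in radians):

    import numpy as np

    def angle_distance(theta1, theta2):
        # d_theta(t1, t2) = sum_i (1 - cos(t1_i - t2_i))
        t1, t2 = np.asarray(theta1), np.asarray(theta2)
        return float(np.sum(1.0 - np.cos(t1 - t2)))

    print(angle_distance([0.0, 0.5], [0.1, 0.5]))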

Example based learning

• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute KNN from the database
• Use these KNNs to compute the average angles of the query

Input query → Find KNN in database of examples → Output: average angles of KNN

The algorithm flow

Input Query → Features extraction → Processed query → PSH (LSH) against the Database of examples → LWR (Regression) → Output: Match

The image features

(Figure: an example feature computed from image regions A and B at several scales.)

Image features are multi-scale edge histograms.

PSH: The basic assumption

There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ).

We want similarity to be measured in the angles space, whereas LSH works on the feature space.

• Assumption: the feature space is closely related to the parameter space.

Insight: Manifolds

• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.

(Figure: the query q mapped between the Parameters Space (angles) and the Feature Space.) Is this magic?

Parameter Sensitive Hashing (PSH)

The trick: estimate the performance of different hash functions on examples, and select those that are sensitive to d_θ. The hash functions are applied in feature space, but the KNN are valid in angle space.

Label pairs of examples with similar angles.
Define hash functions h on the feature space.
Predict the labeling of similar / non-similar examples by using h.
Compare the labelings.
If the labeling by h is good, accept h; else change h.

PSH as a classification problem

Labels: +1 +1 −1 −1   (r = 0.25)

A pair of examples (x_i, x_j) is labeled

 y_ij = +1  if d_θ(θ_i, θ_j) ≤ r
 y_ij = −1  if d_θ(θ_i, θ_j) ≥ (1 + ε)·r

A binary hash function on the features:

 h_T(x) = +1  if the selected feature of x is at least T
 h_T(x) = −1  otherwise

Predict the labels:

 ŷ_h(x_i, x_j) = +1  if h_T(x_i) = h_T(x_j)
 ŷ_h(x_i, x_j) = −1  otherwise

Find the best threshold T(x) that predicts the true labeling within the probability constraints; h_T will place both examples in the same bin, or separate them.
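A toy sketch of how one such threshold hash could be scored on labeled pairs; the scoring rule (plain accuracy) and all names are illustrative, not the paper's exact criterion:

    import numpy as np

    def score_threshold_hash(feature_vals, pairs, labels, T):
        # fraction of labeled pairs whose prediction (same bin vs. different bins)
        # matches the true +1 / -1 label
        h = np.where(np.asarray(feature_vals) >= T, 1, -1)
        hits = 0
        for (i, j), y in zip(pairs, labels):
            y_hat = 1 if h[i] == h[j] else -1
            hits += int(y_hat == y)
        return hits / len(labels)

    vals = [0.2, 0.9, 0.4, 0.8]
    print(score_threshold_hash(vals, [(0, 2), (1, 3), (0, 1)], [1, 1, -1], T=0.5))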

Local Weighted Regression (LWR)

• Given a query image, PSH returns KNNs.
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:

 θ0 = argmin Σ_{x_i ∈ N(x)} d_θ(g(x_i), θ_i) · K(d_x(x_i, x))

i.e. a local model g is fitted to the neighbors, each neighbor weighted by a kernel K of its feature-space distance to the query (dist → weight).

Results

Synthetic data were generated:

• 13 angles: 1 for rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)

18-bit hash functions (k), 150 hash tables (l)

• Test on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without feature selection, 40 bits and 1,000 hash tables would have been needed

Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, B is the max number of points in a bucket.

Results – real data

• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched

Results – real data: interesting mismatches.

Fast pose estimation - summary

• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging

Food for Thought

• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted

Food for Thought: Point Location in Different Spheres (PLDS)

• Given: n spheres in Rd centered at P = p1,…,pn, with radii r1,…,rn
• Goal: given a query q, preprocess the points in P so as to find a point pi whose sphere 'covers' the query q

Courtesy of Mohamad Hegaze

Motivation

• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance

Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer

Outline

• Mean-shift in a nutshell + examples

Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
 1. Finding optimal LSH parameters
 2. Data-driven partitions into buckets
 3. Additional speedup by using the LSH data structure

Mean-Shift in a Nutshell

(Figure: around each point, a window of radius "bandwidth" is repeatedly shifted to the local mean.)

KNN in mean-shift

The bandwidth should be inversely proportional to the density in the region:
high density – small bandwidth; low density – large bandwidth.

It is based on the k-th nearest neighbor of the point: the bandwidth of a point is taken as the distance to its k-th nearest neighbor.
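A plain (quadratic-time) sketch of this adaptive bandwidth choice; in the paper the neighbors come from the LSH structure rather than from a full distance matrix:

    import numpy as np

    def adaptive_bandwidths(X, k):
        # per-point bandwidth = distance from x_i to its k-th nearest neighbor
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        return np.sort(D, axis=1)[:, k]     # column 0 is the point itself

    X = np.random.default_rng(0).normal(size=(200, 5))
    h = adaptive_bandwidths(X, k=10)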

Adaptive mean-shift vs. non-adaptive

Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths hs (spatial) and hr (color)
3. Apply filtering

"Mean-shift: A Robust Approach Towards Feature Space Analysis", D. Comaniciu et al., TPAMI 02'

Image segmentation algorithm (cont.)

Filtering: each pixel takes the value of its nearest mode.
(Figures: original, filtered, and segmented images; mean-shift trajectories.)

Filtering examples: original squirrel / filtered; original baboon / filtered.

Segmentation examples.

"Mean-shift: A Robust Approach Towards Feature Space Analysis", D. Comaniciu et al., TPAMI 02'

Mean-shift in high dimensions

• Computational curse of dimensionality: expensive range queries – implemented with LSH
• Statistical curse of dimensionality: sparseness of the data – variable bandwidth

LSH-based data structure

• Choose L random partitions; each partition includes K pairs (d_k, v_k).
• For each point x we check, for k = 1..K, whether x_{d_k} < v_k; the K results select a cell.
• It partitions the data into cells.
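A rough sketch of such a cut-based partition structure (random cut values here; the data-driven variant appears below). All names and sizes are illustrative:

    import numpy as np

    def build_partition(X, K, rng):
        # one random partition: K (coordinate, cut value) pairs
        dims = rng.integers(0, X.shape[1], size=K)
        cuts = rng.uniform(X.min(axis=0)[dims], X.max(axis=0)[dims])
        return dims, cuts

    def cell_key(x, dims, cuts):
        return tuple(x[dims] < cuts)        # K boolean tests -> cell label

    rng = np.random.default_rng(2)
    X = rng.normal(size=(1000, 8))
    tables = []
    for _ in range(10):                     # L = 10 partitions
        dims, cuts = build_partition(X, K=6, rng=rng)
        buckets = {}
        for i, x in enumerate(X):
            buckets.setdefault(cell_key(x, dims, cuts), []).append(i)
        tables.append((dims, cuts, buckets))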

Choosing the optimal K and L

• For a query q, compute the smallest number of distances to points in its buckets.

• Large K ⇒ a smaller number of points in a cell.
• If L is too small, points might be missed; but if L is too big, extra points might be included.
• As L increases, the union of the query's cells C grows while its tightness decreases; together, K and L determine the resolution of the data structure.

(The slide also gives expressions for the expected number of points in a cell and in the union of cells as functions of n, K, d and L.)

Choosing optimal K and L

• Determine accurately the KNN for m randomly-selected data points; record each KNN distance (bandwidth).
• Choose an error threshold ε.
• The optimal K and L should satisfy the constraint that the approximate distance returned by the structure stays within the threshold of the true one.

• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)); the minimum gives the chosen pair.

(Plots: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)].)
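One way to organize that tuning loop as code (a sketch only; error_for and time_for stand for user-supplied measurements on the m sample points, and L_values is assumed sorted in increasing order):

    def tune_parameters(K_values, L_values, error_for, time_for, eps=0.05):
        # for each K take the minimal L whose approximation error <= eps,
        # then keep the (K, L) pair with the smallest running time
        best = None
        for K in K_values:
            for L in L_values:
                if error_for(K, L) <= eps:
                    t = time_for(K, L)
                    if best is None or t < best[0]:
                        best = (t, K, L)
                    break
        return best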

Data driven partitions

• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.

(Figure: bucket distribution of points, uniform cuts vs. data-driven cuts.)
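A short version of the data-driven cut (sketch; the generator and shapes are arbitrary):

    import numpy as np

    def data_driven_cut(X, rng):
        # use a coordinate of a randomly chosen data point as the cut value
        i = rng.integers(0, X.shape[0])
        d = rng.integers(0, X.shape[1])
        return d, X[i, d]

    rng = np.random.default_rng(3)
    X = rng.normal(size=(100, 4))
    print(data_driven_cut(X, rng))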

Additional speedup

Assume that all points in a cell C will converge to the same mode (C is like a type of aggregate).

Speedup results

65,536 points; 1,638 points sampled; k = 100.

Food for thought

Low dimension vs. high dimension.

A thought for food…
• Choose K, L by sample learning, or take the traditional values?
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.

15:30 – cookies…

Summary

• LSH suggests a compromise on accuracy for a gain in complexity
• Applications that involve massive data in high dimension require the fast performance of LSH
• Extension of the LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications

Conclusion

• …but at the end, everything depends on your data set

• Try it at home
 – Visit http://web.mit.edu/andoni/www/LSH/index.html
 – Email Alex Andoni (andoni@mit.edu)
 – Test over your own data
   (C code, under Red Hat Linux)

Thanks

• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis

Page 18: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Space partition based algorithms

Multidimensional access methods Volker Gaede O Gunther

Could be improved

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Curse of dimensionality

bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan

ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed

bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002

O( min(nd nd) )Naive

Curse of dimensionalitySome intuition

2

22

23

2d

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse

Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hash function

Hash function

Hash function

Data_Item

Key

BinBucket

Hash function

X modulo 3

X=Number in the range 0n

02

Storage Address

Data structure

0

Usually we would like related Data-items to be stored at the same bin

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 19: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse

Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Curse of dimensionality

bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan

ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed

bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002

O( min(nd nd) )Naive

Curse of dimensionalitySome intuition

2

22

23

2d

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse

Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hash function

Hash function

Hash function

Data_Item

Key

BinBucket

Hash function

X modulo 3

X=Number in the range 0n

02

Storage Address

Data structure

0

Usually we would like related Data-items to be stored at the same bin

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 20: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Curse of dimensionality

bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan

ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed

bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002

O( min(nd nd) )Naive

Curse of dimensionalitySome intuition

2

22

23

2d

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse

Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hash function

Hash function

Hash function

Data_Item

Key

BinBucket

Hash function

X modulo 3

X=Number in the range 0n

02

Storage Address

Data structure

0

Usually we would like related Data-items to be stored at the same bin

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition: random projections

The same unary-embedded point (p = (8, 2), C = 11, giving 1111111100011000000000) is read through the k sampled bit positions; each sampled bit asks whether one coordinate exceeds a threshold, i.e., it acts like a crude random projection. With k = 3 sampled bits, p lands in one of 2^3 buckets: 000, 100, 110, 001, 101, 111, …

k samplings; repeating L times.

Secondary hashing

Supports volume tuning: dataset size vs. storage volume. The 2^k logical buckets (e.g., 011) are hashed again into M buckets of size B, as in simple hashing, with M·B = α·n, α = 2.

The above hashing is locality-sensitive

• Probability(p, q in the same bucket) = (1 − Distance(p,q) / dimensions)^k

(Plots: collision probability vs. Distance(q, pi) for k = 1 and k = 2; adopted from Piotr Indyk's slides.)
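A quick numeric illustration of the formula (the distances and dimension below are made-up values in the spirit of the unary example):

def collision_probability(distance, dimensions, k):
    """Pr[p and q share a bucket] under bit sampling: (1 - distance/dimensions)**k."""
    return (1 - distance / dimensions) ** k

# A close pair vs. a far pair in a d' = 22-bit Hamming space:
for k in (1, 2, 4):
    print(k, collision_probability(2, 22, k), collision_probability(12, 22, k))
# Increasing k widens the probability gap between near and far points.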

Preview

• General solution – Locality Sensitive Hashing

• Implementation for Hamming space

• Generalization to l2

Direct L2 solution

• New hashing function

• Still based on sampling

• Using a mathematical trick

• P-stable distribution for Lp distance; Gaussian distribution for L2 distance

Central limit theorem

v1·X1 + v2·X2 + … + vn·Xn   (a weighted sum of Gaussians) = a Gaussian

v1, …, vn = real numbers;  X1, …, Xn = independent, identically distributed (i.i.d.)

Dot product and norm:
Σi vi·Xi = v·X ~ ||v||2 · X

Dot-product distance and norm distance (features vector 1 vs. features vector 2):
u·X − v·X = Σi (ui − vi)·Xi ~ ||u − v||2 · X

The full Hashing

h_a,b(v) = ⌊ (a·v + b) / w ⌋

• v: the features vector (e.g., [34, 82, 21, …], dimension d)
• a: d random numbers, drawn i.i.d. from a p-stable distribution
• b: a random phase in [0, w]
• w: the discretization step

Example from the slide: the projection a·v = 7944, shifted by the phase +34 and discretized with step w = 100, falls into one bin of the grid … 7800, 7900, 8000, 8100, 8200 …
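A minimal sketch of one such hash for the L2 case, using the 2-stable Gaussian distribution; the parameter values are illustrative, and in practice k of these functions are concatenated per table, with L tables, exactly as in the Hamming-space construction:

import numpy as np

def make_pstable_hash(d, w, rng=np.random.default_rng(0)):
    """One h_{a,b}: a has d i.i.d. N(0,1) entries (2-stable, so it respects
    L2 distances), b is uniform in [0, w), and w is the bin width."""
    a = rng.normal(size=d)
    b = rng.uniform(0, w)
    return lambda v: int(np.floor((np.dot(a, v) + b) / w))

h = make_pstable_hash(d=3, w=100.0)
print(h(np.array([34.0, 82.0, 21.0])))  # bucket index of the example vector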

Generalization: P-Stable distribution

• L2: Central Limit Theorem → Gaussian (normal) distribution

• Lp, 0 < p ≤ 2: Generalized Central Limit Theorem → p-stable distribution (e.g., Cauchy for L1)

P-Stable summary

• Works for the r - Nearest Neighbor problem; generalizes to 0 < p ≤ 2

• Improves query time:
Query time = O(d·n^(1/(1+ε))·log n)  →  O(d·n^(1/(1+ε)^2)·log n)

(Latest results reported by email by Alexander Andoni.)

Parameters selection

For Euclidean space:
• 90% probability, best query-time performance

Parameters selection…

• A single projection hits an r-near neighbor with Pr = p1
• k projections hit it with Pr = p1^k
• All L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure a collision (e.g., 1 − δ ≥ 90%):
1 − (1 − p1^k)^L ≥ 1 − δ   ⇒   L ≥ log(δ) / log(1 − p1^k)
(a small helper below applies this bound)

(Accept neighbors, reject non-neighbors.)
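The helper; the numbers in the example are illustrative, not taken from the slides:

import math

def tables_needed(p1, k, delta):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta,
    i.e. L >= log(delta) / log(1 - p1**k)."""
    return math.ceil(math.log(delta) / math.log(1 - p1 ** k))

print(tables_needed(p1=0.9, k=18, delta=0.1))  # tables needed for a 90% hit rate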

…Parameters selection

(Plot: query time vs. k, split into candidate-extraction time and candidate-verification time; the best k balances the two.)

Pros & Cons

Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time

Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance

(From Piotr Indyk's slides)

Conclusion

• …but at the end, everything depends on your data set

• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni (andoni@mit.edu)
– Test over your own data
(C code, under Red Hat Linux)

LSH - Applications

• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed

Motivation

• A variety of procedures in learning require KNN computation

• KNN search is a computational bottleneck

• LSH provides a fast approximate solution to the problem

• LSH requires hash-function construction and parameter tuning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

• Finding sensitive hash functions

Mean Shift Based Clustering in High Dimensions: A Texture Classification Example

B Georgescu I Shimshoni and P Meer

• Tuning LSH parameters
• LSH data structure is used for algorithm speedups

Given an image x, what are the parameters θ in this image, i.e., angles of joints, orientation of the body, etc.?

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G. Shakhnarovich, P. Viola, and T. Darrell

Ingredients

• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: d_x
• Distance metric in angle space:
d_θ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))
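A direct transcription of this angle-space metric (assuming angles in radians):

import numpy as np

def angle_distance(theta1, theta2):
    """d_theta(theta1, theta2) = sum_i (1 - cos(theta1_i - theta2_i))."""
    return float(np.sum(1.0 - np.cos(np.asarray(theta1) - np.asarray(theta2))))

print(angle_distance([0.0, 1.0], [0.1, 1.0]))  # small for similar poses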

Example based learning

• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query

(Flow: input query → find KNN in the database of examples → output: average angles of the KNN)

The algorithm flow:
input query → features extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: the match
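A sketch of this flow as a single function; every callable here (feature extractor, PSH lookup, neighbor weighting) is a placeholder standing in for the components described above, not the authors' implementation:

import numpy as np

def estimate_pose(query_image, extract_features, psh_lookup, knn_angles_weights):
    """extract_features(image) -> feature vector;
    psh_lookup(features) -> candidate neighbors;
    knn_angles_weights(features, candidates) -> (list of angle vectors, weights)."""
    x = extract_features(query_image)
    candidates = psh_lookup(x)                       # PSH/LSH: approximate neighbors
    angles, weights = knn_angles_weights(x, candidates)
    return np.average(np.asarray(angles, dtype=float), axis=0,
                      weights=np.asarray(weights, dtype=float))  # LWR-style weighted output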

The image features

(Figure: multi-scale edge-direction histograms computed for image regions A and B.)

Image features are multi-scale edge histograms.

PSH The basic assumption

There are two metric spaces here: feature space (d_x) and parameter space (d_θ).

We want similarity to be measured in the angle space, whereas LSH works on the feature space.

• Assumption: the feature space is closely related to the parameter space.

Insight Manifolds

• A manifold is a space in which every point has a neighborhood resembling a Euclidean space

• But the global structure may be complicated: curved

• For example, lines are 1D manifolds, planes are 2D manifolds, etc.

(Figure: a query q mapped between the parameters space (angles) and the feature space. Is this magic?)

Parameter Sensitive Hashing (PSH)

The trick:

Estimate the performance of different hash functions on examples and select those sensitive to d_θ.

The hash functions are applied in feature space, but the KNN are valid in angle space.

• Label pairs of examples with similar angles
• Define hash functions h on the feature space

• Predict the labeling of similar/non-similar examples by using h
• Compare the labeling
• If the labeling by h is good, accept h; else change h

PSH as a classification problem

Labels (figure: example pairs labeled +1, +1, −1, −1; here r = 0.25):

A pair of examples (x_i, x_j) is labeled
y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
y_ij = −1 if d_θ(θ_i, θ_j) ≥ (1 + ε)·r

A binary hash function on the features:
h_T(x) = +1 if the selected feature of x exceeds the threshold T, −1 otherwise

Predict the labels:
ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise

h_T(x) will either place both examples in the same bin or separate them.

Find the best T that predicts the true labeling, subject to the probability constraints.
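One way to read this selection step: score a candidate threshold feature by how often it reproduces the pair labels. The snippet below is an illustrative sketch with made-up toy numbers, not the paper's actual training procedure:

import numpy as np

def pair_agreement(feature_values, threshold, pair_indices, pair_labels):
    """Fraction of labeled pairs on which h_T(x) = sign(x_feature - T)
    puts both examples in the same bin iff the pair is labeled +1."""
    side = np.sign(feature_values - threshold)
    same_bin = np.where(side[pair_indices[:, 0]] == side[pair_indices[:, 1]], 1, -1)
    return float(np.mean(same_bin == pair_labels))

vals = np.array([0.2, 0.3, 0.1, 0.9])               # one feature over 4 examples
pairs = np.array([[0, 1], [2, 3]])                   # (0,1) similar, (2,3) dissimilar
labels = np.array([+1, -1])
print(pair_agreement(vals, 0.5, pairs, labels))      # 1.0: this threshold is sensitive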

Local Weighted Regression (LWR)

• Given a query image, PSH returns KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query: the fit minimizes a distance-weighted error over the neighbors x_i ∈ N(x_0), with weights K(d_x(x_i, x_0)) that decay with feature-space distance (a simple weighted-average sketch follows).
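A zeroth-order stand-in for LWR: a kernel-weighted average of the neighbors' angles, with an assumed Gaussian kernel (the paper fits a local regression model; this sketch only does the weighted mean):

import numpy as np

def lwr_estimate(query_features, neighbor_features, neighbor_angles, bandwidth=1.0):
    """Weighted average of the neighbors' angles; weights K(d_x(x_i, x_0))
    decay with feature-space distance."""
    d = np.linalg.norm(neighbor_features - query_features, axis=1)
    w = np.exp(-(d / bandwidth) ** 2)
    return np.average(neighbor_angles, axis=0, weights=w)

x0 = np.array([0.0, 0.0])
X = np.array([[0.1, 0.0], [1.5, 0.2]])
thetas = np.array([[0.2, 1.0], [0.8, 2.0]])
print(lwr_estimate(x0, X, thetas))   # dominated by the closer neighbor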

Results

Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples
• PSH searched only 34 of the data per query
• Without selection, 40 bits and 1,000 hash tables were needed

Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, B is the max number of points in a bucket.

Results – real data

• 800 images
• Processed by a segmentation algorithm
• 13 of the data were searched

Results – real data

Interesting mismatches

Fast pose estimation - summary

• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging

Food for Thought

• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

• Given n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn

• Goal: given a query q, preprocess the points in P to find a point p_i whose sphere 'covers' the query q.

(Figure: query q inside the sphere of radius r_i around p_i.)

Courtesy of Mohamad Hegaze

Motivation
• Clustering high-dimensional data by using local density measurements (e.g., in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

• Mean-shift in a nutshell + examples

Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
1. Finding optimal LSH parameters
2. Data-driven partitions into buckets
3. Additional speedup by using the LSH data structure

Mean-Shift in a Nutshell

(Figure: the mean-shift vector computed within a bandwidth window around a point.)

KNN in mean-shift

The bandwidth should be inversely proportional to the density in the region:
high density – small bandwidth; low density – large bandwidth.

Based on the k-th nearest neighbor of the point: the bandwidth is the distance from the point to its k-th nearest neighbor, h_i = ||x_i − x_{i,k}|| (a brute-force sketch follows).
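The sketch below uses the L2 norm for simplicity; the point of the paper is that exactly this KNN query is what the LSH data structure makes cheap:

import numpy as np

def adaptive_bandwidths(points, k):
    """h_i = distance from x_i to its k-th nearest neighbor."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    dists.sort(axis=1)          # column 0 is the distance to the point itself
    return dists[:, k]

pts = np.random.default_rng(0).normal(size=(200, 5))
print(adaptive_bandwidths(pts, k=10)[:3])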

Adaptive mean-shift vs non-adaptive


Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths hs (spatial) and hr (color)
3. Apply filtering


(Figures: the 3D feature space, mean-shift trajectories, and the original, filtered, and segmented images; filtering assigns each pixel the value of its nearest mode.)

Mean-shift: A Robust Approach Towards Feature Space Analysis, D. Comaniciu et al., TPAMI '02

Filtering examples (original squirrel → filtered; original baboon → filtered) and segmentation examples.
(From Mean-shift: A Robust Approach Towards Feature Space Analysis, D. Comaniciu et al., TPAMI '02)

Mean-shift in high dimensions

• Computational curse of dimensionality: expensive range queries, implemented with LSH
• Statistical curse of dimensionality: sparseness of the data, handled with a variable bandwidth

LSH-based data structure

• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point we check the K inequalities x_{d_k} ≤ v_k
• This partitions the data into cells (see the sketch below)
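A minimal sketch of such a partition structure; coordinate indices and cut values are drawn from the data points themselves (anticipating the data-driven variant discussed below), and the K-bit signature identifies a point's cell:

import numpy as np

def build_partitions(points, K, L, rng=np.random.default_rng(0)):
    """L random partitions; each is K pairs (d_k, v_k): a coordinate index
    and a cut value taken from a randomly chosen data point."""
    n, d = points.shape
    partitions = []
    for _ in range(L):
        dims = rng.integers(0, d, size=K)
        vals = points[rng.integers(0, n, size=K), dims]
        partitions.append((dims, vals))
    return partitions

def cell_of(x, partition):
    """K boolean tests x[d_k] <= v_k; the resulting tuple names the cell."""
    dims, vals = partition
    return tuple(x[dims] <= vals)

pts = np.random.default_rng(1).normal(size=(1000, 8))
parts = build_partitions(pts, K=6, L=4)
print(cell_of(pts[0], parts[0]))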

Choosing the optimal K and L

• For a query q, compute the smallest number of distances to points in its buckets.

• Large K: a smaller number of points in a cell.
• If L is too small, points might be missed; but if L is too big, extra points might be included.
• As L increases, the union of cells grows but the intersection decreases; this determines the resolution of the data structure.

Choosing optimal K and L
• Determine the KNN (the bandwidth distance) exactly for m randomly selected data points.
• Choose an error threshold ε.
• The optimal K and L should satisfy that the approximate distance stays within the threshold.

Choosing optimal K and L
• For each K, estimate the error
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K)); take the minimum

(Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)].)

Data driven partitions

• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.

(Figure: points-per-bucket distribution, uniform cuts vs. data-driven cuts.)

Additional speedup

Assume that all points in the cell C will converge to the same mode (C acts like a type of aggregate).

Speedup results

65536 points 1638 points sampled k=100

Food for thought

(Figure: low dimension vs. high dimension.)

A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN

1530 cookies…

Summary

• LSH trades some accuracy for a gain in complexity

• Applications that involve massive data in high dimensions require LSH's fast performance

• Extension of LSH to different spaces (PSH)

• Learning the LSH parameters and hash functions for different applications

Conclusion

• …but at the end, everything depends on your data set

• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni (andoni@mit.edu)
– Test over your own data
(C code, under Red Hat Linux)

Thanks

• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 21: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Curse of dimensionalitySome intuition

2

22

23

2d

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse

Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hash function

Hash function

Hash function

Data_Item

Key

BinBucket

Hash function

X modulo 3

X=Number in the range 0n

02

Storage Address

Data structure

0

Usually we would like related Data-items to be stored at the same bin

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 22: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Outline

bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse

Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)

bulll2 extensionbullApplications (Dan)

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hash function

Hash function

Hash function

Data_Item

Key

BinBucket

Hash function

X modulo 3

X=Number in the range 0n

02

Storage Address

Data structure

0

Usually we would like related Data-items to be stored at the same bin

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 23: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hash function

Hash function

Hash function

Data_Item

Key

BinBucket

Hash function

X modulo 3

X=Number in the range 0n

02

Storage Address

Data structure

0

Usually we would like related Data-items to be stored at the same bin

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

Embed each coordinate of p in unary. Example: p = (8, 2) with coordinate range C = 11 becomes

  11111111000 11000000000   (concatenated: 1111111100011000000000)

i.e. 8 ones padded with zeros to length C, followed by 2 ones padded to length C. The new dimension is d' = C·d, and L1 distances become Hamming distances.
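A minimal sketch of this unary embedding (assuming integer coordinates in 0..C; plain Python, not from the slides):

    # Sketch: unary embedding of an integer L1 vector into Hamming space.
    def unary_embed(p, C):
        bits = []
        for v in p:                      # each coordinate v in 0..C
            bits += [1] * v + [0] * (C - v)
        return bits                      # length d' = C * len(p)

    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    p, q = [8, 2], [5, 4]
    # L1 distance |8-5| + |2-4| = 5 equals the Hamming distance of the embeddings.
    print(hamming(unary_embed(p, 11), unary_embed(q, 11)))   # -> 5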

Hash function (bit sampling)

For p ∈ H^d', define L hash functions

  Gj(p) = p|Ij ,   j = 1..L

where Ij is a random set of k bit positions (here k = 3 digits) sampled from p. Store p in the bucket indexed by p|Ij (e.g. the sampled bits 101); each table has 2^k buckets.

Construction: each point p is inserted into the L tables, one bucket per hash function G1 ... GL.

Query: q is hashed by the same G1 ... GL, and the points found in its L buckets are the candidate neighbors.
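A compact sketch of the construction/query scheme above (bit-sampling LSH with L tables; plain Python, parameter names are mine):

    # Sketch: bit-sampling LSH for Hamming space, L tables of k sampled bit positions each.
    import random
    from collections import defaultdict

    class HammingLSH:
        def __init__(self, d, k, L, seed=0):
            rng = random.Random(seed)
            self.I = [rng.sample(range(d), k) for _ in range(L)]   # L index sets Ij
            self.tables = [defaultdict(list) for _ in range(L)]

        def _key(self, p, j):
            return tuple(p[i] for i in self.I[j])                  # Gj(p) = p|Ij

        def insert(self, p):
            for j, table in enumerate(self.tables):
                table[self._key(p, j)].append(p)

        def query(self, q):
            candidates = []
            for j, table in enumerate(self.tables):
                candidates.extend(table.get(self._key(q, j), []))
            return candidates                                      # verify true distances afterwards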

Alternative intuition: random projections

In the unary embedding (p = (8, 2), C = 11, d' = C·d), sampling one bit is the same as asking "is this coordinate above a given cut value?". Sampling k bits therefore corresponds to k random axis-parallel cuts of the coordinate range: the k answers (e.g. 101) index one of the 2^3 = 8 buckets (000, 100, 110, 001, 101, 111, ...), and each point p falls into exactly one bucket.

k samplings define one such partition; repeating the sampling L times gives L independent partitions (hash tables).

Secondary hashing

The 2^k primary buckets are mapped by a second, ordinary hash function into M buckets of size B, with M·B = α·n (e.g. α = 2). This supports tuning the storage volume against the dataset size. (Detail slide; can be skipped.)

The above hashing is locality-sensitive

• Probability(p, q in same bucket) = (1 − Distance(p, q) / dimensions)^k

The probability falls off with distance, and more sharply as k grows (plots for k = 1 and k = 2: Pr vs. Distance(q, pi)).

Adapted from Piotr Indyk's slides.
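For concreteness (the numbers are mine, not from the slides), evaluating (1 − dist/d')^k for a few distances and k values:

    # Collision probability of the bit-sampling hash: Pr = (1 - dist/d')**k.
    d_prime = 100
    for k in (1, 2, 5):
        print(k, [round((1 - dist / d_prime) ** k, 3) for dist in (5, 20, 50)])
    # k=1 -> [0.95, 0.8, 0.5];  k=5 -> [0.774, 0.328, 0.031]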

Preview

• General Solution – Locality Sensitive Hashing
• Implementation for Hamming space
• Generalization to l2

Direct L2 solution

• New hashing function
• Still based on sampling
• Uses a mathematical trick:
  – P-stable distribution for Lp distance
  – Gaussian distribution for L2 distance

Central limit theorem

A weighted sum of Gaussians is again a (scaled) Gaussian. For real numbers v1, ..., vn and X1, ..., Xn independent identically distributed (i.i.d.) Gaussian variables:

  v1·X1 + v2·X2 + ... + vn·Xn  ~  ||v||2 · X ,   X ~ N(0, 1)

i.e. the dot product v·X is distributed like the norm ||v||2 times a single Gaussian.

Norm → Distance

For two feature vectors u and v and the same random X:

  u·X − v·X = Σi (ui − vi)·Xi  ~  ||u − v||2 · X

so the difference between the two projections (dot products) is distributed like the L2 distance between the feature vectors times a standard Gaussian.
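A quick numerical check of this property (my own sketch; numpy assumed):

    # Sketch: the spread of a.u - a.v over random Gaussian a matches ||u - v||_2.
    import numpy as np

    rng = np.random.default_rng(0)
    u, v = rng.normal(size=50), rng.normal(size=50)
    a = rng.normal(size=(100000, 50))          # many i.i.d. Gaussian projection vectors

    diffs = a @ u - a @ v                      # = a . (u - v)
    print(diffs.std(), np.linalg.norm(u - v))  # the two numbers should be close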

The full Hashing

  h_{a,b}(v) = ⌊ (a·v + b) / w ⌋

• v – the features vector (d-dimensional, e.g. [3.4  8.2  2.1  ...])
• a – d random numbers, drawn i.i.d. from a p-stable distribution
• b – a random phase, uniform in [0, w]
• w – the discretization step

The point is projected onto the random direction a (in the slides' worked example a·v = 79.44), shifted by the random phase b, and the projection axis is cut into bins of width w; the index of the bin containing the shifted projection is the hash value.
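A runnable sketch of this h_{a,b} family arranged E2LSH-style, with k concatenated functions per table and L tables (parameter values are illustrative; numpy assumed):

    # Sketch: p-stable LSH for L2 - h(v) = floor((a.v + b)/w), k functions per table, L tables.
    import numpy as np
    from collections import defaultdict

    class L2LSH:
        def __init__(self, d, k=4, L=10, w=4.0, seed=0):
            rng = np.random.default_rng(seed)
            self.a = rng.normal(size=(L, k, d))          # Gaussian = 2-stable
            self.b = rng.uniform(0, w, size=(L, k))      # random phase in [0, w)
            self.w = w
            self.tables = [defaultdict(list) for _ in range(L)]

        def _keys(self, v):
            proj = np.floor((self.a @ v + self.b) / self.w).astype(int)
            return [tuple(row) for row in proj]          # one k-tuple key per table

        def insert(self, i, v):
            for table, key in zip(self.tables, self._keys(v)):
                table[key].append(i)

        def query(self, q):
            cands = set()
            for table, key in zip(self.tables, self._keys(q)):
                cands.update(table.get(key, []))
            return cands                                 # verify with exact distances afterwards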

Generalization: P-Stable distribution

• L2: Central Limit Theorem → Gaussian (normal) distribution (which is 2-stable)
• Lp, 0 < p ≤ 2: Generalized Central Limit Theorem → p-stable distribution (e.g. the Cauchy distribution, which is 1-stable, for L1)

P-Stable summary

• Works for the r - Nearest Neighbor problem and generalizes to 0 < p ≤ 2.
• Improves the query time: from O(d·n^(1/(1+ε))·log n) to O(d·n^(1/(1+ε)²)·log n).

(Latest results reported in an email by Alexander Andoni.)

Parameters selection

For Euclidean space, target e.g. a 90% success probability at the best query-time performance:

• A single projection hits an ε - Nearest Neighbor with Pr = p1.
• k concatenated projections hit it with Pr = p1^k.
• All L hashings fail to collide with Pr = (1 − p1^k)^L.
• To ensure a collision with probability at least 1 − δ (e.g. ≥ 90%):

  1 − (1 − p1^k)^L ≥ 1 − δ   ⇒   L ≥ log(δ) / log(1 − p1^k)

(Figure: the resulting accept-neighbors / reject-non-neighbors behavior.)
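The L(k) rule above as a small helper (plain Python; the value of p1 here is an assumed collision probability, just for illustration):

    # Sketch: minimal number of tables L so that Pr[miss] = (1 - p1**k)**L <= delta.
    import math

    def tables_needed(p1, k, delta=0.1):
        return math.ceil(math.log(delta) / math.log(1 - p1 ** k))

    print(tables_needed(p1=0.9, k=10, delta=0.1))   # -> 6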

... Parameters selection

(Figure: query time as a function of k, split into candidate-extraction time and candidate-verification time; extraction cost grows with k while verification cost shrinks, and the best k balances the two.)

Pros & Cons

Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time

Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance

From Piotr Indyk's slides.

Conclusion

• ...but at the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code, under Red Hat Linux)

LSH - Applications

• Searching video clips in databases ("Hierarchical, Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever k-Nearest Neighbors (KNN) are needed

Motivation

• A variety of procedures in learning require KNN computation.
• KNN search is a computational bottleneck.
• LSH provides a fast approximate solution to the problem.
• LSH requires hash-function construction and parameter tuning.

Outline

Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola, and T. Darrell
• Finding sensitive hash functions

Mean Shift Based Clustering in High Dimensions: A Texture Classification Example, B. Georgescu, I. Shimshoni, and P. Meer
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups

The Problem (Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola, and T. Darrell)

Given an image x, what are the parameters θ in this image, i.e. angles of joints, orientation of the body, etc.?

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(
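The angle metric, as a one-liner sketch (plain Python):

    # Sketch: distance in angle space, d_theta = sum_i (1 - cos(theta1_i - theta2_i)).
    import math

    def d_theta(t1, t2):
        return sum(1 - math.cos(a - b) for a, b in zip(t1, t2))

    print(d_theta([0.0, 1.2], [0.1, 1.0]))   # small for similar poses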

Example based learning

• Construct a database of example images with their known angles.
• Given a query image, run your favorite feature extractor.
• Compute the KNN from the database.
• Use these KNNs to compute the average angles of the query.

Input: query → find KNN in the database of examples → output: average angles of the KNN.

The algorithm flow

Input query → features extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match.

The image features

Image features are multi-scale edge histograms. (Figure: two image windows A and B with their edge-direction counts at several scales.)

PSH: The basic assumption

There are two metric spaces here: the feature space (dx) and the parameter space (dθ). We want similarity to be measured in the angle space, whereas LSH works on the feature space.

• Assumption: the feature space is closely related to the parameter space.

Insight: Manifolds

• A manifold is a space in which every point has a neighborhood resembling a Euclidean space.
• But the global structure may be complicated: curved.
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.

(Figure: the parameter space (angles) and the feature space as two manifolds, with the query q mapped between them. Is this magic?)

Parameter Sensitive Hashing (PSH)

The trick: estimate the performance of different hash functions on examples, and select those that are sensitive to dθ.

The hash functions are applied in feature space, but the KNN are valid in angle space.

• Label pairs of examples with similar angles.
• Define hash functions h on the feature space.
• Predict the labeling of similar/non-similar examples by using h.
• Compare the labelings.
• If the labeling by h is good, accept h; else change h.

PSH as a classification problem

A pair of examples (xi, xj) is labeled

  y_ij = +1  if dθ(θi, θj) ≤ r
  y_ij = −1  if dθ(θi, θj) ≥ (1 + ε)·r

(here r = 0.25). The figure shows example pairs labeled +1, +1, −1, −1.

A binary hash function on the features:

  h_T(x) = +1  if the selected feature of x is ≥ T,   −1 otherwise

Predict the labels:

  ŷ_h(xi, xj) = +1  if h_T(xi) = h_T(xj),   −1 otherwise

Find the best threshold T (per feature) that predicts the true labeling within the probability constraints: h_T will place both examples of a pair in the same bin, or separate them.
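A rough sketch of this selection step (my own simplification of the idea, not the paper's exact procedure): score each (feature, threshold) stump by how often its collision prediction agrees with the angle-space labels.

    # Sketch (assumption): pick (feature, threshold) stumps whose collisions agree with the +/-1 labels.
    import numpy as np

    def stump_accuracy(X, pairs, labels, feat, T):
        # X: (n, d) features; pairs: list of (i, j); labels: +1 similar pose, -1 dissimilar.
        side = np.where(X[:, feat] >= T, 1, -1)
        pred = np.array([1 if side[i] == side[j] else -1 for i, j in pairs])
        return np.mean(pred == np.array(labels))

    def select_hashes(X, pairs, labels, per_feature_thresholds, top=18):
        scored = []
        for feat in range(X.shape[1]):
            for T in per_feature_thresholds[feat]:
                scored.append((stump_accuracy(X, pairs, labels, feat, T), feat, T))
        scored.sort(reverse=True)
        return [(feat, T) for _, feat, T in scored[:top]]   # the selected stumps form one hash key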

Local Weighted Regression (LWR)

• Given a query image x, PSH returns its KNNs.
• LWR uses the KNN to compute a weighted estimate of the angles of the query, roughly

  θ̂ = argmin_g Σ_{xi ∈ N(x)} dθ( θ(xi), g(xi) ) · K( dx(x, xi) )

i.e. the neighbors' known angles are fit/averaged with weights K(·) that decay with their feature-space distance to the query.
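In its simplest (zeroth-order) form this is just a kernel-weighted average of the neighbors' angles; a sketch under that assumption (numpy assumed):

    # Sketch (assumption): zeroth-order LWR - kernel-weighted average of the neighbors' angle vectors.
    import numpy as np

    def weighted_pose(query_feat, neighbor_feats, neighbor_angles, bandwidth=1.0):
        d = np.linalg.norm(neighbor_feats - query_feat, axis=1)      # feature-space distances
        w = np.exp(-(d / bandwidth) ** 2)                            # kernel weights K(d)
        # (circular quantities could be averaged via sin/cos instead of linearly)
        return (w[:, None] * neighbor_angles).sum(axis=0) / w.sum()  # weighted average of angles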

Results

Synthetic data were generated:

• 13 angles: 1 for rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples; PSH searched only 3.4% of the data per query
• Without feature selection, 40 bits and 1,000 hash tables were needed

Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, B is the max number of points in a bucket.

Results – real data

• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched per query

(Figure: interesting mismatches.)

Fast pose estimation - summary

• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging

Food for Thought

• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted

Food for Thought: Point Location in Different Spheres (PLDS)

• Given n spheres in R^d, centered at P = {p1, ..., pn}, with radii r1, ..., rn.
• Goal: preprocess the points in P so that, given a query q, we can find a point pi whose sphere covers the query q.

(Courtesy of Mohamad Hegaze.)

Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer

Motivation

• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance

Outline

• Mean-shift in a nutshell + examples

Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure

Mean-Shift in a Nutshell

(Figure: one mean-shift step moves a point toward the weighted mean of the samples inside its bandwidth window.)

KNN in mean-shift

The bandwidth should be inversely proportional to the density in the region: high density - small bandwidth, low density - large bandwidth. It is based on the kth nearest neighbor of the point: the bandwidth h_i is the distance from x_i to its kth nearest neighbor.

(Figure: adaptive mean-shift vs. non-adaptive.)
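A compact sketch of one adaptive mean-shift procedure under these assumptions (flat kernel, per-point bandwidth from the kth neighbor; numpy assumed, details are mine):

    # Sketch (assumption): adaptive mean-shift with a flat kernel and per-point kNN bandwidth.
    import numpy as np

    def knn_bandwidths(X, k):
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        return np.sort(D, axis=1)[:, k]            # h_i = distance to the kth neighbor

    def mean_shift(X, k=10, iters=30):
        h = knn_bandwidths(X, k)
        Y = X.copy()                                # points being shifted toward modes
        for _ in range(iters):
            for i in range(len(Y)):
                inside = np.linalg.norm(X - Y[i], axis=1) <= h[i]
                if inside.any():
                    Y[i] = X[inside].mean(axis=0)   # move to the local mean
        return Y                                    # nearby rows of Y share a mode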

Image segmentation algorithm

1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths hs (spatial) and hr (color)
3. Apply filtering

(Figures: original, filtered, and segmented images; mean-shift trajectories in the 3D feature space. Filtering = replacing each pixel value by the value of its nearest mode.)

"Mean-shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02.

Filtering examples: original squirrel vs. filtered; original baboon vs. filtered.

Segmentation examples.

"Mean-shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02.

Mean-shift in high dimensions

• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth

LSH-based data structure

• Choose L random partitions; each partition includes K pairs (d_k, v_k).
• For each point x we check, for k = 1..K, whether x_{d_k} ≤ v_k; the K boolean answers form the point's cell key.
• This partitions the data into cells.
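A sketch of this cut-based cell key (the parameter choices and helper names are mine; numpy assumed; the data-driven variant anticipates a later slide):

    # Sketch (assumption): each partition = K (dimension, cut-value) pairs; K comparisons give the cell key.
    import numpy as np
    from collections import defaultdict

    def make_partition(X, K, rng, data_driven=True):
        dims = rng.integers(0, X.shape[1], size=K)
        if data_driven:   # cut values taken from coordinates of randomly chosen data points
            cuts = X[rng.integers(0, len(X), size=K), dims]
        else:             # or uniformly at random in the range of the data
            cuts = rng.uniform(X[:, dims].min(0), X[:, dims].max(0))
        return dims, cuts

    def cell_key(x, partition):
        dims, cuts = partition
        return tuple(x[dims] <= cuts)      # K booleans -> one of 2^K cells

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))
    partitions = [make_partition(X, K=8, rng=rng) for _ in range(10)]   # L = 10
    tables = [defaultdict(list) for _ in partitions]
    for i, x in enumerate(X):
        for table, part in zip(tables, partitions):
            table[cell_key(x, part)].append(i)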

Choosing the optimal K and L

• For a query q we want to compute the smallest possible number of distances - only to the points in its buckets.
• Large K → a smaller number of points in a cell.
• If L is too small, points might be missed; but if L is too big, extra points might be included.
• As L increases, the union of cells C∪ increases; C∪ determines the resolution of the data structure.

(Equation residue on the slide: the expected number of points in a single cell, N_C, as a function of n, K and d, and the bound N_{C∪} ≤ L·N_C for the union of the L cells containing q.)

Choosing optimal K and L

• Determine accurately the KNN distance (the bandwidth) for m randomly-selected data points.
• Choose an error threshold ε.
• The optimal K and L should satisfy: the approximate (LSH-based) KNN distance stays within the chosen error threshold of the true one.

• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K).
• Minimize the running time t(K, L(K)).

(Figures: approximation error as a function of K and L; L(K) for ε = 0.05; running time t[K, L(K)] and its minimum.)
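A sketch of that tuning loop (my own paraphrase; build_index, true_knn_distance and lsh_knn_distance are hypothetical helpers):

    # Sketch (assumption): pick, for each K, the smallest L meeting the error constraint, then take
    # the (K, L) pair with the lowest measured query time.
    import time

    def tune(build_index, queries, K_values, L_values, eps, true_knn_distance, lsh_knn_distance):
        best = None
        for K in K_values:
            for L in L_values:                       # L_values in increasing order
                index = build_index(K, L)
                ok = all(lsh_knn_distance(index, q) <= (1 + eps) * true_knn_distance(q)
                         for q in queries)
                if ok:                               # minimal L satisfying the constraint: L(K)
                    t0 = time.perf_counter()
                    for q in queries:
                        lsh_knn_distance(index, q)
                    t = time.perf_counter() - t0     # running time t(K, L(K))
                    if best is None or t < best[0]:
                        best = (t, K, L)
                    break
        return best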

Data driven partitions

• In the original LSH, cut values are chosen at random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value (as in the data_driven option of the sketch above).

(Figure: bucket distribution - uniform cuts vs. data-driven cuts.)

Additional speedup

Assume that all points in a union-of-cells C∪ will converge to the same mode (C∪ behaves like a type of aggregate), so their modes need not be computed independently.

Speedup results

(Table: 65,536 points, 1,638 points sampled, k = 100.)

Food for thought

(Figure: low dimension vs. high dimension.)

A thought for food...

• Choose K, L by sample learning, or take the traditional values?
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality, or the data manifold?
• Intuitively, dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.

15:30: cookies...

Summary

• LSH trades accuracy for a gain in complexity.
• Applications that involve massive data in high dimensions require LSH's fast performance.
• Extension of LSH to different spaces (PSH).
• Learning the LSH parameters and hash functions for different applications.

Conclusion

• ...but at the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code, under Red Hat Linux)

Thanks

• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 24: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Hash function

Hash function

Hash function

Data_Item

Key

BinBucket

Hash function

X modulo 3

X=Number in the range 0n

02

Storage Address

Data structure

0

Usually we would like related Data-items to be stored at the same bin

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 25: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Hash function

Hash function

Data_Item

Key

BinBucket

Hash function

X modulo 3

X=Number in the range 0n

02

Storage Address

Data structure

0

Usually we would like related Data-items to be stored at the same bin

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 26: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Hash function

X modulo 3

X=Number in the range 0n

02

Storage Address

Data structure

0

Usually we would like related Data-items to be stored at the same bin

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Supports volume tuning: dataset size vs. storage volume

The 2^k virtual buckets (e.g. 011) are mapped by a simple secondary hash into M physical buckets of size B

M·B = α·n,  α = 2

The above hashing is locality-sensitive

•Probability(p, q in same bucket) = (1 − Distance(p, q)/dimensions)^k

[Plots: Probability Pr vs. Distance(q, pi) for k = 1 and k = 2; the curve drops faster for larger k]

Adopted from Piotr Indyk's slides
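A tiny numeric check (Python; the embedded dimension 22 is just the running example) of the collision-probability formula above for k = 1 and k = 2.

```python
def p_collision(dist, dims, k):
    """Pr[p and q share a bucket] when k bit positions are sampled uniformly:
    each sampled bit agrees with probability 1 - dist/dims."""
    return (1.0 - dist / dims) ** k

dims = 22
for dist in (0, 2, 5, 11):
    print(dist, round(p_collision(dist, dims, 1), 3), round(p_collision(dist, dims, 2), 3))
# Larger k makes the collision probability fall off more sharply with distance.
```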

Preview

•General Solution – Locality sensitive hashing

•Implementation for Hamming space

•Generalization to l2

Direct L2 solution

•New hashing function

•Still based on sampling

•Using a mathematical trick:

•P-stable distribution for Lp distance
•Gaussian distribution for L2 distance

Central limit theorem

v1·(Gaussian) + v2·(Gaussian) + … + vn·(Gaussian) = a (weighted) Gaussian

(Sum of weighted Gaussians) = weighted Gaussian

Central limit theorem

v1, …, vn = real numbers

X1, …, Xn = independent, identically distributed (i.i.d.)

v1·X1 + v2·X2 + … + vn·Xn = ?

Central limit theorem

v·X = Σi vi·Xi  ∼  ||v||2 · X    (X, Xi i.i.d. Gaussian)

Dot product ↔ norm

Norm ↔ Distance

u·X − v·X = Σi (ui − vi)·Xi  ∼  ||u − v||2 · X

Features vector 1 (u),  features vector 2 (v)  →  distance ||u − v||2

The difference of the two dot products is distributed like the L2 distance times a Gaussian

The full Hashing

h_{a,b}(v) = ⌊(a·v + b) / w⌋

v = features vector, e.g. [34, 82, …, 21], of dimension d

a = d random numbers (a1, …, ad), drawn i.i.d. from a p-stable distribution

b = a random phase in [0, w]

w = the discretization step

Worked example from the slides: with w = 100 and b = +34, the line is cut into bins …, 7800, 7900, 8000, 8100, 8200, …; the projection a·v = 7944 plus the phase falls into the bin starting at 7900
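A short sketch (Python with NumPy; class and parameter names are illustrative) of the hash family above for the L2 case, where the 2-stable distribution is the standard Gaussian.

```python
import numpy as np

class PStableHash:
    """h_{a,b}(v) = floor((a.v + b) / w), with a ~ N(0, I) (2-stable, so the
    projection difference a.(u - v) behaves like ||u - v||_2 times a Gaussian)."""
    def __init__(self, dim, w, seed=0):
        rng = np.random.default_rng(seed)
        self.a = rng.normal(size=dim)   # d random numbers
        self.b = rng.uniform(0.0, w)    # random phase in [0, w)
        self.w = w                      # discretization step

    def __call__(self, v):
        return int(np.floor((np.dot(self.a, v) + self.b) / self.w))

h = PStableHash(dim=3, w=4.0)
v1 = np.array([34.0, 82.0, 21.0])
v2 = v1 + 0.5     # a nearby point
v3 = v1 + 50.0    # a far point
print(h(v1), h(v2), h(v3))   # nearby points usually share a bucket; far ones usually do not
```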

Generalization: P-Stable distribution

•L2: Central Limit Theorem → Gaussian (normal) distribution

•Lp, 0 < p ≤ 2: Generalized Central Limit Theorem → p-stable distribution (e.g. Cauchy for L1)

P-Stable summary

•Works for the r - Nearest Neighbor problem; generalizes to 0 < p ≤ 2

•Improves query time:

Query time = O(d·n^(1/(1+ε))·log n)  →  O(d·n^(1/(1+ε)^2)·log n)

Latest results reported in e-mail by Alexander Andoni

Parameters selection

•90% probability ⇒ best query time performance

For Euclidean space

Parameters selection…

For Euclidean space

•A single projection hits an ε - Nearest Neighbor with Pr = p1

•k projections hit an ε - Nearest Neighbor with Pr = p1^k

•All L hashings fail to collide with Pr = (1 − p1^k)^L

•To ensure a collision (e.g. with probability 1 − δ ≥ 90%):

1 − (1 − p1^k)^L ≥ 1 − δ   ⇒   L ≥ log(δ) / log(1 − p1^k)

Accept Neighbors / Reject Non-Neighbors
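A minimal computation (Python) of the bound on L just derived; the values of p1, k and δ below are made-up examples.

```python
import math

def tables_needed(p1, k, delta):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta."""
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

# Example: per-projection collision probability p1 = 0.8, k = 10 bits, 90% success (delta = 0.1).
print(tables_needed(p1=0.8, k=10, delta=0.1))   # about 21 tables
```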

…Parameters selection

[Plot: query time vs. k; candidate-extraction time grows with k while candidate-verification time shrinks, so the best k sits at the crossover]

Pros & Cons

+ Better query time than spatial data structures

+ Scales well to higher dimensions and larger data size (sub-linear dependence)

+ Predictable running time

− Extra storage overhead

− Inefficient for data with distances concentrated around the average

− Works best for Hamming distance (although it can be generalized to Euclidean space)

− In secondary storage, linear scan is pretty much all we can do (for high dim)

− Requires the radius r to be fixed in advance

From Piotr Indyk's slides

Conclusion

•…but at the end, everything depends on your data set

•Try it at home:
–Visit http://web.mit.edu/andoni/www/LSH/index.html
–Email Alex Andoni (andoni@mit.edu)
–Test over your own data

(C code, under Red Hat Linux)

LSH - Applications

• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)

• Searching image databases (see the following)

• Image segmentation (see the following)

• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)

• Texture classification (see the following)

• Clustering (see the following)

• Embedding and manifold learning (LLE and many others)

• Compression – vector quantization

• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)

• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)

• In short: whenever K-Nearest Neighbors (KNN) are needed

Motivation

• A variety of procedures in learning require KNN computation

• KNN search is a computational bottleneck

• LSH provides a fast approximate solution to the problem

• LSH requires hash function construction and parameter tuning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola and T. Darrell

• Finding sensitive hash functions

Mean Shift Based Clustering in High Dimensions: A Texture Classification Example, B. Georgescu, I. Shimshoni and P. Meer

• Tuning LSH parameters
• LSH data structure is used for algorithm speedups

The Problem: given an image x, what are the parameters θ in this image,

i.e. angles of joints, orientation of the body, etc.?

Fast Pose Estimation with Parameter Sensitive Hashing

G. Shakhnarovich, P. Viola and T. Darrell

Ingredients

• Input: query image with unknown angles (parameters)

• Database of human poses with known angles

• Image feature extractor: edge detector

• Distance metric in feature space: dx

• Distance metric in angles space:

d_θ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))

Example based learning

• Construct a database of example images with their known angles

• Given a query image, run your favorite feature extractor

• Compute KNN from the database

• Use these KNNs to compute the average angles of the query

Input query → find KNN in database of examples → output: average angles of the KNN

The algorithm flow:

Input Query → Features extraction → Processed query → PSH (LSH) over the Database of examples → LWR (Regression) → Output: Match

The image features

[Figure: image regions A and B at several scales, with an edge-direction histogram per region]

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH: The basic assumption

There are two metric spaces here: feature space (d_x) and parameter space (d_θ)

We want similarity to be measured in the angles space, whereas LSH works on the feature space

• Assumption: the feature space is closely related to the parameter space

Feature Extraction PSH LWR

Insight: Manifolds

• A manifold is a space in which every point has a neighborhood resembling Euclidean space

• But the global structure may be complicated: curved

• For example, lines are 1D manifolds, planes are 2D manifolds, etc.

Feature Extraction PSH LWR

[Figure: the query q mapped between the Parameters Space (angles) and the Feature Space]

Is this Magic?

Parameter Sensitive Hashing (PSH)

The trick:

Estimate the performance of different hash functions on examples, and select those sensitive to d_θ

The hash functions are applied in feature space, but the KNN are valid in angle space

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similar/non-similar examples by using h

Compare labeling

If labeling by h is good, accept h; else change h

PSH as a classification problem

[Figure: example pairs labeled +1, +1, −1, −1]   (r = 0.25)

Labels: a pair of examples (x_i, x_j) is labeled

y_ij = +1  if d_θ(θ_i, θ_j) < r

y_ij = −1  if d_θ(θ_i, θ_j) > (1 + ε) r

Feature Extraction PSH LWR

A binary hash function on features:

h_T(x) = +1 if x ≥ T,  −1 otherwise

Predict the labels:

ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j),  −1 otherwise

Feature Extraction PSH LWR

[Figure: values of a single feature, with the threshold T(x)]

Feature Extraction PSH LWR

Find the best T that predicts the true labeling with the probability constraints

h_T will place both examples in the same bin, or separate them
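A rough sketch (Python; the pairs, labels and thresholds are all made-up toy values) of the selection step above: score a candidate threshold hash by how well its collisions agree with the ±1 pair labels.

```python
def h_T(x, T):
    """Binary hash on a single feature: +1 if the value passes the threshold."""
    return 1 if x >= T else -1

def pair_accuracy(pairs, labels, T):
    """Fraction of example pairs whose predicted label (collide => +1) matches the true label."""
    hits = 0
    for (xi, xj), y in zip(pairs, labels):
        y_hat = 1 if h_T(xi, T) == h_T(xj, T) else -1
        hits += (y_hat == y)
    return hits / len(pairs)

pairs  = [(0.2, 0.3), (0.7, 0.8), (0.1, 0.9), (0.4, 0.95)]   # feature values of each pair
labels = [+1, +1, -1, -1]                                     # +1 = similar poses, -1 = dissimilar
for T in (0.25, 0.5, 0.75):
    print(T, pair_accuracy(pairs, labels, T))
# Keep the thresholds whose accuracy meets the probability constraints; discard the rest.
```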

Local Weighted Regression (LWR)

• Given a query image, PSH returns KNNs

• LWR uses the KNN to compute a weighted average of the estimated angles of the query

θ(x0) = argmin_β Σ_{x_i ∈ N(x0)} d_θ( g(x_i; β), θ_i ) · K( d_x(x_i, x0) )      (dist → weight via the kernel K)

Feature Extraction PSH LWR
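A simplified sketch (Python with NumPy): a zeroth-order, kernel-weighted average of the neighbors' angles rather than the full local regression; every number here is illustrative.

```python
import numpy as np

def weighted_pose(query_feat, neigh_feats, neigh_angles, bandwidth=1.0):
    """Kernel-weighted average of the neighbors' angle vectors.
    Weight of neighbor i is K(d_x(x_i, x_0)) with a Gaussian kernel K."""
    d = np.linalg.norm(neigh_feats - query_feat, axis=1)   # distances in feature space
    w = np.exp(-(d / bandwidth) ** 2)                      # dist -> weight
    return (w[:, None] * neigh_angles).sum(axis=0) / w.sum()

neigh_feats  = np.array([[0.10, 0.20], [0.15, 0.25], [0.90, 0.80]])
neigh_angles = np.array([[30.0, 60.0], [32.0, 58.0], [80.0, 10.0]])   # degrees, 2 joints
print(weighted_pose(np.array([0.12, 0.22]), neigh_feats, neigh_angles))
# Note: naive averaging ignores angle wrap-around; the full LWR also fits a local model g.
```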

Results

Synthetic data were generated:

• 13 angles: 1 for rotation of the torso, 12 for joints

• 150,000 images

• Nuisance parameters added: clothing, illumination, face expression

• 1,775,000 example pairs

• Selected 137 out of 5,123 meaningful features (how?)

• 18-bit hash functions (k), 150 hash tables (l)

• Test on 1,000 synthetic examples

• PSH searched only 34 of the data per query

• Without selection, needed 40 bits and 1,000 hash tables

Recall: P1 is the prob. of a positive hash, P2 is the prob. of a bad hash, B is the max number of points in a bucket

Results – real data

• 800 images

• Processed by a segmentation algorithm

• 13 of the data were searched

Results – real data

Interesting mismatches

Fast pose estimation - summary

• A fast way to compute the angles of a human body figure

• Moving from one representation space to another

• Training a sensitive hash function

• KNN smart averaging

Food for Thought

• The basic assumption may be problematic (distance metric, representations)

• The training set should be dense

• Texture and clutter

• In general: some features are more important than others and should be weighted

Food for Thought: Point Location in Different Spheres (PLDS)

• Given n spheres in R^d centered at P = {p1, …, pn}, with radii r1, …, rn

• Goal: given a query q, preprocess the points in P to find a point pi whose sphere 'covers' the query q

[Figure: query q inside the sphere of radius ri around pi]

Courtesy of Mohamad Hegaze

Motivation

• Clustering high dimensional data by using local density measurements (e.g. feature space)

• Statistical curse of dimensionality: sparseness of the data

• Computational curse of dimensionality: expensive range queries

• LSH parameters should be adjusted for optimal performance

Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example

B. Georgescu, I. Shimshoni and P. Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

• Mean-shift in high dimensions – using LSH

• Speedups:
1. Finding optimal LSH parameters
2. Data-driven partitions into buckets
3. Additional speedup by using the LSH data structure

Mean-Shift in a Nutshell

[Figure: a point and the bandwidth window around it]

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region:

high density → small bandwidth;  low density → large bandwidth

Based on the kth nearest neighbor of the point: the bandwidth is h_i = ||x_i − x_{i,k}||

Adaptive mean-shift vs. non-adaptive
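A brief sketch (Python with NumPy; brute-force neighbor search and random data, purely for illustration) of the adaptive bandwidth above: each point's bandwidth is its distance to its kth nearest neighbor.

```python
import numpy as np

def adaptive_bandwidths(X, k):
    """h_i = distance from x_i to its k-th nearest neighbor (large in sparse regions)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # all pairwise distances
    return np.sort(D, axis=1)[:, k]        # column 0 is the point itself (distance 0)

rng = np.random.default_rng(0)
dense  = rng.normal(0.0, 0.2, size=(50, 2))   # tight cluster
sparse = rng.normal(5.0, 1.5, size=(10, 2))   # spread-out cluster
X = np.vstack([dense, sparse])
h = adaptive_bandwidths(X, k=5)
print(h[:3], h[-3:])   # small bandwidths in the dense cluster, large ones in the sparse cluster
```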


Image segmentation algorithm:
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths hs (spatial) and hr (color)
3. Apply filtering

[3D feature-space figure]

"Mean-shift: A Robust Approach Towards Feature Space Analysis", D. Comaniciu et al., TPAMI '02

Image segmentation algorithm

original segmented

filtered

Filtering: pixel value of the nearest mode

Mean-shift trajectories

original squirrel filtered

original baboon filtered

Filtering examples

"Mean-shift: A Robust Approach Towards Feature Space Analysis", D. Comaniciu et al., TPAMI '02

Segmentation examples

"Mean-shift: A Robust Approach Towards Feature Space Analysis", D. Comaniciu et al., TPAMI '02

Mean-shift in high dimensions

Computational curse of dimensionality: expensive range queries → implemented with LSH

Statistical curse of dimensionality: sparseness of the data → variable bandwidth

LSH-based data structure

• Choose L random partitions. Each partition includes K pairs (d_k, v_k)

• For each point we check the K inequalities x_{d_k} ≤ v_k

It partitions the data into cells

Choosing the optimal K and L

• For a query q, compute the smallest number of distances to points in its buckets

Large K ⇒ a smaller number of points in a cell: the expected number is N_C ≈ n / (K + 1)^d

If L is too small, points might be missed; but if L is too big, it might include extra points: N_{C∪} ≤ L · N_C

As L increases, C∪ increases but C∩ decreases; C∩ determines the resolution of the data structure

Choosing optimal K and L

Determine accurately the KNN for m randomly-selected data points; their kth-neighbor distance gives the bandwidth

Choose an error threshold ε

The optimal K and L should satisfy: the approximate distance is within (1 + ε) of the true distance

Choosing optimal K and L

• For each K, estimate the error for L = 1, 2, …

• In one run over all L's, find the minimal L satisfying the constraint: L(K)

• Minimize the running time t(K, L(K))

[Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] and its minimum]

Data driven partitions

• In the original LSH, cut values are random in the range of the data

• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value

[Histograms: points-per-bucket distribution, uniform cuts vs. data-driven cuts]
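A small comparison sketch (Python; the one-dimensional Gaussian data and the number of cuts are illustrative) of uniform vs. data-driven cut values and the resulting points-per-bucket balance.

```python
import random

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(10000)]   # dense near 0, sparse at the tails

def bucket_sizes(cuts, data):
    cuts = sorted(cuts)
    sizes = [0] * (len(cuts) + 1)
    for x in data:
        sizes[sum(x > c for c in cuts)] += 1
    return sizes

lo, hi = min(data), max(data)
uniform_cuts     = [random.uniform(lo, hi) for _ in range(7)]   # random in the range of the data
data_driven_cuts = [random.choice(data) for _ in range(7)]      # coordinates of random data points

print(bucket_sizes(uniform_cuts, data))       # typically a few huge buckets and many nearly empty ones
print(bucket_sizes(data_driven_cuts, data))   # typically a more balanced occupancy
```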

Additional speedup

Assume that all points in C∪ will converge to the same mode (C∪ is like a type of an aggregate)

Speedup results

65,536 points; 1,638 points sampled; k = 100

Food for thought

[Figure: low dimension vs. high dimension]

A thought for food…

• Choose K, L by sample learning, or take the traditional values

• Can one estimate K, L without sampling?

• Does it help to know the data dimensionality or the data manifold?

• Intuitively, the dimensionality implies the number of hash functions needed

• The catch: efficient dimensionality learning requires KNN

15:30 cookies…

Summary

• LSH trades some accuracy for a large gain in complexity

• Applications that involve massive data in high dimension require the fast performance of LSH

• Extension of LSH to different spaces (PSH)

• Learning the LSH parameters and hash functions for different applications

Conclusion

• …but at the end, everything depends on your data set

• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni (andoni@mit.edu)
– Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 27: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Recall r - Nearest Neighbor

r

(1 + ) r

dist(qp1) r

dist(qp2) (1 + ) r r2=(1 + ) r1

qq

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 28: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Locality sensitive hashing

r(1 + ) r

(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q

r2=(1 + ) r1

qq

P1P2

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 29: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l1 amp l2

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

• Works for the r-nearest-neighbor problem; generalizes to 0 < p ≤ 2

• Improves query time: from $O\!\left(d\,n^{1/(1+\epsilon)}\log n\right)$ to $O\!\left(d\,n^{1/(1+\epsilon)^2}\log n\right)$

(Latest results reported by e-mail by Alexander Andoni.)

Parameters selection

• Target: 90% success probability with the best query-time performance

• For Euclidean space

Parameters selection…

For Euclidean space:

• A single projection hits an ε-approximate nearest neighbor with Pr = p1

• The k projections of one table hit it with Pr = p1^k

• All L hashings fail to collide with Pr = (1 − p1^k)^L

• To ensure a collision with probability at least 1 − δ (e.g. 1 − δ ≥ 90%):

$1-(1-p_1^k)^L \ge 1-\delta \;\Longrightarrow\; L \ge \frac{\log(1/\delta)}{-\log\!\left(1-p_1^k\right)}$
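A tiny helper implementing this bound (Python; the value of p1 is a made-up illustration):

```python
import math

def tables_needed(p1, k, delta):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta."""
    return math.ceil(math.log(1.0 / delta) / -math.log(1.0 - p1 ** k))

p1 = 0.9    # collision probability of a single projection for a true neighbor
for k in (4, 8, 12):
    print(k, tables_needed(p1, k, delta=0.10))
# Larger k prunes more non-neighbors per table but needs more tables L to keep
# the 90% success guarantee.
```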

…Parameters selection

Reject non-neighbors / accept neighbors.

[Figure: total query time vs. k. Candidate-extraction time grows with k while candidate-verification time shrinks, so the best k sits at the trade-off point.]

Pros & Cons

Pros:

• Better query time than spatial data structures

• Scales well to higher dimensions and larger data sizes (sub-linear dependence)

• Predictable running time

Cons:

• Extra storage overhead

• Inefficient for data with distances concentrated around the average

• Works best for Hamming distance (although it can be generalized to Euclidean space)

• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)

• Requires the radius r to be fixed in advance

From Piotr Indyk's slides

Conclusion

• … but at the end, everything depends on your data set

• Try it at home:

– Visit http://web.mit.edu/andoni/www/LSH/index.html

– E-mail Alex Andoni (andoni@mit.edu)

– Test it over your own data (C code, under Red Hat Linux)

LSH – Applications

• Searching video clips in databases ("Hierarchical, Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)

• Searching image databases (see the following)

• Image segmentation (see the following)

• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)

• Texture classification (see the following)

• Clustering (see the following)

• Embedding and manifold learning (LLE and many others)

• Compression – vector quantization

• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)

• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)

• In short: whenever K-Nearest Neighbors (KNN) are needed

Motivation

• A variety of procedures in learning require KNN computation

• KNN search is a computational bottleneck

• LSH provides a fast approximate solution to the problem

• LSH requires hash-function construction and parameter tuning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, and T. Darrell)

• Finding sensitive hash functions

Mean Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, and P. Meer)

• Tuning LSH parameters

• The LSH data structure is used for algorithm speedups

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing
G. Shakhnarovich, P. Viola, and T. Darrell

Given an image x, what are the parameters θ in this image, i.e. the angles of the joints, the orientation of the body, etc.?

Ingredients

• Input: query image with unknown angles (parameters)

• Database of human poses with known angles

• Image feature extractor – an edge detector

• Distance metric in feature space: d_x

• Distance metric in angles (parameter) space, summed over the m joint angles:

$d_\theta(\theta_1,\theta_2)=\sum_{i=1}^{m}\left(1-\cos(\theta_{1,i}-\theta_{2,i})\right)$
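A direct transcription of this angle-space metric (Python; the sample angle vectors are made up):

```python
import numpy as np

def angle_distance(theta1, theta2):
    """d_theta = sum_i (1 - cos(theta1_i - theta2_i)); 0 for identical poses."""
    return float(np.sum(1.0 - np.cos(np.asarray(theta1) - np.asarray(theta2))))

pose_a = np.deg2rad([10, 45, 90])
pose_b = np.deg2rad([12, 50, 80])
print(angle_distance(pose_a, pose_b))   # small value: the poses are similar
```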

Example based learning

• Construct a database of example images with their known angles

• Given a query image, run your favorite feature extractor

• Compute the KNN from the database

• Use these KNNs to compute the average angles of the query

Input query → find the KNN in the database of examples → output: the average angles of the KNN

The algorithm flow

Input query → features extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match

The image features

Image features are multi-scale edge histograms, accumulated over image sub-windows (regions A, B, … in the figure).

PSH: The basic assumption

There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ).

We want similarity to be measured in the angles space, whereas LSH works on the feature space.

• Assumption: the feature space is closely related to the parameter space.

Insight: Manifolds

• A manifold is a space in which every point has a neighborhood resembling a Euclidean space

• But the global structure may be complicated: curved

• For example, lines are 1D manifolds, planes are 2D manifolds, etc.

[Figure: the parameters space (angles) and the feature space are related by a manifold mapping; the query q lives in feature space. Is this magic?]

Parameter Sensitive Hashing (PSH)

The trick:

Estimate the performance of different hash functions on examples, and select those that are sensitive to d_θ.

The hash functions are applied in feature space, but the KNN they return are valid in angle space.

Procedure:

• Label pairs of examples with similar angles

• Define hash functions h on the feature space

• Predict the labeling of similar / non-similar examples by using h

• Compare the labeling

• If the labeling by h is good, accept h; else change h

PSH as a classification problem

Labels (with r = 0.25): pairs of examples that are close in angle space are positive (+1), pairs that are far are negative (−1):

$y_{ij}=\begin{cases}+1 & \text{if } d_\theta(\theta_i,\theta_j)\le r\\ -1 & \text{if } d_\theta(\theta_i,\theta_j)>(1+\epsilon)\,r\end{cases}$

Feature Extraction PSH LWR

A binary hash function on the features (x here is a single feature value, T a threshold):

$h_{T}(x)=\begin{cases}+1 & \text{if } x \ge T\\ -1 & \text{otherwise}\end{cases}$

Predict the labels with h: a pair is predicted similar when both examples hash to the same value:

$\hat{y}_{ij}=\begin{cases}+1 & \text{if } h_T(x_i)=h_T(x_j)\\ -1 & \text{otherwise}\end{cases}$

[Figure: a single feature axis with the threshold T(x); h_T(x) will place both examples of a pair in the same bin or separate them.]

Find the best T that predicts the true labeling, subject to the probability constraints.
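A hedged sketch of this selection step: score candidate (feature, threshold) hash functions by how well same-bucket / different-bucket agrees with the pair labels (Python; the toy data, the scoring rule, and all names are illustrative assumptions, not the paper's exact training procedure):

```python
import numpy as np

rng = np.random.default_rng(3)

def pair_labels(thetas, pairs, r, eps):
    """+1 for pairs closer than r in angle space, -1 beyond (1+eps)*r, 0 otherwise."""
    d = np.array([np.sum(1 - np.cos(thetas[i] - thetas[j])) for i, j in pairs])
    return np.where(d <= r, 1, np.where(d > (1 + eps) * r, -1, 0))

def hash_accuracy(feature_col, T, pairs, y):
    """Fraction of labeled pairs whose same/different-bin prediction matches y."""
    side = feature_col >= T
    y_hat = np.where(side[pairs[:, 0]] == side[pairs[:, 1]], 1, -1)
    mask = y != 0
    return np.mean(y_hat[mask] == y[mask])

# toy data: 200 poses with 3 angles, 50-dim features loosely tied to the angles
thetas = rng.uniform(0, np.pi, size=(200, 3))
X = np.hstack([np.cos(thetas), rng.normal(size=(200, 47))])
pairs = rng.integers(0, 200, size=(1000, 2))
y = pair_labels(thetas, pairs, r=0.25, eps=1.0)

best = max(((f, T) for f in range(X.shape[1])
            for T in np.quantile(X[:, f], [0.25, 0.5, 0.75])),
           key=lambda ft: hash_accuracy(X[:, ft[0]], ft[1], pairs, y))
print("selected (feature, threshold):", best)
```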

Local Weighted Regression (LWR)

• Given a query image, PSH returns its KNNs

• LWR uses the KNNs to compute a weighted estimate of the query's angles: each neighbor x_i ∈ N(x) contributes with weight K(d_x(x, x_i)), a kernel of its feature-space distance to the query (distance → weight), and the output angles minimize this weighted fit.
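A minimal stand-in for this step, using a plain kernel-weighted average of the neighbors' angles rather than the paper's full locally weighted regression (Python; the kernel choice and data are illustrative, and angle wrap-around is ignored for simplicity):

```python
import numpy as np

def weighted_angle_estimate(query_feat, neigh_feats, neigh_angles, bandwidth=1.0):
    """Average the neighbors' angles, weighting each by a Gaussian kernel of its
    feature-space distance to the query (closer neighbors count more)."""
    d = np.linalg.norm(neigh_feats - query_feat, axis=1)
    w = np.exp(-(d / bandwidth) ** 2)
    return (w[:, None] * neigh_angles).sum(axis=0) / w.sum()

neigh_feats = np.array([[0.1, 0.2], [0.3, 0.1], [2.0, 2.0]])
neigh_angles = np.array([[10.0, 40.0], [12.0, 42.0], [80.0, 90.0]])
print(weighted_angle_estimate(np.array([0.2, 0.15]), neigh_feats, neigh_angles))
# ~[11, 41]: the far, inconsistent neighbor is strongly down-weighted.
```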

Results

Synthetic data were generated:

• 13 angles: 1 for the rotation of the torso, 12 for the joints

• 150,000 images

• Nuisance parameters added: clothing, illumination, face expression

• 1,775,000 example pairs

• Selected 137 out of 5,123 meaningful features (how?)

• 18-bit hash functions (k), 150 hash tables (l)

• Test on 1,000 synthetic examples

• PSH searched only 3.4% of the data per query

• Without feature selection, 40 bits and 1,000 hash tables would have been needed

Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, B is the maximum number of points in a bucket.

Results – real data

• 800 images

• Processed by a segmentation algorithm

• 13% of the data were searched

Interesting mismatches

Fast pose estimation – summary

• A fast way to compute the angles of a human body figure

• Moving from one representation space to another

• Training a sensitive hash function

• KNN smart averaging

Food for Thought

• The basic assumption may be problematic (distance metric, representations)

• The training set should be dense

• Texture and clutter

• In general, some features are more important than others and should be weighted

Food for Thought: Point Location in Different Spheres (PLDS)

• Given: n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn

• Goal: given a query q, preprocess the points in P so as to find a point pi whose sphere covers the query q

[Figure: a query q covered by the sphere of radius ri around pi.]

Courtesy of Mohamad Hegaze

Motivation

• Clustering high-dimensional data by using local density measurements (e.g. in feature space)

• Statistical curse of dimensionality: sparseness of the data

• Computational curse of dimensionality: expensive range queries

• LSH parameters should be adjusted for optimal performance

Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer

Outline

• Mean-shift in a nutshell + examples

Our scope:

• Mean-shift in high dimensions – using LSH

• Speedups:

1. Finding optimal LSH parameters

2. Data-driven partitions into buckets

3. Additional speedup by using the LSH data structure

Mean-Shift in a Nutshell

[Figure: around each point, a window of radius "bandwidth" is placed; the point is shifted to the weighted mean of the samples inside the window, and the step is iterated until it converges to a mode of the density.]

KNN in mean-shift

The bandwidth should be inversely proportional to the density in the region:

• high density – small bandwidth

• low density – large bandwidth

It is based on the kth nearest neighbor of the point: the bandwidth is the distance from the point to its kth nearest neighbor.

[Figure: adaptive mean-shift vs. non-adaptive.]
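A compact sketch of adaptive mean-shift under these assumptions (Python; flat kernel, L2 distances, per-point bandwidth set to the kth-nearest-neighbor distance; illustrative only):

```python
import numpy as np

rng = np.random.default_rng(4)

def knn_bandwidths(X, k):
    """Per-point bandwidth = distance to the k-th nearest neighbor."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.sort(D, axis=1)[:, k]

def mean_shift_modes(X, k=10, iters=30):
    """Adaptive mean-shift with a flat kernel: move each point to the mean of
    the samples inside its bandwidth until it settles near a mode."""
    h = knn_bandwidths(X, k)
    Y = X.copy()
    for _ in range(iters):
        for i in range(len(Y)):
            near = X[np.linalg.norm(X - Y[i], axis=1) <= h[i]]
            if len(near):
                Y[i] = near.mean(axis=0)
    return Y

X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
modes = mean_shift_modes(X)
print(np.round(modes[:3], 2), np.round(modes[-3:], 2))  # points collapse near the two cluster centers
```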


Image segmentation algorithm

1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)

2. Resolution controlled by the bandwidths hs (spatial) and hr (color)

3. Apply filtering

[Figure: original image, filtered image, segmented image. Filtering replaces each pixel value by the value of its nearest mode, found by following the mean-shift trajectories.]

[Figures: filtering examples (original squirrel / filtered, original baboon / filtered) and segmentation examples.]

Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02

Mean-shift in high dimensions

• Computational curse of dimensionality: expensive range queries, implemented with LSH

• Statistical curse of dimensionality: sparseness of the data, handled with a variable bandwidth

LSH-based data structure

• Choose L random partitions; each partition includes K pairs (dk, vk)

• For each point x, check whether x_{d_k} < v_k for each of the K pairs; the K boolean results index the point's cell

• This partitions the data into cells
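A small sketch of this structure (Python; the cut coordinates and values are drawn at random, and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

def make_partition(X, K):
    """One partition: K (coordinate, cut-value) pairs; a point's cell is the
    K-bit pattern of 'is x[d_k] < v_k' tests."""
    dims = rng.integers(0, X.shape[1], size=K)
    cuts = np.array([rng.uniform(X[:, d].min(), X[:, d].max()) for d in dims])
    return dims, cuts

def cell_of(x, dims, cuts):
    return tuple((x[dims] < cuts).astype(int))

X = rng.uniform(0, 1, size=(2000, 6))
partitions = [make_partition(X, K=8) for _ in range(4)]   # L = 4 partitions
tables = []
for dims, cuts in partitions:
    t = {}
    for i, x in enumerate(X):
        t.setdefault(cell_of(x, dims, cuts), []).append(i)
    tables.append(t)

q = X[0]
candidates = set()
for (dims, cuts), t in zip(partitions, tables):
    candidates.update(t.get(cell_of(q, dims, cuts), []))
print(len(candidates))   # neighbor candidates: the union of q's L cells
```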

Choosing the optimal K and L

• For a query q, we want to compute the smallest number of distances to points in its buckets

• Large K: a smaller number of points falls in each cell

• If L is too small, neighbor points might be missed; but if L is too big, extra (non-neighbor) points might be included

• The candidate set is the union cell C∪ = ∪_l C_l of the L cells containing q; as L increases, C∪ grows and the chance of missing a neighbor decreases, but more candidate distances must be computed

• K determines the resolution of the data structure

Choosing optimal K and L

• Determine accurately the KNN distance (the bandwidth) for m randomly-selected data points

• Choose an error threshold ε

• The optimal K and L should satisfy: the approximate (LSH-based) distance stays within the ε threshold of the true KNN distance

• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K)

• Minimize the running time t(K, L(K))

[Figure panels: approximation error for (K, L); the minimal L(K) for ε = 0.05; running time t[K, L(K)], whose minimum gives the selected parameters.]
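A rough sketch of this tuning loop under simplifying assumptions (Python; brute-force true neighbors on a small sample, the error measure, the candidate K values, and all other choices are illustrative stand-ins for the paper's procedure):

```python
import numpy as np, time

rng = np.random.default_rng(6)
X = rng.uniform(0, 1, size=(1500, 8))
m, k_nn, eps = 20, 10, 0.05
sample = X[rng.choice(len(X), size=m, replace=False)]

def build(K, L):
    parts = []
    for _ in range(L):
        dims = rng.integers(0, X.shape[1], size=K)
        cuts = rng.uniform(X[:, dims].min(axis=0), X[:, dims].max(axis=0))
        table = {}
        for i, x in enumerate(X):
            table.setdefault(tuple(x[dims] < cuts), []).append(i)
        parts.append((dims, cuts, table))
    return parts

def knn_dist(q, parts=None):
    if parts is None:                                  # exact: scan everything
        cand = np.arange(len(X))
    else:                                              # LSH candidates only
        c = set()
        for dims, cuts, table in parts:
            c.update(table.get(tuple(q[dims] < cuts), []))
        cand = np.fromiter(c, dtype=int)
    d = np.sort(np.linalg.norm(X[cand] - q, axis=1))
    return d[k_nn] if len(d) > k_nn else np.inf

exact = np.array([knn_dist(q) for q in sample])
best = None
for K in (4, 6, 8, 10):
    for L in range(1, 40):
        parts = build(K, L)
        t0 = time.perf_counter()
        approx = np.array([knn_dist(q, parts) for q in sample])
        t = time.perf_counter() - t0
        if np.all(approx <= (1 + eps) * exact):        # error constraint met -> this L is L(K)
            if best is None or t < best[0]:
                best = (t, K, L)
            break
print("chosen (time, K, L):", best)
```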

Data driven partitions

• In the original LSH, cut values are drawn at random from the range of the data

• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value

[Figure: bucket-occupancy distribution for uniform vs. data-driven cut points; data-driven cuts give more evenly loaded buckets.]
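The change is essentially one line in the partition construction from the earlier sketch; a hedged version (Python):

```python
import numpy as np

rng = np.random.default_rng(7)

def data_driven_cuts(X, K):
    """Pick K (coordinate, value) cuts where each value is the coordinate of a
    randomly chosen data point, so dense regions get cut more finely."""
    dims = rng.integers(0, X.shape[1], size=K)
    rows = rng.integers(0, X.shape[0], size=K)
    cuts = X[rows, dims]            # instead of uniform values over the data range
    return dims, cuts

X = np.vstack([rng.normal(0, 0.1, (900, 4)), rng.uniform(-3, 3, (100, 4))])
print(data_driven_cuts(X, K=6))
```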

Additional speedup

Assume that all the points in C∪ will converge to the same mode (C∪ acts like a type of aggregate): once the mode of one point in the cell is found, it can be assigned to the other points of the cell as well.

Speedup results

[Table: 65,536 points, 1,638 points sampled, k = 100.]

Food for thought

[Figure: low dimension vs. high dimension.]

A thought for food…

• Choose K, L by sample learning, or take the traditional values

• Can one estimate K, L without sampling?

• Does it help to know the data dimensionality or the data manifold?

• Intuitively, the dimensionality implies the number of hash functions needed

• The catch: efficient dimensionality learning itself requires KNN

15:30 – cookies…

Summary

• LSH trades some accuracy for a large gain in complexity

• Applications that involve massive data in high dimensions require LSH's fast performance

• Extensions of LSH to different spaces (PSH)

• Learning the LSH parameters and hash functions for different applications

Conclusion

• … but at the end, everything depends on your data set

• Try it at home:

– Visit http://web.mit.edu/andoni/www/LSH/index.html

– E-mail Alex Andoni (andoni@mit.edu)

– Test it over your own data (C code, under Red Hat Linux)

Thanks

• Ilan Shimshoni (Haifa)

• Mohamad Hegaze (Weizmann)

• Alex Andoni (MIT)

• Mica and Denis

Page 30: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Hamming Space

bullHamming space = 2N binary strings

bullHamming distance = changed digits

aka Signal distanceRichard Hamming

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 31: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Hamming SpaceN

010100001111

010100001111

010010000011Distance = 4

bullHamming space

bullHamming distance

SUM(X1 XOR X2)

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 32: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

L1 to Hamming Space Embedding

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – an edge detector
• Distance metric in feature space: d_x
• Distance metric in angle space: d_θ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))
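A one-function sketch (ours) of the angle-space metric just defined:

    import numpy as np

    def d_theta(theta1, theta2):
        """Pose distance: sum_i (1 - cos(theta1_i - theta2_i)), angles in radians."""
        t1, t2 = np.asarray(theta1), np.asarray(theta2)
        return float(np.sum(1.0 - np.cos(t1 - t2)))

    print(d_theta([0.0, 0.5], [0.1, 0.5]))  # small value for two similar poses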

Example based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query

Input: query → find KNN in the database of examples → output: average angles of the KNN

The algorithm flow
Input query → feature extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match

The image features
Image features are multi-scale edge histograms.
[Figure: edge counts over image sub-regions A and B at several scales]

PSH: The basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ).
We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.

Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.

[Diagram: a pose in the parameter space (angles) and its image in the feature space; a query q is mapped between the two]
Is this magic?

Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those that are sensitive to d_θ.
The hash functions are applied in feature space, but the KNN are valid in angle space.

Label pairs of examples with similar angles.
Define hash functions h on the feature space.
Predict the labeling of similar/non-similar examples by using h.
Compare the labelings.
If the labeling by h is good, accept h; else change h.

PSH as a classification problem
Labels (with r = 0.25): a pair of examples (x_i, θ_i), (x_j, θ_j) is labeled
  y_ij = +1 if d_θ(θ_i, θ_j) < r
  y_ij = −1 if d_θ(θ_i, θ_j) > (1 + ε)·r

A binary hash function on features:
  h_T(x) = +1 if the selected feature of x is ≥ T, −1 otherwise

Predict the labels:
  ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise

Find the best threshold T that predicts the true labeling subject to the probability constraints; h_T will place both examples of a pair in the same bin or separate them.
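A rough sketch of this selection step (ours; the real criterion in the paper uses precision/recall-style probability constraints, here replaced by simple agreement with the pair labels). X is an (n, d) matrix of feature vectors, pairs is a list of index pairs, and labels holds the ±1 similarity labels defined above; all names are illustrative.

    import numpy as np

    def score_hash(feature_col, T, pairs, labels):
        """Fraction of labeled pairs predicted correctly by h_T(x) = +1 iff x >= T."""
        bins = feature_col >= T
        correct = sum((1 if bins[i] == bins[j] else -1) == y
                      for (i, j), y in zip(pairs, labels))
        return correct / len(labels)

    def select_hashes(X, pairs, labels, n_keep=18):
        """Try a few thresholds per feature and keep the most label-consistent hashes."""
        scored = []
        for f in range(X.shape[1]):
            for T in np.quantile(X[:, f], [0.25, 0.5, 0.75]):
                scored.append((score_hash(X[:, f], T, pairs, labels), f, T))
        scored.sort(key=lambda s: s[0], reverse=True)
        return [(f, T) for _, f, T in scored[:n_keep]]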

Local Weighted Regression (LWR)
• Given a query image, PSH returns KNNs.
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:
  θ̂ = argmin_β Σ_{x_i ∈ N(x)} d_θ(g(x_i; β), θ_i) · K(d_x(x_i, x))
  (each neighbor is weighted by a kernel K of its feature-space distance to the query)
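A simplified stand-in for the LWR step (ours): instead of fitting a local regression model, it computes a kernel-weighted circular mean of the neighbors' angles, with weights driven by feature-space distance to the query.

    import numpy as np

    def average_neighbor_angles(query_feat, neighbor_feats, neighbor_angles, h):
        """neighbor_feats: (k, d); neighbor_angles: (k, m) in radians; h: kernel bandwidth."""
        d2 = np.sum((neighbor_feats - query_feat) ** 2, axis=1)
        w = np.exp(-d2 / (2.0 * h ** 2))           # closer neighbors in feature space count more
        w /= w.sum()
        s = (w[:, None] * np.sin(neighbor_angles)).sum(axis=0)
        c = (w[:, None] * np.cos(neighbor_angles)).sum(axis=0)
        return np.arctan2(s, c)                     # per-joint weighted circular mean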

Results
Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without feature selection, 40 bits and 1,000 hash tables were needed

Recall: p1 is the probability of a positive hash, p2 is the probability of a bad hash, and B is the maximum number of points in a bucket.

Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched

Results – real data
Interesting mismatches

Fast pose estimation - summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging

Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted

Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d centered at P = {p1, …, pn} with radii r1, …, rn
• Goal: given a query q, preprocess the points in P so as to find a point p_i whose sphere 'covers' the query q

Courtesy of Mohamad Hegaze

Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance

Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer

Outline
• Mean-shift in a nutshell + examples

Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure

Mean-Shift in a Nutshell
[Figure: a window of radius 'bandwidth' around a point shifts toward the local mean]

KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth.
It is based on the kth nearest neighbor of the point: the bandwidth is the distance to that neighbor, h_i = ||x_i − x_{i,k}||.
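For concreteness, a minimal adaptive mean-shift sketch under these assumptions (flat kernel, per-point bandwidth equal to the distance to the kth neighbor); the neighbor queries that the paper answers with LSH are done here by brute force, purely for illustration.

    import numpy as np

    def knn_bandwidths(X, k):
        """Per-point bandwidth: distance to the k-th nearest neighbor (brute force)."""
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        return np.sort(d, axis=1)[:, k]

    def shift_to_mode(x, X, h, tol=1e-5, max_iter=100):
        """Move x uphill in density: repeatedly average the points within radius h."""
        for _ in range(max_iter):
            window = X[np.linalg.norm(X - x, axis=1) <= h]
            new_x = window.mean(axis=0)
            if np.linalg.norm(new_x - x) < tol:
                break
            x = new_x
        return x

    X = np.random.default_rng(1).normal(size=(500, 5))
    h = knn_bandwidths(X, k=100)
    mode = shift_to_mode(X[0].copy(), X, h[0])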

Adaptive mean-shift vs non-adaptive


Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution is controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering

(Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02)

Image segmentation algorithm
[Figures: original, filtered, and segmented images; mean-shift trajectories]
Filtering: each pixel takes the value of its nearest mode.

Filtering examples
[Figures: original squirrel / filtered; original baboon / filtered]

Segmentation examples
(Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02)

Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth

LSH-based data structure
• Choose L random partitions; each partition consists of K pairs (d_k, v_k).
• For each point x we check whether x_{d_k} ≤ v_k for every pair; the K outcomes determine its cell, so each partition splits the data into cells.
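A sketch (ours) of this structure: each of the L partitions is a list of K (dimension, cut-value) pairs, and a point's cell in that partition is the tuple of boolean outcomes of x[d_k] <= v_k, used as a hash-table key. All names are illustrative.

    import numpy as np
    rng = np.random.default_rng(0)

    def make_partition(X, K):
        """K random (dimension, cut-value) pairs; cuts drawn over the data range."""
        dims = rng.integers(0, X.shape[1], size=K)
        cuts = np.array([rng.uniform(X[:, d].min(), X[:, d].max()) for d in dims])
        return dims, cuts

    def cell_key(x, partition):
        dims, cuts = partition
        return tuple(bool(b) for b in (x[dims] <= cuts))   # K booleans identify the cell

    def build_tables(X, K, L):
        tables = []
        for _ in range(L):
            part = make_partition(X, K)
            table = {}
            for i, x in enumerate(X):
                table.setdefault(cell_key(x, part), []).append(i)
            tables.append((part, table))
        return tables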

Choosing the optimal K and L
• Goal: for a query q, compute as few distances as possible to the points in its buckets while still finding its neighbors.

• Large K → a smaller number of points in a cell C
• If L is too small, points might be missed; but if L is too big, extra points might be included
• As L increases, the union region C∪ increases but the intersection region C∩ decreases; C∩ determines the resolution of the data structure

Choosing optimal K and L
• Determine accurately the KNN for m randomly selected data points; record the kNN distance (the bandwidth) of each.
• Choose an error threshold ε.
• The optimal K and L should satisfy: the approximate distance returned by the LSH structure is within (1 + ε) of the true kNN distance.

Choosing optimal K and L
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
[Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] and its minimum]
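Schematically, the tuning loop above looks as follows (a sketch under our assumptions: approx_knn_dist and query_time are caller-supplied callables standing in for an LSH query and a timing model, and true_dists holds the exact kNN distances of the m sampled points; none of these names come from the paper).

    def choose_K_L(samples, true_dists, K_values, L_values, eps, approx_knn_dist, query_time):
        """Cheapest (K, L) whose approximate kNN distance stays within (1 + eps)
        of the exact distance on every sampled point."""
        best = None
        for K in K_values:
            for L in sorted(L_values):                  # look for the minimal acceptable L for this K
                ok = all(approx_knn_dist(q, K, L) <= (1.0 + eps) * d
                         for q, d in zip(samples, true_dists))
                if ok:
                    t = query_time(K, L)
                    if best is None or t < best[0]:
                        best = (t, K, L)
                    break                               # this is L(K); larger L only costs more time
        return best                                     # (time, K, L) or None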

Data driven partitions
• In the original LSH, cut values are chosen uniformly at random over the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
[Plot: points-per-bucket distribution, uniform vs. data-driven cuts]
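The change is a one-liner; a sketch (ours) contrasting the two ways of drawing a cut value for a chosen dimension d:

    import numpy as np
    rng = np.random.default_rng(0)

    def uniform_cut(X, d):
        # original LSH: cut value uniform over the range of the data in dimension d
        return rng.uniform(X[:, d].min(), X[:, d].max())

    def data_driven_cut(X, d):
        # suggestion: the d-th coordinate of a randomly chosen data point,
        # so dense regions get proportionally more cuts (more even bucket occupancy)
        return X[rng.integers(len(X)), d]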

Additional speedup
• Assume that all points in C∩ will converge to the same mode (C∩ acts like a type of aggregate), so one mean-shift trajectory can serve the whole cell.

Speedup results
65,536 points; 1,638 points sampled; k = 100

Food for thought
[Figures: low dimension vs. high dimension]

A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN…

15:30 cookies…

Summary
• LSH trades some accuracy for a substantial gain in complexity
• Applications that involve massive data in high dimensions require LSH's fast performance
• Extension of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications

Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code, under Red Hat Linux)

Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naïve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree – Pitfall 1
  • Slide 16
  • Quadtree – Pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros & Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x, what are the parameters θ in this image, i.e. angles of joints, orientation of the body, etc.
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results – real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 33: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Hash function

Lj Hash function

p Hdrsquoisin

Gj(p)=p|Ij

j=1L k=3 digits

Bits sampling from p

Store p into bucket p|Ij 2k buckets101

11000000000 111111110000 111000000000 111111110001

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 34: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Construction

1 2 L

p

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 35: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Query

1 2 L

q

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivation

• Clustering high-dimensional data by using local density measurements (e.g., in a feature space)

• Statistical curse of dimensionality: sparseness of the data

• Computational curse of dimensionality: expensive range queries

• LSH parameters should be adjusted for optimal performance

Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example

B. Georgescu, I. Shimshoni, and P. Meer

Outline

• Mean-shift in a nutshell + examples

Our scope:

• Mean-shift in high dimensions – using LSH

• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure

Mean-Shift in a Nutshell

[Figure: a kernel window of a given bandwidth centered at a point, shifted toward the local mean]

KNN in mean-shift

The bandwidth should be inversely proportional to the density in the region:

high density → small bandwidth; low density → large bandwidth.

Based on the kth nearest neighbor of the point, the bandwidth is h_i = ‖x_i − x_{i,k}‖, the distance from x_i to its kth nearest neighbor.

Adaptive mean-shift vs non-adaptive
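A small sketch of adaptive mean-shift with the bandwidth taken from the kth nearest neighbor, assuming a flat kernel and brute-force neighbor search (which is exactly the expensive part that the LSH structure below replaces); the function names and defaults are illustrative.

```python
import numpy as np

def adaptive_bandwidths(X, k):
    """h_i = distance from x_i to its k-th nearest neighbor (brute force, for clarity)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return np.sort(D, axis=1)[:, k]                   # column 0 is the point itself

def adaptive_mean_shift_mode(x, X, h, n_iter=100, tol=1e-6):
    """Variable-bandwidth mean-shift with a flat kernel: x moves to the weighted mean
    of the points whose own window (radius h_i) covers it, until convergence."""
    d = X.shape[1]
    for _ in range(n_iter):
        inside = np.linalg.norm(X - x, axis=1) <= h   # each point uses its own bandwidth
        if not inside.any():
            return x
        w = 1.0 / h[inside] ** (d + 2)                # sample-point estimator weights
        x_new = (w[:, None] * X[inside]).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

Typical use: compute h = adaptive_bandwidths(X, k) once, then run adaptive_mean_shift_mode(X[i].copy(), X, h) from every point; points converging to the same location belong to the same cluster.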


Image segmentation algorithm:
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering
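A sketch of the filtering step on the joint spatial-range features; for clarity it uses a fixed bandwidth pair (h_s, h_r) and an exhaustive window query per pixel, whereas the paper answers these queries with the LSH structure and adapts the bandwidth. The function name and iteration limits are assumptions.

```python
import numpy as np

def mean_shift_filter(image, hs, hr, n_iter=20):
    """Each pixel is replaced by the range (color/gray) value of the mode reached
    by mean-shift from its joint spatial-range feature vector."""
    H, W = image.shape[:2]
    yy, xx = np.mgrid[0:H, 0:W]
    spatial = np.stack([xx, yy], axis=-1).reshape(-1, 2).astype(float) / hs
    range_part = image.reshape(H * W, -1).astype(float) / hr
    feats = np.concatenate([spatial, range_part], axis=1)      # unit bandwidth after scaling
    out = np.empty_like(feats)
    for i in range(len(feats)):
        x = feats[i]
        for _ in range(n_iter):
            inside = np.linalg.norm(feats - x, axis=1) <= 1.0  # window query (LSH in the paper)
            x_new = feats[inside].mean(axis=0)
            if np.allclose(x_new, x):
                break
            x = x_new
        out[i] = x
    return (out[:, 2:] * hr).reshape(image.shape)              # keep only the filtered range part
```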


[Figure: the 3D feature space]

Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu and P. Meer, TPAMI '02

Image segmentation algorithm

[Figure: original, filtered, and segmented images]

Filtering: pixel value of the nearest mode.

Mean-shift trajectories


[Figures: original and filtered images – squirrel, baboon]

Filtering examples

Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu and P. Meer, TPAMI '02

Segmentation examples

Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu and P. Meer, TPAMI '02

Mean-shift in high dimensions

• Computational curse of dimensionality: expensive range queries → implemented with LSH

• Statistical curse of dimensionality: sparseness of the data → variable bandwidth


LSH-based data structure

• Choose L random partitions. Each partition includes K pairs (d_k, v_k).

• For each point we check, for each of the K pairs, whether x_{d_k} ≤ v_k; the resulting K boolean values select the point's cell.

• This partitions the data into cells.
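A compact sketch of such a structure, assuming axis-parallel cuts stored as (dimension, value) pairs; the class name and its parameters are illustrative. The data_driven flag anticipates the cut-value choice suggested in the 'Data driven partitions' slide below.

```python
import numpy as np

class CutLSH:
    """L partitions; each partition is K (dimension, cut value) pairs.
    The K boolean tests x[d_k] <= v_k select a point's cell within a partition."""
    def __init__(self, X, K, L, data_driven=True, rng=None):
        rng = rng or np.random.default_rng(0)
        n, d = X.shape
        self.X, self.cuts, self.tables = X, [], []
        for _ in range(L):
            dims = rng.integers(0, d, size=K)
            if data_driven:            # cut at a coordinate of a randomly chosen data point
                vals = X[rng.integers(0, n, size=K), dims]
            else:                      # cut uniformly at random inside the data range
                vals = rng.uniform(X.min(axis=0)[dims], X.max(axis=0)[dims])
            table = {}
            for i, x in enumerate(X):
                table.setdefault(self._key(x, dims, vals), []).append(i)
            self.cuts.append((dims, vals))
            self.tables.append(table)

    @staticmethod
    def _key(x, dims, vals):
        return tuple(x[dims] <= vals)

    def neighborhood(self, q):
        """Union of the query's cells over the L partitions (the candidate neighbors)."""
        idx = set()
        for (dims, vals), table in zip(self.cuts, self.tables):
            idx.update(table.get(self._key(q, dims, vals), []))
        return np.fromiter(idx, dtype=int)
```

The adaptive bandwidth of the previous sketch would then be computed from distances to the points in neighborhood(x_i) only, instead of the whole data set.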


Choosing the optimal K and L

• For a query q, the cost is the number of distance computations to the points in its buckets; we want this number to be as small as possible.


• Large K → a smaller number of points in a cell C.

• If L is too small, points might be missed; but if L is too big, extra points might be included.

• The expected number of points in a cell drops quickly as K grows, and the number of points in the union C̄ of a query's L cells is at most L times that.

• As L increases, C̄ increases: fewer neighbors are missed, but more extra points are included.

• K and L determine the resolution of the data structure.

Choosing optimal K and L

• Determine accurately the KNN for m randomly-selected data points; record the kth-neighbor distance (the bandwidth) for each.

• Choose an error threshold ε.

• The optimal K and L should satisfy that the approximate distance (obtained from the LSH neighborhood) is within a factor (1 + ε) of the true distance.


Choosing optimal K and L

• For each K, estimate the error (for all L in one run).

• Find the minimal L satisfying the constraint: L(K).

• Minimize the running time t(K, L(K)).

[Plots: approximation error as a function of (K, L); L(K) for ε = 0.05; running time t[K, L(K)], with its minimum marked]
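A sketch of this tuning loop, reusing the CutLSH sketch above. In the paper the errors for all L are obtained in a single run; this illustration simply rebuilds the structure for every (K, L), and the sample size m, the threshold eps, and the timing method are assumptions.

```python
import time
import numpy as np

def tune_K_L(X, k, K_values, L_max, eps=0.05, m=50, rng=None):
    """For each K, find the smallest L whose approximate k-NN distance stays within
    a relative error eps of the true one on m sample points; keep the fastest (K, L)."""
    rng = rng or np.random.default_rng(0)
    sample = rng.choice(len(X), size=m, replace=False)
    D = np.linalg.norm(X[sample][:, None, :] - X[None, :, :], axis=2)
    d_true = np.sort(D, axis=1)[:, k]                  # exact k-NN distances (bandwidths)
    best = None
    for K in K_values:
        for L in range(1, L_max + 1):
            lsh = CutLSH(X, K, L)
            t0 = time.perf_counter()
            d_appr = []
            for q in X[sample]:
                dist = np.linalg.norm(X[lsh.neighborhood(q)] - q, axis=1)
                d_appr.append(np.sort(dist)[k] if len(dist) > k else np.inf)
            t = time.perf_counter() - t0
            err = np.mean(np.abs(np.array(d_appr) - d_true) / d_true)
            if err <= eps:                             # minimal L for this K found
                if best is None or t < best[2]:
                    best = (K, L, t)
                break
    return best                                        # (K, L, measured query time)
```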

Data driven partitions

• In the original LSH, cut values are random in the range of the data.

• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.

[Histograms: distribution of points per bucket – uniform cuts vs. data-driven cuts]

Additional speedup

Assume that all points in C̄ will converge to the same mode (C̄ acts like a type of aggregate).
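A simplified sketch of this speedup, reusing the adaptive_mean_shift_mode and CutLSH sketches above: once a trajectory converges, every point falling in the LSH neighborhood of the mode is labeled without starting mean-shift from it (the paper applies the idea along the whole trajectory). The merging tolerance is an assumption.

```python
import numpy as np

def cluster_with_cell_speedup(X, lsh, h, mode_tol=1e-2):
    """Run mean-shift only from still-unlabeled points; when a trajectory converges,
    assign its mode to every point in the LSH neighborhood of that mode."""
    labels = np.full(len(X), -1)
    modes = []
    for i in range(len(X)):
        if labels[i] != -1:
            continue                                   # already swept up by an earlier mode
        mode = adaptive_mean_shift_mode(X[i].copy(), X, h)
        for j, m in enumerate(modes):                  # merge modes that are very close
            if np.linalg.norm(m - mode) < mode_tol:
                mode_id = j
                break
        else:
            modes.append(mode)
            mode_id = len(modes) - 1
        labels[lsh.neighborhood(mode)] = mode_id       # the speedup: label the whole cell union
        labels[i] = mode_id
    return labels, modes
```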

Speedup results

65,536 points; 1,638 points sampled; k = 100

Food for thought

Low dimension vs. high dimension

A thought for food…

• Choose K, L by sample learning, or take the traditional values?

• Can one estimate K, L without sampling?

• Does it help to know the data dimensionality or the data manifold?

• Intuitively, the dimensionality implies the number of hash functions needed.

• The catch: efficient dimensionality learning requires KNN.

15:30: cookies…

Summary

• LSH trades some accuracy for a gain in complexity.

• Applications that involve massive data in high dimensions require LSH's fast performance.

• Extension of LSH to different spaces (PSH).

• Learning the LSH parameters and hash functions for different applications.

Conclusion

• ...but at the end, everything depends on your data set.

• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code under Red Hat Linux)

Thanks

• Ilan Shimshoni (Haifa)

• Mohamad Hegaze (Weizmann)

• Alex Andoni (MIT)

• Mica and Denis

Page 36: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Alternative intuition random projections

p

8

C=11

1111111100011000000000

2

1111111100011000000000

drsquo=Cd

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 37: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 38: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Alternative intuition random projections

8

C=11

1111111100011000000000

2

1111111100011000000000

p

Alternative intuition random projections

101

11000000000 111111110000 111000000000 111111110001

000

100

110

001

101

111

2233 BucketsBucketsp

k samplings

Repeating

Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis





Repeating L times

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

• …but at the end, everything depends on your data set

• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – E-mail Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code, under Red Hat Linux)

LSH - Applications

• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever k-Nearest Neighbors (KNN) are needed

Motivation

• A variety of procedures in learning require KNN computation

• KNN search is a computational bottleneck

• LSH provides a fast approximate solution to the problem

• LSH requires hash-function construction and parameter tuning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola and T. Darrell)
• Finding sensitive hash functions

Mean Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni and P. Meer)
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups

The Problem
Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola and T. Darrell)

Given an image x, what are the parameters θ_i in this image, i.e. the angles of the joints, the orientation of the body, etc.?

Ingredients

• Input: a query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: d_x
• Distance metric in angles space:
  d_\theta(\theta^1, \theta^2) = \sum_{i=1}^{m} \big(1 - \cos(\theta^1_i - \theta^2_i)\big)

Example based learning

• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query

Input: query → find the KNN in the database of examples → output: average angles of the KNN

The algorithm flow:
Input query → features extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match
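A schematic sketch of this flow (the callables extract_features and psh_knn below are placeholders, not the paper's API):

import numpy as np

def estimate_pose(query_image, extract_features, psh_knn, k=12):
    x = extract_features(query_image)       # multi-scale edge-histogram features
    neighbors = psh_knn(x, k)               # approximate KNN of x in the example database, via PSH
    thetas = np.array([theta for _, theta in neighbors])
    return thetas.mean(axis=0)              # average the known angles of the neighbors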

The image features

Image features are multi-scale edge histograms (figure: edge counts over image sub-windows A and B at several scales).

Feature Extraction PSH LWR

PSH: The basic assumption

There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ).
We want similarity to be measured in the angles space, whereas LSH works on the feature space.

• Assumption: the feature space is closely related to the parameter space.

Feature Extraction PSH LWR

Insight Manifolds

• A manifold is a space in which every point has a neighborhood resembling a Euclidean space

• But the global structure may be complicated: curved

• For example, lines are 1D manifolds, planes are 2D manifolds, etc.

Feature Extraction PSH LWR

(figure: the parameters space (angles) and the feature space, with a query q mapped between them)

Is this magic?

Parameter Sensitive Hashing (PSH)

The trick:

Estimate the performance of different hash functions on examples, and select those that are sensitive to d_θ.
The hash functions are applied in feature space, but the KNN are valid in angle space.

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If the labeling by h is good, accept h; else change h

PSH as a classification problem

(example pairs labeled +1, +1, -1, -1; here r = 0.25)

A pair of examples (x_i, x_j) is labeled

y_{ij} = +1  if d_\theta(\theta_i, \theta_j) \le r
y_{ij} = -1  if d_\theta(\theta_i, \theta_j) > (1 + \epsilon)\, r

Feature Extraction PSH LWR

A binary hash function on the features:

h_T(x) = +1 if the feature value is ≥ T, and -1 otherwise

Predict the labels:

\hat{y}_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), and -1 otherwise

h_T will place both examples in the same bin, or separate them.
Find the best T that predicts the true labeling, subject to the probability constraints.
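A sketch of how one such threshold hash could be scored on the labeled pairs above (my own illustration; the paper's actual selection procedure is more involved):

import numpy as np

def stump_agreement(xi_feature, xj_feature, y, T):
    # y: +1 for pairs with similar angles, -1 otherwise, as defined above
    hi = np.where(xi_feature >= T, 1, -1)
    hj = np.where(xj_feature >= T, 1, -1)
    y_hat = np.where(hi == hj, 1, -1)
    return float(np.mean(y_hat == y))        # fraction of pairs whose label h_T predicts correctly

def best_threshold(xi_feature, xj_feature, y, candidates):
    # scan candidate thresholds for one feature and keep the best-scoring one
    return max(candidates, key=lambda T: stump_agreement(xi_feature, xj_feature, y, T))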

Local Weighted Regression (LWR)

• Given a query image, PSH returns its KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query (distance → weight):

\beta_0 = \arg\min_{\beta} \sum_{x_i \in N(x_0)} d_\theta\big(g(x_i; \beta), \theta_i\big)\, K\big(d_x(x_i, x_0)\big)

where the kernel K turns the feature-space distance into a weight.
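A zeroth-order simplification of this step, as a sketch (a kernel-weighted average of the neighbors' angles; the paper fits a local regression model g(x; β), which is omitted here):

import numpy as np

def weighted_angle_average(x0, neighbor_feats, neighbor_thetas, h):
    # neighbor_feats: (n, d) feature vectors, neighbor_thetas: (n, m) angle vectors
    dists = np.linalg.norm(neighbor_feats - x0, axis=1)   # feature-space distances d_x
    w = np.exp(-(dists / h) ** 2)                         # kernel K: distance -> weight
    return (w[:, None] * neighbor_thetas).sum(axis=0) / w.sum()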

Feature Extraction PSH LWR

Results

Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without feature selection, 40 bits and 1,000 hash tables would have been needed

Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, and B is the maximum number of points in a bucket.

Results – real data

• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched

Results – real data: interesting mismatches

Fast pose estimation - summary

• A fast way to compute the angles of a human body figure

• Moving from one representation space to another

• Training a sensitive hash function

• KNN smart averaging

Food for Thought

• The basic assumption may be problematic (distance metric, representations)

• The training set should be dense

• Texture and clutter

• In general, some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

• Given n spheres in R^d, centered at P = {p_1, …, p_n}, with radii r_1, …, r_n

• Goal: given a query q, preprocess the points in P so as to find a point p_i whose sphere 'covers' the query q

(figure: the query q inside the sphere of radius r_i around p_i)

Courtesy of Mohamad Hegaze

Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

• Mean-shift in a nutshell + examples

Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure

Mean-Shift in a Nutshell
(figure: a bandwidth window centered on a point)

KNN in mean-shift

The bandwidth should be inversely proportional to the density in the region:
high density → small bandwidth; low density → large bandwidth.
It is based on the k-th nearest neighbor of the point: the bandwidth is the distance from the point to its k-th nearest neighbor.
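An illustrative sketch of this adaptive choice (per-point bandwidth taken as the distance to the k-th nearest neighbor) together with one flat-kernel mean-shift step; brute-force distances, for clarity only:

import numpy as np

def adaptive_bandwidths(X, k):
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # all pairwise distances
    D.sort(axis=1)
    return D[:, k]                                               # distance to the k-th neighbor

def mean_shift_step(y, X, h):
    inside = np.linalg.norm(X - y, axis=1) <= h                  # points inside the window
    return X[inside].mean(axis=0)                                # move the center to their mean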

Adaptive mean-shift vs non-adaptive


Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering

(figure: 3D feature space)

Mean-shift: A Robust Approach Towards Feature Space Analysis, D. Comaniciu et al., TPAMI '02

Image segmentation algorithm (figure: original, filtered, segmented)

Filtering: pixel value of the nearest mode

Mean-shift trajectories

Filtering examples (figures: original vs. filtered squirrel and baboon images)

Segmentation examples

Mean-shift: A Robust Approach Towards Feature Space Analysis, D. Comaniciu et al., TPAMI '02

Mean-shift in high dimensions

Computational curse of dimensionality: expensive range queries → implemented with LSH

Statistical curse of dimensionality: sparseness of the data → variable bandwidth

LSH-based data structure

• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point, check on which side of the cut value v_k its coordinate x_{d_k} falls, for each of the K pairs
• This partitions the data into cells

Choosing the optimal K and L

• For a query q, we want to compute the smallest possible number of distances: only to the points in its buckets


• If L is too small, points might be missed; but if L is too big, the union of cells C̄ might include extra points
• Large K → a smaller number of points in each cell C_l
(slide formulas: bounds relating N_{C_l} and N_{C̄} to n, K, d and L, e.g. N_{C̄} ≤ L · N_{C_l})
• K determines the resolution of the data structure
• As L increases, C̄ increases, but the number of missed points decreases

Choosing optimal K and L

• Determine accurately the KNN (and hence the bandwidth distance) for m randomly-selected data points
• Choose an error threshold ε
• The optimal K and L should satisfy: the approximate (LSH-based) distance stays within the error threshold ε of the true distance

Choosing optimal K and L
• For each K, estimate the error
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))

(figures: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)], choose the minimum)
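A schematic version of this tuning loop (approx_error and query_time are assumed helpers supplied by the caller, not functions from the paper):

def tune_K_L(sample_points, K_values, L_values, eps, approx_error, query_time):
    # approx_error(q, K, L): error of the LSH-based neighbor distance vs. the exact one
    best = None
    for K in K_values:
        for L in sorted(L_values):
            mean_err = sum(approx_error(q, K, L) for q in sample_points) / len(sample_points)
            if mean_err <= eps:                  # minimal L(K) meeting the error threshold
                t = query_time(K, L)
                if best is None or t < best[0]:
                    best = (t, K, L)
                break
    return best                                  # (time, K, L) minimizing t(K, L(K))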

Data-driven partitions

• In the original LSH, cut values are chosen at random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value

(figure: points-per-bucket distribution, uniform vs. data-driven cuts)
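The two cut-value strategies side by side, as a small sketch (NumPy; function names are mine, for illustration):

import numpy as np

rng = np.random.default_rng(1)

def uniform_cut(X):
    j = int(rng.integers(X.shape[1]))
    return j, float(rng.uniform(X[:, j].min(), X[:, j].max()))   # random value in the data range

def data_driven_cut(X):
    j = int(rng.integers(X.shape[1]))
    i = int(rng.integers(X.shape[0]))
    return j, float(X[i, j])                                     # a coordinate of a random data point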

Additional speedup

Assume that all points in C̄ will converge to the same mode (C̄ is like a type of aggregate), so their mean-shift iterations need not be run separately.

Speedup results

65,536 points; 1,638 points sampled; k = 100

Food for thought

(figure: low dimension vs. high dimension)

A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN

15:30 cookies…

Summary

• LSH trades off some accuracy for a gain in complexity (speed)

• Applications that involve massive data in high dimensions require LSH's fast performance

• Extension of LSH to different spaces (PSH)

• Learning the LSH parameters and hash functions for different applications

Conclusion

• …but at the end, everything depends on your data set

• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – E-mail Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code, under Red Hat Linux)

Thanks

• Ilan Shimshoni (Haifa)

• Mohamad Hegaze (Weizmann)

• Alex Andoni (MIT)

• Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 43: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Repeating L times

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 44: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Secondary hashing

Support volume tuning

dataset-size vs storage volume

2k buckets

011

Size=B

M Buckets

Simple Hashing

MB=αn α=2

Skip

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 45: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

The above hashing is locality-sensitive

bullProbability (pq in same bucket)=

k=1 k=2

Distance (qpi) Distance (qpi)

Pro

babi

lity Pr

Adopted from Piotr Indykrsquos slides

kqp

dimensions

)(Distance1

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivation

• Clustering high-dimensional data by using local density measurements (e.g. in feature space).

• Statistical curse of dimensionality: sparseness of the data.

• Computational curse of dimensionality: expensive range queries.

• The LSH parameters should be adjusted for optimal performance.

Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer

Outline

• Mean-shift in a nutshell + examples.

Our scope:

• Mean-shift in high dimensions – using LSH.

• Speedups:
  1. Finding optimal LSH parameters.
  2. Data-driven partitions into buckets.
  3. Additional speedup by using the LSH data structure.

Mean-Shift in a Nutshell

[Figure: a data point and its bandwidth window; the mean-shift vector moves the window toward the local mean.]

KNN in mean-shift

• The bandwidth should be inversely proportional to the density in the region:
  high density - small bandwidth; low density - large bandwidth.

• It is based on the kth nearest neighbor of the point: the bandwidth is the distance from the point to its kth nearest neighbor, h_i = ||x_i − x_{i,k}||.

Adaptive mean-shift vs. non-adaptive

[Figures: clustering results with adaptive vs. non-adaptive bandwidth.]
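A minimal sketch of one adaptive mean-shift iteration under these assumptions (a Gaussian kernel and a per-point bandwidth set to the distance to the kth nearest neighbor; the function names are illustrative):

```python
import numpy as np

def adaptive_mean_shift_step(y, X, h):
    """One mean-shift update of the current estimate y over the data X.

    X : (n, d) data points;  h : (n,) per-point bandwidths h_i.
    Simplified: the full adaptive estimator also normalizes each weight
    by a power of h_i.
    """
    d2 = np.sum((X - y) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * h ** 2))              # Gaussian kernel weights
    return (w[:, None] * X).sum(axis=0) / w.sum()

def find_mode(x0, X, h, tol=1e-5, max_iter=100):
    """Follow the mean-shift trajectory from x0 until it converges to a mode."""
    y = x0.copy()
    for _ in range(max_iter):
        y_new = adaptive_mean_shift_step(y, X, h)
        if np.linalg.norm(y_new - y) < tol:
            break
        y = y_new
    return y
```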

Image segmentation algorithm

1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y).

2. Resolution is controlled by the bandwidths hs (spatial) and hr (color).

3. Apply filtering: each pixel takes the value of the nearest mode.

[Figures: a 3D view of the feature space, the original / filtered / segmented images, and the mean-shift trajectories.]

Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI 02'

Filtering examples

[Figures: original squirrel / filtered; original baboon / filtered.]

Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI 02'

Segmentation examples

Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI 02'

Mean-shift in high dimensions

• Computational curse of dimensionality: expensive range queries, implemented with LSH.

• Statistical curse of dimensionality: sparseness of the data, handled with a variable bandwidth.

LSH-based data structure

• Choose L random partitions. Each partition includes K pairs (d_k, v_k): a coordinate index d_k and a cut value v_k.

• For each point x we check, for every k ≤ K, whether x_{d_k} ≤ v_k; the resulting K bits determine the point's cell.

• This partitions the data into cells.
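A minimal sketch of such a structure (assuming uniform-random cut values in the range of the data; the function names and the NumPy representation are illustrative):

```python
import numpy as np
from collections import defaultdict

def build_lsh_tables(X, K, L, rng=np.random.default_rng(0)):
    """Build L partitions; each partition tests K (coordinate, cut value) pairs."""
    n, d = X.shape
    tables = []
    for _ in range(L):
        dims = rng.integers(0, d, size=K)                        # d_k
        lo, hi = X[:, dims].min(axis=0), X[:, dims].max(axis=0)
        cuts = rng.uniform(lo, hi)                               # v_k
        buckets = defaultdict(list)
        bits = (X[:, dims] <= cuts).astype(np.uint8)
        for i, key in enumerate(map(tuple, bits)):               # K-bit cell label
            buckets[key].append(i)
        tables.append((dims, cuts, buckets))
    return tables

def query_buckets(tables, q):
    """Union of the query's buckets over the L partitions (the candidate set)."""
    out = set()
    for dims, cuts, buckets in tables:
        out.update(buckets[tuple((q[dims] <= cuts).astype(np.uint8))])
    return out
```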

Choosing the optimal K and L

• For a query q, compute the smallest number of distances to points in its buckets.

• Large K means a smaller number of points in each cell C.

• If L is too small, points might be missed; but if L is too big, the union of cells might include extra points.

• Roughly, the expected number of points in a cell is N_C ≈ n / (K/d + 1)^d, and the union over the L partitions contains at most N_{∪C} ≤ L · N_C points.

• As L increases, the union C_∪ = ∪_l C_l increases but the intersection C_∩ = ∩_l C_l decreases; K and L determine the resolution of the data structure.

Choosing optimal K and L

• Determine accurately the KNN distance (the bandwidth) for m randomly-selected data points.

• Choose an error threshold ε.

• The optimal K and L should satisfy the constraint that the approximate distance (obtained from the LSH buckets) stays within the error threshold of the true KNN distance.

• For each K, estimate the error; in one run, for all L's, find the minimal L satisfying the constraint, L(K); then minimize the running time t(K, L(K)).

[Figures: approximation error as a function of K and L; L(K) for ε = 0.05; running time t[K, L(K)] with its minimum marked.]
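A sketch of this tuning loop, assuming the per-parameter error ratios and running times have already been measured on the m sampled points (the dictionaries, the grid values, and the 5% threshold below are illustrative):

```python
def choose_k_l(error, runtime, eps=0.05):
    """Select (K, L) from measured error and runtime grids.

    error[K][L]   : mean ratio (approximate k-NN distance / true k-NN distance),
                    measured on m randomly-selected sample points
    runtime[K][L] : query time measured with those parameters
    For each K take the minimal L whose error stays within 1 + eps,
    then return the (K, L) pair with the smallest runtime.
    """
    best = None
    for K in error:
        for L in sorted(error[K]):                 # ascending L
            if error[K][L] <= 1.0 + eps:           # constraint satisfied
                t = runtime[K][L]
                if best is None or t < best[0]:
                    best = (t, K, L)
                break                              # minimal L for this K found
    return (best[1], best[2]) if best else None

# Example with made-up measurements:
error = {10: {5: 1.09, 10: 1.04, 20: 1.02}, 20: {5: 1.12, 10: 1.06, 20: 1.03}}
runtime = {10: {5: 0.8, 10: 1.3, 20: 2.1}, 20: {5: 0.5, 10: 0.9, 20: 1.6}}
print(choose_k_l(error, runtime))   # -> (10, 10) under these numbers
```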

Data driven partitions

• In the original LSH, the cut values are random in the range of the data.

• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.

[Figure: points-per-bucket distribution for uniform vs. data-driven cut values.]
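Relative to the sketch above, only the way the cut values are drawn changes; a hypothetical helper:

```python
import numpy as np

def data_driven_cuts(X, dims, rng=np.random.default_rng(0)):
    """Cut values drawn from the data itself rather than uniformly in its range.

    Each cut value is the dims[k]-th coordinate of a randomly selected data
    point, so dense regions receive more cuts and buckets stay balanced.
    """
    idx = rng.integers(0, len(X), size=len(dims))
    return X[idx, dims]
```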

Additional speedup

• Assume that all the points in C_∩ (the intersection cell) will converge to the same mode; C_∩ acts like a type of aggregate, so the mode found for one of its points can be assigned to all of them.
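A sketch of that aggregation, assuming a cells mapping from intersection-cell label to point indices and a find_mode routine like the one sketched earlier (both names are illustrative):

```python
import numpy as np

def modes_by_cell(X, cells, h, find_mode):
    """Run mean-shift once per intersection cell and share the result.

    cells : dict mapping a cell label to the indices of the points in it
    Returns an array with one mode per data point.
    """
    modes = np.empty_like(X)
    for members in cells.values():
        mode = find_mode(X[members[0]], X, h)   # one trajectory per cell
        modes[members] = mode                   # shared by all points in the cell
    return modes
```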

Speedup results

[Table: 65,536 points, 1,638 points sampled, k = 100.]

Food for thought

[Figures: a low-dimensional vs. a high-dimensional example.]

A thought for food…

• Choose K, L by sample learning, or take the traditional values.

• Can one estimate K, L without sampling?

• Does it help to know the data dimensionality or the data manifold?

• Intuitively, the dimensionality implies the number of hash functions needed.

• The catch: efficient dimensionality learning requires KNN…

15:30 cookies…

Summary

• LSH suggests a compromise on accuracy for a gain in complexity.

• Applications that involve massive data in high dimension require the fast performance of LSH.

• Extension of LSH to different spaces (PSH).

• Learning the LSH parameters and hash functions for different applications.

Conclusion

• ... but at the end, everything depends on your data set.

• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data
  (C code, under Red Hat Linux)

Thanks

• Ilan Shimshoni (Haifa)

• Mohamad Hegaze (Weizmann)

• Alex Andoni (MIT)

• Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 46: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Preview

bullGeneral Solution ndash Locality sensitive hashing

bullImplementation for Hamming space

bullGeneralization to l2

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 47: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Direct L2 solution

bullNew hashing function

bullStill based on sampling

bullUsing mathematical trick

bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 48: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Central limit theorem

v1 +v2 hellip+vn =+hellip

(Weighted Gaussians) = Weighted Gaussian

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

XvXvi

ii

ii

21

2||

Dot Product Norm

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Features vector 1

Features vector 2 Distance

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 49: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Central limit theorem

v1vn = Real Numbers

X1Xn = Independent Identically Distributed(iid)

+v2 X2 hellip+vn Xn =+hellipv1 X1

Central limit theorem

Σi vi·Xi  ~  (Σi vi²)^(1/2) · X  =  ‖v‖₂ · X,   with X ~ N(0,1)

Dot product on the left, norm on the right: the projection of v onto a random Gaussian vector distributes like ‖v‖₂ times a standard Gaussian.

Norm / Distance

Σi ui·Xi − Σi vi·Xi  =  Σi (ui − vi)·Xi  ~  ‖u − v‖₂ · X

u = features vector 1, v = features vector 2; ‖u − v‖₂ is their distance.

Norm / Distance

u·X − v·X  =  (u − v)·X  ~  ‖u − v‖₂ · X

The difference of the two dot products (one per features vector) distributes like the distance between the vectors times a standard Gaussian, so random projections preserve distances in distribution.
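As a quick sanity check of that claim, the following sketch (plain NumPy; the vectors and names are illustrative, not from the slides) projects two feature vectors onto many random Gaussian directions and compares the spread of the projected differences with ‖u − v‖₂:

    import numpy as np

    rng = np.random.default_rng(0)
    d = 50
    u = rng.uniform(0, 10, d)          # features vector 1
    v = rng.uniform(0, 10, d)          # features vector 2

    # many random Gaussian projection directions X ~ N(0, I_d)
    X = rng.standard_normal((100000, d))

    proj_diff = X @ u - X @ v          # u.X − v.X = (u − v).X for each direction
    print(proj_diff.std())             # ≈ ‖u − v‖₂ ...
    print(np.linalg.norm(u - v))       # ... the L2 distance itself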

The full Hashing

h_{a,b}(v) = ⌊ (a·v + b) / w ⌋

v – the features vector (d-dimensional, e.g. [34, 82, …, 21])
a – d random numbers
b – a random phase, drawn from [0, w]
w – the discretization step

The full Hashing

h_{a,b}(v) = ⌊ (a·v + b) / w ⌋

Example: a·v = 7944, random phase b = +34, discretization step w = 100; the shifted value 7944 + 34 falls in the bucket [7900, 8000) on the number line … 7800, 7900, 8000, 8100, 8200 …

The full Hashing

h_{a,b}(v) = ⌊ (a·v + b) / w ⌋

a = (a1, …, ad) – drawn i.i.d. from a p-stable distribution
v – the features vector (d-dimensional)
b – a random phase, drawn from [0, w]
w – the discretization step
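A minimal sketch of one such hash function in NumPy, assuming the Gaussian (2-stable) case; the concatenation of k such functions and the L tables are omitted, and the class name is my own:

    import numpy as np

    class PStableHash:
        """One LSH hash function h_{a,b}(v) = floor((a.v + b) / w)."""
        def __init__(self, dim, w, rng):
            self.a = rng.standard_normal(dim)   # i.i.d. N(0,1): 2-stable, suits L2 distances
            self.b = rng.uniform(0, w)          # random phase in [0, w]
            self.w = w                          # discretization (bucket) width

        def __call__(self, v):
            return int(np.floor((self.a @ v + self.b) / self.w))

    rng = np.random.default_rng(1)
    h = PStableHash(dim=3, w=100.0, rng=rng)
    print(h(np.array([3.4, 8.2, 2.1])))         # bucket index of one features vector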

Generalization: P-Stable distributions

• L2: Central Limit Theorem → Gaussian (normal) distribution.
• Lp, 0 < p ≤ 2: Generalized Central Limit Theorem → p-stable distribution (Cauchy for L1, Gaussian for L2).

P-Stable summary

• Works for the r-nearest-neighbor problem.
• Generalizes to 0 < p ≤ 2.
• Improves query time: O(d·n^(1/(1+ε)) · log n)  →  O(d·n^(1/(1+ε)²) · log n)
  (latest results, reported in e-mail by Alexander Andoni).

Parameters selection

• 90% collision probability gives the best query-time performance (for Euclidean space).

Parameters selection…

For Euclidean space:
• A single projection hits an r-nearest neighbor with probability p1.
• k projections hit an r-nearest neighbor with probability p1^k.
• All L hashings fail to collide with probability (1 − p1^k)^L.
• To ensure a collision (e.g. with probability 1 − δ ≥ 90%):

  1 − (1 − p1^k)^L ≥ 1 − δ   ⇒   L ≥ log(δ) / log(1 − p1^k)

Accept neighbors, reject non-neighbors.
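The constraint above is easy to evaluate numerically; the following helper (my own, not from the slides) returns the smallest L that guarantees a collision with probability at least 1 − δ:

    import math

    def tables_needed(p1, k, delta=0.1):
        """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta."""
        return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

    # example with hypothetical values of p1, k and delta
    print(tables_needed(p1=0.8, k=18, delta=0.1))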

…Parameters selection

Query time as a function of k: candidate-extraction time grows with k while candidate-verification time drops; choose k near the minimum of their sum.

Pros & Cons (from Piotr Indyk's slides)

Pros:
• Better query time than spatial data structures.
• Scales well to higher dimensions and larger data sizes (sub-linear dependence).
• Predictable running time.

Cons:
• Extra storage overhead.
• Inefficient for data with distances concentrated around the average.
• Works best for Hamming distance (although it can be generalized to Euclidean space).
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions).
• Requires the radius r to be fixed in advance.

Conclusion

• …but at the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – E-mail Alex Andoni (andoni@mit.edu)
  – Test it over your own data (C code, under Red Hat Linux).

LSH - Applications

• Searching video clips in databases ("Hierarchical, Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun).
• Searching image databases (see the following).
• Image segmentation (see the following).
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani).
• Texture classification (see the following).
• Clustering (see the following).
• Embedding and manifold learning (LLE and many others).
• Compression – vector quantization.
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan).
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler).
• In short: whenever k-Nearest Neighbors (KNN) are needed.

Motivation

• A variety of procedures in learning require KNN computation.
• KNN search is a computational bottleneck.
• LSH provides a fast approximate solution to the problem.
• LSH requires hash-function construction and parameter tuning.

Outline

Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola and T. Darrell
• Finding sensitive hash functions.

Mean Shift Based Clustering in High Dimensions: A Texture Classification Example, B. Georgescu, I. Shimshoni and P. Meer
• Tuning LSH parameters.
• The LSH data structure is used for algorithm speedups.

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola and T. Darrell

Given an image x, what are the parameters θ in this image, i.e. the angles of the joints, the orientation of the body, etc.?

Ingredients

• Input: query image with unknown angles (parameters).
• Database of human poses with known angles.
• Image feature extractor – edge detector.
• Distance metric in feature space: d_x.
• Distance metric in angle space:

  d_θ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))
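A direct transcription of that angle-space metric (a sketch; the function and array names are mine):

    import numpy as np

    def angle_distance(theta1, theta2):
        """d_theta(t1, t2) = sum_i (1 - cos(t1_i - t2_i)), angles in radians."""
        return float(np.sum(1.0 - np.cos(np.asarray(theta1) - np.asarray(theta2))))

    print(angle_distance([0.0, 1.0], [0.0, 1.2]))   # small value for similar poses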

Example based learning

• Construct a database of example images with their known angles.
• Given a query image, run your favorite feature extractor.
• Compute the KNN from the database.
• Use these KNNs to compute the average angles of the query.

Input: query → find KNN in the database of examples → output: average angles of the KNN.

The algorithm flow

Input query → features extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match.
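Schematically, the query path could look like the sketch below; every function named here (extract_features, psh_knn, lwr_average) is a placeholder for the stage of the same name above, not code from the paper:

    def estimate_pose(image, database, extract_features, psh_knn, lwr_average, k=12):
        """Example-based pose estimation: features -> PSH lookup -> locally weighted average."""
        x = extract_features(image)                 # multi-scale edge histogram of the query
        neighbors = psh_knn(x, database, k=k)       # approximate KNN, chosen to be valid in angle space
        return lwr_average(x, neighbors)            # weighted average of the neighbors' angles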

The image features

Image features are multi-scale edge histograms, computed from the edge-detector output at several scales.

PSH: The basic assumption

There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ). We want similarity to be measured in the angle space, whereas LSH works on the feature space.

• Assumption: the feature space is closely related to the parameter space.

Insight: Manifolds

• A manifold is a space in which every point has a neighborhood resembling a Euclidean space.
• But the global structure may be complicated: curved.
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.

[Figure: the parameter space (angles) and the feature space as two related manifolds, with a query q. Is this magic?]

Parameter Sensitive Hashing (PSH)

The trick:
• Estimate the performance of different hash functions on examples, and select those sensitive to d_θ.
• The hash functions are applied in feature space, but the KNN are valid in angle space.

• Label pairs of examples with similar angles.
• Define hash functions h on the feature space.
• Predict the labeling of similar / non-similar examples by using h.
• Compare the labelings: if the labeling by h is good, accept h; else change h.

PSH as a classification problem

Labels (here r = 0.25): a pair of examples (x_i, x_j) is labeled

  y_ij = +1  if d_θ(θ_i, θ_j) < r
  y_ij = −1  if d_θ(θ_i, θ_j) > (1 + ε)·r

A binary hash function on features:

  h_T(x) = +1  if the selected feature of x exceeds the threshold T
  h_T(x) = −1  otherwise

Predict the labels:

  ŷ_h(x_i, x_j) = +1  if h_T(x_i) = h_T(x_j)
  ŷ_h(x_i, x_j) = −1  otherwise

Find the best T that predicts the true labeling, subject to the probability constraints: h_T will place both examples of a pair in the same bin, or separate them.
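A small sketch of how such a threshold hash could be scored on labeled pairs (my own simplification of the selection step; the feature indices, thresholds and scoring rule are illustrative only):

    def hash_value(x, feat, T):
        """Binary hash h_T(x): +1 if feature `feat` of x exceeds threshold T, else -1."""
        return 1 if x[feat] > T else -1

    def score_hash(pairs, labels, feat, T):
        """Fraction of labeled pairs ((x_i, x_j), y_ij) whose label the hash predicts correctly."""
        correct = 0
        for (xi, xj), y in zip(pairs, labels):
            y_hat = 1 if hash_value(xi, feat, T) == hash_value(xj, feat, T) else -1
            correct += (y_hat == y)
        return correct / len(labels)

    # candidate hash functions = (feature, threshold) pairs; keep the best-scoring ones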

Local Weighted Regression (LWR)

• Given a query image, PSH returns its KNNs.
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:

  minimize  Σ_{x_i ∈ N(x)}  d_θ(g(x_i; β), θ_i) · K(d_x(x_i, x))

  i.e. each neighbor contributes with a weight K(·) that decays with its feature-space distance to the query.
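In its simplest (zeroth-order) form this reduces to a kernel-weighted average of the neighbors' angle vectors; a sketch, with my own choice of kernel and argument names:

    import numpy as np

    def lwr_average(x, neighbors, feature_dist, bandwidth=1.0):
        """Weighted average of neighbor angle vectors; weights decay with feature-space distance."""
        thetas = np.array([theta for _, theta in neighbors])           # neighbor angle vectors
        dists = np.array([feature_dist(x, xi) for xi, _ in neighbors]) # d_x(x_i, x)
        w = np.exp(-(dists / bandwidth) ** 2)                          # e.g. a Gaussian kernel
        return (w[:, None] * thetas).sum(axis=0) / w.sum()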

Results

Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints.
• 150,000 images.
• Nuisance parameters added: clothing, illumination, face expression.
• 1,775,000 example pairs.
• Selected 137 out of 5,123 meaningful features (how?).
• 18-bit hash functions (k), 150 hash tables (l).
• Test on 1,000 synthetic examples; PSH searched only 3.4% of the data per query.
• Without feature selection, 40 bits and 1,000 hash tables would have been needed.

Recall: P1 is the probability of a positive hash, P2 the probability of a bad hash, B the maximal number of points in a bucket.

Results – real data

• 800 images.
• Processed by a segmentation algorithm.
• 1.3% of the data were searched.

Results – real data: interesting mismatches.

Fast pose estimation - summary

• A fast way to compute the angles of a human body figure.
• Moving from one representation space to another.
• Training a sensitive hash function.
• KNN smart averaging.

Food for Thought

• The basic assumption may be problematic (distance metric, representations).
• The training set should be dense.
• Texture and clutter.
• In general: some features are more important than others and should be weighted.

Food for Thought: Point Location in Different Spheres (PLDS)

• Given n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn.
• Goal: given a query q, preprocess the points in P so as to find a point p_i whose sphere 'covers' the query q.

(Courtesy of Mohamad Hegaze.)

Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example, B. Georgescu, I. Shimshoni and P. Meer

Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space).
• Statistical curse of dimensionality: sparseness of the data.
• Computational curse of dimensionality: expensive range queries.
• LSH parameters should be adjusted for optimal performance.

Outline

• Mean-shift in a nutshell + examples.

Our scope:
• Mean-shift in high dimensions – using LSH.
• Speedups:
  1. Finding optimal LSH parameters.
  2. Data-driven partitions into buckets.
  3. Additional speedup by using the LSH data structure.

Mean-Shift in a Nutshell

[Figure: a point and its bandwidth window.]

KNN in mean-shift

• The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth.
• Based on the k-th nearest neighbor of the point: the bandwidth is the distance from the point to its k-th nearest neighbor, h_i = ‖x_i − x_{i,k}‖.

Adaptive mean-shift vs. non-adaptive.
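For concreteness, a sketch of one adaptive mean-shift iteration with a per-point bandwidth taken from the k-th nearest neighbor (uniform kernel, brute-force distances; this is my own minimal rendering, not the paper's implementation):

    import numpy as np

    def knn_bandwidths(data, k):
        """Per-point bandwidth = distance to the k-th nearest neighbor (brute force)."""
        dists = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
        return np.sort(dists, axis=1)[:, k]

    def mean_shift_step(y, data, bandwidths):
        """Move y to the mean of the points whose (adaptive) window contains it."""
        d = np.linalg.norm(data - y, axis=1)
        inside = d <= bandwidths                 # uniform kernel: each point uses its own bandwidth
        return data[inside].mean(axis=0) if inside.any() else y

    rng = np.random.default_rng(0)
    data = rng.normal(size=(200, 5))
    h = knn_bandwidths(data, k=20)
    y = mean_shift_step(data[0].copy(), data, h)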

Image segmentation algorithm

1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y).
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color).
3. Apply filtering: each pixel takes the value of the nearest mode, found by following the mean-shift trajectories.

[Figures: original / filtered / segmented images; mean-shift trajectories; filtering examples (squirrel, baboon); segmentation examples. From "Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02.]

Mean-shift in high dimensions

• Computational curse of dimensionality: expensive range queries → implemented with LSH.
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth.

LSH-based data structure

• Choose L random partitions; each partition includes K pairs (d_k, v_k).
• For each point x_i we check whether x_{i,d_k} ≤ v_k for every one of the K pairs (i.e. K coordinate cuts).
• This partitions the data into cells.
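A sketch of such a partition: each of the L partitions is just K (dimension, cut-value) pairs, and the boolean pattern of the K comparisons is the cell label (the helper names are mine):

    import numpy as np

    def make_partitions(data, K, L, rng):
        """L random partitions; each is K (dimension, cut value) pairs drawn over the data range."""
        n, d = data.shape
        lo, hi = data.min(axis=0), data.max(axis=0)
        parts = []
        for _ in range(L):
            dims = rng.integers(0, d, size=K)
            cuts = rng.uniform(lo[dims], hi[dims])
            parts.append((dims, cuts))
        return parts

    def cell_of(x, partition):
        """Cell label of x in one partition: the tuple of K boolean comparisons."""
        dims, cuts = partition
        return tuple(x[dims] <= cuts)

    rng = np.random.default_rng(0)
    data = rng.normal(size=(1000, 8))
    parts = make_partitions(data, K=6, L=4, rng=rng)
    print(cell_of(data[0], parts[0]))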

Choosing the optimal K and L

• For a query q, we want to compute the smallest number of distances to points in its buckets.
• Large K ⇒ a smaller number of points in a cell.
• If L is too small, neighbor points might be missed; but if L is too big, extra points might be included.
• The query's neighborhood is approximated by the L cells containing it: as L increases, the union C∪ of these cells increases, but the intersection C∩ decreases; C∩ determines the resolution of the data structure.

Choosing optimal K and L

• Determine accurately the KNN, and hence the k-NN distance (bandwidth), for m randomly-selected data points.
• Choose an error threshold ε.
• The optimal K and L should satisfy: the approximate (LSH) distance is within the threshold of the true distance.
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K); then minimize the running time t(K, L(K)).

[Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)], with the minimum marked.]
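The tuning loop can be sketched as follows; error_for and query_time stand for measurements on the m sampled points, and both, like the candidate grids, are placeholders of my own:

    def tune_parameters(error_for, query_time, K_values, L_values, eps=0.05):
        """For each K pick the smallest L meeting the error constraint, then minimize time."""
        best = None
        for K in K_values:
            L_ok = next((L for L in L_values if error_for(K, L) <= eps), None)
            if L_ok is None:
                continue                      # no L satisfies the constraint for this K
            t = query_time(K, L_ok)
            if best is None or t < best[2]:
                best = (K, L_ok, t)
        return best                           # (K, L(K), running time) at the minimum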

Data driven partitions

• In the original LSH, cut values are drawn at random from the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.

[Figure: distribution of points per bucket, uniform cuts vs. data-driven cuts.]
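The change to the partition construction is a single line: sample the cut from the data itself rather than uniformly from its range (a sketch, reusing the hypothetical make_partitions structure from before):

    import numpy as np

    def make_partitions_data_driven(data, K, L, rng):
        """As before, but each cut value is a coordinate of a randomly chosen data point."""
        n, d = data.shape
        parts = []
        for _ in range(L):
            dims = rng.integers(0, d, size=K)
            picks = rng.integers(0, n, size=K)          # random data points
            cuts = data[picks, dims]                    # use their coordinates as cut values
            parts.append((dims, cuts))
        return parts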

Additional speedup

Assume that all points in C∩ will converge to the same mode (C∩ acts like a type of aggregate), so the mean-shift iterations need only be run once per such group.

Speedup results

65,536 points, 1,638 points sampled, k = 100.

Food for thought

[Figure: low dimension vs. high dimension.]

A thought for food…
• Choose K, L by sample learning, or take the traditional values.
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality, or the data manifold? Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning itself requires KNN.

15:30: cookies…

Summary

• LSH suggests a compromise: some accuracy is traded for a gain in complexity.
• Applications that involve massive data in high dimension require the fast performance of LSH.
• Extension of LSH to different spaces (PSH).
• Learning the LSH parameters and hash functions for different applications.

Conclusion

• …but at the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – E-mail Alex Andoni (andoni@mit.edu)
  – Test it over your own data (C code, under Red Hat Linux).

Thanks

• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis

Page 52: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Norm Distance

XvuXvXui

iii

iii

ii

21

2||

Dot Product

Dot Product Distance

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshell
(Figure: a window of a given bandwidth around a point is shifted toward the local mean of the data until it converges on a mode.)

KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region: high density → small bandwidth; low density → large bandwidth.
Based on the kth nearest neighbor of the point, the bandwidth is taken as h_i = ||x_i − x_{i,k}||, the distance from x_i to its kth nearest neighbor.

Adaptive mean-shift vs. non-adaptive (figure).
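A minimal sketch of adaptive mean-shift with the per-point bandwidth taken from the kth nearest neighbor, using a flat kernel and brute-force neighbor search for clarity; in the high-dimensional algorithm these neighbor queries are exactly what the LSH structure replaces. All names and parameters are illustrative.

import numpy as np

def adaptive_bandwidths(X, k):
    # h_i = distance from x_i to its k-th nearest neighbor (brute force)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return np.sort(D, axis=1)[:, k]          # column 0 is the point itself

def mean_shift_mode(x, X, h, iters=50, tol=1e-4):
    # shift x toward the mean of the points whose (adaptive) window contains it
    for _ in range(iters):
        inside = np.linalg.norm(X - x, axis=1) <= h    # flat kernel, per-point radius
        if not inside.any():
            break
        new_x = X[inside].mean(axis=0)
        if np.linalg.norm(new_x - x) < tol:
            break
        x = new_x
    return x

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(3, 0.3, (100, 2))])
h = adaptive_bandwidths(X, k=10)
modes = np.array([mean_shift_mode(x.copy(), X, h) for x in X])
print(np.unique(np.round(modes, 1), axis=0))           # roughly the two cluster centers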


Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y).
2. Resolution is controlled by the bandwidths h_s (spatial) and h_r (color).
3. Apply filtering.
(Figure: the 3D feature space.)
[Mean-shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02]

Image segmentation algorithm (figures: original, filtered, segmented).
Filtering: each pixel takes the value of its nearest mode.
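A small sketch of step 1 above (an illustration, not the paper's code): building the joint 5-D feature points, with spatial coordinates scaled by h_s and color coordinates by h_r so that a single bandwidth applies in the joint domain.

import numpy as np

def joint_features(image, h_s, h_r):
    # one 5-D point per pixel: (x/h_s, y/h_s, r/h_r, g/h_r, b/h_r)
    rows, cols, _ = image.shape
    ys, xs = np.mgrid[0:rows, 0:cols]
    spatial = np.stack([xs, ys], axis=-1).reshape(-1, 2) / h_s
    colors = image.reshape(-1, 3).astype(float) / h_r
    return np.hstack([spatial, colors])

img = np.random.default_rng(3).integers(0, 256, size=(32, 32, 3))
F = joint_features(img, h_s=8.0, h_r=16.0)
print(F.shape)   # (1024, 5); filtering then replaces each pixel's color with the
                 # color of the mode that its 5-D feature point converges to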

Mean-shift trajectories (figure).

Filtering examples (figures): original squirrel / filtered; original baboon / filtered.

Segmentation examples (figures).
[Mean-shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02]

Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH.
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth.

LSH-based data structure
• Choose L random partitions. Each partition includes K pairs (d_k, v_k): a coordinate index d_k and a cut value v_k.
• For each point x we check, for k = 1…K, whether x_{d_k} ≤ v_k; the resulting K boolean values determine the point's cell in that partition.
• This partitions the data into cells.
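A minimal sketch of this structure under the stated assumption that each cut is a boolean test x_{d_k} ≤ v_k; function and variable names are illustrative.

import numpy as np
from collections import defaultdict

def build_lsh(X, K, L, rng):
    # L partitions; each partition = K (dimension, cut value) pairs + its buckets
    n, d = X.shape
    tables = []
    for _ in range(L):
        dims = rng.integers(0, d, size=K)
        cuts = rng.uniform(X.min(axis=0)[dims], X.max(axis=0)[dims])
        buckets = defaultdict(list)
        for i, x in enumerate(X):
            buckets[tuple(x[dims] <= cuts)].append(i)   # K boolean tests = cell label
        tables.append((dims, cuts, buckets))
    return tables

def candidates(tables, q):
    # union over the L partitions of the query's buckets (the set C_union)
    cand = set()
    for dims, cuts, buckets in tables:
        cand.update(buckets[tuple(q[dims] <= cuts)])
    return cand

rng = np.random.default_rng(4)
X = rng.random((1000, 16))
tables = build_lsh(X, K=8, L=10, rng=rng)
print(len(candidates(tables, X[0])))    # distances are computed only for these points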

Choosing the optimal K and L
• For a query q, we want to compute the smallest possible number of distances to points in its buckets.
• Large K → a smaller number of points in each cell.
• If L is too small, neighbors might be missed; but if L is too big, many extra points might be included.
• K and L determine the resolution of the data structure: as L increases, the union of cells C_∪ increases, but the intersection C_∩ decreases.

Choosing optimal K and L
• Determine accurately the KNNs for m randomly-selected data points, and record each point's kth-neighbor distance (the bandwidth).
• Choose an error threshold ε.
• The optimal K and L should satisfy that the approximate (LSH-based) distance stays within the error threshold of the true distance.

Choosing optimal K and L
• For each K, estimate the approximation error as a function of L.
• In one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)) over K.
(Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] and its minimum.)
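A minimal sketch of this tuning loop, assuming brute-force ground truth for the m sampled points and a crude L·n/2^K proxy for query cost; the sample sizes, candidate grids, and cost proxy are illustrative assumptions, not the paper's procedure.

import numpy as np

def knn_dist(X, q, k):
    # true distance to the k-th nearest neighbor (brute force)
    return np.sort(np.linalg.norm(X - q, axis=1))[k]

def lsh_knn_dist(X, q, k, K, L, rng):
    # k-th neighbor distance found only among the LSH candidates of q
    n, d = X.shape
    cand = set()
    for _ in range(L):
        dims = rng.integers(0, d, size=K)
        cuts = rng.uniform(X.min(axis=0)[dims], X.max(axis=0)[dims])
        key = q[dims] <= cuts
        cand.update(np.flatnonzero(((X[:, dims] <= cuts) == key).all(axis=1)))
    dists = np.sort(np.linalg.norm(X[list(cand)] - q, axis=1))
    return dists[k] if len(dists) > k else np.inf

def tune(X, queries, k=10, eps=0.05, Ks=(4, 8, 12), Ls=range(1, 31), rng=None):
    # for each K take the smallest L whose mean relative error is <= eps,
    # then keep the (K, L) pair with the lowest (proxy) query cost
    rng = rng or np.random.default_rng(0)
    best = None
    for K in Ks:
        for L in Ls:
            errs = [abs(lsh_knn_dist(X, q, k, K, L, rng) - knn_dist(X, q, k))
                    / knn_dist(X, q, k) for q in queries]
            if np.mean(errs) <= eps:
                cost = L * len(X) / 2 ** K          # crude proxy for t(K, L)
                if best is None or cost < best[0]:
                    best = (cost, K, L)
                break                                # minimal L for this K found
    return best

rng = np.random.default_rng(5)
X = rng.random((2000, 8))
sample = X[rng.integers(0, len(X), size=5)]
print(tune(X, sample, rng=rng))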

Data-driven partitions
• In the original LSH, cut values are chosen uniformly at random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
(Plot: points-per-bucket distribution, uniform vs. data-driven cuts.)
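A one-function sketch of the suggested data-driven cuts (illustrative names): instead of drawing the cut value uniformly in the coordinate's range, take it from a randomly chosen data point, so cuts concentrate where the data are dense.

import numpy as np

def draw_cuts(X, dims, rng, data_driven=True):
    # cut values for the chosen coordinates: from random data points vs. uniform
    if data_driven:
        idx = rng.integers(0, len(X), size=len(dims))
        return X[idx, dims]                      # coordinate values of random points
    lo, hi = X.min(axis=0)[dims], X.max(axis=0)[dims]
    return rng.uniform(lo, hi)                   # original LSH: uniform in the range

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 1, (900, 4)), rng.normal(10, 1, (100, 4))])
dims = rng.integers(0, X.shape[1], size=8)
print(np.round(draw_cuts(X, dims, rng), 2))      # most cuts land near the dense cluster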

Additional speedup
Assume that all points in a cell C_∩ will converge to the same mode (C_∩ acts like a type of aggregate), so the mean-shift iterations need not be run separately for every point in the cell.

Speedup results: 65,536 points; 1,638 points sampled; k = 100.

Food for thought
(Figure: low dimension vs. high dimension.)

A thought for food…
• Choose K, L by sample-based learning, or take the traditional values?
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality, or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning itself requires KNN.

15:30 cookies…

Summary
• LSH trades some accuracy for a large gain in complexity (query time).
• Applications that involve massive data in high dimension require LSH's fast performance.
• The LSH idea extends to different spaces (PSH).
• The LSH parameters and hash functions can be learned for different applications.

Conclusion
• …but at the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code, under Red Hat Linux).

Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 53: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

The full Hashing

w

bvavh ba )(

[34 82 21]1

227742

d

d random numbers

+b

phaseRandom[0w]

wDiscretization step

Features vector

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 54: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

The full Hashing

w

bvavh ba )(

+34

100

7944

7900 8000 8100 82007800

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 55: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

The full Hashing

w

bvavh ba )(

+34

phaseRandom[0w]

100Discretization step

7944

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 56: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

The full Hashing

w

bvavh ba )(

a1 v d

iid from p-stable distribution

+b

phaseRandom[0w]

wDiscretization step

Features vector

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing – G. Shakhnarovich, P. Viola and T. Darrell
• Finding sensitive hash functions

Mean Shift Based Clustering in High Dimensions: A Texture Classification Example – B. Georgescu, I. Shimshoni and P. Meer
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups

The Problem

Given an image x, what are the parameters θ in this image, i.e., the angles of the joints, the orientation of the body, etc.?

Fast Pose Estimation with Parameter Sensitive Hashing
G. Shakhnarovich, P. Viola and T. Darrell

Ingredients

• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: d_x
• Distance metric in angle space:

  d_θ(θ¹, θ²) = Σ_{i=1..m} ( 1 − cos(θ¹_i − θ²_i) )

Example based learning

• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query

Input: query → find KNN in the database of examples → output: average angles of the KNN

The algorithm flow:

Input Query → Feature extraction → Processed query → PSH (LSH), over the database of examples → LWR (Regression) → Output: Match
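A minimal sketch of this flow, with a brute-force KNN standing in for the PSH lookup and a plain component-wise mean standing in for the LWR step (all data here is synthetic and illustrative):

import numpy as np

def estimate_pose(query_features, db_features, db_angles, k=5):
    """Average the angles of the k nearest database examples (brute force
    stands in here for the PSH lookup)."""
    dists = np.linalg.norm(db_features - query_features, axis=1)
    knn = np.argsort(dists)[:k]
    return db_angles[knn].mean(axis=0)

rng = np.random.default_rng(1)
db_features = rng.random((1000, 64))                  # e.g. edge-histogram features
db_angles = rng.uniform(-np.pi, np.pi, (1000, 13))    # 13 joint angles per example
query = rng.random(64)
print(estimate_pose(query, db_features, db_angles))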

The image features

Image features are multi-scale edge histograms, computed over image sub-windows (e.g., regions A and B in the figure).

[Figure: example multi-scale edge-histogram responses.]

Feature Extraction → PSH → LWR

PSH: The basic assumption

There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ).

We want similarity to be measured in the angle space, whereas LSH works on the feature space.

• Assumption: the feature space is closely related to the parameter space.

Feature Extraction → PSH → LWR

Insight: Manifolds

• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.

Feature Extraction → PSH → LWR

[Figure: the parameter space (angles) and the feature space, with a query q mapped between them. Is this magic?]

Parameter Sensitive Hashing (PSH)

The trick:

Estimate the performance of different hash functions on examples, and select those sensitive to d_θ.

The hash functions are applied in feature space, but the KNN are valid in angle space.

Feature Extraction → PSH → LWR

• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h

PSH as a classification problem

Labels (+1, +1, −1, −1 in the example; r = 0.25):

A pair of examples (x_i, x_j) is labeled

  y_ij = +1  if  d_θ(θ_i, θ_j) ≤ r
  y_ij = −1  if  d_θ(θ_i, θ_j) ≥ (1 + ε) r

Feature Extraction → PSH → LWR

A binary hash function on a feature:

  h_T(x) = +1  if the selected feature of x is ≥ T
           −1  otherwise

Predict the labels:

  ŷ_h(x_i, x_j) = +1  if h_T(x_i) = h_T(x_j)
                  −1  otherwise

Feature Extraction → PSH → LWR

Find the best threshold T that predicts the true labeling, subject to the probability constraints: h_T will either place both examples in the same bin, or separate them.
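A sketch of this selection loop under the description above: candidate (feature, threshold) hashes are scored on labeled pairs, and a hash is kept only if it keeps similar pairs in the same bin often enough while not doing so for dissimilar pairs too often (the probability thresholds here are illustrative, not the paper's values):

import numpy as np

def h_T(x, feature, threshold):
    return 1 if x[feature] >= threshold else -1

def select_sensitive_hashes(X, pairs, labels, candidates,
                            min_p_similar=0.9, max_p_dissimilar=0.6):
    """Keep candidate (feature, threshold) hashes that place similar pairs
    (label +1 in the NumPy array `labels`) in the same bin with high
    probability and dissimilar pairs (label -1) with low probability."""
    selected = []
    for feature, threshold in candidates:
        same_bin = np.array([h_T(X[i], feature, threshold) ==
                             h_T(X[j], feature, threshold) for i, j in pairs])
        p_sim = same_bin[labels == 1].mean()
        p_dis = same_bin[labels == -1].mean()
        if p_sim >= min_p_similar and p_dis <= max_p_dissimilar:
            selected.append((feature, threshold))
    return selected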

Local Weighted Regression (LWR)

• Given a query image x, PSH returns its KNNs
• LWR uses the KNNs to compute a weighted estimate of the query's angles, fitting a local model g by minimizing a kernel-weighted error of the form

  β̂ = argmin_β Σ_{x_i ∈ N(x)} d_θ( g(x_i; β), θ_i ) · K( d_x(x_i, x) )

  where K(·) is the distance-based weight and N(x) is the set of neighbors.

Feature Extraction → PSH → LWR
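A much-simplified stand-in for the LWR step (a zeroth-order, kernel-weighted average of the neighbors' angles rather than the full locally weighted regression; the Gaussian kernel and bandwidth are illustrative):

import numpy as np

def weighted_angle_estimate(query, knn_features, knn_angles, bandwidth=1.0):
    """Weight each neighbor's angles by a kernel of its feature-space distance."""
    d = np.linalg.norm(knn_features - query, axis=1)
    w = np.exp(-(d / bandwidth) ** 2)
    w /= w.sum()
    return (w[:, None] * knn_angles).sum(axis=0)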

Results

Synthetic data were generated:

• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)

• Test on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without selection, 40 bits and 1,000 hash tables would have been needed

Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, B is the maximum number of points in a bucket.

Results – real data

• 800 images
• Processed by a segmentation algorithm
• 13% of the data were searched

Results – real data

Interesting mismatches

Fast pose estimation – summary

• A fast way to compute the angles of the human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging

Food for Thought

• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted

Food for Thought: Point Location in Different Spheres (PLDS)

• Given: n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn
• Goal: given a query q, preprocess the points in P to find a point pi whose sphere 'covers' the query q

[Figure: a query q covered by the sphere of radius ri around pi.]

Courtesy of Mohamad Hegaze

Motivation

• Clustering high-dimensional data by using local density measurements (e.g., in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance

Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni and P. Meer

Outline

• Mean-shift in a nutshell + examples

Our scope:

• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure

Mean-Shift in a Nutshell

[Figure: the mean-shift step – a window of a given bandwidth around a point is shifted toward the local mean.]

KNN in mean-shift

The bandwidth should be inversely proportional to the density in the region:
high density – small bandwidth; low density – large bandwidth.

It is based on the k-th nearest neighbor of the point: the bandwidth is h_i = ||x_i − x_{i,k}||, the distance from x_i to its k-th nearest neighbor.

Adaptive mean-shift vs. non-adaptive
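A brute-force sketch of the adaptive-bandwidth rule (each point's bandwidth set to its distance from its k-th nearest neighbor); in the paper this is exactly the expensive query that the LSH structure later accelerates:

import numpy as np

def adaptive_bandwidths(X, k):
    """h_i = distance from x_i to its k-th nearest neighbor (brute force)."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    dists.sort(axis=1)                # column 0 is the point itself
    return dists[:, k]

X = np.random.default_rng(2).random((500, 5))
h = adaptive_bandwidths(X, k=100)
print(h.min(), h.max())               # dense regions get small h, sparse regions large h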


Image segmentation algorithm

1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering

Filtering: each pixel gets the value of the nearest mode.

[Figure: original, filtered, and segmented images (3D case); mean-shift trajectories.]

Mean-shift: A Robust Approach Towards Feature Space Analysis, D. Comaniciu et al., TPAMI '02

Filtering examples

[Figure: original vs. filtered "squirrel" and "baboon" images.]

Segmentation examples

Mean-shift: A Robust Approach Towards Feature Space Analysis, D. Comaniciu et al., TPAMI '02

Mean-shift in high dimensions

• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth

LSH-based data structure

• Choose L random partitions. Each partition includes K pairs (d_k, v_k).
• For each point, we check for each of the K pairs whether x_{d_k} ≤ v_k; the resulting K boolean values determine the point's cell.
• This partitions the data into cells.
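A minimal sketch of this structure: each of the L partitions is a list of K (dimension, value) cuts, a point's cell is the K-bit vector of cut outcomes, and a query retrieves the union of its L buckets (uniform random cut values here; the data-driven variant is discussed below):

import numpy as np
from collections import defaultdict

def build_partitions(X, K, L, rng):
    """Return L partitions; each is (cuts, buckets), where cuts is a list of
    (dim, value) pairs and buckets maps a K-bit key to a list of point indices."""
    n, d = X.shape
    lo, hi = X.min(axis=0), X.max(axis=0)
    partitions = []
    for _ in range(L):
        dims = rng.integers(0, d, size=K)
        cuts = [(dim, rng.uniform(lo[dim], hi[dim])) for dim in dims]
        buckets = defaultdict(list)
        for i, x in enumerate(X):
            key = tuple(x[dim] <= val for dim, val in cuts)
            buckets[key].append(i)
        partitions.append((cuts, buckets))
    return partitions

def query_union(q, partitions):
    """Indices of all points sharing a bucket with q in at least one partition."""
    out = set()
    for cuts, buckets in partitions:
        key = tuple(q[dim] <= val for dim, val in cuts)
        out.update(buckets.get(key, []))
    return out

rng = np.random.default_rng(3)
X = rng.random((1000, 8))
parts = build_partitions(X, K=6, L=4, rng=rng)
print(len(query_union(X[0], parts)))   # number of candidate neighbors for X[0]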

Choosing the optimal K and L

• For a query q, compute the smallest number of distances to points in its buckets.
• Large K ⇒ a smaller number of points in a cell.
• If L is too small, points might be missed; but if L is too big, extra points might be included.
• As L increases, the union of the query's cells C∪ increases, but their intersection C∩ decreases; C∩ determines the resolution of the data structure.

Choosing optimal K and L

• Determine accurately the KNN, and hence the bandwidth distance, for m randomly-selected data points.
• Choose an error threshold ε.
• The optimal K and L should satisfy: the approximate (LSH) distance is within the chosen error of the true distance.

• For each K, estimate the error for the different L's.
• In one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)).

[Figure: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)]; the chosen minimum.]
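A sketch of this tuning procedure, reusing build_partitions and query_union from the earlier sketch (so it is not standalone); the error measure, the preference for small L and then small K as a stand-in for measured running time, and all parameter values are simplifying assumptions:

import numpy as np

def rel_error(X, i, k, partitions):
    """Relative error of the LSH k-NN distance vs. the exact one for point i."""
    exact = np.sort(np.linalg.norm(X - X[i], axis=1))[k]
    cand = list(query_union(X[i], partitions))
    d = np.sort(np.linalg.norm(X[cand] - X[i], axis=1))
    approx = d[k] if len(d) > k else np.inf
    return abs(approx - exact) / exact

def tune_K_L(X, Ks, Ls, k, m, eps, rng):
    """For each K find the minimal L meeting the error constraint, then pick
    the cheapest (K, L(K)) pair."""
    sample = rng.integers(0, len(X), size=m)
    choices = []
    for K in Ks:
        for L in sorted(Ls):
            parts = build_partitions(X, K, L, rng)
            if np.mean([rel_error(X, i, k, parts) for i in sample]) <= eps:
                choices.append((L, K))   # proxy ordering for running time t(K, L(K))
                break
    return min(choices)[::-1] if choices else None   # returns (K, L)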

Data driven partitions

• In the original LSH, cut values are chosen at random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.

[Figure: points-per-bucket distribution, uniform cuts vs. data-driven cuts.]
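The suggested change amounts to one line relative to the uniform cuts in the earlier sketch: take each cut value from the coordinate of a randomly chosen data point, so cell boundaries follow the data density (a sketch, with the same partition structure assumed):

import numpy as np

def data_driven_cuts(X, K, rng):
    """Each cut pairs a random dimension with the coordinate of a random data point."""
    n, d = X.shape
    dims = rng.integers(0, d, size=K)
    pts = rng.integers(0, n, size=K)
    return [(dim, X[p, dim]) for dim, p in zip(dims, pts)]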

Additional speedup

Assume that all points in C∩ will converge to the same mode (C∩ acts like a type of aggregate).
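A sketch of how that assumption can be exploited (mean_shift_mode is a hypothetical routine standing in for the actual mean-shift iteration, and the cell signature reuses the partition structure from the earlier sketch): the mode is computed once per intersection cell and reused for every point that falls in that cell.

def cluster_by_cell(X, partitions, mean_shift_mode):
    """Reuse one converged mode per intersection-cell signature."""
    modes, cache = [], {}
    for x in X:
        sig = tuple(tuple(x[dim] <= val for dim, val in cuts)
                    for cuts, _ in partitions)       # x's cell in every partition
        if sig not in cache:
            cache[sig] = mean_shift_mode(x)          # run mean-shift only once per cell
        modes.append(cache[sig])
    return modes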

Speedup results

65,536 points; 1,638 points sampled; k = 100

Food for thought

[Figure: low dimension vs. high dimension.]

A thought for food…

• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN

15:30 – cookies…

Summary

• LSH trades some accuracy for a substantial gain in complexity
• Applications that involve massive data in high dimensions require LSH's fast performance
• Extensions of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications

Conclusion

• ...but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – E-mail Alex Andoni (andoni@mit.edu)
  – Test over your own data
  (C code, under Red Hat Linux)

Thanks

• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 57: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Generalization P-Stable distribution

bullLp p=eps2

bullGeneralized Central Limit Theorem

bullP-stable distributionCauchy for L2

bullL2

bullCentral Limit Theorem

bullGaussian (normal) distribution

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 58: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

P-Stable summary

bullWorks for bullGeneralizes to 0ltplt=2

bullImproves query time

Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )

r - Nearest Neighbor

Latest resultsReported in Email by

Alexander Andoni

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 59: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Parameters selection

bull90 Probability Best quarry time performance

For Euclidean Space

Parameters selectionhellip

For Euclidean Space

bullSingle projection hit an - Nearest Neighbor with Pr=p1

bullk projections hits an - Nearest Neighbor with Pr=p1k

bullL hashings fail to collide with Pr=(1-p1k)L

bullTo ensure Collision (eg 1-δge90)

bull1( -1-p1k)Lge 1-δ)1log(

)log(

1kp

L

L

Reject Non-NeighborsAccept Neighbors

hellipParameters selection

K

k

time Candidates verification Candidates extraction

Better Query Time than Spatial Data Structures

Scales well to higher dimensions and larger data size ( Sub-linear dependence )

Predictable running time

Extra storage over-head

Inefficient for data with distances concentrated around average

works best for Hamming distance (although can be generalized to Euclidean space)

In secondary storage linear scan is pretty much all we can do (for high dim)

requires radius r to be fixed in advance

Pros amp Cons

From Pioter Indyk slides

Conclusion

bullbut at the endeverything depends on your data set

bullTry it at homendashVisit

httpwebmiteduandoniwwwLSHindexhtml

ndashEmail Alex AndoniAndonimitedundashTest over your own data

(C code under Red Hat Linux )

LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive

Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)

bull Searching image databases (see the following)

bull Image segmentation (see the following)

bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)

bull Texture classification (see the following)

bull Clustering (see the following)

bull Embedding and manifold learning (LLE and many others)

bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)

bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)

bull In short whenever K-Nearest Neighbors (KNN) are needed

Motivation

bull A variety of procedures in learning require KNN computation

bull KNN search is a computational bottleneck

bull LSH provides a fast approximate solution to the problem

bull LSH requires hash function construction and parameter tunning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell

bull Finding sensitive hash functions

Mean Shift Based Clustering in HighDimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

bull Tuning LSH parametersbull LSH data structure is used for algorithm

speedups

Given an image x what are the parameters θ in this image

ie angles of joints orientation of the body etc1048698

The Problem

Fast Pose Estimation with Parameter Sensitive Hashing

G Shakhnarovich P Viola and T Darrell

i

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 64: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

LSH - Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression: vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever k-Nearest Neighbors (KNN) are needed

Motivation

• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash function construction and parameter tuning

Outline

Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola and T. Darrell
• Finding sensitive hash functions
Mean Shift Based Clustering in High Dimensions: A Texture Classification Example, B. Georgescu, I. Shimshoni and P. Meer
• Tuning LSH parameters
• LSH data structure is used for algorithm speedups

The Problem
Given an image x, what are the parameters θ in this image, i.e. the angles of joints, the orientation of the body, etc.?
Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola and T. Darrell

Ingredients

• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor - an edge detector
• Distance metric in feature space, d_x
• Distance metric in angle space: d_θ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))
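As a concrete reading of the angle-space metric above, here is a minimal sketch (NumPy; the function name and the absence of per-joint weighting are our own choices, not from the slides):

```python
import numpy as np

def angle_distance(theta1, theta2):
    """d_theta(theta1, theta2) = sum_i (1 - cos(theta1_i - theta2_i)).
    theta1, theta2: length-m arrays of joint angles in radians."""
    theta1 = np.asarray(theta1, dtype=float)
    theta2 = np.asarray(theta2, dtype=float)
    return float(np.sum(1.0 - np.cos(theta1 - theta2)))
```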

Example based learning

• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
Input: query → find KNN in the database of examples → output: average angles of the KNN (see the sketch below)
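A minimal sketch of this baseline, with brute-force KNN standing in for PSH and the image features assumed to be already extracted (all names here are illustrative, not from the paper):

```python
import numpy as np

def estimate_pose(q_features, db_features, db_angles, k=5):
    """Brute-force stand-in for PSH: find the k nearest neighbors of the
    query's feature vector and average their known angles.
    db_features: (n, d) array, db_angles: (n, m) array of angles in radians."""
    dists = np.linalg.norm(db_features - q_features, axis=1)  # feature metric d_x
    knn = np.argsort(dists)[:k]
    ang = db_angles[knn]
    # Average on the circle so angles near 0 and 2*pi average sensibly.
    return np.arctan2(np.sin(ang).mean(axis=0), np.cos(ang).mean(axis=0))
```

PSH replaces the brute-force distance scan with hashing, and the plain averaging step is later refined by LWR.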

The algorithm flow: input query → feature extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output match

The image features
Image features are multi-scale edge histograms (figure: edge counts over image sub-windows A and B at several scales).
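One plausible reading of "multi-scale edge histograms", as a hedged sketch; the paper's exact sub-window layout, scales and bin counts are not given here, so those choices are assumptions:

```python
import numpy as np

def edge_direction_histograms(gray, scales=(1, 2, 4), bins=4):
    """Concatenate edge-direction histograms computed at several
    down-sampling scales. gray: 2-D float array (one image or sub-window)."""
    feats = []
    for s in scales:
        img = gray[::s, ::s]
        gy, gx = np.gradient(img)
        mag = np.hypot(gx, gy)
        ang = np.arctan2(gy, gx) % np.pi           # edge direction, mod 180 degrees
        hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
        feats.append(hist / (hist.sum() + 1e-12))  # normalize per scale
    return np.concatenate(feats)
```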

PSH: The basic assumption

There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ).
We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space

Insight Manifolds

• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
(Figure: a query q mapped between the parameter space of angles and the feature space - is this magic?)

Parameter Sensitive Hashing (PSH)

The trick:
• Estimate the performance of different hash functions on examples, and select those sensitive to d_θ
• The hash functions are applied in feature space, but the KNN are valid in angle space

• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar / non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h

PSH as a classification problem

Labels (r = 0.25): a pair of examples (x_i, x_j) is labeled
  y_ij = +1 if d_θ(θ_i, θ_j) < r
  y_ij = −1 if d_θ(θ_i, θ_j) > (1 + ε)·r
(Figure: example image pairs labeled +1, +1, −1, −1)

A binary hash function on features:
  h_{φ,T}(x) = +1 if x_φ ≥ T, −1 otherwise

Predict the labels:
  ŷ_ij = +1 if h_T(x_i) = h_T(x_j), −1 otherwise

Find the best T (on feature x_φ) that predicts the true labeling, subject to the probability constraints; h_{φ,T} will either place both examples in the same bin or separate them. (A sketch follows below.)
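A sketch of the stump-style hash and the pair-labeling score used to accept or reject it; φ is the selected feature coordinate and T the threshold, and the probability constraints themselves are not shown here:

```python
import numpy as np

def h_stump(x, phi, T):
    """Binary hash on features: +1 if x[phi] >= T else -1."""
    return 1 if x[phi] >= T else -1

def pair_agreement(pairs, labels, phi, T):
    """pairs: list of (x_i, x_j) feature vectors; labels: +1 for similar angles
    (d_theta < r), -1 for dissimilar (d_theta > (1+eps)*r).
    Returns the fraction of pairs whose predicted label
    y_hat = +1 iff h(x_i) == h(x_j) matches the true label."""
    correct = 0
    for (xi, xj), y in zip(pairs, labels):
        y_hat = 1 if h_stump(xi, phi, T) == h_stump(xj, phi, T) else -1
        correct += (y_hat == y)
    return correct / max(len(labels), 1)
```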

Local Weighted Regression (LWR)
• Given a query image, PSH returns KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query (distance → weight):
  β_0 = argmin_β Σ_{x_i ∈ N(x)} d_θ(g(x_i; β), θ_i) · K(d_x(x_i, x))
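A much-simplified sketch: zeroth-order LWR, i.e. a kernel-weighted circular average of the neighbors' angles rather than a full local regression; the Gaussian kernel and bandwidth are illustrative choices:

```python
import numpy as np

def lwr_zeroth_order(q_feat, knn_feats, knn_angles, bandwidth=1.0):
    """knn_feats: (k, d) features of the PSH neighbors,
    knn_angles: (k, m) their known angles in radians.
    Weight each neighbor by a Gaussian kernel of its feature-space distance."""
    d = np.linalg.norm(knn_feats - q_feat, axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)
    w = w / (w.sum() + 1e-12)
    # Circular weighted mean per angle.
    s = (w[:, None] * np.sin(knn_angles)).sum(axis=0)
    c = (w[:, None] * np.cos(knn_angles)).sum(axis=0)
    return np.arctan2(s, c)
```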

Results

Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples
• PSH searched only 34 of the data per query
• Without selection, 40 bits and 1,000 hash tables were needed

Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, B is the max number of points in a bucket.

Results - real data
• 800 images
• Processed by a segmentation algorithm
• 13 of the data were searched

Results - real data: interesting mismatches

Fast pose estimation - summary

• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging

Food for Thought

• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general: some features are more important than others and should be weighted

Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d centered at P = {p_1, …, p_n} with radii r_1, …, r_n
• Goal: given a query q, preprocess the points in P to find a point p_i whose sphere 'covers' the query q
Courtesy of Mohamad Hegaze

Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example, B. Georgescu, I. Shimshoni and P. Meer

Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance

Outline

• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions - using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure

Mean-Shift in a Nutshell (figure: a data point, its bandwidth window, and the mean-shift step toward the local mean)

KNN in mean-shift
• The bandwidth should be inversely proportional to the density in the region: high density - small bandwidth; low density - large bandwidth
• Based on the kth nearest neighbor of the point: the bandwidth is h_i = ||x_i − x_{i,k}||, the distance from x_i to its kth nearest neighbor
Adaptive mean-shift vs. non-adaptive (see the sketch below)
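A hedged sketch of adaptive mean-shift with h taken as the distance to the k-th nearest neighbor. For simplicity the bandwidth here is attached to the point being shifted (the sample-point estimator in the paper attaches it to each data point), a flat kernel is used, and brute-force search stands in for LSH:

```python
import numpy as np

def adaptive_bandwidths(X, k=100):
    """h_i = distance from x_i to its k-th nearest neighbor (brute force)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return np.sort(D, axis=1)[:, k]

def mean_shift_point(y, X, h):
    """One flat-kernel mean-shift step for a single point y with bandwidth h."""
    in_window = np.linalg.norm(X - y, axis=1) <= h
    if not np.any(in_window):
        return y
    return X[in_window].mean(axis=0)

def mean_shift_mode(x0, X, h, iters=50, tol=1e-3):
    """Iterate mean-shift steps from x0 until convergence to a mode."""
    y = x0.copy()
    for _ in range(iters):
        y_new = mean_shift_point(y, X, h)
        if np.linalg.norm(y_new - y) < tol:
            break
        y = y_new
    return y
```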


Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering
Mean-shift: A Robust Approach Towards Feature Space Analysis, D. Comaniciu et al., TPAMI '02
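A small sketch of step 1, building the joint spatial-range feature vectors scaled by h_s and h_r; filtering then runs mean-shift from each such vector and copies the color part of the convergence mode back to the pixel. The array layout is an assumption:

```python
import numpy as np

def joint_features(image, hs, hr):
    """Stack normalized spatial and range coordinates:
    5-D for color images (x, y, 3 channels) or 3-D for gray (x, y, gray).
    Dividing by h_s and h_r lets mean-shift use a single unit-radius window
    in the joint space."""
    H, W = image.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    spatial = np.stack([xs, ys], axis=-1).reshape(-1, 2) / float(hs)
    range_part = image.reshape(H * W, -1) / float(hr)
    return np.hstack([spatial, range_part])
```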

Image segmentation algorithm (figures: original, filtered, segmented)
Filtering: pixel value of the nearest mode
Mean-shift trajectories (figure)

Filtering examples (figures: original and filtered squirrel; original and filtered baboon)
Segmentation examples
Mean-shift: A Robust Approach Towards Feature Space Analysis, D. Comaniciu et al., TPAMI '02

Mean-shift in high dimensions

• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth

LSH-based data structure

• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point x_i, check whether x_{i,d_k} ≤ v_k for each of the K pairs
• This partitions the data into cells (see the sketch below)
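A sketch of this structure under the stated scheme: L partitions, each defined by K random (coordinate, cut-value) pairs, a K-bit signature per point, and the query neighborhood taken as the union of its L cells. The class and method names are ours:

```python
import numpy as np
from collections import defaultdict

class LSHPartitions:
    def __init__(self, X, K, L, rng=None):
        rng = rng or np.random.default_rng()
        n, d = X.shape
        self.X = X
        self.cuts = []     # per partition: (dims, vals), K of each
        self.tables = []   # per partition: signature -> list of point indices
        for _ in range(L):
            dims = rng.integers(0, d, size=K)
            vals = rng.uniform(X[:, dims].min(axis=0), X[:, dims].max(axis=0))
            table = defaultdict(list)
            for i, sig in enumerate(self._signatures(X, dims, vals)):
                table[sig].append(i)
            self.cuts.append((dims, vals))
            self.tables.append(table)

    @staticmethod
    def _signatures(X, dims, vals):
        bits = (X[:, dims] <= vals).astype(np.uint8)   # test x_{i,d_k} <= v_k
        return map(tuple, bits)

    def candidates(self, q):
        """Union of the query's cells over the L partitions."""
        out = set()
        for (dims, vals), table in zip(self.cuts, self.tables):
            sig = tuple((q[dims] <= vals).astype(np.uint8))
            out.update(table[sig])
        return out
```

The exact neighbors within the bandwidth are then found by computing distances only to `candidates(q)`.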

Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets
• Large K → smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, extra points might be included
• The union of the query's cells over the L partitions, C_∪ = ∪_{l=1..L} C_l, determines the resolution of the data structure
• As L increases, |C_∪| increases but fewer true neighbors are missed

Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points, and their distance (bandwidth)
• Choose an error threshold ε
• The optimal K and L should satisfy: the approximate distance stays within the threshold of the true distance
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K)
• Minimize the running time t(K, L(K)) to find the minimum (see the sketch below)
(Plots: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)])
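A hedged sketch of this tuning loop; it reuses the LSHPartitions sketch above, and the error measure, grids and cost proxy are illustrative choices rather than the paper's exact procedure:

```python
import numpy as np

def tune_k_l(X, k_nn=100, m=50, eps=0.05, K_grid=(10, 20, 30), L_max=50, rng=None):
    """For m random sample points, compare the LSH-approximate k-NN distance
    with the exact one; for each K take the minimal L whose mean relative
    error is <= eps, then keep the (K, L) pair with the smallest query cost
    (here: total number of candidate points examined)."""
    rng = rng or np.random.default_rng()
    sample = rng.choice(len(X), size=m, replace=False)
    exact = [np.sort(np.linalg.norm(X - X[i], axis=1))[k_nn] for i in sample]
    best = None                                    # (cost, K, L)
    for K in K_grid:
        for L in range(1, L_max + 1):
            lsh = LSHPartitions(X, K, L, rng=rng)  # sketch defined earlier
            errs, cost = [], 0
            for i, d_true in zip(sample, exact):
                cand = np.fromiter(lsh.candidates(X[i]), dtype=int)
                cost += len(cand)
                if len(cand) <= 1:
                    errs.append(np.inf)
                    continue
                d = np.sort(np.linalg.norm(X[cand] - X[i], axis=1))
                d_approx = d[min(k_nn, len(d) - 1)]
                errs.append(abs(d_approx - d_true) / d_true)
            if np.mean(errs) <= eps:               # minimal L for this K: L(K)
                if best is None or cost < best[0]:
                    best = (cost, K, L)
                break                              # move on to the next K
    return best  # None if no (K, L) met the error threshold
```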

Data driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
(Figure: points-per-bucket distribution, uniform vs. data-driven)
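The data-driven variant changes only how the cut values are drawn, e.g. (a sketch):

```python
import numpy as np

def data_driven_cuts(X, dims, rng=None):
    """Instead of uniform cut values over each coordinate's range, take the
    cut value for dimension d_k from a randomly chosen data point, so dense
    regions receive more cuts and the buckets get a more even population."""
    rng = rng or np.random.default_rng()
    dims = np.asarray(dims)
    idx = rng.integers(0, len(X), size=len(dims))
    return X[idx, dims]
```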

Additional speedup
• Assume that all points in C_∪ will converge to the same mode (C_∪ acts as a kind of aggregate)

Speedup results

65,536 points; 1,638 points sampled; k = 100

Food for thought (figure: low dimension vs. high dimension)

A thought for food…
• Choose K, L by sample learning, or take the traditional values?
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN…
1530 cookies…

Summary

• LSH suggests a compromise: trade some accuracy for a gain in complexity (speed)
• Applications that involve massive data in high dimensions require LSH's fast performance
• Extension of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications

Conclusion
• ...but in the end, everything depends on your data set
• Try it at home:
  - Visit http://web.mit.edu/andoni/www/LSH/index.html
  - Email Alex Andoni (andoni@mit.edu)
  - Test over your own data (C code, under Red Hat Linux)

Thanks

• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis

Ingredients

bull Input query image with unknown angles (parameters)

bull Database of human poses with known anglesbull Image feature extractor ndash edge detector

bull Distance metric in feature space dx

bull Distance metric in angles space

m

i

iid1

2121 )cos(1)(

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted

Food for Thought: Point Location in Different Spheres (PLDS)

• Given n spheres in R^d centered at P = {p_1, …, p_n} with radii r_1, …, r_n.
• Goal: given a query q, preprocess the points in P to find a point p_i whose sphere 'covers' the query q.

(Figure: a query q covered by the sphere of radius r_i around p_i.)

Courtesy of Mohamad Hegaze

Motivation

• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance

Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer

Outline

• Mean-shift in a nutshell + examples

Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure

Mean-Shift in a Nutshell

(Figure: a point and the bandwidth window around it.)

KNN in mean-shift

• The bandwidth should be inversely proportional to the density in the region: high density - small bandwidth, low density - large bandwidth.
• It is based on the kth nearest neighbor of the point: the bandwidth is the distance from the point to its kth nearest neighbor, h_i = ||x_i - x_{i,k}||.
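To make the adaptive bandwidth concrete, here is a small sketch of mine (not the paper's implementation): each point's bandwidth is its distance to the kth nearest neighbor, and a flat kernel with brute-force neighbor search is used for simplicity.

```python
import numpy as np

def knn_bandwidths(X, k):
    """h_i = distance from x_i to its k-th nearest neighbor (brute force)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return np.sort(D, axis=1)[:, k]        # column 0 is the point itself

def adaptive_mean_shift(X, k=100, iters=20):
    """Move each point to the mean of the data inside its own window."""
    h = knn_bandwidths(X, k)
    modes = X.astype(float).copy()
    for _ in range(iters):
        for i in range(len(modes)):
            d = np.linalg.norm(X - modes[i], axis=1)
            inside = d < h[i]              # flat kernel of radius h_i
            if inside.any():
                modes[i] = X[inside].mean(axis=0)
    return modes                           # modes approximate the density peaks
```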

Adaptive mean-shift vs. non-adaptive (figure).

Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y).
2. Resolution is controlled by the bandwidths h_s (spatial) and h_r (color).
3. Apply filtering.

Filtering: each pixel takes the value of its nearest mode.
(Figures: the 3D feature space; original, filtered, and segmented images; mean-shift trajectories.)

Mean-shift: A Robust Approach Towards Feature Space Analysis, D. Comaniciu et al., TPAMI '02
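For step 1, the 5D feature vectors can be assembled as below; this is an illustrative sketch of mine, and folding h_s and h_r into the coordinates by normalization is one common convention rather than necessarily the authors' exact code.

```python
import numpy as np

def joint_features(image, hs, hr):
    """Stack each pixel as (x/hs, y/hs, c1/hr, c2/hr, c3/hr).

    Normalizing the spatial and color coordinates by their bandwidths lets a
    single unit-bandwidth mean-shift window act as hs in space and hr in color.
    """
    H, W, _ = image.shape
    ys, xs = np.mgrid[0:H, 0:W]
    feats = np.concatenate(
        [xs[..., None] / hs, ys[..., None] / hs, image / hr], axis=2
    )
    return feats.reshape(-1, 5)
```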

Filtering examples (figures: original and filtered squirrel and baboon images).

Segmentation examples (figures).

Mean-shift: A Robust Approach Towards Feature Space Analysis, D. Comaniciu et al., TPAMI '02

Mean-shift in high dimensions

• Computational curse of dimensionality: expensive range queries, implemented with LSH.
• Statistical curse of dimensionality: sparseness of the data, handled with a variable bandwidth.

LSH-based data structure

• Choose L random partitions. Each partition includes K pairs (d_k, v_k).
• For each point we check the K inequalities x_{d_k} <= v_k.
• This partitions the data into cells.
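A minimal sketch of such a structure (my own illustration, not the authors' code): each of the L partitions is defined by K random (dimension, value) cuts, a point's K boolean answers select its cell, and a query is compared only against the points in the union of its L cells.

```python
import numpy as np
from collections import defaultdict

class LSHPartitions:
    def __init__(self, X, K, L, rng=None):
        rng = rng or np.random.default_rng()
        self.X = X
        n, d = X.shape
        self.cuts = []
        self.tables = []
        for _ in range(L):
            # Each partition: K pairs (d_k, v_k); v_k drawn from the data range.
            dims = rng.integers(0, d, size=K)
            vals = rng.uniform(X.min(axis=0)[dims], X.max(axis=0)[dims])
            self.cuts.append((dims, vals))
            table = defaultdict(list)
            keys = (X[:, dims] <= vals)            # n x K boolean answers
            for i, key in enumerate(keys):
                table[key.tobytes()].append(i)     # cell = the K-bit signature
            self.tables.append(table)

    def candidates(self, q):
        """Indices of the points in the union of q's L cells."""
        out = set()
        for (dims, vals), table in zip(self.cuts, self.tables):
            key = (q[dims] <= vals).tobytes()
            out.update(table.get(key, []))
        return out
```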

Choosing the optimal K and L

• For a query q, we want to compute the smallest number of distances to points in its buckets.
• Large K: a smaller number of points in a cell C.
• If L is too small, points might be missed; but if L is too big, extra points might be included.
• The expected number of points in a single cell C shrinks with K, while the number of points in the union ∪C of the query's L cells grows with L.

As L increases, ∪C increases but C decreases; C determines the resolution of the data structure.

Choosing optimal K and L

• Determine accurately the KNN distance (the bandwidth) for m randomly selected data points.
• Choose an error threshold ε.
• The optimal K and L should satisfy: the approximate distance obtained from the LSH buckets is within the error threshold of the true distance.

• For each K, estimate the error.
• In one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)).

(Figures: approximation error as a function of K and L; L(K) for ε = 0.05; running time t[K, L(K)] and its minimum.)
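The search over K and L can be sketched as below; `knn_error` and `query_time` are hypothetical callables standing for measurements on the m sampled points, not functions from the paper.

```python
def choose_K_L(K_values, L_values, knn_error, query_time, eps=0.05):
    """For each K find the smallest L whose measured error is below eps,
    then keep the (K, L) pair with the smallest measured running time.

    knn_error(K, L)  -> approximation error of the LSH neighbors on a sample
    query_time(K, L) -> average query time on the same sample
    """
    best = None
    for K in K_values:
        L_K = next((L for L in sorted(L_values) if knn_error(K, L) <= eps), None)
        if L_K is None:
            continue                       # no L meets the error constraint
        t = query_time(K, L_K)
        if best is None or t < best[2]:
            best = (K, L_K, t)
    return best                            # (K, L(K), t[K, L(K)])
```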

Data driven partitions

• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.

(Figure: points-per-bucket distribution for uniform vs. data-driven cuts.)
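A data-driven cut can be drawn as follows (again a sketch of mine, intended as a drop-in replacement for the uniform cut values in the partition-building sketch above).

```python
import numpy as np

def data_driven_cuts(X, K, rng=None):
    """K cuts (d_k, v_k): pick a random data point and use its d_k-th coordinate.

    Cells then follow the empirical distribution of the data, so points are
    spread more evenly over buckets than with uniform-at-random cut values.
    """
    rng = rng or np.random.default_rng()
    n, d = X.shape
    dims = rng.integers(0, d, size=K)
    rows = rng.integers(0, n, size=K)
    vals = X[rows, dims]
    return dims, vals
```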

Additional speedup

Assume that all points in C will converge to the same mode (C acts like a type of aggregate).

Speedup results

65,536 points; 1,638 points sampled; k = 100.

Food for thought

(Figure: low dimension vs. high dimension.)

A thought for food…
• Choose K, L by sample learning, or take the traditional values.
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN…

15:30: cookies…

Summary

• LSH trades some accuracy for a gain in complexity.
• Applications that involve massive data in high dimensions require LSH's fast performance.
• Extensions of LSH to different spaces (PSH).
• Learning the LSH parameters and hash functions for different applications.

Conclusion

• ...but at the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code, under Red Hat Linux)

Thanks

• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 69: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Example based learning

bull Construct a database of example images with their known angles

bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the

query

Input queryFind KNN in database of examples

Output Average angles of KNN

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 70: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Input Query

Features extraction

Processed query

PSH (LSH)

Database of examples

The algorithm flow

LWR (Regression)

Output Match

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 71: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

The image features

B A

Axx 4107 )(

4

3

2

4 0

Image features are multi-scale edge histograms

Feature Extraction PSH LWR

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 72: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

PSH The basic assumption

There are two metric spaces here feature space ( )

and parameter space ( )

We want similarity to be measured in the angles

space whereas LSH works on the feature space

bull Assumption The feature space is closely related to the parameter space

xd

d

Feature Extraction PSH LWR

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 73: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Insight Manifolds

bull Manifold is a space in which every point has a neighborhood resembling a Euclid space

bull But global structure may be complicated curved

bull For example lines are 1D manifolds planes are 2D manifolds etc

Feature Extraction PSH LWR

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivation

• Clustering high-dimensional data by using local density measurements (e.g., in feature space).

• Statistical curse of dimensionality: sparseness of the data.

• Computational curse of dimensionality: expensive range queries.

• LSH parameters should be adjusted for optimal performance.

Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer

Outline

• Mean-shift in a nutshell + examples

Our scope:

• Mean-shift in high dimensions – using LSH

• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure

Mean-Shift in a Nutshell

(Figure: a window of radius "bandwidth" around a point; mean-shift iteratively moves the point toward the mean of the data inside the window.)

KNN in mean-shift

• The bandwidth should be inversely proportional to the density in the region: high density → small bandwidth, low density → large bandwidth.

• Based on the kth nearest neighbor of the point: the bandwidth is the distance from the point to its kth nearest neighbor, h_i = ||x_i − x_{i,k}||.
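
A hedged sketch of that idea (flat kernel, Euclidean distance, a single bandwidth per query; the paper's adaptive estimator differs in these details, and all names are illustrative):

import numpy as np

def knn_bandwidths(X, k):
    """h_i = distance from x_i to its k-th nearest neighbor (pilot bandwidths)."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))   # pairwise distances
    return np.sort(d, axis=1)[:, k]                               # column 0 is the point itself

def mean_shift_mode(x, X, h, iters=30):
    """Iterate x toward a density mode using a flat kernel of radius h."""
    for _ in range(iters):
        window = X[np.linalg.norm(X - x, axis=1) <= h]            # points inside the window
        if len(window) == 0:
            break
        x_new = window.mean(axis=0)                               # mean-shift update
        if np.allclose(x_new, x):
            break
        x = x_new
    return x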

Adaptive mean-shift vs non-adaptive

Image segmentation algorithm:
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y).
2. Resolution is controlled by the bandwidths h_s (spatial) and h_r (color).
3. Apply filtering.

(Figure: a 3D view of the joint spatial–range feature space.)

Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
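
For concreteness, a sketch of building those joint-domain feature vectors (assuming an image stored as a NumPy array; names are illustrative):

import numpy as np

def joint_features(image, h_s, h_r):
    """Per-pixel (x, y, color) vectors, normalized by the spatial/range bandwidths."""
    H, W = image.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    spatial = np.stack([xs, ys], axis=-1).reshape(-1, 2) / h_s    # 2 spatial coordinates
    colors = image.reshape(H * W, -1) / h_r                       # 3 color (or 1 gray) values
    return np.hstack([spatial, colors])                           # n x 5 (or n x 3) feature points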

Image segmentation algorithm

(Figure: original, filtered, and segmented images.)

Filtering: each pixel takes the value of its nearest mode.

(Figure: mean-shift trajectories in feature space.)

Filtering examples

(Figures: original vs. filtered squirrel image; original vs. filtered baboon image.)

Segmentation examples

Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02

Mean-shift in high dimensions

• Computational curse of dimensionality: expensive range queries → implemented with LSH.

• Statistical curse of dimensionality: sparseness of the data → variable bandwidth.

LSH-based data structure

• Choose L random partitions. Each partition includes K pairs (d_k, v_k).

• For each point, check the K inequalities x_{d_k} ≤ v_k (one per pair).

• This partitions the data into cells.
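
A minimal sketch of that structure (names are illustrative; a real implementation would hash the resulting bit-vectors into tables rather than use them directly as keys):

import numpy as np
from collections import defaultdict

class LSHPartitions:
    """L random partitions; each uses K (dimension, cut value) pairs to define cells."""

    def __init__(self, X, K, L, seed=0):
        rng = np.random.default_rng(seed)
        n, d = X.shape
        self.cuts = []                                    # per partition: (dims, vals)
        self.tables = [defaultdict(list) for _ in range(L)]
        for table in self.tables:
            dims = rng.integers(0, d, size=K)             # which coordinate each cut tests
            vals = rng.uniform(X.min(0)[dims], X.max(0)[dims])   # cut values in the data range
            self.cuts.append((dims, vals))
            for i, x in enumerate(X):
                table[self._cell(x, dims, vals)].append(i)

    @staticmethod
    def _cell(x, dims, vals):
        return tuple(x[dims] <= vals)                     # K-bit cell label

    def neighbors(self, q):
        """Union of the L cells that contain the query q (candidate neighbor indices)."""
        out = set()
        for (dims, vals), table in zip(self.cuts, self.tables):
            out.update(table[self._cell(q, dims, vals)])
        return out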

Choosing the optimal K and L

• For a query q, compute the smallest number of distances to points in its buckets.

• Large K → a smaller number of points in a cell C_l.

• If L is too small, points might be missed; but if L is too big, extra points might be included.

• Roughly, a single cell is expected to hold N_{C_l} ≈ n / (K/d + 1)^d points, so the union over the L partitions holds at most N_{C_∪} ≤ L · N_{C_l}.

• As L increases, C_∪ increases but C_∩ decreases; C_∩ determines the resolution of the data structure.

Choosing optimal K and L

• Determine accurately the KNN for m randomly-selected data points, and their k-NN distance (bandwidth).

• Choose an error threshold ε.

• The optimal K and L should satisfy: the approximate distance returned by the LSH structure exceeds the true distance by at most the allowed error on those sample points.

Choosing optimal K and L

• For each K, estimate the error for every L.

• In one run over all L's, find the minimal L satisfying the constraint: L(K).

• Minimize the running time t(K, L(K)).

(Figures: approximation error as a function of K and L; L(K) for ε = 0.05; running time t[K, L(K)] with its minimum marked.)
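
A hedged sketch of that selection loop, reusing the LSHPartitions sketch above (the exact error criterion, timing method, and names are illustrative assumptions):

import time
import numpy as np

def knn_dist_exact(X, q, k):
    """Distance from q (a data point) to its k-th nearest neighbor, exhaustively."""
    return np.sort(np.linalg.norm(X - q, axis=1))[k]

def knn_dist_lsh(lsh, X, q, k):
    """Same distance, but only over the candidates returned by the LSH structure."""
    cand = sorted(lsh.neighbors(q))
    if not cand:
        return np.inf
    d = np.sort(np.linalg.norm(X[cand] - q, axis=1))
    return d[min(k, len(d) - 1)]

def choose_K_L(X, k, K_candidates, L_max, eps=0.05, m=50, seed=0):
    """For each K, find the minimal L meeting the error bound, then minimize query time."""
    rng = np.random.default_rng(seed)
    sample = X[rng.integers(0, len(X), size=m)]
    exact = np.array([knn_dist_exact(X, q, k) for q in sample])
    best = None                                        # (time, K, L)
    for K in K_candidates:
        for L in range(1, L_max + 1):
            lsh = LSHPartitions(X, K, L)               # structure from the earlier sketch
            approx = np.array([knn_dist_lsh(lsh, X, q, k) for q in sample])
            if np.all(approx <= (1.0 + eps) * exact):  # error constraint satisfied
                t0 = time.perf_counter()
                for q in sample:
                    knn_dist_lsh(lsh, X, q, k)
                t = time.perf_counter() - t0
                if best is None or t < best[0]:
                    best = (t, K, L)
                break                                  # this is the minimal L for this K
    return (best[1], best[2]) if best else None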

Data-driven partitions

• In the original LSH, cut values are random in the range of the data.

• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.

(Figure: number of points per bucket, uniform cuts vs. data-driven cuts.)
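
As a sketch, the only change to the earlier LSHPartitions sketch is how the cut values are drawn (illustrative):

def data_driven_cuts(X, dims, rng):
    """Cut values copied from coordinates of randomly chosen data points,
    so bucket boundaries follow the data density rather than its range."""
    rows = rng.integers(0, len(X), size=len(dims))   # one random data point per cut
    return X[rows, dims]                             # its coordinate along the cut dimension

# usage inside LSHPartitions.__init__:
#     vals = data_driven_cuts(X, dims, rng)          # instead of vals = rng.uniform(...)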

Additional speedup

• Assume that all points in the same cell C will converge to the same mode (C acts like a type of an aggregate).
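
A hedged sketch of the flavor of that shortcut (the paper's actual aggregation criterion differs; this reuses the LSHPartitions and mean_shift_mode sketches above, and all names are illustrative):

import numpy as np

def cluster_with_cell_shortcut(X, lsh, h, tol=1e-3):
    """Assign each point to a mode, letting all unlabeled points of a cell
    inherit the mode found for one representative of that cell."""
    labels = np.full(len(X), -1)
    mode_ids = {}
    for table in lsh.tables:
        for members in table.values():
            pending = [i for i in members if labels[i] < 0]
            if not pending:
                continue
            mode = mean_shift_mode(X[pending[0]], X, h)        # trace one representative
            key = tuple(np.round(mode / tol).astype(int))      # merge nearly identical modes
            mode_ids.setdefault(key, len(mode_ids))
            labels[pending] = mode_ids[key]                    # whole cell inherits it
    return labels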

Speedup results

(Table: 65,536 points, 1,638 points sampled, k = 100.)

Food for thought

(Figure: low dimension vs. high dimension.)

A thought for food…

• Choose K, L by sample learning, or take the traditional values.

• Can one estimate K, L without sampling?

• A thought for food: does it help to know the data dimensionality or the data manifold?

• Intuitively, the dimensionality implies the number of hash functions needed.

• The catch: efficient dimensionality learning requires KNN.

15:30 cookies…

Summary

• LSH trades some accuracy for a gain in complexity.

• Applications that involve massive data in high dimension require the fast performance of LSH.

• Extension of LSH to different spaces (PSH).

• Learning the LSH parameters and hash functions for different applications.

Conclusion

• … but at the end, everything depends on your data set.

• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test it over your own data (C code, under Red Hat Linux).

Thanks

• Ilan Shimshoni (Haifa)

• Mohamad Hegaze (Weizmann)

• Alex Andoni (MIT)

• Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 74: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Parameters Space (angles)

Feature Space

q

Is this Magic

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 75: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Parameter Sensitive Hashing (PSH)

The trick

Estimate performance of different hash functions on examples and select those sensitive to

The hash functions are applied in feature space but the KNN are valid in angle space

d

Feature Extraction PSH LWR

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 76: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Label pairs of examples with similar angles

Define hash functions h on feature space

Feature Extraction PSH LWR

Predict labeling of similarnon-similar examples by using h

Compare labeling

If labeling by h is goodaccept h else change h

PSH as a classification problem

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 77: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

+1 +1 -1 -1

(r=025)

Labels

)1()( if 1

)( if 1y

labeled is

)x()(x examples ofpair A

ij

ji

rd

rd

ji

ji

ji

Feature Extraction PSH LWR

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 78: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

otherwise 1-

T(x) if 1)(

xh T

A binary hash functionfeatures

otherwise 1

if 1ˆ

labels ePredict th

)(xh)(xh)x(xy

jTiTjih

Feature Extraction PSH LWR

Feature

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

Feature Extraction PSH LWR

sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind

themseparateor bin

same in the examplesboth place willTh

)(xT

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 80: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Local Weighted Regression (LWR)bull Given a query image PSH returns

KNNs

bull LWR uses the KNN to compute a weighted average of the estimated angles of the query

weightdist

iXiixNx

xxdKxgdi

0)(

))(())((minarg0

Feature Extraction PSH LWR

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 81: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Results

Synthetic data were generated

bull 13 angles 1 for rotation of the torso 12 for joints

bull 150000 images

bull Nuisance parameters added clothing illumination face expression

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 82: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

bull 1775000 example pairs

bull Selected 137 out of 5123 meaningful features (how)

18 bit hash functions (k) 150 hash tables (l)

bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query

bull Without selection needed 40 bits and

1000 hash tables

Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 83: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Results ndash real data

bull 800 images

bull Processed by a segmentation algorithm

bull 13 of the data were searched

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 84: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Results ndash real data

Interesting mismatches

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 85: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Interesting mismatches

Fast pose estimation - summary
• A fast way to compute the joint angles of a human body figure
• Moving from one representation space to another
• Training a parameter-sensitive hash function
• KNN smart averaging

Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted

Food for Thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn
• Goal: preprocess the points in P so that, for a query q, we can find a point pi whose sphere 'covers' the query, i.e. ||q − pi|| ≤ ri
(Figure: a query q covered by the sphere of radius ri around pi.)
Courtesy of Mohamad Hegaze

Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in a feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance

Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer

Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure

Mean-Shift in a Nutshell
(Illustration: a window of the chosen bandwidth, centered at a point, is shifted toward the mean of the points falling inside it.)
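The shift step can be sketched as follows; this is an illustrative flat-kernel version, not the paper's implementation, and the function name, iteration cap and tolerance are our own choices.

import numpy as np

def mean_shift_point(x, points, h, iters=50, tol=1e-5):
    # Repeatedly move x to the mean of the points inside a window of
    # radius h around it (flat kernel); stop when the shift becomes tiny.
    for _ in range(iters):
        inside = points[np.linalg.norm(points - x, axis=1) <= h]
        if len(inside) == 0:
            break
        new_x = inside.mean(axis=0)
        if np.linalg.norm(new_x - x) < tol:
            break
        x = new_x
    return x  # the mode this starting point converges to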

KNN in mean-shift
• The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth
• It is based on the kth nearest neighbor of the point: the bandwidth is the distance from the point to its kth nearest neighbor

Adaptive mean-shift vs non-adaptive
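A rough sketch of that adaptive rule, assuming the bandwidth is simply the Euclidean distance to the kth neighbor (the paper may use a different norm):

import numpy as np

def adaptive_bandwidths(points, k):
    # h_i = distance from x_i to its k-th nearest neighbor, so dense
    # regions get small windows and sparse regions get large ones.
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    return np.sort(d, axis=1)[:, k]  # column k skips the zero self-distance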


Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths hs (spatial) and hr (color)
3. Apply filtering
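One way the joint feature vectors might be assembled; the scaling by hs and hr is an illustrative convention for using a single metric on both parts, not necessarily the slide's exact normalization.

import numpy as np

def joint_features(img, hs, hr):
    # Each pixel becomes a joint spatial-range vector: (range / hr, x / hs, y / hs).
    # img: (H, W) gray or (H, W, 3) color; output: (H*W, 3) or (H*W, 5).
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    rng = img.reshape(h * w, -1) / hr
    spa = np.column_stack([xs.ravel(), ys.ravel()]) / hs
    return np.column_stack([rng, spa])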

(3D illustration)
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02

Image segmentation algorithm
(Figures: original, filtered, and segmented images; mean-shift trajectories in the feature space.)
Filtering: each pixel takes the value of its nearest mode.

Filtering examples
(Figures: original vs. filtered – squirrel, baboon.)
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02

Segmentation examples
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02

Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries – implemented with LSH
• Statistical curse of dimensionality: sparseness of the data – handled with a variable bandwidth

LSH-based data structure
• Choose L random partitions; each partition consists of K pairs (d_k, v_k)
• For each point x, test x_{d_k} ≤ v_k for every one of the K pairs; the K boolean results form the point's key in that partition
• This partitions the data into cells (a sketch of such a structure follows below)
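A minimal sketch of such a structure; the names are illustrative, and the cut values here are drawn uniformly over the data range (the "original LSH" choice that the data-driven variant below replaces).

import numpy as np

def build_lsh(points, K, L, rng=None):
    # L partitions; each partition is K (coordinate, cut value) pairs.
    # A point's key in a partition is the K-bit pattern of x[d_k] <= v_k.
    rng = rng or np.random.default_rng(0)
    n, d = points.shape
    lo, hi = points.min(axis=0), points.max(axis=0)
    tables = []
    for _ in range(L):
        dims = rng.integers(0, d, size=K)
        cuts = rng.uniform(lo[dims], hi[dims])  # random cuts in the data range
        buckets = {}
        for i, x in enumerate(points):
            buckets.setdefault(tuple(x[dims] <= cuts), []).append(i)
        tables.append((dims, cuts, buckets))
    return tables

def neighbors_union(tables, q):
    # Candidate set: union of the query's buckets over the L partitions.
    cand = set()
    for dims, cuts, buckets in tables:
        cand.update(buckets.get(tuple(q[dims] <= cuts), ()))
    return cand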

Choosing the optimal K and L
• For a query q, we want to compute the smallest possible number of distances to the points in its buckets, without missing its true neighbors

• Large K: fewer points in each cell
• If L is too small, neighbor points might be missed; but if L is too big, the query might include extra points
• As L increases, the union of the query's cells C∪ grows while their intersection C∩ shrinks; C∩ determines the resolution of the data structure

Choosing optimal K and L
• Determine accurately the k-NN distance (the bandwidth) for m randomly-selected data points
• Choose an error threshold ε
• The optimal K and L should keep the approximate k-NN distance within that error threshold of the true distance
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K); then minimize the running time t(K, L(K)) (a sketch of this loop follows below)
(Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] and its minimum.)
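A rough sketch of the sample-based selection loop; it builds on build_lsh and neighbors_union from the sketch above, and the time proxy, grid values and defaults are illustrative choices, not the paper's.

import numpy as np

# Uses build_lsh(...) and neighbors_union(...) from the sketch above.

def knn_error(points, tables, sample_idx, k):
    # Mean relative error of the approximate k-NN distance over the sample.
    errs = []
    for i in sample_idx:
        q = points[i]
        true_d = np.sort(np.linalg.norm(points - q, axis=1))[k]
        cand = list(neighbors_union(tables, q))
        if len(cand) <= k:
            errs.append(1.0)  # too few candidates: count as a miss
        else:
            approx_d = np.sort(np.linalg.norm(points[cand] - q, axis=1))[k]
            errs.append(abs(approx_d - true_d) / true_d)
    return float(np.mean(errs))

def choose_K_L(points, k=10, m=50, eps=0.05, K_grid=(4, 8, 12), L_max=30):
    rng = np.random.default_rng(0)
    sample = rng.choice(len(points), size=min(m, len(points)), replace=False)
    best = None
    for K in K_grid:
        for L in range(1, L_max + 1):  # smallest L meeting the error bound
            tables = build_lsh(points, K, L)
            if knn_error(points, tables, sample, k) <= eps:
                # crude proxy for query time: hash evaluations + candidate distances
                t = L * K + np.mean([len(neighbors_union(tables, points[i])) for i in sample])
                if best is None or t < best[0]:
                    best = (t, K, L)
                break
    return best  # (time proxy, K, L)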

Data-driven partitions
• In the original LSH, cut values are drawn at random over the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
(Figure: points-per-bucket distribution – uniform cuts vs. data-driven cuts.)
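The data-driven variant only changes how the cut values are drawn; a sketch under the same assumptions as the build_lsh sketch above:

import numpy as np

def build_lsh_data_driven(points, K, L, rng=None):
    # Same structure as build_lsh above, but each cut value is a coordinate
    # of a randomly chosen data point, so buckets follow the data density.
    rng = rng or np.random.default_rng(0)
    n, d = points.shape
    tables = []
    for _ in range(L):
        dims = rng.integers(0, d, size=K)
        cuts = points[rng.integers(0, n, size=K), dims]  # data-driven cut values
        buckets = {}
        for i, x in enumerate(points):
            buckets.setdefault(tuple(x[dims] <= cuts), []).append(i)
        tables.append((dims, cuts, buckets))
    return tables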

Additional speedup
• Assume that all the points falling in the same intersection cell C∩ converge to the same mode (C∩ acts like an aggregate), so the mean-shift iterations need to be run only once per cell rather than once per point
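A rough sketch of how that assumption can be exploited; the grouping key and the choice of the cell centroid as the starting point are our illustrative choices, not taken from the paper. It builds on build_lsh and mean_shift_point from the sketches above.

import numpy as np

# Builds on build_lsh(...) and mean_shift_point(...) from the sketches above.

def modes_by_cell(points, tables, h):
    # Group points by their full key across all L partitions (the cell C-cap),
    # run mean-shift once from each cell's centroid, and give that mode
    # to every point in the cell.
    cells = {}
    for i, x in enumerate(points):
        key = tuple(tuple(x[dims] <= cuts) for dims, cuts, _ in tables)
        cells.setdefault(key, []).append(i)
    modes = np.empty_like(points)
    for idx in cells.values():
        mode = mean_shift_point(points[idx].mean(axis=0), points, h)
        modes[idx] = mode
    return modes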

Speedup results
(65,536 points; 1,638 sampled points; k = 100)

Food for thought
(Low dimension vs. high dimension)

A thought for food…
• Choose K and L by sample learning, or take the traditional values
• Can one estimate K and L without sampling?
• Does it help to know the data dimensionality, or the data manifold? Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning itself requires KNN
15:30: cookies…

Summary
• LSH trades some accuracy for a large gain in complexity
• Applications that involve massive data in high dimension require the fast performance of LSH
• The LSH idea extends to different spaces (e.g. PSH)
• The LSH parameters and hash functions can be learned for different applications
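For the L2 case covered earlier in the talk, the p-stable hash family is h_{a,b}(v) = floor((a·v + b) / w) with a Gaussian (2-stable) vector a and b uniform in [0, w). A tiny illustration, where w, k, the dimension and the test vectors are arbitrary choices:

import numpy as np

rng = np.random.default_rng(1)

def make_l2_hash(d, k, w=4.0):
    # k concatenated p-stable hashes: floor((a . v + b) / w), a ~ N(0, I).
    A = rng.normal(size=(k, d))
    b = rng.uniform(0.0, w, size=k)
    return lambda v: tuple(np.floor((A @ v + b) / w).astype(int))

g = make_l2_hash(d=64, k=8)
x = rng.normal(size=64)
y = x + 0.01 * rng.normal(size=64)  # a very close neighbor
print(g(x) == g(y))                 # nearby points usually collide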

Conclusion
• …but in the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test it over your own data (C code, under Red Hat Linux)

Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 86: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Fast pose estimation - summary

bull Fast way to compute the angles of human body figure

bull Moving from one representation space to another

bull Training a sensitive hash function

bull KNN smart averaging

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 87: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Food for Thought

bull The basic assumption may be problematic (distance metric representations)

bull The training set should be dense

bull Texture and clutter

bull General some features are more important than others and should be weighted

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 88: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Food for Thought Point Location in Different Spheres (PLDS)

bull Given n spheres in Rd centered at P=p1hellippn

with radii r1helliprn

bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q

qpi

ri

Courtesy of Mohamad Hegaze

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 89: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Motivationbull Clustering high dimensional data by using local

density measurements (eg feature space)bull Statistical curse of dimensionality

sparseness of the databull Computational curse of dimensionality

expensive range queriesbull LSH parameters should be adjusted for optimal

performance

Mean-Shift Based Clustering in High Dimensions A Texture Classification Example

B Georgescu I Shimshoni and P Meer

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 90: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Outline

bull Mean-shift in a nutshell + examples

Our scope

bull Mean-shift in high dimensions ndash using LSH

bull Speedups1 Finding optimal LSH parameters

2 Data-driven partitions into buckets

3 Additional speedup by using LSH data structure

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 91: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Mean-Shift in a Nutshellbandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

point

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality

Statistical curse of dimensionality

Expensive range queries implemented with LSH

Sparseness of the data variable bandwidth

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

LSH-based data structure

bull Choose L random partitionsEach partition includes K pairs

(dkvk)bull For each point we check

kdi vxK

It Partitions the data into cells

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing the optimal K and L

bull For a query q compute smallest number of distances to points in its buckets

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

points extra includemight big toois L ifbut

missed bemight points small toois L If

cell ain points ofnumber smaller k Large

C

l

l

CC

dC

LNN

dKnN

)1(

C

C

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

structure data theof resolution thedetermines

decreases but increases increases L As

C

CC

Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points

distance (bandwidth)

Choose error threshold

The optimal K and L should satisfy

the approximate distance

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 92: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region

high density - small bandwidth low density - large bandwidth

Based on kth nearest neighbor of the point

The bandwidth is

Adaptive mean-shift vs non-adaptive

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

3D

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Image segmentation algorithm

original segmented

filtered

Filtering pixel value of the nearest mode

Mean-shift trajectories

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

original squirrel filtered

original baboon filtered

Filtering examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Segmentation examples

Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo

Mean-shift in high dimensions

Computational curse of dimensionality: expensive range queries, implemented with LSH

Statistical curse of dimensionality: sparseness of the data, handled with a variable bandwidth


LSH-based data structure

• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point we check whether x_{d_k} ≤ v_k, k = 1, …, K
• This partitions the data into cells
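A rough sketch of such a structure, under the assumption of binary cuts of the form x[d_k] ≤ v_k; build_lsh and query_union are hypothetical helper names, not the original C implementation:

import numpy as np
from collections import defaultdict

def build_lsh(X, K, L, seed=0):
    """L partitions, each defined by K (dimension, cut-value) pairs; a point's key in a
    partition is the Boolean vector of tests x[d_k] <= v_k, which assigns it to a cell."""
    rng = np.random.default_rng(seed)
    n, dim = X.shape
    lo, hi = X.min(axis=0), X.max(axis=0)
    tables, cuts = [], []
    for _ in range(L):
        dims = rng.integers(0, dim, size=K)
        vals = rng.uniform(lo[dims], hi[dims])      # original LSH: cuts uniform in the data range
        keys = (X[:, dims] <= vals).astype(np.uint8)
        table = defaultdict(list)
        for i, key in enumerate(map(tuple, keys)):
            table[key].append(i)
        tables.append(table)
        cuts.append((dims, vals))
    return tables, cuts

def query_union(q, tables, cuts):
    """Approximate neighborhood of a query q: the union of its cells over the L partitions."""
    idx = set()
    for table, (dims, vals) in zip(tables, cuts):
        key = tuple((q[dims] <= vals).astype(np.uint8))
        idx |= set(table.get(key, []))
    return idx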


Choosing the optimal K and L

• For a query q, compute the smallest number of distances to points in its buckets


• Large K ⇒ a smaller number of points in a cell
• If L is too small, points might be missed, but if L is too big it might include extra points
• C∩ determines the resolution of the data structure
• As L increases, C∪ increases but C∩ decreases
(C∪: the union of the query's cells over the L partitions; C∩: their intersection)

Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points → the distance (bandwidth) d_K^{nn}
• Choose an error threshold ε
• The optimal K and L should satisfy that the approximate distance d_K^{C∪} is within (1 + ε) d_K^{nn}


Choosing optimal K and L
• For each K, estimate the error for the m sampled points
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))

[Plots: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)]; minimum]
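A possible sample-based selection loop along these lines, reusing the hypothetical build_lsh / query_union helpers from the sketch above; the grids, m, and the timing-based tie-breaking are illustrative assumptions rather than the published procedure:

import time
import numpy as np

def knn_dist(X, q, k):
    """Exact distance from q to its k-th nearest neighbor (brute force)."""
    return np.sort(np.linalg.norm(X - q, axis=1))[k]

def choose_K_L(X, k=100, m=50, eps=0.05, K_grid=(10, 20, 30), L_grid=range(1, 40), seed=0):
    """For each K, find the minimal L whose approximate k-NN distance is within (1+eps)
    of the exact one on m random queries; keep the (K, L(K)) pair with the lowest query time."""
    rng = np.random.default_rng(seed)
    samples = X[rng.integers(0, len(X), size=m)]
    exact = np.array([knn_dist(X, q, k) for q in samples])
    best = None
    for K in K_grid:
        for L in L_grid:
            tables, cuts = build_lsh(X, K, L, seed)
            t0 = time.perf_counter()
            approx = []
            for q in samples:
                idx = list(query_union(q, tables, cuts))
                d = np.sort(np.linalg.norm(X[idx] - q, axis=1))
                approx.append(d[k] if len(d) > k else np.inf)
            dt = time.perf_counter() - t0
            err = np.mean(np.array(approx) / exact - 1.0)
            if err <= eps:                       # minimal L satisfying the constraint: L(K)
                if best is None or dt < best[2]:
                    best = (K, L, dt)
                break                            # move on to the next K
    return best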


Data driven partitions

• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value

[Figure: bucket point distribution - uniform vs. data-driven cuts]
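As a sketch, the data-driven variant only changes how the cut values are drawn; the helper below is a hypothetical drop-in replacement for the uniform draw in the build_lsh sketch above:

import numpy as np

def data_driven_cuts(X, dims, rng):
    """Cut values taken from the data itself: for each chosen dimension, use the coordinate
    of a randomly selected data point, so buckets follow the point distribution."""
    pts = rng.integers(0, len(X), size=len(dims))
    return X[pts, dims]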


Additional speedup

• Assume that all points in C∩ will converge to the same mode (C∩ is like a type of an aggregate)
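A small sketch of how this assumption can be exploited: run the mean-shift iteration once per intersection cell and reuse its mode for all points in that cell (function and argument names are hypothetical):

def modes_by_cell(X, cell_ids, mean_shift_from):
    """cell_ids[i] is a hashable id of point i's intersection cell (e.g. its tuple of keys
    over all L partitions); mean_shift_from is any callable running mean-shift from a point."""
    cache = {}
    modes = [None] * len(X)
    for i, cell in enumerate(cell_ids):
        if cell not in cache:
            cache[cell] = mean_shift_from(X[i])   # computed once per cell
        modes[i] = cache[cell]
    return modes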


Speedup results

65,536 points; 1,638 points sampled; k = 100

Food for thought

Low dimension vs. high dimension

A thought for food…
• Choose K, L by sample learning, or take the traditional
• Can one estimate K, L without sampling?
• A thought for food: does it help to know the data dimensionality or the data manifold?
• Intuitively, dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN

15:30 cookies…

Summary

• LSH suggests a compromise: some accuracy is traded for a gain in complexity

• Applications that involve massive data in high dimensions require the fast performance of LSH

• Extension of LSH to different spaces (PSH)

• Learning the LSH parameters and hash functions for different applications

Conclusion

• But at the end, everything depends on your data set

• Try it at home
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni: andoni@mit.edu
  – Test over your own data (C code, under Red Hat Linux)

Thanks

• Ilan Shimshoni (Haifa)

• Mohamad Hegaze (Weizmann)

• Alex Andoni (MIT)

• Mica and Denis


A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 103: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))

minimum

Approximationerror for KL

L(K) for =005 Running timet[KL(K)]

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 104: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Data driven partitions

bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value

uniform data driven pointsbucket distribution

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 105: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Additional speedup

aggregate)an of typea like is (C mode same

the toconverge willCin points all that Assume

Mean-shift LSH optimal kl LSH data partition

LSH LSH data struct

C

C

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 106: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Speedup results

65536 points 1638 points sampled k=100

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 107: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Food for thought

Low dimension High dimension

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 108: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data

dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash

functions neededbull The catch efficient dimensionality learning requires

KNN

1530 cookieshellip

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 109: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Summary

bull LSH suggests a compromise on accuracy for the gain of complexity

bull Applications that involve massive data in high dimension require the LSH fast performance

bull Extension of the LSH to different spaces (PSH)

bull Learning the LSH parameters and hash functions for different applications

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 110: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Conclusion

bull but at the endeverything depends on your data set

bull Try it at homendash Visit

httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data

(C code under Red Hat Linux )

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks
Page 111: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are

Thanks

bull Ilan Shimshoni (Haifa)

bull Mohamad Hegaze (Weizmann)

bull Alex Andoni (MIT)

bull Mica and Denis

  • k-Nearest Neighbors Search in High Dimensions
  • Outline
  • Nearest Neighbor Search Problem definition
  • Applications
  • Naiumlve solution
  • Common solution
  • When to use nearest neighbor
  • Nearest Neighbor
  • r - Nearest Neighbor
  • Slide 10
  • The simplest solution
  • Quadtree
  • Quadtree - structure
  • Quadtree - Query
  • Quadtree ndash Pitfall1
  • Slide 16
  • Quadtree ndash pitfall 2
  • Space partition based algorithms
  • Slide 19
  • Curse of dimensionality
  • Curse of dimensionality Some intuition
  • Slide 22
  • Preview
  • Hash function
  • Slide 25
  • Slide 26
  • Recall r - Nearest Neighbor
  • Locality sensitive hashing
  • Slide 29
  • Hamming Space
  • Slide 31
  • L1 to Hamming Space Embedding
  • Slide 33
  • Construction
  • Query
  • Alternative intuition random projections
  • Slide 37
  • Slide 38
  • Slide 39
  • k samplings
  • Repeating
  • Repeating L times
  • Slide 43
  • Secondary hashing
  • The above hashing is locality-sensitive
  • Slide 46
  • Direct L2 solution
  • Central limit theorem
  • Slide 49
  • Slide 50
  • Norm Distance
  • Slide 52
  • The full Hashing
  • Slide 54
  • Slide 55
  • Slide 56
  • Generalization P-Stable distribution
  • P-Stable summary
  • Parameters selection
  • Parameters selection hellip
  • hellip Parameters selection
  • Pros amp Cons
  • Conclusion
  • LSH - Applications
  • Motivation
  • Slide 66
  • Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
  • Ingredients
  • Example based learning
  • Slide 70
  • The image features
  • PSH The basic assumption
  • Insight Manifolds
  • Slide 74
  • Parameter Sensitive Hashing (PSH)
  • Slide 76
  • Slide 77
  • Slide 78
  • Slide 79
  • Local Weighted Regression (LWR)
  • Results
  • Slide 82
  • Results ndash real data
  • Slide 84
  • Slide 85
  • Fast pose estimation - summary
  • Food for Thought
  • Food for Thought Point Location in Different Spheres (PLDS)
  • Motivation
  • Slide 90
  • Slide 91
  • Slide 92
  • Slide 93
  • Image segmentation algorithm
  • Slide 95
  • Filtering examples
  • Segmentation examples
  • Mean-shift in high dimensions
  • LSH-based data structure
  • Choosing the optimal K and L
  • Slide 101
  • Choosing optimal K and L
  • Slide 103
  • Data driven partitions
  • Additional speedup
  • Speedup results
  • Food for thought
  • A thought for foodhellip
  • Summary
  • Slide 110
  • Thanks