Approximate Nearest Neighbors and the Fast Johnson-Lindenstrauss Transform. Nir Ailon, Bernard Chazelle (Princeton University).


Posted on 20-Dec-2015


Approximate Nearest Neighbors and the
Fast Johnson-Lindenstrauss Transform

Nir Ailon, Bernard Chazelle (Princeton University)

Dimension Reduction

- Algorithmic metric embedding technique
- (R^d, L_q) → (R^k, L_p), with k << d
- Useful in algorithms requiring exponential (in d) time/space
- Johnson-Lindenstrauss for L_2
- What is the exact complexity?

Dimension Reduction Applications

- Approximate nearest neighbor [KOR00, IM98]
- Text analysis [PRTV98]
- Clustering [BOR99, S00]
- Streaming [I00]
- Linear algebra [DKM05, DKM06]: matrix multiplication, SVD computation, L_2 regression
- VLSI layout design [V98]
- Learning [AV99, D99, V98] . . .

Three Quick Slides on: Approximate Nearest Neighbor Searching . . .

Approximate Nearest Neighbor

P = set of n points. Given a query x, return some p ∈ P with

dist(x, p) ≤ (1+ε) · dist(x, p_min),

where p_min is the exact nearest neighbor of x in P.
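Spelled out in code, the definition above amounts to the following brute-force check (a minimal sketch; the point set, query, and helper names are illustrative, not from the slides):

```python
import random

def dist(x, p):
    """Euclidean distance between two points given as coordinate lists."""
    return sum((xi - pi) ** 2 for xi, pi in zip(x, p)) ** 0.5

def is_approx_nn(x, p, P, eps):
    """True iff p satisfies dist(x, p) <= (1 + eps) * dist(x, p_min)."""
    exact = min(dist(x, q) for q in P)
    return dist(x, p) <= (1 + eps) * exact

random.seed(0)
P = [[random.gauss(0, 1) for _ in range(10)] for _ in range(100)]
x = [random.gauss(0, 1) for _ in range(10)]
p_min = min(P, key=lambda q: dist(x, q))
assert is_approx_nn(x, p_min, P, eps=0.1)  # the exact NN always qualifies
```

The point of the ε-relaxation is that any p passing this test is acceptable, which is what lets sublinear data structures avoid the exact search.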

Approximate Nearest Neighbor

- d can be very large
- ε-approximation beats the "curse of dimensionality"
- [IM98, H01] (Euclidean), [KOR00] (Cube):
  Time O(ε^-2 d log n), Space n^O(ε^-2)
- Bottleneck: dimension reduction
- Using FJLT: O(d log d + ε^-3 log^2 n)

The d-Hypercube Case [KOR00]

- Binary search on distance Δ ∈ [d]
- For distance Δ, multiply the space by a random matrix Φ ∈ Z_2^{k×d}, k = O(ε^-2 log n)
- Φ_ij i.i.d. ~ biased coin
- Preprocess lookup tables for Φx (mod 2)
- Our observation: Φ can be made sparse
- Using a "handle" to p ∈ P s.t. dist(x, p) ≤ Δ
- Time for each step: O(ε^-2 d log n) ⇒ O(d + ε^-2 log n)
- How to make a similar improvement for L_2?
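A minimal sketch of the idea above, with illustrative parameters (the bias, k, and the distance scale below are our own choices, not taken from [KOR00]): multiply the binary point by a random biased matrix over Z_2, so that points at small Hamming distance agree on most output parities.

```python
import random

def sketch(x, Phi):
    """y = Phi x (mod 2): k biased-coin parities of the point's coordinates."""
    return [sum(rj * xj for rj, xj in zip(row, x)) % 2 for row in Phi]

random.seed(1)
d, k, bias = 256, 64, 0.05   # illustrative; bias is tuned to the distance scale
Phi = [[1 if random.random() < bias else 0 for _ in range(d)] for _ in range(k)]

x = [random.randint(0, 1) for _ in range(d)]
y = x[:]
for j in random.sample(range(d), 8):   # flip 8 coordinates: Hamming distance 8
    y[j] ^= 1

# Sketches of nearby points agree on most parities
agree = sum(a == b for a, b in zip(sketch(x, Phi), sketch(y, Phi)))
assert agree >= k // 2
```

The lookup-table preprocessing in the slide amortizes the cost of computing Φx; the sparsity observation is what removes the ε^-2 d factor from each query step.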

Back to Euclidean Space and Johnson-Lindenstrauss . . .

History of Johnson-Lindenstrauss Dimension Reduction

[JL84]: Projection Φ of R^d onto a random subspace of dimension k = c ε^-2 log n. W.h.p.:

∀ p_i, p_j ∈ P:  ||Φ(p_i - p_j)||_2 = (1 ± O(ε)) ||p_i - p_j||_2

An L_2 → L_2 embedding.
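The guarantee is easy to observe empirically. The sketch below uses a plain Gaussian projection as a stand-in for the random-subspace construction (the dimensions and ε are illustrative, not the [JL84] constants):

```python
import math
import random

random.seed(2)
d, k, eps = 400, 128, 0.3   # illustrative; theory takes k = c * eps**-2 * log n

# Gaussian projection, scaled so squared norms are preserved in expectation
Phi = [[random.gauss(0, 1 / math.sqrt(k)) for _ in range(d)] for _ in range(k)]

def project(v):
    return [sum(rj * vj for rj, vj in zip(row, v)) for row in Phi]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

u = [random.gauss(0, 1) for _ in range(d)]
ratio = norm(project(u)) / norm(u)
assert abs(ratio - 1) < eps   # length distorted by at most 1 +- eps (w.h.p.)
```

Applying this to all n(n-1)/2 difference vectors p_i - p_j, with k as above, gives the pairwise-distance guarantee by a union bound.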

History of Johnson-Lindenstrauss Dimension Reduction

[FM87], [DG99]: Simplified proof, improved constant c.
Φ ∈ R^{k×d}: random orthogonal matrix, with rows Φ_1, ..., Φ_k satisfying ||Φ_i||_2 = 1 and Φ_i · Φ_j = 0.

History of Johnson-Lindenstrauss Dimension Reduction

[IM98]: Φ ∈ R^{k×d} with Φ_ij i.i.d. ~ N(0, 1/d), so for the rows Φ_1, ..., Φ_k: E ||Φ_i||_2^2 = 1 and E[Φ_i · Φ_j] = 0.

History of Johnson-Lindenstrauss Dimension Reduction

[A03]: Need only tight concentration of |Φ_i · v|^2.
Φ ∈ R^{k×d} with Φ_ij i.i.d. ~ {+1 w.p. 1/2, -1 w.p. 1/2}, scaled so that E ||Φ_i||_2^2 = 1 and E[Φ_i · Φ_j] = 0.

History of Johnson-Lindenstrauss Dimension Reduction

[A03]: Φ ∈ R^{k×d} with Φ_ij i.i.d. ~ sparse: {+1 w.p. 1/6, 0 w.p. 2/3, -1 w.p. 1/6}, again with E ||Φ_i||_2^2 = 1 and E[Φ_i · Φ_j] = 0.

Sparse Johnson-Lindenstrauss

- Sparsity parameter: s = Pr[ Φ_ij ≠ 0 ]
- s cannot be o(1), due to the "hidden coordinate" vector v = (0, ..., 0, 1, 0, ..., 0) ∈ R^d: a very sparse Φ will typically miss v's single nonzero coordinate entirely
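The hidden-coordinate obstruction is easy to reproduce. A sketch with illustrative parameters (the ±1 nonzero values and the sizes are our own choices):

```python
import random

random.seed(3)
d, k, s = 1024, 32, 0.01   # very sparse: each entry is nonzero only w.p. s

Phi = [[random.choice([-1, 1]) if random.random() < s else 0 for _ in range(d)]
       for _ in range(k)]

e1 = [1] + [0] * (d - 1)   # the "hidden coordinate" vector
image = [sum(rj * ej for rj, ej in zip(row, e1)) for row in Phi]

# Each output coordinate is nonzero only w.p. s, so the image mostly vanishes
nonzero = sum(v != 0 for v in image)
assert nonzero <= 2   # expected number of nonzeros is only k*s = 0.32
```

With these parameters the whole image is the zero vector with probability about (1-s)^k ≈ 0.72, so the norm of e1 cannot be preserved; this is why s must stay Ω(1) for a worst-case sparse JL matrix.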

Uncertainty Principle

v sparse ⇒ v̂ dense, where v̂ = H v

H: Walsh-Hadamard matrix (Fourier transform on {0,1}^{log_2 d})
- Computable in time O(d log d)
- Isometry: ||v̂||_2 = ||v||_2

Adding Randomization

H deterministic, invertible ⇒ we're back to square one!
Precondition H with a random diagonal matrix D = diag(±1, ±1, ..., ±1):
- Computable in time O(d)
- Isometry
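A minimal sketch of H and D (the in-place butterfly below is one standard way to compute the Walsh-Hadamard transform in O(d log d); the 1/√d normalization making it an isometry is our choice):

```python
import math
import random

def fwht(v):
    """Walsh-Hadamard transform via butterflies, O(d log d) additions;
    len(v) must be a power of 2. Scaled by 1/sqrt(d) to be an isometry."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    scale = 1 / math.sqrt(len(v))
    return [x * scale for x in v]

random.seed(4)
d = 256
D = [random.choice([-1, 1]) for _ in range(d)]   # random diagonal preconditioner

e1 = [1.0] + [0.0] * (d - 1)                     # sparse worst case
w = fwht([Dj * vj for Dj, vj in zip(D, e1)])     # HD applied to e1

# Isometry: the norm is preserved ...
assert abs(sum(x * x for x in w) - 1.0) < 1e-9
# ... and the mass is spread: every entry has magnitude exactly 1/sqrt(d)
assert all(abs(abs(x) - 1 / math.sqrt(d)) < 1e-9 for x in w)
```

On the hidden-coordinate vector, HD achieves the best possible spreading: every output coordinate carries an equal 1/d share of the energy.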

The ℓ∞-Bound Lemma

W.h.p., ∀ p_i, p_j ∈ P ⊆ R^d:

||HD(p_i - p_j)||_∞ ≤ O(d^-1/2 log^1/2 n) · ||p_i - p_j||_2

This rules out HD(p_i - p_j) being a "hidden coordinate" vector!

instead...

Hidden Coordinate-Set

Worst-case v = p_i - p_j (assuming the bound from the previous slide, and ||v||_2 = 1):

∀ j ∈ J:  |v_j| = Θ(d^-1/2 log^1/2 n)
∀ j ∉ J:  v_j = 0
J ⊆ [d],  |J| = Θ(d / log n)

Fast J-L Transform

FJLT: Φ = P H D

- D: Diag(±1)
- H: Hadamard
- P: sparse JL matrix, P_ij i.i.d. ~ { N(0,1) w.p. s; 0 w.p. 1-s }

ℓ2 → ℓ1: sparsity s = ε^-1 log n / d; bottleneck: bias of |Φ_i · v|
ℓ2 → ℓ2: sparsity s = log^2 n / d; bottleneck: variance of |Φ_i · v|^2
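Putting the three factors together, here is a hedged end-to-end sketch of Φ = P·H·D; the scalings of P's nonzero entries and of the output are our own choices, made so that squared norms are preserved in expectation, and the sparsity s below is illustrative rather than the tuned value from the slide:

```python
import math
import random

def fwht(v):
    """Normalized Walsh-Hadamard transform in O(d log d); len(v) a power of 2."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    s = 1 / math.sqrt(len(v))
    return [x * s for x in v]

def fjlt(v, k, s, rng):
    """Sketch of Phi = P H D: random signs D, fast Hadamard H, then a sparse
    Gaussian matrix P (entries N(0, 1/s) w.p. s, else 0; scaling is ours)."""
    d = len(v)
    D = [rng.choice([-1, 1]) for _ in range(d)]
    w = fwht([Dj * vj for Dj, vj in zip(D, v)])
    out = []
    for _ in range(k):
        row = [rng.gauss(0, 1 / math.sqrt(s)) if rng.random() < s else 0.0
               for _ in range(d)]
        out.append(sum(rj * wj for rj, wj in zip(row, w)) / math.sqrt(k))
    return out

rng = random.Random(5)
d, k, s = 256, 64, 0.2
v = [rng.gauss(0, 1) for _ in range(d)]
norm2 = lambda u: sum(x * x for x in u)
ratio = norm2(fjlt(v, k, s, rng)) / norm2(v)
assert 0.5 < ratio < 1.5   # squared norm preserved up to moderate distortion
```

The dense loop over P's rows is written out for clarity; in the actual FJLT only the O(sdk) nonzero entries of P are ever generated or touched, which is where the running time savings come from.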

Applications

- Approximate nearest neighbor in (R^d, ℓ2)
- ℓ2 regression: minimize ||Ax - b||_2, with A ∈ R^{n×d} over-constrained (d << n)
  - [DMM06]: approximate by sampling (non-constructive)
  - [Sarlos06]: using the FJLT ⇒ constructive
- More applications...?
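To illustrate sketched ℓ2 regression, the toy below solves a 1-D over-constrained least-squares problem exactly and via a random sketch (a plain Gaussian map stands in for the FJLT for brevity; all names and parameters are illustrative):

```python
import math
import random

random.seed(6)
n, k = 2000, 200   # illustrative sizes: n equations sketched down to k

# Over-constrained 1-D least squares: minimize ||a*x - b||_2 over scalar x
a = [random.gauss(0, 1) for _ in range(n)]
b = [ai * 3.0 + random.gauss(0, 0.1) for ai in a]   # true coefficient = 3.0

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

x_full = dot(a, b) / dot(a, a)   # exact normal-equation solution

# Sketch both sides with the same k x n map, then solve the small problem
S = [[random.gauss(0, 1 / math.sqrt(k)) for _ in range(n)] for _ in range(k)]
Sa = [dot(row, a) for row in S]
Sb = [dot(row, b) for row in S]
x_sketch = dot(Sa, Sb) / dot(Sa, Sa)

assert abs(x_sketch - x_full) < 0.05   # near-optimal after a 10x size reduction
```

Replacing the dense Gaussian S by an FJLT is what makes the sketching step itself fast, which is the constructive improvement credited to [Sarlos06] above.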

Interesting Problem I

Improvement & lower bound for J-L computation.

Interesting Problem II

- Dimension reduction is sampling
- Sampling by random walk:
  - Expander graphs for uniform sampling
  - Convex bodies for volume estimation
- [Kac59]: Random walk on the orthogonal group:

  for t = 1..T:
    pick i, j ∈_R [d], θ ∈_R [0, 2π)
    (v_i, v_j) ← (v_i cos θ + v_j sin θ, -v_i sin θ + v_j cos θ)

- Output (v_1, ..., v_k) as the dimension reduction of v
- How many steps suffice for the J-L guarantee? [CCL01], [DS00], [P99] . . .
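The walk above is directly runnable; this sketch applies T random planar rotations and keeps the first k coordinates (T here is arbitrary, chosen only to demonstrate the isometry; how large T must be is exactly the open question):

```python
import math
import random

def kac_walk(v, T, rng):
    """T steps of the [Kac59] walk: each step rotates a random coordinate
    pair (i, j) by a random angle, an O(1)-time orthogonal update."""
    v = list(v)
    d = len(v)
    for _ in range(T):
        i, j = rng.sample(range(d), 2)
        theta = rng.uniform(0, 2 * math.pi)
        vi, vj = v[i], v[j]   # read both before writing: simultaneous update
        v[i] = vi * math.cos(theta) + vj * math.sin(theta)
        v[j] = -vi * math.sin(theta) + vj * math.cos(theta)
    return v

rng = random.Random(7)
d, kdim, T = 64, 16, 2000   # illustrative sizes
v = [rng.gauss(0, 1) for _ in range(d)]
w = kac_walk(v, T, rng)

norm2 = lambda u: sum(x * x for x in u)
assert abs(norm2(w) - norm2(v)) < 1e-6 * norm2(v)   # every step is an isometry
reduced = w[:kdim]   # (v_1, ..., v_k) as the dimension reduction of v
```

Note the two coordinates must be updated simultaneously (old v_i and v_j read before either is written), otherwise the step is not a rotation and the norm drifts.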

Thank You!
