TRANSCRIPT
Faster Johnson-Lindenstrauss style reductions
Aditya Menon
August 23, 2007
Outline
1 Introduction: Dimensionality reduction; The Johnson-Lindenstrauss Lemma; Speeding up computation
2 The Fast Johnson-Lindenstrauss Transform: Sparser projections; Trouble with sparse vectors?; Summary
3 Ailon and Liberty's improvement: Bounding the mapping; The Walsh-Hadamard transform; Error-correcting codes; Putting it together
4 References
Introduction
Dimensionality reduction
Distances
For high-dimensional vector data, it is of interest to have a notion of distance between two vectors
Recall that the ℓ_p norm of a vector x is
||x||_p = (∑_i |x_i|^p)^{1/p}
The ℓ_2 norm corresponds to the standard Euclidean norm of a vector
The ℓ_∞ norm is the maximal absolute value of any component:
||x||_∞ = max_i |x_i|
Introduction
Dimensionality reduction
Dimensionality reduction
Suppose we're given an input vector x ∈ R^d
We want to reduce the dimensionality of x to some k < d, while preserving the ℓ_p norm
Can think of this as a metric embedding problem: can we embed ℓ_p^d into ℓ_p^k?
Formally, we have the following problem
Problem
Suppose we are given an x ∈ R^d, and some parameters p, ε. Can we find a y ∈ R^k for some k = f(ε) so that
(1 − ε)||x||_p ≤ ||y||_p ≤ (1 + ε)||x||_p
Introduction
The Johnson-Lindenstrauss Lemma
The Johnson-Lindenstrauss Lemma
The Johnson-Lindenstrauss Lemma [5] is the archetypal result for ℓ_2 dimensionality reduction
Tells us that for n points, there is an ε-embedding of ℓ_2^d into ℓ_2^{O(log n / ε^2)}
Theorem
Suppose {u_i}_{i=1...n} ∈ R^{n×d}. Then, for ε > 0 and k = O(log n / ε^2), there is a mapping f : R^d → R^k so that
(∀i, j) (1 − ε)||u_i − u_j||_2 ≤ ||f(u_i) − f(u_j)||_2 ≤ (1 + ε)||u_i − u_j||_2
Introduction
The Johnson-Lindenstrauss Lemma
Johnson-Lindenstrauss in practice
Proof of the Johnson-Lindenstrauss lemma is non-constructive (unfortunately!)
In practice, we use the probabilistic method to do a Johnson-Lindenstrauss style reduction
Insert randomness at the cost of an exact guarantee
Now the guarantee becomes probabilistic
Introduction
The Johnson-Lindenstrauss Lemma
Johnson-Lindenstrauss in practice
Standard version:
Theorem
Suppose {u_i}_{i=1...n} ∈ R^{n×d}. Then, for ε > 0 and k = O(β log n / ε^2), the mapping f(u_i) = (1/√k) u_i R, where R is a d × k matrix of i.i.d. Gaussian variables, satisfies with probability at least 1 − 1/n^β,
(∀i, j) (1 − ε)||u_i − u_j||_2 ≤ ||f(u_i) − f(u_j)||_2 ≤ (1 + ε)||u_i − u_j||_2
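As a concrete illustration, here is a minimal numpy sketch of the construction in this theorem; the function name, seed, and dimensions are illustrative choices, not from the talk:

import numpy as np

def gaussian_jl(U, k, rng=np.random.default_rng(0)):
    # Map each row u of U to (1/sqrt(k)) u R, where R is a d x k matrix
    # of i.i.d. standard Gaussian entries.
    d = U.shape[1]
    R = rng.standard_normal((d, k))
    return U @ R / np.sqrt(k)

# Sanity check: a pairwise distance should be preserved up to a small distortion.
rng = np.random.default_rng(1)
U = rng.standard_normal((10, 5000))
V = gaussian_jl(U, k=1000)
print(np.linalg.norm(V[0] - V[1]) / np.linalg.norm(U[0] - U[1]))  # close to 1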
Introduction
Speeding up computation
Achlioptas’ improvement
Achlioptas [1] gave an even simpler matrix construction:
R_ij = √3 · { +1 with probability 1/6; 0 with probability 2/3; −1 with probability 1/6 }
2/3rds sparse, and simpler to construct than a Gaussian matrix
With no loss in accuracy!
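As a quick illustration, sampling such a matrix takes one line of numpy (the seed is arbitrary):

import numpy as np

def achlioptas_matrix(d, k, rng=np.random.default_rng(0)):
    # Entries are sqrt(3) * {+1 w.p. 1/6, 0 w.p. 2/3, -1 w.p. 1/6}.
    return np.sqrt(3) * rng.choice([1.0, 0.0, -1.0], size=(d, k), p=[1/6, 2/3, 1/6])

Since two thirds of the entries vanish, Rx can be computed with sparse arithmetic.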
Introduction
Speeding up computation
A question
2/3rds sparsity is a good speedup in practice
But density is still O(dk)
Computing the mapping is still an O(dk) operation asymptotically
Let
𝒜 = { A : ∀ unit x ∈ R^d, with v.h.p., (1 − ε) ≤ ||Ax||_2 ≤ (1 + ε) }
Question: For which A ∈ 𝒜 can Ax be computed quicker than O(dk)?
Introduction
Speeding up computation
The answer?
We look at two approaches that allow for quicker computation
First is the Fast Johnson-Lindenstrauss transform, based on a Fourier transform
Next is the Ailon-Liberty Transform, based on a Fourier transform and error-correcting codes!
The Fast Johnson-Lindenstrauss Transform
The Fast Johnson-Lindenstrauss Transform
Ailon and Chazelle [2] proposed the Fast Johnson-Lindenstrauss transform
Can speed up ℓ_2 reduction from O(dk) to (roughly) O(d log d)
How?
Make the projection matrix even sparser
Need some “tricks” to solve the problems associated with this
Let’s reverse engineer the construction...
The Fast Johnson-Lindenstrauss Transform
Sparser projections
Sparser projection matrix
Use the projection matrix
P ∼ { N(0, 1/q) with probability q; 0 with probability 1 − q } (entrywise)
where
q = min{ Θ(log^2 n / d), 1 }
Density of the matrix is O((1/ε^2) min{ log^3 n, d log n })
In practice, this is typically significantly sparser than Achlioptas' matrix
The Fast Johnson-Lindenstrauss Transform
Trouble with sparse vectors?
What do we lose?
Can follow standard concentration-proof methods
But we end up needing to assume that ||x||_∞ is bounded, namely that information is spread out
We fail on vectors like x = (1, 0, . . . , 0), i.e. sparse data and a sparse projection don't mix well
So are we forced to choose between generality and usefulness?
Not if we try to insert randomness...
The Fast Johnson-Lindenstrauss Transform
Trouble with sparse vectors?
A clever idea
Can we randomly transform x so that
||Φ(x)||_2 = ||x||_2
||Φ(x)||_∞ is bounded with v.h.p.?
Answer: Yes! Use a Fourier transform Φ = F
Distance preserving
Has an “uncertainty principle”: a “signal” and its Fourier transform cannot both be concentrated
Use the FFT to give an O(d log d) random mapping
Details on the specifics in the next section...
The Fast Johnson-Lindenstrauss Transform
Trouble with sparse vectors?
Applying a Fourier transform
The Fourier transform will guarantee that
||x||_∞ = ω(1) ⟺ ||Fx||_∞ = o(1)
But now we will be in trouble if the input is uniformly distributed!
To deal with this, do a random sign change:
x ↦ Dx
where D is a random diagonal ±1 matrix
Now we get a guarantee of spread with high probability, so the “random” Fourier transform gives us back generality
The Fast Johnson-Lindenstrauss Transform
Trouble with sparse vectors?
Random sign change
The sign change mapping will give us
Dx = (d_1 x_1, d_2 x_2, . . . , d_d x_d)^T = (±x_1, ±x_2, . . . , ±x_d)^T
where the ± are attained with equal probability
Clearly norm preserving
The Fast Johnson-Lindenstrauss Transform
Trouble with sparse vectors?
Putting it together
So, we compute the mapping f : x ↦ P F(Dx)
Runtime will be
O(d log d + min{ d log n / ε^2, log^3 n / ε^2 })
Under some loose conditions, runtime is
O(max{ d log d, k^3 })
If k ∈ [Ω(log d), O(√d)], this is quicker than the O(dk) simple mapping
In practice, the upper bound is reasonable, but the lower bound might not be
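A rough numpy sketch of the pipeline, under some simplifying assumptions: the complex FFT stands in for F, P is materialized densely (so this shows the mapping, not the speedup), and the normalizations are illustrative rather than taken from [2]:

import numpy as np

def fjlt(x, k, q, rng=np.random.default_rng(0)):
    d = len(x)
    Dx = rng.choice([-1.0, 1.0], size=d) * x    # D: random sign change
    Fx = np.fft.fft(Dx) / np.sqrt(d)            # F: unitary Fourier transform, O(d log d)
    keep = rng.random((k, d)) < q               # sparsity pattern of P
    P = np.where(keep, rng.normal(0.0, 1.0 / np.sqrt(q), size=(k, d)), 0.0)
    return P @ Fx / np.sqrt(k)                  # P: sparse Gaussian projection

Note the output here is complex because of the complex FFT; a real transform is used in the actual construction.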
The Fast Johnson-Lindenstrauss Transform
Summary
Summary
Tried increasing sparsity with disregard for generality
Used randomization to get back generality (probabilistically)
Key ingredient was a Fourier transform, with a randomization step first
Ailon and Liberty’s improvement
Ailon and Liberty’s improvement
Ailon and Liberty [3] improved the runtime from O(d log d) to O(d log k), for k = O(d^{1/2−δ}), δ > 0
Idea: Sparsity isn't the only way to speed up computation time
Can also speed up the runtime when the projection matrix has a special structure
So find a matrix with a convenient structure which will satisfy the JL property
Ailon and Liberty’s improvement
Operator norm
We need something called the operator norm in our analysis
The operator norm of a transformation matrix A is
||A||_{p→q} = sup_{||x||_p = 1} ||Ax||_q
i.e. the maximal ℓ_q norm of the transformation of unit ℓ_p-norm points
A fact we will need to employ:
||A||_{p_1→p_2} = ||A^T||_{q_2→q_1}
where 1/p_1 + 1/q_1 = 1 and 1/p_2 + 1/q_2 = 1
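A crude numerical sanity check of this duality, assuming nothing beyond numpy: estimate the operator norm from below by random search over the unit p-sphere, and compare the two sides for p_1 = 2, p_2 = 4 (so q_1 = 2, q_2 = 4/3):

import numpy as np

def opnorm_lb(A, p, q, trials=100000, rng=np.random.default_rng(0)):
    # Lower-bound estimate of ||A||_{p->q} by random search over the unit p-sphere.
    X = rng.standard_normal((trials, A.shape[1]))
    X /= np.linalg.norm(X, p, axis=1, keepdims=True)
    return np.linalg.norm(X @ A.T, q, axis=1).max()

A = np.random.default_rng(1).standard_normal((4, 6))
print(opnorm_lb(A, 2, 4))       # estimate of ||A||_{2->4}
print(opnorm_lb(A.T, 4/3, 2))   # duality says this should roughly agree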
Ailon and Liberty’s improvement
Reverse engineering
Let’s say the mapping is a matrix multiplication
In particular, say we have a mapping of the form
f : x ↦ BDx
where B is some k × d matrix with unit columns, and D is a diagonal matrix whose entries are randomly ±1
Doing a random sign change again
Now we just need to see what properties we will need B to satisfy in order for
||BDx||_2 ≈ ||x||_2
Ailon and Liberty’s improvement
Bounding the mapping
Bounding the mapping
Easy to see that
BDx = (B_11 d_1 x_1 + . . . + B_1d d_d x_d, . . . , B_k1 d_1 x_1 + . . . + B_kd d_d x_d)^T
Write as BDx = Mz, where
M^(i) = x_i B^(i) (the i-th column of M is x_i times the i-th column of B)
z = (d_1, . . . , d_d)^T
There is a special name for a vector like Mz...
Ailon and Liberty’s improvement
Bounding the mapping
Rademacher series
Definition
If M is an arbitrary k × d real matrix, and z ∈ R^d is such that
z_i = { +1 with probability 1/2; −1 with probability 1/2 }
then Mz is called a Rademacher random variable. This is a vector whose entries are arbitrary sums/differences of the entries in the rows of M.
Such a variable is interesting because of a powerful theorem...
Ailon and Liberty’s improvement
Bounding the mapping
Talagrand’s theorem
Theorem
Suppose M, z are as above. Let Z = ||Mz||_p, and let
σ = ||M||_{2→p}
μ = median(Z)
Then, Pr[|Z − μ| > t] ≤ 4 e^{−t^2 / (8σ^2)}
(see [6])
σ (the “deviation”) is the maximal ℓ_p norm over the image of the unit Euclidean sphere under M
Theorem says that the norm of a Rademacher variable is sharply concentrated about the median
Ailon and Liberty’s improvement
Bounding the mapping
Implications for us
Our mapping, BDx, has given us a Rademacher random variable
We know that we can apply Talagrand's theorem to get a concentration result
So, all we need to do is find out what the median and deviation are...
Ailon and Liberty’s improvement
Bounding the mapping
Deviation
Let Y = ||BDx||_2 = ||Mz||_2
Deviation is
σ = sup_{||y||_2 = 1} ||y^T M||_2
  = sup_{||y||_2 = 1} ( ∑_{i=1}^d x_i^2 (y^T B^(i))^2 )^{1/2}
  ≤ ||x||_4 sup_{||y||_2 = 1} ( ∑_{i=1}^d (y^T B^(i))^4 )^{1/4}   (by Cauchy-Schwarz)
  = ||x||_4 ||B^T||_{2→4}
Ailon and Liberty’s improvement
Bounding the mapping
What do we need?
So, σ ≤ ||x||_4 ||B^T||_{2→4}
Fact: |1 − μ| ≤ √(3/2) σ
Can combine to get
Pr[|Y − 1| > t] ≤ c_0 e^{−c_1 t^2 / (||x||_4^2 ||B^T||_{2→4}^2)}
Result: We need to control both ||x||_4 and ||B^T||_{2→4}
i.e. we want them both to be small
If we manage this, we've got our concentration bound
Ailon and Liberty’s improvement
Bounding the mapping
The two ingredients
To get the concentration bound, we need to ensure that ||x||_4 and ||B^T||_{2→4} are sufficiently small
How to control ||x||_4?
Use repeated Fourier/Walsh-Hadamard transforms
How to control ||B^T||_{2→4}?
Use error-correcting codes
Ailon and Liberty’s improvement
The Walsh-Hadamard transform
Controlling ||x||_4
Problem: Input x is “adversarial”, so how can we make ||x||_4 small?
Solution: Use an isometric mapping Φ, with a guarantee that ||Φx||_4 is small with very high probability
Problem: What is such a Φ?
(Final!) Solution: Back to the Fourier transform!
Ailon and Liberty’s improvement
The Walsh-Hadamard transform
The Discrete Fourier transform
The Discrete Fourier transform on a_0, a_1, . . . , a_{N−1} is
a_k ↦ ∑_{n=0}^{N−1} a_n e^{−2πikn/N} = ∑_{n=0}^{N−1} a_n (e^{−2πik/N})^n
Can think of it as a polynomial evaluation: if
P(x) = a_0 + a_1 x + a_2 x^2 + . . . + a_{N−1} x^{N−1}
then we have
a_k ↦ P(e^{−2πik/N})
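A tiny numpy check of this polynomial-evaluation view (the coefficient values are an arbitrary example):

import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
N = len(a)
dft = np.fft.fft(a)  # numpy uses the same convention: sum of a_n e^{-2 pi i k n / N}
poly = np.array([np.polyval(a[::-1], np.exp(-2j * np.pi * k / N)) for k in range(N)])
print(np.allclose(dft, poly))  # True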
Ailon and Liberty’s improvement
The Walsh-Hadamard transform
The finite-field Fourier transform
Notice that ω_k = e^{−2πik/N} ≠ 1 satisfies (ω_k)^N = 1
ω_k is a primitive root of 1
Transform is
a_k ↦ P(ω^k)
for any primitive root ω
Ailon and Liberty’s improvement
The Walsh-Hadamard transform
The multi-dimensional Fourier transform
We can also consider the transform of multi-dimensional data
1-D case:
a_k ↦ ∑_{n=0}^{N−1} a_n ω^{kn}
υ-D case: If n = (n_1, . . . , n_υ),
a_k ↦ ∑_{n_1,...,n_υ=0}^{N−1} a_n ω^{k·n}
Ailon and Liberty’s improvement
The Walsh-Hadamard transform
The Walsh-Hadamard transform
Consider the case N = 2, ω = −1 [7]:
a_{k_1,k_2} ↦ ∑_{n_1,n_2=0}^{1} a_{n_1,n_2} (−1)^{k_1 n_1 + k_2 n_2}
This is called the Walsh-Hadamard transform
Intuition: Instead of using sinusoidal basis functions, use square-wave functions
The square waves are called Walsh functions
Why not the standard discrete FT?
We use a technical property about the Walsh-Hadamard transform matrix...
Ailon and Liberty’s improvement
The Walsh-Hadamard transform
Fourier transform on the binary hyper-cube
Suppose we work with F_2 = {0, 1}
We can encode the Fourier transform with the Walsh-Hadamard matrix H_d,
H_d(i, j) = (1 / 2^{d/2}) (−1)^{<i−1, j−1>}
where <i, j> is the dot-product of i, j as expressed in binary
Fact:
H_d = (1/√2) [ H_{d/2}  H_{d/2} ; H_{d/2}  −H_{d/2} ]
Corollary: We can compute H_d x in O(d log d) time
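A minimal in-place implementation of this recursion (a fast Walsh-Hadamard transform); it assumes d is a power of two, and the √d normalization matches the orthonormal convention above:

import numpy as np

def fwht(x):
    # Fast Walsh-Hadamard transform via the recursive block structure above;
    # O(d log d) operations for d a power of two.
    x = np.asarray(x, dtype=float).copy()
    d = len(x)
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b  # the [H H; H -H] butterfly
        h *= 2
    return x / np.sqrt(d)  # normalize so the transform is orthonormal (norm preserving)

print(fwht([1.0, 0.0, 0.0, 0.0]))  # first column of H_4 on the next slide: [0.5 0.5 0.5 0.5]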
Ailon and Liberty’s improvement
The Walsh-Hadamard transform
Example of Hadamard matrix
When d = 4, we get
H_4 = (1/2) ×
[ 1  1  1  1
  1 −1  1 −1
  1  1 −1 −1
  1 −1 −1  1 ]
Note entries are always ±1 (before the normalization)
Ailon and Liberty’s improvement
The Walsh-Hadamard transform
Fourier again?
Let Φ : x ↦ H_d D_0 x
D_0 as before a random diagonal ±1 matrix
Already know that it will preserve the ℓ_2 norm
But is ||Φ(x)||_4 small?
Answer: Yes, by another application of Talagrand's theorem!
Ailon and Liberty’s improvement
The Walsh-Hadamard transform
Towards Talagrand
Need σ, µ for Talagrand’s theorem
Write Φ(x) = Mz as before, where M^(i) = x_i H^(i)
Estimate deviation:
σ = ||M||_{2→4}
  = ||M^T||_{4/3→2}   (from earlier fact)
  ≤ ( ∑ x_i^4 )^{1/4} sup_{||y||_{4/3} = 1} ( ∑ (y^T H^(i))^4 )^{1/4}
  = ||x||_4 ||H||_{4/3→4}
Ailon and Liberty’s improvement
The Walsh-Hadamard transform
Some magic
We now employ the following theorem [4]
Theorem
Hausdorff-Young theorem. For any p ∈ [1, 2], if H is the Hadamard matrix, and 1/p + 1/q = 1, then
||H||_{p→q} ≤ √d · d^{−1/p}
As a result, for p = 4/3,
σ ≤ ||x||_4 d^{−1/4}
Further, we have the following fact (see [3] for proof!)...
Fact: μ = O(1/d^{1/4})
Ailon and Liberty’s improvement
The Walsh-Hadamard transform
Getting the desired result
With the above σ, μ, an application of Talagrand, along with the assumption k = O(d^{1/2−δ}), reveals
||H D_0 x||_4 ≤ c_0 d^{−1/4} + c_1 d^{−δ/2} ||x||_4
If we compose the mapping,
||H D_1 (H D_0 x)||_4 ≤ c_0 d^{−1/4} + c_0 c_1 d^{−1/4−δ/2} + c_1^2 d^{−δ} ||x||_4
If we repeat this r = 1/(2δ) times,
||H D_{r−1} H D_{r−2} . . . H D_0 x||_4 = O(d^{−1/4})
Ailon and Liberty’s improvement
The Walsh-Hadamard transform
Our resultant transform
To control ||x||_4, use the composed transform
Φ^(r) : x ↦ H D_{r−1} H D_{r−2} . . . H D_0 x
We manage to preserve ||x||_2, and contract ||Φ^(r) x||_4
Runtime is O(d log d / δ)
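A sketch of Φ^(r), reusing the fwht routine from above; the dimension, seed, and r = 3 are arbitrary illustrative choices:

import numpy as np

def phi_r(x, r, rng=np.random.default_rng(0)):
    for _ in range(r):
        x = fwht(rng.choice([-1.0, 1.0], size=len(x)) * x)  # one H D_i round
    return x

x = np.zeros(1024); x[0] = 1.0                       # the adversarial sparse input from earlier
y = phi_r(x, r=3)
print(np.linalg.norm(y, 2), np.linalg.norm(y, 4))    # l2 preserved, l4 shrunk to ~d^(-1/4)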
Ailon and Liberty’s improvement
Error-correcting codes
Error-correcting codes
The Hadamard matrix also has a connection to error-correcting codes
Such codes look to represent one's message in such a way that it can be decoded correctly even if there are some errors during transmission
Suppose we want to send out a message to a decoder which allows for at most d errors
i.e. we can recover from d or fewer errors in the transmission
Fact: By choosing our “code-words” from the matrix [H_{2d}; −H_{2d}] (the rows of H_{2d} and −H_{2d}), where −1 ↦ 0, we can correct up to d errors
Ailon and Liberty’s improvement
Error-correcting codes
Code matrix
An m × d matrix A is called a code matrix if
A = √(d/m) [ H_d(i_1, :) ; H_d(i_2, :) ; . . . ; H_d(i_m, :) ]
i.e. picking out only m of the d rows of the Hadamard matrix
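A small sketch of this row-sampling view, built on scipy's hadamard helper; the rows chosen here are arbitrary, not the carefully chosen ones the next slides require:

import numpy as np
from scipy.linalg import hadamard

def code_matrix(rows, d):
    H = hadamard(d) / np.sqrt(d)            # normalized d x d Walsh-Hadamard matrix
    m = len(rows)
    return np.sqrt(d / m) * H[rows, :]      # m x d; the scaling restores unit columns

B = code_matrix([0, 3, 5, 6], 8)
print(np.linalg.norm(B[:, 0]))              # each column has unit l2 norm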
Ailon and Liberty’s improvement
Error-correcting codes
Independence in codes
A code matrix is called a-wise independent if exactly d/2^a columns agree in a places
Independence is very useful for us:
Theorem
Suppose B is a k × d, 4-wise independent code matrix. Then,
||B^T||_{2→4} = O(d^{1/4} / √k)
Ailon and Liberty’s improvement
Error-correcting codes
Proof of theorem
Recall that we need to bound
||B^T||_{2→4} = sup_{||y||_2 = 1} ||y^T B||_4
Consider:
||y^T B||_4^4 = d E[(y^T B^(j))^4]
             = (d/k^2) ∑_{i_1} ∑_{i_2} ∑_{i_3} ∑_{i_4} E[y_{i_1} y_{i_2} y_{i_3} y_{i_4} b_1 b_2 b_3 b_4]
             = (d/k^2) (3||y||_2^4 − 2||y||_4^4)
             ≤ 3d/k^2
Consequently,
||B^T||_{2→4} ≤ (3d)^{1/4} / √k
Ailon and Liberty’s improvement
Error-correcting codes
Making our matrix
We're set if we get a k × d, 4-wise independent code matrix
Problem: How do we make such a matrix?
Fact: There exists a 4-wise independent code matrix of size k × BCH(k), where BCH(k) = Θ(k^2)
Called the BCH code matrix
Which is good, because...
Fact: By padding and “copy-pasting”, we retain independence. In particular, we can construct a k × d matrix from a k × BCH(k) matrix:
B = [ B_BCH B_BCH . . . B_BCH ]   (d/BCH(k) copies)
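A sketch of just the copy-pasting step, assuming some k × m building block B_bch is already in hand; constructing a genuine 4-wise independent BCH code matrix is beyond this snippet:

import numpy as np

def tile_code_matrix(B_bch, d):
    k, m = B_bch.shape
    assert d % m == 0, "pad d up to a multiple of m in practice"
    return np.hstack([B_bch] * (d // m))  # columns stay unit norm; 4-wise independence is retained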
Ailon and Liberty’s improvement
Error-correcting codes
Time to make matrix
Time to compute the mapping x ↦ Bx?
We have to do d/BCH(k) mappings B_BCH x_BCH
Each such mapping can be done via a Walsh-Hadamard transform, by construction of BCH codes
Each takes time O(BCH(k) · log BCH(k))
Total runtime is therefore O(d log k)
Ailon and Liberty’s improvement
Putting it together
Merging results
Use the randomized Fourier transform to keep ||x||_4 small
O(d log d) time
Use the error-correcting code matrix to keep ||B^T||_{2→4} small
O(d log k) time
Result: We get the concentration bound!
Ailon and Liberty’s improvement
Putting it together
Runtime
Runtime is still going to be O(d log d)
Question: Can we speed up the computation of Φ^(r)?
Answer: Yes, use the same “block” idea as with the error-correcting codes
Some rather technical calculation reveals this will still work
Ailon and Liberty’s improvement
Putting it together
Blocked transform
Choose β = BCH(k) · k^δ = Θ(k^{2+δ})
Let
H̃ = diag(H_1, H_2, . . . , H_{d/β})
where each H_i is a β × β Walsh-Hadamard block
Fact: The above mapping can replace Φ^(r)
The mapping H̃D′x can be computed in time O(d log β) = O(d log k), so our total runtime is O(d log k)
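A sketch of applying such a block-diagonal transform, reusing fwht from earlier; it assumes β divides the dimension:

import numpy as np

def blocked_hd(x, beta, rng=np.random.default_rng(0)):
    signs = rng.choice([-1.0, 1.0], size=len(x))        # the diagonal D'
    blocks = (signs * x).reshape(-1, beta)              # d/beta blocks of length beta
    return np.concatenate([fwht(b) for b in blocks])    # (d/beta) blocks x O(beta log beta) each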
Ailon and Liberty’s improvement
Putting it together
A tabular comparison
Runtimes of the three approaches (standard JL, Fast JLT, and Ailon-Liberty), ordered fastest to slowest in each regime (from [3]):
k = o(log d):                             AL, JL, FJLT
k ∈ [ω(log d), o(poly(d))]:               AL, FJLT, JL
k ∈ [Ω(poly(d)), o((d log d)^{1/3})]:     AL and FJLT (tied), JL
k ∈ [ω((d log d)^{1/3}), O(d^{1/2−δ})]:   AL, FJLT, JL
Ailon and Liberty’s improvement
Putting it together
Conclusion
ℓ_2 dimensionality reduction is based on the Johnson-Lindenstrauss lemma
The standard approach takes O(dk) time to perform the reduction
By sparsifying, and compensating with a randomized Fourier transform, we can reduce the runtime to roughly O(d log d) via the Fast Johnson-Lindenstrauss transform [2]
By using error-correcting codes and a randomized Fourier transform, we can reduce the runtime to roughly O(d log k) via Ailon and Liberty's transform [3]
Open questions: Can one extend this to k = O(d^{1−δ})? k = Ω(d)?
References
Achlioptas, D.
Database-friendly random projections.
In PODS '01: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (New York, NY, USA, 2001), ACM Press, pp. 274–281.
Ailon, N., and Chazelle, B.
Approximate nearest neighbors and the fast Johnson-Lindenstrausstransform.
In STOC '06: Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing (New York, NY, USA, 2006), ACM Press, pp. 557–563.
Ailon, N., and Liberty, E.
Fast dimension reduction using Rademacher series on dual BCH codes.
Tech. Rep. TR07-070, Electronic Colloquium on Computational Complexity, 2007.
Bergh, J., and Löfström, J.
Interpolation Spaces.
Springer-Verlag, 1976.
Johnson, W., and Lindenstrauss, J.
Extensions of Lipschitz mappings into a Hilbert space.
In Conference in Modern Analysis and Probability (Providence, RI, USA, 1984), American Mathematical Society, pp. 189–206.
Ledoux, M., and Talagrand, M.
Probability in Banach Spaces: Isoperimetry and Processes.
Springer, 2006.
Massey, J. L.
Design and analysis of block ciphers.
http://www.win.tue.nl/math/eidma/courses/minicourses/massey/dabcmay2000f3.pdf, May 2000.
Presentation.