Dimensionality Reduction

Upload: willow-burton

Post on 31-Dec-2015



TRANSCRIPT

Page 1: Dimensionality Reduction

Dimensionality Reduction

Page 2: Dimensionality Reduction

Slides by Jure Leskovec 2

Dimensionality Reduction

• High-dimensional == many features
• Find concepts/topics/genres:
  – Documents: features are thousands of words, millions of word pairs
  – Surveys – Netflix: 480k users x 177k movies

Page 3: Dimensionality Reduction

Dimensionality Reduction

• Compress / reduce dimensionality:
  – 10^6 rows; 10^3 columns; no updates
  – random access to any cell(s); small error: OK

Page 4: Dimensionality Reduction


Dimensionality Reduction

• Assumption: Data lies on or near a low d-dimensional subspace

• Axes of this subspace are effective representation of the data

Page 5: Dimensionality Reduction

Why Reduce Dimensions?

• Discover hidden correlations/topics
  – Words that occur commonly together
• Remove redundant and noisy features
  – Not all words are useful
• Interpretation and visualization
• Easier storage and processing of the data

Page 6: Dimensionality Reduction

SVD - Definition

A[m x n] = U[m x r] Σ[r x r] (V[n x r])^T

• A: Input data matrix
  – m x n matrix (e.g., m documents, n terms)
• U: Left singular vectors
  – m x r matrix (m documents, r concepts)
• Σ: Singular values
  – r x r diagonal matrix (strength of each 'concept')
    (r: rank of the matrix A)
• V: Right singular vectors
  – n x r matrix (n terms, r concepts)
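The definition above can be checked numerically. A minimal sketch (NumPy is my assumption; the slides name no tool) that decomposes the 7-users x 5-movies ratings matrix used in the examples that follow:

```python
import numpy as np

# Users-to-movies ratings matrix from the slides
# (columns: Matrix, Alien, Serenity, Casablanca, Amelie)
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

# Thin SVD: U is m x r', s holds the singular values, Vt is r' x n
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(np.round(s[:2], 2))                   # the two 'concept' strengths
assert np.allclose(A, U @ np.diag(s) @ Vt)  # A = U Σ V^T holds
```

`np.linalg.svd` returns the singular values already sorted in decreasing order, matching the convention on the "SVD - Properties" slide.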

Page 7: Dimensionality Reduction

SVD

[Diagram: A (m x n)  =  U (m x r)  x  Σ (r x r)  x  V^T (r x n)]

Page 8: Dimensionality Reduction

SVD

[Diagram: A  =  σ1 u1 v1^T + σ2 u2 v2^T + ...
 (σi: scalar, ui: vector, vi: vector)]

Page 9: Dimensionality Reduction

SVD - Properties

It is always possible to decompose a real matrix A into A = U Σ V^T, where

• U, Σ, V: unique
• U, V: column orthonormal:
  – U^T U = I; V^T V = I (I: identity matrix)
  – (Columns are orthogonal unit vectors)
• Σ: diagonal
  – Entries (singular values) are positive,
    and sorted in decreasing order (σ1 ≥ σ2 ≥ ... ≥ 0)

Page 10: Dimensionality Reduction

SVD – Example: Users-to-Movies

• A = U Σ V^T – example:

            Matrix  Alien  Serenity  Casablanca  Amelie
  SciFi       1      1       1          0          0
  SciFi       2      2       2          0          0
  SciFi       1      1       1          0          0
  SciFi       5      5       5          0          0
  Romance     0      0       0          2          2
  Romance     0      0       0          3          3
  Romance     0      0       0          1          1

      0.18  0
      0.36  0
      0.18  0            9.64  0          0.58  0.58  0.58  0     0
  =   0.90  0      x     0     5.29   x   0     0     0     0.71  0.71
      0     0.53
      0     0.80
      0     0.27

Page 11: Dimensionality Reduction

SVD – Example: Users-to-Movies

• A = U Σ V^T – the same example, annotated: the two columns of U
  (and the two rows of V^T) correspond to a SciFi-concept and a
  Romance-concept.

Page 12: Dimensionality Reduction

SVD - Example

• A = U Σ V^T – the same example:
  U is the "user-to-concept" similarity matrix.

Page 13: Dimensionality Reduction

SVD - Example

• A = U Σ V^T – the same example:
  σ1 = 9.64 is the 'strength' of the SciFi-concept.

Page 14: Dimensionality Reduction

SVD - Example

• A = U Σ V^T – the same example:
  V is the "movie-to-concept" similarity matrix.

Page 15: Dimensionality Reduction

SVD - Example

• A = U Σ V^T – the same example:
  the first column of V (first row of V^T) is the SciFi-concept.

Page 16: Dimensionality Reduction

SVD - Interpretation #1

'movies', 'users' and 'concepts':
• U: user-to-concept similarity matrix
• V: movie-to-concept similarity matrix
• Σ: its diagonal elements give the 'strength' of each concept

Page 17: Dimensionality Reduction

SVD - Interpretation #2

SVD gives the best axis to project on:
• 'best' = minimal sum of squares of projection errors
• minimum reconstruction error

[Scatter plot: Movie 1 rating vs. Movie 2 rating; v1, the first right
singular vector, points along the main direction of the data]

Page 18: Dimensionality Reduction

SVD - Interpretation #2

• A = U Σ V^T – the same example:
  [Plot: in the (Movie 1 rating, Movie 2 rating) plane, the first right
  singular vector v1 points along the SciFi ratings]

Page 19: Dimensionality Reduction

SVD - Interpretation #2

• A = U Σ V^T – the same example:
  σ1 measures the variance ('spread') of the data on the v1 axis.

Page 20: Dimensionality Reduction

SVD - Interpretation #2

More details
• Q: How exactly is the dimensionality reduction done?

  [Same A = U Σ V^T users-to-movies example as above]

Page 21: Dimensionality Reduction

SVD - Interpretation #2

More details
• Q: How exactly is the dimensionality reduction done?
• A: Set the smallest singular values to zero

  [Same A = U Σ V^T users-to-movies example as above]

Page 22: Dimensionality Reduction

SVD - Interpretation #2

More details
• Q: How exactly is the dimensionality reduction done?
• A: Set the smallest singular values to zero:

       1 1 1 0 0        0.18  0
       2 2 2 0 0        0.36  0
       1 1 1 0 0        0.18  0         9.64  0        0.58  0.58  0.58  0     0
  A =  5 5 5 0 0   ~    0.90  0     x   0     0    x   0     0     0     0.71  0.71
       0 0 0 2 2        0     0.53
       0 0 0 3 3        0     0.80
       0 0 0 1 1        0     0.27

Page 23: Dimensionality Reduction

SVD - Interpretation #2

More details
• Q: How exactly is the dimensionality reduction done?
• A: Set the smallest singular values to zero (σ2 = 5.29 -> 0, as above).

Page 24: Dimensionality Reduction

SVD - Interpretation #2

More details
• Q: How exactly is the dimensionality reduction done?
• A: Set the smallest singular values to zero; the corresponding
  columns of U and rows of V^T can then be dropped:

       1 1 1 0 0        0.18
       2 2 2 0 0        0.36
       1 1 1 0 0        0.18
  A =  5 5 5 0 0   ~    0.90    x   9.64   x   0.58  0.58  0.58  0  0
       0 0 0 2 2        0
       0 0 0 3 3        0
       0 0 0 1 1        0

Page 25: Dimensionality Reduction

SVD - Interpretation #2

More details
• Q: How exactly is the dimensionality reduction done?
• A: Set the smallest singular values to zero:

       1 1 1 0 0        1 1 1 0 0
       2 2 2 0 0        2 2 2 0 0
       1 1 1 0 0        1 1 1 0 0
  A =  5 5 5 0 0   ~    5 5 5 0 0  = B
       0 0 0 2 2        0 0 0 0 0
       0 0 0 3 3        0 0 0 0 0
       0 0 0 1 1        0 0 0 0 0

Frobenius norm:
  ǁMǁ_F = sqrt(Σij Mij²)

  ǁA - Bǁ_F = sqrt(Σij (Aij - Bij)²)  is "small"
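The zeroing step and its Frobenius-norm error can be sketched as follows (NumPy assumed, as it is not named in the slides; k = 1 matches the example above):

```python
import numpy as np

# Users-to-movies ratings matrix from the slides
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 1
s_trunc = s.copy()
s_trunc[k:] = 0.0                # set the smallest singular values to zero
B = U @ np.diag(s_trunc) @ Vt    # best rank-k approximation of A

# ǁA - Bǁ_F equals exactly the energy of the dropped singular values
err = np.linalg.norm(A - B, 'fro')
assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))
```

Here the error is about 5.29, i.e. the single dropped singular value σ2, which is what the best-rank-k theorem below predicts.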

Page 26: Dimensionality Reduction

[Diagram: A = U Σ V^T (full SVD);  B = U S V^T, where S keeps only the
largest singular values;  B approximates A]

Page 27: Dimensionality Reduction

SVD – Best Low Rank Approx.

• Theorem: Let A = U Σ V^T (σ1 ≥ σ2 ≥ …, rank(A) = r), and let
  B = U S V^T, where
  – S = diagonal n x n matrix with si = σi (i = 1…k), else si = 0.
  Then B is a best rank-k approximation to A:
  – B is the solution to min_B ǁA - Bǁ_F where rank(B) = k

• We will need 2 facts:
  – ǁMǁ_F² = Σi qii², where M = P Q R is the SVD of M
    (Q = diag(q11 … qrr))
  – U Σ V^T - U S V^T = U (Σ - S) V^T

Page 28: Dimensionality Reduction

SVD – Best Low Rank Approx.

• We will need 2 facts:
  – ǁMǁ_F² = Σi qii², where M = P Q R is the SVD of M
  – U Σ V^T - U S V^T = U (Σ - S) V^T

We apply these with:
  – P column orthonormal
  – R row orthonormal
  – Q diagonal

Page 29: Dimensionality Reduction

SVD – Best Low Rank Approx.

• A = U Σ V^T, B = U S V^T (σ1 ≥ σ2 ≥ … ≥ 0, rank(A) = r)
  – S = diagonal n x n matrix with si = σi (i = 1…k), else si = 0

Then B is the solution to min_B ǁA - Bǁ_F with rank(B) = k. Why?
Using U Σ V^T - U S V^T = U (Σ - S) V^T and the orthonormality of U and V:

  min_{B: rank(B) = k} ǁA - Bǁ_F  =  min_S ǁΣ - Sǁ_F
                                  =  min_{si} sqrt( Σ_{i=1}^r (σi - si)² )

• We want to choose the si to minimize Σ_{i=1}^r (σi - si)², so we set
  si = σi (i = 1…k) and si = 0 otherwise:

  min_{si} [ Σ_{i=1}^k (σi - si)²  +  Σ_{i=k+1}^r σi² ]  =  Σ_{i=k+1}^r σi²

Page 30: Dimensionality Reduction

SVD - Interpretation #2

Equivalent: 'spectral decomposition' of the matrix:

  1 1 1 0 0
  2 2 2 0 0
  1 1 1 0 0                       σ1  0
  5 5 5 0 0   =   [u1  u2]   x    0   σ2   x   [v1  v2]^T
  0 0 0 2 2
  0 0 0 3 3
  0 0 0 1 1

Page 31: Dimensionality Reduction

SVD - Interpretation #2

Equivalent: 'spectral decomposition' of the matrix:

  1 1 1 0 0
  2 2 2 0 0
  1 1 1 0 0
  5 5 5 0 0   =   σ1 u1 v1^T  +  σ2 u2 v2^T  + ...
  0 0 0 2 2
  0 0 0 3 3
  0 0 0 1 1

Each term is an m x 1 column vector times a 1 x n row vector; for a
rank-k approximation, keep only the first k terms.

Assume: σ1 ≥ σ2 ≥ σ3 ≥ ... ≥ 0

Why is setting the small σs to zero the thing to do?
The vectors ui and vi are unit length, so σi scales them.
Hence, zeroing the small σs introduces less error.
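The spectral decomposition above can be reproduced directly as a sum of rank-1 outer products (a sketch with NumPy, which the slides do not prescribe):

```python
import numpy as np

# Users-to-movies ratings matrix from the slides
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A as a sum of rank-1 terms σi * ui * vi^T
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
assert np.allclose(A, A_sum)

# Keeping only the first 2 terms already reproduces A here, since rank(A) = 2
A_2 = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(2))
assert np.allclose(A, A_2)
```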

Page 32: Dimensionality Reduction

SVD - Interpretation #2

Q: How many σs to keep?
A: Rule of thumb: keep 80-90% of the 'energy' (= Σi σi²)

  1 1 1 0 0
  2 2 2 0 0
  1 1 1 0 0
  5 5 5 0 0   =   u1 σ1 v1^T  +  u2 σ2 v2^T  + ...
  0 0 0 2 2
  0 0 0 3 3
  0 0 0 1 1

(assume: σ1 ≥ σ2 ≥ σ3 ≥ ...)
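The energy rule of thumb can be applied mechanically: take the cumulative share of Σi σi² and pick the smallest k that clears the threshold. A sketch (NumPy assumed; the 90% threshold is one choice from the slide's 80-90% range):

```python
import numpy as np

# Users-to-movies ratings matrix from the slides
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

_, s, _ = np.linalg.svd(A, full_matrices=False)

energy = s ** 2
cum = np.cumsum(energy) / energy.sum()
# smallest k whose leading singular values retain >= 90% of the energy
k = int(np.searchsorted(cum, 0.90)) + 1
print(k)
```

For this matrix σ1² alone carries only about 77% of the energy, so the rule keeps k = 2 concepts.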

Page 33: Dimensionality Reduction

SVD - Complexity

• To compute the SVD:
  – O(nm²) or O(n²m) (whichever is less)
• But there is less work if:
  – we just want the singular values
  – we only want the first k singular vectors
  – the matrix is sparse
• Implemented in linear algebra packages like
  – LINPACK, Matlab, SPlus, Mathematica, ...

Page 34: Dimensionality Reduction

SVD - Conclusions so far

• SVD: A = U Σ V^T: unique
  – U: user-to-concept similarities
  – V: movie-to-concept similarities
  – Σ: strength of each concept
• Dimensionality reduction:
  – keep the few largest singular values
    (80-90% of the 'energy')
  – SVD picks up linear correlations

Page 35: Dimensionality Reduction

Case study: How to query?

Q: Find users that like 'Matrix' and 'Alien'

  [Same A = U Σ V^T users-to-movies example as above]

Page 36: Dimensionality Reduction

Case study: How to query?

Q: Find users that like 'Matrix'
A: Map the query into a 'concept space' – how?

  [Same A = U Σ V^T users-to-movies example as above]

Page 37: Dimensionality Reduction

Case study: How to query?

Q: Find users that like 'Matrix'
A: Map query vectors into 'concept space' – how?

        Matrix  Alien  Serenity  Casablanca  Amelie
  q = [   5      0        0          0         0   ]

Project into concept space: take the inner product of q with each
'concept' vector vi.
[Plot: q in the (Matrix, Alien) plane, with concept axes v1 and v2]

Page 38: Dimensionality Reduction

Case study: How to query?

Q: Find users that like 'Matrix'
A: Map the query vector into 'concept space' – how?

        Matrix  Alien  Serenity  Casablanca  Amelie
  q = [   5      0        0          0         0   ]

Project into concept space: take the inner product of q with each
'concept' vector vi.
[Plot: the projection q·v1 of q onto the first concept axis v1]

Page 39: Dimensionality Reduction

Case study: How to query?

Compactly, we have: q_concept = q V

E.g., for q = [5 0 0 0 0] (ratings over Matrix, Alien, Serenity,
Casablanca, Amelie):

            0.58  0
            0.58  0
  q    x    0.58  0     =   [ 2.9  0 ]
            0     0.71
            0     0.71

       movie-to-concept         SciFi-concept
       similarities (V)         strength
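The projection q_concept = q V is a single matrix-vector product. A minimal sketch (NumPy assumed), using the slides' rounded entries for V:

```python
import numpy as np

# Movie-to-concept matrix V, with the slides' rounded entries
# (rows: Matrix, Alien, Serenity, Casablanca, Amelie)
V = np.array([
    [0.58, 0.0],
    [0.58, 0.0],
    [0.58, 0.0],
    [0.0, 0.71],
    [0.0, 0.71],
])

q = np.array([5.0, 0.0, 0.0, 0.0, 0.0])  # query user: rated 'Matrix' = 5
q_concept = q @ V                         # project into concept space
print(q_concept)                          # ~ [2.9, 0]: purely SciFi
```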

Page 40: Dimensionality Reduction

Case study: How to query?

How would the user d that rated ('Alien', 'Serenity') be handled?
d_concept = d V

E.g., for d = [0 4 5 0 0] (ratings over Matrix, Alien, Serenity,
Casablanca, Amelie):

            0.58  0
            0.58  0
  d    x    0.58  0     =   [ 5.22  0 ]
            0     0.71
            0     0.71

       movie-to-concept         SciFi-concept
       similarities (V)         strength

Page 41: Dimensionality Reduction

Case study: How to query?

Observation: User d that rated ('Alien', 'Serenity') will be similar to
the query "user" q that rated ('Matrix'), although d did not rate 'Matrix'!

  d = [ 0  4  5  0  0 ]   ->   d_concept = [ 5.22  0 ]
  q = [ 5  0  0  0  0 ]   ->   q_concept = [ 2.9   0 ]
  (over Matrix, Alien, Serenity, Casablanca, Amelie)

  In rating space: similarity = 0.   In concept space: similarity ≠ 0.
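The observation above can be verified numerically: q and d are orthogonal in rating space but parallel in concept space. A sketch (NumPy and cosine similarity are my choices; the slides only say "similarity"):

```python
import numpy as np

# Movie-to-concept matrix V (slides' rounded entries;
# rows: Matrix, Alien, Serenity, Casablanca, Amelie)
V = np.array([
    [0.58, 0.0],
    [0.58, 0.0],
    [0.58, 0.0],
    [0.0, 0.71],
    [0.0, 0.71],
])
q = np.array([5.0, 0.0, 0.0, 0.0, 0.0])  # rated only 'Matrix'
d = np.array([0.0, 4.0, 5.0, 0.0, 0.0])  # rated only 'Alien', 'Serenity'

# In rating space, q and d share no movies: similarity is 0
assert np.dot(q, d) == 0.0

# In concept space, both land on the SciFi-concept axis
qc, dc = q @ V, d @ V                     # ~ [2.9, 0] and [5.22, 0]
cos = np.dot(qc, dc) / (np.linalg.norm(qc) * np.linalg.norm(dc))
assert np.isclose(cos, 1.0)               # similarity ≠ 0 (in fact maximal)
```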

Page 42: Dimensionality Reduction

SVD: Drawbacks

+ Optimal low-rank approximation:
  • in the Frobenius norm

- Interpretability problem:
  – a singular vector specifies a linear combination
    of all input columns or rows

- Lack of sparsity:
  – singular vectors are dense!

  [Diagram: A = U Σ V^T with dense U, Σ, V^T]