
Page 1: Introducing Latent Semantic Analysis

Introducing Latent Semantic Analysis

Thomas K. Landauer et al., "An introduction to latent semantic analysis," Discourse Processes, Vol. 25 (2-3), pp. 259-284, 1998.
Scott Deerwester et al., "Indexing by latent semantic analysis," Journal of the American Society for Information Science, Vol. 41 (6), pp. 391-407, 1990.
Kirk Baker, "Singular Value Decomposition Tutorial," electronic document, 2005.

Aug 22, 2014
Hee-Gook Jun

Page 2: Introducing Latent Semantic Analysis

Outline

– SVD
– SVD to LSA
– Conclusion

Page 3: Introducing Latent Semantic Analysis

Eigendecomposition vs. Singular Value Decomposition

Eigendecomposition: A = P Λ P⁻¹
– Matrix must be diagonalizable
– Matrix must be square
– An n x n matrix must have n linearly independent eigenvectors (e.g., a symmetric matrix)

Singular Value Decomposition: A = U ∑ Vᵀ
– Computable for a matrix of any size (m x n)
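A minimal R sketch of the contrast (the matrices here are illustrative, not from the slides): eigen() requires a square matrix, while svd() accepts any m x n matrix.

– R code –
S <- matrix(c(2, 1, 1, 2), nrow = 2)        # square, symmetric, diagonalizable
e <- eigen(S)                               # S = P Lambda P^-1
all.equal(S, e$vectors %*% diag(e$values) %*% solve(e$vectors))   # TRUE

A <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)  # 2 x 3: not square, still decomposable
s <- svd(A)                                 # A = U Sigma V^T
all.equal(A, s$u %*% diag(s$d) %*% t(s$v))  # TRUE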

Page 4: Introducing Latent Semantic Analysis

U: Left Singular Vectors of A      (A = U ∑ Vᵀ)

Unitary matrix
– Columns of U are orthonormal (orthogonal + normal)
– They are the orthonormal eigenvectors of AAᵀ

Example: u₁ = [0, 0, 0, 1] and u₂ = [0, 1, 0, 0] are orthogonal:

u₁ ∙ u₂ = (0×0) + (0×1) + (0×0) + (1×0) = 0

and u₁ is a unit (normal) vector:

‖u₁‖ = √(0² + 0² + 0² + 1²) = 1
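The same checks in R, using the two vectors above:

– R code –
u1 <- c(0, 0, 0, 1)
u2 <- c(0, 1, 0, 0)
sum(u1 * u2)       # 0 -> orthogonal
sqrt(sum(u1^2))    # 1 -> unit length
# For a whole matrix U with orthonormal columns, t(U) %*% U is the identity.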

Page 5: Introducing Latent Semantic Analysis

V: Right Singular Vectors of A      (A = U ∑ Vᵀ)

Unitary matrix
– Columns of V are orthonormal (orthogonal + normal)
– They are the orthonormal eigenvectors of AᵀA

Page 6: Introducing Latent Semantic Analysis

∑ (or S)      (A = U ∑ Vᵀ)

Diagonal matrix
– Diagonal entries are the singular values of A

Singular values
– The non-zero singular values of A
– Square roots of the eigenvalues of AAᵀ (equivalently, of AᵀA), in descending order

Page 7: Introducing Latent Semantic Analysis

Calculation Procedure      (A = U ∑ Vᵀ)

1. U is the set of eigenvectors of AAᵀ
   1. Compute AAᵀ
   2. Compute the eigenvectors of AAᵀ
   3. Orthonormalize the matrix
2. V is the set of eigenvectors of AᵀA
   1. Compute AᵀA
   2. Compute the eigenvalues and eigenvectors of AᵀA
   3. Orthonormalize and transpose
3. ∑ holds the square roots of the eigenvalues of AAᵀ (which equal the non-zero eigenvalues of AᵀA)
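A hedged R sketch of this procedure, assuming the worked example A from Baker's tutorial (eigen() already returns orthonormal eigenvectors for symmetric matrices such as AAᵀ, so the Gram-Schmidt step is implicit; eigenvector signs are arbitrary, so individual columns may differ from svd(A) by a factor of -1):

– R code –
A  <- matrix(c(3, 1, 1, -1, 3, 1), nrow = 2, byrow = TRUE)
eU <- eigen(A %*% t(A))        # step 1: eigenvectors of A A^T -> columns of U
eV <- eigen(t(A) %*% A)        # step 2: eigenvectors of A^T A -> columns of V
sv <- sqrt(eU$values)          # step 3: singular values, in descending order
sv                             # 3.464 3.162 = sqrt(12), sqrt(10)
svd(A)$d                       # same values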

Page 8: Introducing Latent Semantic Analysis

1.1 Matrix U – Compute AAᵀ

Start with the matrix (the worked example from Baker's tutorial):

A = [ 3  1  1
     -1  3  1 ]

Transpose of A:

Aᵀ = [ 3 -1
       1  3
       1  1 ]

Then:

AAᵀ = [ 11   1
         1  11 ]

Page 9: Introducing Latent Semantic Analysis

1.2 Matrix U – Eigenvectors and Eigenvalues [1/2]

Eigenvector
– A nonzero vector x that satisfies the equation Ax = λx
– A is a square matrix, λ is an eigenvalue (a scalar), x is the eigenvector

Rearranged: Ax = λx  ≡  (A − λI)x = 0

To find the eigenvalues, set the determinant of the coefficient matrix to zero: det(A − λI) = 0. For AAᵀ above, (11 − λ)² − 1 = λ² − 22λ + 120 = 0, giving λ = 10 and λ = 12.
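A quick R check of the determinant condition, assuming the AAᵀ computed above:

– R code –
AAt <- matrix(c(11, 1, 1, 11), nrow = 2)
# det(AAt - lambda*I) = (11 - lambda)^2 - 1 = lambda^2 - 22*lambda + 120
polyroot(c(120, -22, 1))   # roots 10 and 12
eigen(AAt)$values          # 12 10 (descending)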

Page 10: Introducing Latent Semantic Analysis

1.2 Matrix U – Eigenvectors and Eigenvalues [2/2]

Calculated eigenvalues: λ = 10 and λ = 12

① For λ = 12: (11 − 12)x₁ + x₂ = 0 → x₁ = x₂ → eigenvector [1, 1]
② For λ = 10: (11 − 10)x₁ + x₂ = 0 → x₁ = −x₂ → eigenvector [1, −1]

Thus the set of eigenvectors is

[ 1   1
  1  −1 ]

Page 11: Introducing Latent Semantic Analysis

1.3 Matrix U – Orthonormalization

Gram-Schmidt orthonormalization:

w_k = v_k − Σ_{i=1}^{k−1} (u_i ∙ v_k) u_i,  then u_k = w_k / ‖w_k‖

turns the set of eigenvectors v₁, v₂ into an orthonormal matrix with columns u₁, u₂:
1. Normalize v₁ → u₁
2. Find w₂ (orthogonal to u₁)
3. Normalize w₂ → u₂
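A small R implementation of the formula above (a sketch; base R has no built-in Gram-Schmidt, though qr.Q(qr(V)) achieves the same result up to sign):

– R code –
gram_schmidt <- function(V) {                 # columns of V are v_1 .. v_n
  U <- matrix(0, nrow(V), ncol(V))
  for (k in seq_len(ncol(V))) {
    w <- V[, k]
    for (i in seq_len(k - 1))
      w <- w - sum(U[, i] * V[, k]) * U[, i]  # subtract projections on u_1..u_{k-1}
    U[, k] <- w / sqrt(sum(w^2))              # normalize w_k to get u_k
  }
  U
}
gram_schmidt(matrix(c(1, 1, 1, -1), nrow = 2))  # orthonormalizes [1 1; 1 -1]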

Page 12: Introducing Latent Semantic Analysis

2.1 Matrix Vᵀ – Compute AᵀA

Start with the same matrix A and its transpose Aᵀ as before. Then:

AᵀA = [ 10   0  2
         0  10  4
         2   4  2 ]

Page 13: Introducing Latent Semantic Analysis

2.2 Matrix Vᵀ – Eigenvectors and Eigenvalues [1/2]

Eigenvector
– A nonzero vector x that satisfies the equation Ax = λx
– A is a square matrix, λ is an eigenvalue (a scalar), x is the eigenvector

Rearranged: (AᵀA − λI)x = 0; set the determinant of the coefficient matrix to zero (by cofactor expansion for the 3 x 3 case): det(AᵀA − λI) = 0, giving λ = 12, 10, 0.

Page 14: Introducing Latent Semantic Analysis

2.2 Matrix Vᵀ – Eigenvectors and Eigenvalues [2/2]

① For λ = 12: eigenvector [1, 2, 1]
② For λ = 10: eigenvector [2, −1, 0]
③ For λ = 0:  eigenvector [1, 2, −5]

Thus the set of eigenvectors (as columns) is

[ 1   2   1
  2  −1   2
  1   0  −5 ]

Page 15: Introducing Latent Semantic Analysis

2.3 Matrix Vᵀ – Orthonormalization and Transposition

Gram-Schmidt orthonormalization (the same formula as before):

w_k = v_k − Σ_{i=1}^{k−1} (u_i ∙ v_k) u_i,  then u_k = w_k / ‖w_k‖

turns the set of eigenvectors v₁, v₂, v₃ into an orthonormal matrix with columns u₁, u₂, u₃:
1. Normalize v₁ → u₁
2. Find w₂ (orthogonal to u₁), normalize w₂ → u₂
3. Find w₃ (orthogonal to u₁ and u₂), normalize w₃ → u₃

Finally, transpose the result to obtain Vᵀ.

Page 16: Introducing Latent Semantic Analysis

3.1 Matrix ∑ (= S)

Square roots of the non-zero eigenvalues
– Populate the diagonal with them, in descending order
– The diagonal entries of ∑ are the singular values of A

For the running example: σ₁ = √12 ≈ 3.46 and σ₂ = √10 ≈ 3.16.
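Putting the three pieces back together in R (a sketch, assuming the running example; note that R's svd() returns the compact form, so ∑ here is 2 x 2):

– R code –
A <- matrix(c(3, 1, 1, -1, 3, 1), nrow = 2, byrow = TRUE)
s <- svd(A)
s$d                                      # 3.464 3.162 = sqrt(12), sqrt(10)
Sigma <- diag(s$d)                       # the diagonal matrix described above
all.equal(A, s$u %*% Sigma %*% t(s$v))   # TRUE: A = U Sigma V^T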

Page 17: Introducing Latent Semantic Analysis

Outline

– SVD
– SVD to LSA

Page 18: Introducing Latent Semantic Analysis

Latent Semantic Analysis

Uses SVD (Singular Value Decomposition)
– to simulate human learning of word and passage meaning

Represents word and passage meaning
– as high-dimensional vectors in a semantic space

Page 19: Introducing Latent Semantic Analysis

LSA Example

doc 1: "modem the steering linux. modem, linux the modem. steering the modem. linux"
doc 2: "linux; the linux. the linux modem linux. the modem, clutch the modem. petrol"
doc 3: "petrol! clutch the steering, steering, linux. the steering clutch petrol. clutch the petrol; the clutch"
doc 4: "the the the. clutch clutch clutch! steering petrol; steering petrol petrol; steering petrol"

First analysis – Document Similarity
Second analysis – Term Similarity

Page 20: Introducing Latent Semantic Analysis

LSA Example: Build a Term-Frequency Matrix

Let matrix A =

            d1  d2  d3  d4
linux        3   4   1   0
modem        4   3   0   1
the          3   4   4   3
clutch       0   1   4   3
steering     2   0   3   3
petrol       0   1   3   4
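The same matrix entered in R, so the results on the following slides can be reproduced:

– R code –
A <- matrix(c(3, 4, 1, 0,
              4, 3, 0, 1,
              3, 4, 4, 3,
              0, 1, 4, 3,
              2, 0, 3, 3,
              0, 1, 3, 4),
            nrow = 6, byrow = TRUE,
            dimnames = list(c("linux", "modem", "the", "clutch", "steering", "petrol"),
                            c("d1", "d2", "d3", "d4")))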

Page 21: Introducing Latent Semantic Analysis

LSA Example: Compute SVD of Matrix A

A (6 x 4)  =  U (6 x 4)  x  S (4 x 4)  x  Vᵀ (4 x 4)

A:
              d1  d2  d3  d4
t1 (linux)     3   4   1   0
t2 (modem)     4   3   0   1
t3 (the)       3   4   4   3
t4 (clutch)    0   1   4   3
t5 (steering)  2   0   3   3
t6 (petrol)    0   1   3   4

U:
      dim1   dim2   dim3   dim4
t1   -0.33  -0.53   0.36  -0.14
t2   -0.32  -0.53  -0.48   0.35
t3   -0.61  -0.09   0.26  -0.14
t4   -0.37   0.42   0.60  -0.23
t5   -0.35   0.25  -0.68  -0.46
t6   -0.37   0.42   0.01   0.74

S:
      dim1  dim2  dim3  dim4
dim1  11.4  0     0     0
dim2  0     6.27  0     0
dim3  0     0     2.21  0
dim4  0     0     0     1.28

Vᵀ:
       d1     d2     d3     d4
dim1  -0.42  -0.56  -0.64  -0.29
dim2  -0.48  -0.52   0.61   0.33
dim3  -0.56   0.44   0.27  -0.63
dim4  -0.51   0.46  -0.35   0.63

– R code –
result <- svd(A)
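Expanding the one-liner: svd() returns the singular values d and the matrices u and v (note: v, not vᵀ). A sketch of unpacking and verifying the factorization:

– R code –
result <- svd(A)
U  <- result$u                 # 6 x 4 left singular vectors
S  <- diag(result$d)           # 4 x 4 diagonal: 11.4, 6.27, 2.21, 1.28
Vt <- t(result$v)              # 4 x 4 right singular vectors, transposed
all.equal(A, U %*% S %*% Vt, check.attributes = FALSE)   # TRUE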

Page 22: Introducing Latent Semantic Analysis

LSA Example: Reduced SVD

Full SVD: U (6 x 4) x S (4 x 4) x Vᵀ (4 x 4).
Keep only the two largest singular values (dim1, dim2), with the matching columns of U and rows of Vᵀ:

U₂ (6 x 2):
      dim1   dim2
t1   -0.33  -0.53
t2   -0.32  -0.53
t3   -0.61  -0.09
t4   -0.37   0.42
t5   -0.35   0.25
t6   -0.37   0.42

S₂ (2 x 2):
      dim1  dim2
dim1  11.4  0
dim2  0     6.27

V₂ᵀ (2 x 4):
       d1     d2     d3     d4
dim1  -0.42  -0.56  -0.64  -0.29
dim2  -0.48  -0.52   0.61   0.33
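The truncation in R: keep the first k = 2 singular values and the corresponding columns of U and V.

– R code –
k   <- 2
U2  <- result$u[, 1:k]             # 6 x 2
S2  <- diag(result$d[1:k])         # 2 x 2
V2t <- t(result$v[, 1:k])          # 2 x 4
A2  <- U2 %*% S2 %*% V2t           # best rank-2 approximation of A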

Page 23: Introducing Latent Semantic Analysis

LSA Example: Document Similarity

S₂ (2 x 2) x V₂ᵀ (2 x 4) gives the document coordinates in the reduced space:

       d1     d2     d3     d4
dim1  -4.83  -5.49  -6.49  -5.86
dim2  -3.52  -3.28   2.79   2.88

Cosine similarity:

Sim(A, B) = cos θ = (A ∙ B) / (|A| |B|)

e.g. Sim(d1, d2) = ((−4.83 × −5.49) + (−3.52 × −3.28)) / (√((−4.83)² + (−3.52)²) × √((−5.49)² + (−3.28)²)) ≈ 0.99

      d1    d2    d3    d4
d1    1     0.99  0.51  0.46
d2    0.99  1     0.58  0.54
d3    0.51  0.58  1     0.99
d4    0.46  0.54  0.99  1

doc 1: "modem the steering linux. modem, linux the modem. steering the modem. linux"
doc 2: "linux; the linux. the linux modem linux. the modem, clutch the modem. petrol"
doc 3: "petrol! clutch the steering, steering, linux. the steering clutch petrol. clutch the petrol; the clutch"
doc 4: "the the the. clutch clutch clutch! steering petrol; steering petrol petrol; steering petrol"

Page 24: Introducing Latent Semantic Analysis

LSA Example: Term Similarity

U₂ (6 x 2) x S₂ (2 x 2) gives the term coordinates in the reduced space:

      dim1   dim2
t1   -3.76  -3.33
t2   -3.65  -3.35
t3   -7.01  -0.61
t4   -4.30   2.63
t5   -4.09   1.59
t6   -4.24   2.65

Sim(A, B) = cos θ = (A ∙ B) / (|A| |B|)

      t1    t2    t3    t4    t5    t6
t1    1     0.99  0.80  0.29  0.45  0.28
t2    0.99  1     0.79  0.27  0.44  0.26
t3    0.80  0.79  1     0.80  0.89  0.79
t4    0.29  0.27  0.80  1     0.98  0.99
t5    0.45  0.44  0.89  0.98  1     0.98
t6    0.28  0.26  0.79  0.99  0.98  1

Most similar terms:
linux → modem, the
modem → linux, the
the → linux, modem, clutch, steering, petrol
clutch → the, steering, petrol
steering → the, clutch, petrol
petrol → the, clutch, steering
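And the term-similarity computation, reusing cosine() from the previous sketch:

– R code –
Tm <- U2 %*% S2                    # 6 x 2: rows are the term coordinates
tsim <- outer(1:6, 1:6, Vectorize(function(i, j) cosine(Tm[i, ], Tm[j, ])))
rownames(tsim) <- colnames(tsim) <- rownames(A)
round(tsim, 2)                     # reproduces the 6 x 6 table above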

Page 25: Introducing Latent Semantic Analysis

Conclusion

Pros
– Computes document similarity even when documents share no common words

Cons
– Lacks a statistical foundation → addressed by PLSA (Probabilistic LSA)

[Background figure: the U, ∑, and Vᵀ matrices of the running example]

Open question: which dimensions should be chosen for the reduction?