
Page 1: Introducing Latent Semantic Analysis

Introducing Latent Semantic Analysis

Thomas K. Landauer et al., "An introduction to latent semantic analysis," Discourse Processes, Vol. 25 (2-3), pp. 259-284, 1998.
Scott Deerwester et al., "Indexing by latent semantic analysis," Journal of the American Society for Information Science, Vol. 41 (6), pp. 391-407, 1990.
Kirk Baker, "Singular Value Decomposition Tutorial," electronic document, 2005.

Aug 22, 2014
Hee-Gook Jun

Page 2: Introducing Latent Semantic Analysis

Outline

– SVD
– SVD to LSA
– Conclusion

Page 3: Introducing Latent Semantic Analysis

Eigendecomposition vs. Singular Value Decomposition

Eigendecomposition: A = P Λ P⁻¹
– Matrix must be diagonalizable
– Matrix must be square
– An n x n matrix must have n linearly independent eigenvectors (e.g., a symmetric matrix)

Singular Value Decomposition: A = U ∑ Vᵀ
– Computable for a matrix of any size (m x n)
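A minimal R sketch of the contrast (the matrices here are illustrative, not from the slides): eigen() requires a square matrix, while svd() accepts any m x n matrix.

– R code –
S <- matrix(c(2, 1, 1, 2), nrow = 2)        # square, symmetric, diagonalizable
e <- eigen(S)                               # S = P Lambda P^-1
all.equal(S, e$vectors %*% diag(e$values) %*% solve(e$vectors))   # TRUE

A <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)  # 2 x 3: not square, still decomposable
s <- svd(A)                                 # A = U Sigma V^T
all.equal(A, s$u %*% diag(s$d) %*% t(s$v))  # TRUE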

Page 4: Introducing Latent Semantic Analysis

U: Left Singular Vectors of A      (A = U ∑ Vᵀ)

Unitary matrix
– Columns of U are orthonormal (orthogonal + normal)
– They are the orthonormal eigenvectors of AAᵀ

Example: u₁ = [0, 0, 0, 1] and u₂ = [0, 1, 0, 0] are orthogonal:

u₁ ∙ u₂ = (0×0) + (0×1) + (0×0) + (1×0) = 0

and u₁ is a unit (normal) vector:

‖u₁‖ = √(0² + 0² + 0² + 1²) = 1
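The same checks in R, using the two vectors above:

– R code –
u1 <- c(0, 0, 0, 1)
u2 <- c(0, 1, 0, 0)
sum(u1 * u2)       # 0 -> orthogonal
sqrt(sum(u1^2))    # 1 -> unit length
# For a whole matrix U with orthonormal columns, t(U) %*% U is the identity.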

Page 5: Introducing Latent Semantic Analysis

V: Right Singular Vectors of A      (A = U ∑ Vᵀ)

Unitary matrix
– Columns of V are orthonormal (orthogonal + normal)
– They are the orthonormal eigenvectors of AᵀA

Page 6: Introducing Latent Semantic Analysis

∑ (or S)      (A = U ∑ Vᵀ)

Diagonal matrix
– Diagonal entries are the singular values of A

Singular values
– The non-zero singular values of A
– Square roots of the eigenvalues of AAᵀ (equivalently, of AᵀA), in descending order

Page 7: Introducing Latent Semantic Analysis

Calculation Procedure      (A = U ∑ Vᵀ)

1. U is the set of eigenvectors of AAᵀ
   1. Compute AAᵀ
   2. Compute the eigenvectors of AAᵀ
   3. Orthonormalize the matrix
2. V is the set of eigenvectors of AᵀA
   1. Compute AᵀA
   2. Compute the eigenvalues and eigenvectors of AᵀA
   3. Orthonormalize and transpose
3. ∑ holds the square roots of the eigenvalues of AAᵀ (which equal the non-zero eigenvalues of AᵀA)
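A hedged R sketch of this procedure, assuming the worked example A from Baker's tutorial (eigen() already returns orthonormal eigenvectors for symmetric matrices such as AAᵀ, so the Gram-Schmidt step is implicit; eigenvector signs are arbitrary, so individual columns may differ from svd(A) by a factor of -1):

– R code –
A  <- matrix(c(3, 1, 1, -1, 3, 1), nrow = 2, byrow = TRUE)
eU <- eigen(A %*% t(A))        # step 1: eigenvectors of A A^T -> columns of U
eV <- eigen(t(A) %*% A)        # step 2: eigenvectors of A^T A -> columns of V
sv <- sqrt(eU$values)          # step 3: singular values, in descending order
sv                             # 3.464 3.162 = sqrt(12), sqrt(10)
svd(A)$d                       # same values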

Page 8: Introducing Latent Semantic Analysis

1.1 Matrix U – Compute AAᵀ

Start with the matrix (the worked example from Baker's tutorial):

A = [ 3  1  1
     -1  3  1 ]

Transpose of A:

Aᵀ = [ 3 -1
       1  3
       1  1 ]

Then:

AAᵀ = [ 11   1
         1  11 ]

Page 9: Introducing Latent Semantic Analysis

1.2 Matrix U – Eigenvectors and Eigenvalues [1/2]

Eigenvector
– A nonzero vector x that satisfies the equation Ax = λx
– A is a square matrix, λ is an eigenvalue (a scalar), x is the eigenvector

Rearranged: Ax = λx  ≡  (A − λI)x = 0

To find the eigenvalues, set the determinant of the coefficient matrix to zero: det(A − λI) = 0. For AAᵀ above, (11 − λ)² − 1 = λ² − 22λ + 120 = 0, giving λ = 10 and λ = 12.
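A quick R check of the determinant condition, assuming the AAᵀ computed above:

– R code –
AAt <- matrix(c(11, 1, 1, 11), nrow = 2)
# det(AAt - lambda*I) = (11 - lambda)^2 - 1 = lambda^2 - 22*lambda + 120
polyroot(c(120, -22, 1))   # roots 10 and 12
eigen(AAt)$values          # 12 10 (descending)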

Page 10: Introducing Latent Semantic Analysis

1.2 Matrix U – Eigenvectors and Eigenvalues [2/2]

Calculated eigenvalues: λ = 10 and λ = 12

① For λ = 12: (11 − 12)x₁ + x₂ = 0 → x₁ = x₂ → eigenvector [1, 1]
② For λ = 10: (11 − 10)x₁ + x₂ = 0 → x₁ = −x₂ → eigenvector [1, −1]

Thus the set of eigenvectors is

[ 1   1
  1  −1 ]

Page 11: Introducing Latent Semantic Analysis

1.3 Matrix U – Orthonormalization

Gram-Schmidt orthonormalization:

w_k = v_k − Σ_{i=1}^{k−1} (u_i ∙ v_k) u_i,  then u_k = w_k / ‖w_k‖

turns the set of eigenvectors v₁, v₂ into an orthonormal matrix with columns u₁, u₂:
1. Normalize v₁ → u₁
2. Find w₂ (orthogonal to u₁)
3. Normalize w₂ → u₂
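A small R implementation of the formula above (a sketch; base R has no built-in Gram-Schmidt, though qr.Q(qr(V)) achieves the same result up to sign):

– R code –
gram_schmidt <- function(V) {                 # columns of V are v_1 .. v_n
  U <- matrix(0, nrow(V), ncol(V))
  for (k in seq_len(ncol(V))) {
    w <- V[, k]
    for (i in seq_len(k - 1))
      w <- w - sum(U[, i] * V[, k]) * U[, i]  # subtract projections on u_1..u_{k-1}
    U[, k] <- w / sqrt(sum(w^2))              # normalize w_k to get u_k
  }
  U
}
gram_schmidt(matrix(c(1, 1, 1, -1), nrow = 2))  # orthonormalizes [1 1; 1 -1]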

Page 12: Introducing Latent Semantic Analysis

2.1 Matrix Vᵀ – Compute AᵀA

Start with the same matrix A and its transpose Aᵀ as before. Then:

AᵀA = [ 10   0  2
         0  10  4
         2   4  2 ]

Page 13: Introducing Latent Semantic Analysis

2.2 Matrix Vᵀ – Eigenvectors and Eigenvalues [1/2]

Eigenvector
– A nonzero vector x that satisfies the equation Ax = λx
– A is a square matrix, λ is an eigenvalue (a scalar), x is the eigenvector

Rearranged: (AᵀA − λI)x = 0; set the determinant of the coefficient matrix to zero (by cofactor expansion for the 3 x 3 case): det(AᵀA − λI) = 0, giving λ = 12, 10, 0.

Page 14: Introducing Latent Semantic Analysis

2.2 Matrix Vᵀ – Eigenvectors and Eigenvalues [2/2]

① For λ = 12: eigenvector [1, 2, 1]
② For λ = 10: eigenvector [2, −1, 0]
③ For λ = 0:  eigenvector [1, 2, −5]

Thus the set of eigenvectors (as columns) is

[ 1   2   1
  2  −1   2
  1   0  −5 ]

Page 15: Introducing Latent Semantic Analysis

2.3 Matrix Vᵀ – Orthonormalization and Transposition

Gram-Schmidt orthonormalization (the same formula as before):

w_k = v_k − Σ_{i=1}^{k−1} (u_i ∙ v_k) u_i,  then u_k = w_k / ‖w_k‖

turns the set of eigenvectors v₁, v₂, v₃ into an orthonormal matrix with columns u₁, u₂, u₃:
1. Normalize v₁ → u₁
2. Find w₂ (orthogonal to u₁), normalize w₂ → u₂
3. Find w₃ (orthogonal to u₁ and u₂), normalize w₃ → u₃

Finally, transpose the result to obtain Vᵀ.

Page 16: Introducing Latent Semantic Analysis

3.1 Matrix ∑ (= S)

Square roots of the non-zero eigenvalues
– Populate the diagonal with them, in descending order
– The diagonal entries of ∑ are the singular values of A

For the running example: σ₁ = √12 ≈ 3.46 and σ₂ = √10 ≈ 3.16.
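Putting the three pieces back together in R (a sketch, assuming the running example; note that R's svd() returns the compact form, so ∑ here is 2 x 2):

– R code –
A <- matrix(c(3, 1, 1, -1, 3, 1), nrow = 2, byrow = TRUE)
s <- svd(A)
s$d                                      # 3.464 3.162 = sqrt(12), sqrt(10)
Sigma <- diag(s$d)                       # the diagonal matrix described above
all.equal(A, s$u %*% Sigma %*% t(s$v))   # TRUE: A = U Sigma V^T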

Page 17: Introducing Latent Semantic Analysis

Outline

– SVD
– SVD to LSA

Page 18: Introducing Latent Semantic Analysis

Latent Semantic Analysis

Uses SVD (Singular Value Decomposition)
– to simulate human learning of word and passage meaning

Represents word and passage meaning
– as high-dimensional vectors in a semantic space

Page 19: Introducing Latent Semantic Analysis

LSA Example

doc 1: "modem the steering linux. modem, linux the modem. steering the modem. linux"
doc 2: "linux; the linux. the linux modem linux. the modem, clutch the modem. petrol"
doc 3: "petrol! clutch the steering, steering, linux. the steering clutch petrol. clutch the petrol; the clutch"
doc 4: "the the the. clutch clutch clutch! steering petrol; steering petrol petrol; steering petrol"

First analysis – Document Similarity
Second analysis – Term Similarity

Page 20: Introducing Latent Semantic Analysis

LSA Example: Build a Term-Frequency Matrix

Let matrix A =

            d1  d2  d3  d4
linux        3   4   1   0
modem        4   3   0   1
the          3   4   4   3
clutch       0   1   4   3
steering     2   0   3   3
petrol       0   1   3   4
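The same matrix entered in R, so the results on the following slides can be reproduced:

– R code –
A <- matrix(c(3, 4, 1, 0,
              4, 3, 0, 1,
              3, 4, 4, 3,
              0, 1, 4, 3,
              2, 0, 3, 3,
              0, 1, 3, 4),
            nrow = 6, byrow = TRUE,
            dimnames = list(c("linux", "modem", "the", "clutch", "steering", "petrol"),
                            c("d1", "d2", "d3", "d4")))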

Page 21: Introducing Latent Semantic Analysis

LSA Example: Compute SVD of Matrix A

A (6 x 4)  =  U (6 x 4)  x  S (4 x 4)  x  Vᵀ (4 x 4)

A:
              d1  d2  d3  d4
t1 (linux)     3   4   1   0
t2 (modem)     4   3   0   1
t3 (the)       3   4   4   3
t4 (clutch)    0   1   4   3
t5 (steering)  2   0   3   3
t6 (petrol)    0   1   3   4

U:
      dim1   dim2   dim3   dim4
t1   -0.33  -0.53   0.36  -0.14
t2   -0.32  -0.53  -0.48   0.35
t3   -0.61  -0.09   0.26  -0.14
t4   -0.37   0.42   0.60  -0.23
t5   -0.35   0.25  -0.68  -0.46
t6   -0.37   0.42   0.01   0.74

S:
      dim1  dim2  dim3  dim4
dim1  11.4  0     0     0
dim2  0     6.27  0     0
dim3  0     0     2.21  0
dim4  0     0     0     1.28

Vᵀ:
       d1     d2     d3     d4
dim1  -0.42  -0.56  -0.64  -0.29
dim2  -0.48  -0.52   0.61   0.33
dim3  -0.56   0.44   0.27  -0.63
dim4  -0.51   0.46  -0.35   0.63

– R code –
result <- svd(A)
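Expanding the one-liner: svd() returns the singular values d and the matrices u and v (note: v, not vᵀ). A sketch of unpacking and verifying the factorization:

– R code –
result <- svd(A)
U  <- result$u                 # 6 x 4 left singular vectors
S  <- diag(result$d)           # 4 x 4 diagonal: 11.4, 6.27, 2.21, 1.28
Vt <- t(result$v)              # 4 x 4 right singular vectors, transposed
all.equal(A, U %*% S %*% Vt, check.attributes = FALSE)   # TRUE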

Page 22: Introducing Latent Semantic Analysis

LSA Example: Reduced SVD

Full SVD: U (6 x 4) x S (4 x 4) x Vᵀ (4 x 4).
Keep only the two largest singular values (dim1, dim2), with the matching columns of U and rows of Vᵀ:

U₂ (6 x 2):
      dim1   dim2
t1   -0.33  -0.53
t2   -0.32  -0.53
t3   -0.61  -0.09
t4   -0.37   0.42
t5   -0.35   0.25
t6   -0.37   0.42

S₂ (2 x 2):
      dim1  dim2
dim1  11.4  0
dim2  0     6.27

V₂ᵀ (2 x 4):
       d1     d2     d3     d4
dim1  -0.42  -0.56  -0.64  -0.29
dim2  -0.48  -0.52   0.61   0.33
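The truncation in R: keep the first k = 2 singular values and the corresponding columns of U and V.

– R code –
k   <- 2
U2  <- result$u[, 1:k]             # 6 x 2
S2  <- diag(result$d[1:k])         # 2 x 2
V2t <- t(result$v[, 1:k])          # 2 x 4
A2  <- U2 %*% S2 %*% V2t           # best rank-2 approximation of A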

Page 23: Introducing Latent Semantic Analysis

LSA Example: Document Similarity

S₂ (2 x 2) x V₂ᵀ (2 x 4) gives the document coordinates in the reduced space:

       d1     d2     d3     d4
dim1  -4.83  -5.49  -6.49  -5.86
dim2  -3.52  -3.28   2.79   2.88

Cosine similarity:

Sim(A, B) = cos θ = (A ∙ B) / (|A| |B|)

e.g. Sim(d1, d2) = ((−4.83 × −5.49) + (−3.52 × −3.28)) / (√((−4.83)² + (−3.52)²) × √((−5.49)² + (−3.28)²)) ≈ 0.99

      d1    d2    d3    d4
d1    1     0.99  0.51  0.46
d2    0.99  1     0.58  0.54
d3    0.51  0.58  1     0.99
d4    0.46  0.54  0.99  1

doc 1: "modem the steering linux. modem, linux the modem. steering the modem. linux"
doc 2: "linux; the linux. the linux modem linux. the modem, clutch the modem. petrol"
doc 3: "petrol! clutch the steering, steering, linux. the steering clutch petrol. clutch the petrol; the clutch"
doc 4: "the the the. clutch clutch clutch! steering petrol; steering petrol petrol; steering petrol"

Page 24: Introducing Latent Semantic Analysis

LSA Example: Term Similarity

U₂ (6 x 2) x S₂ (2 x 2) gives the term coordinates in the reduced space:

      dim1   dim2
t1   -3.76  -3.33
t2   -3.65  -3.35
t3   -7.01  -0.61
t4   -4.30   2.63
t5   -4.09   1.59
t6   -4.24   2.65

Sim(A, B) = cos θ = (A ∙ B) / (|A| |B|)

      t1    t2    t3    t4    t5    t6
t1    1     0.99  0.80  0.29  0.45  0.28
t2    0.99  1     0.79  0.27  0.44  0.26
t3    0.80  0.79  1     0.80  0.89  0.79
t4    0.29  0.27  0.80  1     0.98  0.99
t5    0.45  0.44  0.89  0.98  1     0.98
t6    0.28  0.26  0.79  0.99  0.98  1

Most similar terms:
linux → modem, the
modem → linux, the
the → linux, modem, clutch, steering, petrol
clutch → the, steering, petrol
steering → the, clutch, petrol
petrol → the, clutch, steering
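And the term-similarity computation, reusing cosine() from the previous sketch:

– R code –
Tm <- U2 %*% S2                    # 6 x 2: rows are the term coordinates
tsim <- outer(1:6, 1:6, Vectorize(function(i, j) cosine(Tm[i, ], Tm[j, ])))
rownames(tsim) <- colnames(tsim) <- rownames(A)
round(tsim, 2)                     # reproduces the 6 x 6 table above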

Page 25: Introducing Latent Semantic Analysis

Conclusion

Pros
– Computes document similarity even when documents share no common words

Cons
– Lacks a statistical foundation → addressed by PLSA (Probabilistic LSA)

[Background figure: the U, ∑, and Vᵀ matrices of the running example]

Open question: which dimensions should be chosen for the reduction?