Page 1: UBC-ALM: Combining k-NN with SVD for WSD'verbs.colorado.edu/~mpalmer/Ling7800/Brutz.pdf · UBC-ALM: Combining k-NN and SVD for WSD This 2007 paper by Agirre and Lacalle looks at

“UBC-ALM: Combining k-NN with SVD for WSD”

Michael Brutz

CU-Boulder Department of Applied Mathematics

21 March 2013

M. Brutz (CU-Boulder) LING 7800: Paper Presentation Spr13 1 / 41

UBC-ALM: Combining k-NN and SVD for WSD

This 2007 paper by Agirre and Lopez de Lacalle looks at representing senses of a word with feature vectors and combining two mathematical methods to do word sense disambiguation (WSD) using those vectors.

Outline

1 Feature Vectors
2 SVD
3 k-Nearest Neighbors
4 Paper’s Results

Feature Vectors

The idea is to represent a word with a vector (a list of numbers).

One of the simplest ways to do this is what is known as the bag-of-words approach.

If you have the sentence:

“The cat ran.”

Then the bag-of-words representation of this sentence is a vector whose only non-zero entries are:

the = 1, cat = 1, ran = 1

That is, the vector has a value of 1 in every position corresponding to a word that appears in the sentence.
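As a minimal sketch of the idea, the snippet below builds such a vector in Python. The five-word vocabulary and the helper name `bag_of_words` are made up for illustration; a real vocabulary would have one slot for every word in the corpus.

```python
def bag_of_words(sentence, vocabulary):
    """Return a 0/1 list with a 1 for each vocabulary word present in the sentence."""
    # Lowercase and strip simple punctuation so "The" and "ran." match "the" and "ran".
    tokens = {word.strip(".,!?").lower() for word in sentence.split()}
    return [1 if word in tokens else 0 for word in vocabulary]


# Hypothetical toy vocabulary, chosen only for this example.
vocabulary = ["the", "dog", "cat", "sat", "ran"]
vector = bag_of_words("The cat ran.", vocabulary)
print(vector)  # [1, 0, 1, 0, 1]
```

The slots for "the", "cat", and "ran" hold a 1 and every other slot holds a 0.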

To make this into a vector, all we do is associate each entry in the vector with a word.

For the “The cat ran.” example, this vector would look like

$$
\begin{pmatrix} | \\ A_1 \\ | \end{pmatrix}
=
\begin{pmatrix} 1 \\ 0 \\ \vdots \\ 1 \\ 0 \\ \vdots \\ 1 \end{pmatrix}
$$

Again, where the 1’s are in the slots for “the”, “cat”, and “ran”, and all the others are 0.

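When we collect a set of such vectors into a single object like so

$$
\begin{pmatrix} | & | \\ A_1 & A_2 \\ | & | \end{pmatrix}
=
\begin{pmatrix} 1 & \vdots \\ 0 & 1 \\ \vdots & 0 \\ 1 & \vdots \\ 0 & 1 \\ \vdots & \vdots \\ 1 & 0 \end{pmatrix}
$$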

We have what is called a matrix.

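I’m only using two vectors for illustration; you can include any number and still have a matrix.

$$
\begin{pmatrix} | & | & & | \\ A_1 & A_2 & \cdots & A_n \\ | & | & & | \end{pmatrix}
=
\begin{pmatrix} 1 & \vdots & & \vdots \\ 0 & 1 & & 0 \\ \vdots & 0 & & 1 \\ 1 & \vdots & \cdots & \vdots \\ 0 & 1 & & 0 \\ \vdots & \vdots & & \vdots \\ 1 & 0 & & 1 \end{pmatrix}
$$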
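A rough sketch of stacking several bag-of-words vectors into a matrix, one column per sentence, is given below. It assumes NumPy is available; the vocabulary, the sentences, and the helper `bag_of_words` are illustrative and do not come from the paper.

```python
import numpy as np


def bag_of_words(sentence, vocabulary):
    """Return a 0/1 list with a 1 for each vocabulary word present in the sentence."""
    tokens = {word.strip(".,!?").lower() for word in sentence.split()}
    return [1 if word in tokens else 0 for word in vocabulary]


# Hypothetical toy data: three sentences over a five-word vocabulary.
vocabulary = ["the", "dog", "cat", "sat", "ran"]
sentences = ["The cat ran.", "The dog sat.", "The dog ran."]

# One column per sentence (A_1, A_2, A_3), one row per vocabulary word.
A = np.column_stack([bag_of_words(s, vocabulary) for s in sentences])
print(A.shape)  # (5, 3)
print(A[:, 0])  # [1 0 1 0 1]
```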

This leads us into SVD.

SVD

SVD stands for Singular Value Decomposition.

In mathy terms, it’s where you take a matrix, A, and break it up into the product of three special matrices.

SVD of A

$$
A = U \Sigma V^T
$$

The columns of U are the left singular vectors of the matrix A, Σ is a diagonal matrix containing its singular values, and the columns of V are its right singular vectors.
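For readers who want to see the factorization numerically, here is a small sketch using NumPy's `np.linalg.svd`. The 3 by 3 matrix is an arbitrary example, not data from the paper.

```python
import numpy as np

# Arbitrary example matrix, chosen only for illustration.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0]])

# np.linalg.svd returns U, the singular values as a 1-D array, and V^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Sigma = np.diag(s)

# Multiplying the three factors back together recovers A (up to rounding).
print(np.allclose(A, U @ Sigma @ Vt))  # True
# NumPy returns the singular values sorted from largest to smallest.
print(s)
```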

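This is what it looks like for a 3 by 3 matrix.

$$
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
=
\begin{pmatrix} | & | & | \\ u_1 & u_2 & u_3 \\ | & | & | \end{pmatrix}
\begin{pmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & \sigma_3 \end{pmatrix}
\begin{pmatrix} - & v_1 & - \\ - & v_2 & - \\ - & v_3 & - \end{pmatrix}
$$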

The terms in the middle matrix, the σ’s, are what are called the “singular values”.

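When we delete (set to zero) the singular values farthest along the diagonal, which are the smallest ones because singular values are conventionally listed in decreasing order, we still have a good approximation to our original matrix A.

$$
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\approx
\begin{pmatrix} | & | & | \\ u_1 & u_2 & u_3 \\ | & | & | \end{pmatrix}
\begin{pmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} - & v_1 & - \\ - & v_2 & - \\ - & v_3 & - \end{pmatrix}
$$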

This implies we can approximate the three column vectors of the A matrix by only using the first two u vectors: once σ3 is set to zero, u3 no longer contributes anything to the product.

You have to know how matrix multiplication works to see this, but the idea of using SVD to represent more vectors with fewer is the important thing.
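The matrix-multiplication step can be checked numerically. The sketch below, which assumes NumPy and uses an arbitrary 3 by 3 example matrix, shows that zeroing σ3 gives the same result as building every column from u1 and u2 alone.

```python
import numpy as np

# Arbitrary full-rank example matrix (illustration only).
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 1.0],
              [1.0, 1.0, 4.0]])
U, s, Vt = np.linalg.svd(A)

k = 2
# Column j of the approximation is U[:, :k] @ coeffs[:, j]:
# a combination of only the first two u vectors.
coeffs = np.diag(s[:k]) @ Vt[:k, :]
A_k = U[:, :k] @ coeffs

# Setting sigma_3 to zero in the full product gives the same matrix.
s_zeroed = s.copy()
s_zeroed[k:] = 0.0
A_zeroed = U @ np.diag(s_zeroed) @ Vt

print(np.allclose(A_k, A_zeroed))  # True
print(np.linalg.matrix_rank(A_k))  # 2
```

The error of this approximation, measured in the spectral norm, equals the singular value that was dropped, which is why dropping the smallest ones costs the least.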

So what do these approximations do?

Let’s ask Big Bird!

Grayscale images are represented with numbers that determine how bright to make each pixel, meaning this 104 by 138 pixel image is actually represented by a matrix.

Using 1 Singular Value

Using 5 Singular Values

Using 40 Singular Values

Using All 104 Singular Values

As you can see, we can get a pretty good approximation to what all the vectors comprising the image are doing, using only a fraction of the singular vectors.
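The same experiment can be sketched in code. The Big Bird image itself is not available here, so the snippet assumes a synthetic 104 by 138 grayscale image (a bright disc plus a little seeded noise) as a stand-in; the helper name `low_rank_approximation` is also made up for this example.

```python
import numpy as np


def low_rank_approximation(M, k):
    """Rebuild M from only its k largest singular values and vectors."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


# Synthetic stand-in for the 104 by 138 image: a disc plus mild noise.
rows, cols = 104, 138
y, x = np.mgrid[0:rows, 0:cols]
rng = np.random.default_rng(0)
image = ((x - 70) ** 2 + (y - 50) ** 2 < 30 ** 2).astype(float)
image += 0.05 * rng.standard_normal((rows, cols))

# Relative error of the approximation for the values of k used on the slides.
errors = {}
for k in (1, 5, 40, 104):
    approx = low_rank_approximation(image, k)
    errors[k] = np.linalg.norm(image - approx) / np.linalg.norm(image)
    print(k, errors[k])
```

The error shrinks as more singular values are kept and is zero, up to rounding, when all 104 are used.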

Moreover, with regard to WSD, this isn’t what we want:

Using 40 Singular Values

We actually want this

Using 5 Singular Values

The reason is that, while using fewer singular values blurs the image, for our vector description of words this blurring is tantamount to adding in unseen word counts.

It’s a double whammy! Not only do we have fewer vectors to deal with, but we also get guesstimates of word counts that we haven’t seen in collecting data but suspect are relevant.
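To make the "unseen word counts" point concrete, here is a toy sketch assuming NumPy. The words, the four contexts, and the counts are all invented for illustration. The word "catch" never occurs in the second context, yet the rank-2 approximation gives it a positive weight there because that context otherwise looks like the first one.

```python
import numpy as np

# Hypothetical toy data: rows are words, columns are contexts of "bass".
words = ["fish", "river", "catch", "guitar", "music"]
A = np.array([[1.0, 1.0, 0.0, 0.0],   # fish
              [1.0, 1.0, 0.0, 0.0],   # river
              [1.0, 0.0, 0.0, 0.0],   # catch (unseen in context 2)
              [0.0, 0.0, 1.0, 1.0],   # guitar
              [0.0, 0.0, 1.0, 1.0]])  # music

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

row, col = words.index("catch"), 1
print(A[row, col])    # 0.0 in the raw counts
print(A_k[row, col])  # positive (about 0.49) after keeping 2 singular values
```

Entries linking the fishing words to the music contexts stay at zero, so the smoothing only fills in counts where the data suggests a connection.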

So instead of using simple bag-of-words vectors, we use these singular vectors to describe senses of a word, as they help with data sparsity issues.

k-Nearest Neighbors

k-NN

The last piece of the puzzle is to use k-NN (k-Nearest Neighbors) to disambiguate words.

Before understanding how it works, one needs to realize that vectors have a geometric interpretation as points in space.

For example, take the vector

$$
\begin{pmatrix} 0.4 \\ 0.6 \end{pmatrix}
$$

This can be represented graphically as

Graphically Representing (0.4,0.6)

For the sake of illustration, say that this “X” represents the word “bass” and we are trying to find the sense for it.

Now add in the points that have been tagged in training our WSD classifier.

Let ∝ represent the “fish” sense and the eighth notes represent the music sense, for feature vectors from a hand-tagged corpus.

The way k-NN works is to assign the target word the label that is most common among its k nearest hand-tagged neighbors.

For example, if k = 3 then in this example we would say that the target word has the fish sense.

If k = 5, then the music sense.
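The vote can be sketched in a few lines of Python. The coordinates of the tagged points below are invented so that the outcome matches the picture (fish for k = 3, music for k = 5); the function name `knn_label` is also hypothetical. Note that this is a plain majority vote with Euclidean distance, not the weighted k-NN used in the paper.

```python
import math
from collections import Counter


def knn_label(target, tagged_points, k):
    """Return the most common label among the k tagged points nearest to target."""
    by_distance = sorted(tagged_points, key=lambda item: math.dist(target, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# The target "bass" sits at (0.4, 0.6); the tagged points are made-up examples.
target = (0.4, 0.6)
tagged_points = [
    ((0.45, 0.60), "fish"),
    ((0.40, 0.52), "fish"),
    ((0.50, 0.65), "music"),
    ((0.25, 0.60), "music"),
    ((0.40, 0.80), "music"),
    ((0.10, 0.10), "music"),
    ((0.90, 0.10), "fish"),
]

print(knn_label(target, tagged_points, k=3))  # fish
print(knn_label(target, tagged_points, k=5))  # music
```

The two closest points are tagged fish, so a small k favors fish, while widening the neighborhood to five points brings in three music points and flips the vote.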

Paper’s Results

In the paper, all of these approaches had some extra bells and whistles thrown on them (e.g. the feature vectors weren’t just bag-of-words, and they used a weighted k-NN), but nothing really worth getting into.

If you’ve understood what I’ve said so far, then you understand what they were fundamentally doing in the paper.

They compared their WSD classifier to the others entered in SemEval-2007 on the Lexical Sample and All-Words tasks, and overall did pretty well.

Questions?

Thank you for your time.
