
Post on 22-Dec-2015


Semi-Supervised Natural Language Learning Reading Group

• I set up a site at: http://www.cs.cmu.edu/~acarlson/semisupervised/

• Cover other applications of semi-supervised learning?

• Volunteers?

• Every week or bi-weekly?

• Time change? 1pm? Noon?

Unsupervised Word Sense Disambiguation Rivaling Supervised Methods

Author: David Yarowsky (1995)

Presented by: Andy Carlson

Word Sense Disambiguation

• Determining what sense of a word is meant in a given sentence

• “Toyota is considering opening a plant in Detroit.”

• “The banana plant is grown all over the tropics for its fruit.”

• Different from sense induction: we assume the distinct senses are already known

Using unlabeled data

• Two properties of language let us use unlabeled data:

• One sense per collocation: nearby words provide strong and consistent clues to the sense

• One sense per discourse: within a document, the sense of a word is highly consistent

• We can base an iterative bootstrapping algorithm on these two properties

One sense per discourse

• How accurate?

• How frequently does it apply?

Decision Lists

• List of rules of the form “collocation => sense”

• Example: life (within 2-10 words) => biological sense of plant

• Rules are ordered by log-likelihood ratio
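
A minimal sketch of how such a decision list could be built and applied. The score for each collocation is the absolute log-likelihood ratio |log(P(sense A | collocation) / P(sense B | collocation))|, estimated here from raw counts. The function names, the sense names `living` and `factory`, the additive smoothing constant `alpha`, and the representation of collocations as plain strings are all assumptions made for illustration, not details given on the slides.

```python
import math
from collections import Counter, defaultdict

SENSES = ("living", "factory")  # hypothetical names for the two senses of "plant"

def train_decision_list(examples, alpha=0.1):
    """Build rules (score, collocation, sense) from labeled examples.

    examples: list of (collocations, sense) pairs.
    alpha: assumed additive smoothing so unseen counts do not give log(0).
    """
    counts = defaultdict(Counter)
    for collocations, sense in examples:
        for c in set(collocations):
            counts[c][sense] += 1
    rules = []
    for c, by_sense in counts.items():
        llr = math.log((by_sense[SENSES[0]] + alpha) / (by_sense[SENSES[1]] + alpha))
        rules.append((abs(llr), c, SENSES[0] if llr > 0 else SENSES[1]))
    # Strongest evidence first; ties broken alphabetically for determinism.
    rules.sort(key=lambda r: (-r[0], r[1]))
    return rules

def classify(rules, collocations, threshold=0.0):
    """Return the sense of the first (highest-ranked) matching rule, or None."""
    present = set(collocations)
    for score, c, sense in rules:
        if score > threshold and c in present:
            return sense
    return None

labeled = [
    (["life", "grow"], "living"),
    (["life"], "living"),
    (["manufacturing", "grow"], "factory"),
]
rules = train_decision_list(labeled)
```

Only the single highest-ranked matching rule decides the sense; evidence from lower-ranked rules is not combined.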

The algorithm – step 1

• Find all occurrences of the given polysemous word

• We follow examples for the word plant
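
Step 1 can be sketched as follows, using the two example sentences from the earlier slide. The tokenizer, the `window` parameter, and the record layout are assumptions; this simplified version also matches only the exact token, so inflected forms such as "plants" are ignored.

```python
import re

def find_occurrences(documents, target, window=10):
    """Return one record per occurrence of `target`, with its surrounding words."""
    occurrences = []
    for doc_id, text in enumerate(documents):
        tokens = re.findall(r"[a-z]+", text.lower())
        for i, tok in enumerate(tokens):
            if tok == target:
                # Up to `window` words on each side, excluding the target itself.
                context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
                occurrences.append({"doc": doc_id, "pos": i, "context": context})
    return occurrences

docs = [
    "Toyota is considering opening a plant in Detroit.",
    "The banana plant is grown all over the tropics for its fruit.",
]
occurrences = find_occurrences(docs, "plant", window=2)
```

Keeping the document id with each occurrence makes it possible to apply a per-document constraint later.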

Step 2 – Initial Labeling

• For each sense of the word, identify a small number of training examples

• Strategies: dictionary words, human-labelling of most frequent collocates, or human-chosen collocates

• Example: the words life and manufacturing are used as seed collocations
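
The seed-collocation strategy might look like the sketch below: an example is labeled only if its context contains a seed word. The function name, the sense names, and the choice to leave an example unlabeled when seeds for both senses appear are assumptions for illustration.

```python
def seed_label(contexts, seeds):
    """Assign an initial sense to each context, or None if no seed applies.

    contexts: list of context-word lists, one per occurrence of the target word.
    seeds: dict mapping a seed collocation to its sense.
    """
    labels = []
    for context in contexts:
        senses = {seeds[w] for w in context if w in seeds}
        # Assumption: conflicting seeds leave the example unlabeled.
        labels.append(senses.pop() if len(senses) == 1 else None)
    return labels

seeds = {"life": "living", "manufacturing": "factory"}
contexts = [
    ["animal", "life", "on", "earth"],
    ["manufacturing", "equipment"],
    ["the", "banana"],
    ["life", "manufacturing"],
]
labels = seed_label(contexts, seeds)
```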

Sample initial state (figure): examples labeled as ‘living’ plant, examples labeled as ‘factory’ plant, and unlabeled examples

Step 3a

• Train the decision list based on the current labeling of the state space

Step 3b

• Apply learned classifier to all examples

Step 3c

• Optionally, apply the one-sense-per-discourse constraint
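
One simple way to realize this constraint is a per-document majority vote, sketched below. The majority rule and the `min_margin` parameter are simplifying assumptions; the slide does not specify how the dominant sense of a document is decided.

```python
from collections import Counter

def apply_one_sense_per_discourse(doc_ids, labels, min_margin=1):
    """Extend each document's dominant sense to all its examples.

    doc_ids[i] is the document of example i; labels[i] is its sense or None.
    A document is relabeled only if its top sense leads by `min_margin` votes.
    """
    votes = {}
    for doc, label in zip(doc_ids, labels):
        if label is not None:
            votes.setdefault(doc, Counter())[label] += 1
    result = list(labels)
    for i, doc in enumerate(doc_ids):
        if doc not in votes:
            continue  # no labeled example in this document
        ranked = votes[doc].most_common(2)
        top_sense, top_count = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if top_count - runner_up >= min_margin:
            result[i] = top_sense
    return result

doc_ids = [0, 0, 0, 1, 1, 2]
labels = ["living", "living", None, "factory", "living", None]
relabeled = apply_one_sense_per_discourse(doc_ids, labels)
```

In this sketch the constraint both fills in unlabeled examples and can overrule a minority label inside a document, while tied documents are left untouched.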

After steps 3b and 3c

Step 3d

• Repeat step 3 iteratively

• Details – grow window size for collocations, and randomly perturb the class inclusion threshold

Step 4

• Stop once the algorithm converges to a stable residual set of examples that remain unlabeled
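
Putting steps 2 through 4 together, the loop can be sketched as below on toy data. This is an illustration under stated assumptions, not the presented implementation: the sense names, the smoothing constant `alpha`, the fixed `threshold`, and the `max_iters` cap are hypothetical, and the optional one-sense-per-discourse step, window growth, and threshold perturbation are omitted.

```python
import math
from collections import Counter, defaultdict

def train(examples, labels, alpha=0.1):
    """Step 3a: build a decision list from the currently labeled examples."""
    counts = defaultdict(Counter)
    for words, sense in zip(examples, labels):
        if sense is not None:
            for w in set(words):
                counts[w][sense] += 1
    rules = []
    for w, c in counts.items():
        llr = math.log((c["living"] + alpha) / (c["factory"] + alpha))
        rules.append((abs(llr), w, "living" if llr > 0 else "factory"))
    return sorted(rules, key=lambda r: (-r[0], r[1]))

def classify(rules, words, threshold):
    """Step 3b: label with the first matching rule scoring at least `threshold`."""
    present = set(words)
    for score, w, sense in rules:
        if score < threshold:
            break  # rules are sorted, so no later rule can qualify
        if w in present:
            return sense
    return None

def bootstrap(examples, seed_labels, threshold=1.0, max_iters=20):
    """Steps 3d and 4: retrain and relabel until the labeling stops changing."""
    labels = list(seed_labels)
    rules = []
    for _ in range(max_iters):
        rules = train(examples, labels)
        new_labels = [classify(rules, words, threshold) for words in examples]
        if new_labels == labels:
            break  # converged: the unlabeled residual is stable
        labels = new_labels
    return labels, rules

examples = [
    ["life", "animal", "species"],          # seed: living
    ["manufacturing", "equipment", "car"],  # seed: factory
    ["animal", "cell"],
    ["equipment", "worker"],
    ["cell", "leaf"],
    ["worker", "union"],
    ["the", "of"],
]
seed_labels = ["living", "factory", None, None, None, None, None]
final_labels, final_rules = bootstrap(examples, seed_labels)
```

On this toy data the labels spread in two rounds: "animal" and "equipment" label the third and fourth examples first, then the newly learned "cell" and "worker" label the fifth and sixth. The last example shares no collocation with any labeled one and stays in the residual.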

Sample final state

Final decision list

Results