
2004/05 ANLE 1

LEXICON AND LEXICAL SEMANTICS: WORDNET

2004/05 ANLE 2

Beyond part-of-speech tagging: lexical information and NL applications

NLE applications often need to know the MEANING of words at least, or (e.g., in spoken dialogue systems) of whole utterances
Many word-strings express apparently unrelated senses / meanings, even after their POS has been determined

Well-known examples: BANK, SCORE, RIGHT, SET, STOCK
Homonymy may affect the results of applications such as IR and machine translation

The opposite case, different words with the same meaning (SYNONYMY), is also important

E.g., for IR systems (synonym expansion)

HOMOGRAPHY may affect speech synthesis

2004/05 ANLE 3

What's in a lexicon

A lexicon is a repository of lexical knowledge
We have already seen an example of the simplest form of lexicon: a list of words, each specifying its POS tag (SYNTACTIC information only)
But even for English, let alone languages with a richer morphology, it makes sense to separate WORD FORMS from LEXICAL ENTRIES or LEXEMEs:

LEXEME: BANK    POS: N

WORD FORM: BANKS    LEXEME: BANK    SYN: [NUM: PLUR]

And lexical knowledge also includes information about the MEANING of words
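The word-form / lexeme split can be made concrete with a minimal Python sketch; the class and field names below are illustrative, not part of any existing lexicon format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Lexeme:
    name: str                    # e.g. "BANK"
    pos: str                     # syntactic category, e.g. "N"

@dataclass
class WordForm:
    form: str                    # the surface string, e.g. "banks"
    lexeme: Lexeme               # the lexical entry it realises
    num: Optional[str] = None    # morphosyntactic features, e.g. "PLUR"

bank = Lexeme(name="BANK", pos="N")
banks = WordForm(form="banks", lexeme=bank, num="PLUR")
print(banks.lexeme.pos)          # N
```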

2004/05 ANLE 4

Meaning …

• Characterizing the meaning of words is not easy
• Most of the methods considered in these lectures characterize the meaning of a word by stating its relations with other words
• This method, however, doesn't say much about what the word ACTUALLY means (e.g., what you can do with a car)

2004/05 ANLE 5

Lexical resources for CL: MACHINE-READABLE DICTIONARIES

A traditional DICTIONARY is a database containing information about

the PRONUNCIATION of a word
its possible PARTS OF SPEECH
its possible SENSES (or MEANINGS)

In recent years, most dictionaries have also appeared in Machine-Readable form (MRD)

Oxford English Dictionary
Collins
Longman Dictionary of Contemporary English (LDOCE)

2004/05 ANLE 6

An example LEXICAL ENTRY from a machine-readable dictionary: STOCK, from the LDOCE

0100 a supply (of something) for use: a good stock of food
0200 goods for sale: Some of the stock is being taken without being paid for
0300 the thick part of a tree trunk
0400 (a) a piece of wood used as a support or handle, as for a gun or tool (b) the piece which goes across the top of an ANCHOR^1 (1) from side to side
0500 (a) a plant from which CUTTINGs are grown (b) a stem onto which another plant is GRAFTed
0600 a group of animals used for breeding
0700 farm animals usu. cattle; LIVESTOCK
0800 a family line, esp. of the stated character
0900 money lent to a government at a fixed rate of interest
1000 the money (CAPITAL) owned by a company, divided into SHAREs
1100 a type of garden flower with a sweet smell
1200 a liquid made from the juices of meat, bones, etc., used in cooking
…

2004/05 ANLE 7

Meaning in a dictionary: Homonymy

Word-strings like STOCK are used to express apparently unrelated senses / meanings, even in contexts in which their part-of-speech has been determined

Other well-known examples: BANK, RIGHT, SET, SCALE

An example of the problems homonymy may cause for IR systems:

Search for 'West Bank' with Google

2004/05 ANLE 8

Homonymy and machine translation

2004/05 ANLE 9

Pronunciation: homography, homophony

HOMOGRAPHS: BASS
The expert angler from Dora, Mo. was fly-casting for BASS rather than the traditional trout.
The curtain rises to the sound of angry dogs baying and ominous BASS chords sounding.

Problems caused by homography in text-to-speech systems (e.g., Rhetorical, AT&T text-to-speech)

Many spelling errors are caused by HOMOPHONES – distinct lexemes with a single pronunciation

its vs. it's
weather vs. whether
Examples from the sport page of the Rabbit

2004/05 ANLE 11

Meaning in MRDs, 2: SYNONYMY

Two words are SYNONYMS if they have the same meaning, at least in some contexts
E.g., PRICE and FARE; CHEAP and INEXPENSIVE; LAPTOP and NOTEBOOK; HOME and HOUSE

I’m looking for a CHEAP FLIGHT / INEXPENSIVE FLIGHT

From Roget's thesaurus: OBLITERATION, erasure, cancellation, deletion

But few words are truly synonymous in ALL contexts:
I wanna go HOME / ?? I wanna go HOUSE
The flight was CANCELLED / ?? OBLITERATED / ??? DELETED

Knowing about synonyms may help in IR:
NOTEBOOK (get LAPTOPs as well)
CHEAP PRICE (get INEXPENSIVE FARE)

2004/05 ANLE 12

Problems and limitations of MRDs

Identifying distinct senses is always difficult
Sense distinctions are often subjective

Definitions often circular

Very limited characterization of the meaning of words

2004/05 ANLE 13

Homonymy vs polysemy

(The LDOCE senses of STOCK are repeated here; see the entry above.)

2004/05 ANLE 14

POLYSEMY vs HOMONYMY

In cases like BANK, it is fairly easy to identify two distinct senses (the etymology is also different). But in other cases the distinctions are more questionable

E.g., senses 0100 and 0200 of STOCK are clearly related, as are 0600 and 0700, or 0900 and 1000

In some cases, syntactic tests may help. E.g., KEEP (Hirst, 1987):

Ross KEPT staring at Nadia's decolletage
Nadia KEPT calm and made a cutting remark
Ross wrote of his embarrassment in the diary that he KEPT.

POLYSEMOUS WORDS: words whose meanings are related to each other
Cf. a human's foot vs. a mountain's foot

In general, the distinction between HOMONYMY and POLYSEMY is not always easy to draw

2004/05 ANLE 15

Other aspects of lexical meaning not captured by MRDs

Other semantic relations:
HYPONYMY
ANTONYMY

A lot of other information typically considered part of ENCYCLOPEDIAs:

Trees grow bark and twigs
Adult trees are much taller than human beings

2004/05 ANLE 16

Hyponymy and Hypernymy

HYPONYMY is the relation between a subclass and a superclass:

CAR and VEHICLE
DOG and ANIMAL
BUNGALOW and HOUSE

Generally speaking, a hyponymy relation holds between X and Y whenever it is possible to substitute Y for X:

That is an X -> That is a Y
E.g., That is a CAR -> That is a VEHICLE.

HYPERNYMY is the opposite relation
Knowledge about TAXONOMIES is useful for classifying web pages:

e.g., in the Semantic Web
or automatically (e.g., Udo Kruschwitz's system)

This information is not generally contained in MRDs

2004/05 ANLE 18

The organization of the lexicon

(Diagram: WORD-FORMS → LEXEMES → SENSES)

WORD-FORMS: "eat", "eats", "ate", "eaten"
LEXEME: EAT-LEX-1
SENSES: eat0600, eat0700

2004/05 ANLE 19

The organization of the lexicon

(Diagram: WORD-STRINGS → LEXEMES → SENSES)

WORD-STRING: "stock"
LEXEMES: STOCK-LEX-1, STOCK-LEX-2, STOCK-LEX-3
SENSES: STOCK-LEX-1 → stock0100, stock0200; STOCK-LEX-2 → stock0600, stock0700; STOCK-LEX-3 → stock0900, stock1000

2004/05 ANLE 20

Synonymy

(Diagram: WORD-STRINGS → LEXEMES → SENSES)

WORD-STRINGS: "cheap", "inexpensive"
LEXEMES: CHEAP-LEX-1, CHEAP-LEX-2, INEXP-LEX-3
SENSES: cheap0100, …, cheapXXXX, inexp0900, …, inexpYYYY

2004/05 ANLE 21

A more advanced lexical resource: WordNet

A lexical database created at Princeton
Freely available for research from the Princeton site: http://www.cogsci.princeton.edu/~wn/

Contains information about a variety of SEMANTIC RELATIONS
Three sub-databases (supported by psychological research as early as Fillenbaum and Jones, 1965):

NOUNS
VERBS
ADJECTIVES and ADVERBS

Each database organized around SYNSETS

2004/05 ANLE 22

The noun database

About 90,000 forms, 116,000 senses
Relations (see the query sketch after the list):

hypernym breakfast -> meal

hyponym meal -> lunch

has-member faculty -> professor

member-of copilot -> crew

has-Part table -> leg

part-of course -> meal

antonym leader -> follower
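These relations can be queried directly from a machine-readable WordNet. Below is a minimal sketch using NLTK's WordNet interface; it assumes nltk is installed with the WordNet data downloaded, and the exact synsets returned depend on the WordNet version.

```python
from nltk.corpus import wordnet as wn   # pip install nltk; then nltk.download('wordnet')

breakfast = wn.synsets('breakfast', pos=wn.NOUN)[0]   # first (most frequent) noun sense
print(breakfast.hypernyms())                          # expected to include a 'meal' synset
print(breakfast.hypernyms()[0].hyponyms())            # sister terms, e.g. lunch, dinner, ...

# part/whole relations (meronymy / holonymy)
for s in wn.synsets('table', pos=wn.NOUN):
    if s.part_meronyms():
        print(s.name(), '->', [m.name() for m in s.part_meronyms()])  # e.g. a 'leg' synset

# antonymy is recorded on lemmas rather than synsets
for lemma in wn.lemmas('leader'):
    if lemma.antonyms():
        print(lemma.name(), '<->', [a.name() for a in lemma.antonyms()])
```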

2004/05 ANLE 23

Synsets

Senses (or `lexicalized concepts’) are represented in WordNet by the set of words that can be used in AT LEAST ONE CONTEXT to express that sense / lexicalized concept: the SYNSET

E.g.,

{chump, fish, fool, gull, mark, patsy, fall guy, sucker, shlemiel, soft touch, mug}

(gloss: person who is gullible and easy to take advantage of)

2004/05 ANLE 24

Hypernyms

2 senses of robin

Sense 1
robin, redbreast, robin redbreast, Old World robin, Erithacus rubecola -- (small Old World songbird with a reddish breast)
  => thrush -- (songbirds characteristically having brownish upper plumage with a spotted breast)
    => oscine, oscine bird -- (passerine bird having specialized vocal apparatus)
      => passerine, passeriform bird -- (perching birds mostly small and living near the ground with feet having 4 toes arranged to allow for gripping the perch; most are songbirds; hatchlings are helpless)
        => bird -- (warm-blooded egg-laying vertebrates characterized by feathers and forelimbs modified as wings)
          => vertebrate, craniate -- (animals having a bony or cartilaginous skeleton with a segmented spinal column and a large brain enclosed in a skull or cranium)
            => chordate -- (any animal of the phylum Chordata having a notochord or spinal column)
              => animal, animate being, beast, brute, creature, fauna -- (a living organism characterized by voluntary movement)
                => organism, being -- (a living thing that has (or can develop) the ability to act or function independently)
                  => living thing, animate thing -- (a living (or once living) entity)
                    => object, physical object --
                      => entity, physical thing --

2004/05 ANLE 25

Meronymy

wn beak -holon

Holonyms of noun beak

1 of 3 senses of beak

Sense 2

beak, bill, neb, nib

PART OF: bird

2004/05 ANLE 26

The verb database

About 10,000 forms, 20,000 senses
Relations between verb meanings (see the sketch after the list):

Hypernym fly-> travel

Troponym Walk -> stroll

Entails Snore -> sleep

Antonym Increase -> decrease
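The verb relations can be inspected the same way; a small hedged sketch with NLTK follows (output depends on the WordNet version, and NLTK stores a verb's troponyms as its hyponyms).

```python
from nltk.corpus import wordnet as wn

snore = wn.synsets('snore', pos=wn.VERB)[0]
print(snore.entailments())       # expected to include a 'sleep' synset

walk = wn.synsets('walk', pos=wn.VERB)[0]
print(walk.hyponyms()[:5])       # troponyms of walk (stroll, march, ...), stored as hyponyms
```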

2004/05 ANLE 27

Relations between verbal meanings

V1 ENTAILS V2 when "Someone V1" logically entails "Someone V2"
e.g., snore entails sleep

TROPONYMY: V1 is a troponym of V2 when to do V1 is to do V2 in some manner
e.g., limp is a troponym of walk

2004/05 ANLE 28

The adjective and adverb database

About 20,000 adjective forms, 30,000 senses
4,000 adverbs, 5,600 senses
Relations:

Antonym (adjective) Heavy <-> light

Antonym (adverb) Quickly <-> slowly

2004/05 ANLE 29

How to use

Online: http://cogsci.princeton.edu/cgi-bin/webwn
Command line:

Get synonyms: wn -synsn bank

Get hypernyms: wn -hypen robin

Get antonyms (also for adjectives and verbs): wn -antsa right

2004/05 ANLE 31

Other machine-readable lexical resources

Machine-readable dictionaries: LDOCE
Roget's Thesaurus
The biggest encyclopedia: CYC
Italian: http://multiwordnet.itc.it/ (IRST)

2004/05 ANLE 32

Readings

Jurafsky and Martin, chapter 16
WordNet online manuals
C. Fellbaum (ed.), WordNet: An Electronic Lexical Database, The MIT Press

2004/05 ANLE 33

WORD SENSE DISAMBIGUATION

2004/05 ANLE 34

Identifying the sense of a word in its context

In most NLE, this is viewed as a TAGGING task, in many respects similar to POS tagging (this is only a simplification!)

But there is less agreement on what the senses are, so the UPPER BOUND is lower
Main methods we'll discuss:

Frequency-based disambiguation
Selectional restrictions
Dictionary-based methods
Statistical methods: Naïve Bayes

2004/05 ANLE 35

Psychological research on word-sense disambiguation

Lexical access: the COHORT model (Marslen-Wilson and Welsh, 1978)

All lexical entries are accessed in parallel

Frequency effects: common words such as DOOR are responded to more quickly than uncommon words such as CASK (Forster and Chambers, 1973)
Context effects: words are identified more quickly in context than out of context (Meyer and Schvaneveldt, 1971)

bread and butter

2004/05 ANLE 36

Corpora used for word sense disambiguation work

Sense-annotated (difficult and expensive to build):
SemCor (200,000)
DSO (192,000 semantically annotated occurrences of 121 nouns and 70 verbs)
Senseval training data (8,699 texts for 73 words)
Tal-treebank (80,000)

Non-annotated (available in large quantities):
Brown, LOB, Reuters

2004/05 ANLE 37

SEMCOR

<contextfile concordance="brown">
<context filename="br-h15" paras="yes">
…
<wf cmd="ignore" pos="IN">in</wf>
<wf cmd="done" pos="NN" lemma="fig" wnsn="1" lexsn="1:10:00::">fig.</wf>
<wf cmd="done" pos="NN" lemma="6" wnsn="1" lexsn="1:23:00::">6</wf>
<punc>)</punc>
<wf cmd="done" pos="VBP" ot="notag">are</wf>
<wf cmd="done" pos="VB" lemma="slip" wnsn="3" lexsn="2:38:00::">slipped</wf>
<wf cmd="ignore" pos="IN">into</wf>
<wf cmd="done" pos="NN" lemma="place" wnsn="9" lexsn="1:15:05::">place</wf>
<wf cmd="ignore" pos="IN">across</wf>
<wf cmd="ignore" pos="DT">the</wf>
<wf cmd="done" pos="NN" lemma="roof" wnsn="1" lexsn="1:06:00::">roof</wf>
<wf cmd="done" pos="NN" lemma="beam" wnsn="2" lexsn="1:06:00::">beams</wf>
<punc>,</punc>

2004/05 ANLE 38

Dictionary-based approaches

Lesk (1986):
1. Retrieve from the MRD all sense definitions of the word to be disambiguated
2. Compare them with the sense definitions of the words in the context
3. Choose the sense with the most overlap

Example (a toy implementation follows):

PINE
1 kinds of evergreen tree with needle-shaped leaves
2 waste away through sorrow or illness

CONE
1 solid body which narrows to a point
2 something of this shape whether solid or hollow
3 fruit of certain evergreen trees

Disambiguate: PINE CONE
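A toy version of the overlap count on the glosses above, as a sketch only: no stop-word removal or stemming, so it is a simplification of Lesk's actual procedure.

```python
import re

def simplified_lesk(word_defs, context_defs):
    """Pick the sense of the target word whose gloss shares the most
    word types with some gloss of the context word."""
    def tokens(text):
        return set(re.findall(r'[a-z]+', text.lower()))
    best, best_overlap = None, -1
    for sense, gloss in word_defs.items():
        overlap = max(len(tokens(gloss) & tokens(g)) for g in context_defs.values())
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best, best_overlap

pine = {'pine_1': 'kinds of evergreen tree with needle-shaped leaves',
        'pine_2': 'waste away through sorrow or illness'}
cone = {'cone_1': 'solid body which narrows to a point',
        'cone_2': 'something of this shape whether solid or hollow',
        'cone_3': 'fruit of certain evergreen trees'}

# ('pine_1', 2): best overlap is with the 'fruit of certain evergreen trees' gloss
print(simplified_lesk(pine, cone))
```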

2004/05 ANLE 39

Frequency-based word-sense disambiguation

If you have a corpus in which each word is annotated with its sense, you can collect unigram statistics (count the number of times each sense occurs in the corpus):

P(SENSE)
P(SENSE | WORD)

E.g., if you have 5,845 uses of the word bridge:
5,641 cases in which it is tagged with the sense STRUCTURE
194 instances with the sense DENTAL-DEVICE

Frequency-based WSD gets about 70% correct
To improve upon these results, we need context
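A minimal most-frequent-sense baseline on the bridge counts from this slide (note that the two listed senses do not account for all 5,845 uses; the remainder presumably belong to other senses).

```python
total_uses = 5845                                          # annotated uses of "bridge" (from the slide)
sense_counts = {'STRUCTURE': 5641, 'DENTAL-DEVICE': 194}   # counts for the two senses listed above

mfs = max(sense_counts, key=sense_counts.get)              # the most frequent sense
print(mfs)                                                 # STRUCTURE
print(round(sense_counts[mfs] / total_uses, 3))            # 0.965: accuracy of always choosing it
```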

2004/05 ANLE 40

Selectional restrictions

One type of contextual information is information about the type of arguments a verb takes, its SELECTIONAL RESTRICTIONS:

AGENT EAT FOOD-STUFF
AGENT DRIVE VEHICLE

Example:
Which airlines serve DENVER?
Which airlines serve BREAKFAST?

Limitations:
In his two championship trials, Mr. Kulkarni ate glass on an empty stomach, accompanied only by water and tea.
But it fell apart in 1931, perhaps because people realized that you can't EAT gold for lunch if you're hungry.

Resnik (1998): about 44% correct with these methods

2004/05 ANLE 41

Supervised approaches to WSD: Naïve Bayes

A Naïve Bayes Classifier chooses the most probable sense for a word given the context:

$s' = \arg\max_{s_k} P(s_k \mid C)$

As usual, this can be expressed as:

$s' = \arg\max_{s_k} \frac{P(C \mid s_k)\, P(s_k)}{P(C)}$

The BAYES ASSUMPTION: all the features are independent, so

$P(C \mid s_k) = \prod_{j=1}^{n} P(v_j \mid s_k)$

2004/05 ANLE 42

An example of use of Naïve Bayes classifiers: Gale et al (1992)

Gale et al used this method to disambiguate word senses, using an ALIGNED CORPUS (the Hansard) to obtain the sense labels:

English   French        Sense        Number of examples
duty      droit         tax          1114
duty      devoir        obligation   691
drug      medicament    medical      2292
drug      drogue        illicit      855
land      terre         property     1022
land      pays          country      386

2004/05 ANLE 43

Gale et al: words as contextual clues

Gale et al view a 'context' as a set of words
Good clues for the different senses of DRUG:

Medication sense: prices, prescription, patent, increase, consumer, pharmaceutical
Illegal substance sense: abuse, paraphernalia, illicit, alcohol, cocaine, traffickers

To determine which interpretation is more likely, extract words (e.g., ABUSE) from the context, and use P(abuse | medicament) and P(abuse | drogue)
To estimate these probabilities, use MLE:

P(abuse | medicament) = C(abuse, medicament) / C(medicament)
P(medicament) = C(medicament) / C(drug)

2004/05 ANLE 44

The algorithm

TRAINING:

for all senses sk of w do
  for all words vj in the vocabulary do
    P(vj | sk) = C(vj, sk) / C(sk)
  end
end

for all senses sk of w do
  P(sk) = C(sk) / C(w)
end

DISAMBIGUATION:

for all senses sk of w do
  score(sk) = log P(sk)
  for all words vj in the context window c do
    score(sk) = score(sk) + log P(vj | sk)
  end
end

choose s' = argmax_sk score(sk)
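The same training and disambiguation loops as a runnable Python sketch. It adds add-one smoothing (not in the pseudocode above) so unseen context words do not zero out a sense, and the mini-corpus at the bottom is invented purely for illustration.

```python
import math
from collections import defaultdict

def train_nb(labelled_instances):
    """labelled_instances: list of (context_words, sense) pairs for one ambiguous word."""
    sense_count = defaultdict(int)
    word_count = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for context, sense in labelled_instances:
        sense_count[sense] += 1
        for w in context:
            word_count[sense][w] += 1
            vocab.add(w)
    total = sum(sense_count.values())
    priors = {s: c / total for s, c in sense_count.items()}
    likelihoods = {}
    for s in sense_count:
        n_s = sum(word_count[s].values())        # context-word tokens seen with sense s
        # add-one smoothing over the vocabulary
        likelihoods[s] = {w: (word_count[s][w] + 1) / (n_s + len(vocab)) for w in vocab}
    return priors, likelihoods, vocab

def disambiguate(context, priors, likelihoods, vocab):
    def score(sense):
        sc = math.log(priors[sense])
        for w in context:
            if w in vocab:                       # ignore words never seen in training
                sc += math.log(likelihoods[sense][w])
        return sc
    return max(priors, key=score)

# toy usage with an invented mini-corpus for the word "drug"
data = [(['abuse', 'cocaine', 'traffickers'], 'illicit'),
        (['prices', 'prescription', 'consumer'], 'medical'),
        (['illicit', 'alcohol', 'abuse'], 'illicit'),
        (['patent', 'pharmaceutical', 'prices'], 'medical')]
priors, likelihoods, vocab = train_nb(data)
print(disambiguate(['abuse', 'alcohol'], priors, likelihoods, vocab))   # expected: 'illicit'
```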

2004/05 ANLE 45

Results

Gale et al (1992): a disambiguation system using this algorithm was correct for about 90% of the occurrences of six ambiguous nouns in the Hansard corpus:

duty, drug, land, language, position, sentence

Good clues for drug:
medication sense: prices, prescription, patent, increase
illegal substance sense: abuse, paraphernalia, illicit, alcohol, cocaine, traffickers

2004/05 ANLE 46

Other methods for WSD

Supervised:
Brown et al, 1991: using mutual information to combine senses into groups
Yarowsky (1992): using a thesaurus and a topic-classified corpus

Unsupervised: sense DISCRIMINATION
Schuetze 1996: using the EM algorithm

2004/05 ANLE 47

Thesaurus-based classification

Basic idea (e.g., Walker, 1987):
A thesaurus assigns to each word a number of SUBJECT CODES
Assumption: each subject code corresponds to a distinct sense
Disambiguate word W by assigning to it the subject code most commonly associated with the words in the context

Problems:

general topic categorization may not be very useful for a specific domain (e.g., MOUSE in computer manuals)

coverage problems: NAVRATILOVA

Yarowsky, 1992:
Assign a word to thesaurus category T if it occurs more often than chance in contexts of category T in the corpus

2004/05 ANLE 48

Evaluation

Baseline: is the system an improvement?
Unsupervised baselines: Random, Simple Lesk
Supervised baselines: Most Frequent sense, Lesk-plus-corpus

Upper bound: agreement between humans?

2004/05 ANLE 49

SENSEVAL

Goals:
Provide a common framework to compare WSD systems
Standardise the task (especially evaluation procedures)
Build and distribute new lexical resources (dictionaries and sense-tagged corpora)

Web site: http://www.senseval.org/

“There are now many computer programs for automatically determining the sense of a word in context (Word Sense Disambiguation or WSD).  The purpose of Senseval is to evaluate the strengths and weaknesses of such programs with respect to different words, different varieties of language, and different languages.” from: http://www.sle.sharp.co.uk/senseval2

2004/05 ANLE 50

SENSEVAL History

ACL-SIGLEX workshop (1997)
Yarowsky and Resnik paper

SENSEVAL-I (1998)
Lexical Sample for English, French, and Italian

SENSEVAL-II (Toulouse, 2001)
Lexical Sample and All Words
Organization: Kilgarriff (Brighton)

SENSEVAL-III (2003)

2004/05 ANLE 51

WSD at SENSEVAL-II

Choosing the right sense for a word among those of WordNet

Sense 1: horse, Equus caballus -- (solid-hoofed herbivorous quadruped domesticated since prehistoric times)
Sense 2: horse -- (a padded gymnastic apparatus on legs)
Sense 3: cavalry, horse cavalry, horse -- (troops trained to fight on horseback: "500 horse led the attack")
Sense 4: sawhorse, horse, sawbuck, buck -- (a framework for holding wood that is being sawed)
Sense 5: knight, horse -- (a chessman in the shape of a horse's head; can move two squares horizontally and one vertically (or vice versa))
Sense 6: heroin, diacetyl morphine, H, horse, junk, scag, shit, smack -- (a morphine derivative)

Corton has been involved in the design, manufacture and installation of horse stalls and horse-related equipment like external doors, shutters and accessories.

2004/05 ANLE 52

SENSEVAL-II Tasks

All Words (without training data): Czech, Dutch, English, Estonian
Lexical Sample (with training data): Basque, Chinese, Danish, English, Italian, Japanese, Korean, Spanish, Swedish

2004/05 ANLE 53

English SENSEVAL-II

Organization: Martha Palmer (UPenn)
Gold standard: 2 annotators and 1 supervisor (Fellbaum)
Interchange data format: XML
Sense repository: WordNet 1.7 (special Senseval release)
Competitors:

All Words: 11 systems; Lexical Sample: 16 systems

2004/05 ANLE 54

English All Words

Data: 3 texts for a total of 1,770 words
Average polysemy: 6.5
Example: (part of) Text 1

The art of change-ringing is peculiar to the English and, like most English peculiarities , unintelligible to the rest of the world . -- Dorothy L. Sayers , " The Nine Tailors " ASLACTON , England -- Of all scenes that evoke rural England , this is one of the loveliest : An ancient stone church stands amid the fields , the sound of bells cascading from its tower , calling the faithful to evensong . The parishioners of St. Michael and All Angels stop to chat at the church door , as members here always have . […]

2004/05 ANLE 55

English All Words Systems

Unsupervised (6):
UMED (relevance matrix over a Project Gutenberg corpus)
Illinois (lexical proximity)
Malaysia (MTD, Machine Tractable Dictionary)
Litkowsky (New Oxford Dictionary and contextual clues)
Sheffield (anaphora and the WordNet hierarchy)
IRST (WordNet Domains)

Supervised (5):
S. Sebastian (decision lists on SemCor)
UCLA (SemCor, semantic distance and density, AltaVista for frequency)
Sinequa (SemCor and semantic classes)
Antwerp (SemCor, memory-based learning)
Moldovan (SemCor plus an additional sense-tagged corpus, heuristics)

2004/05 ANLE 56

2004/05 ANLE 57

Lexical Sample

Data: 8,699 texts for 73 words
Average WN polysemy: 9.22
Training data: 8,166 (average 118 per word)
Baseline (commonest sense): 0.47 precision
Baseline (Lesk): 0.51 precision

2004/05 ANLE 58

Lexical Sample

Example: to leave

<instance id="leave.130">
<context>I 'd been seeing Johnnie almost a year now, but I still didn't want to <head>leave</head> him for five whole days.</context>
</instance>

<instance id="leave.157">
<context>And he saw them all as he walked up and down. At two that morning, he was still walking -- up and down Peony, up and down the veranda, up and down the silent, moonlit beach. Finally, in desperation, he opened the refrigerator, filched her hand lotion, and <head>left</head> a note.</context>
</instance>

2004/05 ANLE 59

English Lexical Sample Systems

Unsupervised (5): Sunderland, UNED, Illinois, Litkowsky, ITRI
Supervised (12): S. Sebastian, Sinequa, Manning, Pedersen, Korea, Yarowsky, Resnik, Pennsylvania, Barcelona, Moldovan, Alicante, IRST

2004/05 ANLE 60

(Results chart comparing approaches including Boosting, Decision Lists, and Domains.)

2004/05 ANLE 61

Boosting 1/2

Combine many simple and moderately accurate Weak Classifiers (WCs)
Train the WCs sequentially, each on the examples that were most difficult to classify for the preceding WCs
Examples of WCs:

preceding_word = "house"
domain = "sport"
...

2004/05 ANLE 62

Boosting 2/2

WCi is trained and tested on the whole corpus;

each pair {word, synset} is given an "importance weight" h depending on how difficult it was for WC1, …, WCi to classify;

WCi+1 is tuned to classify the worst pairs {word, synset} correctly, and is then tested on the whole corpus;

so h is updated at each step

At the end, all the WCs are combined into a single rule, the combined hypothesis; each WC is weighted according to its effectiveness in the tests (see the sketch below)
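A generic AdaBoost-style sketch of this idea, with binary labels and one-feature weak classifiers; the actual SENSEVAL systems were multi-class and used richer weak learners, so this only illustrates the re-weighting and the final weighted vote.

```python
import math

def train_boosting(examples, labels, features, rounds=10):
    """examples: list of dicts; labels: +1/-1; features: list of predicates over an example,
    e.g. lambda ex: ex.get('prev_word') == 'house' (illustrative weak classifiers)."""
    n = len(examples)
    w = [1.0 / n] * n                 # importance weights over training examples
    ensemble = []                     # (alpha, feature) pairs = the combined hypothesis
    for _ in range(rounds):
        # pick the weak classifier with the lowest weighted error
        best_feat, best_err = None, float('inf')
        for feat in features:
            err = sum(wi for wi, ex, y in zip(w, examples, labels)
                      if (1 if feat(ex) else -1) != y)
            if err < best_err:
                best_feat, best_err = feat, err
        eps = min(max(best_err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - eps) / eps)       # weight of this weak classifier
        ensemble.append((alpha, best_feat))
        # re-weight: examples the chosen classifier gets wrong become more important
        w = [wi * math.exp(-alpha * y * (1 if best_feat(ex) else -1))
             for wi, ex, y in zip(w, examples, labels)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def classify(ensemble, ex):
    vote = sum(alpha * (1 if feat(ex) else -1) for alpha, feat in ensemble)
    return 1 if vote >= 0 else -1
```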

2004/05 ANLE 63

Bootstrapping

Problem with supervised methods: you need an annotated corpus!
Partial solution (Hearst, Yarowsky): train on a small annotated set (the bootstrap), then use the trained system to annotate a larger corpus, which is then used to retrain

2004/05 ANLE 64

Readings

Jurafsky and Martin, chapters 17.1 and 17.2
For more info: Manning and Schuetze, chapter 7
Papers:

Gale, Church, and Yarowsky. 1992. A method for disambiguating word senses in large corpora. Computers and the Humanities, v. 26, 415-439.

2004/05 ANLE 65

Acknowledgments

Slides on SENSEVAL borrowed from Bernardo Magnini, IRST

2004/05 ANLE 66

LEXICAL ACQUISITION

2004/05 ANLE 67

The limits of hand-encoded lexical resources

Manual construction of lexical resources is very costly
Because language keeps changing, these resources have to be continuously updated
Some information (e.g., about frequencies) has to be computed automatically anyway

2004/05 ANLE 68

The coverage problem

Sampson (1989) tested the coverage of the Oxford ALD (~70,000 entries) on a 45,000-word subpart of the LOB corpus: about 3% of tokens were not listed in the dictionary
Examples:

Type of problem Example

Proper noun Caramello, Chateau-Chalon

Foreign word perestroika

Code R101

Non-standard English Havin’

Hyphen omitted bedclothes

Technical vocabulary normoglycaemia

2004/05 ANLE 69

Lexical acquisition

Most MRDs these days contain at least some information derived by computational means
Methods have been developed to:

Augment resources such as WordNet with additional information (e.g., in technical domains such as bioinformatics)
Develop such resources from scratch

These methods are also interesting because of the hope that they can address the problems lexicographers encounter when trying to specify word senses

2004/05 ANLE 70

Vector-based lexical semantics

A very old idea in NLE: the meaning of a word can be specified in terms of the values of certain 'features' ('DECOMPOSITIONAL SEMANTICS')

dog: ANIMATE = +, EAT = MEAT, SOCIAL = +
horse: ANIMATE = +, EAT = GRASS, SOCIAL = +
cat: ANIMATE = +, EAT = MEAT, SOCIAL = -

Similarity / relatedness: proximity in feature space

2004/05 ANLE 71

Vector-based lexical semantics

(Figure: DOG, CAT, and HORSE plotted as points in feature space.)

2004/05 ANLE 72

General characterization of vector-based semantics (from Charniak)

Vectors as models of concepts
The CLUSTERING approach to lexical semantics:
1. Define the properties one cares about, and give values to each property (generally, numerical)
2. Create a vector of length n for each item to be classified
3. Viewing the n-dimensional vector as a point in n-space, cluster points that are near one another

What changes between models:
1. The properties used in the vector
2. The distance metric used to decide if two points are 'close'
3. The algorithm used to cluster

2004/05 ANLE 73

Using words as features in a vector-based semantics

The old decompositional semantics approach requires:
i. specifying the features
ii. characterizing the value of these features for each lexeme

A simpler approach: use as features the WORDS that occur in the proximity of that word / lexical entry

Intuition: "You can tell a word's meaning from the company it keeps"

More specifically, you can use as 'values' of these features:
the FREQUENCIES with which these words occur near the word whose meaning we are defining
or perhaps the PROBABILITIES that these words occur next to each other

Alternative: use the DOCUMENTS in which these words occur (e.g., LSA)
Some psychological results support this view. Lund, Burgess, et al (1995, 1997): lexical associations learned this way correlate very well with priming experiments. Landauer et al: good correlation on a variety of topics, including human categorization and vocabulary tests.

2004/05 ANLE 74

Using neighboring words to specify the meaning of words

Take, e.g., the following corpus:
1. John ate a banana.
2. John ate an apple.
3. John drove a lorry.

We can extract the following co-occurrence matrix:

         john  ate  drove  banana  apple  lorry
john       0    2     1      1       1      1
ate        2    0     0      1       1      0
drove      1    0     0      0       0      1
banana     1    1     0      0       0      0
apple      1    1     0      0       0      0
lorry      1    0     1      0       0      0

2004/05 ANLE 75

Acquiring lexical vectors from a corpus (Schuetze, 1991; Burgess and Lund, 1997)

To construct a vector C(w) for each word w:
1. Scan the text
2. Whenever a word w is encountered, increment all cells of C(w) corresponding to the words v that occur in the vicinity of w, typically within a window of fixed size

Differences among methods (a counting sketch follows):
Size of the window
Weighted or not
Whether every word in the vocabulary counts as a dimension (including function words such as "the" or "and"), or whether only some specially chosen words are used (typically, the m most common content words in the corpus, or perhaps modifiers only); the words chosen as dimensions are often called CONTEXT WORDS
Whether dimensionality reduction methods are applied
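A minimal counting sketch that reproduces the toy co-occurrence matrix from the previous slide; here the "window" is the whole sentence and only the content words are used as dimensions, but a fixed-size window works the same way.

```python
from collections import defaultdict

corpus = ["John ate a banana", "John ate an apple", "John drove a lorry"]
context_words = {"john", "ate", "drove", "banana", "apple", "lorry"}   # dimensions (articles dropped)

counts = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    words = [w for w in sentence.lower().split() if w in context_words]
    for i, w in enumerate(words):          # every other context word in the sentence counts
        for j, v in enumerate(words):
            if i != j:
                counts[w][v] += 1

print(dict(counts["john"]))   # {'ate': 2, 'banana': 1, 'apple': 1, 'drove': 1, 'lorry': 1}
```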

2004/05 ANLE 76

Variant: using probabilities (e.g., Dagan et al, 1997)

E.g., for house

Context vector (using probabilities)0.001394 0.016212 0.003169 0.000734 0.001460 0.002901 0.004725 0.000598 0 0 0.008993 0.008322 0.000164 0.010771 0.012098 0.002799 0.002064 0.007697 0 0 0.001693 0.000624 0.001624 0.000458 0.002449 0.002732 0 0.008483 0.007929 0 0.001101 0.001806 0 0.005537 0.000726 0.011563 0.010487 0 0.001809 0.010601 0.000348 0.000759 0.000807 0.000302 0.002331 0.002715 0.020845 0.000860 0.000497 0.002317 0.003938 0.001505 0.035262 0.002090 0.004811 0.001248 0.000920 0.001164 0.003577 0.001337 0.000259 0.002470 0.001793 0.003582 0.005228 0.008356 0.005771 0.001810 0 0.001127 0.001225 0 0.008904 0.001544 0.003223 0

2004/05 ANLE 77

Variant: using modifiers to specify the meaning of words

…. The Soviet cosmonaut …. The American astronaut …. The red American car …. The old red truck … the spacewalking cosmonaut … the full Moon …

               cosmonaut  astronaut  moon  car  truck
Soviet             1          0        0    1     1
American           0          1        0    1     1
spacewalking       1          1        0    0     0
red                0          0        0    1     1
full               0          0        1    0     0
old                0          0        0    1     1

2004/05 ANLE 78

Another variant: word / document matrices

d1 d2 d3 d4 d5 d6

cosmonaut 1 0 1 0 0 0

astronaut 0 1 0 0 0 0

moon 1 1 0 0 0 0

car 1 0 0 1 1 0

truck 0 0 0 1 0 1

2004/05 ANLE 79

Measures of semantic similarity

Euclidean distance:

$d(\vec{x}, \vec{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$

Cosine:

$\cos(\vec{x}, \vec{y}) = \frac{\sum_{i=1}^{n} x_i \, y_i}{\sqrt{\sum_{i=1}^{n} x_i^2} \, \sqrt{\sum_{i=1}^{n} y_i^2}}$

Manhattan metric:

$d(\vec{x}, \vec{y}) = \sum_{i=1}^{n} |x_i - y_i|$
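The three measures in NumPy, applied to rows of the toy co-occurrence matrix from a few slides back; this is only a sketch, since real systems usually weight or normalize the counts first.

```python
import numpy as np

def euclidean(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

def cosine(x, y):
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

def manhattan(x, y):
    return np.sum(np.abs(x - y))

# rows for banana, apple and lorry from the earlier co-occurrence matrix
banana = np.array([1, 1, 0, 0, 0, 0])
apple  = np.array([1, 1, 0, 0, 0, 0])
lorry  = np.array([1, 0, 1, 0, 0, 0])

print(cosine(banana, apple))      # 1.0: identical contexts
print(cosine(banana, lorry))      # 0.5
print(euclidean(banana, lorry))   # about 1.41
print(manhattan(banana, lorry))   # 2
```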

2004/05 ANLE 80

Some psychological evidence for vector-space representations

Burgess and Lund (1996, 1997): the clusters found with HAL correlate well with those observed in semantic priming experiments
Landauer, Foltz, and Laham (1997): scores overlap with those of humans on standard vocabulary and topic tests; mimic human scores on category judgments; etc.
Evidence from 'prototype theory' (Rosch et al, 1976):

Posner and Keele, 1968:
subjects were presented with patterns of dots obtained by varying a single pattern (the 'prototype')
later, they recalled the prototypes better than samples they had actually seen

Rosch et al, 1976: 'basic level' categories (apple, orange, potato, carrot) have higher 'cue validity' than elements higher in the hierarchy (fruit, vegetable) or lower (red delicious, cox)

2004/05 ANLE 81

The HAL model (Burgess and Lund, 1995, 1997)

A 160-million-word corpus of articles extracted from all newsgroups containing English dialogue
Context words: the 70,000 most frequently occurring symbols within the corpus
Window size: 10 words to the left and to the right of the word
Measure of similarity: cosine

2004/05 ANLE 82

Latent Semantic Analysis (LSA) (Landauer et al, 1997)

Goal: extract relations of expected contextual usage from passages
Steps:
1. Build a word / document co-occurrence matrix
2. 'Weigh' each cell
3. Perform a DIMENSIONALITY REDUCTION

Argued to correlate well with humans on a number of tests (a small sketch of steps 1 and 3 follows)
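A small NumPy sketch of steps 1 and 3, using the word/document matrix from the earlier slide and skipping the weighting step. With a rank-2 reconstruction, cosmonaut and astronaut are expected to come out similar even though they share no document, because both co-occur with moon.

```python
import numpy as np

# word / document matrix from the earlier slide (rows: words, columns: d1..d6)
A = np.array([[1, 0, 1, 0, 0, 0],    # cosmonaut
              [0, 1, 0, 0, 0, 0],    # astronaut
              [1, 1, 0, 0, 0, 0],    # moon
              [1, 0, 0, 1, 1, 0],    # car
              [0, 0, 0, 1, 0, 1]])   # truck

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                                            # keep the 2 largest singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]      # reconstructed ("smoothed") matrix
print(np.round(A_k, 2))

W = U[:, :k] * s[:k]                             # word vectors in the reduced space
cos = lambda x, y: x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
print(cos(W[0], W[1]))                           # cosmonaut vs astronaut in latent space
```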

2004/05 ANLE 83

LSA: the method, 1

2004/05 ANLE 84

LSA: Singular Value Decomposition

2004/05 ANLE 85

LSA: Reconstructed matrix

2004/05 ANLE 86

Topic correlations in 'raw' and 'reconstructed' data

2004/05 ANLE 87

Clustering

Clustering algorithms partition a set of objects into CLUSTERS
Clustering is an UNSUPERVISED method
Two applications:

Exploratory Data Analysis
GENERALIZATION

2004/05 ANLE 88

Clustering

(Figure: points grouped into clusters.)

2004/05 ANLE 89

Clustering with Syntactic Information

Pereira and Tishby, 1992: two words are similar if they occur as objects of the same verbs

John ate POPCORN
John ate BANANAS

C(w) is the distribution of verbs for which w served as direct object

First approximation: just counts
In fact: probabilities

Similarity: RELATIVE ENTROPY (see the note below)
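Relative entropy (KL divergence), $D(p \| q) = \sum_i p_i \log (p_i / q_i)$, compares two such verb distributions; a hedged sketch follows with invented numbers (Pereira and Tishby actually use it to cluster nouns by comparing their distributions to cluster centroids).

```python
import numpy as np

def relative_entropy(p, q):
    """KL divergence D(p||q); assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# toy verb distributions for two direct objects (illustrative numbers only)
popcorn = [0.7, 0.2, 0.1]    # P(eat | popcorn), P(buy | popcorn), P(drop | popcorn)
bananas = [0.6, 0.3, 0.1]
print(relative_entropy(popcorn, bananas))
```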

2004/05 ANLE 90

SEXTANT (Grefenstette, 1992)

It was concluded that the carcinoembryonic antigens represent cellular constituents which are repressed during the course of differentiation of the normal digestive system epithelium and reappear in the corresponding malignant cells by a process of derepressive dedifferentiation

antigen carcinoembryonic-ADJ
antigen repress-DOBJ
antigen represent-SUBJ
constituent cellular-ADJ
constituent represent-DOBJ
course repress-IOBJ
……..

2004/05 ANLE 91

SEXTANT: Similarity measure

DOG:
dog pet-DOBJ
dog eat-SUBJ
dog shaggy-ADJ
dog brown-ADJ
dog leash-NN

CAT:
cat pet-DOBJ
cat pet-DOBJ
cat hairy-ADJ
cat leash-NN

Jaccard: Count(attributes shared by A and B) / Count(unique attributes possessed by A and B)

= |{leash-NN, pet-DOBJ}| / |{brown-ADJ, eat-SUBJ, hairy-ADJ, leash-NN, pet-DOBJ, shaggy-ADJ}| = 2 / 6
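The same Jaccard computation in a few lines of Python, reproducing the 2/6 score above.

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

dog = {'pet-DOBJ', 'eat-SUBJ', 'shaggy-ADJ', 'brown-ADJ', 'leash-NN'}
cat = {'pet-DOBJ', 'hairy-ADJ', 'leash-NN'}
print(jaccard(dog, cat))   # 2/6 = 0.333...
```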

2004/05 ANLE 92

Some caveats

Two senses of 'similarity':
Schuetze: two words are similar if one can replace the other
Brown et al: two words are similar if they occur in similar contexts

What notion of 'meaning' is learned here?
"One might consider LSA's maximal knowledge of the world to be analogous to a well-read nun's knowledge of sex, a level of knowledge often deemed a sufficient basis for advising the young" (Landauer et al, 1997)

Can one do semantics with these representations?
Our own experience: using HAL-style vectors for resolving bridging references
Very limited success
Applying dimensionality reduction didn't seem to help

2004/05 ANLE 93

Applications of these techniques: Information Retrieval

       cosmonaut  astronaut  moon  car  truck
d1         1          0        1    1     0
d2         0          1        1    0     0
d3         1          0        0    0     0
d4         0          0        0    1     1
d5         0          0        0    1     0
d6         0          0        0    0     1

2004/05 ANLE 94

Readings

Jurafsky and Martin, chapter 17.3
Also useful:

Manning and Schuetze, chapter 8
Charniak, chapters 9-10

Some papers:
HAL: see the Higher Dimensional Space page
LSA: various papers on the Colorado site

Good reference: Landauer, Foltz, and Laham. (1997). Introduction to Latent Semantic Analysis. Discourse Processes.