Simulation Models in the Cognitive Sciences (Modelli simulativi nelle Scienze Cognitive), 2004/05

The lexicon: linguistic models, WordNet, lexical acquisition

Massimo Poesio

PART I: LEXICON AND LEXICAL SEMANTICS; WORDNET

What's in a lexicon

• A lexicon is a repository of lexical knowledge.
• The simplest form of lexicon is a list of words.
• But even for English, let alone languages with a more complex morphology such as Italian, it makes sense to separate WORD FORMS from LEXICAL ENTRIES, or LEXEMES:

LEXEME: BANK
  POS: N

WORD: BANKS
  LEXEME: BANK
  SYN:
    NUM: PLUR

• Lexical knowledge also includes information about the MEANING of words.
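The split between word forms and lexemes can be sketched as a small data structure. This is only an illustration: the class and field names, and the singular form "bank", are assumptions and do not come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexeme:
    lemma: str  # citation form, e.g. BANK
    pos: str    # part of speech, e.g. N


@dataclass(frozen=True)
class WordForm:
    form: str       # the string as it appears in text
    lexeme: Lexeme  # the lexical entry it realizes
    syn: tuple      # syntactic features as (name, value) pairs


BANK = Lexeme("BANK", "N")
BANKS_FORM = WordForm("banks", BANK, (("NUM", "PLUR"),))
# Assumption: the singular form is not shown on the slide.
BANK_FORM = WordForm("bank", BANK, (("NUM", "SING"),))


def forms_of(lexeme, word_forms):
    """Return every word form that points back to the given lexeme."""
    return sorted(w.form for w in word_forms if w.lexeme == lexeme)
```

Storing the part of speech once on the lexeme, and only the inflectional features on each form, is what makes the split pay off for morphologically rich languages.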

Meaning

• Characterizing the meaning of words is not easy.
• Most of the methods considered in these lectures characterize the meaning of a word by stating its relations with other words.
• This method, however, doesn't say much about what the word ACTUALLY means (e.g., what you can do with a car).

An example lexical entry: VICINO (from it.wiktionary.org)

vicino, noun, m (vicina f, vicini pl m, vicine pl f)
1. Colui che abita accanto [the person who lives next door]. ("I miei vicini vengono da Frosinone")

vicino, adjective, m (vicina f, vicini pl m, vicine pl f) [near]
("La più vicina stella a neutroni è RX J185635-3754")

vicino, adverb (invariable) [closely, from nearby]
("Itunes visto da vicino")

Lexical resources for computers: MACHINE READABLE DICTIONARIES

A traditional DICTIONARY is a database containing information about:
• the PRONUNCIATION of a certain word
• its possible PARTS OF SPEECH
• its possible SENSES (or MEANINGS)

In recent years, most dictionaries have appeared in machine-readable form (MRD):
• English: Oxford English Dictionary, Collins, Longman Dictionary of Contemporary English (LDOCE)
• Italian: Garzanti, Zanichelli, Paravia, it.wiktionary.org

An example LEXICAL ENTRY from a machine-readable dictionary: STOCK, from the LDOCE

0100 a supply (of something) for use: a good stock of food
0200 goods for sale: Some of the stock is being taken without being paid for
0300 the thick part of a tree trunk
0400 (a) a piece of wood used as a support or handle, as for a gun or tool (b) the piece which goes across the top of an ANCHOR^1 (1) from side to side
0500 (a) a plant from which CUTTINGs are grown (b) a stem onto which another plant is GRAFTed
0600 a group of animals used for breeding
0700 farm animals usu. cattle; LIVESTOCK
0800 a family line, esp. of the stated character
0900 money lent to a government at a fixed rate of interest
1000 the money (CAPITAL) owned by a company, divided into SHAREs
1100 a type of garden flower with a sweet smell
1200 a liquid made from the juices of meat, bones, etc., used in cooking
…..

Homonymy

Word-strings like STOCK are used to express apparently unrelated senses / meanings, even in contexts in which their part of speech has been determined.

• Other well-known examples: BANK, LIME, RIGHT, SET, SCALE
• Italian: CALCIO, OBBIETTIVO

An example of the problems homonymy may cause for IR systems: search for 'West Bank' with Google.

CALCIO, from "Il grande dizionario Garzanti"

calcio1 [càl-cio] s.m.
1. colpo dato con il piede o con la zampa; pedata; dare, assestare, ricevere un _ [a kick]
2. (sport) gioco che si svolge tra due squadre di undici giocatori ciascuna … [the game of football]
3. nel football, colpo dato con il piede al pallone: - di punizione, … - di rigore …. - d'angolo …. - piazzato [a kick of the ball: free kick, penalty kick, corner kick, place kick]

calcio2 parte inferiore della cassa di un fucile … derivato del lat. calx calcis …. [the butt of a rifle]

calcio3 elemento chimico il cui simbolo è Ca; metallo alcalinoterroso …… [the chemical element calcium]

Homonymy in an MRD for Italian (ItalWordNet)

obbiettivo, noun

[1] - scopo di un'operazione militare [objective of a military operation] (obbiettivo [1], obiettivo [1])

[2] - bersaglio nel tiro di artiglieria [target of artillery fire] (obbiettivo [2], obiettivo [2])

[4] - sistema di lenti per proiettare l'immagine reale di un oggetto [lens system that projects the real image of an object] (obbiettivo [4], obiettivo [4])

Homonymy and machine translation

Meaning in MRDs, 2: SYNONYMY

Two words are SYNONYMS if they have the same meaning at least in some contexts.
• E.g., PRICE and FARE; CHEAP and INEXPENSIVE; LAPTOP and NOTEBOOK; HOME and HOUSE
• I'm looking for a CHEAP FLIGHT / INEXPENSIVE FLIGHT
• From Roget's thesaurus: OBLITERATION, erasure, cancellation, deletion

But few words are truly synonymous in ALL contexts:
• I wanna go HOME / ?? I wanna go HOUSE
• The flight was CANCELLED / ?? OBLITERATED / ??? DELETED

Knowing about synonyms may help in IR:
• NOTEBOOK (get LAPTOPs as well)
• CHEAP PRICE (get INEXPENSIVE FARE)
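The IR use of synonyms can be sketched as simple query expansion. The synonym pairs are the ones listed on the slide. Treating them as symmetric and valid in every context is a simplifying assumption (the slide itself notes that few words are synonymous in all contexts), and the function names are hypothetical.

```python
# Synonym pairs from the slide, stored in both directions.
SYNONYMS = {
    "notebook": {"laptop"},
    "laptop": {"notebook"},
    "cheap": {"inexpensive"},
    "inexpensive": {"cheap"},
    "price": {"fare"},
    "fare": {"price"},
}


def expand_query(query):
    """For each query term, return the set holding the term and its synonyms."""
    return [{term} | SYNONYMS.get(term, set()) for term in query.lower().split()]


def matches(document, expanded_query):
    """A document matches if it contains at least one alternative per term."""
    words = set(document.lower().split())
    return all(words & alternatives for alternatives in expanded_query)
```

For instance, the query "cheap price" would now also retrieve a document that only mentions an "inexpensive fare".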

Synonymy in Italian

scorza, noun

[1] - [bark] (corteccia [1], scorza [1])

[2] - parte esterna, involucro dei frutti [outer part, skin of a fruit] (buccia [1], scorza [2])

[4] - (scorza [4]) "sotto la sua scorza scortese si nasconde un animo nobile" ["a noble soul hides beneath his rude exterior"]

Problems and limitations of MRDs

• Identifying distinct senses is always difficult
  - Sense distinctions are often subjective
• Definitions are often circular
• Very limited characterization of the meaning of words


POLYSEMY vs HOMONYMY

• In cases like BANK, it's fairly easy to identify two distinct senses (the etymology is also different). But in other cases the distinctions are more questionable.
  - E.g., senses 0100 and 0200 of STOCK are clearly related, as are 0600 and 0700, or 0900 and 1000
• In some cases, syntactic tests may help. E.g., KEEP (Hirst, 1987):
  - Ross KEPT staring at Nadia's decolletage
  - Nadia KEPT calm and made a cutting remark
  - Ross wrote of his embarrassment in the diary that he KEPT.
• POLYSEMOUS WORDS: meanings are related to each other
  - Cf. a human's foot vs. a mountain's foot
• In general, the distinction between HOMONYMY and POLYSEMY is not always easy to draw (especially with VERBS)

Other aspects of lexical meaning not captured by MRDs

• Other semantic relations:
  - HYPONYMY
  - ANTONYMY
• A lot of other information typically considered part of ENCYCLOPEDIAs:
  - Trees grow bark and twigs
  - Adult trees are much taller than human beings

Hyponymy and Hypernymy

• HYPONYMY is the relation between a subclass and a superclass:
  - CAR and VEHICLE
  - DOG and ANIMAL
  - BUNGALOW and HOUSE
• Generally speaking, a hyponymy relation holds between X and Y whenever it is possible to substitute Y for X:
  - That is a X -> That is a Y
  - E.g., That is a CAR -> That is a VEHICLE.
• HYPERNYMY is the opposite relation
• Knowledge about TAXONOMIES is useful to classify web pages
  - E.g., the Semantic Web
  - Automatically (e.g., Udo Kruschwitz's system)
• This information is not generally contained in MRDs
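A taxonomy of this kind can be sketched as a table of hyponym -> hypernym links, with the "That is a X -> That is a Y" test answered by walking up the chain. The three pairs from the slide are used as given. POODLE is a hypothetical extra entry added only to show that the relation is transitive, and the function names are assumptions.

```python
# Each word points to its direct hypernym (superclass).
HYPERNYM = {
    "CAR": "VEHICLE",
    "DOG": "ANIMAL",
    "BUNGALOW": "HOUSE",
    "POODLE": "DOG",  # hypothetical entry, not on the slide
}


def hypernym_chain(word):
    """Return all hypernyms of a word, from most specific to most general."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def is_a(x, y):
    """True if 'That is a X' licenses 'That is a Y', i.e. X is a hyponym of Y."""
    return y in hypernym_chain(x)
```

Note that the test is directional: is_a("CAR", "VEHICLE") holds, but is_a("VEHICLE", "CAR") does not.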

The organization of the lexicon

[Diagram with three columns]
WORD-FORMS: "eat", "eats", "ate", "eaten"
LEXEMES: EAT-LEX-1
SENSES: eat0600, eat0700

The organization of the lexicon

[Diagram with three columns]
WORD-STRINGS: "stock"
LEXEMES: STOCK-LEX-1, STOCK-LEX-2, STOCK-LEX-3
SENSES: stock0100, stock0200, stock0600, stock0700, stock0900, stock1000

Synonymy

[Diagram with three columns]
WORD-STRINGS: "cheap", "inexpensive"
LEXEMES: CHEAP-LEX-1, CHEAP-LEX-2, INEXP-LEX-3
SENSES: cheap0100, …., ……, cheapXXXX, inexp0900, inexpYYYY

A more advanced lexical resource: WordNet

• A lexical database created at Princeton
• Freely available for research from the Princeton site: http://www.cogsci.princeton.edu/~wn/
• Information about a variety of SEMANTIC RELATIONS
• Three sub-databases (supported by psychological research as early as Fillenbaum and Jones, 1965):
  - NOUNS
  - VERBS
  - ADJECTIVES and ADVERBS
• Each database is organized around SYNSETS

The noun database

• About 90,000 forms, 116,000 senses
• Relations:

Relation      Example
hypernym      breakfast -> meal
hyponym       meal -> lunch
has-member    faculty -> professor
member-of     copilot -> crew
has-part      table -> leg
part-of       course -> meal
antonym       leader -> follower

Synsets

Senses (or 'lexicalized concepts') are represented in WordNet by the set of words that can be used in AT LEAST ONE CONTEXT to express that sense / lexicalized concept: the SYNSET

E.g.,

{chump, fish, fool, gull, mark, patsy, fall guy, sucker, shlemiel, soft touch, mug}

(gloss: person who is gullible and easy to take advantage of)
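The idea that a sense is identified by a set of words can be sketched with plain Python sets. This is a toy illustration, not WordNet's actual storage format or API. The first synset and its gloss are the ones quoted above, while the second synset is hypothetical and is included only to show that one word can belong to several synsets.

```python
# Each synset is a (set of words, gloss) pair.
SYNSETS = [
    (
        frozenset({"chump", "fish", "fool", "gull", "mark", "patsy", "fall guy",
                   "sucker", "shlemiel", "soft touch", "mug"}),
        "person who is gullible and easy to take advantage of",
    ),
    (
        # Hypothetical second synset, not taken from the slide.
        frozenset({"gull", "seagull", "sea gull"}),
        "hypothetical gloss: a seabird",
    ),
]


def senses(word):
    """Return the glosses of every synset the word belongs to."""
    return [gloss for words, gloss in SYNSETS if word in words]


def synonyms(word, sense_index=0):
    """Return the other members of one synset that contains the word."""
    containing = [words for words, _ in SYNSETS if word in words]
    return set(containing[sense_index]) - {word}
```

A word with two entries in senses() is ambiguous, and its synonyms depend on which synset is selected.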

Hypernyms

2 senses of robin

Sense 1
robin, redbreast, robin redbreast, Old World robin, Erithacus rubecola -- (small Old World songbird with a reddish breast)
  => thrush -- (songbirds characteristically having brownish upper plumage with a spotted breast)
    => oscine, oscine bird -- (passerine bird having specialized vocal apparatus)
      => passerine, passeriform bird -- (perching birds mostly small and living near the ground with feet having 4 toes arranged to allow for gripping the perch; most are songbirds; hatchlings are helpless)
        => bird -- (warm-blooded egg-laying vertebrates characterized by feathers and forelimbs modified as wings)
          => vertebrate, craniate -- (animals having a bony or cartilaginous skeleton with a segmented spinal column and a large brain enclosed in a skull or cranium)
            => chordate -- (any animal of the phylum Chordata having a notochord or spinal column)
              => animal, animate being, beast, brute, creature, fauna -- (a living organism characterized by voluntary movement)
                => organism, being -- (a living thing that has (or can develop) the ability to act or function independently)
                  => living thing, animate thing -- (a living (or once living) entity)
                    => object, physical object --
                      => entity, physical thing --

Meronymy

wn beak -holon

Holonyms of noun beak

1 of 3 senses of beak

Sense 2
beak, bill, neb, nib
    PART OF: bird

The verb database

• About 10,000 forms, 20,000 senses
• Relations between verb meanings:

Relation    Example
hypernym    fly -> travel
troponym    walk -> stroll
entails     snore -> sleep
antonym     increase -> decrease

Relations between verbal meanings

• V1 ENTAILS V2 when "Someone V1" (logically) entails "Someone V2"
  - e.g., snore entails sleep
• V1 is a TROPONYM of V2 when "to do V1" is "to do V2 in some manner"
  - e.g., limp is a troponym of walk

The adjective and adverb database

• About 20,000 adjective forms, 30,000 senses
• 4,000 adverbs, 5,600 senses
• Relations:

Relation               Example
antonym (adjective)    heavy <-> light
antonym (adverb)       quickly <-> slowly

How to use

• Online: http://cogsci.princeton.edu/cgi-bin/webwn
• Command line (the search word comes first, then the option):
  - Get synonyms: wn bank -synsn
  - Get hypernyms: wn robin -hypen
  - Get antonyms (also for adjectives and verbs): wn right -antsa

ItalWordNet (a local production)

• EuroWordNet: created by a European consortium
• ItalWordNet: created by ITC
  - http://www.ilc.cnr.it/iwndb_php/

Other machine-readable lexical resources

• Machine-readable dictionaries: LDOCE
• Roget's Thesaurus
• The biggest encyclopedia: CYC
• Italian: http://multiwordnet.itc.it/ (IRST)

Readings

• WordNet online manuals
• C. Fellbaum (ed.), WordNet: An Electronic Lexical Database, The MIT Press

PART II: VECTOR-BASED MODELS OF THE LEXICON AND LEXICAL ACQUISITION

VECTOR-BASED LEXICAL MODELS

• Both in Linguistics and in Psychology, researchers have developed theories of the lexicon in which concepts are characterized in terms of FEATURES
  - E.g., Smith and Medin, 1981; Sartori and Job, 1988
• This type of approach leads to a 'geometrical' view of lexical entries as points, or VECTORS, in FEATURE SPACE
  - This type of model can account for which words 'mean the same'
• A particularly simple version of this theory is the one in which the 'features' are simply other words
• Vector-space models have been shown to correlate well with the results of psychological experiments, particularly on SEMANTIC PRIMING

VECTOR-BASED MODELS AND LEXICAL ACQUISITION

• Vector-based models (both the feature-based and the word-based variety) are also interesting because they can serve as the basis for models of lexical acquisition
• These models are interesting:
  - From a psychological point of view, to explain how concepts are stored in memory
  - In neuroscience, where they are being used to investigate SEMANTIC CATEGORY DEFICITS (e.g., Caramazza, Tyler et al, Vigliocco et al)
  - From a linguistic point of view, because they can address the problems encountered by lexicographers when trying to specify word senses
  - From a practical point of view: most MRDs these days contain at least some information derived by computational means

Feature-based lexical semantics

Very old idea in Linguistics: the meaning of a word can be specified in terms of the values of certain 'features' ('DECOMPOSITIONAL SEMANTICS')

dog:   ANIMATE=+, EAT=MEAT,  SOCIAL=+
horse: ANIMATE=+, EAT=GRASS, SOCIAL=+
cat:   ANIMATE=+, EAT=MEAT,  SOCIAL=-

E.g., Katz and Fodor, 1968
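The three feature bundles above can be compared directly by counting the features on which two words agree. This is only a sketch: the feature values are those on the slide, but the dictionary layout and the function name are assumptions.

```python
# Feature bundles for the three lexical entries on the slide.
FEATURES = {
    "dog": {"ANIMATE": "+", "EAT": "MEAT", "SOCIAL": "+"},
    "horse": {"ANIMATE": "+", "EAT": "GRASS", "SOCIAL": "+"},
    "cat": {"ANIMATE": "+", "EAT": "MEAT", "SOCIAL": "-"},
}


def shared_features(a, b):
    """Return the names of the features on which words a and b have the same value."""
    return sorted(f for f in FEATURES[a] if FEATURES[a][f] == FEATURES[b].get(f))
```

On this tiny feature set, dog agrees with cat on two features (ANIMATE, EAT) and with horse on two (ANIMATE, SOCIAL), while cat and horse agree only on ANIMATE.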

PSYCHOLOGY: THE FUSS MODEL (Vinson and Vigliocco, 2002, 2003)

Vector-based lexical semantics

[Figure: DOG, CAT and HORSE shown as points in feature space]

WORD-BASED VECTOR-SPACE LEXICAL MODELS, I

WORD-BASED VECTOR-SPACE MODELS, II

WORD-BASED VECTOR-SPACE MODELS, III

Measures of semantic similarity

Euclidean distance:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

Cosine:

$$\cos(x, y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$

Manhattan Metric:

$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$
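The three measures named on this slide can be written out in a few lines, using their standard definitions. The function names are assumptions, and returning 0.0 for the cosine of an all-zero vector is a convention chosen here, not something the slide specifies.

```python
import math


def euclidean(x, y):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of the absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine(x, y):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # convention for empty vectors (assumption)
    return dot / (norm_x * norm_y)
```

Euclidean and Manhattan are distances (smaller means more similar), whereas cosine is a similarity (larger means more similar) and ignores vector length: the vectors (1, 2, 2) and (2, 4, 4) are 3 apart in Euclidean terms but have cosine 1.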

DIMENSIONALITY REDUCTION

Concept clustering (aka: automatic taxonomy discovery)

[Figure: words grouped into labelled clusters]
Time: Day, Month, Year
Vehicle: Car, Airplane, Van
Feeling: Joy, Love, Fear

Some psychological evidence for vector-space representations

• Burgess and Lund (1996, 1997): the clusters found with HAL correlate well with those observed in semantic priming experiments.
• Landauer, Foltz, and Laham (1997): scores overlap with those of humans on standard vocabulary and topic tests; mimic human scores on category judgments; etc.
• Evidence about 'prototype theory' (Rosch et al, 1976)
  - Posner and Keele, 1968: subjects were presented with patterns of dots that had been obtained by variations from a single pattern ('prototype'). Later, they recalled prototypes better than samples they had actually seen.
  - Rosch et al, 1976: 'basic level' categories (apple, orange, potato, carrot) have higher 'cue validity' than elements higher in the hierarchy (fruit, vegetable) or lower (red delicious, cox)

General characterization of vector-based semantics (from Charniak)

• Vectors as models of concepts
• The CLUSTERING approach to lexical semantics:
  1. Define properties one cares about, and give values to each property (generally, numerical)
  2. Create a vector of length n for each item to be classified
  3. Viewing the n-dimensional vector as a point in n-space, cluster points that are near one another
• What changes between models:
  1. The properties used in the vector
  2. The distance metric used to decide if two points are 'close'
  3. The algorithm used to cluster
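The three steps of the clustering approach can be sketched end to end. Everything concrete here is an assumption made for illustration: the two-dimensional vectors are invented, the distance metric is Euclidean, and the algorithm is a simple single-link grouping with a fixed threshold, which is only one of many possible choices.

```python
import math

# Steps 1 and 2: hypothetical two-property vectors for six words.
POINTS = {
    "car": (9, 1), "van": (8, 2), "airplane": (8, 0),
    "joy": (1, 9), "love": (2, 8), "fear": (1, 7),
}


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cluster(points, threshold, distance=euclidean):
    """Step 3: single-link clustering. Two items share a cluster if a chain
    of pairwise distances no larger than the threshold connects them."""
    unassigned = set(points)
    clusters = []
    for name in points:
        if name not in unassigned:
            continue
        unassigned.discard(name)
        group, frontier = [name], [name]
        while frontier:
            current = frontier.pop()
            near = [other for other in points if other in unassigned
                    and distance(points[current], points[other]) <= threshold]
            for other in near:
                unassigned.discard(other)
                group.append(other)
                frontier.append(other)
        clusters.append(sorted(group))
    return clusters
```

Swapping the distance argument or the threshold changes the resulting clusters, which corresponds to the second and third points of variation listed above.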

Using words as features in a vector-based semantics

• The old decompositional semantics approach requires:
  i. Specifying the features
  ii. Characterizing the value of these features for each lexeme
• Simpler approach: use as features the WORDS that occur in the proximity of that word / lexical entry
  - Intuition: "You can tell a word's meaning from the company it keeps"
• More specifically, you can use as 'values' of these features:
  - The FREQUENCIES with which these words occur near the words whose meaning we are defining
  - Or perhaps the PROBABILITIES that these words occur next to each other
• Alternative: use the DOCUMENTS in which these words occur (e.g., LSA)

Using neighboring words to specify the meaning of words

Take, e.g., the following corpus:
1. John ate a banana.
2. John ate an apple.
3. John drove a lorry.

We can extract the following co-occurrence matrix:

          john  ate  drove  banana  apple  lorry
john        0    2     1       1      1      1
ate         2    0     0       1      1      0
drove       1    0     0       0      0      1
banana      1    1     0       0      0      0
apple       1    1     0       0      0      0
lorry       1    0     1       0      0      0
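The matrix above can be reproduced with a few lines of code under two assumptions that the slide leaves implicit: two words co-occur when they appear in the same sentence, and the determiners "a" and "an" are ignored. The function and variable names are hypothetical.

```python
from itertools import permutations

CORPUS = ["John ate a banana.", "John ate an apple.", "John drove a lorry."]
STOPWORDS = {"a", "an"}  # assumption: determiners are not counted
VOCAB = ["john", "ate", "drove", "banana", "apple", "lorry"]


def cooccurrence(corpus, vocab, stopwords):
    """Count, for every ordered pair of distinct words, the number of
    sentences in which both occur."""
    counts = {w: {v: 0 for v in vocab} for w in vocab}
    for sentence in corpus:
        words = [token.strip(".").lower() for token in sentence.split()]
        words = [w for w in words if w not in stopwords]
        for w, v in permutations(words, 2):
            counts[w][v] += 1
    return counts


MATRIX = cooccurrence(CORPUS, VOCAB, STOPWORDS)
```

Each row of MATRIX is the vector for one word. The rows for banana and apple come out identical, which is the sense in which the two words 'keep the same company' in this corpus.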

Acquiring lexical vectors from a corpus (Schuetze, 1991; Burgess and Lund, 1997)

• To construct vectors C(w) for each word w:
  1. Scan a text
  2. Whenever a word w is encountered, increment all cells of C(w) corresponding to the words v that occur in the vicinity of w, typically within a window of fixed size
• Differences among methods:
  - Size of window
  - Weighted or not
  - Whether every word in the vocabulary counts as a dimension (including function words such as "the" or "and") or whether instead only some specially chosen words are used (typically, the m most common content words in the corpus; or perhaps modifiers only). The words chosen as dimensions are often called CONTEXT WORDS
  - Whether dimensionality reduction methods are applied
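The two-step procedure for building C(w) can be sketched as follows. This is an unweighted variant. The window size and the optional set of context words are the parameters mentioned above, but the function name, argument names, and the toy token list are assumptions.

```python
from collections import Counter, defaultdict

TOKENS = "the dog chased the cat".split()  # toy text (assumption)


def build_vectors(tokens, window=2, context_words=None):
    """For each word w, count the words v found within `window` positions
    to the left or right of each occurrence of w. If `context_words` is
    given, only those words are used as dimensions."""
    vectors = defaultdict(Counter)
    for i, w in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j == i:
                continue  # a word is not its own neighbor
            v = tokens[j]
            if context_words is None or v in context_words:
                vectors[w][v] += 1
    return vectors
```

A weighted variant would add a value that depends on the distance between i and j instead of adding 1.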

The HAL model (Burgess and Lund, 1995, 1997)

• A 160-million-word corpus of articles extracted from all newsgroups containing English dialogue
• Context words: the 70,000 most frequently occurring symbols within the corpus
• Window size: 10 words to the left and the right of the word
• Measure of similarity: cosine

Latent Semantic Analysis (LSA) (Landauer et al, 1997)

• Goal: extract relations of expected contextual usage from passages
• Three steps:
  1. Build a word / document co-occurrence matrix
  2. 'Weight' each cell
  3. Perform a DIMENSIONALITY REDUCTION
• Argued to correlate well with humans on a number of tests
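The dimensionality reduction step can be written compactly. The notation below is an assumption introduced here for illustration: A stands for the weighted word / document matrix from steps 1 and 2, and k for the number of dimensions that are kept.

```latex
% Singular value decomposition of the weighted word/document matrix A
A = U \Sigma V^{T}

% Reduced ("reconstructed") matrix: keep only the k largest singular values
A_k = U_k \Sigma_k V_k^{T}, \qquad k \ll \operatorname{rank}(A)
```

Here U_k and V_k contain the first k columns of U and V, and Sigma_k the k largest singular values. A_k is the closest rank-k matrix to A in the least-squares sense, so words that occur in similar documents end up with similar rows even if they never occur in the same document.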

LSA: the method, 1

LSA: Singular Value Decomposition

LSA: Reconstructed matrix

Topic correlations in 'raw' and 'reconstructed' data

SEXTANT (Grefenstette, 1992)

Example sentence:

It was concluded that the carcinoembryonic antigens represent cellular constituents which are repressed during the course of differentiation of the normal digestive system epithelium and reappear in the corresponding malignant cells by a process of derepressive dedifferentiation

Extracted word / attribute pairs:

antigen      carcinoembryonic-ADJ
antigen      repress-DOBJ
antigen      represent-SUBJ
constituent  cellular-ADJ
constituent  represent-DOBJ
course       repress-IOBJ
……..

SEXTANT: Similarity measure

dog: pet-DOBJ, eat-SUBJ, shaggy-ADJ, brown-ADJ, leash-NN
cat: pet-DOBJ, pet-DOBJ, hairy-ADJ, leash-NN

Jaccard:

$$\mathrm{Jaccard}(A, B) = \frac{\mathrm{Count}(\text{attributes shared by } A \text{ and } B)}{\mathrm{Count}(\text{unique attributes possessed by } A \text{ and } B)}$$

$$\mathrm{Jaccard}(\mathrm{CAT}, \mathrm{DOG}) = \frac{\mathrm{Count}\{\text{leash-NN}, \text{pet-DOBJ}\}}{\mathrm{Count}\{\text{brown-ADJ}, \text{eat-SUBJ}, \text{hairy-ADJ}, \text{leash-NN}, \text{pet-DOBJ}, \text{shaggy-ADJ}\}} = \frac{2}{6}$$
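The Jaccard computation for CAT and DOG can be checked with Python sets. The attribute lists are the ones on the slide, while the function name is an assumption. Converting each list to a set is what discards the repeated pet-DOBJ for cat.

```python
# Attribute lists as extracted for each word (duplicates included).
DOG = ["pet-DOBJ", "eat-SUBJ", "shaggy-ADJ", "brown-ADJ", "leash-NN"]
CAT = ["pet-DOBJ", "pet-DOBJ", "hairy-ADJ", "leash-NN"]


def jaccard(attributes_a, attributes_b):
    """Shared attributes divided by all unique attributes of the two words."""
    a, b = set(attributes_a), set(attributes_b)
    if not a and not b:
        return 0.0  # convention for two empty attribute sets (assumption)
    return len(a & b) / len(a | b)
```

The two words share 2 attributes out of 6 unique ones, so the measure comes to 2/6, as on the slide.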

Some caveats

• Two senses of 'similarity':
  - Schuetze: two words are similar if one can replace the other
  - Brown et al: two words are similar if they occur in similar contexts
• What notion of 'meaning' is learned here?
  - "One might consider LSA's maximal knowledge of the world to be analogous to a well-read nun's knowledge of sex, a level of knowledge often deemed a sufficient basis for advising the young" (Landauer et al, 1997)
• Can one do semantics with these representations?
  - Our own experience: using HAL-style vectors for resolving bridging references
  - Very limited success
  - Applying dimensionality reduction didn't seem to help

Applications of these techniques: Information Retrieval

      cosmonaut  astronaut  moon  car  truck
d1        1          0        1    1     0
d2        0          1        1    0     0
d3        1          0        0    0     0
d4        0          0        0    1     1
d5        0          0        0    1     0
d6        0          0        0    0     1
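The matrix above can be read column by column to get one vector per term, and the cosine between two term vectors then measures how similarly the terms are distributed over the documents. The counts are those of the table, while the variable and function names are assumptions.

```python
import math

# One vector per term: its value in documents d1..d6.
TERMS = {
    "cosmonaut": (1, 0, 1, 0, 0, 0),
    "astronaut": (0, 1, 0, 0, 0, 0),
    "moon":      (1, 1, 0, 0, 0, 0),
    "car":       (1, 0, 0, 1, 1, 0),
    "truck":     (0, 0, 0, 1, 0, 1),
}


def cosine(x, y):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norms if norms else 0.0
```

In this raw space, cosmonaut and astronaut have cosine 0 because they never occur in the same document, even though both co-occur with moon. A query for one would therefore miss documents about the other.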

Readings

• Jurafsky and Martin, chapter 17.3
• Also useful:
  - Manning and Schuetze, chapter 8
  - Charniak, chapters 9-10
• Some papers:
  - HAL: see the Higher Dimensional Space page
  - LSA: various papers on the Colorado site
• Good reference: Landauer, Foltz, and Laham (1997). Introduction to Latent Semantic Analysis. Discourse Processes.