WordNet: Connecting words and concepts

WordNet: Connecting words and concepts Christiane Fellbaum Cognitive Science Laboratory Princeton University


Post on 06-Jan-2016



Page 1: WordNet: Connecting words and concepts

WordNet: Connecting words and concepts

Christiane Fellbaum

Cognitive Science Laboratory

Princeton University

Page 2: WordNet: Connecting words and concepts

What is WordNet?

• A large lexical database, or “electronic dictionary”

• Covers most English nouns, verbs, adjectives, adverbs

• Electronic format makes it amenable to automatic manipulation

• Used in many applications (document retrieval and sorting, machine translation,...)

Page 3: WordNet: Connecting words and concepts

What’s so special about WordNet?

• Traditional paper dictionaries are organized alphabetically, so words that are grouped together (on the same page) are unrelated

• WordNet is organized by meaning, so words in close proximity are related

Page 4: WordNet: Connecting words and concepts

What’s so special...?

• Users can browse WordNet and find words related to their queries (like in a thesaurus)

Page 5: WordNet: Connecting words and concepts

Basic Design of WordNet

WordNet entries are word-concept mappings

Natural languages map many-to-many:

One concept can be expressed by many words (synonymy):

{car, auto, automobile}

{close, shut}

Page 6: WordNet: Connecting words and concepts

Basic Design of WordNet

One word can express many concepts (polysemy):

{club, stick}

{club, nightclub}

{club, playing card}

Page 7: WordNet: Connecting words and concepts

Basic Design of WordNet

Added problem in Natural Language:

The words we use most frequently are the most polysemous (have the most meanings)!

Page 8: WordNet: Connecting words and concepts

Basic Design of WordNet

WordNet harnesses synonymy and polysemy

Represents words and concepts unambiguously

Meaningfully relates words and concepts

Page 9: WordNet: Connecting words and concepts

Basic Design of WordNet

WordNet’s building blocks: sets of synonyms (synsets)

{hit, beat}

{big, large}

{queue, line}

Each synset expresses a distinct concept.

Currently, WordNet contains approximately 117,000 synsets

Page 10: WordNet: Connecting words and concepts

Basic Design of WordNet

WordNet stores, and allows one to retrieve,

--all concepts that a given word can express

--all words that express a given concept
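The two lookups listed here can be sketched with a pair of dictionaries. This is a toy illustration, not WordNet's actual storage format: the synset identifiers (`car.1`, `club.2`, ...) and the function names are made up for the example, and the word sets are the ones shown on the earlier slides.

```python
from collections import defaultdict

# Toy synsets from the slides. The ids on the left are hypothetical labels.
SYNSETS = {
    "car.1": {"car", "auto", "automobile"},
    "close.1": {"close", "shut"},
    "club.1": {"club", "stick"},
    "club.2": {"club", "nightclub"},
    "club.3": {"club", "playing card"},
}

# Inverted index: word -> ids of every synset that contains the word.
WORD_INDEX = defaultdict(set)
for synset_id, words in SYNSETS.items():
    for word in words:
        WORD_INDEX[word].add(synset_id)


def concepts_for(word):
    """Return all concepts (synset ids) that a given word can express."""
    return sorted(WORD_INDEX.get(word, set()))


def words_for(synset_id):
    """Return all words that express a given concept."""
    return sorted(SYNSETS[synset_id])


print(concepts_for("club"))   # polysemy: one word, several synsets
print(words_for("car.1"))     # synonymy: one synset, several words
```

Keeping both directions indexed makes each lookup a single dictionary access, which is the point of storing entries as word-concept mappings.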

Page 11: WordNet: Connecting words and concepts

But wait--there’s more!

• Words and synsets are connected via meaning-based relations

• Result: a large semantic network

(as opposed to a flat list in a paper dictionary)

Page 12: WordNet: Connecting words and concepts

Relations among WN noun synsets

• Hyperonymy/hyponymy relates super/subordinate synsets (denoting more/less general concepts):

{vehicle}
  {car, automobile}
    {convertible}
    {SUV}
  {bicycle, bike}
    {mountain bike}

Transitivity:

A car is a kind of vehicle

An SUV is a kind of car

=> An SUV is a kind of vehicle
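The transitivity argument can be checked mechanically by following "is a kind of" links upward. The sketch below is a minimal illustration that assumes each synset has exactly one superordinate, which is a simplification made for this example; the dictionary holds only the synsets from the diagram on this slide, and the function names are hypothetical.

```python
# Child -> parent ("is a kind of") links from the slide's diagram.
# Assumption for this sketch: one superordinate per synset.
HYPERNYM = {
    "car": "vehicle",
    "bicycle": "vehicle",
    "convertible": "car",
    "SUV": "car",
    "mountain bike": "bicycle",
}


def hypernym_chain(synset):
    """List every superordinate of a synset, nearest first."""
    chain = []
    while synset in HYPERNYM:
        synset = HYPERNYM[synset]
        chain.append(synset)
    return chain


def is_a_kind_of(specific, general):
    """True if `general` is reachable from `specific` via hypernym links."""
    return general in hypernym_chain(specific)


print(hypernym_chain("SUV"))
print(is_a_kind_of("SUV", "vehicle"))
```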

Page 13: WordNet: Connecting words and concepts

Relations among noun synsets

• Meronymy/holonymy (part/whole)

{car, automobile}
  {engine}
    {spark plug}
    {cylinder}

Inheritance:

A car has an engine

An engine has spark plugs

=> A car has spark plugs
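Part/whole inheritance works the same way in the other direction: a whole inherits the parts of its parts. A small sketch, assuming a plain dictionary of direct "has part" links (the names `HAS_PART` and `all_parts` are invented for this example):

```python
from collections import deque

# Whole -> direct parts, taken from the slide's example.
HAS_PART = {
    "car": ["engine"],
    "engine": ["spark plug", "cylinder"],
}


def all_parts(whole):
    """Collect direct and inherited parts, breadth-first."""
    found = []
    queue = deque(HAS_PART.get(whole, []))
    while queue:
        part = queue.popleft()
        if part not in found:
            found.append(part)
            # Parts of a part are inherited by the whole.
            queue.extend(HAS_PART.get(part, []))
    return found


print(all_parts("car"))
```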

Page 14: WordNet: Connecting words and concepts

Relations among verb synsets

Verbs denote events

Related by a “manner” relation

{communicate}
  {talk}
    {stammer}
    {whisper}

Page 15: WordNet: Connecting words and concepts

Relations among verb synsets

Semantics of events (verbs) are very different from semantics of entities (nouns)

WordNet captures this fact with different relations

Relations refer to temporal properties of events

--partial and complete overlap of two events

--prior or posterior events

Page 16: WordNet: Connecting words and concepts

WordNet

Relations among synsets create an interconnected network

Different senses of polysemous words are members of distinct synsets that are related to different synsets (i.e., occupy different locations in the network)

e.g., {stock, broth} has superordinate synset {dish}

{stock, breed} has superordinate {variety}

These different synsets are also linked to different part/whole synsets

Page 17: WordNet: Connecting words and concepts

WordNet

A word’s meaning can be defined in terms of its position in the network

club1 is a kind of association/has members

club2 is a kind of stick

Relatedness between words or synsets can be quantified in terms of path length (number of connections among synsets)

Page 18: WordNet: Connecting words and concepts

WordNet

• How closely related are {zebra} and {horse}?

Very: both share the direct superordinate {equine}

• What about {horse, sawhorse} and {horse, gymnastic horse}?

Related, but less so: joint superordinate {artifact} is 4-5 levels up

• What about {zebra} and {horse, gymnastic horse}?

Unrelated: the trees containing them never intersect!
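The three cases can be reproduced on a toy hierarchy by counting the links up to the nearest shared superordinate. The sketch below is an illustration only: the chains to {artifact} are shortened to two steps, the nodes named "placeholder A" and "placeholder B" stand in for intermediate synsets the slides do not name, and the function names are invented.

```python
# Child -> parent links. "placeholder A/B" are stand-ins, not WordNet synsets,
# and the real distance to {artifact} is longer than shown here.
HYPERNYM = {
    "zebra": "equine",
    "horse (equine)": "equine",
    "horse (sawhorse)": "placeholder A",
    "placeholder A": "artifact",
    "horse (gymnastic horse)": "placeholder B",
    "placeholder B": "artifact",
}


def distances_up(node):
    """Map a node and each of its superordinates to the number of steps up."""
    distances = {node: 0}
    steps = 0
    while node in HYPERNYM:
        node = HYPERNYM[node]
        steps += 1
        distances[node] = steps
    return distances


def path_length(a, b):
    """Links between a and b via their nearest shared superordinate, or None."""
    up_from_a = distances_up(a)
    node, steps = b, 0
    while True:
        if node in up_from_a:
            return up_from_a[node] + steps
        if node not in HYPERNYM:
            return None  # the two trees never intersect
        node = HYPERNYM[node]
        steps += 1


print(path_length("zebra", "horse (equine)"))
print(path_length("horse (sawhorse)", "horse (gymnastic horse)"))
print(path_length("zebra", "horse (gymnastic horse)"))
```

A shorter path means closer relatedness, and `None` corresponds to the "unrelated" case.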

Page 19: WordNet: Connecting words and concepts

WordNet for Word Sense Disambiguation

• WSD is a major problem in Natural Language Processing

• Assumption: words in a context (phrase, sentence, discourse) are semantically related

• So, horse in the neighborhood of zebra is likely to mean “equine”; in the neighborhood of gym it likely means “gymnastic horse.”

Page 20: WordNet: Connecting words and concepts

WordNet for WSD

If you want to disambiguate “horse” in the context of “zebra,” look for all WordNet paths from “zebra” to “horse.” The shortest one is likely to give you the correct sense of “horse.”
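A minimal sketch of this shortest-path heuristic, using breadth-first search over an undirected toy graph. The graph is an assumption made for illustration: the `#` sense labels are invented, and the edge linking "gym" to "gymnastic apparatus" is hypothetical rather than taken from WordNet.

```python
from collections import deque

# Undirected toy edges. The gym/gymnastic-apparatus links are hypothetical.
EDGES = [
    ("zebra", "equine"),
    ("horse#equine", "equine"),
    ("horse#gymnastic", "gymnastic apparatus"),
    ("gym", "gymnastic apparatus"),
]
HORSE_SENSES = ["horse#equine", "horse#gymnastic"]


def build_graph(edges):
    """Turn an edge list into an adjacency map usable in both directions."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns the number of links, or None."""
    if start not in graph or goal not in graph:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def disambiguate(graph, context_word, senses):
    """Pick the sense with the shortest path to the context word."""
    scored = []
    for sense in senses:
        dist = shortest_path_length(graph, context_word, sense)
        if dist is not None:
            scored.append((dist, sense))
    return min(scored)[1] if scored else None


GRAPH = build_graph(EDGES)
print(disambiguate(GRAPH, "zebra", HORSE_SENSES))
print(disambiguate(GRAPH, "gym", HORSE_SENSES))
```

The heuristic only says the shortest path is likely to give the right sense, so the result is best treated as a ranking signal rather than a guaranteed answer.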

Page 21: WordNet: Connecting words and concepts

WordNet for WSD

• Can take advantage of WordNet classes (trees of hierarchically related synsets)

• e.g., run1 co-occurs with nouns that are all hyponyms (subordinate, more specific concepts) of office (mayor, congresswoman, President,...)

• run2 co-occurs with nouns that are hyponyms of machine (computer, washer, printing press, engine,...)
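The class-based idea can be sketched as a lookup that climbs from a co-occurring noun until it reaches a class tied to a verb sense. In this toy version every noun from the slide is attached directly to its class, which is a simplification; the labels `run1` and `run2` follow the slide, and the function name is invented.

```python
# Noun -> superordinate class, flattened to one level for this sketch.
HYPERNYM = {
    "mayor": "office",
    "congresswoman": "office",
    "President": "office",
    "computer": "machine",
    "washer": "machine",
    "printing press": "machine",
    "engine": "machine",
}

# Class -> sense of "run" that co-occurs with hyponyms of that class.
SENSE_FOR_CLASS = {"office": "run1", "machine": "run2"}


def sense_of_run(noun):
    """Climb the hierarchy from the noun until a known class is found."""
    node = noun
    while node is not None:
        if node in SENSE_FOR_CLASS:
            return SENSE_FOR_CLASS[node]
        node = HYPERNYM.get(node)
    return None


print(sense_of_run("mayor"))
print(sense_of_run("printing press"))
```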

Page 22: WordNet: Connecting words and concepts

Topics/Domain in WordNet

• Hierarchical organization leaves many related concepts unconnected

• Solution: link synsets across “trees” in terms of their membership in a “domain” or topic

• E.g., synsets {contraindication}, {surgery}, {physician}, ... are all linked to {medicine}, the concept that defines a domain or topic

Page 23: WordNet: Connecting words and concepts

Topics/Domain in WordNet

• Customizable: user can define new topics

• Topics can be as coarse- or fine-grained as desired

• By using synsets as topic labels, the concepts subsumed under the new topic(s) will continue to be part of the network
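Domain links of this kind can be modelled as a second index that sits alongside the hierarchy. The class below is a hypothetical sketch, not part of any WordNet distribution; it uses the {medicine} example from the slides and shows a user-defined topic being added the same way.

```python
from collections import defaultdict


class DomainIndex:
    """Toy cross-tree index linking synsets to domain (topic) synsets."""

    def __init__(self):
        self._members = defaultdict(set)   # domain -> synsets
        self._domains = defaultdict(set)   # synset -> domains

    def link(self, synset, domain):
        """Record that a synset belongs to a domain."""
        self._members[domain].add(synset)
        self._domains[synset].add(domain)

    def members(self, domain):
        """All synsets linked to a domain, in sorted order."""
        return sorted(self._members.get(domain, set()))

    def share_domain(self, a, b):
        """True if two synsets are linked to at least one common domain."""
        return bool(self._domains.get(a, set()) & self._domains.get(b, set()))


index = DomainIndex()
for synset in ("contraindication", "surgery", "physician"):
    index.link(synset, "medicine")

# A user-defined, finer-grained topic (hypothetical example).
index.link("surgery", "operating room")

print(index.members("medicine"))
print(index.share_domain("contraindication", "physician"))
```

Because the topic labels are themselves ordinary synset names, nothing leaves the network when a new topic is introduced.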

Page 24: WordNet: Connecting words and concepts

Current and Future Work

• Increase density of WordNet

• More links, new relations

• E.g. “role” relation among nouns:

distinguish {poodle}-{dog} (a “type” relation)

from {poodle}-{pet} (a “role” relation)

poodle is a type of dog, but not a type of pet

poodle can (but need not) play the “role” of pet

Page 25: WordNet: Connecting words and concepts

Work just completed...(sponsored by ARDA/AQUAINT)

Manually link nouns, verbs, adjectives, adverbs in the definitions (“glosses”) to the appropriate synset:

{bank (a financial institution that accepts deposits...)}

{bank (sloping land..)}

Page 26: WordNet: Connecting words and concepts

Gloss Disambiguation

{bank (a financial institution that accepts deposits...)}

{financial, fiscal} {institution, establishment}

{institution, custom}

{bank (sloping land..)}

{slope, incline} {land, ground, earth}

{land, country}

Page 27: WordNet: Connecting words and concepts

Gloss Disambiguation: Results

• A closed system linking glosses and synsets (and a more densely connected network)

• Each gloss is more informative as it adds synset information for the words in the gloss

• Glosses are examples of contexts for many word-sense pairs, telling us how words with specific senses are being used in context

• Glosses can be used as training data for machine learning systems that want to “learn” to disambiguate words automatically
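One way to picture sense-tagged glosses as training data is a list of (word, sense, context) triples. The sketch below is an assumption-laden illustration: the record layout and function name are invented, the glosses are kept truncated exactly as on the slides, and the chosen senses reflect one reading of the "Gloss Disambiguation" slide (the first candidate listed for each word), not a confirmed annotation.

```python
# Sense-tagged glosses. Tags follow one reading of the slide; the layout
# of these records is hypothetical.
GLOSSES = [
    {
        "synset": "bank (financial institution)",
        "gloss": "a financial institution that accepts deposits...",
        "tags": {
            "financial": "{financial, fiscal}",
            "institution": "{institution, establishment}",
        },
    },
    {
        "synset": "bank (sloping land)",
        "gloss": "sloping land..",
        "tags": {
            "sloping": "{slope, incline}",
            "land": "{land, ground, earth}",
        },
    },
]


def training_examples(glosses):
    """Flatten tagged glosses into (word, sense, context) triples."""
    examples = []
    for entry in glosses:
        for word, sense in entry["tags"].items():
            examples.append((word, sense, entry["gloss"]))
    return examples


for example in training_examples(GLOSSES):
    print(example)
```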

Page 28: WordNet: Connecting words and concepts

Where to find WordNet

Freely downloadable:

http://wordnet.princeton.edu/

Database, browser, documentation

Page 29: WordNet: Connecting words and concepts

Global WordNet

Currently, wordnets exist for some 40 languages, including

Arabic, Basque, Bulgarian, Estonian, Hebrew, Icelandic, Italian, Kannada, Latvian, Persian, Romanian, Sanskrit, Tamil, Thai, Turkish,...

http://www.globalwordnet.org

Page 30: WordNet: Connecting words and concepts

Thank you!

For questions, comments, and papers:

[email protected]