Post on 21-Dec-2015
Structured lexicons and Lexical semantics
Especially WordNet®
See D Jurafsky & JH Martin: Speech and Language Processing, Upper Saddle River NJ (2000): Prentice Hall, Chapter 16.
and http://en.wikipedia.org/wiki/WordNet
and explore WordNet: http://wordnet.princeton.edu/
Structured lexicons
• Alternative to alphabetical dictionary
• List of words grouped according to meaning
• Classic example: Roget’s Thesaurus
• Hierarchical organization is important
• Hierarchies familiar as taxonomies, eg in natural sciences
  – Daughters are “types of” and share certain properties, inherited from the mother
• Similar idea for ordinary words: hyponymy and synonymy
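[Figure: two example hierarchies. Hyponymy: animal → bird, fish, ...; bird → canary, eagle; fish → trout, shark; eagle → bald eagle, golden eagle, hawk eagle, bateleur. Synonymy: space → in general, dimensions, form, motion; dimensions → size, expansion, distance, interval, contiguity; one group of synonyms: reduction, deflation, shrinkage, curtailment, condensation, ...]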
Thesaurus
• A way to show the structure of (lexical) knowledge
• Much used for technical terminology
• Can be enriched by having other lexical relations:
  – Antonyms (as well as synonyms)
  – Different hyponymy relations, not just is-a-type-of, but has-as-part/member
• Thesaurus can be explored in any direction
  – across, up, down
  – Some obvious distance metrics can be used to measure similarity between words
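One such distance metric can be sketched as counting the edges between two words in the hierarchy. The snippet below is a minimal illustration, not a standard implementation: it assumes a strict tree (one parent per word), uses part of the animal example as toy data, and the names `PARENT`, `path_distance` and `similarity` are made up for this sketch.

```python
# Toy taxonomy based on the animal example: child -> parent (hypernym).
# Assumption: a strict tree, so every word has at most one parent.
PARENT = {
    "bird": "animal",
    "fish": "animal",
    "canary": "bird",
    "eagle": "bird",
    "trout": "fish",
    "shark": "fish",
    "bald eagle": "eagle",
    "golden eagle": "eagle",
}


def path_to_root(word):
    """Return [word, parent, grandparent, ..., root]."""
    path = [word]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def path_distance(a, b):
    """Number of edges on the shortest path between a and b in the tree."""
    up_a = path_to_root(a)
    up_b = path_to_root(b)
    steps_from_b = {node: i for i, node in enumerate(up_b)}
    for steps_from_a, node in enumerate(up_a):
        if node in steps_from_b:  # first shared node = lowest common ancestor
            return steps_from_a + steps_from_b[node]
    raise ValueError("words are not in the same hierarchy")


def similarity(a, b):
    """Map distance into (0, 1]; identical words score 1.0."""
    return 1.0 / (1.0 + path_distance(a, b))
```

With this toy data, sister terms such as canary and eagle are two edges apart, while bald eagle and trout are five edges apart and therefore score as less similar.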
WordNet: History
• 1985: a group of psychologists and linguists start to develop a “lexical database”
  – Princeton University
  – theoretical basis: results from psycholinguistics and psycholexicology
  – What are properties of the “mental lexicon”?
Global organisation
• division of the lexicon into five categories:
  – Nouns
  – Verbs
  – Adjectives
  – Adverbs
  – function words (“probably stored separately as part of the syntactic component of language” [Miller et al.])
Global organization
• nouns: organized as topical hierarchies
• verbs: entailment relations
• adjectives: multi-dimensional hyperspaces
• adverbs: multi-dimensional hyperspaces
Lexical semantics
• How are word meanings represented in WordNet?
  – synsets (synonym sets) as basic units
  – a word ‘meaning’ is represented by simply listing the word forms that can be used to express it
• example: senses of board
  – a piece of lumber vs. a group of people assembled for some purpose
  – synsets as unambiguous designators:
  – {board, plank, ...} vs. {board, committee, ...}
• Members of synsets are rarely true synonyms
  – WordNet does not attempt to capture subtle distinctions among members of the synset
  – such distinctions may be due to specific details, or simply to connotation or collocation
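The idea that a meaning is just the set of word forms expressing it can be sketched with plain Python sets. This is a toy illustration only: the sense identifiers (`board.1`, ...) and the helper names are hypothetical and do not follow WordNet's own file format or any library API.

```python
# Hypothetical miniature lexicon: sense id -> synset (a set of word forms).
SYNSETS = {
    "board.1": frozenset({"board", "plank"}),
    "board.2": frozenset({"board", "committee"}),
    "maple.1": frozenset({"maple"}),
}


def senses(word):
    """All sense ids whose synset contains the word form."""
    return sorted(sid for sid, forms in SYNSETS.items() if word in forms)


def synonyms(word, sense_id):
    """The other members of one synset, i.e. synonyms for that sense only."""
    return sorted(SYNSETS[sense_id] - {word})


def is_ambiguous(word):
    """A word form is ambiguous if it belongs to more than one synset."""
    return len(senses(word)) > 1
```

Here `senses("board")` lists two synsets, and it is the other members (plank vs. committee) that tell the two senses apart.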
Synsets
• synsets often sufficient for differential purposes
  – if an appropriate synonym is not available a short gloss may be used
  – e.g. {board, (a person’s meals, provided regularly for money)}
  – Preferable for cardinality of synset to be >1
  – WordNet also gives a gloss for each word meaning, and (often) an example
WordNet is big
Lexical relations in WordNet
• WordNet is organized by semantic relations.
  – It is characteristic of semantic relations that they are reciprocated
  – if there is a semantic relation R between meaning {x1, x2, ...} and meaning {y1, y2, ...}, then there is also a relation R′ (the inverse of R) between {y1, y2, ...} and {x1, x2, ...}
  – Individual relations may or may not be
    • Symmetric: R(A,B) ⇒ R(B,A) (eg synonymy, not hyponymy)
    • Transitive: R(A,B) & R(B,C) ⇒ R(A,C) (eg synonymy may be)
    • Reflexive: R(A,A) is true (synonymy is, antonymy isn’t)
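These three properties can be checked mechanically when a relation is stored as a set of ordered pairs. The sketch below is only an illustration on toy data; the function names and the extra pair (tree, plant) are assumptions added for the example.

```python
def is_symmetric(pairs):
    """R(A,B) implies R(B,A) for every pair."""
    return all((b, a) in pairs for a, b in pairs)


def is_transitive(pairs):
    """R(A,B) and R(B,C) imply R(A,C)."""
    return all(
        (a, d) in pairs
        for a, b in pairs
        for c, d in pairs
        if b == c
    )


def is_reflexive(pairs, domain):
    """R(A,A) holds for every element of the domain."""
    return all((x, x) in pairs for x in domain)


def inverse(pairs):
    """The reciprocal relation: swap the two sides of every pair."""
    return {(b, a) for a, b in pairs}


# Toy hyponymy relation (already transitively closed).
HYPONYM_OF = {("maple", "tree"), ("tree", "plant"), ("maple", "plant")}
HYPERNYM_OF = inverse(HYPONYM_OF)

# Synonymy inside one synset: every member is related to every member.
SYNSET = {"rise", "ascend"}
SYNONYM_OF = {(a, b) for a in SYNSET for b in SYNSET}
```

On this data hyponymy comes out transitive but neither symmetric nor reflexive, whereas synonymy within a single synset has all three properties.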
Lexical relations
• Nouns
  – Synonym ~ antonym (opposite of)
  – Hypernyms (is a kind of) ~ hyponym (for example)
  – Coordinate (sister) terms: share the same hypernym
  – Holonym (is part of) ~ meronym (has as part)
• Verbs
  – Synonym ~ antonym
  – Hypernym ~ troponym (eg lisp – talk)
  – Entailment (eg snore – sleep)
  – Coordinate (sister) terms: share the same hypernym
• Adjectives/Adverbs in addition to above
  – Related nouns
  – Verb participles
  – Derivational information
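Coordinate terms need no storage of their own, since they can be derived from the hypernym links. A minimal sketch, assuming one hypernym per word and using part of the animal example as toy data (`HYPERNYM`, `hyponyms` and `coordinate_terms` are illustrative names):

```python
# Toy data: word -> its single hypernym.
HYPERNYM = {
    "canary": "bird",
    "eagle": "bird",
    "trout": "fish",
    "shark": "fish",
    "bird": "animal",
    "fish": "animal",
}


def hyponyms(word):
    """Direct hyponyms: every word whose hypernym is `word`."""
    return sorted(w for w, parent in HYPERNYM.items() if parent == word)


def coordinate_terms(word):
    """Sister terms: other words that share the same hypernym."""
    parent = HYPERNYM.get(word)
    if parent is None:  # the root has no sisters
        return []
    return [w for w in hyponyms(parent) if w != word]
```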
Lexical relations: synonymy
• similarity of meaning
  – Leibniz: two expressions are synonymous if the substitution of one for the other never changes the truth value of a sentence in which the substitution is made
• such global synonymy is rare (it would be redundant)
  – synonymy relative to a context: two expressions are synonymous in a linguistic context C if the substitution of one for the other in C does not alter the truth value
  – consequence of defining synonymy in terms of substitutability: words in different syntactic categories cannot be synonyms
Lexical relations: antonymy
• antonym of a word x is sometimes not-x, but not always
  – rich and poor are antonyms
  – but: not rich does not imply poor
  – (because many people consider themselves neither rich nor poor)
• antonymy is a lexical relation between word forms, not a semantic relation between word meanings
  – meanings {rise, ascend} and {fall, descend} are conceptual opposites, but they are not antonyms
  – [rise/fall] and [ascend/descend] are pairs of antonyms
Lexical relations: hyponymy
• hyponymy is a semantic relation between word meanings
  – {maple} is a hyponym of {tree}
• inverse: hypernymy
  – {tree} is a hypernym of {maple}
• also called: subordination/superordination; subset/superset; ISA relation
• test for hyponymy:
  – native speaker must accept sentences built from the frame “An x is a (kind of) y”
• called troponymy when applied to verbs
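The ISA relation and the test frame can be sketched together. In the snippet below the link from tree to plant, the naive article rule and the names `is_kind_of` and `frame` are assumptions made for illustration; the real test relies on native-speaker judgement, which code cannot replace.

```python
# Toy hypernym links: word -> set of direct hypernyms.
HYPERNYMS = {
    "maple": {"tree"},
    "tree": {"plant"},
}


def is_kind_of(x, y):
    """True if y can be reached from x by following hypernym links upward."""
    stack, seen = [x], set()
    while stack:
        node = stack.pop()
        for parent in HYPERNYMS.get(node, ()):
            if parent == y:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


def frame(x, y):
    """Build the test sentence shown to a native speaker."""
    article = "An" if x[0].lower() in "aeiou" else "A"  # naive article choice
    return f"{article} {x} is a (kind of) {y}."
```

For example, `frame("maple", "tree")` produces the sentence to be judged, and `is_kind_of("maple", "plant")` holds because the relation is followed through tree.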
Lexical relations: meronymy
• A concept represented by the synset {x1, x2,...} is a meronym of a concept represented by the synset {y1, y2, ...} if native speakers of English accept sentences constructed from such frames as “A y has an x (as a part)”, “An x is a part of y”.
• inverse relation: holonymy
• HAS-AS-PART
  – part hierarchy
  – part-of is asymmetric and (with caution) transitive
Lexical relations: meronymy
• failures of transitivity caused by different part-whole relations, e.g.
  – A musician has an arm.
  – An orchestra has a musician.
  – but: ? An orchestra has an arm.
• Types of meronymy in WordNet:
  – component [most frequently found]
  – member
  – composition
  – phase process
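One way to apply part-of transitivity "with caution" is to label each link with its type and only chain links of the same type. The sketch below is an assumption-laden illustration: the type assigned to each link and the extra finger/hand links are invented for the example, and `is_part_of` is a hypothetical helper.

```python
# Toy part-of links: (part, whole) -> type of meronymy.
PART_OF = {
    ("finger", "hand"): "component",
    ("hand", "arm"): "component",
    ("arm", "musician"): "component",
    ("musician", "orchestra"): "member",
}


def is_part_of(part, whole, kind=None):
    """Follow part-of links upward; all links in a chain must share one type."""
    for (p, w), k in PART_OF.items():
        if p != part:
            continue
        if kind is not None and k != kind:
            continue  # mixing relation types would break transitivity
        if w == whole or is_part_of(w, whole, k):
            return True
    return False
```

Under these assumptions a finger counts as part of a musician (all component links), but an arm is not part of an orchestra, because the chain would mix a component link with a member link.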
WordNet’s noun hierarchy
• noun hierarchy partitioned into separate hierarchies with unique top hypernyms
• vague abstractions would be semantically empty, e.g. {entity} with immediate hyponyms {object, thing} and {idea}
• {act, action, activity}
• {animal, fauna}
• {artifact}
• {attribute, property}
• {body, corpus}
• {cognition, knowledge}
• {communication}
• {event, happening}
• {feeling, emotion}
• {food}
• {group, collection}
• {location, place}
• {motive}
• {natural object}
• {natural phenomenon}
• {person, human being}
• {plant, flora}
• {possession}
• {process}
• {quantity, amount}
• {relation}
• {shape}
• {state, condition}
• {substance}
• {time}
Nouns in WordNet
• noun hierarchy as lexical inheritance system
  – seldom goes more than ten levels deep
  – the deepest examples usually contain technical levels that are not part of everyday vocabulary
  – shallowest levels are too vague
  – “Inherited hypernym” option shows full hierarchy
[Figure: examples of a deep and a shallow hierarchy]
Nouns in WordNet
• man-made artefacts: sometimes six or seven levels deep
  – roadster → car → motor vehicle → wheeled vehicle → vehicle → conveyance → artefact
• hierarchy of persons: about three or four levels
  – televangelist → evangelist → preacher → clergyman → spiritual leader → person
• As in all thesaurus structures, words can have multiple hypernyms
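The chains above, and the effect of multiple hypernyms, can be sketched by listing every path from a word up to a top node. The two chains are taken from the slide; the knife entry with two hypernyms is a made-up example (not checked against WordNet), and `hypernym_paths` is an illustrative helper rather than a WordNet API.

```python
# word -> list of direct hypernyms (a list, because there may be several).
HYPERNYMS = {
    "roadster": ["car"],
    "car": ["motor vehicle"],
    "motor vehicle": ["wheeled vehicle"],
    "wheeled vehicle": ["vehicle"],
    "vehicle": ["conveyance"],
    "conveyance": ["artefact"],
    "televangelist": ["evangelist"],
    "evangelist": ["preacher"],
    "preacher": ["clergyman"],
    "clergyman": ["spiritual leader"],
    "spiritual leader": ["person"],
    # Hypothetical entries, only to exercise the multiple-hypernym case.
    "knife": ["tool", "weapon"],
    "tool": ["artefact"],
    "weapon": ["artefact"],
}


def hypernym_paths(word):
    """Every path from `word` up to a node that has no hypernym."""
    parents = HYPERNYMS.get(word, [])
    if not parents:
        return [[word]]
    return [[word] + rest for p in parents for rest in hypernym_paths(p)]


def depth(word):
    """Number of links on the longest path to the top."""
    return max(len(path) - 1 for path in hypernym_paths(word))
```

Printing `" → ".join(path)` for each path reproduces the kind of listing an "inherited hypernym" display gives; a word with two hypernyms simply yields two paths.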
WordNets for other languages
• Idea has been widely copied
• Sometimes by “translating” Princeton WordNet
  – Lexical relations in general are universal ...
  – But are they in practice?
  – Are synsets universal?
• EuroWordNet: combining multilingual WordNets to include cross-language equivalence
  – Inherent difficulties, as above
What can WordNet be used for?
• As a lexical resource, an online dictionary, for human use
• Word-sense disambiguation (including homophone correction)
  – neighbouring words will be more closely related to correct sense (desert/dessert ~ camel)
• Document classification
  – What is this text about? Look for recurring hypernyms
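The "look for recurring hypernyms" idea for document classification can be sketched as a simple count. This is a toy illustration with whitespace tokenisation and a hand-made hypernym table; `ancestors` and `topic_counts` are hypothetical names.

```python
from collections import Counter

# Toy data: word -> its single hypernym.
HYPERNYM = {
    "canary": "bird",
    "eagle": "bird",
    "trout": "fish",
    "shark": "fish",
    "bird": "animal",
    "fish": "animal",
}


def ancestors(word):
    """All hypernyms of a word, nearest first."""
    found = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        found.append(word)
    return found


def topic_counts(tokens):
    """Count how often each hypernym is evoked by the tokens of a text."""
    counts = Counter()
    for token in tokens:
        counts.update(ancestors(token))
    return counts
```

For the tokens of "the shark chased the trout past an eagle" the counts are animal 3, fish 2, bird 1. The most general hypernym always scores highest, so a practical version would presumably need to favour more specific nodes; that weighting is left out of this sketch.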
What can WordNet be used for?
• Document retrieval
  – eg looking for texts about sports cars, search for synonyms and hyponyms of sports car
• Open-domain Q/A
  – Searching texts (eg WWW) to answer questions expressed in natural language
  – eg http://uk.ask.com/ [example]
• Textual entailment
  – Answering questions implied by text
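The document-retrieval use mentioned above (searching for synonyms and hyponyms of sports car) amounts to query expansion. A minimal sketch follows; the synonym and hyponym tables are illustrative toy data, not checked against WordNet, and `expand` and `search` are hypothetical helpers using plain substring matching.

```python
# Toy lexical data for the query term (assumed for illustration only).
SYNONYMS = {"sports car": {"sport car"}}
HYPONYMS = {"sports car": {"roadster"}}


def expand(term):
    """The term plus its synonyms and all direct and indirect hyponyms."""
    terms = {term} | SYNONYMS.get(term, set())
    stack = [term]
    while stack:
        for child in HYPONYMS.get(stack.pop(), ()):
            if child not in terms:
                terms.add(child)
                terms |= SYNONYMS.get(child, set())
                stack.append(child)
    return terms


def search(docs, term):
    """Indices of documents that mention any expanded term."""
    wanted = expand(term)
    return [
        i for i, text in enumerate(docs)
        if any(w in text.lower() for w in wanted)
    ]
```

With this data a document that only mentions a roadster is still returned for the query "sports car", which is the point of expanding with hyponyms.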