WordNet and Extended WordNet

23- November-09 1 WordNet and Extended WordNet Sriram Rajaraman

Page 1: WordNet  and Extended WordNet

23-November-09

WordNet and Extended WordNet

Sriram Rajaraman

Page 2: WordNet  and Extended WordNet

23-November-09, University of Texas at Dallas, Erik Jonsson School of Engineering and Computer Science

WordNet and eXtended WordNet

Objective

Introduce the idea of a semantic lexicon ontology, especially WordNet and eXtended WordNet

Page 3: WordNet  and Extended WordNet

Focus

Introduction
WordNet
eXtended WordNet
Summary

Page 4: WordNet  and Extended WordNet

Reference

1. WordNet: http://wordnet.princeton.edu/

2. eXtended WordNet: http://xwn.hlt.utdallas.edu/

3. Christiane Fellbaum, “WordNet: An Electronic Lexical Database”, MIT Press, 1999, c1998.

4. George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine Miller, “Introduction to WordNet: An On-line Lexical Database”, core working paper

5. Rada Mihalcea, Dan I. Moldovan, “eXtended WordNet: Progress Report”, Proceedings of the NAACL Workshop on WordNet and Other Lexical Resources, 2001

6. Sanda M. Harabagiu, George A. Miller, Dan I. Moldovan, “WordNet 2 - A Morphologically and Semantically Enhanced Resource”, SIGLEX 1999

Page 5: WordNet  and Extended WordNet

Focus

Introduction
WordNet
eXtended WordNet
Summary

Page 6: WordNet  and Extended WordNet

Introduction

Traditional Dictionary

What is available:
– spelling
– pronunciation
– inflected and derivative forms
– etymology
– part of speech
– definitions and illustrative uses of alternative senses
– synonyms and antonyms
– special usage notes

Page 7: WordNet  and Extended WordNet

Tree

Ref: http://www.merriam-webster.com/dictionary/Tree

Main Entry: tree
Pronunciation: \ˈtrē\
Function: noun
Etymology: Middle English, from Old English trēow; akin to Old Norse trē tree, Greek drys, Sanskrit dāru wood
Date: before 12th century

a woody perennial plant having a single usually elongate main stem generally with few or no branches on its lower part

Page 8: WordNet  and Extended WordNet

Drawbacks of a traditional dictionary

What is missing:
– It does not say, for example, that trees have roots, or that they consist of cells having cellulose walls, or even that they are living organisms
– The "sense" of the superordinate term, also known as the hypernym (living plant or industrial plant)
– Coordinate terms (bushes, shrubs, ...)
– Hyponyms: types of trees (pine, tropical, deciduous, ...)
– Information assumed to be known to everyone (trees have bark and leaves, they grow from seeds, they make their own food by photosynthesis; this is probably information for an encyclopedia)

Page 9: WordNet  and Extended WordNet

How can we improve?

The missing information is structural: every word points upward to its superordinate (hypernym), but not sideways to its coordinates or downward to its hyponyms.

This restriction stems from alphabetical ordering and from budget and size constraints, all of which can be overcome in an electronic lexical database.
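A minimal sketch of why an electronic database lifts this restriction, assuming a toy hypernym table (the words and links in `HYPERNYM` are illustrative and not copied from WordNet). Once the upward pointers are stored as data, the downward and sideways links can be derived by inverting them:

```python
from collections import defaultdict

# Toy upward pointers (word -> hypernym). Illustrative data only.
HYPERNYM = {
    "tree": "woody plant",
    "shrub": "woody plant",
    "pine": "tree",
    "oak": "tree",
    "woody plant": "plant",
}

# Invert the upward pointers once to obtain the downward pointers.
HYPONYMS = defaultdict(list)
for word, parent in HYPERNYM.items():
    HYPONYMS[parent].append(word)


def hyponyms(word):
    """Words directly below `word`."""
    return sorted(HYPONYMS.get(word, []))


def coordinates(word):
    """Words that share the hypernym of `word` (its siblings)."""
    parent = HYPERNYM.get(word)
    if parent is None:
        return []
    return sorted(w for w in HYPONYMS[parent] if w != word)


print(hyponyms("tree"))     # ['oak', 'pine']
print(coordinates("tree"))  # ['shrub']
```

A printed dictionary would have to repeat this information under every entry; here it is computed from a single table.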

Page 10: WordNet  and Extended WordNet

Focus

Introduction
WordNet
eXtended WordNet
Summary

Page 11: WordNet  and Extended WordNet

What is WordNet?

WordNet is a lexical database for the English language.

WordNet 3.0 has [1]:
– 117,097 nouns (the average noun has 1.23 senses)
– 11,488 verbs (the average verb has 2.16 senses)
– 22,141 adjectives
– 4,601 adverbs

Created and maintained at the Cognitive Science Laboratory of Princeton University

Accessible online at http://wordnetweb.princeton.edu/perl/webwn (also downloadable)

Interfaces available in C, .NET, Java, Perl, PHP, Python, SQL, etc. (JWNL, WordNet.Net, RiTa WordNet, PyWordNet, ...)

Page 12: WordNet  and Extended WordNet

WordNet Structure

Words are organized as synsets in WordNet

There are four disjoint kinds of synsets, containing either:
– nouns
– verbs
– adjectives
– adverbs

Page 13: WordNet  and Extended WordNet

What is a synset?

Basic unit of WordNet

A group of synonymous words which refer to a common semantic concept

Words may belong to more than one synset; the first sense is the most frequent sense

Words also include collocations ("eye contact", "mix up")

Example

Page 14: WordNet  and Extended WordNet

Synset example

“car” as in:
– {car, auto, automobile, machine, motorcar}
– {car, railcar, railway car, railroad car}

“Chocolate” as in-
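The two "car" synsets above can be modelled with plain sets. This is a minimal sketch under the assumption that a synset is just an unordered group of words; the names `SYNSETS`, `senses`, and `are_synonyms` are made up for illustration and are not part of any WordNet API.

```python
# The two "car" synsets from the slide, kept in sense order.
SYNSETS = [
    frozenset({"car", "auto", "automobile", "machine", "motorcar"}),
    frozenset({"car", "railcar", "railway car", "railroad car"}),
]


def senses(word):
    """Return every synset containing `word`, keeping the listed order."""
    return [s for s in SYNSETS if word in s]


def are_synonyms(a, b):
    """Two words are synonyms in some sense if they share a synset."""
    return any(a in s and b in s for s in SYNSETS)


print(len(senses("car")))                # 2
print(are_synonyms("auto", "motorcar"))  # True
print(are_synonyms("auto", "railcar"))   # False
```

Note that "auto" and "railcar" are both linked to "car", yet they are not synonyms of each other, because each belongs to a different sense.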

Page 15: WordNet  and Extended WordNet

How are synsets related?

A list of pointers is associated with each synset to express the relationships between synsets

WordNet defines 17 relations:
– 10 between synsets
– 5 between word senses
– "gloss" (between a synset and a sentence, i.e. a textual definition for each synset)
– "frame" (between a synset and a verb construction pattern)
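The pointer structure described above could be modelled roughly as follows. This is a sketch only: the class name `Synset`, its fields, and the two glosses are assumptions made for illustration, not WordNet's storage format or its actual definitions.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Synset:
    words: tuple                                   # member word senses
    gloss: str                                     # textual definition
    pointers: list = field(default_factory=list)   # (relation, target) pairs

    def related(self, relation):
        """Targets reachable through one kind of pointer."""
        return [target for rel, target in self.pointers if rel == relation]


# Illustrative glosses, not WordNet's wording.
vehicle = Synset(("motor vehicle",), "a self-propelled wheeled vehicle")
car = Synset(("car", "auto", "automobile"), "a motor vehicle with four wheels")

# Each relation is a labelled pointer; the inverse link is stored explicitly.
car.pointers.append(("hypernym", vehicle))
vehicle.pointers.append(("hyponym", car))

print([s.words[0] for s in car.related("hypernym")])  # ['motor vehicle']
print([s.words[0] for s in vehicle.related("hyponym")])  # ['car']
```

Keeping the relation name next to each pointer lets one list hold every relation type, at the cost of a linear scan per lookup.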

Page 16: WordNet  and Extended WordNet

WordNet relations

Page 17: WordNet  and Extended WordNet

Page 18: WordNet  and Extended WordNet

Applications of WordNet

– Information Extraction
– Information Retrieval
– Question Answering
– Word Sense Disambiguation
– Text Inference
– Coreference, coherence and metonymy
– Knowledge acquisition
– Internet search engines

Page 19: WordNet  and Extended WordNet

Limitations of WordNet

Designed as a semantic lexicon, not a knowledge base

Limited connections between topically related words

Lack of morphological relationships (a special algorithm handles that)

Lack of selectional restrictions

And more... [6]

Page 20: WordNet  and Extended WordNet

Focus

Introduction
WordNet
eXtended WordNet
Summary

Page 21: WordNet  and Extended WordNet

eXtended WordNet [2]

A project at the Human Language Technology Research Institute at The University of Texas at Dallas (http://xwn.hlt.utdallas.edu)

Provides several important enhancements (over WordNet 2.0) intended to remedy the present limitations of WordNet

Current version: eXtended WordNet 2.0 (xwn 2.0-1.1)

Page 22: WordNet  and Extended WordNet

Objective of eXtended WordNet

Exploit the rich information available in synset glosses (a gloss is a textual definition of a synset)

Add semantic and logical enhancements to WordNet

Increase the connectivity among the synsets by at least one order of magnitude

Enable access to a broader context for each concept
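To see how glosses can raise connectivity among synsets, consider the toy lexicon below. This is a hedged sketch: the synset names and glosses are illustrative, and the naive string match in `gloss_links` stands in for the sense disambiguation that a real system would need.

```python
# Toy lexicon: synset name -> gloss. Names and glosses are illustrative.
GLOSSES = {
    "tennis_court": "a court on which tennis is played",
    "court": "a specially marked area for playing a ball game",
    "tennis": "a game played with rackets by two or four players",
    "game": "a contest with rules to determine a winner",
}

# Explicit relations already present (here a single hypernym link).
explicit_links = {("tennis_court", "court")}


def gloss_links(glosses):
    """Link each synset to every other synset whose name occurs in its gloss."""
    links = set()
    for source, gloss in glosses.items():
        tokens = set(gloss.split())
        for target in glosses:
            if target != source and target in tokens:
                links.add((source, target))
    return links


derived = gloss_links(GLOSSES)
all_links = explicit_links | derived
print(sorted(derived))
print(len(explicit_links), "->", len(all_links))  # 1 -> 4
```

Even in this four-entry lexicon the glosses contribute links, such as tennis court to tennis, that the hypernym hierarchy alone does not contain.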

Page 23: WordNet  and Extended WordNet

What does eXtended WordNet do? [5]

Preprocessing and Parsing
– Separation of glosses into definitions and examples, tokenization, and identification of compound words

Word Sense Disambiguation
– All words in a gloss are tagged with appropriate senses and linked to the corresponding synsets

Logical Form Transformation
– Gloss Logical Forms

Topical Relations
– Connections are established between words, based on context/topic
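The preprocessing step can be sketched as below. Assumptions: a gloss stores its definition first, followed by example sentences in double quotes; the example sentence and the compound list are invented for illustration; and `split_gloss` and `tokenize` are hypothetical helpers, not the project's actual code.

```python
import re


def split_gloss(gloss):
    """Split a gloss into its definition and its quoted example sentences."""
    examples = re.findall(r'"([^"]*)"', gloss)
    # The definition is whatever precedes the first quoted example.
    definition = gloss.split('"', 1)[0].strip().rstrip(";").strip()
    return definition, examples


def tokenize(text, compounds=()):
    """Lower-case word tokenizer that joins known two-word compounds with '_'."""
    words = re.findall(r"[A-Za-z]+", text.lower())
    tokens, i = [], 0
    while i < len(words):
        pair = " ".join(words[i:i + 2])
        if pair in compounds:
            tokens.append(pair.replace(" ", "_"))
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens


gloss = 'a court on which tennis is played; "they met at the tennis court"'
definition, examples = split_gloss(gloss)
print(definition)  # a court on which tennis is played
print(examples)    # ['they met at the tennis court']
print(tokenize(examples[0], compounds={"tennis court"}))
# ['they', 'met', 'at', 'the', 'tennis_court']
```

Joining compounds before sense tagging matters because "tennis court" names one concept, whereas "tennis" and "court" taken separately would each be tagged with a sense of their own.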

Page 24: WordNet  and Extended WordNet

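Extended WordNet

Gloss: “Tennis court: A court on which tennis is played.”

[Figure: graph linking the synset “tennis court” to the concepts in its gloss. Nodes: court, play, tennis ({“tennis”, “lawn tennis”}). Edge labels: def, location-of, object.]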

Page 25: WordNet  and Extended WordNet

eXtended WordNet format

Consists of four XML files, one for each part of speech:
– Noun
– Verb
– Adjective
– Adverb

The XML tags contain attributes that specify the relationships
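Reading relationships out of tag attributes might look like the sketch below, which uses Python's standard `xml.etree.ElementTree`. The fragment in `SAMPLE` is hypothetical: the element names (`xwn`, `gloss`, `wsd`, `wf`), the attribute names (`synsetID`, `lemma`, `wnsn`), and the ID value are assumptions for illustration and should be checked against the distributed files.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: tag names, attribute names, and the ID are assumed.
SAMPLE = """
<xwn pos="NOUN">
  <gloss synsetID="n0000001">
    <text>a court on which tennis is played</text>
    <wsd>
      <wf pos="NN" lemma="court" wnsn="1">court</wf>
      <wf pos="NN" lemma="tennis" wnsn="1">tennis</wf>
    </wsd>
  </gloss>
</xwn>
"""


def tagged_words(xml_text):
    """Map each gloss ID to the (lemma, sense number) pairs found in it."""
    root = ET.fromstring(xml_text)
    return {
        gloss.get("synsetID"): [
            (wf.get("lemma"), int(wf.get("wnsn"))) for wf in gloss.iter("wf")
        ]
        for gloss in root.iter("gloss")
    }


print(tagged_words(SAMPLE))
# {'n0000001': [('court', 1), ('tennis', 1)]}
```

Each (lemma, sense number) pair identifies a synset, so the attributes are enough to rebuild the gloss-to-synset links without parsing the gloss text again.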

Page 26: WordNet  and Extended WordNet

eXtended WordNet- Applications

Core knowledge base for applications:
– Question Answering
– Information Retrieval
– Information Extraction
– Summarization
– Natural Language Generation
– Inferences
– Other knowledge-intensive applications

Page 27: WordNet  and Extended WordNet

Focus

Introduction
WordNet
eXtended WordNet
Summary

Page 28: WordNet  and Extended WordNet

Further Reading

W3C RDF/OWL Representation of WordNet: http://www.w3.org/TR/wordnet-rdf/

eXtended WordNet format/algorithm: http://xwn.hlt.utdallas.edu/wsd.html

Current research at Princeton: http://wordnet.cs.princeton.edu/projects.html

Related projects (APIs, web interfaces, extensions): http://wordnet.princeton.edu/wordnet/related-projects/

Page 29: WordNet  and Extended WordNet

Back up