WordNet® and its Java API: Introduction to WordNet; WordNet API for Java. Name: Hao Li, Uni: hl2489

Post on 29-Dec-2015


  • WordNet and its Java API

    Introduction to WordNet

    WordNet API for Java

    Name: Hao Li Uni: hl2489

  • Introduction to WordNet

    1. WordNet is a large lexical database of English, similar in some ways to a dictionary. It was developed by the Cognitive Science Laboratory of Princeton University.

    2. In WordNet, nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.

    3. Synsets are interlinked by means of conceptual-semantic and lexical relations.

    4. WordNet is freely and publicly available for download, and APIs exist for several programming languages. Its structure makes it a useful tool for computational linguistics and natural language processing.

  • WordNet API for Java (1)

    Method summary of class WordNetDatabase:

    abstract String[] getBaseFormCandidates(String inflection, SynsetType type)
    Returns lemmas representing word forms that might be present in WordNet.

    static WordNetDatabase getFileInstance()
    Returns an implementation of this class that can access the WordNet database by searching files on the local file system.

    Synset[] getSynsets(String wordForm)
    Returns all synsets that contain the specified word form or a morphological variation of that word form.

    Synset[] getSynsets(String wordForm, SynsetType type)
    Returns only the synsets of a particular type (e.g., noun) that contain a word form or a morphological variation of that form.

    abstract Synset[] getSynsets(String wordForm, SynsetType type, boolean useMorphology)
    Returns only the synsets of a particular type (e.g., noun) that contain a word form matching the specified text or one of that word form's variants.

  • WordNet API for Java (2)

    Method summary of class Synset:

    WordSense[] getAntonyms(String wordForm)
    Returns the antonyms (words with the opposite meaning), if any, associated with a word form in this synset.

    String getDefinition()
    Retrieves a short description / definition of this concept.

    WordSense[] getDerivationallyRelatedForms(String wordForm)
    Returns word forms that are derivationally related to the one specified.

    int getTagCount(String wordForm)
    Returns a number that is intended to approximate how frequently the specified word form is used to represent this meaning, relative to how often it is used to represent other meanings.

    SynsetType getType()
    Retrieves the type of synset this object represents.

    String[] getUsageExamples()
    Retrieves sentences showing examples of how this synset is used.

    String[] getWordForms()
    Retrieves the word forms.

  • Method used in the project (1)

    WordNetDatabase.getSynsets(String wordForm, SynsetType type)

    Taking the word "pig" as an example:

    Synset[0]=Noun@2395406[hog,pig,grunter,squealer,Sus scrofa] - domestic swine

    Synset[1]=Noun@10612210[slob,sloven,pig,slovenly person] - a coarse obnoxious person

    Synset[2]=Noun@10179649[hog,pig] - a person regarded as greedy and pig-like

    Synset[3]=Noun@9879144[bull,cop,copper,fuzz,pig] - uncomplimentary terms for a policeman

    Synset[4]=Noun@3935116[pig bed,pig] - mold consisting of a bed of sand in which pig iron is cast

    Synset[5]=Noun@3934998[pig] - a crude block of metal (lead or iron) poured from a smelting furnace
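    The lookup above can be imitated without the WordNet files. The sketch below is not the WordNet API itself: MiniSynset, MiniLexicon and MiniLexiconDemo are hypothetical stand-ins that hold three of the senses listed above and only mirror the shape of getSynsets(wordForm, type) and getDefinition(). It matches word forms exactly and, unlike the real method, does no morphological processing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Demo entry point (hypothetical name), declared first so the file can be run directly.
class MiniLexiconDemo {
    public static void main(String[] args) {
        MiniLexicon lexicon = new MiniLexicon();
        List<MiniSynset> senses = lexicon.getSynsets("pig", "NOUN");
        for (int i = 0; i < senses.size(); i++) {
            MiniSynset s = senses.get(i);
            System.out.println("Synset[" + i + "]=" + s.wordForms + " - " + s.getDefinition());
        }
    }
}

// Hypothetical stand-in for a synset; NOT the real Synset class.
final class MiniSynset {
    final String type;            // part of speech, e.g. "NOUN"
    final List<String> wordForms; // words that share this meaning
    final String definition;

    MiniSynset(String type, String definition, String... wordForms) {
        this.type = type;
        this.definition = definition;
        this.wordForms = Arrays.asList(wordForms);
    }

    String getDefinition() {
        return definition;
    }
}

// Hypothetical in-memory lexicon with a few senses copied from the listing above.
final class MiniLexicon {
    private final List<MiniSynset> synsets = Arrays.asList(
        new MiniSynset("NOUN", "domestic swine",
            "hog", "pig", "grunter", "squealer", "Sus scrofa"),
        new MiniSynset("NOUN", "a coarse obnoxious person",
            "slob", "sloven", "pig", "slovenly person"),
        new MiniSynset("NOUN", "uncomplimentary terms for a policeman",
            "bull", "cop", "copper", "fuzz", "pig"));

    // Mirrors the shape of getSynsets(wordForm, type): exact match only.
    List<MiniSynset> getSynsets(String wordForm, String type) {
        List<MiniSynset> result = new ArrayList<>();
        for (MiniSynset s : synsets) {
            if (s.type.equals(type) && s.wordForms.contains(wordForm)) {
                result.add(s);
            }
        }
        return result;
    }
}
```

    The point of the sketch is that one word form ("pig") maps to several synsets, and each synset carries its own definition and its own list of synonyms.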

  • Method used in the project (2)

    Synset.getDefinition()

    Taking Synset[0] of the word "pig" as an example:

    domestic swine

  • Method used in the project (3)

    Synset.getTagCount(String wordForm)

    This is a very useful method. It approximates how frequently the specified word form is used to represent this meaning, relative to how often it is used to represent other meanings.

    As I understand it, the method has two uses:

    (1) Analysing the different synsets of the same word.

    (2) Analysing the different words of the same synset.

  • Analysing the different synsets of the same word

    Synset.getTagCount(String wordForm)

    The result shows which meaning of the word is used more frequently.

    For example, the frequency of the word "bridge" in the following synset is 4:

    Synset[0]=Noun@2898711[bridge,span] - a structure that allows people or vehicles to cross an obstacle such as a river or canal or railway etc.

    In another synset of "bridge" it is 1:

    Synset[4]=Noun@490569[bridge] - any of various card games based on whist for four players

    This suggests that when people talk about a bridge, they are more likely to mean the structure than the card game.
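    The comparison in this slide amounts to picking the sense with the highest tag count. The sketch below assumes the counts have already been collected (for instance, one getTagCount call per synset) into a map. The class DominantSense, the method mostFrequentSense and the shortened sense labels are hypothetical names introduced for illustration; only the counts 4 and 1 come from the example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of any WordNet API.
class DominantSense {
    // Returns the sense label with the highest tag count, or null for an empty map.
    // On a tie, the sense that appears first in the map's iteration order wins.
    static String mostFrequentSense(Map<String, Integer> tagCountBySense) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : tagCountBySense.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Tag counts for two senses of "bridge", taken from the slide.
        Map<String, Integer> bridge = new LinkedHashMap<>();
        bridge.put("structure crossing an obstacle", 4);
        bridge.put("card game based on whist", 1);
        System.out.println(mostFrequentSense(bridge));
    }
}
```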

  • Analysing the different words of the same synset

    Synset.getTagCount(String wordForm)

    The result shows which word expresses a given definition most precisely, with the least risk of word sense ambiguity.

    For example, take this synset of the word "java":

    Synset=Noun@7929519[coffee,java] - a beverage consisting of an infusion of ground coffee beans

    The frequency of the word "coffee" is 46 and that of the word "java" is 1.

    This means "coffee" is more representative of the meaning "a beverage consisting of an infusion of ground coffee beans" than "java" is. When people talk about coffee, you will understand that they mean the beverage and not some other meaning. When people talk about java, they may mean either the beverage or the programming language Java.
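    One way to make this comparison concrete is to turn the raw tag counts into each word form's share of the synset total. This is only an illustrative sketch of the reasoning on this slide, not a measure defined by WordNet. The class WordFormShare and the method shares are hypothetical names, and the counts 46 and 1 are the ones quoted above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of any WordNet API.
class WordFormShare {
    // Converts raw tag counts into each word form's share of the synset total.
    // If every count is zero, all shares are reported as 0.0.
    static Map<String, Double> shares(Map<String, Integer> tagCountByWordForm) {
        int total = 0;
        for (int count : tagCountByWordForm.values()) {
            total += count;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : tagCountByWordForm.entrySet()) {
            double share = (total == 0) ? 0.0 : (double) e.getValue() / total;
            result.put(e.getKey(), share);
        }
        return result;
    }

    public static void main(String[] args) {
        // Tag counts for the [coffee, java] synset, taken from the slide.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("coffee", 46);
        counts.put("java", 1);
        for (Map.Entry<String, Double> e : shares(counts).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

    With these counts, "coffee" accounts for 46 of the 47 tagged uses of this sense, which is the imbalance the slide describes.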

  • Conclusion

    1. WordNet has two purposes: one is to produce a combination of dictionary and thesaurus that is more intuitively usable, and the other is to support automatic text analysis and artificial intelligence applications.

    2. Because of its features, WordNet is now widely used in information systems, for tasks including word sense disambiguation, information retrieval, automatic text classification, automatic text summarization, and even automatic crossword puzzle generation.

    And it is also used in our project!

  • I will explain our WordNet-based algorithm in the demo next week.

    Thank you!
