Post on 29-Dec-2015
WordNet and its Java API
Introduction to WordNet
WordNet API for Java
Name: Hao Li Uni: hl2489
Introduction to WordNet
1. WordNet is a large lexical database of English, similar in some ways to a dictionary. It was developed by the Cognitive Science Laboratory of Princeton University.
2. In WordNet, nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.
3. Synsets are interlinked by means of conceptual-semantic and lexical relations.
4. WordNet is freely and publicly available for download, and APIs exist for several programming languages. Its structure makes it a useful tool for computational linguistics and natural language processing.
WordNet API for Java (1)
Method summary of class WordNetDatabase
abstract String[] getBaseFormCandidates(String inflection, SynsetType type)
    Returns lemmas representing word forms that might be present in WordNet.
static WordNetDatabase getFileInstance()
    Returns an implementation of this class that can access the WordNet database by searching files on the local file system.
Synset[] getSynsets(String wordForm)
    Returns all synsets that contain the specified word form or a morphological variation of that word form.
Synset[] getSynsets(String wordForm, SynsetType type)
    Returns only the synsets of a particular type (e.g., noun) that contain a word form or morphological variation of that form.
abstract Synset[] getSynsets(String wordForm, SynsetType type, boolean useMorphology)
    Returns only the synsets of a particular type (e.g., noun) that contain a word form matching the specified text or one of that word form's variants.
WordNet API for Java (2)
Method summary of class Synset
WordSense[] getAntonyms(String wordForm)
    Returns the antonyms (words with the opposite meaning), if any, associated with a word form in this synset.
String getDefinition()
    Retrieves a short description / definition of this concept.
WordSense[] getDerivationallyRelatedForms(String wordForm)
    Returns word forms that are derivationally related to the one specified.
int getTagCount(String wordForm)
    Returns a number that's intended to provide an approximation of how frequently the specified word form is used to represent this meaning relative to how often it's used to represent other meanings.
SynsetType getType()
    Retrieves the type of synset this object represents.
String[] getUsageExamples()
    Retrieves sentences showing examples of how this synset is used.
String[] getWordForms()
    Retrieves the word forms.
Method used in the project (1)
WordNetDatabase.getSynsets(String wordForm, SynsetType type)
Taking the word "pig" as an example:
Synset=Noun@2395406[hog,pig,grunter,squealer,Sus scrofa] - domestic swine
Synset=Noun@10612210[slob,sloven,pig,slovenly person] - a coarse obnoxious person
Synset=Noun@10179649[hog,pig] - a person regarded as greedy and pig-like
Synset=Noun@9879144[bull,cop,copper,fuzz,pig] - uncomplimentary terms for a policeman
Synset=Noun@3935116[pig bed,pig] - mold consisting of a bed of sand in which pig iron is cast
Synset=Noun@3934998[pig] - a crude block of metal (lead or iron) poured from a smelting furnace
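The printed lines above follow a regular shape: a type, a number after the "@", the word forms in brackets, and the definition after the dash. As a minimal sketch, assuming that shape holds for every line, they can be parsed back into their parts with plain Java. The class SynsetLineParser and its field names are hypothetical helpers for illustration and are not part of the WordNet API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits a printed synset line into its visible parts.
class SynsetLineParser {
    // Matches lines such as "Synset=Noun@2395406[hog,pig] - domestic swine".
    private static final Pattern LINE =
        Pattern.compile("^Synset=(\\w+)@(\\d+)\\[([^\\]]*)\\]\\s*-\\s*(.*)$");

    static final class ParsedSynset {
        final String type;            // e.g. "Noun"
        final long id;                // the number printed after "@"
        final List<String> wordForms; // comma-separated entries inside the brackets
        final String definition;      // text after the dash

        ParsedSynset(String type, long id, List<String> wordForms, String definition) {
            this.type = type;
            this.id = id;
            this.wordForms = wordForms;
            this.definition = definition;
        }
    }

    static ParsedSynset parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized synset line: " + line);
        }
        // Word forms may contain spaces ("Sus scrofa"), so split on commas only.
        List<String> forms = Arrays.asList(m.group(3).split(","));
        return new ParsedSynset(m.group(1), Long.parseLong(m.group(2)), forms, m.group(4));
    }

    public static void main(String[] args) {
        ParsedSynset s = parse(
            "Synset=Noun@2395406[hog,pig,grunter,squealer,Sus scrofa] - domestic swine");
        System.out.println(s.type + " " + s.id + " " + s.wordForms + " : " + s.definition);
    }
}
```

When working with the API directly, the same pieces should be available from the Synset object itself (getType(), getWordForms(), getDefinition()), so parsing is only useful for saved output.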
Method used in the project (2)
Synset.getDefinition()
Taking a synset of the word "pig" as an example:
Method used in the project (3)
Synset.getTagCount(String wordForm)
This is a very useful method. It returns an approximation of how frequently the specified word form is used to represent this meaning, relative to how often it is used to represent other meanings.
As I understand it, this method has two uses:
(1) Comparing the different synsets of the same word.
(2) Comparing the different words of the same synset.
Comparing the different synsets of the same word
Synset.getTagCount(String wordForm)
The results show which meaning of the word is used more frequently.
For example, the tag count of the word "bridge" in the following synset is 4:
Synset=Noun@2898711[bridge,span] - a structure that allows people or vehicles to cross an obstacle such as a river or canal or railway etc.
In another synset of "bridge" it is 1:
Synset=Noun@490569[bridge] - any of various card games based on whist for four players
This means that when people talk about a bridge, they are more likely to mean the structure than the card game.
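Choosing the more frequent meaning from tag counts can be sketched in a few lines of Java. This is an illustration only: the class DominantSense and the short sense labels are hypothetical, and the counts 4 and 1 are hard-coded from the bridge example rather than read from WordNet. In the project they would presumably come from calling getTagCount("bridge") on each synset returned by getSynsets.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: picks the sense of a word with the highest tag count.
class DominantSense {
    // Keys are sense descriptions, values are the tag counts of the word in each sense.
    // Returns null for an empty map; on a tie the first sense encountered wins.
    static String mostFrequentSense(Map<String, Integer> tagCountBySense) {
        String best = null;
        int bestCount = Integer.MIN_VALUE;
        for (Map.Entry<String, Integer> entry : tagCountBySense.entrySet()) {
            if (entry.getValue() > bestCount) {
                bestCount = entry.getValue();
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Tag counts for "bridge" as quoted in the example (assumed, not queried).
        Map<String, Integer> bridge = new LinkedHashMap<>();
        bridge.put("structure crossing an obstacle", 4);
        bridge.put("card game based on whist", 1);
        System.out.println(mostFrequentSense(bridge));
    }
}
```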
Comparing the different words of the same synset
Synset.getTagCount(String wordForm)
The result shows which word expresses a given definition most accurately, without causing word sense ambiguity.
For example, in one synset of the word "java":
Synset=Noun@7929519[coffee,java] - a beverage consisting of an infusion of ground coffee beans
the tag count of the word "coffee" is 46 and that of the word "java" is 1.
This means that "coffee" is more representative than "java" of the meaning "a beverage consisting of an infusion of ground coffee beans". When people talk about coffee, you will understand that they mean the beverage and not some other meaning. When people talk about java, they may mean either the beverage or the programming language Java.
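One possible way to summarize this comparison is to express each word's tag count as a share of the total for the synset. The sketch below is only an illustration: the class SynsetWordShare is hypothetical, the share calculation is an assumption about how the counts might be compared, and the counts 46 and 1 are hard-coded from the example rather than read from WordNet.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: how strongly each word of a synset represents its meaning.
class SynsetWordShare {
    // Returns the word's tag count divided by the sum of tag counts in the synset.
    // Returns 0.0 when the word is absent or when all counts are zero.
    static double share(Map<String, Integer> tagCountByWord, String word) {
        int total = 0;
        for (int count : tagCountByWord.values()) {
            total += count;
        }
        Integer count = tagCountByWord.get(word);
        if (count == null || total == 0) {
            return 0.0;
        }
        return (double) count / total;
    }

    public static void main(String[] args) {
        // Tag counts for the [coffee,java] synset as quoted in the example.
        Map<String, Integer> beverage = new LinkedHashMap<>();
        beverage.put("coffee", 46);
        beverage.put("java", 1);
        for (String word : beverage.keySet()) {
            System.out.println(word + ": " + String.format("%.3f", share(beverage, word)));
        }
    }
}
```

A share close to 1 suggests the word is the usual way to express the concept, while a share close to 0 suggests the word more often carries another meaning.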
1. WordNet has two purposes: one is to produce a combination of dictionary and thesaurus that is more intuitively usable, and the other is to support automatic text analysis and artificial intelligence applications.
2. Because of these features, WordNet is now widely used in information systems, for tasks including word sense disambiguation, information retrieval, automatic text classification, automatic text summarization, and even automatic crossword puzzle generation.
And it is also used in our project!
I will explain what our WordNet-based algorithm is in the demo next week.