Natural Language Toolkit

Examples taken from: nltk.sourceforge.net/tutorial/introduction/index.html

Overview

• The NLTK is a set of Python modules to carry out many common natural language tasks.
• Access it at nltk.sourceforge.net
• There are versions for Windows, OS X, Unix, and Linux. Detailed instructions are on the Installation tab.
• In addition to the toolkit you will need two other modules: tkinter and Numeric. We haven’t been able to get Numeric to install smoothly with Python 2.4 under Windows, only with 2.3.
• You also want the contrib and data packages.
• Pay attention to what INSTALL.TXT in the data package says about the NLTK_CORPORA path.
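
A quick way to check the corpora path is to ask Python what it sees. The sketch below assumes NLTK_CORPORA is read as an environment variable (INSTALL.TXT remains the authority on how it should be set). The helper name corpora_path_status is made up for this illustration and is not part of NLTK. It is written for a current Python 3 rather than the 2.3 mentioned above.

```python
import os

def corpora_path_status(environ=None):
    """Return (path, is_dir) for NLTK_CORPORA; path is None if unset.

    Assumption: NLTK_CORPORA is an environment variable. This helper is
    illustrative only and is not part of NLTK.
    """
    env = os.environ if environ is None else environ
    path = env.get("NLTK_CORPORA")
    if path is None:
        return None, False
    return path, os.path.isdir(path)

path, ok = corpora_path_status()
if path is None:
    print("NLTK_CORPORA is not set")
elif ok:
    print("NLTK_CORPORA points at an existing directory: " + path)
else:
    print("NLTK_CORPORA is set, but no such directory: " + path)
```

If the last message appears, the path is probably mistyped or the data package has not been unpacked where the variable points.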

Accessing NLTK

• Standard Python import command:

>>> from nltk.corpus import gutenberg
>>> gutenberg.items()
['austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt', 'bible-kjv.txt', 'blake-poems.txt', 'blake-songs.txt', 'chesterton-ball.txt', 'chesterton-brown.txt', 'chesterton-thursday.txt', 'milton-paradise.txt', 'shakespeare-caesar.txt', 'shakespeare-hamlet.txt', 'shakespeare-macbeth.txt', 'whitman-leaves.txt']

• Or:

>>> import nltk.corpus
>>> nltk.corpus.gutenberg.items()
['austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt', 'bible-kjv.txt', 'blake-poems.txt', 'blake-songs.txt', 'chesterton-ball.txt', 'chesterton-brown.txt', 'chesterton-thursday.txt', 'milton-paradise.txt', 'shakespeare-caesar.txt', 'shakespeare-hamlet.txt', 'shakespeare-macbeth.txt', 'whitman-leaves.txt']
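
The two import styles differ only in how the name is reached afterwards, not in what gets loaded. Here is a minimal sketch of that point, using the standard library's os.path in place of nltk.corpus so that it runs without NLTK installed. The file path in it is a made-up example.

```python
# Style 1: bind the submodule's name directly in the current namespace.
from os import path

# Style 2: import the package and reach the submodule by its dotted name.
import os.path

# Both names refer to the same module object.
print(path is os.path)

# So a call through either name gives the same result.
example = "corpora/austen-emma.txt"  # hypothetical path, for illustration
print(path.basename(example) == os.path.basename(example))
```

The first style saves typing in an interactive session. The second keeps it obvious where each name comes from.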

Modules

• The NLTK modules include:
– token: classes for representing and processing individual elements of text, such as words and sentences
– probability: classes for representing and processing probabilistic information
– tree: classes for representing and processing hierarchical information over text
– cfg: classes for representing and processing context free grammars
– fsa: finite state automata
– tagger: tagging each word with a part-of-speech, a sense, etc.
– parser: building trees over text (includes chart, chunk and probabilistic parsers)
– classifier: classify text into categories (includes feature, featureSelection, maxent, naivebayes)
– draw: visualize NLP structures and processes
– corpus: access (tagged) corpus data
• We will cover some of these explicitly as we reach the relevant topics.

One Simple Example

IDLE 1.0.3
>>> from nltk.tokenizer import *
>>> text_token = Token(TEXT='Hello world. This is a test file.')
>>> print text_token
<Hello world. This is a test file.>
>>> WhitespaceTokenizer(SUBTOKENS='WORDS').tokenize(text_token)
>>> print text_token
<[<Hello>, <world.>, <This>, <is>, <a>, <test>, <file.>]>
>>> print text_token['TEXT']
Hello world. This is a test file.
>>> print text_token['WORDS']
[<Hello>, <world.>, <This>, <is>, <a>, <test>, <file.>]
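
To see the idea behind the whitespace tokenizer without the Token machinery, here is a sketch in plain Python using the built-in str.split(). It illustrates splitting on whitespace and is not NLTK's implementation. The variable names are chosen for this example, and the code is written for a current Python 3.

```python
text = 'Hello world. This is a test file.'

# str.split() with no arguments splits on runs of whitespace.
words = text.split()

print(words)
# ['Hello', 'world.', 'This', 'is', 'a', 'test', 'file.']
```

As in the session above, punctuation stays attached to the neighbouring word ('world.', 'file.'), because splitting on whitespace knows nothing about sentence boundaries.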

LAB

• Detailed documentation and tutorials are available under the Documentation tab at the SourceForge site.
• Work through the “gentle introduction” and “elementary language processing” tutorials on the NLTK site: nltk.sourceforge.net/tutorial/introduction/index.html