practical nlp and information extraction with gate · practical nlp and information extraction with...

67
Practical NLP and Information Extraction with GATE Dr. Diana Maynard University of Sheffield, UK

Upload: others

Post on 17-Jul-2020

17 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

PracticalNLPandInformationExtractionwithGATE

Dr. DianaMaynardUniversityofSheffield,UK

Page 2: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Whatistextmining?

• TextMiningisthediscoveryofnew,previouslyunknowninformation,byautomaticallyextractinginformationfromdifferenttextualresources.

• Akeyelementisthelinkingtogetheroftheextractedinformationtogethertoformnewfactsornewhypothesestobeexploredfurther

• Textminingletsyouinvestigatewhat’sactuallyinadocumentoracollectionofdocuments

• Itletsusanswerwh-questions:who,what,why,when,how,where,which?

Page 3: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

TextMiningisnotDataMining

Examples:• using consumer purchasing

patterns to predict which products to place close together on shelves in supermarkets

• analysing spending patterns on credit cards to detect fraudulent card use.

• Data mining is about using analytical techniques to find interesting patterns from large structured databases

Page 4: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

TextMiningisnotWebSearch

• Text mining is also different from traditional web search.• In search, the user is typically looking for something that is

already known and has been written by someone else.• The problem lies in sifting through all the material that currently

isn't relevant to your needs, in order to find the information that is.• The solution often lies in better ways to ask the right question• You can't ask Google to tell you:

• How does the language used by Donald Trump differ from the language used by Hilary Clinton?

• In which parts of the country did people talk more about the environment during the UK elections?

• Which female MPs talked in the last 6 months about British hospitals with more than 100 deaths per month since 2010?

Page 5: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Basictextprocessing

His MMSE was 23/30 on 15 January 2008.

(MMSE=MiniMentalStateExamination)

Page 6: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Characteroffsets

His MMSE was 23/30 on 15 January 2008.

0....5....10...15...|....|....|....|....

Page 7: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Sentences

His MMSE was 23/30 on 15 January 2008.

0....5....10...15...|....|....|....|....

Page 8: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Tokens

His MMSE was 23/30 on 15 January 2008.

0....5....10...15...|....|....|....|....

Page 9: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Partofspeechcategories

His MMSE was 23/30 on 15 January 2008.0....5....10...15...|....|....|....|....

PP NN VB CD CD PR CD NN CD

Page 10: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

MorphologicalAnalysis

VB

His MMSE was 23/30 on 15 January 2008.0....5....10...15...|....|....|....|....

PP NN CD CD PR CD NN CDbe

Page 11: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Knowledgeengineering:findingpatterns

His MMSE was 23/30 on 15 January 2008.0....5....10...15...|....|....|....|....

Month

PP NN VB CD CD PR CD NN CDbe

Page 12: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Knowledgeengineering

His MMSE was 23/30 on 15 January 2008.0....5....10...15...|....|....|....|....

MMSE Month

PP NN VB CD CD PR CD NN CDbe

Page 13: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Knowledgeengineering

His MMSE was 23/30 on 15 January 2008.0....5....10...15...|....|....|....|....

MMSE Month

PP NN VB CD CD PR CD NN CDbe

{number}{Month}{number}

Page 14: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Knowledgeengineering

His MMSE was 23/30 on 15 January 2008.0....5....10...15...|....|....|....|....

MMSE Month

PP NN VB CD CD PR CD NN CDbe

Date

Page 15: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Knowledgeengineering

His MMSE was 23/30 on 15 January 2008.0....5....10...15...|....|....|....|....

MMSE Month

PP NN VB CD CD PR CD NN CDbe

{number}{slash}{number} Date

Page 16: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Knowledgeengineering

His MMSE was 23/30 on 15 January 2008.0....5....10...15...|....|....|....|....

MMSE Month

Score Date

PP NN VB CD CD PR CD NN CDbe

Page 17: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Knowledgeengineering

His MMSE was 23/30 on 15 January 2008.0....5....10...15...|....|....|....|....

MMSE Month

Score Date

PP NN VB CD CD PR CD NN CDbe

{MMSE}{BE}{Score}{?}{Date}

Page 18: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Knowledgeengineering

MMSE Month

Score Date

MMSE with score and date

His MMSE was 23/30 on 15 January 2008.0....5....10...15...|....|....|....|....

PP NN VB CD CD PR CD NN CDbe

Page 19: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Typical Information Extraction pipeline

• Pre-processing (tokenisation, sentence splitting, morphological analysis, POS tagging)

• Entity finding (gazetteer lookup, NE grammars)• Co-reference (alias finding, orthographic co-

reference etc.)• Export the results somewhere (database / XML

/ ontology)

Page 20: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Example of Information Extraction

John lives in London . He works there for Polar Bear Design .

Page 21: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Basic Named Entity Recognition

John lives in London . He works there for Polar Bear Design .

PER LOC ORG

Page 22: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

same_as

John lives in London . He works there for Polar Bear Design .

Co-reference

PER LOC ORG

Page 23: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

John lives in London . He works there for Polar Bear Design .

Relations

PER LOC ORG

live_in

Page 24: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

John lives in London . He works there for Polar Bear Design .

Relations (2)

PER LOC ORG

employee_of

Page 25: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

John lives in London . He works there for Polar Bear Design .

Relations (3)

PER LOC ORG

based_in

Page 26: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

WhatisEventRecognition?

l Aneventisanactionorsituationrelevanttothedomainexpressedbysomerelationbetweenentitiesorterms.

l Itisalwaysgroundedintime,e.g.theperformanceofaband,anelection,thedeathofaperson

Mitt Romney, the favorite to win the Republican nomination for president in 2012

Event DatePerson

Relation Relation

Page 27: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

WhyareEntitiesandEventsUseful?

l Theycanhelpanswerthe“Big5”journalismquestions(who,what,when,where,why)

l Theycanbeusedtocategorisethetextsindifferentwaysl lookatalltextsaboutDonaldTrumpl Theycanbeusedastargetsforopinionminingl findoutwhatpeoplethinkaboutDonaldTrump

l Whenlinkedtoanontologyand/orcombinedwithotherinformation,theycanbeusedforreasoningaboutthingsnotexplicitinthetextl seeinghowopinionsaboutdifferentAmericanpresidents

havechangedovertheyears

Page 28: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

GATE: General Architecture for Text

Engineering

Page 29: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

WhyGATE?

• GATEisthemostwidelyusedopensourcetoolkitforNLPintheworld• We’reusingitbecauseit’sagreatwaytoshowcaseallthecoreNLP

componentsthatareusedfortextanalysistasks• YoucanplaywithallthetoolsinGATEandtryoutthingsforyourselftosee

howitworks• Experts

• DevelopedattheUniversityofSheffieldsince2000(initscurrentform)• ThepersonwhohasledthedevelopmentoftheNLPtoolsinGATEsince2000

istheonepresentingtoyounowJ

• Andbytheway,justbecauseit’solddoesn’tmeanit’soutofdate.GATEisinconstantdevelopmentwithnewtechnologiesbeingconstantlyadded.

Page 30: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Aboutthistutorial

• ThistutorialwillgetyoustartedwiththeGATEgraphicaluserinterface(GUI),alsoknownas“GATEDeveloper”

• EverythingyoudointheGUIcanbedoneviatheAPI,butit’seasiertoseewhat’sgoingonintheGUI

• Itwillbeahands-onsession.YoucantrythingsoutinGATEasthetopicsarepresented.

• Thingssuggestedforyoutotryyourselfarein red.• DownloadandinstallGATE8.5.1(ifyouhaven’talready)from

http://gate.ac.uk/download• StartGATEonyourcomputer(ifyouhaven'talready)bydoubleclickingthe

icon

Page 31: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

GATEGUI

Resources Pane

Menu Bar

Shortcut Buttons

ResourceFeatures

Messages

DisplayPane

Page 32: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Resources

• MostthingsyouusewithinGATEare“resources”:• Languageresources (LRs)aredocuments,documentcollections,

ontologies...• Acollectionofdocumentsisknownasacorpus

• Processingresources (PRs)areprogramsthatoperateontextwithinthedocuments,andoftencreateormodifyannotations

• Datastores areforstoringdocumentsandcorporaforlateruse• Applications (“pipelines”)aresequencesofprocessingresourcesthat

runononeormoredocuments

Page 33: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

DisplayingResources

• WhenyoufirstopenGATE,thedisplaypanewillshowmessagesfromthesysteminthe“Messages”tab

• Thedisplaypanedisplayswhateverelementsyouarecurrentlyworkingwith,e.g.anapplication,adocumentoraprocessingresource,eachinitsowntab

• Doubleclickingonaresourceintheresourcespanewilldisplayit• Tabsalongthetopofthedisplaypaneallowyoutochoosewhichof

theopenresourcestodisplay

Page 34: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

CreateNewDocument

• FromtheResourcesPane,rightclick“LanguageResources”→New→GATEDocument

• Ignoretheparametersettingsthatwillbedisplayed• ClickOK• “GATEDocument_<id>”willnowbeaddedto“LanguageResources”• Doubleclickthatdocumentname• Atabisopenedinthedisplaypane,showingtheemptydocument.• Nowentersometextthere.

Page 35: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

EmptyDocument

DocumentTab

DocumentEditor

DocumentName

DocumentEditor Buttons

DocumentResource Views

Page 36: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

DocumentEditor

• TheDocumentEditorisshownasanewTabintheDisplayPane,alongsidetheMessagePane

• TherearebuttonsonthetopoftheEditor,e.g.“AnnotationSets”–wewilllearnaboutthemlater.

• TherearetabsatthebottomoftheDocumentTab:theseshowdifferent“Views”ofthedocument.

• Thesmallpaneinthelowerleftshowsthe“documentfeatures”(optionalinformationassociatedwiththedocumentresourceaskey/valuepairs)

Page 37: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Simpleoperationsonresources

• Rightclickingonthenameofaresourceintheresourcepanegivesaccesstoamenuofactions

• Doubleclickingonthenameofaresourceopensaviewoftheresourceinthedisplaypane(tripleclickingthenamecanbeusedtorename)

• SelectingaresourceinstanceandpressingtheDelete(Mac:Fn+BS)keywillgenerallycloseit

• Youcanalsorightclickandthenselect“Close”

Page 38: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Parameters

• Resourcescanhaveparameterswhichneedtogetspecifiedwhentheresourceiscreated:Initialization(init)Parameters

• Processingresourcescanalsohaveparameterswhichcanbechangedforeachrun:RuntimeParameters

• Init parametersspecifyhowaresourceiscreated,e.g.thelocationofadocumenttoload

• Runtimeparametersconfigurewhataprocessingresourcedoes,e.g.ifsomeprocessingiscase-sensitiveornot.

Page 39: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Loadinganexistingdocument

• GATEcanreadandloaddocumentsinmanyformats:e.g.plaintext,HTML,XML,PDF,Word,CoNLL ,CSV,JSON

• GATEcanloaddocumentsfromfilesandfromURLs• Whenadocumentisloaded,itgetsconvertedtoGATEinternal

formatasdocumenttext+annotations

Page 40: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Loadingadocument

• Toloadadocument:- rightclickonLanguageResources→“New→GATEDocument”OR- Filemenu→ NewLanguageResource→GATEDocument

• UsethesourceURL parametertospecifythedocumenttobeloaded:- typethefilenameorURL,or- clickthefilebrowsericontonavigatetothecorrectdocument

• Loadafilefromyourhands-onmaterials:corpora→news-texts→ft-airlines-27-jul-2001.xml

• Loadawebpage,e.g.http://news.bbc.co.uk

Page 41: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Documentviewer

Documentviewerbuttons

Document

Highlighted tab is the resource currently being viewed

Page 42: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Annotations

• AnnotationsarecentraltoGATE• Annotationsrepresentaspectsofthetextyouwanttoanalyze:

words,sentences,Dates,PersonNames• Annotationsarenamedbytheirtype,e.g.“Person”• Annotationconsistsof

• Annotationtype• startandendoffsets• setoffeatures,eachfeatureisanarbitraryname/valuepair,e.g.

gender=male

Page 43: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

AnnotationSets

• Annotationsaregroupedintosets• Eachsetcancontainanynumberofannotationsofanytype• Youcancreateandorganizeyourannotationsetsasyouwish.• Predefinedsets

• Defaultset(emptyname):cannotbedeleted• “Originalmarkups”:annotationsfromthemarkupsinthefile• “Key”:byconvention,usedforgoldstandardannotations

• Clickthe“AnnotationSets”buttoninthedocumentviewerfortheft-airlinesdocumentyouloaded

Page 44: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

AnnotationSets

Defaultannotationset

Original markupsannotation set

Annotation types

DocumentViewerButtons

Tabs

Page 45: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Viewingannotations

• ClickingontheAnnotationSetsbuttonopensanewpaneontherighthandsideinsidethedocumentview(AnnotationSetsview)

• Default(unnamed)setcontainssomeexamplesofannotations• ClickontheAnnotationSetname(eg Key)todisplaytheannotation

typesbelongingtothatset• YoushouldseetypessuchasLocation,Date,Personetc.• Clickthecheckboxforanannotationtypetoviewallthe

annotationsofthattypeinthedocument

Page 46: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Acloserlookattheannotations

• ClicktheAnnotationsListbuttonfromthemenuabovetheDisplaypane• Tableshowsannotationtype,annotationset,offsets,annotationid,and

features(forallselectedannotations)• Selectarowinthetabletohighlighttheannotationinthetext• TherearealsootherannotationviewspossiblesuchastheAnnotation

StackandCoreference Editor• TrytheAnnotationStackview

Page 47: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Annotations

Date annotation

Annotations table

Page 48: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Editingexistingannotations• SelectanannotationtypefromtheAnnotationSetsviewandhover

overahighlightedannotationinthetext• Apopupwindowdisplaysmoreinformationaboutit:thisisthe

annotationeditor• Clickthedrawingpinsymbolatthetopoftheeditor.Thiswill“pin” the

windowopen(youcanstillmovethewindowaroundonyourscreenifyouwish)

• Tryeditingtheannotation:youcanchangetheannotationtype,featurenamesandvalues,thespanoftheannotation(clickingleftandrightarrowsatthetopofthebox)ordeletetheannotationoritsfeatures(redXs)

• ClosetheannotationeditorbyclickingtheXinthetoprightcorner,thenviewyoureditedannotationintheAnnotationList

Page 49: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Annotationeditor

annotation editorfeature name value

Annotation type

Page 50: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

CreatingaCorpus

• Acorpusisacollectionofdocuments.• Wetendtorunapplicationsonacorpusratherthanonadocument

itself• FirstclosethedocumentsyouhaveloadedinGATE,justsowedon’tget

confused(rightclickandClose)• Nowcreateanewemptycorpus• RightclickonLanguageResources→New→GATECorpus• Youcangivethecorpusaname,orusethedefaultone

Page 51: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

PopulatingaCorpus

• Sometimestherecouldbehundredsofdocumentsinacorpus.• Usingthepopulatefunctionmeansyoucanloadlotsofdocumentsinto

thecorpusinonego• RightclickonthenameofyournewcorpusintheResourcespaneand

select“Populate”• Selectthenameofthedirectorywithyourdocuments(hands-

on/corpora/news-texts)

• Allthedocumentswillbeloadedinonego• Viewadocumentorthecorpusbydoubleclickingonit

Page 52: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

ProcessingResources

• Processingresources(PRs)arethetoolsthatprocessandannotatetext(textprocessingalgorithms).

• An“application”or“pipeline”consistsofanynumberofPRs,runsequentiallyoveracorpusofdocuments

• ANNIEcontainsPRsfortokenisation,sentencesplitting,POStagging,NamedEntityrecognitionetc.

Page 53: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Applications

Page 54: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Here'sonewemadeearlier:ANNIE

• ANNIEisaready-madecollectionofPRsthatperformsInformationExtractiononunstructuredtext.

• ClicktheiconfromthetopGATEmenuORSelectFile→LoadANNIEsystem

• ViewtheANNIEapplicationbydoubleclickingonit• RunANNIEonyourcorpus(selectthecorpusnameandclick“Run

thisapplication”)

Page 55: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Runninganapplication

PRs selected in application (in order of their execution)

Corpus on which the application is executed

Runtime parameters of the selected PR

Execute the application

Page 56: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Viewingtheresults

• WhenamessageappearsinthebottomleftcornerofyourGATEwindowsayingsomethinglike“ANNIErunin1.3seconds”,theapplicationhasfinished.

• Doubleclickonthedocumenttoviewit• ViewtheannotationsbyselectingAnnotationSetsand

clickingonanyAnnotationtypesintheDefault(unnamed)set

• Ifyouwant,youcanviewtheannotationstableorstackviewtoo.

• Rememberthatnotalltheresultswillbeperfect!

Page 57: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Plugins

• ApluginisacollectionofPRs,andotherresourcesbundledtogether.• EverythingneededforIEinANNIEisintheANNIEplugin.• EverythingneededforIEinFrenchisinthelang_french plugin.

• AnapplicationcanusePRsfromoneormoredifferentplugins.• InordertousePRs,youneedtoloadtherelevantplugin(s)• PluginsareloadedviathePluginManager(greenjigsawpieceicon)

Page 58: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Plugins

• ClicktheicononthetopGATEmenutoopenthePluginManager[orgoviaFile →ManageCREOLEPlugins]

Page 59: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Plugins

List of available pluginsResources in the selected pluginLoad the

plugin for this session only

Load the plugin everytime GATE starts

Apply all the settings

Close the plugins manager

Page 60: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Plugins

• Clickonaplugintosee(ontheRHS)thenamesoftheresourcesitcontains.Havealookatafew.

• NowloadtheToolspluginbycheckingtherelevant“LoadNow” boxforit

• Click“ApplyAll” toloadtheplugin• Click“Close”• RightclickonProcessingResourcestoseewhichnewPRsare

nowavailable

Page 61: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

AddinganewPR

• Let'saddaVerbPhraseChunker PRtoANNIE.• First,wehavetoloadthepluginthatcontainsit,andthen

loadthePRintoGATE,beforewecanaddittotheapplication.

• Ifyouwerelookingclosely,you’llhavenoticedthattheToolspluginyoujustloadedcontainstheANNIEVPChunker.

• RightclickonProcessingResourcesandselect“New”→“ANNIEVPChunker”

• Leaveallthedefaultparameterssetandclick“OK”

Page 62: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

AddinganewPR(2)

• NowweneedtoaddthenewPRtotheapplication.• DoubleclickonANNIE.• You'llseetheANNIEVPChunker isinthelistofloadedPRs.This

meansit'savailableinGATE,butisn'tyetcontainedintheapplication.

• Addittotheapplicationbyselectingitandusingtherightarrowtotransferit.

• Nowusetheuparrowtomoveittotherightplaceintheapplication.Itshouldgoafter(below)thePOStaggerbutbefore(above)theNEtransducer.

• Runtheapplicationandviewtheresultsonthedocument.• Youshouldseeanewannotationtype“VG”.

Page 63: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

IEforotherlanguages

• Youcantryoutotherapplicationsintwoways• Loadapluginthatcontainsaready-madeapplication• ClickApplications->Readymadeapplications• Loadsomedocumentsandruntheapplication• Youcanalsojustloadablankdocumentandtypesometextinit• Ifyoudothis,youneedtorightclickonthedocumentandselect

“Newcorpuswiththisdocument”first• Runthenewapplicationonyourcorpus

Page 64: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

NERinFrench

Page 65: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

NERinArabic

Page 66: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Hands-onwithTwitIE

l TwitIE isaversionofANNIEthat’sbeenretrainedfortweetsl LoadtheTWITIEplugin(greenjigsawicon)l Nowright-clickon“Applications”,select“Ready-madeapplications”and

“TwitIE”l Createanewcorpus,nameit“Tweets”l Right-clickonthecorpusandselect“populatefromTwitterJSON”,

selectingthefilehands-on-materials/corpora/energy-tweets.jsonl Onceloaded,doubleclickonTwitIE toopenit,andthenselect“Runthis

application”(makesurethetweetscorpusischosen)l Lookatthedifferentannotationsinthedefaultannotationsetl ToseeTokensinhashtags,usetheAnnotationStackview

Page 67: Practical NLP and Information Extraction with GATE · Practical NLP and Information Extraction with GATE Dr.Diana Maynard University of Sheffield, UK. What is text mining? • Text

Analysing tweets