Survey of Semantic Annotation Platforms

Lawrence Reeve, Hyoil Han. SAC 2005



Page 1: Survey of Semantic     Annotation Platforms

Survey of Semantic Annotation Platforms

Lawrence Reeve
Hyoil Han

SAC 2005

Page 2: Survey of Semantic     Annotation Platforms

Semantic Annotation
  Creating semantic labels within documents for the Semantic Web
  Used to support:
    Advanced searching (e.g. concept)
    Information visualization (using ontology)
    Reasoning about Web resources
  Converting syntactic structures into knowledge structures (human → machine)

Page 3: Survey of Semantic     Annotation Platforms

Semantic Annotation Process

Page 4: Survey of Semantic     Annotation Platforms

Semantic Annotation Concerns
  Scale, Volume
    Existing & new documents on the Web
  Manual annotation
    Expensive: economic, time
    Subject to personal motivation
  Schema complexity
  Storage
    support for multiple ontologies
    within or external to source document?
  Knowledge base refinement
  Access: how are annotations accessed?
    API, custom UI, plug-ins

Page 5: Survey of Semantic     Annotation Platforms

Semantic Annotation Platforms
  Why semantic annotation platforms ('SAPs')?
    Reduces human involvement
    Consistent application of ontologies
    Reduced cost: economic & time
    Scalability
    Multiple ontologies for single document

Page 6: Survey of Semantic     Annotation Platforms

Semantic Annotation Platforms
  Characteristics
    Provide many services, not just annotation
    Storage: ontology, KB, and annotation
    Access APIs (query annotations)
    Integrate information extraction methods
      Support for IE (gazetteers)
    Extensible

Page 7: Survey of Semantic     Annotation Platforms

SAP General Architecture

Page 8: Survey of Semantic     Annotation Platforms

SAP Classification

Page 9: Survey of Semantic     Annotation Platforms

SAP Classification
  Pattern-based
    Pattern discovery
      Iterative learning
        provide initial seed set
        find new entities
        find new patterns
        repeat
    Rules
      Manually define rules to find entities in text
      Simple label matching
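
The iterative learning loop on this slide (seed set, new patterns, new entities, repeat) can be sketched in a few lines. This is a toy illustration only, not the method of any surveyed platform: the function name `bootstrap`, the four-sentence corpus, and the choice of "the single word to the left of an entity" as a pattern are all assumptions made for the example.

```python
import re

def bootstrap(corpus, seeds, rounds=3):
    """Alternate between learning patterns from known entities and
    finding new entities with those patterns."""
    entities = set(seeds)
    patterns = set()
    for _ in range(rounds):
        # Find new patterns: here, the word immediately before a known entity.
        for sentence in corpus:
            for entity in entities:
                m = re.search(r"(\w+) " + re.escape(entity) + r"\b", sentence)
                if m:
                    patterns.add(m.group(1))
        # Find new entities: capitalized words that follow a learned pattern.
        found = set()
        for sentence in corpus:
            for left in patterns:
                regex = r"\b" + re.escape(left) + r" ([A-Z]\w+)"
                for m in re.finditer(regex, sentence):
                    found.add(m.group(1))
        if found <= entities:
            break  # nothing new was discovered, so stop repeating
        entities |= found
    return entities, patterns

# Made-up corpus and seed set, for illustration only.
corpus = [
    "She flew to Paris last week.",
    "He flew to Berlin in May.",
    "They moved from Berlin years ago.",
    "We moved from Madrid recently.",
]
entities, patterns = bootstrap(corpus, seeds={"Paris"})
```

Starting from the single seed "Paris", the first round learns the context "to" and picks up "Berlin"; the second round learns "from" via "Berlin" and picks up "Madrid". One-word contexts are far too permissive for real text, so a real system would need to score or filter patterns before trusting them.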

Page 10: Survey of Semantic     Annotation Platforms

SAP Classification
  Machine-learning based
    Wrapper induction
      LP²
        Uses structural and linguistic information
        Produces tagging & correction rules as output
    Statistical models
      Hidden Markov Model
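
To make the Hidden Markov Model entry concrete, the sketch below tags tokens with the standard Viterbi algorithm. It is a generic textbook illustration, not code from any surveyed platform; the two states, all probabilities, and the `unseen` smoothing value are invented for the example.

```python
def viterbi(tokens, states, start_p, trans_p, emit_p, unseen=1e-6):
    """Return the most probable state sequence for tokens under an HMM."""
    # best[i][s] = (probability of the best path ending in s at token i, previous state)
    best = [{s: (start_p[s] * emit_p[s].get(tokens[0], unseen), None) for s in states}]
    for tok in tokens[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s].get(tok, unseen), p)
                for p in states
            )
        best.append(column)
    # Trace back from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]

# Made-up model parameters, for illustration only.
states = ["PERSON", "OTHER"]
start_p = {"PERSON": 0.3, "OTHER": 0.7}
trans_p = {
    "PERSON": {"PERSON": 0.5, "OTHER": 0.5},
    "OTHER": {"PERSON": 0.2, "OTHER": 0.8},
}
emit_p = {
    "PERSON": {"lawrence": 0.4, "reeve": 0.4},
    "OTHER": {"met": 0.3, "with": 0.3},
}
tags = viterbi(["met", "with", "lawrence", "reeve"], states, start_p, trans_p, emit_p)
# tags == ["OTHER", "OTHER", "PERSON", "PERSON"]
```

In a trained system the start, transition, and emission probabilities would be estimated from annotated text rather than written by hand.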

Page 11: Survey of Semantic     Annotation Platforms

SAP Classification
  Multistrategy
    Combine pattern and machine-learning approaches
    Did not find a platform that implements this approach
    Platform extensibility important for implementation

Page 12: Survey of Semantic     Annotation Platforms

Semantic Annotation Platforms
  Selection
    Idea is to get a representative sample of platforms using various information extraction techniques
    System needed to be a platform offering services, not just an algorithm

Page 13: Survey of Semantic     Annotation Platforms

Semantic Annotation Platforms

Page 14: Survey of Semantic     Annotation Platforms

Language Toolkits
  GATE: language processing system
    Component architecture, SDK, IDE
    ANNIE ('A Nearly-New IE system')
      tokenizer, gazetteer, POS tagger, sentence splitter, etc.
    JAPE: Java Annotation Patterns Engine
      provides regular-expression based pattern/action rules
  Amilcare
    adaptive IE system designed for document annotation
    based on LP²
    uses ANNIE
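
The idea of pattern/action rules, as in the JAPE entry above, can be illustrated in plain Python. This is not JAPE syntax and does not use GATE: JAPE matches over annotations, whereas this sketch matches raw characters with the standard `re` module. The rule set, `make_annotation`, and `apply_rules` are hypothetical names chosen for the example.

```python
import re

def make_annotation(label):
    """Build an action that turns a regex match into an annotation record."""
    def action(match):
        return {"type": label, "start": match.start(), "end": match.end(),
                "text": match.group(0)}
    return action

# Each rule pairs a pattern (left-hand side) with an action (right-hand side).
RULES = [
    (re.compile(r"\b(?:Mr|Ms|Dr)\. [A-Z][a-z]+"), make_annotation("Person")),
    (re.compile(r"\b(?:19|20)\d{2}\b"), make_annotation("Year")),
]

def apply_rules(text, rules=RULES):
    """Run every rule over the text and collect annotations in document order."""
    annotations = []
    for pattern, action in rules:
        for match in pattern.finditer(text):
            annotations.append(action(match))
    return sorted(annotations, key=lambda a: a["start"])

annotations = apply_rules("Dr. Han presented the survey at SAC in 2005.")
# Two annotations: a Person covering "Dr. Han" and a Year covering "2005".
```

Keeping the pattern and the action separate means new entity types can be added by appending a rule rather than changing the matching loop.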

Page 15: Survey of Semantic     Annotation Platforms

KIM (2003)
  ontology, KB, semantic annotation, indexing and retrieval server, front-ends (Web UI, Internet Explorer plug-in)
  KIMO ontology
    250 classes, 100 properties
    80,000 entities from general news corpus in KB (plus >100,000 aliases)
  IE
    Uses GATE, JAPE
    Gazetteers (from KB)

Source: http://www.ontotext.com/kim/SemWebIE.pdf

Page 16: Survey of Semantic     Annotation Platforms

Ont-O-Mat (2002)
  Uses Amilcare
    Wrapper induction (LP²)
  Extensible
    Adapted in 2004 for PANKOW algorithm
      Disambiguation by maximal evidence
      Proper nouns + ontology → linguistic phrases

Source: http://www.aifb.uni-karlsruhe.de/WBS/sha/papers/kcap2001-annotate-sub.pdf
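
The two PANKOW bullets above (combine a proper noun with ontology concepts into linguistic phrases, then choose by maximal evidence) can be sketched as follows. This is a simplified illustration, not the published algorithm: the three phrase templates, the `categorize` function, and the counts in `FAKE_COUNTS` are all assumptions, and the dictionary stands in for whatever evidence source a real system would query.

```python
# Hypothetical templates that combine a proper noun with a concept label.
# The naive "+s" plural is a simplification (it would produce "citys").
PATTERNS = ["{concept}s such as {noun}", "{noun} is a {concept}", "the {noun} {concept}"]

def categorize(noun, concepts, hit_count):
    """Pick the concept whose generated phrases gather the most evidence."""
    evidence = {}
    for concept in concepts:
        phrases = [p.format(noun=noun, concept=concept) for p in PATTERNS]
        evidence[concept] = sum(hit_count(phrase) for phrase in phrases)
    best = max(evidence, key=evidence.get)
    return best, evidence

# Stand-in for a phrase-count source; the numbers are made up for illustration.
FAKE_COUNTS = {
    "rivers such as Nile": 120,
    "Nile is a river": 40,
    "the Nile river": 300,
    "Nile is a city": 2,
}

best, evidence = categorize("Nile", ["river", "city", "hotel"],
                            lambda phrase: FAKE_COUNTS.get(phrase, 0))
# best == "river"; evidence == {"river": 460, "city": 2, "hotel": 0}
```

Passing `hit_count` in as a function keeps the selection logic independent of where the counts come from.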

Page 17: Survey of Semantic     Annotation Platforms

MUSE (2003)
  Pipeline of processing resources (PRs)
    PRs called conditionally based on text attributes
  Makes use of JAPE
    Adaptive rules
  Can link multiple resources together
    Gazetteer + part-of-speech tagger
    Resolve entity ambiguities

Source: http://gate.ac.uk/sale/expertupdate/muse.pdf
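
A pipeline whose processing resources are called conditionally on text attributes, as described for MUSE above, might look like the sketch below. It does not use GATE or reproduce MUSE; the resource functions, the `case` attribute, and the tiny gazetteers are hypothetical and exist only to show the control flow.

```python
def tokenize(doc):
    doc["tokens"] = [t.strip(".,") for t in doc["text"].split()]

def lowercase_gazetteer(doc):
    # Hypothetical resource for text without reliable capitalization.
    names = {"london", "paris"}
    doc["entities"] = [t for t in doc["tokens"] if t.lower() in names]

def cased_gazetteer(doc):
    # Hypothetical resource for conventionally capitalized text.
    names = {"London", "Paris"}
    doc["entities"] = [t for t in doc["tokens"] if t in names]

# Each entry pairs a processing resource with a condition on document attributes.
PIPELINE = [
    (tokenize, lambda doc: True),
    (lowercase_gazetteer, lambda doc: doc["attributes"].get("case") == "lower"),
    (cased_gazetteer, lambda doc: doc["attributes"].get("case") == "mixed"),
]

def run(text, attributes):
    """Run only the resources whose condition holds for this document."""
    doc = {"text": text, "attributes": attributes, "ran": []}
    for resource, condition in PIPELINE:
        if condition(doc):
            resource(doc)
            doc["ran"].append(resource.__name__)
    return doc

email = run("meeting in london tomorrow", {"case": "lower"})
news = run("Talks opened in Paris.", {"case": "mixed"})
# email["ran"] == ["tokenize", "lowercase_gazetteer"]; email["entities"] == ["london"]
# news["ran"] == ["tokenize", "cased_gazetteer"]; news["entities"] == ["Paris"]
```

The same pipeline definition serves both documents; only the attributes decide which gazetteer runs.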

Page 18: Survey of Semantic     Annotation Platforms

SemTag (2003)
  Large-scale annotation
  Annotations separate from source
    "Semantic Label Bureau"
  Uses the TAP taxonomy
  Approach is:
    Find match to label in taxonomy
    Save window before & after match
    Perform disambiguation
  Main contribution is using taxonomy for disambiguation

Source: http://www.almaden.ibm.com/webfountain/resources/semtag.pdf
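
The three-step approach on this slide (match a taxonomy label, save a window around the match, disambiguate) can be sketched as below. The word-overlap scoring is a deliberately simple stand-in and is not SemTag's actual disambiguation algorithm; the `TAXONOMY` contents, the node names, and the window size are invented for the example and have no connection to TAP.

```python
# Hypothetical taxonomy: label -> candidate nodes, each with typical context words.
TAXONOMY = {
    "jaguar": {
        "Animal/Jaguar": {"cat", "jungle", "prey", "species"},
        "Car/Jaguar": {"engine", "drive", "dealer", "model"},
    },
}

def annotate(text, window=3):
    """Return annotations as separate records; the source text is not modified."""
    tokens = [t.strip(".,").lower() for t in text.split()]
    annotations = []
    for i, token in enumerate(tokens):
        # Step 1: find a match to a label in the taxonomy.
        candidates = TAXONOMY.get(token)
        if not candidates:
            continue
        # Step 2: save the window of tokens before and after the match.
        context = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
        # Step 3: disambiguate by picking the node with the largest context overlap.
        node = max(candidates, key=lambda n: len(candidates[n] & context))
        annotations.append({"position": i, "label": token, "node": node})
    return annotations

result = annotate("The dealer let me drive the new Jaguar model yesterday.")
# result == [{"position": 7, "label": "jaguar", "node": "Car/Jaguar"}]
```

Returning position-based records rather than rewriting the text mirrors the slide's point that annotations are kept separate from the source.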

Page 19: Survey of Semantic     Annotation Platforms

Platform Effectiveness

*as reported by platform authors

Page 20: Survey of Semantic     Annotation Platforms

Summary
  Several platforms developed in last several years
  Large implementation effort; many services
  Differentiated by
    IE methods used
    Services provided
  Future
    IE integration will likely improve annotation accuracy
    Extension of existing platforms will allow for quicker research