Survey of Semantic Annotation Platforms
Lawrence Reeve, Hyoil Han
SAC 2005
Semantic Annotation
  - Creating semantic labels within documents for the Semantic Web
  - Used to support:
    - Advanced searching (e.g. concept)
    - Information visualization (using ontology)
    - Reasoning about Web resources
  - Converting syntactic structures into knowledge structures (human to machine)
Semantic Annotation Process
Semantic Annotation Concerns
  - Scale, volume
    - Existing & new documents on the Web
  - Manual annotation
    - Expensive: economic, time
    - Subject to personal motivation
  - Schema complexity
  - Storage
    - Support for multiple ontologies
    - Within or external to source document?
    - Knowledge base refinement
  - Access: how are annotations accessed?
    - API, custom UI, plug-ins
Why semantic annotation platforms ('SAPs')?
  - Reduces human involvement
  - Consistent application of ontologies
  - Reduced cost: economic & time
  - Scalability
  - Multiple ontologies for single document
Semantic Annotation Platforms
  - Characteristics
    - Provide many services, not just annotation
      - Storage: ontology, KB, and annotation
      - Access APIs (query annotations)
    - Integrate information extraction methods
      - Support for IE (gazetteers)
    - Extensible
SAP General Architecture
SAP Classification
SAP Classification
  - Pattern-based
    - Pattern discovery
      - Iterative learning
        - provide initial seed set
        - find new entities
        - find new patterns
        - repeat
    - Rules
      - Manually define rules to find entities in text
      - Simple label matching
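The iterative learning loop above (seed set, new entities, new patterns, repeat) can be sketched roughly as follows. This is only an illustration under strong assumptions: a "pattern" is reduced to the single word preceding a known entity, and the corpus, the seed, and the name `bootstrap` are invented here and do not come from any surveyed platform.

```python
import re

def bootstrap(corpus, seeds, rounds=2):
    """Alternate between learning context patterns and harvesting new entities."""
    entities = set(seeds)
    patterns = set()
    for _ in range(rounds):
        # Find new patterns: here, the word immediately before a known entity.
        for sentence in corpus:
            for entity in entities:
                m = re.search(r"(\w+) " + re.escape(entity) + r"\b", sentence)
                if m:
                    patterns.add(m.group(1))
        # Find new entities: capitalized words that follow a learned pattern.
        for sentence in corpus:
            for left in patterns:
                for m in re.finditer(r"\b" + re.escape(left) + r" ([A-Z]\w+)", sentence):
                    entities.add(m.group(1))
    return entities, patterns

# Toy corpus and seed, invented for illustration.
corpus = [
    "She flew to Paris last week.",
    "He moved to Berlin in May.",
    "They drove from Berlin to Rome.",
]
entities, patterns = bootstrap(corpus, {"Paris"})
print(sorted(entities))  # ['Berlin', 'Paris', 'Rome']
```

Real pattern-discovery systems score and filter patterns between rounds to limit drift; that step is left out of this sketch.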
SAP Classification
  - Machine-learning based
    - Wrapper induction
      - LP²
        - Uses structural and linguistic information
        - Produces tagging & correction rules as output
    - Statistical models
      - Hidden Markov Model
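As a rough illustration of the statistical-model branch, a Hidden Markov Model tagger can be decoded with the Viterbi algorithm. The sketch below assumes a two-state model (`O` for "other", `PERSON`) with hand-picked toy probabilities; all names and numbers are invented for this example and are not taken from any surveyed platform.

```python
def viterbi(observations, states, start_p, trans_p, emit_p, unseen=1e-6):
    """Return the most probable state sequence for the observed tokens."""
    # best[t][s] = (probability of the best path ending in s at step t, previous state)
    best = [{s: (start_p[s] * emit_p[s].get(observations[0], unseen), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s].get(obs, unseen), p)
                for p in states
            )
        best.append(column)
    # Trace back from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))

# Toy parameters, invented for illustration only.
states = ["O", "PERSON"]
start_p = {"O": 0.8, "PERSON": 0.2}
trans_p = {"O": {"O": 0.7, "PERSON": 0.3}, "PERSON": {"O": 0.4, "PERSON": 0.6}}
emit_p = {"O": {"met": 0.4, "yesterday": 0.4}, "PERSON": {"John": 0.5, "Smith": 0.5}}

tags = viterbi(["met", "John", "Smith", "yesterday"], states, start_p, trans_p, emit_p)
print(tags)  # ['O', 'PERSON', 'PERSON', 'O']
```

In practice the probabilities would be estimated from annotated training text, and log-probabilities would be used to avoid underflow on long inputs.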
SAP Classification
  - Multistrategy
    - Combine pattern and machine-learning approaches
    - Did not find a platform that implements this approach
    - Platform extensibility important for implementation
Semantic Annotation Platforms
  - Selection
    - Idea is to get a representative sample of platforms using various information extraction techniques
    - System needed to be a platform offering services, not just an algorithm
Semantic Annotation Platforms
  - Language Toolkits
    - GATE: language processing system
      - Component architecture, SDK, IDE
      - ANNIE ('A Nearly-New IE system')
        - tokenizer, gazetteer, POS tagger, sentence splitter, etc.
      - JAPE: Java Annotation Patterns Engine
        - provides regular-expression based pattern/action rules
    - Amilcare
      - adaptive IE system designed for document annotation
      - based on LP²
      - uses ANNIE
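To make the idea of regular-expression based pattern/action rules concrete, here is a small Python analogue. It is not JAPE syntax and does not use the GATE API; the `Annotation` class, the `RULES` table, and the two example rules are assumptions made up for this sketch.

```python
import re
from dataclasses import dataclass

@dataclass
class Annotation:
    start: int
    end: int
    label: str
    text: str

# Each rule pairs a regular expression (the pattern) with a label to attach (the action).
RULES = [
    (re.compile(r"\b(?:Mr|Ms|Dr)\. [A-Z][a-z]+"), "Person"),
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "Date"),
]

def annotate(text, rules=RULES):
    """Apply every rule to the text and return annotations in document order."""
    annotations = []
    for pattern, label in rules:
        for m in pattern.finditer(text):
            annotations.append(Annotation(m.start(), m.end(), label, m.group()))
    return sorted(annotations, key=lambda a: a.start)

for a in annotate("Dr. Smith presented on 2005-03-13."):
    print(a.label, a.text)
```

JAPE rules match over existing annotations (tokens, gazetteer lookups) rather than raw characters, so this character-level version only conveys the pattern/action pairing.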
KIM (2003)
  - Ontology, KB, semantic annotation, indexing and retrieval server, front-ends (Web UI, Internet Explorer plug-in)
  - KIMO ontology
    - 250 classes, 100 properties
    - 80,000 entities from general news corpus in KB (plus >100,000 aliases)
  - IE
    - Uses GATE, JAPE
    - Gazetteers (from KB)
Source: http://www.ontotext.com/kim/SemWebIE.pdf
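A gazetteer derived from a knowledge base, including aliases, might look roughly like the sketch below. The `KB` entries, identifiers, and function names are hypothetical and only illustrate the lookup idea; they are not KIM's data model or API.

```python
import re

# Hypothetical knowledge-base entries: entity id -> (class, names including aliases).
KB = {
    "kb:IBM": ("Company", ["IBM", "International Business Machines", "Big Blue"]),
    "kb:London": ("City", ["London"]),
}

def build_gazetteer(kb):
    """Map each lower-cased name or alias to (entity id, class)."""
    gazetteer = {}
    for entity_id, (cls, names) in kb.items():
        for name in names:
            gazetteer[name.lower()] = (entity_id, cls)
    return gazetteer

def lookup(text, gazetteer):
    """Return (start, end, entity id, class) for each hit, preferring longer names."""
    hits, taken = [], set()
    lowered = text.lower()
    for name in sorted(gazetteer, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", lowered):
            span = set(range(m.start(), m.end()))
            if span & taken:
                continue  # skip matches that overlap a longer hit
            taken |= span
            hits.append((m.start(), m.end(), *gazetteer[name]))
    return sorted(hits)

hits = lookup("Big Blue opened an office in London.", build_gazetteer(KB))
print(hits)
```

Because aliases resolve to the same entity id, "Big Blue" and "IBM" would both be annotated as the same KB entity.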
Ont-O-Mat (2002)
  - Uses Amilcare
    - Wrapper induction (LP²)
  - Extensible
    - Adapted in 2004 for PANKOW algorithm
      - Disambiguation by maximal evidence
      - Proper nouns + ontology -> linguistic phrases
Source: http://www.aifb.uni-karlsruhe.de/WBS/sha/papers/kcap2001-annotate-sub.pdf
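The two PANKOW bullets above (combining proper nouns with ontology concepts into linguistic phrases, then disambiguating by maximal evidence) can be sketched as follows. The phrase templates, the naive "+s" plural, the `hit_count` callback, and the counts in `FAKE_COUNTS` are all assumptions for illustration; a real system would query a search engine instead of a dictionary.

```python
# Hypothetical phrase templates combining a proper noun with an ontology concept.
TEMPLATES = [
    "{concept}s such as {noun}",   # naive plural, good enough for this toy example
    "{noun} is a {concept}",
    "{noun} and other {concept}s",
]

def make_phrases(noun, concept):
    """Build the linguistic phrases for one (proper noun, concept) pair."""
    return [t.format(noun=noun, concept=concept) for t in TEMPLATES]

def classify(noun, concepts, hit_count):
    """Pick the concept whose phrases gather the most evidence."""
    evidence = {
        c: sum(hit_count(p) for p in make_phrases(noun, c)) for c in concepts
    }
    return max(evidence, key=evidence.get), evidence

# Stand-in for search-engine hit counts; the numbers are invented.
FAKE_COUNTS = {
    "rivers such as Niger": 120,
    "Niger is a river": 40,
    "nations such as Niger": 300,
    "Niger is a nation": 25,
    "Niger and other nations": 90,
}

best, evidence = classify("Niger", ["river", "nation"], lambda p: FAKE_COUNTS.get(p, 0))
print(best, evidence)
```

Passing the counting function as a parameter keeps the sketch runnable offline and makes the evidence source easy to swap.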
MUSE (2003)
  - Pipeline of processing resources (PRs)
    - PRs called conditionally based on text attributes
  - Makes use of JAPE
  - Adaptive rules
    - Can link multiple resources together
    - Gazetteer + part-of-speech tagger
    - Resolve entity ambiguities
Source: http://gate.ac.uk/sale/expertupdate/muse.pdf
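A pipeline whose processing resources run only when a text attribute calls for them could be sketched like this. The attribute tested (all-uppercase text), the three toy resources, and the `CITIES` gazetteer are assumptions made up for the example and do not reflect MUSE's actual resources or the GATE API.

```python
CITIES = {"London", "Paris"}  # hypothetical gazetteer

def tokenize(doc):
    doc["tokens"] = doc["text"].replace(".", "").split()

def normalize_case(doc):
    # Toy case restoration so that gazetteer lookup works on all-caps text.
    doc["tokens"] = [t.title() for t in doc["tokens"]]

def tag_cities(doc):
    doc["entities"] = [t for t in doc["tokens"] if t in CITIES]

# Each entry pairs a condition on the document with the resource to run.
PIPELINE = [
    (lambda doc: True, tokenize),
    (lambda doc: doc["text"].isupper(), normalize_case),
    (lambda doc: True, tag_cities),
]

def run(text, pipeline=PIPELINE):
    """Run each resource whose condition holds and record which ones fired."""
    doc = {"text": text, "applied": []}
    for condition, resource in pipeline:
        if condition(doc):
            resource(doc)
            doc["applied"].append(resource.__name__)
    return doc

print(run("FLOODS HIT LONDON.")["applied"])
print(run("Floods hit London.")["applied"])
```

Both inputs yield the entity "London", but only the all-caps headline triggers the case-normalization resource.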
SemTag (2003)
  - Large-scale annotation
  - Annotations separate from source
    - "Semantic Label Bureau"
  - Uses the TAP taxonomy
  - Approach is:
    - Find match to label in taxonomy
    - Save window before & after match
    - Perform disambiguation
  - Main contribution is using taxonomy for disambiguation
Source: http://www.almaden.ibm.com/webfountain/resources/semtag.pdf
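The three-step approach above (match a taxonomy label, save a window around it, disambiguate) can be sketched as follows. Disambiguation here is a plain word-overlap count swapped in for illustration, not SemTag's own disambiguation algorithm; the `TAXONOMY` contents, the window size, and the function names are likewise assumptions.

```python
import re

# Hypothetical taxonomy: each label maps to candidate nodes with typical context words.
TAXONOMY = {
    "jaguar": {
        "Animal/Jaguar": {"cat", "jungle", "prey", "species"},
        "Car/Jaguar": {"engine", "drive", "dealer", "sedan"},
    },
}

def find_matches(text, taxonomy, window=5):
    """Yield (label, context) where context holds the tokens before and after the match."""
    tokens = re.findall(r"\w+", text.lower())
    for i, token in enumerate(tokens):
        if token in taxonomy:
            before = tokens[max(0, i - window):i]
            after = tokens[i + 1:i + 1 + window]
            yield token, before + after

def disambiguate(label, context, taxonomy):
    """Choose the node sharing the most words with the window; None if nothing overlaps."""
    scores = {node: len(words & set(context)) for node, words in taxonomy[label].items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def tag_text(text, taxonomy=TAXONOMY):
    """Return (label, chosen node) pairs, kept separate from the source text."""
    return [(label, disambiguate(label, ctx, taxonomy))
            for label, ctx in find_matches(text, taxonomy)]

print(tag_text("The dealer let me drive the new Jaguar sedan."))
# [('jaguar', 'Car/Jaguar')]
```

Returning the annotations as a separate list, rather than rewriting the text, mirrors the slide's point that annotations are stored apart from the source.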
Platform Effectiveness
*as reported by platform authors
Summary
  - Several platforms developed in last several years
    - Large implementation effort; many services
  - Differentiated by
    - IE methods used
    - Services provided
  - Future
    - IE integration will likely improve annotation accuracy
    - Extension of existing platforms will allow for quicker research