ASWC’08
Semantically Conceptualizing and Annotating Tables
Stephen Lynn & David W. Embley
Data Extraction Research Group
Department of Computer Science
Brigham Young University
Supported by the
Overview
Context
  WoK: Web of Knowledge
  TANGO: Table ANalysis for Generating Ontologies
  MOGO: Mini-Ontology GeneratOr
Semantic Enrichment via MOGO
  Implementation
  Experimentation
  Enhancements
Challenges & Opportunities
WoK: a Web of Knowledge
TANGO
fleck     velter
          gonsity (ld/gg)   hepth (gd)
burlam    1.2               120
falder    2.3               230
multon    2.5               400

[Diagram: mini-ontology with object sets fleck, velter, gonosity, and hepth, connected by "has" relationship sets with participation constraints 1 and 1:*]

TANGO repeatedly turns raw tables into conceptual mini-ontologies and integrates them into a growing ontology.

[Diagram: Growing Ontology]
MOGO
MOGO generates mini-ontologies from interpreted tables.
MOGO Overview
Table Interpretation
  Yields a canonical table
MOGO
  Canonical Table
  Concept/Value Recognition
  Relationship Discovery
  Constraint Discovery
  Yields a semantically enriched conceptual model
Mini-ontology Integration
  Into a growing ontology
Sample Input
Region and State Information
Location        Population (2000)   Latitude   Longitude
Northeast               2,122,869
  Delaware                817,376         45         -90
  Maine                 1,305,493         44         -93
Northwest               9,690,665
  Oregon                3,559,547         45        -120
  Washington            6,131,118         43        -120

[Figure: canonical table]
Title: Region and State Information
Location
  Northeast
    Delaware
    Maine
  Northwest
    Oregon
    Washington
[Dimension2]
  Population
    2000
  Latitude
  Longitude
Data cells shown: 2,122,869; 817,376; -120
Sample Output
Concept/Value Recognition
Lexical Clues
  Labels as data values
  Data value assignment
Data Frame Clues
  Labels as data values
  Data value assignment
Default
  Recognize concepts and values by syntax and layout
Concepts and Value Assignments
  Location
  Region: Northeast, Northwest
  State: Delaware, Maine, Oregon, Washington
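
The "labels as data values" idea can be illustrated with a small sketch. It is not MOGO's implementation: the rule that the title phrase "Region and State" names the two nesting levels of the Location tree is an assumed lexical clue, and `concept_names_from_title` and `assign_labels_as_values` are hypothetical helper names.

```python
import re

# Location dimension tree from the sample table (nested row labels).
LOCATION_TREE = {
    "Northeast": ["Delaware", "Maine"],
    "Northwest": ["Oregon", "Washington"],
}
TITLE = "Region and State Information"


def concept_names_from_title(title):
    """Assumed lexical clue: 'X and Y ...' in the title names the two tree levels."""
    match = re.match(r"(\w+) and (\w+)\b", title)
    return list(match.groups()) if match else []


def assign_labels_as_values(tree, level_names):
    """Treat the labels at each nesting level as data values of one concept."""
    if len(level_names) != 2:
        raise ValueError("this sketch expects exactly two level names")
    top, leaf = level_names
    assignments = {top: [], leaf: []}
    for parent, children in tree.items():
        assignments[top].append(parent)      # first-level labels
        assignments[leaf].extend(children)   # second-level labels
    return assignments


names = concept_names_from_title(TITLE)
assignments = assign_labels_as_values(LOCATION_TREE, names)
print(assignments)
```

The result groups Northeast and Northwest under Region, and the four state names under State, matching the assignments listed on the slide.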
  Population: 2,122,869; 817,376; 1,305,493; 9,690,665; 3,559,547; 6,131,118
  Latitude: 45, 44, 45, 43
  Longitude: -90, -93, -120, -120
  Year: 2002, 2003
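
A data-frame clue can be sketched as a value recognizer attached to a concept. The example below is an illustration under assumptions, not the tool's actual data-frame library: the single `Year` pattern and the function `classify_labels` are hypothetical. It shows how a column label such as "2000" (from "Population (2000)") can be recognized as a data value of Year rather than as a concept of its own.

```python
import re

# Hypothetical data frames: concept name -> pattern that recognizes its values.
DATA_FRAMES = {
    "Year": re.compile(r"(19|20)\d{2}"),
}


def classify_labels(labels, data_frames):
    """Split labels into concept names and labels that are really data values."""
    concepts, values = [], {}
    for label in labels:
        for concept, pattern in data_frames.items():
            if pattern.fullmatch(label):
                # The label matches a recognizer, so treat it as a value.
                values.setdefault(concept, []).append(label)
                break
        else:
            # No recognizer matched, so keep the label as a concept name.
            concepts.append(label)
    return concepts, values


labels = ["Population", "2000", "Latitude", "Longitude"]
concepts, values = classify_labels(labels, DATA_FRAMES)
print(concepts, values)
```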
Relationship Discovery
  Dimension Tree Mappings
  Lexical Clues
    Generalization/Specialization
    Aggregation
  Data Frames
  Ontology Fragment Merge
Constraint Discovery
  Generalization/Specialization
  Computed Values
  Functional Relationships
  Optional Participation
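
Three of these constraint kinds can be checked directly against the sample table. The following is a minimal sketch assuming a simple row encoding; the tuple layout and the helper names `computed_sums`, `is_functional`, and `is_optional` are illustrative and not taken from MOGO.

```python
# Rows of the sample table:
# (location, parent region or None, population, latitude, longitude)
ROWS = [
    ("Northeast", None, 2_122_869, None, None),
    ("Delaware", "Northeast", 817_376, 45, -90),
    ("Maine", "Northeast", 1_305_493, 44, -93),
    ("Northwest", None, 9_690_665, None, None),
    ("Oregon", "Northwest", 3_559_547, 45, -120),
    ("Washington", "Northwest", 6_131_118, 43, -120),
]


def computed_sums(rows):
    """Computed values: does each region's population equal the sum of its states?"""
    population = {name: pop for name, _, pop, _, _ in rows}
    totals = {}
    for _, parent, pop, _, _ in rows:
        if parent is not None:
            totals[parent] = totals.get(parent, 0) + pop
    return {region: totals[region] == population[region] for region in totals}


def is_functional(pairs):
    """Functional relationship: each left value maps to exactly one right value."""
    seen = {}
    for left, right in pairs:
        if seen.setdefault(left, right) != right:
            return False
    return True


def is_optional(rows, column):
    """Optional participation: some row has no value in the given column."""
    return any(row[column] is None for row in rows)


sums_ok = computed_sums(ROWS)
loc_to_pop = is_functional([(r[0], r[2]) for r in ROWS])
lat_to_loc = is_functional([(r[3], r[0]) for r in ROWS if r[3] is not None])
latitude_optional = is_optional(ROWS, 3)
population_optional = is_optional(ROWS, 2)
```

In the sample data both regional populations are sums of their states' populations, Location functionally determines Population, Latitude does not determine Location (Delaware and Oregon share 45), and Latitude is optional because the region rows have none.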
Validation
Concept/Value Recognition
  Correctly identified concepts
  Missed concepts
  False positives
  Data values assignment
Relationship Discovery
  Valid relationship sets
  Invalid relationship sets
  Missed relationship sets
Constraint Discovery
  Valid constraints
  Invalid constraints
  Missed constraints
                          Precision   Recall   F-measure
Concept Recognition          87%        94%       90%
Relationship Discovery       73%        81%       77%
Constraint Discovery         89%        91%       90%
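$$\text{precision} = \frac{\text{Total\_Correct\_Found}}{\text{Actual\_Correct} + \text{Total\_Incorrect\_Found}}$$

$$\text{recall} = \frac{\text{Total\_Correct\_Found}}{\text{Actual\_Correct}}$$

$$F\text{-measure} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$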
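
As a quick consistency check, the F-measure column of the results table can be recomputed from the reported precision and recall. This sketch uses only the percentages shown on the slide; the function name `f_measure` and the dictionary layout are illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported values from the results table: (precision, recall, F-measure).
REPORTED = {
    "Concept Recognition": (0.87, 0.94, 0.90),
    "Relationship Discovery": (0.73, 0.81, 0.77),
    "Constraint Discovery": (0.89, 0.91, 0.90),
}

recomputed = {task: f_measure(p, r) for task, (p, r, _) in REPORTED.items()}
for task, (_, _, reported_f) in REPORTED.items():
    print(f"{task}: recomputed {recomputed[task]:.2f}, reported {reported_f:.2f}")
```

Rounded to two decimals, the recomputed values agree with the reported 90%, 77%, and 90%.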
Concept Recognition
Counted:
  Correct/Incorrect/Missing Concepts
  Correct/Incorrect/Missing Labels
  Data value assignments
Relationship Discovery
Counted:
  Correct/incorrect/missing relationship sets
  Correct/incorrect/missing aggregations and generalization/specializations
Constraint Discovery
Counted Correct/Incorrect/Missing:
  Generalization/Specialization constraints
  Computed value constraints
  Functional constraints
  Optional constraints
Concept Recognition
Successes
  98% of concepts identified
  Missing label identification
  97% of values assigned to correct concept
Common problems
  Finding an appropriate label
  Duplicate concepts
Relationship Discovery
  Recall of 92% for relationship sets
  Missing aggregations and generalization/specializations (only found in label nesting)
  Unnecessary relationship sets generated (they are computable)
Constraint Discovery
  F-measure of 98% for functional relationship sets
  Computed value discovery
  Functional/non-functional lists in cells
MOGO Contributions
  Tool to generate mini-ontologies
  Accuracy encouraging

                          Precision   Recall   F-measure
Concept Recognition          87%        94%       90%
Relationship Discovery       73%        81%       77%
Constraint Discovery         89%        91%       90%
Opportunities & Challenges: MOGO
Enhancements
  Check for inter-label relationships
  Check for more complex computations
  Check for lists in cells
  …
Wish List
  Data-frame library
    Atomic knowledge components
    Instance recognizers
  Library of molecular components
  Semi-automatic construction of a WordNet-like resource for knowledge components
Summary
MOGO
  Semantic Enrichment
  Encouraging Results
  But More Possible
Broader Implications ~ Vision & Challenges
  TANGO
  WoK
    Web of Data
    Semantic Annotation
    User-friendly Query Answering
www.deg.byu.edu
embley@cs.byu.edu
Opportunities & Challenges: TANGO
Table Interpretation
  Transforming tables to F-logic [Pivk07]
  Layout-independent table representation [Jha08]
  Table interpretation by sibling tables [Tao07]
Semantic Enhancement / Ontology Generation
  Naming unnamed table concepts [Pivk07]
  MOGO [Lynn09]
Semi-automatic Ontology Integration
  Ontology Matching [Euzenat07]
  Ontology-mapping tools [Falconer07]
  Direct and indirect schema mappings for TANGO [Xu06]
Opportunities & Challenges: WoK
Web of Data
  “The Semantic Web is a web of data.” [W3C]
  Upcoming special issue of Journal of Web Semantics “Enabling a Web of Knowledge” [Tao09]
Information Extraction
  Domain-independent IE from web tables [Gatterbauer07]
  Open IE [Banko07]
…
Opportunities & Challenges: WoK
…
Semantic Annotation wrt Ontologies
  Linking Data to Ontologies [Poggi08]
  TISP [Tao07]
  FOCIH [Tao09]
Reasoning & Query Answering
  Description Logics [Baader03]
  NLIDB Community
  AskOntos [Ding06]
  SerFR [Al-Muhammed07]
References
[Al-Muhammed07] Al-Muhammed and Embley, “Ontology-Based Constraint Recognition for Free-Form Service Requests”, Proceedings of the 23rd International Conference on Data Engineering, 2007.
[Baader03] Baader, Calvanese, McGuinness, Nardi and Patel-Schneider, The Description Logic Handbook, Cambridge University Press, 2003.
[Banko07] Banko, Cafarella, Soderland, Broadhead and Etzioni, “Open Information Extraction from the Web”, Proceedings of the International Joint Conference on Artificial Intelligence, 2007.
[Ding06] Ding, Embley and Liddle, “Automatic Creation and Simplified Querying of Semantic Web Content: An Approach Based on Information-Extraction Ontologies”, Proceedings of the First Asian Semantic Web Conference, 2006.
[Euzenat07] Euzenat and Shvaiko, Ontology Matching, Springer Verlag, 2007.
[Falconer07] Falconer, Noy and Storey, “Ontology Mapping - A User Survey”, Proceedings of the Second International Workshop on Ontology Mapping, 2007.
[Gatterbauer07] Gatterbauer, Bohunsky, Herzog and Pollak, “Towards Domain-Independent Information Extraction from Web Tables”, Proceedings of the Sixteenth International World Wide Web Conference, 2007.
[Jha08] Jha and Nagy, “Wang Notation Tool: Layout Independent Representation of Tables”, Proceedings of the 19th International Conference on Pattern Recognition, 2008.
[Pivk07] Pivk, Sure, Cimiano, Gams, Rajkovič and Studer, “Transforming Arbitrary Tables into Logical Form with TARTAR”, Data & Knowledge Engineering, 2007.
[Poggi08] Poggi, Lembo, Calvanese, DeGiacomo, Lenzerini and Rosati, “Linking Data to Ontologies”, Journal on Data Semantics, 2008.
[Tao07] Tao and Embley, “Automatic Hidden-Web Table Interpretation by Sibling Page Comparison”, Proceedings of the 26th International Conference on Conceptual Modeling, 2007.
[Tao09] Tao, Embley and Liddle, “Enabling a Web of Knowledge”, Technical Report: tango.byu.edu/papers, 2009.
[Xu06] Xu and Embley, “A Composite Approach to Automating Direct and Indirect Schema Mappings”, Information Systems, 2006.