Post on 20-Dec-2015
Thesis Defense
Mini-Ontology GeneratOr (MOGO)
Mini-Ontology Generation from Canonicalized Tables
Stephen Lynn
Data Extraction Research Group
Department of Computer Science
Brigham Young University
Supported by the
TANGO Overview
TANGO: Table ANalysis for Generating Ontologies
Project consists of the following three components:
1. Transform tables into a canonicalized form
2. Generate mini-ontologies
3. Merge into a growing ontology
Sample Input
Region and State Information
Location        Population (2000)   Latitude   Longitude
Northeast       2,122,869
  Delaware      817,376             45         -90
  Maine         1,305,493           44         -93
Northwest       9,690,665
  Oregon        3,559,547           45         -120
  Washington    6,131,118           43         -120
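[Figure: dimension trees for the table titled "Region and State Information". Location branches into Northeast and Northwest, with Delaware, Maine, Oregon, and Washington below them. [Dimension2] branches into Population (with 2000 below it), Latitude, and Longitude. Sample data values shown: 2,122,869; 817,376; -120.]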
Sample Output
Mini-Ontology GeneratOr (MOGO)
  Concept/Value Recognition
  Relationship Discovery
  Constraint Discovery
Concept/Value Recognition
Lexical Clues
  Labels as data values
  Data value assignment
Data Frame Clues
  Labels as data values
  Data value assignment
Default
  Classifies any unclassified elements according to a simple heuristic.
Concepts and Value Assignments
Location
Region: Northeast, Northwest
State: Delaware, Maine, Oregon, Washington
Population: 2,122,869; 817,376; 1,305,493; 9,690,665; 3,559,547; 6,131,118
Latitude: 45, 44, 45, 43
Longitude: -90, -93, -120, -120
Year: 2002, 2003
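The "labels as data values" clue can be illustrated with a minimal sketch. It assumes one hypothetical heuristic that is not taken from the thesis: a trailing four-digit number in parentheses inside a column label is treated as a data value, and the rest of the label is kept as the concept name. The names `YEAR_IN_LABEL` and `split_label` are invented for this illustration.

```python
import re

# Hypothetical lexical clue (an assumption, not MOGO's actual rule):
# a trailing "(NNNN)" in a label is a data value embedded in the label.
YEAR_IN_LABEL = re.compile(r"^(?P<concept>.+?)\s*\((?P<value>\d{4})\)$")


def split_label(label):
    """Return (concept_name, embedded_value or None) for one column label."""
    cleaned = label.strip()
    match = YEAR_IN_LABEL.match(cleaned)
    if match:
        return match.group("concept"), match.group("value")
    return cleaned, None


# Column labels from the sample table.
labels = ["Location", "Population (2000)", "Latitude", "Longitude"]
concepts = [split_label(label) for label in labels]
```

Under this assumption, "Population (2000)" yields the concept Population plus the value 2000, which could then be assigned to a separate concept such as Year.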
Relationship Discovery
Dimension Tree Mappings
Lexical Clues
  Generalization/Specialization
  Aggregation
Data Frames
Ontology Fragment Merge
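As a rough illustration of dimension tree mappings, the sketch below walks the Location tree of the sample table and lists every parent/child nesting edge as a candidate relationship. The nested-dict representation and the name `nesting_pairs` are assumptions made for this example. Deciding whether an edge is a generalization/specialization or an aggregation is deliberately left out.

```python
# Location dimension of the sample table as nested dicts.
# Leaf labels map to empty dicts. This layout is an assumption for the sketch.
location_tree = {
    "Location": {
        "Northeast": {"Delaware": {}, "Maine": {}},
        "Northwest": {"Oregon": {}, "Washington": {}},
    }
}


def nesting_pairs(tree):
    """Yield a (parent, child) pair for every nesting edge in the tree."""
    for parent, children in tree.items():
        for child in children:
            yield parent, child
        # Recurse so that deeper nesting levels are reported as well.
        yield from nesting_pairs(children)


pairs = list(nesting_pairs(location_tree))
```

Each pair is only a candidate. Lexical clues or data frames would still be needed to label the relationship.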
Constraint Discovery
  Generalization/Specialization
  Computed Values
  Functional Relationships
  Optional Participation
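Two of these constraint types can be checked directly against the sample table. The sketch below is an illustration under stated assumptions, not the MOGO implementation: a computed value is detected when a region's population equals the sum of its states' populations, and a relationship counts as functional when no left-hand value maps to two different right-hand values. The helper names are hypothetical.

```python
# Population figures from the sample table.
region_totals = {"Northeast": 2_122_869, "Northwest": 9_690_665}
state_populations = {
    "Northeast": {"Delaware": 817_376, "Maine": 1_305_493},
    "Northwest": {"Oregon": 3_559_547, "Washington": 6_131_118},
}


def is_sum_of_children(parent_value, child_values):
    """Computed-value check: the parent equals the sum of its children."""
    return parent_value == sum(child_values)


def is_functional(pairs):
    """Functional check: each left value maps to exactly one right value."""
    seen = {}
    for left, right in pairs:
        if seen.setdefault(left, right) != right:
            return False
    return True


computed = {
    region: is_sum_of_children(total, state_populations[region].values())
    for region, total in region_totals.items()
}

# State -> Latitude pairs from the sample table.
state_latitude = [("Delaware", 45), ("Maine", 44), ("Oregon", 45), ("Washington", 43)]
latitude_state = [(lat, state) for state, lat in state_latitude]
```

In the sample data both regional populations are exact sums of their states. State to Latitude is functional, while Latitude to State is not, because 45 appears for both Delaware and Oregon.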
Validation
Concept/Value Recognition
  Correctly identified concepts
  Missed concepts
  False positives
  Data values assignment
Relationship Discovery
  Valid relationship sets
  Invalid relationship sets
  Missed relationship sets
Constraint Discovery
  Valid constraints
  Invalid constraints
  Missed constraints
                          Precision   Recall   F-measure
Concept Recognition       87%         94%      90%
Relationship Discovery    73%         81%      77%
Constraint Discovery      89%         91%      90%
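$$\text{precision} = \frac{\text{Total\_Correct\_Found}}{\text{Actual\_Correct} + \text{Total\_Incorrect\_Found}}$$

$$\text{recall} = \frac{\text{Total\_Correct\_Found}}{\text{Actual\_Correct}}$$

$$\text{F-measure} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$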
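The F-measure column can be recomputed from the reported precision and recall, since the F-measure is the harmonic mean of the two. The sketch below uses the percentages from the results table. The function name `f_measure` and the zero-division guard are choices made for this example.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0  # guard added for this sketch, avoids division by zero
    return 2 * precision * recall / (precision + recall)


# (precision, recall) as reported in the results table.
reported = {
    "Concept Recognition": (0.87, 0.94),
    "Relationship Discovery": (0.73, 0.81),
    "Constraint Discovery": (0.89, 0.91),
}

# F-measure per task, rounded to a whole percentage.
rounded = {
    task: round(100 * f_measure(p, r)) for task, (p, r) in reported.items()
}
```

Rounded to whole percentages, the recomputed values agree with the F-measure column: 90, 77, and 90.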
Concept Recognition
What we counted:
  Correct/Incorrect/Missing Concepts
  Correct/Incorrect/Missing Labels
  Data value assignments
Relationship Discovery
What we counted:
  Correct/incorrect/missing relationship sets
  Correct/incorrect/missing aggregations and generalization/specializations
Constraint Discovery
What we counted:
  Correct/Incorrect/Missing:
    Generalization/Specialization constraints
    Computed value constraints
    Functional constraints
    Optional constraints
Concept Recognition
Successes
  98% of concepts identified
  Missing label identification
  97% of values assigned to correct concept
Common problems
  Finding an appropriate label
  Duplicate concepts
Relationship Discovery
Recall of 92% for relationship sets
Missing aggregations and generalizations/specializations
  Only found in label nesting
Constraint Discovery
F-measure of 98% for functional relationship sets
Poor computed value discovery
  Rows/Columns with totals
Conclusions
Tool to generate mini-ontologies
Assessment of accuracy of automatic generation

                          Precision   Recall   F-measure
Concept Recognition       87%         94%      90%
Relationship Discovery    73%         81%      77%
Constraint Discovery      89%         91%      90%
Future Work
Tool Enhancements
  Linguistic processing
  Data frame library
  Domain-specific heuristics
Alternate Uses
  Annotation for the Semantic Web