

Page 1

Thesis Proposal: Mini-Ontology GeneratOr (MOGO)

Mini-Ontology Generation from Canonicalized Tables

Stephen Lynn
Data Extraction Research Group
Department of Computer Science
Brigham Young University

Supported by the

Page 2

TANGO Overview

TANGO: Table ANalysis for Generating Ontologies

The project consists of the following three components:

1. Transform tables into a canonicalized form
2. Generate mini-ontologies
3. Merge into a growing ontology

Page 3

Thesis Statement

Proposed Solution

Develop a tool that accurately generates mini-ontologies from canonicalized tables of data, automatically, semi-automatically, or manually.

Evaluation

Evaluate the accuracy of the tool with respect to concept/value recognition, relationship discovery, and constraint discovery.

Page 4

Sample Input

Region and State Information

Location        Population (2000)   Latitude   Longitude
Northeast               2,122,869
  Delaware                817,376         45         -90
  Maine                 1,305,493         44         -93
Northwest               9,690,665
  Oregon                3,559,547         45        -120
  Washington            6,131,118         43        -120

Figure: canonicalized table. Dimension tree Location: Northeast (Delaware, Maine) and Northwest (Oregon, Washington). Dimension tree [Dimension2]: Population, Latitude, Longitude. Title: Region and State Information. Additional label: 2000.
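The canonicalized table above can be held in a small data structure: each dimension is a tree of labels, and each data cell is indexed by a root-to-node path in the Location tree plus a label from the second dimension. The Python sketch below is a hypothetical representation assumed for illustration. `DimensionNode`, `CanonicalTable`, `node`, and the cell-key layout are not taken from TANGO or MOGO. It encodes the sample table and looks up one cell.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DimensionNode:
    """One label in a dimension tree, together with its child labels."""
    label: str
    children: list[DimensionNode] = field(default_factory=list)


@dataclass
class CanonicalTable:
    """Hypothetical canonical form: a title, named dimension trees, and cells."""
    title: str
    dimensions: dict[str, DimensionNode]
    # Key: (root-to-node path in Location, label in [Dimension2]); value: cell text.
    cells: dict[tuple[tuple[str, ...], str], str]

    def paths(self, dimension: str) -> list[tuple[str, ...]]:
        """Return every root-to-node label path below the named dimension root."""
        result: list[tuple[str, ...]] = []

        def walk(node: DimensionNode, prefix: tuple[str, ...]) -> None:
            for child in node.children:
                path = prefix + (child.label,)
                result.append(path)
                walk(child, path)

        walk(self.dimensions[dimension], ())
        return result


def node(label: str, *children: DimensionNode) -> DimensionNode:
    """Hypothetical helper for building a tree node inline."""
    return DimensionNode(label, list(children))


location = node("Location",
                node("Northeast", node("Delaware"), node("Maine")),
                node("Northwest", node("Oregon"), node("Washington")))
dimension2 = node("[Dimension2]", node("Population"), node("Latitude"), node("Longitude"))

# Rows of the sample table; None marks an empty cell (regions have no coordinates).
rows = [
    (("Northeast",), "2,122,869", None, None),
    (("Northeast", "Delaware"), "817,376", "45", "-90"),
    (("Northeast", "Maine"), "1,305,493", "44", "-93"),
    (("Northwest",), "9,690,665", None, None),
    (("Northwest", "Oregon"), "3,559,547", "45", "-120"),
    (("Northwest", "Washington"), "6,131,118", "43", "-120"),
]
cells = {
    (path, label): value
    for path, *values in rows
    for label, value in zip(("Population", "Latitude", "Longitude"), values)
    if value is not None
}

table = CanonicalTable("Region and State Information",
                       {"Location": location, "[Dimension2]": dimension2}, cells)
print(table.paths("Location")[:3])
print(table.cells[(("Northeast", "Maine"), "Population")])  # 1,305,493
```

Keying cells by a path rather than by a leaf lets region rows such as Northeast carry their own population value alongside the state rows beneath them.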

Sample Output

Page 5

Mini-Ontology GeneratOr (MOGO)

1. Concept/Value Recognition
2. Relationship Discovery
3. Constraint Discovery

NOTE: MOGO implements a base set of algorithms for each step of the process and allows for runtime integration of new algorithms.
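One way to allow runtime integration of new algorithms is a registry keyed by process step, where each registered algorithm receives the working state and returns an updated one. The sketch below is an assumption for illustration only. `StepRegistry`, `register`, the step keys, and the state dictionary are hypothetical names, not MOGO's actual interface.

```python
from __future__ import annotations

from typing import Any, Callable

# An algorithm receives the working state and returns the (possibly updated) state.
Algorithm = Callable[[dict], dict]

# The three steps named on this slide, in the order they run.
STEPS = ("concept_value_recognition", "relationship_discovery", "constraint_discovery")


class StepRegistry:
    """Keeps, for each step, the algorithms to run in registration order."""

    def __init__(self) -> None:
        self._algorithms: dict[str, list[Algorithm]] = {step: [] for step in STEPS}

    def register(self, step: str, algorithm: Algorithm) -> None:
        """Add an algorithm to a step; it runs after those registered earlier."""
        if step not in self._algorithms:
            raise ValueError(f"unknown step: {step}")
        self._algorithms[step].append(algorithm)

    def run(self, state: dict[str, Any]) -> dict[str, Any]:
        """Run the steps in order, passing the state through each algorithm."""
        for algorithms in self._algorithms.values():
            for algorithm in algorithms:
                state = algorithm(state)
        return state


def base_recognizer(state: dict[str, Any]) -> dict[str, Any]:
    state.setdefault("log", []).append("base recognizer")
    return state


def plugin_recognizer(state: dict[str, Any]) -> dict[str, Any]:
    state.setdefault("log", []).append("plugin recognizer")
    return state


registry = StepRegistry()
registry.register("concept_value_recognition", base_recognizer)
# Later, at runtime, a new algorithm is added to the same step.
registry.register("concept_value_recognition", plugin_recognizer)
print(registry.run({})["log"])  # ['base recognizer', 'plugin recognizer']
```

Running algorithms in registration order means a base algorithm can run first and a plugged-in algorithm can refine its output. Other policies, such as letting a new algorithm replace the base one, would be equally reasonable.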

Page 6

Concept/Value Recognition

- Lexical Clues
  - Data value assignment
  - Labels as data values
- Default
  - Classifies any unclassified elements according to a simple heuristic

Figure: canonicalized table. Dimension tree Location: Northeast (Delaware, Maine) and Northwest (Oregon, Washington). Dimension tree [Dimension2]: Population, Latitude, Longitude. Title: Region and State Information. Additional label: 2000.

Concepts and Value Assignments

Region: Northeast; Northwest
State: Delaware; Maine; Oregon; Washington
Population: 2,122,869; 817,376; 1,305,493; 9,690,665; 3,559,547; 6,131,118
Latitude: 45; 44; 45; 43
Longitude: -90; -93; -120; -120
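The slide lists lexical clues and a default heuristic without spelling them out. The sketch below shows one possible rule that reproduces these assignments, and it is an assumption rather than MOGO's method. Under "labels as data values", the labels at each depth of the Location tree are treated as values of one concept. Each label of the second dimension becomes a concept whose values fill its column. The concept names for the tree levels (Region, State) are supplied by the caller, because the proposal does not say how they are obtained. `recognize`, `LOCATION_TREE`, and `ROWS` are hypothetical names.

```python
from __future__ import annotations

# Location dimension tree: each label maps to its children.
LOCATION_TREE = {
    "Northeast": {"Delaware": {}, "Maine": {}},
    "Northwest": {"Oregon": {}, "Washington": {}},
}
ATTRIBUTES = ("Population", "Latitude", "Longitude")
# One row per location label, in table order; None marks an empty cell.
ROWS = [
    ("Northeast", "2,122,869", None, None),
    ("Delaware", "817,376", "45", "-90"),
    ("Maine", "1,305,493", "44", "-93"),
    ("Northwest", "9,690,665", None, None),
    ("Oregon", "3,559,547", "45", "-120"),
    ("Washington", "6,131,118", "43", "-120"),
]


def recognize(tree, rows, attributes, level_names):
    """Map each concept name to its assigned values, in table order."""
    concepts: dict[str, list[str]] = {name: [] for name in level_names}

    def walk(subtree: dict, depth: int) -> None:
        for label, children in subtree.items():
            # Labels as data values: a tree label is a value of its level's concept.
            concepts[level_names[depth]].append(label)
            walk(children, depth + 1)

    walk(tree, 0)
    # Each second-dimension label names a concept whose values fill its column.
    for i, attribute in enumerate(attributes, start=1):
        concepts[attribute] = [row[i] for row in rows if row[i] is not None]
    return concepts


# The level names are supplied here (hypothetical); MOGO's source for them is not shown.
for concept, values in recognize(LOCATION_TREE, ROWS, ATTRIBUTES, ["Region", "State"]).items():
    print(f"{concept}: {'; '.join(values)}")
```

Run as written, it prints the five concepts with the same values, in the same order, as the assignments on this slide.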

Page 7

Relationship Discovery

- Dimension Tree Mappings
- Lexical Clues
  - Generalization/Specialization
  - Aggregation
- Data Frames
- Ontology Fragment Merge

Figure: canonicalized table. Dimension tree Location: Northeast (Delaware, Maine) and Northwest (Oregon, Washington). Dimension tree [Dimension2]: Population, Latitude, Longitude. Title: Region and State Information. Additional label: 2000.
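One possible reading of a dimension tree mapping is sketched below. It is an assumption, not the procedure described in the proposal. Each parent-child edge in the Location tree yields a relationship set between the two level concepts (Region and State). Each second-dimension concept is related to every level concept that has at least one value for it. The row layout and the function `relationship_sets` are hypothetical.

```python
from __future__ import annotations

# Hypothetical input: (label, level concept, parent label or None, then the
# Population, Latitude, and Longitude cells; None marks an empty cell).
ROWS = [
    ("Northeast", "Region", None, "2,122,869", None, None),
    ("Delaware", "State", "Northeast", "817,376", "45", "-90"),
    ("Maine", "State", "Northeast", "1,305,493", "44", "-93"),
    ("Northwest", "Region", None, "9,690,665", None, None),
    ("Oregon", "State", "Northwest", "3,559,547", "45", "-120"),
    ("Washington", "State", "Northwest", "6,131,118", "43", "-120"),
]
ATTRIBUTES = ("Population", "Latitude", "Longitude")


def relationship_sets(rows, attributes):
    """Return the (concept, concept) pairs implied by tree edges and filled cells."""
    level_of = {label: level for label, level, *_ in rows}
    found: set[tuple[str, str]] = set()
    for label, level, parent, *values in rows:
        # Tree edge: the parent's level concept relates to this row's level concept.
        if parent is not None:
            found.add((level_of[parent], level))
        # Filled cell: the row's level concept relates to that column's concept.
        for attribute, value in zip(attributes, values):
            if value is not None:
                found.add((level, attribute))
    return found


for pair in sorted(relationship_sets(ROWS, ATTRIBUTES)):
    print(pair)
```

On the sample table this yields five pairs: (Region, State), (Region, Population), (State, Population), (State, Latitude), and (State, Longitude). Region is linked only to Population because the region rows have no latitude or longitude values.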

Page 8

Constraint Discovery

- Generalization/Specialization
- Computed Values
- Functional Relationships
- Optional Participation

Region and State Information

Location        Population (2000)   Latitude   Longitude
Northeast               2,122,869
  Delaware                817,376         45         -90
  Maine                 1,305,493         44         -93
Northwest               9,690,665
  Oregon                3,559,547         45        -120
  Washington            6,131,118         43        -120
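Two of these constraint kinds can be checked directly against the sample data. Each region's population equals the sum of its states' populations (817,376 + 1,305,493 = 2,122,869 and 3,559,547 + 6,131,118 = 9,690,665), which is consistent with a computed-value constraint. Each state also maps to exactly one population, which is consistent with a functional relationship from State to Population. The Python sketch below tests both. The function names and row layout are hypothetical. A single table can only show that the data is consistent with a constraint, not that the constraint holds in general.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical layout: (state, region, population) for each state row, plus
# the population printed on each region row of the sample table.
STATE_ROWS = [
    ("Delaware", "Northeast", 817_376),
    ("Maine", "Northeast", 1_305_493),
    ("Oregon", "Northwest", 3_559_547),
    ("Washington", "Northwest", 6_131_118),
]
REGION_POPULATION = {"Northeast": 2_122_869, "Northwest": 9_690_665}


def is_sum_of_children(state_rows, region_totals) -> bool:
    """True if every region total equals the sum of its states' populations."""
    sums: dict[str, int] = defaultdict(int)
    for _, region, population in state_rows:
        sums[region] += population
    return all(sums[region] == total for region, total in region_totals.items())


def is_functional(pairs) -> bool:
    """True if no left-hand value is paired with two different right-hand values."""
    seen: dict[object, object] = {}
    for left, right in pairs:
        if seen.setdefault(left, right) != right:
            return False
    return True


print(is_sum_of_children(STATE_ROWS, REGION_POPULATION))  # True
print(is_functional((s, p) for s, _, p in STATE_ROWS))    # True
print(is_functional([("Oregon", 45), ("Oregon", 46)]))     # False
```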

Page 9

Validation

Concept/Value Recognition
- Correctly identified concepts
- Missed concepts
- False positives
- Data value assignment

Relationship Discovery
- Valid relationship sets
- Invalid relationship sets
- Missed relationship sets

Constraint Discovery
- Valid constraints
- Invalid constraints
- Missed constraints

Precision and recall, measured separately for:
- Concept Recognition
- Relationship Discovery
- Constraint Discovery

$$\text{precision} = \frac{\text{Total\_Correct\_Found}}{\text{Total\_Correct\_Found} + \text{Total\_Incorrect\_Found}}$$

$$\text{recall} = \frac{\text{Total\_Correct\_Found}}{\text{Actual\_Correct}}$$
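Precision and recall can be computed from the validation counts as follows, assuming the usual definitions: precision is the fraction of found items that are correct, and recall is the fraction of actually correct items that were found. The function and argument names are hypothetical and mirror the slide's variables. `correct_found` is Total_Correct_Found, `incorrect_found` is Total_Incorrect_Found, and `actual_correct` is Actual_Correct. The counts in the usage lines are made up for illustration and are not results from the proposal.

```python
def precision(correct_found: int, incorrect_found: int) -> float:
    """Fraction of the items found that are correct."""
    found = correct_found + incorrect_found
    # Returning 0.0 when nothing was found is a convention chosen for this sketch.
    return correct_found / found if found else 0.0


def recall(correct_found: int, actual_correct: int) -> float:
    """Fraction of the items that should be found that were actually found."""
    # Same convention: 0.0 when the reference contains no items.
    return correct_found / actual_correct if actual_correct else 0.0


# Hypothetical counts for one table: 8 concepts found correctly,
# 2 false positives, and 10 concepts in the reference ontology.
print(precision(8, 2))  # 0.8
print(recall(8, 10))    # 0.8
```

Because Actual_Correct counts the correct items that were found plus those that were missed, the missed-item counts on this slide feed into recall only, while false positives affect precision only.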

Page 10

Contribution

- A tool to generate mini-ontologies
- An assessment of the accuracy of automatic generation