BYU

A Synergistic Semantic Annotation Model

December 2007

Yihong Ding, http://www.deg.byu.edu/ding/

12/7/2007

Grand challenge: a new-generation World Wide Web

The current Web

Enormous amount of content

Feasible for humans to read and write

But … there is simply too much content to read

The future Web

Even more content, but machine-processable

Feasible for both humans and machines to read and write

Key issue: converting non-machine-processable content to machine-processable content, i.e., semantic annotation

Semantic annotation, the general picture

[Figure: Data Extraction/Instance Recognition Engine; AptRental Ontology]

Semantic annotation, the general picture (continued)

[Figure: AptRental Ontology]

Ontology

Definition: an explicit, formal specification of a conceptualization

Unique identity for each concept

Unique identity for each relationship among concepts

Logical derivation rules underlying every declared relationship

Annotation:

533-0293 is-a AptRental:ContactPhone

$1250 is-a AptRental:MonthlyRate

533-0293 is-about AptRentalAd-instance-1

$1250 is-about AptRentalAd-instance-1

Ontology:

AptRentalAd has ContactPhone

AptRentalAd has MonthlyRate

Logical derivation:

To rent the apartment that costs $1250 per month, please call 533-0293 (machine understanding).
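To make the split between annotation, ontology, and derivation concrete, here is a minimal Python sketch that stores the slide's statements as plain (subject, predicate, object) triples and joins them on the ad instance. The triple encoding and the `derive` helper are hypothetical, chosen only for illustration; a real system would express the ontology in an ontology language and use a reasoner rather than this hand-written join, and the sketch does not check that the instance id really belongs to the `AptRentalAd` class.

```python
from collections import defaultdict

# Annotation triples (value, relation, target), taken from the slide.
annotations = [
    ("533-0293", "is-a", "AptRental:ContactPhone"),
    ("$1250", "is-a", "AptRental:MonthlyRate"),
    ("533-0293", "is-about", "AptRentalAd-instance-1"),
    ("$1250", "is-about", "AptRentalAd-instance-1"),
]
# Ontology relationships (class, relation, concept), taken from the slide.
ontology = [
    ("AptRentalAd", "has", "ContactPhone"),
    ("AptRentalAd", "has", "MonthlyRate"),
]

def derive(annotations, ontology, ad_class="AptRentalAd"):
    """Hypothetical join: group annotated values by the ad they are about,
    keeping only concepts that the ontology says an ad of ad_class has."""
    concept_of = {s: o.split(":", 1)[1] for s, p, o in annotations if p == "is-a"}
    allowed = {o for s, p, o in ontology if s == ad_class and p == "has"}
    facts = defaultdict(dict)
    for s, p, o in annotations:
        if p == "is-about" and concept_of.get(s) in allowed:
            facts[o][concept_of[s]] = s
    return dict(facts)

for ad, f in derive(annotations, ontology).items():
    print(f"{ad}: to rent the apartment that costs {f['MonthlyRate']} monthly, "
          f"please call {f['ContactPhone']}.")
```

The "machine understanding" sentence is then just a template filled from the joined facts; the ontology's `has` relationships are what license putting the phone number and the rate into the same statement.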

Automated semantic annotation methods

Layout-driven method (e.g., [Mukherjee et al. 03])

Machine-learning-based method (e.g., [Handschuh et al. 02])

Rule-based method (e.g., [Dill et al. 03])

NLP-based method (e.g., [Popov et al. 03])

Ontology-based method (e.g., [Ding et al. 06])

Ontology-based annotation

Data extraction ontology

[Figure: a standard ontology concept (BedroomNr) plus its epistemological extension (instance recognizer), consisting of an external representation, a context phrase, and an exception phrase, applied to the sample ad below]

CAPITOL HILL Luxury 2 bdrm 2 bath, 2 grg, w/d,views, 1700 sq ft. $1250 mo. Call 533-0293
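A minimal sketch of how a BedroomNr instance recognizer might combine the three parts named in the figure, assuming each part is a regular expression: the external representation matches candidate values, the context phrase must appear within a small window around a candidate, and the exception phrase rejects a candidate when it directly follows it. The `InstanceRecognizer` class, the 8-character window, and all three patterns are hypothetical; they are not the recognizers used in the original system.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class InstanceRecognizer:
    """Hypothetical instance recognizer (epistemological extension) for one concept."""
    concept: str
    external_rep: str           # regex matching candidate values
    context_phrase: str         # regex that must occur near a candidate
    exception_phrase: str = ""  # regex that rejects a candidate when it directly follows it
    window: int = 8             # characters inspected on each side of a candidate (assumed)

    def recognize(self, text: str) -> List[Tuple[str, int]]:
        """Return (value, offset) for each candidate passing the context and exception checks."""
        found = []
        for m in re.finditer(self.external_rep, text):
            start, end = m.span()
            nearby = text[max(0, start - self.window):end + self.window]
            if not re.search(self.context_phrase, nearby, re.IGNORECASE):
                continue  # no context phrase nearby
            if self.exception_phrase and re.match(self.exception_phrase, text[end:], re.IGNORECASE):
                continue  # exception phrase vetoes this candidate
            found.append((m.group(), start))
        return found

bedroom_nr = InstanceRecognizer(
    concept="BedroomNr",
    external_rep=r"\b\d+\b",
    context_phrase=r"\b(?:bdrm|bedrooms?)\b",
    exception_phrase=r"\s*(?:bath|grg|sq\s*ft)\b",
)
ad = "CAPITOL HILL Luxury 2 bdrm 2 bath, 2 grg, w/d,views, 1700 sq ft. $1250 mo. Call 533-0293"
print(bedroom_nr.recognize(ad))  # [('2', 20)]
```

With only the context check, the second "2" (followed by "bath") would also qualify, because "bdrm" falls inside its window; the exception phrase is what removes it.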

Ontology-based annotation

[Figure: instance recognizers for BedroomNr, BathNr, and MonthRate (external representation plus context phrase) and for Feature and ContactPhone (external representation only), together with a context keyword, applied to the sample ad below]

CAPITOL HILL Luxury 2 bdrm 2 bath, 2 grg, w/d,views, 1700 sq ft. $1250 mo. Call 533-0293
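Running one recognizer per concept over the same ad yields annotations in the `is-a` form used on the Ontology slide. In this sketch each context phrase is folded into the pattern and only group 1 (the value) is kept. The patterns, the two-word Feature vocabulary, and the `annotate` function are hypothetical; `MonthlyRate` follows the annotation example rather than the figure's `MonthRate` label.

```python
import re
from typing import Dict, List

# Hypothetical recognizers: group 1 captures the value; the rest of each
# pattern plays the role of a context phrase.
RECOGNIZERS: Dict[str, str] = {
    "BedroomNr": r"(\d+)\s*(?:bdrm|bedrooms?)\b",
    "BathNr": r"(\d+(?:\.\d)?)\s*(?:bath|ba)\b",
    "Feature": r"\b(w/d|views)\b",
    "MonthlyRate": r"(\$\d[\d,]*)\s*(?:mo|month)\b",
    "ContactPhone": r"\b(\d{3}-\d{4})\b",
}

def annotate(text: str, recognizers: Dict[str, str] = RECOGNIZERS,
             ontology: str = "AptRental") -> List[str]:
    """Run every recognizer over the text; emit 'value is-a Ontology:Concept' in text order."""
    hits = []
    for concept, pattern in recognizers.items():
        for m in re.finditer(pattern, text, re.IGNORECASE):
            hits.append((m.start(1), m.group(1), concept))
    return [f"{value} is-a {ontology}:{concept}" for _, value, concept in sorted(hits)]

AD = "CAPITOL HILL Luxury 2 bdrm 2 bath, 2 grg, w/d,views, 1700 sq ft. $1250 mo. Call 533-0293"
for line in annotate(AD):
    print(line)
```

Note that "2 grg" and "1700" produce no annotation: no recognizer's context matches them, which is the behavior the context phrases are there to enforce.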

Ontology-based annotation: strengths and weaknesses

Strengths

Unaffected by layout differences across documents

Unaffected by layout changes over time

Low maintenance once built

Weakness

Instance recognizers are expensive to build

Layout-driven annotation

Layout-driven annotation: strengths and weaknesses

Strengths

Accurate

Simple and straightforward

Requires little domain knowledge

Weakness

Layout patterns are expensive to maintain

Problem

How can we overcome the weaknesses while retaining the strengths at the same time?

Observation

[Figure: the two annotators side by side.
Conceptual Annotator (ontology-based annotation): extraction domain ontology + a document → annotated document; labeled resilient.
Structural Annotator (layout-driven annotation): layout patterns + domain ontology + a document → annotated document; labeled accurate.]

Synergistic model

[Figure: the two annotators coupled in a loop.
Extraction domain ontology + a document → Conceptual Annotator (ontology-based annotation) → annotated document → Pattern Generation → layout patterns.
Layout patterns + the document → Structural Annotator (layout-driven annotation) → annotated document → Instance Recognizer Enrichment → extraction domain ontology.]
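The loop can be sketched end to end on a toy page under strong simplifying assumptions: a page is a list of records (rows of cell strings) rather than HTML, a layout pattern is just a column index, and every instance recognizer is a dictionary. All function names here are hypothetical; the sketch only illustrates the data flow, in which ontology-based hits train the layout patterns and layout-driven hits enrich the recognizers.

```python
from collections import Counter, defaultdict

def conceptual_annotate(records, dictionaries):
    """Ontology-based side (toy): tag a cell with a field when that field's dictionary knows it."""
    return {(r, c, field, cell)
            for r, row in enumerate(records)
            for c, cell in enumerate(row)
            for field, entries in dictionaries.items()
            if cell in entries}

def generate_patterns(hits):
    """Pattern generation (toy): a field's layout pattern is the column holding most of its hits."""
    columns = defaultdict(Counter)
    for _, c, field, _ in hits:
        columns[field][c] += 1
    return {field: counts.most_common(1)[0][0] for field, counts in columns.items()}

def structural_annotate(records, patterns):
    """Layout-driven side (toy): tag every cell in a field's pattern column with that field."""
    return {(r, c, field, row[c])
            for field, c in patterns.items()
            for r, row in enumerate(records)
            if c < len(row)}

def enrich(dictionaries, layout_hits):
    """Instance recognizer enrichment (toy): insert values the dictionaries did not recognize yet."""
    added = []
    for _, _, field, value in sorted(layout_hits):
        if value not in dictionaries[field]:
            dictionaries[field].add(value)
            added.append((field, value))
    return added

def synergistic_round(records, dictionaries):
    """One pass of the loop: annotate, learn patterns, annotate by layout, enrich recognizers."""
    hits = conceptual_annotate(records, dictionaries)
    patterns = generate_patterns(hits)
    layout_hits = structural_annotate(records, patterns)
    added = enrich(dictionaries, layout_hits)
    return hits | layout_hits, patterns, added

# Hypothetical page and a deliberately incomplete Location dictionary.
dictionaries = {"Location": {"Capitol Hill", "Avenues"}}
page = [["Capitol Hill", "$1250", "533-0293"],
        ["Sugar House", "$900", "555-0100"],
        ["Avenues", "$1100", "555-0199"]]
annotations, patterns, added = synergistic_round(page, dictionaries)
print(patterns, added)
```

Here "Sugar House" is missed by the dictionary but caught by the column pattern, and is then inserted into the Location dictionary, so a second round on the same page adds nothing new.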

Pattern generation

Take the annotated output of the ontology-based annotator

Apply HTML-structure analysis and produce a typical layout pattern for each extracted field

If applicable, produce a sequential dependency among the generated layout patterns

If applicable, produce simple heuristic rules such as “if A then B” among the generated layout patterns
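One possible way to realize the HTML-structure analysis is to record the tag path of each annotated value and merge the paths of all values of a field, turning positions that vary (for example, the row index) into wildcards. The sketch below does this with Python's standard `html.parser`. The path syntax, the exact text-node matching, and using order of first appearance as a candidate sequential dependency are all assumptions; the “if A then B” rules are not covered.

```python
from collections import Counter
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

class TextPathParser(HTMLParser):
    """Collect each non-blank text node with its tag path, e.g. ('table[1]', 'tr[2]', 'td[1]')."""

    def __init__(self):
        super().__init__()
        self.stack = []                  # open elements as (tag, index among same-tag siblings)
        self.child_counts = [Counter()]  # per open element: children seen so far, by tag
        self.nodes = []                  # (path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:             # void elements never get an end tag
            return
        counts = self.child_counts[-1]
        counts[tag] += 1
        self.stack.append((tag, counts[tag]))
        self.child_counts.append(Counter())

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1][0] == tag:  # assumes well-formed markup
            self.stack.pop()
            self.child_counts.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append((tuple(f"{t}[{i}]" for t, i in self.stack), text))

def generalize(paths):
    """Merge instance paths into one pattern; steps whose index varies become tag[*]."""
    if any(len(p) != len(paths[0]) for p in paths):
        return None                      # no single positional pattern
    steps = []
    for variants in zip(*paths):
        tags = {v.split("[")[0] for v in variants}
        if len(tags) != 1:
            return None
        steps.append(variants[0] if len(set(variants)) == 1 else f"{tags.pop()}[*]")
    return "/".join(steps)

def generate_patterns(html, annotations):
    """annotations: (field, value) pairs output by the ontology-based annotator."""
    parser = TextPathParser()
    parser.feed(html)
    parser.close()
    paths, first_seen = {}, {}
    for field, value in annotations:
        for pos, (path, text) in enumerate(parser.nodes):
            if text == value:            # exact text-node match (simplifying assumption)
                paths.setdefault(field, []).append(path)
                first_seen[field] = min(first_seen.get(field, pos), pos)
                break
    patterns = {f: generalize(ps) for f, ps in paths.items()}
    field_order = sorted(first_seen, key=first_seen.get)  # candidate sequential dependency
    return patterns, field_order

html = ("<table>"
        "<tr><td>Capitol Hill</td><td>$1250</td><td>533-0293</td></tr>"
        "<tr><td>Avenues</td><td>$900</td><td>555-0100</td></tr>"
        "</table>")
annotations = [("Location", "Capitol Hill"), ("MonthlyRate", "$1250"), ("ContactPhone", "533-0293"),
               ("Location", "Avenues"), ("MonthlyRate", "$900"), ("ContactPhone", "555-0100")]
patterns, field_order = generate_patterns(html, annotations)
print(patterns)
print(field_order)
```

The wildcard on `tr` is what lets the resulting pattern apply to records the ontology-based annotator never saw.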

Instance recognizer enrichment

Take the annotated output of the layout-driven annotator

Check each annotated value against the current instance recognizer for its field

If recognized, continue;

otherwise,

for a dictionary-type recognizer, insert the value;

for a regular-expression-type recognizer, try to generate a new regular expression and alert the user to check it
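A sketch of the enrichment step for a single annotated value, assuming two hypothetical recognizer types: a dictionary recognizer holding known values and a regular-expression recognizer holding patterns. The regex proposal heuristic (digit runs become `\d{n}`, letter runs become `[A-Za-z]+`, other characters are escaped) is an assumption, not the original system's method. The proposal is placed in a review queue instead of being installed, following the slide's step of alerting the user to check it.

```python
import re
from dataclasses import dataclass
from typing import List, Set, Tuple, Union

@dataclass
class DictionaryRecognizer:
    entries: Set[str]                    # hypothetical: known values, e.g. neighborhood names

    def recognizes(self, value: str) -> bool:
        return value in self.entries

@dataclass
class RegexRecognizer:
    patterns: List[str]                  # hypothetical: regexes matched against the whole value

    def recognizes(self, value: str) -> bool:
        return any(re.fullmatch(p, value) for p in self.patterns)

def propose_regex(value: str) -> str:
    """Hypothetical heuristic: digit runs -> \\d{n}, letter runs -> [A-Za-z]+, others escaped."""
    parts = []
    for m in re.finditer(r"(?P<digits>\d+)|(?P<letters>[A-Za-z]+)|(?P<other>.)", value, re.DOTALL):
        kind, token = m.lastgroup, m.group()
        if kind == "digits":
            parts.append(rf"\d{{{len(token)}}}")
        elif kind == "letters":
            parts.append("[A-Za-z]+")
        else:
            parts.append(re.escape(token))
    return "".join(parts)

def enrich(recognizer: Union[DictionaryRecognizer, RegexRecognizer], value: str,
           review_queue: List[Tuple[str, str]]) -> str:
    """Apply one layout-driven annotation to the current recognizer for its field."""
    if recognizer.recognizes(value):
        return "recognized"                              # nothing to learn
    if isinstance(recognizer, DictionaryRecognizer):
        recognizer.entries.add(value)                    # dictionary type: insert directly
        return "inserted"
    review_queue.append((value, propose_regex(value)))   # regex type: user must check first
    return "proposed"

queue: List[Tuple[str, str]] = []
phone = RegexRecognizer([r"\d{3}-\d{4}"])
print(enrich(phone, "533-0293", queue))        # recognized
print(enrich(phone, "(801) 422-7604", queue))  # proposed
```

Keeping the proposed regex out of `phone.patterns` until a person approves it is the design choice here: a bad generalization would otherwise start producing false annotations on every later page.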

Preliminary results

Apartment rental domain

Ontology-based annotation: 90% accuracy on average, in both precision and recall, for nearly all fields

except Location and Contact Name

Layout-driven annotation: nearly 100% precision and recall on Location and Contact Name

lower recall on fields such as BedroomNr

Pattern generation: works well on well-structured fields such as Location

less successful on semi-structured fields such as BedroomNr

Instance recognizer enrichment: good results even with poorly constructed initial instance recognizers
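For reference, precision is the fraction of produced annotations that are correct, and recall is the fraction of correct annotations that were produced. The sketch below computes both for one field; the annotations are made up for illustration and are not the experiment's data, and returning 0.0 for an empty denominator is a convention chosen here.

```python
from typing import Hashable, Iterable, Tuple

def precision_recall(predicted: Iterable[Hashable], gold: Iterable[Hashable]) -> Tuple[float, float]:
    """Precision = correct / predicted, recall = correct / gold (0.0 when a denominator is empty)."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical annotations as (record id, field, value) triples.
gold = {(1, "BedroomNr", "2"), (2, "BedroomNr", "3"), (3, "BedroomNr", "1"), (4, "BedroomNr", "2")}
predicted = {(1, "BedroomNr", "2"), (2, "BedroomNr", "3"), (3, "BedroomNr", "2")}
p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```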

Summary

Automatically produce layout patterns from the output of ontology-based annotation

Automatically enrich domain-specific instance recognizers from the output of layout-driven annotation

A new synergistic annotation model that retains the original strengths and minimizes the original weaknesses

An annotation system that improves its own performance as it runs

Future work

Dynamically tuning annotation to user perspectives

Ensemble of various annotators

Collaborative annotation

Thank you

Yihong Ding

(801) 422-7604

2262 TMCB, Brigham Young University

Provo, UT 84601

Data Extraction Research Lab at Brigham Young University

http://www.deg.byu.edu

Homepage, my virtual home on Web 1.0

http://www.deg.byu.edu/ding/

Thinking Space, my virtual home on Web 2.0

http://yihongs-research.blogspot.com/