
Evaluating Semantic Metadata without the Presence of a Gold Standard

Yuangui Lei, Andriy Nikolov, Victoria Uren, Enrico Motta
Knowledge Media Institute, The Open University
{y.lei,a.nikolov,v.s.uren,e.motta}@open.ac.uk

Focuses

• A quality model which characterizes quality problems in semantic metadata

• An automatic detection algorithm

• Experiments

[Diagram: semantic metadata, expressed as RDF triples, links data to ontologies; it is produced by semantic metadata generation and acquisition processes and stored in semantic metadata repositories.]

A number of problems can arise that decrease the quality of metadata.

Quality Evaluation

• Metadata providers: ensuring high quality

• Users: helping to assess trustworthiness

• Applications: filtering out poor-quality data

Our Quality Evaluation Framework

• A quality model

• Assessment metrics

• An automatic evaluation algorithm

The Quality Model

[Diagram: the quality model relates semantic metadata to the real world, ontologies, and data sources through modelling, instantiating, annotating, representing, and describing relations.]

Quality Problems

(a) Incomplete Annotation
(b) Duplicate Annotation
(c) Ambiguous Annotation
(d) Spurious Annotation
(e) Inaccurate Annotation
(f) Inconsistent Annotation

[Diagrams: each problem is illustrated as a mapping between data objects and semantic entities; the inconsistent-annotation example shows instances I1–I4 and relations R1, R2 typed with classes C1–C3 in ways that violate ontology constraints.]
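To make two of these problem types concrete, here is a minimal sketch using invented triples written as plain Python tuples (the kmi: identifiers and the news story are hypothetical, not taken from the actual data set):

```python
# Hypothetical triples illustrating problems (b) and (f); all identifiers are
# invented for this example.

annotations = [
    # (b) Duplicate annotation: two lexical variants of the same organisation
    # appear as values of the same property of the same instance.
    ("kmi:story-42", "kmi:mentions-organisation", "kmi:OU"),
    ("kmi:story-42", "kmi:mentions-organisation", "kmi:Open-University"),

    # (f) Inconsistent annotation: one instance typed with two classes that the
    # domain ontology declares disjoint.
    ("kmi:person-1", "rdf:type", "kmi:KMi-Member"),
    ("kmi:person-1", "rdf:type", "kmi:None-KMi-Member"),
]

# Constraint from the domain ontology (used later under "Using Domain Knowledge"):
disjoint_classes = [("kmi:KMi-Member", "kmi:None-KMi-Member")]
```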

Current Support for Evaluation

• Gold standard based: examples include GATE [1], LA [2], BDM [3]

• Feature: assessing the performance of the information extraction techniques used

• Not suitable for evaluating semantic metadata: gold standard annotations are often not available
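For contrast, a gold-standard-based evaluation of the kind these tools support boils down to scoring extracted annotations against manually created ones. A minimal sketch in Python (the annotation tuples are invented; real metrics such as BDM [3] also credit near-misses by ontological distance, which exact matching ignores):

```python
def precision_recall_f1(extracted: set, gold: set):
    """Exact-match scores of extracted annotations against a gold standard."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Invented example annotations: (entity mention, asserted class)
gold = {("Enrico Motta", "kmi:KMi-Member"), ("IBM", "kmi:Organisation")}
extracted = {("Enrico Motta", "kmi:KMi-Member"), ("Sun Microsystems", "kmi:Person")}
print(precision_recall_f1(extracted, gold))  # -> (0.5, 0.5, 0.5)
```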

The Semantic Metadata Acquisition Scenario

[Diagram: KMi news stories are processed by an information extraction engine (ESpotter) and, together with departmental databases, fed into a semantic data transformation engine; the resulting raw metadata passes through evaluation to become high-quality metadata.]

• Evaluation needs to take place dynamically whenever a new entry is generated.

• In this context, a gold standard is NOT available.
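A minimal sketch of that dynamic setting, assuming a hypothetical evaluate_entry callable and plain lists for the stores (this is not the ASDI code, just the shape of the per-entry hook):

```python
def on_new_entry(entry, evaluate_entry, repository, quarantine):
    """Called whenever the transformation engine emits a raw metadata entry.

    evaluate_entry: a callable returning a (possibly empty) list of detected
    quality problems; repository and quarantine are plain lists here.
    """
    problems = evaluate_entry(entry)
    if problems:
        quarantine.append((entry, problems))   # hold back dubious data for review
    else:
        repository.append(entry)               # accept into the high-quality store
```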

Our Approach

• Using available knowledge instead of asking for gold standard annotations
  – Domain-specific knowledge sources: domain ontologies, data repositories, domain-specific lexicons
  – Background knowledge: the Semantic Web, the Web, and general lexicon resources

• Advantages
  – Enables fully automatic operation
  – Enables large-scale data evaluation

Using Domain Knowledge

1. Domain ontologies: constraints and restrictions → inconsistent annotations
   Example: one person classified as both KMi-Member and None-KMi-Member when they are disjoint classes.

2. Domain lexicons: lexicon-to-instance mappings → duplicate annotations
   Example: OU and Open-University both appear as values of the same property of the same instance.

3. Domain data repositories → ambiguous annotations, inaccurate annotations
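A minimal sketch of checks 1 and 2 in plain Python (the authors' engine uses Pellet plus Reiter's diagnosis algorithm for the consistency part; the data and lexicon below are invented):

```python
from collections import defaultdict

def inconsistent_annotations(type_assertions, disjoint_pairs):
    """Instances asserted to belong to two classes the ontology declares disjoint."""
    classes_of = defaultdict(set)
    for instance, cls in type_assertions:            # (instance, class) pairs
        classes_of[instance].add(cls)
    return [
        (inst, a, b)
        for inst, classes in classes_of.items()
        for (a, b) in disjoint_pairs
        if a in classes and b in classes
    ]

def duplicate_annotations(property_values, lexicon):
    """Two values of the same property of the same instance that the domain
    lexicon maps to one canonical entity (e.g. 'OU' and 'Open-University')."""
    canonical = defaultdict(set)
    for instance, prop, value in property_values:    # (instance, property, value)
        canonical[(instance, prop, lexicon.get(value, value))].add(value)
    return {key: values for key, values in canonical.items() if len(values) > 1}

lexicon = {"OU": "Open-University", "Open-University": "Open-University"}
print(inconsistent_annotations(
    [("person-1", "KMi-Member"), ("person-1", "None-KMi-Member")],
    [("KMi-Member", "None-KMi-Member")]))
print(duplicate_annotations(
    [("story-42", "mentions", "OU"), ("story-42", "mentions", "Open-University")],
    lexicon))
```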

• When nothing can be found in the domain knowledge, the data can be:
  – Correct but outside the domain (e.g., IBM in the KMi domain)
  – Inaccurate: a mis-classification (e.g., Sun Microsystems as a person)
  – Spurious (e.g., workshop chair as an organization)

• Background knowledge is then used to investigate these problems further.
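A minimal sketch of that decision for entities the domain knowledge cannot confirm. The lookup functions stand in for queries against background knowledge (the Semantic Web, the Web, WordNet) and are passed in as plain callables; they are assumptions for illustration, not a real WATSON or PANKOW API:

```python
def classify_unknown(entity, asserted_class, class_of_in_background, is_instance_like):
    """Decide what an entity unknown to the domain knowledge probably is.

    class_of_in_background(entity) -> a class name found in background
        knowledge, or None; is_instance_like(entity) -> whether background
        sources treat the term as a real-world entity at all.
    """
    background_class = class_of_in_background(entity)
    if background_class == asserted_class:
        return "correct but outside the domain"   # e.g. IBM in the KMi domain
    if background_class is not None:
        return "inaccurate (mis-classified)"      # e.g. Sun Microsystems as a person
    if not is_instance_like(entity):
        return "spurious"                         # e.g. 'workshop chair' as an organisation
    return "undecided"

# Example with toy lookups:
background = {"IBM": "Organisation", "Sun Microsystems": "Organisation"}
verdict = classify_unknown(
    "Sun Microsystems", "Person",
    class_of_in_background=background.get,
    is_instance_like=lambda e: e in background)
print(verdict)  # -> "inaccurate (mis-classified)"
```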

The Overall Picture

[Diagram: the evaluation engine (Pellet + Reiter) evaluates metadata in two steps. Step 1 uses domain knowledge (ontologies, lexical resources); Step 2 uses background knowledge, drawing on the Semantic Web, the Web, and WordNet via tools such as WATSON, SemSearch, and PANKOW, and produces the metadata evaluation results.]
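A minimal sketch of the two-step flow in the figure, reusing the hypothetical helpers sketched earlier (the entry structure and check callables are assumptions, not the actual engine, which couples Pellet with Reiter's algorithm for the consistency part):

```python
def evaluate_entry(entry, domain_checks, background_check):
    """Return a list of detected quality problems for one metadata entry.

    entry: a dict with the entry's annotations plus an 'unconfirmed' list of
        entities the domain knowledge could not confirm (hypothetical layout);
    domain_checks: callables like the disjointness and duplicate checks above;
    background_check: a callable like classify_unknown, partially applied.
    """
    problems = []
    for check in domain_checks:                   # Step 1: domain knowledge
        problems.extend(check(entry))
    for entity in entry.get("unconfirmed", []):   # Step 2: background knowledge
        verdict = background_check(entity)
        if verdict != "correct but outside the domain":
            problems.append((verdict, entity))
    return problems
```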

Addressed Quality Problems

[Diagram: the Quality Problems figure repeated, indicating which of the annotation problems (a)–(f) are addressed by the approach.]

Experiments

• Data setting: gathered in our previous work [4] on the KMi semantic web portal
  – Randomly chose 36 news stories from the KMi news archive
  – Collected a metadata set using ASDI
  – Constructed a gold standard annotation

• Method:
  – A gold standard based evaluation as a comparison baseline
  – Evaluating the data set using domain knowledge only
  – Evaluating the data set using both domain knowledge and background knowledge

Results:
• A number of entities are not contained in the problem domain
• Background knowledge is useful in data evaluation

Discussion

• The performance of such an approach largely depends on:
  – A good domain-specific knowledge source
  – The entities in the data set being well covered by public background sources; otherwise there would be many false alarms

References

1. H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL02), 2002.

2. P. Cimiano, S. Staab, and J. Tane. Acquisition of Taxonomies from Text: FCA meets NLP. In Proceedings of the ECML/PKDD Workshop on Adaptive Text Extraction and Mining, pages 10 – 17, 2003.

3. D. Maynard, W. Peters, and Y. Li. Metrics for Evaluation of Ontology-based Information Extraction. In Proceedings of the 4th International Workshop on Evaluation of Ontologies on the Web, Edinburgh, UK, May 2006.

4. Y. Lei, M. Sabou, V. Lopez, J. Zhu, V. S. Uren, and E. Motta. An Infrastructure for Acquiring High Quality Semantic Metadata. In Proceedings of the 3rd European Semantic Web Conference, 2006.