Toward Semantic Web Information Extraction
B. Popov, A. Kiryakov, D. Manov, A. Kirilov, D. Ognyanoff, M. Goranov
Presenter: Yihong Ding

Post on 22-Dec-2015

Toward a Semantic Web

Fully automatic methods for semantic annotation are needed.

Related topics:
- Information retrieval (IR)
- Information extraction (IE)
- Named-entity recognition (NER)
- Annotation processes

Semantic Annotation Diagram

Named Entities

Named Entities (NE):
- People, organizations, locations, and other things referred to by name
- May also include scalars and expressions: numbers, amounts of money, dates, etc. (NUMEX, TIMEX)

Hypothesis: the named entities (and the relations between them) mentioned in a resource constitute an important part of its semantics.

Semantic Annotation of NEs

Semantic annotation of the NEs in a text includes:
- Recognition of the type of each entity in the text
- Identification of the individual entity

Comparison: the traditional NER approach results in

    <Person>Yihong Ding</Person>

whereas the semantic annotation of NEs should result in something like the following:

    <BYUPerson ID="http://..byu../YihongDing">Yihong Ding</BYUPerson>
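The contrast between the two annotation styles can be sketched in a few lines of Python. This is only an illustration, not code from KIM: the `Annotation` record, the `to_inline_tag` helper, and the `example.org` URI are all hypothetical names introduced here. The point is that a traditional NER annotation carries only a type, while a semantic annotation carries an ontology class plus an identifier for the individual.

```python
from dataclasses import dataclass
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Annotation:
    """One annotated span (hypothetical structure, for illustration only)."""
    text: str                          # surface string found in the document
    entity_class: str                  # NE type or ontology class name
    entity_uri: Optional[str] = None   # None means type only (traditional NER)


def to_inline_tag(a: Annotation) -> str:
    """Render the annotation as an inline XML-style tag."""
    body = escape(a.text)
    if a.entity_uri is None:
        # Traditional NER: only the type of the entity is known.
        return f"<{a.entity_class}>{body}</{a.entity_class}>"
    # Semantic annotation: class plus a reference to the individual.
    return f"<{a.entity_class} ID={quoteattr(a.entity_uri)}>{body}</{a.entity_class}>"


ner = Annotation("Yihong Ding", "Person")
# The URI below is a made-up placeholder, not the one from the slide.
semantic = Annotation("Yihong Ding", "BYUPerson",
                      "http://example.org/people/YihongDing")

print(to_inline_tag(ner))
print(to_inline_tag(semantic))
```

Keeping the URI optional makes the relationship explicit: a semantic annotation is a type annotation with one extra commitment, the identity of the individual.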

The KIM Platform

The Knowledge and Information Management (KIM) Platform provides:
- Automatic semantic annotation of NEs (and the relations between them)
- Ontology population with NE individuals and relations
- Indexing and retrieval w.r.t. NEs
- Query and navigation over the formal knowledge

KIM Constituents

KIM Ontology (KIMO)

KIM World KB

KIM Server – with API for remote access and integration

Front-ends: KIM Web UI, Plug-in for Internet Explorer, and KB Explorer

KIM Bases

KIM is based on the following open-source platforms:
- GATE – NLP and IE platform from the University of Sheffield
- Sesame – RDF(S) repository by Aidministrator b.v.
- Ontology Middleware and Custom Inference by Ontotext, as extensions of Sesame
- Lucene – open-source IR engine from Apache

KIM Architecture

KIM Ontology (KIMO)

- Light-weight upper-level ontology
- 250 NE classes
- 100 relations and attributes
- Covers mostly NE classes, and ignores general concepts
- Includes classes representing lexical resources

www.ontotext.com/KIM/kimo.rdfs

KIM World KB

- A projection of the world (domain ontology)
- Quasi-exhaustive coverage of the most popular entities in the world
- Entities of general importance, like the ones that appear in the news
- At present the KIM KB consists of about 200,000 entities: 50,000 locations, 130,000 organizations, 6,000 people, etc.

Entity Description

NEs are represented in the KIM World KB by their semantic descriptions, consisting of:
- Aliases (Florida & FL)
- Relations with other entities (Person hasPosition Position)
- Attributes (latitude & longitude of geographic entities)
- Proper class of the NE
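The four parts of an entity description map naturally onto a small record type. The sketch below is an assumption about how such a description could be modelled in plain Python, not the KIM data model: the `EntityDescription` class, the `Province` class name, the `subRegionOf` relation, the `example.org` URIs, and the rough coordinates are all illustrative placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class EntityDescription:
    """Hypothetical in-memory form of a semantic description of one NE."""
    uri: str                  # identifier of the individual
    entity_class: str         # proper (most specific) class of the NE
    aliases: set[str] = field(default_factory=set)
    # Relations to other entities as (relation name, target URI) pairs.
    relations: list[tuple[str, str]] = field(default_factory=list)
    # Literal-valued attributes, e.g. coordinates of a geographic entity.
    attributes: dict[str, float] = field(default_factory=dict)

    def matches(self, surface: str) -> bool:
        """True if the surface string equals one of the aliases, ignoring case."""
        return surface.casefold() in {a.casefold() for a in self.aliases}


florida = EntityDescription(
    uri="http://example.org/kb/Florida",        # placeholder URI
    entity_class="Province",                    # assumed class name
    aliases={"Florida", "FL"},
    relations=[("subRegionOf", "http://example.org/kb/UnitedStates")],
    attributes={"latitude": 27.8, "longitude": -81.7},  # rough values
)

print(florida.matches("FL"))       # alias lookup
print(florida.matches("Georgia"))  # not an alias of this entity
```

Case-insensitive alias matching is a simplification chosen for brevity; a short alias such as "FL" would in practice need context to avoid false matches.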

KIM Server

APIs for:
- Semantic annotation
- Document persistence
- Indexing & retrieval of documents w.r.t. NEs
- Semantic repository access & exploration

KIM Semantic Information Extraction

- Based on the GATE NLP and IE platform
- Rules are now based on ontology classes instead of a flat set of NE types
- Recognition and identification of the NEs
- IE supported by a semantic repository containing lexical and gazetteer resources
- Annotations referring to entity descriptions
- Ontology population with the newly recognized entities & relations
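A gazetteer-backed lookup is the simplest way to see how recognition (which class?) and identification (which individual?) can happen in one step. The following is a minimal sketch in plain Python and does not use or imitate the GATE API: the `GAZETTEER` table, its class names, the `example.org` URIs, and the `annotate` function are assumptions made for illustration. Grammar rules over ontology classes, disambiguation, and ontology population are not covered.

```python
import re
from typing import NamedTuple


class Mention(NamedTuple):
    start: int         # character offset where the mention begins
    end: int           # character offset just past the mention
    text: str          # matched surface string
    entity_class: str  # ontology class from the gazetteer
    entity_uri: str    # identifier of the individual


# Hypothetical gazetteer: alias -> (ontology class, entity URI).
GAZETTEER = {
    "Florida": ("Province", "http://example.org/kb/Florida"),
    "FL": ("Province", "http://example.org/kb/Florida"),
    "University of Sheffield": ("University",
                                "http://example.org/kb/UniversityOfSheffield"),
    "Sheffield": ("City", "http://example.org/kb/Sheffield"),
}


def annotate(text: str, gazetteer: dict = GAZETTEER) -> list[Mention]:
    """Find gazetteer aliases in text and attach class and URI to each."""
    # Longest aliases first, so "University of Sheffield" wins over "Sheffield".
    aliases = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(a) for a in aliases) + r")\b"
    )
    mentions = []
    for m in pattern.finditer(text):
        cls, uri = gazetteer[m.group(0)]
        mentions.append(Mention(m.start(), m.end(), m.group(0), cls, uri))
    return mentions


sample = "GATE was developed at the University of Sheffield, not in FL."
for mention in annotate(sample):
    print(mention)
```

Because each alias maps to a URI rather than a bare type, two different surface forms ("Florida", "FL") resolve to the same individual, which is what separates this from a flat list of NE types.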

KIM IE Pipeline

KIM Plug-in

KIM IE Performance

Evaluated over 3 human-annotated corpora of news articles (~500 articles): International Business News, International Political News, and UK Political News.
- Precision 86%, recall 84% w.r.t. the standard NE types
- But these metrics are not representative of semantic annotation
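For reference, precision and recall over NE types can be computed by comparing the set of system annotations with the set of human annotations. The sketch below assumes exact-match scoring on `(start, end, type)` triples, which is one common convention and not necessarily the one used in the evaluation above; the toy data is invented and unrelated to the reported 86% and 84%.

```python
def precision_recall(gold: set, predicted: set) -> tuple[float, float]:
    """Exact-match precision and recall between two sets of annotations."""
    true_positives = len(gold & predicted)
    # Precision: share of system annotations that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of human annotations that the system found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy annotations as (start offset, end offset, NE type); invented data.
gold = {(0, 11, "Person"), (20, 27, "Location"),
        (35, 44, "Organization"), (50, 54, "Date")}
predicted = {(0, 11, "Person"), (20, 27, "Location"),
             (35, 44, "Location"), (50, 54, "Date")}

p, r = precision_recall(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f}")
```

Note that the third predicted span has the right boundaries but the wrong type, so under exact matching it counts as both a false positive and a miss. This all-or-nothing behaviour is part of why such figures say little about class-level and instance-level correctness.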

Semantic Annotation Metrics

There are no established metrics for semantic annotation:
- No human-annotated corpora with precise class and instance information
- No metrics for the various partial matches:
  - When a more specific class is recognized
  - When a more general class is recognized
  - When the class is correctly recognized, but the individual entity is not correctly identified
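The three partial-match cases can at least be told apart mechanically once a class hierarchy is available. The sketch below only labels the kind of match; it deliberately assigns no scores or weights, since no established metric exists for that. The `PARENT` hierarchy fragment, the class names, and the `match_kind` function are hypothetical.

```python
# Hypothetical fragment of a class hierarchy: child class -> parent class.
PARENT = {
    "Politician": "Person",
    "Person": "Agent",
    "Organization": "Agent",
    "Agent": "Entity",
}


def ancestors(cls: str) -> list[str]:
    """All superclasses of cls, nearest first."""
    result = []
    while cls in PARENT:
        cls = PARENT[cls]
        result.append(cls)
    return result


def match_kind(gold_class, gold_uri, pred_class, pred_uri) -> str:
    """Label how a predicted annotation relates to the gold annotation."""
    if pred_class == gold_class:
        # Same class: either fully correct, or the wrong individual.
        return "exact" if pred_uri == gold_uri else "wrong_individual"
    if gold_class in ancestors(pred_class):
        return "more_specific_class"   # prediction is a subclass of gold
    if pred_class in ancestors(gold_class):
        return "more_general_class"    # prediction is a superclass of gold
    return "mismatch"


print(match_kind("Person", "kb:p1", "Person", "kb:p1"))
print(match_kind("Person", "kb:p1", "Person", "kb:p2"))
print(match_kind("Person", "kb:p1", "Politician", "kb:p1"))
print(match_kind("Person", "kb:p1", "Agent", None))
```

A metric built on top of these labels would still have to decide how much credit each partial case deserves, which is exactly the open question raised above.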

Conclusion

- It is possible to adopt traditional IE techniques for semantic annotation
- It is worth using almost-exhaustive entity knowledge for IE
- KIM is still under development:
  - Proper evaluation metrics
  - Precise disambiguation
  - More advanced IE techniques
  - KIM ontology and KB development