Post on 21-Dec-2015

Integration of Information Extraction with an Ontology

M. Vargas-Vera, J. Domingue, Y. Kalfoglou, E. Motta and S. Buckingham Shum

Introduction

• Ontology -> Information Extractor

• English text (NLP)

• Group of tools in their IE system:

– KMi Ontology

– From UMass: Marmot, Crystal, Badger

– OCML preprocessor

Presentation Layout

• Background on tool origins and area of work

• Description of tool integration

• Coping with ambiguity

• Description of output

• Population of Ontology

• Future Work

UMass: University of Massachusetts Amherst

• Marmot, Crystal, Badger

– Classify text by recognizing extraction patterns and semantic features associated with slots in predefined frames.

Testing Area: KMi Planet

• Web-based news server

– Story Library: collections of news stories and postings

– Ontology Library: ontologies stored for use in extracting information from the story library

• Uses OCML

• myPlanet uses cue-phrases defined as “research areas” to query KMi Planet through the ontology library and the information extraction tools described next

The Ontology Library

• 40 different types of events or activities that can be described by the ontology library.

• Event type 3: demonstration-of-technology. Each slot is listed as slot-name (type) (example value):

– technology-being-demonstrated (technology) (Info Extraction)

– has-duration (duration) (30 min)

– start-time (time-point) (3:30pm)

– end-time (time-point) (4pm)

– has-location (a place) (room 120 TMCB BYU campus)

– other-agents-involved (list of person(s)) (Dr. Embley)

– main-agent (list of person(s)) (Brian Goodrich)

– location-at-start (a place) (room 120 TMCB BYU campus)

– location-at-end (a place) (room 120 TMCB BYU campus)

– medium-used (equipment) (multimedia projector, ppt)

– subject-of-the-demo (title) (Integration of Information Extraction with an Ontology)
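The slot list above can be pictured as a frame with one field per slot. The following is only a sketch in Python, not the OCML definition used in the ontology library: the class name, the field types, and the `unfilled_slots` helper are assumptions made for illustration, and the field names follow the slot names with hyphens turned into underscores.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DemonstrationOfTechnology:
    """Hypothetical Python mirror of the demonstration-of-technology event type."""
    technology_being_demonstrated: Optional[str] = None
    has_duration: Optional[str] = None
    start_time: Optional[str] = None
    end_time: Optional[str] = None
    has_location: Optional[str] = None
    other_agents_involved: List[str] = field(default_factory=list)
    main_agent: List[str] = field(default_factory=list)
    location_at_start: Optional[str] = None
    location_at_end: Optional[str] = None
    medium_used: List[str] = field(default_factory=list)
    subject_of_the_demo: Optional[str] = None

    def unfilled_slots(self) -> List[str]:
        """Names of slots the extractor has not filled yet, in declaration order."""
        return [name for name, value in vars(self).items()
                if value is None or value == []]


# Fill some slots with the example values from the slide.
demo = DemonstrationOfTechnology(
    technology_being_demonstrated="Info Extraction",
    has_duration="30 min",
    start_time="3:30pm",
    end_time="4pm",
    other_agents_involved=["Dr. Embley"],
    main_agent=["Brian Goodrich"],
)
print(demo.unfilled_slots())
```

A frame like this makes it easy to see which slots a story left empty, which matters later when deciding whether an instance can be added to the ontology automatically.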

Marmot

• Natural Language Processor

• Segments text into Noun, Verb, and Prepositional Phrases

“John Domingue Wed, 15 Oct 1997.

David Brown, University for Industry visits the OU.”

•<ex> 2 1

•SUBJ(1): DAVID BROWN %COMMA% UNIVERSITY

•PP (2): FOR INDUSTRY

•VB (3): VISITS

•OBJ1(4): THE OU

•PUNC(5): %PERIOD%

•</ex>

•<ex> 1 1

•SUBJ(1): JOHN DOMINGUE

•ADVP(2): @WED_%COMMA%_15_OCT_1997@

•PUNC(3): %PERIOD%

•</ex>
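To make the structure of the Marmot output above easier to see, here is a small parsing sketch. It assumes the output is plain text with one segment per line in the form `TAG(n): WORDS` between `<ex>` and `</ex>` markers, as in the example. The function name and the returned dictionary layout are hypothetical, and the two numbers after `<ex>` are kept as an uninterpreted string because the slides do not say what they mean.

```python
import re
from typing import Dict, List

# One segment line, e.g. "PP (2): FOR INDUSTRY" or "SUBJ(1): JOHN DOMINGUE".
SEGMENT = re.compile(r"^([A-Z0-9]+)\s*\((\d+)\):\s*(.*)$")


def parse_marmot(text: str) -> List[Dict]:
    """Split Marmot-style output into sentences, each a list of (tag, index, words)."""
    sentences: List[Dict] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("<ex>"):
            current = {"header": line[len("<ex>"):].strip(), "segments": []}
        elif line.startswith("</ex>"):
            if current is not None:
                sentences.append(current)
            current = None
        elif current is not None:
            match = SEGMENT.match(line)
            if match:
                tag, index, words = match.groups()
                current["segments"].append((tag, int(index), words))
    return sentences


sample = """
<ex> 2 1
SUBJ(1): DAVID BROWN %COMMA% UNIVERSITY
PP (2): FOR INDUSTRY
VB (3): VISITS
OBJ1(4): THE OU
PUNC(5): %PERIOD%
</ex>
<ex> 1 1
SUBJ(1): JOHN DOMINGUE
ADVP(2): @WED_%COMMA%_15_OCT_1997@
PUNC(3): %PERIOD%
</ex>
"""

parsed = parse_marmot(sample)
for sentence in parsed:
    print(sentence["header"], [tag for tag, _, _ in sentence["segments"]])
```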

Crystal

• Dictionary induction tool

• Uses keywords to annotate text with semantic tags:

•Visitor (<VI> David Brown <VI>)

•Place (<PL> the OU <PL>)

• Specific-to-general, data-driven search

•Relaxes constraints on initial definitions until it finds the most specific definition that covers all instances of the word in the text.

•Retains results for future use

•Tested on over 300 stories, 100% precision and recall
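The constraint-relaxation idea can be illustrated with a deliberately simplified sketch. This is not Crystal's actual algorithm or data format: it assumes a definition is just a mapping from syntactic slot to a set of required words, and that two definitions are relaxed by keeping only what they share. The second training instance ("JANE DOE ... THE LAB") is made up for the example.

```python
from typing import Dict, FrozenSet

# Assumed representation: syntactic slot -> words that must appear in that slot.
Definition = Dict[str, FrozenSet[str]]


def unify(a: Definition, b: Definition) -> Definition:
    """Relax two definitions just enough that one definition covers both.

    Slots present in only one definition are dropped; within shared slots,
    only the words common to both are kept (an empty set means "any words").
    """
    return {slot: a[slot] & b[slot] for slot in a.keys() & b.keys()}


def covers(definition: Definition, instance: Definition) -> bool:
    """True if the instance satisfies every constraint in the definition."""
    return all(slot in instance and required <= instance[slot]
               for slot, required in definition.items())


first = {
    "SUBJ": frozenset({"DAVID", "BROWN"}),
    "VB": frozenset({"VISITS"}),
    "OBJ1": frozenset({"THE", "OU"}),
}
second = {  # hypothetical second instance
    "SUBJ": frozenset({"JANE", "DOE"}),
    "VB": frozenset({"VISITS"}),
    "OBJ1": frozenset({"THE", "LAB"}),
}

general = unify(first, second)
print(general["VB"], covers(general, first), covers(general, second))
```

In this toy version the verb constraint survives while the subject constraint is relaxed to "any words", which is the flavour of generalisation the slide describes.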

Badger

•http://rockape.qgl.org/crap/badger.swf

• Matches sentences from the text against the concept nodes passed from Crystal, and selects the best match as the concept node with the largest number of matching features.

• Can remove irrelevant sentences from the problem set.
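The matching step can be sketched as a maximum-overlap search. The sketch assumes sentences and concept nodes are both reduced to sets of feature strings. That encoding, the function name, and the "visiting-a-place" node are assumptions for illustration, not Badger's real input format.

```python
from typing import Dict, Optional, Set


def best_concept_node(sentence_features: Set[str],
                      concept_nodes: Dict[str, Set[str]]) -> Optional[str]:
    """Return the concept node sharing the most features with the sentence.

    Returns None when no node shares any feature, which is how this sketch
    treats an irrelevant sentence that can be dropped.
    """
    best_name, best_score = None, 0
    for name, features in concept_nodes.items():
        score = len(sentence_features & features)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


concept_nodes = {
    "visiting-a-place": {"SUBJ:person", "VB:VISITS", "OBJ1:place"},  # hypothetical node
    "demonstration-of-technology": {"SUBJ:person", "VB:DEMONSTRATES", "OBJ1:technology"},
}

visit_sentence = {"SUBJ:person", "VB:VISITS", "OBJ1:place"}
byline_sentence = {"ADVP:date"}

print(best_concept_node(visit_sentence, concept_nodes))
print(best_concept_node(byline_sentence, concept_nodes))
```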

• (Fairly certain whoever wrote this section did not speak English as a first language.)

Coping with Ambiguity

• Query list of institutions

• Query list of projects

• Return list of institutions – no match

• Return list of projects – match

• No discussion of whether this was done automatically by the extractor or manually by the users.
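If the lookup were automated, it might look like the sketch below: an ambiguous name is checked against each known list, and the lists that contain it decide its type. The lists, their contents, and the function name are made up for illustration. The slides do not describe the actual mechanism.

```python
from typing import Dict, List, Set


def matching_types(name: str, lookup_lists: Dict[str, Set[str]]) -> List[str]:
    """Return every type whose list contains the name (case-insensitive)."""
    key = name.strip().lower()
    return [type_name for type_name, known in lookup_lists.items() if key in known]


# Hypothetical lists; in the real system these would come from the ontology.
lookup_lists = {
    "institution": {"the open university", "university for industry"},
    "project": {"myplanet"},
}

print(matching_types("myPlanet", lookup_lists))
print(matching_types("University for Industry", lookup_lists))
```

An empty result, or a result with more than one type, would signal that the name is still ambiguous and needs another source of evidence.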

OCML Code Translator (Operational Conceptual Modeling Language)

• Tokenise Badger output, find corresponding CN definitions and extract all the objects found in the story
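The translation step can be sketched as follows. The sketch assumes Badger's output carries the semantic tags shown on the Crystal slide (`<VI>` for visitor, `<PL>` for place) and that the target is an OCML `def-instance` form. The tag-to-slot table, the instance name `visit-1`, and the class name `visiting-a-place-or-people` are all hypothetical.

```python
import re
from typing import List, Tuple

# Assumed mapping from semantic tags to ontology slot names.
TAG_TO_SLOT = {"VI": "visitor", "PL": "place"}

# Matches "<TAG> text <TAG>" and also "<TAG> text </TAG>".
TAGGED = re.compile(r"<(\w+)>\s*(.*?)\s*</?\1>")


def extract_objects(tagged_text: str) -> List[Tuple[str, str]]:
    """Tokenise tagged output into (slot, value) pairs for the known tags."""
    return [(TAG_TO_SLOT[tag], value)
            for tag, value in TAGGED.findall(tagged_text)
            if tag in TAG_TO_SLOT]


def symbol(text: str) -> str:
    """Turn free text into a Lisp-style symbol, e.g. 'the OU' -> 'the-ou'."""
    return re.sub(r"\s+", "-", text.strip().lower())


def to_ocml_instance(instance_name: str, class_name: str,
                     objects: List[Tuple[str, str]]) -> str:
    """Render the extracted objects as a def-instance form (sketch only)."""
    slots = " ".join(f"({slot} {symbol(value)})" for slot, value in objects)
    return f"(def-instance {instance_name} {class_name}\n  ({slots}))"


objects = extract_objects("<VI> David Brown <VI> visits <PL> the OU <PL>")
ocml = to_ocml_instance("visit-1", "visiting-a-place-or-people", objects)
print(ocml)
```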

Ontology Maintenance

• Use Badger (lexicon) and Crystal (concept) output to automatically update Ontology library whenever a new story is added to the Story library

• Some cannot be automatically updated:

– There is not enough information in the story

– No current template to match with the sentence concepts.
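The two failure cases above suggest a simple decision rule, sketched here. The template table, the required-slot sets, and the status strings are assumptions for illustration. The real system works on OCML definitions rather than Python dictionaries.

```python
from typing import Dict, List, Set, Tuple


def update_ontology(library: List[Tuple[str, Dict[str, str]]],
                    templates: Dict[str, Set[str]],
                    event_type: str,
                    slots: Dict[str, str]) -> str:
    """Add an extracted event to the library, or report why it cannot be added."""
    required = templates.get(event_type)
    if required is None:
        return "no-template"                 # no template matches the sentence concepts
    if required - set(slots):
        return "not-enough-information"      # the story left required slots empty
    library.append((event_type, dict(slots)))
    return "added"


# Hypothetical template: which slots must be filled before auto-update.
templates = {"visiting-a-place-or-people": {"visitor", "place"}}
library: List[Tuple[str, Dict[str, str]]] = []

r1 = update_ontology(library, templates, "visiting-a-place-or-people",
                     {"visitor": "david-brown", "place": "the-ou"})
r2 = update_ontology(library, templates, "visiting-a-place-or-people",
                     {"visitor": "david-brown"})
r3 = update_ontology(library, templates, "award-ceremony", {"winner": "someone"})
print(r1, r2, r3, len(library))
```

Stories that end in either of the two non-"added" states would be the ones left for manual handling.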

Conclusion

• IE system created using Marmot, Crystal, Badger and the OCML translator.

• Obtained good results in KMi stories.

Assessment

• Sporadic periods of quality technical writing, interspersed with nearly impenetrable English

• A borrowing of tools, translated to OCML and ported for KMi

Future Work

• Deriving the type of an object when it does not match a predefined template.

• Automatic creation of new classes and subclasses.

• Using this IE tool in other domains (need new training data?)

• Trying out a new Machine Learning algorithm in Crystal and comparing performance.

• Using the IE tool on hypertext.

• Saving Badger’s output in XML

• Creating a more visual GUI for the ontologies.