Services for UIMA Knowledge Integration (SUKI) | Knowledge Integration and Transformation Engine (KITE)
© 2006 IBM Corporation – All Rights Reserved
KITE Current Design and Roadmap
IBM Research
J. William Murdock, Christopher Welty, David Ferrucci
Last Update: Mar. 6, 2006
Background
Transforming Knowledge
Transforming knowledge from one form to another requires explicit mappings across ontologies.
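[Figure: KITE mapping plugins transform data expressed in a source ontology into data expressed in a target ontology]
Source ontology data: Entity (Person): Fred Center; Entity (Organization): Center Micros; Relation (ManagerOf) from Fred Center to Center Micros
Mapping rules:
Organization(?x) → SocialAggregate(?x)
Person(?x) ∧ ManagerOf(?x, ?y) → Executive(?x)
ManagerOf(?x, ?y) → hasManager(?y, ?x)
Target ontology data: Executive: Fred Center; SocialAggregate: Center Micros; hasManager from Center Micros to Fred Center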
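The three rules above can be read as ordinary forward-chaining implications. The following is a minimal Java sketch, not KITE's implementation: the `Fact` and `ManagerRules` classes are hypothetical, and the rules are hard-coded rather than supplied by a mapping plugin.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// A ground fact such as Person(FredCenter) or ManagerOf(FredCenter, CenterMicros).
// Hypothetical representation for this sketch; not KITE's own Data classes.
final class Fact {
    final String predicate;
    final List<String> args;

    Fact(String predicate, String... args) {
        this.predicate = predicate;
        this.args = Arrays.asList(args);
    }

    @Override
    public String toString() {
        return predicate + "(" + String.join(", ", args) + ")";
    }
}

// Hypothetical hard-coded mapper for the three rules on the slide.
final class ManagerRules {
    static List<Fact> apply(List<Fact> source) {
        List<Fact> target = new ArrayList<>();
        for (Fact f : source) {
            // Organization(?x) -> SocialAggregate(?x)
            if (f.predicate.equals("Organization")) {
                target.add(new Fact("SocialAggregate", f.args.get(0)));
            }
            if (f.predicate.equals("ManagerOf")) {
                String x = f.args.get(0);
                String y = f.args.get(1);
                // ManagerOf(?x, ?y) -> hasManager(?y, ?x): the arguments swap places
                target.add(new Fact("hasManager", y, x));
                // Person(?x) ^ ManagerOf(?x, ?y) -> Executive(?x): needs a join with Person facts
                boolean isPerson = source.stream()
                        .anyMatch(p -> p.predicate.equals("Person") && p.args.get(0).equals(x));
                if (isPerson) {
                    target.add(new Fact("Executive", x));
                }
            }
        }
        return target; // duplicates (e.g. one person managing two organizations) are not removed
    }
}
```

Note the join in the second rule: `Executive(?x)` is derived only when the same `?x` also appears in a `Person` fact, so a mapper for that rule needs access to more than one source item at a time.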
Motivation: Why Transform Knowledge?
Different systems have different ontologies and/or different representational schemes
Sometimes those differences are arbitrary
Other times they are specifically motivated by differences in the purposes of the systems
In either case, interoperation requires that knowledge be transformed
Reference Scenario: Transforming extracted knowledge
Transforming extracted knowledge into a form suited for reasoning.
Representations and ontologies for legacy extractors tend to be radically different from those for legacy reasoners.
Those differences are generally dramatic and are motivated by significant functional issues.
– Extraction ontologies tend to be very close to how things are expressed in language. Types are grouped by how instances of those types can be described.
– Reasoning ontologies tend to permit parsimonious rules. Types are grouped by the inferences that can be drawn over them.
A powerful, flexible framework is needed to resolve these differences.
This is not the only use for KITE, but it is an important use.
Current Implementation
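KITE-based applications
[Figure: architecture of a KITE-based application]
Source Data flows from the Source Repository through the Source Plugin, the Mapper Plugin(s), and the Target Plugin into the Target Repository as Target Data. An Ontology Language Plugin gives the source side access to the Source Ontology, and another gives the target side access to the Target Ontology. A Provenance Plugin records provenance in the Provenance Repository.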
Building KITE applications
Framework provides:
– APIs for:
  • Mapper plugins
  • Source plugins
  • Target plugins
  • Provenance plugins
  • Language plugins
– Classes for Data
– Top-level control from source → mapper → target
– Some broadly applicable plugins (of each of the types)
Application developer provides:
– Configuration for some of KITE’s broadly applicable plugins
– New, application specific plugins (if needed)
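The top-level control the framework provides (source → mapper → target) can be sketched as a single loop. This is a hedged illustration only: `ControlLoop`, the simplified `SourceResource` and `TargetResource` interfaces, and the policy of retrying deferred items exactly once are assumptions, not KITE's actual API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-ins for the KITE types named on these slides.
class Data {
    final String typeName;
    final String id;

    Data(String typeName, String id) {
        this.typeName = typeName;
        this.id = id;
    }

    @Override
    public String toString() {
        return "(" + typeName + " " + id + ")";
    }
}

class MapperNotApplicableException extends Exception {}

class MapperNotApplicableYetException extends Exception {}

interface Mapper {
    Collection<Data> map(Data d) throws MapperNotApplicableException, MapperNotApplicableYetException;
}

interface SourceResource {
    Iterator<Data> iterator();
}

interface TargetResource {
    void write(Collection<Data> data);
}

// Assumed control loop: source -> mapper -> target, retrying deferred items once.
final class ControlLoop {
    static void run(SourceResource source, Mapper mapper, TargetResource target) {
        List<Data> out = new ArrayList<>();
        List<Data> deferred = new ArrayList<>();
        for (Iterator<Data> it = source.iterator(); it.hasNext(); ) {
            Data d = it.next();
            try {
                out.addAll(mapper.map(d));
            } catch (MapperNotApplicableException e) {
                // The mapper can never handle this item: skip it.
            } catch (MapperNotApplicableYetException e) {
                deferred.add(d); // Try again after the other items have been mapped.
            }
        }
        for (Data d : deferred) {
            try {
                out.addAll(mapper.map(d));
            } catch (MapperNotApplicableException | MapperNotApplicableYetException e) {
                // Still not mappable after the retry: drop it.
            }
        }
        target.write(out);
    }
}
```

The retry only helps if the mapper's context changes between passes (for example, a stateful mapper that needs to have seen some other item first); a stateless mapper would refuse again.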
Some Built-in Broadly Applicable Components
Aggregate mappers that provide control flow
– Selection aggregate: Runs the first applicable delegate
– Cascade aggregate: Runs each delegate in order
Configurable primitive mappers
– e.g., Table lookup: Configured with a table of one-to-one source → target mappings
EKDB source, target, and provenance plugins
“Lispy” source and target plugins
UIMA type system ontology plugin
OWL ontology plugin
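A broad class of KITE applications: UIMA → OWL
[Figure: architecture of a UIMA → OWL KITE application]
UIMA Analytics (recognition, coreference, etc.) produce UIMA Analysis Results. The Source Plugin reads these as Source Data, using the UIMA Type System Plugin to interpret the Type System. Mapper Plugin(s) transform the data, and the Target Plugin writes Target Data to an OWL Store, using the OWL Ontology Plugin to interpret the OWL Ontology; the store can then be used by OWL Tools (Protégé, reasoners, etc.). A Provenance Plugin writes to the Provenance Repository.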
KITE Plugins: UML class model
[Figure: UML class diagram of the KITE plugin interfaces and built-in implementations]
– interface IntegratorPlugin
– interface Mapper: Collection<Data> map(Data)
  • sends MapperNotApplicableException and MapperNotApplicableYetException
  • interface AggregateMapper: addMapper(Mapper); init(Integrator); Collection<Data> integrateDeferred()
  • Built-in aggregate mappers: SelectionAggregate, CascadeAggregate
  • Built-in primitive mappers: TableMapper, TypeNameMapper, IdentityMapper
– interface IntegratorResource: Iterator<Instance> instanceIterator(); Iterator<Tuple> tupleIterator(); close()
  • interface SourceResource
  • interface TargetResource: write(Collection<Data>)
  • interface ProvenanceResource: write(Collection<Data>)
– interface OntologyResource: bool subsumesClass(String, String); bool subsumesProperty(String, String); ...
  • Built-in ontology resources: OwlModelResource (OntModel model), TypeSystemResource (TypeSystem typeSystem)
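To illustrate the kind of question an `OntologyResource` answers, here is a minimal in-memory sketch of `subsumesClass` over a single-inheritance class hierarchy. The argument order (superclass first), the `ParentMapOntology` class, and its `addClass` method are assumptions; per the diagram, the built-in resources instead wrap an `OntModel` (`OwlModelResource`) or a UIMA `TypeSystem` (`TypeSystemResource`).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reduced version of the OntologyResource interface on the slide.
interface OntologyResource {
    // Assumed semantics: true if superClass is subClass itself or one of its ancestors.
    boolean subsumesClass(String superClass, String subClass);
}

// Hypothetical in-memory ontology: each class name maps to its single parent.
final class ParentMapOntology implements OntologyResource {
    private final Map<String, String> parent = new HashMap<>();

    void addClass(String name, String parentName) {
        parent.put(name, parentName);
    }

    @Override
    public boolean subsumesClass(String superClass, String subClass) {
        // Walk up the parent chain from subClass until we find superClass or run out.
        for (String c = subClass; c != null; c = parent.get(c)) {
            if (c.equals(superClass)) {
                return true;
            }
        }
        return false;
    }
}
```

A parent map suffices for UIMA-style type systems, which have single inheritance; an OWL ontology can give a class several superclasses, so a resource over OWL would need a graph search instead. The loop also assumes the hierarchy has no cycles.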
KITE Data: UML class model
[Figure: UML class diagram of KITE data]
– Data: String typeName, String id
  • Instance
  • LabeledInstance: String canonicalForm, String[] variantForms
  • Tuple: List<String> arguments
(assorted convenience methods & data structures not shown)
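The data model above translates almost directly into Java classes. The field names follow the diagram; the constructors, the `toString` formats, and the assumption that `LabeledInstance` extends `Instance` are illustrative choices, not taken from the KITE source.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the KITE data model; field names follow the class diagram,
// constructors and toString formats are assumptions for illustration.
abstract class Data {
    final String typeName;
    final String id;

    Data(String typeName, String id) {
        this.typeName = typeName;
        this.id = id;
    }
}

// An individual such as (Nation uid11).
class Instance extends Data {
    Instance(String typeName, String id) {
        super(typeName, id);
    }

    @Override
    public String toString() {
        return "(" + typeName + " " + id + ")";
    }
}

// An instance with a canonical name and alternative surface forms
// (assumed here to be a subclass of Instance).
class LabeledInstance extends Instance {
    final String canonicalForm;
    final String[] variantForms;

    LabeledInstance(String typeName, String id, String canonicalForm, String... variantForms) {
        super(typeName, id);
        this.canonicalForm = canonicalForm;
        this.variantForms = variantForms;
    }
}

// A relation whose arguments are instance ids, e.g. (governs uid11a uid11b).
class Tuple extends Data {
    final List<String> arguments;

    Tuple(String typeName, String id, String... arguments) {
        super(typeName, id);
        this.arguments = Arrays.asList(arguments);
    }

    @Override
    public String toString() {
        return "(" + typeName + " " + String.join(" ", arguments) + ")";
    }
}
```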
Developing mappers in KITE
The map method takes a source Data item and returns any number of target Data items.
map throws:
– MapperNotApplicableException: indicates that the mapper cannot be run on this data at all
– MapperNotApplicableYetException: indicates that the mapper could be run on this data in a different context; recommends that the caller try again later
Example: (Nation uid11) → Mapper → (NationalGovernment uid11a), (GeographicRegion uid11b), (governs uid11a uid11b)
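The Nation example above can be written as a primitive mapper. This sketch uses minimal hypothetical stand-ins for `Data`, `Tuple`, and the `Mapper` interface; the id scheme (appending `a` and `b` to the source id) is copied from the example rather than from any documented KITE convention.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Minimal hypothetical stand-ins for KITE's Data, Tuple, and Mapper types.
class Data {
    final String typeName;
    final String id;

    Data(String typeName, String id) {
        this.typeName = typeName;
        this.id = id;
    }

    @Override
    public String toString() {
        return "(" + typeName + " " + id + ")";
    }
}

// A relation between instance ids, printed as (typeName arg1 arg2 ...).
class Tuple extends Data {
    final List<String> arguments;

    Tuple(String typeName, String... arguments) {
        super(typeName, null); // tuple ids are not needed in this sketch
        this.arguments = Arrays.asList(arguments);
    }

    @Override
    public String toString() {
        return "(" + typeName + " " + String.join(" ", arguments) + ")";
    }
}

class MapperNotApplicableException extends Exception {}

class MapperNotApplicableYetException extends Exception {}

interface Mapper {
    Collection<Data> map(Data d) throws MapperNotApplicableException, MapperNotApplicableYetException;
}

// Hypothetical primitive mapper: one Nation becomes a government, a region, and a governs tuple.
final class NationMapper implements Mapper {
    @Override
    public Collection<Data> map(Data d) throws MapperNotApplicableException {
        if (!d.typeName.equals("Nation")) {
            throw new MapperNotApplicableException(); // can never apply to other types
        }
        String government = d.id + "a"; // id scheme copied from the slide's example
        String region = d.id + "b";
        return Arrays.asList(
                new Data("NationalGovernment", government),
                new Data("GeographicRegion", region),
                new Tuple("governs", government, region));
    }
}
```

This mapper never throws `MapperNotApplicableYetException`, since its decision depends only on the input's type; a mapper that needed, say, a previously mapped region would be the natural place to use it.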
Example primitive mapper: One-to-one lookup table
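// Assumes a field: Map<String, String> table (source type name -> target type name)
public Collection<Data> map(Data d) throws MapperNotApplicableException {
    String sourceType = d.getTypeName();
    // No entry for this type: the mapper cannot be run on this data at all
    if (!table.containsKey(sourceType))
        throw new MapperNotApplicableException();
    String targetType = table.get(sourceType);
    // Keep the source id; only the type name changes
    Instance i = new Instance(targetType, d.getId());
    List<Data> retval = new LinkedList<Data>();
    retval.add(i);
    return retval;
}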
Source | Target
PER | Person
ORG | Organization
FAC | Facility
Example: (PER uid105) → (Person uid105)
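For trying the corrected `map` method in isolation, here is a self-contained version configured with the three rows of the table above. The `Data` and `Instance` stand-ins and the fluent `put` configuration method are assumptions; the slides say only that the built-in table mapper is configured with a table of one-to-one mappings.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Minimal stand-ins (assumptions) for the KITE types used by the slide's map method.
class Data {
    private final String typeName;
    private final String id;

    Data(String typeName, String id) {
        this.typeName = typeName;
        this.id = id;
    }

    String getTypeName() { return typeName; }

    String getId() { return id; }

    @Override
    public String toString() {
        return "(" + typeName + " " + id + ")";
    }
}

class Instance extends Data {
    Instance(String typeName, String id) {
        super(typeName, id);
    }
}

class MapperNotApplicableException extends Exception {}

// Sketch of a one-to-one table mapper; the fluent put() method is hypothetical.
final class TableMapper {
    private final Map<String, String> table = new HashMap<>();

    TableMapper put(String sourceType, String targetType) {
        table.put(sourceType, targetType);
        return this;
    }

    public Collection<Data> map(Data d) throws MapperNotApplicableException {
        String sourceType = d.getTypeName();
        if (!table.containsKey(sourceType))
            throw new MapperNotApplicableException(); // type not in the table
        String targetType = table.get(sourceType);
        Instance i = new Instance(targetType, d.getId()); // same id, target type
        List<Data> retval = new LinkedList<Data>();
        retval.add(i);
        return retval;
    }
}
```

Configured as `new TableMapper().put("PER", "Person").put("ORG", "Organization").put("FAC", "Facility")`, it maps `(PER uid105)` to `(Person uid105)` and refuses any type not in the table.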
Aggregate Mappers
An aggregate mapper is composed of delegate (i.e., lower level) mappers that may be primitive or aggregate
KITE provides two built-in aggregate mapper plugins:
– Selection aggregate: The first delegate mapper that applies to the data item is applied and the other mappers are ignored
– Cascade aggregate: Each delegate mapper is run in sequence; the output of each is an input to the next
KITE also provides an API for developers to build their own aggregate mapper plugins.
[Figure: the AggregateMapper and Mapper interfaces]
interface AggregateMapper: addMapper(Mapper); init(Integrator); Collection<Data> integrateDeferred()
interface Mapper: Collection<Data> map(Data)
Selection Aggregate Mapper
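[Figure: a Selection Aggregate whose delegates are two primitive mappers, the Temporal Entity Mapper and the Physical Entity Mapper]
(Date uid15) → Selection Aggregate → (TemporalInterval uid15a), produced by the Temporal Entity Mapper
(Vehicle uid16) → Selection Aggregate → (TransportationDevice uid16a), produced by the Physical Entity Mapper
The first delegate mapper that applies to the data item is applied and the other mappers are ignored.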
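A selection aggregate can be sketched by trying each delegate in order and returning the result of the first one that does not throw `MapperNotApplicableException`. This is a minimal sketch, assuming simplified `Data` and `Mapper` types; it ignores `MapperNotApplicableYetException` and the deferral handled by `integrateDeferred()`.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical simplified stand-ins for KITE's Data and Mapper types.
class Data {
    final String typeName;
    final String id;

    Data(String typeName, String id) {
        this.typeName = typeName;
        this.id = id;
    }

    @Override
    public String toString() {
        return "(" + typeName + " " + id + ")";
    }
}

class MapperNotApplicableException extends Exception {}

interface Mapper {
    Collection<Data> map(Data d) throws MapperNotApplicableException;
}

// Sketch of a selection aggregate: the first applicable delegate wins.
final class SelectionAggregate implements Mapper {
    private final List<Mapper> delegates = new ArrayList<>();

    void addMapper(Mapper m) {
        delegates.add(m);
    }

    @Override
    public Collection<Data> map(Data d) throws MapperNotApplicableException {
        for (Mapper m : delegates) {
            try {
                return m.map(d); // first delegate that does not refuse; later ones are ignored
            } catch (MapperNotApplicableException e) {
                // This delegate does not apply; try the next one.
            }
        }
        throw new MapperNotApplicableException(); // no delegate applied
    }
}
```

Configured with a temporal delegate and a physical delegate, the aggregate maps `(Date uid15)` through the first and `(Vehicle uid16)` through the second, as in the figure. Because the aggregate itself throws when no delegate applies, it can be nested inside another aggregate.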
Cascade Aggregate Mapper
[Figure: a Cascade Aggregate whose delegates are two primitive mappers, the Political Entity Mapper followed by the Geospatial Entity Mapper]
(Nation uid11) → Cascade Aggregate → (NationalGovernment uid11a), (GeographicRegion uid11b), (governs uid11a uid11b)
• Each delegate mapper is run in sequence
• The output of each is an input to the next
• Results accumulate
• Later mappers can be defined in terms of the target ontology
• Especially useful if the target ontology is designed for reasoning
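The cascade can be sketched under one reading of the bullets above: each delegate runs once, over the original item plus everything produced so far, and all outputs accumulate. That reading, the hypothetical `Data`/`Tuple`/`Mapper` stand-ins, and the omission of deferral are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical simplified stand-ins for KITE's Data, Tuple, and Mapper types.
class Data {
    final String typeName;
    final String id;

    Data(String typeName, String id) {
        this.typeName = typeName;
        this.id = id;
    }

    @Override
    public String toString() {
        return "(" + typeName + " " + id + ")";
    }
}

class Tuple extends Data {
    final List<String> arguments;

    Tuple(String typeName, String... arguments) {
        super(typeName, null);
        this.arguments = Arrays.asList(arguments);
    }

    @Override
    public String toString() {
        return "(" + typeName + " " + String.join(" ", arguments) + ")";
    }
}

class MapperNotApplicableException extends Exception {}

interface Mapper {
    Collection<Data> map(Data d) throws MapperNotApplicableException;
}

// Sketch of a cascade aggregate: delegates run in order and results accumulate.
final class CascadeAggregate implements Mapper {
    private final List<Mapper> delegates = new ArrayList<>();

    void addMapper(Mapper m) {
        delegates.add(m);
    }

    @Override
    public Collection<Data> map(Data d) {
        List<Data> pool = new ArrayList<>(); // items visible to the next delegate
        pool.add(d);
        List<Data> results = new ArrayList<>(); // accumulated target data
        for (Mapper m : delegates) {
            List<Data> produced = new ArrayList<>();
            for (Data item : pool) {
                try {
                    produced.addAll(m.map(item));
                } catch (MapperNotApplicableException e) {
                    // This delegate does not apply to this item; leave it alone.
                }
            }
            results.addAll(produced);
            pool.addAll(produced); // outputs become inputs to the next delegate
        }
        return results;
    }
}
```

Because the second delegate sees `(NationalGovernment uid11a)`, it can be written purely against the target ontology, which is the property the slide highlights for reasoning-oriented targets.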
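EKDB HUTT→KANI: A complex KITE application
[Figure: architecture of the EKDB HUTT→KANI application]
The EKDB Extraction Source Plugin reads Source Data from the EKDB Extraction Tables, using the UIMA Type System Plugin to interpret the HUTT Type System. The HUTT→KANI Aggregate Mapper transforms the data, and the EKDB RDF Target Plugin writes Target Data to the EKDB RDF Tables, using the OWL Ontology Plugin to interpret the KANI OWL Ontology. The EKDB Extraction→RDF Provenance Plugin records provenance in the EKDB Provenance Table.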
EKDB HUTT→KANI: A complex KITE application [simplified, for paper]
[Figure: simplified architecture of the same application]
The UIMA Extraction Database Source Plugin reads from the UIMA Extraction Database (HUTT Type System, via the UIMA Type System Plugin); the HUTT→KANI Aggregate Mapper transforms the data; the RDF Store Target Plugin writes to the RDF Store Database (KANI OWL Ontology, via the OWL Ontology Plugin); and the Extraction→RDF Provenance Plugin records provenance in the UIMA/RDF Provenance Database.
EKDB HUTT→KANI: A complex KITE application
[Figure: internal structure of the HUTT→KANI Aggregate Mapper]
– HUTT→KANI (Selection Aggregate)
– HoldsDuringMapper (Primitive)
– HUTT→KANI lookup-table (Cascade Aggregate)
– HUTT→KANI type name matching (Cascade Aggregate)
– Primitive mappers: Table Mapper, Type Name Matcher, OWL-Time, RDF labels, HUTT→KANI ad hoc, TimeSlice
KITE for Queries
In some cases, the ontology in which a user (or an automated system) poses a query differs from the one in which the data is encoded.
Some KITE applications (e.g., NIMD knowledge integrator) handle this by mapping the data at indexing time.
Other KITE applications map the query at run time.
Example: KITE for JuruXML Queries
[Figure: a JuruXML query passes through the JuruXML Source Plugin, the Mapper Plugin(s) (with an Ontology Language Plugin for each ontology), and the JuruXML Target Plugin]
Source query: <Nation> Republic Angola</Nation>
As KITE data: (Nation uid1), (KEYWORD uid2 “Republic”), (KEYWORD uid3 “Angola”), (CONTAINS uid1 uid2 uid3)
Mapped KITE data: (National-Government uid1a), (Geographic-Region uid1b), (KEYWORD uid2a “Republic”), (KEYWORD uid3a “Angola”), (CONTAINS uid1a uid2a uid2b), ...(KEYWORD uid2b “Republic”)
Target queries: <NationalGovernment> Republic Angola</NationalGovernment> and <GeographicRegion>Republic Angola</GeographicRegion>
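Query-time mapping can be illustrated with a string-level rewriter that expands a single-element query into one query per mapped target type. This deliberately bypasses KITE's data representation (the KEYWORD and CONTAINS tuples in the example); `QueryRewriter` and its type map are hypothetical, and how the rewritten queries are combined (for example, as alternatives) is left to the search engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical query-time rewriter: a query of the form <Type>keywords</Type>
// whose Type is a source-ontology type becomes one query per mapped target type.
final class QueryRewriter {
    private static final Pattern ELEMENT = Pattern.compile("<(\\w+)>(.*)</\\1>", Pattern.DOTALL);
    private final Map<String, List<String>> typeMap;

    QueryRewriter(Map<String, List<String>> typeMap) {
        this.typeMap = typeMap;
    }

    List<String> rewrite(String query) {
        Matcher m = ELEMENT.matcher(query.trim());
        if (!m.matches() || !typeMap.containsKey(m.group(1))) {
            return List.of(query); // nothing to map: pass the query through unchanged
        }
        String keywords = m.group(2); // keyword content is copied as-is
        List<String> out = new ArrayList<>();
        for (String target : typeMap.get(m.group(1))) {
            out.add("<" + target + ">" + keywords + "</" + target + ">");
        }
        return out;
    }
}
```

With `Nation` mapped to `NationalGovernment` and `GeographicRegion`, the query `<Nation> Republic Angola</Nation>` becomes one query per target type with the same keywords.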
Mapping types in KITE
KITE is typically used to map concrete data (instances), but it can be used to map types in an ontology (meta-instances)
For example, KITE can map a UIMA Type System Descriptor into an OWL RDF ontology
– KITE’s built-in UIMA KLT source plugin produces one KITE “instance” for each entity type, KITE tuples for each relation type, and tuples linking each type to its parent
– KITE’s built-in OWL model target plugin takes a stream of tuples and writes them to an OWL RDF file
With KITE’s built-in “identity” mapper: a direct translation
With other mappers: a partial or more complex translation
In some cases, the mappers can then be reused to map instances across the two ontologies
– In other cases, mapping instances may depend on contextual issues that are not relevant to mapping types
Example: Mapping UIMA types to OWL classes/properties
[Figure: mapping a UIMA type system to OWL]
UIMA type system descriptor (input):
<typeDescription>
  <name>org.example.Nation</name>
  <supertypeName>org.example.Place</supertypeName>
</typeDescription>
<typeDescription>
  <name>org.example.Place</name>
  <supertypeName>org.example.TopEntity</supertypeName>
</typeDescription>
The UIMA Knowledge-Level Types Source Plugin (with the UIMA Type System Plugin) produces:
(org.example.Nation uid1)
(org.example.Place uid2)
(PARENT uid1 uid2)
The Mapper Plugin(s) produce:
(example:Country uid1a)
(example:Place uid2a)
(PARENT uid1a uid2a)
The OWL Target Plugin (with the OWL Ontology Plugin) writes:
<owl:Class rdf:about="example:Country">
  <rdfs:subClassOf>
    <owl:Class rdf:about="example:Place"/>
  </rdfs:subClassOf>
</owl:Class>
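The end-to-end effect of this example can be approximated in a few lines: rename each mapped UIMA type and keep a `subClassOf` link only when the supertype is also mapped. This is a simplified sketch, assuming a name-to-supertype map has already been read from the descriptor; it skips KITE's instance/tuple pipeline, `TypeToOwl` is hypothetical, and the output omits XML namespace declarations.

```java
import java.util.Map;

// Hypothetical one-step version of the UIMA type -> OWL class translation shown above.
final class TypeToOwl {
    // supertypes: UIMA type name -> supertype name (as read from <typeDescription> entries)
    // rename: UIMA type name -> OWL class name; unmapped types are skipped
    static String toOwl(Map<String, String> supertypes, Map<String, String> rename) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : supertypes.entrySet()) {
            String cls = rename.get(e.getKey());
            if (cls == null) {
                continue; // no mapping for this type
            }
            sb.append("<owl:Class rdf:about=\"").append(cls).append("\">\n");
            String sup = rename.get(e.getValue());
            if (sup != null) { // keep the parent link only if the parent is mapped too
                sb.append("  <rdfs:subClassOf>\n")
                  .append("    <owl:Class rdf:about=\"").append(sup).append("\"/>\n")
                  .append("  </rdfs:subClassOf>\n");
            }
            sb.append("</owl:Class>\n");
        }
        return sb.toString();
    }
}
```

Here `org.example.TopEntity` has no mapping, so `example:Place` is emitted without a superclass; whether to drop, keep, or map such links is exactly the kind of decision a non-identity mapper would encode.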
Future Development
Future Objectives
Recall that the existing framework provides:
– APIs for plugins (mappers, sources, targets, etc.) and classes for input/output data
– Control flow code
– Some broadly applicable plugins
Future versions of the framework will provide:
– APIs and classes that are better aligned with established products and standards (e.g., UIMA, Ecore)
– Control flow that is more scalable
– More built-in plugins (e.g., target plugins for existing RDF storage systems)
Tighter integration with UIMA
Many of the capabilities of KITE seem very similar to capabilities already found in UIMA.
– e.g., KITE allows developers to build an aggregate mapper and specify some control flow among delegate mappers; UIMA allows similar functionality for analytics.
If we could reuse some of that functionality, we could leverage existing UIMA infrastructure and tool support.
Furthermore, recall that our reference scenario involves transforming extracted knowledge.
– UIMA is frequently used for extraction.
– Thus developers working on our reference scenario are likely to be familiar with UIMA, and reusing UIMA capabilities in KITE would make it easier for them to “get up to speed” on KITE.
UIMA Integration Level 1: UIMA data structures & APIs
KITE defines various interfaces and classes (plugins, data, etc.). However, many elements of UIMA serve similar purposes, e.g.:
UIMA | KITE
Feature Structure | Data
Annotator | Primitive Mapper
Aggregate Analysis Engine | Aggregate Mapper
Collection Reader | Source Plugin
CAS Consumer | Target Plugin
We could redefine KITE to use the corresponding UIMA structures instead of its own customized structures.
This would allow us to use the UIMA descriptor language, the corresponding tool support, etc.
UIMA Integration Level 2: UIMA control flow
If KITE plugins were UIMA components, then presumably the UIMA collection processing manager (CPM) could provide flow control among them
Flow from source → mapper → target is handled well by UIMA’s built-in “fixed flow.”
Flow within an aggregate mapper in KITE is more complex.
– Cascade aggregate is essentially “fixed flow” with deferment
– Selection aggregate is a different flow and also requires deferment
Fortunately, flow control is a pluggable element of the UIMA framework.
Thus (presumably) the KITE built-in aggregate mapper types could be written by KITE developers as UIMA flow control plugins.
If KITE application developers wanted their own aggregate mappers, they could develop their own UIMA flow control plugins.
Ecore Integration
Recall the UML model for KITE data:
– Data: String typeName, String id
  • Instance
  • LabeledInstance: String canonicalForm, String[] variantForms
  • Tuple: List<String> arguments
There are many existing standards for storing instances and links among them.
Ecore is one such standard that has a great deal of existing tool support.
UIMA interoperability with Ecore is currently under development.
Maybe we should use Ecore for KITE data.
Larger Scale
We have been using the KITE-based “EKDB HUTT→KANI” application for a 2006 evaluation being conducted by the National Institute of Standards and Technology (NIST)
– Input: ~580 thousand entities and ~450 thousand relations extracted from a 169 MB text corpus of 37,442 documents
– The KITE-based application takes about 2 hours to run on this data.
– It requires more than 1.5 GB of Java heap space and thus can only run on a 64-bit computer.
This application must be faster and more memory efficient if it is to effectively scale to multi-GB corpora.
Some improvements will be local to specific plugins used in the application (e.g., EKDB RDF Target Plugin)
Other improvements may involve more fundamental alterations to the KITE architecture
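A back-of-the-envelope reading of these figures, assuming run time and heap are dominated by the roughly 1.03 million extracted items and scale linearly with corpus size (an assumption; actual scaling may be worse), with 1 GB taken as 1000 MB:

```latex
\begin{aligned}
N &\approx 5.8\times10^{5} + 4.5\times10^{5} = 1.03\times10^{6}\ \text{items}\\
\text{throughput} &\approx \frac{1.03\times10^{6}\ \text{items}}{2 \times 3600\ \text{s}} \approx 143\ \text{items/s}\\
\text{heap per item} &> \frac{1.5\times10^{9}\ \text{B}}{1.03\times10^{6}} \approx 1.5\ \text{kB}\\
\text{scale factor for a 1 GB corpus} &\approx \frac{1000\ \text{MB}}{169\ \text{MB}} \approx 5.9
\;\Rightarrow\; \approx 11.8\ \text{h and} > 8.9\ \text{GB of heap}
\end{aligned}
```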
Development Roadmap
Open Questions
What are the top priorities for future development?
What external requirements are driving deadlines?
– NIST evaluation
– Commercialization of SAW
– Others?
What is the timeline?