Environmental Data Exchange Network (EDEN)
MCC
Microelectronics and Computer Technology Corporation
Outline
EDEN Project Overview
InfoSleuth in a microsecond
The Ontology in InfoSleuth
Value Mapping and the Environmental Data Registry
Virtual demo
Environmental Data Exchange Network
The challenge:
• Acquisition, use and dissemination of environmental information is of increasing strategic importance to EPA, DOD, DOE, and EEA
EDEN is an application of MCC's InfoSleuth technology
• Employs intelligent agent technology through the Internet to conduct concept-based searches of heterogeneous, distributed information
The EDEN Project demonstrates how organizations can save time and money:
• Provides easy access over intranet or the Internet
• Enables users to access information from multiple sources
• Simplifies the exchange and sharing of data
• Reduces the reporting burden
• Brings information together for presentation and analysis
Common Set of Requirements
Reduce the reporting burden imposed by the parties on each other
Sharing of best available and most timely information
Enable users to access information from multiple sources
Coordinate only the common vocabulary, not the end use of information resources; focus on the inputs, with each participant individually interpreting and communicating outputs
Pilot Databases
CERCLIS-3: EPA Superfund (Oracle, VA)
ITT: EPA Remediation Technology (MS-Access, TX)
HazDat: EPA Hazardous Substances (Sybase, GA)
ERPIMS: Air Force Env. Restoration (Oracle, TX)
EEA: Basel Convention (MS-Access, TX)
IRDMIS: Army Installation Restoration (Oracle, MD)
DOE INEEL (Oracle, ID)
DOE ORNL (Oracle, TN)
InfoSleuth
System of “competent” agents for dynamic, scalable (SQL-based) access to heterogeneous distributed information sources
Ontology-based information management
Advertise-discover paradigm supported by brokering over semantic constraints
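The advertise-discover paradigm can be pictured with a small sketch. Everything named here (the Advertisement and Broker classes, and constraints represented as sets of allowed slot values) is an assumption made for illustration; it is not InfoSleuth's actual interface, which the slides describe only in terms of KQML, OKBC and LDL++.

```python
from dataclasses import dataclass, field


@dataclass
class Advertisement:
    """What a resource agent tells the broker about itself (hypothetical shape)."""
    agent: str
    ontology_class: str
    # Semantic constraints: slot name -> set of values the agent can serve.
    constraints: dict = field(default_factory=dict)


class Broker:
    """Toy broker: stores advertisements and matches them against requests."""

    def __init__(self):
        self._ads = []

    def advertise(self, ad):
        self._ads.append(ad)

    def discover(self, ontology_class, **wanted):
        """Return agents whose advertisement does not rule out the request."""
        matches = []
        for ad in self._ads:
            if ad.ontology_class != ontology_class:
                continue
            # An agent is excluded only if it advertises a constraint on a
            # slot and the requested value falls outside that constraint.
            if all(v in ad.constraints.get(slot, {v}) for slot, v in wanted.items()):
                matches.append(ad.agent)
        return matches


broker = Broker()
broker.advertise(Advertisement("agent_tx", "site", {"state": {"TX"}}))
broker.advertise(Advertisement("agent_ga", "site", {"state": {"GA"}}))
broker.advertise(Advertisement("agent_all", "site"))

print(broker.discover("site", state="TX"))  # ['agent_tx', 'agent_all']
```

The point of the sketch is that a query for Texas sites never reaches the agent that advertised only Georgia data, which is how brokering over constraints keeps the set of contacted resources small.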
InfoSleuth System
Java-based agents
Knowledge Query and Manipulation Language (KQML) message layer provides speech-act agent interface
Agent conversation shell provides structure for KQML messages
Open Knowledge Base Connectivity (OKBC) language provides semantic communication layer
Brokering reasoning provided by Logical Data Language, LDL++
More InfoSleuth Agents
JDBC Resource agents translate between application domain ontology and database schemata
Multi-resource query agent uses either LDL++ or Oracle to support query decomposition and result recomposition
Value mapper translates to/from canonical value domains
Text agent supports ontology-based query
Task execution agent manages CLIPS rule base for task planning and subscription maintenance
Sentinel and Deviation detection agents cooperate to detect complex event patterns
Basic InfoSleuth Application Recipe
6 cups ontology
3 cups resource agent configuration
1-3 cups user interface development
Lightly brown the multi-resource query agent
Pour in other agents out of the box
Stir
Serve
...
add or remove resource agents as desired
add other functionality with more configuration effort
A Distributed Query
[Figure: agent architecture for a distributed query. Labels: User, Viewer Applets, user agent, task agent, broker agent, multi-query agent, ontology agent, value map agent, resource agent, Resources, mapping info, SQL, text, Refined Data]
Purpose of The Ontology in InfoSleuth
To describe the domain with minimal ambiguity
• the structure defines the domain
• documentation strings
To be the integration hub for the DB schema
• query relaxation through the taxonomy
• vertical fragmentation
• multi-resource path expressions
To provide the language of the queries and the language of expression of the results
• value mapping
Expressing the Ontology
OKBC (Open Knowledge Base Connectivity): a standard for Knowledge Representation
Classes, Slots, Facets:
• (class Observed_Contamination)
• (template-slot-of analysis_method Observed_Contamination)
• (template-facet-value :VALUE-TYPE analysis_method Observed_Contamination :STRING)
• (template-slot-of site Observed_Contamination)
• (template-facet-value :VALUE-TYPE site Observed_Contamination Eden_Site)
Subclass and Instance-Of Links
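As a rough illustration of how the class, slot and facet assertions fit together, the sketch below holds the Observed_Contamination example in a small in-memory structure and renders it back in the notation used on the slide. The Slot and OntologyClass types and the to_okbc helper are hypothetical names introduced here; they are not part of OKBC or InfoSleuth.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    name: str
    value_type: str  # mirrors the :VALUE-TYPE facet


@dataclass
class OntologyClass:
    name: str
    slots: dict = field(default_factory=dict)

    def add_slot(self, name, value_type):
        self.slots[name] = Slot(name, value_type)


def to_okbc(cls):
    """Render a class as the assertion strings shown on the slide."""
    lines = [f"(class {cls.name})"]
    for slot in cls.slots.values():
        lines.append(f"(template-slot-of {slot.name} {cls.name})")
        lines.append(
            f"(template-facet-value :VALUE-TYPE {slot.name} {cls.name} {slot.value_type})"
        )
    return lines


# The example class from the slide: one string-valued slot and one slot
# whose values are instances of another class (Eden_Site).
observed = OntologyClass("Observed_Contamination")
observed.add_slot("analysis_method", ":STRING")
observed.add_slot("site", "Eden_Site")

for line in to_okbc(observed):
    print(line)
```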
Value Mapping Modelling
[Figure: Ontology Features. Labels: Quantity, Unit Of Measure, Person height, Distance, Meter, Foot, unit, canonical unit, STRING, data-type]
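The figure labels (Person height, Distance, Meter, Foot, canonical unit) suggest that each quantity is stored in one canonical unit and converted on the way in and out. The sketch below assumes the meter is the canonical unit for Distance, which the labels do not state outright; the table and function names are hypothetical.

```python
# Conversion factors from each unit to the assumed canonical unit (meter).
# One foot is defined as exactly 0.3048 m.
TO_CANONICAL = {"meter": 1.0, "foot": 0.3048}


def to_canonical(value, unit):
    """Convert a distance in the given unit to the canonical unit."""
    return value * TO_CANONICAL[unit]


def from_canonical(value, unit):
    """Convert a distance in the canonical unit to the requested unit."""
    return value / TO_CANONICAL[unit]


height_m = to_canonical(6.0, "foot")      # a person height given in feet
print(round(height_m, 4))                  # 1.8288
print(round(from_canonical(height_m, "foot"), 4))
```

Routing every conversion through the canonical unit means each new unit needs one factor rather than one per pair of units.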
Value mapping requirements
Translate terms in queries
• Allow users to choose a coding scheme for querying
• Query each database in terms of its own coding scheme
Translate results of queries
• Facilitate merging of data from different sources
• Display results according to user preference
Value mapping and the ontology
A class has one or more slots
Each slot has a conceptual domain name
Each slot has a preferred value domain
Resource Agents must advertise in the preferred value domain
• possibly translating to/from a different value domain
Users may query and view data in a different value domain
• User Agent handles translation to/from the preferred value domain
EDR contents
[Figure: EDR schema. Entities: Conceptual domain, CD_VM_Assoc, Value meaning, Value domain, Permissible value]
We use a specialized resource agent (map agent) to access the EDR
Additions to EDR
Downloaded files of permissible values for CAS number and Chemical name (Merck index) from EPA site
Assigned value meanings
Created value domains for CAS code, CAS padded, ycode; loaded permissible values
Added 3 extra chemical names because Merck index file was incomplete
Linking EDR to EDEN ontology
Conceptual domain   CD_ID   Value domain   VD_ID   Preferred domain   PD_ID
state               8       stateabbr      210     statename          6
state               8       statecode      219     statename          6
chemicalsubstance   123     chemicalname   357     casnumber          430
chemicalsubstance   123     caspadded      901     casnumber          430
chemicalsubstance   123     cas code       902     casnumber          430
chemicalsubstance   123     ycode          903     casnumber          430
View of the EDR
Conceptual domain   Coding scheme   Preferred value   Actual value
chemicalsubstance   chemicalname    001332214         Arsenic
CREATE VIEW edr_map (conceptual_domain, coding_scheme, preferred_value, actual_value) AS
SELECT emc.conceptual_domain, emc.value_domain, pref.pv_nm, act.pv_nm
FROM edr_map_class emc, cd_vm_assoc a, permissible_value pref, permissible_value act
WHERE a.cd_id = emc.cd_id
AND a.vm_id = act.vm_id
AND a.vm_id = pref.vm_id
AND emc.vd_id = act.vd_id
AND emc.pd_id = pref.vd_id
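To see what the view produces, the sketch below rebuilds it in an in-memory SQLite database. The table layouts are inferred from the join conditions, the view uses the four column names that the EDR lookup query relies on (conceptual_domain, coding_scheme, preferred_value, actual_value), and the value-meaning id 1 and the column types are invented for the example. The state row (8, stateabbr 210, statename 6) comes from the linking table in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edr_map_class (
    conceptual_domain TEXT, cd_id INTEGER,
    value_domain TEXT, vd_id INTEGER,
    preferred_domain TEXT, pd_id INTEGER);
CREATE TABLE cd_vm_assoc (cd_id INTEGER, vm_id INTEGER);
CREATE TABLE permissible_value (vd_id INTEGER, vm_id INTEGER, pv_nm TEXT);

-- One row of the linking table: state / stateabbr / statename.
INSERT INTO edr_map_class VALUES ('state', 8, 'stateabbr', 210, 'statename', 6);
-- vm_id 1 is a made-up value-meaning id standing for the state of Texas.
INSERT INTO cd_vm_assoc VALUES (8, 1);
INSERT INTO permissible_value VALUES (210, 1, 'TX');
INSERT INTO permissible_value VALUES (6, 1, 'Texas');

CREATE VIEW edr_map (conceptual_domain, coding_scheme,
                     preferred_value, actual_value) AS
SELECT emc.conceptual_domain, emc.value_domain, pref.pv_nm, act.pv_nm
FROM edr_map_class emc, cd_vm_assoc a,
     permissible_value pref, permissible_value act
WHERE a.cd_id = emc.cd_id
  AND a.vm_id = act.vm_id
  AND a.vm_id = pref.vm_id
  AND emc.vd_id = act.vd_id
  AND emc.pd_id = pref.vd_id;
""")

rows = conn.execute("SELECT * FROM edr_map").fetchall()
print(rows)  # [('state', 'stateabbr', 'Texas', 'TX')]
```

The two aliases of permissible_value are joined through the same value meaning, so each view row pairs a value in some coding scheme with its equivalent in the preferred domain.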
Query Processing
[Figure: query processing sequence among User Agent, Query Agent, Resource Agent, Map Agent and DBMS. Steps 1-4: Query; step 5: Query / Result; steps 6-9: Result]
Query translation
SELECT name FROM site WHERE state = 'Texas'
translated to
SELECT name FROM site WHERE state = 'TX'
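A minimal sketch of this rewriting step follows. It handles only simple equality predicates of the form column = 'literal', and the VALUE_MAP dictionary stands in for a lookup against the EDR; both the dictionary and the translate_query function are illustrative assumptions, not the resource agent's actual code.

```python
import re

# (column, user value) -> value in the target database's coding scheme.
# In EDEN this would come from the EDR; the dictionary is a stand-in.
VALUE_MAP = {("state", "Texas"): "TX"}

# Matches simple equality predicates of the form: column = 'literal'
PREDICATE = re.compile(r"(\b\w+)\s*=\s*'([^']*)'")


def translate_query(sql, value_map):
    """Replace mapped literals in equality predicates; keep unmapped literals."""
    def swap(match):
        column, literal = match.group(1), match.group(2)
        mapped = value_map.get((column, literal), literal)
        return f"{column} = '{mapped}'"
    return PREDICATE.sub(swap, sql)


print(translate_query("SELECT name FROM site WHERE state = 'Texas'", VALUE_MAP))
# SELECT name FROM site WHERE state = 'TX'
```

A production version would work on a parsed query rather than a regular expression, since literals can also appear in IN lists, range predicates and joins.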
Result translation
State Chemical
TX 1332-21-4
Translated to
State Chemical
Texas Arsenic
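The same idea applied to results can be sketched as a per-column substitution over the returned rows. The RESULT_MAP table holds only the two pairs shown on the slide and stands in for EDR lookups; the function name and the choice to pass unmapped values through unchanged are assumptions of this sketch.

```python
# column -> {database value: value in the user's preferred coding scheme}.
# The two entries are the pairs shown on the slide.
RESULT_MAP = {
    "State": {"TX": "Texas"},
    "Chemical": {"1332-21-4": "Arsenic"},
}


def translate_rows(columns, rows, result_map):
    """Translate each cell through its column's map; keep unmapped values."""
    translated = []
    for row in rows:
        translated.append(tuple(
            result_map.get(col, {}).get(value, value)
            for col, value in zip(columns, row)
        ))
    return translated


print(translate_rows(["State", "Chemical"], [("TX", "1332-21-4")], RESULT_MAP))
# [('Texas', 'Arsenic')]
```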
EDR lookup
SELECT preferred_value
FROM edr_map
WHERE actual_value = 'Benzene'
AND coding_scheme = 'chemical_name'
AND conceptual_domain = 'chemical_substance'
Outstanding issues
No match in EDR for database value
• differences in case ('Texas', 'TEXAS')
• CAS number format (dashes, leading zeros)
• word order ('n-Propyl benzene', 'Benzene, n-Propyl')
• bad data
Functional mapping needed
Approximate string matching
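One way these mismatches might be handled is to normalize values before the EDR lookup and fall back to approximate matching when no exact match exists. The sketch below is an assumption about what such functional mappings could look like, not what EDEN implemented; it uses the similarity ratio from Python's standard difflib module as a stand-in for approximate string matching, and the 0.8 cutoff is arbitrary.

```python
import difflib
import re


def normalize_cas(value):
    """Drop dashes and leading zeros so differently formatted CAS numbers compare equal."""
    return value.replace("-", "").lstrip("0")


def normalize_name(value):
    """Case-fold a name and sort its comma- or space-separated words."""
    words = re.split(r"[,\s]+", value.strip().casefold())
    return " ".join(sorted(w for w in words if w))


def closest(value, candidates, cutoff=0.8):
    """Return the best approximate match among candidates, or None."""
    hits = difflib.get_close_matches(value, candidates, n=1, cutoff=cutoff)
    return hits[0] if hits else None


print(normalize_cas("1332-21-4") == normalize_cas("001332214"))                   # True
print(normalize_name("n-Propyl benzene") == normalize_name("Benzene, n-Propyl"))  # True
print(normalize_name("Texas") == normalize_name("TEXAS"))                         # True
print(closest("Benzen", ["Benzene", "Toluene"]))                                  # Benzene
```

Normalization covers the case, format and word-order differences deterministically; approximate matching is the riskier fallback, since a close string is not guaranteed to name the same substance.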
DEMO