Environmental Data Exchange Network (EDEN). MCC Microelectronics and Computer Technology Corporation


Upload: andrew-hunt

Post on 02-Jan-2016


Page 1: Environmental Data Exchange Network (EDEN). MCC Microelectronics and Computer Technology Corporation Outline EDEN Project Overview InfoSleuth in a microsecond

MCC

Microelectronics and Computer Technology Corporation

Outline

EDEN Project Overview

InfoSleuth in a microsecond

The Ontology in InfoSleuth

Value Mapping and the Environmental Data Registry

Virtual demo


Environmental Data Exchange Network

The challenge:
• Acquisition, use and dissemination of environmental information is of increasing strategic importance to EPA, DOD, DOE, and EEA

EDEN is an application of MCC's InfoSleuth technology:
• Employs intelligent agent technology through the Internet to conduct concept-based searches of heterogeneous, distributed information

The EDEN Project demonstrates how organizations can save time and money:

• Provides easy access over intranet or the Internet

• Enables users to access information from multiple sources

• Simplifies the exchange and sharing of data

• Reduces the reporting burden

• Brings information together for presentation and analysis


Common Set of Requirements

Reduce the reporting burden imposed by the parties on each other

Share the best available and most timely information

Enable users to access information from multiple sources

Coordinate only the common vocabulary, not the end use of information resources; focus on the inputs, with each participant individually interpreting and communicating outputs


Pilot Databases

CERCLIS-3: EPA Superfund (Oracle, VA)
ITT: EPA Remediation Technology (MS-Access, TX)
HazDat: EPA Hazardous Substances (Sybase, GA)
ERPIMS: Air Force Env. Restoration (Oracle, TX)
EEA: Basel Convention (MS-Access, TX)
IRDMIS: Army Installation Restoration (Oracle, MD)
DOE INEEL (Oracle, ID)
DOE ORNL (Oracle, TN)


InfoSleuth

System of “competent” agents for dynamic, scalable (SQL-based) access to heterogeneous distributed information sources

Ontology-based information management

Advertise-discover paradigm supported by brokering over semantic constraints
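A minimal sketch of the advertise-discover idea, not InfoSleuth's actual broker (which, per a later slide, reasons in LDL++). Agents register advertisements describing what they cover, and the broker returns the agents whose advertised constraints are compatible with a request. The `Advertisement` and `Broker` names, the constraint format, and the sample agents are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Advertisement:
    """What an agent claims to serve (hypothetical structure)."""
    agent: str
    ontology_class: str
    # Slot constraints: slot name -> set of values the agent holds data for.
    constraints: dict = field(default_factory=dict)


class Broker:
    def __init__(self):
        self._ads = []

    def advertise(self, ad):
        self._ads.append(ad)

    def discover(self, ontology_class, wanted=None):
        """Return agents whose advertisement overlaps the requested constraints."""
        wanted = wanted or {}
        matches = []
        for ad in self._ads:
            if ad.ontology_class != ontology_class:
                continue
            # An unconstrained slot matches anything; otherwise values must overlap.
            compatible = all(
                slot not in ad.constraints or ad.constraints[slot] & values
                for slot, values in wanted.items()
            )
            if compatible:
                matches.append(ad.agent)
        return matches


# Illustrative agents only; names and coverage are made up.
broker = Broker()
broker.advertise(Advertisement("resource-agent-a", "Eden_Site", {"state": {"TX"}}))
broker.advertise(Advertisement("resource-agent-b", "Eden_Site", {"state": {"GA", "MD"}}))
broker.advertise(Advertisement("resource-agent-c", "Observed_Contamination"))

texas_sites = broker.discover("Eden_Site", {"state": {"TX"}})
any_sites = broker.discover("Eden_Site")
```

Matching on semantic constraints rather than on agent names is what lets resource agents be added or removed without changing the agents that issue requests.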


InfoSleuth System

Java-based agents

Knowledge Query and Manipulation Language (KQML) message layer provides speech-act agent interface

Agent conversation shell provides structure for KQML messages

Open Knowledge Base Connectivity (OKBC) language provides semantic communication layer

Brokering reasoning provided by Logical Data Language, LDL++
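To make the speech-act layer concrete, here is a small sketch that renders a KQML performative as an s-expression. The `ask-all` performative and the `:sender`, `:receiver`, `:language`, `:ontology`, `:reply-with` and `:content` parameters are standard KQML; the `kqml` helper, the agent names, and the message content are assumptions for illustration, and the real agents are Java-based.

```python
def kqml(performative, **params):
    """Render a KQML message as an s-expression string.

    Underscores in keyword names become hyphens (reply_with -> :reply-with).
    """
    parts = [performative]
    for key, value in params.items():
        parts.append(f":{key.replace('_', '-')} {value}")
    return "(" + " ".join(parts) + ")"


# Hypothetical agent names and query content, for illustration only.
message = kqml(
    "ask-all",
    sender="user-agent-1",
    receiver="multi-resource-query-agent",
    language="SQL",
    ontology="eden",
    reply_with="q1",
    content='"SELECT name FROM site"',
)
print(message)
```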


More InfoSleuth Agents

JDBC Resource agents translate between application domain ontology and database schemata

Multi-resource query agent uses either LDL++ or Oracle to support query decomposition and result recomposition

Value mapper translates to/from canonical value domains

Text agent supports ontology-based query

Task execution agent manages CLIPS rule base for task planning and subscription maintenance

Sentinel and Deviation detection agents cooperate to detect complex event patterns
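As an illustration of what a resource agent's translation between the ontology and a database schema might involve, the sketch below maps ontology class and slot names to local table and column names and back. The table and column names (`SITE_TBL`, `SITE_NM`, `ST_CD`) and both helper functions are hypothetical; the deck does not show the actual mapping configuration.

```python
# Hypothetical mapping for one resource: ontology term -> local schema name.
CLASS_TO_TABLE = {"Eden_Site": "SITE_TBL"}
SLOT_TO_COLUMN = {
    ("Eden_Site", "name"): "SITE_NM",
    ("Eden_Site", "state"): "ST_CD",
}


def to_local_sql(ontology_class, slots, where=None):
    """Build a parameterized SELECT over the local schema from ontology names."""
    table = CLASS_TO_TABLE[ontology_class]
    columns = [SLOT_TO_COLUMN[(ontology_class, s)] for s in slots]
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params = []
    if where:
        clauses = []
        for slot, value in where.items():
            clauses.append(f"{SLOT_TO_COLUMN[(ontology_class, slot)]} = ?")
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


def to_ontology_rows(slots, rows):
    """Relabel result tuples with ontology slot names."""
    return [dict(zip(slots, row)) for row in rows]


sql, params = to_local_sql("Eden_Site", ["name"], {"state": "TX"})
labeled = to_ontology_rows(["name"], [("Site A",)])
```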


Basic InfoSleuth Application Recipe

6 cups ontology
3 cups resource agent configuration
1-3 cups user interface development
Lightly brown the multi-resource query agent
Pour in other agents out of the box
Stir
Serve
...
add or remove resource agents as desired
add other functionality with more configuration effort


A Distributed Query

[Diagram: agents cooperating on a distributed query. Components shown: Users, Viewer Applets, user agents, broker agents, task agents, multi-query agents, ontology agent, value map agent, and resource agents. Other labels: SQL, text, Resources mapping info, Refined Data.]


Purpose of The Ontology in InfoSleuth

To describe the domain with minimal ambiguity
• the structure defines the domain
• documentation strings

To be the integration hub for the DB schema
• query relaxation through the taxonomy
• vertical fragmentation
• multi-resource path expressions

To provide the language of the queries and the language of expression of the results
• value mapping


Expressing the Ontology

OKBC (Open Knowledge Base Connectivity): a standard for Knowledge Representation

Classes, Slots, Facets:
• (class Observed_Contamination)
• (template-slot-of analysis_method Observed_Contamination)
• (template-facet-value :VALUE-TYPE analysis_method Observed_Contamination :STRING)
• (template-slot-of site Observed_Contamination)
• (template-facet-value :VALUE-TYPE site Observed_Contamination Eden_Site)

Subclass and Instance-Of Links
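The class, slot, and facet assertions on this slide follow a regular pattern, which the sketch below reproduces from a plain dictionary. It only generates the three assertion forms shown on the slide; `okbc_assertions` is a hypothetical helper, not part of any OKBC toolkit.

```python
def okbc_assertions(class_name, slots):
    """Yield OKBC-style assertions for a class and its typed slots.

    `slots` maps slot name -> value type (a primitive such as :STRING,
    or the name of another class).
    """
    yield f"(class {class_name})"
    for slot, value_type in slots.items():
        yield f"(template-slot-of {slot} {class_name})"
        yield f"(template-facet-value :VALUE-TYPE {slot} {class_name} {value_type})"


lines = list(okbc_assertions(
    "Observed_Contamination",
    {"analysis_method": ":STRING", "site": "Eden_Site"},
))
print("\n".join(lines))
```

The `site` slot is typed by another class (`Eden_Site`) rather than a primitive, so its value links one class to another.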


Value Mapping Modelling

[Diagram: ontology features used for value mapping. Node labels: Ontology Features, Quantity, Unit Of Measure, Person height, Distance, Meter, Foot, STRING. Link labels: unit, unit, canonical unit, data-type.]
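One way to read the canonical-unit idea is that every value of a quantity is converted through a single reference unit. The sketch below assumes Meter is the canonical unit for Distance; the extracted diagram does not make clear which unit is canonical, so that choice, the `TO_CANONICAL` table, and the function names are assumptions. The factor itself is exact: one foot is 0.3048 meters.

```python
# Conversion factors into the ASSUMED canonical unit (meter) for the
# quantity "Distance". 1 foot is exactly 0.3048 meters.
TO_CANONICAL = {"Distance": {"Meter": 1.0, "Foot": 0.3048}}


def to_canonical(quantity, value, unit):
    return value * TO_CANONICAL[quantity][unit]


def from_canonical(quantity, value, unit):
    return value / TO_CANONICAL[quantity][unit]


def convert(quantity, value, source_unit, target_unit):
    """Convert via the canonical unit, so each unit needs only one factor."""
    canonical = to_canonical(quantity, value, source_unit)
    return from_canonical(quantity, canonical, target_unit)


# A person height recorded as 6 feet, expressed in the canonical unit.
height_m = convert("Distance", 6.0, "Foot", "Meter")
```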


Value mapping requirements

Translate terms in queries
• Allow users to choose a coding scheme for querying
• Query each database in terms of its own coding scheme

Translate results of queries
• Facilitate merging of data from different sources
• Display results according to user preference


Value mapping and the ontology

A class has one or more slots

Each slot has a conceptual domain name

Each slot has a preferred value domain

Resource Agents must advertise in the preferred value domain
• possibly translating to/from a different value domain

Users may query and view data in a different value domain
• User Agent handles translation to/from preferred value domain
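The arrangement described here, where agents exchange values in a slot's preferred value domain and translate at the edges, can be sketched as follows. The domain names `state`, `statename` and `stateabbr` appear elsewhere in the deck; the in-memory tables and the three functions are hypothetical stand-ins for lookups that the deck performs against the EDR.

```python
# Per conceptual domain: the preferred value domain, plus one table per other
# value domain that maps its values INTO the preferred domain.
# (Hypothetical in-memory stand-in for EDR lookups.)
PREFERRED = {"state": "statename"}
TO_PREFERRED = {
    ("state", "stateabbr"): {"TX": "Texas", "GA": "Georgia"},
}


def to_preferred(conceptual_domain, value_domain, value):
    """What a resource agent does before advertising or returning a value."""
    if value_domain == PREFERRED[conceptual_domain]:
        return value
    return TO_PREFERRED[(conceptual_domain, value_domain)][value]


def from_preferred(conceptual_domain, value_domain, value):
    """What a user agent does to show a value in the user's chosen domain."""
    if value_domain == PREFERRED[conceptual_domain]:
        return value
    table = TO_PREFERRED[(conceptual_domain, value_domain)]
    inverse = {preferred: local for local, preferred in table.items()}
    return inverse[value]


def translate(conceptual_domain, source_domain, target_domain, value):
    """Translate between any two value domains by way of the preferred one."""
    preferred = to_preferred(conceptual_domain, source_domain, value)
    return from_preferred(conceptual_domain, target_domain, preferred)


shown_to_user = translate("state", "stateabbr", "statename", "TX")
```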


EDR contents

[Diagram: EDR entities: Conceptual domain, CD_VM_Assoc, Value meaning, Value domain, Permissible value]

We use a specialized resource agent (map agent) to access the EDR


Additions to EDR

Downloaded files of permissible values for CAS number and Chemical name (Merck index) from EPA site

Assigned value meanings

Created value domains for CAS code, CAS padded, ycode; loaded permissible values

Added 3 extra chemical names because Merck index file was incomplete


Linking EDR to EDEN ontology

Conceptual domain   CD_ID   Value domain   VD_ID   Preferred domain   PD_ID
state               8       stateabbr      210     statename          6
state               8       statecode      219     statename          6
chemicalsubstance   123     chemicalname   357     casnumber          430
chemicalsubstance   123     caspadded      901     casnumber          430
chemicalsubstance   123     cas code       902     casnumber          430
chemicalsubstance   123     ycode          903     casnumber          430


View of the EDR

Conceptual domain   Coding scheme   Preferred value   Actual value
chemicalsubstance   chemicalname    001332214         Arsenic

CREATE VIEW edr_map (conceptual_domain, coding_scheme, preferred_value, actual_value) AS
SELECT emc.conceptual_domain, emc.value_domain, pref.pv_nm, act.pv_nm
FROM edr_map_class emc, cd_vm_assoc a, permissible_value pref, permissible_value act
WHERE a.cd_id = emc.cd_id
AND a.vm_id = act.vm_id
AND a.vm_id = pref.vm_id
AND emc.vd_id = act.vd_id
AND emc.pd_id = pref.vd_id
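The join can be tried end to end with an in-memory SQLite database. In this sketch the view's four output columns are named after the table header on this slide (conceptual domain, coding scheme, preferred value, actual value). The three base-table layouts are guesses inferred only from the columns the view refers to, the single value meaning (`vm_id` 1) is made up, and the IDs 8, 210 and 6 come from the linking table for the `state` domain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Table layouts are ASSUMED from the columns the view refers to.
CREATE TABLE edr_map_class (
    conceptual_domain TEXT, cd_id INTEGER,
    value_domain TEXT, vd_id INTEGER,
    preferred_domain TEXT, pd_id INTEGER);
CREATE TABLE cd_vm_assoc (cd_id INTEGER, vm_id INTEGER);
CREATE TABLE permissible_value (pv_nm TEXT, vm_id INTEGER, vd_id INTEGER);

CREATE VIEW edr_map (conceptual_domain, coding_scheme,
                     preferred_value, actual_value) AS
SELECT emc.conceptual_domain, emc.value_domain, pref.pv_nm, act.pv_nm
FROM edr_map_class emc, cd_vm_assoc a,
     permissible_value pref, permissible_value act
WHERE a.cd_id = emc.cd_id
  AND a.vm_id = act.vm_id AND a.vm_id = pref.vm_id
  AND emc.vd_id = act.vd_id AND emc.pd_id = pref.vd_id;

-- One mapping row and one value meaning (vm_id 1 is a made-up identifier).
INSERT INTO edr_map_class VALUES ('state', 8, 'stateabbr', 210, 'statename', 6);
INSERT INTO cd_vm_assoc VALUES (8, 1);
INSERT INTO permissible_value VALUES ('TX', 1, 210);
INSERT INTO permissible_value VALUES ('Texas', 1, 6);
""")

# Look up the preferred value for a database value in a given coding scheme.
row = conn.execute(
    "SELECT preferred_value FROM edr_map "
    "WHERE actual_value = ? AND coding_scheme = ? AND conceptual_domain = ?",
    ("TX", "stateabbr", "state"),
).fetchone()
print(row[0])
```

Both permissible values share one value meaning, which is what lets the view pair the value in the database's own domain with its counterpart in the preferred domain.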


Query Processing

[Diagram: query processing among User Agent, Query Agent, Resource Agent, Map Agent, and DBMS. Numbered messages: 1. Query, 2. Query, 3. Query, 4. Query, 5. Query / Result, 6. Result, 7. Result, 8. Result, 9. Result]


Query translation

SELECT name FROM site WHERE state = 'Texas'

translated to

SELECT name FROM site WHERE state = 'TX'
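A rough sketch of this rewrite, assuming the agent knows which column belongs to which conceptual domain and has a table from preferred values to the database's own codes. Both dictionaries and `translate_query` are hypothetical, and the regular expression only handles simple `column = 'literal'` predicates; a real agent would presumably work on a parsed query rather than on text.

```python
import re

# Hypothetical per-resource metadata: which column holds which conceptual
# domain, and how preferred values map to this database's coding scheme.
COLUMN_DOMAIN = {"state": "state"}
PREFERRED_TO_LOCAL = {"state": {"Texas": "TX", "Georgia": "GA"}}

# Matches simple equality tests of the form: column = 'literal'
EQUALITY = re.compile(r"(\w+)\s*=\s*'([^']*)'")


def translate_query(sql):
    """Rewrite string literals in equality predicates into local codes."""
    def rewrite(match):
        column, literal = match.group(1), match.group(2)
        domain = COLUMN_DOMAIN.get(column)
        if domain is None:
            return match.group(0)  # column is not value-mapped; leave as is
        # Unknown literals pass through unchanged rather than raising.
        local = PREFERRED_TO_LOCAL[domain].get(literal, literal)
        return f"{column} = '{local}'"
    return EQUALITY.sub(rewrite, sql)


translated = translate_query("SELECT name FROM site WHERE state = 'Texas'")
```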


Result translation

State Chemical

TX 1332-21-4

Translated to

State Chemical

Texas Arsenic


EDR lookup

SELECT preferred_value
FROM edr_map
WHERE actual_value = 'Benzene'
AND coding_scheme = 'chemical_name'
AND conceptual_domain = 'chemical_substance'


Outstanding issues

No match in EDR for database value
• differences in case (‘Texas’, ‘TEXAS’)
• CAS number format (dashes, leading zeros)
• word order (‘n-Propyl benzene’, ‘Benzene, n-Propyl’)
• bad data

Functional mapping needed

Approximate string matching
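The first three mismatch types could plausibly be handled by normalizing values before lookup, with approximate matching as a fallback. The sketch below is one possible approach using Python's standard `difflib`, not something the deck describes as implemented; the function names and the 0.8 cutoff are assumptions, and it does nothing for genuinely bad data.

```python
import difflib
import re


def normalize_name(name):
    """Case-fold and sort comma/space separated words to ignore word order."""
    words = re.split(r"[,\s]+", name.strip().lower())
    return " ".join(sorted(w for w in words if w))


def normalize_cas(cas):
    """Strip dashes and leading zeros from a CAS number."""
    return cas.replace("-", "").lstrip("0")


def closest(value, candidates, cutoff=0.8):
    """Return the best approximate match among candidates, or None."""
    by_norm = {normalize_name(c): c for c in candidates}
    hits = difflib.get_close_matches(
        normalize_name(value), list(by_norm), n=1, cutoff=cutoff
    )
    return by_norm[hits[0]] if hits else None


same_case = normalize_name("Texas") == normalize_name("TEXAS")
same_order = normalize_name("n-Propyl benzene") == normalize_name("Benzene, n-Propyl")
same_cas = normalize_cas("1332-21-4") == normalize_cas("001332214")
guess = closest("Benzine", ["Benzene", "Toluene"])
```

Sorting the words is a blunt way to ignore word order and would also conflate names that differ only in word order, so the cutoff and the normalization rules would need tuning against real registry data.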


DEMO

[Embedded object: Netscape Hypertext Document]