
Page 1: ODB Course. Mediation environment for object DB integration Leonid A. Kalinichenko Institute of Informatics Problems, Russian Academy of Sciences leonidk@synth.ipi.ac.ru

ODB Course. Mediation environment for object DB integration

Leonid A. Kalinichenko

Institute of Informatics Problems, Russian Academy of Sciences

leonidk@synth.ipi.ac.ru


Talk outline

• Motivation
• Resource registration at a mediator
• Query rewriting for information integration
• Query rewriting algorithm in the typed environment


Motivation

Outline:

• Application-driven EIS development
• Canonical Information Model
• Consolidation of a mediator
• Mediator schema example


Concerns of EIS development

• Enterprise inter- and intra-organizational models: virtual corporations (e.g., virtual observatories in e-science)
• Integrating at the model level: taking fragments of information within the enterprise and placing them in a larger context
• Which model is to be taken, and how a proper context is to be formed
• Heterogeneous information resources of various kinds (data, service, process, and ontological resources) relevant to an EIS are to be used in the specific context of an application
• Such resources are often autonomous and evolve with time. The set of resources relevant to a specific EIS may change quite rapidly, and the technologies applied to them are also evolving rapidly
• Justifiable identification of the resources relevant to an EIS, and semantic integration of their various kinds in the contexts of appropriate applications
• Making an EIS stable in a rapidly evolving world
• New methods and tools are required for EIS application development over multiple distributed collections of data and services


Subject Domain in Natural Science

[Figure: structure of a subject domain in natural science. A material system is defined in natural language through domain terminology and concepts (abstract, methodological, concrete). Theory (model) T1 has a signature, concretizations A and B (attributes, types, classes, processes; simulators), and measurable characteristics (attributes, types, classes, procedures). Theories (models) T2, …, Tn have their own measurable characteristics. Further elements: semantics of the constituents of T1…Tn; observable/measurable characteristics; methods and instruments for observation, experimentation, measurement, data analysis, and discovery; observations, simulations, and measurements for T1; explaining and forecasting; problems, methods of solution, algorithms, programs, workflows; simulation; interpretation.]


Requirements for scientific results publishing

To publish means to make an information resource available through services:

• To unify theory, experiment, and simulation
• To query integrated information
• To allow independent checks of conclusions based on theoretical results by reproducing certain results
• To allow comparisons with similar results or methodologies, or with the corresponding data, by observers and theoreticians
• To make theoretical results more easily accessible and understandable for observers
• To establish invariants for observable classes, to treat observable classes as interpretations of theories (models), and to set triggers that watch for inconsistencies between observations and theoretical models


Two approaches to resource integration

Moving from resources to problems (an integrated schema of multiple resources is created independently of specific applications)

Moving from an application to resources (a description of an application subject domain (in terms of concepts, data structures, functions, processes) is created, in which resources relevant to the application are mapped)

The first approach, driven by information resources, is not scalable with respect to the number of resources, does not make semantic integration of resources in the context of a specific application possible, does not lead to justifiable identification of the resources relevant to an EIS, and does not enhance EIS stability

The second (application-driven) approach assumes the creation of a subject mediator that supports interaction between an application and resources on the basis of the application domain definition (the description of the mediator)

The application-driven, subject mediation approach overcomes the deficiencies of the resource-driven approach

Basic methods for application-driven approach will be characterized


Principles of application-driven EIS development

Basic principles of application-driven EIS development over multiple heterogeneous information resources are the following:

1. independence of the application (mediator) specification from the existing information resources;
2. definition of an application mediator as the result of a consolidation effort by the respective community;
3. emphasis on semantic canonical definitions for the mediator specification;
4. independence of user interfaces from the resources registered at the mediator: application users need to be aware only of the definition of the application domain (the definition of the mediator);
5. independence of the publication of newly developed information resources from the mediators;
6. three-stage identification of information resources relevant to the mediator;
7. semantic integration of relevant heterogeneous information resources in the canonical mediator specification;
8. integrated access to the information resources registered at the mediator, applying the canonical model and a query rewriting system;
9. recursive structure of mediators: each mediator can be registered as a new information resource.


Synthesis of canonical information model

• Explosive growth of various information representation models: OMG architectures (e.g., MDA), Semantic Web and Web service architectures, digital library architectures as collective memories, information Grid architectures, languages and data models (ODMG, SQL, UML, the XML and RDF stacks of data models), process models and workflow models, semantic models (including ontological models and models of metadata), and models of digital repositories of data and knowledge in particular domains
• Another trend: intensive development of information components and services based on such models, which accelerates the need to integrate components and services in various applications
• Adequate methods for manipulating various information models are required
• The basis of these methods is the concept of a canonical information model serving as the common language, an "Esperanto"
• Initially, ideas of mapping structured data models and constructing a canonical model for them were developed
• The main principle of mapping an arbitrary resource data model (represented by its DDL and DML) into the target one (the canonical model) is the principle of commutative data model mapping
• The operations and information of a resource data model are preserved while mapping it into the canonical one, which is established by proofs in denotational semantics


Synthesis of canonical information model (2)

• For object data models, the method of data model mapping and canonical model construction uses the Abstract Machine Notation (AMN) as its formalism (metamodel). AMN makes it possible to define model-theoretic specifications in first-order logic and to prove that one specification refines another
• The main principle of canonical model synthesis is extensibility, which is required for semantic integration and information interoperability in a heterogeneous environment that includes various models
• A kernel of the canonical model is fixed. For each specific information model M of the environment, an extension of the kernel is defined so that this extension together with the kernel is refined by M
• The canonical model for the environment is synthesized as the union of the extensions constructed for the models M of the environment. The resource schema refines the canonical model schema
• The refinement of the schema mapping is formally checked
• The canonical data model synthesis method plays a seminal role in the synthesis of canonical models for various kinds of resource information models, including process models (workflows), service models (Web services), and ontological models (OWL)


Heterogeneous models absorbed by the canonical model

[Figure: the canonical model consists of a core and extensions and is connected to the absorbed models by is_refined_by. The absorbed models are:]

• Component models (IDL, CDL, BOF)
• Object and heterogeneous DB models (ODL, SQL3, Garlic)
• Knowledge base representations (OKBC, Ontolingua)
• Unstructured data (vocabularies, thesauri)
• Semistructured data models (OEM, ADM, OQL-doc)
• Document Object Model
• Metadata for DL (Dublin Core, Warwick, Starts, Z39.50)
• Metadata expressible in meta models (MOF, RDF)
• Workflow models


Mediator Definition as Subject Metainformation Consolidation

For the mediator's scalability, two separate phases of the mediator's functioning are distinguished: the consolidation phase and the operational phase.

During the consolidation phase, the efforts of the community are focused on defining the mediator's subject by declaring its metainformation. Well-known representative information resources of the subject domain are used in the process of metainformation definition. The metainformation created during the consolidation phase constitutes the mediated level of the integrated system.

During the operational phase, arbitrary information resources can be registered at the mediator, expressed in terms of the mediated level. The registration process is autonomous and can be carried out by resource providers independently of each other. Users of the mediator know only the metainformation defining the mediator's subject and formulate their queries in terms of that subject. For each query, the mediator decides which registered resources are relevant to it.


Mediator’s Recursion

[Figure: a mediator with the interactions "Query mediator", "Data from mediator", "Query collection", "Data from collection", "Register collection", and "Register mediator (as collection)".]
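The recursion in the figure can be illustrated with a minimal Python sketch. It assumes that collections and mediators share one hypothetical `query` interface; the class names, the predicate-based query form, and the toy rows are illustrative assumptions, and the sketch leaves out query rewriting and the selection of relevant resources.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Collection:
    """A registered resource that answers queries from its local rows."""
    name: str
    rows: List[Row]

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        return [row for row in self.rows if predicate(row)]


@dataclass
class Mediator:
    """Forwards a query to every registered resource and unions the answers."""
    name: str
    resources: list = field(default_factory=list)

    def register(self, resource) -> None:
        # Anything with a query() method qualifies, including another Mediator.
        self.resources.append(resource)

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        results: List[Row] = []
        for resource in self.resources:
            results.extend(resource.query(predicate))
        return results


# Two collections registered at one mediator, which is itself
# registered at a higher-level mediator as if it were a collection.
a = Collection("a", [{"title": "t1", "year": 1600}])
b = Collection("b", [{"title": "t2", "year": 1700}])
inner = Mediator("inner")
inner.register(a)
inner.register(b)
outer = Mediator("outer")
outer.register(inner)
early = outer.query(lambda row: row["year"] < 1650)
```

Because `Mediator` exposes the same `query` method as `Collection`, registering a mediator at another mediator needs no special case.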


Advantages of subject domain mediation

1. Semantic integration of heterogeneous information collections is achieved.

2. Users need to know only the subject definitions as defined by a community.

3. Information providers can disseminate their information for integration independently of each other and at any time.

4. Autonomous information collections are fully independent of the mediator and its consolidated metainformation definitions.

5. Users have integrated access to all information registered up to the moment of a query.

6. Mediators form a recursive structure. Multiple subjects can be semantically integrated by defining mediators of a higher level.


Cultural Heritage Subject Domain

[Figure: cultural heritage subject domain.]

• Person: Creator, Collector, Owner
• Heritage_Entity: Painting, Sculpture, Antiquities
• Repository: Museum, Gallery, Exhibition
• «type» attributes: created_by*, date*, narrative*, identifier*, relation*, …, place_of_origin, history_period, content, origin_history, in_collection, owned_by, digital_form, ...
• «type» Text: contains, near, within, follows, …
• Ontologies and thesauri: Cultural Heritage, History, Jurisdiction


Cultural Heritage Subject Mediator

[Figure: UML class diagram of the cultural heritage subject mediator.]

• «type» Entity: title, date; value(in e : Entity) : real
• «type» Heritage_Entity: place_of_origin, date_of_origin, content
• «type» Sculpture: material_medium, exposition_space
• «type» Painting: dimensions
• «type» Antiquities: type_specimen, archaeology
• «type» Person: name, nationality, date_of_birth, date_of_death, residence
• «type» Creator: culture, general_Info
• «type» Repository: name, place
• «type» Collection: name, description
• Association roles (multiplicity): in_repository (1), collections (*), created_by (1), works (*), contains (*), in_collection (1)


Resource registration at a mediator

Outline:

• Resource classes as views over mediator schema
• Example
• Resource registration facilities
• Contextualization of ontology
• Process of an information resource registration


Registration of relevant resources in mediator

• The definition of a subject mediator and the registration of information resources at the mediator are based on ideas of compositional development of information systems
• Registration of resources is a process of purposeful specification transformation that includes decomposing the mediator specifications into consistent fragments, searching the specifications of relevant resources for data types that are candidates for refining the mediator specification types, and constructing expressions that define resource classes as compositions of the mediator classes
• Specification composition calculus, type reducts, type algebra
• The process of registering heterogeneous information resources at a subject mediator is based on GLAV, which combines two approaches: Local As View (LAV) and Global As View (GAV)
• GAV views reconcile various conflicts between resource and mediator specifications and provide rules for transforming query results from the resource representation into the mediator representation
• The main result of registration is a GLAV expression defining how a resource class is determined as a composition of the mediator classes
• This registration technique keeps the EIS application specification stable under any modifications of specific information resources and changes in their actual presence
• Identification of resources relevant to a mediator (which precedes registration) is based on three models: the metadata model, the ontological model, and the canonical information model


Heterogeneous Information resources Integration for Problem Solving

• Integrating information from pre-selected resources. This procedural approach integrates information from resources through ad hoc procedures. When information needs or resources change, a new mediator has to be generated. This is known as the Global as View (GAV) approach.

• Integrating information from arbitrary resources according to predefined information needs. This is the declarative approach: mediators contain mechanisms to rewrite queries according to resource descriptions, and a rewritten query should be contained in the original query. This is known as the Local as View (LAV) approach.

• Combined LAV and GAV approaches (GLAV), applying partial materialization


Representation of Information resources

Formally, the contents of an information resource are described by a set of expressions:

$$V(Z) \subseteq C_1(U_1) \,\&\, \dots \,\&\, C_n(U_n)$$

where $C_1, \dots, C_n$ are classes on the mediator level and $V$ is a class on the information resource level. This means that the resource can be asked a query of the form $V(Z)$ (or any partial instantiation of it) and returns instances with state attributes that satisfy the following implication:

$$\forall Z \,\bigl(V(Z) \Rightarrow \exists U \; C_1(U_1) \,\&\, \dots \,\&\, C_n(U_n)\bigr)$$

The variables $Z$ are assumed to be free in $\exists U \; C_1(U_1) \,\&\, \dots \,\&\, C_n(U_n)$.
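The implication can be checked on toy data with a short Python sketch. It assumes a hypothetical resource class with attributes (title, name, culture) described over two mediator classes, painting(title, name, date) and creator(name, culture); the relation layouts and the rows are illustrative only. The date attribute plays the role of an existentially quantified variable, since the resource does not expose it.

```python
def view_is_sound(view_rows, painting_rows, creator_rows):
    """Check V(title, name, culture) => exists date:
    painting(title, name, date) & creator(name, culture)."""
    for title, name, culture in view_rows:
        # 'date' is existential: any painting row with the same title and name will do.
        has_painting = any(
            t == title and n == name for (t, n, _date) in painting_rows
        )
        has_creator = (name, culture) in creator_rows
        if not (has_painting and has_creator):
            return False
    return True


painting_rows = {("t1", "n1", 1600), ("t2", "n2", 1650)}
creator_rows = {("n1", "c1"), ("n2", "c2")}

sound_view = {("t1", "n1", "c1")}
unsound_view = {("t1", "n1", "c2")}  # no creator row (n1, c2) exists
```

The view is allowed to return fewer instances than the mediator classes hold; it only must not return instances that the mediator classes cannot account for.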


Resources Representation at Subject Mediator

[Figure: mediator-level metainformation and a local-into-mediator-level mapping for the classes of three resources.]

• Hermitage Museum Web Site: museum_objectP (title, name, place_of_origin, date_of_origin, r_name, dimensions); museum_objectS (title, name, place_of_origin, date_of_origin, r_name, material)
• Louvre Museum Web Site: workP (title, author, place_of_origin, date_of_origin, in_rep); workS (title, author, place_of_origin, date_of_origin, in_rep, exp_space)
• Uffizi Museum Web Site: artist (name, general_Info, works); canvas (title, name, culture, place_of_origin, r_name)

Local views in terms of mediator classes:

museum_objectP(p/Museum_Object[title, name, place_of_origin, date_of_origin, r_name]) ⊆ painting(p/Painting[title, name: created_by.name, place_of_origin, date_of_origin, r_name: in_collection.in_repository.name]), date_of_origin < 1700, date_of_origin > 1600 …

workP(p/Work[title, author, place_of_origin, date_of_origin, in_rep]) ⊆ painting(p/Painting[title, author: created_by.name, place_of_origin, date_of_origin, in_rep: in_collection.in_repository.name]), in_rep = 'Louvre' …

amount(h/Entity[title, name: created_by.name], v/real) ⊆ value(h/Entity[title, name: created_by.name], v/real)

canvas(p/Canvas[title, name, culture, place_of_origin, r_name]) ⊆ painting(p/Painting[title, name: created_by.name, place_of_origin, date_of_origin, r_name: in_collection.in_repository.name]), creator(c/Creator[name, culture]), r_name = 'Uffizi', date_of_origin >= 1550, date_of_origin < 1700

artist(a/Artist[name, general_Info, works]) ⊆ creator(a/Creator[name, general_Info, works/{set-of:Painting}])


Uffizi views in terms of the mediator

canvas(p/Canvas[title, name, culture, place_of_origin, r_name]) ⊆ painting(p/Painting[title, name: created_by.name, place_of_origin, date_of_origin, r_name: in_collection.in_repository.name]), creator(c/Creator[name, culture]), r_name = 'Uffizi', date_of_origin >= 1550, date_of_origin < 1700

artist(a/Artist[name, general_Info, works]) ⊆ creator(a/Creator[name, general_Info, works/{set-of:Painting}])

amount(h/Entity[title, name: created_by.name], v/real) ⊆ value(h/Entity[title, name: created_by.name], v/real)


Resource Registration Facilities

The facilities intended to support functions of resource registration include:

• contextualization of ontology;

• constructing mapping of a resource data model and metadata into the canonical ones;

• structural and behavioral conflicts resolution;

• view definition: representation of resource classes in terms of the mediator's classes;

• semi-automatic construction of a resource wrapper;

• connecting the wrapper to the interoperation infrastructure.


Structure of the Resource Registration Tool

[Figure: structure of the Resource Registration Tool. Components shown: Resource Registration Tool, mediator's DBMS (Oracle 9i), metainformation repository, B-Toolkit, B-AMN, wrapper code. Functions of the tool:]

• resource context / mediator metainformation reconciliation
• construction of resource class specifications as views over mediator classes
• most common reduct identification
• wrapper generation


Contextualization of Ontology

Mapping of the local ontological context to that of the mediator:
• by names and relationships
• by natural language description
• applying structural integration to concept specifications
• introducing new concepts over existing ones

Contextualization through structural correlation:
• establishing loose ontological relevance of specification elements by analyzing intercontext concept relationships
• establishing tight ontological relevance of specification elements by introducing a subsumption relationship between concepts


Correlation of Ontological Concepts

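Evaluation of descriptor weights:

$$W_{Xk} = \frac{f_k \log \dfrac{N}{n_k}}{\sqrt{\sum_{V_i \in X} \left( f_i \log \dfrac{N}{n_i} \right)^2}}$$

$$sim(X, Y) = \frac{\sum_{k:\, V_k \in X \wedge V_k \in Y}^{t} W_{Xk}\, W_{Yk}}{\sqrt{\sum_{k:\, V_k \in X}^{t} W_{Xk}^2 \cdot \sum_{k:\, V_k \in Y}^{t} W_{Yk}^2}}$$

Establishing intercontext relationships between concepts:

$$r(X, Y) = \frac{\sum_{k:\, V_k \in X \wedge V_k \in Y}^{t} \min(W_{Xk}, W_{Yk})}{\sum_{k:\, V_k \in X}^{t} W_{Xk}^2}$$

$$r(Y, X) = \frac{\sum_{k:\, V_k \in X \wedge V_k \in Y}^{t} \min(W_{Xk}, W_{Yk})}{\sum_{k:\, V_k \in Y}^{t} W_{Yk}^2}$$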
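As an illustration of how descriptor weights and concept similarity can be computed, here is a minimal Python sketch. It assumes the measures on this slide follow the usual vector-space model: a normalized frequency-times-log(N/n) weight per descriptor and a cosine similarity over shared descriptors. The function names and the dictionary layout are assumptions made for the example.

```python
import math


def descriptor_weights(freqs, doc_counts, total):
    """Normalized weights for the descriptors of one concept.

    freqs:      descriptor -> frequency f_k in the concept definition
    doc_counts: descriptor -> number n_k of definitions containing it
    total:      total number N of definitions
    """
    raw = {d: f * math.log(total / doc_counts[d]) for d, f in freqs.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {d: (w / norm if norm else 0.0) for d, w in raw.items()}


def sim(wx, wy):
    """Cosine similarity of two descriptor-weight vectors."""
    shared = wx.keys() & wy.keys()
    numerator = sum(wx[d] * wy[d] for d in shared)
    denominator = math.sqrt(
        sum(w * w for w in wx.values()) * sum(w * w for w in wy.values())
    )
    return numerator / denominator if denominator else 0.0


counts = {"age": 2, "art": 5, "law": 1}
x = descriptor_weights({"age": 3, "art": 1}, counts, total=10)
y = descriptor_weights({"law": 2}, counts, total=10)
```

Concepts with no shared descriptors get similarity 0, and a concept compared with itself gets similarity 1 up to rounding.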


Ontological Metainformation

[Figure: UML class diagram of ontological metainformation.]

• Class, ADT
• Category: code: string
• Concept: definition: string, wordClass: string
• Descriptor: weight: float, name: string
• ConceptWeight: weight: float, frequency: float, name: string
• Relationship classes: ConceptRel (strength: float = 1), PositiveRel, NarrowRel, PartRel, RelativeRel
• Association roles: type, descriptors / descriptorOf, fromRelation / toConcept, toRelation / fromConcept, weights / weightOf, foreign, collection, concept, category


Identification of Ontologically Relevant Elements

[Figure: concepts ArtAge «concept» and Period «concept», attributes FineArt.period «attribute» and Creator.culture_race «attribute», and a relationship positive(0.64).]


Process of an Information Resource Registration

For each resource class, the following steps (of the compositional development process) are required:

1. Relevant mediator classes identification
• Find mediator classes that can ontologically be used for defining the resource class in terms of mediator classes. Several mediator classes may correspond to one resource class, their instance types covering different reducts of the instance type of the resource class.

2. Most common reducts construction
For the instance type of each identified mediator class:
• Construct the most common reducts of the instance type of this mediator class and the resource class instance type, so as to concretize (partially) the mediator instance type.
• In this process, for each attribute type of the common reduct, a concretizing type, a concretizing function, or their combination should be constructed (this step is applied recursively).


Process of an Information Resource Registration

3. Partial resource view construction
• For each relevant mediator class, construct a partial resource view expressing the constraints, in terms of the mediator class, that should be satisfied by the values of the respective most common reducts of resource class instances.

4. Partial views composition
• Construct compositions of the resource type most common reducts obtained for the instance types of all mediator classes involved.
• Construct a resource view as a composition of the partial views obtained above. This is an expression of a materialized view of an information resource in terms of mediator classes. The instance type of this view is determined by the most common reducts composition constructed above.
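Step 2 can be approximated with a small Python sketch. It is a deliberate simplification: a type is reduced to a set of attribute names, and the concretizing functions are reduced to a hypothetical renaming table from resource attributes to mediator attributes. Real reduct construction also has to check that attribute types refine each other, which is not modeled here. The attribute sets are taken from the Painting and Canvas examples used elsewhere in the slides.

```python
def most_common_reduct(mediator_attrs, resource_attrs, concretizing=None):
    """Return {mediator attribute: resource attribute} for every mediator
    attribute that has a counterpart in the resource type.

    concretizing maps a resource attribute name to the mediator attribute
    it concretizes; unmapped attributes are matched by identical name.
    """
    concretizing = concretizing or {}
    reduct = {}
    for resource_attr in resource_attrs:
        mediator_attr = concretizing.get(resource_attr, resource_attr)
        if mediator_attr in mediator_attrs:
            reduct[mediator_attr] = resource_attr
    return reduct


painting = {
    "title", "created_by.name", "place_of_origin", "date_of_origin",
    "in_collection.in_repository.name", "dimensions",
}
canvas = ["title", "name", "culture", "place_of_origin", "r_name"]
renaming = {
    "name": "created_by.name",
    "r_name": "in_collection.in_repository.name",
}
reduct = most_common_reduct(painting, canvas, renaming)
```

In this toy case the attribute culture has no counterpart in Painting, so it falls outside the reduct and would have to be covered by another mediator class.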


Query rewriting for information integration

Outline:

• Canonical model query language
• Query containment
• Query rewriting algorithms
• View definition and inverse rules construction
• Query rewriting algorithm schema and an example of rewriting
• Mediator architecture


Canonical model query language

A SYNTHESIS Conjunctive Query (SCQ) is a query of the form

q(v/Tv) :- C1(v1/Tv1), … , Cn(vn/Tvn), F1(X1,Y1), … , Fm(Xm,Ym), B

where q(v/Tv), C1(v1/Tv1), … , Cn(vn/Tvn) are collection (class) atoms, F1(X1,Y1), … , Fm(Xm,Ym) are functional atoms, and B, called the constraint, is a conjunction of predicates over the variables v, v1, … , vn, typed by Tv, Tv1, … , Tvn, or over the output variables Y1 ∪ Y2 ∪ … ∪ Ym of the functional atoms. Each atom Ci(vi/Tvi) or Fj(Xj,Yj) (i = 1, … , n; j = 1, … , m) is called a subgoal. The value v structured according to Tv is called the output value of the query. A union query is a finite union of SCQs.

The formal semantics of an SCQ is defined as a composition of the Cartesian product of sets and classes, execution of functional predicates, selection of instances satisfying B, and typing of the joins of product domains by the join operation on the specifications of the respective argument types

The semantics of a disjunction Ci(vi/Tvi) ∨ Cj(vj/Tvj) requires that the resulting type for Tvi and Tvj is defined by the type operation meet


Query Containment Algorithms

A query Q1 is said to be contained in a query Q2 if, for all databases D, the set of tuples computed for Q1 is a subset of the set computed for Q2

Query containment has been studied for the purposes of query optimization, detecting independence of queries from database updates, rewriting queries using views, maintenance of integrity constraints, semantic data caching, etc.

• Containment mapping for conjunctive queries [1977]
• Uniform containment of Datalog programs [1988]
• Containment of conjunctive queries with built-in subgoals [1989]
• Containment of conjunctive queries in Datalog programs [1989]
• Uniform containment for Datalog programs [1996]
• Containment for queries with complex objects [1997]
• Boolean query containment [1997]
• Containment for conjunctive queries with regular expressions [1998]
• Query containment relative to views [1999]
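The first item in the list, the containment mapping test for conjunctive queries, can be sketched in Python as follows. The sketch covers only plain relational conjunctive queries without built-in predicates, types, or functions, so it is not the typed containment used for SCQs. The query encoding is an assumption made for the example: a query is a pair (head arguments, list of (predicate, arguments) atoms), and a term starting with an uppercase letter is a variable.

```python
def is_var(term):
    """Terms starting with an uppercase letter are variables; others are constants."""
    return isinstance(term, str) and term[:1].isupper()


def extend(mapping, source, target):
    """Try to extend the mapping so that the source terms map onto the target terms."""
    new = dict(mapping)
    for s, t in zip(source, target):
        if is_var(s):
            if new.setdefault(s, t) != t:
                return None
        elif s != t:
            return None
    return new


def contained_in(q1, q2):
    """True if q1 is contained in q2, i.e. a containment mapping from q2 to q1 exists."""
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False
    start = extend({}, head2, head1)
    if start is None:
        return False

    def search(i, mapping):
        # Map the i-th atom of q2 onto some atom of q1, backtracking on failure.
        if i == len(body2):
            return True
        pred, args = body2[i]
        for pred1, args1 in body1:
            if pred1 == pred and len(args1) == len(args):
                nxt = extend(mapping, args, args1)
                if nxt is not None and search(i + 1, nxt):
                    return True
        return False

    return search(0, start)


# q1(X) :- r(X, Y), r(Y, X)      q2(A) :- r(A, B)
q1 = (("X",), [("r", ("X", "Y")), ("r", ("Y", "X"))])
q2 = (("A",), [("r", ("A", "B"))])
```

Here q1 is contained in q2 because mapping A to X and B to Y sends the only atom of q2 onto an atom of q1, while no mapping in the other direction exists.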


Query rewriting algorithms for data integration

U-join algorithm, Bucket algorithm: conjunctive queries using conjunctive views [1996]

Inverse-rule algorithm: Datalog programs using Datalog views [1998]

MiniCon algorithm, Shared-Variable-Bucket: improved versions of the Bucket algorithm [2000, 2001]

Algorithms for finding contained rewritings in the presence of functional dependencies [1998-2002]

Rewriting Unions of Relational Conjunctive Queries [John Wang Thesis, Griffith University, Brisbane, 2002]

Resolution-based rewriting [2002]


View definition and inverse rules construction

During registration, a local resource class is described as a view over the virtual classes of the mediator, having the following general SCQ form:

V(h/Th) ⊆ P1(b1/Tb1), … , Pk(bk/Tbk), F1(X1,Y1), … , Fr(Xr,Yr), B

A reduct of the view body instance type is to be refined by the concretizing type Th designed above the resource.

To produce inverse rules from the view definitions, replace in the view each attribute from Tb1, … , Tbk that is not contained in Th with a distinct Skolem function of h/Th producing an output value of the type of the respective attribute. This Skolemizing mapping of the view is denoted as ρ. After the Skolemizing mapping, the inverse rules for the mediator classes in the view body are produced as

ρ(Pi(bi/Tbi) ← V(h/Th)) (for i = 1, … , k)

For mediator functions that are type methods, the inverse rules have the form

ρ(Tm.Fbj(Xbj,Ybj) ← Ts.Fhl(Xhl,Yhl)), for j = 1, … , r, where Fbj and Fhl are methods of Tm and Ts such that the function type of Fhl refines the function type of Fbj.

ρ(B) is the inferred constraint of the view predicate V(h/Th).


Example: View Def and Inverse Rules

View definitions for the Uffizi site:

canvas(p/Canvas[title, name, culture, place_of_origin, r_name]) ⊆ painting(p/Painting[title, name: created_by.name, place_of_origin, date_of_origin, r_name: in_collection.in_repository.name]), creator(c/Creator[name, culture]), r_name = 'Uffizi', date_of_origin >= 1550, date_of_origin < 1700

artist(a/Artist[name, general_Info, works]) ⊆ creator(a/Creator[name, general_Info, works/{set-of:Painting}])

amount(h/Entity[title, name: created_by.name], v/real) ⊆ value(h/Entity[title, name: created_by.name], v/real)

Inverse rules for the first view definition above:

painting(p/Painting[title, name: created_by.name, place_of_origin, #1date_of_origin, r_name: in_collection.in_repository.name]) ← canvas(p/Canvas[title, name, culture, place_of_origin, r_name])

creator(c/Creator[name, culture]) ← canvas(p/Canvas[title, name, culture, place_of_origin, r_name])

#1date_of_origin is a Skolem function.
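The Skolemizing construction can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Atom` class and the `inverse_rules` function are hypothetical names introduced here, attributes are modelled by their visible names only (path renamings such as `name: created_by.name` are ignored), and constraints and methods are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    pred: str        # class (predicate) name, e.g. "painting"
    var: str         # instance variable, e.g. "p"
    type_name: str   # instance type, e.g. "Painting"
    attrs: tuple     # visible attribute names of the instance type


def inverse_rules(head, body):
    """Return one (skolemized body atom, view head) pair per body atom."""
    head_attrs = set(head.attrs)
    counter = 0
    rules = []
    for atom in body:
        new_attrs = []
        for attr in atom.attrs:
            if attr in head_attrs:
                new_attrs.append(attr)
            else:
                # The view does not export this attribute: use a fresh Skolem term.
                counter += 1
                new_attrs.append(f"#{counter}{attr}")
        skolemized = Atom(atom.pred, atom.var, atom.type_name, tuple(new_attrs))
        rules.append((skolemized, head))
    return rules


# Attribute lists taken from the Uffizi canvas view.
canvas = Atom("canvas", "p", "Canvas",
              ("title", "name", "culture", "place_of_origin", "r_name"))
painting = Atom("painting", "p", "Painting",
                ("title", "name", "place_of_origin", "date_of_origin", "r_name"))
creator = Atom("creator", "c", "Creator", ("name", "culture"))

rules = inverse_rules(canvas, [painting, creator])
for rule_head, rule_body in rules:
    print(rule_head.pred, list(rule_head.attrs), "<-", rule_body.pred)
```

In this sketch only date_of_origin is replaced by a Skolem term, because it is the only body attribute that the canvas view does not export.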


Query rewriting algorithm schema (1)

1. For each SCQ (denoted as Q) q(v/Tv) :- C1(v1/Tv1), … , Cn(vn/Tvn), F1(X1,Y1), … , Fm(Xm,Ym), B in Qu, generate a set of candidate formulae.

valuable_Italian_heritage_entities(h/Heritage_Entity_Valued[title, c_name, r_name, v]) :- heritage_entity(h/Heritage_Entity[title, c_name: created_by.name, place_of_origin, date_of_origin, r_name: in_collection.in_repository.name]), value(h/Heritage_Entity[title, name: c_name], v/real), v >= 200000, date_of_origin >= 1500, date_of_origin < 1750, place_of_origin = 'Italy'

2. For each subgoal Ci(vi/Tvi) or Fj(Xj,Yj) of Q, find an inverse rule r ∈ I whose head Pl(bl/Tbl) or Fbo(Xbo,Ybo) unifies with the subgoal (unification is based on subtyping and refinement relations*).

painting(p/Painting[title, name: created_by.name, place_of_origin, #1date_of_origin, r_name: in_collection.in_repository.name]) ← canvas(p/Canvas[title, name, culture, place_of_origin, r_name])

* Painting is a subtype of Heritage_Entity
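The subtype test behind this unification step might look as follows. This is a minimal sketch, not the mediator's actual unification: `SUPERTYPE`, `is_subtype` and `matching_rule_heads` are hypothetical names, the hierarchy holds only the single fact stated above (Painting is a subtype of Heritage_Entity), and attribute-level refinement is not checked.

```python
# Assumed fragment of the type hierarchy: child type -> direct supertype.
SUPERTYPE = {"Painting": "Heritage_Entity"}


def is_subtype(candidate, target):
    """True if candidate equals target or reaches it by following supertypes."""
    current = candidate
    while current is not None:
        if current == target:
            return True
        current = SUPERTYPE.get(current)
    return False


def matching_rule_heads(subgoal_type, rule_head_types):
    """Keep the inverse-rule heads whose instance type fits the subgoal type."""
    return [t for t in rule_head_types if is_subtype(t, subgoal_type)]


# The heritage_entity subgoal asks for Heritage_Entity instances; the two
# inverse rules of the Uffizi view have heads typed Painting and Creator.
matches = matching_rule_heads("Heritage_Entity", ["Painting", "Creator"])
print(matches)
```

Only the painting rule survives, which is why the destination in the running example is built from the painting inverse rule.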


Query rewriting algorithm schema (2)

3. A destination of Q is a sequence D of atoms P1(b1/Tb1), … , Pn(bn/Tbn), Fb1(Xb1,Yb1), … , Fbm(Xbm,Ybm) obtained by unifying the query subgoals with the heads of inverse rules from I. Several destinations can be produced, one for each combination of the subgoal unifications found.

painting(p/Painting[title, name: created_by.name, place_of_origin, #1date_of_origin, r_name: in_collection.in_repository.name]), value(h/Painting[title, name: created_by.name], v/real)

4. To construct a candidate formula, for each atom in D establish a mapping φi of the attributes and variables in this atom and the associated view to the attributes and variables of the respective atom of Q.

Mapping for the destination (only mappings that change names are shown):

φ1 = { p → h, name: created_by.name → c_name: created_by.name, #1date_of_origin → date_of_origin: #1date_of_origin }

φ2 = { name: created_by.name → name: c_name }
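Applying such a mapping to an atom amounts to a renaming of its variable and attribute terms. The sketch below assumes a deliberately flat representation, with each atom as a list of strings; `apply_mapping` and the variable names are hypothetical and only illustrate the first mapping of the destination.

```python
def apply_mapping(mapping, terms):
    """Rename each term covered by the mapping; leave the others unchanged."""
    return [mapping.get(term, term) for term in terms]


# First mapping of the destination, written as a plain dictionary.
phi1 = {
    "p": "h",
    "name: created_by.name": "c_name: created_by.name",
    "#1date_of_origin": "date_of_origin: #1date_of_origin",
}

# Variable and attribute terms of the painting atom in the destination.
painting_terms = [
    "p",
    "title",
    "name: created_by.name",
    "place_of_origin",
    "#1date_of_origin",
    "r_name: in_collection.in_repository.name",
]

mapped = apply_mapping(phi1, painting_terms)
print(mapped)
```

After the renaming, the atom uses the query's variable h and the query's attribute names c_name and date_of_origin, so it can be compared with the heritage_entity subgoal term by term.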


Query rewriting algorithm schema (3)

5. For each destination and the variable mappings defined, construct a formula Φ: φ1(P1(b1/Tb1)), … , φn(Pn(bn/Tbn)), φn+1(Fb1(Xb1,Yb1)), … , φn+m(Fbm(Xbm,Ybm))

6. Construct the mapping ψ of the constraint of Q to a constraint in Φ.

7. Replace the heads of the inverse rules in the obtained SCQ with the bodies of the associated inverse rules to get the formula Φ2:

q(v/Tv) :- φ1(V1(h1/Th1)), … , φn(Vn(hn/Thn)), φn+1(Fh1(Xh1,Yh1)), … , φn+m(Fhm(Xhm,Yhm)), ψ(B), E

Applying φ1, φ2, we get a candidate formula:

valuable_Italian_heritage_entities(h/Heritage_Entity_Valued[title, c_name, r_name, v]) :- canvas(h/Canvas[title, c_name: name, culture, place_of_origin, date_of_origin: #1date_of_origin, r_name]), amount(h/Painting[title, name: c_name], v/real), v >= 200000, date_of_origin >= 1500, date_of_origin < 1750, place_of_origin = 'Italy'


Query rewriting algorithm schema (4)

8. If the constraint ψ(B), E and the inferred constraints of the view atoms in the candidate formula are consistent, and there are no Skolem functions in the candidate of Q, then the formula is a rewriting.

9. Skolem functions elimination: if the inferred constraints of the view atoms imply the constraints in the candidate formula, then those constraints can be removed directly.

The inferred constraint for canvas(h/Canvas[title, c_name: name, culture, place_of_origin, date_of_origin: #1date_of_origin, r_name]) is r_name = 'Uffizi', #1date_of_origin >= 1550, #1date_of_origin < 1700, which implies date_of_origin >= 1500, date_of_origin < 1750.

Because of that, the Skolem functions can be eliminated from this candidate formula, and after the consistency check we get the rewriting:

valuable_Italian_heritage_entities(h/Heritage_Entity_Valued[title, c_name, r_name, v]) :- canvas(h/Canvas[title, c_name: name, culture, place_of_origin, r_name]), amount(h/Painting[title, name: c_name], v/real), v >= 200000, place_of_origin = 'Italy'
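For range constraints on a single attribute, the implication and consistency checks used here reduce to interval arithmetic. The following is a minimal sketch under that assumption: the `Interval` class is a hypothetical helper, it covers only half-open numeric ranges of the form lo <= x < hi, and it assumes non-empty intervals. General constraints would need a proper constraint solver.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float = -math.inf   # inclusive lower bound
    hi: float = math.inf    # exclusive upper bound

    def implies(self, other):
        """Every value satisfying self also satisfies other (self inside other)."""
        return other.lo <= self.lo and self.hi <= other.hi

    def consistent_with(self, other):
        """Some value satisfies both constraints (the intervals overlap)."""
        return max(self.lo, other.lo) < min(self.hi, other.hi)


# Inferred constraint of the Uffizi canvas view on date_of_origin.
view_range = Interval(1550, 1700)
# Constraint of the query on date_of_origin.
query_range = Interval(1500, 1750)

can_drop_query_constraint = view_range.implies(query_range)
print(can_drop_query_constraint)
```

Because the view's range lies inside the query's range, every canvas instance already satisfies the query's date constraint, so the constraint and the Skolem term it refers to can be dropped from the candidate formula.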


Query rewriting algorithm schema (5)

10. Containment property of the candidate formulae: if we expand each view atom with the corresponding Skolemized view body and treat the Skolem functions as variables, we get a safe SCQ that is contained in Q (in particular, the instance type of a subgoal of Φ2 is a refinement of the instance type of the respective subgoal of Q).


Query Rewriting Example (1)

Similarly, for the Louvre the heritage_entity subgoal of the query unifies with painting and sculpture as heritage_entity subclasses. Only the rewriting formed for painting is shown here.

valuable_Italian_heritage_entities(h/Heritage_Entity_Valued[title, c_name, r_name, v]) :- workP(h/Work[title, c_name: author, place_of_origin, date_of_origin, r_name: in_rep]), amount(h/Work[title, name: c_name], v/real), v >= 200000, date_of_origin >= 1500, date_of_origin < 1750, place_of_origin = 'Italy'

Finally, for the Hermitage Museum the heritage_entity subgoal of the query also unifies with painting and sculpture as heritage_entity subclasses. Only the rewriting formed for painting is shown here.

valuable_Italian_heritage_entities(h/Heritage_Entity_Valued[title, c_name, r_name, v]) :- museum_objectP(h/Museum_Object[title, c_name: name, place_of_origin, date_of_origin, r_name]), pcost(h/Museum_Object[title, name], v/real), v >= 200000, date_of_origin >= 1500, date_of_origin < 1750, place_of_origin = 'Italy'


Query Rewriting Example (2)

[Figure: the User submits a query to the Mediator; the Query Rewriter, supported by a Thesaurus, produces one rewriting for each of the Hermitage, Louvre and Uffizi sites.]

User query: Find data on valuable Italian heritage entities produced between the years 1500 and 1750

heritage_entity(h/Heritage_Entity[title, c_name: created_by.name, place_of_origin, date_of_origin, r_name: in_collection.in_repository.name]), value(h/Heritage_Entity[title, name: c_name], v/real), v >= 200000, date_of_origin >= 1500, date_of_origin < 1750, place_of_origin = 'Italy'

Thesaurus: the thesaurus extension may add 'Italia'

Rewriting for Hermitage:

museum_objectP(h/Museum_Object[title, c_name: name, place_of_origin, date_of_origin, r_name]), pcost(h/Museum_Object[title, name], v/real), v >= 200000, date_of_origin >= 1500, date_of_origin < 1750, place_of_origin = 'Italy'

Rewriting for Louvre:

workP(h/Work[title, c_name: author, place_of_origin, date_of_origin, r_name: in_rep]), amount(h/Work[title, name: c_name], v/real), v >= 200000, date_of_origin >= 1500, date_of_origin < 1750, place_of_origin = 'Italy'

Rewriting for Uffizi:

canvas(h/Canvas[title, c_name: name, culture, place_of_origin, r_name]), amount(h/Painting[title, name: c_name], v/real), v >= 200000, place_of_origin = 'Italy'


Mediator Architecture

[Figure: architecture diagram. Labelled components: Portal, Web Browser, Application Server (Web Pages, Servlets/JSP, EJB/WS), Application Client, Mediator, Oracle 10g, Metainformation Repository, Data Repository, Registration Client, Rewriter, Planner, Supervisor, Synth2Oracle, SOAP Wrapper, ADQL2SYFS, Metadata Access, Collections with Collection Adapters, Software Tools with a Tool Adapter. The numbered connections between components are not recoverable from the transcript.]