CSE 636 Data Integration SchemaSQL Implementation

Post on 19-Dec-2015


Architecture

[Figure: architecture diagram. A federation user submits a SchemaSQL query to the SchemaSQL server and receives the final answer. The server sends optimized local queries Q1 … Qn to DBMS1 … DBMSn. The answers answer(Q1) … answer(Qn) are collected, and a final series of SQL queries runs on the resident SQL engine to produce the final answer.]


SchemaSQL Server

• Maintains a Federation System Table (FST)
  – FST(db-name, rel-name, attr-name)
  – Names of databases, relations and attributes in the federation

• Compiles the instantiations of the variables in the query

• Enforces conditions, groupings, aggregations and mergings
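One way to picture the FST is as a flat catalog filled from each local database's own metadata. The sketch below is only an illustration, not the SchemaSQL server's actual mechanism: it assumes SQLite as a stand-in for the local DBMSs, and the function name build_fst and the in-memory univ-C database are hypothetical.

```python
import sqlite3

def build_fst(databases):
    """Return FST rows (db-name, rel-name, attr-name) for a dict of name -> connection.

    Hypothetical helper: reads SQLite's catalog as a stand-in for local DBMS metadata.
    """
    rows = []
    for db_name, conn in databases.items():
        tables = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        for (rel_name,) in tables:
            # PRAGMA table_info yields one row per column; index 1 is the column name.
            for col in conn.execute(f'PRAGMA table_info("{rel_name}")'):
                rows.append((db_name, rel_name, col[1]))
    return rows

# Hypothetical stand-in for univ-C: one relation per department.
univ_c = sqlite3.connect(":memory:")
univ_c.execute("CREATE TABLE cs (category TEXT, salFloor INTEGER)")
univ_c.execute("CREATE TABLE math (category TEXT, salFloor INTEGER)")

fst = build_fst({"univ-C": univ_c})
for row in fst:
    print(row)
```

Each printed tuple corresponds to one FST row, so a relation with two attributes contributes two rows.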


Query Processing (Fixed Output Schema)

Phase 1
• For each set of variable declarations in the FROM clause, create VITs by running one or more SQL queries against the local databases and/or the FST
  – VIT: Variable Instantiation Table, whose schema consists of all the variables in one or more variable declarations in the FROM clause

Phase 2
• Rewrite the original SchemaSQL query against the federation into an “equivalent” query against the set of VIT relations, and compute it using the resident SQL server


Example

SELECT RelC, C.salFloor
FROM univ-C-> RelC,
     univ-C::RelC C,
     univ-D::salInfo D
WHERE RelC = D.dept AND
      C.salFloor > D.technician AND
      C.category = 'technician'

univ-C: cs

category     salFloor
Prof         74K
Assoc Prof   62K
…            …

univ-C: math

category     salFloor
Prof         67K
Assoc Prof   56K
…            …

univ-D: salInfo

dept   Prof   Assoc Prof   Asst Prof   …
cs     72K    65K          78K         …
math   65K    54K          69K         …
…      …      …            …           …


Example – Phase 1

• VITRelC(RelC):

SELECT rel-name AS RelC
FROM FST
WHERE db-name = 'univ-C'
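The VITRelC step can be tried out against a small FST. This is a minimal sketch under stated assumptions: SQLite stands in for the resident SQL engine, the column names use underscores (db_name, rel_name, attr_name) because hyphens would need quoting, the FST rows are illustrative, and DISTINCT is added here because an FST with one row per attribute would otherwise repeat each relation name.

```python
import sqlite3

server = sqlite3.connect(":memory:")
server.execute("CREATE TABLE FST (db_name TEXT, rel_name TEXT, attr_name TEXT)")
server.executemany(
    "INSERT INTO FST VALUES (?, ?, ?)",
    [
        ("univ-C", "cs", "category"), ("univ-C", "cs", "salFloor"),
        ("univ-C", "math", "category"), ("univ-C", "math", "salFloor"),
        ("univ-D", "salInfo", "dept"), ("univ-D", "salInfo", "technician"),
    ],
)

# Materialize VITRelC; DISTINCT is an assumption of this sketch (see lead-in).
server.execute("""
    CREATE TABLE VITRelC AS
    SELECT DISTINCT rel_name AS RelC
    FROM FST
    WHERE db_name = 'univ-C'
""")

vit_relc = [r[0] for r in server.execute("SELECT RelC FROM VITRelC ORDER BY RelC")]
print(vit_relc)  # ['cs', 'math']
```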


Example – Phase 1

• VITC(RelC, CsalFloor):

1. SELECT RelC FROM VITRelC

2. If {r1, …, rn} is the answer in step 1, then VITC is computed by the following SQL query to univ-C:

   SELECT 'r1' AS RelC, salFloor AS CsalFloor
   FROM r1
   WHERE category = 'technician'
   UNION
   …
   UNION
   SELECT 'rn' AS RelC, salFloor AS CsalFloor
   FROM rn
   WHERE category = 'technician'
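Step 2 amounts to generating one SELECT branch per relation name and joining the branches with UNION. The sketch below shows one possible way to do this, assuming SQLite as a stand-in for univ-C. The helper vitc_query and the technician rows are hypothetical, with salaries stored as integers in thousands. Relation names are assumed to come from the trusted FST, so only minimal quoting is applied.

```python
import sqlite3

# One UNION branch per relation; {r} is filled with a relation name from VITRelC.
BRANCH = (
    "SELECT '{r}' AS RelC, salFloor AS CsalFloor "
    "FROM \"{r}\" WHERE category = 'technician'"
)

def vitc_query(rel_names):
    """Build the SQL text that computes VITC (hypothetical helper)."""
    if not rel_names:
        raise ValueError("no relations to query")
    return "\nUNION\n".join(BRANCH.format(r=r) for r in rel_names)

# Hypothetical contents of univ-C.
univ_c = sqlite3.connect(":memory:")
data = {
    "cs": [("Prof", 74), ("technician", 42)],
    "math": [("Prof", 67), ("technician", 46)],
}
for rel, rows in data.items():
    univ_c.execute(f'CREATE TABLE "{rel}" (category TEXT, salFloor INTEGER)')
    univ_c.executemany(f'INSERT INTO "{rel}" VALUES (?, ?)', rows)

sql = vitc_query(["cs", "math"])
vitc = sorted(univ_c.execute(sql).fetchall())
print(vitc)  # [('cs', 42), ('math', 46)]
```

The constant 'r' AS RelC column is what carries the relation name, which is schema information, into the VIT as ordinary data.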


Example – Phase 1

• VITD(Ddept, Dtechnician):

SELECT dept AS Ddept, technician AS Dtechnician
FROM salInfo


Example – Phase 1

VITRelC

RelC
cs
math

VITC

RelC   CsalFloor
cs     42K
math   46K
…      …

VITD

Ddept   Dtechnician
cs      72K
math    65K
…       …


Example – Phase 2

The Joined Variable Instantiation Table (JVIT) is the (natural) join of the VITs generated during Phase 1

1. CREATE VIEW JVIT(RelC, CsalFloor, Ddept, Dtechnician) AS
   SELECT VITRelC.RelC, VITC.CsalFloor,
          VITD.Ddept, VITD.Dtechnician
   FROM VITRelC, VITC, VITD
   WHERE VITRelC.RelC = VITD.Ddept AND
         VITC.CsalFloor > VITD.Dtechnician AND
         VITRelC.RelC = VITC.RelC

2. SELECT RelC, CsalFloor
   FROM JVIT
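Phase 2 can be run end to end on a resident SQL engine once the VITs exist. The sketch below assumes SQLite as that engine and takes the salary comparison from VITC, the table that holds CsalFloor. The VIT contents are hypothetical values (in thousands), chosen so that exactly one department satisfies the condition.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Hypothetical Phase 1 results.
    CREATE TABLE VITRelC (RelC TEXT);
    CREATE TABLE VITC (RelC TEXT, CsalFloor INTEGER);
    CREATE TABLE VITD (Ddept TEXT, Dtechnician INTEGER);
    INSERT INTO VITRelC VALUES ('cs'), ('math');
    INSERT INTO VITC VALUES ('cs', 42), ('math', 46);
    INSERT INTO VITD VALUES ('cs', 40), ('math', 65);

    -- Step 1: join the VITs under the rewritten WHERE conditions.
    CREATE VIEW JVIT(RelC, CsalFloor, Ddept, Dtechnician) AS
    SELECT VITRelC.RelC, VITC.CsalFloor, VITD.Ddept, VITD.Dtechnician
    FROM VITRelC, VITC, VITD
    WHERE VITRelC.RelC = VITD.Ddept
      AND VITC.CsalFloor > VITD.Dtechnician
      AND VITRelC.RelC = VITC.RelC;
""")

# Step 2: project the output columns of the original query.
answer = db.execute("SELECT RelC, CsalFloor FROM JVIT").fetchall()
print(answer)  # [('cs', 42)]
```

Only cs appears in the answer because its technician floor in VITC (42) exceeds the value in VITD (40), while math (46 versus 65) does not.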


Example – Phase 2 (Aggregation)

Q: Find the average salary floor across all departments for each employee category in database univ-B

SELECT T.category, avg(T.D)
FROM univ-B::salInfo-> D,
     univ-B::salInfo T
WHERE D <> 'category'
GROUP BY T.category

univ-B: salInfo

category     cs     math   ece    …
Prof         72K    65K    78K    …
Assoc Prof   65K    54K    69K    …
…            …      …      …      …


Aggregation After Phase 2

SELECT Tcategory, avg(TD)
FROM JVIT
GROUP BY Tcategory
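The aggregation works because Phase 1 turns each department column into rows, so the average across columns becomes an ordinary avg over JVIT. The sketch below is one possible illustration, assuming SQLite as the engine. It reads the column names from SQLite's catalog in place of the FST, and treats a single VIT over D and T as the JVIT. The column layout (D, Tcategory, TD) and the salary values (in thousands, limited to the three departments shown) are assumptions of this example.

```python
import sqlite3

univ_b = sqlite3.connect(":memory:")
univ_b.execute(
    "CREATE TABLE salInfo (category TEXT, cs INTEGER, math INTEGER, ece INTEGER)"
)
univ_b.executemany(
    "INSERT INTO salInfo VALUES (?, ?, ?, ?)",
    [("Prof", 72, 65, 78), ("Assoc Prof", 65, 54, 69)],
)

# Instantiate D: every attribute name of salInfo except 'category'.
attrs = [
    col[1]
    for col in univ_b.execute('PRAGMA table_info("salInfo")')
    if col[1] != "category"
]

# One branch per attribute: the attribute name becomes data (D), its value becomes TD.
TEMPLATE = """SELECT '{a}' AS D, category AS Tcategory, "{a}" AS TD FROM salInfo"""
branches = [TEMPLATE.format(a=a) for a in attrs]
univ_b.execute("CREATE VIEW JVIT AS " + " UNION ".join(branches))

# Aggregation after Phase 2, as on the slide.
averages = dict(
    univ_b.execute("SELECT Tcategory, avg(TD) FROM JVIT GROUP BY Tcategory")
)
for category, avg_floor in sorted(averages.items()):
    print(category, round(avg_floor, 2))
```

Keeping D in each row ensures that equal salary values from different departments stay distinct under UNION, so they all count toward the average.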


References

1. L. V. S. Lakshmanan, F. Sadri, I. N. Subramanian: SchemaSQL – A Language for Interoperability in Relational Multi-database Systems. VLDB, 1996

2. L. V. S. Lakshmanan, F. Sadri, S. N. Subramanian: SchemaSQL – An Extension to SQL for Multidatabase Interoperability. TODS, 2001