CSE 636 Data Integration Limited Source Capabilities Slides by Hector Garcia-Molina

Post on 21-Dec-2015


CSE 636 Data Integration

Limited Source Capabilities

Slides by Hector Garcia-Molina


Heterogeneous Databases

[Figure: Distributed Database System over four data sources: DBMS1, DBMS2, legacy, web site]


Limited Capabilities


Example: Amazon.com

[Figure: search form with attributes author, title, subject, format, price, annotated with the following restrictions]

• must specify at least one of these
• this attribute not returned
• cannot query on this attribute
• menu of choices


Example: BarnesAndNoble.com

[Figure: search form with attributes author, title, subject, format, price, annotated with the following restrictions]

• must specify at least one of these
• can query if one of the other attributes is specified
• menu of choices


Why Limited Capabilities?

• Search forms
• Security
• Indexes
• Legacy


Capability vs. Content

• Capability description
  – Can only search for subject = “art,” “history,” “science”
• Content description
  – Source only contains subject = “art,” “history,” “science”
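The distinction can be made concrete with a small sketch. Everything here is hypothetical and for illustration only: the class `CapabilityLimitedSource`, the helper `can_skip_source`, and the sample data are assumptions, not part of the slides. A capability description constrains which queries may be asked; a content description tells the mediator which queries can have answers at all.

```python
ALLOWED_SUBJECTS = {"art", "history", "science"}


class CapabilityLimitedSource:
    """Hypothetical source: may hold books on any subject, but its
    search form only lets the caller pick one of three subjects."""

    def __init__(self, books):
        self.books = books

    def search(self, subject):
        # Capability restriction: the query itself is rejected,
        # regardless of what data the source holds.
        if subject not in ALLOWED_SUBJECTS:
            raise ValueError(f"cannot search for subject = {subject!r}")
        return [b for b in self.books if b["subject"] == subject]


def can_skip_source(content_subjects, subject):
    """Content description: if the source is known to contain only
    `content_subjects`, a query on any other subject has no answers
    there, so a mediator may skip the source entirely."""
    return subject not in content_subjects
```

In the first case a "math" book may exist in the source but cannot be reached through the subject attribute; in the second case the mediator knows no such book exists and need not send the query.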


Outline

• Describing source capabilities
• Extending source capabilities
• How mediators cope with limited capabilities
• Mediator capabilities
• Other topics

[Figure: a mediator above two wrappers, each over one source]


Describing Query Capabilities

R(X, Y, ... Z)

Adornments:
• f: may or may not be specified
• u: cannot be specified
• b: must be specified
• c[S]: must be specified, chosen from list S
• o[S]: optional; if specified, chosen from list S


Describing Query Capabilities


With output restriction:
• f’
• u’
• b’
• c’[S]
• o’[S]


Example

• Relation R(X, Y, Z)
• Description templates: bu’f, uf’c[z1, z2]
• Answerable queries: R(x1, Y, Z), R(X, Y, z1)
• Unanswerable queries: R(X, y1, Z), R(X, Y, z3)
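The example can be checked mechanically. The following is a minimal sketch, not the notation's official semantics: the `Adornment` class, the `answerable` function, and the encoding of a query as a tuple (with `None` standing for an unbound variable) are all assumptions made for illustration. The `hidden` flag records a primed adornment, read here as "attribute not returned"; it does not affect whether a query is answerable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adornment:
    kind: str                         # one of "f", "u", "b", "c", "o"
    choices: frozenset = frozenset()  # the list S for "c" and "o"
    hidden: bool = False              # primed adornment (assumed: not returned)

    def accepts(self, value):
        """Return True if this query position is allowed by the adornment."""
        bound = value is not None
        if self.kind == "f":
            return True
        if self.kind == "u":
            return not bound
        if self.kind == "b":
            return bound
        if self.kind == "c":
            return bound and value in self.choices
        if self.kind == "o":
            return (not bound) or value in self.choices
        raise ValueError(f"unknown adornment kind: {self.kind!r}")


def answerable(query, templates):
    """A query is answerable if at least one template accepts every position."""
    return any(
        len(t) == len(query) and all(a.accepts(v) for a, v in zip(t, query))
        for t in templates
    )


# Templates bu'f and uf'c[z1, z2] for R(X, Y, Z).
TEMPLATES = [
    (Adornment("b"), Adornment("u", hidden=True), Adornment("f")),
    (Adornment("u"), Adornment("f", hidden=True),
     Adornment("c", frozenset({"z1", "z2"}))),
]
```

With these templates, `answerable(("x1", None, None), TEMPLATES)` and `answerable((None, None, "z1"), TEMPLATES)` hold, while `(None, "y1", None)` and `(None, None, "z3")` are rejected by both templates.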


Other Description Mechanisms

• Tsimmis
  – query templates
• Information Manifold
  – capability records (# bound attrs, conditions ok, ...)
• Disco
• Garlic
  – black box
• Context-free grammars


Extending Source Capabilities

[Figure: a wrapper over the amazon source]

Query: author = “Freud” AND price > 10

Source: R(author, price, ...)
Template: b, u, ...


Extending Source Capabilities

Source: R(author, price, ...)
Template: b, u, ...

Query: author = “Freud” AND price > 10

Source Query: author = “Freud”

Wrapper Filter: price > 10

[Figure: a wrapper over the amazon source]
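A minimal sketch of this split, under stated assumptions: `CATALOG`, `source_search`, and `wrapper_query` are hypothetical names, the data is made up, and a query is encoded as a list of `(attribute, operator, value)` conditions that are implicitly ANDed. The source stand-in accepts only a bound author (template b, u); the wrapper pushes that condition down and evaluates the remaining ones itself.

```python
import operator

# Hypothetical catalog standing in for the remote source.
CATALOG = [
    {"author": "Freud", "title": "Dreams", "price": 8},
    {"author": "Freud", "title": "Totem", "price": 15},
    {"author": "Jung", "title": "Symbols", "price": 12},
]

OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def source_search(author):
    """Source interface: author must be bound, price cannot be specified."""
    return [dict(b) for b in CATALOG if b["author"] == author]


def wrapper_query(conditions):
    """Push the author condition to the source, filter the rest locally."""
    pushed = [c for c in conditions if c[0] == "author" and c[1] == "="]
    if len(pushed) != 1:
        raise ValueError("exactly one author = ... condition is required")
    residual = [c for c in conditions if c not in pushed]
    rows = source_search(pushed[0][2])            # source query
    return [                                       # wrapper filter
        r for r in rows
        if all(OPS[op](r[attr], val) for attr, op, val in residual)
    ]
```

Calling `wrapper_query([("author", "=", "Freud"), ("price", ">", 10)])` sends only the author condition to the source and applies `price > 10` to the returned rows.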


Another Example

[Figure: a wrapper over the Barnes&Noble source]

Query: (author = “Freud” OR author = “Jung”) AND price < 10

Source: R(author, price, …)
• No disjunctive conditions
• Price can only be specified with author


Another Example

Query: (author = “Freud” OR author = “Jung”) AND price < 10

Source: R(author, price, …)
• No disjunctive conditions
• Price can only be specified with author

Q1: author = “Freud” AND price < 10
Q2: author = “Jung” AND price < 10

Union Operation

[Figure: a wrapper over the Barnes&Noble source]
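The rewriting can be sketched as follows. This is an illustration under assumptions, not the actual wrapper: `BN_CATALOG`, `bn_search`, and `union_wrapper` are hypothetical, and the source stand-in accepts exactly one author plus an optional price bound, with no disjunction. The wrapper distributes the price condition over the OR, issues one source query per author, and unions the answers.

```python
# Hypothetical catalog standing in for the remote source.
BN_CATALOG = [
    {"author": "Freud", "title": "Dreams", "price": 8},
    {"author": "Freud", "title": "Totem", "price": 15},
    {"author": "Jung", "title": "Symbols", "price": 9},
    {"author": "Adler", "title": "Neurosis", "price": 5},
]


def bn_search(author, max_price=None):
    """Source interface: one author, optionally combined with price < max_price."""
    rows = [b for b in BN_CATALOG if b["author"] == author]
    if max_price is not None:
        rows = [b for b in rows if b["price"] < max_price]
    return [dict(b) for b in rows]


def union_wrapper(authors, max_price):
    """Answer (author = a1 OR author = a2 ...) AND price < max_price."""
    seen = set()
    result = []
    for author in authors:                  # one conjunctive query per disjunct
        for row in bn_search(author, max_price):
            key = (row["author"], row["title"], row["price"])
            if key not in seen:             # union removes duplicates
                seen.add(key)
                result.append(row)
    return result
```

`union_wrapper(["Freud", "Jung"], 10)` corresponds to running Q1 and Q2 and unioning their results.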


Extending Source Capabilities

• General scheme:
  – try many query rewritings
  – check if query fragments are supported by the source
  – check if the wrapper can combine answer fragments
  – do all this very efficiently!
  – H. Garcia-Molina, W. Labio, R. Yerneni: Capability-Sensitive Query Processing on Internet Sources, ICDE 1999
• Tsimmis, Info Manifold: no disjunctive queries
• DISCO: no query splitting
• Garlic: only CNF queries


Mediator Processing

R(X, Y, Z) f, f, b

T(Z, W, U) f, u, b

M(X, Y, Z, W, U) = Join(R, T)

Query: M(5, Y, Z, W, 3)

[Figure: a mediator above two wrappers, each over one source]


Plan 1

R(X, Y, Z) f, f, b

T(Z, W, U) f, u, b

M(X, Y, Z, W, U) = Join(R, T)

Query: M(5, Y, Z, W, 3)

(1) R(5, Y, Z)
(2) T(Z, W, 3)
(3) Join answers


Plan 2

R(X, Y, Z) f, f, b

T(Z, W, U) f, u, b

M(X, Y, Z, W, U) = Join(R, T)

Query: M(5, Y, Z, W, 3)

(1) P = T(Z, W, 3)
(2) for each (z, w, u) ∈ P: R(5, Y, z)
(3) Join answers
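The two plans can be compared with a small simulation. It is a sketch under assumptions: the rows in `R_ROWS` and `T_ROWS` are made up, and `query_R`, `query_T`, and `plan2` are hypothetical stand-ins that enforce the templates f, f, b and f, u, b. Under these templates the first step of Plan 1, R(5, Y, Z), is rejected because Z is not bound, whereas Plan 2 first queries T and then probes R once per tuple with the join value z.

```python
R_ROWS = [(5, "a", 1), (5, "b", 2), (7, "c", 1)]   # tuples (X, Y, Z)
T_ROWS = [(1, "p", 3), (2, "q", 4), (9, "r", 3)]   # tuples (Z, W, U)


def query_R(x=None, y=None, z=None):
    """Template f, f, b: Z must be bound."""
    if z is None:
        raise ValueError("R requires Z to be bound")
    return [r for r in R_ROWS
            if r[2] == z
            and (x is None or r[0] == x)
            and (y is None or r[1] == y)]


def query_T(z=None, w=None, u=None):
    """Template f, u, b: W cannot be specified, U must be bound."""
    if w is not None:
        raise ValueError("T does not accept a binding for W")
    if u is None:
        raise ValueError("T requires U to be bound")
    return [t for t in T_ROWS if t[2] == u and (z is None or t[0] == z)]


def plan2(x, u):
    """Answer M(x, Y, Z, W, u) where M = Join(R, T) on Z."""
    answers = []
    p = query_T(u=u)                               # (1) P = T(Z, W, u)
    for (z, w, u_val) in p:                        # (2) one R probe per tuple of P
        for (rx, ry, rz) in query_R(x=x, z=z):
            answers.append((rx, ry, rz, w, u_val))  # (3) join answers
    return answers
```

`plan2(5, 3)` answers the query M(5, Y, Z, W, 3) on the sample data, while `query_R(x=5)`, the first step of Plan 1, raises an error.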


Mediator Plan Generation

• Need feasible and efficient plan
• Search space is huge
• Tsimmis, Info Manifold, Garlic:
  – exponential algorithms
• Polynomial algorithms:
  – often find optimal or near-optimal plan
  – bounded performance
  – R. Yerneni, C. Li, J. D. Ullman, H. Garcia-Molina: Optimizing Large Join Queries in Mediation Systems, ICDT 1999
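To illustrate what "feasible" means here, the sketch below finds one feasible source ordering in polynomial time. It is not the algorithm of the cited paper and it ignores cost entirely; it is a simple greedy pass written for this note. The function `feasible_order` and its input encoding are assumptions: each source is described only by its attributes and by the subset that must be bound before it can be queried.

```python
def feasible_order(sources, initially_bound):
    """Return an order in which every source can be queried, or None.

    sources: dict mapping a source name to (attributes, required), where
             `required` lists the attributes that must be bound.
    initially_bound: attributes bound by constants in the user query.
    """
    bound = set(initially_bound)
    remaining = dict(sources)
    order = []
    while remaining:
        # Sources whose binding requirements are already satisfied.
        ready = [name for name, (_, required) in remaining.items()
                 if set(required) <= bound]
        if not ready:
            return None          # no remaining source can be queried
        name = ready[0]
        attributes, _ = remaining.pop(name)
        bound |= set(attributes)  # its answers bind all of its attributes
        order.append(name)
    return order


SOURCES = {
    "R": (("X", "Y", "Z"), ("Z",)),   # template f, f, b
    "T": (("Z", "W", "U"), ("U",)),   # template f, u, b
}
```

For the query M(5, Y, Z, W, 3), X and U are bound, so `feasible_order(SOURCES, {"X", "U"})` orders T before R, matching Plan 2. Because the set of bound attributes only grows, this greedy pass finds a feasible order whenever one exists; choosing an efficient one among the feasible orders is the harder problem the slide refers to.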


Conclusion

• Not all sources are created equal!
• Need to
  – describe what sources can do
  – efficiently process queries with limited sources
  – describe what mediators can do
  – exploit content information
  – deal with unavailable sources


References

• Computing Capabilities of Mediators
  – Ramana Yerneni, Chen Li, Hector Garcia-Molina, Jeffrey D. Ullman
  – SIGMOD Conference 1999
• Describing and Using Query Capabilities of Heterogeneous Sources
  – Vasilis Vassalos, Yannis Papakonstantinou
  – VLDB 1997