Mediators, Wrappers, etc.
Based on the TSIMMIS project at Stanford. Concepts used in several other related projects.
•Goal: integrate information in heterogeneous data sources, including flat files, spreadsheets, … .
•Key idea: write “wrappers” for data sources that export relation-like (or similarly high-level) views.
•BUT, remember: sources != DBs.
•Exported views = sets of heterogeneous “lightweight objects”.
II architecture.
[Figure: architecture diagram with two mediators, two sources, and two incoming queries.]
•No predefined hierarchy.
•A mediator talks to sources via translators and via other mediators.
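The architecture above can be sketched roughly as follows. This is a minimal illustration, not TSIMMIS code: the class names `CsvWrapper` and `Mediator`, the `query` method, and the sample data are all assumptions made for the example. A wrapper exposes a flat file as simple records, and a mediator forwards a query to its children, which may be wrappers or other mediators, with no fixed hierarchy.

```python
import csv
import io


class CsvWrapper:
    """Hypothetical wrapper: exposes flat-file text as a list of dict records."""

    def __init__(self, text):
        self.rows = list(csv.DictReader(io.StringIO(text)))

    def query(self, **conditions):
        # Keep the rows whose fields match every requested condition.
        return [dict(r) for r in self.rows
                if all(r.get(k) == v for k, v in conditions.items())]


class Mediator:
    """Hypothetical mediator: forwards a query to wrappers and other mediators."""

    def __init__(self, *children):
        self.children = children

    def query(self, **conditions):
        results = []
        for child in self.children:
            results.extend(child.query(**conditions))
        return results


# Made-up sample sources.
phones = CsvWrapper("emp,phone\nmary,555-1234\nbob,555-0000\n")
offices = CsvWrapper("emp,office\nmary,B12\n")

inner = Mediator(offices)        # a mediator over one source
top = Mediator(phones, inner)    # a mediator over a source and another mediator

mary_records = top.query(emp="mary")
# mary_records == [{'emp': 'mary', 'phone': '555-1234'}, {'emp': 'mary', 'office': 'B12'}]
```

Note that the two records returned for mary have different fields, which is the kind of heterogeneity the data model has to tolerate.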
What data model is appropriate?
Remember the role played by the data model now:
•In DB design, you model the application data first, develop a schema, create tables, and populate ’em.
•Here, you are trying to abstract existing data and/or applications using wrappers, and would like to leverage that abstraction for querying (i.e., II) via mediators.
•So, you don’t get to preach here!
•Model as expressive as possible.
•Yet as flexible as possible.
•Handle missing, repeated (nested), and heterogeneous data.
•Support meta-data.
What are the architectural requirements?
•Facilitate easy joining of new mediators and “registration” of new sources.
•Need for a mediator generator and a wrapper generator.
What sort of query model/language is appropriate?
•Must understand and be in sync with the expressive but permissive data model we sketched.
•TSIMMIS uses LOREL, but we will keep our discussion more general.
•In principle, can use SchemaSQL, XQuery, etc.
More on data model
Lightweight object model (OEM, the Object Exchange Model): an OEM object = OID: <label, type, value>.
•Self-descriptive (i.e., the schema comes along with the data, for every data item!).
•Value: atomic or set-valued.
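As a rough illustration of the OID: <label, type, value> structure, here is a minimal Python sketch. The class name `OEMObject`, the helper `describe`, the OID strings, and the type names are assumptions made for the example; they are not taken from TSIMMIS.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class OEMObject:
    oid: str       # object identifier (the "&..." format is made up)
    label: str     # self-describing label, carried by every object
    type: str      # e.g. "string", "integer", or "set"
    value: Union[str, int, float, tuple]  # atomic, or a tuple of OEMObject


def describe(obj, indent=0):
    """Render an object and its sub-objects, one line per object."""
    pad = "  " * indent
    if obj.type == "set":
        lines = [f"{pad}{obj.oid}: <{obj.label}, set>"]
        for child in obj.value:
            lines.extend(describe(child, indent + 1))
        return lines
    return [f"{pad}{obj.oid}: <{obj.label}, {obj.type}, {obj.value!r}>"]


name = OEMObject("&n1", "name", "string", "Three amigos")
resto = OEMObject("&r1", "resto", "set", (name,))

rendered = describe(resto)
# rendered == ["&r1: <resto, set>", "  &n1: <name, string, 'Three amigos'>"]
```

Because the label and type travel with each object, no separate schema is needed to interpret `rendered`.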
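An example OEM database
[Figure: OEM graph rooted at “guide”, with three “resto” edges; object IDs o1, o2, o3, o4; edge labels c, n, a, near, s, c, z, address; atomic values “gourmet”, “Three amigos”, “1650 stecatherine”, “montreal”, “H3G 1M7”, “westmont”.]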
•Not every resto may have address of same type.
•Indeed, some may have no address!
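The two points above can be illustrated with a small path-lookup sketch. The nested (label, value) encoding, the function names `children` and `follow`, and the sample data are assumptions made for this example; they do not reproduce the figure. The lookup simply returns fewer results when an address is missing or has a different shape, instead of failing.

```python
def children(value, label):
    """Values of the sub-objects with the given label; atomic values have none."""
    if not isinstance(value, list):
        return []
    return [v for (l, v) in value if l == label]


def follow(value, path):
    """Follow a path of labels from a value, returning every value reached."""
    frontier = [value]
    for label in path:
        frontier = [v for node in frontier for v in children(node, label)]
    return frontier


# Hypothetical guide: one structured address, one plain-string address, one missing.
guide = [
    ("resto", [("name", "A"),
               ("address", [("street", "1 Main"), ("city", "X")])]),
    ("resto", [("name", "B"), ("address", "Y")]),
    ("resto", [("name", "C")]),
]

names = follow(guide, ["resto", "name"])               # ['A', 'B', 'C']
cities = follow(guide, ["resto", "address", "city"])   # ['X']
addresses = follow(guide, ["resto", "address"])        # two results, of different shapes
```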
TSIMMIS Query model
Each mediator describes its concepts (whatever it can garner from the sources it talks to) using some logical rules. TSIMMIS uses MSL, but we will see that SchemaLog can express it easily.
Information Manifold Approach
Two models. The first is Local as View (LAV):
•World view = global predicates (like base relations, but they do not exist).
•Each source = a description of what information it can contribute to the global predicates = a view over the global predicates (a derived relation).
•Query the global predicates.
•Answer using the views (which are the only ones that hold the data!).
IM approach
Alternative model, Global as View (GAV):
•Global predicates are exported by sources as views of the data they actually store.
•Query the global predicates.
•Answer by expanding the query using the view definitions.
IM follows LAV.
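To make the GAV idea of answering by expansion concrete, here is a minimal sketch. Everything in it is hypothetical: the source contents, the `GAV_DEFS` table, and the restriction of definitions to unions of projections are assumptions chosen to keep the example short. Each global predicate is defined in terms of source relations, so a query over a global predicate is answered by unfolding that definition.

```python
# Hypothetical source relations.
SOURCE_DATA = {
    "s1": {("mary", "555-1234", "john")},                 # (emp, phone, mgr)
    "s3": {("mary", "555-9999"), ("bob", "555-0000")},    # (emp, phone)
}

# Hypothetical GAV definitions: each global predicate is a union of
# projections over sources, given as (source name, column positions).
GAV_DEFS = {
    "phone": [("s1", (0, 1)), ("s3", (0, 1))],
    "mgr": [("s1", (0, 2))],
}


def global_relation(pred):
    """Expand a global predicate into its defining source projections."""
    out = set()
    for source, cols in GAV_DEFS[pred]:
        for row in SOURCE_DATA[source]:
            out.add(tuple(row[c] for c in cols))
    return out


# Query: phone(mary, P), answered by expanding phone.
phones_of_mary = sorted(p for (e, p) in global_relation("phone") if e == "mary")
# phones_of_mary == ['555-1234', '555-9999']
```

In LAV the direction is reversed: sources are described in terms of the global predicates, so answering requires rewriting rather than simple unfolding.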
LAV example
Global predicates: emp(E), phone(E,P), office(E,O), mgr(E,M), dept(E,D) (remember, they DON’T exist!).
source1(E,P,M) <- emp(E), phone(E,P), mgr(E,M).
source2(E,O,D) <- emp(E), office(E,O), dept(E,D).
source3(E,P) <- emp(E), phone(E,P), dept(E,'toy').
Points to remember:
•Views are descriptive, not prescriptive.
•Completeness is not guaranteed.
•Consistency across sources is not guaranteed.
Example query: q1(O,P) <- phone(mary,P), office(mary,O).
Query answering
How can we answer such a query?
•Must get all relevant information from the views, i.e., rewrite the query using ONLY source/view predicates.
•More than one way is possible. Want ALL possible rewrites (to ensure (near) completeness).
Rewritten q1:
r1q1(O,P) <- s1(mary,P,M), s2(mary,O,D).
r2q1(O,P) <- s3(mary,P), s2(mary,O,D).
(s1, s2, s3 stand for source1, source2, source3; the employee argument is the constant mary, as in q1.)
There are other rewrites too (e.g., join all three sources), but they are contained in one of the above. So, the above rewrites are all the “minimal” answers.
Compare the expanded r1q1 and r2q1 with q1 (w.r.t. containment). What can you say?
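The comparison can be checked mechanically with a brute-force containment-mapping test. The sketch below is an illustration under stated assumptions, not an algorithm from the slides: atoms are encoded as tuples, variables are strings starting with an uppercase letter, view heads are assumed to contain distinct variables only, and the functions `expand` and `contained_in` are names invented here. The rewrites bind the employee argument to the constant mary; a variant with a free variable E is included to show that its expansion is not contained in q1.

```python
from itertools import product


def is_var(term):
    return term[0].isupper()


def substitute(atom, theta):
    pred, *args = atom
    return (pred, *[theta.get(a, a) for a in args])


def expand(rewrite, views):
    """Replace each source subgoal by the body of its view definition."""
    head, body = rewrite
    expanded = []
    for i, (name, *args) in enumerate(body):
        view_head, view_body = views[name]
        theta = dict(zip(view_head[1:], args))
        for atom in view_body:            # fresh names for non-head variables
            for a in atom[1:]:
                if is_var(a) and a not in theta:
                    theta[a] = f"{a}_{i}"
        expanded.extend(substitute(atom, theta) for atom in view_body)
    return head, expanded


def contained_in(q_sub, q_sup):
    """True if q_sub is contained in q_sup (a containment mapping from q_sup exists)."""
    (head_sub, body_sub), (head_sup, body_sup) = q_sub, q_sup
    variables = sorted({a for atom in [head_sup, *body_sup]
                        for a in atom[1:] if is_var(a)})
    terms = sorted({a for atom in [head_sub, *body_sub] for a in atom[1:]})
    body_set = set(body_sub)
    for image in product(terms, repeat=len(variables)):
        h = dict(zip(variables, image))
        if (substitute(head_sup, h)[1:] == head_sub[1:]
                and all(substitute(a, h) in body_set for a in body_sup)):
            return True
    return False


VIEWS = {
    "s1": (("s1", "E", "P", "M"), [("emp", "E"), ("phone", "E", "P"), ("mgr", "E", "M")]),
    "s2": (("s2", "E", "O", "D"), [("emp", "E"), ("office", "E", "O"), ("dept", "E", "D")]),
    "s3": (("s3", "E", "P"), [("emp", "E"), ("phone", "E", "P"), ("dept", "E", "toy")]),
}
q1 = (("q1", "O", "P"), [("phone", "mary", "P"), ("office", "mary", "O")])
r1 = (("r1q1", "O", "P"), [("s1", "mary", "P", "M"), ("s2", "mary", "O", "D")])
r2 = (("r2q1", "O", "P"), [("s3", "mary", "P"), ("s2", "mary", "O", "D")])
r1_free = (("r1q1", "O", "P"), [("s1", "E", "P", "M"), ("s2", "E", "O", "D")])

r1_in_q1 = contained_in(expand(r1, VIEWS), q1)            # True
q1_in_r1 = contained_in(q1, expand(r1, VIEWS))            # False
r2_in_q1 = contained_in(expand(r2, VIEWS), q1)            # True
free_in_q1 = contained_in(expand(r1_free, VIEWS), q1)     # False
```

The expansions are contained in q1 but not equivalent to it, because they carry extra subgoals such as emp(mary) that q1 does not require.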
How do we get minimal rewrites?
•q: the original query given (a CQ over global predicates).
•r: a candidate rewrite. It’s valid provided r’s expansion (obtained by expanding the source definitions), say E(r), is contained in q.
•A rewrite r is minimal if E(r) is NOT contained in E(r’) for any other rewrite r’.
What does minimality really mean? Example:
s1(X,Y) <- a(X,Y).
s2(X,Y) <- a(X,Y).
query: q(X,Y) <- a(X,Y).
r1q(X,Y) <- s1(X,Y) as well as r2q(X,Y) <- s2(X,Y) are needed to answer it. Why? (s1 and s2 do NOT necessarily provide the same set of tuples. Rules are descriptions, NOT prescriptions!)
How many rewrites should we try?
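A tiny illustration of why both rewrites are needed, with made-up tuples: both sources are described by the same rule over a, yet each holds a different subset of a, so the answer is the union over all rewrites. The names `s1_tuples`, `s2_tuples`, and `answer_with` are assumptions for this sketch.

```python
# Hypothetical contents: both sources are described as views of a(X,Y),
# but each actually stores only some of the a-tuples.
s1_tuples = {("a1", "b1"), ("a2", "b2")}
s2_tuples = {("a2", "b2"), ("a3", "b3")}


def answer_with(*rewrite_results):
    """Union the results of evaluating each rewrite."""
    answers = set()
    for result in rewrite_results:
        answers |= result
    return answers


only_r1 = answer_with(s1_tuples)
both = answer_with(s1_tuples, s2_tuples)
missed = both - only_r1   # {('a3', 'b3')}: lost if r2q is dropped
```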
Levy et al. Theorem
Thm.: if a rewrite r of query q has more subgoals than q, then r can’t be minimal.
Proof: assume r is valid (or it’s useless). So E(r) is contained in q; let h be the containment mapping (c.m.) from q to E(r). If r has more subgoals than q, there must be a subgoal p in r s.t. h doesn’t map any subgoal of q to any subgoal in E(p). Get rid of all such subgoals to obtain a modified rewrite r’. r’ contains r (trivially). But r’ is contained in q (just use the original c.m. h). QED
•Given a q, only consider those sources whose body contains >= 1 global predicate appearing in q.
•Still an exponential # of choices, but not too terrible in practice.
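The two pruning ideas (relevant sources only, and at most as many subgoals as q) can be sketched as a candidate enumeration. This is a simplified illustration: it only picks which sources appear in a candidate and ignores the choice of argument bindings, which a real rewriting procedure must also make. The function names and the extra source `s4` are hypothetical.

```python
from itertools import combinations_with_replacement

# Global predicates used in each source description; s4 is a made-up extra source.
VIEW_BODIES = {
    "s1": {"emp", "phone", "mgr"},
    "s2": {"emp", "office", "dept"},
    "s3": {"emp", "phone", "dept"},
    "s4": {"salary"},
}


def relevant_sources(query_preds, view_bodies):
    """Sources whose body mentions at least one predicate of the query."""
    return sorted(name for name, preds in view_bodies.items()
                  if preds & query_preds)


def candidate_source_bags(query_preds, n_subgoals, view_bodies):
    """All multisets of relevant sources with at most n_subgoals members."""
    relevant = relevant_sources(query_preds, view_bodies)
    return [bag
            for k in range(1, n_subgoals + 1)
            for bag in combinations_with_replacement(relevant, k)]


# q1 uses phone and office and has two subgoals.
relevant = relevant_sources({"phone", "office"}, VIEW_BODIES)   # ['s1', 's2', 's3']
bags = candidate_source_bags({"phone", "office"}, 2, VIEW_BODIES)
# 3 single-source bags + 6 two-source bags = 9 candidates
```

Each bag would then be turned into concrete rewrites and tested for validity, for instance with a containment check against q.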
Example revisited & expanded.
Suppose source 1 instead exported s1(E,P) and source 2 exported s2(E,O).
•Is q1 answerable using the views?
•What about q2(E) <- emp(E), mgr(E,'john')?
•What about q3(E1,E2) <- phone(E1,P), phone(E2,P)?
•What about q4(E,M) <- emp(E), dept(E,'toy'), mgr(E,M)?
QAV (AQUV) – general story
Why is QAV a worthwhile problem?
•Speed up query processing.
•Materialized views: can I answer this query using stored view(s)?
•Information integration: sources store some data, and *describe* (usually using rules) how local data relates to the global schema (i.e., what are their contributions?).
•Can I answer this query using available source data (i.e., views)?
•How best can I answer?
QAV – two models
•Classic query optimization context: equivalent rewriting. Used extensively in data warehousing/OLAP.
•Information integration: maximally contained (also called minimal, maximally sound) rewriting.
Excellent survey: Alon Y. Halevy. Answering queries using views: a survey. VLDB Journal, 2001.