Post on 22-Dec-2015
Containment
CSE 590 DB
Rachel Pottinger
Outline
Introduction
Motivation
Formal definition
Algorithms for different complexities
An application: rewriting queries using views
Containment, what is it?
For two queries, Q1 and Q2, if the answers to Q1 are a subset of the answers to Q2 on every database, then Q1 is contained in Q2.
Denoted as Q1 ⊆ Q2.
For general datalog, containment is undecidable (by reduction from decision problems for context-free languages).
Why should I care?
Containment is useful in a number of situations, including:
Query minimization
Independence of queries from updates
Rewriting queries using views
Interesting logic problem
More definitions
Equivalence of queries: Q1 ≡ Q2 if they return the same answers for all databases. This is the same as Q1 ⊆ Q2 and Q2 ⊆ Q1.
Conjunctive query - a query that is formed only of conjunctions of predicates.
Q(X,Y):- e(X,Z),e(Z,Y)
Containment Mapping
Let Q1 and Q2 be two conjunctive queries:
Q1: I :- J1, …, Jl
Q2: H :- G1, …, Gk
A symbol mapping h is said to be a containment mapping if h turns Q2 into Q1; that is, h(H) = I, and for each i = 1, 2, …, k, there is some j such that h(Gi) = Jj. There is no requirement that each Jj be the target of some Gi.
Proof Sketch
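If there’s a containment mapping from Q2 to Q1, then Q1 ⊆ Q2.
Suppose h maps Vars(Q2) → Vars(Q1).
Let D be a database and t be an answer to Q1 on D; t is produced by some mapping σ from Vars(Q1) → D.
Then σ ∘ h maps Vars(Q2) → D, sending every subgoal of Q2 to a fact of D and the head of Q2 to t, so t is also an answer to Q2.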
The rest of the proof follows later
Example of homomorphism rules
Q1: fp(X,Y) :- e(Y,X), e(X,Z)
Q2: fp(A,B) :- e(B,A), e(C,A),e(A,D)
For Q1 ⊆ Q2, map from Q2 to Q1: A → X, B → Y, C → Y, D → Z
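The search for a containment mapping on this example can be sketched by brute force. This is only an illustration, not an efficient algorithm: the tuple encoding of queries and the helper names (`variables`, `apply_mapping`, `containment_mappings`) are assumptions made for this sketch.

```python
from itertools import product

# A query is (head, body); each atom is (predicate, tuple_of_variables).
Q1 = (("fp", ("X", "Y")), [("e", ("Y", "X")), ("e", ("X", "Z"))])
Q2 = (("fp", ("A", "B")), [("e", ("B", "A")), ("e", ("C", "A")), ("e", ("A", "D"))])

def variables(query):
    """Variables of a query, in order of first appearance (head first)."""
    head, body = query
    seen = []
    for _, args in [head] + body:
        for v in args:
            if v not in seen:
                seen.append(v)
    return seen

def apply_mapping(h, atom):
    """Rename the variables of one atom according to h."""
    pred, args = atom
    return (pred, tuple(h[v] for v in args))

def containment_mappings(src, dst):
    """Yield every h: Vars(src) -> Vars(dst) that sends the head of src to
    the head of dst and every subgoal of src to some subgoal of dst."""
    src_vars, dst_vars = variables(src), variables(dst)
    dst_body = set(dst[1])
    for image in product(dst_vars, repeat=len(src_vars)):
        h = dict(zip(src_vars, image))
        if apply_mapping(h, src[0]) == dst[0] and all(
            apply_mapping(h, g) in dst_body for g in src[1]
        ):
            yield h

# To show Q1 ⊆ Q2 we look for a mapping from Q2 to Q1.
mappings = list(containment_mappings(Q2, Q1))
print(mappings)  # [{'A': 'X', 'B': 'Y', 'C': 'Y', 'D': 'Z'}]
```

Trying every assignment is exponential in the number of variables of the source query, which is in line with containment of conjunctive queries being NP-complete in general.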
Test for containment of a conjunctive query (Q1 ⊆ Q2)
Freeze the body of Q1, and put this into a canonical database
Apply Q2 to the canonical database
If the frozen head of Q1 can be derived by Q2 on the canonical database, then Q1 ⊆ Q2, otherwise not
A chilling example
Q1: p(X,Z) :- a(X,Y), a(Y,Z)
Q2: p(X,Z) :- a(X,U), a(V,Z)
Canonical database of Q1: a(x,y), a(y,z). Applying Q2 to it derives p(x,z), so Q1 ⊆ Q2.
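The freeze-and-evaluate test can be sketched for the two queries above. The encoding is an assumption of this sketch: variables are frozen to their lowercase names, and `evaluate` is a deliberately naive evaluator that tries every assignment.

```python
from itertools import product

# A query is (head, body); each atom is (predicate, tuple_of_variables).
Q1 = (("p", ("X", "Z")), [("a", ("X", "Y")), ("a", ("Y", "Z"))])
Q2 = (("p", ("X", "Z")), [("a", ("X", "U")), ("a", ("V", "Z"))])

def freeze(query):
    """Replace each variable by a distinct constant (its lowercase name)."""
    head, body = query
    frozen_head = (head[0], tuple(v.lower() for v in head[1]))
    database = {(pred, tuple(v.lower() for v in args)) for pred, args in body}
    return frozen_head, database

def evaluate(query, database):
    """Naive evaluation: try every assignment of constants to variables."""
    head, body = query
    constants = sorted({c for _, args in database for c in args})
    query_vars = sorted({v for _, args in [head] + body for v in args})
    answers = set()
    for values in product(constants, repeat=len(query_vars)):
        env = dict(zip(query_vars, values))
        if all((pred, tuple(env[v] for v in args)) in database
               for pred, args in body):
            answers.add((head[0], tuple(env[v] for v in head[1])))
    return answers

def contained_in(q1, q2):
    """q1 ⊆ q2 iff q2 derives q1's frozen head on q1's canonical database."""
    frozen_head, canonical_db = freeze(q1)
    return frozen_head in evaluate(q2, canonical_db)

print(contained_in(Q1, Q2))  # True
print(contained_in(Q2, Q1))  # False
```

The second call shows that the containment is strict: the canonical database of Q2 contains no path of length 2, so Q1 derives nothing on it.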
Proof continued
If Q1 ⊆ Q2, then there is a containment mapping.
Since Q1 ⊆ Q2, we know that if we apply Q2 to the canonical database formed from Q1, we’ll get back the frozen head of Q1; the assignment of Q2's variables that derives it is a containment mapping from Q2 to Q1.
Conjunctive queries with negation
Negation in the heads of the subgoals, ie: Q(X,Y):- e(X,Z),e(Z,Y)
The Levy and Sagiv test looks at an exponential number of canonical databases; containment with negation is Π₂ᵖ-complete
1. Consider all partitions of the variables of Q1; form canonical databases for all of them, D1, …, Dk
2. For each database Di, see if the database makes all subgoals of Q1 true
3. For all Di’s passing step 2, see if the head of Q1 can be derived by applying Q2
4. If so, then Q1 ⊆ Q2, else not
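The first two steps of the test above (one canonical database per partition, then checking that the database satisfies Q1's subgoals) can be sketched as follows. The query used here, a path of length 2 with a negated shortcut arc, is an assumption chosen for illustration, as are the helper names. Applying Q2 to the surviving databases is not covered by this sketch.

```python
# Illustrative query (an assumption of this sketch):
#   p(X,Z) :- a(X,Y), a(Y,Z), not a(X,Z)
POSITIVE = [("a", ("X", "Y")), ("a", ("Y", "Z"))]
NEGATED = [("a", ("X", "Z"))]

def partitions(items):
    """Yield every partition of a list as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing block in turn ...
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + smaller

def canonical_databases(positive, negated):
    """For each partition of the variables, freeze every block to one
    constant, build the database from the positive subgoals, and report
    whether all negated subgoals are absent from it."""
    query_vars = sorted({v for _, args in positive + negated for v in args})
    for blocks in partitions(query_vars):
        const = {v: "".join(block).lower() for block in blocks for v in block}
        db = {(p, tuple(const[v] for v in args)) for p, args in positive}
        ok = all((p, tuple(const[v] for v in args)) not in db
                 for p, args in negated)
        yield blocks, db, ok

results = list(canonical_databases(POSITIVE, NEGATED))
passing = [blocks for blocks, db, ok in results if ok]
print(len(results), "partitions;", len(passing), "satisfy the query's subgoals")
print(passing)
```

For three variables there are five partitions. In this example only two survive the check: the one keeping X, Y, Z distinct and the one merging X with Z. The others turn the negated subgoal into a fact that is already in the database.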
A negative example
Q1: p(X,Z):-a(X,Y), a(Y,Z), ¬a(X,Z)
Q2: p(A,C):-a(A,B),a(B,C), ¬a(A,D)
Conjunctive Queries with Arithmetic Comparisons
Q(X,Y):-e(X,Z),e(Z,Y), Z < Y
Treat the same as the negated subgoals, only a check must be made for each ordering of each partition
Also Π₂ᵖ-complete for dense domains such as the reals
Example with arithmetic comparisons
Q1: p(X,Z):-a(X,Y), a(Y,Z), X < Y
Q2: p(A,C):-a(A,B), a(B,C), A < C
Q1 ⊆ Q2 is false; see X = Z = 0, Y = 1
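The counterexample X = Z = 0, Y = 1 can be checked directly. The sketch below hard-codes the two queries as Python functions over a binary relation a; the function names and the database are built only for this check.

```python
from itertools import product

# Relation a obtained from X = Z = 0, Y = 1: the facts a(0,1) and a(1,0).
db = {(0, 1), (1, 0)}

def q1(a):
    # p(X,Z) :- a(X,Y), a(Y,Z), X < Y
    return {(x, z) for (x, y), (y2, z) in product(a, a) if y == y2 and x < y}

def q2(a):
    # p(A,C) :- a(A,B), a(B,C), A < C
    return {(s, c) for (s, b), (b2, c) in product(a, a) if b == b2 and s < c}

print(q1(db))  # {(0, 0)}
print(q2(db))  # set()
```

Q1 produces p(0,0) on this database while Q2 produces nothing, so Q1 is not contained in Q2.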
Other complexity results
Queries restricted to Q1 and Q2 such that all database predicates have arity at most 2 and every database predicate occurs at most three times in the body of Q1 - Π₂ᵖ
Conjunctive queries where Q1 is fixed - NP complete
Conjunctive queries where Q2 is fixed - polynomial
Conjunctive query containment where Q2 is an acyclic query - polynomial time
Conjunctive queries where every database predicate occurs at most twice in the body of Q1 - linear time
Rewriting Queries Using Views
Useful in query optimization
Good for query minimization
Needed to make the best use of cached information
Necessary in data integration
Views
A view is a relation that is not part of the conceptual model, but is visible to the user.
Useful for common expressions, or protecting data
Example: If you had faculty(name, office, ssn) you may want students to access faculty_office(name, office)
Views (cont.)
Views can be either materialized or virtual
In data integration, data sources can be thought of as views
An example of rewriting queries using views
Suppose you had two databases:
One has famous people and whether they are right or left handed
One has the birthdays of famous people
You want the birthdays of all of the lefties
Containment in rewriting
Query: q(X):-e(X,Y), e(Y,X)
View: v(A,B):- e(A,C),e(C,B)
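One way to check a candidate rewriting is to unfold it: replace each view atom by the view's body and compare the result with the query. The sketch below unfolds the candidate q(X) :- v(X,X). That candidate is an assumption of this example, as are the `unfold` helper and the fresh-variable naming scheme.

```python
from itertools import count

# View definitions: name -> (head variables, body atoms).
VIEWS = {"v": (("A", "B"), [("e", ("A", "C")), ("e", ("C", "B"))])}

def unfold(body, views):
    """Replace every view atom in `body` by the view's definition.
    Head variables of the view are bound to the atom's arguments; the
    remaining view variables get a fresh suffix for each use of a view.
    Assumes the head variables of each view are distinct."""
    fresh = count(1)
    expansion = []
    for pred, args in body:
        if pred not in views:
            expansion.append((pred, args))
            continue
        head_vars, view_body = views[pred]
        rename = dict(zip(head_vars, args))
        n = next(fresh)
        for sub_pred, sub_args in view_body:
            expansion.append(
                (sub_pred, tuple(rename.get(v, f"{v}_{n}") for v in sub_args))
            )
    return expansion

# Candidate rewriting (hypothetical): q(X) :- v(X,X)
rewriting_body = [("v", ("X", "X"))]
print(unfold(rewriting_body, VIEWS))
# [('e', ('X', 'C_1')), ('e', ('C_1', 'X'))]
```

The expansion e(X,C_1), e(C_1,X) is the body of the query with Y renamed to C_1, so there are containment mappings in both directions and the candidate is equivalent to the query.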
Q(x,u):-p(x,y),p0(y,z),p1(x,w),p2(w,u)
V1(a,b):-p(a,c),p0(c,b),p1(a,d)
V2(a,b):-p1(a,b)
V3(a,b):-p2(a,b)
A more complicated example