Optimization of Nested Queries
Sujatha Thanigaimani
COSC 6421
Outline
• Introduction
• Kim’s Algorithm for efficient processing
• Count bug – Solution
• Inequality bug – Solution
• Alternate Algorithm
• Modification of Kim’s algorithm
Nested Queries
• Queries containing other queries
• Inner query: can appear in the FROM or WHERE clause
Example:
SELECT cname FROM borrower WHERE cname IN (SELECT cname FROM depositor)
The part before IN is the "outer query"; the parenthesized SELECT is the "inner query".
Think of the inner query as a function that returns its result to the outer query.
Evaluation of Nested Queries
Naive method :
Tuple Iteration Semantics (TIS) - inefficient.
Kim’s Algorithm
Rationale: nested queries are an interesting and powerful feature of SQL.
Unnesting: the process of transforming nested queries into canonical form.
Kim classified nested queries for better understanding and processing.
Sample relations:
SUPPLIER(sno, sname, sloc, sbudget)
PARTS(pno, pname, qoh, color)
PROJECT(jno, jname, pno, jbudget, jloc)
SHIPMENT(sno, pno, jno, qty, shipdate)
Types:
Type-A Nesting:
Not correlated, aggregated subquery
Example:
SELECT SNO FROM SP WHERE PNO = (SELECT MAX(PNO) FROM P)
The inner query block can be evaluated independently of the outer query block, and the result of its evaluation is a single constant.
Type-N Nesting :
Not correlated, not aggregated subquery
SELECT SNO FROM SP
WHERE PNO IS IN
(SELECT PNO FROM P
WHERE WEIGHT > 50)
Evaluation: the inner query block Q is processed, resulting in a list of values X, which is then substituted for the inner query block so that PNO IS IN Q becomes PNO IS IN X. The resulting query is then evaluated by nested iteration.
Type-J Nesting :
Correlated, not aggregated subquery
SELECT SNAME FROM S WHERE SNO IS IN (SELECT SNO FROM SP WHERE QTY > 100 AND SP.ORIGIN = S.CITY)
Type-JA Nesting :
Correlated, aggregated subquery
SELECT PNAME FROM P WHERE PNO = (SELECT MAX(PNO) FROM SP WHERE SP.ORIGIN = P.CITY)
Evaluation: in TIS, the inner query block is processed once for each tuple of the outer relation that satisfies all simple predicates on the outer relation, which is inefficient.
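To make the cost of tuple iteration concrete, here is a minimal sketch in plain Python that evaluates the type-JA example the way TIS describes: the inner block is re-run once per outer tuple. The rows and the helper name `tis_type_ja` are invented for illustration and are not taken from Kim's paper.

```python
# Hypothetical rows for P(PNO, PNAME, CITY) and SP(SNO, PNO, QTY, ORIGIN).
P = [(1, "bolt", "Paris"), (2, "nut", "Rome"), (3, "cam", "Oslo")]
SP = [(10, 1, 50, "Paris"), (11, 2, 70, "Rome"), (12, 1, 90, "Rome")]


def tis_type_ja(parts, shipments):
    """Evaluate SELECT PNAME FROM P WHERE PNO =
    (SELECT MAX(PNO) FROM SP WHERE SP.ORIGIN = P.CITY) by tuple iteration."""
    result = []
    inner_scans = 0
    for pno, pname, city in parts:          # one iteration per outer tuple
        inner_scans += 1                    # the inner block is scanned again
        matching = [s_pno for _, s_pno, _, origin in shipments if origin == city]
        max_pno = max(matching) if matching else None  # MAX over no rows is NULL
        if max_pno is not None and pno == max_pno:
            result.append(pname)
    return result, inner_scans


names, scans = tis_type_ja(P, SP)
print(names, scans)
```

The number of inner scans equals the number of outer tuples, which is the overhead the unnesting algorithms try to avoid.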
Kim developed alternative algorithms for efficient processing of nested queries.
Algorithm NEST-N-J (for type-N or type-J)
1. Combine the FROM clauses of all query blocks into one FROM
clause
2. AND together the WHERE clauses of all query blocks,
replacing IS IN by =
3. Retain the SELECT clause of the outermost query block
The result is a canonical query logically equivalent to the
original nested query.
Nested form:
SELECT Ri.Ck
FROM Ri
WHERE Ri.Ch IS IN
(SELECT Rj.Cm FROM Rj)
Canonical form:
SELECT Ri.Ck
FROM Ri, Rj
WHERE Ri.Ch = Rj.Cm
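The rewrite can be checked on a small database. The sketch below uses Python's sqlite3 module and the type-N example; the rows are invented, standard SQL `IN` stands in for `IS IN`, and P.PNO is assumed to be a key so the join cannot introduce duplicate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SP (SNO INTEGER, PNO INTEGER);
    CREATE TABLE P  (PNO INTEGER PRIMARY KEY, WEIGHT INTEGER);
    INSERT INTO SP VALUES (1, 100), (2, 200), (3, 300);   -- illustrative rows
    INSERT INTO P  VALUES (100, 60), (200, 40), (300, 75);
""")

# Nested (type-N) form.
nested_rows = conn.execute("""
    SELECT SNO FROM SP
    WHERE PNO IN (SELECT PNO FROM P WHERE WEIGHT > 50)
    ORDER BY SNO
""").fetchall()

# Canonical form after NEST-N-J: merged FROM clauses, ANDed WHERE clauses,
# IN replaced by =.
canonical_rows = conn.execute("""
    SELECT SNO FROM SP, P
    WHERE SP.PNO = P.PNO AND P.WEIGHT > 50
    ORDER BY SNO
""").fetchall()

print(nested_rows, canonical_rows)
```

If the inner column were not unique, the join form could return repeated rows where the nested form returns one, so the comparison here only holds under the key assumption.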
Algorithm NEST-JA
1. Generate a temporary relation Rt(C1,...,Cn,Cn+1) from R2 such that Rt.Cn+1 is the result of applying the aggregate function AGG to the Cn+1 column of the R2 tuples that have matching values in R1 for C1, C2, ..., Cn.
Original query:
SELECT R1.Cn+2
FROM R1
WHERE R1.Cn+1 =
(SELECT AGG(R2.Cn+1)
FROM R2
WHERE R2.C1 = R1.C1 AND
R2.C2 = R1.C2 AND
…
R2.Cn = R1.Cn);
Temporary relation:
Rt(C1,...,Cn,Cn+1) =
(SELECT C1,...,Cn,AGG(Cn+1)
FROM R2
GROUP BY C1,...,Cn)
2. Transform the inner query block of the initial query by changing all references to R2 columns in join predicates which also reference R1 to the corresponding Rt columns. The result is a type-J nested query, which can be passed to algorithm NEST-N-J for transformation to its canonical equivalent.
SELECT R1.Cn+2
FROM R1
WHERE R1.Cn+1 =
(SELECT Rt.Cn+1
FROM Rt
WHERE Rt.C1 = R1.C1 AND
Rt.C2 = R1.C2 AND
…
Rt.Cn = R1.Cn);
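The two steps can be traced on the type-JA example with MAX, a case where grouping first gives the same answer as the nested query. This is a sketch with invented rows run through sqlite3, not Kim's own implementation; the column name MAXPNO is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE P  (PNO INTEGER, PNAME TEXT, CITY TEXT);
    CREATE TABLE SP (SNO INTEGER, PNO INTEGER, ORIGIN TEXT);
    INSERT INTO P  VALUES (1, 'bolt', 'Paris'), (2, 'nut', 'Rome'), (3, 'cam', 'Oslo');
    INSERT INTO SP VALUES (10, 1, 'Paris'), (11, 2, 'Rome'), (12, 1, 'Rome');
""")

# Original type-JA query (correlated, aggregated).
nested_rows = conn.execute("""
    SELECT PNAME FROM P
    WHERE PNO = (SELECT MAX(PNO) FROM SP WHERE SP.ORIGIN = P.CITY)
    ORDER BY PNAME
""").fetchall()

# Step 1: Rt holds one aggregate value per value of the join column.
conn.execute("""
    CREATE TEMP TABLE Rt AS
    SELECT ORIGIN, MAX(PNO) AS MAXPNO FROM SP GROUP BY ORIGIN
""")

# Step 2 followed by NEST-N-J: the inner block now reads Rt and is merged
# into the outer block as an ordinary join.
unnested_rows = conn.execute("""
    SELECT PNAME FROM P, Rt
    WHERE P.PNO = Rt.MAXPNO AND Rt.ORIGIN = P.CITY
    ORDER BY PNAME
""").fetchall()

print(nested_rows, unnested_rows)
```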
Count bug :
PARTS (PNUM,QOH)
SUPPLY (PNUM,QUAN,SHIPDATE)
SELECT PNUM FROM PARTS WHERE QOH =
(SELECT COUNT(SHIPDATE) FROM SUPPLY
WHERE SUPPLY.PNUM = PARTS.PNUM AND SHIPDATE < 1-1-80)
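PARTS
PNUM  QOH
3     6
10    1
8     0

SUPPLY
PNUM  QUAN  SHIPDATE
3     4     7-3-79
3     2     10-1-78
10    1     6-8-78
10    2     8-10-81
8     5     5-7-83

Result by TIS
PNUM
10
8

Result by Kim’s algorithm
PNUM
10

Part 8 has no SUPPLY rows before 1-1-80, so it never enters the temporary relation and the join drops it, even though COUNT over an empty set is 0 and equals its QOH.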
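The count bug can be reproduced with the PARTS and SUPPLY rows above. In this sketch the dates are rewritten as ISO strings so that string comparison orders them correctly (an assumption about the slide's month-day-year format), and the temporary relation is called TEMP_CT rather than TEMP because TEMP is a keyword in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTS  (PNUM INTEGER, QOH INTEGER);
    CREATE TABLE SUPPLY (PNUM INTEGER, QUAN INTEGER, SHIPDATE TEXT);
    INSERT INTO PARTS  VALUES (3, 6), (10, 1), (8, 0);
    INSERT INTO SUPPLY VALUES
        (3, 4, '1979-07-03'), (3, 2, '1978-10-01'), (10, 1, '1978-06-08'),
        (10, 2, '1981-08-10'), (8, 5, '1983-05-07');
""")

# Nested query, evaluated by the engine with the semantics of TIS.
tis_result = sorted(r[0] for r in conn.execute("""
    SELECT PNUM FROM PARTS
    WHERE QOH = (SELECT COUNT(SHIPDATE) FROM SUPPLY
                 WHERE SUPPLY.PNUM = PARTS.PNUM AND SHIPDATE < '1980-01-01')
"""))

# NEST-JA style rewrite: group SUPPLY first, then join with PARTS.
conn.execute("""
    CREATE TABLE TEMP_CT AS
    SELECT PNUM AS SUPPNUM, COUNT(SHIPDATE) AS CT
    FROM SUPPLY WHERE SHIPDATE < '1980-01-01' GROUP BY PNUM
""")
kim_result = sorted(r[0] for r in conn.execute("""
    SELECT PARTS.PNUM FROM PARTS, TEMP_CT
    WHERE PARTS.QOH = TEMP_CT.CT AND PARTS.PNUM = TEMP_CT.SUPPNUM
"""))

print(tis_result, kim_result)
```

Part 8 appears only in the first result, because the grouped relation has no row for a part with zero qualifying shipments.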
Solution using Outer Join
R
X
A
B

S
Y
B
C
E

R =+ S
X     Y
A     null
B     B
null  C
null  E
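The R =+ S table can be rebuilt with a few lines of plain Python. This is only a sketch of the table as shown, where unmatched rows from both sides are kept and None stands for null; the function name `outer_join` is made up for the example.

```python
R = ["A", "B"]         # column X
S = ["B", "C", "E"]    # column Y


def outer_join(left, right):
    """Join on equality, padding unmatched rows from either side with None."""
    rows = []
    for x in left:
        matches = [y for y in right if y == x]
        if matches:
            rows.extend((x, y) for y in matches)
        else:
            rows.append((x, None))      # left row with no partner
    for y in right:
        if y not in left:
            rows.append((None, y))      # right row with no partner
    return rows


joined = outer_join(R, S)
print(joined)
```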
Solution with outer joins
TEMP (SUPPNUM, CT) =
(SELECT PARTS.PNUM, COUNT(SHIPDATE)
FROM PARTS, SUPPLY
WHERE SHIPDATE < 1-1-80 AND
PARTS.PNUM =+ SUPPLY.PNUM
GROUP BY PARTS.PNUM)

PARTS.PNUM =+ SUPPLY.PNUM (for SHIPDATE < 1-1-80)
PARTS.PNUM  PARTS.QOH  SUPPLY.PNUM  SUPPLY.QUAN  SUPPLY.SHIPDATE
3           6          3            4            7-3-79
3           6          3            2            10-1-78
10          1          10           1            6-8-78
8           0          null         null         null

TEMP
SUPPNUM  CT
3        2
10       1
8        0

Final Result
PNUM
10
8
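A sketch of the outer-join version in sqlite3 follows. Two choices here are assumptions, not part of the slides: the date restriction is placed in the ON clause, because in standard SQL a WHERE filter on SHIPDATE would discard the null-padded row for part 8, and the temporary relation is named TEMP_CT since TEMP is an SQLite keyword. For comparison the block also counts with COUNT(*), since the null-padded row of an outer join is still a row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTS  (PNUM INTEGER, QOH INTEGER);
    CREATE TABLE SUPPLY (PNUM INTEGER, QUAN INTEGER, SHIPDATE TEXT);
    INSERT INTO PARTS  VALUES (3, 6), (10, 1), (8, 0);
    INSERT INTO SUPPLY VALUES
        (3, 4, '1979-07-03'), (3, 2, '1978-10-01'), (10, 1, '1978-06-08'),
        (10, 2, '1981-08-10'), (8, 5, '1983-05-07');

    -- PARTS.PNUM =+ SUPPLY.PNUM written as a LEFT OUTER JOIN.
    CREATE TABLE TEMP_CT AS
    SELECT PARTS.PNUM AS SUPPNUM, COUNT(SUPPLY.SHIPDATE) AS CT
    FROM PARTS LEFT OUTER JOIN SUPPLY
         ON PARTS.PNUM = SUPPLY.PNUM AND SUPPLY.SHIPDATE < '1980-01-01'
    GROUP BY PARTS.PNUM;
""")

temp = dict(conn.execute("SELECT SUPPNUM, CT FROM TEMP_CT"))

final_result = sorted(r[0] for r in conn.execute("""
    SELECT PARTS.PNUM FROM PARTS, TEMP_CT
    WHERE PARTS.QOH = TEMP_CT.CT AND PARTS.PNUM = TEMP_CT.SUPPNUM
"""))

# Same join, but counting rows instead of a column of the inner relation.
star_counts = dict(conn.execute("""
    SELECT PARTS.PNUM, COUNT(*)
    FROM PARTS LEFT OUTER JOIN SUPPLY
         ON PARTS.PNUM = SUPPLY.PNUM AND SUPPLY.SHIPDATE < '1980-01-01'
    GROUP BY PARTS.PNUM
"""))

print(temp, final_result, star_counts)
```

COUNT(SHIPDATE) skips the null in the padded row and gives 0 for part 8, while COUNT(*) gives 1 for the same group.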
The final result above comes from joining PARTS with TEMP:
SELECT PNUM
FROM PARTS, TEMP
WHERE PARTS.QOH = TEMP.CT AND PARTS.PNUM = TEMP.SUPPNUM
Drawbacks:
1. If the subquery has COUNT(*), the count is always > 0, because the outer join produces at least one row for every outer tuple. The '*' must be changed to a column name from the inner relation.
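2. Duplicates problem:
PARTS
PNUM  QOH
3     2
3     6
10    1
10    0
8     0

SUPPLY
PNUM  QUAN  SHIPDATE
3     4     7-3-79
3     2     10-1-78
10    1     6-8-78

TEMP
SUPPNUM  CT
3        4
10       2
8        0

Each duplicate PNUM in PARTS joins with the same SUPPLY rows, so the counts in TEMP are inflated (4 instead of 2 for part 3).

Result by TIS
PNUM
3
10
8

Our Result
PNUM
8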
Solution:
1. Remove duplicates before the join that creates the temporary table is performed.
TEMP1(PNUM) = (SELECT DISTINCT PNUM FROM PARTS)
2. Use the projection instead of the outer relation in any join required to build the temporary table.
TEMP2(SUPPNUM, CT) =
(SELECT TEMP1.PNUM, COUNT(SHIPDATE)
FROM TEMP1, SUPPLY
WHERE SUPPLY.SHIPDATE < 1-1-80
AND TEMP1.PNUM =+ SUPPLY.PNUM
GROUP BY TEMP1.PNUM)

TEMP2
SUPPNUM  CT
3        2
10       1
8        0

Result
PNUM
3
10
8
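The effect of the projection can be seen by building the count table both ways on the duplicate-bearing PARTS relation. This sqlite3 sketch compares only the temporary tables; ISO date strings and the ON-clause placement of the date filter are assumptions made so the outer join behaves as the slides intend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTS  (PNUM INTEGER, QOH INTEGER);
    CREATE TABLE SUPPLY (PNUM INTEGER, QUAN INTEGER, SHIPDATE TEXT);
    INSERT INTO PARTS  VALUES (3, 2), (3, 6), (10, 1), (10, 0), (8, 0);
    INSERT INTO SUPPLY VALUES
        (3, 4, '1979-07-03'), (3, 2, '1978-10-01'), (10, 1, '1978-06-08');
""")

# Outer join taken directly against PARTS: duplicate PNUMs inflate the counts.
inflated = dict(conn.execute("""
    SELECT PARTS.PNUM, COUNT(SUPPLY.SHIPDATE)
    FROM PARTS LEFT OUTER JOIN SUPPLY
         ON PARTS.PNUM = SUPPLY.PNUM AND SUPPLY.SHIPDATE < '1980-01-01'
    GROUP BY PARTS.PNUM
"""))

# Outer join taken against the duplicate-free projection TEMP1.
corrected = dict(conn.execute("""
    SELECT TEMP1.PNUM, COUNT(SUPPLY.SHIPDATE)
    FROM (SELECT DISTINCT PNUM FROM PARTS) AS TEMP1
         LEFT OUTER JOIN SUPPLY
         ON TEMP1.PNUM = SUPPLY.PNUM AND SUPPLY.SHIPDATE < '1980-01-01'
    GROUP BY TEMP1.PNUM
"""))

print(inflated, corrected)
```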
Another bug: relations other than equality
SELECT PNUM FROM PARTS WHERE QOH =
(SELECT MAX(QUAN) FROM SUPPLY
WHERE SUPPLY.PNUM < PARTS.PNUM AND SHIPDATE < 1-1-80)
Kim’s transformation:
TEMP (SUPPNUM, MAXQUAN) =
(SELECT PNUM, MAX(QUAN) FROM SUPPLY
WHERE SHIPDATE < 1-1-80
GROUP BY PNUM)
SELECT PNUM
FROM PARTS, TEMP
WHERE QOH = TEMP.MAXQUAN AND TEMP.SUPPNUM < PARTS.PNUM
Problem: MAX is calculated for each SUPPLY.PNUM separately, but the query requires the MAX over the set of all SUPPLY.PNUM values less than the given PARTS.PNUM.
Solution:
1. First join, then aggregate (Kim’s algorithm was: first group, then join).
TEMP (SUPPNUM, MAXQUAN) =
(SELECT PARTS.PNUM, MAX(QUAN)
FROM PARTS, SUPPLY
WHERE SHIPDATE < 1-1-80 AND
SUPPLY.PNUM < PARTS.PNUM
GROUP BY PARTS.PNUM)
SELECT PNUM
FROM PARTS, TEMP
WHERE PARTS.QOH = TEMP.MAXQUAN AND
PARTS.PNUM = TEMP.SUPPNUM
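The slides give no data for this case, so the sketch below uses two invented PARTS rows and two invented SUPPLY rows chosen so that the group-then-join rewrite returns an extra part. Table names TEMP_GROUPED and TEMP_JOINED are hypothetical, and dates are ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTS  (PNUM INTEGER, QOH INTEGER);
    CREATE TABLE SUPPLY (PNUM INTEGER, QUAN INTEGER, SHIPDATE TEXT);
    INSERT INTO PARTS  VALUES (5, 4), (8, 4);            -- illustrative rows
    INSERT INTO SUPPLY VALUES (3, 4, '1979-07-03'), (6, 9, '1978-10-01');

    -- Group first (Kim): one MAX per SUPPLY.PNUM.
    CREATE TABLE TEMP_GROUPED AS
    SELECT PNUM AS SUPPNUM, MAX(QUAN) AS MAXQUAN
    FROM SUPPLY WHERE SHIPDATE < '1980-01-01' GROUP BY PNUM;

    -- Join first, then aggregate: one MAX per PARTS.PNUM over all smaller
    -- SUPPLY.PNUM values.
    CREATE TABLE TEMP_JOINED AS
    SELECT PARTS.PNUM AS SUPPNUM, MAX(QUAN) AS MAXQUAN
    FROM PARTS, SUPPLY
    WHERE SHIPDATE < '1980-01-01' AND SUPPLY.PNUM < PARTS.PNUM
    GROUP BY PARTS.PNUM;
""")

nested = sorted(r[0] for r in conn.execute("""
    SELECT PNUM FROM PARTS
    WHERE QOH = (SELECT MAX(QUAN) FROM SUPPLY
                 WHERE SUPPLY.PNUM < PARTS.PNUM AND SHIPDATE < '1980-01-01')
"""))
group_then_join = sorted(r[0] for r in conn.execute("""
    SELECT PARTS.PNUM FROM PARTS, TEMP_GROUPED
    WHERE QOH = MAXQUAN AND SUPPNUM < PARTS.PNUM
"""))
join_then_group = sorted(r[0] for r in conn.execute("""
    SELECT PARTS.PNUM FROM PARTS, TEMP_JOINED
    WHERE PARTS.QOH = MAXQUAN AND PARTS.PNUM = SUPPNUM
"""))

print(nested, group_then_join, join_then_group)
```

Part 8 slips into the group-then-join result because it matches the per-PNUM maximum of supplier row 3, although the true maximum over all smaller PNUM values is 9.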
Modified Algorithm: NEST-JA2
1. Project the join column of the outer relation, and restrict it with any simple predicates applying to the outer relation.
TEMP1(PNUM) = (SELECT DISTINCT PNUM FROM PARTS)
2. Create a temporary relation, joining the inner relation with the projection of the outer relation. If the aggregate function is COUNT, the join must be an outer join.
TEMP2(PNUM, SHIPDATE) =
(SELECT PNUM, SHIPDATE FROM SUPPLY
WHERE SHIPDATE < 1-1-80)
TEMP3(PNUM, CT) =
(SELECT TEMP1.PNUM, COUNT(TEMP2.SHIPDATE)
FROM TEMP1, TEMP2
WHERE TEMP1.PNUM =+ TEMP2.PNUM
GROUP BY TEMP1.PNUM)
3. Join the outer relation with the temporary relation, according to the transformed version of the original query.
SELECT PARTS.PNUM
FROM PARTS, TEMP3
WHERE PARTS.QOH = TEMP3.CT AND
PARTS.PNUM = TEMP3.PNUM
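The three steps can be run end to end and compared with the nested query. This sketch uses the duplicate-bearing PARTS rows from the duplicates example plus one post-1980 shipment borrowed from the earlier SUPPLY table, so that the TEMP2 restriction has something to filter; dates are ISO strings, and `=+` is written as LEFT OUTER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PARTS  (PNUM INTEGER, QOH INTEGER);
    CREATE TABLE SUPPLY (PNUM INTEGER, QUAN INTEGER, SHIPDATE TEXT);
    INSERT INTO PARTS  VALUES (3, 2), (3, 6), (10, 1), (10, 0), (8, 0);
    INSERT INTO SUPPLY VALUES
        (3, 4, '1979-07-03'), (3, 2, '1978-10-01'),
        (10, 1, '1978-06-08'), (10, 2, '1981-08-10');

    -- Step 1: project the join column of the outer relation.
    CREATE TABLE TEMP1 AS SELECT DISTINCT PNUM FROM PARTS;

    -- Step 2: restrict the inner relation, then outer-join it with TEMP1.
    CREATE TABLE TEMP2 AS
    SELECT PNUM, SHIPDATE FROM SUPPLY WHERE SHIPDATE < '1980-01-01';

    CREATE TABLE TEMP3 AS
    SELECT TEMP1.PNUM AS PNUM, COUNT(TEMP2.SHIPDATE) AS CT
    FROM TEMP1 LEFT OUTER JOIN TEMP2 ON TEMP1.PNUM = TEMP2.PNUM
    GROUP BY TEMP1.PNUM;
""")

# Step 3: join the outer relation with the temporary relation.
unnested = sorted(r[0] for r in conn.execute("""
    SELECT PARTS.PNUM FROM PARTS, TEMP3
    WHERE PARTS.QOH = TEMP3.CT AND PARTS.PNUM = TEMP3.PNUM
"""))

# Reference answer from the original nested query.
nested = sorted(r[0] for r in conn.execute("""
    SELECT PNUM FROM PARTS
    WHERE QOH = (SELECT COUNT(SHIPDATE) FROM SUPPLY
                 WHERE SUPPLY.PNUM = PARTS.PNUM AND SHIPDATE < '1980-01-01')
"""))

print(unnested, nested)
```

Because the inner relation is restricted before the outer join, the date predicate never sees the null-padded rows, which is why TEMP2 is built as a separate step.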
Processing a General Nested Query: Recursive Approach
procedure nest_g(query_block)
  for each predicate in the WHERE clause of query_block
    if predicate is a nested predicate (i.e., contains an inner query block)
      nest_g(inner_query_block)
      /* Determine type of nesting and call appropriate transformation procedure */
      if inner_query_block is aggregated
        if nesting is type-JA
          nest-JA2(inner_query_block)
          nest-N-J(query_block, inner_query_block)
        else /* nesting is type-A */
          nest_a(inner_query_block)
      else /* nesting is type-N or type-J */
        nest-N-J(query_block, inner_query_block)
  return
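The control flow of nest_g can be sketched in Python. This version does not rewrite any SQL; it only records which transformation would be called for each block, and the `QueryBlock` class with its `aggregated` and `correlated` flags is a hypothetical stand-in for a real query representation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueryBlock:
    name: str
    aggregated: bool = False   # SELECT clause contains an aggregate function
    correlated: bool = False   # WHERE clause references the outer block
    inner_blocks: List["QueryBlock"] = field(default_factory=list)


def nest_g(block: QueryBlock, log: List[str]) -> List[str]:
    """Record, innermost first, the transformations applied to each block."""
    for inner in block.inner_blocks:       # one entry per nested predicate
        nest_g(inner, log)                 # unnest deeper levels first
        if inner.aggregated:
            if inner.correlated:           # type-JA
                log.append(f"nest-JA2({inner.name})")
                log.append(f"nest-N-J({block.name}, {inner.name})")
            else:                          # type-A
                log.append(f"nest_a({inner.name})")
        else:                              # type-N or type-J
            log.append(f"nest-N-J({block.name}, {inner.name})")
    return log


# Three-level example: Q1 contains a type-JA block Q2, which contains a
# non-aggregated block Q3.
q3 = QueryBlock("Q3")
q2 = QueryBlock("Q2", aggregated=True, correlated=True, inner_blocks=[q3])
q1 = QueryBlock("Q1", inner_blocks=[q2])
steps = nest_g(q1, [])
print(steps)
```

The recursion reaches the deepest block first, so Q3 is merged into Q2 before Q2 itself is classified and transformed.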
Advantage :
• Simplicity
Analysis
Modified Kim’s Algorithm:
R.B OP1 TEMP1.COUNT : R.B OP1 0
|TEMP1| < |R OJ S|, hence better than the alternate algorithm
References:
1. Richard A. Ganski, Harry K. T. Wong. Optimization of Nested SQL Queries Revisited.
2. M. Muralikrishna. Improved Unnesting Algorithms for Join Aggregate SQL Queries.
Thank You