Robust Optimization of Database Queries
Jayant Haritsa
Database Systems Lab
Indian Institute of Science
July 2011 Robust Query Optimization (IASc Mid-year Meeting) 1
Database Management Systems (DBMS)
Efficient and convenient mechanisms for storing, maintaining and querying enterprise data
– banking, inventory, insurance, travel, …
Cornerstone of computer industry
– Uses more than 80 percent of computers worldwide
– Employs more than 70 percent of IT professionals
– Largest monetary sector of computer business
Current Database Systems
Commercial
– IBM DB2 / Oracle / Microsoft SQL Server / Sybase ASE / CA Ingres
Public-Domain
– PostgreSQL / MySQL
Relational Database Systems [RDBMS]
Based on first-order logic
Edgar Codd of IBM Research, Turing Award (1981)
“We believe in Codd, not God”
Data is stored in a set of relations (i.e. tables) with attributes, relationships, constraints
STUDENT: Roll No | Name | Address
COURSE: Course No | Title | Credits
REGISTER: Roll No | Course No | Grade
QUERY INTERFACE
Structured Query Language (SQL)
– Invented by IBM, 1970s
– Example: List names of students and their course titles
    select STUDENT.Name, COURSE.Title
    from STUDENT, COURSE, REGISTER
    where STUDENT.RollNo = REGISTER.RollNo and
          REGISTER.CourseNo = COURSE.CourseNo
STUDENT: Roll No | Name | Address
COURSE: Course No | Title | Credits
REGISTER: Roll No | Course No | Grade
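The example query can be tried on a toy database. The sketch below uses Python's built-in sqlite3 module; the rows are made up for illustration, and an order by clause (not on the slide) is added only so the output order is deterministic.

```python
import sqlite3

# Toy data, invented for illustration; table and column names follow the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table STUDENT  (RollNo integer primary key, Name text, Address text);
    create table COURSE   (CourseNo integer primary key, Title text, Credits integer);
    create table REGISTER (RollNo integer, CourseNo integer, Grade text);
    insert into STUDENT  values (1, 'Asha', 'Bangalore'), (2, 'Ravi', 'Mysore');
    insert into COURSE   values (10, 'Databases', 4), (20, 'Algorithms', 3);
    insert into REGISTER values (1, 10, 'A'), (2, 10, 'B'), (2, 20, 'A');
""")

# The query from the slide; "order by" is an addition for a stable result order.
rows = conn.execute("""
    select STUDENT.Name, COURSE.Title
    from STUDENT, COURSE, REGISTER
    where STUDENT.RollNo = REGISTER.RollNo and
          REGISTER.CourseNo = COURSE.CourseNo
    order by STUDENT.Name, COURSE.Title
""").fetchall()

for name, title in rows:
    print(name, title)
```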
Query Execution Plans
SQL is a declarative language
– Specifies only what is wanted, but not how the query should be evaluated (i.e. ends, not means)
    select STUDENT.Name, COURSE.Title
    from STUDENT, COURSE, REGISTER
    where STUDENT.RollNo = REGISTER.RollNo and
          REGISTER.CourseNo = COURSE.CourseNo
Unspecified:
– join order [ ((S ⋈ R) ⋈ C) or ((R ⋈ C) ⋈ S) ? ]
– join techniques [ Nested-Loops or Sort-Merge or Hash ? ]
DBMS query optimizer identifies the optimal evaluation strategy: “query execution plan”
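To make the "unspecified" choices concrete, here is a minimal sketch of a cost-based search over join orders and join techniques. Everything numeric in it is an assumption: the row counts, the predicate selectivities and the cost formulas are toy values chosen for illustration, not those of any real optimizer.

```python
import math
from itertools import permutations, product

# Hypothetical statistics (not from the talk).
ROWS = {"S": 1_000, "R": 20_000, "C": 50}
# Assumed selectivities of the two join predicates; S and C share no predicate.
PRED_SEL = {frozenset("SR"): 1 / 1_000, frozenset("RC"): 1 / 50}
METHODS = ("NestedLoops", "SortMerge", "Hash")

def join_cost(method, left, right):
    """Toy cost formulas, only meant to make the techniques differ."""
    if method == "NestedLoops":
        return left * right
    if method == "SortMerge":
        return left * math.log2(max(left, 2)) + right * math.log2(max(right, 2))
    return 3 * (left + right)  # Hash: build plus probe, with a toy constant

def selectivity(joined, new):
    """Combined selectivity of all predicates linking `new` to the joined relations."""
    sel = 1.0
    for old in joined:
        sel *= PRED_SEL.get(frozenset((old, new)), 1.0)
    return sel

def enumerate_plans():
    """Yield (cost, join order, join methods) for every left-deep plan."""
    for order in permutations(ROWS):
        for methods in product(METHODS, repeat=2):
            joined, rows, cost = [order[0]], ROWS[order[0]], 0.0
            for rel, method in zip(order[1:], methods):
                cost += join_cost(method, rows, ROWS[rel])
                rows = rows * ROWS[rel] * selectivity(joined, rel)
                joined.append(rel)
            yield cost, order, methods

best_cost, best_order, best_methods = min(enumerate_plans())
print(best_order, best_methods, round(best_cost))
```

With only three relations the search space has 54 left-deep plans; exhaustive enumeration stops being practical as the number of relations grows.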
Sample Execution Plan
[Figure: operator tree over the relations STUDENT, COURSE and REGISTER, each operator annotated with its estimated cost]
RETURN Cost: 286868 (Total Execution Cost, estimated)
HASH JOIN Cost: 286868
MERGE JOIN Cost: 278751
INDEX SCAN Cost: 6745
SORT Cost: 225103
TABLE SCAN Cost: 6834
TABLE SCAN Cost: 209760
CONCEPT: PLAN DIAGRAMS
Relational Selectivities
Cost-based Query Optimizer’s choice of execution plan = f (query, database, system, …)
For a given database and system setup, execution plan chosen for a query = f (selectivities of query’s base relations)
– selectivity is the estimated percentage of rows of a relation used in producing the query result
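As a small illustration of the definition, the sketch below computes a selectivity exactly by scanning a made-up relation. Note the difference from the slide: an optimizer works with an estimate derived from statistics, whereas this helper (a hypothetical name) counts the actual rows.

```python
def selectivity_percent(rows, predicate):
    """Percentage of rows of a relation that satisfy the predicate."""
    if not rows:
        return 0.0
    matching = sum(1 for row in rows if predicate(row))
    return 100.0 * matching / len(rows)

# Ten invented ORDERS rows with prices 1000, 2000, ..., 10000.
orders = [{"o_totalprice": price} for price in range(1000, 11000, 1000)]

sel = selectivity_percent(orders, lambda row: row["o_totalprice"] <= 3000)
print(sel)
```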
April 2011 Plan Diagrams Tutorial (ICDE 2011) 9
SQL Query Template [Q7 of TPC-H Benchmark]
Determines the values of goods shipped between nations in a time period
    select supp_nation, cust_nation, l_year, sum(volume) as revenue
    from (select n1.n_name as supp_nation, n2.n_name as cust_nation,
                 extract(year from l_shipdate) as l_year,
                 l_extendedprice * (1 - l_discount) as volume
          from supplier, lineitem, orders, customer, nation n1, nation n2
          where s_suppkey = l_suppkey and o_orderkey = l_orderkey and
                c_custkey = o_custkey and s_nationkey = n1.n_nationkey
                and c_nationkey = n2.n_nationkey and
                ((n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY') or
                 (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')) and
                l_shipdate between date '1995-01-01' and date '1996-12-31'
                and o_totalprice ≤ C1 and c_acctbal ≤ C2 ) as shipping
    group by supp_nation, cust_nation, l_year
    order by supp_nation, cust_nation, l_year
C1: Value determines selectivity of ORDERS relation
C2: Value determines selectivity of CUSTOMER relation
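The template constants C1 and C2 act as knobs on selectivity. One simple way to pick a constant for a desired selectivity is to take a quantile of the column values, as sketched below. This is an assumption for illustration only: the slides do not say how constants are chosen, the helper name is hypothetical, and duplicate values would make the achieved selectivity only approximate.

```python
import math

def constant_for_selectivity(values, target_percent):
    """Return C such that roughly target_percent of the values satisfy value <= C."""
    ordered = sorted(values)
    k = max(1, math.ceil(len(ordered) * target_percent / 100.0))
    return ordered[min(k, len(ordered)) - 1]

# Invented o_totalprice values: 100, 200, ..., 1000.
prices = list(range(100, 1100, 100))

c1 = constant_for_selectivity(prices, 40)
achieved = 100.0 * sum(1 for p in prices if p <= c1) / len(prices)
print(c1, achieved)
```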
Relational Selectivity Space
[Figure: two-dimensional selectivity space; both axes are labeled Selectivity]
Plan and Cost Diagrams
A plan diagram is a pictorial enumeration of the plan choices of the query optimizer over the relational selectivity space
A cost diagram is a visualization of the (estimated) plan execution costs over the same relational selectivity space
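The two definitions can be sketched over a small grid. In the sketch below the optimizer is replaced by a stand-in that picks the cheapest of three hypothetical plans with invented linear cost functions; a real tool would instead ask the database engine for its chosen plan and estimated cost at each grid point.

```python
# Hypothetical plans: each maps selectivities (x, y), in percent, to an estimated cost.
PLANS = {
    "P1": lambda x, y: 10 + 4 * x + y,
    "P2": lambda x, y: 60 + x + 2 * y,
    "P3": lambda x, y: 150 + 0.5 * x + 0.5 * y,
}

def optimizer(x, y):
    """Stand-in for the optimizer: the (plan, cost) pair with the lowest estimated cost."""
    return min(((name, cost(x, y)) for name, cost in PLANS.items()),
               key=lambda pair: pair[1])

def build_diagrams(resolution=10):
    """Sample the selectivity space on a resolution x resolution grid of cell midpoints."""
    step = 100 / resolution
    points = [(i + 0.5) * step for i in range(resolution)]
    plan_diagram, cost_diagram = {}, {}
    for x in points:
        for y in points:
            plan_diagram[(x, y)], cost_diagram[(x, y)] = optimizer(x, y)
    return points, plan_diagram, cost_diagram

points, plan_diagram, cost_diagram = build_diagrams()

# Text rendering of the plan diagram: one character per grid point, y increasing upwards.
for y in reversed(points):
    print("".join(plan_diagram[(x, y)][1] for x in points))
```

Coloring each grid point by its plan name gives the plan diagram; mapping the same points to their costs gives the cost diagram.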
PICASSO
Picasso is a Java tool developed in our lab that automatically generates plan and cost diagrams
~50000 lines of code (2004-11) with ~100 classes
Operational on DB2/Oracle/SQLServer/Sybase/PostgreSQL
Copyrighted by IISc in May 2006
Released as free software in Nov 2006 by Prof. N Balakrishnan, Associate Director of IISc
Release of v1.0 in May 2007, v2.0 in Feb 2009, v2.1 in Feb 2011
In use at academic and industrial labs worldwide
– CMU, Purdue, Duke, TU Munich, NU Singapore, IIT-B, …
– IBM, Microsoft, Oracle, Sybase, HP, …
Received Best Software award in Very Large Data Base (VLDB) conference, 2010
The Picasso Connection
Woman with a guitar
Georges Braque, 1913
Plan diagrams are often similar to cubist paintings !
[ Pablo Picasso founder of cubist genre ]
Complex Plan Diagram [QT8, OptA*]
Highly irregular plan boundaries
# of plans: 76
The Picasso Connection
Extremely fine-grained coverage (P76 ~ 0.01%)
Intricate Complex Patterns
Cost Diagram [QT8, OptA*]
All costs are within 20 percent of the maximum
MinCost: 8.26E5
MaxCost: 1.05E6
PHENOMENA: PLAN DIAGRAM REDUCTION
Problem Statement
Can the plan diagram be recolored with a smaller set of colors (i.e. some plans are “swallowed” by others), such that
Guarantee: No query point in the original diagram has its estimated cost increased, post-swallowing, by more than λ percent (user-defined)
Analogy: (with due apologies to Sri Lankans in the audience)
Sri Lanka agrees to be annexed by India if it is assured that the cost of living of each Lankan citizen is not increased by more than λ percent
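The λ guarantee in the problem statement can be sketched as a simple greedy recoloring. This is not presented as the algorithm used in the talk or in Picasso, which the slides do not spell out; it is a minimal illustration under two assumptions: the plans and their cost functions are invented, and the estimated cost of any plan can be obtained at any query point.

```python
# Hypothetical plans: estimated cost as a function of a selectivity point (x, y).
PLANS = {
    "P1": lambda x, y: 10 + 4 * x + y,
    "P2": lambda x, y: 60 + x + 2 * y,
    "P3": lambda x, y: 150 + 0.5 * x + 0.5 * y,
}

def cost_of(plan, point):
    return PLANS[plan](*point)

def optimal_plan(point):
    return min(PLANS, key=lambda plan: cost_of(plan, point))

def reduce_plan_diagram(plan_diagram, lam_percent):
    """Greedy sketch: try to swallow plans, smallest region first.

    A plan is swallowed only if one surviving plan costs at most
    (1 + lam/100) times the ORIGINAL optimal cost at every point it holds.
    """
    limit = 1 + lam_percent / 100.0
    original_cost = {pt: cost_of(plan, pt) for pt, plan in plan_diagram.items()}
    reduced = dict(plan_diagram)
    region_size = {}
    for plan in plan_diagram.values():
        region_size[plan] = region_size.get(plan, 0) + 1
    for victim in sorted(region_size, key=region_size.get):
        held = [pt for pt, plan in reduced.items() if plan == victim]
        for swallower in sorted(set(reduced.values()) - {victim}):
            if all(cost_of(swallower, pt) <= limit * original_cost[pt] for pt in held):
                for pt in held:
                    reduced[pt] = swallower
                break
    return reduced

points = [(x, y) for x in range(5, 100, 10) for y in range(5, 100, 10)]
original = {pt: optimal_plan(pt) for pt in points}
reduced = reduce_plan_diagram(original, lam_percent=20)
print(len(set(original.values())), "plans before,", len(set(reduced.values())), "after")
```

Because every check is made against the original optimal cost, the guarantee holds at each point even when a plan that has already swallowed others is later swallowed itself.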
Complex Plan Diagram [QT8, OptA*, Res=100]
Reduced Plan Diagram [λ=10%] [QT8, OptA*, Res=100]
QTD: QT8_OptA*_100
Reduced to 5 plans from 76 !
Comparatively smoother contours
Anorexic Reduction
Extensive empirical evaluation with a spectrum of multi-dimensional query templates indicates that
“With a cost-increase-threshold of just 20%, virtually all complex plan diagrams [irrespective of query templates, data distribution, query distribution, system configurations, etc.] reduce to “anorexic levels” (~10 or fewer plans)!”
February 2011 Picasso (IIITM Kerala) 20
APPLICATION: ROBUST PLANS
Selectivity Estimation Errors
qe(xe, ye): estimated location by optimizer
qa(xa, ya): actual location during execution
The difference could be substantial due to
– Outdated Statistics (expensive to maintain)
– Coarse Summaries (histograms)
– Attribute Value Independence (AVI) assumptions
Chronic problem in database design
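The effect of the AVI assumption is easy to reproduce. In the sketch below, two columns of a made-up table are perfectly correlated; multiplying the per-column selectivities, as independence would suggest, underestimates the true selectivity of the conjunction by a factor of ten.

```python
# Invented table with two perfectly correlated columns: b always equals a.
rows = [(i % 10, i % 10) for i in range(1000)]

def fraction(predicate):
    """Fraction of rows that satisfy the predicate."""
    return sum(1 for row in rows if predicate(row)) / len(rows)

sel_a = fraction(lambda row: row[0] == 3)
sel_b = fraction(lambda row: row[1] == 3)

avi_estimate = sel_a * sel_b                              # assumes independence
actual = fraction(lambda row: row[0] == 3 and row[1] == 3)

print(avi_estimate, actual, actual / avi_estimate)
```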
Impact of Error Example
Estimated Query Location (qe): (1, 40)
– Cost(Poe) = 9 x 10^4 (optimal)
Actual Query Location (qa): (80, 80)
– Cost(Poa) = 50 x 10^4 (optimal)
– Cost(Poe) = 110 x 10^4 (highly sub-optimal)
[Figure: Selectivity Space S, with axes ORDERS and LINEITEM]
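The penalty in this example can be expressed as a sub-optimality ratio: the cost of the plan chosen at the estimated location, divided by the cost of the optimal plan, both measured at the actual location. The sketch below assumes the reading that the two larger costs on the slide are both measured at qa; the variable names are hypothetical.

```python
# Costs read off the slide; both are assumed to be measured at the actual location qa.
cost_poa_at_qa = 50 * 10**4    # plan that is optimal at qa
cost_poe_at_qa = 110 * 10**4   # plan chosen for the estimated location qe, run at qa

sub_optimality = cost_poe_at_qa / cost_poa_at_qa
print(sub_optimality)
```

Under that reading, the estimation error makes the executed plan more than twice as expensive as the best available one.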
Error Locations wrt Plan Replacement Regions
[Figure: selectivity space with axes Selectivity X and Selectivity Y, each from 0 to 100, showing the estimated location qe and three possible regions for the actual location qa: Endo-optimal_re, Swallow_re and Exo-optimal_re; the annotation “Inherently robust???” appears on the diagram]
Positive Impact of Reduction
In most cases, replacement plan provides robustness to selectivity errors even in exo-optimal region
[Figure: QT5, qe = (0.36, 0.05), comparing Original Plan (Poe), Replacement Plan (Pre) and Local Optimal Plan (Poa); plan labels P1, P3, P6, P4, P37]
Negative Impact of Reduction
But, occasionally, the replacement is much worse than the original plan !
[Figure: QT5, qe = (0.03, 0.14), comparing Replacement Plan (Pre), Original Plan (Poe) and Local Optimal Plan (Poa); plan labels P1, P3, P6, P4, P37]
Research Challenge
How do we ensure that plan replacements can only help, but never materially hurt the expected performance?
Our Contribution
Mathematical analysis to show that only the perimeter of the selectivity space suffices to determine global safety
Border Safety ⇒ Interior Safety !
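What "checking only the perimeter" means in practice can be sketched as follows. The sketch does not reproduce the mathematical analysis, and the conditions under which border safety implies interior safety are not given in the slides; the cost functions and helper names are invented, and the λ threshold is borrowed from the reduction guarantee as an assumed safety criterion.

```python
def perimeter(resolution):
    """Grid points on the border of a resolution x resolution selectivity grid."""
    last = resolution - 1
    return [(i, j) for i in range(resolution) for j in range(resolution)
            if i in (0, last) or j in (0, last)]

def is_safe_on_border(cost_original, cost_replacement, resolution, lam_percent):
    """True if the replacement stays within the lambda threshold at every border point."""
    limit = 1 + lam_percent / 100.0
    return all(cost_replacement(i, j) <= limit * cost_original(i, j)
               for i, j in perimeter(resolution))

# Invented cost functions over grid coordinates (i, j).
original = lambda i, j: 100 + 2 * i + 3 * j
close_replacement = lambda i, j: 105 + 2 * i + 3 * j
steep_replacement = lambda i, j: 50 + 4 * i + 4 * j

print(len(perimeter(100)), "border points out of", 100 * 100)
print(is_safe_on_border(original, close_replacement, 100, 10))
print(is_safe_on_border(original, steep_replacement, 100, 10))
```

On a 100 x 100 grid the border holds 396 of the 10000 points, which is where the efficiency gain of a perimeter-only test would come from.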
Take Away
We can efficiently produce plan diagrams that simultaneously possess the desirable properties of being online, anorexic, safe and robust.
This result could play a meaningful role in designing the next generation of database query optimizers.
More Details …
http://dsl.serc.iisc.ernet.in/projects/PICASSO
Publications, Software, Sample Diagrams
Dedication
Name: Dr B K Narayana Rao (Founding Fellow)
Born: 1881 Elected: 1934 Section: Medicine
Council Service: 1934-61; Vice President 1938-43
Name: Prof T S Subbaraya
Born: 1905 Elected: 1935 Section: Physics
Name: Dr B N Balakrishna Rao
Born: 1910 Elected: 1945 Section: Medicine
QUESTIONS?
END PRESENTATION