robust optimization of database queries optimization of database queries jayant haritsa database...

33
Robust Optimization of Database Queries Jayant Haritsa Database Systems Lab Database Systems Lab Indian Institute of Science July 2011 Robust Query Optimization (IASc Mid-year Meeting) 1

Upload: others

Post on 16-May-2020

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

Robust Optimization of Database Queries

Jayant HaritsaDatabase Systems LabDatabase Systems LabIndian Institute of Science

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 1

Page 2: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

Database Management Systems (DBMS)(DBMS)

Effi i t d i t h i f Efficient and convenient mechanisms for storing, maintenance and querying of g, q y genterprise data

b ki i t i t l– banking, inventory, insurance, travel, …

Cornerstone of computer industry– Uses more than 80 percent of computers worldwide– Employs more than 70 percent of IT professionalsp y p p– Largest monetary sector of computer business

July 2011 2Robust Query Optimization (IASc Mid-year Meeting)

Page 3: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

Current Database SystemsCurrent Database Systems

CommercialIBM DB2 / Oracle / Microsoft SQL Server /– IBM DB2 / Oracle / Microsoft SQL Server / Sybase ASE / CA Ingres

Public-Domain – PostgreSQL / MySQL

July 2011 3Robust Query Optimization (IASc Mid-year Meeting)

Page 4: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

Relational Database Systems[RDBMS][RDBMS]

Based on first order logic Based on first-order logic Edgar Codd of IBM Research, Turing Award (1981)

“We believe in Codd not God” We believe in Codd, not God

Data is stored in a set of relations (i.e. tables) with attributes, relationships, constraints

R ll N | N | Add C N | Titl | C ditSTUDENT COURSE

Roll No | Name | Address Course No | Title | Credits

REGISTER

Roll No | Course No | Grade

REGISTER

July 2011 4Robust Query Optimization (IASc Mid-year Meeting)

Page 5: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

QUERY INTERFACEQ

Structured Query Language (SQL)Structured Query Language (SQL)– Invented by IBM, 1970s

E l Li t f t d t d th i titl– Example: List names of students and their course titles

select STUDENT Name COURSE Titleselect STUDENT.Name, COURSE.Titlefrom STUDENT, COURSE, REGISTERwhere STUDENT.RollNo = REGISTER.RollNo and

REGISTER.CourseNo = COURSE.CourseNo

STUDENT COURSERoll No | Name | Address Course No | Title | Credits

STUDENT COURSE

Roll No | Course No | Grade

REGISTER

Roll No | Course No | Grade

July 2011 5Robust Query Optimization (IASc Mid-year Meeting)

Page 6: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

Query Execution PlansQ y SQL is a declarative language SQL is a declarative language– Specifies only what is wanted, but not how the query

should be evaluated (i.e. ends, not means)( , )select STUDENT.Name, COURSE.Titlefrom STUDENT COURSE REGISTERfrom STUDENT, COURSE, REGISTERwhere STUDENT.RollNo = REGISTER.RollNo and

REGISTER.CourseNo = COURSE.CourseNo

Unspecified:join order [ ((S R) C) or ((R C) S) ? ]join techniques [ Nested-Loops or Sort-Merge or Hash ? ]

DBMS query optimizer identifies the optimal DBMS query optimizer identifies the optimal evaluation strategy: “query execution plan”

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 6

Page 7: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

Sample Execution PlanpRETURN Total Execution CostRETURN

Cost: 286868

Total Execution Cost(estimated)

HASH JOINCost: 286868

MERGE JOINCost: 278751

INDEX SCANCost: 6745

SORTCost: 225103

TABLE SCANCost: 6834

TABLE SCANCost: 209760

Cost: 6745

STUDENT COURSE REGISTER

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 7

Page 8: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

CONCEPT:PLAN DIAGRAMS

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 8

Page 9: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

Relational Selectivities

Cost based Query Optimizer’s choice of Cost-based Query Optimizer s choice of

execution plan = f (query database system )execution plan = f (query, database, system, …)

For a given database and system setup For a given database and system setup,

execution plan chosen for a query =execution plan chosen for a query = f (selectivities of query’s base relations)

– selectivity is the estimated percentage of rows of a relation used in producing the query result

April 2011 Plan Diagrams Tutorial (ICDE 2011) 9

Page 10: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

SQL Query Template [Q7 f TPC H B h k][Q7 of TPC-H Benchmark]

Determines the values of goods shipped between nations in a time period

select supp nation, cust nation, l year, sum(volume) as revenue

Determines the values of goods shipped between nations in a time period

supp_nation, cust_nation, l_year, sum(volume) as revenue from(select n1.n_name as supp_nation, n2.n_name as cust_nation,

extract(year from l shipdate) as l year,(y _ p ) _y ,l_extendedprice * (1 - l_discount) as volume

from supplier, lineitem, orders, customer, nation n1, nation n2where s_suppkey = l_suppkey and o_orderkey = l_orderkey andValue determines

l ti it fValue determines

fc_custkey = o_custkey and s_nationkey = n1.n_nationkeyand c_nationkey = n2.n_nationkey and

((n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY') or ( 1 'GERMANY' d 2 'FRANCE')) d

selectivity of ORDERS relation

selectivity of CUSTOMER relation

(n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')) andl_shipdate between date '1995-01-01' and date '1996-12-31'and o totalprice ≤ C1 and c acctbal ≤ C2 ) as shipping

group by supp_nation, cust_nation, l_year order by supp_nation, cust_nation, l_year

_ p _ ) pp g

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 10

Page 11: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

Relational Selectivity Spacey p

ctiv

itySe

lec

Selectivity

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 11

Page 12: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

Plan and Cost Diagramsg

A plan diagram is a pictorial enumeration of the plan choices of the query optimizer over p q y pthe relational selectivity space

A cost diagram is a visualization of the g(estimated) plan execution costs over the same relational selectivity space

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 12

Page 13: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

PICASSOPicasso is a Java tool developed in our lab thatPicasso is a Java tool developed in our lab thatautomatically generates plan and cost diagrams

~50000 lines of code (2004-11) with ~100 classes 50000 lines of code (2004 11) with 100 classes Operational on DB2/Oracle/SQLServer/Sybase/PostgreSQL Copyrighted by IISc in May 2006py g y y Released as free software in Nov 2006 by Prof. N Balakrishnan,

Associate Director of IISc Release of v1.0 in May 2007, v2.0 in Feb 2009, v2.1 in Feb 2011 In use at academic and industrial labs worldwide

– CMU, Purdue, Duke, TU Munich, NU Singapore, IIT-B, …– IBM, Microsoft, Oracle, Sybase, HP, …

Received Best Software award in Received Best Software award in Very Large Data Base (VLDB) conference, 2010

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 13

Page 14: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

The Picasso Connection

W ith itWoman with a guitarGeorges Braque, 1913

Plan diagrams are often similar tooften similar tocubist paintings !

[ Pablo Picasso f ffounder of cubist genre ]

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 14

Page 15: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

Complex Plan Diagram [QT8 O tA*][QT8,OptA*]

Highly irregular plan boundariesplan boundaries # of plans: 76

The Picasso Connection

Extremely fine-grained coverage

(P76 ~ 0.01%)Intricate Complex Patterns

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 15

Page 16: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

Cost Diagram [QT8 O t A*][QT8, Opt A*]

All t ithiAll costs are within 20 percent of the

maximum

MinCost: 8.26E5MaxCost: 1.05E6

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 16

Page 17: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

PHENOMENA:PLAN DIAGRAM REDUCTION

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 17

Page 18: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

Problem StatementCan the plan diagram be recolored with aCan the plan diagram be recolored with a smaller set of colors (i.e. some plans are “swallowed” by others), such thatswallowed by others), such that

Guarantee:No query point in the original diagram hasits estimated cost increased, post-swallowing,b th λ t ( d fi d)by more than λ percent (user-defined)

Analogy: Analogy: (with due apologies to Sri Lankans in the audience)Sri Lanka agrees to be annexed by India if it isg yassured that the cost of living of each Lankancitizen is not increased by more than λ percent

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 18

Page 19: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

Reduced Plan Diagram [λ=10%][QT8 O tA* R 100]Complex Plan Diagram[QT8 O tA* R 100][QT8, OptA*, Res=100][QT8, OptA*, Res=100]

QTD: QT8_OptA*_100

Reduced to 5 plans from 76 !from 76 !

Comparatively smoother contours

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 19

Page 20: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

Anorexic Reduction

E l l h Extensive empirical evaluation with a spectrum of multi-dimensional query templates indicates h that

“With a cost-increase-threshold of just 20%,virtually all complex plan diagramsu y mp p g m[irrespective of query templates, data distribution,query distribution, system configurations, etc.]q y , y g , ]

reduce to “anorexic levels” (~10 or less plans)!

February 2011 Picasso (IIITM Kerala) 20

Page 21: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

APPLICATION:ROBUST PLANS

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 21

Page 22: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

Selectivity Estimation Errorsy

( ) estimated location b optimi erqe(xe, ye) : estimated location by optimizerq (x , y ): actual location during executionqa(xa, ya): actual location during execution

The difference could be substantial due to– Outdated Statistics (expensive to maintain)( p )– Coarse Summaries (histograms)

Attribute Value Independence (AVI) assumptions– Attribute Value Independence (AVI) assumptions

Chronic problem in database design

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 22

Page 23: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

Impact of Error Examplep pActual Query

L

Estimated Query Location (qe): (1, 40)

Actual QueryLocation (qa): (80,80)

Cost(Poa) = 50 x 104 (optimal)Cost(P ) = 110 x 104 (highly sub optimal)

LIN

Cost(Poe) = 110 x 104 (highly sub-optimal)EIT

Cost(Poe) = 9 x 104 (optimal)TEM

Selectivity Space SSpace S

ORDERS

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 23

Page 24: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

Error Locations wrt Pl R l t R iPlan Replacement Regions

I h tl b tInherently robust???

S100q

Inherently robustInherently robust???

Endo-optimalreY qa

qa

``S llct

ivity

qaSwallowre

E ti l

Sele

c

qe

Exo-optimalre

Selectivity X00 100

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 24

Page 25: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

Positive Impact of ReductionPositive Impact of ReductionIn m st c ses repl cement pl n pr vides r bustness In most cases, replacement plan provides robustness to selectivity errors even in exo-optimal region

Original Plan (P )(Poe)

QT5qe = (0.36, 0.05)

Replacement Plan (Pre)

qe ( , )

Local Optimal PlanP1P3P6 P4 P4

P (Poa)P37

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 25

Page 26: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

Negative Impact of ReductionNegative Impact of ReductionBut occasionally the replacement is But, occasionally, the replacement is much worse than the original plan !

Replacement Plan (Pre)

QT5QT5qe = (0.03, 0.14)

Original Plan (Poe)

Local Optimal Plan(Poa)

P1P3P6 P4

P4

P3737

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 26

Page 27: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

Research ChallengeResearch Challenge

How do we ensure that plan replacements can only help,b t t i ll h t th but never materially hurt the expected performance?expected performance?

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 27

Page 28: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

Our Contribution

Mathematical analysis to show that only the perimeter of the selectivity space suffices to selectivity space suffices to determine global safetym g f y

Border Safety ⇒ Interior Safety !

April 2011 Plan Diagrams Tutorial (ICDE 2011) 28

Page 29: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

Take Awayy

W ffi i tl d l We can efficiently produce plan diagrams that simultaneously possess diagrams that simultaneously possess the desirable properties of being online, anorexic, safe and robust.

This result could play a meaningful This result could play a meaningful role in designing the next generation g g gof database query optimizers.

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 29

Page 30: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

More Details …

h //d l ii i / j /PICASSOhttp://dsl.serc.iisc.ernet.in/projects/PICASSO

Publications, Software, Sample Diagrams

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 30

Page 31: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

Dedication

Name: Dr B K Narayana Rao (Founding Fellow)Born: 1881 Elected: 1934 Section: MedicineBorn: 1881 Elected: 1934 Section: MedicineCouncil Service: 1934-61; Vice President 1938-43

Name: Prof T S SubbarayaBorn: 1905 Elected: 1935 Section: PhysicsBorn: 1905 Elected: 1935 Section: Physics

Name: Dr B N Balakrishna Rao Name: Dr B N Balakrishna RaoBorn: 1910 Elected: 1945 Section: Medicine

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 31

Page 32: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

QUESTIONS?QUESTIONS?

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 32

Page 33: Robust Optimization of Database Queries Optimization of Database Queries Jayant Haritsa Database Systems Lab ... l_shipdate between date '1995-01-01' and date '1996-12-31' ... (IASc

END PRESENTATIONEND PRESENTATION

July 2011 Robust Query Optimization (IASc Mid-year Meeting) 33