DB performance tuning using indexes

Section 8.5 and Chapter 20 (Raghu)

What do you need to know?

A relational operator can be executed using different physical algorithms.

An SQL query corresponds to a relational algebra expression and may map to alternative equivalent execution plans.

Join Operation

Several different algorithms to implement joins:
  Nested-loop join
  Block nested-loop join
  Indexed nested-loop join
  Merge-join
  Hash-join

Choice based on cost estimate.

Nested-Loop (NL) Join

To compute the theta join r ⋈θ s:

  for each tuple t_r in r do begin
    for each tuple t_s in s do begin
      test pair (t_r, t_s) to see if they satisfy the join condition
      if they do, add t_r · t_s to the result
    end
  end

r is called the outer relation and s the inner relation of the join.

Requires no indices and can be used with any kind of join condition.

Expensive, since it examines every pair of tuples in the two relations.
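
The loop above can be sketched in Python as follows. This is only an illustrative in-memory version: the function name `nested_loop_join`, the tuple layout, and the sample `dept`/`emp` rows are assumptions made for the example, and a real DBMS works page by page on disk rather than on Python lists.

```python
from typing import Callable

Row = tuple


def nested_loop_join(r: list, s: list,
                     theta: Callable[[Row, Row], bool]) -> list:
    """Theta join of r (outer) and s (inner) by examining every pair."""
    result = []
    for t_r in r:          # outer relation: scanned once
        for t_s in s:      # inner relation: rescanned for each outer tuple
            if theta(t_r, t_s):
                result.append(t_r + t_s)  # concatenate the matching tuples
    return result


# Hypothetical sample data: Dept(dno, dname) and Emp(ename, dno).
dept = [(1, "Toy"), (2, "Shoe")]
emp = [("ann", 1), ("bob", 2), ("cal", 1)]

# Any predicate works as the join condition; here an equality on dno.
joined = nested_loop_join(dept, emp, lambda d, e: d[0] == e[1])
print(joined)
```

The predicate is called len(r) * len(s) times regardless of how many pairs match, which is the cost the slide refers to.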

Merge-Join

1. Sort both relations on their join attribute (if not already sorted on the join attributes).

2. Merge the sorted relations to join them.
   1. Join step is similar to the merge stage of the sort-merge algorithm.
   2. Main difference is handling of duplicate values in join attribute: every pair with same value on join attribute must be matched.
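
A minimal Python sketch of the two steps above, including the duplicate handling, might look like this. The function name `merge_join`, the key-extractor parameters, and the sample rows are assumptions for illustration; the sketch handles equi-joins only and keeps everything in memory.

```python
def merge_join(r, s, key_r, key_s):
    """Equi-join r and s where key_r(t_r) == key_s(t_s): sort, then merge."""
    r = sorted(r, key=key_r)   # step 1: sort both inputs on the join attribute
    s = sorted(s, key=key_s)
    result = []
    i = j = 0
    while i < len(r) and j < len(s):   # step 2: merge
        k_r, k_s = key_r(r[i]), key_s(s[j])
        if k_r < k_s:
            i += 1
        elif k_r > k_s:
            j += 1
        else:
            # Duplicate handling: find the run of equal keys on each side
            # and pair every tuple of one run with every tuple of the other.
            i_end = i
            while i_end < len(r) and key_r(r[i_end]) == k_r:
                i_end += 1
            j_end = j
            while j_end < len(s) and key_s(s[j_end]) == k_r:
                j_end += 1
            for t_r in r[i:i_end]:
                for t_s in s[j:j_end]:
                    result.append(t_r + t_s)
            i, j = i_end, j_end
    return result


# Hypothetical inputs; the join attribute is the first field of each tuple.
left = [(2, "b"), (1, "a"), (4, "d"), (2, "c")]
right = [(4, "w"), (2, "x"), (3, "z"), (2, "y")]
merged = merge_join(left, right, key_r=lambda t: t[0], key_s=lambda t: t[0])
print(merged)
```

With key 2 appearing twice on each side, the merge emits all four combinations for that key, which is the case the second sub-point describes.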

Equivalent execution plans

select d.customer-name
from branch b, account a, depositor d
where b.branch-name = a.branch-name
  and a.account-number = d.account-number
  and b.branch-city = 'Brooklyn'

Understanding the Workload

For each query in the workload:
  Which relations does it access?
  Which attributes are retrieved?
  Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?

For each update in the workload:
  Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?
  The type of update (INSERT/DELETE/UPDATE), and the attributes that are affected.

Choice of Indexes

What indexes should we create?
  Which relations should have indexes? What field(s) should be the search key? Should we build several indexes?

For each index, what kind of an index should it be?
  Clustered? Hash/tree?

Choice of Indexes (Contd.)

One approach: Consider the most important queries in turn. Consider the best plan using the current indexes, and see if a better plan is possible with an additional index. If so, create it.
  This implies that we must understand how a DBMS evaluates queries and creates query evaluation plans!
  For now, we discuss simple 1-table queries.

Before creating an index, we must also consider the impact on updates in the workload!
  Trade-off: Indexes can make queries go faster and updates slower. They require disk space, too.

Index Selection Guidelines

Attributes in WHERE clause are candidates for index keys.
  Exact match condition suggests hash index.
  Range query suggests tree index.
    • Clustering is especially useful for range queries; can also help on equality queries if there are many duplicates.

Multi-attribute search keys should be considered when a WHERE clause contains several conditions.
  Order of attributes is important for range queries.
  Such indexes can sometimes enable index-only strategies for important queries.
    • For index-only strategies, clustering is not important!

Try to choose indexes that benefit as many queries as possible. Since only one index can be clustered per relation, choose it based on important queries that would benefit the most from clustering.

Examples of Clustered Indexes

B+ tree index on E.age can be used to get qualifying tuples.
  How selective is the condition?
  Is the index clustered?

SELECT E.dno
FROM Emp E
WHERE E.age>40

Consider the GROUP BY query.
  If many tuples have E.age > 10, using E.age index and sorting the retrieved tuples may be costly.
  Clustered E.dno index may be better!

SELECT E.dno, COUNT (*)
FROM Emp E
WHERE E.age>10
GROUP BY E.dno

Equality queries and duplicates:
  Clustering on E.hobby helps!

SELECT E.dno
FROM Emp E
WHERE E.hobby='Stamps'

Indexes with Composite Search Keys

Composite Search Keys: Search on a combination of fields.
  Equality query: Every field value is equal to a constant value. E.g. wrt <sal,age> index:
    • age=20 and sal=75
  Range query: Some field value is not a constant. E.g.:
    • age=20; or age=20 and sal > 10

Data entries in index sorted by search key to support range queries.
  Lexicographic order, or
  Spatial order.

Figure: Examples of composite key indexes using lexicographic order.

Data records sorted by name:

  name  age  sal
  bob   12   10
  cal   11   80
  joe   12   20
  sue   13   75

Data entries in each index, sorted by search key:

  <age, sal>   <sal, age>   <age>   <sal>
  11,80        10,12        11      10
  12,10        20,12        12      20
  12,20        75,13        12      75
  13,75        80,11        13      80
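
To see why attribute order matters for range queries, the following sketch builds the <age, sal> and <sal, age> entry lists from the four records in the figure and evaluates "age=12 and sal > 10" against each. Sorted Python lists stand in for the leaf level of a tree index, which is an assumption of the sketch; the variable names are made up for illustration.

```python
from bisect import bisect_right

# Data records from the figure: (name, age, sal).
emp = [("bob", 12, 10), ("cal", 11, 80), ("joe", 12, 20), ("sue", 13, 75)]

# Python compares tuples lexicographically, so sorting yields index order.
age_sal = sorted((age, sal) for _, age, sal in emp)
sal_age = sorted((sal, age) for _, age, sal in emp)

# Condition: age = 12 AND sal > 10.
# On <age, sal> the qualifying entries form one contiguous run.
lo = bisect_right(age_sal, (12, 10))             # first entry after (12, 10)
hi = bisect_right(age_sal, (12, float("inf")))   # end of the age = 12 run
hits_age_sal = age_sal[lo:hi]

# On <sal, age> only "sal > 10" narrows the scan; age must be checked
# on every remaining entry.
start = bisect_right(sal_age, (10, float("inf")))
scanned = sal_age[start:]
hits_sal_age = [(sal, age) for sal, age in scanned if age == 12]

print(age_sal, sal_age)
print(hits_age_sal, len(scanned), hits_sal_age)
```

Both indexes find the same record (joe), but the <age, sal> scan touches one entry while the <sal, age> scan examines three.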

Composite Search Keys

To retrieve Emp records with age=30 AND sal=4000, an index on <age,sal> would be better than an index on age or an index on sal.
  Choice of index key is orthogonal to clustering etc.

If condition is: 20<age<30 AND 3000<sal<5000:
  Clustered tree index on <age,sal> or <sal,age> is best.

If condition is: age=30 AND 3000<sal<5000:
  Clustered <age,sal> index much better than <sal,age> index!

Composite indexes are larger, updated more often.

Index-Only Plans

A number of queries can be answered without retrieving any tuples from one or more of the relations involved if a suitable index is available.

Index on <E.dno>:

SELECT E.dno, COUNT(*)
FROM Emp E
GROUP BY E.dno

Tree index on <E.dno,E.sal>:

SELECT E.dno, MIN(E.sal)
FROM Emp E
GROUP BY E.dno

Tree index on <E.age,E.sal> or <E.sal,E.age>:

SELECT AVG(E.sal)
FROM Emp E
WHERE E.age=25 AND E.sal BETWEEN 3000 AND 5000
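
As a rough illustration of the first two queries, the sketch below answers them from index data entries alone, never touching the Emp records. The list `dno_sal_index` is a hypothetical stand-in for the sorted leaf entries of a tree index on <E.dno, E.sal>, and the function names are invented for the example.

```python
from itertools import groupby

# Hypothetical data entries of a tree index on <dno, sal>, in key order.
dno_sal_index = [
    (10, 3000), (10, 4500),
    (20, 2500), (20, 2500), (20, 6000),
    (30, 4000),
]


def count_per_dno(entries):
    """SELECT dno, COUNT(*) ... GROUP BY dno, using only index entries."""
    return [(dno, sum(1 for _ in group))
            for dno, group in groupby(entries, key=lambda e: e[0])]


def min_sal_per_dno(entries):
    """SELECT dno, MIN(sal) ... GROUP BY dno.

    Entries are sorted by (dno, sal), so the first entry of each dno
    group already carries that group's minimum salary.
    """
    return [(dno, next(group)[1])
            for dno, group in groupby(entries, key=lambda e: e[0])]


counts = count_per_dno(dno_sal_index)
minimums = min_sal_per_dno(dno_sal_index)
print(counts)
print(minimums)
```

The MIN query relies on the entries being in sorted order, which is why the slide asks for a tree index there rather than a hash index.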

Index Selection for Joins (20.3)

When considering a join condition:
  Hash index on inner relation (where the search key includes the join columns) is very good for Index Nested Loops.
    • Should be clustered if join column is not key for inner, and inner tuples need to be retrieved.
  Clustered B+ tree on join column(s) is good for Sort-Merge, because the relation does not need to be sorted.

As examples show, our choice of indexes is guided by the plan(s) that we expect an optimizer to consider for a query.

Example 1

SELECT E.ename, D.mgr
FROM Emp E, Dept D
WHERE D.dname='Toy' AND E.dno=D.dno

Hash index on D.dname supports 'Toy' selection.
  Given this, index on D.dno is not needed.

Hash index on E.dno allows us to get matching (inner) Emp tuples for each selected (outer) Dept tuple.

Plan: project E.ename, D.mgr
  Index nested loops (INL) join on D.dno = E.dno
    outer: select dname = 'Toy' on Dept (hash index on dname)
    inner: Emp (hash index on dno)
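
The plan for Example 1 can be mimicked in a few lines of Python, with dictionaries playing the role of the two hash indexes. The sample rows, the helper `build_hash_index`, and the function `toy_employees` are all assumptions made for this sketch and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical relations: Dept(dno, dname, mgr) and Emp(ename, dno).
dept = [(1, "Toy", "mia"), (2, "Shoe", "ned"), (3, "Toy", "oli")]
emp = [("ann", 1), ("bob", 2), ("cal", 1), ("dee", 3)]


def build_hash_index(rows, key):
    """Map each search-key value to the list of rows that carry it."""
    index = defaultdict(list)
    for row in rows:
        index[key(row)].append(row)
    return index


dname_index = build_hash_index(dept, key=lambda d: d[1])    # on D.dname
emp_dno_index = build_hash_index(emp, key=lambda e: e[1])   # on E.dno


def toy_employees():
    """Index nested loops: Dept is the outer relation, Emp the inner."""
    result = []
    # Outer: the dname index supplies only the 'Toy' departments.
    for dno, _dname, mgr in dname_index.get("Toy", []):
        # Inner: probe the E.dno index instead of scanning all of Emp.
        for ename, _dno in emp_dno_index.get(dno, []):
            result.append((ename, mgr))
    return result


rows = toy_employees()
print(rows)
```

Neither loop scans a whole relation: each outer tuple triggers one probe of the inner index, so an index on D.dno would never be consulted.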

Example 1a

SELECT E.ename, D.mgr
FROM Emp E, Dept D
WHERE D.dname='Toy' AND E.dno=D.dno
  AND E.age = 25

What if WHERE included: "... AND E.age=25"?
  Could retrieve Emp tuples using index on E.age, then join with Dept tuples satisfying dname selection. Comparable to strategy that used E.dno index.
  So, if E.age index is already created, this query provides much less motivation for adding an E.dno index.

Plan: join on D.dno = E.dno
  input: select D.dname = 'Toy' on Dept (hash index on D.dname)
  input: select E.age = 25 on Emp (hash index on E.age)

Example 2

SELECT E.ename, D.mgr
FROM Emp E, Dept D
WHERE E.sal BETWEEN 10000 AND 20000
  AND E.hobby='Stamps' AND E.dno=D.dno

Clearly, Emp should be the outer relation.
  Suggests that we build a hash index on D.dno.

What index should we build on Emp?
  B+ tree on E.sal could be used, OR a (hash) index on E.hobby could be used. Only one of these is needed, and which is better depends upon the selectivity of the conditions.
    • As a rule of thumb, equality selections are more selective than range selections.

Clustering and Joins (20.4)

SELECT E.ename, D.mgr
FROM Emp E, Dept D
WHERE D.dname='Toy' AND E.dno=D.dno

Clustering is especially important when accessing inner tuples in INL => should make index on E.dno clustered.

Plan: Index nested loops (INL) join on D.dno = E.dno
  outer: select D.dname = 'Toy' on Dept (hash index on D.dname)
  inner: Emp (hash index on E.dno)

Clustering and Joins (cont.)

SELECT E.ename, D.mgr
FROM Emp E, Dept D
WHERE E.hobby='Stamps' AND E.dno=D.dno

If many employees collect stamps, Sort-Merge join may be worth considering. A clustered index on D.dno would help.

Summary: Clustering is useful whenever many tuples are to be retrieved.

Plan: Sort-Merge join on dno = dno
  input: select hobby = 'Stamps' on Emp (hash index on E.hobby)
  input: Dept (hash index on D.dno)

Rewriting queries (20.7.3, 20.9)

If a query runs slower than expected, check the plan that is being used. The choice of indexes may have to be adjusted, an index may need to be re-built, statistics may be too old, or a query may have to be rewritten.

Sometimes, the DBMS may not be executing the plan you had in mind. Common areas of weakness:
  Selections involving null values, arithmetic or string expressions (ex: WHERE E.age = 2*D.age)
  Selections involving OR conditions (discussed below)
  Lack of evaluation features like index-only strategies or certain join methods, or poor size estimation.

Rewriting queries with DISTINCT

Minimize the use of DISTINCT: don't need it if duplicates are acceptable, or if answer contains a key.

Goal is to avoid expensive operations, like duplicate elimination.

Rewriting selections involving OR

Suppose indexes on hobby and age.
  Rewrite query using union.

SELECT E.dno
FROM Employees E
WHERE E.hobby = 'Stamps' OR E.age = 10

=

SELECT E.dno
FROM Employees E
WHERE E.hobby = 'Stamps'
UNION
SELECT E.dno
FROM Employees E
WHERE E.age = 10

Note that UNION eliminates duplicate E.dno values, so the two forms return the same set of departments but not necessarily the same number of rows.
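
The rewrite can be tried out with Python's built-in sqlite3 module. The table contents and index names below are invented for the example, and whether a particular optimizer actually uses the hobby and age indexes for either form is not something this sketch demonstrates; it only compares the results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (ename TEXT, dno INTEGER, hobby TEXT, age INTEGER);
    CREATE INDEX emp_hobby ON Employees (hobby);
    CREATE INDEX emp_age ON Employees (age);
    INSERT INTO Employees VALUES
        ('ann', 1, 'Stamps', 30),
        ('bob', 1, 'Chess', 10),
        ('cal', 2, 'Stamps', 10),
        ('dee', 3, 'Chess', 40);
""")

# Original form: a single selection with OR.
or_rows = conn.execute(
    "SELECT E.dno FROM Employees E "
    "WHERE E.hobby = 'Stamps' OR E.age = 10"
).fetchall()

# Rewritten form: one query block per condition, combined with UNION.
union_rows = conn.execute(
    "SELECT E.dno FROM Employees E WHERE E.hobby = 'Stamps' "
    "UNION "
    "SELECT E.dno FROM Employees E WHERE E.age = 10"
).fetchall()

print(sorted(or_rows), sorted(union_rows))
conn.close()
```

On this data both forms name departments 1 and 2, but the OR form returns department 1 twice (ann and bob), while UNION returns it once.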

Rewriting query with GROUP BY/HAVING

Minimize the use of GROUP BY and HAVING:

SELECT MIN (E.age)
FROM Employee E
GROUP BY E.dno
HAVING E.dno=102

=

SELECT MIN (E.age)
FROM Employee E
WHERE E.dno=102

Rewriting nested queries

Use only one "query block", if possible.

SELECT DISTINCT *
FROM Sailors S
WHERE S.sname IN (SELECT Y.sname FROM YoungSailors Y)

=

SELECT DISTINCT S.*
FROM Sailors S, YoungSailors Y
WHERE S.sname = Y.sname
