CS 4432 lecture #10 - b+ tree indexing 1 CS4432: Database Systems II Lecture #10 Professor Elke A. Rundensteiner

Upload: georgia-simpson

Post on 19-Dec-2015


Page 1:

CS4432: Database Systems II
Lecture #10

Professor Elke A. Rundensteiner

Page 2:

Hierarchy of index structures

[Figure: a data file whose sequence field values are stored out of order (50, 30, 70, 20, 40, 80, 10, 100, 60, 90); a first-level index over it (10, 20, 30, 40, 50, 60, 70, ...), which is dense if the file is non-sequential; and a high-level index (10, 50, 90, ...), which is always sparse.]

Page 3:

Conventional indexes: pros/cons?

Advantage:
- Simple
- Index is sequential file, good for scans
- Search efficient for static data

Disadvantage:
- Inserts expensive, and/or
- Lose sequentiality & balance
- Then search time unpredictable

Page 4:

Example Sequential Index

[Figure: a sequential index stored in continuous space with pages (10, 20, 30), (40, 50, 60), (70, 80, 90), followed by free space, and an overflow area (not sequential) holding 39, 31, 35, 36, 32, 38, 34, 33.]

Page 5:

Problems … Problems … Problems …

• Without re-organization we get unpredictable performance
• Re-organizing too much or too often brings too much overhead
• DBA does not know when to reorganize
• DBA does not know how full to load pages of new index

Page 6:

So Let’s Try Another Index . . .

• Give up “sequentiality” of index
• Predictable performance under updates
• Always keep the “tree” balanced
• Automate restructuring under updates

Page 7:

B+Tree Example n=3

[Figure: B+ tree with n = 3. Root: (100). Non-leaf nodes: (30) and (120, 150, 180). Leaves, left to right: (3, 5, 11), (30, 35), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200).]

Page 8:

B+ Trees in Practice

• Typical order: 100. Typical fill-factor: 67%.
  – average fanout = 133
• Typical capacities:
  – Height 4: 133^4 = 312,900,721 records
  – Height 3: 133^3 = 2,352,637 records
• Can often hold top levels in buffer pool:
  – Level 1 = 1 page = 8 Kbytes
  – Level 2 = 133 pages = 1 Mbyte
  – Level 3 = 17,689 pages = 133 Mbytes
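The capacity figures can be reproduced with a few lines of arithmetic. This is a sketch under two assumptions that the slide does not state: that "order 100" means a node holds at most 2 × 100 entries, and that the 67% fill factor is treated as 2/3. The function names are illustrative only.

```python
ORDER = 100            # assumed convention: a node holds at most 2 * ORDER entries
FILL_FACTOR = 2 / 3    # the slide's "67%" treated as two thirds (assumption)
PAGE_KBYTES = 8        # one node per 8 Kbyte page, as on the slide


def average_fanout(order=ORDER, fill=FILL_FACTOR):
    """Average number of children per node at the given fill factor."""
    return int(2 * order * fill)


def records_at_height(height, fanout):
    """Number of records reachable through a tree with `height` levels."""
    return fanout ** height


def pages_at_level(level, fanout):
    """Number of pages on a level, counting the root as level 1."""
    return fanout ** (level - 1)


fanout = average_fanout()
for h in (3, 4):
    print(f"height {h}: {records_at_height(h, fanout):,} records")
for lvl in (1, 2, 3):
    pages = pages_at_level(lvl, fanout)
    print(f"level {lvl}: {pages:,} pages = {pages * PAGE_KBYTES:,} Kbytes")
```

The Mbyte values on the slide are rounded; the exact page counts are what the loop prints.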

Page 9:

Sample non-leaf

Keys: 57, 81, 95

Pointers, left to right:
- to keys k < 57
- to keys 57 ≤ k < 81
- to keys 81 ≤ k < 95
- to keys k ≥ 95
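Choosing which pointer to follow in such a non-leaf node amounts to counting how many keys are less than or equal to the search key. A minimal sketch using Python's standard `bisect` module; the function name `child_index` is an illustrative assumption.

```python
from bisect import bisect_right


def child_index(keys, k):
    """Return the position of the pointer to follow for search key k.

    Pointer i covers keys[i-1] <= k < keys[i]; pointer 0 covers k < keys[0]
    and the last pointer covers k >= keys[-1].
    """
    return bisect_right(keys, k)


keys = [57, 81, 95]
for k in (10, 57, 80, 81, 95, 200):
    print(k, "-> pointer", child_index(keys, k))
```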

Page 10:

Sample leaf node:

[Figure: a leaf node holding keys 57, 81, 95. It is reached by a pointer from a non-leaf node. Each key has a pointer to the record with that key (57, 81, 95), and the last pointer leads to the next leaf in sequence.]

Page 11:

In textbook’s notation, n=3

[Figure: a leaf with keys (30, 35) and a non-leaf with key (30), each shown next to its equivalent drawing in the textbook's notation.]

Page 12:

Size of a node of order n: n+1 pointers and n keys (fixed)

Page 13:

Full node vs. min. node, n=3

[Figure: a full non-leaf (120, 150, 180) next to a minimum non-leaf (30); a full leaf (3, 5, 11) next to a minimum leaf (30, 35). A pointer counts even if it is null.]

Minimum occupancy:
Non-leaf: at least ⌈(n+1)/2⌉ pointers
Leaf: at least ⌊(n+1)/2⌋ pointers to data
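Because (n+1)/2 is not a whole number when n is even, some rounding is needed. The sketch below assumes the common textbook convention: round up for non-leaf nodes and round down for leaves. The function name is illustrative.

```python
def min_pointers(n, leaf):
    """Minimum number of used pointers in a node holding up to n keys.

    Assumed rounding: floor((n+1)/2) data pointers for a leaf,
    ceil((n+1)/2) child pointers for a non-leaf.
    """
    if leaf:
        return (n + 1) // 2      # floor((n+1)/2)
    return (n + 2) // 2          # ceil((n+1)/2) for integer n


for n in (3, 4):
    print(f"n={n}: non-leaf >= {min_pointers(n, leaf=False)} pointers, "
          f"leaf >= {min_pointers(n, leaf=True)} data pointers")
```

For n=3 both minimums are 2, which agrees with the minimum nodes (30) and (30, 35) in the figure.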

Page 14:

B+tree rules (tree of order n)

(1) All leaves at same lowest level (balanced tree)

(2) Pointers in leaves point to records, except for the “sequence pointer”

Page 15:

B+Tree Example : Searches

[Figure: the same B+ tree as before. Root: (100). Non-leaf nodes: (30) and (120, 150, 180). Leaves, left to right: (3, 5, 11), (30, 35), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200).]
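A search walks from the root to exactly one leaf. The sketch below encodes the example tree as nested dictionaries (a representation chosen here purely for illustration) and follows, at each non-leaf, the pointer whose key range contains the search key.

```python
from bisect import bisect_right


def leaf(*keys):
    return {"keys": list(keys), "children": None}


def node(keys, children):
    return {"keys": keys, "children": children}


# The example tree with n = 3.
root = node([100], [
    node([30], [leaf(3, 5, 11), leaf(30, 35)]),
    node([120, 150, 180], [leaf(100, 101, 110), leaf(120, 130),
                           leaf(150, 156, 179), leaf(180, 200)]),
])


def search(current, k):
    """Return (found, leaf_keys) for search key k."""
    while current["children"] is not None:
        # Pointer i covers keys[i-1] <= k < keys[i].
        current = current["children"][bisect_right(current["keys"], k)]
    return k in current["keys"], current["keys"]


print(search(root, 130))   # reaches leaf (120, 130)
print(search(root, 31))    # reaches leaf (30, 35), key absent
```

Every lookup touches the same number of nodes (three here), whether or not the key exists.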

Page 16:

Insert into B+tree

(a) simple case: space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root

Page 17:

(a) Insert key = 32, n=3

[Figure: parent with keys 30 and 100; leaves (3, 5, 11) and (30, 31). Key 32 goes into the free slot of leaf (30, 31), giving (30, 31, 32).]

Page 18:

(b) Insert key = 7, n=3

[Figure: parent with keys 30 and 100; leaves (3, 5, 11) and (30, 31). Leaf (3, 5, 11) is full, so it splits into (3, 5) and (7, 11), and key 7 is added to the parent.]

Page 19:

(c) Insert key = 160, n=3

[Figure: parent with key 100; non-leaf (120, 150, 180); leaves (150, 156, 179) and (180, 200). Inserting 160 splits leaf (150, 156, 179) into (150, 156) and (160, 179). The non-leaf then overflows and splits into (120, 150) and (180), and key 160 moves up into the parent next to 100.]

Page 20:

(d) New root, insert 45, n=3

[Figure: root (10, 20, 30); leaves (1, 2, 3), (10, 12), (20, 25), (30, 32, 40). Inserting 45 splits the last leaf into (30, 32) and (40, 45). Key 40 overflows the old root, which splits into (10, 20) and (40), and a new root holding key 30 is created.]

Page 21:

Recap: Insert Data into B+ Tree

• Find correct leaf L.
• Put data entry onto L.
  – If L has enough space, done!
  – Else, must split L (into L and a new node L2)
    • Redistribute entries evenly, copy up middle key.
    • Insert index entry pointing to L2 into parent of L.
• This can happen recursively
  – To split index node, redistribute entries evenly, but push up middle key. (Contrast with leaf splits.)
• Splits “grow” tree; root split increases height.
  – Tree growth: gets wider or one level taller at top.
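The contrast between "copy up" and "push up" is easiest to see on plain key lists. The sketch below is a simplification that handles keys only (no pointers), and the function names are illustrative. The inputs are the overflowing nodes from the insert examples with n=3.

```python
def split_leaf(keys):
    """Split an overflowing leaf; the middle key is COPIED up.

    The separator stays in the right leaf because every key must
    still appear at the leaf level.
    """
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]


def split_index(keys):
    """Split an overflowing index node; the middle key is PUSHED up.

    The separator is removed from both halves because index keys
    only direct the search.
    """
    mid = len(keys) // 2
    return keys[:mid], keys[mid + 1:], keys[mid]


# Leaf (3, 5, 11) after adding 7:
print(split_leaf([3, 5, 7, 11]))
# Non-leaf (120, 150, 180) after receiving 160 from a leaf split:
print(split_index([120, 150, 160, 180]))
```

After a leaf split the separator is still present in the right half; after an index split it appears in neither half.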

Page 22:

Deletion from B+tree

(a) Simple case
(b) Leaf-node: Coalesce with neighbor (sibling)
(c) Leaf-node: Re-distribute keys
(d) Cases (b) or (c) at non-leaf

Page 23:

(a) Delete key = 11, n=3

[Figure: parent with keys 30 and 100; leaves (3, 5, 11) and (30, 31). Deleting 11 leaves (3, 5), which still meets the minimum, so nothing else changes.]

Page 24:

(b) Coalesce with sibling: delete 50, n=4

[Figure: parent (10, 40, 100); leaves (10, 20, 30) and (40, 50). Deleting 50 leaves (40), which is below the minimum. It is coalesced with its sibling into (10, 20, 30, 40), and key 40 is removed from the parent.]

Page 25:

(c) Redistribute keys: delete 50, n=4

[Figure: parent (10, 40, 100); leaves (10, 20, 30, 35) and (40, 50). Deleting 50 leaves (40), which is below the minimum. Key 35 moves over from the full sibling, giving leaves (10, 20, 30) and (35, 40), and the parent key 40 is replaced by 35.]

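Page 26:

(d) Coalesce and non-leaf coalesce: delete 37, n=4

[Figure: root (25); non-leaf nodes (10, 20) and (30, 40); leaves (1, 3), (10, 14), (20, 22), (25, 26), (30, 37), (40, 45). Deleting 37 leaves (30), which coalesces with (40, 45) into (30, 40, 45). The non-leaf above it is then below the minimum and coalesces with its sibling, pulling key 25 down from the old root. The merged node (10, 20, 25, 40) becomes the new root.]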

Page 27:

Delete Data from B+ Tree

• Start at root, find leaf L where entry belongs.
• Remove the entry.
  – If L is at least half-full, done!
  – If L has only d-1 entries (d being the minimum number of entries per node),
    • Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).
    • If re-distribution fails, merge L and sibling.
• If merge occurred, must delete entry (pointing to L or sibling) from parent of L.
• Merge could propagate to root, decreasing height.
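The leaf-level part of this recap can be sketched on plain key lists. The code follows the order given in the recap (try redistribution first, merge only if the sibling has nothing to spare); note that the coalesce example earlier in the lecture instead merges whenever the combined entries fit in one node. The function name, the left-sibling-only handling, and the return format are simplifying assumptions.

```python
def delete_from_leaf(leaf, left_sibling, key, min_entries):
    """Delete key from leaf and repair an underflow using the left sibling.

    Returns (action, sibling_keys, leaf_keys). After a merge the leaf is
    empty, and the caller must remove its entry from the parent.
    """
    leaf = [k for k in leaf if k != key]
    sibling = list(left_sibling)
    if len(leaf) >= min_entries:
        return "done", sibling, leaf
    if len(sibling) > min_entries:
        # Borrow the sibling's largest key; the parent separator
        # must then be updated to the leaf's new first key.
        leaf.insert(0, sibling.pop())
        return "redistributed", sibling, leaf
    return "merged", sibling + leaf, []


# n=4, so a leaf needs at least 2 entries.
print(delete_from_leaf([40, 50], [10, 20, 30, 35], 50, 2))
print(delete_from_leaf([40, 50], [10, 20], 50, 2))
```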

Page 28:

Discussion of B-trees (vs. static indexed sequential files)

• Concurrency control harder in B-trees
• B-tree consumes more space
• B-tree automatically decides:
  – when to reorganize
  – how full to load pages of new index

Page 29:

Comparison: B-tree vs. indexed seq. file

Indexed seq. file:
• Less space, so lookup faster
• Inserts managed by overflow area
• Requires temporary restructuring
• Unpredictable performance

B-tree:
• Consumes more space, so lookup slower
• Each insert/delete potentially restructures
• Built-in restructuring
• Predictable performance

Page 30:

• Speaking of buffering… Is LRU a good policy for B+tree buffers?

Of course not!

Should try to keep root in memory at all times (and perhaps some nodes from second level)

Should keep the “path” when going down to leaves (just in case of restructuring)
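One way to act on this advice is to let the buffer manager pin selected pages so that the replacement policy skips them. The sketch below is a toy illustration, not a description of any particular DBMS: the class name, the string page identifiers, and the "evict the oldest unpinned page" rule are all assumptions.

```python
from collections import OrderedDict


class PinnedLRUBuffer:
    """LRU page buffer that never evicts pinned pages (e.g. the root)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # page_id -> pinned flag, oldest first

    def access(self, page_id, pin=False):
        # Re-inserting moves the page to the most-recently-used end.
        pinned = self.pages.pop(page_id, False) or pin
        self.pages[page_id] = pinned
        while len(self.pages) > self.capacity:
            victim = next((p for p, is_pinned in self.pages.items()
                           if not is_pinned), None)
            if victim is None:
                raise RuntimeError("all buffered pages are pinned")
            del self.pages[victim]

    def unpin(self, page_id):
        if page_id in self.pages:
            self.pages[page_id] = False


buf = PinnedLRUBuffer(capacity=3)
buf.access("root", pin=True)
for leaf_page in ("leaf-A", "leaf-B", "leaf-C", "leaf-D"):
    buf.access(leaf_page)
print(list(buf.pages))   # the root survives although it is the oldest page
```

Pinning the pages on the current root-to-leaf path for the duration of an update, and unpinning them afterwards, would cover the second point in the same way.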

Page 31:

Interesting problem:

For B+tree, how large should n be?

n is the number of keys per node

Page 32:

Assumptions: n children per node and N records in database

(1) Time to read a B-tree node from disk is (t_seek + t_read*n) msec.
(2) Once in main memory, use binary search to locate key: (a + b*log_2(n)) msec.
(3) Need to search (read) log_n(N) tree nodes.
(4) t_search = (t_seek + t_read*n + a + b*log_2(n)) * log_n(N)

Page 33:

Can get: f(n) = time to find a record

[Figure: plot of f(n) against n, with its minimum at n_opt.]

Find n_opt by setting f’(n) = 0.

What happens to n_opt as:
• Disk gets faster?
• CPU gets faster?
• …
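Instead of solving f'(n) = 0 by hand, the minimum can be located numerically. The sketch below evaluates the search-time formula from the assumptions above for every integer n in a range and picks the smallest. All parameter values are made-up placeholders chosen only to show the trend, not measurements.

```python
import math


def search_time(n, N, t_seek, t_read, a, b):
    """f(n): time per node (disk read + binary search) times nodes visited."""
    per_node = t_seek + t_read * n + a + b * math.log2(n)
    return per_node * math.log(N, n)


def optimal_n(N, t_seek, t_read, a, b, n_max=5000):
    """Integer n in [2, n_max] that minimizes f(n), found by brute force."""
    return min(range(2, n_max + 1),
               key=lambda n: search_time(n, N, t_seek, t_read, a, b))


N = 10**6
slow_disk = optimal_n(N, t_seek=10.0, t_read=0.01, a=0.001, b=0.001)
fast_disk = optimal_n(N, t_seek=1.0, t_read=0.01, a=0.001, b=0.001)
print("n_opt with slow seek:", slow_disk)
print("n_opt with fast seek:", fast_disk)
```

With these placeholder numbers a cheaper seek favors smaller nodes, since the fixed cost that a large node amortizes has shrunk. Under this formula the b*log_2(n) term multiplied by log_n(N) is independent of n, so only the constant a affects n_opt on the CPU side.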

Page 34:

Bulk Loading of B+ Tree

• Goal: create a B+ tree for a large collection of records.
• Method 1: Repeatedly insert records. This is slow.
• Method 2: Bulk loading. This is more efficient.

Page 35:

Bulk Loading of B+ Tree

• Initialization:
  – Sort all data entries
  – Insert pointer to first (leaf) page in new (root) page.

[Figure: an empty root above the sorted pages of data entries, not yet in the B+ tree: 3* 4* | 6* 9* | 10* 11* | 12* 13* | 20* 22* | 23* 31* | 35* 36* | 38* 41* | 44*]

Page 36:

Bulk Loading (Contd.)

• Index entries for leaf pages always entered into right-most index page
• When this fills up, it splits. Split may go up right-most path to root.

[Figure: two snapshots of the build over the sorted data entry pages 3* 4* | 6* 9* | 10* 11* | 12* 13* | 20* 22* | 23* 31* | 35* 36* | 38* 41* | 44*. First snapshot: root (10, 20) above index pages (6), (12), (23, 35); the remaining data entry pages are not yet in the B+ tree. Second snapshot: root (20) above (10) and (35), which sit above index pages (6), (12), (23), (38); the last data entry pages are not yet in the B+ tree.]
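The idea of building on top of already sorted leaf pages can be sketched with a simplified level-by-level variant. This is not the incremental right-most-split procedure described above: it packs the leaves first and then builds each index level in one pass, and it does not enforce minimum occupancy on the last node of a level. The dictionary layout, function name, and capacity parameters are assumptions for illustration.

```python
def bulk_load(sorted_keys, leaf_capacity=2, fanout=3):
    """Build a B+ tree bottom-up from sorted keys; returns the root node."""
    if not sorted_keys:
        raise ValueError("need at least one key")
    # Level 0: pack the sorted entries into leaf pages, left to right.
    level = [{"keys": sorted_keys[i:i + leaf_capacity],
              "children": None,
              "min": sorted_keys[i]}
             for i in range(0, len(sorted_keys), leaf_capacity)]
    # Build index levels until a single root remains.
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            parents.append({
                # Separator for each child except the first is the
                # smallest key stored beneath that child.
                "keys": [child["min"] for child in group[1:]],
                "children": group,
                "min": group[0]["min"],
            })
        level = parents
    return level[0]


entries = [3, 4, 6, 9, 10, 11, 12, 13, 20, 22, 23, 31, 35, 36, 38, 41, 44]
root = bulk_load(entries)
print("root keys:", root["keys"])
print("index pages:", [child["keys"] for child in root["children"]])
```

Because every page is written once and in key order, the leaves end up stored sequentially, which repeated single inserts do not guarantee.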

Page 37:

Summary of Bulk Loading

• Method 1: multiple inserts.
  – Slow.
  – Does not give sequential storage of leaves.
• Method 2: Bulk Loading
  – Has advantages for concurrency control.
  – Fewer I/Os during build.
  – Leaves will be stored sequentially (and linked)
  – Can control “fill factor” on pages.

Page 38:

Summary

B+ tree idea: self-balancing index structure that supports search, insert, and delete in logarithmic (log_n N) time.

B+ tree is versatile: handles equality and range searches

B+ tree and its variants: common index structure in industrial DBMSs
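Range searches work well because the leaves are chained by sequence pointers: after one descent to the leaf containing the lower bound, the scan simply follows the chain. A minimal sketch, assuming the descent has already located the starting leaf; the `Leaf` class and function names are illustrative.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None     # sequence pointer to the next leaf


def link(leaves):
    """Chain the leaves left to right and return the first one."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]


def range_scan(start_leaf, lo, hi):
    """Collect all keys k with lo <= k <= hi, following sequence pointers."""
    result = []
    leaf = start_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return result        # keys are sorted, so stop early
            if k >= lo:
                result.append(k)
        leaf = leaf.next
    return result


first = link([Leaf([3, 5, 11]), Leaf([30, 35]),
              Leaf([100, 101, 110]), Leaf([120, 130])])
print(range_scan(first, 30, 101))
```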
