SVU CS502 Design and Analysis of Algorithms Lecture9: BTrees Dr. ChungWen Albert Tsao [email protected] www.408codingschool.com/CS502_Algorithm 1/12/16 1 http://www.slideshare.net/anujmodi555/b-trees-in-data-structure Slide Source: http://www.geeksforgeeks.org/b-tree-set-1-introduction-2/

Upload: voanh

Post on 07-Jul-2018


SVU CS502

Design and Analysis of Algorithms Lecture-9: B-Trees

Dr. Chung-Wen Albert Tsao

[email protected] www.408codingschool.com/CS502_Algorithm

1/12/16

Slide Source:

http://www.slideshare.net/anujmodi555/b-trees-in-data-structure

http://www.geeksforgeeks.org/b-tree-set-1-introduction-2/

B-Trees 2

Motivation for B-Trees

• Index structures for large datasets cannot be stored in main memory

• Storing them on disk requires a different approach to efficiency

• Crudely speaking, one disk access takes about the same time as 200,000 instructions

Motivation (cont.)

• Assume that we use a red-black tree to store about 20 million records

• We end up with a very deep binary tree and many separate disk accesses; log2 20,000,000 is about 24

• We can’t improve on the log n lower bound on search for a binary tree

• But we can reduce slow disk accesses by giving each node more branches, which makes the tree shorter!
  – As branching increases, depth decreases
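The arithmetic behind this slide can be sketched as follows. This is only an illustration: it assumes an idealized tree in which every node has exactly `branching` children, and the function name `levels_needed` is made up for the example.

```python
def levels_needed(n_records: int, branching: int) -> int:
    """Smallest number of levels h such that branching**h >= n_records."""
    levels, reachable = 0, 1
    while reachable < n_records:
        reachable *= branching   # one more level multiplies the reach
        levels += 1
    return levels


# Binary branching versus the wide nodes a disk block can hold.
for b in (2, 100, 1000):
    print(b, levels_needed(20_000_000, b))
```

With 20 million records, binary branching needs 25 levels (log2 is about 24.25, rounded up), while a branching factor of 100 needs only 4.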

1) All leaves are at the same level.

2) A B-Tree is defined by its minimum degree ‘t’.

• Every internal node other than the root has between t and 2t child nodes.
• Exception: the root may contain as few as 1 key (so an internal root has at least 2 children).
• The value of t depends upon the disk block size.
• A "fill factor" of 50% (every node at least half full) is used to control the growth and the shrinkage.

3) Number of keys per node = Number of children per node - 1

• Every node other than the root must contain between t-1 and 2t-1 keys.
• All keys of a node are sorted in increasing order.
• The child between keys k1 and k2 contains all keys between k1 and k2.
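The three properties above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the `Node` class, `leaf_depth`, and `is_btree` are hypothetical names, keys are assumed to be distinct integers, and t is assumed to be at least 2.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # empty list => leaf


def leaf_depth(node: Node, t: int, is_root: bool = True,
               lo: Optional[int] = None, hi: Optional[int] = None) -> int:
    """Return the common depth of all leaves below node, or -1 if a rule is broken."""
    n = len(node.keys)
    min_keys = 1 if is_root else t - 1          # property 3, with the root exception
    if not (min_keys <= n <= 2 * t - 1):
        return -1
    if any(a >= b for a, b in zip(node.keys, node.keys[1:])):
        return -1                               # keys must be strictly increasing
    if lo is not None and node.keys[0] <= lo:
        return -1                               # key outside the range set by the parent
    if hi is not None and node.keys[-1] >= hi:
        return -1
    if not node.children:
        return 0                                # a leaf
    if len(node.children) != n + 1:
        return -1                               # keys = children - 1
    bounds = [lo] + node.keys + [hi]
    depths = {leaf_depth(c, t, False, bounds[i], bounds[i + 1])
              for i, c in enumerate(node.children)}
    if len(depths) != 1 or -1 in depths:
        return -1                               # property 1: all leaves at the same level
    return depths.pop() + 1


def is_btree(root: Node, t: int) -> bool:
    if t < 2:
        raise ValueError("minimum degree t must be at least 2")
    return leaf_depth(root, t) >= 0


good = Node([3, 6], [Node([1, 2]), Node([4, 5]), Node([7, 8, 9])])
too_small = Node([3, 6], [Node([1]), Node([4, 5]), Node([7, 8, 9])])
print(is_btree(good, 3), is_btree(too_small, 3))
```

The second tree fails for t=3 because the leaf holding only the key 1 has fewer than t-1 = 2 keys.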

An example B-Tree

B-Tree of minimum degree t=3

Note that all the leaves are at the same level

http://www.geeksforgeeks.org/b-tree-set-1-introduction-2/


Constructing a B-tree

B-Tree of minimum degree t=3

• The splitChild operation moves a key up, so B-Trees grow upward (at the root)

• Binary search trees such as red-black trees grow downward (at the leaves)

Inserting k into a B-Tree

A proactive insertion algorithm

1) Initialize x as root. (If the root itself is full, split it first; this is how the tree gains a level.)

2) While x is not a leaf, do the following:

..a) Find the child y of x that is going to be traversed next.

..b) If y is not full, set x = y.

..c) If y is full, split it and move its middle key up into its parent x.

If k < the middle key of y, set x = first part of y.

Else set x = second part of y.

3) The loop in step 2 stops when x is a leaf. Insert k into x.

(x must have space for 1 extra key, as we have been splitting all full nodes in advance.)
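The proactive steps above can be sketched in Python as follows. This is one possible reading of the algorithm, not the lecturer's code: the class and method names are hypothetical, keys are assumed to be distinct, and a full root is split before the descent starts.

```python
import bisect


class Node:
    def __init__(self, leaf=True):
        self.keys = []
        self.children = []
        self.leaf = leaf


class BTree:
    def __init__(self, t):
        self.t = t                      # minimum degree
        self.root = Node()

    def _split_child(self, x, i):
        """Split the full child x.children[i]; its middle key moves up into x."""
        t = self.t
        y = x.children[i]
        z = Node(leaf=y.leaf)
        mid = y.keys[t - 1]
        z.keys = y.keys[t:]             # upper t-1 keys go to the new node
        y.keys = y.keys[:t - 1]         # lower t-1 keys stay in y
        if not y.leaf:
            z.children = y.children[t:]
            y.children = y.children[:t]
        x.keys.insert(i, mid)
        x.children.insert(i + 1, z)

    def insert(self, k):
        t = self.t
        if len(self.root.keys) == 2 * t - 1:     # full root: the tree grows upward
            new_root = Node(leaf=False)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        x = self.root
        while not x.leaf:                        # step 2
            i = bisect.bisect_left(x.keys, k)    # 2a: child to traverse next
            if len(x.children[i].keys) == 2 * t - 1:
                self._split_child(x, i)          # 2c: split in advance
                if k > x.keys[i]:
                    i += 1                       # continue in the second part
            x = x.children[i]
        bisect.insort(x.keys, k)                 # step 3: the leaf has room


tree = BTree(t=3)
for key in range(1, 11):
    tree.insert(key)
print(tree.root.keys)                            # [3, 6]
print([c.keys for c in tree.root.children])      # [[1, 2], [4, 5], [7, 8, 9, 10]]
```

Because every full node on the path is split on the way down, the insertion never has to walk back up the tree.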

Inserting into a B-Tree

A reactive insertion algorithm

1) Attempt to insert the new key k into a leaf

2) If the leaf is full, split it into two, promoting the middle key to the leaf’s parent

3) If the parent becomes full, split it into two, promoting its middle key

4) Repeat the above steps all the way to the top

5) If necessary, the root is split in two and the middle key is promoted to a new root, making the tree one level higher

B-Tree of minimum degree t=3

Insert 1 2 3 4 5:

1--2--3--4--5

Insert 6 (splitChild):

      3
1--2     4--5--6

Insert 7:

      3
1--2     4--5--6--7

Insert 8:

      3
1--2     4--5--6--7--8

Insert 9 (splitChild):

      3--------6
1--2     4--5     7--8--9

Insert 10:

      3--------6
1--2     4--5     7--8--9--10

Insert 11:

      3--------6
1--2     4--5     7--8--9--10-11

Insert 12 (splitChild):

      3--------6--------9
1--2     4--5     7--8     10-11-12

Insert 13:

      3--------6--------9
1--2     4--5     7--8     10-11-12-13

Insert 14:

      3--------6--------9
1--2     4--5     7--8     10-11-12-13-14

Insert 15 (splitChild):

      3--------6--------9--------12
1--2     4--5     7--8     10-11     13-14-15

Insert 16:

      3--------6--------9--------12
1--2     4--5     7--8     10-11     13-14-15-16

Insert 17:

      3--------6--------9--------12
1--2     4--5     7--8     10-11     13-14-15-16-17

Insert 18 (splitChild):

      3--------6--------9--------12--------15
1--2     4--5     7--8     10-11     13-14     16-17-18

Insert 19 (the full root is split; the tree grows one level):

                        9
      3--------6                 12-------15
1--2     4--5     7--8     10-11     13-14     16-17-18-19

Insert 20:

                        9
      3--------6                 12-------15
1--2     4--5     7--8     10-11     13-14     16-17-18-19-20

Insert 21 (splitChild):

                        9
      3--------6                 12-------15--------18
1--2     4--5     7--8     10-11     13-14     16-17     19-20-21

Insert 22:

                        9
      3--------6                 12-------15--------18
1--2     4--5     7--8     10-11     13-14     16-17     19-20-21-22

Insert 23:

                        9
      3--------6                 12-------15--------18
1--2     4--5     7--8     10-11     13-14     16-17     19-20-21-22-23

Insert 24 (splitChild):

                        9
      3--------6                 12-------15--------18--------21
1--2     4--5     7--8     10-11     13-14     16-17     19-20     22-23-24

Insert 25:

                        9
      3--------6                 12-------15--------18--------21
1--2     4--5     7--8     10-11     13-14     16-17     19-20     22-23-24-25

Insert 26:

                        9
      3--------6                 12-------15--------18--------21
1--2     4--5     7--8     10-11     13-14     16-17     19-20     22-23-24-25-26

Exercise in Inserting a B-Tree

• Insert the following keys into a B-Tree of minimum degree t=3:
  3, 7, 9, 23, 45, 1, 5, 14, 25, 24, 13, 11, 8, 19, 4, 31, 35, 56

Result:

         5--------9-----------------24
1--3--4     7--8     11-13-14-19-23     25-31-35-45-56

Exercise in Inserting a B-Tree

• Insert the following keys into a B-Tree of minimum degree t=2:
  3, 7, 9, 23, 45, 1, 5, 14, 25, 24, 13, 11, 8, 19, 4, 31, 35, 56

Removal from a B-tree

Case 1. If key k ∈ a leaf node x, delete k from x. (Simple leaf deletion)

Case 2. If key k ∈ an internal node x, do the following.

Let y/z = x’s child node that precedes/follows k. Let k0/k1 = k’s predecessor/successor.

2.a) If y has ≥ t keys, replace k with k0. Recursively delete k0 from y. (Simple non-leaf deletion)
2.b) If y has < t keys, but z has ≥ t keys, replace k with k1. Recursively delete k1 from z. (Simple non-leaf deletion)
2.c) If both y and z have only t-1 keys, delete k from x, and merge y with z. (Too few keys in child node y and y’s sibling z)

Case 3. If key k ∉ x, find the child node y whose subtree must contain k.

Let y = x’s child node that is the ancestor of k. Let z = y’s immediate sibling.

3.a) If y has ≥ t keys, continue to 3.d).
3.b) If y has only t-1 keys but z has ≥ t keys, move a key: z ⇝ x ⇝ y (one key moves from z up into x, and one key moves from x down into y).
3.c) If both y and z have t-1 keys, merge y with z, and move a key: x ⇝ y. (Too few keys in child node y and y’s sibling z)
3.d) Set x = y; recursively delete k.
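The deletion cases can be sketched in Python as below. This is an illustrative reading rather than the lecturer's implementation: the names `Node`, `_merge`, `_fill`, and `delete` are hypothetical, keys are assumed distinct, and in case 3.b the left sibling is tried before the right one (the slides do not say which sibling to prefer).

```python
import bisect


class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = children or []    # empty list => leaf


def _merge(x, i):
    """Cases 2.c / 3.c: pull x.keys[i] down and fuse children i and i+1."""
    y, z = x.children[i], x.children.pop(i + 1)
    y.keys += [x.keys.pop(i)] + z.keys
    y.children += z.children


def _fill(x, i, t):
    """Case 3: ensure x.children[i] has at least t keys; return its index afterwards."""
    y = x.children[i]
    if len(y.keys) >= t:                                       # 3.a
        return i
    if i > 0 and len(x.children[i - 1].keys) >= t:             # 3.b, left sibling
        z = x.children[i - 1]
        y.keys.insert(0, x.keys[i - 1])
        x.keys[i - 1] = z.keys.pop()
        if z.children:
            y.children.insert(0, z.children.pop())
        return i
    if i < len(x.keys) and len(x.children[i + 1].keys) >= t:   # 3.b, right sibling
        z = x.children[i + 1]
        y.keys.append(x.keys[i])
        x.keys[i] = z.keys.pop(0)
        if z.children:
            y.children.append(z.children.pop(0))
        return i
    if i == len(x.keys):                                       # 3.c: y is the last child
        i -= 1
    _merge(x, i)
    return i


def _delete(x, k, t):
    i = bisect.bisect_left(x.keys, k)
    found = i < len(x.keys) and x.keys[i] == k
    if not x.children:
        if found:
            x.keys.pop(i)                                      # case 1
        return
    if not found:
        i = _fill(x, i, t)                                     # case 3, then 3.d
        _delete(x.children[i], k, t)
        return
    y, z = x.children[i], x.children[i + 1]
    if len(y.keys) >= t:                                       # 2.a: use predecessor
        node = y
        while node.children:
            node = node.children[-1]
        x.keys[i] = node.keys[-1]
        _delete(y, x.keys[i], t)
    elif len(z.keys) >= t:                                     # 2.b: use successor
        node = z
        while node.children:
            node = node.children[0]
        x.keys[i] = node.keys[0]
        _delete(z, x.keys[i], t)
    else:                                                      # 2.c: merge, then delete k
        _merge(x, i)
        _delete(y, k, t)


def delete(root, k, t):
    """Delete k and return the (possibly new) root."""
    _delete(root, k, t)
    if not root.keys and root.children:    # root emptied by a merge: tree shrinks
        return root.children[0]
    return root


root = Node([3, 6], [Node([1, 2]), Node([4, 5]), Node([7, 8, 9, 10])])
root = delete(root, 6, t=3)                # case 2.b
print(root.keys, [c.keys for c in root.children])
```

For the t=3 tree with root 3, 6, deleting 6 takes the successor 7 from the right child, giving root keys 3, 7 and leaves 1-2, 4-5, 8-9-10.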

Removal from a B-tree: Case 1

1. If key k ∈ a leaf node x, delete k from x. (Simple leaf deletion)

Delete F: Case 1

[Figure: leaf node x before and after F is deleted]

Removal from a B-tree: Case 2a

2. If key k ∈ an internal node x, do the following.
a) If y has ≥ t keys, ⇒ replace k with k’s predecessor. Recursively delete k’s predecessor. (Simple non-leaf deletion)

Delete M: Case 2a

[Figure: node x and its child y before and after M is deleted]

Case 2b) If y has < t keys, but z has ≥ t keys, k = k1. Recursively delete k1. (Simple non-leaf deletion)

Remove 6:

      3--------6
1--2     4--5     7--8--9--10

⇒

      3--------7
1--2     4--5     8--9--10

Removal from a B-tree: Case 2c

Case 2c (k ∈ x: Too few keys in child node y and y’s sibling z)

⇒ delete k & merge y with z.

[Figure: node x with children y and z; after the deletion, y ← y merge z]

Case 2c: (k ∈ x: Too few keys in child node y and y’s sibling z)

Delete 3:

      3--------6
1--2     4--5     7--8--9--10

⇒

            6
1--2--4--5     7--8--9--10

Delete 12:

                        9
      3--------6                 12-------15--------18
1--2     4--5     7--8     10-11     13-14     16-17     19-20-21

⇒

                        9
      3--------6                        15--------18
1--2     4--5     7--8     10-11-13-14     16-17     19-20-21

Removal from a B-tree: Case 3c

Case 3c (k ∉ x: Too few keys in child node y and y’s sibling z)

Delete D:

[Figure: node x with children y and z before and after D is deleted; labels: y ← y merge z, x ← y merge z]

Removal from a B-tree: Case 3b

Case 3b (k ∉ x: Too few keys in child node y but enough in y’s sibling z)

Delete B:

[Figure: node x with children y and z before and after B is deleted; label: x ← y]

Exercise in Removal from a B-Tree

• Given a B-tree created from these data:
  – Minimum degree t=2
  – 3, 7, 9, 23, 45, 1, 5, 14, 25, 24, 13, 11, 8, 19, 4, 31, 35, 56

• Add these further keys: 2, 6, 12

• Delete these keys: 4, 5, 7, 3, 14

Analysis of B-Trees

• The maximum number of items in a B-tree of order m=2t and height h:

    root       m – 1
    level 1    m(m – 1)
    level 2    m^2(m – 1)
    . . .
    level h    m^h(m – 1)

• So, the total number of items is
    (1 + m + m^2 + m^3 + … + m^h)(m – 1) = [(m^{h+1} – 1)/(m – 1)](m – 1) = m^{h+1} – 1

• When m = 5 and h = 2 this gives 5^3 – 1 = 124
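The closed form can be cross-checked against the level-by-level sum. A small sketch (the function name `max_items` is made up for this example):

```python
def max_items(m: int, h: int) -> int:
    """Maximum number of items in a B-tree of order m and height h."""
    per_level = [m ** level * (m - 1) for level in range(h + 1)]  # root is level 0
    total = sum(per_level)
    assert total == m ** (h + 1) - 1      # the geometric-series closed form
    return total


print(max_items(5, 2))      # 124
print(max_items(101, 3))    # 104060400
```

For m = 5 and h = 2 the three levels contribute 4 + 20 + 100 = 124 items.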

Reasons for using B-Trees

• When searching tables held on disc, the cost of each disc transfer is high but doesn't depend much on the amount of data transferred, especially if consecutive items are transferred
  – If we use a B-tree of order 101, say, we can transfer each node in one disc read operation
  – A B-tree of order 101 and height 3 can hold 101^4 – 1 items (approximately 100 million), and any item can be accessed with 3 disc reads (assuming we hold the root in memory)

• If we take m = 3, we get a 2-3 tree, in which non-leaf nodes have two or three children (i.e., one or two keys)
  – B-Trees are always balanced (since the leaves are all at the same level), so 2-3 trees make a good type of balanced tree

Comparing Trees

• Binary trees
  – Can become unbalanced and lose their good time complexity (big O)
  – AVL trees are strict binary trees that overcome the balance problem
  – Heaps remain balanced but only prioritise (not order) the keys

• Multi-way trees
  – B-Trees can be m-way; they can have any (odd) number of children
  – One B-Tree, the 2-3 (or 3-way) B-Tree, approximates a permanently balanced binary tree, exchanging the AVL tree’s balancing operations for insertion and (more complex) deletion operations

B + TREE

Slides source: http://www.slideshare.net/Tech_MX/b-tree-14155416

INTRODUCTION OF B+ TREE

○ The properties of a B+ tree:

○ Similar to the B-Tree,
  ○ All the leaf nodes are at the same bottom level.
  ○ Each internal node of the tree has between [t] and [2*t] children.
  ○ Number of keys per node = Number of children per node - 1
  ○ Use a "fill factor" of (50%) to control the growth and the shrinkage.

○ In contrast to a B-tree,
  ○ All records are stored at the leaf level of the tree; only keys are stored in internal nodes.
  ○ All the leaf nodes are linked as a list for faster (sequential) disk access.

B+ TREE

Internal (Index) Node

Leaf (data) nodes are linked for a rapid sequential file read.
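The linked leaves make a sequential read a simple walk along the chain. The sketch below is illustrative only: the `Leaf` class, the helper names, and the key values are assumptions, and a real tree would locate the starting leaf through the index nodes instead of starting at the first leaf.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None        # link to the next leaf in key order


def link(leaves):
    """Chain the leaves left to right and return the first one."""
    for a, b in zip(leaves, leaves[1:]):
        a.next = b
    return leaves[0]


def scan(first_leaf, lo, hi):
    """Yield every key k with lo <= k <= hi by following the leaf links."""
    leaf = first_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return          # keys are sorted, so nothing further can match
            if k >= lo:
                yield k
        leaf = leaf.next


# Hypothetical leaf level of a small B+ tree.
first = link([Leaf([5, 10, 15, 20]), Leaf([25, 28, 30]),
              Leaf([50, 55, 60, 65]), Leaf([75, 80, 85, 90])])
print(list(scan(first, 20, 55)))    # [20, 25, 28, 30, 50, 55]
```

No index node is visited during the scan, which is what makes full or range reads cheap.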

OPERATIONS IN B+ TREE

○ SEARCH

○ INSERTION

○ DELETION

B+ TREE - SEARCH OPERATION

TWO CASES:

○ Successful Search

○ Unsuccessful Search

SEARCHING

○ Compare the search key with the keys in the tree, following the index nodes down to a leaf, then give the result back.

For example: find the values 45 and 15 in the tree below.
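A B+ tree search can be sketched as follows. The tree built here is hypothetical (the slide's figure is not part of this transcript), the class and function names are made up, and the sketch assumes the convention that keys equal to an index key are found in the subtree to its right.

```python
import bisect


class Leaf:
    def __init__(self, keys):
        self.keys = keys


class Index:
    def __init__(self, keys, children):
        self.keys = keys            # separator keys only, no records
        self.children = children    # len(children) == len(keys) + 1


def find_leaf(node, k):
    """Follow index nodes down to the leaf that would hold k."""
    while isinstance(node, Index):
        node = node.children[bisect.bisect_right(node.keys, k)]
    return node


def search(root, k):
    """True for a successful search, False for an unsuccessful one."""
    return k in find_leaf(root, k).keys


# Hypothetical example tree: one index node over four leaves.
root = Index([25, 50, 75], [Leaf([5, 10, 15, 20]), Leaf([25, 28, 30]),
                            Leaf([50, 55, 60, 65]), Leaf([75, 80, 85, 90])])
print(search(root, 15))   # True  (successful search)
print(search(root, 45))   # False (unsuccessful search)
```

Both outcomes are decided at a leaf: even when the key matches an index key such as 25, the search still continues down to the leaf level.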

B+ TREE- INSERTION OPERATION

Inserting a record when

Case 1: leaf node: not full, index node: not full.

Case 2: leaf node: full, index node: not full.

Case 3: leaf node: full, index node: full.

INSERTION – CASE 1: LEAF NODE: NOT FULL, INDEX NODE: NOT FULL

Add Record with Key 28

INSERTION – CASE 2: LEAF NODE: FULL, INDEX NODE: NOT FULL

Add Record with Key 70

• This record should go in the leaf node containing 50, 55, 60, and 65. Since this leaf is full, it is split into two nodes:

Left Leaf node    Right Leaf node
50 55             60 65 70

AFTER INSERTING A Record With Key 70.

INSERTION – CASE 3: LEAF NODE: FULL, INDEX NODE: FULL

Add a record containing a key value of 95 to the following tree.

This record belongs in the node containing 75, 80, 85, and 90. Since this node is full we split it into two nodes:

Left Leaf node    Right Leaf node
75 80             85 90 95

The middle key, 85, rises to the index node.

But the index node is also full, so we split the index node:

Left Index node    Right Index node    New Index node
25 50              75 85               60

All leaf nodes remain at the same level.
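The two kinds of split in this example can be sketched as below. The helper names are hypothetical, and the index node's contents before the split (25, 50, 60, 75) are an assumption inferred from the keys listed on the slide. The point of the sketch is the difference between the two: a leaf split copies its middle key upward, while an index split moves it upward.

```python
def split_leaf(keys, new_key):
    """Split a full leaf after adding new_key.

    Returns (left_keys, key_copied_up, right_keys); the key that rises
    also stays in the right leaf, because records live only in leaves.
    """
    merged = sorted(keys + [new_key])
    mid = len(merged) // 2
    left, right = merged[:mid], merged[mid:]
    return left, right[0], right


def split_index(keys, new_key):
    """Split a full index node after adding new_key.

    Returns (left_keys, key_moved_up, right_keys); the middle key leaves
    this level entirely.
    """
    merged = sorted(keys + [new_key])
    mid = len(merged) // 2
    return merged[:mid], merged[mid], merged[mid + 1:]


print(split_leaf([75, 80, 85, 90], 95))     # ([75, 80], 85, [85, 90, 95])
print(split_index([25, 50, 60, 75], 85))    # ([25, 50], 60, [75, 85])
```

The outputs reproduce the node contents in the example: leaves 75 80 and 85 90 95, index nodes 25 50 and 75 85, and 60 as the key of the new index node.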

B+ TREE – DELETION OPERATION

Deleting a record from a B+ tree may result in

Case 1: both leaf and index nodes above the fill factor.

Case 2: leaf node below the fill factor, index node above it.

Case 3: both leaf and index nodes below the fill factor.

Case 1: both leaf/index nodes above the fill factor.

Delete 70 from the following B+ Tree

○ The leaf node will still contain 2 records after the deletion. So, simply delete 70 from the leaf node.

Delete 25 from the B+ tree

○ When we delete 25 we must replace it with 28 in the index node.

DELETE 60 FROM THE B+ TREE

○ The leaf node containing 60 will be below the fill factor after the deletion. Thus, we must combine leaf nodes.

○ With recombined nodes, the index node will be reduced by one key. Hence, it will also fall below the fill factor. Thus, we must combine index nodes.

○ 60 appears as the only key in the root index node.
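The three deletion cases can be told apart by counting keys. The sketch below is a simplification under stated assumptions: nodes hold at most 4 keys (as in the examples), a 50% fill factor means at least 2 keys must remain, and an underfull leaf is always combined with a sibling, as the slides describe, so its index node loses one key. The function name is made up.

```python
import math


def deletion_case(leaf_keys_after, index_keys, capacity=4, fill_factor=0.5):
    """Classify a B+ tree deletion as case 1, 2, or 3.

    leaf_keys_after: keys left in the leaf once the record is removed
    index_keys:      keys currently in that leaf's parent index node
    """
    min_keys = math.ceil(capacity * fill_factor)
    if leaf_keys_after >= min_keys:
        return 1                 # leaf still full enough: just delete
    # The leaf is combined with a sibling, so the index node loses a key.
    if index_keys - 1 >= min_keys:
        return 2                 # only the leaf fell below the fill factor
    return 3                     # the index node falls below it as well


print(deletion_case(leaf_keys_after=2, index_keys=3))   # 1
print(deletion_case(leaf_keys_after=1, index_keys=3))   # 2
print(deletion_case(leaf_keys_after=1, index_keys=2))   # 3
```

The first call corresponds to the "delete 70" situation, where the leaf keeps 2 records, and the last to the "delete 60" situation, where both the leaf and its index node must be combined.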

B+ TREES AS FILE INDEXES

○ B+ Trees are descendants of B Trees.

○ Retrieval of records from large files or databases stored in external memory is time consuming.

○ To promote Efficient Retrievals, file indexes are used.

○ An index is a <Key , Address> pair.

○ The records of the file are stored sequentially, and for each block of records, the largest key and the block address are stored in an index.

○ In a B+ Tree, to retrieve a record given its key, the search must traverse down to a leaf node to retrieve its address.

○ The non leaf nodes only serve to help the process traverse downwards towards the appropriate leaf node.
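The <Key, Address> index described above, with one entry per block holding that block's largest key, can be sketched with a binary search. The index contents and the block addresses below are invented for illustration, as is the function name.

```python
import bisect

# One (largest key in block, block address) pair per block, in key order.
index = [(20, "block-0"), (45, "block-1"), (90, "block-2")]


def block_for(index, key):
    """Return the address of the only block that could hold key, or None."""
    largest_keys = [largest for largest, _ in index]
    i = bisect.bisect_left(largest_keys, key)   # first block whose largest key >= key
    return index[i][1] if i < len(index) else None


print(block_for(index, 30))   # block-1
print(block_for(index, 20))   # block-0
print(block_for(index, 95))   # None
```

Only the selected block then has to be read from external memory to check whether the record is actually present.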

ADVANTAGE OF B+ TREE

○ B+ Trees are good for a full scan:
  ○ the leaf nodes form a linked list.
  ○ a B tree would need a complete in-order traversal.

DISADVANTAGE OF B+ TREE

○ Any search will end at a leaf node only.
  ○ The time complexity of every search is O(h), where h is the height of the B+ tree.

○ Waste of memory: keys held in the index nodes appear again in the leaves.

○ In comparison to B+ trees, B trees are efficient for single-key lookups, since a search can stop at an internal node.