CHAPTER 10 Search Structures

All the programs in this file are selected from Ellis Horowitz, Sartaj Sahni, and Susan Anderson-Freed, “Fundamentals of Data Structures in C”, Computer Science Press, 1992.

Upload: julian-cox

Post on 29-Dec-2015




AVL Trees

• Dynamic tables may also be maintained as binary search trees.

• Depending on the order in which the symbols are inserted into the table, the resulting binary search trees differ, and so does the average number of comparisons needed to access a symbol.


Binary Search Tree for The Months of The Year

[Figure: binary search tree built from the input sequence below]

Input Sequence: JAN, FEB, MAR, APR, MAY, JUNE, JULY, AUG, SEPT, OCT, NOV, DEC

Max comparisons: 6

Average comparisons: 3.5


A Balanced Binary Search Tree For The Months of The Year

[Figure: balanced binary search tree built from the input sequence below]

Input Sequence: JULY, FEB, MAY, AUG, DEC, MAR, OCT, APR, JAN, JUNE, SEPT, NOV

Max comparisons: 4

Average comparisons: 3.1


Degenerate Binary Search Tree

[Figure: degenerate binary search tree built from the input sequence below; every node has only a right child]

Input Sequence: APR, AUG, DEC, FEB, JAN, JULY, JUNE, MAR, MAY, NOV, OCT, SEPT

Max comparisons: 12

Average comparisons: 6.5
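The comparison counts quoted for these three trees can be checked mechanically. The C sketch below is an illustration rather than code from the book: `comparison_stats` and its helpers are hypothetical names, month names are ordered with `strcmp` (alphabetical order), and finding a key at level d (root at level 1) is taken to cost d comparisons.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Node of an ordinary (unbalanced) binary search tree keyed by month name. */
typedef struct bst_node {
    const char *key;
    struct bst_node *left, *right;
} bst_node;

/* Standard BST insertion; a key that is already present is ignored. */
static bst_node *bst_insert(bst_node *t, const char *key)
{
    if (t == NULL) {
        bst_node *n = malloc(sizeof *n);
        assert(n != NULL);
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    int c = strcmp(key, t->key);
    if (c < 0)
        t->left = bst_insert(t->left, key);
    else if (c > 0)
        t->right = bst_insert(t->right, key);
    return t;
}

/* A node at level d (root = level 1) needs d comparisons to be found.
   Adds up these costs and records the largest one. */
static void sum_costs(const bst_node *t, int level, int *total, int *max)
{
    if (t == NULL)
        return;
    *total += level;
    if (level > *max)
        *max = level;
    sum_costs(t->left, level + 1, total, max);
    sum_costs(t->right, level + 1, total, max);
}

static void free_tree(bst_node *t)
{
    if (t == NULL)
        return;
    free_tree(t->left);
    free_tree(t->right);
    free(t);
}

/* Inserts keys[0..n-1] in order into an empty BST and reports the maximum
   and average number of comparisons for a successful search. */
void comparison_stats(const char *keys[], int n, int *max, double *avg)
{
    assert(n > 0);
    bst_node *root = NULL;
    for (int i = 0; i < n; i++)
        root = bst_insert(root, keys[i]);
    int total = 0;
    *max = 0;
    sum_costs(root, 1, &total, max);
    *avg = (double) total / n;
    free_tree(root);
}
```

For the first input sequence this yields a maximum of 6 and an average of 42/12 = 3.5; the second sequence yields 4 and 37/12 ≈ 3.1, and the alphabetical third sequence yields 12 and 78/12 = 6.5.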


Minimize The Search Time of Binary Search Tree In Dynamic Situation

• From the above three examples, we know that the average and maximum search time will be minimized if the binary search tree is maintained as a complete binary search tree at all times.

• However, to achieve this in a dynamic situation, we have to pay a high price to restructure the tree to be a complete binary tree all the time.

• In 1962, Adelson-Velskii and Landis introduced a binary tree structure that is balanced with respect to the heights of subtrees. As a result of the balanced nature of this type of tree, dynamic retrievals can be performed in O(log n) time if the tree has n nodes. The resulting tree remains height-balanced. This is called an AVL tree.


AVL Tree

• Definition: An empty tree is height-balanced. If T is a nonempty binary tree with TL and TR as its left and right subtrees respectively, then T is height-balanced iff (1) TL and TR are height-balanced, and (2) $|h_L - h_R| \le 1$, where $h_L$ and $h_R$ are the heights of TL and TR, respectively.

• Definition: The balance factor, BF(T), of a node T in a binary tree is defined to be $h_L - h_R$, where $h_L$ and $h_R$ are the heights of the left and right subtrees of T, respectively. For any node T in an AVL tree, BF(T) = -1, 0, or 1.
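Both definitions translate directly into a recursive check. In this sketch the node type `tnode` and the function names are hypothetical, and heights follow the convention that an empty tree has height 0 and a single node has height 1.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct tnode {
    int key;
    struct tnode *left, *right;
} tnode;

/* Height with the convention: empty tree = 0, single node = 1. */
static int height(const tnode *t)
{
    if (t == NULL)
        return 0;
    int hl = height(t->left), hr = height(t->right);
    return 1 + (hl > hr ? hl : hr);
}

/* BF(T) = hL - hR for a nonempty tree T. */
int balance_factor(const tnode *t)
{
    assert(t != NULL);
    return height(t->left) - height(t->right);
}

/* Returns the height of t if t is height-balanced, or -1 otherwise.
   One post-order pass checks both conditions of the definition. */
static int balanced_height(const tnode *t)
{
    if (t == NULL)
        return 0;                       /* an empty tree is height-balanced */
    int hl = balanced_height(t->left);
    int hr = balanced_height(t->right);
    if (hl < 0 || hr < 0)
        return -1;                      /* (1) TL and TR must be height-balanced */
    if (abs(hl - hr) > 1)
        return -1;                      /* (2) |hL - hR| <= 1 */
    return 1 + (hl > hr ? hl : hr);
}

int is_height_balanced(const tnode *t)
{
    return balanced_height(t) >= 0;
}
```

`balanced_height` checks conditions (1) and (2) in a single post-order pass, so it runs in O(n) time; calling `balance_factor` on every node instead would recompute the same heights repeatedly.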


Balanced Trees Obtained for The Months of The Year

[Figure panels, each tree drawn with its balance factors: (a) Insert MARCH; (b) Insert MAY; (c) Insert NOVEMBER, rebalanced by an RR rotation; (d) Insert AUGUST]


Balanced Trees Obtained for The Months of The Year (Cont.)

[Figure panels, each tree drawn with its balance factors: (e) Insert APRIL, rebalanced by an LL rotation; (f) Insert JANUARY, rebalanced by an LR rotation]


Balanced Trees Obtained for The Months of The Year (Cont.)

[Figure panels, each tree drawn with its balance factors: (g) Insert DECEMBER; (h) Insert JULY]


Balanced Trees Obtained for The Months of The Year (Cont.)

[Figure panel, tree drawn with its balance factors: (i) Insert FEBRUARY, rebalanced by an RL rotation]


Balanced Trees Obtained for The Months of The Year (Cont.)

[Figure panel, tree drawn with its balance factors: (j) Insert JUNE, rebalanced by an LR rotation]


Balanced Trees Obtained for The Months of The Year (Cont.)

[Figure panel, tree drawn with its balance factors: (k) Insert OCTOBER, rebalanced by an RR rotation]


Balanced Trees Obtained for The Months of The Year (Cont.)

[Figure panel, tree drawn with its balance factors: (l) Insert SEPTEMBER]


Rebalancing Rotation of Binary Search Tree

• LL: new node Y is inserted in the left subtree of the left subtree of A, where A is the nearest ancestor of Y whose balance factor becomes ±2

• LR: Y is inserted in the right subtree of the left subtree of A

• RR: Y is inserted in the right subtree of the right subtree of A

• RL: Y is inserted in the left subtree of the right subtree of A.

• If a height-balanced binary tree becomes unbalanced as a result of an insertion, then these are the only four cases possible for rebalancing.


Rebalancing Rotation LL

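[Figure: LL rotation]

Before the insertion: BF(A) = +1 and BF(B) = 0, where B is the left child of A; the subtrees BL, BR (of B) and AR (of A) all have height h, so the subtree rooted at A has height h+2.

After the insertion: the height of BL increases to h+1, so BF(A) = +2 and BF(B) = +1.

After the LL rotation: B becomes the subtree root, with left subtree BL and right child A, whose subtrees are BR and AR; BF(B) = BF(A) = 0, and the subtree height is again h+2.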


Rebalancing Rotation RR

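[Figure: RR rotation]

Before the insertion: BF(A) = -1 and BF(B) = 0, where B is the right child of A; the subtrees AL (of A) and BL, BR (of B) all have height h, so the subtree rooted at A has height h+2.

After the insertion: the height of BR increases to h+1, so BF(A) = -2 and BF(B) = -1.

After the RR rotation: B becomes the subtree root, with left child A, whose subtrees are AL and BL, and right subtree BR; BF(B) = BF(A) = 0, and the subtree height is again h+2.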


Rebalancing Rotation LR(a)

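[Figure: LR(a) rotation]

Before the insertion: BF(A) = +1, and its left child B is a leaf with BF(B) = 0.

After inserting C as the right child of B: BF(A) = +2, BF(B) = -1, and BF(C) = 0.

After the LR rotation: C becomes the subtree root, with left child B and right child A; BF(C) = BF(B) = BF(A) = 0.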


Rebalancing Rotation LR(b)

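[Figure: LR(b) rotation]

Before the insertion: BF(A) = +1, BF(B) = 0, and BF(C) = 0, where B is the left child of A and C is the right child of B; BL and AR have height h, CL and CR have height h-1, and the subtree rooted at A has height h+2.

After inserting into CL: BF(A) = +2, BF(B) = -1, and BF(C) = +1.

After the LR rotation: C becomes the subtree root; its left child B (BF 0) has subtrees BL and CL, and its right child A (BF -1) has subtrees CR and AR. BF(C) = 0, and the subtree height is again h+2.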


Rebalancing Rotation LR(c)

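[Figure: LR(c) rotation]

Starting from the same tree as in LR(b), inserting into CR gives BF(A) = +2, BF(B) = -1, and BF(C) = -1.

After the LR rotation: C becomes the subtree root; its left child B (BF +1) has subtrees BL and CL, and its right child A (BF 0) has subtrees CR and AR. BF(C) = 0, and the subtree height is again h+2.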
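The four rebalancing cases can be sketched in C. This is not the book's insertion program: to keep the sketch short it stores each subtree's height (a hypothetical `height` field in a hypothetical `avl_node` type) and derives BF from it instead of maintaining balance factors directly, and it rebalances recursively on the way back up. The `rot` out-parameter is illustrative instrumentation only. Because a rotation restores the subtree to its height before the insertion, at most one rotation happens per insertion.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct avl_node {
    const char *key;
    int height;                      /* height of this subtree; a leaf has height 1 */
    struct avl_node *left, *right;
} avl_node;

static int height(const avl_node *t) { return t ? t->height : 0; }

/* BF = hL - hR, as in the definition of an AVL tree. */
static int bf(const avl_node *t) { return height(t->left) - height(t->right); }

static void update_height(avl_node *t)
{
    int hl = height(t->left), hr = height(t->right);
    t->height = 1 + (hl > hr ? hl : hr);
}

/* Right rotation at A (the LL case): A's left child B becomes the subtree root. */
static avl_node *rotate_right(avl_node *a)
{
    avl_node *b = a->left;
    a->left = b->right;              /* BR moves under A */
    b->right = a;
    update_height(a);
    update_height(b);
    return b;
}

/* Left rotation at A (the RR case): mirror image of rotate_right. */
static avl_node *rotate_left(avl_node *a)
{
    avl_node *b = a->right;
    a->right = b->left;              /* BL moves under A */
    b->left = a;
    update_height(a);
    update_height(b);
    return b;
}

/* Inserts key into t and returns the new subtree root. If a rebalancing
   rotation is performed, *rot is set to "LL", "RR", "LR" or "RL". */
avl_node *avl_insert(avl_node *t, const char *key, const char **rot)
{
    if (t == NULL) {
        avl_node *n = malloc(sizeof *n);
        assert(n != NULL);
        n->key = key;
        n->height = 1;
        n->left = n->right = NULL;
        return n;
    }
    int c = strcmp(key, t->key);
    if (c == 0)
        return t;                    /* keys are distinct: nothing to insert */
    if (c < 0)
        t->left = avl_insert(t->left, key, rot);
    else
        t->right = avl_insert(t->right, key, rot);
    update_height(t);

    if (bf(t) == 2) {                /* t plays the role of A; its left child is B */
        if (bf(t->left) > 0) {
            *rot = "LL";
        } else {                     /* LR: left rotation at B, then right rotation at A */
            *rot = "LR";
            t->left = rotate_left(t->left);
        }
        return rotate_right(t);
    }
    if (bf(t) == -2) {               /* t plays the role of A; its right child is B */
        if (bf(t->right) < 0) {
            *rot = "RR";
        } else {                     /* RL: right rotation at B, then left rotation at A */
            *rot = "RL";
            t->right = rotate_right(t->right);
        }
        return rotate_left(t);
    }
    return t;
}
```

Inserting MAR, MAY, NOV, AUG, APR, JAN, DEC, JULY, FEB, JUNE, OCT, SEPT in that order should report RR, LL, LR, RL, LR, and RR at the NOVEMBER, APRIL, JANUARY, FEBRUARY, JUNE, and OCTOBER steps, matching the rotations named in panels (c), (e), (f), (i), (j), and (k), and should leave JAN at the root.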


AVL Trees (Cont.)

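• Once rebalancing has been carried out on the subtree in question, the rest of the tree need not be examined: the rotation restores that subtree to the height it had before the insertion, so the balance factors of its ancestors are unchanged.

• Insertion into an ordinary binary search tree with n nodes takes O(n) time in the worst case, whereas insertion into an AVL tree takes O(log n) time.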


AVL Insertion Complexity

• Let $N_h$ be the minimum number of nodes in a height-balanced tree of height h. In the worst case, the height of one of the subtrees will be h-1 and that of the other h-2, and both subtrees must also be height-balanced. Hence $N_h = N_{h-1} + N_{h-2} + 1$, with $N_0 = 0$, $N_1 = 1$, and $N_2 = 2$.

• This recurrence resembles the definition of the Fibonacci numbers, $F_n = F_{n-1} + F_{n-2}$, $F_0 = 0$, $F_1 = 1$.

• It can be shown that $N_h = F_{h+2} - 1$. Since $F_h \approx \phi^h/\sqrt{5}$ with $\phi = (1+\sqrt{5})/2$, we get $N_h \approx \phi^{h+2}/\sqrt{5} - 1$, so a height-balanced tree with n nodes has height at most about $\log_\phi(\sqrt{5}(n+1)) - 2$, i.e. $h = O(\log n)$. Therefore the worst-case insertion time for a height-balanced tree with n nodes is O(log n).
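The recurrence and the closed form can be checked numerically. This small C sketch (function names are hypothetical) computes $N_h$ from the recurrence, compares it with $F_{h+2} - 1$, and inverts it to find the largest height possible for a given number of nodes.

```c
#include <assert.h>

/* N_h: minimum number of nodes in a height-balanced tree of height h,
   from N_0 = 0, N_1 = 1, N_h = N_{h-1} + N_{h-2} + 1. */
long min_avl_nodes(int h)
{
    assert(h >= 0);
    if (h == 0)
        return 0;
    long prev = 0, cur = 1;              /* N_0, N_1 */
    for (int i = 2; i <= h; i++) {
        long next = cur + prev + 1;
        prev = cur;
        cur = next;
    }
    return cur;
}

/* Fibonacci numbers with F_0 = 0, F_1 = 1. */
long fib(int n)
{
    assert(n >= 0);
    long a = 0, b = 1;                   /* F_0, F_1 */
    for (int i = 0; i < n; i++) {
        long t = a + b;
        a = b;
        b = t;
    }
    return a;
}

/* Largest height a height-balanced tree with n nodes can have:
   the largest h with N_h <= n. */
int max_avl_height(long n)
{
    int h = 0;
    while (min_avl_nodes(h + 1) <= n)
        h++;
    return h;
}
```

For example, `max_avl_height(12)` returns 5, because $N_5 = 12 \le 12 < N_6 = 20$: a height-balanced tree holding the twelve months has at most five levels, compared with twelve for the degenerate binary search tree.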


Probability of Each Type of Rebalancing Rotation

• Research has shown that a random insertion requires no rebalancing, a rebalancing rotation of type LL or RR, or a rebalancing rotation of type LR or RL, with probabilities 0.5349, 0.2327, and 0.2324, respectively.


Comparison of Various Structures

| Operation | Sequential list | Linked list | AVL tree |
|---|---|---|---|
| Search for x | O(log n) | O(n) | O(log n) |
| Search for kth item | O(1) | O(k) | O(log n) |
| Delete x | O(n) | O(1)¹ | O(log n) |
| Delete kth item | O(n - k) | O(k) | O(log n) |
| Insert x | O(n) | O(1)² | O(log n) |
| Output in order | O(n) | O(n) | O(n) |

1. Doubly linked list and position of x known.

2. Position for insertion known.


2-3 Trees

• If search trees of degree greater than 2 are used, the insertion and deletion algorithms are simpler than those for AVL trees, and their complexity is still O(log n).

• Definition: A 2-3 tree is a search tree that either is empty or satisfies the following properties:

(1) Each internal node is a 2-node or a 3-node. A 2-node has one element; a 3-node has two elements.

(2) Let LeftChild and MiddleChild denote the children of a 2-node. Let dataL be the element in this node, and let dataL.key be its key. All elements in the 2-3 subtree with root LeftChild have key less than dataL.key, whereas all elements in the 2-3 subtree with root MiddleChild have key greater than dataL.key.

(3) Let LeftChild, MiddleChild, and RightChild denote the children of a 3-node. Let dataL and dataR be the two elements in this node. Then, dataL.key < dataR.key; all keys in the 2-3 subtree with root LeftChild are less than dataL.key; all keys in the 2-3 subtree with root MiddleChild are less than dataR.key and greater than dataL.key; and all keys in the 2-3 subtree with root RightChild are greater than dataR.key.

(4) All external nodes are at the same level.


2-3 Tree Example

[Figure: a 2-3 tree whose root A = [40] has two children, B = [10 20] and C = [80]]


The Height of A 2-3 Tree

• As with leftist trees, external nodes are introduced only to make it easier to define and talk about 2-3 trees. External nodes are not physically represented inside a computer.

• The number of elements in a 2-3 tree of height h is between $2^h - 1$ (every node a 2-node: $1 + 2 + \dots + 2^{h-1}$ elements) and $3^h - 1$ (every node a 3-node: $(3^h - 1)/2$ nodes with two elements each). Hence, the height of a 2-3 tree with n elements is between $\log_3(n+1)$ and $\log_2(n+1)$.


2-3 Tree Data Structure

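```c
/* The book leaves element abstract; for illustration it is assumed here
   to hold an integer key. */
typedef struct {
    int key;
} element;

typedef struct two_three *two_three_ptr;

struct two_three {
    element data_l, data_r;       /* data_r is unused in a 2-node */
    two_three_ptr left_child,     /* keys less than data_l.key */
                  middle_child,   /* keys greater than data_l.key (and less than data_r.key in a 3-node) */
                  right_child;    /* keys greater than data_r.key; unused in a 2-node */
};
```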


Searching A 2-3 Tree

• The search algorithm for binary search trees can easily be extended to obtain the search function of a 2-3 tree, search23().

• The search function calls a function compare that compares a key x with the keys in a given node p. It returns the value 1, 2, 3, or 4, depending on whether x is less than the first key, between the first key and the second key, greater than the second key, or equal to one of the keys in node p.

Program 10.4: Function to search a 2-3 tree
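The code of Program 10.4 is not reproduced in this transcript. The following is a sketch of such a search, not the book's listing. It assumes that element holds an int key and that a 2-node marks its unused data_r with the sentinel key INT_MAX (a convention chosen for this sketch), so all real keys must be smaller than INT_MAX.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumptions of this sketch: an element is just an integer key, and a
   2-node stores INT_MAX in data_r.key (real keys must be < INT_MAX). */
typedef struct {
    int key;
} element;

typedef struct two_three *two_three_ptr;
struct two_three {
    element data_l, data_r;
    two_three_ptr left_child, middle_child, right_child;
};

/* Returns 1 if x < data_l.key, 2 if data_l.key < x < data_r.key,
   3 if x > data_r.key, and 4 if x equals one of the keys in p. */
static int compare(int x, const struct two_three *p)
{
    assert(p != NULL);
    if (x == p->data_l.key || x == p->data_r.key)
        return 4;
    if (x < p->data_l.key)
        return 1;
    if (x < p->data_r.key)
        return 2;
    return 3;
}

/* Returns the node containing key x, or NULL if x is not in the 2-3 tree t. */
two_three_ptr search23(two_three_ptr t, int x)
{
    while (t != NULL) {
        switch (compare(x, t)) {
        case 1: t = t->left_child;   break;
        case 2: t = t->middle_child; break;
        case 3: t = t->right_child;  break;
        case 4: return t;
        }
    }
    return NULL;
}
```

On the 2-3 tree example (A = [40] with children [10 20] and [80]), searching for 20 returns the leaf [10 20], while searching for 70 moves to the middle child [80] and then falls off the tree, returning NULL.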


Insertion Into A 2-3 Tree

• First, we use the search function to search the 2-3 tree for the key that is to be inserted.

• If the key being searched for is already in the tree, then the insertion fails, as all keys in a 2-3 tree are distinct. Otherwise, the search ends at a unique leaf node U, which may be in one of two states:
– U has only one element: the new element x can simply be inserted into U.
– U already contains two elements: a new node is created. The newly created node receives the element with the largest key among the two elements initially in U and the element x. The element with the smallest key stays in the original node U, and the element with the median key, together with a pointer to the newly created node, is inserted into the parent of U.


Insertion Into A 2-3 Tree Example

(a) 70 inserted: A = [40] with children B = [10 20] and C = [70 80]

(b) 30 inserted: A = [20 40] with children B = [10], D = [30], and C = [70 80]


Insertion of 60 Into Figure 10.15(b)

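New root G = [40] with children A = [20] and F = [70]; A has children B = [10] and D = [30], and F has children C = [60] and E = [80].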


Node Split

• From the above examples, we find that each time an attempt is made to add an element into a 3-node p, a new node q is created. This is referred to as a node split.

Program 10.5: Insertion into a 2-3 tree (P.501)
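Program 10.5 is likewise not reproduced here. The sketch below implements the insertion described above, including node splits that propagate toward the root, but it uses a simplified, hypothetical node layout (`node23`, with a key count and arrays of keys and children) instead of the data_l/data_r structure, and it descends recursively rather than saving the search path explicitly.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified node for this sketch (not the book's data_l/data_r layout):
   n = 1 for a 2-node, n = 2 for a 3-node; all children NULL in a leaf. */
typedef struct node23 {
    int n;
    int key[2];
    struct node23 *child[3];
} node23;

static node23 *make_node(int key, node23 *left, node23 *middle)
{
    node23 *p = malloc(sizeof *p);
    assert(p != NULL);
    p->n = 1;
    p->key[0] = key;
    p->child[0] = left; p->child[1] = middle; p->child[2] = NULL;
    return p;
}

/* Adds key (with right as the subtree just to its right) to a 2-node p. */
static void add_to_2node(node23 *p, int key, node23 *right)
{
    if (key < p->key[0]) {
        p->key[1] = p->key[0];
        p->child[2] = p->child[1];
        p->key[0] = key;
        p->child[1] = right;
    } else {
        p->key[1] = key;
        p->child[2] = right;
    }
    p->n = 2;
}

/* Node split: p is a 3-node receiving one more key. The smallest key stays
   in p, the largest goes to a new node q, and the median goes to *median. */
static node23 *split(node23 *p, int key, node23 *right, int *median)
{
    int k[3];
    node23 *c[4];
    if (key < p->key[0]) {
        k[0] = key; k[1] = p->key[0]; k[2] = p->key[1];
        c[0] = p->child[0]; c[1] = right; c[2] = p->child[1]; c[3] = p->child[2];
    } else if (key < p->key[1]) {
        k[0] = p->key[0]; k[1] = key; k[2] = p->key[1];
        c[0] = p->child[0]; c[1] = p->child[1]; c[2] = right; c[3] = p->child[2];
    } else {
        k[0] = p->key[0]; k[1] = p->key[1]; k[2] = key;
        c[0] = p->child[0]; c[1] = p->child[1]; c[2] = p->child[2]; c[3] = right;
    }
    p->n = 1;
    p->key[0] = k[0];
    p->child[0] = c[0]; p->child[1] = c[1]; p->child[2] = NULL;
    *median = k[1];
    return make_node(k[2], c[2], c[3]);
}

/* Inserts x below p. Returns 0 if x is already present. If p itself split,
   *up and *q receive the median key and the new node for p's parent. */
static int ins(node23 *p, int x, int *up, node23 **q)
{
    *q = NULL;
    if (x == p->key[0] || (p->n == 2 && x == p->key[1]))
        return 0;                            /* keys are distinct */
    int key = x;
    node23 *right = NULL;
    if (p->child[0] != NULL) {               /* internal node: descend */
        int i = x < p->key[0] ? 0 : (p->n == 1 || x < p->key[1]) ? 1 : 2;
        if (!ins(p->child[i], x, &key, &right))
            return 0;
        if (right == NULL)
            return 1;                        /* no split below: done */
    }
    if (p->n == 1)
        add_to_2node(p, key, right);
    else
        *q = split(p, key, right, up);
    return 1;
}

node23 *insert23(node23 *root, int x)
{
    int up;
    node23 *q;
    if (root == NULL)
        return make_node(x, NULL, NULL);
    if (ins(root, x, &up, &q) && q != NULL)
        return make_node(up, root, q);       /* root split: tree grows by one level */
    return root;
}
```

Inserting 10, 40, 80, 20 produces the tree of the 2-3 tree example (root 40 with leaves [10 20] and [80]); inserting 70, 30, and 60 afterwards reproduces the trees shown for those insertions, with 40 ending up in a new root.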


Deletion From a 2-3 Tree

• If the element to be deleted is not in a leaf node, the deletion can be transformed into a deletion from a leaf node: the deleted element is replaced by either the element with the largest key in the subtree to its left or the element with the smallest key in the subtree to its right, both of which are in leaf nodes.

• We can therefore focus on deletion from a leaf node.


Deletion From A 2-3 Tree Example

(a) Initial 2-3 tree: A = [50 80] with children B = [10 20], D = [60 70], and C = [90 95]

(b) 70 deleted: D becomes [60]

(c) 90 deleted: C becomes [95]


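Deletion From A 2-3 Tree Example (Cont.)

(d) 60 deleted (rotation): A = [20 80] with children B = [10], D = [50], and C = [95]

(e) 95 deleted (combine): A = [20] with children B = [10] and C = [50 80]

(f) 50 deleted: A = [20] with children B = [10] and C = [80]

(g) 10 deleted (combine): the tree shrinks to the single node B = [20 80]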


Rotation and Combine

• As shown in the example, deletion may invoke a rotation or a combine operation.

• For a rotation, there are three cases:
– the leaf node p is the left child of its parent r;
– the leaf node p is the middle child of its parent r;
– the leaf node p is the right child of its parent r.


Three Rotation Cases

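[Figure: before-and-after diagrams of the rotation for (a) p is the left child of r, (b) p is the middle child of r, and (c) p is the right child of r. In each case p has become empty and its nearest sibling q is a 3-node: one element of q moves up into r, the element of r that separates p and q moves down into p, and the subtree of q adjacent to p moves over to p.]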


Steps in Deletion From a Leaf Of a 2-3 Tree

• Step 1: Modify node p as necessary to reflect its status after the desired element has been deleted.

• Step 2:

    while (p has zero elements && p is not the root) {
        let r be the parent of p;
        let q be the left or right sibling of p (as appropriate);
        if (q is a 3-node)
            rotate;
        else
            combine;
        p = r;
    }

• Step 3: If p has zero elements, then p must be the root. The left child of p becomes the new root, and node p is deleted.


Combine When p is the Left Child of r

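[Figure: two examples, (a) and (b), of combining an empty left child p with its 2-node sibling q. In (a), r = [x z] and q = [y]: the element x of r moves down, p becomes [x y] and takes over the subtrees a, b, and c, q is deleted, and r becomes [z].]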


M-Way Search Tree

Definition: An m-way search tree either is empty or satisfies the following properties:

(1) The root has at most m subtrees and has the following structure:

n, A0, (K1, A1), (K2, A2), …, (Kn, An)

where the Ai, 0 ≤ i ≤ n < m, are pointers to subtrees, and the Ki, 1 ≤ i ≤ n < m, are key values.
(2) Ki < Ki+1, 1 ≤ i < n.
(3) All key values in the subtree Ai are less than Ki+1 and greater than Ki, 0 < i < n.
(4) All key values in the subtree An are greater than Kn, and those in A0 are less than K1.
(5) The subtrees Ai, 0 ≤ i ≤ n, are also m-way search trees.


Searching an m-Way Search Tree

• Suppose we want to search an m-way search tree T for the key value x, and assume T resides on a disk. By searching the keys of the root, we determine i such that Ki ≤ x < Ki+1, taking K0 = -∞ and Kn+1 = +∞.
– If x = Ki, the search is complete.
– If x ≠ Ki, then x, if it is in T at all, must be in the subtree Ai.
– We then retrieve the root of the subtree Ai and continue the search until we find x or determine that x is not in T.
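The search step can be sketched in C as follows. The node layout (`mnode`, with key[1..n] and child[0..n] mirroring K1..Kn and A0..An) and the fixed order M = 5 are assumptions of this sketch, and reading a node from disk is not modeled. Binary search within the node finds the i with Ki ≤ x < Ki+1, treating K0 as -∞.

```c
#include <assert.h>
#include <stddef.h>

#define M 5                       /* hypothetical order for this sketch */

typedef struct mnode {
    int n;                        /* number of keys, 1 <= n < M */
    int key[M];                   /* key[1..n] hold K1..Kn; key[0] is unused */
    struct mnode *child[M];       /* child[0..n] hold A0..An */
} mnode;

/* Returns the largest i in 0..n with Ki <= x, where K0 counts as -infinity. */
static int find_slot(const mnode *p, int x)
{
    int lo = 0, hi = p->n;
    while (lo < hi) {
        int mid = (lo + hi + 1) / 2;      /* mid >= 1, so key[0] is never read */
        if (p->key[mid] <= x)
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}

/* Searches the m-way tree t for x. Returns the node containing x and stores
   the index i with Ki = x in *pos, or returns NULL if x is not in t. */
const mnode *mway_search(const mnode *t, int x, int *pos)
{
    while (t != NULL) {
        assert(t->n >= 1 && t->n < M);
        int i = find_slot(t, x);
        if (i > 0 && t->key[i] == x) {
            *pos = i;
            return t;
        }
        t = t->child[i];                  /* Ki < x < Ki+1: continue in Ai */
    }
    return NULL;
}
```

With binary search inside each node, the work per node is O(log m) comparisons; on a disk-resident tree the number of nodes retrieved, one per level, usually dominates the cost.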


Searching an m-Way Search Tree

• The maximum number of nodes in a tree of degree m and height h is $\sum_{i=0}^{h-1} m^i = (m^h - 1)/(m - 1)$.

• Since each node holds at most m - 1 keys, an m-way search tree of height h has at most $m^h - 1$ keys.

• To achieve performance close to that of the best m-way search trees for a given number of keys n, the search tree must be balanced.


B-Tree

Definition: A B-tree of order m is an m-way search tree that either is empty or satisfies the following properties:

(1) The root node has at least two children.

(2) All nodes other than the root node and failure nodes have at least $\lceil m/2 \rceil$ children.

(3) All failure nodes are at the same level.


B-Tree (Cont.)

• Note that a 2-3 tree is a B-tree of order 3 and a 2-3-4 tree is a B-tree of order 4.

• Also, all B-trees of order 2 are full binary trees.

• A B-tree of order m and height l has at most $m^l - 1$ keys.

• For a B-tree of order m and height l, the minimum number of keys N in such a tree satisfies $N \ge 2\lceil m/2 \rceil^{l-1} - 1$, $l \ge 1$, since the number of failure nodes, N + 1, is at least $2\lceil m/2 \rceil^{l-1}$.

• If there are N key values in a B-tree of order m, then all nonfailure nodes are at levels less than or equal to l, where $l \le \log_{\lceil m/2 \rceil}\{(N+1)/2\} + 1$. The maximum number of accesses that have to be made for a search is l.

• For example, a B-tree of order m = 200 holding an index with $N \le 2 \times 10^6 - 2$ entries has $l \le 3$.
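The height bound can be evaluated directly from the minimum-key formula. In this sketch, `btree_min_keys` and `btree_max_height` are hypothetical names; m is restricted to m ≥ 3 because for m = 2 the bound $2\lceil m/2 \rceil^{l-1} - 1$ equals 1 for every l and places no limit on the height.

```c
#include <assert.h>

/* Minimum number of keys in a B-tree of order m and height l >= 1:
   2 * ceil(m/2)^(l-1) - 1. */
long long btree_min_keys(int m, int l)
{
    assert(m >= 3 && l >= 1);
    long long half = (m + 1) / 2;        /* ceil(m/2) */
    long long p = 1;
    for (int i = 1; i < l; i++)
        p *= half;
    return 2 * p - 1;
}

/* Largest height l that a B-tree of order m holding n keys can have. */
int btree_max_height(int m, long long n)
{
    int l = 0;
    while (btree_min_keys(m, l + 1) <= n)
        l++;
    return l;
}
```

`btree_max_height(200, 2000000 - 2)` returns 3, matching the example, while $N = 2 \times 10^6 - 1$ is the smallest number of keys that allows height 4.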


The Choice of m

• B-trees of high order are desirable since they result in a reduction in the number of disk accesses.

• If the index has N entries, then a B-tree of order m = N+1 has only one level. But this is not practical, since all N entries cannot fit in internal memory.

• In selecting a reasonable value for m, we need to keep in mind that we are really interested in minimizing the total amount of time needed to search the B-tree for a value x. This time has two components:
(1) the time for reading in the node from the disk, and
(2) the time needed to search this node for x.


The Choice of m (Cont.)

• Assume a node of a B-tree of order m is of a fixed size and is large enough to accommodate n, A0, and m - 1 triples (Ki, Ai, Bi), 1 ≤ i < m.

• If the Ki are at most α characters long and the Ai and Bi are β characters long each, then the size of a node is about m(α + 2β). The time to access a node is then

$$t_s + t_l + m(\alpha + 2\beta)\,t_c = a + bm$$

where $a = t_s + t_l$ = seek time + latency time, $b = (\alpha + 2\beta)\,t_c$, and $t_c$ is the transmission time per character.

• If binary search is used to search each node of the B-tree, then the internal processing time per node is $c\log_2 m + d$ for some constants c and d.

• The total processing time per node is $\tau = a + bm + c\log_2 m + d$.

• The maximum search time is

$$f \cdot \log_2\{(N+1)/2\} \cdot \left\{\frac{a + d}{\log_2 m} + \frac{bm}{\log_2 m} + c\right\}$$

where f is some constant.


Figure 10.36: Values of (35+0.06m)/log2 m

| m | Search time (sec) |
|---|---|
| 2 | 35.12 |
| 4 | 17.62 |
| 8 | 11.83 |
| 16 | 8.99 |
| 32 | 7.38 |
| 64 | 6.47 |
| 128 | 6.10 |
| 256 | 6.30 |
| 512 | 7.30 |
| 1024 | 9.64 |
| 2048 | 14.35 |
| 4096 | 23.40 |
| 8192 | 40.50 |
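The entries of Figure 10.36 can be reproduced with a few lines of C. To avoid depending on the math library, this sketch (hypothetical function names) evaluates the expression only at powers of two, m = 2^k, where log2 m is exactly k.

```c
#include <assert.h>

/* (35 + 0.06 m) / log2 m for m = 2^k, the quantity tabulated in Figure 10.36.
   Restricting m to powers of two makes log2 m exactly k. */
double search_cost_pow2(int k)
{
    assert(k >= 1 && k <= 30);
    double m = (double) (1L << k);
    return (35.0 + 0.06 * m) / k;
}

/* Returns the exponent k in 1..kmax that minimizes search_cost_pow2(k). */
int best_exponent(int kmax)
{
    assert(kmax >= 1);
    int best = 1;
    for (int k = 2; k <= kmax; k++)
        if (search_cost_pow2(k) < search_cost_pow2(best))
            best = k;
    return best;
}
```

`best_exponent(13)` returns 7: among the tabulated values, m = 128 gives the smallest maximum search time (about 6.10), with m = 64 and m = 256 within roughly 6% of it, which shows how flat the curve is near its minimum.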


Figure 10.37: Plot of (35+0.06m)/log2 m

[Plot: total maximum search time (vertical axis) against m (horizontal axis)]


Insertion into a B-Tree

• Instead of the top-down insertion used for 2-3-4 trees, we generalize the two-pass insertion algorithm for 2-3 trees. Top-down insertion splits many nodes, and each time a node changes it has to be written back to disk, which increases the number of disk accesses.

• The insertion algorithm for B-trees of order m first performs a search to determine the leaf node p into which the new key is to be inserted.
– If inserting the new key into p results in p having m keys, node p is split.
– Otherwise, the new p is written to the disk, and the insertion is complete.

• Assume that the h nodes read in during the top-down pass can be saved in memory so that they need not be retrieved from disk again during the bottom-up pass. Then the number of disk accesses for an insertion is at most h (downward pass) + 2(h-1) (nonroot splits) + 3 (root split) = 3h+1.

• The average number of disk accesses is approximately h+1 for large m.
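The split of an overflowing node can be sketched on its keys alone. In this sketch, `split_keys` is a hypothetical helper, and child pointers are omitted; they would be divided in the same way, with A0 through A⌈m/2⌉-1 going to the left node and the remaining ones to the new right node. It assumes the usual rule that the left node keeps ⌈m/2⌉ - 1 keys, the key K⌈m/2⌉ moves up into the parent, and the new node receives the remaining m - ⌈m/2⌉ keys, so both halves meet the minimum occupancy.

```c
#include <assert.h>

/* Splits the m keys key[0..m-1] of an overflowing node of a B-tree of order m.
   With c = ceil(m/2): key[0..c-2] stay in the left node (c - 1 keys),
   key[c-1] moves up to the parent, and key[c..m-1] go to the new right node
   (m - c keys). Returns the key that moves up. */
int split_keys(const int key[], int m, int left[], int *nleft,
               int right[], int *nright)
{
    assert(m >= 3);
    int c = (m + 1) / 2;            /* ceil(m/2) */
    *nleft = 0;
    for (int i = 0; i < c - 1; i++)
        left[(*nleft)++] = key[i];
    *nright = 0;
    for (int i = c; i < m; i++)
        right[(*nright)++] = key[i];
    return key[c - 1];
}
```

For order 3, splitting the keys 10, 20, 30 keeps [10], moves 20 up, and creates [30], which is the split behind Figure 10.38(b).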


Figure 10.38: B-Trees of Order 3

(a) [10 30]; p = 1, s = 0

(b) root [20] with children [10] and [25 30]; p = 3, s = 1

(c) root [20 28] with children [10], [25], and [30]; p = 4, s = 2

p is the number of nonfailure nodes in the final B-tree with N entries.

s is the number of splits.


Deletion from a B-Tree

• The deletion algorithm for B-trees is also a generalization of the deletion algorithm for 2-3 trees.

• First, we search for the key x to be deleted.
– If x is found in a node z that is not a leaf, then the position occupied by x in z is filled by a key from a leaf node of the B-tree.
– Suppose that x is the ith key in z (x = Ki). Then x may be replaced by either the smallest key in the subtree Ai or the largest key in the subtree Ai-1. Since both of these keys are in leaf nodes, the deletion of x from a nonleaf node is transformed into a deletion from a leaf.


Deletion from a B-Tree (Cont.)

• There are four possible cases when deleting from a leaf node p.
– In the first case, p is also the root. If the root is left with at least one key, the changed root is written back to disk. Otherwise, the B-tree is empty following the deletion.
– In the second case, following the deletion, p has at least $\lceil m/2 \rceil - 1$ keys. The modified leaf is written back to disk.
– In the third case, p has $\lceil m/2 \rceil - 2$ keys, and its nearest sibling, q, has at least $\lceil m/2 \rceil$ keys. Only one of p's nearest siblings is checked. p is deficient, as it has one less than the minimum number of keys required, while q has more keys than the minimum required. As in the case of a 2-3 tree, a rotation is performed: the number of keys in q decreases by one, and the number in p increases by one.
– In the fourth case, p has $\lceil m/2 \rceil - 2$ keys, and q has $\lceil m/2 \rceil - 1$ keys. p is deficient, and q has the minimum number of keys permissible for a nonroot node. Nodes p and q and the key Ki in the parent that lies between them are combined to form a single node.
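The choice among the four leaf-deletion cases depends only on key counts, which a short C sketch can make explicit. The enum and function names are hypothetical, and the rotation and combine operations themselves are not implemented here.

```c
#include <assert.h>

enum del_case { DEL_ROOT = 1, DEL_WRITE_BACK, DEL_ROTATE, DEL_COMBINE };

/* Decides which of the four leaf-deletion cases applies in a B-tree of order m.
   keys_left: keys remaining in leaf p after the deletion.
   sib_keys:  keys in p's nearest sibling q (ignored when p is the root). */
enum del_case leaf_delete_case(int m, int p_is_root, int keys_left, int sib_keys)
{
    assert(m >= 3);
    int min = (m + 1) / 2 - 1;           /* ceil(m/2) - 1: minimum for a nonroot node */
    if (p_is_root)
        return DEL_ROOT;                 /* case 1: write back root, or tree is empty */
    if (keys_left >= min)
        return DEL_WRITE_BACK;           /* case 2: p is still full enough */
    assert(keys_left == min - 1);        /* p lost exactly one key */
    return sib_keys > min ? DEL_ROTATE   /* case 3: q can spare a key */
                          : DEL_COMBINE; /* case 4: q has exactly the minimum */
}
```

For order 5 the minimum for a nonroot node is 2 keys, so a leaf left with 1 key is refilled by a rotation when its sibling has 3 or more keys and is combined with the sibling when the sibling has exactly 2.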


Figure 10.39: B-Tree of Order 5

Root: n = 2, keys 20, 35

Children: (n = 2: 10, 15), (n = 2: 25, 30), (n = 3: 40, 45, 50)