File Structure SNU-OOPSLA Lab 1
Chap 9. Multilevel Indexing and B-Trees
Seoul National University, School of Computer Science and Engineering, Object-Oriented Systems Laboratory (SNU-OOPSLA-LAB)
Professor Hyoung-Joo Kim
File Structures by Folk, Zoellick, and Riccardi
Chapter Objectives(1)
- Place the development of B-trees in the historical context of the problems they were designed to solve
- Look briefly at other tree structures that might be used on secondary storage, such as paged AVL trees
- Introduce multirecord and multilevel indexes and evaluate the speed of the search operation
- Provide an understanding of the important properties possessed by B-trees, and show how these properties are especially well suited to secondary storage applications
Chapter Objectives(2)
- Present the object-oriented design of B-trees: define the classes BTreeNode and BTree
- Explain the implementation of the fundamental operations on B-trees
- Introduce the notion of page buffering and virtual B-trees
- Describe variations of the fundamental B-tree algorithms, such as those used to build B* trees and B-trees with variable-length records
Contents(1)
9.1 Introduction
9.2 Statement of the Problem
9.3 Indexing with Binary Search Trees
: AVL Trees, Paged Binary Trees, Problems with Paged Trees
9.4 Multilevel Indexing
9.5 B-Trees
9.6 Example of Creating a B-Tree
9.7 An Object-Oriented Representation of B-Trees
: Class BTreeNode , Class BTree
Contents(2)
9.8 B-Tree Methods Search, Insert, and Others
9.9 B-Tree Nomenclature
9.10 Formal Definition of B-Tree Properties
9.11 Worst-case Search Depth
9.12 Deletion, Merging, and Redistribution
9.13 Redistribution During Insertion
9.14 B* Trees
9.15 Buffering of Pages : Virtual B-Trees
9.16 Variable-Length Records and Keys
Introduction: The Invention of the B-tree
- 1972, Acta Informatica: R. Bayer and E. McCreight (at Boeing Corporation), "Organization and Maintenance of Large Ordered Indexes"
- 1979: the de facto standard for database indexes; D. Comer, "The Ubiquitous B-Tree", ACM Computing Surveys
- Why the name B-tree? Balanced, Bushy, Broad, Boeing, Bayer
- Retrieval, insertion, and deletion time are proportional to $\log_K I$ (I: number of index entries in the file, K: number of index entries in a page)
- Excellent for dynamically changing random-access files
9.1 Introduction : Invention of the B-Tree
Statement of the Problem
Problems with an index on secondary storage:
- Searching the index must be faster than binary searching. In binary search, 15 items take 4 seeks and 1,000 items take 9.5 seeks
- Insertion and deletion must be as fast as search: in some file structures, inserting a key may involve moving many other keys
9.2 Statement of the Problem
Binary Search Tree(1)
Advantages: the data need not be physically sorted; good performance on a balanced tree; insertion cost = search cost
Disadvantages: in an out-of-balance binary tree, more seeks are required
9.3 Indexing with Binary Search Trees
Binary Search Tree(2)
Sorted list of keys:
AX, CL, DE, FB, FT, HN, JD, KF, NR, PA, RF, SD, TK, WS, YJ
Binary search tree representation:
                KF
        FB              SD
    CL      HN      PA      WS
  AX  DE  FT  JD  NR  RF  TK  YJ
At most 4 seeks to find any one record
9.3 Indexing with Binary Search Trees
Internal Representation of Binary Tree
9.3 Indexing with Binary Search Trees
Each node is a fixed-length record addressed by its RRN (or linked by pointers); "-" marks a missing child.
ROOT = 9
RRN | key | left | right
0 | FB | 10 | 8
1 | JD | - | -
2 | RF | - | -
3 | SD | 6 | 13
4 | AX | - | -
5 | YJ | - | -
6 | PA | 11 | 2
7 | FT | - | -
8 | HN | 7 | 1
9 | KF | 0 | 3
10 | CL | 4 | 12
11 | NR | - | -
12 | DE | - | -
13 | WS | 14 | 5
14 | TK | - | -
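To make the RRN representation concrete, the sketch below stores the example tree on this slide (root at RRN 9, KF) as a vector of fixed-length records and follows the left/right links from the root, counting one seek per record read. It is a minimal sketch that assumes every record read costs exactly one seek; the names NodeRec, FindRRN, and SlideTree are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One fixed-length node record: the key plus the RRNs of its children (-1 = no child).
struct NodeRec {
    std::string key;
    int left;
    int right;
};

// Follows RRN links from the root; each record visited stands for one seek.
// Returns the RRN of the record holding `key`, or -1 if it is absent.
int FindRRN(const std::vector<NodeRec>& file, int root, const std::string& key, int& seeks) {
    assert(root < static_cast<int>(file.size()));
    seeks = 0;
    int rrn = root;
    while (rrn != -1) {
        ++seeks;                                  // read record `rrn` from the file
        const NodeRec& node = file[rrn];
        if (key == node.key) return rrn;
        rrn = (key < node.key) ? node.left : node.right;
    }
    return -1;
}

// The slide's file: records in RRN order 0..14, root at RRN 9 (KF).
std::vector<NodeRec> SlideTree() {
    return {{"FB", 10, 8},  {"JD", -1, -1}, {"RF", -1, -1}, {"SD", 6, 13},
            {"AX", -1, -1}, {"YJ", -1, -1}, {"PA", 11, 2},  {"FT", -1, -1},
            {"HN", 7, 1},   {"KF", 0, 3},   {"CL", 4, 12},  {"NR", -1, -1},
            {"DE", -1, -1}, {"WS", 14, 5},  {"TK", -1, -1}};
}
```

For example, searching for NR visits KF, SD, PA, and NR, so FindRRN returns 11 after 4 seeks, which is the slide's bound of at most 4 seeks per record.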
Unbalanced Binary Tree
[Figure: the tree above after inserting LV, LA, NP, MB, ND, and NK. All of them fall below NR, and the path KF, SD, PA, NR, LV, NP, MB, ND, NK grows to nine nodes]
- At most 9 seeks per record
- Worst case: sequential search
9.3 Indexing with Binary Search Trees
AVL Tree(1)
A height-balanced k tree (HB(k) tree): the heights of the two subtrees of any node may differ by at most k
AVL tree: an HB(1) tree (G. M. Adel'son-Vel'skii and E. M. Landis)
Maintenance overhead is needed
Performance: given N keys, the worst-case search takes about $1.44 \log_2(N+2)$ seeks
cf. completely balanced binary tree: the worst-case search takes $\log_2(N+1)$ seeks
9.3 Indexing with Binary Search Trees
AVL Tree(2)
9.3 Indexing with Binary Search Trees
[Figure: (a) AVL trees; (b) non-AVL trees]
AVL Tree(3)
A binary tree structure that is balanced with respect to the heights of its subtrees
Definition: an empty tree is height balanced. If T is a nonempty binary tree with TL and TR as its left and right subtrees, then T is height balanced iff (1) TL and TR are height balanced and (2) $|h_L - h_R| \le 1$, where hL and hR are the heights of TL and TR, respectively
9.3 Indexing with Binary Search Trees
AVL Tree(4)
The balance factor BF(T) of a node T in a binary tree is hL - hR, where hL and hR are the heights of the left and right subtrees of T
For every node T in an AVL tree, BF(T) must be -1, 0, or 1
If BF(T) becomes -2 or 2, the appropriate rotation is carried out to restore balance
9.3 Indexing with Binary Search Trees
AVL Tree(5)-(15): building an AVL tree by inserting month names
[Figures: the tree, with the balance factor of every node, after each insertion and after each rebalancing]
- MARCH: no rebalancing needed
- MAY: no rebalancing needed
- NOVEMBER: BF(MAR) becomes -2; after an RR rotation, MAY is the root with children MAR and NOV
- AUGUST: no rebalancing needed
- APRIL: BF(MAR) becomes +2; after an LL rotation, AUG takes MAR's place, with children APR and MAR
- JANUARY: BF(MAY) becomes +2; after an LR rotation, MAR is the root, with children AUG (APR, JAN) and MAY (NOV)
- DECEMBER: no rebalancing needed
- JULY: no rebalancing needed
- FEBRUARY: BF(AUG) becomes -2; after an RL rotation, DEC takes AUG's place, with children AUG (APR) and JAN (FEB, JUL)
- JUNE: BF(MAR) becomes +2; after an LR rotation, JAN is the root, with children DEC (AUG, FEB) and MAR (JUL, MAY)
- OCTOBER: BF(MAY) becomes -2; after an RR rotation, NOV takes MAY's place, with children MAY and OCT
- SEPTEMBER: no rebalancing needed; the final tree has root JAN (BF -1)
9.3 Indexing with Binary Search Trees
AVL Tree: Rebalancing(1)
Rebalancing is carried out using four kinds of rotations, where A is the nearest ancestor of the new node Y whose balance factor becomes +2 or -2:
- LL: Y is inserted in the left subtree of the left subtree of A
- LR: Y is inserted in the right subtree of the left subtree of A
- RR: Y is inserted in the right subtree of the right subtree of A
- RL: Y is inserted in the left subtree of the right subtree of A
9.3 Indexing with Binary Search Trees
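The four rotation cases can be sketched in C++ as a recursive insert that recomputes subtree heights on the way back up and applies a single rotation (LL, RR) or a double rotation (LR, RL) when a balance factor reaches +2 or -2. This is a minimal in-memory sketch, not the textbook's code; AvlNode, Insert, RotateLeft, and RotateRight are hypothetical names, and duplicate keys are simply ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct AvlNode {
    std::string key;
    int height = 1;                              // height of the subtree rooted here
    std::unique_ptr<AvlNode> left, right;
    explicit AvlNode(const std::string& k) : key(k) {}
};
using Ptr = std::unique_ptr<AvlNode>;

int Height(const Ptr& n) { return n ? n->height : 0; }
int BalanceFactor(const Ptr& n) { return Height(n->left) - Height(n->right); }  // hL - hR
void Update(Ptr& n) { n->height = 1 + std::max(Height(n->left), Height(n->right)); }

// Single rotation for the LL case: the left child B becomes the subtree root.
void RotateRight(Ptr& a) {
    Ptr b = std::move(a->left);
    a->left = std::move(b->right);               // BR moves under A
    Update(a);
    b->right = std::move(a);
    Update(b);
    a = std::move(b);
}

// Single rotation for the RR case: the right child B becomes the subtree root.
void RotateLeft(Ptr& a) {
    Ptr b = std::move(a->right);
    a->right = std::move(b->left);               // BL moves under A
    Update(a);
    b->left = std::move(a);
    Update(b);
    a = std::move(b);
}

void Insert(Ptr& n, const std::string& key) {
    if (!n) { n = std::make_unique<AvlNode>(key); return; }
    if (key < n->key) Insert(n->left, key);
    else if (key > n->key) Insert(n->right, key);
    else return;                                 // duplicate: nothing to do
    Update(n);
    int bf = BalanceFactor(n);
    assert(bf >= -2 && bf <= 2);
    if (bf == 2) {                               // left-heavy: LL or LR
        if (BalanceFactor(n->left) < 0) RotateLeft(n->left);    // LR: rotate B first
        RotateRight(n);
    } else if (bf == -2) {                       // right-heavy: RR or RL
        if (BalanceFactor(n->right) > 0) RotateRight(n->right); // RL: rotate B first
        RotateLeft(n);
    }
}
```

Inserting MAR, MAY, NOV, AUG, APR, JAN, DEC, JUL, FEB, JUN, OCT, SEP in that order follows the month example above: the root ends up as JAN with balance factor -1.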
AVL Tree: Rebalancing(2)
[Figure: node A and the four positions for inserting Y that produce the LL, LR, RL, and RR cases]
9.3 Indexing with Binary Search Trees
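AVL Tree: Rebalancing(LL)
[Figure: rotation type LL. Balanced subtree: A (BF +1) has left child B (BF 0), whose subtrees BL and BR have height h, and right subtree AR of height h; the subtree height is h+2. Unbalanced following insertion: the height of BL increases to h+1, so BF(A) becomes +2. Balanced subtree after the rotation: B (BF 0) is the new root, with left subtree BL and right child A (BF 0), whose subtrees are BR and AR; the height is again h+2.]
Key order is preserved: BL < B < BR < A < AR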
9.3 Indexing with Binary Search Trees
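AVL Tree: Rebalancing(RR)
[Figure: rotation type RR. Balanced subtree: A (BF -1) has left subtree AL of height h and right child B (BF 0), whose subtrees BL and BR have height h; the subtree height is h+2. Unbalanced following insertion: the height of BR increases to h+1, so BF(A) becomes -2. Balanced subtree after the rotation: B (BF 0) is the new root, with left child A (BF 0), whose subtrees are AL and BL, and right subtree BR; the height is again h+2.]
Key order is preserved: AL < A < BL < B < BR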
9.3 Indexing with Binary Search Trees
AVL Tree: Rebalancing(LR)
[Figure: rotation type LR(a). Balanced subtree: A (BF +1) with left child B (BF 0). Unbalanced following insertion: the new node C (BF 0) becomes the right child of B, so BF(B) = -1 and BF(A) = +2. Balanced subtree: C (BF 0) is the new root, with children B (BF 0) and A (BF 0).]
9.3 Indexing with Binary Search Trees
Key order: B < C < A
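AVL Tree: Rebalancing(LR)
[Figure: rotation type LR(b). Balanced subtree: A (BF +1) has left child B (BF 0) and right subtree AR of height h; B has left subtree BL of height h and right child C (BF 0), whose subtrees CL and CR have height h-1; the subtree height is h+2. Unbalanced following insertion into CL: BF(C) = +1, BF(B) = -1, BF(A) = +2. Balanced subtree: C (BF 0) is the new root, with left child B (BF 0; subtrees BL, CL) and right child A (BF -1; subtrees CR, AR); the height is again h+2.]
9.3 Indexing with Binary Search Trees
Key order: BL < B < CL < C < CR < A < AR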
AVL Tree: Rebalancing(LR)
[Figure: rotation type LR(c). Same balanced subtree as in LR(b). Unbalanced following insertion into CR: BF(C) = -1, BF(B) = -1, BF(A) = +2. Balanced subtree: C (BF 0) is the new root, with left child B (BF +1; subtrees BL, CL) and right child A (BF 0; subtrees CR, AR); the height is again h+2.]
RL a, b, and c are symmetric to LR a, b, and c
9.3 Indexing with Binary Search Trees
Paged Binary Tree(1)
Page: the unit of disk I/O, read or written with one seek and one transfer; typically 4 KB, 8 KB, 16 KB, ...
Paged binary tree: divide a binary tree into pages and store each page in a block of contiguous locations on disk. If every page holds 7 keys, any of 511 nodes (keys) can be reached in only three seeks
Performance (number of seeks):
- A completely full balanced binary tree: $\log_2(N+1)$
- A completely full paged tree: $\log_{k+1}(N+1)$ (k: number of keys held in a single page)
9.3 Indexing with Binary Search Trees
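The seek counts above can be checked with integer arithmetic: a completely full paged tree of d levels with k keys per page holds $(k+1)^d - 1$ keys, so the number of seeks is the smallest d for which that capacity reaches N. A minimal sketch; the function name PagedTreeSeeks is hypothetical, and a loop replaces the logarithm to avoid floating-point rounding at exact powers.

```cpp
#include <cassert>

// Seeks (levels) needed to reach any of N keys in a completely full paged tree
// with k keys per page: the smallest d with (k+1)^d - 1 >= N, i.e. log_{k+1}(N+1).
int PagedTreeSeeks(long long n, long long keysPerPage) {
    assert(n >= 0 && keysPerPage >= 1);
    int levels = 0;
    long long capacity = 0;                  // keys held by a full tree of `levels` levels
    while (capacity < n) {
        capacity = capacity * (keysPerPage + 1) + keysPerPage;  // add one level of pages
        ++levels;
    }
    return levels;
}
```

PagedTreeSeeks(511, 7) gives the slide's 3 seeks, and PagedTreeSeeks(15, 1) gives the 4 seeks of an ordinary binary tree holding 15 keys.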
Paged Binary Tree(2)
[Figure: a binary tree divided into pages]
9.3 Indexing with Binary Search Trees
The Problem with Paged Trees
- The construction is only valid when we have the entire set of keys in hand before the tree is built
- Problems due to imbalance: how to select a good separator, how to group keys into pages, how to guarantee maximum loading of the pages
- The B-tree provides a solution for the above problems!
9.3 Indexing with Binary Search Trees
Paged Binary Tree (Out of balance)
[Figure: a paged binary tree built from the random input sequence below; because the keys arrive in random order, the tree is out of balance]
Random input sequence: C S D T A M P I B W N G U R K E H O L J Y Q Z F X V
9.3 Indexing with Binary Search Trees
Multilevel Indexing
- Approach as a simple index record: limited in the number of keys allowed
- Approach as a multirecord index: consists of a sequence of simple index records; binary searching across the records is too expensive
- Approach as a multilevel index: reduces the number of records to be searched and speeds up the search
<Example> an 80-Mbyte file of 8,000,000 records with 10-byte keys
9.4 Multilevel Indexing : A Better Approach to Tree Indexes
Example of Multilevel Indexing
9.4 Multilevel Indexing : A Better Approach to Tree Indexes
[Figure: a four-level index over the 8,000,000-record file, with 100 keys per index record]
- Lowest level index: 80,000 index records of 100 keys each. It is an index to the data file, and its reference fields are record addresses in the data file
- 2nd level index: 800 index records with 80,000 keys; one key of each lowest-level index record is chosen as the key of that whole record
- 3rd level index: 8 index records that index the largest keys in the 800 second-level records
- 4th level index: a single index record with 8 keys
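The level counts in this example follow from repeated ceiling division: each level holds one key per record of the level below it, packed 100 keys to an index record, until a single record remains. A small sketch; IndexRecordsPerLevel is a hypothetical helper, and the 100-keys-per-record figure is the one used in the example.

```cpp
#include <cassert>
#include <vector>

// Number of index records at each level, lowest level first, when every index
// record holds `keysPerRecord` keys and the lowest level has one key per data record.
std::vector<long long> IndexRecordsPerLevel(long long dataRecords, long long keysPerRecord) {
    assert(dataRecords >= 1 && keysPerRecord >= 2);
    std::vector<long long> levels;
    long long keys = dataRecords;
    do {
        long long records = (keys + keysPerRecord - 1) / keysPerRecord;  // ceiling division
        levels.push_back(records);
        keys = records;                  // the next level up has one key per record below
    } while (keys > 1);
    return levels;
}
```

IndexRecordsPerLevel(8000000, 100) yields 80,000, 800, 8, and 1 records: four levels, as in the example.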
Multi-level Indexing(3)
How can we insert new keys into the multilevel index?
- The index records at some level might be full
- Several levels of the index might have to be rebuilt
- Overflow chains may help, but they are still ugly
A multilevel index structure is weak in dynamic data processing applications
The B-tree gives the right solution!
B-Trees: Working up from the bottom
- Bayer and McCreight, 1972, Acta Informatica
- Build trees upward from the bottom instead of downward from the top
- Each node of a B-tree is an index record that consists of "key-reference" pairs
- The order of a B-tree is the maximum number of key-reference pairs in a node
- Every index record except the root must hold at least half as many pairs as the order
9.5 B-Trees:Working up from the bottom
Sample B-Tree
[Figure: a B-tree whose root holds D P T and whose leaves hold A C D | I M P | S T; each root key is the largest key in the leaf it refers to]
Splitting & Promoting(1)
- Splitting: the creation of two nodes out of one because the original node has become overfull. It results in the need to promote a key to a higher-level node, to provide an index separating the two new nodes
- Promotion of a key: the movement of a key from one node into a higher-level node when a split occurs
9.6 Example of Creating a B-Tree
Splitting & Promoting(2)
- Initial leaf of a B-tree with a page size of seven: A B C D E F G
- Insert the key J: the leaf is split to accommodate it, giving A B C D and E F G J
9.6 Example of Creating a B-Tree
Splitting & Promoting(3)
- Promotion of keys into a new root node: the root holds D and J, the largest keys of the two leaves
9.6 Example of Creating a B-Tree
Insertion in B-tree(1)
Input sequence: C S D T A M P I B W N G U R K E H O L J Y Q Z F X V
- Insertion of C, S, D, T into the initial page: C D S T
- Insertion of A causes the node to split, and the largest key in each leaf node (D and T) is placed in the root node. Root: D T; leaves: A C D | S T
Insertion in B-tree(2)
- M and P are inserted into the rightmost leaf node, then the insertion of I causes it to split. Root: D P T; leaves: A C D | I M P | S T
Insertion in B-tree(3)
- Insertions of B, W, N, and G into leaf nodes cause another split, and the root is now full. Root: D M P W; leaves: A B C D | G I M | N P | S T W
Insertion in B-tree(4)
- Insertion of U proceeds without incident (leaf S T U W), but R would have to be inserted into the rightmost leaf, which is full
Insertion in B-tree(5)
- Insertion of R causes the rightmost leaf node to split, the resulting insertion into the root causes it to split, and the tree grows to level three. Root: P W; second level: D M P | T W; leaves: A B C D | G I M | N P | R S T | U W
Insertion in B-tree(6)
- Insertions of K, E, H, O, L, J, Y, Q, and Z continue, with another node split. Root: P Z; second level: D I M P | T Z; leaves: A B C D | E G H I | J K L M | N O P | Q R S T | U W Y Z
Insertion in B-tree(7)
- Insertions of F, X, and V finish the insertion of the alphabet. Root: I P Z; second level: D G I | M P | T X Z; leaves: A B C D | E F G | H I | J K L M | N O P | Q R S T | U V W X | Y Z
9.6 Example of Creating a B-Tree
Insertion in B-trees
Major components of insertion: split the node, promote the middle key, increase the height of the B-tree
- Insertion touches at most 2 nodes per level
- Insertion cost is strictly linear in the height of the tree
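The insertion steps traced above can be sketched as an in-memory C++ class. It follows this chapter's convention that all keys live in leaves and each interior entry holds the largest key of the child it points to; an overfull node keeps its first half and moves the rest to a new right sibling, and a root split adds a level. This is a hypothetical sketch (BNode and BTreeSketch are not the textbook's BTreeNode/BTree classes) with no disk I/O and no deletion.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct BNode {
    bool leaf = true;
    std::vector<int> keys;       // leaf: data keys; interior: largest key of each child
    std::vector<BNode*> kids;    // interior only, parallel to keys
};

class BTreeSketch {
public:
    explicit BTreeSketch(int maxKeys) : maxKeys_(maxKeys), root_(new BNode) { assert(maxKeys >= 2); }
    ~BTreeSketch() { Free(root_); }
    BTreeSketch(const BTreeSketch&) = delete;
    BTreeSketch& operator=(const BTreeSketch&) = delete;

    void Insert(int key) {
        std::vector<BNode*> path;                    // phase 1: search down to the leaf
        BNode* node = root_;
        while (!node->leaf) {
            path.push_back(node);
            size_t i = 0;
            while (i + 1 < node->keys.size() && node->keys[i] < key) ++i;
            if (node->keys[i] < key) node->keys[i] = key;   // new largest key on the right edge
            node = node->kids[i];
        }
        node->keys.insert(std::upper_bound(node->keys.begin(), node->keys.end(), key), key);
        while (static_cast<int>(node->keys.size()) > maxKeys_) {  // phase 2: split upward
            BNode* right = Split(node);
            if (path.empty()) {                      // phase 3: root split, tree grows a level
                BNode* root = new BNode;
                root->leaf = false;
                root->keys = {node->keys.back(), right->keys.back()};
                root->kids = {node, right};
                root_ = root;
                return;
            }
            BNode* parent = path.back();
            path.pop_back();
            size_t idx = std::find(parent->kids.begin(), parent->kids.end(), node) - parent->kids.begin();
            parent->keys[idx] = node->keys.back();   // the split node's largest key shrank
            parent->keys.insert(parent->keys.begin() + idx + 1, right->keys.back());
            parent->kids.insert(parent->kids.begin() + idx + 1, right);
            node = parent;
        }
    }

    bool Search(int key) const {
        const BNode* node = root_;
        while (!node->leaf) {
            size_t i = 0;
            while (i < node->keys.size() && node->keys[i] < key) ++i;
            if (i == node->keys.size()) return false;   // larger than every key in the tree
            node = node->kids[i];
        }
        return std::binary_search(node->keys.begin(), node->keys.end(), key);
    }

    std::vector<int> RootKeys() const { return root_->keys; }
    int Height() const {
        int h = 1;
        for (const BNode* n = root_; !n->leaf; n = n->kids[0]) ++h;
        return h;
    }

private:
    // Keeps the first half of the entries in `node` and moves the rest to a new right sibling.
    static BNode* Split(BNode* node) {
        BNode* right = new BNode;
        right->leaf = node->leaf;
        size_t keep = (node->keys.size() + 1) / 2;
        right->keys.assign(node->keys.begin() + keep, node->keys.end());
        node->keys.resize(keep);
        if (!node->leaf) {
            right->kids.assign(node->kids.begin() + keep, node->kids.end());
            node->kids.resize(keep);
        }
        return right;
    }
    static void Free(BNode* n) { for (BNode* k : n->kids) Free(k); delete n; }

    int maxKeys_;
    BNode* root_;
};
```

With at most four keys per node, inserting C S D T A M P I B W N G U R K E H O L J Y Q Z F X V ends with root keys I P Z and a height of three, as in Insertion in B-tree(7).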
Class BTreeNode(1)
- Represents B-tree nodes in memory
- A B-tree is an index file associated with a data file
- Specified in btnode.h of Appendix I
- The template class BTreeNode is based on the SimpleIndex template class
9.7 An Object-Oriented Representation of B-Trees
SimpleIndex class, public methods: Insert, Remove, Clear, Search, Print, NumKeys
BTreeNode class (derived from SimpleIndex), public methods: Insert, Remove, LargestKey, Split, Pack, Unpack
Class BTreeNode(2)
Members
- Public methods:
  Insert: simply calls SimpleIndex::Insert and then checks for overflow
  Remove, Split: remove a key, split and merge nodes
  Search: inherited from the SimpleIndex class (works perfectly well)
  Pack/Unpack: manage the difference between the memory and the disk representations of BTreeNode objects
- Protected members: store the file address of the node and the minimum and maximum number of keys
9.7 An Object-Oriented Representation of B-Trees
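template <class keyType> class BTree;   // forward declaration, needed by the friend below

// SimpleIndex and IOBuffer are declared in the textbook's Appendix headers.
template <class keyType>
class BTreeNode : public SimpleIndex<keyType>
{
public:
    BTreeNode(int maxKeys, int unique = 1);
    int Insert(const keyType key, int recAddr);       // insert, then check for overflow
    int Remove(const keyType key, int recAddr = -1);
    keyType LargestKey();                             // largest key stored in this node
    int Split(BTreeNode<keyType>* newNode);           // move half of the keys to newNode
    int Pack(IOBuffer& buffer);                       // memory -> disk representation
    int Unpack(IOBuffer& buffer);                     // disk -> memory representation
protected:
    int MaxBKeys;                                     // maximum number of keys in a node
    int Init();
    friend class BTree<keyType>;
};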
Class BTree
- Uses in-memory BTreeNode objects, adds the file access portion, and enforces a consistent size for the nodes
- Specified in btree.h of Appendix I
- Methods: Create, Open, Close a B-tree; Search, Insert, Remove key-reference pairs
- Protected area: Fetch (transfers nodes from disk to memory), Store (transfers nodes back to disk), the root node, the height of the tree, the file of index records, and BTNode **Nodes, which keeps a collection of tree nodes in memory to reduce disk accesses
9.7 An Object-Oriented Representation of B-Trees
// BTreeNode is declared above; RecordFile comes from the textbook's Appendix headers.
template <class keyType>
class BTree
{
public:
    BTree(int order, int keySize = sizeof(keyType), int unique = 1);
    int Open(char* name, int mode);
    int Create(char* name, int mode);
    int Close();
    int Insert(const keyType key, const int recAddr);
    int Remove(const keyType key, const int recAddr = -1);
    int Search(const keyType key, const int recAddr = -1);
protected:
    typedef BTreeNode<keyType> BTNode;
    BTNode* FindLeaf(const keyType key);      // search down to a leaf
    BTNode* Fetch(const int recaddr);         // transfer a node from disk to memory
    int Store(BTNode*);                       // transfer a node back to disk
    BTNode Root;                              // root node
    int Height;                               // height of the tree
    int Order;                                // order of the tree
    BTNode** Nodes;                           // tree nodes kept in memory
    RecordFile<BTNode> BTreeFile;             // file of index records
};
Page Structure
9.8 B-Tree Methods Search, Insert, and Others
[Figure: the B-tree from Insertion in B-tree(4): root D M P W; leaves A B C D | G I M | N P | S T U W]
Each page holds a KEYCOUNT, a KEY array, and a CHILD array. Contents of pages 2 and 3:
Page | KEYCOUNT | KEY array | CHILD array
2 | 4 | D M P W | 0 3 8 5
3 | 3 | G I M | Nil Nil Nil Nil
Algorithm for Search
The searching procedure is iterative and works in two stages, operating alternately on entire pages (class BTree) and then within pages (class BTreeNode):
Step 1: Load a page into memory
Step 2: Search through the page, looking for the key, and continue down the tree until the leaf level is reached
9.8 B-Tree Methods Search, Insert, and Others
Search and FindLeaf method
9.8 B-Tree Methods Search, Insert, and Others
• Specifications of the Search and FindLeaf methods (Fig 9.18)
Search method:
template <class keyType>
int BTree<keyType>::Search(const keyType key, const int recAddr)
recAddr = btree.Search('L') calls FindLeaf('L') and searches for the key in the leaf node; if the key exists, it returns the data file address of the record with key 'L', otherwise it returns -1
FindLeaf method:
template <class keyType>
BTreeNode<keyType>* BTree<keyType>::FindLeaf(const keyType key)
Searches down to the leaf node, beginning at the root, and returns the address of the leaf node
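The two-stage search can be sketched over the page layout from the Page Structure slide (KEYCOUNT, KEY array, CHILD array). This is a hypothetical sketch, not the code of Fig 9.18: a std::map stands in for the index file, Fetch counts page reads, a leaf is recognized by a Nil (-1) first child, and because the slide's leaves carry no record addresses, Search only reports whether the key is present. Leaf pages 0, 8, and 5 are filled in from the tree drawn on that slide.

```cpp
#include <cassert>
#include <map>
#include <vector>

// One page as drawn on the Page Structure slide.
struct Page {
    int keyCount;
    std::vector<char> keys;      // KEY array
    std::vector<int> child;      // CHILD array: child RRNs, or -1 ("Nil") in a leaf
};

// Hypothetical stand-in for the index file: RRN -> page, with a read counter.
struct IndexFile {
    std::map<int, Page> pages;
    int rootRRN = 0;
    mutable int reads = 0;
    Page Fetch(int rrn) const { ++reads; return pages.at(rrn); }   // stage 1: load a page
};

// Descends from the root to the leaf that could hold `key`.
// Returns false if `key` is larger than every key in the tree.
bool FindLeaf(const IndexFile& f, char key, Page& leaf) {
    Page p = f.Fetch(f.rootRRN);
    while (p.child[0] != -1) {                         // interior page
        assert(p.keyCount == static_cast<int>(p.keys.size()));
        int i = 0;                                     // stage 2: search within the page
        while (i < p.keyCount && p.keys[i] < key) ++i;
        if (i == p.keyCount) return false;
        p = f.Fetch(p.child[i]);                       // each KEY is the largest key below it
    }
    leaf = p;
    return true;
}

bool Search(const IndexFile& f, char key) {
    Page leaf;
    if (!FindLeaf(f, key, leaf)) return false;
    for (int i = 0; i < leaf.keyCount; ++i)
        if (leaf.keys[i] == key) return true;
    return false;
}

// Pages 2 and 3 come from the slide; leaf pages 0, 8, and 5 are read off its tree.
IndexFile SlideFile() {
    IndexFile f;
    f.rootRRN = 2;
    f.pages[2] = {4, {'D', 'M', 'P', 'W'}, {0, 3, 8, 5}};
    f.pages[0] = {4, {'A', 'B', 'C', 'D'}, {-1, -1, -1, -1}};
    f.pages[3] = {3, {'G', 'I', 'M'}, {-1, -1, -1, -1}};
    f.pages[8] = {2, {'N', 'P'}, {-1, -1, -1, -1}};
    f.pages[5] = {4, {'S', 'T', 'U', 'W'}, {-1, -1, -1, -1}};
    return f;
}
```

Searching for I reads two pages, the root (page 2) and then page 3, so the cost of a search equals the height of the tree.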
Algorithm for Insertion(1)
Observations on insertion, splitting, and promotion:
- The search proceeds all the way down to the leaf level
- After the insertion location is found at the leaf level, the work proceeds upward from the bottom
An iterative procedure with three phases:
1. Search to the leaf level, using the FindLeaf method
2. Insertion, overflow detection, and splitting on the upward path
3. Creation of a new root node, if the current root was split
9.8 B-Tree Methods Search, Insert, and Others
Algorithm for Insertion(2)
With no redistribution:
(Step 1) Locate the node on the bottommost level in which to insert the record. The location is determined by a key search.
(Step 2) If a vacant record slot is available, insert the record so that key sequencing is maintained. Then update the pointer associated with the record (the pointer is null for level-0 records). Then stop!
(Step 3) If no vacant record slot exists, identify the median record. All records and pointers to the left of the median record are stored in one node (the original), and those to the right are stored in another node (the new node).
9.8 B-Tree Methods Search, Insert, and Others
Algorithm for Insertion(3)
(Step 4) If the topmost node was split, create a new topmost node that contains the median record identified in Step 3, with pointers to the original and split nodes. Update the root pointer to point to the new topmost node. Then stop!
(Step 5) If the topmost node was not split, prepare to insert the median record identified in Step 3, together with a pointer to the new node created in Step 3, into the parent node. Then go to Step 2.
Note: Step 4 makes the B-tree increase in height by one level. On average, B-trees (like B+-trees) have about 70% occupancy.
9.8 B-Tree Methods Search, Insert, and Others
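Insertion Example
(Pages hold at most four keys; each root key is the largest key in the page below it.)
- Insert 3; insert 19, 4, 20: page 0 holds 3 4 19 20
- Insert 1: page 0 splits into 1 3 4 (page 0) and 19 20 (page 1), and a new root (page 2) holds 4 20
- Insert 13, 16: page 1 holds 13 16 19 20
- Insert 9: page 1 splits into 9 13 16 (page 1) and 19 20 (page 3), and the root holds 4 16 20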
9.8 B-Tree Methods Search, Insert, and Others
Create, Open, and Close
9.8 B-Tree Methods Search, Insert, and Others
Specified in btree.tc of Appendix I
- Method Create writes the empty root node into the file BTreeFile so that its first record is reserved for the root node
- Method Open opens BTreeFile and loads the root node into memory from the first record in the file
- Method Close simply stores the root node into BTreeFile and closes it
B-Tree Nomenclature
- Be aware that terms are not uniform in the literature
- Definitions also differ considerably
- In fact, there are a number of B-tree variations
- This textbook uses "B-tree" for what other books call a B+ tree
- In this book, a "B+ tree" is a B+ tree with a linked list of sorted data blocks
9.9 B-Tree Nomenclature
[Figure: root C G; leaves A B | E F | H I; every key appears exactly once and refers to a data block]
Other books: B-Tree | Our book: N/A (no corresponding structure)
[Figure: root C G I; leaves A B C | E F G | H I; every key appears in a leaf that refers to a data block, and each root key is the largest key of a leaf]
Other books: B+-Tree | Our book: B-Tree
[Figure: the same tree (root C G I; leaves A B C | E F G | H I), with the data blocks linked in key order]
Other books: B+-Tree with linked list | Our book: B+-Tree
Another aspect (node structures): Homogeneous Trees (B-Tree in other texts)
- Homogeneous trees: leaf nodes and interior nodes have the same structure; each contains both data pointers and tree pointers
- The average search length is less for homogeneous trees, because some searches may conclude before reaching a leaf node
B-Tree in other texts
[Figure: root 37 64; second level 8 23 | 45 53 | 85 91; leaves 1 7 | 14 20 | 27 36 | 38 40 | 50 52 | 60 | 70 80 | 88 | 95. Each of the 23 keys carries one of the 23 pointers to the 23 records in the data file]
Another Aspect (node structures): Heterogeneous Trees (B+-Tree in other texts)
- Heterogeneous trees: leaf nodes and interior nodes have different structures
B+-Tree in other texts
[Figure: root 37 64; second level 14 23 | 45 53 | 85 91; all 23 keys appear in the leaves, which hold the 23 pointers to the 23 records in the data file]
Comparison of B-Tree and B+-Tree in other texts
Topic | B-Tree | B+-Tree
Algorithm complexity for insertion | Rather complex | Simpler
Retrieval efficiency | Less efficient (a B-tree is tall and spindly) | More efficient (a B+-tree is short and bushy)
Storage efficiency | Slightly more efficient (uses less space) | Less efficient (uses more space)
1-pass structure creation algorithms | Rather complex | Simple
Comparison of B-Tree and B+-Tree in other texts
Historical note:
- B-tree: Bayer & McCreight
- B+-tree: Comer
- B*-tree: Knuth; B-trees with 67% minimum occupancy
- B÷÷-trees: B+-trees with 67% minimum occupancy
Formal Definition of B-Tree Properties
The properties of a B-tree of order m:
1. Every page has a maximum of m descendants
2. Every page, except for the root and the leaves, has at least $\lceil m/2 \rceil$ descendants
3. The root has at least two descendants (unless it is a leaf)
4. All the leaves appear on the same level
5. The leaf level forms a complete, ordered index of the associated data file
9.10 Formal Definition of B-Tree Properties
Worst-case Search Depth(1)
- Search depth: the depth of the tree
- Worst case: every page of the tree has only the minimum number of descendants, giving a maximal height with a minimal breadth
9.11 Worst-Case Search Depth
Worst-case Search Depth(2)
B-TREE WITH ORDER m
Level | Minimum number of descendants
1 (root) | $2$
2 | $2 \lceil m/2 \rceil$
3 | $2 \lceil m/2 \rceil^2$
... | ...
d | $2 \lceil m/2 \rceil^{d-1}$
For a tree with N keys in its leaves: $N \ge 2 \lceil m/2 \rceil^{d-1}$
Upper bound for the depth d of a B-tree: $d \le 1 + \log_{\lceil m/2 \rceil}(N/2)$
e.g., a B-tree of order 512 with 1,000,000 keys: $d \le 1 + \log_{256}(500{,}000) \approx 3.37$, so the depth is at most 3 (3 disk I/Os)
9.11 Worst-Case Search Depth
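The depth bound can be evaluated directly with the standard <cmath> functions; the sketch below computes $1 + \log_{\lceil m/2 \rceil}(N/2)$. MaxDepthBound is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Upper bound on the depth d of a B-tree of order m holding N keys:
// N >= 2 * ceil(m/2)^(d-1)  implies  d <= 1 + log_{ceil(m/2)}(N/2).
double MaxDepthBound(int order, double n) {
    assert(order >= 3 && n >= 2);
    double minFanout = std::ceil(order / 2.0);   // minimum descendants of a non-root page
    return 1.0 + std::log(n / 2.0) / std::log(minFanout);
}
```

MaxDepthBound(512, 1000000) is about 3.37, so a search needs at most 3 disk I/Os, as in the example.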
Deletion, Redistribution, and Concatenation
Ensure that the B-tree properties are maintained after a deletion
Algorithm (with redistribution and concatenation):
1. If the key to be deleted is not in a leaf, swap it with its immediate successor, which is in a leaf (the leaf might then need redistribution or concatenation!)
2. Delete the key
9.12 Deletion, Merging, and Redistribution
Deletion Algorithm (Cont'd)
3. If underflow occurs (the leaf now contains one too few keys):
3.1 If the left or right sibling has more than the minimum number of keys, redistribute
3.2 Otherwise, concatenate the two leaves and the median key from the parent into one leaf
3.3 Apply step 3 to the parent as if a key had been deleted from it
9.12 Deletion, Merging, and Redistribution
Redistribution
- Occurs when a sibling has more than the minimum number of keys
- Idea: move keys between siblings
- Results in a change of the key in the parent page
- Does not propagate: its effects are strictly local
- How many keys should be moved? Not necessarily a fixed number; an even distribution is desirable
9.12 Deletion, Merging, and Redistribution
Concatenation (merge)
- Occurs in case of underflow when redistribution is not possible
- Combines the two pages and the key from the parent page into a single full page: the reverse of splitting
- Concatenation must involve demotion of keys, which may cause underflow in the parent page
- The effects propagate upward
9.12 Deletion, Merging, and Redistribution
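Steps 3.1 and 3.2 can be sketched for the leaf level of this chapter's variant, where each parent entry holds the largest key of its leaf (step 1 is unnecessary there, since every key is already in a leaf). The sketch is hypothetical: TwoLevel models a single parent page with its leaves, the left sibling is preferred, redistribution splits the pooled keys evenly, and an underflow of the parent itself (step 3.3) is not carried upward.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One parent page and its leaves; parentKeys[i] is the largest key in leaves[i].
struct TwoLevel {
    std::vector<int> parentKeys;
    std::vector<std::vector<int>> leaves;
};

// Deletes `key` from its leaf, then repairs an underflow (fewer than minKeys keys)
// by redistribution (step 3.1) or concatenation (step 3.2). Returns false if absent.
bool Remove(TwoLevel& t, int key, size_t minKeys) {
    assert(minKeys >= 1 && t.leaves.size() >= 2);
    size_t i = 0;
    while (i < t.parentKeys.size() && t.parentKeys[i] < key) ++i;
    if (i == t.parentKeys.size()) return false;
    std::vector<int>& leaf = t.leaves[i];
    auto it = std::find(leaf.begin(), leaf.end(), key);
    if (it == leaf.end()) return false;
    leaf.erase(it);
    if (leaf.size() >= minKeys) {                        // no underflow
        t.parentKeys[i] = leaf.back();                   // largest key may have changed
        return true;
    }
    size_t sibling = (i > 0) ? i - 1 : i + 1;            // prefer the left sibling
    size_t lo = std::min(i, sibling), hi = lo + 1;
    std::vector<int>& left = t.leaves[lo];
    std::vector<int>& right = t.leaves[hi];
    if (t.leaves[sibling].size() > minKeys) {            // 3.1 redistribution
        std::vector<int> pool(left);
        pool.insert(pool.end(), right.begin(), right.end());
        size_t half = (pool.size() + 1) / 2;             // split the pooled keys evenly
        left.assign(pool.begin(), pool.begin() + half);
        right.assign(pool.begin() + half, pool.end());
        t.parentKeys[lo] = left.back();
        t.parentKeys[hi] = right.back();
    } else {                                             // 3.2 concatenation
        left.insert(left.end(), right.begin(), right.end());
        t.parentKeys[lo] = left.back();
        t.leaves.erase(t.leaves.begin() + hi);           // the parent loses one entry
        t.parentKeys.erase(t.parentKeys.begin() + hi);
    }
    return true;
}
```

With a minimum of two keys per leaf and leaves 10 20 30 | 40 50 | 60 70 80, deleting 50 borrows from the left sibling (10 20 | 30 40), and then deleting 20 merges the first two leaves into 10 30 40.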
e.g. Deletion(1)
9.12 Deletion, Merging, and Redistribution
Figure A: root I P Z; second level D G I | M P | T X Z; leaves A B C D | E F G | H I | J K L M | N O P | Q R S T | U V W X | Y Z
e.g. Deletion(2)
9.12 Deletion, Merging, and Redistribution
Removal of key C from Figure A: the change occurs only in the leaf node, which becomes A B D
e.g. Deletion(3)
9.12 Deletion, Merging, and Redistribution
Result of deleting P from Figure A: the leaf N O P becomes N O, and P changes to O in the second level and in the root (root I O Z; second-level node M O)
e.g. Deletion(4)
9.12 Deletion, Merging, and Redistribution
Result of deleting H from Figure A: the removal of H caused an underflow, and two leaf nodes were merged (E F G and I form the leaf E F G I, and the second-level node D G I becomes D I)
Redistribution during Insertion: a way to improve storage utilization
- A way of avoiding the creation of new pages
- Tends to make a B-tree more efficient in terms of space utilization: worst case around 50%; average case 67 to 69%; with redistribution during insertion, over 85%
9.13 Redistribution During Insertion
[Figures: deletions from a B-tree whose root holds M, whose level-1 nodes hold D H and Q U, and whose leaves hold A C | E F | I J K | N O P | R S | V W X Y Z]
- DELETE J (no change): the leaf becomes I K, and nothing else changes
- DELETE M (swap with N): M is not in a leaf, so it is swapped with its successor N, which is then deleted from the leaf (root N; leaf O P)
- DELETE R (redistribution): the leaf underflows, and keys are redistributed with the right sibling through the parent (parent Q W; leaves S U V | X Y Z)
- DELETE A (concatenation): the leaf underflows and its sibling has only the minimum, so C, D, E, F are concatenated into one leaf; the parent, now holding only H, underflows in turn
- NOW UNDERFLOW PROPAGATES UPWARD! H, N, Q, W are concatenated into a single root, and the HEIGHT OF THE TREE DECREASES
9.13 Redistribution During Insertion
B* Trees
- Knuth, 1973, Addison-Wesley
- Uses the redistribution operation during insertion: performs a two-to-three split
- When a page is split, it has at least one sibling that is also full
- After the split, the pages are about 2/3 full: every page except the root has at least $\lceil (2m-1)/3 \rceil$ descendants
- cf. in a B-tree, a page holds at least $\lceil m/2 \rceil - 1$ keys
9.14 B* Trees
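The two-to-three split itself amounts to dealing the keys of two full sibling pages, plus the new key, out over three pages. A hypothetical sketch in the setting of this chapter's variant (all keys in leaves, so no separator key from the parent is consumed); in a classic B*-tree the parent's separator key also takes part in the redistribution.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Deals the keys of two full, adjacent sibling pages plus one new key over three
// pages, as evenly as possible (earlier pages take any remainder).
std::vector<std::vector<int>> TwoToThreeSplit(std::vector<int> left,
                                              const std::vector<int>& right, int newKey) {
    assert(!left.empty() && !right.empty() && left.back() < right.front());
    left.insert(left.end(), right.begin(), right.end());                   // still sorted
    left.insert(std::upper_bound(left.begin(), left.end(), newKey), newKey);
    std::vector<std::vector<int>> pages(3);
    size_t base = left.size() / 3, extra = left.size() % 3, pos = 0;
    for (size_t p = 0; p < 3; ++p) {
        size_t count = base + (p < extra ? 1 : 0);
        pages[p].assign(left.begin() + pos, left.begin() + pos + count);
        pos += count;
    }
    return pages;
}
```

With six keys per page, two full pages plus one new key become pages of 5, 4, and 4 keys, each roughly 2/3 full.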
B* Tree (Cont'd)
[Figure: the original tree, and the tree after inserting B: the full leaf and its full sibling are redistributed, together with B, over three pages (a two-to-three split), and the parent gains one entry]
9.14 B* Trees
Buffering of B-tree pages: Virtual B-Trees
- In practice, the B-tree is much larger than main memory, so pages of the B-tree need to be buffered
- It is better to keep the root page in main memory
- Buffer replacement algorithm: LRU plus a page-height weighting factor
- Keep the pages of the top few levels in main memory at all times
9.15 Buffering of Pages:Virtual B-Trees
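A minimal sketch of such a buffer follows: the root and other top-level pages are pinned, a simplification standing in for the page-height weighting factor, and the remaining slots are managed least-recently-used. PageBuffer and its members are hypothetical; a real virtual B-tree would also hold the page contents and write modified pages back to disk.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>
#include <unordered_set>

class PageBuffer {
public:
    explicit PageBuffer(std::size_t capacity) : capacity_(capacity) { assert(capacity > 0); }

    // Loads a page once and keeps it resident for good (call before fetching it).
    void Pin(int rrn) {
        if (pinned_.insert(rrn).second) ++reads;
    }

    // Returns true on a buffer hit, false if the page had to be read from disk.
    bool Fetch(int rrn) {
        if (pinned_.count(rrn) != 0) return true;
        auto hit = where_.find(rrn);
        if (hit != where_.end()) {
            lru_.splice(lru_.begin(), lru_, hit->second);    // now most recently used
            return true;
        }
        ++reads;                                             // simulated disk read
        if (lru_.size() == capacity_) {                      // evict least recently used
            where_.erase(lru_.back());
            lru_.pop_back();
        }
        lru_.push_front(rrn);
        where_[rrn] = lru_.begin();
        return false;
    }

    int reads = 0;

private:
    std::size_t capacity_;
    std::list<int> lru_;                                     // front = most recently used
    std::unordered_map<int, std::list<int>::iterator> where_;
    std::unordered_set<int> pinned_;
};
```

With the root (page 0) pinned and two LRU slots, every search starts with a buffer hit on the root, and only lower-level pages compete for the remaining space.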
Placement of Information associated with the Key
How to store the associated information:
- In a file where data and index are mingled: once the key is found, no more disk access is required
- In a separate file: a larger number of keys per page, hence a higher order and a shallower tree
9.15 Buffering of Pages:Virtual B-Trees
Variable Length Records and Keys
- A B-tree with variable-length keys has no single, fixed order, so it needs a different criterion for the overflow/underflow condition: a maximum/minimum number of bytes (cf. a maximum/minimum number of keys)
- Key promotion mechanism: shorter variable-length keys are promoted in preference to longer ones, so that the pages with the largest numbers of descendants sit high in the tree
9.16 Variable-Length Records and Keys
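One way to combine a byte-based overflow rule with the preference for short promoted keys is sketched below. It is a hypothetical split rule, not the textbook's: among the split points that leave both halves within the page's byte capacity, it picks the one whose promoted key (the largest key of the left half, as in this chapter) is shortest, breaking ties in favor of the more balanced split. Bytes and ChooseSplit are hypothetical names, and key length is taken as the byte count.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Total bytes used by keys[from, to).
std::size_t Bytes(const std::vector<std::string>& keys, std::size_t from, std::size_t to) {
    std::size_t total = 0;
    for (std::size_t i = from; i < to; ++i) total += keys[i].size();
    return total;
}

// Returns the number of keys to keep in the left page, or 0 if no split point fits.
std::size_t ChooseSplit(const std::vector<std::string>& keys, std::size_t pageBytes) {
    const std::size_t total = Bytes(keys, 0, keys.size());
    assert(total > pageBytes);                           // only called when the page overflows
    auto imbalance = [&](std::size_t i) {                // |left bytes - right bytes|
        std::size_t l = Bytes(keys, 0, i);
        return l > total - l ? 2 * l - total : total - 2 * l;
    };
    std::size_t best = 0;
    for (std::size_t i = 1; i < keys.size(); ++i) {
        std::size_t leftBytes = Bytes(keys, 0, i);
        if (leftBytes > pageBytes || total - leftBytes > pageBytes) continue;  // a half overflows
        if (best == 0 || keys[i - 1].size() < keys[best - 1].size() ||
            (keys[i - 1].size() == keys[best - 1].size() && imbalance(i) < imbalance(best)))
            best = i;
    }
    return best;
}
```

For the keys ab, abcdefgh, abd, abdxyzw, ac (22 bytes) and 16-byte pages, splitting after two keys would promote the 8-byte key abcdefgh, so the sketch splits after three keys and promotes the 3-byte key abd instead.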
Let's Review !!!
9.1 Introduction
9.2 Statement of the Problem
9.3 Indexing with Binary Search Trees
: AVL Trees, Paged Binary Trees, Problems with Paged Trees
9.4 Multilevel Indexing
9.5 B-Trees
9.6 Example of Creating a B-Tree
9.7 An Object-Oriented Representation of B-Trees
: Class BTreeNode , Class BTree
Let's Review !!!
9.8 B-Tree Methods Search, Insert, and Others
9.9 B-Tree Nomenclature
9.10 Formal Definition of B-Tree Properties
9.11 Worst-case Search Depth
9.12 Deletion, Merging, and Redistribution
9.13 Redistribution During Insertion
9.14 B* Trees
9.15 Buffering of Pages : Virtual B-Trees
9.16 Variable-Length Records and Keys