
File Structure, SNU-OOPSLA Lab

Chap 9. Multilevel Indexing and B-Trees

Seoul National University, School of Computer Science and Engineering, Object-Oriented Systems Lab (SNU-OOPSLA-LAB)

Prof. Hyoung-Joo Kim

File Structures by Folk, Zoellick, and Riccardi


Chapter Objectives (1)

- Place the development of B-trees in the historical context of the problems they were designed to solve
- Look briefly at other tree structures that might be used on secondary storage, such as paged AVL trees
- Introduce multirecord and multilevel indexes and evaluate the speed of the search operation
- Provide an understanding of the important properties possessed by B-trees, and show how these properties are especially well suited to secondary storage applications


Chapter Objectives (2)

- Present the object-oriented design of B-trees: define the classes BTreeNode and BTree
- Explain the implementation of the fundamental operations on B-trees
- Introduce the notion of page buffering and virtual B-trees
- Describe variations of the fundamental B-tree algorithms, such as those used to build B* trees and B-trees with variable-length records


Contents (1)

9.1 Introduction

9.2 Statement of the Problem

9.3 Indexing with Binary Search Trees

: AVL Trees, Paged Binary Trees, Problems with Paged Trees

9.4 Multilevel Indexing

9.5 B-Trees

9.6 Example of Creating a B-Tree

9.7 An Object-Oriented Representation of B-Trees

: Class BTreeNode, Class BTree


Contents (2)

9.8 B-Tree Methods: Search, Insert, and Others

9.9 B-Tree Nomenclature

9.10 Formal Definition of B-Tree Properties

9.11 Worst-case Search Depth

9.12 Deletion, Merging, and Redistribution

9.13 Redistribution During Insertion

9.14 B* Trees

9.15 Buffering of Pages: Virtual B-Trees

9.16 Variable-Length Records and Keys


Introduction: The Invention of the B-tree

1972, Acta Informatica: R. Bayer and E. McCreight (at Boeing Corporation), “Organization and Maintenance of Large Ordered Indexes”

1979: the B-tree has become the ‘de facto’ standard for database indexes; D. Comer, “The Ubiquitous B-Tree”, ACM Computing Surveys

Why the name B-tree? Balanced, Bushy, Broad, Boeing, Bayer?

Retrieval, insertion, and deletion times are proportional to $\log_K I$ (I: number of index entries in the file; K: number of index entries in a page)

Excellent for dynamically changing random-access files

9.1 Introduction: Invention of the B-Tree


Statement of the Problem

Problems with an index on secondary storage:

- Searching the index must be faster than binary searching
  - With binary search, finding one of 15 items takes up to 4 seeks, and finding one of 1,000 items takes about 9.5 seeks
- Insertion and deletion must be as fast as search
  - In some file structures, inserting a key may involve moving many other keys

9.2 Statement of the Problem
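To see where the binary-search seek counts come from, the sketch below counts the probes a binary search makes over a sorted array, under the simplifying assumption that every probe costs one seek (on a real disk, nearby probes might share a block). It reports the worst case over all keys; the function names are illustrative.

```cpp
#include <cassert>
#include <vector>

// Probes a binary search makes to find `target` in sorted `keys`,
// counting each probe as one seek; returns -1 if the key is absent.
int binarySearchProbes(const std::vector<int>& keys, int target) {
    int lo = 0, hi = static_cast<int>(keys.size()) - 1, probes = 0;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        ++probes;                                   // one seek to read keys[mid]
        if (keys[mid] == target) return probes;
        if (keys[mid] < target) lo = mid + 1; else hi = mid - 1;
    }
    return -1;
}

// Largest probe count over every key of an index holding n sorted keys.
int worstCaseProbes(int n) {
    std::vector<int> keys(n);
    for (int i = 0; i < n; ++i) keys[i] = i;
    int worst = 0;
    for (int key : keys) {
        int p = binarySearchProbes(keys, key);
        if (p > worst) worst = p;
    }
    return worst;
}
```

For 15 keys the worst case is 4 probes, matching the slide; for 1,000 keys it is 10, so even the best-case analysis leaves roughly ten seeks per lookup.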


Binary Search Tree (1)

Advantages: the data need not be physically sorted; good performance on a balanced tree; insertion cost = search cost

Disadvantages: an out-of-balance binary tree requires more seeks

9.3 Indexing with Binary Search Trees


Binary Search Tree (2)

Sorted list of keys: AX, CL, DE, FB, FT, HN, JD, KF, NR, PA, RF, SD, TK, WS, YJ

Binary search tree representation (at most 4 seeks to find any one record):

level 1: KF
level 2: FB, SD
level 3: CL, HN, PA, WS
level 4: AX, DE, FT, JD, NR, RF, TK, YJ


Internal Representation of Binary Tree

The tree can be stored with RRNs (fixed-length records) or with pointers. Each record holds a key and the RRNs of its left and right children (-1 = no child). ROOT = 9.

RRN  key  left  right
0    FB   10    8
1    JD   -1    -1
2    RF   -1    -1
3    SD   6     13
4    AX   -1    -1
5    YJ   -1    -1
6    PA   11    2
7    FT   -1    -1
8    HN   7     1
9    KF   0     3
10   CL   4     12
11   NR   -1    -1
12   DE   -1    -1
13   WS   14    5
14   TK   -1    -1
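The table above can be searched by following RRNs from the root record. This is a minimal sketch that assumes the whole file has already been read into a vector; each record visited stands for one seek on disk. TreeRecord, findRRN, and sampleFile are illustrative names, not from the book.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One fixed-length record: a key plus the RRNs of its children (-1 = none).
struct TreeRecord {
    std::string key;
    int left;
    int right;
};

// Follows child RRNs starting at `root`. `seeks` receives the number of
// records visited. Returns the RRN holding `key`, or -1 if it is absent.
int findRRN(const std::vector<TreeRecord>& file, int root, const std::string& key, int& seeks) {
    seeks = 0;
    for (int rrn = root; rrn != -1;) {
        ++seeks;
        const TreeRecord& rec = file[rrn];
        if (key == rec.key) return rrn;
        rrn = (key < rec.key) ? rec.left : rec.right;
    }
    return -1;
}

// The 15 records from the table, indexed by RRN (root is RRN 9).
std::vector<TreeRecord> sampleFile() {
    return {{"FB", 10, 8},  {"JD", -1, -1}, {"RF", -1, -1}, {"SD", 6, 13},
            {"AX", -1, -1}, {"YJ", -1, -1}, {"PA", 11, 2},  {"FT", -1, -1},
            {"HN", 7, 1},   {"KF", 0, 3},   {"CL", 4, 12},  {"NR", -1, -1},
            {"DE", -1, -1}, {"WS", 14, 5},  {"TK", -1, -1}};
}
```

Looking up NR visits KF, SD, PA, and NR, i.e. four seeks, which is the worst case for this balanced tree.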


Unbalanced Binary Tree

Figure: the tree above after additional keys (such as LV, LA, NP, MB, ND, and NK) are inserted; the tree grows out of balance, and finding one record can take up to 9 seeks

Worst case: the tree degenerates and the search becomes sequential


AVL Tree (1)

A height-balanced k tree (HB(k) tree): the heights of the two subtrees of any node may differ by at most k

AVL tree: an HB(1) tree, named after G. M. Adel'son-Vel'skii and E. M. Landis

Maintaining the balance requires overhead

Performance: given N keys, the worst-case search takes $1.44 \log_2(N+2)$ seeks

cf. a completely balanced binary tree: the worst-case search takes $\log_2(N+1)$ seeks


AVL Tree (2)

Figure: (a) examples of AVL trees; (b) examples of non-AVL trees


AVL Tree (3)

A binary tree that is balanced with respect to the heights of its subtrees

Definition: An empty tree is height balanced. If T is a nonempty binary tree with $T_L$ and $T_R$ as its left and right subtrees, then T is height balanced iff (1) $T_L$ and $T_R$ are height balanced and (2) $|h_L - h_R| \le 1$, where $h_L$ and $h_R$ are the heights of $T_L$ and $T_R$, respectively


AVL Tree (4)

The balance factor BF(T) of a node T in a binary tree is $h_L - h_R$, where $h_L$ and $h_R$ are the heights of the left and right subtrees of T

For every node T in an AVL tree, BF(T) must be -1, 0, or +1

If an insertion makes BF(T) equal to -2 or +2, an appropriate rotation is carried out to restore balance
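The recursive definition and the balance factor translate almost line for line into code. This sketch recomputes heights on every call, which is fine for checking small trees but is not how an AVL implementation tracks balance during insertion; Node, height, balanceFactor, and isAVL are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <memory>

struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

// Height of an empty tree is 0; a single node has height 1.
int height(const Node* t) {
    return t ? 1 + std::max(height(t->left.get()), height(t->right.get())) : 0;
}

// BF(T) = hL - hR
int balanceFactor(const Node* t) {
    return height(t->left.get()) - height(t->right.get());
}

// Empty tree is balanced; otherwise both subtrees must be balanced
// and |hL - hR| <= 1.
bool isAVL(const Node* t) {
    if (!t) return true;
    return isAVL(t->left.get()) && isAVL(t->right.get()) &&
           std::abs(balanceFactor(t)) <= 1;
}
```

A root with a single left child has BF +1 and is still an AVL tree; adding a left child below that child raises the root's BF to +2, which is exactly the situation the rotations on the following slides repair.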


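AVL Tree (5)

Trees are written as KEY(balance factor)[left subtree, right subtree], with - for an empty subtree.

New identifier MARCH: after insertion MAR(0); no rebalancing needed

New identifier MAY: after insertion MAR(-1)[-, MAY(0)]; no rebalancing needed

New identifier NOVEMBER: after insertion MAR(-2)[-, MAY(-1)[-, NOV(0)]]; RR rotation at MAR; after rebalancing MAY(0)[MAR(0), NOV(0)]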


AVL Tree (6)

New identifier AUGUST: after insertion MAY(+1)[MAR(+1)[AUG(0), -], NOV(0)]; no rebalancing needed


AVL Tree (7)

New identifier APRIL: after insertion MAY(+2)[MAR(+2)[AUG(+1)[APR(0), -], -], NOV(0)]; LL rotation at MAR; after rebalancing MAY(+1)[AUG(0)[APR(0), MAR(0)], NOV(0)]


AVL Tree (8)

New identifier JANUARY: after insertion MAY(+2)[AUG(-1)[APR(0), MAR(+1)[JAN(0), -]], NOV(0)]; LR rotation at MAY; after rebalancing MAR(0)[AUG(0)[APR(0), JAN(0)], MAY(-1)[-, NOV(0)]]


AVL Tree (9)

New identifier DECEMBER: after insertion MAR(+1)[AUG(-1)[APR(0), JAN(+1)[DEC(0), -]], MAY(-1)[-, NOV(0)]]; no rebalancing needed


AVL Tree (10)

New identifier JULY: after insertion MAR(+1)[AUG(-1)[APR(0), JAN(0)[DEC(0), JUL(0)]], MAY(-1)[-, NOV(0)]]; no rebalancing needed


AVL Tree (11)

New identifier FEBRUARY: after insertion MAR(+2)[AUG(-2)[APR(0), JAN(+1)[DEC(-1)[-, FEB(0)], JUL(0)]], MAY(-1)[-, NOV(0)]]; RL rotation at AUG; after rebalancing MAR(+1)[DEC(0)[AUG(+1)[APR(0), -], JAN(0)[FEB(0), JUL(0)]], MAY(-1)[-, NOV(0)]]


AVL Tree (12)

New identifier JUNE: after insertion MAR(+2)[DEC(-1)[AUG(+1)[APR(0), -], JAN(-1)[FEB(0), JUL(-1)[-, JUN(0)]]], MAY(-1)[-, NOV(0)]]; LR rotation at MAR; after rebalancing JAN(0)[DEC(+1)[AUG(+1)[APR(0), -], FEB(0)], MAR(0)[JUL(-1)[-, JUN(0)], MAY(-1)[-, NOV(0)]]]


AVL Tree (13)

New identifier OCTOBER: after insertion JAN(-1)[DEC(+1)[AUG(+1)[APR(0), -], FEB(0)], MAR(-1)[JUL(-1)[-, JUN(0)], MAY(-2)[-, NOV(-1)[-, OCT(0)]]]]; RR rotation at MAY


AVL Tree (14)

After rebalancing (OCTOBER): JAN(0)[DEC(+1)[AUG(+1)[APR(0), -], FEB(0)], MAR(0)[JUL(-1)[-, JUN(0)], NOV(0)[MAY(0), OCT(0)]]]


AVL Tree (15)

New identifier SEPTEMBER: after insertion JAN(-1)[DEC(+1)[AUG(+1)[APR(0), -], FEB(0)], MAR(-1)[JUL(-1)[-, JUN(0)], NOV(-1)[MAY(0), OCT(-1)[-, SEP(0)]]]]; no rebalancing needed


AVL Tree: Rebalancing (1)

Rebalancing is carried out using four kinds of rotations, where A is the nearest ancestor of the new node Y whose balance factor becomes ±2:

- LL: Y is inserted in the left subtree of the left subtree of A
- LR: Y is inserted in the right subtree of the left subtree of A
- RR: Y is inserted in the right subtree of the right subtree of A
- RL: Y is inserted in the left subtree of the right subtree of A
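The four rotation types can be combined into one recursive insert. This is a minimal sketch, assuming each node caches the height of its subtree (storing the balance factor directly is a common alternative) and ignoring duplicate keys; AvlNode, rotateLeft, rotateRight, and insert are illustrative names. Because the recursion unwinds from the new leaf upward, the first node found with BF ±2 is the nearest unbalanced ancestor A.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct AvlNode {
    std::string key;
    int height = 1;                          // height of the subtree rooted here
    std::unique_ptr<AvlNode> left, right;
    explicit AvlNode(std::string k) : key(std::move(k)) {}
};
using Tree = std::unique_ptr<AvlNode>;

int h(const Tree& t) { return t ? t->height : 0; }
int bf(const Tree& t) { return t ? h(t->left) - h(t->right) : 0; }  // BF = hL - hR
void update(Tree& t) { t->height = 1 + std::max(h(t->left), h(t->right)); }

// Right rotation around A: its left child B becomes the subtree root,
// and B's right subtree (BR) becomes A's left subtree.
void rotateRight(Tree& a) {
    Tree b = std::move(a->left);
    a->left = std::move(b->right);
    update(a);
    b->right = std::move(a);
    update(b);
    a = std::move(b);
}

// Left rotation around A: the mirror image of rotateRight.
void rotateLeft(Tree& a) {
    Tree b = std::move(a->right);
    a->right = std::move(b->left);
    update(a);
    b->left = std::move(a);
    update(b);
    a = std::move(b);
}

// Inserts key below t. If a node's BF reaches +2 or -2 on the way back up,
// applies LL, LR, RR, or RL there and records the type in `rotation`.
void insert(Tree& t, const std::string& key, std::string& rotation) {
    if (!t) { t = std::make_unique<AvlNode>(key); return; }
    if (key < t->key) insert(t->left, key, rotation);
    else if (t->key < key) insert(t->right, key, rotation);
    else return;                                         // duplicate: nothing to do
    update(t);
    if (bf(t) == 2) {                                    // left side too tall
        if (bf(t->left) < 0) { rotateLeft(t->left); rotation = "LR"; }
        else rotation = "LL";
        rotateRight(t);
    } else if (bf(t) == -2) {                            // right side too tall
        if (bf(t->right) > 0) { rotateRight(t->right); rotation = "RL"; }
        else rotation = "RR";
        rotateLeft(t);
    }
}
```

Replaying the month example (MARCH, MAY, NOVEMBER, AUGUST, APRIL, JANUARY, DECEMBER, JULY, FEBRUARY, JUNE, OCTOBER, SEPTEMBER) should produce the rotations RR, LL, LR, RL, LR, RR at the same steps as the slides and finish with JANUARY at the root. A rotation restores the subtree's pre-insertion height, so at most one rotation is needed per insertion.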


AVL Tree: Rebalancing (2)

Figure: the new node Y can be inserted into one of four positions below the unbalanced node A, giving the four rotation types LL, LR, RL, and RR


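AVL Tree: Rebalancing (LL)

Balanced subtree (height h+2): A(+1) has left child B(0), whose subtrees BL and BR have height h, and right subtree AR of height h.

Unbalanced following insertion: the height of BL increases to h+1, so B becomes +1 and A becomes +2.

Rotation type LL: B(0) becomes the subtree root, with left subtree BL and right child A(0) over BR and AR. The subtree height returns to h+2, and the key order BL < B < BR < A < AR is preserved.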


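AVL Tree: Rebalancing (RR)

Balanced subtree (height h+2): A(-1) has left subtree AL of height h and right child B(0), whose subtrees BL and BR have height h.

Unbalanced following insertion: the height of BR increases to h+1, so B becomes -1 and A becomes -2.

Rotation type RR: B(0) becomes the subtree root, with left child A(0) over AL and BL and right subtree BR. The subtree height returns to h+2, and the key order AL < A < BL < B < BR is preserved.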


AVL Tree: Rebalancing (LR)

Balanced subtree: A(+1) has only a left child, B(0).

Unbalanced following insertion: the new node C is inserted as the right child of B, so B becomes -1 and A becomes +2.

Rotation type LR(a): C(0) becomes the subtree root, with left child B(0) and right child A(0); the key order B < C < A is preserved.


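AVL Tree: Rebalancing (LR)

Balanced subtree (height h+2): A(+1) has left child B(0) and right subtree AR of height h; B has left subtree BL of height h and right child C(0), whose subtrees CL and CR have height h-1.

Unbalanced following insertion into CL: C becomes +1, B becomes -1, and A becomes +2.

Rotation type LR(b): C(0) becomes the subtree root, with left child B(0) over BL and CL and right child A(-1) over CR and AR. The subtree height returns to h+2, and the key order BL < B < CL < C < CR < A < AR is preserved.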


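AVL Tree: Rebalancing (LR)

Balanced subtree (height h+2): A(+1) has left child B(0) and right subtree AR of height h; B has left subtree BL of height h and right child C(0), whose subtrees CL and CR have height h-1.

Unbalanced following insertion into CR: C becomes -1, B becomes -1, and A becomes +2.

Rotation type LR(c): C(0) becomes the subtree root, with left child B(+1) over BL and CL and right child A(0) over CR and AR; the subtree height returns to h+2.

RL(a), RL(b), and RL(c) are symmetric to LR(a), LR(b), and LR(c).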


Paged Binary Tree (1)

Page: the unit of disk I/O for seeking and transferring disk data; typically 4 KB, 8 KB, 16 KB, ...

Paged binary tree: divide a binary tree into pages, and store each page in a block of contiguous locations on disk. If every page holds 7 keys, any of 511 nodes (keys) can be reached with only three seeks.

Performance (number of seeks):
- A completely full balanced binary tree: $\log_2(N+1)$
- A completely full paged tree: $\log_{k+1}(N+1)$, where k is the number of keys held in a single page
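The paged-tree formula can be checked with integer arithmetic: a completely full tree of height h with k keys per page holds (k+1)^h - 1 keys, so the seek count is the smallest h for which that capacity reaches N. With k = 1 this reduces to the binary-tree case. pagedTreeSeeks is an illustrative name, and the sketch does not guard against overflow for very large inputs.

```cpp
#include <cassert>
#include <cstdint>

// Seeks needed to reach any key in a completely full paged tree holding
// n keys with k keys per page: the smallest h with (k+1)^h - 1 >= n,
// i.e. ceil(log_{k+1}(n + 1)), computed without floating point.
int pagedTreeSeeks(std::uint64_t n, std::uint64_t k) {
    if (k == 0) return -1;                    // a page must hold at least one key
    int height = 0;
    std::uint64_t capacity = 0;               // (k + 1)^height - 1
    while (capacity < n) {
        capacity = capacity * (k + 1) + k;    // (k+1)^(h+1) - 1 = (k+1)((k+1)^h - 1) + k
        ++height;
    }
    return height;
}
```

With 7 keys per page, 511 keys fit in three levels and the 512th forces a fourth. With 511 keys per page, three seeks already cover 512^3 - 1 = 134,217,727 keys.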


Paged Binary Tree (2)


The Problem with Paged Trees

Paged trees work well only when the entire set of keys is in hand before the tree is built

When keys arrive in arbitrary order, the tree goes out of balance, which raises these problems:
- How to select a good separator key
- How to group keys into pages
- How to guarantee the maximum loading of each page

The B-tree provides a solution to these problems


Paged Binary Tree (Out of Balance)

Figure: a paged binary tree built from the random input sequence C S D T A M P I B W N G U R K E H O L J Y Q Z F X V; the pages end up out of balance


Multilevel Indexing

- Approach as a simple index record: limited in the number of keys allowed
- Approach as a multirecord index: consists of a sequence of simple index records; binary search across them is too expensive
- Approach as a multilevel index: reduces the number of records to be searched and speeds up the search

<example> an 80-Mbyte file of 8,000,000 records with 10-byte keys

9.4 Multilevel Indexing : A Better Approach to Tree Indexes


Example of Multilevel Indexing

Figure: a four-level index over the example file, with up to 100 keys per index record:
- 4th-level index: a single index record with 8 keys
- 3rd-level index: 8 index records that index the largest keys in the 800 second-level records
- 2nd-level index: 800 index records with 80,000 keys; one of the keys in each lowest-level index record is chosen as the key of that whole record
- Lowest-level index: an index to the data file; its reference fields are record addresses in the data file
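The level sizes in this example follow from repeated division by the number of keys per index record. A sketch, assuming 100 keys per index record as in the figure and one key per lower-level record at the next level up; indexLevels is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Index records needed at each level, lowest level first. Each index
// record holds up to perRecord keys, and the level above stores one key
// per record of the level below, until one record suffices.
std::vector<std::uint64_t> indexLevels(std::uint64_t dataRecords, std::uint64_t perRecord) {
    std::vector<std::uint64_t> levels;
    std::uint64_t keys = dataRecords;               // keys to index at this level
    do {
        std::uint64_t records = (keys + perRecord - 1) / perRecord;   // ceiling division
        levels.push_back(records);
        keys = records;                             // one key per record moves up
    } while (keys > 1);
    return levels;
}
```

For 8,000,000 data records this gives 80,000, 800, 8, and 1 index records, i.e. the four levels shown in the figure.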


Multilevel Indexing (3)

How can we insert new keys into the multilevel index?
- The index records at some level might be full
- Several levels of the index might have to be rebuilt
- Overflow chains may help, but they are still ugly

A multilevel index structure is weak in dynamic data-processing applications

The B-tree gives the right solution


B-Trees: Working up from the Bottom

Bayer and McCreight, 1972, Acta Informatica

Build trees upward from the bottom instead of downward from the top

Each node of a B-tree is an index record, which consists of “key-reference” pairs

The order of a B-tree: the maximum number of key-reference pairs in a node

Every index record except the root should hold at least half of the order

9.5 B-Trees: Working up from the bottom


Sample B-Tree

Root: D P T

Leaves: A C D | I M P | S T

Each key in the root is the largest key of the leaf it points to


Splitting & Promoting (1)

Splitting: creation of two nodes out of one because the original node has become overfull. A split results in the need to promote a key to a higher-level node to provide an index separating the two new nodes.

Promotion of a key: movement of a key from one node into a higher-level node when a split occurs

9.6 Example of Creating a B-Tree


Splitting & Promoting (2)

Figure: the initial leaf of a B-tree with a page size of seven holds A B C D E F G. Inserting the J key overfills it, so the leaf is split to accommodate J: A B C D | E F G J (continued)


Splitting & Promoting (3)

Figure: promotion into a new root node. The leaves A B C D and E F G J are now indexed by a root node holding D and J, the largest key of each leaf.


Insertion in B-tree (1)

Input sequence: C S D T A M P I B W N G U R K E H O L J Y Q Z F X V

Insertion of C, S, D, T into the initial page: C D S T

Insertion of A causes the node to split, and the largest key in each leaf node (D and T) is placed in the root node: root D T; leaves A C D | S T


Insertion in B-tree (2)

M and P are inserted into the rightmost leaf node; then insertion of I causes it to split: root D P T; leaves A C D | I M P | S T


Insertion in B-tree (3)

Insertions of B, W, N, and G into leaf nodes cause another split, and the root is now full: root D M P W; leaves A B C D | G I M | N P | S T W


Insertion in B-tree (4)

Insertion of U proceeds without incident, but R would have to be inserted into the rightmost leaf, which is full: root D M P W; leaves A B C D | G I M | N P | S T U W


Insertion in B-tree (5)

Insertion of R causes the rightmost leaf node to split; the resulting insertion into the root causes the root to split, and the tree grows to level three: root P W; level 2: D M P | T W; leaves A B C D | G I M | N P | R S T | U W


Insertion in B-tree (6)

Insertions of K, E, H, O, L, J, Y, Q, and Z continue, with another node split: root P Z; level 2: D I M P | T Z; leaves A B C D | E G H I | J K L M | N O P | Q R S T | U W Y Z


Insertion in B-tree (7)

Insertions of F, X, and V finish the insertion of the alphabet: root I P Z; level 2: D G I | M P | T X Z; leaves A B C D | E F G | H I | J K L M | N O P | Q R S T | U V W X | Y Z


Insertion in B-trees

Major components of insertion: split the node, promote the middle key, and increase the height of the B-tree

Insertion touches no more than 2 nodes per level

Insertion cost is strictly linear in the height of the tree
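The alphabet example can be replayed in memory. This is a sketch of the variant these slides use (all keys in leaves, each interior entry holding the largest key of its child, an overfull node keeping its first half and moving the rest to a new node). It is not the book's BTree class: nodes live in a vector whose indices stand in for RRNs, keys are single characters, and there is no disk I/O. LargestKeyBTree and its methods are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class LargestKeyBTree {
public:
    explicit LargestKeyBTree(std::size_t maxKeys) : maxKeys_(maxKeys) {
        nodes_.push_back(Node{true, {}, {}});               // empty root leaf
    }
    void insert(char key) {
        int sibling = insertAt(root_, key);
        if (sibling == -1) return;
        // The root split: a new root indexes both halves (tree grows a level).
        nodes_.push_back(Node{false, {largest(root_), largest(sibling)}, {root_, sibling}});
        root_ = static_cast<int>(nodes_.size()) - 1;
        ++height_;
    }
    std::string rootKeys() const { return std::string(nodes_[root_].keys.begin(), nodes_[root_].keys.end()); }
    int height() const { return height_; }
    std::string leafKeys() const { std::string out; collect(root_, out); return out; }

private:
    struct Node {
        bool leaf;
        std::vector<char> keys;   // leaf: the keys; interior: largest key of each child
        std::vector<int> kids;    // interior only: child indices
    };

    char largest(int n) const { return nodes_[n].keys.back(); }

    // Inserts key into subtree n; returns the index of a new right sibling
    // if n overflowed and split, otherwise -1.
    int insertAt(int n, char key) {
        if (nodes_[n].leaf) {
            std::vector<char>& keys = nodes_[n].keys;
            auto pos = std::lower_bound(keys.begin(), keys.end(), key);
            if (pos != keys.end() && *pos == key) return -1;  // duplicate
            keys.insert(pos, key);
        } else {
            // First child whose largest key >= key, or the last child.
            std::size_t i = 0;
            while (i + 1 < nodes_[n].keys.size() && nodes_[n].keys[i] < key) ++i;
            int child = nodes_[n].kids[i];
            int sibling = insertAt(child, key);   // may grow nodes_, so re-index below
            nodes_[n].keys[i] = largest(child);   // the child's largest key may have changed
            if (sibling != -1) {
                nodes_[n].keys.insert(nodes_[n].keys.begin() + i + 1, largest(sibling));
                nodes_[n].kids.insert(nodes_[n].kids.begin() + i + 1, sibling);
            }
        }
        return nodes_[n].keys.size() > maxKeys_ ? split(n) : -1;
    }

    // Keeps the first (size + 1) / 2 entries in n and moves the rest to a new node.
    int split(int n) {
        std::size_t mid = (nodes_[n].keys.size() + 1) / 2;
        Node right{nodes_[n].leaf, {}, {}};
        right.keys.assign(nodes_[n].keys.begin() + mid, nodes_[n].keys.end());
        nodes_[n].keys.resize(mid);
        if (!right.leaf) {
            right.kids.assign(nodes_[n].kids.begin() + mid, nodes_[n].kids.end());
            nodes_[n].kids.resize(mid);
        }
        nodes_.push_back(right);
        return static_cast<int>(nodes_.size()) - 1;
    }

    void collect(int n, std::string& out) const {
        if (nodes_[n].leaf) { out.append(nodes_[n].keys.begin(), nodes_[n].keys.end()); return; }
        for (int c : nodes_[n].kids) collect(c, out);
    }

    std::size_t maxKeys_;
    int root_ = 0;
    int height_ = 1;
    std::vector<Node> nodes_;
};
```

With at most four keys per node, inserting C S D T A should give root D T. Feeding in the whole input sequence should end with root I P Z, height 3, and the leaves holding A through Z in order, as on slide (7).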


Class BTreeNode (1)

Represents B-tree nodes in memory

A B-tree is an index file associated with a data file

Specified in btnode.h of Appendix I

The template class BTreeNode is derived from the SimpleIndex template class

9.7 An Object-Oriented Representation of B-Trees

Figure: class hierarchy
- SimpleIndex class: public methods Insert, Remove, Clear, Search, Print, NumKeys
- BTreeNode class (derived from SimpleIndex): public methods Insert, Remove, LargestKey, Split, Pack, Unpack


Class BTreeNode (2)

Members

Public methods:
- Insert: simply calls SimpleIndex::Insert and then checks for overflow
- Remove a key; split and merge nodes
- Search: inherited from the SimpleIndex class (works perfectly well)
- Pack/Unpack: manage the difference between the memory and the disk representations of BTreeNode objects

Protected members: store the file address of the node and the minimum and maximum number of keys
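Pack and Unpack convert between a node's in-memory form and a fixed-size byte record, so that every node occupies the same slot in the index file. The layout below (a 32-bit key count, fixed 8-byte NUL-padded key slots, then 32-bit record addresses, in native byte order) is an assumption for illustration and is not the book's IOBuffer format. MemNode, Pack, and Unpack are stand-ins, not the BTreeNode methods.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical in-memory node: keys plus one record address per key.
struct MemNode {
    std::vector<std::string> keys;
    std::vector<std::int32_t> recAddrs;
};

// Assumed layout: int32 keyCount | maxKeys slots of KeySize bytes | maxKeys int32 refs
constexpr std::size_t KeySize = 8;

std::size_t PackedSize(std::size_t maxKeys) {
    return sizeof(std::int32_t) + maxKeys * (KeySize + sizeof(std::int32_t));
}

std::vector<char> Pack(const MemNode& n, std::size_t maxKeys) {
    assert(n.keys.size() <= maxKeys && n.keys.size() == n.recAddrs.size());
    std::vector<char> buf(PackedSize(maxKeys), '\0');          // unused slots stay zero
    std::int32_t count = static_cast<std::int32_t>(n.keys.size());
    std::memcpy(buf.data(), &count, sizeof count);
    char* keyArea = buf.data() + sizeof count;
    char* refArea = keyArea + maxKeys * KeySize;
    for (std::size_t i = 0; i < n.keys.size(); ++i) {
        // Keys longer than KeySize are truncated in this sketch.
        std::memcpy(keyArea + i * KeySize, n.keys[i].data(), std::min(n.keys[i].size(), KeySize));
        std::memcpy(refArea + i * sizeof(std::int32_t), &n.recAddrs[i], sizeof(std::int32_t));
    }
    return buf;
}

MemNode Unpack(const std::vector<char>& buf, std::size_t maxKeys) {
    MemNode n;
    std::int32_t count = 0;
    std::memcpy(&count, buf.data(), sizeof count);
    const char* keyArea = buf.data() + sizeof count;
    const char* refArea = keyArea + maxKeys * KeySize;
    for (std::size_t i = 0; i < static_cast<std::size_t>(count); ++i) {
        const char* slot = keyArea + i * KeySize;
        n.keys.emplace_back(slot, std::find(slot, slot + KeySize, '\0'));  // drop NUL padding
        std::int32_t ref = 0;
        std::memcpy(&ref, refArea + i * sizeof(std::int32_t), sizeof ref);
        n.recAddrs.push_back(ref);
    }
    return n;
}
```

Because the record size depends only on maxKeys, a node with one key and a full node pack to the same number of bytes, which is what lets the B-tree file address nodes by RRN.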


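template <class keyType> class BTree;   // forward declaration, needed by the friend below

template <class keyType>
class BTreeNode : public SimpleIndex<keyType>
{
public:
    BTreeNode(int maxKeys, int unique = 1);
    int Insert(const keyType key, int recAddr);       // SimpleIndex::Insert plus an overflow check
    int Remove(const keyType key, int recAddr = -1);
    keyType LargestKey();                             // value of the largest key in the node
    int Split(BTreeNode<keyType>* newNode);           // move half of the keys into newNode
    int Pack(IOBuffer& buffer);                       // memory -> disk representation
    int Unpack(IOBuffer& buffer);                     // disk -> memory representation
protected:
    int MaxBKeys;                                     // maximum number of keys in a node
    int Init();
    friend class BTree<keyType>;
};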


Class BTree

Uses in-memory BTreeNode objects, adds the file access portion, and enforces a consistent node size; specified in btree.h of Appendix I

Methods:
- Create, Open, and Close a B-tree
- Search, Insert, and Remove key-reference pairs

Protected area:
- Fetch (transfers nodes from disk to memory) and Store (transfers nodes back to disk)
- The root node, the height of the tree, and the file of index records
- BTNode **Nodes: keeps a collection of tree nodes in memory to reduce disk access


template <class keyType>
class BTree
{
public:
    BTree(int order, int keySize = sizeof(keyType), int unique = 1);
    int Open(char* name, int mode);
    int Create(char* name, int mode);
    int Close();
    int Insert(const keyType key, const int recAddr);
    int Remove(const keyType key, const int recAddr = -1);
    int Search(const keyType key, const int recAddr = -1);
protected:
    typedef BTreeNode<keyType> BTNode;     // shorthand for the node type
    BTNode* FindLeaf(const keyType key);   // load the branch down to the leaf for key
    BTNode* Fetch(const int recaddr);      // read a node from disk
    int Store(BTNode*);                    // write a node back to disk
    BTNode Root;                           // root node, kept in memory
    int Height;                            // height of the tree
    int Order;                             // order of the tree
    BTNode** Nodes;                        // in-memory nodes along the current branch
    RecordFile<BTNode> BTreeFile;          // file of index records
};


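Page Structure

Figure: the B-tree from Insertion in B-tree (4) (root D M P W; leaves A B C D | G I M | N P | S T U W) stored as pages. Each page holds a KEYCOUNT, a KEY array, and a CHILD array of page RRNs; the root is page 2.

Page 2: KEYCOUNT 4; KEY D M P W; CHILD 0 3 8 5

Page 3: KEYCOUNT 3; KEY G I M; CHILD Nil Nil Nil Nil

9.8 B-Tree Methods: Search, Insert, and Others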


Algorithm for Search

The searching procedure is iterative and works in two stages, operating alternately on entire pages (class BTree) and then within pages (class BTreeNode):

Step 1: Load a page into memory.

Step 2: Search through the page for the key, continuing down the tree until the leaf level is reached.


Search and FindLeaf Methods

Specifications of the Search and FindLeaf methods (Fig. 9.18):

Search method
template <class keyType> int BTree<keyType>::Search(const keyType key, const int recAddr)
recAddr = btree.Search(‘L’) calls FindLeaf(‘L’) and then searches for the key in the leaf node; if the key exists, it returns the data file address of the record with key ‘L’, otherwise -1.

FindLeaf method
template <class keyType> BTreeNode<keyType>* BTree<keyType>::FindLeaf(const keyType key)
Searches down to the leaf node, beginning at the root, and returns the address of the leaf node.
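Search can be sketched against the page layout from the Page Structure slide. Everything here is an assumption for illustration: Page, PageFile, and the free functions FindLeaf and Search only mirror the roles of the book's BTree methods, the unused RRNs (1, 4, 6, 7) are left empty, and the leaf record addresses are made up. The fetch counter shows one page read per level.

```cpp
#include <cassert>
#include <vector>

// Hypothetical fixed-size page: a key count, a KEY array, and parallel
// references. Interior pages refer to child RRNs; leaf pages are assumed
// to store data-file record addresses.
struct Page {
    bool leaf = true;
    int keyCount = 0;
    char keys[4] = {};
    int refs[4] = {};
};

// Stand-in for the index file: pages indexed by RRN; Fetch counts page reads.
struct PageFile {
    std::vector<Page> pages;
    mutable int fetches = 0;
    const Page& Fetch(int rrn) const { ++fetches; return pages[rrn]; }
};

// Walks from the root to the leaf that would hold key. Interior keys are the
// largest key of each child, so follow the first key >= key (or the last).
const Page& FindLeaf(const PageFile& file, int rootRRN, char key) {
    const Page* p = &file.Fetch(rootRRN);
    while (!p->leaf) {
        int i = 0;
        while (i < p->keyCount - 1 && p->keys[i] < key) ++i;
        p = &file.Fetch(p->refs[i]);
    }
    return *p;
}

// Returns the record address stored with key in its leaf, or -1.
int Search(const PageFile& file, int rootRRN, char key) {
    const Page& leaf = FindLeaf(file, rootRRN, key);
    for (int i = 0; i < leaf.keyCount; ++i)
        if (leaf.keys[i] == key) return leaf.refs[i];
    return -1;
}

// Root page 2 (D M P W -> 0 3 8 5) and its leaves; addresses are made up.
PageFile SamplePages() {
    PageFile file;
    file.pages.resize(9);
    auto makeLeaf = [&file](int rrn, const char* ks) {
        Page& p = file.pages[rrn];
        for (; *ks; ++ks, ++p.keyCount) {
            p.keys[p.keyCount] = *ks;
            p.refs[p.keyCount] = 100 * (*ks - 'A');   // hypothetical record address
        }
    };
    makeLeaf(0, "ABCD"); makeLeaf(3, "GIM"); makeLeaf(8, "NP"); makeLeaf(5, "STUW");
    file.pages[2] = Page{false, 4, {'D', 'M', 'P', 'W'}, {0, 3, 8, 5}};
    return file;
}
```

Searching for N reads the root and then page 8, so two fetches for a two-level tree. Searching for L descends to page 3 (G I M), does not find the key, and returns -1, matching the behavior described for btree.Search(‘L’).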


Algorithm for Insertion (1)

Observations on insertion, splitting, and promotion:
- The search proceeds all the way down to the leaf level
- After the insertion location is found at the leaf level, the work proceeds upward from the bottom

The iterative procedure has three phases:
1. Search to the leaf level, using the FindLeaf method
2. Insertion, overflow detection, and splitting on the upward path
3. Creation of a new root node, if the current root was split


Algorithm for Insertion (2)

With no redistribution:

(Step 1) Locate the node on the bottommost level in which to insert the record. The location is determined by a key search.

(Step 2) If a vacant record slot is available, insert the record so that key sequencing is maintained, and update the pointer associated with the record (the pointer is null for level-0 records). Then stop.

(Step 3) If no vacant record slot exists, identify the median record. All records and pointers to the left of the median record are stored in one node (the original), and those to the right are stored in another node (the new node).


Algorithm for Insertion (3)

(Step 4) If the topmost node was split, create a new topmost node that contains the median record identified in Step 3, with pointers to the original node and the new (split-off) node. Update the root pointer to point to the new topmost node. Then stop.

(Step 5) If the node that was split is not the topmost node, prepare to insert the median record identified in Step 3, together with a pointer to the new node created in Step 3, into the parent node. Then go to Step 2.

Note: Step 4 makes the B-tree grow in height by one level. On average, B-trees have about 70% occupancy (like B+-trees).

9.8 B-Tree Methods Search, Insert, and Others
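The insertion steps can be sketched in C++ as a recursive insertion that descends to the leaf, inserts in key order, and splits on the way back up. This is a hypothetical in-memory sketch, assuming a node capacity `kOrder` of four keys and the convention of the insertion example that follows, where the lower half of an overflowing node (including the median) stays in the original node and the largest key of each half is copied into the parent. `Node`, `split`, `insertRec`, and `insert` are illustrative names, not the book's BTree methods.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

constexpr std::size_t kOrder = 4;  // assumed maximum number of keys per node

struct Node {
    std::vector<int> keys;                        // interior key = child's largest key
    std::vector<std::shared_ptr<Node>> children;  // empty for a leaf
    bool isLeaf() const { return children.empty(); }
};
using NodePtr = std::shared_ptr<Node>;

// Step 3: the lower half (including the median) stays, the upper half moves.
NodePtr split(Node& node) {
    assert(node.keys.size() > 1);
    auto right = std::make_shared<Node>();
    std::size_t keep = (node.keys.size() + 1) / 2;  // 5 keys -> 3 stay
    right->keys.assign(node.keys.begin() + keep, node.keys.end());
    node.keys.resize(keep);
    if (!node.isLeaf()) {
        right->children.assign(node.children.begin() + keep, node.children.end());
        node.children.resize(keep);
    }
    return right;
}

// Steps 1, 2 and 5: descend, insert in key order, absorb splits on the way up.
// Returns the new sibling if this node had to split, otherwise nullptr.
NodePtr insertRec(Node& node, int key) {
    if (node.isLeaf()) {
        node.keys.insert(std::lower_bound(node.keys.begin(), node.keys.end(), key), key);
    } else {
        std::size_t i = 0;
        while (i + 1 < node.keys.size() && node.keys[i] < key) ++i;
        Node& child = *node.children[i];
        NodePtr sibling = insertRec(child, key);
        node.keys[i] = child.keys.back();        // child's largest key may change
        if (sibling) {                           // promote a key for the new node
            node.keys.insert(node.keys.begin() + i + 1, sibling->keys.back());
            node.children.insert(node.children.begin() + i + 1, sibling);
        }
    }
    if (node.keys.size() > kOrder) return split(node);
    return nullptr;
}

// Step 4: if the root split, create a new root over both halves.
void insert(NodePtr& root, int key) {
    NodePtr sibling = insertRec(*root, key);
    if (sibling) {
        auto newRoot = std::make_shared<Node>();
        newRoot->keys = {root->keys.back(), sibling->keys.back()};
        newRoot->children = {root, sibling};
        root = newRoot;
    }
}
```

Replaying the keys 3, 19, 4, 20, 1, 13, 16, 9 with this sketch gives a root 4 16 20 over the leaves 1 3 4, 9 13 16, and 19 20, the same final tree as the insertion example.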


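Insertion Example

Each node holds at most four keys; the numbers 0 to 3 label the nodes as in the original figure.

Insert 3: node 0 = [3] (the root)

Insert 19, 4, 20: node 0 = [3 4 19 20]

Insert 1: node 0 overflows and splits into node 0 = [1 3 4] and node 1 = [19 20]; a new root, node 2 = [4 20], is created

Insert 13, 16: node 1 = [13 16 19 20]; root node 2 = [4 20]

Insert 9: node 1 overflows and splits into node 1 = [9 13 16] and node 3 = [19 20]; root node 2 = [4 16 20]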

9.8 B-Tree Methods Search, Insert, and Others


Create, Open, and Close

9.8 B-Tree Methods Search, Insert, and Others

Specified in btree.tc of Appendix I.

Method Create writes the empty root node into the file BTreeFile so that its first record is reserved for the root node.

Method Open opens BTreeFile and loads the root node into memory from the first record in the file.

Method Close simply stores the root node into BTreeFile and closes it.
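A hypothetical sketch of Create, Open, and Close using `std::fstream`, assuming a fixed-size node record. The `NodeRecord` layout and the `BTreeFileSketch` class are invented for illustration; the real layout is the one in btree.tc of Appendix I. What the sketch shows is the convention that record 0 always holds the root.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical fixed-size node record (not the Appendix I layout).
struct NodeRecord {
    std::int32_t numKeys = 0;
    std::int32_t keys[4] = {0, 0, 0, 0};
    std::int32_t children[5] = {-1, -1, -1, -1, -1};  // child record numbers
};

class BTreeFileSketch {
public:
    // Create: write an empty root so that record 0 is reserved for it.
    bool create(const std::string& path) {
        std::ofstream out(path, std::ios::binary | std::ios::trunc);
        NodeRecord empty;
        out.write(reinterpret_cast<const char*>(&empty), sizeof empty);
        return static_cast<bool>(out);
    }
    // Open: open the file and load the root from record 0.
    bool open(const std::string& path) {
        file_.open(path, std::ios::binary | std::ios::in | std::ios::out);
        if (!file_) return false;
        file_.seekg(0);
        file_.read(reinterpret_cast<char*>(&root), sizeof root);
        return static_cast<bool>(file_);
    }
    // Close: store the in-memory root back into record 0, then close the file.
    void close() {
        assert(file_.is_open());
        file_.seekp(0);
        file_.write(reinterpret_cast<const char*>(&root), sizeof root);
        file_.close();
    }
    NodeRecord root;  // the root node kept in memory while the file is open

private:
    std::fstream file_;
};
```

Keeping the root in memory between Open and Close means every search starts without a disk access for the root page.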


B-Tree Nomenclature

Be aware that terms are not used uniformly in the literature.

Definitions also differ considerably.

In fact, there are a number of B-tree variations.

This textbook uses “B-tree” for what other books call a B+ tree.

In this book, “B+ tree” means a B+ tree with a linked list of sorted data blocks.

9.9 B-Tree Nomenclature


[Figure: root C G; nodes A B | E F | H I, with pointers to data blocks. Other books: B-tree; our book: N/A]


[Figure: root C G I; leaves A B C | E F G | H I, with pointers to data blocks. Other books: B+-tree; our book: B-tree]


[Figure: the same tree, with the sorted data blocks linked into a list. Other books: B+-tree with linked list; our book: B+-tree]


Another Aspect (Node Structures): Homogeneous Trees, i.e. the B-Tree in Other Texts

Homogeneous trees: leaf nodes and interior nodes have the same structure; each contains both data pointers and tree pointers.

The average search length is shorter for homogeneous trees, because some searches may conclude before reaching a leaf node.


B-Tree in Other Texts

[Figure: root 37 64; second level 8 23 | 45 53 | 85 91; leaves 1 7 | 14 20 | 27 36 | 38 40 | 50 52 | 60 | 70 80 | 88 | 95. The keys at all levels together carry the 23 pointers to the 23 records in the data file.]


Another Aspect (Node Structures): Heterogeneous Trees, i.e. the B+-Tree in Other Texts

Heterogeneous trees: leaf nodes and interior nodes have different structures.


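B+-Tree in Other Texts

[Figure: root 37 64; second level 14 23 | 45 53 | 85 91; the leaf level holds all 23 keys in sorted order (1 7 8 14 20 23 27 36 37 38 40 45 50 52 53 60 64 70 80 85 88 91 95) and carries the 23 pointers to the 23 records in the data file.]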


Comparison of B-Tree and B+-Tree in Other Texts

Topic | B-Tree | B+-Tree

Algorithm complexity for insertion | rather complex | simpler

Retrieval efficiency | less efficient (the B-tree is tall and spindly) | more efficient (the B+-tree is short and bushy)

Storage efficiency | slightly more efficient (uses less space) | less efficient (uses more space)

1-pass structure creation algorithms | rather complex | simple


Comparison of B-Tree and B+-Tree in Other Texts (Cont’d)

Historical note:

B-tree: Bayer & McCreight

B+-tree: Comer

B*-tree: Knuth; B-trees with 67% minimum occupancy

B÷÷-trees : B+-trees with 67% minimum occupancy


Formal Definition of B-Tree Properties

The properties of a B-tree of order m:

1. Every page has a maximum of m descendants.

2. Every page, except for the root and the leaves, has at least ⌈m/2⌉ descendants.

3. The root has at least two descendants (unless it is a leaf).

4. All the leaves appear on the same level.

5. The leaf level forms a complete, ordered index of the associated data file.

9.10 Formal Definition of B-Tree Properties
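To make properties 1 to 4 concrete, here is a hypothetical checker over an in-memory tree. It assumes that the entries of a leaf count as its descendants (each refers to a data record), and it does not check property 5, the ordering of the leaf level. `Node`, `checkNode`, and `isValidBTree` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Node {
    std::vector<int> keys;
    std::vector<std::shared_ptr<Node>> children;  // empty for a leaf
};

// Returns the depth of every leaf below node, or -1 if a property is violated.
int checkNode(const Node& node, std::size_t m, bool isRoot, int depth) {
    if (node.children.empty())                            // leaf: at most m entries
        return node.keys.size() <= m ? depth : -1;
    const std::size_t n = node.children.size();
    const std::size_t minChildren = isRoot ? 2 : (m + 1) / 2;  // (m + 1) / 2 == ceil(m / 2)
    if (n > m || n < minChildren) return -1;              // properties 1, 2, 3
    int leafDepth = -1;
    for (const auto& child : node.children) {
        int d = checkNode(*child, m, false, depth + 1);
        if (d < 0) return -1;
        if (leafDepth >= 0 && d != leafDepth) return -1;  // property 4
        leafDepth = d;
    }
    return leafDepth;
}

bool isValidBTree(const Node& root, std::size_t m) {
    assert(m >= 3);
    return checkNode(root, m, true, 1) >= 0;
}
```

A root that is itself a leaf passes trivially, matching the "unless it is a leaf" exception in property 3.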


Worst-Case Search Depth (1)

Search depth: the depth of the tree.

Worst case: every page of the tree has only the minimum number of descendants, which gives maximal height with minimum breadth.

9.11 Worst-Case Search Depth


Worst-Case Search Depth (2)

B-tree with order m: minimum number of descendants at each level

Level 1 (root): $2$

Level 2: $2\lceil m/2 \rceil$

Level 3: $2\lceil m/2 \rceil^{2}$

...

Level d: $2\lceil m/2 \rceil^{d-1}$

For a tree with N keys in its leaves,

$$N \ge 2\lceil m/2 \rceil^{d-1}$$

which gives the upper bound for the depth d of a B-tree:

$$d \le 1 + \log_{\lceil m/2 \rceil}(N/2)$$

e.g., for a B-tree of order m = 512 holding N = 1,000,000 keys, $d \le 1 + \log_{256}(500{,}000) \approx 3.37$, so the tree is at most 3 levels deep (at most 3 disk accesses per search).

9.11 Worst-Case Search Depth
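The worst-case bound, d ≤ 1 + log base ⌈m/2⌉ of N/2, is easy to evaluate numerically. This small helper (a hypothetical name) computes it and reproduces the order-512, one-million-key example.

```cpp
#include <cassert>
#include <cmath>

// Worst-case depth bound for a B-tree of order m with n keys in its leaves:
// d <= 1 + log base ceil(m/2) of (n/2).
double maxDepthBound(double m, double n) {
    assert(m >= 3 && n >= 2);
    const double minFanout = std::ceil(m / 2.0);  // minimum descendants per page
    return 1.0 + std::log(n / 2.0) / std::log(minFanout);
}
```

Because depth is an integer, the bound of about 3.37 for m = 512 and N = 1,000,000 means a depth of at most 3.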


Deletion, Redistribution, and Concatenation

Goal: ensure that the B-tree properties are maintained after a deletion.

Algorithm (with redistribution and concatenation):

1. If the key to be deleted is not in a leaf, swap it with its immediate successor, which is in a leaf (that leaf may then need redistribution or concatenation).

2. Delete the key.

9.12 Deletion, Merging, and Redistribution


Deletion Algorithm (Cont’d)

3. If underflow occurs (the leaf now contains one too few keys):

3.1 If the left or right sibling has more than the minimum number of keys, redistribute.

3.2 Otherwise, concatenate the two leaves and the key from the parent that separates them into one leaf.

3.3 If the parent now underflows, apply step 3 to the parent as if a key had been deleted from it.

9.12 Deletion, Merging, and Redistribution
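Steps 1 and 2 can be sketched for the classical layout used in the deletion figures that follow (n keys and n + 1 children per page). This is a hypothetical illustration: `deleteKey` replaces an interior key by its immediate successor, removes the successor from its leaf, and returns that leaf so a caller can apply step 3. The underflow handling itself is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Classical B-tree page: n keys and n + 1 children.
struct Node {
    std::vector<char> keys;
    std::vector<std::shared_ptr<Node>> children;  // empty for a leaf
    bool isLeaf() const { return children.empty(); }
};

// Returns the leaf that lost a key (to be tested for underflow in step 3),
// or nullptr if the key is not in the tree.
Node* deleteKey(Node* node, char key) {
    assert(node != nullptr);
    while (node != nullptr) {
        std::size_t i = 0;
        while (i < node->keys.size() && node->keys[i] < key) ++i;
        const bool found = i < node->keys.size() && node->keys[i] == key;
        if (found && node->isLeaf()) {             // step 2: delete from the leaf
            node->keys.erase(node->keys.begin() + i);
            return node;
        }
        if (found) {                               // step 1: use the successor
            Node* leaf = node->children[i + 1].get();
            while (!leaf->isLeaf()) leaf = leaf->children.front().get();
            node->keys[i] = leaf->keys.front();    // successor takes key's place
            leaf->keys.erase(leaf->keys.begin());  // and leaves its old slot
            return leaf;
        }
        node = node->isLeaf() ? nullptr : node->children[i].get();
    }
    return nullptr;
}
```

Replacing the key by its successor and then deleting the successor has the same net effect as the swap-then-delete described in step 1.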


Redistribution

Occurs when a sibling has more than the minimum number of keys.

Idea: move keys between siblings.

This results in a change to the key in the parent page.

It does not propagate: the effects are strictly local.

How many keys should be moved? The number is not necessarily fixed, but an even distribution is desirable.

9.12 Deletion, Merging, and Redistribution
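Even redistribution through the parent can be sketched as pooling the two siblings' keys with their separator and splitting the pool in half. This hypothetical `redistribute` template handles leaf pages only (interior pages would also move child pointers). With the keys from the DELETE R example later in this section, it turns S | U | V W X Y Z into S U V | W | X Y Z.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pool left + separator + right, split evenly, and move the middle key up as
// the new separator. Returns how many keys each sibling gained or lost.
template <typename Key>
std::size_t redistribute(std::vector<Key>& left, Key& separator, std::vector<Key>& right) {
    std::vector<Key> pool(left);
    pool.push_back(separator);
    pool.insert(pool.end(), right.begin(), right.end());
    assert(pool.size() >= 3);
    const std::size_t oldLeft = left.size();
    const std::size_t mid = pool.size() / 2;       // even split point
    left.assign(pool.begin(), pool.begin() + mid);
    separator = pool[mid];                         // new key in the parent page
    right.assign(pool.begin() + mid + 1, pool.end());
    return oldLeft > mid ? oldLeft - mid : mid - oldLeft;
}
```

Only the two siblings and one parent key change, which is why redistribution never propagates upward.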


Concatenation (Merge)

Occurs in case of underflow when no sibling has a key to spare.

Combines the two pages and the key from the parent page into a single full page, reversing the splitting.

Concatenation must involve demotion of a key from the parent, which may cause underflow in the parent page: the effects propagate upward.

9.12 Deletion, Merging, and Redistribution
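A sketch of concatenation on leaf pages, assuming the classical layout: the separator is demoted from the parent, the right sibling's keys are appended, and the function reports whether the parent has now underflowed, so that the merge must propagate upward. `concatenate` and its `minKeys` parameter are hypothetical; removing the right sibling's pointer from the parent, and applying the root's looser minimum, are left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Merge right into left through the separator at parentKeys[sepIndex].
// Returns true if the parent now has fewer than minKeys keys.
template <typename Key>
bool concatenate(std::vector<Key>& parentKeys, std::size_t sepIndex,
                 std::vector<Key>& left, const std::vector<Key>& right,
                 std::size_t minKeys) {
    assert(sepIndex < parentKeys.size());
    left.push_back(parentKeys[sepIndex]);                 // demote the separator
    left.insert(left.end(), right.begin(), right.end());  // absorb the right sibling
    parentKeys.erase(parentKeys.begin() + sepIndex);      // parent loses a key
    return parentKeys.size() < minKeys;                   // underflow propagates?
}
```

With the DELETE A example later in this section (leaf C, separator D, sibling E F, parent D H), the merged leaf is C D E F and the parent is left with H alone, which underflows.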


e.g. Deletion (1)

Figure A: root I P Z; second level D G I | M P | T X Z; leaves A B C D | E F G | H I | J K L M | N O P | Q R S T | U V W X | Y Z

9.12 Deletion, Merging, and Redistribution


e.g. Deletion (2)

9.12 Deletion, Merging, and Redistribution

Removal of key C from Figure A: the change occurs only in a leaf node (A B C D becomes A B D); the rest of the tree is unchanged.


e.g. Deletion (3)

9.12 Deletion, Merging, and Redistribution

Result of deleting P from Figure A: leaf N O P becomes N O, and P changes to O in the second level (M P becomes M O) and in the root (I P Z becomes I O Z).


e.g. Deletion (4)

9.12 Deletion, Merging, and Redistribution

Result of deleting H from Figure A: removal of H caused an underflow in leaf H I, and the two leaf nodes E F G and I were merged into E F G I; the parent D G I becomes D I.


Redistribution during Insertion: A Way to Improve Storage Utilization

A way of avoiding the creation of new pages.

It tends to make a B-tree more efficient in terms of space utilization: without it, utilization is around 50% in the worst case and 67–69% in the average case; with redistribution during insertion, it is over 85%.

9.13 Redistribution During Insertion


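Original tree: root M; second level D H | Q U; leaves A C | E F | I J K | N O P | R S | V W X Y Z

DELETE J (no change above the leaf): leaf I J K becomes I K.

DELETE M (swap with N): M is swapped with its immediate successor N and then deleted from the leaf; the root becomes N and leaf N O P becomes O P.

9.13 Redistribution During Insertion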


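DELETE R (redistribution): leaf R S becomes S and underflows; its sibling V W X Y Z has keys to spare, so keys are redistributed through the parent. The leaves become S U V and X Y Z, and the parent Q U becomes Q W.

DELETE A (concatenation): leaf A C becomes C and underflows; its sibling E F has no key to spare, so C, the separator D, and E F are concatenated into C D E F. The parent D H loses D and becomes H, which underflows.

9.13 Redistribution During Insertion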


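NOW THE UNDERFLOW PROPAGATES UPWARD!

The underflowing page H is concatenated with its sibling Q W and the separator N from the root, giving H N Q W. The old root is left empty, so H N Q W becomes the new root: THE HEIGHT OF THE TREE DECREASES. The leaves are now C D E F | I K | O P | S U V | X Y Z.

9.13 Redistribution During Insertion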


B* Trees

Knuth, 1973, Addison-Wesley.

Use the redistribution operation during insertion, and perform a two-to-three split:

A page is split only when it has at least one sibling that is also full.

After the split, the pages are about 2/3 full: every page except the root has at least ⌈(2m − 1)/3⌉ descendants, i.e. ⌈(2m − 1)/3⌉ − 1 keys.

c.f. remember that an ordinary B-tree guarantees only ⌈m/2⌉ descendants, i.e. ⌈m/2⌉ − 1 keys.

9.14 B* Trees
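A two-to-three split can be sketched as pooling two full siblings, their separator, and the new key, then dividing the pool into three pages and two separators as evenly as possible. This hypothetical `twoToThreeSplit` works on leaf pages in the classical layout. With m = 5 (at most four keys per page), each resulting page gets at least two keys, i.e. ⌈(2m − 1)/3⌉ = 3 descendants.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct ThreeWay {
    std::vector<int> a, b, c;  // the three pages, left to right
    int sep1 = 0, sep2 = 0;    // the two separators for the parent
};

ThreeWay twoToThreeSplit(const std::vector<int>& left, int separator,
                         const std::vector<int>& right, int newKey) {
    std::vector<int> pool(left);                   // left < separator < right
    pool.push_back(separator);
    pool.insert(pool.end(), right.begin(), right.end());
    pool.insert(std::upper_bound(pool.begin(), pool.end(), newKey), newKey);
    assert(pool.size() >= 5);
    const std::size_t total = pool.size() - 2;     // keys for the three pages
    const std::size_t s1 = (total + 2) / 3;        // ceil(total / 3)
    const std::size_t s2 = (total + 1) / 3;        // s1 >= s2 >= total - s1 - s2
    ThreeWay r;
    r.a.assign(pool.begin(), pool.begin() + s1);
    r.sep1 = pool[s1];
    r.b.assign(pool.begin() + s1 + 1, pool.begin() + s1 + 1 + s2);
    r.sep2 = pool[s1 + 1 + s2];
    r.c.assign(pool.begin() + s1 + 2 + s2, pool.end());
    return r;
}
```

For two full pages 1 2 3 4 and 6 7 8 9 with separator 5, inserting 10 yields pages 1 2 3, 5 6 7, and 9 10 with separators 4 and 8.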


B* Tree (Cont’d)

[Figure: an original tree, and the two-to-three split that results from an insertion]

9.14 B* Trees


Buffering of B-Tree Pages: Virtual B-Trees

In practice, a B-tree is much larger than main memory, so pages of the B-tree must be buffered.

It is better to keep the root page in main memory.

Buffer replacement algorithm: LRU plus a page-height weighting factor.

Keep the pages of the top few levels in main memory at all times.

9.15 Buffering of Pages:Virtual B-Trees
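A minimal sketch of such a buffer, assuming the simplest form of height weighting: pages on the top `pinnedLevels` levels are never evicted, and all other pages share a fixed number of frames under LRU replacement. `PageBuffer` is a hypothetical class that only tracks page numbers and counts disk reads; page contents are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>
#include <unordered_set>

class PageBuffer {
public:
    PageBuffer(std::size_t frames, int pinnedLevels)
        : frames_(frames), pinnedLevels_(pinnedLevels) {
        assert(frames > 0);
    }

    // Request page `id`, which lives at tree level `level` (root = level 1).
    void fetch(int id, int level) {
        if (level <= pinnedLevels_) {               // top levels stay resident
            if (pinned_.insert(id).second) ++diskReads;  // read once, never evicted
            return;
        }
        auto it = where_.find(id);
        if (it != where_.end()) {                   // hit: mark most recently used
            lru_.splice(lru_.begin(), lru_, it->second);
            return;
        }
        ++diskReads;                                // miss: read the page from disk
        if (lru_.size() == frames_) {               // full: evict least recently used
            where_.erase(lru_.back());
            lru_.pop_back();
        }
        lru_.push_front(id);
        where_[id] = lru_.begin();
    }

    int diskReads = 0;

private:
    std::size_t frames_;
    int pinnedLevels_;
    std::unordered_set<int> pinned_;
    std::list<int> lru_;                                      // front = most recent
    std::unordered_map<int, std::list<int>::iterator> where_;
};
```

Since every search starts at the root, pinning the top levels turns their accesses into memory hits, and the LRU frames are left for the lower levels.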


Placement of Information Associated with the Key

How to store the associated information:

In a file where data and index are mingled: once the key is found, no more disk accesses are required.

In a separate file: a larger number of keys fits in a page, giving a higher order and a shallower tree.

9.15 Buffering of Pages:Virtual B-Trees


Variable-Length Records and Keys

A B-tree with variable-length keys has no single, fixed order, so it needs a different criterion for the overflow/underflow condition: the maximum/minimum number of bytes (c.f. the maximum/minimum number of keys).

Key promotion mechanism: the shortest variable-length keys are promoted in preference to longer ones, so that the pages with the largest numbers of descendants are high in the tree.

9.16 Variable-Length Records and Keys
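A hypothetical sketch of both ideas for string keys: a page overflows when its total size in bytes exceeds a budget (assuming a 2-byte length prefix per key), and a split promotes the shortest key within a small window around the middle rather than the exact median. The function names, the prefix size, and the `window` parameter are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

constexpr std::size_t kLengthPrefix = 2;  // assumed per-key overhead in bytes

// Bytes used by a page's keys, including the assumed length prefixes.
std::size_t pageBytes(const std::vector<std::string>& keys) {
    std::size_t total = 0;
    for (const auto& k : keys) total += k.size() + kLengthPrefix;
    return total;
}

// Overflow is judged by bytes, not by a fixed number of keys.
bool overflows(const std::vector<std::string>& keys, std::size_t maxBytes) {
    return pageBytes(keys) > maxBytes;
}

// Index of the key to promote: the shortest key within `window` of the middle.
std::size_t choosePromotion(const std::vector<std::string>& keys, std::size_t window) {
    assert(!keys.empty());
    const std::size_t mid = keys.size() / 2;
    const std::size_t lo = mid > window ? mid - window : 0;
    const std::size_t hi = std::min(keys.size() - 1, mid + window);
    std::size_t best = mid;
    for (std::size_t i = lo; i <= hi; ++i)
        if (keys[i].size() < keys[best].size()) best = i;
    return best;
}
```

For the page ANDERSON, BAKER, CHRISTOPHERSON, DAY, EDWARDSON with a window of one, the sketch promotes DAY instead of the much longer median CHRISTOPHERSON, so more keys fit in the upper-level pages.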


Let’s Review!!!

9.1 Introduction

9.2 Statement of the Problem

9.3 Indexing with Binary Search Trees

: AVL Trees, Paged Binary Trees, Problems with Paged Trees

9.4 Multilevel Indexing

9.5 B-Trees

9.6 Example of Creating a B-Tree

9.7 An Object-Oriented Representation of B-Trees

: Class BTreeNode , Class BTree


Let’s Review!!! (Cont’d)

9.8 B-Tree Methods Search, Insert, and Others

9.9 B-Tree Nomenclature

9.10 Formal Definition of B-Tree Properties

9.11 Worst-case Search Depth

9.12 Deletion, Merging, and Redistribution

9.13 Redistribution During Insertion

9.14 B* Trees

9.15 Buffering of Pages : Virtual B-Trees

9.16 Variable-Length Records and Keys