b+ tree - csce.uark.edusgauch/4523/textbook_slides/b+trees.pdf · a binary search tree is similar...

B+ TREE


Post on 24-Aug-2020


Page 1: B+ TREE - csce.uark.edusgauch/4523/textbook_slides/B+Trees.pdf · A Binary search tree is similar to a B Tree where M is 2 Note: it is a B tree, not a B+ tree, since data is stored

B+ TREE


OUTLINE
Preliminary terms
Database organization
Binary tree, binary search tree, B tree, and B+ tree

B+ tree Definition

Searching

Insertion

Deletion


PRELIMINARY TERMS
File organization: the logical relationships among the various records in the file
Page: logical unit of data transfer
Sector: physical unit of storage

Blocking factor: how many records fit in a block (page), i.e. floor(page size / record size)

Fixed-length record vs. variable-length record

Spanned record storage vs. unspanned record storage

Spanned: a record that does not fit in the space left at the end of a page continues on the next page. Unspanned: every record lies entirely within one page, so the leftover space is unused. Example: 1024/50 = 20.48 and floor(1024/50) = 20, so with unspanned storage each page keeps 20 records.
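The page arithmetic above can be sketched in a few lines of Python. This is only an illustration under the assumption of fixed-length records; the function name pages_needed is made up for this example and does not come from the slides.

```python
import math

def pages_needed(num_records, record_size, page_size, spanned=False):
    """Number of pages needed to store fixed-length records (illustrative helper)."""
    if spanned:
        # Spanned storage: records may cross page boundaries, so every byte is usable.
        return math.ceil(num_records * record_size / page_size)
    # Unspanned storage: each page holds only whole records.
    blocking_factor = page_size // record_size
    return math.ceil(num_records / blocking_factor)

print(1024 // 50)                                      # blocking factor: 20
print(pages_needed(10_000, 50, 1024))                  # unspanned: 500
print(pages_needed(10_000, 50, 1024, spanned=True))    # spanned: 489
```

Spanned storage saves a few pages here because the 24 bytes left over on each page are not wasted.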


DATABASE ORGANIZATION (CONT.)

The following shows different types of file organization
•  Heap file:
   •  New records are added at the end of the file
   •  Average search time: (# of pages) / 2

•  Example:
   Page size: 1024 bytes
   Record size: 50 bytes
   # of records in the table: 10,000
   To search for a record, how many pages will be read on average?
   floor(1024/50) = 20 records per page
   10,000/20 = 500 pages
   Average search time: 500/2 = 250 pages

Student ID   Name
12           All
3            Paul
6            Kate
8            Larry
10           Vince


DATABASE ORGANIZATION (CONT.)
Ordered file:
•  Records are ordered on a key field
•  Average search time: log n (n pages, base of logarithm is 2)

Example:
Page size: 512 bytes
Record size: 50 bytes
Blocking factor: floor(512/50) = 10
Records: 20,000
20,000/10 = 2000 pages
log 2000 ≈ 10.97, rounded up to 11 page reads

Student ID   Name     (each group of three records is one page)
5            Sam
7            Bill
9            Paul

12           Joe
14           Mong
16           Cherry

20           Vince
25           Mike
28           Hannah
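As a rough illustration of why an ordered file needs only about log n page reads, the sketch below binary-searches the three pages shown above and counts how many pages it touches. The function name find_page and the in-memory list of pages are assumptions made for the example; a real system would read pages from disk.

```python
def find_page(pages, key):
    """Binary-search sorted pages; return (page index or None, pages read)."""
    lo, hi = 0, len(pages) - 1
    reads = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        page = pages[mid]      # stands in for one page read
        reads += 1
        if key < page[0][0]:       # smaller than the first key on the page
            hi = mid - 1
        elif key > page[-1][0]:    # larger than the last key on the page
            lo = mid + 1
        else:
            found = any(k == key for k, _ in page)
            return (mid if found else None), reads
    return None, reads

pages = [
    [(5, "Sam"), (7, "Bill"), (9, "Paul")],
    [(12, "Joe"), (14, "Mong"), (16, "Cherry")],
    [(20, "Vince"), (25, "Mike"), (28, "Hannah")],
]
print(find_page(pages, 25))  # (2, 2): found on the third page after 2 reads
print(find_page(pages, 10))  # (None, 2): not present
```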


DATABASE ORGANIZATION (CONT.)
Indexing methods:

•  Index has 2 columns: key refers to the unique ID, address refers to the page number

Data file (one group of three records per page):

Student ID   Name
1            Will
5            July
18           Bob

9            Ben
12           Sue
25           Melisa

17           Pat
20           Paul
25           Craig

Index:

Key   Address
1     101
5     101
18    101
9     102
12    102
25    102
17    103
20    103
25    103


DATABASE ORGANIZATION (CONT.)
The index file is already ordered
Make an index based on the ID

•  Index has k pages
•  Index search: log k

If the value is found, the number of pages read is log k + 1 (log k index pages plus one data page)

Key   Pointer
2     102
4     103
6     102
8     101
9     103
10    101
…     …


DATABASE ORGANIZATION (CONT.)
Example:
Size of key field: 8 bytes
Size of pointer: 4 bytes

Page size: 512 bytes

Blocking factor: floor(512/12) = 42

# of records: 20,000

Dense index: 20,000/42 = 477 pages (rounded up)
Average index search = log 477 = 9 (rounded up)

9 + 1 = 10 pages


DATABASE ORGANIZATION (CONT.)
Two types of index file:

•  Dense index: the number of records in the index and in the original file are the same
•  Non-dense index: the index has fewer records than the original table

Data file (one group of three records per page):

Student ID   Name
1            Will
5            July
8            Bob

9            Ben
12           Sue
15           Melisa

17           Pat
20           Paul
25           Craig

Dense index:

Key   Address
1     101
5     101
8     101
9     102
12    102
15    102
17    103

Non-dense index:

Key   Address
1     101
9     102
17    103
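The difference between the two index types can be sketched as follows, using the data file above. This is a minimal illustration that assumes the data pages are sorted by key and that the non-dense index keeps the first key of each page; the names dense_index, sparse_index and lookup_sparse are hypothetical.

```python
from bisect import bisect_right

# Data pages keyed by page address, as in the table above.
pages = {
    101: [(1, "Will"), (5, "July"), (8, "Bob")],
    102: [(9, "Ben"), (12, "Sue"), (15, "Melisa")],
    103: [(17, "Pat"), (20, "Paul"), (25, "Craig")],
}

def dense_index(pages):
    """One (key, address) entry per record."""
    return [(key, addr) for addr, records in pages.items() for key, _ in records]

def sparse_index(pages):
    """One (key, address) entry per page: the first key on that page."""
    return [(records[0][0], addr) for addr, records in pages.items()]

def lookup_sparse(index, key):
    """Return the address of the page that could hold `key`, or None."""
    keys = [k for k, _ in index]
    pos = bisect_right(keys, key) - 1   # last entry with entry key <= search key
    return index[pos][1] if pos >= 0 else None

print(len(dense_index(pages)))                    # 9 entries
print(sparse_index(pages))                        # [(1, 101), (9, 102), (17, 103)]
print(lookup_sparse(sparse_index(pages), 12))     # 102
```

With the non-dense index the page at the returned address still has to be read to confirm that the key is really there.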


DATABASE ORGANIZATION (CONT.)
Why do we use an index file?

•  The index is ordered, so binary search can be used
•  The index is much smaller than the data file because each entry keeps only two fields (key and address)


DATABASE ORGANIZATION (CONT.)
Suppose page size = 512 bytes

Record size = 25 bytes
Blocking factor = floor(512/25) = 20
# of records in file: 34,000
# of pages in the file: 34,000/20 = 1700 pages

Heap file: average search time = 1700/2 = 850
Ordered file: log 1700 = 11 (rounded up)

Index file:
Size of key value = 6 bytes
Size of pointer = 4 bytes
Blocking factor = floor(512/10) = 51
# of records in index file: 34,000
# of pages in index file: 34,000/51 = 667 (rounded up)
Searching index = log 667 = 10 (rounded up)
If data found: log 667 + 1
If data not found: log 667
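The comparison above can be reproduced with a short calculation. The sketch assumes a dense index, base-2 logarithms rounded up, and unspanned storage; the function name search_costs and the dictionary keys are invented for the example.

```python
import math

def search_costs(num_records, record_size, key_size, pointer_size, page_size):
    """Approximate page reads per search for heap, ordered, and indexed files."""
    data_bf = page_size // record_size                      # records per data page
    data_pages = math.ceil(num_records / data_bf)
    index_bf = page_size // (key_size + pointer_size)       # entries per index page
    index_pages = math.ceil(num_records / index_bf)         # dense index assumed
    index_search = math.ceil(math.log2(index_pages))
    return {
        "heap": data_pages / 2,
        "ordered": math.ceil(math.log2(data_pages)),
        "index_found": index_search + 1,   # plus one read of the data page
        "index_not_found": index_search,
    }

print(search_costs(34_000, 25, 6, 4, 512))
# {'heap': 850.0, 'ordered': 11, 'index_found': 11, 'index_not_found': 10}
```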


BINARY TREE
A binary tree is a finite set of nodes that is either empty, or consists of one specially designated node called the root of the binary tree together with the elements of two disjoint binary trees called the left subtree and right subtree of the root. The roots of the two subtrees are siblings.


BINARY TREE (CONT.)
The number of subtrees of a node is called the degree of the node. In a binary tree, all nodes have degree 0, 1, or 2. A node of degree zero is called a terminal node or leaf node. A non-leaf node is often called a branch node.

The degree of a tree is the maximum degree of a node in the tree. A binary tree has degree 2. The level or depth of a node: the level of the root is zero, and the level of any other node is one higher than that of its parent. Put another way, the level or depth of a node ni is the length of the unique path from the root to ni.


BINARY TREE (CONT.) The height of ni is the length of the longest path from ni to a leaf. Thus all leaves in the tree are at height 0. The height of a tree is equal to the height of the root. The depth of a tree is equal to the level or depth of the deepest leaf; this is always equal to the height of the tree. If there is a directed path from n1 to n2, then n1 is an ancestor of n2 and n2 is a descendant of n1.
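The definitions of height and depth can be checked on a small tree. The Node class and the function names below are illustrative assumptions, not part of the slides.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def height(node):
    """Length of the longest path from node down to a leaf; a leaf has height 0."""
    if node is None:
        return -1   # convention for the empty tree, so that a leaf comes out as 0
    return 1 + max(height(node.left), height(node.right))

def depth(root, target, level=0):
    """Length of the path from root to target, or None if target is not in the tree."""
    if root is None:
        return None
    if root is target:
        return level
    found = depth(root.left, target, level + 1)
    if found is not None:
        return found
    return depth(root.right, target, level + 1)

leaf = Node(4)
root = Node(1, Node(2, leaf), Node(3))
print(height(root))        # 2
print(depth(root, leaf))   # 2: the deepest leaf's depth equals the tree's height
print(height(leaf))        # 0
```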


BINARY TREE VS. BINARY SEARCH TREE (BST)
A binary tree is the generic form of a binary search tree, and it is not ordered. When constructing a binary tree, we follow the rule that every node has at most two children. In a BST, along with the at-most-two-children rule, the following rules hold at every node:
1) All left descendants have smaller values than the node's value.

2) All right descendants have larger values than the node's value.

(BSTs are effective only when the data resides in main memory (RAM).)
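A minimal sketch of the two BST rules in code, assuming distinct integer keys and ignoring duplicates; the names BSTNode, insert, contains and inorder are chosen for the example.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert key and return the (possibly new) root; duplicates are ignored."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)     # rule 1: smaller keys go left
    elif key > root.key:
        root.right = insert(root.right, key)   # rule 2: larger keys go right
    return root

def contains(root, key):
    """Follow one root-to-leaf path, going left or right at each node."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

def inorder(root):
    """Left subtree, node, right subtree: yields the keys in sorted order."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)

root = None
for k in [8, 3, 10, 1, 6, 14]:
    root = insert(root, k)
print(inorder(root))        # [1, 3, 6, 8, 10, 14]
print(contains(root, 6))    # True
print(contains(root, 7))    # False
```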


BST Example:


MULTILEVEL INDEXING
If we can increase the base of the logarithm, we improve the search time
A node is a page
If a node has k data values, it has k+1 node pointers
In a node, data values are ordered

In a non-leaf node, the pointer on the left side of a data value points to a node holding smaller data values, and the pointer on the right side points to a node holding equal and higher data values
Each non-leaf node points to floor((k+1)/2)-1 nodes

Each node includes floor((k+1)/2) data values


B+ TREE


B+ TREE
A B tree is a balanced tree because every path from the root node to a leaf node is the same length
A balanced tree means that all searches for individual values require the same number of nodes to be read from the disk
A B tree is an M-ary tree with a large number of children per node

B trees store the full records in the nodes

A B+ tree consists of a root, internal nodes, and leaf nodes

A B+ tree can be viewed as a B tree in which each internal node contains only keys (not key-value pairs)
Nodes in the last (leaf) level are connected as a linked list


B+ TREE (CONT.)
Each internal node in a B or B+ tree has M pointers and M - 1 keys
Order or branching factor of M

If the nodes are full (i.e., the tree is complete), depth = log_M(N), where N is the number of data items stored

A binary search tree is similar to a B tree where M is 2
Note: it is a B tree, not a B+ tree, since data is stored in the nodes themselves
It is not actually a B tree of order 2, since it is not guaranteed to be balanced, and B / B+ trees are balanced


B+ TREE PROPERTIES
A data structure often used in the implementation of database indexes
Each node of the tree contains an ordered list of keys and pointers to lower-level nodes in the tree
These pointers can be thought of as being between each of the keys

The algorithms for insertion and deletion ensure that the tree is no taller than necessary by requiring that each node be at least half full

If a node can store a maximum of M keys and M+1 pointers (here M counts keys, not pointers):

all nodes (except the root) are guaranteed to have between M/2 and M keys

the height of the tree is at most about log_(M/2)(N), where N is the number of keys stored
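A rough version of the height bounds, written out as a formula. It assumes N keys are stored and treats the fan-out of every node as lying between M/2 and M; the off-by-one between keys and pointers is ignored, so this is an estimate rather than an exact result.

```latex
\log_{M} N \;\lesssim\; h \;\lesssim\; \log_{M/2} N,
\qquad \text{e.g. } M = 100,\ N = 10^{6}:\quad
\log_{100} 10^{6} = 3, \qquad \log_{50} 10^{6} \approx 3.5
```

Even in the half-full worst case the tree stays within about one level of the best case, which is why the half-full requirement is enough to keep searches short.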


B TREE VS. B+ TREE
B+ trees do not store data pointers in interior nodes; they are stored ONLY in leaf nodes. This is not optional, as it is in a B tree. This means that interior nodes can fit more keys in a block of memory.
The leaf nodes of a B+ tree are linked, so a linear scan of all keys requires just one pass through the leaf nodes. A B tree, on the other hand, would require a traversal of every level in the tree. This property can be used for efficient search as well, since data is stored only in the leaves.


SEARCHING
Starting at the root, compare the key value with the data in each node and follow the matching pointer down to a leaf, then return the result
For example, find 45 and 15 in the B+ tree
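A minimal sketch of B+ tree search. The slide's figure did not survive extraction, so the small tree below is a made-up example that merely contains the keys 45 and 15; the Leaf and Internal classes are assumptions. The pointer convention follows the multilevel indexing slide: keys equal to or higher than a separator are found to its right.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys, values, next_leaf=None):
        self.keys = keys            # sorted keys
        self.values = values        # one record (or record pointer) per key
        self.next_leaf = next_leaf  # link to the next leaf in key order

class Internal:
    def __init__(self, keys, children):
        self.keys = keys            # sorted separator keys
        self.children = children    # len(children) == len(keys) + 1

def search(node, key):
    """Descend from the root to a leaf, then look for the key in that leaf."""
    while isinstance(node, Internal):
        # Children to the right of a separator hold keys >= that separator.
        node = node.children[bisect_right(node.keys, key)]
    for k, v in zip(node.keys, node.values):
        if k == key:
            return v
    return None

# Hypothetical tree: three linked leaves under one root.
leaf3 = Leaf([45, 55], ["r45", "r55"])
leaf2 = Leaf([25, 35], ["r25", "r35"], leaf3)
leaf1 = Leaf([5, 15], ["r5", "r15"], leaf2)
root = Internal([25, 45], [leaf1, leaf2, leaf3])

print(search(root, 45))  # r45
print(search(root, 15))  # r15
print(search(root, 30))  # None
```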


INSERTING INTO B+ TREE
If the node is not full:

insert the key; insert the data in the exterior data storage

If the node is full, then:

Rule 1: if the node is a leaf, split it into two partitions. The first partition holds ceil((M-1)/2) key values and the second holds the rest, where M is the number of pointers. Then copy the smallest key value from the second partition to the parent node.
Rule 2: if the node is a non-leaf, split it into two partitions. The first partition holds ceil(M/2)-1 key values and the second holds the rest. Move the smallest key value from the second partition to the parent node (a newly created parent if the node being split was the root).
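The two split rules can be sketched on plain key lists. This is a simplified illustration that handles keys only (child pointers and the parent update are left out), and the function names split_leaf and split_internal are assumptions. Each function takes the M keys of an overflowing node, that is, the M-1 keys it could hold plus the new one.

```python
import math

def split_leaf(keys, m):
    """Rule 1: split an overflowing leaf; the separator is COPIED to the parent."""
    cut = math.ceil((m - 1) / 2)
    left, right = keys[:cut], keys[cut:]
    return left, right, right[0]    # right[0] stays in the leaf as well

def split_internal(keys, m):
    """Rule 2: split an overflowing non-leaf; the separator is MOVED to the parent."""
    cut = math.ceil(m / 2) - 1
    left, rest = keys[:cut], keys[cut:]
    return left, rest[1:], rest[0]  # rest[0] leaves this level entirely

# M = 4: a node holds at most 3 keys, so 4 keys is an overflow.
print(split_leaf([5, 8, 9, 10], 4))         # ([5, 8], [9, 10], 9)
print(split_internal([10, 20, 30, 40], 4))  # ([10], [30, 40], 20)
```

The difference between copying and moving is the point: every key must remain in some leaf, while a separator in a non-leaf node only needs to appear once.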


INSERTION - EXAMPLE
Insert 5, 10, 8, 9, 6, 12, 1, 11, 2, 4, 16, 18 (M = 4)
Insert 2, 5, 7, 10, 13, 16, 20, 22, 23, 24 (M = 4 pointers, so a node can hold at most 3 key values)


INSERT 2,5,7,10,13,16,20,22,23,24


DELETION FROM B+ TREE
1) Remove the required key and associated reference from the node.
2) If the node still has enough keys and references to satisfy the invariants, stop.
3) If the node has too few keys to satisfy the invariants, but its next oldest or next youngest sibling at the same level has more than necessary, distribute the keys between this node and the neighbor. Repair the keys in the level above to represent that these nodes now have a different “split point” between them; this involves simply changing a key in the levels above, without deletion or insertion.
4) If the node has too few keys to satisfy the invariant, and the next oldest or next youngest sibling is at the minimum for the invariant, then merge the node with its sibling; if the node is a non-leaf, we will need to incorporate the “split key” from the parent into our merging. In either case, we will need to repeat the removal algorithm on the parent node to remove the “split key” that previously separated these merged nodes, unless the parent is the root and we are removing the final key from the root, in which case the merged node becomes the new root (and the tree has become one level shorter than before).
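The decision made after removing a key can be sketched as a small function. The slides do not state the exact minimum occupancy, so min_keys is passed in as a parameter, and the names rebalance_action and redistribute_leaves are hypothetical. Only the leaf-level redistribution is shown; merging and the recursive repair of the parent are left out.

```python
def rebalance_action(node_keys, sibling_keys, min_keys):
    """Decide what to do after a deletion, given key counts (illustrative only)."""
    if node_keys >= min_keys:
        return "ok"             # invariants still hold, stop
    if sibling_keys > min_keys:
        return "redistribute"   # sibling has keys to spare
    return "merge"              # sibling is at the minimum

def redistribute_leaves(left, right):
    """Even out two adjacent leaves; return (left, right, new split key)."""
    keys = left + right
    cut = (len(keys) + 1) // 2   # the left leaf takes the extra key, if any
    new_left, new_right = keys[:cut], keys[cut:]
    # The parent's split key becomes the smallest key of the right leaf.
    return new_left, new_right, new_right[0]

print(rebalance_action(1, 3, 2))            # redistribute
print(rebalance_action(1, 2, 2))            # merge
print(redistribute_leaves([1], [4, 6, 8]))  # ([1, 4], [6, 8], 6)
```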


DELETION EXAMPLE


BRANCHING FACTOR OF B+ TREE
Suppose

•  Page (node) size = 512 bytes
•  Key value size = 5 bytes
•  Node pointer size = 6 bytes
•  Find the branching factor of the B+ tree

Solution: let's assume a page (node) has n data values and therefore n+1 pointers

5*n + 6*(n+1) <= 512, so 11*n <= 506

n = floor(506/11) = 46, so each node holds up to 46 key values and 47 pointers
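The inequality can be solved once for any page, key, and pointer size. The function name max_keys_per_node is an assumption for this sketch; it simply rearranges key_size*n + pointer_size*(n+1) <= page_size.

```python
def max_keys_per_node(page_size, key_size, pointer_size):
    """Largest n with key_size*n + pointer_size*(n + 1) <= page_size."""
    return (page_size - pointer_size) // (key_size + pointer_size)

n = max_keys_per_node(512, 5, 6)
print(n, n + 1)   # 46 keys, 47 pointers
```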


BRANCHING FACTOR OF B+ TREE (CONT.)
Suppose
•  Page (node) size = 1024 bytes
•  Key value size = 10 bytes
•  Node pointer size = 5 bytes
•  A table has 15,000 records
•  Find the branching factor and number of levels of the B+ tree

Solution: let's assume a page (node) has n data values
10*n + 5*(n+1) <= 1024, so 15*n <= 1019
n = floor(1019/15) = 67
The B+ tree has three levels
Search through 4 pages if the data has been found
Search through 3 pages if the data has not been found

Level     Nodes    Keys        Pointers
Root      1        67          68
Level 2   68       67*68       68*68
Level 3   68*68    68*68*67    68*68*68
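The level count can be checked by growing the table above one level at a time. This sketch assumes every node is completely full and that the lowest level must have room for a key for every record; the function names are made up for the example.

```python
def level_capacity(level, keys_per_node):
    """(nodes, keys, pointers) at a given level of a completely full tree (root = 1)."""
    fanout = keys_per_node + 1
    nodes = fanout ** (level - 1)
    return nodes, nodes * keys_per_node, nodes * fanout

def levels_needed(num_records, keys_per_node):
    """Smallest number of levels whose lowest level can hold num_records keys."""
    level = 1
    while level_capacity(level, keys_per_node)[1] < num_records:
        level += 1
    return level

for lvl in (1, 2, 3):
    print(lvl, level_capacity(lvl, 67))
# 1 (1, 67, 68)
# 2 (68, 4556, 4624)
# 3 (4624, 309808, 314432)
print(levels_needed(15_000, 67))   # 3
```

Two levels hold only 4,556 keys at the bottom, which is fewer than 15,000, so a third level is needed.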


HANDY VIDEOS
Dr. Doug Fisher's B+ tree basics
Dr. Doug Fisher's B+ tree insertion
TechGuiders deletion examples
Dr. Doug Fisher's B+ trees as database indices