§6 b+ trees

13
§6 B+ Trees Definition A B+ tree of order M is a tree with the following structural properties: (1) The root is either a leaf or has between 2 an d M children. (2) All nonleaf nodes (except the root) have betw een M/2 and M children. (3) All leaves are at the same depth. Assume each nonroot leaf also has between M/2 and M chi ldren. 12 15 25 31 41 59 84 91 1,4,8,11 12,13 15,18,19 21,24 25,26 31,38 41,43,46 48,49,50 59,68 72,78 84,88 91,92,99 21 48 72 A B+ tree of order 4 (2-3-4 tree) All the actual data are stored at the leaves. Each interior node contains M pointers to the children. And M 1 smal lest key valu es in the sub trees except the 1 st one. < < < < < < < < < < < 1/13

Upload: jenis

Post on 19-Jan-2016

30 views

Category:

Documents


2 download

DESCRIPTION

21. 25. 31. 48. 72. 41. 12. 15. 59. 84. 91. 1,4,8,11. 12 ,13. 15 ,18,19. 21,24. 25 ,26. 31 ,38. 41 ,43,46. 48,49,50. 59 ,68. 72,78. 84 ,88. 91 ,92,99.

TRANSCRIPT

Page 1: §6  B+ Trees

§6 B+ Trees【 Definition 】 A B+ tree of order M is a tree with the followi

ng structural properties:(1) The root is either a leaf or has between 2 and M children.(2) All nonleaf nodes (except the root) have between M/2 and

M children.(3) All leaves are at the same depth.Assume each nonroot leaf also has between M/2 and M children.

12 15 25 31 41 59 84 91

1,4,8,11 12,13 15,18,19 21,24 25,26 31,38 41,43,46 48,49,50 59,68 72,78 84,88 91,92,99

21 48 72

A B+ tree of order 4(2-3-4 tree)

All the actual data are stored at the leaves.

Each interior node contains M

pointers to the children.

And M 1 smallest key values in the subtrees except the 1st

one.

< < < < < < < < < < <

1/13

Page 2: §6  B+ Trees

§6 B+ Trees

A B+ tree of order 3(2-3 tree)

8,11,12 16,17 22,23,31 41,52 58,59,61

16: 41:58

22:

Find: 52 Insert: 18

16,17,18

Insert: 1

11,12 22,23,31 41,52 58,59,61

11:16 41:58

22:

16,17,181, 8

Insert: 19

11,12 22,23,31 41,52 58,59,61

11: 41:58

16:22

16,171, 8 18,19

18:

Insert: 28

2/13

Page 3: §6  B+ Trees

§6 B+ Trees

11,12 22,23 41,52 58,59,61

11: 58:

16:

16,171, 8 18,19

18:

28,31

28:

22:

41:

Insert: 70

First find a sibling with 2 keys and adjust. Keep more nodes

full.

Deletion is similar to insertion except that the root is removed when it loses two children.

3/13

Page 4: §6  B+ Trees

§6 B+ Trees

For a general B+ tree of order M

Btree Insert ( ElementType X, Btree T ) {

Search from root to leaf for X and find the proper leaf node;Insert X;while ( this node has M+1 keys ) {

split it into 2 nodes with (M+1)/2 and (M+1)/2 keys, respectively;

if (this node is the root) create a new root with two children; check its parent;

}}

Home work:

p.138 4.36 Access a 2-3 tree

Discussion 7:Discussion 7: Depth( Depth(MM, , NN) = ? ) = ? TTinsertinsert = ? = ? TTfindfind = ? = ?

4/13

Page 5: §6  B+ Trees

Research Project 3Family of B Trees (23)

In computer science, there is a family of B trees – B- trees, B+ trees, B* trees, B# trees, and B x-trees. They are tree data structures that keep data sorted and allow searches, insertions, and deletions in logarithmic (amortized) time. In this project, you are supposed to introduce the B- trees and compare it with B+ trees.

Detailed requirements can be downloaded fromhttp://acm.zju.edu.cn/dsaa/

5/13

Page 6: §6  B+ Trees

Research Project 4Tries (23)

A trie is an index structure that is particularly useful when the keys vary in length. It is also called a prefix tree, and is used to store an associative array where the keys are usually strings. In this project, you are supposed to introduce the tries and compare with ordinary binary search trees.

Detailed requirements can be downloaded fromhttp://acm.zju.edu.cn/dsaa/

6/13

Page 7: §6  B+ Trees

Inverted File Index

How can I find in which retrieved web pages that include

"Computer Science"?

7/13

Page 8: §6  B+ Trees

Inverted File Index

Solution 1: Scan each page for the string "Computer Science".

How did Google do?

8/13

Page 9: §6  B+ Trees

Inverted File Index

Solution 2: Inverted File Index

【 Definition 】 Index is a mechanism for locating a given term in a text.

【 Definition 】 Inverted file contains a list of pointers (e.g. the number of a page) to all occurrences of that term in the text.

〖 Example〗 Document sets

Doc Text

1 Gold silver truck

2 Shipment of gold damaged in a fire

3 Delivery of silver arrived in a silver truck

4 Shipment of gold arrived in a truck

No. Term Times; Documents

1 a <3; 2,3,4>2 arrived <2; 3,4>3 damaged <1; 2>4 delivery <1; 3>5 fire <1; 2>6 gold <3; 1,2,4>7 of <3; 2,3,4>8 in <3; 2,3,4>9 shipment <2; 2,4>

10 silver <2; 1,3>11 truck <3; 1,3,4>

Inverted File Inde

x

silver truck

9/13

Page 10: §6  B+ Trees

Inverted File Index

Discussion 8:Discussion 8: How to easily print the How to easily print the sentences which contain the words and highlight sentences which contain the words and highlight the words?the words?

Doc Text

1 Gold silver truck

2 Shipment of gold damaged in a fire

3 Delivery of silver arrived in a silver truck

4 Shipment of gold arrived in a truck

No. Term Times; Documents

1 a <3; 2,3,4>2 arrived <2; 3,4>3 damaged <1; 2>4 delivery <1; 3>5 fire <1; 2>6 gold <3; 1,2,4>7 of <3; 2,3,4>8 in <3; 2,3,4>9 shipment <2; 2,4>

10 silver <2; 1,3>11 truck <3; 1,3,4>

10/13

Page 11: §6  B+ Trees

Inverted File Index

Word Stemming

Process a word so that only its stem or root form is left.

Processprocessingprocessesprocessed

processsayssaidsaying

say

〖 Example〗

Stop Words

Some words are so common that almost every document contains them, such as “a” “the” “it”. It is useless to index them. They are called stop words. We can eliminate them from the original documents.

11/13

Page 12: §6  B+ Trees

Inverted File Index

Access Methods

Solution 1: Search trees ( B- trees, B+ trees, Tries, ... )

Solution 2: Hashing

Discussion 9:Discussion 9: What are the pros and cons of using hashing?What are the pros and cons of using hashing?

Discussion 10:Discussion 10: How to improve the How to improve the qualityquality of search results? of search results?

12/13

Page 13: §6  B+ Trees

Research Project 5

Roll Your Own Mini Search Engine

In this project, you are supposed to create your own mini search engine which can handle 1 million inquiries over 100 files in 1 second. You may download the functions for handling stop words and stemming from the Internet.

Detailed requirements can be downloaded fromhttp://acm.zju.edu.cn/dsaa/

13/13