hashing & indexing

INDEX (CHAPTER 12), 9/23/2007

Upload: anasua-bhattacharyya

Post on 28-Mar-2015


Page 1: hashing & indexing

INDEX (CHAPTER 12)


Page 2: hashing & indexing

TOPICS

• Basic concepts

• Hashing

• B+-tree


Page 3: hashing & indexing

INTRODUCTION

• Review

Conceptual: E-R data model

Logical: Relational data model, SQL

Physical: relation = a file; organization of records on a disk page; organization of attributes within a record; index files

Page 4: hashing & indexing

Software Architecture of a DBMS

Query Parser

Query Interpreter

Query Optimizer

Relational Algebra operators: ∏, σ, ρ, δ, ←, ∪, ∩, ÷, −

Abstraction of records

Index structures


File System

Buffer Pool Manager


Page 5: hashing & indexing

Implementation of σ

• Emp table:

SS# Name Age Salary dno

1 Joe 24 20000 2

2 Mary 20 25000 3

3 Bob 22 27000 4

4 Kathy 30 30000 5

5 Shideh 4 4000 1

• $\sigma_{Salary=30,000}(Employee)$:

SS# Name Age Salary dno

4 Kathy 30 30000 5

• Process the select operator using a file scan (linear scan):

    F1 = Open the file corresponding to Employee
    P = read first page of F1
    While P is not null
        For each record in P, if the record satisfies the selection predicate, then produce it as output
        P = read next page of F1   /* P becomes null when EoF is reached */
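The file-scan loop above can be sketched in Python as follows. This is a minimal sketch, not a DBMS implementation: it assumes the Emp file is held in memory as a list of pages, each page a list of record tuples with 2 records per page (the page size is an assumption), and `read_pages` and `file_scan_select` are hypothetical names standing in for the page reads of the pseudocode.

```python
from typing import Callable, Iterator, List, Tuple

# (SS#, Name, Age, Salary, dno)
Record = Tuple[int, str, int, int, int]

# Hypothetical in-memory stand-in for the Emp file: 2 records per page (an assumption).
EMP_FILE: List[List[Record]] = [
    [(1, "Joe", 24, 20000, 2), (2, "Mary", 20, 25000, 3)],
    [(3, "Bob", 22, 27000, 4), (4, "Kathy", 30, 30000, 5)],
    [(5, "Shideh", 4, 4000, 1)],
]


def read_pages(file: List[List[Record]]) -> Iterator[List[Record]]:
    """Yield one page at a time; the loop ends at EoF, where P would become null."""
    for page in file:
        yield page


def file_scan_select(file: List[List[Record]],
                     predicate: Callable[[Record], bool]) -> List[Record]:
    """Linear scan: test every record on every page against the selection predicate."""
    output: List[Record] = []
    for page in read_pages(file):
        for record in page:
            if predicate(record):
                output.append(record)
    return output


# Exact match: sigma Salary=30,000 (Employee)
print(file_scan_select(EMP_FILE, lambda r: r[3] == 30000))
# [(4, 'Kathy', 30, 30000, 5)]

# Range: sigma Salary>25,000 (Employee)
print([r[1] for r in file_scan_select(EMP_FILE, lambda r: r[3] > 25000)])
# ['Bob', 'Kathy']
```

Every page is read regardless of how many records qualify, which is the cost that the indexes in the rest of the chapter try to avoid.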


Page 6: hashing & indexing

Implementation of σ (cont.)

• Same Emp table and query, $\sigma_{Salary=30,000}(Employee)$, processed with the same file scan; each page read goes through the buffer pool:

    F1 = Open the file corresponding to Employee
    P = read first page of F1   /* fetch the page from disk if it is not in the buffer pool */
    While P is not null
        For each record in P, if the record satisfies the selection predicate, then produce it as output
        P = read next page of F1   /* fetch the page from disk if it is not in the buffer pool */

Page 7: hashing & indexing

Implementation of σ (cont.)

• Same Emp table, query, and file scan as above; the figure additionally labels the Header of the file opened by "F1 = Open the file corresponding to Employee".

Page 8: hashing & indexing

TERMINOLOGY

• An exact-match selection predicate: $\sigma_{Salary=30,000}(Employee)$, $\sigma_{FirstName=\text{“Shideh”}}(Employee)$

• A range selection predicate: $\sigma_{Salary>30,000}(Employee)$, $\sigma_{Salary<30,000}(Employee)$, $\sigma_{Salary>30,000 \text{ and } Salary<32,000}(Employee)$

Page 9: hashing & indexing

INTRODUCTION (Cont…)

• Motivation: Speed-up those queries that reference only a small portion of the records in a file.

• Analogy: Catalog cards in the library (more than one index).

• Evaluation:

1. Access time (find)

2. Insertion time (find + add)

3. Deletion time (find + delete)

4. Space overhead

• Search-key: the attribute (or set of attributes) used to look up records in a file.

• Primary index: the index whose search key specifies the sequential order of the records within the file.

• Secondary index: the index whose search key does not specify the sequential order of the records within the file.

Page 10: hashing & indexing

INTRODUCTION (Cont…)

• Example:

State index: Alaska, Alaska, Arizona, California, California, Florida, Florida, Indiana, Ohio, Ohio

Data file (State, Name, Age, Other), in State order:

Alaska, Bob, 12

Alaska, George, 28

Arizona, David, 48

California, Hellen, 20

California, Jack, 37

Florida, Frank, 10

Florida, Charles, 4

Indiana, Joe, 12

Ohio, Alice, 23

Ohio, David, 36

Name index: Alice, Bob, Charles, David, David, Frank, George, Hellen, Jack, Joe

• Assume the size of a disk page = 2 data records = 5 index records.

• Indexing or not indexing?

SELECT age FROM personnel WHERE name = “Alice”

SELECT age FROM personnel WHERE name = “Don”

Page 11: hashing & indexing

INTRODUCTION (Cont…)

• Example:

State index: Alaska, Alaska, Boston, California, California, Florida, Florida, Indiana, Ohio, Ohio

Data file (State, Name, Age, Else), in State order:

Alaska, Bob, 12

Alaska, George, 28

Boston, David, 48

California, Hellen, 20

California, Jack, 37

Florida, Frank, 10

Florida, Charles, 4

Indiana, Joe, 12

Ohio, Alice, 23

Ohio, David, 36

Name index: Alice, Bob, Charles, David, David, Frank, George, Hellen, Jack, Joe

• Assume the size of a disk page = 2 data records = 5 index records.

• Primary vs. Secondary:

SELECT name FROM personnel WHERE state = “Ohio”

SELECT age FROM personnel WHERE name = “David”

Page 12: hashing & indexing

INTRODUCTION (Cont…)

• Example: (page = 2 data records = 5 index records)

State index: Alaska, Alaska, Boston, California, California, Florida, Florida, Indiana, Ohio, Ohio

Data file (State, Name, Age, Else), in State order:

Alaska, Bob, 12

Alaska, George, 28

Boston, David, 48

California, Hellen, 20

California, Jack, 37

Florida, Frank, 10

Florida, Charles, 4

Indiana, Joe, 12

Ohio, Alice, 23

Ohio, David, 36

Name index: Alice, Bob, Charles, David, David, Frank, George, Hellen, Jack, Joe

• Exact match vs. Range:

SELECT name FROM personnel WHERE state = “California”

SELECT name FROM personnel WHERE state >= “Alaska” and state <= “Florida”

• Speedup by employing binary search (is it possible?)
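Because the data file in this example is in State order, both queries can use binary search instead of a full scan. A minimal sketch with Python's standard `bisect` module, assuming the file is held in memory as a list sorted on State (a real system would binary-search over disk pages rather than individual records); `exact_match` and `range_query` are hypothetical names.

```python
from bisect import bisect_left, bisect_right

# Data file from the example, in State order: (State, Name, Age)
PERSONNEL = [
    ("Alaska", "Bob", 12), ("Alaska", "George", 28), ("Boston", "David", 48),
    ("California", "Hellen", 20), ("California", "Jack", 37),
    ("Florida", "Frank", 10), ("Florida", "Charles", 4),
    ("Indiana", "Joe", 12), ("Ohio", "Alice", 23), ("Ohio", "David", 36),
]
STATES = [row[0] for row in PERSONNEL]  # the sort key, extracted once


def exact_match(state):
    """WHERE state = :state -- two binary searches bound the run of matches."""
    lo, hi = bisect_left(STATES, state), bisect_right(STATES, state)
    return [row[1] for row in PERSONNEL[lo:hi]]


def range_query(low, high):
    """WHERE state >= :low AND state <= :high -- matches form one contiguous run."""
    lo, hi = bisect_left(STATES, low), bisect_right(STATES, high)
    return [row[1] for row in PERSONNEL[lo:hi]]


print(exact_match("California"))  # ['Hellen', 'Jack']
print(range_query("Alaska", "Florida"))
# ['Bob', 'George', 'David', 'Hellen', 'Jack', 'Frank', 'Charles']
```

The same trick does not help the Name queries on the earlier slides, because the file is not ordered on Name; that is what the secondary (Name) index is for.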

Page 13: hashing & indexing

Dense Index Files

• Dense index: an index record appears for every search-key value in the file.

Page 14: hashing & indexing

Example of Sparse Index Files
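A sparse index holds index records for only some search-key values; a common layout (assumed here) keeps one entry per data page, holding the first key on that page. The sketch below assumes a file sorted on State with 2 records per page, reusing the State column of the personnel example; `sparse_lookup` is a hypothetical name. Unlike a dense index, which points directly at every search-key value, a sparse lookup must finish with a scan through the data pages.

```python
from bisect import bisect_left

# Data file sorted on State, 2 records per page (page size taken from the example).
PAGES = [
    ["Alaska", "Alaska"], ["Boston", "California"], ["California", "Florida"],
    ["Florida", "Indiana"], ["Ohio", "Ohio"],
]

# Sparse index: one (first key on page, page number) entry per data page.
SPARSE = [(page[0], page_no) for page_no, page in enumerate(PAGES)]
SPARSE_KEYS = [key for key, _ in SPARSE]


def sparse_lookup(key):
    """Return the (page, slot) positions holding `key`.

    Start at the last index entry whose key is strictly less than `key` (a run of
    duplicates may begin on the previous page), then scan forward through the
    sorted file until a page starts with a larger key."""
    start = SPARSE[max(bisect_left(SPARSE_KEYS, key) - 1, 0)][1]
    matches = []
    for page_no in range(start, len(PAGES)):
        page = PAGES[page_no]
        if page[0] > key:  # sorted file: no later page can match
            break
        matches.extend((page_no, slot) for slot, k in enumerate(page) if k == key)
    return matches


print(sparse_lookup("California"))  # [(1, 1), (2, 0)]
print(sparse_lookup("Ohio"))        # [(4, 0), (4, 1)]
print(sparse_lookup("Arizona"))     # []
```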

Page 15: hashing & indexing

Multilevel Index


Page 16: hashing & indexing

HASHING

Hash function:

• K: the set of all search-key values

• V: the set of all bucket addresses

• $h: K \to V$

• K is large (perhaps infinite), but the set of search-key values actually stored in the database is much smaller than K.

• Fast lookup: to find $K_i$, search the bucket with address $h(K_i)$.

Page 17: hashing & indexing

HASHING (Cont…)

• Example:

– K = salary (the set of all 6-digit integers)

– V = 1000 buckets addressed from 0 to 999

– $h(k) = k \bmod 1000$

SELECT name FROM personnel WHERE salary = “120,100”

• To find a salary of 120,100, we should search bucket number 100.

• Hashing is appropriate only for exact-match queries.

• A bad hash function maps the values to only a subset of (or a few) buckets, e.g., $h(k) = k \bmod 10$.
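The difference between the two hash functions can be checked directly. A small sketch, assuming integer salaries; the salary range used here is illustrative data, not from the slides.

```python
def h(k: int) -> int:
    """h(k) = k mod 1000: maps a salary onto bucket addresses 0..999."""
    return k % 1000


def h_bad(k: int) -> int:
    """h(k) = k mod 10: only addresses 0..9 of the 1000 buckets are ever used."""
    return k % 10


print(h(120_100))  # 100 -> search bucket 100

# Illustrative salaries (not from the slides): 10,000 consecutive 6-digit values.
salaries = range(100_000, 110_000)
print(len({h(s) for s in salaries}))      # 1000 buckets in use
print(len({h_bad(s) for s in salaries}))  # 10 buckets in use
```

With `h_bad`, 99% of the buckets stay empty while the other ten each receive about 1,000 records, so a lookup degenerates into scanning a long bucket.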

Page 18: hashing & indexing

HASHING (Cont…)

• Clustered Hash Index:

– The index structure and its buckets are represented as a file (say file.hash).

– The relation is stored in file.hash (i.e., each entry in file.hash corresponds to a record in the relation).

– Assuming no duplicates, the record can be accessed in 1 IO.

• Non-clustered Hash Index:

– The index structure and its buckets are represented as a file (say file.hash).

– The relation remains intact.

– Each entry in file.hash has the format (search-key value, RID).

– Assuming no duplicates, the record can be accessed in 2 IOs.

Page 19: hashing & indexing

HEAP FILE ORGANIZATION

• Assume a student table: Student(name, age, gpa, major), with t(Student) = 16 tuples stored on P(Student) = 4 pages:

Data page 1: Bob, 21, 3.7, CS; Mary, 24, 3, ECE; Tom, 20, 3.2, EE; Chang, 18, 2.5, CS

Data page 2: Kane, 19, 3.8, ME; Lam, 22, 2.8, ME; Kathy, 18, 3.8, LS; Vera, 17, 3.9, EE

Data page 3: Louis, 32, 4, LS; Martha, 29, 3.8, CS; James, 24, 3.1, ME; Pat, 19, 2.8, EE

Data page 4: Chris, 22, 3.9, CS; Chad, 28, 2.3, LS; Leila, 20, 3.5, LS; Shideh, 16, 4, CS

Page 20: hashing & indexing

Non-Clustered Hash Index

• A non-clustered hash index on the age attribute with 4 buckets, h(age) = age % B (B = 4)

• Each index entry is (age, RID), where RID = (page #, record # within the page).

Bucket 0: (24, (1,2)), (32, (3,1)), (20, (1,3)), (20, (4,3)), (16, (4,4)), (24, (3,3)), (28, (4,2))

Bucket 1: (21, (1,1)), (17, (2,4)), (29, (3,2))

Bucket 2: (18, (1,4)), (22, (2,2)), (22, (4,1)), (18, (2,3))

Bucket 3: (19, (2,1)), (19, (3,4))

• Data file: the 4-page Student heap file above, unchanged.
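The bucket contents above can be reproduced with a short sketch. It assumes the heap file is held in memory as a dict from page number to a list of records, with 1-based page and record numbers as in the RIDs on the slide; `build_index` and `lookup` are hypothetical names, and bucket overflow is ignored.

```python
from collections import defaultdict

# Student heap file from the example: page number -> records (name, age, gpa, major).
HEAP = {
    1: [("Bob", 21, 3.7, "CS"), ("Mary", 24, 3.0, "ECE"), ("Tom", 20, 3.2, "EE"), ("Chang", 18, 2.5, "CS")],
    2: [("Kane", 19, 3.8, "ME"), ("Lam", 22, 2.8, "ME"), ("Kathy", 18, 3.8, "LS"), ("Vera", 17, 3.9, "EE")],
    3: [("Louis", 32, 4.0, "LS"), ("Martha", 29, 3.8, "CS"), ("James", 24, 3.1, "ME"), ("Pat", 19, 2.8, "EE")],
    4: [("Chris", 22, 3.9, "CS"), ("Chad", 28, 2.3, "LS"), ("Leila", 20, 3.5, "LS"), ("Shideh", 16, 4.0, "CS")],
}
B = 4  # number of buckets


def build_index(heap, b):
    """Non-clustered hash index on age: bucket age % b holds (age, RID) entries."""
    buckets = defaultdict(list)
    for page_no, records in heap.items():
        for slot, (_, age, _, _) in enumerate(records, start=1):
            buckets[age % b].append((age, (page_no, slot)))
    return buckets


def lookup(buckets, heap, age, b):
    """Exact match: read one bucket, then fetch each matching record through its RID."""
    rids = [rid for a, rid in buckets[age % b] if a == age]
    return [heap[page][slot - 1] for page, slot in rids]


index = build_index(HEAP, B)
print(index[3])                                    # [(19, (2, 1)), (19, (3, 4))]
print([r[0] for r in lookup(index, HEAP, 20, B)])  # ['Tom', 'Leila']
```

Each lookup reads one bucket and then one data page per matching RID, which is where the "2 IOs, assuming no duplicates" figure for a non-clustered hash index comes from.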

Page 21: hashing & indexing

Clustered Hash Index

• A clustered hash index on the age attribute with 4 buckets, h(age) = age % B (B = 4); the records themselves are stored in the buckets.

Bucket 0: Mary, 24, 3, ECE; Tom, 20, 3.2, EE; Louis, 32, 4, LS; James, 24, 3.1, ME; Chad, 28, 2.3, LS; Leila, 20, 3.5, LS; Shideh, 16, 4, CS

Bucket 1: Bob, 21, 3.7, CS; Vera, 17, 3.9, EE; Martha, 29, 3.8, CS

Bucket 2: Chang, 18, 2.5, CS; Lam, 22, 2.8, ME; Kathy, 18, 3.8, LS; Chris, 22, 3.9, CS

Bucket 3: Kane, 19, 3.8, ME; Pat, 19, 2.8, EE

Page 22: hashing & indexing

Non-Clustered Hash Index

• A non-clustered hash index on the age attribute with 4 buckets, h(age) = age % B (B = 4)

• Pointers are page-ids (the figure labels pages with the page-ids 500, 1001, 706, and 101).

• The (age, RID) entries in the buckets and the Student data file are the same as in the previous non-clustered example.

Page 23: hashing & indexing

Clustered Hash Index (SEQUENTIAL LAYOUT)

• A clustered hash index on the age attribute with 4 buckets, h(age) = age % 4

• When the number of buckets is known in advance, the system may assume a sequentially laid-out file to eliminate the need for the hash directory.

Bucket 0: Mary, 24, 3, ECE; Tom, 20, 3.2, EE; Louis, 32, 4, LS; James, 24, 3.1, ME; Chad, 28, 2.3, LS; Leila, 20, 3.5, LS; Shideh, 16, 4, CS

Bucket 1: Bob, 21, 3.7, CS; Vera, 17, 3.9, EE; Martha, 29, 3.8, CS

Bucket 2: Chang, 18, 2.5, CS; Lam, 22, 2.8, ME; Kathy, 18, 3.8, LS; Chris, 22, 3.9, CS

Bucket 3: Kane, 19, 3.8, ME; Pat, 19, 2.8, EE

Page 24: hashing & indexing

Clustered Hash Index (SEQUENTIAL LAYOUT)

• A clustered hash index on the age attribute with 4 buckets, h(age) = age % 4

• When the number of buckets is known in advance, the system may assume a sequentially laid-out file to eliminate the need for the hash directory.

• The buckets are laid out in order of bucket-id: offset 0 is for bucket 0, offset (page size) is for bucket 1, and in general bucket i starts at offset i × (page size).
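The offset rule above replaces the directory lookup with one arithmetic step. A minimal sketch, assuming a page size of 4096 bytes (the slides leave it symbolic) and that every bucket fits in exactly one page, so overflow is ignored; `bucket_offset` and `read_bucket` are hypothetical names, and an in-memory `io.BytesIO` stands in for the disk file.

```python
import io

PAGE_SIZE = 4096   # bytes per bucket page; an assumed value, the slides keep it symbolic
NUM_BUCKETS = 4    # fixed in advance, so no hash directory is needed


def bucket_of(age: int) -> int:
    """h(age) = age % 4, as on the slide."""
    return age % NUM_BUCKETS


def bucket_offset(bucket_id: int) -> int:
    """Byte offset of a bucket in the sequentially laid-out file:
    bucket 0 at 0, bucket 1 at PAGE_SIZE, ..., bucket i at i * PAGE_SIZE."""
    if not 0 <= bucket_id < NUM_BUCKETS:
        raise ValueError(f"bucket id {bucket_id} out of range")
    return bucket_id * PAGE_SIZE


def read_bucket(f, age: int) -> bytes:
    """Fetch the bucket page for `age` with one seek and one read (a single IO)."""
    f.seek(bucket_offset(bucket_of(age)))
    return f.read(PAGE_SIZE)


# Toy file: every byte of bucket i's page is set to i, so we can see which page was read.
demo = io.BytesIO(b"".join(bytes([i]) * PAGE_SIZE for i in range(NUM_BUCKETS)))
print(bucket_offset(bucket_of(21)))  # 4096 (age 21 -> bucket 1)
print(read_bucket(demo, 22)[0])      # 2    (age 22 -> bucket 2)
```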

Page 25: hashing & indexing

[Figure: a table mapping bucket numbers 0, 1, 2, ..., M-2, M-1 to block addresses on disk]

Page 26: hashing & indexing

Example of Non-Clustered Hash Index


Page 27: hashing & indexing

[Figure: main buckets and overflow buckets, with a record pointer for each entry]

Page 28: hashing & indexing

[Figure: a table mapping bucket numbers 0, 1, 2, ..., M-2, M-1 to block addresses on disk]

Page 29: hashing & indexing

Example of Hash Index


Page 30: hashing & indexing

[Figure: main buckets and overflow buckets, with a record pointer for each entry]

Page 31: hashing & indexing

HASHING (Cont…)

• Loading factor

– B = # of buckets, S = # of records per bucket, R = # of records in the relation

– loading factor = R / (B × S)

– The loading factor should not exceed 80%; if it does, double B and rehash.

• Why might a bucket overflow?

– Heavy loading of the file

– Poor hash functions

– Statistical peculiarities

• What if a bucket overflows?

– Chaining: chain an empty bucket to the bucket that overflows.

– Open addressing: if bucket h(k) is full, store the record in h(k) + 1; if that is also full, try h(k) + 2, and so on.

– Two hash functions: if bucket h(k) is full, store the record in h′(k).
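Chaining and the 80% rule can be combined in a small simulation. This is a minimal sketch, assuming integer keys and h(k) = k mod B; `StaticHashFile` is a hypothetical name, pages are Python lists, and open addressing and the two-hash-function scheme are not shown.

```python
class StaticHashFile:
    """Static hashing with B buckets of S records each and h(k) = k mod B (assumed).

    A full bucket gets an overflow page chained to it. When the loading factor
    R / (B * S) exceeds max_load, B is doubled and every key is rehashed."""

    def __init__(self, num_buckets: int = 4, bucket_size: int = 2, max_load: float = 0.8):
        self.B = num_buckets
        self.S = bucket_size
        self.max_load = max_load
        self.R = 0
        # Each bucket is a chain of pages: [main page, overflow page, ...]
        self.buckets = [[[]] for _ in range(self.B)]

    def loading_factor(self) -> float:
        return self.R / (self.B * self.S)

    def _place(self, key: int) -> None:
        chain = self.buckets[key % self.B]
        if len(chain[-1]) == self.S:   # last page in the chain is full
            chain.append([])           # chaining: link an empty overflow page
        chain[-1].append(key)

    def insert(self, key: int) -> None:
        self._place(key)
        self.R += 1
        if self.loading_factor() > self.max_load:
            self._rehash(2 * self.B)   # double B and rehash

    def _rehash(self, new_b: int) -> None:
        keys = [k for chain in self.buckets for page in chain for k in page]
        self.B = new_b
        self.buckets = [[[]] for _ in range(self.B)]
        for k in keys:
            self._place(k)

    def overflow_pages(self) -> int:
        return sum(len(chain) - 1 for chain in self.buckets)


hf = StaticHashFile(num_buckets=4, bucket_size=2)
for key in (3, 7, 11, 15):       # all map to bucket 3 under k mod 4
    hf.insert(key)
print(hf.overflow_pages(), hf.loading_factor())  # 1 0.5
for key in (2, 6, 10):           # R = 7 pushes the loading factor to 0.875
    hf.insert(key)
print(hf.B, hf.overflow_pages())  # 8 0
```

The first batch shows that a bucket can overflow at a low overall loading factor when keys cluster (the "statistical peculiarities" case); the second shows the doubling rule spreading the keys out again.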

Page 32: hashing & indexing

HASHING (Cont…)

• Problem: the file grows and shrinks over time. How, then, should one choose the hash function?

1. Based on the current file size: performance degrades as the DB grows.

2. Based on the anticipated file size: space is wasted initially (and buffer hits are reduced).

3. Periodic reorganization: time consuming.

3.1. Choose a new hash function

3.2. Recompute the hash value of every record

3.3. Generate new bucket assignments

• Solution:

– Dynamic hash functions: modify h dynamically to accommodate growth and shrinkage of the DB (e.g., extendible hashing).

Page 33: hashing & indexing

HASHING (Cont…)

Extendible hashing

• Choose a hash function h that produces a b-bit binary number (b = 32).

• The directory has a header that contains its depth, d.

• Each directory entry points to a hash bucket.

• Buckets are created on demand, as records are inserted.

• Each bucket contains a local depth used to find data.

[Figure: a directory of depth 2 with entries 00, 01, 10, 11 pointing to buckets; a bucket with local depth 1; sibling entries marked]

Page 34: hashing & indexing

HASHING (Cont…)

Extendible hashing (continued):

• Every time a bucket overflows, its local depth is increased. If the local depth becomes greater than the depth of the directory, the directory's depth is increased, causing the directory to double in size.

• Each directory entry has one sibling, or buddy. Two entries are buddies if their bit patterns are identical except for the dth bit.

• A bucket can be treated as overflowing at any desired loading factor; that is, a split might happen every time a bucket is 80% full.

Page 35: hashing & indexing

HASHING (Cont…)

• Retrieval with extendible hashing:

Retrieve (K0)

1. Calculate h′ = h(K0).

2. Read the depth d of the directory.

3. Interpret the first d bits of h′ as a base-2 integer; call it r.

4. Retrieve the bucket pointed to by the rth directory entry.

5. Find the record in this bucket.

5.1. If a hashing technique is used to organize the records in a bucket, use the d bits defined on that bucket.

5.2. If necessary, follow the collision resolution scheme within this bucket.

Page 36: hashing & indexing

HASHING (Cont…)

• Insertion with extendible hashing:

Insert (K0)

1. Apply the first four steps of Retrieve (K0) to find bucket b.

2. If inserting K0 into b results in no overflow, then insert K0 into b and return.

3. Otherwise, obtain a new bucket b′.

4. Set the local depth of both b and b′ to (local depth of b + 1).

5. If the new depth is NOT greater than the depth of the directory:

5.1. Distinguish between b and b′ using their new local depth, and set the appropriate directory entry(ies) to point to each.

5.2. Rehash the entries in bucket b and assign each entry to the appropriate bucket, b or b′.

5.3. Insert (K0).

6. If the new depth is greater than the depth of the directory:

6.1. Increase the depth of the directory, doubling its size.

6.2. Set each entry and its buddy to point to the bucket that the old entry was pointing to.

6.3. Insert (K0).
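Retrieve and Insert can be sketched together in Python. This is a minimal in-memory sketch, not the slides' exact procedure: it assumes a hypothetical 32-bit hash `h` (Knuth-style multiplicative hashing, chosen only because it mixes bits well), buckets that hold 2 keys, and a directory indexed by the first d bits of h(key); the class and method names are illustrative, and deletion is not shown.

```python
B_BITS = 32  # h produces a b-bit number, b = 32 as on the slide


def h(key: int) -> int:
    """Hypothetical 32-bit hash (Knuth-style multiplicative hashing); any
    well-mixing b-bit hash would do."""
    return (key * 2654435761) & 0xFFFFFFFF


class Bucket:
    def __init__(self, local_depth: int):
        self.local_depth = local_depth
        self.keys: list = []


class ExtendibleHash:
    def __init__(self, bucket_size: int = 2):
        self.bucket_size = bucket_size
        self.depth = 0                      # directory depth d: 2**d entries
        self.directory = [Bucket(0)]

    def _entry(self, key: int) -> int:
        """Steps 1-3 of Retrieve: the first d bits of h(key), read as an integer r."""
        return h(key) >> (B_BITS - self.depth)

    def retrieve(self, key: int) -> bool:
        """Steps 4-5: fetch the bucket pointed to by entry r and search it."""
        return key in self.directory[self._entry(key)].keys

    def insert(self, key: int) -> None:
        while True:
            bucket = self.directory[self._entry(key)]
            if len(bucket.keys) < self.bucket_size:   # no overflow: done
                bucket.keys.append(key)
                return
            if bucket.local_depth == B_BITS:
                raise RuntimeError("bucket cannot be split any further")
            self._split(bucket)                        # then retry the insert

    def _split(self, bucket: Bucket) -> None:
        if bucket.local_depth == self.depth:
            # New local depth would exceed d: double the directory. Entry r becomes
            # entries 2r and 2r+1 (buddies), both pointing to the old bucket.
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        # Entries that point to `bucket` and have a 1 in bit position local_depth
        # (counting from the left) now point to the new bucket b'.
        shift = self.depth - bucket.local_depth
        for i, b in enumerate(self.directory):
            if b is bucket and (i >> shift) & 1:
                self.directory[i] = sibling
        # Rehash the entries of b between b and b'.
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._entry(k)].keys.append(k)


eh = ExtendibleHash(bucket_size=2)
for k in range(20):
    eh.insert(k)
print(eh.retrieve(7), eh.retrieve(99))  # True False
print(eh.depth, len(eh.directory) == 2 ** eh.depth)
```

One invariant worth checking in such a sketch: a bucket with local depth L is shared by exactly 2^(d - L) directory entries, which is why a split only rewrites entries when L < d and doubles the directory when L = d.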

Page 37: hashing & indexing

HASHING (Cont…)

• Deletion with extendible hashing:

Delete (K0)

1. Apply the first four steps of Retrieve (K0) to find bucket b.

2. If K0 is not in b, then return with the value "not found".

3. Otherwise, delete the entry corresponding to K0.

4. If the total number of entries on this page and its sibling page is below the size of a bucket, then:

4.1. Copy the entries in the two buckets into one bucket b′.

4.2. Depth of b′ = (depth of b − 1).

4.3. Free bucket b and its sibling.

4.4. Locate the two hash directory entries pointing to b and its buddy, and set both pointers to b′.

4.5. If every pointer in the directory equals its sibling pointer, then decrease the depth of the directory by one and set each entry accordingly.

Page 38: hashing & indexing

Use of Extendable Hash Structure: Example

Initial hash structure, bucket size = 2

Page 39: hashing & indexing

Example (Cont.)

• Hash structure after insertion of one Brighton and two Downtown records

Page 40: hashing & indexing

Example (Cont.)

• Hash structure after insertion of the Mianus record

Page 41: hashing & indexing

Example (Cont.)

• Hash structure after insertion of three Perryridge records

Page 42: hashing & indexing

Example (Cont.)

• Hash structure after insertion of Redwood and Round Hill records

Page 43: hashing & indexing

HASHING (Cont…)

• Extendible hashing: The insertion algorithm of extendible hashing might crash when

Page 44: hashing & indexing

HASHING (Cont…)

Hashing vs. Indexing

• Hashing is appropriate for exact-match queries (it cannot support range queries):

SELECT A1, A2, … FROM r WHERE (Ai = c)

• Indexing is appropriate for both range and exact-match queries:

SELECT A1, A2, … FROM r WHERE (Ai <= c1) and (Ai > c2)

Page 45: hashing & indexing

Example

• Suppose that we are using extendable hashing on a file that contains records with the following search-key values:

2, 3, 5, 7, 11, 17, 19, 23, 29, 31

Show the extendable hash structure for this file if the hash function is h(x) = x mod 8 and buckets can hold three records.

Page 46: hashing & indexing

B+-TREE

• B+-tree is a multi-level, tree-structured directory.

[Figure: root, internal nodes, and leaf nodes above the data file]

• Clustered: leaf nodes contain the records themselves.

Page 47: hashing & indexing

B+-TREE (Cont…)

• Non-clustered: leaf nodes contain the pairs (P, K), where P is a pointer to the record in the file and K is a search key.

Page 48: hashing & indexing

B+-TREE (Cont…)

• Leaf nodes (node layout: $P_1\ K_1\ P_2\ \ldots\ P_{n-1}\ K_{n-1}\ P_n$)

– Maintain between $\lceil (n-1)/2 \rceil$ and $n-1$ values per leaf. Example leaf (n = 4): 5 7 10

– If $i < j$, then $K_i < K_j$.

– Every search-key value in the file appears in some leaf node.

– Suppose $L_i$ and $L_j$ are two leaves and $i < j$; then every search value in $L_i$ is less than every search value in $L_j$. Example: 5 7 10 | 15 17 18

Page 49: hashing & indexing

B+-TREE (Cont…)

• Internal nodes

– Maintain between $\lceil n/2 \rceil$ and $n$ pointers per internal node.

– The root is an exception: it must have more than one pointer.

– Suppose a node has m pointers and $2 \le i < m$:

1. $P_i$ points to the subtree containing search-key values $< K_i$ and $\ge K_{i-1}$.

2. $P_m$ points to the subtree containing search-key values $\ge K_{m-1}$.

3. $P_1$ points to the subtree containing search-key values $< K_1$.

[Figure: an internal node with keys 5, 7, 10 whose pointers lead to leaves such as 2 3, 5 6, and 10 11]

Page 50: hashing & indexing

To calculate the order n of a B+-tree

• Suppose that the search key field is V = 9 bytes long, the block size is B = 512 bytes, a record pointer is Pr = 7 bytes, and a block pointer is P = 6 bytes.

1. Calculate the order of the internal nodes.

2. Calculate the order of the leaf nodes.

Page 51: hashing & indexing

To calculate the order n of a B+-tree

• Suppose that the search key field is V = 9 bytes long, the block size is B = 512 bytes, a record pointer is Pr = 7 bytes, and a block pointer is P = 6 bytes.

1. Calculate the order of the internal nodes

• An internal node of a B+-tree can have up to n tree pointers and n − 1 search-key values:

$(n \times P) + ((n-1) \times V) \le B$

$(n \times 6) + ((n-1) \times 9) \le 512$

$15n \le 521$

$n = 34$ (the largest integer satisfying the inequality, since $521/15 \approx 34.7$)

Page 52: hashing & indexing

To calculate the order n of a B+-tree

• Suppose that the search key field is V = 9 bytes long, the block size is B = 512 bytes, a record pointer is Pr = 7 bytes, and a block pointer is P = 6 bytes.

2. Calculate the order of the leaf nodes

• The leaf nodes of the B+-tree have the same number of values and pointers, except that the pointers are data (record) pointers, plus one next pointer:

$(n_{leaf} \times (P_r + V)) + P \le B$

$(n_{leaf} \times (7 + 9)) + 6 \le 512$

$16\, n_{leaf} \le 506$

$n_{leaf} = 31$ (since $506/16 \approx 31.6$)
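Both inequalities can be solved for the largest integer with floor division: $nP + (n-1)V \le B$ rearranges to $n \le (B + V)/(P + V)$, and the leaf inequality to $n_{leaf} \le (B - P)/(P_r + V)$. A small sketch with hypothetical function names:

```python
def internal_order(block: int, key: int, block_ptr: int) -> int:
    """Largest n with n*P + (n - 1)*V <= B (n tree pointers, n - 1 keys)."""
    return (block + key) // (block_ptr + key)


def leaf_order(block: int, key: int, rec_ptr: int, block_ptr: int) -> int:
    """Largest n_leaf with n_leaf*(Pr + V) + P <= B (one next-leaf pointer)."""
    return (block - block_ptr) // (rec_ptr + key)


B, V, Pr, P = 512, 9, 7, 6
print(internal_order(B, V, P))  # 34
print(leaf_order(B, V, Pr, P))  # 31
```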

Page 53: hashing & indexing

B+-TREE (Cont…)

• Lookup 30

[Figure: B+-tree with internal nodes 8 | 41 50 above the leaves 4 7 | 10 20 | 30 40 | 41 47 | 50 52]

– Find 7: 4 IOs

– Find 4-20: 4 IOs (assuming a primary index), 8 IOs (assuming a secondary index)

– If more than 10% of the records are selected, a sequential scan is more efficient (do not use the secondary index).

– Example: 10,000 records, of which 1,000 are selected, with 1,000 records per disk page: a sequential scan takes 10 IOs, while the secondary index potentially takes 1,000+ IOs.
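A lookup descends from the root by following the pointer rule for internal nodes, and a range search then walks the leaf chain. A minimal sketch of the tree in the figure, assuming a root holding the single key 30 above the internal nodes [8] and [41 | 50] (the root is not legible on this slide); the names are hypothetical, and only tree-node reads are counted, not the fetches of the records themselves.

```python
from bisect import bisect_right


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys          # sorted search-key values
        self.children = children  # None for a leaf
        self.next = None          # right-sibling pointer (used by leaves only)


def build_example():
    """The tree from the figure, with an assumed root key of 30."""
    leaves = [Node([4, 7]), Node([10, 20]), Node([30, 40]), Node([41, 47]), Node([50, 52])]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return Node([30], [Node([8], leaves[:2]), Node([41, 50], leaves[2:])])


def find_leaf(root, key):
    """Follow P_i with K_{i-1} <= key < K_i down to a leaf; count node reads."""
    node, reads = root, 1
    while node.children is not None:
        node = node.children[bisect_right(node.keys, key)]
        reads += 1
    return node, reads


def range_search(root, low, high):
    """Descend to the leaf for `low`, then walk right along the leaf chain."""
    leaf, reads = find_leaf(root, low)
    result = []
    while True:
        result.extend(k for k in leaf.keys if low <= k <= high)
        if leaf.keys[-1] >= high or leaf.next is None:
            return result, reads
        leaf = leaf.next
        reads += 1


root = build_example()
leaf, reads = find_leaf(root, 30)
print(leaf.keys, reads)           # [30, 40] 3
print(range_search(root, 4, 20))  # ([4, 7, 10, 20], 4)
```

With a secondary index, each key found in the leaves would cost an additional record fetch on top of these node reads.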

Page 54: hashing & indexing

B+-TREE (Cont…)

• Analysis

– “B” in B+-tree stands for Balanced, i.e., every path from the root to a leaf node has the same length.

– Hence, lookup, insertion, and deletion all perform well.

– If K is the number of search-key values in the file, the path length is at most $\lceil \log_{\lceil n/2 \rceil}(K) \rceil$.

– With K = 1,000,000 and 10 ≤ n ≤ 100, at most 4 to 9 nodes are accessed ($\lceil \log_{50} 10^6 \rceil = 4$ for n = 100; $\lceil \log_{5} 10^6 \rceil = 9$ for n = 10).

– Insertion and deletion must not destroy the balance of the tree.
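The bound can be checked numerically. A small sketch (hypothetical function name) that computes $\lceil \log_{\lceil n/2 \rceil}(K) \rceil$ with integer arithmetic, as the smallest h with $\lceil n/2 \rceil^h \ge K$, which avoids floating-point rounding when K is an exact power of the fan-out:

```python
def max_path_length(num_keys: int, n: int) -> int:
    """Upper bound ceil(log_{ceil(n/2)}(K)) on the root-to-leaf path length,
    computed as the smallest h with ceil(n/2)**h >= K."""
    fanout = (n + 1) // 2  # ceil(n/2): minimum number of children of a non-root node
    if fanout < 2:
        raise ValueError("n must be at least 3")
    h, reach = 0, 1
    while reach < num_keys:
        reach *= fanout
        h += 1
    return h


for n in (10, 100):
    print(n, max_path_length(1_000_000, n))
# 10 9
# 100 4
```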

Page 55: hashing & indexing

B+-TREE (Cont…)

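n = 4; internal nodes: 2 to 4 pointers; leaf nodes: 2 to 3 values

Initial tree: root [8 | 25]; leaves [4 7] [10 20] [30 40]

Insert 41: the leaf [30 40] becomes [30 40 41]; the root is unchanged.

Insert 47: the leaf [30 40 41 47] exceeds 3 values and splits into [30 40] and [41 47]; 41 is copied into the root.

Result: root [8 | 25 | 41]; leaves [4 7] [10 20] [30 40] [41 47]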

Page 56: hashing & indexing

B+-TREE (Cont…)

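Insert 50: the leaf [41 47] becomes [41 47 50].

Insert 52: the leaf [41 47 50 52] splits into [41 47] and [50 52], and 50 is copied into the root. The root [8 | 25 | 41 | 50] now has 5 pointers (more than n = 4), so it splits into [8 | 25] and [50], and 41 moves up into a new root.

Result: root [41]; internal nodes [8 | 25] and [50]; leaves [4 7] [10 20] [30 40] [41 47] [50 52]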
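The insertion sequence above can be replayed with a compact recursive sketch. It assumes n = 4 as on the slide, builds the initial tree by hand, and omits the leaf next-pointers for brevity; `Node`, `insert`, and `leaves` are hypothetical names. A full leaf copies its new smallest key up after splitting, while a full internal node moves its middle key up.

```python
from bisect import bisect_right, insort

N = 4  # order n: at most 4 pointers per internal node, at most n - 1 = 3 keys per leaf


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys          # sorted search-key values
        self.children = children  # None for a leaf


def _insert(node, key):
    """Insert `key` under `node`; return (separator, new right node) if `node` split."""
    if node.children is None:                     # leaf
        insort(node.keys, key)
        if len(node.keys) <= N - 1:
            return None
        mid = (len(node.keys) + 1) // 2           # left leaf keeps ceil(len/2) keys
        right = Node(node.keys[mid:])
        node.keys = node.keys[:mid]
        return right.keys[0], right               # smallest key of new leaf is copied up
    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key)
    if split is None:
        return None
    sep, new_child = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, new_child)
    if len(node.children) <= N:
        return None
    mid = (len(node.children) + 1) // 2           # left node keeps ceil(m/2) pointers
    up = node.keys[mid - 1]                       # this key moves up (it is not kept here)
    right = Node(node.keys[mid:], node.children[mid:])
    node.keys, node.children = node.keys[:mid - 1], node.children[:mid]
    return up, right


def insert(root, key):
    """Insert into the tree; if the root splits, grow a new root (height + 1)."""
    split = _insert(root, key)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])


def leaves(node):
    return [node.keys] if node.children is None else [lf for c in node.children for lf in leaves(c)]


root = Node([8, 25], [Node([4, 7]), Node([10, 20]), Node([30, 40])])
for k in (41, 47, 50, 52):
    root = insert(root, k)
print(root.keys)                        # [41]
print([c.keys for c in root.children])  # [[8, 25], [50]]
print(leaves(root))  # [[4, 7], [10, 20], [30, 40], [41, 47], [50, 52]]
```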

Page 57: hashing & indexing

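B+-TREE (Cont…)

Delete 20

Before: root [30]; internal nodes [8] and [41 | 50]; leaves [4 7] [10 20] [30 40] [41 47] [50 52]

After 20 is removed, the leaf [10] holds fewer than 2 values, so it is merged with [4 7] into [4 7 10]. The internal node above it is then left with a single pointer, so it is merged with its sibling [41 | 50], pulling down the root key 30, and the tree loses a level.

Result: root [30 | 41 | 50]; leaves [4 7 10] [30 40] [41 47] [50 52]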

Page 58: hashing & indexing

Example

• Construct a B+-tree for the following set of values:

2, 3, 5, 7, 11, 17, 19, 23, 29, 31

• Assume n = 4 (number of pointers):

– Inner nodes: 2 to 4 children

– Leaf nodes: 2 to 3 values