Yet More on Indexes: Hash Tables

Source: our textbook, slides by Hector Garcia-Molina

Upload: breanna-windon

Post on 31-Mar-2015



Main Memory Hash Tables

A hash function h maps search keys to integers in the range 0 to B-1, where B is the number of buckets.

There is a B-element bucket array; each entry holds a pointer to a linked list. A record with key k is put in the linked list that starts at entry h(k) of the bucket array.
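A minimal Python sketch of the main-memory scheme, assuming Python lists stand in for the linked lists and Python's built-in `hash` stands in for h. The class name `ChainedHashTable` and its methods are illustrative, not from the slides.

```python
class ChainedHashTable:
    """Main-memory hash table: a B-element bucket array of chains."""

    def __init__(self, num_buckets):
        self.B = num_buckets
        # Entry b of the bucket array holds the chain (here a list) for bucket b.
        self.buckets = [[] for _ in range(num_buckets)]

    def h(self, key):
        # Map a search key to an integer in the range 0 .. B-1.
        return hash(key) % self.B

    def insert(self, key, record):
        # A record with key k goes into the chain that starts at entry h(k).
        self.buckets[self.h(key)].append((key, record))

    def lookup(self, key):
        # Only the single chain at entry h(key) has to be searched.
        return [rec for k, rec in self.buckets[self.h(key)] if k == key]


table = ChainedHashTable(5)
table.insert(22, "record for 22")
print(table.h(22), table.lookup(22))
```

For small non-negative integers CPython's `hash(k)` is k itself, so here h behaves like k mod B.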


Example of Hash Table

B = 5, h(k) = k mod 5

bucket 0: 15 → 10

bucket 1: (empty)

bucket 2: 22

bucket 3: (empty)

bucket 4: 104 → 29 → 34
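The example can be reproduced with a few lines of Python. This is a sketch: the order of keys within each chain is assumed to be insertion order, which the figure does not state.

```python
B = 5

def h(k):
    # The example's hash function.
    return k % B

buckets = {b: [] for b in range(B)}
for k in [15, 10, 22, 104, 29, 34]:
    buckets[h(k)].append(k)

for b in range(B):
    print(b, buckets[b])
```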


Changes for Secondary Storage

Bucket array contains blocks, not pointers to linked lists

Records that hash to a certain bucket are put in the corresponding block

If a bucket overflows then start a chain of overflow blocks


Insertion into Static Hash Table

To insert a record with key K: compute h(K), then insert the record into one of the blocks in the chain of blocks for bucket number h(K), adding a new block to the chain if necessary.
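A sketch of static-hash insertion on blocks, assuming each block is modeled as a Python list holding at most `capacity` records (represented just by their keys) and each bucket as a list of blocks, primary block first, then overflow blocks. `StaticHashTable` is a hypothetical name; the h values used are those of the 2 records/bucket example.

```python
class StaticHashTable:
    """Static hash table whose buckets are chains of fixed-capacity blocks."""

    def __init__(self, num_buckets, capacity, h):
        self.capacity = capacity  # records that fit in one block
        self.h = h                # hash function supplied by the caller
        # Each bucket starts with one (empty) primary block.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def insert(self, key):
        chain = self.buckets[self.h(key)]
        # Put the record into the first block in the chain that has room.
        for block in chain:
            if len(block) < self.capacity:
                block.append(key)
                return
        # Every block is full: add a new overflow block to the chain.
        chain.append([key])


h_values = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}
table = StaticHashTable(num_buckets=4, capacity=2, h=h_values.__getitem__)
for key in "abcde":
    table.insert(key)
print(table.buckets)
```

Record e lands in a new overflow block of bucket 1, because the primary block already holds a and c.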


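EXAMPLE: 2 records/bucket

INSERT: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0

bucket 0: d

bucket 1: a, c

bucket 2: b

bucket 3: (empty)

INSERT: h(e) = 1. Bucket 1 is already full, so e goes into an overflow block chained to bucket 1:

bucket 1: a, c → overflow block: e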


Deletion from a Static Hash Table

To delete records with key K: go to the bucket numbered h(K), search for records with key K, deleting any that are found, and possibly condense the chain of overflow blocks for that bucket.
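A matching deletion sketch over the same hypothetical layout (a bucket is a list of blocks, each a list of at most `capacity` keys). The slides only say the chain may "possibly" be condensed; repacking the surviving records into as few blocks as possible is one assumed policy, and it moves a record up out of an overflow block when space frees up.

```python
def delete(buckets, h, key, capacity):
    """Delete every record with `key` from bucket h(key), then condense its chain."""
    b = h(key)
    # Search all blocks of the chain, keeping only records whose key differs.
    survivors = [k for block in buckets[b] for k in block if k != key]
    # Condense: refill blocks from the front; overflow blocks left empty disappear.
    chain = [survivors[s:s + capacity] for s in range(0, len(survivors), capacity)]
    buckets[b] = chain or [[]]  # the primary block always stays, even if empty


h_values = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}
buckets = [[["d"]], [["a", "c"], ["e"]], [["b"]], [[]]]
delete(buckets, h_values.__getitem__, "a", capacity=2)
print(buckets[1])  # [['c', 'e']]: e moved up from the overflow block
```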


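EXAMPLE: deletion

[Figure: a table with buckets 0 to 3 holding records a through g, with g stored in an overflow block. Delete e and f; afterwards, g can possibly be moved up out of its overflow block into the space freed by the deletions.]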


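Rule of thumb: try to keep space utilization between 50% and 80%.

$$\text{utilization} = \frac{\#\ \text{records used}}{\text{total}\ \#\ \text{records that fit}}$$

If utilization < 50%, space is wasted; if it is > 80%, overflows become significant. How significant depends on how good the hash function is and on the number of records per bucket.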
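A small sketch of the rule of thumb, assuming "total # records that fit" counts only the primary block of each bucket. The function names are hypothetical and the bucket counts are made-up illustrative numbers.

```python
def utilization(records_used, num_buckets, records_per_bucket):
    """Records stored divided by the records that fit in the primary blocks."""
    return records_used / (num_buckets * records_per_bucket)


def rule_of_thumb(u):
    # The 50%-80% band from the slide.
    if u < 0.5:
        return "wasting space"
    if u > 0.8:
        return "overflows significant"
    return "ok"


for used in (60, 150, 180):
    u = utilization(used, num_buckets=100, records_per_bucket=2)
    print(used, u, rule_of_thumb(u))
```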


Efficiency of Static Hash Tables

If the hash table size is large enough and the distribution of keys by the hash function is sufficiently "even", then most buckets have no overflow blocks

In this case lookup typically takes one disk I/O and insertion/deletion take two

Significantly better than sequential indexes and B-trees

(But: hash tables do not support efficient range queries as B-trees do)

What if there are long overflow chains?


How do we cope with growth?

Overflows and reorganizations

Dynamic hashing: extensible or linear


Extensible Hash Tables

Each bucket in the bucket array contains a pointer to a block, instead of a block itself.

The bucket array can grow by doubling in size.

Certain buckets can share a block if they are small enough.

The hash function computes a sequence of k bits, but only the first i bits are used at any time to index into the bucket array.

The value of i can increase (this corresponds to the bucket array doubling in size).
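A tiny sketch of indexing with the first i of k hash bits, assuming hash values are plain Python integers below 2**k. The function name `bucket_index` is hypothetical.

```python
def bucket_index(hash_value, k, i):
    """Index into the bucket array from the first (high-order) i of k hash bits."""
    return hash_value >> (k - i)


hv = 0b1010  # a 4-bit hash value (k = 4)
for i in range(0, 5):
    # With i bits in use the bucket array has 2**i entries.
    print(i, 2 ** i, format(bucket_index(hv, 4, i), "b"))
```

Incrementing i doubles the array because the entry for a prefix w splits into the entries w0 and w1: dropping the last bit of an (i+1)-bit index gives back the old i-bit index.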


[Figure: (b) Use a directory: the first i bits of the hash value, h(K)[i], select a directory entry that points to the bucket.]


Inserting into Extensible Hash Table

To insert a record with key K:

compute h(K);

go to the bucket-array entry indexed by the first i bits of h(K);

follow the pointer to get to block B;

if there is room in B, insert the record;

else let j be the number of bits of the hash value used to determine membership in B.


Insertion cont'd

Case 1: j < i.

Split block B in two, distributing the records in B to the two new blocks based on the value of their (j+1)-st bit.

Update the header of each new block to j+1.

Adjust the pointers in the bucket array so that entries that used to point to B now point to the correct new block.

If there is still no room in the appropriate block for the new record, repeat this process.


Insertion cont'd

Case 2: j = i.

Increment i by 1 and double the length of the bucket array: for each old entry w (a sequence of the old i bits), the new entries w0 and w1 both point to the same block that w pointed to (the block is shared).

Apply case 1 to split block B (now j < i).
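The two cases can be sketched in Python. Assumptions: records are represented by their k-bit hash values, blocks hold `capacity` records, and the table starts with i = 0 and a single block (so the first three inserts below build the i = 1 starting state of the example that follows). `Block`, `ExtensibleHashTable` and their methods are hypothetical names, and overflow blocks for records that agree on all k bits are not modeled.

```python
class Block:
    def __init__(self, j):
        self.j = j      # header: number of hash bits that determine membership
        self.keys = []  # hash values of the records stored here


class ExtensibleHashTable:
    """Sketch of extensible hashing; records are represented by k-bit hash values."""

    def __init__(self, k, capacity):
        self.k = k                   # hash values have k bits
        self.capacity = capacity     # records per block
        self.i = 0                   # bits currently used to index the array
        self.directory = [Block(0)]  # bucket array with 2**i entries

    def _index(self, hv):
        return hv >> (self.k - self.i)  # first i of the k bits

    def insert(self, hv):
        while True:
            block = self.directory[self._index(hv)]
            if len(block.keys) < self.capacity:
                block.keys.append(hv)
                return
            if block.j == self.k:
                raise OverflowError("all k bits in use; would need an overflow block")
            if block.j == self.i:
                # Case 2: increment i, double the array; w0 and w1 share w's block.
                self.i += 1
                self.directory = [b for b in self.directory for _ in (0, 1)]
            # Case 1 (now j < i): split the block, then loop in case there is still no room.
            self._split(block)

    def _split(self, block):
        j = block.j
        zero, one = Block(j + 1), Block(j + 1)
        # Distribute records on their (j+1)-st bit, counting from the high-order end.
        for hv in block.keys:
            (one if (hv >> (self.k - j - 1)) & 1 else zero).keys.append(hv)
        # Entries that pointed to the old block now point to the correct new block.
        for idx, b in enumerate(self.directory):
            if b is block:
                bit = (idx >> (self.i - j - 1)) & 1
                self.directory[idx] = one if bit else zero


table = ExtensibleHashTable(k=4, capacity=2)
for hv in [0b0001, 0b1001, 0b1100, 0b1010, 0b0111, 0b0000, 0b1001]:
    table.insert(hv)

for idx, block in enumerate(table.directory):
    print(format(idx, "03b"), block.j, [format(h, "04b") for h in block.keys])
```

Replaying the example's inserts this way should end with i = 3 and the same blocks as the last step of the example.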


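Example: h(k) is 4 bits; 2 keys/bucket

i = 1

0 → [j = 1] 0001

1 → [j = 1] 1001, 1100

Insert 1010: its first bit is 1, the block for 1 is full, and j = i = 1, so case 2 applies: i becomes 2, the directory doubles, and the block is split on the second bit.

New directory, i = 2

00, 01 → [j = 1] 0001

10 → [j = 2] 1001, 1010

11 → [j = 2] 1100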


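Example continued

i = 2

00, 01 → [j = 1] 0001

10 → [j = 2] 1001, 1010

11 → [j = 2] 1100

Insert 0111: it goes to the block shared by 00 and 01, which has room: [j = 1] 0001, 0111

Insert 0000: that block is now full and j = 1 < i = 2, so case 1 splits it on the second bit:

00 → [j = 2] 0000, 0001

01 → [j = 2] 0111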


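Example continued

i = 2

00 → [j = 2] 0000, 0001

01 → [j = 2] 0111

10 → [j = 2] 1001, 1010

11 → [j = 2] 1100

Insert 1001: the block for 10 is full and j = i = 2, so case 2 applies: i becomes 3, the directory doubles to entries 000 through 111, and the block is split on the third bit.

i = 3

000, 001 → [j = 2] 0000, 0001

010, 011 → [j = 2] 0111

100 → [j = 3] 1001, 1001

101 → [j = 3] 1010

110, 111 → [j = 2] 1100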


Extensible hashing: deletion

Either do no merging of blocks, or merge blocks and cut the directory in half if possible (reversing the insert procedure).


Extensible hashing: Summary

+ Can handle growing files, with less wasted space and with no full reorganizations

- Indirection (not bad if the directory is in memory)

- Directory doubles in size (now it fits, now it does not)


Linear Hash Tables

Number of buckets increases more slowly than with extensible hashing

The number of buckets is chosen so that, on average, each block is x% full (say 80%); this is the threshold.

Overflow blocks can occur, but the average number per bucket is << 1.

Use the i low-order bits from the result of the hash function to index into the bucket array


Linear hashing: another dynamic hashing scheme

Two ideas:

(a) Use the i low-order bits of the hash. [Figure: a b-bit hash value such as 01110101, with its i low-order bits marked; i grows over time.]

(b) The bucket array grows linearly.


Inserting into Linear Hash Table

To insert record with key K, with last i bits of h(K) being a1a2…ai :

Let m be the integer represented by a1a2…ai in binary

If m < n (number of buckets), then bucket m exists -- put record in that bucket

If m ≥ n, then bucket m does not (yet) exist, so put record in bucket whose index corresponds to 0a2…ai
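The bucket-selection rule as a short Python sketch, assuming hash values are plain integers and that the table keeps $2^{i-1} \le n \le 2^i$, so the fallback bucket always exists. The name `bucket_for` is hypothetical.

```python
def bucket_for(hv, i, n):
    """Linear hashing: bucket for hash value hv, with i low-order bits and n buckets."""
    m = hv & ((1 << i) - 1)     # integer value of the last i bits a1 a2 ... ai
    if m < n:
        return m                # bucket m exists
    return m - (1 << (i - 1))   # clear a1: bucket 0 a2 ... ai


# i = 2 with only buckets 00 and 01 in use (n = 2).
for hv in (0b0000, 0b1010, 0b0101, 0b1111):
    print(format(hv, "04b"), "->", format(bucket_for(hv, 2, 2), "02b"))
```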


Inserting cont'd

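If there is no room in the indicated bucket, then create an overflow block.

Compare # records / # buckets to the threshold.

If it exceeds the threshold, then add a new bucket and rearrange records (the new bucket, numbered 1a2…ai, takes the records of bucket 0a2…ai whose last i bits are 1a2…ai).

If the number of buckets exceeds $2^i$, then increment i by 1.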
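A Python sketch of the whole insertion procedure. Assumptions: records are represented by their hash values; each bucket is one Python list, and entries beyond `capacity` stand for its overflow blocks; the threshold is compared against r / (n * capacity), i.e. the average fullness of a block. `LinearHashTable` and its methods are hypothetical names, and `threshold=1.0` below is chosen only so that growth lines up with the b = 4 bits example that follows, not as a recommended setting.

```python
class LinearHashTable:
    """Sketch of linear hashing; records are represented by their hash values."""

    def __init__(self, capacity, threshold, i=1, n=2):
        self.capacity = capacity    # records per block
        self.threshold = threshold  # target average fullness, e.g. 0.8
        self.i = i                  # number of low-order bits in use
        self.n = n                  # number of buckets
        self.r = 0                  # number of records
        # Each bucket is one list; entries past `capacity` stand for overflow blocks.
        self.buckets = [[] for _ in range(n)]

    def bucket_for(self, hv):
        m = hv & ((1 << self.i) - 1)
        return m if m < self.n else m - (1 << (self.i - 1))

    def insert(self, hv):
        self.buckets[self.bucket_for(hv)].append(hv)
        self.r += 1
        # Compare # records / # buckets with the threshold (as a fraction of capacity).
        if self.r / (self.n * self.capacity) > self.threshold:
            self.add_bucket()

    def add_bucket(self):
        self.n += 1
        if self.n > (1 << self.i):
            self.i += 1
        self.buckets.append([])
        new = self.n - 1                 # binary 1 a2 ... ai
        old = new - (1 << (self.i - 1))  # binary 0 a2 ... ai
        # Rearrange: re-place the old bucket's records; some move to the new bucket.
        records, self.buckets[old] = self.buckets[old], []
        for hv in records:
            self.buckets[self.bucket_for(hv)].append(hv)


table = LinearHashTable(capacity=2, threshold=1.0, i=2, n=2)
for hv in (0b0000, 0b1010, 0b0101, 0b1111, 0b0101):
    table.insert(hv)
print(table.buckets)  # the fifth insert pushes r / (n * capacity) past the threshold
table.add_bucket()    # grow by one more bucket
print(table.buckets)
```

Splitting bucket 0a2…ai rather than the bucket that overflowed is what lets the array grow one bucket at a time; the overflowing bucket is fixed later, when its turn comes.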


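Example: b = 4 bits, i = 2, 2 keys/bucket

bucket 00: 0000, 1010

bucket 01: 0101, 1111

buckets 10, 11: future growth buckets (not yet in use)

m = 01 (max used bucket)

Rule (h(k)[i] denotes the i low-order bits of h(k)): if $h(k)[i] \le m$, then look at bucket $h(k)[i]$; else, look at bucket $h(k)[i] - 2^{i-1}$.

• insert 0101: bucket 01 is full, so 0101 goes into an overflow block

• can have overflow chains!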


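Example: b = 4 bits, i = 2, 2 keys/bucket (continued)

After 0101 has gone into an overflow block of bucket 01, the table grows one bucket at a time:

m = 10: bucket 10 is added, and 1010 moves there from bucket 00.

m = 11: bucket 11 is added, and 1111 moves there from bucket 01; the overflow record 0101 now fits in bucket 01.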


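Example continued: how to grow beyond this?

bucket 00: 0000

bucket 01: 0101, 0101

bucket 10: 1010

bucket 11: 1111

m = 11 (max used bucket), i = 2

To grow further, i becomes 3 (bucket numbers 000 through 111), and buckets keep being added one at a time:

m = 100: bucket 100 is added (split from bucket 000)

m = 101: bucket 101 is added (split from bucket 001), and both 0101 records move there.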


Linear hashing: Summary

+ Can handle growing files, with less wasted space and with no full reorganizations

+ No indirection like extensible hashing

- Can still have overflow chains


Comparing Index Approaches

Hashing: good for probes given a key, e.g.,

SELECT …
FROM R
WHERE R.A = 5


Indexing vs Hashing

Sequential indexes and B-trees: good for range searches, e.g.,

SELECT …
FROM R
WHERE R.A > 5
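To make the contrast concrete, a small Python sketch with made-up rows: a dict plays the role of a hash index on A, and a sorted list searched with the standard `bisect` module stands in for a B-tree or sequential index. Both stand-ins are illustrative assumptions, not how a DBMS stores its indexes.

```python
import bisect

rows = [(7, "r1"), (3, "r2"), (5, "r3"), (9, "r4"), (5, "r5")]  # (A, record id)

# Hash index on A: one probe answers WHERE R.A = 5.
hash_index = {}
for a, rid in rows:
    hash_index.setdefault(a, []).append(rid)

# Sorted index on A: keys are in order, so WHERE R.A > 5 is one search plus a scan.
sorted_index = sorted(rows)
keys = [a for a, _ in sorted_index]


def equality(a):
    return hash_index.get(a, [])


def greater_than(a):
    start = bisect.bisect_right(keys, a)  # first position whose key is > a
    return [rid for _, rid in sorted_index[start:]]


print(equality(5))
print(greater_than(5))
```

The dict has no key order, so answering `A > 5` with it alone would mean examining every bucket.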


Index definition in SQL

CREATE INDEX name ON rel (attr)

CREATE UNIQUE INDEX name ON rel (attr) defines a candidate key

DROP INDEX name
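A runnable sketch of the three statements using SQLite through Python's standard `sqlite3` module. The table R(A, B) and the index names `r_a` and `r_b` are hypothetical; other systems may differ in details of the syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A INTEGER, B TEXT)")
conn.execute("CREATE INDEX r_a ON R (A)")
conn.execute("CREATE UNIQUE INDEX r_b ON R (B)")  # B now behaves as a candidate key

conn.execute("INSERT INTO R VALUES (5, 'x')")
rejected = False
try:
    conn.execute("INSERT INTO R VALUES (6, 'x')")  # duplicate value of B
except sqlite3.IntegrityError:
    rejected = True

conn.execute("DROP INDEX r_a")
indexes = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name")]
print(rejected, indexes)
```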


Note

CANNOT SPECIFY TYPE OF INDEX (e.g., B-tree, hashing, …)

OR PARAMETERS (e.g., load factor, size of hash, ...)

... at least in SQL ...