Transcript
Page 1: CSCE 520  Test 2 Info Indexing

CSCE 520 Test 2 Info Indexing

Modified from slides of Hector Garcia-Molina and Jeff Ullman

Page 2: CSCE 520  Test 2 Info Indexing

Physical Storage Media

• Speed of data access
• Cost per unit of data
• Reliability
  – Data loss (power failure or system crash)
  – Physical failure (storage device)
• Storage types
  – Volatile storage
  – Non-volatile storage

Page 3: CSCE 520  Test 2 Info Indexing

Memory Hierarchy

[Figure: memory hierarchy. Levels: Cache, Main Memory, Disk (Virtual Memory, File System), Tertiary Storage. Other labels: DBMS; Programs, Main-Memory DBMS.]

Page 4: CSCE 520  Test 2 Info Indexing

Disk Access Characteristics

• Move data to main memory:
  – Position head on cylinder
  – Find and access sector
• Steps of reading a block:
  – Processor and disk controller process the request
  – Seek time: position the head
  – Rotational latency: rotate the sector under the head
  – Transfer time: sector/block is read by the head
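The read steps add up to a simple cost estimate. The sketch below is only illustrative: the function name and the drive figures (9 ms seek, 7200 rpm, 0.1 ms transfer) are assumptions, not values from the slides, and average rotational latency is taken as half a revolution.

```python
def block_read_time_ms(seek_ms, rpm, transfer_ms, controller_ms=0.0):
    """Estimate the time to read one block, in milliseconds."""
    rotation_ms = 60_000 / rpm        # time for one full revolution
    avg_latency_ms = rotation_ms / 2  # on average the sector is half a turn away
    return controller_ms + seek_ms + avg_latency_ms + transfer_ms

# Hypothetical drive: 9 ms average seek, 7200 rpm, 0.1 ms transfer per block.
estimate = block_read_time_ms(seek_ms=9.0, rpm=7200, transfer_ms=0.1)
print(f"{estimate:.2f} ms")
```

Seek time and rotational latency dominate the total, which is why the number of block accesses is the cost that indexing tries to reduce.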

Page 5: CSCE 520  Test 2 Info Indexing

Disk Access Characteristics

• Steps of writing a block:
  – Read the block into main memory
  – Change the main-memory copy of the block
  – Write the new content back to disk
  – Verify correctness of the write

Page 6: CSCE 520  Test 2 Info Indexing

How to find records efficiently?

• Primary key – sequential organization
• Search key?
  – High I/O cost

INDEXING

Page 7: CSCE 520  Test 2 Info Indexing

Cost of Indexing

• Where is the time spent when answering a query?
  – Fast: processing in memory
  – Slow: fetching from secondary storage
• Cost of indexing:
  – Index on several attributes: fast retrieval but slow writes (the index structure must be maintained)

Page 8: CSCE 520  Test 2 Info Indexing

Topics

• Conventional indexes
• B-trees
• Hashing schemes (read only)

Page 9: CSCE 520  Test 2 Info Indexing

Sequential File

[Figure: sequential file of records with keys 10, 20, ..., 100, stored in key order, two records per block.]

Page 10: CSCE 520  Test 2 Info Indexing

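Sequential File, Dense Index

[Figure: sequential file (keys 10, 20, ..., 100, two records per block) with a dense index: one index entry per record. Index blocks hold four keys each: 10, 20, 30, 40 | 50, 60, 70, 80 | 90, 100, 110, 120.]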
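A dense-index lookup can be sketched as a binary search over the index entries. This is a minimal illustration, assuming a file with keys 10 to 100 stored two per block; the names `blocks`, `dense_index`, and `dense_lookup` are hypothetical.

```python
from bisect import bisect_left

# Data file: blocks of two records each, sorted by key.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Dense index: one (key, block number, slot) entry per record, sorted by key.
dense_index = [(key, b, s) for b, blk in enumerate(blocks) for s, key in enumerate(blk)]
index_keys = [key for key, _, _ in dense_index]

def dense_lookup(key):
    """Return (block number, slot) for key, or None if the key is not indexed."""
    i = bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        _, b, s = dense_index[i]
        return (b, s)
    return None  # decided from the index alone, without touching the file

print(dense_lookup(60), dense_lookup(65))
```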

Page 11: CSCE 520  Test 2 Info Indexing

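Sequential File, Sparse Index

[Figure: sequential file (keys 10, 20, ..., 100, two records per block) with a sparse index: one index entry per data block, holding that block's first key. Index blocks: 10, 30, 50, 70 | 90, 110, 130, 150 | 170, 190, 210, 230.]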
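With a sparse index, a lookup finds the last index entry that is less than or equal to the search key and then scans that one data block. A minimal sketch, assuming two records per block and hypothetical names `sparse_index` and `sparse_lookup`:

```python
from bisect import bisect_right

blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]
sparse_index = [blk[0] for blk in blocks]  # first key of each block

def sparse_lookup(key):
    """Return (block number, slot) for key, or None if it is not in the file."""
    b = bisect_right(sparse_index, key) - 1  # last index entry <= key
    if b < 0:
        return None                          # key is smaller than every indexed key
    if key in blocks[b]:                     # this step reads the data block
        return (b, blocks[b].index(key))
    return None

print(sparse_lookup(40), sparse_lookup(45))
```

Unlike the dense case, a miss can only be confirmed after the candidate data block has been read.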

Page 12: CSCE 520  Test 2 Info Indexing

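Sequential File, Sparse 2nd Level

[Figure: sequential file with a first-level sparse index (10, 30, 50, 70 | 90, 110, 130, 150 | 170, 190, 210, 230) and a second-level sparse index holding one entry per first-level index block: 10, 90, 170, 250 | 330, 410, 490, 570.]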
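A two-level lookup repeats the sparse search once per level. The sketch below assumes four entries per index block and data blocks whose first keys are 10, 30, ..., 230; `build_level` and `find_data_block` are illustrative names.

```python
from bisect import bisect_right

FANOUT = 4  # index entries per index block (assumed)

def build_level(keys, fanout):
    """Group sorted keys into index blocks of `fanout` entries each."""
    return [keys[i:i + fanout] for i in range(0, len(keys), fanout)]

first_keys = list(range(10, 250, 20))     # first key of each data block: 10, 30, ..., 230
level1 = build_level(first_keys, FANOUT)  # first-level sparse index, three index blocks
level2 = [blk[0] for blk in level1]       # second level: first key of each level-1 block

def find_data_block(key):
    """Return the number of the data block that may hold key, or None."""
    j = bisect_right(level2, key) - 1     # which first-level index block to read
    if j < 0:
        return None
    i = bisect_right(level1[j], key) - 1  # which entry inside that index block
    return j * FANOUT + i

print(level2, find_data_block(100))
```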

Page 13: CSCE 520  Test 2 Info Indexing

Sparse vs. Dense Tradeoff

• Sparse: less index space per record, so more of the index can be kept in memory
• Dense: can tell whether a record exists without accessing the file

Page 14: CSCE 520  Test 2 Info Indexing

Terms

• Index sequential file
• Search key (≠ primary key)
• Primary index (on sequencing field)
• Secondary index
• Dense index (all search key values in)
• Sparse index
• Multi-level index

Page 15: CSCE 520  Test 2 Info Indexing

Next:

• Duplicate keys

• Deletion/Insertion

• Secondary indexes

Page 16: CSCE 520  Test 2 Info Indexing

Duplicate keys

[Figure: sequential file with duplicate keys, two records per block: 10, 10 | 10, 20 | 20, 30 | 30, 30 | 40, 45.]

Page 17: CSCE 520  Test 2 Info Indexing

Duplicate keys

Dense index, one way to implement?

[Figure: file 10, 10 | 10, 20 | 20, 30 | 30, 30 | 40, 45 with one index entry per record, duplicates included: 10, 10, 10, 20 | 20, 30, 30, 30.]

Page 18: CSCE 520  Test 2 Info Indexing

Duplicate keys

Dense index, better way?

[Figure: file 10, 10 | 10, 20 | 20, 30 | 30, 30 | 40, 45 with one index entry per distinct key: 10, 20, 30, 40.]
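One way to read the "better way" is an index with a single entry per distinct key that points to the first matching record; the remaining duplicates are found by scanning forward in the sorted file. A minimal sketch under that assumption (`first_pos` and `find_all` are hypothetical names):

```python
# File contents in stored order, duplicates included.
records = [10, 10, 10, 20, 20, 30, 30, 30, 40, 45]

# One index entry per distinct key: key -> position of its first occurrence.
first_pos = {}
for pos, key in enumerate(records):
    first_pos.setdefault(key, pos)

def find_all(key):
    """Return the positions of every record with this key."""
    start = first_pos.get(key)
    if start is None:
        return []
    end = start
    while end < len(records) and records[end] == key:
        end += 1  # duplicates are adjacent because the file is sorted
    return list(range(start, end))

print(find_all(20), find_all(30))
```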

Page 19: CSCE 520  Test 2 Info Indexing

Duplicate keys

Sparse index, one way?

[Figure: file 10, 10 | 10, 20 | 20, 30 | 30, 30 | 40, 45 with an index holding the first key of each block: 10, 10, 20, 30.]

Careful if looking for 20 or 30!

Page 20: CSCE 520  Test 2 Info Indexing

Duplicate keys

Sparse index, another way?

– place first new key from block

[Figure: file 10, 10 | 10, 20 | 20, 30 | 30, 30 | 40, 45 with index entries 10, 20, 30, 30. Annotation on the last entry: should this be 40?]

Page 21: CSCE 520  Test 2 Info Indexing

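Duplicate values, primary index

Summary

• Index may point to first instance of each value only

[Figure: a file containing repeated values (a, a, a, b, ...) and an index whose entries point to the first occurrence of each value.]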

Page 22: CSCE 520  Test 2 Info Indexing

Deletion from sparse index

[Figure: file blocks 10, 20 | 30, 40 | 50, 60 | 70, 80; sparse index 10, 30, 50, 70 | 90, 110, 130, 150.]

Page 23: CSCE 520  Test 2 Info Indexing

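Deletion from sparse index

– delete record 40

[Figure: file blocks 10, 20 | 30, 40 | 50, 60 | 70, 80; sparse index 10, 30, 50, 70. Record 40 is removed from its block; the index is unchanged.]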

Page 24: CSCE 520  Test 2 Info Indexing

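Deletion from sparse index

– delete record 30

[Figure: file blocks 10, 20 | 30, 40 | 50, 60 | 70, 80; sparse index 10, 30, 50, 70. Record 30 is removed, 40 moves to the front of its block, and the index entry 30 becomes 40.]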

Page 25: CSCE 520  Test 2 Info Indexing

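Deletion from sparse index

– delete records 30 & 40

[Figure: file blocks 10, 20 | 30, 40 | 50, 60 | 70, 80; sparse index 10, 30, 50, 70. The block holding 30 and 40 becomes empty, its index entry is dropped, and entries 50 and 70 move up.]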
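The three deletion cases for a sparse index can be sketched in one routine. This is an illustration only, assuming the index stores the first key of each block; `delete` and `fresh` are hypothetical helpers, and the linear scan stands in for a proper index search.

```python
def delete(blocks, index, key):
    """Delete key from a sequential file whose sparse index holds each block's first key."""
    for b, blk in enumerate(blocks):
        if key in blk:
            blk.remove(key)
            if not blk:            # block is now empty: drop it and its index entry
                del blocks[b]
                del index[b]
            else:                  # keep the index entry equal to the block's first key
                index[b] = blk[0]
            return True
    return False

def fresh():
    """Example file and index: two records per block."""
    return [[10, 20], [30, 40], [50, 60], [70, 80]], [10, 30, 50, 70]

blocks, index = fresh()
delete(blocks, index, 40)   # index unchanged
print(index)

blocks, index = fresh()
delete(blocks, index, 30)   # index entry 30 becomes 40
print(index)

blocks, index = fresh()
delete(blocks, index, 30)
delete(blocks, index, 40)   # block is empty, so its index entry goes away
print(index)
```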

Page 26: CSCE 520  Test 2 Info Indexing

Deletion from dense index

[Figure: file blocks 10, 20 | 30, 40 | 50, 60 | 70, 80; dense index 10, 20, 30, 40 | 50, 60, 70, 80.]

Page 27: CSCE 520  Test 2 Info Indexing

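Deletion from dense index

– delete record 30

[Figure: file blocks 10, 20 | 30, 40 | 50, 60 | 70, 80; dense index 10, 20, 30, 40 | 50, 60, 70, 80. Record 30 is removed and 40 moves up in its block; in the index, entry 30 is removed and 40 moves up into its place.]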

Page 28: CSCE 520  Test 2 Info Indexing

Insertion, sparse index case

[Figure: file blocks 10, 20 | 30, (free) | 40, 50 | 60, (free); sparse index 10, 30, 40, 60.]

Page 29: CSCE 520  Test 2 Info Indexing

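Insertion, sparse index case

– insert record 34

[Figure: file blocks 10, 20 | 30, (free) | 40, 50 | 60, (free); sparse index 10, 30, 40, 60. Record 34 goes into the free slot after 30.]

• Our lucky day! We have free space where we need it!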

Page 30: CSCE 520  Test 2 Info Indexing

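Insertion, sparse index case

– insert record 15

[Figure: file blocks 10, 20 | 30, (free) | 40, 50 | 60, (free); sparse index 10, 30, 40, 60. Record 15 goes into the first block after 10, 20 moves to the second block ahead of 30, and the index entry 30 becomes 20.]

• Illustrated: immediate reorganization
• Variation:
  – insert new block (chained file)
  – update index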

Page 31: CSCE 520  Test 2 Info Indexing

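Insertion, sparse index case

– insert record 25

[Figure: file blocks 10, 20 | 30, (free) | 40, 50 | 60, (free); sparse index 10, 30, 40, 60. Record 25 is placed in an overflow block chained to the first block.]

Overflow blocks (reorganize later...)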
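Two of the insertion cases (free space in the target block, and an overflow block when the target is full) can be sketched as below. This is a simplified illustration: it does not implement the immediate-reorganization case shown for record 15, and `CAPACITY`, `insert`, and the `overflow` dictionary are assumed names.

```python
from bisect import bisect_right, insort

CAPACITY = 2  # records per block (assumed)

def insert(blocks, overflow, index, key):
    """Insert key using free space if the target block has any, else an overflow block."""
    b = max(bisect_right(index, key) - 1, 0)    # block whose key range covers key
    if len(blocks[b]) < CAPACITY:
        insort(blocks[b], key)                  # free slot where we need it
        index[b] = blocks[b][0]
    else:
        overflow.setdefault(b, []).append(key)  # chained to block b, reorganize later

blocks = [[10, 20], [30], [40, 50], [60]]
index = [10, 30, 40, 60]
overflow = {}

insert(blocks, overflow, index, 34)  # lands next to 30
insert(blocks, overflow, index, 25)  # first block is full, goes to overflow
print(blocks, overflow)
```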

Page 32: CSCE 520  Test 2 Info Indexing

Insertion, dense index case

• Similar

• Often more expensive . . .

Page 33: CSCE 520  Test 2 Info Indexing

Summary so far

• Conventional index
  – Basic ideas: sparse, dense, multi-level…
  – Duplicate keys
  – Deletion/Insertion
  – Secondary indexes

Page 34: CSCE 520  Test 2 Info Indexing

Conventional indexes

Advantage:
- Simple
- Index is sequential file, good for scans

Disadvantage:
- Inserts expensive, and/or
- Lose sequentiality & balance

Page 35: CSCE 520  Test 2 Info Indexing

• NEXT: Another type of index
  – Give up on sequentiality of index
  – Try to get “balance”

Page 36: CSCE 520  Test 2 Info Indexing

B+Tree Example, n=3

[Figure: Root: 100. Non-leaf nodes: (30) and (120, 150, 180). Leaves, left to right: 3, 5, 11 | 30, 35 | 100, 101, 110 | 120, 130 | 150, 156, 179 | 180, 200.]

Page 37: CSCE 520  Test 2 Info Indexing

Sample non-leaf

[Figure: non-leaf node with keys 57, 81, 95 and four pointers: to keys k < 57; to keys 57 ≤ k < 81; to keys 81 ≤ k < 95; to keys k ≥ 95.]
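Choosing which pointer to follow in a non-leaf node is a search for the number of keys that are less than or equal to the search key. A minimal sketch, with `child_for` as a hypothetical helper name:

```python
from bisect import bisect_right

def child_for(keys, k):
    """Return the index of the pointer to follow for search key k."""
    return bisect_right(keys, k)  # number of node keys <= k

keys = [57, 81, 95]
for k in (56, 57, 80, 81, 95, 200):
    print(k, "-> pointer", child_for(keys, k))
```

Pointer 0 covers k < 57, pointer 1 covers 57 ≤ k < 81, pointer 2 covers 81 ≤ k < 95, and pointer 3 covers k ≥ 95.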

Page 38: CSCE 520  Test 2 Info Indexing

Sample leaf node:

[Figure: leaf node with keys 57, 81, 95, reached by a pointer from a non-leaf node. It holds one pointer to the record for each key, plus a final pointer to the next leaf in sequence.]

Page 39: CSCE 520  Test 2 Info Indexing

Size of nodes (fixed):

• n+1 pointers
• n keys

Page 40: CSCE 520  Test 2 Info Indexing

Don’t want nodes to be too empty

• Use at least
  – Non-leaf: ⌈(n+1)/2⌉ pointers
  – Leaf: ⌊(n+1)/2⌋ pointers to data

Page 41: CSCE 520  Test 2 Info Indexing

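[Figure, n=3: full node vs. minimum node.
Non-leaf: the full node has keys 120, 150, 180; the minimum node has key 30.
Leaf: the full node has keys 3, 5, 11; the minimum node has keys 30, 35.
Annotation on a pointer: counts even if null.]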

Page 42: CSCE 520  Test 2 Info Indexing

B+tree rules (tree of order n)

(1) All leaves at same lowest level (balanced tree)

(2) Pointers in leaves point to records, except for the “sequence pointer”

Page 43: CSCE 520  Test 2 Info Indexing

(3) Number of pointers/keys for B+tree

                      Max ptrs   Max keys   Min ptrs→data   Min keys
Non-leaf (non-root)   n+1        n          ⌈(n+1)/2⌉       ⌈(n+1)/2⌉ - 1
Leaf (non-root)       n+1        n          ⌊(n+1)/2⌋       ⌊(n+1)/2⌋
Root                  n+1        n          1               1
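The occupancy limits can be computed for any order n. The sketch assumes the usual rounding for this presentation of B+trees: the non-leaf minimum is rounded up and the leaf minimum is rounded down. `occupancy` is a hypothetical helper name.

```python
def occupancy(n):
    """Return max/min pointer and key counts for a B+tree of order n."""
    half_up = (n + 2) // 2    # ceil((n+1)/2), assumed rounding for non-leaf nodes
    half_down = (n + 1) // 2  # floor((n+1)/2), assumed rounding for leaves
    return {
        "non-leaf": {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": half_up, "min_keys": half_up - 1},
        "leaf": {"max_ptrs": n + 1, "max_keys": n,
                 "min_ptrs": half_down, "min_keys": half_down},
        "root": {"max_ptrs": n + 1, "max_keys": n,
                 "min_ptrs": 1, "min_keys": 1},
    }

for kind, limits in occupancy(3).items():
    print(kind, limits)
```

For n=3 this gives a minimum non-leaf node of one key with two pointers and a minimum leaf of two keys, matching nodes such as (30) and (30, 35).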

Page 44: CSCE 520  Test 2 Info Indexing

Insert into B+tree (read only)

(a) simple case
  – space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root

Page 45: CSCE 520  Test 2 Info Indexing

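(a) Insert key = 32, n=3

[Figure: parent keys 30, 100; leaves 3, 5, 11 | 30, 31. Key 32 goes into the free slot of the second leaf, giving 30, 31, 32.]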

Page 46: CSCE 520  Test 2 Info Indexing

(b) Insert key = 7, n=3

[Figure: parent keys 30, 100; the leaf 3, 5, 11 is full, so it splits into 3, 5 and 7, 11, and key 7 is added to the parent.]
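Leaf insertion, with and without overflow, can be sketched on the leaf's key list alone. This is a simplified illustration that ignores record pointers and the parent update; `insert_into_leaf` is a hypothetical name, and the split point (left leaf keeps the larger half) is an assumption.

```python
def insert_into_leaf(leaf, key, n):
    """Insert key into a leaf of order n.

    Returns (left, right, separator). If the leaf does not overflow,
    right and separator are None.
    """
    keys = sorted(leaf + [key])
    if len(keys) <= n:
        return keys, None, None    # simple case: space available in leaf
    mid = (len(keys) + 1) // 2     # left leaf keeps ceil((n+1)/2) keys
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]   # first key of the new leaf goes to the parent

print(insert_into_leaf([30, 31], 32, n=3))   # no split
print(insert_into_leaf([3, 5, 11], 7, n=3))  # split
```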

Page 47: CSCE 520  Test 2 Info Indexing

Deletion from B+tree

(a) Simple case - no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf

Page 48: CSCE 520  Test 2 Info Indexing

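(b) Coalesce with sibling, n=4

– Delete 50

[Figure: parent keys 10, 40, 100; adjacent leaves 10, 20, 30 | 40, 50. After 50 is deleted the second leaf is too empty, so 40 moves into the first leaf (10, 20, 30, 40) and key 40 is removed from the parent.]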

Page 49: CSCE 520  Test 2 Info Indexing

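(c) Redistribute keys, n=4

– Delete 50

[Figure: parent keys 10, 40, 100; adjacent leaves 10, 20, 30, 35 | 40, 50. After 50 is deleted, key 35 moves from the first leaf to the second (10, 20, 30 | 35, 40) and the parent key 40 becomes 35.]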
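The choice between coalescing and redistributing can be sketched on two sibling leaves. The policy here is an assumption read off the two examples: coalesce when all remaining keys fit in one leaf, otherwise borrow the largest key from the left sibling. `delete_from_right_leaf` is a hypothetical name and parent handling is reduced to returning the new separator key.

```python
def delete_from_right_leaf(left, right, key, n):
    """Delete key from `right`, fixing underflow using its left sibling.

    Returns (left, right, separator); right and separator are None after a coalesce.
    """
    min_keys = (n + 1) // 2               # minimum keys in a leaf (rounded down)
    right = [k for k in right if k != key]
    if len(right) >= min_keys:
        return left, right, right[0]      # simple case: no underflow
    if len(left) + len(right) <= n:
        return left + right, None, None   # coalesce: everything fits in one leaf
    right = [left[-1]] + right            # redistribute: borrow the largest left key
    return left[:-1], right, right[0]

print(delete_from_right_leaf([10, 20, 30], [40, 50], 50, n=4))      # coalesce
print(delete_from_right_leaf([10, 20, 30, 35], [40, 50], 50, n=4))  # redistribute
```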

Page 50: CSCE 520  Test 2 Info Indexing

B+tree deletions in practice

– Often, coalescing is not implemented
– Too hard and not worth it!

