CSCE 520 Test 2 Info: Indexing
Modified from slides of Hector Garcia-Molina and Jeff Ullman
Physical Storage Media
• Speed of data access
• Cost per unit of data
• Reliability
– Data loss (power failure or system crash)
– Physical failure (storage device)
• Storage types
– Volatile storage
– Non-volatile storage
Memory Hierarchy
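(Figure: memory hierarchy, listed from slowest to fastest level)
• Tertiary storage
• Disk (accessed through the file system or as virtual memory)
• Main memory
• Cache
(Figure labels: "DBMS" and "Programs, main-memory DBMS")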
Disk Access Characteristics
• Move data to main memory:
– Position head on cylinder
– Find and access sector
• Steps of reading a block:
– Processor and disk controller process the request
– Seek time: position the head
– Rotational latency: rotate the sector under the head
– Transfer time: sector/block read by the head
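The read steps above add up to a per-block access time. The following is a minimal sketch of that sum, not a model of any particular drive: the function name and all drive parameters (seek time, rotation speed, transfer rate, block size) are illustrative assumptions, and controller processing time is ignored.

```python
def block_read_time_ms(avg_seek_ms, rpm, transfer_mb_per_s, block_kb):
    """Estimate one block read: seek time + rotational latency + transfer time."""
    # One full rotation in milliseconds; on average the sector is half a turn away.
    rotation_ms = 60_000 / rpm
    rotational_latency_ms = rotation_ms / 2
    # Time to move one block past the head (1 MB taken as 1024 KB here).
    transfer_ms = block_kb / (transfer_mb_per_s * 1024) * 1000
    return avg_seek_ms + rotational_latency_ms + transfer_ms


# Hypothetical drive: 9 ms average seek, 7200 RPM, 100 MB/s, 4 KB blocks.
estimate = block_read_time_ms(avg_seek_ms=9.0, rpm=7200,
                              transfer_mb_per_s=100, block_kb=4)
print(f"{estimate:.2f} ms per random block read")
```

With numbers of this kind, seek time and rotational latency dominate the transfer time, which is why the slides that follow focus on reducing the number of block accesses.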
Disk Access Characteristics
• Steps of writing a block:
– Read the block into main memory
– Change the main-memory copy of the block
– Write the new content back to disk
– Verify correctness of the write
How to find records efficiently?
• Primary key – sequential organization
• Search key?
• High I/O cost
INDEXING
Cost of Indexing
• Where is the time spent when answering a query?
– Fast: processing in memory
– Slow: fetching from secondary storage
• Cost of indexing:
– Index on several attributes: fast retrieval but slow writes (the index structure must be maintained)
Topics
• Conventional indexes
• B-trees
• Hashing schemes (read only)
Sequential File
(Figure: data file sorted on the search key, two records per block: 10, 20 | 30, 40 | 50, 60 | 70, 80 | 90, 100)
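Sequential File
Dense Index
(Figure: data blocks 10, 20 | 30, 40 | 50, 60 | 70, 80 | 90, 100; the dense index holds one entry per record, in index blocks 10, 20, 30, 40 | 50, 60, 70, 80 | 90, 100, 110, 120)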
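Sequential File
Sparse Index
(Figure: data blocks 10, 20 | 30, 40 | 50, 60 | 70, 80 | 90, 100; the sparse index holds one entry per data block, in index blocks 10, 30, 50, 70 | 90, 110, 130, 150 | 170, 190, 210, 230)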
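Sequential File
Sparse 2nd level
(Figure: data blocks 10, 20 | 30, 40 | 50, 60 | 70, 80 | 90, 100; first-level sparse index blocks 10, 30, 50, 70 | 90, 110, 130, 150 | 170, 190, 210, 230; second-level sparse index with one entry per first-level index block: 10, 90, 170, 250 | 330, 410, 490, 570)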
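A lookup through a two-level index can be sketched as follows. This is an illustrative sketch only: the in-memory lists stand in for disk blocks, the index is trimmed to the five data blocks visible in the figure, and the names `level1_blocks`, `level2`, and `lookup` are assumptions made for the example.

```python
from bisect import bisect_right

# Data file: sorted blocks of two records each (keys from the figure).
data_blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]
# First-level sparse index: one (key, data block number) entry per data block,
# packed four entries per index block.
level1_blocks = [
    [(10, 0), (30, 1), (50, 2), (70, 3)],
    [(90, 4)],
]
# Second-level sparse index: one (key, index block number) entry per level-1 block.
level2 = [(10, 0), (90, 1)]


def last_entry_at_or_below(entries, key):
    """Return the pointer of the last entry whose key is <= key, or None."""
    keys = [k for k, _ in entries]
    pos = bisect_right(keys, key) - 1
    return entries[pos][1] if pos >= 0 else None


def lookup(key):
    """Follow level 2, then level 1, then read one data block."""
    l1 = last_entry_at_or_below(level2, key)
    if l1 is None:
        return False
    block_no = last_entry_at_or_below(level1_blocks[l1], key)
    return block_no is not None and key in data_blocks[block_no]


print(lookup(60), lookup(65), lookup(100))
```

Each level narrows the search to a single block of the level below, so a lookup touches one block per index level plus one data block.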
Sparse vs. Dense Tradeoff
• Sparse: less index space per record, so more of the index can be kept in memory
• Dense: can tell whether a record exists without accessing the file
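The tradeoff can be made concrete with a small sketch. It assumes unique keys and uses Python lists in place of disk blocks; the function names and the "data blocks read" counter are illustrative, not part of the slides.

```python
from bisect import bisect_left, bisect_right

blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Dense index: one entry per record.
dense_keys = [k for block in blocks for k in block]
# Sparse index: one entry per block (its first key).
sparse_keys = [block[0] for block in blocks]


def exists_dense(key):
    """Answer an existence query from the index alone (no data block read)."""
    pos = bisect_left(dense_keys, key)
    return pos < len(dense_keys) and dense_keys[pos] == key


def exists_sparse(key):
    """Find the candidate block via the index, then read it.

    Returns (found, data_blocks_read).
    """
    pos = bisect_right(sparse_keys, key) - 1
    if pos < 0:
        return False, 0
    return key in blocks[pos], 1


print(len(dense_keys), len(sparse_keys))   # index entries needed by each kind
print(exists_dense(45), exists_sparse(45))
```

The sparse index is half the size here (one entry per two-record block), but it needs a data block read to rule out key 45, while the dense index answers from its own entries.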
Terms
• Index sequential file
• Search key (not necessarily the primary key)
• Primary index (on sequencing field)
• Secondary index
• Dense index (all search key values in)
• Sparse index
• Multi-level index
Next:
• Duplicate keys
• Deletion/Insertion
• Secondary indexes
Duplicate keys
(Figure: sequential file with duplicate keys, two records per block: 10, 10 | 10, 20 | 20, 30 | 30, 30 | 40, 45)
Duplicate keys
Dense index, one way to implement?
(Figure: data blocks 10, 10 | 10, 20 | 20, 30 | 30, 30 | 40, 45; the index holds one entry per record, duplicates included, in index blocks 10, 10, 10, 20 | 20, 30, 30, 30)
Duplicate keys
Dense index, better way?
(Figure: data blocks 10, 10 | 10, 20 | 20, 30 | 30, 30 | 40, 45; the index holds one entry per distinct key: 10, 20, 30, 40)
Duplicate keys
Sparse index, one way?
(Figure: data blocks 10, 10 | 10, 20 | 20, 30 | 30, 30 | 40, 45; the index holds the first key of each block: 10, 10, 20, 30)
• Careful if looking for 20 or 30!
Duplicate keys
Sparse index, another way?
– place first new key from block
(Figure: data blocks 10, 10 | 10, 20 | 20, 30 | 30, 30 | 40, 45; index entries 10, 20, 30, 30, with the annotation "should this be 40?")
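The two sparse-index variants for duplicate keys can be compared with a short sketch. It is an illustration under assumptions: blocks are Python lists, the function names are made up for the example, and a block that contributes no new key simply reuses its first key, which is exactly the entry the slide questions.

```python
from bisect import bisect_left

blocks = [[10, 10], [10, 20], [20, 30], [30, 30], [40, 45]]

# Variant 1: index the first key of every block.
first_keys = [block[0] for block in blocks]


def first_new_key_index(blocks):
    """Variant 2: for each block, index the first key not seen in earlier blocks."""
    index, seen_max = [], None
    for block in blocks:
        new = [k for k in block if seen_max is None or k > seen_max]
        # A block with no new key falls back to its first key (an assumption).
        index.append(new[0] if new else block[0])
        seen_max = block[-1]
    return index


def blocks_to_scan(blocks, first_keys, key):
    """Variant 1 lookup: the key may start in the block before its first entry."""
    pos = bisect_left(first_keys, key)   # first index entry >= key
    start = max(pos - 1, 0)              # back up one block
    hits = []
    for i in range(start, len(blocks)):
        if blocks[i][0] > key:
            break
        if key in blocks[i]:
            hits.append(i)
    return hits


print(first_keys)
print(first_new_key_index(blocks))
print(blocks_to_scan(blocks, first_keys, 20))
```

With the first variant, the first index entry for 20 points at the third block, yet a record with key 20 already sits in the second block, which is why the lookup has to back up one block.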
Summary: duplicate values, primary index
• Index may point to the first instance of each value only
(Figure: "File" and "Index" columns, with file values a, a, a, b)
Deletion from sparse index
(Figure: data blocks 10, 20 | 30, 40 | 50, 60 | 70, 80 with sparse index blocks 10, 30, 50, 70 | 90, 110, 130, 150)
Deletion from sparse index
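– delete record 40
(Figure: data blocks 10, 20 | 30, 40 | 50, 60 | 70, 80 with sparse index blocks 10, 30, 50, 70 | 90, 110, 130, 150; record 40 is removed from its block and the index is unchanged)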
Deletion from sparse index
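– delete record 30
(Figure: data blocks 10, 20 | 30, 40 | 50, 60 | 70, 80 with sparse index blocks 10, 30, 50, 70 | 90, 110, 130, 150; record 30 is removed, 40 becomes the first record of its block, and the index entry 30 is replaced by 40)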
Deletion from sparse index
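– delete records 30 & 40
(Figure: data blocks 10, 20 | 30, 40 | 50, 60 | 70, 80 with sparse index blocks 10, 30, 50, 70 | 90, 110, 130, 150; the block 30, 40 becomes empty, its index entry is removed, and the entries 50 and 70 move up to close the gap)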
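The three deletion cases shown for the sparse index can be sketched in a few lines. This sketch assumes unique keys and a flat, single-level index kept as a Python list parallel to the list of blocks; `delete_record` is a name chosen for the example.

```python
from bisect import bisect_right


def delete_record(blocks, index, key):
    """Delete key from a sequential file with a sparse index.

    blocks: list of sorted key lists, one per data block
    index:  first key of each block, kept in step with blocks
    Returns True if the key was found and removed.
    """
    pos = bisect_right(index, key) - 1
    if pos < 0 or key not in blocks[pos]:
        return False
    block = blocks[pos]
    block.remove(key)
    if not block:
        # Block became empty: drop it and its index entry.
        del blocks[pos]
        del index[pos]
    elif index[pos] != block[0]:
        # First record of the block changed: update the index entry.
        index[pos] = block[0]
    return True


blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]
index = [10, 30, 50, 70]
delete_record(blocks, index, 30)
print(index)
```

Deleting a record that is not the first in its block (such as 40) leaves the index untouched; only a change of a block's first record, or an emptied block, requires an index update.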
Deletion from dense index
(Figure: data blocks 10, 20 | 30, 40 | 50, 60 | 70, 80 with dense index blocks 10, 20, 30, 40 | 50, 60, 70, 80)
Deletion from dense index
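– delete record 30
(Figure: data blocks 10, 20 | 30, 40 | 50, 60 | 70, 80 with dense index blocks 10, 20, 30, 40 | 50, 60, 70, 80; record 30 is removed and 40 moves to the front of its block; the index entry for 30 is removed and the entry for 40 moves up into its place)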
Insertion, sparse index case
(Figure: data blocks 10, 20 | 30, (free) | 40, 50 | 60, (free) with sparse index 10, 30, 40, 60)
Insertion, sparse index case
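– insert record 34
(Figure: data blocks 10, 20 | 30, (free) | 40, 50 | 60, (free) with sparse index 10, 30, 40, 60; record 34 goes into the free slot after 30)
• Our lucky day! We have free space where we need it!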
Insertion, sparse index case
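– insert record 15
(Figure: data blocks 10, 20 | 30, (free) | 40, 50 | 60, (free) with sparse index 10, 30, 40, 60; record 15 is placed after 10, record 20 moves into the next block ahead of 30, and the index entry 30 becomes 20)
• Illustrated: immediate reorganization
• Variation:
– insert new block (chained file)
– update index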
Insertion, sparse index case
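– insert record 25
(Figure: data blocks 10, 20 | 30, (free) | 40, 50 | 60, (free) with sparse index 10, 30, 40, 60; record 25 is placed in an overflow block)
• Overflow blocks (reorganize later...)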
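Two of the insertion strategies above (use free space if the target block has it, otherwise chain an overflow block) can be sketched together. This is a simplified illustration: the `overflow` dictionary, the function name, and the fixed capacity of two records per block are assumptions, and the immediate-reorganization variant from the "insert record 15" slide is deliberately not implemented.

```python
from bisect import bisect_right, insort

BLOCK_CAPACITY = 2  # records per block, as in the figures


def insert_record(blocks, overflow, index, key):
    """Insert key using free space in the target block, else an overflow block.

    blocks:   list of sorted key lists (primary blocks)
    overflow: dict mapping block number -> keys in that block's overflow chain
    index:    sparse index holding the first key of each primary block
    """
    pos = max(bisect_right(index, key) - 1, 0)
    block = blocks[pos]
    if len(block) < BLOCK_CAPACITY:
        insort(block, key)
        index[pos] = block[0]  # changes only if key became the first record
        return "in place"
    # Target block is full: park the record in an overflow block for now.
    overflow.setdefault(pos, []).append(key)
    return "overflow"


blocks = [[10, 20], [30], [40, 50], [60]]
overflow = {}
index = [10, 30, 40, 60]
print(insert_record(blocks, overflow, index, 34))
print(insert_record(blocks, overflow, index, 25))
```

Overflow blocks keep the insert cheap and leave the index untouched, at the price of longer lookups until the file is reorganized.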
Insertion, dense index case
• Similar
• Often more expensive . . .
Summary so far
• Conventional index
– Basic ideas: sparse, dense, multi-level…
– Duplicate keys
– Deletion/insertion
– Secondary indexes
Conventional indexes
Advantages:
– Simple
– Index is a sequential file, good for scans
Disadvantages:
– Inserts are expensive, and/or
– Lose sequentiality and balance
• NEXT: Another type of index
– Give up on sequentiality of index
– Try to get “balance”
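B+Tree Example n=3
(Figure: root with key 100; below it, non-leaf nodes [30] and [120, 150, 180]; leaves in sequence: [3, 5, 11], [30, 35], [100, 101, 110], [120, 130], [150, 156, 179], [180, 200])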
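Sample non-leaf
(Figure: non-leaf node with keys 57, 81, 95 and four pointers)
• 1st pointer: to keys k < 57
• 2nd pointer: to keys 57 ≤ k < 81
• 3rd pointer: to keys 81 ≤ k < 95
• 4th pointer: to keys k ≥ 95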
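Choosing which pointer to follow in a non-leaf node is a binary search over its keys. A minimal sketch, assuming the half-open key ranges k < 57, 57 ≤ k < 81, 81 ≤ k < 95, k ≥ 95 for this sample node; `child_index` is a name chosen for the example.

```python
from bisect import bisect_right


def child_index(node_keys, k):
    """Return the position (0-based) of the pointer to follow for search key k.

    Pointer 0 covers keys below node_keys[0]; pointer i covers keys from
    node_keys[i-1] up to, but not including, node_keys[i].
    """
    return bisect_right(node_keys, k)


keys = [57, 81, 95]
for k in (10, 57, 80, 81, 95, 200):
    print(k, "-> pointer", child_index(keys, k))
```

`bisect_right` sends a search key equal to a node key to the pointer on its right, which matches the "≤" on the lower end of each range.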
Sample leaf node:
(Figure: leaf with keys 57, 81, 95, reached from a non-leaf node)
• One pointer per key: to the record with key 57, to the record with key 81, to the record with key 95
• Last pointer: to the next leaf in sequence
Size of nodes (fixed): n+1 pointers, n keys
Don’t want nodes to be too empty
• Use at least
– Non-leaf: ⌈(n+1)/2⌉ pointers
– Leaf: ⌊(n+1)/2⌋ pointers to data
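Full node vs. min. node (n=3)
• Non-leaf: full node with keys 120, 150, 180; min. node with key 30
• Leaf: full node with keys 3, 5, 11; min. node with keys 30, 35
(Figure annotation on a pointer: "counts even if null")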
B+tree rules (tree of order n)
(1) All leaves at the same lowest level (balanced tree)
(2) Pointers in leaves point to records, except for the “sequence pointer”
(3) Number of pointers/keys for a B+tree of order n

Node type            | Max ptrs | Max keys | Min ptrs            | Min keys
Non-leaf (non-root)  | n+1      | n        | ⌈(n+1)/2⌉           | ⌈(n+1)/2⌉ - 1
Leaf (non-root)      | n+1      | n        | ⌊(n+1)/2⌋ (to data) | ⌊(n+1)/2⌋
Root                 | n+1      | n        | 1                   | 1
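The occupancy limits can be computed for any order n with a small helper. This sketch assumes the non-leaf minimum is (n+1)/2 rounded up and the leaf minimum is (n+1)/2 rounded down, a reading that agrees with the n=3 and n=4 examples in these slides; the function name and the `kind` strings are illustrative.

```python
def occupancy_limits(n, kind):
    """Return (max_ptrs, max_keys, min_ptrs, min_keys) for a node of a B+tree of order n.

    kind is one of "non-leaf", "leaf", "root"; for a leaf, min_ptrs counts
    pointers to data records only.
    """
    if kind == "non-leaf":
        min_ptrs = (n + 2) // 2          # ceil((n+1)/2) in integer arithmetic
        return n + 1, n, min_ptrs, min_ptrs - 1
    if kind == "leaf":
        min_data_ptrs = (n + 1) // 2     # floor((n+1)/2)
        return n + 1, n, min_data_ptrs, min_data_ptrs
    if kind == "root":
        return n + 1, n, 1, 1
    raise ValueError(f"unknown node kind: {kind}")


for kind in ("non-leaf", "leaf", "root"):
    print(kind, occupancy_limits(3, kind))
```

For n=3 this gives a minimum of one key in a non-leaf node and two keys in a leaf.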
Insert into B+tree (read only)
(a) simple case
– space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root
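(a) Insert key = 32, n=3
(Figure: non-leaf node with keys 30, 100 above leaves [3, 5, 11] and [30, 31]; key 32 goes into the free slot of leaf [30, 31], giving [30, 31, 32])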
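(b) Insert key = 7, n=3
(Figure: non-leaf node with keys 30, 100 above leaves [3, 5, 11] and [30, 31]; leaf [3, 5, 11] is full, so it splits into [3, 5] and [7, 11], and key 7 is added to the parent)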
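The simple case and the leaf-overflow case can be sketched for a single leaf. This is an illustration only: it handles the leaf level, leaves the parent update to the caller, and assumes that on overflow the left leaf keeps the larger half, which reproduces the split shown for key 7; the function name is made up for the example.

```python
from bisect import insort


def insert_into_leaf(leaf_keys, key, n):
    """Insert key into a sorted leaf of a B+tree of order n.

    Returns (left, right, separator). When the key fits, right and separator
    are None. On overflow the leaf is split and separator is the key that
    must be added to the parent.
    """
    keys = list(leaf_keys)
    insort(keys, key)
    if len(keys) <= n:
        return keys, None, None
    # Overflow: n+1 keys. The left leaf keeps ceil((n+1)/2), the right gets the rest.
    split = (len(keys) + 1) // 2
    left, right = keys[:split], keys[split:]
    # The first key of the new right leaf is copied up into the parent.
    return left, right, right[0]


print(insert_into_leaf([30, 31], 32, n=3))
print(insert_into_leaf([3, 5, 11], 7, n=3))
```

If the parent is itself full when the separator arrives, the same idea repeats one level up (non-leaf overflow), possibly ending with a new root.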
Deletion from B+tree
(a) Simple case - no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf
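(b) Coalesce with sibling, n=4
– Delete 50
(Figure: non-leaf node with keys 10, 40, 100 above leaves [10, 20, 30] and [40, 50]; after 50 is deleted, the remaining key 40 moves into the left sibling, giving [10, 20, 30, 40], and key 40 is removed from the parent)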
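(c) Redistribute keys, n=4
– Delete 50
(Figure: non-leaf node with keys 10, 40, 100 above leaves [10, 20, 30, 35] and [40, 50]; after 50 is deleted, key 35 moves from the left sibling into the right leaf, giving [10, 20, 30] and [35, 40], and the parent key 40 becomes 35)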
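The choice between coalescing and redistributing can be sketched for a leaf and its left sibling. The sketch rests on assumptions: unique keys, a leaf minimum of ⌊(n+1)/2⌋ keys, repair using the left sibling only, and a made-up function name and return shape. Updating the parent with the returned separator is left to the caller.

```python
def delete_from_leaf(left, leaf, key, n):
    """Delete key from `leaf`, repairing underflow with its left sibling.

    Returns (action, new_left, new_leaf, separator), where separator is the key
    the parent should hold between the two leaves (None after coalescing,
    because the parent entry is dropped).
    """
    min_keys = (n + 1) // 2
    left = list(left)
    leaf = [k for k in leaf if k != key]
    if len(leaf) >= min_keys:
        # (a) Simple case: the leaf is still full enough.
        return "none", left, leaf, leaf[0]
    if len(left) + len(leaf) <= n:
        # (b) Coalesce: everything fits in one node.
        return "coalesce", left + leaf, None, None
    # (c) Redistribute: borrow the largest key from the left sibling.
    leaf.insert(0, left.pop())
    return "redistribute", left, leaf, leaf[0]


print(delete_from_leaf([10, 20, 30], [40, 50], 50, n=4))
print(delete_from_leaf([10, 20, 30, 35], [40, 50], 50, n=4))
```

The two calls reproduce the n=4 examples: coalescing when the sibling has room for the leftover key, redistributing when it does not.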
B+tree deletions in practice
– Often, coalescing is not implemented
– Too hard and not worth it!