CS 4432: Database Systems II, Lecture #7, Professor Elke A. Rundensteiner
Post on 21-Dec-2015
CS 4432 lecture #7 2
Indexing: helps retrieve data more quickly for certain queries
value = 1,000,000
SELECT * FROM Emp WHERE salary = 1000000;
Chapter 4 (chapter 13 in ‘complete book’)
[Figure: index lookup from value to record]

Sequential File (two records per block): 10 20 | 30 40 | 50 60 | 70 80 | 90 100
Dense Index (four keys per block): 10 20 30 40 | 50 60 70 80 | 90 100 110 120
Every record is in the index.
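
A minimal sketch of a dense index over the file in the figure, assuming records are reduced to their keys and a record pointer is a hypothetical (block number, slot) pair. The names `build_dense_index` and `dense_lookup` are illustrative, not from the lecture.

```python
from bisect import bisect_left

# File blocks from the slide: two records per block (records reduced to keys).
FILE_BLOCKS = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

def build_dense_index(blocks):
    """One (key, (block_no, slot)) entry per record, in key order."""
    return [(key, (b, s))
            for b, block in enumerate(blocks)
            for s, key in enumerate(block)]

def dense_lookup(index, key):
    """Return the record pointer for key, or None.

    Only the index is consulted; the data file is never read.
    """
    keys = [k for k, _ in index]
    i = bisect_left(keys, key)          # binary search over the sorted keys
    if i < len(keys) and keys[i] == key:
        return index[i][1]
    return None

dense_index = build_dense_index(FILE_BLOCKS)
```

Because every key appears in the index, `dense_lookup(dense_index, 55)` can report a miss without touching `FILE_BLOCKS`.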

Sequential File (two records per block): 10 20 | 30 40 | 50 60 | 70 80 | 90 100
Sparse Index (four keys per block): 10 30 50 70 | 90 110 130 150 | 170 190 210 230
Only the first record of each block is in the index.
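
A sparse lookup can be sketched as follows, under the same assumptions as the figure (sorted file, one index entry per block). The function names are hypothetical. The idea is to pick the last index entry whose key is not greater than the search key, then scan that one block.

```python
from bisect import bisect_right

FILE_BLOCKS = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

def build_sparse_index(blocks):
    """One (first_key, block_no) entry per data block."""
    return [(block[0], b) for b, block in enumerate(blocks)]

def sparse_lookup(index, blocks, key):
    """Find the last entry with first_key <= key, then scan that block."""
    first_keys = [k for k, _ in index]
    i = bisect_right(first_keys, key) - 1
    if i < 0:
        return None                      # key is below every key in the file
    block_no = index[i][1]
    block = blocks[block_no]             # this step reads the data block
    if key in block:
        return (block_no, block.index(key))
    return None

sparse_index = build_sparse_index(FILE_BLOCKS)
```

Unlike the dense case, a miss such as key 45 is only detected after the candidate block has been read.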

Sequential File (two records per block): 10 20 | 30 40 | 50 60 | 70 80 | 90 100
Sparse Index (1st level): 10 30 50 70 | 90 110 130 150 | 170 190 210 230
Sparse 2nd level: 10 90 170 250 | 330 410 490 570
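
A rough sketch of a two-level lookup, assuming only the three first-level index blocks visible in the figure. `pick` and `two_level_lookup` are hypothetical names. The second level is searched first and narrows the search to a single first-level block.

```python
from bisect import bisect_right

def pick(sorted_keys, key):
    """Position of the last entry whose key is <= key, or -1 if none."""
    return bisect_right(sorted_keys, key) - 1

# First-level sparse index blocks (four keys per block), as on the slide.
LEVEL1 = [[10, 30, 50, 70], [90, 110, 130, 150], [170, 190, 210, 230]]
# Second level: the first key of each first-level block.
LEVEL2 = [block[0] for block in LEVEL1]

def two_level_lookup(key):
    """Return (level-1 block number, entry position) covering key, or None."""
    b = pick(LEVEL2, key)                # one probe of the small top level
    if b < 0:
        return None
    e = pick(LEVEL1[b], key)             # then one first-level block
    return (b, e)
```

For key 100 this selects the second first-level block and its entry 90, which in turn points at the data block to scan.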

• NOTE:
A FILE or INDEX may be laid out on disk either contiguously or as a chain of blocks

Question:
• Can we (do we want to) build a dense, 2nd level index for a dense index?
Sequential File (two records per block): 10 20 | 30 40 | 50 60 | 70 80 | 90 100
[Figure: two index levels labeled "1st level?" and "2nd level?", with keys 10 30 50 70 | 90 110 130 150 | 170 190 210 230 and 10 90 170 250 | 330 410 490 570]

Notes on pointers:
(1) A block pointer (BP, used in a sparse index) can be smaller than a record pointer (RP)
(2) If the file is contiguous, then we can omit pointers (i.e., compute them)

[Figure: index keys K1, K2, K3, K4 stored without pointers, next to contiguous blocks R1, R2, R3, R4]
Say: 1024 B per block
• If we want the K3 block: get it at offset (3-1) × 1024 = 2048 bytes
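
The offset arithmetic above can be written as a one-line function. This is a small sketch assuming 1-based block numbers and the 1024-byte block size from the slide; `block_offset` is a hypothetical name.

```python
BLOCK_SIZE = 1024  # bytes per block, as on the slide

def block_offset(k, block_size=BLOCK_SIZE):
    """Byte offset of the k-th block (1-based) in a contiguous file."""
    if k < 1:
        raise ValueError("block numbers start at 1")
    return (k - 1) * block_size
```

With this, the index needs to store only the keys: the position of a key in the index determines `k`, and `block_offset(3)` gives 2048 as in the example.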

Sparse vs. Dense Tradeoff
• Sparse: less index space per record, so more of the index can be kept in memory
(Later: sparse is better for insertions)
• Dense: can tell whether a record exists without accessing the file
(Later: dense is needed for secondary indexes)

Terms
• Index sequential file
• Search key (not necessarily the primary key)
• Primary index (on sequencing field)
• Secondary index
• Dense index (all search key values in the index)
• Sparse index
• Multi-level index

Duplicate keys
File (two records per block): 10 10 | 10 20 | 20 30 | 30 30 | 40 45
Dense index, one way to implement? 10 10 10 20 | 20 30 30 30

Duplicate keys
File (two records per block): 10 10 | 10 20 | 20 30 | 30 30 | 40 45
Sparse index, one way? 10 10 20 30
Careful if looking for 20 or 30!
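
The warning about 20 and 30 can be made concrete with a small sketch. It assumes the sparse index holds the first key of every block (including the last block, 40, which the figure does not show), and `find_all` is a hypothetical name. A key such as 20 appears as the first key of one block but also at the end of the previous block, so the scan has to start one block earlier.

```python
from bisect import bisect_left

BLOCKS = [[10, 10], [10, 20], [20, 30], [30, 30], [40, 45]]
SPARSE = [block[0] for block in BLOCKS]   # first key of each block

def find_all(key):
    """Return every (block_no, slot) holding key."""
    # Start one block before the first index entry >= key: that earlier
    # block may end with copies of key.
    start = max(bisect_left(SPARSE, key) - 1, 0)
    hits = []
    for b in range(start, len(BLOCKS)):
        if SPARSE[b] > key:
            break                         # later blocks cannot contain key
        hits += [(b, s) for s, k in enumerate(BLOCKS[b]) if k == key]
    return hits
```

A lookup that jumped straight to the index entry 20 would miss the copy of 20 stored in the second block.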

Duplicate keys
File (two records per block): 10 10 | 10 20 | 20 30 | 30 30 | 40 45
Sparse index, another way? 10 20 30 30
– place first new key from block
Should this be 40?

Summary: Duplicate values, primary index
• Index may point to first instance of each value only
[Figure: File and Index columns; the file holds duplicate values a, a, a, b]

Deletion from sparse index
File (two records per block): 10 20 | 30 40 | 50 60 | 70 80
Sparse index: 10 30 50 70 | 90 110 130 150
– delete record 40
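
Deletion from sparse index
File (two records per block): 10 20 | 30 40 | 50 60 | 70 80
Sparse index: 10 30 50 70 | 90 110 130 150
– delete record 30
Result: 40 becomes the first record of its block, and the index entry 30 becomes 40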
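
Deletion from sparse index
File (two records per block): 10 20 | 30 40 | 50 60 | 70 80
Sparse index: 10 30 50 70 | 90 110 130 150
– delete records 30 & 40
Result: the block is empty, so its index entry is removed and the entries 50 and 70 move up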
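
The three deletion cases can be sketched in one function. This is a simplified illustration, assuming an in-memory list of blocks and a linear scan of the index for brevity; `sparse_delete` is a hypothetical name and index entries are `[first_key, block_no]` pairs.

```python
def sparse_delete(index, blocks, key):
    """Delete key from the file and keep the sparse index consistent."""
    for pos, (first_key, b) in enumerate(index):
        if key in blocks[b]:
            blocks[b].remove(key)
            if not blocks[b]:
                del index[pos]                # block is empty: drop its entry
            elif first_key == key:
                index[pos][0] = blocks[b][0]  # first record of block changed
            # otherwise the index needs no change (e.g. deleting 40)
            return True
    return False

# Slide example: delete record 30, so the second block now starts with 40.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]
index = [[block[0], b] for b, block in enumerate(blocks)]
sparse_delete(index, blocks, 30)
```

Deleting 40 afterwards empties the block, and the entry for that block disappears from the index, matching the last of the three slides.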

Deletion from dense index
File (two records per block): 10 20 | 30 40 | 50 60 | 70 80
Dense index: 10 20 30 40 | 50 60 70 80
– delete record 30
Result: 40 moves up both in the file block and in the index
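
For comparison, a dense-index deletion always touches the index, since every record has an entry. The sketch below is a simplification that keeps only keys in the index and finds the record by scanning blocks; `dense_delete` is a hypothetical name.

```python
def dense_delete(blocks, index, key):
    """blocks: list of key lists; index: sorted list with one key per record."""
    for block in blocks:
        if key in block:
            block.remove(key)   # remaining records slide up within the block
            index.remove(key)   # the dense index loses exactly one entry
            return True
    return False

blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]
index = [key for block in blocks for key in block]
```

After `dense_delete(blocks, index, 30)`, the second block holds only 40 and the index entries after 20 have each moved up one position.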

Insertion, sparse index case
File (two slots per block): 10 20 | 30 _ | 40 50 | 60 _
Sparse index: 10 30 40 60
– insert record 34
Result: 34 goes into the free slot after 30
• Our lucky day! We have free space where we need it!
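
Insertion, sparse index case
File (two slots per block): 10 20 | 30 _ | 40 50 | 60 _
Sparse index: 10 30 40 60
– insert record 15
Result: 15 goes into the first block, 20 moves to the second block ahead of 30, and the index entry 30 becomes 20
• Illustrated: Immediate reorganization
• Variation:
– insert new block (chained file)
– update index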

Insertion, sparse index case
File (two slots per block): 10 20 | 30 _ | 40 50 | 60 _
Sparse index: 10 30 40 60
– insert record 25
Result: 25 goes into an overflow block (reorganize later...)
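
Two of the insertion cases above (free space in the target block, and overflow blocks) can be sketched as follows. The sketch assumes two slots per block as in the figures and models overflow blocks as a hypothetical dictionary keyed by block number. It does not implement the immediate reorganization shown for record 15.

```python
from bisect import bisect_right, insort

CAPACITY = 2  # slots per block, as in the slide figures

def sparse_insert(index_keys, blocks, overflow, key):
    """Insert key into the block chosen by the sparse index.

    Uses free space in that block if there is any; otherwise the key
    goes to the block's overflow list and the index stays unchanged.
    """
    b = max(bisect_right(index_keys, key) - 1, 0)
    if len(blocks[b]) < CAPACITY:
        insort(blocks[b], key)
        index_keys[b] = blocks[b][0]    # first key may have changed
        return "in block"
    overflow.setdefault(b, []).append(key)
    return "overflow"

blocks = [[10, 20], [30], [40, 50], [60]]
index_keys = [10, 30, 40, 60]
overflow = {}
```

Inserting 34 lands in the half-empty second block, while inserting 25 finds the first block full and ends up in its overflow list.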

Secondary indexes
File (ordered on a different sequence field): 30 50 | 20 70 | 80 40 | 100 10 | 90 60
Can I make a Sparse Index?

Secondary indexes
File (ordered on a different sequence field): 30 50 | 20 70 | 80 40 | 100 10 | 90 60
• Sparse index: 30 20 80 100 | 90 ...
does not make sense!

Secondary indexes
File (ordered on a different sequence field): 30 50 | 20 70 | 80 40 | 100 10 | 90 60
• Dense index: 10 20 30 40 | 50 60 70 ...
Sparse high level: 10 50 90 ...

With secondary indexes:
• Lowest level is dense
• Other levels are sparse
Also: Pointers are record pointers (not block pointers; not computed)
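
A minimal sketch of this layout, using the unordered file from the secondary-index figures. Record pointers are modeled as hypothetical (block number, slot) pairs, and the fanout of four keys per index block is taken from the figure. Both function names are illustrative.

```python
FILE_BLOCKS = [[30, 50], [20, 70], [80, 40], [100, 10], [90, 60]]

def build_secondary_index(blocks):
    """Dense lowest level: sorted (key, record pointer) pairs, one per record."""
    return sorted((key, (b, s))
                  for b, block in enumerate(blocks)
                  for s, key in enumerate(block))

def build_sparse_level(dense, fanout=4):
    """Higher level: first key of every group of `fanout` dense entries."""
    return [dense[i][0] for i in range(0, len(dense), fanout)]

dense = build_secondary_index(FILE_BLOCKS)
high = build_sparse_level(dense)
```

The lowest level must be dense because neighboring keys in the index live in unrelated file blocks; the higher level can be sparse because the dense level itself is sorted.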

Duplicate values & secondary indexes
File (two records per block): 20 10 | 20 40 | 10 40 | 10 40 | 30 40
One option... index: 10 10 10 20 | 20 30 40 40 | 40 40 ...
Problem: excess overhead!
• disk space
• search time

Duplicate values & secondary indexes
File (two records per block): 20 10 | 20 40 | 10 40 | 10 40 | 30 40
Another option... one index entry per key (10, 20, 30, 40), each with pointers to all matching records
Problem: variable size records in index!
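
The second option can be sketched with one entry per distinct key holding a list of record pointers. This is an illustration only, with pointers again modeled as hypothetical (block number, slot) pairs and `build_grouped_index` as an assumed name.

```python
from collections import defaultdict

FILE_BLOCKS = [[20, 10], [20, 40], [10, 40], [10, 40], [30, 40]]

def build_grouped_index(blocks):
    """Map each distinct key to the list of all its record pointers."""
    index = defaultdict(list)
    for b, block in enumerate(blocks):
        for s, key in enumerate(block):
            index[key].append((b, s))
    return dict(sorted(index.items()))

grouped = build_grouped_index(FILE_BLOCKS)
```

The pointer lists here have lengths 3, 2, 1 and 4 for keys 10, 20, 30 and 40, which is exactly the variable-size problem noted on the slide.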

Duplicate values & secondary indexes
File (two records per block): 20 10 | 20 40 | 10 40 | 10 40 | 30 40
Index: 10 20 30 40 | 50 60 ...
Another idea: Chain records with same key?
Problems:
• Need to add fields to records
• Need to follow chain to know records
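
A rough sketch of the chaining idea, assuming the extra per-record field is modeled as a separate `next_ptr` dictionary rather than stored inside the record. All names are hypothetical. The index keeps a single fixed-size pointer per key, and the remaining records are reached by following the chain.

```python
FILE_BLOCKS = [[20, 10], [20, 40], [10, 40], [10, 40], [30, 40]]

def build_chains(blocks):
    """Return (index, next_ptr).

    index[key] points at the first record with that key; next_ptr[rid]
    is the next record with the same key, or None at the end of the chain.
    """
    index, next_ptr, last = {}, {}, {}
    for b, block in enumerate(blocks):
        for s, key in enumerate(block):
            rid = (b, s)
            next_ptr[rid] = None
            if key in last:
                next_ptr[last[key]] = rid   # extend the existing chain
            else:
                index[key] = rid            # first record with this key
            last[key] = rid
    return index, next_ptr

def follow(index, next_ptr, key):
    """Walk the chain for key; each step would be a record access on disk."""
    out, rid = [], index.get(key)
    while rid is not None:
        out.append(rid)
        rid = next_ptr[rid]
    return out

index, next_ptr = build_chains(FILE_BLOCKS)
```

Finding all four records with key 40 takes four chain steps, which illustrates the second problem: the set of matching records is not known until the chain has been followed.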