B+-trees and Hashing
Model of Computation
Data stored on disk(s)
Minimum transfer unit: page = b bytes or B records (or block)
If r is the size of a record, then B = b/r
N records -> N/B pages
I/O complexity: measured in number of pages
[Figure: memory hierarchy CPU - Memory - Disk; pages are transferred between memory and disk]
I/O complexity
An ideal index has space O(N/B), update overhead O(1) or O(log_B(N/B)), and search complexity O(a/B) or O(log_B(N/B) + a/B), where a is the number of records in the answer.
But sometimes CPU performance is also important: minimize cache misses -> don't waste CPU cycles.
Buffer Manager: Context
[Figure: DBMS layers, top to bottom: Query Optimization and Execution, Relational Operators, Files and Access Methods, Buffer Management, Disk Space Management, DB]
Buffer Management
Keep pages in a part of memory (the buffer) and read directly from there.
What happens if you need to bring a new page into the buffer and the buffer is full? You have to evict one page.
Replacement policies:
LRU: Least Recently Used (approximated by CLOCK)
MRU: Most Recently Used
Toss-immediate: remove a page if you know that you will not need it again
Pinning (needed in recovery, index-based processing, etc.)
Other DB-specific replacement policies: DBMIN, LRU-k, 2Q
Buffer Management in a DBMS
Data must be in RAM for the DBMS to operate on it!
The buffer manager hides the fact that not all data is in RAM.
[Figure: the buffer pool lives in main memory; page requests come from higher levels of the DBMS; disk pages are read into free frames; the choice of frame is dictated by the replacement policy.]
When a Page is Requested ...
Buffer pool information table contains: <frame#, pageid, pin_count, dirty>
If the requested page is not in the pool and there is no free frame:
Choose a frame for replacement. Only "un-pinned" pages are candidates!
If the frame is "dirty", write it to disk.
Read the requested page into the chosen frame.
Pin the page and return its address.
If requests can be predicted (e.g., sequential scans), pages can be pre-fetched, several pages at a time!
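A minimal sketch of this request path, assuming a hypothetical frame table with exactly the fields listed above; find_frame, choose_victim, read_page and write_page are assumed helpers, not part of any real API:

    #define PAGE_SIZE 4096               /* assumed page size */
    #define NFRAMES   100                /* assumed buffer pool size */

    typedef struct {
        int  page_id;                    /* disk page held in this frame, -1 if free */
        int  pin_count;                  /* number of active requestors */
        int  dirty;                      /* 1 if the in-memory copy was modified */
        char data[PAGE_SIZE];
    } Frame;

    Frame pool[NFRAMES];

    /* Assumed helpers (not shown): */
    int  find_frame(int page_id);                   /* frame holding page_id, or -1       */
    int  choose_victim(void);                       /* replacement policy; pin_count == 0 */
    void read_page(int page_id, char *buf);         /* disk -> buffer                     */
    void write_page(int page_id, const char *buf);  /* buffer -> disk                     */

    /* Return the requested page, pinned in the buffer pool. */
    char *request_page(int page_id)
    {
        int f = find_frame(page_id);
        if (f < 0) {                                 /* not in pool: need a frame      */
            f = choose_victim();                     /* only un-pinned frames qualify  */
            if (pool[f].dirty)
                write_page(pool[f].page_id, pool[f].data);  /* flush dirty victim      */
            read_page(page_id, pool[f].data);        /* read requested page into frame */
            pool[f].page_id = page_id;
            pool[f].dirty   = 0;
        }
        pool[f].pin_count++;                         /* pin the page                   */
        return pool[f].data;                         /* return its address             */
    }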
More on Buffer Management
The requestor of a page must eventually unpin it, and indicate whether the page has been modified: the dirty bit is used for this.
A page in the pool may be requested many times; a pin count is used. To pin a page: pin_count++.
A page is a candidate for replacement iff pin_count == 0 ("unpinned").
CC & recovery may entail additional I/O when a frame is chosen for replacement (Write-Ahead Log protocol; more later!).
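The matching unpin call, a sketch reusing the hypothetical frame table from above:

    /* The requestor is done with the page; `modified` says whether it wrote to it. */
    void unpin_page(int page_id, int modified)
    {
        int f = find_frame(page_id);     /* page must be buffered and pinned */
        if (modified)
            pool[f].dirty = 1;           /* must be flushed before the frame is reused */
        pool[f].pin_count--;             /* at 0 the frame becomes a replacement candidate */
    }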
Buffer Replacement Policy
A frame is chosen for replacement by a replacement policy: Least Recently Used (LRU), MRU, Clock, etc.
The policy can have a big impact on the number of I/Os; it depends on the access pattern.
LRU Replacement Policy
Least Recently Used (LRU): for each page in the buffer pool, keep track of the time when it was last unpinned; replace the frame with the oldest (earliest) time.
Very common policy: intuitive and simple.
Works well for repeated accesses to popular pages. Problems?
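A sketch of LRU victim selection over the same hypothetical frame table, assuming a last_unpinned timestamp per frame (updated by unpin_page; not part of the original table):

    #include <limits.h>

    long last_unpinned[NFRAMES];         /* assumed: set to the current time in unpin_page */

    /* Choose the un-pinned frame with the oldest (earliest) last-unpinned time. */
    int choose_victim_lru(void)
    {
        int  victim = -1;
        long oldest = LONG_MAX;
        for (int f = 0; f < NFRAMES; f++) {
            if (pool[f].pin_count == 0 && last_unpinned[f] < oldest) {
                oldest = last_unpinned[f];
                victim = f;
            }
        }
        return victim;                   /* -1 means every frame is pinned */
    }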
“Clock” Replacement Policy
• An approximation of LRU
• Arrange frames into a cycle; store one reference bit per frame
• Can think of this as the "2nd chance" bit
• When pin count reduces to 0, turn on the ref. bit
• When replacement is necessary:
  do for each page in cycle {
      if (pin_count == 0 && ref bit is on)
          turn off ref bit;
      else if (pin_count == 0 && ref bit is off)
          choose this page for replacement;
  } until a page is chosen;
[Figure: a clock of four frames: A, C and D with ref bit = 1, B pinned]
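A sketch of the clock sweep over the same hypothetical frame table, assuming a ref_bit per frame that is set to 1 whenever the frame's pin count drops to 0:

    int ref_bit[NFRAMES];                /* assumed: set to 1 when pin_count reaches 0 */
    static int hand = 0;                 /* current position of the clock hand */

    /* Sweep the cycle of frames; pages with the ref bit on get a second chance. */
    int choose_victim_clock(void)
    {
        for (int checked = 0; checked < 2 * NFRAMES; checked++) {
            int f = hand;
            hand = (hand + 1) % NFRAMES;             /* advance the hand */
            if (pool[f].pin_count != 0)
                continue;                            /* pinned: skip */
            if (ref_bit[f])
                ref_bit[f] = 0;                      /* second chance: clear and move on */
            else
                return f;                            /* un-pinned and ref bit off: evict */
        }
        return -1;                                   /* every frame is pinned */
    }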
B+-tree
Records must be ordered over an attribute: SSN, Name, etc.
Queries: exact match and range queries over the indexed attribute: "find the name of the student with ID=087-34-7892" or "find all students with gpa between 3.00 and 3.5".
B+-tree: properties
Insert/delete at log_F(N/B) cost; the tree is kept height-balanced (F = fanout).
Minimum 50% occupancy (except for the root). Each node contains d <= m <= 2d entries/pointers.
Two types of nodes: index nodes and data nodes; each node is 1 page (disk-based method).
Example
[Figure: example B+-tree. The root holds keys 100, 120, 150, 180; an index node below it holds keys 30, 40; the leaf level holds the data entries 3 5 11 | 30 35 | 41 55 | 100 101 110 | 120 130 | 150 156 179 | 180 200, chained left to right.]

Index node: separator keys 57, 81, 95 and four pointers to the sub-trees holding keys k < 57, 57 <= k < 81, 81 <= k < 95, and k >= 95.

Data node: keys 57, 81, 95, each paired with a pointer to the record with that key; the node also holds a pointer to the next leaf in sequence and is pointed to from a non-leaf node above.
A node is an array of (key, pointer) entries; cleaned up, the declaration is roughly (Key and Pointer stand for the key type and a page/record pointer):

    struct entry { Key key; Pointer ptr; };   /* search-key value + pointer to child page or record */
    struct entry node[B];                     /* one node = one page = B entries */
Insertion
Find the correct leaf L. Put the data entry onto L.
If L has enough space, done!
Else, must split L (into L and a new node L2): redistribute entries evenly, copy up the middle key, and insert an index entry pointing to L2 into the parent of L.
This can happen recursively. To split an index node, redistribute entries evenly, but push up the middle key. (Contrast with leaf splits.)
Splits "grow" the tree; a root split increases the height.
Tree growth: it gets wider or one level taller at the top.
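A minimal sketch of the leaf-split step only, under an assumed in-memory layout in which a leaf holds between d and 2d integer keys (real disk-based nodes also carry record pointers and live on pages):

    #include <stdlib.h>

    #define D 2                          /* assumed order d: a leaf holds d..2d keys */

    typedef struct leaf {
        int          nkeys;
        int          keys[2 * D + 1];    /* one extra slot for the overflowing entry */
        struct leaf *next;               /* leaf-sequence pointer */
    } Leaf;

    /* Split an over-full leaf L (2d+1 keys) into L and a new node L2.
       Returns L2; *copy_up receives the middle key to copy up into the parent. */
    Leaf *split_leaf(Leaf *L, int *copy_up)
    {
        Leaf *L2  = calloc(1, sizeof *L2);
        int   mid = (L->nkeys + 1) / 2;              /* redistribute entries evenly */
        for (int i = mid; i < L->nkeys; i++)
            L2->keys[L2->nkeys++] = L->keys[i];      /* move the upper half to L2   */
        L->nkeys = mid;
        L2->next = L->next;                          /* keep the leaf chain intact  */
        L->next  = L2;
        *copy_up = L2->keys[0];                      /* copied up, not pushed up    */
        return L2;
    }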
Deletion
Start at the root, find the leaf L where the entry belongs. Remove the entry.
If L is at least half-full, done!
If L has only d-1 entries:
Try to re-distribute, borrowing from a sibling (an adjacent node with the same parent as L).
If re-distribution fails, merge L and the sibling.
If a merge occurred, must delete the entry (pointing to L or the sibling) from the parent of L.
A merge could propagate to the root, decreasing the height.
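A sketch of the borrow-or-merge decision at the leaf level, reusing the assumed Leaf layout above; sib is taken to be L's right sibling under the same parent, and the caller still has to fix or delete the corresponding entry in the parent:

    #include <stdlib.h>
    #include <string.h>

    /* Returns the new separator key for the parent after borrowing,
       or -1 to signal that L and sib were merged (the parent entry
       pointing to sib must then be deleted, possibly propagating up). */
    int fix_underflow(Leaf *L, Leaf *sib)
    {
        if (sib->nkeys > D) {                        /* sibling can spare an entry: re-distribute */
            L->keys[L->nkeys++] = sib->keys[0];      /* borrow the sibling's smallest key         */
            sib->nkeys--;
            memmove(sib->keys, sib->keys + 1, sib->nkeys * sizeof sib->keys[0]);
            return sib->keys[0];                     /* parent separator = sibling's new first key */
        }
        for (int i = 0; i < sib->nkeys; i++)         /* otherwise merge sib into L                 */
            L->keys[L->nkeys++] = sib->keys[i];
        L->next = sib->next;                         /* unlink sib from the leaf chain             */
        free(sib);
        return -1;
    }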
Characteristics
Optimal method for 1-d range queries: Space O(N/B), Updates O(log_B(N/B)), Query O(log_B(N/B) + a/B).
Space utilization: 67% for random input.
B-tree variation: index nodes also store pointers to records.
Example
[Figure: the same example B+-tree. The range query Range[32, 160] descends from the root to the leaf containing the smallest key >= 32, then follows the leaf-sequence pointers until a key larger than 160 is reached.]
Other issues
Internal node architecture [Lomet01]: reduce the overhead of tree traversal.
Prefix compression: in index nodes store only the prefix that differentiates consecutive sub-trees; the fanout is increased.
Cache-sensitive B+-tree: place keys in a way that reduces cache faults during the binary search in each node, and eliminate pointers so that a cache line contains more keys for comparison.
Exact Match Queries
The B+-tree is perfect, but... to answer a selection query (ssn=10) it needs to traverse a full path.
In practice, 3-4 block accesses (depending on the height of the tree and buffering).
Any better approach?
Yes! Hashing: static hashing, dynamic hashing.
Hashing
Hash-based indexes are best for equality selections. Cannot support range searches.
Static and dynamic hashing techniques exist.
Static Hashing
# primary pages fixed, allocated sequentially, never de-allocated; overflow pages if needed.
h(k) MOD N = bucket to which the data entry with key k belongs. (N = # of buckets)
[Figure: h(key) mod N maps a key to one of the primary bucket pages 0, 1, ..., N-1; full buckets chain to overflow pages.]
Static Hashing (Contd.)
Buckets contain data entries.
The hash function works on the search key field of record r. Use its value MOD N to distribute values over the range 0 ... N-1.
h(key) = (a * key + b) usually works well; a and b are constants, and lots is known about how to tune h (a sketch with example constants follows below).
Long overflow chains can develop and degrade performance.
Extendible and Linear Hashing: dynamic techniques to fix this problem.
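A minimal sketch of the static key-to-bucket mapping; the constants a, b and the bucket count are arbitrary example values:

    #define NBUCKETS 16                  /* N: fixed number of primary bucket pages */

    /* h(key) = a*key + b, then MOD N picks the bucket (a, b are example constants). */
    unsigned bucket_of(unsigned long key)
    {
        const unsigned long a = 2654435761UL;        /* example multiplier */
        const unsigned long b = 97;                  /* example offset     */
        return (unsigned)((a * key + b) % NBUCKETS);
    }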
Linear Hashing
A dynamic hashing scheme that handles the problem of long overflow chains without using a directory.
Directory avoided in LH by using temporary overflow pages, and choosing the bucket to split in a round-robin fashion.
When any bucket overflows, split the bucket that is currently pointed to by the "Next" pointer, and then increment that pointer to the next bucket.
Linear Hashing – The Main Idea
Use a family of hash functions h0, h1, h2, ...
hL(key) = h(key) mod (2^L * N)
N = initial # buckets
h is some hash function
L = level; starts at 0
Each time we use at most two hash functions: hL(key) and hL+1(key)
hL+1 doubles the range of hL
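A sketch of the hash-function family, taking h to be the identity (which is all the examples below need) and N_INIT as the initial bucket count:

    #define N_INIT 4                     /* N: number of buckets at level 0 */

    unsigned long h(unsigned long key)   /* "some hash function"; identity is enough here */
    {
        return key;
    }

    /* hL(key) = h(key) mod (2^L * N); hL+1 doubles the range of hL. */
    unsigned hL(unsigned long key, int L)
    {
        return (unsigned)(h(key) % (unsigned long)(N_INIT << L));
    }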
Linear Hashing (Contd.)
The algorithm proceeds in `rounds'. The current round number is L.
There are NL (= N * 2^L) buckets at the beginning of a round.
Buckets 0 to Next-1 have been split; buckets Next to NL-1 have not been split yet this round.
The round ends when all NL initial buckets have been split (i.e., Next = NL).
To start the next round: L++; Next = 0;
Linear Hashing - Insert
Find the appropriate bucket. If the bucket to insert into is full:
Add an overflow page and insert the data entry.
Split the Next bucket and increment Next.
Note: this is likely NOT the bucket being inserted into!!!
To split a bucket, create a new bucket and use hLevel+1 to re-distribute entries.
Since buckets are split round-robin, long overflow chains don't develop!
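A sketch of insert plus the round-robin split, reusing hL and N_INIT from the previous sketch; the bucket-page helpers (bucket_is_full, add_to_bucket, allocate_bucket, rehash_bucket) are assumed, not real API calls:

    int Level = 0;                       /* current round L      */
    int Next  = 0;                       /* next bucket to split */

    /* Assumed helpers (not shown): */
    int  bucket_is_full(int b);                          /* primary page of b is full              */
    void add_to_bucket(int b, unsigned long key);        /* append, using overflow pages if needed */
    void allocate_bucket(int b);                         /* create an empty primary page           */
    void rehash_bucket(int b, int L);                    /* redistribute b's entries with h(L+1)   */

    void lh_insert(unsigned long key)
    {
        int b = hL(key, Level);
        if (b < Next)                                    /* bucket already split this round   */
            b = hL(key, Level + 1);

        int overflow = bucket_is_full(b);
        add_to_bucket(b, key);                           /* overflow page added if necessary  */

        if (overflow) {                                  /* any overflow triggers one split   */
            allocate_bucket(Next + (N_INIT << Level));   /* image bucket Next + NL            */
            rehash_bucket(Next, Level);                  /* split bucket Next with h(Level+1) */
            Next++;
            if (Next == (N_INIT << Level)) {             /* all NL buckets split: round over  */
                Level++;
                Next = 0;
            }
        }
    }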
Example: Insert 43
h0(x) = x mod 4, h1(x) = x mod 8
Before the insert (Level=0, Next=0, N=4 primary pages):
bucket 0: 32 44 36 | bucket 1: 9 25 5 | bucket 2: 14 18 10 30 | bucket 3: 31 35 7 11
Insert 43: h0(43) = 3 and bucket 3 is full, so 43 goes to an overflow page of bucket 3. Bucket Next=0 is split using h1: 32 stays in bucket 0, while 44 and 36 move to the new bucket 4. Afterwards Level=0, Next=1.
Example: End of a Round
Before the insert (Level=0, Next=3):
bucket 0: 32* | bucket 1: 9* 25* | bucket 2: 66* 18* 10* 34* | bucket 3: 31* 35* 7* 11* + overflow 43* | bucket 4: 44* 36* | bucket 5: 5* 37* 29* | bucket 6: 14* 30* 22*
Insert 50: h0(50) = 2 and bucket 2 is full, so 50* goes to an overflow page of bucket 2. Bucket Next=3 is split using h1 into bucket 3 (43* 35* 11*) and bucket 7 (31* 7*). That was the last unsplit bucket of the round, so the round ends: Level=1, Next=0.
LH Search Algorithm
To find the bucket for data entry r, compute hL(r):
If hL(r) >= Next (i.e., hL(r) is a bucket that has not been involved in a split this round), then r belongs in that bucket for sure.
Else, r could belong to bucket hL(r) or to bucket hL(r) + NL; must apply hL+1(r) to find out.
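The corresponding bucket choice on the search side, a sketch reusing hL, Level and Next from the sketches above:

    /* Return the bucket that must hold any entry with this key. */
    int lh_lookup_bucket(unsigned long key)
    {
        int b = hL(key, Level);          /* try hL first */
        if (b < Next)                    /* that bucket was already split this round... */
            b = hL(key, Level + 1);      /* ...so the entry may have moved to b + NL    */
        return b;
    }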