Temple University – CIS Dept. CIS331 – Principles of Database Systems
V. Megalooikonomou
Indexing and Hashing I
(based on notes by Silberschatz, Korth, and Sudarshan and notes by C. Faloutsos at CMU)
General Overview - rel. model
• Relational model - SQL
• Formal & commercial query languages
• Functional dependencies
• Normalization
• Physical design
• Indexing
Indexing - overview
• primary / secondary indices
• index-sequential (ISAM)
• B-trees, B+-trees
• hashing
  - static hashing
  - dynamic hashing
Basic Concepts
• Indexing mechanisms speed up access to desired data, e.g., the author catalog in a library
• Search key: attribute or set of attributes used to look up records in a file
• An index file consists of records (called index entries) of the form (search-key, pointer)
• Index files are typically much smaller than the original file
• Two basic kinds of indices:
  - Ordered indices: search keys are stored in sorted order
  - Hash indices: search keys are distributed uniformly across “buckets” using a “hash function”
Indexing
Once the records are stored in a file, how do you search efficiently? (e.g., ssn=123?)
STUDENT
Ssn | Name | Address
123 | smith | main str
234 | jones | forbes ave
125 | tomson | main str
Indexing
Once the records are stored in a file, how do you search efficiently?
• brute force: retrieve all records, report the qualifying ones
• better: use indices (pointers) to locate the records directly
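A minimal Python sketch contrasting the two strategies, assuming the STUDENT file is an in-memory list of tuples and using a Python dict (a hash table) as a stand-in for the index; `students`, `brute_force`, `ssn_index` and `indexed_lookup` are illustrative names, not from the slides.

```python
# Hypothetical in-memory "file": a list of (ssn, name, address) records.
students = [
    (123, "smith", "main str"),
    (234, "jones", "forbes ave"),
    (125, "tomson", "main str"),
]

def brute_force(records, ssn):
    """Retrieve all records and report the qualifying ones: O(N) per query."""
    return [rec for rec in records if rec[0] == ssn]

# Index: search key -> position of the record in the file (the "pointer").
ssn_index = {rec[0]: pos for pos, rec in enumerate(students)}

def indexed_lookup(records, index, ssn):
    """Follow the index pointer directly to the record, if there is one."""
    pos = index.get(ssn)
    return [] if pos is None else [records[pos]]

print(brute_force(students, 123))                 # [(123, 'smith', 'main str')]
print(indexed_lookup(students, ssn_index, 123))   # [(123, 'smith', 'main str')]
```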
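Indexing – main idea: keep the search keys in sorted order (123, 125, 234), each with a pointer to its record in the file:
STUDENT
Ssn | Name | Address
123 | smith | main str
234 | jones | forbes ave
125 | tomson | main str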
Measuring ‘goodness’:
• retrieval time?
• insertion / deletion?
• space overhead?
• reorganization?
• range queries?
Main concepts
• search keys are sorted in the index file and point to the actual records
• primary vs. secondary indices
• clustering (sparse) vs. non-clustering (dense) indices
Indexing
STUDENT
Ssn | Name | Address
123 | smith | main str
234 | jones | forbes ave
678 | tomson | main str
456 | stevens | forbes ave
345 | smith | forbes ave
Index on Ssn: 123, 234, 345, 456, 567
Primary key index: on the primary key (no duplicates)
Indexing
STUDENT
Ssn | Name | Address
123 | smith | main str
234 | jones | forbes ave
345 | tomson | main str
456 | stevens | forbes ave
567 | smith | forbes ave
Address-index: forbes ave, main str
Secondary key index: duplicates may exist
Indexing
Address-index on the same STUDENT file, with postings lists (one list of record pointers per address value):
forbes ave → 234, 456, 567
main str → 123, 345
Secondary key index: typically, with ‘postings lists’
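A minimal Python sketch of a secondary index with postings lists, assuming record positions in the file act as the pointers; `build_secondary_index` and `address_index` are illustrative names.

```python
from collections import defaultdict

students = [
    (123, "smith", "main str"),
    (234, "jones", "forbes ave"),
    (345, "tomson", "main str"),
    (456, "stevens", "forbes ave"),
    (567, "smith", "forbes ave"),
]

def build_secondary_index(records, attr):
    """Map each value of a non-key attribute to its postings list:
    the positions of all records that carry that value."""
    index = defaultdict(list)
    for pos, rec in enumerate(records):
        index[rec[attr]].append(pos)
    return dict(sorted(index.items()))   # keep the search keys in sorted order

address_index = build_secondary_index(students, attr=2)
print(address_index)   # {'forbes ave': [1, 3, 4], 'main str': [0, 2]}
print([students[p][0] for p in address_index["main str"]])   # [123, 345]
```

Duplicates in the secondary key simply lengthen a postings list; the index itself still has one entry per distinct value.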
Main concepts – cont’d
• Clustering (= sparse) index: records are physically sorted on that key (and not all key values are needed in the index)
• Non-clustering (= dense) index: the opposite
E.g.:
Indexing - Sparse index
STUDENT (sorted on Ssn)
Ssn | Name | Address
123 | smith | main str
234 | jones | forbes ave
345 | tomson | main str
456 | stevens | forbes ave
567 | smith | forbes ave
Clustering/sparse index on ssn:
123 → records with Ssn >= 123
456 → records with Ssn >= 456
…
Sparse Index Files
• Sparse index: contains index records for only some search-key values
• Applicable when records are sequentially ordered on the search key
• To locate a record with search-key value K we:
  - find the index record with the largest search-key value ≤ K
  - search the file sequentially, starting at the record to which that index record points
• Less space and less maintenance overhead for insertions and deletions
• Generally slower than a dense index for locating records
• Good tradeoff: a sparse index with one index entry for every block in the file, corresponding to the least search-key value in the block
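The lookup procedure above can be sketched in Python with the standard `bisect` module, assuming a hypothetical file of 3 records per block so that, as in the sparse-index example, the index holds 123 and 456; `blocks`, `index_keys` and `sparse_lookup` are illustrative names.

```python
import bisect

# Hypothetical file sorted on ssn, 3 records per disk block.
blocks = [
    [(123, "smith"), (234, "jones"), (345, "tomson")],
    [(456, "stevens"), (567, "smith")],
]
# Sparse index: one entry per block, holding the least search key of the block.
index_keys = [blk[0][0] for blk in blocks]   # [123, 456]

def sparse_lookup(key):
    """Find the index entry with the largest search-key value <= key, then
    scan the file sequentially from the block that entry points to."""
    i = bisect.bisect_right(index_keys, key) - 1
    if i < 0:
        return None              # key is smaller than every key in the file
    for blk in blocks[i:]:
        for ssn, name in blk:
            if ssn == key:
                return (ssn, name)
            if ssn > key:
                return None      # file is sorted: we are past the spot
    return None

print(sparse_lookup(345))   # (345, 'tomson')
print(sparse_lookup(300))   # None
```

Using ≤ K rather than < K matters for efficiency: with < K, a lookup of 456 would start at the block of 123 and scan it needlessly before reaching 456.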
Indexing – Dense Index
Ssn | Name | Address
345 | tomson | main str
234 | jones | forbes ave
567 | smith | forbes ave
456 | stevens | forbes ave
123 | smith | main str
Index on Ssn: 123, 234, 345, 456, 567 (one entry per record)
Non-clustering / dense index
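A Python sketch of a dense index over the unsorted file above, assuming the index is a sorted list of (Ssn, position) pairs searched with `bisect`; `students`, `dense` and `dense_lookup` are illustrative names, and the Address column is dropped for brevity.

```python
import bisect

# The file from the dense-index example: NOT sorted on ssn.
students = [(345, "tomson"), (234, "jones"), (567, "smith"),
            (456, "stevens"), (123, "smith")]

# Dense index: one (search-key, pointer) entry for EVERY record, kept sorted.
dense = sorted((ssn, pos) for pos, (ssn, _) in enumerate(students))
keys = [k for k, _ in dense]

def dense_lookup(ssn):
    """Binary-search the dense index, then follow the pointer to the record."""
    i = bisect.bisect_left(keys, ssn)
    if i < len(keys) and keys[i] == ssn:
        return students[dense[i][1]]
    return None

print(dense)              # [(123, 4), (234, 1), (345, 0), (456, 3), (567, 2)]
print(dense_lookup(456))  # (456, 'stevens')
```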
Summary
Dense Sparse
Primary usual
secondary
usual rare
• All combinations are possible…
• at most one sparse/clustering index
• as many as desired dense indices
• usually: one primary-key index (maybe clustering) and a few secondary-key indices (non-clustering)
ISAM
What if the index is too large to search sequentially?
Use a multilevel index…
ISAM
STUDENT file as in the sparse-index example (Ssn 123, 234, 345, 456, 567, sorted on Ssn)
First-level (sparse) index: 123 → records with Ssn >= 123; 456 → records with Ssn >= 456; …
Second-level index, one entry per block of the first-level index: 123, 3423, …
ISAM - observations
• if the index is too large, store it on disk and keep an index on the index
• usually two levels of indices, one first-level entry per disk block (why?)
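A Python sketch of a two-level lookup, with hypothetical data (four data blocks, first-level entries stored 2 per disk block). One plausible reading of the “(why?)”: with one first-level entry per data block, the first level is itself a sparse index, much smaller than the file, and each level costs one block read. `floor_pos`, `first_level`, `second_level` and `isam_lookup` are illustrative names.

```python
import bisect

def floor_pos(keys, key):
    """Position of the largest entry <= key, or -1 if there is none."""
    return bisect.bisect_right(keys, key) - 1

# Hypothetical data blocks, sorted on ssn (3 records per block; only keys shown).
data_blocks = [[123, 234, 345], [456, 567, 678], [789, 890, 901], [912, 934, 956]]

# First-level index: one (least key, block number) entry per data block,
# itself stored in disk blocks of 2 entries each.
first_level = [[(123, 0), (456, 1)], [(789, 2), (912, 3)]]

# Second-level index: one entry per first-level block (small: keep in memory).
second_level = [blk[0][0] for blk in first_level]   # [123, 789]

def isam_lookup(key):
    """Return True if key is in the file, reading one block per level."""
    i = floor_pos(second_level, key)
    if i < 0:
        return False
    entries = first_level[i]                        # 1st disk access
    j = floor_pos([k for k, _ in entries], key)     # j >= 0 here
    return key in data_blocks[entries[j][1]]        # 2nd disk access

print(isam_lookup(567), isam_lookup(900))   # True False
```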
ISAM - Multilevel Index
ISAM - observations
What about insertions/deletions? E.g., insert the record (124; peterson; fifth ave.) into the STUDENT file of the ISAM example:
• the block where 124 belongs has no free slot, so the record goes to an overflow block chained to that block
• Problems? Overflow chains may become very long; thus:
  - shut down & reorganize
  - start with ~80% utilization
So far …
• indices (like ISAM) suffer in the presence of frequent updates
• a sequential scan using a primary index is efficient, but a sequential scan using a secondary index is expensive: each record access may fetch a new block from disk
• alternative indexing structure: B-trees
Overview
• primary / secondary indices
• multilevel (ISAM)
• B-trees, B+-trees
• hashing
  - static hashing
  - dynamic hashing
B-trees
• the most successful family of index schemes (B-trees, B+-trees, B*-trees)
• can be used for primary/secondary, clustering/non-clustering indices
• they are balanced “n-way” search trees
B-trees
• Disadvantage of indexed-sequential files: performance degrades as the file grows, since many overflow blocks get created; periodic reorganization of the entire file is required
• Advantage of B+-tree index files: they reorganize themselves automatically, with small, local changes, in the face of insertions and deletions; reorganization of the entire file is not required
• Disadvantage of B+-trees: extra insertion and deletion overhead, space overhead
• The advantages of B+-trees outweigh the disadvantages, and they are used extensively
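B-trees
E.g., a B-tree of order 3 (i.e., at most 3 pointers from each node); this is tree T0 in the examples below:
• root: [6, 9]
• leaf [1, 3]: keys < 6
• leaf [7]: keys > 6 and < 9
• leaf [13]: keys > 9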
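B-tree properties (order n):
• key order: the keys in each node are sorted
• each node has at most n pointers
• each node has at least ⌈n/2⌉ pointers (except the root)
• all leaves are at the same level
• if a node has k pointers, then it has exactly k-1 keys
Node layout: p1, v1, p2, v2, …, vn-1, pn (keys v1 < v2 < … < vn-1; pointers p1 … pn)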
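A small Python checker for the properties listed above, offered as a sketch: the `Node` class (a sorted key list plus a child list, empty for leaves) and `is_btree` are hypothetical, and, as in the slide's picture, a leaf with k keys is counted as having k+1 (null) pointers.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # empty list => leaf

def is_btree(root, n):
    """Check the B-tree properties for order n; raise ValueError on violation."""
    leaf_levels = set()

    def visit(node, level, lo, hi, is_root):
        pointers = len(node.keys) + 1      # k-1 keys <=> k pointers
        if pointers > n:
            raise ValueError(f"{node.keys}: more than {n} pointers")
        if not is_root and pointers < math.ceil(n / 2):
            raise ValueError(f"{node.keys}: fewer than ceil(n/2) pointers")
        if any(a >= b for a, b in zip(node.keys, node.keys[1:])):
            raise ValueError(f"{node.keys}: keys not in order")
        if any(not lo < x < hi for x in node.keys):
            raise ValueError(f"{node.keys}: key outside its sub-tree range")
        if not node.children:              # leaf: remember its level
            leaf_levels.add(level)
            return
        if len(node.children) != pointers:
            raise ValueError(f"{node.keys}: wrong number of children")
        bounds = [lo] + node.keys + [hi]   # child i holds keys in (bounds[i], bounds[i+1])
        for i, child in enumerate(node.children):
            visit(child, level + 1, bounds[i], bounds[i + 1], False)

    visit(root, 0, -math.inf, math.inf, True)
    if len(leaf_levels) != 1:
        raise ValueError("leaves are not all at the same level")
    return True

# Tree T0 from the example: order 3, root [6, 9], leaves [1, 3], [7], [13]
T0 = Node([6, 9], [Node([1, 3]), Node([7]), Node([13])])
print(is_btree(T0, 3))   # True
```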
Properties
• “block aware” nodes: each node -> one disk page
• O(log(N)) for everything! (insertion/deletion/search)
• typically, if the order (fanout) n is 50 - 100, then 2 - 3 levels
• utilization >= 50%, guaranteed; on average 69%
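The “2 - 3 levels” figure can be sanity-checked with a little arithmetic. The sketch below, an illustration rather than anything from the slides, computes the fewest levels a B-tree of order n needs for a given number of keys, using the fact that a completely full tree with h levels holds n^h - 1 keys; real trees are only 50 - 100% full, so they can need somewhat more levels. `min_levels` is a hypothetical name.

```python
def min_levels(num_keys, n):
    """Fewest levels a B-tree of order n needs to hold num_keys keys.
    A full tree with h levels has 1 + n + ... + n**(h-1) = (n**h - 1)/(n - 1)
    nodes of n - 1 keys each, i.e. n**h - 1 keys in total."""
    h = 0
    while n ** h - 1 < num_keys:
        h += 1
    return h

print(min_levels(5_000, 100))     # 2
print(min_levels(100_000, 50))    # 3
print(min_levels(500_000, 100))   # 3
```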
Queries
Algorithm for exact match query? (e.g., ssn=8?)
Search tree T0 (root [6, 9]; leaves [1, 3], [7], [13]) from the root down:
• at the root, 6 < 8 < 9, so follow the middle pointer
• at leaf [7], 8 is not present and there is nowhere further to go, so 8 is not in the tree
Cost: H steps (= disk accesses), where H is the height of the tree
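A Python sketch of the exact-match search, assuming a hypothetical `Node` representation (sorted `keys`, `children` empty at the leaves); it counts visited nodes to mirror the “H steps (= disk accesses)” cost.

```python
import bisect
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # empty list => leaf

def search(node, key):
    """Exact-match search from the root; returns (found, nodes_visited).
    Each visited node stands for one disk access, so nodes_visited <= H."""
    visited = 0
    while True:
        visited += 1
        i = bisect.bisect_left(node.keys, key)   # first key >= search key
        if i < len(node.keys) and node.keys[i] == key:
            return True, visited
        if not node.children:                    # leaf reached: key is absent
            return False, visited
        node = node.children[i]                  # pointer between keys[i-1] and keys[i]

T0 = Node([6, 9], [Node([1, 3]), Node([7]), Node([13])])
print(search(T0, 8))    # (False, 2)
print(search(T0, 13))   # (True, 2)
```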
Queries
• what about range queries? (e.g., 5<salary<8)
• proximity / nearest neighbor searches? (e.g., salary ~ 8)
(on tree T0: root [6, 9]; leaves [1, 3], [7], [13])
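One way to answer a range query on a plain B-tree is an in-order traversal that skips sub-trees lying entirely outside the range. A hedged Python sketch, using the same hypothetical `Node` shape and strict bounds as in the 5<salary<8 example; nearest-neighbor search is not shown.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # empty list => leaf

def range_query(node, lo, hi, out=None):
    """Collect keys k with lo < k < hi, in sorted order, by an in-order
    traversal that skips sub-trees lying entirely outside (lo, hi)."""
    if out is None:
        out = []
    kids = node.children
    for i, k in enumerate(node.keys):
        if kids and lo < k:          # left sub-tree of k may hold keys > lo
            range_query(kids[i], lo, hi, out)
        if lo < k < hi:
            out.append(k)
        if k >= hi:
            return out               # everything further right is >= hi
    if kids:
        range_query(kids[-1], lo, hi, out)
    return out

T0 = Node([6, 9], [Node([1, 3]), Node([7]), Node([13])])
print(range_query(T0, 5, 8))   # [6, 7]
```

Note how the traversal climbs back into the root between leaves; this back-tracking is what B+-trees avoid.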
B-trees: Insertion
• insert in leaf; on overflow, push middle up (recursively)
• split: preserves B-tree properties
B-trees
Easy case: tree T0; insert ‘8’
8 belongs in leaf [7] (6 < 8 < 9), which has room for it: the leaf becomes [7, 8].
Result: root [6, 9]; leaves [1, 3], [7, 8], [13]
B-trees
Hardest case: tree T0; insert ‘2’
• 2 belongs in leaf [1, 3], which becomes [1, 2, 3]: overflow (order 3 allows at most 2 keys per node)
• split the leaf into [1] and [3] and push the middle key, 2, up: the root becomes [2, 6, 9], which overflows too
• split the root into [2] and [9] and push its middle key, 6, up into a new root
Final state: root [6]; node [2] with leaves [1], [3]; node [9] with leaves [7], [13]
B-trees - insertion
Q: What if there are two middles? (e.g., order 4)
A: either one is fine
B-trees: Insertion
• insert in leaf; on overflow, push middle up (recursively – ‘propagate split’)
• split: preserves all B-tree properties (!!)
• notice how it grows: height increases when the root overflows & splits
• automatic, incremental re-organization (contrast with ISAM!)
Pseudo-code
INSERTION OF KEY 'K'
find the correct leaf node 'L';
if ( 'L' is full, i.e., adding 'K' makes it overflow ){
    add 'K' to 'L' and split 'L', by pushing the middle key upstairs to parent node 'P';
    if ( 'P' overflows ){
        repeat the split recursively;   /* splitting the root creates a new root */
    }
}
else{
    add the key 'K' in node 'L';   /* maintaining the key order in 'L' */
}
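A runnable Python version of the insertion pseudocode, as a sketch: `Node`, `insert`, `_insert` and `make_t0` are hypothetical names, nodes are modified in place, duplicate keys are not handled, and when there are two middles (even key count) it pushes up the upper one, which the slides say is fine.

```python
import bisect
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # empty list => leaf

def insert(root, key, n):
    """Insert key into a B-tree of order n (at most n-1 keys per node).
    Returns the root, which is a new node if the old root split."""
    split = _insert(root, key, n)
    if split is None:
        return root
    middle, right = split            # the root overflowed: height grows by one
    return Node([middle], [root, right])

def _insert(node, key, n):
    """Insert below node; return (middle_key, new_right_node) if node split."""
    i = bisect.bisect_left(node.keys, key)
    if node.children:                # non-leaf: insert into the proper child first
        split = _insert(node.children[i], key, n)
        if split is None:
            return None
        middle, right = split        # child split: absorb its middle key here
        node.keys.insert(i, middle)
        node.children.insert(i + 1, right)
    else:
        node.keys.insert(i, key)     # leaf: add key, maintaining key order
    if len(node.keys) <= n - 1:
        return None                  # no overflow
    m = len(node.keys) // 2          # overflow: push the middle key up
    right = Node(node.keys[m + 1:], node.children[m + 1:])
    middle = node.keys[m]
    node.keys, node.children = node.keys[:m], node.children[:m + 1]
    return middle, right

def make_t0():
    return Node([6, 9], [Node([1, 3]), Node([7]), Node([13])])

t = insert(make_t0(), 2, n=3)        # the 'hardest case' from the slides
print(t.keys, [c.keys for c in t.children])   # [6] [[2], [9]]
```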
Overview
• primary / secondary indices
• multilevel (ISAM)
• B-trees
  - definition, search, insertion, deletion
• B+-trees
• hashing
Deletion
Rough outline of algorithm: delete key; on underflow, may need to merge
In practice, some implementors just allow underflows to happen…
B-trees – Deletion
Easiest case: tree T0; delete ‘3’
3 is in leaf [1, 3]; removing it leaves [1], which still has enough keys, so nothing else changes.
Result: root [6, 9]; leaves [1], [7], [13]
B-trees – Deletion
• Case 1: delete a key at a leaf – no underflow
• Case 2: delete a non-leaf key – no underflow
• Case 3: delete a leaf key; underflow, and ‘rich sibling’
• Case 4: delete a leaf key; underflow, and ‘poor sibling’
B-trees – Deletion
Case 1: delete a key at a leaf – no underflow (delete 3 from T0: root [6, 9]; leaves [1, 3], [7], [13]); this is the easiest case above
B-trees – Deletion
Case 2: delete a key at a non-leaf – no underflow (e.g., delete 6 from T0)
Delete & promote, i.e.:
• remove 6 from the root [6, 9], leaving a gap
• promote 3, the largest key of the left sub-tree, into the gap; leaf [1, 3] becomes [1]
FINAL TREE: root [3, 9]; leaves [1] (< 3), [7] (> 3, < 9), [13] (> 9)
B-trees – Deletion
Case 2: delete a key at a non-leaf – no underflow (e.g., delete 6 from T0)
Q: How to promote?
A: pick the largest key from the left sub-tree (or the smallest from the right sub-tree)
Observation: every deletion eventually becomes a deletion of a leaf key
B-trees – Deletion
Case 3: underflow & ‘rich sibling’ (e.g., delete 7 from T0)
• deleting 7 empties leaf [7]: underflow
• its sibling [1, 3] is ‘rich’ = it can give a key without underflowing
• ‘borrowing’ a key: always THROUGH the PARENT!
• moving 3 straight into the empty leaf is wrong (NO!!): that leaf holds keys between 6 and 9, so 3 does not belong there
• delete & borrow, through the parent: 6 moves down from the root into the empty leaf, and 3 moves up from the sibling into the root
FINAL TREE: root [3, 9]; leaves [1] (< 3), [6] (> 3, < 9), [13] (> 9)
B-trees – Deletion
Case 4: underflow & ‘poor sibling’ (e.g., delete 13 from T0)
• deleting 13 empties leaf [13]: underflow
• its sibling [7] is ‘poor’: it has no key to spare
• A: merge with the ‘poor’ sibling, by pulling a key from the parent: 9 comes down from the root and joins 7, giving leaf [7, 9]
• this is the exact reversal of insertion: ‘split and push up’ vs. ‘merge and pull down’
FINAL TREE: root [6]; leaves [1, 3] (< 6), [7, 9] (> 6)
B-trees – Deletion
Case 4: underflow & ‘poor sibling’ -> ‘pull key from parent, and merge’
Q: What if the parent underflows?
A: repeat recursively
B-tree deletion - pseudocode
DELETION OF KEY 'K'
locate key 'K', in node 'N';
if( 'N' is a non-leaf node ){
    find the immediately larger key 'K1';
    /* which is guaranteed to be on a leaf node 'L' */
    copy 'K1' into the old position of 'K';
    invoke this DELETION routine on 'K1' from the leaf node 'L';
}
else{
    /* 'N' is a leaf node */
    delete 'K' from 'N';
    if( 'N' underflows ){
        let 'N1' be a sibling of 'N';
        if( 'N1' is "rich" ){   /* i.e., N1 can lend us a key */
            borrow a key from 'N1' THROUGH the parent node;
        }else{   /* N1 has the minimum number of keys */
            MERGE: pull the key from the parent 'P', and merge it with the keys of 'N' and 'N1' into a new node;
            if( 'P' underflows ){ repeat recursively }   /* at non-leaf levels, child pointers move along with the keys */
        }
    }
}
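A runnable Python sketch of the deletion pseudocode, with assumptions worth stating: the hypothetical `Node` class from the earlier sketches is redefined here, a non-leaf key is replaced by the immediately larger key (as in the pseudocode), the left sibling is tried before the right one, and a non-root node must keep at least ⌈n/2⌉ - 1 keys. `delete`, `_delete`, `_fix_underflow` and `make_t0` are illustrative names.

```python
import bisect
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # empty list => leaf

def delete(root, key, n):
    """Delete key from a B-tree of order n; returns the (possibly new) root."""
    _delete(root, key, n)
    if not root.keys and root.children:   # root emptied by a merge: tree shrinks
        return root.children[0]
    return root

def _delete(node, key, n):
    i = bisect.bisect_left(node.keys, key)
    found = i < len(node.keys) and node.keys[i] == key
    if not node.children:                 # leaf: remove the key itself
        if not found:
            raise KeyError(key)
        node.keys.pop(i)
        return
    if found:                             # non-leaf: copy the next larger key K1 over K
        leaf = node.children[i + 1]
        while leaf.children:
            leaf = leaf.children[0]
        key = leaf.keys[0]
        node.keys[i] = key
        i += 1                            # ...then delete K1 from the right sub-tree
    _delete(node.children[i], key, n)
    _fix_underflow(node, i, n)

def _fix_underflow(parent, i, n):
    """Repair child i of parent after a deletion: borrow or merge."""
    min_keys = math.ceil(n / 2) - 1
    child = parent.children[i]
    if len(child.keys) >= min_keys:
        return
    left = parent.children[i - 1] if i > 0 else None
    right = parent.children[i + 1] if i + 1 < len(parent.children) else None
    if left is not None and len(left.keys) > min_keys:      # rich left sibling
        child.keys.insert(0, parent.keys[i - 1])   # parent key comes down,
        parent.keys[i - 1] = left.keys.pop()       # sibling key goes up
        if left.children:
            child.children.insert(0, left.children.pop())
    elif right is not None and len(right.keys) > min_keys:  # rich right sibling
        child.keys.append(parent.keys[i])
        parent.keys[i] = right.keys.pop(0)
        if right.children:
            child.children.append(right.children.pop(0))
    else:                                          # poor sibling: merge
        j = i - 1 if left is not None else i       # merge children j and j+1
        a, b = parent.children[j], parent.children[j + 1]
        a.keys += [parent.keys.pop(j)] + b.keys    # pull the separator down
        a.children += b.children
        parent.children.pop(j + 1)

def make_t0():
    return Node([6, 9], [Node([1, 3]), Node([7]), Node([13])])

t = delete(make_t0(), 13, n=3)        # Case 4 from the slides
print(t.keys, [c.keys for c in t.children])   # [6] [[1, 3], [7, 9]]
```

For deleting 6 from T0, this successor-based version reaches the same final tree as the slides' Case 2 example (root [3, 9]; leaves [1], [7], [13]), although it gets there by borrowing 3 through the parent rather than promoting it directly.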
B-trees in practice
In theory, the leaves of T0 (root [6, 9]; leaves [1, 3], [7], [13]) sit above empty child pointers.
In practice: no empty leaves; instead, every key carries a pointer to its record (e.g., key 3 points to the STUDENT record with Ssn 3, key 7 to the record with Ssn 7, and so on for 6, 9 and 1).
B-trees in practice
In practice, the formats are (p = pointer to a child node, rp = pointer to a record):
• leaf nodes: (v1, rp1, v2, rp2, … vn, rpn)
• non-leaf nodes: (p1, v1, rp1, p2, v2, rp2, …)
Overview
• primary / secondary indices
• multilevel (ISAM)
• B-trees
• B+-trees
• hashing
B+ trees - Motivation
B-tree: print the keys in sorted order. For T0 (root [6, 9]; leaves [1, 3], [7], [13]) an in-order traversal prints 1, 3, 6, 7, 9, 13, but after each leaf it has to climb back up to the root to pick up the next key.
B-tree needs back-tracking – how to avoid it?
Solution: B+-trees
Facilitate sequential ops:
• they string all leaf nodes together
AND
• they replicate keys from non-leaf nodes, to make sure every key appears at the leaf level !!
B+ trees
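[Figure: B+-tree on the same keys: root (6, 9) over chained leaves (1, 3), (6, 7), (9, 13), whose subtrees hold keys <6, ≥6 and <9, and ≥9; keys 6 and 9 also appear at the leaf level.]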
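A minimal sketch of what the leaf chain buys: a range query descends once to the leaf that could hold `lo`, then walks the sibling links without returning to any parent. The `Leaf`/`Inner` classes and the `range_scan` function are hypothetical names for this illustration, and the tree is the one in the figure.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys, next_leaf=None):
        self.keys = keys
        self.next = next_leaf             # next leaf in search-key order

class Inner:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children

def range_scan(root, lo, hi):
    """Return all keys k with lo <= k <= hi: one root-to-leaf descent, then follow leaf links."""
    node = root
    while isinstance(node, Inner):
        node = node.children[bisect_right(node.keys, lo)]   # child whose range contains lo
    result = []
    while node is not None:
        for k in node.keys:
            if k > hi:
                return result             # past the range: stop scanning
            if k >= lo:
                result.append(k)
        node = node.next
    return result

leaf3 = Leaf([9, 13])
leaf2 = Leaf([6, 7], leaf3)
leaf1 = Leaf([1, 3], leaf2)
root = Inner([6, 9], [leaf1, leaf2, leaf3])
print(range_scan(root, 3, 9))     # [3, 6, 7, 9]
print(range_scan(root, 0, 100))   # [1, 3, 6, 7, 9, 13]
```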
B+-Trees (Cont.)
A B+-tree is a rooted tree satisfying the following properties:
All paths from root to leaf are of the same length.
Each node that is not a root or a leaf has between ⌈n/2⌉ and n children.
A leaf node has between ⌈(n–1)/2⌉ and n–1 values.
Special cases:
If the root is not a leaf, it has at least 2 children.
If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and (n–1) values.
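The properties above can be checked mechanically. This hypothetical `check_bplus` helper (an illustration, not part of any library) assumes in-memory nodes where an empty `children` list marks a leaf. It tests the occupancy bounds for order n, including the special cases for the root, plus the equal-depth rule.

```python
import math

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []    # empty list means this node is a leaf

def check_bplus(root, n):
    """Check the B+-tree shape rules for order n; return the common leaf depth or raise ValueError."""
    leaf_depths = set()

    def visit(node, depth, is_root):
        if not node.children:                                  # leaf node
            low = 0 if is_root else math.ceil((n - 1) / 2)
            if not low <= len(node.keys) <= n - 1:
                raise ValueError(f"leaf {node.keys}: {len(node.keys)} values")
            leaf_depths.add(depth)
            return
        low = 2 if is_root else math.ceil(n / 2)
        if not low <= len(node.children) <= n:
            raise ValueError(f"node {node.keys}: {len(node.children)} children")
        if len(node.keys) != len(node.children) - 1:
            raise ValueError(f"node {node.keys}: keys/children mismatch")
        for child in node.children:
            visit(child, depth + 1, False)

    visit(root, 0, True)
    if len(leaf_depths) != 1:
        raise ValueError("leaves are at different depths")
    return leaf_depths.pop()

# n = 3: non-root internal nodes need 2..3 children, leaves need 1..2 values.
tree = Node([7], [Node([6], [Node([1, 3]), Node([6])]),
                  Node([9], [Node([7, 8]), Node([9, 13])])])
print(check_bplus(tree, 3))   # 2
```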
B+-Tree Node Structure
Typical node: P1 | K1 | P2 | … | Pn–1 | Kn–1 | Pn
Ki are the search-key values; Pi are pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes).
The search-keys in a node are ordered K1 < K2 < K3 < . . . < Kn–1
Leaf Nodes in B+-Trees - Properties
For i = 1, 2, . . ., n–1, pointer Pi either points to a file record with search-key value Ki, or to a bucket of pointers to file records, each record having search-key value Ki. Only need bucket structure if search-key does not form a primary key.
If Li, Lj are leaf nodes and i < j, Li’s search-key values are less than Lj’s search-key values
Pn points to next leaf node in search-key order
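For a search key that is not a primary key, each leaf entry maps one distinct key to a bucket of record pointers. A minimal sketch, assuming a hypothetical `LeafNode` class with made-up branch-name keys and made-up record pointers such as "r1":

```python
from bisect import bisect_left

class LeafNode:
    """Leaf on a non-unique search key: entry i points to a bucket of record pointers."""
    def __init__(self, keys, buckets, next_leaf=None):
        self.keys = keys          # K1 < K2 < ... (each distinct value stored once)
        self.buckets = buckets    # buckets[i]: pointers to all records whose key is keys[i]
        self.next = next_leaf     # Pn: next leaf in search-key order

    def lookup(self, key):
        """Return the bucket for key, or an empty list if key is not in this leaf."""
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.buckets[i]
        return []

leaf = LeafNode(["Brighton", "Downtown"], [["r1"], ["r2", "r5"]])
print(leaf.lookup("Downtown"))     # ['r2', 'r5']
print(leaf.lookup("Perryridge"))   # []
```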
Non-Leaf Nodes in B+-Trees - Properties
Non-leaf nodes form a multi-level sparse index on the leaf nodes. For a non-leaf node with m pointers:
All the search-keys in the subtree to which P1 points are less than K1.
For 2 ≤ i ≤ m – 1, all the search-keys in the subtree to which Pi points have values greater than or equal to Ki–1 and less than Ki.
All the search-keys in the subtree to which Pm points have values greater than or equal to Km–1.
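The three rules above reduce to one binary search. With the keys sorted, the pointer to follow is given by the number of keys less than or equal to the search value, so a value equal to Ki goes to the subtree right of Ki. A minimal sketch using Python's standard `bisect.bisect_right` (the helper name `child_index` is hypothetical):

```python
from bisect import bisect_right

def child_index(keys, v):
    """0-based index of the pointer to follow in a non-leaf node with keys K1..K(m-1).

    Index 0 is P1 (values < K1); index i is P(i+1) (values >= Ki and < K(i+1));
    index m-1 is Pm (values >= K(m-1)).
    """
    return bisect_right(keys, v)

keys = [6, 9]   # a node with m = 3 pointers
print([child_index(keys, v) for v in (5, 6, 8, 9, 13)])   # [0, 1, 1, 2, 2]
```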
B-Tree vs B+-Tree
B-tree (above) and B+-tree (below) on same data
B+ tree insertion
INSERTION OF KEY 'K':
  find the correct leaf node 'L';
  insert the search-key value into 'L' such that the keys are in order;
  if ('L' overflows) {
    split 'L';
    insert (i.e., COPY) the smallest search-key value of the new node into the parent node 'P';
    if ('P' overflows) {
      repeat the B-tree split procedure recursively;
      /* Notice: the B-TREE split; NOT the B+-tree */
    }
  }
B+-tree insertion – cont’d
/* ATTENTION:
a split at the LEAF level is handled by COPYING the middle key upstairs;
a split at a higher level is handled by PUSHING the middle key upstairs.
*/
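A minimal Python sketch of the insertion procedure above. It assumes distinct keys and order n = 3 (at most two keys per node, as in the example that follows). The names `Leaf`, `Inner`, `insert`, `leaf_keys` and `MAX_KEYS` are hypothetical, nodes live in memory rather than on disk pages, and deletion is not covered. Leaf splits COPY the separator, so it also stays in the new right leaf; non-leaf splits PUSH it, so it leaves the node.

```python
from bisect import bisect_right, insort

MAX_KEYS = 2  # assumption: order n = 3, so at most n - 1 = 2 keys per node

class Leaf:
    def __init__(self, keys, next_leaf=None):
        self.keys = keys
        self.next = next_leaf            # link to the next leaf in key order

class Inner:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children         # len(children) == len(keys) + 1

def _insert(node, key):
    """Insert key below node; return (separator, new right sibling) if node split, else None."""
    if isinstance(node, Leaf):
        insort(node.keys, key)                     # keep the keys in order
        if len(node.keys) <= MAX_KEYS:
            return None
        mid = len(node.keys) // 2
        right = Leaf(node.keys[mid:], node.next)
        node.keys, node.next = node.keys[:mid], right
        return right.keys[0], right                # COPY: separator also stays in the right leaf
    i = bisect_right(node.keys, key)               # child whose range contains key
    split = _insert(node.children[i], key)
    if split is None:
        return None
    sep, new_child = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, new_child)
    if len(node.keys) <= MAX_KEYS:
        return None
    mid = len(node.keys) // 2
    pushed = node.keys[mid]
    right = Inner(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return pushed, right                           # PUSH: middle key moves up, not kept here

def insert(root, key):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key)
    if split is None:
        return root
    sep, right = split
    return Inner([sep], [root, right])             # root split: the tree grows by one level

def leaf_keys(root):
    """Walk down to the leftmost leaf, then follow the leaf chain."""
    node = root
    while isinstance(node, Inner):
        node = node.children[0]
    out = []
    while node is not None:
        out.append(node.keys)
        node = node.next
    return out

# The example: root (6, 9) over leaves (1, 3), (6, 7), (9, 13); insert '8'.
l3 = Leaf([9, 13])
l2 = Leaf([6, 7], l3)
l1 = Leaf([1, 3], l2)
root = insert(Inner([6, 9], [l1, l2, l3]), 8)
print(root.keys, [c.keys for c in root.children])   # [7] [[6], [9]]
print(leaf_keys(root))                              # [[1, 3], [6], [7, 8], [9, 13]]
```

Running it on the example reproduces the final tree of the walkthrough: the leaf split copies 7 into the root, and the root overflow then pushes 7 up into a new root.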
B+ trees - insertion
[Figure: the example B+-tree, root (6, 9) over chained leaves (1, 3), (6, 7), (9, 13).]
E.g., insert '8'
B+ trees - insertion
[Figure: '8' goes into the leaf (6, 7), which becomes (6, 7, 8) and overflows.]
B+ trees - insertion
[Figure: the overflowing leaf (6, 7, 8) must split.]
COPY middle upstairs
B+ trees - insertion
[Figure: the leaf splits into (6) and (7, 8); the middle key 7 is COPIED upstairs, so the root becomes (6, 7, 9) over leaves (1, 3), (6), (7, 8), (9, 13).]
B+ trees - insertion
[Figure: the root (6, 7, 9) now holds too many keys and overflows.]
Non-leaf overflow – just PUSH the middle
B+ trees - insertion
[Figure: the root splits and its middle key 7 is PUSHED up into a new root (7), with children (6) and (9) over leaves (1, 3), (6), (7, 8), (9, 13).]
FINAL TREE
B-Trees vs B+-Trees
Advantages of B-Tree indices:
May use fewer tree nodes than a corresponding B+-Tree.
Sometimes possible to find the search-key value before reaching a leaf node.
Disadvantages of B-Tree indices:
Only a small fraction of all search-key values are found early.
Non-leaf nodes are larger, so fan-out is reduced. Thus, B-Trees typically have greater depth than the corresponding B+-Tree.
Insertion and deletion are more complicated than in B+-Trees.
Implementation is harder than for B+-Trees.
Typically, the advantages of B-Trees do not outweigh the disadvantages.
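To put rough numbers on the fan-out argument, the arithmetic below assumes hypothetical sizes (4 KB pages, 8-byte keys, 8-byte child pointers and 8-byte record pointers) and treats every node as full. A B-tree non-leaf node must also store a record pointer per key, which lowers its fan-out. The depth estimate is crude: it ignores leaf capacity and partially full nodes, so it illustrates the trend rather than measuring any real system.

```python
def max_fanout(page_bytes, key_bytes, ptr_bytes, rec_ptr_bytes=0):
    """Largest n such that n child pointers and n - 1 entries fit in one page.

    Each entry is a key, plus a record pointer when rec_ptr_bytes > 0 (B-tree non-leaf nodes).
    Solves n*ptr + (n - 1)*entry <= page for integer n.
    """
    entry = key_bytes + rec_ptr_bytes
    return (page_bytes + entry) // (ptr_bytes + entry)

def levels_needed(fanout, n_keys):
    """Smallest h with fanout**h >= n_keys, a depth estimate assuming full nodes."""
    h, reach = 0, 1
    while reach < n_keys:
        reach *= fanout
        h += 1
    return h

PAGE, KEY, PTR = 4096, 8, 8                            # assumed sizes in bytes, for illustration only
bplus = max_fanout(PAGE, KEY, PTR)                     # B+-tree non-leaf: pointers + keys
btree = max_fanout(PAGE, KEY, PTR, rec_ptr_bytes=8)    # B-tree non-leaf: also a record pointer per key
print(bplus, btree)                                               # 256 171
print(levels_needed(bplus, 10**9), levels_needed(btree, 10**9))   # 4 5
```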
B*-tree
In B-trees, worst-case utilization = 50%, if we have just split all the pages.
How can we increase the utilization of B-trees?
… with B*-trees!
B-trees and B*-trees
E.g., Tree T0; insert '2'
[Figure: tree T0, root (6, 9) over leaves (1, 3), (7), (13); the key '2' belongs in the full leaf (1, 3).]
B*-trees: deferred split!
Instead of splitting, LEND keys to sibling! (through PARENT, of course!)
[Figure: inserting '2' overflows the leaf (1, 3); its sibling (7) still has room.]
B*-trees: deferred split!
[Figure: key 3 moves up into the parent and the separator 6 moves down into the sibling, giving root (3, 9) over leaves (1, 2), (6, 7), (13).]
FINAL TREE
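A minimal sketch of the deferred split on a two-level tree. It assumes distinct keys, at most two keys per node as in the example, and a hypothetical `insert_with_lending` helper. It tries the right sibling first, then the left. When both siblings are full, it raises an error instead of attempting a 2-to-3 split, which is not implemented here.

```python
from bisect import bisect_right, insort

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []    # empty list means this node is a leaf

def insert_with_lending(parent, key, max_keys=2):
    """Insert key into a leaf under parent; on overflow, lend a key to a sibling via the parent."""
    i = bisect_right(parent.keys, key)    # index of the leaf that receives key
    leaf = parent.children[i]
    insort(leaf.keys, key)
    if len(leaf.keys) <= max_keys:
        return "no overflow"
    if i + 1 < len(parent.children) and len(parent.children[i + 1].keys) < max_keys:
        right = parent.children[i + 1]
        right.keys.insert(0, parent.keys[i])   # parent's separator moves down to the right sibling
        parent.keys[i] = leaf.keys.pop()       # our largest key moves up into the parent
        return "lent to right sibling"
    if i > 0 and len(parent.children[i - 1].keys) < max_keys:
        left = parent.children[i - 1]
        left.keys.append(parent.keys[i - 1])   # separator moves down to the left sibling
        parent.keys[i - 1] = leaf.keys.pop(0)  # our smallest key moves up into the parent
        return "lent to left sibling"
    raise NotImplementedError("both siblings full: a 2-to-3 split would be needed")

# Tree T0: root (6, 9) over leaves (1, 3), (7), (13); insert '2'.
root = Node([6, 9], [Node([1, 3]), Node([7]), Node([13])])
print(insert_with_lending(root, 2))                 # lent to right sibling
print(root.keys, [c.keys for c in root.children])   # [3, 9] [[1, 2], [6, 7], [13]]
```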
B*-trees: deferred split!
Notice: shorter, more packed, faster tree
It’s a rare case where space utilization and speed improve together.
BUT: What if the sibling has no room for our ‘lending’?
A: 2-to-3 split: get the keys from the sibling, pool them with ours (and a key from the parent), and split in 3.
Details: too messy (and even worse for deletion)
Conclusions
All B-tree variants can be used for any type of index: primary/secondary, sparse (clustering), or dense (non-clustering).
All have excellent, O(log N) worst-case performance for insertion, deletion, and search.
It’s the prevailing indexing method.
Overview
ordered indices
primary / secondary indices
index-sequential multilevel (ISAM)
B-trees, B+-trees
hashing
static hashing
dynamic hashing