ch10 tree index
TRANSCRIPT
8/3/2019 Ch10 Tree Index
http://slidepdf.com/reader/full/ch10-tree-index 1/9
D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 1
Tree-StructuredIndexes
Chapter9
D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 2
Introduction
Asforanyindex,3alternativesfordataentriesk*:¡
Datarecordwithkeyvalue k¡
<k,ridofdatarecordwithsearchkeyvaluek>¡
<k,listofridsofdatarecordswithsearchkeyk>
Choiceisorthogonaltotheindexingtechniqueusedtolocatedataentries k*.
Tree-structuredindexingtechniquessupport bothrangesearchesandequalitysearches.
ISAM:staticstructure; B+tree:dynamic,
adjustsgracefullyunderinsertsanddeletes.
D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 3
RangeSearches
``Findallstudentswith gpa >3.0’’¢ Ifdataisinsortedfile,dobinarysearchtofindfirst
suchstudent,thenscantofindothers.¢ Costofbinarysearchcanbequitehigh.
Simpleidea:Createan`index’file.
* Candobinarysearchon(smaller)indexfile!
Page1 Page2 PageNPage3 DataFile
k2 kNk1 IndexFile
8/3/2019 Ch10 Tree Index
http://slidepdf.com/reader/full/ch10-tree-index 2/9
D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 4
ISAM
Indexfilemaystillbequitelarge.Butwecanapplytheidearepeatedly!
* Leafpagescontaindataentries.
P0
K1 P
1K 2 P
2K
mP m
indexentry
Non-leaf
Pages
Pages
Overflowpage
Primarypages
Leaf
D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 5
CommentsonISAM
Filecreation:Leaf(data)pagesallocatedsequentially,sortedbysearchkey;thenindexpagesallocated,thenspaceforoverflowpages.
Indexentries:<searchkeyvalue,pageid>;they`direct’searchfordataentries,whichareinleafpages.
Search:Startatroot;usekeycomparisonstogotoleaf.CostlogFN;F=#entries/indexpg,N=#leafpgs
Insert:Findleafdataentrybelongsto,andputitthere.
Delete:Findandremovefromleaf;ifemptyoverflowpage,de-allocate.
* Statictreestructure:inserts/deletesaffectonlyleafpages.
∝
Data
Pages
IndexPages
Overflowpages
D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 6
ExampleISAMTree
Eachnodecanhold2entries;noneedfor`next-leaf-page’pointers.(Why?)
10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*
20 33 51 63
40
Root
8/3/2019 Ch10 Tree Index
http://slidepdf.com/reader/full/ch10-tree-index 3/9
8/3/2019 Ch10 Tree Index
http://slidepdf.com/reader/full/ch10-tree-index 4/9
D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 1 0
ExampleB+Tree
Searchbeginsatroot,andkeycomparisonsdirectittoaleaf(asinISAM).
Searchfor5*,15*,alldataentries>=24*...
* Basedonthesearchfor15*,weknow itisnotinthetree!
Root
17 24 30
2* 3 * 5 * 7 * 14 * 1 6* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
13
D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 1 1
B+TreesinPractice
Typicalorder:100.Typicalfill-factor:67%. averagefanout=133
Typicalcapacities: Height4:1334 =312,900,700records Height3:1333 =2,352,637records
Canoftenholdtoplevelsinbufferpool: Level1=1page=8Kbytes Level2=133pages=1Mbyte
Level3=17,689pages=133MBytes
D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 1 2
InsertingaDataEntryintoaB+Tree
FindcorrectleafL.
PutdataentryontoL. IfLhasenoughspace,done! Else,mustsplit L(intoLandanewnodeL2)
• Redistributeentriesevenly,copyup middlekey.
• InsertindexentrypointingtoL2intoparentofL.
Thiscanhappenrecursively Tosplitindexnode,redistributeentriesevenly,butpushup middlekey.(Contrastwithleafsplits.)
Splits“grow”tree;rootsplitincreasesheight. Treegrowth:getswider oroneleveltallerattop.
8/3/2019 Ch10 Tree Index
http://slidepdf.com/reader/full/ch10-tree-index 5/9
D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 1 3
Inserting8*intoExampleB+Tree
Observehowminimumoccupancyisguaranteedin
bothleafandindexpgsplits.
Notedifference betweencopy-up and push-up;
besureyouunderstandthereasonsforthis.
2* 3* 5* 7* 8*
5
Entrytobeinsertedinparentnode.(Notethat5iscontinuestoappearintheleaf.)scopiedupand
appearsonceintheindex.Contrast
5 24 30
17
13
Entrytobeinsertedinparentnode.(Notethat17ispushedupandonly
thiswithaleafsplit.)
D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 1 4
ExampleB+TreeAfterInserting8*
v Noticethatrootwassplit,leadingtoincreaseinheight.
v Inthisexample,wecanavoidsplitbyre-distributing
entries;however,thisisusuallynotdoneinpractice.
2 * 3 *
Root
17
24 30
14* 16* 1 9* 2 0* 2 2* 2 4* 27 * 29* 33* 34* 38* 39*
135
7*5* 8*
D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 1 5
DeletingaDataEntryfromaB+Tree
Startatroot,findleafL whereentrybelongs.
Removetheentry. IfLisatleasthalf-full,done!
IfLhasonlyd-1entries,• Trytore-distribute,borrowingfromsibling (adjacent
nodewithsameparentasL).
• Ifre-distributionfails,merge Landsibling.
Ifmergeoccurred,mustdeleteentry(pointingtoLorsibling)fromparentofL.
Mergecouldpropagatetoroot,decreasingheight.
8/3/2019 Ch10 Tree Index
http://slidepdf.com/reader/full/ch10-tree-index 6/9
D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 1 6
ExampleTreeAfter(Inserting8*,Then)Deleting19*and20*...
Deleting19*iseasy.
Deleting20*isdonewithre-distribution.Noticehowmiddlekeyiscopiedup.
2 * 3 *
Root
17
30
14* 16* 33* 34* 38* 39*
135
7*5* 8* 22* 24*
27
27* 29*
D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 1 7
...AndThenDeleting24*
Mustmerge.
Observe`toss’ofindexentry(onright),and pulldown’ofindexentry(below).
30
2 2* 2 7* 29* 33* 34* 38* 39*
2* 3* 7* 14* 16* 22* 27* 29* 33* 34* 38* 39*5* 8*
Root
30135 17
D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 1 8
ExampleofNon-leafRe-distribution
Treeisshownbelowduringdeletionof24*.(Whatcouldbeapossibleinitialtree?)
Incontrasttopreviousexample,canre-distribute
entryfromleftchildofroottorightchild.Root
135 17 20
22
30
14* 16* 17* 18* 20* 33* 34* 38* 39*22* 27* 29*21*7*5* 8*3*2*
8/3/2019 Ch10 Tree Index
http://slidepdf.com/reader/full/ch10-tree-index 7/9
D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 1 9
AfterRe-distribution
Intuitively,entriesarere-distributedby pushingthrough’thesplittingentryintheparentnode.
Itsufficestore-distributeindexentrywithkey20;we’vere-distributed17aswellforillustration.
14* 16* 33* 34* 38* 39*22* 27* 29*17* 18* 20* 21*7*5* 8*2 * 3 *
Root
135
17
3020 22
D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 2 0
PrefixKeyCompression
Importanttoincreasefan-out.(Why?)
Keyvaluesinindexentriesonly`directtraffic’;canoftencompressthem.
E.g.,IfwehaveadjacentindexentrieswithsearchkeyvaluesDannonYogurt,DavidSmith andDevarakondaMurthy,wecanabbreviateDavid SmithtoDav.(Theotherkeyscanbecompressedtoo...)
• Isthiscorrect?Notquite!WhatifthereisadataentryDaveyJones?(CanonlycompressDavidSmithtoDavi)
• Ingeneral,whilecompressing,mustleaveeachindexentrygreaterthaneverykeyvalue(inanysubtree)toitsleft.
Insert/deletemustbesuitablymodified.
D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 2 1
BulkLoadingofaB+Tree
Ifwehavealargecollectionofrecords,andwewanttocreateaB+treeonsomefield,doingso
byrepeatedlyinsertingrecordsisveryslow.
BulkLoading canbedonemuchmoreefficiently.
Initialization:Sortalldataentries,insertpointertofirst(leaf)pageinanew(root)page.
3* 4* 6 * 9* 1 0* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*
Sortedpagesofdataentries;notyetinB+treeRoot
8/3/2019 Ch10 Tree Index
http://slidepdf.com/reader/full/ch10-tree-index 8/9
D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 2 2
BulkLoading(Contd.)
Indexentriesforleaf
pagesalwaysenteredintoright-mostindexpagejustaboveleaflevel.Whenthisfillsup,itsplits.(Splitmaygoupright-mostpathtotheroot.)
Muchfasterthanrepeatedinserts,especiallywhenoneconsiderslocking!
3* 4* 6* 9* 10*11* 12*13* 20*22* 23*31* 35*36* 38*41* 44*
Root
DataentrypagesnotyetinB+tree
3523126
1 0 2 0
3* 4* 6 * 9* 1 0* 11* 12*13* 20*22* 23* 31* 35*36* 38*41* 44*
6
Root
10
12 23
20
35
38
notyetinB+tree
Dataentrypages
D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 2 3
SummaryofBulkLoading
Option1:multipleinserts. Slow. Doesnotgivesequentialstorageofleaves.
Option2: BulkLoading Hasadvantagesforconcurrencycontrol. FewerI/Osduringbuild. Leaveswillbestoredsequentially(andlinked,of
course). Cancontrol“fillfactor”onpages.
D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 2 4
ANoteon`Order’
Order (d)conceptreplacedbyphysicalspacecriterioninpractice( atleasthalf-full’).
Indexpagescantypicallyholdmanymoreentriesthanleafpages.
Variablesizedrecordsandsearchkeysmeandifferntnodeswillcontaindifferentnumbersofentries.
Evenwithfixedlengthfields,multiplerecordswiththesamesearchkeyvalue(duplicates)canleadtovariable-sizeddataentries(ifweuseAlternative(3)).
8/3/2019 Ch10 Tree Index
http://slidepdf.com/reader/full/ch10-tree-index 9/9
D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 2 5
Summary
Tree-structuredindexesareidealforrange-searches,alsogoodforequalitysearches.
ISAMisastaticstructure. Onlyleafpagesmodified;overflowpagesneeded. Overflowchainscandegradeperformanceunlesssize
ofdatasetanddatadistributionstayconstant.
B+treeisadynamicstructure. Inserts/deletesleavetreeheight-balanced;logF Ncost. Highfanout(F)meansdepthrarelymorethan3or4. Almostalwaysbetterthanmaintainingasortedfile.
D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 2 6
Summary(Contd.)
Typically,67% occupancyonaverage. UsuallypreferabletoISAM,modulo locking
considerations;adjuststogrowthgracefully. Ifdataentriesaredatarecords,splitscanchangerids!
Keycompressionincreasesfanout,reducesheight.
BulkloadingcanbemuchfasterthanrepeatedinsertsforcreatingaB+treeonalargedataset.
Mostwidelyusedindexindatabasemanagementsystemsbecauseofitsversatility.Oneofthemost
optimizedcomponentsofaDBMS.