CS 4604: Introduction to Database Management Systems
B. Aditya Prakash
Lecture #8: Indexes and Hashing
Announcements
§ Check the office hours schedule on Piazza.
– Several extra ones
– My Wed office hour this week (Feb 24) is canceled
§ On Wed Feb 24:
– Shamimul and Sorour will give the lecture on (maybe) some hashing and sorting.
Prakash 2016, VT CS 4604
STORING DATA
DBMS Layers:
[Figure: layer stack, top to bottom, with a "TODAY →" marker on the part covered in this lecture]
Queries
Query Optimization and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
DB
Leverage OS for disk/file management?
§ Layers of abstraction are good … but:
– Unfortunately, OS often gets in the way of DBMS
Leverage OS for disk/file management?
§ DBMS wants/needs to do things "its own way"
– Specialized prefetching
– Control over buffer replacement policy
• LRU not always best (sometimes worst!!)
– Control over thread/process scheduling
• "Convoy problem"
– Arises when OS scheduling conflicts with DBMS locking
– Control over flushing data to disk
• WAL protocol requires flushing log entries to disk
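One way to see why LRU can be the worst choice is a repeated sequential scan over a file that is slightly larger than the buffer pool. The sketch below is an illustration only: the page counts, the `count_misses` helper, and the comparison against an MRU policy are assumptions made for this example, not something the slides specify.

```python
from collections import OrderedDict

def count_misses(policy, num_frames, accesses):
    """Count buffer misses for an 'LRU' or 'MRU' replacement policy."""
    pool = OrderedDict()  # page id -> None, ordered from least to most recently used
    misses = 0
    for page in accesses:
        if page in pool:
            pool.move_to_end(page)          # mark as most recently used
            continue
        misses += 1
        if len(pool) >= num_frames:
            # LRU evicts the oldest entry, MRU evicts the newest one.
            pool.popitem(last=(policy == "MRU"))
        pool[page] = None
    return misses

# Scan a 5-page file 4 times with only 4 buffer frames (toy numbers).
scan = [p for _ in range(4) for p in range(5)]
lru_misses = count_misses("LRU", 4, scan)
mru_misses = count_misses("MRU", 4, scan)
print(lru_misses, mru_misses)
```

With these toy numbers LRU misses on every single access, because each page is evicted just before the scan comes back to it, while MRU keeps most of the file resident.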
Disks and Files
§ DBMS stores information on disks.
– but: disks are (relatively) VERY slow!
§ Major implications for DBMS design:
– READ: disk -> main memory (RAM).
– WRITE: reverse
– Both are high-cost operations, relative to in-memory operations, so must be planned carefully!
Why Not Store It All in Main Memory?
§ Costs too much.
– disk: ~$1/GB; memory: ~$100/GB
– High-end databases today in the 10-100 TB range.
– Approx 60% of the cost of a production system is in the disks.
§ Main memory is volatile.
§ Note: some specialized systems do store entire database in main memory.
The Storage Hierarchy
– Main memory (RAM) for currently used data.
– Disk for the main database (secondary storage).
– Tapes for archiving older versions of the data (tertiary storage).
[Figure: storage pyramid, smaller and faster at the top, bigger and slower at the bottom: Registers, L1 Cache, Main Memory, Magnetic Disk, Magnetic Tape, ...]
Jim Gray's Storage Latency Analogy: How Far Away is the Data?
Level            Relative access time   Distance        Travel time
Registers        1                      My Head         1 min
On Chip Cache    2                      This Room
On Board Cache   10                     This Building   10 min
Memory           100                    Boston          1.5 hr
Disk             10^6                   Pluto           2 Years
Tape             10^9                   Andromeda       2,000 Years
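The travel times in the analogy follow from simple scaling: if one register access is stretched to one minute, the other levels scale by their relative access times. The snippet below is only a rough check of that arithmetic; the 365-day year and the names `latency_ticks` and `as_years` are choices made for this illustration.

```python
MINUTES_PER_YEAR = 60 * 24 * 365

# Relative access times from the slide, in units of one register access.
latency_ticks = {"registers": 1, "memory": 100, "disk": 10**6, "tape": 10**9}

def as_years(ticks):
    """Convert ticks to years, pretending that one tick takes one minute."""
    return ticks / MINUTES_PER_YEAR

memory_hours = latency_ticks["memory"] / 60
disk_years = as_years(latency_ticks["disk"])
tape_years = as_years(latency_ticks["tape"])
print(round(memory_hours, 1), round(disk_years, 1), round(tape_years))
```

The results land close to the rounded figures on the slide: a bit over an hour and a half for memory, about two years for disk, and about two thousand years for tape.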
Disks
§ Secondary storage device of choice.
§ Main advantage over tapes: random access vs. sequential.
§ Data is stored and retrieved in units called disk blocks or pages.
§ Unlike RAM, time to retrieve a disk page varies depending upon location on disk.
– relative placement of pages on disk is important!
Anatomy of a Disk
• Sector
• Track
• Cylinder
• Platter
• Block size = multiple of sector size (which is fixed)
[Figure: disk anatomy, labeling the spindle, platters, tracks, a sector, the disk head, the arm assembly, and the arm movement]
Accessing a Disk Page
§ Time to access (read/write) a disk block:
– seek time: moving arms to position disk head on track
– rotational delay: waiting for block to rotate under head
– transfer time: actually moving data to/from disk surface
Accessing a Disk Page
§ Relative times?
– seek time: about 1 to 20 msec
– rotational delay: 0 to 10 msec
– transfer time: < 1 msec per 4 KB page
[Figure: access-time bar split into seek, rotate, and transfer]
Seek time & rotational delay dominate
§ Key to lower I/O cost: reduce seek/rotation delays!
§ Also note: For shared disks, much time spent waiting in queue for access to arm/controller
[Figure: access-time bar split into seek, rotate, and transfer]
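A back-of-the-envelope cost model makes the point concrete. The constants below are assumed mid-range values picked from the ranges on the slides, and the sequential case is idealized (one seek, one rotational delay, no track changes), so treat the ratio as an optimistic sketch rather than a measurement.

```python
# Assumed values, in milliseconds, chosen from the ranges quoted on the slides.
SEEK_MS = 10.0
ROTATE_MS = 5.0
TRANSFER_MS = 0.5   # per 4 KB page

def random_read_ms(num_pages):
    """Every page pays seek + rotational delay + transfer."""
    return num_pages * (SEEK_MS + ROTATE_MS + TRANSFER_MS)

def sequential_read_ms(num_pages):
    """One seek and one rotational delay, then the pages stream off the disk."""
    return SEEK_MS + ROTATE_MS + num_pages * TRANSFER_MS

pages = 1000
random_ms = random_read_ms(pages)
sequential_ms = sequential_read_ms(pages)
print(random_ms, sequential_ms, round(random_ms / sequential_ms, 1))
```

Under these assumptions almost all of the random-read time is seek and rotation, which is why placing related pages next to each other pays off.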
Arranging Pages on Disk
§ "Next" block concept:
– blocks on same track, followed by
– blocks on same cylinder, followed by
– blocks on adjacent cylinder
§ Accessing 'next' block is cheap
§ A useful optimization: pre-fetching
– See textbook page 323
Rules of thumb …
1. Memory access much faster than disk I/O (~1000x)
2. "Sequential" I/O faster than "random" I/O (~10x)
Conclusions - Storing
§ Memory hierarchy
§ Disks: (>1000x slower) - thus
– pack info in blocks
– try to fetch nearby blocks (sequentially)
TREE INDEXES
Declaring Indexes
§ No standard!
§ Typical syntax:
CREATE INDEX StudentsInd ON Students(ID);
CREATE INDEX CoursesInd ON Courses(Number, DeptName);
Types of Indexes
§ Primary: index on a key
– Used to enforce constraints
§ Secondary: index on non-key attribute
§ Clustering: order of the rows in the data pages corresponds to the order of the rows in the index
– Only one clustered index can exist in a given table
– Useful for range predicates
§ Non-clustering: physical order not the same as index order
Using Indexes (1): Equality Searches
§ Given a value v, the index takes us to only those tuples that have v in the attribute(s) of the index.
§ E.g. (use CoursesInd index)
SELECT Enrollment FROM Courses WHERE Number = '4604' AND DeptName = 'CS'
§ Can use hashes, but see next
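To watch an equality search pick up an index, the statements from the slides can be tried in SQLite through Python's `sqlite3` module. This is a small sketch under assumptions: the column types and the three rows are made up for the example, and other DBMSs report their plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Courses (Number TEXT, DeptName TEXT, Enrollment INTEGER)")
conn.executemany(
    "INSERT INTO Courses VALUES (?, ?, ?)",
    [("4604", "CS", 120), ("2114", "CS", 200), ("4604", "ECE", 35)],  # made-up rows
)
conn.execute("CREATE INDEX CoursesInd ON Courses(Number, DeptName)")

query = "SELECT Enrollment FROM Courses WHERE Number = ? AND DeptName = ?"
rows = conn.execute(query, ("4604", "CS")).fetchall()

# Each plan row has four columns; the last one is a human-readable description.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("4604", "CS")).fetchall()
plan_text = " ".join(row[3] for row in plan)
print(rows, plan_text)
```

The plan text should name `CoursesInd`, which indicates that SQLite searches the index instead of scanning the whole table.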
Using Indexes (2): Range Searches
§ "Find all students with gpa > 3.0"
§ may be slow, even on sorted file
§ Hashes not a good idea!
§ What to do?
[Figure: data file as a sequence of pages: Page 1, Page 2, Page 3, ..., Page N]
Range Searches
§ "Find all students with gpa > 3.0"
§ may be slow, even on sorted file
§ Solution: Create an 'index' file.
[Figure: index file with entries k1, k2, ..., kN above the data file pages Page 1, Page 2, Page 3, ..., Page N]
Range Searches
§ More details:
§ if index file is small, do binary search there
§ Otherwise??
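The single-level index file can be sketched in a few lines. The example assumes that each index entry k_i is the smallest key on page i and that the data file is sorted on gpa; the page contents and the helper name `range_search` are invented for illustration.

```python
from bisect import bisect_right

# Data file: sorted gpa values, packed 3 records per page (toy numbers).
pages = [[2.1, 2.4, 2.8], [2.9, 3.0, 3.1], [3.3, 3.5, 3.6], [3.7, 3.9, 4.0]]

# Index file: k_i is the smallest key on page i (an assumption of this sketch).
index_keys = [page[0] for page in pages]

def range_search(low):
    """Return all values > low, reading only the pages that can contain them."""
    # Binary search for the last page whose first key is <= low;
    # every earlier page holds only keys that are <= low.
    start = max(bisect_right(index_keys, low) - 1, 0)
    pages_read = 0
    result = []
    for page in pages[start:]:
        pages_read += 1
        result.extend(v for v in page if v > low)
    return result, pages_read

matches, pages_read = range_search(3.0)
print(matches, pages_read)
```

The binary search runs over the small index file only; when the index file itself no longer fits in a few pages, the same idea has to be applied again on top of it, which is where tree indexes come in.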
B-trees
§ the most successful family of index schemes (B-trees, B+-trees, B*-trees)
§ Can be used for primary/secondary, clustering/non-clustering index.
§ balanced "n-way" search trees
§ Original paper: Rudolf Bayer and McCreight, E. M. Organization and Maintenance of Large Ordered Indexes. Acta Informatica 1, 173-189, 1972.
B-trees
§ E.g., B-tree of order d=1:
[Figure: root node with keys 6 and 9; three children: (1, 3) for keys < 6, (7) for keys > 6 and < 9, (13) for keys > 9]
B-tree properties:
§ each node, in a B-tree of order d:
– Key order
– at most n=2d keys
– at least d keys (except root, which may have just 1 key)
– all leaves at the same level
– if number of pointers is k, then node has exactly k-1 keys
– (leaves are empty)
[Figure: node layout with keys v1, v2, …, vn-1 and pointers p1, …, pn]
Properties
§ "block aware" nodes: each node is a disk page
§ O(log(N)) for everything! (ins/del/search)
§ typically, if d = 50-100, then 2-3 levels
§ utilization >= 50%, guaranteed; on average 69%
Queries
§ Algo for exact match query? (e.g., ssn=8?)
[Figure: the order-1 B-tree (root 6, 9; leaves (1, 3), (7), (13)), repeated across several animation steps of the search]
§ H steps (= disk accesses)
Java animation
§ http://slady.net/java/bt/
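The exact-match algorithm can be sketched on the order-1 example tree (root keys 6 and 9). The `Node` class and the `search` function are illustrative names, and counting one visited node as one disk access assumes that no page is already cached.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                    # sorted keys stored in this node
        self.children = children or []      # empty list for a leaf

def search(node, key):
    """Return (found, nodes_visited) for an exact-match query."""
    visited = 0
    while node is not None:
        visited += 1                        # one node = one disk page
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1
        if i < len(node.keys) and node.keys[i] == key:
            return True, visited
        # Follow the pointer between keys[i-1] and keys[i], if there is one.
        node = node.children[i] if node.children else None
    return False, visited

# The order-1 B-tree from the slides: root (6, 9), leaves (1, 3), (7), (13).
root = Node([6, 9], [Node([1, 3]), Node([7]), Node([13])])
print(search(root, 8), search(root, 7), search(root, 9))
```

The search for 8 visits the root and the leaf (7) and then reports a miss; the number of nodes visited never exceeds the height of the tree.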
Queries
§ what about range queries? (e.g., 5 < salary < 8)
§ Proximity/nearest neighbor searches? (e.g., salary ~ 8)
[Figure: the same B-tree (root 6, 9; leaves (1, 3), (7), (13)), repeated across several animation steps]
Variations
§ How could we do even better than the B-trees above?
B+ trees - Motivation
§ B-tree – print keys in sorted order:
§ B-tree needs back-tracking – how to avoid it?
§ Stronger reason: for clustering index, data records are scattered:
[Figure: the same B-tree (root 6, 9; leaves (1, 3), (7), (13)), shown on each of the three slides]
Solution: B+-trees
§ facilitate sequential ops
§ They string all leaf nodes together
§ AND
§ replicate keys from non-leaf nodes, to make sure every key appears at the leaf level
§ (vital, for clustering index!)
B+ trees
[Figure: B+ tree with root keys 6 and 9 (index pages) and linked leaves (1, 3), (6, 7), (9, 13) (data pages); the pointers cover keys < 6, keys >= 6 and < 9, and keys >= 9]
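Because every key appears at the leaf level and the leaves are strung together, a range query needs one descent followed by a walk along the leaf chain, with no back-tracking. The sketch below hard-codes the small B+ tree from the slide; the class and function names are illustrative.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None        # link to the right sibling leaf

class Index:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children

def find_leaf(node, key):
    """Descend from the root to the leaf that would hold key."""
    while isinstance(node, Index):
        i = 0
        while i < len(node.keys) and key >= node.keys[i]:
            i += 1              # keys >= separator belong to the right
        node = node.children[i]
    return node

def range_scan(root, low, high):
    """Return all keys k with low <= k <= high, following the leaf links."""
    leaf = find_leaf(root, low)
    out = []
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return out
            if k >= low:
                out.append(k)
        leaf = leaf.next
    return out

# The B+ tree from the slide: root (6, 9), leaves (1, 3), (6, 7), (9, 13).
l1, l2, l3 = Leaf([1, 3]), Leaf([6, 7]), Leaf([9, 13])
l1.next, l2.next = l2, l3
root = Index([6, 9], [l1, l2, l3])
print(range_scan(root, 3, 9))
```

In a plain B-tree the same query would have to move up and down between levels, since some of the qualifying keys live in non-leaf nodes.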
B+ trees
§ More details: next (and textbook)
§ In short: on split
– at leaf level: COPY middle key upstairs
– at non-leaf level: push middle key upstairs (as in plain B-tree)
Example B+ Tree
§ Search begins at root, and key comparisons direct it to a leaf
§ Search for 5*, 15*, all data entries >= 24* ...
§ Based on the search for 15*, we know it is not in the tree!
[Figure: root with keys 13, 17, 24, 30; leaves (2*, 3*, 5*, 7*), (14*, 16*), (19*, 20*, 22*), (24*, 27*, 29*), (33*, 34*, 38*, 39*)]
Inserting a Data Entry into a B+ Tree
§ Find correct leaf L.
§ Put data entry onto L.
– If L has enough space, done!
– Else, must split L (into L and a new node L2)
• Redistribute entries evenly, copy up middle key.
§ parent node may overflow
– but then: push up middle key. Splits "grow" tree; root split increases height.
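The insertion steps can be sketched as a short recursive routine. This is a minimal illustration, assuming order d = 2 (at most 4 keys per node), keys only (no record pointers), and no leaf sibling links; `Node`, `insert`, `insert_root`, and `leaves` are names invented for the sketch.

```python
import bisect

D = 2  # order: every node holds at most 2*D keys

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children    # None for a leaf

def insert(node, key):
    """Insert key below node; return (separator, new_right_node) on split, else None."""
    if node.children is None:                       # leaf
        bisect.insort(node.keys, key)
        if len(node.keys) <= 2 * D:
            return None
        right = Node(node.keys[D:])                 # right leaf keeps the middle key
        node.keys = node.keys[:D]
        return right.keys[0], right                 # COPY middle key up
    i = bisect.bisect_right(node.keys, key)         # child that covers key
    split = insert(node.children[i], key)
    if split is None:
        return None
    sep, right_child = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, right_child)
    if len(node.keys) <= 2 * D:
        return None
    mid = node.keys[D]                              # PUSH middle key up
    right = Node(node.keys[D + 1:], node.children[D + 1:])
    node.keys, node.children = node.keys[:D], node.children[:D + 1]
    return mid, right

def insert_root(root, key):
    """Insert into the tree and return the (possibly new) root."""
    split = insert(root, key)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])               # root split: height grows

def leaves(node):
    """Collect leaf contents left to right."""
    if node.children is None:
        return [node.keys]
    return [leaf for child in node.children for leaf in leaves(child)]

# Starting tree from the slides, then insert 8* and 21*.
root = Node([13, 17, 24], [Node([2, 3, 5, 7]), Node([14, 16]),
                           Node([19, 20, 22, 23]), Node([24, 27, 29])])
for k in (8, 21):
    root = insert_root(root, k)
print(root.keys, [c.keys for c in root.children], leaves(root))
```

Inserting 8* splits a leaf and copies 5 up; inserting 21* splits another leaf, overflows the root, and pushes 17 up into a new root, so the tree gains a level.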
Example B+ Tree – Inserting 30*
[Figure, before: root with keys 13, 17, 24; leaves (2*, 3*, 5*, 7*), (14*, 16*), (19*, 20*, 22*, 23*), (24*, 27*, 29*)]
[Figure, after: 30* is added to the last leaf, which becomes (24*, 27*, 29*, 30*)]
Example B+ Tree - Inserting 8*
[Figure, before: root with keys 13, 17, 24; leaves (2*, 3*, 5*, 7*), (14*, 16*), (19*, 20*, 22*, 23*), (24*, 27*, 29*)]
§ No space
§ So split!
§ And then push middle UP
[Figure, final state: root with keys 5, 13, 17, 24; leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*), (19*, 20*, 22*, 23*), (24*, 27*, 29*); the pointers around key 5 cover keys < 5 and keys >= 5]
Example B+ Tree - Inserting 21*
[Figure, before: root with keys 5, 13, 17, 24; leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*), (19*, 20*, 22*, 23*), (24*, 27*, 29*)]
[Figure, after the leaf split: leaves (19*, 20*) and (21*, 22*, 23*); the index entries become 5, 13, 17, 21, 24]
§ Root is full, so split recursively
Example B+ Tree: Recursive split
• Notice that root was also split, increasing height.
[Figure: new root with key 17; index nodes (5, 13) and (21, 24); leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*), (19*, 20*), (21*, 22*, 23*), (24*, 27*, 29*)]
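Example: Data vs. Index Page Split
§ leaf: 'copy'
§ non-leaf: 'push'
§ why not 'copy' @ non-leaves?
[Figure, data page split: leaf (2*, 3*, 5*, 7*) plus 8* becomes (2*, 3*) and (5*, 7*, 8*); 5 is copied up]
[Figure, index page split: index node (5, 13, 17, 21, 24) becomes (5, 13) and (21, 24); 17 is pushed up]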
Same Inserting 21*: The Deferred Split
[Figure, before: root with keys 5, 13, 17, 24; leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*), (19*, 20*, 22*, 23*), (24*, 27*, 29*)]
§ Note this has free space. So…
§ LEND keys to sibling, through PARENT!
[Figure, after: root with keys 5, 13, 17, 23; leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*), (19*, 20*, 21*, 22*), (23*, 24*, 27*, 29*)]
§ Shorter, more packed, faster tree
Insertion examples for you to try
[Figure: root with key 30 (the rest of the root is not shown); index node with keys 5, 13, 20; leaves (2*, 3*), (5*, 7*, 8*, 11*), (14*, 16*), (21*, 22*, 23*)]
Insert the following data entries (in order): 28*, 6*, 25*
Answer…
After inserting 28*, 6*
[Figure: root with key 30, …; index node with keys 5, 7, 13, 20; leaves (2*, 3*), (5*, 6*), (7*, 8*, 11*), (14*, 16*), (21*, 22*, 23*, 28*)]
After inserting 25*
[Figure: root with keys 13, 30, …; index nodes (5, 7) and (20, 23); leaves (2*, 3*), (5*, 6*), (7*, 8*, 11*), (14*, 16*), (21*, 22*), (23*, 25*, 28*)]
Deleting a Data Entry from a B+ Tree
§ Start at root, find leaf L where entry belongs.
§ Remove the entry.
– If L is at least half-full, done!
– If L underflows
• Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).
• If re-distribution fails, merge L and sibling.
– update parent
– and possibly merge, recursively
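The leaf-level part of deletion can be sketched on a single index node and its leaves. The sketch is deliberately incomplete: it does not handle the recursive merge of non-leaf nodes, it prefers the right sibling when both exist, and it re-distributes evenly; all three are choices made for this illustration. The data is the index node (24, 30) from the "Delete 19* & 20*" example in these slides.

```python
MIN_KEYS = 2  # "half full" for leaves that hold up to 4 entries

def delete(seps, leaves, key):
    """Delete key below one index node; seps[i] separates leaves[i] and leaves[i+1]."""
    i = 0
    while i < len(seps) and key >= seps[i]:
        i += 1
    leaf = leaves[i]
    leaf.remove(key)
    if len(leaf) >= MIN_KEYS or len(leaves) == 1:
        return "done"
    # Pick a sibling: the right one if it exists, else the left one.
    if i + 1 < len(leaves):
        left, right, s = i, i + 1, i
    else:
        left, right, s = i - 1, i, i - 1
    if len(leaves[left]) + len(leaves[right]) >= 2 * MIN_KEYS:
        # Re-distribute evenly, then copy the right leaf's first key up.
        merged = leaves[left] + leaves[right]
        half = len(merged) // 2
        leaves[left], leaves[right] = merged[:half], merged[half:]
        seps[s] = leaves[right][0]
        return "redistributed"
    # Merge the two leaves and drop their separator from the parent.
    leaves[left] = leaves[left] + leaves[right]
    del leaves[right]
    del seps[s]
    return "merged"

seps = [24, 30]
leaves = [[19, 20, 22], [24, 27, 29], [33, 34, 38, 39]]
steps = [delete(seps, leaves, k) for k in (19, 20, 24)]
print(steps, seps, leaves)
```

After the three deletions the index node is left with a single key, which is the underflow that forces the non-leaf merge in the full algorithm.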
Deletion from B+ Tree
[Figure: root with key 17; index nodes (5, 13) and (21, 24); leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*), (19*, 20*), (21*, 22*, 23*), (24*, 27*, 29*)]
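Example: Delete 19* & 20*
[Figure, step 1: root with key 17; index nodes (5, 13) and (24, 30); leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*), (19*, 20*, 22*), (24*, 27*, 29*), (33*, 34*, 38*, 39*)]
• Deleting 19* is easy: the leaf becomes (20*, 22*) (step 2)
• Deleting 20* -> re-distribution (notice: 27 copied up)
[Figure, step 3: root with key 17; index nodes (5, 13) and (27, 30); leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*), (22*, 24*), (27*, 29*), (33*, 34*, 38*, 39*)]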
... And Then Deleting 24*
[Figure, step 3: root with key 17; index nodes (5, 13) and (27, 30); leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*), (22*, 24*), (27*, 29*), (33*, 34*, 38*, 39*)]
• Must merge leaves: OPPOSITE of insert
[Figure, step 4: root with key 17; index nodes (5, 13) and (30); leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*), (22*, 27*, 29*), (33*, 34*, 38*, 39*)]
… but are we done??
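... Merge Non-Leaf Nodes, Shrink Tree
[Figure, step 4: root with key 17; index nodes (5, 13) and (30); leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*), (22*, 27*, 29*), (33*, 34*, 38*, 39*)]
[Figure, step 5: root with keys 5, 13, 17, 30; leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*), (22*, 27*, 29*), (33*, 34*, 38*, 39*)]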
Example of Non-leaf Re-distribution
§ Tree is shown below during deletion of 24*.
§ Now, we can re-distribute keys
[Figure: root with key 22; index nodes (5, 13, 17, 20) and (30); leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*), (17*, 18*), (20*, 21*), (22*, 27*, 29*), (33*, 34*, 38*, 39*)]
After Re-distribution
§ need only re-distribute '20'; did '17', too
§ why would we want to re-distribute more keys?
[Figure: root with key 17; index nodes (5, 13) and (20, 22, 30); leaves (2*, 3*), (5*, 7*, 8*), (14*, 16*), (17*, 18*), (20*, 21*), (22*, 27*, 29*), (33*, 34*, 38*, 39*)]
Main observations for deletion
§ If a key value appears twice (leaf + nonleaf), the above algorithms delete it from the leaf, only
§ why not non-leaf, too?
§ 'lazy deletions' - in fact, some vendors just mark entries as deleted (~ underflow),
– and reorganize/compact later
Recap: main ideas
§ on overflow, split (and 'push', or 'copy')
– or consider deferred split
§ on underflow, borrow keys; or merge
– or let it underflow...
B+ Trees in Practice
§ Typical order: 100. Typical fill-factor: 67%.
– average fanout = 2*100*0.67 = 134
§ Typical capacities:
– Height 4: 134^4 = 322,417,936 entries
– Height 3: 134^3 = 2,406,104 entries
B+ Trees in Practice
§ Can often keep top levels in buffer pool:
– Level 1 = 1 page = 8 KB
– Level 2 = 134 pages = 1 MB
– Level 3 = 17,956 pages = 140 MB
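The capacity and buffer-pool figures are plain powers of the fan-out, so they are easy to recompute. The snippet assumes 8 KB pages, as on the slide; the helper names are made up. With a fan-out of 134 the fourth power comes out at 322,417,936, while a fan-out of 133 would give 312,900,721.

```python
ORDER = 100          # d: a node holds up to 2*d entries
FILL_FACTOR = 0.67
PAGE_KB = 8

fanout = round(2 * ORDER * FILL_FACTOR)

def capacity(height):
    """Entries reachable through `height` levels of fan-out."""
    return fanout ** height

def pages_at_level(level):
    """Level 1 is the root page; each level multiplies the page count by the fan-out."""
    return fanout ** (level - 1)

for level in (1, 2, 3):
    pages = pages_at_level(level)
    print(level, pages, round(pages * PAGE_KB / 1024, 1), "MB")
print(fanout, capacity(3), capacity(4))
```

The top three levels together take roughly 141 MB, which is why they can often stay in the buffer pool while only the last level costs a disk access.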
Bulk Loading of a B+ Tree
§ In an empty tree, insert many keys
§ Why not one-at-a-time?
– Too slow!
Bulk Loading of a B+ Tree
§ Initialization: Sort all data entries
§ scan list; whenever enough for a page, pack
§ <repeat for upper level>
[Figure: sorted pages of data entries, not yet in B+ tree: (3*, 4*), (6*, 9*), (10*, 11*), (12*, 13*), (20*, 22*), (23*, 31*), (35*, 36*), (38*, 41*), (44*); the root is still empty]
Bulk Loading of a B+ Tree
[Figure, partway: root with keys 10, 20; index nodes (6), (12), (23, 35); the remaining data entry pages are not yet in the B+ tree]
[Figure, later: root with key 20; index nodes (10) and (35) below it; index nodes (6), (12), (23), (38) at the next level; the remaining data entry pages are not yet in the B+ tree]
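The three bulk-loading steps (sort, pack pages, repeat for the upper level) can be sketched bottom-up. This is a simplified illustration: it packs every page completely full, so the resulting tree differs from the one drawn on the slides, and it does not rebalance a short last node. The function name and the tuple layout are invented for the sketch.

```python
def bulk_load(sorted_keys, page_size):
    """Build a B+ tree bottom-up; return the levels from leaves to root.

    Each level is a list of nodes; a node is (keys, smallest_key_in_subtree).
    """
    # Pack leaves: scan the sorted list and cut a page whenever it is full.
    level = [(sorted_keys[i:i + page_size], sorted_keys[i])
             for i in range(0, len(sorted_keys), page_size)]
    levels = [level]
    fanout = page_size + 1          # an index node has one more child than keys
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            # Separator for each child except the first: that child's smallest key.
            seps = [child_min for _, child_min in group[1:]]
            parents.append((seps, group[0][1]))
        level = parents
        levels.append(level)
    return levels

entries = [3, 4, 6, 9, 10, 11, 12, 13, 20, 22, 23, 31, 35, 36, 38, 41, 44]
levels = bulk_load(sorted(entries), page_size=2)
for lvl in reversed(levels):
    print([keys for keys, _ in lvl])
```

Every page is written once and in order, which avoids the repeated root-to-leaf descents and splits of one-at-a-time insertion.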
A Note on 'Order'
§ Order (d) concept replaced by physical space criterion in practice ('at least half-full').
§ Many real systems are even sloppier than this: they allow underflow, and only reclaim space when a page is completely empty.
§ (what are the benefits of such 'sloppiness'?)
Conclusions
§ B+ tree is the prevailing indexing method
§ Excellent, O(log N) worst-case performance for ins/del/search; (~3-4 disk accesses in practice)
§ guaranteed 50% space utilization; avg 69%
Conclusions
§ Can be used for any type of index: primary/secondary, sparse (clustering), or dense (non-clustering)
§ Several fine extensions on the basic algorithm
– deferred split;
– bulk-loading
HASHING
(Static) Hashing
§ Problem: "find EMP record with ssn=123"
§ What if disk space was free, and time was at premium?
Hashing
§ A: Brilliant idea: key-to-address transformation:
[Figure: one page per possible key, #0 through #999,999,999; page #123 holds the record "123; Smith; Main str"]
Hashing
§ Since space is NOT free:
§ use M, instead of 999,999,999 slots
§ hash function: h(key) = slot-id
Hashing
§ Typically: each hash bucket is a page, holding many records:
[Figure: buckets #0 through M; bucket #h(123) holds the record "123; Smith; Main str"]
Hashing
§ Notice: could have clustering, or non-clustering versions:
[Figure, first version: bucket #h(123) holds the full record "123; Smith; Main str."]
[Figure, second version: bucket #h(123) holds the key 123 and points into the EMP file, which stores the records "123; Smith; Main str.", "234; Johnson; Forbes ave", "345; Tompson; Fifth ave"]
Design decisions
§ 1) formula h() for hashing function
§ 2) size of hash table M
§ 3) collision resolution method
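The three design decisions show up directly in even the smallest static hash file. The sketch below picks one option for each, all of them assumptions made for illustration: `h(key) = key mod M`, a tiny M, and overflow pages chained to the bucket as the collision resolution method. The last two records are made up to force a collision.

```python
M = 4                 # 2) size of the hash table (number of buckets)
BUCKET_CAPACITY = 2   # records per page (toy value)

def h(key):
    """1) hash function: key -> slot id. Modulo is an assumed, simple choice."""
    return key % M

# 3) collision resolution: each bucket is a chain of pages (primary + overflow).
buckets = [[[]] for _ in range(M)]

def insert(record):
    chain = buckets[h(record[0])]
    if len(chain[-1]) == BUCKET_CAPACITY:
        chain.append([])            # last page is full: add an overflow page
    chain[-1].append(record)

def lookup(key):
    """Return (record, pages_read); record is None if the key is absent."""
    chain = buckets[h(key)]
    for pages_read, page in enumerate(chain, start=1):
        for record in page:
            if record[0] == key:
                return record, pages_read
    return None, len(chain)

for rec in [(123, "Smith"), (234, "Johnson"), (345, "Tompson"),
            (127, "Lee"), (131, "Kim")]:   # last two names are invented
    insert(rec)
print(lookup(123), lookup(131), lookup(999))
```

Keys 123, 127, and 131 all hash to the same bucket, so the third one lands on an overflow page and costs a second page read.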
Problem with static hashing
§ problem: overflow?
§ problem: underflow? (underutilization)
Solution: Dynamic/extendible hashing
§ idea: shrink/expand hash table on demand..
§ ..dynamic hashing
§ Details: how to grow gracefully, on overflow?
§ Many solutions - One of them: 'extendible hashing' [Fagin et al]
Extendible hashing
[Figure: buckets #0 through M; bucket #h(123) holds the record "123; Smith; Main str."]
solution: split the bucket in two
Extendible hashing
in detail:
§ keep a directory, with ptrs to hash-buckets
§ Q: how to divide contents of bucket in two?
§ A: hash each key into a very long bit string; keep only as many bits as needed
Eventually:
Extendible hashing
[Figure: directory with entries 00..., 01..., 10..., 11... pointing to buckets that hold the hashed keys 0001..., 0111..., 10011..., 10101..., 10110..., 1101...; the key 101001... is being inserted]
split on 3-rd bit
[Figure: the keys 101001..., 10101..., 10110... go to a new page/bucket, apart from 10011...]
[Figure: directory (doubled), with entries 000..., 001..., 010..., 011..., 100..., 101..., 110..., 111...]
[Figure: BEFORE (4-entry directory) and AFTER (8-entry directory) side by side]
Extendible hashing
§ Summary: directory doubles on demand
§ or halves, on shrinking files
§ needs 'local' and 'global' depth
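A compact sketch shows how the directory, the global depth, and the per-bucket local depth interact on insertion. It is written under several assumptions: hashed keys are 6-bit integers standing in for the "very long bit string", the directory is indexed by the leading bits as on the slides, buckets hold 3 keys, and shrinking is not implemented. It also assumes distinct keys; more than 3 identical hash values would exhaust the available bits.

```python
BITS = 6              # length of the hashed bit string (toy value)
CAPACITY = 3          # keys per bucket (toy value)

class Bucket:
    def __init__(self, depth):
        self.depth = depth          # local depth: bits this bucket really uses
        self.keys = []

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 1
        self.dir = [Bucket(1), Bucket(1)]

    def _slot(self, key):
        return key >> (BITS - self.global_depth)   # leading global_depth bits

    def lookup(self, key):
        return key in self.dir[self._slot(key)].keys

    def insert(self, key):
        bucket = self.dir[self._slot(key)]
        if len(bucket.keys) < CAPACITY:
            bucket.keys.append(key)
            return
        if bucket.depth == self.global_depth:
            # Double the directory: each old entry becomes two adjacent entries.
            self.dir = [b for b in self.dir for _ in (0, 1)]
            self.global_depth += 1
        # Split the full bucket on its next bit.
        bucket.depth += 1
        new = Bucket(bucket.depth)
        old_keys, bucket.keys = bucket.keys, []
        for slot, b in enumerate(self.dir):
            # Directory entries whose next bit is 1 now point to the new bucket.
            if b is bucket and (slot >> (self.global_depth - bucket.depth)) & 1:
                self.dir[slot] = new
        for k in old_keys + [key]:
            self.insert(k)

# Hashed keys shaped like the ones on the slides, padded to 6 bits.
table = ExtendibleHash()
for k in (0b000100, 0b011100, 0b100110, 0b101010, 0b101100, 0b110100, 0b101001):
    table.insert(k)
print(table.global_depth, [sorted(b.keys) for b in table.dir])
```

Inserting 101001 overflows the bucket for prefix 10, so the directory doubles to 8 entries and the bucket is split on the third bit; buckets that did not overflow keep a smaller local depth and are shared by several directory entries.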