CS 4604: Introduction to Database Management Systems. B. Aditya Prakash. Lecture #10: Query Processing


Post on 31-Jan-2021


  • CS 4604: Introduction to Database Management Systems

    B. Aditya Prakash
    Lecture #10: Query Processing

  • Prakash 2016, VT CS 4604, slide 2

    Outline

    §  introduction
    §  selection
    §  projection
    §  join
    §  set & aggregate operations

  • Slide 3

    Introduction

    §  Today’s topic: QUERY PROCESSING
    §  Some database operations are EXPENSIVE
    §  Can greatly improve performance by being “smart”
        –  e.g., can speed up 1,000,000x over naïve approach

  • Slide 4

    Introduction (cont’d)

    §  Main weapons are:
        –  clever implementation techniques for operators
        –  exploiting “equivalencies” of relational operators
        –  using statistics and cost models to choose among these.

  • Slide 5

    A Really Bad Query Optimizer

    §  For each Select-From-Where query block
        –  do cartesian products first
        –  then do selections
        –  etc., i.e.:
            •  GROUP BY; HAVING
            •  projections
            •  ORDER BY
    §  Incredibly inefficient
        –  Huge intermediate results!

    [Figure: plan tree with σ(predicates) applied on top of the cross product (×) of the tables.]

  • Slide 6

    Cost-based Query Sub-System

    [Figure: Queries (e.g., Select * From Blah B Where B.blah = blah) pass through the Query Parser, the Query Optimizer (Plan Generator, Plan Cost Estimator), and the Query Plan Evaluator; the Catalog Manager holds Schema and Statistics.]

    Usually there is a heuristics-based rewriting step before the cost-based steps.

  • Slide 7

    The Query Optimization Game

    §  “Optimizer” is a bit of a misnomer…
    §  Goal is to pick a “good” (i.e., low expected cost) plan.
        –  Involves choosing access methods, physical operators, operator orders, …
        –  Notion of cost is based on an abstract “cost model”

  • Slide 8

    Relational Operations

    §  We will consider how to implement:
        –  Selection (σ): selects a subset of rows from relation.
        –  Projection (π): deletes unwanted columns from relation.
        –  Join (⋈): allows us to combine two relations.
        –  Set-difference (−): tuples in reln. 1, but not in reln. 2.
        –  Union (∪): tuples in reln. 1 or in reln. 2.
        –  Aggregation (SUM, MIN, etc.) and GROUP BY
    §  Recall: ops can be composed!
    §  Later (after spring break), we’ll see how to optimize queries with many ops

  • Slide 9

    Schema for Examples

    Sailors(sid: integer, sname: string, rating: integer, age: real)
    Reserves(sid: integer, bid: integer, day: dates, rname: string)

    §  Similar to old schema; rname added for variations.
    §  Sailors:
        –  Each tuple is 50 bytes long, 80 tuples per page, 500 pages.
        –  N = 500, pS = 80.
    §  Reserves:
        –  Each tuple is 40 bytes long, 100 tuples per page, 1000 pages.
        –  M = 1000, pR = 100.

  • Slide 10

    Simple Selections

    §  Of the form $\sigma_{R.attr \; op \; value}(R)$
    §  Question: how best to perform?

    SELECT * FROM Reserves R WHERE R.rname < 'C%'

  • Slide 11

    Simple Selections

    §  A: Depends on:
        –  what indexes/access paths are available
        –  what is the expected size of the result (in terms of number of tuples and/or number of pages)

  • Slide 12

    Simple Selections

    §  Size of result approximated as: size of R * reduction factor
        –  “reduction factor” is also called selectivity.
        –  estimate of reduction factors is based on statistics; we will discuss shortly.

  • Slide 13

    Alternatives for Simple Selections

    §  With no index, unsorted:
        –  Must essentially scan the whole relation
        –  cost is M (#pages in R). For “reserves” = 1000 I/Os.

  • Slide 14

    Simple Selections (cont’d)

    §  With no index, sorted:
        –  cost of binary search + number of pages containing results.
        –  For reserves = 10 I/Os + ⌈selectivity * #pages⌉
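
The two no-index estimates above can be checked with a few lines of Python. This is a minimal sketch, not part of the lecture: the function names are made up for illustration, and the binary search is assumed to cost ⌈log2(#pages)⌉ page reads, which matches the 10 I/Os quoted for Reserves.

```python
from math import ceil, log2

RESERVES_PAGES = 1000  # M, from the example schema


def unsorted_scan_cost(pages):
    """No index, unsorted file: every page must be read."""
    return pages


def sorted_file_cost(pages, selectivity):
    """No index, sorted file: binary search plus the pages holding results.

    Assumption: the binary search costs ceil(log2(pages)) I/Os.
    """
    return ceil(log2(pages)) + ceil(selectivity * pages)


print(unsorted_scan_cost(RESERVES_PAGES))     # 1000
print(sorted_file_cost(RESERVES_PAGES, 0.5))  # 10 + 500 = 510
```

The gap between the two grows as the selectivity shrinks, which is why the expected result size matters when choosing an access path.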

  • Slide 15

    Simple Selections (cont’d)

    §  With an index on selection attribute:
        –  Use index to find qualifying data entries,
        –  then retrieve corresponding data records.
        –  (Hash index useful only for equality selections.)

  • Slide 16

    Using an Index for Selections

    §  Cost depends on #qualifying tuples, and clustering.
        –  Cost:
            •  finding qualifying data entries (typically small)
            •  plus cost of retrieving records (could be large w/o clustering).

  • Slide 17

    Selections using Index (cont’d)

    [Figure: two panels, CLUSTERED and UNCLUSTERED. In each, index entries (index file) direct the search for data entries, which point to data records (data file).]

  • Slide 18

    Selections using Index (cont’d)

        –  In example “reserves” relation, if 10% of tuples qualify (100 pages, 10,000 tuples).
            •  With a clustered index, cost is little more than 100 I/Os;
            •  if unclustered, could be up to 10,000 I/Os! unless…

  • Slide 19

    Selections using Index (cont’d)

    §  Important refinement for unclustered indexes:
        1. Find qualifying data entries.
        2. Sort the rid’s of the data records to be retrieved.
        3. Fetch rids in order. This ensures that each data page is looked at just once (though # of such pages likely to be higher than with clustering).
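
The effect of sorting rids can be illustrated with a small simulation. This is a sketch under stated assumptions rather than a real storage engine: rids are modeled as hypothetical (page_id, slot) pairs, only one buffer page is available, and the 10,000 qualifying tuples are scattered at random over the 1000 pages of Reserves.

```python
import random


def page_fetches(rids):
    """Count page reads when rids are fetched in the given order with a
    single buffer page: a page is re-read whenever it differs from the
    page of the previous rid."""
    fetches, current = 0, None
    for page_id, _slot in rids:
        if page_id != current:
            fetches += 1
            current = page_id
    return fetches


random.seed(0)  # fixed seed so the sketch is repeatable
# Hypothetical unclustered layout: 1000 pages with 100 slots each.
all_rids = [(p, s) for p in range(1000) for s in range(100)]
qualifying = random.sample(all_rids, 10_000)  # 10% of the tuples

unsorted_cost = page_fetches(qualifying)       # index order, pages revisited
sorted_cost = page_fetches(sorted(qualifying)) # each page read at most once
print(unsorted_cost, sorted_cost)
```

With sorted rids the count can never exceed the number of pages in the file, whereas the unsorted count approaches one I/O per qualifying tuple.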

  • Slide 21

    General Selection Conditions

    §  Q: What would you do?
    §  A: try to find a selective (clustering) index. Specifically:

    (day

  • Slide 22

    General Selection Conditions

    §  Convert to conjunctive normal form (CNF):
        –  (day

  • Slide 23

    General Selection Conditions

    §  A B-tree index matches (a conjunction of) terms that involve only attributes in a prefix of the search key.
        –  Index on matches a=5 AND b=3, but not b=3.
    §  For Hash index, must have all attributes in search key

    (day

  • Slide 24

    Two Approaches to General Selections

    §  First approach: Find the cheapest access path, retrieve tuples using it, and apply any remaining terms that don’t match the index
    §  Second approach: get rids from first index; rids from second index; intersect and fetch.

    SKIP

  • Slide 25

    Two Approaches to General Selections

    §  First approach: Find the cheapest access path, retrieve tuples using it, and apply any remaining terms that don’t match the index:
        –  Cheapest access path: An index or file scan with fewest I/Os.
        –  Terms that match this index reduce the number of tuples retrieved; other terms help discard some retrieved tuples, but do not affect number of tuples/pages fetched.

    SKIP

  • Slide 26

    Cheapest Access Path - Example

    §  Consider day < 8/9/94 AND bid=5 AND sid=3.
    §  A B+ tree index on day can be used;
        –  then, bid=5 and sid=3 must be checked for each retrieved tuple.
    §  Similarly, a hash index on could be used;
        –  Then, day

  • Slide 27

    Cheapest Access Path - cont’d

    §  Consider day < 8/9/94 AND bid=5 AND sid=3.
    §  How about a B+ tree on ?
    §  How about a B+ tree on ?
    §  How about a Hash index on ?

    SKIP

  • Slide 28

    Intersection of RIDs

    §  Second approach: if we have 2 or more matching indexes (w/ Alternatives (2) or (3) for data entries):
        –  Get sets of rids of data records using each matching index.
        –  Then intersect these sets of rids.
        –  Retrieve the records and apply any remaining terms.

    SKIP
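
The rid-intersection step can be sketched in a few lines. The rids below are hypothetical (page_id, slot) pairs standing in for what two matching indexes might return; the function name is made up for illustration.

```python
def intersect_rids(*rid_sets):
    """Intersect the rid sets returned by each matching index and return
    them in page order, so each data page is visited once."""
    result = set(rid_sets[0])
    for rids in rid_sets[1:]:
        result &= set(rids)
    return sorted(result)


# Hypothetical rids from two indexes on different attributes.
rids_from_first_index = [(1, 0), (1, 3), (2, 5), (7, 1)]
rids_from_second_index = [(2, 5), (1, 3), (9, 9)]

matches = intersect_rids(rids_from_first_index, rids_from_second_index)
print(matches)  # [(1, 3), (2, 5)]
```

Only the records behind the surviving rids are fetched; any terms that matched neither index are then applied to those records.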

  • Slide 29

    Intersection of RIDs (cont’d)

    §  EXAMPLE: Consider day

  • Slide 30

    The Projection Operation

    SELECT DISTINCT R.sid, R.bid FROM Reserves R

    §  Issue is removing duplicates.
    §  Basic approach: sorting
        –  1. Scan R, extract only the needed attrs (why?)
        –  2. Sort the resulting set
        –  3. Remove adjacent duplicates
    Cost: Reserves with size ratio 0.25 = 250 pages. With 20 buffer pages can sort in 2 passes, so 1000 + 250 + 2*2*250 + 250 = 2500 I/Os
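
The three steps and the 2500 I/O figure can be reproduced with a short sketch. It is an in-memory illustration only: the function names are invented here, rows are plain tuples, and Python's built-in sort stands in for the external sort.

```python
def project_distinct(rows, columns):
    """Sort-based projection: extract columns, sort, drop adjacent duplicates."""
    projected = sorted(tuple(row[c] for c in columns) for row in rows)
    result = []
    for t in projected:
        if not result or result[-1] != t:  # duplicates are adjacent after sorting
            result.append(t)
    return result


def sort_projection_cost(pages, size_ratio, passes):
    """I/O cost of the basic approach, term by term as on the slide."""
    projected = int(pages * size_ratio)
    scan = pages + projected       # read R, write the projected tuples
    sort = 2 * passes * projected  # each pass reads and writes every page
    dedup = projected              # final scan removing adjacent duplicates
    return scan + sort + dedup


# (sid, bid, day) rows; day is dropped by the projection.
rows = [(28, 103, "12/4/96"), (28, 103, "11/3/96"), (31, 101, "10/10/96")]
print(project_distinct(rows, columns=(0, 1)))  # [(28, 103), (31, 101)]
print(sort_projection_cost(1000, 0.25, 2))     # 2500
```

Extracting the needed attributes first is what shrinks the sort input from 1000 pages to 250.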

  • Slide 31

    Projection

    §  Can improve by modifying external sort algorithm (see chapter 13):
        –  Modify Pass 0 of external sort to eliminate unwanted fields.
        –  Modify merging passes to eliminate duplicates.
    Cost: for above case: read 1000 pages, write out 250 in runs of 40 pages, merge runs = 1000 + 250 + 250 = 1500.

    SKIP

  • Slide 32

    Discussion of Projection

    §  If an index on the relation contains all wanted attributes in its search key, can do index-only scan.
        –  Apply projection techniques to data entries (much smaller!)

  • Slide 33

    Discussion of Projection

    §  If an ordered (i.e., tree) index contains all wanted attributes as prefix of search key, can do even better:
        –  Retrieve data entries in order (index-only scan), discard unwanted fields, compare adjacent tuples to check for duplicates.

    A B-tree index matches (a conjunction of) terms that involve only attributes in a prefix of the search key.
        –  Index on matches a=5 AND b=3, but not b=3.
    For Hash index, must have all attributes in search key

  • Slide 34

    Joins

    §  Joins are very common.
    §  Joins can be very expensive (cross product in worst case).
    §  Many approaches to reduce join cost.

  • Slide 35

    Joins

    §  Join techniques we will cover:
        –  Nested-loops join
        –  Index-nested loops join
        –  Sort-merge join
        –  Hash join

  • Slide 36

    Equality Joins With One Join Column

    SELECT * FROM Reserves R1, Sailors S1 WHERE R1.sid = S1.sid

    §  In algebra: R ⋈ S. Common! Must be carefully optimized. R × S is large; so, R × S followed by a selection is inefficient.
    §  Remember, join is associative and commutative.

  • Slide 37

    Equality Joins

    §  Assume:
        –  M pages in R, pR tuples per page, m tuples total
        –  N pages in S, pS tuples per page, n tuples total
        –  In our examples, R is Reserves and S is Sailors.
    §  We will consider more complex join conditions later.
    §  Cost metric: # of I/Os. We will ignore output costs.

  • Slide 40

    Nested loops

    §  Algorithm #0: (naive) nested loop (SLOW!)
        for each tuple r of R
            for each tuple s of S
                print, if they match

    [Figure: R(A, ..) with m tuples is the outer relation; S(A, ......) with n tuples is the inner relation.]

  • Slide 42

    Nested loops

    §  Algorithm #0: why is it bad?
    §  how many disk accesses (‘M’ and ‘N’ are the number of blocks for ‘R’ and ‘S’)? M + m*N

    [Figure: R(A, ..) with M pages, m tuples; S(A, ......) with N pages, n tuples.]

  • Slide 44

    Simple Nested Loops Join

    §  Actual number
    §  (pR*M)*N + M = 100*1000*500 + 1000 I/Os.
        –  At 10 ms/IO, Total: ~6 days (!)
    §  What if smaller relation (S) was outer?
        –  slightly better
    §  What assumptions are being made here?
        –  1 buffer for each table (and 1 for output)
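
The "~6 days" and "slightly better" answers follow directly from the formula. A minimal sketch of the arithmetic, using the page and tuple counts from the example schema (the helper names are made up for illustration):

```python
MS_PER_IO = 10
SECONDS_PER_DAY = 86_400


def simple_nl_cost(outer_pages, outer_tuples_per_page, inner_pages):
    """Tuple-at-a-time nested loops: scan the outer once and scan the
    whole inner relation once per outer tuple."""
    outer_tuples = outer_pages * outer_tuples_per_page
    return outer_tuples * inner_pages + outer_pages


def io_days(ios):
    """Convert an I/O count to days at MS_PER_IO milliseconds per I/O."""
    return ios * MS_PER_IO / 1000 / SECONDS_PER_DAY


reserves_outer = simple_nl_cost(1000, 100, 500)  # R outer, as on the slide
sailors_outer = simple_nl_cost(500, 80, 1000)    # smaller relation S outer

print(reserves_outer, round(io_days(reserves_outer), 1))  # 50001000 5.8
print(sailors_outer, round(io_days(sailors_outer), 1))    # 40000500 4.6
```

Swapping the outer relation helps only because Sailors has fewer tuples in total; the cost is still dominated by one inner scan per outer tuple.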

  • Slide 46

    Nested loops

    •  Algorithm #1: Blocked nested-loop join
        –  read in a block of R
            •  read in a block of S
                –  print matching tuples
    COST = M + M*N

  • Slide 48

    Nested loops

    •  Which one should be the outer relation?
    •  A: the smallest (page-wise)
    COST = M + M*N

  • Slide 49

    Nested loops

    •  M=1000, N=500
    •  Cost = 1000 + 1000*500 = 501,000
    •  = 5010 sec ~ 1.4h
    COST = M + M*N

  • Slide 50

    Nested loops

    •  M=1000, N=500 - if smaller is outer:
    •  Cost = 500 + 1000*500 = 500,500
    •  = 5005 sec ~ 1.4h
    COST = N + M*N

  • Slide 52

    Nested loops

    •  What if we have B buffers available?
    •  A: give B-2 buffers to outer, 1 to inner, 1 for output

  • Slide 54

    Nested loops

    •  Algorithm #1: Blocked nested-loop join
        –  read in B-2 blocks of R
            •  read in a block of S
                –  print matching tuples
    COST = M + M/(B-2)*N

  • Slide 55

    Nested loops

    •  and, actually:
    •  Cost = M + ceiling(M/(B-2)) * N
    COST = M + M/(B-2)*N

  • Slide 57

    Nested loops

    •  If smallest (outer) fits in memory
    •  (i.e., B = N+2),
    •  Cost = N + M (minimum!)
    COST = N + N/(B-2)*M
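
The blocked nested-loop costs quoted on these slides all come from one formula. A small sketch, with an invented helper name, that reproduces the 501,000 and 500,500 figures and the N + M minimum:

```python
from math import ceil


def bnl_cost(outer_pages, inner_pages, buffers):
    """Blocked nested loops with B buffers: B-2 hold the outer chunk,
    1 holds an inner page, 1 is for output. Assumes buffers >= 3."""
    chunk = buffers - 2
    return outer_pages + ceil(outer_pages / chunk) * inner_pages


M, N = 1000, 500  # pages in Reserves and Sailors

print(bnl_cost(M, N, buffers=3))      # 501000: R outer, one page at a time
print(bnl_cost(N, M, buffers=3))      # 500500: smaller relation outer
print(bnl_cost(N, M, buffers=N + 2))  # 1500: outer fits in memory, N + M
```

With B = 3 the chunk is a single page, so the formula reduces to the page-at-a-time cost M + M*N.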

  • Slide 58

    Nested loops - guidelines

    §  pick as outer the smallest table (= fewest pages)
    §  fit as much of it in memory as possible
    §  loop over the inner
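
These guidelines can be turned into a small in-memory sketch of the blocked nested-loop join. It is an illustration under assumptions, not a real executor: each relation is a list of pages, each page is a list of tuples, and the function also counts inner page reads so the cost formula can be observed.

```python
def block_nested_loop_join(outer_pages, inner_pages, buffers, key_outer, key_inner):
    """Join two paged relations on equality of the given column positions.

    B-2 buffers hold a chunk of the outer, one holds the current inner page.
    Assumes buffers >= 3. Returns (joined tuples, inner page reads).
    """
    chunk = buffers - 2
    result, inner_page_reads = [], 0
    for start in range(0, len(outer_pages), chunk):
        # Load as much of the outer as fits in memory.
        block = [t for page in outer_pages[start:start + chunk] for t in page]
        for page in inner_pages:  # one full scan of the inner per chunk
            inner_page_reads += 1
            for s in page:
                for r in block:
                    if r[key_outer] == s[key_inner]:
                        result.append(r + s)
    return result, inner_page_reads


# Toy data: (sid, sname) pages for Sailors, (sid, bid) pages for Reserves.
sailors = [[(22, "dustin"), (28, "yuppy")], [(31, "lubber"), (44, "guppy")], [(58, "rusty")]]
reserves = [[(28, 103), (31, 101)], [(31, 102), (58, 103)]]

joined, reads = block_nested_loop_join(sailors, reserves, 4, 0, 0)
print(len(joined), reads)  # 4 matches; 2 chunks * 2 inner pages = 4 reads
```

Raising `buffers` to 5 lets all three outer pages fit at once, so the inner relation is scanned only one time.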

  • Slide 59

    Index NL join

    §  use an existing index, or even build one on the fly
    §  cost: M + m*c (c: look-up cost)

  • Slide 60

    Index NL join

    §  cost: M + m*c (c: look-up cost)
    §  ‘c’ depends whether the index is clustered or not.
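
A minimal sketch of the index nested-loop idea, for the "build one on the fly" case. A Python dict stands in for the index on the inner relation, which is an assumption of this illustration; a real system would probe a B+ tree or hash index on disk, and each probe would cost c I/Os.

```python
from collections import defaultdict


def index_nested_loop_join(outer, inner, key_outer, key_inner):
    """For each outer tuple, look up matching inner tuples in an index
    instead of scanning the whole inner relation."""
    index = defaultdict(list)  # index on the inner's join column, built on the fly
    for s in inner:
        index[s[key_inner]].append(s)
    # One index probe per outer tuple: the m*c term of the cost formula.
    return [r + s for r in outer for s in index.get(r[key_outer], [])]


reserves = [(28, 103), (31, 101), (31, 102), (58, 103)]  # (sid, bid)
sailors = [(22, "dustin"), (28, "yuppy"), (31, "lubber"), (58, "rusty")]  # (sid, sname)

for row in index_nested_loop_join(reserves, sailors, 0, 0):
    print(row)
```

Here Reserves is the outer relation, so every reservation triggers exactly one look-up on Sailors.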

  • Slide 62

    Sort-merge join

    •  sort both on joining attribute
    •  scan each and merge
    •  Cost, given B buffers?

  • Slide 63

    Sort-merge join

    •  Cost, given B buffers?
    •  ~ 2*M*logM/logB + 2*N*logN/logB + M + N

  • Slide 65

    Sort-Merge Join

    §  Useful if
        –  one or both inputs are already sorted on join attribute(s)
        –  output is required to be sorted on join attribute(s)
    §  “Merge” phase can require some backtracking if duplicate values appear in join column

  • Slide 66

    Example of Sort-Merge Join

    Sailors:
    sid  sname   rating  age
    22   dustin  7       45.0
    28   yuppy   9       35.0
    31   lubber  8       55.5
    44   guppy   5       35.0
    58   rusty   10      35.0

    Reserves:
    sid  bid  day       rname
    28   103  12/4/96   guppy
    28   103  11/3/96   yuppy
    31   101  10/10/96  dustin
    31   102  10/12/96  lubber
    31   101  10/11/96  lubber
    58   103  11/12/96  dustin
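
The merge phase, including the backtracking needed for duplicate join values, can be sketched on the two example tables. This is an in-memory illustration with an invented function name; Python's `sorted` stands in for the external sort.

```python
def sort_merge_join(r, s, key_r, key_s):
    """Sort both inputs on the join column, then merge. When a run of equal
    keys is found, the scan of s backtracks to the start of the run for
    every matching tuple of r."""
    r = sorted(r, key=lambda t: t[key_r])
    s = sorted(s, key=lambda t: t[key_s])
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        kr, ks = r[i][key_r], s[j][key_s]
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            run_start = j
            while i < len(r) and r[i][key_r] == kr:
                j = run_start  # backtrack to the first s tuple with this key
                while j < len(s) and s[j][key_s] == kr:
                    out.append(r[i] + s[j])
                    j += 1
                i += 1
    return out


sailors = [(22, "dustin", 7, 45.0), (28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
           (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)]
reserves = [(28, 103, "12/4/96", "guppy"), (28, 103, "11/3/96", "yuppy"),
            (31, 101, "10/10/96", "dustin"), (31, 102, "10/12/96", "lubber"),
            (31, 101, "10/11/96", "lubber"), (58, 103, "11/12/96", "dustin")]

joined = sort_merge_join(sailors, reserves, 0, 0)
print(len(joined))  # 6: two rows for sid 28, three for 31, one for 58
```

In this example only Reserves has duplicate sids, so the backtracking never rescans more than one run; duplicates on both sides would make each run in Reserves be rescanned once per matching sailor.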

  • Slide 67

    Example of Sort-Merge Join

    §  With 35, 100 or 300 buffer pages, both Reserves and Sailors can be sorted in 2 passes; total join cost: 7500.
    §  (while Block Nested Loop (BNL) cost: 2,500 to 15,000 I/Os)
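
The 7500 figure can be reconstructed term by term: each sorting pass reads and writes every page of a relation, and the merge phase reads both relations once. A small sketch of that arithmetic (the helper name is made up; it assumes the merge phase needs no rescans):

```python
def sort_merge_cost(m_pages, n_pages, passes_r, passes_s):
    """Sort R and S (2 I/Os per page per pass), then scan both once to merge."""
    sort_r = 2 * passes_r * m_pages
    sort_s = 2 * passes_s * n_pages
    merge = m_pages + n_pages
    return sort_r + sort_s + merge


# Reserves: 1000 pages, Sailors: 500 pages, both sorted in 2 passes.
print(sort_merge_cost(1000, 500, 2, 2))  # 4000 + 2000 + 1500 = 7500
```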

  • Slide 68

    Sort-merge join

    §  Worst case for merging phase?
    §  Cost?

  • Slide 69

    Refinements

    §  All the refinements of external sorting
    §  plus overlapping of the merging of sorting with the merging of joining.

  • Slide 71

    Hash joins

    §  hash join: use hashing function h()
        –  hash ‘R’ into (0, 1, ..., ‘max’) buckets
        –  hash ‘S’ into buckets (same hash function)
        –  join each pair of matching buckets

    [Figure: R(A, ...) and S(A, ......) are each hashed into buckets 0, 1, …, max.]

  • Slide 72

    Hash join - details

        –  how to join each pair of partitions Hr-i, Hs-i?
        –  A: build another hash table for Hs-i, and probe it with each tuple of Hr-i

    [Figure: R(A, ...) and S(A, ......) hashed into buckets 0, 1, …, max; partitions Hr-0 and Hs-0 form a matching pair.]

  • Slide 73

    Hash join - details

    §  In more detail:
    §  Choose the (page-wise) smallest - if it fits in memory, do ~NL
        –  and, actually, build a hash table (with h2() != h())
        –  and probe it, with each tuple of the other
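
The two phases described on the last three slides can be sketched as follows. This is an in-memory illustration under assumptions: partitions are Python lists rather than files on disk, `hash(k) % num_partitions` plays the role of h(), and a dict per partition plays the role of the second hash function h2().

```python
def hash_join(r, s, key_r, key_s, num_partitions=4):
    """Partition both inputs with the same hash function, then for each
    pair of matching partitions build a table on Hs-i and probe it with
    the tuples of Hr-i."""
    def h(k):
        return hash(k) % num_partitions

    r_parts = [[] for _ in range(num_partitions)]
    s_parts = [[] for _ in range(num_partitions)]
    for t in r:  # partitioning phase
        r_parts[h(t[key_r])].append(t)
    for t in s:
        s_parts[h(t[key_s])].append(t)

    out = []
    for hr, hs in zip(r_parts, s_parts):  # matching phase
        table = {}
        for t in hs:  # build
            table.setdefault(t[key_s], []).append(t)
        for t in hr:  # probe
            for match in table.get(t[key_r], []):
                out.append(t + match)
    return out


reserves = [(28, 103), (28, 104), (31, 101), (31, 102), (58, 103)]  # (sid, bid)
sailors = [(22, "dustin"), (28, "yuppy"), (31, "lubber"), (58, "rusty")]  # (sid, sname)

print(len(hash_join(reserves, sailors, 0, 0)))  # 5
```

Tuples with equal join values always land in the same partition number, so only partition i of R ever needs to be compared with partition i of S.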

  • Slide 74

    Hash join details

    §  what if Hs-i is too large to fit in main-memory?
    §  A: recursive partitioning
    §  more details (overflows, hybrid hash joins): in book
    §  cost of hash join? (if we have enough buffers:) 3(M+N) (why? See next slide)

  • Slide 75

    Cost of Hash-Join

    §  In partitioning phase, read+write both relns; 2(M+N). In matching phase, read both relns; M+N I/Os.
    §  In our running example, this is a total of 4500 I/Os.

  • Slide 77

    Hash join details

    §  [cost of hash join? (if we have enough buffers:) 3(M+N)]
    §  What is ‘enough’? sqrt(N), or sqrt(M)?
    §  A: sqrt(smallest) (why?)
        –  Because you only need enough memory to hold the hash table partitions of the smaller table in memory, so B > size-of-smaller/(B-1) ⇒ B ~ sqrt(size-of-smaller)
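
Both the 3(M+N) cost and the sqrt(smaller) rule of thumb can be checked numerically. The sketch below uses made-up helper names and assumes perfectly uniform partitions with no overhead for the in-memory hash table, so the buffer count is a lower bound rather than a tuning recommendation.

```python
from math import isqrt


def hash_join_cost(m_pages, n_pages):
    """Partitioning reads and writes both relations; matching reads both."""
    partitioning = 2 * (m_pages + n_pages)
    matching = m_pages + n_pages
    return partitioning + matching


def min_buffers(smaller_pages):
    """Smallest B with B > smaller_pages / (B - 1), i.e. each of the B-1
    partitions of the smaller relation fits in memory."""
    b = 2
    while b * (b - 1) <= smaller_pages:
        b += 1
    return b


print(hash_join_cost(1000, 500))  # 4500
print(min_buffers(500))           # 23, close to sqrt(500) ~ 22.4
print(isqrt(500))                 # 22
```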

  • Slide 78

    Sort-Merge Join vs. Hash Join

    §  Given a minimum amount of memory both have a cost of 3(M+N) I/Os.

    (min. memory for sort-merge = sqrt(larger table) using aggressive refinements, in textbook)
    (min. memory for hash = sqrt(smaller table), see previous slides)

  • Slide 81

    Sort-Merge vs Hash join

    §  Hash Join Pros:
        –  Superior if relation sizes differ greatly
        –  Shown to be highly parallelizable (beyond scope of class)
    §  Sort-Merge Join Pros:
        –  Less sensitive to data skew
        –  Result is sorted (may help “upstream” operators)
        –  goes faster if one or both inputs already sorted

  • Slide 82

    General Join Conditions

    §  Equalities over several attributes (e.g., R.sid=S.sid AND R.rname=S.sname):
        –  all previous methods apply, using the composite key

  • Slide 84

    General Join Conditions

    §  Inequality conditions (e.g., R.rname < S.sname):
    §  which methods still apply?
        –  NL (probably, the best!)
        –  index NL (only if clustered index)
        –  Sort merge (does not apply!) (why?)
        –  Hash join (does not apply!) (why?)

  • Slide 85

    Set Operations

    §  Intersection and cross-product: special cases of join
    §  Union (Distinct) and Except: similar; we’ll do union:
    §  Effectively: concatenate; use sorting or hashing
    §  Sorting based approach to union:
        –  Sort both relations (on combination of all attributes).
        –  Scan sorted relations and merge them.
        –  Alternative: Merge runs from Pass 0 for both relations.

    SKIP

  • Slide 86

    Set Operations, cont’d

    §  Hash based approach to union:
        –  Partition R and S using hash function h.
        –  For each S-partition, build in-memory hash table (using h2), scan corresponding R-partition and add tuples to table while discarding duplicates.

    SKIP
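
The hash-based union can be sketched in the same style as the hash join. As before this is an in-memory illustration under assumptions: `hash(t) % num_partitions` stands in for h, and a Python set per partition stands in for the in-memory table built with h2.

```python
def hash_union(r, s, num_partitions=4):
    """UNION (distinct) by partitioning on the whole tuple, then
    de-duplicating each pair of matching partitions in memory."""
    def h(t):
        return hash(t) % num_partitions

    r_parts = [[] for _ in range(num_partitions)]
    s_parts = [[] for _ in range(num_partitions)]
    for t in r:
        r_parts[h(t)].append(t)
    for t in s:
        s_parts[h(t)].append(t)

    out = []
    for rp, sp in zip(r_parts, s_parts):
        table = set(sp)   # build the table from the S-partition
        table.update(rp)  # add R-partition tuples, discarding duplicates
        out.extend(table)
    return out


r = [(1, "a"), (2, "b"), (2, "b")]
s = [(2, "b"), (3, "c")]
print(sorted(hash_union(r, s)))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

Identical tuples always hash to the same partition, so duplicates can never survive in two different partitions.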

  • Slide 87

    Aggregate Operations (AVG, MIN, etc.)

    §  Without grouping:
        –  In general, requires scanning the relation.
        –  Given index whose search key includes all attributes in the SELECT or WHERE clauses, can do index-only scan.

    SKIP

  • Slide 88

    Summary

    §  A virtue of relational DBMSs:
        –  queries are composed of a few basic operators
        –  The implementation of these operators can be carefully tuned
        –  Important to do this!
    §  Many alternative implementation techniques for each operator
        –  No universally superior technique for most operators.

    “it depends” [Guy Lohman (IBM)]

  • Slide 89

    Summary cont’d

    §  Must consider available alternatives for each operation in a query and choose best one based on system statistics, etc.
        –  Part of the broader task of optimizing a query composed of several ops.