BDCC: Exploiting Fine-Grained Persistent Memories for OLAP

Peter Boncz

HardBD 2018, Paris

NVRAM

• System integration:
  – NVMe: block devices on the PCIe bus
  – NVDIMM: persistent RAM, byte-level access
• Low latency
  – Lower than Flash,
  – close to DRAM
  – Asymmetric (r < w)
• Fine-grained access
  – 512 bytes for NVMe
  – NVDIMM: cache-line

NVRAM: DB impact

• Back to the 5-minute rule:
  – Restoring the old balance of latency and bandwidth?
• Many challenges in OLTP
  – index structures, (in-page) logging
  – ensure consistency, prevent leakage, control wear

→ What about OLAP? Should we re-think warehouse storage for low-latency persistent memories?

VLDB Journal 2016, Volume 25, Issue 3, p. 291-316

BDCC: Bitwise Dimensional Co-Clustering

BDCC: how tables are stored

_bdcc_ column ordering → works in column stores

[Figure: partitions 1, 2, 3, 10, 11, 12, 100, 101; column label _bdcc_]

Bitwise Interleaving = Z-Ordering

• space-filling curve
• Computationally cheaper than e.g. the Hilbert curve
• Almost as nice properties
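Round-robin bit interleaving can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `interleave` and `deinterleave` are hypothetical and not part of BDCC, and all dimensions are assumed to have the same number of bits.

```python
def interleave(coords, bits):
    """Round-robin (Z-order) interleaving, most significant bits first.

    coords: one bin number per dimension; bits: bits per dimension.
    Both names are illustrative assumptions, not BDCC identifiers.
    """
    key = 0
    for b in range(bits - 1, -1, -1):      # from MSB down to LSB
        for c in coords:                   # one bit of each dimension in turn
            key = (key << 1) | ((c >> b) & 1)
    return key


def deinterleave(key, ndims, bits):
    """Inverse of interleave(): recover the per-dimension bin numbers."""
    coords = [0] * ndims
    total = ndims * bits
    for i in range(total):
        bit = (key >> (total - 1 - i)) & 1
        coords[i % ndims] = (coords[i % ndims] << 1) | bit
    return coords


key = interleave([0b10, 0b01], bits=2)     # x = 2, y = 1
print(bin(key), deinterleave(key, ndims=2, bits=2))
```

Each step only shifts and masks, which is why the curve is cheap to compute compared to a Hilbert curve.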

BDCC - Data Order

• any bit interleaving of dimensions is possible
  – round-robin = Z-order
  – major-minor = classical MD index (e.g. DB2)
  – any bit mix in between
• our automatic algorithms use
  – round-robin bit interleaving
  – clustering depth based on column densities, typically 32KB (SSD) and 512KB (HDD) blocks

BDCC - What is it?

• Multi-dimensional indexing
  – table indexing: not multi-media (audio, image) indexing here :-)
  – limited number of dimensions (up to 5..7)
• Multi-table clustering (co-clustering)
  – indexing on dimensions from other tables..
  – ..reachable over foreign-key relationships
  – and exploiting common indexing dimensions among tables in operators
• Grouping into MILLIONS of very small groups
  – scattered access patterns → Flash I/O friendly!
  – clustering: because millions are not possible with partitioning
• Column-store optimized

BDCC - The Idea

[Figure: queries Q1, Q2, Q3, Q4]

How does this help:
• selection?
• order by?
• aggregation?
• FK join?

What BDCC gives you

Accelerates
• Most selections -> selection push-down, correlations
• Most groupings
• All foreign-key joins (no matter if dimensions are involved)
  – even removes joins, turning them into selections
• Certain order-by

Mostly through strong reduction of memory usage, while
• No storage overhead: every tuple stored once
• Bulk update-friendly
• Easy to parallelize query processing

Two Stages of the Project

• Bitwise Dimensional Co-Clustering (BDCC)
  – I/O level clustering and indexing
  – Query processing via PartitionSplit, PartitionRestart
  – published in VLDBJ 2016
• Deep Dimensional Co-Clustering (DDC)
  – additional I/O block clustering
  – Query processing via DDC-Recluster()
  – unpublished yet.. WIP

BDCC Structures

• BDCC dimension
  – mapping to consecutive integers
  – balancing through histograms and Hu-Tucker
• BDCC table
  – re-ordered primary copy
  – additional _bdcc_ order attribute
• BDCC count table
  – summary table (_bdcc_, _count_)
  – cluster access index
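How a count table can act as a cluster access index might be sketched as follows. This is a minimal sketch under assumptions: the table is already ordered on `_bdcc_`, and the helper names `build_count_table` and `tuple_ranges` are hypothetical, not taken from the BDCC implementation.

```python
from itertools import groupby


def build_count_table(bdcc_column):
    """Summarize a _bdcc_ column (assumed sorted) into (_bdcc_, _count_) rows."""
    return [(key, sum(1 for _ in group)) for key, group in groupby(bdcc_column)]


def tuple_ranges(count_table, lo, hi):
    """Return [start, end) row ranges of all groups with lo <= _bdcc_ <= hi.

    A running sum over _count_ gives each group's position in the table,
    so no per-tuple index is needed.
    """
    ranges, start = [], 0
    for key, count in count_table:
        if lo <= key <= hi:
            ranges.append((start, start + count))
        start += count
    return ranges


column = [0, 0, 1, 3, 3, 3, 4]            # toy _bdcc_ values in stored order
table = build_count_table(column)
print(table)                              # (_bdcc_, _count_) pairs
print(tuple_ranges(table, 1, 3))          # row ranges to fetch for _bdcc_ in [1, 3]
```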

[Figures: BDCC Structures, shown over five slides; one figure is annotated "Dimension Use" →]

Example

“count total ordered items from Germany per day and supplier”

SELECT o_orderdate, s_name, count(*)
FROM NATION, SUPPLIER, ORDERS, LINEITEM
WHERE n_nationkey = s_nationkey
  AND s_suppkey = l_suppkey
  AND l_orderkey = o_orderkey
  AND n_name = 'Germany'
GROUP BY o_orderdate, s_name

Relational Algebra Plan

[Figure: relational algebra plan]

BDCC-scan

Scans a BDCC table

In any desired dimension order. Here: 1: orderdate, 2: customer nation, 3: supplier nation

At a desired granularity using bitmasks

3+2+3 bits set → use 8 bits (256 groups)

Pushes down selections: [0,7] = all, [0,3] = all, [5,5] = germany

BDCC-scan

• extracts _bdcc_ bits → _gid_ column: d3s3c3d2s2c2d1s1c1 → d3d2d1c3c2s3s2s1
• delivers tuples ordered on _gid_
• performs selection pushdown ([lo-hi])

Basic Idea:
• BDCC-scan delivers a sorted stream, but sorting is free! As fast as a normal scan
• carefully controlled scatter access pattern

we clustered on |_bdcc_| bits, but can BDCC-scan with fewer
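The bit extraction and the selection pushdown can be illustrated with a small Python sketch. The bit layouts are the ones printed on the slide; everything else (the function names `extract_gid` and `matching_gids`, and representing layouts as lists of bit names) is an assumption made for illustration.

```python
from itertools import product

# Bit names, most significant first, as printed on the slide.
BDCC_LAYOUT = "d3 s3 c3 d2 s2 c2 d1 s1 c1".split()   # stored _bdcc_ key
GID_LAYOUT = "d3 d2 d1 c3 c2 s3 s2 s1".split()       # requested _gid_ (8 bits)


def extract_gid(bdcc, bdcc_layout=BDCC_LAYOUT, gid_layout=GID_LAYOUT):
    """Pick the requested bits out of a _bdcc_ value, in _gid_ order."""
    width = len(bdcc_layout)
    gid = 0
    for name in gid_layout:
        shift = width - 1 - bdcc_layout.index(name)  # position of that bit in _bdcc_
        gid = (gid << 1) | ((bdcc >> shift) & 1)
    return gid


def matching_gids(ranges, bits):
    """Enumerate the _gid_ values selected by per-dimension [lo, hi] ranges.

    ranges and bits are given in _gid_ order, major dimension first.
    """
    gids = []
    for combo in product(*(range(lo, hi + 1) for lo, hi in ranges)):
        gid = 0
        for value, width in zip(combo, bits):
            gid = (gid << width) | value
        gids.append(gid)
    return gids


# date bin 5, supplier-nation bin 5, customer-nation bin 3
print(bin(extract_gid(0b110001111)))
# [0,7] = all dates, [0,3] = all customer nations, [5,5] = one supplier nation
print(len(matching_gids([(0, 7), (0, 3), (5, 5)], bits=[3, 2, 3])))
```

Because the selected dimension is the minor one here, the pushed-down selection maps to many small `_gid_` ranges rather than one contiguous range, which is the scattered access pattern the slide refers to.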

BDCC: Bitwise Dimensional Co-Clustering – Google Talk – 23 May 2017 – Peter Boncz

BDCC FetchScan

• uses the count table to find the needed _bdcc_ ranges
• fetches tuple ranges in a particular order
• returns an ascending _gid_ column in the tuples

BDCC - Query Processing

• Partition-wise operator execution
  – hash-based join, grouping/aggregation
  – better cache utilization
• Sandwich Operators → PartitionSplit & PartitionRestart
  – sideways information passing: PartitionRestart.cross-partition?(_gid_ change) → HashAggr/Join.flush() & PartitionSplit.next-partition()
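The effect of flushing a hash aggregation whenever `_gid_` changes can be sketched in Python. This is not the PartitionSplit/PartitionRestart implementation; `partitionwise_count` is a hypothetical stand-in that assumes the input arrives ordered on `_gid_` and that every grouping key falls into exactly one `_gid_` partition.

```python
def partitionwise_count(stream):
    """COUNT(*) GROUP BY key over (gid, key) pairs ordered on gid.

    The hash table is flushed on every gid change, so it never holds more
    than the keys of a single partition. Returns (results, peak_table_size).
    """
    results, table, current, peak = [], {}, None, 0
    for gid, key in stream:
        if current is not None and gid != current:
            results.extend(table.items())   # flush the finished partition
            table.clear()
        current = gid
        table[key] = table.get(key, 0) + 1
        peak = max(peak, len(table))
    results.extend(table.items())           # flush the last partition
    return results, peak


rows = [(0, "a"), (0, "b"), (0, "a"), (1, "c"), (1, "c"), (2, "d")]
print(partitionwise_count(rows))
```

The result holds four groups, but the hash table never held more than two of them at once; this is the memory reduction that partition-wise execution aims for.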

BDCC - Performance Sandwich Operators

• Micro-benchmarks (TPC-H SF10, LINEITEM-ORDERS)

Relational Algebra Plan

Selection Pushdown + Dimension Join Elimination

Co-Clustering Close-up

[Figure label: Part dimension]

Common Path = Co-Clustering

[Figure label: Part dimension]

Common Dimension = Accelerated Join

[Figure label: Part dimension]

BDCC: All FK Joins Accelerated!

[Figure labels: Part dimension, Date dimension, Nation dimension]

BDCC - Schema Design

• Semi-automatic
  – Input: CREATE INDEX() and FOREIGN KEY()
• Schema traversal along foreign-key paths
  – propagation of "Index" dimensions
  – weighted according to FK paths
• automatic creation of dimensions and tables
  – round-robin bit interleaving
  – clustering depth based on column densities, typically 32KB (SSD) and 512KB (HDD) blocks

BDCC - Optimizer

• IDU: Interesting Dimension Uses
  – all dimensions determined by join, sort or aggregation attribute
• IDO: Interesting Dimension Orders
  – all dimension order permutations of each IDU
• MDO: Maximal Dimension Orders
  – pruning of dominated sort orders of IDOs
• MDOs represent "interesting orders" for enumeration
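The IDU → IDO → MDO steps might be sketched as follows. The slide does not define "dominated"; this sketch assumes that an order is dominated when it is a proper prefix of another order, and the function names are hypothetical.

```python
from itertools import permutations


def interesting_dimension_orders(idus):
    """IDOs: every permutation of every interesting dimension use (IDU)."""
    orders = set()
    for idu in idus:
        orders.update(permutations(sorted(idu)))
    return orders


def maximal_dimension_orders(idos):
    """MDOs: drop orders that are dominated by a longer order.

    ASSUMPTION: "dominated" is taken to mean "is a proper prefix of another
    order"; the slide does not spell out the exact rule.
    """
    def dominated(order):
        n = len(order)
        return any(len(other) > n and other[:n] == order for other in idos)

    return sorted(order for order in idos if not dominated(order))


idus = [{"date"}, {"date", "nation"}]       # toy dimension uses
idos = interesting_dimension_orders(idus)
print(maximal_dimension_orders(idos))
```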

BDCC Performance

• TPC-H SF100 execution time for BDCC, cold buffer pool
• TPC-H SF1000 execution time for BDCC, cold buffer pool

much better power scores with much less memory

BDCC - Updates

• Batch update support
  – in-memory buffer
  – "log-structured merge"

BDCC Updates

• TPC-H SF100 update set
• 60% bulk append speedup compared to cluster trees (ordered projections, using PDTs)
• for many update sets, BDCC only merges with previous updates instead of a PDT merge with the full table

Deep Dimensional Clustering (DDC)

• Idea:
  – Make _bdcc_ have as many bits as possible
  – For I/O (BDCC-scan) only use the major bits (groups of ~32KB)
  – Note: inside the 32KB tuple block, there is more clustering
    • Inside a cache line, tuples tend to belong to the same group
  – Idea: exploit this locality (these deep bits) in operators
    • For really cheap cache partitioning
    • make joins cache-conscious again

DDC Extensions

DDC Performance

[Figure: performance comparison, series labeled BDCC and DDC]

Conclusion

• BDCC & DDC
  – clever ordering of tables, and co-ordering of tables
  – millions of tiny groups (NVRAM friendly!)
  – All the goodies in one go:
    • fast selections (even cross-table propagation)
    • fast joins, fast group-bys, fast sorts (little RAM needed)
  – Sideways info passing sandwich operators
    • No need for new join/aggr operators
  – QOPT framework that extends interesting orders
  – Updatable using LSM ideas
  – data is stored only once

Dimension Construction

Dimension = set of bins

• Range-binning of a domain
• Histogram-based approach
  – Needs frequency information

Assigning Bin Numbers: Naïve Way

• Skew / frequent values (→ single-value bins)

value           frequency  code  c2  c1
(null)          .70        000   00  0
HBO (Polytech)  .15        001   00  0
Bachelor        .08        010   01  0
Master          .06        011   01  0
PhD             .01        100   10  1

Hu-Tucker Binning

• Frequency-based bin number assignment

Hu-Tucker = Order-Respecting Huffman Coding

value           frequency  code  c3   c2  c1
(null)          .70        0000  000  00  0
HBO (Polytech)  .15        1000  100  10  1
Bachelor        .08        1100  110  11  1
Master          .06        1110  111  11  1
PhD             .01        1111  111  11  1
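Why the code assignment matters becomes visible when the codes are truncated to fewer bits, which is what a coarser granularity does. The sketch below uses only the frequencies and codes from the two tables on these slides; it does not implement the Hu-Tucker algorithm itself, and `bin_weights` is a hypothetical helper name.

```python
from collections import defaultdict

# Frequencies in percent and codes, copied from the slide tables.
FREQ = {"(null)": 70, "HBO": 15, "Bachelor": 8, "Master": 6, "PhD": 1}
NAIVE = {"(null)": "000", "HBO": "001", "Bachelor": "010", "Master": "011", "PhD": "100"}
HU_TUCKER = {"(null)": "0000", "HBO": "1000", "Bachelor": "1100", "Master": "1110", "PhD": "1111"}


def bin_weights(codes, bits):
    """Total frequency per bin when only the top `bits` bits of each code are used."""
    weights = defaultdict(int)
    for value, code in codes.items():
        weights[code[:bits]] += FREQ[value]
    return dict(sorted(weights.items()))


for bits in (1, 2):
    print(bits, "naive:", bin_weights(NAIVE, bits), "hu-tucker:", bin_weights(HU_TUCKER, bits))
```

With one bit, the naive codes put 99% of the tuples in bin 0 and 1% in bin 1, while the Hu-Tucker codes split them 70% to 30%; the coarser bins stay far better balanced.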

Hu-Tucker Dimension Binning

but why is this relevant?

Variety in Data Density of Columns

• l_linestatus 0.25 b/tuple
• l_comment 30 b/tuple

Factor 120 difference

What is the optimal BDCC bin size?
– Depends on disk block size
– Depends on column density

What to do if a query accesses multiple columns of very different densities?
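A back-of-the-envelope calculation shows how density and block size interact. This sketch rests on several assumptions that are not on the slide: "b" is read as bytes, the table has 600 million tuples, the block size is 32KB, and the rule "an average group should still fill one block" is a simplification; `scan_bits` is a hypothetical helper.

```python
def scan_bits(num_tuples, bytes_per_tuple, block_bytes):
    """Largest number of group bits for which an average group still fills a block.

    ASSUMPTION: groups are equally sized and "b/tuple" means bytes per tuple.
    """
    blocks = int(num_tuples * bytes_per_tuple / block_bytes)
    return blocks.bit_length() - 1 if blocks >= 1 else 0


NUM_TUPLES = 600_000_000        # hypothetical table size
BLOCK = 32 * 1024               # 32KB blocks, the SSD setting named earlier

for column, density in [("l_linestatus", 0.25), ("l_comment", 30)]:
    print(column, scan_bits(NUM_TUPLES, density, BLOCK))
```

Under these assumptions the thin column supports 12 bits of granularity and the wide column 19 bits; the gap of 7 bits corresponds to the factor 120 in density (2^7 = 128).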

Granularity Tuning in BDCC

1. Is an issue during table creation
  – A dimension is used in multiple tables
  – each table needs a different granularity
2. Is an issue during query execution
  – Table is clustered at some granularity
  – Given a set of columns to scan: at what granularity to scan the table?

Z-Ordering for Column Stores

There is also a column-store-specific argument for bit interleaving:
• suppose BDCC-scan(T,C1) is efficient at 8 bits, needing sorted access to supplier (s)
• suppose BDCC-scan(T,C2), which selects other columns C2 that are on average much smaller than those in C1, is efficient only up to 5 bits granularity

Takeaway: column stores need a variable access granularity
• Major-minor clustering leaves the minor dimension unusable for thin columns (C2)
• Bit-interleaving (Z-ordering) allows thin column scans to profit from all dimensions

                  Major-minor clustering   Z-Ordering
_key_ shape:      d3d2d1c3c2c1s3s2s1       d3s3c3d2s2c2d1s1c1
BDCC-scan(T,C1)   d3d2d1c3c2c1s3s2s1       d3s3c3d2s2c2d1s1c1
BDCC-scan(T,C2)   d3d2d1c3c2c1…            d3s3c3d2s2c2d1s1c1

[Figure markers on both key shapes: Bit 5, Bit 8; caption: Fast I/O access until..]
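The takeaway can be checked directly on the two key shapes from the slide by counting which dimensions contribute to the first 5 bits, the granularity assumed for the thin columns C2. The helper name `bits_per_dimension` is hypothetical.

```python
from collections import Counter

MAJOR_MINOR = "d3d2d1c3c2c1s3s2s1"   # key shapes as printed on the slide
Z_ORDER = "d3s3c3d2s2c2d1s1c1"


def bits_per_dimension(key_shape, prefix_bits):
    """Count how many of the first `prefix_bits` key bits belong to each dimension.

    Each bit is written as two characters: dimension letter plus bit number.
    """
    letters = key_shape[: 2 * prefix_bits][::2]
    return dict(Counter(letters))


print("major-minor:", bits_per_dimension(MAJOR_MINOR, 5))
print("z-order:    ", bits_per_dimension(Z_ORDER, 5))
```

With major-minor clustering the 5-bit prefix contains no supplier (s) bit at all, whereas with Z-ordering every dimension is represented.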