

Towards Weakly Consistent Local Storage Systems (jyshin/talks/socc16-poster-shin.pdf)

Ji-Yong Shin¹,², Mahesh Balakrishnan², Tudor Marian³, Jakub Szefer² and Hakim Weatherspoon¹

¹Cornell University, ²Yale University, ³Google

StaleStore

Performance: Accessing Blocks and K-V Pairs

• Primary/Backup setting
• Primary performs the worst due to network delays (100 ms)
• Yogurt performs better than local latest by using the trade-off

Server Trends

• Modern servers are as powerful as distributed systems in the past
  ✓ CPU and storage devices are parallel, similar to distributed nodes
• Goal is to trade off consistency and performance in a local store
  ✓ Use of stale data in different storage devices for better performance

GetCost Overhead

Yogurt: A Block Level StaleStore

Summary

• Modern servers are similar to distributed systems
• Local storage systems can adopt weak consistency
  ✓ We define them as StaleStores
• Yogurt, a block level StaleStore
  ✓ Effectively trades off consistency and performance
  ✓ Supports high level multi-block data constructs

Year                2006                  2016                     Comparisons
Model (4U)          Dell PowerEdge 6850   Dell PowerEdge R930
CPU [# of cores]    4 × 2 core Xeon [8]   4 × 24 core Xeon [96]    12X
Memory              64GB                  6TB                      96X
Network bandwidth   2 × 1GigE             2 × 1GigE, 2 × 10GigE    11X
Storage             8 × SCSI/SAS HDD      24 × SAS HDD/SSD,        # of devices: 4.2X
                                          10 × PCIe SSD            Capacity: 175.3X
                                                                   Use of SSDs
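Most of the ratios in the Comparisons column can be re-derived from the component counts in the table. The check below is a small sketch under two assumptions that the poster does not spell out: the 2016 network figure sums both listed link pairs, and the 2016 device count sums the SAS and PCIe devices. The capacity ratio is not recomputed because per-device capacities are not given here.

```python
# Re-derive the comparison ratios from the component counts in the table.
cores_2006 = 4 * 2
cores_2016 = 4 * 24
cpu_ratio = cores_2016 / cores_2006            # 96 / 8

mem_2006_gb = 64
mem_2016_gb = 6 * 1024                         # 6TB expressed in GB (binary units assumed)
mem_ratio = mem_2016_gb / mem_2006_gb

net_2006_gbps = 2 * 1
net_2016_gbps = 2 * 1 + 2 * 10                 # assumption: both link pairs are summed
net_ratio = net_2016_gbps / net_2006_gbps

devices_2006 = 8
devices_2016 = 24 + 10                         # assumption: SAS and PCIe devices are summed
device_ratio = devices_2016 / devices_2006     # 4.25, listed as 4.2X in the table

print(cpu_ratio, mem_ratio, net_ratio, device_ratio)
```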

Distributed vs Modern Server

Distributed Systems
• Different versions of data exist in different servers due to network delays during replication
• Older versions are faster to access when the network overhead is low

Modern Servers
• Different versions of data exist in different storage media due to logging, caching, copy-on-write, deduplication, etc.
• Older versions are faster to access when they are on faster storage media

Reasons for different access speeds
  ✓ RAM, SSD, HDD, hybrid drives, etc.
  ✓ Disk with arm contention or SSD under garbage collection
  ✓ RAID under degraded mode

• A StaleStore is a local storage system in any form that can trade off consistency and performance (e.g. KV-store, filesystem, block store, DB, etc.)

Requirements:
1. Maintain multiple versions of data
   - Should have an interface to access older versions
2. Aware of consistency semantics
   - Bounded staleness, monotonic-reads, read-my-writes, etc.
3. Can give cost estimates for accessing each version
   - Considerations for data locations and storage conditions
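One way to read requirement 2 is that each consistency guarantee becomes a lower bound on the version a read may return. The sketch below illustrates that reading only. The `Session` class, its field names, and the choice to measure bounded staleness in version numbers rather than time are all assumptions made for this example, not part of the poster.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical per-client state used to check weak-consistency guarantees."""
    last_read: dict = field(default_factory=dict)     # block -> newest version read
    last_written: dict = field(default_factory=dict)  # block -> newest version written


def oldest_allowed(session, blk, latest, semantics, bound=0):
    """Return the oldest version of blk that a read is allowed to return."""
    if semantics == "bounded-staleness":
        # At most `bound` versions behind the latest one (assumed unit: versions).
        return max(0, latest - bound)
    if semantics == "monotonic-reads":
        # Never older than what this client has already read.
        return session.last_read.get(blk, 0)
    if semantics == "read-my-writes":
        # Never older than this client's own last write.
        return session.last_written.get(blk, 0)
    raise ValueError(f"unknown semantics: {semantics}")


s = Session(last_read={1: 3}, last_written={1: 4})
mr = oldest_allowed(s, 1, latest=6, semantics="monotonic-reads")
rmw = oldest_allowed(s, 1, latest=6, semantics="read-my-writes")
bs = oldest_allowed(s, 1, latest=6, semantics="bounded-staleness", bound=2)
print(mr, rmw, bs)
```

Any version between the returned bound and the latest version is then a candidate, and requirement 3 (cost estimates) decides which candidate to read.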

1. Issue GetCost() for block 1 between versions 3 and 6 (N queries with uniform distance)
2. Read the cheapest: e.g. 1 (5): Read(1, 5)
3. Record the selected version for block 1

[Figure: cache and log holding block (version) entries 3 (3), 1 (4), 2 (4), 1 (5), 3 (5), 1 (6)]
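The three read steps above can be sketched as follows. This is an illustrative sketch, not Yogurt's code: `query_versions`, `read_cheapest`, the toy cost table, and the rule that ties go to the newer version are all assumptions. `get_cost` and `read` stand in for GetCost(blk, ver) and Read(blk, ver).

```python
def query_versions(lo, hi, n):
    """Pick up to n version numbers spread uniformly over [lo, hi]."""
    if n <= 1 or hi == lo:
        return [hi]
    step = (hi - lo) / (n - 1)
    return sorted({round(lo + i * step) for i in range(n)})


def read_cheapest(blk, lo, hi, n, get_cost, read, last_read):
    """Query costs, read the cheapest version, and record the choice."""
    # Step 1: issue N cost queries between the oldest allowed and latest version.
    candidates = [(get_cost(blk, ver), ver) for ver in query_versions(lo, hi, n)]
    # Step 2: read the cheapest version (assumption: ties go to the newer one).
    cost, ver = min(candidates, key=lambda c: (c[0], -c[1]))
    data = read(blk, ver)
    # Step 3: record the selected version so later reads never go backwards.
    last_read[blk] = ver
    return ver, data


# Toy setup: only version 5 of block 1 is cheap to read (for example, cached).
cost_table = {3: 100, 4: 100, 5: 1, 6: 100}
last_read = {1: 3}
ver, data = read_cheapest(
    1, last_read[1], 6, 4,
    get_cost=lambda blk, v: cost_table[v],
    read=lambda blk, v: f"blk{blk}@v{v}",
    last_read=last_read,
)
print(ver, data, last_read)
```

With this toy cost table the read selects version 5 of block 1, matching the Read(1, 5) example, and the recorded version becomes the lower bound for the next monotonic read.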

I/O                         Write(blk, data, ver), Read(blk, ver)   Versioned writes to snapshots; versioned reads from snapshots
Cost                        GetCost(blk, ver)                       cache << disk, # of queued I/O (read << write)
Multi-block object access   GetVersionRange(blk, ver)               Returns the version range over which a block is valid
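To make the four calls concrete, here is a toy in-memory model of the interface. It is a sketch under stated assumptions, not the real implementation: the class name `ToyYogurt`, the cost constants, the weight of 10 on queued writes, and the rule that a read at snapshot `ver` returns the newest write at or before `ver` are all hypothetical.

```python
import bisect


class ToyYogurt:
    """Hypothetical in-memory model of the block API; not the real system."""
    CACHE_COST, DISK_COST = 1, 100            # cache << disk (made-up units)

    def __init__(self):
        self.versions = {}                    # blk -> sorted list of written versions
        self.data = {}                        # (blk, ver) -> payload
        self.cached = set()                   # (blk, ver) copies currently in cache
        self.queued_reads = 0
        self.queued_writes = 0

    def write(self, blk, data, ver):
        vers = self.versions.setdefault(blk, [])
        if ver not in vers:
            bisect.insort(vers, ver)
        self.data[(blk, ver)] = data
        self.cached.add((blk, ver))           # assumption: new writes land in cache

    def _resolve(self, blk, ver):
        """Index of the newest written version of blk that is <= snapshot ver."""
        vers = self.versions.get(blk, [])
        i = bisect.bisect_right(vers, ver)
        if i == 0:
            raise KeyError((blk, ver))
        return i - 1, vers

    def read(self, blk, ver):
        i, vers = self._resolve(blk, ver)
        return self.data[(blk, vers[i])]

    def get_cost(self, blk, ver):
        i, vers = self._resolve(blk, ver)
        base = self.CACHE_COST if (blk, vers[i]) in self.cached else self.DISK_COST
        # Queued writes weigh more than queued reads (read << write).
        return base + self.queued_reads + 10 * self.queued_writes

    def get_version_range(self, blk, ver):
        """Snapshot range (first, last) over which the resolved copy is valid."""
        i, vers = self._resolve(blk, ver)
        last = vers[i + 1] - 1 if i + 1 < len(vers) else None   # None: still current
        return vers[i], last


store = ToyYogurt()
for ver, payload in [(4, b"a"), (5, b"b"), (6, b"c")]:
    store.write(1, payload, ver)
store.cached -= {(1, 4), (1, 6)}              # pretend only version 5 stayed in cache
print(store.get_cost(1, 4), store.get_cost(1, 5), store.read(1, 5))
print(store.get_version_range(1, 4), store.get_version_range(1, 6))
```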

Reading block 1 (monotonic-reads)

Multi-Block Object Access in Yogurt

• Key-value stores and filesystems can store an object over multiple blocks
• Reads should be served from a persistent snapshot: GetVersionRange()

[Figure: block versions placed on a hard disk drive and solid state disks. Axes: Block Addr (0 to 2) and Timestamp (Snapshot #) (0 to 3).]
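A multi-block read has to return blocks that all belong to one snapshot. One possible greedy approach, sketched below, is to pick the cheapest allowed snapshot for each block in turn and then narrow the allowed range with that block's validity range. The function `read_object`, the toy write history, and the set of cheap copies are hypothetical. `get_cost`, `get_version_range`, and `read` stand in for GetCost, GetVersionRange, and Read.

```python
import bisect


def read_object(blocks, lo, hi, get_cost, get_version_range, read):
    """Read every block of an object from one common snapshot.

    lo and hi bound the snapshots allowed by the consistency semantics.
    """
    out = []
    for blk in blocks:
        # Cheapest snapshot that is still valid for all blocks read so far;
        # ties go to the newer snapshot (an assumption of this sketch).
        ver = min(range(lo, hi + 1), key=lambda v: (get_cost(blk, v), -v))
        first, last = get_version_range(blk, ver)
        lo = max(lo, first)
        hi = hi if last is None else min(hi, last)
        out.append(read(blk, ver))
    return out, (lo, hi)


# Toy write history: block -> sorted versions at which it was written.
writes = {0: [0, 3], 1: [0, 2], 2: [0, 1]}
fast = {(0, 0), (1, 2), (2, 1)}                # copies assumed to sit on a fast device


def resolve(blk, ver):
    """Newest written version of blk at or before snapshot ver, plus its index."""
    vers = writes[blk]
    i = bisect.bisect_right(vers, ver) - 1
    return i, vers


def get_cost(blk, ver):
    i, vers = resolve(blk, ver)
    return 1 if (blk, vers[i]) in fast else 100


def get_version_range(blk, ver):
    i, vers = resolve(blk, ver)
    return vers[i], (vers[i + 1] - 1 if i + 1 < len(vers) else None)


def read(blk, ver):
    i, vers = resolve(blk, ver)
    return (blk, vers[i])                      # stand-in for the block contents


result, snapshot_range = read_object([0, 1, 2], 0, 3, get_cost, get_version_range, read)
print(result, snapshot_range)
```

Because each chosen snapshot lies inside the range left by the earlier blocks, the final range is never empty, and every block returned is the copy that a read at any snapshot in that range would see.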

[Figure: Average Read Latency (us), 0 to 200000, vs. # of Stale Versions @ start time, 1 to 8. Series: Primary, Local latest, Yogurt MR, Yogurt RMW.]

[Figure: y-axis 0 to 200000 vs. Key-Value Pair Size (4KB, 8KB, 12KB, 16KB, 20KB).]

[Figure: Average Latency (us), 0 to 7, vs. GetCost Query Size (# of queries): 32B (3), 64B (7), 128B (15), 256B (31), 512B (63), 1024B (127).]

• Cost querying overhead is negligible compared to disk and SSD access latencies

Other Possible StaleStores

• Single disk log-structured store
• SSD flash translation layers
• Log-structured arrays
• Durable write caches that are fast for writes but slow for reads
• Deduplicated systems with read caches
• Fine-grained logging over a block-grained cache
• Systems storing differences from previous versions