exalt nsdi clean - usenix · exalt:’empowering’researchers’to’...

23
Exalt: Empowering Researchers to Evaluate LargeScale Storage Systems Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and Mike Dahlin The University of Texas at Austin

Upload: others

Post on 22-Sep-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Exalt NSDI clean - USENIX · Exalt:’Empowering’Researchers’to’ Evaluate’Large8Scale’Storage’Systems’ Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and

Exalt:  Empowering  Researchers  to  Evaluate  Large-­‐Scale  Storage  Systems  

Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and Mike Dahlin The University of Texas at Austin

Page 2: Exalt NSDI clean - USENIX · Exalt:’Empowering’Researchers’to’ Evaluate’Large8Scale’Storage’Systems’ Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and

We need to evaluate our prototypes

Design Implementation

Evaluation

Page 3: Exalt NSDI clean - USENIX · Exalt:’Empowering’Researchers’to’ Evaluate’Large8Scale’Storage’Systems’ Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and

We need to evaluate our prototypes

Design Implementation

Evaluation

Page 4: Exalt NSDI clean - USENIX · Exalt:’Empowering’Researchers’to’ Evaluate’Large8Scale’Storage’Systems’ Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and

Industrial deployment: tens of PBs thousands of nodes

Researchers: hundreds of TBs hundreds of nodes

• Salus (Wang et al. NSDI 13): 108 servers • Eiger (Lloyd et al. NSDI 13): 256 servers • Spanner (Corbett et al. OSDI 12): Hundreds of servers

How does one validate the scalability of a storage system?

Page 5: Exalt NSDI clean - USENIX · Exalt:’Empowering’Researchers’to’ Evaluate’Large8Scale’Storage’Systems’ Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and

Extrapolation?

• Measure with a small cluster • Predict the behavior at full scale • Assumption: – Resource consumption grows linearly with scale

CPU

Network

100  nodes

10%

5%

Extrapolate:  The  system  can  scale  to  1,000  nodes.

Scale

Resource  uHlizaHon

Page 6: Exalt NSDI clean - USENIX · Exalt:’Empowering’Researchers’to’ Evaluate’Large8Scale’Storage’Systems’ Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and

Extrapolation?

• Measure with a small cluster • Predict the behavior at full scale • Assumption: May not hold – Resource consumption grows linearly with scale

CPU

Network

100  nodes

10%

Scale

Resource  uHlizaHon

Page 7: Exalt NSDI clean - USENIX · Exalt:’Empowering’Researchers’to’ Evaluate’Large8Scale’Storage’Systems’ Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and

Can we run prototypes at full scale?

7

Processes

Machines

Page 8: Exalt NSDI clean - USENIX · Exalt:’Empowering’Researchers’to’ Evaluate’Large8Scale’Storage’Systems’ Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and

Can we run prototypes at full scale?

• Colocate multiple processes on one node

8

Processes

Machines

Page 9: Exalt NSDI clean - USENIX · Exalt:’Empowering’Researchers’to’ Evaluate’Large8Scale’Storage’Systems’ Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and

Can we run prototypes at full scale?

• Colocate multiple processes on one node

9

Processes

Machines

Problem: Limited I/O resource

Page 10: Exalt NSDI clean - USENIX · Exalt:’Empowering’Researchers’to’ Evaluate’Large8Scale’Storage’Systems’ Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and

Data content doesn’t affect system behavior

!

• Clients can write/read synthetic data !

• Abstract away data on I/O devices !

• Reduce resource requirement of each process

Page 11: Exalt NSDI clean - USENIX · Exalt:’Empowering’Researchers’to’ Evaluate’Large8Scale’Storage’Systems’ Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and

How to abstract away data?

• Discard data? (David, Agrawal et al. FAST 2011) – Doesn’t work with large-scale storage systems

Upper layer (Bigtable, HBase, …)

Lower layer (GFS, HDFS, …)

Data

Metadata

• Our approach: Compress data

Treat all bytes as data

Page 12: Exalt NSDI clean - USENIX · Exalt:’Empowering’Researchers’to’ Evaluate’Large8Scale’Storage’Systems’ Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and

Requirements of compression

• CPU efficient – General-purpose algorithms (e.g. Gzip) are CPU heavy !

• High compression ratio !

• Lossless compression !

• Be able to work with mixed data and metadata

Page 13: Exalt NSDI clean - USENIX · Exalt:’Empowering’Researchers’to’ Evaluate’Large8Scale’Storage’Systems’ Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and

Challenge: Data mixed with metadata

• System may add metadata • System may split data (possibly nondeteministically)

MetadataClient data

Key: Locate metadata inside data

Page 14: Exalt NSDI clean - USENIX · Exalt:’Empowering’Researchers’to’ Evaluate’Large8Scale’Storage’Systems’ Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and

• Make data distinguishable from metadata –Flag: sequence of bytes that does not appear in metadata

• Efficiently locate metadata: Follow sorted pattern –Marker: number of remaining bytes to the end

Flag Marker

Solution: Tardis data pattern

1016 1008 1000 ... 0

1KB data chunk and 4-byte flags and markers

Tardis

Page 15: Exalt NSDI clean - USENIX · Exalt:’Empowering’Researchers’to’ Evaluate’Large8Scale’Storage’Systems’ Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and

Tardis compression

Search for flag

Retrieve marker Skip 504 bytes

Search for flag again

504 ... 1016 1008

Retrieve markerSkip 1016 bytes: Hit the end of chunk

504 ... 1016 1008

Tardis 1024:16

33,000 times faster than gzip

00

Tardis 512:512

Starting point Length

Original data

Compressed data

Page 16: Exalt NSDI clean - USENIX · Exalt:’Empowering’Researchers’to’ Evaluate’Large8Scale’Storage’Systems’ Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and

How to find an appropriate flag?

• Scan all metadata: Expensive !

• Observation: Tardis is only used for testing !

• A randomly chosen 8-byte flag works – HDFS – HBase

Page 17: Exalt NSDI clean - USENIX · Exalt:’Empowering’Researchers’to’ Evaluate’Large8Scale’Storage’Systems’ Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and

Testing with Tardis• Run potential bottleneck nodes in real mode. • Run most nodes in emulated mode.

……

Clients

Process

Network

Bottleneck

Page 18: Exalt NSDI clean - USENIX · Exalt:’Empowering’Researchers’to’ Evaluate’Large8Scale’Storage’Systems’ Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and

Implementation

• Emulated devices: disk, network, and memory !

• Disk and network: Transparent emulation – Byte code instrumentation (BCI) – Usage: java -Xbootclasspath exalt.jar <original app> !

• Memory: Require code modification – None for HDFS; 71 LOC for HBase

Page 19: Exalt NSDI clean - USENIX · Exalt:’Empowering’Researchers’to’ Evaluate’Large8Scale’Storage’Systems’ Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and

Case studies

• Apply our emulator to HDFS and HBase – Measure their scalability – When we find a problem, analyze its root cause, and fix it !

• Testbed: – Texas Advanced Computing Center (TACC)

Page 20: Exalt NSDI clean - USENIX · Exalt:’Empowering’Researchers’to’ Evaluate’Large8Scale’Storage’Systems’ Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and

Scalability of HDFS

Increase number of RPC threads

Put debug information in tmpfs

Same as reported by HDFS developers

Disable sync Put metadata in tmpfs

Page 21: Exalt NSDI clean - USENIX · Exalt:’Empowering’Researchers’to’ Evaluate’Large8Scale’Storage’Systems’ Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and

One problem of HDFS: Big files

HDFS performance degradation as file grows large.

long[] computeContentSummary(long[] summary) { long bytes = 0; for(Block blk : blocks) { bytes += blk.getNumBytes(); } summary[0] += bytes; …… }

Page 22: Exalt NSDI clean - USENIX · Exalt:’Empowering’Researchers’to’ Evaluate’Large8Scale’Storage’Systems’ Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and

Applying Exalt more broadly

• CPU intensive systems? –DieCast (Gupta et al. NSDI 2008) !

• Data sensitive applications/benchmarks? – Record (on a large testbed) and replay (on a small one) !

• The target system modifies data? – Ad-hoc solutions for de-duplication, encryption, etc

Page 23: Exalt NSDI clean - USENIX · Exalt:’Empowering’Researchers’to’ Evaluate’Large8Scale’Storage’Systems’ Yang Wang, Manos Kapritsos, Lara Schmidt, Lorenzo Alvisi, and

Conclusion

Industry

Researchers

https://code.google.com/p/exalt/