: tiering storage for data analytics in the cloudyuecheng/docs/hpdc15_talk.pdf · 2017-08-21 · 10...

Post on 21-Jun-2020

4 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

: Tiering Storage for Data Analytics in the Cloud Yue Cheng★, M. Safdar Iqbal★, Aayush Gupta†, Ali R. Butt★

Virginia Tech★, IBM Research – Almaden†

2

Cloud enables cost-efficient data analytics

Cloud infrastructure

Google Cloud BigTable

Google BigQuery

Amazon EMR

Microsoft Azure HDInsight

Sahara

3

Cloud storage enables data analytics in the cloud

Cloud infrastructure

Google Cloud BigTable

Google BigQuery

Amazon EMR

Microsoft Azure HDInsight

Sahara

4

Vast variety of cloud storage services

5

Vast variety of cloud storage services

Object storage

6

Vast variety of cloud storage services

Object storage

Network-attached block storage

7

Vast variety of cloud storage services

Object storage

Network-attached block storage

VM-local ephemeral storage

8

Heterogeneity in cloud storage services

Storage type

Capacity (GB/volume)

Throughput (MB/sec)

IOPS (4KB)

Cost ($/month)

ephSSD 375 733 100000 0.218×375

persSSD 100 48 3000 0.17×100

250 118 7500 0.17×250

500 234 15000 0.17×500

persHDD 100 20 150 0.04×100

250 45 375 0.04×250

500 97 750 0.04×500

objStore N/A 265 550 0.026/GB

ephSSD: VM-local ephemeral SSD, persSSD: Network-attached persistent SSD, persHDD: Network-attached persistent HDD, objStore: Google cloud object storage

9

Heterogeneity in cloud storage services

Storage type

Capacity (GB/volume)

Throughput (MB/sec)

IOPS (4KB)

Cost ($/month)

ephSSD 375 733 100000 0.218×375

persSSD 100 48 3000 0.17×100

250 118 7500 0.17×250

500 234 15000 0.17×500

persHDD 100 20 150 0.04×100

250 45 375 0.04×250

500 97 750 0.04×500

objStore N/A 265 550 0.026/GB

ephSSD: VM-local ephemeral SSD, persSSD: network-attached persistent SSD, persHDD: network-attached persistent HDD, objStore: Google cloud object storage ephSSD offers best performance w/o data persistence.

10

Heterogeneity in cloud storage services

Storage type

Capacity (GB/volume)

Throughput (MB/sec)

IOPS (4KB)

Cost ($/month)

ephSSD 375 733 100000 0.218×375

persSSD 100 48 3000 0.17×100

250 118 7500 0.17×250

500 234 15000 0.17×500

persHDD 100 20 150 0.04×100

250 45 375 0.04×250

500 97 750 0.04×500

objStore N/A 265 550 0.026/GB

ephSSD: VM-local ephemeral SSD, persSSD: network-attached persistent SSD, persHDD: network-attached persistent HDD, objStore: Google cloud object storage Performance of the network-attached block storage depends on

the size of the volume.

11

Heterogeneity in cloud storage services

Storage type

Capacity (GB/volume)

Throughput (MB/sec)

IOPS (4KB)

Cost ($/month)

ephSSD 375 733 100000 0.218×375

persSSD 100 48 3000 0.17×100

250 118 7500 0.17×250

500 234 15000 0.17×500

persHDD 100 20 150 0.04×100

250 45 375 0.04×250

500 97 750 0.04×500

objStore N/A 265 550 0.026/GB

ephSSD: VM-local ephemeral SSD, persSSD: network-attached persistent SSD, persHDD: network-attached persistent HDD, objStore: Google cloud object storage objStore provides the cheapest service and offers comparable

sequential throughput compared to that of a 500GB persSSD.

12

Heterogeneity in data analytics jobs

Application I/O-intensive CPU-intensive

Map Shuffle Reduce

Sort ✗ ✔ ✗ ✗

Join ✗ ✔ ✔ ✗

Grep ✔ ✗ ✗ ✗

KMeans ✗ ✗ ✗ ✔

13

Decision paralysis

Cloud tenant

How do I get the MOST BANG-for-the-

buck?

Highly heterogeneous cloud storage

services

Highly heterogeneous

analytics workloads

14

Motivation

¤ A need for a comprehensive experimental analysis ¤  To study the analytics-job to cloud-storage

relationships

¤ How to exploit heterogeneity in cloud storage and analytics workloads ¤  To reduce $ cost ¤  To improve performance ¤  To meet the deadline

15

Outline

Motivation

Quantitative analysis

CAST design

Evaluation

16

Outline

Motivation

Quantitative analysis

CAST design

Evaluation

17

Experimental study methodology

¤ Experiments on Google Cloud ¤ One n1-standard-16 VM (16 vCPUs, 60GB RAM)

Application I/O-intensive CPU-intensive

Map Shuffle Reduce

Sort ✗ ✔ ✗ ✗

Join ✗ ✔ ✔ ✗

Grep ✔ ✗ ✗ ✗

KMeans ✗ ✗ ✗ ✔

18

Experimental study methodology

¤ Experiments on Google Cloud ¤ One n1-standard-16 VM (16 vCPUs, 60GB RAM)

¤ Application granularity

¤ Workload granularity

Application I/O-intensive CPU-intensive

Map Shuffle Reduce

Sort ✗ ✔ ✗ ✗

Join ✗ ✔ ✔ ✗

Grep ✔ ✗ ✗ ✗

KMeans ✗ ✗ ✗ ✔

19

Application granularity: Performance

0

500

1000

1500

2000

2500

3000

Runt

ime

(se

c)

0

200

400

600

800

1000

1200

1400

1600

0

200

400

600

800

1000

1200

1400

1600

1800

0

200

400

600

800

1000

1200

1400

1600

Sort Join Grep KMeans Input download Output upload Processing

20

Application granularity: Performance

0

500

1000

1500

2000

2500

3000

Runt

ime

(se

c)

0

200

400

600

800

1000

1200

1400

1600

0

200

400

600

800

1000

1200

1400

1600

1800

0

200

400

600

800

1000

1200

1400

1600

Sort Join Grep KMeans Input download Output upload Processing

No storage service provides the best raw performance

21

Application granularity: Tenant utility

0

0.2

0.4

0.6

0.8

1

1.2

0

500

1000

1500

2000

2500

3000

Runt

ime

(se

c)

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

0

200

400

600

800

1000

1200

1400

1600

0

0.5

1

1.5

2

2.5

3

0

200

400

600

800

1000

1200

1400

1600

1800

0

0.5

1

1.5

2

2.5

3

0

200

400

600

800

1000

1200

1400

1600

No

rma

lize

d te

nant

util

ity

Tenant utility = 1T$

Sort Join Grep KMeans Input download Output upload Processing Tenant utility

Performance

$ cost

22

Application granularity: Tenant utility

0

0.2

0.4

0.6

0.8

1

1.2

0

500

1000

1500

2000

2500

3000

Runt

ime

(se

c)

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

0

200

400

600

800

1000

1200

1400

1600

0

0.5

1

1.5

2

2.5

3

0

200

400

600

800

1000

1200

1400

1600

1800

0

0.5

1

1.5

2

2.5

3

0

200

400

600

800

1000

1200

1400

1600

No

rma

lize

d te

nant

util

ity

Sort Join Grep KMeans Input download Output upload Processing Tenant utility

Slower storage, in some case, may provide higher utility & comparable performance

23

Workload granularity: Data reuse

0

0.2

0.4

0.6

0.8

1

1.2

1.4

No

rma

lize

d te

nant

util

ity

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

0

0.5

1

1.5

2

2.5

3

0

0.5

1

1.5

2

2.5

3

Sort Join Grep KMeans No reuse Reuse-lifetime (1-hr) Reuse-lifetime (1-week)

Tenant utility = 1T$

Performance

$ cost

24

Workload granularity: Data reuse

0

0.2

0.4

0.6

0.8

1

1.2

1.4

No

rma

lize

d te

nant

util

ity

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

0

0.5

1

1.5

2

2.5

3

0

0.5

1

1.5

2

2.5

3

Sort Join Grep KMeans No reuse Reuse-lifetime (1-hr) Reuse-lifetime (1-week)

Data reuse patterns affect data placement choices

25

Workload granularity: Inter dependency

26

Workload granularity: Inter dependency

2

2.2

2.4

2.6

2.8

3

3.2

7000 8000 9000 10000 11000 12000

Co

st (

$)

Total runtime (sec)

objStore

objStore objStore

objStore

deadline

objStore

27

Workload granularity: Inter dependency

2

2.2

2.4

2.6

2.8

3

3.2

7000 8000 9000 10000 11000 12000

Co

st (

$)

Total runtime (sec)

persSSD

persSSD

persSSD

persSSD

deadline

objStore persSSD

28

Workload granularity: Inter dependency

2

2.2

2.4

2.6

2.8

3

3.2

7000 8000 9000 10000 11000 12000

Co

st (

$)

Total runtime (sec)

objStore

objStore ephSSD

ephSSD

objStore

deadline

objStore persSSD

objStore+ephSSD

29

Workload granularity: Inter dependency

2

2.2

2.4

2.6

2.8

3

3.2

7000 8000 9000 10000 11000 12000

Co

st (

$)

Total runtime (sec)

objStore

objStore ephSSD

persSSD

deadline

objStore persSSD

objStore+ephSSD

objStore+ephSSD+persSSD

30

Workload granularity: Inter dependency

2

2.2

2.4

2.6

2.8

3

3.2

7000 8000 9000 10000 11000 12000

Co

st (

$)

Total runtime (sec)

objStore

objStore ephSSD

persSSD

deadline

objStore persSSD

objStore+ephSSD

objStore+ephSSD+persSSD

Complex inter-job dependencies require rethinking about use of multiple storage services

31

CAST: Cloud Analytics Storage Tiering

Different cloud storage services

Heterogeneity in analytics workloads

...

Different application characteristics

Data reuse across jobs Inter-job dependency

...

CAST exploits

32

CAST: Cloud Analytics Storage Tiering

Different cloud storage services

...

Different application characteristics

Data reuse across jobs Inter-job dependency

...

CAST

Heterogeneity in analytics workloads

exploits

33

Outline

Motivation

Quantitative analysis

CAST design

Evaluation

34

CAST framework

Job performance estimator

Tiering solver

Job profiles

Analytics workload spec’s

Job-storage relationships

Cloud service spec’s

Tenant goals

Frameworks

Dryad ...

...

35

CAST framework

Job performance estimator

Tiering solver

Job profiles

Analytics workload spec’s

Job-storage relationships

Cloud service spec’s

Tenant goals

Frameworks

Dryad ...

...

36

CAST framework

Job performance estimator

Tiering solver

Job profiles

Analytics workload spec’s

Job-storage relationships

Cloud service spec’s

J0: <S0,C0> J1: <S1,C1>

...

objStore persSSD ephSSD persHDD

Generated job assignment tuples

Tier the cloud storage, execute the workload

Frameworks

Dryad ...

Tenant goals

37

Job performance estimator Job performance estimator

Tiering solver

EST R̂, M̂ (si,Li )( ) = mnvm ⋅mc

"

##

$

%%⋅

inputim

bwmapsi

&

'

(((

)

*

++++

rnvm ⋅ rc

"

##

$

%%⋅interi

rbwshuffle

si

&

'

(((

)

*

+++

+r

nvm ⋅ rc

"

##

$

%%⋅

ouputir

bwreducesi

&

'

(((

)

*

+++

Map phase Shuffle phase

Reduce phase

# waves Runtime/wave

* MRCute: Bazaar [SoCC’12]

38

Tiering solver Job performance estimator

Tiering solver

¤ Optimization ¤ Objective function

max Tenant utility =1T

($vm +$store )

39

Tiering solver Job performance estimator

Tiering solver

¤ Optimization ¤ Objective function

¤ Constraints

=1T

($vm +$store )max Tenant utility

ci ≥ (Ii +Mi +Oi ) (∀i ∈ J )

T = REG si,C[si ], R̂, L̂i( )i=1

J

∑ , where si ∈ F

$vm = nvm ⋅ (Pvm ⋅T )

$store = C[ f ]⋅ (Pstore[ f ]⋅ T 60"#

$%)( )

f

F

40

Tiering solver Job performance estimator

Tiering solver

¤ Optimization ¤ Objective function

¤ Constraints

=1T

($vm +$store )max Tenant utility

ci ≥ (Ii +Mi +Oi ) (∀i ∈ J )

T = REG si,C[si ], R̂, L̂i( )i=1

J

∑ , where si ∈ F

$vm = nvm ⋅ (Pvm ⋅T )

$store = C[ f ]⋅ (Pstore[ f ]⋅ T 60"#

$%)( )

f

F

Space capacity constraint

41

Tiering solver Job performance estimator

Tiering solver

¤ Optimization ¤ Objective function

¤ Constraints

=1T

($vm +$store )max Tenant utility

ci ≥ (Ii +Mi +Oi ) (∀i ∈ J )

T = REG si,C[si ], R̂, L̂i( )i=1

J

∑ , where si ∈ F

$vm = nvm ⋅ (Pvm ⋅T )

$store = C[ f ]⋅ (Pstore[ f ]⋅ T 60"#

$%)( )

f

F

Total workload runtime

42

Tiering solver Job performance estimator

Tiering solver

¤ Optimization ¤ Objective function

¤ Constraints

=1T

($vm +$store )max Tenant utility

ci ≥ (Ii +Mi +Oi ) (∀i ∈ J )

T = REG si,C[si ], R̂, L̂i( )i=1

J

∑ , where si ∈ F

$vm = nvm ⋅ (Pvm ⋅T )

$store = C[ f ]⋅ (Pstore[ f ]⋅ T 60"#

$%)( )

f

F

VM $ cost

43

Tiering solver Job performance estimator

Tiering solver

¤ Optimization ¤ Objective function

¤ Constraints

=1T

($vm +$store )max Tenant utility

ci ≥ (Ii +Mi +Oi ) (∀i ∈ J )

T = REG si,C[si ], R̂, L̂i( )i=1

J

∑ , where si ∈ F

$vm = nvm ⋅ (Pvm ⋅T )

$store = C[ f ]⋅ (Pstore[ f ]⋅ T 60"#

$%)( )

f

F

∑ Storage $ cost

44

Tiering solver Job performance estimator

Tiering solver

¤ Optimization ¤ Objective function

¤ Constraints

=1T

($vm +$store )max Tenant utility

ci ≥ (Ii +Mi +Oi ) (∀i ∈ J )

T = REG si,C[si ], R̂, L̂i( )i=1

J

∑ , where si ∈ F

$vm = nvm ⋅ (Pvm ⋅T )

$store = C[ f ]⋅ (Pstore[ f ]⋅ T 60"#

$%)( )

f

F

Tuning knob: Capacity of Ji Tuning knob:

Storage service of Ji

J0: <s0,c0> J1: <s1,c1> J2: <s2,c2> ...

Assigned job storage, adjusted storage capacity

Simulated annealing

45

Enhancements: CAST++

¤ Enhancement 1: Data reuse awareness ¤ All jobs sharing the same dataset have the same

storage service assigned to them

¤ Enhancement 2: Workflow awareness ¤ Objective

¤ Constraints

Depth-first traversal in workflow DAG for allocating storage capacities

min $total = $vm +$store

T ≤ deadline

46

Outline

Motivation

Quantitative analysis

CAST design

Evaluation

47

Methodology

¤ 400-core Hadoop cluster in Google Cloud ¤  25 n1-standard-16 VM (16 vCPUs, 60GB RAM)

¤ Tenant utility measurement ¤ CAST: Effectiveness for general workloads

¤ CAST++: Effectiveness for data reuse

¤ Meeting workflow deadlines with CAST++

48

Tenant utility improvement

ephSSD 100%

persSSD 100%

persHDD 100%

objStore 100%

Greedy exact-fit

Greedy over-prov

CAST

CAST++

0

0.2

0.4

0.6

0.8

1

1.2

1.4

No

rma

lize

d te

nant

util

ity

Deployment configuration

100-job Hadoop workload, simulating behaviors of Facebook’s 3000-machine Hadoop cluster

49

Tenant utility improvement

ephSSD 100%

persSSD 100%

persHDD 100%

objStore 100%

Greedy exact-fit

Greedy over-prov

CAST

CAST++

0

0.2

0.4

0.6

0.8

1

1.2

1.4

No

rma

lize

d te

nant

util

ity

Deployment configuration

100-job Hadoop workload, simulating behaviors of Facebook’s 3000-machine Hadoop cluster

No tiering involved

50

Tenant utility improvement

ephSSD 100%

persSSD 100%

persHDD 100%

objStore 100%

Greedy exact-fit

Greedy over-prov

CAST

CAST++

0

0.2

0.4

0.6

0.8

1

1.2

1.4

No

rma

lize

d te

nant

util

ity

Deployment configuration

100-job Hadoop workload, simulating behaviors of Facebook’s 3000-machine Hadoop cluster

Greedy algo’s

51

Tenant utility improvement

ephSSD 100%

persSSD 100%

persHDD 100%

objStore 100%

Greedy exact-fit

Greedy over-prov

CAST

CAST++

0

0.2

0.4

0.6

0.8

1

1.2

1.4

No

rma

lize

d te

nant

util

ity

Deployment configuration

100-job Hadoop workload, simulating behaviors of Facebook’s 3000-machine Hadoop cluster

CAST tiering

52

Tenant utility improvement

ephSSD 100%

persSSD 100%

persHDD 100%

objStore 100%

Greedy exact-fit

Greedy over-prov

CAST

CAST++

0

0.2

0.4

0.6

0.8

1

1.2

1.4

No

rma

lize

d te

nant

util

ity

Deployment configuration

100-job Hadoop workload, simulating behaviors of Facebook’s 3000-machine Hadoop cluster

14%

53

$ cost vs. runtime

0

50

100

150

200

250

300

350

400

450

0

50

100

150

200

250

300

ephSSD 100%

persSSD 100%

persHDD 100%

objStore 100%

Greedy exact-fit

Greedy over-prov

CAST CAST++

Runt

ime

(m

in)

Co

st (

$)

Cost Runtime

100-job Hadoop workload, simulating behaviors of Facebook’s 3000-machine Hadoop cluster

54

$ cost vs. runtime

0

50

100

150

200

250

300

350

400

450

0

50

100

150

200

250

300

ephSSD 100%

persSSD 100%

persHDD 100%

objStore 100%

Greedy exact-fit

Greedy over-prov

CAST CAST++

Runt

ime

(m

in)

Co

st (

$)

Cost Runtime

100-job Hadoop workload, simulating behaviors of Facebook’s 3000-machine Hadoop cluster

55

$ cost vs. runtime

0

50

100

150

200

250

300

350

400

450

0

50

100

150

200

250

300

ephSSD 100%

persSSD 100%

persHDD 100%

objStore 100%

Greedy exact-fit

Greedy over-prov

CAST CAST++

Runt

ime

(m

in)

Co

st (

$)

Cost Runtime

100-job Hadoop workload, simulating behaviors of Facebook’s 3000-machine Hadoop cluster

56

Meeting workflow deadlines

0

0.2

0.4

0.6

0.8

1

1.2

0

10

20

30

40

50

60

70

80

90

ephSSD 100%

persSSD 100%

persHDD 100%

objStore 100%

CAST CAST++

De

ad

line

mis

s ra

te

Co

st (

$)

Cost Deadline misses

A workload consisting of 5 workflows, with a total of 31 analytics jobs

57

Meeting workflow deadlines

0

0.2

0.4

0.6

0.8

1

1.2

0

10

20

30

40

50

60

70

80

90

ephSSD 100%

persSSD 100%

persHDD 100%

objStore 100%

CAST CAST++

De

ad

line

mis

s ra

te

Co

st (

$)

Cost Deadline misses

A workload consisting of 5 workflows, with a total of 31 analytics jobs

58

Meeting workflow deadlines

0

0.2

0.4

0.6

0.8

1

1.2

0

10

20

30

40

50

60

70

80

90

ephSSD 100%

persSSD 100%

persHDD 100%

objStore 100%

CAST CAST++

De

ad

line

mis

s ra

te

Co

st (

$)

Cost Deadline misses

A workload consisting of 5 workflows, with a total of 31 analytics jobs

59

Conclusion

¤ CAST performs storage allocation and data placement for cloud analytics workloads ¤  Leverages performance and pricing models of cloud

storage services ¤  Leverages analytics workload heterogeneity

¤ CAST++ enhancements detect data reuse and inter-job dependencies ¤  To further improve tenant utility

¤  To effectively meet deadlines while minimizing $ cost

60

http://research.cs.vt.edu/dssl/

Yue Cheng M. Iqbal Safdar Aayush Gupta Ali R. Butt

61

62

Backup Slides

63

Heterogeneity in cloud storage services

Storage type

Capacity (GB/volume)

Throughput (MB/sec)

IOPS (4KB)

Cost ($/month)

ephSSD 375 733 100000 0.218×375

persSSD 100 48 3000 0.17×100

250 118 7500 0.17×250

500 234 15000 0.17×500

persHDD 100 20 150 0.04×100

250 45 375 0.04×250

500 97 750 0.04×500

objStore N/A 265 550 0.026/GB

ephSSD: VM-local ephemeral SSD, persSSD: network-attached persistent SSD, persHDD: network-attached persistent HDD, objStore: Google cloud object storage A 500GB persSSD provides 1.4X higher throughput & 19X higher

IOPS than a 500GB persHDD.

64

Straggler issue in fine-grained tiering

0%

100%

200%

300%

400%

500%

ephSSD 100% persSSD 100% persHDD 100% ephSSD 50%persSSD 50%

ephSSD 50%persHDD 50%

Nor

mal

ized

runt

ime

0%

100%

200%

300%

400%

500%

0% 30% 70% 90% 100%

Nor

mal

ized

runt

ime

%-age data stored in ephSSD

persHDD 100% ephSSD 30%persHDD 70%

ephSSD 70%persHDD 30%

ephSSD 90%persHDD 10%

ephSSD 100%

65

Tiering solver Job performance estimator

Tiering solver

¤ Optimization ¤ Objective function

¤ Constraints

=1T

($vm +$store )max Tenant utility

ci ≥ (Ii +Mi +Oi ) (∀i ∈ J )

T = REG si,C[si ], R̂, L̂i( )i=1

J

∑ , where si ∈ F

$vm = nvm ⋅ (Pvm ⋅T )

$store = C[ f ]⋅ (Pstore[ f ]⋅ T 60"#

$%)( )

f

F

Tuning knob: capacity of Ji

Input size of Ji

Intermediate data size of Ji

Output size of Ji

Tuning knob: Storage service of Ji

66

Hadoop traces from Facebook

¤ More than 99% of data touched by large jobs that incur most of the storage cost

¤ The aggregated data size for small jobs is only 0.1% of the total dataset size

¤ We focus on large jobs that have enough # mappers & reducers to fully utilize the cluster computing capacity

67

Storage capacity/service breakdown

0

0.2

0.4

0.6

0.8

1

1.2

ephSSD 100%

persSSD 100%

persHDD 100%

objStore 100%

Greedy exact-fit

Greedy over-prov

CAST CAST++

No

rma

lize

d c

ap

ac

ity

ephSSD persSSD persHDD objStore

top related