CARDIO: Cost-Aware Replication for Data-Intensive workflOws
Presented by Chen He
Motivation
• Is a large-scale cluster reliable?
  – 5 average worker deaths per MapReduce job
  – At least 1 disk failure in every run of a 6-hour MapReduce job on a 4,000-node cluster
Motivation
• How can we prevent node failures from affecting performance?
  – Replication
    • Capacity constraints
    • Replication time, etc.
  – Regeneration through re-execution
    • Delays program progress
    • Cascaded re-execution
Motivation
[Figure: the trade-off between AVAILABILITY and COST. All pictures adapted from the Internet.]
Outline
• Problem Exploration
• CARDIO Model
• Hadoop CARDIO System
• Evaluation
• Discussion
Problem Exploration
• Performance Costs
  – Replication cost (R)
  – Regeneration cost (G)
  – Reliability cost (Z)
  – Execution cost (A)
  – Total cost (T)
  – Disk cost (Y)
• T = A + Z,  Z = R + G
Problem Exploration
• Experiment Environment
  – Hadoop 0.20.2
  – 25 VMs
  – Workloads: Tagger -> Join -> Grep -> RecordCounter
Problem Exploration Summary
• Replication Factor for MR Stages
Problem Exploration Summary
• Detailed Execution Time of 3 Cases
CARDIO Model
• Block Failure Model
  – Output of stage i is D_i
  – Replication factor is x_i
  – Total block number is b_i
  – Single block failure probability is p
  – Failure probability in stage i: f_i(x_i) = 1 - (1 - p^{x_i})^{b_i}
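With the deck's default p = 0.2 and an assumed block count of b_i = 10 (illustrative, not from the slides), the model already shows why replication matters:

```latex
% p = 0.2 is the deck's default failure rate; b_i = 10 is an assumed block count.
\[
  f_i(x_i) = 1 - \bigl(1 - p^{x_i}\bigr)^{b_i}
\]
\[
  f_i(1) = 1 - (1 - 0.2)^{10} \approx 0.893,
  \qquad
  f_i(3) = 1 - (1 - 0.008)^{10} \approx 0.077
\]
% A single copy of a 10-block output is almost certain to lose a block,
% while three replicas cut the failure probability by an order of magnitude.
```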
CARDIO Model
• Cost Computation Model
  – Total time of stage i: T_i = A_i + R_i + G_{i-1}
  – Replication cost of stage i: R_i = x_i · Y_i
  – Expected regeneration time of stage i: G_i = f_i(x_i) · T_i
  – Reliability cost for all stages: Z = Σ_{i=1}^{n} R_i + Σ_{i=1}^{n} G_i
  – Storage constraint C of all stages: Σ_{i=1}^{n} x_i · Y_i ≤ C
  – Choose X = {x_1, x_2, ..., x_n} to minimize Z
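As a concrete reading of these formulas, here is a minimal Java sketch of the cost recursion. The class and field names, and the G_0 = 0 boundary for the first stage, are my assumptions, not code from the paper:

```java
// Sketch of the per-stage cost model on this slide (names assumed):
// f_i(x_i) = 1 - (1 - p^x_i)^b_i, R_i = x_i * Y_i, G_i = f_i(x_i) * T_i,
// T_i = A_i + R_i + G_{i-1}, Z = sum_i (R_i + G_i).
public class CardioCostModel {
    final double p;          // single-block failure probability
    final double[] A, Y;     // execution cost and per-replica disk cost per stage
    final int[] b;           // block count of each stage's output

    CardioCostModel(double p, double[] A, double[] Y, int[] b) {
        this.p = p; this.A = A; this.Y = Y; this.b = b;
    }

    double failureProb(int stage, int x) {
        return 1.0 - Math.pow(1.0 - Math.pow(p, x), b[stage]);
    }

    /** Reliability cost Z: replication plus expected regeneration over all stages. */
    double reliabilityCost(int[] x) {
        double z = 0.0, prevG = 0.0;             // G_0 = 0 (assumed boundary)
        for (int i = 0; i < A.length; i++) {
            double r = x[i] * Y[i];              // replication cost R_i
            double t = A[i] + r + prevG;         // total time T_i
            double g = failureProb(i, x[i]) * t; // expected regeneration G_i
            z += r + g;
            prevG = g;
        }
        return z;
    }
}
```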
CARDIO Model
• Dynamic Replication
  – The replication factor x may vary as the program progresses
  – When the job is in step k, the replication factor at this step is x_i(k), i = 1, 2, ..., k, for k = 1, 2, ..., n
CARDIO Model
• Model for Reliability
  – Minimize Z = Σ_{k=1}^{n} R(k) + Σ_{k=2}^{n} G(k)
  – Based on X = {x_1(k), x_2(k), ..., x_n(k)}
  – In the condition of Y(k) = Σ_{i=1}^{k} x_i(k) · Y_i ≤ C
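The deck does not show how this minimization is carried out. The brute-force search below is purely illustrative of the problem's shape; it reuses the CardioCostModel sketch from earlier, and the bound xMax on replication factors is an assumption:

```java
// Exhaustive search over replication vectors, illustrating the constrained
// minimization above. Not the paper's solver: xMax and the enumeration
// strategy are assumptions for the sketch.
static int[] minimizeZ(CardioCostModel m, double C, int xMax) {
    int n = m.A.length;
    int[] x = new int[n];
    int[] best = null;
    double bestZ = Double.POSITIVE_INFINITY;
    java.util.Arrays.fill(x, 1);                 // start from one copy each
    while (true) {
        double storage = 0.0;
        for (int i = 0; i < n; i++) storage += x[i] * m.Y[i];
        if (storage <= C) {                      // storage constraint: sum x_i*Y_i <= C
            double z = m.reliabilityCost(x);
            if (z < bestZ) { bestZ = z; best = x.clone(); }
        }
        int i = 0;                               // odometer-style increment of x
        while (i < n && ++x[i] > xMax) x[i++] = 1;
        if (i == n) break;                       // all combinations visited
    }
    return best;                                 // null if no feasible X exists
}
```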
CARDIO Model
• Resource Utilization Model
  – Model cost = resources utilized
  – Q resource types
    • CPU, network, disk, and storage resources, etc.
  – Utilization of resource q in stage i: u_{i,q}, q = 1, 2, ..., Q
  – Normalize usage by: ū_{i,q} = u_{i,q} / Σ_{j=1}^{n} u_{j,q}, i = 1, 2, ..., n
  – Relative cost weights: w_q, q = 1, 2, ..., Q
CARDIO Model
• Resource Utilization Model
  – The cost for A is: A_i = Σ_{q=1}^{Q} w_q · ū_{i,q}
  – Total cost: T = A + Z = Σ_{i=1}^{n} Σ_{q=1}^{Q} w_q · ū_{i,q} + Σ_{i=1}^{n} R′_i + Σ_{i=1}^{n} G′_i
  – Optimization target: choose X = {x_1(k), x_2(k), ..., x_n(k)} to minimize T
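A small sketch of the normalization and weighted execution cost defined on these two slides; the array layout (stages × resource types) is my assumption:

```java
// uBar[i][q] = u[i][q] / sum_j u[j][q];  A_i = sum_q w[q] * uBar[i][q].
static double[] weightedExecutionCosts(double[][] u, double[] w) {
    int n = u.length, Q = w.length;
    double[] colSum = new double[Q];             // sum_j u[j][q] per resource q
    for (double[] row : u)
        for (int q = 0; q < Q; q++) colSum[q] += row[q];
    double[] A = new double[n];
    for (int i = 0; i < n; i++)
        for (int q = 0; q < Q; q++)
            A[i] += w[q] * (u[i][q] / colSum[q]); // w_q * uBar_{i,q}
    return A;
}
```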
CARDIO Model
• Optimization Problem
  – Job optimality (JO)
  – Stage optimality (SO)
Hadoop CARDIO System
• CardioSense
  – Obtains progress from the JobTracker (JT) periodically
  – Triggered by a pre-configured threshold value
  – Collects resource usage statistics for running stages
  – Relies on HMon on each worker node
    • HMon, based on Atop, has low overhead
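The deck does not show CardioSense's implementation. Below is a minimal sketch of periodic progress polling, assuming the Hadoop 0.20-era JobClient API; the threshold handling and surrounding glue are hypothetical:

```java
// Sketch of CardioSense-style progress polling against the JobTracker.
// Assumes Hadoop 0.20.x; threshold and the solver hook are illustrative.
import java.io.IOException;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobStatus;
import org.apache.hadoop.mapred.RunningJob;

public class ProgressPoller {
    public static void poll(JobConf conf, float threshold) throws IOException {
        JobClient client = new JobClient(conf);
        for (JobStatus status : client.jobsToComplete()) {
            RunningJob job = client.getJob(status.getJobID());
            // Trigger CardioSolve once a stage passes the configured threshold.
            if (job != null && job.mapProgress() >= threshold) {
                System.out.println(job.getID() + " passed " + threshold);
                // ... collect HMon statistics and invoke the solver here ...
            }
        }
    }
}
```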
Hadoop CARDIO System
• CardioSolve
  – Receives data from CardioSense
  – Solves the SO problem
  – Decides the replication factors for the current and previous stages
Hadoop CARDIO System
• CardioAct
  – Implements the commands from CardioSolve
  – Uses the HDFS API setReplication(file, replicaNumber)
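setReplication() is the one API the deck names explicitly. A minimal usage sketch; the path and replication factor are illustrative:

```java
// Changing a file's replication factor via the HDFS FileSystem API.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CardioActExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // setReplication only updates metadata; the NameNode re-replicates
        // (or deletes excess replicas) asynchronously in the background.
        boolean ok = fs.setReplication(new Path("/cardio/stage2/output"), (short) 3);
        System.out.println("replication change accepted: " + ok);
    }
}
```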
Hadoop CARDIO System
Evaluation
• Several Important Parameters
  – p is the failure rate, 0.2 if not specified
  – δ is the time to replicate a data unit, 0.2 as well
  – C_i is the computation resource of stage i; it follows a uniform distribution U(1, Cmax), Cmax = 100 in general
  – D_i is the output of stage i; it is obtained from a uniform distribution U(1, Dmax), where Dmax varies within [1, Cmax]
  – C is the storage constraint for the whole process. Default value is
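For concreteness, a tiny sketch of how the synthetic parameters above could be sampled; the stage count n, Dmax, and seed are illustrative choices, not values from the deck:

```java
// Sampling C_i ~ U(1, Cmax) and D_i ~ U(1, Dmax) for a synthetic workflow.
import java.util.Random;

public class WorkloadGen {
    public static void main(String[] args) {
        int n = 4, cMax = 100, dMax = 50;   // n and Dmax chosen for illustration
        Random rng = new Random(42);
        double[] c = new double[n], d = new double[n];
        for (int i = 0; i < n; i++) {
            c[i] = 1 + rng.nextDouble() * (cMax - 1);  // computation resource C_i
            d[i] = 1 + rng.nextDouble() * (dMax - 1);  // stage output size D_i
        }
        System.out.println(java.util.Arrays.toString(c));
        System.out.println(java.util.Arrays.toString(d));
    }
}
```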
Evaluation
• Effect of Dmax
Evaluation
• Effect of Failure rate p
Evaluation
• Effect of block size
Evaluation
• Effect of different resource constraints
  – "++" means over-utilized, and this type of resource is regarded as expensive
  – p = 0.08, C = 204 GB, delta = 0.6
  – S3 is CPU-intensive
  – DSK has a similar performance pattern to NET
  – CPU 0010, NET 0011, DSKIO 0011, STG 0011
Evaluation
• S2 re-executes more frequently under the failure injection, because it has a large data output
• p = 0.02, 0.08 and 0.1; 1, 3, 21
• API reason
Discussion
• Problems
  – Typos and misleading symbols
  – HDFS API setReplication()
• Any other ideas?