CARDIO: Cost-Aware Replication for Data-Intensive workflOws
Presented by Chen He
Motivation
• Is a large-scale cluster reliable?
  – 5 average worker deaths per MapReduce job
  – At least 1 disk failure in every run of a 6-hour MapReduce job on a 4,000-node cluster
Motivation
• How can we prevent node failures from affecting performance?
  – Replication
    • Capacity constraints
    • Replication time, etc.
  – Regeneration through re-execution
    • Delays program progress
    • Cascaded re-execution
Motivation
[Figure: the trade-off between AVAILABILITY and COST. All pictures adapted from the Internet.]
Outline
• Problem Exploration
• CARDIO Model
• Hadoop CARDIO System
• Evaluation
• Discussion
Problem Exploration
• Performance Costs
  – Replication cost (R)
  – Regeneration cost (G)
  – Reliability cost (Z)
  – Execution cost (A)
  – Total cost (T)
  – Disk cost (Y)
• T = A + Z,  Z = R + G
Problem Exploration
• Experiment Environment
  – Hadoop 0.20.2
  – 25 VMs
  – Workloads: Tagger -> Join -> Grep -> RecordCounter
Problem Exploration Summary
• Replication Factor for MR Stages
Problem Exploration Summary
• Detailed Execution Time of 3 Cases
CARDIO Model
• Block Failure Model
  – Output of stage i is D_i
  – Replication factor is x_i
  – Total block number is b_i
  – Single block failure probability is p
  – Failure probability in stage i: f_i(x_i) = 1 - (1 - p^{x_i})^{b_i}
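With the deck's default p = 0.2 and an assumed block count of b_i = 10 (illustrative, not from the slides), the model already shows why replication matters:

```latex
% p = 0.2 is the deck's default failure rate; b_i = 10 is an assumed block count.
\[
  f_i(x_i) = 1 - \bigl(1 - p^{x_i}\bigr)^{b_i}
\]
\[
  f_i(1) = 1 - (1 - 0.2)^{10} \approx 0.893,
  \qquad
  f_i(3) = 1 - (1 - 0.008)^{10} \approx 0.077
\]
% A single copy of a 10-block output is almost certain to lose a block,
% while three replicas cut the failure probability by an order of magnitude.
```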
CARDIO Model
• Cost Computation Model
  – Total time of stage i: T_i = A_i + R_i + G_{i-1}
  – Replication cost of stage i: R_i = x_i · Y_i
  – Expected regeneration time of stage i: G_i = f_i(x_i) · T_i
  – Reliability cost for all stages: Z = Σ_{i=1}^{n} R_i + Σ_{i=1}^{n} G_i
  – Storage constraint C of all stages: Σ_{i=1}^{n} x_i · Y_i ≤ C
  – Choose X = {x_1, x_2, ..., x_n} to minimize Z
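As a concrete reading of these formulas, here is a minimal Java sketch of the cost recursion. The class and field names, and the G_0 = 0 boundary for the first stage, are my assumptions, not code from the paper:

```java
// Sketch of the per-stage cost model on this slide (names assumed):
// f_i(x_i) = 1 - (1 - p^x_i)^b_i, R_i = x_i * Y_i, G_i = f_i(x_i) * T_i,
// T_i = A_i + R_i + G_{i-1}, Z = sum_i (R_i + G_i).
public class CardioCostModel {
    final double p;          // single-block failure probability
    final double[] A, Y;     // execution cost and per-replica disk cost per stage
    final int[] b;           // block count of each stage's output

    CardioCostModel(double p, double[] A, double[] Y, int[] b) {
        this.p = p; this.A = A; this.Y = Y; this.b = b;
    }

    double failureProb(int stage, int x) {
        return 1.0 - Math.pow(1.0 - Math.pow(p, x), b[stage]);
    }

    /** Reliability cost Z: replication plus expected regeneration over all stages. */
    double reliabilityCost(int[] x) {
        double z = 0.0, prevG = 0.0;             // G_0 = 0 (assumed boundary)
        for (int i = 0; i < A.length; i++) {
            double r = x[i] * Y[i];              // replication cost R_i
            double t = A[i] + r + prevG;         // total time T_i
            double g = failureProb(i, x[i]) * t; // expected regeneration G_i
            z += r + g;
            prevG = g;
        }
        return z;
    }
}
```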
CARDIO Model
• Dynamic Replication
  – The replication factor x may vary as the program progresses
  – When the job is in step k, the replication factor at this step is x_i(k), i = 1, 2, ..., k, for k = 1, 2, ..., n
CARDIO Model
• Model for Reliability
  – Minimize Z = Σ_{k=1}^{n} R(k) + Σ_{k=2}^{n} G(k)
  – Based on X = {x_1(k), x_2(k), ..., x_n(k)}
  – In the condition of Y(k) = Σ_{i=1}^{k} x_i(k) · Y_i ≤ C
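The deck does not show how this minimization is carried out. The brute-force search below is purely illustrative of the problem's shape; it reuses the CardioCostModel sketch from earlier, and the bound xMax on replication factors is an assumption:

```java
// Exhaustive search over replication vectors, illustrating the constrained
// minimization above. Not the paper's solver: xMax and the enumeration
// strategy are assumptions for the sketch.
static int[] minimizeZ(CardioCostModel m, double C, int xMax) {
    int n = m.A.length;
    int[] x = new int[n];
    int[] best = null;
    double bestZ = Double.POSITIVE_INFINITY;
    java.util.Arrays.fill(x, 1);                 // start from one copy each
    while (true) {
        double storage = 0.0;
        for (int i = 0; i < n; i++) storage += x[i] * m.Y[i];
        if (storage <= C) {                      // storage constraint: sum x_i*Y_i <= C
            double z = m.reliabilityCost(x);
            if (z < bestZ) { bestZ = z; best = x.clone(); }
        }
        int i = 0;                               // odometer-style increment of x
        while (i < n && ++x[i] > xMax) x[i++] = 1;
        if (i == n) break;                       // all combinations visited
    }
    return best;                                 // null if no feasible X exists
}
```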
CARDIO Model
• Resource Utilization Model
  – Model cost = resources utilized
  – Q resource types
    • CPU, network, disk, and storage resources, etc.
  – Utilization of resource q in stage i: u_{i,q}, q = 1, 2, ..., Q
  – Normalize usage by: ū_{i,q} = u_{i,q} / Σ_{j=1}^{n} u_{j,q}, i = 1, 2, ..., n
  – Relative cost weights: w_q, q = 1, 2, ..., Q
CARDIO Model
• Resource Utilization Model
  – The cost for A is: A_i = Σ_{q=1}^{Q} w_q · ū_{i,q}
  – Total cost: T = A + Z = Σ_{i=1}^{n} Σ_{q=1}^{Q} w_q · ū_{i,q} + Σ_{i=1}^{n} R′_i + Σ_{i=1}^{n} G′_i
  – Optimization target: choose X = {x_1(k), x_2(k), ..., x_n(k)} to minimize T
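A small sketch of the normalization and weighted execution cost defined on these two slides; the array layout (stages × resource types) is my assumption:

```java
// uBar[i][q] = u[i][q] / sum_j u[j][q];  A_i = sum_q w[q] * uBar[i][q].
static double[] weightedExecutionCosts(double[][] u, double[] w) {
    int n = u.length, Q = w.length;
    double[] colSum = new double[Q];             // sum_j u[j][q] per resource q
    for (double[] row : u)
        for (int q = 0; q < Q; q++) colSum[q] += row[q];
    double[] A = new double[n];
    for (int i = 0; i < n; i++)
        for (int q = 0; q < Q; q++)
            A[i] += w[q] * (u[i][q] / colSum[q]); // w_q * uBar_{i,q}
    return A;
}
```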
CARDIO Model
• Optimization Problem
  – Job optimality (JO)
  – Stage optimality (SO)
Hadoop CARDIO System
• CardioSense
  – Obtains progress from the JobTracker (JT) periodically
  – Triggered by a pre-configured threshold value
  – Collects resource usage statistics for running stages
  – Relies on HMon on each worker node
    • HMon, based on Atop, has low overhead
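The deck does not show CardioSense's implementation. Below is a minimal sketch of periodic progress polling, assuming the Hadoop 0.20-era JobClient API; the threshold handling and surrounding glue are hypothetical:

```java
// Sketch of CardioSense-style progress polling against the JobTracker.
// Assumes Hadoop 0.20.x; threshold and the solver hook are illustrative.
import java.io.IOException;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobStatus;
import org.apache.hadoop.mapred.RunningJob;

public class ProgressPoller {
    public static void poll(JobConf conf, float threshold) throws IOException {
        JobClient client = new JobClient(conf);
        for (JobStatus status : client.jobsToComplete()) {
            RunningJob job = client.getJob(status.getJobID());
            // Trigger CardioSolve once a stage passes the configured threshold.
            if (job != null && job.mapProgress() >= threshold) {
                System.out.println(job.getID() + " passed " + threshold);
                // ... collect HMon statistics and invoke the solver here ...
            }
        }
    }
}
```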
Hadoop CARDIO System
• CardioSolve
  – Receives data from CardioSense
  – Solves the SO problem
  – Decides the replication factors for the current and previous stages
Hadoop CARDIO System
• CardioAct
  – Implements the commands from CardioSolve
  – Uses the HDFS API setReplication(file, replicaNumber)
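setReplication() is the one API the deck names explicitly. A minimal usage sketch; the path and replication factor are illustrative:

```java
// Changing a file's replication factor via the HDFS FileSystem API.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CardioActExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // setReplication only updates metadata; the NameNode re-replicates
        // (or deletes excess replicas) asynchronously in the background.
        boolean ok = fs.setReplication(new Path("/cardio/stage2/output"), (short) 3);
        System.out.println("replication change accepted: " + ok);
    }
}
```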
Hadoop CARDIO System
Evaluation
• Several Important Parameters
  – p is the failure rate, 0.2 if not specified
  – δ is the time to replicate a data unit, 0.2 as well
  – C_i is the computation resource of stage i; it follows a uniform distribution U(1, Cmax), Cmax = 100 in general
  – D_i is the output of stage i; it is obtained from a uniform distribution U(1, Dmax), where Dmax varies within [1, Cmax]
  – C is the storage constraint for the whole process. Default value is
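For concreteness, a tiny sketch of how the synthetic parameters above could be sampled; the stage count n, Dmax, and seed are illustrative choices, not values from the deck:

```java
// Sampling C_i ~ U(1, Cmax) and D_i ~ U(1, Dmax) for a synthetic workflow.
import java.util.Random;

public class WorkloadGen {
    public static void main(String[] args) {
        int n = 4, cMax = 100, dMax = 50;   // n and Dmax chosen for illustration
        Random rng = new Random(42);
        double[] c = new double[n], d = new double[n];
        for (int i = 0; i < n; i++) {
            c[i] = 1 + rng.nextDouble() * (cMax - 1);  // computation resource C_i
            d[i] = 1 + rng.nextDouble() * (dMax - 1);  // stage output size D_i
        }
        System.out.println(java.util.Arrays.toString(c));
        System.out.println(java.util.Arrays.toString(d));
    }
}
```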
Evaluation
• Effect of Dmax
Evaluation
• Effect of Failure rate p
Evaluation
• Effect of block size
Evaluation
• Effect of different resource constraints
  – "++" means over-utilized, and this type of resource is regarded as expensive
  – p = 0.08, C = 204 GB, delta = 0.6
  – S3 is CPU-intensive
  – DSK has a similar performance pattern to NET
  – CPU 0010, NET 0011, DSKIO 0011, STG 0011
Evaluation
• S2 re-executes more frequently under the failure injection, because it has a large data output
• p = 0.02, 0.08 and 0.1; 1, 3, 21
• API reason
Discussion
• Problems
  – Typos and misleading symbols
  – HDFS API setReplication()
• Any other ideas?