![Page 1: Navraj Chohan Claris Casllo Mike Spreitzer Malgorzata ...Navraj Chohan1 Claris Casllo 2 Mike Spreitzer2 Malgorzata Steinder2 Asser Tantawi2 Chandra Krintz1 UC Santa Barbara 1 IBM Research2](https://reader036.vdocuments.site/reader036/viewer/2022071011/5fca027ffa43c271c84ff9ad/html5/thumbnails/1.jpg)
Navraj Chohan (1)
Claris Castillo (2)
Mike Spreitzer (2)
Malgorzata Steinder (2)
Asser Tantawi (2)
Chandra Krintz (1)
(1) UC Santa Barbara
(2) IBM Research
![Page 2: Navraj Chohan Claris Casllo Mike Spreitzer Malgorzata ...Navraj Chohan1 Claris Casllo 2 Mike Spreitzer2 Malgorzata Steinder2 Asser Tantawi2 Chandra Krintz1 UC Santa Barbara 1 IBM Research2](https://reader036.vdocuments.site/reader036/viewer/2022071011/5fca027ffa43c271c84ff9ad/html5/thumbnails/2.jpg)
Data Analytic Cloud
Instance Options
MapReduce
Spot Instances
Evaluation
![Page 3: Navraj Chohan Claris Casllo Mike Spreitzer Malgorzata ...Navraj Chohan1 Claris Casllo 2 Mike Spreitzer2 Malgorzata Steinder2 Asser Tantawi2 Chandra Krintz1 UC Santa Barbara 1 IBM Research2](https://reader036.vdocuments.site/reader036/viewer/2022071011/5fca027ffa43c271c84ff9ad/html5/thumbnails/3.jpg)
Public Cloud
DFS
Data Accelerators
![Page 4: Navraj Chohan Claris Casllo Mike Spreitzer Malgorzata ...Navraj Chohan1 Claris Casllo 2 Mike Spreitzer2 Malgorzata Steinder2 Asser Tantawi2 Chandra Krintz1 UC Santa Barbara 1 IBM Research2](https://reader036.vdocuments.site/reader036/viewer/2022071011/5fca027ffa43c271c84ff9ad/html5/thumbnails/4.jpg)
Different VM Sizes
Pricing Options
◦ On-demand
◦ Leased
◦ Spot Instances
![Page 5: Navraj Chohan Claris Casllo Mike Spreitzer Malgorzata ...Navraj Chohan1 Claris Casllo 2 Mike Spreitzer2 Malgorzata Steinder2 Asser Tantawi2 Chandra Krintz1 UC Santa Barbara 1 IBM Research2](https://reader036.vdocuments.site/reader036/viewer/2022071011/5fca027ffa43c271c84ff9ad/html5/thumbnails/5.jpg)
| Instance Type | EC2 Compute Units | Memory (GB) | Storage (GB) | On-Demand Price (per hr) |
|---|---|---|---|---|
| m1.small | 1 | 1.7 | 160 | $0.095 |
| c1.medium | 5 | 1.7 | 350 | $0.19 |
| m1.large | 4 | 7.5 | 850 | $0.380 |
| m2.xlarge | 6.5 | 17.1 | 420 | $0.570 |
| m1.xlarge | 8 | 15 | 1690 | $0.760 |
| c1.xlarge | 20 | 7 | 1690 | $0.760 |
| m2.2xlarge | 13 | 34.2 | 850 | $1.340 |
| m2.4xlarge | 26 | 68.4 | 1690 | $2.68 |

Pricing from http://aws.amazon.com/ec2/
![Page 6: Navraj Chohan Claris Casllo Mike Spreitzer Malgorzata ...Navraj Chohan1 Claris Casllo 2 Mike Spreitzer2 Malgorzata Steinder2 Asser Tantawi2 Chandra Krintz1 UC Santa Barbara 1 IBM Research2](https://reader036.vdocuments.site/reader036/viewer/2022071011/5fca027ffa43c271c84ff9ad/html5/thumbnails/6.jpg)
| Instance Type | On-Demand Price (per hr) | Reserved 1-Year Price (per hr) | Reserved 3-Year Price (per hr) | Spot Instance Average Price (per hr) |
|---|---|---|---|---|
| m1.small | $0.095 | $0.056 | $0.043 | $0.0399 |
| c1.medium | $0.19 | $0.112 | $0.087 | $0.0798 |
| m1.large | $0.380 | $0.224 | $0.173 | $0.167 |
| m2.xlarge | $0.570 | $0.321 | $0.246 | $0.240 |
| m1.xlarge | $0.760 | $0.448 | $0.347 | $0.320 |
| c1.xlarge | $0.760 | $0.448 | $0.347 | $0.323 |
| m2.2xlarge | $1.340 | $0.784 | $0.606 | $0.559 |
| m2.4xlarge | $2.68 | $1.56 | $1.21 | $1.12 |

Pricing from http://aws.amazon.com/ec2/
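To make the price gap concrete, the sketch below computes how far the average spot price sits below the on-demand price for each instance type. The prices are copied from the table on this slide; the `PRICES` layout and the `discount` and `spot_discounts` helpers are illustrative names, not something from the talk.

```python
# Hourly prices copied from the slide:
# (on-demand, reserved 1-year, reserved 3-year, average spot).
PRICES = {
    "m1.small":   (0.095, 0.056, 0.043, 0.0399),
    "c1.medium":  (0.19,  0.112, 0.087, 0.0798),
    "m1.large":   (0.380, 0.224, 0.173, 0.167),
    "m2.xlarge":  (0.570, 0.321, 0.246, 0.240),
    "m1.xlarge":  (0.760, 0.448, 0.347, 0.320),
    "c1.xlarge":  (0.760, 0.448, 0.347, 0.323),
    "m2.2xlarge": (1.340, 0.784, 0.606, 0.559),
    "m2.4xlarge": (2.68,  1.56,  1.21,  1.12),
}


def discount(on_demand: float, other: float) -> float:
    """Fractional saving of `other` relative to the on-demand price."""
    return 1.0 - other / on_demand


def spot_discounts() -> dict:
    """Map each instance type to its average-spot saving versus on-demand."""
    return {name: discount(p[0], p[3]) for name, p in PRICES.items()}


if __name__ == "__main__":
    for name, saving in spot_discounts().items():
        print(f"{name:<11} average spot price is {saving:.0%} below on-demand")
```

On these numbers the average spot price is roughly 56 to 58 percent below on-demand for every instance type listed, slightly cheaper than the 3-year reserved rate.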
![Page 7: Navraj Chohan Claris Casllo Mike Spreitzer Malgorzata ...Navraj Chohan1 Claris Casllo 2 Mike Spreitzer2 Malgorzata Steinder2 Asser Tantawi2 Chandra Krintz1 UC Santa Barbara 1 IBM Research2](https://reader036.vdocuments.site/reader036/viewer/2022071011/5fca027ffa43c271c84ff9ad/html5/thumbnails/7.jpg)
EC2 Cloud
HDFS
Leased Machines Spot Instances
![Page 8: Navraj Chohan Claris Casllo Mike Spreitzer Malgorzata ...Navraj Chohan1 Claris Casllo 2 Mike Spreitzer2 Malgorzata Steinder2 Asser Tantawi2 Chandra Krintz1 UC Santa Barbara 1 IBM Research2](https://reader036.vdocuments.site/reader036/viewer/2022071011/5fca027ffa43c271c84ff9ad/html5/thumbnails/8.jpg)
[Figure: MapReduce data flow. Labels: Input File from DFS; mappers M0, M1, M2, M3; reducers R0, R1, R2; Output File from DFS]
![Page 9: Navraj Chohan Claris Casllo Mike Spreitzer Malgorzata ...Navraj Chohan1 Claris Casllo 2 Mike Spreitzer2 Malgorzata Steinder2 Asser Tantawi2 Chandra Krintz1 UC Santa Barbara 1 IBM Research2](https://reader036.vdocuments.site/reader036/viewer/2022071011/5fca027ffa43c271c84ff9ad/html5/thumbnails/9.jpg)
[Figure: MapReduce across leased machines and spot instances. Labels: Input File from DFS; Mappers (MA); Reducers (R0, RA); Leased Machines; Spot Instances; Output File from DFS]
![Page 10: Navraj Chohan Claris Casllo Mike Spreitzer Malgorzata ...Navraj Chohan1 Claris Casllo 2 Mike Spreitzer2 Malgorzata Steinder2 Asser Tantawi2 Chandra Krintz1 UC Santa Barbara 1 IBM Research2](https://reader036.vdocuments.site/reader036/viewer/2022071011/5fca027ffa43c271c84ff9ad/html5/thumbnails/10.jpg)
Make a max bid on a spot instance
Spot instance is available if
◦ Max bid > market price
Not available if
◦ Max bid ≤ market price
Always pay market price
Pay for full hour if terminated by user
Free partial hour if terminated by Amazon
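The bidding and billing rules on this slide can be sketched in a few lines. This is a simplified model, assuming one market price per started instance-hour; `is_available`, `charge`, and their parameters are hypothetical names for illustration and do not correspond to any AWS API.

```python
import math
from typing import Sequence


def is_available(max_bid: float, market_price: float) -> bool:
    """The instance runs only while the bid is strictly above the market price."""
    return max_bid > market_price


def charge(hourly_prices: Sequence[float], run_hours: float, terminated_by: str) -> float:
    """Bill for one spot instance under the rules on the slide.

    hourly_prices[i] is the market price for the i-th started hour
    (assumption: one price per instance-hour). The user pays the market
    price, never the bid. terminated_by is "user" or "amazon".
    """
    if terminated_by not in ("user", "amazon"):
        raise ValueError("terminated_by must be 'user' or 'amazon'")
    if run_hours < 0:
        raise ValueError("run_hours must be non-negative")
    full_hours = math.floor(run_hours)
    has_partial_hour = run_hours > full_hours
    billed_hours = full_hours
    if has_partial_hour and terminated_by == "user":
        # A partial hour is billed as a full hour when the user terminates.
        billed_hours += 1
    # When Amazon terminates the instance, the partial hour is free.
    if billed_hours > len(hourly_prices):
        raise ValueError("not enough price samples for the billed hours")
    return sum(hourly_prices[:billed_hours])
```

For example, with prices `[0.04, 0.05, 0.06]` and 2.5 hours of runtime, a user-initiated shutdown is billed for three hours, while an Amazon-initiated termination is billed for only the two completed hours.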
![Page 11: Navraj Chohan Claris Casllo Mike Spreitzer Malgorzata ...Navraj Chohan1 Claris Casllo 2 Mike Spreitzer2 Malgorzata Steinder2 Asser Tantawi2 Chandra Krintz1 UC Santa Barbara 1 IBM Research2](https://reader036.vdocuments.site/reader036/viewer/2022071011/5fca027ffa43c271c84ff9ad/html5/thumbnails/11.jpg)
MR paradigm
◦ Embarrassingly parallel jobs
◦ Fault tolerant
◦ Transient workers
◦ Workers pull data
Spot Instances
◦ Provide transient and (relatively) inexpensive resources
![Page 12: Navraj Chohan Claris Casllo Mike Spreitzer Malgorzata ...Navraj Chohan1 Claris Casllo 2 Mike Spreitzer2 Malgorzata Steinder2 Asser Tantawi2 Chandra Krintz1 UC Santa Barbara 1 IBM Research2](https://reader036.vdocuments.site/reader036/viewer/2022071011/5fca027ffa43c271c84ff9ad/html5/thumbnails/12.jpg)
Job Speedup
![Page 13: Navraj Chohan Claris Casllo Mike Spreitzer Malgorzata ...Navraj Chohan1 Claris Casllo 2 Mike Spreitzer2 Malgorzata Steinder2 Asser Tantawi2 Chandra Krintz1 UC Santa Barbara 1 IBM Research2](https://reader036.vdocuments.site/reader036/viewer/2022071011/5fca027ffa43c271c84ff9ad/html5/thumbnails/13.jpg)
Speedup Cost
![Page 14: Navraj Chohan Claris Casllo Mike Spreitzer Malgorzata ...Navraj Chohan1 Claris Casllo 2 Mike Spreitzer2 Malgorzata Steinder2 Asser Tantawi2 Chandra Krintz1 UC Santa Barbara 1 IBM Research2](https://reader036.vdocuments.site/reader036/viewer/2022071011/5fca027ffa43c271c84ff9ad/html5/thumbnails/14.jpg)
Downside of Spot Instances
Termination has a cost
VM uptime probability is a function of the user’s maximum bid price
Work will have to be redone
◦ Operational nodes must pick up the slack
◦ This includes map output which has already been consumed by a reducer
![Page 15: Navraj Chohan Claris Casllo Mike Spreitzer Malgorzata ...Navraj Chohan1 Claris Casllo 2 Mike Spreitzer2 Malgorzata Steinder2 Asser Tantawi2 Chandra Krintz1 UC Santa Barbara 1 IBM Research2](https://reader036.vdocuments.site/reader036/viewer/2022071011/5fca027ffa43c271c84ff9ad/html5/thumbnails/15.jpg)
Modeling m1.small instance using data from cloudexchange.net
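One simple way to see how uptime depends on the maximum bid is to count, over a price history, how often an instance would have survived a given number of consecutive hours. The sketch below is a plain empirical-frequency estimate and is not necessarily the model used in the talk; `TRACE` is made-up hourly data for illustration, not real cloudexchange.net prices, and `survival_probability` is a hypothetical helper.

```python
from typing import Sequence

# Hypothetical hourly market prices for one instance type (illustrative only).
TRACE = [0.038, 0.040, 0.041, 0.039, 0.045, 0.040, 0.038, 0.060, 0.039, 0.040]


def survival_probability(prices: Sequence[float], max_bid: float, hours: int) -> float:
    """Fraction of start times at which an instance stays up for `hours` hours.

    An instance survives a window if the bid is strictly above the market
    price in every hour of that window.
    """
    if hours < 1 or hours > len(prices):
        raise ValueError("hours must be between 1 and len(prices)")
    windows = len(prices) - hours + 1
    survived = sum(
        all(max_bid > p for p in prices[start:start + hours])
        for start in range(windows)
    )
    return survived / windows


if __name__ == "__main__":
    for bid in (0.040, 0.042, 0.050, 0.095):
        p = survival_probability(TRACE, bid, hours=3)
        print(f"bid ${bid:.3f}: P(up for 3 hours) = {p:.2f}")
```

Raising the bid can only keep or increase the estimated survival probability, which is the trade-off the slide points at: a higher bid buys longer expected uptime at the risk of paying a higher market price.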
![Page 16: Navraj Chohan Claris Casllo Mike Spreitzer Malgorzata ...Navraj Chohan1 Claris Casllo 2 Mike Spreitzer2 Malgorzata Steinder2 Asser Tantawi2 Chandra Krintz1 UC Santa Barbara 1 IBM Research2](https://reader036.vdocuments.site/reader036/viewer/2022071011/5fca027ffa43c271c84ff9ad/html5/thumbnails/16.jpg)
WordCount Sort
Fault injected at half‐way point of original job
![Page 17: Navraj Chohan Claris Casllo Mike Spreitzer Malgorzata ...Navraj Chohan1 Claris Casllo 2 Mike Spreitzer2 Malgorzata Steinder2 Asser Tantawi2 Chandra Krintz1 UC Santa Barbara 1 IBM Research2](https://reader036.vdocuments.site/reader036/viewer/2022071011/5fca027ffa43c271c84ff9ad/html5/thumbnails/17.jpg)
Handling Faults Efficiently
Have Hadoop track which map output has been consumed by a reducer to avoid re-execution
Store intermediate data (map output) in HDFS*
Lower fault detection time
◦ Default: 10 minutes
*Steven Y. Ko et al., HotOS '09
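The first suggestion on this slide, tracking which map output has already been consumed, can be illustrated with a small bookkeeping sketch. It is not Hadoop code: `maps_to_rerun`, `fetched_by`, and the task identifiers are hypothetical, and the sketch assumes every reducer that fetched an output is still alive.

```python
from typing import Dict, Iterable, Set


def maps_to_rerun(
    maps_on_failed_node: Iterable[str],
    fetched_by: Dict[str, Set[str]],
    all_reducers: Iterable[str],
    track_consumption: bool = True,
) -> Set[str]:
    """Return the completed map tasks that must run again after a node is lost.

    fetched_by maps a map-task id to the set of reducer ids that have
    already copied that map's output.
    """
    lost_maps = set(maps_on_failed_node)
    if not track_consumption:
        # Baseline from the slides: all map output on the lost node is redone,
        # including output that reducers have already consumed.
        return lost_maps
    needed = set(all_reducers)
    # A map is re-run only if at least one reducer has not fetched its output.
    return {m for m in lost_maps if not needed <= fetched_by.get(m, set())}


if __name__ == "__main__":
    fetched = {"M0": {"R0", "R1"}, "M1": {"R0"}}
    print(maps_to_rerun(["M0", "M1"], fetched, ["R0", "R1"]))
    print(maps_to_rerun(["M0", "M1"], fetched, ["R0", "R1"], track_consumption=False))
```

With tracking enabled only `M1` is re-executed, because `R1` never fetched its output. If a reducer that had already fetched an output fails later, that output would be needed again, which is one reason the slide also lists storing intermediate data in HDFS.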
![Page 18: Navraj Chohan Claris Casllo Mike Spreitzer Malgorzata ...Navraj Chohan1 Claris Casllo 2 Mike Spreitzer2 Malgorzata Steinder2 Asser Tantawi2 Chandra Krintz1 UC Santa Barbara 1 IBM Research2](https://reader036.vdocuments.site/reader036/viewer/2022071011/5fca027ffa43c271c84ff9ad/html5/thumbnails/18.jpg)
Summary
Spot instances provide inexpensive resources for transient workloads
MapReduce jobs speed up with more resources
Spot instance termination hurts a job’s time to completion
![Page 19: Navraj Chohan Claris Casllo Mike Spreitzer Malgorzata ...Navraj Chohan1 Claris Casllo 2 Mike Spreitzer2 Malgorzata Steinder2 Asser Tantawi2 Chandra Krintz1 UC Santa Barbara 1 IBM Research2](https://reader036.vdocuments.site/reader036/viewer/2022071011/5fca027ffa43c271c84ff9ad/html5/thumbnails/19.jpg)
Questions?