GreenHadoop: Leveraging Green Energy in Data-Processing Frameworks
![Page 1: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/1.jpg)
GreenHadoop: Leveraging Green Energy in Data-Processing Frameworks
Íñigo Goiri, Kien Le, Thu D. Nguyen, Jordi Guitart, Jordi Torres, and Ricardo Bianchini
![Page 2: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/2.jpg)
Motivation
• Datacenters consume large amounts of energy
• Energy cost is not the only problem
  – Brown sources: coal, natural gas…
• Connect datacenters to green sources
  – Solar panels, wind turbines…
  – Green datacenter
  – Early examples in the field
![Page 3: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/3.jpg)
Green datacenter
• Energy sources
  – Solar/wind: variable over time
  – Electrical grid: backup
• Mitigation approaches are not ideal
  – Batteries and net metering
• We need to match the energy demand to the supply
[Figure: power versus time; labeled series: load, solar power, workload]
![Page 4: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/4.jpg)
Delaying load within time bounds
Delaying some jobs is OK, as long as their time bounds are respected
[Figure: two panels of nodes/power versus time showing jobs J1, J2, and J3]
![Page 5: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/5.jpg)
Scheduling data-processing workloads in green datacenters
• Data-processing jobs
  – Each task operates on a chunk of data
  – Data distributed among servers
• Simple workflow: MapReduce
  – Map tasks: process input data
  – Reduce tasks: merge maps’ outputs
Challenges
• Match MapReduce workload with green energy availability
  – No information on #nodes, length, power…
• Conserve energy while ensuring data availability
[Figure: MapReduce workflow in which map tasks Map1 to Map5 feed the reduce tasks through a shuffle phase]
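The map, shuffle, and reduce roles named on this slide can be illustrated with a minimal single-process sketch. It is a word-count toy, not Hadoop code; the function names (`map_task`, `shuffle`, `reduce_task`, `run_job`) and the input strings are assumptions made for illustration.

```python
from collections import defaultdict


def map_task(chunk):
    """Map: turn one chunk of input text into (word, 1) pairs."""
    return [(word, 1) for word in chunk.split()]


def shuffle(map_outputs):
    """Shuffle: group the values emitted by all map tasks by key."""
    groups = defaultdict(list)
    for pairs in map_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_task(key, values):
    """Reduce: merge all values emitted for one key."""
    return key, sum(values)


def run_job(chunks):
    """Run one map task per chunk, then shuffle, then reduce each key."""
    map_outputs = [map_task(chunk) for chunk in chunks]
    grouped = shuffle(map_outputs)
    return dict(reduce_task(key, values) for key, values in grouped.items())


# Hypothetical input: each string stands for a chunk stored on some server.
counts = run_job(["green energy", "brown energy", "green datacenter"])
```

In a real cluster each chunk lives on a specific server, which is why the scheduler also has to care about data availability when servers are turned off.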
![Page 6: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/6.jpg)
Overview of GreenHadoop
• Predict solar energy availability
• May delay jobs but must meet time bounds
  – Maximize green energy use
  – If not enough green energy, minimize brown electricity cost
  – Brown energy cost + peak brown power cost
• Deactivate idle servers while keeping data available
• Divided into two parts
  1. Computation scheduling
  2. Data management
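The "brown energy cost + peak brown power cost" item can be read as a two-part bill: an energy charge that depends on whether each time slot is on-peak or off-peak, plus a charge on the highest brown power drawn. The sketch below assumes that reading; the function name, the slot layout, and all prices are hypothetical and do not come from any real tariff.

```python
def brown_cost(brown_power_kw, is_on_peak, slot_hours,
               on_peak_price, off_peak_price, peak_power_price):
    """Total brown cost = energy charge per slot + charge on the peak power.

    brown_power_kw   : average brown power drawn in each slot (kW)
    is_on_peak       : True where the slot falls in the on-peak period
    slot_hours       : length of every slot in hours
    *_price          : hypothetical prices ($/kWh for energy, $/kW for peak)
    """
    energy_cost = 0.0
    for power, on_peak in zip(brown_power_kw, is_on_peak):
        price = on_peak_price if on_peak else off_peak_price
        energy_cost += power * slot_hours * price
    peak_cost = max(brown_power_kw, default=0.0) * peak_power_price
    return energy_cost + peak_cost


# Hypothetical example: three one-hour slots, the middle one on-peak.
cost = brown_cost(
    brown_power_kw=[1.0, 2.0, 0.5],
    is_on_peak=[False, True, False],
    slot_hours=1.0,
    on_peak_price=0.20,
    off_peak_price=0.10,
    peak_power_price=5.0,
)
```

Under this model a scheduler lowers the bill in two ways: by shifting brown consumption to off-peak slots and by keeping the highest brown power draw low.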
![Page 7: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/7.jpg)
1. Computation scheduling
Estimate the energy required by jobs (EWMA)
[Figure: job queue with Job1 to Job6]
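An exponentially weighted moving average (EWMA) blends each new observation with the previous estimate. The slide does not say which quantity is averaged or which smoothing factor is used, so the sketch below is only an assumption: it averages observed energy values with a hypothetical `alpha`.

```python
def ewma_update(previous_estimate, observed, alpha=0.3):
    """One EWMA step: weight the new observation by alpha, the history by 1 - alpha."""
    return alpha * observed + (1 - alpha) * previous_estimate


def estimate_energy(observations, alpha=0.3):
    """Fold a sequence of observed energy values into a single estimate."""
    if not observations:
        raise ValueError("need at least one observation")
    estimate = observations[0]
    for observed in observations[1:]:
        estimate = ewma_update(estimate, observed, alpha)
    return estimate


# Hypothetical observed energy values (kWh), oldest first.
estimate = estimate_energy([1.0, 2.0, 2.0], alpha=0.5)
```

A larger `alpha` makes the estimate follow recent jobs more closely; a smaller one smooths out noisy measurements.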
![Page 8: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/8.jpg)
1. Computation scheduling
Predict energy availability (weather forecast)
Assign green energy first
[Figure: power versus time from "Now" across off-peak, on-peak, and off-peak periods, with jobs Job1 to Job6]
![Page 9: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/9.jpg)
1. Computation scheduling
Assign cheap brown energy
[Figure: power versus time from "Now" across off-peak, on-peak, and off-peak periods, with the previous peak marked and jobs Job1 to Job6]
![Page 10: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/10.jpg)
1. Computation scheduling
Assign expensive energy
Current power → Active servers
[Figure: power and active servers versus time from "Now" across off-peak, on-peak, and off-peak periods, with jobs Job1 to Job6]
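The three assignment steps on these slides (green energy first, then cheap brown energy, then expensive brown energy) and the final "current power → active servers" conversion can be sketched as a greedy pass over time slots. This is a simplified illustration, not GreenHadoop's scheduler: it ignores job time bounds and the previous-peak limit, and every name and number in it (`assign_energy`, `active_servers`, the slot capacity, the per-server power) is an assumption.

```python
import math


def assign_energy(required_kwh, green_kwh, on_peak, capacity_kwh):
    """Greedily cover required_kwh using green, then off-peak, then on-peak energy.

    green_kwh[t]  : predicted green energy available in slot t
    on_peak[t]    : True if brown energy is expensive in slot t
    capacity_kwh  : most energy the cluster can consume in one slot
    Returns the per-slot plan and any energy that could not be placed.
    """
    slots = len(green_kwh)
    plan = [{"green": 0.0, "brown": 0.0} for _ in range(slots)]
    remaining = required_kwh

    # Pass 1: use predicted green energy wherever it is available.
    for t in range(slots):
        used = min(remaining, green_kwh[t], capacity_kwh)
        plan[t]["green"] = used
        remaining -= used

    # Pass 2 fills off-peak (cheap) slots, pass 3 fills on-peak (expensive) slots.
    for want_on_peak in (False, True):
        for t in range(slots):
            if on_peak[t] != want_on_peak:
                continue
            free = capacity_kwh - plan[t]["green"] - plan[t]["brown"]
            used = min(remaining, free)
            plan[t]["brown"] += used
            remaining -= used

    return plan, remaining


def active_servers(slot_energy_kwh, slot_hours, server_kw):
    """Convert the energy planned for a slot into a number of powered-on servers."""
    power_kw = slot_energy_kwh / slot_hours
    return math.ceil(power_kw / server_kw)


# Hypothetical horizon of three one-hour slots; the middle slot is on-peak.
plan, unplaced = assign_energy(
    required_kwh=10.0,
    green_kwh=[0.0, 3.0, 2.0],
    on_peak=[False, True, False],
    capacity_kwh=4.0,
)
servers_now = active_servers(plan[0]["green"] + plan[0]["brown"],
                             slot_hours=1.0, server_kw=0.5)
```

In this example the on-peak slot ends up with green energy only, because the two off-peak slots absorb all the brown energy that is needed.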
![Page 11: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/11.jpg)
1. Computation scheduling
As time goes by… the number of active servers changes
[Figure: power and active servers versus time from "Now"]
![Page 12: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/12.jpg)
2. Data management
• Deactivate servers to save energy
  – Some data might become unavailable
• Prior solution: covering subset [Leverich’09]
  – Set of servers always running has ALL data
• Our approach
  – Only required data has to be available
  – We usually require fewer active servers
[Figure: covering subset; file blocks 1 to 8 distributed across servers]
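The difference between a covering subset (servers that together hold all data) and the approach described here (servers that hold only the data the queued jobs require) can be illustrated with a greedy set-cover sketch. Greedy set cover is a standard approximation swapped in for illustration; the slides do not say how servers are actually chosen. The file placement and the required-file set below are hypothetical.

```python
def greedy_cover(server_files, needed):
    """Pick servers until every file in `needed` is stored on a chosen server.

    Greedy heuristic: repeatedly take the server that holds the most
    still-uncovered files (ties broken by server name).
    """
    uncovered = set(needed)
    chosen = []
    while uncovered:
        best = max(sorted(server_files),
                   key=lambda s: len(server_files[s] & uncovered))
        gained = server_files[best] & uncovered
        if not gained:
            raise ValueError(f"no server stores files {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical placement of files 1 to 8 on five servers.
server_files = {
    "s1": {1, 2, 7},
    "s2": {3, 4, 5, 6},
    "s3": {4, 6},
    "s4": {2, 3, 4, 8},
    "s5": {3, 6, 7},
}
all_files = set().union(*server_files.values())

covering_subset = greedy_cover(server_files, all_files)     # all data stays online
required_only = greedy_cover(server_files, {1, 4, 5, 6})    # only what jobs need
```

With this placement, keeping every file online takes three servers, while keeping only the required files online takes two.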
![Page 13: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/13.jpg)
2. Data management
[Figure: Server 1 to Server 5 with the file blocks each one stores, marked as required or non-required files; server states are Active, Decommission, and Down; the running queue holds JobA, JobB, and JobC]
![Page 14: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/14.jpg)
2. Data management
GreenHadoop (computation) requires only 2 servers
[Figure: Server 1 to Server 5 grouped under the Active, Decommission, and Down states, with required and non-required files marked; the running queue holds JobA, JobB, and JobC]
![Page 15: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/15.jpg)
2. Data management
Move required files to Active servers
[Figure: Server 1 to Server 5 grouped under the Active, Decommission, and Down states; a required file is replicated onto an Active server; the running queue holds JobA, JobB, and JobC]
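The "move required files to Active servers" step can be sketched as a small planning function: for every required file that has no replica on an active server, schedule one copy from a server that holds it. This is an illustrative assumption, not the HDFS or GreenHadoop mechanism; the function name, the target-selection rule (the active server with the fewest files), and the placement are hypothetical.

```python
def plan_replication(server_files, active, required):
    """Return (file, source, target) copies that make `required` available.

    server_files : mapping of server name to the set of files it stores
    active       : servers that will stay powered on
    required     : files needed by the jobs in the running queue
    """
    # Work on a copy so the caller's placement is left untouched.
    placement = {server: set(server_files[server]) for server in active}
    copies = []
    for file_id in sorted(required):
        if any(file_id in files for files in placement.values()):
            continue  # already readable from an active server
        sources = sorted(s for s, files in server_files.items() if file_id in files)
        if not sources:
            raise ValueError(f"file {file_id} is not stored anywhere")
        # Hypothetical rule: copy to the active server holding the fewest files.
        target = min(sorted(placement), key=lambda s: len(placement[s]))
        placement[target].add(file_id)
        copies.append((file_id, sources[0], target))
    return copies


# Hypothetical placement; only s2 and s3 stay active.
server_files = {
    "s1": {1, 2, 7},
    "s2": {3, 4, 5, 6},
    "s3": {4, 6},
    "s4": {2, 3, 4, 8},
    "s5": {3, 6, 7},
}
copies = plan_replication(server_files, active=["s2", "s3"], required={1, 4, 5, 6})
```

Once the planned copies finish, the servers outside the active set no longer hold the only replica of any required file, so they can be powered down.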
![Page 16: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/16.jpg)
2. Data management
Decommissioned servers can be sent to Down
[Figure: Server 1 to Server 5 grouped under the Active, Decommission, and Down states, with required and non-required files marked; the running queue holds JobA, JobB, and JobC]
![Page 17: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/17.jpg)
2. Data management
Jobs to be executed change → Required files change
[Figure: Server 1 to Server 5 grouped under the Active, Decommission, and Down states; JobD joins JobA, JobB, and JobC in the running queue, and the set of required files changes]
![Page 18: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/18.jpg)
2. Data management
Make missing data available
[Figure: Server 1 to Server 5 grouped under the Active, Decommission, and Down states, with required and non-required files marked; the running queue holds JobB, JobC, and JobD]
![Page 19: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/19.jpg)
2. Data management
GreenHadoop (computation) requires 3 servers
[Figure: Server 1 to Server 5 grouped under the Active, Decommission, and Down states, with required and non-required files marked; the running queue holds JobB, JobC, and JobD]
![Page 20: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/20.jpg)
Evaluation methodology
• Cluster with 16 Xeon servers
  – Hadoop and Hadoop turning off idle servers (EAHadoop)
  – GreenHadoop: green energy, brown electricity cost
• Energy profile
  – NJ electricity pricing (on/off peak and peak cost)
  – Solar farm energy availability (14 PV panels)
  – Five pairs of days (combinations of high and low days)
• Workload
  – Derived from Facebook [Zaharia’09]
  – Jobs of up to 37 GB, 600 tasks, and 6 hours in length
  – Internal time bound of one day
![Page 21: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/21.jpg)
Energy prediction vs actual
[Figure: predicted and actual energy (kWh, 0.0 to 2.0) per hour from 6:00 AM to 7:00 PM]
[Figure: prediction error (%, 0 to 40) versus hours ahead (0 to 48); annotations: rain, thunderstorm, cloud cover]
![Page 22: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/22.jpg)
GreenHadoop for Facebook & high-high days
[Figure: series for green predicted, green produced, green consumed, brown consumed, and brown price; annotations: 30 kWh, 59 kWh, $8.00; 39 kWh, 25 kWh, $6.06 (-24%); 31% more green, 39% cost savings]
![Page 23: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/23.jpg)
GreenHadoop for Facebook
[Figure, panel "Different pairs of days": green energy increase and cost savings (%, 0 to 40) for High-High, High-Low, Low-High, Low-Low, and Very Low]
[Figure, panel "Effect of parameters in GreenHadoop": green energy increase and cost savings (%, 0 to 40) for EAHadoop, Green, Green & Brown Energy, and Green & Brown Energy & Brown Peak]
![Page 24: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/24.jpg)
Other results
• Workload intensity (datacenter utilization)
• High-priority jobs
• Shorter time bounds
• Data availability
• Workload variations
• Consistent green energy increases and cost savings
![Page 25: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/25.jpg)
Conclusions
• Data-processing scheduler for green datacenters
• Predicts green energy availability
• Increases the use of green energy
• Reduces brown electricity costs
• Manages data availability
• We are building Parasol
  – Solar-powered μdatacenter
  – Poster session
![Page 26: GreenHadoop : Leveraging Green Energy in Data-Processing Frameworks](https://reader036.vdocuments.site/reader036/viewer/2022081603/56813651550346895d9dd4a0/html5/thumbnails/26.jpg)
GreenHadoop: Leveraging Green Energy in Data-Processing Frameworks
Íñigo Goiri, Kien Le, Thu D. Nguyen, Jordi Guitart, Jordi Torres, and Ricardo Bianchini