Making HTCondor Energy Efficient by identifying miscreant jobs
Stephen McGough, Matthew Forshaw & Clive Gerrard (Newcastle University); Stuart Wheater (Arjuna Technologies Limited)


DESCRIPTION

Presentation on miscreant jobs in HTCondor, given at HTCondor Week 2013, showing how to reduce the number of bad jobs run and increase the chances of good jobs completing quickly.

TRANSCRIPT

Page 1

Making HTCondor Energy Efficient by identifying miscreant jobs

Stephen McGough, Matthew Forshaw & Clive Gerrard (Newcastle University)

Stuart Wheater (Arjuna Technologies Limited)

Page 2

Task lifecycle (timeline diagram): an HTC user submits a task; after resource selection it begins executing; when an interactive user logs in or the computer reboots the task is evicted and goes through resource selection again; this repeats until the task either completes ('Good') or never will ('Bad'). A 'miscreant' task is one that has been evicted but which we do not yet know to be good or bad.

Page 3

Motivation

• We have run a high-throughput cluster for ~6 years, allowing many researchers to perform more work, more quickly.
• The University has a strong desire to reduce energy consumption and CO2 production. It is currently powering down computers and buying low-power PCs: "If a computer is not 'working' it should be powered down."
• Can we go further to reduce wasted energy?
  – Reduce the time computers spend running work which does not complete.
  – Prevent re-submission of 'bad' jobs.
  – Reduce the number of resubmissions for 'good' jobs.
• Aims: investigate policies for reducing energy consumption and determine their impact on high-throughput users.

Page 4


Can  we  fix  the  number  of  retries?  

[Figure: cumulative wasted seconds (millions) vs. number of evictions, for good and bad jobs]

• ~57 years of computing time during 2010, ~39 years of which was wasted:
  – ~27 years for 'bad' tasks: average 45 retries, maximum 1,946 retries.
  – ~12 years for 'good' tasks: average 1.38 retries, maximum 360 retries.
• 100% 'good' task completion requires allowing 360 retries, and still wastes ~13 years on 'bad' tasks.
  – 95% 'good' task completion: 3 retries, with 9,808 good tasks killed (3.32%).
  – 99% 'good' task completion: 6 retries, with 2,022 good tasks killed (0.68%).
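These thresholds come straight from the distribution of eviction counts over the 'good' tasks. A minimal sketch of the calculation, assuming a list of per-task eviction counts is available (the function name and the stand-in data below are hypothetical, not taken from the 2010 trace):

```python
# Sketch: given eviction counts for tasks known to have completed ('good'),
# find the smallest cap n such that at least `target` of them would still
# have completed (i.e. needed <= n evictions), and how many would be killed.

def retry_cap_for_completion(eviction_counts, target=0.95):
    counts = sorted(eviction_counts)
    for n in range(max(counts) + 1):
        completed = sum(1 for c in counts if c <= n)
        if completed / len(counts) >= target:
            return n, len(counts) - completed   # cap, good tasks killed
    return max(counts), 0

# Hypothetical usage with stand-in data:
good_task_evictions = [0, 0, 1, 2, 0, 5, 3, 0, 1, 7]
cap, killed = retry_cap_for_completion(good_task_evictions, target=0.95)
print(f"cap={cap} retries, good tasks killed={killed}")
```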


Page 5


[Figure: cumulative free seconds vs. idle time (minutes)]

Can we make tasks short enough?
• Make tasks short enough to reduce miscreants.
• Average idle interval: 371 minutes.
• But to ensure availability of intervals:
  – 95%: need to reduce the time limit to 2 minutes.
  – 99%: need to reduce the time limit to 1 minute.
• Impractical to make tasks this short (see the sketch below).
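The reasoning behind these limits: a task of length t can only complete inside an idle interval of at least t minutes, so the limit that keeps a given fraction of intervals usable is a low percentile of the interval distribution. A minimal sketch, with hypothetical interval data:

```python
# Sketch: largest task length (minutes) such that at least `fraction` of the
# observed idle intervals are long enough to host it. Stand-in data only.

def max_task_length(idle_intervals, fraction=0.95):
    ordered = sorted(idle_intervals)                  # ascending
    # We need P(interval >= t) >= fraction, i.e. t at roughly the
    # (1 - fraction) quantile of the interval distribution.
    cutoff_index = int(len(ordered) * (1.0 - fraction))
    return ordered[cutoff_index]

idle_intervals = [2, 5, 60, 371, 30, 1, 480, 15, 720, 3]   # stand-in data
print(max_task_length(idle_intervals, fraction=0.95))
```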


Page 6


Cluster Simulation

• High-level simulation of Condor.
  – Trace logs from a twelve-month period are used as input:
    • user logins/logouts (and which computer was used);
    • Condor job submission times ('good'/'bad' and duration).
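As a rough illustration of how such a trace-driven simulation can be structured (this is a minimal sketch under simplifying assumptions, not the authors' simulator; the event kinds and the toy machine model are invented for illustration):

```python
# Minimal sketch of a trace-driven simulation loop: replay login/submit events
# in time order, evicting Condor jobs when an interactive user logs in.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Event:
    time: float                              # minutes since start of trace
    kind: str = field(compare=False)         # "login", "logout" or "submit"
    machine: int = field(compare=False, default=0)
    job: str = field(compare=False, default="")

def simulate(trace):
    busy = {}                                # machine id -> running job id
    evictions = {}                           # job id -> eviction count
    queue = list(trace)
    heapq.heapify(queue)
    while queue:
        ev = heapq.heappop(queue)
        if ev.kind == "login" and ev.machine in busy:
            job = busy.pop(ev.machine)       # interactive user evicts the job
            evictions[job] = evictions.get(job, 0) + 1
            # resubmit to another (hypothetical) machine one minute later;
            # a real simulator would apply a placement policy here
            heapq.heappush(queue, Event(ev.time + 1, "submit", ev.machine + 1, job))
        elif ev.kind == "submit" and ev.machine not in busy:
            busy[ev.machine] = ev.job        # job starts on an idle machine
    return evictions

# Hypothetical usage with a tiny hand-made trace:
trace = [Event(0, "submit", 1, "jobA"), Event(30, "login", 1), Event(60, "login", 2)]
print(simulate(trace))                       # {'jobA': 2}
```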

Cluster computers are located in a number of rooms across the University, each room having its own opening hours. These hours vary between clusters that are predominantly for teaching purposes, open during teaching hours (normally 9am till 5pm), through to 24-hour access computer clusters. The location of clusters has a significant impact on the throughput of interactive users, ranging from clusters buried deep within a particular school to those within busy thoroughfares such as the University Library.

Computers within the clusters are replaced on a five-year rolling programme, with computers falling into one of three broad categories as outlined in Table I. PCs within a cluster are provisioned at the same time and will contain equivalent computing resources. Thus there is a wide variance between clusters within the University but no variance within them.

Whilst we expect casual use to migrate onto user-owned portable devices and virtual desktops, the demand for compute/graphics-intensive workstations running high-end software is, if anything, increasing. Further, these high-end applications are unlikely to migrate to virtual desktops or user-owned devices due to hardware requirements and licensing conditions, so we expect to need to maintain a pool of hardware that will be useful for Condor for some time.

PUE values have been assigned at the cluster level, with values in the range 0.9 to 1.4. These values have not been empirically evaluated but are used here to steer jobs. In most cases the cluster rooms have a low enough computer density not to require cooling, giving these clusters a PUE value of 1.0. However, two clusters are located in rooms that require air conditioning, giving these a PUE of 1.4. Likewise, four clusters are based in a basement room which is cold all year round; hence computer heat is used to offset heating requirements for the room, giving a PUE value of 0.9.

By default computers within the cluster will enter the sleep state after a given interval of inactivity. This time depends on whether the cluster is open or not. During open hours computers remain in the idle state for one hour before entering the sleep state, whilst outside of these hours the idle interval before sleep is reduced to 15 minutes. This policy (P2) was originally trialled under Windows XP, where the time for computers to resume from the shutdown state was considerable (sleep was an unreliable option for our environment). Likewise the time interval before a Condor job could start using a computer (M1) was set to 15 minutes during cluster opening hours and 0 minutes outside of opening hours. The latter was possible as computers would only have their states changed at these times due to Condor waking them up or a scheduled reboot.

Table I: Computer Types

Type       Cores   Speed    Power (Active / Idle / Sleep)
Normal     2       ~3GHz    57W / 40W / 2W
High End   4       ~3GHz    114W / 67W / 3W
Legacy     2       ~2GHz    100-180W / 50-80W / 4W

[Figure 3: Interactive user logins showing seasonality]

The simulation is based on trace logs generated from interactive user logins and Condor execution logs for 2010. Figure 3 illustrates the interactive logins for this period, showing the high degree of seasonality within the data. It is easy to distinguish between weeks and weekends, as well as where the three terms lie along with the vacations. This represents 1,229,820 interactive uses of the computers.

Figure 4 depicts the profile for the 532,467 job submissions made to Condor during this period. As can be seen, the job submissions follow no clearly definable pattern. Note that out of these submissions 131,909 were later killed by the original Condor user. In order to simulate these killed jobs, the simulation assumes that these are non-terminating jobs and keeps on submitting them to resources until the time at which the high-throughput user terminates them. The graph is clipped on Thursday 03/06/2010 as this date had 93,000 job submissions.

For the simulations we will report on the total power consumed (in MWh) for the period. In order to determine the effect of a policy on high-throughput users we will also report the average overhead observed by jobs submitted to Condor (in seconds), where overhead is defined to be the amount of time in excess of the execution duration of the job. Other statistics will be reported as appropriate for particular policies.
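A minimal sketch of how these two metrics can be computed, assuming per-state power draws in the spirit of Table I and a per-cluster PUE; the state split and job timestamps below are illustrative, not measured values:

```python
# Sketch: total energy (MWh) from hours spent in each power state, scaled by
# the cluster's PUE, plus per-job overhead (time in system minus execution time).

def machine_energy_mwh(hours_in_state, power_w, pue):
    """hours_in_state / power_w: dicts keyed by 'active', 'idle', 'sleep'."""
    watt_hours = sum(hours_in_state[s] * power_w[s] for s in power_w)
    return watt_hours * pue / 1e6             # Wh -> MWh

def job_overhead(submit_time, finish_time, execution_duration):
    """Overhead = time in excess of the job's execution duration (seconds)."""
    return (finish_time - submit_time) - execution_duration

# Illustrative usage for one 'Normal' machine over a year in a PUE 1.4 room:
normal = {"active": 57, "idle": 40, "sleep": 2}            # Watts (Table I)
hours  = {"active": 2000, "idle": 1500, "sleep": 5260}     # hypothetical split
print(machine_energy_mwh(hours, normal, pue=1.4))
print(job_overhead(submit_time=0, finish_time=5400, execution_duration=3600))
```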

[Figure 4: Condor job submission profile]

[Diagram: machine power states (Active with an interactive user or Condor job, Idle, and Sleep, with Wake-on-LAN used to wake sleeping machines); interactive users, cycle-stealing users and the high-throughput management policy all act on the machines]

Page 7


[Figure: number of good tasks killed vs. number of retries (n), for each combination of N1-N3 and C1-C3]

n reallocation policies
• N1(n): abandon a task if it has been deallocated n times.
• N2(n): abandon a task if it has been deallocated n times, ignoring deallocations caused by interactive users.
• N3(n): abandon a task if it has been deallocated n times, ignoring planned machine reboots.
• C1: tasks are allocated to resources at random, favouring awake resources.
• C2: target less-used computers (longer idle times).
• C3: tasks are allocated to computers in clusters with the least amount of time used by interactive users.
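A minimal sketch of the N policies above (the C policies govern where tasks are placed rather than when they are abandoned); the deallocation-reason labels and record format are assumptions, not the simulator's actual representation:

```python
# Sketch of the eviction-count policies: abandon a task once it has been
# deallocated n times, optionally not counting certain deallocation reasons.

def should_abandon(deallocations, n, policy="N1"):
    """deallocations: list of reason strings for one task, oldest first."""
    if policy == "N1":                        # every deallocation counts
        counted = deallocations
    elif policy == "N2":                      # ignore interactive-user evictions
        counted = [r for r in deallocations if r != "interactive_login"]
    elif policy == "N3":                      # ignore planned machine reboots
        counted = [r for r in deallocations if r != "reboot"]
    else:
        raise ValueError(f"unknown policy {policy!r}")
    return len(counted) >= n

# Hypothetical usage: a task evicted three times, twice by interactive users.
history = ["interactive_login", "reboot", "interactive_login"]
print(should_abandon(history, n=2, policy="N1"))   # True  (3 >= 2)
print(should_abandon(history, n=2, policy="N2"))   # False (only the reboot counts)
```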

[Figure: energy consumption (MWh) vs. number of retries (n), for each N/C combination]
[Figure: overheads on all good tasks vs. number of retries (n), for each N/C combination]


Page 8


[Figure: number of good tasks killed vs. accrued time (hours), for each combination of A1-A3 and C1-C3]

Accrued Time Policies
• Impose a limit on the cumulative execution time for a task.
• A1(t): abandon if accrued time > t and the task is deallocated.
• A2(t): as A1, discounting interactive users.
• A3(t): as A1, discounting reboots.
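A minimal sketch of the accrued-time policies, assuming each run of a task is recorded with its duration and the reason it ended (the record format is an assumption):

```python
# Sketch of the accrued-time policies: on deallocation, abandon the task if
# the cumulative execution time across its runs exceeds the limit t.
# A2/A3 discount runs ended by interactive users / planned reboots respectively.

def accrued_abandon(runs, t_hours, policy="A1"):
    """runs: list of (duration_hours, end_reason) for one task, oldest first."""
    if policy == "A1":
        counted = runs
    elif policy == "A2":                      # discount interactive-user evictions
        counted = [r for r in runs if r[1] != "interactive_login"]
    elif policy == "A3":                      # discount planned reboots
        counted = [r for r in runs if r[1] != "reboot"]
    else:
        raise ValueError(f"unknown policy {policy!r}")
    accrued = sum(duration for duration, _ in counted)
    return accrued > t_hours

# Hypothetical usage: three partial runs totalling 26 hours.
runs = [(10, "interactive_login"), (12, "reboot"), (4, "interactive_login")]
print(accrued_abandon(runs, t_hours=20, policy="A1"))   # True  (26 > 20)
print(accrued_abandon(runs, t_hours=20, policy="A2"))   # False (only 12h counted)
```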

[Figure: energy consumption (MWh) vs. accrued time (hours), for each A/C combination]
[Figure: overheads on all good tasks vs. accrued time (hours), for each A/C combination]


Page 9


Individual Time Policy
• Impose a limit on the individual execution time for a task.
  – Nightly reboots bound this to 24 hours.
  – What is the impact of lowering this?
• I1(t): abandon if individual time > t.
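A minimal sketch of I1, which needs only the length of the task's current run; the parameter names are assumptions:

```python
# Sketch of the individual-time policy: abandon a task whose current single
# run has exceeded t hours (nightly reboots already bound this to ~24 hours).

def individual_abandon(run_start_hours, now_hours, t_hours):
    """Return True if the task's current run has exceeded the limit."""
    return (now_hours - run_start_hours) > t_hours

# Hypothetical usage: a run started at hour 100, checked at hour 118, limit 16h.
print(individual_abandon(run_start_hours=100, now_hours=118, t_hours=16))  # True
```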

[Figure: energy consumption (MWh) vs. individual time (hours), for I1 with C1-C3]
[Figure: number of good tasks killed vs. individual time (hours), for I1 with C1-C3]
[Figure: overheads on all good tasks vs. individual time (hours), for I1 with C1-C3]


Page 10


[Figure: energy consumption (MWh) vs. maximum execution duration (hours), for combinations of m and n]

Dedicated Resources
D1(m,d): miscreant tasks are permitted to continue executing on a dedicated set of m resources (without interactive users or reboots), with a maximum duration d.
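A minimal sketch of D1(m,d), assuming a task is handed to the dedicated pool once flagged as miscreant (for example by one of the N policies above); the DedicatedPool bookkeeping below is illustrative only:

```python
# Sketch of the dedicated-resources policy D1(m, d): miscreant tasks are queued
# for m dedicated machines (no interactive users, no reboots) and abandoned if
# they exceed d hours there.
from collections import deque

class DedicatedPool:
    def __init__(self, m, d_hours):
        self.free = m                 # dedicated machines currently idle
        self.d = d_hours              # maximum uninterrupted duration allowed
        self.waiting = deque()        # miscreant tasks waiting for a machine
        self.running = {}             # task id -> start time (hours)

    def submit(self, task_id, now):
        if self.free > 0:
            self.free -= 1
            self.running[task_id] = now
        else:
            self.waiting.append(task_id)

    def tick(self, now):
        """Abandon any task that has exceeded d, then start waiting tasks."""
        for task_id, start in list(self.running.items()):
            if now - start > self.d:
                del self.running[task_id]        # abandon: assumed never to complete
                self.free += 1
        while self.free > 0 and self.waiting:
            self.submit(self.waiting.popleft(), now)

# Hypothetical usage: 2 dedicated machines, 48-hour cap.
pool = DedicatedPool(m=2, d_hours=48)
pool.submit("taskA", now=0); pool.submit("taskB", now=0); pool.submit("taskC", now=0)
pool.tick(now=50)            # taskA and taskB abandoned, taskC starts
print(pool.running)          # {'taskC': 50}
```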

[Figure: number of good tasks killed vs. maximum execution duration (hours), for combinations of m and n]
[Figure: overheads on all good tasks vs. maximum execution duration (hours), for combinations of m and n]


Page 11


Conclusion

• Simple policies can be used to reduce the effect of miscreant tasks in a multi-use cycle-stealing cluster, e.g. N2 (count total evictions, ignoring those caused by interactive users).
• They give an order-of-magnitude reduction in energy consumption by reducing the amount of effort wasted on tasks that will never complete.
• Policies may be combined to achieve further improvements, e.g. by adding dedicated computers.


Page 12

Questions?

[email protected]  [email protected]    

More info: McGough, A. Stephen; Forshaw, Matthew; Gerrard, Clive; Wheater, Stuart; Reducing the Number of Miscreant Tasks Executions in a Multi-use Cluster, Cloud and Green Computing (CGC), 2012.