cw13 0.01-final
DESCRIPTION
Presentation on miscreant jobs in HTCondor, presented at HTCondor Week 2013, showing how to reduce the number of bad jobs run and increase the chances of good jobs completing quickly.
TRANSCRIPT
Making HTCondor Energy Efficient by identifying miscreant jobs
Stephen McGough, Matthew Forshaw & Clive Gerrard
Newcastle University
Stuart Wheater
Arjuna Technologies Limited
Task lifecycle
[Diagram: a task's lifecycle along a timeline – the HTC user submits the task, a resource is selected, an interactive user logs in causing task eviction, a resource is selected again, a computer reboot forces another resource selection, another interactive user logs in, and so on, until eventual task completion. A 'Good' task eventually completes; a 'Bad' task is evicted indefinitely and never completes.]
'Miscreant' – a task that has been evicted, but we don't yet know whether it is good or bad
Motivation
• We have run a high-throughput cluster for ~6 years
  – Allowing many researchers to perform more work, more quickly
• The University has a strong desire to reduce energy consumption and CO2 production
  – Currently powering down computers & buying low-power PCs
  – "If a computer is not 'working' it should be powered down"
• Can we go further to reduce wasted energy?
  – Reduce the time computers spend running work which does not complete
  – Prevent re-submission of 'bad' jobs
  – Reduce the number of resubmissions for 'good' jobs
• Aims
  – Investigate policy for reducing energy consumption
  – Determine the impact on high-throughput users
Can we fix the number of retries?
[Figure: Cumulative wasted seconds (millions) vs. number of evictions, plotted separately for good jobs and bad jobs]
• ~57 years of computing time during 2010
• ~39 years of wasted time
  – ~27 years for 'bad' tasks : average 45 retries : max 1946 retries
  – ~12 years for 'good' tasks : average 1.38 retries : max 360 retries
• 100% 'good' task completion -> 360 retries; still wastes ~13 years on 'bad' tasks
• 95% 'good' task completion -> 3 retries : 9,808 good tasks killed (3.32%)
• 99% 'good' task completion -> 6 retries : 2,022 good tasks killed (0.68%)
(A sketch of this trade-off follows.)
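To make the trade-off concrete, here is a minimal Python sketch (not the authors' code; the task list below is a hypothetical stand-in for the 2010 trace) that reports how many good tasks a fixed retry cap would kill and the resulting completion fraction:

def retry_cap_tradeoff(tasks, cap):
    """tasks: list of (evictions, is_good) pairs; cap: maximum retries allowed."""
    good = [e for e, is_good in tasks if is_good]
    killed = sum(1 for e in good if e > cap)   # good tasks the cap would kill
    completion = 1 - killed / len(good)        # fraction of good tasks completing
    return killed, completion

# Toy numbers only; the real trace covered all tasks submitted in 2010.
tasks = [(0, True), (2, True), (7, True), (45, False), (1946, False)]
for cap in (3, 6, 360):
    killed, frac = retry_cap_tradeoff(tasks, cap)
    print(f"cap={cap}: good tasks killed={killed}, completion={frac:.2%}")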
Can we make tasks short enough?
[Figure: Cumulative free seconds vs. idle time (minutes)]
• Make tasks short enough to reduce miscreants
• Average idle interval – 371 minutes
• But to ensure availability of intervals (see the sketch below):
  – 95% : need to reduce the time limit to 2 minutes
  – 99% : need to reduce the time limit to 1 minute
• Impractical to make tasks this short
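For illustration, here is a small sketch of how such figures can be derived from a trace of idle-interval lengths (this method is an assumption, not taken from the talk): the longest task that fits within a given fraction of intervals is the corresponding lower quantile of the interval lengths.

def max_task_length(idle_minutes, availability):
    """Largest task length (minutes) that still fits in `availability`
    of the observed idle intervals: the (1 - availability) quantile."""
    s = sorted(idle_minutes)
    return s[int((1 - availability) * (len(s) - 1))]

# Toy data; the real analysis used the 2010 interactive-login trace.
idle = [1, 2, 2, 5, 30, 120, 371, 600, 1400]
print(max_task_length(idle, 0.95))  # task length usable in 95% of intervals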
Cluster Simulation
• High-level simulation of Condor
  – Trace logs from a twelve-month period are used as input
    • User logins / logouts (which computer was used)
    • Condor job submission times ('good'/'bad' and duration)
… different rooms, each room having its own opening hours. These hours vary between clusters that are predominantly for teaching purposes, open during teaching hours (normally 9am till 5pm), through to 24-hour access computer clusters. The location of clusters has a significant impact on throughput of interactive users, ranging from clusters buried deep within a particular school to those within busy thoroughfares such as the University Library.
Computers within the clusters are replaced on a five-year rolling programme, with computers falling into one of three broad categories as outlined in Table I. PCs within a cluster are provisioned at the same time and contain equivalent computing resources. Thus there is a wide variance between clusters within the University but no variance within them.
Whilst we expect casual use to migrate onto user-owned portable devices and virtual desktops, the demand for compute/graphics-intensive workstations running high-end software is, if anything, increasing. Further, these high-end applications are unlikely to migrate to virtual desktops or user-owned devices due to hardware requirements and licensing conditions, so we expect to need to maintain a pool of hardware that will be useful for Condor for some time.
PUE values have been assigned at the cluster level, with values in the range of 0.9 to 1.4. These values have not been empirically evaluated but are used here to steer jobs. In most cases the cluster rooms have a low enough computer density not to require cooling, giving these clusters a PUE value of 1.0. However, two clusters are located in rooms that require air conditioning, giving these a PUE of 1.4. Likewise, four clusters are based in a basement room, which is cold all year round; hence computer heat is used to offset heating requirements for the room, giving a PUE value of 0.9.
Table I: Computer Types

  Type      Cores  Speed  Power (Active)  Power (Idle)  Power (Sleep)
  Normal    2      ~3GHz  57W             40W           2W
  High End  4      ~3GHz  114W            67W           3W
  Legacy    2      ~2GHz  100-180W        50-80W        4W

[Figure 3: Interactive user logins showing seasonality]

By default computers within the cluster will enter the sleep state after a given interval of inactivity. This interval depends on whether the cluster is open or not. During open hours computers remain in the idle state for one hour before entering the sleep state, whilst outside of these hours the idle interval before sleep is reduced to 15 minutes. This policy (P2) was originally trialled under Windows XP, where the time for computers to resume from the shutdown state was considerable (sleep was an unreliable option for our environment). Likewise the time interval before a Condor job could start using a computer (M1) was set to 15 minutes during cluster opening hours and 0 minutes outside of opening hours. The latter was possible as computers would only have their states changed at these times, due to Condor waking them up or a scheduled reboot.
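The P2 sleep timeouts and M1 Condor-start delay described above can be summarised in a few lines; this is a minimal sketch (the function names and the decision helper are mine, not the production configuration):

def sleep_timeout_minutes(cluster_open):
    """P2: idle minutes before a machine enters the sleep state."""
    return 60 if cluster_open else 15

def condor_start_delay_minutes(cluster_open):
    """M1: idle minutes before Condor may start using a machine."""
    return 15 if cluster_open else 0

def next_action(cluster_open, idle_minutes):
    """What an idle, unclaimed machine should do under P2/M1."""
    if idle_minutes >= sleep_timeout_minutes(cluster_open):
        return "sleep"
    if idle_minutes >= condor_start_delay_minutes(cluster_open):
        return "eligible for Condor"
    return "stay idle"

print(next_action(cluster_open=True, idle_minutes=20))   # eligible for Condor
print(next_action(cluster_open=False, idle_minutes=20))  # sleep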
The simulation is based on trace logs generated from interactive user logins and Condor execution logs for 2010. Figure 3 illustrates the interactive logins for this period, showing the high degree of seasonality within the data. It is easy to distinguish between weeks and weekends, as well as where the three terms lie along with the vacations. This represents 1,229,820 interactive uses of the computers.
Figure 4 depicts the profile for the 532,467 job submissions made to Condor during this period. As can be seen, the job submissions follow no clearly definable pattern. Note that out of these submissions 131,909 were later killed by the original Condor user. In order to simulate these killed jobs the simulation assumes that these are non-terminating jobs and keeps on submitting them to resources until the time at which the high-throughput user terminates them. The graph is clipped on Thursday 03/06/2010 as this date had 93,000 job submissions.
For the simulations we will report the total power consumed (in MWh) for the period. In order to determine the effect of a policy on high-throughput users we will also report the average overhead observed by jobs submitted to Condor (in seconds), where overhead is defined to be the amount of time in excess of the execution duration of the job. Other statistics will be reported as appropriate for particular policies.
!"!#$
!"#$
#$
#!$
#!!$!"#
$%&'(
)'*"$
#+,,+(-
,'."
-/&%/,'
012%'
Figure 4: Condor job submission profile
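The overhead metric can be stated precisely in code. A minimal sketch follows; the JobRecord layout is a hypothetical stand-in for the simulator's job records:

from dataclasses import dataclass

@dataclass
class JobRecord:
    submit_time: float    # seconds since the start of the trace
    finish_time: float    # seconds since the start of the trace
    exec_duration: float  # useful execution time, in seconds

def average_overhead(jobs):
    """Mean time jobs spend in excess of their execution duration."""
    overheads = [j.finish_time - j.submit_time - j.exec_duration for j in jobs]
    return sum(overheads) / len(overheads)

# A job submitted at t=0 that needed 3600s of execution but finished at
# t=5000 (evictions, queueing, reruns) has an overhead of 1400s.
print(average_overhead([JobRecord(0.0, 5000.0, 3600.0)]))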
[Diagram: machine power states – Active (interactive user / Condor), Idle, and Sleep – with wake-on-LAN (WOL) transitions, driven by cycle-stealing users, interactive users, and the high-throughput management policy]
n Reallocation Policies (a sketch of the eviction-count check follows the figures below)
• N1(n): Abandon task if deallocated n times.
• N2(n): Abandon task if deallocated n times, ignoring interactive users.
• N3(n): Abandon task if deallocated n times, ignoring planned machine reboots.
Allocation policies:
• C1: Tasks allocated to resources at random, favouring awake resources.
• C2: Target less-used computers (longer idle times).
• C3: Tasks allocated to computers in clusters with the least amount of time used by interactive users.
[Figure: Number of good tasks killed vs. number of retries (n), for each combination of N1–N3 with C1–C3]
[Figure: Energy consumption (MWh) vs. number of retries (n), for each combination of N1–N3 with C1–C3]
[Figure: Overheads on all good tasks vs. number of retries (n), for each combination of N1–N3 with C1–C3]
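A minimal sketch of the N1/N2/N3 checks (an assumed encoding, not the authors' implementation; the eviction-reason labels are hypothetical):

def should_abandon(evictions, policy, n):
    """evictions: list of reasons, each one of 'interactive_user',
    'reboot', or 'other'; abandon once the counted evictions reach n."""
    if policy == "N1":    # every deallocation counts
        counted = evictions
    elif policy == "N2":  # ignore evictions caused by interactive users
        counted = [e for e in evictions if e != "interactive_user"]
    elif policy == "N3":  # ignore evictions caused by planned reboots
        counted = [e for e in evictions if e != "reboot"]
    else:
        raise ValueError(f"unknown policy {policy!r}")
    return len(counted) >= n

# Under N2 a task evicted ten times, always by interactive users, survives:
print(should_abandon(["interactive_user"] * 10, "N2", n=5))  # False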
Accrued Time Policies (a sketch follows the figures below)
• Impose a limit on cumulative execution time for a task.
• A1(t): Abandon if accrued time > t and the task is deallocated.
• A2(t): As A1, discounting interactive users.
• A3(t): As A1, discounting reboots.
[Figure: Number of good tasks killed vs. accrued time (hours), for each combination of A1–A3 with C1–C3]
[Figure: Energy consumption (MWh) vs. accrued time (hours), for each combination of A1–A3 with C1–C3]
[Figure: Overheads on all good tasks vs. accrued time (hours), for each combination of A1–A3 with C1–C3]
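A minimal sketch of the accrued-time checks, under one plausible reading in which execution time from allocations ended by the discounted cause is not counted (the data layout is hypothetical):

def should_abandon_accrued(runs, policy, t_hours):
    """runs: (hours_executed, eviction_reason) per allocation, reasons
    being 'interactive_user', 'reboot', or 'other'. Called on deallocation."""
    discounted = {"A1": None, "A2": "interactive_user", "A3": "reboot"}[policy]
    accrued = sum(h for h, reason in runs if reason != discounted)
    return accrued > t_hours

# A2 discounts the 8 hours lost to an interactive-user eviction:
runs = [(8, "interactive_user"), (8, "reboot"), (8, "other")]
print(should_abandon_accrued(runs, "A2", t_hours=20))  # 16 <= 20, so False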
Individual Time Policy (a sketch follows the figures below)
• Impose a limit on individual execution time for a task.
  – Nightly reboots bound this to 24 hours.
  – What is the impact of lowering this?
• I1(t): Abandon if individual time > t.
[Figure: Energy consumption (MWh) vs. individual time (hours), for I1 with C1–C3]
[Figure: Number of good tasks killed vs. individual time (hours), for I1 with C1–C3]
[Figure: Overheads on all good tasks vs. individual time (hours), for I1 with C1–C3]
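A minimal sketch of I1, assuming the run time of each individual allocation is available:

def should_abandon_individual(run_hours, t_hours):
    """run_hours: execution hours of each allocation of the task so far.
    I1(t): abandon if any single allocation exceeds t hours."""
    return any(h > t_hours for h in run_hours)

# Nightly reboots already bound a single run to ~24h; I1 lowers the bound.
print(should_abandon_individual([2.0, 26.0], t_hours=24))  # True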
Dedicated Resources (a sketch follows the figures below)
D1(m,d): Miscreant tasks are permitted to continue executing on a dedicated set of m resources (without interactive users or reboots), with a maximum duration d.
[Figure: Energy consumption (MWh) vs. maximum execution duration (hours), for m ∈ {10, 20, 30, 40} and n ∈ {10, 20, 30}]
[Figure: Number of good tasks killed vs. maximum execution duration (hours), for m ∈ {10, 20, 30, 40} and n ∈ {10, 20, 30}]
[Figure: Overheads on all good tasks vs. maximum execution duration (hours), for m ∈ {10, 20, 30, 40} and n ∈ {10, 20, 30}]
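A minimal sketch of a D1-style dedicated pool (assumed, not the authors' simulator; the class, its admission rule, and the tick time step are illustrative):

class DedicatedPool:
    """m dedicated machines, free of interactive users and reboots,
    on which miscreant tasks may run up to a maximum duration d."""

    def __init__(self, m, d_hours):
        self.m = m
        self.d_hours = d_hours
        self.running = {}  # task_id -> hours executed so far

    def try_admit(self, task_id):
        """Admit a miscreant task if a dedicated machine is free."""
        if len(self.running) < self.m:
            self.running[task_id] = 0.0
            return True
        return False

    def tick(self, hours):
        """Advance time; abandon any task exceeding the duration cap."""
        killed = []
        for task_id in list(self.running):
            self.running[task_id] += hours
            if self.running[task_id] > self.d_hours:
                del self.running[task_id]
                killed.append(task_id)
        return killed

pool = DedicatedPool(m=10, d_hours=100)
pool.try_admit("task-42")
print(pool.tick(120))  # ['task-42'] once the 100h cap is exceeded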
Conclusion
• Simple policies can be used to reduce the effect of miscreant tasks in a multi-use cycle-stealing cluster.
  – N2 (total evictions, ignoring interactive users)
• Order-of-magnitude reduction in energy consumption
  – Reduces the amount of effort wasted on tasks that will never complete
• Policies may be combined to achieve further improvements.
  – Adding in dedicated computers
Questions?
[email protected] [email protected]
More info: McGough, A. Stephen; Forshaw, Matthew; Gerrard, Clive; Wheater, Stuart; Reducing the Number of Miscreant Tasks Executions in a Multi-use Cluster, Cloud and Green Computing (CGC), 2012