cw13 0.01-final
DESCRIPTION
Presentation on miscreant jobs in HTCondor, presented at HTCondor Week 2013, showing how to reduce the number of bad jobs run and increase the chances of good jobs completing quickly.
TRANSCRIPT
Making HTCondor Energy Efficient by identifying miscreant jobs
Stephen McGough, Matthew Forshaw & Clive Gerrard
Newcastle University
Stuart Wheater
Arjuna Technologies Limited
Task lifecycle
[Diagram: a task's lifecycle along a timeline – the HTC user submits the task, a resource is selected, an interactive user logs in causing task eviction, a resource is selected again, a computer reboot forces another resource selection, another interactive user logs in, and so on, until eventual task completion. A 'Good' task eventually completes; a 'Bad' task is evicted indefinitely and never completes.]
'Miscreant' – a task that has been evicted, but we don't yet know whether it is good or bad
Motivation
• We have run a high-throughput cluster for ~6 years
  – Allowing many researchers to perform more work, more quickly
• The University has a strong desire to reduce energy consumption and CO2 production
  – Currently powering down computers & buying low-power PCs
  – "If a computer is not 'working' it should be powered down"
• Can we go further to reduce wasted energy?
  – Reduce the time computers spend running work which does not complete
  – Prevent re-submission of 'bad' jobs
  – Reduce the number of resubmissions for 'good' jobs
• Aims
  – Investigate policy for reducing energy consumption
  – Determine the impact on high-throughput users
Can we fix the number of retries?
[Figure: Cumulative wasted seconds (millions) vs. number of evictions, plotted separately for good jobs and bad jobs]
• ~57 years of computing time during 2010
• ~39 years of wasted time
  – ~27 years for 'bad' tasks : average 45 retries : max 1946 retries
  – ~12 years for 'good' tasks : average 1.38 retries : max 360 retries
• 100% 'good' task completion -> 360 retries; still wastes ~13 years on 'bad' tasks
• 95% 'good' task completion -> 3 retries : 9,808 good tasks killed (3.32%)
• 99% 'good' task completion -> 6 retries : 2,022 good tasks killed (0.68%)
(A sketch of this trade-off follows.)
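To make the trade-off concrete, here is a minimal Python sketch (not the authors' code; the task list below is a hypothetical stand-in for the 2010 trace) that reports how many good tasks a fixed retry cap would kill and the resulting completion fraction:

def retry_cap_tradeoff(tasks, cap):
    """tasks: list of (evictions, is_good) pairs; cap: maximum retries allowed."""
    good = [e for e, is_good in tasks if is_good]
    killed = sum(1 for e in good if e > cap)   # good tasks the cap would kill
    completion = 1 - killed / len(good)        # fraction of good tasks completing
    return killed, completion

# Toy numbers only; the real trace covered all tasks submitted in 2010.
tasks = [(0, True), (2, True), (7, True), (45, False), (1946, False)]
for cap in (3, 6, 360):
    killed, frac = retry_cap_tradeoff(tasks, cap)
    print(f"cap={cap}: good tasks killed={killed}, completion={frac:.2%}")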
Can we make tasks short enough?
[Figure: Cumulative free seconds vs. idle time (minutes)]
• Make tasks short enough to reduce miscreants
• Average idle interval – 371 minutes
• But to ensure availability of intervals (see the sketch below):
  – 95% : need to reduce the time limit to 2 minutes
  – 99% : need to reduce the time limit to 1 minute
• Impractical to make tasks this short
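For illustration, here is a small sketch of how such figures can be derived from a trace of idle-interval lengths (this method is an assumption, not taken from the talk): the longest task that fits within a given fraction of intervals is the corresponding lower quantile of the interval lengths.

def max_task_length(idle_minutes, availability):
    """Largest task length (minutes) that still fits in `availability`
    of the observed idle intervals: the (1 - availability) quantile."""
    s = sorted(idle_minutes)
    return s[int((1 - availability) * (len(s) - 1))]

# Toy data; the real analysis used the 2010 interactive-login trace.
idle = [1, 2, 2, 5, 30, 120, 371, 600, 1400]
print(max_task_length(idle, 0.95))  # task length usable in 95% of intervals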
Cluster Simulation
• High-level simulation of Condor
  – Trace logs from a twelve-month period are used as input
    • User logins / logouts (which computer was used)
    • Condor job submission times ('good'/'bad' and duration)
… different rooms, each room having its own opening hours. These hours vary between clusters that are predominantly for teaching purposes, open during teaching hours (normally 9am till 5pm), through to 24-hour access computer clusters. The location of clusters has a significant impact on throughput of interactive users, ranging from clusters buried deep within a particular school to those within busy thoroughfares such as the University Library.
Computers within the clusters are replaced on a five-year rolling programme, with computers falling into one of three broad categories as outlined in Table I. PCs within a cluster are provisioned at the same time and contain equivalent computing resources. Thus there is a wide variance between clusters within the University but no variance within them.
Whilst we expect casual use to migrate onto user-owned portable devices and virtual desktops, the demand for compute/graphics-intensive workstations running high-end software is, if anything, increasing. Further, these high-end applications are unlikely to migrate to virtual desktops or user-owned devices due to hardware requirements and licensing conditions, so we expect to need to maintain a pool of hardware that will be useful for Condor for some time.
PUE values have been assigned at the cluster level, with values in the range of 0.9 to 1.4. These values have not been empirically evaluated but are used here to steer jobs. In most cases the cluster rooms have a low enough computer density not to require cooling, giving these clusters a PUE value of 1.0. However, two clusters are located in rooms that require air conditioning, giving these a PUE of 1.4. Likewise, four clusters are based in a basement room, which is cold all year round; hence computer heat is used to offset heating requirements for the room, giving a PUE value of 0.9.
Table I: Computer Types

  Type      Cores  Speed  Power (Active)  Power (Idle)  Power (Sleep)
  Normal    2      ~3GHz  57W             40W           2W
  High End  4      ~3GHz  114W            67W           3W
  Legacy    2      ~2GHz  100-180W        50-80W        4W

[Figure 3: Interactive user logins showing seasonality]

By default computers within the cluster will enter the sleep state after a given interval of inactivity. This interval depends on whether the cluster is open or not. During open hours computers remain in the idle state for one hour before entering the sleep state, whilst outside of these hours the idle interval before sleep is reduced to 15 minutes. This policy (P2) was originally trialled under Windows XP, where the time for computers to resume from the shutdown state was considerable (sleep was an unreliable option for our environment). Likewise the time interval before a Condor job could start using a computer (M1) was set to 15 minutes during cluster opening hours and 0 minutes outside of opening hours. The latter was possible as computers would only have their states changed at these times, due to Condor waking them up or a scheduled reboot.
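The P2 sleep timeouts and M1 Condor-start delay described above can be summarised in a few lines; this is a minimal sketch (the function names and the decision helper are mine, not the production configuration):

def sleep_timeout_minutes(cluster_open):
    """P2: idle minutes before a machine enters the sleep state."""
    return 60 if cluster_open else 15

def condor_start_delay_minutes(cluster_open):
    """M1: idle minutes before Condor may start using a machine."""
    return 15 if cluster_open else 0

def next_action(cluster_open, idle_minutes):
    """What an idle, unclaimed machine should do under P2/M1."""
    if idle_minutes >= sleep_timeout_minutes(cluster_open):
        return "sleep"
    if idle_minutes >= condor_start_delay_minutes(cluster_open):
        return "eligible for Condor"
    return "stay idle"

print(next_action(cluster_open=True, idle_minutes=20))   # eligible for Condor
print(next_action(cluster_open=False, idle_minutes=20))  # sleep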
The simulation is based on trace logs generated from interactive user logins and Condor execution logs for 2010. Figure 3 illustrates the interactive logins for this period, showing the high degree of seasonality within the data. It is easy to distinguish between weeks and weekends, as well as where the three terms lie along with the vacations. This represents 1,229,820 interactive uses of the computers.
Figure 4 depicts the profile for the 532,467 job submissions made to Condor during this period. As can be seen, the job submissions follow no clearly definable pattern. Note that out of these submissions 131,909 were later killed by the original Condor user. In order to simulate these killed jobs the simulation assumes that these are non-terminating jobs and keeps on submitting them to resources until the time at which the high-throughput user terminates them. The graph is clipped on Thursday 03/06/2010 as this date had 93,000 job submissions.
For the simulations we will report the total power consumed (in MWh) for the period. In order to determine the effect of a policy on high-throughput users we will also report the average overhead observed by jobs submitted to Condor (in seconds), where overhead is defined to be the amount of time in excess of the execution duration of the job. Other statistics will be reported as appropriate for particular policies.
!"!#$
!"#$
#$
#!$
#!!$!"#
$%&'(
)'*"$
#+,,+(-
,'."
-/&%/,'
012%'
Figure 4: Condor job submission profile
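The overhead metric can be stated precisely in code. A minimal sketch follows; the JobRecord layout is a hypothetical stand-in for the simulator's job records:

from dataclasses import dataclass

@dataclass
class JobRecord:
    submit_time: float    # seconds since the start of the trace
    finish_time: float    # seconds since the start of the trace
    exec_duration: float  # useful execution time, in seconds

def average_overhead(jobs):
    """Mean time jobs spend in excess of their execution duration."""
    overheads = [j.finish_time - j.submit_time - j.exec_duration for j in jobs]
    return sum(overheads) / len(overheads)

# A job submitted at t=0 that needed 3600s of execution but finished at
# t=5000 (evictions, queueing, reruns) has an overhead of 1400s.
print(average_overhead([JobRecord(0.0, 5000.0, 3600.0)]))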
[Diagram: machine power states – Active (interactive user / Condor), Idle, and Sleep – with wake-on-LAN (WOL) transitions, driven by cycle-stealing users, interactive users, and the high-throughput management policy]
n Reallocation Policies (a sketch of the eviction-count check follows the figures below)
• N1(n): Abandon task if deallocated n times.
• N2(n): Abandon task if deallocated n times, ignoring interactive users.
• N3(n): Abandon task if deallocated n times, ignoring planned machine reboots.
Allocation policies:
• C1: Tasks allocated to resources at random, favouring awake resources.
• C2: Target less-used computers (longer idle times).
• C3: Tasks allocated to computers in clusters with the least amount of time used by interactive users.
[Figure: Number of good tasks killed vs. number of retries (n), for each combination of N1–N3 with C1–C3]
[Figure: Energy consumption (MWh) vs. number of retries (n), for each combination of N1–N3 with C1–C3]
[Figure: Overheads on all good tasks vs. number of retries (n), for each combination of N1–N3 with C1–C3]
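A minimal sketch of the N1/N2/N3 checks (an assumed encoding, not the authors' implementation; the eviction-reason labels are hypothetical):

def should_abandon(evictions, policy, n):
    """evictions: list of reasons, each one of 'interactive_user',
    'reboot', or 'other'; abandon once the counted evictions reach n."""
    if policy == "N1":    # every deallocation counts
        counted = evictions
    elif policy == "N2":  # ignore evictions caused by interactive users
        counted = [e for e in evictions if e != "interactive_user"]
    elif policy == "N3":  # ignore evictions caused by planned reboots
        counted = [e for e in evictions if e != "reboot"]
    else:
        raise ValueError(f"unknown policy {policy!r}")
    return len(counted) >= n

# Under N2 a task evicted ten times, always by interactive users, survives:
print(should_abandon(["interactive_user"] * 10, "N2", n=5))  # False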
Accrued Time Policies (a sketch follows the figures below)
• Impose a limit on cumulative execution time for a task.
• A1(t): Abandon if accrued time > t and the task is deallocated.
• A2(t): As A1, discounting interactive users.
• A3(t): As A1, discounting reboots.
[Figure: Number of good tasks killed vs. accrued time (hours), for each combination of A1–A3 with C1–C3]
[Figure: Energy consumption (MWh) vs. accrued time (hours), for each combination of A1–A3 with C1–C3]
[Figure: Overheads on all good tasks vs. accrued time (hours), for each combination of A1–A3 with C1–C3]
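A minimal sketch of the accrued-time checks, under one plausible reading in which execution time from allocations ended by the discounted cause is not counted (the data layout is hypothetical):

def should_abandon_accrued(runs, policy, t_hours):
    """runs: (hours_executed, eviction_reason) per allocation, reasons
    being 'interactive_user', 'reboot', or 'other'. Called on deallocation."""
    discounted = {"A1": None, "A2": "interactive_user", "A3": "reboot"}[policy]
    accrued = sum(h for h, reason in runs if reason != discounted)
    return accrued > t_hours

# A2 discounts the 8 hours lost to an interactive-user eviction:
runs = [(8, "interactive_user"), (8, "reboot"), (8, "other")]
print(should_abandon_accrued(runs, "A2", t_hours=20))  # 16 <= 20, so False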
Individual Time Policy (a sketch follows the figures below)
• Impose a limit on individual execution time for a task.
  – Nightly reboots bound this to 24 hours.
  – What is the impact of lowering this?
• I1(t): Abandon if individual time > t.
[Figure: Energy consumption (MWh) vs. individual time (hours), for I1 with C1–C3]
[Figure: Number of good tasks killed vs. individual time (hours), for I1 with C1–C3]
[Figure: Overheads on all good tasks vs. individual time (hours), for I1 with C1–C3]
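A minimal sketch of I1, assuming the run time of each individual allocation is available:

def should_abandon_individual(run_hours, t_hours):
    """run_hours: execution hours of each allocation of the task so far.
    I1(t): abandon if any single allocation exceeds t hours."""
    return any(h > t_hours for h in run_hours)

# Nightly reboots already bound a single run to ~24h; I1 lowers the bound.
print(should_abandon_individual([2.0, 26.0], t_hours=24))  # True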
Dedicated Resources (a sketch follows the figures below)
D1(m,d): Miscreant tasks are permitted to continue executing on a dedicated set of m resources (without interactive users or reboots), with a maximum duration d.
[Figure: Energy consumption (MWh) vs. maximum execution duration (hours), for m ∈ {10, 20, 30, 40} and n ∈ {10, 20, 30}]
[Figure: Number of good tasks killed vs. maximum execution duration (hours), for m ∈ {10, 20, 30, 40} and n ∈ {10, 20, 30}]
[Figure: Overheads on all good tasks vs. maximum execution duration (hours), for m ∈ {10, 20, 30, 40} and n ∈ {10, 20, 30}]
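A minimal sketch of a D1-style dedicated pool (assumed, not the authors' simulator; the class, its admission rule, and the tick time step are illustrative):

class DedicatedPool:
    """m dedicated machines, free of interactive users and reboots,
    on which miscreant tasks may run up to a maximum duration d."""

    def __init__(self, m, d_hours):
        self.m = m
        self.d_hours = d_hours
        self.running = {}  # task_id -> hours executed so far

    def try_admit(self, task_id):
        """Admit a miscreant task if a dedicated machine is free."""
        if len(self.running) < self.m:
            self.running[task_id] = 0.0
            return True
        return False

    def tick(self, hours):
        """Advance time; abandon any task exceeding the duration cap."""
        killed = []
        for task_id in list(self.running):
            self.running[task_id] += hours
            if self.running[task_id] > self.d_hours:
                del self.running[task_id]
                killed.append(task_id)
        return killed

pool = DedicatedPool(m=10, d_hours=100)
pool.try_admit("task-42")
print(pool.tick(120))  # ['task-42'] once the 100h cap is exceeded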
Conclusion
• Simple policies can be used to reduce the effect of miscreant tasks in a multi-use cycle-stealing cluster.
  – N2 (total evictions, ignoring interactive users)
• Order-of-magnitude reduction in energy consumption
  – Reduces the amount of effort wasted on tasks that will never complete
• Policies may be combined to achieve further improvements.
  – Adding in dedicated computers
Questions?
[email protected] [email protected]
More info: McGough, A. Stephen; Forshaw, Matthew; Gerrard, Clive; Wheater, Stuart; Reducing the Number of Miscreant Tasks Executions in a Multi-use Cluster, Cloud and Green Computing (CGC), 2012