HTCONDOR ADMIN TUTORIAL
Greg Thain Center for High Throughput Computing
University of Wisconsin – Madison [email protected]
© 2015 Internet2
[ 2 ]
Overview
• What makes HTCondor different? • Nitty gritty of HTCondor configuration?
[ 3 ]
HTCondor differences and implications
• Goal: Run jobs on as many machines as possible* • -> implies heterogeneity:
• Generational: different speeds, memory • OS Distribution and type (Linux/Windows) • HTCondor versions • Installed software base // installed datasets • Geography: building, campus, • Temporal
[ 4 ]
HTCondor differences and implications
• Goal: Run jobs on as many machines as possible* • -> implies scalability and reliability:
• Worker machines can come and go And there can be a lot of them
• Horizontal scalability Many independent submit nodes
• No single point of failure • HTCondor job is “like money in the bank”
[ 5 ]
Requires Networking • Not just fibers and packets and routers • Community building:
– Breaking out of silos • Asymmetric relationships
[ 6 ]
HTCondor history • Cycle scavenging on desktops
– Implies temporal disruptions – Delegating ownerships temporarily
• Not just desktop scavenging anymore – But same Model applies at different scales..
[ 7 ]
The Lehigh Story • Professor needed more compute power • Access to a small cluster not sufficient
• The power of one lunch…
• Computing demand outstrips capacity
[ 8 ]
Condominium cluster • Several groups pool money for big buy:
– Chemistry, physics, biology each buy 500 cores – When sufficient demand, each get 500 cores – When less demand, can get more than 500
• Same model as delegated ownership
[ 9 ]
HPC Backfill • HPC / MPI implies low utilization
– HTCondor can backfill HPC Cluster – Same model as delegated ownership
• HPC offload – Many HPC workloads really HTC, clogging up – Make HPC Admins happy
[ 10 ]
Cloud annexation • EC2 “spot instances”
– Delegated ownership yet again
• Can build pool in cloud (Cycle computing)
• Or can expand into the cloud
[ 11 ]
OSG and Glidein • Running everywhere -> running w/o root
• HTCondor worker nodes can be jobs
• This is how OSG works
[ 12 ]
Can Mix and Match • HTCondor at Wisconsin:
– Condo clusters – Desktops / Kiosks – HPC backfill – OSG glidein
• Any user can use all of these seamlessly
[ 13 ]
Implications of delegated Ownership • Separation of needs of owners vs. users
– Separate Unix processes – As admin, decide where policy really lies
• When conflict, who wins?
[ 14 ]
Speeds and Feeds, briefly
• Thousands of machines, thousands of jobs
– Out of the box
• 100,000 cores, 100,000 jobs
– With careful
• Dozens of submit points
[ 15 ]
Networks and Notworks • No infiniband needed • Can work around firewalls
– But you need to tell us where they are • All transfers are queued • Great way to stress-test NAT boxes • Experimental support for SDNs
The nitty gritty…
• HTCondor Daemons & Job Startup
• Configuration Files
• Security, briefly
• Policy Expressions
– Startd (Machine)
– Negotiator
• Priorities
• Useful Tools
• Log Files
• Debugging Jobs
16
Job Startup
17
[Figure: job startup. Submit Machine: master, schedd, shadow (plus the submit command). Execute Machine: master, startd, starter, Job. Central Manager: master, collector, negotiator.]
• CONDOR_CONFIG environment variable, /etc/condor/condor_config, ~condor/condor_config
• All settings can be in this one file
• Might want to share between all machines (NFS, automated copies, Wallaby, etc)
Configuration File
18
• LOCAL_CONFIG_FILE
– Comma separated, processed in order
LOCAL_CONFIG_FILE = \
    /var/condor/config.local,\
    /var/condor/policy.local,\
    /shared/condor/config.$(HOSTNAME),\
    /shared/condor/config.$(OPSYS)
• LOCAL_CONFIG_DIR
LOCAL_CONFIG_DIR = \
    /var/condor/config.d/,\
    /var/condor/$(OPSYS).d/
Other Configuration Files
19
# I’m a comment!
CREATE_CORE_FILES=TRUE
MAX_JOBS_RUNNING = 50
# HTCondor ignores case:
log=/var/log/condor
# Long entries:
collector_host=condor.cs.wisc.edu,\
    secondary.cs.wisc.edu
Configuration File Syntax
20
• You reference other macros (settings) with: – A = $(B) – SCHEDD = $(SBIN)/condor_schedd
• Can create additional macros for organizational purposes
Configuration File Macros
21
• Can append to macros:
A=abc
A=$(A),def
• Don’t let macros recursively define each other!
A=$(B)
B=$(A)
Configuration File Macros
22
• Later macros in a file overwrite earlier ones
– B will evaluate to 2:
A=1
B=$(A)
A=2
Configuration File Macros
23
• These are simple replacement macros
• Put parentheses around expressions
TEN=5+5
HUNDRED=$(TEN)*$(TEN)
• HUNDRED becomes 5+5*5+5 or 35!
TEN=(5+5)
HUNDRED=($(TEN)*$(TEN))
• ((5+5)*(5+5)) = 100
Macros and Expressions Gotcha
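The gotcha above can be reproduced outside HTCondor. The following is a minimal Python sketch, not HTCondor's own parser: the `expand` helper is hypothetical and only mimics plain textual `$(NAME)` replacement, which is the behavior the slide describes.

```python
import re

MACRO_REF = re.compile(r"\$\((\w+)\)")

def expand(name, macros):
    """Expand $(NAME) references by plain text replacement (illustrative only).

    Like the slide warns, mutually recursive definitions would never terminate.
    """
    value = macros[name]
    while MACRO_REF.search(value):
        value = MACRO_REF.sub(lambda m: macros[m.group(1)], value)
    return value

unparenthesized = {"TEN": "5+5", "HUNDRED": "$(TEN)*$(TEN)"}
parenthesized = {"TEN": "(5+5)", "HUNDRED": "($(TEN)*$(TEN))"}

bad = expand("HUNDRED", unparenthesized)
good = expand("HUNDRED", parenthesized)

# eval() is used only to do the arithmetic on these fixed strings.
print(bad, "=", eval(bad))    # 5+5*5+5 = 35
print(good, "=", eval(good))  # ((5+5)*(5+5)) = 100
```

Because the replacement is purely textual, operator precedence is applied only after all macros have been pasted together, which is why the parentheses matter.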
24
Security, briefly
• “Padlock” by Peter Ford © 2005
• Licensed under the Creative Commons Attribution 2.0 license
• http://www.flickr.com/photos/peterf/72583027/
• http://www.webcitation.org/5XIiBcsUg
HTCondor Security • Strong authentication
of users and daemons • Encryption over the
network • Integrity checking over
the network
“locks-masterlocks.jpg” by Brian De Smet, © 2005 Used with permission. http://www.fief.org/sysadmin/blosxom.cgi/2005/07/21#locks
26
Minimal Security Settings
• You must set ALLOW_WRITE, or nothing works
• Simplest setting: ALLOW_WRITE=*
– Extremely insecure! • A bit better: ALLOW_WRITE= \ *.cs.wisc.edu
“Bank Security Guard” by “Brad & Sabrina” © 2006 Licensed under the Creative Commons Attribution 2.0 license http://www.flickr.com/photos/madaboutshanghai/184665954/ http://www.webcitation.org/5XIhUAfuY
27
• Chapter 3.6, “Security,” in the HTCondor Manual • [email protected] • Can authenticate with
– IP – SSL – Shared Password – Windows – Kerberos
More on Security
“Zach Miller” by Alan De Smet
Policy
• “Don't even think about it” by Kat “tyger_lyllie” © 2005
• Licensed under the Creative Commons Attribution 2.0 license
• http://www.flickr.com/photos/tyger_lyllie/59207292/
• http://www.webcitation.org/5XIh5mYGS
• Who gets to run jobs, when?
Policy
30
• Specified in condor_config
– Ends up in the slot ClassAd
• Policy evaluates both a slot ClassAd and a job ClassAd together – Policy can reference items in either ClassAd (See manual for list)
• Can reference condor_config macros: $(MACRONAME)
Policy Expressions
31
• Machine
– An individual computer, managed by one startd
• Slot
– A place to run a job, managed by one starter
– A machine may have many slots
– Partitionable slots create more slots on the fly
• The startd advertises each slot
– The ClassAd is a “Machine” ad for historical reasons
Slots vs Machines
32
• START • RANK • SUSPEND • CONTINUE • PREEMPT • KILL
Slot Policy Expressions
33
• START is the primary policy
• When FALSE the slot enters the Owner state and will not run jobs
• Acts as the Requirements expression for the slot; the job must satisfy START
– Can reference job ClassAd values including Owner and ImageSize
START
34
• Indicates which jobs a slot prefers – Jobs can also specify a rank
• Floating point number – Larger numbers are higher ranked – Typically evaluate attributes in the Job ClassAd – Typically use + instead of &&
RANK
35
• Often used to give priority to owner of a particular group of machines
• Claimed slots still advertise looking for higher ranked job to preempt the current job – RANK causes preemption!
RANK
36
• When SUSPEND becomes true, the job is suspended • When CONTINUE becomes true a suspended job is
released
SUSPEND and CONTINUE
37 “DSC03753” by Eva Schiffer © 2008 Used with permission http://www.digitalchangeling.com/pictures/ourCats2008/january2008/DSC03753.html
• When PREEMPT becomes true, the job will be politely shut down
– Vanilla universe jobs get SIGTERM
• Or user requested signal
– Standard universe jobs checkpoint
• When KILL becomes true, the job is SIGKILLed
– Checkpointing is aborted if started
PREEMPT and KILL
38
Minimal / Default Settings
• Always runs jobs
START = True
RANK = 0
SUSPEND = False
CONTINUE = True
PREEMPT = False
KILL = False
39
Policy Configuration
• I am adding nodes to the Cluster… but the Chemistry Department has priority on these nodes
40
• Prefer Chemistry jobs
START = True
RANK = Department == "Chemistry"
SUSPEND = False
CONTINUE = True
PREEMPT = False
KILL = False
New Settings for the Chemistry nodes
41
• Prefix an entry with “+” to add to job ClassAd
Executable = charm-run
Universe = standard
+Department = "Chemistry"
queue
Submit file with Custom Attribute
42
START = True
RANK = Department =?= "Chemistry"
SUSPEND = False
CONTINUE = True
PREEMPT = False
KILL = False
What if “Department” not specified?
43
• Give the machine’s owners (adesmet and roy) highest priority, followed by the Chemistry department, followed by the Physics department, followed by everyone else. – Can use automatic Owner attribute in job attribute to identify
adesmet and roy
More Complex RANK
44
IsOwner = (Owner == "adesmet" \
|| Owner == "roy")
IsChem =(Department =?= "Chemistry")
IsPhys =(Department =?= "Physics")
RANK = $(IsOwner)*20 + $(IsChem)*10 \
+ $(IsPhys)
More Complex RANK
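The arithmetic in this RANK expression can be checked with a small Python sketch. It is only an analogy: the job ad is modeled as a plain dict, and a missing `Department` key stands in for the undefined attribute that `=?=` tolerates. The example users other than adesmet and roy are made up.

```python
def rank(job):
    """Mirror RANK = IsOwner*20 + IsChem*10 + IsPhys for a job ad held in a dict."""
    is_owner = job.get("Owner") in ("adesmet", "roy")
    # dict.get() returns None for a missing key, so the comparison is simply
    # False, much like =?= evaluating to False on an undefined attribute.
    is_chem = job.get("Department") == "Chemistry"
    is_phys = job.get("Department") == "Physics"
    return is_owner * 20 + is_chem * 10 + is_phys * 1

jobs = [
    {"Owner": "roy"},                                # machine owner
    {"Owner": "alice", "Department": "Chemistry"},   # hypothetical user
    {"Owner": "bob", "Department": "Physics"},       # hypothetical user
    {"Owner": "carol"},                              # everyone else
]

for job in sorted(jobs, key=rank, reverse=True):
    print(rank(job), job)
```

The weights (20, 10, 1) are chosen so that each tier outranks every tier below it, which is why addition works here where `&&` would not.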
45
Policy Configuration • I have an unhealthy fixation with PBS so… kill jobs
after 12 hours, except Physics jobs get 24 hours.
46
“I R BIZNESS CAT” by “VMOS” © 2007 Licensed under the Creative Commons Attribution 2.0 license http://www.flickr.com/photos/vmos/2078227291/ http://www.webcitation.org/5XIff1deZ
• CurrentTime – Current time, in Unix epoch time (seconds since midnight Jan 1,
1970) • EnteredCurrentActivity
– When did HTCondor enter the current activity, in Unix epoch time
Useful Attributes
47
ActivityTimer = \
    (CurrentTime - EnteredCurrentActivity)
HOUR = (60*60)
HALFDAY = ($(HOUR)*12)
FULLDAY = ($(HOUR)*24)
PREEMPT = \
    ($(IsPhys) && ($(ActivityTimer) > $(FULLDAY))) \
    || \
    (!$(IsPhys) && ($(ActivityTimer) > $(HALFDAY)))
KILL = $(PREEMPT)
Configuration
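To see what the time-limit policy decides for a given job, here is a hedged Python sketch of the same logic. The function name and arguments are illustrative; only `CurrentTime`, `EnteredCurrentActivity`, and the 12/24 hour limits come from the slides.

```python
HOUR = 60 * 60
HALFDAY = 12 * HOUR
FULLDAY = 24 * HOUR

def should_preempt(department, current_time, entered_current_activity):
    """Return True once a job has exceeded its time limit.

    Physics jobs get 24 hours; everyone else gets 12 hours.
    Both timestamps are Unix epoch seconds.
    """
    activity_timer = current_time - entered_current_activity
    limit = FULLDAY if department == "Physics" else HALFDAY
    return activity_timer > limit

start = 1_000_000  # arbitrary epoch second used for the example
print(should_preempt("Chemistry", start + 13 * HOUR, start))  # True
print(should_preempt("Physics", start + 13 * HOUR, start))    # False
print(should_preempt("Physics", start + 25 * HOUR, start))    # True
```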
48
Policy Configuration • The cluster is okay, but... HTCondor can only use
the desktops when they would otherwise be idle
49
“I R BIZNESS CAT” by “VMOS” © 2007 Licensed under the Creative Commons Attribution 2.0 license http://www.flickr.com/photos/vmos/2078227291/ http://www.webcitation.org/5XIff1deZ
• One possible definition: – No keyboard or mouse activity for 5 minutes – Load average below 0.3
Defining Idle
50
• START jobs when the machine becomes idle • SUSPEND jobs as soon as activity is detected • PREEMPT jobs if the activity continues for 5 minutes or
more • KILL jobs if they take more than 5 minutes to preempt
Desktops should
51
• LoadAvg – Current load average
• CondorLoadAvg – Current load average generated by HTCondor
• KeyboardIdle – Seconds since last keyboard or mouse activity
Useful Attributes
52
NonCondorLoadAvg = (LoadAvg - CondorLoadAvg)
BgndLoad = 0.3
CPU_Busy = ($(NonCondorLoadAvg) >= $(BgndLoad))
CPU_Idle = (!$(CPU_Busy))
KeyboardBusy = (KeyboardIdle < 10)
KeyboardIsIdle = (KeyboardIdle > 300)
MachineBusy = ($(CPU_Busy) || $(KeyboardBusy))
Macros in Configuration Files
53
START = $(CPU_Idle) && $(KeyboardIsIdle)
SUSPEND = $(MachineBusy)
CONTINUE = $(CPU_Idle) && KeyboardIdle > 120
PREEMPT = (Activity == "Suspended") && \
    $(ActivityTimer) > 300
KILL = $(ActivityTimer) > 300
Desktop Machine Policy
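The START, SUSPEND, and CONTINUE expressions of the desktop policy can be traced by hand with a short Python sketch. This is an illustration of the boolean logic only, with the slot attributes passed in as plain numbers; the function name is hypothetical, and PREEMPT and KILL are left out because they depend on the slot's activity state.

```python
BGND_LOAD = 0.3

def desktop_policy(load_avg, condor_load_avg, keyboard_idle):
    """Evaluate START, SUSPEND and CONTINUE for one snapshot of slot attributes.

    keyboard_idle is seconds since the last keyboard or mouse activity.
    """
    non_condor_load = load_avg - condor_load_avg
    cpu_busy = non_condor_load >= BGND_LOAD
    cpu_idle = not cpu_busy
    keyboard_busy = keyboard_idle < 10
    keyboard_is_idle = keyboard_idle > 300
    machine_busy = cpu_busy or keyboard_busy
    return {
        "START": cpu_idle and keyboard_is_idle,
        "SUSPEND": machine_busy,
        "CONTINUE": cpu_idle and keyboard_idle > 120,
    }

print(desktop_policy(0.1, 0.0, 600))  # quiet desktop: START is True
print(desktop_policy(1.2, 1.0, 5))    # user just typed: SUSPEND is True
print(desktop_policy(0.1, 0.0, 60))   # user left a minute ago: nothing fires yet
```

Note the gap between the thresholds (10, 120, and 300 seconds): a job is suspended quickly but resumed or started only after a longer quiet period, which avoids flapping.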
54
55 Slot States
Slot Activities
(Section 3.5: Policy Configuration for the condor_startd)
• Can add attributes to a slot’s ClassAd, typically done in the local configuration file
INSTRUCTIONAL=TRUE
NETWORK_SPEED=1000
STARTD_EXPRS=INSTRUCTIONAL, NETWORK_SPEED
Custom Slot Attributes
57
• Jobs can now specify Rank and Requirements using new attributes:
Requirements = INSTRUCTIONAL=!=TRUE
Rank = NETWORK_SPEED
• Dynamic attributes are available; see STARTD_CRON_* in the manual
Custom Slot Attributes
58
• For further information, see section 3.5 “Policy Configuration for the condor_startd” in the HTCondor manual
• htcondor-users mailing list http://research.cs.wisc.edu/htcondor/mail-lists/
Further Machine Policy Information
59
Priorities
• “IMG_2476” by “Joanne and Matt” © 2006 Licensed under the Creative Commons Attribution 2.0 license
• http://www.flickr.com/photos/joanne_matt/97737986/ http://www.webcitation.org/5XIieCxq4
• Set in submit file
• Users can set priority of their own jobs
• Integers, larger numbers are higher priority
• Only impacts order between jobs for a single user on a single schedd
• A tool for users to sort their own jobs
Job Priority
61
• Determines allocation of machines to waiting users
• View with condor_userprio
• Inversely related to machines allocated (lower is better priority)
– A user with priority of 10 will be able to claim twice as many machines as a user with priority 20
User Priority
62
• Effective User Priority is determined by multiplying two components – Real Priority – Priority Factor
User Priority
63
• Based on actual usage • Defaults to 0.5 • Approaches actual number of machines used over time
– Configuration setting PRIORITY_HALFLIFE
Real Priority
64
• Assigned by administrator – Set with condor_userprio
• Defaults to 1000 (DEFAULT_PRIO_FACTOR)
Priority Factor
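The relationship between real priority, priority factor, and machine share can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the negotiator's actual algorithm: it assumes every user has unlimited demand and that shares are exactly proportional to the inverse of the effective priority. The user names and pool size are made up.

```python
def fair_shares(users, total_machines):
    """Split machines in inverse proportion to effective user priority.

    users maps a user name to (real_priority, priority_factor).
    Effective priority = real priority * priority factor; lower is better.
    """
    effective = {name: real * factor for name, (real, factor) in users.items()}
    weights = {name: 1.0 / prio for name, prio in effective.items()}
    total_weight = sum(weights.values())
    return {name: total_machines * w / total_weight for name, w in weights.items()}

# A new user starts at real priority 0.5; the default factor is 1000.
users = {
    "newcomer": (0.5, 1000),   # effective priority 500
    "heavy": (1.0, 1000),      # effective priority 1000
}
for name, machines in fair_shares(users, 30).items():
    print(name, round(machines, 2))
```

With these numbers the user whose effective priority is half as large ends up with twice as many machines, matching the "10 versus 20" example on the User Priority slide.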
65
Negotiator Policy Expressions
• PREEMPTION_REQUIREMENTS and PREEMPTION_RANK
• Evaluated when condor_negotiator considers replacing a lower priority job with a higher priority job
• Completely unrelated to the PREEMPT expression
66
• If false will not preempt machine – Typically used to avoid pool thrashing – Typically use:
• RemoteUserPrio – Priority of user of currently running job (higher is worse)
• SubmittorPrio – Priority of user of higher priority idle job (higher is worse)
• PREEMPTION_REQUIREMENTS=FALSE
PREEMPTION_REQUIREMENTS
67
• Only replace jobs running for at least one hour and 20% lower priority
StateTimer = \
    (CurrentTime - EnteredCurrentState)
HOUR = (60*60)
PREEMPTION_REQUIREMENTS = \
    $(StateTimer) > (1 * $(HOUR)) \
    && RemoteUserPrio > SubmittorPrio * 1.2
PREEMPTION_REQUIREMENTS
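The preemption condition can be read as a plain boolean function. The Python sketch below mirrors it; the function and argument names are illustrative, while the one-hour and 1.2 thresholds come from the slide.

```python
HOUR = 60 * 60

def may_preempt(state_timer, remote_user_prio, submittor_prio):
    """True when the running job may be replaced.

    state_timer: seconds the slot has been in its current state.
    remote_user_prio: priority of the running job's user (higher is worse).
    submittor_prio: priority of the idle job's user (higher is worse).
    """
    ran_long_enough = state_timer > 1 * HOUR
    clearly_worse = remote_user_prio > submittor_prio * 1.2
    return ran_long_enough and clearly_worse

print(may_preempt(2 * HOUR, 130, 100))  # True: ran 2 h, 30% worse priority
print(may_preempt(1800, 130, 100))      # False: only ran 30 minutes
print(may_preempt(2 * HOUR, 110, 100))  # False: only 10% worse priority
```

Both conditions damp pool thrashing: the time floor protects short-lived work, and the 20% margin stops two users with similar priorities from repeatedly evicting each other.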
68
• Picks which already claimed machine to reclaim
• Strongly prefer preempting jobs with a large (bad) priority and a small image size
PREEMPTION_RANK = \
    (RemoteUserPrio * 1000000) \
    - ImageSize
PREEMPTION_RANK
69
• Manage priorities across groups of users and jobs
• Can guarantee minimum numbers of computers for groups (quotas)
• Supports hierarchies
• Anyone can join any group
Accounting Groups
70
Condor Tools
71
• Find current configuration values
% condor_config_val MASTER_LOG
/var/condor/logs/MasterLog
% cd `condor_config_val LOG`
condor_config_val
72
• Can identify source
% condor_config_val -v CONDOR_HOST
CONDOR_HOST: condor.cs.wisc.edu
Defined in ‘/etc/condor_config.hosts’, line 6
condor_config_val -v
73
• What configuration files are being used?
% condor_config_val -config
Config source:
    /var/home/condor/condor_config
Local config sources:
    /unsup/condor/etc/condor_config.hosts
    /unsup/condor/etc/condor_config.global
    /unsup/condor/etc/condor_config.policy
    /unsup/condor-test/etc/hosts/puffin.local
condor_config_val -config
74
• Retrieve logs remotely
condor_fetchlog beak.cs.wisc.edu Master
condor_fetchlog
75
• condor_status • condor_q
Checking the current status
76
• Queries the collector for information about daemons in your pool
• Defaults to finding condor_startds
• condor_status -schedd summarizes all job queues
• condor_status -master returns list of all condor_masters
Querying daemons condor_status
77
• -long displays the full ClassAd
• Optionally specify a machine name to limit results to a single host
condor_status -l node4.cs.wisc.edu
• Do not use in scripts/programs
condor_status
78
• Only return ClassAds that match an expression you specify
• Show me idle slots with 1GB or more memory
– condor_status -constraint 'Memory >= 1024 && Activity == "Idle"'
condor_status -constraint
79
• Report only fields you request
• Census of systems in your pool:
> condor_status -af Activity OpSys Arch | sort | uniq -c
  56 Busy LINUX X86_64
  35 Idle LINUX INTEL
1515 Idle LINUX X86_64
 369 Idle WINDOWS X86_64
  31 Retiring LINUX X86_64
condor_status -autoformat
80
• Separate by tabs, commas, spaces, newlines • Label each field by name • Escape as a ClassAd value • Add headers • Several easy to parse options
condor_status -autoformat
81
condor_status -format
• Like autoformat, but with manual formatting
• Useful for writing simple reports
• Uses C printf style formats
– One field per argument
82
% condor_status -format '%-10s ' Activity -format '%-7s ' OpSys -format '%s\n' Arch | sort | uniq -c
  54 Busy       LINUX   X86_64
  35 Idle       LINUX   INTEL
1513 Idle       LINUX   X86_64
 369 Idle       WINDOWS X86_64
  31 Retiring   LINUX   X86_64
condor_status -format
83
• View the job queue
• The -long option is useful to see the entire ClassAd for a given job
• Supports -constraint, -autoformat, and -format
• Can view job queues on remote machines with the -name option
Examining Queues condor_q
84
• Why isn't this job running? (default)
• On this machine? -machine
• Why does this machine hate my job? -better-analyze:reverse
• General reports: -analyze:sum, -analyze:sum,rev
condor_q -analyze and -better-analyze
85
Log Files
• “Ready for the Winter” by Anna “bcmom” © 2005 Licensed under the Creative Commons Attribution 2.0 license
• http://www.flickr.com/photos/bcmom/59207805/ http://www.webcitation.org/5XIhRO8L8
• HTCondor maintains one log file per daemon
• Can increase verbosity of logs on a per daemon basis
– SHADOW_DEBUG, SCHEDD_DEBUG, and others
– Space separated list
HTCondor’s Log Files
87
• D_FULLDEBUG dramatically increases information logged
– Does not include other debug levels!
• D_COMMAND adds information about commands received
SHADOW_DEBUG = D_FULLDEBUG D_COMMAND
Useful Debug Levels
88
• Log files are automatically rolled over when a size limit is reached
– Only one old version is kept
– Defaults to 10 megabytes
– Rolls over quickly with D_FULLDEBUG
– MAX_DEFAULT_LOG
– Also per daemon settings
• MAX_SHADOW_LOG, MAX_SCHEDD_LOG, and others
Log Rotation
89
• Many log file entries are primarily useful to HTCondor developers
– Especially if D_FULLDEBUG is on
– Minor errors are often logged but corrected
– Take them with a grain of salt
– [email protected]
HTCondor’s Log Files
90
Debugging Jobs
• “Wanna buy a Beetle?” by “Kevin” © 2006 Licensed under the Creative Commons Attribution 2.0 license
• http://www.flickr.com/photos/kevincollins/89538633/ http://www.webcitation.org/5XIiMyhpp
• Examine the job with condor_q
– especially the very powerful -analyze and -better-analyze
Debugging Jobs: condor_q
92
• Examine the job’s user log
– Can find with:
condor_q -af UserLog 17.0
– Set with “log” in the submit file
– You can set EVENT_LOG to get a unified log for all jobs under a schedd
• Contains the life history of the job
• Often contains details on problems
Debugging Jobs: User Log
93
• Examine ShadowLog on the submit machine
– Note any machines the job tried to execute on
– There is often an “ERROR” entry that can give a good indication of what failed
Debugging Jobs: ShadowLog
94
• No ShadowLog entries? Possible problem matching the job. – Examine ScheddLog on the submit machine – Examine NegotiatorLog on the central manager
Debugging Jobs: Matching Problems
95
• ShadowLog entries suggest an error but aren’t specific? – Examine StartLog and StarterLog on the execute machine
Debugging Jobs: Remote Problems
96
• HTCondor logs will note the job ID each entry is for – Useful if multiple jobs are being processed simultaneously – grepping for the job ID will make it easy to find relevant entries
• Occasionally HTCondor doesn't know yet…
Debugging Jobs: Reading Log Files
97
• If necessary add “D_FULLDEBUG D_COMMAND” to the DAEMONNAME_DEBUG setting (for example SHADOW_DEBUG) for additional log information
• Increase MAX_DAEMONNAME_LOG if logs are rolling over too quickly
• If all else fails, email us
– [email protected]
Debugging Jobs: What Next?
98
What now?
99
• Dealing with job runtimes can be interesting
• MaxJobRetirementTime = expression
• Startd-side parameter
• Guarantees minimum job runtime
• MaxJobRetirementTime = 3600
• MaxJobRetirementTime = \
    300 + 3600 * (Owner == "BigWig")
HTCondor doesn’t have queues (really)
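The second MaxJobRetirementTime expression relies on a boolean being usable as a number. A minimal Python sketch of the same arithmetic follows (the function name is illustrative; "BigWig" is the example owner from the slide):

```python
def max_job_retirement_time(owner):
    """Seconds of guaranteed runtime: 300 for everyone, plus 3600 for BigWig."""
    # The comparison yields True (1) or False (0), so it acts as a switch
    # on the extra hour, just like the ClassAd expression on the slide.
    return 300 + 3600 * (owner == "BigWig")

print(max_job_retirement_time("BigWig"))  # 3900
print(max_job_retirement_time("intern"))  # 300
```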
100
Multicore example Machine:
• 8 cores • 8 Gigabytes memory • 2 disks
The old way (Still the default)
• $ condor_status
Name               OpSys  Arch    State      Activity  LoadAv  Mem   ActvtyTime
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.110   1024  0+00:45:04
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000   1024  0+00:45:05
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000   1024  0+00:45:06
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000   1024  0+00:45:07
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000   1024  0+00:45:08
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000   1024  0+00:45:09
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000   1024  0+00:45:10
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000   1024  0+00:45:03

              Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill
X86_64/LINUX      8      0        0          8        0           0         0
Total             8      0        0          8        0           0         0
Problem: Job Image Size
Simple solution: Static non-uniform memory
# condor_config
NUM_SLOTS_TYPE_1 = 1
NUM_SLOTS_TYPE_2 = 7
SLOT_TYPE_1 = mem=4096
SLOT_TYPE_2 = mem=auto
Result is
$ condor_status
Name               OpSys  Arch    State      Activity  LoadAv  Mem
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.150   4096
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000   585
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000   585
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000   585
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000   585
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000   585
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000   585
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000   585
Better, but not good..
[Figures: an 8 Gb machine partitioned into 5 slots (a 4 Gb slot and 1 Gb slots), shown with a changing mix of 4 Gb and 1 Gb jobs. Final panel: 7 Gb free, but a 4 Gb job sits idle.]
Partitionable slots: The big idea
• One “partitionable” slot
• From which “dynamic” slots are made
• When a dynamic slot exits, it is merged back into the “partitionable” slot
• Split happens at claim time
3 types of slots
• Static (e.g. the usual kind)
• Partitionable (e.g. leftovers)
• Dynamic (usable ones)
– Dynamically created
– But once created, static
8 Gb Partitionable slot 4Gb
8 Gb Partitionable slot 5Gb
How to configure
NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = cpus=100%
SLOT_TYPE_1_PARTITIONABLE = true
Looks like
$ condor_status
Name     OpSys  Arch    State      Activity  LoadAv  Mem   ActvtyTime
slot1@c  LINUX  X86_64  Unclaimed  Idle      0.110   8192

              Total  Owner  Claimed  Unclaimed  Matched
X86_64/LINUX      1      0        0          1        0
Total             1      0        0          1        0
When running
$ condor_status
Name       OpSys  Arch    State      Activity  LoadAv  Mem   ActvtyTime
slot1@c    LINUX  X86_64  Unclaimed  Idle      0.110   4096
slot1_1@c  LINUX  X86_64  Claimed    Busy      0.000   1024
slot1_2@c  LINUX  X86_64  Claimed    Busy      0.000   2048
slot1_3@c  LINUX  X86_64  Claimed    Busy      0.000   1024
Fragmentation: still not perfect
Name       OpSys  Arch    State      Activity  LoadAv  Mem
slot1@c    LINUX  X86_64  Unclaimed  Idle      0.110   4096
slot1_1@c  LINUX  X86_64  Claimed    Busy      0.000   2048
slot1_2@c  LINUX  X86_64  Claimed    Busy      0.000   1024
slot1_3@c  LINUX  X86_64  Claimed    Busy      0.000   1024
Now I submit a job that needs 8G – what happens?
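The fragmentation question can be explored with a toy model. The Python sketch below is a deliberate simplification that tracks memory only; the `PartitionableSlot` class is hypothetical and is not how the startd is implemented. It shows why the 8 GB job stays idle until the dynamic slots are drained and merged back.

```python
class PartitionableSlot:
    """Toy model of a partitionable slot that carves out dynamic slots by memory."""

    def __init__(self, memory_mb):
        self.free_mb = memory_mb
        self.dynamic_slots = []  # memory size of each dynamic slot in use

    def claim(self, request_mb):
        """Split off a dynamic slot at claim time; return False if it does not fit."""
        if request_mb > self.free_mb:
            return False
        self.free_mb -= request_mb
        self.dynamic_slots.append(request_mb)
        return True

    def release(self, request_mb):
        """A dynamic slot exits and its memory merges back into the parent."""
        self.dynamic_slots.remove(request_mb)
        self.free_mb += request_mb

slot = PartitionableSlot(8192)
for request in (2048, 1024, 1024):
    slot.claim(request)

print(slot.free_mb)      # 4096 MB left in the partitionable slot
print(slot.claim(8192))  # False: the 8 GB job cannot match

# Only after every dynamic slot has drained does the whole machine fit again.
for request in (2048, 1024, 1024):
    slot.release(request)
print(slot.claim(8192))  # True
```

Small jobs keep arriving in practice, so the dynamic slots rarely all finish at the same moment on their own; something has to stop new claims and let the machine drain.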
Solution: New Daemon
• condor_defrag
• One daemon defrags whole pool
• Central manager good place to run
• Scan pool, try to fully defrag some startds
• Only looks at partitionable machines
• Admin picks some % of pool that can be “whole”
Oh, we got knobs…
DEFRAG_DRAINING_MACHINES_PER_HOUR default is 0
DEFRAG_MAX_WHOLE_MACHINES default is -1
DEFRAG_SCHEDULE • graceful (obey MaxJobRetirementTime, default)
• quick (obey MachineMaxVacateTime)
• fast (hard-killed immediately)
Defrag vs. Preemption
• Defrag can be general purpose
– Looks only at startds, not at demand
– Can also preempt non-partitionable slots
• (if so configured)
• Negotiator preemption looks at 2 jobs
• Between schedd and remote central manager
• On schedd, set – FLOCK_TO = NAME_OF_REMOTE_MACHINE
• On remote CM set – FLOCK_FROM = NAME_OF_REMOTE_SCHEDD
Federating pools with flocking
120
• Where to go for more info – Talk to us! – [email protected] – Come to HTCondorWeek! – http://htcondorproject.org – 800+ page Condor manual
Thank you!
121
HTCONDOR USER TUTORIAL
Greg Thain Center for High Throughput Computing
University of Wisconsin – Madison [email protected]
© 2015 Internet2