
HTCONDOR ADMIN TUTORIAL

Greg Thain Center for High Throughput Computing

University of Wisconsin – Madison [email protected]

© 2015 Internet2

Overview

•  What makes HTCondor different?
•  Nitty gritty of HTCondor configuration

HTCondor differences and implications

•  Goal: Run jobs on as many machines as possible*
•  -> implies heterogeneity:
   –  Generational: different speeds, memory
   –  OS distribution and type (Linux/Windows)
   –  HTCondor versions
   –  Installed software base / installed datasets
   –  Geography: building, campus
   –  Temporal

HTCondor differences and implications

•  Goal: Run jobs on as many machines as possible*
•  -> implies scalability and reliability:
   –  Worker machines can come and go, and there can be a lot of them
   –  Horizontal scalability: many independent submit nodes
   –  No single point of failure
   –  HTCondor job is “like money in the bank”

Requires Networking

•  Not just fibers and packets and routers
•  Community building:
   –  Breaking out of silos
•  Asymmetric relationships

HTCondor history

•  Cycle scavenging on desktops
   –  Implies temporal disruptions
   –  Delegating ownership temporarily
•  Not just desktop scavenging anymore
   –  But the same model applies at different scales

The Lehigh Story

•  Professor needed more compute power
•  Access to a small cluster not sufficient
•  The power of one lunch…
•  Computing demand outstrips capacity

Condominium cluster

•  Several groups pool money for big buy:
   –  Chemistry, physics, biology each buy 500 cores
   –  When there is sufficient demand, each gets 500 cores
   –  When there is less demand, a group can get more than 500
•  Same model as delegated ownership

HPC Backfill

•  HPC / MPI implies low utilization
   –  HTCondor can backfill HPC cluster
   –  Same model as delegated ownership
•  HPC offload
   –  Many HPC workloads are really HTC, clogging up
   –  Make HPC admins happy

Cloud annexation

•  EC2 “spot instances”
   –  Delegated ownership yet again
•  Can build pool in cloud (Cycle Computing)
•  Or can expand into the cloud

OSG and Glidein

•  Running everywhere -> running w/o root
•  HTCondor worker nodes can be jobs
•  This is how OSG works

Can Mix and Match

•  HTCondor at Wisconsin:
   –  Condo clusters
   –  Desktops / Kiosks
   –  HPC backfill
   –  OSG glidein
•  Any user can use all of these seamlessly

Implications of delegated ownership

•  Separation of needs of owners vs. users
   –  Separate Unix processes
   –  As admin, decide where policy really lies
•  When conflict, who wins?

Speeds and Feeds, briefly

•  Thousands of machines, thousands of jobs
   –  Out of the box
•  100,000 cores, 100,000 jobs
   –  With careful
•  Dozens of submit points

Networks and Notworks

•  No InfiniBand needed
•  Can work around firewalls
   –  But you need to tell us where they are
•  All transfers are queued
•  Great way to stress-test NAT boxes
•  Experimental support for SDNs

The nitty gritty…

•  HTCondor Daemons & Job Startup
•  Configuration Files
•  Security, briefly
•  Policy Expressions
   –  Startd (Machine)
   –  Negotiator
•  Priorities
•  Useful Tools
•  Log Files
•  Debugging Jobs

Job Startup

[Figure: daemons involved in job startup. Submit Machine: submit, schedd, shadow. Execute Machine: startd, starter, job. Central Manager: collector, negotiator. A master runs on each machine.]

Configuration File

•  CONDOR_CONFIG environment variable, /etc/condor/condor_config, ~condor/condor_config
•  All settings can be in this one file
•  Might want to share between all machines (NFS, automated copies, Wallaby, etc.)

Other Configuration Files

•  LOCAL_CONFIG_FILE
   –  Comma separated, processed in order
      LOCAL_CONFIG_FILE = \
          /var/condor/config.local,\
          /var/condor/policy.local,\
          /shared/condor/config.$(HOSTNAME),\
          /shared/condor/config.$(OPSYS)
•  LOCAL_CONFIG_DIR
      LOCAL_CONFIG_DIR = \
          /var/condor/config.d/,\
          /var/condor/$(OPSYS).d/

Configuration File Syntax

    # I'm a comment!
    CREATE_CORE_FILES=TRUE
    MAX_JOBS_RUNNING = 50
    # HTCondor ignores case:
    log=/var/log/condor
    # Long entries:
    collector_host=condor.cs.wisc.edu,\
        secondary.cs.wisc.edu

Configuration File Macros

•  You reference other macros (settings) with:
   –  A = $(B)
   –  SCHEDD = $(SBIN)/condor_schedd
•  Can create additional macros for organizational purposes

•  Can append to macros:
      A=abc
      A=$(A),def
•  Don’t let macros recursively define each other!
      A=$(B)
      B=$(A)

•  Later macros in a file overwrite earlier ones
   –  B will evaluate to 2:
      A=1
      B=$(A)
      A=2
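The "B will evaluate to 2" behavior follows from two things: the last definition of a name wins, and references are resolved when a value is looked up, not when the line is read. A minimal Python sketch of that idea, not HTCondor's actual parser: the `parse` and `lookup` helpers are hypothetical names, and the sketch does not handle self-references such as A=$(A),def.

```python
import re

def parse(config_text):
    """Read NAME=value lines; a later definition replaces an earlier one."""
    macros = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        # Names are stored upper-cased because the config ignores case.
        macros[name.strip().upper()] = value.strip()
    return macros

def lookup(name, macros):
    """Resolve $(NAME) references at lookup time (lazy expansion)."""
    value = macros[name.upper()]
    return re.sub(r"\$\((\w+)\)", lambda m: lookup(m.group(1), macros), value)

macros = parse("A=1\nB=$(A)\nA=2")
print(lookup("B", macros))
```

Because B stores the text $(A) and not the value 1, redefining A later changes what B evaluates to.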

Macros and Expressions Gotcha

•  These are simple replacement macros
•  Put parentheses around expressions
      TEN=5+5
      HUNDRED=$(TEN)*$(TEN)
•  HUNDRED becomes 5+5*5+5 or 35!
      TEN=(5+5)
      HUNDRED=($(TEN)*$(TEN))
•  ((5+5)*(5+5)) = 100
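The gotcha is purely textual, so it can be reproduced outside HTCondor. A small Python sketch, assuming only plain $(NAME) substitution; `expand` is a hypothetical helper, and `eval` is used only to do the arithmetic on these fixed strings.

```python
import re

def expand(name, macros):
    """Textually expand $(NAME) references, like a simple replacement macro."""
    pattern = re.compile(r"\$\((\w+)\)")
    text = macros[name]
    # Keep substituting until no references remain (assumes no recursion).
    while pattern.search(text):
        text = pattern.sub(lambda m: macros[m.group(1)], text)
    return text

unsafe = {"TEN": "5+5", "HUNDRED": "$(TEN)*$(TEN)"}
safe = {"TEN": "(5+5)", "HUNDRED": "($(TEN)*$(TEN))"}

for macros in (unsafe, safe):
    text = expand("HUNDRED", macros)
    print(text, "=", eval(text))
```

The first definition expands to 5+5*5+5, where multiplication binds tighter than addition; the parenthesized version keeps each TEN intact.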

Security, briefly

•  “Padlock” by Peter Ford © 2005

•  Licensed under the Creative Commons Attribution 2.0 license

•  http://www.flickr.com/photos/peterf/72583027/

•  http://www.webcitation.org/5XIiBcsUg

HTCondor Security

•  Strong authentication of users and daemons
•  Encryption over the network
•  Integrity checking over the network

“locks-masterlocks.jpg” by Brian De Smet, © 2005. Used with permission. http://www.fief.org/sysadmin/blosxom.cgi/2005/07/21#locks

Minimal Security Settings

•  You must set ALLOW_WRITE, or nothing works
•  Simplest setting:
      ALLOW_WRITE=*
   –  Extremely insecure!
•  A bit better:
      ALLOW_WRITE= \
          *.cs.wisc.edu

“Bank Security Guard” by “Brad & Sabrina” © 2006. Licensed under the Creative Commons Attribution 2.0 license. http://www.flickr.com/photos/madaboutshanghai/184665954/ http://www.webcitation.org/5XIhUAfuY

More on Security

•  Chapter 3.6, “Security,” in the HTCondor Manual
•  [email protected]
•  Can authenticate with
   –  IP
   –  SSL
   –  Shared Password
   –  Windows
   –  Kerberos

“Zach Miller” by Alan De Smet

Policy

•  “Don't even think about it” by Kat “tyger_lyllie” © 2005

•  Licensed under the Creative Commons Attribution 2.0 license

•  http://www.flickr.com/photos/tyger_lyllie/59207292/

•  http://www.webcitation.org/5XIh5mYGS

• Who gets to run jobs, when?

Policy

30

Policy Expressions

•  Specified in condor_config
   –  Ends up in the slot ClassAd
•  Policy evaluates both a slot ClassAd and a job ClassAd together
   –  Policy can reference items in either ClassAd (see manual for list)
•  Can reference condor_config macros: $(MACRONAME)

Slots vs Machines

•  Machine
   –  An individual computer, managed by one startd
•  Slot
   –  A place to run a job, managed by one starter
   –  A machine may have many slots
   –  Partitionable slots create more slots on the fly
•  The startd advertises each slot
   –  The ClassAd is a “Machine” ad for historical reasons

Slot Policy Expressions

•  START
•  RANK
•  SUSPEND
•  CONTINUE
•  PREEMPT
•  KILL

START

•  START is the primary policy
•  When FALSE the slot enters the Owner state and will not run jobs
•  Acts as the Requirements expression for the slot; the job must satisfy START
   –  Can reference job ClassAd values including Owner and ImageSize

RANK

•  Indicates which jobs a slot prefers
   –  Jobs can also specify a rank
•  Floating point number
   –  Larger numbers are higher ranked
   –  Typically evaluate attributes in the Job ClassAd
   –  Typically use + instead of &&

RANK

•  Often used to give priority to the owner of a particular group of machines
•  Claimed slots still advertise, looking for a higher ranked job to preempt the current job
   –  RANK causes preemption!

SUSPEND and CONTINUE

•  When SUSPEND becomes true, the job is suspended
•  When CONTINUE becomes true, a suspended job is released

“DSC03753” by Eva Schiffer © 2008. Used with permission. http://www.digitalchangeling.com/pictures/ourCats2008/january2008/DSC03753.html

PREEMPT and KILL

•  When PREEMPT becomes true, the job will be politely shut down
   –  Vanilla universe jobs get SIGTERM
      •  Or user requested signal
   –  Standard universe jobs checkpoint
•  When KILL becomes true, the job is SIGKILLed
   –  Checkpointing is aborted if started

Minimal / Default Settings

•  Always runs jobs
      START = True
      RANK = 0
      SUSPEND = False
      CONTINUE = True
      PREEMPT = False
      KILL = False

Policy Configuration

•  I am adding nodes to the Cluster… but the Chemistry Department has priority on these nodes

40

New Settings for the Chemistry nodes

•  Prefer Chemistry jobs
      START = True
      RANK = Department == "Chemistry"
      SUSPEND = False
      CONTINUE = True
      PREEMPT = False
      KILL = False

Submit file with Custom Attribute

•  Prefix an entry with “+” to add to job ClassAd
      Executable = charm-run
      Universe = standard
      +Department = "Chemistry"
      queue

What if “Department” not specified?

      START = True
      RANK = Department =?= "Chemistry"
      SUSPEND = False
      CONTINUE = True
      PREEMPT = False
      KILL = False
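The switch from == to =?= matters when a job does not define Department at all. A rough Python model of the difference, assuming only the undefined-handling part of ClassAd semantics: `UNDEFINED`, `eq`, and `meta_eq` are illustrative names, and string case and type rules are deliberately ignored.

```python
UNDEFINED = object()  # stand-in for the ClassAd UNDEFINED value

def eq(a, b):
    """Model of ==: the result is UNDEFINED if either side is UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a == b

def meta_eq(a, b):
    """Model of =?=: always yields True or False, never UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return a is b
    return a == b

chem_job = {"Owner": "alice", "Department": "Chemistry"}
plain_job = {"Owner": "bob"}  # no Department attribute

for job in (chem_job, plain_job):
    dept = job.get("Department", UNDEFINED)
    print(job["Owner"], eq(dept, "Chemistry") is UNDEFINED, meta_eq(dept, "Chemistry"))
```

With =?=, a job without Department simply ranks as False instead of leaving RANK undefined.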

More Complex RANK

•  Give the machine’s owners (adesmet and roy) highest priority, followed by the Chemistry department, followed by the Physics department, followed by everyone else.
   –  Can use the automatic Owner attribute in the job ClassAd to identify adesmet and roy

More Complex RANK

      IsOwner = (Owner == "adesmet" \
          || Owner == "roy")
      IsChem = (Department =?= "Chemistry")
      IsPhys = (Department =?= "Physics")
      RANK = $(IsOwner)*20 + $(IsChem)*10 \
          + $(IsPhys)
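To see how the weights order jobs, the same arithmetic can be written out in Python. This is only a sketch of the RANK expression above, with job ads modeled as plain dictionaries; the sample jobs are made up.

```python
def rank(job):
    """Mirror of RANK = IsOwner*20 + IsChem*10 + IsPhys (booleans count as 0 or 1)."""
    is_owner = job.get("Owner") in ("adesmet", "roy")
    is_chem = job.get("Department") == "Chemistry"
    is_phys = job.get("Department") == "Physics"
    return is_owner * 20 + is_chem * 10 + is_phys

jobs = [
    {"Owner": "carol"},
    {"Owner": "dave", "Department": "Physics"},
    {"Owner": "erin", "Department": "Chemistry"},
    {"Owner": "roy", "Department": "Physics"},
]

# Highest rank first: owners, then Chemistry, then Physics, then everyone else.
for job in sorted(jobs, key=rank, reverse=True):
    print(rank(job), job)
```

Because the weights are summed, an owner always outranks a non-owner regardless of department, which is why + is used instead of &&.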

Policy Configuration

•  I have an unhealthy fixation with PBS so… kill jobs after 12 hours, except Physics jobs get 24 hours.

“I R BIZNESS CAT” by “VMOS” © 2007. Licensed under the Creative Commons Attribution 2.0 license. http://www.flickr.com/photos/vmos/2078227291/ http://www.webcitation.org/5XIff1deZ

Useful Attributes

•  CurrentTime
   –  Current time, in Unix epoch time (seconds since midnight Jan 1, 1970)
•  EnteredCurrentActivity
   –  When did HTCondor enter the current activity, in Unix epoch time

Configuration

      ActivityTimer = \
          (CurrentTime - EnteredCurrentActivity)
      HOUR = (60*60)
      HALFDAY = ($(HOUR)*12)
      FULLDAY = ($(HOUR)*24)
      PREEMPT = \
          ($(IsPhys) && ($(ActivityTimer) > $(FULLDAY))) \
          || \
          (!$(IsPhys) && ($(ActivityTimer) > $(HALFDAY)))
      KILL = $(PREEMPT)
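The PREEMPT expression above is a two-branch time limit. A minimal Python sketch of the same decision, assuming the activity timer is already available in seconds; `preempt` is an illustrative function, not an HTCondor API.

```python
HOUR = 60 * 60
HALFDAY = HOUR * 12
FULLDAY = HOUR * 24

def preempt(is_phys, activity_seconds):
    """Physics jobs get 24 hours in the current activity, everyone else gets 12."""
    limit = FULLDAY if is_phys else HALFDAY
    return activity_seconds > limit

for is_phys, hours in [(False, 11), (False, 13), (True, 13), (True, 25)]:
    print(is_phys, hours, preempt(is_phys, hours * HOUR))
```

Since KILL is set to the same expression, a job that crosses its limit is hard-killed as soon as preemption begins.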

Policy Configuration

•  The cluster is okay, but... HTCondor can only use the desktops when they would otherwise be idle

“I R BIZNESS CAT” by “VMOS” © 2007. Licensed under the Creative Commons Attribution 2.0 license. http://www.flickr.com/photos/vmos/2078227291/ http://www.webcitation.org/5XIff1deZ

Defining Idle

•  One possible definition:
   –  No keyboard or mouse activity for 5 minutes
   –  Load average below 0.3

Desktops should

•  START jobs when the machine becomes idle
•  SUSPEND jobs as soon as activity is detected
•  PREEMPT jobs if the activity continues for 5 minutes or more
•  KILL jobs if they take more than 5 minutes to preempt

Useful Attributes

•  LoadAvg
   –  Current load average
•  CondorLoadAvg
   –  Current load average generated by HTCondor
•  KeyboardIdle
   –  Seconds since last keyboard or mouse activity

Macros in Configuration Files

      NonCondorLoadAvg = (LoadAvg - CondorLoadAvg)
      BgndLoad = 0.3
      CPU_Busy = ($(NonCondorLoadAvg) >= $(BgndLoad))
      CPU_Idle = (!$(CPU_Busy))
      KeyboardBusy = (KeyboardIdle < 10)
      KeyboardIsIdle = (KeyboardIdle > 300)
      MachineBusy = ($(CPU_Busy) || $(KeyboardBusy))

Desktop Machine Policy

      START = $(CPU_Idle) && $(KeyboardIsIdle)
      SUSPEND = $(MachineBusy)
      CONTINUE = $(CPU_Idle) && KeyboardIdle > 120
      PREEMPT = (Activity == "Suspended") && \
          $(ActivityTimer) > 300
      KILL = $(ActivityTimer) > 300
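It can help to try the desktop policy against a few sample readings before deploying it. The following Python sketch mirrors the START, SUSPEND, and CONTINUE expressions using the thresholds from the macros; `desktop_policy` and the sample readings are illustrative, and PREEMPT and KILL are left out because they depend on slot activity state.

```python
BGND_LOAD = 0.3  # same threshold as BgndLoad in the macros

def desktop_policy(load_avg, condor_load_avg, keyboard_idle):
    """Evaluate START, SUSPEND and CONTINUE for one set of machine readings."""
    non_condor_load = load_avg - condor_load_avg
    cpu_busy = non_condor_load >= BGND_LOAD
    cpu_idle = not cpu_busy
    keyboard_busy = keyboard_idle < 10
    keyboard_is_idle = keyboard_idle > 300
    machine_busy = cpu_busy or keyboard_busy
    return {
        "START": cpu_idle and keyboard_is_idle,
        "SUSPEND": machine_busy,
        "CONTINUE": cpu_idle and keyboard_idle > 120,
    }

print(desktop_policy(load_avg=1.0, condor_load_avg=0.95, keyboard_idle=600))
print(desktop_policy(load_avg=0.1, condor_load_avg=0.0, keyboard_idle=5))
print(desktop_policy(load_avg=0.1, condor_load_avg=0.0, keyboard_idle=200))
```

The third reading shows the gap between the thresholds: the machine is quiet enough to resume a suspended job (over 120 seconds) but not yet quiet enough to start a new one (over 300 seconds).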

Slot States and Slot Activities

[Figures: slot state and slot activity diagrams. See Section 3.5, “Policy Configuration for the condor_startd,” in the HTCondor manual.]

Custom Slot Attributes

•  Can add attributes to a slot’s ClassAd, typically done in the local configuration file
      INSTRUCTIONAL=TRUE
      NETWORK_SPEED=1000
      STARTD_EXPRS=INSTRUCTIONAL, NETWORK_SPEED

Custom Slot Attributes

•  Jobs can now specify Rank and Requirements using new attributes:
      Requirements = INSTRUCTIONAL =!= TRUE
      Rank = NETWORK_SPEED
•  Dynamic attributes are available; see STARTD_CRON_* in the manual

Further Machine Policy Information

•  For further information, see section 3.5 “Policy Configuration for the condor_startd” in the HTCondor manual
•  htcondor-users mailing list http://research.cs.wisc.edu/htcondor/mail-lists/
•  [email protected]

Priorities

•  “IMG_2476” by “Joanne and Matt” © 2006 Licensed under the Creative Commons Attribution 2.0 license

•  http://www.flickr.com/photos/joanne_matt/97737986/ http://www.webcitation.org/5XIieCxq4

Job Priority

•  Set in submit file
•  Users can set priority of their own jobs
•  Integers, larger numbers are higher priority
•  Only impacts order between jobs for a single user on a single schedd
•  A tool for users to sort their own jobs

User Priority

•  Determines allocation of machines to waiting users
•  View with condor_userprio
•  Inversely related to machines allocated (lower is better priority)
   –  A user with priority of 10 will be able to claim twice as many machines as a user with priority 20

User Priority

•  Effective User Priority is determined by multiplying two components
   –  Real Priority
   –  Priority Factor
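The two statements above (effective priority is a product, and allocation is inversely related to it) can be checked with a few lines of arithmetic. A Python sketch under the simplifying assumption that every user has unlimited demand; `fair_shares` is a hypothetical helper, not how the negotiator is implemented.

```python
def effective_priority(real_priority, priority_factor):
    """Effective user priority = real priority * priority factor."""
    return real_priority * priority_factor

def fair_shares(priorities, total_machines):
    """Split machines in inverse proportion to each user's effective priority."""
    weights = {user: 1.0 / prio for user, prio in priorities.items()}
    total_weight = sum(weights.values())
    return {user: total_machines * w / total_weight for user, w in weights.items()}

shares = fair_shares({"alice": 10.0, "bob": 20.0}, total_machines=300)
print(shares)
print(effective_priority(0.5, 1000))
```

With priorities 10 and 20, the 300 machines split roughly 200 to 100, matching the "twice as many machines" rule on the slide.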

Real Priority

•  Based on actual usage
•  Defaults to 0.5
•  Approaches actual number of machines used over time
   –  Configuration setting PRIORITY_HALFLIFE
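One way to picture "approaches actual number of machines used over time" is exponential smoothing with a half-life. The Python sketch below is a simplified model and an assumption on my part, not a copy of the negotiator's code; the one-day default for `halflife_seconds` is likewise assumed, so check PRIORITY_HALFLIFE in your own configuration.

```python
def update_real_priority(previous, machines_in_use, elapsed_seconds,
                         halflife_seconds=86400):
    """Move the real priority toward current usage; the gap halves every half-life."""
    beta = 0.5 ** (elapsed_seconds / halflife_seconds)
    return beta * previous + (1 - beta) * machines_in_use

# A new user starts at 0.5 and then holds 100 machines continuously.
priority = 0.5
for day in range(1, 6):
    priority = update_real_priority(priority, 100, elapsed_seconds=86400)
    print(day, round(priority, 2))
```

In this model a shorter half-life makes priorities track recent usage more closely, while a longer one remembers past usage for longer.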

Priority Factor

•  Assigned by administrator
   –  Set with condor_userprio
•  Defaults to 1000 (DEFAULT_PRIO_FACTOR)

Negotiator Policy Expressions

•  PREEMPTION_REQUIREMENTS and PREEMPTION_RANK

•  Evaluated when condor_negotiator considers replacing a lower priority job with a higher priority job

•  Completely unrelated to the PREEMPT expression

66

PREEMPTION_REQUIREMENTS

•  If false will not preempt machine
   –  Typically used to avoid pool thrashing
   –  Typically use:
      •  RemoteUserPrio – Priority of user of currently running job (higher is worse)
      •  SubmittorPrio – Priority of user of higher priority idle job (higher is worse)
•  PREEMPTION_REQUIREMENTS=FALSE

PREEMPTION_REQUIREMENTS

•  Only replace jobs running for at least one hour and 20% lower priority
      StateTimer = \
          (CurrentTime - EnteredCurrentState)
      HOUR = (60*60)
      PREEMPTION_REQUIREMENTS = \
          $(StateTimer) > (1 * $(HOUR)) \
          && RemoteUserPrio > SubmittorPrio * 1.2
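The expression above combines a time guard with a priority margin. A Python sketch of the same test, with the three inputs passed in as plain numbers; `may_preempt` is an illustrative name.

```python
HOUR = 60 * 60

def may_preempt(state_seconds, remote_user_prio, submittor_prio):
    """True only if the running job has had over an hour and its user's priority
    value is more than 20% worse (numerically larger) than the waiting user's."""
    return state_seconds > 1 * HOUR and remote_user_prio > submittor_prio * 1.2

print(may_preempt(2 * HOUR, remote_user_prio=30.0, submittor_prio=20.0))
print(may_preempt(HOUR // 2, remote_user_prio=30.0, submittor_prio=20.0))
print(may_preempt(2 * HOUR, remote_user_prio=22.0, submittor_prio=20.0))
```

Both guards reduce thrashing: the time guard protects young jobs, and the 20% margin prevents two users with nearly equal priorities from repeatedly evicting each other.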

PREEMPTION_RANK

•  Picks which already claimed machine to reclaim
•  Strongly prefer preempting jobs with a large (bad) priority and a small image size
      PREEMPTION_RANK = \
          (RemoteUserPrio * 1000000)\
          - ImageSize
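The large multiplier makes user priority dominate, with image size acting as a tie-breaker. A Python sketch of how such a rank would choose among candidates; the slot names and numbers are invented for illustration.

```python
def preemption_rank(remote_user_prio, image_size):
    """Mirror of PREEMPTION_RANK = (RemoteUserPrio * 1000000) - ImageSize."""
    return remote_user_prio * 1000000 - image_size

# Hypothetical claimed slots that all passed PREEMPTION_REQUIREMENTS.
candidates = [
    {"name": "slot1", "RemoteUserPrio": 5.0, "ImageSize": 100000},
    {"name": "slot2", "RemoteUserPrio": 50.0, "ImageSize": 900000},
    {"name": "slot3", "RemoteUserPrio": 50.0, "ImageSize": 200000},
]

def pick_victim(slots):
    """Choose the slot with the highest preemption rank."""
    return max(slots, key=lambda s: preemption_rank(s["RemoteUserPrio"], s["ImageSize"]))

print(pick_victim(candidates)["name"])
```

Here slot2 and slot3 tie on user priority, so the smaller image size in slot3 decides it.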

Accounting Groups

•  Manage priorities across groups of users and jobs
•  Can guarantee minimum numbers of computers for groups (quotas)
•  Supports hierarchies
•  Anyone can join any group

Condor Tools

71

condor_config_val

•  Find current configuration values
      % condor_config_val MASTER_LOG
      /var/condor/logs/MasterLog
      % cd `condor_config_val LOG`

condor_config_val -v

•  Can identify source
      % condor_config_val -v CONDOR_HOST
      CONDOR_HOST: condor.cs.wisc.edu
      Defined in '/etc/condor_config.hosts', line 6

condor_config_val -config

•  What configuration files are being used?
      % condor_config_val -config
      Config source:
          /var/home/condor/condor_config
      Local config sources:
          /unsup/condor/etc/condor_config.hosts
          /unsup/condor/etc/condor_config.global
          /unsup/condor/etc/condor_config.policy
          /unsup/condor-test/etc/hosts/puffin.local

condor_fetchlog

•  Retrieve logs remotely
      condor_fetchlog beak.cs.wisc.edu Master

•  condor_status •  condor_q

Checking the current status

76

Querying daemons: condor_status

•  Queries the collector for information about daemons in your pool
•  Defaults to finding condor_startds
•  condor_status -schedd summarizes all job queues
•  condor_status -master returns list of all condor_masters

condor_status

•  -long displays the full ClassAd
•  Optionally specify a machine name to limit results to a single host
      condor_status -l node4.cs.wisc.edu
•  Do not use in scripts/programs

condor_status -constraint

•  Only return ClassAds that match an expression you specify
•  Show me idle slots with 1GB or more memory
   –  condor_status -constraint 'Memory >= 1024 && Activity == "Idle"'

condor_status -autoformat

•  Report only fields you request
•  Census of systems in your pool:
      > condor_status -af Activity OpSys Arch | sort | uniq -c
           56 Busy LINUX X86_64
           35 Idle LINUX INTEL
         1515 Idle LINUX X86_64
          369 Idle WINDOWS X86_64
           31 Retiring LINUX X86_64
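The same census can be produced in a script when the counts need further processing. A minimal Python sketch, assuming the autoformat lines have already been captured as text (for example from the command above); the sample rows are made up and `census` is a hypothetical helper.

```python
from collections import Counter

def census(lines):
    """Count identical 'Activity OpSys Arch' rows, like `sort | uniq -c`."""
    # Normalize runs of whitespace so equivalent rows compare equal.
    counts = Counter(" ".join(line.split()) for line in lines if line.strip())
    return sorted(counts.items())

sample = [
    "Busy LINUX X86_64",
    "Idle LINUX X86_64",
    "Idle  LINUX X86_64",
    "Idle WINDOWS X86_64",
]

for row, count in census(sample):
    print(f"{count:7d} {row}")
```

Autoformat output has one ad per line with no headers, which is what makes this kind of line-oriented counting straightforward.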

condor_status -autoformat

•  Separate by tabs, commas, spaces, newlines
•  Label each field by name
•  Escape as a ClassAd value
•  Add headers
•  Several easy to parse options

condor_status -format

•  Like autoformat, but with manual formatting
•  Useful for writing simple reports
•  Uses C printf style formats
   –  One field per argument

condor_status -format

      % condor_status -format '%-10s ' Activity -format '%-7s ' OpSys -format '%s\n' Arch | sort | uniq -c
           54 Busy       LINUX   X86_64
           35 Idle       LINUX   INTEL
         1513 Idle       LINUX   X86_64
          369 Idle       WINDOWS X86_64
           31 Retiring   LINUX   X86_64

Examining Queues: condor_q

•  View the job queue
•  The -long option is useful to see the entire ClassAd for a given job
•  Supports -constraint, -autoformat, and -format
•  Can view job queues on remote machines with the -name option

condor_q -analyze and -better-analyze

•  Why isn't this job running? default
•  On this machine? -machine
•  Why does this machine hate my job? -better-analyze:reverse
•  General reports -analyze:sum -analyze:sum,rev

Log Files

•  “Ready for the Winter” by Anna “bcmom” © 2005 Licensed under the Creative Commons Attribution 2.0 license

•  http://www.flickr.com/photos/bcmom/59207805/ http://www.webcitation.org/5XIhRO8L8

HTCondor’s Log Files

•  HTCondor maintains one log file per daemon
•  Can increase verbosity of logs on a per daemon basis
   –  SHADOW_DEBUG, SCHEDD_DEBUG, and others
   –  Space separated list

Useful Debug Levels

•  D_FULLDEBUG dramatically increases information logged
   –  Does not include other debug levels!
•  D_COMMAND adds information about commands received
      SHADOW_DEBUG = D_FULLDEBUG D_COMMAND

Log Rotation

•  Log files are automatically rolled over when a size limit is reached
   –  Only one old version is kept
   –  Defaults to 10 megabytes
   –  Rolls over quickly with D_FULLDEBUG
   –  MAX_DEFAULT_LOG
   –  Also per daemon settings
      •  MAX_SHADOW_LOG, MAX_SCHEDD_LOG, and others

HTCondor’s Log Files

•  Many log file entries are primarily useful to HTCondor developers
   –  Especially if D_FULLDEBUG is on
   –  Minor errors are often logged but corrected
   –  Take them with a grain of salt
   –  [email protected]

Debugging Jobs

•  “Wanna buy a Beetle?” by “Kevin” © 2006 Licensed under the Creative Commons Attribution 2.0 license

•  http://www.flickr.com/photos/kevincollins/89538633/ http://www.webcitation.org/5XIiMyhpp

Debugging Jobs: condor_q

•  Examine the job with condor_q
   –  Especially the very powerful -analyze and -better-analyze

Debugging Jobs: User Log

•  Examine the job’s user log
   –  Can find with:
         condor_q -af UserLog 17.0
   –  Set with “log” in the submit file
   –  You can set EVENT_LOG to get a unified log for all jobs under a schedd
•  Contains the life history of the job
•  Often contains details on problems

Debugging Jobs: ShadowLog

•  Examine ShadowLog on the submit machine
   –  Note any machines the job tried to execute on
   –  There is often an “ERROR” entry that can give a good indication of what failed

Debugging Jobs: Matching Problems

•  No ShadowLog entries? Possible problem matching the job.
   –  Examine ScheddLog on the submit machine
   –  Examine NegotiatorLog on the central manager

Debugging Jobs: Remote Problems

•  ShadowLog entries suggest an error but aren’t specific?
   –  Examine StartLog and StarterLog on the execute machine

Debugging Jobs: Reading Log Files

•  HTCondor logs will note the job ID each entry is for
   –  Useful if multiple jobs are being processed simultaneously
   –  grepping for the job ID will make it easy to find relevant entries
•  Occasionally HTCondor doesn't know yet…

Debugging Jobs: What Next?

•  If necessary add “D_FULLDEBUG D_COMMAND” to the DAEMONNAME_DEBUG setting (for example SHADOW_DEBUG) for additional log information
•  Increase MAX_DAEMONNAME_LOG if logs are rolling over too quickly
•  If all else fails, email us
   –  [email protected]

What now?

99

HTCondor doesn’t have queues (really)

•  Dealing with job runtimes can be interesting
•  MaxJobRetirementTime = expression
•  Startd-side parameter
•  Guarantees minimum job runtime
•  MaxJobRetirementTime = 3600
•  MaxJobRetirementTime = \
      300 + 3600 * (Owner == "BigWig")

Multicore example Machine:

•  8 cores •  8 Gigabytes memory •  2 disks

The old way (Still the default)

$ condor_status

Name             OpSys  Arch    State      Activity  LoadAv  Mem   ActvtyTime
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.110   1024  0+00:45:04
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000   1024  0+00:45:05
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000   1024  0+00:45:06
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000   1024  0+00:45:07
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000   1024  0+00:45:08
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000   1024  0+00:45:09
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000   1024  0+00:45:10
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000   1024  0+00:45:03

              Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill
X86_64/LINUX      8      0        0          8        0           0         0
Total             8      0        0          8        0           0         0

Problem: Job Image Size

Simple solution: Static non-uniform memory

# condor_config
NUM_SLOTS_TYPE_1 = 1
NUM_SLOTS_TYPE_2 = 7
SLOT_TYPE_1 = mem=4096
SLOT_TYPE_2 = mem=auto

Result is

$ condor_status
Name             OpSys  Arch    State      Activity  LoadAv  Mem
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.150   4096
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000    585
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000    585
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000    585
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000    585
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000    585
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000    585
[email protected]  LINUX  X86_64  Unclaimed  Idle      0.000    585
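The 585 MB figure is what is left once the fixed 4096 MB slot is carved out of the machine. A quick Python check of that arithmetic, assuming the 8 GB machine reports 8192 MB and that the remainder is split evenly and rounded down; `auto_memory` is an illustrative helper, and the real startd accounting may differ (for example if memory is reserved).

```python
def auto_memory(total_mb, fixed_mb, auto_slots):
    """Split the memory not claimed by fixed-size slots evenly among the 'auto' slots."""
    remaining = total_mb - sum(fixed_mb)
    return remaining // auto_slots

print(auto_memory(total_mb=8192, fixed_mb=[4096], auto_slots=7))
```

Seven slots of 585 MB each are too small for many jobs, which is the weakness the following slides address.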

Better, but not good..

[Figure: an 8 GB machine partitioned into 5 slots (one 4 GB slot and four 1 GB slots). A 4 GB job can sit idle even though 7 GB of memory is free.]

Partitionable slots: The big idea

•  One “partitionable” slot
•  From which “dynamic” slots are made
•  When a dynamic slot exits, it is merged back into the “partitionable” slot
•  Split happens at claim time
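The split-and-merge cycle can be modeled in a few lines. This Python sketch tracks memory only and is a toy model, not the startd's implementation; the class and method names are made up for illustration.

```python
class PartitionableSlot:
    """Toy model: a pool of memory from which dynamic slots are carved at claim time."""

    def __init__(self, memory_mb):
        self.free_mb = memory_mb
        self.dynamic_slots = []  # memory of each dynamic slot currently in use

    def claim(self, request_mb):
        """Split off a dynamic slot if enough memory is left over."""
        if request_mb > self.free_mb:
            return False
        self.free_mb -= request_mb
        self.dynamic_slots.append(request_mb)
        return True

    def release(self, index):
        """When a dynamic slot exits, merge its memory back into the leftovers."""
        self.free_mb += self.dynamic_slots.pop(index)

slot = PartitionableSlot(8192)
for request in (2048, 1024, 1024):
    slot.claim(request)
print(slot.free_mb)       # leftovers after three dynamic slots
print(slot.claim(8192))   # a whole-machine job cannot fit while others run
```

The last line hints at the remaining problem: as long as small jobs keep arriving, the leftovers never grow back to a whole machine unless something drains it.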

3 types of slots

•  Static (i.e. the usual kind)
•  Partitionable (i.e. leftovers)
•  Dynamic (usable ones)
   –  Dynamically created
   –  But once created, static

[Figures: an 8 GB partitionable slot, shown once with a 4 GB label and once with a 5 GB label]

How to configure

      NUM_SLOTS = 1
      NUM_SLOTS_TYPE_1 = 1
      SLOT_TYPE_1 = cpus=100%
      SLOT_TYPE_1_PARTITIONABLE = true

Looks like

Name     OpSys  Arch    State      Activity  LoadAv  Mem   ActvtyTime
slot1@c  LINUX  X86_64  Unclaimed  Idle      0.110   8192

              Total  Owner  Claimed  Unclaimed  Matched
X86_64/LINUX      1      0        0          1        0
Total             1      0        0          1        0

When running

Name       OpSys  Arch    State      Activity  LoadAv  Mem   ActvtyTime
slot1@c    LINUX  X86_64  Unclaimed  Idle      0.110   4096
slot1_1@c  LINUX  X86_64  Claimed    Busy      0.000   1024
slot1_2@c  LINUX  X86_64  Claimed    Busy      0.000   2048
slot1_3@c  LINUX  X86_64  Claimed    Busy      0.000   1024

Fragmentation: still not perfect

Name       OpSys  Arch    State      Activity  LoadAv  Mem
slot1@c    LINUX  X86_64  Unclaimed  Idle      0.110   4096
slot1_1@c  LINUX  X86_64  Claimed    Busy      0.000   2048
slot1_2@c  LINUX  X86_64  Claimed    Busy      0.000   1024
slot1_3@c  LINUX  X86_64  Claimed    Busy      0.000   1024

Now I submit a job that needs 8G – what happens?

Solution: New Daemon

•  condor_defrag
•  One daemon defrags whole pool
   –  Central manager good place to run
•  Scan pool, try to fully defrag some startds
•  Only looks at partitionable machines
•  Admin picks some % of pool that can be “whole”

Oh, we got knobs…

DEFRAG_DRAINING_MACHINES_PER_HOUR default is 0

DEFRAG_MAX_WHOLE_MACHINES default is -1

DEFRAG_SCHEDULE •  graceful (obey MaxJobRetirementTime, default)

•  quick (obey MachineMaxVacateTime)

•  fast (hard-killed immediately)

Defrag vs. Preemption

•  Defrag can be general purpose –  Looks only at startds, not at demand – Can also preempt non-partitionable slots

•  (if so configured)

•  Negotiator preemption looks at 2 jobs

Federating pools with flocking

•  Between schedd and remote central manager
•  On schedd, set
   –  FLOCK_TO = NAME_OF_REMOTE_MACHINE
•  On remote CM set
   –  FLOCK_FROM = NAME_OF_REMOTE_SCHEDD

Thank you!

•  Where to go for more info
   –  Talk to us!
   –  [email protected]
   –  Come to HTCondor Week!
   –  http://htcondorproject.org
   –  800+ page Condor manual

