Jaime Frey
Computer Sciences Department
University of Wisconsin-Madison
[email protected]
http://www.cs.wisc.edu/condor
What’s New in Condor-G
Outline
› What is Condor-G
› Released New Features
› In Development
What Is Condor-G
› Use Condor to run jobs on the Grid
› Uses Globus Toolkit
    GRAM (submit a remote job)
    GASS (transfer job’s files)
› Two components
    Globus Universe
    GlideIn
Globus Universe
› Run a job on a Grid resource
› Features
    Job management
    Fault tolerance
    Credential management
› Roughly equivalent to the vanilla universe
How It Works
[Diagram, built up over five slides. Condor-G side: Schedd, GridManager, 600 Globus jobs. Grid Resource side: JobManager, LSF, User Job.]
GlideIn
› Run the Condor daemons on Grid resources as user jobs
› Create your own personal Condor pool from temporarily-acquired Grid resources
› Brings the full power of Condor to the Grid
[Diagram, built up over seven slides. Labels: Condor-G, 600 Condor jobs, Globus Grid (PBS, LSF, Condor), Condor glide-ins.]
Released New Features
› Stuff we’ve added in the past year
› Released and ready for use in Condor 6.6
Globus ASCII Helper Protocol (GAHP)
› Encapsulates Globus libraries in separate process
› Simple ASCII protocol
› Easy for legacy applications to use Globus when they can’t link directly with the libraries
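To make the idea of a line-oriented ASCII helper concrete, here is a minimal Python sketch. The command names (`VERSION`, `ECHO`), the reply format, and the function `handle_request` are all hypothetical and are not GAHP's actual command set; a real helper would read such lines from stdin in a separate process and call the Globus libraries on the client's behalf.

```python
# Toy line-oriented ASCII helper protocol (hypothetical; NOT the real GAHP command set).
# Request: "<COMMAND> <request-id> [args...]"
# Reply:   "S <request-id> [result...]" on success, "E <request-id> <reason>" on error.

def handle_request(line: str) -> str:
    """Parse one request line and return one reply line."""
    parts = line.strip().split()
    if len(parts) < 2:
        return "E 0 malformed-request"
    command, request_id, args = parts[0].upper(), parts[1], parts[2:]
    if command == "VERSION":
        return f"S {request_id} toy-helper-0.1"
    if command == "ECHO":
        # rstrip() drops the trailing space when no arguments were given
        return f"S {request_id} {' '.join(args)}".rstrip()
    return f"E {request_id} unknown-command"


if __name__ == "__main__":
    for request in ["VERSION 1", "ECHO 2 hello grid", "SUBMIT 3 job.rsl"]:
        print(request, "->", handle_request(request))
```

Because requests and replies are plain text lines, a client only needs to write to and read from a pipe; it never links against the libraries the helper wraps.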
How It Works - GAHP
[Diagram. Condor-G side: Schedd, GridManager, GAHP Client, GAHP Server. Grid Resources side: three JobManagers.]
File Staging
› Arbitrary input and output files can be staged to and from execution site
› Same syntax as other universes
› Limitation
    Output files must be explicitly named
File Staging (cont)
› Input, Output, and Error can be URLs
    Files will be transferred directly to and from execution site
› Output and Error can be staged or streamed
Credential Refresh
› Renewed credentials are used by Condor-G and forwarded to the execution site automatically
› No processes need to be restarted
Better Credential Management
› One GridManager process can handle multiple credential files with same subject
› More efficient when you want to have different credential lifetimes for different jobs
Grid Match-Making
› Globus jobs matched with Globus resources by the Condor match-maker using ClassAds
› Current limitation
    User/admin must create resource ads
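The matching step can be pictured with a small Python sketch. This only mimics the idea of ClassAd matching and is not the ClassAd language: ads are plain dicts, Requirements and Rank are Python callables, and the resource names, attribute values, and `match` function are made up for illustration.

```python
# Toy matchmaker: ads are plain dicts, Requirements and Rank are Python callables.
# This mimics the idea of ClassAd matching; it is not the ClassAd language.

def match(job, resources):
    """Return the resource ad that satisfies the job's requirements with the highest rank, or None."""
    candidates = [r for r in resources if job["requirements"](r)]
    if not candidates:
        return None
    return max(candidates, key=job["rank"])


# Hypothetical, hand-written resource ads (a user or admin has to supply these).
resources = [
    {"name": "site-a", "OpSys": "LINUX", "Mflops": 350,
     "gatekeeper_url": "site-a.example.org/jobmanager-pbs"},
    {"name": "site-b", "OpSys": "LINUX", "Mflops": 500,
     "gatekeeper_url": "site-b.example.org/jobmanager-lsf"},
    {"name": "site-c", "OpSys": "SOLARIS", "Mflops": 900,
     "gatekeeper_url": "site-c.example.org/jobmanager-fork"},
]

# Hypothetical job ad: require Linux, prefer the fastest resource.
job = {
    "requirements": lambda r: r["OpSys"] == "LINUX",
    "rank": lambda r: r["Mflops"],
}

best = match(job, resources)
print(best["name"] if best else "no match")
```

Here site-c has the highest Mflops but fails the requirements, so the job is matched to site-b, the best-ranked resource among those that qualify.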
Fault Tolerance
› Condor-G does its best to automatically recover from failures
› User can guide decisions with job policy expressions
    PeriodicRelease
    GlobusResubmit
    Rematch
PeriodicRelease Expression
› Condor-G puts problematic jobs on hold
› This expression tells Condor-G when to release and retry such jobs
GlobusResubmit Expression
› Tells Condor-G when a problematic job submission should be abandoned
› When this expression becomes true
    Best effort is made to clean up current job submission
    New job submission is attempted
Rematch Expression
› Tells Condor-G when a problematic resource should be abandoned
› Evaluated when GlobusResubmit evaluates to true
› When this expression becomes true
    Best effort is made to clean up current job submission
    Job is rematched
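The way the two expressions interact can be sketched as a small decision function in Python. The function name and the returned action labels are hypothetical, and the sketch assumes that when Rematch is false the new submission goes to the same resource, which the slides imply but do not state outright.

```python
def next_action(globus_resubmit: bool, rematch: bool) -> str:
    """Decide what happens to a problematic job submission.

    Rematch is only consulted once GlobusResubmit has evaluated to true.
    The returned labels are illustrative, not Condor-G terminology.
    """
    if not globus_resubmit:
        # The current submission is not abandoned.
        return "keep-current-submission"
    if rematch:
        # Abandon the resource: clean up, then find a new match.
        return "cleanup-and-rematch"
    # Abandon only the submission: clean up, then submit again (assumed: same resource).
    return "cleanup-and-resubmit"


for resubmit, rematch in [(False, False), (False, True), (True, False), (True, True)]:
    print(resubmit, rematch, "->", next_action(resubmit, rematch))
```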
Job Ad Example
    GlobusContactString = TARGET.gatekeeper_url
    Requirements = TARGET.Arch == "LINUX" && TARGET.OpSys == "LINUX"
    Rank = TARGET.Mflops
    PeriodicRelease = ((NumMatches < 10) && ((CurrentTime - EnteredCurrentStatus) > 600))
    GlobusResubmit = NumSystemHolds >= NumMatches
    Rematch = True
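To see what the two policy expressions in this job ad evaluate to for concrete values, here is a hedged Python transcription. The function names and sample numbers are illustrative, and ClassAd details such as undefined attributes are not modeled.

```python
def periodic_release(num_matches: int, current_time: int, entered_current_status: int) -> bool:
    # PeriodicRelease = ((NumMatches < 10) && ((CurrentTime - EnteredCurrentStatus) > 600))
    return num_matches < 10 and (current_time - entered_current_status) > 600


def globus_resubmit(num_system_holds: int, num_matches: int) -> bool:
    # GlobusResubmit = NumSystemHolds >= NumMatches
    return num_system_holds >= num_matches


now = 10_000  # hypothetical current time, in seconds

# Held 700 s ago after 2 matches: released, since 700 > 600 and 2 < 10.
print(periodic_release(num_matches=2, current_time=now, entered_current_status=now - 700))

# Held only 300 s ago: not released yet.
print(periodic_release(num_matches=2, current_time=now, entered_current_status=now - 300))

# Already matched 10 times: never released by this expression again.
print(periodic_release(num_matches=10, current_time=now, entered_current_status=now - 700))

# As many system holds as matches: the current submission is abandoned.
print(globus_resubmit(num_system_holds=2, num_matches=2))
```

Read this way, the ad retries a held job after ten minutes on hold, for as long as it has been matched fewer than ten times.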
Hardening
› Regular testing on the CMS testbed with real applications
› Many bugs and integration issues found and fixed
    Hostile Environment
Hostile Environment
› Full disks
› Machine crashes
› File server lock-ups
› Network outages
› Power outages
One CMS Dataset Run
› 300 jobs
› Last fall
    ~50 (16%) of the jobs stalled and required human recovery
    Multiple service restarts (20 daemon crashes over 6 hours)
› Now
    0 jobs stalled
    0 service restarts
Integration Work
› Dozens of Condor-G improvements and bug fixes
› Over 40 Globus “bugzilla” incidents, many with patches
    Globus 2.2.4 has 21 “Advisories” as of 4/11/04
› Use latest version of both
Scalability
› Submitting several hundred jobs produced high load on server
    Machine became unresponsive
    We saw a load average of 1000 at one point
› Caused by Globus JobManager processes
Grid Manager Monitor Agent
› New tool Condor-G can use to reduce this load
› Efficient job status polling program
› Allows Condor-G to shut down JobManager processes when they’re not needed
Load Reduced
› 400 jobs (/bin/sleep 900)
› Without Grid Monitor
    42 hours to complete
    Peak load average of 610
› With Grid Monitor
    40 minutes
    Peak load average of 104
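As a quick worked check of what these figures imply, the following Python snippet computes the improvement ratios. Only the four measurements come from the slide; the variable names are illustrative.

```python
# Figures from the slide: 400 jobs of /bin/sleep 900.
without_monitor_minutes = 42 * 60   # 42 hours to complete, in minutes
with_monitor_minutes = 40           # 40 minutes to complete
peak_load_without = 610
peak_load_with = 104

speedup = without_monitor_minutes / with_monitor_minutes
load_ratio = peak_load_without / peak_load_with

print(f"Completion time: {speedup:.0f}x faster")
print(f"Peak load average: {load_ratio:.1f}x lower")
```

The run finishes 63 times faster with the Grid Monitor, and the peak load average drops by a factor of about 5.9.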
Miscellaneous Stuff
› Email notification on job completion
› Port range restrictions
› Problem jobs put on hold
In Development
› Stuff we’re currently working on
› Will be released sometime in the next year
Job Policy Expressions
› PeriodicHold
› PeriodicRemove
› OnExitHold
› OnExitRemove
Improved GlideIn
› MDS use optional
    User specifies necessary information
› Automatic setup
    GlideIn job transfers and installs binaries if needed
    Binaries can come from submit machine
New Job Types
› Submit jobs directly to other schedulers (not through Globus)
› Why?
    Richer interface semantics
    Not supported by Globus
NorduGrid
› Grid batch system designed by Nordic countries
› Globus GRAM didn’t offer necessary semantics
    Client control of file staging
    Automatic cleanup of abandoned jobs
Oracle
› Oracle DBMS supports a job queue
    Run this query in 5 hours
    Run this query every Monday
› Condor can add more management features
Generic Job Interface
› Re-arrange GridManager to allow easy addition of new job types
› Define appropriate interface
› Plug-ins for new job types?
Globus Toolkit 3.0
› OGSA (Open Grid Services Architecture)
› Submit jobs to GT3 sites
› Grid Service client interface to Condor-G
Miscellaneous
› Condor-G for Windows
› MyProxy credential management
› URLs for executable, staged files
Thank You!
› Questions?
› Also…
    Condor-G & Globus Q/A session
        • Wednesday, 9am-12pm, room TBA
    E-mail [email protected]