glideinwms architecture - glideinwms training jan 2012
DESCRIPTION
An overview of the glideinWMS Architecture. Part of the glideinWMS Training session held in Jan 2012 at UCSD.TRANSCRIPT
UCSD Jan 17th 2012 glideinWMS architecture 1
glideinWMS training @ UCSD
glideinWMS architectureby Igor Sfiligoi (UCSD)
UCSD Jan 17th 2012 glideinWMS architecture 2
Outline
● A high level overview of the glideinWMS
● Description of the components
UCSD Jan 17th 2012 glideinWMS architecture 3
glideinWMS
glideinWMSfrom 10k feet
UCSD Jan 17th 2012 glideinWMS architecture 4
Refresher - Condor
● A Condor pool is composed of 3 pieces
Collector
Negotiator
Central manager
Submit node
Schedd
Execution node
Startd
Job
Execution node
Execution node
Execution node
Execution node
Submit node
Submit node
UCSD Jan 17th 2012 glideinWMS architecture 5
What is a glidein?
● A glidein is just a properly configured execution node submitted as a Grid job
Collector
Negotiator
Central manager
Submit node
Schedd
Execution node
Startd
Job
Submit node
Submit node
glidein
Execution nodeglidein
Execution nodeglidein
Execution nodeglidein
UCSD Jan 17th 2012 glideinWMS architecture 6
What is glideinWMS?
● glideinWMS is an automated tool for submitting glideins on demand
Collector
Negotiator
Central manager
Submit node
Schedd
Execution node
Startd
Job
Submit node
Submit node
glideinWMS
GlobusGlobus
CREAMExecution nodeglidein
Execution nodeglidein
Execution nodeglidein
glidein
UCSD Jan 17th 2012 glideinWMS architecture 7
glideinWMS architecture
● glideinWMS has 3 logical pieces
Factory node
Condor
Factory
Frontend node
Frontend
Globus
CREAM
Submit node
Submit node
Submit node
Central manager
Execution nodeglidein
Execution nodeglidein
Worker node
glidein_startup
Startd
MonitorCondor
Requestglideins
Submitglideins
ConfigureCondor G.N.
Match
Frontend domain
UCSD Jan 17th 2012 glideinWMS architecture 8
Site selection logic and job monitoring
Grid knowledge and troubleshooting
glideinWMS architecture
● glideinWMS has 3 logical pieces● glidein_startup – Configures and starts
Condor execution daemons
● Factory – Knows about the sites and does the submission
● Frontend – Knows about user jobs and requests glideins
Runtime environmentdiscovery and validation
UCSD Jan 17th 2012 glideinWMS architecture 9
Cardinality
● N-to-M relationship● Each Frontend can talk to many Factories● Each Factory may serve many Frontends
Startd
Glidein Factory
ScheddUser job
Collector
Negotiator
VO Frontend
StartdUser job
ScheddCollector
Negotiator
VO Frontend
StartdUser jobGlidein Factory
UCSD Jan 17th 2012 glideinWMS architecture 10
Many operators
● Factory and Frontend are usually operated by different people
● Frontends VO specific● Operated by VO admins● Each sets policies for its users
● Factories generic● Do not need to be affiliated with any group● Factory ops main task is Grid monitoring and
troubleshooting
UCSD Jan 17th 2012 glideinWMS architecture 11
glideinWMS
A (sort of) detailed view of
glidein_startup
UCSD Jan 17th 2012 glideinWMS architecture 12
Refresher – glideinWMS arch.
● glidein_startup configures and starts Condor
Factory node
Condor
Factory
Frontend node
Frontend
Globus
CREAM
Submit node
Submit node
Submit node
Central manager
Execution nodeglidein
Execution nodeglidein
Worker node
glidein_startup
Startd
MonitorCondor
Requestglideins
Submitglideins
ConfigureCondor G.N.
Match
UCSD Jan 17th 2012 glideinWMS architecture 13
glidein_startup tasks
● Validate node (environment)● Download Condor binaries● Configure Condor● Start Condor daemon(s)● Collect post-mortem monitoring info● Cleanup
Performed by plugins
UCSD Jan 17th 2012 glideinWMS architecture 14
glidein_startup plugins
● Config files and scripts loaded via HTTP● From both the factory and the frontend Web servers● Can use local Web proxy (e.g. Squid)● Mechanism tamper proof and cache coherent
glidein_startup
● Load files from factory Web
● Load files from frontend Web
● Run executables● Start Condor● Cleanup
StartdFrontend node
HTTPd
Factory node
HTTPdS
qu
id
UCSD Jan 17th 2012 glideinWMS architecture 15
glidein_startup scripts
● Standard plugins● Basic Grid node validation (certs, disk space, etc.)● Setup Condor (glexec, CCB, etc.)
● VO provided plugins● Optional, but can be anything● CMS@UCSD checks for CMS SW
● Factory admin can also provide them● Details about the plugins can be found at
http://tinyurl.com/glideinWMS/doc.prd/factory/custom_scripts.html
UCSD Jan 17th 2012 glideinWMS architecture 16
glideinWMS
A (sort of) detailed view of the
glidein factory
UCSD Jan 17th 2012 glideinWMS architecture 17
Refresher – glideinWMS arch.
● The factory knowns about the grid and submits glideins
Factory node
Condor
Factory
Frontend node
Frontend
Globus
CREAM
Submit node
Submit node
Central manager
Execution nodeglidein
Execution nodeglidein
Worker node
glidein_startup
Startd
MonitorCondor
Requestglideins
Submitglideins
ConfigureCondor G.N.
Match
UCSD Jan 17th 2012 glideinWMS architecture 18
Glidein factory
● Glidein factory knows how to contact sites● List in a local config● Only trusted and tested sites should be included
● For each site (called entry)● Contact info (Node, grid type, jobmanager)● Site config (startup dir, glexec, OS type, …)● VOs supported● Other attributes (Site name, closest SE, ...)
● Admin maintained, periodically compared to BDIIhttp://tinyurl.com/glideinWMS/doc.prd/factory/configuration.html
UCSD Jan 17th 2012 glideinWMS architecture 19
Glidein factory role
● The glidein factory is just a slave● The frontend(s) tell it how many glideins
to submit where● Once the glideins start to run, they report to
the VO collector and the factory is not involved
● The communication is based on ClassAds● The factory has a Collector for this purpose
Factory node
Collector
Factory
Frontend node
Frontend
UCSD Jan 17th 2012 glideinWMS architecture 20
Factory collector
● The factory collector handles all communication
Factory node
Collector
Factory
Frontend node
Frontend
EntryEntry Entry
Spawn
Frontend node
Frontend
...
.
.
.
Advertiseentry
Find sites
Requestglideins
Retrieveorders
http://tinyurl.com/glideinWMS/doc.prd/factory/design_data_exchange.html
UCSD Jan 17th 2012 glideinWMS architecture 21
Frontend node
Frontend
Frontends
● The factory admin decides which Frontends to serve● Valid proxy
with known DN needed to talk to the collector
● Factory config has furtherfine grained controls Factory node
Collector
Factory
Frontend node
Frontend
UCSD Jan 17th 2012 glideinWMS architecture 22
glidein
glideinFactory node
Glidein submission
● The glidein factory (entry) usesCondor-G to submit glideins● Condor-G does the heavy lifting● The factory just monitors the progress
Entry
ScheddEntry
Schedd
.
.
.
.
.
.
Submit
Monitor
Submit
Monitor
Globus
CREAM
glidein
glidein
UCSD Jan 17th 2012 glideinWMS architecture 23
Credentials/Proxy
● Proxy typically provided by the frontend● Although the factory can provide a default one (rarely used)
● Proxy delivered encrypted in the ClassAd● Factory (entry) provides the encryption key (PKI)
● Proxy stored on disk● Each VO mapped to a different UID
Factory node
Collector
Entry
Frontend node
Frontend
Get key
Deliver proxy(encrypted)
Schedd
UCSD Jan 17th 2012 glideinWMS architecture 24
glideinWMS
A (sort of) detailed view of the
VO frontend
UCSD Jan 17th 2012 glideinWMS architecture 25
Factory node
Condor
Factory
Frontend node
Frontend
Globus
CREAM
Submit node
Submit node
Central manager
Execution nodeglidein
Execution nodeglidein
Worker node
glidein_startup
Startd
MonitorCondor
Requestglideins
Submitglideins
ConfigureCondor G.N.
Match
Refresher – glideinWMS arch.
● The frontend monitors the user Condor pool,does the matchmaking and requests glideins
Frontend domain
UCSD Jan 17th 2012 glideinWMS architecture 26
VO frontend
● The VO frontend is the brain of a glideinWMS-based pool● Like a site-level “negotiator”
Factory node
Frontend node
Frontend
Submit node
Submit node
Central manager
MonitorCondor
Requestglideins
Match
VO domain Findidle jobs
Findentries
Match
Requestglideins
UCSD Jan 17th 2012 glideinWMS architecture 27
Collector
Negotiator
Central manager
Submit node
Schedd
Execution node
Startd
Job
Factory
GlobusGlobus
CREAMExecution nodeglidein
Execution nodeglidein
Execution nodeglidein
glidein
Frontend
Two level matchmaking
● The frontend triggers glidein submission● The “regular” negotiator matches jobs to glideins
UCSD Jan 17th 2012 glideinWMS architecture 28
Frontend logic
● The glideinWMS glidein request logicis based on the principle on “constant pressure”● Frontend requests a certain number of
“idle glideins” in the factory queue at all times● It does not request a specific number of glideins
● This is done due to the asynchronous nature of the system● Both the factory and the frontend are
in a polling loop and talk to each other indirectly
UCSD Jan 17th 2012 glideinWMS architecture 29
Frontend logic
● Frontend matches job attrs against entry attrs● It then counts the matched idle jobs● A fraction of this number becomes the
“pressure requests” (up to 1/3)
● The matchmaking expression is defined by the frontend admin● Not the user● Debatable if it is better or worse, but it does reduce
frontend code complexity
UCSD Jan 17th 2012 glideinWMS architecture 30
● The frontend owns the “glidein proxy”● And delegates it to the factory(s)
when requesting glideins● Must keep it valid at all times
(usually at OS level)
● The VO frontend can (and should) provide VO specific validation scripts‑
● The VO frontend can (and should) set the glidein start expression● Used by the VO negotiator for final matchmaking
Frontend config
UCSD Jan 17th 2012 glideinWMS architecture 31
glideinWMS
And the
summary
UCSD Jan 17th 2012 glideinWMS architecture 32
Summary
● Glideins are just properly configured Condor execute nodes submitted as Grid jobs
● The glideinWMS is a mechanism to automate glidein submission
● The glideinWMS is composed of three logical entities, two being actual services:● Glidein factories know about the Grid● VO frontend know about the users and
drive the factories
UCSD Jan 17th 2012 glideinWMS architecture 33
Pointers
● glideinWMS development team is reachable [email protected]
● The official project Web page ishttp://tinyurl.com/glideinWMS
● CMS frontend at UCSDhttp://glidein-collector.t2.ucsd.edu:8319/vofrontend/monitor/frontend_UCSD-v5_2/frontendStatus.html
● OSG glidein factory at UCSDhttp://hepuser.ucsd.edu/twiki2/bin/view/UCSDTier2/OSGgfactoryhttp://glidein-1.t2.ucsd.edu:8319/glidefactory/monitor/glidein_Production_v4_1/factoryStatus.html
UCSD Jan 17th 2012 glideinWMS architecture 34
Acknowledgments
● The glideinWMS is a CMS-led project developed mostly at FNAL, with contributions from UCSD and ISI
● The glideinWMS factory operations at UCSD is sponsored by OSG
● The funding comes from NSF, DOE and the UC system