glideinWMS Architecture - glideinWMS Training Jan 2012

glideinWMS training @ UCSD, Jan 17th 2012: glideinWMS architecture, by Igor Sfiligoi (UCSD)

Upload: igor-sfiligoi

Post on 15-Jan-2015

DESCRIPTION

An overview of the glideinWMS Architecture. Part of the glideinWMS Training session held in Jan 2012 at UCSD.

TRANSCRIPT

Page 1: glideinWMS Architecture - glideinWMS Training Jan 2012

UCSD Jan 17th 2012 glideinWMS architecture 1

glideinWMS training @ UCSD

glideinWMS architecture

by Igor Sfiligoi (UCSD)

Page 2: glideinWMS Architecture - glideinWMS Training Jan 2012

Outline

● A high level overview of the glideinWMS

● Description of the components

Page 3: glideinWMS Architecture - glideinWMS Training Jan 2012

glideinWMS from 10k feet

Page 4: glideinWMS Architecture - glideinWMS Training Jan 2012

Refresher - Condor

● A Condor pool is composed of 3 pieces

[Figure: a Condor pool. The central manager runs the Collector and the Negotiator, each submit node runs a Schedd, and each execution node runs a Startd that executes the Job.]

Page 5: glideinWMS Architecture - glideinWMS Training Jan 2012

What is a glidein?

● A glidein is just a properly configured execution node submitted as a Grid job

[Figure: the same Condor pool (central manager with Collector and Negotiator, submit nodes with Schedd), where the execution nodes running the Startd and the Job are glideins.]

Page 6: glideinWMS Architecture - glideinWMS Training Jan 2012

What is glideinWMS?

● glideinWMS is an automated tool for submitting glideins on demand

[Figure: glideinWMS submits glideins through Globus and CREAM; the glideins become the execution nodes of the Condor pool (central manager with Collector and Negotiator, submit nodes with Schedd).]

Page 7: glideinWMS Architecture - glideinWMS Training Jan 2012

glideinWMS architecture

● glideinWMS has 3 logical pieces

[Figure: the Frontend (on the Frontend node) monitors Condor on the submit nodes and the central manager of the Frontend domain, does the match, and requests glideins from the Factory (on the Factory node, next to Condor). The Factory submits glideins through Globus or CREAM to worker nodes, where glidein_startup configures Condor and starts the Startd. The glideins then act as execution nodes.]

Page 8: glideinWMS Architecture - glideinWMS Training Jan 2012

glideinWMS architecture

● glideinWMS has 3 logical pieces
  ● glidein_startup – Configures and starts Condor execution daemons (runtime environment discovery and validation)
  ● Factory – Knows about the sites and does the submission (Grid knowledge and troubleshooting)
  ● Frontend – Knows about user jobs and requests glideins (site selection logic and job monitoring)

Page 9: glideinWMS Architecture - glideinWMS Training Jan 2012

Cardinality

● N-to-M relationship
  ● Each Frontend can talk to many Factories
  ● Each Factory may serve many Frontends

[Figure: two Glidein Factories and two VO Frontends. Each VO Frontend sits next to its own Collector, Negotiator and Schedd, with user jobs running on Startds.]

Page 10: glideinWMS Architecture - glideinWMS Training Jan 2012

Many operators

● Factory and Frontend are usually operated by different people

● Frontends are VO specific
  ● Operated by VO admins
  ● Each sets policies for its users

● Factories are generic
  ● Do not need to be affiliated with any group
  ● The main task of Factory ops is Grid monitoring and troubleshooting

Page 11: glideinWMS Architecture - glideinWMS Training Jan 2012

A (sort of) detailed view of glidein_startup

Page 12: glideinWMS Architecture - glideinWMS Training Jan 2012

Refresher – glideinWMS arch.

● glidein_startup configures and starts Condor

[Figure: glideinWMS architecture diagram, as on page 7; glidein_startup runs on the worker node and starts the Startd.]

Page 13: glideinWMS Architecture - glideinWMS Training Jan 2012

glidein_startup tasks

● Validate node (environment)
● Download Condor binaries
● Configure Condor
● Start Condor daemon(s)
● Collect post-mortem monitoring info
● Cleanup

Performed by plugins
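The task list above can be read as a fixed pipeline with a cleanup step that always runs. The following is only a minimal sketch of that control flow; the function names and the `state` dictionary are hypothetical and do not reflect the actual glidein_startup script, whose internals are not shown in these slides.

```python
from typing import Callable, Dict, List

# Hypothetical step type: each step receives a shared state dictionary.
Step = Callable[[Dict], None]


def run_glidein(steps: List[Step], cleanup: Step) -> Dict:
    """Run each step in order; always run cleanup, even after a failure."""
    state: Dict = {"log": [], "ok": True}
    try:
        for step in steps:
            step(state)
    except RuntimeError as err:
        # A failed step (e.g. node validation) aborts the remaining steps.
        state["ok"] = False
        state["log"].append(f"failed: {err}")
    finally:
        cleanup(state)
    return state


# Placeholder steps that only record what they would do.
def validate_node(state: Dict) -> None:
    state["log"].append("validate node")


def download_condor(state: Dict) -> None:
    state["log"].append("download Condor binaries")


def configure_condor(state: Dict) -> None:
    state["log"].append("configure Condor")


def start_condor(state: Dict) -> None:
    state["log"].append("start Condor daemon(s)")


def collect_monitoring(state: Dict) -> None:
    state["log"].append("collect post-mortem monitoring info")


def cleanup(state: Dict) -> None:
    state["log"].append("cleanup")


def broken_validation(state: Dict) -> None:
    # Illustrative failure: a node that does not pass validation.
    raise RuntimeError("not enough disk space")


ALL_STEPS = [validate_node, download_condor, configure_condor,
             start_condor, collect_monitoring]
```

Calling `run_glidein(ALL_STEPS, cleanup)` records all six tasks in order, while `run_glidein([broken_validation, download_condor], cleanup)` stops at the failed validation and still performs the cleanup.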

Page 14: glideinWMS Architecture - glideinWMS Training Jan 2012

glidein_startup plugins

● Config files and scripts loaded via HTTP
  ● From both the factory and the frontend Web servers
  ● Can use local Web proxy (e.g. Squid)
  ● Mechanism tamper proof and cache coherent

[Figure: glidein_startup loads files from the factory Web server and from the frontend Web server (an HTTPd on each node) through Squid, runs the executables, starts Condor (the Startd) and cleans up.]
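The slides do not say how the download mechanism is made tamper proof. One common way to get that property when files pass through a shared Web proxy is to compare each downloaded file against a digest obtained through a trusted channel. The sketch below assumes such a trusted digest list exists; the function names and the choice of SHA-256 are illustrative assumptions, not a description of the glideinWMS implementation.

```python
import hashlib
from typing import Dict


def sha256_hex(content: bytes) -> str:
    """Hex digest of a downloaded file's content."""
    return hashlib.sha256(content).hexdigest()


def verify_downloads(files: Dict[str, bytes],
                     trusted_digests: Dict[str, str]) -> None:
    """Raise ValueError if a file is unknown or its digest does not match.

    `trusted_digests` is assumed to come from a trusted source
    (hypothetical here); `files` maps file names to downloaded bytes.
    """
    for name, content in files.items():
        expected = trusted_digests.get(name)
        if expected is None:
            raise ValueError(f"{name}: not in the trusted list")
        if sha256_hex(content) != expected:
            raise ValueError(f"{name}: digest mismatch")
```

With this check in place, a proxy that serves a modified or stale copy of a script is detected before the script is run.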

Page 15: glideinWMS Architecture - glideinWMS Training Jan 2012

glidein_startup scripts

● Standard plugins
  ● Basic Grid node validation (certs, disk space, etc.)
  ● Setup Condor (glexec, CCB, etc.)

● VO provided plugins
  ● Optional, but can be anything
  ● CMS@UCSD checks for CMS SW

● Factory admin can also provide them

● Details about the plugins can be found at
  http://tinyurl.com/glideinWMS/doc.prd/factory/custom_scripts.html

Page 16: glideinWMS Architecture - glideinWMS Training Jan 2012

A (sort of) detailed view of the glidein factory

Page 17: glideinWMS Architecture - glideinWMS Training Jan 2012

Refresher – glideinWMS arch.

● The factory knows about the grid and submits glideins

[Figure: glideinWMS architecture diagram, as on page 7; the Factory on the Factory node submits glideins through Globus and CREAM.]

Page 18: glideinWMS Architecture - glideinWMS Training Jan 2012

Glidein factory

● Glidein factory knows how to contact sites
  ● List in a local config
  ● Only trusted and tested sites should be included

● For each site (called entry)
  ● Contact info (Node, grid type, jobmanager)
  ● Site config (startup dir, glexec, OS type, …)
  ● VOs supported
  ● Other attributes (Site name, closest SE, ...)

● Admin maintained, periodically compared to BDII
  http://tinyurl.com/glideinWMS/doc.prd/factory/configuration.html
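To make the per-entry information above concrete, here is a small sketch of an entry as a data structure. The field names, the example values and the `serves` helper are hypothetical and chosen only to mirror the bullet list; the real factory configuration format is described at the URL above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    """One site as seen by the factory (all field names are illustrative)."""
    name: str
    # Contact info
    node: str
    grid_type: str
    jobmanager: str
    # Site config
    startup_dir: str
    use_glexec: bool
    os_type: str
    # VOs supported
    supported_vos: List[str]
    # Other attributes, e.g. site name or closest SE
    attributes: Dict[str, str] = field(default_factory=dict)

    def serves(self, vo: str) -> bool:
        """True if this entry accepts glideins for the given VO."""
        return vo in self.supported_vos


# Made-up example values; not a real site.
example_entry = Entry(
    name="Example_Site_A",
    node="gatekeeper.example.org",
    grid_type="globus",
    jobmanager="condor",
    startup_dir="/tmp",
    use_glexec=False,
    os_type="linux",
    supported_vos=["cms"],
    attributes={"site_name": "EXAMPLE_A", "closest_se": "se.example.org"},
)
```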

Page 19: glideinWMS Architecture - glideinWMS Training Jan 2012

Glidein factory role

● The glidein factory is just a slave
  ● The frontend(s) tell it how many glideins to submit where
  ● Once the glideins start to run, they report to the VO collector and the factory is not involved

● The communication is based on ClassAds
  ● The factory has a Collector for this purpose

[Figure: the Frontend on the Frontend node talks to the Collector on the Factory node, which sits next to the Factory.]

Page 20: glideinWMS Architecture - glideinWMS Training Jan 2012

Factory collector

● The factory collector handles all communication

[Figure: on the Factory node, the Factory spawns the Entries. Each Entry advertises itself to the Collector and retrieves orders from it. The Frontends, each on its own Frontend node, find sites and request glideins through the same Collector.]

http://tinyurl.com/glideinWMS/doc.prd/factory/design_data_exchange.html
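The indirect exchange through the factory collector can be sketched with a tiny in-memory stand-in. This is not the Condor Collector API: the `Collector` class, the ad names and the attribute names (`kind`, `entry`, `idle_glideins`) are all hypothetical, and ClassAds are modelled as plain dictionaries. The real ad layout is described at the URL above.

```python
from typing import Callable, Dict, List

ClassAd = Dict[str, object]


class Collector:
    """In-memory stand-in for the factory Collector: stores ads by name."""

    def __init__(self) -> None:
        self._ads: Dict[str, ClassAd] = {}

    def advertise(self, name: str, ad: ClassAd) -> None:
        # Re-advertising under the same name replaces the previous ad.
        self._ads[name] = dict(ad)

    def query(self, predicate: Callable[[ClassAd], bool]) -> List[ClassAd]:
        return [dict(ad) for ad in self._ads.values() if predicate(ad)]


collector = Collector()

# Entries advertise themselves.
collector.advertise("entry/SiteA", {"kind": "entry", "entry": "SiteA"})
collector.advertise("entry/SiteB", {"kind": "entry", "entry": "SiteB"})

# A frontend finds sites, then requests glideins from one of them.
sites = collector.query(lambda ad: ad["kind"] == "entry")
collector.advertise(
    "request/SiteA/frontend1",
    {"kind": "request", "entry": "SiteA", "frontend": "frontend1",
     "idle_glideins": 10},
)

# The entry retrieves the orders addressed to it.
orders = collector.query(
    lambda ad: ad["kind"] == "request" and ad["entry"] == "SiteA"
)
```

Neither side calls the other directly; both only read from and write to the collector, which is what allows the factory and the frontends to run independent polling loops.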

Page 21: glideinWMS Architecture - glideinWMS Training Jan 2012

Frontends

● The factory admin decides which Frontends to serve
  ● Valid proxy with known DN needed to talk to the collector

● Factory config has further fine-grained controls

[Figure: Frontends on Frontend nodes connecting to the Collector on the Factory node, next to the Factory.]

Page 22: glideinWMS Architecture - glideinWMS Training Jan 2012

Glidein submission

● The glidein factory (entry) uses Condor-G to submit glideins
  ● Condor-G does the heavy lifting
  ● The factory just monitors the progress

[Figure: on the Factory node, each Entry submits to and monitors a Schedd; the Schedds send the glideins to Globus and CREAM.]

Page 23: glideinWMS Architecture - glideinWMS Training Jan 2012

Credentials/Proxy

● Proxy typically provided by the frontend
  ● Although the factory can provide a default one (rarely used)

● Proxy delivered encrypted in the ClassAd
  ● Factory (entry) provides the encryption key (PKI)

● Proxy stored on disk
  ● Each VO mapped to a different UID

[Figure: the Frontend on the Frontend node gets the key from, and delivers the encrypted proxy to, the Collector on the Factory node, which also hosts the Entry and the Schedd.]

Page 24: glideinWMS Architecture - glideinWMS Training Jan 2012

A (sort of) detailed view of the VO frontend

Page 25: glideinWMS Architecture - glideinWMS Training Jan 2012

Refresher – glideinWMS arch.

● The frontend monitors the user Condor pool, does the matchmaking and requests glideins

[Figure: glideinWMS architecture diagram, as on page 7; the Frontend on the Frontend node and the Frontend domain it monitors.]

Page 26: glideinWMS Architecture - glideinWMS Training Jan 2012

VO frontend

● The VO frontend is the brain of a glideinWMS-based pool
  ● Like a site-level “negotiator”

[Figure: within the VO domain, the Frontend monitors Condor on the submit nodes and the central manager, and requests glideins from the Factory node. Its steps: find idle jobs, find entries, match, request glideins.]

Page 27: glideinWMS Architecture - glideinWMS Training Jan 2012

Two level matchmaking

● The frontend triggers glidein submission

● The “regular” negotiator matches jobs to glideins

[Figure: the Frontend drives the Factory, which submits glideins through Globus and CREAM; the glideins become execution nodes, and the Negotiator on the central manager matches the jobs from the Schedd to their Startds.]

Page 28: glideinWMS Architecture - glideinWMS Training Jan 2012

Frontend logic

● The glideinWMS glidein request logic is based on the principle of “constant pressure”
  ● Frontend requests a certain number of “idle glideins” in the factory queue at all times
  ● It does not request a specific number of glideins

● This is done due to the asynchronous nature of the system
  ● Both the factory and the frontend are in a polling loop and talk to each other indirectly
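A minimal sketch of what “constant pressure” means on the factory side: in every polling cycle, top the queue back up to the requested number of idle glideins. The function names and the simulation are hypothetical, and any other limits a real factory may apply are left out.

```python
from typing import List


def glideins_to_submit(requested_idle: int, idle_in_queue: int) -> int:
    """Glideins to submit this cycle to restore the requested idle count."""
    return max(0, requested_idle - idle_in_queue)


def simulate(requested_idle: int, started_per_cycle: List[int]) -> int:
    """Total glideins submitted over a series of polling cycles.

    `started_per_cycle[i]` is how many idle glideins started running at
    the sites since the previous cycle (illustrative input).
    """
    idle = 0
    submitted_total = 0
    for started in started_per_cycle:
        idle = max(0, idle - started)  # started glideins leave the idle queue
        new = glideins_to_submit(requested_idle, idle)
        idle += new
        submitted_total += new
    return submitted_total
```

The frontend never has to know how many glideins are already running: sites that start glideins quickly drain the idle queue and automatically receive more, while slow sites simply keep their idle glideins queued.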

Page 29: glideinWMS Architecture - glideinWMS Training Jan 2012

Frontend logic

● Frontend matches job attrs against entry attrs
  ● It then counts the matched idle jobs
  ● A fraction of this number becomes the “pressure requests” (up to 1/3)

● The matchmaking expression is defined by the frontend admin
  ● Not the user
  ● Debatable if it is better or worse, but it does reduce frontend code complexity
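The counting step above can be sketched as follows. The attribute names, the `admin_match` expression and the rounding up are assumptions made for illustration; only the idea of requesting a fraction (at most 1/3) of the matched idle jobs comes from the slide.

```python
import math
from fractions import Fraction
from typing import Callable, Dict, List

Attrs = Dict[str, object]
MAX_FRACTION = Fraction(1, 3)  # upper bound stated on the slide


def pressure_requests(idle_jobs: List[Attrs],
                      entries: List[Attrs],
                      match: Callable[[Attrs, Attrs], bool],
                      fraction: Fraction = MAX_FRACTION) -> Dict[str, int]:
    """Per entry name, the number of idle glideins to request."""
    if not 0 < fraction <= MAX_FRACTION:
        raise ValueError("fraction must be in (0, 1/3]")
    requests: Dict[str, int] = {}
    for entry in entries:
        matched = sum(1 for job in idle_jobs if match(job, entry))
        # Rounding up is an assumption of this sketch.
        requests[str(entry["name"])] = math.ceil(matched * fraction)
    return requests


def admin_match(job: Attrs, entry: Attrs) -> bool:
    """Hypothetical admin-defined expression: the entry must support the job's VO."""
    return job["vo"] in entry["supported_vos"]


idle_jobs = [{"vo": "cms"}] * 7 + [{"vo": "other"}] * 5
entries = [
    {"name": "A", "supported_vos": ["cms"]},
    {"name": "B", "supported_vos": ["cms", "other"]},
]
requests = pressure_requests(idle_jobs, entries, admin_match)
```

Here entry A matches 7 idle jobs and entry B matches 12, so the sketch requests 3 and 4 idle glideins respectively. Note that the same job may match several entries, which is one reason to request only a fraction of the matched count.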

Page 30: glideinWMS Architecture - glideinWMS Training Jan 2012

Frontend config

● The frontend owns the “glidein proxy”
  ● And delegates it to the factory(s) when requesting glideins
  ● Must keep it valid at all times (usually at OS level)

● The VO frontend can (and should) provide VO-specific validation scripts

● The VO frontend can (and should) set the glidein start expression
  ● Used by the VO negotiator for final matchmaking

Page 31: glideinWMS Architecture - glideinWMS Training Jan 2012

And the summary

Page 32: glideinWMS Architecture - glideinWMS Training Jan 2012

Summary

● Glideins are just properly configured Condor execute nodes submitted as Grid jobs

● The glideinWMS is a mechanism to automate glidein submission

● The glideinWMS is composed of three logical entities, two being actual services:
  ● Glidein factories know about the Grid
  ● VO frontends know about the users and drive the factories

Page 33: glideinWMS Architecture - glideinWMS Training Jan 2012

Pointers

● glideinWMS development team is reachable at [email protected]

● The official project Web page is http://tinyurl.com/glideinWMS

● CMS frontend at UCSD
  http://glidein-collector.t2.ucsd.edu:8319/vofrontend/monitor/frontend_UCSD-v5_2/frontendStatus.html

● OSG glidein factory at UCSD
  http://hepuser.ucsd.edu/twiki2/bin/view/UCSDTier2/OSGgfactory
  http://glidein-1.t2.ucsd.edu:8319/glidefactory/monitor/glidein_Production_v4_1/factoryStatus.html

Page 34: glideinWMS Architecture - glideinWMS Training Jan 2012

Acknowledgments

● The glideinWMS is a CMS-led project developed mostly at FNAL, with contributions from UCSD and ISI

● The glideinWMS factory operations at UCSD are sponsored by OSG

● The funding comes from NSF, DOE and the UC system