OSG (overview services and client tools)

Rob Gardner, University of Chicago
US ATLAS Tier2/Tier3 Workshop
SLAC, November 28-30, 2007


OSG Software and Grids
• There is an OSG Facility project, run by Miron, that organizes efforts
  – Software - the VDT
  – Operations
  – Security
  – Integration
  – Troubleshooting
  – Applications
• ATLAS participates in these in various ways
  – Integration: the ITB and VTB test beds
  – US ATLAS VO support center
  – RSV+Nagios monitoring
  – Application area for workload management systems
  – Requirements into OSG 1.0


OSG Grids


Validation Testbed
https://twiki.grid.iu.edu/twiki/bin/view/Integration/ValidationTestbed
• Motivation
  – create a limited, small-scale testbed that provides rapid, self-contained, limited installation, configuration, and validation of VDT and other services
  – configured as an actual grid with distributed sites & services
  – gives very quick feedback to VDT
  – prepares packages and configurations for the ITB
• Sites
  – UC, CIT, LBNL, FNAL, IU
• Components
  – SVN repository, http://osg-vtb.uchicago.edu/
  – Pacman cache
  – Support and build tools; central logging host (syslog-ng)


Integration Testbed
• Motivation
  – Broader, larger-scale testing, e.g. more platforms, batch schedulers, site specifics...
  – VO validation: application integration platform; first tests of the OSG software stack
  – Operated, monitored, scrutinized: Persistent ITB (FermiGrid, BNL, UC)
• Components
  – SVN repository and Pacman cache, support and build tools
  – ITB release description
  – Site validation table: by-hand bookkeeping
  – Services: ITB instances of ReSS, BDII, Gratia, GIP validation
• Processes
  – Stakeholder requirements
  – New service integration (readiness plans)
  – Install fests, validation, documentation


Service validation on the ITB

• Validation task assigned for each service, validated by site
• Coverage pretty good for the standard CE services


Validation, continued

• Pretty good coverage for these CE services too (VOMRS for a VOMS admin host, not tested on sites)


Validation, continued

• Could have used more testing of gLexec and Squid


Deployment
• Site organization - components:
  – Compute element (CE)
  – Storage element (SE)
  – GUMS
• Configuration
  – configure-osg.sh
  – RSV configuration is a separate step at present
• Execute local validation tests - site-verify
• Validate grid-level services: how does my CE appear in OSG services?
  – check VORS scans
  – check reporting of ClassAds in ReSS
  – check reporting of LDIF information in BDII
  – check accounting in Gratia


Release documentation
• Improved - hopefully! Feedback welcomed


Status of documentation
• Followed the ATLAS workbook style


OSG deployment options

Site planning: A. Roy
Not shown are RSV, Gratia services


OSG compute element install
• Prepare
  Consult: https://twiki.grid.iu.edu/twiki/bin/view/ReleaseDocumentation/PreparingComputeElement
  $ export VDTSETUP_CONDOR_LOCATION=/opt/condor/
  $ export VDT_GUMS_HOST=uct2-grid4.uchicago.edu
• Install
  Consult: https://twiki.grid.iu.edu/twiki/bin/view/ReleaseDocumentation/ComputeElementInstall
  This installs into /opt/osg-0.8.0/. Afterwards, symlink /opt/osg to it.
  $ pacman -get OSG:ce
  $ export VDTSETUP_CONDOR_CONFIG=/opt/condor/etc/condor_config
  $ PATH=$PATH:/opt/condor/bin/
  $ pacman -get OSG:Globus-Condor-Setup
• Managed Fork
  $ ./vdt/setup/configure_globus_gatekeeper --managed-fork y --server y
  Suggested Condor configuration settings for managed fork
  * Only allow 20 local universe jobs to execute concurrently:
      START_LOCAL_UNIVERSE = TotalLocalJobsRunning < 20
  * Set a hard limit on most jobs, but always let grid monitor jobs run (strongly recommended):
      START_LOCAL_UNIVERSE = TotalLocalJobsRunning < 20 || GridMonitorJob =?= TRUE
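To make the two suggested expressions concrete, here is a minimal Python sketch of how they evaluate. It is an illustration only, not Condor code: the function names are hypothetical, and `None` stands in for a job ad that does not define `GridMonitorJob` (the ClassAd `=?=` operator then evaluates to false rather than undefined).

```python
def start_simple(total_local_jobs_running, limit=20):
    """Mimics: START_LOCAL_UNIVERSE = TotalLocalJobsRunning < 20"""
    return total_local_jobs_running < limit


def start_with_grid_monitor(total_local_jobs_running, grid_monitor_job=None, limit=20):
    """Mimics: TotalLocalJobsRunning < 20 || GridMonitorJob =?= TRUE

    grid_monitor_job is None when the attribute is undefined in the job ad;
    'is True' models '=?=', which is false for anything that is not exactly TRUE.
    """
    return total_local_jobs_running < limit or grid_monitor_job is True


if __name__ == "__main__":
    # (jobs already running, GridMonitorJob value) -> may another job start?
    for running, monitor in [(5, None), (20, None), (20, True)]:
        print(running, monitor, start_with_grid_monitor(running, monitor))
```

With the second form, ordinary fork jobs are still capped at the limit, while grid monitor jobs are always allowed to start.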


CE, install (cont)
• Authorization mode: full privilege
  Consult https://twiki.grid.iu.edu/twiki/bin/view/ReleaseDocumentation/FullPrivilegeAuthorization
  – Edit ./post-install/prima-authz.conf to point to uct2-grid4, our GUMS server.
  – Copy prima-authz.conf to /etc/grid-security/
  – Same kind of thing for gsi-authz.conf
• gums-client.properties
  – Check that /opt/osg/gums/config/gums-client.properties points to your GUMS server
• Testing osg-user-vo-map.txt file generation. This is a test of the GUMS client and server.
  $ source $VDT_LOCATION/setup.sh
  $ cd $VDT_LOCATION/gums/scripts
  $ ./gums-host generateGrid3UserVoMap --file grid-mapfile-test

  #User-VO map
  #---- accounts for vo: cernusatlasProd ----#
  usatlas1 usatlas
  #---- accounts for vo: cernusatlasSoft ----#
  usatlas2 usatlas
  #---- accounts for vo: cernusatlas ----#
  usatlas3 usatlas
  #---- accounts for vo: cernatlas ----#
  usatlas4 usatlas
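A quick way to check the generated map is to read it back and confirm that the expected accounts are present. The sketch below is a hypothetical helper, not part of GUMS or the VDT; it assumes the file has one `account VO` pair per line and that every line starting with `#` is a comment.

```python
def parse_user_vo_map(text):
    """Return {unix_account: vo} from the contents of a User-VO map file."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


# Sample content in the layout assumed above (one entry per line).
SAMPLE = """\
#User-VO map
#---- accounts for vo: cernusatlasProd ----#
usatlas1 usatlas
#---- accounts for vo: cernusatlasSoft ----#
usatlas2 usatlas
#---- accounts for vo: cernusatlas ----#
usatlas3 usatlas
#---- accounts for vo: cernatlas ----#
usatlas4 usatlas
"""

if __name__ == "__main__":
    for account, vo in sorted(parse_user_vo_map(SAMPLE).items()):
        print(account, "->", vo)
```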


CE install, cont
• Turn services on
  $ vdt-control --on
  enabling cron service fetch-crl... ok
  enabling cron service vdt-rotate-logs... ok
  skipping init service 'gris' -- marked as disabled
  enabling inetd service globus-gatekeeper... ok
  enabling inetd service gsiftp... ok
  enabling init service mysql... ok
  enabling init service globus-ws... ok
  skipping cron service 'edg-mkgridmap' -- marked as disabled
  skipping cron service 'gums-host-cron' -- marked as disabled
  skipping init service 'MLD' -- marked as disabled
  skipping cron service 'vdt-update-certs' -- marked as disabled
  enabling init service condor-devel... ok
  enabling init service apache... ok
  skipping init service 'osg-rsv' -- marked as disabled
  enabling init service tomcat-5... ok
  enabling init service syslog-ng... ok
  enabling cron service gratia-condor... ok
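When the output gets long it is easy to miss a service that was skipped. The following sketch, a hypothetical helper rather than a VDT tool, sorts captured `vdt-control --on` output into enabled and skipped services. It assumes one message per line in exactly the two forms shown above ("enabling ... ok" and "skipping ... marked as disabled").

```python
import re

# Matches e.g. "enabling cron service fetch-crl... ok"
# and "skipping init service 'gris' -- marked as disabled".
LINE_RE = re.compile(r"^(enabling|skipping) (cron|init|inetd) service '?([^'.\s]+)'?")


def summarize(output):
    """Return (enabled, skipped) lists of service names."""
    enabled, skipped = [], []
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # ignore anything that is not a status line
        action, _kind, name = match.groups()
        (enabled if action == "enabling" else skipped).append(name)
    return enabled, skipped


SAMPLE = """\
enabling cron service fetch-crl... ok
skipping init service 'gris' -- marked as disabled
enabling inetd service globus-gatekeeper... ok
skipping init service 'osg-rsv' -- marked as disabled
enabling init service tomcat-5... ok
"""

if __name__ == "__main__":
    on, off = summarize(SAMPLE)
    print("enabled:", ", ".join(on))
    print("skipped:", ", ".join(off))
```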


configure-osg
• This sets up the attributes to advertise to the information services in OSG
• Good reference: https://twiki.grid.iu.edu/twiki/bin/view/ReleaseDocumentation/EnvironmentVariables
• ./monitoring/configure-osg.sh


RSV configuration
• See https://twiki.grid.iu.edu/twiki/bin/view/ReleaseDocumentation/InstallAndConfigureRSV
• Shut everything off, then configure
  # vdt-control --off
  # $VDT_LOCATION/vdt/setup/configure_osg_rsv --user rwg --init --server y
  # $VDT_LOCATION/vdt/setup/configure_osg_rsv --uri tier2-osg.uchicago.edu --proxy /tmp/x509up_u1063 --probes --gratia --verbose
  # $VDT_LOCATION/vdt/setup/configure_osg_rsv --setup-for-apache
  Pages can be viewed at http://HOSTNAME:8080/rsv
  # $VDT_LOCATION/vdt/setup/configure_gratia --probe metric --report-to rsv.grid.iu.edu:8880
  # vdt-control --on


RSV site monitor example


UC_ATLAS_MWT2


Select which VOs to support
• Edit osg-supported-vo-list.txt to include which VOs to support
• Minimum:

# List of VOs this site claims to support

MIS

ATLAS

OSG
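A small check like the one below can confirm that an edited list still contains the minimum set. This is a sketch under assumptions: the function names are hypothetical, and the file is taken to hold one VO name per line with `#` starting a comment, as in the listing above.

```python
REQUIRED_VOS = {"MIS", "ATLAS", "OSG"}  # the minimum list given on the slide


def read_vo_list(text):
    """Return the VO names in the file text, skipping comments and blanks."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.append(line)
    return names


def missing_vos(text, required=REQUIRED_VOS):
    """Return the required VOs that are absent from the file text, sorted."""
    return sorted(set(required) - set(read_vo_list(text)))


SAMPLE = """\
# List of VOs this site claims to support
MIS
ATLAS
OSG
"""

if __name__ == "__main__":
    print("missing:", missing_vos(SAMPLE) or "none")
```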


wn-client
• Must be available to the worker node (e.g. either a local install or NFS-exported)
  rwg@uct2-c001:~$ source /share/wn-client/setup.sh
  rwg@uct2-c001:~$ vdt-version
  You have installed a subset of VDT version 1.8.1c:
  CA Certificates v32 (includes IGTF 1.17 CAs)
  cURL 7.16.2
  dccp (dCache client) 1.7.0-39
  Fetch CRL 2.6.2
  Globus Toolkit, pre web-services, client 4.0.5
  Globus Toolkit, web-services, client 4.0.5
  GPT 3.2
  Java 5 SDK 1.5.0_13
  Logrotate 3.7
  MyProxy 3.9
  Pegasus Worker Package 2.0.1
  RLS, client 3.0.041021
  SRM V1 Client 1.25
  SRM V2 Client 2.2.0.4
  UberFTP 1.24
  Wget 1.10.2


Groups, roles and unix accounts

• The typical ATLAS site has been set up to recognize production and software roles, the usatlas group, and everyone else
  – usatlas1: production
  – usatlas2: software (highest priority for software installs)
  – usatlas3: usatlas group (US ATLAS users)
  – usatlas4: all other ATLAS users
• Implementing this properly requires setting up a GUMS server and the “Full Privilege” security configuration of the OSG compute element
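The mapping scheme can be sketched as a small function from a VOMS attribute (FQAN) to a local account. This is only an illustration of the four-account scheme listed above: in practice the mapping is defined by the GUMS server's policy, and the function name, the order of the checks, and the assumption that a role wins over group membership are all hypothetical.

```python
def account_for(fqan):
    """Map one FQAN such as '/atlas/usatlas/Role=software' to a unix account.

    Returns None for attributes outside the atlas VO.
    """
    parts = [p for p in fqan.split("/") if p]
    groups = [p for p in parts if "=" not in p]
    role = next((p.split("=", 1)[1] for p in parts if p.startswith("Role=")), "NULL")

    if not groups or groups[0] != "atlas":
        return None
    if role == "production":
        return "usatlas1"   # production role
    if role == "software":
        return "usatlas2"   # software role
    if groups[:2] == ["atlas", "usatlas"]:
        return "usatlas3"   # usatlas group, no special role
    return "usatlas4"       # all other ATLAS users


if __name__ == "__main__":
    for attr in ("/atlas/usatlas/Role=software/Capability=NULL",
                 "/atlas/usatlas/Role=NULL/Capability=NULL",
                 "/atlas/Role=NULL/Capability=NULL"):
        print(attr, "->", account_for(attr))
```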


OSG Client - install
• $ pacman -get OSG:client
• Can be done as non-root - users can have their private client tools, Condor-G job manager, etc.
• A common approach is to install a client at a site and NFS-export it to places where users work - separate from the CE node
• Options for this mode, install as root:
  – Make Condor job manager available on server restarts
  – Job manager shared among users for grid job submission
  – Run CRL updater - keep these up-to-date automatically
  – Log rotation


OSG Client - contents
  $ source /share/osg-client/setup.sh
  $ vdt-version
  You have installed a subset of VDT version 1.8.1e:
  CA Certificates v33 (includes IGTF 1.18 CAs)
  Condor/Condor-G 6.8.6
  cURL 7.16.2
  Fetch CRL 2.6.2
  Globus Toolkit, pre web-services, client 4.0.5
  Globus Toolkit, web-services, client 4.0.5
  GPT 3.2
  GSI-Enabled OpenSSH 4.0
  Java 5 SDK 1.5.0_13
  KX509 20031111
  lcg-info 1.11.0-1
  lcg-infosites 2.6-2
  Logrotate 3.7
  MyProxy 3.9
  Pegasus 2.0.1
  PPDG Cert Scripts 2.5
  pyGlobus gt4.0.1-1.13
  PyGlobus URL Copy 1.1.2.11
  RLS, client 3.0.041021
  SRM V1 Client 1.25
  SRM V2 Client 2.2.0.4
  UberFTP 1.24
  Wget 1.10.2
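To compare what is installed on different hosts, the `vdt-version` listing can be turned into a name-to-version table. The sketch below is a hypothetical helper that assumes one package per line, with the version as the last whitespace-separated token once any parenthetical note is removed.

```python
import re


def parse_vdt_version(output):
    """Return {package_name: version} from vdt-version style output."""
    packages = {}
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("You have installed"):
            continue  # skip blanks and the header line
        # Drop parenthetical notes such as "(includes IGTF 1.18 CAs)".
        line = re.sub(r"\s*\([^)]*\)", "", line)
        name, _, version = line.rpartition(" ")
        if name:
            packages[name] = version
    return packages


SAMPLE = """\
You have installed a subset of VDT version 1.8.1e:
CA Certificates v33 (includes IGTF 1.18 CAs)
Condor/Condor-G 6.8.6
Globus Toolkit, pre web-services, client 4.0.5
Java 5 SDK 1.5.0_13
"""

if __name__ == "__main__":
    for name, version in parse_vdt_version(SAMPLE).items():
        print(f"{name}: {version}")
```

Running the same function on the output from two installs and comparing the dictionaries shows which packages differ.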


Aside: VO stuff
• https://www.racf.bnl.gov/docs/howto/grid/voatlas
• https://lcg-voms.cern.ch:8443/vo/atlas/vomrs

John Hover, Jay Packard handle all US requests


cert-scripts
• Best way to wrangle user and host certs!
• Comes with OSG client (also in CE package)
  – cert-check-time - checks lifetime of certificates and revocation lists
  – cert-gridadmin - immediate issuance of service certificates for authorized requestors
  – cert-lookup - queries directory based on DN of certificates
  – cert-request - generates and submits a certificate signing request
  – cert-retrieve - retrieves signed certificate previously requested
  – cert-renew - renews existing person certificate (not host or service)
  – multi-cert-gridadmin - handle many service certificate requests at once; generate CSRs, submit to Grid Admin interface, etc.
• See further
  – https://twiki.grid.iu.edu/twiki/bin/view/ReleaseDocumentation/CertScripts
  – https://twiki.grid.iu.edu/twiki/bin/view/ReleaseDocumentation/GetGridCertificates


voms-proxy-init
• For extended attributes - production and software users. Example - for the “software” role

  $ voms-proxy-init --voms atlas:/Role=software
  Cannot find file or dir: /home/condor/execute/dir_11128/userdir/glite/etc/vomses
  Enter GRID pass phrase:
  Your identity: /DC=org/DC=doegrids/OU=People/CN=Robert W. Gardner Jr. 669916
  Cannot find file or dir: /home/condor/execute/dir_11128/userdir/glite/etc/vomses
  Creating temporary proxy ........................................ Done
  Contacting vo.racf.bnl.gov:15003 [/DC=org/DC=doegrids/OU=Services/CN=vo.racf.bnl.gov] "atlas" Done
  Creating proxy .............................................................. Done
  Your proxy is valid until Thu Nov 29 10:46:29 2007

Warning: lots of annoying warning messages


Inspect attributes and test mapping
  $ voms-proxy-info -all
  WARNING: Unable to verify signature! Server certificate possibly not installed.
  Error: Cannot find certificate of AC issuer for vo atlas
  subject : /DC=org/DC=doegrids/OU=People/CN=Robert W. Gardner Jr. 669916/CN=proxy
  issuer : /DC=org/DC=doegrids/OU=People/CN=Robert W. Gardner Jr. 669916
  identity : /DC=org/DC=doegrids/OU=People/CN=Robert W. Gardner Jr. 669916
  type : proxy
  strength : 512 bits
  path : /tmp/x509up_u20001
  timeleft : 11:59:36
  === VO atlas extension information ===
  VO : atlas
  subject : /DC=org/DC=doegrids/OU=People/CN=Robert W. Gardner Jr. 669916
  issuer : /DC=org/DC=doegrids/OU=Services/CN=vo.racf.bnl.gov
  attribute : /atlas/usatlas/Role=software/Capability=NULL
  attribute : /atlas/Role=NULL/Capability=NULL
  attribute : /atlas/usatlas/Role=NULL/Capability=NULL
  attribute : /atlas/lcg1/Role=NULL/Capability=NULL
  timeleft : 11:59:35

  $ globus-job-run gk01.swt2.uta.edu /usr/bin/whoami
  usatlas2
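Before submitting a test job it can help to confirm that the proxy really carries the role you asked for. The following sketch, a hypothetical helper and not part of the VOMS clients, pulls the `attribute` lines out of captured `voms-proxy-info -all` output and lists the non-NULL roles. It assumes one `key : value` pair per line, as the command prints them.

```python
def proxy_attributes(output):
    """Return the FQANs found on 'attribute : ...' lines."""
    fqans = []
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "attribute":
            fqans.append(value.strip())
    return fqans


def roles(fqans):
    """Return the set of roles other than NULL carried by the FQANs."""
    found = set()
    for fqan in fqans:
        for part in fqan.split("/"):
            if part.startswith("Role=") and part != "Role=NULL":
                found.add(part[len("Role="):])
    return found


SAMPLE = """\
=== VO atlas extension information ===
VO : atlas
attribute : /atlas/usatlas/Role=software/Capability=NULL
attribute : /atlas/Role=NULL/Capability=NULL
attribute : /atlas/usatlas/Role=NULL/Capability=NULL
attribute : /atlas/lcg1/Role=NULL/Capability=NULL
timeleft : 11:59:35
"""

if __name__ == "__main__":
    attrs = proxy_attributes(SAMPLE)
    print("roles:", sorted(roles(attrs)))
```

In the session above the proxy carries the software role, which is consistent with the test job running as usatlas2.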


ClassAd based information service
• $ condor_status -pool osg-ress-1.fnal.gov -format '%s\n' GlueSiteName | uniq

  TTU-ANTAEUS
  UTA_DPCC
  DukeAtlas_T3
  LTU_OSG
  MIT_CMS
  LCG-CBPF
  CIT_CMS_T2
  CIT_CMS_DISUN
  OSG_INSTALL_TEST_2
  GLOW
  GLOW-CMS
  USCMS-FNAL-WC1-CE
  USCMS-FNAL-WC1-CE2
  NERSC-Davinci
  FNAL_FERMIGRID
  FNAL_GPFARM
  MCGILL_HEP
  AGLT2
  IPAS_OSG
  UTA_SWT2
  gpnjayhawk
  OU_OSCER_ATLAS
  OSG_LIGO_PSU
  BNL_ATLAS_1
  BNL_ATLAS_2
  GROW-PROD
  Boulder_HEP
  UFlorida-IHEPA
  Purdue-Caesar
  Purdue-Lear
  CornellLEPP
  LTU_CCT
  IU_OSG
  NYSGRID-CORNELL-NYS1
  WISC-OSG-EDU
  UCSDT2
  UCSDT2-B
  OSG_LIGO_MIT
  ORNL_NSTG
  NWICG_NotreDame
  Purdue-RCAC
  UTENN_CMS
  ASGC_OSG
  PROD_SLAC
  OUHEP_OSG
  NERSC-PDSF
  UFlorida-PG
  cinvestav
  STAR-WSU
  UCLA_Saxon_Tier3
  SPRACE
  STAR-BNL
  OU_OSCER_CONDOR
  UVA-sunfire
  OU_OCHEP_SWT2
  UC_ATLAS_MWT2
  UCR-HEP
  NYSGRID-CCR-U2
  MWT2_UC
  UmissHEP
  Vanderbilt
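One detail of the pipeline is worth keeping in mind: `uniq` only collapses adjacent duplicate lines, so a site name can still appear more than once if its ads are not returned next to each other. The Python sketch below shows the difference between that behaviour and a true set of distinct names; the sample ordering is made up for illustration.

```python
from itertools import groupby


def uniq(lines):
    """Collapse runs of adjacent identical lines, like the shell 'uniq'."""
    return [key for key, _run in groupby(lines)]


def distinct_sorted(lines):
    """Return each distinct line once, sorted (like 'sort -u')."""
    return sorted(set(lines))


if __name__ == "__main__":
    # Hypothetical order in which ads might come back from the pool.
    names = ["AGLT2", "AGLT2", "MWT2_UC", "AGLT2", "UTA_SWT2"]
    print(uniq(names))
    print(distinct_sorted(names))
```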


ldap based info service (BDII)
• $ lcg-info --list-ce --bdii is-itb.grid.iu.edu:2170 --vo atlas
  - CE: cithep201.ultralight.org:2119/jobmanager-condor-atlas
  - CE: cms-xen1.fnal.gov:2119/jobmanager-condor-atlas
  - CE: cms-xen9.fnal.gov:2119/jobmanager-condor-atlas
  - CE: cmsitbsrv01.fnal.gov:2119/jobmanager-condor-atlas
  - CE: cmssrv09.fnal.gov:2119/jobmanager-condor-atlas
  - CE: gridtest01.racf.bnl.gov:2119/jobmanager-condor-atlas
  - CE: osg-gw-3.t2.ucsd.edu:2119/jobmanager-condor-atlas
  - CE: osg-itb.ligo.caltech.edu:2119/jobmanager-condor-atlas
  - CE: osg-vtb.ligo.caltech.edu:2119/jobmanager-condor-atlas
  - CE: osgitb1.nhn.ou.edu:2119/jobmanager-condor-atlas
  - CE: tb10.grid.iu.edu:2119/jobmanager-condor-atlas
  - CE: testwulf.hpcc.ttu.edu:2119/jobmanager-pbs-TIGRE
  - CE: testwulf.hpcc.ttu.edu:2119/jobmanager-pbs-long
  - CE: testwulf.hpcc.ttu.edu:2119/jobmanager-pbs-priority_queue
  - CE: testwulf.hpcc.ttu.edu:2119/jobmanager-pbs-small
  - CE: testwulf.hpcc.ttu.edu:2119/jobmanager-pbs-verylong
  - CE: uct3-edge7.uchicago.edu:2119/jobmanager-pbs-int_exec
  - CE: uct3-edge7.uchicago.edu:2119/jobmanager-pbs-int_exec
  - CE: uct3-edge7.uchicago.edu:2119/jobmanager-pbs-test_exec
  - CE: uct3-edge7.uchicago.edu:2119/jobmanager-pbs-uct3_exec
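Each CE entry in this listing appears to follow the pattern `host:port/jobmanager-<batch system>-<queue>`. Assuming that pattern holds (it is inferred from the listing, not taken from a specification), a short sketch can break the contact strings into their parts, for example to group queues by gatekeeper. The names `CEContact`, `parse_ce` and `parse_lcg_info` are hypothetical.

```python
from typing import NamedTuple


class CEContact(NamedTuple):
    host: str
    port: int
    lrms: str    # batch system, e.g. "condor" or "pbs"
    queue: str


def parse_ce(contact):
    """Split 'host:port/jobmanager-<lrms>-<queue>' into its parts."""
    endpoint, _, service = contact.partition("/")
    host, _, port = endpoint.partition(":")
    prefix = "jobmanager-"
    if not service.startswith(prefix):
        raise ValueError(f"unexpected CE contact string: {contact!r}")
    lrms, _, queue = service[len(prefix):].partition("-")
    return CEContact(host, int(port), lrms, queue)


def parse_lcg_info(output):
    """Parse every '- CE: ...' line of captured lcg-info output."""
    contacts = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("- CE:"):
            contacts.append(parse_ce(line[len("- CE:"):].strip()))
    return contacts


if __name__ == "__main__":
    sample = ("- CE: tb10.grid.iu.edu:2119/jobmanager-condor-atlas\n"
              "- CE: testwulf.hpcc.ttu.edu:2119/jobmanager-pbs-priority_queue\n")
    for ce in parse_lcg_info(sample):
        print(ce)
```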


ldap based info service (BDII)
http://is.grid.iu.edu/cgi-bin/status.cgi

$ ldapsearch -x -l 60 -b mds-vo-name=BNL_ATLAS_1,mds-vo-name=local,o=grid -h is.grid.iu.edu -p 2170

(edited output... more follows, depending on configure-osg, osg-attributes.conf, gip-attributes.conf)

# BNL_ATLAS_1, local, grid
dn: mds-vo-name=BNL_ATLAS_1,mds-vo-name=local,o=grid
objectClass: GlueTop

# gridgk01.racf.bnl.gov, BNL_ATLAS_1, local, grid
dn: GlueSiteUniqueID=gridgk01.racf.bnl.gov,mds-vo-name=BNL_ATLAS_1,mds-vo-name=local,o=grid
GlueSiteUniqueID: gridgk01.racf.bnl.gov
GlueSiteName: BNL_ATLAS_1
GlueSiteDescription: OSG Site
GlueSiteEmailContact: mailto: [email protected]
GlueSiteLocation: Long Island,NY ,USA
GlueSiteLatitude: 40.366
GlueSiteLongitude: -72.388
GlueSiteWeb: https://www.racf.bnl.gov/Facility/LinuxFarm/CondorPolicy_BNL_USATLAS.html
GlueSiteSponsor: usatlas:100
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 3
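`ldapsearch` prints LDIF, in which long values are folded: a line that begins with a single space continues the previous line. The sketch below unfolds such lines and collects each entry's attributes into a dictionary. It is a simplified, hypothetical parser for plain `name: value` lines only; base64 values (`name:: ...`) and other LDIF features are not handled.

```python
def unfold(ldif_text):
    """Join LDIF continuation lines (lines starting with one space)."""
    lines = []
    for raw in ldif_text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def parse_entries(ldif_text):
    """Return a list of {attribute: [values]} dicts, one per entry."""
    entries, current = [], {}
    for line in unfold(ldif_text):
        if not line.strip():
            if current:            # a blank line ends the current entry
                entries.append(current)
                current = {}
            continue
        if line.startswith("#"):
            continue               # comment line
        name, sep, value = line.partition(":")
        if sep:
            current.setdefault(name, []).append(value.strip())
    if current:
        entries.append(current)
    return entries


SAMPLE = (
    "# BNL_ATLAS_1, local, grid\n"
    "dn: mds-vo-name=BNL_ATLAS_1,mds-vo-name=local,o=grid\n"
    "objectClass: GlueTop\n"
    "\n"
    "dn: GlueSiteUniqueID=gridgk01.racf.bnl.gov,mds-vo-name=BNL_ATLAS_1,mds-vo-name\n"
    " =local,o=grid\n"
    "GlueSiteName: BNL_ATLAS_1\n"
    "GlueSiteLatitude: 40.366\n"
)

if __name__ == "__main__":
    for entry in parse_entries(SAMPLE):
        print(entry["dn"][0])
```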


OSG further information
• https://twiki.grid.iu.edu/twiki/bin/view/ReleaseDocumentation/SiteAdminResources

Troubleshooting campaign link: http://www.grid.iu.edu/cgi-bin/contact_080.pl

OSG-STORAGE

[email protected]