Grid Enabling a small Cluster

Doug Olson, Lawrence Berkeley National Laboratory

STAR Collaboration Meeting, 13 August 2003, Michigan State University

Contents

• Overview of multi-site data grid

• Features of a grid-enabled cluster

• How to grid-enable a cluster

• Comments

CMS Integration Grid Testbed

Managed by ONE Linux box at Fermi

Time to process 1 event: 500 sec @ 750 MHz

From Miron Livny, example from last fall.
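
As a rough illustration of what the quoted per-event cost implies, the sketch below converts 500 sec per event into events per CPU per day, and into the number of such CPUs needed to finish a sample in a given time. The 10,000-event sample and the function names are hypothetical, chosen only to show the arithmetic; only the 500 sec figure comes from the slide.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_EVENT = 500  # from the slide: 500 sec per event on a 750 MHz CPU


def events_per_cpu_day(seconds_per_event=SECONDS_PER_EVENT):
    """Events one CPU can process in 24 hours of uninterrupted running."""
    return SECONDS_PER_DAY / seconds_per_event


def cpus_needed(n_events, days, seconds_per_event=SECONDS_PER_EVENT):
    """Smallest whole number of CPUs that finishes n_events within `days`."""
    cpu_seconds = n_events * seconds_per_event
    return math.ceil(cpu_seconds / (days * SECONDS_PER_DAY))


if __name__ == "__main__":
    print(events_per_cpu_day())
    # Hypothetical target: 10,000 events in one day.
    print(cpus_needed(10_000, 1))
```

The estimate ignores scheduling overhead, failures and I/O, so it is a lower bound on the resources a real production would need.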

Example Grid Application: Data Grids for High Energy Physics

[Figure: tiered data grid diagram (the famous Harvey Newman slide). Labels recovered from the figure:]

• Tier labels: Tier 0, Tier 1, Tier 2, Tier 4
• Nodes and capacities
  • Online System
  • Offline Processor Farm, ~20 TIPS
  • CERN Computer Centre
  • FermiLab ~4 TIPS, France Regional Centre, Italy Regional Centre, Germany Regional Centre
  • Tier2 Centres, ~1 TIPS each, including Caltech ~1 TIPS
  • Institutes, ~0.25 TIPS
  • Physics data cache
  • Physicist workstations
• Link labels: ~PBytes/sec; ~100 MBytes/sec; ~622 Mbits/sec or Air Freight (deprecated); ~622 Mbits/sec; ~1 MBytes/sec
• Site labels: SLAC, FNAL, BNL

Notes on the figure:

• There is a “bunch crossing” every 25 nsecs.
• There are 100 “triggers” per second.
• Each triggered event is ~1 MByte in size.
• Physicists work on analysis “channels”.
• Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.
• 1 TIPS is approximately 25,000 SpecInt95 equivalents.

www.griphyn.org, www.ppdg.net, www.eu-datagrid.org
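
The trigger rate and event size quoted on this slide can be turned into data rates with a few lines of arithmetic. The sketch below assumes decimal units (1 TB = 10^6 MB) and 8 bits per byte with no protocol overhead; the function names are illustrative, and only the three constants come from the slide.

```python
TRIGGERS_PER_SEC = 100   # from the slide
EVENT_SIZE_MB = 1.0      # from the slide: ~1 MByte per triggered event
LINK_MBIT_PER_SEC = 622  # from the slide: ~622 Mbits/sec link label


def raw_rate_mb_per_sec(triggers_per_sec, event_size_mb):
    """Data rate produced by the trigger, in MBytes/sec."""
    return triggers_per_sec * event_size_mb


def mbit_to_mbyte(mbit_per_sec):
    """Convert a link speed from Mbits/sec to MBytes/sec (8 bits per byte)."""
    return mbit_per_sec / 8


def tb_per_day(mb_per_sec):
    """Volume accumulated in 24 hours, assuming 1 TB = 10^6 MB."""
    return mb_per_sec * 86_400 / 1_000_000


if __name__ == "__main__":
    rate = raw_rate_mb_per_sec(TRIGGERS_PER_SEC, EVENT_SIZE_MB)
    link = mbit_to_mbyte(LINK_MBIT_PER_SEC)
    print(f"triggered data: {rate:.0f} MBytes/sec, {tb_per_day(rate):.2f} TB/day")
    print(f"622 Mbits/sec link: {link:.2f} MBytes/sec "
          f"({link / rate:.0%} of the triggered stream)")
```

The product of the two trigger figures is 100 MBytes/sec, the same value as the ~100 MBytes/sec label in the diagram, while a single ~622 Mbits/sec link carries somewhat less than the full triggered stream.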

What do we get?

Distribute load across available resources.

Access to resources shared with other groups/projects. Eventually sharing across grid will look like sharing within a cluster (see below).

On-demand access to much larger resource than available in dedicated fashion.

(Also spreading costs across more funding sources.)

Features of a grid site (server side services)

• Local compute & storage resources
  • Batch system for cluster (pbs, lsf, condor, …)
  • Disk storage (local, NFS, …)
  • NIS or Kerberos user accounting system
  • Possibly robotic tape (HPSS, OSM, Enstore, …)

• Added grid services
  • Job submission (Globus gatekeeper)
  • Data transport (GridFTP)
  • Grid user to local account mapping (gridmap file, …)
  • Grid security (GSI)
  • Information services (MDS, GRIS, GIIS, Ganglia)
  • Storage management (SRM, HRM/DRM software)
  • Replica management (HRM & FileCatalog for STAR)
  • Grid admin person

• Required STAR services
  • MySQL db for FileCatalog
  • Scheduler provides (will provide) client-side grid interface

How to grid-enable a cluster

• Sign up on email lists

• Study globus toolkit administration

• Install and configure
  • VDT (grid)
  • Ganglia (cluster monitoring)
  • HRM/DRM (storage management & file transfer)

• Set up method for grid-mapfile (user) management

• Additionally install/configure MySQL & FileCatalog & STAR software

Background URLs

• stargrid-l mail list

• Globus Toolkit - www.globus.org/toolkit
  • Mail lists, see - http://www-unix.globus.org/toolkit/support.html
  • Documentation - www-unix.globus.org/toolkit/documentation.html
  • Admin guide - http://www.globus.org/gt2.4/admin/index.html

• Condor - www.cs.wisc.edu/condor
  • Mail lists: condor-users and condor-world

• VDT - http://www.lsc-group.phys.uwm.edu/vdt/software.html

• SRM - http://sdm.lbl.gov/projectindividual.php?ProjectID=SRM

VDT grid software distribution (http://www.lsc-group.phys.uwm.edu/vdt/software.html)

• Virtual Data Toolkit (VDT) is the software distribution packaging for the US Physics Grid Projects (GriPhyN, PPDG, iVDGL).
  • It uses pacman for the distribution tool (developed by Saul Youssef, BU Atlas)
  • VDT contents (1.1.10)
    • Condor/Condor-G 6.5.3, Globus 2.2.4, GSI OpenSSH, Fault Tolerant Shell v2.0, Chimera Virtual Data System 1.1.1, Java JDK1.1.4, KX509 / KCA, MonaLisa, MyProxy, PyGlobus, RLS 2.0.9, ClassAds 0.9.4, Netlogger 2.0.13
    • Client, Server and SDK packages
    • Configuration scripts
  • Support model for VDT
    • The VDT team centered at U. Wisc. performs testing and patching of code included in VDT
    • VDT is the preferred contact for support of the included software packages (Globus, Condor, …)
    • Support effort comes from iVDGL, NMI, other contributors

Additional software

• Ganglia - cluster monitoring
  • http://ganglia.sourceforge.net/
  • Not strictly required for grid, but STAR uses it as input to grid information services

• HRM/DRM - storage management & data transfer
  • Contact Eric Hjort & Alex Sim
  • Expected to be in VDT in future
  • Being used for bulk data transfer between BNL & LBNL

• + STAR software …

VDT installation (globus, condor, …) (http://www.lsc-group.phys.uwm.edu/vdt/installation.html)

• Steps:
  • Install pacman
  • Prepare to install VDT (directory, accounts)
  • Install VDT software using pacman
  • Prepare to run VDT components
  • Get host & service certificates (www.doegrids.org)
  • Optionally install & run tests (from VDT)

• Where to install VDT
  • VDT-Server on gatekeeper nodes
  • VDT-Client on nodes that initiate grid activities
  • VDT-SDK on nodes for grid-dependent s/w development
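
When planning an installation, the "where to install" rule above can be written down as a small lookup. This is only a planning sketch: the role names (`gatekeeper`, `submit`, `development`) and the host names in the usage example are hypothetical labels, and nothing here invokes pacman or any VDT tool.

```python
# Package-per-role mapping taken from the slide; the role names are
# hypothetical labels introduced for this sketch.
VDT_PACKAGE_FOR_ROLE = {
    "gatekeeper": "VDT-Server",   # gatekeeper nodes
    "submit": "VDT-Client",       # nodes that initiate grid activities
    "development": "VDT-SDK",     # nodes for grid-dependent s/w development
}


def packages_for_node(roles):
    """Return the sorted, de-duplicated VDT packages for a node's roles."""
    unknown = [r for r in roles if r not in VDT_PACKAGE_FOR_ROLE]
    if unknown:
        raise ValueError(f"unknown role(s): {', '.join(unknown)}")
    return sorted({VDT_PACKAGE_FOR_ROLE[r] for r in roles})


if __name__ == "__main__":
    # Hypothetical small cluster: one gatekeeper that also submits jobs,
    # and one developer workstation.
    cluster = {
        "gate01": ["gatekeeper", "submit"],
        "dev01": ["submit", "development"],
    }
    for host, roles in cluster.items():
        print(host, packages_for_node(roles))
```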

Manage users (grid-mapfile, …)

• Users on grid are identified by their X509 certificate.

• Every grid transaction is authenticated with a proxy derived from the user’s certificate.
  • Also every grid communication path is authenticated with host & service certificates (SSL).

• Default gatekeeper installation uses grid-mapfile to convert X509 id to local user id
  • [stargrid01] ~/> cat /etc/grid-security/grid-mapfile | grep doegrids
  • "/DC=org/DC=doegrids/OU=People/CN=Douglas L Olson" olson
  • "/DC=org/DC=doegrids/OU=People/CN=Alexander Sim 546622" asim
  • "/OU=People/CN=Dantong Yu 254996/DC=doegrids/DC=org" grid_a
  • "/OU=People/CN=Dantong Yu 542086/DC=doegrids/DC=org" grid_a
  • "/OU=People/CN=Mark Sosebee 270653/DC=doegrids/DC=org" grid_a
  • "/OU=People/CN=Shawn McKee 83467/DC=doegrids/DC=org" grid_a

• There are obvious security considerations that need to fit with your site requirements

• There are projects underway to manage this mapping for a collaboration across several sites - a work in progress
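
The mapping shown in the grid-mapfile listing above can be read with a short script, for example to audit which certificates share a local account. This is a minimal sketch that assumes each line holds exactly one quoted distinguished name followed by one local account, as in the entries on the slide; real files may contain forms this does not handle. The function names and the embedded sample are illustrative and are not part of Globus or VDT.

```python
import shlex
from collections import defaultdict

# Sample entries copied from the slide.
SAMPLE = '''\
"/DC=org/DC=doegrids/OU=People/CN=Douglas L Olson" olson
"/DC=org/DC=doegrids/OU=People/CN=Alexander Sim 546622" asim
"/OU=People/CN=Dantong Yu 254996/DC=doegrids/DC=org" grid_a
"/OU=People/CN=Mark Sosebee 270653/DC=doegrids/DC=org" grid_a
'''


def parse_grid_mapfile(text):
    """Return {certificate DN: local account} for a grid-mapfile's text."""
    mapping = {}
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # honours the double quotes around the DN
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected a quoted DN and one account")
        dn, account = parts
        mapping[dn] = account
    return mapping


def dns_by_account(mapping):
    """Invert the mapping to show which DNs share each local account."""
    shared = defaultdict(list)
    for dn, account in mapping.items():
        shared[account].append(dn)
    return dict(shared)


if __name__ == "__main__":
    for account, dns in dns_by_account(parse_grid_mapfile(SAMPLE)).items():
        print(account, len(dns))
```

Grouping by account makes shared accounts such as `grid_a` visible at a glance, which is one of the points a site's security review would likely want to look at.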

Comments

• Figure 6 months full time to start, then 0.25 FTE for a cluster that is used rather heavily by a number of users
  • Assuming a reasonably competent Linux cluster administrator who is not yet familiar with grid

• Grid software and STAR distributed data management software are still evolving, so there is some work to follow this (in the 0.25 FTE)

• During next year - static data distribution

• In 1+ year should have rather dynamic user-driven data distribution