Enabling Grids for E-sciencE
The INFN GRID
Marco Verlato (INFN-Padova)
EELA WP2 E-infrastructure Workshop
Rio de Janeiro, 20-23 August 2007
Outline
• A little of history
• INFNGRID Overview
• INFNGRID Release
• INFNGRID Services
• From developers to production…
• Monitoring and Accounting
• Users and Sites Support
• Managing procedures
The INFN GRID project
• The first national project (Feb. 2000) aiming to develop grid technology and the new e-infrastructure needed to meet the LHC (and e-Science) computing requirements
• e-Infrastructure = Internet + new Web and Grid services on top of a physical layer composed of network, computing, supercomputing and storage resources, made properly available in a shared fashion by the new Grid services
• Since then, many Italian and EU projects have made this a reality
• Many scientific sectors in Italy, the EU and the entire world now base their research activities on the Grid
• INFN Grid continues to be the national container used by INFN to reach its goals, coordinating all the activities:
– In national, European and international Grid projects
– In the standardization processes of the Open Grid Forum (OGF)
– In the definition of EU policies in the ICT sector of Research Infrastructures
– Through its managerial structure: Executive Board, Technical Board…
The INFN GRID portal
http://grid.infn.it
The strategy
• Clear and stable objectives: development of the technology and of the infrastructure needed for LHC computing, but of general value
• Variable instruments: use of projects and external funds (from the EU, MIUR, ...) to reach the goal
• Coordination among all the projects (Executive Board)
– Grid middleware & infrastructure needed by INFN and LHC, developed within a number of core European and international projects, often coordinated by CERN: DataGrid, DataTAG, EGEE, EGEE-II, WLCG
– Often fostered by INFN itself
• International collaboration with the US Globus and Condor teams for the middleware, and with Grid projects like the Open Science Grid and the Open Grid Forum, in order to reach global interoperability among the developed services and the adoption of international standards
• Pioneering national development of the m/w and of the national infrastructure in the areas not covered by EU projects, via national projects like Grid.it, LIBI, EGG, …
• Strong contribution to political committees: the e-Infrastructure Reflection Group (eIRG -> ESFRI), EU concertation meetings, and the involved Commission units (F2 and F3), to establish activity programmes (calls)
Some history … LHC  EGEE Grid
• 1999 – MONARC project
– Early discussions on how to organise distributed computing for LHC
• 2000 – Growing interest in grid technology
– The HEP community was the driver in launching the DataGrid project
• 2001-2004 – EU DataGrid / EU DataTAG projects
– Middleware & testbed for an operational grid
• 2002-2005 – LHC Computing Grid (LCG)
– Deploying the results of DataGrid to provide a production facility for the LHC experiments
• 2004-2006 – EU EGEE project, phase 1
– Starts from the LCG grid
– Shared production infrastructure
– Expanding to other communities and sciences
• 2006-2008 – EU EGEE-II
– Building on phase 1
– Expanding applications and communities …
• … and in the future – a worldwide grid infrastructure? Interoperating and co-operating infrastructures?
Other FP6 activities of INFN GRID in Europe/1
• To guarantee the evolution of open-source Grid middleware towards international standards – OMII-Europe
• … and its availability through an effective repository – ETICS
• To contribute to informatics R&D activities – CoreGRID
• To coordinate the extension of EGEE in the world – EUMedGrid – EU-IndiaGrid – EUChinaGrid – EELA
Other FP6 activities of INFN GRID in Europe/2
• To promote EGEE to new scientific communities
– GRIDCC (real-time applications and instrument control)
– BioinfoGRID (bioinformatics, coordinated by CNR)
– LIBI (MIUR, bioinformatics in Italy)
– CYCLOPS (civil protection)
• To contribute to e-IRG, the e-Infrastructure Reflection Group, born in Rome in December 2003
– Initiative of the Italian Presidency on “eInfrastructures (Internet and Grids) – The new foundation for knowledge-based Societies”, an event organised by MIUR, INFN and the EU Commission
– Representatives in e-IRG appointed by the EU science ministers
– Policies and roadmap for e-Infrastructure development in the EU
• To coordinate participation in the Open Grid Forum (formerly GGF)
INFN GRID / FP6 active projects
EGEE-II INFSO-RI-031688
FP7: guaranteeing sustainability
• The future of Grids in FP7 after 2008
– EGEE proposed to the European Parliament to set up a European Grid Initiative (EGI) in order to:
Guarantee long-term support & development of the European e-Infrastructure based on EGEE, DEISA and the national Grid projects being funded by the National Grid Initiatives (NGIs)
Provide a coordination framework at EU level, as done for the research networks by GÉANT, DANTE and the national networks like GARR
• The Commission asked that a plan for a long-term sustainable Grid infrastructure (EGI + EGEE-III, …) be included among the goals of EGEE-II (analogous to DANTE + GÉANT 1-2)
• The building of EGI at EU level and of a National Grid Initiative at national level is among the main goals of FP7
The future of INFNGRID: IGI
• Grid.IT, the 3+1-year national project funded by MIUR with 12 M€ (2002-05), ended in 2006
• The future: the Italian Grid Infrastructure (IGI) Association
– The EU (eIRG, ESFRI) requires the fusion of the different pieces of national Grids into a single national organisation (NGI) acting as the unique interface to the EU --> IGI for Italy
– Substantial consensus for the creation of IGI for a common governance of the Italian e-Infrastructure from all the public bodies involved: INFN Grid, S-PACI, ENEA Grid, CNR, INAF, the national supercomputing centres (CINECA, CILEA, CASPUR) and the new “nuovi PON” consortia
– Under evaluation with MIUR: the evolution of GARR towards a more general body managing all the components of the infrastructure: network, Grid, digital libraries…
• Crucial for INFN in 2007-2008 will be managing the transition from INFN Grid to IGI in such a way as to preserve, and if possible enhance, the organisational level that allowed Italy to reach world leadership and become a leading partner of EGI
INFNGRID Overview
Supported Sites
40 Sites supported:
• 31 INFN Sites
• 9 non-INFN Sites
Total Resources:
• About 4600 CPUs
• About 1000 TB Disk Storage
(+ About 700 TB Tape)
Supported VOs
40 VOs supported:
• 4 LHC (ALICE, ATLAS, CMS, LHCb)
• 3 certification (DTEAM, OPS, INFNGRID)
• 8 regional (BIO, COMPCHEM, ENEA, INAF, INGV, THEOPHYS, VIRGO)
• 1 catch-all VO: GRIDIT
• 23 other VOs
Recently a new regional VO was enabled: COMPASSIT
Components of the production Grid
A Grid is not only CPUs and storage. Other elements are just as fundamental for running, managing and monitoring the grid:
• Middleware
• Grid Services
• Monitoring tools
• Accounting tools
• Management and control infrastructure
• Users
Grid Management
Grid management is performed by the Italian Regional Operation Centre (ROC). Its main activities are:
• Production and testing of the INFNGRID release
• Deployment of the release to the sites, support to local administrators, and site certification
• Deployment of the release onto the central grid services
• Maintenance of grid services
• Periodic checks of resource and service status
• Accounting of resource usage
• Support to site managers and users at the Italian level
• Support to site managers and users at the European level
• Introduction of new Italian sites
• Introduction of new regional VOs
The IT-ROC is involved in many other activities not directly related to the production infrastructure, e.g. the PreProduction, Preview and Certification testbeds.
The Italian Regional Operation Center (ROC)
One of 10 existing ROCs in EGEE
• Operations Coordination Centre (OCC)
– Management and oversight of all operational and support activities
• Regional Operations Centres (ROCs)
– Provide the core of the support infrastructure, each supporting a number of resource centres within its region
• Grid Operator on Duty
• Grid User Support (GGUS)
– At FZK: coordination and management of user support, single point of contact for users
Middleware
INFNGRID RELEASE
The m/w installed on INFNGRID nodes is a customization of the gLite m/w used in the LCG/EGEE community. The customized INFNGRID release is packaged by the INFN release team (grid-release<at>infn.it). The ROC is responsible for the deployment of the release. At the moment INFNGRID-3.0-Update28 (based on gLite 3.0-Update 28) is deployed.
[Timeline: LCG 1.0 (2003) and LCG 2.0 (2004) evolving into gLite 3.0 (2006), with the corresponding INFN-GRID 1.0, 2.0 and 3.0 releases, 2003-2008]
INFNGRID Release
INFNGRID customizations: why?
• VOs not supported by EGEE: define configuration parameters once (e.g. VO servers, pool accounts, VOMS certificates, ...) to reduce the risk of misconfiguration
• MPI (requested by non-HEP sciences), additional GridICE configuration (monitoring of WNs), AFS read-only (a CDF requirement), ...
• Deployment of additional middleware in a non-intrusive way: since Nov. 2004 VOMS (now in EGEE); DGAS (DataGrid Accounting System); NetworkMonitor (monitoring of network connection metrics)
INFNGRID customizations
• Additional VOs (~20)
• GridICE on almost all profiles (including WN)
• Preconfigured support for MPI:
– WNs without a shared home; home synchronization done via scp with host-based authentication
• DGAS accounting:
– New profile (HLR server) + additional packages on the CE
• NME (Network Monitor Element)
• Collaboration with CNAF-T1 on Quattor
• UI “PnP”
– UI installable without administrator privileges
• NTP
• AFS (read-only) on WNs (needed by the CDF VO)
• The packages are distributed in repositories available via HTTP
• For each EGEE release there are 2 repositories collecting different types of packages:
– Middleware: http://glitesoft.cern.ch/EGEE/gLite/APT/R3.0/rhel30/
– Security: http://linuxsoft.cern.ch/LCG-CAs/current/
• The INFNGRID customizations add a third repository:
– http://grid-it.cnaf.infn.it/apt/ig_sl3-i386
Packages and metapackages
Metapackages management process
• 1: Starting from the EGEE lists, update the INFNGRID lists (maintained in an SVN repository)
• 2: Once the lists are OK, generate a first version of the INFNGRID metapackages to test them
• 3: Install and/or upgrade the metapackages on the release testbed
• 4: If there are errors, correct them and go back to step 2
• 5: Publish the new metapackages on the official repositories so they are available to everybody
Metapackages management
• Our metapackages are supersets of the EGEE ones:
– INFNGRID metapackage = EGEE metapackage + INFNGRID additional rpms
• EGEE distributes its metapackages at:
– http://glitesoft.cern.ch/EGEE/gLite/APT/R3.0/rhel30
• Flat rpm lists are available at:
– http://glite.web.cern.ch/glite/packages/R3.0/deployment
• We maintain a customized copy of the lists and resync them easily:
– https://forge.cnaf.infn.it/plugins/scmsvn/viewcvs.php/trunk/ig-metapackages/tools/getglists?rev=1888&root=igrelease&view=log
• Using another tool (bmpl) we can generate all the artifacts starting from the lists:
– “Our” (INFNGRID) customized metapackages: http://grid-it.cnaf.infn.it/apt/ig_sl3-i386
– HTML files with the lists of the packages (one list per profile): http://grid-it.cnaf.infn.it/?packages
– Quattor template lists: http://grid-it.cnaf.infn.it/?quattor
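The superset relation above can be sketched as a simple set operation. This is an illustrative model only; the package names below are hypothetical, not taken from the real rpm lists:

```python
# Sketch of the INFNGRID metapackage construction described above:
# INFNGRID metapackage = EGEE metapackage + INFNGRID additional rpms.
# Package names are illustrative, not the real lists.

def build_ig_metapackage(egee_rpms, ig_additional_rpms):
    """Return the INFNGRID package list: the EGEE list plus the local additions."""
    return sorted(set(egee_rpms) | set(ig_additional_rpms))

egee_wn = ["glite-wn", "lcg-ca-certs"]             # hypothetical EGEE WN list
ig_extra_wn = ["ig-afs-ro", "ig-gridice-sensors"]  # hypothetical INFNGRID additions

ig_wn = build_ig_metapackage(egee_wn, ig_extra_wn)
print(ig_wn)
```

The key invariant a resync tool like getglists has to preserve is that every EGEE rpm stays in the INFNGRID list; the additions never replace anything.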
ig-yaim
• The ig-yaim package is an extension of glite-yaim. It provides:
– Additional functions, or functions that override existing ones; both are stored in functions/local instead of functions/
– e.g. to configure NTP, AFS, the LCMAPS gridmapfile/groupmapfile, ...
• More pool accounts => ig-users.def instead of users.def
• More configuration parameters => ig-site-info.def instead of site-info.def
– Both packages (glite-yaim, ig-yaim) are needed!!
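The functions/local override mechanism can be sketched as a lookup that prefers the ig-yaim copy of a function when both packages ship one. The directory layout matches the slide; the specific function names are hypothetical examples:

```python
# Sketch of the yaim function-override mechanism described above:
# ig-yaim functions live in functions/local/ and take precedence over
# the stock glite-yaim functions in functions/.

def resolve_function(name, glite_functions, ig_local_functions):
    """Return the path a yaim run would source for a config function:
    the ig-yaim local copy wins when both packages provide it."""
    if name in ig_local_functions:
        return "functions/local/" + name
    if name in glite_functions:
        return "functions/" + name
    raise KeyError(f"no such yaim function: {name}")

glite = {"config_mkgridmap", "config_gip"}     # shipped by glite-yaim (hypothetical)
ig_local = {"config_mkgridmap", "config_ntp"}  # shipped by ig-yaim (hypothetical)

print(resolve_function("config_mkgridmap", glite, ig_local))  # overridden locally
print(resolve_function("config_gip", glite, ig_local))        # stock function
```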
Documentation
• Documentation is published at each release
– Release notes, upgrade and installation guides:
http://grid-it.cnaf.infn.it/?siteinstall
http://grid-it.cnaf.infn.it/?siteupgrade
http://grid-it.cnaf.infn.it/?releasenotes
written in LaTeX and published in HTML, PDF and txt
• Additional information about updates and various notes is also published on wiki pages:
– https://grid-it.cnaf.infn.it/checklist/modules/dokuwiki/doku.php?id=rel:updates
– https://grid-it.cnaf.infn.it/checklist/modules/dokuwiki/doku.php?id=rel:hlr_server_installation_and_configuration
• Everything is available to site managers on a central repository
Updates deployment
– Since the introduction of gLite 3.0 there have been no more big release changes from EGEE, but a series of smaller, frequent updates (about weekly)
– The INFNGRID release was updated accordingly

gLite updates:
17/10/2006 - gLite Update 06
20/10/2006 - gLite Update 07
24/10/2006 - gLite Update 08
14/11/2006 - gLite Update 09
11/12/2006 - gLite Update 10
19/12/2006 - gLite Update 11
22/01/2007 - gLite Update 12
05/02/2007 - gLite Update 13
19/02/2007 - gLite Update 14
26/02/2007 - gLite Update 15
…

INFNGRID updates:
27/10/2006 - INFNGRID Update 06/07/08 (+ new dgas, gridice packages)
15/11/2006 - INFNGRID Update 09
19/12/2006 - INFNGRID Update 10/11
29/01/2007 - INFNGRID Update 12
14/02/2007 - INFNGRID Update 13
20/02/2007 - INFNGRID Update 14
27/02/2007 - INFNGRID Update 15
…

Steps:
– gLite update announcement
– INFNGRID release alignment to the announced update (ig-metapackages, ig-yaim)
– Local testing
– IT-ROC deployment
INFNGRID Services Overview
The general web portal
The technical web portal
General Purpose Services
General purpose services – VOMS servers
VOMS Stats

VOMS: number of users per VO

VO        Users
argo         17
bio          44
compchem     31
enea          8
eumed        56
euchina      35
gridit       89
inaf         25
infngrid    178
ingv         12
libi         10
pamela       16
planck       16
theophys     20
virgo         9
cdf        1133
egrid        28

Top users (about 85% of total proxies):
• CDF (~50k proxies/month)
• EUMED (~500 proxies/month)
• PAMELA (~500 proxies/month)
• EUCHINA (~400 proxies/month)
• INFNGRID (test purposes, ~200 proxies/month)
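The per-VO proxy shares implied by the slide's approximate monthly figures can be computed directly. Only the five VOs quoted above are included, so the numbers are rough estimates, not real VOMS statistics:

```python
# Sketch: each VO's share of the monthly proxy load, using the slide's
# approximate figures for the top five VOs only.

proxies_per_month = {
    "cdf": 50_000,
    "eumed": 500,
    "pamela": 500,
    "euchina": 400,
    "infngrid": 200,  # test purposes
}

total = sum(proxies_per_month.values())
shares = {vo: n / total for vo, n in proxies_per_month.items()}
top_vo = max(shares, key=shares.get)
print(top_vo, round(shares[top_vo] * 100, 1), "% of the top-five load")
```

The calculation makes the imbalance concrete: within this top group, CDF alone accounts for well over 90% of the proxies issued.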
General purpose Services - HLRs
Accounting: Home Location Register (HLR)
• DGAS (Distributed Grid Accounting System) is used to account for jobs running on the farms (both grid and non-grid jobs)
• 12 distributed first-level HLRs
• 1 experimental second-level HLR to aggregate data from the first level
• DGAS2Apel is used to send job records to the GOC for all sites
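The two-level layout can be sketched as follows. The record shape and site names are illustrative, not the actual DGAS usage-record schema:

```python
# Sketch of the two-level HLR layout described above: an experimental
# 2nd-level HLR aggregating the usage records collected by the 1st-level
# HLRs. Record fields and site names are illustrative.

from collections import defaultdict

# hypothetical usage records: (site, vo, cpu_time_seconds)
hlr_padova = [("INFN-PADOVA", "cms", 3600), ("INFN-PADOVA", "atlas", 1800)]
hlr_torino = [("INFN-TORINO", "alice", 7200)]

def aggregate(*first_level_hlrs):
    """Second-level view: total CPU time per VO across all 1st-level HLRs."""
    totals = defaultdict(int)
    for hlr in first_level_hlrs:
        for _site, vo, cpu in hlr:
            totals[vo] += cpu
    return dict(totals)

print(aggregate(hlr_padova, hlr_torino))
```

A tool like DGAS2Apel would then reformat such aggregated records into the APEL schema expected by the GOC.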
VOs Dedicated Services
New DEVEL-INFNGRID-3.1 WMS and LB are coming soon as VO-dedicated services in production (ATLAS, CMS, CDF, LHCb).
These VO-specific services were previously run by the INFNGRID Certification Testbed and have now been moved to the production DEVEL release.
A total of 18 VO-dedicated services, which will become 25 with the introduction of the 3.1 WMS and LB.
FTS channels and VOs
• Installed and fully managed via Quattor/YAIM
• 3 hosts as frontend, 1 Oracle cluster as backend
• Not only LHC VOs:
– PAMELA
– VIRGO
• Full standard T1-T1 + T1-T2 + STAR channels
– 51 channel agents
– 7 VO agents
• (A prototype of) a monitoring tool is available
– Agent and Tomcat log files are parsed and saved in a MySQL DB
– Web interface: http://argus.cnaf.infn.it/fts/index-FTS.php
• Support:
– Dedicated department team for tickets
– Mailing list: fts-support<at>cnaf.infn.it
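Channel matching with a STAR catch-all, as described above, can be sketched as a lookup with a fallback. The channel set and site names are illustrative, not the real CNAF configuration:

```python
# Sketch of FTS channel matching: dedicated T1-T1 / T1-T2 channels,
# with a STAR (catch-all) channel as fallback for unmatched sources.
# Channel and site names are illustrative.

channels = {("CNAF", "CERN"), ("CNAF", "PADOVA")}  # dedicated channels (hypothetical)

def resolve_channel(src, dst):
    """Return the channel an FTS transfer between two sites would use."""
    if (src, dst) in channels:
        return f"{src}-{dst}"
    return f"STAR-{dst}"  # catch-all channel for any other source

print(resolve_channel("CNAF", "CERN"))     # dedicated channel
print(resolve_channel("LEGNARO", "CNAF"))  # falls back to the STAR channel
```

With dedicated channels for every T1-T1 and T1-T2 pair plus STAR fallbacks, the channel-agent count grows quickly, which is consistent with the 51 channel agents quoted above.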
FTS transfer overview
Testbeds
M/w flow from developers to production in EGEE and INFNGRID
Testbeds
• Preview
• Certification CERN
• Certification INFN
• Pre-Production Service (PPS)
[Diagram: m/w flows from the JRA1 developers through the SA3 certification at CERN, the JRA1/SA1 Preview testbed and the INFN certification testbed, to the SA1 pre-production (PPS) and EGEE production services, and via the INFNGRID Release Team to the INFNGRID production and DEVEL production services, with VOs attached at each stage]
• AIM: the last step of m/w testing before deployment at production scale
• INPUT: CERN certification (SA3)
• SCOPE: EGEE SA1, about 30 sites spread all over Europe (1 in Taiwan)
• COORDINATION: CERN
• USERS ALLOWED: all the LHC VOs, diligent, switch and 2 PPS fake VOs
• CONTACTS: project-eu-egee-pre-production-service<at>cern.ch
http://egee-pre-production-service.web.cern.ch/egee-pre-production-service/
• ACTIVITIES: the main activity is the testing of the installation procedures and of the basic functionalities of releases/patches, done by site managers.
There is limited m/w testing done by users: this is the main PPS issue!
Pre-Production Service (PPS) in EGEE
• The PPS is run like the Production Service:
– SAM tests
– Tickets from COD
– GOCDB registration
– Etc.
Pre-Production Service (PPS) in EGEE
Italian Participation to PPS
• 3 INFN sites: CNAF, PADOVA, BARI
• 2 DILIGENT sites: CNR, ESRIN
[Diagram: PPS nodes at CNAF, PADOVA and BARI, each with access to a production farm (150, 68 and 150 slots respectively), plus the central services at CNAF; all other PPS sites are outside INFN]
CNAF: 2 CEs with access to the production farm, 1 SE, 1 mon box + central services (VOMS, UI, BDII, WMS, FTS, LFC, APT repo); people: D. Cesini, M. Selmi, D. Dongiovanni
PADOVA: 2 CEs with access to the production farm, 1 SE, 1 mon box; people: M. Verlato, S. Bertocco
BARI: 1 CE with access to the production farm, 1 SE; people: G. Donvito
Preview Testbed
• It is now an official EGEE activity, requested by JRA1 to expose to users those components not yet considered by the CERN (SA3) certification. The aim is to get feedback from end users and site managers.
• It is a distributed testbed deployed in a few European sites.
• A joint SA1-JRA1 effort is needed in order not to dedicate people 100% of their time to this activity, as acknowledged by the TCG and PMB.
• COORDINATOR: JRA1 (Claudio Grandi)
• USERS ALLOWED: JRA1/Preview people and all interested users
• CURRENT ACTIVITIES: CREAM, gLexec, gPBox
• CONTACTS: project-eu-egee-middleware-preview<at>cern.ch
https://twiki.cern.ch/twiki/bin/view/EGEE/EGEEgLitePreviewNowTesting
Italian Participation to the Preview Testbed
• 3 INFN sites:
– CNAF (D. Cesini, D. Dongiovanni)
– PADOVA (M. Sgaravatto, M. Verlato, S. Bertocco)
– ROMA1 (A. Barchiesi)
H/W resources are partly taken from the INFN certification testbed and partly from the JRA1 testbed.
[Diagram: Preview testbed nodes at CNAF, PADOVA and ROMA1 (CREAM, WMS, BDII, gPBox, CE, WN and UI hosts); physical nodes run virtual services; all other Preview sites are outside INFN]
Preview services deployed in Italy:
PADOVA: 1 CREAM CE + 5 WNs
CNAF: 1 WMS 3.1, 1 BDII, 1 gLite CE + 1 WN, 1 UI, 1 DPM SE; (for gPBox) 1 WMS 3.1 + 2 gLite CEs + 1 LCG CE + 3 WNs + 2 gPBox servers
ROMA1: 1 CE + 1 WN for gPBox tests (to be installed)
Virtual machines are used at CNAF to optimize h/w resources
• EGEE activity run by SA3 – it is the official EGEE certification testbed that releases gLite m/w to the PPS and to production.
• ACTIVITY: test and certify all gLite components; release packaging.
• COORDINATION: CERN
• INFN involved sites: CNAF (A. Italiano), MILANO (E. Molinari), PADOVA (A. Gianelle)
• Italian activities: testing of information providers, DGAS, WMS
CERN Certification (SA3)
Services provided: 1 LSF CE + 1 batch system server on a dedicated machine + 1 DGAS HLR + 1 site BDII + 2 WNs. All services are located at CNAF.
[Diagram: SA3 CERN certification testbed, INFN participation – hosts wmstest-ce-02 … wmstest-ce-08 at CNAF]
Recently the responsibility for WMS testing passed from CERN to INFN – the main focus of SA3-Italia.
A distributed testbed deployed in a few Italian sites, where EGEE m/w with the INFNGRID customizations and INFNGRID grid products are installed for testing purposes by a selected number of end users and grid managers before being released.
It is NOT an official EGEE activity, and it should not be confused with the CERN certification testbed run by the SA3 EGEE activity.
Most of the servers have migrated to the Preview testbed.
SITES and PEOPLE: CNAF (D. Cesini, D. Dongiovanni), PADOVA (S. Dalla Fina, C. Aiftimiei, M. Verlato), TORINO (R. Brunetti, G. Patania, F. Nebiolo), ROMA1 (A. Barchiesi)
• CONTACTS: cert-release<at>infn.it
http://grid-it.cnaf.infn.it/certification
INFNGRID Certification Testbed
WMS (CNAF)
There is no more time to perform detailed tests as in the first phase of the certification testbed
(https://grid-it.cnaf.infn.it/certification/?INFN_Grid_Certification_Testbed:WMS%2BLB_TEST).
Provide resources to VOs or developers, and maintain patched and experimental WMSes.
Experimental WMS 3.0:
- 1 ATLAS WMS
- 1 ATLAS LB
- 1 CMS WMS + LB
- 1 CDF WMS + LB
- 1 LHCb WMS + LB
WMS for developers:
- 2 WMS + LB
The experimental WMSes were heavily used in the last period because they were more stable than those officially released, due to the long time needed for patches to reach the PS:
- bad support from certification
- production usage statistics altered
They were recently tagged as INFNGRID DEVEL (see next slide) PRODUCTION services.
Support to JRA1 for the installation of WMS 3.1 in the development testbed.
ACTIVITIES / 1
INFNGRID Certification Testbed
DGAS CERTIFICATION (TORINO)
- 4 physical servers, virtualized in a very dynamic way
DEVEL RELEASE (PADOVA/CNAF):
- To speed up the flow of patches into the services used by the VOs; does not follow the normal m/w certification process
- Based on the official INFNGRID release (3.0)
- Wiki page on how to transform a normal INFNGRID release into a DEVEL one: http://agenda.cnaf.infn.it/materialDisplay.py?contribId=4&materialId=0&confId=18
- An apt repository to keep control of what goes into the DEVEL release
- 1 WMS server at CNAF
- Announced via mail after testing at CNAF
- Cannot come with all the guarantees of normally certified m/w
INFNGRID Certification Testbed
ACTIVITIES / 2
RELEASE INFNGRID CERTIFICATION (PADOVA)
- 20 virtual machines on 5 physical servers
- http://igrelease.forge.cnaf.infn.it
StoRM – some resources provided
- 3 physical servers
SERVER VIRTUALIZATION (all sites)
INFNGRID Certification Testbed
ACTIVITIES / 3
Testbed snapshot
[Diagram: certification nodes at CNAF (cert-rb-02 … cert-rb-07, egee-rb-04/06, cert-bdii-01, plus experimental/patched WMSes passed to DEVEL production or used by JRA1, virtualization tests, and resources provided to the StoRM tests); the INFNGRID release testbed at PADOVA (5 physical servers × 4 VMs = 20 VMs, DEVEL and Release1–5 instances); DGAS test CEs at TORINO; one server at ROMA1]
INFNGRID Certification Testbed
VIRTUAL GRID (NEW)
Create a self-contained grid, using old T1 h/w resources, to be dedicated to WMS tests:
- Total control over what is installed
- No interference with the production grid (no altered statistics, no site managers complaining about stuck jobs, no wasting of production CPU)
INFNGRID Certification Testbed
[Diagram: physical services (WMS, LB, BDII) under the developers' control; virtual sites – the exact deployment is under study, probably 1 LCG CE and 1 WN per physical box]
37 physical boxes available per rack (2 racks available): dual PIII 1.4 GHz, 2 GB RAM. The boxes are dedicated to virtual sites; services can be installed on more powerful machines.
A virtual site prototype is already installed on a couple of boxes.
We are investigating the performance that can be reached with this kind of hardware/deployment.
Monitoring and Accounting
Monitoring and accounting tools used by the ROC
Monitoring
GridICE: http://gridice4.cnaf.infn.it:50080/gridice/site
Developed by INFN. Several servers with different scopes are installed and maintained by the IT-ROC.
GSTAT: http://goc.grid.sinica.edu.tw/gstat//Italy.html
Developed outside INFN. A GSTAT server is maintained by the IT-ROC.
Monitoring
• GSTAT queries the Information System every 5 minutes
• The sites and nodes checked are those registered in the GOC DB
• Inconsistencies in the published information, and the absence of a service that a site should publish, are reported as errors
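The core of such a check – comparing what a site publishes against what the GOC DB says it should publish – can be sketched in a few lines. Site and service names are illustrative:

```python
# Sketch of the GSTAT-style consistency check described above: compare the
# services a site publishes in the Information System against what is
# registered for it in the GOC DB, and flag anything missing.

def check_site(published_services, gocdb_services):
    """Return the list of services the site should publish but does not."""
    return sorted(set(gocdb_services) - set(published_services))

gocdb = ["CE", "SE", "site-BDII"]  # registered in the GOC DB (hypothetical)
published = ["CE", "site-BDII"]    # found in the Information System

missing = check_site(published, gocdb)
if missing:
    print("ERROR: missing services:", missing)
else:
    print("OK")
```

Repeating this comparison every 5 minutes over all GOC-registered sites is essentially the polling loop described above.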
SAM: https://lcg-sam.cern.ch:8443/sam/sam.py
SAM-ADMIN: https://cic2.gridops.org/samadmin/
SAM is the official CERN-EGEE testing tool: tests are performed by jobs submitted to the sites. Submission is triggered by an admin web interface. A mirror of the web interface is hosted at CNAF and maintained by the IT-ROC.
Monitoring
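How job-based test results roll up into a site status can be sketched as follows. The test names and the choice of which tests are critical are illustrative, not the real SAM test set:

```python
# Sketch of a SAM-style roll-up: a site is flagged only when a *critical*
# test fails; non-critical test failures do not change the site status.
# Test names and criticality are illustrative.

CRITICAL = {"job-submit", "replica-management"}

def site_status(results):
    """results: {test_name: passed?}. Site is OK if all critical tests pass."""
    ok = all(passed for test, passed in results.items() if test in CRITICAL)
    return "OK" if ok else "CRITICAL"

print(site_status({"job-submit": True, "replica-management": True, "version": False}))
print(site_status({"job-submit": False, "replica-management": True}))
```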
ROCRep & HLRMON:
http://grid-it.cnaf.infn.it/rocrep/index.php
http://grid-it.cnaf.infn.it/hlrmon/index.php
(Data about all VOs and all sites, T1 excluded)
Web interfaces to obtain aggregated Grid usage data. Two versions exist:
1) Data taken from the GridICE DB
2) Data taken from the DGAS HLR DB – a new interface is being released
Accounting
GOC ACCOUNTING SYSTEM: http://www3.egee.cesga.es/gridsite/accounting/CESGA/egee_view.php
Data from the HLR servers are fed into the GOC accounting system through the dgas2apel tool
Accounting
Users and Sites Support
Support
• The IT-ROC offers a number of grid services and controls their correct operation. But not only…
• The IT-ROC also continuously monitors the status of the sites inside the ROC itself and, in case of problems, helps site managers or users to find a solution.
• As a parallel activity, the IT-ROC is also involved in the monitoring and support of the entire EGEE infrastructure (TPM and COD). The same support given to INFNGRID users and sites is given to the LCG/EGEE Grid, in a round-robin manner among the ROCs.
Support
Users and sites support
The main tools used to give support to users are the ticketing systems:
• EGEE makes use of the GGUS (Global Grid User Support) ticketing system (www.ggus.org)
• Each ROC uses different tools, interfaced to GGUS in a bidirectional way
• By means of web services it is possible to:
– Transfer tickets from the global to the regional system
– Transfer tickets from the regional to the global system
• Once tickets are logged, they are assigned to the proper support unit, either in GGUS or in the regional systems
• The IT-ROC ticketing system is based on XOOPS/xHelp
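The bidirectional exchange can be sketched as moving a ticket between two stores, one global and one regional. The web-service transport and the field names are omitted or illustrative; this is only a model of the flow:

```python
# Sketch of the bidirectional GGUS <-> regional-helpdesk exchange described
# above, modelled as two in-memory ticket stores. Field names are illustrative.

ggus = {}    # global ticketing system (GGUS)
it_roc = {}  # regional system (XOOPS/xHelp at the IT-ROC)

def transfer(ticket_id, src, dst, assignee):
    """Move a ticket from one system to the other and reassign it."""
    ticket = src.pop(ticket_id)
    ticket["assigned_to"] = assignee
    dst[ticket_id] = ticket

ggus["GGUS-1234"] = {"subject": "Job stuck at an INFN site", "assigned_to": "TPM"}
transfer("GGUS-1234", ggus, it_roc, "IT-ROC")       # global -> regional
transfer("GGUS-1234", it_roc, ggus, "GGUS/solved")  # regional -> global

print(ggus["GGUS-1234"]["assigned_to"])
```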
Interface to GGUS
[Diagram: a ticket entered via the GGUS web portal is handled by GGUS/TPM, assigned to a ROC helpdesk through that ROC's interface, routed to a support unit (SU-1 … SU-N), possibly re-assigned, and finally solved]
Interface to GGUS
A new ticket arrives from GGUS
We assign the ticket to the site it concerns
Interface to GGUS
The site reassigns the ticket to GGUS…
…and adds a response
IT-ROC Control Shifts
About 20 supporters perform a monitoring activity organised in 2 shifts per day, from Monday to Friday, with 2 persons per shift. At the end of each shift a report is produced.
During the shift the supporters:
• Check the Grid status and try to discover problems before the users do. In case of problems they open tickets to the department concerned in order to find a solution, and suggest a possible solution if they can.
• Perform site certification during the deployment phases
• Check the status of tickets, urging experts or site managers to provide answers and solutions
IT-ROC Shifts ISSUES
• The ROC monitoring is oriented to the infrastructure, not to the VOs
• The active monitoring done via test jobs (i.e. the SAM tool) uses 3 VOs dedicated to infrastructure testing (dteam, ops and infngrid), which in general have higher priority at the sites. The side effect is that VO-specific problems are not observed. Passive controls (e.g. GSTAT and GridICE) are not affected by this problem.
• The infrastructure tests can be OK while users still experience problems.
The current control-shift organization seems to be insufficient for the VOs' needs, and the LHC VOs are already performing their own tests (VO dashboards) to face this situation.
IT-ROC Shifts ISSUES
Both the Italian and the European experience in Grid monitoring show that it is necessary to integrate the infrastructure-oriented monitoring with more VO-specific monitoring. But in INFNGRID alone we have about 40 VOs!!
Collaboration between the ROC and the people involved in the VO dashboards is desirable, at least to define the set of controls that are important for the VOs but not yet performed by the ROC.
TPM and COD
TPM (Ticket Process Manager): responsible for the correct assignment of tickets in the central GGUS system. When a ticket is logged, it is automatically assigned to the TPM group, which routes it to the proper support unit or, if able, proposes a solution. The whole ticket lifetime is under the control of the TPM, which can modify the ticket at any time, urging an answer or a solution. Each ROC performs a 1-week shift on a round-robin cycle.
COD (CIC On Duty): the same monitoring done for the INFNGRID infrastructure is done for the EGEE infrastructure, using the same tools (i.e. GSTAT, SAM, GridICE, GGUS) and some COD-specific tools (i.e. the COD dashboard).
The Italian ROC is thus also involved in the monitoring and support of the entire LCG/EGEE infrastructure: it participates in the TPM and COD activities.
Procedures
Managing procedures
Introducing a new site
• Before joining INFNGRID, a site has to accept several rules, described in a Memorandum of Understanding (MoU). The COLG (Grid Local Coordinator) reads and signs it, and faxes the document to INFN-CNAF.
• Moreover, all sites must provide the email alias grid-prod@<domain>. This alias will be used to report problems and will be added to the site managers' mailing list. Of course it should include all the site managers of the grid site.
• The IT-ROC registers the site and the site managers in the GOC-DB, and creates a supporter-operative group in the XOOPS ticketing system.
• Site managers have to register themselves in XOOPS, so they can be assigned to their supporter-operative groups; each site manager also has to register in the test VOs infngrid and dteam.
• Site managers install the middleware, following the instructions distributed by the Release Team (http://grid-it.cnaf.infn.it/, Installation section). When finished, they run some preliminary tests (http://grid-it.cnaf.infn.it/ --> Test&Cert --> Fry) and then request the ROC certification (http://grid-it.cnaf.infn.it/index.php?id=cmtreport&type=1).
• The IT-ROC logs a ticket to communicate with the site managers during the certification.
MoU for sites
Every site has to:
• Provide computing and storage resources. Farm dimensions (at least 10 CPUs) and storage capacity will be agreed with each site
• Guarantee sufficient manpower to manage the site: at least 2 persons
• Manage the site resources efficiently: middleware installation and upgrades, patch application, configuration changes as requested by the CMT, within the maximum time stated for the various operations
• Answer tickets within 24 hours (T2 sites) or 48 hours (other sites), from Monday to Friday
• Check its own status from time to time
• Guarantee continuity of site management and support, also during holiday periods
• Participate in the SA1/Production-Grid phone conferences and meetings, and compile the weekly pre-report
• Keep the information in the GOC DB up to date
• Enable the test VOs (ops, dteam and infngrid) with a higher priority than the other VOs
• Eventual non-fulfilment noticed by the ROC will be referred to the biweekly INFNGRID phone conferences, then to the COLG, and eventually to the EB
Introducing a new VO
When an experiment asks to enter the grid as a new VO, a formal request is necessary, followed by some technical steps.
Formal part:
• The needed resources and the economic contribution are agreed between the experiment and the INFN GRID Executive Board (EB)
• Pick out the experiment software and verify that it will work in the Grid environment
• Verify the support that it will receive in the various INFN GRID production sites
• Communicate to the IT-ROC the names of the VO managers, the software managers, and the persons responsible for the resources and for the support of the experiment software
• State the software prerequisites, the kind of jobs, and the final destination of the storage (CASTOR, SE, experiment disk server)
Introducing a new VO
Once the Executive Board (EB) has approved the experiment request, the technical part begins:
• The IT-ROC creates the VO on the VOMS server
• The IT-ROC creates the VO support group in the ticketing system
• The VO managers fill in the VO identity card on the CIC portal
• The IT-ROC announces the new VO to the sites
Useful links…
• INFN GRID project: http://grid.infn.it/
• Italian Production grid: http://grid-it.cnaf.infn.it/
• SAM: https://lcg-sam.cern.ch:8443/sam/sam.py
• CIC Portal: http://cic.gridops.org/
• GSTAT: http://goc.grid.sinica.edu.tw/goc/
• GridICE: http://gridice4.cnaf.infn.it:50080/gridice/site/site.php
• GOC Accounting: http://www3.egee.cesga.es/gridsite/accounting/CESGA/egee_view.php
THANK YOU