LCG Deployment in the UK
John Gordon, CCLRC e-Science Centre
GridPP10
• You’ve heard about LCG…
• … so what’s happening in the UK?
• LCG Deployment, now and future
• The wider UK picture
• … and what’s this EGEE?
• The Plan
In LCG Context
A. Management Structure
[Diagram: management structure]
• Boards: PMB, CB
• Deployment Board: Tier1/Tier2, testbeds, rollout; service specification & provision
• User Board: requirements, application development, user feedback
• Areas: Metadata, Workload, Network, Security, Info. Mon., Storage
• External links: ARDA, Expmts, EGEE, LCG
Recent LCG
• Tier1 + 10 other sites
• DCs
• Tier2 structure
• Support structure
• GOC Monitoring
• LCG Accounting
GridPP Summary: From Prototype to Production
[Diagram: timeline from prototype to production]
• 2001: Separate Experiments, Resources, Multiple Accounts
• 2004: Prototype Grids
• 2007: 'One' Production Grid
• Experiments: BaBar, D0, CDF, ATLAS, CMS, LHCb, ALICE
• Sites: 19 UK Institutes, RAL Computer Centre, CERN Computer Centre
• Grids and projects: SAMGrid, BaBarGrid, LCG, EDG, GANGA, EGEE, ARDA
• Prototype stage: UK Prototype Tier-1/A Centre, CERN Prototype Tier-0 Centre, 4 UK Prototype Tier-2 Centres
• Production stage: UK Tier-1/A Centre, CERN Tier-0 Centre, 4 UK Tier-2 Centres
Vision
• GridPP2 should deliver a production quality grid
• Meeting the computing needs of UK Particle Physics
• Autonomous and self-supporting with its own identity
• Participating in LCG, EGEE, BaBarGrid, SAMGrid, and any others desired by its members
• Part of an integrated UK Grid
• Independent but integrated, separate but seamless
Delivery Plans
• Keep up with LCG
• Participate in LHC Data Challenges
• TierA for BaBar and BaBarGrid
• Participate in LCG Service Challenges
• Use by other VOs
• Put in place the structure to deliver this
• … and more
Production Team
• Deployment
• User Support
• Middleware Support
• Applications Support
• Network Support
• Security
• Operations
UK Tier-2 Centres
NorthGrid ****: Daresbury, Lancaster, Liverpool, Manchester, Sheffield
SouthGrid *: Birmingham, Bristol, Cambridge, Oxford, RAL PPD, Warwick
ScotGrid *: Durham, Edinburgh, Glasgow
LondonGrid ***: Brunel, Imperial, QMUL, RHUL, UCL
Current UK Status: 11 Sites via LCG
Tier2 Centres
• UK model of distributed Tier2 Centres
• Managerial and organisational ‘centre’
• Tier2 is free to organise internally
– so I cannot describe yet
• Tier2 is smaller than an EGEE Region
– but some aspects of the model may be useful (their own VO? own RB?)
• May hide some of the internal structure: CE, GIIS?
Deployment
• A Team to roll out software across UK
• Software release certification, installation support, site certification
• Specialist support for sysadmins
• Consists of staff from T1 + T2
User Support
• Migrate from mailing list to problem-tracking
• From sysadmin support to user support
• Managed Helpdesk
– for assignment, tracking, escalation
• We already have a lot of experience
– we haven’t encapsulated it in FAQs etc
Middleware, Security and Network Development
M/S/N builds upon UK strengths as part of International development
Configuration Management
Storage Interfaces
Network Monitoring
Security
Information Services
Grid Data Management
Security
Middleware
Networking
Middleware Support
• GridPP2 Middleware development should have an emphasis on delivery and support
• Middleware teams should support their software area
• T2 assigned 5 specialist support posts
• Integrate support effort into Production Team
Applications Support
• Stephen Burke – roaming support
• 2 T1 experiment-facing people
• UK experiments
• Get deployment and middleware support working with experiments
– to ensure successful UK involvement in experiments’ use of Grid.
Network Support
• Mark Leese (CCLRC-DL)
– Rolled out network monitoring to UK Core e-Science programme
– GridPP2 role in network support
– Network optimisation
– Participation in service challenges
– Hopefully using lightpaths
Security
• New Security Officer (to be appointed)
– Security operations
• Consultants
– Kelsey – Joint EGEE-LCG Security
– Jensen – technical advice to CA/middleware
– McNab – e-Science Security Centre
• Track UK developments (Permis, Shibboleth)
Grid Operations
[Diagram: GOC]
• GOC: GridSite, MySQL
• Resource Centre (RC): resources & site information
• Secure database management via HTTPS / X.509
• Monitoring of ce, se, bdii, rb
• EDG, LCG-1, LCG-2, …
Operations
• LCG Operations centre
• EGEE ROC
• Monitor GridPP (and NGS and GridIreland)
• Developed tools for LCG, reuse for GridPP
• Continue developing for EGEE
• EGEE CIC running grid-wide services
• Accounting
LCG Core Accounting
[Chart: Base CPU Time (Seconds), log scale from 1.00E+00 to 1.00E+10, by site (TAIPEI, NIKHEF, CNAF, RAL, FZK, CERN, CAM) and by VO (Alice, Atlas, cms, d0, LHCb, dteam)]
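A minimal sketch of how per-site, per-VO CPU totals for an LCG Core Accounting chart of this kind could be aggregated. The record layout (site, VO, CPU seconds) and every sample value are illustrative assumptions; they are not real LCG accounting data and do not reflect the actual accounting schema.

```python
from collections import defaultdict
import math

# Hypothetical accounting records: (site, VO, CPU seconds for one job).
# All values are made up for illustration only.
RECORDS = [
    ("RAL", "atlas", 3600.0),
    ("RAL", "atlas", 1800.0),
    ("RAL", "lhcb", 120.0),
    ("CERN", "cms", 7200.0),
    ("CERN", "dteam", 10.0),
]


def cpu_by_site_and_vo(records):
    """Sum CPU seconds for each (site, VO) pair."""
    totals = defaultdict(float)
    for site, vo, cpu_seconds in records:
        totals[(site, vo)] += cpu_seconds
    return dict(totals)


def log10_decade(seconds):
    """Return the power-of-ten decade of a total, as read off a log-scale axis."""
    if seconds <= 0:
        return None  # zero usage cannot be placed on a log axis
    return math.floor(math.log10(seconds))


totals = cpu_by_site_and_vo(RECORDS)
for (site, vo), seconds in sorted(totals.items()):
    print(f"{site:6s} {vo:6s} {seconds:10.0f} s  (decade 10^{log10_decade(seconds)})")
```

Totals spanning many orders of magnitude across sites and VOs are the usual reason for plotting such data on a logarithmic axis.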
Wider Support
• GSC
– UK helpdesk
– UK E-Science CA
• Training
– Our own and EGEE (NeSC)
Other UK Grids
• NGS
– National Grid Service
– 4 large clusters + 2 UK Supercomputers
– Already using VDT and BDII
• ETF
– Developing UK OGSA/WSRF Grid
• UK Grid Operations Centre Director
– Speaking next
• Should all be part of EGEE
EGEE
• UK/I Region in EGEE covers GridPP, NGS, and Grid Ireland – one of 10 regions
• EGEE’s aim is to integrate national grids
– Not to interfere or impose limits on them
• All of the work I have described, short of actually running the Resource Centres, is EGEE work
– Many sites are actually signed up to EGEE so we can report it formally as such
– Many of you will be asked to report work to EGEE (timesheets, quarterly reports) but this shouldn’t be an imposition
• The development of GridPP will be aligned with EGEE
– But EGEE is not well defined, so we plan GridPP and participate in the developing EGEE to learn, adopt, and influence.
EGEE Issues
• EGEE=LCG?
– non-European sites in LCG
– non-LCG sites in EGEE
• Platform Support
– non-Linux, free Linux (cf. RHEL)
• Integrated user support
• Support for new VOs
• Security, security, security
The Next Steps
• Just appointed Jeremy Coles
– as GridPP Production Manager
• Grid Definition
– define GridPP,
– get buy-in of stakeholders
• Production Team
– build the team
• Workplan
Production Manager Tasks
• Develop work plan (deliverables/milestones)
• Compile problems and issues list (implement tracking)
• Organise a GridPP deployment group workshop
• Better establish GridPP identity – address UK specific needs
• Review/develop operating procedures to maintain GridPP service
• Get GridPP more involved at UK/experiment software meetings
• Coordinate UK Tier-2 resource input to LCG and EGEE
• Work with other grids to establish a single production grid.
Running a production service: areas to be reviewed and developed
Main areas to be considered (transparency, control, accountability, security, improvement)
• Grid accounting
– Who needs to know what and in what form? Where are the gaps in LCG accounting?
• Grid monitoring
– Service-level management tools. Efficiency of resource usage. Replication issues.
– Detailed metrics to be agreed
– Real-time notification and problem resolution
• Management & reporting
– Grid management: VO setup procedures; adding new Tier-2 resources
– Frequency, structure and content of reports to be agreed (e.g. resource usage, job success rates against targets)
• Security
– Processes and procedures (e.g. incident handling)
– Mechanics of trust model defined: identity, privacy, policy and authority (e.g. how are rights revoked? Appeals.)
– Misuse of resources (intrusion), user & usage audits
• Support
– Installation (joining) requirements/guidelines
– Integration & helpdesk requirements
– Library – deployment documentation. User feedback – mechanism to inform future developments
• Training
– For new GridPP users and new operations staff
• Middleware release strategy (and stabilisation!)
• Tier-2 management
– Service levels (SLAs/MoUs to be developed)
– Resource, quota and priority handling
• Resource
– Maintenance plans
• Audit
– Of Grid usage by user/VO
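One of the reporting items above, job success rates against targets, could be sketched as follows. This is only an illustration: the `VoJobStats` record, the `report` function, the job counts, and the 0.85 target are all hypothetical, since the slides state that report content and metrics are still to be agreed.

```python
from dataclasses import dataclass


@dataclass
class VoJobStats:
    """Hypothetical per-VO job counts for one reporting period."""
    vo: str
    submitted: int
    succeeded: int


def success_rate(stats):
    """Fraction of submitted jobs that succeeded, or None if nothing ran."""
    if stats.submitted == 0:
        return None
    return stats.succeeded / stats.submitted


def report(all_stats, target):
    """Compare each VO's success rate with an agreed target (assumed value)."""
    rows = []
    for stats in all_stats:
        rate = success_rate(stats)
        if rate is None:
            status = "no jobs"
        elif rate >= target:
            status = "meets target"
        else:
            status = "below target"
        rows.append((stats.vo, rate, status))
    return rows


# Made-up counts, for illustration only.
SAMPLE = [
    VoJobStats("atlas", submitted=200, succeeded=180),
    VoJobStats("cms", submitted=100, succeeded=70),
    VoJobStats("dteam", submitted=0, succeeded=0),
]

rows = report(SAMPLE, target=0.85)
for vo, rate, status in rows:
    shown = "n/a" if rate is None else f"{rate:.0%}"
    print(f"{vo:6s} {shown:>5s}  {status}")
```

Treating a VO with no submitted jobs as a separate case, rather than as a 0% success rate, keeps idle VOs from being reported as failures.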
Vision
• GridPP2 should deliver a production quality grid
• Meeting the computing needs of UK Particle Physics
• Autonomous and self-supporting with its own identity
• Participating in LCG, EGEE, BaBarGrid, SAMGrid, and any others desired by its members
• Part of an integrated UK Grid
• Independent but integrated, separate but seamless
Challenge
• LCG has given us a good base
– We now have a critical mass based on LCG2
• Make it a production quality grid
• Attract the satellite grids: UKQCD, BaBar
– And bring in other experiments
• Participate fully in LCG and EGEE
– Without alienating non-LHC experiments
Can we do it?
Yes, we can!