TRANSCRIPT
SouthGrid Status Report
Pete Gronbech: February 2005
GridPP 12 - Brunel
SouthGrid Member Institutions
• Oxford
• RAL PPD
• Cambridge
• Birmingham
• Bristol
• Warwick
Status at Warwick
• No change since GridPP 11.
• Third-line institute – no resources as yet, but remains interested in being involved in the future.
• Will not receive GridPP resources and so does not need to sign the MOU yet.
Operational Status
• RAL PPD
• Cambridge
• Bristol
• Birmingham
• Oxford
Status at RAL PPD
• Always on the leading edge of software deployment (benefit of the RAL Tier 1)
• SL3 cluster on LCG 2.3.0; number of worker nodes increasing.
• Legacy service: LCG 2.3.0 on RH7.3 (winding down)
• CPUs: 24 × 2.4 GHz, 30 × 2.8 GHz – 100% dedicated to LCG
• 0.5 TB storage – 100% dedicated to LCG
Status at Cambridge
• Currently LCG 2.2.0 on RH7.3
• Parallel install of SL3 with 2.3.0 using yaim.
• CPUs: 32 × 2.8 GHz – increase to 40 soon – 100% dedicated to LCG
• 3 TB storage – 100% dedicated to LCG
Status at Bristol
• Status
– LCG involvement limited (“black dot”) for the previous six months due to lack of manpower
– New resources and posts now on the horizon!
• Existing resources
– 80-CPU BaBar farm to be switched to LCG
– ~2 TB of storage resources to be made LCG-accessible
– LCG head nodes installed by the SouthGrid support team with 2.3.0
• New resources
– Funding now confirmed for a large University investment in hardware
– Includes CPU, high-quality and scratch disk resources
• Humans
– New system manager post (RG) being filled
– New SouthGrid support / development post (GridPP / HP) being filled
– HP keen to expand industrial collaboration – suggestions?
Status at Birmingham
• Currently LCG 2.2 (since August).
• Currently installing SL3 on GridPP front-end nodes; will use yaim to install LCG-2_3_0.
• CPUs: 22 × 2.0 GHz Xeon (+48 soon) – 100% LCG
• 2 TB storage awaiting “front-end machines” – 100% LCG.
• SouthGrid’s “Hardware Support Post”: Yves Coppens appointed.
Status at Oxford
• Currently LCG 2.3.0 on RH7.3
• Parallel SL3 install; will use yaim to install 2.3.0 ASAP
• CPUs: 80 × 2.8 GHz – 100% LCG
• 1.5 TB storage – upgrade to 3 TB planned – 100% LCG.
Two racks, each containing 20 Dell dual 2.8 GHz Xeon servers with SCSI system disks.
1.6 TB SCSI disk array in each rack.
Systems are loaded with LCG2 software version 2.3.0.
The SCSI disks and Broadcom Gigabit Ethernet caused some problems with installation initially.
The systems have been heavily used by the LHCb Data Challenge.
Oxford Tier 2 centre for LHC
First rack is in the very crowded computer room (650); the second rack is currently temporarily located in the Theoretical Physics computer room.
Room 650 is at the limit of its power capacity.
Air conditioning is not reliable.
Problems: space, power and cooling.
A proposal for a new purpose-built computer room on Level 1 (underground) is in progress.
CERN Computer Room
Site on Level 1 for proposed computer room
• An ideal location
– Lots of power (5000 A)
– Underground (no heat from the sun, and very secure)
– Lots of headroom (false floor/ceiling for cooling systems)
– Basement (so no floor loading limit)
• False floor, large air-conditioning units and power for approx 50-80 racks to be provided.
• A rack full of 1U servers can create 12 kW of heat and use 50 A of power (rough figures checked below).
• Will offer space to other Oxford University departments.
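As a rough consistency check of those figures (a sketch assuming about 40 × 1U servers per rack at roughly 300 W each and a 240 V supply; these assumptions are not from the talk):

\[
P \approx 40 \times 300\ \mathrm{W} = 12\ \mathrm{kW},
\qquad
I \approx \frac{12\,000\ \mathrm{W}}{240\ \mathrm{V}} = 50\ \mathrm{A}.
\]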
DWB computer room project. 26-Nov-2004
Centre of the Racks
LCG2 Administrator’s Course
• A lot of interest in a repeat, especially when the 8.5 “Hardware Support” posts are filled (suggestions welcome).
• PXE / kickstart install vs Quattor…?
Ongoing Issues
• Complexity of the installation. New yaim scripts have helped enormously.
• Difficulty sharing resources – almost all of the resources listed above are 100% dedicated to LCG because sharing them is difficult.
• How will we manage clusters without LCFGng? Quattor has a learning curve. The course showed that it is very modular, but PXE/kickstart + yaim is the preferred option at the moment (a sketch of that route follows this list).
• Grid certificates and supported browsers.
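To illustrate the PXE/kickstart + yaim route (a minimal sketch only: the hostnames, install URL, file paths and the exact yaim command lines are assumptions, not details from the talk), a small generator could write one kickstart file per worker node whose %post step hands configuration over to yaim:

#!/usr/bin/env python
# Illustrative sketch: generate one kickstart file per worker node for a
# PXE-driven SL3 install, with a %post step that hands configuration over
# to yaim. Hostnames, the install URL, paths and the yaim command lines
# are assumptions for illustration only.

KICKSTART_TEMPLATE = """\
# kickstart for %(node)s (abridged: partitioning, timezone, etc. omitted)
install
url --url http://install.example.ac.uk/sl3/i386
lang en_US
keyboard us
rootpw --iscrypted $1$changeme$xxxxxxxxxxxxxxxxxxxx
reboot

%%packages
@ Base

%%post
# Hand over to yaim once the base OS is on disk (script names as commonly
# described for LCG-2; verify against the 2.3.0 release notes).
/opt/lcg/yaim/scripts/install_node /opt/lcg/yaim/site-info.def WN
/opt/lcg/yaim/scripts/configure_node /opt/lcg/yaim/site-info.def WN
"""

WORKER_NODES = ["t2wn01", "t2wn02", "t2wn03"]  # hypothetical node names

def write_kickstarts():
    # Write ks-<node>.cfg for each worker node; these files would be served
    # to the nodes by the PXE/install server.
    for node in WORKER_NODES:
        path = "ks-%s.cfg" % node
        with open(path, "w") as ks:
            ks.write(KICKSTART_TEMPLATE % {"node": node})
        print("wrote %s" % path)

if __name__ == "__main__":
    write_kickstarts()

Each generated file would then be referenced from the PXE boot configuration so that a node installs unattended and comes up as a configured LCG worker node.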