Site Lightning Report: MWT2
Mark Neubauer, University of Illinois at Urbana-Champaign
US ATLAS Facilities Meeting @ UC Santa Cruz, Nov 14, 2012
Midwest Tier-2
2
Three-site Tier-2 consortium
The Team: Rob Gardner, Dave Lesny, Mark Neubauer, Sarah Williams, Ilija Vukotic, Lincoln Bryant, Fred Luehring
Midwest Tier-2
3
Focus of this talk: Illinois Tier-2
Tier-2 @ Illinois
4
History of the project:
– Fall 2007 onward: Development/operation of T3gs
– 08/26/10: Torre's US ATLAS IB talk
– 10/26/10: Tier2@Illinois proposal submitted to US ATLAS Computing Mgmt
– 11/23/10: Proposal formally accepted
– 10/5/11: First successful test of ATLAS production jobs run on the Campus Cluster (CC)
  • Jobs read data from our Tier3gs cluster
Tier-2 @ Illinois
5
History of the project (cont):– 03/1/12: Successful T2@Illinois Pilot• Squid proxy cache, Condor head node job
flocking from UC– 4/4/12: First hardware into Taub cluster• 16 compute nodes (dual x5650, 48 GB
memory, 160 GB disk, IB) 196 cores• 60 2TB drives in DDN array 120 TB raw
– 4/17/12: PerfSONAR nodes online
Illinois Tier-2
6
T2 on Taub
History of the project (cont):
– 4/18/12: T2@Illinois in production
Illinois Tier-2
7
Stable operation: Last two weeks
Illinois Tier-2
8
Last day on MWT2:
Why at Illinois?
9
• National Center for Supercomputing Applications (NCSA)
• National Petascale Computing Facility (NPCF): Blue Waters
• Advanced Computation Building (ACB)
  – 7,000 sq. ft. with 70" raised floor
  – 2.3 MW of power capacity
  – 250 kW UPS
  – 750 tons of cooling capacity
• Experience in HEP computing
Photos: NCSA Building, ACB, NPCF
Tier-2 @ Illinois
10
• Deployed in a shared campus cluster (CC) in ACB
  – "Taub" is the first instance of the CC
  – Tier2@Illinois on Taub in production within MWT2
• Pros (ATLAS perspective)
  – Free building, power, cooling, and core infrastructure support, with plenty of room for future expansion
  – Pool of expertise, heterogeneous HW
  – Bulk pricing important given DDD (Dell Deal Demise)
  – Opportunistic resources
• Challenges
  – Constraints on hardware, pricing, architecture, timing
Tier-2 @ Illinois
11
Current CPU and disk resources:
• 16 compute nodes (taubXXX)
  – dual X5650, 48 GB memory, 160 GB disk, IB: 196 cores, ~400 job slots
• 60 2 TB drives in Data Direct Networks (DDN) array: 120 TB raw, ~70 TB usable
Tier-2 @ Illinois
12
• Utility nodes / services (.campuscluster.illinois.edu); a configuration sketch follows the list:
  – Gatekeeper (mwt2-gt)
    • Primary schedd for the Taub Condor pool
    • Flocks jobs to the UC and IU Condor pools
  – Condor head node (mwt2-condor)
    • Collector and negotiator for the Taub Condor pool
    • Accepts flocked jobs from other MWT2 gatekeepers
  – Squid (mwt2-squid)
    • Proxy cache for CVMFS and Frontier for Taub (backup for IU/UC)
  – CVMFS replica server (mwt2-cvmfs)
    • Replica of the master CVMFS server
  – dCache s-node (mwt2-s1)
    • Pool node for GPFS data storage (installed; dCache deployment in progress)
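A minimal sketch of this wiring, assuming stock HTCondor and CVMFS configuration knobs. The mwt2-* hostnames come from the list above; the UC/IU pool names and the squid port are placeholders, not taken from the slides:

  # condor_config on mwt2-gt (submit side): flock jobs out to the UC and IU pools
  FLOCK_TO = uc-cm.example.org, iu-cm.example.org          # peer central managers (assumed names)

  # condor_config on mwt2-condor (collector/negotiator): accept flocked jobs
  FLOCK_FROM = remote-gk.example.org                       # remote MWT2 schedds (assumed names)
  ALLOW_WRITE = $(ALLOW_WRITE), remote-gk.example.org

  # /etc/cvmfs/default.local on a Taub worker: route CVMFS traffic through mwt2-squid
  CVMFS_REPOSITORIES=atlas.cern.ch,atlas-condb.cern.ch
  CVMFS_HTTP_PROXY="http://mwt2-squid.campuscluster.illinois.edu:3128"   # 3128 = squid default (assumed)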
Next CC Instance (to be named) Overview
13
• Mix of Ethernet-only and Ethernet + InfiniBand connected nodes
  – Assume 50-100% will be IB-enabled
• Mix of CPU-only and CPU+GPU nodes
  – Assume up to 25% of nodes will have GPUs
• New storage device and support nodes
  – Added to the shared storage environment
  – Allow for other protocols (SAMBA, NFS, GridFTP, GPFS); see the example after this list
• VM hosting and related services
  – Persistent services and other needs directly related to use of compute/storage resources
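As an illustration of the planned multi-protocol access, a GridFTP transfer out of the shared GPFS storage might look like the following; the door hostname and file paths are invented for the example:

  # pull a file from a (hypothetical) GridFTP door onto local disk
  globus-url-copy -vb \
    gsiftp://gridftp-door.campuscluster.illinois.edu/gpfs/mwt2/data/example.root \
    file:///tmp/example.root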
Next CC Instance (basic configuration)
14
• Dell PowerEdge C8220, 2-socket Intel Xeon E5-2670
  – 8-core Sandy Bridge processors @ 2.60 GHz
  – 1 "sled": 2 SB processors
  – 8 sleds in 4U: 128 cores
• Memory configuration options:
  – 2 GB/core, 4 GB/core, 8 GB/core
• Options:
  – InfiniBand FDR (GigE otherwise)
  – NVIDIA M2090 (Fermi GPU) accelerators
  – Storage via DDN SFA12000; can add in 30 TB (raw) increments
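For reference, the quoted chassis density follows directly from the sled layout:

  8 sleds × 2 sockets/sled × 8 cores/socket = 128 cores per 4U chassis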
Dell C8220 compute sled
Summary and Plans
15
• New Tier-2 @ Illinois
  – Modest (currently) resource integrated into MWT2 and in production use
  – Cautious optimism: deploying a Tier-2 within a shared campus cluster is a success
• Near-term plans
  – Buy into 2nd campus cluster instance
    • $160k of FY12 funds with a 60/40 CPU/disk split
  – Continue dCache deployment
  – LHCONE @ Illinois due to turn on 11/20/12
  – Virtualization of Tier-2 utility services
  – Better integration into MWT2 monitoring