TRANSCRIPT
STEINBUCH CENTRE FOR COMPUTING - SCC
www.kit.edu | KIT – University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association
GridKa Site Report
Andreas Petzold
Steinbuch Centre for Computing | Andreas Petzold – GridKa Site Report – HEPiX Ann Arbor 2013
GridKa Batch Farm
Univa Grid Engine is running fine
~150kHS06
~10k job slots
98 replacement machines this summer: SysGen 2U 4-node chassis
2x Intel Xeon E5-2670 (8-core, 2.6 GHz, 312 HS06), 3 GB/core, 3x 500 GB HDD
WN Migration to SL6
Migration of GridKa compute fabric to SL6 finished
Performance: +5.4%
Intel Xeon E5-2670 (8 cores, 2.6 GHz), HT off / HT on:
SL5 + default compiler: 267 HS06 / 335 HS06
SL6 + default compiler: 283 HS06 (+5.8%) / 348 HS06 (+3.9%)
SL5 + gcc-4.8.1: 289 HS06 / 353 HS06
AMD Opteron 6168 (12 cores, 1.9 GHz):
SL5 + default compiler: 183 HS06
SL6 + default compiler: 193 HS06 (+ 5.6 %)
SL5 + gcc-4.8.1: 187 HS06
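As a quick sanity check on the gains quoted above, the percentages can be recomputed from the HS06 scores; the helper function is our own, and the results differ by a couple of tenths from the slide's rounded figures (which were presumably computed from unrounded scores):

```python
# Sketch: relative HS06 gains recomputed from the scores quoted above.
# All scores are the measured values from the slides; only the helper is ours.

def gain(old, new):
    """Percent change from old score to new score."""
    return (new - old) / old * 100

# Intel Xeon E5-2670, SL5 -> SL6, default compiler
print(f"E5-2670 HT off: {gain(267, 283):+.1f}%")   # computed ~ +6.0%
print(f"E5-2670 HT on:  {gain(335, 348):+.1f}%")   # computed ~ +3.9%

# Intel Xeon E5-2670, SL5 default compiler -> gcc-4.8.1, HT off
print(f"E5-2670 gcc-4.8.1: {gain(267, 289):+.1f}%")

# AMD Opteron 6168, SL5 -> SL6, default compiler
print(f"Opteron 6168:   {gain(183, 193):+.1f}%")   # computed ~ +5.5%
```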
Ivy Bridge Benchmarks
New Intel Ivy Bridge processors on the market (E5-26## v2)
Manufacturing process: 0.022 micron
Sandy Bridge: 0.032 micron
Up to 12 cores
Sandy Bridge: up to 8 cores
Increasing HS06 score according to number of cores:
E5-2670 (8 cores, 2.6 GHz, HT on, SL6, default compiler): 348 HS06
E5-2670 v2 (10 cores, 2.5 GHz, HT on, SL6, default compiler): 411 HS06
Power savings of around 25–30%
Thanks to DELL for providing test machine
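To see that the higher Ivy Bridge score tracks the core count, one can normalize the two quoted scores per core and per GHz; this normalization is our own illustration, not from the slides:

```python
# Sketch: normalize the quoted HS06 scores per core and per (core * GHz).
# Scores, core counts, and clocks are from the slides; the comparison is ours.

chips = {
    "E5-2670":    {"hs06": 348, "cores": 8,  "ghz": 2.6},
    "E5-2670 v2": {"hs06": 411, "cores": 10, "ghz": 2.5},
}

for name, c in chips.items():
    per_core = c["hs06"] / c["cores"]
    per_core_ghz = per_core / c["ghz"]
    print(f"{name}: {per_core:.1f} HS06/core, "
          f"{per_core_ghz:.2f} HS06/(core*GHz)")
```

Per core and clock the two parts come out nearly equal (about 16.7 vs. 16.4 HS06 per core-GHz), so the overall gain is essentially the two extra cores.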
Power Efficiency
Power Usage (W) per Performance Score (HS06)
Worker node class machines at GridKa / E5-2670v2 is a test system provided by DELL
[Chart: Watts per HS06 for worker-node classes procured 2004–2013 (y-axis 0–20 W/HS06): AMD 246, AMD 270, Intel 5160, Intel E5345, Intel E5430, Intel L5420, Intel E5520 (HT on), AMD 6168, Intel E5-2670 (HT on), Intel E5-2670v2 (HT on)]
GridKa dCache & xrootd
6 production dCache instances + pre-production setup
5 instances running 2.6, 1 running 2.2
9 PB, 287 pools on 58 servers
Upgrade to 2.6 instead of 2.2 recommended by dCache.org
Last-minute decision one week before the planned downtime
full support for SHA-2 and xrootd monitoring
great support from dCache devs
CMS disk-tape separation
Most CMS tape pools converted to disk-only pools
last CMS config changes today
GridKa is the first CMS Tier-1 to be successfully migrated
Two xrootd instances for ALICE
2.7 PB
15 servers
GridKa Disk Storage
9x DDN S2A9900
150 enclosures
9000 disks
796 LUNs
SAN: Brocade DCX
1x DDN SFA10K
10 enclosures
600 disks
1x DDN SFA12K
5 enclosures
360 disks
14 PB usable storage
Evaluating new Storage Solutions
DDN SFA12K-E allows running server VMs directly in the storage controller
DDN is testing a complete dCache instance inside the controller
Expected benefits: shorter IO paths (no SAN + FC HBAs), reduced latency
Less hardware: lower power consumption, improved MTBF
Possible drawbacks: limited resources for VMs in the storage controllers
Loss of redundancy
DDN SFA12K-E
Glimpse at Performance
Preliminary performance evaluation: IOZONE testing with 30–100 parallel threads on an XFS file system
Still a lot of work ahead: no tuning yet
File-system and controller-setup tuning to come
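The IOZONE runs themselves are not shown here; as a rough illustration of what such a multi-threaded throughput test does, the following minimal Python sketch (not IOZONE, with placeholder paths and sizes rather than the GridKa setup) measures aggregate write bandwidth from N parallel writer threads:

```python
# Minimal sketch of a parallel write-throughput test (not IOZONE itself).
# Directory and file sizes are placeholders, not the actual test parameters.

import os
import tempfile
import threading
import time

def writer(path, size_mb):
    """Write size_mb MiB to path in 1 MiB blocks and flush to disk."""
    block = b"\0" * (1 << 20)
    with open(path, "wb") as f:
        for _ in range(size_mb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())

def run(threads=4, size_mb=16, directory=None):
    """Return aggregate write throughput in MB/s for N parallel writers."""
    directory = directory or tempfile.mkdtemp()
    start = time.perf_counter()
    workers = [
        threading.Thread(target=writer,
                         args=(os.path.join(directory, f"f{i}"), size_mb))
        for i in range(threads)
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    elapsed = time.perf_counter() - start
    return threads * size_mb / elapsed

if __name__ == "__main__":
    print(f"{run():.1f} MB/s aggregate write")
```

A real evaluation would point the writers at the XFS mount under test, sweep the thread count, and repeat for read and mixed workloads, which is what IOZONE's threaded mode automates.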
GridKa Tape Storage
2x Oracle/Sun/STK SL8500
2x 10088 slots
22 LTO5, 16 LTO4 drives
1x IBM TS3500
5800 slots
24 LTO4 drives
1x GRAU XL
5376 slots
16 LTO3, 8 LTO4 drives
>20k cartridges
17 PB
Migration to HPSS planned for 2014
100G WAN at GridKa
Current WAN setup: 7x 10 Gb/s links to LHCOPN, LHCONE, the German research network, and FZU Prague + 1x 1 Gb/s link to Poznan
Participation in 100G tests at SC2013
100G equipment provided by Cisco
100G connection provided by DFN, time-shared by Aachen, Dresden, and KIT
Plan to move LHCOPN and LHCONE to the 100G link in 2014
Replace old Catalyst border routers
Procurement of new Nexus 7k with 100G line cards already underway
Requires a new arrangement of LHCOPN operation between KIT and DFN
Configuration Management
Still mostly using Cfengine 2
Middleware services used as a testbed for Puppet
Started in early 2012
Still based on the old homegrown deployment infrastructure “CluClo”
Very smooth operation
Now starting to draw up plans for the Puppet migration; we’d like to try many new things: Git integration, deployment management with Foreman, MCollective, …
Will be a step-by-step process
bwLSDF
News from SCC/KIT outside GridKa
New services for the state of Baden-Württemberg run by SCC/KIT:
bwSync&Share
“Dropbox” for scientists
Winner of the software evaluation: PowerFolder
Start of production: Jan 1st, 2014
Expect 55k active users from all universities, 10 GB quota
bwFileStorage
Simple/overflow storage for scientific data
Access via SCP, SFTP, HTTPS (r/o); provided by IBM SONAS
Start of production: Dec 1st, 2013
bwBlockStorage
iSCSI storage over WAN for universities
All services based on storage hosted at the Large Scale Data Facility
bwIDM
The bwIDM Project: Vision
• Federated access to services of the State of Baden-Württemberg
• Access control based on local accounts of the home organizations
“bwIDM is not about establishing IDM systems, it’s about federating existing IDM systems and services.”
[Diagram: bwIDM at the centre, connected to bwLSDF, bwCloud, bwArchive, bwData, bwHPC, and bwServices]
20.6.2013 M.Nussbaumer@ISC2013 | Federating HPC access via SAML
Vision: In the state of Baden-Württemberg, researchers can access decentralized web-based AND non web-based services by use of their local account.
bwIDM
bwIDM Overview
• bwIDM
– …is a federation of 9 universities of the state of Baden-Württemberg → (non) web-based services
– …federates access to non web-based services such as grid, cloud, and HPC resources
• LDAP facade
– Deployable, operable, and maintainable approach to federating non web-based services
• The LDAP facade makes active use of the SAML-ECP and Assertion Query profiles
• The LDAP facade offers users high usability in trustworthy federations
• The LDAP facade facilitates temporary trust for scientific portals
• Easy-to-deploy solution for service collaborations of universities, research centres, or companies
• Single registration process per service → service access
• Successfully deployed in testing environments
• Deployed services
– Federated HPC service “bwUniCluster” (8640 cores, 40.8 TiB RAM, IB FDR) going live in Q4/2013
– Federated Sync&Share service going live in Q1/2014
• Any Questions? Feel free to contact me: [email protected].
If you have to bring non web-based services together with SAML, make use of the LDAP facade!