CERN Computing Fabric Status
LHCC Review, 19th November 2007
Bernd Panzer-Steindel, CERN/IT
-
Coarse-grain functional differences in the CERN computing fabric
T0: Central Data Recording, first-pass processing, tape migration, data export to the Tier-1 sites
CAF: selected data copies from the T0 (near real-time), calibration and alignment, analysis; T1/T2/T3 functions mean something different for each experiment
Both contain the same hardware; the distinction is made via logical configurations in the batch system and the storage system.
Ingredients:
-- CPU nodes for processing (~65% of the total capacity for the T0)
-- Disk servers for storage (~40% of the total capacity for the T0)
-- Tape libraries, tape drives and tape servers
-- Service nodes
-
Growth rates based on the latest experiment requirements at CERN:
~linear growth rates! Underestimates? Experience from the past shows exponential growth rates.
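To illustrate why a linear extrapolation can underestimate, a minimal sketch; all numbers below are invented, only the shape of the two curves matters:

    # Minimal sketch: linear vs. exponential capacity extrapolation.
    # Starting capacity, yearly increment and growth factor are all
    # invented for illustration.
    start = 10.0    # capacity in year 0 (arbitrary units)
    step = 10.0     # assumed fixed yearly increment (linear case)
    factor = 1.8    # assumed yearly multiplier (exponential case)
    for year in range(1, 6):
        linear = start + step * year
        exponential = start * factor ** year
        print(f"year {year}: linear {linear:6.1f}   exponential {exponential:8.1f}")

After five years the exponential curve is already a factor ~4 above the linear one.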
-
Preparations for 2008 I
Tendering preparations for the 2008 purchases started in May 2007.
Deliveries of equipment have started; the first ~100 nodes have arrived and are being installed.
More deliveries are spread over the next 4 months.
Heavy logistics operations ongoing:
-- preparation for the installation of ~2300 new nodes
-- racks, rack preparations, power + console + network cabling, shipment and unpacking, installation (physical and logical), quality control and burn-in tests
-- preparations to retire ~1000 old nodes during the next few months
-
Preparations for 2008 II
Resource increase in 2008:
1. more than doubling the amount of CPU resources: ~1200 CPU nodes (~4 MCHF)
2. increasing the disk space by a factor of 4: ~700 disk servers (~6 MCHF)
The experiment requirements for disk space this year were underestimated; we had to increase disk space by up to 50% during the various data challenges and productions.
3. increase and consolidation of redundant and stable services: ~350 service nodes (~3 MCHF)
Grid services (CE, RB, UI, etc.), Castor, Castor databases, condition databases, VO-boxes, experiment-specific services (bookkeeping, production steering, monitoring, etc.), build servers. Don't underestimate the service investments!
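Dividing the quoted prices by the node counts gives the implied per-unit costs; a minimal sketch using only the figures from this slide:

    # Per-unit cost implied by the 2008 purchase figures above.
    purchases = {
        "CPU node":     (4e6, 1200),  # ~4 MCHF for ~1200 CPU nodes
        "disk server":  (6e6,  700),  # ~6 MCHF for ~700 disk servers
        "service node": (3e6,  350),  # ~3 MCHF for ~350 service nodes
    }
    for item, (chf, units) in purchases.items():
        print(f"{item}: ~{chf / units / 1e3:.1f} kCHF per unit")

A disk server or service node comes out at roughly 2.5 times the cost of a CPU node.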
-
Power and Cooling
The current computer centre has a capacity of 2.5 MW for powering the nodes and 2.5 MW of cooling capacity. A battery-based UPS system allows 10 minutes of autonomy for the 2.5 MW.
Power for critical nodes is limited to about 340 kW (capacity backed up by the CERN diesel generators); no free capacity is left (DB systems, network, AFS, Web, Mail, etc.).
We will reach ~2 MW already in March 2008 and will not be able to host the full required capacity in 2010.
Activities started already more than a year ago; after slow progress there is now an active discussion between the IT, PH and TS departments.
Identify a building in Prevessin (Meyrin does not have enough power available) and start preparations for the infrastructure upgrade.
Budget is already foreseen.
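A back-of-the-envelope cross-check of the power situation; the per-node power draws below are assumptions typical for 2007-era hardware, not CERN measurements:

    # Node counts from the slides; watts per node are assumed values.
    new_nodes = {
        "CPU node":     (1200, 350),  # (count, assumed W per node)
        "disk server":  ( 700, 400),
        "service node": ( 350, 250),
    }
    added_mw = sum(n * w for n, w in new_nodes.values()) / 1e6
    print(f"assumed extra load from the 2008 nodes: ~{added_mw:.2f} MW")
    # Retiring ~1000 old nodes gives some of this back; the ceiling
    # to compare against is the 2.5 MW centre capacity.

Under these assumptions the 2008 installation alone adds close to 0.8 MW, which is why the ~2 MW level is expected already in March 2008.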
-
Material Budget
Based on the latest round of requirement gathering from the experiments during the late summer period.
Includes provisioning money for a new computer centre.
Presented to the CCRB on the 23rd of October.
Covers CPU servers, disk storage, tape storage and infrastructure, service nodes, Oracle database infrastructure, LAN and WAN network, testbeds; the new CC costs are spread over 10 years. Small deficit, but within the error of the cost predictions.

[MCHF]            2008   2009   2010   2011   2012
Material budget   31.2   23.4   22.2   22.2   22.2
Balance            0.9   -1.3   -1.9   -0.1   -1.2
-
Processors I
[charts: cost of a full server node; cost of a separate single processor]
-
Processors II
Less than 50% of a node's cost is the processors plus memory.
2007 was a special year: a heavy price war between Intel and AMD, with Intel pushing their quad-cores (even competing with their own products).
New trend: dual motherboards per 1U unit, with very good power-supply efficiencies, as good as for blades.
Our purchases will consist of these nodes, with a possibility of also getting blades.
-
Processors III
Technology trends:
The aim is now a two-year cycle of architecture improvements and structure reductions (45 nm products already announced by Intel); multi-core: 2, 3, 4, 6, 8 cores.
BUT what to do with the expected billion transistors and multi-cores?
The market is not clear; widespread activities of Intel, e.g.:
-- initiatives to get multithreading into the software; quite some time away, complicated, especially the debugging (we have a hard time getting our simple programs to work)
-- co-processors (audio, video, etc.)
-- merging of CPU and GPU (graphics): AMD + ATI combined processors, NVIDIA using the GPU as a processor, Intel moving graphics onto the cores
-- on-the-fly re-programmable cores (FPGA-like)
Not clear where we are going; specialized hardware in the consumer area could change the price structure for us.
-
Memory I
-
Memory II
Still monthly fluctuations in costs, up and down.
Large variety of memory modules in frequency and latency: 533 vs 667 MHz is about a 10% cost difference, and a factor 2 for 1 GHz; higher frequency goes along with a higher CAS latency.
How does HEP code depend on memory speed?
DDR3 is upcoming, more expensive in the beginning.
Is 2 GB per core really enough?
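The frequency/latency trade-off in concrete terms: absolute CAS latency is the CAS cycle count divided by the memory clock. A small sketch with typical (illustrative) DDR2 module pairings:

    # Absolute CAS latency [ns] = CAS cycles / memory clock frequency.
    # DDR2 transfers twice per clock, so DDR2-667 runs a ~333 MHz clock.
    # The CL values are typical for such modules (illustrative only).
    modules = [
        ("DDR2-533, CL4", 266.7e6, 4),
        ("DDR2-667, CL5", 333.3e6, 5),
    ]
    for name, clock_hz, cl in modules:
        print(f"{name}: {cl / clock_hz * 1e9:.1f} ns")

Both come out at ~15 ns: the higher frequency with a proportionally higher CL leaves the absolute latency unchanged, which is one reason raw memory frequency alone says little about how HEP code will behave.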
-
Disk storage I
[charts: cost of a full disk server node; cost of a separate single disk]
-
Disk storage II
Trends:
Cost evolution of single disks is still good (~factor 2 per year, model dependent).
Lots of infrastructure is needed: an upgrade of CPU and memory on the servers to match the footprint of the applications (RFIO, GridFTP, buffers) and new functions (checksums, RAID5 consistency checks, data-integrity probes).
We need disk space AND spindles: use smaller disks or buy more, which increases the overall costs.
Solid-state disks are much more expensive (factor ~50); relevant for the database area.
Hybrid disks are good for Vista (at least in the future; they do not work yet), but at a higher price: e.g. new Seagate disks with 256 MB of flash cost ~25% more. A general trend for notebooks, but we can't profit in our environment: seldom any cache reuse.
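The space-versus-spindles trade-off in numbers; the 1 PB target and the disk sizes below are illustrative assumptions:

    # Same usable capacity built from smaller disks gives more spindles
    # (more aggregate IO) but more hardware to buy, power and house.
    target_tb = 1000   # assumed usable capacity target: 1 PB
    for disk_gb in (250, 500, 750):
        spindles = target_tb * 1000 // disk_gb
        print(f"{disk_gb} GB disks: {spindles} spindles for {target_tb} TB")

Halving the disk size doubles the spindle count, and with it the aggregate IO capability and the overall cost.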
-
Internal Network I
The physical network topology (connections of nodes to switches and routers) is defined by space, electricity, cooling and cabling constraints.
[diagram: network router, service nodes, disk servers, CPU servers]
-
Internal Network II
Logical network topology: changing access patterns, high aggregate IO on the disk servers.
3000 nodes running 16000 concurrent physics applications are trying to access 1000 disk servers with 22000 disks.
[diagram: CPU servers, disk servers]
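Simple division of these figures gives the average concurrency each server and disk must sustain:

    # Average concurrency implied by the figures on this slide.
    applications = 16000   # concurrent physics applications
    disk_servers = 1000
    disks = 22000
    print(f"~{applications / disk_servers:.0f} streams per disk server on average")
    print(f"~{applications / disks:.1f} streams per disk on average")

An average of ~16 concurrent streams per disk server, with hot data sets pushing individual servers far above that.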
-
Internal Network III
Need to upgrade the internal network infrastructure: decrease the blocking factor on the switches, i.e. spread the existing servers over more switches.
Changes since the 2005 LCG computing TDR:
-- disk space increased by 30%
-- concurrently running applications increased by a factor of 4 (multi-core technology evolution)
-- computing-model evolution: more high-IO applications (calibration and alignment, analysis)
Doubling the number of connections (switches) to the network core routers, which as a consequence also requires doubling the number of routers.
Additional investment of 3 MCHF in 2008 (already approved by the Finance Committee).
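What decreasing the blocking factor means in numbers; the port speeds and counts below are invented for illustration:

    # Blocking factor of an edge switch = total downlink capacity to
    # the servers / uplink capacity to the core. Illustrative values.
    def blocking_factor(servers, server_gbps=1.0, uplink_gbps=10.0):
        return servers * server_gbps / uplink_gbps

    print(blocking_factor(40))  # 40 servers on one switch  -> 4.0 (4:1)
    print(blocking_factor(20))  # spread over two switches  -> 2.0 (2:1)

Spreading the same servers over twice as many switches halves the oversubscription of each uplink, at the cost of twice as many core router ports.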
-
Batch System
Some scalability and stability problems in spring and early summer; solved with the upgrade to LSF 7 and a hardware upgrade of the LSF control nodes.
Much improved response time; throttling bottlenecks removed.
On average about 75000 jobs/day, peak value 115000 jobs/day, up to 50000 jobs in the queue at any time; tested with 500000 jobs.
-
Tape Storage
Today we have:
-- 10 PB of data on tape, 75 million files
-- 5 silos with ~30000 tapes, ~5 PB of free space
-- 120 tape drives (STK and IBM)
During the last month 3 PB were written to tape and 2.4 PB read from tape; small files and the spread of data sets over too many tapes caused a very high mount load in the silos.
Space will be increased to 8 PB free in the next 3-4 months, with more drives to cope with high recall rates; small files need Castor improvements.
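The monthly volumes imply an average aggregate throughput (a simple average over a 30-day month):

    # Average aggregate tape throughput implied by the monthly volumes.
    written_pb, read_pb, drives = 3.0, 2.4, 120
    month_s = 30 * 24 * 3600
    aggregate_mb_s = (written_pb + read_pb) * 1e9 / month_s  # PB -> MB
    print(f"aggregate: ~{aggregate_mb_s:.0f} MB/s")
    print(f"per drive, if all {drives} were busy: ~{aggregate_mb_s / drives:.0f} MB/s")

~17 MB/s per drive is well below the native streaming rate of drives of that era, consistent with the mount and positioning overheads caused by small files and scattered data sets.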
-
CASTOR
Much improved stability and performance during the summer period (Castor Task Force): CMS CSA07, ATLAS export tests and the M5 run; regular running at nominal speed (with 100% beam efficiency assumed).
Very high load on the disk servers; small-scale problems observed, identified and fixed (Castor + experiments). Complex patterns and a large number of IO streams require more disk space for the T0 (probably a factor of 2).
Successful coupling of DAQ and T0 for LHCb, ALICE and ATLAS (not yet at 100% nominal speed); CMS is planned for the beginning of next year.
-
Data Export
ATLAS successfully demonstrated for several days their nominal data export speed (~1030 MB/s), all in parallel to the CMS CSA07 exercise.
No Castor issues, no internal network issues.
-
Data Management
Enhanced Castor disk-pool definitions, an activity in close collaboration with the experiments; new Castor functionalities are now available (access control) to avoid disk-server overload and to get better tape-recall efficiencies.
Small files create problems for the experiment bookkeeping systems and the HSM tape system; Castor improvements are needed in the tape area (some amount of small files will be unavoidable). The experiments are investing in file-merging procedures, which creates more IO streams and activity and needs more disk space.
Data integrity: the deployment and organization of data checksums needs more work and will create more IO and bookkeeping (a sketch follows below).
CPU and data-flow efficiency: to increase the efficiencies, the four large functional units have to be integrated much more closely (information exchange).
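A minimal sketch of the kind of chunked file checksumming described above, assuming Adler-32 as the algorithm because it is cheap to compute; the slide does not name the algorithm actually deployed:

    import zlib

    # Minimal sketch: compute an Adler-32 checksum of a file in chunks,
    # so the whole file never has to sit in memory. Adler-32 is an
    # assumption here, not confirmed by the slide.
    def file_adler32(path, chunk_size=1 << 20):
        checksum = 1                    # Adler-32 initial value
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                checksum = zlib.adler32(chunk, checksum)
        return checksum & 0xFFFFFFFF

    # e.g. print(f"{file_adler32('some_file.raw'):08x}")

Every such pass over the data is an extra full read of the file, which is the "more IO and bookkeeping" referred to above.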
-
Summary
Large-scale logistics operation ongoing for the 2008 resource upgrades.
Very good Castor performance and stability improvements.
Large-scale network (LAN) upgrade has started.
Successful stress tests and productions from the experiments (T0 and partly CAF).
Power and cooling growth rates require a new computer centre; planning has started.