Computing at Norwegian Meteorological Institute
Roar Skålin, Director of Information Technology, Norwegian Meteorological Institute
[email protected]
CAS 2003 – Annecy, 10.09.2003
Norwegian Meteorological Institute met.no
Norwegian Meteorological Institute
• Main office in Oslo
• Regional offices in Bergen and Tromsø
• Aviation offices at four military airports and Spitsbergen
• Three arctic stations: Jan Mayen, Bear Island and Hopen
• 430 employees (1.1.2004)
met.no Computing Infrastructure
[Diagram: met.no in Oslo and NTNU in Trondheim, linked by routers (2.5 Gbit/s backbone, 155 Mbit/s access) and local switches (1 Gbit/s, 100 Mbit/s).
NTNU, Trondheim: SGI O3800 (512/512/7.2) and SGI O3800 (384/304/7.2) sharing a CXFS filesystem; Climate Storage (2/8/20); Backup Server SGI O200/2; S-AIT tape library (33 TB); DLT tape library (20 TB).
met.no, Oslo: Production servers Dell 4/8 (Linux) and Dell 2/4 (Linux); NetApp NAS (790 GB); Scali Cluster (20/5/0.3); Storage Server SGI O200/2; STA tape library (5 TB); DLT tape library (20 TB); additional cluster ("XX Cluster", y/y/y on the slide).
x/y/z = processors/GB memory/TB disk]
met.no Local Production Servers

Production environment, November 2003:
• Dell PowerEdge servers with two and four CPUs
• NetApp NAS
• Linux
• ECMWF Supervisor Monitor Scheduler (SMS)
• Perl, shell, Fortran, C++, XML, MySQL, PostgreSQL
• Cfengine
Linux replaces proprietary UNIX at met.no

Advantages:
– Off-the-shelf hardware replaces proprietary hardware
  • Reduced cost of new servers
  • Reduced operational costs
– Overall increased stability
– Easier to fix OS problems
– Changing hardware vendor becomes feasible
– Become an attractive IT employer with highly motivated employees

Disadvantages:
– Cost of porting software
– High degree of freedom: a Linux distribution is as many systems as there are users
Data storage: A critical resource

• We may lose N-1 production servers and still be up and running, but data must be available everywhere, all the time
• We used to duplicate data files, but increased use of databases reduces the value of this strategy
• met.no replaced a SAN with a NetApp NAS because of:
  – availability
  – Linux support
  – "sufficient" I/O bandwidth (40–50 MB/s per server)
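The quoted 40–50 MB/s per server is easy to sanity-check. A minimal sketch (not met.no's actual tooling; the path and sizes are placeholders) that measures sequential write bandwidth to a mounted filesystem:

```python
import os
import time

def write_bandwidth_mb_s(path, total_mb=64, block_kb=1024):
    """Write total_mb of zeros in block_kb chunks; return MB/s."""
    block = b"\0" * (block_kb * 1024)
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(total_mb * 1024 // block_kb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # force the data out to the server, not just the page cache
    elapsed = time.time() - start
    os.remove(path)
    return total_mb / elapsed

# Point the path at a file on the NAS mount under test
print(round(write_bandwidth_mb_s("/tmp/bw_probe.dat"), 1), "MB/s")
```

Without the `fsync`, the measurement mostly times the local page cache rather than the network storage.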
HPC in Norway – A national collaboration
Performance available to met.no
[Chart: Mflop/s on a logarithmic scale (10² to 10⁸) versus year, 1989–2003. Series: met.no peak, HIRLAM sustained, and Top500 peak. Machine generations: CRAY X-MP, CRAY Y-MP, CRAY T3E, SGI O3000.]
met.no Production Compute Servers: SGI Origin 3800
Embla:
• 512 MIPS R14K processors
• 614 Gflops peak
• 512 GB memory
Gridur:
• 384 MIPS R14K processors
• 384 Gflops peak
• 304 GB memory
IRIX OS / LSF batch system
7.2 TB CXFS filesystem
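The peak figures follow directly from clock rate and the R14K's one multiply-add (two flops) per cycle. The clock rates below (600 MHz for Embla, 500 MHz for Gridur) are inferred from the quoted peaks rather than stated on the slide:

```python
def peak_gflops(processors, clock_ghz, flops_per_cycle=2):
    """Peak = processors x clock (GHz) x flops per cycle (one FMA = 2 flops)."""
    return processors * clock_ghz * flops_per_cycle

print(peak_gflops(512, 0.6))  # 614.4 -> Embla's quoted 614 Gflops
print(peak_gflops(384, 0.5))  # 384.0 -> Gridur's quoted 384 Gflops
```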
Atmospheric models
Model             HIRLAM 20   HIRLAM 10   HIRLAM 5   UM        MM5        MM5
Purpose           Operation   Operation   Exp        Exp       Air Poll.  Air Poll.
Resolution        20 km       10 km       5 km       3 km      3 km       1 km
Layers            40          40          40         38        17         17
Time step         360 s       240 s       120 s      75 s      9 s        3 s
Grid points       468×378     248×341     152×150    280×276   61×76      61×76
Forecast time     60 h        48 h        48 h       48 h      48 h       48 h
Result data/24 h  8 GB        1.2 GB      0.3 GB     2.1 GB    0.1 GB     0.1 GB
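Assuming roughly equal cost per grid point per time step (an illustrative simplification, not a figure from the slides), the table translates into relative forecast costs. Note how the nested fine-resolution runs stay cheaper than HIRLAM 20 because their domains are much smaller:

```python
def rel_cost(nx, ny, layers, dt_s, forecast_h):
    """Relative cost ~ grid points x layers x number of time steps."""
    steps = forecast_h * 3600 / dt_s
    return nx * ny * layers * steps

hirlam20 = rel_cost(468, 378, 40, 360, 60)
hirlam10 = rel_cost(248, 341, 40, 240, 48)
hirlam5  = rel_cost(152, 150, 40, 120, 48)

print(round(hirlam10 / hirlam20, 2))  # ~0.57
print(round(hirlam5 / hirlam20, 2))   # ~0.31
```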
Oceanographic models

Model             MIPOM22    ECOM3D     WAVE               ECOM3D
Purpose           Operation  Operation  Operation          Exp
Resolution        4 km       20+20 km   45/8 km            4 km/300 m
Layers            17         21/5       –                  17
Time step         150 s      600 s      –                  360/50 s
Grid points       1022×578   208×120    142×113, 121×163   390×250
Forecast time     60 h       60 h       60 h               60 h
Result data/24 h  1 GB
Production timeline
[Timeline: daily production schedule from 00:00 to 24:00 for HIRLAM20, HIRLAM10, HIRLAM5, UM, MM5, ECOM3D/WAVE, ECOM3D and MIPOM22.]
HIRLAM scales, or …?

• The forecast model without I/O and support programs scales reasonably well up to 512 processors on an SGI O3800
• In real life:
  – data transfer, support programs and I/O have very limited scaling
  – there are other users of the system
  – machine-dependent modifications to increase scaling have a high maintenance cost for a shared code such as HIRLAM
• For cost-efficient operational use, 256 processors seems to be the limit
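A 256-processor practical limit is what Amdahl's law predicts once the non-scaling parts (I/O, data transfer, support programs) take even a fraction of a percent of the work. The 0.5% serial fraction below is purely illustrative:

```python
def speedup(p, serial_fraction):
    """Amdahl's law: speedup on p processors with serial fraction s."""
    s = serial_fraction
    return 1.0 / (s + (1.0 - s) / p)

# Doubling from 256 to 512 processors gains only ~28% at s = 0.5%
for p in (128, 256, 512):
    print(p, round(speedup(p, 0.005), 1))
```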
How to utilise 896 processors operationally?

• Split into two systems of 512 and 384 processors, used as primary and backup systems
• Will test a system to overlap model runs based on dependencies:

[Diagram: dependency graph linking HIRLAM 20, HIRLAM 10, ECOM3D, HIRLAM 5 and WAVE.]
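A minimal sketch of dependency-driven overlap: each pass collects the jobs whose prerequisites have finished and runs them concurrently. The dependency edges below are illustrative assumptions for the sketch, not the tested met.no configuration:

```python
# Assumed (illustrative) dependencies: HIRLAM 10 is nested in HIRLAM 20,
# and the finer and ocean models take input from HIRLAM 10.
deps = {
    "HIRLAM20": [],
    "HIRLAM10": ["HIRLAM20"],
    "ECOM3D":   ["HIRLAM10"],
    "HIRLAM5":  ["HIRLAM10"],
    "WAVE":     ["HIRLAM10"],
}

def next_batch(done, deps):
    """Jobs not yet run whose dependencies have all completed."""
    return sorted(j for j, d in deps.items()
                  if j not in done and all(x in done for x in d))

done = set()
while len(done) < len(deps):
    batch = next_batch(done, deps)
    print("run concurrently:", batch)
    done.update(batch)
# run concurrently: ['HIRLAM20']
# run concurrently: ['HIRLAM10']
# run concurrently: ['ECOM3D', 'HIRLAM5', 'WAVE']
```

The last batch is where overlap pays off: three model runs share the machine instead of queueing one after another.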
Overlapping Production Timeline
[Timeline: the daily schedule from 00:00 to 24:00 with HIRLAM20, HIRLAM10, HIRLAM5, UM, MM5, ECOM3D/WAVE, ECOM3D and MIPOM22 runs overlapped according to dependencies.]
RegClim: Regional Climate Development Under Global Warming

Overall aim:
• Produce scenarios for regional climate change suitable for impact assessment
• Quantify uncertainties

Some keywords:
• Involves personnel from met.no, universities and research organisations
• Based on global climate scenarios
• Dynamical and empirical downscaling
• Regional and global coupled models
• Atmosphere – ocean – sea-ice
Climate Computing Infrastructure
[Diagram: NTNU, Trondheim: SGI O3000 (512/512/7.2) and SGI O3000 (384/304/7.2) sharing a CXFS filesystem, Climate Storage (2/8/20) and S-AIT tape library (33 TB). Para//ab, Bergen: IBM Cluster (64/64/0.58), IBM p690 Regatta (96/320/7) and IBM 3584 tape library (12 TB). Sites linked by routers at 2.5 Gbit/s and 155 Mbit/s.
x/y/z = processors/GB memory/TB disk]
Climate Storage Server
Low-cost solution:
• Linux server
• Brocade switch
• Nexsan AtaBoy/AtaBeast RAID, 19.7 TB
• 34 TB Super-AIT library, tape capacity 0.5 TB uncompressed
GRID in Norway
• Test grid comprising experimental computers at the four universities
• Globus 2.4 → 3.0
• Two experimental portals: Bioinformatics and Gaussian
• Testing of large datasets (Storage Resource Broker) and metascheduling planned for autumn 2003
• Plan to phase in production supercomputers in 2004