Angèle Simard, Canadian Meteorological Centre, Meteorological Service of Canada (MSC): Computing Update
TRANSCRIPT
Canadian Meteorological Centre
National Meteorological Centre for Canada
Provides services to National Defence and emergency organizations
International WMO commitments (emergency response, telecommunications, data, etc.)
Supercomputer used for NWP, climate, air quality, etc.; operations and R&D
MSC's Supercomputing History
[Chart: peak performance in MFLOPS (log scale, 1 to 1,000,000) versus year, 1974-2002, for the systems: CDC 7600, CDC 176, Cray 1S, Cray XMP 2/8, Cray XMP 4/16, NEC SX-3/44, NEC SX-3/44R, NEC SX-4/80M3, NEC SX-5/32M2, NEC SX-6/80M10]
CMC Supercomputer Infrastructure
[Diagram of components:]
Supercomputer: NEC SX-6/80M10, with 3.6 TB RAID, FC switched (128 ports)
High-speed network: 1000 Mb/s switched, 128 ports; each host has links to ops & dev nets
Central file server: SGI O300 (8 PEs), 1 TB, 4x FC, LSI RAID
Tape robot: ADIC AML-E, 145 TB, 4 DST drives at 20 MB/s each
Front ends: SGI Origin 3000 (8), Netapp F840 (.05 TB), Linux cluster (12 DL380)
Archive Statistics
[Chart: archive size, new data written, and data read, in TB (0-200), by year, 1996-2002]
RFP/Contract Process
Competitive process
Formal RFP released in mid-2002
Bidders had to meet all mandatory requirements to advance to the next phase
Bids above the funding envelope were rejected
Bidders were rated on performance (90%) and price (10%)
IBM came first
Live preliminary benchmark at an IBM facility
Contract awarded to IBM in November 2002
IBM Committed Sustained Performance in MSC TFlops
[Chart: sustained TFlops for the initial, upgrade, and optional upgrade systems: 1.042, 2.47996, and 5.89251 respectively]
On-going Support
The systems must achieve a minimum of 99% availability.
Remedial maintenance 24/7: 30-minute response time between 8 A.M. and 5 P.M. on weekdays; one-hour response time outside those periods.
Preventive maintenance or engineering changes: maximum of eight (8) hours a month, in blocks of time not exceeding two (2) or four (4) hours per day (subject to certain conditions).
Software support: provision of emergency and non-emergency assistance.
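The availability floor and the maintenance cap can be cross-checked with a little arithmetic; the helper below is illustrative, not part of the contract:

```python
# Illustrative only: converts an availability target into a downtime budget.
def max_downtime_hours(availability, hours_in_period):
    """Downtime allowed while still meeting the availability target."""
    return (1.0 - availability) * hours_in_period

# 99% availability over a 30-day month allows about 7.2 hours of downtime,
# the same order as the 8-hour monthly cap on preventive maintenance.
print(round(max_downtime_hours(0.99, 30 * 24), 1))  # 7.2
```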
CMC Supercomputer Infrastructure
[Diagram of components:]
Supercomputers: IBM SP (832 PEs & 1.6 TB)
High-speed network: 1000 Mb/s switched, 128 ports; each host has links to ops & dev nets
Central file server: SGI O300 (8 PEs), 1 TB, 4x FC, LSI RAID
Tape robot: ADIC AML-E, 145 TB, 4 DST drives at 20 MB/s each
Front ends: SGI O3000's (24 PEs, MIPS R12K), Netapp F840 (.05 TB), Linux cluster (12 DL380)
Supercomputers: Now and Then…
NEC
– 10 nodes
– 80 PEs
– 640 GB memory
– 2 TB disks
– 640 GFlops (peak)
IBM
– 112 nodes
– 928 PEs
– 2.19 TB memory
– 15 TB disks
– 4.825 TFlops (peak)
Transition to IBM
Prior to conversion:
Developed and tested code on multiple platforms (MPP & vector)
Where possible, avoided proprietary tools
When required, hid proprietary routines under a local API to centralize changes
Following contract award, obtained early access to IBM hardware (thanks to NCAR)
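The "local API" bullet describes a classic portability pattern. A minimal sketch, in Python for brevity (the operational code was not Python, and every name below is invented): vendor-specific routines are reached only through one local module, so a platform switch touches a single file.

```python
# Sketch of hiding proprietary routines behind a local API. All names are
# hypothetical; real NWP code of this era would be Fortran calling
# vendor-optimized math libraries.

PLATFORM = "ibm"  # hypothetical build-time switch: "nec", "ibm", or other

def _portable_solve(data):
    # Stand-in for a portable reference implementation.
    return list(data)

# In real code these would bind to the vendors' optimized libraries;
# here they are stubs so the sketch runs anywhere.
_nec_solve = _portable_solve
_ibm_solve = _portable_solve

def solve(data):
    """Local API: the rest of the code calls this, never a vendor routine."""
    if PLATFORM == "nec":
        return _nec_solve(data)
    if PLATFORM == "ibm":
        return _ibm_solve(data)
    return _portable_solve(data)
```

With this layout, an NEC-to-IBM conversion changes only the dispatch module, which is the centralization the slide describes.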
Transition to IBM
Application Conversion Plan:
Plan was made to comply with a tight schedule
A key element of the plan: rapid implementation of an operational "core" (to test the mechanics)
Phased approach for optimized versions of the models
End-to-end testing scheduled to begin mid-September
Scalability
Rule of thumb:
To obtain the required wall-clock timings, roughly 4 times more IBM processors are needed to match the performance of the NEC SX-6
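As a worked example of the rule (the function name is ours, not MSC's): the GEMDM regional run used 8 CPUs on the NEC, so about 32 IBM PEs are needed, matching the 8 MPI x 4 OpenMP (32 total) configuration in the preliminary results.

```python
# Worked example of the 4:1 rule of thumb; the function is illustrative.
def ibm_pes_needed(nec_pes, ratio=4):
    """IBM PEs needed to match a NEC SX-6 CPU count, per the rule of thumb."""
    return nec_pes * ratio

print(ibm_pes_needed(8))  # 32, i.e. 8 MPI ranks x 4 OpenMP threads
print(ibm_pes_needed(4))  # 16, the global model's configuration
```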
Preliminary Results

                               Timing (min)    Configuration
                               NEC    IBM      NEC      IBM
GEMDM regional (48 hour)       38     34       8 MPI    8 MPI x 4 OpenMP (32 total)
GEMDM global (240 hour)        35     36       4 MPI    4 MPI x 4 OpenMP (16 total)
3D-Var (R1) (without AMSU-B)   11     09       4 CPU    6 CPU OpenMP
3D-Var (G2) (with AMSU-B)      16     12       4 CPU    6 CPU OpenMP
Global System
Now: uniform resolution of 100 km (400 X 200 X 28); 3D-Var at T108 on model levels, 6-hr cycle; use of raw radiances from AMSU-A, AMSU-B and GOES
2004: resolution to 35 km (800 X 600 X 80); top at 0.1 hPa (instead of 10 hPa), with additional AMSU-A and AMSU-B channels; 4D-Var assimilation, 6-hr time window, with 3 outer loops at full model resolution and inner loops at T108 (CPU equivalent of a 5-day forecast of the full-resolution model); new datasets: profilers, MODIS winds, QuikScat
2005+: additional datasets (AIRS, MSG, MTSAT, IASI, GIFTS, COSMIC); improved data assimilation
Regional System
Now: variable resolution, uniform region at 24 km (354 X 415 X 28); 3D-Var assimilation on model levels at T108; 12-hr spin-up
2004: resolution to 15 km in the uniform region (576 X 641 X 58); inclusion of AMSU-B and GOES data in the assimilation cycle; four model runs a day (instead of two); new datasets: profilers, MODIS winds, QuikScat
2006: limited-area model at 10 km resolution (800 X 800 X 60); LAM 4D-Var data assimilation; assimilation of radar data
Ensemble Prediction System
Now: 16-member global system (300 X 150 X 28); forecasts up to 10 days, once a day at 00Z; Optimal Interpolation assimilation system, 6-hr cycle; use of derived radiance data (Satems)
2004: Ensemble Kalman Filter assimilation system, 6-hr cycle; use of raw radiances from AMSU-A, AMSU-B and GOES; two forecast runs per day (12Z run added); forecasts extended to 15 days (instead of 10)
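The Ensemble Kalman Filter planned for 2004 can be illustrated with a minimal scalar sketch (perturbed-observation form; the 16-member size echoes the current ensemble, but all numbers are illustrative, not MSC's implementation):

```python
import random

def enkf_update(ensemble, obs, obs_var, rng):
    """One perturbed-observation EnKF analysis step for a scalar state."""
    n = len(ensemble)
    mean = sum(ensemble) / n
    var = sum((x - mean) ** 2 for x in ensemble) / (n - 1)  # ensemble variance
    gain = var / (var + obs_var)                            # Kalman gain
    # Each member assimilates its own perturbed copy of the observation.
    return [x + gain * (obs + rng.gauss(0.0, obs_var ** 0.5) - x)
            for x in ensemble]

rng = random.Random(42)
background = [rng.gauss(10.0, 2.0) for _ in range(16)]  # 16-member prior
analysis = enkf_update(background, obs=12.0, obs_var=1.0, rng=rng)
# The analysis mean typically lands between the prior mean and the observation.
```

The gain weighs ensemble spread against observation error, so flow-dependent background errors enter the analysis, which is the key difference from the Optimal Interpolation system it replaces.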
...Ensemble Prediction System
2005: resolution increased to 100 km (400 X 200 X 58); members increased to 32; additional datasets, as in the global deterministic system
2007: prototype regional ensemble: 10-member LAM (500 X 500 X 58); no distinct data assimilation; initial and boundary conditions from the global EPS
Mesoscale System
Now: variable resolution, uniform region at 10 km (290 X 371 X 35); two windows; no data assimilation
2004: prototype limited-area model at 2.5 km (500 X 500 X 45) over one area
2005: five limited-area model windows at 2.5 km (500 X 500 X 45)
2008: 4D data assimilation
Coupled Models
Today
In R&D: coupled atmosphere, ocean, ice, wave, hydrology, biology & chemistry
In Production: storm surge, wave
Future
Global coupled model for both prediction & data assimilation