Scalability of MSC.NASTRAN on Compaq Alpha Clusters
Yvan BANTOURE
Sales Director HPC Europe
Compaq Computer Corporation
MSC Southern Europe Users Conference
Monaco, 7-9 June 2000
HPC Strategy
• Deliver maximum sustained application performance & scaling
  – Capability for the largest parallel jobs
  – Capacity for the most multi-job throughput
• Based on standard products from Compaq and its partners
  – 64-bit Alpha CPUs
  – Balanced high-performance SMP systems
  – Scalable, integrated clusters with single system view
  – Tru64 UNIX & LINUX
  – Complete development environment for HPTC
  – Long-term relationships with the leading technical ISVs
21st Century Supercomputing
[Chart: performance levels of 10 GF, 100 GF, 1 TF and 10 TF plotted against CPUs per SMP (1 to 256, shared memory) and SMPs per cluster (1 to 256, distributed memory). Growth paths: faster Alphas, more CPUs/SMP, more SMPs/cluster.]
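The two axes of this chart multiply out to an aggregate peak: CPUs per SMP times SMPs per cluster times the peak rate of one CPU. The following is a minimal sketch of that arithmetic, not a figure from the deck. It assumes 2 floating-point operations per clock per CPU (an assumption suggested by the "dual floating-point pipelines" listed for the EV67) and a 667 MHz clock. The function and parameter names are illustrative.

```python
def peak_gflops(cpus_per_smp, smps_per_cluster, clock_mhz=667.0, flops_per_cycle=2):
    """Theoretical peak of a cluster in GFLOP/s.

    flops_per_cycle=2 is an ASSUMPTION (one result per floating-point
    pipeline per clock); sustained application performance is lower.
    """
    per_cpu_gflops = clock_mhz * flops_per_cycle / 1000.0
    return cpus_per_smp * smps_per_cluster * per_cpu_gflops


# Example configurations: (CPUs per SMP, SMPs per cluster)
for cpus, smps in [(4, 8), (4, 128), (32, 128)]:
    total = peak_gflops(cpus, smps)
    print(f"{cpus:3d} CPUs/SMP x {smps:3d} SMPs -> {total:9.1f} GFLOP/s peak")
```

The same total can be reached by growing either axis, which is the point of the chart: faster CPUs, larger SMPs, and larger clusters are independent ways to move toward the next performance level.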
What is the Alpha Strategy?
• Keep Alpha the fastest processor in the industry
  – Dynamic (runtime) and static (compile time) optimizations
• Maintain long-term performance advantage
• High-end / server-centric focus
  – Memory-bound (real) applications
  – Multi-stream parallel workloads
  – Balanced SMP system structure
• Extract performance from today’s applications out-of-the-box
• Leveraging semiconductor manufacturing capabilities from our partners (Intel, Samsung ...)
  – State-of-the-art semiconductor technology
  – Manufacturing economies of scale
  – Minimize manufacturing investment
Alpha EV67 Performance and Features
Performance
• 667-729 MHz operating frequency
• Spectacular memory bandwidth
  – 4+ GB/s L2 cache bandwidth
  – ~2 GB/s peak memory bandwidth
• Out-of-order execution masks latency of access to memory
• Prefetch instruction lowers cache miss impact
Features
• Four-way instruction map
• Dynamic execution
  – Register renaming
  – Out-of-order execution
  – Quad integer pipelines
  – Dual floating-point pipelines
• Enhanced branch prediction
• Motion video instructions
• Square root & divide
• 64 KB I-cache -- 2-set
• 64 KB D-cache -- 2-set
• CMOS6 -- 0.25 µm, 2.0 V, 6LM
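One way to read the bandwidth and clock figures together is as a machine balance: bytes of memory traffic available per peak floating-point operation. The sketch below is illustrative arithmetic only. It takes the ~2 GB/s memory bandwidth and the 667 MHz clock from this slide, and assumes 2 floating-point operations per clock (an assumption based on the dual floating-point pipelines, not a number stated here).

```python
def machine_balance(mem_bw_gb_s, clock_mhz, flops_per_cycle=2):
    """Bytes of memory bandwidth per peak floating-point operation.

    flops_per_cycle=2 is an ASSUMPTION, not a figure from the slide.
    """
    peak_gflops = clock_mhz * flops_per_cycle / 1000.0
    return mem_bw_gb_s / peak_gflops


# Main memory (~2 GB/s) and L2 cache (4 GB/s taken as the lower bound of "4+")
for label, bw in [("memory", 2.0), ("L2 cache", 4.0)]:
    print(f"{label:8s}: {machine_balance(bw, 667.0):.2f} bytes/flop at 667 MHz")
```

A higher ratio means memory-bound codes, which the deck identifies as the "real" applications, can keep the floating-point units busier.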
AlphaServer SMP Systems
Series         Processors   Memory           PCI slots
DS Series      1            Up to 1 GB       Up to 4
DS Series      1-2          Up to 8 GB       6
ES Series      1-4          Up to 32 GB      Up to 10
GS80 Series    1-8          Up to 32 GB      Up to 56
GS320 Series   1-32         Up to 128+ GB    Up to 224+
• Switch-based system - 64-bit PCI I/O subsystems - Very Large Memory
• Tru64 UNIX, OpenVMS & LINUX
• Modular system packaging - advanced systems management
Scalable Cluster Solutions
• Clustered configurations built from standard Compaq products
  – Configured specifically for HPTC capacity and capability
  – Factory integrated, tested and shipped as a single system
• HPC160 and HPC320
  – Based on Alpha systems with Tru64 UNIX and TruCluster technology
  – Memory Channel II SAN, up to 8 nodes
• AlphaServer SC
  – Based on Alpha systems with Tru64 UNIX and QSW technology
  – QSW SAN, up to 128+ nodes
Parallel Processing Environment
• Coordination is the key
• Message passing protocols keep nodes in touch
  – Allow the parallel processes to exchange data and coordinate operations
• System Area Networks provide the means
  – Switched networking similar to a LAN, but with better bandwidth and latency
  – Each node communicates directly with every other node
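The message-passing pattern described above (independent workers that exchange data and coordinate through explicit messages) can be sketched in a few lines. This is a toy illustration only: threads and an in-memory queue stand in for processes on separate nodes and for a real message-passing library such as MPI, and nothing here reflects how MSC.NASTRAN's DMP mode is implemented. All names are hypothetical.

```python
import queue
import threading


def worker(rank, chunk, to_root):
    # Each worker computes on its own private slice of the data,
    # then sends one message (rank, partial result) to the root.
    to_root.put((rank, sum(chunk)))


def parallel_sum(data, n_workers=4):
    to_root = queue.Queue()  # stand-in for the interconnect
    chunks = [data[i::n_workers] for i in range(n_workers)]
    threads = [
        threading.Thread(target=worker, args=(rank, chunk, to_root))
        for rank, chunk in enumerate(chunks)
    ]
    for t in threads:
        t.start()
    # Coordination point: the root blocks until it has received
    # exactly one message from every worker.
    partials = dict(to_root.get() for _ in range(n_workers))
    for t in threads:
        t.join()
    return sum(partials.values())


print(parallel_sum(list(range(1, 101))))
```

On a real cluster the cost of each `put`/`get` pair is set by the latency and bandwidth of the system area network, which is why the interconnect matters for scaling.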
Compaq HPC320 Scalable TruCluster
• AlphaServer ES40 quad-CPU servers
• 4 or 8 nodes, 16 or 32 CPUs per system
• Memory Channel II System Area Network
  – 8-way crossbar
  – ~100 MB/s/rail
  – One or two rails / node
• Multiple storage options
  – UltraSCSI
  – FibreChannel
[Diagram: dual-railed MemoryChannel II]
TruCluster Software V5.0: “The cluster is the system”
• Easier management
  – Single System Image
  – Cluster File System
  – Cluster Alias
[Diagram: one cluster-wide file system tree (/, /usr, /var, /...)]
What’s next: The AlphaServer SC Program
• Use multiple generations of standard commercial off-the-shelf components to build scalable computer systems
  – Alpha processor and AlphaServers
  – Scalable QSW system area network
  – Tru64 UNIX and TruCluster software
  – QSW resource management software (RMS)
• Memory hierarchy able to support high sustained performance of scientific applications
• System software
  – Cluster and parallel file system
  – Comprehensive resource management (RMS, LSF, etc.)
  – Rich development environment
• Focus on application scalability and system integration
System Area Interconnect by Quadrics Supercomputers World Ltd.
• Elan-3 PCI adapter
  – DMA driven
  – Get and put
  – 200 MB/s/rail bi-directional
• Elite “fat tree” switch
  – 8-way x-bar chips
  – 16 or 128 port package
  – Up to 20 m cables
  – < 40 ns switch latency per hop
• Multiple virtual circuits and load balancing
• Low latency
  – 3 µs DMA/shmem
  – 5 µs MPI ping/pong
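The latency and bandwidth figures on this slide can be combined in the usual first-order model of message cost, time = latency + size / bandwidth. The sketch below is a rough estimate under stated assumptions: it uses the 5 µs MPI latency and treats 200 MB/s as the bandwidth available to a one-way transfer on a single rail (the slide says "bi-directional", so the per-direction figure is an assumption). Function names are illustrative.

```python
def transfer_time(nbytes, latency_s=5e-6, bandwidth_mb_s=200.0):
    """First-order time in seconds to deliver one message of nbytes."""
    return latency_s + nbytes / (bandwidth_mb_s * 1e6)


def effective_bandwidth_mb_s(nbytes, latency_s=5e-6, bandwidth_mb_s=200.0):
    """Bandwidth actually achieved once latency is included, in MB/s."""
    return nbytes / transfer_time(nbytes, latency_s, bandwidth_mb_s) / 1e6


def half_bandwidth_size(latency_s=5e-6, bandwidth_mb_s=200.0):
    """Message size in bytes at which half of peak bandwidth is reached."""
    return latency_s * bandwidth_mb_s * 1e6


for size in (100, 1_000, 10_000, 1_000_000):
    print(f"{size:>9d} bytes: {transfer_time(size) * 1e6:8.1f} us, "
          f"{effective_bandwidth_mb_s(size):6.1f} MB/s effective")
```

Under this model, small messages are dominated by latency and large ones by bandwidth, so both numbers matter for a solver that mixes short synchronization messages with bulk data exchange.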
Lawrence Livermore National Labs
AlphaServer SC: 512 Alpha 500MHz CPUs
#34 on the Top500 List, Nov ‘99
Application Development Tools From Compaq and ISVs
• Message passing libraries (highly optimized)
  – PVM, MPI, SHMEM
• Optimizing compilers
  – Fortran, C, C++
• Parallel development tools
  – PSE, Totalview, KPTS, Vampir
• Scientific and math libraries
  – CPML, CXML, NAG, IMSL
• Workload management software
  – LSF, Codine/GRD, QSW RMS
• Cross-platform development
  – Compaq Enterprise Toolkit for Developer’s Studio
Direct Frequency Response - LGQDF
MSC.NASTRAN V70.7 (DMP)
[Chart: elapsed time (s), axis 0 to 18000, versus number of CPUs (1, 2, 4, 8, 16) on Compaq HPC320 (500MHz)]
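Elapsed-time charts like this one are usually summarized as speedup (time on 1 CPU divided by time on p CPUs) and parallel efficiency (speedup divided by p). The following is a small sketch of that calculation. The timings in it are HYPOTHETICAL placeholders chosen for illustration; they are not read from the chart, whose bar values did not survive in this transcript.

```python
def speedup_table(elapsed):
    """Map CPU count -> (speedup, efficiency) from elapsed times in seconds.

    `elapsed` must contain an entry for 1 CPU, used as the baseline.
    """
    base = elapsed[1]
    return {
        cpus: (base / t, base / t / cpus)
        for cpus, t in sorted(elapsed.items())
    }


# HYPOTHETICAL placeholder times (seconds), NOT measurements from the slide.
example_times = {1: 1000.0, 2: 520.0, 4: 280.0, 8: 160.0}

for cpus, (s, e) in speedup_table(example_times).items():
    print(f"{cpus:2d} CPUs: speedup {s:5.2f}, efficiency {e:6.1%}")
```

Substituting the measured elapsed times for each job gives a like-for-like comparison across the benchmark cases.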
Linear Statics - XLRST
MSC.NASTRAN V70.7 (DMP)
[Chart: elapsed time (s), axis 0 to 1200, versus number of CPUs (1, 2, 4, 8) on Compaq HPC320 (500MHz)]
Direct Frequency Response – XLTDF
MSC.NASTRAN V70.7 (DMP)
[Chart: elapsed time (s), axis 0 to 35000, versus number of CPUs (1, 2, 4, 8, 16) on Compaq HPC320 (500MHz)]
Normal Modes - XXCMD
MSC.NASTRAN V70.7 (DMP)
[Chart: elapsed time (s), axis 0 to 60000, versus number of CPUs (1, 2, 4) on Compaq HPC320 (500MHz)]
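A common way to reason about how far scaling curves like these can extend is Amdahl's law, which bounds speedup by the fraction of the run that stays serial. The sketch below is a generic illustration, not an analysis performed in this presentation. It shows the forward model and its inverse (often called the Karp-Flatt metric), which estimates the serial fraction from a measured speedup. The function names and the 10% serial fraction in the example are assumptions.

```python
def amdahl_speedup(serial_fraction, cpus):
    """Upper bound on speedup when serial_fraction of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cpus)


def estimated_serial_fraction(speedup, cpus):
    """Invert Amdahl's law: serial fraction implied by a measured speedup on cpus > 1."""
    return (1.0 / speedup - 1.0 / cpus) / (1.0 - 1.0 / cpus)


# Illustrative only: a job that is 10% serial (an ASSUMED value).
for p in (1, 2, 4, 8, 16):
    print(f"{p:2d} CPUs: at most {amdahl_speedup(0.10, p):5.2f}x")
```

If the serial fraction estimated from measurements grows with the CPU count, the limit is more likely communication or load imbalance than inherently serial work.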
How to deliver real supercomputer performance for MSC.NASTRAN
• Start with the right hardware
  – Fast 64-bit processors
  – Balanced SMP systems
  – High-performance, fully integrated clusters
• Use the right software tools
  – Highly optimizing compilers
  – Optimized MPI implementations
  – System and workload management tools
• Work closely with leading software developers like MSC.Software
http://www.compaq.com/hpc
[email protected]@compaq.com