Commodity Computing Clusters - next generation supercomputers?
Paweł Pisarczyk, ATM S. A.
Agenda
• Introduction
• Supercomputer classification
• Architecture and implementations
• Commodity clusters
• Processors
• Operating systems
• Summary
Supercomputer
• "A supercomputer is a device for turning compute-bound problems into I/O-bound problems" - Seymour Cray
• A supercomputer is a computer system that leads the world in terms of processing capacity, particularly speed of calculations, at the time of its introduction.
source: http://en.wikipedia.org
Supercomputer History (1)
• 1945-50 - Manchester Mark I
• 1950-55 - MIT Whirlwind
• 1955-60 - IBM 7090 - 210 KFLOPS
• 1960-65 - CDC 6600 - 10.24 MFLOPS
• 1965-70 - CDC 7600 - 32.27 MFLOPS
• 1970-75 - CDC Cyber 76
Supercomputer History (2)
• 1975-80 - Cray-1 - 160 MFLOPS
• 1980-85 - Cray X-MP - 500 MFLOPS
• 1985-90 - Cray Y-MP - 1.3 GFLOPS
• 1990-95 - Fujitsu Numerical Wind Tunnel - 236 GFLOPS
• 1995-00 - Intel ASCI Red - 2.150 TFLOPS
• 2000-02 - IBM ASCI White, SP Power3 375 MHz - 7.226 TFLOPS
• 2002-03 - NEC Earth Simulator - 35 TFLOPS
Supercomputer Classes (1)
• General-purpose supercomputers:
– vector processing machines - the same operation carried out on a large amount of data simultaneously
– tightly connected cluster computers (NUMA) - communication-oriented architectures engineered from the ground up, based on high-speed interconnects and a large number of processors
– commodity clusters - a collection of a large number of commodity PCs (COTS) interconnected by a high-bandwidth, low-latency network
Supercomputer Classes (2)
• Special-purpose supercomputers - high-performance computing devices with a hardware architecture dedicated to solving a single problem (equipped with custom ASIC or FPGA chips)
Examples:
– Deep Blue
– GRAPE for astrophysics
Flynn taxonomy - 1972 (1)
• SISD - Single Instruction Single Data (DEC, Sun Microsystems, PC)
• SIMD - Single Instruction Multiple Data
– computers with a large number of processing units (i.e. ALUs) - CPP DAP Gamma II, Quadrics Apemille
– vector processing machines - NEC SX-6, IA32 MMX
• MISD - Multiple Instruction Single Data
– theoretical model, no practical implementation
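To make the SIMD idea concrete, here is a minimal Python sketch of "SIMD within a register": four 8-bit values are packed into one 32-bit integer and added with a single sequence of word-wide operations, loosely in the spirit of a packed byte addition on MMX. The lane count, the helper names (`pack`, `unpack`, `packed_add`) and the wrap-around behaviour are illustrative assumptions, not a description of any particular instruction set.

```python
LANES = 4          # four 8-bit lanes in one 32-bit word (illustrative choice)
LOW7 = 0x7F7F7F7F  # mask selecting the low 7 bits of every lane
HIGH = 0x80808080  # mask selecting the top bit of every lane


def pack(lanes):
    """Pack four values (0..255) into one integer, lane 0 in the low byte."""
    return sum((v & 0xFF) << (8 * i) for i, v in enumerate(lanes))


def unpack(word):
    """Split a packed 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(LANES)]


def packed_add(a, b):
    """Add all four lanes at once, wrapping modulo 256 inside each lane."""
    # Adding only the low 7 bits guarantees no carry crosses a lane boundary.
    low = (a & LOW7) + (b & LOW7)
    # The top bit of each lane is then combined with XOR (carry-out dropped).
    return low ^ ((a ^ b) & HIGH)


def scalar_add(xs, ys):
    """SISD-style reference: one addition per element."""
    return [(x + y) & 0xFF for x, y in zip(xs, ys)]


xs, ys = [1, 2, 3, 250], [10, 20, 30, 10]
print(unpack(packed_add(pack(xs), pack(ys))))  # [11, 22, 33, 4]
```

The scalar version performs one addition per element, while the packed version handles all lanes in one pass; hardware SIMD units apply the same principle with dedicated wide registers.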
Flynn taxonomy - 1972 (2)
• MIMD - Multiple Instruction Multiple Data
– SM-MIMD - Shared Memory MIMD
  • global address space
  • SMP systems and ccNUMA systems
– DM-MIMD - Distributed Memory MIMD
  • many nodes with local address spaces
  • high-bandwidth, low-latency communication
  • common NUMA architectures (Non Uniform Memory Access)
  • the operating system has to be communication oriented (Mach project)
SM-MIMD implementations
• S-COMA - Simple Cache-Only Memory Architecture
– common SMP systems
• ccNUMA - Cache Coherent NUMA
– SGI Origin 3000
– SGI Altix 3000
– HP SuperDome
S-COMA (SMP)
[Diagram: CPU 0, CPU 1 ... CPU N, each with its own L2 cache, attached to a single shared RAM]
ccNUMA
[Diagram: CPU 0 ... CPU N, each with its own L2 cache; groups of CPUs share an L3 cache and a local memory (RAM 0 ... RAM K)]
ccNUMA implementation
SGI Altix 3000 (ccNUMA)
• 64 Itanium 2 (IA64) processors
• C-brick modules with 2 CPUs and ASIC SHUB
• NUMAflex, NUMAlink interconnects (6.4 GB/s, 2.4 GB/s)
• Modified Linux kernel (2.6 NUMA support)
DM-MIMD implementations
• Massively parallel systems (NUMA)
– communication oriented architecture
– low-latency, high-bandwidth interconnects
– topologies: hypercube, torus, tree
– Butterfly networks, Omega networks, communication engineered from the ground up
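As a rough illustration of two of the topologies listed above, the following Python sketch computes the direct neighbours of a node in a hypercube and in a 3-D torus. The node numbering schemes and the function names are assumptions chosen for the example; real machines define their own addressing.

```python
def hypercube_neighbors(node, dimensions):
    """Neighbours of `node` in a hypercube with 2**dimensions nodes.

    Nodes are numbered 0 .. 2**dimensions - 1; two nodes are linked when
    their binary labels differ in exactly one bit.
    """
    return [node ^ (1 << bit) for bit in range(dimensions)]


def torus_neighbors(coord, shape):
    """Six neighbours of `coord` = (x, y, z) in a 3-D torus of size `shape`.

    Each axis wraps around, so nodes on the edge link back to the far side.
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            moved = list(coord)
            moved[axis] = (moved[axis] + step) % shape[axis]
            neighbors.append(tuple(moved))
    return neighbors


print(hypercube_neighbors(0, 3))              # [1, 2, 4]
print(torus_neighbors((0, 0, 0), (4, 4, 4)))
```

In a hypercube the number of links per node grows with the logarithm of the node count, whereas a 3-D torus keeps a fixed six links per node regardless of size.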
DM-MIMD implementations
• Commodity clusters
– a cluster is a collection of connected, independent computers working in unison to solve a problem
– COTS technology
– nodes are interconnected by Ethernet LAN, Myrinet, QsNet ELAN etc.
– computation can be performed by using popular programming toolkits and frameworks: OpenMP, MPI
– clusters require dedicated management software
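The way independent nodes work in unison can be sketched with the scatter / local compute / reduce pattern that message-passing toolkits such as MPI support. The Python code below only simulates that pattern inside a single process: it does not use MPI, the "nodes" are plain function calls, and the names `scatter`, `node_work` and `cluster_sum_of_squares` are hypothetical.

```python
def scatter(data, n_nodes):
    """Split `data` into n_nodes nearly equal contiguous chunks, one per node."""
    base, extra = divmod(len(data), n_nodes)
    chunks, start = [], 0
    for rank in range(n_nodes):
        size = base + (1 if rank < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def node_work(chunk):
    """Work done by one node using only the data in its local address space."""
    return sum(x * x for x in chunk)


def cluster_sum_of_squares(data, n_nodes):
    """Scatter the data, compute on every node, then reduce the partial sums."""
    partials = [node_work(chunk) for chunk in scatter(data, n_nodes)]
    return sum(partials)


print(cluster_sum_of_squares(list(range(10)), 4))  # 285
```

On a real cluster each chunk would travel over the interconnect to a separate machine, so the cost of the scatter and reduce steps depends on network bandwidth and latency.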
NUMA implementations
Cray T3E-1350
• Processor: Alpha 21164 675 MHz
• Number of CPUs: 40 - 2176
• 3-D Torus topology
• Operating system: UNICOS/mk - microkernel based
• Peak performance: 3 TFLOPS
Commodity cluster implementation (1)
Linux Networx/Quadrics
• Processor: Intel Xeon 2.4 GHz
• CPUs: 2304
• Interconnections: QsNet ELAN3
• Operating system: Linux + management tools + Lustre Cluster File System
• Linpack performance: 7.6 TFLOPS
• 3rd computer on TOP500 list
• Developed for Lawrence Livermore National Laboratory in 2002
Commodity cluster implementation (2)
HP XC6000 Cluster (XC3000 Cluster)
• Processor: Intel Itanium 2 6M 1.5 GHz (Intel Xeon 3 GHz)
• Node: HP Integrity rx2600 (HP ProLiant DL380)
• Number of processors: 34-512
• Interconnections: QsNet ELAN3 (Myricom Myrinet XP)
• Operating system: Linux + SSI Middleware + management tools + Lustre Cluster File System
• Peak performance: 34 CPUs - 204 GFLOPS, 512 CPUs - 3 TFLOPS
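The peak figures quoted for the XC6000 can be reproduced with the usual estimate: number of CPUs times clock frequency times floating-point operations per cycle. The sketch below assumes 4 floating-point operations per cycle for the Itanium 2, a value chosen because it matches the 6 GFLOPS per-processor peak at 1.5 GHz; the function name is hypothetical.

```python
def peak_gflops(cpus, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS: CPUs x clock (GHz) x FLOPs per cycle."""
    return cpus * clock_ghz * flops_per_cycle


# Assumption: 4 FLOPs per cycle, consistent with 6 GFLOPS at 1.5 GHz.
ITANIUM2_FLOPS_PER_CYCLE = 4

print(peak_gflops(34, 1.5, ITANIUM2_FLOPS_PER_CYCLE))   # 204.0
print(peak_gflops(512, 1.5, ITANIUM2_FLOPS_PER_CYCLE))  # 3072.0
```

This is an upper bound that ignores memory and interconnect limits, so measured benchmark results are lower than the computed peak.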
Commodity Clusters - software
• Operating system - Linux or SSI Linux (Single System Image)
• Platform for specialized applications for science, engineering and business (simulation, modeling, data mining)
• Distributed computation environments are used for software development (OpenMP, MPI)
• Common supercomputer applications require porting to clusters
Performance Scaling
[Diagram: Scale-Out (Cluster) and Scale-Up (SMP, ccNUMA), with the label "Scale Right"]
Processors (1)
• Many types of existing processors are used in supercomputers
• Microprocessor development directions:
– Increasing clock frequency and the speed of instruction stream processing
– Processing a large collection of data in a single processor instruction - SIMD
– Control path multiplication - multithreading
Processors (2)
• Vector processors
– NEC SX-6
– Cray (Cray X1)
• RISC processors
– MIPS
– IBM Power4
– Alpha
• CISC processors
– IA32
– AMD x86-64
• VLIW processors
– IA64
Intel Itanium 2 features
• State-of-the-art unconventional 64-bit architecture
• New programming model implementing VLIW paradigm
• EPIC technology - Explicitly Parallel Instruction Computing - the compiler determines instruction dependencies and tells the processor how to process the instruction stream in parallel
• Many registers (128 64-bit), register stack management
• 6 GFLOPS peak performance
• A dedicated compiler is needed to take full advantage of the processor
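The compiler's role in EPIC can be illustrated with a toy dependency analysis: instructions are placed into the same group as long as none of them reads or overwrites a register written earlier in that group, so everything inside a group could be issued in parallel. This is a deliberately simplified sketch; the instruction format (a destination register plus a tuple of source registers), the function name and the grouping rule are assumptions and do not reproduce the actual IA-64 bundling rules.

```python
def form_groups(program):
    """Greedily split `program` into groups of mutually independent instructions.

    `program` is a list of (dest, sources) pairs. A new group starts when an
    instruction reads (RAW) or rewrites (WAW) a register that an earlier
    instruction in the current group has written.
    """
    groups, current, written = [], [], set()
    for index, (dest, sources) in enumerate(program):
        read_after_write = any(src in written for src in sources)
        write_after_write = dest in written
        if current and (read_after_write or write_after_write):
            groups.append(current)
            current, written = [], set()
        current.append(index)
        written.add(dest)
    if current:
        groups.append(current)
    return groups


program = [
    ("r1", ("r2", "r3")),  # 0: r1 = r2 op r3
    ("r4", ("r5", "r6")),  # 1: independent of instruction 0
    ("r7", ("r1", "r4")),  # 2: needs the results of 0 and 1
    ("r8", ("r2", "r5")),  # 3: independent of instruction 2
]
print(form_groups(program))  # [[0, 1], [2, 3]]
```

Because the grouping is decided at compile time, the processor does not have to discover these dependencies itself while the program runs.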
Operating systems
• Monolithic kernel based OSs - UNIX (modification of existing solutions)
– BSD
– Solaris
– Irix
– Linux
• Microkernel based OSs
– Mach
Microkernel architecture
[Diagram: Task A, Task B and Task C layered above kernel blocks, which sit on top of the hardware]
Summary
• Today there are many supercomputer architectures
• Both vector processors and common RISC, CISC, VLIW chips are used for supercomputers
• Commodity clusters under control of Linux OS are an attractive method for supercomputer implementation
TOP 500 list (1)
1. Earth Simulator, NEC - 35.86 TFLOPS
2. HP Alphaserver SC, HP - 13.88 TFLOPS
3. Linux Networx / Quadrics IA32 - 7.634 TFLOPS
TOP 500 list (2)
Source: http://www.top500.org/list/2003/06/