computer architecture guidance keio university amano, hideharu hunga@am . ics . keio . ac ....
TRANSCRIPT
![Page 1: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/1.jpg)
Computer Architecture Guidance
Keio University
AMANO, Hideharu
hunga@am . ics . keio . ac . jp
![Page 2: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/2.jpg)
Contents
Techniques of two key architectures for future system LSIs Parallel Architectures Reconfigurable Architectures
Advanced uni-processor architecture→ Special Course of Microprocessors (by Prof. Y
amasaki, Fall term)
![Page 3: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/3.jpg)
Class
Lecture using Powerpoint: (70mins. ) The ppt file is uploaded on the web site
http://www.am.ics.keio.ac.jp, and you can down load/print before the lecture.
When the file is uploaded, the message is sent to you by E-mail.
Textbook: “Parallel Computers” by H.Amano (Sho-ko-do)
Exercise (20mins.) Simple design or calculation on design issues
![Page 4: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/4.jpg)
Evaluation
Exercise on Parallel Programming using SCore on RHiNET-2 (50%)
Exercise after every lecture (50%)
![Page 5: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/5.jpg)
Computer Architecture 1Introduction to Parallel Architectures
Keio University
AMANO, Hideharu
hunga@am . ics . keio . ac . jp
![Page 6: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/6.jpg)
Parallel Architecture
Purposes Classifications Terms Trends
A parallel architecture consists of multiple processing units which work simultaneously.
![Page 7: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/7.jpg)
Boundary between Parallel machines and Uniprocessors
ILP(Instruction Level Parallelism) A single Program Counter Parallelism Inside/Between instructions
TLP(Tread Level Parallelism) Multiple Program Counters Parallelism between processes and jobs
Uniprocessors
Parallel MachinesDefinition
Hennessy & Petterson’s Computer Architecture: A quantitative approach
![Page 8: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/8.jpg)
Increasing of simultaneous issued instructions vs. Tightly couplingSingle pipeline
Multiple instructions issue
Multiple Threads execution
Connecting Multiple Processors
Shared memory, Shared register
On chip implementation
Tightly Coupling
Performance improvement
![Page 9: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/9.jpg)
Purposes of providing multiple processors Performance
A job can be executed quickly with multiple processors Fault tolerance
If a processing unit is damaged, total system can be available : Redundant systems
Resource sharing Multiple jobs share memory and/or I/O modules for cost
effective processing : Distributed systems Low power
High performance with Low frequency operation
Parallel Architecture: Performance Centric!
![Page 10: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/10.jpg)
Flynn’s Classification
The number of Instruction Stream : M(Multiple)/S(Single)
The number of Data Stream : M/S SISD
Uniprocessors ( including Super scalar 、 VLIW ) MISD : Not existing ( Analog Computer ) SIMD MIMD
![Page 11: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/11.jpg)
SIMD (Single Instruction StreamsMultiple Data Streams
Instruction
InstructionMemory
Processing Unit
Data memory
•All Processing Units executes the same instruction•Low degree of flexibility•Illiac-IV/MMX ( coarse grain )•CM-2 type ( fine grain )
![Page 12: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/12.jpg)
Two types of SIMD
Coarse grain : Each node performs floating point numerical operations ILLIAC-IV , BSP,GF-11 Multimedia instructions in recent high-end CPUs Dedicated on-chip approach: NEC’s IMEP
Fine grain : Each node only performs a few bits operations
ICL DAP, CM-2 , MP-2 Image/Signal Processing Connection Machines extends the application to Artificial I
ntelligence (CmLisp)
![Page 13: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/13.jpg)
A processing unit of CM-2
A B F
C
256bit memory
s c
OP
Flags
Context
1bit serial ALU
![Page 14: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/14.jpg)
Element of CM2
P P P PP P P PP P P PP P P P
Router
4x4 Processor Array
12links 4096 Hypercubeconnection
256bit x 16 PE RAM
Instruction
LSI chip
4096 chips =64K PE
![Page 15: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/15.jpg)
Thinking Machines’ CM2 (1996)
![Page 16: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/16.jpg)
The future of SIMD
Coarse grain SIMD A large scale supercomputer like Illiac-IV/GF-11 will not
revive. Multi-media instructions will be used in the future. Special purpose on-chip system will become popular.
Fine grain SIMD Advantageous to specific applications like image
processing General purpose machines are difficult to be built
ex.CM2 → CM5
![Page 17: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/17.jpg)
MIMD
Processors
Memory modules (Instructions ・ Data )
•Each processor executes individual instructions•Synchronization is required•High degree of flexibility•Various structures are possible
Interconnectionnetworks
![Page 18: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/18.jpg)
Classification of MIMD machines Structure of shared memory UMA(Uniform Memory Access Model )
provides shared memory which can be accessed from all processors with the same manner.
NUMA(Non-Uniform Memory Access Model )provides shared memory but not uniformly accessed.
NORA/NORMA ( No Remote Memory Access Model )provides no shared memory. Communication is done with
message passing.
![Page 19: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/19.jpg)
UMA The simplest structure of shared memory machin
e The extension of uniprocessors OS which is an extension for single processor ca
n be used. Programming is easy. System size is limited.
Bus connected Switch connected
A total system can be implemented on a single chip
On-chip multiprocessorChip multiprocessorSingle chip multiprocessor
![Page 20: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/20.jpg)
An example of UMA : Bus connected
PU PU
SnoopCache
PU
SnoopCache
PU
SnoopCache
Main Memory
shared bus
SnoopCache
SMP(Symmetric MultiProcessor)
On chip multiprocessor
![Page 21: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/21.jpg)
Switch connected UMA
Switch
Local Memory
CPU
Interface
Main Memory
. . . .
… .
The gap between switch and bus becomes small
![Page 22: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/22.jpg)
NUMA Each processor provides a local memory,
and accesses other processors’ memory through the network.
Address translation and cache control often make the hardware structure complicated.
Scalable : Programs for UMA can run without modification. The performance is improved as the system size.
Competitive to WS/PC clusters with Software DSM
![Page 23: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/23.jpg)
Typical structure of NUMA
Node 1
Node 2
Node 3
Node 0 0
1
2
3
InterconnectonNetwork
Logical address space
![Page 24: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/24.jpg)
Classification of NUMA Simple NUMA :
Remote memory is not cached. Simple structure but access cost of remote memory is large.
CC-NUMA : Cache Coherent Cache consistency is maintained with hardware. The structure tends to be complicated.
COMA:Cache Only Memory Architecture No home memory Complicated control mechanism
![Page 25: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/25.jpg)
Cray’s T3D: A simple NUMA supercomputer (1993)
Using
Alpha 21064
![Page 26: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/26.jpg)
The Earth simulator(2002)
![Page 27: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/27.jpg)
SGI Origin
Hub Chip
Main Memory
Network
Bristled Hypercube
Main Memory is connected directly with Hub Chip1 cluster consists of 2 PE.
![Page 28: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/28.jpg)
SGI’s CC-NUMA Origin3000(2000)
Using R12000
![Page 29: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/29.jpg)
DDM(Data Diffusion Machine )
... ... ... ...
D
![Page 30: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/30.jpg)
NORA/NORMA
No shared memory Communication is done with message
passing Simple structure but high peak performance
The fastest processor is always NORA(except The Earth Simulator)
Hard for programming
Inter-PU communications Cluster computing
![Page 31: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/31.jpg)
Early Hypercube machine nCUBE2
![Page 32: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/32.jpg)
Fujitsu’s NORA AP1000(1990)
Mesh connection SPARC
![Page 33: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/33.jpg)
Intel’s Paragon XP/S(1991)
Mesh connection i860
![Page 34: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/34.jpg)
PC Cluster
Beowulf Cluster (NASA’s Beowulf Projects 1994, by Sterling) Commodity components TCP/IP Free software
Others Commodity components High performance networks like Myrinet / Infiniban
d Dedicated software
![Page 35: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/35.jpg)
RHiNET-2 cluster
![Page 36: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/36.jpg)
Terms(1)
Multiprocessors : MIMD machines with shared memory ( Strict definition :by Enslow J
r.) Shared memory Shared I/O Distributed OS Homogeneous
Extended definition: All parallel machines ( Wrong usage )
![Page 37: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/37.jpg)
Terms ( 2) Multicomputer
MIMD machines without shared memory, that is NORA/NORMA
Arraycomputer A machine consisting of array of processing elements :
SIMD A supercomputer for array calculation (Wrong usage)
Loosely coupled ・ Tightly coupled Loosely coupled: NORA, Tightly coupled: UMA But, are NORAs really loosely coupled??
Don’t use if possible
![Page 38: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/38.jpg)
Stored programmingbased
Others
SIMDFine grain Coarse grain
MIMD
Bus connected UMASwitch connected UMA
NUMASimple NUMACC-NUMACOMA
NORA
Systolic architectureData flow architectureMixed controlDemand driven architecture
Multiprocessors
Multicomputers
Classification
![Page 39: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/39.jpg)
Systolic Architecture
Data x
Data y
Computational Array
Data streams are inserted into an array of special purpose computing node in a certain rhythm.
Introduced in Reconfigurable Architectures
![Page 40: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/40.jpg)
Data flow machines
x
+
x
+
a bc
d e
(a+b)x(c+(dxe))
A process is driven by the data
Also introduced in Reconfigurable Systems
![Page 41: Computer Architecture Guidance Keio University AMANO, Hideharu hunga@am . ics . keio . ac . jp](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649cb15503460f949761e8/html5/thumbnails/41.jpg)
Exercise 1
In this class, a PC cluster RHiNET is used for exercising parallel programming. Is RHiNET classified into Beowulf cluster ? Why do you think so?
If you take this class, send the answer with your name and student number to [email protected]