Lecture 2: Classification of Parallel Computer Architectures

Upload: petru-topala

Posted on 02-Mar-2016


Description: Classification of computer architectures

Parallel computer architectures

Parallel computer architecture classification

Hardware parallelism
- Computing: executing instructions that operate on data.

Flynn's taxonomy

Flynn's taxonomy (Michael Flynn, 1967) classifies computer architectures by the number of instruction streams that can execute at one time and how they operate on data:
- Single Instruction, Single Data (SISD): traditional sequential computing systems
- Single Instruction, Multiple Data (SIMD)
- Multiple Instruction, Multiple Data (MIMD)
- Multiple Instruction, Single Data (MISD)

SISD
- At any one time, one instruction operates on one data item.
- Traditional sequential architecture.

SIMD
- At any one time, one instruction operates on many data items.
- Data-parallel architecture.
- Vector architectures have similar characteristics, but achieve the parallelism with pipelining.
- Array processors.
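The SISD/SIMD contrast above can be sketched in plain Python. This is an illustrative model only (real SIMD happens in hardware); the arrays `A`, `B` and the helper `simd_add` are invented for the example:

```python
# Illustrative model, not real hardware SIMD: compute C = A + B
# SISD-style (one element per "instruction") and SIMD-style
# (one logical operation issued for the whole array at once).

A = [1.0, 2.0, 3.0, 4.0]
B = [10.0, 20.0, 30.0, 40.0]

# SISD: one instruction operates on one data item at a time.
C_sisd = []
for a, b in zip(A, B):
    C_sisd.append(a + b)

def simd_add(xs, ys):
    """Model a single vector instruction: elementwise add of equal-length arrays,
    as an array processor's N ALUs would perform in lockstep."""
    return [x + y for x, y in zip(xs, ys)]

# SIMD (conceptually): one "add" covers all elements.
C_simd = simd_add(A, B)

print(C_sisd)  # [11.0, 22.0, 33.0, 44.0]
print(C_simd)  # same result, but issued as one logical operation
```

Both paths compute the same result; the difference the taxonomy captures is how many instruction issues it takes.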

Array processor (SIMD)
[Figure: array processor datapath — instruction pointer, MAR/MDR, memory holding operand arrays A1..AN, B1..BN, C1..CN, an instruction decoder, and one ALU per array element.]

MIMD
- Multiple instruction streams operating on multiple data streams.
- Classical distributed-memory or SMP architectures.

MISD
- Not commonly seen in practice.
- The systolic array is one example of an MISD architecture.

Flynn's taxonomy summary
- SISD: traditional sequential architecture.
- SIMD: processor arrays, vector processors. Parallel computing on a budget — a shared control unit reduces cost. Many early supercomputers.
- MIMD: the most general-purpose parallel computer today. Clusters, MPPs, data centers.
- MISD: not a general-purpose architecture.

Flynn's classification on today's architectures
- Multicore processors

- Superscalar: pipelined execution plus multiple issue.

- SSE (Intel's and AMD's support for performing an operation on 2 doubles or 4 floats simultaneously).

- GPU: CUDA architecture.

- IBM BlueGene.

Modern classification (Sima, Fountain, Kacsuk)
Classifies architectures based on how parallelism is achieved:
- by operating on multiple data: data parallelism
- by performing many functions in parallel: function parallelism (control parallelism or task parallelism, depending on the level of the functional parallelism)

Parallel architectures thus divide into data-parallel architectures and function-parallel architectures.

Data-parallel architectures
- Vector processors, SIMD (array processors), systolic arrays.

Vector processor (pipelining)
[Figure: vector processor datapath — instruction pointer, MAR/MDR, memory holding operands A, B, C, an instruction decoder, and a single pipelined ALU.]

Data-parallel architecture: array processor
[Figure: array processor datapath — as above, but with operand arrays A1..AN, B1..BN, C1..CN and one ALU per element.]

Function-parallel (control-parallel) architectures
- Instruction-level parallel architectures (ILPs): pipelined processors, VLIWs, superscalar processors.
- Thread-level parallel architectures.
- Process-level parallel architectures (MIMDs): distributed-memory MIMD, shared-memory MIMD.

Classifying today's architectures
- Multicore processors?
- Superscalar?

- SSE?

- GPU: CUDA architecture?

- IBM BlueGene?

Performance of parallel architectures

Common metrics
- MIPS: millions of instructions per second.
  MIPS = instruction count / (execution time × 10^6)

- MFLOPS: millions of floating-point operations per second.
  MFLOPS = FP operations in the program / (execution time × 10^6)

Which is the better metric?
- The FLOP count is more closely related to the time a task takes in numerical code.
- The number of FLOPs per program is determined by the matrix size.
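The two formulas above can be worked through with made-up numbers. The instruction and FLOP counts here are illustrative, not measurements; the 2·N³ FLOP count for an N×N matrix multiply follows from N multiplies and roughly N adds per output element over N² elements:

```python
# Sketch of the MIPS and MFLOPS definitions, with illustrative numbers:
# a program executing 8e9 instructions, 2e9 of them floating-point
# operations, in 4 seconds.

def mips(instruction_count, execution_time_s):
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)

def mflops(fp_op_count, execution_time_s):
    """MFLOPS = FP operations in program / (execution time x 10^6)."""
    return fp_op_count / (execution_time_s * 1e6)

print(mips(8e9, 4.0))    # 2000.0 MIPS
print(mflops(2e9, 4.0))  # 500.0 MFLOPS

# For dense numerical code the FLOP count follows from the matrix size:
# an N x N matrix-matrix multiply performs about 2*N^3 FLOPs.
N = 1000
matmul_flops = 2 * N**3
print(mflops(matmul_flops, 4.0))  # 500.0 MFLOPS for this kernel
```

This is why MFLOPS is the preferred metric for numerical workloads: the numerator is a property of the problem, not of how many bookkeeping instructions the compiler emits.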

Performance of parallel architectures: FLOPS units
  kiloFLOPS (KFLOPS)  10^3
  megaFLOPS (MFLOPS)  10^6
  gigaFLOPS (GFLOPS)  10^9   (single-CPU performance)
  teraFLOPS (TFLOPS)  10^12
  petaFLOPS (PFLOPS)  10^15  (where we are right now: 10-petaFLOPS supercomputers)
  exaFLOPS  (EFLOPS)  10^18  (the next milestone)

Peak and sustained performance

Peak performance
- Measured in MFLOPS.
- The highest possible MFLOPS rate, achieved when the system does nothing but numerical computation.
- A rough hardware measure; gives little indication of how the system will perform in practice.

Sustained performance
- The MFLOPS rate that a program achieves over its entire run.
- Measured using benchmarks.
- Peak MFLOPS is usually much larger than sustained MFLOPS.
- Efficiency rate = sustained MFLOPS / peak MFLOPS.
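The efficiency rate is a single division; the sketch below just makes the definition concrete, with illustrative numbers (a machine with 100 MFLOPS peak sustaining 12 MFLOPS on a benchmark — values invented for the example):

```python
# Efficiency rate = sustained MFLOPS / peak MFLOPS (dimensionless, 0..1).

def efficiency(sustained_mflops, peak_mflops):
    """Fraction of the hardware's peak rate a program actually achieves."""
    return sustained_mflops / peak_mflops

print(efficiency(12.0, 100.0))  # 0.12 — the program uses 12% of peak
```

Efficiency rates well below 1 are normal: memory bandwidth, dependencies, and non-FP instructions keep real programs far from the peak rate.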

Measuring the performance of parallel computers
- Benchmarks: programs that are used to measure performance.
- The LINPACK benchmark measures a system's floating-point computing power: it solves a dense N-by-N system of linear equations Ax = b.
- LINPACK is used to rank supercomputers in the TOP500 list.
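To make the LINPACK problem concrete, here is a toy dense Ax = b solver using Gaussian elimination with partial pivoting. This is a plain-Python sketch of the problem the benchmark times, not the actual LINPACK/HPL code, which is heavily optimized:

```python
# Toy dense solver for Ax = b: Gaussian elimination with partial pivoting.
# LINPACK times exactly this kind of computation at large N.

def solve(A, b):
    n = len(A)
    A = [row[:] for row in A]  # work on copies; leave caller's data intact
    x = b[:]
    for k in range(n):
        # Partial pivoting: bring the largest remaining pivot into row k.
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        x[k], x[p] = x[p], x[k]
        # Eliminate column k below the pivot.
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            x[i] -= m * x[k]
    # Back substitution.
    for k in range(n - 1, -1, -1):
        s = sum(A[k][j] * x[j] for j in range(k + 1, n))
        x[k] = (x[k] - s) / A[k][k]
    return x

# 2x + y = 3 and x + 3y = 5 have the solution x = 0.8, y = 1.4.
print(solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))  # approximately [0.8, 1.4]
```

Elimination costs on the order of (2/3)·N³ FLOPs, which is why the achieved MFLOPS rate on this kernel is a meaningful summary of a machine's floating-point throughput.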

Other common benchmarks
- Micro-benchmark suites
  - Numerical computing: LAPACK, ScaLAPACK
  - Memory bandwidth: STREAM
- Kernel benchmarks: NPB (NAS Parallel Benchmarks), PARKBENCH, SPEC, Splash

Summary
- Flynn's classification: SISD, SIMD, MIMD, MISD.
- Modern classification: data parallelism and function parallelism (at the instruction, thread, and process levels).
- Performance: MIPS and MFLOPS; peak performance and sustained performance.

References
- K. Hwang, "Advanced Computer Architecture: Parallelism, Scalability, Programmability", McGraw-Hill, 1993.
- D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer Architectures: A Design Space Approach", Addison-Wesley, 1997.