Survey of multicore architectures Marko Bertogna Scuola Superiore S.Anna, ReTiS Lab, Pisa, Italy

Post on 19-Jan-2016

Page 1

Survey of multicore architectures

Marko Bertogna

Scuola Superiore S.Anna, ReTiS Lab, Pisa, Italy

Page 2

Summary

CELL processor
Reconfigurable devices
Software-Hardware co-design
Parallel programming problems
    data dependencies
    process synchronization
    memory barriers
    locking mechanisms
Language extensions for parallel programming
Real-time multiprocessor scheduling

Page 3

Cell processor

A Cell Processor

Page 4

Cell History

Page 5

Cell basic concepts

Page 6

Cell synergy

Page 7

Cell Chip

Page 8

Cell features

Page 9

Cell Processor Components

Page 10

Cell Processor Components

Page 11

Cell Processor Components

Page 12

Cell Processor Components

Page 13

Synergistic Processor Element (SPE)

Page 14

SPE

Page 15

SPE details

Page 16

Element Interconnect Bus (EIB)

Page 17

EIB: Data topology

Page 18

Example: 8 concurrent transactions

Page 19

Theoretical peak operations

Page 20

Cell BE performance

Page 21

Why is Cell Processor so fast?

Page 22

CELL software environment

Page 23

System Level Simulator

Page 24

SPE management library

Page 25

CELL parallelism

Page 26

Typical CELL sw development flow

Page 27

ARM’s MPCore

Page 28

PicoArray (by PicoChip)

Page 29

PicoArray scaling

Page 30

FPGA and Reconfigurable devices

Page 31

Field Programmable Gate Arrays

SRAM-based matrix of integrated elements whose interconnections can be programmed statically or even dynamically

Basic block is the Logic Element (LE)
Chip capacities range from 1k to 1000k LEs
Each LE is typically composed of logic gates, LUTs, flip-flops and latches
Need for optimized CAD tools or pre-bound design libraries
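As a rough illustration of how a LUT inside a Logic Element can be programmed to realize any small Boolean function, the following C sketch models a 4-input LUT as a 16-bit truth table. The 4-input width and the names `lut4_t`, `lut4_eval`, `lut4_program` and `xor4` are assumptions made for this example; they are not taken from the slides or from any vendor library.

```c
#include <assert.h>
#include <stdint.h>

/* A 4-input LUT modelled as a 16-bit truth table: bit i holds the output
   for the input combination whose binary encoding is i.
   (Hypothetical type and function names, for illustration only.) */
typedef struct {
    uint16_t truth_table;
} lut4_t;

/* Evaluate the LUT: the four inputs form an index into the truth table. */
unsigned lut4_eval(lut4_t lut, unsigned a, unsigned b, unsigned c, unsigned d)
{
    unsigned index = (a & 1u) | ((b & 1u) << 1) | ((c & 1u) << 2) | ((d & 1u) << 3);
    assert(index < 16u);
    return (lut.truth_table >> index) & 1u;
}

/* "Program" the LUT by tabulating an arbitrary 4-input Boolean function. */
lut4_t lut4_program(unsigned (*fn)(unsigned, unsigned, unsigned, unsigned))
{
    lut4_t lut = { 0 };
    for (unsigned i = 0; i < 16u; i++) {
        unsigned out = fn(i & 1u, (i >> 1) & 1u, (i >> 2) & 1u, (i >> 3) & 1u) & 1u;
        lut.truth_table |= (uint16_t)(out << i);
    }
    return lut;
}

/* Example function to configure: 4-input XOR (parity). */
unsigned xor4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return a ^ b ^ c ^ d;
}
```

Calling `lut4_program(xor4)` fills the table once; afterwards `lut4_eval` is a pure lookup, which is the sense in which the same element can be reprogrammed to implement a different function without changing its structure.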

Page 32

FPGA

CSL organization:

Basic Logic Element:

Page 33

Altera’s Stratix IV basic block: Adaptive Logic Module (ALM)

Page 34

Flexibility vs efficiency

Page 35

Reconfigurable devices advantages

Efficiency AND Flexibility
Time to market
Easier upgrade
Lower cost (in volume production)
Reusable IP
Customizable interface

Page 36

Reconfigurable devices parameters

Block granularity
    Coarse grained: Functional Units, Processor Cores, Memory Tiles
    Fine grained: gate and register level
Density
Reconfiguration time
    Compile-Time Reconfiguration (CTR)
    Run-Time Reconfiguration (RTR)
Partial or Total reprogramming

Page 37

Triscend’s A7S chip

Page 38

Example: multiplier on Altera’s Stratix IV

Page 39

Typical FPGA software development environment

FPGA optimized module library

IO Editor
Generate file.h
Bind (place and route) file.csl
Config file.cfg
Download

Page 40

Typical FPGA module library

Page 41

Altera’s Nios II

Nios II is a soft-core processor IP that can be downloaded into an Altera FPGA, providing the functionality of a real RISC CPU

Logic elements are programmed to behave like the gates of a classic ASIC processor

Different Nios II versions are available:
    faster and with full functionality, but bigger in size
    medium sized
    compact, but slower and with limited functionality

Page 42

Nios II core

Page 43

Selecting Nios II e/s/f

Page 44

Example of a Nios II Processor system

Page 45

Final global layout

Page 46

Soft-core processors and FPGAs

Possible to have multiple cores on a single chip

Customizable hardware can be used to coordinate the various cores

Build and test a whole multicore system in less time

Detect and solve bottlenecks without needing to repeatedly return to the integration phase

Page 47

Co-design problems with FPGAs

A task may be executed by a (soft-core or ASIC) processor or may be entirely implemented in hardware on the reconfigurable logic

“Programming in Space” versus “Programming in Time”

Centralized vs Distributed computing
Sequential vs Parallel programming
Interconnect Network

Page 48

What is a task in hardware?

Software programming:

c=a+b;
result=c/2;

Assembler expansion (5 operations):

ldr r0,a
ldr r1,b
add r0,r0,r1
mov r0,r0,LSR #1
str r0,result

Hardware implementation:

a, b -> adder (+) -> c -> shifter -> result

All in one clock cycle!
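The comparison on this slide can be sketched in C. The first function is the sequential software form, which a compiler expands into separate load, add, shift and store instructions. The second models the hardware datapath as an adder feeding a one-bit right shifter. The function names `average_sw` and `average_hw_model` and the choice of unsigned 32-bit operands are assumptions made for this example. For signed values, division by two and an arithmetic right shift round differently on negative odd numbers, so the equivalence shown here only holds for the unsigned case.

```c
#include <stdint.h>

/* Software form of the slide's example: two sequential statements.
   (Hypothetical function name; unsigned operands are an assumption.) */
uint32_t average_sw(uint32_t a, uint32_t b)
{
    uint32_t c = a + b;       /* c = a + b;      */
    uint32_t result = c / 2;  /* result = c / 2; */
    return result;
}

/* Model of the hardware datapath: an adder followed by a shifter.
   In real hardware both stages are combinational logic, so the result
   settles within a single clock cycle; this C model only reproduces
   the arithmetic, not the timing. */
uint32_t average_hw_model(uint32_t a, uint32_t b)
{
    uint32_t c = a + b;  /* adder output */
    return c >> 1;       /* shifter: divide by two for unsigned values */
}
```

Both functions return the same value for every pair of unsigned inputs, including the wrap-around case where `a + b` overflows 32 bits, since the addition wraps identically in each.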

Page 49

Conclusions

FPGAs are interesting devices for multicore systems developers

A valid benchmark on which to compare classic serial programming methods with parallel computing approaches

Help reduce time-to-market for next-generation multicore systems

Provide common platforms that can easily reproduce any architecture (given a proper VHDL/Verilog description)