Survey of multicore architectures
Marko Bertogna
Scuola Superiore S.Anna, ReTiS Lab, Pisa, Italy
Summary
CELL processor
Reconfigurable devices
Software-hardware co-design
Parallel programming problems: data dependencies, process synchronization, memory barriers, locking mechanisms
Language extensions for parallel programming
Real-time multiprocessor scheduling
Cell processor
A Cell Processor
Cell History
Cell basic concepts
Cell synergy
Cell Chip
Cell features
Cell Processor Components
Synergistic Processor Element (SPE)
SPE
SPE details
Element Interconnect Bus (EIB)
EIB: Data topology
Example: 8 concurrent transactions
Theoretical peak operations
Cell BE performance
Why is Cell Processor so fast?
CELL software environment
System Level Simulator
SPE management library
CELL parallelism
Typical CELL sw development flow
ARM’s MPcore
PicoArray (by PicoChip)
PicoArray scaling
FPGA and Reconfigurable devices
Field Programmable Gate Arrays
SRAM-based matrix of integrated elements whose interconnections can be programmed statically or even dynamically
Basic block is the Logic Element (LE)
Chip capacities range from 1k to 1000k LEs
Each LE is typically composed of logic gates, LUTs, flip-flops and latches
Need for optimized CAD tools or pre-bound design libraries
FPGA
CSL organization: Basic Logic Element:
Altera’s Stratix IV basic block: Adaptive Logic Module (ALM)
Flexibility vs efficiency
Reconfigurable devices advantages
Efficiency AND flexibility
Time to market
Easier upgrade
Lower cost (at production scale)
Reusable IP
Customizable interface
Reconfigurable devices parameters
Block granularity
Coarse grained: functional units, processor cores, memory tiles
Fine grained: gate and register level
Density
Reconfiguration time
Compile-Time Reconfiguration (CTR) or Run-Time Reconfiguration (RTR)
Partial or total reprogramming
Triscend’s A7S chip
Example: multiplier on Altera’s Stratix IV
Typical FPGA software development environment
FPGA optimized module library
IO Editor -> Generate file.h -> Bind (placement and route) -> file.csl -> Config file.cfg -> Download
Typical FPGA module library
Altera’s Nios II
Nios II is a soft-core processor IP that can be downloaded into an Altera FPGA, providing the functionality of a real RISC CPU
The logic elements are programmed to behave like the gates of a classic ASIC processor
Different Nios II versions are available:
Nios II/f (fast): faster and fully featured, but larger
Nios II/s (standard): medium-sized
Nios II/e (economy): compact, but slower and with limited functionality
Nios II core
Selecting Nios II e/s/f
Example of a Nios II Processor system
Final global layout
Soft-core processors and FPGAs
Possible to have multiple cores on a single chip
Customizable hardware can be used to coordinate the various cores
Build and test a whole multicore system in less time
Detect and solve bottlenecks without needing to repeatedly return to the integration phase
Co-design problems with FPGAs
A task may be executed by a (soft-core or ASIC) processor or may be entirely implemented in hardware on the reconfigurable logic
“Programming in Space” versus “Programming in Time”
Centralized vs distributed computing
Sequential vs parallel programming
Interconnect network
What is a task in hardware?
Software programming
c=a+b;
result=c/2;
Hardware implementation
[Figure: inputs a and b feed an adder (+) that produces c; a shifter then divides c by 2 to produce result]
Assembler expansion:
ldr r0, a
ldr r1, b
add r0, r0, r1
mov r0, r0, lsr #1   @ shift right by one bit: divide by 2 (unsigned)
str r0, result
Software: 5 operations
Hardware: all in one clock cycle!
Conclusions
FPGAs are interesting devices for multicore systems developers
Valid benchmark for comparing classic serial programming methods with parallel computing approaches
Reduce time-to-market for next-generation multicore systems
Provide common platforms that can easily reproduce any architecture (given a proper VHDL/Verilog description)