Single-Chip Heterogeneous Computing: Does the Future Include Custom Logics, FPGA, and GPGPUs?
Presented by Kittisak Sajjapongse
Introduction to the study
Objective of the study
- Observe the trends in integrating unconventional cores (U-cores) into single-chip multicores
- Identify the factors that affect the decision to include U-cores
Model in the study
Symmetric
- Multiple fast complex cores (FastCore)
- Highly optimized to minimize single-thread latency
Asymmetric
- One fast complex core (FastCore)
- Multiple simple cores (BCE)
- Intended for applications that have parallelism
Heterogeneous
- One fast complex core (FastCore)
- U-cores: ASICs, FPGAs, GPGPUs
- U-cores are the focus of this study
ASIC, FPGA, and GPGPU
ASIC (Application-Specific Integrated Circuit)
◦ An integrated circuit customized for a specific application domain, e.g. H.264 codec, JPEG codec, etc.
FPGA (Field-Programmable Gate Array)
◦ A configurable digital integrated circuit capable of supporting hardware architectures
GPGPU (General-Purpose Graphics Processing Unit)
◦ Graphics devices that provide APIs (Application Programming Interfaces) for use with parallelizable applications
ASIC, FPGA, and GPGPU
Features | ASIC | FPGA | GPGPU
Design/Program | CAD/CAM, EDA (Electronic Design Automation) tool | Hardware Description Language (HDL) | OpenCL, CUDA, etc.
Design controls | Transistors, standard cells | Logic components, RTL | Processors, cache, memory
Flexibility | Fixed-function (1) | Configurable (2) | Programmable (3)
Level of abstraction | Low (1) | Medium (2) | High (3)
Power efficiency | Extremely high (3) | High (2) | Moderate (1)
They all are used to exploit parallelism!!!
What is the study about?
Constraints
◦ Power
◦ Bandwidth
Questions posed
Under bandwidth and power constraints:
◦ Would single-chip multicores benefit significantly from U-cores?
◦ Would ASICs be the best choice?
Model for U-core
What is BCE?
Baseline Core Equivalent
◦ Refers to a basic processor
◦ Used as the baseline reference for performance and power consumption
What is BCE?
Two parameters used later
◦ n: total number of BCEs available
◦ r: amount of resources dedicated to complex cores (in units of BCE)
Amdahl’s Law
Reference: http://en.wikipedia.org/wiki/Amdahl_law
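For reference, Amdahl's Law in its standard form is Speedup = 1 / ((1 - f) + f / n), where f is the parallelizable fraction of the work and n is the number of cores. The sketch below evaluates it; the function name `amdahl_speedup` and the sample values of f and n are illustrative assumptions, not taken from the slides.

```python
def amdahl_speedup(f: float, n: float) -> float:
    """Standard Amdahl's Law: fraction f of the work is spread over n cores."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    # Serial part runs at speed 1; parallel part is divided across n cores.
    return 1.0 / ((1.0 - f) + f / n)


if __name__ == "__main__":
    # Hypothetical sample points, chosen only to show the saturation effect.
    for f in (0.5, 0.9, 0.99):
        row = [round(amdahl_speedup(f, n), 2) for n in (1, 16, 256)]
        print(f"f={f}: {row}")
```

Even with many cores, the speedup is capped at 1 / (1 - f), which is why the parallel fraction matters so much in the rest of the model.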
Hill & Marty’s extended Amdahl’s Law
Reference: M. D. Hill et al., “Amdahl’s Law in the Multicore Era,” Computer
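As a rough sketch of the Hill and Marty extension for a symmetric chip: n BCEs are grouped into n / r cores, each built from r BCEs and delivering sequential performance perf(r). The speedup is then 1 / ((1 - f) / perf(r) + f * r / (perf(r) * n)). The choice perf(r) = sqrt(r) below is an assumption (it is the example function used in Hill and Marty's article, not something stated on these slides), and the function names are illustrative.

```python
import math


def perf(r: float) -> float:
    """Sequential performance of a core built from r BCEs.

    Assumption: perf(r) = sqrt(r). Any other sublinear function could be
    substituted here.
    """
    return math.sqrt(r)


def speedup_symmetric(f: float, n: float, r: float) -> float:
    """Hill & Marty symmetric multicore: n / r cores, each of r BCEs."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    if not 0 < r <= n:
        raise ValueError("require 0 < r <= n")
    serial_time = (1.0 - f) / perf(r)       # one core of size r runs the serial part
    parallel_time = f * r / (perf(r) * n)   # n / r cores share the parallel part
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    # Hypothetical budget of 256 BCEs; sweep the core size r.
    for r in (1, 4, 16, 64, 256):
        print(r, round(speedup_symmetric(0.9, 256, r), 2))
```

With r = 1 the expression reduces to the classic Amdahl form, and with r = n it reduces to perf(n), a single large core.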
What about the heterogeneous architecture?
Speedup_heterogeneous(?) = ?
Under power and bandwidth constraints
Deriving model for U-core
Speedup_Amdahl = function of (f, n)
Speedup_Hill&Marty = function of (f, n, r)
Speedup_het(U-core) = function of (f, n, r, B, P, µ, φ)
New parameters:
- B: memory bandwidth of the U-core (in units of BCE compulsory bandwidth)
- P: active power of the U-core relative to a BCE
- µ: performance of the U-core relative to a BCE
- φ: power efficiency of the U-core relative to a BCE
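Deriving model for U-core
$$\mathrm{Speedup}_{\mathrm{asymmetric}} = \frac{1}{\dfrac{1-f}{\mathrm{perf}(r)} + \dfrac{f}{\mathrm{perf}(r) + n - r}}$$
$$\mathrm{Speedup}_{\mathrm{asym(offload)}} = \frac{1}{\dfrac{1-f}{\mathrm{perf}(r)} + \dfrac{f}{n - r}}$$
$$\mathrm{Speedup}_{\mathrm{het(U\text{-}core)}} = \frac{1}{\dfrac{1-f}{\mathrm{perf}(r)} + \dfrac{f}{\mu\,(n - r)}}$$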
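The three speedup expressions on this slide can be compared numerically with a short sketch. It assumes the usual reading of the model: the serial fraction runs on one fast core of size r, and the parallel fraction runs on the remaining n - r BCEs of area, either as plain BCEs (offload) or as U-cores that are µ times faster per BCE. The function names, the choice perf(r) = sqrt(r), and the sample values are assumptions for illustration; power and bandwidth limits are deliberately left out.

```python
import math


def perf(r: float) -> float:
    """Assumed sequential performance of a fast core of r BCEs: sqrt(r)."""
    return math.sqrt(r)


def _check(f: float, n: float, r: float) -> None:
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    if not 0 < r < n:
        raise ValueError("require 0 < r < n so some area is left for parallel work")


def speedup_asymmetric(f: float, n: float, r: float) -> float:
    """Fast core also helps during the parallel phase."""
    _check(f, n, r)
    return 1.0 / ((1.0 - f) / perf(r) + f / (perf(r) + n - r))


def speedup_asym_offload(f: float, n: float, r: float) -> float:
    """Fast core idles during the parallel phase; n - r BCEs do the work."""
    _check(f, n, r)
    return 1.0 / ((1.0 - f) / perf(r) + f / (n - r))


def speedup_heterogeneous(f: float, n: float, r: float, mu: float) -> float:
    """Parallel phase runs on U-cores, each BCE of area being mu times faster."""
    _check(f, n, r)
    if mu <= 0:
        raise ValueError("mu must be positive")
    return 1.0 / ((1.0 - f) / perf(r) + f / (mu * (n - r)))


if __name__ == "__main__":
    # Hypothetical values: 64-BCE chip, fast core of 4 BCEs, 95% parallel work.
    f, n, r = 0.95, 64, 4
    print("asymmetric  ", round(speedup_asymmetric(f, n, r), 2))
    print("asym offload", round(speedup_asym_offload(f, n, r), 2))
    for mu in (1, 10, 100):  # made-up relative U-core performance
        print(f"het mu={mu}", round(speedup_heterogeneous(f, n, r, mu), 2))
```

Setting µ = 1 makes the heterogeneous form identical to the offload form, which is a quick sanity check on the reconstruction. As µ grows, the serial term (1 - f) / perf(r) dominates, so a faster U-core yields diminishing returns unless f is close to 1.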
Obtaining µ,φ for U-core
Devices & Workload
Workload:
- Dense Matrix Multiplication (MMM)
- Fast Fourier Transform (FFT, with input sizes from 2^4 to 2^20)
- Black-Scholes (BS)
Device:
Device | Ref. Device
BCE | Intel Atom
Symmetric CMP | Intel Core i7-960
ASIC (U-core) | 65nm technology (1.1V)
FPGA (U-core) | V6-LX760
GPU (U-core) | GTX285, GTX480, R5870
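Deriving µ for ASIC in FFT-1024 (case study)
[Worked calculation from the slide figure; surviving value: 3500.5]
Deriving φ for ASIC in FFT-1024 (case study)
[Worked calculation from the slide figure; surviving values: 100, 0.8]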
Obtained Parameters
Applying the Model for Results
Scaling Projection
Budget and Constraints
Results for FFT-1024
Results for MMM
Results for Black-Scholes
Answering the questions
◦ Would single-chip multicores benefit significantly from U-cores?
  Yes, if the application has enough (>90%) parallelism to exploit.
◦ Would ASICs be the best choice?
  It depends on the application; if there is not much parallelism, an ASIC might not be worth implementing.
Conclusions
- Sufficient parallelism must exist to obtain a significant performance improvement from U-cores
- Flexible U-cores tend to be competitive with ASICs under limited bandwidth and limited parallelism
- A U-core such as an ASIC is useful when reducing power is the primary goal