Parallella: A Love Story
Heterogeneous.. Parallel.. Efficient.. Open..
Andreas Olofsson
MIT, Jan 7, 2013
Adapteva Achieves 3 “World Firsts”
1. First processor company to reach 50 GFLOPS/W
2. First open source OpenCL™ SDK in the mobile market
3. First semiconductor company to successfully crowd-source a project
“OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.”
Prologue
Why we need heterogeneous and parallel platforms
[Figure: chart spanning 1990 to 2030 with a logarithmic vertical axis (1 to 100,000,000). Curves: “System Processing Needs” and “Legacy Processing Efficiency”. Annotations: “The Efficiency Gap” and “Von Neumann Saturation”.]
ASIC, FPGA, DSP, CPU?
                        ASIC      FPGA      DSP       CPU
Flexibility             Poor      Great     Good      Good
Efficiency              Great     Good      Good      Fair
Development Cost/Risk   High      Medium    Medium    Low
Leverage                Minimal   Modest    High      Huge
A Practical Radar System Example
[Block diagram: ADC/DAC, FPGA, DDR, uP, Storage, Display, Epiphany]
FPGAs are great for front-end DSP and connectivity.
The missing piece: a math engine that is high performance, low-power and C-programmable.
Microprocessors are great for user interfacing, knowledge extraction, and system management.
Why SOC integration is so disruptive
62 cm3 to 0.00003 cm3: >1M X Volume Reduction
[Figure labels, largest to smallest:]
• iPhone 4S: ~58mm
• A5X chip: ~16mm
• A5X die: ~13mm
• 2X CPU
• ARM A9: ~2mm
• FPU: ~0.15mm
What if your smartphone disappears?
The Problem: SOCs are complex!
[Figure: chart of per-product SOC R&D costs on a logarithmic axis from $10,000 to $1,000,000,000.]
What if you could do a 28nm chip for $100k?
Our Vision: True Heterogeneous Computing
[Block diagram: SYSTEM-ON-CHIP containing four BIG CPUs, an FPGA, a GPU, Analog, and 1000 small Epiphany RISC CPUs/DSPs.]
Epiphany: Massive Task‐Parallelism
• Coprocessor to ARM/Intel CPU
• 25mW per core
• C/C++ programmable
Programming Models
MODEL #1: TASK QUEUE MODEL
• Up to 2 GFLOPS/core
• Supports standard C/C++
• “Cloud on a chip”
[Diagram: X86/ARM/FPGA Host dispatching Task1, Task2, Task3, and Task4 to an array of 16 MINI CPUs.]

MODEL #2: DATA PARALLEL MODEL
• OpenCL programmable
• Easy integration of C/C++
• OpenMP/MPI roadmap
[Diagram: X86/ARM/FPGA Host running Task1 across an array of 16 MINI CPUs.]
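The difference between the two models can be sketched on an ordinary host. This is only an illustration, not Epiphany SDK code: Python threads stand in for the 16 mini CPUs, and the helper names (`run_task_queue`, `run_data_parallel`, `sum_of_squares`) are hypothetical. In the task queue model, unrelated tasks are handed to whichever worker is free; in the data parallel model, one kernel is applied to every slice of a single data set.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: 16 workers, mirroring the 16 mini CPUs in the diagrams.
# Host threads are only a stand-in for real cores.
NUM_CORES = 16


def run_task_queue(tasks):
    """Model #1: independent (function, args) tasks are queued for any free worker."""
    with ThreadPoolExecutor(max_workers=NUM_CORES) as pool:
        futures = [pool.submit(fn, *args) for fn, args in tasks]
        # Results come back in submission order, whatever the completion order.
        return [f.result() for f in futures]


def run_data_parallel(kernel, data):
    """Model #2: the same kernel runs on every slice of one data set."""
    chunk = max(1, (len(data) + NUM_CORES - 1) // NUM_CORES)
    slices = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=NUM_CORES) as pool:
        return list(pool.map(kernel, slices))  # one partial result per slice


def sum_of_squares(xs):
    """Example kernel: partial sum of squares over one slice."""
    return sum(x * x for x in xs)


if __name__ == "__main__":
    mixed = [(sum, ([1, 2, 3],)), (max, ([4, 9, 2],)), (len, ("abc",))]
    print(run_task_queue(mixed))
    partials = run_data_parallel(sum_of_squares, list(range(100)))
    print(sum(partials))
```

The task queue version suits workloads made of many unrelated jobs, while the data parallel version suits one large computation that can be split evenly; which one maps better onto real hardware depends on the application.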
Epiphany Silicon Devices
Features:
• 16 RISC CPU cores
• 512KB distributed memory
• IEEE Floating Point
• 32 distributed DMA engines
• 4 off-chip serial links
• 65nm
Specifications:
• 1 GHz
• 32 GFLOPS
• 2 Watt Max Chip Power
• 512 GB/sec memory bandwidth
• 8 GB/sec off-chip BW

[Chip diagram: four eLink I/O ports]

Features:
• 64 RISC CPU cores
• 2MB distributed memory
• IEEE Floating Point
• 128 distributed DMA engines
• 4 off-chip serial links
• 28nm
Specifications:
• 800 MHz
• 100 GFLOPS
• 2 Watt Max Chip Power
• 1.6 TB/sec memory bandwidth
• 8 GB/sec off-chip BW
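The headline figures on this slide can be cross-checked with simple arithmetic. The sketch below assumes 2 floating-point operations per core per cycle; that number is not stated on the slide, but it is consistent with the "up to 2 GFLOPS/core" figure quoted for a 1 GHz clock. The function names are hypothetical.

```python
# Assumption (not stated on the slide): each core retires 2 floating-point
# operations per cycle, which matches "up to 2 GFLOPS/core" at 1 GHz.
FLOPS_PER_CYCLE = 2


def peak_gflops(cores, clock_ghz, flops_per_cycle=FLOPS_PER_CYCLE):
    """Peak throughput = cores x clock (GHz) x operations per cycle."""
    return cores * clock_ghz * flops_per_cycle


def gflops_per_watt(gflops, watts):
    """Energy efficiency at the stated maximum chip power."""
    return gflops / watts


if __name__ == "__main__":
    print(peak_gflops(16, 1.0))        # 16-core device at 1 GHz
    print(peak_gflops(64, 0.8))        # 64-core device at 800 MHz
    print(gflops_per_watt(100, 2))     # quoted 100 GFLOPS at 2 W max
```

Under that assumption the 16-core device comes out at exactly the quoted 32 GFLOPS, the 64-core device at 102.4 GFLOPS (quoted as 100), and 100 GFLOPS at 2 W gives the 50 GFLOPS/W figure from the "World Firsts" slide.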
Parallella
Parallella Open Computing
[Board diagram: ZYNQ (ARM) CPU, E64, 1GB SDRAM, RJ45, USB (x2), GPIO (x2), uSD, HDMI, IO]
• Open (and “free”):
    • Documentation
    • Board design files
    • Drivers
    • Software Tools
• Accessible (NO NDAs!)
    • $100 entry point
    • ~4000 devs signed up in 4 weeks
How cool is this?
Connection Machine 5 (1992): 100 GFLOPS, 100 kW, $10M
Parallella Board (2012/2013): 100 GFLOPS, 5 W (20k X), $200 (50k X)
[Board diagram: ZYNQ (ARM) CPU, E64, 1GB SDRAM, RJ45, USB (x2), GPIO (x2), uSD, HDMI, eLink]
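The 20k X and 50k X factors follow directly from the numbers on the slide. A quick check, using hypothetical names and only the quoted figures:

```python
# Figures as quoted on the slide.
CM5 = {"gflops": 100, "watts": 100_000, "dollars": 10_000_000}   # 1992
PARALLELLA = {"gflops": 100, "watts": 5, "dollars": 200}         # 2012/2013


def improvement(old, new, key):
    """How many times smaller the new system's figure is for the given key."""
    return old[key] / new[key]


if __name__ == "__main__":
    print(improvement(CM5, PARALLELLA, "watts"))    # power reduction factor
    print(improvement(CM5, PARALLELLA, "dollars"))  # cost reduction factor
```

Both systems are quoted at the same 100 GFLOPS, so the ratios compare power and price at equal peak throughput.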
Parallella Architecture
[Block diagram labels: Dual Core ARM A9; AXI BUS; MIO; SHARED DRAM; “O/S” DRAM; USB OTG; USB 2.0; UART; Ethernet; SD-CARD; I2C; DAC/ADC IF; HDMI Controller; AXI-MASTER (x2); AXI-SLAVE; “Glue-Logic”; Daughter Card; Zynq FPGA; Zynq “Hard”; Off-Chip; Epiphany (x2); MEM-CTRL; “Sandbox”]
Parallella Coprocessor Approach
ARM runs Linux
Epiphany accelerates key tasks
Programmable logic “makes anything possible”
Program Flow
1. ARM boots Linux. First stage boot loader from Flash, everything else from SD card.
2. “Main” application executes on ARM.
3. Application sends critical tasks to Epiphany using OpenCL or simple threads.
4. ARM/Epiphany communication through shared DRAM buffer outside virtual memory of O/S.
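Steps 3 and 4 of the program flow can be pictured as a mailbox in shared memory. The sketch below is a simulation only: a Python `bytearray` stands in for the shared DRAM buffer, a thread stands in for the Epiphany, and the mailbox layout (status flag, operand count, operands, result) is a hypothetical one chosen for illustration, not the layout used by the Parallella software.

```python
import struct
import threading

# Hypothetical mailbox layout inside the shared buffer:
#   [status:uint32][count:uint32][operands: MAX_OPERANDS x float32][result:float32]
HEADER = struct.Struct("<II")
STATUS_IDLE, STATUS_READY, STATUS_DONE = 0, 1, 2
MAX_OPERANDS = 8
RESULT_OFFSET = HEADER.size + 4 * MAX_OPERANDS
BUFFER_SIZE = RESULT_OFFSET + 4


def host_submit(buf, operands):
    """Host side: write the operands, then mark the task as ready."""
    if len(operands) > MAX_OPERANDS:
        raise ValueError("too many operands for the mailbox")
    struct.pack_into(f"<{len(operands)}f", buf, HEADER.size, *operands)
    HEADER.pack_into(buf, 0, STATUS_READY, len(operands))


def device_run(buf):
    """Coprocessor side (simulated): read the task, compute, write the result."""
    status, count = HEADER.unpack_from(buf, 0)
    if status != STATUS_READY:
        return
    operands = struct.unpack_from(f"<{count}f", buf, HEADER.size)
    struct.pack_into("<f", buf, RESULT_OFFSET, sum(operands))
    HEADER.pack_into(buf, 0, STATUS_DONE, count)


def host_collect(buf):
    """Host side: read the result once the status flag says it is done."""
    status, _ = HEADER.unpack_from(buf, 0)
    if status != STATUS_DONE:
        raise RuntimeError("result not ready")
    return struct.unpack_from("<f", buf, RESULT_OFFSET)[0]


def offload_sum(operands):
    """Full round trip: submit, let the simulated device run, collect."""
    buf = bytearray(BUFFER_SIZE)          # stand-in for the shared DRAM buffer
    host_submit(buf, operands)
    worker = threading.Thread(target=device_run, args=(buf,))
    worker.start()
    worker.join()                         # wait for the simulated coprocessor
    return host_collect(buf)


if __name__ == "__main__":
    print(offload_sum([1.0, 2.0, 3.5]))
```

Writing the status flag last on each side is the point of the sketch: the reader only trusts the payload after the flag changes. On real hardware the buffer would sit at a fixed physical address outside the operating system's virtual memory, as the slide notes, and the synchronization details would differ.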
Zedboard Introduction
The Future is…
Open
Heterogeneous
Massively Task-Parallel
Efficient
Grand Challenges Ahead…
• Rebuild the computer ecosystem
• Rewrite billions of lines of code
• Retrain millions of programmers
• Rewrite the education curriculum