parallella: a love story - adapteva a love story heterogeneous.. parallel.. efficient.. open.....

19
Parallella: A Love Story Heterogeneous.. Parallel.. Efficient.. Open.. Andreas Olofsson MIT, Jan 7,2013 1

Upload: dinhtruc

Post on 18-May-2018

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Parallella: A Love Story - Adapteva A Love Story Heterogeneous.. Parallel.. Efficient.. Open.. Andreas Olofsson MIT, Jan 7,2013 1Published in: National Geographic · 2002Authors: Angus

Parallella: A Love StoryHeterogeneous.. 

Parallel.. Efficient..

Open..Andreas OlofssonMIT, Jan 7,2013

1

Page 2: Parallella: A Love Story - Adapteva A Love Story Heterogeneous.. Parallel.. Efficient.. Open.. Andreas Olofsson MIT, Jan 7,2013 1Published in: National Geographic · 2002Authors: Angus

Adapteva Achieves 3 “World Firsts”

2

1. First processor company to reach 50 GFLOPS/W

2. First open source OpenCL™ SDK in the mobile market

3. First semiconductor company to successfully crowd‐source project

“OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.”

Page 3: Parallella: A Love Story - Adapteva A Love Story Heterogeneous.. Parallel.. Efficient.. Open.. Andreas Olofsson MIT, Jan 7,2013 1Published in: National Geographic · 2002Authors: Angus

Prologue

3

Page 4: Parallella: A Love Story - Adapteva A Love Story Heterogeneous.. Parallel.. Efficient.. Open.. Andreas Olofsson MIT, Jan 7,2013 1Published in: National Geographic · 2002Authors: Angus

Why we need heterogeneous and parallel platforms

4

0

1

10

100

1,000

10,000

100,000

1,000,000

10,000,000

100,000,000

1990 1995 2000 2005 2010 2015 2020 2025 2030

System ProcessingNeedsLegacy ProcessingEfficiency

“The Efficiency Gap”Von NeumannSaturation

Page 5: Parallella: A Love Story - Adapteva A Love Story Heterogeneous.. Parallel.. Efficient.. Open.. Andreas Olofsson MIT, Jan 7,2013 1Published in: National Geographic · 2002Authors: Angus

ASIC, FPGA, DSP, CPU?

5

ASIC FPGA DSP CPU

Flexibility Poor Great Good Good

Efficiency Great Good Good Fair

DevelopmentCost/Risk High Medium Medium Low

Leverage Minimal Modest High Huge

Page 6: Parallella: A Love Story - Adapteva A Love Story Heterogeneous.. Parallel.. Efficient.. Open.. Andreas Olofsson MIT, Jan 7,2013 1Published in: National Geographic · 2002Authors: Angus

A Practical Radar System Example

6

ADC/DAC FPGA

1

DDR

uP

Storage

Display

EpiphanyFPGAs are great for front‐end 

DSP and connectivity.

The missing piece: a math engine that is high performance, low-power and C-programmable.

Microprocessors are great for user interfacing, knowledge 

extraction, and system management.

Page 7: Parallella: A Love Story - Adapteva A Love Story Heterogeneous.. Parallel.. Efficient.. Open.. Andreas Olofsson MIT, Jan 7,2013 1Published in: National Geographic · 2002Authors: Angus

Why SOC integration is so disruptive

7

62 cm3 0.00003 cm3>1M X Volume Reduction

2XCPU

A5X‐die~13mm

FPU~0.15mm

ARMA9

~2mm

A5X Chip~16mm

iPhone4s~58mm What if your 

smartphone disappears?

Page 8: Parallella: A Love Story - Adapteva A Love Story Heterogeneous.. Parallel.. Efficient.. Open.. Andreas Olofsson MIT, Jan 7,2013 1Published in: National Geographic · 2002Authors: Angus

The Problem: SOCs are complex!

8

$10,000

$100,000

$1,000,000

$10,000,000

$100,000,000

$1,000,000,000

Per Product SOC R&D CostsWhat if you could do a 

28nm chip for $100k?

Page 9: Parallella: A Love Story - Adapteva A Love Story Heterogeneous.. Parallel.. Efficient.. Open.. Andreas Olofsson MIT, Jan 7,2013 1Published in: National Geographic · 2002Authors: Angus

Our Vision: True Heterogeneous Computing

9

SYSTEM‐ON‐CHIP

BIGCPU

FPGA

BIGCPU

BIGCPU

BIGCPU

1000 small Epiphany RISC CPUs/DSPs

GPU Analog

Page 10: Parallella: A Love Story - Adapteva A Love Story Heterogeneous.. Parallel.. Efficient.. Open.. Andreas Olofsson MIT, Jan 7,2013 1Published in: National Geographic · 2002Authors: Angus

Epiphany: Massive Task‐Parallelism

10

Coprocessor to ARM/Intel CPU 25mW per core C/C++ programmable

Page 11: Parallella: A Love Story - Adapteva A Love Story Heterogeneous.. Parallel.. Efficient.. Open.. Andreas Olofsson MIT, Jan 7,2013 1Published in: National Geographic · 2002Authors: Angus

Programming Models

11

MODEL#1TASK QUEUE MODEL

• Up to 2 GFLOPS/core• Supports standard C/C++• “Cloud on a chip”

MODEL #2DATA PARALLEL MODEL

• openCL programmable• Easy integration of C/C++• openMP/MPI roadmap

MINICPU

MINICPU

MINICPU

MINICPU

MINICPU

MINICPU

MINICPU

MINICPU

MINICPU

MINICPU

MINICPU

MINICPU

MINICPU

MINICPU

MINICPU

MINICPU

X86/ARM/FPGA Host

Task1

Task3Task4

Task2

MINICPU

MINICPU

MINICPU

MINICPU

MINICPU

MINICPU

MINICPU

MINICPU

MINICPU

MINICPU

MINICPU

MINICPU

MINICPU

MINICPU

MINICPU

MINICPU

X86/ARM/FPGA HostTask1

Page 12: Parallella: A Love Story - Adapteva A Love Story Heterogeneous.. Parallel.. Efficient.. Open.. Andreas Olofsson MIT, Jan 7,2013 1Published in: National Geographic · 2002Authors: Angus

Epiphany Silicon Devices

12

Features:• 16 RISC CPU cores• 512KB distributed memory• IEEE Floating Point• 32 distributed DMA engines• 4 off-chip serial links• 65nmSpecifications:• 1 GHz• 32 GFLOPS• 2 Watt Max Chip Power• 512 GB/sec memory bandwidth• 8GB/sec off chip BW

eLinkI/O

eLinkI/O

eLink I/O

eLink I/O

Features:• 64 RISC CPU cores• 2MB distributed memory• IEEE Floating Point• 128 distributed DMA engines• 4 off-chip serial links• 28nmSpecifications:• 800 MHz• 100 GFLOPS• 2 Watt Max Chip Power• 1.6 TB/sec memory bandwidth• 8GB/sec off chip BW

Page 13: Parallella: A Love Story - Adapteva A Love Story Heterogeneous.. Parallel.. Efficient.. Open.. Andreas Olofsson MIT, Jan 7,2013 1Published in: National Geographic · 2002Authors: Angus

Parallella

13

Page 14: Parallella: A Love Story - Adapteva A Love Story Heterogeneous.. Parallel.. Efficient.. Open.. Andreas Olofsson MIT, Jan 7,2013 1Published in: National Geographic · 2002Authors: Angus

Parallella Open Computing

14

Rj45

USB

GPIO

GPIO

ZYNQ(ARM)CPU

E64

1GB SDRAM

uSD

HDMI

USB

• Open (and ”free”):• Documentation• Board design files• Drivers• Software Tools

• Accessible (NO NDAs!)• $100 entry point• ~4000 devs signed up in 4 weeks

IO IO

Page 15: Parallella: A Love Story - Adapteva A Love Story Heterogeneous.. Parallel.. Efficient.. Open.. Andreas Olofsson MIT, Jan 7,2013 1Published in: National Geographic · 2002Authors: Angus

How cool is this?

15

100 GFLOPS100 KW$10M

(1992)Connection Machine 5

100 GFLOPS5 W (20k X)$200 (50k X)

(2012/2013)Parallella Board

Rj45

USB

GPIO

GPIO

ZYNQ(ARM)CPU

E64

1GB SDRAM

uSD

HDMI

USB

Page 16: Parallella: A Love Story - Adapteva A Love Story Heterogeneous.. Parallel.. Efficient.. Open.. Andreas Olofsson MIT, Jan 7,2013 1Published in: National Geographic · 2002Authors: Angus

eLink

Parallella Architecture

16

Dual CoreARM A9

AXI BUS

MIO

SHARED DRAM

“O/S” DRAM

USB OTG USB 2.0

UART Ethernet

SD‐CARD I2C

DAC/ADC IFHDMI

Controller

AXI‐MASTER AXI‐SLAVE

“Glue‐Logic”

DaughterCard

AXI‐MASTER

ZynqFPGA

Zynq“Hard”

Off‐Chip

EpiphanyEpiphany

MEM‐CTRL

“Sandbox”

Page 17: Parallella: A Love Story - Adapteva A Love Story Heterogeneous.. Parallel.. Efficient.. Open.. Andreas Olofsson MIT, Jan 7,2013 1Published in: National Geographic · 2002Authors: Angus

Parallella Coprocessor Approach

17

ARM runs Linux

Epiphany accelerates key

tasks

Programmable logic “makes

anything possible”

Program Flow

1. ARM boots Linux. First stage boot loader from Flash, everything else from SD card.2. “Main” application executes on ARM3. Application sends critical tasks send to Epiphany using OpenCL or simple threads4. ARM/Epiphany communication through shared DRAM buffer outside virtual memory of O/S.

Page 18: Parallella: A Love Story - Adapteva A Love Story Heterogeneous.. Parallel.. Efficient.. Open.. Andreas Olofsson MIT, Jan 7,2013 1Published in: National Geographic · 2002Authors: Angus

18

Zedboard Introduction

18

Page 19: Parallella: A Love Story - Adapteva A Love Story Heterogeneous.. Parallel.. Efficient.. Open.. Andreas Olofsson MIT, Jan 7,2013 1Published in: National Geographic · 2002Authors: Angus

19

The Future is… Open

Heterogeneous Massively Task-Parallel

Efficient

Grande Challenges Ahead…• Rebuild the computer ecosystem• Rewrite billions of lines of code• Retrain millions of programmers• Rewrite the education curriculum