Why Simplicity Matters: A Hardware Perspective, by Andreas Olofsson, March 27, 2015, Erlang Factory (San Francisco)

TRANSCRIPT

Page 1: Why Simplicity Matters: A Hardware Perspective

Why Simplicity Matters: A Hardware Perspective

by Andreas Olofsson
March 27, 2015
Erlang Factory (San Francisco)

Page 2: Why Simplicity Matters: A Hardware Perspective

Adapteva, Erlang Factory 2015

The Free Lunch is Over!

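[Figure: trend chart with series labeled XTORS (transistors), FREQ (frequency), PWR (power), ILP (instruction-level parallelism)]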

Page 3: Why Simplicity Matters: A Hardware Perspective

Communication

Robotics

IoT

Datacenters/HPC

Life without Mooooooore will be boring!

Page 4: Why Simplicity Matters: A Hardware Perspective

Chip Hardware Design 101

CMOS NAND GATE

SRAM STORAGE

Power ~= VDD^2 * F * CAP

The cost of HW may approach zero, but will never be zero....
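As a rough illustration of the power relation on this slide, the sketch below evaluates P ~= VDD^2 * F * CAP for two supply voltages. The function name and the numeric values (1 V, 0.5 V, 1 GHz, 1 nF of switched capacitance) are assumptions chosen for round numbers, not figures from the talk.

```python
def dynamic_power_watts(vdd_volts: float, freq_hz: float, cap_farads: float) -> float:
    """Approximate CMOS switching power: P ~= VDD^2 * F * CAP."""
    return vdd_volts ** 2 * freq_hz * cap_farads


# Illustrative values only (assumed, not taken from the slides).
baseline = dynamic_power_watts(vdd_volts=1.0, freq_hz=1e9, cap_farads=1e-9)
half_vdd = dynamic_power_watts(vdd_volts=0.5, freq_hz=1e9, cap_farads=1e-9)

# Power scales with the square of the supply voltage, so halving VDD
# cuts switching power to roughly a quarter at the same clock and load.
print(f"baseline: {baseline:.3f} W, half VDD: {half_vdd:.3f} W")
```

The quadratic dependence on VDD is why lowering the supply voltage tends to save more power than lowering the clock frequency by the same factor.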

Page 5: Why Simplicity Matters: A Hardware Perspective

A 5 minute introduction to Modern HW design

SYNTAX                  EXAMPLE                               MEANING

module ... endmodule    module flipflop (d, clk, q);          Basic unit of hierarchy.
                          input d, clk;                       Can be instantiated many times.
                          output q;
                          ...
                        endmodule

wire                    wire a;                               Declares a physical wire

reg                     reg [7:0];                            Declares a state variable

assign                  assign a = b & c;                     Continuous assignment

always @                always @ (clk) begin                  Act on an event
                          mystate <= 1;
                        end

if...else               if (a) out2 = c;                      Control flow
(and case())            else out2 = d;

&, |, ~, *, +, -, etc.  assign a[31:0] = b[31:0] + c[31:0]    Boolean operations

Page 6: Why Simplicity Matters: A Hardware Perspective

The Future of Computing

Constraint --> Result

1. Performance limits Massive parallelism

2. Amdahl's law New algorithms + languages

3. Thermal density Slow clocks (1MHz-1GHz)

4. Failure rate Distributed systems

5. IO Bandwidth Limited

6. Density + cost 3D chip stacking

7. Energy Efficiency Heterogeneity

8. Productivity Multiple Languages

9. Development cost Collaboration/complexity

10. Latency Locality
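Item 2 names Amdahl's law as a constraint that pushes toward new algorithms and languages. A minimal sketch of the standard formula, speedup = 1 / ((1 - p) + p / n), shows why: the serial fraction caps the benefit of adding cores. The function name and the 99% parallel fraction are assumptions for illustration; the core counts echo the 64- and 1024-core parts discussed in the talk.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Amdahl's law: upper bound on speedup when only part of the work parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


# Assumed workload: 99% of the run time parallelizes perfectly.
for cores in (64, 1024):
    print(f"{cores} cores -> {amdahl_speedup(0.99, cores):.1f}x")
```

Even with only 1% serial work, the bound stays below 100x no matter how many cores are added, which is the pressure behind the "new algorithms + languages" entry.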

Page 7: Why Simplicity Matters: A Hardware Perspective

The Architecture of the Future?

[Figure: candidate architectures, with block labels "Sequencer", "FPU", and "?"]

SMP (coherent, shared)

DISTRIBUTED

SIMD (lockstep)

Page 8: Why Simplicity Matters: A Hardware Perspective

Epiphany Manycore Processor

Page 9: Why Simplicity Matters: A Hardware Perspective

Epiphany 64-core 2011 Processor

[Figure: chip diagram with ELINK (LVDS) interfaces]

● 64 RISC cores

● 800MHz

● 100GFLOPS

● 2MB SRAM

● 1.6TB/s local memory BW

● 102GB/s bisection BW

● 7.2 GB/s IO

● 15x15mm BGA

● < 2 Watts (i.e. 50 GFLOPS/W)

Page 10: Why Simplicity Matters: A Hardware Perspective

It's all about silicon efficiency...

Intel Haswell    14.5 mm^2
Intel Atom       5.6 mm^2
AMD Jaguar       3.1 mm^2
ARM A15          1.62 mm^2
ARM A7           0.45 mm^2
Epiphany         0.13 mm^2

100 Epiphany CPU cores fit in the space of one Intel Haswell CPU core!
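The "100 cores" claim can be checked directly against the per-core areas on this slide. The sketch below is a back-of-envelope division only; the dictionary and function names are hypothetical, and the figures are the slide's own.

```python
# Per-core silicon area in mm^2, as listed on the slide.
CORE_AREA_MM2 = {
    "Intel Haswell": 14.5,
    "Intel Atom": 5.6,
    "AMD Jaguar": 3.1,
    "ARM A15": 1.62,
    "ARM A7": 0.45,
    "Epiphany": 0.13,
}


def epiphany_cores_per(core_name: str) -> float:
    """How many Epiphany cores fit in the area of one core of the named design."""
    return CORE_AREA_MM2[core_name] / CORE_AREA_MM2["Epiphany"]


for name in CORE_AREA_MM2:
    print(f"{name}: {epiphany_cores_per(name):.1f} Epiphany cores")
```

By this ratio a Haswell core covers the area of a little over 110 Epiphany cores, so the slide's figure of 100 is a rounded-down estimate.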

Page 11: Why Simplicity Matters: A Hardware Perspective

There is Plenty of Room at the Bottom

“There is STILL plenty of room at the bottom”

Tianhe-2
33 PFLOPS
$390M
24 MW
Insanity!

33 PFLOPS =~ 16 28nm Epiphany wafers

● Moving one electron at VDDMIN:
  ● Emin = Q*VDD/2 = q * 2(ln 2)kT / 2q = kT ln(2)
  ● At 300K, Emin = 0.29e-20 J

● Minimum sized CMOS inverter at 28nm
  ● E = C*VDD^2 =~ 0.2e-15 J, 5 orders of magnitude larger!
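The two energy figures on this slide can be reproduced numerically. The sketch below evaluates Emin = kT ln(2) at 300 K and compares it with the slide's 0.2e-15 J inverter energy. The Boltzmann constant is the exact SI value; the function and variable names are assumptions for illustration.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact SI value of k


def min_switching_energy_joules(temp_kelvin: float) -> float:
    """Emin = kT ln(2), the per-electron limit quoted on the slide."""
    return BOLTZMANN_J_PER_K * temp_kelvin * math.log(2)


e_min = min_switching_energy_joules(300.0)
e_inverter = 0.2e-15  # slide's estimate for a minimum-sized 28nm inverter, in joules
orders_of_magnitude = math.log10(e_inverter / e_min)

print(f"Emin at 300 K: {e_min:.3e} J")
print(f"inverter / Emin: about 10^{orders_of_magnitude:.1f}")
```

The computed gap comes out just under five orders of magnitude, consistent with the slide's rounded claim.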

Page 12: Why Simplicity Matters: A Hardware Perspective

Yes...but does it work?

● 25x over GPU and CPU on 'bcrypt' (OpenWall, Russia)

● 25x over Intel Xeon on FFTs/DSP (Ericsson)

● 25x over Intel Xeon in HPC application by UK customer

● 85% of peak performance by students at ANU

Page 13: Why Simplicity Matters: A Hardware Perspective

The Parallella Project

● An open parallel computing platform

● Launched in 2012 at $99

● Open source SW/HW!

● Dual-core ARM A9 processor

● FPGA logic

● 1GB RAM, USB, HDMI, GigE

● 16/64 Epiphany coprocessors

● 50 Gbit/sec IO, 25/100 GFLOPS

● 10,000 shipped (20,000 built)

● 200 Universities

Page 14: Why Simplicity Matters: A Hardware Perspective

Some perspective...

● 1993 CM-5
  ● 1024 processors
  ● 136 GFLOPS/100 kW
  ● #1 in 1993 Top500 List
  ● Price: ~$30M

● 2014 Parallella-64
  ● 66 processors
  ● 100 GFLOPS/5W
  ● #1 in energy efficiency
  ● Price: $199*
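To put the comparison in common units, the sketch below derives GFLOPS per watt and dollars per GFLOPS from the slide's own numbers. The `Machine` class and its field names are hypothetical, and the prices are the slide's approximate figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    name: str
    gflops: float
    watts: float
    price_usd: float

    @property
    def gflops_per_watt(self) -> float:
        return self.gflops / self.watts

    @property
    def usd_per_gflops(self) -> float:
        return self.price_usd / self.gflops


# Figures as quoted on the slide.
cm5 = Machine("CM-5 (1993)", gflops=136.0, watts=100_000.0, price_usd=30_000_000.0)
parallella = Machine("Parallella-64 (2014)", gflops=100.0, watts=5.0, price_usd=199.0)

for m in (cm5, parallella):
    print(f"{m.name}: {m.gflops_per_watt:.5f} GFLOPS/W, ${m.usd_per_gflops:,.2f} per GFLOPS")

efficiency_gain = parallella.gflops_per_watt / cm5.gflops_per_watt
print(f"energy-efficiency gain: about {efficiency_gain:,.0f}x")
```

On these figures the energy efficiency improves by roughly four orders of magnitude over about two decades, and the cost per GFLOPS by roughly five.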

Page 15: Why Simplicity Matters: A Hardware Perspective

The Epiphany Roadmap

[Figure: roadmap timeline with axis 2013, 2015, 2016, 2018 and the following entries]

● 16K-64K CPUs, 1MB/core (3D), ~20 TFLOPS, 0.2W-20W
● 64 CPUs, 32KB/core, 100 GFLOPS, 0.1W-2W
● 1024 CPUs, 64KB/core, 2 TFLOPS, 0.2W-10W
● 1K CPUs, 64KB/core, 2 TFLOPS, 1W-40W

Roadmap anchored by our 28nm 64-core chip data

Page 16: Why Simplicity Matters: A Hardware Perspective

A 1024 core strawman processor
● 1024 CPU cores
● 1GHz operation
● 64 MB local memory
● 2 TFLOPS performance
● 32 TB/s local memory BW
● 1 Tera-messages/sec
● 2.5 Tbps IO
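Dividing the strawman's chip-level totals by its core count shows what each core would need to sustain. This is a sketch under stated assumptions: "MB" is read as 2^20 bytes and "T" as 10^12, and all variable names are hypothetical.

```python
# Chip-level figures from the slide (unit interpretations are assumptions).
CORES = 1024
CLOCK_HZ = 1e9                 # 1GHz operation
LOCAL_MEM_BYTES = 64 * 2**20   # 64 MB, read as binary megabytes
PEAK_FLOPS = 2e12              # 2 TFLOPS
LOCAL_MEM_BW_BYTES = 32e12     # 32 TB/s
MESSAGES_PER_SEC = 1e12        # 1 Tera-messages/sec

per_core = {
    "memory_kib": LOCAL_MEM_BYTES / CORES / 1024,
    "flops_per_cycle": PEAK_FLOPS / CORES / CLOCK_HZ,
    "mem_bytes_per_cycle": LOCAL_MEM_BW_BYTES / CORES / CLOCK_HZ,
    "messages_per_cycle": MESSAGES_PER_SEC / CORES / CLOCK_HZ,
}

for key, value in per_core.items():
    print(f"{key}: {value:.2f}")
```

Under these readings each core gets 64 KiB of memory and needs to sustain about 2 FLOPs, about 32 bytes of local memory traffic, and about one message per clock cycle.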

Page 17: Why Simplicity Matters: A Hardware Perspective

Programming a 1024 core processor

[Figure labels: Faulty cores; Cooperating Program (messages)]

NODE:
● 64KB
● 1GHz RISC Core
● 2 GFLOPS/core

Physical Constraints:
● 1.5ns/hop latency
● 10pJ / FLOP
● 30pJ / off chip read/write
● 10pJ / on chip end2end
(give or take....)
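These physical constraints translate into simple budgets a programmer can reason about. The sketch below assumes the 1024 cores are arranged as a 32x32 2D mesh and that messages travel a Manhattan (row-then-column) path; both are assumptions for illustration, as are all function names. The per-hop latency and per-FLOP energy are the slide's figures.

```python
HOP_LATENCY_NS = 1.5        # from the slide
ENERGY_PER_FLOP_PJ = 10.0   # from the slide
MESH_SIDE = 32              # assumed: 1024 cores laid out as a 32x32 mesh


def mesh_hops(src: tuple[int, int], dst: tuple[int, int]) -> int:
    """Hop count for an assumed Manhattan route between two (row, col) cores."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def mesh_latency_ns(src: tuple[int, int], dst: tuple[int, int]) -> float:
    """One-way message latency implied by the per-hop figure."""
    return mesh_hops(src, dst) * HOP_LATENCY_NS


def compute_power_watts(flops_per_sec: float) -> float:
    """Power spent on arithmetic alone at the quoted energy per FLOP."""
    return flops_per_sec * ENERGY_PER_FLOP_PJ * 1e-12


corner_to_corner = mesh_latency_ns((0, 0), (MESH_SIDE - 1, MESH_SIDE - 1))
neighbor = mesh_latency_ns((0, 0), (0, 1))
peak_power = compute_power_watts(2e12)  # 1024 cores x 2 GFLOPS/core, rounded to 2 TFLOPS

print(f"neighbor: {neighbor} ns, corner to corner: {corner_to_corner} ns")
print(f"arithmetic power at 2 TFLOPS: {peak_power:.1f} W")
```

Under these assumptions a message between opposite corners costs 62 hops, so placing communicating tasks on nearby cores matters for latency as well as energy.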

Page 18: Why Simplicity Matters: A Hardware Perspective

The future of programming is HARD

● Minimize code size
● Minimize data movement and communication
● Minimize energy
● Minimize heat density
● Minimize failures
● Minimize congestion

....AND MINIMIZE EXECUTION TIME

Page 19: Why Simplicity Matters: A Hardware Perspective

Parallel Computing Needs Your Help!

● Parallel Standard Libraries:
  – It's about time we have open libraries with parallelism built in
  – “PAL” (github.com/parallella/pal)

● VMs:
  – When will we have Erlang running on Epiphany?

● Spread the word...
  – Parallel programming can be easy
  – Show, don't tell.

● A Standard Language?
  – We need a “C/JAVA/BASIC/PYTHON” of parallel computing?