A Perspective on the Limits of Computation Oskar Mencer May 2012


Page 1: A Perspective on the Limits of Computation Oskar Mencer May 2012

A Perspective on the Limits of Computation

Oskar Mencer

May 2012

Page 2: A Perspective on the Limits of Computation Oskar Mencer May 2012

Limits of Computation

Objective: Maximum Performance Computing (MPC)

What is the fastest we can compute desired results?

Conjecture:

Data movement is the real limit on computation.

Page 3: A Perspective on the Limits of Computation Oskar Mencer May 2012

Maximum Performance Computing (MPC)

The journey will take us through:

1. Information Theory: Kolmogorov Complexity

2. Optimised Arithmetic: Winograd Bounds

3. Optimisation via Kahneman and Von Neumann

4. Real World Dataflow Implications and Results

Less Data Movement = Less Data + Less Movement

Page 4: A Perspective on the Limits of Computation Oskar Mencer May 2012

Kolmogorov Complexity (K)

Definition (Kolmogorov): “If a description of string s, d(s), is of minimal length, […] it is called a minimal description of s. Then the length of d(s), […] is the Kolmogorov complexity of s, written K(s), where K(s) = |d(s)|”

Of course, K(s) depends heavily on the language L in which the description is written (e.g. Java, Esperanto, an executable file, etc.).

Kolmogorov, A.N. (1965). "Three Approaches to the Quantitative Definition of Information". Problems Inform. Transmission 1 (1): 1–7.

Page 5: A Perspective on the Limits of Computation Oskar Mencer May 2012

A Maximum Performance Computing Theorem

For a computational task f, computing the result r given inputs i, i.e. task f: r = f(i), or as a diagram:

i -> [ f ] -> r

Assuming infinite capacity to compute and remember inside box f, the time T to compute task f depends only on moving the data in and out of the box.

Thus, for a machine f with infinite memory and infinitely fast arithmetic, the Kolmogorov complexity K(i+r) defines the fastest way to compute task f.

Page 6: A Perspective on the Limits of Computation Oskar Mencer May 2012

The representation K(σ,F) of the state σ,F is critical!

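SABR model:

\[ dF_t = \sigma_t F_t^{\beta} \, dW_t \]

\[ d\sigma_t = \alpha \sigma_t \, dZ_t \]

\[ \langle dW_t, dZ_t \rangle = \rho \, dt \]

We integrate in time (Euler in log-forward, Milstein in volatility):

\[ \ln F_{t+1} = \ln F_t - \tfrac{1}{2} \left( \sigma_t \exp((\beta - 1) \ln F_t) \right)^2 dt + \sigma_t \exp((\beta - 1) \ln F_t) \, \Delta W_t \]

\[ \sigma_{t+1} = \sigma_t + \alpha \sigma_t \, \Delta Z_t + \tfrac{1}{2} \alpha^2 \sigma_t \left( (\Delta Z_t)^2 - dt \right) \]

[Diagram: logic block with state σ, F]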
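A minimal sketch of the time-stepping scheme named on this slide (Euler in the log-forward, Milstein in the volatility), assuming the standard SABR parameters: alpha for the volatility of volatility, beta for the exponent and rho for the correlation. The function names `sabr_step` and `simulate_path` are hypothetical and not taken from the talk. The point it illustrates is that each path carries only two numbers of state (σ, F) from one step to the next.

```python
import math
import random


def sabr_step(log_f, sigma, alpha, beta, rho, dt, rng):
    """Advance the state (ln F, sigma) by one time step of size dt."""
    # Two correlated Brownian increments with correlation rho.
    n1 = rng.gauss(0.0, 1.0)
    n2 = rng.gauss(0.0, 1.0)
    dw = math.sqrt(dt) * n1
    dz = math.sqrt(dt) * (rho * n1 + math.sqrt(1.0 - rho * rho) * n2)

    # Euler step in the log-forward; local_vol = sigma * F**(beta - 1).
    local_vol = sigma * math.exp((beta - 1.0) * log_f)
    new_log_f = log_f - 0.5 * local_vol * local_vol * dt + local_vol * dw

    # Milstein step in the volatility.
    new_sigma = (sigma + alpha * sigma * dz
                 + 0.5 * alpha * alpha * sigma * (dz * dz - dt))
    return new_log_f, new_sigma


def simulate_path(f0, sigma0, alpha, beta, rho, t_end, n_steps, seed=0):
    """Simulate one path and return the final (F, sigma)."""
    rng = random.Random(seed)
    dt = t_end / n_steps
    log_f, sigma = math.log(f0), sigma0
    for _ in range(n_steps):
        log_f, sigma = sabr_step(log_f, sigma, alpha, beta, rho, dt, rng)
    return math.exp(log_f), sigma


if __name__ == "__main__":
    # Illustrative parameter values only.
    print(simulate_path(100.0, 0.2, 0.4, 0.7, -0.3, 1.0, 252))
```

Working in ln F keeps the forward positive by construction, which is one reason to choose this representation of the state.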

Page 7: A Perspective on the Limits of Computation Oskar Mencer May 2012

MPC – Bad News

1. Real computers have neither infinite memory nor infinitely fast arithmetic units.

2. Kolmogorov Theorem: K is not a computable function.

MPC – Good News

Today’s arithmetic units are fast enough.

So in practice...

Kolmogorov Complexity => Discretisation & Compression

=> MPC depends on the Representation of the Problem.
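Since K itself is not computable, one practical stand-in is the length of a compressed encoding, which is a computable upper bound on the description length for a fixed decompressor. The sketch below uses Python's standard `zlib` module for this; the helper name `description_length` and the two test strings are illustrative assumptions, not part of the talk.

```python
import random
import zlib


def description_length(data: bytes) -> int:
    """Length in bytes of a zlib-compressed encoding of data.

    This is only an upper bound on description length under one particular
    "language" (the zlib decompressor), not Kolmogorov complexity itself.
    """
    return len(zlib.compress(data, 9))


# A highly regular string: a short program ("repeat 'ab' 5000 times") describes it.
regular = b"ab" * 5000

# Pseudo-random bytes of the same length: no short description is apparent to zlib.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(10000))

if __name__ == "__main__":
    print(len(regular), description_length(regular))
    print(len(noisy), description_length(noisy))
```

Both inputs are 10,000 bytes long, but the regular one needs far fewer bytes to be moved once it is represented well, which is the sense in which representation drives data movement.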

Page 8: A Perspective on the Limits of Computation Oskar Mencer May 2012

Euclid's Elements: Representing a² + b² = c²

Page 9: A Perspective on the Limits of Computation Oskar Mencer May 2012

17 × 24 = ?

Page 10: A Perspective on the Limits of Computation Oskar Mencer May 2012

Thinking Fast and Slow

Daniel Kahneman, Nobel Prize in Economics, 2002

back to 17 × 24

Kahneman splits thinking into:

System 1: fast, hard to control ... 400

System 2: slow, easier to control ... 408

Page 11: A Perspective on the Limits of Computation Oskar Mencer May 2012

Remembering Fast and Slow

John von Neumann, 1946:

“We are forced to recognize the possibility of constructing a hierarchy of memories, each of which has greater capacity than the preceding, but which is less quickly accessible.”

Page 12: A Perspective on the Limits of Computation Oskar Mencer May 2012

Consider Computation and Memory Together

Computing f(x) in the range [a,b] with |E| ≤ 2⁻ⁿ

Spectrum of implementations:
- Table
- Table + Arithmetic (table and +,-,×,÷)
- Arithmetic (+,-,×,÷)

Choices along the way:
- uniform vs non-uniform
- number of table entries
- how many coefficients
- polynomial or rational approx
- continued fractions
- multi-partite tables

Underlying hardware/technology changes the optimum
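The trade-off between table size and arithmetic can be made concrete with a small experiment. The sketch below is an illustration under stated assumptions: it takes f = sin on [0, π/2], compares a pure nearest-entry table against a table plus linear interpolation, and measures the error on a sample grid rather than proving a bound. All function names are hypothetical.

```python
import math


def build_table(f, a, b, entries):
    """Sample f at `entries` uniformly spaced points covering [a, b]."""
    step = (b - a) / (entries - 1)
    return [f(a + k * step) for k in range(entries)], step


def table_only(table, step, a, x):
    """Pure lookup: return the nearest stored sample, no arithmetic on values."""
    k = min(max(round((x - a) / step), 0), len(table) - 1)
    return table[k]


def table_plus_arithmetic(table, step, a, x):
    """Lookup plus one subtract, one multiply, one add (linear interpolation)."""
    pos = (x - a) / step
    k = min(max(int(pos), 0), len(table) - 2)
    return table[k] + (pos - k) * (table[k + 1] - table[k])


def max_error(method, f, a, b, entries, samples=2001):
    """Largest |approximation - f| observed on a uniform sample grid."""
    table, step = build_table(f, a, b, entries)
    worst = 0.0
    for j in range(samples):
        x = a + (b - a) * j / (samples - 1)
        worst = max(worst, abs(method(table, step, a, x) - f(x)))
    return worst


def entries_needed(method, f, a, b, n_bits):
    """Smallest power-of-two table size whose sampled error is <= 2**-n_bits."""
    entries = 2
    while max_error(method, f, a, b, entries) > 2.0 ** -n_bits:
        entries *= 2
    return entries


if __name__ == "__main__":
    for method in (table_only, table_plus_arithmetic):
        size = entries_needed(method, math.sin, 0.0, math.pi / 2, 10)
        print(method.__name__, size)
```

Adding a few arithmetic operations per lookup shrinks the table needed for the same accuracy, so which point on the spectrum is best depends on the relative cost of memory and arithmetic in the target hardware.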

Page 13: A Perspective on the Limits of Computation Oskar Mencer May 2012

MPC in Practice

Trade off Representation, Memory and Arithmetic

Page 14: A Perspective on the Limits of Computation Oskar Mencer May 2012

Limits on Computing + and ×

Shmuel Winograd, 1965

Bounds on Addition
- Binary: O(log n)
- Residue Number System: O(log 2 log α(N))
- Redundant Number System: O(1)

Bounds on Multiplication
- Binary: O(log n)
- Residue Number System: O(log 2 log β(N))
- Using Tables: O(2[log n/2]+2+[log 2n/2])
- Logarithmic Number System: O(Addition)

However, Binary and Log numbers are easy to compare, others are not!
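A small sketch of why residue numbers are fast to add and multiply but awkward to compare. It assumes the illustrative moduli 7, 11 and 13 (pairwise coprime, giving a range of 1001); the function names are hypothetical. Addition and multiplication act on each residue independently, with no carries between digits, while comparing two values requires converting back through the Chinese Remainder Theorem. The modular inverse uses `pow(x, -1, m)`, which needs Python 3.8 or later.

```python
MODULI = (7, 11, 13)       # pairwise coprime, chosen only for illustration
RANGE = 7 * 11 * 13        # values 0..1000 are representable


def to_rns(x):
    """Represent x by its residues modulo each modulus."""
    return tuple(x % m for m in MODULI)


def rns_add(u, v):
    """Digit-wise addition: no carry crosses from one residue to another."""
    return tuple((a + b) % m for a, b, m in zip(u, v, MODULI))


def rns_mul(u, v):
    """Digit-wise multiplication, equally carry-free across residues."""
    return tuple((a * b) % m for a, b, m in zip(u, v, MODULI))


def from_rns(residues):
    """Recover the integer via the Chinese Remainder Theorem."""
    total = 0
    for r, m in zip(residues, MODULI):
        partial = RANGE // m
        total += r * partial * pow(partial, -1, m)
    return total % RANGE


def rns_less_than(u, v):
    """Comparison has no digit-wise shortcut here: convert back first."""
    return from_rns(u) < from_rns(v)


if __name__ == "__main__":
    print(from_rns(rns_mul(to_rns(17), to_rns(24))))
    print(rns_less_than(to_rns(400), to_rns(408)))
```

The cheap operations are only useful if the rest of the program does not need the expensive ones, which is the argument for optimising whole programs rather than single operators.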

Lesson: If you optimise only a small piece of the computation, the result is useless in practice => need to optimise ENTIRE programs.

Or in other words: abstraction kills performance.

Page 15: A Perspective on the Limits of Computation Oskar Mencer May 2012

Addition in O(1)

Redundant: 2 bits represent 1 binary digit => use counters to reduce the input

(3,2) counters reduce three numbers (a, b, c) to two numbers (out1, out2) so that a + b + c = out1 + out2

[Diagram: (3,2) counter with inputs a, b, c and outputs out1, out2]
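The (3,2) counter can be sketched with bitwise operations on non-negative integers. Each output bit depends only on the three input bits in the same position, so the reduction step has constant depth regardless of word length; only the final addition of the two remaining numbers propagates carries. The names `counter_3_2` and `add_many` are hypothetical.

```python
def counter_3_2(a, b, c):
    """Reduce three numbers to two with a + b + c == out1 + out2."""
    out1 = a ^ b ^ c                              # per-bit sum
    out2 = ((a & b) | (a & c) | (b & c)) << 1     # per-bit carry, moved up one place
    return out1, out2


def add_many(values):
    """Sum non-negative integers using (3,2) counters, then one ordinary add."""
    values = list(values)
    while len(values) > 2:
        a, b, c = values.pop(), values.pop(), values.pop()
        values.extend(counter_3_2(a, b, c))
    return sum(values)   # the only carry-propagating addition


if __name__ == "__main__":
    print(counter_3_2(5, 9, 14))
    print(add_many([5, 9, 14, 3]))
```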

Page 16: A Perspective on the Limits of Computation Oskar Mencer May 2012

From Theory to Practice

Optimise Whole Programs

Bit Level Representation

Storage

Processor

Discretisation

Iteration

Method

Customise Numerics

Customise Architecture

Page 17: A Perspective on the Limits of Computation Oskar Mencer May 2012

Mission Impossible?

Page 18: A Perspective on the Limits of Computation Oskar Mencer May 2012

Maximum Performance Computing (MPC)

The journey will take us through:

1. Information Theory: Kolmogorov Complexity

2. Optimised Arithmetic: Winograd Bounds

3. Optimisation via Kahneman and Von Neumann

4. Real World Dataflow Implications and Results

Less Data Movement = Less Data + Less Movement

Page 19: A Perspective on the Limits of Computation Oskar Mencer May 2012

Optimise Whole Programs with Finite Resources

SYSTEM 1: x86 cores

SYSTEM 2: flexible memory + logic

Low Latency Memory

High Throughput Memory

Balance Computation and Memory

Page 20: A Perspective on the Limits of Computation Oskar Mencer May 2012

The Ideal System 2 is a Production Line

SYSTEM 1: x86 cores

SYSTEM 2: flexible memory + logic

Low Latency Memory

High Throughput Memory

Balance Computation and Memory

Page 21: A Perspective on the Limits of Computation Oskar Mencer May 2012

8 Maxeler DFEs replacing 1,900 Intel CPU cores

[Chart: equivalent CPU cores (0 to 2,000) versus number of MAX2 cards (1, 4, 8), with one series each for 15Hz, 30Hz, 45Hz and 70Hz peak frequency]

presented by ENI at the Annual SEG Conference, 2010

Compared to 32 3GHz x86 cores parallelized using MPI

100kWatts of Intel cores => 1kWatt of Maxeler Dataflow Engines

Page 22: A Perspective on the Limits of Computation Oskar Mencer May 2012

Given matrix A, vector b, find vector x in Ax = b.

Example: Sparse Matrix Computations

O. Lindtjorn et al, HotChips 2010

[Chart: speedup per 1U node (0 to 60) versus compression ratio (0 to 10), with series GREE0A and 1new01]

Domain Specific Address and Data Encoding (*Patent Pending)

MAXELER SOLUTION: 20-40x in 1U

DOES NOT SCALE BEYOND SIX x86 CPU CORES

Page 23: A Perspective on the Limits of Computation Oskar Mencer May 2012

• Compute value and risk of complex credit derivatives.

• Moving overnight run to real-time intraday

• Reported speedup: 220-270x (8 hours => 2 minutes)

• Power consumption per node drops from 250W to 235W

Example: JP Morgan Derivatives Pricing

O Mencer, S Weston, Journal on Concurrency and Computation, July 2011.

See JP Morgan talk at Stanford on Youtube, search “weston maxeler”

Page 24: A Perspective on the Limits of Computation Oskar Mencer May 2012

Maxeler Loop Flow Graphs for JP Morgan Credit Derivatives

Whole Program Transformation Options

Page 25: A Perspective on the Limits of Computation Oskar Mencer May 2012

Maxeler Data Flow Graph for JP Morgan Interest Rates Monte Carlo Acceleration

Page 26: A Perspective on the Limits of Computation Oskar Mencer May 2012

Example: data flow graph generated by MaxCompiler

4866 static dataflow cores in 1 chip

Page 27: A Perspective on the Limits of Computation Oskar Mencer May 2012

Maxeler Dataflow Engines (DFEs)

High Density DFEs: Intel Xeon CPU cores and up to 6 DFEs with 288GB of RAM

The Dataflow Appliance: Dense compute with 8 DFEs, 384GB of RAM and dynamic allocation of DFEs to CPU servers with zero-copy RDMA access

The Low Latency Appliance: Intel Xeon CPUs and 1-2 DFEs with direct links to up to six 10Gbit Ethernet connections

MaxWorkstation: Desktop dataflow development system

Dataflow Engines: 48GB DDR3, high-speed connectivity and dense configurable logic

MaxRack: 10, 20 or 40 node rack systems integrating compute, networking & storage

MaxCloud: Hosted, on-demand, scalable accelerated compute