Braxton McKee, Founder & CEO, Ufora at MLConf SF - 11/13/15
TRANSCRIPT
Ufora @ MLConf
Braxton McKee, CEO & Founder
Why should I have to write a different program for 1000 rows or 1 billion?
Our Vision: Simplified Distributed Computing
• Using lots of machines should be as easy as using one.
• Enable scalable, fast machine learning and data processing.
• Parallelism should be natural and come from the language itself.
I want to treat the cloud like it’s one big, fast desktop.
What is Ufora?
Auto-parallel, compiled, multi-host Python
Key Components
• JIT compiled
• Implicit parallelism at the language level
• Fault tolerant
• Automatic co-location of data and compute
We are now open source!
• 5 years of work by ~5 engineers
• ~350k lines of code
• Apache 2.0 license
• Hosted on GitHub
Sound Familiar?
• Similar approach to JIT compilation
• Scalable, but without frameworks like MapReduce
• A package that works easily with your existing Python workflow
How do I use it?

Install the client:

    pip install pyfora

Get some workers:

    pyfora_aws start … --num-instances 4
or
    docker run … ufora/service

In your Python program:

    import pyfora
    ufora = pyfora.connect('http://<ip_address>:30000')
    with ufora.remotely:
        # your code here
How do I use it?

    def isPrime(p):
        if p < 2: return 0
        x = 2
        while x*x <= p:
            if p % x == 0: return 0
            x = x + 1
        return 1

    result = sum(isPrime(x) for x in xrange(100 * 1000 * 1000))

~1 hour
How do I use it?

    def isPrime(p):
        if p < 2: return 0
        x = 2
        while x*x <= p:
            if p % x == 0: return 0
            x = x + 1
        return 1

    with ufora.remotely:
        result = sum(isPrime(x) for x in xrange(100 * 1000 * 1000))

~10 secs
What do you give up?
• No mutability of data structures
• No side effects
• No nondeterminism
• Emphasizes a “functional” programming style
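These constraints are what make safe parallelism and replication possible. As a rough illustration of the style contrast (a sketch, not pyfora's exact rules), here is the same computation written with and without mutation:

```python
# Imperative style relying on in-place mutation -- the kind of code
# the "no mutability" constraint rules out:
def squares_mutating(n):
    out = []
    for i in range(n):
        out.append(i * i)   # side effect: mutates `out`
    return out

# Functional style with no mutation or side effects -- the shape of
# code that a runtime can split and replicate freely:
def squares_functional(n):
    return [i * i for i in range(n)]

print(squares_functional(5))  # [0, 1, 4, 9, 16]
```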
Architecture

[Diagram: PyFora Client connects to a Gateway Node, which fronts the Worker Nodes; workers read and write S3/HDFS]
    def filter(v, f):
        if len(v) == 0:
            return []
        if len(v) == 1:
            return v if f(v[0]) else []
        mid = len(v) / 2
        return filter(v[:mid], f) + filter(v[mid:], f)

    primes = filter(range(100000000), isPrime)

Naturally parallel (divide and conquer)
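Because the two recursive halves share no state, a runtime is free to evaluate them concurrently. A minimal local sketch of that decomposition, using only the Python standard library rather than Ufora's scheduler (threads are used for brevity; CPython threads won't actually speed up pure-Python CPU work, but the order-preserving chunked split is the point):

```python
from concurrent.futures import ThreadPoolExecutor

def isPrime(p):
    if p < 2: return 0
    x = 2
    while x * x <= p:
        if p % x == 0: return 0
        x = x + 1
    return 1

def parallel_filter(v, f, workers=4):
    # Split v into contiguous chunks, one per worker, filter each chunk
    # concurrently, and concatenate in order -- the same result the
    # recursive divide-and-conquer filter produces.
    n = len(v)
    bounds = [n * i // workers for i in range(workers + 1)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(
            lambda k: [x for x in v[bounds[k]:bounds[k + 1]] if f(x)],
            range(workers)))
    return [x for part in parts for x in part]

print(parallel_filter(list(range(50)), isPrime))  # primes below 50
```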
Implicit Parallelism

[Diagram: filter(v, isPrime) over 100M integers; splitting divides the vector into 0–50M and 50M–100M, then again into 0–25M, 25M–50M, 50M–75M, and 75M–100M across CORE #1 through CORE #4]
Adaptive Parallelism

How do we know where to put the data?
Answer: react dynamically as the program runs.
• Watch running threads to see which blocks of data they’re accessing.
• Move threads to data, or data to threads, depending on which is cheaper.
• Detect when two blocks of data absolutely have to be on the same machine.
• Build a statistical model of correlations between block accesses.
• Place data to minimize the expected number of future machine-boundary crossings.
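A toy sketch of those last two ideas (hypothetical code, not Ufora's actual scheduler): count which blocks running threads touch together, then greedily place strongly co-accessed blocks on the same machine.

```python
from collections import Counter
from itertools import combinations

def co_access_counts(thread_traces):
    # thread_traces: one set of block ids per observed thread.
    # Count how often each pair of blocks is accessed together.
    counts = Counter()
    for blocks in thread_traces:
        for a, b in combinations(sorted(blocks), 2):
            counts[(a, b)] += 1
    return counts

def greedy_placement(blocks, counts, machines, capacity):
    # Assign blocks to machines, most-correlated pairs first, so that
    # expected cross-machine accesses are kept low.
    placement = {}
    load = {m: 0 for m in machines}
    for (a, b), _ in counts.most_common():
        for blk in (a, b):
            if blk not in placement:
                # Prefer the machine already holding the partner block.
                partner = b if blk == a else a
                target = placement.get(partner)
                if target is None or load[target] >= capacity:
                    target = min(machines, key=lambda m: load[m])
                placement[blk] = target
                load[target] += 1
    for blk in blocks:  # blocks never co-accessed go wherever there is room
        if blk not in placement:
            target = min(machines, key=lambda m: load[m])
            placement[blk] = target
            load[target] += 1
    return placement

traces = [{1, 2}, {1, 2}, {3, 4}, {2, 3}]
placement = greedy_placement([1, 2, 3, 4], co_access_counts(traces), ['m1', 'm2'], capacity=2)
print(placement)  # blocks 1,2 land together; blocks 3,4 land together
```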
A simple example

    v = range(0, 2*10**9)

[Diagram: the vector is split into 16 blocks of data (red boxes), four per machine, across Machines 1–4]
[Diagram: blocks 1–16 across Machines 1–4]

    for x in v:
        state = f(state, x)

Computation starts on Machine 1. When the computation exhausts the data on one machine, the runtime moves it to the next.
But real access patterns are more complex!

The user writes:

    res = 0
    def f(x, y):
        # some function
    for i in xrange(0, len(v) - 10):
        res = res + f(v[i], v[i+10])

Now the computation is looking at all pairs v[i] and v[i+10].
[Diagram: blocks 1–16 across Machines 1–4]

At first, everything is OK, since v[i] and v[i+10] are close to each other in the data. But when the computation reaches the end of block 4, v[i] and v[i+10] aren’t on the same machine: v[i] sits in block 4 on Machine 1 while v[i+10] sits in block 5 on Machine 2. Every time we have to move the computation, we’re hitting the network. This is really slow!
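The cost is easy to quantify in a toy model. Assuming (hypothetically) blocks of B contiguous elements, the pair (v[i], v[i+10]) straddles a block boundary exactly when the two indices fall into different blocks, which happens for 10 of every B positions:

```python
def boundary_crossings(n, offset, block_size):
    # Count pairs (i, i+offset) whose two indices fall into different
    # fixed-size blocks -- each such pair forces a cross-machine access
    # in the naive placement described above.
    return sum(1 for i in range(n - offset)
               if i // block_size != (i + offset) // block_size)

# 1,000 elements in blocks of 100: the last 10 positions of each of the
# first 9 blocks cross into the next block.
print(boundary_crossings(1000, 10, 100))  # 90
```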
Solution: replicate blocks so that they overlap.

[Diagram: blocks 5, 9, and 13 are also replicated onto the preceding machine, so blocks at machine boundaries overlap]

Data can live on two different machines at the same time because it’s immutable!
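A toy sketch of the overlap idea (block and overlap sizes are illustrative, not Ufora's): give each shard a copy of the first few elements of the next block, and every pair v[i], v[i+10] becomes local to some single shard.

```python
def build_shards(v, block_size, overlap):
    # Each shard holds its own block plus a replicated copy of the first
    # `overlap` elements of the next block.
    shards = []
    for start in range(0, len(v), block_size):
        shards.append((start, v[start:start + block_size + overlap]))
    return shards

def pair_is_local(shards, i, offset):
    # True if some single shard contains both v[i] and v[i+offset].
    return any(start <= i and i + offset < start + len(data)
               for start, data in shards)

v = list(range(1000))
shards = build_shards(v, block_size=100, overlap=10)
# With a 10-element overlap, no pair (v[i], v[i+10]) ever spans two shards:
print(all(pair_is_local(shards, i, 10) for i in range(len(v) - 10)))  # True
```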
Project Roadmap: Current Version (0.1)
• Coverage of the core Python 2.7 language
• Run locally (using Docker) or in AWS
• import pyfora and go!
Project Roadmap: Upcoming Release (0.2)
• Core numpy and dataframe implementations (in Python)
• Coverage for some core scikit-learn data science algorithms (GBM, regressions, etc.)
• Better error handling, lots of bug fixes
Project Roadmap: The Future
• Python 3 support
• Execution of arbitrary Python code out-of-process (for non-pure code we don't want to port)
• A more generic model for import/export of data from the cluster
• Better feedback in the pyfora API for tracking the progress of computations
• Support for running calculations on GPUs
Ufora is Auto-Parallel, Multi-Host Python
• Star/fork the repo: github.com/ufora/ufora
• Contribute to the codebase
• Find me after this presentation
• Tell us what we should build next; this affects our priorities!
• Email me: [email protected]