braxton mckee, founder & ceo, ufora at mlconf sf - 11/13/15

Upload: mlconf

Post on 26-Jan-2017

TRANSCRIPT

Page 1: Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15

Ufora @ MLConf

Braxton McKee, CEO & Founder

Page 2:

Why should I have to write a different program for 1000 rows or 1 billion?

Page 3:

Our Vision: Simplified Distributed Computing
• Using lots of machines should be as easy as using one.
• Enable scalable, fast machine learning and data processing
• Parallelism should be natural, come from the language itself

I want to treat the cloud like it’s one big, fast desktop.

Page 4:

What is Ufora?

Auto-parallel, compiled, multi-host python

Key Components
• JIT Compiled
• Implicit Parallelism at the language level
• Fault tolerant
• Automatic co-location of data and compute

Page 5:

We are now open source!
• 5 years of work by ~5 engineers
• ~350k lines of code
• Apache 2.0 License
• Hosted on GitHub

Page 6:

Sound Familiar?
• Similar approach to JIT Compilation
• Scalable but without frameworks like MapReduce
• Package that works easily with existing python workflow

Page 7:

How do I use it?

Install the client

    pip install pyfora

Get some workers

    pyfora_aws start … --num-instances 4

or

    docker run … ufora/service

In your python program

    import pyfora
    ufora = pyfora.connect('http://<ip_address>:30000')
    with ufora.remotely:
        #your code here

Page 8:

How do I use it?

    def isPrime(p):
        if p < 2: return 0
        x = 2
        while x*x <= p:
            if p%x == 0: return 0
            x = x + 1
        return 1

    result = sum(isPrime(x) for x in xrange(100 * 1000 * 1000))

~1 hour

Page 9:

How do I use it?

    def isPrime(p):
        if p < 2: return 0
        x = 2
        while x*x <= p:
            if p%x == 0: return 0
            x = x + 1
        return 1

    with ufora.remotely:
        result = sum(isPrime(x) for x in xrange(100 * 1000 * 1000))

~10 secs
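As a quick local sanity check, with no Ufora cluster involved, the same function can be run over a small range. This is only a sketch: it assumes Python 3 (so `range` stands in for `xrange`), and the limit of 100 is an arbitrary choice that keeps the run instant.

```python
def isPrime(p):
    # Trial division, returning 1 or 0 so the results can be summed.
    if p < 2:
        return 0
    x = 2
    while x * x <= p:
        if p % x == 0:
            return 0
        x = x + 1
    return 1

# Count the primes below a small, arbitrary limit.
small_result = sum(isPrime(x) for x in range(100))
print(small_result)  # 25
```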

Page 10:

What do you give up?
• No mutability of data-structures
• No side-effects
• No nondeterminism
• Emphasize “functional” programming style
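To illustrate the "functional" style these restrictions point toward, here is a hedged sketch in plain Python. It makes no pyfora calls and the function names are made up for illustration. The first version grows a list by mutating it in place, which is the kind of pattern the slide says you give up; the second builds a new value from an expression.

```python
def squares_mutating(values):
    # Grows a list by mutating it in place on every iteration.
    out = []
    for v in values:
        out.append(v * v)
    return out

def squares_functional(values):
    # Builds the result in one expression; nothing is modified after creation.
    return [v * v for v in values]

data = [1, 2, 3]
print(squares_mutating(data) == squares_functional(data))  # True
```

Both versions return the same list; only the second avoids in-place mutation.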

Page 11:

Architecture

[Diagram components: PyFora Client, Gateway Node, Worker Nodes, S3/HDFS]

Page 12:

Implicit Parallelism

    def filter(v,f):
        if len(v) == 0:
            return []
        if len(v) == 1:
            return v if f(v[0]) else []
        mid = len(v)//2
        return filter(v[:mid],f) + filter(v[mid:],f)

    primes = filter(range(100000000),isPrime)

Naturally parallel (divide and conquer)
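The two recursive calls share no state, which is what makes the split safe to run in parallel. As an illustration only, the top-level split can be written out by hand. This sketch assumes Python 3 and uses the standard library's `concurrent.futures`, neither of which comes from the talk; `filter_dc` and `filter_two_way` are hypothetical names. Ufora is described as finding this parallelism implicitly, so none of this is its API.

```python
from concurrent.futures import ThreadPoolExecutor

def is_prime(p):
    if p < 2:
        return False
    x = 2
    while x * x <= p:
        if p % x == 0:
            return False
        x += 1
    return True

def filter_dc(v, f):
    # Sequential divide and conquer, same shape as the slide's filter.
    if len(v) == 0:
        return []
    if len(v) == 1:
        return v if f(v[0]) else []
    mid = len(v) // 2
    return filter_dc(v[:mid], f) + filter_dc(v[mid:], f)

def filter_two_way(v, f):
    # Hypothetical helper: submit the two halves as independent tasks.
    mid = len(v) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(filter_dc, v[:mid], f)
        right = pool.submit(filter_dc, v[mid:], f)
        return left.result() + right.result()

primes = filter_two_way(list(range(50)), is_prime)
print(primes)
```

Threads are used here to show the structure of the split, not to demonstrate a speedup.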

Page 13:

Adaptive Parallelism

[Diagram: splitting filter(v, isPrime) over 100M integers]

100M Integers
0 – 50M | 50M – 100M
0 – 25M | 25M – 50M | 50M – 75M | 75M – 100M
CORE #1 | CORE #2 | CORE #3 | CORE #4
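The splitting in the diagram can be sketched as repeated halving of an index range. This is a simplification under stated assumptions: `split_range` is a hypothetical helper, the target number of pieces is fixed up front and must be a power of two, and the adaptive part (deciding at run time when a further split is worthwhile) is not modeled.

```python
def split_range(lo, hi, pieces):
    # Recursively halve [lo, hi) until there are `pieces` sub-ranges.
    # Assumes `pieces` is a power of two.
    if pieces == 1:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return split_range(lo, mid, pieces // 2) + split_range(mid, hi, pieces // 2)

M = 1000 * 1000
chunks = split_range(0, 100 * M, 4)
for core, (lo, hi) in enumerate(chunks, start=1):
    print("CORE #%d: %dM - %dM" % (core, lo // M, hi // M))
```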

Page 14:

How do we know where to put the data?

Page 15:

Answer: React dynamically as the program runs

Watch running threads to see what blocks of data they’re accessing.

Move threads to data, or data to threads, depending on what’s cheaper.

Detect when two blocks of data absolutely have to be on the same machine.

Build a statistical model of correlations between block accesses.

Place data to minimize expected future number of machine boundary crossings.
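Two of the ideas above can be sketched in a few lines: counting which blocks tend to be accessed together, and choosing whether to move a thread or a block. This is a toy model, not Ufora's implementation. The function names, the trace format (one list of block ids per observed thread), and the byte sizes are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def coaccess_counts(traces):
    # traces: one list of block ids per observed thread (assumed format).
    # Counts how often each pair of blocks is touched by the same thread.
    counts = Counter()
    for blocks in traces:
        for a, b in combinations(sorted(set(blocks)), 2):
            counts[(a, b)] += 1
    return counts

def cheaper_move(thread_state_bytes, block_bytes):
    # Send whichever is smaller across the network.
    return "move thread" if thread_state_bytes <= block_bytes else "move data"

traces = [[4, 5], [4, 5], [1, 2], [4, 5, 6]]
counts = coaccess_counts(traces)
print(counts.most_common(1))
print(cheaper_move(thread_state_bytes=10 * 1000, block_bytes=50 * 1000 * 1000))
```

In this toy trace, blocks 4 and 5 are accessed together most often, so a placement step would try to keep them on the same machine.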

Page 16:

A simple example

    v = range(0, 2*10**9)

Machine 1: 1 2 3 4
Machine 2: 5 6 7 8
Machine 3: 9 10 11 12
Machine 4: 13 14 15 16

Red boxes are blocks of data
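The layout above can be written as a mapping from an index in `v` to a block and a machine. This sketch assumes 16 equal-sized blocks assigned to 4 machines in order, which is how the diagram reads; the helper names are hypothetical.

```python
N = 2 * 10**9          # len(v) from the slide
NUM_BLOCKS = 16        # blocks in the diagram
NUM_MACHINES = 4       # machines in the diagram
BLOCK_SIZE = N // NUM_BLOCKS
BLOCKS_PER_MACHINE = NUM_BLOCKS // NUM_MACHINES

def block_of(i):
    # 1-based block number holding v[i], assuming equal-sized blocks.
    return i // BLOCK_SIZE + 1

def machine_of(i):
    # 1-based machine number, assuming blocks are assigned in order.
    return (block_of(i) - 1) // BLOCKS_PER_MACHINE + 1

print(block_of(0), machine_of(0))
print(block_of(N - 1), machine_of(N - 1))
```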

Page 17:

Computation starts on Machine 1

    for x in v:
        state = f(state,x)

Machine 1: 1 2 3 4
Machine 2: 5 6 7 8
Machine 3: 9 10 11 12
Machine 4: 13 14 15 16

When the computation exhausts the data on one machine, the runtime moves it to the next
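A scaled-down simulation shows how rarely a sequential scan has to move. This is a sketch under assumptions: 1600 elements stand in for the 2 billion on the slide, each of 4 machines holds a contiguous quarter, and `count_moves` is a hypothetical helper that counts how often the machine holding the current element changes.

```python
def count_moves(num_elements, elements_per_machine):
    # Walk the data in order and count how often the owning machine changes.
    moves = 0
    current = 0  # computation starts on the first machine (index 0)
    for i in range(num_elements):
        machine = i // elements_per_machine
        if machine != current:
            moves += 1
            current = machine
    return moves

print(count_moves(num_elements=1600, elements_per_machine=400))  # 3
```

With four machines the scan crosses a machine boundary only three times, once per handoff.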

Page 18:

But real access patterns are more complex!

User writes

    res = 0
    def f(x,y):
        # some function
    for i in xrange(0, len(v)-10):
        res = res + f(v[i], v[i+10])

Now the computation is looking at all pairs v[i] and v[i+10]

Page 19:

Machine 1: 1 2 3 4
Machine 2: 5 6 7 8
Machine 3: 9 10 11 12
Machine 4: 13 14 15 16

At first, everything is OK, since v[i] and v[i+10] are close to each other in the data

But when the computation reaches the end of block 4, v[i] and v[i+10] aren’t on the same machine!

Page 20:

Every time we have to move the computation, we’re hitting the network.

Block 4 on Machine 1: v[i]
Block 5 on Machine 2: v[i+10]

This is really slow!
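The cost of the strided pattern can be estimated with a small simulation. This is a toy model under stated assumptions: 1600 elements over 4 machines stand in for the real data, the computation always moves to the machine holding the element it needs next, and `count_moves_strided` is a hypothetical helper.

```python
def count_moves_strided(num_elements, elements_per_machine, stride=10):
    # Each iteration reads v[i] and then v[i + stride]; the computation
    # moves to whichever machine holds the element being read.
    moves = 0
    current = 0  # computation starts on the first machine (index 0)
    for i in range(num_elements - stride):
        for index in (i, i + stride):
            machine = index // elements_per_machine
            if machine != current:
                moves += 1
                current = machine
    return moves

strided_moves = count_moves_strided(num_elements=1600, elements_per_machine=400)
print(strided_moves)
```

Near each machine boundary the computation bounces back and forth on every iteration, so the count is far higher than the three handoffs a purely sequential scan would need in the same setup.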

Page 21:

Solution: Replicate blocks so that they overlap

Machine 1: 1 2 3 4
Machine 2: 5 6 7 8
Machine 3: 9 10 11 12
Machine 4: 13 14 15 16

Replicated blocks: 5, 9, 13

Data can live on two different machines at the same time because it’s immutable!
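The effect of overlapping replicas can be checked with a toy simulation. The assumptions here are mine, not the talk's: 16 blocks of 100 elements, 4 blocks per machine, each machine optionally holding a copy of the next machine's first block, and a computation that moves to an element's primary owner only when its current machine holds no copy of it. All names are hypothetical.

```python
def holds(machine, index, per_block, blocks_per_machine, replicate_next):
    # True if `machine` has a copy of the block containing v[index].
    block = index // per_block
    first = machine * blocks_per_machine
    last = first + blocks_per_machine  # exclusive
    if replicate_next:
        last += 1  # also holds a copy of the next machine's first block
    return first <= block < last

def count_moves_with_replicas(num_elements, per_block, blocks_per_machine,
                              stride, replicate_next):
    moves = 0
    current = 0  # computation starts on the first machine (index 0)
    for i in range(num_elements - stride):
        for index in (i, i + stride):
            if not holds(current, index, per_block, blocks_per_machine,
                         replicate_next):
                moves += 1
                # Move to the primary owner of the element.
                current = index // (per_block * blocks_per_machine)
    return moves

without = count_moves_with_replicas(1600, 100, 4, 10, replicate_next=False)
with_copy = count_moves_with_replicas(1600, 100, 4, 10, replicate_next=True)
print(without, with_copy)
```

In this model the replicated block lets the computation finish its work near a boundary before moving on, so the strided loop needs only one move per machine handoff.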

Page 22:

Project Roadmap: Current Version (0.1)
• Coverage of core python2.7 language.
• Run locally (using docker) or in AWS
• Import pyfora and go!

Page 23:

Project Roadmap: Upcoming Release (0.2)
• Core numpy and dataframe implementations (in python)
• Coverage for some core scikit data science algorithms (gbm, regressions, etc.)
• Better error handling, lots of bugfixes

Page 24:

Project Roadmap: the future
• Python 3 support
• Execution of arbitrary python code out-of-process (for non-pure code we don't want to port)
• More generic model for import/export of data from the cluster.
• Enabling better feedback in the pyfora api for tracking progress of computations.
• Support for running calculations on GPU

Page 25:

Ufora is Auto-Parallel, Multi-Host Python

• Star/fork the repo: github.com/ufora/ufora

• Contribute to the codebase

• Find me after this presentation

• Tell us what we should build next. This affects our priorities!!!

• Email me: [email protected]