Braxton McKee, Founder & CEO, Ufora at MLConf SF - 11/13/15
TRANSCRIPT
Ufora @ MLConf
Braxton McKee, CEO & Founder
Why should I have to write a different program for 1000 rows or 1 billion?
Our Vision: Simplified Distributed Computing
• Using lots of machines should be as easy as using one.
• Enable scalable, fast machine learning and data processing.
• Parallelism should be natural and come from the language itself.
I want to treat the cloud like it’s one big, fast desktop.
What is Ufora?
Auto-parallel, compiled, multi-host Python
Key Components
• JIT compiled
• Implicit parallelism at the language level
• Fault tolerant
• Automatic co-location of data and compute
We are now open source!
• 5 years of work by ~5 engineers
• ~350k lines of code
• Apache 2.0 license
• Hosted on GitHub
Sound Familiar?
• Similar approach to JIT compilation
• Scalable, but without frameworks like MapReduce
• A package that works easily with your existing Python workflow
How do I use it?

Install the client:

    pip install pyfora

Get some workers:

    pyfora_aws start … --num-instances 4
or
    docker run … ufora/service

In your Python program:

    import pyfora
    ufora = pyfora.connect('http://<ip_address>:30000')
    with ufora.remotely:
        # your code here
How do I use it?

    def isPrime(p):
        if p < 2: return 0
        x = 2
        while x*x <= p:
            if p % x == 0: return 0
            x = x + 1
        return 1

    result = sum(isPrime(x) for x in xrange(100 * 1000 * 1000))

~1 hour
How do I use it?

    def isPrime(p):
        if p < 2: return 0
        x = 2
        while x*x <= p:
            if p % x == 0: return 0
            x = x + 1
        return 1

    with ufora.remotely:
        result = sum(isPrime(x) for x in xrange(100 * 1000 * 1000))

~10 secs
What do you give up?
• No mutability of data structures
• No side effects
• No nondeterminism
• Emphasizes a “functional” programming style
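These constraints are what make safe parallelism and replication possible. As a rough illustration of the style contrast (a sketch, not pyfora's exact rules), here is the same computation written with and without mutation:

```python
# Imperative style relying on in-place mutation -- the kind of code
# the "no mutability" constraint rules out:
def squares_mutating(n):
    out = []
    for i in range(n):
        out.append(i * i)   # side effect: mutates `out`
    return out

# Functional style with no mutation or side effects -- the shape of
# code that a runtime can split and replicate freely:
def squares_functional(n):
    return [i * i for i in range(n)]

print(squares_functional(5))  # [0, 1, 4, 9, 16]
```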
Architecture

[Diagram: PyFora Client connects to a Gateway Node, which fronts the Worker Nodes; workers read and write S3/HDFS]
    def filter(v, f):
        if len(v) == 0:
            return []
        if len(v) == 1:
            return v if f(v[0]) else []
        mid = len(v) / 2
        return filter(v[:mid], f) + filter(v[mid:], f)

    primes = filter(range(100000000), isPrime)

Naturally parallel (divide and conquer)
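Because the two recursive halves share no state, a runtime is free to evaluate them concurrently. A minimal local sketch of that decomposition, using only the Python standard library rather than Ufora's scheduler (threads are used for brevity; CPython threads won't actually speed up pure-Python CPU work, but the order-preserving chunked split is the point):

```python
from concurrent.futures import ThreadPoolExecutor

def isPrime(p):
    if p < 2: return 0
    x = 2
    while x * x <= p:
        if p % x == 0: return 0
        x = x + 1
    return 1

def parallel_filter(v, f, workers=4):
    # Split v into contiguous chunks, one per worker, filter each chunk
    # concurrently, and concatenate in order -- the same result the
    # recursive divide-and-conquer filter produces.
    n = len(v)
    bounds = [n * i // workers for i in range(workers + 1)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(
            lambda k: [x for x in v[bounds[k]:bounds[k + 1]] if f(x)],
            range(workers)))
    return [x for part in parts for x in part]

print(parallel_filter(list(range(50)), isPrime))  # primes below 50
```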
Implicit Parallelism

[Diagram: filter(v, isPrime) over 100M integers; splitting divides the vector into 0–50M and 50M–100M, then again into 0–25M, 25M–50M, 50M–75M, and 75M–100M across CORE #1 through CORE #4]
Adaptive Parallelism

How do we know where to put the data?
Answer: react dynamically as the program runs.
• Watch running threads to see which blocks of data they’re accessing.
• Move threads to data, or data to threads, depending on which is cheaper.
• Detect when two blocks of data absolutely have to be on the same machine.
• Build a statistical model of correlations between block accesses.
• Place data to minimize the expected number of future machine-boundary crossings.
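A toy sketch of those last two ideas (hypothetical code, not Ufora's actual scheduler): count which blocks running threads touch together, then greedily place strongly co-accessed blocks on the same machine.

```python
from collections import Counter
from itertools import combinations

def co_access_counts(thread_traces):
    # thread_traces: one set of block ids per observed thread.
    # Count how often each pair of blocks is accessed together.
    counts = Counter()
    for blocks in thread_traces:
        for a, b in combinations(sorted(blocks), 2):
            counts[(a, b)] += 1
    return counts

def greedy_placement(blocks, counts, machines, capacity):
    # Assign blocks to machines, most-correlated pairs first, so that
    # expected cross-machine accesses are kept low.
    placement = {}
    load = {m: 0 for m in machines}
    for (a, b), _ in counts.most_common():
        for blk in (a, b):
            if blk not in placement:
                # Prefer the machine already holding the partner block.
                partner = b if blk == a else a
                target = placement.get(partner)
                if target is None or load[target] >= capacity:
                    target = min(machines, key=lambda m: load[m])
                placement[blk] = target
                load[target] += 1
    for blk in blocks:  # blocks never co-accessed go wherever there is room
        if blk not in placement:
            target = min(machines, key=lambda m: load[m])
            placement[blk] = target
            load[target] += 1
    return placement

traces = [{1, 2}, {1, 2}, {3, 4}, {2, 3}]
placement = greedy_placement([1, 2, 3, 4], co_access_counts(traces), ['m1', 'm2'], capacity=2)
print(placement)  # blocks 1,2 land together; blocks 3,4 land together
```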
A simple example

    v = range(0, 2*10**9)

[Diagram: the vector is split into 16 blocks of data (red boxes), four per machine, across Machines 1–4]
[Diagram: blocks 1–16 across Machines 1–4]

    for x in v:
        state = f(state, x)

Computation starts on Machine 1. When the computation exhausts the data on one machine, the runtime moves it to the next.
But real access patterns are more complex!

The user writes:

    res = 0
    def f(x, y):
        # some function
    for i in xrange(0, len(v) - 10):
        res = res + f(v[i], v[i+10])

Now the computation is looking at all pairs v[i] and v[i+10].
[Diagram: blocks 1–16 across Machines 1–4]

At first, everything is OK, since v[i] and v[i+10] are close to each other in the data. But when the computation reaches the end of block 4, v[i] and v[i+10] aren’t on the same machine: v[i] sits in block 4 on Machine 1 while v[i+10] sits in block 5 on Machine 2. Every time we have to move the computation, we’re hitting the network. This is really slow!
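The cost is easy to quantify in a toy model. Assuming (hypothetically) blocks of B contiguous elements, the pair (v[i], v[i+10]) straddles a block boundary exactly when the two indices fall into different blocks, which happens for 10 of every B positions:

```python
def boundary_crossings(n, offset, block_size):
    # Count pairs (i, i+offset) whose two indices fall into different
    # fixed-size blocks -- each such pair forces a cross-machine access
    # in the naive placement described above.
    return sum(1 for i in range(n - offset)
               if i // block_size != (i + offset) // block_size)

# 1,000 elements in blocks of 100: the last 10 positions of each of the
# first 9 blocks cross into the next block.
print(boundary_crossings(1000, 10, 100))  # 90
```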
Solution: replicate blocks so that they overlap.

[Diagram: blocks 5, 9, and 13 are also replicated onto the preceding machine, so blocks at machine boundaries overlap]

Data can live on two different machines at the same time because it’s immutable!
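A toy sketch of the overlap idea (block and overlap sizes are illustrative, not Ufora's): give each shard a copy of the first few elements of the next block, and every pair v[i], v[i+10] becomes local to some single shard.

```python
def build_shards(v, block_size, overlap):
    # Each shard holds its own block plus a replicated copy of the first
    # `overlap` elements of the next block.
    shards = []
    for start in range(0, len(v), block_size):
        shards.append((start, v[start:start + block_size + overlap]))
    return shards

def pair_is_local(shards, i, offset):
    # True if some single shard contains both v[i] and v[i+offset].
    return any(start <= i and i + offset < start + len(data)
               for start, data in shards)

v = list(range(1000))
shards = build_shards(v, block_size=100, overlap=10)
# With a 10-element overlap, no pair (v[i], v[i+10]) ever spans two shards:
print(all(pair_is_local(shards, i, 10) for i in range(len(v) - 10)))  # True
```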
Project Roadmap: Current Version (0.1)
• Coverage of the core Python 2.7 language
• Run locally (using Docker) or in AWS
• import pyfora and go!
Project Roadmap: Upcoming Release (0.2)
• Core numpy and dataframe implementations (in Python)
• Coverage for some core scikit-learn data science algorithms (GBM, regressions, etc.)
• Better error handling, lots of bug fixes
Project Roadmap: The Future
• Python 3 support
• Execution of arbitrary Python code out-of-process (for non-pure code we don't want to port)
• A more generic model for import/export of data from the cluster
• Better feedback in the pyfora API for tracking the progress of computations
• Support for running calculations on GPUs
Ufora is Auto-Parallel, Multi-Host Python
• Star/fork the repo: github.com/ufora/ufora
• Contribute to the codebase
• Find me after this presentation
• Tell us what we should build next; this affects our priorities!
• Email me: [email protected]