GPU Computing for Data Science John Joo [email protected] Data Science Evangelist @ Domino Data Lab

Upload: domino-data-lab

Post on 20-Mar-2017

Category:

Data & Analytics


TRANSCRIPT

Page 1: GPU Computing for Data Science

GPU Computing for Data Science

John Joo

[email protected]

Data Science Evangelist @ Domino Data Lab

Page 2: GPU Computing for Data Science

Outline

• Why use GPUs?

• Example applications in data science

• Programming your GPU

Page 3: GPU Computing for Data Science

Case Study: Monte Carlo Simulations

• Simulate behavior when randomness is a key component

• Average the results of many simulations

• Make predictions

Page 4: GPU Computing for Data Science

Little Information in One “Noisy Simulation”

Price(t+1) = Price(t) · e^(InterestRate · dt + noise)

Page 5: GPU Computing for Data Science

Many “Noisy Simulations” ➡ Actionable Information

Price(t+1) = Price(t) · e^(InterestRate · dt + noise)
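The update rule on these two slides can be sketched in plain Python. This is a minimal illustration, not the presenter's code: it assumes the noise term is Gaussian and sits inside the exponent, and the parameter names and values (`volatility`, `dt`, `n_steps`, the starting price of 100) are hypothetical choices made only for the example.

```python
import math
import random


def simulate_final_price(start_price, interest_rate, volatility, dt, n_steps, rng):
    """Walk one noisy path: Price(t+1) = Price(t) * e^(InterestRate*dt + noise)."""
    price = start_price
    for _ in range(n_steps):
        # Assumption: noise is Gaussian with standard deviation volatility * sqrt(dt).
        noise = rng.gauss(0.0, volatility * math.sqrt(dt))
        price *= math.exp(interest_rate * dt + noise)
    return price


def average_final_price(n_paths, seed=0, **path_params):
    """Average the final price over many independent noisy paths."""
    rng = random.Random(seed)
    total = sum(simulate_final_price(rng=rng, **path_params) for _ in range(n_paths))
    return total / n_paths


# Hypothetical parameters chosen only for illustration.
params = dict(start_price=100.0, interest_rate=0.05, volatility=0.2, dt=0.01, n_steps=100)
one_path = average_final_price(1, **params)
many_paths = average_final_price(5000, **params)
print(f"one path: {one_path:.2f}   mean of 5,000 paths: {many_paths:.2f}")
```

A single path can land far from the typical outcome, while the mean over thousands of paths settles near a stable value, which is the point the two slides make.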

Page 6: GPU Computing for Data Science

Monte Carlo Simulations Are Often Slow

• Lots of simulation data is required to create valid models

• Generating lots of data takes time

• CPU works sequentially
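One way to see why so much simulation data is needed: the standard error of a Monte Carlo average shrinks only with the square root of the number of simulations. The sketch below uses unit-variance Gaussian draws as a stand-in for simulation outputs; the sample sizes are arbitrary.

```python
import math
import random
import statistics


def standard_error_of_mean(n_samples, rng):
    """Draw n noisy samples and return the standard error of their mean."""
    draws = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    return statistics.stdev(draws) / math.sqrt(n_samples)


rng = random.Random(42)
errors = {n: standard_error_of_mean(n, rng) for n in (100, 10_000)}
for n, err in errors.items():
    print(f"{n:>6} simulations -> standard error about {err:.4f}")
```

Running 100 times more simulations reduces the error by only about a factor of 10, so accurate estimates require very large sample counts.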

Page 7: GPU Computing for Data Science

CPUs designed for sequential, complex tasks

Source: Mythbusters https://youtu.be/-P28LKWTzrI

Page 8: GPU Computing for Data Science

GPUs designed for parallel, low level tasks

Source: Mythbusters https://youtu.be/-P28LKWTzrI

Page 10: GPU Computing for Data Science

Applications of GPU Computing in Data Science

• Matrix Manipulation

• Numerical Analysis

• Sorting

• FFT

• String matching

• Monte Carlo simulations

• Machine learning

• Search

Algorithms for GPU Acceleration

• Inherently parallel

• Matrix operations

• High floating-point operations per second (FLOPS)
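As a rough illustration of "inherently parallel", compare a computation where every output depends only on its own input with one where each output needs the previous result. Both functions below are hypothetical examples written for this note, not taken from the talk.

```python
def scale_all(values, factor):
    """Inherently parallel: output i depends only on input i."""
    return [factor * v for v in values]


def running_total(values):
    """Sequential as written: output i needs output i-1 first."""
    totals, acc = [], 0
    for v in values:
        acc += v
        totals.append(acc)
    return totals


data = [1, 2, 3, 4]
scaled = scale_all(data, 10)
totals = running_total(data)
print(scaled, totals)
```

The first function maps naturally onto thousands of GPU threads, one element per thread. The second, written this way, forces each step to wait for the one before it, so it needs a different algorithm before a GPU can help.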

Page 11: GPU Computing for Data Science

GPUs Make Deep Learning Accessible

                     Google Datacenter    Stanford AI Lab

# of machines        1,000                3

# of CPUs or GPUs    2,000 CPUs           12 GPUs

Cores                16,000               18,432

Power used           600 kW               4 kW

Cost                 $5,000,000           $33,000

Source: Adam Coates, Brody Huval, Tao Wang, David Wu, Bryan Catanzaro, Andrew Ng; JMLR W&CP 28(3):1337–1345, 2013
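The ratios implied by the comparison above can be worked out directly. The figures are copied from the slide; the dictionary keys are just labels chosen for this sketch.

```python
# Figures taken from the slide's comparison table.
google = {"cores": 16_000, "power_kw": 600, "cost_usd": 5_000_000}
stanford = {"cores": 18_432, "power_kw": 4, "cost_usd": 33_000}

ratios = {key: google[key] / stanford[key] for key in google}
for key, ratio in ratios.items():
    print(f"{key}: Google / Stanford = {ratio:.1f}x")
```

The GPU cluster offers a comparable core count at roughly 150 times less power and cost.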

Page 12: GPU Computing for Data Science

CPU vs GPU Architecture: Structured for Different Purposes

CPU: 4-8 high-performance cores

GPU: 100s-1000s of bare-bones cores

Page 13: GPU Computing for Data Science

Both CPU and GPU are required

CPU: Everything else

GPU: Compute-intensive functions

General Purpose GPU Computing (GPGPU) / Heterogeneous Computing

Page 14: GPU Computing for Data Science

Getting Started: Hardware

• Need a computer with GPU

• GPU should not be operating your display

Spin up a GPU/CPU computer with 1 click.

8 CPU cores, 15 GB RAM

1,536 GPU cores, 4 GB RAM

Page 15: GPU Computing for Data Science

Getting Started: Hardware

Page 16: GPU Computing for Data Science

Getting Started: Software

Programming CPU

• Sequential

• Write code top to bottom

• Can do complex tasks

• Independent

Programming GPU

• Parallel

• Multi-threaded - race conditions

• Low level tasks

• Dependent on CPU

Page 17: GPU Computing for Data Science

Talking to your GPU

CUDA and OpenCL are GPU computing frameworks

Page 18: GPU Computing for Data Science

Choosing How to Interface with GPU: Simplicity vs Flexibility

Application-specific libraries → General-purpose GPU libraries → Custom CUDA/OpenCL code

Simplicity: High → Low

Flexibility: Low → High

Page 19: GPU Computing for Data Science

Application Specific Libraries

Python

• Theano - Symbolic math

• TensorFlow - ML

• Lasagne - NN

• Pylearn2 - ML

• mxnet - NN

• ABC-SysBio - Systems Bio

R

• cudaBayesreg - fMRI

• mxnet - NN

• rpud - SVM

• rgpu - bioinformatics

Tutorial on using Theano, Lasagne, and nolearn: http://blog.dominodatalab.com/gpu-computing-and-deep-learning/

Page 20: GPU Computing for Data Science

General Purpose GPU Libraries

• Python and R wrappers for basic matrix and linear algebra operations

• scikit-cuda

• cudamat

• gputools

• HiPLARM

• Drop-in library

Page 21: GPU Computing for Data Science

Drop-in Library

Credit: NVIDIA

Also works for Python! http://scelementary.com/2015/04/09/nvidia-nvblas-in-numpy.html

Page 22: GPU Computing for Data Science

Custom CUDA/OpenCL Code

1. Allocate memory on the GPU

2. Transfer data from CPU to GPU

3. Launch the kernel to operate on the GPU cores

4. Transfer results back to CPU
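The four steps above can be sketched with PyCUDA. This is a minimal example under stated assumptions, not the code from the talk: the `scale` kernel and the helpers `blocks_needed` and `scale_on_gpu` are hypothetical names invented here, and the GPU function only runs on a machine with an NVIDIA GPU, NumPy, and PyCUDA installed.

```python
# Hypothetical CUDA C kernel: each thread multiplies one array element.
KERNEL_SOURCE = """
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}
"""


def blocks_needed(n_items, threads_per_block):
    """Number of thread blocks so that every item gets its own thread."""
    return (n_items + threads_per_block - 1) // threads_per_block


def scale_on_gpu(values, factor, threads_per_block=256):
    """Multiply every value by factor on the GPU (needs PyCUDA and an NVIDIA GPU)."""
    # Imported here so the module still loads on machines without a GPU.
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates the CUDA context)
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    host_data = np.ascontiguousarray(values, dtype=np.float32)
    n = host_data.size

    # 1. Allocate memory on the GPU
    device_data = cuda.mem_alloc(host_data.nbytes)
    # 2. Transfer data from CPU to GPU
    cuda.memcpy_htod(device_data, host_data)
    # 3. Launch the kernel to operate on the GPU cores
    scale = SourceModule(KERNEL_SOURCE).get_function("scale")
    scale(device_data, np.float32(factor), np.int32(n),
          block=(threads_per_block, 1, 1),
          grid=(blocks_needed(n, threads_per_block), 1))
    # 4. Transfer results back to CPU
    result = np.empty_like(host_data)
    cuda.memcpy_dtoh(result, device_data)
    return result
```

On a GPU machine, calling `scale_on_gpu([1.0, 2.0, 3.0], 2.0)` should return the doubled values as a float32 array. The bounds check `i < n` in the kernel matters because the last block usually contains more threads than remaining elements.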

Page 23: GPU Computing for Data Science

Example of using Python and CUDA: Monte Carlo Simulations

• Using PyCUDA to interface Python and CUDA

• Simulating 3 million paths, 100 time steps each

Page 24: GPU Computing for Data Science

Python Code for CPU

Python/PyCUDA Code for GPU

8 more lines of code

Page 25: GPU Computing for Data Science

Python Code for CPU

Python/PyCUDA Code for GPU

1. Allocate memory on the GPU

Page 26: GPU Computing for Data Science

Python Code for CPU

Python/PyCUDA Code for GPU

2. Transfer data from CPU to GPU

Page 27: GPU Computing for Data Science

Python Code for CPU

Python/PyCUDA Code for GPU

3. Launch the kernel to operate on the GPU cores

Page 28: GPU Computing for Data Science

Python Code for CPU

Python/PyCUDA Code for GPU

4. Transfer results back to CPU

Page 29: GPU Computing for Data Science

Python Code for CPU

26 sec

Python/PyCUDA Code for GPU

8 more lines of code

1.5 sec

17x speed up

Page 30: GPU Computing for Data Science

Some sample Jupyter notebooks

• https://app.dominodatalab.com/johnjoo/gpu_examples

• Monte Carlo example using PyCUDA

• PyCUDA example compiling CUDA C for kernel instructions

• Scikit-cuda example of matrix multiplication

• Calculating a distance matrix using rpud

Page 31: GPU Computing for Data Science

More resources

• NVIDIA

  • https://developer.nvidia.com/how-to-cuda-python

• Berkeley GPU workshop

  • http://www.stat.berkeley.edu/scf/paciorek-gpuWorkshop.html

• Duke Statistics on GPU (Python)

  • http://people.duke.edu/~ccc14/sta-663/CUDAPython.html

• Andreas Klockner’s webpage (Python)

  • http://mathema.tician.de/

• Summary of GPU libraries

  • http://fastml.com/running-things-on-a-gpu/

Page 32: GPU Computing for Data Science

More resources

• Walkthrough of CUDA programming in R

  • http://blog.revolutionanalytics.com/2015/01/parallel-programming-with-gpus-and-r.html

• List of libraries for GPU computing in R

  • https://cran.r-project.org/web/views/HighPerformanceComputing.html

• Matrix computations in Machine Learning

  • http://numml.kyb.tuebingen.mpg.de/numl09/talk_dhillon.pdf