Deep Learning Frameworks and the Implementation of Chainer

Deep Learning Frameworks and the Implementation of Chainer. Invited talk at PPL2016 @ Okayama, 2016/03/07. Ryosuke Okuta, Preferred Networks, Inc.


TRANSCRIPT

  • Deep Learning Frameworks and the Implementation of Chainer

    2016/03/07 PPL2016 @ Okayama

    Ryosuke Okuta, Preferred Networks

  • About the speaker

    -2014: before Preferred Networks

    2014-: at Preferred Networks, developing Chainer and CuPy

    Latest release at the time of the talk: Chainer v1.7 (2016/3/1)

    2/50

  • Outline

    Deep learning basics

    Define-and-Run vs. Define-by-Run

    The implementation of CuPy and Chainer

    3/50

  • [Figure: a feed-forward neural network with inputs x1 ... xN, hidden units h1 ... hH and k1 ... kM, and outputs y1 ... yM. Forward computes outputs from inputs; Backward propagates gradients in the reverse direction.]

    5/50

  • DL = Deep Learning

    The boom started around 2012.

    2014-2015: on the order of 1500 publications*

    2014: a 22-layer NN [Google]

    2015: a 152-layer NN [MSRA]

    DeepMind's AlphaGo playing Lee Sedol at Go.

    *http://memkite.com/deep-learning-bibliography/

    6/50

  • [Figure: ILSVRC image-classification top-5 error rates (%), dropping year by year: 28.2, 25.8, 16.4, 11.7, 6.7, 5.98, 5.1, 4.94, 4.82, 3.56. The steep decline is driven by Deep Learning.]

    7/50

  • chainer-DCGAN: images generated by a neural network of roughly 300 lines. https://github.com/mattya/chainer-DCGAN

    8/50


  • A neural network = a DAG (directed acyclic graph) of computations

    13/50

  • z = x ** 2 + 2 * x * y + y

    [Graph: x and y feed the nodes _ ** 2, 2 * _, and _ * _; their results are combined by _ + _ nodes to produce z.]

    14/50
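    As a hedged illustration (not on the slide): in Chainer, this graph is recorded simply by executing the expression on Variable objects, after which backward() traverses the recorded DAG.

    import numpy as np
    from chainer import Variable

    x = Variable(np.array([3.0], dtype=np.float32))
    y = Variable(np.array([2.0], dtype=np.float32))

    z = x ** 2 + 2 * x * y + y  # executing the expression records the DAG
    z.backward()                # backprop through the recorded graph

    print(x.grad)  # dz/dx = 2x + 2y -> [10.]
    print(y.grad)  # dz/dy = 2x + 1  -> [7.]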


  • Training a neural network:

    1. Forward: compute the output and the loss

    2. Backward: propagate gradients through the graph

    3. Update: adjust the parameters with an optimizer

    16/50

  • MNIST & Chainer

    A 3-layer MLP: L1 = L.Linear(784, n_units), L2 = L.Linear(n_units, 10)

    def forward(self, x):
        h1 = F.relu(L1(x))
        return L2(h1)

    [Diagram: x -> Linear L1 (W, bias) -> ReLU -> h1 -> Linear L2 (W, bias) -> output.]

    17/50
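    For reference, a minimal self-contained version of this model as a Chain, assuming Chainer's v1-era API (the class name MLP is ours):

    import chainer
    import chainer.functions as F
    import chainer.links as L

    class MLP(chainer.Chain):
        def __init__(self, n_units):
            super(MLP, self).__init__(
                l1=L.Linear(784, n_units),  # 28x28 MNIST pixels -> hidden
                l2=L.Linear(n_units, 10),   # hidden -> 10 digit classes
            )

        def __call__(self, x):
            h1 = F.relu(self.l1(x))  # affine transform, then ReLU
            return self.l2(h1)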

  • Recurrent Net

    [Figure: a recurrent network; the hidden state at t=T is computed from the input at t=T and the hidden state at t=T-1.]

    18/50

  • Recurrent Net

    Unrolling the recurrence over time (t=1, t=2, t=3, t=4, ...) turns the network into a DAG.

    Backprop over the unrolled DAG is called Backprop Through Time.

    19/50
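    A hedged sketch (ours, not from the slide) of an Elman-style recurrent step in Chainer; calling it once per time step and then calling backward() on the final loss performs backprop through the unrolled DAG:

    import chainer
    import chainer.functions as F
    import chainer.links as L

    class SimpleRNN(chainer.Chain):
        def __init__(self, n_in, n_units):
            super(SimpleRNN, self).__init__(
                x2h=L.Linear(n_in, n_units),     # input -> hidden
                h2h=L.Linear(n_units, n_units),  # hidden -> hidden (recurrence)
            )
            self.h = None  # hidden state carried across time steps

        def __call__(self, x):
            pre = self.x2h(x)
            if self.h is not None:
                pre = pre + self.h2h(self.h)  # add the recurrent contribution
            self.h = F.tanh(pre)
            return self.h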

  • Parameter updates are handled by optimizers such as SGD and Adam.

    21/50
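    A hedged usage sketch with Chainer's optimizers module (model is the MLP sketched earlier; x and t are a minibatch of images and labels):

    from chainer import optimizers
    import chainer.functions as F

    model = MLP(n_units=100)
    optimizer = optimizers.Adam()  # or optimizers.SGD(lr=0.01)
    optimizer.setup(model)

    def train_step(x, t):
        model.zerograds()                            # clear accumulated gradients
        loss = F.softmax_cross_entropy(model(x), t)  # forward
        loss.backward()                              # backward
        optimizer.update()                           # apply the update rule
        return loss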

  • Deep learning frameworks as of 2015

    Host languages: C/C++, Python, R, Matlab, Julia

    GPU backends: CUDA or OpenCL

    They differ in GPU support and multi-node/distributed execution.

    22/50

  • Major DL frameworks

    Framework    | Developer                          | Language    | Network definition
    Chainer      | Preferred Networks/Infrastructure  | Python      | Python code
    Caffe        | BVLC                               | C++         | DSL (prototxt)
    Torch        | Idiap Research Institute, DeepMind | Lua         | LuaJIT scripts
    Theano-based | Univ. of Montreal                  | Python      | DSL (YAML) or Python
    TensorFlow   | Google                             | C++/Python  | Python; multi-GPU via gRPC

    RNN/LSTM support varies across these frameworks.

    23/50

  • Symbolic vs. Imperative

    Symbolic: Theano, TensorFlow, Caffe, cxxnet

    Imperative: Chainer, Torch, Minerva

    # Symbolic: build an expression, compile it, then execute
    A = Variable('A')
    B = Variable('B')
    C = B * A
    D = C + Constant(1)
    # compiles the function
    f = compile(D)
    d = f(A=numpy.ones(10),
          B=numpy.ones(10) * 2)

    # Imperative: every line executes immediately
    a = numpy.ones(10)
    b = numpy.ones(10) * 2
    c = b * a
    d = c + 1

    24/50

  • Advantages of the Symbolic style

    The whole expression (B * A + 1) is known before execution, so the graph can be optimized (e.g. fusing B * A and C + 1 into one kernel) and memory can be planned ahead of time, which helps GPU execution. TensorFlow takes this approach.

    A = Variable('A')
    B = Variable('B')
    C = B * A
    D = C + Constant(1)

    [Graph: A and B feed B * A (node C); C feeds C + 1 (node D).]

    25/50

  • Advantages of the Imperative style

    Host-language features (loops, indexing, conditionals) mix freely with array code:

    a = numpy.ones(10)
    b = numpy.ones(10) * 2
    c = b * a
    d = 0
    for i in range(len(c)):  # an ordinary Python loop over the result
        d += c[i] + i

    26/50

  • Q. When does that flexibility matter? A. Recursive Neural Networks

    (Recurrent Net and Recursive Net are different things.)

    The tree structure varies per input:

    p1 = f(x1, x2)
    p2 = f(p1, x3)

    [Tree: x1 and x2 combine into p1; p1 and x3 combine into p2.]

    A Recursive Net example ships with Chainer's examples.

    27/50
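    A hedged Chainer sketch of the combine step f above (class and parameter names are ours):

    import chainer
    import chainer.functions as F
    import chainer.links as L

    class RecursiveNet(chainer.Chain):
        def __init__(self, n_units):
            super(RecursiveNet, self).__init__(
                comb=L.Linear(2 * n_units, n_units))  # combines two children

        def node(self, left, right):
            # f(left, right): concatenate the child vectors, project, squash
            return F.tanh(self.comb(F.concat((left, right))))

    node() is then applied following the parse tree of each input, e.g. p1 = net.node(x1, x2); p2 = net.node(p1, x3). Because the graph is built while the code runs, every input can have a different tree.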

  • The two styles can also be combined, via a DSL or by embedding one style in the other.

    MXNet mixes both.

    28/50

  • Chainer's approach: Define-by-Run

    Running the forward computation defines the network; backward then walks the graph recorded during execution.

    By contrast, Define-and-Run (TensorFlow, Theano, Torch nn) defines the network first and executes it afterwards.

    a = numpy.ones(10)
    b = numpy.ones(10) * 2
    c = b * a
    d = c + 1

    [Graph recorded as the code runs: a and b feed b * a (node c); c feeds c + 1 (node d).]

    29/50

  • Benefits of Define-by-Run

    The NN can change from one input to the next.

    Host-language control flow (conditionals, loops) inside the forward computation is OK.

    The NN definition is just a program.

    30/50
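    A hedged illustration (ours, not from the slide) of a forward pass whose depth depends on a per-call argument:

    import numpy as np
    import chainer
    import chainer.functions as F
    import chainer.links as L

    class DynamicNet(chainer.Chain):
        def __init__(self, n_units):
            super(DynamicNet, self).__init__(
                layer=L.Linear(n_units, n_units))

        def __call__(self, x, n_steps):
            h = x
            for _ in range(n_steps):  # ordinary Python loop: depth varies per call
                h = F.relu(self.layer(h))
            return h

    net = DynamicNet(10)
    x = chainer.Variable(np.ones((1, 10), dtype=np.float32))
    y3 = net(x, 3)  # records a 3-layer graph
    y5 = net(x, 5)  # records a different, 5-layer graph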

  • Cost: memory for the NN graph

    The graph is rebuilt and held in memory on every forward pass.

    GPU memory is limited (e.g. Tesla K80 @ 24 GB) while networks keep growing (150+ layers).

    31/50

  • Cost: graph-construction overhead on every iteration

    [Figure: unrolled recurrent graphs of different lengths; sequence A unrolls over t=1 ... t=4, sequence B over t=1 ... t=2.]

    32/50

  • Define-and-Run vs. Define-by-Run is a trade-off: ahead-of-time optimization and GPU memory planning versus flexibility.

    33/50

  • Part 2: the implementation of CuPy and Chainer

  • CuPy

    A NumPy-subset array library on CUDA: write GPU code from Python the way you write NumPy code on the CPU.

    Bundled with Chainer since v1.5.0; implements on the order of 174 NumPy-compatible APIs.

    Linear algebra via cuBLAS; array manipulation such as reshape; elementwise and reduction operations.

    CuPy = NumPy on CUDA.

    35/50

  • Why CuPy

    Existing tensor libraries: Torch's tensors, Eigen::Tensor, and BLAS (OpenBLAS, MKL) on the CPU; the CUDA libraries (cuBLAS, cuDNN) on the GPU.

    DL frameworks are built on top of these.

    36/50

  • CuPy in practice

    The same code runs on CPU and GPU by switching between NumPy and CuPy: logsumexp.

    def logsumexp(x, axis=None):
        xp = cuda.get_array_module(x)  # numpy or cupy, depending on where x lives
        x_max = x.max(axis)            # subtract the max for numerical stability
        exp_sum = xp.exp(x - x_max).sum(axis)
        return x_max + xp.log(exp_sum)

    37/50
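    A hedged usage sketch (the random input is ours):

    import numpy as np
    from chainer import cuda

    a_cpu = np.random.rand(3, 4).astype(np.float32)
    print(logsumexp(a_cpu))         # runs via NumPy on the CPU

    if cuda.available:
        a_gpu = cuda.to_gpu(a_cpu)  # cupy.ndarray on the GPU
        print(logsumexp(a_gpu))     # same function, runs via CuPy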

  • CuPy performance

    def test(xp):
        a = xp.arange(1000000).reshape(1000, -1)
        return a.T * 2

    # warm up once, then time 1000 iterations on each backend
    test(numpy)
    t1 = datetime.datetime.now()
    for i in range(1000):
        test(numpy)
    t2 = datetime.datetime.now()
    print(t2 - t1)

    test(cupy)
    t1 = datetime.datetime.now()
    for i in range(1000):
        test(cupy)
    t2 = datetime.datetime.now()
    print(t2 - t1)

    Results, time [ms] (speedup):

    NumPy                 2929   1.0x
    CuPy                   585   5.0x
    CuPy + Memory Pool     123  23.8x

    Environment: Intel Core i7-4790 @ 3.60GHz, 32 GB RAM, GeForce GTX 970

    38/50

  • Why CuPy is fast (1/2)

    The core is written in Cython, which removes Python interpreter overhead (pure Python can be on the order of 200x slower than C).

    NumPy-compatible semantics are preserved while doing so.

    39/50

  • Why CuPy is fast (2/2)

    CUDA kernels are compiled with nvcc on first use, which is slow, so compiled kernels are cached and reused.

    Even a simple A + B = C must cover many type combinations (11 types give 11^3 = 1331 variants), so kernels are generated and compiled on demand rather than ahead of time.

    40/50

  • CuPy internals

    Three layers:

    cupy: ndarray plus ufunc, elementwise, and reduction kernels

    cupy.core: the core array implementation, written in Cython

    cupy.cuda: thin Python wrappers over the CUDA libraries (cuBLAS, cuRAND, cuDNN)

    41/50

  • The Python-side ndarray corresponds to a C++-side CArray

    template <typename T, int ndim>  // element type and dimensionality
    class CArray {
    private:
        T* data_;            // pointer to the data on the GPU
        int size_;           // number of elements in the CArray
        int shape_[ndim];    // extent of each dimension
        int strides_[ndim];  // stride of each dimension
    };

    42/50

  • Generating GPU kernels

    Kernel code is generated from Python and compiled as C++/CUDA:

    ElementwiseKernel: apply a user-supplied expression element by element

    ReductionKernel: reduce over elements

    ufunc: NumPy-style universal functions, built on the Elementwise machinery

    43/50

  • ElementwiseKernel

    Example: the squared difference of two arrays, element by element.

    squared_diff = cupy.ElementwiseKernel(
        'float32 x, float32 y',   # input arguments
        'float32 z',              # output argument
        'z = (x - y) * (x - y)',  # elementwise operation
        'squared_diff')           # kernel name

    squared_diff(cupy.arange(10, dtype='float32'), 10)

    44/50

  • The same kernel with a single generic type parameter:

    squared_diff = cupy.ElementwiseKernel(
        'T x, T y',               # inputs: any dtype T
        'T z',                    # output of the same dtype
        'z = (x - y) * (x - y)',  # elementwise operation
        'squared_diff')           # kernel name

    45/50
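    A hedged usage sketch: the same kernel object specializes per dtype, compiling on the first call for each type and hitting the cache afterwards:

    import cupy

    a32 = cupy.arange(10, dtype=cupy.float32)
    a64 = cupy.arange(10, dtype=cupy.float64)
    print(squared_diff(a32, 5))  # T bound to float32; kernel compiled and cached
    print(squared_diff(a64, 5))  # T = float64 triggers a separate compilation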

  • How Elementwise kernels are emitted

    The CUDA source is produced by filling a Python string template:

    ${preamble}
    extern "C" __global__ void ${name}(${params})
    {
        ${loop_prep};
        CUPY_FOR(i, _ind.size()) {  // loop over all elements
            _ind.set(i);            // map the flat index i to an nd-index
            ${operation};           // the user-supplied elementwise expression
        }
        ${after_loop};
    }

    46/50
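    A minimal illustration (not CuPy's actual implementation) of how such a template expands: the name, parameter list, and user operation are substituted as strings, and the resulting CUDA C source is what gets compiled and cached:

    from string import Template

    kernel_template = Template(
        'extern "C" __global__ void ${name}(${params}) {\n'
        '  int i = blockIdx.x * blockDim.x + threadIdx.x;\n'
        '  ${operation};\n'
        '}')

    source = kernel_template.substitute(
        name='squared_diff',
        params='const float* x, const float* y, float* z, int n',
        operation='if (i < n) z[i] = (x[i] - y[i]) * (x[i] - y[i])')
    print(source)  # CUDA C source ready for nvcc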

  • What happens on cupy.add(x, y)

    1. Check args and kwdargs against the NumPy-compatible interface

    2. Determine the device and the result type of add

    3. Generate the CUDA kernel source

    4. Compile the CUDA kernel, or fetch it from the cache

    5. Launch the CUDA kernel

    47/50

  • CuPy memory pool

    Allocating and freeing GPU memory (Malloc, Free) is expensive, so CuPy pools allocations and reuses them.

    This matters because NumPy-style code creates many temporary arrays, and Chainer allocates on every iteration.

    48/50
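    A hedged sketch of enabling the pool (the exact API location has moved across CuPy versions):

    import cupy

    # Route all CuPy allocations through a memory pool so device memory
    # is reused instead of repeatedly calling Malloc / Free.
    pool = cupy.cuda.MemoryPool()
    cupy.cuda.set_allocator(pool.malloc)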

  • CuPy wrap-up

    A NumPy-compatible GPU array library; Chainer is built on top of it.

    49/50

  • References

    MXNet: Programming Models for Deep Learning
    http://mxnet.readthedocs.org/en/latest/program_model.html

    GPU programming (NVIDIA Japan)
    http://www.slideshare.net/NVIDIAJapan/gpu-51812528

    CuPy
    http://www.slideshare.net/ryokuta/cupy

    NumPy
    http://www.slideshare.net/ryokuta/numpy-57587130

    We are Hiring!
    https://www.preferred-networks.jp/job_ja

    50/50
