TensorFlow: what and why?

Konstantin Shmelkov

Grenoble Data Science Meetup

14 Sep 2016

What is TensorFlow?

From the whitepaper: “TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms”.

Data flow graph

Pictures are from colah.github.io
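To make the idea of a data flow graph concrete, here is a minimal pure-Python sketch. It is not TensorFlow code: the `Node` class and the `placeholder`, `add` and `mul` helpers are hypothetical names invented for this illustration. It only shows the two-phase pattern, where the graph is built first and values flow through it later when it is run.

```python
class Node:
    """A node in a toy data flow graph: an operation plus its input nodes."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op            # callable that computes this node's value
        self.inputs = inputs    # upstream nodes whose values feed the op
        self.name = name

    def run(self, feed):
        # A placeholder has no op: its value must come from the feed dict.
        if self.op is None:
            return feed[self]
        # Otherwise evaluate the inputs first, then apply the op.
        return self.op(*(node.run(feed) for node in self.inputs))


def placeholder(name):
    return Node(op=None, name=name)


def add(a, b):
    return Node(lambda u, v: u + v, (a, b), "add")


def mul(a, b):
    return Node(lambda u, v: u * v, (a, b), "mul")


# Building the graph computes nothing yet.
x = placeholder("x")
y = placeholder("y")
z = add(mul(x, y), x)   # z = x * y + x

# Values flow through the graph only when it is run with concrete inputs.
result = z.run({x: 3, y: 4})
print(result)  # 15
```

The same graph can be run again with a different feed, which is the role `feed_dict` plays in a TensorFlow session.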

Back in the past: Theano (2009)

Framework mostly developed in LISA group at the University of Montreal.

Features:

symbolic differentiation,

transparent use of GPU,

dynamic code generation (C and CUDA),

everything in Python!

Main challenges for a deep learning framework: flexible, distributed, easy-to-use.

Theano: flexible, distributed, easy-to-use.

TensorFlow features

distributed on all levels: real multithreading, multi-GPU, multiple cluster nodes,

freshly designed API,

Python API, C++ core (Eigen),

powerful visualization with TensorBoard,

model deployment with TensorFlow Serving,

integration with Google Cloud Platform.

Core abstractions

This is a data flow graph.

x is a Placeholder. W and b are Variables. Everything else is an intermediate Tensor.

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 784])
W = tf.Variable(tf.random_normal([784, 10]))
b = tf.Variable(tf.zeros([10]))
C = tf.nn.relu(tf.matmul(x, W) + b)

Let’s compute that!

import numpy as np

batch = np.random.randn(128, 784)
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    val = sess.run(C, feed_dict={x: batch})
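For readers without TensorFlow at hand, the value that the session returns can be sketched in plain NumPy. This is only an illustration of the arithmetic behind `C`, not of how TensorFlow executes the graph; the seeded generator `rng` is an assumption added for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)   # assumption: fixed seed, not in the slides

batch = rng.standard_normal((128, 784))   # stands in for the fed placeholder x
W = rng.standard_normal((784, 10))        # stands in for the Variable W
b = np.zeros(10)                          # stands in for the Variable b

# relu(x W + b): b is broadcast across the 128 rows of the product.
val = np.maximum(batch @ W + b, 0.0)

print(val.shape)  # (128, 10)
```

The `None` in the placeholder shape is what lets the same graph accept a batch of 128 rows here and any other batch size later.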

Simple network example

x = tf.placeholder(tf.float32, shape=[None, 784])
C = tf.matmul(x, W) + b
# C = tf.nn.relu(tf.matmul(x, W) + b)
y = tf.placeholder(tf.int64, shape=[None])
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=C, labels=y)
loss = tf.reduce_mean(xentropy)
optimizer = tf.train.GradientDescentOptimizer(1e-3)
train_op = optimizer.minimize(loss)
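What the loss above measures can be sketched in NumPy. The function name `sparse_softmax_cross_entropy` and the toy inputs are assumptions made for this illustration; the sketch follows the usual definition of softmax cross-entropy with integer labels and is not TensorFlow's implementation.

```python
import numpy as np


def sparse_softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy between softmax(logits) and integer labels."""
    # Subtract the row-wise max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class in each row.
    return -log_probs[np.arange(len(labels)), labels]


logits = np.zeros((4, 10))            # uniform predictions over 10 classes
labels = np.array([0, 3, 7, 9])
xentropy = sparse_softmax_cross_entropy(logits, labels)
loss = xentropy.mean()                # mirrors tf.reduce_mean(xentropy)
print(loss)                           # ln(10), about 2.3026
```

"Sparse" here means that the labels are class indices rather than one-hot vectors, which is why `y` is an int64 placeholder of shape `[None]`.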

Let’s optimize that!

import numpy as np

labels = np.random.randint(10, size=128)
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    train_loss, _ = sess.run([loss, train_op],
                             feed_dict={x: batch, y: labels})
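Running `train_op` once performs one gradient descent step on `W` and `b`. A rough NumPy sketch of such a step is given below, with the gradients written out by hand where TensorFlow would derive them automatically. The helper `loss_and_grads`, the fixed seed and the 0.01 weight scale are assumptions of this sketch, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)                # assumption: fixed seed
batch = rng.standard_normal((128, 784))
labels = rng.integers(10, size=128)
W = 0.01 * rng.standard_normal((784, 10))     # assumption: small initial weights
b = np.zeros(10)


def loss_and_grads(x, y, W, b):
    """Mean softmax cross-entropy of a linear model and its gradients."""
    logits = x @ W + b
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    n = len(y)
    loss = -np.log(probs[np.arange(n), y]).mean()
    # d(loss)/d(logits) = (softmax - one_hot) / n
    dlogits = probs.copy()
    dlogits[np.arange(n), y] -= 1.0
    dlogits /= n
    return loss, x.T @ dlogits, dlogits.sum(axis=0)


learning_rate = 1e-3                          # same rate as the optimizer above
loss_before, dW, db = loss_and_grads(batch, labels, W, b)
W -= learning_rate * dW                       # the update that train_op applies
b -= learning_rate * db
loss_after, _, _ = loss_and_grads(batch, labels, W, b)
print(loss_before, loss_after)
```

With a step this small the loss on the same batch goes down slightly, which is all a single call to `train_op` is expected to achieve.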

ConvNet example in TF Slim

import tensorflow.contrib.slim as slim

with slim.arg_scope([slim.conv2d, slim.fully_connected],
                    activation_fn=tf.nn.relu,
                    weights_initializer=tf.truncated_normal_initializer(0.0, 0.01),
                    weights_regularizer=slim.l2_regularizer(0.0005)):
    net = slim.conv2d(net, 64, [3, 3], scope='conv1')
    net = slim.max_pool2d(net, [2, 2], scope='pool1')
    net = slim.fully_connected(net, 1024, scope='fc1')
    net = slim.dropout(net, 0.5, scope='dropout1')
    net = slim.fully_connected(net, 10, activation_fn=None, scope='fc2')
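The mechanism behind `arg_scope` can be sketched in a few lines of plain Python. This is not how TF Slim is implemented: the names `arg_scope`, `scoped`, `layer` and `_scope_defaults` below are hypothetical stand-ins. The sketch only shows the behaviour that the example relies on, namely that defaults set by the scope apply to the listed functions and that an explicit argument still wins.

```python
import contextlib

_scope_defaults = {}   # maps a function to the keyword defaults currently in scope


@contextlib.contextmanager
def arg_scope(funcs, **defaults):
    """Temporarily set default keyword arguments for the listed functions."""
    saved = {f: _scope_defaults.get(f, {}) for f in funcs}
    for f in funcs:
        _scope_defaults[f] = {**saved[f], **defaults}
    try:
        yield
    finally:
        _scope_defaults.update(saved)   # restore whatever was active before


def scoped(func):
    """Make a function pick up defaults from any enclosing arg_scope."""
    def wrapper(*args, **kwargs):
        merged = {**_scope_defaults.get(wrapper, {}), **kwargs}
        return func(*args, **merged)
    return wrapper


@scoped
def layer(units, activation=None, scope=None):
    # Hypothetical layer: just reports the arguments it was called with.
    return {"units": units, "activation": activation, "scope": scope}


with arg_scope([layer], activation="relu"):
    hidden = layer(1024, scope="fc1")                  # inherits activation="relu"
    output = layer(10, activation=None, scope="fc2")   # explicit argument wins

outside = layer(10)   # the scope has ended, so the original default applies
```

This mirrors the last layer of the ConvNet example, where `activation_fn=None` overrides the ReLU set by the enclosing scope.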

Too many GPUs?

How to stop worrying and start using multi-GPU?

Model parallelism

with tf.device('/gpu:1'):

Data parallelism scheme

Data parallelism code example

Data parallelism

with tf.device('/cpu:0'):
    x1 = tf.placeholder(tf.float32, shape=[None, 784])
    x2 = tf.placeholder(tf.float32, shape=[None, 784])
C1 = tf.nn.relu(tf.matmul(x1, W) + b)
C2 = tf.nn.relu(tf.matmul(x2, W) + b)
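Why data parallelism works can be checked numerically: when a batch is split into equal shards that share the same `W` and `b`, the average of the per-shard gradients equals the gradient on the whole batch. The NumPy sketch below uses a mean squared error on a linear model instead of the slide's network, purely to keep the gradient short. The `tower_grads` helper, the random `target` and the fixed seed are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)               # assumption: fixed seed
x = rng.standard_normal((128, 784))
target = rng.standard_normal((128, 10))      # assumption: random regression targets
W = rng.standard_normal((784, 10))
b = np.zeros(10)


def tower_grads(x_part, t_part, W, b):
    """Gradients of the mean squared error on one shard of the batch."""
    err = x_part @ W + b - t_part
    n = len(x_part)
    return 2.0 * x_part.T @ err / n, 2.0 * err.sum(axis=0) / n


# Each "tower" sees half of the batch but the same W and b.
x1, x2 = np.split(x, 2)
t1, t2 = np.split(target, 2)
dW1, db1 = tower_grads(x1, t1, W, b)
dW2, db2 = tower_grads(x2, t2, W, b)

# Averaging the per-tower gradients gives the full-batch gradient.
dW_avg, db_avg = (dW1 + dW2) / 2, (db1 + db2) / 2
dW_full, db_full = tower_grads(x, target, W, b)
print(np.allclose(dW_avg, dW_full), np.allclose(db_avg, db_full))
```

In a multi-GPU setup each shard's gradient would be computed on its own device and only the averaging and the variable update would be shared.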

Scopes in TF

Flexible system of hierarchical structures

arg scope: redefine default arguments for enclosed functions.

name scope: group intermediate Tensors together.

variable scope: facilitate variable reuse to build complicated graphs with tied weights (implies a name scope).

Examples of variable scope:

/myNetwork/convLayer2/weights

/myNetwork/convLayer2/BatchNorm/gamma

/vgg16/fc7/biases

TensorBoard demo

LIVE

Fasten your seat belts and such

Other goodies

Queues and preprocessing: data loading, data augmentation, batch shuffling can be easily offloaded to TF threads.
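The pattern behind input queues can be sketched with the Python standard library alone. This is not the TensorFlow queue API: the `producer` function, the constants and the `None` sentinel are assumptions of this sketch. It shows a background thread that shuffles and batches data while the main loop only dequeues ready batches.

```python
import queue
import random
import threading

NUM_EXAMPLES = 1000
BATCH_SIZE = 100


def producer(q, seed=0):
    """Background thread: shuffle example indices and enqueue them in batches."""
    rng = random.Random(seed)
    indices = list(range(NUM_EXAMPLES))
    rng.shuffle(indices)
    for start in range(0, NUM_EXAMPLES, BATCH_SIZE):
        q.put(indices[start:start + BATCH_SIZE])
    q.put(None)   # sentinel: no more batches


batches = queue.Queue(maxsize=4)   # bounded, so the producer cannot run far ahead
worker = threading.Thread(target=producer, args=(batches,), daemon=True)
worker.start()

seen = []
while True:
    batch = batches.get()      # blocks only while the queue is empty
    if batch is None:
        break
    seen.extend(batch)         # a real training step would go here
worker.join()
print(len(seen))
```

Because the queue is bounded, preprocessing stays a few batches ahead of training without loading the whole dataset into memory.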

Checkpoints: computational graph and variables can be easily saved or transferred across the network.
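The idea of saving variables by name and restoring them elsewhere can be illustrated with NumPy's own archive format. TensorFlow's checkpointing goes through `tf.train.Saver`, which this sketch does not use. The in-memory `buffer` stands in for a file or a network transfer and is an assumption of the example.

```python
import io

import numpy as np

# Named variables, as a checkpoint would store them.
variables = {"W": np.random.randn(784, 10), "b": np.zeros(10)}

# Serialize to bytes; the buffer could equally be a file on disk.
buffer = io.BytesIO()
np.savez(buffer, **variables)

# Restore by name from the serialized bytes.
buffer.seek(0)
loaded = np.load(buffer)
restored = {name: loaded[name] for name in loaded.files}
print(sorted(restored))  # ['W', 'b']
```

Restoring by name is what makes scoped variable names useful: a model can be rebuilt in another process and matched to the saved values.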

TensorFlow Serving

Comparison

                Caffe                 Theano   Torch      DL4J   TensorFlow
RNN             kind of               Yes      Yes        Yes    Yes
multi-GPU       C++ only              Yes      Yes        Yes    Yes
multi-node      No                    No       kind of    Yes    Yes
API             C++, Python, Matlab   Python   Lua        Java   C++, Python
autodiff        No                    Yes      Recently   No     Yes
extensibility   No                    Yes      Yes        ?      Yes

Thank you!

Thank you for your time!

Any questions?

Contact info: [email protected]
