Deep learning image classification applied to the world of fashion
TRANSCRIPT
Deep Learning Image Classification
Robert Figiel, Co-Founder & CTO
Javier Abadía, Lead Developer
WHAT DO WE DO AT STYLESAGE?
Web-Crawling of 100M+ e-commerce products daily.
Analysis of text, machine learning, image recognition
Visualize insights for fashion brands & retailers
Collect Data → Analyze Products → Visualize Insights
CHALLENGE: CLASSIFY PRODUCTS FROM IMAGES
• Category: Dress
SOLUTION: CONVOLUTIONAL NEURAL NETWORKS (CNN)
Input (Image Data) → Convolutional Neural Network (BLACK BOX, for now) → Output (Probability Vector)
• Dress: 94.8%
• Skirt: 4.1%
• Jacket: 1.2%
• Pant: 0.1%
• Socks: 0.01%
• ...
TRADITIONAL COMPUTING
input → algorithm → output
MACHINE LEARNING
model training: input + output → algorithm
prediction: new input → algorithm → new output
MACHINE LEARNING - CLASSIFICATION
Supervised Learning: Features → Classes
MACHINE LEARNING - CLASSIFICATION
• Supervised Learning
  – Decision Trees
  – Bayesian Algorithms
  – Regression
  – Neural Networks
  – …
http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/
LETTER RECOGNITION
28x28 pixels, gray levels → 784 input features
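Each 28x28 image is flattened into a single 784-element vector before being fed to the classifier, which is what `x.flatten()` does in the scikit-learn code later in the talk. A minimal sketch (the random image stands in for real pixel data):

```python
import numpy as np

# A hypothetical 28x28 grayscale letter image (random values stand in for pixels).
image = np.random.rand(28, 28)

# Flatten it into a single 784-element feature vector.
features = image.flatten()
print(features.shape)  # (784,)
```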
LOGISTIC CLASSIFIER: WX + b = Y

weights (784 × 35) × input features (784) + bias (35) = scores (35)

P = softmax(Y) → probabilities (35)
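softmax turns the raw scores Y into probabilities that sum to 1, with the largest score getting the largest probability. A minimal numpy sketch (the scores are made-up values for five of the classes):

```python
import numpy as np

def softmax(y):
    # Subtract the max score first for numerical stability.
    e = np.exp(y - np.max(y))
    return e / e.sum()

# Hypothetical scores for 5 classes (dress, skirt, jacket, pant, socks).
scores = np.array([3.0, 0.5, -1.0, -2.0, -4.0])
p = softmax(scores)
print(p.sum())  # probabilities sum to 1
```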
GRADIENT DESCENT
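Gradient descent repeatedly nudges the parameters against the gradient of the loss until it reaches a minimum. A minimal sketch on a one-dimensional quadratic (the function and learning rate are illustrative choices, not from the talk):

```python
# Minimize f(w) = (w - 3)^2, whose gradient is f'(w) = 2 * (w - 3).
w = 0.0
learning_rate = 0.1  # hypothetical step size
for _ in range(100):
    gradient = 2 * (w - 3)
    w -= learning_rate * gradient  # step downhill, against the gradient
print(w)  # approaches the minimum at w = 3
```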
CODE USING python/scikit-learn

""" based on http://scikit-learn.org/stable/auto_examples/linear_model/plot_iris_logistic.html """
import numpy as np
from sklearn import linear_model, metrics

N = 50000
X = np.array([x.flatten() for x in data['train_dataset'][:N]])
Y = data['train_labels'][:N]

solver = 'sag'
C = 0.001

# train
logreg = linear_model.LogisticRegression(C=C, solver=solver)
logreg.fit(X, Y)

# test
VX = np.array([x.flatten() for x in data['test_dataset']])
predicted_labels = logreg.predict(VX)
print("%.3f" % metrics.accuracy_score(predicted_labels, data['test_labels']))
CODE WITH tensorflow

import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    # Input data placeholder
    tf_train_dataset = tf.placeholder(tf.float32,
                                      shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))

    # Variables.
    weights = tf.Variable(
        tf.truncated_normal([image_size * image_size, num_labels]))
    biases = tf.Variable(tf.zeros([num_labels]))

    # Training computation.
    logits = tf.matmul(tf_train_dataset, weights) + biases
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels,
                                                logits=logits))

    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
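The graph above only defines the computation. As a sanity check of what it computes, the same forward pass, softmax cross-entropy loss, and one GradientDescentOptimizer-style weight update can be sketched in plain numpy; the toy batch, random seed, and the smaller 0.05 learning rate here are illustrative assumptions, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, num_features, num_labels = 4, 784, 35  # sizes from the slides

X = rng.random((batch_size, num_features))   # a toy batch of flattened images
labels = np.eye(num_labels)[[0, 1, 2, 3]]    # one-hot labels, one class per sample
W = rng.normal(size=(num_features, num_labels)) * 0.01
b = np.zeros(num_labels)

def forward(W):
    # logits = X W + b, matching tf.matmul(tf_train_dataset, weights) + biases
    logits = X @ W + b
    # softmax + cross-entropy, matching tf.nn.softmax_cross_entropy_with_logits
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)
    loss = -np.mean(np.sum(labels * np.log(probs), axis=1))
    return probs, loss

probs, loss_before = forward(W)

# One gradient-descent step on the weights (what the optimizer node does);
# 0.05 is a hypothetical learning rate chosen for this toy data.
grad_W = X.T @ (probs - labels) / batch_size
W = W - 0.05 * grad_W

_, loss_after = forward(W)
print(loss_before, loss_after)  # the step reduces the loss
```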
LINEAR METHODS ARE LIMITED TO LINEAR RELATIONSHIPS
One layer:  X ✕ W1 + b1 = Y → s(Y) = P

Two layers: X ✕ W1 + b1 → RELU → ✕ W2 + b2 = Y → s(Y) = P

activation function (RELU)
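The RELU activation between the two layers is what makes the stacked model non-linear; without it, the two matrix multiplications would collapse back into a single linear map. A minimal numpy sketch (the input values are made up):

```python
import numpy as np

def relu(y):
    # RELU: pass positive values through, clamp negatives to zero.
    return np.maximum(0, y)

hidden = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(hidden))  # negatives become 0, positives are unchanged
```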
CODE WITH tensorflow

import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    # Input data placeholder
    tf_train_dataset = tf.placeholder(tf.float32,
                                      shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))

    # Variables
    weights1 = tf.Variable(
        tf.truncated_normal([image_size * image_size, n_hidden_nodes]))
    biases1 = tf.Variable(tf.zeros([n_hidden_nodes]))

    weights2 = tf.Variable(tf.truncated_normal([n_hidden_nodes, num_labels]))
    biases2 = tf.Variable(tf.zeros([num_labels]))

    # Training model
    logits1 = tf.matmul(tf_train_dataset, weights1) + biases1
    relu_output = tf.nn.relu(logits1)
    logits2 = tf.matmul(relu_output, weights2) + biases2
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels,
                                                logits=logits2))

    # Optimizer
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
NN VS CNN

Neural Network (ANY numeric input) vs. Convolutional Neural Network (IMAGE input)
DEEP LEARNING: MANY LAYERS FOR HIGHER ACCURACY
Example: GoogLeNet architecture (2014), 22 layers
Example: AlexNet (2012), 8 layers
I'M BORED
CHOOSING A MODEL – OPEN SOURCE OPTIONS
• AlexNet (2012): 8 layers, 16.4% error rate on ImageNet
• GoogLeNet (2014): 22 layers, 6.66% error rate on ImageNet
• Google Inception v3 (2015): 48 layers, 3.46% error rate
• Microsoft ResNet (2015): 152 layers, 3.57% error rate
Yearly competition on ImageNet dataset with 1M images across 1000 object classes – models available open source
Many models are open source. No need to reinvent the wheel.
FRAMEWORK – OPEN SOURCE OPTIONS
• Caffe
  • Developed by UC Berkeley, very efficient algorithms
  • Implementations of GoogLeNet, ResNet
  • Large community
• TensorFlow
  • Released in 2015 by Google
  • Ready-to-use implementations of GoogLeNet, Inception v3
  • TensorBoard for visualizing training progress
• Torch, Theano, Keras, ...
Many Python frameworks available, all with many examples, good documentation and pre-implemented models
Choose a Python framework that fits your needs
IMPLEMENTING A CNN: MODEL – TRAIN – PREDICT

Select / Develop MODEL → TRAIN/TEST model with known images → PREDICT on new images

Feedback loop: additional training data
INFRASTRUCTURE – GPUS

Underlying CNN computations are mainly matrix multiplications; GPUs (Graphics Processing Units) are 30-50X faster than CPUs.

1 CPU: 2 sec vs. 1 GPU: 50 ms → 30-50X faster
Use GPU based servers for faster training and predictions
THANK YOU – WE ARE RECRUITING!
www.stylesage.co/careers [email protected]
THANK YOU!