Deep learning image classification applied to the world of fashion
TRANSCRIPT
Deep Learning Image Classification
Robert Figiel, Co-Founder & CTO
Javier Abadía, Lead Developer
WHAT DO WE DO AT STYLESAGE?
Web-Crawling of 100M+ e-commerce products daily.
Analysis of text, machine learning, image recognition
Visualize insights for fashion brands & retailers
Collect Data → Analyze Products → Visualize Insights
CHALLENGE: CLASSIFY PRODUCTS FROM IMAGES
• Category: Dress
SOLUTION: CONVOLUTIONAL NEURAL NETWORKS (CNN)
Input (Image Data) → Convolutional Neural Network (BLACK BOX, for now) → Output (Probability Vector)
• Dress: 94.8%
• Skirt: 4.1%
• Jacket: 1.2%
• Pant: 0.1%
• Socks: 0.01%
• ...
TRADITIONAL COMPUTING
input → algorithm → output
MACHINE LEARNING
model training: input + output → algorithm
prediction: new input → algorithm → new output
MACHINE LEARNING - CLASSIFICATION
Supervised Learning: Features → Classes
MACHINE LEARNING - CLASSIFICATION
• Supervised Learning
  – Decision Trees
  – Bayesian Algorithms
  – Regression
  – Neural Networks
  – …
http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/
LETTER RECOGNITION
28x28 pixels, gray levels → 784 input features
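Each 28x28 image is flattened into a single 784-element vector before being fed to the classifier, which is what `x.flatten()` does in the scikit-learn code later in the talk. A minimal sketch (the random image stands in for real pixel data):

```python
import numpy as np

# A hypothetical 28x28 grayscale letter image (random values stand in for pixels).
image = np.random.rand(28, 28)

# Flatten it into a single 784-element feature vector.
features = image.flatten()
print(features.shape)  # (784,)
```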
LOGISTIC CLASSIFIER: WX + b = Y

weights (784 × 35) × input features (784) + bias (35) = scores (35)

P = softmax(Y) → probabilities (35)
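softmax turns the raw scores Y into probabilities that sum to 1, with the largest score getting the largest probability. A minimal numpy sketch (the scores are made-up values for five of the classes):

```python
import numpy as np

def softmax(y):
    # Subtract the max score first for numerical stability.
    e = np.exp(y - np.max(y))
    return e / e.sum()

# Hypothetical scores for 5 classes (dress, skirt, jacket, pant, socks).
scores = np.array([3.0, 0.5, -1.0, -2.0, -4.0])
p = softmax(scores)
print(p.sum())  # probabilities sum to 1
```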
GRADIENT DESCENT
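Gradient descent repeatedly nudges the parameters against the gradient of the loss until it reaches a minimum. A minimal sketch on a one-dimensional quadratic (the function and learning rate are illustrative choices, not from the talk):

```python
# Minimize f(w) = (w - 3)^2, whose gradient is f'(w) = 2 * (w - 3).
w = 0.0
learning_rate = 0.1  # hypothetical step size
for _ in range(100):
    gradient = 2 * (w - 3)
    w -= learning_rate * gradient  # step downhill, against the gradient
print(w)  # approaches the minimum at w = 3
```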
CODE USING python/scikit-learn

""" based on http://scikit-learn.org/stable/auto_examples/linear_model/plot_iris_logistic.html """
import numpy as np
from sklearn import linear_model, metrics

N = 50000
X = np.array([x.flatten() for x in data['train_dataset'][:N]])
Y = data['train_labels'][:N]

solver = 'sag'
C = 0.001

# train
logreg = linear_model.LogisticRegression(C=C, solver=solver)
logreg.fit(X, Y)

# test
VX = np.array([x.flatten() for x in data['test_dataset']])
predicted_labels = logreg.predict(VX)
print("%.3f" % metrics.accuracy_score(predicted_labels, data['test_labels']))
CODE WITH tensorflow

import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    # Input data placeholder
    tf_train_dataset = tf.placeholder(tf.float32,
                                      shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))

    # Variables.
    weights = tf.Variable(
        tf.truncated_normal([image_size * image_size, num_labels]))
    biases = tf.Variable(tf.zeros([num_labels]))

    # Training computation.
    logits = tf.matmul(tf_train_dataset, weights) + biases
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels,
                                                logits=logits))

    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
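The graph above only defines the computation. As a sanity check of what it computes, the same forward pass, softmax cross-entropy loss, and one GradientDescentOptimizer-style weight update can be sketched in plain numpy; the toy batch, random seed, and the smaller 0.05 learning rate here are illustrative assumptions, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, num_features, num_labels = 4, 784, 35  # sizes from the slides

X = rng.random((batch_size, num_features))   # a toy batch of flattened images
labels = np.eye(num_labels)[[0, 1, 2, 3]]    # one-hot labels, one class per sample
W = rng.normal(size=(num_features, num_labels)) * 0.01
b = np.zeros(num_labels)

def forward(W):
    # logits = X W + b, matching tf.matmul(tf_train_dataset, weights) + biases
    logits = X @ W + b
    # softmax + cross-entropy, matching tf.nn.softmax_cross_entropy_with_logits
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)
    loss = -np.mean(np.sum(labels * np.log(probs), axis=1))
    return probs, loss

probs, loss_before = forward(W)

# One gradient-descent step on the weights (what the optimizer node does);
# 0.05 is a hypothetical learning rate chosen for this toy data.
grad_W = X.T @ (probs - labels) / batch_size
W = W - 0.05 * grad_W

_, loss_after = forward(W)
print(loss_before, loss_after)  # the step reduces the loss
```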
LINEAR METHODS ARE LIMITED TO LINEAR RELATIONSHIPS
One layer:  X ✕ W1 + b1 = Y → s(Y) = P

Two layers: X ✕ W1 + b1 → RELU → ✕ W2 + b2 = Y → s(Y) = P

activation function (RELU)
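The RELU activation between the two layers is what makes the stacked model non-linear; without it, the two matrix multiplications would collapse back into a single linear map. A minimal numpy sketch (the input values are made up):

```python
import numpy as np

def relu(y):
    # RELU: pass positive values through, clamp negatives to zero.
    return np.maximum(0, y)

hidden = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(hidden))  # negatives become 0, positives are unchanged
```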
CODE WITH tensorflow

import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    # Input data placeholder
    tf_train_dataset = tf.placeholder(tf.float32,
                                      shape=(batch_size, image_size * image_size))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))

    # Variables
    weights1 = tf.Variable(
        tf.truncated_normal([image_size * image_size, n_hidden_nodes]))
    biases1 = tf.Variable(tf.zeros([n_hidden_nodes]))

    weights2 = tf.Variable(tf.truncated_normal([n_hidden_nodes, num_labels]))
    biases2 = tf.Variable(tf.zeros([num_labels]))

    # Training model
    logits1 = tf.matmul(tf_train_dataset, weights1) + biases1
    relu_output = tf.nn.relu(logits1)
    logits2 = tf.matmul(relu_output, weights2) + biases2
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels,
                                                logits=logits2))

    # Optimizer
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
NN VS CNN

Neural Network (ANY numeric input) vs. Convolutional Neural Network (IMAGE input)
DEEP LEARNING: MANY LAYERS FOR HIGHER ACCURACY
Example: GoogLeNet architecture (2014), 22 layers
Example: AlexNet (2012), 8 layers
I'M BORED
CHOOSING A MODEL – OPEN SOURCE OPTIONS
• AlexNet (2012): 8 layers, 16.4% error rate on ImageNet
• GoogLeNet (2014): 22 layers, 6.66% error rate on ImageNet
• Google Inception v3 (2015): 48 layers, 3.46% error rate
• Microsoft ResNet (2015): 152 layers, 3.57% error rate
Yearly competition on ImageNet dataset with 1M images across 1000 object classes – models available open source
Many models are open source. No need to reinvent the wheel.
FRAMEWORK – OPEN SOURCE OPTIONS
• Caffe
  • Developed by UC Berkeley, very efficient algorithms
  • Implementations of GoogLeNet, ResNet
  • Large community
• TensorFlow
  • Released in 2015 by Google
  • Ready-to-use implementations of GoogLeNet, Inception v3
  • TensorBoard for visualizing training progress
• Torch, Theano, Keras, ...
Many Python frameworks available, all with many examples, good documentation and pre-implemented models
Choose a Python framework that fits your needs
IMPLEMENTING A CNN: MODEL – TRAIN – PREDICT

Select / Develop MODEL → TRAIN/TEST model with known images → PREDICT on new images

Feedback loop: additional training data
INFRASTRUCTURE – GPUS

Underlying CNN computations are mainly matrix multiplications; GPUs (Graphics Processing Units) are 30-50X faster than CPUs.

1 CPU: 2 sec vs. 1 GPU: 50 ms → 30-50X faster
Use GPU based servers for faster training and predictions
THANK YOU – WE ARE RECRUITING!
www.stylesage.co/careers [email protected]
THANK YOU!