deep learning for robotics
A fast, scalable deep learning platform
Deep Learning for Robotics
Yinyin Liu, PhD
MAKING MACHINES SMARTER.
now part of
Proprietary and confidential. Do not distribute.
Nervana Systems Proprietary
A little background on who we are and what we do.
Nervana's deep learning stack
Back-propagation, End-to-end, ResNet, ImageNet, NLP, Regularization, Convolution, Unrolling, RNN, Generalization, Hyperparameters, Video recognition, Dropout, Pooling, LSTM, AlexNet, Auto-encoder
neon: https://github.com/NervanaSystems/neon
Nervana's deep learning tutorials: https://www.nervanasys.com/deep-learning-tutorials/
We are hiring! https://www.nervanasys.com/careers/
I'd like to see a show of hands: who has heard of some or most of these terms? Who uses DL in their day-to-day work?
Outline
What is Deep Learning and What Can It Do Today?
How Does DL Help Robotics?
Deep Reinforcement Learning
Finding the Right Frameworks For You
In this talk, we will cover the following topics, starting with what deep learning is and what it can do today.
After each topic, I will be happy to pause and take any questions you have.
What can deep learning do today?
In the last four years, deep learning has made its way into the heart of almost every AI application. And across all these domains, deep learning architectures have been very successful, blowing away the competition. On the surface, it seems almost unreasonably effective across these domains.
First image: Tumor positive, negative
Object localization and segmentation
https://www.nervanasys.com/industry-focus-serving-the-automotive-industry-with-the-nervana-platform/
End-to-end object localization systems. The model on the left learns to map each pixel to a category. The model on the right generates bounding boxes that give the precise location of each detected object.
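The two kinds of output can be sketched in a few lines: a segmentation model produces one score per class per pixel and labels each pixel with its highest-scoring class, while predicted bounding boxes are commonly compared with ground truth using intersection-over-union (IoU). This is a minimal NumPy sketch assuming a (num_classes, H, W) score layout and (x1, y1, x2, y2) boxes; IoU is a common evaluation choice rather than something the slide specifies, and the function names are hypothetical.

```python
import numpy as np


def segment(scores):
    """Label each pixel with its highest-scoring class (ties go to the lower index).

    scores: array of shape (num_classes, H, W); returns an (H, W) label map.
    """
    return np.argmax(scores, axis=0)


def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


scores = np.zeros((3, 2, 2))
scores[1, 0, 0] = 1.0  # pixel (0, 0) scores highest for class 1
scores[2, 1, 1] = 1.0  # pixel (1, 1) scores highest for class 2
print(segment(scores))                  # label map [[1, 0], [0, 2]]
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7, so 1/7
```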
Video recognition with 3D convolution
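3D convolution extends the kernel along the time axis, so a single filter can respond to change across consecutive frames rather than to appearance in one frame. A minimal, deliberately unoptimized NumPy sketch for one channel and one filter, assuming a (T, H, W) clip layout with no padding or stride; the clip and the temporal-difference filter are hypothetical examples, and real video networks use many channels and optimized kernels.

```python
import numpy as np


def conv3d_valid(video, kernel):
    """Naive 'valid' 3D convolution (cross-correlation, as in DL frameworks).

    video:  single-channel clip of shape (T, H, W)
    kernel: filter of shape (kt, kh, kw), spanning kt consecutive frames
    """
    T, H, W = video.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                patch = video[t:t + kt, y:y + kh, x:x + kw]
                out[t, y, x] = np.sum(patch * kernel)
    return out


# Hypothetical 4-frame, 2x2 clip in which every pixel switches on at frame 2.
video = np.zeros((4, 2, 2))
video[2:] = 1.0
# Temporal-difference filter: (next frame) - (current frame) at each pixel.
motion = np.array([-1.0, 1.0]).reshape(2, 1, 1)
response = conv3d_valid(video, motion)
print(response[:, 0, 0])  # fires only at the transition between frames 1 and 2
```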
Deep Speech 2
End-to-end speech recognition using deep learning
Why is this cool?
- ASR systems have been around for a long time.
- End-to-end means that, starting from raw spectrogram data, a deep neural network can produce text:
  - without any hand-engineering of features
  - the network learns features from the data
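The spectrogram input is produced by cutting the audio into short overlapping windows and turning each window into frequency magnitudes, with no further hand-designed features. A minimal NumPy sketch; the 320-sample window and 160-sample hop (20 ms and 10 ms at an assumed 16 kHz sample rate) are illustrative choices, not the Deep Speech 2 configuration.

```python
import numpy as np


def log_spectrogram(signal, frame_len=320, hop=160, eps=1e-10):
    """Log-magnitude spectrogram from a short-time Fourier transform.

    Returns an array of shape (num_frames, frame_len // 2 + 1):
    one row per time window, one column per frequency bin.
    """
    window = np.hanning(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(num_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + eps)


sample_rate = 16000                       # assumed sample rate
t = np.arange(sample_rate) / sample_rate  # one second of audio
tone = np.sin(2 * np.pi * 1000 * t)       # a 1 kHz test tone
spec = log_spectrogram(tone)
print(spec.shape)  # (99, 161)
# Bin spacing is 16000 / 320 = 50 Hz, so the tone peaks in bin 1000 / 50 = 20.
print(np.argmax(spec[0]))  # 20
```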
Deep Reinforcement Learning
http://www.nervanasys.com/deep-reinforcement-learning-with-neon/
https://youtu.be/KkIf0Ok5GCE
By combining a deep network with RL, a network can learn to play games from scratch. DeepMind developed similar technology further, and it became a core part of the algorithm behind AlphaGo.
What is deep learning?
Historical perspective:
Input → designed features → output
Input → designed features → SVM → output
Input → learned features → SVM → output
Input → levels of learned features → output
Designed features: things like HOG and SIFT features, nearest-neighbor clustering, and L2 distance.
With multiple levels of learned features, the nonlinear classifier can be replaced by a linear classifier, because the multi-layer model is itself nonlinear.
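XOR is the classic illustration: no linear classifier on the raw inputs can separate it, but a linear read-out becomes sufficient once a nonlinear hidden layer has transformed the inputs. A minimal NumPy sketch with hand-picked weights (the standard textbook XOR construction, not learned and not taken from the talk); in a real network these weights would be found by training.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


# Hand-picked (not learned) weights for a two-unit hidden layer.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
w2 = np.array([1.0, -2.0])  # the final classifier is purely linear


def predict(x):
    h = relu(x @ W1 + b1)  # nonlinear hidden features
    return h @ w2          # linear read-out on top of them


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(predict(X))  # [0. 1. 1. 0.], i.e. XOR of the two inputs
```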
What is deep learning?
Figure: end-to-end learning. Raw image input → deep network (~60 million parameters) → output (positive/negative)
One key feature of DL is that the model tends to be an end-to-end system. For example, to classify an image, we pass the N×N raw pixels directly to a deep neural network as input and provide the desired output labels; if all goes well, the model automatically finds the right features from the data.
* A conceptual shift in thinking, from "how do you engineer the best features?" to "how do you guide the model towards finding the best features?"
* But many old practices from machine learning still apply!
The end-to-end nature of deep learning gives it a wide range of applicability. One could substitute videos for images, for example, or speech, or text. The output could be object identity, hair style, or a set of optimal moves in a video game.
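In code, "end-to-end" means the network's input is simply the flattened N×N pixel array and the only supervision is the label, which enters through the loss. A minimal NumPy sketch of the forward pass and the loss; the image size, layer widths, random initialization, and example labels are hypothetical, and training (backpropagation plus an optimizer) is omitted, so the hidden layer has not learned any features yet.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: N x N grayscale images, one hidden layer, 10 classes.
N, HIDDEN, NUM_CLASSES = 28, 64, 10
W1 = rng.normal(0.0, 0.01, size=(N * N, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.01, size=(HIDDEN, NUM_CLASSES))
b2 = np.zeros(NUM_CLASSES)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def forward(images):
    """(batch, N, N) raw pixels -> (batch, NUM_CLASSES) class probabilities."""
    x = images.reshape(len(images), -1)  # raw pixels in, no hand-made features
    h = np.maximum(x @ W1 + b1, 0.0)     # hidden features (learned during training)
    return softmax(h @ W2 + b2)


def cross_entropy(probs, labels):
    """The only supervision: the desired output label for each image."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))


images = rng.random((4, N, N))
labels = np.array([3, 1, 4, 1])
probs = forward(images)
loss = cross_entropy(probs, labels)
print(probs.shape, loss)  # (4, 10) and a loss near log(10) before any training
```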
What is deep learning?
- A method for extracting features at multiple levels of abstraction
- Features are discovered from data
- Performance improves with more data
- Network can express complex transformations
- High degree of representational power
What is deep learning?
(Zeiler and Fergus, 2013)
With DL, features are learned from data rather than hand-engineered, at multiple levels of representation:
- Layer 1: lines/bands
- Layer 3: intricate patterns (honeycomb)
- Layer 5: faces
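First-layer "lines/bands" detectors behave like small oriented filters slid across the image. A minimal NumPy sketch with a hand-made vertical-edge kernel (illustrative only, not a filter from Zeiler and Fergus): it responds where intensity changes from left to right, while its transposed, horizontal-edge version stays silent on the same image. In a trained network these kernel weights are learned rather than written by hand.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Naive 'valid' 2D convolution (cross-correlation, as in CNN layers)."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out


# Hand-made vertical-edge detector (illustrative, not a learned filter).
vertical_edge = np.array([[-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0]])
horizontal_edge = vertical_edge.T

image = np.zeros((5, 5))
image[:, 3:] = 1.0  # dark left, bright right: one vertical edge

print(conv2d_valid(image, vertical_edge))    # strong response where the window spans the edge
print(conv2d_valid(image, horizontal_edge))  # all zeros: no horizontal edge present
```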
What is deep learning?
Figure: ImageNet top-5 error rate, deep learning techniques vs. human performance (source: ImageNet)
No free lunch:
- lots of data
- flexible and fast frameworks
- powerful computing resources
Nervana in action
- Healthcare: Tumor detection
- Automotive: Speech interfaces
- Finance: Time-series search engine
- Agricultural Robotics
- Oil & Gas
- Proteomics: Sequence analysis
Here are just some examples where the Nervana platform has been applied to real-world problems: detecting tumors in healthcare, counting plants in agricultural robotics, finding oil-rich regions in seismic data, building better speech interfaces in cars, building a time-series search engine for finance, and engineering better organisms through amino acid sequence analysis.
Outline
What is Deep Learning and What Can It Do Today?
How Does DL Help Robotics?
Deep Reinforcement Learning
Finding the Right Frameworks For You
Robotic vision
- Image classification
- Object localization
- Image segmentation
DL progress in computer vision helps robotic vision as well. Models that can do image segmentation and object localization help a robot or an agent understand a scene and navigate around an environment.
Robot vision
Consumer robots for companionship and home service: Pepper, Jibo, a robot base, FURo-i, Cubic, Budgee, Branto, Echo, Roomba
Robotic vision
https://www.autonomous.ai/personal-robot
DL-based computer vision solutions help a robot navigate around a home, understand the scene, and localize everyday objects.
Robotic NLU
DL-based NLP/NLU solutions help a robot understand verbal commands and interact with users.
Natural language processing is inevitably an instrumental piece as well: a module in the robot should be able to understand a human command and execute it.
Robot movement
But most consumer robots either do not move or only move around on a base.
Home robots are still far from providing home services, e.g., cooking, cleaning, and taking care of people.
Robot movement is difficult.
It is challenging for a robot to know how to interact with objects, not to mention reaching human-level dexterity.
Deep learning for grasping
Tackling the problem of robotic grasping:
- 14 separate robots collect data in parallel; 800k grasp attempts were collected over 7 months
- Each grasp consists of T time steps. At the end of step T, grasp success is evaluated, and T samples of (image, current pose, success label) are collected
- No human labeling needed!
Levine et al. (2016)
https://research.googleblog.com/2016/03/deep-learning-for-robots-learning-from.html
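The labeling scheme needs no human annotator because the label comes from the robot's own success check at the end of each attempt. A minimal Python sketch of how one T-step attempt becomes T (image, current pose, success label) samples; the `GraspSample` type, the `label_episode` name, and the placeholder data are hypothetical, and how success is actually detected is not shown.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass
class GraspSample:
    image: Any             # camera image at this time step
    pose: Sequence[float]  # current gripper pose at this time step
    success: bool          # outcome of the whole grasp, known only after step T


def label_episode(images, poses, grasp_succeeded: bool) -> List[GraspSample]:
    """Turn one T-step grasp attempt into T training samples.

    Every step receives the same label: whether the grasp, evaluated after
    the final step, succeeded. The robot labels its own data.
    """
    if len(images) != len(poses):
        raise ValueError("need exactly one pose per image")
    return [GraspSample(img, pose, grasp_succeeded)
            for img, pose in zip(images, poses)]


# Usage with placeholder data for a T = 3 attempt that succeeded.
samples = label_episode(["img0", "img1", "img2"],
                        [(0.0, 0.0, 0.3), (0.0, 0.0, 0.2), (0.0, 0.0, 0.1)],
                        grasp_succeeded=True)
print(len(samples), all(s.success for s in samples))  # 3 True
```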
Deep learning for grasping
Prediction network: a CNN learns to predict the outcome of a grasp, given
- an image before the grasp begins
- an image at the current time
- a motor command (3D translation vector)
https://arxiv.org/pdf/1603.02199v4.pdf
Deep learning for grasping
Servoing mechanism:
- Use the prediction network
- From a pool of sampled motor commands, choose the one with the best score
(Figure: each sampled command is scored by the prediction network)
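The servoing step reads as "propose, score, pick": sample candidate motor commands, let the prediction network score each one for the current images, and execute the best. A minimal NumPy sketch under stated assumptions: `toy_predictor` is a hypothetical stand-in for the trained CNN, the sample count and step size are illustrative, and candidates are drawn once uniformly at random, whereas Levine et al. refine their samples iteratively (the cross-entropy method).

```python
import numpy as np


def choose_motor_command(predict_success, image_start, image_now,
                         num_samples=64, max_step=0.05, rng=None):
    """Sample candidate 3D translation commands, score each one with the
    prediction network, and return the best command and its score."""
    rng = np.random.default_rng() if rng is None else rng
    candidates = rng.uniform(-max_step, max_step, size=(num_samples, 3))
    scores = np.array([predict_success(image_start, image_now, c)
                       for c in candidates])
    best = int(np.argmax(scores))
    return candidates[best], scores[best]


# Hypothetical stand-in for the trained CNN: it prefers commands that move
# the gripper toward a fixed offset. A real predictor would look at the images.
GOAL = np.array([0.02, -0.01, 0.0])


def toy_predictor(image_start, image_now, command):
    return -np.linalg.norm(command - GOAL)


command, score = choose_motor_command(toy_predictor, None, None,
                                      rng=np.random.default_rng(0))
print(command, score)
```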
Deep learning for grasping
End-to-end learning:
- what are objects vs. the gripper
- what is the right orientation to grasp
- what is the right motor command
Learn from repeated trials
A useful training paradigm is RL
Outline
What is Deep Learning and What Can It Do Today?
How Does DL Help Robotics?
Deep Reinforcement Learning
Finding the Right Frameworks For You
Deep reinforcement learning
- RL defines the goal, reward, and training paradigm
- DL gives the mechanics
RL + DL = AI*
http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf
* By David Silver
Deep RL
End-to-end learning: raw perception → output
https://storage.googleapis.com/deepmind-data/assets/papers/DeepMindNature14236Paper.pdf
Because one network approximates the Q values, with the output layer representing a value for each action, the algorithm handles only a discrete, finite set of actions.
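The discrete-action restriction shows up in how actions are chosen and how training targets are built: both take a max over the output layer's per-action values, which is a cheap lookup only when there are a handful of actions. A minimal NumPy sketch of ε-greedy action selection and the Q-learning target; the Q-network itself, the replay memory, and the separate target network are omitted, and γ = 0.99 is an assumed default.

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action from the Q-network's output layer (one value per action)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore: random action
    return int(np.argmax(q_values))               # exploit: best-valued action


def q_learning_targets(rewards, next_q_values, terminal, gamma=0.99):
    """Regression targets y = r + gamma * max_a' Q(s', a').

    next_q_values has shape (batch, num_actions); the max over that axis is
    only a cheap lookup because the action set is discrete and small.
    terminal is 1.0 where the episode ended (no bootstrapped future value).
    """
    return rewards + gamma * next_q_values.max(axis=1) * (1.0 - terminal)


rng = np.random.default_rng(0)
print(epsilon_greedy(np.array([0.1, 0.9, 0.3]), epsilon=0.0, rng=rng))  # 1
print(q_learning_targets(np.array([1.0, 0.0]),
                         np.array([[0.5, 2.0], [3.0, 1.0]]),
                         terminal=np.array([0.0, 1.0]), gamma=0.5))    # [2. 0.]
```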
Deep RL in neon
https://www.nervanasys.com/demystifying-deep-reinforcement-learning/
https://www.nervanasys.com/deep-reinforcement-learning-with-neon/
Deep RL in neon
https://github.com/tambetm/simple_dqn
Deep RL in neon
https://github.com/tambetm/simple_dqn
Code makes it look real.
Deep RL
Because one network approximates the Q values, with the output layer representing a value for each action, the algorithm handles only a discrete, small, finite set of actions.
- Apply an actor-critic architecture to continuous action spaces
- Adding BatchNorm helps generalize to different problems
- High-dimensional tasks simulated in MuJoCo
- Racing game simulated using TORCS
Lillicrap et al. (DeepMind, ICLR 2016)
https://arxiv.org/pdf/1509.02971v5.pdf
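The actor-critic approach replaces the per-action output layer with two networks: an actor that outputs a continuous action directly, and a critic that takes the action as an input, so no max over actions is needed. A minimal NumPy sketch of the DDPG critic target and the soft target-network update from Lillicrap et al.; the linear actor and critic, the parameter names, and the example numbers are hypothetical stand-ins for the deep networks (with BatchNorm) used in the paper, and the defaults are illustrative.

```python
import numpy as np


# Hypothetical linear actor and critic standing in for the deep networks.
def actor(params, states):
    """Deterministic policy mu(s): a continuous action in [-1, 1]^d."""
    return np.tanh(states @ params["W"])


def critic(params, states, actions):
    """Q(s, a): the action is an input, not one output unit per action."""
    return np.concatenate([states, actions], axis=-1) @ params["w"]


def critic_targets(rewards, next_states, terminal, target_actor,
                   target_critic, gamma=0.99):
    """y = r + gamma * Q'(s', mu'(s')), using the slowly updated target networks."""
    next_actions = actor(target_actor, next_states)
    next_q = critic(target_critic, next_states, next_actions)
    return rewards + gamma * (1.0 - terminal) * next_q


def soft_update(target, online, tau=0.001):
    """theta' <- tau * theta + (1 - tau) * theta', applied to every parameter."""
    return {k: tau * online[k] + (1.0 - tau) * target[k] for k in target}


target_actor = {"W": np.zeros((2, 1))}            # 2-D state, 1-D action
target_critic = {"w": np.array([1.0, 1.0, 5.0])}
y = critic_targets(np.array([0.5]), np.array([[1.0, 2.0]]), np.array([0.0]),
                   target_actor, target_critic, gamma=0.5)
print(y)  # [2.]
```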
Outline
What is Deep Learning and What Can It Do Today?
How Does DL Help Robotics?
Deep Reinforcement Learning
Finding the Right Frameworks For You
Challenges
To make progress on robotics, we need:
- a lot of data to improve at executing tasks
- interaction with the environment
  - real-world experiments are costly
  - simulators are needed for a variety of tasks
- benchmarks
  - ImageNet drove a lot of progress on vision problems in supervised learning
  - RL lacks standardized environments, tasks, and metrics for publication and comparison
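A simulator and a benchmark both come down to a fixed interface plus a fixed metric: the agent calls reset() and step(action), and results are compared by, for example, average total reward per episode. A minimal Python sketch loosely modeled on the reset/step convention of RL toolkits such as OpenAI Gym, but simplified (step returns three values instead of Gym's four) and not using that library; `LineWorld`, its reward scheme, and `evaluate` are hypothetical.

```python
class LineWorld:
    """Hypothetical toy simulator: the agent starts at 0 and must reach `goal`.

    Actions: 0 = step left, 1 = step right. Reward is -1 per step and 0 on
    the step that reaches the goal, so a higher total means a faster solution.
    """

    def __init__(self, goal=3, max_steps=20):
        self.goal = goal
        self.max_steps = max_steps

    def reset(self):
        self.pos, self.steps = 0, 0
        return self.pos

    def step(self, action):
        self.pos += 1 if action == 1 else -1
        self.steps += 1
        reached = self.pos == self.goal
        done = reached or self.steps >= self.max_steps
        return self.pos, (0.0 if reached else -1.0), done


def evaluate(env, policy, episodes=5):
    """Benchmark metric: average total reward per episode."""
    totals = []
    for _ in range(episodes):
        obs, total, done = env.reset(), 0.0, False
        while not done:
            obs, reward, done = env.step(policy(obs))
            total += reward
        totals.append(total)
    return sum(totals) / len(totals)


env = LineWorld()
print(evaluate(env, lambda obs: 1))  # -2.0: reaches the goal in 3 steps
print(evaluate(env, lambda obs: 0))  # -20.0: walks away until the step limit
```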
RL benchmarks
https://arxiv.org/pdf/1604.06778v3.pdf
RL benchmarks
https://www.nervanasys.com/openai/
neon™ overview
Layers: Linear, Convolution, Pooling, Deconvolution, Dropout, Recurrent, Long Short-Term Memory, Gated Recurrent Unit, BatchNorm, LookupTable, Local Response Normalization, Bidirectional-RNN, Bidirectional-LSTM
Backend: NervanaGPU, NervanaCPU, NervanaMGPU
Datasets: MNIST, CIFAR-10, ImageNet 1K, PASCAL VOC, Mini-Places2, IMDB, Penn Treebank, Shakespeare Text, bAbI, Hutter Prize, UCF101, flickr8k, flickr30k, COCO
Initializers: Constant, Uniform, Gaussian, Glorot Uniform, Xavier, Kaiming, IdentityInit, Orthonormal
Optimizers: Gradient Descent with Momentum, RMSProp, AdaDelta, Adam, Adagrad, MultiOptimizer
Activations: Rectified Linear, Softmax, Tanh, Logistic, Identity, ExpLin
Costs: Binary Cross Entropy, Multiclass Cross Entropy, Sum of Squares Error
Metrics: Misclassification (Top1, TopK), LogLoss, Accuracy, PrecisionRecall, ObjectDetection
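Several of the optimizers in this list are short update rules. A minimal NumPy sketch of gradient descent with momentum and of Adam, written from the standard published update equations rather than from neon's source; the learning rates, step count, and quadratic test problem are illustrative choices.

```python
import numpy as np


def sgd_momentum_step(w, grad, velocity, lr=0.1, momentum=0.9):
    """Gradient descent with (heavy-ball) momentum."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity


def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: per-parameter step sizes from running gradient moments."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)  # bias correction; t counts from 1
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v


# Minimize f(w) = ||w - target||^2, whose gradient is 2 * (w - target).
target = np.array([3.0, -2.0])
w_mom, vel = np.zeros(2), np.zeros(2)
w_adam, m, v = np.zeros(2), np.zeros(2), np.zeros(2)
for t in range(1, 501):
    w_mom, vel = sgd_momentum_step(w_mom, 2 * (w_mom - target), vel)
    w_adam, m, v = adam_step(w_adam, 2 * (w_adam - target), m, v, t)
print(w_mom, w_adam)  # both end close to [3, -2]
```

A useful property to check: on its first step, Adam moves each parameter by roughly `lr` in the direction opposite the gradient's sign, regardless of the gradient's magnitude.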
Framework attributes
Frameworks compared: neon, Theano, Caffe, Torch, TensorFlow
Attributes: academic research, bleeding-edge, curated models, iteration time, inference speed, package ecosystem, support
So where do the various frameworks fall on these attributes?
Benchmarks
Third-party (Facebook) benchmarking
Training is the pain point, and we are 2-3x faster than any other framework out there. That is today; when our new chip comes out, we are going to be 10-20x faster than the nearest competitor. Models that originally would take you 1 month to train will take