grokking techtalk #21: deep learning in computer vision
TRANSCRIPT
[email protected],2017
Dang Huynh
Education
• Ph.D. in Computer Science (France)
Work
• Jan 2017 – now: Axon Enterprise
• 2015 – 2016: Misfit
• 2011 – 2015: Nokia Bell Labs
Research domains
• Machine vision
• Data science
• Telecommunication systems

!=
We are AXON!

Outline
• Refresh
• Computer vision
• Deep learning in Computer vision
• Theory vs. Reality
• Demo

Refresh
Machine learning and Deep learning

Machine learning
Input data → prediction model → output label
[Figure: plot of y = F(x) with axes x and y; for a new input x0, what is y0?]

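Machine Learning
y = 4x_1^3 - 2x_2^2 + 8
[Figure: network diagram with inputs x1, x2 and a bias input +1, units f(x) = x^3 and f(x) = x^2, and output y; edge weights shown: 1, 0, 0, 1, 4, -2, 8]
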
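The equation on this slide can be read as a weighted sum of hand-chosen features. A minimal sketch, assuming the feature order (x1^3, x2^2, bias) and the weights 4, -2, 8 taken from the equation; the function name `predict` is illustrative only:

```python
def predict(x1: float, x2: float) -> float:
    """Evaluate y = 4*x1^3 - 2*x2^2 + 8 as a weighted sum of features."""
    features = [x1 ** 3, x2 ** 2, 1.0]  # last entry is the bias input (+1)
    weights = [4.0, -2.0, 8.0]          # weights read off the slide's equation
    return sum(w * f for w, f in zip(weights, features))


print(predict(1.0, 1.0))  # 4 - 2 + 8 = 10.0
print(predict(2.0, 3.0))  # 32 - 18 + 8 = 22.0
```

In practice the weights are not given; they are learned from (x, y) examples.
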
Machine Learning
Challenges
• Relevant data acquisition
• Data preprocessing
• Feature selection
• Model selection: simplicity versus complexity
• Result interpretation

Deep Learning
• Machine Learning with many (deep) hidden layers
[Figure: neural network with an input layer (x1, x2, bias +1), hidden layers (each with a bias +1), and an output layer (y1, y2)]

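A forward pass through such a network can be sketched in a few lines. This is only an illustration: the weights and biases below are made-up values, and the hidden activation max(0, x) is an assumption, since the slide does not name one.

```python
def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(x, layers):
    """Run x through every layer, applying max(0, v) after each hidden layer."""
    for index, (weights, biases) in enumerate(layers):
        x = dense(x, weights, biases)
        if index < len(layers) - 1:          # no activation on the output layer
            x = [max(0.0, v) for v in x]
    return x


# Hypothetical parameters: 2 inputs -> 2 hidden -> 2 hidden -> 2 outputs.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 1.0]),
    ([[1.0, 1.0], [-1.0, 2.0]], [0.0, 0.0]),
    ([[1.0, 0.0], [0.0, 1.0]], [0.5, -0.5]),
]

y1, y2 = forward([2.0, 1.0], layers)
print(y1, y2)  # 4.0 3.5
```
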
Why deep learning?
[Figure: performance (vertical axis) versus amount of data (horizontal axis), with one curve for deep learning and one for machine learning]

Computer Vision intro

Computer Vision
Make computers understand images and video:
- Detection
- Recognition
- Tracking
- Extraction
[Figure: object detection]

Computer Vision
Still there are challenges: object can be…
… partly occluded
… or even fully occluded.

Challenge
We were building a human detector, and we accidentally got a future human detector!

Traditional approach vs. Deep learning
Traditional approach: feature engineering
• has two eyes?
• has a nose below eyes?
• …..
• Ok, it’s a face!
Deep learning approach: NO feature engineering

Deep Learning in Computer Vision
ImageNet: 1.2 million images with 1000 object categories
[Figure: chart with regions labeled "Tradition" and "Deep learning"]
Source: http://pattern-recognition.weebly.com/

Computer Vision
What the computer sees
Red:
43 45 21
13 34 12
23 88 55
Green:
19 89 27
17 57 29
75 56 94
Blue:
19 89 27
17 57 29
75 56 94
y = F(Red, Green, Blue)
3-D input array
[Figure: facial detection]

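The three channel matrices above can be stacked into the 3-D input array the slide mentions. A minimal sketch in plain Python, assuming a [row][column][channel] layout; the helper name `stack_channels` is illustrative only:

```python
# Channel values copied from the slide (Green and Blue are identical there).
red = [[43, 45, 21], [13, 34, 12], [23, 88, 55]]
green = [[19, 89, 27], [17, 57, 29], [75, 56, 94]]
blue = [[19, 89, 27], [17, 57, 29], [75, 56, 94]]


def stack_channels(r, g, b):
    """Combine three height x width matrices into image[row][col] = (R, G, B)."""
    return [[(r[i][j], g[i][j], b[i][j]) for j in range(len(r[0]))]
            for i in range(len(r))]


image = stack_channels(red, green, blue)
height, width, channels = len(image), len(image[0]), len(image[0][0])

print(height, width, channels)  # 3 3 3
print(image[0][0])              # (43, 19, 19)
```
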
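Intuition
[Figure: the Red, Green, and Blue channels of a face image feed the input layer of a neural network (inputs x1, x2, bias +1), followed by hidden layers and outputs y1, y2; task: facial detection]
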
Convolutional Neural Network (CNN)
Idea: having a filter scanning over the image.
[Figure: convolutional process, showing the input matrix (e.g., image), the filter (grey), and the output matrix]
Source: https://github.com/vdumoulin/conv_arithmetic

CNN – Striding and Padding
Control how the filter convolves around the input matrix.
[Figure: stride = 2, zero-padding = 1, showing the input matrix (e.g., image), the filter (grey), and the output matrix]
Source: https://github.com/vdumoulin/conv_arithmetic

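Stride and zero-padding together determine the output size along each dimension: floor((n + 2p - k) / s) + 1 for input size n, filter size k, padding p, and stride s. A small sketch; the 5x5 input and 3x3 filter used in the example are assumptions, since the transcript does not record the sizes in the figure:

```python
def conv_output_size(n: int, k: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one dimension of a convolution."""
    return (n + 2 * padding - k) // stride + 1


# Assumed sizes: 5x5 input, 3x3 filter, with the slide's stride 2 and padding 1.
print(conv_output_size(5, 3, stride=2, padding=1))  # 3
# No padding and stride 1 simply shrinks the input by k - 1.
print(conv_output_size(5, 3))                       # 3
```
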
Convolutional operation
7x7 Input:
0 1 1 1 0 0 0
0 0 1 1 1 0 0
0 0 0 1 1 1 0
0 0 0 1 1 0 0
0 0 1 1 0 0 0
0 1 1 0 0 0 0
1 1 0 0 0 0 0
3x3 Filter:
1 0 1
0 1 0
1 0 1
5x5 Output (Input * Filter):
1 4 3 4 1
1 2 4 3 3
1 2 3 4 1
1 3 3 1 1
3 3 1 1 0
Dimensions:
• Input: [height1, width1, # of channels]
• Filter: [height2, width2, # of channels]
• Output: [height3, width3, # of filters]

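The slide's numbers can be reproduced with a direct sliding-window sum. A minimal single-channel sketch, assuming stride 1 and no padding; as is usual in CNNs, the filter is not flipped (this filter is symmetric, so flipping would not change the result). The function name `convolve2d` is illustrative only:

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


image = [
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0],
]
kernel = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]

# Output matrix as printed on the slide.
expected = [
    [1, 4, 3, 4, 1],
    [1, 2, 4, 3, 3],
    [1, 2, 3, 4, 1],
    [1, 3, 3, 1, 1],
    [3, 3, 1, 1, 0],
]

result = convolve2d(image, kernel)
assert result == expected
for row in result:
    print(*row)
```
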
Rectified Linear Unit (ReLU)
ReLU: F(y) = max(0, y)
Non-linear activation function.
Input:
-3 2 0
1 -1 0
-5 2 4
Output after ReLU:
0 2 0
1 0 0
0 2 4

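Applied element-wise, ReLU is a one-liner. A minimal sketch using the matrix from the slide; the function name `relu` is illustrative:

```python
def relu(matrix):
    """Apply F(y) = max(0, y) to every element of a 2-D list."""
    return [[max(0, value) for value in row] for row in matrix]


before = [[-3, 2, 0], [1, -1, 0], [-5, 2, 4]]
after = relu(before)

for row in after:
    print(*row)
# 0 2 0
# 1 0 0
# 0 2 4
```
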
Max Pooling
Reduce dimension and avoid overfitting.
Max pool with 2x2 filter and stride 2
Input:
1 0 2 3
4 6 6 8
3 1 1 0
1 2 2 4
Output:
6 8
3 4

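The pooling step keeps only the largest value in each window. A minimal sketch using the slide's 4x4 input; the function name `max_pool` is illustrative, and windows that would run past the edge are simply dropped:

```python
def max_pool(matrix, size=2, stride=2):
    """Take the maximum over each size x size window, moving by `stride`."""
    out_h = (len(matrix) - size) // stride + 1
    out_w = (len(matrix[0]) - size) // stride + 1
    return [[max(matrix[i * stride + di][j * stride + dj]
                 for di in range(size) for dj in range(size))
             for j in range(out_w)]
            for i in range(out_h)]


grid = [
    [1, 0, 2, 3],
    [4, 6, 6, 8],
    [3, 1, 1, 0],
    [1, 2, 2, 4],
]

pooled = max_pool(grid, size=2, stride=2)
for row in pooled:
    print(*row)
# 6 8
# 3 4
```
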
Example
[Figure: face detection network]
Input 24x24x3 → (Conv: 3x3, MP: 2x2) → 11x11x28 → (Conv: 3x3, MP: 3x3) → 4x4x48 → (Conv: 2x2) → 3x3x64 → Fully connected (128)
Outputs:
• face/non-face (2)
• bounding box regression (4)
Suppose that all Max Pooling (MP) layers have stride 2.
Input: 24x24x3, Conv: 3x3x3, MP: 2x2 (stride 2) → output dimension (24 - 3 + 1)/2 = 11

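The same arithmetic can be carried through every layer to check the feature-map sizes in the example (24 → 11 → 4 → 3). A sketch under stated assumptions: convolutions use stride 1 and no padding, and pooling uses stride 2 with partial windows dropped. The helper names are illustrative:

```python
def conv_out(n: int, k: int) -> int:
    """Spatial size after a k x k convolution (stride 1, no padding)."""
    return n - k + 1


def pool_out(n: int, k: int, stride: int = 2) -> int:
    """Spatial size after k x k max pooling with the given stride."""
    return (n - k) // stride + 1


sizes = [24]
sizes.append(pool_out(conv_out(sizes[-1], 3), 2))  # Conv 3x3 + MP 2x2
sizes.append(pool_out(conv_out(sizes[-1], 3), 3))  # Conv 3x3 + MP 3x3
sizes.append(conv_out(sizes[-1], 2))               # Conv 2x2

print(sizes)  # [24, 11, 4, 3]
```
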
Object scales
• Detect objects of various sizes.
[Figure: a detection window scans over the input image at several scales]
Tradeoffs?
Source: https://www.pyimagesearch.com

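One plausible reading of the "Tradeoffs?" question is cost: every extra scale adds more windows to evaluate. The sketch below counts windows across a simple image pyramid. The window size, step, scale factor, and image size are all assumed values for illustration, not figures from the talk:

```python
def pyramid_sizes(width, height, scale=2.0, min_size=24):
    """List the image sizes obtained by repeatedly shrinking by `scale`."""
    sizes = []
    w, h = float(width), float(height)
    while min(w, h) >= min_size:
        sizes.append((int(w), int(h)))
        w /= scale
        h /= scale
    return sizes


def count_windows(width, height, window=24, step=8):
    """Number of window positions when sliding with the given step."""
    if width < window or height < window:
        return 0
    return ((width - window) // step + 1) * ((height - window) // step + 1)


levels = pyramid_sizes(96, 96, scale=2.0, min_size=24)
total = sum(count_windows(w, h) for w, h in levels)

print(levels)  # [(96, 96), (48, 48), (24, 24)]
print(total)   # 100 + 16 + 1 = 117
```

A scale factor closer to 1 gives more pyramid levels, so it catches more object sizes at the price of more windows to classify.
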
Data augmentation
• Generate more artificial data points from base data.
• Apply with care to other data types!
[Figure: the same image shown as Original, Little noise, Moderate noise, Heavy noise]

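Noise-based augmentation can be sketched as adding random perturbations to pixel values and clipping to the valid range. The Gaussian noise model, the sigma values for "little", "moderate", and "heavy", and the small pixel patch below are assumptions made for illustration:

```python
import random


def add_gaussian_noise(image, sigma, seed=0):
    """Return a noisy copy of a 2-D list of 0-255 pixel values."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    return [[min(255, max(0, round(pixel + rng.gauss(0.0, sigma))))
             for pixel in row]
            for row in image]


patch = [[43, 45, 21], [13, 34, 12], [23, 88, 55]]

# Hypothetical noise levels; each one yields one extra training sample.
augmented = {
    "little": add_gaussian_noise(patch, sigma=5, seed=1),
    "moderate": add_gaussian_noise(patch, sigma=20, seed=2),
    "heavy": add_gaussian_noise(patch, sigma=50, seed=3),
}

for name, noisy in augmented.items():
    print(name, noisy)
```

The label of each augmented sample stays the same as the original, which is why this should be applied with care to data types where noise can change the meaning.
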
Complex data augmentation
[Figure: face rotation]

Why data augmentation?
[Figure: AXON detection, WITHOUT augmentation vs. WITH augmentation]

How to benchmark?
[Figure: Facebook detection]

Theory vs. Reality

Deep learning in Computer Vision
Pros:
• DL reduces the need for feature engineering.
• DL outperforms classical Computer Vision approaches.
Cons:
• DL requires a huge amount of data (>100K samples).
• DL is extremely computationally expensive to train (weeks on GPUs).
• DL model structure is a black box.

Performance vs. Portability
[Figure: Theory vs. Reality]

Performance vs. Power consumption
[Figure: Theory vs. Reality; portable battery]

Special hardware for Deep Learning
[Figure: Jetson TX2 (NVIDIA), Google TPU, Movidius Myriad]
• Optimized for specific use cases.
• Not plug-and-play; it takes good engineers to make it work.
Still far from consumer…

Privacy
• The police are our customers, so data privacy is important.
• Can we “extract features” from the private data?

Demo
Workflow and toolset

Skin blurring

Facial detection with tracking

License plate detection

Take Home message

Industry perspective
Always consider the following 4 Ps:
• Performance
• Power consumption
• Portability
• Price
Deep learning is not magic: trade-offs always exist!

Thank you

We are Hiring
Full Stack, Research Engineers, Security.
https://jobs.lever.co/axon