Computer Vision
From traditional approaches to deep neural networks
Stanislav Frolov München, 27.02.2018
● Computer vision
● Human vision
● Traditional approaches and methods
● Artificial neural networks
● Summary
Outline of this talk
What we are going to talk about
● Trained deep neural networks for object detection during my master's thesis
● Still fascinated by and interested in the field
Stanislav Frolov
Big Data Engineer @inovex
● Teach computers how to see
● Automatic extraction, analysis and understanding of images
● Infer useful information, interpret and make decisions
● Automate tasks that the human visual system can do
● One of the most exciting fields in AI and ML
What is computer vision
General
What is computer vision
Motivation
● Era of pixels
● The internet consists mostly of images
● Explosion of visual data
● Far too much data to be labeled by humans
What is computer vision
Drivers
● Two drivers of the computer vision explosion
○ Compute (faster and cheaper)
○ Data (more data > better algorithms)
What is computer vision
Interdisciplinary field
● Fields: computer science, mathematics, engineering, physics, biology, psychology
● Related areas: information retrieval, machine learning, graphs, algorithms, systems architecture, robotics, speech and NLP, image processing, optics, solid-state physics, neuroscience, cognitive sciences, biological vision
Synonyms?
● Imaging for statistical pattern recognition
● Image transformations such as pixel-by-pixel operations
○ Contrast enhancement
○ Edge extraction
○ Noise reduction
○ Geometrical and spatial operations (e.g. rotations)
What is computer vision
Related fields - image processing
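As a minimal sketch of the pixel-by-pixel operations listed above, assume a greyscale image stored as a plain list of rows with values in [0, 255]. Contrast enhancement is shown as linear contrast stretching and noise reduction as a 3x3 mean filter; both function names are hypothetical, and the mean filter is just one simple choice of smoothing filter.

```python
def stretch_contrast(img):
    """Pixel-by-pixel operation: linearly map the range [min, max] onto [0, 255]."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in img]


def mean_filter_3x3(img):
    """Noise reduction: replace each interior pixel by the mean of its 3x3 neighbourhood.
    Border pixels are copied unchanged to keep the sketch short."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            total = sum(img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = round(total / 9)
    return out


img = [[100, 100, 100],
       [100, 190, 100],
       [100, 100, 100]]
print(stretch_contrast(img))  # [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
print(mean_filter_3x3(img))   # the bright "noise" pixel is smoothed to 110
```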
● Creates new images from scene descriptions
● Produces image data from 3D models
● “Inverse” of computer vision
● Augmented reality (AR) combines both
What is computer vision
Related fields - computer graphics
● Mainly manufacturing applications
● Image-based automatic inspection, process control, robot guidance
● Usually employs strong assumptions (colour, shape, light, structure, orientation, ...), which is why it works very well
● Output is often pass/fail or good/bad
● Additionally numerical/measurement data and counts
What is computer vision
Related fields - machine vision
● Create “intelligent” systems
● Study the computational aspects of intelligence
● Make computers do things at which, at the moment, people are better
● Many techniques play an important role (ML, ANNs)
● Currently does a few things better/faster at scale than humans can
● Whether it can do anything “human” remains an open question
What is computer vision
Related fields - AI
● Related fields have a large intersection
● The basic techniques used, developed and studied are very similar
What is computer vision
Related fields - summary
Short trip to human vision
● Two-stage process
○ Eyes take in light reflected off objects, and the retina projects the 3D scene onto 2D images
○ The brain’s visual system interprets the 2D images and “rebuilds” a 3D model
What is human vision
General
● A pair of 2D images with slightly different viewpoints allows depth to be inferred
● The position of nearby objects varies more across the two images than the position of more distant objects
What is human vision
Stereoscopic vision
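The observation that nearby objects shift more between the two views can be made quantitative. For an idealised, rectified pair of cameras (an assumption), depth is Z = f · B / d, where f is the focal length in pixels, B the baseline between the cameras and d the disparity (the horizontal shift of a point between the two images). The function below is a hypothetical illustration of that formula, not something taken from the talk.

```python
def depth_from_disparity(focal_px, baseline_m, x_left, x_right):
    """Depth Z = f * B / d for a rectified stereo pair.

    focal_px: focal length in pixels; baseline_m: distance between the cameras in metres;
    x_left / x_right: horizontal pixel position of the same point in each image.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("the point must have a positive disparity")
    return focal_px * baseline_m / disparity


near = depth_from_disparity(700, 0.1, 420, 350)  # disparity 70 px -> 1.0 m
far = depth_from_disparity(700, 0.1, 420, 413)   # disparity 7 px -> 10.0 m
print(near, far)
```

The large disparity belongs to the near point, which is exactly why depth can be read off from how much a point moves between the two images.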
● Prior knowledge of relative sizes and depths is often key for understanding and interpretation
What is human vision
Prior knowledge
● Texture and texture changes help solve depth perception
What is human vision
Texture pattern
What is human vision
Biases and illusions in human perception
● Shadows make all the difference in interpretation
● Gradual changes in light are ignored so as not to be misled by shadows
What is human vision
A few more illusions
● Two arrows with different orientations have the same length, although they appear to differ
● Assumptions and familiarity (distorted room)
● Face recognition bias
● Up-down orientation bias
What is human vision
Biases and illusions in human perception
What is human vision
Summary
● Illusions are fun, but the puzzle of understanding human vision is far from complete
Back to computer vision
● Recognition
● Localization
● Detection
● Segmentation
What is computer vision
Typical tasks
● Part-based detection
○ Deformable parts model
○ Pose estimation and poselets
What is computer vision
Typical tasks
● Image captioning (actions, attributes)
What is computer vision
Typical tasks
● Motion analysis
○ Egomotion (camera)
○ Optical flow (pixels)
What is computer vision
Typical tasks
● Scene understanding and reconstruction
What is computer vision
Typical tasks
● Image restoration
● Colouring black & white photos
What is computer vision
Typical tasks
Solving these tasks is useful for many applications
What is computer vision
Typical applications
● Assistance systems for cars and people
● Surveillance
● Navigation (obstacle avoidance, road following, path planning)
● Photo interpretation
● Military (“smart” weapons)
● Manufacturing (inspection, identification)
● Robotics
● Autonomous vehicles (dangerous zones)
What is computer vision
Typical applications
● Recognition and tracking
● Event detection
● Interaction (man-machine interfaces)
● Modeling (medical, manufacturing, training, education)
● Organizing (database index, sorting/clustering)
● Fingerprint and biometrics
● …
Why so difficult?
What is computer vision
Why it is difficult
● Occlusion
● Deformation
● Scale
● Clutter
● Illumination
● Viewpoint
● Object pose
● Tons of classes and variants
● Often an n:1 mapping (many different 3D scenes can produce the same 2D image)
● Computationally expensive
● Full understanding of biological vision is missing
System overview
● Input: image(s) + labels
● Output: semantic data, labels
● Digital image pixels usually have three channels [R, G, B], each in [0, 255], plus a location [x, y]
● Digital images are just vectors
What is computer vision
System overview
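To make “digital images are just vectors” concrete, here is a small sketch assuming an image stored as nested Python lists of [R, G, B] triples. The location [x, y] is not stored explicitly; it is implied by the position in the flattened vector. The helper `index_of` is hypothetical.

```python
# A tiny 2x2 RGB image: rows of pixels, each pixel an [R, G, B] triple in [0, 255].
image = [
    [[255, 0, 0], [0, 255, 0]],      # red, green
    [[0, 0, 255], [255, 255, 255]],  # blue, white
]

height, width, channels = len(image), len(image[0]), len(image[0][0])

# Flatten row by row, pixel by pixel, channel by channel into one vector.
vector = [value for row in image for pixel in row for value in pixel]


def index_of(x, y, c):
    """Position of channel c of pixel (x, y) in the flattened vector."""
    return (y * width + x) * channels + c


print(height, width, channels)  # 2 2 3
print(len(vector))              # 12 = 2 * 2 * 3
print(vector[:6])               # [255, 0, 0, 0, 255, 0] (the first row)
```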
1. Image acquisition (camera, sensors)
2. Pre-processing (sampling, noise reduction, augmentation)
3. Feature extraction (lines, edges, regions, points)
4. Detection and segmentation
5. Post-processing (verification, estimation, recognition)
6. Decision making
● -> The ability of a machine to step back and interpret the big picture of those pixels
What is computer vision
System overview
Some history
1950s
● 2D imaging for statistical pattern recognition
● Theory of optical flow based on a fixed point towards which one moves
What is computer vision
History
Image processing
● Histograms
● Filtering
● Stitching
● Thresholding
● ...
What is computer vision
Traditional approaches
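A minimal sketch of two of the classical operations listed above, histograms and global thresholding, on a toy greyscale image. The threshold of 128 is picked by hand between the two histogram peaks (dark values around 10, bright values around 200); automatic threshold selection methods exist but are not shown here.

```python
def histogram(img, bins=256):
    """Count how often each grey level 0..255 occurs."""
    counts = [0] * bins
    for row in img:
        for p in row:
            counts[p] += 1
    return counts


def threshold(img, t):
    """Global thresholding: pixels >= t become foreground (1), the rest background (0)."""
    return [[1 if p >= t else 0 for p in row] for row in img]


img = [[10, 12, 200],
       [11, 210, 205],
       [9, 13, 198]]
h = histogram(img)
print(h[10], h[200])        # 1 1
print(threshold(img, 128))  # [[0, 0, 1], [0, 1, 1], [0, 0, 1]]
```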
1960s
● Desire to extract 3D structure from 2D images for scene understanding
● Began at pioneering AI universities, to mimic the human visual system as a stepping stone towards intelligent robots
● Summer vision project at MIT: attach a camera to a computer and have it “describe what it saw”
What is computer vision
History
● Given to 10 undergraduate students
● … an attempt to use our summer workers effectively …
● … construction of a significant part of a visual system …
● … task can be segmented into sub-problems …
● … participate in the construction of a system complex enough to be a real landmark in the development of “pattern recognition” …
What is computer vision
History: summer vision project @MIT 1966
● Goal: analyse scenes and identify objects
● Structure of the system:
○ Region proposal
○ Property lists for regions
○ Boundary construction
○ Match with properties
○ Segment
● Basic foreground/background segmentation with simple objects (cubes, cylinders, …)
What is computer vision
History: summer vision project @MIT 1966
● Unlike general intelligence, computer vision seemed tractable
● An amusing anecdote, but the project never aimed to “solve” computer vision
● Computer vision today differs from what it was thought to be in 1966
What is computer vision
History: summer vision project @MIT 1966
1970s
● Many algorithms that still exist today were formed
● Edges, lines and objects as interconnected structures
What is computer vision
History
What is computer vision
Traditional approaches
Edge detection based on
● Brightness
● Gradients
● Geometry
● Illumination
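Edge detection based on gradients can be sketched with the Sobel operator, a common choice that the talk does not name explicitly: two 3x3 kernels estimate the horizontal and vertical brightness gradients, and pixels with a large gradient magnitude are marked as edges. The threshold value and the function names are assumptions for illustration.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gradient_magnitude(img):
    """Sobel gradient magnitude for interior pixels (border pixels are left at 0)."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
            mag[y][x] = math.hypot(gx, gy)
    return mag


def edges(img, t):
    """Mark pixels whose gradient magnitude exceeds t as edges (1)."""
    return [[1 if m > t else 0 for m in row] for row in gradient_magnitude(img)]


# Vertical step edge: dark left half, bright right half.
img = [[0, 0, 100, 100] for _ in range(4)]
print(edges(img, 50))  # the two interior columns next to the step are marked
```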
What is computer vision
Traditional approaches - part-based detector
● Objects are composed of the features of their parts and the parts' spatial relationships
● Challenge: how to define the parts and combine them
1980s
● More rigorous mathematical analysis and quantitative aspects
● Optical character recognition
● Sliding window approaches
● Usage of artificial neural networks
What is computer vision
History
What is computer vision
Traditional approaches - HOG detection (histogram of oriented gradients)
● Concept from the 1980s, but only used from 2005 onwards
● Create HOG descriptors (object generalizations)
● One feature vector per object
● Train an SVM classifier
● Sliding window at multiple scales
What is computer vision
Traditional approaches - HOG detection (histogram of oriented gradients)
● Computation of HOG descriptors:
1. Compute gradients
2. Compute histograms on cells
3. Normalize histograms
4. Concatenate histograms
● Requires a lot of engineering
● Must build ensembles of feature descriptors
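The four HOG steps above can be sketched as follows. This is a deliberately simplified version under stated assumptions: central-difference gradients, 9 unsigned orientation bins, hard (non-interpolated) binning, and L2 normalisation per cell rather than over overlapping blocks of cells as in the full method. It yields one feature vector per image window, which would then be fed to the SVM.

```python
import math


def hog_descriptor(img, cell=4, bins=9):
    """Simplified HOG: gradients -> per-cell orientation histograms -> normalise -> concatenate."""
    h, w = len(img), len(img[0])
    descriptor = []
    for cy in range(0, h - h % cell, cell):
        for cx in range(0, w - w % cell, cell):
            hist = [0.0] * bins
            for y in range(cy, cy + cell):
                for x in range(cx, cx + cell):
                    # 1. Gradients via central differences (clamped at the image border).
                    gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
                    gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
                    mag = math.hypot(gx, gy)
                    angle = math.degrees(math.atan2(gy, gx)) % 180  # unsigned orientation
                    # 2. Vote into the cell's orientation histogram, weighted by magnitude.
                    hist[int(angle // (180 / bins)) % bins] += mag
            # 3. Normalise the histogram (epsilon avoids division by zero).
            norm = math.sqrt(sum(v * v for v in hist)) + 1e-6
            # 4. Concatenate into one feature vector.
            descriptor.extend(v / norm for v in hist)
    return descriptor


img = [[0, 0, 100, 100] for _ in range(8)]  # 8x4 window containing a vertical edge
d = hog_descriptor(img)
print(len(d))  # 2 cells x 9 bins = 18
```

For the vertical edge, all gradient energy falls into the 0° bin of each cell, which is the kind of orientation signature an SVM can then learn to separate.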
1990s
● Significant interaction with computer graphics (rendering, morphing, stitching)
● Approaches using statistical learning
● Eigenfaces (“ghost faces”) through principal component analysis (PCA)
What is computer vision
History
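A minimal sketch of the PCA idea behind eigenfaces, assuming each face image has already been flattened into a vector (toy four-pixel “faces” below). The first principal component is found by power iteration on the covariance matrix; a real implementation would use an SVD routine and keep many components. Projecting a face onto the components gives its compact eigenface weights.

```python
import math


def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def first_principal_component(vectors, iterations=200):
    """Power iteration on the covariance matrix: returns (mean, direction of largest variance)."""
    mu = mean_vector(vectors)
    centred = [[v - m for v, m in zip(vec, mu)] for vec in vectors]
    d = len(mu)
    # Covariance matrix C = (1/n) * X^T X, with X the centred data.
    cov = [[sum(row[i] * row[j] for row in centred) / len(centred) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mu, v


def project(vec, mu, component):
    """Coordinate of a flattened image along the component (its eigenface weight)."""
    return sum((x - m) * c for x, m, c in zip(vec, mu, component))


# Four flattened 2x2 "faces" that differ mostly in overall brightness (toy data).
faces = [[10, 10, 10, 10], [20, 20, 20, 20], [30, 30, 30, 30], [40, 40, 40, 41]]
mu, pc = first_principal_component(faces)
print([round(c, 2) for c in pc])
```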
What is computer vision
Traditional approaches - deformable parts model (DPM)
● Objects are constructed from their parts
● First match the whole object, then refine on the parts
● HOG + part-based + modern features
● Slow, but good at difficult objects
● Involves many heuristics
What is computer vision
Features
● Feature points
○ Small area of pixels with certain properties
● Feature detection
○ Use features for identification
○ Activate if an “object” is present
● Examples:
○ Lines, edges, colours, blobs, …
○ Animals, faces, cars, ...
What is computer vision
Traditional approaches - classical recognition
● Init: extract features for objects in different scales, colours, orientations, rotations, occlusion levels
● Inference: extract features from query image and find closest match in database or train a classifier
● Computationally expensive (hundreds of features in image, millions in database) and complex due to errors and mismatches
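The inference step, “find the closest match in the database”, can be sketched as brute-force nearest-neighbour search over feature descriptors. The three-dimensional descriptors and labels below are made up for illustration; real descriptors are much longer.

```python
import math


def closest_match(query, database):
    """Brute-force nearest neighbour: return (label, distance) of the closest stored descriptor."""
    best_label, best_dist = None, math.inf
    for label, descriptor in database:
        dist = math.dist(query, descriptor)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


database = [("cat", [0.9, 0.1, 0.0]), ("car", [0.0, 0.2, 0.9]), ("face", [0.3, 0.9, 0.1])]
print(closest_match([0.8, 0.2, 0.1], database))  # closest entry is "cat"
```

Every query feature is compared with every stored descriptor, which is why hundreds of query features against millions of database entries quickly becomes expensive.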
What is computer vision
History
Before the new era
● Bags of features
● Handcrafted ensembles
Input -> Feature extraction (Feat. 1, Feat. 2, …, Feat. n) -> Final decision
The new era of computer vision
● Elementary building block
● Inspired by biological neurons
● Mathematical function y=f(wx+b)
● Learnable weights
Artificial neural networks
Fundamentals - artificial neuron
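The neuron's function y = f(wx + b), written out for a vector input where wx is the dot product of weights and inputs. The logistic sigmoid is an assumed default activation; any other f (for example ReLU) can be passed in.

```python
import math


def neuron(x, w, b, f=lambda z: 1 / (1 + math.exp(-z))):
    """y = f(w . x + b); f defaults to the logistic sigmoid."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f(z)


print(neuron([1.0, 2.0], w=[0.5, -0.25], b=0.0))  # z = 0.5 - 0.5 = 0 -> sigmoid(0) = 0.5
```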
● Collection of neurons organized in layers
● Universal approximators
● Fully-connected network here
Artificial neural networks
Fundamentals - artificial neural networks
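A fully connected network is just neurons like the one above organised in layers, each layer's output feeding the next. The 2 -> 3 -> 1 architecture and its hand-picked weights below are hypothetical.

```python
def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def dense(x, weights, biases, activation):
    """One fully connected layer: every output neuron sees every input."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def forward(x, layers):
    """Apply the layers in order; each layer is (weights, biases, activation)."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


layers = [
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, -1.0], relu),  # hidden layer: 3 neurons
    ([[1.0, 1.0, 1.0]], [0.0], identity),                            # output layer: 1 neuron
]
print(forward([2.0, 3.0], layers))  # hidden = [2, 3, 4] -> output [9.0]
```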
Artificial neural networks
Fundamentals - training
● Basically an optimization problem
● Find minimum of a loss function by an iterative process (training)
● Designing the loss function is sometimes tricky
Artificial neural networks
Fundamentals - training
Simple optimizer algorithm:
1. Forward pass with a batch of data
2. Calculate the error between actual and wanted output
3. Nudge the weights in the right direction, in proportion to the error (so that the same data would result in a smaller error)
4. Repeat until convergence
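The four-step optimiser above is essentially gradient descent. A minimal sketch, assuming a linear model y = w·x + b and mean squared error as the loss; step 4 is approximated by a fixed number of epochs instead of an explicit convergence test.

```python
def train(data, lr=0.05, epochs=500):
    """Fit y = w*x + b by full-batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        # 1. Forward pass with the batch.
        preds = [w * x + b for x, _ in data]
        # 2. Error between actual and wanted output.
        errors = [p - y for p, (_, y) in zip(preds, data)]
        # 3. Nudge the weights against the gradient of the MSE (proportional to the error).
        grad_w = 2 / n * sum(e * x for e, (x, _) in zip(errors, data))
        grad_b = 2 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    # 4. "Repeat until convergence" is approximated by the fixed number of epochs.
    return w, b


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # generated by y = 2x + 1
w, b = train(data)
print(round(w, 3), round(b, 3))  # close to 2.0 and 1.0
```

The learning rate trades off speed against stability: too large a step overshoots the minimum, too small a step needs many more epochs.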
Artificial neural networks
Fundamentals - CNN
● Local neighborhood contributes to activation
● Exploit spatial information
● Hierarchical feature extractors
● Fewer parameters
Figure: input, filters, receptive field, activation
Artificial neural networks
Fundamentals - CNN
● Filter of size 3x3 applied to an input of 7x7
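The 3x3-on-7x7 example can be reproduced with a plain “valid” convolution (no padding, stride 1), which yields a 5x5 output. Each output value depends only on its 3x3 receptive field, and the same nine weights are reused at every position, which is where the savings in parameters come from.

```python
def conv2d_valid(img, kernel):
    """Slide a k x k filter over the image without padding (stride 1).
    As in most deep-learning libraries, this is cross-correlation (the kernel is not flipped)."""
    k = len(kernel)
    h, w = len(img), len(img[0])
    return [
        [sum(kernel[i][j] * img[y + i][x + j] for i in range(k) for j in range(k))
         for x in range(w - k + 1)]
        for y in range(h - k + 1)
    ]


img = [[1] * 7 for _ in range(7)]           # 7x7 input of ones
kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # 3x3 filter that copies the centre pixel
out = conv2d_valid(img, kernel)
print(len(out), len(out[0]))                # 5 5  (7 - 3 + 1 = 5)
```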
Artificial neural networks
Fundamentals - pooling
● Max-pooling
● Dimension reduction/adaptation
● The existence of a feature is more important than its exact location
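A minimal sketch of 2x2 max-pooling with stride 2 (a common configuration, assumed here): each window is reduced to its maximum, which halves both dimensions and keeps the fact that a feature fired while discarding its exact position within the window.

```python
def max_pool(img, size=2, stride=2):
    """Keep the maximum of each size x size window."""
    h, w = len(img), len(img[0])
    return [
        [max(img[y + i][x + j] for i in range(size) for j in range(size))
         for x in range(0, w - size + 1, stride)]
        for y in range(0, h - size + 1, stride)
    ]


activations = [[1, 3, 0, 0],
               [2, 0, 0, 1],
               [0, 0, 5, 0],
               [0, 4, 0, 0]]
print(max_pool(activations))  # [[3, 1], [4, 5]]
```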
Artificial neural networks
Fundamentals - pooling
● Zero-padding
● Controlling dimensions
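Zero-padding controls the output dimensions of a convolution. For input size n, kernel size k, padding p and stride s, the standard output size is floor((n + 2p - k) / s) + 1. The sketch below checks this for the 7x7 input and 3x3 filter from the earlier CNN slide; the helper names are hypothetical.

```python
def zero_pad(img, p):
    """Surround the image with p rows/columns of zeros on every side."""
    w = len(img[0])
    blank = [0] * (w + 2 * p)
    return ([blank[:] for _ in range(p)]
            + [[0] * p + row + [0] * p for row in img]
            + [blank[:] for _ in range(p)])


def conv_output_size(n, k, p=0, s=1):
    """Output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1


print(conv_output_size(7, 3))       # 5: without padding the map shrinks
print(conv_output_size(7, 3, p=1))  # 7: padding 1 keeps a 3x3 convolution size-preserving
print(zero_pad([[1, 2], [3, 4]], 1))
```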
Artificial neural networks
Fundamentals - general network architecture
Input image -> convolutional layers -> ... -> final decision
Artificial neural networks
Fundamentals - hierarchical feature extractors
Activations for:
● First layers: lines, edges, blobs, colours, ...
● Deeper layers: parts of abstract objects, abstract objects
Modern history of object recognition
● Classification and detection
○ 27k images
○ 20 classes
■ person, bird, cat, cow, dog, horse, sheep, aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, tv/monitor
Benchmark
Datasets - PASCAL VOC
● Challenges on a subset of ImageNet
○ 14M labeled images
○ 20k object categories
● ILSVRC* usually on 1k categories, including 90 out of 120 dog breeds
Benchmark
Datasets - ImageNet
*ImageNet Large Scale Visual Recognition Challenge
● ILSVRC 2012 winner by a large margin, reducing the top-5 error from about 25% to 16%
● Proved the effectiveness of CNNs and kicked off a new era
● 8 layers, 650k neurons, 60M parameters
Artificial neural networks
Roadmap - AlexNet
● ILSVRC 2013 winner with a best top-5 error of 11.6%
● AlexNet, but with a smaller 7x7 first-layer kernel (instead of 11x11) to keep more information in deeper layers
Artificial neural networks
Roadmap - ZFNet
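Top-5 error, the ILSVRC metric quoted for these networks, counts a prediction as correct if the true class is among the five highest-scoring classes. A minimal sketch with made-up scores over six classes:

```python
def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is NOT among the k highest-scoring classes."""
    misses = 0
    for class_scores, label in zip(scores, labels):
        ranked = sorted(range(len(class_scores)), key=lambda c: class_scores[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


scores = [
    [0.05, 0.10, 0.30, 0.20, 0.15, 0.25],  # true class 1 is ranked 5th: top-5 hit, top-1 miss
    [0.40, 0.25, 0.15, 0.10, 0.06, 0.04],  # true class 5 is ranked 6th: a miss either way
]
labels = [1, 5]
print(top_k_error(scores, labels, k=5), top_k_error(scores, labels, k=1))  # 0.5 1.0
```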
● ILSVRC 2013 localization winner
● Uses AlexNet on multi-scale input images with a sliding window approach
● Accumulates bounding boxes for the final detection (instead of non-max suppression)
Artificial neural networks
Roadmap - OverFeat
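For contrast with OverFeat's accumulation strategy, this is a sketch of the greedy non-max suppression it avoids: keep the highest-scoring box, discard boxes whose intersection over union (IoU) with it exceeds a threshold, and repeat. Boxes are assumed to be (x1, y1, x2, y2) tuples, and the 0.5 threshold is a common but arbitrary choice.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    return inter / (area(a) + area(b) - inter)


def non_max_suppression(boxes, scores, threshold=0.5):
    """Greedy NMS: returns the indices of the boxes that are kept."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]: box 1 overlaps box 0 with IoU 81/119
```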
● ~2k region proposals generated by selective search
● SVMs trained for classification
● Multi-stage pipeline
Artificial neural networks
Roadmap - RCNN (region-based CNN)
● Not the ILSVRC 2014 classification winner, but famous for its simplicity and effectiveness
● Replaces large-kernel convolutions by stacking several small-kernel convolutions
Artificial neural networks
Roadmap - VGGNet
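The benefit of stacking small kernels can be checked with simple arithmetic: three stacked 3x3 convolutions (stride 1) see a 7x7 region, like a single 7x7 convolution, but need 27C² instead of 49C² weights for C input and output channels (biases ignored), while adding extra non-linearities in between. C = 256 below is an illustrative choice.

```python
def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions: each adds (kernel - 1)."""
    return 1 + layers * (kernel - 1)


def conv_weights(kernel, channels, layers=1):
    """Weights (biases ignored) of `layers` convolutions with C input and C output channels."""
    return layers * kernel * kernel * channels * channels


C = 256
print(stacked_receptive_field(3, 3), stacked_receptive_field(7, 1))  # 7 7
print(conv_weights(3, C, layers=3), conv_weights(7, C))               # 1769472 3211264
```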
● ILSVRC 2014 winner
● Stacks up “inception” modules
● 22 layers, 5M parameters
Artificial neural networks
Roadmap - InceptionNet (GoogLeNet)
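● Jointly learns classification and bounding-box refinement of the region proposals (the proposals themselves still come from a heuristic method)
● Employs region of interest (RoI) pooling, which allows the convolutional computations to be reused across proposals
Artificial neural networks
Roadmap - Fast RCNN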
● Directly predicts all objects and classes in one shot
● Very fast
● Processes images at ~40 FPS on a Titan X GPU
● First real-time state-of-the-art detector
● Divides the input image into a grid of cells, each of which predicts bounding boxes and class probabilities
Artificial neural networks
Roadmap - YOLO (you only look once)
● ILSVRC 2015 winner with a 3.6% error rate (human performance is 5-10%)
● Employs residual blocks, which allow very deep networks (hundreds of layers) to be built
● Additional identity mapping (shortcut connection) added to each block's output
Artificial neural networks
Roadmap - ResNet (Microsoft)
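A residual block computes y = F(x) + x: the identity mapping adds the block's input to the learned transformation F, so the block only has to learn a residual, and with F = 0 it simply passes its input through. In this sketch, small fully connected layers stand in for the convolutions and batch normalisation of the real ResNet blocks, an assumption made for brevity.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(v, weights):
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """y = relu(F(x) + x), with F = linear -> relu -> linear and an identity shortcut."""
    f = linear(relu(linear(x, w1)), w2)
    return relu([fi + xi for fi, xi in zip(f, x)])


zeros = [[0.0, 0.0], [0.0, 0.0]]
x = [1.0, 2.0]
print(residual_block(x, zeros, zeros))  # [1.0, 2.0]: with F = 0 the block is the identity
```

Because the shortcut always carries the input forward, stacking many such blocks does not force every layer to re-learn the identity, which is what makes networks with hundreds of layers trainable.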
● Not a recognition network
● A region proposal network
● Popularized prior/anchor boxes (found through clustering) to predict offsets
● Much better strategy than starting the predictions from random coordinates
● Since then, heuristic approaches have gradually been fading out and being replaced
Artificial neural networks
Roadmap - MultiBox
● Fast RCNN with heuristic region proposal replaced by region proposal network (RPN) inspired by MultiBox
● RPN shares full-image convolutional features with the detection network (cost-free region proposal)
● RPN uses an “attention” mechanism to tell the network where to look
● ~5 FPS on a Tesla K40 GPU
● End-to-end training
Artificial neural networks
Roadmap - Faster RCNN
● SSD leverages the Faster RCNN’s RPN to directly classify objects inside each prior box (similar to YOLO)
● Predicts category scores and box offsets for a fixed set of default bounding boxes
● Fixes the predefined grid cells used in YOLO by using multiple aspect ratios
● Produces predictions at different scales
● ~59 FPS
Artificial neural networks
Roadmap - SSD (single shot multibox detector)
● TensorFlow: open-source software library for machine learning applications
● TensorFlow Object Detection API
○ A collection of pretrained models
○ Framework to construct, train and deploy object detection models
Artificial neural networks
TensorFlow object detection API
Summary
● Humans are good at understanding the big picture
● Neural networks are good at details
● But they can be fooled...
Summary
Human vs machine
● Need a large amount of data
● Lots of engineering
● Trial and error
● Long training time
● Still lots of hyperparameter tuning
● No general network (generalization not answered)
● Little mathematical foundation
Summary
Computer vision is still difficult
● Despite all of these advances, the dream of having a computer interpret an image at the same level as a human remains unrealized
Summary
Computer vision is hard
Thank You
Stanislav Frolov
Big Data Engineer
0173 318 11 35
inovex GmbH
Lindberghstraße 3
80939 München