
Computer Vision

From traditional approaches to deep neural networks

Stanislav Frolov München, 27.02.2018

● Computer vision
● Human vision
● Traditional approaches and methods
● Artificial neural networks
● Summary

2

Outline of this talk
What we are going to talk about

● Trained deep neural networks for object detection during my master's thesis

● Still fascinated and interested

3

Stanislav Frolov

Big Data Engineer @inovex

● Teach computers how to see
● Automatic extraction, analysis and understanding of images
● Infer useful information, interpret and make decisions
● Automate tasks that the human visual system can do
● One of the most exciting fields in AI and ML

4

What is computer vision
General

5

What is computer vision
Motivation

● Era of pixels
● Internet consists mostly of images
● Explosion of visual data
● Cannot be labeled by humans

6

What is computer vision
Drivers

● Two drivers for the computer vision explosion
○ Compute (faster and cheaper)
○ Data (more data > algorithms)

7

What is computer vision
Interdisciplinary field

[Diagram of contributing fields: Computer Science, Mathematics, Engineering, Physics, Biology, Psychology. Labelled areas: Information Retrieval, Machine Learning, Graphs, Algorithms, Systems Architecture, Robotics, Speech, NLP, Image Processing, Optics, Solid-State Physics, Neuroscience, Cognitive Sciences, Biological vision]

Synonyms?

8

● Imaging for statistical pattern recognition
● Image transformations such as pixel-by-pixel operations
○ Contrast enhancement
○ Edge extraction
○ Noise reduction
○ Geometrical and spatial operations (e.g. rotations)

9

What is computer vision
Related fields - image processing
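Two of the operations listed on this slide can be sketched in a few lines. This is a minimal illustration on a grayscale image stored as nested lists; the function names `stretch_contrast` and `rotate_90_clockwise` are hypothetical and not taken from any library.

```python
def stretch_contrast(image):
    """Pixel-by-pixel contrast enhancement: rescale values to span 0..255."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image has no contrast to stretch; return a copy.
        return [row[:] for row in image]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in image]


def rotate_90_clockwise(image):
    """Geometrical operation: rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


image = [[100, 110],
         [120, 150]]
stretched = stretch_contrast(image)   # values now use the full 0..255 range
rotated = rotate_90_clockwise(image)  # first column becomes the first row, reversed
```

Both functions produce a new image from an old one, which is what distinguishes image processing from the interpretation tasks of computer vision.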

● Creates new images from scene descriptions
● Produces image data from 3D models
● “Inverse” of computer vision
● AR as a combination of both

10

What is computer vision
Related fields - computer graphics

● Mainly manufacturing applications
● Image-based automatic inspection, process control, robot guidance
● Usually employs strong assumptions (colour, shape, light, structure, orientation, ...) -> works very well
● Output often pass/fail or good/bad
● Additionally numerical/measurement data, counts

11

What is computer vision
Related fields - machine vision

● Create “intelligent” systems
● Studying computational aspects of intelligence
● Make computers do things at which, at the moment, people are better
● Many techniques play an important role (ML, ANNs)
● Currently does a few things better/faster at scale than humans can
● Whether it can do anything “human” is still an open question

12

What is computer vision
Related fields - AI

● Related fields have a large intersection
● Basic techniques used, developed and studied are very similar

13

What is computer vision
Related fields - summary

Short trip to human vision

14

● Two-stage process
○ Eyes take in light reflected off objects, and the retina converts 3D objects into 2D images
○ Brain’s visual system interprets the 2D images and “rebuilds” a 3D model

15

What is human vision
General

● A pair of 2D images with slightly different views allows depth to be inferred

● The position of nearby objects varies more across the two images than the position of more distant objects

16

What is human vision
Stereoscopic vision
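The relation between position shift (disparity) and depth can be made concrete with the standard pinhole stereo formula, depth = focal length × baseline / disparity. The sketch below assumes two parallel, calibrated cameras; the function name and the numbers are illustrative only.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Depth (in metres) of a point seen by two parallel cameras.

    focal_length_px: focal length in pixels (assumed equal for both cameras)
    baseline_m:      distance between the two camera centres in metres
    disparity_px:    horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical setup: 700 px focal length, cameras 10 cm apart.
near = depth_from_disparity(700.0, 0.1, 35.0)  # large shift between views
far = depth_from_disparity(700.0, 0.1, 7.0)    # small shift between views
```

A point that shifts a lot between the two views comes out closer than one that barely moves, matching the second bullet above.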

● Prior knowledge of relative sizes and depths is often key for understanding and interpretation

17

What is human vision
Prior knowledge

● Texture and changes in texture help with depth perception

18

What is human vision
Texture pattern

19

What is human vision
Biases and illusions in human perception

● Shadows make all the difference in interpretation
● Gradual changes in light are ignored so as not to be misled by shadows

20

What is human vision
A few more illusions

● Two arrows with different orientations have the same length

● Assumptions and familiarity (distorted room)
● Face recognition bias
● Up-down orientation bias

21

What is human vision
Biases and illusions in human perception

22

What is human vision
Summary

● Illusions are fun, but the puzzle of understanding human vision is far from complete

Back to computer vision

23

● Recognition
● Localization
● Detection
● Segmentation

24

What is computer vision
Typical tasks

● Part-based detection
○ Deformable parts model
○ Pose estimation and poselets

25


● Image captioning (actions, attributes)

26


● Motion analysis
○ Egomotion (camera)
○ Optical flow (pixels)

27


● Scene understanding and reconstruction

28


● Image restoration
● Colouring black & white photos

29


Solving this is useful for many applications

30

31

What is computer vision
Typical applications

● Assistance systems for cars and people
● Surveillance
● Navigation (obstacle avoidance, road following, path planning)
● Photo interpretation
● Military (“smart” weapons)
● Manufacturing (inspection, identification)
● Robotics
● Autonomous vehicles (dangerous zones)

32

What is computer vision
Typical applications

● Recognition and tracking
● Event detection
● Interaction (man-machine interfaces)
● Modeling (medical, manufacturing, training, education)
● Organizing (database index, sorting/clustering)
● Fingerprint and biometrics
● …

Why so difficult?

33

34

What is computer vision
Why it is difficult

● Occlusion
● Deformation
● Scale
● Clutter
● Illumination
● Viewpoint
● Object pose

● Tons of classes and variants

● Often n:1 mapping
● Computationally expensive
● Full understanding of biological vision is missing

System overview

35

● Input: image(s) + labels
● Output: semantic data, labels

● Digital image pixels usually have three channels [R,G,B], each in the range [0...255], plus a location [x,y]

● Digital images are just vectors

36

What is computer vision
System overview

1. Image acquisition (camera, sensors)
2. Pre-processing (sampling, noise reduction, augmentation)
3. Feature extraction (lines, edges, regions, points)
4. Detection and segmentation
5. Post-processing (verification, estimation, recognition)
6. Decision making
● -> Ability of a machine to step back and interpret the big picture of those pixels

37

What is computer vision
System overview

Some history

38

1950s

● 2D imaging for statistical pattern recognition
● Theory of optical flow based on a fixed point towards which one moves

39

What is computer vision
History

Image processing

● Histograms
● Filtering
● Stitching
● Thresholding
● ...

40

What is computer vision
Traditional approaches

1960s

● Desire to extract 3D structure from 2D images for scene understanding

● Began at pioneering AI universities to mimic the human visual system as a stepping stone for intelligent robots

● Summer vision project at MIT: attach a camera to a computer and have it “describe what it saw”

41


● Given to 10 undergraduate students
● … an attempt to use our summer workers effectively …
● … construction of a significant part of a visual system …
● … task can be segmented into sub-problems …
● … participate in the construction of a system complex enough to be a real landmark in the development of “pattern recognition” …

42

What is computer vision
History: summer vision project @MIT 1966

● Goal: analyse scenes and identify objects
● Structure of system:
○ Region proposal
○ Property lists for regions
○ Boundary construction
○ Match with properties
○ Segment

● Basic foreground/background segmentation with simple objects (cubes, cylinders, ...)

43


● Unlike general intelligence, computer vision seemed tractable

● Amusing anecdote, but it never aimed to “solve” computer vision

● Computer vision today differs from what it was thought to be in 1966

44


1970s

● Many of the algorithms that exist today were formed
● Edges, lines and objects as interconnected structures

45


46

What is computer vision
Traditional approaches

Edge detection based on

● Brightness
● Gradients
● Geometry
● Illumination
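As an illustration of gradient-based edge detection, here is a minimal sketch using the Sobel operator, a common choice that the slides do not name explicitly. The image is a grayscale nested list, and only positions where the 3x3 window fits entirely are computed.

```python
import math

# Sobel kernels approximate the horizontal and vertical brightness derivative.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gradient_magnitude(image):
    """Return the gradient magnitude for every fully covered 3x3 window."""
    h, w = len(image), len(image[0])
    out = [[0.0] * (w - 2) for _ in range(h - 2)]
    for y in range(h - 2):
        for x in range(w - 2):
            gx = sum(SOBEL_X[i][j] * image[y + i][x + j]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * image[y + i][x + j]
                     for i in range(3) for j in range(3))
            out[y][x] = math.hypot(gx, gy)
    return out


# A dark region on the left and a bright region on the right: a vertical edge.
image = [[0, 0, 0, 255, 255] for _ in range(4)]
edges = gradient_magnitude(image)
```

The response is zero in the flat dark region and large where the window straddles the brightness step, so thresholding `edges` would mark the edge location.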

47

What is computer vision
Traditional approaches - part based detector

● Objects composed of features of parts and their spatial relationship

● Challenge: how to define and combine

1980s

● More rigorous mathematical analysis and quantitative aspects

● Optical character recognition
● Sliding window approaches
● Usage of artificial neural networks

48


49

What is computer vision
Traditional approaches - HOG detection (histogram of oriented gradients)

● Concept dates from the 80s but became widely used only in 2005
● Create HOG descriptors (object generalizations)
● One feature vector per object
● Train with SVM
● Sliding window @multiple scales

50

What is computer vision
Traditional approaches - HOG detection (histogram of oriented gradients)

● Computation of HOG descriptors:

1. Compute gradients
2. Compute histograms on cells
3. Normalize histograms
4. Concatenate histograms

● Requires a lot of engineering
● Must build ensembles of feature descriptors
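The four computation steps can be sketched as follows. This is a simplified illustration, not the full descriptor from the 2005 HOG paper: it normalizes each cell on its own instead of over overlapping blocks, uses hard bin assignment, and the cell size and bin count are assumed defaults.

```python
import math


def hog_descriptor(image, cell_size=4, num_bins=9):
    """Simplified HOG: one orientation histogram per cell, concatenated."""
    h, w = len(image), len(image[0])
    bin_width = 180.0 / num_bins
    descriptor = []
    for cy in range(0, h - cell_size + 1, cell_size):
        for cx in range(0, w - cell_size + 1, cell_size):
            hist = [0.0] * num_bins
            for y in range(cy, cy + cell_size):
                for x in range(cx, cx + cell_size):
                    # 1. Gradients via central differences (clamped at borders).
                    gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
                    gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
                    magnitude = math.hypot(gx, gy)
                    # Unsigned orientation in [0, 180) degrees.
                    angle = math.degrees(math.atan2(gy, gx)) % 180.0
                    # 2. Each pixel votes into its cell's histogram.
                    hist[int(angle / bin_width) % num_bins] += magnitude
            # 3. Normalize the histogram (L2 norm, per cell in this sketch).
            norm = math.sqrt(sum(v * v for v in hist)) + 1e-6
            # 4. Concatenate all histograms into one feature vector.
            descriptor.extend(v / norm for v in hist)
    return descriptor


# 8x8 image with a vertical edge in the middle: 2x2 cells, 9 bins each.
image = [[0] * 4 + [255] * 4 for _ in range(8)]
descriptor = hog_descriptor(image)
```

For this image every cell puts all of its gradient energy into the first orientation bin, which is the kind of signature an SVM would then be trained on.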

1990s

● Significant interaction with computer graphics (rendering, morphing, stitching)

● Approaches using statistical learning
● Eigenfaces (“ghost faces”) through principal component analysis (PCA)

51


52

What is computer vision
Traditional approaches - deformable parts model (DPM)

● Objects are constructed from their parts
● First match whole object, then refine on the parts
● HOG + part-based + modern features
● Slow but good at difficult objects
● Involves many heuristics

53

What is computer vision
Features

● Feature points
○ Small area of pixels with certain properties

● Feature detection
○ Use features for identification
○ Activate if “object” present

● Examples:
○ Lines, edges, colours, blobs, …
○ Animals, faces, cars, ...

54

What is computer vision
Traditional approaches - classical recognition

● Init: extract features for objects in different scales, colours, orientations, rotations, occlusion levels

● Inference: extract features from query image and find closest match in database or train a classifier

● Computationally expensive (hundreds of features in image, millions in database) and complex due to errors and mismatches

55


Before the new era

● Bags of features
● Handcrafted ensembles

[Diagram: Input -> Feature Extraction (Feat. 1, Feat. 2, ..., Feat. n) -> Final Decision]

The new era of computer vision

56

● Elementary building block

● Inspired by biological neurons

● Mathematical function y=f(wx+b)

● Learnable weights

57

Artificial neural networks
Fundamentals - artificial neuron
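The function y = f(wx + b) can be written out directly. In this minimal sketch the activation f defaults to a sigmoid, which is an assumption; the slides do not say which activation is meant.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation=sigmoid):
    """Compute y = f(w . x + b) for one artificial neuron."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# The weighted sum is 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0
output = neuron([1.0, 2.0], [0.5, -0.25], 0.0)
```

The weights and the bias are the learnable parts; training changes them while the activation function stays fixed.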

● Collection of neurons organized in layers

● Universal approximators

● Fully-connected network here

58

Artificial neural networks
Fundamentals - artificial neural networks

59

Artificial neural networks
Fundamentals - training

● Basically an optimization problem

● Find minimum of a loss function by an iterative process (training)

● Designing the loss function is sometimes tricky

60

Artificial neural networks
Fundamentals - training

Simple optimizer algorithm:

1. Forward pass with a batch of data
2. Calculate error between actual and wanted output
3. Nudge weights in proportion to error in the right direction (same data would result in smaller error)
4. Repeat until convergence
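The four steps can be sketched for the smallest possible model, a single weight w fitted so that w * x matches y. The mean squared error loss, the learning rate and the stopping tolerance are assumptions chosen for illustration.

```python
def train(data, learning_rate=0.05, max_steps=1000, tolerance=1e-9):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0
    for _ in range(max_steps):
        # 1. Forward pass with the whole batch.
        predictions = [w * x for x, _ in data]
        # 2. Error between actual and wanted output.
        errors = [p - y for p, (_, y) in zip(predictions, data)]
        loss = sum(e * e for e in errors) / len(data)
        # 4. Stop once the loss has converged.
        if loss < tolerance:
            break
        # 3. Nudge the weight against the gradient of the loss.
        gradient = sum(2 * e * x for e, (x, _) in zip(errors, data)) / len(data)
        w -= learning_rate * gradient
    return w


# Data generated by y = 2 * x, so the learned weight should approach 2.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = train(data)
```

Real networks repeat the same loop over millions of weights, with the gradient computed by backpropagation.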

61

Artificial neural networks
Fundamentals - CNN

● Local neighborhood contributes to activation

● Exploit spatial information

● Hierarchical feature extractors

● Fewer parameters

[Figure labels: input, filters, receptive field, activation]

62

Artificial neural networks
Fundamentals - CNN

● Filter of size 3x3 applied to an input of 7x7
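Sliding a 3x3 filter over a 7x7 input without padding and with stride 1 yields a 5x5 activation map, because the filter fits in 7 - 3 + 1 = 5 positions along each axis. A minimal sketch (as in most deep learning frameworks, the filter is not flipped, so strictly speaking this is a cross-correlation):

```python
def convolve2d(image, kernel):
    """Valid convolution with stride 1: one output per filter position."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(iw - kw + 1)
        ]
        for y in range(ih - kh + 1)
    ]


image = [[1] * 7 for _ in range(7)]    # 7x7 input of ones
kernel = [[1] * 3 for _ in range(3)]   # 3x3 filter of ones
feature_map = convolve2d(image, kernel)
```

Each output value depends only on a 3x3 neighbourhood of the input, and the same nine weights are reused at every position.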

63

Artificial neural networks
Fundamentals - pooling

● Max-pooling
● Dimension reduction/adaption
● Existence is more important than location
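A minimal sketch of max-pooling with non-overlapping windows; the 2x2 window size is an assumed, commonly used default.

```python
def max_pool(feature_map, size=2):
    """Keep only the largest activation in each size x size window."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[y + i][x + j]
                for i in range(size) for j in range(size))
            for x in range(0, w - size + 1, size)
        ]
        for y in range(0, h - size + 1, size)
    ]


feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]]
pooled = max_pool(feature_map)
```

The 4x4 map shrinks to 2x2, and each output records that a strong activation existed somewhere in its window without recording exactly where.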

64

Artificial neural networks
Fundamentals - pooling

● Zero-padding
● Controlling dimensions
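How zero-padding controls dimensions follows from the usual output-size formula, (input + 2 * padding - kernel) // stride + 1. The helper names below are hypothetical.

```python
def zero_pad(image, pad):
    """Surround the image with a border of `pad` zeros on every side."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]
    padded += [[0] * pad + list(row) + [0] * pad for row in image]
    padded += [[0] * width for _ in range(pad)]
    return padded


def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Spatial size of a convolution output along one axis."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


padded = zero_pad([[1, 2], [3, 4]], 1)
unpadded_size = conv_output_size(7, 3)              # the map shrinks
same_size = conv_output_size(7, 3, padding=1)       # padding preserves size
```

With one pixel of padding a 3x3 filter keeps a 7x7 input at 7x7, which lets many layers be stacked without the activation maps shrinking away.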

65

Artificial neural networks
Fundamentals - general network architecture

[Diagram: Input image -> convolutional layers -> ... -> Final decision]

66

Artificial neural networks
Fundamentals - hierarchical feature extractors

Activations for:
First layers: lines, edges, blobs, colours, ...
Deeper layers: parts of abstract objects, abstract objects

Modern history of object recognition

67

● Classification and detection
○ About 27k annotated objects in 11.5k images (VOC 2012)
○ 20 classes
■ person, bird, cat, cow, dog, horse, sheep, aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, tv/monitor

68

Benchmark
Datasets - PASCAL VOC

● Challenges on a subset of ImageNet
○ 14M labeled images
○ 20k object categories

● ILSVRC* usually on 1k categories including 90 out of 120 dog breeds

69

Benchmark
Datasets - ImageNet

*ImageNet Large Scale Visual Recognition Challenge

● ILSVRC 2012 winner by a large margin, cutting the top-5 error from about 25% to 16%
● Proved the effectiveness of CNNs and kicked off a new era
● 8 layers, 650k neurons, 60M parameters

70

Artificial neural networks
Roadmap - AlexNet

● ILSVRC 2013 winner with a best top-5 error of 11.6%
● AlexNet but using smaller 7x7 kernels (instead of 11x11) in the first layer to keep more information in deeper layers

71

Artificial neural networks
Roadmap - ZFNet

● ILSVRC 2013 localization winner
● Uses AlexNet on multi-scale input images with sliding window approach
● Accumulates bounding boxes for final detection (instead of non-max suppression)

72

Artificial neural networks
Roadmap - OverFeat
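For contrast with OverFeat's box accumulation, this is a minimal sketch of the greedy non-max suppression that most other detectors use. Boxes are (x1, y1, x2, y2) tuples and the IoU threshold of 0.5 is an assumed, commonly used value.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the best-scoring box and drop others that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

The second box overlaps the first with an IoU of about 0.68 and is discarded, while the distant third box survives.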

● 2k proposals generated by selective search
● SVM trained for classification
● Multi-stage pipeline

73

Artificial neural networks
Roadmap - RCNN (region based CNN)

● Not a winner but famous due to simplicity and effectiveness

● Replace large-kernel convolutions by stacking several small-kernel convolutions

74

Artificial neural networks
Roadmap - VGGNet
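Why stacking small kernels pays off can be checked with simple arithmetic: n stacked k x k convolutions with stride 1 cover a receptive field of 1 + n * (k - 1) pixels, while needing fewer weights than one large kernel. The sketch below ignores biases and assumes the same number of input and output channels (64 here is illustrative).

```python
def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of stacked stride-1 convolutions along one axis."""
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(kernel_size, channels, num_layers=1):
    """Weight count for stacked convolutions with `channels` in and out."""
    return num_layers * kernel_size * kernel_size * channels * channels


field = stacked_receptive_field(3, 3)           # three 3x3 layers
small_stack = conv_weights(3, 64, num_layers=3)
single_large = conv_weights(7, 64)
```

Three 3x3 layers see the same 7x7 region as a single 7x7 layer but use 27 instead of 49 weights per channel pair, and they add two extra non-linearities in between.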

● ILSVRC 2014 winner
● Stacks up “inception” modules
● 22 layers, 5M parameters

75

Artificial neural networks
Roadmap - InceptionNet (GoogLeNet)

● Jointly learns classification and bounding-box regression
● Employs region of interest (RoI) pooling, which allows the convolutional computations to be reused across proposals

76

Artificial neural networks
Roadmap - Fast RCNN

● Directly predicts all objects and classes in one shot
● Very fast
● Processes images at ~40 FPS on a Titan X GPU
● First real-time state-of-the-art detector
● Divides input images into multiple grid cells which are then classified

77

Artificial neural networks
Roadmap - YOLO (you only look once)
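The grid idea can be illustrated by finding which cell is responsible for an object, namely the cell that contains the centre of its bounding box. The 7x7 grid matches the original YOLO; the function name and the 448x448 image size are illustrative assumptions.

```python
def responsible_cell(cx, cy, image_w, image_h, grid_size=7):
    """Return (row, col) of the grid cell containing the box centre."""
    col = min(int(cx / image_w * grid_size), grid_size - 1)
    row = min(int(cy / image_h * grid_size), grid_size - 1)
    return row, col


# An object centred in the middle of a 448x448 image.
centre_cell = responsible_cell(224, 224, 448, 448)
# An object centred near the top-right corner.
corner_cell = responsible_cell(447.9, 0, 448, 448)
```

Each cell then predicts boxes and class scores for the objects it is responsible for, so the whole image is processed in a single forward pass.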

● ILSVRC 2015 winner with a 3.6% error rate (human performance is 5-10%)

● Employs residual blocks, which allow very deep networks (hundreds of layers) to be built

● Additional identity mapping

78

Artificial neural networks
Roadmap - ResNet (Microsoft)
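A residual block adds its input back onto the output of its layers, y = relu(F(x) + x). The sketch below works on plain lists of numbers and takes F as an arbitrary function, a stand-in for the convolutional layers of a real block.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def residual_block(x, transform):
    """Compute relu(F(x) + x), where F is the block's learned transform."""
    fx = transform(x)
    return relu([f + xi for f, xi in zip(fx, x)])


x = [1.0, 2.0]
# If the learned transform outputs zeros, the block passes x through unchanged.
identity_out = residual_block(x, lambda v: [0.0 for _ in v])
# Otherwise the transform only has to learn a correction on top of x.
shifted_out = residual_block(x, lambda v: [0.5 for _ in v])
```

Because doing nothing is easy to learn, extra blocks do not have to hurt, which is one common explanation for why such networks can be made very deep.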

● Not a recognition network
● A region proposal network
● Popularized prior/anchor boxes (found through clustering) to predict offsets
● Much better strategy than starting the predictions with random coordinates
● Since then heuristic approaches have gradually been fading out and being replaced

79

Artificial neural networks
Roadmap - MultiBox
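Predicting offsets relative to a prior box can be sketched as follows. The parameterization used here (centre shifts scaled by the prior's size, log-scale width and height) is the one commonly used by later detectors such as Faster RCNN and is an assumption; MultiBox itself may encode offsets differently.

```python
import math


def decode_box(prior, offsets):
    """Turn predicted offsets into a box, relative to a prior box.

    prior:   (cx, cy, w, h) of the prior/anchor box
    offsets: (tx, ty, tw, th) predicted by the network
    """
    cx_p, cy_p, w_p, h_p = prior
    tx, ty, tw, th = offsets
    return (cx_p + tx * w_p,      # shift centre by a fraction of the width
            cy_p + ty * h_p,      # shift centre by a fraction of the height
            w_p * math.exp(tw),   # scale width
            h_p * math.exp(th))   # scale height


prior = (50.0, 50.0, 20.0, 40.0)
unchanged = decode_box(prior, (0.0, 0.0, 0.0, 0.0))
shifted = decode_box(prior, (0.5, 0.0, 0.0, 0.0))
```

Zero offsets return the prior itself, so the network starts from a sensible box and only has to learn small corrections.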

● Fast RCNN with heuristic region proposal replaced by region proposal network (RPN) inspired by MultiBox

● RPN shares full-image convolutional features with the detection network (cost-free region proposal)

● RPN uses an “attention” mechanism to tell where to look
● ~5 FPS on a Tesla K40 GPU
● End-to-end training

80

Artificial neural networks
Roadmap - Faster RCNN

● SSD leverages the Faster RCNN’s RPN to directly classify objects inside each prior box (similar to YOLO)

● Predicts category scores and box offsets for a fixed set of default bounding boxes

● Fixes the predefined grid cells used in YOLO by using multiple aspect ratios

● Produces predictions of different scales
● ~59 FPS

81

Artificial neural networks
Roadmap - SSD (single shot multibox detector)

● Open-source software library for machine learning applications

● TensorFlow Object Detection API
○ A collection of pretrained models
○ Construct, train and deploy object detection models

82

Artificial neural networks
TensorFlow object detection API

https://github.com/tensorflow/models/tree/master/research/object_detection

Summary

83

● Humans are good at understanding the big picture
● Neural networks are good at details
● But they can be fooled...

84

Summary
Human vs machine

● Need a large amount of data
● Lots of engineering
● Trial and error
● Long training time
● Still lots of hyperparameter tuning
● No general network (generalization not answered)
● Little mathematical foundation

85

Summary
Computer vision is still difficult

● Despite all of these advances, the dream of having a computer interpret an image at the same level as a human remains unrealized

86

Summary
Computer vision is hard

Thank You

Stanislav Frolov

Big Data Engineer

[email protected]

0173 318 11 35

inovex GmbH

Lindberghstraße 3

80939 München
