computer vision an overview - public.ostfalia.de

14.5.2019

Computer Vision – An Overview

Jonny Karlsson, B. Eng. (IT), PhD

Degree Programme Director / Senior Lecturer in Information Technology

arcada.fi #arcadauas

Arcada University of Applied Sciences

Campus in Arabianranta, Helsinki

• A modern university of applied sciences in Helsinki, Finland

• 2700 students

• Modern campus with an international atmosphere:

– more than 40 nationalities are represented

– more than 10% of all students come from abroad

BACHELOR’S DEGREE PROGRAMMES IN SWEDISH

• Business Administration

• Cultural Management

• Media

• Environmental and Energy Engineering

• Materials Processing Technology

• Information Technology

• Sports and Health Promotion

• Occupational Therapy

• Physiotherapy

• Social Services

• Emergency Care

• Public Health

• Midwifery

• Nursing

BACHELOR’S DEGREE PROGRAMMES IN ENGLISH

• International Business

• Materials Processing Technology

• Nursing

MASTER’S DEGREE PROGRAMMES

• Big Data Analytics

• International Business Management

• Media Management

• Global Health Care

• Rehabilitation (in Swedish)

• Advanced Clinical Care (in Swedish)

• Health promotion (in Swedish)

• Social Services (in Swedish)


What will we cover today?

1. What is Computer Vision?

2. Low-level vision

– Smoothing (removing noise)

– Sobel edge detection

3. Middle-level vision

– Separating foreground objects from the background

4. High-level vision

– Face/object recognition with the Viola-Jones algorithm

What is Computer Vision?

• Input: Image or video sequence

• Output: Some description/interpretation of the image or video

– Many different possible levels of description/interpretation.


Computer Vision Applications

• Security

– Face recognition

• Medical imaging

– Reconstruction/visualization of inner body

– Diagnosis

• Autonomous cars

• E-Rehabilitation


Computer Vision Activities at Arcada

• Teaching

– A course in computer vision for 3rd year IT students (bachelor)

– Exchange students from Ostfalia have participated in planning the course content!

Nils-Peter Töpfer (7.2.2017). Development of a modern course for the information analytics curriculum: Programming exercises for image processing and computer vision algorithms

Karola Tabea Isensee (7.2.2017). Development of a modern course for the information analytics curriculum: Theory and state-of-the-art analysis of computer vision



• A student project from the computer vision course



• Research

– Computer vision based real-time motion analysis in health and well-being.

Elements of Computer Vision


Three Stages in Computer Vision

1. Low-level

– Input: Image; Output: Image

2. Middle-level

– Input: Image; Output: Features

3. High-level

– Input: Image; Output: Recognition/Interpretation


1. Low-level Vision

Considers local image properties

Looks like an edge!


1. Low-level Vision: Example

Blur (smoothing) filter

Sharpening filter


1. Low-level Vision: Example

Edge detection filter


2. Middle-level Vision

Segmentation and the grouping of pixels/features

Foreground object separated from background


Low-to-Middle Example

Edge Detection: original → edge image

Middle-level: structures in the data (circular arcs, line segments, corners)

Object Recognition


3. High-level Vision

Interpretation/Recognition

It’s a chair!


High-to-Low-level Vision: Example

Low-level: edge image

Middle-level: consistent lines and corners

High-level: Building Recognition


Low-level Vision: What is Image Filtering?

• Some common image filtering techniques:

– Low-pass filters (smoothing)

– Edge detection


Low-level Vision: How does filtering work?

• Let’s first make sure we all know what a digital image is


What is a digital image?

• Can be seen as a matrix of pixels

• Each pixel represents a color value

• In a grayscale image the color value

– 0 means black

– 255 means white


Digital Images: Color Channels

• In a color image each pixel represents a color value for three different color channels:

– (R)ed: 0–255

– (G)reen: 0–255

– (B)lue: 0–255


Digital Images: Color Channels

Computer Vision algorithms typically operate on grayscale images

Easier!!
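The matrix view of an image can be tried out directly. The following is a minimal sketch in Python with NumPy; the pixel values are made up, and averaging the three channels is only one assumed way of getting a grayscale value (other weightings are common).

```python
import numpy as np

# A tiny 2x2 colour image with shape (rows, columns, channels).
# Channels are ordered R, G, B; the pixel values are made up for illustration.
color = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [255, 255, 255]]],
    dtype=np.uint8,
)

def to_grayscale(rgb):
    """Average the three channels (an assumed, simple conversion rule)."""
    return rgb.astype(np.float64).mean(axis=2).round().astype(np.uint8)

gray = to_grayscale(color)
print(gray.shape)  # (2, 2): one value per pixel instead of three
print(gray)        # the pure red, green and blue pixels give 85, white gives 255
```

A grayscale image has a single 0–255 value per pixel, so the later filtering examples only need to handle one matrix instead of three.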


Low-level Vision: Image Filtering

• Modifies pixels by applying an operator to a localised neighbourhood of pixels

• So the output pixel value (8) depends upon the corresponding input pixel (7) and its neighbours

[Figure: Local image (3x3 neighbourhood around input pixel 7) → Some operator → Modified image (output pixel 8)]


Image Filtering – Pixel Mask / Kernel

• Splits an image into smaller sub-images for processing

• Operations only apply to masked pixels

• Operations are element-wise (pixel by pixel)

• A pixel mask/kernel is typically a square (3x3, 5x5, ...) with a center pixel


Image Filtering – Example Pixel Mask

Mask #1 covers the pixels

p00 p01 p02
p10 p11 p12
p20 p21 p22

(image axes: x and y)

Mask moves one pixel right, and so on up to Mask #N


Image Filtering – Element-wise Operations

$$g(x, y) = f(x, y) * h(x, y) = \sum_i \sum_j f(i, j)\, h(x - i, y - j)$$

[Figure: a 3x3 window of the input image is multiplied element-wise (X) with the mask weights and the products are summed (+); this dot product gives the output pixel value 86.]


Image Filtering

• As a result of filtering, new pixels are obtained by a dot product (sum of element-wise multiplications) over the mask

• Changing the mask coefficients (the numbers in the mask) gives different filter functions
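The sliding dot product described above can be sketched in a few lines of Python with NumPy. This is a rough sketch under stated assumptions, not a definitive implementation: the function name `apply_mask` is hypothetical, the mask is multiplied with the window exactly as it is written (it is not flipped), and border pixels are handled by repeating the nearest edge pixel, which the slides do not specify.

```python
import numpy as np

def apply_mask(image, mask):
    """Slide a square, odd-sized mask over the image.

    Each output pixel is the sum of the element-wise products of the mask
    and the window of pixels centred on that position.
    """
    image = np.asarray(image, dtype=np.float64)
    mask = np.asarray(mask, dtype=np.float64)
    size = mask.shape[0]
    radius = size // 2  # 1 for a 3x3 mask
    # Assumption: borders are padded by repeating the nearest edge pixel.
    padded = np.pad(image, radius, mode="edge")
    out = np.zeros_like(image)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            window = padded[y:y + size, x:x + size]
            out[y, x] = np.sum(window * mask)
    return out

# A mask with a single 1 in the centre copies every pixel unchanged.
image = np.arange(12).reshape(3, 4)
identity = [[0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]]
print(apply_mask(image, identity))
```

Only the numbers in the mask change from one filter to the next; the loop stays the same.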


Simple Filter Operations

3x3 mask:

0 0 0
0 1 0
0 0 0

Original → ?

Original → Filtered (no change)

3x3 mask:

0 0 0
1 0 0
0 0 0

Original → ?

Original → Shifted left 1 pixel


Low-Pass Filtering (Smoothing)

Averaging Filter

3x3 mask:

1 1 1
1 1 1
1 1 1

Original → Blurred (effect of averaging)


Low-Pass Filtering: Averaging Filter

[Figure: a 3x3 window of the input image is multiplied element-wise with the all-ones mask; the sum of the products divided by 9 gives the output pixel value 63.]


Averaging filter – Simple Example

• Let’s apply an averaging filter to the following image!

Input image:

1 1 100 1 1
1 1 100 1 1
1 1 100 1 1
1 1 100 1 1
1 1 100 1 1

3x3 mask (the sum is divided by 9):

1 1 1
1 1 1
1 1 1

After the first mask position:

1 1 100 1 1
1 34 100 1 1
1 1 100 1 1
1 1 100 1 1
1 1 100 1 1

After the second mask position:

1 1 100 1 1
1 34 34 1 1
1 1 100 1 1
1 1 100 1 1
1 1 100 1 1

After the third mask position:

1 1 100 1 1
1 34 34 34 1
1 1 100 1 1
1 1 100 1 1
1 1 100 1 1

Finally, after applying the mask on the whole image!

1 34 34 34 1
1 34 34 34 1
1 34 34 34 1
1 34 34 34 1
1 34 34 34 1
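The averaging example can be reproduced with a short Python/NumPy sketch. The helper `apply_mask` is a hypothetical name, and the border handling (repeating the nearest edge pixel) is an assumption; it happens to give the same border values as the final result shown on the slide.

```python
import numpy as np

def apply_mask(image, mask):
    """Sum of element-wise products of a 3x3 (or larger odd) mask and each window."""
    image = np.asarray(image, dtype=np.float64)
    mask = np.asarray(mask, dtype=np.float64)
    size = mask.shape[0]
    # Assumption: borders are padded by repeating the nearest edge pixel.
    padded = np.pad(image, size // 2, mode="edge")
    out = np.zeros_like(image)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            out[y, x] = np.sum(padded[y:y + size, x:x + size] * mask)
    return out

# The 5x5 input image from the example: a bright column on a dark background.
image = np.tile([1, 1, 100, 1, 1], (5, 1))

# Averaging filter: all-ones mask, then divide the sum by the 9 mask entries.
blurred = apply_mask(image, np.ones((3, 3))) / 9
print(blurred)  # every row is 1, 34, 34, 34, 1
```

Each 34 is (1 + 1 + 100) * 3 / 9: the bright column is spread over its neighbours, which is the blurring effect.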


Image Filtering: Edge Detection

• The Sobel filter is well known in computer vision for producing images that emphasise the edges

Vertical Sobel mask/filter:

1 0 -1
2 0 -2
1 0 -1

Input image:

70 70 70 70 70 10 10 10
70 70 70 70 70 10 10 10
70 70 70 70 70 10 10 10
70 70 70 70 70 10 10 10
70 70 70 70 70 10 10 10

The mask is moved over the input image (*) one pixel at a time, and each result is written to the output image (edge map).

Output image (edge map) after the first mask position:

70 70 70 70 70 10 10 10
70 0 70 70 70 10 10 10
70 70 70 70 70 10 10 10
70 70 70 70 70 10 10 10
70 70 70 70 70 10 10 10

After the fourth mask position:

70 70 70 70 70 10 10 10
70 0 0 0 240 10 10 10
70 70 70 70 70 10 10 10
70 70 70 70 70 10 10 10
70 70 70 70 70 10 10 10

After the fifth mask position:

70 70 70 70 70 10 10 10
70 0 0 0 240 240 10 10
70 70 70 70 70 10 10 10
70 70 70 70 70 10 10 10
70 70 70 70 70 10 10 10

After the sixth mask position:

70 70 70 70 70 10 10 10
70 0 0 0 240 240 0 10
70 70 70 70 70 10 10 10
70 70 70 70 70 10 10 10
70 70 70 70 70 10 10 10

After applying the mask on the whole image:

0 0 0 0 240 240 0 0
0 0 0 0 240 240 0 0
0 0 0 0 240 240 0 0
0 0 0 0 240 240 0 0
0 0 0 0 240 240 0 0



Horizontal Sobel mask/filter:

1 2 1
0 0 0
-1 -2 -1

• Note that to be able to find horizontal edges we need to apply a horizontal Sobel mask/filter
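The Sobel example can be checked with a short Python/NumPy sketch. As before, `apply_mask` is a hypothetical helper, the mask is applied as written (not flipped), and repeating the nearest edge pixel at the borders is an assumption that matches the zeros in the first and last columns of the final edge map.

```python
import numpy as np

def apply_mask(image, mask):
    """Sum of element-wise products of the mask and each window of the image."""
    image = np.asarray(image, dtype=np.float64)
    mask = np.asarray(mask, dtype=np.float64)
    size = mask.shape[0]
    # Assumption: borders are padded by repeating the nearest edge pixel.
    padded = np.pad(image, size // 2, mode="edge")
    out = np.zeros_like(image)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            out[y, x] = np.sum(padded[y:y + size, x:x + size] * mask)
    return out

vertical_sobel = [[1, 0, -1],
                  [2, 0, -2],
                  [1, 0, -1]]
horizontal_sobel = [[1, 2, 1],
                    [0, 0, 0],
                    [-1, -2, -1]]

# The 5x8 input image from the example: 70s on the left, 10s on the right.
image = np.tile([70, 70, 70, 70, 70, 10, 10, 10], (5, 1))

vertical_edges = apply_mask(image, vertical_sobel)
horizontal_edges = apply_mask(image, horizontal_sobel)
print(vertical_edges)    # every row is 0, 0, 0, 0, 240, 240, 0, 0
print(horizontal_edges)  # all zeros: this image has no horizontal edge
```

The value 240 comes from (70 + 2*70 + 70) - (10 + 2*10 + 10), the weighted difference between the columns left and right of the centre pixel. The horizontal mask compares the rows above and below instead, and here all rows are identical.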



• So, the result of the edge detection process is an edge map; after thresholding it becomes a so-called “binary image” in which the edges are highlighted

Edge Detection: original → edge image / edge map
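One way to turn an edge map into a binary image is to threshold it, as in the following Python/NumPy sketch. The threshold value of 128 is a hypothetical choice for this toy example, not a value taken from the slides.

```python
import numpy as np

# Edge map from the vertical Sobel example: strong responses at the edge only.
edge_map = np.tile([0, 0, 0, 0, 240, 240, 0, 0], (5, 1))

THRESHOLD = 128  # hypothetical value; in practice it is tuned per application

# A pixel becomes 1 (edge) if the magnitude of its response reaches the threshold.
binary = (np.abs(edge_map) >= THRESHOLD).astype(np.uint8)
print(binary)  # every row is 0, 0, 0, 0, 1, 1, 0, 0
```

Taking the absolute value means that dark-to-bright and bright-to-dark edges are both kept.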


So far we have covered

1. Low-level


2. Middle-level


3. High-level



Middle-level Vision: Separation of foreground objects from the background

• We will not go into this in detail, but in short:

Structures in the data: circular arcs, line segments, corners

Object Recognition






High-level Vision example: Face detection

• The Viola-Jones algorithm is a classic and widely used algorithm for face detection, and it can also be used for detecting other types of objects

• Viola-Jones consists of 4 different “elements”:

– Haar features

– Integral image

– Adaboost

– Cascading


The Viola-Jones Algorithm

Haar features | Integral image | Adaboost | Cascading



• The problem with Haar features is that we need to calculate the average of a given region multiple times

=> High complexity! ( O(N^2) )

• Why?

– We have to use Haar features with all possible sizes and locations

– Same mask, but different sizes!!



• In an integral image the value at pixel (x, y) is the sum of the pixels above and to the left of (x, y), including (x, y) itself

Input image:

1 1 1
1 1 1
1 1 1

Integral image:

1 2 3
2 4 6
3 6 9



• Much lower complexity!! ( O(1) per region: the sum of any rectangle needs at most four lookups in the integral image )
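The integral image and the constant-time region sum can be sketched in Python with NumPy as follows. The function names are hypothetical, and the two-rectangle feature (left half minus right half) is just one simple Haar-like feature chosen for illustration.

```python
import numpy as np

def integral_image(image):
    """Each entry is the sum of all pixels above and to the left, inclusive."""
    return np.asarray(image, dtype=np.int64).cumsum(axis=0).cumsum(axis=1)

def region_sum(ii, top, left, bottom, right):
    """Sum of the pixels in rows top..bottom and columns left..right (inclusive).

    Uses at most four lookups in the integral image, whatever the region size.
    """
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return int(total)

def two_rect_feature(ii, top, left, height, width):
    """Left-half sum minus right-half sum of a window (width must be even)."""
    half = width // 2
    bottom = top + height - 1
    left_sum = region_sum(ii, top, left, bottom, left + half - 1)
    right_sum = region_sum(ii, top, left + half, bottom, left + width - 1)
    return left_sum - right_sum

ones = np.ones((3, 3), dtype=np.int64)
ii = integral_image(ones)
print(ii)                           # rows: 1 2 3 / 2 4 6 / 3 6 9
print(region_sum(ii, 1, 1, 2, 2))   # 4: the bottom-right 2x2 block of ones

# A made-up 2x4 patch that is bright on the left and dark on the right.
patch = integral_image([[5, 5, 1, 1], [5, 5, 1, 1]])
print(two_rect_feature(patch, 0, 0, 2, 4))  # 16 = 20 - 4
```

Because the cost per region no longer depends on its size, evaluating the same feature at many scales and positions becomes affordable.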



• With a suitable scale of the Haar line feature placed as in the image below we would get a quite high match!

• The problem is that with the same Haar feature we would also get matches in other places of the image!



• One feature alone, or in other words a single classifier, is not accurate enough!

– Called a weak classifier

– The idea of Haar-cascading is to combine a series of weak classifiers (those that are barely better than “random guessing”) to achieve a strong classifier!



• Each of the figures below represents a general feature of a face (Believe it or not!)



• If we combine all these features together… see the point?



• Ok, this is a simplified example, but this combination of Haar features is unlikely to be found elsewhere than in a face!



• We want to find a series of weak classifiers that all match in the region of a face if you pass them in a certain order.

• If any classifier fails, then everything fails! => not a face!

Haar Feature 1 → Pass → Haar Feature 2 → Pass → Haar Feature 3

A Fail at any feature ends the chain (not a face).
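The pass/fail chain can be sketched in plain Python. Everything here is a toy assumption: the stage names, the thresholds and the dictionary of precomputed feature responses are made up, and real Viola-Jones stages are trained rather than hand-written.

```python
def run_cascade(window, stages):
    """Return True only if every stage passes; stop at the first failure."""
    for stage in stages:
        if not stage(window):
            return False  # rejected early: not a face
    return True

calls = []  # records which stages were actually evaluated

def make_stage(name, threshold):
    """Build a toy stage that passes when a feature response reaches a threshold."""
    def stage(window):
        calls.append(name)
        return window[name] >= threshold
    return stage

# Hypothetical stages and thresholds, checked in this order.
stages = [
    make_stage("haar_feature_1", 10),
    make_stage("haar_feature_2", 5),
    make_stage("haar_feature_3", 20),
]

# Made-up feature responses for two image windows.
face_like = {"haar_feature_1": 12, "haar_feature_2": 9, "haar_feature_3": 25}
background = {"haar_feature_1": 12, "haar_feature_2": 1, "haar_feature_3": 25}

print(run_cascade(face_like, stages))   # True: all three stages pass
calls.clear()
print(run_cascade(background, stages))  # False: fails at the second stage
print(calls)                            # the third stage was never evaluated
```

The early exit is the point of the ordering: most windows in an image are background and can be discarded after one or two cheap checks.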



• How do we combine a series of weak classifiers into a strong classifier?

– Adaboost tries out multiple weak classifiers over several rounds

– The best weak classifier in each round is selected

– Finally the best weak classifiers are combined
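The three steps above can be illustrated with a small Python/NumPy sketch of textbook AdaBoost using threshold classifiers ("stumps") as weak classifiers. This is an assumption-laden toy: the function names and data are made up, and the weight update is the standard textbook form, which is not necessarily the exact variant used in Viola-Jones.

```python
import numpy as np

def train_adaboost(X, y, rounds):
    """X: (samples, features) feature responses; y: labels in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)  # sample weights, uniform at the start
    strong = []              # selected (alpha, feature, threshold, polarity)
    for _ in range(rounds):
        best = None
        # Try every weak classifier: one feature, one threshold, one polarity.
        for f in range(d):
            for t in np.unique(X[:, f]):
                for p in (1, -1):
                    pred = np.where(p * X[:, f] >= p * t, 1, -1)
                    err = np.sum(w[pred != y])  # weighted error
                    if best is None or err < best[0]:
                        best = (err, f, t, p, pred)
        err, f, t, p, pred = best              # best weak classifier this round
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * np.log((1 - err) / err)  # its vote in the final decision
        strong.append((alpha, f, t, p))
        w *= np.exp(-alpha * y * pred)         # emphasise misclassified samples
        w /= w.sum()
    return strong

def predict(strong, X):
    """Weighted vote of the selected weak classifiers."""
    score = np.zeros(X.shape[0])
    for alpha, f, t, p in strong:
        score += alpha * np.where(p * X[:, f] >= p * t, 1, -1)
    return np.where(score >= 0, 1, -1)

# Toy data: one feature, positives at both ends, negatives in the middle.
# No single threshold separates them, but a combination of three does.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([1, 1, -1, -1, 1, 1])

one_round = predict(train_adaboost(X, y, rounds=1), X)
three_rounds = predict(train_adaboost(X, y, rounds=3), X)
print((one_round == y).mean())     # a single weak classifier makes mistakes
print((three_rounds == y).mean())  # the combined classifier fits the toy data
```

After each round the misclassified samples get more weight, so the next weak classifier is chosen for how well it handles the cases the earlier ones got wrong.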



• In other words, the Viola-Jones face detection algorithm is based on machine learning!

– Before we can detect a face we need to train the algorithm to find the “strong classifier”

– The training phase is done by testing all the weak classifiers on a set of both positive (images containing a face) and negative (images not containing a face) training data.


The Viola-Jones Algorithm: Summary






Now you know all about Computer Vision!


No you don’t!!!!

• BUT hopefully you have got an insight into the different levels of computer vision and you have seen some examples of operations at each level:

– Low-level vision

o Blurring, sharpening, edge detection

– Middle-level vision

o Extracting features / separating foreground objects from the background

– High-level vision

o Recognition and interpretation, e.g. face detection


Summary

• Typically computer vision software involves all 3 levels, for example:

1. Apply a sharpening filter to strengthen the edges and a Sobel edge detection filter to emphasise them

2. Extract the foreground objects from the background, e.g. find contours, lines, corners etc. The output is typically a binary image (such as an edge map)

3. On the binary image, apply algorithms that interpret the image or recognize a specific object in it


Thank you for listening!

[email protected]
