TRANSCRIPT
Applying image matching algorithms to video recognition and
autonomous robot navigation
Maxim Kamensky, CEO, Invarivision Dmitriy Yeremeyev, CTO, Invarivision
EECVC presentation, July 9 2016
Image matching algorithms

Feature-based algorithms:
● well-explored technology
● able to find partially occluded images
● able to find rotated images
● work slowly
● recognize only a small number of objects

Template-based algorithms:
● work fast
● able to store many images
● do not cope well with occluded images
● do not recognize rotated images
[Diagram: two pipelines. Feature-based: input image → feature extraction → feature vector → classification → object type (keywords: SURF, SIFT, ConvNet, etc). Template-based: input image → template → search in template database → object type (keyword: BiGG).]
AVM - Associative Video Memory
Templates - recognition matrices of sizes 3x3, 7x7, 15x15 and 31x31.
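The multi-scale recognition matrices above could be built roughly as follows. This is a hedged sketch: the actual resampling AVM uses is not described in the slides, so nearest-neighbour downsampling and the function names are illustrative assumptions.

```python
def downsample(image, size):
    """Resample a 2D grayscale image (list of lists) to size x size
    via nearest-neighbour sampling (assumed method)."""
    h, w = len(image), len(image[0])
    return [[image[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

def recognition_matrices(image, scales=(3, 7, 15, 31)):
    """Build the set of recognition matrices at the scales from the slide."""
    return {s: downsample(image, s) for s in scales}

# Usage: a synthetic 62x62 gradient image
img = [[r + c for c in range(62)] for r in range(62)]
pyramid = recognition_matrices(img)
print(sorted(pyramid))                       # [3, 7, 15, 31]
print(len(pyramid[7]), len(pyramid[7][0]))   # 7 7
```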
[Diagram: the associative tree. A root base branches into bases at Level 1 (Base 1, Base 2, ... Base n) and further down to Level m. Each associative base holds a recognition matrix plus associated data; images are read from / written to the tree, and a read returns the associated data.]
Template-based image matching algorithm
Technique of AVM testing

We also tested the AVM algorithm on images from the "Amsterdam Library of Object Images" (ALOI). The ALOI database contains several expressions of the same object, so it allows comparing how well the algorithm recognizes different expressions of the same object against how well it discriminates between different objects. To measure this we perform the steps listed below.
Separate the database into training and test parts:
● each object is rotated out of plane in 5-degree steps;
● training part: rotations 0, 10, 20, 30, ... 350 degrees; 36 expressions in total;
● testing part: rotations 5, 15, 25, ... 355 degrees; 36 expressions in total;
● do this for N objects from the dataset.
Create models from the training part:
● each model lives in a separate AVM;
● add the 36 training expressions to the AVM, with an 80x80 key image size for instance.

So for every object we have a separate AVM with 36 learned object expressions:
● match each model against each image of the test part of the database;
● take the model's maximal similarity response for a test image;
● if a model and a test image are of the same object, this is a genuine (same) matching pair;
● if a model and a test image are of different objects, this is an impostor (different) matching pair;
● now we have N * 36 genuine matching pairs and N * ((N – 1) * 36) impostor matching pairs.

We draw a special kind of ROC graph called a Decision Error Tradeoff (DET) graph:
● on the X axis: False Acceptance Rate (FAR), also called False Positive Rate (FPR);
● on the Y axis: False Rejection Rate (FRR), also called False Negative Rate (FNR);
● both axes are logarithmic.
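The FAR/FRR points that make up a DET graph can be computed from the genuine and impostor similarity scores as sketched below. The similarity values are made up for illustration; only the definitions of FAR and FRR come from the slides.

```python
def far_frr(genuine, impostor, threshold):
    """FAR = fraction of impostor pairs accepted at the threshold;
    FRR = fraction of genuine pairs rejected at the threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr

# Toy similarity scores (illustrative, not from the ALOI experiment)
genuine = [0.9, 0.8, 0.85, 0.95, 0.7]
impostor = [0.3, 0.4, 0.75, 0.2, 0.1]

for t in (0.5, 0.8):
    print(t, far_frr(genuine, impostor, t))
# 0.5 (0.2, 0.0)
# 0.8 (0.0, 0.2)
```

Sweeping the threshold over all observed scores and plotting FAR against FRR on log-log axes yields the DET curve described above.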
AVM performance
Time performance - average time of processing each image (in ms).
Tree capacity - total number of images in the tree.
(Intel® Xeon® CPU L5630 @ 2.13 GHz)
Object search in an image
● Object training (write).
● Sliding window (read): the scan step is 1/8 of the window size; the window size is scaled up by 25% on each step; the window position is adjusted by AVM.
● Result: object id, x, y, scale.
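The sliding-window scan can be sketched as a window-position generator. The 1/8 scan step and the 25% scale growth come from the slide; the starting window size of 80 and the stopping rule are assumptions for illustration.

```python
def sliding_windows(img_w, img_h, min_size=80, scale=1.25):
    """Yield (x, y, size) search windows over the image:
    scan step = 1/8 of the window size, window size grows by 25%
    per scale step (min_size=80 is an assumed starting size)."""
    size = min_size
    while size <= min(img_w, img_h):
        step = max(1, size // 8)
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                yield x, y, size
        size = int(size * scale)

# Usage: enumerate candidate windows in a 320x240 frame
wins = list(sliding_windows(320, 240))
print(len(wins), sorted({s for _, _, s in wins}))
```

Each window would then be matched against the AVM tree, which adjusts the window position and reports the object id and scale for the best match.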
Autonomous navigation of robots in indoor spaces
A navigation module based on AVM technology allows the robot to orient itself in a space and navigate precisely to a defined point on the map.
[Diagram: webcam images are matched against the AVM search tree, which stores pairs image -> X, Y and azimuth. If an image is recognized, the robot obtains its actual position: X, Y coordinates and azimuth.]
In our case, visual navigation for the robot is just a sequence of images with associated coordinates memorized in the AVM tree.
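That image-to-coordinates memory can be sketched as a lookup table. A real AVM matches live camera frames against the search tree; here a plain dict with a toy similarity function stands in for the tree, and the threshold, frames, and coordinates are made-up illustrations.

```python
def similarity(a, b):
    """Toy similarity: fraction of matching pixels between equal-size frames."""
    return sum(p == q for p, q in zip(a, b)) / len(a)

def localize(frame, memory, threshold=0.7):
    """Return the (x, y, azimuth) of the best-matching memorized image,
    or None when nothing is similar enough (threshold is an assumption)."""
    best_key = max(memory, key=lambda k: similarity(frame, k))
    if similarity(frame, best_key) >= threshold:
        return memory[best_key]
    return None

# Memory: image -> X, Y and azimuth, as on the slide
memory = {
    (1, 1, 0, 0): (0.0, 0.0, 90.0),
    (0, 0, 1, 1): (2.5, 1.0, 180.0),
}
print(localize((1, 1, 0, 1), memory))  # (0.0, 0.0, 90.0)
print(localize((9, 9, 9, 9), memory))  # None
```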
Use of AVM in robotics
Object tracking - "Follow me"
Implementation for RoboRealm - AVM Navigator
AVM Navigator is a module of the RoboRealm system that provides object recognition and autonomous robot navigation, using a single video camera on the robot as the main navigation sensor.
Localization error
The localization error is about 0.1 meter (10 centimeters).
Navigating outdoors
Route training and route recognition
Automatic searching of video fragments

[Diagram: each film frame becomes an image that is looked up in the AVM search trees of an s-core cluster (s-core #1 ... s-core #N); matches return a film ID and position from the database. The MultiTrack assembling module combines individual matches into video fragments #1 ... #M, each described by film name, position and length.]
MultiTrack - assembling of duplicates

[Diagram: an unknown scanned video is split into fragments #1-#4; the system maps them to source fragments and their known duplicates (source fragment #1 with duplicate videos #1.1-#1.3, source fragment #2 with duplicates #2.1-#2.2, source fragment #4 with duplicate #4.1) and reports the search results.]
Distributed system

[Diagram: a customer system talks over a REST API to the Invarivision - ISS base server (task management, database, s-core, s-coordinator), which fans out to node servers #1 ... #N, each running an s-core and an s-coordinator.]
All these servers can contain applications for video processing and image recognition.
s-coordinator - an application that coordinates video processing.
s-core - an application that reads/writes individual images in the search tree.
Software structure

[Diagram: incoming video passes through a frame change detector, which passes on about 12% of frames. Each passed image, together with its film ID and position, is distributed by the s-coordinator over Ethernet (UDP multicast) to s-cores #1 ... #N for write and read/search operations against the database.]
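The frame change detector that lets roughly 12% of frames through can be sketched as follows. The 12% figure is from the slide; the mean-absolute-difference metric and the threshold value are illustrative assumptions, not the actual implementation.

```python
def mean_abs_diff(a, b):
    """Mean absolute pixel difference between two frames, normalized to [0, 1]."""
    return sum(abs(p - q) for p, q in zip(a, b)) / (255 * len(a))

def frame_change_detector(frames, threshold=0.12):
    """Pass a frame only when it differs enough from the last passed frame
    (threshold is an assumed tuning knob, not the slide's 12% pass rate)."""
    last = None
    for frame in frames:
        if last is None or mean_abs_diff(frame, last) > threshold:
            last = frame
            yield frame

# Usage: tiny 1-D "frames"; near-duplicates are dropped
frames = [[0, 0, 0], [0, 0, 0], [200, 200, 200], [200, 201, 199]]
print(list(frame_change_detector(frames)))  # [[0, 0, 0], [200, 200, 200]]
```

Only the frames this filter passes are written into the search trees, which is what makes the RAM-per-hour figures later in the talk possible.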
Scaling of the search system

[Diagram: a computer cluster hosts an N x M network of search cores (s-core #1,1 ... s-core #N,M). One grid axis scales write speed and capacity; the other scales read speed.]
Scheme of the video write

[Diagram: video frames 1, 2, 3, 4, 5, 6, 7, 8, 9, ... are distributed across a 3x3 grid of s-cores (#1,1 ... #3,3), so consecutive frames land on different cores.]
Scheme of the video read

[Diagram: during search, video frames 1, 2, ... are read back through the same 3x3 grid of s-cores, with the cores queried in parallel.]
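The write/read diagrams suggest frames are striped over the grid of s-cores. A minimal sketch of such a mapping, assuming plain round-robin striping (the exact scheme is not stated on the slides):

```python
def core_for_frame(frame_index, n_cols, n_rows):
    """Map a 1-based frame index to a 1-based (col, row) s-core in the grid,
    round-robin over all n_cols * n_rows cores (assumed striping scheme)."""
    k = (frame_index - 1) % (n_cols * n_rows)
    return k % n_cols + 1, k // n_cols + 1

# Frames 1..9 over a 3x3 grid hit every core exactly once, then wrap around
print([core_for_frame(i, 3, 3) for i in range(1, 10)])
# [(1, 1), (2, 1), (3, 1), (1, 2), (2, 2), (3, 2), (1, 3), (2, 3), (3, 3)]
```

With such striping, a read of consecutive frames naturally fans out across all cores, which is why adding cores along one axis raises read speed.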
Tree splitting - scaling of capacity

[Diagram: when Tree #1 outgrows its capacity, it is split into Tree #1.1 and Tree #1.2.]
Capacity alignment - adding video for searching

[Diagram: an alignment system routes incoming images across Tree #1 ... Tree #N so that their capacities stay balanced from one stage to the next.]
Storage capacity
● RAM: 6.2 MB → 1 hour of video (with FCD at 12%).
● A server with 256 GB RAM → 41290 hours of source video available for searching.
● Using an SSD disk as swap space: 1.4 TB → 225806 hours.
● Speed with FCD set to 12% of frames, on one base server with two Xeon E5690 CPUs (3.47 GHz), is about 50 video hours per hour.
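The capacity figures above follow from the 6.2 MB-per-hour figure by simple division, as this sketch checks:

```python
# 6.2 MB of RAM stores 1 hour of video, with the frame change
# detector (FCD) passing 12% of frames -- figures from the slide.
MB_PER_HOUR = 6.2

def searchable_hours(ram_mb):
    """Hours of source video searchable in the given amount of memory."""
    return int(ram_mb / MB_PER_HOUR)

print(searchable_hours(256 * 1000))         # 256 GB server -> 41290 hours
print(searchable_hours(1.4 * 1000 * 1000))  # 1.4 TB SSD swap -> 225806 hours
```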
Interference resistance

[Figure: sample frames under the tested distortions - worst quality (CRF 51), padded corner 5% / 10% / 15%, padded center 5% / 10%, white noise 50% / 100%, 5 and 10 degrees rotated, cropped from center 15%, grayscale.]
Test results

Data set                  Average precision %  Average recall %  Average F-measure %
5 degrees rotated*        100                  93.82             96.81
10 degrees rotated*       100                  18.54             31.28
White noise 50%*          100                  98.09             99.03
White noise 100%*         100                  93.9              96.85
Padded center 5%*         100                  97.04             98.5
Padded center 10%*        100                  48.8              65.59
Padded corner 5%*         100                  97.84             98.91
Padded corner 10%*        100                  89.89             94.68
Padded corner 15%*        100                  41.12             58.28
Cropped from center 10%*  100                  96.54             98.24
Cropped from center 15%*  100                  67.44             80.55
Constant Rate Factor 51*  100                  96.53             98.23
Grayscale*                100                  97.5              98.73
For each scanned interval in the video we can define one of the following situations:
● True Positive (TP) - the system found the correct matching original interval;
● False Positive (FP) - the system found an incorrect matching original interval;
● False Negative (FN) - the system did not find a matching original interval although one exists.
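From these interval-level counts, the precision, recall, and F-measure reported in the results table follow by the standard formulas. A small sketch (the TP/FP/FN counts below are illustrative, not from the actual tests):

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall and F-measure from interval-level TP/FP/FN counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# With no false positives, precision is 100% and F tracks recall,
# matching the shape of the results table above.
p, r, f = precision_recall_f(tp=94, fp=0, fn=6)
print(round(100 * p, 2), round(100 * r, 2), round(100 * f, 2))  # 100.0 94.0 96.91
```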
Thank you for your attention!
Questions?
Site: Invarivision.com
Email: [email protected]
Skype: maxim.kamensky
Phone: +380662346738