TRANSCRIPT
Shape Recognition and Pose Estimation for Mobile Augmented Reality
Author: N. Hagbi, J. El-Sana, O. Bergig, and M. Billinghurst
Date: 2012-04-17
Speaker: Sian-Lin Hong
IEEE Transactions on Visualization and Computer Graphics, Vol. 17, No. 10, pp. 1369-1379, October 2011.
Outline
1. Introduction
2. Related Work
3. Nestor
4. Contextual Shape Learning
5. Experimental Results
1. Introduction (1/5)
1. Model-based visual tracking has become increasingly attractive in recent years in many domains
2. Visual tracking is often combined with object recognition tasks
3. In AR applications, model-based recognition and 3D pose estimation are often used for superposing computer-generated images over views of the real world in real time
1. Introduction (2/5)
1. Fiducial-based computer vision registration is popular in AR applications due to the simplicity and robustness it offers
2. Fiducials are of predefined shape, and commonly include a unique pattern for identification
1. Introduction (3/5)
1. Natural Feature Tracking (NFT) methods are becoming more common, as they are less obtrusive and do not require modifying the scene
2. This paper describes a recognition and pose estimation approach that is unobtrusive for various applications, and still maintains the high levels of accuracy and robustness offered by fiducial markers
3. We recognize and track shape contours by analyzing their structure
1. Introduction (4/5)
1. When a learned shape is recognized at runtime, its pose is estimated in each frame and augmentation can take place
1. Introduction (5/5)
1. Virtual content can be automatically assigned to new shapes according to a shape class library
2. When learning a new shape, the system can classify it into one of the predefined shape classes
3. The assigned class defines the default virtual content that should be automatically attached to the shape
2. Related work (1/2)
1. Object recognition and pose estimation are two central tasks in computer vision and Augmented Reality
2. Object recognition methods aim to identify objects in images according to their known description
3. The cores of AR applications are based on recognition and pose estimation to allow the appropriate virtual content to be registered and augmented onto the real world
2. Related work (2/2)
1. Fiducial-based registration methods have been used from the early days of AR
2. ARToolKit locates a square frame in the image and calculates its pose
3. The frame is first used for rectification of the pattern inside of it
3. Nestor (1/9)
1. Nestor is a recognition and 3D pose tracking system for planar shapes
2. The main goal of Nestor is:
• To serve as a registration solution for AR applications, which allows augmenting shapes with 3D virtual content
3. Nestor can be used to augment shapes that have visual meanings to humans with 3D models having contextual correspondence to them
3. Nestor (2/9)
1. Features extracted from each concavity are then used to generate a first estimate for the homography between each hypothesized library shape and the image shape
2. We calculate an estimate of the homography between the image and library shapes using features from all concavities
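As an illustration of this step, homography estimation from point correspondences can be sketched with a standard direct linear transform (DLT). This is a generic sketch, not the paper's exact estimator, and assumes at least four matched feature points (e.g., concavity tangency points) between the library shape and the image shape:

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate the 3x3 homography H mapping src -> dst via DLT.

    src, dst: (N, 2) arrays of corresponding points, N >= 4.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # H is the null-space direction of A: the right singular vector
    # associated with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(A))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pts):
    """Apply H to an (N, 2) array of points (homogeneous divide included)."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

With noisy features one would typically normalize the points first and wrap the DLT in a robust loop such as RANSAC.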
3. Nestor (3/9)
1. We begin the processing of each frame by extracting the contours of visible shapes
2. We generally assume the shapes are highly contrasted from their background and take a thresholding-based approach
3. We apply adaptive thresholding to the image using integral images
• A window of size 8×8 usually gives the most pleasing results
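The thresholding step can be sketched in the spirit of integral-image adaptive binarization (Bradley-Roth style); the window size and contrast parameter `t` below are illustrative choices, not the paper's exact values:

```python
import numpy as np

def adaptive_threshold(gray, window=8, t=0.15):
    """Binarize a grayscale image by comparing each pixel to the mean of a
    window x window neighborhood, computed in O(1) via an integral image."""
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    # Integral image with a zero row/column pad so window sums are lookups.
    integral = np.zeros((h + 1, w + 1))
    integral[1:, 1:] = gray.cumsum(axis=0).cumsum(axis=1)
    half = window // 2
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        for x in range(w):
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            area = (y1 - y0) * (x1 - x0)
            s = (integral[y1, x1] - integral[y0, x1]
                 - integral[y1, x0] + integral[y0, x0])
            # White if the pixel is brighter than (1 - t) x local mean.
            out[y, x] = 255 if gray[y, x] * area > s * (1.0 - t) else 0
    return out
```

A vectorized version (shifted views of the integral image) would be used in practice; the loops here are for clarity.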
3. Nestor (4/9)
1. The contour of each image shape is then extracted by straightforward sequential edge linking as an ordered list of points
2. Check for the convexity of contours and drop ones that are convex or close to convex
3. Finally apply median filtering to each contour and get smooth contours
C_I = (p_1, p_2, ..., p_n)
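The convexity check can be sketched by comparing a contour's area to the area of its convex hull; the 0.98 tolerance is an illustrative choice, not taken from the paper:

```python
import numpy as np

def shoelace_area(pts):
    """Signed area of a closed polygon given as an (N, 2) array."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y)

def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in CCW order."""
    pts = sorted(map(tuple, pts))
    if len(pts) <= 2:
        return np.asarray(pts, dtype=float)
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return np.asarray(lower[:-1] + upper[:-1], dtype=float)

def is_near_convex(contour, tol=0.98):
    """A contour is 'close to convex' when its area nearly fills its hull."""
    area = abs(shoelace_area(np.asarray(contour, dtype=float)))
    hull_area = abs(shoelace_area(convex_hull(contour)))
    return hull_area == 0 or area / hull_area >= tol
```

Near-convex contours carry almost no concavity structure, so they are useless for the concavity-based recognition that follows.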
3. Nestor (5/9)
1. We use a construction which is based on the bitangent lines to the contour, illustrated in Fig. 2a
2. Each bitangent line l gives two tangency points, Pa and Pb, which segment a concavity from the rest of the curve, known as the M-curve
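A rough sketch of this segmentation: convex-hull edges that bridge non-adjacent contour points play the role of bitangent lines, their endpoints are the tangency points Pa and Pb, and the contour arc between them is the concavity (M-curve). The index-based bookkeeping below is illustrative, not the paper's implementation:

```python
def concavities(contour, hull_indices):
    """Return concavity arcs as lists of contour indices.

    contour: ordered list of contour points (only its length is used here).
    hull_indices: indices of contour points that lie on the convex hull.
    """
    n = len(contour)
    hull_indices = sorted(hull_indices)
    arcs = []
    for i, a in enumerate(hull_indices):
        b = hull_indices[(i + 1) % len(hull_indices)]
        gap = (b - a) % n
        if gap > 1:  # the contour dips inward between the two tangency points
            arcs.append([(a + k) % n for k in range(gap + 1)])
    return arcs
```

Each returned arc starts and ends at the two tangency points of one bitangent line.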
3. Nestor (6/9)
1. The occluded shape may thus contain concavities that point to different library shapes
2. Since we are tracking recursively on a frame-to-frame basis, a shape can be tracked from previous frames
3. Nestor (7/9)
1. The system maintains a shape library that contains the shapes learned so far
2. The system can load a directory of shape files and learn them
3. The user can also teach the system new shapes at runtime
3. Nestor (8/9)
1. When teaching the system a new shape, the image goes through the same recognition step described in the Shape Recognition Section, and its signatures are hashed
2. The curve, its signatures, and additional required information are stored in the shape library
3. Once the shape is found, it is moved into the visible shape list
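A toy sketch of this learn/recognize bookkeeping, assuming signatures are fixed-length invariant vectors that can be quantized into hash keys. The class name, quantization step, and matching policy are all illustrative, not the paper's:

```python
import numpy as np

class ShapeLibrary:
    """Toy shape library: signatures are quantized into integer keys so a
    runtime signature can be matched by hash lookup."""

    def __init__(self, quant=0.05):
        self.quant = quant
        self.table = {}      # signature key -> list of shape names
        self.visible = []    # shapes recognized so far (visible shape list)

    def _key(self, signature):
        # Quantize the invariant signature so nearby signatures collide.
        return tuple(np.round(np.asarray(signature) / self.quant).astype(int))

    def learn(self, name, signature):
        self.table.setdefault(self._key(signature), []).append(name)

    def recognize(self, signature):
        matches = self.table.get(self._key(signature), [])
        if matches:
            self.visible.append(matches[0])
        return matches
```

A real system would hash several signatures per shape and verify candidates geometrically before declaring a match.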
3. Nestor (9/9)
1. The shape list is searched for each shape once per execution, when the shape first appears
2. This strategy can be useful when:
• Only a few shapes are visible in a single frame
• Only a small number of shapes are used throughout a single execution
4. Contextual shape learning (1/4)
1. Previously, to teach the system a new shape, the user had to:
• Show it frontally to the camera
• Explicitly assign a model to it
2. To learn an unknown shape appearing in the image, upon user request, we automatically perform rectification according to the rectifying transformation recovered from a tracked shape that lies in the same plane
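The rectification step can be sketched as applying the inverse of the homography recovered for a tracked coplanar shape. Here H is assumed to map reference-plane points to image points; the function name is illustrative:

```python
import numpy as np

def rectify_contour(contour, H):
    """Map image-plane contour points back to the fronto-parallel reference
    plane using the inverse of the tracked shape's homography H (3x3)."""
    pts = np.asarray(contour, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    rect = homog @ np.linalg.inv(H).T
    return rect[:, :2] / rect[:, 2:3]
```

Because the new shape lies in the same plane as the tracked one, the same H rectifies both.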
4. Contextual shape learning (2/4)
1. The nearest tracked shape NC to the new shape C is found according to the shapes' centroids
2. This projects C to the image plane outside of the image bounds and to a scale that depends on its location relative to NC in the real world
3. We finally centralize the rectified contour of C
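A minimal sketch of the centroid-based selection and the final centralization (function names are illustrative):

```python
import numpy as np

def centroid(contour):
    """Mean of the contour points, used as the shape's centroid."""
    return np.asarray(contour, dtype=float).mean(axis=0)

def nearest_tracked(new_contour, tracked):
    """Pick the tracked shape whose centroid is closest to the new shape's.

    tracked: list of (name, contour) pairs.
    """
    c = centroid(new_contour)
    return min(tracked, key=lambda nc: np.linalg.norm(centroid(nc[1]) - c))[0]

def centralize(contour):
    """Translate the contour so its centroid sits at the origin."""
    pts = np.asarray(contour, dtype=float)
    return pts - pts.mean(axis=0)
```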
4. Contextual shape learning (3/4)

4. Contextual shape learning (4/4)
[Embedded video clip (QuickTime / H.264) demonstrating the system]
5. Experimental results (1/6)
1. We benchmarked and tested Nestor on a Nokia N95 mobile phone and a Dell Latitude D630 notebook computer
2. The Nokia N95:
• 330 MHz processor
• Camera that captures 320×240 pixel images
3. The Dell notebook:
• 2.19 GHz processor
• Webcam that provides 640×480 pixel images
5. Experimental results (2/6)
1. We measured the relation between the number of tracked shapes in each frame and per-frame tracking time
5. Experimental results (3/6)
1. To assess this relation, we measured the recognition rate of the system with different shape library sizes and slants
5. Experimental results (4/6)
1. The experiment was performed using the notebook configuration
2. The camera was fixed approximately 40 cm from the shapes
3. For each library size, the recognition rate was tested on all of the shapes in the library
5. Experimental results (5/6)
1. We also measured the reprojection error for different distances of the camera from imaged shapes
2. For each library shape and ARToolKit fiducial, 50 randomly sampled points in the area of the shape/fiducial were checked using a random transformation synthesizer
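Reprojection error over sampled points can be sketched as the RMS distance between points mapped by a ground-truth and an estimated homography. This is a generic metric for illustration, not necessarily the paper's exact error definition:

```python
import numpy as np

def reprojection_error(H_true, H_est, pts):
    """RMS distance between points mapped by the ground-truth and the
    estimated homography (both 3x3; pts is an (N, 2) array)."""
    def proj(H, p):
        q = np.hstack([p, np.ones((len(p), 1))]) @ H.T
        return q[:, :2] / q[:, 2:3]
    d = proj(H_true, pts) - proj(H_est, pts)
    return float(np.sqrt((d ** 2).sum(axis=1).mean()))
```

With a random transformation synthesizer, H_true is the synthesized pose and H_est is the one recovered by the tracker from the warped image.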
5. Experimental results (6/6)