TRANSCRIPT
Shape Recognition and Pose Estimation for Mobile Augmented Reality
Author: N. Hagbi, J. El-Sana, O. Bergig, and M. Billinghurst
Date: 2012-04-17
Speaker: Sian-Lin Hong
IEEE Transactions on Visualization and Computer Graphics, Vol. 17, No. 10, pp. 1369-1379, October 2011.
Outline
1. Introduction
2. Related Work
3. Nestor
4. Contextual Shape Learning
5. Experimental Results
1. Introduction (1/5)
1. Model-based visual tracking has become increasingly attractive in recent years in many domains
2. Visual tracking is often combined with object recognition tasks
3. In AR applications, model-based recognition and 3D pose estimation are often used for superposing computer-generated images over views of the real world in real time
1. Introduction (2/5)
1. Fiducial-based computer vision registration is popular in AR applications due to the simplicity and robustness it offers
2. Fiducials are of predefined shape, and commonly include a unique pattern for identification
1. Introduction (3/5)
1. Natural Feature Tracking (NFT) methods are becoming more common, as they are less obtrusive and do not require modifying the scene
2. This paper describes a recognition and pose estimation approach that is unobtrusive for various applications, and still maintains the high levels of accuracy and robustness offered by fiducial markers
3. We recognize and track shape contours by analyzing their structure
1. Introduction (4/5)
1. When a learned shape is recognized at runtime, its pose is estimated in each frame and augmentation can take place
1. Introduction (5/5)
1. Virtual content can be automatically assigned to new shapes according to a shape class library
2. When learning a new shape, the system can classify it into one of the predefined shape classes
3. The assigned class defines the default virtual content that should be automatically attached to the shape
2. Related work (1/2)
1. Object recognition and pose estimation are two central tasks in computer vision and Augmented Reality
2. Object recognition methods aim to identify objects in images according to their known description
3. The cores of AR applications are based on recognition and pose estimation to allow the appropriate virtual content to be registered and augmented onto the real world
2. Related work (2/2)
1. Fiducial-based registration methods have been used from the early days of AR
2. ARToolKit locates a square frame in the image and calculates its pose
3. The frame is first used for rectification of the pattern inside of it
3. Nestor (1/9)
1. Nestor is a recognition and 3D pose tracking system for planar shapes
2. The main goal of Nestor is:
• To serve as a registration solution for AR applications, which allows augmenting shapes with 3D virtual content
3. Nestor can be used to augment shapes that have visual meanings to humans with 3D models having contextual correspondence to them
3. Nestor (2/9)
1. Features extracted from each concavity are then used to generate a first estimate for the homography between each hypothesized library shape and the image shape
2. We calculate an estimate of the homography between the image and library shapes using features from all concavities
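As an illustration of this step, homography estimation from point correspondences can be sketched with a standard direct linear transform (DLT). This is a generic sketch, not the paper's exact estimator, and assumes at least four matched feature points (e.g., concavity tangency points) between the library shape and the image shape:

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate the 3x3 homography H mapping src -> dst via DLT.

    src, dst: (N, 2) arrays of corresponding points, N >= 4.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # H is the null-space direction of A: the right singular vector
    # associated with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(A))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pts):
    """Apply H to an (N, 2) array of points (homogeneous divide included)."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

With noisy features one would typically normalize the points first and wrap the DLT in a robust loop such as RANSAC.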
3. Nestor (3/9)
1. We begin the processing of each frame by extracting the contours of visible shapes
2. We generally assume the shapes are highly contrasted from their background and take a thresholding-based approach
3. We apply adaptive thresholding to the image using integral images
• A window of size 8×8 usually gives the most pleasing results
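The thresholding step can be sketched in the spirit of integral-image adaptive binarization (Bradley-Roth style); the window size and contrast parameter `t` below are illustrative choices, not the paper's exact values:

```python
import numpy as np

def adaptive_threshold(gray, window=8, t=0.15):
    """Binarize a grayscale image by comparing each pixel to the mean of a
    window x window neighborhood, computed in O(1) via an integral image."""
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    # Integral image with a zero row/column pad so window sums are lookups.
    integral = np.zeros((h + 1, w + 1))
    integral[1:, 1:] = gray.cumsum(axis=0).cumsum(axis=1)
    half = window // 2
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        for x in range(w):
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            area = (y1 - y0) * (x1 - x0)
            s = (integral[y1, x1] - integral[y0, x1]
                 - integral[y1, x0] + integral[y0, x0])
            # White if the pixel is brighter than (1 - t) x local mean.
            out[y, x] = 255 if gray[y, x] * area > s * (1.0 - t) else 0
    return out
```

A vectorized version (shifted views of the integral image) would be used in practice; the loops here are for clarity.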
3. Nestor (4/9)
1. The contour of each image shape is then extracted by straightforward sequential edge linking as an ordered list of points
2. Check for the convexity of contours and drop ones that are convex or close to convex
3. Finally apply median filtering to each contour and get smooth contours
C_I = (p_1, p_2, ..., p_n)
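The convexity check can be sketched by comparing a contour's area to the area of its convex hull; the 0.98 tolerance is an illustrative choice, not taken from the paper:

```python
import numpy as np

def shoelace_area(pts):
    """Signed area of a closed polygon given as an (N, 2) array."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y)

def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in CCW order."""
    pts = sorted(map(tuple, pts))
    if len(pts) <= 2:
        return np.asarray(pts, dtype=float)
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return np.asarray(lower[:-1] + upper[:-1], dtype=float)

def is_near_convex(contour, tol=0.98):
    """A contour is 'close to convex' when its area nearly fills its hull."""
    area = abs(shoelace_area(np.asarray(contour, dtype=float)))
    hull_area = abs(shoelace_area(convex_hull(contour)))
    return hull_area == 0 or area / hull_area >= tol
```

Near-convex contours carry almost no concavity structure, so they are useless for the concavity-based recognition that follows.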
3. Nestor (5/9)
1. We use a construction which is based on the bitangent lines to the contour, illustrated in Fig. 2a
2. Each bitangent line l gives two tangency points, Pa and Pb, which segment a concavity from the rest of the curve, known as the M-curve
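A rough sketch of this segmentation: convex-hull edges that bridge non-adjacent contour points play the role of bitangent lines, their endpoints are the tangency points Pa and Pb, and the contour arc between them is the concavity (M-curve). The index-based bookkeeping below is illustrative, not the paper's implementation:

```python
def concavities(contour, hull_indices):
    """Return concavity arcs as lists of contour indices.

    contour: ordered list of contour points (only its length is used here).
    hull_indices: indices of contour points that lie on the convex hull.
    """
    n = len(contour)
    hull_indices = sorted(hull_indices)
    arcs = []
    for i, a in enumerate(hull_indices):
        b = hull_indices[(i + 1) % len(hull_indices)]
        gap = (b - a) % n
        if gap > 1:  # the contour dips inward between the two tangency points
            arcs.append([(a + k) % n for k in range(gap + 1)])
    return arcs
```

Each returned arc starts and ends at the two tangency points of one bitangent line.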
3. Nestor (6/9)
1. The occluded shape may thus contain concavities that point to different library shapes
2. Since we are tracking recursively on a frame-to-frame basis, a shape can be tracked from previous frames
3. Nestor (7/9)
1. The system maintains a shape library that contains the shapes learned so far
2. The system can load a directory of shape files and learn them
3. The user can also teach the system new shapes at runtime
3. Nestor (8/9)
1. When teaching the system a new shape, the image goes through the same recognition step described in the Shape Recognition Section, and its signatures are hashed
2. The curve, its signatures, and additional required information are stored in the shape library
3. Once the shape is found, it is moved into the visible shape list
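A toy sketch of this learn/recognize bookkeeping, assuming signatures are fixed-length invariant vectors that can be quantized into hash keys. The class name, quantization step, and matching policy are all illustrative, not the paper's:

```python
import numpy as np

class ShapeLibrary:
    """Toy shape library: signatures are quantized into integer keys so a
    runtime signature can be matched by hash lookup."""

    def __init__(self, quant=0.05):
        self.quant = quant
        self.table = {}      # signature key -> list of shape names
        self.visible = []    # shapes recognized so far (visible shape list)

    def _key(self, signature):
        # Quantize the invariant signature so nearby signatures collide.
        return tuple(np.round(np.asarray(signature) / self.quant).astype(int))

    def learn(self, name, signature):
        self.table.setdefault(self._key(signature), []).append(name)

    def recognize(self, signature):
        matches = self.table.get(self._key(signature), [])
        if matches:
            self.visible.append(matches[0])
        return matches
```

A real system would hash several signatures per shape and verify candidates geometrically before declaring a match.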
3. Nestor (9/9)
1. The shape list is searched for each shape once per execution, when the shape first appears
2. This strategy can be useful when:
• Only a few shapes are visible in a single frame
• Only a small number of shapes are used throughout a single execution
4. Contextual shape learning (1/4)
1. Previously, to teach the system a new shape, the user had to:
• Show it frontally to the camera
• Explicitly assign a model to it
2. To learn an unknown shape appearing in the image, upon user request, we automatically perform rectification according to the rectifying transformation recovered from a tracked shape that lies in the same plane
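The rectification step can be sketched as applying the inverse of the homography recovered for a tracked coplanar shape. Here H is assumed to map reference-plane points to image points; the function name is illustrative:

```python
import numpy as np

def rectify_contour(contour, H):
    """Map image-plane contour points back to the fronto-parallel reference
    plane using the inverse of the tracked shape's homography H (3x3)."""
    pts = np.asarray(contour, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    rect = homog @ np.linalg.inv(H).T
    return rect[:, :2] / rect[:, 2:3]
```

Because the new shape lies in the same plane as the tracked one, the same H rectifies both.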
4. Contextual shape learning (2/4)
1. The nearest tracked shape NC to the new shape C is found according to the shapes' centroids
2. This projects C to the image plane outside of the image bounds and to a scale that depends on its location relative to NC in the real world
3. We finally centralize the rectified contour of C
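A minimal sketch of the centroid-based selection and the final centralization (function names are illustrative):

```python
import numpy as np

def centroid(contour):
    """Mean of the contour points, used as the shape's centroid."""
    return np.asarray(contour, dtype=float).mean(axis=0)

def nearest_tracked(new_contour, tracked):
    """Pick the tracked shape whose centroid is closest to the new shape's.

    tracked: list of (name, contour) pairs.
    """
    c = centroid(new_contour)
    return min(tracked, key=lambda nc: np.linalg.norm(centroid(nc[1]) - c))[0]

def centralize(contour):
    """Translate the contour so its centroid sits at the origin."""
    pts = np.asarray(contour, dtype=float)
    return pts - pts.mean(axis=0)
```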
4. Contextual shape learning (3/4)

4. Contextual shape learning (4/4)
[Embedded video clip (QuickTime / H.264) demonstrating the system]
5. Experimental results (1/6)
1. We benchmarked and tested Nestor on a Nokia N95 mobile phone and a Dell Latitude D630 notebook computer
2. The Nokia N95:
• 330 MHz processor
• Camera that captures 320×240 pixel images
3. The Dell notebook:
• 2.19 GHz processor
• Webcam that provides 640×480 pixel images
5. Experimental results (2/6)
1. We measured the relation between the number of tracked shapes in each frame and per-frame tracking time
5. Experimental results (3/6)
1. To assess this relation, we measured the recognition rate of the system with different shape library sizes and slants
5. Experimental results (4/6)
1. The experiment was performed using the notebook configuration
2. The camera was fixed approximately 40 cm from the shapes
3. For each library size, the recognition rate was tested on all of the shapes in the library
5. Experimental results (5/6)
1. We also measured the reprojection error for different distances of the camera from imaged shapes
2. For each library shape and ARToolKit fiducial, 50 randomly sampled points in the area of the shape/fiducial were checked using a random transformation synthesizer
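Reprojection error over sampled points can be sketched as the RMS distance between points mapped by a ground-truth and an estimated homography. This is a generic metric for illustration, not necessarily the paper's exact error definition:

```python
import numpy as np

def reprojection_error(H_true, H_est, pts):
    """RMS distance between points mapped by the ground-truth and the
    estimated homography (both 3x3; pts is an (N, 2) array)."""
    def proj(H, p):
        q = np.hstack([p, np.ones((len(p), 1))]) @ H.T
        return q[:, :2] / q[:, 2:3]
    d = proj(H_true, pts) - proj(H_est, pts)
    return float(np.sqrt((d ** 2).sum(axis=1).mean()))
```

With a random transformation synthesizer, H_true is the synthesized pose and H_est is the one recovered by the tracker from the warped image.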
5. Experimental results (6/6)