walk&sketch: create floor plans with an rgb-d camera · walk&sketch: create floor plans ......

WalkampSketch Create Floor Plans with an RGB-D Camera

Ying ZhangPalo Alto Research Center

3333 Coyote Hill RdPalo Alto CA

yzhangparccom

Chuanjiang LuoOhio State University

2015 Neil AvenueColumbus OHluo75osuedu

Juan LiuPalo Alto Research Center

3333 Coyote Hill RdPalo Alto CA

jjliuparccom

ABSTRACTCreating floor plans for large areas via manual surveying islabor-intensive and error-prone In this paper we present asystem WalkampSketch that creates floor plans of an indoorenvironment by a person walking through the environment ata normal strolling pace and taking videos using a consumerRGB-D camera The method computes floor maps repre-sented by polylines from a 3D point cloud based on preciseframe-to-frame alignment It aligns a reference frame withthe floor and computes the frame-to-frame offsets from thecontinuous RGB-D input Line segments at a certain heightare extracted from the 3D point cloud and are merged toform a polyline map which can be further modified and an-notated by users The explored area is visualized as a se-quence of polygons providing users with the information oncoverage Experiments have done in various areas of an of-fice building and have shown encouraging results

Author Keywords2D3D mapping RGB-D cameras floor plans

ACM Classification KeywordsH52 Information interfaces and presentation Miscellaneous

General TermsAlgorithms Design Experimentation Measurement

MOTIVATION AND INTRODUCTIONRecent years have observed a dramatic cost reduction ofRGB-D cameras (eg Microsoftrsquos Kinect [17]) in the con-sumer market The cost of a 3D camera is only a fraction ofa 2D laser scanner commonly used in robotic mapping andlocalization Although RGB-D cameras are often limited inrange and viewing angles the low cost has encouraged re-search on using these devices for 3D modeling and map-ping [3 11 13 19] In addition images associated with thedepth information provide extra visual features for frame-to-frame matching which can be used to improve the robust-ness of mapping Image understanding can also be used for

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page To copy otherwise orrepublish to post on servers or to redistribute to lists requires prior specificpermission andor a feeUbiComp rsquo12 Sep 5-Sep 8 2012 Pittsburgh USACopyright 2012 ACM 978-1-4503-1224-01209$1000

Figure 1 Platform for WalkampSketch

automatic labeling of objects in the environment Comparedto 2D laser range-finders mapping using 3D cameras doesrequire much higher cost in computation However with theincreasing availability and decreasing cost of GPU-equippedlaptops 3D modeling and mapping can achieve real-timeperformance on mobile platforms [3] Comparing to densetracking and mapping with 2D cameras [20 14 2] RGB-Dcameras can be more robust to lighting and image featuresdue to the additional depth images

Prior research has achieved the capability of the 3D model-ing of an indoor environment using an RGB-D camera (eg[3]) In this paper we present a relevant and yet differentmapping application that creates a 2D floor plan by a personwalking through the environment at a normal leisurely walk-ing pace with an RGB-D camera This application is moti-vated by the fact that floor plans are important in space plan-ning navigation and other location-based services yet notall buildings have floor plans Furthermore remodeling orrepartition is common in residential and commercial build-ings which may render an existing floor plan obsolete Cur-rently the common practice to obtain a floor plan is by man-ual surveying This is labor-intensive and error-prone Thereis also a new app called MagicPlan [27] which createsfloor plans using smart phones or tablets however such amethod would not work for long hallways or multiple roomsOur application is designed to facilitate the building map-ping using technology advances in RGB-D cameras The re-sulted floor plan can be modified and annotated by humansjust as those in [27] or [5]

Figure 2 From RGB-D images to 2D floor plan and explored region

Our system WalkampSketch consists of a backpack with anRGB-D camera mounted on the top (Figure 1) The user car-ries the backpack and walks around the building at a normalstrolling pace As the user walks the floor plan is incremen-tally constructed hence the name WalkampSketch The cam-era is mounted slightly tilted (sim 5 - 10 degrees) toward thefloor Such a setting results in stable heights frame-by-framewith the floor plane in view from time to time as a horizon-tal reference Our method first aligns the reference framewith the floor using the first few frames ie we chooseto initially stand in an area so that the camera can see thefloor and walls then computes the frame-to-frame offsetsfrom the continuous RGB-D inputs while walking along thewalls in the environment Instead of creating 3D models asmany other research has done [3 11 13 19] using RGB-D cameras we select a horizontal slice at a certain heightand construct a 2D floor plan using connected line segmentsie polylines Along with the polyline representation forfloor plans we represent the explored area by a sequenceof polygons which is computed from the resulting polylinemap and the trajectory of the camera obtained by frame-to-frame matching The explored area polygons provide visualfeedback and guidance to the user (the person carrying thebackpack-mounted camera) The user may decide to nav-igate an already-explored area to further improve mappingaccuracy or navigate to new areas to further grow the floorplan Figure 2 shows the overall design of our system andFigure 3 shows an example of a multi-room floor map andexplored areas

Most existing work in mapping of indoor environments usedoccupancy grids [4] point clouds [11] and scattered line seg-ments [21] we advocate polyline maps [15] ie represent-ing a map using a set of connected linear segments as usedin floor plans Polyline maps provide a compact and ab-stract representation and can be easily manipulated by hu-mans using an interactive interface Polyline maps can alsobe used for indoor localization in ubiquitous applications[26] There has been research on getting 2D line sketchesfrom 3D laser scans in cluttered indoor environments [30] or3D line sketches from vision-based SLAM [6] where lineswere considered as features in SLAM computation and theresulting map would be a set of line features In our caselines are post-extracted from matched 3D point clouds ofthe current frame using both RGB and depth images andgrouped as polylines ie a connected set of directional seg-ments Polylines of the current frame are then merged withthe previous polyline map to form a new partial map

Figure 3 An example of polyline maps explored area is representedby overlapping polygons and problematic corners circled

To reduce the effect of noise and increase the computationefficiency we have used the assumption of orthogonality [21]in indoor environments Such an assumption has been usedin previous work such as [10 15] with promising resultsThe rectification greatly improves the quality of the result-ing map with simple computation for map merging

A prototype has been implemented in Matlab using pre-recordedRGB-D images saved at 10Hz from a Microsoft Kinect withwalking speed ( 05 meters per second) We have tested thesystem in various areas of our office building a conferenceroom hallways and a set of connected offices (Figure 3)The results are encouraging

The major contributions of this paper are as follows 1) De-sign and development of a system that efficiently creates a2D floor plan by walking and taking videos from a consumerRGB-D camera 2) design and implementation of a polylineextraction algorithm from 3D point clouds and 3) designand implementation of explored area representation by a se-quence of polygons

In the subsequent sections we first describe the method of3D image registration followed by the algorithms of poly-line map generation and explored area computation We thenshow some results from experiments in an office environ-ment Finally we conclude the paper with some future di-rections

Figure 4 Preprocessing (yellow blocks) and local matching (blue blocks) from RGB-D images to camera pose and error estimates

OBTAIN CAMERA TRAJECTORY FROM RGB-D IMAGESThere has been active research on 3D modeling and map-ping using RGB-D cameras The approach we developedis similar to that in [11] except our uncertainty modelingfor local matching fits better for ambiguity situations Ourmethod consists of three steps preprocessing local match-ing and global matching Preprocessing handles RGB-D im-age un-distortion and alignment and 3D point cloud down-sampling Local matching applies to two consecutive framesto find a consistent offset between them Global matchingseeks to improve alignment accuracy by registering two keyframes far apart in time yet sharing similar image contentThe reoccurence of image content is due to revisits for in-stance the camera returns to a position that it has viewedbefore In this case a matching pair of key frames are de-tected and we apply an error distribution method to improvethe 3D image registration results Loop closure is performedto generate a globally consistent map

PreprocessingKinect consists of an RGB camera an infrared camera andan infrared projector [17] The distortion from the Kinectsensors is non-negligible We have used the method imple-mented in [1] to calibrate the cameras and use the calibrationresult to undistort both RGB images and depth images Thenwe apply the following method to align the RGB image withthe depth image The stereo configuration gives the relativepose between the RGB camera and the infrared camera sothat mapping a point in the depth image to the RGB image isstraightforward Nevertheless mapping a point in the RGBimage back to the depth image needs costly search opera-tions To avoid this complication for every point in the depthimage we find its corresponding location in the RGB imageand calculate its RGB value by interpolation ie for eachpoint in the depth image we get its RGB values resulting apair of RGB-D image

Kinectrsquos depth image is obtained by correlating the imagefrom the infrared camera with the pattern projected from theinfrared projector The correlation procedure is far from per-fect especially in regions with depth discontinuity In par-ticular suppose two adjacent points a Kinect camera sees arefrom two different surface instead of an abrupt depth jumpit will look like there is another plane connecting the twopoints In other words the 3D space looks smoother thanit actually is This artifacts is detrimental to the subsequentmatching phase As such we have to identify and remove

those fake depths whose points lie collinear with the opticalcenter of the infrared camera

The original RGB-D images from the Kinect are in a reso-lution of 640x480 or more than 300000 points Computa-tionally it is challenging to perform frame-to-frame match-ing such as the iterative closest point (ICP) algorithm on thismany point samples in a real-time or close-to real-time sys-tem down-sampling is beneficial In order to have betteraccuracy for the matching instead of down-sampling pointsuniformly in the image coordinate we down-sample pointsaccording to the depth of the points ie points within thevalid range (08m- 5m) are down-sampled less and hencedenser samples surviving than points out of range becausepoints within the valid range are more accurate for matchingIn particular the size of the down-sample grid varies with thedepth of the points Each down-sampled point takes the aver-age of its depth values within the down-sample grid Down-sampling rates vary from frame-to-frame After down-samplingwe normally get from 1000 to 20000 points for ICP and 2Dpolyline extraction

Local matchingLocal matching aligns adjacent frames We take a similarapproach as in [11 3] with some modification Figure 4shows an overall flowchart of the preprocessing and localmatching procedures The output of this step is a 6D cam-era pose X = (x y z α β γ) or its 3D transformation Tfor the current frame and a 3D covariance error matrix Cfor the translation part of the pose estimation reflecting theuncertainty

We first extract features from the two RGB images usingscale invariant feature transform (SIFT) [16] descriptors im-plemented in [28] SIFT identifies a set of key-points basedon local image features that are consistent under illumina-tion conditions viewing position and scale SIFT featuresare distinctive and are commonly used for image matchingSIFT features between frames are matched using randomsample consensus (RANSAC) [23] an algorithm that iter-atively estimates the model parameters from a data set con-taining inliers and outliers A model is fitted using inliersand then other data are tested against the model fitted andclassified again into inliers and outliers The model with themost inliers will be chosen However SIFT and RANSACare error-prone in indoor environments For instance repet-itive patterns such as patterns on carpet and wallpaper can

result in ambiguities in matching To account for repetitivepatterns in the indoor environment we modify RANSAC tobe a maximum likelihood algorithm Specifically we as-sume that the speed of the movement of the camera is con-stant with some Gaussian noise ie if Xk is the pose offrame k

Xk+1 minusXk = Xk minusXkminus1 + ∆X (1)

where ∆X are random noise with Gaussian distribution Letthe matching score for pose X be m(X) Instead of choos-ing argX maxm(X) we use argX max[m(X)p(X = X prime)]where X prime is the predicted pose for this frame according toEq 1 In other words we are using prior knowledge regard-ing the continuity of motion to eliminate ambiguities in localframe matching

Using RANSAC we find the best camera rotation and trans-lation between the two feature sets While not accurate enoughthis gives a good initial estimate We will then use ICP [12]to refine the estimation The idea of ICP is that points in asource cloud are matched iteratively with their nearest neigh-bors in the target cloud and a rigid transformation is es-timated by minimizing error between matched pairs Theprocess iterates until convergence Because of the partialoverlap we only focus on point pairs within the intersectionThere are several variants of ICP Theoretically generalizedICP proposed in [25] is the best but it is also pretty costlyIn our experiments we observed that the point to plane ver-sion [24] works well enough The down-sampled 3D pointclouds of the two neighboring frames are used for the march-ing starting from the initial estimate from RANSAC Theoutput of ICP is the final pose estimate and a set of matchedpoints

While the 3D transformation between frames is our target forlocal matching computation it is almost equally importantto keep track of the uncertainty in the estimation process Inthe global matching stage (the next subsection) we will bedetecting loops between far apart frames When a loop isdetected the cumulative error between two ends of the loopcould be significant To make the two ends meet the cumu-lative error need to be distributed to individual frames Thisimproves the overall mapping accuracy The uncertainty inthe local matching stage gives us an intuitive guideline tohow error should be distributed

The uncertainty of the pose estimation is represented by itscovariance matrix Given the matched point pairs pj andpprimej suppose pj is in the first frame we approximate its localtangent plane with the k nearest neighbors by principle com-ponent analysis (PCA) It gives the normal direction n1 andtwo other basis vector n2 n3 in the tangent plane Becausein the ICP algorithm we adopt the point to plane strategypoints can slide arbitrarily in the tangent plane As a re-sult we do not have any confidence how well pj matchespprimej in the tangent plane The distance between the two localtangent planes provides a quality measure along the normaldirection As such we calculate the covariance matrix of thematch between pj and pprimej as follows Let the distances ofthe k-nearest neighbors of pprimej to the tangent plane of pj be

Figure 5 Frame-to-frame constraints along a sequence of frames

di i = 1 k the average of d2i is taken as the varianceσ2j in the direction of n1 To bring the covariance matrix to

the coordinate system of this frame the rotation matrix Rj

from PCA needs to be applied

Cj = Rj

σ2j 0 0

0 infin 00 0 infin

RTj (2)

where Rj = [n1 n2 n3] is the rotational matrix for theplane To obtain the covariance matrix of the pose estima-tion between the two adjacent frame the average of Cj of allmatched pairs should be taken We use the harmonic meanof Cj as the standard arithmetic average is not feasible dueto the infin entry in Cj We have the covariance matrix be-tween the matched frames as

C =nsum

j Cminus1j

(3)

where n is the total number of matching points between the

two frames and Cminus1j = Rj

σminus2j 0 00 0 00 0 0

RTj

Global matchingDue to imprecise pose estimation over time error can accu-mulate which causes the estimation of the camera positionto drift over time leading to inaccuracies in the map This ismost noticeable when camera moves over a long path even-tually return to the location previously visited To addressthis issue global optimization is needed We can representconstraints between frames with a graph structure If a loopis detected it is represented as constraints between framesthat are not adjacent (from frame k to frame i in Figure 5)

We design a two-step global matching process

bull Key frame identification this step involves the identifi-cation of key frames Let the first frame be a key frameF0 At each time after local matching with the previousframe we also check if this frame can match the previouskey frame Fiminus1 If it matches continue to process the nextframe if not set the current frame be the next key frameFi We compute the covariance matrix between Fiminus1 andFi using Cj between the two frames and throw away allthe frames between the two to save storage

bull Loop identification and closure after a new key frameFk is identified we check if this key frame matches anyprevious key frames Fkminus1 F0 If we can identify amatch from Fk to Fi loop closure will be performed onthe identified loop (Figure 5)

There are several methods in the literature to do loop clo-sure [18 7 8 9] However the full six degree constraint isnonlinear optimization is expensive Instead of doing a gen-eral optimization as in [9] we have used a simple strategyto do the optimization for the rotational part and the trans-lation part separately as described in [18] We argue thatfor our applications in which most images are walls or rec-tilinear objects the main error source comes from the am-biguity in the uncertainty in translations due to the lack offeatures either visually or geometrically The rotational partis much less noisy than the translation part We simply dis-tribute the rotational error among the frames evenly to closethe loop of rotational offset using the method described inSection 513 in [18] In particular assume the rotation ma-trix from key frame i to key frame k be R which can betransformed into a quaternion q = (q0 qx qy qz) with therotation axis a = (qx qy qz)

radic1minus q20 and the rotation an-

gle θ = 2 arccos q0 Rj i lt j lt k is obtained by settingthe angle offset θ(kminus i) along the axis a between any twoconsecutive key frames in the loop

Instead of evenly distribute the translation error along theframes in the loop we spread the error according to their co-variance matrix computed in the local matching stage (de-scribed in the last subsection) Specifically if the transla-tional offset from frame j to j + 1 is Dj with covariancematrix Cj and from frame k to frame i is Dk with the co-variance matrix Ck (Figure 5) we simply minimize the fol-lowing energy function

W =

kminus1sumj=i

(Xj+1 minusXj minusDj)TCminus1j (Xj+1 minusXj minusDj)

+(Xi minusXk minusDk)TCminus1k (Xi minusXk minusDk)

where Xj is the translation of the frame j It turns out thatthis is a linear optimization and Xj i lt j le k can be ob-tained by a simple matrix inversion [18]

The 3D transformation T of the current frame is obtainedby combining the rotation matrix R and the translation X The sequence of transformations T0 T1 T2 of the keyframes represents the trajectory of the camera poses Weuse the 3D trajectory to compute 2D polyline maps for floorplans

GENERATE POLYLINES FROM CAMERA TRAJECTORYGiven the transformation of the current key frame T and theset of 3D points Pc in the current camera frame we obtaina subset of points corresponding to the plane at height hie Ph = p isin Pc |y(T middot p) minus h| lt ε where y(p) is yvalue of a 3D point p We then extract and rectify line seg-ments in Ph and merge the new set of line segments with thepartial map obtained from the previous key frames to forma new partial map The choice of line segment representa-tion is motivated by convenience of users mdash human is usedto line drawing rather than occupancy grid and point cloudand such a representation is already used in floor plan draw-ings The line segment is also concise and hence efficient interms of memory and computation

Figure 6 Procedures for segmentation linearization and clustering

SegmentationGiven a 2D point cloud there exist many methods for lineextractions [22] We present here a simple method that ap-plies to an ordered set of points Segmentation on an orderedset of points is much more efficient than that on an order-lessset of points Given the set of points in a horizontal plane Ph

from the current camera frame we order the points by theviewing angle in the x-z plane ie p1 lt p2 if and only ifx(p1)z(p1) lt x(p2)z(p2) Let Po be the ordered set ofpoints Ph in the reference frame ie Po = T middotp p isin Phwe get an ordered set of 2D points P in the reference frameby projecting Po to the x-z plane in the reference frame

Given the set of ordered points P in the 2D plane we obtaina set of polyline segments Two consecutive points belongto the same polyline segment if and only if the distance be-tween them is smaller than a given threshold eg 03 me-ters The goal of linearization is to generate line segmentswhich are primitives of map construction There is a balancebetween two considerations (1) a line segment is preferredto be long to give a concise representation and (2) a linesegment need to have a small linearization error to maintainhigh accuracy Our algorithm is designed to strike a suitablebalance described as follows

bull linearize Start from the first point in the polyline seg-ment For each point in the segment in the sequence weextend the line segment as follows Let p0 = 〈x0 y0〉 bethe first point in the line segment and pl = 〈xl yl〉 be theprevious point and pc = 〈xc yc〉 be the current point Letthe distance between p0 and pl be l the distance betweenpl and pc be d and the distance from pc to the line p0-pl bedprime (Figure 6 left) The line p0-pl will be extended to p0-pcif and only if l lt lmin or ddprime lt K where lmin is theminimum length of a line and 0 lt K lt 1 is a constant Inour setting lmin = 03 meters and K = 01 If pc extendsthe line segment we have pl larr pc and continue If pcdoes not extend the current line we start a new line withp0 larr pl and pl larr pc and continue

bull cluster Given a sequence of line segments after lineariza-tion we compute the triangular areas formed by two con-secutive lines LetAi be the area formed by (piminus1 pi) and(pi pi+1) (Figure 6 right) The two lines can be mergedinto a new line (piminus1 pi+1) if and only if (1)Ai = minj Aj ieAi is the smallest among all other areasAj formed bytwo lines at j and (2) Ai lt Amin ie it is smaller thanthe minimum size in our caseAmin = 01 square metersThe algorithm iterates until all areas are larger than Amin

or there is only one line left in the polyline segment

After segmentation we obtain a set of polyline segmentsfrom the current 2D point cloud representing mostly wallsegments

RectificationWe make an assumption that most wall segments in indoorenvironment are rectilinear [15 21] In fact we position thecamera such that it will target at walls most of the times Therectilinear assumption greatly reduces computation for mapmerging We will point out an extension to this assumptionat the end of this section

Unlike previous work on rectification [15 21] we first usethe statistical distribution of the directions of line segmentsto compute the orientation of the walls using the first fewframes which we are sure are shooting at walls Given thedirection of walls all line segments of the future frames arethen rectified The details follow

bull Orientation Let L = (li αi) i isin 1N be the setof N line segments where li is the length and minusπ ltαi le π is the angle of the irsquoth segment respectively Wecompute the wall direction α = arctan(s c)4 wheres =

sumi wi sin(4αi) c =

sumi wi cos(4αi) and wi =

lisumj lj

In other words the wall direction α isin (minusπ4 π4]

is the weighted average of the normalized directions in(minusπ4 π4] of all line segments in this frame To ver-ify if this frame consists of mostly wall segments wecompute variance σ2

s =sumwi(sin(4αi) minus sin(4α))2 and

σ2c =

sumwi(cos(4αi) minus cos(4α))2 If we have large σs

or σc either we have a noisy frame or wall segments arenot rectilinear We will use first few frames to computethe wall orientation and use value α that has minimumuncertainty ie smallest variances as the result

bull Rectification To make sure that the frame consists ofmostly wall segments we compute the wall orientationfor each frame If the orientation differs a large amountfrom the wall orientation α the frame will not be used formap merging otherwise we rotate the current frame byminusα to align with the walls The set of line segments in therotated frame will then be rectified to one of the rectilinearangles 0 π2 πminusπ2 that is closest to its direction

The result of rectification is a set of rectified polyline seg-ments We will then merge the set of polyline segments withthe partial sketch map generated by the previous frames toform a new partial sketch map In cases where many diago-nal or circular walls are present we can extend the rectifica-tion to eight or more directions instead of four For the restof this paper however we focus only on rectilinear walls

Map mergingA polyline map is represented by a set of polylines Insteadof merging two polyline maps we first decompose the poly-line maps to four groups of line segments each in one of thedirections 0 π2 πminusπ2 We merge lines that are over-lapping in the same direction Finally we clean up the sketchmap by removing small segments and recreate the polylinemap from the merged sketch map The details follow

Figure 7 Merging two lines close-by in the same direction

Figure 8 Sketch map red and green lines are of opposite directions

bull Sketch merging For each of the four directions we repeatmerging two overlapping lines until no lines are overlap-ping Two lines are overlapping if and only if (1) thevertical gap between the two lines are less than G and(2) either the line are overlapping or the horizontal gapbetween the lines is less than G In our case G is 02 me-ters Given l1 and l2 the the lengths of two overlappinglines in the same direction the merged line l is parallelto the two lines with the two ends at the boundary of therectangular box formed by these two lines (Figure 7) Letd1 and d2 be the distances from the merged line to l1 andl2 respectively we have d1l1 = d2l2 In other words themerged line is closer to the longer line Figure 8 shows anexample of a sketch map out of the 2D point cloud (bluedots) where red and green show lines of opposite direc-tions

bull Intersection identification Due to noise in images and ap-proximation in processing we will get sketches with linescrossing each other in different directions (Figure 9 left)We identify three types of intersect X-Y Y -X and badintersect (Figure 10) where X-Y intersections start fromX-line and end at Y -line Y -X intersections start at Y -

Figure 9 Sketch and correspondent polylines

Figure 10 Intersection types for sketches

line and end at X-line and bad intersections are invalidresulted from noisy images and misalignments in trajec-tory matching In particular an intersection type is X-Yif and only if x2 lt ε and y1 lt ε it is Y -X if and only ifx1 lt ε and y2 lt ε One important feature of this workis to identify those bad intersects so that human users canadjust the map afterwords

bull Polyline extraction Start from any valid intersect P oflines lx and ly ndash without loss of generality we assume itis of type X-Y ndash we do forward search as follows Anintersect P prime of lprimex and lprimey follows P if and only if P prime isof opposite type ie Y -X ly = lprimey and P prime is the firstintersect along the direction of Y Similarly we searchbackward to find P primeprime of lprimeprimex and lprimeprimey which precedes P ifand only if P primeprime is of type Y -X lprimeprimex = lx and P primeprime is thefirst intersect along the opposite direction of X We markall the intersects used in this search and create a polylineof all the intersects in the chain The process continuesuntil no more valid intersects exist We then add back allunused lines which do not have any intersects with otherlines The result of this process produces a set of polylinesegments which will be viewed as a partial map at the time(Figure 9 right) The corresponding polyline map of thesketch map in Figure 8 is shown in Figure 3

In implementation we keep both the partial sketch map andthe partial polyline map so that the sketch map will be mergedwith the new sketch produced from frame to frame and poly-line map can be viewed in real time

OBTAIN EXPLORED AREA FROM MAP AND TRAJECTORYExplored area is defined to be the area swiped by a sensoralong its motion trajectory In exploration or search and res-cue situations it is important to obtain the explored area so

Figure 11 Viewing polygon vs non-viewing polygon

Figure 12 Viewing polygons of sensors

that it is more efficient to discover the space unexplored Inprevious work the explored area is mostly represented bya grid map with 0 as empty 1 as occupied and 05 as un-known Although we can convert a polyline map to a gridmap and compute the explored area by the line of sight ofsensor coverage it will not be efficient In this work wecompute a viewing polygon directly given a polyline mapand the camerarsquos position and orientation We then obtainthe explored area by a sequence of viewing polygons alongthe trajectory

A viewing polygon is a polygon such that (1) there is an ori-gin pointO and (2) there is only one intersection for any linefrom the origin towards any edge of the polygon (Figure 11)We call any line from the origin a viewing line A viewingpolygon will be represented by a sequence of points startingfrom the origin Note that viewing polygons may not be con-vex A view of any directional sensor such as a camera ora laser scanner can be approximated by a viewing polygon(Figure 12) Given an initial viewing polygon of a sensorwe clip the viewing polygon by the lines in the polyline mapthat intersect with it and obtain a new viewing polygon thatis not blocked by any line Given any line that intersectswith the viewing polygon there are three cases (Figure 13)(1) both ends are in the polygon (2) one end in and one endout and (3) both ends are out For each in-end we com-pute the intersection of the viewing line extended from thein-end to one of the edges of the polygon For the out-endwe compute the intersection of the line with one of the edgesof the polygon We order all the boundary intersections andin-ends in an increasing order of the viewing angles Letthe initial viewing polygon P = (P1 P2 PN ) be the or-

dered set of N points starting from the origin The index ofa boundary intersection point is k if the point is between Pk

and Pk+1 A new viewing polygon cut by a line is obtainedas follows

bull Both ends in there are zero or even number of inter-sections of this line with the polygon Let I1 and I2 bethe two in-ends of the line and B1 and B2 are the cor-respondent boundary points Let the index of B1 be kand B2 be j k le j If there are zero number of inter-sections (top of Figure 13) the new polygon would beP (1 k) B1 I1 I2 B2 P (j + 1 N) If there areeven number of intersections (Figure 13 bottom) we haveto insert additional boundary points and remove the origi-nal points that are outside of the line Let i1 i2 i2n bethe indexes of the intersection points in order The pointsbetween i2kminus1 and i2k remain in the new polygon but thepoints between i2k and i2k+1 are outside of the new poly-gon

bull One end in and one end out there is one or odd num-ber of intersections of this line with the polygon Withoutloss of generality assume the in-end I is before the out-end ie B1 has index k and B2 has index j k le jIf there is only one intersection B2 the new polygon isP (1 k) B1 I B2 P (j + 1 N) If there are morethan one intersections ie i1 i2 i2n+1 similar to theprevious case the points between i2kminus1 and i2k remain inthe new polygon but the points between i2k and i2k+1 areoutside of the new polygon

bull Both ends out there are two or even number of intersec-tions of this line with the polygon Let B1 and B2 be thetwo intersection points of the line and the polygon Letthe index of B1 be k and B2 be j k le j If these are theonly two intersections (top of Figure 13) the new polygonwould be P (1 k) B1 B2 P (j + 1 N) If there aremore than two intersections (Figure 13 bottom) we haveto insert additional boundary points and remove the origi-nal points that are outside of the line Let i1 i2 i2n bethe indexes of the intersection points in order The pointsbetween i2kminus1 and i2k are outside of the new polygonbut the points between i2k and i2k+1 remain in the newpolygon

To compute a viewing polygon for a camera pose the orig-inal viewing polygon (Figure 12) will be cut by each ofthe lines that intersect with it in the current polyline mapViewing polygons are computed only when the camera movedsignificantly since the last time a polygon was computed Tocalculate the difference between two frames we use

d = λ||xminus xprime||2D2 + (1minus λ) sin2(αminus αprime) (4)

where x and α (xprime and αprime) are the 2D location and headingof the current (previous) frame respectively In our casewe have D set to 5 meters λ is 05 and d gt 01 triggers anew polygon computation A sequence of viewing polygonsforms the explored region by the camera For the example inFigure 3 we get 1000+ key frames but only 70+ polygonsfor representing the explored space

Figure 13 Clipping viewing polygons top - simple bottom - complex

EXPERIMENTS IN OFFICE ENVIRONMENTSExperiments were performed on various areas of our officebuilding including a conference room corridors and a set ofconnected offices Our data were taken at 10Hz in walkingspeed ( 05m per second) and processed off-line using Mat-lab The processing time for generating camera trajectory isabout 2-3 seconds per frame on ThinkPad T420 (3M Cache23 GHz) The time from camera trajectory to 2D polylinemap is only 0007 seconds per key frame almost negligiblecomparing to the computation time for camera poses

Figure 14 shows four additional examples with polyline map-ping and explored areas from top a conference room alooped corridor an open space and an open space with twoconnected rooms The sizes of areas of our experiments sofar are under 20 meters by 20 meters

Kinect cameras after calibration (with undistorted images)offer good accuracy (error within 1 depth values) Im-age registration or camera pose computation introduces er-ror when the matching frames are lack of visual or geo-metric features Line segmentation rectification and mapmerging introduce additional error when the image has non-rectilinear lines Overall we found that the 2D floor plansgenerated so far have very good accuracy in measurementscales less than 03 meters in most cases In addition oursystem offers a unique feature that identifies problematic linecrossings which provides areas for human correction afterthe map generation

The explored area provides information on holes that needfurther exploration In two of the examples in Figure 14 thecamera did not cover the space in the middle so it would begood to take additional images in the middle in order to coverthe whole space It also shows if an area has been exploredbefore so that one can avoid wasting time taking videos ofprevious explored areas The efficient representation of theexplored areas can be implemented on small handheld de-vices and used for search and rescue applications

minus4 minus2 0 2 4 6 8

minus3

minus2

minus1

0

1

2

3

4

5

6

7

minus20 minus15 minus10 minus5 0 5 10

minus5

0

5

10

15

minus10 minus8 minus6 minus4 minus2 0 2 4 6 8

minus8

minus6

minus4

minus2

0

2

4

6

minus6 minus4 minus2 0 2 4 6 8 10 12 14

minus6

minus4

minus2

0

2

4

6

8

Figure 14 Four mapping examples from top a conference room alooped corridor an open space and an open space with two connectedrooms

CONCLUSIONS AND FUTURE WORKWe have presented a system WalkampSketch to create indoorpolyline maps for floor plans by a consumer RGB-D cam-era via walking through the environment The prototype hasa proof of concept implementation in Matlab using savedvideo images with processing time about 2-3 seconds perframe Videos have been taken in various areas of our officebuilding The resulting 2D floor plans have good accuracywith error less than 03 meters

One challenge to create maps in real-time is to extract re-liable camera trajectory in real-time while 2D floor planscan be already generated in real-time given the camera tra-jectory According to the work presented in [3] one canprocess 3-4 frames in a second if a gaming laptop is usedwith GPU SIFT implementation [29] We have used similarand simplified computations as [3] and our implementationis in Matlab which is known to be several magnitude orderslower than a C implementation for iterative operations Weare confident that real-time processing is possible and thiswill be one direction for our future work Another directionis to speedup the image registration algorithm by optimizingor simplifying some of the computations in frame-to-framematching

After we have a real-time mapping system we will work onlarger areas to demonstrate the reliability and robustness ofsuch a system With real-time mapping we can have inter-active procedures similar to [3] to hint the quality of frame-to-frame matching during video taking For a large buildingone may stop and restart from the same spot using imagematching to the last frame as described in [3] We will alsoextend our algorithms to cases where diagonal or circularwalls exist for more general floor plans

We will develop a user interface so that one can view mod-ify and annotate maps being generated similar to what canbe done in [5] We would also like to add features in theenvironment such as windows and doors and objects suchas tables and chairs automatically to the floor plan usingsome image classification techniques In addition to vis-ible features one can also embed radio fingerprint to themap for localization purpose Such extensions would makeWalkampSketch more effective and efficient

ACKNOWLEDGEMENTSWe thank Dr Craig Eldershaw for putting together a proto-type system (Figure 1) for our experimentation We are alsograteful for valuable comments from anonymous reviewerswhich helped us improving the quality of this paper Thisproject was sponsored by Palo Alto Research Center via sup-plementary research funding

REFERENCES1 J-Y Bouguet Camera calibration toolbox for Matlabhttpwwwvisioncaltechedubouguetjcalib_doc 2012

2 A J Davison I D Reid N D Molton and O StasseMonoSLAM Real-time single camera SLAM IEEE

Transactions on Pattern Analysis and MachineIntelligence 29 2007

3 H Du P Henry X Ren M Cheng D B GoldmanS Seitz and D Fox Interactive 3D modeling of indoorenvironments with a consumer depth camera In ACMConference on Ubiquitous Computing 2011

4 A Elfes Using occupancy grids for mobile robotperception and navigation Computer 2246 ndash 57 June1989

5 Floor Planner The easiest way to create floor planshttpfloorplannercom 2012

6 A P Gee and W Mayol-Cuevas Real-timemodel-based SLAM using line segments In Advancesin Visual Computing volume 4292 Springer 2006

7 G Grisetti R Kummerle C Stachniss U Frese andC Hertzberg Hierarchical optimization on manifoldsfor online 2D and 3D mapping In IEEE InternationalConference on Robotics and Automation pages273ndash278 2010

8 G Grisetti C Stachniss and W Burgard Non-linearconstraint network optimization for efficient maplearning IEEE Transactions on IntelligentTransportation Systems 10428ndash439 2009 ISSN1524-9050

9 G Grisetti C Stachniss S Grzonka and W BurgardA tree parameterization for efficiently computingmaximum likelihood maps using gradient descent InRobotics Science and Systems (RSS) 2007

10 A Harati and R Siegwart Orthogonal 3D-SLAM forindoor environments using right angle corners In ThirdEuropean Conference on Mobile robots 2007

11 P Henry M Krainin E Herbst X Ren and D FoxRGB-D mapping Using depth cameras for dense 3Dmodeling of indoor environments In AdvancedReasoning with Depth Cameras Workshop inconjunction with RSS 2010

12 ICP httpwwwmathworkscommatlabcentralfileexchange27804-iterative-closest-point 2012

13 S Izadi D Kim O Hilliges D MolyneauxR Newcombe P Kohli J Shotton S HodgesD Freeman A Davison and A FitzgibbonKinectFusion Real-time 3D reconstruction andinteraction using a moving depth camera In ACMSymposium on User Interface Software andTechnology 2011

14 G Klein and D Murray Parallel tracking and mappingfor small AR workspaces In IEEE Intrsquol Symp Mixedand Augmented Reality (ISMAR rsquo07) 2007

15 J Liu and Y Zhang Real-time outline mapping formobile blind robots In IEEE International Conferenceon Robotics and Automation 2011

16 D G Lowe Distinctive image features fromscale-invariant keypoints Int J Comput Vision60(2)91ndash110 Nov 2004

17 Microsoft Kinecthttpwwwxboxcomen-USkinect 2012

18 A Nchter 3D Robotic Mapping The SimultaneousLocalization and Mapping Problem with Six Degrees ofFreedom Springer Publishing Company Incorporated1st edition 2009

19 R A Newcombe S Izadi O Hilliges D MolyneauxD Kim A J Davison P Kohli J Shotton S Hodgesand A Fitzgibbon KinectFusion Real-time densesurface mapping and tracking In IEEE Intrsquol Symp inMixed and Augmented Reality (ISMAR) 2011

20 R A Newcombe S J Lovegrove and A J DavisonDTAM Dense tracking and mapping in real-time In13th International Conference on Computer Vision2011

21 V Nguyen A Harati Agostino and R SiegwartOrthogonal SLAM a step toward lightweight indoorautonomous navigation In IEEERSJ InternationalConference on Intelligent Robots and Systems 2006

22 V Nguyen A Martinelli N Tomatis and R SiegwartA comparison of line extraction algorithms using 2Dlaser rangefinder for indoor mobile robotics InIEEERSJ International Conference on IntelligentRobots and Systems 2005

23 RANSAChttpwwwcsseuwaeduau˜pkresearchmatlabfnsRobustransacm2012

24 S Rusinkiewicz and M Levoy Efficient variants of theICP algorithm In Third International Conference on3D Digital Imaging and Modeling (3DIM) June 2001

25 A Segal D Haehnel and S Thrun Generalized-ICPIn Proceedings of Robotics Science and SystemsSeattle USA June 2009

26 SenionLab httpwwwsenionlabcom 2012

27 Sensopia MagicPlan As easy as taking a picturehttpwwwsensopiacomenglishindexhtml 2012

28 SIFT httpsourceforgenetprojectslibsift(contains a DLL library) 2012

29 C Wu SiftGPU A GPU implementation of scaleinvariant feature transform (SIFT)httpcsuncedu˜ccwusiftgpu 2007

30 O Wulf K O Arras H I Christensen and B Wagner2D mapping of cluttered indoor environments by meansof 3D perception In IEEE International Conference onRobotics and Automation (ICRA04) 2004

Motivation and Introduction

Obtain Camera Trajectory from RGB-D Images

Preprocessing

Local matching

Global matching

Generate Polylines From Camera Trajectory

Segmentation

Rectification

Map merging

Obtain Explored Area From Map and Trajectory

Experiments in Office Environments

Conclusions and Future Work

Acknowledgements

REFERENCES

Figure 2 From RGB-D images to 2D floor plan and explored region

Our system WalkampSketch consists of a backpack with anRGB-D camera mounted on the top (Figure 1) The user car-ries the backpack and walks around the building at a normalstrolling pace As the user walks the floor plan is incremen-tally constructed hence the name WalkampSketch The cam-era is mounted slightly tilted (sim 5 - 10 degrees) toward thefloor Such a setting results in stable heights frame-by-framewith the floor plane in view from time to time as a horizon-tal reference Our method first aligns the reference framewith the floor using the first few frames ie we chooseto initially stand in an area so that the camera can see thefloor and walls then computes the frame-to-frame offsetsfrom the continuous RGB-D inputs while walking along thewalls in the environment Instead of creating 3D models asmany other research has done [3 11 13 19] using RGB-D cameras we select a horizontal slice at a certain heightand construct a 2D floor plan using connected line segmentsie polylines Along with the polyline representation forfloor plans we represent the explored area by a sequenceof polygons which is computed from the resulting polylinemap and the trajectory of the camera obtained by frame-to-frame matching The explored area polygons provide visualfeedback and guidance to the user (the person carrying thebackpack-mounted camera) The user may decide to nav-igate an already-explored area to further improve mappingaccuracy or navigate to new areas to further grow the floorplan Figure 2 shows the overall design of our system andFigure 3 shows an example of a multi-room floor map andexplored areas

Most existing work in mapping of indoor environments usedoccupancy grids [4] point clouds [11] and scattered line seg-ments [21] we advocate polyline maps [15] ie represent-ing a map using a set of connected linear segments as usedin floor plans Polyline maps provide a compact and ab-stract representation and can be easily manipulated by hu-mans using an interactive interface Polyline maps can alsobe used for indoor localization in ubiquitous applications[26] There has been research on getting 2D line sketchesfrom 3D laser scans in cluttered indoor environments [30] or3D line sketches from vision-based SLAM [6] where lineswere considered as features in SLAM computation and theresulting map would be a set of line features In our caselines are post-extracted from matched 3D point clouds ofthe current frame using both RGB and depth images andgrouped as polylines ie a connected set of directional seg-ments Polylines of the current frame are then merged withthe previous polyline map to form a new partial map

Figure 3 An example of polyline maps explored area is representedby overlapping polygons and problematic corners circled

To reduce the effect of noise and increase the computationefficiency we have used the assumption of orthogonality [21]in indoor environments Such an assumption has been usedin previous work such as [10 15] with promising resultsThe rectification greatly improves the quality of the result-ing map with simple computation for map merging

A prototype has been implemented in Matlab using pre-recordedRGB-D images saved at 10Hz from a Microsoft Kinect withwalking speed ( 05 meters per second) We have tested thesystem in various areas of our office building a conferenceroom hallways and a set of connected offices (Figure 3)The results are encouraging

The major contributions of this paper are as follows 1) De-sign and development of a system that efficiently creates a2D floor plan by walking and taking videos from a consumerRGB-D camera 2) design and implementation of a polylineextraction algorithm from 3D point clouds and 3) designand implementation of explored area representation by a se-quence of polygons

In the subsequent sections we first describe the method of3D image registration followed by the algorithms of poly-line map generation and explored area computation We thenshow some results from experiments in an office environ-ment Finally we conclude the paper with some future di-rections



















Cj = Rj

σ2j 0 0

0 infin 00 0 infin

RTj (2)


C =nsum

j Cminus1j

(3)



σminus2j 0 00 0 00 0 0

RTj









W =

kminus1sumj=i



















lisumj lj




σ2c =


































minus3

minus2

minus1

0

1

2

3

4

5

6

7


minus5

0

5

10

15


minus8

minus6

minus4

minus2

0

2

4

6


minus6

minus4

minus2

0

2

4

6

8








































Preprocessing

Local matching

Global matching


Segmentation

Rectification

Map merging




Acknowledgements

REFERENCES



















Cj = Rj

σ2j 0 0

0 infin 00 0 infin

RTj (2)


C =nsum

j Cminus1j

(3)



σminus2j 0 00 0 00 0 0

RTj









W =

kminus1sumj=i



















lisumj lj




σ2c =


































minus3

minus2

minus1

0

1

2

3

4

5

6

7


minus5

0

5

10

15


minus8

minus6

minus4

minus2

0

2

4

6


minus6

minus4

minus2

0

2

4

6

8








































Preprocessing

Local matching

Global matching


Segmentation

Rectification

Map merging




Acknowledgements

REFERENCES





W =

kminus1sumj=i



















lisumj lj




σ2c =


































minus3

minus2

minus1

0

1

2

3

4

5

6

7


minus5

0

5

10

15


minus8

minus6

minus4

minus2

0

2

4

6


minus6

minus4

minus2

0

2

4

6

8








































Preprocessing

Local matching

Global matching


Segmentation

Rectification

Map merging




Acknowledgements

REFERENCES







lisumj lj




σ2c =


































minus3

minus2

minus1

0

1

2

3

4

5

6

7


minus5

0

5

10

15


minus8

minus6

minus4

minus2

0

2

4

6


minus6

minus4

minus2

0

2

4

6

8








































Preprocessing

Local matching

Global matching


Segmentation

Rectification

Map merging




Acknowledgements

REFERENCES

























minus3

minus2

minus1

0

1

2

3

4

5

6

7


minus5

0

5

10

15


minus8

minus6

minus4

minus2

0

2

4

6


minus6

minus4

minus2

0

2

4

6

8








































Preprocessing

Local matching

Global matching


Segmentation

Rectification

Map merging




Acknowledgements

REFERENCES
































Preprocessing

Local matching

Global matching


Segmentation

Rectification

Map merging




Acknowledgements

REFERENCES

walk&sketch: create floor plans with an rgb-d camera · walk&sketch: create floor plans ......

Documents