
Background Replacement

Sashi Kumar Penta∗

Department of Computer Science, University of North Carolina at Chapel Hill

∗e-mail: [email protected]

Figure 1: (a) Input image whose background is to be replaced. (b) Matting results: the image on the top is the matte, the image in the middle is the background, and the image on the bottom is the foreground on a green background. (c) Criteria for search: the image on the top is the geometric context, which represents three different regions (sky in blue, vertical regions in red and ground regions in green) and captures the geometric properties of the image; the image on the bottom is just the background image, whose colors are used to represent the photometric properties. (d) Image search results for the geometric and photometric criteria separately for the input background image. (e) Image search results using both geometric and photometric criteria simultaneously. (f) Some final compositions with the original foreground, whose background is replaced with these new ones.

Abstract

In this paper, we present a technique for automatically replacing the background of an image with a better background image. We obtain the foreground and background parts of the input image by using matting techniques. The user specifies the type of background he/she wants in the new composite. Our technique finds the best background from thousands of images available on the web, which matches both illumination and geometric properties of the original background image. Once we have the new background, we composite it with the foreground part of the input image.

1 Introduction

A tremendous number of images are currently available on the web [Flickr; Google]. It is quite common to find a digital camera in the hands of many people around the world; uploading photos onto the Internet is one of the easiest ways of sharing images with loved ones and friends. This wealth of images gives a wonderful opportunity for artists to create composite images. Composite images are created by combining different parts from two or more images. Creating composites also appeals to amateurs for entertainment purposes.

Researchers have proposed various techniques for automatically creating composite images [Lalonde et al. 2007; Hays and Efros 2007; Kwatra et al. 2008; Reinhard et al. 2004; Reinhard et al. 2001; Rother et al. 2006; Snavely et al. 2006]. Many compositing techniques combine parts from two or more images to create a single image. Most of the images available on the web are either taken using digital cameras or scanned from original printed photographs, which we call real images. Images can also be generated using 3D graphics programs, which we call synthetic images.

In this paper, we present a technique for automatically replacing the background of an image with a better background image. We obtain the foreground and background parts of the input image by using matting techniques. The user specifies the type of background he/she wants in the new composite; the type can be "beach", "landscape", "lake", "mountains", etc. Our technique then finds the best background that matches both the illumination and the geometric properties of the original background image. Once we have the new background, we composite it with the foreground part of the input image. Our technique is mainly inspired by Photo Clip Art [Lalonde et al. 2007] and Scene Completion [Hays and Efros 2007]; it integrates and extends ideas from these two techniques to solve the background replacement problem.

2 Background

Depending on the source of the images, techniques for creating composite images can be divided into two main classes: pure and mixed compositing techniques. In pure compositing techniques, composite images are created only from real images, whereas in mixed compositing techniques, composite images are created from both real and synthetic images. Fig.2 shows an example image generated using a pure compositing technique, where the inputs to the technique are real images. In this figure, the city skyline from the top-left image is combined with the lower part of the bottom-left image, and their composite image is shown on the right. Fig. 3 shows an example image generated using a mixed compositing technique, where the output of a graphics fluid simulation is combined with real images to generate composite images. In this figure, a baby is drenched with simulated honey and water. The frame on the left is the input to the system. The middle frame shows the result when the baby is drenched with simulated honey, and in the last frame the baby is drenched with simulated water.

Figure 2: An example of a pure compositing technique: the city skyline (upper part) from the top-left image is combined with the lower part of the bottom-left image, and their composite image is shown on the right. [Content based image synthesis [Diakopoulos et al. 2004]]

Figure 3: An example of a mixed compositing technique: the frame on the left is the input to the system. The middle frame shows the result with simulated honey (output obtained from a graphics simulation of honey), and the last frame with simulated water (output obtained from a graphics simulation of water). [Fluid in Video [Kwatra et al. 2008]]

Images can be combined in various other fashions. Some compositing techniques do not take explicit parts from input images; instead, they can take colors from one image and content from another. Fig.4 shows an example composite using the color transfer technique, where colors from one image are transferred to the other image. Using texture synthesis methods, multiple parts of the same image are used to create bigger images [Kwatra et al. 2003; Avidan and Shamir 2007]. Seam carving is an image resizing technique which alters the dimensions of an image not by scaling or cropping but rather by intelligently removing pixels from (or adding pixels to) the image that carry little importance. Fig.5 shows the result of the seam carving technique, where the dolphin's size does not increase with the size of the image. Image based rendering techniques generate new images from novel view points, given images from several different view points [Penta 2005; Buehler et al. 2001; Snavely et al. 2006]. Fig.6 illustrates one such image based rendering technique. It shows a scene of a table from four view points and the resulting novel view in the middle, which composites information from the other view points.

Figure 4: Colors from the middle image are used in the first image and the color transferred image is shown on the right. [Reinhard et al. 2001]

Figure 5: Image on the left is used as input for the seam carving technique. Image on the top-right is the output from normal resizing. Image on the bottom-right is the result of the Seam Carving technique [Avidan and Shamir 2007].

Figure 6: Image based rendering technique [Penta 2005]

Yet another way of classifying image compositing techniques is based on the amount of 3D geometry used. On one extreme, there are techniques that do not use any 3D geometry. On the other extreme, there are techniques that use accurate 3D geometry. In fact, there is a continuous spectrum of these compositing techniques based on the amount of geometry used for combining input images. Fig. 7 shows the spectrum of some image compositing techniques. The following sections explain each technique in this spectrum.

2.1 Compositing techniques without geometry

In this section, we describe two image compositing techniques, Auto Collage [Rother et al. 2006] and Scene Completion [Hays and Efros 2007], where no explicit geometry of the input is considered.

2.1.1 Auto Collage

Auto Collage is a technique for constructing a visually appealing collage from a set of input images [Rother et al. 2006]. Auto Collage falls on the left end of the spectrum shown in fig. 7 and uses no geometry of the input. Geometry is not needed for making a collage, because in a collage it is acceptable to have parts of images anywhere in the final composite. In this technique, the goal is to construct a collage that represents the original collection of input images. Regions of interest such as central parts of the images, faces, and highly textured areas are used to make a good collage. Some example regions of interest are shown in fig.8.

Figure 8: Region of Interest: Faces, center parts of the images and highly textured parts of the images are important parts for the collage.

Once the regions of interest are obtained, they need to be packed into a collage. The goal is to incorporate as many regions of interest as possible into the collage, while making sure that every pixel of the composite image is covered. This packing problem is a combinatorial problem, a generalization of well-known NP-hard packing problems, so heuristic approaches are used. The regions of interest are shifted and enlarged in such a way that all pixels are covered, as shown in fig.9. Finally, the arranged regions of interest are smoothly blended to get the final composite, as shown in fig.10.

Figure 9: Results from constraint propagation

Figure 10: Final result using the Auto Collage algorithm [Rother et al. 2006]

2.1.2 Scene Completion

Consider a variation of Auto Collage, in which the user marks a region in the input image as missing and wants to create a geometrically consistent composite by filling this missing region with an image that looks like the input image. The image used to fill the missing region can be any image that is available on the web. One could imagine using Auto Collage, except that there is only one missing region to be filled, probably with one region of interest. As mentioned before, Auto Collage does not use any geometry and lets any part of the image fill the missing region. This may result in a geometrically inconsistent composite.

Scene Completion attempts to create geometrically consistent composites [Hays and Efros 2007]. Although Scene Completion does not use geometry explicitly, it tries to find a matching scene in a database of about two million images downloaded from the internet and implicitly finds geometrically consistent compositions. Given that there is a large number of images on the web, it is generally possible to find a geometrically consistent image to fill the missing region in the input image by just matching the color distributions. Since it tries to create geometrically consistent composites without explicitly computing the 3D geometry of the input images, we say that it uses geometry implicitly. This technique is explained using the flow chart shown in fig.11, and the details are given below.

Semantic Scene Matching: The input image, with the region marked to be completed, is shown on the top-left of fig. 11. Searching 2 million images for an image to fill the missing region is time consuming. The search space is reduced to hundreds by finding images which look similar to the input image. Similarity is computed using a scene descriptor. The scene descriptor used in this technique consists of two parts: the gist descriptor [Torralba et al. 2003] and color information. The gist descriptor consists of oriented edge responses at different frequency scales; a gist descriptor built from 6 oriented edge responses at 5 frequency scales, aggregated to a 4 × 4 spatial resolution, is found to be most effective. Color information of the incomplete input image is also used as part of the scene descriptor at the same resolution (4 × 4) as the gist. Fig. 12 shows the scene descriptor at a 4 × 4 spatial resolution, where the left shows the color information and the right shows the gist descriptor. Here the color information is a 4 × 4 rescaled version of the input image, and the gist descriptor is a matrix of frequency responses for 6 edge orientations and 5 frequency scales at 4 × 4 resolution. Low edge strength (or low edge energy) is shown in blue and high edge strength (or high edge energy) is shown in red.

Figure 7: Continuous spectrum of image compositing techniques. Some of these techniques are discussed in this paper.

Figure 11: Scene Completion algorithm. The input image with the region marked to be completed is shown on the top-left. A scene descriptor is computed for this incomplete input image. Semantically valid images are obtained by matching the scene descriptor of the input image, and the results are shown on the bottom-right. The top 20 image completions are obtained by matching the local context and blending, and are shown in the bottom-left image. [Hays and Efros 2007]

A sum of squared differences (SSD) is computed between the gist of the input (query) image and each image in the database of two million images. The color difference is also computed, in the L*a*b color space. L*a*b is another standard color space like RGB and CMYK; unlike them, it is designed to approximate human vision. It aspires to perceptual uniformity, and its L component closely matches human perception of lightness. The gist difference is weighted twice as heavily as the color difference. The top 200 images matched using this metric are shown in the bottom-right image of fig. 11.
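To make the scene-matching metric concrete, the distance computation could be sketched as follows. This is a minimal illustration, not the authors' implementation: the array shapes (a 6 x 5 x 4 x 4 gist response and a 4 x 4 x 3 rescaled color image) and the factor-of-two gist weighting follow the description above, while the function and variable names are hypothetical.

    import numpy as np

    def scene_distance(gist_q, color_q, gist_c, color_c):
        # gist_*: oriented edge responses, shape (6 orientations, 5 scales, 4, 4)
        # color_*: image rescaled to 4 x 4 pixels in L*a*b, shape (4, 4, 3)
        gist_ssd = np.sum((gist_q - gist_c) ** 2)
        color_ssd = np.sum((color_q - color_c) ** 2)
        # the gist difference is weighted twice as heavily as the color difference
        return 2.0 * gist_ssd + color_ssd

    def top_scene_matches(gist_q, color_q, gists, colors, k=200):
        # rank all database images and keep the 200 closest scenes;
        # gists and colors are assumed to be precomputed lists of arrays
        d = [scene_distance(gist_q, color_q, g, c) for g, c in zip(gists, colors)]
        return np.argsort(d)[:k]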

Figure 12: Scene descriptor: color information at 4 × 4 resolution and the gist descriptor at 4 × 4 spatial resolution for 5 frequency scales and 6 different orientations. [Hays and Efros 2007]

Local Context Matching: A local illumination context of about 80 pixels around the missing region's boundary is compared against this reduced set of images. The local illumination context is an approximation to the lighting conditions of the input image around the missing region. The top 20 images are obtained based on the SSD error in L*a*b color space over all possible translations and scales of the images. For each of these 20 images, a final composite is obtained by smoothly blending around the missing region's boundary at the chosen translation and scale. The final composite images are shown in the bottom-left of fig. 11.
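A rough sketch of the local context error for one candidate image, already aligned at a particular translation and scale, is given below. The 80-pixel band, the L*a*b color space and the SSD error come from the text; the dilation-based construction of the band and the library choices (scipy, scikit-image) are assumptions made for illustration.

    import numpy as np
    from scipy.ndimage import binary_dilation
    from skimage.color import rgb2lab

    def local_context_error(input_rgb, candidate_rgb, missing_mask, band=80):
        # band of roughly 80 pixels around the missing region's boundary
        ring = binary_dilation(missing_mask, iterations=band) & ~missing_mask
        a = rgb2lab(input_rgb)
        b = rgb2lab(candidate_rgb)
        # SSD error over the boundary band in L*a*b space
        return np.sum((a[ring] - b[ring]) ** 2)

    # the full method evaluates this error over many translations and scales of
    # each candidate and keeps the 20 images with the lowest error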

Results: This technique works well for most cases, as long as there are enough images in the input dataset that have similar semantics to the input image. Fig. 13 shows the results obtained using this algorithm. In this figure, a road in front of the building is marked as the missing region. Three possible completions are shown on the right. Fig. 14 shows a failure case, where scene matching is done incorrectly. Not even one in 2 million images has the same scene description as the input image, and this case shows the possibility of a failure using this technique.

Figure 13: Scene Completion result: from left to right, the original input image, the input image marked with the missing region, and possible completions with the Scene Completion technique. [Hays and Efros 2007]


Figure 14: Scene Completion failure due to a matching failure, which shows that even 2 million images do not contain a good matching image for this particular incomplete region. [Hays and Efros 2007]

2.2 Compositing techniques with geometry

2.2.1 Photo clip art

Consider a variation of Scene Completion, where a user would like to insert objects from one image into another image. For example, in fig.17 a car is being inserted into another image. One could imagine using Scene Completion by marking a region as missing and filling it using a database of car images. Since Scene Completion does not use explicit geometry, it can produce implausible compositions such as the one shown in fig.17, in which the car does not look as if it rests on the road.

Figure 15: 3D view of the background image [Lalonde et al. 2007]

To address this issue, Photo clip art computes the approximate 3D geometry of the input image and uses this geometry to select the right car to produce a geometrically consistent composite. The approximate 3D structure of the input image is shown in fig.15. The technique assumes the input image has ground, vertical and sky regions. Camera parameters such as height and orientation are computed both for the input image and for the photo clip arts. The approximate 3D geometry of the input is obtained by employing popular single-view reconstruction methods [Hoiem et al. 2005a]. Photo clip arts are then inserted into the scene geometrically. Photo consistency is satisfied with the use of an illumination context. Color distributions for the three regions (ground, vertical and sky) are matched between the input image and the backgrounds of the photo clip arts. The local color distribution of the photo clip art, computed for the region around the clip art in its original background, is also matched to the input image.

Fig.16 shows the graphical user interface (GUI) of this technique. The right-hand side of the figure shows a list of photo clip arts, sorted by both geometric and photometric consistency, that can be inserted into the image. Clip arts are resized based on the position selected in the input scene, in order to maintain perspective. This technique is described in more detail in the following sections.

Figure 16: Photo clip art Graphical User Interface [Lalonde et al. 2007]

Challenges: Inserting objects from one image into another with photo realism is challenging. This is illustrated in fig. 17, in which a car has to be inserted into a street scene. As shown in this figure, straightforward insertion may not result in photo-realistic composites. The challenges are to satisfy both geometric consistency and photo consistency. One solution is to obtain accurate 3D geometric and photometric descriptions of the object (the car). If 3D geometric information of the car is available, we can change its orientation and make it rest on the ground, and since we have photometric information, we can illuminate it so that its lighting matches the input image. However, obtaining accurate geometric and photometric descriptions of objects from a single image is a hard problem in computer vision.

Figure 17: Photo clip art Challenges [Lalonde et al. 2007]

Instead of trying to obtain the geometric and photometric information of the object, an easier alternative is to find an object (car) that fits well geometrically and photometrically into the scene from an annotated database, as shown in fig. 18. A large data set of objects with annotations such as camera parameters and illumination contexts is collected. These annotations are used to produce a ranked list of objects that can be inserted into the input image.

Figure 18: Photo clip art solution [Lalonde et al. 2007]

Annotating objects: Annotations of the photo clip arts include their object size, orientation, camera parameters, lighting conditions, etc. The camera parameters that need to be estimated are height and orientation, as shown in fig. 19. Camera parameters and object sizes are obtained iteratively as shown in fig. 20. Initially, camera parameters are estimated based on the average height of humans, which is 1.7 meters according to the National Center for Health Statistics. Using these camera parameters, the sizes of unknown objects are obtained. These object sizes are then used to obtain the camera parameters of other images.

Figure 19: Camera parameters: the height and orientation of the camera are used in this technique. [Lalonde et al. 2007]

Figure 20: Camera parameters are estimated iteratively. First, the camera parameters of scenes where the human height distribution is known are obtained, as shown on the left. These camera parameters are then used to obtain the height distributions of other objects such as cars, as shown in the middle. Using the height distributions of these objects, the camera parameters of other images are obtained, as shown on the right. [Lalonde et al. 2007]

Lighting conditions of the objects are obtained from their original background images. Lighting conditions include the total illumination context and the local illumination context. The total illumination context of the background image is represented by a set of color distributions for three regions of the image. Fig. 21 shows the three regions, ground, vertical regions and sky, which are obtained using geometric context techniques [Hoiem et al. 2005b]. For each of these regions, a 3D joint histogram in L*a*b color space is computed and stored separately for each object (photo clip art). The local illumination context is obtained for a region around the clip art, as shown in the left image of fig. 22; it is a histogram in L*a*b color space. This completes the collection of annotations of photo clip arts in the database.

Figure 21: Illumination context: color histograms for the sky, vertical and ground regions are obtained. [Lalonde et al. 2007]

Figure 22: Local context: color distributions of the region around the object in the original image are obtained and matched. [Lalonde et al. 2007]

Object Insertion: For a given input image, the graphical user interface provides a ranked list of possible clip arts that can be inserted into the scene. Matching and sorting of the photo clip arts are done by matching the camera parameters and lighting parameters. Fig.23 illustrates a result of this technique, where objects such as humans, cars, and parking meters are added into the input image.

Figure 23: The left-most image is the input background image. More and more photo clip arts are added into the background (second and third images). Results of Photo clip art [Lalonde et al. 2007]

2.2.2 Fluid in Video

As mentioned before, Photo clip art computes the approximate 3D structure of the input image and is suitable for obtaining geometrically consistent composite images, but not videos. For example, if one would like to create a video composite where the objects in the input image interact with the newly added objects, this approximate geometry would not be sufficient. Fluid in Video uses the full 3D geometry of the input image to obtain a video composite where the objects in the input video interact with virtual fluids [Kwatra et al. 2008]. It is a technique for coupling simulated fluid with real dynamic scenes. Fluids interact with foreground objects (such as the baby in this case) in the video. The foreground is obtained by computing the difference between the input frame and a static background frame.

The input video scene is captured from multiple views, which allows construction of an accurate 3D structure of the foreground objects as shown in fig.24. Once we have the 3D geometry, a simulation of fluids interacting with the foreground objects is computed and finally rendered to create a composite. Fig.3 shows one frame of the video where the baby is drenched with artificial water and honey.

Most of the image compositing techniques rely on computer vision algorithms such as scene understanding, obtaining scene geometry, etc. Some of these techniques also rely on computer graphics concepts, especially the mixed compositing techniques. Similar to image-based rendering techniques [Shum and Kang 2000], we have seen that there exists a spectrum of compositing techniques based on the amount of geometry used.

Figure 24: The figure shows the input frame on the left and the 3D geometry of the baby (foreground object) on the right.

3 Background Replacement

3.1 Image Matting

The first step in our technique is to decompose the composite image into background and foreground parts. Image matting can be used to determine whether a pixel in an image belongs to the foreground, the background or a mixed region. In the case of mixed regions, a parameter, generally known as 'alpha', determines how much of that pixel belongs to the foreground. The input image can be divided into three images, the α-image, the foreground image F and the background image B, according to the following formula:

I(x, y) = α(x, y) ∗ F(x, y) + (1 − α(x, y)) ∗ B(x, y)

where I is the input image, (x, y) is the pixel location, α(x, y) is the alpha value at pixel (x, y), F is the foreground image and B is the background image. Image matting is the problem of obtaining this α-image.
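For example, with illustrative values, a pixel with α = 0.25, foreground color F = (200, 40, 40) and background color B = (40, 40, 200) yields the observed color I = 0.25 ∗ (200, 40, 40) + 0.75 ∗ (40, 40, 200) = (80, 40, 160).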

The Robust Matting technique [Wang and Cohen 2007] takes a few brush strokes for the background and a few for the foreground, as shown in fig.25. Fig.26 shows the result obtained using this technique for the strokes shown in fig.25.

Figure 25: Robust matting [Wang and Cohen 2007] GUI: blue strokes are given for the background part and red strokes are given for the foreground part.

Figure 26: The top row of this figure shows the input image and its foreground matte, and the bottom row shows the background image and the foreground image composited on a green background.

The top row of fig.26 shows the input image and its foreground matte, and the bottom row shows the background image and the foreground image composited on a green background. Pixel values in the foreground matte are α values, 1 being the highest and 0 the lowest: 1 represents foreground, 0 represents background, and α with 0 < α < 1 represents mixed regions. The background image is obtained by retaining the pixels where the matte value is zero. We use this background image to find a new background image from the dataset which matches it both geometrically and photometrically.
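The background extraction described above can be sketched in a few lines. This is a minimal sketch, assuming the input image I is an H x W x 3 array and the matte alpha is an H x W array of floats in [0, 1]; the function name and the tolerance are illustrative and not part of the Robust Matting tool.

    import numpy as np

    def extract_background(I, alpha, eps=1e-3):
        # keep only the pixels where the matte is (close to) zero; every other
        # pixel belongs to the foreground or to a mixed region and is masked out
        bg = I.astype(np.float64).copy()
        bg[alpha > eps] = 0.0
        return bg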

3.2 Data set collection

A large data set of background images is collected from the web based on tags. We consider the background classes "beaches", "lakes", "urban", "landscapes", "mountains", "trees" and "waterfalls", and this set can be extended to any number of classes. Each class can use many tags to download images from the web; for example, the "beach" class can have tags such as 'beach', 'sea' and 'ocean'. Fig.27 shows some images downloaded for the class 'beach' based on the tag search. The right of this figure shows some images of this data set at a larger size. The images in the bottom row on the right side do not contain a beach, but they have beach tags associated with them.

Figure 27: A sample of the beach image collection downloaded from Flickr using the tags for beach. On the right, some images of this data set are shown at a larger size. The images in the bottom row on the right side do not contain a beach, but they have beach tags associated with them.

3.3 Data set filtering

The noise in the data set, which could be due to improper tagging, is filtered using candidate backgrounds for each class. We manually choose some background images for each class as candidates of the class; the four candidates for the beach class are shown in fig.28. We compute the GIST features [Torralba et al. 2003] for the background images and retain the images which are close to one of these candidate background images with respect to the GIST feature. We eliminate around half of the dataset, namely the images which are not close to any candidate background image. This step helps in eliminating images which are incorrectly tagged. Fig. 29 shows some of the retained images; on the right side of this figure we show a few of them enlarged. As can be seen from the figure, this filtering works quite well.
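A sketch of this filtering step is given below, assuming a gist() function that returns the GIST descriptor of an image as a 1-D vector; the Euclidean distance, the 50% cutoff and all names are illustrative assumptions rather than values taken from the paper.

    import numpy as np

    def filter_dataset(images, candidates, gist, keep_ratio=0.5):
        # distance of each downloaded image to its nearest candidate background,
        # measured in GIST feature space
        cand_feats = [gist(c) for c in candidates]
        dists = [min(np.linalg.norm(gist(img) - cf) for cf in cand_feats)
                 for img in images]
        # retain the images closest to the candidates (roughly half of the dataset)
        order = np.argsort(dists)
        keep = order[: int(len(images) * keep_ratio)]
        return [images[i] for i in keep]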

4 Background selection

Once we have the filtered data set, we select images to replace the original background by matching both the illumination and the geometry of the original background image. This selection is done in two steps: in the first step illumination is matched, and in the second step geometry is matched.

4.1 Illumination matching

In this step, we find images which have illumination similar to that of the original background image. We use color histograms to decide whether two images have similar illumination. The histograms of two images are compared using the χ2 distance metric, whose formula is given below:

d = Σ_i (x(i) − y(i))² / (2 ∗ (x(i) + y(i)))

where i indexes the bins of the histograms, and x(i), y(i) are the values of bin i in the two histograms x and y. Fig.30 shows the matches obtained using the χ2 distance metric for the query image shown at the top. For the query image and each matching image, this figure also shows how the colors in it are distributed in 3D RGB space.
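A direct translation of this distance into code might look like the following minimal sketch; the histogram normalization and the small epsilon guarding against empty bins are assumptions, not part of the formula above.

    import numpy as np

    def chi_square_distance(x, y, eps=1e-10):
        # x, y: color histograms of the two images, flattened to 1-D arrays
        x = x / max(x.sum(), eps)
        y = y / max(y.sum(), eps)
        # d = sum_i (x(i) - y(i))^2 / (2 * (x(i) + y(i)))
        return np.sum((x - y) ** 2 / (2.0 * (x + y) + eps))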

Figure 30: Query matches using illumination matching

4.2 Geometry matching

Images normally have ground, vertical and sky regions. Here we try to match these regions in terms of their area, shape and position within the image. The three regions can be obtained using the geometric context technique [Hoiem et al. 2005c]. Fig.31 shows the geometric context obtained for the background image shown in fig.26. It shows three regions: ground regions in green, vertical regions in red and sky regions in blue. Fig.32 shows the matches for the query image shown at the top. For the query image and each matching image, this figure also shows the geometric context next to it.

Figure 28: Candidate images for the beach class in our experiment

Figure 29: A sample of retained images which are close to the gist features of the candidate images shown in fig.28. The right side of this figure shows some enlarged images from the filtered dataset.

Figure 31: The geometric context obtained for the background image. Green represents the ground region, red represents vertical regions and blue represents the sky region in the image.

Instead of matching geometry and illumination as two separate terms, we do both together, which is an approximation. We match the color histograms of the ground, vertical and sky regions instead of the whole image, so implicitly we are matching the geometry of the images, since we are computing histograms for the three regions. This approximation only accounts for the area, not the position, of the three regions. The χ2 distance is computed for these regions separately and the distances are weighted equally. We take the top 10 images under this metric as candidate background images to replace the original background. The user can manually select some of these to create new composite images.
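The combined selection step could be sketched as follows, with one color histogram per geometric-context region and the χ2 distances of the three regions weighted equally; the helper names, bin count and value range are illustrative assumptions (the colors are assumed to be already converted to the chosen color space).

    import numpy as np

    REGIONS = ("ground", "vertical", "sky")

    def chi2(x, y, eps=1e-10):
        return np.sum((x - y) ** 2 / (2.0 * (x + y) + eps))

    def region_histograms(image, labels, bins=8):
        # one 3D color histogram per geometric-context region
        hists = {}
        for r in REGIONS:
            pix = image[labels == r]                 # N x 3 pixels of this region
            h, _ = np.histogramdd(pix, bins=(bins,) * 3, range=[(0, 256)] * 3)
            hists[r] = h.ravel() / max(h.sum(), 1.0)
        return hists

    def select_backgrounds(query_hists, candidates, k=10):
        # candidates: list of (image, per-region histograms); the distances of the
        # three regions are weighted equally, and the top 10 matches are kept
        scores = [sum(chi2(query_hists[r], h[r]) for r in REGIONS)
                  for _, h in candidates]
        return [candidates[i][0] for i in np.argsort(scores)[:k]]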

4.3 Background composition

Figure 32: Query matches using geometry matching

Once the background selection is done, it is easy to replace the original background. We use the foreground matte, the original input image and the new background to do this composition. The final composite is created using the following formula:

C(x, y) = α(x, y) ∗ I(x, y) + (1 − α(x, y)) ∗ B(x, y)

where (x, y) is the pixel location, α(x, y) is the value of pixel (x, y) in the foreground matte, I(x, y) is the color value of pixel (x, y) in the input image, B(x, y) is the color value of the new background image and C(x, y) is the color value of the final composite.
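The final composition is the same per-pixel blend written above, applied to the input image and the selected new background. A minimal sketch, assuming 8-bit color images and a matte already aligned with the input (the background is assumed to have been positioned and scaled beforehand; the function name is illustrative):

    import numpy as np

    def replace_background(I, alpha, new_bg):
        # C(x, y) = alpha(x, y) * I(x, y) + (1 - alpha(x, y)) * B(x, y)
        a = alpha[..., None].astype(np.float64)     # broadcast matte over channels
        C = a * I + (1.0 - a) * new_bg
        return np.clip(C, 0, 255).astype(np.uint8)  # assumes 8-bit color images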


5 Results

Final compositions are created using the formula given above. We used Jue Wang and Michael Cohen's tool [Wang and Cohen 2007] for creating the final composites. This tool allows us to position and scale the foreground image in the new background image. Fig.33 shows some results using our technique for the input image on the left. The final compositions are shown on the right, where the position and scale of the foreground image are changed to match its original geometric context within the image.

Figure 33: (a) input image (b) final compositions where the background is replaced

6 Conclusions and Future work

We used ideas from Scene Completion [Hays and Efros 2007] to compute GIST features for the images and to filter out images which are not close to the candidates of each class. We used ideas from Photo Clip Art [Lalonde et al. 2007] to compute illumination contexts for the three regions separately for the filtered dataset. We select the top 10 images by matching these illumination contexts (color histograms) of the original background image against the filtered dataset. This selection process can be improved by matching the geometry obtained using geometric context. We can also use additional cues to further eliminate backgrounds that contain many vertical regions of people and other objects.

The color histogram comparison can be further localized by computing histograms for small segments around the foreground of the input image. We used the χ2 distance for comparing two color histograms; the Earth Mover's Distance [Rubner et al. 1998] is a more robust metric and could improve the results. We used the Robust Matting tool [Wang and Cohen 2007] for positioning and scaling the foreground image. We could use ideas from Scene Completion [Hays and Efros 2007] to automatically find the position and scale of the foreground image in the new background image.

References

AVIDAN, S., AND SHAMIR, A. 2007. Seam carving for content-aware image resizing. ACM Trans. Graph. 26, 3, 10.

BUEHLER, C., BOSSE, M., MCMILLAN, L., GORTLER, S. J., AND COHEN, M. F. 2001. Unstructured lumigraph rendering. In SIGGRAPH 2001, Computer Graphics Proceedings, ACM Press / ACM SIGGRAPH, E. Fiume, Ed., 425–432.

DIAKOPOULOS, N., ESSA, I. A., AND JAIN, R. 2004. Content-based image synthesis. In CIVR, 299–307.

FLICKR. Flickr. http://flickr.com/.

GOOGLE. Google images. http://images.google.com.

HAYS, J. H., AND EFROS, A. A. 2007. Scene completion using millions of photographs. ACM Transactions on Graphics (SIGGRAPH 2007) 26, 3 (August).

HOIEM, D., EFROS, A. A., AND HEBERT, M. 2005. Automatic photo pop-up. In ACM SIGGRAPH.

HOIEM, D., EFROS, A. A., AND HEBERT, M. 2005. Geometric context from a single image. In International Conference on Computer Vision (ICCV), IEEE, vol. 1, 654–661.

KWATRA, V., SCHÖDL, A., ESSA, I., TURK, G., AND BOBICK, A. 2003. Graphcut textures: Image and video synthesis using graph cuts. ACM Transactions on Graphics, SIGGRAPH 2003 22, 3 (July), 277–286.

KWATRA, V., MORDOHAI, P., PENTA, S. K., NARAIN, R., CARLSON, M., POLLEFEYS, M., AND LIN, M. 2008. Fluid in video: Augmenting real video with simulated fluids. In EUROGRAPHICS 2008.

LALONDE, J.-F., HOIEM, D., EFROS, A. A., ROTHER, C., WINN, J., AND CRIMINISI, A. 2007. Photo clip art. ACM Transactions on Graphics (SIGGRAPH 2007) 26, 3 (August).

PENTA, S. K. 2005. Depth Image Representation for Image Based Rendering. Master's thesis, IIIT, Hyderabad.

REINHARD, E., ASHIKHMIN, M., GOOCH, B., AND SHIRLEY, P. 2001. Color transfer between images. IEEE Computer Graphics and Applications 21, 5 (September), 34–41.

REINHARD, E., AKYUZ, A. O., COLBERT, M., HUGHES, C. E., AND O'CONNOR, M. 2004. Real-time color blending of rendered and captured video. In Interservice/Industry Training, Simulation and Education Conference.

ROTHER, C., BORDEAUX, L., HAMADI, Y., AND BLAKE, A. 2006. AutoCollage. In SIGGRAPH '06: ACM SIGGRAPH 2006 Papers, ACM Press, New York, NY, USA, 847–852.

RUBNER, Y., TOMASI, C., AND GUIBAS, L. J. 1998. A metric for distributions with applications to image databases. In ICCV '98: Proceedings of the Sixth International Conference on Computer Vision, IEEE Computer Society, Washington, DC, USA, 59.

SHUM, H.-Y., AND KANG, S. B. 2000. A review of image-based rendering techniques. IEEE/SPIE Visual Communications and Image Processing (VCIP) 2000 (June), 2–13.

SNAVELY, N., SEITZ, S. M., AND SZELISKI, R. 2006. Photo tourism: Exploring photo collections in 3D. ACM Transactions on Graphics (SIGGRAPH 2006) 25, 3, 835–846.

TORRALBA, A., MURPHY, K., FREEMAN, W., AND RUBIN, M. 2003. Context-based vision system for place and object recognition. In International Conference on Computer Vision (ICCV).

WANG, J., AND COHEN, M. 2007. Optimized color sampling for robust matting. In IEEE Conference on Computer Vision & Pattern Recognition.