
Automatic object extraction in images using embedded labels

Chee Sun Won
Dept. of Electronic Eng., Dongguk University

Seoul, 100-715, South Korea
[email protected]

Abstract

To automatically generate images with the same foreground but different backgrounds, a watermark bit (e.g., binary 1 for foreground and 0 for background) can be inserted at each pixel location. The embedded watermark bit can then be automatically extracted, and the background can be separated from the object. Note that the object extraction succeeds only if the watermarked image is intact. However, if the watermarked image goes through post-processing such as JPEG compression and cropping, the pixel-wise watermark decoding may fail. To overcome this problem, this paper proposes a block-wise watermark insertion and a block-wise MAP (maximum a posteriori) watermark decoding. Experimental results show that the proposed method is more robust than the pixel-wise decoding against various post-processing attacks.

1. Introduction

This paper deals with the problem of automatically separating the object/foreground from the background in an image. Specifically, we focus on the problem of automatic background replacement. Note that the demand for various digital photo editing functionalities will grow as the performance of digital multimedia devices such as digital cameras and cell phones increases. Also, it is often required to generate training images with the same object but different backgrounds for automatic machine learning systems. For example, suppose we need training images for a face recognition problem (e.g., see Fig. 1). Then, to increase the recognition accuracy, it is important to provide as many training images as possible, with wide variation in the appearance of the object in the image [1].

Previous background extraction methods are based on three approaches. The first approach is to exploit a known or controlled environment. For example, blue-screen matting (also called the chroma-key technique) [3] takes a picture

Figure 1. Example of the same object (face) with different backgrounds (images from the Caltech Image Archive [2]).

with a uniform background so that the background can be easily segmented. However, this technique requires a professional studio with a physical blue screen. To avoid this cumbersome physical setting, [4] proposes to pre-shoot the background image; the background image is then subtracted from another shot of the same background that includes an object. Also, in [5], self-identifying patterns are used to recognize the background. The second approach relies on estimating the probability that each pixel belongs to the object or the background. For example, a Bayesian framework has been adopted for separating the background from the object [6][7]. In particular, [7] uses special images called low-depth-of-field (LDOF) images for automatic object segmentation. In an LDOF image, the object is in focus and the rest of the background is blurred. The high-frequency components residing inside the object region are then modeled with a probabilistic distribution for a Bayesian image segmentation framework. Finally, the third approach to background extraction involves

Canadian Conference on Computer and Robot Vision

978-0-7695-3153-3/08 $25.00 © 2008 IEEE. DOI 10.1109/CRV.2008.10


direct human intervention during segmentation. This so-called semi-automatic image segmentation requires time-consuming and inconvenient user intervention to obtain a rough outline of the object boundary. In this semi-automatic setting, the issue is how to minimize and simplify the human intervention [8][9].

Note that the above-mentioned background extraction methods are one-time solutions. That is, whenever we need to extract or replace the background of the same image, we have to re-apply one of the above three approaches, each of which is either semi-automatic or applicable only to a limited case such as LDOF images. To solve this problem, once the object and the background have been separated by one of the above three approaches, the extracted object and background are slightly modified to embed a watermark. They can then be automatically identified for later background extractions or replacements. This paper deals with this problem. That is, we are interested in repetitive requests for background replacement rather than a non-repetitive, one-time execution. Of course, the prerequisite for our automatic background replacement is that the very first object extraction be done by one of the above three approaches. Once we obtain the separated object and background of an image, we can embed a different binary bit for the object and the background, to be used for automatic background replacement. After the binary bit embedding, we can automatically separate the object from the background by simply extracting and identifying the embedded bits. Thus, the embedded bits for the object and the background are inherited by subsequent image compositions and serve later requests for separating the background from the object.

In [10], the quantization index modulation (QIM) based watermarking scheme [11] was used to embed watermark bits 1 and 0 at each pixel for the object and the background, respectively (see Fig. 2 for the overall structure). Once the watermark-embedded image, called an automatic-object-extractible image, is generated, the automatic background replacement problem is equivalent to extracting the embedded watermark. As demonstrated in [10], the separation of the object from the background can then be done automatically. However, if the watermark-embedded (i.e., automatic-object-extractible) image has undergone post-processing such as JPEG compression or corruption by additive noise, the watermarked pixel values will be changed and there is no guarantee of successfully separating the object from the background. In this paper, we propose a watermark embedding and extraction method that is robust to unintentional image modifications such as JPEG compression, additive noise, and cropping. The basic idea of the proposed method is to embed the watermark bit into the average value of a small image block instead of modifying the least significant bits

of each pixel gray-level. Also, the embedded watermarks can be viewed as a two-dimensional Markov random field (MRF), so the watermark extraction can be reformulated as a MAP (maximum a posteriori) image segmentation problem.

2. Pixel-wise QIM watermarking: Previous method

The Quantization Index Modulation (QIM) watermarking scheme [11] was used in [10] to embed watermark bits indicating background and object for the purpose of repetitive background extraction and replacement. The QIM method is a blind watermarking scheme and is known to be more robust than spread spectrum watermarking. Also, the QIM method is a spatial-domain watermarking scheme. Note that since the watermark embedding must be done for irregularly shaped object and background regions, it is not tractable to apply a transform-domain watermarking scheme, which operates on rectangular regions.

In the previous work [10], a one-bit watermark is embedded at each pixel; i.e., watermark bit 1 is embedded at a pixel belonging to the object and watermark bit 0 at a pixel belonging to the background. Specifically, for a pixel $i$, the watermark bit $b_i$ is generated as follows:

$$b_i = \begin{cases} 1, & \text{if pixel $i$ belongs to the object/foreground} \\ 0, & \text{if pixel $i$ belongs to the background.} \end{cases} \qquad (1)$$

According to the watermark bit $b_i$, the gray-level $y_i$ at pixel $i$ is modified to $\tilde{y}_i$ as follows:

$$\tilde{y}_i = \begin{cases} Q_1(y_i), & \text{if } b_i = 1 \\ Q_0(y_i), & \text{if } b_i = 0, \end{cases} \qquad (2)$$

where $Q_1(y_i)$ and $Q_0(y_i)$ are defined as

$$Q_1(y_i) = \left\lceil \frac{y_i - \Delta/4}{\Delta} \right\rceil \times \Delta + \frac{\Delta}{4}, \qquad (3)$$

and

$$Q_0(y_i) = \left\lceil \frac{y_i + \Delta/4}{\Delta} \right\rceil \times \Delta - \frac{\Delta}{4}. \qquad (4)$$

In (3) and (4), $\lceil \zeta \rceil$ converts $\zeta$ to an integer (i.e., a ceiling function) and $\Delta$ is the quantization step controlling the watermark strength.
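As a concrete illustration (a sketch, not the authors' code; the function names are ours and the value $\Delta = 10$ is taken from Section 4), the quantizers (3)-(4) and the pixel-wise embedding (2) and extraction (5) rules can be written as:

```python
import math

DELTA = 10.0  # quantization step Δ (Section 4 uses Δ = 10)

def q1(y):
    # Eq. (3): ceil-quantize onto the "object" lattice {kΔ + Δ/4}
    return math.ceil((y - DELTA / 4) / DELTA) * DELTA + DELTA / 4

def q0(y):
    # Eq. (4): ceil-quantize onto the "background" lattice {kΔ - Δ/4}
    return math.ceil((y + DELTA / 4) / DELTA) * DELTA - DELTA / 4

def embed(y, b):
    # Eq. (2): move the gray-level onto the lattice selected by bit b
    return q1(y) if b == 1 else q0(y)

def extract(y_wm):
    # Eq. (5): decide by which lattice the received value is closer to
    return 1 if abs(y_wm - q1(y_wm)) < abs(y_wm - q0(y_wm)) else 0
```

On an unmodified watermarked value the round trip is exact (e.g., `extract(embed(100, 1))` returns 1); adjacent points of the two lattices are $\Delta/2$ apart, which is the margin a post-processing perturbation must not exceed at a pixel.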

For a given watermarked image value $\tilde{y}_i$, the embedded watermark bit at pixel $i$ can be extracted as follows:

$$\hat{b}_i = \begin{cases} 1, & \text{if } \|\tilde{y}_i - Q_1(\tilde{y}_i)\| < \|\tilde{y}_i - Q_0(\tilde{y}_i)\| \\ 0, & \text{otherwise.} \end{cases} \qquad (5)$$

Note that if the watermarked gray-level $\tilde{y}_i$ has not been altered, then the watermark bit $\hat{b}_i$ extracted by (5) is identical to the original watermark bit $b_i$. However, if the


Figure 2. The overall structure of embedding watermark bits for the object and the background.

watermarked gray-level is changed by post-processing such as JPEG compression or noise addition, then the extracted watermark bit $\hat{b}_i$ does not necessarily match the original watermark bit $b_i$, and the object extraction from the background yields segmentation errors. For example, the watermarked image in Fig. 3-(a) has undergone JPEG compression, as shown in Fig. 3-(b). As a result, the watermark extraction by (5) yields many segmentation errors (see Fig. 3-(c)).

3. Block-wise MAP QIM watermarking

As demonstrated in Fig. 3, the pixel-wise watermark decoding used in [10] is vulnerable to post-processing applied to watermarked images. To alleviate this problem, we propose to adopt a block-wise MAP (maximum a posteriori) decoding scheme. Note that a pixel-wise MAP decoding for QIM watermarking was also used in [12]. That scheme is based on a sliding window that embeds the watermark into the local average value. Embedding the watermark into average values rather than pixel gray-levels is certainly expected to increase robustness (especially against JPEG compression). However, because the sliding windows overlap, the current watermarked pixel value may affect the watermarking of the next window. Experimental results reveal that visual degradations occur when the current gray-level of a pixel is modified to change the local average brightness together with already-watermarked neighboring pixel values. To overcome this problem, in this paper, the

Figure 3. Example of segmentation errors: (a) watermarked image, (b) JPEG compressed and decompressed image, (c) object extraction by (5), (d) original object.


watermark bit is embedded into the average brightness of each non-overlapping block, spreading the modification over all pixels in the block rather than over a single pixel as in the sliding-window method. Also, since the gray-levels in a block are modified simultaneously, the watermark decoding becomes block-wise. Here, the block-wise contextual information (i.e., the block-wise watermark smoothness condition) can be modeled by a Markov random field (MRF). Thus, the watermark decoding turns out to be a block-wise MAP (2-class) image segmentation problem, similar to [13][7].

Let us denote the set of 2-D pixel indices of the $N_1 \times N_2$ image space as $\Omega = \{(i, j) : 0 \le i \le N_1 - 1,\ 0 \le j \le N_2 - 1\}$ and the set of non-overlapping block indices of size $B \times B$ as $\Omega_B = \{(i, j) : 0 \le i \le N_1/B - 1,\ 0 \le j \le N_2/B - 1\}$. The watermark embedding is executed for each block $s \in \Omega_B$. That is, given the original image composed of an object and a background, we divide the image space into $B \times B$ non-overlapping blocks. Then, for each block $s \in \Omega_B$, we calculate the average brightness $y_s$. If the majority of the pixels in block $s$ belong to the object, then the representative watermark bit to be embedded is 1 (i.e., $x_s = 1$). Otherwise, we embed watermark bit 0 (i.e., $x_s = 0$). Then, the gray-level $y_s(k, l)$ at position $(k, l)$, $0 \le k, l \le B - 1$, in block $s$ is modified by the watermark embedding as follows:

$$\tilde{y}_s(k, l) = \begin{cases} y_s(k, l) + Q_1(y_s) - y_s, & \text{if } x_s = 1 \\ y_s(k, l) + Q_0(y_s) - y_s, & \text{if } x_s = 0, \end{cases} \qquad (6)$$

where $\tilde{y}_s(k, l)$ is the watermarked gray-level at $(k, l)$ in the block $s \in \Omega_B$.
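A minimal sketch of the block-wise embedding (6) (the helper names are ours, and a plain Python list of lists stands in for an image block):

```python
import math

DELTA = 10.0  # quantization step Δ
B = 4         # block size used in the experiments

def quantize(avg, bit):
    # Eqs. (3)-(4) applied to the block average brightness
    if bit == 1:
        return math.ceil((avg - DELTA / 4) / DELTA) * DELTA + DELTA / 4
    return math.ceil((avg + DELTA / 4) / DELTA) * DELTA - DELTA / 4

def embed_block(block, bit):
    # Eq. (6): add the same offset Q_x(y_s) - y_s to every pixel, so that
    # the block average lands exactly on the lattice chosen by the bit x_s
    n = sum(len(row) for row in block)
    avg = sum(map(sum, block)) / n
    shift = quantize(avg, bit) - avg
    return [[p + shift for p in row] for row in block]
```

Because the same offset is spread over all $B^2$ pixels, the per-pixel change is bounded by $\Delta$ while the block average carries the bit.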

On each block $s$, the watermark bit $x_s \in \{0, 1\}$ is assumed to be a realization of a random variable $X_s$, where $x_s = 1$ for the object and $x_s = 0$ for the background. Denoting the set of random variables (i.e., a random field) $X = \{X_s : s \in \Omega_B\}$, we assume that $X$ is a Gibbs random field with the conditional probability

$$P(X_s = x_s \mid x_{\eta_s}) = \frac{1}{Z} \exp\Big\{\sum_{c} V_c(x_s)\Big\}, \qquad (7)$$

where $V_c(x_s)$ is the clique potential of a clique $c$ on the neighborhood system $\eta_s$. On top of the random field $X$, we have the random field $Y = \{Y_s : s \in \Omega_B\}$, where $Y_s$, taking values $y_s \in \{0, 1, \cdots, 255\}$, is a random variable on block $s$. The realization $y_s$ of the random variable $Y_s$ represents the average brightness of the pixels in block $s$ and is assumed to be Gaussian distributed with mean $Q_0(y_s)$ for watermark bit 0 or $Q_1(y_s)$ for watermark bit 1, and variance $\sigma^2$. Thus, we have

$$P(y_s \mid x_s) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\big\{-(y_s - Q_{x_s}(y_s))^2 / 2\sigma^2\big\}. \qquad (8)$$

Having defined all necessary stochastic models, our block-based MAP watermark decoding extracts $x^*$ given the watermarked and possibly modified image data $y$ as follows:

$$x^* = \arg\max_x P(X = x \mid Y = y) = \arg\max_x P(Y = y \mid X = x)\, P(X = x). \qquad (9)$$

In (9), $P(Y \mid X)$ and $P(X)$ can be assumed to be block-wise independent as follows:

$$P(Y \mid X) = \prod_{s \in \Omega_B} P(Y_s \mid X_s), \qquad (10)$$

and

$$P(X) \approx \prod_{s \in \Omega_B} P(X_s \mid X_{\eta_s}). \qquad (11)$$

Note that (11) approximates the global Gibbs distribution by a product of local characteristics. Since the a posteriori probability in (9) is separable over blocks as in (10) and (11), we can independently extract the watermark bit $x_s^*$ for each block $s$ by maximizing the following local probability:

$$x_s^* = \arg\max_{x_s} \big[P(Y_s \mid X_s)\, P(X_s \mid X_{\eta_s})\big] = \arg\max_{x_s} \big[\ln P(Y_s \mid X_s) + \ln P(X_s \mid X_{\eta_s})\big]. \qquad (12)$$

Then, by plugging (7) and (8) into (12), we can carry out the block-wise MAP watermark decoding.

Once the watermark bit for each block is obtained, the blocks on the boundary between the object and the background are subdivided into four smaller blocks, and the watermark bit of each subdivided image block is determined. Specifically, the watermark bit of a subdivided boundary block is taken from the non-boundary neighboring block with the smallest average gray-level difference. This subdivision and watermark determination continues until we reach pixel-level watermark extraction. Note that this process is similar to the watershed method used in [13][7].
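The inheritance rule above can be sketched as follows (an illustrative simplification at a single scale; the function names are ours, and `labels`/`avgs` are assumed to be 2-D grids of per-block bits and average gray-levels):

```python
def is_boundary(labels, i, j):
    # a block lies on the object/background boundary if any 4-neighbor
    # carries the opposite watermark bit
    H, W = len(labels), len(labels[0])
    return any(0 <= i + di < H and 0 <= j + dj < W
               and labels[i + di][j + dj] != labels[i][j]
               for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)))

def inherit_bit(labels, avgs, i, j):
    # a boundary (sub)block takes the bit of the non-boundary neighboring
    # block whose average gray-level is closest to its own
    H, W = len(labels), len(labels[0])
    cands = [(abs(avgs[i][j] - avgs[i + di][j + dj]), labels[i + di][j + dj])
             for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
             if 0 <= i + di < H and 0 <= j + dj < W
             and not is_boundary(labels, i + di, j + dj)]
    return min(cands)[1] if cands else labels[i][j]
```

In the paper this decision is applied to blocks subdivided into four, recursively down to pixel level; the sketch shows only the inheritance decision at one scale.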

4. Experiments

The quantization step $\Delta$ in (3) and (4) controls a trade-off between the strength of the watermark and the degradation of the image quality ($\Delta = 10$ is used in this paper). For the clique potential $V_c(x_s)$ in (7), pair cliques on the second-order neighborhood system are used, with the clique potential

$$V_c(x_s) = \begin{cases} \beta, & \text{if the two watermark bits of the pair clique are the same} \\ -\beta, & \text{otherwise,} \end{cases} \qquad (13)$$


where we set $\beta = 1$. Also, we set $\sigma = 1$ in (8). $x_s^*$ in (9) is updated iteratively, with fewer than 5 iterations. At the first iteration, only $P(Y_s \mid X_s)$ in (12) is considered, without $P(X_s \mid X_{\eta_s})$.
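Putting (8), (12), and (13) together, the block-wise MAP decoding can be sketched as an ICM-style iteration (a sketch under the paper's settings $\beta = 1$, $\sigma = 1$, and a likelihood-only first pass; the quantizers are repeated from (3)-(4), and all names are ours):

```python
import math

DELTA, BETA, SIGMA = 10.0, 1.0, 1.0

def quantize(avg, bit):
    # Eqs. (3)-(4) applied to a block average
    if bit == 1:
        return math.ceil((avg - DELTA / 4) / DELTA) * DELTA + DELTA / 4
    return math.ceil((avg + DELTA / 4) / DELTA) * DELTA - DELTA / 4

def log_lik(y, x):
    # ln P(y_s | x_s) from Eq. (8), constant term dropped
    return -((y - quantize(y, x)) ** 2) / (2 * SIGMA ** 2)

def log_prior(x, nbrs):
    # ln P(x_s | x_ηs) from Eqs. (7) and (13), up to ln Z:
    # +β per agreeing neighbor, -β per disagreeing neighbor
    return sum(BETA if x == n else -BETA for n in nbrs)

def map_decode(avgs, n_iter=5):
    """avgs: 2-D list of received per-block average gray-levels."""
    H, W = len(avgs), len(avgs[0])
    # first iteration: likelihood term only, as in the paper
    x = [[1 if log_lik(avgs[i][j], 1) > log_lik(avgs[i][j], 0) else 0
          for j in range(W)] for i in range(H)]
    for _ in range(n_iter - 1):
        for i in range(H):
            for j in range(W):
                # second-order (8-connected) neighborhood system
                nbrs = [x[i + di][j + dj]
                        for di in (-1, 0, 1) for dj in (-1, 0, 1)
                        if (di, dj) != (0, 0)
                        and 0 <= i + di < H and 0 <= j + dj < W]
                x[i][j] = max((0, 1), key=lambda b:
                              log_lik(avgs[i][j], b) + log_prior(b, nbrs))
    return x
```

The in-place update makes this a Gauss-Seidel-style ICM pass; the prior term acts as a smoothness vote among the eight neighboring blocks.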

The proposed watermark encoding and decoding method resists some post-processing attacks. For example, in Fig. 4, the original image Fig. 4-(a) is watermarked and JPEG compressed as in Fig. 4-(b). It is then decoded by the previous method [10] as in Fig. 4-(c) and by the proposed block-wise MAP decoding (see Fig. 4-(d)), demonstrating the robustness of the proposed method to JPEG compression. Also, as shown in Fig. 5, the proposed method separates the background from the object even if the watermarked image is cropped.

The proposed method needs to determine the parameter values (i.e., $\Delta$, $B$, $\beta$, and $\sigma$) a priori. The parameter values chosen in this paper were not exhaustively searched, so they may not be optimal. For example, the block size $B$ is set to 4 for Fig. 4 and Fig. 5. However, as shown in Fig. 6-(c) and (d), the segmentation result yields more errors with $B = 4$ than with $B = 8$ for the added noise. Analyzing the sensitivity of the segmentation result to these parameter values and determining the optimal values remain as future work.

5. Conclusion

To embed a watermark that is robust against image processing operations such as JPEG compression, we embed the watermark into the average brightness value of an image block. The embedded watermark can then be efficiently extracted by adopting a block-wise MAP segmentation framework. We proposed the block-wise MAP QIM decoding scheme in this paper and demonstrated its successful separation of the background from the object even when the watermarked image goes through post-processing attacks, including JPEG compression.

6. Acknowledgement

This work was supported by Seoul R&BD Program(SFCC).

References

[1] D. Roobaert, M. Zillich, and J.-O. Eklundh, A pure learning approach to background-invariant object recognition using pedagogical support vector machines, Proc. of CVPR, II, 351-357, 2001.

[2] http://www.vision.caltech.edu/html-files/archive.html

[3] D.J. Chaplin, Chroma key method and apparatus, US Patent 5,249,039, 1995.

[4] R.J. Qian and M.I. Sezan, Video background replacement without a blue screen, IEEE ICIP, 1999.

[5] M. Fiala and C. Shu, Background subtraction using self-identifying patterns, IEEE CRV, 2005.

[6] Y.-Y. Chuang, B. Curless, D.H. Salesin, and R. Szeliski, A Bayesian approach to digital matting, IEEE CVPR, 2001.

[7] C.S. Won, K. Pyun, and R.M. Gray, Automatic object segmentation in images with low depth of field, IEEE Proc. of Image Processing (ICIP), III, 805-808, 2002.

[8] C. Gu and M.-C. Lee, Semiautomatic segmentation and tracking of semantic video objects, IEEE Tr. on Circ. and Sys. for Video Tech., 8(5), 572-584, 1998.

[9] Y. Gaobo and Y. Shengfa, Modified intelligent scissors and adaptive frame skipping for video object segmentation, Real-Time Imaging, 11, 310-322, 2005.

[10] C.S. Won, On generating automatic-object-extractible images, Proc. of SPIE, vol. 6764, 2007.

[11] B. Chen and G.W. Wornell, Quantization index modulation: A class of provably good methods for digital watermarking and information embedding, IEEE Tr. on Information Theory, 47, 1423-1443, 2001.

[12] W. Lu, W. Li, R. Safavi-Naini, and P. Ogunbona, A pixel-based robust image watermarking system, IEEE ICME, 2006.

[13] C.S. Won, A block-wise MAP segmentation for image compression, IEEE Tr. on Circuits and Systems for Video Technology, vol. 8, no. 5, pp. 592-601, 1998.


Figure 4. Results with JPEG compression: (a) original image, (b) watermark encoded and JPEG compression applied (compression scale 10 in Photoshop), (c) object extraction by the previous method [10], (d) object extraction by the proposed method.

Figure 5. Results with image cropping: (a) original image, (b) watermark encoded and cropped image, (c) object extraction by the previous method [10], (d) object extraction by the proposed method.

Figure 6. Comparison of different block sizes B: (a) original image, (b) watermark encoded and 2% uniform noise added in Photoshop, (c) object extraction with B = 4, (d) object extraction with B = 8.
