animated image cloth segmentation - iitkhome.iitk.ac.in/~mandeeps/animated image cloth...

November 2016

Animated Image Cloth Segmentation

by

MANDEEP SINGH

Third Year Undergraduate

Mechanical Engineering Department

(IITK)

&

BIBEK KUMAR

Third Year Undergraduate

Electrical Engineering Department

(IITK)

Supervised by

Dr. Tanaya Guha, Assistant Professor

Electrical Engineering Department (IITK)

DEPARTMENT OF ELECTRICAL ENGINEERING

INDIAN INSTITUTE OF TECHNOLOGY KANPUR

ABSTRACT

This project aims to improve the parsing of clothes from animated images, so that it can be deployed in online shopping search engines for Disney character clothing. Much of the time and effort spent on data retrieval during search can be saved by using algorithms such as k-means clustering, together with Gabor filters for texture-based feature extraction. As future work, a probabilistic model for image parsing based on Data Driven Markov Chain Monte Carlo (DDMCMC) is proposed to improve the results further.

Keywords: k-means clustering, Gabor filters, Data Driven Markov Chain Monte Carlo (DDMCMC), split and merge method


INTRODUCTION

Clothing is a major part of our lifestyle and a key indicator of one's social status. With the growth of online shopping, there is increasing demand for search results tailored to each consumer's preferences, which attract customers to a shopping site and make their shopping experience easier and more enjoyable.

Vision algorithms that recognize clothing have a wide variety of potential impacts, ranging from better social understanding to improved person identification, surveillance, and content-based image retrieval. The e-commerce opportunities alone are huge, with clothing markets worth millions of dollars. Despite the potential research and commercial gains of clothing recognition, relatively few researchers have explored the problem. Clothing styles such as those of a Disney prince or princess are quite popular among children and young people (bridal gowns especially), yet little research has addressed parsing animated images, which differ from real-world images, or developing an algorithm that recognizes and distinguishes the clothing items in an animated image of a favorite Disney character.


OVERVIEW OF THE APPROACH

This project uses the well-known k-means clustering algorithm as the first stage of clothing extraction: it segments the constituent parts of the clothes from the background and from other body parts such as the face, hair, hands, and feet. The sampled dataset consists of animated images of famous Disney characters. As a future improvement, a data-driven Markov chain is set up to parse the image into its constituent clothing items by unifying several segmentation algorithms, such as edge detection, k-means, and split and merge. The probabilistic model is built by dividing the image into regions according to their type, i.e. uniform, texture, and shading.

1. k-means clustering

k-means clustering is a least-squares method that partitions a dataset of n observations into k clusters, where each observation belongs to the cluster whose mean is nearest to it. Here the dataset is the set of pixels of the animated RGB image, and each pixel is a triple of red, green, and blue intensity values, each ranging from 0 to 255. Each pixel with intensity (r, g, b) can therefore be treated as a point in the 3-D RGB coordinate system.
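The least-squares criterion can be written out explicitly. The symbols below are introduced here for illustration and do not appear in the original text: $x_i = (r_i, g_i, b_i)$ is the ith pixel, $S_1, \dots, S_k$ are the clusters, and $\mu_j$ is the mean of cluster $S_j$.

```latex
\underset{S_1,\dots,S_k}{\arg\min}\;
\sum_{j=1}^{k} \sum_{x_i \in S_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|S_j|} \sum_{x_i \in S_j} x_i
```

The inner sum is the squared Euclidean distance of every pixel in a cluster from that cluster's mean, so minimizing it pulls each cluster tightly around its own color.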

1.1 Algorithm-

Randomly select k points as the initial means of the k clusters. Calculate the Euclidean distance of the ith pixel (ri, gi, bi) from each mean (rm, gm, bm), and assign the pixel to the cluster whose mean is nearest. After the clusters are formed, update the mean of each cluster to its centroid. Repeat the assignment and update steps until convergence, which yields k distinct clusters, as shown in Fig 1 for 3 clusters.

Figure 1
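The assignment and update steps described in this section can be sketched in a few lines of NumPy. This is a minimal illustration and not the project's actual implementation: the function name `kmeans_rgb`, the optional `init` argument, and the toy 2x3 image are all assumptions made for the example.

```python
import numpy as np

def kmeans_rgb(pixels, k, init=None, n_iter=100, seed=0):
    """Cluster an (N, 3) array of RGB values into k groups.

    `init` is an optional (k, 3) array of starting means (an illustrative
    extra); by default k pixels are drawn at random, as in the text.
    """
    pixels = np.asarray(pixels, dtype=float)
    if init is None:
        rng = np.random.default_rng(seed)
        means = pixels[rng.choice(len(pixels), size=k, replace=False)]
    else:
        means = np.asarray(init, dtype=float)

    def assign(m):
        # Euclidean distance from every pixel to every mean, shape (N, k).
        d = np.linalg.norm(pixels[:, None, :] - m[None, :, :], axis=2)
        return d.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(means)
        # Each mean moves to the centroid of its cluster;
        # an empty cluster keeps its previous mean.
        new_means = np.array([
            pixels[labels == j].mean(axis=0) if np.any(labels == j) else means[j]
            for j in range(k)
        ])
        if np.allclose(new_means, means):  # converged
            break
        means = new_means
    return assign(means), means


# Toy 2x3 "image": three reddish pixels on top, three bluish pixels below.
image = np.array([[[250, 10, 10], [240, 20, 15], [245, 5, 20]],
                  [[10, 10, 250], [20, 15, 240], [5, 20, 245]]], dtype=np.uint8)
flat = image.reshape(-1, 3)                    # one (r, g, b) point per pixel
labels, means = kmeans_rgb(flat, k=2, init=[[255, 0, 0], [0, 0, 255]])
segmented = labels.reshape(image.shape[:2])    # label map with the image's height and width
```

With random initialization the result can depend on which pixels are drawn first, so in practice the clustering is often repeated with several seeds and the run with the smallest within-cluster distance is kept.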


The following sampled animated images of Disney princesses were used to segment the clothes in this project. They are shown along with the results of applying k-means clustering after separating the background.

Figure 2

Figure 3

Figure 4


Figure 5

Figure 6

1.2 Limitation of the initial approach and its remedy-

After performing k-means clustering, we can infer that it segments the clothing from the face, skin, and hair with about 90-95% accuracy in cases like Fig 5. However, accuracy drops to about 60-75% in the other cases, where the pixels of two different clusters are cluttered together or a single cluster is split unnecessarily.

These glitches can be removed by segmenting the images with Gabor filters for the textured regions, and by deploying a more rigorous algorithm that merges regions of the same texture, or regions that differ only in shading, into a single region instead of letting them split further. Finally, it is proposed to use the Markov property (i.e. the conditional probabilities of future states depend only on the present state and not on the sequence of past events), which comes in handy for finding the optimal solution (regions) of the dynamic system.
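Stated as a formula, with $X_t$ denoting the state of the chain at step $t$ (notation introduced here for illustration), the Markov property reads:

```latex
P(X_{t+1} = x \mid X_t, X_{t-1}, \dots, X_0) = P(X_{t+1} = x \mid X_t)
```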


2. Texture based segmentation

As discussed in the limitations section above, we use multi-channel Gabor filters to extract regions of the same texture.

2.1 Gabor filters

These are band-pass filters with tunable center frequency, orientation, and bandwidth. The frequency and orientation representations of Gabor filters are similar to those of the human visual system, and they have been found to be particularly appropriate for texture representation and discrimination. The Fourier transform of a Gabor filter is a Gaussian shifted in frequency.

2-D Gabor Filter-

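$$f(x, y, \omega, \theta, \sigma_x, \sigma_y) = \exp\left[-\frac{1}{2}\left(\left(\frac{x}{\sigma_x}\right)^2 + \left(\frac{y}{\sigma_y}\right)^2\right) + j\omega\,(x\cos\theta + y\sin\theta)\right]$$

where $\sigma_x$ and $\sigma_y$ are the spatial spreads along x and y, $\omega$ is the frequency, and $\theta$ is the orientation.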
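As a rough illustration, the 2-D Gabor filter can be sampled on a square pixel grid with NumPy. The function name `gabor_kernel`, the `size` parameter, and the four-orientation bank are assumptions made for this sketch; the original text does not specify a kernel size or a set of orientations.

```python
import numpy as np

def gabor_kernel(size, omega, theta, sigma_x, sigma_y):
    """Sample the complex Gabor filter on a size x size grid centred at the origin.

    `size` should be odd so that the grid has a well-defined centre pixel.
    """
    half = size // 2
    # y varies along rows, x varies along columns.
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Gaussian envelope: controls the spatial spread of the filter.
    envelope = np.exp(-((x / sigma_x) ** 2 + (y / sigma_y) ** 2) / 2)
    # Complex sinusoid: sets the frequency omega and the orientation theta.
    carrier = np.exp(1j * omega * (x * np.cos(theta) + y * np.sin(theta)))
    return envelope * carrier


# A small multi-channel bank: one frequency, four orientations (illustrative values).
bank = [gabor_kernel(15, omega=0.5, theta=t, sigma_x=3.0, sigma_y=3.0)
        for t in np.arange(4) * np.pi / 4]
```

Each kernel in the bank responds most strongly to texture whose stripes vary along its own orientation, which is what makes the set of filter outputs usable as a texture feature vector.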

2.2 Gaussian Smoothing

Spatial smoothing can be applied to the features extracted by the Gabor filters, and it is known to improve segmentation because it suppresses large variations in the feature map within regions that belong to the same texture. Each filter output is smoothed with a Gaussian smoothing function that matches the spatial Gaussian envelope of the corresponding filter.

$$g(x, y) = \exp\left\{-\frac{x^2 + y^2}{2\sigma^2}\right\}$$
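A minimal sketch of this smoothing step follows, using the fact that $g(x, y)$ factors into a function of x times a function of y, so the 2-D smoothing can be done as a row pass followed by a column pass. Two choices here are assumptions of the sketch and not part of the original text: the kernel is scaled to sum to 1 so that smoothing preserves the average feature value, and it is truncated at three standard deviations.

```python
import numpy as np

def gaussian_kernel_1d(sigma, radius=None):
    """Samples of exp(-t^2 / (2 sigma^2)), scaled to sum to 1 (assumed normalization)."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))   # assumed truncation at 3 sigma
    t = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-t ** 2 / (2 * sigma ** 2))
    return g / g.sum()

def gaussian_smooth(feature_map, sigma):
    """Smooth a 2-D feature map with a separable Gaussian.

    Borders are zero-padded, so values near the edges are attenuated.
    The map must be larger than the kernel along both axes.
    """
    g = gaussian_kernel_1d(sigma)
    rows = np.apply_along_axis(np.convolve, 1, feature_map, g, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, g, mode="same")


rng = np.random.default_rng(0)
noisy = 1.0 + 0.2 * rng.normal(size=(32, 32))   # a noisy feature map from one uniform texture
smoothed = gaussian_smooth(noisy, sigma=2.0)
```

Away from the borders, the smoothed map varies much less than the noisy one, which is the effect the text relies on to keep a single texture from being split across clusters.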

2.3 Steps for texture based segmentation

Input Image → Gabor Filtering → Gaussian Smoothing → Clustering → Segmented Image
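The stages of this flow can be chained end to end in a short NumPy sketch. Everything specific in it is an assumption made for illustration: the synthetic two-texture test image, the single frequency and two orientations, the circular FFT-based convolution, the use of the response magnitude as the feature, and seeding k-means from one pixel in each half of the image.

```python
import numpy as np

def grid(size):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    return x, y

def gabor_kernel(size, omega, theta, sigma):
    x, y = grid(size)
    return np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2)
                  + 1j * omega * (x * np.cos(theta) + y * np.sin(theta)))

def gaussian_kernel(size, sigma):
    x, y = grid(size)
    g = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return g / g.sum()

def conv2_same(image, kernel):
    """Circular 2-D convolution via the FFT, output aligned with the input."""
    kh, kw = kernel.shape
    padded = np.zeros(image.shape, dtype=complex)
    padded[:kh, :kw] = kernel
    # Move the kernel centre to index (0, 0) so the output is not shifted.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded))

def texture_features(gray, omegas, thetas, sigma=3.0, size=15):
    """Gabor filtering followed by Gaussian smoothing, one channel per filter."""
    smooth = gaussian_kernel(size, sigma)
    feats = []
    for omega in omegas:
        for theta in thetas:
            response = np.abs(conv2_same(gray, gabor_kernel(size, omega, theta, sigma)))
            feats.append(conv2_same(response, smooth).real)
    return np.stack(feats, axis=-1)          # shape (H, W, n_filters)

def kmeans(points, init, n_iter=50):
    """Plain k-means on feature vectors, started from the given means."""
    means = np.asarray(init, dtype=float)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_means = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else means[j]
            for j in range(len(means))
        ])
        if np.allclose(new_means, means):
            break
        means = new_means
    return labels


# Synthetic input: vertical stripes on the left half, horizontal stripes on the right.
H, W = 32, 64
yy, xx = np.mgrid[0:H, 0:W]
omega = np.pi / 4
gray = np.where(xx < W // 2, np.sin(omega * xx), np.sin(omega * yy))

feats = texture_features(gray, omegas=[omega], thetas=[0.0, np.pi / 2])
points = feats.reshape(-1, feats.shape[-1])
init = np.stack([feats[H // 2, W // 4], feats[H // 2, 3 * W // 4]])
labels = kmeans(points, init).reshape(H, W)   # segmented image as a label map
```

The two halves have the same mean intensity, so color-based k-means alone could not separate them; the orientation-selective Gabor channels are what distinguish the two textures.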


3. Data Driven Markov Chain Monte Carlo (DDMCMC)

Markov Chain Monte Carlo is a clever way to search iteratively through a high-dimensional space by constructing a Markov chain that converges to an invariant (stationary) distribution, here the posterior probability p(W|I), which is proportional to the product of the prior p(W) and the likelihood p(I|W).
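To make the idea concrete, here is a toy Metropolis-Hastings sampler, one common way of constructing such a chain. It is only a sketch under strong simplifying assumptions that are not in the original text: W is a single scalar rather than a full image segmentation, the prior and likelihood are Gaussians chosen for illustration, and the proposal is a symmetric random walk.

```python
import numpy as np

def log_posterior(w, data, prior_sigma=10.0, noise_sigma=1.0):
    """log p(W|I) up to a constant: log prior p(W) plus log likelihood p(I|W)."""
    log_prior = -0.5 * (w / prior_sigma) ** 2
    log_like = -0.5 * np.sum(((data - w) / noise_sigma) ** 2)
    return log_prior + log_like

def metropolis_hastings(data, n_samples=20000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    w = 0.0
    lp = log_posterior(w, data)
    samples = np.empty(n_samples)
    for t in range(n_samples):
        # Propose a move that depends only on the current state (Markov property).
        w_new = w + step * rng.normal()
        lp_new = log_posterior(w_new, data)
        # Accept with probability min(1, p(W'|I) / p(W|I)); the proposal is symmetric.
        if np.log(rng.uniform()) < lp_new - lp:
            w, lp = w_new, lp_new
        samples[t] = w
    return samples


data = np.array([1.8, 2.1, 2.4, 1.9, 2.3])   # toy observations standing in for the image I
samples = metropolis_hastings(data)
estimate = samples[2000:].mean()              # discard the first samples as burn-in
```

Only the ratio of posteriors is needed at each step, which is why knowing p(W|I) up to proportionality is enough.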

3.1 Image Models

There are mainly two types of regions in the sampled images, uniform and textured, so we define two image models: an independent and identically distributed (i.i.d.) Gaussian model for uniform regions, and a mixture of two Gaussians for textured color regions. Each model has its own likelihood function, and a region is switched to the model with the maximum likelihood.

The likelihood of the image is the product of the likelihoods of its regions:

$$p(I \mid W) = \prod_{i=1}^{K} p(I_{R_i}; \Theta_i, l_i)$$

where I is the image, $R_i$ is the ith region, $\Theta_i$ is the model parameter vector, and $l_i$ is the model label index.
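The two image models and the product over regions can be sketched as follows. The sketch works in log space, where the product becomes a sum, and it makes several assumptions for illustration: pixel values are scalar intensities rather than colors, model parameters are supplied by hand rather than estimated, and the labels `"uniform"` and `"texture"` stand in for the model label index.

```python
import numpy as np

def log_gauss(x, mu, sigma):
    """Log density of a Gaussian with mean mu and standard deviation sigma."""
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def loglik_uniform(region, mu, sigma):
    """i.i.d. Gaussian model: independent pixels, so log likelihoods add."""
    return float(np.sum(log_gauss(region, mu, sigma)))

def loglik_texture(region, weight, mu1, sigma1, mu2, sigma2):
    """Mixture of two Gaussians, evaluated independently at each pixel."""
    a = np.log(weight) + log_gauss(region, mu1, sigma1)
    b = np.log(1 - weight) + log_gauss(region, mu2, sigma2)
    return float(np.sum(np.logaddexp(a, b)))

def image_loglik(regions, models):
    """log p(I|W): the sum over regions of each region's log likelihood."""
    total = 0.0
    for pixels, (label, params) in zip(regions, models):
        fn = loglik_uniform if label == "uniform" else loglik_texture
        total += fn(pixels, *params)
    return total


flat = np.array([100.0, 101.0, 99.0, 100.0])                 # nearly constant region
stripes = np.array([50.0, 200.0, 50.0, 200.0, 50.0, 200.0])  # two-tone textured region

# Compare both models on the striped region and keep the more likely one.
as_uniform = loglik_uniform(stripes, 125.0, 75.0)
as_texture = loglik_texture(stripes, 0.5, 50.0, 5.0, 200.0, 5.0)
best_label = "texture" if as_texture > as_uniform else "uniform"

total = image_loglik(
    [flat, stripes],
    [("uniform", (100.0, 1.0)), ("texture", (0.5, 50.0, 5.0, 200.0, 5.0))],
)
```

A single wide Gaussian has to stretch over both tones of the striped region, so it assigns each pixel a low density, while the mixture places one narrow component on each tone.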

3.2 Dynamics- Split and Merge method

Let W and W’ be two states of the Markov chain, with K and K+1 disjoint regions respectively.

If the kth region is split into the ith and jth regions, the proposed splitting probability is defined by

where the conditional probability of how likely the chain is to propose a move from W to W’ is given by

and the probability of the proposed merge is


Figure 7: The anatomy of the solution space. The arrows represent Markov chain jumps, and the reversible jumps between the two subspaces 8 and 9 realise the split and merge of a region.
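For a reversible pair of moves such as split and merge, a proposed jump is typically accepted or rejected with the standard Metropolis-Hastings rule. The formula below assumes that rule applies here; $q(W \to W')$ is notation introduced for this illustration to denote the probability of proposing the move from W to W', and $\alpha$ is the resulting acceptance probability.

```latex
\alpha(W \to W') = \min\left(1,\;
\frac{q(W' \to W)\, p(W' \mid I)}{q(W \to W')\, p(W \mid I)}\right),
\qquad
p(W \mid I) \propto p(I \mid W)\, p(W)
```

Under this rule, a split is favored when the two new regions explain the image better than the single original region, after accounting for how likely the chain was to propose the split compared with the reverse merge.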

3.3 Further refinement through calibration

This project proposes the DDMCMC algorithm to obtain the desired results; however, the system needs to be calibrated to the sampled animated images. A parametric factor is multiplied with the likelihood function so that the posterior probability of both types of image models matches human visual recognition.

