

OBJECTIVE ASSESSMENT OF VIDEO SEGMENTATION QUALITY FOR AUGMENTED REALITY

Silvio R. R. Sanches*, Valdinei F. Silva†, Ricardo Nakamura, Romero Tori

Universidade de São Paulo, São Paulo, Brazil

{silviorrs, valdinei.freire}@usp.br, [email protected], [email protected]

ABSTRACT

Assessment of video segmentation quality is a problem seldom investigated by the scientific community. Nevertheless, recent studies have presented objective metrics to evaluate algorithms. Such metrics consider the different ways in which segmentation errors occur (perceptual factors), and their parameters are adjusted according to the application for which the segmented frames are intended. We demonstrate empirically that the performance of existing metrics changes according to the segmentation algorithm; we applied such metrics to evaluate bilayer segmentation algorithms used to compose scenes in Augmented Reality environments. We also contribute a new objective metric to adjust the parameters of two bilayer segmentation algorithms found in the literature.

Index Terms - Binary Segmentation Algorithm, Objective Assessment, Objective Evaluation, Augmented Reality.

1. INTRODUCTION

Image segmentation for the extraction of a person in the foreground from their original context has become a common task in Augmented Reality (AR) systems. This operation becomes more difficult when, due to application requirements, it must be performed in natural environments - with an arbitrary background and without controlled lighting - and using uncalibrated monocular video capture [1, 2].

Recent research has produced algorithms that work in real time and perform segmentation based on monocular images [3, 4, 5, 6]. Applications such as videoconferencing systems (or videochats) [3, 4] and immersive games [1] have adopted these algorithms; both kinds of application may implement AR systems [1, 2]. Because of the difficulty of segmenting a natural image - that is, an image obtained from a natural environment - the output image, which should contain only the element of interest, may present pixel classification errors. The use of images with such errors may considerably degrade the visual quality of the scene displayed to the user, and may prevent some applications from using those algorithms.

* Thanks to CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) for financial support and to Instituto Nacional de Ciência e Tecnologia - Medicina Assistida por Computação Científica (INCT-MACC), Proc. 573710/2008-2, for the device used in the subjective experiments.
† Thanks to FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo), Procs. 11/19280-8 and 12/19627-0, for financial support.

According to Gelasca and Ebrahimi [7], it is possible to obtain an application-dependent metric to evaluate segmentation algorithms. Those authors presented an objective metric whose parameters were adjusted according to the target application of the segmented video frames; the study included AR systems. To obtain that metric, the authors defined a set of error types that were artificially inserted into video sequences simulating an AR application. Those videos were then submitted to a subjective quality assessment process.

We have evaluated the metric proposed in [7] using actual segmentation errors, i.e., errors produced after applying a real segmentation algorithm. The metric was applied to video sequences obtained by executing two segmentation algorithms described in the literature, which exhibit the corresponding errors. Videos simulating an AR environment were created with the imperfectly segmented video frames, and a subjective quality assessment process was applied to them. Based on the results of this subjective evaluation, we propose a new objective metric that considers perceptual factors for the assessment of segmentation quality in AR environments. Our metric can be used to select optimal parameters for two bilayer segmentation algorithms following subjective evaluation.

2. SEGMENTATION QUALITY ASSESSMENT

Assessing the quality of segmentation algorithms is a problem that has been investigated in different contexts in the literature. One such context is the evaluation of images composed with objects - in particular, people - extracted from video content [7]. Although sophisticated segmentation methods exist, none are yet precise and general enough to be a definitive solution to the problem. Therefore, applications must consider that the composite image presented to the user at a given moment may contain segmentation errors.

Images produced by the process of segmentation and composition with a new background have been evaluated both subjectively and objectively [7].


Subjective assessment has been shown to be the most efficient means of obtaining reliable measurements [8], both in industry and in the scientific community. Some methods traditionally used in quality assessment of video codecs for TV broadcasts were adapted over time so that they could be used for images presented in multimedia applications, including the evaluation of segmentation [9]. Some of these methods are also directly applied in quality assessment of video objects [9].

The main problem with subjective assessment methods is that, in general, they require a large number of observers and significant infrastructure, which makes the process lengthy and possibly expensive. Thus, if an algorithm must be tuned for best performance, an objective metric can avoid applying subjective assessment directly [7].

Although there is previous research on measuring image segmentation quality, the main motivation for this research is directly related to the efforts towards the ISO/MPEG-4 standardization. Because the standard requires independent video-object encoding, research aimed at the quality assessment of those segmented images became necessary. This research produced a metric based on spatial accuracy and spatial coherency [10]. Other methods, consisting of improvements on the base model, were also proposed by the same research group [11].

Correia and Pereira [12] evaluated individual object segmentation based on spatial and temporal criteria; their work also includes the evaluation of segmentation with multiple objects present in the video content. The spatial criteria are: shape fidelity, geometric fidelity, similarity of edge contents, and statistical data similarity. The adopted temporal criteria represent temporal perceptual information and a measurement of criticality, using spatial and temporal information simultaneously.

Gelasca and Ebrahimi [7] proposed subjective tests to identify the most noticeable error types, but the error types were artificially created to simulate spatial and temporal errors. The resulting method takes into consideration the errors (or artifacts) identified through the assessment process as causing more inconvenience to the users. Four artifacts were defined: added regions, added background, internal holes, and edge holes [7]; we detail these artifacts in Section 4. All of these artifacts are then linearly combined to produce an overall measurement of discomfort. Finally, the authors define specific weights for some applications, including the assessment of segmentation in AR environments.

Although some methods for the objective assessment of segmentation quality have been proposed, according to Gelasca and Ebrahimi [7], few of them focus on studying and defining the errors typically found in the segmentation process in order to obtain a perceptual measurement.

3. SUBJECTIVE EXPERIMENT

Our first task was to define a formal subjective experiment to gather users' opinions about videos presenting an AR environment with segmentation errors in the avatar image inserted in it. These errors were obtained by executing two different segmentation algorithms that are feasible in this context [4, 13]. Therefore, users observe actual segmentation errors, instead of the simulated ones used in [7]. The next sections describe the steps of the subjective method.

3.1. Preparation of the Video Database

Five different video sequences were used as sources, named SEQ1, SEQ2, SEQ3, SEQ4, and SEQ5. In these sequences the element of interest in the scene - the person in the foreground - was placed so that either the upper body or the full body was visible. Figure 1 shows video frames from sequences SEQ1 and SEQ2.

Fig. 1. Frames from the original video sequences SEQ1 and SEQ2.

Each source video sequence used in the experiment has a ground truth; that is, for each video frame there is a corresponding precisely segmented image. This allowed the calculation of classification errors for each sequence. The pixels in the ground-truth video frames were labeled as foreground, background, and unknown region. Pixels in the unknown region (the edge of the element of interest) were not included in the error count. In the generated videos, pixels in this region are composed from the element of interest and the new background, as in a matting technique, each with 50% transparency; this softens the edges of the element of interest.
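To make this compositing step concrete, the following sketch (our illustration, not code from the paper; it assumes NumPy arrays and a trimap encoded as 0 = background, 1 = foreground, 2 = unknown) blends the unknown region with 50% transparency:

```python
import numpy as np

BG, FG, UNKNOWN = 0, 1, 2  # assumed trimap encoding for this sketch

def compose_frame(frame, new_background, trimap):
    """Composite a segmented frame over a new background.

    Foreground pixels are kept, background pixels are taken from the new
    background, and pixels in the unknown (edge) region are blended with
    50% transparency, softening the silhouette as described above.
    """
    frame = frame.astype(np.float32)
    new_background = new_background.astype(np.float32)
    # Per-pixel alpha: 1 for foreground, 0 for background, 0.5 for unknown.
    alpha = np.select([trimap == FG, trimap == UNKNOWN], [1.0, 0.5], default=0.0)
    composite = alpha[..., None] * frame + (1.0 - alpha[..., None]) * new_background
    return composite.astype(np.uint8)
```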

Sequences SEQ2 and SEQ4 and their respective ground-truth sequences were obtained from a database available for research¹, whereas sequences SEQ1, SEQ3, and SEQ5 were captured and labeled manually.

By executing two segmentation algorithms with different parameters, we produced short (10 s) video sequences from the five source video sequences, with a resolution of 640 x 480 pixels and different percentages of pixel segmentation errors².

¹ http://research.microsoft.com/en-us/projects/i2i/data.aspx
² We formally define pixel segmentation errors as the artifact E_T in Section 4.2.


These videos present an AR application scenario based on a 3D virtual office environment seen from a fixed point of view. The segmented element of interest is inserted in the scene as a "billboard" - that is, applied as a texture on a rectangular 3D mesh present in the virtual environment that remains perpendicular to the viewing direction. For the tests, error-segmented sequences were produced for each of the source videos, for a total of forty-eight video sequences. Twenty-four were segmented using the background subtraction method (Qian) proposed in [13], and the other twenty-four were segmented using the method based on the energy minimization framework proposed by Criminisi et al. (Crim) [4]. The error amount and error type depend on the segmentation algorithm and on the parameters applied within the algorithm.

By varying the setting parameters of both algorithms, pixel errors range from 0% (ground-truth reference video) to 31.85% (worst case) in the video sequences. The different error percentages were obtained, in the case of the Qian method, by varying the value of the threshold that controls the tolerance when comparing colors between the background model and the analyzed frame [13]. In the method by Criminisi et al., the error percentages were obtained by varying the normalization parameters of the Conditional Random Field (CRF) [4] used in the model. Since the error percentages were obtained by executing the cited segmentation algorithms, some videos presented higher error percentages due to the limitations of the methods in handling certain scenarios present in the source video content, for example, similar colors in the background and in the element of interest, and lighting changes. Two frames of the new videos - containing segmentation errors - can be seen in Figure 2.

Fig. 2. Video sequences with segmentation errors created for the subjective assessment tests.

3.2. Performing the Tests

Some subjective video quality assessment methods with acknowledged efficiency [8] are popular both in industry and in the scientific community. Among them, SAMVIQ (Subjective Assessment Methodology for Video Quality) [14] has been shown to be very precise in certain studies [8]. We used the SAMVIQ method to determine which segmentation errors were the most noticeable to users. The evaluation of the generated videos was performed by applying the SAMVIQ method [14], implemented in the MSU³ tool.

³ http://compression.ru/video/quality_measure/perceptual_video_quality_tool_en.html

A total of 26 volunteers took part in the experiment. The only restriction regarding the participants' profile, as required by the SAMVIQ method, is that they must not work with image quality assessment as their primary occupation. The same environmental conditions were maintained for all tests. Details of the physical setup may be found in ITU Recommendation BT.500⁴. The interface of the assessment environment is presented in Figure 3.

⁴ http://www.itu.int/rec/R-REC-BT.500-12-200909-S/en


Fig. 3. Subjective Experiments Interface.

4. ERROR TYPE ANALYSIS

A segmentation error may affect video quality in spatial and temporal terms [11]. The work of Gelasca and Ebrahimi presents four spatial artifacts that may be combined to represent a general discomfort in relation to segmentation errors in a video sequence: Added Regions A_r, Added Background A_b, Internal Holes H_i, and Edge Holes H_b [7]. Since Gelasca and Ebrahimi performed their subjective evaluation by showing users video sequences with synthetic segmentation errors, we intended to verify whether the same results can be obtained with errors produced by real algorithms. We do so in the context of AR applications.

Artifacts of the same type are grouped into a scalar. The relative spatial error S_{A_r}(k) groups all added regions in frame k:

$$S_{A_r}(k) = \frac{\sum_{j=1}^{N_{A_r}} |A_r^j(k)|}{|\Omega(k)|} \qquad (1)$$

where A_r^j(k) is added region j in frame k, |·| is the set cardinality operator, Ω(k) is the sum of the reference pixels and the segmentation result pixels, and N_{A_r} is the total number of added regions. Likewise, the relative spatial error S_{H_i}(k) may be obtained for the j internal holes H_i(k).


For the other types of error, which are connected to the edge of the element of interest, a weight D_j that takes into account the distance of each pixel in the region to the edge of the element of interest is also added. The relative spatial error for edge errors S_{A_b}(k) for the j added background regions is:

$$S_{A_b}(k) = \frac{\sum_{j=1}^{N_{A_b}} D_j\, |A_b^j(k)|}{|\Omega(k)|} \qquad (2)$$

In a similar manner, the relative spatial error S_{H_b}(k) may be obtained for the j edge holes H_b(k).
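As an illustration of Equation 1 (a sketch of ours, not the authors' implementation), the relative spatial error of added regions can be computed for one frame from boolean masks; for simplicity, every connected false-positive component is treated here as an added region and Ω(k) is taken as the union of reference and segmented foreground pixels:

```python
import numpy as np
from scipy import ndimage

def relative_spatial_error_added_regions(seg_mask, gt_mask):
    """Sketch of Equation 1 for one frame.

    seg_mask, gt_mask: boolean arrays (True = foreground).
    """
    added = seg_mask & ~gt_mask                  # pixels wrongly labeled foreground
    labels, n_regions = ndimage.label(added)     # connected regions A_r^j(k)
    total_area = sum(int(np.sum(labels == j)) for j in range(1, n_regions + 1))
    omega = int(np.sum(seg_mask | gt_mask))      # normalization term |Omega(k)|
    return total_area / omega if omega else 0.0
```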

To take into account the temporal aspect, effects such as the sudden disappearance of artifacts, the surprise effect, and the expectation effect are considered [7], resulting in four objective perceptual metrics PST_{A_r}, PST_{A_b}, PST_{H_i} and PST_{H_b}, with weights (a, b, c and d) adjusted by means of subjective evaluation [7]. Lastly, the proposed objective perceptual metric is a linear combination of those metrics:

$$PST = a \cdot PST_{A_r} + b \cdot PST_{A_b} + c \cdot PST_{H_i} + d \cdot PST_{H_b}. \qquad (3)$$

4.1. PST Weights Analysis

The initial tests were performed to verify whether the weights related to each artifact, obtained from the PST metric for AR applications, remain valid when evaluating segmentation based on actual errors, obtained from the execution of two different segmentation algorithms, here referred to as Qian [13] and Crim [4].

Initially, the artifacts defined in PST were identified in the data obtained from the segmentation performed by each algorithm. Afterwards, the optimal values for W = [a, b, c, d] (Equation 3) and their respective confidence intervals were obtained, for the Crim algorithm, by means of linear regression:

$$W_{Crim} = \mathrm{regress}(Subj_{Crim}, \Omega) \qquad (4)$$

where Subj_Crim are the data from the subjective evaluation (SAMVIQ) of the Crim algorithm and Ω = [PST_{A_r}, PST_{A_b}, PST_{H_i}, PST_{H_b}] are the artifact-based metrics computed from the segmentation results. The values of W_Qian were obtained in a similar manner.
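A minimal sketch of this regression step (our illustration; the variable names and the use of ordinary least squares are assumptions, since the paper does not specify the regression routine):

```python
import numpy as np

def fit_artifact_weights(artifact_metrics, subj_scores):
    """Fit W = [a, b, c, d] so that artifact_metrics @ W approximates the
    subjective scores (Equation 4).

    artifact_metrics: (n_videos, 4) array whose columns hold
        [PST_Ar, PST_Ab, PST_Hi, PST_Hb] for each test video.
    subj_scores: (n_videos,) array of SAMVIQ scores for the same videos.
    """
    W, residuals, rank, _ = np.linalg.lstsq(artifact_metrics, subj_scores, rcond=None)
    return W

# Hypothetical usage, one call per algorithm:
# W_crim = fit_artifact_weights(omega_crim, subj_crim)
# W_qian = fit_artifact_weights(omega_qian, subj_qian)
```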

Table 1 presents the 95% confidence intervals obtained in this experiment for each algorithm.

Table 1. 95% confidence intervals for the weights of W_Crim and W_Qian.

            W_Crim                 W_Qian
         left     right        left      right
  a      0.93     11.72        3.75      44.15
  b      6.48     28.79      -22.68       4.78
  c     -6.10      0.71      -20.87      -3.63
  d      6.00     12.98        3.54       8.70

Since the weights W_Gel suggested for the PST metric in AR applications are a = 6.71, b = 8.39, c = 12.57 and d = 8.74, it is possible to observe that, for the Crim algorithm, c is outside the confidence interval, whereas for the Qian algorithm, b, c and d are outside their intervals. We note that the Crim algorithm produces cluster-like errors, since one of its segmentation criteria is based on contrast, whereas the Qian algorithm produces scattered errors. Gelasca and Ebrahimi experimented only with cluster-like errors, not with scattered errors [7].

In the previous experiment we computed a weight vector for each algorithm, W_Qian and W_Crim. Our statistical test indicates that W_Qian and W_Crim differ from W_Gel; the next step is to measure how each weight vector affects the evaluation prediction. We compute errors E_{Qian,Qian}, E_{Qian,Gel}, E_{Qian,Crim}, E_{Crim,Crim}, E_{Crim,Gel}, and E_{Crim,Qian}, where the error E_{i,j} stands for the squared error obtained when predicting the evaluation with weights W_j against the subjective evaluation Subj_i.

By applying the Student t-test between E_{Qian,Qian} and E_{Qian,Gel}, it is possible to affirm with 95% confidence that there are significant differences between the errors for AR applications, i.e., evaluating the Qian algorithm with a metric obtained under the methodology of Gelasca and Ebrahimi is statistically worse than evaluating the Qian algorithm with a metric obtained directly from the errors produced by the algorithm. We also confirmed that E_{Qian,Crim} and E_{Crim,Qian} are statistically different, meaning that it is important not only to use real errors, but also to tune the metric to a specific algorithm. On the other hand, as already observed for the parameter intervals, the errors E_{Crim,Crim} and E_{Crim,Gel} did not prove to be statistically different, meaning that the results obtained by Gelasca and Ebrahimi can be useful, but must be used with parsimony.
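The comparison can be sketched as follows (our illustration; a paired Student t-test over per-video squared errors is assumed):

```python
import numpy as np
from scipy import stats

def squared_prediction_errors(artifact_metrics, weights, subj_scores):
    """Per-video squared error E_{i,j}: predict the subjective scores of
    algorithm i with weight vector W_j and compare with the real scores."""
    predictions = artifact_metrics @ weights
    return (predictions - subj_scores) ** 2

# Hypothetical comparison for the Qian videos:
# e_qian_qian = squared_prediction_errors(omega_qian, W_qian, subj_qian)
# e_qian_gel  = squared_prediction_errors(omega_qian, W_gel,  subj_qian)
# t, p = stats.ttest_rel(e_qian_qian, e_qian_gel)
# significant = p < 0.05   # 95% confidence level used in the paper
```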

4.2. Definition and Analysis of Artifacts

The second experiment was intended to verify whether the artifacts Ω suggested in [7] are the best set to be applied for the evaluation of segmentation in the context of the AR applications considered in this research. For comparison with the previously described artifacts, another set of eighteen artifacts was defined, some of them with variations in their parameters.

False negatives E_N are foreground pixel classification errors, given by

$$E_N = \frac{1}{K} \sum_{k=1}^{K} \sum_{p=1}^{P} \big[\, pix(p,k) \in N(k) \,\big] \qquad (5)$$

where pix(p, k) is the pixel at position p of frame k, K is the number of frames, and P is the number of pixels per frame. False positives E_P, which are background pixel classification errors, are obtained in a similar way, and the total error E_T is the sum of these artifacts: E_T = E_N + E_P.

Some artifacts were defined to measure the degree of annoyance related to the distance from false positives to the foreground.


D_{P_in}(d_t) are false positives at most d_t pixels from the foreground, defined by

$$D_{P_{in}}(d_t) = \frac{1}{K} \sum_{k=1}^{K} \sum_{p=1}^{P} \big[\, pix(p,k) \in P(k) \;\wedge\; d(pix(p,k)) < d_t \,\big] \qquad (6)$$

where d(·) is the distance to the foreground and d_t ∈ {80, 90, 100, 110, 120} is the distance in pixels. D_{P_out}(d_t), the false positives at least d_t pixels distant from the foreground, is obtained in a similar way.

Another type of artifact defined in our experiment is based on connected pixels (or blobs). Blob errors are given by

$$BP_{larger}(size) = \sum_{k=1}^{K} \big|\{\, p \in P(k) : |blob(p)| > size \,\}\big| \qquad (7)$$

where blob(p) is the connected component of false positive pixels containing p and size ∈ {5, 10, 15, 20} is the number of connected pixels. BP_smaller(size) are the false positives with up to size connected pixels, BN_larger(size) are the false negatives with at least size connected pixels, and BN_smaller(size) are the false negatives with up to size connected pixels. These artifacts were calculated as in Equation 7.
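One possible implementation of these frame-level artifacts (our sketch, under the interpretations given above for Equations 5-7, using SciPy's distance transform and connected-component labeling):

```python
import numpy as np
from scipy import ndimage

def frame_artifacts(seg_mask, gt_mask, dt=110, size=5):
    """Per-frame artifact counts (to be averaged or summed over the K frames).

    seg_mask, gt_mask: boolean arrays, True = foreground.
    Returns false negatives, false positives, false positives closer than
    dt pixels to the foreground (Eq. 6, per frame), and false-positive
    pixels belonging to blobs larger than `size` pixels (Eq. 7, per frame).
    """
    false_neg = gt_mask & ~seg_mask
    false_pos = seg_mask & ~gt_mask

    # Distance of every pixel to the nearest ground-truth foreground pixel.
    dist_to_fg = ndimage.distance_transform_edt(~gt_mask)
    dp_in = false_pos & (dist_to_fg < dt)

    # Connected components (blobs) of false-positive pixels.
    labels, n_blobs = ndimage.label(false_pos)
    blob_sizes = np.bincount(labels.ravel())[1:]        # size of each blob
    bp_larger = int(blob_sizes[blob_sizes > size].sum())

    return int(false_neg.sum()), int(false_pos.sum()), int(dp_in.sum()), bp_larger
```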

To take into account the temporal aspect, we defined the artifact T_N(pc), which accounts for pixel errors that occur in at least pc percent of the frames of a video sequence of K frames. This error type is given by

$$T_N(pc) = \frac{1}{K} \sum_{k=1}^{K} \sum_{p=1}^{P} \big[\, pix(p,k) \in N(k) \;\wedge\; f_N(p) \geq pc \,\big] \qquad (8)$$

where f_N(p) is the fraction of frames in which pixel p is a false negative and pc ∈ {40, 50, 60, 70, 80, 90} (percent). T_P(pc), the corresponding spatio-temporal false positive errors, are obtained in a similar way.
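Under this interpretation, the persistence-based artifact can be sketched as follows (our illustration; error masks are assumed to be stacked along the first axis):

```python
import numpy as np

def temporal_false_negatives(false_neg_masks, pc=60):
    """Sketch of T_N(pc): average, over frames, of false-negative pixels
    whose error persists in at least pc percent of the K frames.

    false_neg_masks: boolean array of shape (K, H, W).
    """
    persistence = false_neg_masks.mean(axis=0)            # fraction of frames in error
    persistent = persistence >= pc / 100.0                # pixels above the threshold
    per_frame = (false_neg_masks & persistent).sum(axis=(1, 2))
    return per_frame.mean()                               # (1/K) * sum over frames k
```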

Another type of spatial and temporal error considered in this research is the "false blob". A false negative spatial false blob FS_{N_s} is calculated by convolving a binary image (1 = false negative pixel, 0 = no false negative pixel) with a simple kernel M_s:

$$FS_{N_s} = \sum_{k=1}^{K} M_N(k) * M_s \qquad (9)$$

where * is the convolution operator, M_N(k) is the binary image created from the false negative pixels, and M_s is a 3 x 3 kernel. False positive false blobs FS_{P_s} are calculated in a similar manner. The false blob error FS_{N_g} is obtained by convolving M_N(k) as follows:

$$FS_{N_g} = \sum_{k=1}^{K} M_N(k) * M_{g\sigma} \qquad (10)$$

where M_{g\sigma} is a centered Gaussian kernel with standard deviation σ = 0.8. FS_{P_g} is obtained in a similar way.

In order to calculate temporal false blob errors, we defined a video sequence as a three-dimensional matrix Hei x K x Wid, where Hei is the height of the original frames and Wid is their width. Therefore, a "temporal frame" Qt can be represented by an image of size Hei x K, and a temporal sequence contains Wid temporal frames. False negative temporal blobs are given by

$$FT_{N_s} = \sum_{w=1}^{Wid} Qt_N(w) * M_s \qquad (11)$$

where Qt_N is a binary temporal image created from the false negative pixels. FT_{P_s} are false positive temporal blobs, and FT_{N_g} and FT_{P_g} are the Gaussian false negative and Gaussian false positive temporal blobs, respectively.
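A possible reading of Equations 9-11 in code (our sketch; the exact kernels and the final scalar aggregation, taken here as the sum of all convolved responses, are assumptions):

```python
import numpy as np
from scipy import ndimage

M_S = np.ones((3, 3)) / 9.0        # simple 3x3 averaging kernel (assumed form of M_s)

def spatial_false_blob(false_neg_masks, sigma=0.8, use_gaussian=False):
    """Sketch of FS_Ns (Eq. 9) / FS_Ng (Eq. 10): convolve each per-frame
    false-negative mask with a small kernel and accumulate the responses.

    false_neg_masks: boolean array of shape (K, H, W).
    """
    total = 0.0
    for mask in false_neg_masks.astype(np.float32):
        if use_gaussian:
            response = ndimage.gaussian_filter(mask, sigma=sigma)   # Gaussian kernel
        else:
            response = ndimage.convolve(mask, M_S)                  # 3x3 kernel M_s
        total += response.sum()
    return total

def temporal_false_blob(false_neg_masks):
    """Sketch of FT_Ns (Eq. 11): slice the (K, Hei, Wid) volume along the
    width axis, so each 'temporal frame' Qt_N(w) is an (Hei, K) image, and
    apply the same spatial convolution."""
    volume = false_neg_masks.astype(np.float32)          # (K, Hei, Wid)
    total = 0.0
    for w in range(volume.shape[2]):
        temporal_frame = volume[:, :, w].T                # Hei x K image
        total += ndimage.convolve(temporal_frame, M_S).sum()
    return total
```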

Once the new artifacts were defined, the next step consisted in selecting the most annoying artifacts and their respective weights in order to define the objective metric. For this, we used a multi-step greedy selection, testing sets of artifacts through regression, as sketched below.
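Such a forward greedy selection, scored by a least-squares fit against the subjective scores, can be sketched as follows (our illustration; the authors' exact selection criterion is not detailed in the paper):

```python
import numpy as np

def greedy_artifact_selection(candidates, subj_scores, n_select=4):
    """Forward greedy selection: at each step add the candidate artifact
    that most reduces the least-squares residual against the subjective
    scores.

    candidates: dict mapping artifact name -> (n_videos,) array of values.
    subj_scores: (n_videos,) array of SAMVIQ scores.
    """
    selected = []
    while len(selected) < n_select:
        best_name, best_sse = None, np.inf
        for name in candidates:
            if name in selected:
                continue
            X = np.column_stack([candidates[n] for n in selected + [name]])
            w, *_ = np.linalg.lstsq(X, subj_scores, rcond=None)
            sse = float(np.sum((X @ w - subj_scores) ** 2))
            if sse < best_sse:
                best_name, best_sse = name, sse
        selected.append(best_name)
    return selected
```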

This analysis showed that the artifacts E_N, D_{P_out}(110), T_N(60) and PST_{A_b} are the most annoying in the analysis performed with the Crim algorithm, while the artifacts E_N, T_N(70), T_P(50) and BN_smaller(5) are the most annoying in the analysis of the Qian algorithm.

5. OBJECTIVE METRIC DEFINITION

According to the results shown in Section 4.2, an objective metric must be specific to a segmentation algorithm. Therefore, the metric M can be defined as

$$M(Alg, Ap) = \sum_{i=1, j=1}^{I, J} (pes_i \times art_j)_{Alg} \qquad (12)$$

where Alg is the algorithm and Ap is the application in which the foreground layer will be used. The weights pes are denoted by a vector pes = (pes_1, pes_2, ..., pes_i, ..., pes_I) (these weights were obtained in the linear regression step), as are the artifacts (art_1, art_2, ..., art_j, ..., art_J). We used I, J = 4, as in PST, for a better comparison with that metric.

The metric M to be used in AR applications to evaluate the Crim algorithm is given by

$$M(Crim, RA) = a \times E_N + b \times D_{P_{out}}(110) + c \times T_N(60) + d \times PST_{A_b} \qquad (13)$$

where a = 0.051, b = 0, c = -4.692 and d = 13.731. In a similar manner, the metric M for the Qian algorithm is given by

$$M(Qian, RA) = a \times E_N + b \times T_N(70) + c \times T_P(50) + d \times BN_{smaller}(5) \qquad (14)$$

where a = -0.087, b = 1.781, c = 2.978 and d = -0.001.
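Applying these metrics then reduces to a weighted sum of precomputed artifact values; for instance, a sketch for Equation 14 (our illustration, with hypothetical artifact values in the usage comment):

```python
# Weights from Equation 14 (Qian algorithm, AR application).
W_QIAN_AR = {"E_N": -0.087, "T_N_70": 1.781, "T_P_50": 2.978, "BN_smaller_5": -0.001}

def metric_m_qian(artifacts):
    """Evaluate M(Qian, RA) for one segmented sequence.

    artifacts: dict with the precomputed values of E_N, T_N(70), T_P(50)
    and BN_smaller(5) for that sequence (as described in Section 4.2).
    """
    return sum(W_QIAN_AR[name] * artifacts[name] for name in W_QIAN_AR)

# Illustrative call with placeholder artifact values:
# score = metric_m_qian({"E_N": 0.12, "T_N_70": 0.03, "T_P_50": 0.05, "BN_smaller_5": 40})
```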

Although the metrics obtained improve over PST (Gelasca and Ebrahimi), mainly when the Qian algorithm is considered, we do not claim that they are the best metrics for the two algorithms considered here.


First, we considered a set of hand-coded artifacts plus the artifacts defined in PST; although this set of artifacts is large, it is far from exhausting the artifact possibilities. Second, we selected four artifacts greedily in order to compare with the PST metric; four artifacts may not be enough for a proper evaluation, and multi-step greedy selection is suboptimal in general (for example, D_{P_out}(110) proves to be irrelevant after adding two more artifacts). Third, the proximity between W_Gel and W_Crim may indicate a direction for setting a metric for types of segmentation algorithms (for example, PST_{A_b} was selected as a good artifact for the Crim algorithm). Fourth, although different algorithms may require different metrics, both algorithms selected E_N and T_N(pc), indicating a direction for setting a common subset of artifacts for evaluating algorithms. Finally, the small number of video sequences evaluated by the volunteers does not allow a generalization of our metric.

6. CONCLUSION

This paper has addressed the problem of bilayer video segmentation quality assessment when the segmented video frames are used to compose scenes in Augmented Reality environments. We show that a state-of-the-art metric is not directly usable to evaluate segmentation in this context. Although the artifacts proposed by its authors did not prove totally irrelevant when the segmentation quality of one of the two algorithms under investigation was evaluated, the suggested artifact weights, which are used to combine these artifacts, produced results misaligned with the subjective evaluation. Finally, we show that for each segmentation method used in this experiment, newly adjusted artifacts can better represent the overall annoyance produced by segmentation errors as perceived by users. These artifacts were used to compose the new objective metric presented in this paper.

7. REFERENCES

[1] R. Nakamura, L. L. M. Lago, A. B. Carneiro, A. J. C. Cunha, F. J. M. Ortega, J. L. Bernardes-Jr, and R. Tori, "3PI experiment: immersion in third-person view," in Proceedings of the SIGGRAPH Symposium on Video Games, New York, NY, USA, 2010, pp. 43-48, ACM.

[2] S. R. R. Sanches, D. M. Tokunaga, V. F. Silva, A. C. Sementille, and R. Tori, "Mutual occlusion between real and virtual elements in augmented reality based on fiducial markers," in IEEE Workshop on Applications of Computer Vision (WACV), 2012, pp. 49-54, IEEE Computer Society.

[3] P. Yin, A. Criminisi, J. Winn, and I. Essa, "Bilayer segmentation of webcam videos using tree-based classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 1, pp. 30-42, 2011.

[4] A. Criminisi, G. Cross, A. Blake, and V. Kolmogorov, "Bilayer segmentation of live video," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, 2006, vol. 1, pp. 53-60, IEEE Computer Society.

[5] A. Parolin, G. P. Fickel, C. R. Jung, T. Malzbender, and R. Samadani, "Bilayer video segmentation for videoconferencing applications," in IEEE International Conference on Multimedia and Expo (ICME), 2011, pp. 1-6.

[6] S. R. R. Sanches, V. da Silva, and R. Tori, "Bilayer segmentation augmented with future evidence," in Computational Science and Its Applications - ICCSA, B. Murgante, O. Gervasi, S. Misra, N. Nedjah, A. Rocha, D. Taniar, and B. Apduhan, Eds., vol. 7334 of Lecture Notes in Computer Science, pp. 699-711, Springer Berlin / Heidelberg, 2012.

[7] E. Gelasca and T. Ebrahimi, "On evaluating video object segmentation quality: A perceptually driven objective metric," IEEE Journal of Selected Topics in Signal Processing, vol. 3, no. 2, pp. 319-335, 2009.

[8] S. Pechard, R. Pepion, and P. Le Callet, "Suitable methodology in subjective video quality assessment: a resolution dependent paradigm," in International Workshop on Image Media Quality and its Applications (IMQA), 2008.

[9] S. R. R. Sanches, D. M. Tokunaga, V. F. Silva, and R. Tori, "Subjective video quality assessment in segmentation for augmented reality applications," in XIII Symposium on Virtual Reality (SVR), 2012, pp. 46-55.

[10] P. Villegas, X. Marichal, and A. Salcedo, "Objective evaluation of segmentation masks in video sequences," in Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), 1999, pp. 85-88.

[11] X. Marichal and P. Villegas, "Objective evaluation of segmentation masks in video sequences," in European Signal Processing Conference (EUSIPCO), 2000, vol. 4, pp. 2193-2196.

[12] P. Correia and F. Pereira, "Objective evaluation of video segmentation quality," IEEE Transactions on Image Processing, vol. 12, no. 2, pp. 186-200, 2003.

[13] R. Qian and M. Sezan, "Video background replacement without a blue screen," in Proceedings of the International Conference on Image Processing (ICIP), 1999, vol. 4, pp. 143-146.

[14] F. Kozamernik, V. Steinmann, P. Sunna, and E. Wyckens, "SAMVIQ - a new EBU methodology for video quality evaluations in multimedia," SMPTE Motion Imaging Journal, vol. 114, no. 4, pp. 152-160, April 2005.