
[IEEE 2012 1st International Conference on Emerging Technology Trends in Electronics, Communication and Networking (ET2ECN) - Surat, Gujarat, India, 2012.12.19-2012.12.21]

PERFORMANCE EVALUATION OF FADE AND DISSOLVE TRANSITION SHOT BOUNDARY DETECTION IN PRESENCE OF MOTION IN VIDEO

1Pradip Panchal, 2Shabbir Merchant

1Research Scholar, 2Professor, Department of Electrical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai, India

Abstract- Retrieving an interesting part of a video on the web or in a large database with little time consumption is a significant problem. Decomposing video into shots and scenes with key frame extraction helps to some extent, at the cost of some spatio-temporal detail. We present techniques for frame-based shot boundary detection, scene change detection, and key frame extraction that remain discriminative in the presence of any type of motion. Gradual transitions such as fades and dissolves span sequences of frames and are therefore harder to detect than sharp transitions, particularly when motion is present; it is useful to localize the exact frame ranges of these transitions. The approach is based on the DC coefficients of the discrete cosine transform and the resulting DC image, whose mean and variance are sufficient to choose thresholds that detect these transitions effectively. The concept was applied and verified on a large database; here we present results on several movies. Performance is compared with existing techniques in terms of recall and precision measures, and the results support the approach.

Keywords- shot boundary detection, discrete cosine transform, DC coefficient, DC image

I. INTRODUCTION AND RELATED WORK

To make a large database of video available online, effective access to and search of the interesting part of a video or movie in little time is a necessity, enabled by advances in video compression standards and high-speed broadcast networks. With high-resolution capture and recording available on cheap, high-performance digital devices, the production of digital video is huge and growing. The focus on object-based or content-based retrieval makes video indexing, retrieval, and management automatic. In video processing, after the prior step of video segmentation (dividing the video into shots), one is generally concerned with the semantic relation between the frames of a shot or scene. Basic operations are applied first: analyzing the content of frames and the types of changes from one frame to another, or within a group of frames, which is the essence of shot boundary detection and key frame extraction. Currently most videos are stored in compressed form, so it is desirable to detect sharp and gradual transitions directly in the compressed domain. Initially, the video is divided into small sections called shots, where a shot is defined as the sequence of frames recorded by one camera over a particular time. This process of segmentation is called shot boundary detection. From the frames of these shots, key frames are extracted for scene retrieval, so that scenes of viewer interest can be browsed and separated.

Production of digital videos, movies, and TV broadcasts is moving into the digital era on a mass scale. The area of content-based video retrieval aims to automate the indexing, retrieval, and management of this video data. For efficient video storage and management, video segmentation must be performed prior to all other processes. Video segmentation divides the video into physical units, generally called shots, each consisting of one continuous action. Shot boundaries can be categorized into two types: abrupt transitions (cuts) and gradual transitions (GT). A GT can be further classified as a dissolve, wipe, fade in, or fade out [13, 14, 15].

An abrupt transition occurs between two consecutive frames, whereas a gradual transition occurs over multiple frames; detecting gradual transitions therefore requires examining data from multiple frames. Gradual transitions include dissolves, fades in/out, wipes, etc. There are two types of fade transitions: a change from a dark frame to the picture information of a scene is called a fade in, and a transition from the picture information of a scene to a dark screen is a fade out. In a dissolve there is a transition from one scene to another in which, between the two scenes, one scene disappears while the other appears simultaneously.

The literature presents comparisons of shot boundary detection methods. Pair-wise comparison, the color histogram method, and histogram comparison were used as different shot boundary detection methods by Zhang et al. [1], who noticed that the highest likelihood of false positives occurs during object and camera motion. Classification techniques, their variations, and a comparison of several shot boundary detection methods based on histograms, edge tracking, the discrete cosine transform, motion vectors, and block matching were presented by Boreczky and Rowe [2]. Lienhart [3] also used the color histogram difference method, the standard deviation of pixel intensities, and an edge-based contrast method to find shot boundaries, tested on a diverse set of video sequences. The major issues related to shot boundary detection were analyzed in detail and identified by Hanjalic [4]. An evaluation and performance characterization of a number of shot detection techniques based on color histograms, Moving

2012 1st International Conference on Emerging Technology Trends in Electronics, Communication and Networking

978-1-4673-1627-9/12/$31.00 ©2012 IEEE


Picture Expert Group (MPEG) compression parameters, and image-block motion matching was presented by Gargi et al. [5]. Results and analysis for various histogram test statistics, statistics-based metrics, pixel differences, MPEG metrics, and an edge-based metric were reported by Ford et al. [6]. Yuan et al. [7] reported a comprehensive review of existing methods, identified the major challenges to shot boundary detection, and found that eliminating disturbances caused by large object and camera movement is the major challenge for current shot boundary detection techniques. Sethi and Patel [8] also applied statistical tests to scene change detection.

It has been reported that gradual transition detection is difficult in the presence of object motion and camera motion, which has been the major source of false positives and misses. In comparisons on test video data, the different methods were not able to identify a number of frames containing fast camera and object motion. We have performed an evaluation and performance analysis of the major techniques for shot boundary detection that not only covers gradual transitions but specifically indicates the exact frame numbers in the presence of camera and object motion. To compare the various methods, we use video clips as test videos and detect shot transitions (boundaries) where motion is observed on both sides of the boundary.

The structure of this paper is as follows. In Section 2, the major methods used for comparison of experimental results are discussed. The evaluation criteria and test video sequences are described in Section 3. Simulation results of the traditional shot boundary detection methods in the Red Green Blue (RGB) color space are presented in Section 4. The proposed algorithm and its performance comparison with the traditional methods are discussed in Section 5. Finally, the paper concludes with remarks and future work in Section 6.

II. EXISTING MAJOR METHODS USED FOR SHOT BOUNDARY DETECTION

The mathematical notation employed to describe these methods is summarized as follows. Let f_k and f_{k+1} be consecutive frames, μ_k and μ_{k+1} the mean intensity values of these frames, and σ_k and σ_{k+1} the standard deviations of the intensity values of frames f_k and f_{k+1}, respectively. The total number of frames in one video is denoted F, with 1 ≤ k ≤ F−1, and M × N is the size of the image or frame, where 1 ≤ x ≤ M and 1 ≤ y ≤ N.

A. Pixel Difference Method
The easiest way to detect whether two frames are significantly different is to count the number of pixels that have changed. If more than a given percentage of the total number of pixels has changed, a shot boundary is declared. The pixel difference method (denoted PDM) is defined as

PDM(k) = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} \left| f(x,y,k) - f(x,y,k+1) \right|   (1)

This method tends to fail to detect shots in the presence of camera and object motion; the effect of motion can be reduced by applying a 3 × 3 averaging filter before the pixel-based comparison, as suggested by Zhang et al. [1].
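As an illustration, Eq. (1) together with the 3 × 3 pre-filtering suggested by Zhang et al. [1] might be sketched in Python as follows (the function names and the border handling of the filter are our own assumptions, not from the paper):

```python
def pdm(f1, f2):
    """Pixel difference metric of Eq. (1): mean absolute difference
    between two equal-sized gray-scale frames given as lists of rows."""
    m, n = len(f1), len(f1[0])
    total = sum(abs(f1[x][y] - f2[x][y]) for x in range(m) for y in range(n))
    return total / (m * n)

def smooth3x3(f):
    """3x3 averaging filter (Zhang et al. [1]) applied before comparison
    to damp small motion; border pixels average over the pixels that exist."""
    m, n = len(f), len(f[0])
    out = [[0.0] * n for _ in range(m)]
    for x in range(m):
        for y in range(n):
            vals = [f[i][j]
                    for i in range(max(0, x - 1), min(m, x + 2))
                    for j in range(max(0, y - 1), min(n, y + 2))]
            out[x][y] = sum(vals) / len(vals)
    return out
```

A boundary would then be declared when `pdm(smooth3x3(fa), smooth3x3(fb))` exceeds a chosen threshold.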

B. Histogram Difference
This is the most commonly applied method for detecting shot boundaries. The histogram difference is defined by

HD(k) = \sum_{j=1}^{G} \left| H_k[j] - H_{k+1}[j] \right|   (2)

Where H_k[j] and H_{k+1}[j] denote the histograms of the kth and (k+1)th frames, respectively, and j is one of the G possible gray levels. The histogram comparison algorithm (HD) is also less sensitive to object motion than pixel differences.
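A minimal sketch of the histogram difference of Eq. (2), assuming gray-scale frames given as flat lists of pixel values (the helper names are illustrative):

```python
def histogram(frame, G=256):
    """Gray-level histogram H_k[j] of a frame given as a flat pixel list."""
    h = [0] * G
    for p in frame:
        h[p] += 1
    return h

def hd(h1, h2):
    """Histogram difference of Eq. (2): sum of absolute bin differences."""
    return sum(abs(a - b) for a, b in zip(h1, h2))
```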

C. χ² Test
Nagasaka and Tanaka [10] worked on histogram and pixel difference techniques and concluded that histogram methods are the most effective; they proposed obtaining the best results by dividing the frames into 16 regions and applying a χ² test (denoted CHT) to the color histograms of those regions, defined as

CHT(k) = \sum_{j=1}^{G} \frac{\left( H_k[j] - H_{k+1}[j] \right)^2}{H_k[j]}   (3)

Where H_k[j] denotes the histogram value for the kth frame and j is one of the G possible gray levels.
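Eq. (3) can be sketched as below; skipping empty bins to avoid division by zero is a common convention and an assumption on our part, since the paper does not state how H_k[j] = 0 is handled:

```python
def chi_square(h1, h2):
    """Chi-square histogram test of Eq. (3). Bins with H_k[j] = 0 are
    skipped to avoid division by zero (a convention, not from the paper)."""
    return sum((a - b) ** 2 / a for a, b in zip(h1, h2) if a > 0)
```

In the Nagasaka-Tanaka scheme this test would be applied per region after splitting each frame into 16 regions.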

D. Color Histogram
The color histogram (denoted CH) comparison is calculated by comparing the histograms of each color channel of two adjacent frames and is defined as

HD_{r,g,b}(k) = \sum_{j=1}^{G} \left\{ \left| H_k^r[j] - H_{k+1}^r[j] \right| + \left| H_k^g[j] - H_{k+1}^g[j] \right| + \left| H_k^b[j] - H_{k+1}^b[j] \right| \right\}   (4)

Where H_k^r[j], H_k^g[j], and H_k^b[j] denote the histogram values for the kth frame in the R, G, and B color channels, respectively.

E. Block-based χ² Histogram
The block-based χ² histogram [9, 11] (BCH) is computed as follows. Let F(k) be the kth frame in the video sequence, k = 1, 2, …, F_v (F_v denotes the total number of frames in the video).

D_B(k, k+1, i, j) = \sum_{l=0}^{L-1} \frac{\left[ H(i,j,k) - H(i,j,k+1) \right]^2}{H(i,j,k)}   (5)

Where H(i, j, k) and H(i, j, k+1) are the histograms of the blocks at position (i, j) in the kth and (k+1)th frames, respectively, and L is the number of gray levels in the image.

D(k, k+1) = \sum_{i=1}^{m} \sum_{j=1}^{n} w_{ij} \, D_B(k, k+1, i, j)   (6)


Where m = n = 3, w_{11} = 2, w_{12} = 1, w_{13} = 2, w_{21} = 1, w_{22} = 1, w_{23} = 1, w_{31} = 2, w_{32} = 1, w_{33} = 2.
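Eqs. (5) and (6) together might look like this sketch, where each frame is assumed to be pre-split into a 3 × 3 grid of block histograms (the data layout and zero-bin handling are our assumptions):

```python
# Weight matrix from the paper: corner blocks are emphasised.
W = [[2, 1, 2],
     [1, 1, 1],
     [2, 1, 2]]

def block_chi_square(block_hists_k, block_hists_k1, w=W):
    """Weighted block-based chi-square of Eqs. (5)-(6).
    block_hists_k[i][j] is the gray-level histogram of block (i, j)."""
    total = 0.0
    for i in range(3):
        for j in range(3):
            h1, h2 = block_hists_k[i][j], block_hists_k1[i][j]
            # Eq. (5): per-block chi-square (empty bins skipped)
            d = sum((a - b) ** 2 / a for a, b in zip(h1, h2) if a > 0)
            # Eq. (6): weighted accumulation over the 3x3 grid
            total += w[i][j] * d
    return total
```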

F. Joint Entropy and Mutual Information Based Approach
The average information per pixel of an image is called its entropy [12]. If an image t has N pixels comprising n distinct gray levels, and the kth gray level (k = 1, 2, …, n) occurs h_k times, then P_k = h_k / N, and the entropy H(t) is defined by

H(t) = \sum_{k=1}^{n} P_k \log \frac{1}{P_k}   (7)

The gray-level histogram of an image counts the number of pixels at each gray level, which is useful for estimating the probability of each gray level. Let the histograms of two consecutive images A and B be H_A(i) and H_B(j), with probabilities P_A(i) and P_B(j); thus

P_A(i) = \frac{H_A(i)}{\sum_i H_A(i)} \quad \text{and} \quad P_B(j) = \frac{H_B(j)}{\sum_j H_B(j)}   (8)

The joint histogram of two consecutive images A and B counts the number of pixel pairs at corresponding positions having gray level i in image A and gray level j in image B, so the joint probability is

P_{AB}(i,j) = \frac{H_{AB}(i,j)}{\sum_{i,j} H_{AB}(i,j)}   (9)

The marginal probability distributions are obtained from the joint probability:

P_A(i) = \sum_j P_{AB}(i,j), \quad P_B(j) = \sum_i P_{AB}(i,j)   (10)

If P_A(i) and P_B(j) are the probabilities of images A and B, respectively, and P_{AB}(i,j) is their joint probability, then the joint entropy (denoted JE) is

H(A,B) = \sum_{i,j} P_{AB}(i,j) \log \frac{1}{P_{AB}(i,j)}   (11)

Thus, the mutual information I(A, B) between frames A and B, based on entropy, is

I(A,B) = H(A) + H(B) - H(A,B)   (12)

Mutual information (denoted MI) is a similarity metric between frames: the greater the difference in content, the smaller the mutual information. Shot transitions are detected by measuring the relevance between frames with mutual information. The mutual information between adjacent frames reflects the presence of cut transitions, but it does not perform well for gradual transition detection. In contrast to a cut, a gradual transition spreads across a number of frames; therefore, in order to capture the duration of the transition, all frames within a certain temporal window W are taken into account.

In this case, the mutual information and joint entropy between two frames are calculated separately for each of the RGB color components. For the R component, C_{t,t+1}^R(i,j), 0 ≤ i, j ≤ N−1 (N being the number of gray levels in the frame), corresponds to the probability that a pixel with gray level i in frame f_t has gray level j in frame f_{t+1}. In other words, C_{t,t+1}(i,j) equals the number of pixels that change from gray level i in frame f_t to gray level j in frame f_{t+1}, divided by the total number of pixels in the video frame. The mutual information I_{k,l} of frames f_k, f_l for the R component is expressed as

I_{k,l}^R = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} C_{k,l}^R(i,j) \log \frac{C_{k,l}^R(i,j)}{C_k^R(i) \, C_l^R(j)}   (13)

The total mutual information (MI) calculated between frames f_k and f_l is defined as

I(f_k, f_l) = I_{k,l}^R + I_{k,l}^G + I_{k,l}^B   (14)

The joint entropy H_{k,l} of frames f_k, f_l for the R component is defined as

H_{k,l}^R = - \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} C_{k,l}^R(i,j) \log C_{k,l}^R(i,j)   (15)

The total joint entropy calculated between frames f_k and f_l is defined as

H_{k,l} = H_{k,l}^R + H_{k,l}^G + H_{k,l}^B   (16)

Figure 1: Diagram showing the pairs of frames which contribute to I_{cumm}(i) for a window size of N_W = 6.

In order to detect gradual transition boundaries, a temporal window W of size N_W is taken, centered around


frames f_i and f_{i+1}. A small value of the MI indicates the existence of a cut between frames f_t and f_{t+1}. The mutual information between pairs of frames is shown in Figure 1. Since MI decreases when the information transmitted from one frame to another is small (in the case of cuts and fades), the joint entropy is employed to efficiently distinguish fades from cuts.

The JE measures the amount of information carried by the union of these frames; therefore, its value decreases only during fades, where only a weak amount of inter-frame information is present.

Then a cumulative measure which combines information from all these frame pairs is calculated as follows:

I_{cumm}(i) = \sum_{k=i-\sigma}^{i} \; \sum_{l=i+1}^{i+\sigma} I(f_k, f_l)   (17)

Where σ = N_W / 2 is half the size of the temporal window.

The procedure is repeated for the whole video sequence by sliding the window over all the frames, which provides information on shot gradual transition detection.
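The window sum of Eq. (17) can be sketched as below; the callable `mi` standing in for I(f_k, f_l) is an illustrative abstraction, so that any pairwise MI implementation can be plugged in:

```python
def cumulative_mi(mi, i, nw=6):
    """Cumulative measure of Eq. (17): sum of mutual information I(f_k, f_l)
    over all frame pairs straddling position i in a window of size nw.
    `mi` is any callable mi(k, l) returning the MI of frames k and l."""
    sigma = nw // 2
    return sum(mi(k, l)
               for k in range(i - sigma, i + 1)
               for l in range(i + 1, i + sigma + 1))
```

Sliding i over the whole sequence and looking for sustained low values of I_{cumm} then flags candidate gradual transitions.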

G. Proposed Approach
In shot boundary detection for video, we focus specifically on gradual transitions such as fade and dissolve. First, the Discrete Cosine Transform (DCT) of each image or video frame k is obtained. The M × N image (or frame) is first divided into blocks of size 8 × 8, giving p × q blocks per image, and the DCT of every block is computed. Each block has one DC coefficient and 63 AC coefficients, and these coefficients are used to detect abrupt and gradual transitions. In this approach, the DC image of the current frame is calculated and compared with the DC image of the next frame or frames. The DC image of a frame can be calculated by [3]:

DC(i,j) = \frac{1}{64} \sum_{x=0}^{7} \sum_{y=0}^{7} DCT(x,y)   (18)

In the above equation, DCT(x, y) denotes the 64 DCT values of one 8 × 8 block, and DC(i, j) is the DC coefficient of the 8 × 8 block at position (i, j), where i = 1, 2, …, p and j = 1, 2, …, q. The DC coefficients of all block DCTs of an image jointly form the DC image. Successive DC images of frames are taken, and the following operation is performed for fade effect detection: compute the mean of the DC image of each frame.

m = \frac{1}{m_1} \sum_k DC(k)   (19)

Where m_1 represents the total number of values in the DC image.
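The DC image and its statistics might be sketched as follows. As a simplification, the block average is used as the DC value here: the true DC coefficient of an 8 × 8 DCT block is proportional to the block mean, and a constant factor does not affect threshold-based comparison of means and variances (this substitution is our assumption, not the paper's formulation):

```python
def dc_image(frame, bs=8):
    """DC-image sketch: one value per bs-by-bs block, taken as the block
    average (proportional to the DCT DC coefficient). Assumes the frame
    dimensions are multiples of bs."""
    m, n = len(frame), len(frame[0])
    return [[sum(frame[x][y]
                 for x in range(bi, bi + bs)
                 for y in range(bj, bj + bs)) / (bs * bs)
             for bj in range(0, n, bs)]
            for bi in range(0, m, bs)]

def dc_mean(dc):
    """Eq. (19): mean over all m_1 values of the DC image."""
    vals = [v for row in dc for v in row]
    return sum(vals) / len(vals)

def dc_variance(dc):
    """Eq. (20): mean of the squared DC image minus the squared mean."""
    vals = [v for row in dc for v in row]
    mean = sum(vals) / len(vals)
    return sum(v * v for v in vals) / len(vals) - mean * mean
```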

The mean of the DC image tapers on either side of a minimum value, which completely identifies fade in/out transitions, detected even in the presence of motion. On the other side, for detection of the dissolve effect, the following operation is applied: compute the variance of the DC image and track the variation in variance across frames; an abrupt change away from the minimum value on a per-frame basis detects dissolve transitions efficiently in the presence of motion.

Variance(DC image) = \overline{m^2} - \bar{m}^2   (20)

In the above equation, \bar{m}^2 represents the square of the mean of the DC image, and \overline{m^2} represents the mean of the squared DC image; that is, the DC image is multiplied with itself and then the mean of that squared DC image is taken:

\bar{m}^2 = \bar{m} \times \bar{m}, \qquad \overline{m^2} = \frac{1}{m_1} \sum_k DC(k)^2   (21)

It is useful to detect dissolve transitions.

III. TEST VIDEO SEQUENCES AND EVALUATION CRITERION

The proposed algorithm has been tested on the movies The Last Airbender (TLA), Die Hard 1 (DH), Shrek 1 (SRK), Toy Story (TS), Rab Ne Bana Di Jodi (RNB), Vivah (VVH), Toy Story 3 (TS3), Charusat (CST), and a general video from the internet (OHR). These movies were observed manually frame by frame to find the actual shot boundaries. The test clips contain camera and object motion in addition to flash and fire effects and the gradual shot boundaries. The number of frames considered in the test video sequence from each movie is shown in Table 1.

A. Evaluation Criterion
Traditionally, Recall and Precision are the two metrics used for the evaluation of shot detection algorithms. Recall is defined as

R = \frac{C}{D} = \frac{C}{C + M}   (22)

Whereas, Precision is defined as

P = \frac{C}{C + FP}   (23)

Where D is the total number of shot boundaries in the test video sequence, C is the number of shot boundaries correctly detected by the algorithm, M is the number of shot boundaries missed by the algorithm, and FP is the number of false positives detected by the algorithm. Also, to rank the performance of different algorithms, the F1 measure has been used, i.e., the harmonic mean of Recall and Precision, defined as

F1(R, P) = \frac{2 \times R \times P}{R + P}   (24)

IV. EVALUATION RESULTS OF THE TRADITIONAL SHOT BOUNDARY DETECTION METRICS

The performance of methods such as pixel difference, histogram difference, color histogram, the χ² test, and the block-based χ² histogram in RGB color space has been compared on the same data sequence. The


performance comparison between the χ² histogram and the color histogram in RGB color space is shown in Table 2. It has been observed that the χ² histogram provided better results than the color histogram as per the F1 measure for the respective videos. The significant presence of false positives and missed frames in both algorithms was due to camera and object motion [18].

The performance comparison between the Mutual Information with Joint Entropy based method and the block χ² histogram method in RGB color space is shown in Table 3. The performance of the block χ² histogram method is slightly better in RGB color space, whereas the performance of the color histogram in RGB color space is better than that of the gray-scale histogram. The performance of these methods was found to be poor in RGB color space for the video DH1 due to the large number of frames with motion.

We also tested the performance of the proposed method, in which the mean and the variance of the DC image detect frame-based fade and dissolve transitions, respectively. The results were verified over various videos, as shown in Tables 4 and 5. Overall, it has been observed that the traditional methods did not perform well due to the disturbances caused by camera and object motion; their maximum false positives and missed detections were due to differences between consecutive frames caused by motion.

A. Computation of Threshold
For detection of a fade transition, the threshold must be near zero, i.e., toward the minimum value of the mean. To decide whether a fade effect is present, several continuous frames are observed. Similarly, for detection of the dissolve effect, attention is focused on the variation in the variance of the frames calculated from the DC image. The desired results are obtained depending on the threshold value. In most videos, fade and dissolve gradual transitions are significantly present, and they also play a role in presenting the theme of the story.

In the mean result, a transition from a higher value of the mean down toward its minimum value (or zero) is treated as a fade-out effect, and a transition from a low value to a high value is treated as a fade-in effect. In the case of variance, transitions from low to high and from high to low are both treated as dissolve effects.
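The decision rules above might be sketched as follows; the threshold values and the way runs of frames are reported are illustrative assumptions, since the paper selects thresholds per video:

```python
def detect_fades(means, thr):
    """Flag frames whose per-frame DC-image mean falls below `thr`
    (near zero). A flagged run ending with a rising mean suggests a
    fade-in; one beginning with a falling mean suggests a fade-out."""
    return [k for k, m in enumerate(means) if m < thr]

def detect_dissolves(variances, jump):
    """Flag frame indices where the per-frame DC-image variance changes
    abruptly (low-to-high or high-to-low) by more than `jump`."""
    return [k for k in range(1, len(variances))
            if abs(variances[k] - variances[k - 1]) > jump]
```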

Figure 2: Fade detection based on mean of DC image.

Figure 2 shows a specific run of 1000 frames in which the fade-in and fade-out effects are detected effectively; the fade and dissolve transition effects are marked by small circles. Similarly, Figure 3 shows the dissolve effect detected by the variance of the DC image.

Figure 3: Dissolve detection based on variance of DC image.

TABLE I
NUMBER OF FRAMES CONSIDERED FOR ANALYSIS FROM EACH TEST VIDEO [16, 17]

Movie   Number of frames
TLA     86500
DH      90000
SRK     90000
RNB     40000
VVH     20000
TS3     20000
CST     1650
OHR     1175

TABLE II
PERFORMANCE COMPARISON BETWEEN χ²-HISTOGRAM AND COLOR HISTOGRAM, FOR FADE TRANSITION DETECTION

Method  Measure  TLA   DH1   SRK
CHT     R        0.91  0.66  0.75
        P        0.97  0.50  0.78
        F1       0.93  0.57  0.76
CH      R        0.81  0.61  0.67
        P        0.76  0.49  0.62
        F1       0.78  0.54  0.65

TABLE III
PERFORMANCE COMPARISON BETWEEN MUTUAL INFORMATION WITH JOINT ENTROPY AND BLOCK χ²-HISTOGRAM, FOR FADE TRANSITION DETECTION

Method   Measure  TLA   DH1   SRK
MI & JE  R        0.90  0.70  0.75
         P        0.99  0.54  0.97
         F1       0.95  0.61  0.84
BCH      R        0.84  0.66  0.82
         P        0.75  0.98  0.96
         F1       0.79  0.79  0.89

TABLE IV
FRAME-BASED DISSOLVE EFFECT DETECTION USING VARIANCE OF DC IMAGE

Video  D   C   M  FP  R     P     F1
TS3    14  11  3  1   0.79  0.92  0.84
CST    4   4   0  1   1     0.80  0.88
OHR    6   6   0  0   1     1     1


TABLE V
FRAME-BASED FADE EFFECT DETECTION USING MEAN OF DC IMAGE

Video  D   C   M  FP  R     P     F1
TLA    11  10  1  0   0.9   1     0.95
DH     5   4   1  2   0.8   0.66  0.72
SRK    4   3   1  1   0.75  0.75  0.75
RNB    8   8   0  1   1     0.88  0.94
VVH    5   5   0  1   1     0.84  0.91
TS3    5   5   0  0   1     1     1

V. CONCLUSION AND FUTURE WORK

Disturbances caused by object and camera motion are often mistaken for shot boundaries, and their elimination is the major challenge for shot boundary detection algorithms. We evaluated the performance of the major traditional shot boundary detection algorithms in the presence of motion for various color spaces. From the experimental results, it was found that the χ² histogram method performed better than the color histogram in RGB color space, while the block χ² histogram and the Mutual Information with Joint Entropy methods performed better on different videos in RGB color space. The performance of all of these methods is poor due to the disturbances caused by motion.

In contrast, the proposed method of shot boundary detection performs significantly well in the presence of motion using the mean and variance of the DC image with thresholds. To test the robustness of the proposed algorithm, we used video clips in which a large number of frames with motion is observed in addition to the shot boundaries. The performance of the proposed algorithm has been tested on a large database; the F1 measure, in terms of recall and precision, is near 85% for detecting fade in/out and around 90% for detecting dissolve transitions, and the method performed better than the conventional methods in terms of improved recall, precision, and F1 measure.

REFERENCES
[1] H. J. Zhang, A. Kankanhalli, and S. Smoliar, "Automatic partitioning of full-motion video", Multimedia Systems, Vol. 1, No. 1, pp. 10-28, Jan. 1993.
[2] J. S. Boreczky and L. A. Rowe, "Comparison of video shot boundary detection techniques", Proc. SPIE Storage and Retrieval for Image and Video Databases, Vol. 2664, No. 4, pp. 170-179, Jan. 1996.
[3] R. Lienhart, "Comparison of automatic shot boundary detection algorithms", Proc. SPIE Image and Video Processing, Vol. 3656, No. 7, pp. 25-30, Jan. 1999.
[4] A. Hanjalic, "Shot boundary detection: Unraveled and resolved?", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 12, No. 2, pp. 90-105, Feb. 2002.
[5] U. Gargi, R. Kasturi, and S. Strayer, "Performance characterization of video-shot-change detection methods", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 10, No. 1, pp. 1-13, Feb. 2000.
[6] R. Ford, C. Robson, D. Temple, and M. Gerlach, "Metrics for shot boundary detection in digital video sequences", Multimedia Systems, Vol. 8, pp. 37-46, 2000.
[7] J. Yuan, H. Wang, L. Xiao, W. Zheng, J. Li, F. Lin, et al., "A formal study of shot boundary detection", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 17, No. 2, pp. 168-186, Feb. 2007.
[8] I. K. Sethi and N. Patel, "A statistical approach to scene change detection", Proc. SPIE Storage and Retrieval for Image and Video Databases III, Vol. 2420, pp. 329-338, Feb. 1995.
[9] R. A. Joyce and B. Liu, "Temporal segmentation of video using frame and histogram space", IEEE Transactions on Multimedia, Vol. 8, No. 1, Feb. 2006.
[10] A. Nagasaka and Y. Tanaka, "Automatic video indexing and full-video search for object appearances", in Visual Database Systems II, E. Knuth and L. Wegner, Eds., Elsevier Science Publishers, pp. 113-127, 1992.
[11] I. Koprinska and S. Carrato, "Temporal video segmentation: A survey", Signal Processing: Image Communication, Vol. 16, pp. 477-500, Elsevier, 2001.
[12] Z. Černeková, I. Pitas, and C. Nikou, "Information theory-based shot cut/fade detection and video summarization", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 16, No. 1, Jan. 2006.
[13] B.-L. Yeo and B. Liu, "Rapid scene analysis on compressed video", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 5, No. 6, Dec. 1995.
[14] G. Liu, X. Wen, W. Zheng, and P. He, "Shot boundary detection and key frame extraction based on scale invariant feature transform", IEEE/ACIS International Conference on Computer and Information Science, 2009, pp. 1126-1130.
[15] M. M. Yeung and B.-L. Yeo, "Video visualization for compact presentation and fast browsing of pictorial content", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 7, No. 5, pp. 771-785, Oct. 1997.
[16] www.moserbaerhomevideo.com
[17] www.nada.kth.se/index.asp
[18] www.mathworks.com
