
678 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 15, NO. 3, MARCH 2006

Video Halftoning

Zhaohui Sun, Member, IEEE

Abstract—This paper studies video halftoning that renders a digital video sequence onto display devices, which have limited intensity resolutions and color palettes, by trading the spatiotemporal resolution for enhanced intensity/color resolution. This trade is needed when a continuous tone video is not necessary or not practical for video display, transmission, and storage. In particular, the quantization error of a pixel is diffused to its spatiotemporal neighbors by separable one-dimensional temporal and two-dimensional spatial error diffusions. Motion-adaptive gain control is employed to enhance the temporal consistency of the visual patterns by minimizing the flickering artifacts. Experimental results of halftone and colortone videos are demonstrated and evaluated with various halftoning techniques.

Index Terms—Digital halftoning, human visual system, motion estimation, spatiotemporal error diffusion, temporal flicker, video rendering and display.

I. INTRODUCTION

WITH THE ADVANCE of digital technologies, digital video is getting easier and more efficient to use in a wide variety of applications, such as entertainment, education, medicine, security, and the military. Accordingly, there is an increasing demand for video processing techniques [1].

Video halftoning is a task that renders video sequences onto display devices that have limited intensity resolutions and color palettes. It provides an alternative for video representation, rendering, storage, and transmission when continuous tone video is not necessary or not practical. And it can be used in various applications, including the following.

• Display. It can be used to render continuous tone video onto display devices, when there is a mismatch between the image/video representation and the display capability because of the constraints of cost and system complexity, such as small electronic gadgets (e.g., cellular phone, personal digital assistant (PDA), game console, and vehicle dashboard), large screen displays (e.g., cinema poster, commercial billboard, and stadium screen), and flexible displays (e.g., packaging label).

• Data reduction. With a shorter bit depth, the size of a halftone or colortone video is much smaller than its counterpart with continuous tone and can be further reduced by exploiting the temporal consistency of the static and slow-moving visual patterns.

Manuscript received January 22, 2004; revised April 6, 2005. This work was done when the author was with the Research and Development Laboratories, Eastman Kodak Company, Rochester, NY. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Zhigang (Zeke) Fan.

The author is with the Visualization and Computer Vision Lab, GE Global Research, Niskayuna, NY 12309 USA (e-mail: [email protected]).

Digital Object Identifier 10.1109/TIP.2005.863023

• Error-resilient communication. As stochastic noise patterns are used to conceal the quantization errors in the spatiotemporal domain, making them less visible to the human eye, any random perturbation on the halftone video, such as channel noise, is less pronounced in terms of image quality degradation. Therefore, it is particularly suitable for wireless communication.

Related prior art includes digital image halftoning and its extension to image sequences. Image halftoning reduces the image intensity/color resolution for the best possible reproduction and has wide applications in the printing industry. A number of techniques have been proposed [2]–[7], such as error diffusion, ordered dither, dot diffusion, and stochastic screening. Comprehensive reviews can be found in [8]–[10]. Studies have also been carried out applying the halftoning technology to image sequences [11]–[16]. A three-dimensional (3-D) error diffusion algorithm is proposed in [11], including a scheme to minimize the flickering artifacts. In [12], an iterative image halftoning algorithm is applied to the image sequence, where the halftone map of the previous frame is used as the starting point for iterative refinement on the current frame, thus minimizing the temporal flicker. In [14], spatiotemporal error diffusion filters are designed for the luminance and chrominance channels at different frame rates. And the direct-binary-search algorithm is applied to 3-D error diffusion in [15].

Motivated by the widespread use of image halftoning in the printing industry, we extend the concept to video halftoning for display applications, by reducing the video tone scale with minimum visual degradation. The major contributions of this paper are as follows:

• a scheme of video tone-scale reduction by separable temporal and spatial error diffusion;

• a method of temporal flicker reduction by the use of motion information.

After the problem formulation in Section II, separable temporal and spatial error diffusions are employed in Section III to diffuse the quantization error of a pixel to its causal temporal neighbor along the motion trajectory and its causal spatial neighbors. The diffused interframe error is large for fast-moving patterns at high frame rates and small for static patterns at low frame rates. To alleviate the temporal flickering artifacts, the quantization threshold of a pixel is adaptively adjusted based on the motion information in Section IV, which increases the inertia of the interframe error diffusion in static regions to enhance temporal consistency and encourages free error diffusion in fast-moving regions for the best image quality. A video halftoning algorithm is presented in Section V. It is applied to the generation of halftone and colortone video sequences, and evaluated against other halftoning techniques in Section VI. The paper is concluded in Section VII. A preliminary version of this paper appeared in [17].

II. PROBLEM FORMULATION

A digital video sequence $V(i,j,n)$ is a temporally varying, two-dimensional (2-D) spatial signal on frame $n$, sampled and quantized at spatial location $(i,j)$. Signal $V$ contains a single luminance channel for grayscale video and two additional chrominance channels for color video. Each channel is quantized to $b$ bits, e.g., 8-bit grayscale video and 24-bit color video when $b = 8$. The task of video halftoning is to transform a full resolution video with a continuous tone scale (e.g., $b = 8$) to a dithered video with a shorter bit depth (e.g., $b = 1$), such that the perceived visual difference is made as small as possible.

As shown in Fig. 1, the difference between the continuous tone video $V$ and the halftone video $V_h$ is displayed and perceived by human eyes. Under the assumption of linearity and ignoring the device-dependent display MTF, the perceived visual difference can be written as

$$\epsilon = h * (V - V_h) \tag{1}$$

where $*$ denotes convolution and $h$ is the impulse response of the visual system. If $h$ is separable in temporal and spatial dimensions, $h = h_t\,h_s$, with $h_t$ and $h_s$ as the temporal and spatial impulse responses, $\epsilon$ can be further written as

$$\epsilon = h_t * [h_s * (V - V_h)]. \tag{2}$$

Accordingly, video halftoning can be formulated as an optimization problem

$$V_h^{*} = \arg\min_{V_h} \sum_{i,j,n} \epsilon^2(i,j,n) \tag{3}$$

i.e., seeking the optimal halftone video $V_h^{*}$, which minimizes the perceived visual difference (a summation taken across the spatial and time coordinates).

Video halftoning can be taken as an extension of image halftoning, spreading the quantization error of a pixel to its 3-D spatiotemporal neighbors (instead of its 2-D spatial neighbors only) and making the noise less visible to the human visual system (HVS). Video halftoning adds a temporal dimension to conceal the quantization noise and has more flexibility to enhance video intensity resolution. It also needs more computational power to process a large amount of data. The spatiotemporal characteristics of the human visual system are more complicated, and the artifact of temporal flicker needs special attention. In the following, the intensity values are normalized to $[0, 1]$, with 0 as black, 1 as white, and 0.5 as the middle point.
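To make the formulation concrete, the following sketch evaluates the separable perceived difference (2) and the cost (3) for a grayscale video stored as a NumPy array of shape (frames, rows, cols) in [0, 1]. It is a minimal sketch, not the paper's implementation: the Gaussian spatial kernel and the caller-supplied temporal taps merely stand in for the HVS impulse responses $h_s$ and $h_t$, and all function names are illustrative.

```python
import numpy as np
from scipy.ndimage import convolve1d, gaussian_filter

def perceived_difference(v, v_h, h_t, spatial_sigma=1.0):
    """Separable approximation of eps = h_t * [h_s * (V - V_h)], as in (2)."""
    diff = v - v_h
    # 2-D spatial filtering h_s, applied frame by frame (Gaussian stand-in).
    diff = np.stack([gaussian_filter(f, spatial_sigma) for f in diff])
    # 1-D temporal filtering h_t along the frame axis.
    return convolve1d(diff, h_t, axis=0, mode='nearest')

def halftone_cost(v, v_h, h_t):
    """Summed squared perceived difference, the objective in (3)."""
    eps = perceived_difference(v, v_h, h_t)
    return float(np.sum(eps ** 2))
```

Any candidate halftone video can then be scored against the original with `halftone_cost(v, v_h, h_t)`, given a set of temporal filter taps.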

Fig. 1. Video halftoning finds the best halftone rendering $V_h$ with the minimal perceived visual difference $\epsilon$.

III. SPATIOTEMPORAL ERROR DIFFUSION

We start with a discussion of 3-D error diffusion in Section III-A. Because of its complexity and difficulty in practice, we resort to separable error diffusion in Section III-B, which is followed by separate discussions of the one-dimensional (1-D) temporal error diffusion in Section III-C and the 2-D spatial error diffusion in Section III-D. At the end, the idea is extended to multitone and colortone video in Section III-E.

A. Three-Dimensional Error Diffusion

To evaluate the cost function in (3), a spatiotemporal model of the HVS [18] based on psychophysical tests can be used. It has a modulation transfer function (MTF) of

$$H(f_s, f_t) = \left[6.1 + 7.3\left|\log_{10}\!\left(\frac{f_t}{3 f_s}\right)\right|^3\right] f_t f_s\, e^{-2(f_t + 2 f_s)/45.9} \tag{4}$$

where $f_s$ is the spatial frequency in cycles per degree and $f_t$ is the temporal frequency in Hz. The model is lowpass in the spatial dimensions and bandpass in the temporal dimension. The basic idea is to spread the quantization errors into the stop bands as high frequency noise ("blue noise") such that they are less visible after the spatiotemporal filtering of $h$.

The optimization in (3) can be carried out by 3-D spatiotemporal error diffusion. An incoming video $V$, along with the previously diffused error $\epsilon_d$, i.e., $V + \epsilon_d$, is quantized as the halftone video $V_h$. The quantization error $\epsilon = V + \epsilon_d - V_h$ is spatiotemporally filtered and fed back to the input. An alternative approach searches the spatiotemporal solution space for the optimal halftone video $V_h$. It flips a pixel $V_h(i,j,n)$ from 0 to 1 or from 1 to 0 and accepts the new state if the flip brings down the cost function in (3). The process repeats until no flip occurs in one sweep of all the pixels on all the frames, when a local optimum is reached. Both approaches involve 3-D spatiotemporal filtering and require large memory and intensive computation.

The 3-D error diffusion has a few difficulties in practice. Because the human visual system is very complicated [19], a model is only valid for certain viewing conditions, and error diffusion filters need to adapt to the video content. Therefore, only locally optimal solutions are practical. As the operations of filtering and cost function evaluation are carried out on 3-D video entities [e.g., groups of frames (GOPs)] with a large amount of data, they need intensive computation, introduce delay, and require high system complexity. Any compromise tends to introduce additional artifacts, such as temporal flicker. Therefore, we move away from the 3-D model and resort to the separable temporal and spatial filters studied in [20].

Fig. 2. Separable temporal and spatial error diffusion with motion-adaptive gain control.

B. Separable Error Diffusion

The 3-D spatiotemporal MTF $H(f_s, f_t)$ can be approximated by a concatenation of a 1-D temporal MTF $H_t(f_t)$ and a 2-D spatial MTF $H_s(f_s)$, i.e., $H(f_s, f_t) \approx H_t(f_t)\,H_s(f_s)$, as shown in [20]. Therefore, the 3-D spatiotemporal error diffusion can be carried out by a temporal error diffusion followed by a spatial error diffusion, which greatly reduces the system complexity.

The basic idea is to diffuse part of the quantization error of a pixel to its causal temporal neighbor along the motion trajectory and the remainder to its causal spatial neighbors to minimize the spatial visual distortion. The exact amount is controlled by a temporal diffusion map. More interframe error is diffused temporally in fast-moving regions, leaving less error to be diffused to the spatial neighbors.

The separable error diffusion scheme is shown in Fig. 2. The video frames are processed sequentially. The pixels inside a frame are scanned in a serpentine order, from left to right on even lines and from right to left on odd lines. At a pixel location $p = (i,j,n)$, the image intensity $V(p)$ and the quantization errors $\epsilon_d(p)$ diffused from its spatiotemporal neighbors are quantized to $V_h(p)$, by a comparison of $V(p) + \epsilon_d(p)$ with the threshold $T(p)$

$$V_h(p) = \begin{cases} 1, & \text{if } V(p) + \epsilon_d(p) \geq T(p) \\ 0, & \text{otherwise.} \end{cases} \tag{5}$$

Pixel $p$ on the halftone video $V_h$ is denoted as a black dot if the adjusted intensity value is less than the threshold, or a white dot otherwise, yielding a quantization error

$$\epsilon(p) = V(p) + \epsilon_d(p) - V_h(p). \tag{6}$$

To improve the visual quality, the quantization error $\epsilon(p)$ is diffused to the spatiotemporal neighbors.

Fig. 3. (a) Diffusion of error $\epsilon(p)$ to its spatial neighbors and the temporal neighbor on the next frame and (b) collection of error $\epsilon_d(p)$ from its spatial neighbors $\epsilon_s$ and temporal neighbor $\epsilon_t$.

As shown in Fig. 3(a), part of the error is diffused along the motion trajectory to the temporal neighbor as $\epsilon_t$ and the rest to the intraframe neighbors as $\epsilon_s$ in the spatial domain. The diffused error $\epsilon_d$ in (5) is collected from the spatiotemporal neighbors of the previous computation, as shown in Fig. 3(b)

$$\epsilon_d(i,j,n) = \alpha(i,j,n)\,\epsilon_t(i - v_i, j - v_j, n-1) + [1 - \alpha(i,j,n)]\sum_k w_k\,\epsilon_s(i_k, j_k, n). \tag{7}$$

Part of $\epsilon_d$ is contributed by $\epsilon_t$ from the temporal neighbor on the previous frame with a weight of $\alpha$, and the rest by $\epsilon_s$ from the spatial neighbors on the current frame with a weight of $1 - \alpha$. Motion vector $v(i,j,n) = (v_i, v_j)$ specifies the horizontal and vertical displacements from location $(i,j)$ in frame $n-1$ to its correspondence in frame $n$. Bilinear interpolation is carried out at the noninteger locations of the temporal error image $\epsilon_t$. The $w_k$ are the spatial error diffusion filter coefficients with $\sum_k w_k = 1$. In Fig. 3, the spatial neighbors and the filter coefficients $w_k$ are chosen as those defined in the variable-coefficient error diffusion [6], with $w_k$ varying with the intensity code value $V(p)$.
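The per-pixel update of (5)-(7) can be sketched as follows, under stated simplifications: fixed Floyd-Steinberg weights stand in for the variable-coefficient filter of [6], the temporal error image `err_t` is assumed to be already motion compensated (so the bilinear interpolation is not shown), and all names are illustrative.

```python
import numpy as np

def quantize_pixel(v, v_h, err_t, err_s, i, j, alpha, t=0.5, s=1):
    """One step of (5)-(7) at pixel (i, j) of the current frame.

    v, v_h : continuous and halftone frames (2-D arrays in [0, 1])
    err_t  : motion-compensated temporal error image from frame n-1
    err_s  : running sum of weighted intraframe errors, sum_k w_k * eps_s
    s      : +1 on left-to-right rows, -1 on right-to-left rows
    Returns the quantization error, to be forwarded along the motion trajectory.
    """
    h, w = v.shape
    # (7): mix the temporal and spatial contributions with weight alpha.
    e_d = alpha * err_t[i, j] + (1.0 - alpha) * err_s[i, j]
    # (5): binary quantization against the threshold t.
    v_h[i, j] = 1.0 if v[i, j] + e_d >= t else 0.0
    # (6): quantization error of this pixel.
    e = v[i, j] + e_d - v_h[i, j]
    # Spread e to the causal spatial neighbors (Floyd-Steinberg weights,
    # mirrored by s on odd rows); the alpha split of (7) is applied when
    # the receiving pixel collects the error.
    for di, dj, wgt in ((0, s, 7/16), (1, -s, 3/16), (1, 0, 5/16), (1, s, 1/16)):
        ii, jj = i + di, j + dj
        if 0 <= ii < h and 0 <= jj < w:
            err_s[ii, jj] += wgt * e
    return e
```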

C. Temporal Diffusion

The temporal characteristics of the HVS are complicated and less well known than their spatial counterpart. Here, we employ a model proposed in [21]. Based on psychophysical experiments, the temporal model consists of a lowpass filter and a bandpass filter. Specifically, it uses the function

$$g(t) = \exp\left\{-\left[\ln(t/\tau)/\sigma\right]^2\right\} \tag{8}$$

and its high-order derivatives to model the temporal mechanism of the targets perceived at the center of the human eye. Function $g(t)$ and its normalized second-order derivative $g''(t)$, with $\tau$ and $\sigma$ in seconds, are shown in Fig. 4(a), and the frequency responses are depicted in Fig. 4(b), showing one lowpass filter and one bandpass filter.

Finite impulse response (FIR) filters with linear phases can be designed to approximate $g(t)$ and $g''(t)$ [22]. At the frame rates of 30 and 60 Hz, a total of five and nine video frames, respectively, fall into the time span of $g(t)$ and $g''(t)$.

Fig. 4. Temporal characteristics of the human visual system. (a) Impulse response and (b) frequency response.

Fig. 5. (a) Lowpass filter $g_l(t)$ and the bandpass filter $g_b(t)$ at 30 Hz. (b) The lowpass and bandpass filters $g_l(t)$ and $g_b(t)$ at 60 Hz.

The five-tap lowpass FIR filter $g_l(t)$ and the five-tap bandpass filter $g_b(t)$ for 30-Hz video are shown in Fig. 5(a), and the nine-tap FIR filters $g_l(t)$ and $g_b(t)$ for 60-Hz video are shown in Fig. 5(b). The filter coefficients vary with the choices of the scale parameter $\sigma$ and the time-to-peak parameter $\tau$.

The temporal diffusion map $\alpha(i,j)$ on frame $n$ [also denoted as $\alpha(p)$ in (7)] is content dependent and can be determined by the temporal characteristics of the HVS and the video frame rate. Based on the temporal filters $g_l(t)$ and $g_b(t)$, we choose $\alpha$ as

(9)

so that the major part of the noise energy falls into the stop bands. In (9), $\bar{V}$ is the temporally smoothed version of $V$. At low frame rates (<10 Hz), as $\alpha \to 0$, there is no temporal error diffusion, and all of the quantization errors are exclusively diffused to the spatial neighbors. The situation is the same in static regions at high frame rates. In fast-moving regions at high frame rates, $\alpha$ approaches 1, allowing more quantization error to be diffused across frames and leaving less quantization error in the spatial domain. The high frequency noise becomes less visible after temporal smoothing by the HVS. At frame rates higher than 60 Hz, the temporal masking effect of the human eye should be taken into consideration, and some frames can be dropped because the sensation of a high-contrast pattern lasts for a finite duration.
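Sampling (8) into FIR taps can be sketched as below. The values of `tau` and `sigma` are placeholders, not the paper's (their exact values are not reproduced in this copy), and the second derivative is taken numerically; the taps are normalized to unit DC gain for the lowpass filter and zero DC gain for the bandpass filter.

```python
import numpy as np

def g(t, tau=0.08, sigma=0.3):
    """Lowpass temporal mechanism g(t) = exp(-(ln(t/tau)/sigma)^2), t > 0."""
    t = np.maximum(t, 1e-9)  # g(t) -> 0 as t -> 0+
    return np.exp(-(np.log(t / tau) / sigma) ** 2)

def fir_taps(fps, n_taps):
    """Sample g and its normalized second derivative at the frame interval."""
    t = (np.arange(n_taps) + 0.5) / fps        # tap times within the support
    g_l = g(t)
    g_l /= g_l.sum()                           # unit DC gain for the lowpass
    d2 = np.gradient(np.gradient(g(t), t), t)  # numerical second derivative
    g_b = d2 - d2.mean()                       # zero DC gain for the bandpass
    g_b /= np.abs(g_b).max()
    return g_l, g_b

g_l30, g_b30 = fir_taps(30.0, 5)   # five taps at 30 Hz, as in Fig. 5(a)
g_l60, g_b60 = fir_taps(60.0, 9)   # nine taps at 60 Hz, as in Fig. 5(b)
```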

D. Spatial Diffusion

After temporal error diffusion, the rest of the quantization error is diffused in the spatial domain, which can be carried out by various image halftoning techniques with adaptive gain control. Here, only $\epsilon_s$, part of $\epsilon$, is diffused spatially, and the split between the temporal error $\epsilon_t$ and the spatial error $\epsilon_s$ is motion adaptive and content dependent.

Spatial error diffusion involves the choice of the causal neighbors and the design of the error diffusion filter. Based on psychophysical experiments, a model of the spatial frequency response of the HVS

$$H(f) = 2.6\,(0.0192 + 0.114 f)\, e^{-(0.114 f)^{1.1}} \tag{10}$$

has been proposed in [23], where $f = \sqrt{f_x^2 + f_y^2}$ is the radial spatial frequency in cycles per degree. It has lowpass characteristics, with a peak at 8 cycles/degree, and drops to 0 beyond 30 cycles/degree. Thus, it is desirable to distribute the quantization error to the high frequency bands as the less visible "blue noise." In the following, we use the 2-D variable-coefficient error diffusion filter [6] for spatial error diffusion.
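A plain serpentine spatial error diffusion pass over a single frame might look like the following sketch; again, the fixed Floyd-Steinberg weights are a stand-in for the intensity-dependent coefficients of [6], and the routine handles only the spatial part of the error.

```python
import numpy as np

def spatial_diffuse(frame, threshold=0.5):
    """Binary spatial error diffusion of a single frame in [0, 1]."""
    work = frame.astype(np.float64).copy()
    out = np.zeros_like(work)
    h, w = work.shape
    for i in range(h):
        s = 1 if i % 2 == 0 else -1          # serpentine scan direction
        cols = range(w) if s == 1 else range(w - 1, -1, -1)
        for j in cols:
            out[i, j] = 1.0 if work[i, j] >= threshold else 0.0
            e = work[i, j] - out[i, j]
            # Causal neighbors, mirrored on right-to-left rows.
            for di, dj, wgt in ((0, s, 7/16), (1, -s, 3/16), (1, 0, 5/16), (1, s, 1/16)):
                ii, jj = i + di, j + dj
                if 0 <= ii < h and 0 <= jj < w:
                    work[ii, jj] += e * wgt
    return out
```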

E. Colortone Video

The idea of separable error diffusion can be extended to videos with multiple tone-scale levels or multiple color channels. To generate multitone video for multilevel displays, the binary thresholding in (5) is replaced by a multilevel quantizer, and the quantization error of a pixel is diffused to its causal spatiotemporal neighbors in a similar way. With additional chrominance channels in a color video, video halftoning has more flexibility and complexity to diffuse and conceal the quantization errors in color space as well as in the spatiotemporal domain. When color dependency is ignored, the digital video halftoning scheme presented in Fig. 2 can be applied directly to colortone video generation, by replacing the scalar intensity variable $V(i,j,n)$ with a vector color variable $\mathbf{V}(i,j,n)$. Separable error diffusion is independently carried out in the YUV color channels. This approach is taken in the following experiments.
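A minimal sketch of the channel-independent extension follows, reusing `spatial_diffuse` from the previous sketch as the scalar routine; a full implementation would run the spatiotemporal pipeline per channel instead.

```python
import numpy as np

def colortone_frame(frame_3ch):
    """frame_3ch: (rows, cols, 3) array in [0, 1]; returns a binary 3-channel frame."""
    return np.stack([spatial_diffuse(frame_3ch[..., c]) for c in range(3)], axis=-1)
```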

It is also desirable to explore the color dependency and diffuse quantization errors across color channels. For example, the human eye is less sensitive to noise in the chrominance channels than in the luminance channel. However, this involves a more sophisticated vision model and more complicated error diffusion filters.

IV. MOTION-ADAPTIVE TEMPORAL CONSISTENCY

Temporal flicker is an artifact in which black and white patterns alternate over time at the same spatial location. It can be caused by model approximation or by independent intraframe halftoning. To alleviate temporal flicker, we use adaptive gain control to increase the temporal consistency in $V_h$ by adaptively changing the threshold

$$T(i,j,n) = \begin{cases} [1 - \beta(i,j,n)]/2, & \text{if } V_h(i - v_i, j - v_j, n-1) = 1 \\ [1 + \beta(i,j,n)]/2, & \text{otherwise} \end{cases} \tag{11}$$

used in the quantization decision (5). The threshold is moved away from the middle point 0.5 based on the previous visual pattern, the video frame rate, and the local motion. The adaptive gain control increases the inertia of the interframe halftoning, making $V_h(i,j,n)$ similar to $V_h(i - v_i, j - v_j, n-1)$ unless the spatiotemporally diffused error $\epsilon_d$ is large enough.

The gain control map $\beta(i,j)$ on frame $n$ [also denoted as $\beta(p)$ in (11)] is content dependent and can be chosen as

(12)

where $v(i,j,n)$ is the motion vector from point $(i,j)$ in frame $n-1$ to its corresponding point in frame $n$. In static and slow-moving regions, $\beta(p)$ is close to 1, and the halftoning of $V(i,j,n)$ is strongly biased to $V_h(i - v_i, j - v_j, n-1)$ for enhanced temporal consistency. In fast-moving regions with large motion vectors, $\beta(p)$ is close to 0, and free error diffusion is encouraged to conceal the quantization error. Scale factor $c$ (e.g., 0.75) guides the transition from slow to fast motion.

Numerous motion estimation algorithms can be used to compute $v(i,j,n)$, such as gradient-based, region-based, energy-based, and transform-based approaches. Here, we use a robust algorithm [24] for the computation. In regions with outliers, caused by occasional model violation or occlusion, $\beta(p)$ is set to 0. It is also helpful to run a median filter on $\beta$ to smooth out any inconsistent outliers. For some compressed video, such as MPEG, QUICKTIME, or streaming video, the block motion vectors are readily available in the data stream without further computation.
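Since the exact form of (12) is not reproduced above, the following sketch assumes an exponential falloff in the motion magnitude purely for illustration, with `c` playing the role of the scale factor guiding the slow-to-fast transition; `adaptive_threshold` follows the piecewise bias of (11).

```python
import numpy as np

def gain_map(flow, c=0.75):
    """flow: (rows, cols, 2) motion field; beta ~ 1 when static, ~ 0 when fast."""
    speed = np.hypot(flow[..., 0], flow[..., 1])
    return np.exp(-speed / c)  # illustrative choice, not the paper's (12)

def adaptive_threshold(beta, prev_halftone_warped):
    """Bias the threshold toward the previous (motion-compensated) pattern:
    lower it where the previous dot was white, raise it where it was black."""
    return np.where(prev_halftone_warped >= 0.5,
                    (1.0 - beta) / 2.0, (1.0 + beta) / 2.0)
```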

An alternative model of $\beta(i,j,n)$ uses the temporal variances of adjacent frames instead of the motion vectors

(13)

where the expectation $E[V(i,j,n)]$ is the windowed average of the temporal intensity, with scale factor $\sigma$ specifying the intensity deviation (e.g., 5). Another alternative is to use temporal highpass filtering as a measure of the intensity changes

(14)

where $g(n)$ is a bandpass/highpass temporal filter.

V. ALGORITHM

The video halftoning algorithm is summarized as follows; an illustrative sketch in Python follows the listing.

Input: Digital continuous tone video.

Output: Halftone/colortone video.

1) Initialize the temporal filter $g(n)$, temporal diffusion map $\alpha(i,j)$, gain control map $\beta(i,j)$, motion field $v(i,j)$, and frame index $n$.
2) Process the frames sequentially, and scan the pixels on frame $n$ in a serpentine order.
3) Collect the diffused quantization error $\epsilon_d(p)$ from the spatiotemporal neighbors (7).
4) Quantize $V(p) + \epsilon_d(p)$ to $V_h(p)$ based on the motion-adaptive quantization threshold (5).
5) Compute the quantization error $\epsilon(p)$ (6).
6) Diffuse part of $\epsilon(p)$ to the temporal neighbor along the motion trajectory.
7) Diffuse the rest of $\epsilon(p)$ to the causal spatial neighbors on frame $n$.
8) Go to step 2) for the rest of the pixels on frame $n$; then increase the frame index $n$.
9) Retrieve or compute the motion field from frame $n-1$ to frame $n$.
10) Determine the temporal diffusion map $\alpha$ (9).
11) Determine the gain control map $\beta$ (12), (13), (14).
12) Go to step 2) until all the frames are processed.
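The listing can be tied together in a short driver, sketched below for grayscale video under simplifying assumptions: a constant $\alpha$ instead of the content-dependent map of (9), nearest-neighbor rather than bilinear forwarding of the temporal error, and no motion compensation of the previous halftone when biasing the threshold. It reuses `quantize_pixel`, `gain_map`, and `adaptive_threshold` from the earlier sketches.

```python
import numpy as np

def halftone_video(video, flows, alpha=0.5, c=0.75):
    """video: (frames, rows, cols) in [0, 1]; flows[n]: (rows, cols, 2) motion
    from frame n to frame n+1 as (row, col) displacements."""
    n_frames, h, w = video.shape
    out = np.zeros_like(video)
    err_t = np.zeros((h, w))         # temporal error arriving at this frame
    prev = np.full((h, w), 0.5)      # no threshold bias on the first frame
    beta = np.zeros((h, w))
    for n in range(n_frames):
        thresh = adaptive_threshold(beta, prev)    # steps 4) and 11)
        err_s = np.zeros((h, w))
        err_t_next = np.zeros((h, w))
        for i in range(h):                         # serpentine scan, step 2)
            s = 1 if i % 2 == 0 else -1
            cols = range(w) if s == 1 else range(w - 1, -1, -1)
            for j in cols:                         # steps 3)-7)
                e = quantize_pixel(video[n], out[n], err_t, err_s,
                                   i, j, alpha, thresh[i, j], s)
                # Forward the error along the motion trajectory (nearest
                # neighbor here; the paper uses bilinear interpolation).
                vi, vj = np.rint(flows[n, i, j]).astype(int)
                err_t_next[np.clip(i + vi, 0, h - 1),
                           np.clip(j + vj, 0, w - 1)] += e
        err_t = err_t_next
        beta = gain_map(flows[n], c)               # steps 9)-11)
        prev = out[n]                              # previous halftone, unwarped
    return out
```

For a static camera, `flows` can simply be an all-zero array, in which case the temporal error stays in place and the gain map is 1 everywhere.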

The algorithm has a few advantages. First, it has relatively low time and space complexities. At each pixel location, only the causal spatial neighbors and the temporal neighbor are involved in the computation. Furthermore, separable error diffusion is carried out by 1-D temporal diffusion followed by 2-D spatial diffusion, which greatly reduces the system complexity. Given the motion information, the algorithm can generate high quality halftone video for real-time applications with minimal delay. Second, the spatial error diffusion is flexible and compatible with the widely available image halftoning techniques, which can be used as plug-in modules. Third, temporal flickering artifacts are minimized by motion-adaptive gain control.

VI. EXPERIMENTAL RESULTS

A. Setup

The proposed video halftoning scheme is tested on two video sequences. The grayscale "Trevor" sequence has 99 frames and a spatial resolution of 256 × 256. It is shot by a static camera with a static textured background and a moving foreground (a person wearing a highly textured shirt and tie). The color "Football" sequence has 97 frames and a spatial resolution of 360 × 240. It is shot by a slow-moving camera with fast-moving players and a highly textured field.

Fig. 6. (a) Frames 2, 34, 66, and 99 of the grayscale video sequence "Trevor" overlaid with motion vectors to the previous frames. (b) Frame 34 of the halftone video at the frame rates of (left) 30 Hz and (right) 60 Hz.

A weighted signal-to-noise ratio (WSNR) is used as a metric for performance evaluation, defined as

$$\mathrm{WSNR} = 10 \log_{10} \frac{\sum_{i,j,n} \left|g(n) * V(i,j,n)\right|^2}{\sum_{i,j,n} \left|g(n) * \left[V(i,j,n) - V_h(i,j,n)\right]\right|^2} \tag{15}$$

where the temporal filter $g(n)$ is chosen as $g_l(n)$ or $g_b(n)$ shown in Fig. 4. It is a measure of the filtered signal energy over the filtered noise energy. A large WSNR indicates high video quality with small visual degradation. Subjective visual quality is also judged by observers.
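The metric can be sketched directly from (15); `g` is a 1-D temporal filter such as the `g_l` or `g_b` taps from the earlier sketch, applied along the frame axis.

```python
import numpy as np
from scipy.ndimage import convolve1d

def wsnr(video, halftone, g):
    """Weighted SNR of (15): temporally filtered signal over noise energy, in dB."""
    signal = convolve1d(video, g, axis=0, mode='nearest')
    noise = convolve1d(video - halftone, g, axis=0, mode='nearest')
    return 10.0 * np.log10(np.sum(signal ** 2) / np.sum(noise ** 2))
```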

B. Halftone/Colortone Video

Selected frames of the "Trevor" sequence, overlaid with the motion fields to the previous frames, are shown in Fig. 6(a). The 8-bit grayscale sequence is rendered as a monochrome video with only black and white dots. The results on frame 34 of the halftone videos rendered at frame rates of 30 and 60 Hz are printed in Fig. 6(b) at a spatial resolution of 120 dpi. The random patterns, coupled with the characteristics of the HVS, provide a sensation of enhanced tone scale.

The algorithm is also applied to colortone video. Selected frames of the 24-bit color "Football" sequence, overlaid with the motion fields to the previous frames, are shown in Fig. 7(a). The continuous tone color video is rendered as a colortone video with a palette of only eight colors: black, white, red, green, blue, cyan, magenta, and yellow. The results on frame 34 at frame rates of 30 and 60 Hz are printed in Fig. 7(b) at 120 dpi. The colortone video uses only a fraction of the colors to produce a realistic tone scale rendering.

Examples of the gain control maps $\beta(i,j)$ on frame 34 of the sequences are shown in Fig. 8(a) and (c). The white regions denote static and slow-moving patterns, which are strongly biased to the halftone patterns on the previous frames. The dark regions denote fast-moving patterns that allow free error diffusion for the best possible image reproduction. The temporal diffusion map $\alpha(i,j)$ determines the weights for the temporal error diffusion. It tends to increase at high video frame rates. Examples on frame 34 of the sequences are shown in Fig. 8(b) and (d). The dark regions diffuse all errors inside a frame, and the white regions spread more errors across frames.

C. Evaluation

The proposed video halftoning algorithm is compared to five different halftoning techniques: the Floyd-Steinberg error diffusion method [2], the ordered dither method [3], the frame-dependent image halftoning method (Gotsman) [12], the Hild-Pins halftoning method [11], and the 3-D error diffusion method (AFHBA) [14]. The visual results for the 30-Hz "Trevor" sequence are shown in Fig. 9, and the numerical results in terms of WSNR are presented in Table I, where the temporal filter is chosen as the lowpass filter $g_l(n)$ or the bandpass filter $g_b(n)$ at 30 Hz [shown in Fig. 5(a)].

The halftoning results by the six methods on a portion of frame 34 in Fig. 6(b) are printed at 100 dpi in rows (a) and (c) of Fig. 9.

Fig. 7. (a) Frames 2, 34, 66, and 97 of the color video sequence "Football" overlaid with motion vectors to the previous frames. (b) Frame 34 of the colortone video at the frame rates of (left) 30 Hz and (right) 60 Hz. (Color version available online at http://ieeexplore.ieee.org.)

Fig. 8. Gain control maps $\beta(i,j)$ in (a) and (c) and temporal diffusion maps $\alpha(i,j)$ in (b) and (d) on frame 34 of the sequences at 30 Hz.

TABLE I: PERFORMANCE COMPARISON OF THE HALFTONING SCHEMES

The rendering results by the ordered dither method in III (a) and the Hild-Pins method in II (c) are not as good as the others. The differences between the halftone frames 34 and 33 (the flickering artifacts) by the six methods are shown in rows (b) and (d) of Fig. 9. With special attention, the temporal flicker can be dramatically reduced by the video-halftoning method in I (b), the Gotsman method in I (d), and the Hild-Pins method in II (d). The AFHBA and Floyd-Steinberg methods give good spatial rendering in III (c) and II (a); however, the flickering artifacts dominate the frame differences in III (d) and II (b), in the static background as well as the moving foreground, yielding poor temporal quality. Conversely, the ordered-dither and Hild-Pins methods enforce temporal consistency aggressively, so the frame difference is very small, which means the temporally adjacent halftone frames are almost identical, and the spatial quality is poor. Only the video-halftoning method enforces content-dependent temporal consistency. These observations are supported by the numerical results in Table I, which provide the WSNR in the lowpass and bandpass temporal channels. Overall, the video-halftoning technique provides the best spatiotemporal halftone rendering.

Regarding algorithm complexity, the Gotsman method is the most computationally intensive approach: a direct binary search is carried out to flip the black and white dots until the visual difference is minimized, so it is not suitable for real-time video processing tasks with a large number of frames. Next comes the video-halftoning method. It needs extra computation to dynamically update the temporal diffusion map $\alpha(i,j)$ and the gain control map $\beta(i,j)$; however, given the motion information, it can still be used in real-time applications. The rest of the methods can all be implemented very efficiently. It is worth noting that all of the methods guarantee only a local optimum as a result of model approximation. This opens up future opportunities to explore the best possible rendering based on the spatiotemporal model, the viewing condition, and the video content.

VII. CONCLUSION

We have presented a video halftoning scheme to render continuous tone digital video as halftone and colortone video on display devices with limited tone scales and color palettes. Separable 1-D temporal and 2-D spatial error diffusions are carried out to spread the quantization errors in the spatiotemporal domain, where they are less visible to the human visual system. Temporal FIR filters are designed at various video frame rates to diffuse temporal errors along the motion trajectory across frames. And a motion-adaptive gain control scheme is presented to enhance temporal consistency and alleviate flickering artifacts.

Fig. 9. Halftone frame 34 and the flickering artifact at 30 Hz by the video halftoning method in I (a) and (b), the Floyd-Steinberg method [2] in II (a) and (b), the ordered dither method [3] in III (a) and (b), the Gotsman method [12] in I (c) and (d), the Hild-Pins method [11] in II (c) and (d), and the AFHBA method [14] in III (c) and (d).

REFERENCES

[1] A. M. Tekalp, Digital Video Processing. Englewood Cliffs, NJ: Prentice-Hall, 1995.

[2] R. Floyd and L. Steinberg, "An adaptive algorithm for spatial grey scale," Proc. Soc. Inf. Display, vol. 17, no. 2, pp. 75–77, Mar. 1976.

[3] B. E. Bayer, "An optimum method for two-level rendition of continuous-tone pictures," in Proc. IEEE Int. Conf. Communications, vol. 1, 1973, pp. 11–15.

[4] J. Sullivan, L. Ray, and R. Miller, "Design of minimum visual modulation halftone patterns," IEEE Trans. Syst., Man, Cybern., vol. 21, no. 1, pp. 33–38, Jan./Feb. 1991.

[5] B. Kolpatzik and C. A. Bouman, "Optimized error diffusion for image display," J. Electron. Imag., vol. 1, no. 3, pp. 277–292, 1992.

[6] V. Ostromoukhov, "A simple and efficient error-diffusion algorithm," in Proc. ACM SIGGRAPH, 2001, pp. 567–572.

[7] N. Damera-Venkata, B. L. Evans, and V. Monga, "Color error-diffusion halftoning," IEEE Signal Process. Mag., vol. 20, no. 4, pp. 51–58, Jul. 2003.

[8] R. A. Ulichney, Digital Halftoning. Cambridge, MA: MIT Press, 1987.

[9] H. R. Kang, Digital Color Halftoning. New York: SPIE/IEEE Press, 1999.

[10] J. C. Stoffel and J. F. Moreland, "A survey of electronic techniques for pictorial image reproduction," IEEE Trans. Commun., vol. 29, no. 12, pp. 1898–1925, Dec. 1981.

[11] H. Hild and M. Pins, "A 3-D error diffusion dither algorithm for half-tone animation on bitmap screens," in State-of-the-Art in Computer Animation—Proceedings of Computer Animation. Berlin, Germany: Springer-Verlag, 1989, pp. 181–190.

[12] C. Gotsman, "Halftoning of image sequences," Vis. Comput., vol. 9, no. 5, pp. 255–266, 1993.

[13] J. B. Mulligan, "Methods for spatiotemporal dithering," in Proc. SID Int. Symp. Dig. Tech. Papers, Seattle, WA, 1993, pp. 155–158.

[14] C. B. Atkins, T. J. Flohr, D. P. Hilgenberg, C. A. Bouman, and J. P. Allebach, "Model-based color image sequence quantization," in Proc. SPIE/IS&T Conf. Human Vision, Visual Processing, and Digital Display V, vol. 2179, Feb. 1994, pp. 310–317.

[15] D. P. Hilgenberg, T. J. Flohr, C. B. Atkins, J. P. Allebach, and C. A. Bouman, "Least-squares model-based video halftoning," in Proc. SPIE/IS&T Conf. Human Vision, Visual Processing, and Digital Display V, vol. 2179, Feb. 1994, pp. 7–10.

[16] D. P. Scholnik and J. O. Coleman, "Joint spatial and temporal delta-sigma modulation for wide-band antenna arrays and video halftoning," presented at the IEEE ICASSP, Salt Lake City, UT, May 2001.

[17] Z. Sun, "A method to generate halftone video," presented at the IEEE ICASSP, Philadelphia, PA, Mar. 2005.

[18] D. H. Kelly, "Motion and vision: II. Stabilized spatio-temporal threshold surface," J. Opt. Soc. Amer., vol. 69, pp. 1340–1349, 1979.

[19] B. A. Wandell, Foundations of Vision. Sunderland, MA: Sinauer, 1995.

[20] E. H. Adelson and J. R. Bergen, "Spatiotemporal energy models for the perception of motion," J. Opt. Soc. Amer. A, vol. 2, no. 2, pp. 284–299, Feb. 1985.

[21] R. E. Fredericksen and R. F. Hess, "Estimating multiple temporal mechanisms in human vision," Vis. Res., vol. 38, pp. 1023–1040, 1998.

[22] S. Winkler, "Issues in vision modeling for perceptual video quality assessment," Signal Process., vol. 78, no. 2, pp. 231–252, Oct. 1999.

[23] J. L. Mannos and D. J. Sakrison, "The effects of a visual fidelity criterion on the encoding of images," IEEE Trans. Inf. Theory, vol. 20, no. 4, pp. 525–536, Jul. 1974.

[24] M. J. Black and P. Anandan, "The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields," Comput. Vis. Image Understand., vol. 63, no. 1, pp. 75–104, Jan. 1996.

Zhaohui Sun (M'00) received the B.E. and M.E. degrees in electronics engineering and information science from the University of Science and Technology of China, Hefei, in 1992 and 1995, respectively, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Rochester, Rochester, NY, in 1998 and 2000, respectively.

He is currently with the Visualization and Computer Vision Lab, GE Global Research, Niskayuna, NY. From 2000 to 2005, he was with the Research and Development Laboratories, Eastman Kodak Company, Rochester, as a Research Scientist, a Senior Research Scientist, and a Principal Research Scientist. In 1998, he was an Intern Researcher with the Imaging and Visualization Department, Siemens Corporate Research, Princeton, NJ. His research interests include vision technologies, digital video/image processing, medical imaging, and multimedia computing.
