

Source: nishitalab.org/cgi2003_hashimoto/cgi_2003.pdf

Creating Various Styles of Animations Using Example-Based Filtering

Ryota Hashimoto Henry Johan Tomoyuki Nishita

Department of Computer Science, The University of Tokyo

{hashiryo, henry, nis}@is.s.u-tokyo.ac.jp

Abstract

A number of Non-Photorealistic Rendering methods for producing artistic style images have been developed. Recently, a method called “Image Analogies” was proposed. This method is based on the idea of example-based filtering, which uses a pair of images (an original image and a filtered image) as training data and creates an image whose style resembles the filtered image of the training data. This method has great flexibility since the users can create various styles of images by simply changing the training data. In this paper, we extend “Image Analogies” to create various styles of animations. The coherence between frames is considered by computing the pixel flows between the frames. From experimental results, the proposed method can create animation sequences with good frame-to-frame coherence.

1 Introduction

Non-Photorealistic Rendering (NPR) has gained much attention since it can create more impressive images than photorealistic rendering does. The advantage of NPR is that it can omit fine details and produce images which contain only the important portions of the scenes. One of the important fields in NPR deals with the problem of how to produce artistic style images, such as pen-and-ink illustrations, oil paintings, watercolor paintings, and so on. A number of NPR methods for these purposes have been proposed, and users can choose the most effective rendering method for expressing their ideas.

Hertzmann et al. [8] proposed a new image synthesis method based on the concept of machine learning. Their method learns a filter from given example images (which consist of two images, unfiltered and filtered) and applies the learned filter to the target image. The advantage of this method is that it can create various styles of images by simply changing the training data. In the rest of this paper, we will use the term “Image Analogies” to refer to this method.

“Image Analogies”, however, cannot be applied directly to animations. In the case of animation, frame-to-frame coherence is crucial. Simply applying “Image Analogies” to each frame of the animation independently will produce an animation that appears to flicker considerably. In this paper, we extend “Image Analogies” to deal with animation. To maintain coherence, the pixel flows between the previous frame and the current frame are first computed. Then, based on this information, the method decides whether the pixel values in the current frame should be recomputed or simply copied from the previous frame. As a result, coherence between frames is largely preserved.

This paper is organized as follows. Section 2 discusses related work on artistic style rendering. Section 3 briefly describes “Image Analogies”. Section 4 describes the proposed method for creating various styles of animations. Section 5 shows the experimental results of our method. Section 6 concludes this paper and describes a challenge for future research.

2 Related work

Many methods for creating artistic style images have been proposed. Haeberli [4] created impressionistic images by drawing an ordered collection of brush strokes. Methods for creating pen-and-ink style images have been presented by Salisbury et al. [14] and Winkenbach and Salesin [16]. Curtis et al. [3] proposed a watercolor painting method using fluid simulation. Hertzmann [6] used cubic B-splines to simulate strokes of various sizes in order to produce stroke-based paintings (e.g., oil paintings). Kowalski et al. [11] presented an algorithm that uses strokes to render 3D scenes in a stylized manner, suggesting the complexity of the scene, such as fur, grass, and trees, without representing it explicitly. These previous works can be used only to create specific art-style images. On the other hand, Hertzmann et al. [8] proposed an image synthesis method that learns the filter represented by training images and applies it to a target image. As a result, various styles of artistic images can be created by simply changing the training images.


In the case of NPR animation, Meier [13] proposed a method for rendering animations in a painterly style that provides coherence between frames by modeling surfaces as 3D particle sets. Techniques for maintaining coherence when creating impressionist style animations using the optical flow approach were presented by Litwinowicz [12] and Hertzmann and Perlin [7]. Kaplan et al. [9] presented an algorithm for rendering coherent scenes and highly coherent animations in a variety of artistic styles using an editable particle system. Haga et al. [5] proposed a method for rendering pen-and-ink style animations with frame-to-frame coherence by using the stroke information of the previous frame. However, these methods can only create animations of a specific artistic style. Klein et al. [10] proposed a method that treats the input video as a space-time volume of image data. The coherence between frames is established with the help of the users. Nevertheless, this method creates flickering animations for certain artistic styles.

3 Overview of “Image Analogies”

“Image Analogies” is based on recent advances in texture synthesis algorithms [15, 2]. Texture synthesis aims to create textures of arbitrary size and shape which look like the original input texture. The idea of these approaches is to copy the value of the pixel in the original texture which has a neighborhood resembling the neighborhood of the pixel being computed in the texture being synthesized.

As input, “Image Analogies” takes three images: the unfiltered source image A, the filtered source image A′, and the unfiltered target image B. As output, the method produces the filtered target image B′, such that

A : A′ :: B : B′.

In other words, the goal is to create image B′ that relates toB “in the same way” as A′ relates to A.

The approach assumes that the colors at and around any given pixel q in A correspond to the colors at and around the same pixel q in A′. Throughout the paper, the symbol q is used to specify both a pixel in A and its corresponding pixel in A′. Similarly, the symbol p is used to specify the corresponding pixels in B and B′.

The image synthesis process using a multiresolution approach is as follows:

1. Create image B′ from the coarsest to the finest levels.

2. Determine the pixel value of each pixel p in B′_l (the image at the l-th level resolution) in a raster scan ordering by finding a pixel q in A′_l which has a neighborhood nearest to the neighborhood of p.
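As an illustration, the per-pixel matching in step 2 can be sketched as follows. This is a brute-force, single-resolution sketch in NumPy; the function names are ours. For brevity it matches only the unfiltered neighborhoods, whereas the full method also compares the L-shaped region of the filtered images (Figure 1) and accelerates the search as described below.

```python
import numpy as np

def neighborhood(img, y, x, r=1):
    """Return the (2r+1) x (2r+1) patch around (y, x), clamped at the
    image border, as a flat feature vector."""
    h, w = img.shape
    ys = np.clip(np.arange(y - r, y + r + 1), 0, h - 1)
    xs = np.clip(np.arange(x - r, x + r + 1), 0, w - 1)
    return img[np.ix_(ys, xs)].astype(float).ravel()

def synthesize(A, Ap, B, r=1):
    """For each pixel p of B' in raster scan order, copy the value of
    the pixel q in A' whose neighborhood in A best matches the
    neighborhood of p in B (brute-force nearest neighbor; the paper
    accelerates this with an ANN tree)."""
    Ah, Aw = A.shape
    feats = np.array([neighborhood(A, y, x, r)
                      for y in range(Ah) for x in range(Aw)])
    Bp = np.zeros(B.shape, dtype=float)
    for y in range(B.shape[0]):
        for x in range(B.shape[1]):
            f = neighborhood(B, y, x, r)
            q = int(np.argmin(((feats - f) ** 2).sum(axis=1)))
            Bp[y, x] = Ap[q // Aw, q % Aw]
    return Bp
```

When B equals A, every target neighborhood matches its own source pixel exactly, so the sketch reproduces A′, which is the "analogy" property the method is built on.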

Figure 1. The single resolution neighborhood of pixel p is defined as the union of the L-shaped region which centers on p in the filtered image and the rectangular region which centers on p in the corresponding unfiltered image.

The core of this algorithm is the neighborhood matching. The single resolution neighborhood of pixel p is shown in Figure 1. Intuitively, the size of the neighborhood should be at least as large as the largest structure of the image filter represented by the training data A and A′. However, for images containing large scale structures, a large neighborhood must be used, resulting in increased computation time.
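The neighborhood of Figure 1 can be sketched as a feature vector. This is an illustrative sketch; the function name and feature layout are ours, and the L-shape reflects the raster-scan order in which the filtered image is synthesized.

```python
import numpy as np

def single_res_neighborhood(unfiltered, filtered, y, x, r=2):
    """Figure 1 sketch: the feature for pixel p = (y, x) is the full
    (2r+1) x (2r+1) square around p in the unfiltered image, plus the
    L-shaped causal region in the filtered image (the pixels already
    synthesized in raster-scan order: rows above p, and pixels to the
    left of p on its own row)."""
    h, w = unfiltered.shape
    feat = []
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            yy = min(max(y + dy, 0), h - 1)   # clamp at the border
            xx = min(max(x + dx, 0), w - 1)
            feat.append(unfiltered[yy, xx])           # full square
            if dy < 0 or (dy == 0 and dx < 0):        # causal L-shape
                feat.append(filtered[yy, xx])
    return np.asarray(feat, dtype=float)
```

For radius r the square contributes (2r+1)^2 values and the L-shape contributes 2r^2 + 2r values, so the feature grows quadratically with the neighborhood size, which is exactly the cost the multiresolution approach below reduces.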

This problem is solved by using a multiresolution synthesis approach. Computational cost is reduced since large scale structures can be represented more compactly with a few pixels at a coarser level resolution. Assume that p′ is the pixel corresponding to p in the one-level-coarser resolution filtered image. The multiresolution neighborhood of p extends the single resolution neighborhood of p to include the single resolution neighborhood of p′.

The search in step 2 is done by performing a global search and a local search, and the best search result is returned. The global search is performed using an approach similar to that proposed by Wei and Levoy [15]. The Approximate Nearest Neighbor (ANN) [1] method is used to accelerate the search. The local search uses the approach proposed by Ashikhmin [2], which uses the search results of the preceding pixels and their relative positions.
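The local search can be illustrated by how its candidate set is built. This is a sketch of Ashikhmin-style coherence candidates [2] under our own naming and data layout; the global (ANN-accelerated) search then competes against these few candidates, and the better match wins.

```python
import numpy as np

def local_candidates(match, y, x, r=2):
    """Ashikhmin-style candidate set [2]: each already-synthesized
    neighbor (yy, xx) of p = (y, x) previously matched source pixel
    match[yy, xx]; shifting that match by the neighbor's relative
    position predicts a coherent match for p itself.

    `match` maps target pixels to (row, col) source coordinates;
    not-yet-synthesized entries are (-1, -1)."""
    cands = []
    for dy in range(-r, 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx >= 0:
                break  # raster order: only pixels before p are usable
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or xx >= match.shape[1]:
                continue
            my, mx = match[yy, xx]
            if my >= 0:
                cands.append((my - dy, mx - dx))  # undo the offset
    return cands
```

Because only a handful of candidates are examined, the local search is cheap and strongly favors copying contiguous patches from A′, which is what gives the output its coherent texture.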

4 Creating animations

In this section, we first describe an outline of the algo-rithm, then explain its details.

4.1 Outline

Our algorithm takes two images, the unfiltered source image A and the filtered source image A′, and a sequence of images (an animation), the unfiltered target animation B_1, ..., B_T (where T denotes the length of the animation). The algorithm produces the filtered animation B′_1, ..., B′_T as output, such that

A : A′ :: B_t : B′_t

for every t. In other words, our algorithm creates an animation such that the relationship between B_t and B′_t is the same as the relationship between A and A′ for every t.

Figure 2. Search domain for motion estimation.

The problem when creating NPR animations is the loss of coherence between frames. Usually, we become aware of the loss of coherence when the animation appears to flicker. The noise that generates the flickering is easily introduced, especially when we create animations in certain artistic styles. In our approach, we try to preserve coherence by maintaining the temporal redundancy of the original animation.

The outline of the proposed method is as follows:

1. Create the first frame B′_1 using the algorithm described in Section 3.

2. Create frames for t = 2, ..., T from the coarsest to the finest level using the following steps.

3. Calculate the motion between B′_{t-1,l} and B′_{t,l} (the images at the l-th level resolution) by computing the correspondence map prev, which maps the pixels in B′_{t,l} to the pixels in B′_{t-1,l} (refer to Section 4.2).

4. Determine the pixel value of each pixel p in B′_{t,l} in a raster scan ordering by first selecting the search method according to the change between the neighborhood of p and the neighborhood of prev(p) in the filtered image, and then finding the nearest pixel q from A (refer to Section 4.3).
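The outline above can be expressed as a small driver loop. This is purely an illustrative skeleton: the function name and the three helpers are our own inventions, passed in as parameters so the sketch is self-contained; they stand in for the algorithms of Section 3, Section 4.2, and Section 4.3 respectively.

```python
def filter_animation(A, Ap, frames, levels,
                     create_frame, motion_estimate, coherence_search):
    """Driver for the outline above.  `create_frame`, `motion_estimate`,
    and `coherence_search` are hypothetical stand-ins for Sections 3,
    4.2, and 4.3."""
    results = [create_frame(A, Ap, frames[0], levels)]            # step 1
    for t in range(1, len(frames)):                               # step 2
        out = None
        for l in range(levels):                                   # coarse -> fine
            prev = motion_estimate(frames[t - 1], frames[t], l)   # step 3
            out = coherence_search(A, Ap, frames[t],
                                   results[t - 1], prev, l)       # step 4
        results.append(out)                                       # keep finest level
    return results
```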

4.2 Motion estimation

In motion estimation, for each pixel p in B_{t,l} we search for its corresponding pixel prev(p) in B_{t-1,l}. Before we describe the algorithm, we first define the notion of neighborhood used in motion estimation. The definition of neighborhood used here is different from that used in “Image Analogies”. For motion estimation, the neighborhood of p is defined as an n × n pixel rectangle with p as its center (n is an odd number). In our experiments, we set n to 5. This value was determined based on experimental results.

Figure 3. Stripes appear when several pixels in the current frame are set to correspond to a single pixel in the previous frame.

The corresponding pixel of p in the previous frame, prev(p), is computed by finding a pixel in the previous frame whose neighborhood resembles the neighborhood of p. The similarity between two neighborhoods is measured by the Mean Absolute Difference of the colors of their pixels. In most cases, the motion is relatively small, so a local search around the location of p in the previous frame is sufficient to find the corresponding pixel. Based on several experimental results, in practice, we perform the search on pixels in B_{t-1,l} which are located inside a (6n + 1) × (6n + 1) pixel rectangle whose center coincides with the location of p (see Figure 2). This approach works well for preserving coherence when we create an animation using “Image Analogies” because it resembles “Image Analogies” in that both methods perform searching by comparing neighborhoods of pixels.

However, due to the simplicity of the algorithm, there is a chance that several pixels in B_{t,l} are set to correspond to a single pixel u in B_{t-1,l}. This can cause problems since there is a possibility that the value of u is used as the value of all those pixels. As a result, stripes (see Figure 3), consisting of pixels of the same color, may appear in B_{t,l}. In this case, we employ a heuristic approach which selects the pixel in B_{t,l} with the minimum moving distance as the pixel that corresponds to u. For the rest of the pixels, we assume that they do not have corresponding pixels in B_{t-1,l}. From experimental results, this simple solution works well.
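The block-matching search and the stripe-removal heuristic above can be sketched as follows (a minimal grayscale sketch with our own function names; the paper operates per level of the pyramid, which is omitted here):

```python
import numpy as np

def estimate_motion(prev_frame, cur_frame, n=5):
    """For each pixel p of the current frame, find prev(p) by minimizing
    the Mean Absolute Difference (MAD) between the n x n neighborhood of
    p and n x n neighborhoods inside a (6n+1) x (6n+1) search window
    centered at p's location in the previous frame."""
    h, w = cur_frame.shape
    r, s = n // 2, 3 * n              # patch radius, search radius
    pp = np.pad(prev_frame.astype(float), r, mode='edge')
    pc = np.pad(cur_frame.astype(float), r, mode='edge')
    prev = np.zeros((h, w, 2), dtype=int)
    for y in range(h):
        for x in range(w):
            patch = pc[y:y + n, x:x + n]
            best, best_mad = (y, x), np.inf
            for yy in range(max(0, y - s), min(h, y + s + 1)):
                for xx in range(max(0, x - s), min(w, x + s + 1)):
                    mad = np.abs(pp[yy:yy + n, xx:xx + n] - patch).mean()
                    if mad < best_mad:
                        best_mad, best = mad, (yy, xx)
            prev[y, x] = best
    return prev

def resolve_conflicts(prev):
    """Stripe-removal heuristic: when several current-frame pixels map to
    the same previous-frame pixel, keep only the one with the minimum
    moving distance; the rest are marked (-1, -1), i.e. they have no
    corresponding pixel in the previous frame."""
    h, w = prev.shape[:2]
    owner = {}
    for y in range(h):
        for x in range(w):
            key = tuple(prev[y, x])
            d = (key[0] - y) ** 2 + (key[1] - x) ** 2
            if key not in owner or d < owner[key][0]:
                owner[key] = (d, y, x)
    out = np.full_like(prev, -1)
    for (py, px), (_, y, x) in owner.items():
        out[y, x] = (py, px)
    return out
```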


Figure 4. Shortcut global search is performed in the subtree of the ANN tree.

4.3 Coherence preserving search

Using the correspondence map prev described in Section 4.2, we measure the coherency degree of pixel p, coh(p), which represents the amount of change in the neighborhood pixels of p in the filtered image since the previous frame. Specifically, coh(p) is defined as the number of pixels in the neighborhood of p whose colors and relative positions with respect to p are unchanged since the previous frame, divided by the total number of neighborhood pixels (0.0 ≤ coh(p) ≤ 1.0). For pixels which do not have corresponding pixels in the previous frame, we assign zero to their coherency values.
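A sketch of coh(p) follows. The exact set of pixels entering the count is our reading of the definition above (a square neighborhood, border pixels dropped); the function name and data layout are ours.

```python
import numpy as np

def coherency(cur_filtered, prev_filtered, prev_map, y, x, n=5):
    """coh(p): the fraction of pixels in the n x n neighborhood of
    p = (y, x) whose color and relative position to p are unchanged
    since the previous frame (0.0 <= coh(p) <= 1.0).  A pixel with no
    correspondence (prev_map entry (-1, -1)) gets coh = 0."""
    py, px = prev_map[y, x]
    if py < 0:
        return 0.0
    r = n // 2
    h, w = cur_filtered.shape
    kept = total = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            yy, xx = y + dy, x + dx          # neighbor of p
            qy, qx = py + dy, px + dx        # same offset from prev(p)
            if 0 <= yy < h and 0 <= xx < w and 0 <= qy < h and 0 <= qx < w:
                total += 1
                if cur_filtered[yy, xx] == prev_filtered[qy, qx]:
                    kept += 1
    return kept / total if total else 0.0
```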

According to coh(p), we select the search method as follows.

0 ≤ coh(p) < ε1: The neighborhood of p has changed greatly; therefore, the search is performed using both a global and a local search.

ε1 ≤ coh(p) < ε2: The neighborhood of p has been well preserved, so we assume that the global search result is located near the result of the previous frame. Thus, we can search for the nearest pixel of p with a shortcut global search and a local search. The shortcut global search is performed not in the whole ANN tree, but in the subtree which includes the nearest pixel of prev(p) (see Figure 4). The depth of the subtree is determined in proportion to the value of coh(p).

ε2 ≤ coh(p): The neighborhood of p has been very well preserved, so there is no need to search for the nearest pixel of p. Instead, we simply set the nearest pixel of prev(p) to be the nearest pixel of p.

The optimal values for ε1 and ε2 were determined based on the results of several experiments. In practice, we set ε1 to 0.3 and ε2 to 0.8.

Table 1. The computation times (minutes per frame) for generating the result animations.

Style of the animation    Computation time
Texture-by-numbers        2
Oil painting              2
Watercolor painting       2
Pen-and-ink               2
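The three-way selection reduces to a simple threshold dispatch, sketched here with the paper's values ε1 = 0.3 and ε2 = 0.8 (the function and the string labels are ours):

```python
def select_search(coh, eps1=0.3, eps2=0.8):
    """Select the search method for a pixel from its coherency coh(p),
    using the thresholds eps1 = 0.3 and eps2 = 0.8 from the paper."""
    if coh < eps1:
        return "global+local"             # neighborhood changed greatly
    if coh < eps2:
        return "shortcut-global+local"    # search only an ANN subtree
    return "copy-previous-result"         # reuse nearest pixel of prev(p)
```

The higher coh(p) is, the cheaper the chosen search, so well-preserved regions of the frame cost almost nothing to resynthesize.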

5 Results

All of the results presented here (Figures 6 and 8; see the Color Plate for Figure 8) show, from top to bottom, several frames taken from the created animations.¹

Figure 5 shows the training images for creating the animations of a flower field. Figure 6 shows the original animation, the resulting animation using “Image Analogies”, and the resulting animation using the proposed method. The proposed method outperformed “Image Analogies” with respect to the quality of the created animations. Creating the frames independently using “Image Analogies” produced an animation that appeared to flicker considerably, for example inside the encircled areas of the middle column in Figure 6. In contrast, the proposed method succeeded in preserving the coherence between frames, for example inside the encircled areas of the right column in Figure 6.

The input movie for the examples shown in Figure 8 was taken using a video camera. The movie was processed in three different styles using the training data shown in Figure 7. The results of applying oil painting, watercolor painting, and pen-and-ink styles to the original movie can be seen in Figure 8. We also used “Image Analogies” to create animations for these examples; however, the resulting animations exhibited severe flickering due to the lack of coherence between frames.

Table 1 shows the length of time required to create one frame of each of the animations using a Pentium 4 3.06 GHz machine. The sizes of each frame of the animations in Figures 6 and 8 are 320×240 pixels and 352×240 pixels, respectively. The computation time depends on the frame sizes, the sizes of the training images, and the amount of object movement in the animation.

¹The created animations can be seen at the following URL: http://nis-lab.is.s.u-tokyo.ac.jp/cgi2003_hashimoto/

6 Conclusion and future work

This paper has proposed a method that creates various styles of animations using example-based filtering. When creating the animation, the proposed method measures the coherence between frames and uses this information to select the best suited search method. As a result, the amount of observed flickering in the created animations is considerably reduced. We have shown that the proposed method can be used to create various styles of animations.

Similar to “Image Analogies”, the proposed method has a high computational cost. This becomes a problem when we want to create a long animation with large frames. Therefore, we are interested in speeding up the image synthesis process.

Acknowledgment

We would like to thank Toshiyuki Haga for his help and discussions during the early stages of this work. We would also like to thank Hajime Matsui for providing the oil painting image in Figure 7 and Tomohiro Nishita for providing the pen-and-ink image in Figure 7.

References

[1] S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Y. Wu, “An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions”, Journal of the ACM, Vol. 45, No. 6, 1998, pp. 891-923.

[2] M. Ashikhmin, “Synthesizing Natural Textures”, Proceedings of the 2001 Symposium on Interactive 3D Graphics, 2001, pp. 217-226.

[3] C. J. Curtis, S. E. Anderson, J. E. Seims, K. W. Fleischer, and D. H. Salesin, “Computer-Generated Watercolor”, Proceedings of ACM SIGGRAPH 97, 1997, pp. 421-430.

[4] P. E. Haeberli, “Paint by Numbers: Abstract Image Representations”, Computer Graphics (Proceedings of ACM SIGGRAPH 90), Vol. 24, No. 4, 1990, pp. 207-214.

[5] T. Haga, H. Johan, and T. Nishita, “Animation Method for Pen-and-Ink Illustrations Using Stroke Coherency”, Proceedings of CAD/Graphics 2001, 2001, pp. 333-343.

[6] A. Hertzmann, “Painterly Rendering with Curved Brush Strokes of Multiple Sizes”, Proceedings of ACM SIGGRAPH 98, 1998, pp. 453-460.

[7] A. Hertzmann and K. Perlin, “Painterly Rendering for Video and Interaction”, Proceedings of NPAR 2000, 2000, pp. 7-12.

[8] A. Hertzmann, C. E. Jacobs, N. Oliver, B. Curless, and D. H. Salesin, “Image Analogies”, Proceedings of ACM SIGGRAPH 2001, 2001, pp. 327-340.

[9] M. Kaplan, B. Gooch, and E. Cohen, “Interactive Artistic Rendering”, Proceedings of NPAR 2000, 2000, pp. 67-74.

[10] A. W. Klein, P. J. Sloan, A. Finkelstein, and M. F. Cohen, “Stylized Video Cubes”, Proceedings of the 2002 Symposium on Computer Animation, 2002, pp. 15-22.

[11] M. A. Kowalski, L. Markosian, J. D. Northrup, L. Bourdev, R. Barzel, L. S. Holden, and J. F. Hughes, “Art-Based Rendering of Fur, Grass, and Trees”, Proceedings of ACM SIGGRAPH 99, 1999, pp. 433-438.

[12] P. Litwinowicz, “Processing Images and Video for an Impressionist Effect”, Proceedings of ACM SIGGRAPH 97, 1997, pp. 407-414.

[13] B. J. Meier, “Painterly Rendering for Animation”, Proceedings of ACM SIGGRAPH 96, 1996, pp. 477-484.

[14] M. P. Salisbury, S. E. Anderson, R. Barzel, and D. H. Salesin, “Interactive Pen-and-Ink Illustration”, Proceedings of ACM SIGGRAPH 94, 1994, pp. 101-108.

[15] L. Wei and M. Levoy, “Fast Texture Synthesis Using Tree-Structured Vector Quantization”, Proceedings of ACM SIGGRAPH 2000, 2000, pp. 479-488.

[16] G. Winkenbach and D. H. Salesin, “Computer-Generated Pen-and-Ink Illustration”, Proceedings of ACM SIGGRAPH 94, 1994, pp. 91-100.


Figure 5. The unfiltered and filtered source images for creating animations of a flower field.

Figure 6. Creating animations of a flower field using a texture-by-numbers approach (columns, left to right: original, “Image Analogies”, proposed method). The regions inside the encircled areas in the resulting animation using “Image Analogies” flicker considerably, while the corresponding regions using the proposed method do not.


Figure 7. The unfiltered and filtered source images for creating animations in three artistic styles: oil painting, watercolor painting, and pen-and-ink.

Figure 8. Original movie and its oil painting, watercolor painting, and pen-and-ink versions.