
Drawing Motion Without Understanding It

Vincenzo Caglioti, Alessandro Giusti, Andrea Riva, and Marco Uberti

Politecnico di Milano, Dipartimento di Elettronica e Informazione

Abstract. We introduce a novel technique for summarizing a short video in a single image, by augmenting the last frame of the video with comic-like lines behind moving objects (speedlines), with the goal of conveying their motion. Compared to the existing literature, our approach is novel in that we do not attempt to track moving objects nor to attain any high-level understanding of the scene: our technique is therefore very general and able to handle long, complex, or articulated motions. We only require that a reasonably correct foreground mask can be computed in each of the input frames by means of background subtraction. Speedlines are then progressively built through low-level manipulation of such masks. We explore application scenarios in diverse fields and provide examples and experimental results.

1 Introduction

Representing motions and actions in a single image is a well-known problem for visual artists. Universally accepted solutions are found in comics, where motion is represented through various abstract graphical devices [1] (Figure 1), and in photography, where images of objects are often deliberately motion blurred in order to convey their speed (Figure 2). In this paper, we propose a simple, low-level algorithm for representing in a single image the motion occurring in a video, by augmenting the last frame of the video sequence with lines conveying the motion that occurred in the previous frames (speedlines). In particular, we recover speedlines as the envelope of the objects' apparent contours over time.

Drawing speedlines is mainly a matter of visual style when the motion of objects in the scene is known in advance; this happens in synthetically modeled scenes, or when motion has been previously recovered by, e.g., object or people tracking algorithms. In that case, a straightforward, albeit visually crude, technique for representing such motion would be to simply draw the known object trajectories in the previous frames; several more refined and appealing techniques are presented in the literature (see Section 2). Our approach is fundamentally different, as it operates at a much lower level and does not explicitly try to separate or reconstruct moving objects from the input video: we only require an acceptable foreground/background segmentation for each frame, and assume that the camera is still. Speedlines are then computed by means of a few simple operations on the foreground masks of successive frames; in other words, we do not try to use or infer any information pertaining to the scene semantics, and we represent motion without understanding it.

Fig. 1. Speedlines are commonplace in comic books.

(a) Synthetic motion blur; (b) N = 53, 400x320; (c) N = 53, 400x320.

Fig. 2. Motion blur (a) fails to convey whether the car was moving forwards (b) or backwards (c). Short speedlines in (b,c) can instead represent the motion orientation, and do not cause loss of detail in the foreground.

On the one hand, this approach has several important advantages: low computational complexity, robustness to segmentation errors, and remarkable generality, which allows applications in diverse fields without any modification of the algorithm. On the other hand, since our results lack any explicit, high-level symbolic information about the scene evolution, they are of no use for any subsequent automated processing step: the only actual user of our algorithm's output is a human observer.

Such a system can be useful for a number of reasons. A first, obvious application is summarizing a short video segment in order to convey instantaneous motion in media where video is not available (printed media, low-bandwidth devices), or where an image is preferable because its meaning can be grasped immediately, without explicit attention. In these contexts, a similar goal is commonly achieved by shooting a single image with a longer exposure time, which results in motion blur; however, motion blur corrupts the foreground and does not convey motion orientation (see Figure 2).

Another application, which is commonly overlooked in related works, is summarizing a longer video segment in order to represent an extended motion, possibly complex, articulated or inhomogeneous in time. Motion blur is hardly helpful in this case, as it would make the subject too confused; a more applicable technique is multiple-exposure photography (or strobing), i.e. a long-exposure photograph taken with stroboscopic lighting. Physical devices frequently used to attain similar goals are smoke trails or ribbons (e.g. in rhythmic gymnastics). In Section 4 we describe a prototype video surveillance application which uses speedlines to summarize long, temporally sparse events, and detail a number of practical advantages of such an approach.

(a) 38 frames, 180x144; (b) 473 frames, 320x256.

Fig. 3. Speedlines describing long motions.

Lecture Notes in Computer Science

Finally, speedlines can be used to augment every frame of a video stream; when combined with refined, good-looking visualizations (which are not a primary focus of this paper), this can be applied to video special effects. In other contexts, speedlines can allow a casual viewer (such as a surveillance operator) to immediately grasp what is happening without having to watch the video stream for a long time or rewind it, as each single frame visually conveys the scene's past evolution. Another example is using speedlines to immediately understand whether and how slow objects, such as distant ships, are moving; interestingly, this is intuitively achieved by looking at the ship's trail, if visible, which is a real-world counterpart to our speedlines.

The paper is structured as follows: after reviewing the existing literature in Section 2, we describe our technique in Section 3: we first introduce our model and briefly discuss background subtraction, which is taken for granted in the rest of the paper; then we detail the core of our approach and its possible extensions. In Section 4 we present application examples and experimental results. We conclude with Section 5, which also presents future work, extensions, and additional foreseen application scenarios.

2 Related Works

A few main graphical devices for representing motion in a single image are described and used in the literature [2,3]: drawing trailing speedlines (or motion lines) behind the moving object; replicating the moving object's image (as in strobing) or its contours; and introducing deformations on the object in order to convey its acceleration. In this work, we focus on the first of these techniques (and partially on the second, see Section 3.3), whose perceptual foundations are investigated by brain researchers in [4].

Several related works aim at rendering motion cues from synthetic animations; in particular, [5] uses the scene's 3D model and animation keyframes as input, whereas [6] presents a more specialized system in the context of human gait representation; in [7], speedlines are synthesized within a framework for nonphotorealistic rendering by selecting vertices of the 3D moving objects, which generate speedline trails whenever the object's speed exceeds a given threshold. Our system radically differs from the cited efforts, as we use a video as input instead of a 3D animated synthetic model.

A semi-automated system with similar goals is introduced in [8]: it tracks moving objects in video and applies several graphically pleasing effects to represent their motion, including short rectilinear speedlines. In [9], straight speedlines are also generated for the translational motion of a tracked object. A different approach is adopted in [10], where individual features are tracked in a manually selected foreground object, and speedlines are drawn as their trajectories; high-level scene understanding is also a key characteristic of several other works [11], which fully exploit their higher abstraction level by introducing interesting additional effects, such as distortion for representing acceleration. The technique we propose differs in that it lets speedlines “build themselves” through low-level manipulations of the evolving foreground masks, without requiring any user interaction or explicit object tracking; also, we do not assume anything about the objects' motion, which does not need to be explicitly modeled or recovered. In fact, our system effortlessly handles complex motion of articulated or deformable objects, as well as scenes with multiple moving objects that cannot be easily separated or tracked. Moreover, our technique works directly on very low resolution video, where moving objects (possibly affected by motion blur) often bear no recognizable features. This also allows us to handle new scenarios and applications, such as surveillance tasks, which, to the best of our knowledge, have never been targeted by previous literature. An interesting relation also holds with [12], which deals with the opposite problem of inferring object movement from a single image affected by motion blur.

3 Technique

3.1 Overview

We work on a sequence of $N$ video frames $I_1, \dots, I_N$, uniformly sampled in time. We assume that the camera is fixed, and that one or several objects are moving in the foreground.

We require that, for each frame $I_i$, the foreground can be extracted so as to obtain a binary foreground mask $M_i$, whose pixels are 1 where frame $I_i$ depicts the foreground and 0 otherwise. This problem (background subtraction) is extensively dealt with in the literature, and many effective algorithms, such as [13], are available for most operating conditions, including unsteady cameras and difficult backgrounds.

From now on, we assume that a reasonable segmentation is obtained using any background subtraction technique; Section 4 shows that spatially or temporally local segmentation errors only marginally affect the quality of the results.

By processing the foreground masks, we finally output an image $F$: the last frame $I_N$ with superimposed speedlines that convey the motion of objects in the scene during the considered time interval. From a theoretical point of view, these lines approximate the envelope of the moving objects' apparent contours; we now describe their construction.

Fig. 4. Our algorithm applied to N = 17 frames of a rototranslating object, with k = 6.

3.2 Building Speedlines

First, the foreground mask $M_i$ of each input frame $I_i$, $i = 1, \dots, N$, is computed. The video is then processed by analyzing adjacent foreground masks in small groups (see Figure 4), whose size is defined by a parameter $k$, with $3 \le k \le N$. The parameter $k$ controls the amount of detail in the speedlines in case of complex motion; its effects are described in more detail in Section 4.1.

In particular, we compute $N-k+1$ binary images $S_i$, $k \le i \le N$; each $S_i$ represents the atomic pieces of speedlines originating from frames $I_{i-k}, \dots, I_i$. Specifically, $S_i$ is the set of edge pixels of the union¹ of masks $M_{i-k}, \dots, M_i$ which are not also edge pixels of $M_{i-k}$ or $M_i$:

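$$S_i = \operatorname{edge}\left( \bigcup_{j=i-k}^{i} M_j \right) \setminus \left( \operatorname{edge}(M_{i-k}) \cup \operatorname{edge}(M_i) \right) \tag{1}$$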

When the trajectories of the objects in frames $i-k, \dots, i$ are simple and non-intersecting, $S_i$ approximates the envelope of the foreground contours during that time interval.
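To make Eq. (1) concrete, the following is a minimal NumPy sketch of a single speedline piece. The paper does not define edge(·); we assume here that it returns the inner boundary of a mask, i.e. the foreground pixels having at least one 4-connected background neighbour. The names `edge` and `speedline_piece` are illustrative and not taken from the authors' implementation.

```python
import numpy as np


def edge(mask: np.ndarray) -> np.ndarray:
    """Assumed edge(): foreground pixels with a 4-connected background neighbour."""
    m = mask.astype(bool)
    # Pad with background so foreground pixels on the image border count as edges.
    p = np.pad(m, 1, constant_values=False)
    # A pixel is interior if it and its four neighbours are all foreground.
    interior = p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return m & ~interior


def speedline_piece(window: list) -> np.ndarray:
    """S_i of Eq. (1); `window` holds the masks M_{i-k}, ..., M_i in temporal order."""
    # Union of binary masks = Boolean OR over the whole window.
    union = np.logical_or.reduce([w.astype(bool) for w in window])
    # Keep the edges of the union that are not edges of the first or last mask.
    return edge(union) & ~(edge(window[0]) | edge(window[-1]))
```

For a 3x3 square translating one pixel per frame over six frames, this sketch keeps only two short horizontal segments along the top and bottom of the swept region, i.e. the envelope of the contours, while the contours of the first and last positions are discarded.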

The $N-k+1$ atomic speedline pieces $S_i$ are then merged through a visualization function $f(\dots)$, which determines the final speedlines $S$ and their appearance. A simplistic $f(\dots)$ just computes the union of all $S_i$. Alternatively, such a function may improve the visualization in several ways, some of which we briefly explored in our implementation; they are introduced in Section 3.3.

Finally, $S$ is composited over $I_N$, which gives the system's output image $F$. The complete algorithm is summarized as follows:

for i = 1 … N do
    M_i ← foreground(I_i)
end for
for i = k … N do
    S_i ← edge(⋃_{j=i−k}^{i} M_j) \ (edge(M_{i−k}) ∪ edge(M_i))
end for
S ← f(S_k, S_{k+1}, …, S_N)
F ← composite S over I_N

¹ We consider the union of binary masks as the Boolean OR operation on such masks.

Fig. 5. Speedlines for complex motions, with foreground replication; speedline color encodes time (e). In (a), two cars park and a person gets out of the car below. (a,b) are from the PETS dataset [16].
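The algorithm listing can be sketched in Python with NumPy as follows. This is not the authors' Matlab prototype: it assumes grayscale frames of equal size, takes `foreground` as any user-supplied function returning a Boolean mask, uses the simplistic union for f(...), and composites by painting speedline pixels with a fixed gray level (`line_value`, a hypothetical parameter). Following Eq. (1) literally, each window holds the k+1 masks M_{i−k}, …, M_i; with 0-based Python lists this yields N−k windows, so the sketch requires k < N.

```python
import numpy as np


def edge(mask):
    """Assumed edge(): foreground pixels with a 4-connected background neighbour."""
    p = np.pad(mask, 1, constant_values=False)
    interior = p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return mask & ~interior


def draw_speedlines(frames, foreground, k, line_value=255):
    """Return (F, S): the last frame with speedlines painted on it, and the mask S."""
    n = len(frames)
    if not 3 <= k < n:
        raise ValueError("need 3 <= k < number of frames")
    masks = [np.asarray(foreground(f), dtype=bool) for f in frames]  # M_1 .. M_N
    pieces = []
    for i in range(k, n):  # 0-based window masks[i-k] .. masks[i], as in Eq. (1)
        window = masks[i - k:i + 1]
        union = np.logical_or.reduce(window)
        pieces.append(edge(union) & ~(edge(window[0]) | edge(window[-1])))
    S = np.logical_or.reduce(pieces)  # simplistic f(...): union of all pieces
    F = frames[-1].copy()  # composite S over I_N
    F[S] = line_value
    return F, S
```

With a square of width 3 moving one pixel per frame, k = 3 produces no speedlines at all, because in every window the edges of the first and last masks already cover the whole boundary of the union; larger k values leave the envelope visible. This mirrors the limitation for short speedlines on large objects discussed in Section 4.1.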

3.3 Extensions and Implementation Notes

In order to improve the informational content and polish of speedlines, the function $f(\dots)$, which combines speedline pieces originating from different frames, can implement several visualization improvements, such as drawing speedline pieces with varying stroke width, color, or transparency.
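As one possible f(...), the sketch below colors each speedline piece according to its time, in the spirit of Figure 5e, and alpha-blends the result over a grayscale I_N. The blue-to-red ramp, the `opacity` parameter and both function names are illustrative assumptions rather than the paper's choices.

```python
import numpy as np


def color_by_time(pieces, old=(0, 0, 255), new=(255, 0, 0)):
    """One possible f(...): color piece t of n by blending linearly from `old` to `new`."""
    h, w = pieces[0].shape
    rgb = np.zeros((h, w, 3))
    covered = np.zeros((h, w), dtype=bool)
    n = len(pieces)
    for t, piece in enumerate(pieces):
        a = t / (n - 1) if n > 1 else 1.0  # 0 for the oldest piece, 1 for the newest
        # Later pieces overwrite earlier ones where they overlap.
        rgb[piece] = (1 - a) * np.asarray(old, float) + a * np.asarray(new, float)
        covered |= piece
    return rgb, covered


def composite(last_gray, rgb, covered, opacity=0.8):
    """Alpha-blend the colored speedlines over a grayscale last frame I_N."""
    base = np.repeat(last_gray[..., None].astype(float), 3, axis=2)
    out = base.copy()
    out[covered] = (1 - opacity) * base[covered] + opacity * rgb[covered]
    return np.clip(out.round(), 0, 255).astype(np.uint8)
```

Stroke width could be varied in a similar way, e.g. by dilating older pieces less than recent ones; that variant is not shown.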

Also, when applying the algorithm to longer time intervals, the foreground of important frames (keyframes) can be semi-transparently superimposed over the output image (see Figure 5). In the simplest implementations, keyframes can be regularly sampled in time; as a more powerful alternative, a number of sophisticated techniques for automatically finding meaningful keyframes in video have been proposed in the literature on video summarization [14,15].
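The simplest option above, regularly sampled keyframes whose foreground is blended into the output, might look like the following sketch. It assumes 8-bit grayscale frames and precomputed Boolean masks; `regular_keyframes`, `replicate_foreground` and the blending weight `alpha` are hypothetical names and values.

```python
import numpy as np


def regular_keyframes(n_frames, n_keys):
    """Indices of n_keys keyframes spaced evenly over 0 .. n_frames-1."""
    return np.linspace(0, n_frames - 1, n_keys).round().astype(int).tolist()


def replicate_foreground(frames, masks, keys, alpha=0.5):
    """Blend the foreground pixels of each keyframe into the last frame with weight alpha."""
    out = frames[-1].astype(float)
    for idx in keys:
        m = masks[idx]
        out[m] = (1 - alpha) * out[m] + alpha * frames[idx][m]
    return np.clip(out.round(), 0, 255).astype(np.uint8)
```

For example, `regular_keyframes(10, 4)` selects frames 0, 3, 6 and 9. A keyframe-selection method such as those in [14,15] would simply replace `regular_keyframes`.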

4 Experimental Results

We implemented the proposed algorithm in a Matlab prototype and tested several different application scenarios, both with video sequences from external sources such as [16,17] and with sequences produced ad hoc. We experimented with both short motions (Figures 2, 6 and 7) and long, complex events (Figures 3 and 5); our videos span a wide range of image resolutions, video quality levels and frame rates. We also applied our technique to time-lapse image sequences of very slow events (see [18]). We provide source code, original videos and full details in the supplementary material [18].


Fig. 6. (a,b,c): Speedlines computed directly from 640x480 silhouette data [17]. Note the replication of contours in (c) due to fast motion (see Section 4.1). (d,e): the quickly moving hand draws copies of its contour instead of speedlines in (e), due to a coarse frame rate.

In our prototype, background subtraction is implemented by simply thresholding the absolute difference between each video frame and a static background model, as our test videos were not very demanding from this point of view. We also smooth the recovered foreground by means of a median filter with square support, which helps reduce artifacts due to background subtraction and also allows the algorithm to create smoother speedlines. In fact, the exact shape of the foreground masks is not fundamental for drawing good-looking, descriptive speedlines, which is also why we can tolerate imprecise background subtraction in the first place. Moreover, even macroscopic errors in background subtraction, if limited to a few frames, only marginally affect our final results, since such errors only affect the few speedline pieces built from those frames.
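A minimal sketch of this kind of background subtraction, assuming a static background image of the same size as the frames: the absolute difference is thresholded, then smoothed with a binary median filter over a square window. The `threshold` and `size` defaults are hypothetical, the handling of color input is our own assumption, and `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def foreground_mask(frame, background, threshold=30, size=5):
    """Threshold |frame - background|, then apply a binary median filter (size x size)."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    diff = np.abs(frame.astype(np.int32) - background.astype(np.int32))
    if diff.ndim == 3:  # color input: use the largest per-channel difference (assumption)
        diff = diff.max(axis=2)
    raw = diff > threshold
    r = size // 2
    padded = np.pad(raw, r, constant_values=False)  # pixels outside the image count as background
    votes = sliding_window_view(padded, (size, size)).sum(axis=(2, 3))
    # For a binary image, the median of an odd-sized window is 1 iff ones are the majority.
    return votes > (size * size) // 2
```

The median step removes isolated false detections and rounds off mask corners, which is consistent with the observation above that the exact mask shape matters little for the final speedlines.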

Computational costs in our unoptimized prototype are currently rather low, which allows us to process low-resolution webcam video in real time. Still, we expect substantial improvements from an optimized implementation in a lower-level language, as the algorithm has no computationally intensive steps, unless a more sophisticated background subtraction technique comes into play.

4.1 Discussion

As previously mentioned, the parameter k defines how detailed the resulting speedlines appear: in particular, in the presence of self-intersecting trajectories, a low value of k causes more detailed speedlines to be built; in the presence of unreliable masks or very complex motions which need not be represented in fine detail, setting a higher k improves the system's robustness and the simplicity of the output.

Our system also requires that the frame rate of the input video is sufficiently high with respect to the object motion: in order for speedlines to be smooth, foreground masks in adjacent frames must mostly overlap. On the one hand, this rules out the use of our system for surveillance video sampled at very low frame rates, which is not an uncommon scenario. On the other hand, when the frame rate is too coarse, the system does not abruptly fail, but basically replicates the subject's contours instead of drawing their envelopes, which is also a traditionally employed technique for motion representation [1] (Figures 6 and 7b). In practice, a user can often make sense of the resulting image, and in some scenarios may even consider this a feature rather than a shortcoming, as quickly moving objects gain a distinctive appearance.

Fig. 7. Although less graphically pleasant, our results in drawing short speedlines (top) are comparable in informational content to those recovered by the high-level approach in [11] (a,b,c). Note the speedlines for the hands moving down in the third image, as we describe every motion in the scene. In the bottom-right example (d), speedlines are determined manually [6], and are remarkably similar to our result.
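The requirement that adjacent masks mostly overlap can be checked before running the algorithm. The sketch below measures the intersection-over-union (IoU) of consecutive masks; the IoU criterion and the `min_iou` cut-off of 0.5 are our own hypothetical choices, not values from the paper.

```python
import numpy as np


def adjacent_overlap(masks):
    """Intersection-over-union between each pair of consecutive foreground masks."""
    ious = []
    for a, b in zip(masks, masks[1:]):
        union = np.logical_or(a, b).sum()
        inter = np.logical_and(a, b).sum()
        ious.append(float(inter / union) if union else 1.0)  # no foreground: treat as consistent
    return ious


def framerate_looks_sufficient(masks, min_iou=0.5):
    """Heuristic check: every pair of adjacent masks overlaps by at least min_iou."""
    return min(adjacent_overlap(masks), default=1.0) >= min_iou
```

When the check fails, the output is expected to show replicated contours rather than envelopes, as in Figure 6e.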

We investigated a specific application of our technique, which is detailed in the supplementary material [18]: video from a surveillance camera is analyzed in order to segment simple, temporally sparse events; each event, with a typical duration between several seconds and one minute, is summarized in a single image with speedlines and foreground replication. This is useful for a number of reasons:

– The summary image can be easily and cheaply transmitted for immediate review, e.g. to a mobile device².

– An operator can quickly review many events by grasping at a glance what happened in each one; if the summary image for an event is judged unclear, unusual or suspect, the operator can click on it to see the corresponding video segment. This can potentially speed up the review of surveillance video dramatically.

– An additional advantage of our approach in a security-related scenario is the transparency of the algorithm, due to its low complexity. Moreover, speedlines are built in such a way that they encompass any area affected by motion, which is a desirable property in this context.

² In several “extreme” scenarios, such as wireless sensor networks with very strict computational and power requirements, storing and/or transmitting a single summary image for an event may often be the only viable option.
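The event segmentation step is only outlined in the text, with details left to [18]. One simple way to obtain temporally sparse events, assuming the per-frame foreground area (number of mask pixels) is available, is to group frames with enough foreground into runs, tolerating short quiet gaps and discarding very short runs. The thresholds `min_area`, `max_gap` and `min_len` are hypothetical, and this is not necessarily the procedure used in the authors' application.

```python
def segment_events(fg_areas, min_area=50, max_gap=10, min_len=5):
    """Return events as (first_frame, last_frame) pairs, 0-based and inclusive."""
    events = []
    start = last = None
    for t, area in enumerate(fg_areas):
        if area < min_area:
            continue  # quiet frame: no significant foreground
        if start is not None and t - last - 1 > max_gap:
            # Quiet period too long: close the current event if it is long enough.
            if last - start + 1 >= min_len:
                events.append((start, last))
            start = None
        if start is None:
            start = t
        last = t
    if start is not None and last - start + 1 >= min_len:
        events.append((start, last))
    return events
```

Each returned interval would then be fed to the speedline algorithm to produce one summary image per event.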

When our approach is applied to a large, simple moving object to create short speedlines, a fundamental disadvantage with respect to other approaches becomes apparent: our speedlines are only drawn at the extremes of the object's motion, which creates a poor visual effect (e.g. compare Figure 1a to Figure 2b); we are currently investigating other options for dealing with this issue.

5 Conclusions and Future Works

In this paper we presented a low-level algorithm for representing in a single image the motion occurring in a video: the last frame of the video sequence is augmented with speedlines, which convey the motion that occurred in the previous frames by tracing the envelopes of the contours of moving objects. We proposed an efficient, low-level technique for constructing such speedlines, which is robust to temporally or spatially local segmentation errors and does not require object tracking. Experimental results confirm that the approach is valid and can be applied to many different scenarios.

The main contribution of this paper over the state of the art is twofold.

– Our algorithm is very general: due to its low level of abstraction, it effortlessly deals with long and complex trajectories, multiple moving objects, articulated motion, and low-resolution, low-quality input video, which existing algorithms cannot handle.

– We explore the use of speedlines for representing long, complex motions, and introduce speedlines as an effective tool for video surveillance applications.

We are currently implementing the algorithm on a DSP-equipped smart camera, and studying a number of possible optimizations for its efficient implementation. We are also experimenting with other uses, for surveillance and entertainment applications. Finally, we are experimenting with improved visualization techniques, in order to better convey motion without the need for scene understanding.

References

1. McCloud, S.: Understanding Comics: The Invisible Art. Kitchen Sink Press (1993)

2. Strothotte, T., Preim, B., Raab, A., Schumann, J., Forsey, D.R.: How to render frames and influence people. In: Proc. of Eurographics. (1994)

3. Kawagishi, Y., Hatsuyama, K., Kondo, K.: Cartoon blur: Non-photorealistic motion blur. Computer Graphics International (2003) 276

4. Kawabe, T., Miura, K.: Representation of dynamic events triggered by motion lines and static human postures. Experimental Brain Research 175 (2006)

5. Masuch, M., Schlechtweg, S., Schulz, R.: Depicting motion in motionless pictures. In: Proc. SIGGRAPH. (1999)

6. Chen, A.: Non-photorealistic rendering of dynamic motion. http://gvu.cc.gatech.edu/animation/Areas/nonphotorealistic/results.html (1999)

7. Lake, A., Marshall, C., Harris, M., Blackstein, M.: Rendering techniques for scalable real-time 3D animation. In: Proc. of Symposium on NPR Animation and Rendering. (2000)

8. Kim, B., Essa, I.: Video-based nonphotorealistic and expressive illustration of motion. In: Proc. of Computer Graphics International. (2005)

9. Hwang, W.I., Lee, P.J., Chun, B.K., Ryu, D.S., Cho, H.G.: Cinema comics: Cartoon generation from video. In: Proc. of GRAPP. (2006)

10. Markovic, D., Gelautz, M.: Comics-like motion depiction from stereo. In: Proceedings of WSCG. (2006)

11. Collomosse, J.P., Hall, P.M.: Video motion analysis for the synthesis of dynamic cues and futurist art. Graphical Models (2006)

12. Caglioti, V., Giusti, A.: On the apparent transparency of a motion blurred object. International Journal of Computer Vision (2008)

13. Mahadevan, V., Vasconcelos, N.: Background subtraction in highly dynamic scenes. In: Proc. of CVPR. (2008)

14. Gong, Y., Liu, X.: Generating optimal video summaries. In: Proc. of ICME. (2000)

15. Zhao, Z., Elgammal, A.: Information theoretic key frame selection for action recognition. In: Proc. of BMVC. (2008)

16. PETS: Performance evaluation of tracking and surveillance (2008)

17. Ragheb, H., Velastin, S., Remagnino, P., Ellis, T.: ViHASi: Virtual human action silhouette data for the performance evaluation of silhouette-based action recognition methods. In: Proc. of Workshop on Activity Monitoring by Multi-Camera Surveillance Systems. (2008)

18. Caglioti, V., Giusti, A., Riva, A., Uberti, M.: Supplementary material: http://www.leet.it/home/giusti/speedlines (2009)

