박사학위논문
Doctoral Thesis
온라인 동작 생성을 위한 동작 모델링 기법
Motion Modeling for On-line Motion Synthesis
권태수 (權泰秀 Kwon, Taesoo)
전자전산학과 전산학전공
Department of Electrical Engineering and Computer Science
Division of Computer Science
한국과학기술원
Korea Advanced Institute of Science and Technology
2007
DCS
20025806
권태수. Kwon, Taesoo. Motion Modeling for On-line Motion Synthesis. 온라인 동작 생성을 위한 동작 모델링 기법. Department of Electrical Engineering and Computer Science, Division of Computer Science. 2007. 72p. Advisor: Prof. Shin, Sung Yong. Text in English.
Abstract
In this thesis, we propose an example-based framework for on-line motion synthesis.
Our framework consists of three parts: motion modeling, behavior modeling and motion
synthesis. In the motion modeling part, an unlabeled motion sequence is first decomposed
into motion segments, exploiting the contact forces against the ground. Those motion seg-
ments are subsequently classified into groups of motion segments such that the same group
of motion segments share an identical structure. Finally, we construct a motion transition
graph by representing these groups and their connectivity to other groups as nodes and
edges, respectively. In behavior modeling, we build a motion transition model that con-
nects motion segments according to on-line motion specifications, while adapting to the
(time-varying) environment. In motion synthesis, given a stream of motion specifications
in an on-line manner, our system generates a corresponding motion while traversing the
motion transition graph guided by the motion transition model. Based on the framework,
we first address the issues in motion dynamics and transition that arise from the cyclic
nature of locomotions. We then generalize the idea to two-character motions, in partic-
ular, motions for martial arts performed by a pair of characters. Although the focus of
the present work is on locomotive motions and martial arts motions, we believe that the framework of the proposed approach can be conveyed to other footstep-driven motions as well.
Contents
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 Motion modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.2 Behavior modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.1 Online Motion Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.2 Locomotion Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.3 Two-character Interactions . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2 Locomotion Synthesis 13
2.1 Motion Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1.2 Motion Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1.3 Locomotive Motion Classification . . . . . . . . . . . . . . . . . . . . 16
2.1.4 Motion Parametrization . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.1.5 Hierarchical Motion Transition Graph . . . . . . . . . . . . . . . . . . 22
2.2 Example Motion Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2.1 Vehicle Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2.2 Example Vehicle Trajectories . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.3 Motion Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.2.4 Speed and Acceleration bounds . . . . . . . . . . . . . . . . . . . . . . 28
2.3 Locomotion Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3 Two-Character Motion Synthesis 35
3.1 Two-Character Motion Classification . . . . . . . . . . . . . . . . . . . . . . . 35
3.1.1 MSVM-based Classification . . . . . . . . . . . . . . . . . . . . . . . . 37
3.1.2 Rule-based Classification . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.2 Interaction Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3 Background on Underlying Statistical Models . . . . . . . . . . . . . . . . . . 46
3.4 Motion Coupling and Postprocessing . . . . . . . . . . . . . . . . . . . . . . . 49
4 Results 52
4.1 Motion Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.2 Motion Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5 Discussion 62
6 Conclusions 64
Summary (in Korean) 66
References 67
List of Tables
2.1 Masses for links of our articulated figure . . . . . . . . . . . . . . . . . . . . 16
2.2 The strings for actions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.1 Action repertoire table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.1 Example motion dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.2 Results on motion segmentation in comparison to manual segmentation . . 53
4.3 Results on motion labeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.4 Classification results for locomotions . . . . . . . . . . . . . . . . . . . . . . . 55
4.5 Classification results for two-fighter motions . . . . . . . . . . . . . . . . . . 55
List of Figures
1.1 Overall structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1 The magnitude of the contact force and corresponding poses . . . . . . . . . 14
2.2 Contact force and motion segmentation . . . . . . . . . . . . . . . . . . . . . 15
2.3 The representative pose of each phase . . . . . . . . . . . . . . . . . . . . . . 17
2.4 Examples of string encodings . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.5 Motion transition graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.6 Pelvis trajectories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.7 Vehicle trajectory estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.8 Speed profile. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.9 Acceleration bounds - RUN. . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.10 Acceleration bounds - WALK. . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.11 Block diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.1 Jump kick motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.2 Action annotations: All empty boxes denote the “others” class. . . . . . . 37
3.3 Basic data structure B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.4 A node of G1 (or G2) has a set of links pointing to the actions in B . . . 40
3.5 Bayesian network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.6 Reference coordinate frame . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.7 The action labels of the candidates . . . . . . . . . . . . . . . . . . . . . . . 50
4.1 Results on motion segmentation. . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.2 Trajectory refinement. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.3 On-line motion synthesis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.4 Leaning due to accelerations. . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.5 Human trajectory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.6 Synthesized two-fighter motion . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.7 Two control modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.8 Synthesized interactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.9 Comparisons with example motions. Upper parts of each sub-figure are from
synthesized motions while lower parts are from example motions. . . . . . 61
1. Introduction
1.1 Motivation
Synthesizing appealing motions of human-like characters in an on-line, real-time manner is
an important issue in the context of computer games and virtual environments. The tra-
ditional pipeline for on-line, real-time motion synthesis goes through the following steps:
First, the animators manually prepare sets of short motion segments by segmenting captured or key-framed motion sequences. The animators then define rules that determine transitions between pairs of motion segments according to given situations and scenarios. At run-time, a novel motion is synthesized in an on-line, real-time manner by stitching together short motion segments chosen according to these rules. Although this scheme succeeds in producing responsive motions of virtual characters, it has inherent limitations: the traditional animation pipeline requires a large amount of work by skilled animators and programmers, and the quality and variety of motions are often compromised in order to guarantee on-line, real-time performance with limited human resources.
Recently, there have been some research efforts to overcome these limitations by au-
tomating the pipeline employing a set of unlabeled example motion data [3,4,21,33,36,39,
49, 57]. In these approaches, motion transition graphs encapsulate motion segments and
transitions between motion segments. Since the motion transition graphs are automati-
cally constructed from example motion data, synthesized motions retain naturalness and
variety embedded in captured motions. However, the motion transition graphs tend to be
too large to achieve a desired performance for motion search. Therefore, it is very diffi-
cult, if not impossible, to adopt these approaches for online, real-time motion synthesis.
We suggest an example-based framework for online real-time motion synthesis. Our
framework consists of two main components: motion modeling and behavior modeling.
In motion modeling, we construct a novel motion transition graph from example motion
data, where a node represents a group of motion segments of a similar structure, and an
edge represents a transition between a pair of groups, respectively. In behavior modeling,
we build a motion transition model that connects motion segments according to on-line
motion specifications, while adapting to the (time-varying) environment.
Based on the framework, we first address the issues in motion dynamics and transition
that arise from the cyclic nature of locomotions: Given an unlabeled motion sequence of
a human-like articulated figure, the sequence is decomposed into motion segments based
on the contact forces between the figure and the ground. Those motion segments are then
classified into groups of motion segments such that the motions in the same group share
an identical motion type, exploiting the biomechanical observations on footstep patterns.
We also propose a hierarchical motion transition graph to incorporate a motion hierarchy
and transition motions. We then generalize the scheme to two-character motions, in par-
ticular, two-fighter motions. Convincing interactive motions for a pair of characters can-
not be realized by simply juxtaposing their individual motions; the motion of a charac-
ter should be chosen with respect to that of the other, and vice versa. We present an
example-based method for capturing the interactions between two fighters embedded in
the example motion stream.
1.2 Objectives
In this section, we provide the objectives of this thesis. As the proposed framework con-
sists of two components, motion modeling and behavior modeling, we state the objectives
separately for each of the components.
1.2.1 Motion modeling
The objectives for motion modeling are two-fold: controllability and accessibility. By con-
trollability, we mean that our modeling scheme allows the user to specify a desired motion
in an on-line manner. By accessibility, we mean that a desired example motion segment
can be accessed efficiently. To satisfy these requirements, a stream of example motions
is modeled as a motion transition graph, in which nodes and edges represent groups of
motion segments and their transitions, respectively.
Rose et al. proposed a motion transition graph called “verb graph”, where a node
represents a group of motion segments of an identical structure, and an edge represents the
transition from one group to another [51]. As demonstrated in locomotive motion generation
[46,47] and rhythmic motion synthesis [30], the motion transition graphs based on motion
blending have enhanced efficiency and controllability for motion synthesis, while retaining
naturalness embedded in captured motions. To facilitate this approach, the major premise
is the availability of labeled motion data satisfying the following two properties.
• The group of motions at each node should have an identical structure.
• The group of motions at a node should transit seamlessly to that of motions at a
node connected by an edge (possibly a self-edge).
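The two properties above suggest a simple data structure. The following is a minimal sketch, not the authors' implementation; the class name, the string labels, and the representation of segments are all illustrative assumptions.

```python
from collections import defaultdict

class MotionTransitionGraph:
    """Minimal sketch of a verb-graph-like motion transition graph.

    A node holds a group of structurally identical motion segments,
    identified here by a string label; an edge (including a self-edge)
    records which groups may seamlessly follow which.
    """

    def __init__(self):
        self.groups = defaultdict(list)   # label -> list of motion segments
        self.edges = defaultdict(set)     # label -> set of successor labels

    def add_segment(self, label, segment):
        # Segments under the same label are assumed blendable.
        self.groups[label].append(segment)

    def add_transition(self, src, dst):
        # dst == src encodes a self-edge, e.g. a repeating walk cycle.
        self.edges[src].add(dst)

    def successors(self, label):
        return sorted(self.edges[label])
```

At synthesis time, traversal would pick one segment from the current node's group, blend within the group, and follow an outgoing edge to the next group.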
In [51, 46, 47], it was assumed that labeled motion clips are available. Kim et al. [30]
proposed an automatic labeling scheme for rhythmic motions to construct their motion
transition graphs, exploiting motion beats and rhythmic patterns embedded in the mo-
tions.
In this thesis, inspired by the work in [30], we propose a novel on-line motion synthesis
approach for non-rhythmic motions, in particular, locomotive motions such as running and
walking motions, and martial arts motions such as Kickboxing, Karate and Taekwondo.
Unlike rhythmic motions, reference temporal patterns such as motion beats and rhythmic
patterns are not available for these motions in general. Exploiting biomechanical results
on human contact force profiles, we first cut an unlabeled motion sequence into motion
segments to identify motion units called actions. We then classify those motion segments
into groups of motion segments such that the same group of motion segments share an
identical structure while extracting their parameters simultaneously. Finally, we construct
a motion model, based on the motion transition graph.
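As a rough illustration of the first step, a motion sequence can be cut wherever the ground contact force crosses a level. This sketch assumes a scalar per-frame force magnitude and a single fixed threshold, both simplifications; the thesis derives boundaries from biomechanical contact force profiles (Section 2.1.2).

```python
def segment_by_contact_force(force_mag, threshold):
    """Cut a motion sequence at frames where the contact force
    magnitude crosses a threshold.

    force_mag: per-frame list of ground contact force magnitudes
               (an illustrative stand-in for the thesis's profiles).
    Returns a list of (start, end) frame ranges, end exclusive.
    """
    boundaries = [0]
    for i in range(1, len(force_mag)):
        below_prev = force_mag[i - 1] < threshold
        below_now = force_mag[i] < threshold
        if below_prev != below_now:      # threshold crossing
            boundaries.append(i)
    boundaries.append(len(force_mag))
    return list(zip(boundaries[:-1], boundaries[1:]))
```

Alternating low-force (flight or double-support) and high-force (single-support) ranges then become candidate motion segments for classification.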
1.2.2 Behavior modeling
Even with the motion models embodied in these graphs, it is a non-trivial issue to generate convincing motions appropriate to given situations and scenarios. We present example-based
methods for modeling locomotive behavior and interacting behavior of human-like charac-
ters.
Behavior modeling for locomotive motions: Although much research has been done on locomotion synthesis, the steering behavior of human-like characters has not been well addressed. Even during straight walking and running, the pelvis of the human body oscillates due to the rotations and translations induced by the supporting feet. Such
oscillations or curvature variations are the unique characteristics of human steering be-
havior. Simply placing the pelvis along a user-specified trajectory would not produce a
natural motion. This immediately raises an issue: how to incorporate these characteris-
tics into a user-specified trajectory.
For on-line applications, the user commonly prescribes a motion by interactively pro-
viding a motion type and its trajectory. In particular, the trajectory is specified either
explicitly by a point stream that is sampled with an input device such as a mouse, or im-
plicitly by a force profile that is given with a user interface equipped with slide bars (or
a joystick). The former directly produces the trajectory of a human-like figure. Although
it is easy to specify, the trajectory itself is neither precise nor smooth. Moreover, it is far
from a natural human trajectory. On the other hand, integrating an input force profile
that is sampled at each frame, the latter yields a smooth trajectory in an equally-easy
manner. However, the resulting trajectory is not natural, either. In either case, little ef-
fort has been made to produce a natural human trajectory.
No matter which method we employ, it would be difficult to generate a high-quality locomotion from such a poor trajectory. In this thesis, we present a data-driven method for refining an input trajectory for on-line, real-time locomotion synthesis, given the type of a
locomotive motion. Choosing the center of the pelvis as the root of an articulated charac-
ter, we describe how to yield a natural pelvis trajectory from the input trajectory. With-
out loss of generality, we assume that the input trajectory is given in an explicit form, that
is, in the form of a point stream sampled at each frame. The refined trajectory gives the
global pelvis position and orientation at each frame. The refinement is performed frame
by frame in an online manner. Our method performs a two-step refinement: first clamping
speed and acceleration and then adding naturalness. For trajectory refinement, we make
use of example motion data.
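The first of the two refinement steps can be sketched in one dimension as follows; the function name, the scalar setting, and the fixed bounds are illustrative assumptions (the thesis estimates speed and acceleration bounds from example motions, Section 2.2.4).

```python
def clamp_step(prev_pos, prev_vel, target_pos, dt, v_max, a_max):
    """One frame of the first refinement step: clamp the speed and
    acceleration implied by a raw input point before using it.

    Works on a 1-D scalar position for illustration; the thesis
    refines 2-D pelvis trajectories in the same frame-by-frame,
    on-line fashion.  Returns the refined (position, velocity).
    """
    desired_vel = (target_pos - prev_pos) / dt
    # Clamp the acceleration needed to reach the desired velocity.
    accel = (desired_vel - prev_vel) / dt
    accel = max(-a_max, min(a_max, accel))
    vel = prev_vel + accel * dt
    # Clamp the resulting speed.
    vel = max(-v_max, min(v_max, vel))
    return prev_pos + vel * dt, vel
```

The second step, adding naturalness (oscillations and curvature variations learned from example data), would then be layered on top of this clamped trajectory.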
Behavior modeling for two-fighter motions: Aside from their popularity, standing-up martial arts such as Kickboxing, Karate, and Taekwondo have attracted our attention because of their repertoires of diverse and dynamic motions. A pair of players in a match are
allowed to use not only both hands and feet to exchange punches and kicks but also arms
and legs to block the opponent’s attacks. They move in accordance with each other within
a rectangular area, seeking the opportunities to attack while avoiding the opponent’s at-
tacks. Furthermore, the players continuously interact with each other directly through a
combination of attacks and counter-attacks or indirectly by observations on the opponent’s
footwork and bodywork.
In the computer animation community, Karate motions were used to create believable
demonstrations for motion analysis and synthesis [33,24], in which the focus lies on a single
player’s motions. However, convincing interactive motions for a pair of characters cannot
be realized by simply juxtaposing their individual motions. The motion of a character
should be chosen with respect to that of the other, and vice versa. In other words, each
action by a character should be accompanied by a proper reaction by the counterpart
such that they alternate in combination. Moreover, the motions need to be adapted to
the environment including the characters themselves, so that the actions and reactions
are exchanged timely at the right places.
A naive, straightforward solution would be to regard a pair of characters as a sin-
gle entity with at least twice as many degrees of freedom so that the results on tradi-
tional motion analysis and synthesis for single characters could be applied. Such a solu-
tion yielded impressive results in rhythmic motion synthesis, in particular, dance motion
generation [30]. Unlike dance motions, however, martial arts are performed by a pair of
players in a rather asynchronous manner with a variety of motions for each player. For
such asynchronous interactions, it is non-trivial even for a human expert to determine
the next action/reaction of a character: the type of motion, the local joint configurations,
and the relative root position and orientation with respect to the opponent. In this the-
sis, we present an example-based method for analysis and synthesis of two-character mo-
tions while properly capturing and reflecting their interactions embedded in the example
motion stream, of which each frame contains a snapshot of a two-character motion. Our
objectives are two-fold: coping with the inherent diversity of two-character motions and
enhancing the efficiency of motion synthesis for on-line, real-time applications.
1.3 Related Work
Our work is inspired by three areas in character animation: online motion synthesis, loco-
motion control and two-character interactions. We review related results in each of those
areas.
1.3.1 Online Motion Synthesis
Example-based Motion Synthesis: Due to the recent popularity of motion capture
and reuse, there have been rich research results in this area. These results can be clas-
sified into two groups: motion rearrangement [5, 32, 37, 38, 39, 57] and motion blending
[1,26,47,51,52]. The former category of methods synthesizes motions by rearranging mo-
tion segments (also poses) in an example motion stream while retaining details of the ex-
ample motions. On the other hand, the latter category of methods synthesizes a desired
motion in an on-line, real-time manner by blending labeled motion segments. Combining
advantages of both categories of methods, hybrid methods have been studied to generate
high quality motions in an on-line, real-time manner [30,34,43,47,51].
Rose et al. [51] proposed a framework of motion blending based on scattered data interpolation with radial basis functions. They introduced a verb graph for motion transition. Later, Sloan et al. [55] adopted cardinal basis functions for further acceleration. Despite its superb efficiency, this approach was intended for real-time, but not for on-line, motion synthesis.
Park et al. [46, 47] have enhanced the framework of Rose et al. [51] for on-line loco-
motion blending. Their most important contribution is arguably to model labeled motion
clips available in a motion library, by incorporating motion rearrangement [3, 4, 21,33,36,
39, 49, 57] into the framework of motion blending, based on a motion transition graph.
However, the authors did not address how to obtain the labeled motion clips to construct
the motion transition graph.
Recently, Kim et al. [30] also adapted the motion transition graph [46] for on-line
rhythmic motion synthesis. Given an unlabeled example motion sequence, their approach
decomposes the sequence into a set of motion segments and clusters them to obtain a col-
lection of labeled sets of motion segments by exploiting their rhythmic structures. The au-
thors modeled the rhythmic motion sequence as a motion transition graph, where a node
and an edge represent a set of labeled segments and the transition from one labeled segment set to another (possibly itself), respectively. Unfortunately, this method is not applicable
to non-rhythmic motions.
Motion Segmentation and Classification: Motion segmentation has recently emerged
as an important issue. Although research on this issue is still in an early stage, many re-
sults have been presented [5, 7, 8, 10, 19, 20, 29, 30, 33, 34, 44]. Our review focuses on the
results for motion decomposition directly related to our work.
Our starting point is the work by Bindiganavale and Badler [10], in which the mo-
ment of an interaction of an articulated character with an object in the environment is
captured by checking the zero-crossing moments of joint accelerations. Fod et al. [19] and
Kim et al. [30] used this idea to segment a motion sequence, and classified the resulting
motion segments based on principal component analysis (PCA) and K-means clustering.
On top of zero-crossing moments, Kim et al. exploited rhythmic patterns called “motion beats” for motion segmentation and classification, which are hard to generalize to other types of motions.
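The zero-crossing criterion of Bindiganavale and Badler can be sketched as follows for a single scalar acceleration signal; this is an illustrative simplification, not their implementation, which operates on the full set of joint accelerations of an articulated figure.

```python
def zero_crossings(accel):
    """Return frame indices where an acceleration signal changes sign.

    Sign changes in joint acceleration mark candidate segment
    boundaries, e.g. the moment an end-effector reverses direction.
    accel: plain list of per-frame acceleration values (assumed
    already computed, e.g. by finite differences of joint angles).
    """
    cuts = []
    for i in range(1, len(accel)):
        if accel[i - 1] * accel[i] < 0:  # strict sign change
            cuts.append(i)
    return cuts
```

In practice, nearby crossings would be merged and filtered before being used as segment boundaries.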
Arikan et al. [5] employed a support vector machine classifier to interactively anno-
tate motion data. Although this semi-automatic classifier works well for motion annota-
tion, the per-pose annotation scheme may not be precise enough to capture the transi-
tion moments between different types of actions. We get around this problem by annotat-
ing motion streams at the granularity of actions rather than individual poses. Jenkins and
Mataric [29] and Barbič et al. [8] proposed automatic motion classification methods. However, their goals do not include automatic motion blending and transition. Exploiting the temporal correspondence between motions, Kovar and Gleicher [31] proposed an automatic
scheme to extract motion segments that are similar to a query motion. This scheme as-
sumes the availability of query motions. As pointed out by the authors, the scheme some-
times requires manual filtering to obtain blendable motions and does not take into account
motion transition. Müller et al. [44] proposed a motion search method in which the characteristics of a motion are specified by a collection of Boolean functions. For timewarping,
they captured the moments where the values of these functions change. A similar idea was
also used to classify locomotions in Kwon and Shin [34], where a rule-based classifier was
provided based on a symbolic representation scheme of locomotion phases. We propose
a hybrid scheme for motion classification that utilizes both support vector machines and
rule-based classifiers.
1.3.2 Locomotion Control
Motion control has been a recurring theme in character animation, crowd simulation, and
robotics. Our review focuses on the work directly related to on-line locomotion control.
Force-based control : Reynolds [50] adopted a vehicle model to simulate typical steer-
ing behaviors for simple autonomous creatures in an on-line manner. These creatures were
abstracted as simple vehicles, that is, oriented particles with their own dynamics. The
steering behaviors were achieved by integrating applied forces. The vehicle model was fur-
ther extended by incorporating real-time path planning [25]. We also use a vehicle model
to obtain a smooth, feasible trajectory, exploiting example motion data.
Based on the social force model of Helbing and Molnár [27], Metoyer and Hodgins [42]
simulated reactive pedestrians. Interpreting a pedestrian as a vehicle, the 2D trajectory
of the pedestrian was generated by integrating the social force field. Treuille et al. [58] presented a more sophisticated potential field for a similar purpose. Together with the
potential field, motion capture data were used for motion synthesis. Based on force inte-
gration, the force-based approaches were able to yield smooth trajectories. However, these
approaches did not allow for task-level on-line control. Moreover, it is hard to reflect nat-
ural human steering behavior with these approaches. We adopt the idea of force integra-
tion for trajectory refinement.
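A minimal sketch of such force integration for a Reynolds-style vehicle (an oriented particle with its own dynamics) is shown below; the Euler integration scheme, the 2-D tuple representation, and the parameter names are illustrative assumptions.

```python
def integrate_vehicle(pos, vel, steering_force, dt, max_speed):
    """Euler-integrate one step of a simple vehicle model.

    The applied steering force updates the velocity, which is
    truncated to a maximum speed (preserving heading) and then
    integrated into position -- the mechanism behind smooth,
    force-driven steering trajectories.
    """
    vx = vel[0] + steering_force[0] * dt
    vy = vel[1] + steering_force[1] * dt
    speed = (vx * vx + vy * vy) ** 0.5
    if speed > max_speed:                 # truncate, keep direction
        vx, vy = vx / speed * max_speed, vy / speed * max_speed
    return (pos[0] + vx * dt, pos[1] + vy * dt), (vx, vy)
```

Because the position is obtained by integrating a bounded velocity, the resulting trajectory is smooth by construction, which is what makes the vehicle model attractive for trajectory refinement.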
Position-based control : In this category of approaches, a point stream is sampled di-
rectly for on-line locomotion control, based on motion blending. Early work by Guo and
Roberge [26], Wiley and Hahn [60], and Rose et al. [51] laid the groundwork for on-line motion control. Park et al. [46, 47] extended this early work to on-line locomotion synthesis.
Guided by a motion trajectory, they were able to synthesize a locomotion stream in an
on-line, real-time manner. With a sequence of supporting foot positions sampled along
a user-specified trajectory, their work demonstrated an on-line path-following capability.
Mukai et al. [43] further extended this work based on geostatistics. Recently, Kwon and
Shin [34] presented a framework of on-line, real-time locomotion synthesis. This frame-
work automated the whole process of locomotion modeling and synthesis, including mo-
tion segmentation and classification. The pelvis trajectory was specified in an on-line man-
ner by integrating a stream of 2D displacements on the ground. In general, a position-
based approach was not able to yield a smooth trajectory. In addition, the trajectory was
also lacking in naturalness. We refine such a trajectory by imitating the example motion
data.
1.3.3 Two-character Interactions
Interactions between characters are extremely important for synthesis of believable mo-
tions involving two or more characters. Zordan and Hodgins [62] incorporated physical
simulation into data-driven animation to synthesize reactive upper-body motions of a character to various impacts by the opponent in boxing and table tennis. Zordan et al. [63]
extended this approach to full-body motions. Arikan et al. [6] suggested a method to dis-
criminate realistic deformation from its unrealistic versions for example reactive motions
to pushing. The methods in this category did not deal with mutual interactions although
they were able to synthesize various reactive motions to external forces.
Kim et al. [30] demonstrated an impressive animated scene of ballroom dancing by
multiple couples, each couple being considered a single entity with more than twice as many degrees of freedom as a single character. Lai et al. [35] proposed a group motion graph
that represents the positional configurations among multiple characters. However, the au-
thors did not deal with interactions among the characters.
Guided by examples of coupled dancing motions, Hsu et al. [28] generated the motion
of the synthetic partner for a dancer using motion capture data. The motion of the dancer
was used as a control signal to search for an example motion segment from databases.
However, this approach is hard to apply to martial arts motions, where the two characters interact with each other in a rather asynchronous manner both in time and space. Liu et al. [40] presented a physics-based method for creating multi-character motions from short single-character sequences. The authors formulated multi-character motion synthesis as a spacetime optimization problem in which constraints represent the desired
character interactions. However, this paper did not address how to choose input motions
that can be paired. Moreover, the formulation did not allow a real-time task-level control
in an on-line manner.
A coupled hidden Markov model (CHMM) [11, 12] is commonly used to model the
cross dependencies between two or more synchronous signals, for example, audio/visual
signals [14]. However, martial arts are performed by a pair of players in a rather asyn-
chronous manner with a variety of motions for each player. Since such asynchronous be-
havior is hard to model with the CHMM, we propose an asynchronously-coupled dynamic
Bayesian network to model interactions between the two players.
1.4 Contributions
Synthesizing appealing motions of human-like characters in an on-line, real-time manner
is an important issue in the context of computer games and virtual environments. In this
thesis, we present example-based methods for generating locomotions and interactive motions of human-like characters in an on-line, real-time manner. Our technical contributions are three-fold: motion labeling, motion prescription, and interaction modeling.
First, we propose a novel motion labeling scheme that spans both motion segmentation
and classification. We construct a hierarchical motion transition graph reflecting the cyclic
nature of locomotion. To our knowledge, the proposed approach provides the first auto-
matic labeling scheme for locomotive motions allowing both motion blending and transi-
tion. This scheme is further extended for two-character motions to construct a coupled
motion transition graph reflecting interaction behavior between two characters. Combin-
ing the advantages of the support vector machine classifiers and rule-based classifiers, our
motion labeling scheme is general enough to be conveyed to other footstep-driven motions
as well.
Next, we present a novel data-driven scheme for prescribing a desired locomotion in
an intuitive manner. For on-line applications, the user commonly prescribes a motion by
providing the trajectory for a character to follow together with a motion type. It can be
easily specified with an input device such as a mouse. However, the user-specified trajectory is far from a natural human trajectory, since it tends to be jerky and lacks human characteristics such as the oscillations and curvature variations caused by pelvis movements.
We propose a novel data-driven scheme for transforming a user-prescribed trajectory to a
human trajectory in an on-line manner. Employing a vehicle model, our scheme produces
a smooth, feasible trajectory. By imitating the example motion data on top of the vehicle
trajectory, a natural human trajectory can be generated efficiently.
We finally propose an example-based method for capturing the interactions between
Figure 1.1: Overall structure. Example motions and motion specifications enter the analysis stage (motion modeling and behavior modeling), whose output guides the motion synthesis stage to produce output motions.
two fighters embedded in the example motion stream. Martial arts are performed by a
pair of players in a rather asynchronous manner. To model such asynchronous interac-
tions between the two players, we propose an asynchronously-coupled dynamic Bayesian
network. Based on the dynamic Bayesian network, our method can reproduce captured
interactions such that each action by a character is accompanied by a proper reaction by
the counterpart, and vice versa.
1.5 Overview
As illustrated in Figure 1.1, our framework for on-line motion generation consists of two
major components: motion analysis and motion synthesis. The motion analysis step is composed of two tasks, motion modeling and behavior modeling. In motion modeling, unla-
beled example motion sequences of a human-like articulated figure are first decomposed
into motion segments based on the contact forces against the ground. Those motion seg-
ments are then classified into groups of actions such that the motions in the same group
share an identical structure, exploiting the biomechanical observations on footstep pat-
terns. Finally, a motion transition graph is constructed from example motion data, where
a node represents a group of motion segments of a similar structure, and an edge represents a transition between a pair of groups. In behavior modeling, we build a
motion transition model that connects motion segments according to on-line motion spec-
ifications, while adapting to the (time-varying) environment. In motion synthesis, given a
stream of motion specifications in an on-line manner, our system generates a correspond-
ing motion while traversing the motion transition graph guided by the motion transition
model. In what follows, we first explain our framework in the context of locomotive motions, and then generalize it for two-character motions.
Locomotive motions: In motion modeling, we construct a hierarchical motion tran-
sition graph from example locomotion sequences to incorporate a motion hierarchy and
transition nodes. The motion hierarchy represents the cyclic nature of locomotion: cyclic
motions at the coarse level such as running and walking are represented by combining
primitive actions at the fine level, each of which encodes a footstep pattern. The transi-
tion nodes are for seamless transitions among locomotive motions. In behavior modeling,
characteristics of human steering behavior are extracted from example motion data. The
extracted information includes bounds on speed and acceleration along pelvis trajectories
and details of trajectories such as positional and orientational pelvis oscillations.
In runtime synthesis, an input point stream goes through three steps for trajectory
refinement: force extraction, clamping outliers, and adding details. The first step extracts
a force profile from an input point stream. In the second step, the force profile is inte-
grated to produce a smooth vehicle trajectory, while clamping speed and acceleration at
each frame. The last step adds details to the vehicle trajectory to produce a natural hu-
man locomotion. The three steps are performed frame by frame in an on-line manner,
while referring to the information extracted during analysis.
Two-character motions: We further extend our on-line locomotion generation framework to other footstep-driven motions, in particular, motions for martial arts performed
by a pair of players. In motion analysis, a stream of example motions is transformed into
a coupled motion transition graph in order to build an interaction model (or a motion
transition model); given a pair of single-player motion transition graphs, two nodes in dif-
ferent graphs are connected by a cross edge if the action group for one node is followed by
that for the other with a significantly-large probability. We then model the interactions
between players with a dynamic Bayesian network [16,45].
In motion synthesis, the coupled motion transition graph is traversed in an on-line
manner, possibly in accordance with a stream of motion specifications, if any, for an
avatar. While generating a motion for a character, the motion transition graph is accessed
to search for a proper reaction (or counteraction) for the other character in a probabilis-
tic manner, guided by the interaction (or motion transition) model that is built on the
Bayesian network. The next motion for the other character is coupled with the motion of
the current character in both space and time for realistic motion synthesis.
The remainder of the thesis is organized as follows: In Chapter 2, we describe how to
analyze and synthesize locomotive motions. In Chapter 3, we present how to capture the
interactions between two players from an example two-player motion stream. We show
results in Chapter 4 and discuss the weaknesses and limitations of our approach in Chap-
ter 5. Chapter 6 concludes the thesis with some future research topics.
2. Locomotion Synthesis
2.1 Motion Modeling
In this section, we propose an automatic method to construct a motion transition graph,
given unlabeled locomotion data. The locomotion data is first decomposed into motion
segments, and then these segments are classified into groups of actions of identical struc-
tures. The collection of action groups and their connectivity are mapped onto the node
and edge sets of a motion transition graph, respectively.
2.1.1 Preliminaries
We represent a captured (unlabeled) human motion M as a sequence of postures sam-
pled at discrete times called frames. The posture at each frame is described by a tuple (p, q1, q2, · · · , qJ), where p ∈ R³ and q1 ∈ S³ specify the position and orientation of the root, which is the pelvis in our case, qj ∈ S³ gives the orientation of joint j, and J is the number of joints in M.
Motion Half Cycles : Our criteria for motion segmentation are three-fold:
1. Every motion segment should be simple enough to have an intuitive parametrization.
2. Every motion segment should be long enough to contain meaningful motion seman-
tics.
3. An important motion feature should not be split into consecutive motion segments.
Locomotive motions such as walking and running exhibit an inherent cyclic nature.
Because of this nature, locomotion cycles would be apparent candidates for segmentation
units that satisfy these criteria. However, each cycle of such a motion is composed of two
half cycles initiated by left and right footsteps, respectively. Moreover, the characteristics
of the half cycles differ enough to violate criterion 1 if a full cycle is parameterized as a single unit. Hence, we choose motion half cycles as basic segmentation units for cyclic
motions such as walking and running.
Figure 2.1: The magnitude of the contact force and corresponding poses
Biomechanical Observations : For motion segmentation, we rely on biomechanical
literature [48, 61], guided by criterion 3. As illustrated in Figure 2.1, it is well-known in
biomechanics that motions such as walking and running have quite different contact force
patterns.
A running motion is composed of two stages: constrained and unconstrained stages. In
the constrained stage, one of the feet contacts the ground, which causes physical interac-
tion involving contact and friction forces. Thus, this stage has important motion features.
In the unconstrained stage, neither foot contacts the ground, thus the magnitude of the
contact force has a single local minimum in this stage. The motion segment, which is de-
lineated by the frames with a pair of consecutive peaks of unconstrained stages, forms a
half cycle of a run motion. This half cycle completely contains a constrained stage with
important motion features, thus satisfying criterion 3.
Unlike a running motion, a walking motion consists of a single constrained stage, and
thus yields a quite different contact force profile. By a similar analysis to a running mo-
tion, however, we can see that a pair of consecutive local minima of constrained stages (each at the mid-stance of a single-foot support phase) delineate a half cycle containing the important motion features (at a double-foot support phase).
A contact force minimum occurs near the middle of a single limb support phase of a
constrained stage or a flight phase (with no limbs supported) of an unconstrained stage.
Thus, we can identify the type of locomotion at the minimum, which will be exploited in
motion classification. In contrast, critical interactions with the ground occur either
in the double limb support phase while walking or in the single limb support phase while
running, thus resulting in important motion features.
A transition motion between two different motions shares the biomechanical charac-
teristics with both of the motions. For example, the first half of a walk-to-run transition
motion resembles a walking motion and the last half resembles a running motion, as il-
lustrated in Figure 2.1. The COM further sinks down to a valley at the constrained stage
after the walking motion to store the energy for the unconstrained stage of the running
motion.
2.1.2 Motion Segmentation
The center of mass (COM) of a human performer encodes important information for seg-
menting an unlabeled motion sequence M. Let cM(t) be the COM trajectory that gives
the COM at time t, that is,
cM(t) = ( ∑i mi ri(t) ) / ( ∑i mi ),    (2.1)
where mi and ri(t) are the mass and COM position, respectively, for link i of the artic-
ulated figure at time t. The mass of each link given in Table 2.1 is obtained based on
biomechanical data in [61,56].
Figure 2.2: Contact force and motion segmentation
In order to handle diverse, dynamic motions such as punches, kicks, and jumps as well
as moves, we exploit the contact force of the player against the ground. The acceleration
of the COM movement at frame t, denoted by aM(t), is given by the second derivative of cM(t), that is,

aM(t) = d²cM(t) / dt².    (2.2)
Therefore, the contact force F(t) can be obtained as follows:

F(t) = m · aM(t) − m · g,    (2.3)
Table 2.1: Masses for links of our articulated figure

link name        mass      link name         mass
pelvis           1.0       chest             1.5
neck             0.5       head              1.5
left shoulder    0.2       right shoulder    0.2
left upper arm   0.4       right upper arm   0.4
left lower arm   0.4       right lower arm   0.4
left hand        0.2       right hand        0.2
left upper leg   1.0       right upper leg   1.0
left lower leg   1.0       right lower leg   1.0
left foot        0.3       right foot        0.3
where m = ∑i mi is the total mass of the player, and g denotes the acceleration due to gravity.
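As a concrete sketch of Eqs. (2.1) through (2.3), the contact force profile can be estimated from per-frame link COM positions with finite differences. This is an illustrative sketch, not the thesis implementation; the input layout (`link_coms`, `masses`) and a uniform frame spacing `dt` are our assumptions.

```python
# Illustrative sketch: contact force F(t) = m*a(t) - m*g from link COM data.
# `link_coms[t][i]` is the assumed (x, y, z) COM of link i at frame t;
# `masses[i]` follows Table 2.1; frames are assumed uniformly spaced.

def com_trajectory(link_coms, masses):
    """Eq. (2.1): mass-weighted average of link COM positions per frame."""
    total = sum(masses)
    traj = []
    for frame in link_coms:
        c = [sum(m * p[k] for m, p in zip(masses, frame)) / total
             for k in range(3)]
        traj.append(c)
    return traj

def contact_force(traj, total_mass, dt=1.0, g=(0.0, -9.81, 0.0)):
    """Eqs. (2.2)-(2.3): acceleration by central finite differences of the
    COM trajectory, then F(t) = m*a(t) - m*g per interior frame."""
    forces = []
    for t in range(1, len(traj) - 1):
        a = [(traj[t + 1][k] - 2.0 * traj[t][k] + traj[t - 1][k]) / (dt * dt)
             for k in range(3)]
        forces.append([total_mass * (a[k] - g[k]) for k in range(3)])
    return forces
```

For a stationary pose the COM acceleration vanishes and the estimated contact force reduces to the weight −m·g, as expected.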
Without loss of generality, suppose that the motion sequence M is initiated and ended
by a pair of local minima of contact forces. Otherwise, we can cut off the initial and final
segments, each of which is a short segment. Every motion segment, which is delineated by
a pair of consecutive local minima, is not necessarily a half cycle since we also need to consider non-cyclic motions such as standing motions and transition motions.
Let T = {t0, t1, · · · , tr} be the sequence of minima embedded in F(t). As shown in
Figure 2.2, example motion stream M is segmented at every frame ti, 1 ≤ i ≤ r, where
contact force F(ti) exhibits a local minimum. We first identify all standing motion seg-
ments by detecting every frame with a stand pose. Excluding the standing motion seg-
ments, we cut the remaining portion of M into motion segments with the sequence of min-
ima. Together with the standing motion segments, they give the set of motion segments,
S = {s0, s1, · · · , sk} such that M = s0||s1|| · · · ||sk, where || is a concatenation operator.
In principle, si, 0 ≤ i ≤ k should be a half cycle if it is neither a standing motion segment
nor a transition motion. However, this may not be true in practice due to the approxi-
mation error of the contact forces and the noise from motion capture. We will return to
this issue while classifying the motion segments in S.
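The segmentation step above can be sketched as follows. This is a simplified sketch under our own naming: `force_mag` holds the per-frame contact force magnitude |F(t)|, and the stream is assumed (as in the text) to begin and end at force minima, which we approximate by treating the first and last frames as cut points.

```python
# Illustrative sketch: cut the frame range of M at every interior local
# minimum of the contact-force magnitude |F(t)|.

def local_minima(force_mag):
    """Frames t where |F(t)| is strictly below both neighbours."""
    return [t for t in range(1, len(force_mag) - 1)
            if force_mag[t] < force_mag[t - 1]
            and force_mag[t] < force_mag[t + 1]]

def segment(force_mag):
    """Motion segments as (start, end) frame ranges delimited by consecutive
    local minima, so that M = s0 || s1 || ... || sk."""
    cuts = [0] + local_minima(force_mag) + [len(force_mag) - 1]
    return [(cuts[i], cuts[i + 1]) for i in range(len(cuts) - 1)]
```

Standing portions would be excised beforehand, as described above, before the remaining frames are cut at the minima.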
2.1.3 Locomotive Motion Classification
In this section, we describe how to classify the set of motion segments, S = {s0, s1, · · · , sk}, into a collection of groups of actions of identical structures so that the actions in the
Figure 2.3: The representative pose of each phase: (a) S, (b) R, (c) L, (d) D, (e) F
same group are blendable without artifacts. To achieve this goal, we exploit biomechanical
observations on footstep patterns.
Footstep Patterns : Footsteps characterize locomotive motion stages. In particular, a
foot contacts the ground in a constrained stage, and neither foot contacts the ground in
an unconstrained stage. The unconstrained stage has a single flight phase. However, de-
pending on the foot (feet) contacting the ground, the constrained stage is classified into
three phases: a left foot support phase, a right foot support phase, and a double support
phase. To address a standing motion, we add a stand phase. In summary, we have five
locomotion phases characterized by footstep patterns: a left foot support phase (L), a right foot support phase (R), a double support phase (D), a flight phase (F), and a stand phase (S), where the symbols in parentheses denote the corresponding phases. The repre-
sentative pose of each phase is illustrated in Figure 2.3.
String Mapping : Our strategy of motion labeling is to encode each motion segment
in S as a string of symbols in Σ = {L, R, D, F, S}, and to classify the motion segments
with the same string into the same group. The reason that we use the strings rather than
the motion segments is two-fold: First, we can avoid troublesome time-warping to align
the characteristic features. Second, string processing is more robust than numerical com-
putation. To assign a string to a motion segment, we first extract the footsteps from the
motion segment, employing the constraint detection scheme in [41], based on the observa-
tion that a contact foot on the ground maintains its height at the ground level and zero
velocity for some consecutive frames. This scheme worked well: few footsteps were missed in our experiments.
Given the footstep sequence, we can identify all phases in each motion segment. We
first mark the portion of the motion M containing the S phases, based on the fact that an
S phase consists of a sequence of poses with a fixed COM position that lasts longer than a
threshold time. We then detect the F and D phases to mark the remaining portion of M.
A distinct D phase occurs when both feet contact the ground for a time interval. In an F
phase, the feet are above the ground, which one might expect to be easily recognizable.
In practice, some D phases could be misclassified into F phases at times, since a foot
contact status is checked rather conservatively. To prevent this misclassification, we use
the contact force together with the foot contact status. Suppose that no foot contacts the
ground for a portion of M. If the minimum of the contact force is closer to the center of
the portion than the maximum is, then this portion is identified as an F phase. Otherwise,
it is identified as a D phase. Finally, we identify the L and the R phases to further flag
the unmarked portion of M. In both phases, only one foot contacts the ground. While
scanning the motion segment, we divide it into phases and assign a symbol to each phase
to obtain a string.
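The string-mapping step above can be sketched compactly. This is an illustrative sketch with our own names, not the thesis implementation: per-frame phase labels from {'L', 'R', 'D', 'F', 'S'}, obtained from the phase identification described above, are collapsed into the segment's encoding string, which then indexes Table 2.2.

```python
from itertools import groupby

# Action strings of Table 2.2 mapped to their motion-type codes.
TYPE_CODE = {
    'LDR': 'WK', 'RDL': 'WK', 'FLF': 'RN', 'FRF': 'RN', 'S': 'ST',
    'LDRF': 'WR', 'RDLF': 'WR', 'FRDL': 'RW', 'FLDR': 'RW',
    'LDRS': 'WS', 'RDLS': 'WS', 'SRDL': 'SW', 'SLDR': 'SW',
    'SLDRF': 'SR', 'SRDLF': 'SR', 'FRDLS': 'RS', 'FLDRS': 'RS',
}

def encode(phase_labels):
    """Run-length collapse of per-frame phase labels,
    e.g. ['L','L','D','D','R'] -> 'LDR'."""
    return ''.join(symbol for symbol, _ in groupby(phase_labels))
```

For example, `encode(list('SSRRDDLL'))` yields `'SRDL'`, which Table 2.2 classifies as a stand-to-walk transition (SW).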
We have a total of 17 strings encoding all actions, as summarized in Table 2.2.
For example, a walking motion has two strings, LDR and RDL, representing half cycles
beginning with left and right support feet, respectively. These half cycles end with right
and left support feet, respectively after a double support phase. A running motion has
two strings, FLF and FRF, representing half cycles beginning with flight phase followed
by left and right support feet, respectively. A walk-to-run transition motion also has two
strings, LDRF and RDLF, which are not half cycles. These strings show the characteristics
of transition motions; in particular, the three-symbol prefix of each string encodes a walk
motion, and the single-symbol suffix a run motion. We show a few examples of string
encodings in Figure 2.4. All strings are self-explanatory, and thus are not further described
here.
Refinement : Ideally, a motion segment would be an action, that is, a string encod-
ing a motion segment would be a string listed in Table 2.2. However, we may have un-
expected strings that encode some motion segments in S. Therefore, those strings should
be post-processed. Remember that the unlabeled motion sequence M consists of motion
segments si, 0 ≤ i ≤ k in S, that is, M = s0||s1|| · · · ||sk. Suppose that the string f(si) encoding a motion segment si is not in Table 2.2. Consider a tuple, (f(si−1), f(si), f(si+1)), where f(s−1) = f(sk+1) = ε denotes an empty string.
Figure 2.4: Examples of string encodings: (a) RDL, (b) FRF, (c) RDLF
Table 2.2: The strings for actions

string   type code   motion type                action (ground contact status)   half cycle
LDR      WK          walk                       initial support by left foot     yes
RDL      WK          walk                       initial support by right foot    yes
FLF      RN          run                        middle support by left foot      yes
FRF      RN          run                        middle support by right foot     yes
S        ST          stand                      -                                no
LDRF     WR          walk to run transition     initial support by left foot     no
RDLF     WR          walk to run transition     initial support by right foot    no
FRDL     RW          run to walk transition     final support by left foot       no
FLDR     RW          run to walk transition     final support by right foot      no
LDRS     WS          walk to stand transition   initial support by left foot     no
RDLS     WS          walk to stand transition   initial support by right foot    no
SRDL     SW          stand to walk transition   final support by left foot       no
SLDR     SW          stand to walk transition   final support by right foot      no
SLDRF    SR          stand to run transition    initial support by left foot     no
SRDLF    SR          stand to run transition    initial support by right foot    no
FRDLS    RS          run to stand transition    final support by left foot       no
FLDRS    RS          run to stand transition    final support by right foot      no
Empirically, such a motion segment si is most likely from a high-to-low speed transition
such as a run-to-walk, a run-to-stand, or a walk-to-stand transition. Specifically, a small
contact force minimum may occur while making the transition. Such a minimum would
divide a transition segment into smaller ones, resulting in invalid strings. Therefore, we
concatenate these strings by ignoring the troublesome minimum. The new string will be
f(si−1||si) or f(si||si+1), whichever is a valid string encoding a high-to-low speed transi-
tion.
An invalid string is also produced, very rarely, in a low-to-high speed transition when a contact force minimum is missed in a single-foot support or flight phase. In this case, we split si into two parts, si^1 and si^2, such that f(si^1) and f(si^2) are valid, by enumerating all possible cases.
After the suggested string refinement, we rarely encounter invalid strings. The remain-
ing invalid strings, if any, are discarded. Further refinement would result in a motion seg-
ment, whose contact force profile is quite different from the others with the same string,
which would later cause artifacts in motion blending.
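The merge rule for high-to-low speed transitions can be sketched as a greedy pass over the segment strings. This is an illustrative simplification under our own assumptions: `join` models the encoding of a concatenated segment (a shared boundary symbol, arising from a split inside one phase, collapses into a single symbol), and segments whose string cannot be repaired are discarded as described above.

```python
# Valid action strings of Table 2.2.
VALID = {'LDR', 'RDL', 'FLF', 'FRF', 'S', 'LDRF', 'RDLF', 'FRDL', 'FLDR',
         'LDRS', 'RDLS', 'SRDL', 'SLDR', 'SLDRF', 'SRDLF', 'FRDLS', 'FLDRS'}

def join(a, b):
    """Encoding of a concatenated segment: collapse a repeated boundary
    symbol so e.g. 'FRD' followed by 'DL' encodes as 'FRDL'."""
    return a + b[1:] if a and b and a[-1] == b[0] else a + b

def refine(strings):
    """Greedy left-to-right pass: merge each invalid string with its
    predecessor or successor when the joined string is valid; otherwise
    discard it (rare in practice)."""
    out = []
    i = 0
    while i < len(strings):
        s = strings[i]
        if s in VALID:
            out.append(s)
        elif out and join(out[-1], s) in VALID:
            out[-1] = join(out[-1], s)          # merge with predecessor
        elif i + 1 < len(strings) and join(s, strings[i + 1]) in VALID:
            out.append(join(s, strings[i + 1]))  # merge with successor
            i += 1
        i += 1
    return out
```

For instance, a run-to-stand transition that a spurious minimum split into 'FR' and 'DLS' is restored to the single valid string 'FRDLS'.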
Let B be the sequence of motion segments after refinement, that is,
B = (m0, m1, · · · , mb).    (2.4)
M is not necessarily equal to m0||m1|| · · · ||mb since some motion segments si, 1 ≤ i ≤ k in M could be discarded during refinement. However, mi, 0 ≤ i ≤ b is an action, and
thus f(mi) is a valid string. By string mapping and refinement, we have classified the
motion segments in B into a maximum of 17 groups of actions such that the actions in
each group share the same string. For future reference, we denote by C the collection of such action groups, that is,

C = (g0, g1, · · · , gc),    (2.5)

where gi, 0 ≤ i ≤ c < 17, is a group of motion segments.
2.1.4 Motion Parametrization
In this section, we first describe how to parameterize a motion segment and then explain
how to extract the keytimes for feature alignment.
Parametrization : A motion segment can be parameterized in different ways depending
on how it will be used. Our objective for motion parametrization is on-line locomotion
control. Therefore, motion parameters should include the information such as the type of
motion, the destination or direction of the motion, and the speed of the motion.
We identify a motion segment m in the set B with a parameter vector,
p(m) = (t(m), ft(m), v(m), az(m), ax(m)),    (2.6)
where t(m), ft(m), v(m), az(m) and ax(m) denote the motion type, the foot contact sta-
tus, the speed, the tangential acceleration and the lateral acceleration of m, respectively.
The motion type and the foot contact status are directly obtained from Table 2.2 by table
search with the string representing m. We will explain later how to derive v(m), az(m), and ax(m).
Let xe(t) be a trajectory that well characterizes the example locomotion M; we will explain later how to extract such a trajectory. We extract the parameters of motion segment m from xe(t). The speed v(m) is defined as follows:

v(m) = ‖ẋe(s)‖,    (2.7)

where s is the start frame of m and ẋe(s) is the first derivative of xe(t) at time s. v(m) characterizes the speed along xe(t).
Letting Tm(j) = ẋe(j) / ‖ẋe(j)‖, the tangential acceleration parameter az(m) is given by

az(m) = ẍe(s) · Tm(s).    (2.8)
Figure 2.5: Motion transition graph: (a) coarse level, (b) fine level
By incorporating az(m), forward (or backward) leaning arising from sudden speed change
is implicitly parameterized.
Finally, the lateral acceleration ax(m) is defined as

ax(m) = ẍe(s) · ( (0, 1, 0)T × Tm(s) ).    (2.9)
ax(m) approximates the average turning speed for m, and also reflects implicitly lateral
leaning due to turning.
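Equations (2.7) through (2.9) can be sketched numerically; this is an illustrative sketch (names and the finite-difference derivative estimates are our assumptions), with `traj` holding (x, y, z) samples of the characteristic trajectory per frame and `s` the segment's start frame.

```python
import math

def _dot(a, b):
    return sum(a[k] * b[k] for k in range(3))

def _cross(a, b):
    return [a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]]

def segment_parameters(traj, s, dt=1.0):
    """Speed v(m), tangential acceleration az(m), and lateral acceleration
    ax(m) at the segment's start frame s, per Eqs. (2.7)-(2.9)."""
    vel = [(traj[s + 1][k] - traj[s - 1][k]) / (2.0 * dt) for k in range(3)]
    acc = [(traj[s + 1][k] - 2.0 * traj[s][k] + traj[s - 1][k]) / (dt * dt)
           for k in range(3)]
    speed = math.sqrt(_dot(vel, vel))                  # v(m), Eq. (2.7)
    tangent = [v / speed for v in vel]                 # T_m(s)
    az = _dot(acc, tangent)                            # Eq. (2.8)
    ax = _dot(acc, _cross([0.0, 1.0, 0.0], tangent))   # Eq. (2.9)
    return speed, az, ax
```

For a straight accelerating trajectory the lateral term vanishes, since the acceleration is parallel to the tangent.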
Keytime Extraction : Every keytime of a motion segment is a moment at which a
motion feature occurs. The keytimes in locomotion are the moments of heel-strikes and
toe-offs, which are used to time-warp the motions to be blended for feature alignment.
These occur only at the start and end frames of every L or R phase, which has already
been identified. Thus, it is easy to detect the keytimes. Since every contact force minimum
initiates or ends a motion segment, the minima can also be considered as keytimes.
2.1.5 Hierarchical Motion Transition Graph
To reflect the cyclic nature of locomotion, we propose a motion transition graph with
two-level hierarchy as shown in Figure 2.5. The building blocks at the coarse level are
locomotive motions and nine transition motions (see Figure 2.5 (a)), while those at the
fine level are motion segments (see Figure 2.5 (b)).
Let G = (N(Nc, Nf), A(Ac, Af)) denote a motion transition graph, where (Nc, Ac) represents the conventional motion transition graph [46, 47, 30]. The node set N consists of two node sets, Nc and Nf, which represent the building blocks at the coarse level and those at the fine level, respectively. Accordingly, the edge set A is composed of two edge sets, Ac and Af, which connect the corresponding nodes.
Our strategy is first to construct the fine-level graph according to B and C as defined
in Equations 2.4 and 2.5, respectively, and then to complete the graph by adding the
structure at the coarse level as illustrated in Figure 2.5 (a).
Guided by Table 2.2, we initially construct the fine-level graph as shown in Figure 2.5.
Every action group in C gives rise to a node in Nf . If there is a pair of motion segments,
mp and mq in B such that mp ∈ gi, mq ∈ gj , and mp||mq ∈ M, then (mp,mq) is mapped
onto an edge in Af if it is not yet added.
To add the coarse-level structure composed of Nc and Ac, we first classify the fine-level
nodes in Nf into the coarse-level nodes in Nc according to Table 2.2. Then, the coarse-
level edges in Ac are obtained accordingly. Specifically for an ordered pair of coarse-level
nodes (g, h), (g, h) is an edge in Ac, if there are a pair of fine-level nodes x and y such
that x and y are respectively classified into g and h, and (x, y) is an edge in Af .
Finally we point out that our graph construction scheme can be easily extended for
multiple unlabeled motion sequences since they can be concatenated into a single sequence.
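The two-level construction can be sketched compactly. In this illustrative sketch (names are ours), a fine-level node is keyed by its action-group string, fine-level edges come from adjacency of consecutive segments in B, and the coarse level is obtained by projecting strings onto the motion-type codes of Table 2.2.

```python
# Projection of action strings onto coarse-level motion types (Table 2.2).
COARSE = {'LDR': 'WK', 'RDL': 'WK', 'FLF': 'RN', 'FRF': 'RN', 'S': 'ST',
          'LDRF': 'WR', 'RDLF': 'WR', 'FRDL': 'RW', 'FLDR': 'RW',
          'LDRS': 'WS', 'RDLS': 'WS', 'SRDL': 'SW', 'SLDR': 'SW',
          'SLDRF': 'SR', 'SRDLF': 'SR', 'FRDLS': 'RS', 'FLDRS': 'RS'}

def build_graph(segment_strings):
    """segment_strings: f(m0), ..., f(mb) for consecutive segments in B.
    Returns fine-level nodes/edges and the projected coarse-level ones."""
    fine_nodes = set(segment_strings)
    fine_edges = {(a, b)
                  for a, b in zip(segment_strings, segment_strings[1:])}
    coarse_nodes = {COARSE[s] for s in fine_nodes}
    coarse_edges = {(COARSE[a], COARSE[b]) for a, b in fine_edges}
    return fine_nodes, fine_edges, coarse_nodes, coarse_edges
```

As noted above, multiple example sequences can simply be concatenated before this pass.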
2.2 Example Motion Analysis
Although locomotion synthesis has been studied extensively, the steering behavior of human-like characters has not been well addressed. To get a concrete feel, consider Figure
2.6. Figures 2.6 (a) and (b) show how user-specified trajectories (colored red) are differ-
ent from actual human trajectories (colored green) during straight walking and running.
The oscillations of the actual trajectories are due to the pelvis movements (rotations and
translations) caused by supporting feet. Figures 2.6 (c) and 2.6 (d) show the variations
of actual pelvis trajectories during curved walking and running. Such oscillations or cur-
vature variations are the unique characteristics of human steering behavior. Simply plac-
ing the pelvis along a user-specified trajectory would not produce a natural motion. This
immediately raises an issue: how to incorporate these characteristics into a user-specified
trajectory.
In this section, we analyze the pelvis trajectory of an example motion stream to obtain
the trajectory of a simplified vehicle and motion details along the vehicle trajectory. A
Figure 2.6: Pelvis trajectories
pelvis trajectory can be reproduced by adding the motion details to the vehicle trajectory.
In this sense, the trajectory analysis stage can be viewed as the inverse of the synthesis
process. We also extract bounds on speed and acceleration for clamping outliers along the
vehicle trajectory.
2.2.1 Vehicle Model
For ease of control, we abstract a human-like character as a particle with an orientation.
The particle is constrained to move on the floor. The orientation of the particle is aligned
with its velocity. Such a particle is called a vehicle [15, 50, 25].
The state of a vehicle at time t is specified by an ordered tuple (xt, ẋt, θt), where xt, ẋt, and θt denote the position, velocity, and orientation of the vehicle, respectively. The orientation of a vehicle θt is defined as a unit quaternion:

θt = ( h · ẋt/‖ẋt‖ , h × ẋt/‖ẋt‖ ),    (2.10)

where h is the unit halfway vector between (0, 0, 1)T and ẋt/‖ẋt‖. Assuming that the vehicle has a unit mass, the state of the vehicle evolves as follows:

ẋt = ẋt−1 + ft−1;  xt = xt−1 + ẋt−1.
Figure 2.7: Vehicle trajectory estimation: (a) pelvis trajectory, (b) vehicle trajectory, (c) L-trajectory, (d) R-trajectory
When ẋt is a zero vector, θt is not well-defined. In this case, θt is set to the most recent valid orientation before time t.
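The per-frame update above can be sketched as a simple integrator on the ground plane. For brevity this illustrative sketch (not the thesis code) represents positions and velocities as 2D (x, z) vectors and the heading as a planar angle rather than the unit quaternion of Eq. (2.10):

```python
import math

def step(pos, vel, force):
    """One frame of the unit-mass vehicle update:
    x_t = x_{t-1} + xdot_{t-1}, xdot_t = xdot_{t-1} + f_{t-1}."""
    new_pos = [pos[k] + vel[k] for k in range(2)]
    new_vel = [vel[k] + force[k] for k in range(2)]
    return new_pos, new_vel

def heading(vel):
    """Heading aligned with the velocity; undefined (None) for a zero
    velocity, in which case the caller keeps the most recent valid one."""
    if vel[0] == 0.0 and vel[1] == 0.0:
        return None
    return math.atan2(vel[0], vel[1])  # angle measured from the +z axis
```

In the full model the planar angle would be lifted back to the quaternion of Eq. (2.10) via the halfway-vector construction.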
2.2.2 Example Vehicle Trajectories
A single locomotion stream consists of one or more cyclic motions such as walking and
running together with transition motions between different kinds of motions. The whole
stream is delimited by a pair of standing motions. Without loss of generality, there is
no intervening standing motion in the stream. For a cyclic motion, the characteristics of steering behavior repeat at the frequency of a full cycle. We make use of the
cyclic nature of locomotion to extract a vehicle trajectory from the pelvis trajectory for
the motion stream.
Figure 2.8: Speed profile.
As illustrated in Figure 2.7, our basic idea is to construct a pair of curves, called L-
trajectory (colored red) and R-trajectory (colored blue), and then to average these curves
to obtain a vehicle trajectory (colored black). In order to compute L-trajectory, our scheme
samples the position of the pelvis at the first frame of every cycle (L-cycle) that is initiated by a left supporting foot. An L-cycle starts at the middle frame of a left supporting foot phase. Let {y(ti)}, 0 ≤ i ≤ n, be the sequence of all sampled pelvis positions projected onto the ground, where ti denotes the time (frame) at which the i-th position is sampled. We find a non-uniform cubic spline curve interpolating {y(ti)} with its knot sequence {ti} to obtain the L-trajectory; we explain this in detail in the next section. The R-trajectory can be constructed in a symmetrical manner.
Let xL(t) and xR(t), t0 ≤ t ≤ tn, denote the L-trajectory and R-trajectory, respectively. Both curves are parameterized with the frame numbers of the same motion stream, and they share the same knot values at the end points. Therefore, their average is well-defined. Our vehicle trajectory, denoted by xe(t), is obtained by averaging the two curves: xe(t) = (xL(t) + xR(t)) / 2. Since both xL(t) and xR(t) are C²-continuous, xe(t) is also C²-continuous. As shown in Figure 2.7 (b) and Figure 2.8, the vehicle trajectory xe(t) (colored black) is smooth, has little oscillation and curvature variation, and lies between the L-trajectory xL(t) and the R-trajectory xR(t).
Curve Fitting : Given a sequence of control points {y(ti)}, 0 ≤ i ≤ n, and the knot sequence {ti}, we describe how to construct a piecewise cubic non-uniform spline curve with C² continuity [9]. Since there are n + 1 control points, we have n cubic polynomial curve segments

yi(t) = ai + bi t + ci t² + di t³,  i = 0, 1, . . . , n − 1,

for t ∈ [0, ti+1 − ti]. We have to determine 4n unknown coefficients. From the C² con-
tinuity conditions, we have 4(n − 1) linear equality equations:

yi−1(ti − ti−1) = yi(0),
yi(0) = y(ti),
y′i−1(ti − ti−1) = y′i(0),    (2.11)
y″i−1(ti − ti−1) = y″i(0),

for all interior points y(ti), i ∈ {1, 2, · · · , n − 1}. Moreover, the curve interpolates the end
points y(t0) and y(tn):

y0(0) = y(t0),
yn−1(tn − tn−1) = y(tn).    (2.12)
Thus, we obtain two additional equations. Since a motion stream is delimited by a pair of standing motions, the first derivatives at the end points vanish, which yields two more equations:

y′0(0) = 0  and  y′n−1(tn − tn−1) = 0.    (2.13)
Using 4n equality equations in Equations (2.11), (2.12), and (2.13), we can determine 4n
unknown coefficients.
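An illustrative sketch of this fitting step follows (all names are ours, not the thesis implementation). Instead of solving the dense system for the 4n polynomial coefficients directly, we solve the mathematically equivalent tridiagonal system for the knot slopes, with the clamped end conditions of (2.13); each coordinate of a 2D trajectory is fitted independently.

```python
def clamped_spline(ts, ys):
    """C2 piecewise-cubic interpolant of (ts[i], ys[i]) with zero first
    derivatives at both ends, satisfying conditions (2.11)-(2.13).
    Solves a tridiagonal system (Thomas algorithm) for the interior knot
    slopes m_1..m_{n-1}, with m_0 = m_n = 0 (clamped ends)."""
    n = len(ts) - 1
    h = [ts[i + 1] - ts[i] for i in range(n)]
    a, b, c, d = [], [], [], []
    for i in range(1, n):
        a.append(1.0 / h[i - 1])
        b.append(2.0 * (1.0 / h[i - 1] + 1.0 / h[i]))
        c.append(1.0 / h[i])
        d.append(3.0 * ((ys[i] - ys[i - 1]) / h[i - 1] ** 2
                        + (ys[i + 1] - ys[i]) / h[i] ** 2))
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, len(b)):
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    m = [0.0] * (n + 1)
    for i in reversed(range(len(b))):
        m[i + 1] = (d[i] - (c[i] * m[i + 2] if i + 1 < len(b) else 0.0)) / b[i]

    def evaluate(t):
        """Evaluate the cubic Hermite segment containing t."""
        i = n - 1
        for j in range(n):
            if t <= ts[j + 1]:
                i = j
                break
        u = (t - ts[i]) / h[i]
        h00 = 2 * u ** 3 - 3 * u ** 2 + 1
        h10 = u ** 3 - 2 * u ** 2 + u
        h01 = -2 * u ** 3 + 3 * u ** 2
        h11 = u ** 3 - u ** 2
        return (h00 * ys[i] + h10 * h[i] * m[i]
                + h01 * ys[i + 1] + h11 * h[i] * m[i + 1])
    return evaluate
```

The Hermite form with C¹-consistent slopes and the slope equations together enforce the C² conditions of (2.11), so the result is the same curve as the coefficient formulation above.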
2.2.3 Motion Details
Choosing the center of the pelvis as the root, the pose of a character at every frame i is represented as a tuple (p0^i, q0^i, q1^i, · · · , qm^i), where p0^i and q0^i respectively denote the position and orientation of the root at frame i, and qj^i, 1 ≤ j ≤ m, is the orientation of joint j at frame i. All orientations are specified as unit quaternions. Given the joint configuration (p0^i, q0^i, q1^i, · · · , qm^i), the details of the pelvis trajectory consist of the position and orientation displacements between the pelvis (root) and the vehicle at all frames.
In order to represent the orientation displacement at each frame i, the pelvis orientation q0^i at frame i is decomposed into two parts, that is, the rotational part about the vertical axis, denoted by qvert^i, and the remainder qoffset^i, such that q0^i = qvert^i · qoffset^i. Here, qoffset^i is expressed in a coordinate-invariant manner. The vertical orientation displacement between the pelvis and the vehicle is given by dqvert^i = qvert^i · (θe^i)^−1. Therefore,

θe^i = (dqvert^i)^−1 · q0^i · (qoffset^i)^−1

or

q0^i = dqvert^i · θe^i · qoffset^i.

Thus, the orientation displacement is characterized by (dqvert^i, qoffset^i).
The position displacement between the pelvis and the vehicle at frame i is given by
p^i_0 − x^i_e, where x^i_e = x_e(i). We represent it in the local coordinate frame of the vehicle
for effectively adding details to the vehicle trajectory:

    (0, dx^i_e) = (θ^i_e)^{−1} · (0, p^i_0 − x^i_e) · θ^i_e,

where dx^i_e is the position displacement in the local coordinate frame. In summary, the
motion details at each frame i are given by (dx^i_e, dq^i_vert, q^i_offset).
Now, a final remark is in order: when an example motion stream is segmented, the
extracted vehicle trajectory and the motion details are also segmented accordingly. Thus,
every motion segment has its own vehicle trajectory and motion details.
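The split of the pelvis orientation into a vertical part and a remainder is a swing-twist decomposition about the up axis. A small sketch under our own conventions (quaternions stored as (w, x, y, z), y assumed vertical; the degenerate case where both w and the y component vanish is ignored):

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a; bw, bx, by, bz = b
    return np.array([aw*bw - ax*bx - ay*by - az*bz,
                     aw*bx + ax*bw + ay*bz - az*by,
                     aw*by - ax*bz + ay*bw + az*bx,
                     aw*bz + ax*by - ay*bx + az*bw])

def qconj(q):
    """Conjugate; also the inverse for unit quaternions."""
    return np.array([q[0], -q[1], -q[2], -q[3]])

def decompose_vertical(q):
    """Split q into q_vert (rotation about the vertical y axis) and the
    remainder q_offset so that q = q_vert * q_offset."""
    twist = np.array([q[0], 0.0, q[2], 0.0])   # keep only the w and y parts
    twist /= np.linalg.norm(twist)
    offset = qmul(qconj(twist), q)             # q_offset = q_vert^{-1} * q
    return twist, offset
```

Recomposing q_vert · q_offset recovers the original orientation, and q_vert has no x or z component, i.e., it rotates purely about the vertical axis.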
2.2.4 Speed and Acceleration Bounds
A raw-captured vehicle trajectory suffers from undesirable characteristics: lack of smoothness,
unrealistically large speed and acceleration, and sudden changes in the force profile.
In this section, we describe how to learn bounds on speed, acceleration, and inter-frame
acceleration changes from example motion data to clamp outliers for later trajectory re-
finement.
Interframe Acceleration Variations: In our model, a vehicle is an oriented particle
with a unit mass. Therefore, ẍ^t_e = f^t for all t, where f^t is the force that is applied to the
particle at time t. The acceleration difference between a pair of consecutive frames reflects
an instantaneous force change, which should be bounded within human ability, that is,

    ‖ẍ^t_e − ẍ^{t−1}_e‖ ≤ u_f.

The bound u_f for a group of example motion segments is obtained by taking the maximum
magnitude of acceleration differences between consecutive frames over all motion data in the
group.
Figure 2.9: Acceleration bounds - RUN (components a_x and a_z plotted against speed ẋ_e).
Figure 2.10: Acceleration bounds - WALK (components a_x and a_z plotted against speed ẋ_e).
Acceleration Bounds: Since a vehicle is constrained to move on the floor, an acceleration
ẍ_e has two components: a tangential component a_z and a lateral component a_x,
where

    a_z = ẍ_e · (ẋ_e/‖ẋ_e‖), and
    a_x = ẍ_e · [(0, 1, 0)^T × (ẋ_e/‖ẋ_e‖)].        (2.14)

Since ẍ^t_e = f^t, ẍ^t_e is bounded for all t. Thus, both a_x and a_z are bounded. For cyclic
motions such as walking and running, the two components show quite different behaviors
with respect to the vehicle speed ‖ẋ_e‖, as shown in Figures 2.9 and 2.10. The tangential
component is bounded within a fixed interval that contains the average regardless of ‖ẋ_e‖,
while the variations of the lateral component are amplified as ‖ẋ_e‖ increases. a_x represents
a centripetal acceleration when the vehicle moves along a circular path. For such a circular
motion, the turning speed is given by a_x/‖ẋ_e‖. Since the turning speed is not arbitrarily
large, a_x is linearly bounded by ‖ẋ^t_e‖. This supports our intuition that pelvis rotations
result in motion details along the vehicle trajectory.
Based on this observation, we estimate different bounds on these components:

    ā_z − c_z ≤ a^t_z ≤ ā_z + c_z  for all t, and
    |a^t_x| ≤ c_x · ‖ẋ^t_e‖  for all t.

Here, ā_z is the average of the tangential components over all frames in the example motion
data. c_z and c_x are chosen so that 90% of the tangential and lateral components lie within
the bounds.
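The bound estimation described in this section can be sketched as follows. This is an illustrative recipe under our own naming (per-frame velocity and acceleration rows; a 90th-percentile rule stands in for the "90% of components lie within the bounds" criterion):

```python
import numpy as np

def estimate_bounds(vel, acc, coverage=90.0):
    """Estimate (u_f, a_z_bar, c_z, c_x) from example vehicle data,
    one row per frame.  vel/acc are (frames x 3) arrays."""
    speed = np.linalg.norm(vel, axis=1)
    tangent = vel / speed[:, None]                 # unit velocity direction
    up = np.array([0.0, 1.0, 0.0])
    lateral = np.cross(up, tangent)                # (0, 1, 0) x (v/|v|)
    a_z = np.einsum('ij,ij->i', acc, tangent)      # tangential component
    a_x = np.einsum('ij,ij->i', acc, lateral)      # lateral component
    # max inter-frame acceleration change -> u_f
    u_f = np.max(np.linalg.norm(np.diff(acc, axis=0), axis=1))
    a_z_bar = a_z.mean()
    # half-widths so that `coverage` percent of samples fall inside
    c_z = np.percentile(np.abs(a_z - a_z_bar), coverage)
    c_x = np.percentile(np.abs(a_x) / speed, coverage)
    return u_f, a_z_bar, c_z, c_x
```

Note that c_x is estimated on a_x/‖ẋ_e‖, matching the linear-in-speed form of the lateral bound.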
For transition motions such as standing-to-walking and walking-to-running, the
characteristics of a motion at the two extreme frames reflect different types of motions. Thus,
we estimate the bounds of the components at both extremes and interpolate the results
to obtain time-varying bounds.
Speed bounds: For cyclic motions such as walking and running, the speed variations
over time are insignificant, assuming that motion segments are short, for example, spanning
a half cycle [34] or a full cycle [46, 43]. In this case, speed variations can be linearly
approximated even for transition motions. Thus, both upper and lower speed bounds for
a group of motion segments are estimated at the initial and final frames, from which the
time-varying speed bounds over all frames are obtained by linear interpolation. At the
initial frame, the speed bounds can be acquired by finding the minimum and maximum
speeds among all initial frames of motion segments in the group. Those at the final frame
are computed symmetrically.
Figure 2.11: Block diagram
2.3 Locomotion Control
In this section, we describe how to refine a stream of raw-sampled 2D positions to obtain
a human trajectory. The 2D positions are sampled at a rate of 30 Hz in an on-line
manner with an interactive input device such as a mouse. As illustrated in Figure 2.11, an
input point stream goes through three steps for trajectory refinement: force extraction,
clamping outliers, and adding details. The first step extracts a force profile from an input
point stream. When a force profile is given as the input, this step is skipped. The
rationale for force extraction is that the smoothness of a trajectory is easy to achieve with a
force profile as long as the profile is continuous. In the second step, the force profile is integrated
to produce a vehicle trajectory, while clamping speed and acceleration at each frame. The
last step adds details to the vehicle trajectory to produce a natural pelvis trajectory. The
three steps are performed frame by frame in an on-line manner, while referring to the
information extracted during analysis. Since the example motion data are decomposed into
motion segments, which are classified into groups according to their logical similarity
and parameterized by a motion type and the speed and acceleration at the initial frame,
we can determine the sequence of local joint configurations of a user-prescribed motion
segment immediately after the first trajectory position is sampled.
Force Extraction: Without loss of generality, we use a mouse to sample a point stream.
We convert the raw-captured point stream into a force profile for smooth trajectory
construction. Let x^0 be the vehicle position at the last frame of the previously-synthesized
motion segment. When x^0 is not available (the first motion segment), the user should
sample an extra position x^0. Suppose that cursor position S^t at time t has just been
sampled. Then, S^t is transformed to the force to be exerted on the vehicle model
for a character. We employ a spring-damper model to transform S^t into a force:

    f^t = α(S^t − x^{t−1}) − βẋ^{t−1}.

Coefficients α and β are chosen empirically.
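In code, the spring-damper transformation is a one-liner per coordinate. A minimal sketch (the gain values are placeholders; the thesis chooses α and β empirically):

```python
def extract_force(cursor, x_prev, v_prev, alpha=30.0, beta=8.0):
    """Spring-damper model: pull the vehicle toward the sampled cursor
    position while damping its current velocity,
    f = alpha * (S - x_prev) - beta * v_prev, per coordinate."""
    return [alpha * (s - x) - beta * v
            for s, x, v in zip(cursor, x_prev, v_prev)]
```

The spring term steers the vehicle toward the cursor; the damping term suppresses the oscillation a pure spring would cause.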
Clamping Outliers: We now describe how to obtain a vehicle trajectory in an on-line
manner by force integration while clamping the inter-frame acceleration change, the
acceleration, and the speed. As defined in Section 2.2.1, the state of the vehicle at time t is
specified by (x^t, ẋ^t, θ^t), where x^t, ẋ^t, and θ^t denote the position, velocity and orientation of
the vehicle. The vehicle trajectory is fully specified by a sequence of vehicle states at all
frames. Again, by the unit-mass assumption, the acceleration of the vehicle at frame t is
given by ẍ^t = f^t. Thus, the unclamped acceleration ẍ^t is trivially obtained at every frame t.
Suppose that we are now at frame t. The inter-frame acceleration change is clamped
and added to ẍ^{t−1}, using the corresponding bound estimated in Section 2.2.4. That
is,

    ẍ^t ← ẍ^{t−1} + clamp(‖ẍ^t − ẍ^{t−1}‖) · (ẍ^t − ẍ^{t−1}) / ‖ẍ^t − ẍ^{t−1}‖.
The tangential and lateral components of ẍ^t are clamped using their respective bounds at
frame t, and the results are summed to further refine ẍ^t:

    ẍ^t ← clamp(a^t_x) · ((0, 1, 0)^T × ẋ^{t−1}/‖ẋ^{t−1}‖) + clamp(a^t_z) · ẋ^{t−1}/‖ẋ^{t−1}‖.
From ẍ^t, we derive ẋ^t and clamp it:

    ẋ^t ← ẋ^{t−1} + ẍ^t ∆t,
    ẋ^t ← clamp(‖ẋ^t‖) · ẋ^t/‖ẋ^t‖,

where ∆t is the inter-frame time increment. The position of the vehicle at time t is given
by

    x^t ← x^{t−1} + ẋ^t ∆t,

and its orientation θ^t is computed using Equation (2.10).
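The per-frame update can be sketched as a small integrator. This is a simplified illustration of the clamping pipeline (only the inter-frame acceleration change and the speed are clamped here; the thesis additionally clamps the tangential and lateral acceleration components and updates θ^t):

```python
import numpy as np

def clamp_norm(v, bound):
    """Scale vector v down so its norm does not exceed bound."""
    n = np.linalg.norm(v)
    return v if n <= bound else v * (bound / n)

def step_vehicle(x, v, a_prev, force, dt, u_f, v_max):
    """One frame of on-line force integration with outlier clamping."""
    a = np.asarray(force, float)                 # unit mass: a = f
    a = a_prev + clamp_norm(a - a_prev, u_f)     # limit force change
    v = clamp_norm(v + a * dt, v_max)            # integrate, limit speed
    x = x + v * dt                               # integrate position
    return x, v, a
```

Even when the input force is far outside the learned bounds, the clamps keep the resulting trajectory within the speed and acceleration ranges observed in the examples.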
Adding Details: Provided with vehicle state (x^t, ẋ^t, θ^t), we describe how to add motion
details to vehicle position x^t and orientation θ^t in an on-line manner to convert them to
a human pelvis configuration (p^t_0, q^t_0), where p^t_0 and q^t_0 denote the pelvis position and
orientation, respectively. In order to do this, a motion prescription is made by combining
a user-provided motion type with the vehicle speed ‖ẋ^1‖ and vehicle acceleration ẍ^1 at the
first frame (frame 1).
Given the motion prescription, a sequence of local joint configurations is generated
at the first frame by blending example motion segments. Therefore, the number of frames
for the prescribed motion, say k, is fixed at the first frame. The motion details for the
corresponding local poses are also blended by the system. With this information obtained,
the task of adding details is performed frame by frame.
Let (dx^t_e, dq^t_vert, q^t_offset) be the motion details at frame t, 1 ≤ t ≤ k. Then, the pelvis
orientation at frame t is derived by applying dq^t_vert and q^t_offset to vehicle orientation θ^t:

    q^t_0 = dq^t_vert · θ^t · q^t_offset.
In order to compute pelvis position p^t_0, let

    (w, v) = θ^t · (0, dx^t_e) · (θ^t)^{−1},

where v is a 3D vector. Then,

    p^t_0 = x^t + v.

Together with the local joint configuration (q^t_1, · · · , q^t_m), the refined pelvis configuration
(p^t_0, q^t_0) fully specifies the human pose (p^t_0, q^t_0, q^t_1, · · · , q^t_m) at frame t.
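The two formulas above can be sketched directly in quaternion arithmetic. A minimal illustration under our own conventions (quaternions as (w, x, y, z); helper names are ours):

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a; bw, bx, by, bz = b
    return np.array([aw*bw - ax*bx - ay*by - az*bz,
                     aw*bx + ax*bw + ay*bz - az*by,
                     aw*by - ax*bz + ay*bw + az*bx,
                     aw*bz + ax*by - ay*bx + az*bw])

def qconj(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

def add_details(x, theta, dx_e, dq_vert, q_offset):
    """Map a vehicle state back to a pelvis configuration:
    q0 = dq_vert * theta * q_offset,  p0 = x + rotate(theta, dx_e)."""
    q0 = qmul(dq_vert, qmul(theta, q_offset))
    # rotate the local displacement: (w, v) = theta (0, dx_e) theta^{-1}
    v = qmul(theta, qmul(np.r_[0.0, dx_e], qconj(theta)))[1:]
    return x + v, q0
```

With identity details the pelvis coincides with the vehicle (shifted by dx_e); with a non-trivial θ^t, dx_e is rotated into the world frame before being added.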
Motion Blending We adapt the framework of on-line motion blending in [47, 30] to
generate an action at a fine-level node, given a parameter vector. Based on multidimensional
scattered-data interpolation, this framework is composed of four parts: parametrization,
weight computation, time-warping, and posture blending.
In parametrization, the motion segments that are assigned to a fine-level node are
placed at points in the parameter space defined by three numerical parameters: speed,
tangential acceleration, and lateral acceleration. Provided with the motion prescription
(t(m), f^{t(m)}, ‖ẋ^1‖, ẍ^1 · ẋ^1/‖ẋ^1‖, ẍ^1 · ((0, 1, 0) × ẋ^1/‖ẋ^1‖)), the corresponding fine-level node is
visited to compute the contribution of each motion segment stored in this node. We use
a scheme of geostatistical motion interpolation [43] instead of scattered-data interpolation
based on cardinal basis functions [47, 30]. The latter has exhibited quite unstable
behavior, since many motion segments at each fine-level node of a cyclic motion have
similar parameter vectors.
We adopt the incremental time-warping scheme in [47] to align the motion features of
the basic movements to be blended. The keytimes have been extracted in Section 2.1.4.
Finally, the time-warped movements are blended using the incremental posture blending
scheme [47]. The motion details for the corresponding local poses are also synthesized by
blending.
The synthesized movement is adapted to the environment in an on-line manner to pre-
vent foot sliding and penetration, employing an on-line motion retargeting scheme [54] af-
ter computing the target foot position at each frame by blending the foot positions of the
basic movements using the technique given in [47].
Motion Stitching Although our motion synthesis scheme produces a sequence of local
joint configurations of high quality at every fine-level node, some jerkiness may be observed
at the knots between adjacent synthesized sequences, since these are blended separately.
Adapting the scheme in [24], we address this problem: the two sequences are stitched
seamlessly at the knot while propagating the error only forward for on-line motion
synthesis. This guarantees C1 continuity for each degree of freedom.
3. Two-Character Motion Synthesis
In this chapter, we describe how to model a two-player example motion stream as a
coupled motion transition graph, which generalizes the approach of Chapter 2 in terms of both
the structure and the coverage of motions. The issues in motion modeling are three-fold:
motion segmentation, motion classification, and graph construction. For motion segmentation,
we apply the method presented in Chapter 2 to each individual motion. We discuss
each of the remaining issues in a separate section.
3.1 Two-Character Motion Classification
Our focus is on two-player motions in stand-up martial arts such as Kickboxing, Karate
and Taekwondo. The players exchange combinations of punches and kicks while dodging
the opponent's attacks. Punches are delivered with the front parts of the tightly
clenched fists. Kicks are delivered with the parts near the ankles or the knee
bones. We do not deal with grappling or throwing motions, which are observed in Judo
and wrestling. The players are not allowed to attack a fallen opponent.
Given a collection of motion segments S = (s_1, s_2, ..., s_r), motion classification assigns
a label to each motion segment s_i, 1 ≤ i ≤ r, which is a primitive action. Thus, motion
classification can be viewed as a mapping from S to a finite set of action labels such that
S can be classified into disjoint action groups.
To obtain a complete motion taxonomy is hard, if not impossible, even for an expert
in martial arts. Our objective of motion classification is to provide a flexible way of
classifying motions that can be adapted easily to various applications. For our purpose, the
requirements for motion classification are two-fold: controllability and accessibility. By
controllability, we mean that our classification scheme allows the user to specify a desired
motion in an on-line manner. By accessibility, we mean that a desired example action can
be accessed efficiently. To satisfy these requirements, the motion segments in the same
action group share not only structural similarity but also functional similarity.
Our motion vocabulary consists of seven words, each of which corresponds to a motion
or a motion aspect: left-punch, right-punch, left-kick, right-kick, jump, react, and move.
The last word, move, is used to encode the footstep pattern of an action, which is an
important aspect of a motion. The first six words are used to specify a motion for interac-
Figure 3.1: Jump kick motion: (a) height profile; (b) force profile, with the kick-up and kick-down phases marked.
Table 3.1: Action repertoire table

motion aspect    action qualifiers
left-kick        up, down, downup, others
right-kick       up, down, downup, others
left-punch       fwd, bwd, bwdfwd, fwdbwd, others
right-punch      fwd, bwd, bwdfwd, fwdbwd, others
jump             up, down, others
react (to)       kick, punch, others
move             L-L, L-R, L-F, R-L, R-R, R-F, F-L, F-R, F-F
tive applications. The choice of a vocabulary determines the level of user control. There
are also composite motions, such as kicking while jumping and kicking while punching.
In such a case, more than one word is needed to describe a motion. A motion is
represented as a sequence of motion phases, each of which corresponds to a primitive action
(or a motion segment). To characterize an action, we use a set of additional qualifiers for
each word, as shown in Table 3.1.
In order to annotate the motion segments, we follow, in principle, the semi-automatic
approach suggested by Arikan et al. [5]. Based on a support vector machine (SVM), they
were able to successfully annotate motion streams of American football. Our approach
is different from theirs in three aspects. First, our approach is complemented by rule-
based classification. In particular, rule-based classification [44, 34] is adopted for the last
Figure 3.2: Action annotations for a jump kick and for multiple kicks followed by a punch; all empty boxes denote the "others" class.
two motion aspects in Table 3.1. Second, we employ multi-class support vector machine
(MSVM) classifiers [18] rather than binary SVM classifiers to effectively handle multiple,
mutually-exclusive action qualifiers within each motion aspect. Finally, we annotate
motion streams at the granularity of actions rather than poses, to effectively synthesize
motions while skipping time-consuming optimization. An action is annotated with a label,
which in our case is a 7-dimensional vector l = (l_1, l_2, ..., l_7). Each element l_i,
1 ≤ i ≤ 7 of the vector corresponds to an action qualifier for a motion aspect, and takes on an integer
value, that is, l_i ∈ {1, 2, ..., n_i}, where n_i is the number of qualifiers in motion aspect i.
We handle the first five motion aspects in Table 3.1 with MSVM-based classification and
the last two with rule-based classification.
3.1.1 MSVM-based Classification
MSVM classifiers evolve from binary classifiers based on SVMs, and thus both types of
classifiers share similar user interface. Given a sequence of motion segments, the user an-
notates manually a small portion of motion segments as training data, and then an MSVM
classifier automatically takes care of the rest of the segments. To apply an MSVM classi-
fier, a feature vector is extracted from each motion segment. For every motion segment, a
set of N poses of a player is sampled at equally-spaced time instances including the start
and end times of the motion segment to form a feature vector. In our experiments, N=10
works well. We use a public domain library LIBSVM [13] for semi-automatic motion an-
notation.
For motion classification, a human pose is represented by a vector Q = (p_0, q_0, p_1, ..., p_m),
where p_0 and q_0 are the root position and orientation, respectively, and p_i, 1 ≤ i ≤ m is
the position of joint i. The root position and orientation of a sampled pose are given by its
horizontal translation and vertical rotation with respect to the root position and orientation
at the previous sample. We choose the pelvis as the root. Joint position p_i, 1 ≤ i ≤ m
is specified in the local coordinate frame with respect to the root at the current sample.
Our human model has 17 joints, that is, m = 17.
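The feature-vector construction (N poses sampled uniformly, including the first and last frames, then concatenated) can be sketched as follows. This is only the sampling step under our own naming; the classification itself is handed to LIBSVM:

```python
import numpy as np

def feature_vector(segment, n_samples=10):
    """Build an MSVM feature vector from a motion segment, represented
    here as an (n_frames x d) array of pose vectors: sample n_samples
    poses at equally spaced times (first and last frames included) and
    concatenate them into one vector."""
    segment = np.asarray(segment, float)
    idx = np.linspace(0, len(segment) - 1, n_samples).round().astype(int)
    return segment[idx].ravel()
```

The resulting vector has a fixed length N · d regardless of the segment's duration, which is what a standard SVM/MSVM input requires.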
Kick classifiers: Our segmentation scheme divides a kicking motion into two consecutive
actions, "kick-up" and "kick-down", as shown in Figure 3.1. Two consecutive kicks
consist of three parts. The first and last parts are kick-up and kick-down, respectively.
The middle part is a "kick-downup" action. Multiple consecutive kicking motions are
best described with the regular expression

    (up)(downup)*(down),

where * denotes zero or more instances of the action in the parentheses. We annotate
motion segments that do not match any of the primitive actions for kicking motions as "others".
Punch classifiers: Our segmentation scheme shows different behaviors depending on
whether body weight is exerted on the punch or not. In the former case, the character
utilizes body rotation as well as weight shift to maximize the speed of the end-effector.
In this case, a punch consists of two consecutive motion segments, one with the body
rotation coincident with the arm rotation, and one with the body rotation reversed for retraction. We
annotate these two motion segments as "fwd" and "bwd", respectively. Two consecutive
punches consist of three parts, "fwd", "bwdfwd" and "bwd", analogously to kicking
motions. In the latter case, a faster but weaker attack is made by straightening and retracting
an arm. We annotate such actions as "fwdbwd". Combining these facts, punches can
be described using the regular expression

    (fwd)(bwdfwd)*(bwd) ∨ (fwdbwd).
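The two action patterns translate directly into ordinary regular expressions. A small sketch, using our own encoding of a qualifier sequence as a space-separated string:

```python
import re

# (up)(downup)*(down)  and  (fwd)(bwdfwd)*(bwd) | (fwdbwd)
KICK = re.compile(r"^up( downup)* down$")
PUNCH = re.compile(r"^(fwd( bwdfwd)* bwd|fwdbwd)$")

def is_valid_kick(seq):
    """True iff a qualifier sequence is a well-formed multi-kick."""
    return bool(KICK.match(" ".join(seq)))

def is_valid_punch(seq):
    """True iff a qualifier sequence is a well-formed punch combination."""
    return bool(PUNCH.match(" ".join(seq)))
```

Such a check is useful, for instance, to flag annotation sequences that violate the intended action grammar.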
Jump classifier: A jumping motion consists of two actions, "jump-up" and
"jump-down", each of which has an interaction with the ground: one for jumping and one
for landing. The rest of the motion segments are annotated as "others".
3.1.2 Rule-based Classification
The remaining two motion aspects, react and move are hard to annotate with our MSVM.
The feature vector of a motion segment does not contain any information on the corre-
sponding action of the opponent. Thus the action qualifiers in the react motion aspect
are difficult to discriminate. With the feature vectors alone, our MSVM has often got
confused to identify the footstep patterns. We develop simple classification rules to cope
with these difficulties.
React classifier: A player blocks the opponent's attack or counter-attacks the opponent.
Failing to block the opponent's attack allows a hit by the opponent, resulting in
loss of balance. An action is regarded as a reaction if the opponent's corresponding action
is an attack. An attacking action has one of the first four motion aspects in Table 3.1.
Thus, we have three action qualifiers: "kick," "punch," and "others." Considering the
opponent's actions, we use the following straightforward rules for the react motion aspect:

    react-kick  = o.kick-up ∨ o.kick-downup,
    react-punch = ¬react-kick ∧ (o.punch-fwd ∨ o.punch-bwdfwd ∨ o.punch-fwdbwd),
    others      = ¬react-kick ∧ ¬react-punch,

where prefix o denotes the opponent. The actions of the opponent include only those that
actually make the opponent's fists and feet move toward the player. Kicking has priority
over punching, since the former conveys more impact in general.
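Because the qualifiers are mutually exclusive and kicks take priority, the rules reduce to a short cascade. A sketch, with the opponent's action encoded as a string label (our own encoding):

```python
def classify_react(opponent_action):
    """Rule-based react qualifier, driven solely by the opponent's
    current action label, e.g. "kick-up" or "punch-fwd"."""
    if opponent_action in ("kick-up", "kick-downup"):
        return "kick"                    # kicks take priority over punches
    if opponent_action in ("punch-fwd", "punch-bwdfwd", "punch-fwdbwd"):
        return "punch"
    return "others"                      # the opponent is not attacking
```

Note that "kick-down" and "punch-bwd" are deliberately excluded: only phases that move the opponent's fists or feet toward the player count as attacks.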
Move classifier: The move classifier is used to identify the footstep pattern of an action,
in particular, the first and last foot stances, for smooth action stitching. If a foot touches
the ground at the end of the current action, the same foot should touch the ground at
the beginning of the next action. Inspired by Kwon and Shin [34], we encode an action
as a string of symbols L, R, and F, denoting the left supporting foot, the right supporting foot,
and no supporting foot (a flight phase), respectively. We resolve a double foot stance
by identifying it as L or R, whichever is the closer neighbor of this foot stance in the
example motion stream. The resulting footstep patterns are L-L, L-R, L-F, R-L, R-R,
R-F, F-L, F-R and F-F. The initial and final symbols of each string represent the first
and last foot stances of the corresponding action, and the symbol "-" in the middle means
that the foot stances in the middle, if any, are ignored.
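The encoding can be sketched in a few lines. Here "D" marks a double stance, resolved to its nearest single-support neighbor as described above; the tie-breaking toward the earlier neighbor is our choice, and a sequence with at least one L or R is assumed:

```python
def resolve_double(stances):
    """Replace each double stance "D" with its closest L/R neighbor
    in the sequence (ties go to the earlier neighbor)."""
    out = list(stances)
    singles = [i for i, s in enumerate(out) if s in ("L", "R")]
    for i, s in enumerate(out):
        if s == "D":
            j = min(singles, key=lambda k: abs(k - i))
            out[i] = out[j]
    return out

def footstep_pattern(stances):
    """First and last stances joined by "-"; interior stances ignored."""
    s = resolve_double(stances)
    return f"{s[0]}-{s[-1]}"
```

The resulting two-symbol pattern is exactly what action stitching needs: the stance a candidate next action must start with.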
3.2 Interaction Modeling
In this section, we describe how to capture the interactions between two players from
an example two-player motion stream. More specifically, provided with a coupled motion
transition graph, we explain how to train a motion transition model, considering the in-
teractions between the two players. We employ a dynamic Bayesian network for this pur-
pose, assuming that the example motion stream has been transformed to the coupled mo-
tion transition graph.
Figure 3.3: Basic data structure B: (a) an example motion stream; (b) its linked-list representation.
Figure 3.4: A node of G1 (or G2) has a set of links pointing to the actions in B.
Coupled Motion Transition Graph We represent a two-player example motion stream
as a coupled motion transition graph denoted by C = (B,G1,G2,R,C), where B, G1,
G2, R, and C are a basic data structure, a motion transition graph for player 1, that for
player 2, an action table, and a set of cross edges, respectively. In what follows, we will
define each of these in sequence.
A single-character motion stream can be regarded as a sequence of actions stitched
along the time axis, as illustrated in Figure 3.3. The two players perform their own se-
quences of actions in a rather asynchronous manner. In other words, actions performed
by a player are partially overlapped in time with those by the other. We adopt a linked
list for our basic data structure B to capture this relationship. The node representing an
action of a player in general has two links, pointing to the next action of the same player
and to the other player's action (reaction), respectively. Our motion transition graph is
constructed on top of this data structure.
G_i = (V_i, E_i), i = 1, 2 is the motion transition graph for player i, in which V_i
and E_i denote the node set and the edge set, respectively. A node v_j in V_i represents
a group of example actions which share an identical label (annotation vector) l_j. A pair
of nodes v_j and v_k in V_i are connected by a directed edge e_jk in E_i if the action group
denoted by v_j is followed by the action group denoted by v_k with a transition probability
that is significantly large (see Section 3.2 for transition probability estimation). As shown
in Figure 3.4, a node of a graph has a set of pointers to the nodes in B, each of which
provides a link to a member of the corresponding action group.
The action table R is used to store Table 3.1 together with the links to individual
motion transition graphs, G1 and G2. Every action qualifier of a motion aspect points to
a set of nodes (action groups) of Gi, i = 1, 2, that share the same motion aspect and also
the same action qualifier. This table allows the user to prescribe a motion in an intuitive
manner.
Finally, the set C consists of all cross edges combining two single-player motion tran-
sition graphs, G1 and G2 as shown in Figure 3.4. A pair of nodes vj and vk in different
graphs G1 and G2 are connected by a cross edge in C if one is followed by the other with
a significantly large transition probability (see Section 3.2).
Dynamic Bayesian Network A Bayesian network is a useful model for representing
the causal relationships among random variables, as demonstrated in various applications
such as image processing and speech/gesture recognition and synthesis [45]. As illustrated
in Figure 3.3, the action of a player is determined by his/her own current action
together with that of the other player. Let A^c_i be the ith action of player c, c = 1, 2 for
i = 1, 2, · · · , n. Suppose that players 1 and 2 have been performing actions A^1_i and A^2_j,
respectively, and that player 1 changes his/her action before player 2 does. It is natural to
consider that a player gathers information on the opponent via observations, and then
fuses this information with what the player possesses or self-observes on himself/herself, in
order to finally determine the player's next action. Thus, the next action A^1_{i+1} of player
1 depends on the current actions of players 1 and 2, that is, on (A^1_i, A^2_j). The next action
A^2_{j+1} for player 2 can be chosen in a symmetrical manner.
As shown in Figure 3.5, these two types of motion transitions give the temporal or-
der that specifies the non-reversible direction of causal relationship among the actions of
the two players. We represent a two-character motion stream as a series of such transition
Figure 3.5: Bayesian network: (a) building blocks; (b) examples.
Figure 3.6: Reference coordinate frame, with origin O at (COM1 + COM2)/2.
patterns, which can be best illustrated with a dynamic Bayesian network. Each transition
type gives rise to a building block of the Bayesian network. The building blocks (see Fig-
ure 3.5(a)) are combined dynamically to yield a variety of network structures, with which
we capture dynamic, stationary, and asynchronous characteristics of two-character motion
streams (see Figure 3.5(b)).
Formulated as a motion synthesis problem, our goal is to estimate the conditional
probability P(A^1_{i+1} | A^1_i, A^2_j) for predicting A^1_{i+1}. In order to estimate the conditional
probability, we first parameterize an action, exploiting its characteristic features, in particular, the
action label (type) l = (l_1, l_2, ..., l_7) obtained in Section 3.1 and the first and last poses
of the action, denoted by Q^s and Q^e, respectively. More specifically, the parameters of
the action consist of two vectors (l, n), where l is the action label as mentioned above,
and n is derived from poses Q^s and Q^e by applying principal component analysis (PCA).
Notice that each element of l takes on a discrete value, while each element of n has a
continuous value, and that pose Q is specified by the vector (p_0, q_0, p_1, ..., p_m).
We represent both Q^s and Q^e in a coordinate-invariant manner. For the root position
and orientation (p_0 and q_0), we build a local coordinate frame: the origin of the
coordinate frame is set to the average of the COMs of the players projected onto the
ground (see Figure 3.6). The vector that points from the COM of the player performing
the next action toward the COM of the opponent is chosen as the z axis, the vertical
up-direction as the y axis, and the x axis as the cross product of the former two axes.
This coordinate system conveniently abstracts the geometric setting of the players. p_0
and q_0 are set to the horizontal translation of the root joint and its vertical rotation with
respect to the origin, respectively.
Joint positions p_i, 1 ≤ i ≤ m are given in yet another local coordinate frame with
respect to the root joint of each player to represent their displacements from the root.
Let Q̄ be the pose Q excluding the root position and orientation. We apply PCA to Q̄^s
and Q̄^e to extract two 3-dimensional vectors, p̄^s and p̄^e, respectively. These cover about
60% of the variations of the data in our experiments. Finally, the 12-dimensional parameter
vector n is obtained: n = ((p^s_0, q^s_0, p̄^s), (p^e_0, q^e_0, p̄^e)). For notational convenience,
we use an action A and its parameters (l, n) interchangeably, that is, A = (l, n).
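The PCA step that compresses a root-free pose Q̄ into a 3-dimensional vector can be sketched generically. This is a standard SVD-based PCA, not the thesis code; `poses` holds one pose vector per row:

```python
import numpy as np

def pca_reduce(poses, k=3):
    """Project pose vectors (one per row) onto their first k principal
    components via SVD; returns the k-dimensional codes and the fraction
    of total variance they explain."""
    X = poses - poses.mean(axis=0)          # center the data
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    codes = X @ Vt[:k].T                    # k-dimensional code per pose
    explained = (S[:k] ** 2).sum() / (S ** 2).sum()
    return codes, explained
```

The `explained` ratio corresponds to the roughly 60% variance coverage reported above for k = 3.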
Now, we factorize the conditional probability of the next action A^1_{i+1} into three terms
by rearranging the discrete and continuous parameters as follows:

    P(A^1_{i+1} | A^1_i, A^2_j) = P(l^1_{i+1}, n^1_{i+1} | l^1_i, n^1_i, l^2_j, n^2_j)
                                ≈ γ · P(l^1_{i+1} | l^1_i, l^2_j) × P(n^1_i, n^2_j | l^1_{i+1}) × P(n^1_{i+1} | l^1_{i+1}, n^1_i, n^2_j)        (3.1)

for some positive constant γ. Before showing the detailed derivation, we first briefly introduce
the idea of Equation (3.1).
For motion synthesis (Section 3.4), Equation (3.1) is used to estimate the conditional
probabilities of the candidates for the next action. Intuitively, the first term (excluding γ)
encodes a logical relationship between actions, that is, how the label of the next action of
a player is likely to be chosen based on the current action labels of the players. This term
is estimated using multinomial distributions (Section 3.3 A). The second term reflects the
geometric setting of the players for the next action; for instance, the two characters are
more likely to be distant if a kick action is chosen rather than a punch. We compute this
term, assuming that the numerical parameters in n^1_i and n^2_j are from Gaussian
distributions (Section 3.3 B). The third term represents the relationship between the numerical
parameters, given a candidate for the next action label, which guards against motion
discontinuity. Through a preliminary experiment, we have observed that each of the scalar
elements of n^1_{i+1} is non-linearly related to n^1_i and n^2_j. We employ Gaussian processes
for non-linear regression to capture these dependencies (Section 3.3 C).
Transition probability computation We explain how to derive Equation (3.1) for
player 1. The counterpart for player 2 can be obtained in a symmetrical manner. As the
action A^1_{i+1} has both discrete and continuous parameters, we decompose the probability
into two parts by the conditional chain rule P(X, Y | M) = P(X | M) × P(Y | X, M):

    P(A^1_{i+1} | A^1_i, A^2_j) = P(l^1_{i+1}, n^1_{i+1} | l^1_i, n^1_i, l^2_j, n^2_j)
                                = P(l^1_{i+1} | l^1_i, n^1_i, l^2_j, n^2_j) × P(n^1_{i+1} | l^1_{i+1}, l^1_i, n^1_i, l^2_j, n^2_j).        (3.2)
We handle each of the conditional probabilities separately. First, let us consider the
discrete part. By applying the conditional Bayes rule, P(Y | X, M) = P(X | Y, M) × P(Y | M) / P(X | M), we
have

P(l^1_{i+1} | l^1_i, n^1_i, l^2_j, n^2_j) = [ P(l^1_{i+1} | l^1_i, l^2_j) × P(n^1_i, n^2_j | l^1_i, l^2_j, l^1_{i+1}) ] / P(n^1_i, n^2_j | l^1_i, l^2_j).   (3.3)
The denominator on the right-hand side of Equation (3.3) is regarded as a positive constant
1/γ since it has already been fixed when action A^1_{i+1} is to be chosen. The second
term of the numerator is approximated by P(n^1_i, n^2_j | l^1_{i+1}) to avoid the bias due to sparse
data. A justification of this approximation is that l^1_{i+1} is chosen based on both l^1_i and l^2_j,
so conditioning on l^1_{i+1} partially reflects them. Thus, the discrete part of Equation (3.2) is approximated as follows:

P(l^1_{i+1} | l^1_i, n^1_i, l^2_j, n^2_j) ≈ γ · P(l^1_{i+1} | l^1_i, l^2_j) × P(n^1_i, n^2_j | l^1_{i+1}).   (3.4)
By a similar argument, we approximate the continuous part to
P(n^1_{i+1} | l^1_{i+1}, l^1_i, n^1_i, l^2_j, n^2_j) ≈ P(n^1_{i+1} | l^1_{i+1}, n^1_i, n^2_j),   (3.5)

by dropping l^1_i and l^2_j. Combining Equations (3.4) and (3.5), we obtain Equation (3.1).
Estimation In this section, we describe how to estimate the conditional probabilities
on the right-hand side of Equation (3.1).
In order to learn each term, we collect training data from the example motion stream.
More specifically, we assemble the training data set while traversing basic data structure
B (see Figure 3.3). For player 1, the training set is D1 = {(A^1_h, A^2_g, A^1_{h+1}) for all h and
g}, where (A^1_h, A^2_g, A^1_{h+1}) represents an action transition from A^1_h to A^1_{h+1} for player 1,
when player 2 is performing action A^2_g. The training set D2 for player 2 is formed
symmetrically. Recall that A = (l, n) for an action A, as explained in Section 3.2.
For the first term, we could estimate the conditional probability P(l^1_{i+1} | l^1_i, l^2_j) directly by
scanning D1. However, this would yield unreliable, overfitted probabilities because of
sparse training data. Instead, we approximate this probability by adopting the idea of the
factoring techniques in [12, 59] as follows:
P(l^1_{i+1} | l^1_i, l^2_j) ∝ P(l^1_{i+1} | l^1_i) × P(l^1_{i+1} | l^2_j).   (3.6)

The probabilities P(l^1_{i+1} | l^1_i) and P(l^1_{i+1} | l^2_j) are further factored as follows:

P(l^1_{i+1} | l^1_i) ∝ ∏_{m=1}^{7} ∏_{k=1}^{7} P((l_k)^1_{i+1} | (l_m)^1_i),
P(l^1_{i+1} | l^2_j) ∝ ∏_{m=1}^{7} ∏_{k=1}^{7} P((l_k)^1_{i+1} | (l_m)^2_j),   (3.7)

where (l_k)^1_{i+1}, 1 ≤ k ≤ 7, denotes the kth element of action label l^1_{i+1}; (l_m)^1_i and (l_m)^2_j,
1 ≤ m ≤ 7, are interpreted similarly.
We estimate the probabilities P((l_k)^1_{i+1} | (l_m)^1_i) and P((l_k)^1_{i+1} | (l_m)^2_j) with multinomial
distributions based on a Dirichlet prior (see Section 3.3 A). The resulting probabilities
P(l^1_{i+1} | l^1_i) and P(l^1_{i+1} | l^2_j) should be normalized [12].

We precompute the probabilities P(l^1_{i+1} | l^1_i, l^2_j) for all tuples (l^1_{i+1}, l^1_i, l^2_j) extracted from
D1 and store them as a table in the node of player 1's motion transition
graph with action label l^1_i.
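A minimal sketch of this factored computation, assuming hypothetical lookup functions p_own and p_opp standing in for the element-wise conditional tables stored at the graph nodes (these names are ours, not the thesis's):

```python
def factored_label_prob(candidates, l1_cur, l2_cur, p_own, p_opp):
    """Equations (3.6)/(3.7): for each candidate next label (a 7-tuple),
    multiply the element-wise conditionals against every element of the
    player's own current label and of the opponent's label, then normalize
    over the candidates (cf. the normalization noted in [12])."""
    scores = {}
    for l_next in candidates:
        s = 1.0
        for m in range(7):
            for k in range(7):
                s *= p_own(l_next[k], l1_cur[m]) * p_opp(l_next[k], l2_cur[m])
        scores[l_next] = s
    total = sum(scores.values())
    return {l: s / total for l, s in scores.items()}
```

With uniform element-wise conditionals, every candidate receives the same normalized probability, as expected of a proportional factorization.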
For the second term, P(n^1_i, n^2_j | l^1_{i+1}), we factor the covariance matrix of a Gaussian
distribution employing the method in [22], again to avoid the bias due to sparse data.
More specifically, a full covariance matrix is decomposed into a decorrelating transform
matrix and a diagonal covariance matrix. The decorrelating matrix is shared by all the
nodes in a graph, whilst each node maintains its own diagonal covariance matrix. The
probability density function of each node is modeled as a Gaussian distribution function.
The parameters of the function are stored in the node of the motion transition graph with
action label l^1_{i+1}. We estimate P(n^1_i, n^2_j | l^1_{i+1}) based on this information (see Section 3.3 B).
For the third term, we adopt Gaussian process regression (see Section 3.3 C). Specifically,
we compute the conditional Gaussian distribution for each scalar element of n^1_{i+1}
that has not been fixed. Notice that n^1_{i+1} = ((p^s_0, q^s_0, p^s), (p^e_0, q^e_0, p^e)) as defined in
Section 3.2. Since (p^s_0, q^s_0, p^s) is given directly from the previous action of player 1, regression
is performed only for (p^e_0, q^e_0, p^e). We use the public-domain library TPros to estimate
the conditional probability distributions [23]. Again, we store the estimated hyperparameters
together with the mean and the inverse covariance matrix of each training data set
at the node with l^1_{i+1}.
3.3 Background on Underlying Statistical Models
A. Multinomial distributions The multinomial distribution is a generalization of the
binomial distribution, where each trial results in one of k possible outcomes with probabilities
Θ_M = (θ_1, θ_2, ..., θ_k), where Σ_i θ_i = 1 and θ_i ≥ 0 for all i. Of interest is predicting
each outcome: given a training set {x_n} that consists of the outcomes of N independent
draws x_1, x_2, ..., x_N from an unknown multinomial distribution, we estimate the probability
of the next outcome x*. The Bayesian estimate of this probability is

P(x* | {x_n}) = ∫ P(x* | Θ_M) P(Θ_M | {x_n}) dΘ_M.   (3.8)

Under the uniform-prior assumption, the Bayesian estimate reduces to

P(X_{N+1} = i | {x_n}) = (N_i + 1) / Σ_j (N_j + 1),   (3.9)

where N_i is the number of observations of outcome i in the training set. This is a
special case of the Dirichlet prior for the multinomial distribution in which all hyperparameters
are set to one [17].
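Equation (3.9) amounts to add-one (Laplace) smoothing of the empirical counts; a minimal sketch:

```python
from collections import Counter

def dirichlet_multinomial_estimate(draws, num_outcomes):
    """Bayesian estimate P(X_{N+1} = i | draws) under a uniform Dirichlet
    prior (all hyperparameters equal to one), i.e. Equation (3.9):
    (N_i + 1) / sum_j (N_j + 1)."""
    counts = Counter(draws)
    total = len(draws) + num_outcomes  # sum_j (N_j + 1)
    return [(counts.get(i, 0) + 1) / total for i in range(num_outcomes)]
```

Note that an outcome never observed in the training set still receives a nonzero probability, which is what makes the estimate robust to sparse data.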
We show how to estimate P((l_k)^1_{i+1} | (l_m)^1_i) given D1 = {(A^1_h, A^2_g, A^1_{h+1}) for all h and g}
(see Equation (3.7)); P((l_k)^1_{i+1} | (l_m)^2_j) can be estimated in a similar manner. We extract
{x_n} from D1 by ignoring A^2_g for all g and the numerical parameters n^1_h and n^1_{h+1} for all h.
More precisely, we first construct D_M = {(a, b) = ((l_m)^1_h, (l_k)^1_{h+1}) for all h}
and then obtain {x_n} = {b | (a, b) ∈ D_M}. Setting x* = (l_k)^1_{i+1}, we compute the Bayesian
estimate of P(x* | {x_n}) for P((l_k)^1_{i+1} | (l_m)^1_i).
B. Semi-tied Covariance Matrices A Gaussian density function is defined as

p(x | μ, Σ) = (1 / ((2π)^{N/2} |Σ|^{1/2})) exp( −(1/2) (x − μ)^T Σ^{−1} (x − μ) ),   (3.10)

or, in short, x ∼ N(μ, Σ). To reduce the number of model parameters, a full covariance
matrix Σ^{(m)} is decomposed into a decorrelating transform matrix H and a diagonal covariance
matrix Σ^{(m)}_{diag}:

Σ^{(m)} = H Σ^{(m)}_{diag} H^T.   (3.11)

The decorrelating matrix H is shared by all Gaussian distributions, whilst each Gaussian
distribution maintains its own diagonal covariance matrix. Let the set of all training data
be

{{x^{(m)}_n}} = {{x^{(1)}_1, ..., x^{(1)}_{N_1}}, ..., {x^{(M)}_1, ..., x^{(M)}_{N_M}}},

where x^{(j)}_i, 1 ≤ i ≤ N_j, 1 ≤ j ≤ M, is sampled from the jth Gaussian distribution, and N_j
is the number of samples that belong to the jth Gaussian distribution. The maximum-likelihood
(ML) estimate of the model parameters Θ = (H, {Σ^{(m)}_{diag}, μ^{(m)}}) is

Θ_ML = arg max_Θ log p({{x^{(m)}_n}} | Θ).   (3.12)

Θ_ML can be obtained using an iterative algorithm based on an expectation-maximization
(EM) approach [22]. The Gaussian distribution of a new sample x^{(m)}_{N_m+1} is given by

x^{(m)}_{N_m+1} ∼ N(μ^{(m)}, H Σ^{(m)}_{diag} H^T).   (3.13)
We show how to estimate P(n^1_i, n^2_j | l^1_{i+1}). Let {x^{(m)}_n} = {(n^1_h, n^2_g) | l^1_{h+1} = l^{1,m} for all h and g},
where l^{1,m} is action label m for character 1. We collect the set of all training data
D_G = {{x^{(m)}_n}}, and then estimate Θ_ML employing Equation (3.12). Given the next action
label l^1_{i+1}, we set x^{(m)}_{N_m+1} = (n^1_i, n^2_j) to obtain the conditional probability P(n^1_i, n^2_j | l^1_{i+1}) with
Equation (3.13).
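Evaluating the density of Equation (3.13) never requires forming the full covariance matrix explicitly; a sketch of the log-density under the semi-tied model (our own illustration, not code from the thesis):

```python
import numpy as np

def semi_tied_gaussian_logpdf(x, mean, diag_var, H):
    """Log-density of x under N(mean, H diag(diag_var) H^T): H is the
    decorrelating transform shared by all nodes; diag_var holds this
    node's diagonal covariance entries."""
    d = len(x)
    # Decorrelate the residual: z = H^{-1} (x - mean), so the quadratic
    # form (x-mean)^T Sigma^{-1} (x-mean) becomes an axis-aligned sum.
    z = np.linalg.solve(H, x - mean)
    log_det = 2.0 * np.log(abs(np.linalg.det(H))) + np.sum(np.log(diag_var))
    quad = np.sum(z * z / diag_var)
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + quad)
```

Because |Σ^{(m)}| = |H|² |Σ^{(m)}_{diag}| and Σ^{(m)−1} = H^{−T} Σ^{(m)−1}_{diag} H^{−1}, the decomposition leaves the density unchanged while each node stores only a diagonal.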
C. Gaussian process regression A Gaussian process is a collection of random variables
t = (t(x_1), t(x_2), ..., t(x_N)) that have a joint Gaussian distribution

P(t | {x_n}) = (1/Z) exp( −(1/2) (t − μ)^T C^{−1} (t − μ) ),   (3.14)

or t | {x_n} ∼ N(μ, C), for any finite collection of input data {x_n} = (x_1, x_2, ..., x_N),
where Z is a normalizing constant, μ is the mean of t, and C is the covariance matrix
defined by a parameterized covariance function [23]. The covariance function that we use
is

f(x_i, x_j; Θ_P) = θ_1 exp( −(1/2) Σ_{l=1}^{L} (x^{(l)}_i − x^{(l)}_j)² / r_l² ) + θ_2 + θ_3 δ_{ij},   (3.15)

where x_i, x_j ∈ {x_n}; x^{(l)}_i, 1 ≤ l ≤ L, is the lth element of vector x_i; Θ_P = (θ_1, θ_2, θ_3, r_1, ..., r_L)
are hyperparameters; and δ_{ij} is the Kronecker delta. Let C_{ij} be element (i, j) of
the covariance matrix C. Then C_{ij} = f(x_i, x_j; Θ_P) for all i and j.
Provided with a training data set D_P = {(x_n, t(x_n))}, our objective is to predict
t(x_{N+1}) at a new point x_{N+1}. A straightforward derivation shows that

t(x_{N+1}) | D_P, x_{N+1} ∼ N(k^T C^{−1} t, κ − k^T C^{−1} k),   (3.16)

where

k = (f(x_1, x_{N+1}; Θ_P), f(x_2, x_{N+1}; Θ_P), ..., f(x_N, x_{N+1}; Θ_P))^T

and

κ = f(x_{N+1}, x_{N+1}; Θ_P).

Thus, the most probable prediction of t(x_{N+1}) is k^T C^{−1} t. As C^{−1} is computed in the
training stage, the prediction can be done efficiently.

The prediction accuracy highly depends on the choice of the hyperparameters Θ_P. The
maximum a posteriori (MAP) estimate is

Θ_MAP = arg max_{Θ_P} p(t | {x_n}, Θ_P) p(Θ_P | {x_n}),   (3.17)

which approximates the Bayesian estimate. For efficiency, we adopt the MAP estimation
as it has an analytical solution.
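The prediction of Equation (3.16) with the covariance function of Equation (3.15) can be sketched as follows. This is a simplified illustration: TPros additionally estimates the hyperparameters by the MAP criterion of Equation (3.17), which is omitted here, and in practice C^{−1} is precomputed at training time rather than solved per query.

```python
import numpy as np

def rbf_cov(xi, xj, theta1, theta2, r, theta3=0.0, same=False):
    """Covariance function of Equation (3.15): per-dimension length
    scales r, bias theta2, and noise theta3 on the diagonal (delta_ij)."""
    k = theta1 * np.exp(-0.5 * np.sum(((xi - xj) / r) ** 2)) + theta2
    return k + (theta3 if same else 0.0)

def gp_predict(X, t, x_new, theta1, theta2, theta3, r):
    """Equation (3.16): returns the predictive mean k^T C^{-1} t and
    variance kappa - k^T C^{-1} k at the new point x_new."""
    N = len(X)
    C = np.array([[rbf_cov(X[i], X[j], theta1, theta2, r, theta3, same=(i == j))
                   for j in range(N)] for i in range(N)])
    k = np.array([rbf_cov(X[i], x_new, theta1, theta2, r) for i in range(N)])
    kappa = rbf_cov(x_new, x_new, theta1, theta2, r, theta3, same=True)
    Cinv_t = np.linalg.solve(C, t)   # in training, C^{-1} would be cached
    Cinv_k = np.linalg.solve(C, k)
    return k @ Cinv_t, kappa - k @ Cinv_k
```

With negligible noise θ_3, the prediction interpolates the training targets: querying at a training input returns (almost) its observed value with (almost) zero variance.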
We apply a Gaussian process to estimate the probability distribution for each scalar
element of (p^e_0, q^e_0, p^e), where n^1_{i+1} = ((p^s_0, q^s_0, p^s), (p^e_0, q^e_0, p^e)). Let (n_k)^1_{i+1} be the kth
scalar element of n^1_{i+1}, 7 ≤ k ≤ 12. Notice that (p^s_0, q^s_0, p^s) is given from A^1_i. We set
x_h = (n^1_h, n^2_m) and t(x_h) = (n_k)^1_{h+1} for every cross edge from a node with action label
l^2_m to that with l^1_{h+1} such that l^1_{h+1} = l^1_{i+1} to obtain D_P = {(x_n, t(x_n))}. We also set
x_{N+1} = (n^1_i, n^2_j), letting |D_P| = N. Given x_{N+1} and D_P, we employ Equation (3.16) to
estimate the probability distribution and t(x_{N+1}), that is, (n_k)^1_{i+1}.
3.4 Motion Coupling and Postprocessing
In this section, we describe how to synthesize a two-character motion. For motion syn-
thesis, we prefer the term “characters” to “players” since we deal with virtual characters
unlike in motion analysis. A pair of characters exchange actions and reactions via cross
edges while traversing their respective single-player motion transition graphs. One of the
characters may be designated as an avatar. In this case, the character is under the control
of the user while still communicating with the other via the cross edges. We first describe
the basic model for two-character motion synthesis, assuming that neither of them is un-
der the user’s control. We then extend the basic model to incorporate the user’s control.
Basic Model Let us revisit Figure 3.5 to set up a situation for motion transition. Suppose
that characters 1 and 2 perform action A^1_i (the ith action of character 1) and A^2_j
(the jth action of character 2), respectively. Without loss of generality, action A^1_i is completed
before action A^2_j. Then character 1 determines the next action A^1_{i+1} to perform,
guided by the motion transition model trained in Section 3.2. Given the current
actions A^1_i and A^2_j, a node of the motion transition graph for character 1 is chosen among
the candidate nodes that are incident from the node with action label l^1_i and also from
the node with action label l^2_j via a cross edge, to determine A^1_{i+1} = (l^1_{i+1}, n^1_{i+1}). There
are two plausible alternatives for choosing the next action: choosing the one with the highest
estimated probability, or randomly choosing a node among the candidates according to
their estimated probabilities. We use the latter to generate a motion stream
with greater variety.
Let l^{1,k}_{i+1}, k = 1, 2, ..., be the action labels of the candidates, as shown in Figure 3.7.
Then, we compute the conditional probabilities P(A^{1,k}_{i+1} | A^1_i, A^2_j) for all k using Equation
(3.1), where A^{1,k}_{i+1} = (l^{1,k}_{i+1}, n^{1,k}_{i+1}), while estimating the parameters n^{1,k}_{i+1} based on Gaussian
processes. Given the conditional probabilities, the next action A^1_{i+1} = (l^1_{i+1}, n^1_{i+1}) with the
estimated numerical parameters is determined at random according to the probabilities.
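The random selection among candidates can be sketched as follows; the probabilities may be left unnormalized, since the constant γ in Equation (3.1) is common to all candidates:

```python
import random

def sample_next_action(candidates):
    """Randomly pick the next action among candidates according to their
    estimated transition probabilities (the second alternative above).
    `candidates` is a list of (action, probability) pairs; the weights
    need not sum to one."""
    actions, weights = zip(*candidates)
    return random.choices(actions, weights=weights, k=1)[0]
```

Choosing by sampling rather than by the argmax is what yields a motion stream with greater variety across runs.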
Figure 3.7: The action labels of the candidates

In order to use an example action as the next action, the node with l^1_{i+1} in the motion
transition graph is first accessed, and then the example action whose numerical parameters
are the most similar to n^1_{i+1} in a Euclidean sense is chosen among the actions belonging
to the action group represented by this node. These actions can be accessed via
the links, stored in the node, to the example actions in the basic data structure B.
Motion Coupling Provided with the newly generated action A^1_{i+1} of character 1, together
with its immediate predecessor A^1_i and the ongoing action A^2_j of character 2, we
deal with two issues: how to stitch A^1_i and A^1_{i+1} seamlessly while preventing motion artifacts
such as foot sliding and penetration, and how to couple the two actions A^1_{i+1} and
A^2_j of the two different characters. Since the first issue has been addressed rather well
during the last decade, we only deal with the second issue; for the first, we refer
the readers to the work in [24, 33, 53]. In motion coupling, we address two problems:
how to make the two characters face each other, and how to synchronize the interaction
moments between the two characters, such as hitting the face and punching the body.
By using the parameters n^1_{i+1} determined for the next action A^1_{i+1} from n^1_i and n^2_j, the
two characters face each other reasonably well in the resulting animation even without
explicit motion coupling. Our concern is how to compensate for the small errors caused by
sparse example motions. We settle this problem in a simple manner by adjusting the estimated
parameters n^1_{i+1} = ((p^s_0, q^s_0, p^s), (p^e_0, q^e_0, p^e)). In particular, the root position p^e_0 and
orientation q^e_0 at the last frame of the next action are modified to the values halfway
from the chosen parameter values toward their respective estimated ones (see Section
3.4). The rationale for this heuristic is to reflect the estimated root position and orientation,
while avoiding the jerkiness caused by large changes. This simple heuristic works well in our
experiments.
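A sketch of the halfway adjustment, assuming root orientations are represented as unit quaternions; the quaternion blend shown (normalized linear interpolation at the midpoint) is our assumption, not necessarily the thesis's exact implementation:

```python
import numpy as np

def blend_halfway(p_chosen, p_estimated, q_chosen, q_estimated):
    """Move the last-frame root position/orientation halfway from the
    chosen example's values toward the estimated ones. Quaternions are
    blended by normalized linear interpolation (nlerp), a common stand-in
    for slerp at small angles."""
    p = 0.5 * (p_chosen + p_estimated)            # halfway root position
    if np.dot(q_chosen, q_estimated) < 0.0:       # keep quaternions in the
        q_estimated = -q_estimated                # same hemisphere
    q = q_chosen + q_estimated                    # nlerp at t = 0.5 ...
    q = q / np.linalg.norm(q)                     # ... then renormalize
    return p, q
```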
For the second problem, the interaction moments are regarded as the keytimes at which
timewarping is applied for action alignment. By the way in which the example motion stream is
segmented, the interaction moments can be identified easily by examining the contact force
profiles of the actions.
Interactive Model In this section, we extend our basic model to allow user interaction.
One of the characters is designated as an avatar controlled by the user to support on-line
applications. In order to control the avatar, the user supplies a stream of motion
specifications in an on-line manner, which is stored in a queue. Upon finishing the current
action, the avatar dequeues motion specifications one by one to derive the next actions.
Attack motions such as kicking and punching are prescribed explicitly, while
non-attack motions are invoked implicitly whenever the queue is empty. In the former
case, the user is allowed to specify an attack motion using a motion prescription such as
"kick" or "punch", or more specifically with a motion aspect such as "left-kick", "right-kick",
"left-punch", or "right-punch". Accordingly, the avatar refers to the action table
R to search for the candidates for each action induced by the motion prescription, and
chooses one of them guided by the learned transition model described in Section 3.2. For
instance, "kick-up" or "kick-downup" actions are searched for when a "kick" motion is specified.
While searching for such an action, the avatar may perform actions other than the
specified motion in response to the opponent's actions. When the queue is empty, the
avatar follows the mouse pointer driven by the user in an on-line manner. With the cursor,
the user explicitly specifies the horizontal root translation, which is used to update
the corresponding element p^e in the numerical parameters n^1_{i+1} of the next action. The
remaining elements of n^1_{i+1} are estimated in the same way as in the basic model.
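The queue-driven control flow can be sketched as follows; `search_candidates` and `pick_by_transition_model` are hypothetical stand-ins for the action-table lookup in R and the learned transition model, respectively:

```python
from collections import deque

class AvatarController:
    """Sketch of the queue-driven avatar control loop described above."""

    def __init__(self):
        self.queue = deque()   # user-supplied motion specifications

    def prescribe(self, spec):
        """Enqueue an attack prescription, e.g. "kick" or "left-punch"."""
        self.queue.append(spec)

    def next_action(self, search_candidates, pick_by_transition_model):
        """Called when the current action finishes: dequeue an explicit
        attack if one is pending, otherwise fall back to an implicit
        non-attack motion."""
        if self.queue:
            spec = self.queue.popleft()              # explicit attack
            candidates = search_candidates(spec)
        else:
            candidates = search_candidates(None)     # implicit non-attack
        return pick_by_transition_model(candidates)
```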
Postprocessing By the way in which we choose actions, the two characters may not be
perfectly positioned to perform their respective actions, which could result in unexpected
collisions at times. To handle such collisions, we adopt a data-driven physical simulator
based on a PD servo [62], although this is not explicitly stated above. The public-domain
library PhysX [2] was used to implement the simulator. We leave as a future research issue
how to determine the ideal position of each character, free from unexpected collisions,
while avoiding motion discontinuity or jerkiness.
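For reference, the per-joint control law of a PD servo, which underlies such data-driven simulators; the gains kp and kd are tuning parameters, and this sketch is ours rather than the simulator's actual code:

```python
def pd_torque(q_desired, q, qdot, kp, kd):
    """One joint of a PD servo: a spring term pulls the simulated joint
    angle q toward the kinematic target q_desired, while a damping term
    resists the joint velocity qdot."""
    return kp * (q_desired - q) - kd * qdot
```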
4. Results
We performed experiments on an Athlon PC (an AMD Athlon 64 2 GHz processor with 2 GB
of memory). Our human-like articulated figure had 54 DOFs (degrees of freedom): 6 DOFs
for the pelvis, 6 DOFs for the spine, 9 DOFs for each limb, 3 DOFs for the neck, and 3 DOFs
for the head. The sampling rate of the example motion data was 30 Hz or 60 Hz. We first
show results for motion modeling and then those for motion synthesis.
4.1 Motion Modeling
To demonstrate the effectiveness of our motion modeling scheme, we performed experiments
on the locomotive motions and martial arts motions summarized in Table 4.1. We captured
two locomotive motion sequences, referred to as Locomotions A and B. Locomotion
A is sampled at 30 Hz and is 3.2 minutes long; Locomotion B is sampled at 60 Hz and is 7.4
minutes long. In capturing these motion sequences, the performer was allowed to
make any desired combination of locomotive motions while varying speed and turning angle.
Further experiments were conducted on a stream of two-player kickboxing motions
and a stream of Taekwondo motions. The former is 16 minutes long and sampled at 60 Hz;
the latter is 9.8 minutes long and sampled at 30 Hz. All motion streams were obtained by
concatenating motion clips, each of which is slightly less than one and a half minutes long.
We applied our motion labeling schemes to those motion sequences and constructed their
corresponding motion transition graphs. Since a pair of consecutive motion segments may
be from different motion clips, special care is needed in building edges (see Section 2.1.5).
To validate our motion segmentation scheme, we compare its results to those
obtained from manual segmentation. As summarized in Table 4.2, the segmentation frames
produced by our scheme differ from the corresponding manually chosen frames by less than
one frame on average for locomotions. For martial arts motions, however, the maximum
errors are large (10 and 13 frames) because of outliers: it is very difficult to define an optimal
segmentation even for a human expert, since there is no sharp boundary between
steps in martial arts motions. Excluding such outliers, however, the segmentation moments are
highly collocated even for martial arts motions, as shown in Figure 4.1.
Table 4.1: Example motion dataset

                    loco. A   loco. B   Kickboxing   Taekwondo
length              3.2 min   7.4 min   16 min       9.8 min
# of motion clips   3         7         11           8
sampling rate       30Hz      60Hz      60Hz         30Hz
# of characters     1         1         2            2
Table 4.2: Results on motion segmentation in comparison to manual segmentation

                         loco. A      loco. B      Kickboxing    Taekwondo
# of motion segments     312          814          1445          901
manual segmentation      312          814          1433          923
mean error               0.46 frame   0.38 frame   1.94 frames   1.58 frames
std. dev. of error       0.64 frame   0.6 frame    2.79 frames   2.11 frames
minimum error            0 frames     0 frames     0 frames      0 frames
maximum error            2 frames     2 frames     13 frames     10 frames
percentage of outliers   0%           0%           14.7%         9.3%
Figure 4.1: Results on motion segmentation ((a) walk, (b) run, (c) Taekwondo, (d) kickboxing; automatic vs. manual segmentation).
Table 4.3 summarizes the results of motion modeling together with the timing data
for each task of motion modeling. For Locomotion A, we obtained 338 motion segments,
which were classified into 15 groups as shown in Table 4.4. Every motion segment was mapped
onto one of the groups listed in Table 2.2. However, not every group had a motion segment
assigned to it, since some actions did not appear in the example motion sequence.
Thus, the resulting motion transition graph missed two nodes, namely the SLDRF
and FLDRS nodes. For Locomotion B, we obtained similar results. Unlike Locomotion A,
all groups have one or more motion segments assigned to them. In Locomotion B, we
observed that the number of walk-to-stand transitions is significantly smaller than that
of stand-to-walk transitions, since many of the motion clips concatenated in Locomotion B were
initiated by a standing motion and ended with a walking motion. It takes less than one
minute to obtain the motion transition graph from an unlabeled locomotion sequence. For
the kickboxing motion, we obtained 1445 and 1310 motion segments for the two individual
motions, which were classified into 110 and 96 groups, respectively. For the Taekwondo motion,
901 and 791 motion segments were classified into 98 and 114 groups, respectively. Table 4.5
summarizes the classification results for the two-fighter motions. It
took four hours and two hours to classify the kickboxing and Taekwondo motion streams,
respectively, including manual interaction times. We used about 10% of the motion segments
as training data for both motion streams.
Table 4.3: Results on motion labeling

                        loco. A   loco. B   Kickboxing          Taekwondo
                                            player1   player2   player1   player2
segmentation            0.5s      1.7s      3.3s      4.2s      1.4s      1.5s
graph construction      0.3s      1.3s      2.4s      2.1s      0.4s      0.8s
interaction modeling    -         -         1252s     1833s     603s      451s
# of motion segments    338       838       1445      1310      901       791
# of action groups      15        17        110       96        98        114
avg # of edges¹         1.27      1.41      45.2      40.3      27.4      32.6
avg # of cross edges²   -         -         70.4      76.1      42.3      54.7

¹Average # of outgoing edges per node. ²Average # of outgoing cross edges per node.
Table 4.4: Classification results for locomotions (for each footstep-pattern string, the number of motion segments in Locomotions A and B; 338 segments in total for Locomotion A and 838 for Locomotion B).
Table 4.5: Classification results for two-fighter motions (# of actions³)

motion          Kickboxing          Taekwondo
vocabulary      player1   player2   player1   player2
left-punch      126       89        9         11
right-punch     114       112       35        35
left-kick       29        38        21        45
right-kick      98        111       86        56
jump            0         0         13        38
react (to)      203       168       77        73
4.2 Motion Synthesis
To visually validate our motion modeling scheme, we first performed experiments
on locomotive motion synthesis: we perform trajectory refinement and synthesize
locomotive motions with the refined trajectories.

In the first experiment, we show how the final trajectories differ from their initial
versions. As shown in Figure 4.2, two parametric curves (including a straight line) are
sampled to produce the corresponding point streams to be used as the input. Both the
input and output curves are visualized side by side.
The next experiments are performed for data streams sampled with two types of input
devices, an analog joystick and a mouse (see Figure 4.3). For the mouse, time-varying
cursor positions are sampled in an on-line manner to derive force profiles. For the joystick,
force profiles are obtained from stick positions and directions. Motion types are specified
via the keyboard or buttons.

³The # of actions does not include actions with the qualifier "others".
Figure 4.2: Trajectory refinement ((a) straight walking, (b) curved walking).
Figure 4.3: On-line motion synthesis.
Figure 4.4: Leaning due to accelerations (frames 222–250).
Figure 4.5: Human trajectory.
As shown in Figure 4.4, our scheme reflects some dynamical aspects of locomotion, such
as leaning, by incorporating acceleration as a parameter. Figure 4.5 shows a refined trajectory
after clamping outliers and adding details learned from the example motion data. The motions
synthesized with the final trajectories are shown in the accompanying video. Our
scheme facilitates on-line locomotion control without latency. Equipped with the scheme,
the locomotion synthesis system can produce more than 500 frames per second.
More experiments were performed to demonstrate the capability of our method for
two-character motion synthesis. We performed these experiments in two modes: automatic
and interactive. In the automatic mode, our method generated the motions of both
characters automatically while traversing the coupled motion transition graph, guided by
the learned motion transition model. Results are shown in Figure 4.6. As shown in Figure
4.9, our scheme produces interactions that are similar to those in the example motions. In
the interactive mode, one of the characters (in the white shirt) was designated as an avatar.
The motion of the avatar was directed in an on-line manner by specifying a motion to
perform and the position to move to via the keyboard and the mouse, respectively. In both
modes, our method showed real-time performance: more than 30 fps (frames per second)
including rendering time and more than 500 fps excluding it. Figure 4.7
shows snapshots of the resulting two-character animations.
Figure 4.6: Synthesized two-fighter motion (frames 54–126).
Figure 4.7: Two control modes ((a) automatic, (b) interactive).
Figure 4.8: Synthesized interactions ((a) punch, (b) kick).
Figure 4.9: Comparisons with example motions ((a) punch 1, (b) punch 2, (c) low kick, (d) high kick, (e) avoid 1, (f) avoid 2). The upper part of each sub-figure is from synthesized motions, while the lower part is from example motions.
5. Discussion
We begin with locomotive motion synthesis. Relying on the contact forces of a human-like
articulated figure, our motion segmentation scheme is applicable to various footstep-driven
motions such as locomotions and martial arts motions. Our segmentation scheme
is robust enough to require no manual adjustment. However, the unlabeled example motion
should be captured from a real performer rather than created manually by an animator; in
the latter case, the example motion may not exhibit the biomechanical properties that
our scheme heavily relies on.
We have parameterized the trajectory of an example locomotion motion segment with
the horizontal speed of the vehicle and its horizontal acceleration vector at the initial
frame. The rationale for this choice is that the motion segment is so short that the in-
formation at the initial frame characterizes the whole motion segment, which implies that
the motion segment itself is parameterized in the same way. This guarantees latency-free
trajectory construction. The trajectory could be further improved by incorporating a look-
ahead capability while sacrificing responsiveness.
Our locomotion synthesis scheme samples the intended (unknown) trajectory point by
point interactively. Since the user can easily adapt to the clamping to form a human-
in-the-loop feedback process, the intended trajectory can be achieved. As shown in the
first experiment, an input trajectory could be given as a curve. In this case, the curve is
converted to a point stream by properly sampling the curve. However, the curve may not
preserve its global shape when its speed and acceleration are clamped.
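The clamping step can be sketched per input sample as follows; v_max and a_max stand for the speed and acceleration bounds estimated from the example motions, and the whole sketch is a simplified illustration of the human-in-the-loop clamping with our own variable names:

```python
import numpy as np

def clamp_sample(prev_pos, prev_vel, target_pos, dt, v_max, a_max):
    """Clamp one input sample to the learned speed and acceleration
    bounds: the velocity implied by the sampled target is limited first
    in acceleration, then in speed, yielding the next vehicle position."""
    v_want = (target_pos - prev_pos) / dt
    a = (v_want - prev_vel) / dt
    a_norm = np.linalg.norm(a)
    if a_norm > a_max:                     # clamp acceleration
        a = a * (a_max / a_norm)
    v = prev_vel + a * dt
    v_norm = np.linalg.norm(v)
    if v_norm > v_max:                     # clamp speed
        v = v * (v_max / v_norm)
    return prev_pos + v * dt, v
```

When the user drags the cursor faster than the bounds allow, the output trajectory lags behind the raw samples, which is exactly why a pre-drawn curve may not preserve its global shape after clamping.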
Now, we consider two-character motion generation. It must be noted that captured
example two-character motions are, in general, quite different from similar motions in real
situations. Specifically, many combinations of attacks and reactions are simulated by a
pair of performers to avoid injury to either of them, which may deteriorate the quality of
synthesized motions. One solution could be to incorporate physics-based techniques at
the moments of actual contact between the players so as to enhance the quality of the
captured example motions [62, 63].
In the interactive mode, the prescription of the label of an action for the avatar is
directly or indirectly under the user’s control. However, the numerical parameters of the
action are hard to prescribe when a motion is specified. Under the assumption that the
avatar imitates example motions, our strategy is to estimate the numerical parameters
from the prescribed action label and the numerical parameters of the current actions of
the characters using Gaussian processes. However, further investigation is needed to have
the numerical parameters under the user’s control.
6. Conclusions
In this thesis, an example-based framework for on-line, real-time motion synthesis is pro-
posed. A key component of our framework is the motion labeling scheme. We modeled
an unlabeled example motion in terms of labeled motion segments called actions so that
motion generation and transition can be performed effectively in accordance with on-line
motion specifications by the user. In particular, we proposed a hierarchical motion tran-
sition graph to represent a locomotion model and a coupled motion transition graph to
represent two-fighter motions. Exploiting biomechanical observation data on human con-
tact forces and footstep patterns, our scheme decomposes the example motion into groups
of motion segments such that the motion segments in the same group share an identical
footstep pattern. Moreover, based on string processing rather than numerical computa-
tion, our scheme for classifying locomotive motions is extremely robust. This scheme is
further extended for labeling martial arts motions combining a supervised learning tech-
nique called MSVM. As demonstrated in results, dynamic and diverse martial arts motions
are successfully classified. We believe that our motion labeling scheme can be adopted for
labeling other footstep driven motions as well.
We also present an on-line data-driven scheme for effectively prescribing a human pelvis
trajectory. Our scheme analyzes example motion data to extract the information on hu-
man steering behavior including motion details together with bounds on inter-frame accel-
eration variations, acceleration bounds, and speed bounds. Given a stream of point sam-
ples in an on-line manner, our scheme first transforms them one by one to a smooth vehi-
cle trajectory by clamping outliers, exploiting the bounds that have been estimated from
the example motion data. Our scheme then adds motion details to the vehicle trajectory
to obtain a human pelvis trajectory to follow. For purposes of explanation, we concentrate on
the pelvis trajectory using a mouse as the input device. However, our scheme easily generalizes
to other trajectories, such as a COM (center of mass) trajectory or a ZMP
(zero moment point) trajectory, and to input devices other than a mouse, for
example a joystick or even slide bars.
For capturing the interactions between two fighters embedded in the example motion
stream, a dynamic Bayesian network is adopted to train a motion transition model with
the data collected into the coupled motion transition graph. This model facilitates repro-
duction of captured interactions at runtime such that each action by a character is accompanied by a proper reaction by the counterpart in a probabilistic manner. The next
motion for the other character is coupled with the motion of the current character in both
space and time for realistic motion synthesis.
In the future, we would like to generalize the behavior model for locomotive motions to
support more complex movements such as side-stepping and backward walking. For this
purpose, our vehicle model may be generalized such that the orientation of the vehicle is
not necessarily aligned with its velocity. The generalized vehicle can be controlled based on
torques as well as forces. Currently, movements for martial arts are described based on the
pelvis trajectory. The generalized vehicle model may provide a better way for describing
the movements of a character.
The proposed interaction model could possibly be adapted to other two-player sports
games such as tennis and ping-pong, in each of which a variety of dynamic motions are
performed to hit a ball. Other interesting extensions include interactions among multiple
characters observed in team sports such as soccer and basketball; a feasible
solution would be to employ a traditional crowd simulation approach to generate global movements,
while synthesizing detailed local interactions among a small number of players based on
our example-based framework.
Summary (in Korean)
Motion Modeling for On-line Motion Synthesis
In this thesis, we propose an example-based framework for on-line motion synthesis. The
proposed framework consists of three stages: motion modeling, behavior modeling, and
motion synthesis. In the motion modeling stage, an example motion is first segmented
based on the contact forces between the character and the ground. The resulting motion
segments are then grouped so that segments with a similar structure belong to the same
group. Finally, a motion transition graph is constructed by assigning the groups to nodes
and the transitions between groups to edges. In behavior modeling, we build a motion
transition model that connects motion segments so as to adapt to a time-varying
environment and to follow the user's motion specifications given on-line in real time. In
the motion synthesis stage, the graph is traversed according to the on-line motion
specifications with reference to the motion transition model, and the motions generated
at the visited nodes are concatenated to obtain a motion that conforms to the
specifications. Based on this framework, we first address the characteristic movements
and inter-motion transitions arising from the cyclic nature of locomotion. We then
generalize the idea to a method for synthesizing fighting motions of two characters.
Although the present work covers only locomotive and fighting motions, the proposed
method should be applicable to other kinds of biped character motions as well.