박사학위논문
Doctoral Thesis
온라인 동작 생성을 위한 동작 모델링 기법
Motion Modeling for On-line Motion Synthesis
권태수 (權泰秀 Kwon, Taesoo)
전자전산학과 전산학전공
Department of Electrical Engineering and Computer Science
Division of Computer Science
한국과학기술원
Korea Advanced Institute of Science and Technology
2007
DCS
20025806
권태수. Kwon, Taesoo. Motion Modeling for On-line Motion Synthesis. 온라인 동작 생성을 위한 동작 모델링 기법. Department of Electrical Engineering and Computer Science, Division of Computer Science. 2007. 72p. Advisor: Prof. Shin, Sung Yong. Text in English.
Abstract
In this thesis, we propose an example-based framework for on-line motion synthesis.
Our framework consists of three parts: motion modeling, behavior modeling and motion
synthesis. In the motion modeling part, an unlabeled motion sequence is first decomposed
into motion segments, exploiting the contact forces against the ground. Those motion seg-
ments are subsequently classified into groups of motion segments such that the same group
of motion segments share an identical structure. Finally, we construct a motion transition
graph by representing these groups and their connectivity to other groups as nodes and
edges, respectively. In behavior modeling, we build a motion transition model that con-
nects motion segments according to on-line motion specifications, while adapting to the
(time-varying) environment. In motion synthesis, given a stream of motion specifications
in an on-line manner, our system generates a corresponding motion while traversing the
motion transition graph guided by the motion transition model. Based on the framework,
we first address the issues in motion dynamics and transition that arise from the cyclic
nature of locomotions. We then generalize the idea to two-character motions, in partic-
ular, motions for martial arts performed by a pair of characters. Although the focus of
the present work is on locomotive motions and martial arts motions, we believe that the framework of the proposed approach can be conveyed to other footstep-driven motions as well.
Contents
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 Motion modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.2 Behavior modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.1 Online Motion Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.2 Locomotion Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.3 Two-character Interactions . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2 Locomotion Synthesis 13
2.1 Motion Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1.2 Motion Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1.3 Locomotive Motion Classification . . . . . . . . . . . . . . . . . . . . 16
2.1.4 Motion Parametrization . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.1.5 Hierarchical Motion Transition Graph . . . . . . . . . . . . . . . . . . 22
2.2 Example Motion Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2.1 Vehicle Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2.2 Example Vehicle Trajectories . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.3 Motion Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.2.4 Speed and Acceleration bounds . . . . . . . . . . . . . . . . . . . . . . 28
2.3 Locomotion Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3 Two-Character Motion Synthesis 35
3.1 Two-Character Motion Classification . . . . . . . . . . . . . . . . . . . . . . . 35
3.1.1 MSVM-based Classification . . . . . . . . . . . . . . . . . . . . . . . . 37
3.1.2 Rule-based Classification . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.2 Interaction Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3 Background on Underlying Statistical Models . . . . . . . . . . . . . . . . . . 46
3.4 Motion Coupling and Postprocessing . . . . . . . . . . . . . . . . . . . . . . . 49
4 Results 52
4.1 Motion Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.2 Motion Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5 Discussion 62
6 Conclusions 64
Summary (in Korean) 66
References 67
List of Tables
2.1 Masses for links of our articulated figure . . . . . . . . . . . . . . . . . . . . 16
2.2 The strings for actions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.1 Action repertoire table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.1 Example motion dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.2 Results on motion segmentation in comparison to manual segmentation . . 53
4.3 Results on motion labeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.4 Classification results for locomotions . . . . . . . . . . . . . . . . . . . . . . . 55
4.5 Classification results for two-fighter motions . . . . . . . . . . . . . . . . . . 55
List of Figures
1.1 Overall structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1 The magnitude of the contact force and corresponding poses . . . . . . . . . 14
2.2 Contact force and motion segmentation . . . . . . . . . . . . . . . . . . . . . 15
2.3 The representative pose of each phase . . . . . . . . . . . . . . . . . . . . . . 17
2.4 Examples of string encodings . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.5 Motion transition graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.6 Pelvis trajectories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.7 Vehicle trajectory estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.8 Speed profile. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.9 Acceleration bounds - RUN. . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.10 Acceleration bounds - WALK. . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.11 Block diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.1 Jump kick motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.2 Action annotations: All empty boxes denote the “others” class. . . . . . . 37
3.3 Basic data structure B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.4 A node of G1 (or G2) has a set of links pointing to the actions in B . . . 40
3.5 Bayesian network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.6 Reference coordinate frame . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.7 The action labels of the candidates . . . . . . . . . . . . . . . . . . . . . . . 50
4.1 Results on motion segmentation. . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.2 Trajectory refinement. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.3 On-line motion synthesis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.4 Leaning due to accelerations. . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.5 Human trajectory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.6 Synthesized two-fighter motion . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.7 Two control modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.8 Synthesized interactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.9 Comparisons with example motions. Upper parts of each sub-figure are from
synthesized motions while lower parts are from example motions. . . . . . 61
1. Introduction
1.1 Motivation
Synthesizing appealing motions of human-like characters in an on-line, real-time manner is
an important issue in the context of computer games and virtual environments. The tra-
ditional pipeline for on-line, real-time motion synthesis goes through the following steps:
First, the animators manually prepare sets of short motion segments by segmenting captured or key-framed motion sequences. The animators then define rules that determine transitions between pairs of motion segments according to given situations and scenarios. At run-time, a novel motion is synthesized in an on-line, real-time manner by stitching together short motion segments chosen according to these rules. Although this scheme succeeds in producing responsive motions of virtual characters, it has inherent limitations: the traditional animation pipeline requires a large amount of work by skilled animators and programmers, and the quality and variety of motions are often compromised in order to guarantee on-line, real-time performance with limited human resources.
Recently, there have been some research efforts to overcome these limitations by au-
tomating the pipeline employing a set of unlabeled example motion data [3,4,21,33,36,39,
49, 57]. In these approaches, motion transition graphs encapsulate motion segments and
transitions between motion segments. Since the motion transition graphs are automati-
cally constructed from example motion data, synthesized motions retain naturalness and
variety embedded in captured motions. However, the motion transition graphs tend to be
too large to achieve a desired performance for motion search. Therefore, it is very diffi-
cult, if not impossible, to adopt these approaches for online, real-time motion synthesis.
We suggest an example-based framework for online real-time motion synthesis. Our
framework consists of two main components: motion modeling and behavior modeling.
In motion modeling, we construct a novel motion transition graph from example motion
data, where a node represents a group of motion segments of a similar structure, and an
edge represents a transition between a pair of groups, respectively. In behavior modeling,
we build a motion transition model that connects motion segments according to on-line
motion specifications, while adapting to the (time-varying) environment.
Based on the framework, we first address the issues in motion dynamics and transition
that arise from the cyclic nature of locomotions: Given an unlabeled motion sequence of
a human-like articulated figure, the sequence is decomposed into motion segments based
on the contact forces between the figure and the ground. Those motion segments are then
classified into groups of motion segments such that the motions in the same group share
an identical motion type, exploiting the biomechanical observations on footstep patterns.
We also propose a hierarchical motion transition graph to incorporate a motion hierarchy
and transition motions. We then generalize the scheme to two-character motions, in par-
ticular, two-fighter motions. Convincing interactive motions for a pair of characters can-
not be realized by simply juxtaposing their individual motions; the motion of a charac-
ter should be chosen with respect to that of the other, and vice versa. We present an
example-based method for capturing the interactions between two fighters embedded in
the example motion stream.
1.2 Objectives
In this section, we provide the objectives of this thesis. As the proposed framework con-
sists of two components, motion modeling and behavior modeling, we state the objectives
separately for each of the components.
1.2.1 Motion modeling
The objectives for motion modeling are two-fold: controllability and accessibility. By con-
trollability, we mean that our modeling scheme allows the user to specify a desired motion
in an on-line manner. By accessibility, we mean that a desired example motion segment
can be accessed efficiently. To satisfy these requirements, a stream of example motions
is modeled as a motion transition graph, in which nodes and edges represent groups of
motion segments and their transitions, respectively.
Rose et al. proposed a motion transition graph called “verb graph”, where a node
represents a group of motion segments of an identical structure, and an edge represents the
transition from one group to another [51]. As demonstrated in locomotive motion generation
[46,47] and rhythmic motion synthesis [30], the motion transition graphs based on motion
blending have enhanced efficiency and controllability for motion synthesis, while retaining
naturalness embedded in captured motions. To facilitate this approach, the major premise
is the availability of labeled motion data satisfying the following two properties.
• The group of motions at each node should have an identical structure.
• The group of motions at a node should transit seamlessly to that of motions at a
node connected by an edge (possibly a self-edge).
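The two properties above suggest a simple data structure. The following is a minimal sketch, not the authors' implementation; the class name, the string labels, and the representation of segments are all illustrative assumptions.

```python
from collections import defaultdict

class MotionTransitionGraph:
    """Minimal sketch of a verb-graph-like motion transition graph.

    A node holds a group of structurally identical motion segments,
    identified here by a string label; an edge (including a self-edge)
    records which groups may seamlessly follow which.
    """

    def __init__(self):
        self.groups = defaultdict(list)   # label -> list of motion segments
        self.edges = defaultdict(set)     # label -> set of successor labels

    def add_segment(self, label, segment):
        # Segments under the same label are assumed blendable.
        self.groups[label].append(segment)

    def add_transition(self, src, dst):
        # dst == src encodes a self-edge, e.g. a repeating walk cycle.
        self.edges[src].add(dst)

    def successors(self, label):
        return sorted(self.edges[label])
```

At synthesis time, traversal would pick one segment from the current node's group, blend within the group, and follow an outgoing edge to the next group.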
In [51, 46, 47], it was assumed that labeled motion clips are available. Kim et al. [30]
proposed an automatic labeling scheme for rhythmic motions to construct their motion
transition graphs, exploiting motion beats and rhythmic patterns embedded in the mo-
tions.
In this thesis, inspired by the work in [30], we propose a novel on-line motion synthesis
approach for non-rhythmic motions, in particular, locomotive motions such as running and
walking motions, and martial arts motions such as Kickboxing, Karate and Taekwondo.
Unlike rhythmic motions, reference temporal patterns such as motion beats and rhythmic
patterns are not available for these motions in general. Exploiting biomechanical results
on human contact force profiles, we first cut an unlabeled motion sequence into motion
segments to identify motion units called actions. We then classify those motion segments
into groups of motion segments such that the same group of motion segments share an
identical structure while extracting their parameters simultaneously. Finally, we construct
a motion model, based on the motion transition graph.
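As a rough illustration of the first step, a motion sequence can be cut wherever the ground contact force crosses a level. This sketch assumes a scalar per-frame force magnitude and a single fixed threshold, both simplifications; the thesis derives boundaries from biomechanical contact force profiles (Section 2.1.2).

```python
def segment_by_contact_force(force_mag, threshold):
    """Cut a motion sequence at frames where the contact force
    magnitude crosses a threshold.

    force_mag: per-frame list of ground contact force magnitudes
               (an illustrative stand-in for the thesis's profiles).
    Returns a list of (start, end) frame ranges, end exclusive.
    """
    boundaries = [0]
    for i in range(1, len(force_mag)):
        below_prev = force_mag[i - 1] < threshold
        below_now = force_mag[i] < threshold
        if below_prev != below_now:      # threshold crossing
            boundaries.append(i)
    boundaries.append(len(force_mag))
    return list(zip(boundaries[:-1], boundaries[1:]))
```

Alternating low-force (flight or double-support) and high-force (single-support) ranges then become candidate motion segments for classification.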
1.2.2 Behavior modeling
Even with the motion models embodied in these graphs, it is a non-trivial issue to generate convincing motions appropriate to given situations and scenarios. We present example-based
methods for modeling locomotive behavior and interacting behavior of human-like charac-
ters.
Behavior modeling for locomotive motions: Although much research has been done on locomotion synthesis, the steering behavior of human-like characters has not been well addressed. Even during straight walking and running, the pelvis of the human body oscillates due to the rotations and translations induced by the supporting feet. Such
oscillations or curvature variations are the unique characteristics of human steering be-
havior. Simply placing the pelvis along a user-specified trajectory would not produce a
natural motion. This immediately raises an issue: how to incorporate these characteris-
tics into a user-specified trajectory.
For on-line applications, the user commonly prescribes a motion by interactively pro-
viding a motion type and its trajectory. In particular, the trajectory is specified either
explicitly by a point stream that is sampled with an input device such as a mouse, or im-
plicitly by a force profile that is given with a user interface equipped with slide bars (or
a joystick). The former directly produces the trajectory of a human-like figure. Although
it is easy to specify, the trajectory itself is neither precise nor smooth. Moreover, it is far
from a natural human trajectory. On the other hand, integrating an input force profile
that is sampled at each frame, the latter yields a smooth trajectory in an equally-easy
manner. However, the resulting trajectory is not natural, either. In either case, little ef-
fort has been made to produce a natural human trajectory.
No matter which method we employ, it would be difficult to generate a high-quality locomotion from such a poor trajectory. In this thesis, we present a data-driven method for refining an input trajectory for on-line, real-time locomotion synthesis, given the type of a
locomotive motion. Choosing the center of the pelvis as the root of an articulated charac-
ter, we describe how to yield a natural pelvis trajectory from the input trajectory. With-
out loss of generality, we assume that the input trajectory is given in an explicit form, that
is, in the form of a point stream sampled at each frame. The refined trajectory gives the
global pelvis position and orientation at each frame. The refinement is performed frame
by frame in an online manner. Our method performs a two-step refinement: first clamping
speed and acceleration and then adding naturalness. For trajectory refinement, we make
use of example motion data.
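The first of the two refinement steps can be sketched in one dimension as follows; the function name, the scalar setting, and the fixed bounds are illustrative assumptions (the thesis estimates speed and acceleration bounds from example motions, Section 2.2.4).

```python
def clamp_step(prev_pos, prev_vel, target_pos, dt, v_max, a_max):
    """One frame of the first refinement step: clamp the speed and
    acceleration implied by a raw input point before using it.

    Works on a 1-D scalar position for illustration; the thesis
    refines 2-D pelvis trajectories in the same frame-by-frame,
    on-line fashion.  Returns the refined (position, velocity).
    """
    desired_vel = (target_pos - prev_pos) / dt
    # Clamp the acceleration needed to reach the desired velocity.
    accel = (desired_vel - prev_vel) / dt
    accel = max(-a_max, min(a_max, accel))
    vel = prev_vel + accel * dt
    # Clamp the resulting speed.
    vel = max(-v_max, min(v_max, vel))
    return prev_pos + vel * dt, vel
```

The second step, adding naturalness (oscillations and curvature variations learned from example data), would then be layered on top of this clamped trajectory.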
Behavior modeling for two-fighter motions: Aside from their popularity, standing-up martial arts such as Kickboxing, Karate, and Taekwondo have attracted our attention because of their repertoires of diverse and dynamic motions. A pair of players in a match are
allowed to use not only both hands and feet to exchange punches and kicks but also arms
and legs to block the opponent’s attacks. They move in accordance with each other within
a rectangular area, seeking the opportunities to attack while avoiding the opponent’s at-
tacks. Furthermore, the players continuously interact with each other directly through a
combination of attacks and counter-attacks or indirectly by observations on the opponent’s
footwork and bodywork.
In the computer animation community, Karate motions were used to create believable
demonstrations for motion analysis and synthesis [33,24], in which the focus lies on a single
player’s motions. However, convincing interactive motions for a pair of characters cannot
be realized by simply juxtaposing their individual motions. The motion of a character
should be chosen with respect to that of the other, and vice versa. In other words, each
action by a character should be accompanied by a proper reaction by the counterpart
such that they alternate in combination. Moreover, the motions need to be adapted to
the environment including the characters themselves, so that the actions and reactions
are exchanged timely at the right places.
A naive, straightforward solution would be to regard a pair of characters as a sin-
gle entity with at least twice as many degrees of freedom so that the results on tradi-
tional motion analysis and synthesis for single characters could be applied. Such a solu-
tion yielded impressive results in rhythmic motion synthesis, in particular, dance motion
generation [30]. Unlike dance motions, however, martial arts are performed by a pair of
players in a rather asynchronous manner with a variety of motions for each player. For
such asynchronous interactions, it is non-trivial even for a human expert to determine
the next action/reaction of a character: the type of motion, the local joint configurations,
and the relative root position and orientation with respect to the opponent. In this the-
sis, we present an example-based method for analysis and synthesis of two-character mo-
tions while properly capturing and reflecting their interactions embedded in the example
motion stream, of which each frame contains a snapshot of a two-character motion. Our
objectives are two-fold: coping with the inherent diversity of two-character motions and
enhancing the efficiency of motion synthesis for on-line, real-time applications.
1.3 Related Work
Our work is inspired by three areas in character animation: online motion synthesis, loco-
motion control and two-character interactions. We review related results in each of those
areas.
1.3.1 Online Motion Synthesis
Example-based Motion Synthesis: Due to the recent popularity of motion capture
and reuse, there have been rich research results in this area. These results can be clas-
sified into two groups: motion rearrangement [5, 32, 37, 38, 39, 57] and motion blending
[1,26,47,51,52]. The former category of methods synthesizes motions by rearranging mo-
tion segments (also poses) in an example motion stream while retaining details of the ex-
ample motions. On the other hand, the latter category of methods synthesizes a desired
motion in an on-line, real-time manner by blending labeled motion segments. Combining
advantages of both categories of methods, hybrid methods have been studied to generate
high quality motions in an on-line, real-time manner [30,34,43,47,51].
Rose et al. [51] proposed a framework of motion blending based on scattered data interpolation with radial basis functions. They introduced a verb graph for motion transition. Later, Sloan et al. [55] adopted cardinal basis functions for further acceleration. Despite its superb efficiency, this approach was intended for real-time, but not for on-line, motion synthesis.
Park et al. [46, 47] have enhanced the framework of Rose et al. [51] for on-line loco-
motion blending. Their most important contribution is arguably to model labeled motion
clips available in a motion library, by incorporating motion rearrangement [3, 4, 21,33,36,
39, 49, 57] into the framework of motion blending, based on a motion transition graph.
However, the authors did not address how to obtain the labeled motion clips to construct
the motion transition graph.
Recently, Kim et al. [30] also adapted the motion transition graph [46] for on-line
rhythmic motion synthesis. Given an unlabeled example motion sequence, their approach
decomposes the sequence into a set of motion segments and clusters them to obtain a col-
lection of labeled sets of motion segments by exploiting their rhythmic structures. The au-
thors modeled the rhythmic motion sequence as a motion transition graph, where a node
and an edge represent a set of labeled segments and the transition from one labeled segment set to another (possibly itself), respectively. Unfortunately, this method is not applicable
to non-rhythmic motions.
Motion Segmentation and Classification: Motion segmentation has recently emerged
as an important issue. Although research on this issue is still in an early stage, many re-
sults have been presented [5, 7, 8, 10, 19, 20, 29, 30, 33, 34, 44]. Our review focuses on the
results for motion decomposition directly related to our work.
Our starting point is the work by Bindiganavale and Badler [10], in which the mo-
ment of an interaction of an articulated character with an object in the environment is
captured by checking the zero-crossing moments of joint accelerations. Fod et al. [19] and
Kim et al. [30] used this idea to segment a motion sequence, and classified the resulting
motion segments based on principal component analysis (PCA) and K-means clustering.
On top of zero-crossing moments, Kim et al. exploited rhythmic patterns called “motion beats” for motion segmentation and classification, which are hard to generalize to other types of motions.
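The zero-crossing criterion of Bindiganavale and Badler can be sketched as follows for a single scalar acceleration signal; this is an illustrative simplification, not their implementation, which operates on the full set of joint accelerations of an articulated figure.

```python
def zero_crossings(accel):
    """Return frame indices where an acceleration signal changes sign.

    Sign changes in joint acceleration mark candidate segment
    boundaries, e.g. the moment an end-effector reverses direction.
    accel: plain list of per-frame acceleration values (assumed
    already computed, e.g. by finite differences of joint angles).
    """
    cuts = []
    for i in range(1, len(accel)):
        if accel[i - 1] * accel[i] < 0:  # strict sign change
            cuts.append(i)
    return cuts
```

In practice, nearby crossings would be merged and filtered before being used as segment boundaries.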
Arikan et al. [5] employed a support vector machine classifier to interactively anno-
tate motion data. Although this semi-automatic classifier works well for motion annota-
tion, the per-pose annotation scheme may not be precise enough to capture the transi-
tion moments between different types of actions. We get around this problem by annotat-
ing motion streams at the granularity of actions rather than individual poses. Jenkins and
Mataric [29] and Barbič et al. [8] proposed automatic motion classification methods. However, their goals do not include automatic motion blending and transition. Exploiting the temporal correspondence between motions, Kovar and Gleicher [31] proposed an automatic
scheme to extract motion segments that are similar to a query motion. This scheme as-
sumes the availability of query motions. As pointed out by the authors, the scheme some-
times requires manual filtering to obtain blendable motions and does not take into account
motion transition. Müller et al. [44] proposed a motion search method in which the characteristics of a motion are specified by a collection of Boolean functions. For timewarping,
they captured the moments where the values of these functions change. A similar idea was
also used to classify locomotions in Kwon and Shin [34], where a rule-based classifier was
provided based on a symbolic representation scheme of locomotion phases. We propose
a hybrid scheme for motion classification that utilizes both support vector machines and
rule-based classifiers.
1.3.2 Locomotion Control
Motion control has been a recurring theme in character animation, crowd simulation, and
robotics. Our review focuses on the work directly related to on-line locomotion control.
Force-based control : Reynolds [50] adopted a vehicle model to simulate typical steer-
ing behaviors for simple autonomous creatures in an on-line manner. These creatures were
abstracted as simple vehicles, that is, oriented particles with their own dynamics. The
steering behaviors were achieved by integrating applied forces. The vehicle model was fur-
ther extended by incorporating real-time path planning [25]. We also use a vehicle model
to obtain a smooth, feasible trajectory, exploiting example motion data.
Based on the social force model of Helbing and Molnár [27], Metoyer and Hodgins [42]
simulated reactive pedestrians. Interpreting a pedestrian as a vehicle, the 2D trajectory
of the pedestrian was generated by integrating the social force field. Treuille et al. [58] presented a more sophisticated potential field for a similar purpose. Together with the
potential field, motion capture data were used for motion synthesis. Based on force inte-
gration, the force-based approaches were able to yield smooth trajectories. However, these
approaches did not allow for task-level on-line control. Moreover, it is hard to reflect nat-
ural human steering behavior with these approaches. We adopt the idea of force integra-
tion for trajectory refinement.
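A minimal sketch of such force integration for a Reynolds-style vehicle (an oriented particle with its own dynamics) is shown below; the Euler integration scheme, the 2-D tuple representation, and the parameter names are illustrative assumptions.

```python
def integrate_vehicle(pos, vel, steering_force, dt, max_speed):
    """Euler-integrate one step of a simple vehicle model.

    The applied steering force updates the velocity, which is
    truncated to a maximum speed (preserving heading) and then
    integrated into position -- the mechanism behind smooth,
    force-driven steering trajectories.
    """
    vx = vel[0] + steering_force[0] * dt
    vy = vel[1] + steering_force[1] * dt
    speed = (vx * vx + vy * vy) ** 0.5
    if speed > max_speed:                 # truncate, keep direction
        vx, vy = vx / speed * max_speed, vy / speed * max_speed
    return (pos[0] + vx * dt, pos[1] + vy * dt), (vx, vy)
```

Because the position is obtained by integrating a bounded velocity, the resulting trajectory is smooth by construction, which is what makes the vehicle model attractive for trajectory refinement.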
Position-based control : In this category of approaches, a point stream is sampled di-
rectly for on-line locomotion control, based on motion blending. Early work by Guo and
Roberge [26], Wiley and Hahn [60], and Rose et al. [51] laid the groundwork for on-line motion control. Park et al. [46, 47] extended this early work to on-line locomotion synthesis.
Guided by a motion trajectory, they were able to synthesize a locomotion stream in an
on-line, real-time manner. With a sequence of supporting foot positions sampled along
a user-specified trajectory, their work demonstrated an on-line path-following capability.
Mukai et al. [43] further extended this work based on geostatistics. Recently, Kwon and
Shin [34] presented a framework of on-line, real-time locomotion synthesis. This frame-
work automated the whole process of locomotion modeling and synthesis, including mo-
tion segmentation and classification. The pelvis trajectory was specified in an on-line man-
ner by integrating a stream of 2D displacements on the ground. In general, a position-
based approach was not able to yield a smooth trajectory. In addition, the trajectory was
also lacking in naturalness. We refine such a trajectory by imitating the example motion
data.
1.3.3 Two-character Interactions
Interactions between characters are extremely important for synthesis of believable mo-
tions involving two or more characters. Zordan and Hodgins [62] incorporated physical
simulation into data-driven animation to synthesize reactive upper-body motions of a character to various impacts by the opponent in boxing and table tennis. Zordan et al. [63]
extended this approach to full-body motions. Arikan et al. [6] suggested a method to dis-
criminate realistic deformation from its unrealistic versions for example reactive motions
to pushing. The methods in this category did not deal with mutual interactions although
they were able to synthesize various reactive motions to external forces.
Kim et al. [30] demonstrated an impressive animated scene of ballroom dancing by
multiple couples, each couple being considered a single entity with more than twice as many degrees of freedom as a single character. Lai et al. [35] proposed a group motion graph
that represents the positional configurations among multiple characters. However, the au-
thors did not deal with interactions among the characters.
Guided by examples of coupled dancing motions, Hsu et al. [28] generated the motion
of the synthetic partner for a dancer using motion capture data. The motion of the dancer
was used as a control signal to search for an example motion segment from databases.
However, this approach is hard to apply to martial arts motions, where the two characters interact with each other in a rather asynchronous manner both in time and space. Liu et al. [40] presented a physics-based method for creating multi-character motions from short single-character sequences. The authors formulated multi-character motion synthesis as a spacetime optimization problem in which constraints represent the desired
character interactions. However, this paper did not address how to choose input motions
that can be paired. Moreover, the formulation did not allow a real-time task-level control
in an on-line manner.
A coupled hidden Markov model (CHMM) [11, 12] is commonly used to model the
cross dependencies between two or more synchronous signals, for example, audio/visual
signals [14]. However, martial arts are performed by a pair of players in a rather asyn-
chronous manner with a variety of motions for each player. Since such asynchronous be-
havior is hard to model with the CHMM, we propose an asynchronously-coupled dynamic
Bayesian network to model interactions between the two players.
1.4 Contributions
Synthesizing appealing motions of human-like characters in an on-line, real-time manner
is an important issue in the context of computer games and virtual environments. In this
thesis, we present example-based methods for generating locomotions and interactive motions of human-like characters in an on-line, real-time manner. Our technical contributions are three-fold: motion labeling, motion prescription, and interaction modeling.
First, we propose a novel motion labeling scheme that spans both motion segmentation
and classification. We construct a hierarchical motion transition graph reflecting the cyclic
nature of locomotion. To our knowledge, the proposed approach provides the first auto-
matic labeling scheme for locomotive motions allowing both motion blending and transi-
tion. This scheme is further extended for two-character motions to construct a coupled
motion transition graph reflecting interaction behavior between two characters. Combin-
ing the advantages of the support vector machine classifiers and rule-based classifiers, our
motion labeling scheme is general enough to be conveyed to other footstep-driven motions
as well.
Next, we present a novel data-driven scheme for prescribing a desired locomotion in
an intuitive manner. For on-line applications, the user commonly prescribes a motion by
providing the trajectory for a character to follow together with a motion type. It can be
easily specified with an input device such as a mouse. However, the user-specified trajectory is far from a natural human trajectory, since it tends to be jerky and lacks human characteristics such as the oscillations and curvature variations caused by pelvis movements.
We propose a novel data-driven scheme for transforming a user-prescribed trajectory to a
human trajectory in an on-line manner. Employing a vehicle model, our scheme produces
a smooth, feasible trajectory. By imitating the example motion data on top of the vehicle
trajectory, a natural human trajectory can be generated efficiently.
We finally propose an example-based method for capturing the interactions between
Figure 1.1: Overall structure. Example motions and motion specifications enter the analysis stage (motion modeling and behavior modeling), whose output guides the motion synthesis stage to produce output motions.
two fighters embedded in the example motion stream. Martial arts are performed by a
pair of players in a rather asynchronous manner. To model such asynchronous interac-
tions between the two players, we propose an asynchronously-coupled dynamic Bayesian
network. Based on the dynamic Bayesian network, our method can reproduce captured
interactions such that each action by a character is accompanied by a proper reaction by
the counterpart, and vice versa.
1.5 Overview
As illustrated in Figure 1.1, our framework for on-line motion generation consists of two
major components: motion analysis and motion synthesis. The motion analysis step is composed of two tasks, motion modeling and behavior modeling. In motion modeling, unla-
beled example motion sequences of a human-like articulated figure are first decomposed
into motion segments based on the contact forces against the ground. Those motion seg-
ments are then classified into groups of actions such that the motions in the same group
share an identical structure, exploiting the biomechanical observations on footstep pat-
terns. Finally, a motion transition graph is constructed from example motion data, where
a node represents a group of motion segments of a similar structure, and an edge represents a transition between a pair of groups. In behavior modeling, we build a
motion transition model that connects motion segments according to on-line motion spec-
ifications, while adapting to the (time-varying) environment. In motion synthesis, given a
stream of motion specifications in an on-line manner, our system generates a correspond-
ing motion while traversing the motion transition graph guided by the motion transition
model. In what follows, we first explain our framework in the context of locomotive motions, and then generalize it for two-character motions.
Locomotive motions: In motion modeling, we construct a hierarchical motion tran-
sition graph from example locomotion sequences to incorporate a motion hierarchy and
transition nodes. The motion hierarchy represents the cyclic nature of locomotion: cyclic
motions at the coarse level such as running and walking are represented by combining
primitive actions at the fine level, each of which encodes a footstep pattern. The transi-
tion nodes are for seamless transitions among locomotive motions. In behavior modeling,
characteristics of human steering behavior are extracted from example motion data. The
extracted information includes bounds on speed and acceleration along pelvis trajectories
and details of trajectories such as positional and orientational pelvis oscillations.
In runtime synthesis, an input point stream goes through three steps for trajectory
refinement: force extraction, clamping outliers, and adding details. The first step extracts
a force profile from an input point stream. In the second step, the force profile is inte-
grated to produce a smooth vehicle trajectory, while clamping speed and acceleration at
each frame. The last step adds details to the vehicle trajectory to produce a natural hu-
man locomotion. The three steps are performed frame by frame in an on-line manner,
while referring to the information extracted during analysis.
Two-character motions: We further extend our on-line locomotion generation framework to other footstep-driven motions, in particular, motions for martial arts performed
by a pair of players. In motion analysis, a stream of example motions is transformed into
a coupled motion transition graph in order to build an interaction model (or a motion
transition model); given a pair of single-player motion transition graphs, two nodes in dif-
ferent graphs are connected by a cross edge if the action group for one node is followed by
that for the other with a significantly-large probability. We then model the interactions
between players with a dynamic Bayesian network [16,45].
In motion synthesis, the coupled motion transition graph is traversed in an on-line
manner, possibly in accordance with a stream of motion specifications, if any, for an
avatar. While generating a motion for a character, the motion transition graph is accessed
to search for a proper reaction (or counteraction) for the other character in a probabilis-
tic manner, guided by the interaction (or motion transition) model that is built on the
Bayesian network. The next motion for the other character is coupled with the motion of
the current character in both space and time for realistic motion synthesis.
The remainder of the thesis is organized as follows: In Chapter 2, we describe how to
analyze and synthesize locomotive motions. In Chapter 3, we present how to capture the
interactions between two players from an example two-player motion stream. We show
results in Chapter 4 and discuss the weaknesses and limitations of our approach in Chap-
ter 5. Chapter 6 concludes the thesis with some future research topics.
2. Locomotion Synthesis
2.1 Motion Modeling
In this section, we propose an automatic method to construct a motion transition graph,
given unlabeled locomotion data. The locomotion data is first decomposed into motion
segments, and then these segments are classified into groups of actions of identical struc-
tures. The collection of action groups and their connectivity are mapped onto the node
and edge sets of a motion transition graph, respectively.
2.1.1 Preliminaries
We represent a captured (unlabeled) human motion M as a sequence of postures sam-
pled at discrete times called frames. The posture at each frame is described by a tuple (p, q1, q2, · · · , qJ), where p ∈ R³ and q1 ∈ S³ specify the position and orientation of the root, which is the pelvis in our case, qj ∈ S³ gives the orientation of joint j, and J is the number of joints in M.
Motion Half Cycles : Our criteria for motion segmentation are three-fold:
1. Every motion segment should be simple enough to have an intuitive parametrization.
2. Every motion segment should be long enough to contain meaningful motion seman-
tics.
3. An important motion feature should not be split into consecutive motion segments.
Locomotive motions such as walking and running exhibit an inherent cyclic nature.
Because of this nature, locomotion cycles would be apparent candidates for segmentation
units that satisfy these criteria. However, each cycle of such a motion is composed of two
half cycles initiated by left and right footsteps, respectively. Moreover, the characteristics
of the half cycles differ enough to violate criterion 1 if a full cycle is parameterized as a single unit. Hence, we choose motion half cycles as basic segmentation units for cyclic
motions such as walking and running.
Figure 2.1: The magnitude of the contact force and corresponding poses
Biomechanical Observations : For motion segmentation, we rely on biomechanical
literature [48, 61], guided by criterion 3. As illustrated in Figure 2.1, it is well-known in
biomechanics that motions such as walking and running have quite different contact force
patterns.
A running motion is composed of two stages: constrained and unconstrained stages. In
the constrained stage, one of the feet contacts the ground, which causes physical interac-
tion involving contact and friction forces. Thus, this stage has important motion features.
In the unconstrained stage, neither foot contacts the ground, thus the magnitude of the
contact force has a single local minimum in this stage. The motion segment, which is de-
lineated by the frames with a pair of consecutive peaks of unconstrained stages, forms a
half cycle of a run motion. This half cycle completely contains a constrained stage with
important motion features, thus satisfying criterion 3.
Unlike a running motion, a walking motion consists of a single constrained stage, and
thus yields a quite different contact force profile. By a similar analysis to a running mo-
tion, however, we can see that a pair of consecutive local minima of constrained stages (each at the mid-stance of a single-foot support phase) delineate a half cycle containing the important motion features (at a double-foot support phase).
A contact force minimum occurs near the middle of a single limb support phase of a
constrained stage or a flight phase (with no limbs supported) of an unconstrained stage.
Thus, we can identify the type of locomotion at the minimum, which will be exploited in
motion classification. In contrast, critical interactions with the ground occur either
in the double limb support phase while walking or in the single limb support phase while
running, thus resulting in important motion features.
A transition motion between two different motions shares the biomechanical charac-
teristics with both of the motions. For example, the first half of a walk-to-run transition
motion resembles a walking motion and the last half resembles a running motion, as il-
lustrated in Figure 2.1. The COM further sinks down to a valley at the constrained stage
after the walking motion to store the energy for the unconstrained stage of the running
motion.
2.1.2 Motion Segmentation
The center of mass (COM) of a human performer encodes important information for seg-
menting an unlabeled motion sequence M. Let cM(t) be the COM trajectory that gives
the COM at time t, that is,
cM(t) = ( ∑i mi ri(t) ) / ( ∑i mi ),    (2.1)
where mi and ri(t) are the mass and COM position, respectively, for link i of the artic-
ulated figure at time t. The mass of each link given in Table 2.1 is obtained based on
biomechanical data in [61,56].
Figure 2.2: Contact force and motion segmentation
In order to handle diverse, dynamic motions such as punches, kicks, and jumps as well
as moves, we exploit the contact force of the player against the ground. The acceleration
of the COM movement at frame t, denoted by aM(t), is given by the second derivative of cM(t), that is,

aM(t) = d²cM(t) / dt².    (2.2)
Therefore, the contact force F(t) can be obtained as follows:

F(t) = m · aM(t) − m · g,    (2.3)
Table 2.1: Masses for links of our articulated figure

link name        mass      link name         mass
pelvis           1.0       chest             1.5
neck             0.5       head              1.5
left shoulder    0.2       right shoulder    0.2
left upper arm   0.4       right upper arm   0.4
left lower arm   0.4       right lower arm   0.4
left hand        0.2       right hand        0.2
left upper leg   1.0       right upper leg   1.0
left lower leg   1.0       right lower leg   1.0
left foot        0.3       right foot        0.3
where m = ∑i mi is the total mass of the player, and g denotes the acceleration due to gravity.
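As a concrete sketch of Eqs. (2.1) through (2.3), the contact force profile can be estimated from per-frame link COM positions with finite differences. This is an illustrative sketch, not the thesis implementation; the input layout (`link_coms`, `masses`) and a uniform frame spacing `dt` are our assumptions.

```python
# Illustrative sketch: contact force F(t) = m*a(t) - m*g from link COM data.
# `link_coms[t][i]` is the assumed (x, y, z) COM of link i at frame t;
# `masses[i]` follows Table 2.1; frames are assumed uniformly spaced.

def com_trajectory(link_coms, masses):
    """Eq. (2.1): mass-weighted average of link COM positions per frame."""
    total = sum(masses)
    traj = []
    for frame in link_coms:
        c = [sum(m * p[k] for m, p in zip(masses, frame)) / total
             for k in range(3)]
        traj.append(c)
    return traj

def contact_force(traj, total_mass, dt=1.0, g=(0.0, -9.81, 0.0)):
    """Eqs. (2.2)-(2.3): acceleration by central finite differences of the
    COM trajectory, then F(t) = m*a(t) - m*g per interior frame."""
    forces = []
    for t in range(1, len(traj) - 1):
        a = [(traj[t + 1][k] - 2.0 * traj[t][k] + traj[t - 1][k]) / (dt * dt)
             for k in range(3)]
        forces.append([total_mass * (a[k] - g[k]) for k in range(3)])
    return forces
```

For a stationary pose the COM acceleration vanishes and the estimated contact force reduces to the weight −m·g, as expected.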
Without loss of generality, suppose that the motion sequence M is initiated and ended
by a pair of local minima of contact forces. Otherwise, we can cut off the initial and final
segments, each of which is a short segment. Every motion segment, which is delineated by
a pair of consecutive local minima, is not necessarily a half cycle since we also need to consider non-cyclic motions such as standing motions and transition motions.
Let T = {t0, t1, · · · , tr} be the sequence of minima embedded in F(t). As shown in
Figure 2.2, example motion stream M is segmented at every frame ti, 1 ≤ i ≤ r, where
contact force F(ti) exhibits a local minimum. We first identify all standing motion seg-
ments by detecting every frame with a stand pose. Excluding the standing motion seg-
ments, we cut the remaining portion of M into motion segments with the sequence of min-
ima. Together with the standing motion segments, they give the set of motion segments,
S = {s0, s1, · · · , sk} such that M = s0||s1|| · · · ||sk, where || is a concatenation operator.
In principle, si, 0 ≤ i ≤ k should be a half cycle if it is neither a standing motion segment
nor a transition motion. However, this may not be true in practice due to the approxi-
mation error of the contact forces and the noise from motion capture. We will return to
this issue while classifying the motion segments in S.
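The segmentation step above can be sketched as follows. This is a simplified sketch under our own naming: `force_mag` holds the per-frame contact force magnitude |F(t)|, and the stream is assumed (as in the text) to begin and end at force minima, which we approximate by treating the first and last frames as cut points.

```python
# Illustrative sketch: cut the frame range of M at every interior local
# minimum of the contact-force magnitude |F(t)|.

def local_minima(force_mag):
    """Frames t where |F(t)| is strictly below both neighbours."""
    return [t for t in range(1, len(force_mag) - 1)
            if force_mag[t] < force_mag[t - 1]
            and force_mag[t] < force_mag[t + 1]]

def segment(force_mag):
    """Motion segments as (start, end) frame ranges delimited by consecutive
    local minima, so that M = s0 || s1 || ... || sk."""
    cuts = [0] + local_minima(force_mag) + [len(force_mag) - 1]
    return [(cuts[i], cuts[i + 1]) for i in range(len(cuts) - 1)]
```

Standing portions would be excised beforehand, as described above, before the remaining frames are cut at the minima.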
2.1.3 Locomotive Motion Classification
In this section, we describe how to classify the set of motion segments, S = {s0, s1, · · · , sk}, into a collection of groups of actions of identical structures so that the actions in the
Figure 2.3: The representative pose of each phase: (a) S, (b) R, (c) L, (d) D, (e) F
same group are blendable without artifacts. To achieve this goal, we exploit biomechanical
observations on footstep patterns.
Footstep Patterns : Footsteps characterize locomotive motion stages. In particular, a
foot contacts the ground in a constrained stage, and neither foot contacts the ground in
an unconstrained stage. The unconstrained stage has a single flight phase. However, de-
pending on the foot (feet) contacting the ground, the constrained stage is classified into
three phases: a left foot support phase, a right foot support phase, and a double support
phase. To address a standing motion, we add a stand phase. In summary, we have five
locomotion phases characterized by footstep patterns: a left foot support phase (L), a right foot support phase (R), a double support phase (D), a flight phase (F), and a stand phase (S), where the symbols in parentheses denote the corresponding phases. The repre-
sentative pose of each phase is illustrated in Figure 2.3.
String Mapping : Our strategy of motion labeling is to encode each motion segment
in S as a string of symbols in Σ = {L, R, D, F, S}, and to classify the motion segments
with the same string into the same group. The reason that we use the strings rather than
the motion segments is two-fold: First, we can avoid troublesome time-warping to align
the characteristic features. Second, string processing is more robust than numerical com-
putation. To assign a string to a motion segment, we first extract the footsteps from the
motion segment, employing the constraint detection scheme in [41], based on the observa-
tion that a contact foot on the ground maintains its height at the ground level and zero
velocity for some consecutive frames. This scheme worked well: few footsteps were missed in our experiments.
Given the footstep sequence, we can identify all phases in each motion segment. We
first mark the portion of the motion M containing the S phases, based on the fact that an
S phase consists of a sequence of poses with a fixed COM position that lasts longer than a
threshold time. We then detect the F and D phases to mark the remaining portion of M.
A distinct D phase occurs when both feet contact the ground for a time interval. In an F
phase, the feet are above the ground, which one might expect to be easily recognizable.
In practice, some D phases could be misclassified into F phases at times, since a foot
contact status is checked rather conservatively. To prevent this misclassification, we use
the contact force together with the foot contact status. Suppose that no foot contacts the
ground for a portion of M. If the minimum of the contact force is closer to the center of
the portion than the maximum is, then this portion is identified as an F phase. Otherwise,
it is identified as a D phase. Finally, we identify the L and the R phases to further flag
the unmarked portion of M. In both phases, only one foot contacts the ground. While
scanning the motion segment, we divide it into phases and assign a symbol to each phase
to obtain a string.
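The string-mapping step above can be sketched compactly. This is an illustrative sketch with our own names, not the thesis implementation: per-frame phase labels from {'L', 'R', 'D', 'F', 'S'}, obtained from the phase identification described above, are collapsed into the segment's encoding string, which then indexes Table 2.2.

```python
from itertools import groupby

# Action strings of Table 2.2 mapped to their motion-type codes.
TYPE_CODE = {
    'LDR': 'WK', 'RDL': 'WK', 'FLF': 'RN', 'FRF': 'RN', 'S': 'ST',
    'LDRF': 'WR', 'RDLF': 'WR', 'FRDL': 'RW', 'FLDR': 'RW',
    'LDRS': 'WS', 'RDLS': 'WS', 'SRDL': 'SW', 'SLDR': 'SW',
    'SLDRF': 'SR', 'SRDLF': 'SR', 'FRDLS': 'RS', 'FLDRS': 'RS',
}

def encode(phase_labels):
    """Run-length collapse of per-frame phase labels,
    e.g. ['L','L','D','D','R'] -> 'LDR'."""
    return ''.join(symbol for symbol, _ in groupby(phase_labels))
```

For example, `encode(list('SSRRDDLL'))` yields `'SRDL'`, which Table 2.2 classifies as a stand-to-walk transition (SW).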
We have a total of 17 strings encoding all actions, as summarized in Table 2.2.
For example, a walking motion has two strings, LDR and RDL, representing half cycles
beginning with left and right support feet, respectively. These half cycles end with right
and left support feet, respectively after a double support phase. A running motion has
two strings, FLF and FRF, representing half cycles beginning with flight phase followed
by left and right support feet, respectively. A walk-to-run transition motion also has two
strings, LDRF and RDLF, which are not half cycles. These strings show the characteristics
of transition motions; in particular, the three-symbol prefix of each string encodes a walk
motion, and the single-symbol suffix a run motion. We show a few examples of string
encodings in Figure 2.4. All strings are self-explanatory, and thus are not further described
here.
Refinement : Ideally, a motion segment would be an action, that is, a string encod-
ing a motion segment would be a string listed in Table 2.2. However, we may have un-
expected strings that encode some motion segments in S. Therefore, those strings should
be post-processed. Remember that the unlabeled motion sequence M consists of motion
segments si, 0 ≤ i ≤ k in S, that is, M = s0||s1|| · · · ||sk. Suppose that the string f(si) encoding a motion segment si is not in Table 2.2. Consider a tuple, (f(si−1), f(si), f(si+1)), where f(s−1) = f(sk+1) = ε denotes an empty string.
Figure 2.4: Examples of string encodings: (a) RDL, (b) FRF, (c) RDLF
Table 2.2: The strings for actions

string   type code   motion type                action (ground contact status)   half cycle
LDR      WK          walk                       initial support by left foot     yes
RDL      WK          walk                       initial support by right foot    yes
FLF      RN          run                        middle support by left foot      yes
FRF      RN          run                        middle support by right foot     yes
S        ST          stand                      -                                no
LDRF     WR          walk to run transition     initial support by left foot     no
RDLF     WR          walk to run transition     initial support by right foot    no
FRDL     RW          run to walk transition     final support by left foot       no
FLDR     RW          run to walk transition     final support by right foot      no
LDRS     WS          walk to stand transition   initial support by left foot     no
RDLS     WS          walk to stand transition   initial support by right foot    no
SRDL     SW          stand to walk transition   final support by left foot       no
SLDR     SW          stand to walk transition   final support by right foot      no
SLDRF    SR          stand to run transition    initial support by left foot     no
SRDLF    SR          stand to run transition    initial support by right foot    no
FRDLS    RS          run to stand transition    final support by left foot       no
FLDRS    RS          run to stand transition    final support by right foot      no
Empirically, such a motion segment si is most likely from a high-to-low speed transition
such as a run-to-walk, a run-to-stand, or a walk-to-stand transition. Specifically, a small
contact force minimum may occur while making the transition. Such a minimum would
divide a transition segment into smaller ones, resulting in invalid strings. Therefore, we
concatenate these strings by ignoring the troublesome minimum. The new string will be
f(si−1||si) or f(si||si+1), whichever is a valid string encoding a high-to-low speed transi-
tion.
An invalid string is also produced, very rarely, in a low-to-high speed transition when a contact force minimum is missed in a single-foot support or flight phase. In this case, we split si into two parts, si^1 and si^2, such that f(si^1) and f(si^2) are valid, by enumerating all possible cases.
After the suggested string refinement, we rarely encounter invalid strings. The remain-
ing invalid strings, if any, are discarded. Further refinement would result in a motion seg-
ment, whose contact force profile is quite different from the others with the same string,
which would later cause artifacts in motion blending.
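The merge rule for high-to-low speed transitions can be sketched as a greedy pass over the segment strings. This is an illustrative simplification under our own assumptions: `join` models the encoding of a concatenated segment (a shared boundary symbol, arising from a split inside one phase, collapses into a single symbol), and segments whose string cannot be repaired are discarded as described above.

```python
# Valid action strings of Table 2.2.
VALID = {'LDR', 'RDL', 'FLF', 'FRF', 'S', 'LDRF', 'RDLF', 'FRDL', 'FLDR',
         'LDRS', 'RDLS', 'SRDL', 'SLDR', 'SLDRF', 'SRDLF', 'FRDLS', 'FLDRS'}

def join(a, b):
    """Encoding of a concatenated segment: collapse a repeated boundary
    symbol so e.g. 'FRD' followed by 'DL' encodes as 'FRDL'."""
    return a + b[1:] if a and b and a[-1] == b[0] else a + b

def refine(strings):
    """Greedy left-to-right pass: merge each invalid string with its
    predecessor or successor when the joined string is valid; otherwise
    discard it (rare in practice)."""
    out = []
    i = 0
    while i < len(strings):
        s = strings[i]
        if s in VALID:
            out.append(s)
        elif out and join(out[-1], s) in VALID:
            out[-1] = join(out[-1], s)          # merge with predecessor
        elif i + 1 < len(strings) and join(s, strings[i + 1]) in VALID:
            out.append(join(s, strings[i + 1]))  # merge with successor
            i += 1
        i += 1
    return out
```

For instance, a run-to-stand transition that a spurious minimum split into 'FR' and 'DLS' is restored to the single valid string 'FRDLS'.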
Let B be the sequence of motion segments after refinement, that is,
B = (m0, m1, · · · , mb).    (2.4)
M is not necessarily equal to m0||m1|| · · · ||mb since some motion segments si, 1 ≤ i ≤ k in M could be discarded during refinement. However, mi, 0 ≤ i ≤ b is an action, and
thus f(mi) is a valid string. By string mapping and refinement, we have classified the
motion segments in B into a maximum of 17 groups of actions such that the actions in
each group share the same string. For future reference, we denote by C the collection of such action groups, that is,

C = (g0, g1, · · · , gc),    (2.5)

where gi, 0 ≤ i ≤ c < 17, is a group of motion segments.
2.1.4 Motion Parametrization
In this section, we first describe how to parameterize a motion segment and then explain
how to extract the keytimes for feature alignment.
Parametrization : A motion segment can be parameterized in different ways depending
on how it will be used. Our objective for motion parametrization is on-line locomotion
control. Therefore, motion parameters should include the information such as the type of
motion, the destination or direction of the motion, and the speed of the motion.
We identify a motion segment m in the set B with a parameter vector,
p(m) = (t(m), ft(m), v(m), az(m), ax(m)),    (2.6)
where t(m), ft(m), v(m), az(m) and ax(m) denote the motion type, the foot contact sta-
tus, the speed, the tangential acceleration and the lateral acceleration of m, respectively.
The motion type and the foot contact status are directly obtained from Table 2.2 by table
search with the string representing m. We will explain later how to derive v(m), az(m), and ax(m).
Let xe(t) be a trajectory that well characterizes the example locomotion M; we will explain later how to extract such a trajectory. We extract the parameters of motion segment m from xe(t). The speed v(m) is defined as follows:

v(m) = ‖ẋe(s)‖,    (2.7)

where s is the start frame of m and ẋe(s) is the first derivative of xe(t) at time s. v(m) characterizes the speed along xe(t).
Letting Tm(j) = ẋe(j) / ‖ẋe(j)‖, the tangential acceleration parameter az(m) is given by

az(m) = ẍe(s) · Tm(s).    (2.8)
Figure 2.5: Motion transition graph: (a) coarse level, (b) fine level
By incorporating az(m), forward (or backward) leaning arising from sudden speed change
is implicitly parameterized.
Finally, the lateral acceleration ax(m) is defined as

ax(m) = ẍe(s) · ( (0, 1, 0)T × Tm(s) ).    (2.9)
ax(m) approximates the average turning speed for m, and also reflects implicitly lateral
leaning due to turning.
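Equations (2.7) through (2.9) can be sketched numerically; this is an illustrative sketch (names and the finite-difference derivative estimates are our assumptions), with `traj` holding (x, y, z) samples of the characteristic trajectory per frame and `s` the segment's start frame.

```python
import math

def _dot(a, b):
    return sum(a[k] * b[k] for k in range(3))

def _cross(a, b):
    return [a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]]

def segment_parameters(traj, s, dt=1.0):
    """Speed v(m), tangential acceleration az(m), and lateral acceleration
    ax(m) at the segment's start frame s, per Eqs. (2.7)-(2.9)."""
    vel = [(traj[s + 1][k] - traj[s - 1][k]) / (2.0 * dt) for k in range(3)]
    acc = [(traj[s + 1][k] - 2.0 * traj[s][k] + traj[s - 1][k]) / (dt * dt)
           for k in range(3)]
    speed = math.sqrt(_dot(vel, vel))                  # v(m), Eq. (2.7)
    tangent = [v / speed for v in vel]                 # T_m(s)
    az = _dot(acc, tangent)                            # Eq. (2.8)
    ax = _dot(acc, _cross([0.0, 1.0, 0.0], tangent))   # Eq. (2.9)
    return speed, az, ax
```

For a straight accelerating trajectory the lateral term vanishes, since the acceleration is parallel to the tangent.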
Keytime Extraction : Every keytime of a motion segment is a moment at which a
motion feature occurs. The keytimes in locomotion are the moments of heel-strikes and
toe-offs, which are used to time-warp the motions to be blended for feature alignment.
These occur only at the start and end frames of every L or R phase, which has already
been identified. Thus, it is easy to detect the keytimes. Since every contact force minimum
initiates or ends a motion segment, the minima can also be considered as keytimes.
2.1.5 Hierarchical Motion Transition Graph
To reflect the cyclic nature of locomotion, we propose a motion transition graph with
two-level hierarchy as shown in Figure 2.5. The building blocks at the coarse level are
locomotive motions and nine transition motions (see Figure 2.5 (a)), while those at the
fine level are motion segments (see Figure 2.5 (b)).
Let G = (N(Nc, Nf), A(Ac, Af)) denote a motion transition graph, where (Nc, Ac) represents the conventional motion transition graph [46, 47, 30]. The node set N consists of two node sets, Nc and Nf, which represent the building blocks at the coarse level and those at the fine level, respectively. Accordingly, the edge set A is composed of two edge sets, Ac and Af, which connect the corresponding nodes.
Our strategy is first to construct the fine-level graph according to B and C as defined
in Equations 2.4 and 2.5, respectively, and then to complete the graph by adding the
structure at the coarse level as illustrated in Figure 2.5 (a).
Guided by Table 2.2, we initially construct the fine-level graph as shown in Figure 2.5.
Every action group in C gives rise to a node in Nf . If there is a pair of motion segments,
mp and mq in B such that mp ∈ gi, mq ∈ gj , and mp||mq ∈ M, then (mp,mq) is mapped
onto an edge in Af if it is not yet added.
To add the coarse-level structure composed of Nc and Ac, we first classify the fine-level
nodes in Nf into the coarse-level nodes in Nc according to Table 2.2. Then, the coarse-
level edges in Ac are obtained accordingly. Specifically for an ordered pair of coarse-level
nodes (g, h), (g, h) is an edge in Ac, if there are a pair of fine-level nodes x and y such
that x and y are respectively classified into g and h, and (x, y) is an edge in Af .
Finally we point out that our graph construction scheme can be easily extended for
multiple unlabeled motion sequences since they can be concatenated into a single sequence.
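The two-level construction can be sketched compactly. In this illustrative sketch (names are ours), a fine-level node is keyed by its action-group string, fine-level edges come from adjacency of consecutive segments in B, and the coarse level is obtained by projecting strings onto the motion-type codes of Table 2.2.

```python
# Projection of action strings onto coarse-level motion types (Table 2.2).
COARSE = {'LDR': 'WK', 'RDL': 'WK', 'FLF': 'RN', 'FRF': 'RN', 'S': 'ST',
          'LDRF': 'WR', 'RDLF': 'WR', 'FRDL': 'RW', 'FLDR': 'RW',
          'LDRS': 'WS', 'RDLS': 'WS', 'SRDL': 'SW', 'SLDR': 'SW',
          'SLDRF': 'SR', 'SRDLF': 'SR', 'FRDLS': 'RS', 'FLDRS': 'RS'}

def build_graph(segment_strings):
    """segment_strings: f(m0), ..., f(mb) for consecutive segments in B.
    Returns fine-level nodes/edges and the projected coarse-level ones."""
    fine_nodes = set(segment_strings)
    fine_edges = {(a, b)
                  for a, b in zip(segment_strings, segment_strings[1:])}
    coarse_nodes = {COARSE[s] for s in fine_nodes}
    coarse_edges = {(COARSE[a], COARSE[b]) for a, b in fine_edges}
    return fine_nodes, fine_edges, coarse_nodes, coarse_edges
```

As noted above, multiple example sequences can simply be concatenated before this pass.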
2.2 Example Motion Analysis
Although locomotion synthesis has been studied extensively, the steering behavior of human-like characters has not been well addressed. To get a concrete feel, consider Figure
2.6. Figures 2.6 (a) and (b) show how user-specified trajectories (colored red) are differ-
ent from actual human trajectories (colored green) during straight walking and running.
The oscillations of the actual trajectories are due to the pelvis movements (rotations and
translations) caused by supporting feet. Figures 2.6 (c) and 2.6 (d) show the variations
of actual pelvis trajectories during curved walking and running. Such oscillations or cur-
vature variations are the unique characteristics of human steering behavior. Simply plac-
ing the pelvis along a user-specified trajectory would not produce a natural motion. This
immediately raises an issue: how to incorporate these characteristics into a user-specified
trajectory.
In this section, we analyze the pelvis trajectory of an example motion stream to obtain
the trajectory of a simplified vehicle and motion details along the vehicle trajectory. A
Figure 2.6: Pelvis trajectories
pelvis trajectory can be reproduced by adding the motion details to the vehicle trajectory.
In this sense, the trajectory analysis stage can be viewed as the inverse of the synthesis
process. We also extract bounds on speed and acceleration for clamping outliers along the
vehicle trajectory.
2.2.1 Vehicle Model
For ease of control, we abstract a human-like character as a particle with an orientation.
The particle is constrained to move on the floor. The orientation of the particle is aligned
with its velocity. Such a particle is called a vehicle [15, 50, 25].
The state of a vehicle at time t is specified by an ordered tuple (xt, ẋt, θt), where xt, ẋt, and θt denote the position, velocity, and orientation of the vehicle, respectively. The orientation of a vehicle θt is defined as a unit quaternion:

θt = ( h · ẋt/‖ẋt‖ , h × ẋt/‖ẋt‖ ),    (2.10)

where h is the unit halfway vector between (0, 0, 1)T and ẋt/‖ẋt‖. Assuming that the vehicle has a unit mass, the state of the vehicle evolves as follows:

ẋt = ẋt−1 + ft−1;  xt = xt−1 + ẋt−1.
Figure 2.7: Vehicle trajectory estimation: (a) pelvis trajectory, (b) vehicle trajectory, (c) L-trajectory, (d) R-trajectory
When ẋt is a zero vector, θt is not well-defined. In this case, θt is set to the most recent valid orientation before time t.
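The per-frame update above can be sketched as a simple integrator on the ground plane. For brevity this illustrative sketch (not the thesis code) represents positions and velocities as 2D (x, z) vectors and the heading as a planar angle rather than the unit quaternion of Eq. (2.10):

```python
import math

def step(pos, vel, force):
    """One frame of the unit-mass vehicle update:
    x_t = x_{t-1} + xdot_{t-1}, xdot_t = xdot_{t-1} + f_{t-1}."""
    new_pos = [pos[k] + vel[k] for k in range(2)]
    new_vel = [vel[k] + force[k] for k in range(2)]
    return new_pos, new_vel

def heading(vel):
    """Heading aligned with the velocity; undefined (None) for a zero
    velocity, in which case the caller keeps the most recent valid one."""
    if vel[0] == 0.0 and vel[1] == 0.0:
        return None
    return math.atan2(vel[0], vel[1])  # angle measured from the +z axis
```

In the full model the planar angle would be lifted back to the quaternion of Eq. (2.10) via the halfway-vector construction.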
2.2.2 Example Vehicle Trajectories
A single locomotion stream consists of one or more cyclic motions such as walking and
running together with transition motions between different kinds of motions. The whole
stream is delimited by a pair of standing motions. Without loss of generality, there is
no intervening standing motion in the stream. For a cyclic motion, the characteristics of steering behavior repeat at the frequency of a full cycle. We make use of the
cyclic nature of locomotion to extract a vehicle trajectory from the pelvis trajectory for
the motion stream.
Figure 2.8: Speed profile.
As illustrated in Figure 2.7, our basic idea is to construct a pair of curves, called L-
trajectory (colored red) and R-trajectory (colored blue), and then to average these curves
to obtain a vehicle trajectory (colored black). In order to compute L-trajectory, our scheme
samples the position of the pelvis at the first frame of every cycle (L-cycle) that is initiated by a left supporting foot. An L-cycle starts at the middle frame of a left supporting foot phase. Let {y(ti)}, 0 ≤ i ≤ n, be the sequence of all sampled pelvis positions projected onto the ground, where ti denotes the time (frame) at which the i-th position is sampled. We find a non-uniform cubic spline curve interpolating {y(ti)} with its knot sequence {ti} to obtain the L-trajectory; we explain this in detail in the next section. The R-trajectory can be constructed in a symmetrical manner.
Let xL(t) and xR(t), t0 ≤ t ≤ tn, denote the L-trajectory and R-trajectory, respectively. Both curves are parameterized with the frame numbers of the same motion stream, and they share the same knot values at the end points. Therefore, their average is well-defined. Our vehicle trajectory, denoted by xe(t), is obtained by averaging the two curves: xe(t) = (xL(t) + xR(t)) / 2. Since both xL(t) and xR(t) are C²-continuous, xe(t) is also C²-continuous. As shown in Figure 2.7 (b) and Figure 2.8, the vehicle trajectory xe(t) (colored black) is smooth, has little oscillation and curvature variation, and lies between the L-trajectory xL(t) and the R-trajectory xR(t).
Curve Fitting : Given a sequence of control points {y(ti)}, 0 ≤ i ≤ n, and the knot sequence {ti}, we describe how to construct a piecewise cubic non-uniform spline curve with C² continuity [9]. Since there are n + 1 control points, we have n cubic polynomial curve segments

yi(t) = ai + bi t + ci t² + di t³,  i = 0, 1, . . . , n − 1,

for t ∈ [0, ti+1 − ti]. We have to determine 4n unknown coefficients. From the C² con-
tinuity conditions, we have 4(n − 1) linear equality equations:

yi−1(ti − ti−1) = yi(0),
yi(0) = y(ti),
y′i−1(ti − ti−1) = y′i(0),    (2.11)
y″i−1(ti − ti−1) = y″i(0),

for all interior points y(ti), i ∈ {1, 2, · · · , n − 1}. Moreover, the curve interpolates the end
points y(t0) and y(tn):

y0(0) = y(t0),
yn−1(tn − tn−1) = y(tn).    (2.12)
Thus, we obtain two additional equations. Since a motion stream is delimited by a pair of standing motions, the first derivatives at the end points vanish, which yields two more equations:

y′0(0) = 0  and  y′n−1(tn − tn−1) = 0.    (2.13)
Using 4n equality equations in Equations (2.11), (2.12), and (2.13), we can determine 4n
unknown coefficients.
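An illustrative sketch of this fitting step follows (all names are ours, not the thesis implementation). Instead of solving the dense system for the 4n polynomial coefficients directly, we solve the mathematically equivalent tridiagonal system for the knot slopes, with the clamped end conditions of (2.13); each coordinate of a 2D trajectory is fitted independently.

```python
def clamped_spline(ts, ys):
    """C2 piecewise-cubic interpolant of (ts[i], ys[i]) with zero first
    derivatives at both ends, satisfying conditions (2.11)-(2.13).
    Solves a tridiagonal system (Thomas algorithm) for the interior knot
    slopes m_1..m_{n-1}, with m_0 = m_n = 0 (clamped ends)."""
    n = len(ts) - 1
    h = [ts[i + 1] - ts[i] for i in range(n)]
    a, b, c, d = [], [], [], []
    for i in range(1, n):
        a.append(1.0 / h[i - 1])
        b.append(2.0 * (1.0 / h[i - 1] + 1.0 / h[i]))
        c.append(1.0 / h[i])
        d.append(3.0 * ((ys[i] - ys[i - 1]) / h[i - 1] ** 2
                        + (ys[i + 1] - ys[i]) / h[i] ** 2))
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, len(b)):
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    m = [0.0] * (n + 1)
    for i in reversed(range(len(b))):
        m[i + 1] = (d[i] - (c[i] * m[i + 2] if i + 1 < len(b) else 0.0)) / b[i]

    def evaluate(t):
        """Evaluate the cubic Hermite segment containing t."""
        i = n - 1
        for j in range(n):
            if t <= ts[j + 1]:
                i = j
                break
        u = (t - ts[i]) / h[i]
        h00 = 2 * u ** 3 - 3 * u ** 2 + 1
        h10 = u ** 3 - 2 * u ** 2 + u
        h01 = -2 * u ** 3 + 3 * u ** 2
        h11 = u ** 3 - u ** 2
        return (h00 * ys[i] + h10 * h[i] * m[i]
                + h01 * ys[i + 1] + h11 * h[i] * m[i + 1])
    return evaluate
```

The Hermite form with C¹-consistent slopes and the slope equations together enforce the C² conditions of (2.11), so the result is the same curve as the coefficient formulation above.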
2.2.3 Motion Details
Choosing the center of the pelvis as the root, the pose of a character at every frame i is represented as a tuple (p0^i, q0^i, q1^i, · · · , qm^i), where p0^i and q0^i respectively denote the position and orientation of the root at frame i, and qj^i, 1 ≤ j ≤ m, is the orientation of joint j at frame i. All orientations are specified as unit quaternions. Given the joint configuration (p0^i, q0^i, q1^i, · · · , qm^i), the details of the pelvis trajectory consist of the position and orientation displacements between the pelvis (root) and the vehicle at all frames.
In order to represent the orientation displacement at each frame i, the pelvis orientation q0^i at frame i is decomposed into two parts, that is, the rotational part about the vertical axis, denoted by qvert^i, and the remainder qoffset^i, such that q0^i = qvert^i · qoffset^i. Here, qoffset^i is expressed in a coordinate-invariant manner. The vertical orientation displacement between the pelvis and the vehicle is given by dqvert^i = qvert^i · (θe^i)^−1. Therefore,

θe^i = (dqvert^i)^−1 · q0^i · (qoffset^i)^−1

or

q0^i = dqvert^i · θe^i · qoffset^i.

Thus, the orientation displacement is characterized by (dqvert^i, qoffset^i).
The position displacement between the pelvis and the vehicle at frame i is given by
p^i_0 − x^i_e, where x^i_e = x_e(i). We represent it in the local coordinate frame of the vehicle
for effectively adding details to the vehicle trajectory:

    (0, dx^i_e) = (θ^i_e)^{−1} · (0, p^i_0 − x^i_e) · θ^i_e,

where dx^i_e is the position displacement in the local coordinate frame. In summary, the
motion details at each frame i are given by (dx^i_e, dq^i_vert, q^i_offset).
Now, a final remark is in order: when an example motion stream is segmented, the
extracted vehicle trajectory and the motion details are also segmented accordingly. Thus,
every motion segment has its own vehicle trajectory and motion details.
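The split of the pelvis orientation into a vertical part and a remainder is a swing-twist decomposition about the up axis. A small sketch under our own conventions (quaternions stored as (w, x, y, z), y assumed vertical; the degenerate case where both w and the y component vanish is ignored):

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a; bw, bx, by, bz = b
    return np.array([aw*bw - ax*bx - ay*by - az*bz,
                     aw*bx + ax*bw + ay*bz - az*by,
                     aw*by - ax*bz + ay*bw + az*bx,
                     aw*bz + ax*by - ay*bx + az*bw])

def qconj(q):
    """Conjugate; also the inverse for unit quaternions."""
    return np.array([q[0], -q[1], -q[2], -q[3]])

def decompose_vertical(q):
    """Split q into q_vert (rotation about the vertical y axis) and the
    remainder q_offset so that q = q_vert * q_offset."""
    twist = np.array([q[0], 0.0, q[2], 0.0])   # keep only the w and y parts
    twist /= np.linalg.norm(twist)
    offset = qmul(qconj(twist), q)             # q_offset = q_vert^{-1} * q
    return twist, offset
```

Recomposing q_vert · q_offset recovers the original orientation, and q_vert has no x or z component, i.e., it rotates purely about the vertical axis.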
2.2.4 Speed and Acceleration Bounds
A raw-captured vehicle trajectory suffers from undesirable characteristics: lack of smoothness,
unrealistically large speed and acceleration, and sudden changes in the force profile.
In this section, we describe how to learn bounds on speed, acceleration, and inter-frame
acceleration changes from example motion data to clamp outliers for later trajectory re-
finement.
Interframe Acceleration Variations: In our model, a vehicle is an oriented particle
with a unit mass. Therefore, ẍ^t_e = f^t for all t, where f^t is the force that is applied to the
particle at time t. The acceleration difference between a pair of consecutive frames reflects
an instantaneous force change, which should be bounded within human ability, that is,

    ‖ẍ^t_e − ẍ^{t−1}_e‖ ≤ u_f.

The bound u_f for a group of example motion segments is obtained by taking the maximum
magnitude of acceleration differences between consecutive frames over all motion data in the
group.
Figure 2.9: Acceleration bounds - RUN (components a_x and a_z plotted against speed ẋ_e).
Figure 2.10: Acceleration bounds - WALK (components a_x and a_z plotted against speed ẋ_e).
Acceleration Bounds: Since a vehicle is constrained to move on the floor, an acceleration
ẍ_e has two components: a tangential component a_z and a lateral component a_x,
where

    a_z = ẍ_e · (ẋ_e/‖ẋ_e‖), and
    a_x = ẍ_e · [(0, 1, 0)^T × (ẋ_e/‖ẋ_e‖)].        (2.14)

Since ẍ^t_e = f^t, ẍ^t_e is bounded for all t. Thus, both a_x and a_z are bounded. For cyclic
motions such as walking and running, the two components show quite different behaviors
with respect to the vehicle speed ‖ẋ_e‖, as shown in Figures 2.9 and 2.10. The tangential
component is bounded within a fixed interval that contains the average regardless of ‖ẋ_e‖,
while the variations of the lateral component are amplified as ‖ẋ_e‖ increases. a_x represents
a centripetal acceleration when the vehicle moves along a circular path. For such a circular
motion, the turning speed is given by a_x/‖ẋ_e‖. Since the turning speed is not arbitrarily
large, a_x is linearly bounded by ‖ẋ^t_e‖. This supports our intuition that pelvis rotations
result in motion details along the vehicle trajectory.
Based on this observation, we estimate different bounds on these components:

    ā_z − c_z ≤ a^t_z ≤ ā_z + c_z  for all t, and
    |a^t_x| ≤ c_x · ‖ẋ^t_e‖  for all t.

Here, ā_z is the average of the tangential components over all frames in the example motion
data. c_z and c_x are chosen so that 90% of the tangential and lateral components lie within
the bounds.
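The bound estimation described in this section can be sketched as follows. This is an illustrative recipe under our own naming (per-frame velocity and acceleration rows; a 90th-percentile rule stands in for the "90% of components lie within the bounds" criterion):

```python
import numpy as np

def estimate_bounds(vel, acc, coverage=90.0):
    """Estimate (u_f, a_z_bar, c_z, c_x) from example vehicle data,
    one row per frame.  vel/acc are (frames x 3) arrays."""
    speed = np.linalg.norm(vel, axis=1)
    tangent = vel / speed[:, None]                 # unit velocity direction
    up = np.array([0.0, 1.0, 0.0])
    lateral = np.cross(up, tangent)                # (0, 1, 0) x (v/|v|)
    a_z = np.einsum('ij,ij->i', acc, tangent)      # tangential component
    a_x = np.einsum('ij,ij->i', acc, lateral)      # lateral component
    # max inter-frame acceleration change -> u_f
    u_f = np.max(np.linalg.norm(np.diff(acc, axis=0), axis=1))
    a_z_bar = a_z.mean()
    # half-widths so that `coverage` percent of samples fall inside
    c_z = np.percentile(np.abs(a_z - a_z_bar), coverage)
    c_x = np.percentile(np.abs(a_x) / speed, coverage)
    return u_f, a_z_bar, c_z, c_x
```

Note that c_x is estimated on a_x/‖ẋ_e‖, matching the linear-in-speed form of the lateral bound.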
For transition motions such as standing-to-walking and walking-to-running, the
characteristics of a motion at the two extreme frames reflect different types of motions. Thus,
we estimate the bounds of the components at both extremes and interpolate the results
to obtain time-varying bounds.
Speed bounds: For cyclic motions such as walking and running, the speed variations
over time are insignificant, assuming that motion segments are short, for example, spanning
a half cycle [34] or a full cycle [46, 43]. In this case, speed variations can be linearly
approximated even for transition motions. Thus, both upper and lower speed bounds for
a group of motion segments are estimated at the initial and final frames, from which the
time-varying speed bounds over all frames are obtained by linear interpolation. At the
initial frame, the speed bounds can be acquired by finding the minimum and maximum
speeds among all initial frames of motion segments in the group. Those at the final frame
are computed symmetrically.
Figure 2.11: Block diagram
2.3 Locomotion Control
In this section, we describe how to refine a stream of raw-sampled 2D positions to obtain
a human trajectory. The 2D positions are sampled at a rate of 30 Hz in an on-line
manner with an interactive input device such as a mouse. As illustrated in Figure 2.11, an
input point stream goes through three steps for trajectory refinement: force extraction,
clamping outliers, and adding details. The first step extracts a force profile from an input
point stream. When a force profile is given as the input, this step is skipped. The
rationale for force extraction is that the smoothness of a trajectory is easy to achieve with a
force profile as long as the profile is continuous. In the second step, the force profile is integrated
to produce a vehicle trajectory, while clamping speed and acceleration at each frame. The
last step adds details to the vehicle trajectory to produce a natural pelvis trajectory. The
three steps are performed frame by frame in an on-line manner, while referring to the
information extracted during analysis. Since the example motion data are decomposed into
motion segments, which are classified into groups according to their logical similarity
and parameterized by a motion type and the speed and acceleration at the initial frame,
we can determine the sequence of local joint configurations of a user-prescribed motion
segment immediately after the first trajectory position is sampled.
Force Extraction: Without loss of generality, we use a mouse to sample a point stream.
We convert the raw-captured point stream into a force profile for smooth trajectory
construction. Let x^0 be the vehicle position at the last frame of the previously-synthesized
motion segment. When x^0 is not available (the first motion segment), the user should
sample an extra position x^0. Suppose that cursor position S^t at time t has just been
sampled. Then, S^t is transformed to the force to be exerted on the vehicle model
for a character. We employ a spring-damper model to transform S^t into a force:

    f^t = α(S^t − x^{t−1}) − βẋ^{t−1}.

Coefficients α and β are chosen empirically.
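In code, the spring-damper transformation is a one-liner per coordinate. A minimal sketch (the gain values are placeholders; the thesis chooses α and β empirically):

```python
def extract_force(cursor, x_prev, v_prev, alpha=30.0, beta=8.0):
    """Spring-damper model: pull the vehicle toward the sampled cursor
    position while damping its current velocity,
    f = alpha * (S - x_prev) - beta * v_prev, per coordinate."""
    return [alpha * (s - x) - beta * v
            for s, x, v in zip(cursor, x_prev, v_prev)]
```

The spring term steers the vehicle toward the cursor; the damping term suppresses the oscillation a pure spring would cause.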
Clamping Outliers: We now describe how to obtain a vehicle trajectory in an on-line
manner by force integration while clamping the inter-frame acceleration change, the
acceleration, and the speed. As defined in Section 2.2.1, the state of the vehicle at time t is
specified by (x^t, ẋ^t, θ^t), where x^t, ẋ^t, and θ^t denote the position, velocity and orientation of
the vehicle. The vehicle trajectory is fully specified by a sequence of vehicle states at all
frames. Again, by the unit-mass assumption, the acceleration of the vehicle at frame t is
given by ẍ^t = f^t. Thus, the unclamped acceleration ẍ^t is trivially obtained at every frame t.
Suppose that we are now at frame t. The inter-frame acceleration change is clamped
and added to ẍ^{t−1}, using the corresponding bound estimated in Section 2.2.4. That
is,

    ẍ^t ← ẍ^{t−1} + clamp(‖ẍ^t − ẍ^{t−1}‖) · (ẍ^t − ẍ^{t−1}) / ‖ẍ^t − ẍ^{t−1}‖.
The tangential and lateral components of ẍ^t are clamped using their respective bounds at
frame t, and the results are summed to further refine ẍ^t:

    ẍ^t ← clamp(a^t_x) · ((0, 1, 0)^T × ẋ^{t−1}/‖ẋ^{t−1}‖) + clamp(a^t_z) · ẋ^{t−1}/‖ẋ^{t−1}‖.
From ẍ^t, we derive ẋ^t and clamp it:

    ẋ^t ← ẋ^{t−1} + ẍ^t ∆t,
    ẋ^t ← clamp(‖ẋ^t‖) · ẋ^t/‖ẋ^t‖,

where ∆t is the inter-frame time increment. The position of the vehicle at time t is given
by

    x^t ← x^{t−1} + ẋ^t ∆t,

and its orientation θ^t is computed using Equation (2.10).
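The per-frame update can be sketched as a small integrator. This is a simplified illustration of the clamping pipeline (only the inter-frame acceleration change and the speed are clamped here; the thesis additionally clamps the tangential and lateral acceleration components and updates θ^t):

```python
import numpy as np

def clamp_norm(v, bound):
    """Scale vector v down so its norm does not exceed bound."""
    n = np.linalg.norm(v)
    return v if n <= bound else v * (bound / n)

def step_vehicle(x, v, a_prev, force, dt, u_f, v_max):
    """One frame of on-line force integration with outlier clamping."""
    a = np.asarray(force, float)                 # unit mass: a = f
    a = a_prev + clamp_norm(a - a_prev, u_f)     # limit force change
    v = clamp_norm(v + a * dt, v_max)            # integrate, limit speed
    x = x + v * dt                               # integrate position
    return x, v, a
```

Even when the input force is far outside the learned bounds, the clamps keep the resulting trajectory within the speed and acceleration ranges observed in the examples.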
Adding Details: Provided with vehicle state (x^t, ẋ^t, θ^t), we describe how to add motion
details to vehicle position x^t and orientation θ^t in an on-line manner to convert them to
a human pelvis configuration (p^t_0, q^t_0), where p^t_0 and q^t_0 denote the pelvis position and
orientation, respectively. In order to do this, a motion prescription is made by combining
a user-provided motion type with the vehicle speed ‖ẋ^1‖ and vehicle acceleration ẍ^1 at the
first frame (frame 1).
Given the motion prescription, a sequence of local joint configurations is generated
at the first frame by blending example motion segments. Therefore, the number of frames
for the prescribed motion, say k, is fixed at the first frame. The motion details for the
corresponding local poses are also blended by the system. With this information obtained,
the task of adding details is performed frame by frame.
Let (dx^t_e, dq^t_vert, q^t_offset) be the motion details at frame t, 1 ≤ t ≤ k. Then, the pelvis
orientation at frame t is derived by applying dq^t_vert and q^t_offset to vehicle orientation θ^t:

    q^t_0 = dq^t_vert · θ^t · q^t_offset.
In order to compute pelvis position p^t_0, let

    (w, v) = θ^t · (0, dx^t_e) · (θ^t)^{−1},

where v is a 3D vector. Then,

    p^t_0 = x^t + v.

Together with the local joint configuration (q^t_1, · · · , q^t_m), the refined pelvis configuration
(p^t_0, q^t_0) fully specifies the human pose (p^t_0, q^t_0, q^t_1, · · · , q^t_m) at frame t.
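The two formulas above can be sketched directly in quaternion arithmetic. A minimal illustration under our own conventions (quaternions as (w, x, y, z); helper names are ours):

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a; bw, bx, by, bz = b
    return np.array([aw*bw - ax*bx - ay*by - az*bz,
                     aw*bx + ax*bw + ay*bz - az*by,
                     aw*by - ax*bz + ay*bw + az*bx,
                     aw*bz + ax*by - ay*bx + az*bw])

def qconj(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

def add_details(x, theta, dx_e, dq_vert, q_offset):
    """Map a vehicle state back to a pelvis configuration:
    q0 = dq_vert * theta * q_offset,  p0 = x + rotate(theta, dx_e)."""
    q0 = qmul(dq_vert, qmul(theta, q_offset))
    # rotate the local displacement: (w, v) = theta (0, dx_e) theta^{-1}
    v = qmul(theta, qmul(np.r_[0.0, dx_e], qconj(theta)))[1:]
    return x + v, q0
```

With identity details the pelvis coincides with the vehicle (shifted by dx_e); with a non-trivial θ^t, dx_e is rotated into the world frame before being added.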
Motion Blending We adapt the framework of on-line motion blending in [47, 30] to
generate an action at a fine-level node, given a parameter vector. Based on multidimensional
scattered-data interpolation, this framework is composed of four parts: parametrization,
weight computation, time-warping, and posture blending.
In parametrization, the motion segments that are assigned to a fine-level node are
placed at points in the parameter space defined by three numerical parameters: speed,
tangential acceleration, and lateral acceleration. Provided with the motion prescription
(t(m), f^{t(m)}, ‖ẋ^1‖, ẍ^1 · ẋ^1/‖ẋ^1‖, ẍ^1 · ((0, 1, 0) × ẋ^1/‖ẋ^1‖)), the corresponding fine-level node is
visited to compute the contribution of each motion segment stored in this node. We use
a scheme of geostatistical motion interpolation [43] instead of scattered-data interpolation
based on cardinal basis functions [47, 30]. The latter has exhibited quite unstable
behavior, since many motion segments at each fine-level node of a cyclic motion have
similar parameter vectors.
We adopt the incremental time-warping scheme in [47] to align the motion features of
the basic movements to be blended. The keytimes have been extracted in Section 2.1.4.
Finally, the time-warped movements are blended using the incremental posture blending
scheme [47]. The motion details for the corresponding local poses are also synthesized by
blending.
The synthesized movement is adapted to the environment in an on-line manner to pre-
vent foot sliding and penetration, employing an on-line motion retargeting scheme [54] af-
ter computing the target foot position at each frame by blending the foot positions of the
basic movements using the technique given in [47].
Motion Stitching Although our motion synthesis scheme produces a sequence of local
joint configurations of high quality at every fine-level node, some jerkiness may be observed
at the knots between adjacent synthesized sequences, since these are blended separately.
Adapting the scheme in [24], we address this problem: the two sequences are stitched
seamlessly at the knot while propagating the error only forward for on-line motion
synthesis. This guarantees C1 continuity for each degree of freedom.
3. Two-Character Motion Synthesis
In this chapter, we describe how to model a two-player example motion stream as a
coupled motion transition graph, which generalizes the approach of Chapter 2 in terms of both
the structure and the coverage of motions. The issues in motion modeling are three-fold:
motion segmentation, motion classification, and graph construction. For motion segmentation,
we apply the method presented in Chapter 2 to each individual motion. We discuss
each of the remaining issues in a separate section.
3.1 Two-Character Motion Classification
Our focus is on two-player motions in stand-up martial arts such as Kickboxing, Karate
and Taekwondo. The players exchange combinations of punches and kicks while dodging
the opponent's attacks. Punches are delivered with the front parts of the tightly
clenched fists. Kicks are delivered with the parts near the ankles or the knee
bones. We do not deal with grappling or throwing motions, which are observed in Judo
and wrestling. The players are not allowed to attack a fallen opponent.
Given a collection of motion segments S = (s_1, s_2, ..., s_r), motion classification assigns
a label to each motion segment s_i, 1 ≤ i ≤ r, which is a primitive action. Thus, motion
classification can be viewed as a mapping from S to a finite set of action labels such that
S can be classified into disjoint action groups.
To obtain a complete motion taxonomy is hard, if not impossible, even for an expert
in martial arts. Our objective of motion classification is to provide a flexible way of
classifying motions that can be adapted easily to various applications. For our purpose, the
requirements for motion classification are two-fold: controllability and accessibility. By
controllability, we mean that our classification scheme allows the user to specify a desired
motion in an on-line manner. By accessibility, we mean that a desired example action can
be accessed efficiently. To satisfy these requirements, the motion segments in the same
action group share not only structural similarity but also functional similarity.
Our motion vocabulary consists of seven words, each of which corresponds to a motion
or a motion aspect: left-punch, right-punch, left-kick, right-kick, jump, react, and move.
The last word, move, is used to encode the footstep pattern of an action, which is an
important aspect of a motion. The first six words are used to specify a motion for interac-
Figure 3.1: Jump kick motion: (a) height profile; (b) force profile, with the kick-up and kick-down phases marked.
Table 3.1: Action repertoire table

motion aspect    action qualifiers
left-kick        up, down, downup, others
right-kick       up, down, downup, others
left-punch       fwd, bwd, bwdfwd, fwdbwd, others
right-punch      fwd, bwd, bwdfwd, fwdbwd, others
jump             up, down, others
react (to)       kick, punch, others
move             L-L, L-R, L-F, R-L, R-R, R-F, F-L, F-R, F-F
tive applications. The choice of a vocabulary determines the level of user control. There
are also composite motions, such as kicking while jumping and kicking while punching.
In such a case, more than one word is needed to describe a motion. A motion is
represented as a sequence of motion phases, each of which corresponds to a primitive action
(or a motion segment). To characterize an action, we use a set of additional qualifiers for
each word, as shown in Table 3.1.
In order to annotate the motion segments, we follow, in principle, the semi-automatic
approach suggested by Arikan et al. [5]. Based on a support vector machine (SVM), they
were able to successfully annotate motion streams of American football. Our approach
is different from theirs in three aspects. First, our approach is complemented by rule-
based classification. In particular, rule-based classification [44, 34] is adopted for the last
Figure 3.2: Action annotations for a jump kick and for multiple kicks followed by a punch; all empty boxes denote the "others" class.
two motion aspects in Table 3.1. Second, we employ multi-class support vector machine
(MSVM) classifiers [18] rather than binary SVM classifiers to effectively handle multiple,
mutually-exclusive action qualifiers within each motion aspect. Finally, we annotate
motion streams at the granularity of actions rather than poses, to effectively synthesize
motions while skipping time-consuming optimization. An action is annotated with a label,
which in our case is a 7-dimensional vector l = (l_1, l_2, ..., l_7). Each element l_i,
1 ≤ i ≤ 7 of the vector corresponds to an action qualifier for a motion aspect, and takes on an integer
value, that is, l_i ∈ {1, 2, ..., n_i}, where n_i is the number of qualifiers in motion aspect i.
We handle the first five motion aspects in Table 3.1 with MSVM-based classification and
the last two with rule-based classification.
3.1.1 MSVM-based Classification
MSVM classifiers evolve from binary classifiers based on SVMs, and thus both types of
classifiers share similar user interface. Given a sequence of motion segments, the user an-
notates manually a small portion of motion segments as training data, and then an MSVM
classifier automatically takes care of the rest of the segments. To apply an MSVM classi-
fier, a feature vector is extracted from each motion segment. For every motion segment, a
set of N poses of a player is sampled at equally-spaced time instances including the start
and end times of the motion segment to form a feature vector. In our experiments, N=10
works well. We use a public domain library LIBSVM [13] for semi-automatic motion an-
notation.
For motion classification, a human pose is represented by a vector Q = (p_0, q_0, p_1, ..., p_m),
where p_0 and q_0 are the root position and orientation, respectively, and p_i, 1 ≤ i ≤ m is
the position of joint i. The root position and orientation of a sampled pose are given by its
horizontal translation and vertical rotation with respect to the root position and orientation
at the previous sample. We choose the pelvis as the root. Joint position p_i, 1 ≤ i ≤ m
is specified in the local coordinate frame with respect to the root at the current sample.
Our human model has 17 joints, that is, m = 17.
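The feature-vector construction (N poses sampled uniformly, including the first and last frames, then concatenated) can be sketched as follows. This is only the sampling step under our own naming; the classification itself is handed to LIBSVM:

```python
import numpy as np

def feature_vector(segment, n_samples=10):
    """Build an MSVM feature vector from a motion segment, represented
    here as an (n_frames x d) array of pose vectors: sample n_samples
    poses at equally spaced times (first and last frames included) and
    concatenate them into one vector."""
    segment = np.asarray(segment, float)
    idx = np.linspace(0, len(segment) - 1, n_samples).round().astype(int)
    return segment[idx].ravel()
```

The resulting vector has a fixed length N · d regardless of the segment's duration, which is what a standard SVM/MSVM input requires.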
Kick classifiers: Our segmentation scheme divides a kicking motion into two consecutive
actions, "kick-up" and "kick-down", as shown in Figure 3.1. Two consecutive kicks
consist of three parts. The first and last parts are kick-up and kick-down, respectively.
The middle part is a "kick-downup" action. Multiple consecutive kicking motions are
best described with the regular expression

    (up)(downup)*(down),

where * denotes zero or more instances of the action in the parentheses. We annotate
motion segments that do not match any of the primitive actions for kicking motions as "others".
Punch classifiers: Our segmentation scheme shows different behaviors depending on
whether body weight is exerted on the punch or not. In the former case, the character
utilizes body rotation as well as weight shift to maximize the speed of the end-effector.
In this case, a punch consists of two consecutive motion segments, one with the body
rotation coincident with the arm rotation, and one with the body rotation reversed for retraction. We
annotate these two motion segments as "fwd" and "bwd", respectively. Two consecutive
punches consist of three parts, "fwd", "bwdfwd" and "bwd", analogously to kicking
motions. In the latter case, a faster but weaker attack is made by straightening and retracting
an arm. We annotate such actions as "fwdbwd". Combining these facts, punches can
be described using the regular expression

    (fwd)(bwdfwd)*(bwd) ∨ (fwdbwd).
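The two action patterns translate directly into ordinary regular expressions. A small sketch, using our own encoding of a qualifier sequence as a space-separated string:

```python
import re

# (up)(downup)*(down)  and  (fwd)(bwdfwd)*(bwd) | (fwdbwd)
KICK = re.compile(r"^up( downup)* down$")
PUNCH = re.compile(r"^(fwd( bwdfwd)* bwd|fwdbwd)$")

def is_valid_kick(seq):
    """True iff a qualifier sequence is a well-formed multi-kick."""
    return bool(KICK.match(" ".join(seq)))

def is_valid_punch(seq):
    """True iff a qualifier sequence is a well-formed punch combination."""
    return bool(PUNCH.match(" ".join(seq)))
```

Such a check is useful, for instance, to flag annotation sequences that violate the intended action grammar.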
Jump classifier: A jumping motion consists of two actions, "jump-up" and
"jump-down", each of which has an interaction with the ground: one for jumping and one
for landing. The rest of the motion segments are annotated as "others".
3.1.2 Rule-based Classification
The remaining two motion aspects, react and move are hard to annotate with our MSVM.
The feature vector of a motion segment does not contain any information on the corre-
sponding action of the opponent. Thus the action qualifiers in the react motion aspect
are difficult to discriminate. With the feature vectors alone, our MSVM has often got
confused to identify the footstep patterns. We develop simple classification rules to cope
with these difficulties.
React classifier: A player blocks the opponent's attack or counter-attacks the opponent.
Failing to block the opponent's attack allows a hit by the opponent, resulting in
loss of balance. An action is regarded as a reaction if the opponent's corresponding action
is an attack. An attacking action has one of the first four motion aspects in Table 3.1.
Thus, we have three action qualifiers: "kick," "punch," and "others." Considering the
opponent's actions, we use the following straightforward rules for the react motion aspect:

    react-kick  = o.kick-up ∨ o.kick-downup,
    react-punch = ¬react-kick ∧ (o.punch-fwd ∨ o.punch-bwdfwd ∨ o.punch-fwdbwd),
    others      = ¬react-kick ∧ ¬react-punch,

where prefix o denotes the opponent. The actions of the opponent include only those that
actually make the opponent's fists and feet move toward the player. Kicking has priority
over punching, since the former conveys more impact in general.
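Because the qualifiers are mutually exclusive and kicks take priority, the rules reduce to a short cascade. A sketch, with the opponent's action encoded as a string label (our own encoding):

```python
def classify_react(opponent_action):
    """Rule-based react qualifier, driven solely by the opponent's
    current action label, e.g. "kick-up" or "punch-fwd"."""
    if opponent_action in ("kick-up", "kick-downup"):
        return "kick"                    # kicks take priority over punches
    if opponent_action in ("punch-fwd", "punch-bwdfwd", "punch-fwdbwd"):
        return "punch"
    return "others"                      # the opponent is not attacking
```

Note that "kick-down" and "punch-bwd" are deliberately excluded: only phases that move the opponent's fists or feet toward the player count as attacks.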
Move classifier: The move classifier is used to identify the footstep pattern of an action,
in particular, the first and last foot stances, for smooth action stitching. If a foot touches
the ground at the end of the current action, the same foot should touch the ground at
the beginning of the next action. Inspired by Kwon and Shin [34], we encode an action
as a string of symbols L, R, and F, denoting the left supporting foot, the right supporting foot,
and no supporting foot (a flight phase), respectively. We resolve a double foot stance
by identifying it as L or R, whichever is the closer neighbor of this foot stance in the
example motion stream. The resulting footstep patterns are L-L, L-R, L-F, R-L, R-R,
R-F, F-L, F-R and F-F. The initial and final symbols of each string represent the first
and last foot stances of the corresponding action, and the symbol "-" in the middle means
that the foot stances in the middle, if any, are ignored.
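The encoding can be sketched in a few lines. Here "D" marks a double stance, resolved to its nearest single-support neighbor as described above; the tie-breaking toward the earlier neighbor is our choice, and a sequence with at least one L or R is assumed:

```python
def resolve_double(stances):
    """Replace each double stance "D" with its closest L/R neighbor
    in the sequence (ties go to the earlier neighbor)."""
    out = list(stances)
    singles = [i for i, s in enumerate(out) if s in ("L", "R")]
    for i, s in enumerate(out):
        if s == "D":
            j = min(singles, key=lambda k: abs(k - i))
            out[i] = out[j]
    return out

def footstep_pattern(stances):
    """First and last stances joined by "-"; interior stances ignored."""
    s = resolve_double(stances)
    return f"{s[0]}-{s[-1]}"
```

The resulting two-symbol pattern is exactly what action stitching needs: the stance a candidate next action must start with.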
3.2 Interaction Modeling
In this section, we describe how to capture the interactions between two players from
an example two-player motion stream. More specifically, provided with a coupled motion
transition graph, we explain how to train a motion transition model, considering the in-
teractions between the two players. We employ a dynamic Bayesian network for this pur-
pose, assuming that the example motion stream has been transformed to the coupled mo-
tion transition graph.
Figure 3.3: Basic data structure B: (a) an example motion stream; (b) its linked-list representation.
Figure 3.4: A node of G1 (or G2) has a set of links pointing to the actions in B.
Coupled Motion Transition Graph We represent a two-player example motion stream
as a coupled motion transition graph denoted by C = (B,G1,G2,R,C), where B, G1,
G2, R, and C are a basic data structure, a motion transition graph for player 1, that for
player 2, an action table, and a set of cross edges, respectively. In what follows, we will
define each of these in sequence.
A single-character motion stream can be regarded as a sequence of actions stitched
along the time axis, as illustrated in Figure 3.3. The two players perform their own se-
quences of actions in a rather asynchronous manner. In other words, actions performed
by a player are partially overlapped in time with those by the other. We adopt a linked
list for our basic data structure B to capture this relationship. The node representing an
action of a player in general has two links, pointing to the next action of the same player
and to the other player's action (reaction), respectively. Our motion transition graph is
constructed on top of this data structure.
G_i = (V_i, E_i), i = 1, 2 is the motion transition graph for player i, in which V_i
and E_i denote the node set and the edge set, respectively. A node v_j in V_i represents
a group of example actions which share an identical label (annotation vector) l_j. A pair
of nodes v_j and v_k in V_i are connected by a directed edge e_jk in E_i if the action group
denoted by v_j is followed by the action group denoted by v_k with a transition probability
that is significantly large (see Section 3.2 for transition probability estimation). As shown
in Figure 3.4, a node of a graph has a set of pointers to the nodes in B, each of which
provides a link to a member of the corresponding action group.
The action table R is used to store Table 3.1 together with the links to individual
motion transition graphs, G1 and G2. Every action qualifier of a motion aspect points to
a set of nodes (action groups) of Gi, i = 1, 2, that share the same motion aspect and also
the same action qualifier. This table allows the user to prescribe a motion in an intuitive
manner.
Finally, the set C consists of all cross edges combining two single-player motion tran-
sition graphs, G1 and G2 as shown in Figure 3.4. A pair of nodes vj and vk in different
graphs G1 and G2 are connected by a cross edge in C if one is followed by the other with
a significantly large transition probability (see Section 3.2).
Dynamic Bayesian Network A Bayesian network is a useful model for representing
the causal relationships among random variables, as demonstrated in various applications
such as image processing and speech/gesture recognition and synthesis [45]. As illustrated
in Figure 3.3, the action of a player is determined by his/her own current action
together with that of the other player. Let A^c_i be the ith action of player c, c = 1, 2 for
i = 1, 2, · · · , n. Suppose that players 1 and 2 have been performing actions A^1_i and A^2_j,
respectively, and that player 1 changes his/her action before player 2 does. It is natural to
consider that a player gathers information on the opponent via observations, and then
fuses this information with what the player possesses or self-observes on himself/herself, in
order to finally determine the player's next action. Thus, the next action A^1_{i+1} of player
1 depends on the current actions of players 1 and 2, that is, on (A^1_i, A^2_j). The next action
A^2_{j+1} for player 2 can be chosen in a symmetrical manner.
As shown in Figure 3.5, these two types of motion transitions give the temporal or-
der that specifies the non-reversible direction of causal relationship among the actions of
the two players. We represent a two-character motion stream as a series of such transition
Figure 3.5: Bayesian network: (a) building blocks; (b) examples.
Figure 3.6: Reference coordinate frame, with origin O at (COM1 + COM2)/2.
patterns, which can be best illustrated with a dynamic Bayesian network. Each transition
type gives rise to a building block of the Bayesian network. The building blocks (see Fig-
ure 3.5(a)) are combined dynamically to yield a variety of network structures, with which
we capture dynamic, stationary, and asynchronous characteristics of two-character motion
streams (see Figure 3.5(b)).
Formulated as a motion synthesis problem, our goal is to estimate the conditional
probability P(A^1_{i+1} | A^1_i, A^2_j) for predicting A^1_{i+1}. In order to estimate the conditional
probability, we first parameterize an action, exploiting its characteristic features, in particular, the
action label (type) l = (l_1, l_2, ..., l_7) obtained in Section 3.1 and the first and last poses
of the action, denoted by Q^s and Q^e, respectively. More specifically, the parameters of
the action consist of two vectors (l, n), where l is the action label as mentioned above,
and n is derived from poses Q^s and Q^e by applying principal component analysis (PCA).
Notice that each element of l takes on a discrete value, while each element of n has a
continuous value, and that pose Q is specified by the vector (p_0, q_0, p_1, ..., p_m).
We represent both Q^s and Q^e in a coordinate-invariant manner. For the root position
and orientation (p_0 and q_0), we build a local coordinate frame: the origin of the
coordinate frame is set to the average of the COMs of the players projected onto the
ground (see Figure 3.6). The vector that points from the COM of the player performing
the next action toward the COM of the opponent is chosen as the z axis, the vertical
up-direction as the y axis, and the x axis as the cross product of the former two axes.
This coordinate system conveniently abstracts the geometric setting of the players. p_0
and q_0 are set to the horizontal translation of the root joint and its vertical rotation with
respect to the origin, respectively.
Joint positions p_i, 1 ≤ i ≤ m are given in yet another local coordinate frame with
respect to the root joint of each player to represent their displacements from the root.
Let Q̄ be the pose Q excluding the root position and orientation. We apply PCA to Q̄^s
and Q̄^e to extract two 3-dimensional vectors, p̄^s and p̄^e, respectively. These cover about
60% of the variations of the data in our experiments. Finally, the 12-dimensional parameter
vector n is obtained: n = ((p^s_0, q^s_0, p̄^s), (p^e_0, q^e_0, p̄^e)). For notational convenience,
we use an action A and its parameters (l, n) interchangeably, that is, A = (l, n).
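The PCA step that compresses a root-free pose Q̄ into a 3-dimensional vector can be sketched generically. This is a standard SVD-based PCA, not the thesis code; `poses` holds one pose vector per row:

```python
import numpy as np

def pca_reduce(poses, k=3):
    """Project pose vectors (one per row) onto their first k principal
    components via SVD; returns the k-dimensional codes and the fraction
    of total variance they explain."""
    X = poses - poses.mean(axis=0)          # center the data
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    codes = X @ Vt[:k].T                    # k-dimensional code per pose
    explained = (S[:k] ** 2).sum() / (S ** 2).sum()
    return codes, explained
```

The `explained` ratio corresponds to the roughly 60% variance coverage reported above for k = 3.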
Now, we factorize the conditional probability of the next action A^1_{i+1} into three terms
by rearranging the discrete and continuous parameters as follows:

    P(A^1_{i+1} | A^1_i, A^2_j) = P(l^1_{i+1}, n^1_{i+1} | l^1_i, n^1_i, l^2_j, n^2_j)
                                ≈ γ · P(l^1_{i+1} | l^1_i, l^2_j) × P(n^1_i, n^2_j | l^1_{i+1}) × P(n^1_{i+1} | l^1_{i+1}, n^1_i, n^2_j)        (3.1)

for some positive constant γ. Before showing the detailed derivation, we first briefly introduce
the idea of Equation (3.1).
For motion synthesis (Section 3.4), Equation (3.1) is used to estimate the conditional
probabilities of the candidates for the next action. Intuitively, the first term (excluding γ)
encodes a logical relationship between actions, that is, how the label of the next action of
a player is likely to be chosen based on the current action labels of the players. This term
is estimated using multinomial distributions (Section 3.3 A). The second term reflects the
geometric setting of the players for the next action; for instance, the two characters are
more likely to be distant if a kick action is chosen rather than a punch. We compute this
term, assuming that the numerical parameters in n^1_i and n^2_j are from Gaussian
distributions (Section 3.3 B). The third term represents the relationship between the numerical
parameters, given a candidate for the next action label, which guards against motion
discontinuity. Through a preliminary experiment, we have observed that each of the scalar
elements of n^1_{i+1} is non-linearly related to n^1_i and n^2_j. We employ Gaussian processes
for non-linear regression to capture these dependencies (Section 3.3 C).
Transition probability computation We explain how to derive Equation (3.1) for
player 1. The counterpart for player 2 can be obtained in a symmetrical manner. As the
action A^1_{i+1} has both discrete and continuous parameters, we decompose the probability
into two parts by the conditional chain rule P(X, Y | M) = P(X | M) × P(Y | X, M):

    P(A^1_{i+1} | A^1_i, A^2_j) = P(l^1_{i+1}, n^1_{i+1} | l^1_i, n^1_i, l^2_j, n^2_j)
                                = P(l^1_{i+1} | l^1_i, n^1_i, l^2_j, n^2_j) × P(n^1_{i+1} | l^1_{i+1}, l^1_i, n^1_i, l^2_j, n^2_j).        (3.2)
We handle each of the conditional probabilities separately. First, let us consider the
discrete part. By applying the conditional Bayes rule, P(Y | X, M) = P(X | Y, M) × P(Y | M) / P(X | M), we
have

P(l^1_{i+1} | l^1_i, n^1_i, l^2_j, n^2_j) = [ P(l^1_{i+1} | l^1_i, l^2_j) × P(n^1_i, n^2_j | l^1_i, l^2_j, l^1_{i+1}) ] / P(n^1_i, n^2_j | l^1_i, l^2_j).   (3.3)
The denominator on the right-hand side of Equation (3.3) is regarded as a positive constant
1/γ since it has already been fixed when action A^1_{i+1} is to be chosen. The second
term of the numerator is approximated by P(n^1_i, n^2_j | l^1_{i+1}) to avoid the bias due to sparse
data. A justification of this approximation is that l^1_{i+1} is chosen based on both l^1_i and l^2_j,
so conditioning on l^1_{i+1} partially reflects them. Thus, the discrete part of Equation (3.2) is approximated as follows:

P(l^1_{i+1} | l^1_i, n^1_i, l^2_j, n^2_j) ≈ γ · P(l^1_{i+1} | l^1_i, l^2_j) × P(n^1_i, n^2_j | l^1_{i+1}).   (3.4)
By a similar argument, we approximate the continuous part to
P(n^1_{i+1} | l^1_{i+1}, l^1_i, n^1_i, l^2_j, n^2_j) ≈ P(n^1_{i+1} | l^1_{i+1}, n^1_i, n^2_j),   (3.5)

by dropping l^1_i and l^2_j. Combining Equations (3.4) and (3.5), we obtain Equation (3.1).
Estimation In this section, we describe how to estimate the conditional probabilities
on the right-hand side of Equation (3.1).
In order to learn each term, we collect training data from the example motion stream.
More specifically, we assemble the training data set while traversing basic data structure
B (see Figure 3.3). For player 1, the training set is D1 = {(A^1_h, A^2_g, A^1_{h+1}) for all h and
g}, where (A^1_h, A^2_g, A^1_{h+1}) represents an action transition from A^1_h to A^1_{h+1} for player 1,
when player 2 is performing action A^2_g. The training set D2 for player 2 is formed
symmetrically. Recall that A = (l, n) for an action A, as explained in Section 3.2.
For the first term, we could estimate the conditional probability P(l^1_{i+1} | l^1_i, l^2_j) directly by
scanning D1. However, this would yield unreliable, overfitted probabilities because of
sparse training data. Instead, we approximate this probability by adopting the idea of the
factoring techniques in [12, 59] as follows:
P(l^1_{i+1} | l^1_i, l^2_j) ∝ P(l^1_{i+1} | l^1_i) × P(l^1_{i+1} | l^2_j).   (3.6)

The probabilities P(l^1_{i+1} | l^1_i) and P(l^1_{i+1} | l^2_j) are further factored as follows:

P(l^1_{i+1} | l^1_i) ∝ ∏_{m=1}^{7} ∏_{k=1}^{7} P((l_k)^1_{i+1} | (l_m)^1_i),
P(l^1_{i+1} | l^2_j) ∝ ∏_{m=1}^{7} ∏_{k=1}^{7} P((l_k)^1_{i+1} | (l_m)^2_j),   (3.7)

where (l_k)^1_{i+1}, 1 ≤ k ≤ 7, denotes the kth element of action label l^1_{i+1}; (l_m)^1_i and (l_m)^2_j,
1 ≤ m ≤ 7, are interpreted similarly.
We estimate the probabilities P((l_k)^1_{i+1} | (l_m)^1_i) and P((l_k)^1_{i+1} | (l_m)^2_j) with multinomial
distributions based on a Dirichlet prior (see Section 3.3 A). The resulting probabilities
P(l^1_{i+1} | l^1_i) and P(l^1_{i+1} | l^2_j) should be normalized [12].

We precompute the probabilities P(l^1_{i+1} | l^1_i, l^2_j) for all tuples (l^1_{i+1}, l^1_i, l^2_j) extracted from
D1 and store them as a table in the node of player 1's motion transition
graph with action label l^1_i.
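A minimal sketch of this factored computation, assuming hypothetical lookup functions p_own and p_opp standing in for the element-wise conditional tables stored at the graph nodes (these names are ours, not the thesis's):

```python
def factored_label_prob(candidates, l1_cur, l2_cur, p_own, p_opp):
    """Equations (3.6)/(3.7): for each candidate next label (a 7-tuple),
    multiply the element-wise conditionals against every element of the
    player's own current label and of the opponent's label, then normalize
    over the candidates (cf. the normalization noted in [12])."""
    scores = {}
    for l_next in candidates:
        s = 1.0
        for m in range(7):
            for k in range(7):
                s *= p_own(l_next[k], l1_cur[m]) * p_opp(l_next[k], l2_cur[m])
        scores[l_next] = s
    total = sum(scores.values())
    return {l: s / total for l, s in scores.items()}
```

With uniform element-wise conditionals, every candidate receives the same normalized probability, as expected of a proportional factorization.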
For the second term, P(n^1_i, n^2_j | l^1_{i+1}), we factor the covariance matrix of a Gaussian
distribution employing the method in [22], again to avoid the bias due to sparse data.
More specifically, a full covariance matrix is decomposed into a decorrelating transform
matrix and a diagonal covariance matrix. The decorrelating matrix is shared by all the
nodes in a graph, whilst each node maintains its own diagonal covariance matrix. The
probability density function of each node is modeled as a Gaussian distribution function.
The parameters of the function are stored in the node of the motion transition graph with
action label l^1_{i+1}. We estimate P(n^1_i, n^2_j | l^1_{i+1}) based on this information (see Section 3.3 B).
For the third term, we adopt Gaussian process regression (see Section 3.3 C). Specifically,
we compute the conditional Gaussian distribution for each scalar element of n^1_{i+1}
that has not been fixed. Notice that n^1_{i+1} = ((p^s_0, q^s_0, p^s), (p^e_0, q^e_0, p^e)) as defined in
Section 3.2. Since (p^s_0, q^s_0, p^s) is given directly from the previous action of player 1, regression
is performed only for (p^e_0, q^e_0, p^e). We use the public-domain library TPros to estimate
the conditional probability distributions [23]. Again, we store the estimated hyperparameters
together with the mean and the inverse covariance matrix of each training data set
at the node with l^1_{i+1}.
3.3 Background on Underlying Statistical Models
A. Multinomial distributions The multinomial distribution is a generalization of the
binomial distribution, where each trial results in one of k possible outcomes with probabilities
Θ_M = (θ_1, θ_2, ..., θ_k), where Σ_i θ_i = 1 and θ_i ≥ 0 for all i. Of interest is predicting
each outcome: given a training set {x_n} that consists of the outcomes of N independent
draws x_1, x_2, ..., x_N from an unknown multinomial distribution, we estimate the probability
of the next outcome x*. The Bayesian estimate of this probability is

P(x* | {x_n}) = ∫ P(x* | Θ_M) P(Θ_M | {x_n}) dΘ_M.   (3.8)

Under the uniform-prior assumption, the Bayesian estimate reduces to

P(X_{N+1} = i | {x_n}) = (N_i + 1) / Σ_j (N_j + 1),   (3.9)

where N_i is the number of observations of outcome i in the training set. This is a
special case of the Dirichlet prior for the multinomial distribution in which all hyperparameters
are set to one [17].
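Equation (3.9) amounts to add-one (Laplace) smoothing of the empirical counts; a minimal sketch:

```python
from collections import Counter

def dirichlet_multinomial_estimate(draws, num_outcomes):
    """Bayesian estimate P(X_{N+1} = i | draws) under a uniform Dirichlet
    prior (all hyperparameters equal to one), i.e. Equation (3.9):
    (N_i + 1) / sum_j (N_j + 1)."""
    counts = Counter(draws)
    total = len(draws) + num_outcomes  # sum_j (N_j + 1)
    return [(counts.get(i, 0) + 1) / total for i in range(num_outcomes)]
```

Note that an outcome never observed in the training set still receives a nonzero probability, which is what makes the estimate robust to sparse data.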
We show how to estimate P((l_k)^1_{i+1} | (l_m)^1_i) given D1 = {(A^1_h, A^2_g, A^1_{h+1}) for all h and g}
(see Equation (3.7)); P((l_k)^1_{i+1} | (l_m)^2_j) can be estimated in a similar manner. We extract
{x_n} from D1 by ignoring A^2_g for all g and the numerical parameters n^1_h and n^1_{h+1} for all h.
More precisely, we first construct D_M = {(a, b) = ((l_m)^1_h, (l_k)^1_{h+1}) for all h}
and then obtain {x_n} = {b | (a, b) ∈ D_M}. Setting x* = (l_k)^1_{i+1}, we compute the Bayesian
estimate of P(x* | {x_n}) for P((l_k)^1_{i+1} | (l_m)^1_i).
B. Semi-tied Covariance Matrices A Gaussian density function is defined as

p(x | μ, Σ) = (1 / ((2π)^{N/2} |Σ|^{1/2})) exp( −(1/2) (x − μ)^T Σ^{−1} (x − μ) ),   (3.10)

or, in short, x ∼ N(μ, Σ). To reduce the number of model parameters, a full covariance
matrix Σ^{(m)} is decomposed into a decorrelating transform matrix H and a diagonal covariance
matrix Σ^{(m)}_{diag}:

Σ^{(m)} = H Σ^{(m)}_{diag} H^T.   (3.11)

The decorrelating matrix H is shared by all Gaussian distributions, whilst each Gaussian
distribution maintains its own diagonal covariance matrix. Let the set of all training data
be

{{x^{(m)}_n}} = {{x^{(1)}_1, ..., x^{(1)}_{N_1}}, ..., {x^{(M)}_1, ..., x^{(M)}_{N_M}}},

where x^{(j)}_i, 1 ≤ i ≤ N_j, 1 ≤ j ≤ M, is sampled from the jth Gaussian distribution, and N_j
is the number of samples that belong to the jth Gaussian distribution. The maximum-likelihood
(ML) estimate of the model parameters Θ = (H, {Σ^{(m)}_{diag}, μ^{(m)}}) is

Θ_ML = arg max_Θ log p({{x^{(m)}_n}} | Θ).   (3.12)

Θ_ML can be obtained using an iterative algorithm based on an expectation-maximization
(EM) approach [22]. The Gaussian distribution of a new sample x^{(m)}_{N_m+1} is given by

x^{(m)}_{N_m+1} ∼ N(μ^{(m)}, H Σ^{(m)}_{diag} H^T).   (3.13)
We show how to estimate P(n^1_i, n^2_j | l^1_{i+1}). Let {x^{(m)}_n} = {(n^1_h, n^2_g) | l^1_{h+1} = l^{1,m} for all h and g},
where l^{1,m} is action label m for character 1. We collect the set of all training data
D_G = {{x^{(m)}_n}}, and then estimate Θ_ML employing Equation (3.12). Given the next action
label l^1_{i+1}, we set x^{(m)}_{N_m+1} = (n^1_i, n^2_j) to obtain the conditional probability P(n^1_i, n^2_j | l^1_{i+1}) with
Equation (3.13).
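Evaluating the density of Equation (3.13) never requires forming the full covariance matrix explicitly; a sketch of the log-density under the semi-tied model (our own illustration, not code from the thesis):

```python
import numpy as np

def semi_tied_gaussian_logpdf(x, mean, diag_var, H):
    """Log-density of x under N(mean, H diag(diag_var) H^T): H is the
    decorrelating transform shared by all nodes; diag_var holds this
    node's diagonal covariance entries."""
    d = len(x)
    # Decorrelate the residual: z = H^{-1} (x - mean), so the quadratic
    # form (x-mean)^T Sigma^{-1} (x-mean) becomes an axis-aligned sum.
    z = np.linalg.solve(H, x - mean)
    log_det = 2.0 * np.log(abs(np.linalg.det(H))) + np.sum(np.log(diag_var))
    quad = np.sum(z * z / diag_var)
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + quad)
```

Because |Σ^{(m)}| = |H|² |Σ^{(m)}_{diag}| and Σ^{(m)−1} = H^{−T} Σ^{(m)−1}_{diag} H^{−1}, the decomposition leaves the density unchanged while each node stores only a diagonal.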
C. Gaussian process regression A Gaussian process is a collection of random variables
t = (t(x_1), t(x_2), ..., t(x_N)) that have a joint Gaussian distribution

P(t | {x_n}) = (1/Z) exp( −(1/2) (t − μ)^T C^{−1} (t − μ) ),   (3.14)

or t | {x_n} ∼ N(μ, C), for any finite collection of input data {x_n} = (x_1, x_2, ..., x_N),
where Z is a normalizing constant, μ is the mean of t, and C is the covariance matrix
defined by a parameterized covariance function [23]. The covariance function that we use
is

f(x_i, x_j; Θ_P) = θ_1 exp( −(1/2) Σ_{l=1}^{L} (x^{(l)}_i − x^{(l)}_j)² / r_l² ) + θ_2 + θ_3 δ_{ij},   (3.15)

where x_i, x_j ∈ {x_n}; x^{(l)}_i, 1 ≤ l ≤ L, is the lth element of vector x_i; Θ_P = (θ_1, θ_2, θ_3, r_1, ..., r_L)
are hyperparameters; and δ_{ij} is the Kronecker delta. Let C_{ij} be element (i, j) of
the covariance matrix C. Then C_{ij} = f(x_i, x_j; Θ_P) for all i and j.
Provided with a training data set D_P = {(x_n, t(x_n))}, our objective is to predict
t(x_{N+1}) at a new point x_{N+1}. A straightforward derivation shows that

t(x_{N+1}) | D_P, x_{N+1} ∼ N(k^T C^{−1} t, κ − k^T C^{−1} k),   (3.16)

where

k = (f(x_1, x_{N+1}; Θ_P), f(x_2, x_{N+1}; Θ_P), ..., f(x_N, x_{N+1}; Θ_P))^T

and

κ = f(x_{N+1}, x_{N+1}; Θ_P).

Thus, the most probable prediction of t(x_{N+1}) is k^T C^{−1} t. As C^{−1} is computed in the
training stage, the prediction can be done efficiently.

The prediction accuracy highly depends on the choice of the hyperparameters Θ_P. The
maximum a posteriori (MAP) estimate is

Θ_MAP = arg max_{Θ_P} p(t | {x_n}, Θ_P) p(Θ_P | {x_n}),   (3.17)

which approximates the Bayesian estimate. For efficiency, we adopt the MAP estimation
as it has an analytical solution.
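The prediction of Equation (3.16) with the covariance function of Equation (3.15) can be sketched as follows. This is a simplified illustration: TPros additionally estimates the hyperparameters by the MAP criterion of Equation (3.17), which is omitted here, and in practice C^{−1} is precomputed at training time rather than solved per query.

```python
import numpy as np

def rbf_cov(xi, xj, theta1, theta2, r, theta3=0.0, same=False):
    """Covariance function of Equation (3.15): per-dimension length
    scales r, bias theta2, and noise theta3 on the diagonal (delta_ij)."""
    k = theta1 * np.exp(-0.5 * np.sum(((xi - xj) / r) ** 2)) + theta2
    return k + (theta3 if same else 0.0)

def gp_predict(X, t, x_new, theta1, theta2, theta3, r):
    """Equation (3.16): returns the predictive mean k^T C^{-1} t and
    variance kappa - k^T C^{-1} k at the new point x_new."""
    N = len(X)
    C = np.array([[rbf_cov(X[i], X[j], theta1, theta2, r, theta3, same=(i == j))
                   for j in range(N)] for i in range(N)])
    k = np.array([rbf_cov(X[i], x_new, theta1, theta2, r) for i in range(N)])
    kappa = rbf_cov(x_new, x_new, theta1, theta2, r, theta3, same=True)
    Cinv_t = np.linalg.solve(C, t)   # in training, C^{-1} would be cached
    Cinv_k = np.linalg.solve(C, k)
    return k @ Cinv_t, kappa - k @ Cinv_k
```

With negligible noise θ_3, the prediction interpolates the training targets: querying at a training input returns (almost) its observed value with (almost) zero variance.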
We apply a Gaussian process to estimate the probability distribution for each scalar
element of (p^e_0, q^e_0, p^e), where n^1_{i+1} = ((p^s_0, q^s_0, p^s), (p^e_0, q^e_0, p^e)). Let (n_k)^1_{i+1} be the kth
scalar element of n^1_{i+1}, 7 ≤ k ≤ 12. Notice that (p^s_0, q^s_0, p^s) is given from A^1_i. We set
x_h = (n^1_h, n^2_m) and t(x_h) = (n_k)^1_{h+1} for every cross edge from a node with action label
l^2_m to that with l^1_{h+1} such that l^1_{h+1} = l^1_{i+1} to obtain D_P = {(x_n, t(x_n))}. We also set
x_{N+1} = (n^1_i, n^2_j), letting |D_P| = N. Given x_{N+1} and D_P, we employ Equation (3.16) to
estimate the probability distribution and t(x_{N+1}), that is, (n_k)^1_{i+1}.
3.4 Motion Coupling and Postprocessing
In this section, we describe how to synthesize a two-character motion. For motion syn-
thesis, we prefer the term “characters” to “players” since we deal with virtual characters
unlike in motion analysis. A pair of characters exchange actions and reactions via cross
edges while traversing their respective single-player motion transition graphs. One of the
characters may be designated as an avatar. In this case, the character is under the control
of the user while still communicating with the other via the cross edges. We first describe
the basic model for two-character motion synthesis, assuming that neither of them is un-
der the user’s control. We then extend the basic model to incorporate the user’s control.
Basic Model Let us revisit Figure 3.5 to set up a situation for motion transition. Suppose
that characters 1 and 2 perform action A^1_i (the ith action of character 1) and A^2_j
(the jth action of character 2), respectively. Without loss of generality, action A^1_i is completed
before action A^2_j. Then character 1 determines the next action A^1_{i+1} to perform,
guided by the motion transition model trained in Section 3.2. Given the current
actions A^1_i and A^2_j, a node of the motion transition graph for character 1 is chosen among
the candidate nodes that are incident from the node with action label l^1_i and also from
the node with action label l^2_j via a cross edge, to determine A^1_{i+1} = (l^1_{i+1}, n^1_{i+1}). There
are two plausible alternatives for choosing the next action: choosing the one with the highest
estimated probability, or randomly choosing a node among the candidates according to
their estimated probabilities. We use the latter to generate a motion stream
with greater variety.
Let l^{1,k}_{i+1}, k = 1, 2, ..., be the action labels of the candidates, as shown in Figure 3.7.
Then, we compute the conditional probabilities P(A^{1,k}_{i+1} | A^1_i, A^2_j) for all k using Equation
(3.1), where A^{1,k}_{i+1} = (l^{1,k}_{i+1}, n^{1,k}_{i+1}), while estimating the parameters n^{1,k}_{i+1} based on Gaussian
processes. Given the conditional probabilities, the next action A^1_{i+1} = (l^1_{i+1}, n^1_{i+1}) with the
estimated numerical parameters is determined at random according to the probabilities.
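The random selection among candidates can be sketched as follows; the probabilities may be left unnormalized, since the constant γ in Equation (3.1) is common to all candidates:

```python
import random

def sample_next_action(candidates):
    """Randomly pick the next action among candidates according to their
    estimated transition probabilities (the second alternative above).
    `candidates` is a list of (action, probability) pairs; the weights
    need not sum to one."""
    actions, weights = zip(*candidates)
    return random.choices(actions, weights=weights, k=1)[0]
```

Choosing by sampling rather than by the argmax is what yields a motion stream with greater variety across runs.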
Figure 3.7: The action labels of the candidates

In order to use an example action as the next action, the node with l^1_{i+1} in the motion
transition graph is first accessed, and then the example action whose numerical parameters
are the most similar to n^1_{i+1} in a Euclidean sense is chosen among the actions belonging
to the action group represented by this node. These actions can be accessed via
the links, stored in the node, to the example actions in the basic data structure B.
Motion Coupling Provided with the newly generated action A^1_{i+1} of character 1, together
with its immediate predecessor A^1_i and the ongoing action A^2_j of character 2, we
deal with two issues: how to stitch A^1_i and A^1_{i+1} seamlessly while preventing motion artifacts
such as foot sliding and penetration, and how to couple the two actions A^1_{i+1} and
A^2_j of the two different characters. Since the first issue has been addressed rather well
during the last decade, we only deal with the second issue; for the first, we refer
the readers to the work in [24, 33, 53]. In motion coupling, we address two problems:
how to make the two characters face each other, and how to synchronize the interaction
moments between the two characters, such as hitting the face and punching the body.
By using the parameters n^1_{i+1} determined for the next action A^1_{i+1} from n^1_i and n^2_j, the
two characters face each other reasonably well in the resulting animation even without
explicit motion coupling. Our concern is how to compensate for the small errors caused by
sparse example motions. We settle this problem in a simple manner by adjusting the estimated
parameters n^1_{i+1} = ((p^s_0, q^s_0, p^s), (p^e_0, q^e_0, p^e)). In particular, the root position p^e_0 and
orientation q^e_0 at the last frame of the next action are modified to the values halfway
from the chosen parameter values toward their respective estimated ones (see Section
3.4). The rationale for this heuristic is to reflect the estimated root position and orientation,
while avoiding the jerkiness caused by large changes. This simple heuristic works well in our
experiments.
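A sketch of the halfway adjustment, assuming root orientations are represented as unit quaternions; the quaternion blend shown (normalized linear interpolation at the midpoint) is our assumption, not necessarily the thesis's exact implementation:

```python
import numpy as np

def blend_halfway(p_chosen, p_estimated, q_chosen, q_estimated):
    """Move the last-frame root position/orientation halfway from the
    chosen example's values toward the estimated ones. Quaternions are
    blended by normalized linear interpolation (nlerp), a common stand-in
    for slerp at small angles."""
    p = 0.5 * (p_chosen + p_estimated)            # halfway root position
    if np.dot(q_chosen, q_estimated) < 0.0:       # keep quaternions in the
        q_estimated = -q_estimated                # same hemisphere
    q = q_chosen + q_estimated                    # nlerp at t = 0.5 ...
    q = q / np.linalg.norm(q)                     # ... then renormalize
    return p, q
```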
For the second problem, the interaction moments are regarded as the keytimes at which
timewarping is applied for action alignment. By the way in which the example motion stream is
segmented, the interaction moments can be identified easily by examining the contact force
profiles of the actions.
Interactive Model In this section, we extend our basic model to allow user interaction.
One of the characters is designated as an avatar controlled by the user to support on-line
applications. In order to control the avatar, the user supplies a stream of motion
specifications in an on-line manner, which is stored in a queue. Upon finishing the current
action, the avatar dequeues motion specifications one by one to derive the next actions.
Attack motions such as kicking and punching are prescribed explicitly, while
non-attack motions are invoked implicitly whenever the queue is empty. In the former
case, the user is allowed to specify an attack motion using a motion prescription such as
"kick" or "punch", or more specifically with a motion aspect such as "left-kick", "right-kick",
"left-punch", or "right-punch". Accordingly, the avatar refers to the action table
R to search for the candidates for each action induced by the motion prescription, and
chooses one of them guided by the learned transition model described in Section 3.2. For
instance, "kick-up" or "kick-downup" actions are searched for when a "kick" motion is specified.
While searching for such an action, the avatar may perform actions other than the
specified motion in response to the opponent's actions. When the queue is empty, the
avatar follows the mouse pointer driven by the user in an on-line manner. With the cursor,
the user explicitly specifies the horizontal root translation, which is used to update
the corresponding element p^e in the numerical parameters n^1_{i+1} of the next action. The
remaining elements of n^1_{i+1} are estimated in the same way as in the basic model.
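The queue-driven control flow can be sketched as follows; `search_candidates` and `pick_by_transition_model` are hypothetical stand-ins for the action-table lookup in R and the learned transition model, respectively:

```python
from collections import deque

class AvatarController:
    """Sketch of the queue-driven avatar control loop described above."""

    def __init__(self):
        self.queue = deque()   # user-supplied motion specifications

    def prescribe(self, spec):
        """Enqueue an attack prescription, e.g. "kick" or "left-punch"."""
        self.queue.append(spec)

    def next_action(self, search_candidates, pick_by_transition_model):
        """Called when the current action finishes: dequeue an explicit
        attack if one is pending, otherwise fall back to an implicit
        non-attack motion."""
        if self.queue:
            spec = self.queue.popleft()              # explicit attack
            candidates = search_candidates(spec)
        else:
            candidates = search_candidates(None)     # implicit non-attack
        return pick_by_transition_model(candidates)
```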
Postprocessing By the way in which we choose actions, the two characters may not be
perfectly positioned to perform their respective actions, which could result in unexpected
collisions at times. To handle such collisions, we adopt a data-driven physical simulator
based on a PD servo [62], although this is not explicitly stated above. The public-domain
library PhysX [2] was used to implement the simulator. We leave as a future research issue
how to determine the ideal position of each character, free from unexpected collisions,
while avoiding motion discontinuity or jerkiness.
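For reference, the per-joint control law of a PD servo, which underlies such data-driven simulators; the gains kp and kd are tuning parameters, and this sketch is ours rather than the simulator's actual code:

```python
def pd_torque(q_desired, q, qdot, kp, kd):
    """One joint of a PD servo: a spring term pulls the simulated joint
    angle q toward the kinematic target q_desired, while a damping term
    resists the joint velocity qdot."""
    return kp * (q_desired - q) - kd * qdot
```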
4. Results
We performed experiments on an Athlon PC (an AMD Athlon 64 2 GHz processor with 2 GB
of memory). Our human-like articulated figure had 54 DOFs (degrees of freedom): 6 DOFs
for the pelvis, 6 DOFs for the spine, 9 DOFs for each limb, 3 DOFs for the neck, and 3 DOFs
for the head. The sampling rate of the example motion data was 30 Hz or 60 Hz. We first
show results for motion modeling and then those for motion synthesis.
4.1 Motion Modeling
To demonstrate the effectiveness of our motion modeling scheme, we performed experiments
on the locomotive motions and martial arts motions summarized in Table 4.1. We captured
two locomotive motion sequences, referred to as Locomotions A and B. Locomotion
A is sampled at 30 Hz and is 3.2 minutes long; Locomotion B is sampled at 60 Hz and is 7.4
minutes long. In capturing these motion sequences, the performer was allowed to
make any desired combination of locomotive motions while varying speed and turning angle.
Further experiments were conducted on a stream of two-player kickboxing motions
and a stream of Taekwondo motions. The former is 16 minutes long and sampled at 60 Hz;
the latter is 9.8 minutes long and sampled at 30 Hz. All motion streams were obtained by
concatenating motion clips, each of which is slightly less than one and a half minutes long.
We applied our motion labeling schemes to those motion sequences and constructed their
corresponding motion transition graphs. Since a pair of consecutive motion segments may
be from different motion clips, special care is needed in building edges (see Section 2.1.5).
To validate our motion segmentation scheme, we compare its results to those
obtained from manual segmentation. As summarized in Table 4.2, the segmentation frames
produced by our scheme differ from the corresponding manually chosen frames by less than
one frame on average for locomotions. For martial arts motions, however, the maximum
errors are large (10 and 13 frames) because of outliers: it is very difficult to define an optimal
segmentation even for a human expert, since there is no sharp boundary between
steps in martial arts motions. Excluding such outliers, however, the segmentation moments are
highly collocated even for martial arts motions, as shown in Figure 4.1.
Table 4.1: Example motion dataset

                    loco. A   loco. B   Kickboxing   Taekwondo
length              3.2 min   7.4 min   16 min       9.8 min
# of motion clips   3         7         11           8
sampling rate       30Hz      60Hz      60Hz         30Hz
# of characters     1         1         2            2
Table 4.2: Results on motion segmentation in comparison to manual segmentation

                         loco. A      loco. B      Kickboxing    Taekwondo
# of motion segments     312          814          1445          901
manual segmentation      312          814          1433          923
mean error               0.46 frame   0.38 frame   1.94 frames   1.58 frames
std. dev. of error       0.64 frame   0.6 frame    2.79 frames   2.11 frames
minimum error            0 frames     0 frames     0 frames      0 frames
maximum error            2 frames     2 frames     13 frames     10 frames
percentage of outliers   0%           0%           14.7%         9.3%
Figure 4.1: Results on motion segmentation ((a) walk, (b) run, (c) Taekwondo, (d) kickboxing; automatic vs. manual segmentation).
Table 4.3 summarizes the results of motion modeling together with the timing data
for each task of motion modeling. For Locomotion A, we obtained 338 motion segments,
which were classified into 15 groups as shown in Table 4.4. Every motion segment was mapped
onto one of the groups listed in Table 2.2. However, not every group had a motion segment
assigned to it, since some actions did not appear in the example motion sequence.
Thus, the resulting motion transition graph missed two nodes, namely the SLDRF
and FLDRS nodes. For Locomotion B, we obtained similar results. Unlike Locomotion A,
all groups have one or more motion segments assigned to them. In Locomotion B, we
observed that the number of walk-to-stand transitions is significantly smaller than that
of stand-to-walk transitions, since many of the motion clips concatenated in Locomotion B were
initiated by a standing motion and ended with a walking motion. It takes less than one
minute to obtain the motion transition graph from an unlabeled locomotion sequence. For
the kickboxing motion, we obtained 1445 and 1310 motion segments for the two individual
motions, which were classified into 110 and 96 groups, respectively. For the Taekwondo motion,
901 and 791 motion segments were classified into 98 and 114 groups, respectively. Table 4.5
summarizes the classification results for the two-fighter motions. It
took four hours and two hours to classify the kickboxing and Taekwondo motion streams,
respectively, including manual interaction times. We used about 10% of the motion segments
as training data for both motion streams.
Table 4.3: Results on motion labeling

                        loco. A   loco. B   Kickboxing          Taekwondo
                                            player1   player2   player1   player2
segmentation            0.5s      1.7s      3.3s      4.2s      1.4s      1.5s
graph construction      0.3s      1.3s      2.4s      2.1s      0.4s      0.8s
interaction modeling    -         -         1252s     1833s     603s      451s
# of motion segments    338       838       1445      1310      901       791
# of action groups      15        17        110       96        98        114
avg # of edges¹         1.27      1.41      45.2      40.3      27.4      32.6
avg # of cross edges²   -         -         70.4      76.1      42.3      54.7

¹Average # of outgoing edges per node. ²Average # of outgoing cross edges per node.
Table 4.4: Classification results for locomotions (for each footstep-pattern string, the number of motion segments in Locomotions A and B; 338 segments in total for Locomotion A and 838 for Locomotion B).
Table 4.5: Classification results for two-fighter motions (# of actions³)

motion          Kickboxing          Taekwondo
vocabulary      player1   player2   player1   player2
left-punch      126       89        9         11
right-punch     114       112       35        35
left-kick       29        38        21        45
right-kick      98        111       86        56
jump            0         0         13        38
react (to)      203       168       77        73
4.2 Motion Synthesis
To visually validate our motion modeling scheme, we first performed experiments
on locomotive motion synthesis: we perform trajectory refinement and synthesize
locomotive motions with the refined trajectories.

In the first experiment, we show how the final trajectories differ from their initial
versions. As shown in Figure 4.2, two parametric curves (including a straight line) are
sampled to produce the corresponding point streams to be used as the input. Both the
input and output curves are visualized side by side.
The next experiments are performed for data streams sampled with two types of input
devices, an analog joystick and a mouse (see Figure 4.3). For the mouse, time-varying
cursor positions are sampled in an on-line manner to derive force profiles. For the joystick,
force profiles are obtained from stick positions and directions. Motion types are specified
via the keyboard or buttons.

³The # of actions does not include actions with the qualifier "others".
Figure 4.2: Trajectory refinement ((a) straight walking, (b) curved walking).
Figure 4.3: On-line motion synthesis.
Figure 4.4: Leaning due to accelerations (frames 222–250).
Figure 4.5: Human trajectory.
As shown in Figure 4.4, our scheme reflects some dynamical aspects of locomotion, such
as leaning, by incorporating acceleration as a parameter. Figure 4.5 shows a refined trajectory
after clamping outliers and adding details learned from the example motion data. The motions
synthesized with the final trajectories are shown in the accompanying video. Our
scheme facilitates on-line locomotion control without latency. Equipped with the scheme,
the locomotion synthesis system can produce more than 500 frames per second.
More experiments were performed to demonstrate the capability of our method for
two-character motion synthesis. We performed these experiments in two modes: automatic
and interactive. In the automatic mode, our method generated the motions of both
characters automatically while traversing the coupled motion transition graph, guided by
the learned motion transition model. Results are shown in Figure 4.6. As shown in Figure
4.9, our scheme produces interactions that are similar to those in the example motions. In
the interactive mode, one of the characters (in the white shirt) was designated as an avatar.
The motion of the avatar was directed in an on-line manner by specifying a motion to
perform and the position to move to via the keyboard and the mouse, respectively. In both
modes, our method showed real-time performance: more than 30 fps (frames per second)
including rendering time and more than 500 fps excluding it. Figure 4.7
shows snapshots of the resulting two-character animations.
Figure 4.6: Synthesized two-fighter motion (frames 54–126).
Figure 4.7: Two control modes ((a) automatic, (b) interactive).
Figure 4.8: Synthesized interactions ((a) punch, (b) kick).
Figure 4.9: Comparisons with example motions ((a) punch 1, (b) punch 2, (c) low kick, (d) high kick, (e) avoid 1, (f) avoid 2). The upper part of each sub-figure is from synthesized motions, while the lower part is from example motions.
5. Discussion
We begin with locomotive motion synthesis. Relying on the contact forces of a human-like
articulated figure, our motion segmentation scheme is applicable to various footstep-driven
motions such as locomotions and martial arts motions. Our segmentation scheme
is robust enough to require no manual adjustment. However, the unlabeled example motion
should be captured from a real performer rather than created manually by an animator; in
the latter case, the example motion may not exhibit the biomechanical properties that
our scheme heavily relies on.
We have parameterized the trajectory of an example locomotion motion segment with
the horizontal speed of the vehicle and its horizontal acceleration vector at the initial
frame. The rationale for this choice is that the motion segment is so short that the in-
formation at the initial frame characterizes the whole motion segment, which implies that
the motion segment itself is parameterized in the same way. This guarantees latency-free
trajectory construction. The trajectory could be further improved by incorporating a look-
ahead capability while sacrificing responsiveness.
Our locomotion synthesis scheme samples the intended (unknown) trajectory point by
point interactively. Since the user can easily adapt to the clamping to form a human-
in-the-loop feedback process, the intended trajectory can be achieved. As shown in the
first experiment, an input trajectory could be given as a curve. In this case, the curve is
converted to a point stream by properly sampling the curve. However, the curve may not
preserve its global shape when its speed and acceleration are clamped.
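The clamping step can be sketched per input sample as follows; v_max and a_max stand for the speed and acceleration bounds estimated from the example motions, and the whole sketch is a simplified illustration of the human-in-the-loop clamping with our own variable names:

```python
import numpy as np

def clamp_sample(prev_pos, prev_vel, target_pos, dt, v_max, a_max):
    """Clamp one input sample to the learned speed and acceleration
    bounds: the velocity implied by the sampled target is limited first
    in acceleration, then in speed, yielding the next vehicle position."""
    v_want = (target_pos - prev_pos) / dt
    a = (v_want - prev_vel) / dt
    a_norm = np.linalg.norm(a)
    if a_norm > a_max:                     # clamp acceleration
        a = a * (a_max / a_norm)
    v = prev_vel + a * dt
    v_norm = np.linalg.norm(v)
    if v_norm > v_max:                     # clamp speed
        v = v * (v_max / v_norm)
    return prev_pos + v * dt, v
```

When the user drags the cursor faster than the bounds allow, the output trajectory lags behind the raw samples, which is exactly why a pre-drawn curve may not preserve its global shape after clamping.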
Now, we consider two-character motion generation. It must be noted that captured
example two-character motions are, in general, quite different from similar motions in real
situations. Specifically, many combinations of attacks and reactions are simulated by a
pair of performers to avoid injury to either of them, which may deteriorate the quality of
synthesized motions. One solution could be to incorporate physics-based techniques at
the moments of actual contact between the players so as to enhance the quality of the
captured example motions [62, 63].
In the interactive mode, the prescription of the label of an action for the avatar is
directly or indirectly under the user’s control. However, the numerical parameters of the
action are hard to prescribe when a motion is specified. Under the assumption that the
avatar imitates example motions, our strategy is to estimate the numerical parameters
from the prescribed action label and the numerical parameters of the current actions of
the characters using Gaussian processes. However, further investigation is needed to have
the numerical parameters under the user’s control.
6. Conclusions
In this thesis, an example-based framework for on-line, real-time motion synthesis is pro-
posed. A key component of our framework is the motion labeling scheme. We modeled
an unlabeled example motion in terms of labeled motion segments called actions so that
motion generation and transition can be performed effectively in accordance with on-line
motion specifications by the user. In particular, we proposed a hierarchical motion tran-
sition graph to represent a locomotion model and a coupled motion transition graph to
represent two-fighter motions. Exploiting biomechanical observation data on human con-
tact forces and footstep patterns, our scheme decomposes the example motion into groups
of motion segments such that the motion segments in the same group share an identical
footstep pattern. Moreover, based on string processing rather than numerical computa-
tion, our scheme for classifying locomotive motions is extremely robust. This scheme is
further extended for labeling martial arts motions combining a supervised learning tech-
nique called MSVM. As demonstrated in results, dynamic and diverse martial arts motions
are successfully classified. We believe that our motion labeling scheme can be adopted for
labeling other footstep driven motions as well.
We also present an on-line data-driven scheme for effectively prescribing a human pelvis
trajectory. Our scheme analyzes example motion data to extract the information on hu-
man steering behavior including motion details together with bounds on inter-frame accel-
eration variations, acceleration bounds, and speed bounds. Given a stream of point sam-
ples in an on-line manner, our scheme first transforms them one by one to a smooth vehi-
cle trajectory by clamping outliers, exploiting the bounds that have been estimated from
the example motion data. Our scheme then adds motion details to the vehicle trajectory
to obtain a human pelvis trajectory to follow. For purposes of explanation, we concentrate on
the pelvis trajectory using a mouse as the input device. However, our scheme easily generalizes
to other trajectories, such as a COM (center of mass) trajectory or a ZMP
(zero moment point) trajectory, and to input devices other than a mouse, for
example a joystick or even slide bars.
For capturing the interactions between two fighters embedded in the example motion
stream, a dynamic Bayesian network is adopted to train a motion transition model with
the data collected into the coupled motion transition graph. This model facilitates repro-
duction of captured interactions at runtime such that each action by a character is accompanied by a proper reaction by the counterpart in a probabilistic manner. The next
motion for the other character is coupled with the motion of the current character in both
space and time for realistic motion synthesis.
In the future, we would like to generalize the behavior model for locomotive motions to
support more complex movements such as side-stepping and backward walking. For this
purpose, our vehicle model may be generalized such that the orientation of the vehicle is
not necessarily aligned with its velocity. The generalized vehicle can be controlled based on
torques as well as forces. Currently, movements for martial arts are described based on the
pelvis trajectory. The generalized vehicle model may provide a better way for describing
the movements of a character.
The proposed interaction model could possibly be adapted to other two-player sports
games such as tennis and ping-pong, in each of which a variety of dynamic motions are
performed to hit a ball. Other interesting extensions include interactions among multiple
characters observed in team sports such as soccer and basketball; a feasible
solution would be to employ a traditional crowd simulation approach to generate global movements,
while synthesizing detailed local interactions among a small number of players based on
our example-based framework.
Summary (in Korean)
Motion Modeling for On-line Motion Synthesis
In this thesis, we propose an example-based framework for on-line motion synthesis. The
proposed framework consists of three stages: motion modeling, behavior modeling, and
motion synthesis. In the motion modeling stage, an example motion is first segmented
based on the contact forces between the character and the ground. The resulting motion
segments are then grouped so that segments with a similar structure belong to the same
group. Finally, a motion transition graph is constructed by assigning the groups to nodes
and the transitions between groups to edges. In behavior modeling, we build a motion
transition model that connects motion segments so as to adapt to a time-varying
environment and to follow the user's motion specifications given on-line in real time. In
the motion synthesis stage, the graph is traversed according to the on-line motion
specifications with reference to the motion transition model, and the motions generated
at the visited nodes are concatenated to obtain a motion that conforms to the
specifications. Based on this framework, we first address the characteristic movements
and inter-motion transitions arising from the cyclic nature of locomotion. We then
generalize the idea to a method for synthesizing fighting motions of two characters.
Although the present work covers only locomotive and fighting motions, the proposed
method should be applicable to other kinds of biped character motions as well.