Mixtract: A Directable Musical Expression System
Mitsuyo Hashida, Shunji Tanaka and Haruhiro Katayose
Kwansei Gakuin University
2-1 Gakuen, Sanda, Hyogo, 669-1337 Japan
{hashida, s.tanaka, katayose}@kwansei.ac.jp
Abstract
This paper describes a music performance design system focusing on phrasing: the design and development of an intuitive interface that assists music performance design. The proposed interface provides an editor to control the parameter curves of dynamics and tempo over hierarchical phrase structures, and supports analysis mechanisms for hierarchical phrase structures that lighten the user's work of music interpretation. We are interested in how a system can assist users in designing music performances, not in developing a fully automatic system. Unlike most automatic performance rendering systems to date, this paper focuses on assisting the process of music interpretation and on conveying the musical interpretive intent to the system. The advantage of the proposed system was verified by the shortened time required for music performance design. The proposed system is further beneficial in that it can serve as a platform for testing various possibilities of phrasing expression.
1. Introduction
Musical works are given life by performance expression,
represented by agogics and dynamics. Performance expres-
sion is as important as composition or arrangement. Perfor-
mance rendering has been one of the main topics since the
dawn of music information science. It is an ideal target to
test the potential of artificial intelligence. Some systems ap-
ply rule-based and machine-learning techniques [2, 8, 14],
whereas others apply example-based reasoning [3, 13].
Some recent performance-rendering systems that use a corpus of musical expression have generated musical expression at a level suitable for commercial use without any human revision. However, machine-generated performances do not always match the music designer's intent. The designers must prepare examples that match their intentions, which is not an easy task. Automatic systems are good at tasks whose processes and outputs are fixed; conversely, they may make tasks involving human operators less efficient [12].
In this paper, we discuss a model for musical expression design and propose a performance-rendering system called Mixtract with the following premises: 1) a human, not a machine, designs musical expressions, and 2) a machine should make the process of interacting with a human more productive. This paper focuses on humans polishing musical expressions, whereas the typical interaction process for expression design is to add expression by conducting [9, 11]. In other words, our approach tries to assist a conductor in the musical interpretation and rehearsal of an orchestra. The remainder of this paper is organized as follows.
In Section 2, we summarize the tasks that the human designer should perform and that machines should assist with, based on a discussion of the variety of musical expressions and music-theoretical constraints. In Section 3, we specify
the performance-rendering system Mixtract, based on the
arguments in Section 2. In Section 4, we present a concrete
procedure for performance design using Mixtract, introduc-
ing the GUIs. In Section 5, we discuss musical expression
design further.
2. Musical Expression Design
2.1. A Conductor’s Task
Most people may imagine that a conductor's role is to conduct orchestras in concert halls. Before that, however, conductors have more important tasks: musical interpretation and rehearsal with their orchestra. Musical interpretation is the analysis of musical qualities such as rhythm, harmony, phrases, melody and counterpoint to create a structure of their own and to design their performance expression.
After completing a musical interpretation, the conductor rehearses with the orchestra members to convey his/her musical interpretive intent. A major concern for a conductor is to convey his/her precise intent regarding performance expression. In a series of rehearsals, conductors clarify the phrasing outlines, the important notes (apexes), and their expressions. When the level of musicality of the conductor and the orchestra is sufficient, these goals are achieved.
978-1-4244-4799-2/09/$25.00 ©2009 IEEE
Figure 1. Conceptual basis of Mixtract
The spotlight of conducting is the live performance on stage, to be sure, but the execution of musical interpretation and rehearsals is the essence of performance design for the conductor. Emulating and assisting this process is the starting point of our development of a performance design system, and we thus present Mixtract, which has the following functions:
1. Preparing a pair of piano-roll views that have different
time-scale axes.
2. Editing hierarchical phrase expressions.
3. Directly giving individual note expressions such as the apex of a phrase.
4. Supporting analysis mechanisms for hierarchical
phrase structures that maintain user preferences.
2.2. Directability
Automation is one of the most significant aims of technological development in the twenty-first century, and we enjoy the outcomes of automation in our daily lives. Automation is very useful when the task is fixed and can proceed without any human operation. However, if this condition is not guaranteed, automation instead irritates us.
Music cannot exist without human introspection. We should consider the interaction process between humans and computers in supporting music design. We thus introduce the design-goal concept of 'directability,' named after the act of conducting music, in place of automation. Imagine the relationship between a conductor and his/her orchestra: it should foster the designer's thought. To achieve 'directability,' the design standards suggested by D. Norman [12] apply:
1. Provide rich, complex, and natural signals.
2. Be predictable.
3. Provide a good conceptual model.
4. Make the output understandable.
5. Provide continual awareness without annoyance.
6. Exploit natural mappings.
3. A Directable Musical Expression System
Mixtract is an interactive performance-rendering system
that focuses on helping users with their design of phrase ex-
pressions. We attached importance to providing ‘directabil-
ity’ in designing the user interfaces of Mixtract. Figure 1
illustrates the outline of the system.
One of the key points of Mixtract is the decomposition of the agogic and dynamic expression associated with each of the hierarchical phrases, as we introduced in the design of jPop-E [7]. The major difference between Mixtract and jPop-E is that we rebuilt jPop-E into Mixtract so that it can provide 'directability' to the users and eventually foster the designer's thought.
Mixtract provides two piano-roll views of phrases, the
Phrasing View and Timeline View, to assist in each step in
designing musical expression. The horizontal axis of the
Phrasing View is the quantized time value, while that of the
Timeline View is real time. Users can switch between the Phrasing View and the Timeline View, and can enter the expression of each note using both views.
Figure 2. Warnings of GPR conflicts (users can monitor which GTTM (GPR) rules are effective for dividing phrases)
Figure 3. Automatic analysis of hierarchical phrase structure
Users give the system data on the hierarchical phrase structure using the Phrasing View. Users may use an automatic function to analyse the hierarchical phrase structure based on exGTTM [5], if necessary. Users can also use the Phrasing View to edit the parameters of agogics and dynamics for each of the hierarchical phrases. Users mainly use the Timeline View when directly editing delicate nuances of a certain note and when grasping real-time transitions in the musical sequence.
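The relation between the two views' axes can be made concrete: the Phrasing View's quantized score time maps onto the Timeline View's real time through the tempo in effect. The sketch below is a minimal Python illustration under the assumption of one tempo value per beat; the function name and data layout are ours, not Mixtract's published API.

```python
# Illustrative sketch (not Mixtract's API): convert quantized score time
# (in beats, as on the Phrasing View axis) to real time (in seconds, as on
# the Timeline View axis), given one tempo value (BPM) per beat.

def score_to_real_time(onsets_beats, tempo_bpm):
    """Map note onsets in beats to seconds; beat b lasts 60 / tempo_bpm[b] s."""
    real_times = []
    elapsed = 0.0
    beat = 0
    for onset in sorted(onsets_beats):
        while beat < int(onset):              # accumulate whole elapsed beats
            elapsed += 60.0 / tempo_bpm[beat]
            beat += 1
        frac = onset - beat                   # fractional position in the beat
        idx = min(beat, len(tempo_bpm) - 1)
        real_times.append(elapsed + frac * 60.0 / tempo_bpm[idx])
    return real_times

# At a uniform 120 BPM, one beat lasts 0.5 s:
print(score_to_real_time([0, 1, 2.5], [120, 120, 120]))  # [0.0, 0.5, 1.25]
```

A ritardando is then simply a decreasing tempo sequence: the same quantized onsets stretch out along the real-time axis.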
3.1. Editing hierarchical phrase structures
The design principle of Mixtract is that users are responsible for all phrase structure analysis and parameter editing. However, if users were required to do everything, the task would become too tedious, and it would be difficult for them to concentrate on designing expressions.
To free users from the tedious work of specifying a hierarchical phrase structure, Mixtract provides automatic analysis functions for the hierarchical phrase structure based on exGTTM [5]. exGTTM is an extension of GTTM (Generative Theory of Tonal Music) [10] designed to work as a computational model. All the Mixtract user has to do is give the phrase segments, i.e., the phrase boundaries, that (s)he especially wants to specify directly. The specified boundaries do not have to be ones that the automatic analysis function would primarily recommend. If a group specified by the user conflicts with the GPRs of exGTTM, the system displays warnings of the GPR conflicts (see Figure 2). The remaining phrase segments, including those of lower and higher structural layers, are analyzed automatically while maintaining the phrase groups specified by the user (see Figure 3). exGTTM is powerful, yet it cannot always output the best hierarchical phrase structure; we have to admit that the result of a musical structure analysis might not be unique. It is rational to apply automation technology to compensate for gaps in the user's specifications.
Figure 4. Calculation of performance parameters
Figure 5. Windows for editing parameters
Figure 6. Synchronization between multiple parts
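As an illustration of how user-fixed boundaries can coexist with automatic analysis, the toy sketch below keeps every user-specified boundary verbatim and adds boundaries only where a simple local cue fires, loosely in the spirit of GTTM's grouping preference rule GPR 2 (proximity). It is emphatically not exGTTM, whose actual rule set is far richer; the cue, threshold, and function name are our assumptions.

```python
# Toy stand-in for the constrained analysis (NOT exGTTM): user-specified
# boundaries are kept verbatim, and further boundaries are added only where
# a large inter-onset gap occurs, loosely echoing GTTM's GPR 2 (proximity).

def analyse_boundaries(onsets, user_boundaries, gap_threshold=1.0):
    """Return sorted boundary indices: all user boundaries, plus every index
    whose gap from the previous onset exceeds gap_threshold beats."""
    boundaries = set(user_boundaries)          # user choices are never overridden
    for i in range(1, len(onsets)):
        if onsets[i] - onsets[i - 1] > gap_threshold:
            boundaries.add(i)                  # automatic, proximity-style boundary
    return sorted(boundaries)

onsets = [0, 0.5, 1.0, 3.0, 3.5, 4.0]          # a long gap before index 3
print(analyse_boundaries(onsets, user_boundaries=[5]))  # [3, 5]
```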
3.2. Editing and calculating expression parameters
The outlines of the agogics and dynamics of the performance are given by the product of the parameters assigned to each of the hierarchical phrases, as shown in Figure 4. If the user clicks a phrase segment in the Phrasing View, a window for editing the parameters of that phrase appears (see Figure 5). The user can edit the parameters of the agogics and dynamics for a phrase by selecting line mode or freehand mode. Mixtract also provides a function to reuse the parameters of a phrase segment from other performance examples [6]. For this purpose, a phrase segment search engine based on melodic similarity is also provided.
The user can also manipulate the parameters of each note directly. If a user wants to edit delicate nuances of certain notes, such as apexes in a phrase, they can directly manipulate the attack time, length and velocity of each note in both the Phrasing View and the Timeline View.
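The product formulation of Figure 4 can be sketched as follows; the curve representation (one multiplicative factor per note and per phrase) is an assumption for illustration, not Mixtract's internal data structure.

```python
# Sketch of Figure 4's product rule: each phrase in the hierarchy carries a
# curve of multiplicative factors, and a note's final dynamics (or tempo)
# deviation is the product of the factors of every phrase that contains it.

def combine_levels(n_notes, phrase_curves):
    """phrase_curves: (start, end, factors) per phrase; factors cover [start, end)."""
    out = [1.0] * n_notes
    for start, end, factors in phrase_curves:
        for i in range(start, end):
            out[i] *= factors[i - start]       # hierarchical levels multiply
    return out

# A 4-note span: a whole-phrase arch combined with a 2-note sub-phrase nuance.
curves = [(0, 4, [0.8, 1.0, 1.2, 1.0]),
          (2, 4, [1.1, 0.9])]
print(combine_levels(4, curves))               # note 2 gets 1.2 * 1.1
```

Editing the curve of one level thus reshapes the overall expression without disturbing the nuances stored at the other levels.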
3.3. Synchronizing the timing of multiple parts
Mixtract also addresses generating natural expression for multi-part music. To produce a natural performance of ensemble music, it should suffice to provide an independent phrasing expression for every part and to align several sound timings between the parts accordingly. To satisfy these requirements, every part in Mixtract is given an independent phrase structure and expression. This procedure produces temporal gaps in the occupancy time, which must be resolved for a performance consisting of more than one part. To solve this problem while maintaining the individual expressions of the parts, we need to estimate synchronization points at which to align the parts. We introduced a synchronization process based on [7] that aligns the onset times of the phrases in the other parts with those of an attentive part, the primary note sequence of the piece.
Figure 6 illustrates the outline of this process. We sched-
ule the individual timing of all parts according to the timing
of the attentive part. The synchronization points are identified by comparing the phrase structures given to each part. These points indicate the onset times of the first or last note in a phrase. The notes at a synchronization point sound at the same time. In the region between adjacent synchronization points, the relevant non-attentive parts are scaled linearly while maintaining the ratios of the note lengths.
Figure 7. An example of editing a phrase structure: (1) if a boundary is moved, (2) the structure is immediately re-analysed
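The scaling between adjacent synchronization points amounts to a linear map of each non-attentive part's segment onto the attentive part's corresponding segment, which preserves the ratios of the note lengths. A minimal sketch (the function name and interval representation are ours):

```python
# Sketch of the alignment between synchronization points: a non-attentive
# part's onsets in [src_start, src_end] are mapped linearly onto the attentive
# part's interval [dst_start, dst_end], so note-length ratios are preserved.

def align_segment(onsets, src_start, src_end, dst_start, dst_end):
    """Linearly rescale onset times from the source to the destination interval."""
    scale = (dst_end - dst_start) / (src_end - src_start)
    return [dst_start + (t - src_start) * scale for t in onsets]

# A segment spanning 0..2 s must fit the attentive part's 0..2.4 s segment:
print(align_segment([0.0, 0.5, 1.0, 2.0], 0.0, 2.0, 0.0, 2.4))  # [0.0, 0.6, 1.2, 2.4]
```

Because the endpoints map exactly onto the synchronization points, the notes there sound simultaneously across parts, as described above.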
4. Performance Generation and Discussion
4.1. Performance Generation using Mixtract
Mixtract accepts a MusicXML file, which most major notation applications can export, as the target score. Users generate expressive performances with the following steps:
Step 1 Suggest a sequence of phrase segments, i.e., the phrase boundaries that they desire to specify.
Step 2 Apply the automatic analysis function to obtain the whole hierarchical phrase structure.
Step 3 If unsatisfactory phrase segments remain, go back to Step 1; otherwise go to Step 4 (see Fig. 7).
Step 4 Draw the rough shapes of each phrase's expression of dynamics and tempo (see Fig. 5). At this step, users can import the parameters of a phrase segment from other performance examples, if desired.
Step 5 Listen to the performance. If it is satisfactory, go to Step 6; otherwise go back to Step 4.
Step 6 Edit delicate nuances of certain notes, such as apexes in a phrase, if desired.
Step 7 Listen to the performance again. If it is satisfactory, the design is complete; otherwise go back to Step 6.
Figure 8 shows a snapshot of a performance design process using Mixtract. A demonstration movie is available at [1].
Figure 8. Snapshot of performance generation using Mixtract
4.2. Efficiency of Mixtract
To make performance data for a typical piano piece of around one minute using a general music sequencer, we have to set the attack time, length and velocity of more than 500 notes. Using Mixtract, the number of phrase segments that a user should specify is less than 20. The number of phrase segments that the automatic phrase analyzer then obtains will be around 250 in total. This number includes small phrase segments, for instance, a combination of two eighth notes. If limited to phrases at a sketchy level, i.e., longer than one bar, their number will be around 30. This means that the Mixtract user has to specify the rough shapes of the expression of dynamics and tempo around 30 times. Even so, we believe that musical performance design using Mixtract is more productive than using other commercial music sequencers. In addition, if the function to import the parameters of a phrase segment from other performance examples is used, the user can omit some of the process of giving the rough shapes of the expression.
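Restating the comparison with the paper's own figures (around 500 notes with three parameters each for a sequencer, versus around 20 boundaries plus around 30 rough curve shapes for Mixtract) gives a rough sense of the workload ratio:

```python
# Restating Section 4.2's workload figures: a generic sequencer needs three
# parameters (attack time, length, velocity) for each of ~500 notes, whereas
# Mixtract needs ~20 boundary specifications plus ~30 rough curve sketches.

sequencer_edits = 500 * 3     # per-note parameter entries
mixtract_edits = 20 + 30      # boundaries + sketchy-level phrase shapes
print(sequencer_edits, mixtract_edits, sequencer_edits // mixtract_edits)
# 1500 50 30
```

This back-of-the-envelope count is about editing operations only; it does not weigh the differing effort of drawing a curve versus typing a number.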
4.3. Related works
Recently, some commercial notation products have been equipped with expressive performance generation functions. For example, Finale, a de facto standard music notation software, provides a function called Human Playback. Users of Human Playback choose and apply a musical style template, such as Baroque, Romantic or Jazz, and expressive performances are obtained with a one-click operation. By revising expression marks and the control weight parameters, more expressive performances can be obtained. The expressive performance generation functions of commercial products are very handy. However, it is extremely difficult to elaborate the nuances of expressive performances with these functions alone.
Mixtract is an interactive performance-rendering system. It can also be regarded as a graphical music editing system. From this point of view, there are many works related to Mixtract. Among graphical music editing systems, UPIC and Iannix [4] are the most famous and historically important. UPIC is a computerized musical composition tool devised by the composer I. Xenakis in 1977. It consists of a digitizing tablet linked to a computer; users draw waveforms and loudness envelopes on the tablet and then compose with those materials. UPIC generates oscillated sounds from those waves. Iannix has taken over from UPIC as a multi-formal and multi-temporal Open Sound Control sequencer. The tablet assigns the X-axis to cumulative duration and the Y-axis to pitch. While the main goal of UPIC/Iannix is musical composition, they can be used as editors of musical performance expression. The main difference between Mixtract and UPIC/Iannix is that Mixtract provides more functions to deal with musical phrase structures, especially those of tonal music.
5. Conclusion
We developed the interface Mixtract as a support tool for music performance design focusing on phrasing. Unlike most automatic performance rendering systems to date, Mixtract assists its user's music interpretation and helps convey the musical interpretive intent to the system. One advantage of Mixtract is that it shortens the time required for music performance design. Furthermore, it is beneficial in that it can serve as a platform for testing various possibilities of musical expression. We believe Mixtract can be a good tool for music education. As future work, we would like to conduct experiments in which children use Mixtract and to analyze the results.
References
[1] http://mixtract.m-use.net/.
[2] Y. Aono, H. Katayose, and S. Inokuchi. Extraction of expres-
sion parameters with multiple regression analysis. Journal of
Information Processing, 38(7):1473–1481, 1997.
[3] J. Arcos, R. de Mantaras, and X. Serra. Saxex: A case-based
reasoning system for generating expressive musical perfor-
mances. Journal of New Music Research, 27(3):194–210,
1998.
[4] T. Coduys and G. Ferry. Iannix: aesthetical/symbolic visualisations for hypermedia composition. In Proceedings of the International Conference on Sound and Music Computing (SMC '04), 2004.
[5] M. Hamanaka, K. Hirata, and S. Tojo. Implementing ”a gen-
erative theory of tonal music”. Journal of New Music Re-
search, 35(4):249–277, December 2006.
[6] M. Hashida and H. Katayose. A directable performance rendering system: Itopul. In Proceedings of New Interfaces for Musical Expression (NIME), pages 277–280, 2008.
[7] M. Hashida, N. Nagata, and H. Katayose. jPop-E: An assistant system for performance rendering of ensemble music. In Proceedings of New Interfaces for Musical Expression (NIME) 2007, pages 313–316, 2007.
[8] O. Ishikawa, H. Katayose, and S. Inokuchi. Identification of
music performance rules based on iterated multiple regres-
sion analysis. Journal of IPSJ, 43(2):268–276, 2002. (writ-
ten in Japanese).
[9] H. Katayose and K. Okudaira. ifp: A music interface us-
ing an expressive performance template. In Entertainment
Computing 2004, Lecture Notes in Computer Science, vol-
ume 3116, pages 529–540, 2004.
[10] F. Lerdahl and R. Jackendoff. A Generative Theory of Tonal Music. MIT Press, 1983.
[11] M. V. Mathews. The Conductor Program and Mechanical Baton. In Current Directions in Computer Music Research, pages 263–281. MIT Press, Cambridge, Massachusetts, 1983.
[12] D. Norman. The Design of Future Things. Basic Books,
2007.
[13] T. Suzuki. The second phase development of case based performance rendering system 'Kagurame'. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2003.
[14] G. Widmer. Machine discoveries: A few simple, robust lo-
cal expression principles. Journal of New Music Research,
31(1):37–50, 2002.