Mixtract: A Directable Musical Expression System
Mitsuyo Hashida, Shunji Tanaka and Haruhiro Katayose
Kwansei Gakuin University
2-1 Gakuen, Sanda, Hyogo, 669-1337 Japan
{hashida, s.tanaka, katayose}@kwansei.ac.jp
Abstract
This paper describes a music performance design system focusing on phrasing: the design and development of an intuitive interface that assists music performance design. The proposed interface provides an editor to control the parameter curves of dynamics and tempo over hierarchical phrase structures, and supports analysis mechanisms for hierarchical phrase structures that lighten the user's work of music interpretation. We are interested in how a system can assist users in designing music performances, not in developing a fully automatic system. Unlike most automatic performance rendering systems to date, this paper focuses on assisting the process of music interpretation and on conveying the musical interpretive intent to the system. The advantage of the proposed system was verified by the shortened time required for music performance design. The proposed system is further beneficial in that it can serve as a platform for testing various possibilities of phrasing expression.
1. Introduction
Musical works are given life by performance expression,
represented by agogics and dynamics. Performance expres-
sion is as important as composition or arrangement. Perfor-
mance rendering has been one of the main topics since the
dawn of music information science. It is an ideal target to
test the potential of artificial intelligence. Some systems ap-
ply rule-based and machine-learning techniques [2, 8, 14],
whereas others apply example-based reasoning [3, 13].
Some recent performance-rendering systems that use a corpus of musical expression have generated musical expression at a level suitable for commercial use without any human revision. However, machine-generated performances do not always match the music designer's intent. The designers must prepare examples that match their intentions, which is not an easy task. Automatic systems are good at tasks whose processes and outputs are fixed; conversely, they may make tasks involving human operators less efficient [12].
In this paper, we discuss a model for musical expression design and propose a performance-rendering system called Mixtract with the following premises: 1) a human, not a machine, designs musical expressions, and 2) a machine should make the process of interacting with a human more productive. This paper focuses on humans polishing musical expressions, whereas the typical interaction process for expression design is to add expression by conducting [9, 11]. In other words, our approach tries to assist a conductor in the musical interpretation and rehearsal of an orchestra. The remainder of this paper is organized as follows.
In Section 2, we summarize the tasks that the human designer should perform and that machines should assist with, based on a discussion of the variety of musical expressions and music-theoretical constraints. In Section 3, we specify
the performance-rendering system Mixtract, based on the
arguments in Section 2. In Section 4, we present a concrete
procedure for performance design using Mixtract, introduc-
ing the GUIs. In Section 5, we discuss musical expression
design further.
2. Musical Expression Design
2.1. A Conductor’s Task
Most people may imagine that a conductor's role is to conduct orchestras in concert halls. Before that, however, conductors have more important tasks: musical interpretation and rehearsal with their orchestra. Musical interpretation is the analysis of musical qualities such as rhythm, harmony, phrases, melody and counterpoint to create a structure of their own and to design their performance expression.
After completing a musical interpretation, the conductor rehearses with the orchestra members to convey his/her musical interpretive intent. A major concern for a conductor is to convey his/her precise intent regarding performance expression. In a series of rehearsals, conductors clarify the phrasing outlines, the important notes (apexes), and their expressions. When the level of musicality of the conductor and the orchestra is sufficient, these goals are achieved.
978-1-4244-4799-2/09/$25.00 ©2009 IEEE
Figure 1. Conceptual basis of Mixtract
The spotlight of conducting is the live performance on stage, to be sure, but the execution of musical interpretation and rehearsals is the essence of performance design for the conductor. Emulating and assisting this process is the starting point of our development of a performance design system, and we thus present Mixtract, which has the following functions:
1. Preparing a pair of piano-roll views that have different
time-scale axes.
2. Editing hierarchical phrase expressions.
3. Directly giving individual note expressions such as the apex of a phrase.
4. Supporting analysis mechanisms for hierarchical
phrase structures that maintain user preferences.
2.2. Directability
Automation is one of the most significant aims of technological development in the twenty-first century, and we enjoy the outcomes of automation in our daily lives. Automation is very useful when the task is fixed and can proceed without any human operation. However, if this condition is not guaranteed, automation instead irritates us.
Music cannot exist without human introspection. We should consider the interaction process between humans and computers in supporting music design. We thus introduce the design-goal concept of 'directability,' named after the act of conducting music, in place of automation. Imagine the relationship between a conductor and his/her orchestra: it should foster the designer's thought. To achieve 'directability,' the design standards suggested by D. Norman [12] apply:
1. Provide rich, complex, and natural signals.
2. Be predictable.
3. Provide a good conceptual model.
4. Make the output understandable.
5. Provide continual awareness without annoyance.
6. Exploit natural mappings.
3. A Directable Musical Expression System
Mixtract is an interactive performance-rendering system
that focuses on helping users with their design of phrase ex-
pressions. We attached importance to providing ‘directabil-
ity’ in designing the user interfaces of Mixtract. Figure 1
illustrates the outline of the system.
One of the key points of Mixtract is the decomposition of the agogic and dynamic expression associated with each of the hierarchical phrases, as we introduced in the design of jPop-E [7]. The major difference between Mixtract and jPop-E is that we rebuilt jPop-E into Mixtract so that it can provide 'directability' to the users and eventually foster the designer's thought.
Mixtract provides two piano-roll views of phrases, the
Phrasing View and Timeline View, to assist in each step in
designing musical expression. The horizontal axis of the
Phrasing View is the quantized time value, while that of the
Timeline View is real time. Users can switch between the Phrasing View and the Timeline View, and can enter the expression of each note using both views.
Figure 2. Warnings of GPR conflicts (users can monitor which GTTM (GPR) rules are effective for dividing phrases)
Figure 3. Automatic analysis of hierarchical phrase structure
Users give the system data on the hierarchical phrase structure using the Phrasing View. Users may use an automatic function to analyse the hierarchical phrase structure based on exGTTM [5], if necessary. Users can also use the Phrasing View to edit the parameters of agogics and dynamics for each of the hierarchical phrases. Users mainly use the Timeline View when directly editing delicate nuances of a certain note and when grasping real-time transitions in the musical sequence.
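The relation between the two views' axes can be made concrete: the Phrasing View's quantized score time maps onto the Timeline View's real time through the tempo in effect. The sketch below is a minimal Python illustration under the assumption of one tempo value per beat; the function name and data layout are ours, not Mixtract's published API.

```python
# Illustrative sketch (not Mixtract's API): convert quantized score time
# (in beats, as on the Phrasing View axis) to real time (in seconds, as on
# the Timeline View axis), given one tempo value (BPM) per beat.

def score_to_real_time(onsets_beats, tempo_bpm):
    """Map note onsets in beats to seconds; beat b lasts 60 / tempo_bpm[b] s."""
    real_times = []
    elapsed = 0.0
    beat = 0
    for onset in sorted(onsets_beats):
        while beat < int(onset):              # accumulate whole elapsed beats
            elapsed += 60.0 / tempo_bpm[beat]
            beat += 1
        frac = onset - beat                   # fractional position in the beat
        idx = min(beat, len(tempo_bpm) - 1)
        real_times.append(elapsed + frac * 60.0 / tempo_bpm[idx])
    return real_times

# At a uniform 120 BPM, one beat lasts 0.5 s:
print(score_to_real_time([0, 1, 2.5], [120, 120, 120]))  # [0.0, 0.5, 1.25]
```

A ritardando is then simply a decreasing tempo sequence: the same quantized onsets stretch out along the real-time axis.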
3.1. Editing hierarchical phrase structures
The design principle of Mixtract is that users are responsible for all phrase structure analysis and parameter editing. However, if users were required to do everything, the task would become too tedious, and it would be difficult for them to concentrate on designing expressions.
To free users from the tedious work of specifying a hierarchical phrase structure, Mixtract provides automatic analysis functions for the hierarchical phrase structure based on exGTTM [5]. exGTTM is an extension of GTTM (Generative Theory of Tonal Music) [10] designed to work as a computational model. All the Mixtract user has to do is give the phrase segments, i.e., the phrase boundaries, that (s)he especially wants to specify directly. The specified boundaries do not have to be ones that the automatic analysis function would primarily recommend. If a group specified by the user conflicts with the GPRs of exGTTM, the system displays warnings of the GPR conflicts (see Figure 2). The remaining phrase segments, including those of lower and higher structural layers, are analyzed automatically while maintaining the phrase groups specified by the user (see Figure 3). exGTTM is powerful, yet it cannot always output the best hierarchical phrase structure; we have to admit that the result of a musical structure analysis might not be unique. It is rational to apply automation technology to compensate for gaps in the user's specifications.
Figure 4. Calculation of performance parameters
Figure 5. Windows for editing parameters
Figure 6. Synchronization between multiple parts
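As an illustration of how user-fixed boundaries can coexist with automatic analysis, the toy sketch below keeps every user-specified boundary verbatim and adds boundaries only where a simple local cue fires, loosely in the spirit of GTTM's grouping preference rule GPR 2 (proximity). It is emphatically not exGTTM, whose actual rule set is far richer; the cue, threshold, and function name are our assumptions.

```python
# Toy stand-in for the constrained analysis (NOT exGTTM): user-specified
# boundaries are kept verbatim, and further boundaries are added only where
# a large inter-onset gap occurs, loosely echoing GTTM's GPR 2 (proximity).

def analyse_boundaries(onsets, user_boundaries, gap_threshold=1.0):
    """Return sorted boundary indices: all user boundaries, plus every index
    whose gap from the previous onset exceeds gap_threshold beats."""
    boundaries = set(user_boundaries)          # user choices are never overridden
    for i in range(1, len(onsets)):
        if onsets[i] - onsets[i - 1] > gap_threshold:
            boundaries.add(i)                  # automatic, proximity-style boundary
    return sorted(boundaries)

onsets = [0, 0.5, 1.0, 3.0, 3.5, 4.0]          # a long gap before index 3
print(analyse_boundaries(onsets, user_boundaries=[5]))  # [3, 5]
```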
3.2. Editing and calculating expression parameters
The outlines of the agogics and dynamics of the performance are given by the product of the parameters assigned to each of the hierarchical phrases, as shown in Figure 4. If the user clicks a phrase segment in the Phrasing View, a window for editing the parameters of that phrase appears (see Figure 5). The user can edit the parameters of the agogics and dynamics for a phrase by selecting line mode or freehand mode. Mixtract also provides a function to reuse the parameters of a phrase segment from other performance examples [6]. For this purpose, a phrase segment search engine based on melodic similarity is also provided.
The user can also manipulate the parameters of each note directly. If a user wants to edit delicate nuances of certain notes, such as apexes in a phrase, they can directly manipulate the attack time, length and velocity of each note in both the Phrasing View and the Timeline View.
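The product formulation of Figure 4 can be sketched as follows; the curve representation (one multiplicative factor per note and per phrase) is an assumption for illustration, not Mixtract's internal data structure.

```python
# Sketch of Figure 4's product rule: each phrase in the hierarchy carries a
# curve of multiplicative factors, and a note's final dynamics (or tempo)
# deviation is the product of the factors of every phrase that contains it.

def combine_levels(n_notes, phrase_curves):
    """phrase_curves: (start, end, factors) per phrase; factors cover [start, end)."""
    out = [1.0] * n_notes
    for start, end, factors in phrase_curves:
        for i in range(start, end):
            out[i] *= factors[i - start]       # hierarchical levels multiply
    return out

# A 4-note span: a whole-phrase arch combined with a 2-note sub-phrase nuance.
curves = [(0, 4, [0.8, 1.0, 1.2, 1.0]),
          (2, 4, [1.1, 0.9])]
print(combine_levels(4, curves))               # note 2 gets 1.2 * 1.1
```

Editing the curve of one level thus reshapes the overall expression without disturbing the nuances stored at the other levels.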
3.3. Synchronizing the timing of multiple parts
Mixtract also addresses generating natural expression for multi-part music. To produce a natural performance of ensemble music, it should suffice to provide an independent phrasing expression for every part and to align several sound timings between the parts accordingly. To satisfy these requirements, every part in Mixtract is given an independent phrase structure and expression. This procedure produces temporal gaps in the occupancy time, which must be resolved for a performance consisting of more than one part. To solve this problem while maintaining the individual expressions of the parts, we need to estimate synchronization points at which to align the parts. We introduced a synchronization process based on [7] that aligns the onset times of the phrases in the other parts with those of an attentive part, the primary note sequence of the piece.
Figure 6 illustrates the outline of this process. We sched-
ule the individual timing of all parts according to the timing
of the attentive part. The synchronization points are identified by comparing the phrase structures given to each part. These points indicate the onset times of the first or last note in a phrase. The notes at a synchronization point sound at the same time. In the region between adjacent synchronization points, the relevant non-attentive parts are scaled linearly while maintaining the ratios of the note lengths.
Figure 7. An example of editing a phrase structure: (1) if a boundary is moved, (2) the structure is immediately re-analysed
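The scaling between adjacent synchronization points amounts to a linear map of each non-attentive part's segment onto the attentive part's corresponding segment, which preserves the ratios of the note lengths. A minimal sketch (the function name and interval representation are ours):

```python
# Sketch of the alignment between synchronization points: a non-attentive
# part's onsets in [src_start, src_end] are mapped linearly onto the attentive
# part's interval [dst_start, dst_end], so note-length ratios are preserved.

def align_segment(onsets, src_start, src_end, dst_start, dst_end):
    """Linearly rescale onset times from the source to the destination interval."""
    scale = (dst_end - dst_start) / (src_end - src_start)
    return [dst_start + (t - src_start) * scale for t in onsets]

# A segment spanning 0..2 s must fit the attentive part's 0..2.4 s segment:
print(align_segment([0.0, 0.5, 1.0, 2.0], 0.0, 2.0, 0.0, 2.4))  # [0.0, 0.6, 1.2, 2.4]
```

Because the endpoints map exactly onto the synchronization points, the notes there sound simultaneously across parts, as described above.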
4. Performance Generation and Discussion
4.1. Performance Generation using Mixtract
Mixtract accepts a MusicXML file, which most major notation applications can export, as the target score. Users generate expressive performances with the following steps:
Step 1 Suggest a sequence of phrase segments, i.e., the phrase boundaries that they desire to specify.
Step 2 Apply the automatic analysis function to obtain the whole hierarchical phrase structure.
Step 3 If unsatisfactory phrase segments remain, go back to Step 1; otherwise go to Step 4 (see Fig. 7).
Step 4 Draw the rough shapes of each phrase's expression of dynamics and tempo (see Fig. 5). At this step, users can import the parameters of a phrase segment from other performance examples, if desired.
Step 5 Listen to the performance. If it is satisfactory, go to Step 6; otherwise go back to Step 4.
Step 6 Edit delicate nuances of certain notes, such as apexes in a phrase, if desired.
Step 7 Listen to the performance again. If it is satisfactory, the design is complete; otherwise go back to Step 6.
Figure 8 shows a snapshot of a performance design process using Mixtract. A demonstration movie is available at [1].
Figure 8. Snapshot of performance generation using Mixtract
4.2. Efficiency of Mixtract
To make performance data for a typical piano piece of around one minute using a general music sequencer, we have to set the attack time, length and velocity of more than 500 notes. Using Mixtract, the number of phrase segments that a user should specify is less than 20. The number of phrase segments that the automatic phrase analyzer then obtains will be around 250 in total. This number includes small phrase segments, for instance, a combination of two eighth notes. If limited to phrases at a sketchy level, i.e., longer than one bar, their number will be around 30. This means that the Mixtract user has to specify the rough shapes of the expression of dynamics and tempo around 30 times. Even so, we believe that musical performance design using Mixtract is more productive than using other commercial music sequencers. In addition, if the function to import the parameters of a phrase segment from other performance examples is used, the user can omit some of the process of giving the rough shapes of the expression.
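Restating the comparison with the paper's own figures (around 500 notes with three parameters each for a sequencer, versus around 20 boundaries plus around 30 rough curve shapes for Mixtract) gives a rough sense of the workload ratio:

```python
# Restating Section 4.2's workload figures: a generic sequencer needs three
# parameters (attack time, length, velocity) for each of ~500 notes, whereas
# Mixtract needs ~20 boundary specifications plus ~30 rough curve sketches.

sequencer_edits = 500 * 3     # per-note parameter entries
mixtract_edits = 20 + 30      # boundaries + sketchy-level phrase shapes
print(sequencer_edits, mixtract_edits, sequencer_edits // mixtract_edits)
# 1500 50 30
```

This back-of-the-envelope count is about editing operations only; it does not weigh the differing effort of drawing a curve versus typing a number.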
4.3. Related works
Recently, some commercial notation products have been equipped with expressive performance generation functions. For example, Finale, a de facto standard music notation software, provides a function called Human Playback. Users of Human Playback choose and apply a musical style template, such as Baroque, Romantic or Jazz, and expressive performances are obtained with a one-click operation. By revising expression marks and the control weight parameters, more expressive performances can be obtained. The expressive performance generation functions of commercial products are very handy. However, it is extremely difficult to elaborate the nuances of expressive performances with these functions alone.
Mixtract is an interactive performance-rendering system. It can also be regarded as a graphical music editing system. From this point of view, there are many works related to Mixtract. Among graphical music editing systems, UPIC and Iannix [4] are the most famous and historically important. UPIC is a computerized musical composition tool devised by the composer I. Xenakis in 1977. It consists of a digitizing tablet linked to a computer; users draw waveforms and loudness envelopes on the tablet and then compose with those materials. UPIC generates oscillated sounds from those waves. Iannix has taken over from UPIC as a multi-formal and multi-temporal Open Sound Control sequencer. The tablet assigns the X-axis to cumulative duration and the Y-axis to pitch. While the main goal of UPIC/Iannix is musical composition, they can be used as editors of musical performance expression. The main difference between Mixtract and UPIC/Iannix is that Mixtract provides more functions to deal with musical phrase structures, especially those of tonal music.
5. Conclusion
We developed the interface Mixtract as a support tool for music performance design focusing on phrasing. Unlike most automatic performance rendering systems to date, Mixtract assists its user's music interpretation and helps convey the musical interpretive intent to the system. One advantage of Mixtract is that it shortens the time required for music performance design. Furthermore, it is beneficial in that it can serve as a platform for testing various possibilities of musical expression. We believe Mixtract can be a good tool for music education. As future work, we would like to conduct experiments in which children use Mixtract and to analyze the results.
References
[1] http://mixtract.m-use.net/.
[2] Y. Aono, H. Katayose, and S. Inokuchi. Extraction of expres-
sion parameters with multiple regression analysis. Journal of
Information Processing, 38(7):1473–1481, 1997.
[3] J. Arcos, R. de Mantaras, and X. Serra. Saxex: A case-based
reasoning system for generating expressive musical perfor-
mances. Journal of New Music Research, 27(3):194–210,
1998.
[4] T. Coduys and G. Ferry. Iannix: aesthetical/symbolic visualisations for hypermedia composition. In Proceedings of the International Conference on Sound and Music Computing (SMC '04), 2004.
[5] M. Hamanaka, K. Hirata, and S. Tojo. Implementing ”a gen-
erative theory of tonal music”. Journal of New Music Re-
search, 35(4):249–277, December 2006.
[6] M. Hashida and H. Katayose. A directable performance rendering system: Itopul. In Proceedings of New Interfaces for Musical Expression (NIME), pages 277–280, 2008.
[7] M. Hashida, N. Nagata, and H. Katayose. jPop-E: An assistant system for performance rendering of ensemble music. In Proceedings of New Interfaces for Musical Expression (NIME) 2007, pages 313–316, 2007.
[8] O. Ishikawa, H. Katayose, and S. Inokuchi. Identification of
music performance rules based on iterated multiple regres-
sion analysis. Journal of IPSJ, 43(2):268–276, 2002. (writ-
ten in Japanese).
[9] H. Katayose and K. Okudaira. ifp: A music interface us-
ing an expressive performance template. In Entertainment
Computing 2004, Lecture Notes in Computer Science, vol-
ume 3116, pages 529–540, 2004.
[10] F. Lerdahl and R. Jackendoff. A Generative Theory of Tonal Music. MIT Press, 1983.
[11] M. V. Mathews. The Conductor Program and Mechanical Baton. In Current Directions in Computer Music Research, pages 263–281. MIT Press, Cambridge, Massachusetts, 1983.
[12] D. Norman. The Design of Future Things. Basic Books,
2007.
[13] T. Suzuki. The second phase development of case based performance rendering system 'Kagurame'. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2003.
[14] G. Widmer. Machine discoveries: A few simple, robust lo-
cal expression principles. Journal of New Music Research,
31(1):37–50, 2002.