Institute of Electronic Music and Acoustics - IEM, University for Music and Dramatic Arts Graz
Science By Ear. An Interdisciplinary Approach to Sonifying Scientific Data
Alberto de Campo
Dissertation
Graz, February 23, 2009
Supervisors: Prof. Dr. Robert Höldrich (IEM/KUG), Prof. Dr. Curtis Roads (MAT/UCSB)
Science By Ear. An Interdisciplinary Approach to Sonifying Scientific Data
Author: Alberto de Campo
Contact: [email protected]
Supervisors: Prof. Dr. Robert Höldrich (IEM/KUG), Prof. Dr. Curtis Roads (MAT/UCSB)
Contact: [email protected], [email protected]
Dissertation
Institute of Electronic Music and Acoustics - IEM, University for Music and Dramatic Arts Graz
Inffeldgasse 10, A-8020 Graz, Austria
February 23, 2009, 211 pages
Abstract
Sonification of Scientific Data is intrinsically interdisciplinary: It requires collaboration between experts in the respective scientific domains, in psychoacoustics, in artistic design of synthetic sound, and in working with appropriate programming environments. The SonEnvir project hosted at IEM Graz put this view into practice: in four domain sciences, sonification designs for current research questions were realised.
This dissertation contributes to sonification research in three aspects:
The body of sonification designs realised within the SonEnvir context is described, which may be reused in sonification research in different ways.
The software framework built with and for these sonification designs is presented, which supports fluid experimentation with evolving sonification designs.
A theoretical model for sonification design work, the Sonification Design Space Map, was synthesised based on the analysis of this body of sonification designs (and a few selected others). This model allows systematic reasoning about the process of creating sonification designs, and provides concepts for analysing and categorising existing sonification designs more systematically.
Deutsche Zusammenfassung - German abstract
Die Sonifikation von wissenschaftlichen Daten ist intrinsisch interdisziplinär: Sie verlangt Zusammenarbeit zwischen ExpertInnen in den jeweiligen wissenschaftlichen Gebieten, in Psychoakustik, in der künstlerischen Gestaltung von synthetischem Klang, und in der Arbeit mit geeigneten Programmierumgebungen. Das Projekt SonEnvir, das am IEM Graz stattfand, hat diese Sichtweise in die Praxis umgesetzt: in vier wissenschaftlichen Gebieten (domain sciences) wurden Sonifikations-Designs zu aktuellen Forschungsfragen realisiert.
Diese Dissertation trägt drei Aspekte zur Sonifikationsforschung bei:
Der Korpus der im Kontext von SonEnvir entwickelten Sonification Designs wird detailliert beschrieben; diese Designs können in der Forschungsgemeinschaft in verschiedener Weise Weiterverwendung finden.
Das Software-Framework, das für und mit diesen Designs gebaut wurde, wird beschrieben; es erlaubt fließendes Experimentieren in der Entwicklung von Sonifikationsdesigns.
Ein theoretisches Modell für die Gestaltung von Sonifikationen, die Sonification Design Space Map, wurde auf Basis der Analysen dieser (und ausgewählter anderer) Designs synthetisiert. Dieses Modell erlaubt systematisches Nachdenken (reasoning) über den Gestaltungsprozess von Sonifikationsdesigns, und bietet Konzepte für die Analyse und Kategorisierung existierender Sonifikationsdesigns an.
Keywords: Sonification, Sonification Theory, Perceptualisation, Interdisciplinary Re-search, Interactive Software Development, Just In Time Programming
Acknowledgements
First of all, I would like to thank Marianne Egger de Campo for designing several versions of the XENAKIS proposal with me - a sonification project with European partners that eventually became SonEnvir. Then, I would like to thank my research partners in the SonEnvir project: Christian Dayé, Christopher Frauenberger, Kathi Vogt and Annette Wallisch, without whom this work would not have been possible. I would like to thank Robert Höldrich for his collaboration on the grant proposals, and for his contribution to the EEG realtime sonification; and Gerhard Eckel for leading the SonEnvir project for most of its lifetime.
I would like to thank the participants of the Science By Ear workshop, who have been very open to a very particular experimental setup in interdisciplinary collaboration, especially for the discussions which eventually led to formulating the concept of the Sonification Design Space Map. A very special thank you is in order for the brave people who were willing to try programming sonification designs just-in-time within this workshop: Till Bovermann, Christopher Frauenberger, Thomas Musil, Sandra Pauletto, and Julian Rohrhuber.
For the Spin Models, the following Science By Ear participants also worked on a sonification design for the Ising model (besides the SonEnvir team): Thomas Hermann, Harald Markum, Julian Rohrhuber and Tony Stockman. Concerning the background in theoretical physics, we would also like to thank Christof Gattringer, Christian Bernd Lang, Leopold Mathelitsch and Ulrich Hohenester.
For the piece Navegar, I would like to thank Peter Jakober for researching the detailed timeline, and Marianne Egger de Campo for suggesting the Gini index as an interesting variable.
Alberto de Campo
Graz, February 23, 2009
Contents
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Overview of this thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Psychoacoustics, Perception, Cognition, and Interaction 6
2.1 Psychoacoustics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Auditory perception and memory . . . . . . . . . . . . . . . . . . . . . 8
2.3 Cognition, action, and embodiment . . . . . . . . . . . . . . . . . . . . 10
2.4 Perception, perceptualisation and interaction . . . . . . . . . . . . . . . 11
2.5 Mapping, mixing and matching metaphors . . . . . . . . . . . . . . . . 12
3 Sonification Systems 13
3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.1.1 A short history of sonification . . . . . . . . . . . . . . . . . . . 14
3.1.2 A taxonomy of intended sonification uses . . . . . . . . . . . . . 17
3.2 Sonification toolkits, frameworks, applications . . . . . . . . . . . . . . 18
3.2.1 Historic systems . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.2.2 Current systems . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.3 Music and sound programming environments . . . . . . . . . . . . . . . 20
3.4 Design of a new system . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.4.1 Requirements of an ideal sonification environment . . . . . . . . 23
3.4.2 Platform choice . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.5 SonEnvir software - Overall scope . . . . . . . . . . . . . . . . . . . . . 24
3.5.1 Software framework . . . . . . . . . . . . . . . . . . . . . . . . 25
3.5.2 Framework structure . . . . . . . . . . . . . . . . . . . . . . . . 25
3.5.3 The Data model . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4 Project Background 29
4.1 The SonEnvir project . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.1.1 Partner institutions and people . . . . . . . . . . . . . . . . . . 29
4.1.2 Project flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.1.3 Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2 Science By Ear - An interdisciplinary workshop . . . . . . . . . . . . . . 32
4.2.1 Workshop design . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2.2 Working methods . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.3 ICAD 2006 concert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.3.1 Listening to the Mind Listening . . . . . . . . . . . . . . . . . . 34
4.3.2 Global Music - The World by Ear . . . . . . . . . . . . . . . . . 34
5 General Sonification Models 37
5.1 The Sonification Design Space Map (SDSM) . . . . . . . . . . . . . . . 38
5.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.1.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.1.3 The Sonification Design Space Map . . . . . . . . . . . . . . . . 41
5.1.4 Refinement by moving on the map . . . . . . . . . . . . . . . . 43
5.1.5 Examples from the ’Science by Ear’ workshop . . . . . . . . . . . 47
5.1.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.1.7 Extensions of the SDS map . . . . . . . . . . . . . . . . . . . . 51
5.2 Data dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.2.1 Data categorisation . . . . . . . . . . . . . . . . . . . . . . . . 52
5.2.2 Data organisation . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.2.3 Task Data analysis - LoadFlow data . . . . . . . . . . . . . . . . 53
5.3 Synthesis models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.3.1 Sonification strategies . . . . . . . . . . . . . . . . . . . . . . . 57
5.3.2 Continuous Data Representation . . . . . . . . . . . . . . . . . 57
5.3.3 Discrete Data Representation . . . . . . . . . . . . . . . . . . . 61
5.3.4 Parallel streams . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.3.5 Model Based Sonification . . . . . . . . . . . . . . . . . . . . . 63
5.4 User, task, interaction models . . . . . . . . . . . . . . . . . . . . . . . 64
5.4.1 Background - related disciplines . . . . . . . . . . . . . . . . . . 64
5.4.2 Music interfaces and musical instruments . . . . . . . . . . . . . 65
5.4.3 Interactive sonification . . . . . . . . . . . . . . . . . . . . . . . 66
5.4.4 ”The Humane Interface” and sonification . . . . . . . . . . . . . 67
5.4.5 Goals, tasks, skills, context . . . . . . . . . . . . . . . . . . . . 69
5.4.6 Two examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.5 Spatialisation Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.5.1 Speaker-based sound rendering . . . . . . . . . . . . . . . . . 75
5.5.2 Headphones . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.5.3 Handling speaker imperfections . . . . . . . . . . . . . . . . . 80
6 Examples from Sociology 81
6.1 FRR Log Player . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.1.1 Technical background . . . . . . . . . . . . . . . . . . . . . . . 82
6.1.2 Analysis steps . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
6.1.3 Sonification design . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.1.4 Interface design . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.1.5 Evaluation for the research context . . . . . . . . . . . . . . . . 88
6.1.6 Evaluation in SDSM terms . . . . . . . . . . . . . . . . . . . . 88
6.2 ’Wahlgesänge’ - ’Election Songs’ . . . . . . . . . . . . . . . . . . . . . 90
6.2.1 Interface and sonification design . . . . . . . . . . . . . . . . . . 91
6.2.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6.3 Social Data Explorer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.3.2 Interaction design . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.3.3 Sonification design . . . . . . . . . . . . . . . . . . . . . . . . . 96
6.3.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
7 Examples from Physics 98
7.1 Quantum Spectra sonification . . . . . . . . . . . . . . . . . . . . . . . 100
7.1.1 Quantum spectra of baryons . . . . . . . . . . . . . . . . . . . . 101
7.1.2 The Quantum Spectra Browser . . . . . . . . . . . . . . . . . . 101
7.1.3 The Hyperfine Splitter . . . . . . . . . . . . . . . . . . . . . . . 104
7.1.4 Possible future work and conclusions . . . . . . . . . . . . . . . 107
7.2 Sonification of Spin models . . . . . . . . . . . . . . . . . . . . . . . . 109
7.2.1 Physical background . . . . . . . . . . . . . . . . . . . . . . . . 109
7.2.2 Ising model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
7.2.3 Potts model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.2.4 Audification-based sonification . . . . . . . . . . . . . . . . . . 114
7.2.5 Channel sonification . . . . . . . . . . . . . . . . . . . . . . . . 116
7.2.6 Granular sonification . . . . . . . . . . . . . . . . . . . . . . . . 117
7.2.7 Sonification of self-similar structures . . . . . . . . . . . . . . . 119
7.2.8 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
8 Examples from Speech Communication and Signal Processing 122
8.1 Time Series Analyser . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
8.1.1 Mathematical background . . . . . . . . . . . . . . . . . . . . . 123
8.1.2 Sonification tools . . . . . . . . . . . . . . . . . . . . . . . . . 124
8.1.3 The PDFShaper . . . . . . . . . . . . . . . . . . . . . . . . . . 124
8.1.4 TSAnalyser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
8.2 Listening test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
8.2.1 Test data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
8.2.2 Listening experiment . . . . . . . . . . . . . . . . . . . . . . . . 128
8.2.3 Experiment results . . . . . . . . . . . . . . . . . . . . . . . . . 129
8.2.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
9 Examples from Neurology 134
9.1 Auditory screening and monitoring of EEG data . . . . . . . . . . . . . . 134
9.1.1 EEG and sonification . . . . . . . . . . . . . . . . . . . . . . . . 134
9.1.2 Rapid screening of long-time EEG recordings . . . . . . . . . . . 135
9.1.3 Realtime monitoring during EEG recording sessions . . . . . . . . 136
9.2 The EEG Screener . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
9.2.1 Sonification design . . . . . . . . . . . . . . . . . . . . . . . . 136
9.2.2 Interface design . . . . . . . . . . . . . . . . . . . . . . . . . 138
9.3 The EEG Realtime Player . . . . . . . . . . . . . . . . . . . . . . . . 140
9.3.1 Sonification design . . . . . . . . . . . . . . . . . . . . . . . . . 141
9.3.2 Interface design . . . . . . . . . . . . . . . . . . . . . . . . . . 143
9.4 Evaluation with user tests . . . . . . . . . . . . . . . . . . . . . . . . 144
9.4.1 EEG test data . . . . . . . . . . . . . . . . . . . . . . . . . . 144
9.4.2 Initial pre-tests . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
9.4.3 Tests with expert users . . . . . . . . . . . . . . . . . . . . . . 145
9.4.4 Analysis of expert user tests EEG Screener 1 vs. 2 . . . . . . . . 146
9.4.5 Analysis of expert user tests - RealtimePlayer 1 vs. 2 . . . . . . 147
9.4.6 Qualitative results for both players (versions 2) . . . . . . . . . 149
9.4.7 Conclusions from user tests . . . . . . . . . . . . . . . . . . . . 149
9.4.8 Next steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
9.4.9 Evaluation in SDSM terms . . . . . . . . . . . . . . . . . . . . 150
10 Examples from the Science by Ear Workshop 151
10.1 Rainfall data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
10.2 Polysaccharides . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
10.2.1 Polysaccharides - Materials made by nature . . . . . . . . . . . . 156
10.2.2 Session notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
11 Examples from the ICAD 2006 Concert 160
11.1 Life Expectancy - Tim Barrass . . . . . . . . . . . . . . . . . . . . . . 160
11.2 Guernica 2006 - Guillaume Potard . . . . . . . . . . . . . . . . . . . . 162
11.3 ’Navegar É Preciso, Viver Não É Preciso’ . . . . . . . . . . . . . . . . 163
11.3.1 Navigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
11.3.2 The route . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
11.3.3 Data choices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
11.3.4 Economic characteristics . . . . . . . . . . . . . . . . . . . . . 167
11.3.5 Access to drinking water . . . . . . . . . . . . . . . . . . . . . 168
11.3.6 Mapping choices . . . . . . . . . . . . . . . . . . . . . . . . . 168
11.4 Terra Nullius - Julian Rohrhuber . . . . . . . . . . . . . . . . . . . . . 169
11.4.1 Missing values . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
11.4.2 The piece . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
11.5 Comparison of the pieces . . . . . . . . . . . . . . . . . . . . . . . . . 172
12 Conclusions 175
12.1 Further work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
A The SonEnvir framework structure in subversion 177
A.1 The folder ’Framework’ . . . . . . . . . . . . . . . . . . . . . . . . . . 177
A.2 The folder ’SC3-Support’ . . . . . . . . . . . . . . . . . . . . . . . . . 178
A.3 Other folders in the svn repository . . . . . . . . . . . . . . . . . . . . . 178
A.4 Quarks-SonEnvir . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
A.5 Quarks-SuperCollider . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
B Models - code examples 180
B.1 Spatialisation examples . . . . . . . . . . . . . . . . . . . . . . . . . . 180
B.1.1 Physical sources . . . . . . . . . . . . . . . . . . . . . . . . . 180
B.1.2 Amplitude panning . . . . . . . . . . . . . . . . . . . . . . . . 181
B.1.3 Ambisonics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
B.1.4 Headphones . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
B.1.5 Handling speaker imperfections . . . . . . . . . . . . . . . . . 186
C Physics Background 189
C.1 Constituent Quark Models . . . . . . . . . . . . . . . . . . . . . . . . . 189
C.2 Potts model- theoretical background . . . . . . . . . . . . . . . . . . . 192
C.2.1 Spin models sound examples . . . . . . . . . . . . . . . . . . . . 193
D Science By Ear participants 195
E Background on ’Navegar’ 197
F Sound, meaning, language 198
List of Tables
5.1 Scale types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.2 The Keys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.3 The Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.4 The Data/Information: . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.5 The Data: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.1 Sectors of economic activities . . . . . . . . . . . . . . . . . . . . . . . 95
9.1 Equally spaced EEG band ranges. . . . . . . . . . . . . . . . . . . . . . 135
9.2 Questionnaire scales for EEG sonification designs . . . . . . . . . . . . . 146
11.1 Navegar - Mappings of data to sound parameters . . . . . . . . . . . . . 169
11.2 Some stations along the timeline of ’Navegar’ . . . . . . . . . . . . . . . 170
B.1 Remapping spatial control values . . . . . . . . . . . . . . . . . . . . . 182
E.1 Os Argonautas - Caetano Veloso . . . . . . . . . . . . . . . . . . . 197
List of Figures
2.1 Some aspects of auditory memory, from Snyder (2000). . . . . . . . . . 9
3.1 Inclined plane for Galilei’s experiments on the law of falling bodies. . . . 15
3.2 UML diagram of the data model. . . . . . . . . . . . . . . . . . . . . . 27
5.1 The Sonification Design Space Map . . . . . . . . . . . . . . . . . . . 42
5.2 SDS Map for designs with varying numbers of streams. . . . . . . . . . . 46
5.3 All design steps for the LoadFlow dataset. . . . . . . . . . . . . . . . . 48
5.4 LoadFlow - time series of dataset (averaged over many households) . . . 55
5.5 LoadFlow - time series for 3 individual households . . . . . . . . . . . . 55
6.1 The toilet prototype system used for the FRR field test. . . . . . . . . . 83
6.2 Graphical display of one usage episode (Excel). . . . . . . . . . . . . . . 85
6.3 FRR Log Player GUI and sounds mixer. . . . . . . . . . . . . . . . . . 87
6.4 SDS Map for the FRR Log Player. . . . . . . . . . . . . . . . . . . . . 89
6.5 GUI Window for the Wahlgesänge Design. . . . . . . . . . . . . . . . . 91
6.6 SDS-Map for Wahlgesänge. . . . . . . . . . . . . . . . . . . . . . . . . 94
6.7 GUI Window for the Social Data Explorer. . . . . . . . . . . . . . . . . 96
7.1 Excitation spectra of N (left) and ∆ (right) particles. . . . . . . . . . . 101
7.2 The QuantumSpectraBrowser GUI. . . . . . . . . . . . . . . . . . . . . 103
7.3 The Hyperfine Splitter GUI. . . . . . . . . . . . . . . . . . . . . . . . . 106
7.4 Schema of spins in the Ising model as an example for Spin models. . . . 110
7.5 Schema of the orders of phase transitions in spin models. . . . . . . . . 111
7.6 GUI for the running 4-state Potts Model in 2D. . . . . . . . . . . . . . . 113
7.7 Audification of a 4-state Potts model. . . . . . . . . . . . . . . . . . . . 115
7.8 Sequentialisation schemes for the lattice used for the audification. . . . . 115
7.9 A 3-state Potts model cooling down from super- to subcritical state. . . 117
7.10 Granular sonification scheme for the Ising model. . . . . . . . . . . . . . 118
7.11 A self similar structure as a state of an Ising model. . . . . . . . . . . . 119
8.1 The PDFShaper interface . . . . . . . . . . . . . . . . . . . . . . . . . 125
8.2 The TSAnalyser interface . . . . . . . . . . . . . . . . . . . . . . . . . 126
8.3 The interface for the time series listening experiment. . . . . . . . . . . 128
8.4 Probability of correctness over ∆ kurtosis in set 1 . . . . . . . . . . . . 129
8.5 Probability of correctness over ∆ kurtosis in set 2 . . . . . . . . . . . . 130
8.6 Probability of correctness over ∆ skew in set 2 . . . . . . . . . . . . . . 130
8.7 Probability of correctness over ∆ skew and ∆ kurtosis in set 2 . . . . . . 131
8.8 Number of replays over ∆ kurtosis in set 2 . . . . . . . . . . . . . . . . 132
9.1 The Sonification Design Space Map for both EEG Players. . . . . . . . . 137
9.2 The EEGScreener GUI. . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
9.3 The Montage Window. . . . . . . . . . . . . . . . . . . . . . . . . . . 139
9.4 EEG Realtime Sonification block diagram. . . . . . . . . . . . . . . . . 142
9.5 The EEG Realtime Player GUI. . . . . . . . . . . . . . . . . . . . . . . 143
9.6 Expert user test ratings for both EEGScreener versions. . . . . . . . . . 147
9.7 Expert user test ratings for both RealtimePlayer versions. . . . . . . . . 148
10.1 Precipitation in the Alpine region, 1980-1991. . . . . . . . . . . . . . . . 152
10.2 Orography of the grid of regions. . . . . . . . . . . . . . . . . . . . . . 153
10.3 SDSM map of Rainfall data set. . . . . . . . . . . . . . . . . . . . . . . 156
11.1 Magellan’s route in Antonio Pigafetta’s travelogue . . . . . . . . . . . . 165
11.2 Magellan’s route, as reported in Wikipedia. . . . . . . . . . . . . . . . 166
11.3 The countries of the world and their Gini coefficients. . . . . . . . . . . 167
11.4 Terra Nullius, latitude zones . . . . . . . . . . . . . . . . . . . . . . . . 171
11.5 SDSM comparison of the ICAD 2006 concert pieces. . . . . . . . . . . . 173
B.1 The Spectralyzer GUI window. . . . . . . . . . . . . . . . . . . . . . . . 187
C.1 Multiplet structure of the baryons as a decuplet. . . . . . . . . . . . . . 191
Chapter 1
Introduction
Sonification of Scientific Data, i.e., the perceptualisation of data by means of sound
in order to find structures and patterns within them, is intrinsically interdisciplinary: It
requires collaboration between experts in the respective scientific domains the data come
from, in psychoacoustics, in the artistic design of synthetic sound, and in working with
appropriate programming environments to realise successful sonification designs. The
concept of the SonEnvir project (hosted at IEM Graz from 2005 to 2007) has put this
view into practice: in four science domains, sonification designs for current research
questions were realised in close collaboration with audio programming specialists.
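To make the notion of perceptualising data by sound concrete, the most common starting point is parameter mapping: each data value controls a parameter of a synthesis process, typically pitch. The following is a minimal hypothetical sketch only, written in Python for self-containedness (the SonEnvir framework itself, as described later, was written in SuperCollider3); every name in it is illustrative, not part of any project codebase. It renders one short sine tone per data point, mapping values to frequency on an exponential scale so that equal data steps become equal pitch intervals:

```python
import math
import struct
import wave

SR = 44100  # sample rate in Hz

def linexp(x, xmin, xmax, ymin, ymax):
    """Map x from [xmin, xmax] onto [ymin, ymax] exponentially,
    so equal data steps become equal pitch intervals
    (in the spirit of SuperCollider's linexp)."""
    frac = (x - xmin) / (xmax - xmin)
    return ymin * (ymax / ymin) ** frac

def sonify(data, dur=0.15, fmin=220.0, fmax=880.0, path="sonification.wav"):
    """Render one decaying sine tone per data point; the value sets the pitch."""
    lo, hi = min(data), max(data)
    n = int(SR * dur)  # samples per tone
    frames = bytearray()
    for value in data:
        freq = linexp(value, lo, hi, fmin, fmax)
        for i in range(n):
            env = 1.0 - i / n  # linear decay envelope avoids clicks
            sample = 0.5 * env * math.sin(2 * math.pi * freq * i / SR)
            frames += struct.pack('<h', int(sample * 32767))
    # write a 16-bit mono WAV file via the standard library
    with wave.open(path, 'wb') as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(SR)
        f.writeframes(bytes(frames))

sonify([1.0, 3.0, 2.0, 5.0, 4.0, 8.0])
```

Listening to the resulting file renders the ups and downs of the series as a melodic contour; actual sonification designs, as discussed throughout this thesis, go far beyond such a sketch by adding interaction, multiple streams, and psychoacoustically informed mappings.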
The research reported here mainly took place in the SonEnvir project context. This
dissertation contributes to sonification research in three ways:
• The body of sonification designs realised within SonEnvir is described in detail.
They may be reused in sonification research by the community, both as concepts
and as open-source implementations on which new solutions can be based.
• For realising these sonification designs, a software framework was built in the lan-
guage SuperCollider3 that allows for flexible, rapid experimentation with evolving
sonification designs (in Just In Time programming style). Being open-source, this
framework may be reused and possibly maintained by the research community in
the future.
• The analysis of this body of sonification designs (and a few others of interest)
has eventually led to a general model of sonification design work, the Sonification
Design Space Map. This contribution to sonification theory allows systematic
reasoning about the process of developing sonification designs; based on data
properties and context, it suggests candidates for the next experimental steps in
the ongoing design process. It also provides concepts for analysing and categorising
existing sonification designs more systematically.
1.1 Motivation
Data are pervasive in modern societies: Science, politics, economics, and everyday life
depend fundamentally on data for decisions. Larger and larger amounts of data are
being acquired in the hope of their usefulness, taking advantage of continuing progress
in information technology.
While data may contain obvious information (i.e., well-understood ’content’), very often
one also assumes they contain implicit or even hidden facts about the phenomena ob-
served; understanding these hitherto unknown facts is highly desired. The research field
that most directly addresses this interest is Data Mining, or Exploratory Data Analysis.
Two approaches are in common use for extracting new information from data: One
is statistical analysis, the other is data perceptualisation, i.e., making data properties
perceptible to the human senses; and many existing software tools combine both: from
statistics programs like Excel and SPSS, science and engineering environments like
MATLAB and Mathematica, to a host of special-purpose tools for specific domains of science
or economy.
For scientists, perceptualisation of data is of vital interest; it is almost exclusively ap-
proached by visual means for a combination of reasons1. Visualisation tools have per-
meated scientific cultures to the point of being invisible; many scientists are well-versed
in tools that visualize their results, and rarely do scientists question how accurately and
adequately visual representations represent the data content. Many Virtual Reality sys-
tems, such as the CAVE (Cruz-Neira et al. (1992)) and others, claim scientific data
exploration as one of their stronger usage scenarios. Nevertheless, sound often seems to
be added to such systems only as an afterthought, usually with the intention to achieve
better ’immersion’ and emotional engagement (sometimes even alluding to cinema-like
effects as the inspiration for the approach intended).
Sonification, the representation of data by acoustic means, is a potentially useful al-
ternative and complement to visual approaches that has not reached the same level of
acceptance. This is the starting point for the research agenda described here: To create
an interdisciplinary research setting where scientists from different domains (’domain
scientists’) and specialists in artistic audio design and programming (’sound experts’)
work together on auditory representations (’sonification designs’) for specific scientific
data sets and their context. Such a venture should be well positioned to contribute to
the progress of sonification as a scientific discipline. This has been the guiding strategy
for the research project SonEnvir, described in some detail in section 4.1.
The thesis presented here analyses sonification design work done within the SonEnvir
project2. From these designs, it abstracts a general model for approaching sonification
1 Availability, traditions of scientific cultures, ease of publishing on paper, and many others.
2 These analyses follow the notion of providing ’rich context’, taken from Science Studies (see e.g.
design work, from the general Sonification Design Space Map to detailed models of
synthesis, spatialisation, and user interaction, presented in chapter 5. This abstraction
process is based on Grounded Theory (Glaser and Strauss (1967)), aiming to design
flexible theoretical models that capture and explain as much detail as possible of the ob-
servation data collected. Such an integrative approach appears to be the most promising
way forward for sonification as a research discipline.
Finally, it should be noted that scientists are not the only social group that is interested
in the role of data for modern societies: Artists have always taken part in the general
discourse in society, and in recent years, media artists as well as musicians and sound
artists have become interested in creating works of art that represent data in artistically
interesting ways. This aspect certainly played a role in my personal motivation for this
dissertation project.
1.2 Scope
While multimodal display systems are extremely interesting for data exploration, the
complexity of interactions between modalities and individual differences in perception is
considerable. Therefore, the research work in this thesis has been intentionally limited
to audio-centric data representation; however, simple forms of visual representations and
haptic interaction have been provided where it seemed appropriate and helpful.
Abstract representations of data by auditory means are not at all well understood yet;
thus providing collections of different approaches for discussion may well be fruitful
for the community. Special importance has been given to design methodology, and to
considering the human-computer interaction loop; ranging from interaction in the design
process to interactive choices and control in a realtime sonification design.
Sonification designs may be intended for several different uses, with different aims. To
give a few examples:
Presentation entails clear, straightforward auditory demonstration of finished results;
this may be useful in conference lectures, science talks, and similar situations.
Exploration is all about interaction with the data, ’acquiring a feeling for one’s data’;
this must necessarily remain informal, as it is a heuristic for generating hypotheses, which
will be cross-checked and verified later with every analysis tool available.
Analysis requires well-understood, reliable tools for detecting specific phenomena; ac-
cepted by the conventions in the scientific domain they belong to.
In Pedagogy, different students may learn to understand structures/patterns in data
better when presented in different modalities; the auditory approach may be more ap-
Latour and Woolgar (1986); Rheinberger (2006)).
propriate and useful for some cases, e.g. people with visual impairments.
This thesis focuses on studying the viability of exploration and analysis of scientific data
by means of sonification; thus we (meaning the author and the SonEnvir team) developed
exemplary cases in close collaboration with the domain scientists, implemented sonifica-
tion designs for these cases, and analysed them to understand their general usefulness.
We built a software framework to support the efficient realisation of these sonification
designs; this is reported on in section 3.5.1, and available as open-source code here3. The
sonification design prototypes developed are also accessible online4 and can be re-used
both as concepts and as fully functional code. Note that the SonEnvir software environ-
ment is not a complete ’big system’, but a flexible, extensible collection of approaches,
and the infrastructure needed to support them.
This software environment is freely extensible by others (being open source), and it aims
to shorten development times for Auditory Display design sketches, thus allowing for
freely moving between discussion and fast redesign. It also supports Auditory Display
design pedagogy, as well as other uses, such as artistic projects involving data-related
control of sound and image processes.
1.3 Methodology
The methodology employed in the SonEnvir project is centered on interdisciplinary col-
laboration - domain scientists bring current questions and related data from their research
context, and learn the basic concepts of sonification and auditory perception. The ques-
tions are addressed with sonification design prototypes which are refined in iterative
steps; common understanding and patience while learning are key to eventual success.
This concept was condensed into an experimental setting of the interdisciplinary work
process: The Science By Ear workshop brought together international sonification ex-
perts, mostly Austrian domain scientists, and audio programming specialists to work on
sonification designs in a very controlled setting, within very short time frames. This
workshop has been received very favorably by the participants, and is reported on in
section 4.2.
The methodology of the thesis is based on Grounded Theory5 (Glaser and Strauss
(1967), see also section 5.1): By looking at a body of sonification designs, and analysing
their context, design approaches and decisions, a general, practice-based model is ab-
stracted: the Sonification Design Space Map (SDSM). Aspects of this model that
3 https://svn.sonenvir.at/svnroot/SonEnvir/trunk/src/Framework/
4 https://svn.sonenvir.at/svnroot/SonEnvir/trunk/src/Prototypes/
5 In sociology, Grounded Theory is used inductively to create new hypotheses from observations or
data collected with few pre-assumptions; this is in contrast to formulating hypotheses a priori and
testing them by experiments.
warrant further detail are given: models for synthesis approaches, spatialisation, and
user/task/interaction.
The sonification designs analysed stem from the following sources:
• Work with SonEnvir domain scientists
• The Science By Ear workshop
• Submissions to the ICAD 2006 concert
1.4 Overview of this thesis
Chapter 2, Psychoacoustics, Perception, Cognition, and Action, provides the necessary
background, covering in more detail the psychoacoustics and auditory cognition literature
that is directly relevant to sonification design work, rather than giving a general overview
of the psychoacoustics literature.
Chapter 3, Sonification Systems, provides an introduction to sonification and its history,
and covers some current systems that support sonification design work. The software
system implemented for the SonEnvir project is described here from a more general
perspective.
Chapter 4, Sonification and Interdisciplinary Research, provides further details on the
interdisciplinary nature of sonification research; here, the research design of the SonEnvir
project, and two activities within it, namely the Science By Ear workshop and the ICAD
2006 Concert, are described.
Chapter 5, General Sonification Models, is the main contribution to sonification theory
in this thesis. It describes a general model for sonification design work, divided into
several aspects: Overall design decisions and strategies are covered by the Sonification
Design Space Map (SDSM); appropriate synthesis approaches are covered in the Synthe-
sis model; user interaction is covered in the User Interaction model; and spatial aspects
of sonification design are covered in the Spatialisation model.
Chapters 6, 7, 8, and 9 present example sonification designs from the four domain
sciences in SonEnvir, chapter 10 presents designs for two datasets explored in the Science
By Ear workshop, and chapter 11 discusses and compares four works from the ICAD
2006 concert. This is the main practical and analytic contribution in this thesis. These
chapters describe much of the body of sonification designs created within the SonEnvir
project, as well as some others; this body of designs provided the background material
for creating the General Sonification Models.
Chapter 12, Conclusions, positions the scope of work presented within the wider context
of sonification research, and concludes the insights gained.
Chapter 2
Psychoacoustics, Perception, Cognition, and
Interaction
2.1 Psychoacoustics
Psychoacoustics is a branch of psychophysics, the psychological discipline which studies
the relationship between (objective) physical stimuli and their subjective perception by
human beings; psychoacoustics then studies acoustic stimuli and their auditory percep-
tion. Consequently, much of its literature is mainly concerned with the physiological
base of auditory perception, i.e., finding out how perception works by creating stimuli
that force the auditory system into specific interpretations of what it hears.
When considering the stimuli used in traditional psychoacoustics experiments as a world
of sounds, this world has an extremely reduced vocabulary. Of course this reduction
makes perfect sense for experiments which try to clarify how (especially lower level,
more physiological) perceptual mechanisms (assumed to be hard-coded in the ’neural
hardware’) work, but the knowledge thus acquired is often only indirectly useful for
sonification design work.
A number of works are considered major references for the field: For psychoacoustics
in general, Psychoacoustics - Facts and Models (Zwicker and Fastl (1999)) is very
comprehensive; a good introductory textbook that is also accessible for non-specialists
is An Introduction to the Psychology of Hearing (Moore (2004)); Bregman thoroughly
studies the organisation of auditory perception in more complex (and thus nearer to
everyday life) situations in Auditory Scene Analysis (Bregman (1990)); for the spatial
aspects of human hearing, the standard reference is Spatial Hearing (Blauert (1997)).
The typical background of psychoacoustics research is speech, spatial hearing, and music;
sonification is fundamentally different from all of these, possibly with the exception of
conceptual similarity to experimental strands of electronic music.
The main concepts in these sources which are relevant for sonification research are:
Just Noticeable Differences (JNDs) for different audible properties of sounds (and
consequently, the corresponding synthesis parameters) have been studied exten-
sively; being aware of these helps to make sure that differences in synthetic sounds
will be noticeable by users with normal hearing.
Masking Effects can occur when sonifications produce dense streams of events; under-
standing how these depend on properties of the individual events is important to
avoid perceptually ’losing’ information in the soundscape created by a sonification
design.
Auditory Stream Formation and its rules are essential for multiple stream sonifica-
tion; here it is important to control whether streams will tend to perceptually
segregate or fuse into merged percepts.
Testing Methodology can be employed to verify that sonification users are physically
able to perceive the sensory differences of interest. In effect, this entails writing
auditory tests for sonification designs, such that designers can test that they can
hear the differentiation they are aiming for, and that users can acquire analytical
listening skills from well-controlled examples.
Cognitive and Memory Limits determine how we understand common musical struc-
tures, and in fact, much music intended to be ’accessible’ is created (unknowingly)
conforming to these limits. Sonification design issues, from choices of time scalings
to user interface options for quick repetitions, choosing segments to listen
to, and others, also crucially depend on these limits.
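As an illustration of how JND awareness can enter design code, the sketch below checks whether two frequencies produced by a parameter mapping are likely to be discriminable. The threshold values are rough rules of thumb only (on the order of 3 Hz below 500 Hz, and well under one percent of the frequency above); real JNDs depend on level, duration, timbre, and the individual listener, so the function and its constants are illustrative assumptions, not measured data.

```python
def freq_jnd(f):
    """Rough rule-of-thumb estimate of the just noticeable difference
    in frequency (Hz): about 3 Hz below 500 Hz, ~0.6 % of f above.
    Illustrative values only; real JNDs vary with level and duration."""
    return max(3.0, 0.006 * f)

def distinguishable(f1, f2):
    """Conservative check: are two pure tones likely to be heard as
    different in pitch by a listener with normal hearing?"""
    return abs(f1 - f2) >= freq_jnd(min(f1, f2))

# Two data values mapped only 4 Hz apart are discriminable at 200 Hz,
# but fall below the JND at 2 kHz:
print(distinguishable(200.0, 204.0))    # True
print(distinguishable(2000.0, 2004.0))  # False
```

A sonification designer can run such a check over all pairs of mapped values to verify that the chosen data-to-frequency scaling keeps differences of interest above threshold.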
More recent research assumes the perspective of Ecological Psychoacoustics (Neuhoff
(2004)), which takes into account that in daily life, hearing usually deals with complex
environments of sounds, and thus allows for considering sonification designs from the
perspective of ecologies of sounds that coexist.
However, in a way sonification research and design work addresses a problem that is
inverse to what psychoacoustics studies: rather than asking how we perceive existing
worlds of sounds, the question in sonification is, how can we create a world of sounds
that can communicate ’meaning’ by aggregates of finely differentiated streams of sound
events?
Bob Snyder actually addresses this inverse problem (i.e., how to create worlds of sounds
that can communicate meaning) directly, if for a more traditional purpose: Music and
Memory (Snyder (2000)) is a textbook for teaching composition to non-musicians in a
perceptually deeply informed way, in a course Snyder gives at the Art Institute of Chicago.
He describes how limitations of perception and memory influence artistic choices, and
explains and demonstrates these with examples from a very wide range of musical cul-
tures and traditions, almost entirely without traditional (Western) music notation. This
is intended to give musicians/composers informed free choice to stay within these limi-
tations (and be ’accessible’), or approach and transgress them intentionally. By covering
a wide range of psychoacoustics and auditory perception literature from the perspective
of art practice, and describing it in terms accessible for art students, many of whom
do not have traditional musical or scientific training, Snyder has created a very useful
resource for practicing sonification designers who are willing to learn more about creating
perceptually informed (and artistically interesting) worlds of sounds.
2.2 Auditory perception and memory
This section is a brief summary of the first part of Music and Memory, to provide enough
background for readers to follow auditory perception-related arguments made later.
Figure 2.1 shows a symbolic representation of the current models of both bottom-up
and top-down perceptual processes. Bottom-up processes begin with sound exciting the
eardrums, which gets translated into firing patterns of a large number of auditory nerves
(ca. 30,000) coming from the ears. For a short time, a ’raw’ representation of the
sound just heard remains in echoic memory. This raw signal is being held available for
many concurrent feature extraction processes: these processes can include rather low-
level aspects (which are almost certainly built into the neural ’hardware’) like ascribing
sound components coming from the same direction to the same sound source, but also
higher-level aspects like a surprising harmonic modulation in a piece of music (which is
certainly culturally learned).
The extracted features are then integrated into higher level percepts, often in several
stages; in this process of abstraction, finer details are discarded, e.g. pitches in a
musical context are categorised into a familiar tuning system, and nuances in rhythm
and articulation usually also fade quickly from memory, unless one makes a special effort
to retain them.
Feature extraction interacts very strongly with long term memory: personal auditory
experience determines what is in long term memory, so for any listener, the extracted
features will unconsciously activate related memory content, which may or may not
become conscious. Note that unconsciously activated memories feed back into the
feature extraction processes, potentially priming the perceptual binding that happens
toward specific cultural or personal notions.
Short term memory (STM) is the only conscious part in figure 2.1: perceptual awareness
of what one is hearing now, as well as the few related memories that become activated
enough are the only results of perception one becomes consciously aware of. Short term
memory content can be rehearsed, and thus kept in working memory for a while, which
increases its chance of being committed to long term memory eventually. On average,
Figure 2.1: Some aspects of auditory memory, from Snyder (2000), p.6. The connections
shown are only a momentary configuration of the perceptual system, and will continuously
change quite rapidly.
short term auditory memory can keep several seconds of sound around. This depends on
’chunking’: generally, it is assumed that one can keep 7 ± 2 items in working memory
at any moment; however, one can and does increase this number by forming groups of
multiple items, which are then treated by memory as single (bigger) items (again with a
limit of ca. 7 applying).
The longer the auditory structures one tries to keep in memory, the more this depends
on abstraction; i.e. forming categories, simplifying detail, and grouping into higher level
items. This imposes a limit that is relevant for sonification contexts: comparing a
hard-to-categorise structural shape that only becomes recognisable over two minutes
with a potentially similar two-minute episode heard an hour later is very difficult.
Generally, while bottom-up processes (usually equated with perception) are assumed
to be properties of the human neural system, and thus quite universal for all people
with normal hearing, top-down processes (often equated with cognition) are more
personal: they depend on cultural learning and are informed by individual experience,
and thus can vary much more between individuals.
2.3 Cognition, action, and embodiment
A closer connection to sonification research, as well as some useful terminology, can be
found in Music Cognition research:
Recent work, e.g. by Marc Leman (Leman and Camurri (2006), and Leman (2006)),
defines terminology that works well for describing what sonification can achieve. Leman
talks of proximal and distal cues: Proximal (near) cues refer to the perceptually relevant
features of auditory events, i.e. the audible properties ’on the surface’ of a sound event;
by contrast, distal (further away) cues are actions inferred by the listeners that are likely
to have caused the proximal cues. One example of distal actions would be a musician’s
physical actions; and a little further away, a performer’s likely intentions behind her
actions would also be considered distal cues.
In recent years, Cognition research has widely moved away from the traditionally abstract
notion of ’cognitive’ (meaning only dealing with symbols, and thus easy to model by
computation); today the idea is widely accepted that cognition is deeply intertwined with
the body, resulting in the concept of Embodied Cognition (see e.g. Anderson (2003));
applying this idea to auditory cognition, Leman says that the perception of gesture
in music involves the whole body (of the performer and the listener). Music listeners
who engage with listening may spontaneously express this by moving along with the
music; when asked in experimental settings to make movements that correspond to the
music they are listening to, even musically untrained listeners can be remarkably good
at imitating performer gestures.
Appropriating this terminology and applying it to sonification, one can describe soni-
fication elegantly in these terms: sound design decisions inform details of the created
streams of sound, i.e. they determine the proximal cues; ideally, these design decisions
lead to perceptual entities (’auditory gestalts’), which can create a sensation of plausible
distal cues behind the proximal cues. In case of success, these distal cues, which arise
within the listener’s perception, create an ’implied sense’ in the sounds presented (which
could be called the ’sonificate’); thus these distal cues are likely to be closely related to
’data meaning’ (the equivalent to performers’ gestures, which are commonly taken to
correspond closely to their intentions).
In reflecting on his research on design of experimental electronic music instruments,
David Wessel argues that the equivalent of the ’babbling phase’ (of small infants) is
really essential for electronic music instruments: free-form, purpose-free interaction with
the full possibility space of an instrument allows for more efficient and meaningful learning
of what the instrument is capable of; just like children learn the phonetic possibilities of
their vocal tract by (seemingly random) exploration (Wessel (2006)).
He cites a classic experiment by Held and Hein, where two kittens are acquiring visual
perception skills in very different ways: one kitten can move about the space, while
the other kitten gets the same visual stimuli, but does not make its own choices of
where to move - instead, it has the moving kitten’s choices imposed on it. This second
kitten sustained considerable perceptual impairments. Wessel argues that the role of
sensory-motor engagement is essential in auditory learning, but not well understood yet;
he suggests designing electro-acoustic musical instruments such that they allow for the
described forms of interaction by providing ’control intimacy’, in short low-latency, high-
resolution, multichannel control data from performer gestures. This strategy should
create a long term chance of arriving at the equivalent of virtuosity on (or at least
mastery of) that instrument.
Transposed to the context of sonification for scientific data, this is in full agreement
with an Embodied Cognition perspective, and is another strong argument for allowing as
much ’user’ interaction with sonification tools as possible: from haptic interfaces used
e.g. for dynamic selection of data subsets, to access for tuning sound design parameters,
to fully accessible code that defines how a particular sonification design operates.
2.4 Perception, perceptualisation and interaction
Perception of the physical world is intuitively non-modal and unified: events in the world
are synchronous, so sensory input from different modalities is too1; many multimodal
data exploration projects use virtual environments so that they can provide integrated
visual, auditory and haptic modes for perception and interaction. The argument that
learning is strongly dependent on sensory-motor involvement has found its way into HCI
research literature; here, the common term is ’closing the human-computer interaction
loop’ (see e.g. Dix (1996)).
1 One interesting exception here is far away events that are both visible and audible; the puzzling
difference between the speeds of sound and light led to the first measurements of the speed of sound.
In the context of sonification research, this has led to a special conference series, the
Interactive Sonification workshops (ISon)2, so far held at Bielefeld (2004) and York
(2007). In a special issue of IEEE Multimedia resulting from ISon2004, the editors
emphasize that learning how to ’play’ a sonification design with physical actions, in
fact similar to a musical instrument, really helps for an enactive understanding of both
the nature of the perceptualisation processes involved and of the data under study
(Hermann and Hunt (2005)). They find that there is a lack of research on how learning
in interactive contexts takes place; obviously this applies equally to interactive visual
display applications.
2.5 Mapping, mixing and matching metaphors
Mapping data dimensions to representation parameters always involves choices. Walker
and Kramer (1996) report interesting experiments on this topic: They play through a
number of different permutations of mappings of the same data to the same set of display
parameters, rated by the designers as ’intuitive’, ’okay’, ’bad’, and random, and they
test how well users accomplished defined tasks with them. Expert assumptions turned
out to be not as accurate as they expected; users could learn quite arbitrary mappings
nearly as well as supposedly more ’natural ones’3.
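The permutation setup described above is straightforward to reproduce in code: the sketch below enumerates every one-to-one mapping between a set of data dimensions and a set of display parameters. The dimension and parameter names are hypothetical placeholders, not the ones Walker and Kramer actually used.

```python
from itertools import permutations

# Hypothetical data dimensions and sound parameters for illustration;
# the sets used in the actual experiments differ.
data_dims = ["temperature", "pressure", "rate"]
sound_params = ["pitch", "loudness", "tempo"]

# Every one-to-one assignment of data dimensions to sound parameters:
mappings = [dict(zip(data_dims, perm)) for perm in permutations(sound_params)]

for m in mappings:
    print(m)
# 3 dimensions yield 3! = 6 candidate mappings to rate and test
print(len(mappings))  # 6
```

Each resulting dictionary is one candidate mapping that could then be rated by designers and tested on users, as in the experiments reported.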
Whether this also holds true for exploratory contexts, when there is no pre-defined goal to
be achieved, is an open question. Here, performance in an easy-to-measure (but trivial)
task is not a very interesting criterion for sonification designs. On the other hand,
it is of course good design to reduce cognitive load while users are involved in data
exploration (by using cognitively simple mappings). For visualisation systems designed
for exploration, the idea of measuring ’insight’ and the number of hypotheses formed in
the exploration process has been suggested recently (Saraiya et al. (2005)); as far as we
know this evaluation strategy has not been applied to exploratory sonification yet.
In de Campo et al. (2004), we make the case that the impression of perceiving the
sources of representations (in Leman’s terms, the distal cues) becomes easier when the
metaphorical distance between the data dimension and the audible representation appears
smaller; i.e., when a reasonably similar concept in the world of sound was found for the
data property to be communicated. For example, almost all time-series data can be
treated as if they were acoustic waveforms, which is what ’audification’ essentially does.
With more complex data, the option of accessing data subsets by interactive choice,
browsing through the data space with different auditory perspectives, can potentially
allow forming new hypotheses on the data.
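A minimal audification along these lines can be sketched with the Python standard library: the data series is normalised and written directly as a mono waveform, so that playback at audio rate time-compresses the series. The function name, sample rate, and the stand-in sine series are all illustrative choices, not part of any particular sonification system.

```python
import math
import struct
import wave

def audify(samples, path, rate=44100):
    """Write a data series directly as an audio waveform (audification).
    The series is normalised to full scale and stored as 16-bit mono;
    playing N points back at `rate` samples per second time-compresses
    the series into N/rate seconds of sound."""
    peak = max(abs(min(samples)), abs(max(samples))) or 1.0
    frames = b"".join(
        struct.pack("<h", int(32767 * s / peak)) for s in samples)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(frames)

# Illustrative stand-in for measured data: one second of a sine series
# that becomes a 440 Hz tone when treated as a waveform.
series = [math.sin(2 * math.pi * 440 * t / 44100) for t in range(44100)]
audify(series, "audified.wav")
```

For real measurement data, the only choice to make is the playback rate, which determines the time compression factor and thus which data frequencies land in the audible range.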
2 http://interactive-sonification.org/
3 This paper was republished in a recent special issue of ACM Transactions on Applied Perception
on Sonification, with a new commentary (Walker and Kramer (2005a,b)).
Chapter 3
Sonification Systems
In a certain Chinese Encyclopedia, the Celestial Emporium of Benevolent
Knowledge, ”it is written that animals are divided into: (a) Those that
belong to the Emperor, (b) embalmed ones, (c) those that are trained,
(d) suckling pigs, (e) mermaids, (f) fabulous ones, (g) stray dogs, (h)
those included in the present classification, (i) those that tremble as if
they were mad, (j) innumerable ones, (k) those drawn with a very fine
camelhair brush, (l) others, (m) those that have just broken a flower
vase, (n) those that from a long way off look like flies.”
in Jorge Luis Borges - The Analytical Language of John Wilkins
Borges (1980)
Perceptualisation of scientific data by visualisation has been extremely successful. It is
by now completely established scientific practice, and a wide variety of visualisation tools
exist for a wide range of applications. Given the different set of perceptual strengths
of audition compared to vision, sonification has long been considered to have similar
potential as an exploratory tool for scientists, complementary to visualisation
and statistics.
One strategy to realize more of this potential of sonification is to create a general software
environment that supports fast development of sonification designs for a wide range of
scientific applications, a design process in close interaction with scientific users, and
simple exchange of fully functional sonification designs. This is the central idea of the
SonEnvir project, as described in detail (in advance of the project itself) in de Campo
et al. (2004).
There are a number of software packages for sonification and auditory display (Ben-Tal
et al. (2002); Pauletto and Hunt (2004); Walker and Cothran (2003), and others), all of
which make different choices: whether they are to be used as toolkits to integrate into
applications, or whether they are full applications already; which data formats or real-
time input modalities are supported; what sonification models are assumed (sometimes
implicitly); and what kinds of interaction modes are possible and provided.
This chapter provides a very short overview of the history of sonification, and describes
the most common uses of sonification. Then, some historical and current sonification
toolkits and environments are described, and the main types of audio and music program-
ming environments. Finally, the system developed for the present thesis is described.
3.1 Background
3.1.1 A short history of sonification
The prehistory and early history of sonification is covered very interestingly (within a very
good general overview) in Gregory Kramer’s Introduction to Auditory Display (Kramer
(1994a)).
Employing auditory perception for scientific research was not always as unusual as it is
considered in today’s visually dominated scientific cultures; in fact, sonification can be
said to have had a number of precursors:
In medicine, the practice of auscultation, i.e., listening to the body’s internal sounds for
diagnostic purposes, seems to have been present in Hippocrates’ time (McKusick et al.
(1957)). This was long before Laennec, who is usually credited with the invention of the
stethoscope in 1819.
In engineering, mechanics tend to be very good at hearing which parts of a machine they
are familiar with are not functioning well; just consider how much good car mechanics
can tell just from listening to a running engine.
Moving on to technically mediated acoustic means of measurement, there is evidence
that Galileo Galilei employed listening for scientific purposes: Following Stillman Drake’s
biography of Galilei (Drake (1980)), it seems plausible that Galilei used auditory infor-
mation to verify the quadratic law of falling bodies (see figure 3.1). By running strings
across the plane at distances increasing according to the quadratic law (1, 4, 9, 16,
etc.), the ball running down the plane would ring the bells attached to the strings in a
regular rhythm. In a reconstruction of the experiment, Riess et al. (2005) found that
time measuring devices of the 17th century were likely too imprecise, while listening for
rhythmic precision works well and is thus more plausible to have been used.
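The regular rhythm follows directly from the kinematics: for constant acceleration a along the plane, the time to travel distance d from rest is t = sqrt(2d/a), so string distances d_n = n²·d₁ give arrival times t_n = n·t₁ and hence equal intervals between bell strikes. The incline angle and first-string distance in the sketch below are arbitrary assumptions chosen only for illustration.

```python
import math

angle_deg = 10.0                               # assumed incline angle
a = 9.81 * math.sin(math.radians(angle_deg))   # acceleration along the plane
d1 = 0.05                                      # assumed first-string distance (m)

# Strings at 1, 4, 9, 16, 25 times d1, following the quadratic law
times = [math.sqrt(2 * (n ** 2) * d1 / a) for n in range(1, 6)]
intervals = [later - earlier for earlier, later in zip(times, times[1:])]

# Every interval equals t_1, so the bells ring in a steady beat
print([round(t, 3) for t in times])
print([round(dt, 3) for dt in intervals])
```

Whatever values are assumed for the angle and d₁, the intervals come out identical, which is exactly the property a listener can verify by ear far more precisely than 17th-century clocks could.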
An early example of a technical device rendering an environment variable perceptible
which humans do not naturally perceive is the Geiger-Muller-Counter: Incidence of a
particle generated by radioactive decay on the detector causes an audible click; the
density of the irregular sequence of such clicks informs users instantly about changes in
radiation levels.
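The perceptual principle at work here can be sketched as a simulation: radioactive decay produces clicks as a Poisson process, so the inter-click gaps are exponentially distributed, and a change in radiation level changes the click density the listener hears. The rates, duration, and seed below are arbitrary illustrative values.

```python
import random

def click_times(rate_hz, duration_s, seed=1):
    """Simulate Geiger-counter-like click onsets: a Poisson process
    whose density tracks the radiation level (rate_hz)."""
    rng = random.Random(seed)
    t, onsets = 0.0, []
    while True:
        t += rng.expovariate(rate_hz)   # exponential inter-click gaps
        if t >= duration_s:
            return onsets
        onsets.append(t)

# Doubling the rate roughly doubles the click density the listener hears,
# even though the individual gaps remain irregular.
low = click_times(10.0, 5.0)
high = click_times(20.0, 5.0)
print(len(low), len(high))
```

The irregularity of the individual gaps is what makes the display robust: listeners track the overall density rather than any rhythm, so level changes are noticed without requiring attention to single events.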
Figure 3.1: Inclined plane for Galilei’s experiments on the law of falling bodies.
This device was rebuilt at the Istituto e Museo di Storia della Scienza in Florence.
© Photo Franca Principe, IMSS, Florence.
Sonar is another interesting case to consider: Passive Sonar, where one listens to un-
derwater sound to determine distances and directions of ships, has apparently been
experimented with by Leonardo da Vinci (Urick (1967), cited in Kramer (1994a)); in
Active Sonar, sound pulses are projected in order to penetrate visually opaque volumes of
water, listening to reflections to understand local topography, as well as moving objects
of interest, be they vessels, whales, or fish swarms.
In seismology, Speeth (1961) had subjects try to differentiate between seismograms of
natural earthquakes and artificial explosions by playing them back speeded up. While
subjects could classify the data very successfully, and rapidly (thanks to the speedup),
little use was made of this until Hayward (1994) and later Dombois (2001) revived the
practice and the discussion.
An interesting case of auditory proof of a long-standing hypothesis was reported in
Pereverzev et al. (1997): In the early 1960s, Josephson and Feynman had predicted
quantum oscillations between weakly coupled reservoirs of superfluid helium; 30 years
later, the effect was verified by listening to an amplified vibration sensor signal of these
mass-current oscillations (see also chapter 7).
One can say that the history of sonification research officially began with the first Inter-
national Conference for Auditory Display (ICAD) in 1992, organised by Gregory Kramer
to bring all the researchers working on related topics, but largely unaware of each other,
into one research community. The extended book version of the conference proceedings
(Kramer (1994b)) is considered the main founding document of this research domain,
and the yearly ICAD conferences are still the central event for researchers, generating
much of the body of sonification research literature.
In 1997, the ICAD board wrote a report for the NSF (National Science Foundation) on
the state of the art in sonification1; and more recently, a collection of seminal papers
mostly presented at ICADs between 1992 and 2004 appeared as a special issue of ACM
Transactions on Applied Perception (TAP, ACM (2004)), which shows the range and
quality of related research.
Many interesting applications of sonification for specific purposes have been made:
Fitch and Kramer (1994) showed that an auditory display of medical patients’ life signs
can be superior to visual displays; Gaver et al. (1991) found that monitoring a vir-
tual factory (ArKola) by acoustic means works remarkably well for keeping it operating
smoothly.
The connection between neural signals and audition has its own fascinating history, from
early neurophysiologists like Wedensky (1883) listening to nerve signals by telephone, to
current EEG sonifications like Baier et al. (2007); Hermann et al. (2006); Hinterberger
and Baier (2005); as well as musicians’ fascination with brainwaves, beginning with
Alvin Lucier’s Music for Solo Performer (1965), among many others. (See also the
ICAD concert 2004, described in section 4.3.)
The idea of listening for scientific insight keeps being rediscovered by researchers even
if they seem to be unaware of sonification research; e.g., what James Gimzewski calls
Sonocytology (Pelling et al. (2004), see also here2) is (in auditory display terminology)
a form of audification of signals recorded with an atomic force microscope used as a
vibration sensor.
There are also current uses in Astronomy by NASA (Candey et al. (2006)), where one of
the motivations given is providing better data accessibility for visually impaired scientists;
and at University of Iowa3, mainly dealing with electromagnetic signals.
1 http://icad.org/node/400
2 http://en.wikipedia.org/wiki/James_Gimzewski
3 http://www-pw.physics.uiowa.edu/space-audio/
Nevertheless, a large number of scientists still appear quite surprised when they hear of
the idea of employing sound to understand scientific data.
3.1.2 A taxonomy of intended sonification uses
Sonification designs may be intended for a wide range of different uses, with substantially
different aims4:
Presentation calls for clear, straightforward, auditory demonstration of finished results;
this may be useful in conference lectures, science talks, teaching contexts, and
other situations.
Exploration is very much about interaction with the data, ’acquiring a feeling for the
data’; while this seems a rather fuzzy target, and is in fact hard to measure, it is
actually indispensable and central. Following Rheinberger (2006), exploration must
necessarily remain informal; it is a heuristic for generating hypotheses - once they
appear on the epistemic horizon, they will be cross-checked and verified with every
analysis tool available. So generating some hypotheses that turn out to be wrong
eventually is not a problem at all; in the worst case, if too many hypotheses are
wrong, this can be an efficiency issue.
Analysis requires well-understood, reliable tools for detecting specific phenomena, which
are accepted by the conventions in the scientific domain they belong to. The prac-
tice of auscultation in medicine may be considered to belong into this category,
even though it only relies on physical means, with no electronic mediation. Also
the informal practice of listening to seismic recordings belongs here.
Monitoring addresses the variety of processes that benefit from continuous observation
by human observers, whether in industrial production, in medical contexts
like intensive care units, or in scientific experiments. Human auditory perception
habituates quickly to soundscapes with little change; any sudden change in the
soundscape, even of an unexpected nature, is easily noticed and enables the
observer to intervene if necessary.
Pedagogy - Different students may learn to understand structures/patterns in data
better when presented in different modalities; an auditory approach to presentation
may be more appropriate and useful in some cases. For example, students with
visual impairments may benefit from data representations with sound, as research
on auditory graphs shows (e.g. Harrar and Stockman (2007); Stockman et al.
(2005)).
4 Note that the points separated here may overlap; e.g. presentation and pedagogy certainly do.
Artistic Uses - Many works in sound art are sonification-based, whether they are sound-only
installations or, more generally, data-driven multimedia works. The recent
appearance of special topics issues like Leonardo Music Journal, Volume 16 (2006)
confirms this trend, as do sonification research activities at art institutions like the
Bern University of Arts5.
The intended uses for which a specific sonification system has been designed largely
determine the scope of its functionality, and its usefulness for different contexts.
3.2 Sonification toolkits, frameworks, applications
A number of sonification systems have been implemented and described since the 1980s.
They all differ in feature scope and limitations; some are historic, meaning they run
only on operating systems that are now obsolete, while others are in current use, and
thus alive and well; most of them are toolkits meant for integration into (usually
visualisation) applications. Few are really open and easily extensible; some are
specialised for very particular types of datasets. Current systems are given more space
here, as they are more interesting to compare with the system developed for this thesis.
3.2.1 Historic systems
The Porsonify toolkit (Madhyastha (1992)) was developed at a time when realtime syn-
thesis was still out of reach on affordable computers; thus Porsonify aimed to provide an
interface for the Sun Sparc’s audio device and two MIDI synthesizers. Behaviour defined
for a single sound event (usually triggered from a single data point) is formulated in sonic
widgets, which generate control commands for the respective sound device. Example
sonifications were created using data comparing living conditions of different U.S. cities
(cf. the accompanying CD to Kramer (1994b)), and multi-processor performance data.
The LISTEN toolkit (Wilson and Lodha (1996)) was written for SGI workstations, using
(alternatively) the internal sound chip, or external MIDI as sound rendering; it was
meant to be easy to integrate into existing visualisation software, which was done for
visualising geometric uncertainty of surface interpolants, and for algorithmic uncertainty
in fluid flow.
The Musical Data Sonification Toolkit, or MUSE (Lodha et al. (1997)), was a followup
project, aiming to map scientific data to musical sound. Also written for SGI, it uses
mapping to very traditional musical notions: timbres are traditional orchestra instruments
and vowel sounds generated with CSound instruments, rhythms come from a choice of
5 See http://www.hkb.bfh.ch/y.html
seven dance rhythms, pitch is defined from the major scale, following rules for melodic
shapes, and harmony is based on overtone ratios. It has been applied "to visualize [sic]
uncertainty in isosurfaces and volumetric data".
A later incarnation, MUSART (Musical Audio transfer function Real-time Toolkit, see
Joseph and Lodha (2002)) sonifies data by means of musical sound maps. It converts
data dimensions into ’audio transfer functions’, and renders these with CSound instru-
ments. Users can personalise their auditory displays by choosing which data dimensions
to map to which display parameters. In the article cited, the authors report uses for
exploring seismic volumes for the oil industry. Again, the authors emphasize their use of
musical concepts for sonification design.
While not a single software system, Auditory Information Design by Stephen Barrass
(Barrass (1997)) is a fascinating collection of multiple concepts (all with catchy names):
it encompasses a task-data analysis method ('TaDa'), a collection of use cases for finding
auditory metaphors for design ('ear-benders'), a set of design principles ('Hearsay'), a
perceptually linearised information sound space ('GreyMUMS'), and tools for designing
sonifications ('Personify'). The practical implementations described show a wide variety
of approaches; they all share a Unix flavor, often being shell scripts that connect
command-line programs. Thus it is not one consistent framework, but rather a collection
of how-to examples. For data treatment, mostly Perl scripts are used; for sound synthesis,
CSound, which at the time was non-realtime. Some examples also appeared in the CSound book
(Boulanger (2000)) mentioned below.
3.2.2 Current systems
xSonify (Candey et al. (2006)) has been developed at NASA; it is based on Java,
and runs as a web service6. It aims at making space physics data more easily accessible
to visually impaired people. Considering that it requires data to be in a special format,
and that it only features rather simplistic sonification approaches (here called modi), it
will likely only be used to play back NASA-prepared data and sonification designs.
SonART (Ben-Tal et al. (2002); Yeo et al. (2004)) is a framework for data sonification,
visualisation, and networked multimedia applications. In its latest incarnation, it is
intended to be cross-platform and uses OpenSoundControl for communication between
(potentially distributed) processes for synthesis, visualisation, and user interfaces.
The Sonification Sandbox (Walker and Cothran (2003)) has an intentionally limited range,
but it covers that range well: being written in Java, it is cross-platform; it generates
MIDI output, e.g. to any General MIDI synth (such as the internal synth on many
soundcards). One can import data from CSV text files, and view these as visual graphs;
a mapping editor lets users choose which data dimension to map to which sound parameter:
timbre (musical instruments), pitch (chromatic by default), amplitude, and (stereo)
panning. One can choose to hear an auditory reference grid (clicks) as context. It is
very useful for learning basic concepts of parameter mapping sonification with simple
data, and it may be sufficient for many auditory graph applications. Development is
still continuing, as the release of version 4 (and later small updates) in 2007 shows.
6 http://spdf.gsfc.nasa.gov/research/sonification
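The parameter-mapping approach that such tools embody can be illustrated with a brief sketch. The following Python fragment is purely illustrative (the data values, ranges, and function name are invented, and it is not code from any of the toolkits described): it maps one data channel linearly onto MIDI note numbers, which is essentially what a mapping editor configures.

```python
def linmap(value, in_min, in_max, out_min, out_max):
    """Linearly map value from [in_min, in_max] to [out_min, out_max]."""
    if in_max == in_min:
        return out_min
    norm = (value - in_min) / (in_max - in_min)
    return out_min + norm * (out_max - out_min)

# A toy data channel, e.g. one CSV column (invented values) ...
temperature = [12.1, 14.3, 18.0, 21.5, 19.2]
lo, hi = min(temperature), max(temperature)

# ... mapped to MIDI note numbers between 48 (C3) and 84 (C6),
# one note per data point, as a parameter-mapping sonification would do.
notes = [round(linmap(v, lo, hi, 48, 84)) for v in temperature]
```

A real mapping editor adds per-parameter choices (timbre, amplitude, panning) and scale quantisation on top of this linear core.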
The Sonification Integrable Flexible Toolkit (SIFT, see Bruce and Palmer (2005)) is
again a toolkit for integration into other applications, typically for visualisation. While
it is also written in Java and uses MIDI for sound rendering, it emphasizes realtime
data input support from network sources. It has been used for oceanographic data sets;
however, the paper cited describes the first prototype of this system, and no later versions
of it seem to have been developed.
Sandra Pauletto's toolkit for sonification (Pauletto and Hunt (2004)) is based on
PureData (see section 3.3 below), and has been used for several application domains:
electromyography data for physiotherapy (Hunt and Pauletto (2006)), helicopter flight
data, and others. While it supports some data types well, adapting it for new data is
rather cumbersome, mainly because PureData is not a general-purpose programming
language.
SoniPy is a very recent and quite ambitious project, written in the Python language,
and described in Worrall et al. (2007). It is still in the early stages of development at
this time, but may well become interesting. Being an open source project, it is hosted
at sourceforge7; when work on this thesis began, it did not yet exist.
All these toolkits and applications are limited in different ways, based on resources for
development available to their creators, and the applications envisioned for them. For
the broad parallel approach we had in mind, and the flexibility required for it, none
of these systems seemed entirely suitable, so we chose to build on a platform that is
both a very efficient realtime performance system for music and audio processing and a
full-featured modern programming language: SuperCollider3 (McCartney (2007)). To
provide some more background, here is an overview of the three main families of music
programming environments.
3.3 Music and sound programming environments
Computer music has involved programming to create sound and music structures
and processes for over fifty years now; current music and sound programming
environments offer many features that are directly useful for sonification purposes as
well.
Mainly, three big families of programs have evolved; most other music programming
7 http://sourceforge.net/projects/sonipy
systems are conceptually similar to one of them:
Offline synthesis - MusicN to CSound
MusicN languages started in 1957/58 with the Music I program developed at Bell Labs
by Max Mathews and others; Music IV (Mathews and Miller (1963)) already featured
many central concepts of computer music languages, such as the idea of a Unit Generator
as the building block for audio processes (unit generators can be e.g. oscillators, noises,
filters, delay lines, and envelopes). As the first widely used incarnation, Music V, was
written in FORTRAN and thus relatively easy to port to new computer architectures, it
spawned a large number of descendants.
The main strand of successors in this family is CSound, developed at MIT Media Lab
beginning in 1985 (Vercoe (1986)), which has been very popular in academic computer
music. Its main approach is to use very reduced language dialects for orchestra files (con-
sisting of descriptions of DSP processes called instruments), and score files (descriptions
of sequences of events that each call one specific instrument with specific parameters
at specific times). A large number of programs were developed as compositional front-
ends, to write score files based on algorithmic procedures, such as Cecilia (Piche and
Burton (1998)), Cmix, Common Lisp Music, and others; so CSound has in fact created
an ecosystem of surrounding software.
CSound has a very wide range of unit generators and thus synthesis possibilities, and a
strong community; e.g. the CSound Book (Boulanger (2000)) demonstrates its scope
impressively. However, for sonification, it has a few disadvantages: Even though it is text-
based, it uses specialised dialects for music, and thus is not a full-featured programming
language. Any control logic and domain-specific logic would have to be built in other
languages/applications, while CSound could provide a sound synthesis back-end. Being
originally designed for offline rendering, and not built for high-performance realtime
demands, it is not an ideal choice for realtime synthesis either. CSound has been ported
to very many platforms.
Graphical patching - Max/FTS to Max/MSP(/Jitter) to PD/GEM
The second big family of music software began with Miller Puckette’s work at IRCAM
on Max/FTS in the mid-1980s, which later evolved into Opcode Max, which eventually
became Cycling’74’s Max/MSP/Jitter environment. In the mid-1990s, Puckette began
developing an open source program called PureData (Pd), later extended with a graphics
system called GEM. All these programs share a metaphor of ’patching cables’, with
essentially static object allocation of both DSP and control graphs.
This approach was never meant to be a full programming language, but a simple facility
to allow for patching multiple DSP processes written in lower-level (and thus more
efficient) languages; with Max/FTS, the programs actually ran on a DSP card built by
IRCAM. Thus, the usual procedure for making patches for more complex ideas often
entails writing new Max or Pd objects in C; while these can run very efficiently if well
written, special expertise is required, and the development process is rather slow.
In terms of sound synthesis, Max/MSP has a much more limited palette than CSound,
though a range of user-written MSP objects exist; support for graphics with Jitter has
recently become very good. Both Max and Pd have a strong (and somewhat overlapping)
user base; Pd's is somewhat smaller, as it started later than Max. While Max is
commercial software with professional support by a company, Pd is open-source software.
Max runs on Mac OS X and Windows, but not on Linux, while Pd runs best on Linux,
reasonably well on Windows, and less smoothly on OS X.
Realtime text-based environments - SuperCollider, ChucK
The SuperCollider language and realtime system grew from the idea of having both
realtime synthesis and musical structure generation in one environment, using the same
language. Like Max/Pd, it can be said to be an indirect descendant of CSound. Since
SC1, written by James McCartney in 1996, it has gone through three complete rewriting
cycles, so the current version, SC3, is a very mature system. In version 2, SC2, it
inherited many of its language characteristics from Smalltalk; in SC3, the language and
the synthesis engine were split into a client/server architecture, and many syntax features
from other languages were adopted as options. Its sound synthesis is fully dynamic like
CSound's, it has been written for realtime use with scientific precision, and being a
text-based, modern, elegant, full programming language, it is a very flexible environment
for very many uses, including sonification.
The range of unit generators is quite wide, though not as abundant as in CSound;
synthesis in SC3 is very efficient. SC3 also provides a GUI system with a variety of
interface widgets, but its main emphasis is on stable realtime synthesis. SC3 has a
somewhat smaller user community, which is nevertheless quite active. Having become
open source with version 3, it has since flourished in terms of development activity. SC3
runs very well on OS X, pretty well on Linux, and less well on Windows (though the
SonEnvir team put some effort into improving the Windows port).
The ChucK language has been written by Ge Wang and Perry Cook, starting in 2002.
It is still under development, exploring specific notions such as being 'strongly timed'.
Like SC3, it is not really intended as a general-purpose language, but as a
music-specific environment. While it is cross-platform and has interfacing options
similar to SC3 and Max, it has a considerably smaller palette of unit generators.
One possible advantage of ChucK is its very fine-grained control over time; both
synthesis and control can have single-sample precision.
3.4 Design of a new system
As the existing systems did not have the scope we required, we designed our own. A full
description of the design of the Sonification Environment as it was before the SonEnvir
project started is given in de Campo et al. (2004); the following section is updated from
a post-project perspective.
3.4.1 Requirements of an ideal sonification environment
The main design aim is to allow fluid development of new sonification designs and
modification of existing ones. A modular software design decouples components like
basic data handling objects, data processing, sound synthesis processes, mappings used,
playback approaches, and real-time interaction possibilities, so that all the individual
aspects of one sonification design can be re-used as starting points for new designs.
A Sonification Environment should:
• Read data files in various formats. The minimum is human-readable text files for
small data sets, and binary data files for fast handling of large data sets. Reading
routines for special file formats should be writable quickly. Realtime input from
network sources should also be supported.
• Perform basic statistics on the data for user orientation. This includes (for every
data channel): minimum, maximum, average, standard deviation, and simple
histograms. This functionality should be user-extensible in a straightforward way.
• Provide basic playback facilities like ordered iteration (in effect, a play button with
a speed control), loop playback of user-chosen segments, zooming while playing,
data-controlled playback timing, and 2D and 3D navigation along user-chosen
data dimensions. Later on, navigation along data-derived dimensions such as
lower-dimensional projections of the data space is also desirable.
• Provide a rich choice of interaction possibilities: Graphical user interfaces, MIDI
controllers, graphics tablets, other human interaction devices, and tracking data
should be supported. (The central importance of interaction only became clear in
the course of the project.)
• Provide a variety of possible synthesis approaches, and allow for changing and
refining them while playing. (The initial design suggested a more static library of
synthesis processes, which turned out to be unnecessary.)
• Allow for programming domain-specific models to run and generate data to sonify.
This strongly suggests a full modern programming language. (This requirement
only came up in the course of the project, for the physics sonifications.)
• Store sonification designs in human-readable text format: This allows for long-
term platform independence of designs, provides possibilities for informal rapid
exchange (text is easy to send by e-mail), and can be an appropriate and useful
publication format for sonification designs that employ user interaction.
• Serve to build a library/database of high-quality sonification designs made in this
environment, with real research data coming from a diverse range of scientific
fields, developed in close collaboration with experts from these domains.
More generally, the implementation should be kept as lightweight, open, and flexible as
possible to accommodate evolving new understanding of the design issues involved.
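The per-channel orientation statistics required above are simple to state precisely. The following is a minimal, language-neutral sketch in Python (not part of the SC3 framework; the function name and signature are invented for illustration):

```python
import math

def channel_stats(values, n_bins=10):
    """Per-channel orientation statistics: minimum, maximum, average,
    standard deviation, and a simple equal-width histogram."""
    lo, hi = min(values), max(values)
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    # equal-width histogram between lo and hi; the top edge is inclusive
    counts = [0] * n_bins
    width = (hi - lo) / n_bins or 1
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return {'min': lo, 'max': hi, 'mean': mean,
            'stddev': math.sqrt(var), 'histogram': counts}

stats = channel_stats([1.0, 2.0, 2.0, 3.0, 4.0], n_bins=3)
```

User extension then amounts to adding further functions of this shape (median, percentiles, and so on).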
3.4.2 Platform choice
While PureData was a platform option for a while, we soon decided to stay entirely in
SuperCollider3, based on the list of requirements given above. This decision had some
benefits, as well as some drawbacks.
The benefits we experienced were:
• A fully open source programming language is easy to extend in ways that are useful
for a wider community;
• Interpreted languages like SC3 provide a relatively simple entry into programming
for users (starting with little scripts, and changing details for experimentation);
• Readability has turned out to be very useful, as the code script is also a full
technical documentation;
• An interactive development environment encourages code literacy, and thus general
competence, of sonification ’users’. In this context, the notion of Just In Time
Programming (as described e.g. in Rohrhuber et al. (2005)) has turned out to be
extremely useful for interdisciplinary team development sessions, see chapter 4.
The main drawback we encountered was that SC3 only runs really well on OS X,
somewhat less comfortably on Linux (which was not used by any of the team members),
while on Windows (which we had to support) it was initially quite unusable; this led to
SonEnvir taking on the task of substantially improving the Windows port.
3.5 SonEnvir software - Overall scope
The main goal of the SonEnvir sonification framework is to allow for the creation of
meaningful and effective sonifications more easily. Such a sonification environment
supports sonification designers by providing software components, and concepts for using
them. It combines all the important aspects that need to be considered: data represen-
tation, interaction, mapping and rendering.
A famous phrase about computer music programming systems is that they are kitchens,
not restaurants, which also applies to SonEnvir: rather than giving users a menu of
finished dishes to choose from (which other people created), it provides ingredients,
utensils, recipes and examples.
3.5.1 Software framework
SuperCollider3 has a very elegant extension system; one can assemble components to be
published in different ways: Classes, their respective Help files, UnitGenerator plugins,
and all kinds of support files can be combined into packages which can be downloaded,
installed, and de-installed directly from within SC3. Such packages are called Quarks.
Currently, most of the code created in the project is under version control with Subversion
at the SonEnvir website8. In order to achieve maximum reuse, some parts have been
converted into Quarks, while for others, this is still in progress. Many items of general
usefulness have already been migrated directly into the main SC3 distribution. The
sonification-specific components will remain available at the SonEnvir website, as will
the collection of sonification designs. (For an overview, see the end of this section.)
The subsequent sections briefly describe the overall structure of the framework and
the design and implementation of the data representation module. For reference, the
framework structure in the subversion repository is described in appendix A.
3.5.2 Framework structure
The SonEnvir framework implements a generic sonification model consisting of four
aspects:
Data model The data model unifies the notions of how data are handled in the frame-
work and deals with the diversity of data types that can be used for sonification.
User-Interaction model This aspect deals with interactive models for the exploration
and analysis of data. It is mainly implemented in the JInT package
(see below).
Synthesis model The mapping onto properties of sound, or the creation of more
complex sound structures by a sonification model. As all the needed code
infrastructure existed in the JITLib library within SC3, this aspect is not coded as
classes, but exists only as a conceptual model, described in section 5.3, Synthesis Models.
8 https://svn.sonenvir.at/svnroot/SonEnvir/trunk/src/
Spatialisation model This model takes care of the audio rendering of the designed
sonification for different requirements and playback environments. It is described
in detail in section 5.5, Spatialisation Model. Its code components reside partially
in SC3 itself, in the Framework/Rendering folder, and in the AmbIEM package9,
which is now a SuperCollider quark package.
All these models taken together allow for designing sonifications in a flexible way. As
the data model is the most implementation-related aspect, it is described in detail here,
and not in the more conceptual chapter on the general models (chapter 5).
3.5.3 The Data model
The aim of the data model is to provide a unified representation of different types of
data that can be used in the sonification framework. This demands a highly flexible and
abstract model as data may have very different structures. The data model also provides
functionality for input/output in the original form the data are supplied in, and includes
various statistical functions for data analysis.
All models are object-oriented in design, and the classes and their inter-relations are
described using UML (Unified Modelling Language) charts. In order to avoid possible
name-space conflicts with other class definitions on any target platform, the classes in
the SonEnvir framework have a prefix 'SE'. Figure 3.2 illustrates the design of the data
model in a UML graph.
The SEData class is central to the design of the data model. It is the highest abstraction
of any kind of dataset to be sonified. Besides providing properties for the name and
the data source, it organises the actual data in channels. An SEData object contains
instances of SEDataChannel, which is the base class for all different types of data
channels and represents a single dimension in a dataset. Data channels can be numerical
data, but also any sort of nominal data with the only restriction that they are organised
as a sequence and addressable by index.
SENumDataChan specifies that the data values in the given channel are all numbers,
and provides a basic set of numerical properties of this set of numbers. Besides the usual
minimum, maximum, mean, and standard deviation values, it also implements functions
that proved to be useful for sonifications, such as removing offsets or a drift, as well as
normalising and ’whitening’ the numbers.
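The channel utilities just mentioned can be stated precisely; the following Python fragment is a language-neutral sketch of what such utilities typically do, not the actual SENumDataChan implementation (all function names are invented for illustration):

```python
def normalize(values):
    """Rescale a channel to the range [0, 1], a common preprocessing step
    before mapping onto a sound parameter."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero for constant channels
    return [(v - lo) / span for v in values]

def whiten(values):
    """Remove the offset (mean) and scale to unit standard deviation."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5 or 1
    return [(v - mean) / std for v in values]

def remove_drift(values):
    """Subtract a straight-line trend fitted between first and last sample."""
    n = len(values)
    if n < 2:
        return list(values)
    step = (values[-1] - values[0]) / (n - 1)
    return [v - (values[0] + i * step) for i, v in enumerate(values)]
```

Such transformations keep the perceptually mapped range of a channel predictable regardless of the raw data's units.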
9 AmbIEM is a port of a subset of a system by Musil et al. (2005); Noisternig et al. (2003).
Figure 3.2: UML diagram of the data model.
Another important subclass of the numeric data channel covers all time-based data
channels. These are basically of two types: time series (SETimeSeriesCh), providing
a sample rate, and data with time stamps (SETimeStampsCh). Although vector-based
data are basically numeric as well, we decided to introduce another basic type for them,
with a subclass for 3D spatial data. Any of the data channel types mentioned
above may be combined in order to form a dataset described through SEData. For
convenience, there are two predefined classes derived from SEData that cover some
common combinations of data channels: SETimeData and SESpatialData.
Every SEData instance is associated with an SEDataSource. This class abstracts the
access to the raw data material. It takes care that the space required for big datasets is
made available when needed, and uses different parsers for reading different file formats.
If needed, it can be extended to include network resources and real-time data. Each
SEDataSource also provides information about the type of each data series contained
in the raw data. This information might be available from the headers of some data
formats; otherwise it has to be set explicitly, so that SEData can create the appropriate
SEDataChannels.
Like the entire framework, the data model is provided as a class library for SuperCollider3.
Once the library is installed, it is compiled at startup of the SuperCollider3
language. The following listing illustrates the use of SEData objects in SC3:
// Example listing of data model usage in SC3.
(
// read an ascii data file
~vectors = FileReader.readInterpret(
"~/data/C179_T_s.dat",
true, true
);
// supply data channel names by hand
~chanNames = ['temperature', 'solvent',
'specificHeat', 'marker'];
// make an SEData object
~phaseData = SEData.fromVect(
'phaseData',
~chanNames,
~vectors,
SENumDataChan // all numerical data, so use SENumDataChan class.
);
// provide simple statistics
~phaseData.analyse;
~phaseData.means.postln;
~phaseData.stdDevs.postln;
)
Chapter 4
Project Background
A physicist, a chemist, and a computer scientist try to go up a hill in an
ancient car. The car crawls, stutters, and then stalls. The physicist says,
"The transmission ratio is wrong - I'll take a look at it."; the chemist
says, "No, the fuel mix is wrong, I'll experiment with it."; the computer
scientist says, "Why don't we all get out, close the doors, get back in,
and try again?"
This chapter describes the research design and working methodology developed
within the SonEnvir project, the design and process of the workshop 'Science By
Ear' that the project team held in March 2006, and the concert the team organised for
the ICAD 2006 conference in London. As most of the work presented in this dissertation
was done within the context of the SonEnvir project, it is helpful to provide some
background on that context here.
4.1 The SonEnvir project
The central concept of the SonEnvir project was to create an interdisciplinary setting in
which scientists from different domains and sonification researchers could learn how to
work on data perceptualisation by auditory means. The project took place from January
2005 to March 2007, and it was the first collaboration of all four universities in Graz.
SonEnvir was funded by the Future Funds of the Province of Styria.
4.1.1 Partner institutions and people
The project brought together the following institutions as partners:
• the Institute of Electronic Music and Acoustics (IEM), at the University of Music
and Dramatic Arts Graz;
• the Theoretical Physics Group - Institute of Physics, at the University of Graz;
• the Institute for Sociology, at the University of Graz;
• the University Clinic for Neurology, at the Medical University Graz;
• and the Signal Processing and Speech Communication Laboratory SPSC, at the
University of Technology Graz.
The IEM was the host institution coordinating the project, and the source of audio design
and programming as well as sonification expertise in the project. The main researcher
here was the author of this dissertation.
From the Institute of Sociology, Christian Daye provided data from a variety of sociologi-
cal contexts, and co-designed and experimented with sonifications for them, as discussed
in section 6. He was also responsible for feedback and evaluation of the interdisciplinary
work process from the perspective of sociology of science.
The Physics group had changing members in the course of the project: initially Bianka
Sengl provided data from quantum physics research, namely from competing Constituent
Quark models, as discussed in section 7.1 and appendix C.1. Later on, Katharina Vogt
worked on a number of different physics topics and sonifications for them, including the
Ising and Potts models discussed in section 7.2.
The Signal Processing and Speech Communication Laboratory was represented by Chris-
topher Frauenberger, who worked on a number of different sonification experiments,
among others on propagation of electromagnetic waves, and time series classification,
as discussed in section 8. He also contributed substantially to the code implementa-
tions, and has become the main developer for the python-based Windows version of
SuperCollider3.
For the Institute of Neurology, Annette Wallisch was the main researcher. She provided
a variety of EEG data for experimenting with sonification designs for screening and
monitoring, as described in section 9. She also dealt with an industry research partner,
the company BEST medical systems (Vienna), and she wrote a dissertation (Wallisch
(2007), in German) on the research done within SonEnvir.
4.1.2 Project flow
In order to create a broad base of sonification designs for a wide range of data from
the scientific contexts described, the project was structured in three iterations. Each
iteration began with identifying potentially interesting research questions from the
domains, and collecting example data for them. Sonification designs were then created
and tested, a process that became more collaborative and experimental as the
project proceeded.
In each of the scientific fields, we started by building simple sonification designs to begin
the discussion process. The key challenge turned out to be learning how to work
in such a highly interdisciplinary group, how to build bridges of common understanding,
and how to develop a common language for collaboration.
We focused on building sonification designs that demonstrate the usefulness of sonifi-
cation by showing practical benefit for the respective scientific field. Identifying good
research questions at this intermediate level of complexity was not trivial. Nevertheless,
being able to come up with sufficiently convincing examples to reach the immediate
partner ’audience’ is very important.
Finally, the project goal was to integrate all the approaches that worked well in one
context into a single software framework that includes all the software infrastructure,
thus making them re-usable for a wide range of applications; this was intended to result
in a meaningful contribution to the sonification community. The diversity of the research
group and their problem domains forced us toward very flexible and re-usable solutions.
By making our collection of implemented sonification designs freely accessible, we hope
to capture much of what we have learned in a form that other researchers can build on.
4.1.3 Publications
Many research results were published in conference and journal papers, which are indi-
cated in the respective chapters, and briefly listed here:
de Campo et al. (2004) was a project plan for SonEnvir before the fact. Papers on
sociological data (Daye et al. (2005)), quantum spectra (de Campo et al. (2005d)), and
the project in general (de Campo et al. (2005a)) were presented at ICAD and ICMC
2005. We wrote some papers with external collaborators, on electrical systems (Fickert
et al. (2006)), and various kinds of lattice data (de Campo et al. (2005c), de Campo
et al. (2006b), de Campo et al. (2006c), de Campo et al. (2005b)).
For ICAD 2006, we contributed an overview paper, de Campo et al. (2006a), and organ-
ised a concert of sonifications described in section 4.3, contributing a piece described in
de Campo and Daye (2006) and in section 11.3.
At ICAD 2007, we presented papers on EEG (de Campo et al. (2007)), time series
(Frauenberger et al. (2007)), Potts models (Vogt et al. (2007)), and on the Design
Space Map concept (de Campo (2007b)). At the ISon workshop in York 2007, we
presented work on juggling sonification (Bovermann et al. (2007)) and the Sonification
Design Space Map (de Campo (2007a)).
Some project results and insights in the sociological context were also presented in two
journal publications: Daye et al. (2006) and Daye and de Campo (2006).
4.2 Science By Ear - An interdisciplinary workshop
This workshop was in our opinion the most innovative experiment in methodology within
SonEnvir. Aiming to intensify the interdisciplinary work setting within SonEnvir, we
brought in both sonification experts and scientists from different domains to spend three
days working on sonification experiments. Considering participant responses (both during
and after the event), this workshop was very successful. Detailed background is available
online here1.
4.2.1 Workshop design
We chose the participants to invite so they would form an ideal combination of com-
petences: Eight international sonification experts, eight domain scientists (mainly from
Austria), six audio specialists and programmers, and (partially overlapping with the
above) the SonEnvir team itself (see appendix D). This group of ca. 24-28 people was
just large enough to allow for different combinations for three days, but still small enough
to allow for good group cohesion.
The workshop program consisted of five short lectures by the sonification experts, which
served to inform less experienced domain scientists about sonification history, method-
ology, and psychoacoustics. This helped to bring all participants closer to a common
language. Most of the workshop time was spent in sonification design sessions. For
each day, three interdisciplinary teams were formed, composed of the three categories:
2-3 sonification experts, 2-3 domain scientists, 2 audio programmers, and 1 moderator
(a SonEnvir member).
These sessions typically lasted 2 hours, after which the group would report to the plenary
about their results. For the first two days, all three teams worked on the same problem
at the same time (in parallel), which allowed for good comparisons of design results. On
the last day, each group worked on a separate problem for two sessions, allowing for
more in-depth exploration of ideas.
4.2.2 Working methods
The design sessions focused on data submitted by the participating domain scientists;
the scientific domains included Electrical Power Systems, EEG Rhythms, Global Social
data, meteorology in the Alpine region, computational Ising models, Ultra-Wide-Band
communication, and research on biological materials called polysaccharides.
The parallel sessions began with a talk by the submitting domain specialist, introducing
the problem dataset to the plenary group. Then the group split into the three teams,
1 http://sonenvir.at/workshop/
and the teams began their parallel sessions. The typical sequence in a session was to
do some brainstorming first, to gather ideas about which sonification strategies might
be applicable. Once a few candidate ideas had emerged, experimentation began by coding
small sonification designs (some administrative code, such as data reading routines,
had been prepared beforehand).
Time tended to be rather short, so decisions about what to try first were often based on
what seemed doable within the limited time. Toward the end of a session, the group began
preparing what they would report to the plenary meeting. This usually consisted of little
demos of what the group had tried, many more ideas for experiments to do as follow-up
steps, and an informal evaluation of what the group felt they had learned.
On the final workshop day, spending two sessions on a topic was a welcome change.
Having more time to experiment, and especially taking a break and then continuing
work on a problem allowed for more sophisticated mini-realisations.
A wiki set up for the workshop allowed us to distribute the latest versions of information
materials, all the code examples written, and the notes taken during all sessions.
Furthermore, most sessions and discussions were recorded (audio, and some video) to
allow later analysis of the working process and the interactions taking place.
4.2.3 Evaluation
Many of the designs ended up being adapted in some form for later work in SonEnvir;
two that were not used elsewhere are described in section 10 for completeness.
Based on feedback given by the workshop participants, it can be considered a highly
successful experiment in methodology. Many participants commented very positively on
the innovative aspects of this workshop: Actually doing design work in an interdisciplinary
group setting rather than going through prepared examples was considered remarkable.
The major design tradeoff that was also discussed in the responses was how much time
to spend on each data problem: time pressure limited the eventual usefulness of the
designs that were created, so the alternative of working on far fewer data sets for
much longer may be worth trying - at the potential risk of having less comprehensive
overall scope.
Christian Daye conducted a qualitative and quantitative content analysis of the audio
recordings of the sessions, which confirmed the overall positive response (publication
still in progress), and he developed a number of guidelines for future similar events:
• Prepare and distribute basic literature on the domains well beforehand. In the
SBE workshop, there was sometimes a tendency for domain scientists to mainly
listen, thus leaving the sonification experts and programmers to do most
of the talking. From an interdisciplinary point of view, this is not ideal, as it does
not create equally shared understanding.

• Do more technical preparation together with the programmers beforehand. In
some sessions, problems came up with reading and handling data properly, which
made them less practical than intended.

• Have a scientist from the problem domain in every group. As the SBE workshop
covered a wide range of problems, this was not feasible in the parallel sessions.
This strategy would work well in combination with a more limited set of problems
to work on.
4.3 ICAD 2006 concert
While ICAD has been holding conferences since 1992, the first concert of
sonifications at an ICAD conference took place only in 2004.
4.3.1 Listening to the Mind Listening
For the ICAD conference in Sydney 2004, Stephen Barrass organised a concert of sonifi-
cations of brain activity, called Listening to the Mind Listening2. The concert call3 invited
participants to create sonifications of neural activity: a dataset was provided with five
minutes of multichannel EEG recording of a person listening to a piece of music. A jury
selected ten submissions for the concert which took place in the Sydney Opera House.
Even though the pieces were constrained to adhere to the time axis of the recording, the
diversity of the approaches taken and the variety in the sounding results were extremely
interesting. The pieces can be listened to here4, and the organisers published an analytical
paper in Leonardo Music Journal comparing all the entries in a number of different ways
(Barrass et al. (2006)).
The concert was a great success, so it seemed likely to become a regular event at ICAD.
4.3.2 Global Music - The World by Ear
In 2006 the author was invited to be Concert Chair for the ICAD conference in London.
Together with SonEnvir colleagues Christopher Frauenberger and Christian Daye, we
agreed that social data would be an interesting and accessible topic for a sonification
2 http://www.icad.org/websiteV2.0/Conferences/ICAD2004/concert.htm
3 http://www.icad.org/websiteV2.0/Conferences/ICAD2004/concert call.htm
4 http://www.icad.org/websiteV2.0/Conferences/ICAD2004/concert.htm
concert/competition, and we proceeded to collect and prepare social data of 190 nations
represented in the United Nations.
The concert call5 invited participants to contribute a sonification that illuminates aspects
of the social, political and economic circumstances represented in the data. The following
quote is the central part of the concert call.
Motivation
Werner Pirchner, Ein halbes Doppelalbum, 1973: ”The military costs every
person still alive roughly as much as half a kilogram of bread per day.”
Global data are ubiquitous - one finds them in every newspaper, and they
cover a range of themes, from global warming to increasing poverty, from
individual purchasing power to the ageing of the world’s population. Obvi-
ously these data are of a social nature: They describe specific aspects (e.g.
ecological or economic) of the environment in which societies exist, which
taken together determine culture, i.e. the way people live.
Rising awareness of these global interdependencies has led both to fear and
concerns (e.g. captured in the notion of the risk society, see Beck (1992);
Giddens (1990, 1999)), as well as hopes for eventual positive consequences
of globalisation. Along with developments like the scientisation of politics
(see Drori et al. (2003)), this growing understanding of global issues has re-
defined the context of the political discourse in modern societies: As modern
societies claim to steer their own course based on self-observation by means
of data, an information feedback loop is realised.
Alternative choices of data that are important to consider, which data should
be set in relation to each other, and a consideration of how to perceptualise
these data choices meaningfully can enrich this discourse.
Closing the feedback loop by informing society about its current state and
its development is a task that both scientists and artists have responded to,
and this is the key point of this call:
• You can contribute to the discourse by perceptualising aspects of world
societal developments,
• search for data that concern interesting questions, and devise strategies
for investigating them, and
• demonstrate that sound can communicate information in an accessible
way for the general public.
5 http://www.dcs.qmul.ac.uk/research/imc/icad2006/concertcall.php
The reference dataset of 190 countries included data ranging from commonly expected
dimensions, such as geographical data (capital location, area) and population, to basic
social indicators such as GDP, access to sanitation and drinking water, and life ex-
pectancy. An extended dataset included data on education (years in school for males
and females), illiteracy, housing situation, economic independence of males and females,
and others.
The call went on to specify the following constraints:
Using this reference dataset was mandatory: countries, capital locations, population,
and area data were to be used. Participants were strongly encouraged to extend this
dataset with more dimensions, and possible resources for such data extensions were
pointed out.
The concert sound system was to be a symmetrical ring of eight speakers, so any spa-
tialisation used in pieces should employ such a configuration.
Finally, participants had to provide a short paper that documents the context and back-
ground of their data choices and sonification design.
An international jury composed of sociologists, computer musicians/composers, and
sonification specialists wrote reviews rating the anonymous submissions, and eight pieces
were finally selected for the concert6.
Four of these pieces are described in more detail in section 11.
6 Papers and headphone-rendered mp3 files for all pieces are available at
http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/concert/index.html.
Chapter 5
General Sonification Models
A British Euro-joke tells of a meeting of officials from various countries who
listen to a British proposal, nodding sagely at its numerous benefits;
the French delegate stays silent until the end, then taps his pencil and
remarks: ”I can see that it will work in practice. But will it work in
theory?”
reported in Barnes (2007)
In this chapter, several models are proposed to allow better understanding of the main
aspects of sonification designs:
Sonification Design Space Map - General orientation in the design process
Synthesis Model - Considerations of and examples for synthesis approaches
User Interaction Model - Understanding sonification usage contexts and users’ goals
and tasks to be achieved
Spatialisation Model - Using spatial distribution of sound for sonification
Note the entangled nature of these aspects: splitting sonification designs into aspects is
only a simplification that is temporarily useful for grasping the concepts. Because of their
close connections, it will be necessary to cross-reference between sections. Generally,
because of these interdependencies the understanding of these sections will benefit from
re-reading.
5.1 The Sonification Design Space Map (SDSM)
5.1.1 Introduction
This section describes a systematic approach for reasoning about experimental sonifi-
cation designs for a given type of dataset. Starting from general data properties, the
approach recommends initial strategies, and lists possible refinements to consider in the
design process. An overview of the strategies included is presented as a mental (and
visual) map called the Sonification Design Space Map (SDSM), and the refinement steps
to consider correspond to movements on this map.
The main purpose of this approach is to extract ’theory’ from ’observation’ (in our case,
of design practice), similar to Grounded Theory in sociology (Glaser and Strauss (1967)):
to make implicit knowledge (often found in ad hoc design decisions which sonification
experts consider ’natural’) explicit and thus available for reflection, discussion, learning,
and application in design work.
This approach is mainly the result of studying design sessions which took place in the
interdisciplinary sonification workshop ’Science By Ear’, described in detail in section
4.2.
In order to explain the concept in practice as well, a set of workshop sessions on one
simple dataset is analysed here in the terms proposed; in the chapters on implemented
designs, many more of these are described in detail using SDSM terms.
5.1.2 Background
When collaborations on sonification for a new field of application start, sonification
researchers may know little about the new domain, its common types of data, and
its interesting research questions; similarly, domain scientists may know little about
sonification, its general possibilities, and its possible benefits for them. In such early
phases of collaboration, the task to be achieved with a single particular sonification is
often difficult to define clearly, so it makes sense to employ an exploratory strategy which
allows for mutual learning and exchange. Eventually, the interesting tasks to achieve
become clearer in the process. Note that even when revisiting familiar domains, it is
good methodological practice to start with as few implicit assumptions as possible, and
to introduce concepts from domain knowledge later, transparently and explicitly, in the
course of the design process.
Rheinberger (2006) describes that researchers deal with ’epistemic things’, which are by
definition vague at first (they can be e.g. physical objects, concepts or procedures whose
usefulness is only slowly becoming clear); they choose ’experimental setups’ (ensembles
of epistemic things and established tools, devices, procedures), which allow for endless
repetitions of experiments with minimal variations. The differential results gained from
this exhaustion of a chosen area in the possibility space can allow for new insights. Then,
an experimental setup can collapse into an established device or practice, and become
part of a later experimental setup.
From this perspective, sonification designs start their lifecycle as epistemic things, which
need to be refined under usage; they may in time become part of experimental setups,
and if successful, eventually ’disappear’ into the background of a scientific culture as
established tools.
Some working definitions
The objects or ’content’ to be perceptualised can be well-known information, or new
unknown data (or shades of gray in between). The aims for these two applications are
very different: for information, establishing easy-to-grasp analogies is central; for data,
the aim is to enable the perceptual emergence of latent phenomena of unforeseeable type.
As working terminology for the context here, we propose to define the following
three terms:
Auditory Display is the rendering of data and/or information into sound designed for
human listening. This is the most general, all-encompassing term (even though the term
’display’ has a visual undertone to it).
We further propose to differentiate between two subspecies of Auditory Displays:
Auditory Information Display is the rendering of well-understood information into
sound designed for communication to human beings. It includes speech messages such
as in airports and train stations, auditory feedback sounds on computers, alarms and
warning systems, etc.
Sonification or Data Sonification is the rendering of (typically scientific) data into
(typically non-speech) sound designed for human auditory perception. The informational
value of the rendering is often unknown beforehand, particularly in data exploration.
The model described here focuses on Data Sonification in the narrower sense.
These definitions are quite close to the current state of the evolving terminology. In
the International Encyclopedia of Ergonomics and Human Factors, Walker and Kramer
(2006) define the terms quite similarly:
”Auditory display is a generic term including all intentional, nonspeech
audio that is designed to transmit information between a system and a user.
...
Sonification is the use of nonspeech audio to present data. Specifically,
sonification is the transformation of data relations into auditory relations,
for the purpose of studying and interpreting the data.”
Common sonification strategies
The literature usually classifies sonification approaches into Audification and Parame-
ter Mapping (Kramer (1994b)), and Model-Based Sonification (Hermann (2002)). For
the context here, we prefer to differentiate the categories more sharply, which will be-
come clear along the way; so, our three most common approaches are: Sonification (or
generally, perceptualisation) by Continuous Data Representation, Discrete Point Data
Representation, and Model-Based Data Representation.
Continuous Data Representation treats data as quasi-analog continuous signals, and
relies on two preconditions: equal distances along at least one dimension, typically time
and/or space; and a sufficient (spatial or temporal) sampling rate, so that the signal is
free of sampling artifacts, and interpolation between data points is smooth. Both simple
audification and parameter mapping involving continuous sounds belong in this category.
Its advantages include: subjective perceptual smoothness; interpolation can make the
sampling interval (which is an observation artifact) disappear; perception of continuous
shapes (curves) can be appropriate; audition is very good at structures in time; mapping
data time to listening time is metaphorically very close and thus easy to understand.
Its drawbacks include: it is often tied to linear movement along one axis only; and events
present in the data (e.g. global state changes in a system) may be difficult to represent
well.
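As an illustration of the continuous approach, the following Python sketch resamples a data series into a smooth control-rate pitch trajectory, interpolating linearly between data points so that the sampling grid recedes perceptually. This is an invented example (function names, ranges, and rates are arbitrary), not code from the SonEnvir framework.

```python
# Sketch: continuous parameter mapping of a data series onto a pitch
# trajectory. All names and value ranges are illustrative choices.

def map_range(x, in_lo, in_hi, out_lo, out_hi):
    """Linearly rescale x from [in_lo, in_hi] to [out_lo, out_hi]."""
    return out_lo + (x - in_lo) * (out_hi - out_lo) / (in_hi - in_lo)

def continuous_pitch_curve(data, f_lo=200.0, f_hi=800.0, ctrl_rate=100, duration=3.0):
    """Resample 'data' to a control-rate frequency curve over 'duration'
    seconds, interpolating linearly between data points."""
    lo, hi = min(data), max(data)
    n_ctrl = int(ctrl_rate * duration)
    curve = []
    for i in range(n_ctrl):
        pos = i * (len(data) - 1) / (n_ctrl - 1)  # fractional data position
        j = int(pos)
        frac = pos - j
        if j >= len(data) - 1:
            value = data[-1]
        else:
            value = data[j] * (1 - frac) + data[j + 1] * frac
        curve.append(map_range(value, lo, hi, f_lo, f_hi))
    return curve

freqs = continuous_pitch_curve([0.0, 1.0, 0.5, 0.25, 1.0])
```

Such a curve would then modulate a synthesis parameter continuously; the interpolation is precisely what lets the observation grid disappear from the percept.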
Discrete Point Data Representation creates individual events for every data point.
Here, one can easily arrange the data in different orders, choose subsets based on special
criteria (e.g. based on navigation input), and special conditions, when they arise, can
be expressed well.
Its advantages include: more flexibility, e.g. subset selections of changeable sizes, based
on changeable criteria, and random iterations over the chosen subsets; and the lack of
illusion of continuity may be more accurate to the data.
Its drawbacks include: attention may be drawn to data-independent display parameters,
such as a fixed grain repetition rate; at higher event rates, interactions between
overlapping sound events may occur, such as phase cancellations.
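A minimal sketch of the discrete approach, in the same illustrative Python style: each data point becomes one event record, and subset selection by arbitrary criteria becomes a one-line filter. All field names and parameter ranges are invented for the example.

```python
# Sketch: discrete point representation -- one short sound event per data
# point. The event fields are illustrative; an actual design would hand
# such records to a synthesis engine.

def make_events(points, total_dur=3.0, f_lo=200.0, f_hi=800.0):
    """Turn each (x, y) data point into an event: onset from x, pitch from y."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    events = []
    for x, y in points:
        onset = total_dur * (x - x_lo) / (x_hi - x_lo)
        freq = f_lo + (f_hi - f_lo) * (y - y_lo) / (y_hi - y_lo)
        events.append({"onset": onset, "freq": freq, "dur": 0.05})
    return events

def select_subset(events, predicate):
    """Discrete representation makes conditional subset selection trivial."""
    return [e for e in events if predicate(e)]

events = make_events([(0, 0.2), (1, 0.9), (2, 0.5), (3, 0.7)])
high = select_subset(events, lambda e: e["freq"] > 500)
```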
Model-Based Data Representation employs more complex mediation between data
and sound rendering by introducing a model, whose properties are informed by the data.
Its advantages include: apart from data properties, more domain knowledge can be
captured and employed in the model; and models may be applicable to datasets from a
variety of contexts, as is commonly aimed for in Data Mining.
Its drawbacks include: assumptions built into models may introduce bias leading away
from understanding the domain at hand; there may be a sense of disconnection be-
tween data and sounding representations; higher complexity of model metaphors may be
difficult to understand and interpret.
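To illustrate the mediation idea behind model-based representation, the following sketch lets data values tune the modes of a hypothetical resonator, whose impulse response is then rendered. This is a deliberately minimal invented example, not any published model-based sonification design.

```python
import math

# Sketch: data values tune one modal frequency each; the 'sonification' is
# the impulse response of the resulting resonator bank (a sum of damped
# sinusoids). All parameter values are illustrative.

def modal_response(data, sr=8000, dur=0.5, f_lo=220.0, f_hi=880.0, decay=6.0):
    """Map each data value to a modal frequency and render the resonator
    bank's impulse response."""
    lo, hi = min(data), max(data)
    span = (hi - lo) or 1.0
    freqs = [f_lo + (f_hi - f_lo) * (v - lo) / span for v in data]
    n = int(sr * dur)
    out = []
    for i in range(n):
        t = i / sr
        env = math.exp(-decay * t)  # shared exponential decay
        out.append(env * sum(math.sin(2 * math.pi * f * t) for f in freqs) / len(freqs))
    return out, freqs

signal, modes = modal_response([0.1, 0.4, 0.9])
```

The point of the sketch is the indirection: the data shape the model, and one listens to the model's behaviour rather than to the data directly.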
5.1.3 The Sonification Design Space Map
Task/Data Analysis (Barrass (1997)) focuses on solving well-defined auditory informa-
tion design problems: How to design an Auditory Display for a specific task, based on
systematic descriptions of the task and the data. Here, the phenomena to be perceptu-
alised are known beforehand, and one tries to render them as clearly as possible.
The Sonification Design Space Map given here addresses a similar but different problem:
The aim to be achieved here is to find transformations that let structures/patterns in
the data (which are not known beforehand) emerge as perceptual entities in the sound
which jump to the foreground, i.e. as identifiable ’interesting audible objects’; these are
closely related to ’sound objects’ in the electronic music field (from ’objets sonores’, see
Schaeffer (1997)), and in psychoacoustics literature, ’auditory gestalts’ (e.g. Williams
(1994)).
In other words, the most general task in data sonification designs for exploratory pur-
poses is to detect auditory gestalts in the acoustic representation, which one assumes
correspond to any patterns and structures in the data one wants to find.
SDS Map axes
To facilitate this search for the unknown, the Design Space Map enables a designer,
researcher, or artist to engage in systematic reasoning about applying different sonifica-
tion strategies to his/her task or problem, based on data dimensionality and perceptual
concepts.
Especially while the task is not yet clearly understood and defined (which is often the
case in exploratory contexts), reasoning about data aspects, and making well-informed
initial choices based on perceptual givens can help to develop a clearer formulation of
useful tasks.
So, the proposed map of the Sonification Design Space (see figure 5.1) has these axes:
X-axis: the number of data points estimated to be involved in forming one gestalt, or
’expected gestalt size’;
Y-axis: the number of data dimensions of interest, i.e. to be represented in the current
sonification design;
Z-axis: the number of auditory streams to be employed for data representation.
Figure 5.1: The Sonification Design Space Map
The overlapping zones are fuzzy areas where different sonification approaches apply; the arrows
on the right refer to movements on the map, which correspond to design iterations. For detailed
explanations see sections 5.1.3 and 5.1.4.
To ensure that the auditory gestalts of interest will be easily perceptible, the most
fundamental design decision is the time scale: In auditory gestalts (or sound objects)
of 100 milliseconds and less it becomes more and more difficult to discern meaningful
detail, while following a single gestalt for longer than say 30 seconds is nearly impossible,
or at least takes enormous concentration; thus, a reasonable rule of thumb for single
gestalts is to time-scale their rendering into the duration of echoic memory and short
term memory, i.e. on the order of 1-3 seconds (Snyder (2000)). Sounds up to this
duration can be kept in working memory with much detail information, keeping all the
nuances and inflections while more perceptual processing goes on. This time frame can
be called ’echoic memory time frame’. The ’expected gestalt size’ is the number of data
points (of the dataset under study) that should be represented within this time frame to
allow for perception of individual gestalts at this data subset size.
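The rule of thumb above amounts to simple arithmetic, sketched here for concreteness (the numbers are invented):

```python
# Back-of-envelope helpers for the time-scale decision: given an 'expected
# gestalt size' and the echoic memory time frame, what playback rate does
# that imply, and how long does the whole dataset take at that rate?

def playback_rate(gestalt_size, time_frame=3.0):
    """Data points per second needed to fit one gestalt into the time frame."""
    return gestalt_size / time_frame

def total_duration(n_points, gestalt_size, time_frame=3.0):
    """Listening time for a whole dataset at that rate."""
    return n_points / playback_rate(gestalt_size, time_frame)

# e.g. 1500 data points, heard as gestalts of 300 points each:
rate = playback_rate(300)
full = total_duration(1500, 300)
```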
Note that the three-second time frame does not impose a limit on the number of data
points represented: as a deep exploration of the world of Microsound (Roads (2002))
shows, clouds of short sound events can happen at very high densities in the micro-time
scale; in fact this is a fascinating area for creating sound that is rich in perceptual detail
and artistic possibilities.
SDS Map zones
The zones shown in figure 5.1 do not have hard borders; their extents are only meant
to give an indication of how close (and thus how meaningfully applicable) the various
strategies are for a given ’gestalt size’ and number of dimensions. Similarly, the number ranges
given below are only approximate orders of magnitude, and mainly based on personal
experience both in electronic music and sonification research.
The Discrete-Point zone ranges roughly from gestalt size 1 - 1000 and from dimensions
number 1 - 20; the transitions shown in the map from note-like percepts via textures to
granular events which merge into clouds of sound particles are mainly perceptual.
The Continuous zone ranges roughly from gestalt size 10 - 100,000 and from dimensions
number 1 - 20; the main transition here is between parameter mapping and audification,
with various technical choices indicated along the way, such as using the continuous data
signal as a modulation source, band splitting it, and/or applying filtering to it.
The Model-Based zone ranges roughly from gestalt size 10 - 50,000 and from dimensions
number 8 - 128; because the approach is so varied and flexible, there are no further
orientation points in it yet. Existing varieties of model-based approaches are still to be
analysed in the terms of this Sonification Design Space, and can eventually be integrated
in appropriate locations on the map.
All these zones apply mainly for single auditory streams; generally, when multiple streams
are used in a sonification design, the individual streams can and should use fewer dimen-
sions. In fact, using multiple streams is the main strategy for reducing the number of
dimensions while keeping the overall density of presentation constant.
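The zone extents quoted above can be encoded directly, giving a crude lookup of which strategies lie near a given map point. This is an illustrative sketch only: the hard range checks stand in for what the map treats as fuzzy borders.

```python
# Sketch: the approximate zone extents of figure 5.1 as a lookup table.
# The ranges are the orders of magnitude quoted in the text.

ZONES = {
    "discrete-point": {"gestalt": (1, 1000),    "dims": (1, 20)},
    "continuous":     {"gestalt": (10, 100000), "dims": (1, 20)},
    "model-based":    {"gestalt": (10, 50000),  "dims": (8, 128)},
}

def applicable_strategies(gestalt_size, n_dims):
    """Return zone names whose (fuzzy) extents contain the given map point."""
    hits = []
    for name, z in ZONES.items():
        (g_lo, g_hi), (d_lo, d_hi) = z["gestalt"], z["dims"]
        if g_lo <= gestalt_size <= g_hi and d_lo <= n_dims <= d_hi:
            hits.append(name)
    return hits

# A 500-point gestalt with 12 dimensions lies in all three fuzzy zones:
zones_here = applicable_strategies(500, 12)
```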
5.1.4 Refinement by moving on the map
In the evolution of a sonification design, all intermediate incarnations can be conceptu-
alised easily as locations on the map, based on how many data points are rendered into
the basic time interval, how many data dimensions are being used in the representation,
and how many perceptual streams are in use. A step from one version to the next can
then be considered analogous to a movement on the map. This mental model aims to
capture the design processes we could observe in concentrated form in the Science by
Ear workshop (’SBE’, described in detail in section 4.2), and in extended form in the
development work in the main strands of the SonEnvir project.
Data anchor
For exploring a dataset, one can start by putting a reference point on the map, which we
call Data Anchor: This is a point on the map corresponding to the full number of data
points and data dimensions. A first synopsis, or more properly Synakusis, of the entire
dataset (within the echoic memory time frame of ca. 3 seconds) can then be created with
one of the nearest sonification strategies on the map. Subsequent sonification designs
and sketches will typically correspond to a movement down from this point, i.e. toward
using fewer dimensions at a time, and to the left, toward listening to fewer than the
total number of data points in the echoic memory time frame. Of course one can still
listen to the entire dataset; the total presentation time will simply become longer.
Shift arrows
Shift arrows, as shown in figure 5.1 on the right hand side, allow for moving one’s current
’working position’ on the Design Space Map, in effect deploying different sonification
strategies in the exploration process. Note that some shifting operations are used for
’zooming’, and leave the original data untouched, while others employ (temporary) data
reduction, extension, and transformation; in any sonification design one develops, it
is essential to differentiate between these kinds of transformations and document the
steps taken clearly. Finally, one can decide to defer such decisions and turn them into
interaction possibilities, so that e.g. subsets are selected interactively.
A left-shifting arrow can be used to reduce the ’expected gestalt size’, in effect using
fewer data points within the echoic memory time frame. Some options are: investigat-
ing smaller, user-chosen data point subsets (this can be by means of interaction, e.g.
’tapping’ on a data region and hearing that subset); downsampling; choosing subsets by
appropriate random functions; and other forms of data preprocessing.
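Two of these left-shifting options can be sketched in a few lines (illustrative Python; the random choice is seeded for reproducibility). Both leave the original data untouched.

```python
import random

# Sketch of two left-shifting operations: plain downsampling, and random
# subset selection that preserves the original ordering.

def downsample(data, factor):
    """Keep every factor-th data point."""
    return data[::factor]

def random_subset(data, k, seed=0):
    """Draw k distinct points at random (seeded), in original order."""
    rng = random.Random(seed)
    idx = sorted(rng.sample(range(len(data)), k))
    return [data[i] for i in idx]

data = list(range(100))
smaller = downsample(data, 10)   # 10 points instead of 100
sample = random_subset(data, 5)
```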
A down-shifting arrow can be used to reduce the ’dimensions number’, i.e. to employ fewer
data properties (or dimensions) in the presentation. Some options are: dimensionality
reduction by preprocessing (e.g. statistical approaches like Principal Component Analysis
(PCA), or using locality-preserving space-filling curves in higher-dimensional spaces, e.g.
Hilbert curves); and user-chosen data property subsets, keeping the option to explore
others later. Model-based sonification concepts may also involve dimensionality reduction
techniques, yet they are in principle quite different from mapping-based approaches.1
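As a stand-in for heavier dimensionality-reduction methods such as PCA, the following sketch of a down-shifting move simply keeps the k most-varying dimensions; the criterion and all names are invented for the example.

```python
# Sketch: down-shifting by keeping only the k data dimensions with the
# highest variance -- a crude proxy for statistical reduction methods.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def top_variance_dims(rows, k):
    """rows: equal-length records. Returns indices of the k most-varying
    dimensions, in descending order of variance."""
    n_dims = len(rows[0])
    cols = [[r[i] for r in rows] for i in range(n_dims)]
    ranked = sorted(range(n_dims), key=lambda i: variance(cols[i]), reverse=True)
    return ranked[:k]

rows = [(1.0, 10.0, 5.0), (1.0, 20.0, 5.5), (1.0, 30.0, 4.5)]
keep = top_variance_dims(rows, 2)   # dimension 0 is constant, so it is dropped
```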
An up-shifting arrow can be used to increase the number of dimensions used in the
sonification design; e.g. for better discrimination of components in mixed signals, or
to increase ’contrast’ by emphasizing aspects with relevance-based weighting. Some
options are: splitting time series data into frequency bands to increase detail
resolution; extracting the amplitude envelope of a time series and using it to accentuate
its dynamic range2; and other domain-specific forms of preprocessing appropriate for
adding secondary data dimensions to be used in the sonification design.
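One of these up-shifting options, extracting an amplitude envelope as a secondary dimension, can be sketched with a sliding RMS window. The window type and size are illustrative choices; any smoothing would serve.

```python
# Sketch: derive a secondary data dimension (a moving amplitude envelope)
# from a time series, via a sliding root-mean-square window.

def rms_envelope(signal, window=4):
    """Moving RMS of 'signal', one envelope value per sample
    (the window is truncated at the start of the series)."""
    env = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - window + 1): i + 1]
        env.append((sum(x * x for x in chunk) / len(chunk)) ** 0.5)
    return env

sig = [0.0, 1.0, -1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
env = rms_envelope(sig)
```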
1 Thomas Hermann, personal communication, Jan 2007.
2 Whether such transformations happen in the data preprocessing stage or in the audio DSP implementation of a sonification design makes no difference to the conceptual reasoning process.
A right-shifting arrow can be used to increase the number of data points used, which
can help to reduce representation artifacts. Some options are: interpolation of signal
shape between data points; repetition of data segments (e.g. granular synthesis with
slower-moving windows); local waveset audification (see section 5.3); and model-based
sonification strategies that create e.g. physical vibrational models, whose state
may be represented in larger secondary datasets informed by comparatively few original
data points.
Interpolation in time-series data is often employed habitually without further notice; the
model proposed here strongly suggests notating this transformation as a right-shifting
arrow. If one is certain that the sampling rate used was sufficient, using cubic (or better)
interpolation instead of the actually measured steps creates a smoother signal which is
nearer to the phenomenon measured than the sampled values. When such a smoothed
signal is used for modulating an audible synthesis parameter, the potentially distracting
presence of the time step unit should be less apparent.
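As a concrete instance of such a right-shifting move, cubic interpolation between measured points can be sketched as follows. This is an illustrative Python fragment using the Catmull-Rom form (one common choice of cubic, chosen here as an assumption; the text does not prescribe a particular cubic).

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Cubic (Catmull-Rom) interpolation between p1 and p2, t in [0, 1]."""
    return 0.5 * (
        2 * p1
        + (p2 - p0) * t
        + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t * t
        + (3 * p1 - p0 - 3 * p2 + p3) * t * t * t
    )

def upsample(series, factor):
    """Insert factor - 1 interpolated values between measured points:
    a right-shifting arrow - more data points, smoother signal."""
    out = []
    n = len(series)
    for i in range(n - 1):
        p0 = series[max(i - 1, 0)]       # clamp at the edges
        p1, p2 = series[i], series[i + 1]
        p3 = series[min(i + 2, n - 1)]
        for k in range(factor):
            out.append(catmull_rom(p0, p1, p2, p3, k / factor))
    out.append(series[-1])
    return out
```

The interpolated curve passes exactly through the measured values, while the intermediate points smooth away the sample-and-hold steps that would otherwise be audible.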
Z axis shifts
So far, all arrows have concerned movement in the front plane of the map, where only a
single auditory stream is used for data representation. After the time scale, the number
of streams is the second most fundamental perceptual design decision. By presenting
some data dimensions in parallel auditory streams (especially data dimensions of the
same type, such as time-series of EEG measurements for multiple electrodes), overall
display dimensionality can be increased in a straightforward way, while dimensionality
in each individual stream can be lowered substantially, thus making each single stream
easier to perceive. (The equivalent movement is difficult to represent well visually on
a 2D map, but easy to imagine in 3D space. Figure 5.2 shows a rotated view.) For
multiple streams, all previous arrow movements apply as above, and two more arrows
become available:
An inward arrow can be used to increase the number of parallel streams in the represen-
tation. Some options are: multichannel audio presentation; and setting one perceptual
dimension of the parallel streams to fixed values with large enough differences to cause
stream separation, thus in effect labelling the streams.
An outward arrow can be used to decrease the number of parallel streams in the repre-
sentation. Some options are: selecting fewer streams to listen to; intentionally allowing
for perceptual merging of streams.
Experimenting with different numbers of auditory streams can be very interesting, as
multiple perspectives on the same data ’content’ may well contribute to more intuitive
understanding of the dataset under study. Figure 5.2 shows the range of hypothetical
variants of a sonification design for a dataset with 16 dimensions; the graph plane is at
Figure 5.2: SDS Map for designs with varying numbers of streams.
Hypothetical variants of a sonification design for a dataset with 16 dimensions; see text.
an expected gestalt size of 100 data points, and the axes shown are Y (number of data
properties mapped) and Z (number of auditory streams employed). Different designs
might employ, for example, one stream with 16 mapped parameters, 2 streams with 8, 4
streams with 4, 8 with 2 and 16 streams with a single parameter. Of course, depending
on the character of the data dimensions, other, more asymmetrical combinations may
be worth exploring; these will typically be located below the diagonal shown.
Note that the map is slightly ambiguous between number of generated versus perceived
streams; parallel streams of generated sound may fuse or separate based on perceptual
context. This is a very interesting phenomenon which can be quite fruitful: perceptual
fusion between streams can be an appropriate expression of data features, e.g. in EEG
recordings, massive synchronisation of signals across electrodes may cause the streams
to fuse, which can represent the nature of some epileptic seizures well.
5.1.5 Examples from the ’Science by Ear’ workshop
In order to clarify the theoretical considerations given so far, we now turn to analysing
design work done in an interdisciplinary setting. We report one exemplary set of design
sessions as they happened, with added after-the-fact analysis in terms of the Sonification
Design Space Map concept (short: SDSM). Where SDSM strongly calls for additional
designs, these are provided and marked as additions. This is intended to demonstrate
the potential of going from practice-grounded theory back to theory-informed practice.
The workshop concept is described in section 4.2.
The workshop setting
True to the inherently interdisciplinary nature of scientific data sonification, the SBE
workshop brought together three groups of people for three days: Domain scientists who
were invited to supply data they usually work with; an international group of sonification
experts; and audio programmers/sound designers. Apart from invited talks by the soni-
fication experts, the main body of work consisted of sonification design sessions, where
interdisciplinary groups (ca. 8 people, domain scientists, sonification experts, program-
mers, and a moderator) spent 2 hours discussing one submitted data set, experimenting
with different sonification designs, and then discussing results across groups in plenary
meetings.
In each session, discussion notes were taken as documentation, where possible the soni-
fication designs were kept as code, and all the sound examples played in the plenary
meetings were rendered as audio files. All this documentation is available online3.
Load Flow - data background
The particular data set serving as a starting example came from electrical power systems:
It captures electrical power usage for one week (December 18 - 24, 2004) across five
groups of power consumers: households, trade and industry, agriculture, heating and
warm water, and street lighting; a sum over all consumer groups was also provided.
Clear daily cycles were to be expected, as well as changes between workdays and week-
ends/holidays. While this is not scientifically challenging, it is a good example of simple
data with everyday relevance. We chose this dataset for the first parallel session, and it
did serve well for exploring basic sonification concepts with novices. The full documen-
tation for these sessions is available online here4.
3 http://sonenvir.at/workshop/
4 http://sonenvir.at/workshop/problems/loadflow/. All sound examples can be found here, in the folders TeamA, TeamB, TeamC, and Extras; for layout reasons, relative links at this site are given here as ./TeamX/name.mp3 etc.
Figure 5.3: All design steps for the LoadFlow dataset.
Steps are shown as locations labeled with team name and step number (A1, B2, C3, etc.),
and arrows between locations.
The dataset was an Excel file with 5 columns for the consumer groups, and consumption
values were sampled at 15 minute intervals; so for a week, there are 24 * 4 * 7 = 672
data points for the entire dataset. In SDSM terms, this puts the Data Anchor for this set
right in the middle of the Design Space Map, in the overlap zone between Discrete-Point
and Continuous sonification, see section 5.3.
Sonification designs
All sonification designs are shown as locations on the Design Space Map in figure 5.3,
labeled as A1, B1, C3, etc. Teams A and B created their design sketches in
SuperCollider3, while Team C worked with the PureData environment.
[A1] Team A began by sonifying the entire dataset as five parallel streams, scaled to 13
seconds, i.e. one day scaled to ca. 2 seconds; power values were mapped to frequency
with identical scaling for all channels5. The resulting five parallel streams were panned
into one stereo panorama.
After experimenting with larger and smaller timescales, agreement was reached that the
5 ./Team A/TeamA 1 FiveSines PowersToFreqs.mp3
initial choice of timescale was appropriate and useful. In SDSM terms, this means the
team was looking for auditory gestalts at the scale of single days.
[A+] As SDSM recommends starting with a synakusis into a timeframe of 3 seconds,
this is provided here6. This was only added after the workshop.
Then, alternative sound parameter mappings were tried out based on team suggestions:
[A2] Mapping powers to amplitudes of five tones labeled with different pitches7. While
this is closer in metaphorical distance, it is perceptually less successful: one could not
distinguish much shape detail in amplitude changes.
[A3] Mapping powers to amplitudes and the cutoff frequencies of resonant lowpass filters
of five differently pitched tones8. This was clearer again, but still not as differentiated
as mapping to tone frequencies.
[A4] Going back to mapping to frequencies, each tone was labeled with a different
phase modulation index (essentially, different levels of brightness)9. While this allowed
for better stream identification, the (very quickly chosen) scaling was not deemed very
pleasant, if inadvertently amusing.
[A5] Finally, the team tried using fewer parallel streams, and adding secondary data: the
phase modulation depth (basically, the brightness) of both channels (household and
agriculture) was controlled from the difference between the two data channels10. While
this did not work very well, it seemed promising with better secondary data choices;
however, at this point session time was over. In SDSM terms, design A5 is a move
down - to fewer channels - and a move back up - derived data used to control additional
parameters (the map only shows the resultant move).
Team B chose to do audification (following one sonification expert’s request), and to
use an interactive sonification approach: Their design loaded the entire data for one
channel (672 values, equivalent to one week of data time) into a buffer, and played back
a movable 96-value segment (equal to one day) as a looped waveform. The computer
mouse position was used to control which 24hour-segment is heard at any time. This
maps the signal’s local ’jaggedness’ into spectral richness and its overall daily change
into amplitude. (For the non-interactive sound examples that follow, the mouse is moved
automatically through the week within 14 seconds.)
While the team found the data sample rate and overall data size too low for much
detail, an interesting side effect turned up: when audifying segments in this fashion, the
difference between the same time of day for two adjacent days was emphasized; large
6 ./extras/LoadflowSynakusis.mp3
7 ./Team A/TeamA 2 FiveTones PowersToAmps.mp3
8 ./Team A/TeamA 3 FiveTones PowersToAmpsAndFilterfreqs.mp3
9 ./Team A/TeamA 4 FiveFMSounds IDbyModDepth.mp3
10 ./Team A/TeamA 5 TwoFMSounds DiffToModDepth.mp3
differences at a specific time between adjacent days created strong buzzing11. In the next
design step, 2 channels, households (left) and agriculture (right) were compared side by
side12, and for clearer separation, they were labeled with different loop frequencies 13.
The final design example maps the power values corresponding to the current mouse
position directly to the amplitude of a 50Hz (European mains frequency) filtered pulse
wave 14. As above, in the fixed rendering here, the mouse moves through the week at
constant speed within 14 seconds.
In SDSM terms, the initial choices were to move all the way down on the map (to
only 1, and then 2 out of 5 channels at a time), and essentially a move to the left:
a chosen data subset was played by moving a one-day window within the data. Note
that this move actually creates an interaction parameter for sonification design users,
which is one of the many advantages of current interactive programming environments.
Note that the interpolation commonly used in audification is actually slightly dubious
here: There may well have been meaningful short-time fluctuations within 15 minute
intervals which would not have been captured in the data as supplied.
Team C used PureData as programming environment. Their approach was quite similar
to Team A, with interesting differences: They began with scaling each single data channel
into 3 seconds, mapping power in that channel both to frequency and to amplitude, and
subsequently rendered all channels in this fashion15. Finally, this team also produced a
version with six parallel streams (including the sum value), scaled into 12 seconds, and
with different timbres16.
In SDSM terms, they first moved to the bottom of the map, while keeping full data
scale, i.e. a synakusis-sized time window; example 7 moves back up (using all channels),
and to the left (i.e. toward higher time resolution, gestalts on the order of single days
of data).
5.1.6 Conclusions
Conceptualising the sonification design process in terms of movements on a design space
map, one can experiment freely by making informed decisions between different strategies
to use for the data exploration process; this can help to arrive at a representation which
produces perceptible auditory gestalts more efficiently and more clearly. Understanding
the sonification process itself, its development, and how all the choices made influence
11 ./Team B/1 LoadFlow B Households.mp3
12 ./Team B/2 LoadFlow B households agriculture.mp3
13 ./Team B/3 LoadFlow B households agriculture.mp3
14 ./Team B/4 LoadFlow B households agriculture.mp3
15 http://sonenvir.at/workshop/problems/loadflow/Team C/, sound examples 1-6.
16 ./Team C/TeamC AllChannels.mp3
the sound representation one has arrived at, is essential in order to attribute perceptual
features of the sound to their possible causes: They may express properties of the dataset,
they may be typical features of the particular sonification approach chosen, or they can
be artifacts of data transformation processes used.
As these analyses of some rather basic sonification design sessions show, the terminology
and map metaphor provide valuable descriptions of the steps taken; having the map
available (mentally or physically) for a design work session seems very likely to provide
good clues for next experimental steps to take.
Note that the map is open to extensions: As new sonification strategies and techniques
evolve, they can easily be classified as either new zones, areas within existing zones, or as
transforms belonging to one of the directional arrows categories; then their appropriate
locations on the map can easily be estimated and assigned.
5.1.7 Extensions of the SDS map
There are several ways to extend the map, and make it more useful, and this dissertation
aims to provide some of them:
More and richer detail can be added by analysing the steps taken in observed design
sessions, classifying them as strategies, and adding them if new or different. This is the
object of chapters 6, 7, 8, 9, 10, and 11, the example sonification designs from different
SonEnvir research activities.
A more detailed analysis of the existing varieties of model-based sonification can be
undertaken, and that understanding can and should be expressed in the terms of the
conceptual framework of the map; however, this is beyond the scope of this thesis.
Expertise can be integrated by interviewing sonification experts, tapping into their expe-
rience, inquiring about their favorite strategies, or decisions they remember that made
a big difference for a specific design process.
One can imagine building an application that lets designers navigate a design space map,
on which simple example data sets with coded sonification designs are located. When
one moves in an area that corresponds to the dimensionality of the data under study, the
nearest example pops up, and can be adapted for experimentation with one’s own data.
Obviously such examples should be canonical and capture established sonification best
practices and guidelines, e.g. concerning mapping Walker (2000), as well as sonification
design patterns Barrass and Adcock (2004).
Finally, many of the strategies need not be fixed decisions made once; being able to delay
many of the strategic choices, and to make them available as interaction parameters when
exploring a dataset can be extremely valuable.
5.2 Data dimensions
Before proceeding to synthesis models, it will be helpful to discuss the nature of data
dimensions in more depth.
5.2.1 Data categorisation
In data analysis, data dimensions are classified by scales: data may capture categorical
differences, ordered differences, which may have a metric, and a natural zero.
Table 5.1: Scale types

Scale:     Characteristics:                             Example:
nominal    difference without order                     kind of animal
ordinal    difference with order                        degrees of sweetness
interval   difference with order and metric             temperature
ratio      difference, order, metric, and natural zero  length
For nominal scales (such as ’kind of animal’) and ordinal scales, it is useful to know
the set of all occurring values, or categories (such as cat, dog, horse). The size of this
set greatly influences the choices of possible representations of the values in this data
dimension.
For metrical scales (interval and ratio), it is necessary to know the numerical range in
order to make scaling choices; also knowing the measurement resolution or increment
(for example, age could be measured in full years, or days since birth) and precision (e.g.
tolerances of a measuring device) is useful.
5.2.2 Data organisation
Apart from the phenomena recorded, and their respective values, data may have differ-
ent forms of organisational structure: Individual data points may have different kinds
of neighbour relations to specific other data points. The simplest case would be no
organisation at all: Measuring all the individual weights of a herd of cows is just a set
of measured values with no order. When recording health status at the same time, each
data point has two dimensions, but there is still no order.
If the cows’ identities are recorded as well, similar measurements at different times can
be compared. If the cows have names, the data can be sorted alphabetically (nominal
scale); if the cows’ birth dates are known as well, the data can also be sorted by age
(interval). Both sortings are derived from data dimensions, and there is no obvious
’best’, or preferable order. Often the order in which data are recorded is considered an
implicit order; however, in the example given, it may simply be the order in which the
cows happened to be weighed. In social statistics, data for individuals or aggregates
without obvious neighbour relations are the most frequent case.
When physical phenomena are studied, measurements and simulations are often organ-
ised in time (e.g. time series of temperature) and space (temperature in n measuring
stations in a geographical area, or force field simulations in a 3D grid of a specific res-
olution). These orders can actually be considered separate data dimensions; for clear
differentiation one may call a dimension which expresses a value (such as temperature)
a value dimension, while a dimension that expresses an order (e.g. a position in time or
space) can be called ordering dimension or indexing dimension.
Task Data analysis (TaDa) by Barrass (1997), chapter 4, provides a template that captures data
dimensions systematically, as well as initial ideas for desirable ways of representation of
and interaction with the data under study. As a practical example, the TaDa Analysis
made for the LoadFlow dataset as a preparation for the Science By Ear workshop is
reproduced here.
5.2.3 Task Data analysis - LoadFlow data
Name of Dataset: Load Flow
Date: March 12, 2006
Authors: Walter Hipp, Alberto de Campo (TaDa)
File: LoadFlow.xls (original), .tab, .mtx.
Format: excel xls original, tab delimited for Sc3, mtx format for pure data. The
file contains 672 lines with date and time, total electrical power consumption, and
consumption for five groups of power consumers.
Scenario
The Story:
Load Flow describes how the electrical power consumption of different groups of con-
sumers changes in time. A time series was taken for a week (in Winter 2004) of 15
minute average values, documenting date and time, total power consumption, and con-
sumption for a) households, b) trade, c) agriculture, d) heating and warm water, and
e) street lighting.
Tasks for this data set:
• Find out which kinds of patterns can be discerned at which time domain; e.g. daily
cycles versus shorter fluctuations.
• Since all five individual channels have the same unit of measurement, find ways to
represent them in a way that their values and their movements can be compared
directly.
Table 5.2: The Keys
Question: Who uses how much power when?
Are there patterns that recur? At what time scales?
Are there overall periodicities?
Answers: One or several of the channels;
Yes/No, days/hour/times of day;
categories of pattern shapes
Subject: Relative proportions, patterns of change in time
Sounds: ? (none at the time the TaDa analysis was written)
TaDa
Table 5.3: The Task
Generic question: What is it? How does it develop?
Purpose: Identify, compare
Mode: interactive
Type: continuous
Style: exploration
Table 5.4: The Data/Information:
Level: Intermediate and global
Reading: Conventional (possibly direct)
Type: 5 channels, ratio
Range: continuous
Organization: time
Table 5.5: The Data:
Type: 5 channels of ratio scale with absolute zero
Range: Individual channels 0 - 2.24, total power 1.08 - 4.55
Organisation: Time
Appendix
Figure 5.4: LoadFlow - time series of dataset (averaged over many households)
Figure 5.5: LoadFlow - time series for 3 individual households
5.3 Synthesis models
Perceptualisation designs always require decisions about precisely how data values
(the ’sonificate’) determine perceptible representations (in the case of auditory
representation, the sonifications). While section 5.1 focused on which data subsets are to be
presented in the rendering, this section covers the question of which technical aspects
of the sound synthesis algorithms deployed are to be determined by which data dimensions.
The three sonification strategies defined in section 5.1.2 are discussed in more depth,
and concrete examples of synthesis processes are provided in ascending complexity.
With all strategies from the very simplest to the most complex model-based designs,
decisions of mappings (of data dimensions or model properties) to synthesis parameters
are required; these decisions need to be informed by perceptual principles such as those
covered in chapter 2.
While building sonification designs may be technically simple, mapping choices are by no
means trivial. One aspect to consider is metaphorical proximity: Mappings that relate
closely to concepts in the scientific domain may well reduce cognitive load and thus
allow for better concentration on exploration tasks. (For a discussion of performance of
clearly defined tasks with ’intuitive’, ’okay’, random, and intentionally ’bad’ mappings,
see Walker and Kramer (1996), described in section 2.5.)
Another aspect is the clarity of the communicative function to be fulfilled in the research
context: What will a perceptible aspect of the sound serve as? Some possible categories
are:
• analogic display of a data dimension - a value dimension mapped to a synthesis
parameter which is straightforward to recognise and follow
• a label identification for a stream - needed when several streams are heard in
parallel
• an indexing strategy - ordering the data by one dimension, then indexing into subsets
• context information/orientation - mapping non-data; e.g. using clicks to represent
a time grid
Finally, it is essential to understand the resolution of perceptual dimensions, such as
their Just Noticeable Differences (JNDs). Note that sound process
parameters need not be directly perceptible; they may govern aspects of the sound that
will indirectly produce differences that may be described perceptually in other terms.
Perceptual tests can be integrated into the sonification design process, like writing tests
to verify that new code works as intended. Writing examples that test whether a specific
concept produces audible differences for the data differences of interest can provide
such immediate confirmatory feedback, as well as direct learning experience for the test
listeners immediately at hand. Such examples also provide a good base for discussions
with domain specialists.
Similar mapping decisions come up in the process of designing electronic or software-
based music instruments; how the ranges of sensor/controller inputs (the equivalent to
data to be sonified) are scaled into synthesis parameter ranges determines how playing
that instrument will feel to a performer.
5.3.1 Sonification strategies
The three most common concepts, Continuous Data Representation, Discrete Point
Data Representation, and Model-Based Data Representation, correspond closely to the
approaches described first in Scaletti (1994). The examples given again use the LoadFlow
dataset, and loosely follow the order given by Scaletti. Pauletto and Hunt (2004) briefly
describe how different data characteristics sound under different sonification methods:
Static areas, trends, single outliers, discontinuities, noisy sections, periodicities (loops),
or near-periodicities are simple characteristics that may occur in a single data dimension,
and will be used as examples of easily detectable phenomena.
The data for the code examples can be prepared as follows:
( // load data file
q = q ? (); // a dictionary to store things by name
// load tab-delimited data file:
q.text = TabFileReader.read( "LoadFlow.tab".resolveRelative, true, true );
// keep the 5 interesting channels, convert to numbers
q.data = q.text.drop(1).collect { |line| line[3..7].collect(_.asFloat) };
// load one data channel into a buffer on the server
q.buf1 = Buffer.loadCollection(s, q.data.flop[0]); // households
);
5.3.2 Continuous Data Representation
Audification is the simplest case of continuous data representation: Typically, converting
the numerical values of a long enough time series into a soundfile is a good first pass at
finding structures in the data. Scaletti (1994) calls this 0th order sonification. Scaling
the numerical values is straightforward, as one only needs to fit them into the legal range
for the type of soundfile to be used; for high precision, 32 bit floating point data can be
converted to sound file formats without any loss of information. For audification, one can
simply scale the (expected or actual) maximum and minimum values to the conventional
-1.0 to +1.0 range for audio signals at full level. This maps the data dimension under
study directly to the amplitude of the audible signal.
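This scaling step can be sketched language-independently. The following is an illustrative Python fragment (function names are hypothetical, not from the SonEnvir code), assuming the standard sample rate of 44.1 kHz:

```python
def audify_scale(data, lo=None, hi=None):
    """Scale data values into the conventional [-1.0, +1.0] audio range.
    lo/hi default to the actual extremes; pass expected bounds to keep
    the scaling stable across datasets. Assumes hi > lo."""
    lo = min(data) if lo is None else lo
    hi = max(data) if hi is None else hi
    span = hi - lo
    return [(v - lo) / span * 2.0 - 1.0 for v in data]

def audification_duration(num_points, rate=44100):
    """Playback duration in seconds if each data point becomes one sample."""
    return num_points / rate
```

At 44.1 kHz, the 672 LoadFlow points last only about 15 milliseconds, which is why looping and playback-rate adjustment are essential for audifying such small datasets.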
Making the playback rate user-adjustable allows for simple time-scaling: one can
change expected gestalt size interactively. The fastest timescaling value will typically
be around 40-50 kHz, which includes the default sample rates of most common audio
hardware; this puts roughly 100,000 data points into working memory, which makes
audification the fastest option for screening large amounts of data with minimal prepro-
cessing.
Typical further operations to provide are: selection of an index range in the data, options
for looped and non-looped playback, and synchronised visual display of the waveform
under study. The EEGScreener described in chapter 9.1 is an example of a powerful,
flexible audification instrument.
Of the phenomena to be detected, static values will become silent: the human ear does
not hear absolute pressure values, and while audio hardware may output DC offsets,
loudspeakers do not render these as reproducible pressure offsets. Trends are also not
represented clearly: Ramp direction is not an audible property. Single outliers become
sharp clicks, and discontinuities (e.g. large steps) become loud pops. Rapidly
fluctuating sections will sound noisy, and periodicities will be easy to discern even if
they are only weak components in mixed signals.
Code examples for 0th order - audification.
p = ProxySpace.push; // prepare sound
~audif.play; // start an empty sound source
// play entire week once, within 0.05 seconds
~audif = {PlayBuf.ar(1, q.buf1, q.buf1.duration / 0.05) * 0.1 };
// try agriculture data
q.buf1.loadCollection(q.data.flop[2]); // instance method takes the collection directly
// play the entire week looped
~audif = {PlayBuf.ar(1, q.buf1, q.buf1.duration / 0.05, loop: 1) * 0.1 };
The next example loops over an adjustable range of days; starting day within the week
and loop length can be set in days.
(
~audif = { |dur = 0.05, day=0, length=1|
var stepsPerDay = 96;
var start = day * stepsPerDay;
var rate = q.buf1.duration / dur;
// read position in the data buffer
var phase = Phasor.ar(1, rate, start, start + (length * stepsPerDay));
BufRd.ar(1, q.buf1, phase, interpolation: 4) * 0.1;
};
)
The next example loops a single day, and allows moving the day-long time window, thus
navigating by mouse - this is the solution SBE Team B developed.
(
~audif = {
var start = MouseX.kr; // time in the week (0 - 1)
var range = BufFrames.kr(q.buf1); // full range is one week.
var rate = 1 / 10; // guess a usable rate
var phase = Phasor.ar(0, rate, 0, range / 7) + (start * range);
var out = BufRd.ar(1, q.buf1, phase, interpolation: 4);
out = LeakDC.ar(out * 0.5); // remove DC offset
};
)
Parameter-mapping continuous sonification, or what Scaletti calls 1st-order sonification,
maps data dimensions onto directly audible synthesis parameters, such as pitch,
amplitude (of a carrier signal), or brightness. Here, the simplest case would be mapping
to frequency (a synthesis parameter) and thus pitch (a perceptual property of the
rendered sound).
The first example maps the data range of 0 - 2.24 into a pitch range of (midinote) 60 -
96, or frequencies between ca 260 and 2000 Hz, time-scaled into 3 seconds.
// loop a week’s equivalent of data
(
~maptopitch.play;
~maptopitch = { | loopdur = 3|
var datasignal = PlayBuf.ar(1, q.buf1, q.buf1.duration / loopdur, loop: 1);
var pitch = datasignal.linlin(0, 2.24, 60, 96); // scale into 3 octaves;
var sound = SinOsc.ar(pitch.midicps) * 0.2;
Pan2.ar(sound);
}
)
It may seem a little over-engineered here, but in general it is a good idea to consider
what the smallest data variations of interest are, and whether they will be audible
in the mapping used.
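The linlin and .midicps calls used above translate directly into a few lines of arithmetic; the following is an illustrative Python transcription, useful for checking mapping ranges on paper before listening:

```python
def linlin(v, in_lo, in_hi, out_lo, out_hi):
    """Linear-to-linear range mapping, as in SuperCollider's linlin."""
    return (v - in_lo) / (in_hi - in_lo) * (out_hi - out_lo) + out_lo

def midicps(note):
    """MIDI note number to frequency in Hz (A4 = note 69 = 440 Hz),
    as in SuperCollider's midicps."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)
```

For the mapping above, midicps(60) and midicps(96) give ca. 261.6 Hz and 2093 Hz, matching the quoted range of ca. 260 to 2000 Hz.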
While data for Just Noticeable Differences for some perceptual properties of sound exist
in the literature, their values will depend on the experimental context and circumstances.
Thus, rather than relying only on experiments which were conducted for other purposes,
it makes sense to do at least some perceptual tests for the intended usage context.
For the example given above, data resolution is 0.01 units; scaled from range [0, 2.24]
into [60, 96], this creates a minimum step of 0.01 * 36 / 2.24, or 0.16 semitones. The
literature agrees that humans are most sensitive to pitch variation when it occurs at a
(vibrato) rate of ca. 5 Hz, so a first test may use a pitch of 78 (center of the chosen
range), a drift/variation rate of 5 Hz, and a variation depth of +-0.08 semitones; all
of these can be adjusted to find the marginal conditions where pitch variation is just
noticeable.
(
~test.play;
~test = { |driftrate = 5, driftdepth = 0.08, centerpitch = 78|
var pitchdrift = LFNoise0.kr(driftrate, driftdepth);
SinOsc.ar( (centerpitch + pitchdrift).midicps) * 0.2
};
)
Changing driftrate, driftdepth and center pitch will give an impression of how this be-
haves; to my ears, 0.08 is in fact very near the edge of noticeability. One could sys-
tematically test this by setting drift depth to random start values above and below the
expected JND, and having test persons do e.g. forced choice tests that would converge
on the border for a given drift rate and center pitch.
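The step-size arithmetic underlying this test can be restated compactly (an illustrative Python fragment; variable names are hypothetical):

```python
# minimum mapping step for the pitch-mapping example above:
# a data resolution of 0.01 units over a range of 2.24 units,
# mapped onto 36 semitones (midinotes 60 to 96)
data_resolution = 0.01
data_range = 2.24
pitch_range = 96 - 60
min_step = data_resolution * pitch_range / data_range  # in semitones
print(round(min_step, 4))  # -> 0.1607
```

Since 0.16 semitones lies above the ca. 0.08 semitone noticeability edge found informally above, the smallest data step of interest should remain audible in this mapping.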
The next example maps the same data values to amplitude, which could seem
metaphorically closer - the data value is consumed energy, and amplitude is directly
correlated to acoustical energy. However, the rendering is perceptually not very clear:
humans are good at filling in dropouts in audio signals, such as speech phonemes masked
in noisy environments, or damaged by bad audio connections, such as intermittent
telephone lines. The patterns that emerged in the pitch example, where the last three
days are clearly different, almost disappear. Changing to linear mapping instead of
exponential makes little difference.
(
~maptoamp.play;
~maptoamp = { | loopdur = 3|
var datasignal = PlayBuf.ar(1, q.buf1, q.buf1.duration / loopdur, loop: 1);
var amp = datasignal.linlin(0, 2.24, -60, -10).dbamp;
// var amp = datasignal * 0.2; // linear mapping
var sound = SinOsc.ar(300) * amp; // use the mapped amplitude
Pan2.ar(sound);
}
)
The next example shows what Scaletti calls a second-order mapping. The data are
mapped to a parameter that controls another parameter, phase modulation depth;
however, perceptually this translates roughly to brightness (which could be considered
a first-order audible property).
(
~maptomod.play;
~maptomod = { |loopdur = 3|
    var datasignal = PlayBuf.ar(1, q.buf1, q.buf1.duration / loopdur, loop: 1);
    var modulator = SinOsc.ar(300) * datasignal * 2;
    var sound = SinOsc.ar(300, modulator) * 0.2;
    Pan2.ar(sound);
};
)
5.3.3 Discrete Data Representation
As an alternative to creating continuous signals based on data dimensions, one can
also create streams of events, which may sound note-like when slower than ca. 20 events
per second; at higher rates, they are best described in Microsound terminology, as
granular synthesis.
The example below demonstrates the simplest case: one creates one synthesis event for
each data point, with a single data dimension mapped to one parameter. A duration of
3 seconds will create a continuous-seeming stream; 10 seconds will sound like very fast
grains, while 30 seconds takes the density down to 22.4 events per second, which can
seem like very fast marimba-like sounds.
(
~grain.play;
~grain = { |pitch = 60, pan = 0|
    var sound = SinOsc.ar(pitch.midicps);
    var envelope = EnvGen.kr(Env.perc(0.005, 0.03, 0.2), doneAction: 2);
    Pan2.ar(sound * envelope, pan)
};
// ~grain.spawn([\pitch, 79]);

Tdef(\data, {
    var duration = 10;
    var datachannel = q.data;
    var power;
    datachannel.do { |chans|
        power = chans[0]; // households
        ~grain.spawn([\pitch, power.linlin(0, 2.24, 60, 96)]);
        (duration / datachannel.size).wait;
    };
}).play;
)
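The event densities mentioned above follow directly from the number of data points divided by the playback duration; a quick check in Python (the point count of 672 is inferred from the quoted 22.4 events per second at 30 seconds):

```python
# 22.4 events/s at 30 s playback implies 672 data points in the set
n_points = 672

# event rate for each of the playback durations discussed in the text
for duration in (3, 10, 30):
    print(duration, n_points / duration)
```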
5.3.4 Parallel streams
When the dimensions in a data set are directly comparable (like here, where they are all
power consumption measured in the same units at the same time instants), it is concep-
tually convincing to render them as parallel streams. Auditory streams, as discussed in
Bregman (1990) and Snyder (2000), are a perceptual concept: a stream is formed when
auditory events are grouped together perceptually, and multiple streams can form when
all the auditory events separate into several groups.
With the example above, a minimal change creates two parallel streams: instead of
creating one sound event for one data dimension, one creates two, panning them left
and right to separate the two streams by spatial location.
(
Tdef(\data, {
    var duration = 10;
    var datachannel = q.data;
    var powerHouse, powerAgri;
    ~grain.play;
    datachannel.do { |chans|
        powerHouse = chans[0];
        powerAgri = chans[2];
        ~grain.spawn([\pitch, powerHouse.linlin(0, 2.24, 60, 96), \pan, -1]);
        ~grain.spawn([\pitch, powerAgri.linlin(0, 2.24, 60, 96), \pan, 1]);
        (duration / datachannel.size).wait;
    };
}).play;
)
When presenting several data dimensions simultaneously, one can obviously map them
to multiple parameters of a single synthesis process, thus creating one stream with
multiparametric controls. This makes the individual events fairly complex, and may
require that each event has more time to unfold perceptually. (In the piece Navegar, a
fairly complex mapping is used, see section 11.3.)
It should be noted that what is technically created as one stream of sound events is not
guaranteed to fuse into one perceptual stream: it may split into several layers, just as
separately created multiple streams may perceptually merge into a single auditory stream.
In fact, as perception is strongly influenced by a listener’s attitude, one can intentionally
choose analytic or holistic listening attitudes; either focusing on details of rather few
streams, or listening to the overall flow of the unfolding soundscape - whether it is a
piece of music or a sonification.
5.3.5 Model Based Sonification
In Model Based Sonification (Hermann and Ritter (1999)), the general concept is that the
data values are not mapped directly, but inform the state of a model; properties of that
model (which is a kind of front-end) are then accessed when user input demands it (e.g.
by exciting the model with energy, somewhat akin to playing a musical instrument). The
model properties then determine how the sound engine renders the current user input;
this backend inevitably contains some mapping decisions to which the considerations
given here can be applied.
Till Bovermann’s example implementation of the Data Sonogram (Bovermann (2005)) is
a good compact example of MBS. The approach is to treat the data values as points in
n-dimensional space (four dimensions for the example Iris data set); user input then triggers
a circular energy wave propagating from the current user-determined position, and the
reflections off each data point are simulated by mapping distance (in 4D space) to amplitude
and delay time, as if in natural 3D space. The other parameters of the sound grains
(frequency, number of harmonics) are also determined by data-based mappings.
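The core idea, distance in data space mapped to delay and amplitude, can be sketched abstractly. The following Python sketch assumes a Euclidean metric, a unit wave speed, and a simple 1/(1+d) attenuation law; these constants are illustrative choices of mine, not taken from Bovermann's implementation:

```python
import math

def sonogram_events(points, center, wave_speed=1.0):
    """For each data point, derive an onset delay and an amplitude from its
    distance to the excitation center, as if reflecting an expanding wave."""
    events = []
    for p in points:
        d = math.dist(p, center)   # Euclidean distance in n-D space
        delay = d / wave_speed     # wave travel time to the point
        amp = 1.0 / (1.0 + d)      # a simple distance attenuation law
        events.append((delay, amp))
    return sorted(events)          # nearer points reflect earlier

# three points of a hypothetical 4-D (Iris-like) data set, excited at the origin
pts = [(0, 0, 0, 0), (1, 0, 0, 0), (3, 4, 0, 0)]
print(sonogram_events(pts, (0, 0, 0, 0)))
```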
The Wahlgesange sonification based on this example uses a somewhat more elaborate
mapping: distance in 2D is mapped to delay and amplitude, with user-tunable scaling;
panning is determined by 2D circular coordinates; the data value of interest (voter
percentage) is mapped to the sound grain’s pitch, and controls for attack/decay
times make the tradeoff between auditory pitch resolution and time resolution explicit.
Both of these examples are too extensive to include here, but they are both available
online, and Wahlgesange is described in more detail in section 6.2.
While it would be worthwhile to analyse more MBS examples in detail, this is beyond
the scope of the present thesis. Further research will be necessary for a more fine-grained
integration of the model-based approach into the context of the sonification models
given here.
5.4 User, task, interaction models
Humans experience the world with all their senses, and interacting with objects in the
world is the most common everyday activity they are well trained at. For example,
handling physical objects may change their visual appearance, and touching, tapping or
shaking them may produce acoustic responses they can use to learn about objects of
interest. Perception of the world, action in it, and learning are tightly linked in human
experience, as discussed in section 2.3.
In artificial systems that model aspects of the world, from office software to multimodal
display systems, or sonification systems in particular, interaction crucially determines how
users experience such a system: whether they can achieve tasks correctly (effectiveness)
with it, whether they can do so in reasonable amounts of time (efficiency), and whether
they enjoy the working process (positive user experience, pleasantness).
This section looks at potential usage situations of sonification designs and systems: the
people working in these contexts (’sonification users’); the goals they will want to pursue
by means of (or supported by) sonification; the kinds of tasks entailed in pursuing these
goals; the kinds of interfaces and/or devices that may be useful for these goals; and
some notions of how to go about matching all of these.
5.4.1 Background - related disciplines
Interaction is a field where a number of disciplines come into play:
Human Computer Interaction (HCI) studies the alternatives for communication between
humans and computers (from translating user actions into input for a computer system
to rendering computer state into output for the human senses), sometimes to amazing
depths of detail and variety (Buxton et al. (2008); Dix et al. (2004); Raskin (2000)).
Musical instruments are highly-developed physical interfaces for creating highly differ-
entiated acoustic sound, with a very long tradition; in electronic music performance,
achieving similar degrees of control flexibility (or better, control intimacy) has long been
desirable. While the mainstream music industry has focused on a rather restricted set
of protocols (MIDI) and devices (mostly piano-like keyboards, simulated tape machines,
and mixing desks), experimental interfaces that allow very specific, sometimes idiosyn-
cratic ideas of musical control have been an interesting source of problems for interested
engineers. The research done at institutions like STEIM17 (see Ryan (1991)) and CN-
MAT18 (Wessel (2006)) has made interface and instrument design its own computer
music sub-discipline, with its own conference (NIME19, or ’New Instruments/Interfaces
17 http://www.steim.nl
18 http://cnmat.berkeley.edu
19 http://www.nime.org
for Musical Expression’, since 2001).
Computer game controllers tend to be highly ergonomic and very affordable; thus they
have become a popular resource for artistic (re-)appropriation as cheap and expressive
music controllers: Gamepads, and more recently, Wii controllers, have both been adopted
as is, and creatively rewired for specialised artistic uses.
This has been part of an emerging movement toward more democratic electronic devices:
beginning with precursors like Circuit Bending (Ghazala (2005)), which extends the design
of sound devices by introducing controlled options for what engineers might consider
malfunction, designers have created open-source hardware, such as the Arduino
microcontroller board20, to simplify experimentation with electronic devices. With these
developments, finding ways to create meaningful connections and new usage contexts for
object-oriented hardware (Igoe (2007)) has become interesting for a much larger public
than strictly electronics engineers and tinkerers.
5.4.2 Music interfaces and musical instruments
CD/DVD players or MP3 players tend to have rather simple interfaces: play the current
piece, make it louder or softer, go to the next or previous track, use randomised or
ordered playback of tracks.
A piano has a simple interface for playing single notes: one key per note, ordered
systematically, and hitting the key with more energy will make it louder. Thus, beginners
can experience rather fast success at finding simple melodies on this instrument. Playing
polyphonic music really well on piano is a different matter; as Mick Goodrick puts it, in
music there is room for infinite refinement (Goodrick (1987)).
On a violin, learning to produce good tone already takes a lot of practice; and playing in
tune (for whichever musical culture one is in) requires at least as much practice again.
(One is reminded of the joke where a neighbour asks, ”why can’t your children spend
more time practicing later, when they can already play better?”)
Instruments from non-western cultures may provide interesting challenges: Playing nose
flutes is a good example of an instrument that involves the coordination of unusual
combinations of body parts, thus developing (in Western contexts) rather unique skills.
However, a violin allows very subtle physical interaction with musical sound while it is
sounding, and in fact requires that skill for playing expressively. On piano, each note
sounds by itself once it has been struck, thus the relations between keys pressed, such
as chord balance, micro-timing between notes, and agogics are the main strategies for
playing expressively on the piano.
In Electronic Music performance, mappings between user actions as registered by con-
20 http://www.arduino.cc
trollers (input devices like the ones HCI studies, buttons, sliders, velocity-sensitive keys,
sensors for pressure, flexing, spatial position etc.) and the resulting sounds and musical
structures are essentially arbitrary - there are no physical constraints as in physical instru-
ments. Designing satisfying personal instruments with digital technology is an interesting
research topic in music and media art; e.g. Armstrong (2006) bases his approach on
a deep philosophical background, and discusses his example instrument in these terms;
Jorda Puig (2005) provides much historical context of electronic instruments, and dis-
cusses an array of his own developments in that light. Thor Magnusson’s (and others’)
ongoing work with ixi software21 explores applying intentional constraints to interfaces
for creating music in interesting ways.
5.4.3 Interactive sonification
The main researchers who have been raising awareness for interaction in sonification
are Thomas Hermann and Andy Hunt, who started the series of Interactive Sonification
workshops, or ISon22. In the introduction to a special issue of IEEE Multimedia resulting
from ISon2004, the editors give the following definition:
”We define interactive sonification as the use of sound within a tightly closed human-
computer interface where the auditory signal provides information about data under
analysis, or about the interaction itself, which is useful for refining the activity.” (Her-
mann and Hunt (2005), p 20)
In keeping with Hermann’s initial descriptions of Model-Based Sonification (Hermann
(2002)), they maintain that learning to ’play’ a sonification design with physical inter-
action, as with a musical instrument, really helps users acquire an understanding of the
nature of the perceptualisation processes involved and of the data to be explored. They
find that there is not enough research on how learning in interactive contexts actually
occurs.
The Neuroinformatics group at University Bielefeld (Hermann’s research group) has
studied a number of very interactive interfaces in sonification contexts: recognizing
hand postures to control data exploration (Hermann et al. (2002)), a malleable surface
for interaction with model-based sonifications (Milczynski et al. (2006)), tangible data
scanning using a physical object to control movement in model space (Bovermann et al.
(2006)), and others.
At the University of York, in Music Technology, Hunt has both studied musical interface
design issues (e.g. Hunt et al. (2003)) and worked on a number of sonification projects,
mainly with Sandra Pauletto (e.g. Hunt and Pauletto (2006)). Pauletto’s PhD thesis,
Interactive non-speech auditory display of multivariate data (Pauletto (2007)), discusses
21 http://www.ixi-software.net/
22 http://interactive-sonification.org/
interaction and sonification in great detail (pp. 56-67), and studies central sonification
issues with user experiments: the first two experiments compare listening to auditory
displays of data (audifications of helicopter flight data, sonifications of EMG (elec-
tromyography) data) with their traditional analysis methods (visually reading spectra,
signal processing analysis). In both cases, auditory display of large multivariate data
sets turned out to be an effective choice of display.
Her third experiment directly studies the role of interaction in sonification: Three al-
ternative interaction methods are provided for exploring synthetic data sets to locate a
given set of structures. A low interaction method allows selection of data range, playback
speed, and play/stop commands. For the medium interaction method, a jog wheel
and shuttle are used to navigate the sonification at different speeds and directions. The
high interaction method lets the analyst navigate by moving the mouse over a screen
area that corresponds to the data, like tape scrubbing.
Both objective measurements and subjective opinions rated the low interaction method
as less effective and efficient, favouring the two higher interaction modes. Interestingly,
users preferred the medium interaction mode for its option to quickly set the sonification
parameters and then let it play while they concentrate on listening; the high interaction
method requires constant user activity to keep the sound going. It should be noted here
that these results strictly apply only to the specific methods studied, and cannot be
generalised; however, they do provide interesting background.
5.4.4 ”The Humane Interface” and sonification
The field of Human Computer Interaction (HCI) is very wide and diverse, and cannot
be covered here in depth. However, a rather specialised look at some examples of inter-
faces may suffice to provide enough context for discussing the main issues in designing
sonification interfaces.
Rather than attempting to cover the entire field, I will take a strong position statement
by an expert in the field as a starting point: Jef Raskin was responsible for the Macintosh
Human Interface design guidelines that set the de facto standard for best practice in HCI
for a long time, and his book ”The Humane Interface” (Raskin (2000)) is an interesting
mix of best-practice patterns and rather provocative ideas.
Here is a brief overview of the main statements by chapter:
1. Background - The central criterion for interfaces is the quality of the interaction; it
should be made as humane as possible. Humane means responsive to human needs, and
considerate of human frailties. As one example, the user should always determine the
pace of interaction.
2. Cognetics - Human beings only have a single locus of attention, and a single focus
of attention, which in interactions with machines is nearly always on the task they try
to achieve.23 Computers and interfaces should not distract users from their intentions.
Human beings always tend to form habits; user interfaces should allow the formation of
good habits, as through benign habituation competence becomes automatic. A possible
measure of how well an interface supports benign habituation is to imagine whether a
blind user can learn it.
As a more general point, humans mostly use computers to get work done; here, user
work is sacred, and user time is sacred.
3. Modes - Modes are system states where the same user gesture can have different
effects, and are generally undesirable; one should eliminate modes where possible. The
exception to the rule is physically maintained modes, which he calls quasi-modes (entered
e.g. by holding down a special key, and reverted to normal when the key is released).
Visible affordances should provide strong clues as to their operations. If modes cannot be
entirely avoided, monotonic behaviour is the next best solution: a single gesture always
causes the same single operation; and in a mode where the operation is not meaningful
the gesture should do nothing. It is worth keeping in mind that everyone is both expert
and novice at the same time when different aspects of a system are considered.
4. Quantification - Interface efficiency can be measured, e.g. with the GOMS Keystroke
model. For most cases, ’back of the envelope’ calculations give a good first indication
of efficiency; standard times for hitting a key, pointing by mouse, moving from mouse to
keyboard, and mentally preparing an action are sufficient for that. Finding the minimum
combination for a given task is likely to make that task more pleasant to perform.
Obviously, the time a user is kept waiting for software to respond should be as low as
possible; while a user is busy with other things, s/he will not notice waiting times.
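Such a 'back of the envelope' Keystroke-Level estimate can be written in a few lines. The operator times below are the commonly cited KLM averages; the example task itself is hypothetical:

```python
# Keystroke-Level Model operator times in seconds, as commonly cited
# (K: keystroke, P: point with mouse, H: home hands, M: mental preparation)
KLM = {'K': 0.2, 'P': 1.1, 'H': 0.4, 'M': 1.35}

def klm_time(ops):
    """Back-of-the-envelope task time: the sum of the operator times."""
    return sum(KLM[op] for op in ops)

# hypothetical task: think, move hand to mouse, point at a slider, click
print(round(klm_time('MHPK'), 2))  # -> 3.05 seconds
```

Comparing such sums for alternative action sequences gives a first indication of which interface variant is more efficient.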
5. Unification - This chapter ranges far beyond the scope needed here, eventually making
a case that operating systems and applications should disappear entirely. Fundamental
actions are catalogued, and variants of computer-wide string search are discussed as one
example of how system-wide unified behaviour should work.
6. Navigation - The adjectives ’intuitive’ and ’natural’, when used for interfaces, generally
translate to ’familiar’. Navigation as in the ZoomWorld approach might be interesting
for organising larger collections of sonification designs; for the context of the SonEnvir
project, these ideas were not applicable.
7. Interface issues outside the user interface - Programming environments are notoriously
bad interfaces, and have actually been getting worse: on a 1984 computer, starting up,
running Basic, and typing a line to evaluate 3 + 4 could be accomplished in maybe 30
seconds; on a current (2000) computer, every one of these steps takes much longer,
even for expert users.
23 Raskin’s motto for the chapter is a quote from a character in the TV series Northern Exposure,
Chris: ”I can’t think of X if I’m thinking of Y.”
Relevance to sonification and the SonEnvir project
The most closely related notion to disappearing system software (chapter 5) is the
Smalltalk heritage of SC3. Smalltalk folklore says that ’when Smalltalk was a little
girl, she thought she was an operating system’ - one could do almost everything within
Smalltalk, including one of Raskin’s major desirables, namely, defining new operations by
text at any time, which change or extend the ways things work in a given environment.
The question of what ’user content’ is actually being created is extremely important in
sonification work: in sonification usage situations, ’content to keep’ can comprise uses of
a particular data file, particular settings of the sonification design, perceptual phenomena
observed with these data and settings, and text documentation, i.e. descriptions of all
of the above, and possibly user actions to take to cause certain phenomena to emerge.
The text editor and code interface in SC3 is well suited for this: commands to invoke
a sonification design (e.g. written as a class), code to access a specific data file, and
notes of observations can be kept in a single document, as they are all just text. Across
different sonification designs, SC3 behaves uniformly in this respect.
Compared to most programming environments, the SC3 environment allows very fluid
working styles. Documentation within program creation (literate programming, as Donald
Knuth called it) is supported directly.
5.4.5 Goals, tasks, skills, context
From a pragmatic point of view, a number of compromises need to be balanced in
interaction design, especially when it is just one of several aspects to be negotiated:
• Simple designs are quicker to implement, test and improve than more complex
designs. Given that one usually understands requirements much better by imple-
menting and discussing sketches, simpler designs will often be better.
• Exotic devices can be very interesting; however, they limit transferability to other
users, and require extra costs and development time. Even when there is a
strong reason to use a special interface device, including a fallback variant with
standard UI devices is recommended.
• Functions should be made clearly available to the users; usually that means making
them visible affordances. (Buxton et al. (2008) argues that the attitude ’you
can already do that’, meaning in some arcane way that experts may know about,
implies that end users will not use that implemented function.)
Goals are firmly grounded in the application domain, and with the users. What do users
want to achieve with the sonification design to be created? The goals will naturally
be different for different domains, datasets, and contexts (e.g. research prototypes or
applications for professional use); nevertheless these examples may apply to most designs:
• experience the differences between comparable datasets of a given kind
• find phenomena of interest within a given dataset, e.g. at specific locations, with
specific settings
• document such phenomena and their context, as they may become findings
• make situations in which phenomena of interest occurred repeatable for other users
The interaction design of a sonification design should allow the user’s focus of attention
to remain at least close to these top-level goals. Ideally, the design should add as little
cognitive load as possible, to keep the user’s attention free for the goals.
The sonification design’s interface should offer ways to achieve all necessary and useful
actions toward achieving these goals. The concepts for these actions should obviously
be formulated in terms of the mental model the user has of the data and the domain
they come from.
Tasks comprise all the actions users take to achieve their top-level goals. Tasks can
be directly functional for attaining a goal, or necessary to change the system’s state
such that a desired function becomes available. Systems that often require complicated
preparation to get things done tend to distract users from their goals, and are thus
experienced as frustrating. Some example tasks that come up when using a sonification
design are:
• load a sonification design of choice (out of several available)
• load a dataset to explore
• start the sonification
• compare with different datasets
• tune the sonification design while playing
• explore different regions of a dataset by moving through them
• look up documentation of the sonification design details
• start, repeat, stop sonification of different sections
• store a current context: a dataset, current selection, current sonification parameter
settings, and accompanying text/explanation.
For all these tasks, there should be visible affordances that communicate to the user how
the related tasks can be done. Ideally, a single task should be experienced as one clear
sequence of individual actions (or subtasks).
More complex tasks will be composed of a sequence of subtasks. As novice users ac-
quire more expertise, they will form conceptual chunks of these operations that belong
together. As long as these subtasks require meaningful decisions, it is preferable to keep
them separate; if there is only a single course of actions, one should consider making it
available as a single task.
Skills are what users need to have or acquire to use an interface efficiently. These can
include physical skills like manual dexterity, knowledge of operating systems, and other
skills. In the HCI literature, two conflicting viewpoints can be found here: a. users
already possess skills that should be re-used; one should add as little learning load as
possible, and enable as many users as possible to use a design quickly; b. interfaces
should allow for long-term improvement, and enable motivated users to learn to do
very complex things very elegantly eventually. Which of these apply will depend on the
context the sonification is designed for; in any case it is advisable to consider well what
one is expecting of users in terms of learning load.
Some necessary knowledge / skills include:
• locating files (e.g. program files, data files)
• reading documentation files
• selecting and executing program text
• using program shortcuts (e.g. start, stop)
• using input devices like mice, trackballs, tablets
Context should be represented clearly to reduce cognitive load: all changeable settings
should be e.g. visible on a graphical user interface, such as choice of data file, sonification
parameter settings, current subset choice, and others. Often, the display elements for
these can double as affordances that invite experimentation. In some cases, it can be
useful to display the current data values graphically, or to double auditory events visually
as they occur in realtime playback.
5.4.6 Two examples
EEG players
In the course of the SonEnvir project, most of the interaction design was done in collab-
orative sessions. One exception that required more formal procedures was redesigning
the EEG Screener and Realtime Player (discussed in depth in chapter 9.1), as the in-
tended expert users were not available for direct discussion. These designs went through
a full design revision, with a task analysis that is identical for most of the interface.
The informal ’wish list’ included: Simple to use, start in very few steps, low effort, keep
results reproducible; include a small example data file that can be played directly.
The task analysis comprised these items:
Goals:
• quickly screen large EEG data files to find episodes to look at in detail
Tasks:
1. locate and load EEG files in edf format
2. select which EEG electrode channels will be audible
3. select data range to playback:
which time segment within file
speedup factor, filtering
4. play control: play, stop, pause, loop;
feedback current location
5. document current state so it can be reproduced by others
6. include online documentation in German
7. later: prepare example files for different absences
All of these were addressed with the GUI shown in figure 9.2: 1. File selection is done
with a ’Load EDF’ button and regular system file dialog; for faster access, the edf file is
converted to soundfiles in the background, and feedback is given when ready.
2. Initially, this was only planned with popup menus and the electrode names; however,
making a symbolic map of the electrode positions on the head and letting users drag
and drop electrode labels to the listening locations (see figure 9.3) was much
appreciated by the users.
3. Time range selection within the file was realised in multiple ways: graphical selection
within a soundfile view showing the entire file; providing the start time, duration, and
end time as adjustable number boxes; and showing the selected time segment in a
magnified second soundfile view. This largely follows sound editor designs, which EEG
experts are typically not familiar with.
4. Play controls are implemented as buttons; play state is shown by button color (white
font is active) and by a moving cursor in both soundfile views. The cursor’s location
is also given numerically. Looping and filtering is also controlled by buttons; in looped
mode, a click plays when the loop point is crossed. In filter mode, the volume controls
for the individual bands are enabled. When filtering is off, these controls are disabled for
clarity.
Adjustable playback parameters are all available as named sliders, with the exact numer-
ical values and units. (Recommended presets for different usage situations were planned,
but not realised eventually.)
5. The current state can be documented with buttons and shortcuts: The ’Take Notes’
button opens a text window, which contains the current filename; the current time and
playback settings can be pasted into it, so they can be reconstructed later.
6. The ’Help’ button opens a detailed help page in German.
The EEG Realtime Player re-uses this design with minimal extensions, as shown in figure
9.5; this reduces learning time for both designs, which are intended for the same group
of users. The main differences are the use of different time units (seconds instead of
minutes) and more parameter controls, as the synthesis concept is more elaborate.
Wahlgesange
This design is described in detail in section 6.2; its GUI is shown in figure 6.5. As this
design follows a Model-Based concept, the realtime interaction mode is central:
Goals:
• compare geographical distribution of voters for ca. 12 parties in four elections in
a region of Austria.
Tasks:
1. switch between a fixed range of elections and parties to explore
2. ’inject energy’ by interaction to excite the model at a visually chosen location
3. compare parties and elections in quick succession
4. adjust free sonification parameters like timescale
1. Choosing which election and party results to explore is done with two groups of
buttons which show all available choices. The currently active button has a white font.
2. As is common in Model-Based Sonification, this design requires much more interaction:
to obtain sound, users must click on the geographical map. This causes a circular wave
to emerge from the given location, which spreads over the entire extent of the map.
Each data point is indexed by spatial location on the map; when the expanding wave
hits it, a sound is played based on its value for the current data channel (voter turnout
for one of the parties).
3. For faster comparisons, switching to a new election or party plays the sonification
for the new choice with the last spatial location; switching between parties can also be
done by typing reasonably mnemonic characters as shortcuts.
4. The free sonification parameters like expansion speed of the wave, number of data
points to play (to reduce spatial range), etc., can be adjusted with sliders which also
show the precise numerical values.
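The interaction-to-sound chain described above can be sketched in a few lines. The following Python sketch is illustrative only: the wave speed, map size, and pitch range are hypothetical values of mine, not those of the actual design:

```python
import math

def excite(districts, click, wave_speed=200.0):
    """One event per district: onset from map distance to the click,
    pitch from the data value (voter percentage), pan from x position."""
    events = []
    for (x, y), percent in districts:
        onset = math.dist((x, y), click) / wave_speed  # seconds until the wave arrives
        pitch = 48 + percent / 100.0 * 36              # 0-100 % onto three octaves (MIDI)
        pan = x / 100.0 * 2 - 1                        # x in [0, 100] onto [-1, 1]
        events.append((round(onset, 3), round(pitch, 1), round(pan, 2)))
    return sorted(events)

# two hypothetical districts on a 100 x 100 map, excited by a click at the center
print(excite([((10, 50), 25.0), ((90, 50), 50.0)], (50, 50)))
```

Lowering the wave speed stretches the onsets apart in time, which is exactly the kind of tradeoff the adjustable timescale parameter exposes.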
Full explanations are given in a long documentation section before the program code,
which was deemed sufficient at the time.
An interesting possible extension here would be the use of a graphical tablet to obtain a
pressure value when clicking on the map; this would be equivalent to a velocity-sensitive
MIDI keyboard. However, in the interest of easier transfer to other users, we preferred
to keep the design independent of specific non-standard input devices.
5.5 Spatialisation Model
The most immediate quality of a sound event is its localization: What direction did
that sound come from? Is it near or far away? We often spontaneously turn toward an
unexpected sound, even if we were not paying attention earlier.
Spatial direction is also one of the stronger cues for stream separation or fusion (Breg-
man (1990), Snyder (2000), Moore (2004)); when sound events come from different
directions, they are unlikely to be attributed to the same physical source.
Music technology has developed a variety of solutions for spatialising synthesized sound,
and both SuperCollider3 and the SonEnvir software environment support multiple ap-
proaches for different source characteristics, and different reproduction setups.
Sources can either be continuous or short-term single events; while continuous sources
may have fixed or moving spatial positions, streams of individual events may have dif-
ferent spatial positions for each event. In effect, giving each individual sound event its
own static position in space is a granular approach to spatialisation.
(1D) Stereo rendering over loudspeakers works well for few parallel streams, where
spatial location mainly serves to identify and disambiguate streams. The most common
spatialisation method employed is amplitude panning, which relies on the illusion of
phantom sound sources created between a pair of loudspeakers, with the perceived
position depending on the ratio of signal levels between the two speakers. Panorama
potentiometers (pan pots) on mixing desks employ this method. Sound localisation on
such setups is of course compromised at listening positions outside the sweet spot.
(2D) Few channel rendering is typically done with horizontal rings of 4 - 8 speakers.
This has become easier in recent years with 5.1 (by now, up to 7.1) home audio
systems, which can be used with external input from multichannel audio interfaces. Such
systems can spatialize sources on the horizontal plane quite well, and their speakers can
also serve as up to 7 static physical point sources.
(3D) Multichannel systems, such as the CUBE at IEM Graz with 24 speakers, or the An-
imax Multimedia Theater in Bonn with 40 speakers, are usually designed for symmetry,
spreading a number of loudspeakers reasonably evenly on the surface of a sphere. This
allows for good localisation of sources on the sphere, with common spatialisation ap-
proaches including vector based panning, Ambisonics, and Wave Field Synthesis. Source
distances outside the sphere can be simulated well by reducing the level of the direct
sound relative to the reverb signal, and lowpass filtering it.
(1D/3D) Headphones are a special case: they can be used to listen to stereo mixes
for loudspeakers (and most listeners today are well trained at localising sounds with
this kind of spatial information); and they can be used for binaural rendering, i.e. sound
environments that feature the cues which allow for sound localisation in ’normal’ auditory
perception. For music, this may be done with dummy head recordings; for auditory
display, this is done with simulations of these cues applied to all the sound sources
individually to create their spatial characteristics.
5.5.1 Speaker-based sound rendering
Physical sources
For multiple speaker setups, a simple and very effective strategy is to use individual
speakers as real physical sources. The main advantage is that physics really help in this
case; when locations only serve to identify streams, as with few fixed sources, fixed single
speakers work very well.
Amplitude Panning
The most thorough overview on amplitude panning methods is provided in Pulkki (2001).
Note that all of the following methods work for both moving and static sources. Code
examples for all these are given in Appendix B.1.
1D: In the simplest case of panning between two speakers, equal power stereo panning
is the standard method.
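The equal power pan law can be sketched as follows; a minimal Python illustration of the cos/sin gain curves (not the SuperCollider implementation). The squared gains always sum to one, so perceived loudness stays constant across pan positions:

```python
import math

def equal_power_pan(sig, pos):
    """pos in [-1, 1]: -1 = hard left, 0 = centre, 1 = hard right.
    Gains follow cos/sin over a quarter circle, so gl^2 + gr^2 == 1
    (constant power) at every position."""
    angle = (pos + 1) * math.pi / 4   # map [-1, 1] -> [0, pi/2]
    gl, gr = math.cos(angle), math.sin(angle)
    return sig * gl, sig * gr

l, r = equal_power_pan(1.0, 0.0)   # centre: both gains ~0.707
```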
2D: The most common case here is panning to a horizontal, symmetrical ring of n
speakers by controlling azimuth; in many implementations, the width over how many
speakers (at most) the energy is distributed can be adjusted.
In case the angles along the ring are not symmetrical, adjustments can be made by
remapping, e.g. with a simple breakpoint lookup strategy. However, using the best
geometrical symmetry attainable is always superior to compensation for asymmetries.
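Such a breakpoint lookup can be sketched as piecewise-linear remapping of nominal to actual speaker angles. A Python illustration with a hypothetical asymmetrical ring (the angle values are invented for the example):

```python
def remap(x, breakpoints):
    """Piecewise-linear lookup: breakpoints is a sorted list of
    (nominal_angle, actual_angle) pairs."""
    for (x0, y0), (x1, y1) in zip(breakpoints, breakpoints[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("angle outside breakpoint range")

# Hypothetical ring where the side speakers sit at +/-100 degrees
# instead of the nominal +/-90:
table = [(0, 0), (90, 100), (180, 180)]
remap(45, table)   # nominal 45 degrees maps to a slightly widened angle
```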
Often it is necessary to mix multiple single-channel sources down to stereo: The most
common technique for this is to create an array of pan positions (e.g. n steps from 80%
left to 80% right), to pan every single channel to its own stereo position, and to sum
these stereo signals.
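The downmix strategy just described can be sketched as follows; an illustrative Python version that operates on one sample per channel (a real implementation processes sample blocks, and the 80% width is just the example value from above):

```python
import math

def spread_to_stereo(channels, width=0.8):
    """Pan n mono channels to equally spaced stereo positions from
    -width to +width (equal power law) and sum the stereo signals."""
    n = len(channels)
    positions = [(-width + 2 * width * i / (n - 1)) if n > 1 else 0.0
                 for i in range(n)]
    left = right = 0.0
    for sig, pos in zip(channels, positions):
        a = (pos + 1) * math.pi / 4
        left += sig * math.cos(a)
        right += sig * math.sin(a)
    return left, right
```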
Mixing multiple channel sources into a ring of speakers can be done the same way; the
array of positions then corresponds to (potentially compensated) equal angular distances
around the ring. Both larger numbers of channels can be panned into rings of fewer
speakers, and vice versa.
3D: For simple geometrical arrangements of speakers, straightforward extensions of am-
plitude panning will suffice. E.g. the CUBE setup at IEM consists of rings of 12,
8, and 4 speakers (bottom, middle, top); the setup at Animax Multimedia Theater in
Bonn adds a bottom ring of 16 speakers. For these systems, having 2 panning axes, one
between the rings for elevation, and one for azimuth in each ring, works well.
Again, the speaker setup should be as symmetrical as possible; compensation can be
trickier here. Generally speaking, even while compensations for less symmetrical se-
tups are mathematically plausible, spatial images will be worse outside the sweet spot.
Maximum attainable physical symmetry cannot be fully substituted by more DSP math.
Compensating overall vertical ring angles and individual horizontal speaker angles within
each ring is straightforward with the remapping method described above. For placement
deviations that are both horizontal and vertical, using fuller implementations of Vector
Based Amplitude Panning (VBAP, see e.g. Pulkki (2001)) is recommended24; however,
this was not required within the context of the SonEnvir project, or this dissertation.
Ambisonics
Ambisonics is a multichannel reproduction system developed independently by several
researchers in the 1970s (Cooper and Shiga (1972); Gerzon (1977a,b)), based on the idea
that spherical harmonics can be used to encode and decode the directions from which
sound energy arrives; a good basic introduction to Ambisonics math is online here25.
24 VBAP has been implemented for SC3 in 2007 by Scott Wilson and colleagues, see http://scottwilson.ca/site/Software.html
25 http://www.york.ac.uk/inst/mustech/3d audio/ambis2.htm
The simplest form of Ambisonics, first order, can be considered an extension of the
classic Blumlein MS stereo microphone technique: in MS, one uses an omnidirectional
microphone as a center channel (M for Mid), and a figure-of-8 mike to create a Side
signal (S). By adding or subtracting the side signal from the center, one obtains Left
and Right signals, e.g. L = M+S, R = M-S. By using figure-of-8 mikes for Left/Right,
Front/Back, and Top/Bottom signals, one obtains a first order Ambisonic microphone,
such as those made by the Soundfield company26. The channels are conventionally
named W, X, Y, Z. Such an encoded recording can be decoded simply for speaker
positions on a sphere.
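The MS sum/difference relations are easy to verify in code. An illustrative Python sketch, assuming the convention that the positive lobe of the S figure-of-8 points left (the opposite sign convention swaps L and R):

```python
def ms_decode(m, s):
    """Mid/Side to Left/Right: sum and difference of the omni (M)
    and figure-of-8 (S) signals."""
    return m + s, m - s

def ms_encode(l, r):
    """Inverse: M = (L+R)/2, S = (L-R)/2, so decoding recovers L and R."""
    return (l + r) / 2, (l - r) / 2
```

Applying both in sequence returns the original stereo samples, which is why MS encoding is lossless and freely convertible to and from L/R.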
In the 1990s, the mathematics for 2nd and 3rd order Ambisonics were developed to
achieve increasingly higher spatial resolution; these are formulated in Malham (1999),
and also available online here27.
Extensions to even higher orders were realised recently by IEM researchers (Musil et al.
(2005); Noisternig et al. (2003)), with multiple DSP optimizations implemented as a
PureData library. Using MATLAB tools written by Thomas Musil, coefficients for encod-
ing/decoding matrices for different speaker combinations and tradeoff choices can be
calculated offline, and can then simply be read in from text files in the realtime platform
of choice. The most complex use of this library so far has been the VARESE system
(Zouhar et al. (2005)). This is a dynamic recreation of the acoustics of the Philips pavil-
ion at Brussels World Fair, for which Edgard Varese’s Poeme Electronique (and Iannis
Xenakis’ concrete PH) was composed.
While some Ambisonics UGens previously existed in SuperCollider, the SonEnvir team
decided to write a consistent new implementation of Ambisonics in SC3, based on a
subset of the existing PureData libraries. This package was realised up to third order
Ambisonics by Christopher Frauenberger for the AmbIEM package, available here28.
It supports the main speaker setup of interest, the IEM Cube, as well as a setup for
headphone rendering as described below.
5.5.2 Headphones
For practical reasons, such as when working in one room with colleagues, scientists
experimenting with sonifications are often required to use headphones. Many standard techniques
work well for lateralising sounds, which can be entirely sufficient for making streams
segregate or fuse as desired. In order to achieve perceptually credible simulations of
auditory cues for full localisation, for example, making sounds appear to come from the
front, or above, more complex approaches are needed; the most common approach is to
model the cues by means of which the human ear determines sound location.
26 http://www.soundfield.com
27 http://www.york.ac.uk/inst/mustech/3d audio/secondor.html
28 http://quarks.svn.sourceforge.net/viewvc/quarks/AmbIEM/
Sound localisation in human hearing depends on the differences between the sound heard
in the left and right ears; in principle, three kinds of cues are involved:
Interaural Level Difference (ILD), which is the level difference of a sound source between
the ears, dependent on the source’s direction. This can roughly be simulated with
amplitude panning, which is however limited to left/right distinction in headphones
(usually called lateralisation). Being so similar to amplitude panning, it is fully compatible
with stereo speaker setups.
Interaural Time Difference (ITD), the difference in arrival time of a sound between the
ears. This is at most about 0.6 msec: at a speed of sound of 340 m/sec, this is the
travel time over a typical ear distance of 21 cm. This can be simulated well
for headphones; but because delay panning does not transfer reliably for speakers (one
hardly ever sits exactly on the equidistance symmetry axis of one’s loudspeaker pair), it
is hardly used. Like amplitude panning, delay panning only creates lateralisation cues.
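The maximum ITD quoted above follows directly from path length over speed of sound. A Python sketch using the simplest sine-law approximation (not a full spherical-head model; the constants are the values from the text):

```python
import math

SPEED_OF_SOUND = 340.0  # m/s
EAR_DISTANCE = 0.21     # m, typical

def itd(azimuth_deg):
    """Simple approximation: the interaural time difference is the extra
    path length d * sin(azimuth) divided by the speed of sound."""
    return EAR_DISTANCE * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND

itd(90)   # fully lateral source: ~0.62 msec, the maximum mentioned above
```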
Head Related Transfer Functions - HRTF / HRIR
Head Related Transfer Functions (HRTFs) or equivalently, Head Related Impulse Re-
sponses (HRIRs) capture the fact that both ITD and ILD are frequency-dependent: For
every direction of sound incidence, the sound arriving at each ear is colored by reflections
on the human pinna, head, and upper torso; such pairs of filters are quite characteristic
for the particular direction they correspond to. Roughly speaking, localising a heard
sound depends on extracting the effect of the pair of filters that colored it, and inferring
the corresponding direction from the characteristics of this pair of filters; obviously, this
works more reliably on known sources.
HRTFs/HRIRs can be measured by recording known sounds from a set of directions with
miniature microphones at the ear, and extracting the effect of the filters. Obviously,
HRTF filters are different for every person (as are people’s ears and heads), and every
person is completely accustomed to decoding sound directions from her own HRTFs.
Thus, there is no miracle HRTF curve that works perfectly for everyone; however, because
some features in HRTFs are generalizable (such as the directional bands described in
Blauert (1997)), the idea of using HRTFs to simulate sounds coming from different
directions has become quite popular. The KEMAR set of HRIRs (see Gardner and
Martin (1994); the data are available online here29) is based on recordings made with
a dummy head, and is considered to work reasonably well for different listeners. The
IRCAM has also published individual HRIRs of ca. 50 people for the LISTEN project
(Warusfel (2003), online here30), so one can try to find matches to suit a particular
person’s preferences well.
29 http://sound.media.mit.edu/resources/KEMAR/full.tar.Z
30 http://recherche.ircam.fr/equipes/salles/listen/
Implementing fixed HRIRs for fixed source locations is straightforward, as one only needs
to convolve the sound source with one pair of HRIRs. However, this is not sufficient:
static angles tend to sound like colouration (as caused by inferior audio equipment); in
everyday life, we usually move our heads slightly, creating small changes in ITD, ILD
and HRTF which quickly disambiguate any localisation uncertainties. Thus, creating
convincing moving sources with HRTF spatialisation is required, which is not trivial: as
a source’s position changes, its impulse responses must be updated quickly and smoothly.
There is no generally accepted scheme for efficient high-quality HRIR interpolation, and
convolving every source separately is computationally expensive.
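Static binaural rendering as described boils down to one convolution per ear. A direct-form Python sketch for illustration only; real implementations use FFT-based (partitioned) convolution for efficiency, and the HRIR values here are placeholders, not measured data:

```python
def convolve(signal, ir):
    """Direct-form FIR convolution; output length = len(signal) + len(ir) - 1."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out

def binaural_static(signal, hrir_left, hrir_right):
    """Render a mono source at one fixed direction by filtering it with
    the HRIR pair measured for that direction."""
    return convolve(signal, hrir_left), convolve(signal, hrir_right)
```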
Ambisonics and Virtual Binaural Rendering
For complex changing scenes, the IEM has developed a very efficient approach for bin-
aural rendering (Musil et al. (2005); Noisternig et al. (2003)): In effect, taking a virtual,
symmetrical speaker setup, and spatializing to that setup with Ambisonics; then render-
ing these virtual speakers as point sources with their appropriate HRIRs, thus arriving at
a binaural rendering. This provides the benefit that the Ambisonic field can be rotated
as a whole, which is really useful when head movements of the listener are tracked, and
the binaural rendering is designed to compensate for them. Also, the known problems
with Ambisonics when listeners move outside the sweet zone disappear; when one carries
a setup of virtual speakers around one’s head, one is always right in the center of the
sweet zone. This approach has been ported to SC3 by C. Frauenberger; its main use is in
the VirtualRoom class, which simulates moving sources within a rectangular box-shaped
room. This class is especially useful for preparing spatialisation with multi-speaker setups
by headphone simulation.
Among other things, the submissions for the ICAD 2006 concert31 (described also in
section 4.3) were rendered from 8 channels to binaural for the reviewers, and for the web
documentation32.
One can of course also spatialize sounds on the virtual speakers by any of the simpler
panning strategies given above as well; this trades off easy rotation of the entire setup
for better point source localisation.
To support simple headtracking, C. Frauenberger also created the ARHeadTracker ap-
plication, also available as a SuperCollider3 Quark.
31 http://www.dcs.qmul.ac.uk/research/imc/icad2006/concert.php
32 http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/concert/index.html
5.5.3 Handling speaker imperfections
All standard spatialisation techniques work best when speaker setups are as symmetrical
and well-controlled as possible. While it may not always be feasible to adjust mechan-
ical positions of speakers freely for very precise geometry, a number of factors can be
measured and compensated for, and this is supported by several utility classes written in
SuperCollider, which are part of the SonEnvir framework.
Latency
The Latency class plays a test signal for a given number of audio channels, and waits for
the signals to arrive back at an audio input. The resulting list of measured per-channel
latencies can be used to create compensating delay lines, e.g. in the SpeakerAdjust class
described below.
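The compensation computed from such measurements is simply a delay bringing every channel up to the slowest one. A Python sketch with hypothetical measured values (the actual Latency class is a SuperCollider utility):

```python
def compensating_delays(measured_latencies):
    """Delay every channel so all of them match the slowest one:
    delay_i = max(latencies) - latency_i."""
    longest = max(measured_latencies)
    return [longest - lat for lat in measured_latencies]

# Hypothetical per-channel round-trip measurements in milliseconds:
compensating_delays([5.0, 7.5, 6.0])
```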
Spectralyzer
While inter-speaker latency differences are well-known and very often addressed, we have
found another common problem to be more distracting for multichannel sonification:
Each individual channel of the reproduction chain, from D/A converter to amplifier,
cable, loudspeaker, and speaker mounting location in the room, can sound quite different.
When changes in sound timbre can encode meaning, this is potentially really confusing!
To address this, the Spectralyzer class allows for simple analysis of a test signal as
played into a room, with optional smoothing over several measurements, and then tuning
compensating equalizers by hand for reasonable similarity across all speaker channels.
SpeakerAdjust
Once one has achieved usable EQ curves for every speaker channel, one can begin
to compensate for volume differences between channels (with big timbral differences
between channels, measuring volume or adjusting it by listening is rather pointless).
The SpeakerAdjust class expects specifications for relative amplitude, (optionally) delay
time, and (optionally) as many parametric EQ bands as needed for each channel. Thus, a
speaker adjustment can be created that runs at the end of the signal chain and linearizes
the given speaker setup as much as possible; of course, adding limiters for speaker and
listener protection can be built into such a master effects unit as well.
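The per-channel adjustment chain can be sketched as gain plus a compensating sample delay (the parametric EQ bands are omitted here). This Python sketch only illustrates the signal flow of such a master unit, not the actual SpeakerAdjust implementation:

```python
def adjust_channel(samples, gain, delay_samples):
    """Apply per-channel amplitude and a compensating sample delay;
    EQ bands would be additional filters inserted in the same chain."""
    return [0.0] * delay_samples + [s * gain for s in samples]

def speaker_adjust(channels, specs):
    """specs: one (gain, delay_samples) pair per channel, as measured
    e.g. with latency and spectrum analysis utilities."""
    return [adjust_channel(ch, g, d) for ch, (g, d) in zip(channels, specs)]
```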
Chapter 6
Examples from Sociology
Though sociology was considered a promising field of application early on (Kramer
(1994a)), sonification is to date not widely known within the social sciences. Thus, one
purpose of collaborating with sociologists was to raise awareness of the potential
benefits sonification can bring to social research.
Three sonification designs and their research context are described and analysed as case
studies here: the FRR Log Player, Wahlgesänge (election/whale songs), and the Social
Data Explorer.
Social (or sociological) data generally show characteristics that make them promising
for sonification: They are multi-dimensional, and they usually depict complex relations
and interdependencies (de Campo and Egger de Campo (1999)). We consider the
application of sonification to data depicting historical (or geographical) sequences as
the most promising area within the social sciences. The fact that sound is inherently
time-bound is an advantage here, because sequential information can be conveyed very
directly by mapping the sequences on the implicit time axis of the sonification.
In fact, social researchers are very often interested in events or actions in their temporal
context. The importance of developmental questions is even growing due to the glob-
alized notion of social change. Sequence analysis, the field methodologically concerned
with these kinds of questions, assembles methodologies that are by now rather estab-
lished, like event history analysis, and appropriate techniques to model causal relations
over time (Abbott (1990, 1995); Blossfeld et al. (1986); Blossfeld and Rohwer (1995)).
Like most methods of quantitative (multivariate) data analysis, sequence analysis meth-
ods need to be based on an exploratory phase. The quality of the analysis process as
a whole depends critically on the outcome of this exploratory phase. As the amount of
social data is continuously increasing, effective exploratory methods are needed to screen
these data. On higher aggregation levels (such as global, or UN member states level),
social data have both a time (e. g. year) and a space dimension (e. g. nation) and thus
can be understood both as time and geographical sequences. The use of sonification to
explore data of social sequences was the main focus of the sociological part within the
SonEnvir project.
6.1 FRR Log Player
An earlier stage of this work was described in detail in a poster for ICAD 2005 (Daye
et al. (2005)); it is briefly documented in the SonEnvir sonification data collection here1,
and the full code example is available from the SonEnvir code repository here2.
Researchers in social fields, be they sociologists, psychologists or design researchers,
sometimes face the problem of studying actions in an area which is not observable for
ethical reasons. This was especially true in the context of the RTD project Friendly
Rest Room FRR3 (see Panek et al. (2005)), which was partly funded by the European
Commission. The project’s aim was to develop an easy to use toilet for older persons,
and persons with (physical) disabilities. In order to meet that objective, an interdisci-
plinary consortium was set up, bringing together specialists of various backgrounds like
industrial design, technical engineering, software engineering, user representation, and
social scientists.
In the final stage of the FRR project, a prototype of this toilet was installed at a day
care center for patients with multiple sclerosis (MS) in Vienna, in order to validate the
design concept in daily life use. The sonification design described here was intended
for sonifying the log data gathered during this validation phase, because difficulties had
arisen with these analyses. Since observational data could not be gathered for ethical
reasons, the log data are the only way to understand the actions taken by the user. The
FRR researchers are interested in these data because they provide information on the
user’s interaction with the given technical equipment, and thus on the usability and
everyday usefulness of the toilet system.
6.1.1 Technical background
The guests of this day care center are patients with varying degrees of Multiple Sclerosis
(MS); some need support from nurses when using the toilet while others can use it
independently. Due to security considerations as well as for pragmatic reasons, not
all components developed within the FRR-project were selected for this field test (see
Panek et al. (2005)). The main features of the installed conceptual prototype are:
• Actuators to change the height of the toilet seat, ranging from 40 to 70 cm.
1 http://sonenvir.at/data/logdata1/
2 https://svn.sonenvir.at/svnroot/SonEnvir/trunk/src/Prototypes/Soziologie/FRR Logs/
3 http://www.is.tuwien.ac.at/fortec/reha.e/projects/frr/frr.html
• Actuators to change the seat’s tilt, ranging from 0 to 7 degrees forward/down.
• Six buttons on a hand-held remote control to use these actuators: toilet up, toilet
down, tilt up, tilt down, as well as flush and alarm triggers.
• Two horizontal support bars next to the toilet that can be folded up manually.
• A door handle of a new type which is easier to use for people with physical dis-
abilities was mounted on the outside of the entrance door.
Figure 6.1: The toilet prototype system used for the FRR field test.
Left to right: the door with specially designed handle, the toilet prototype as installed at the
day center, and an illustration of the tilt and height changing functionality.
As direct observation of the users’ interaction with the toilet system was out of the
question, sensors were installed in the toilet area that continuously logged the current
status of the toilet area. These sensors recorded:
• the height of the toilet seat (in cm, one variable),
• the tilt of the toilet seat (in degrees, one variable),
• the status of the remote control buttons (pressed/not pressed, six variables),
• the status of the entrance door (open/not open, one variable); and,
• the presence of RFID tagged smart cards (RFID mid range technology) near the
toilet seat to identify any persons present. The guests and the employees of the
day care center were provided with such smart cards, and an RFID module in the
toilet area registered the identities of up to four cards simultaneously.
The log data matrix recorded from these sensor data is quite unusual for sociological
data, due to its time resolution of about 0.1 sec maximum (which is high for social data),
and the sequential properties of the information captured by the data. One log entry
consists of about 25 variables, of which 11 are relevant for our analysis: A timestamp
for when an entry was logged, and the ten variables described above. Of these eleven
variables, seven are binary. Each log file records the events of one day. In case there is
no event for a longer time (e.g. during the night), a ’watchdog’ in the logging software
creates a blank event every 18 minutes to show the system is still on.
In order to use these log files to understand what the users did, we needed to recon-
struct sequences of actions of a user based on the events registered by the sensors. The
technical data had to be interpreted in terms of users’ interaction with the equipment;
otherwise the toilet prototype could not be evaluated. The technical data themselves are
not sufficient for a validation, as we need to validate whether or not the proposed techni-
cal solution results in an improvement of the users’ quality of life, which is the eventual
social phenomenon of interest here. Due to the sequential nature of the information
contained in the log files, established routines from multivariate statistics could not be
applied, as they usually do not account for the fundamentally different nature of data
composed of events in temporal sequence.
6.1.2 Analysis steps
Graphical Screening
On a graphical display (which is what the FRR researchers used), it is not at all easy to
follow the sequential order of the events, above all because such a sequence consists of
several variables. Yet, as the first step of analysis, we relied on graphs with the purpose
of identifying episodes. An episode in our context is defined by a single user’s visit to
the toilet. A prototypical minimum episode consists of the following logged events:
door open
door close
tilt down (multiple events)
tilt up (multiple events)
button flush
door open
Note that, in this specific episode, the height and the tilt of the toilet bowl are adjusted
via remote control by the user. Still, this episode is a very simple chain of events. Most
of the logged events for tilt down and tilt up result only from the weight of the person
sitting on the toilet seat.
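The chunking of the log into episodes can be sketched as splitting the event stream at long pauses in activity. A Python illustration with hypothetical timestamps and an invented gap threshold; in the actual analysis the episode boundaries were identified visually on graphical displays:

```python
def split_episodes(events, gap_seconds=600):
    """Chunk a time-ordered list of (timestamp, name) log events into
    episodes wherever activity pauses for longer than gap_seconds."""
    episodes, current = [], []
    last_t = None
    for t, name in events:
        if last_t is not None and t - last_t > gap_seconds:
            episodes.append(current)
            current = []
        current.append((t, name))
        last_t = t
    if current:
        episodes.append(current)
    return episodes

# Hypothetical log: one visit, then a long pause, then another visit
log = [(10, "door open"), (12, "door close"), (95, "flush"),
       (100, "door open"), (4000, "door open"), (4100, "door close")]
split_episodes(log)   # splits at the pause between 100 and 4000
```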
Figure 6.2: Graphical display of one usage episode (Excel).
The first step in analysing the data material was to use graphical displays to look for
sections that could be identified as one user’s visit to the toilet prototype, and to chunk
the data into such episodes, which formed our new entities of analysis. The episode
displayed graphically in figure 6.2 is an example of a very simple, single episode. It is
obvious that the graph is not easy to interpret due to its complexity (possibly additionally
complicated on black and white printouts). The sequential character of the events can
be read visually, if not very comfortably: One can see that the starting event is that the
door opens, and then closes; followed by the event that the toilet bowl tilts forward (the
tilting degree grows). We can assume that the person is now sitting on the toilet. Then
the height is adjusted, and the tilt as well. After the tilt returns to a lower value (we can
assume the weight has been removed, so we can infer that the person has stood up),
the flush button is pressed, and the door opens and closes again. The other variables
remain unchanged.
Investigating patterns of use
The FRR researchers were not interested primarily in the way a single person behaves in
the Friendly Rest Room, but rather whether different groups of people would be found
who, for instance due to similar physical limitations, show similarities in interacting with
the given technical equipment. Such typical action patterns of various user groups
are interesting to cross-reference with data from other sources: Characteristics like sex,
weight, age of a person, her/his physical and cognitive limitations, additional information
like whether s/he is using a wheelchair, or crutches, are important to deepen the
interpretation and allow for causal inferences. For this purpose, an identification module
was mounted behind the cover of the water container of the FRR prototype, which was
intended to recognize users wearing RFID tags.
To give just one example how user identification can help: usually, people who use
wheelchairs need more time than non-wheelchair users to open a door, enter the room
and close the door again. This is partly because of the need to manoeuvre around the
door when sitting in a wheelchair, but mainly because standard door handles are hard to use,
especially when, as is the case with MS, people have restricted mobility in their arms.
Thus, if an analysis shows that the time needed by wheelchair users to enter the room is
on average shorter than with a standard door, one can conclude that the FRR-designed
door handle is a usability improvement for wheelchair users.
Similarly, one can identify further patterns of use and possibly relate them to user char-
acteristics as mentioned above. However, these patterns are not only important for the
evaluation of the equipment, but also for figuring out user IDs that were accidentally
not recorded.
Comparing anonymous episodes with patterns
Unfortunately, RFID tag recognition only worked within a range of about 50cm around
the toilet, and so not every person using the toilet was identified. Thus there are
anonymous episodes which cannot be related to personal data from other sources. From
a heuristic perspective, these anonymous data are nearly useless. As this applies to
53% of the 316 episodes, this was a serious concern for the validity of the results.
Thus it was decided to study the episodes of identified users in order to find patterns that
may allow for eventual identification of anonymous episodes. For some of the anonymous
episodes, direct identification was possible. For others, most likely for users who did not
use the prototype often, we could rely on conjecture based on what we could derive from
the episodes of identified users. By comparing them with the patterns identified in step 2,
we could still make use of the ’anonymous’ episodes: we analysed them by approaching
the problem with empirically found categories.
6.1.3 Sonification design
The repertoire of sounds for the FRR Log player sonification design is:
• Door state (open or closed) is represented by coloured noise similar to diffuse
ambient outside noise; this noise plays when open and fades out when closed.
• Button presses on the remote control for height and tilt change (up or down) play
short glissando events, up or down, identified for height or tilt by different basic
pitch and timbre.
• Alarm button presses are rendered by a doorbell-like sound - this button is mostly
used to inform nurses that assistance is needed; use for emergency is rare.
• Flush button presses are represented with a decaying noise burst.
• Continuous state of height and tilt are both represented as soft background drones;
when their values change, they move to the foreground, and when their values are
static, they recede into the background again.
This design mixes discrete-event sonification (marker sounds for the button presses) and
continuous sonification (tilt and height).
Figure 6.3: FRR Log Player GUI and sounds mixer.
6.1.4 Interface design
The graphical user interface shown in 6.3 provides visual feedback, and possibilities for
interaction:
A button allows for selection of different episode files to sonify; it shows the filename
when a file has been selected. If a user ID tag has been recorded in the log, that is shown.
For playing the sequence, Start/Stop buttons and a speed control are provided. Speed
is the most useful control, as different patterns may appear on different timescales. A
mixer for the levels of all the sound components is provided, and a window for tuning
the details of each sound can be called up from a button (”px mixer”). This ProxyMixer window
allows for storing all tuning details as code scripts, so that useful settings can be fully
documented, communicated and reproduced.
The binary state variables are all visually displayed as buttons, and allow for triggering
from the GUI: the button for the door, and the buttons for the remote control buttons, turn
red when activated in the log. When they are pressed on the GUI, they play their corresponding
sound, so users can learn the repertoire of symbolic sounds very quickly.
The continuous variables are all displayed: time within the log as hours:minutes:seconds;
height and tilt of the seat as labeled numbers and as a line with variable height and tilt.
Finally, the last 5 and the next 5 events in the log are shown as text; this was very useful
for debugging, and it provided an extra layer of available information to the users of the
sonification design.
6.1.5 Evaluation for the research context
For the research context these data came from, this sonification design was successful:
It represented time sequence data with several parallel streams of parameters and events
so that these could be detected efficiently, and it was straightforward to learn and use.
The researchers reported being able to use rather high speedups, and being able to
achieve good recognition of different user categories. In fact, the time scaling was
essential for understanding the meaning behind the sequential order and timing of events.
Especially the times between events, the breaks, were instructive, as they can point
to problems the user had with the equipment under evaluation.
In short, the sonification design solved the task at hand more efficiently than the other
tools previously used by the researchers.
6.1.6 Evaluation in SDSM terms
Within the subset of 30 episodes used for design development (out of 316), the longest
is 1660 lines, and covers 32 minutes, while the shortest ones are ca. 180 lines, and 5
minutes. The SDS Map shows data anchors for this variety of files, and marks for three
different playback speeds of these two example files: original speed (x1), and speedups of x10 and
x100. At lower speeds, one can leave the continuous sounds (tilt and height) on, while
at high speedups, the rendering is clearer without them. The 8 (or 6 at higher speeds)
data properties used are actually rendered technically as parallel streams; whether they
are perceived as such is a question of the episode under study, the listening attitude, and
the playback speed. For example, one could listen to each button sound individually, but
usually the timed sequence of button presses would be heard as one stream of related,
but disparate sound events.
Figure 6.4: SDS Map for the FRR Log Player.
While this design was created before the SDSM concept existed, it conforms to all basic
SDSM recommendations, as well as secondary guidelines.
Time is represented as time; time scaling as the central SDSM navigation strategy is
available as a user interaction parameter. Thus, users can experiment freely with different
time scales to bring different event patterns into focus; in SDSM terms, the expected
’gestalt number’ (here, the data time rescaling) can be adjusted to fit into the echoic
memory time frame.
This is supported here by adaptive time-scaling of the sound events: as time is sped up,
sound durations shorten by the square root of the speedup factor (see below). Recorded
(binary) events in time are represented by simple, easily recognized marker sounds; they
either sound similar to the original events (the flush button, alarm bell), or they employ
straightforward metaphors consistently (glissando up is up both for tilt or height), thus
minimizing learning load.
Continuous state is represented by a background drone, which becomes louder when
changes happen; this jumping to the foreground exploits the natural listening behavior
of ’tuning out’ constant background sounds, and being alerted when the soundscape
changes. For higher speedups, researchers reported that they often turned these com-
ponents off completely, so the option to let users do that quickly was useful.
The time scaling of marker sounds is handled in a way that can be recommended for
re-use: Constant sound durations create too much overlap at higher speeds, while pro-
portional scaling to the speedup factor deforms the symbolic marker sounds too much
for easy recognition. So, the strategy invented for this sonification was to scale the
durations of the marker sounds by 1/(timeScale^scaleExp), with scaleExp values between
0.0 (no duration scaling) and 1.0 (fully matching the sequence time scaling). For the time
scaling range desired here, 1 to 100, scaling sound durations by the power of 0.5, i.e.
the square root, has turned out to work well: the sounds are still easily recognized as
transformations of their original type, and one can still follow dense sequences well.
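The duration scaling rule can be stated compactly (an illustrative Python sketch; the original design is written in SuperCollider):

```python
# Marker-sound duration scaling as used in the FRR log player:
# durations shrink by 1 / timeScale**scaleExp, with scaleExp = 0.5
# (the square root) found to work well for speedups between x1 and x100.

def scaled_duration(base_dur, time_scale, scale_exp=0.5):
    """base_dur: marker duration at original speed, in seconds.
    time_scale: playback speedup factor (1 = original speed).
    scale_exp: 0.0 keeps durations fixed, 1.0 fully matches the time scaling."""
    return base_dur / (time_scale ** scale_exp)
```

At a x100 speedup with the square-root setting, a 0.5 s marker becomes 0.05 s: ten times shorter rather than a hundred times, so it stays recognizable while dense sequences remain followable.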
6.2 ’Wahlgesänge’ - ’Election Songs’
This work is also described in de Campo et al. (2006a), and in the SonEnvir data
collection [4]. The SC3 code for running this design can be downloaded from the
SonEnvir svn repository [5]. It was designed by Christian Dayé and the author, and
it is based on an example for the Model-Based Sonification concept called ’Sonogram’
described by Hermann (2002); Hermann and Ritter (1999) (not to be confused with
standard spectrograms or medical ultrasound-based imaging). The code example by Till
Bovermann is available online [6].
With the sonification design presented here, we can explore geographical sequences. As
a straightforward and familiar example for social data with geographical distributions,
we use election results; in particular, from the Austrian province of Styria, for provincial
parliament elections in 2000 and 2005, and the national parliament election in 2006 [7].
Our interest focused on displaying social data both in their geographical distribution, and
at a higher spatial resolution than usual. Whereas most common displays of social data
focus on the level of districts (here, 17), we wanted to design a sonification that displays
spatial distances and similarities in the election results among neighboring communities.
The mental model is that of a journey through Styria. A journey can be defined as the
transformation of a spatial distribution into a time distribution. A traveler who starts at
[4] http://sonenvir.at/data/wahlgesaenge/
[5] https://svn.sonenvir.at/svnroot/SonEnvir/trunk/src/Prototypes/Soziologie/ElectionsDistMap/
[6] http://www.techfak.uni-bielefeld.de/~tboverma/sc/tgz/MBS_Sonogram.tgz
[7] Styria is one of nine federal states in Austria. It consists of 542 communities grouped in 17 districts,
and about 1 190 000 people live there. In autumn 2005, more than 700 000 Styrian voters elected their
political representatives. The result of this election was politically remarkable: the ruling conservative
party ÖVP (Österreichische Volkspartei: Austrian People’s Party) was defeated for the first time
since 1945 by the left social-democratic party SPÖ (Sozialdemokratische Partei Österreichs:
Social-Democratic Party of Austria).
community A passes first the neighbouring communities, and the longer she is on the
way the more space is between her and community A. Hence, in this sonification, the
spatial distances between communities are mapped onto the time axis.
6.2.1 Interface and sonification design
The communities are displayed in a two-dimensional window on a computer screen (see
figure 6.5). For each community, the coordinates of the community’s administrative
offices were determined and used as the geographical reference point of the respective
community. The distances as well as the angles within our data thus correspond with
the real distances and angles between the communities’ administrative offices.
Figure 6.5: GUI Window for the Wahlgesänge Design.
The left hand panel allows switching between different election results (and district/community
levels of aggregation), and between the parties to listen to. It also allows tuning some param-
eters of the sonification, and it displays a short description of the closest ten communities.
The maps window shows a map of Styria with the community borders; this map is the clicking
interface.
This sonification design depends strongly on user interaction: like most Model-Based
Sonifications, it needs to be played, like a musical instrument; without user actions,
there is no sound. Clicking the mouse anywhere in the window initiates a circular wave
that spreads in two-dimensional space. The propagation of this wave is shown on the
window by a red circle. When the wave hits a data point, this point begins to sound in
a way that reflects its data properties. In our case, these data properties are the election
results within each community. Thus, the user first hears the data point nearest to
the clicking point, from the proper spatial direction, with pitch being controlled by the
turnout percentage of the currently selected party in that community (high pitch being
high percentage); then the result for the second-nearest community, and so on. The
researcher can select different parties to listen to their results from the election under
study.
Further, the researcher can choose a direction in which to look and listen. In figure 6.5,
this direction is North, indicated by the soft radial line within the circular wave. The
line begins at the point where the researcher has initiated the wave, to provide visual
feedback while listening, and keeping a trace of which initial location the current sound
was generated for. Data points along this line are heard from the front, others are
panned to their appropriate directions. While this sonification was designed for a ring of
twelve speakers surrounding the listener, it can be used with standard stereo equipment
as well: For stereo headphones, one changes to a ring of four, and listens to the front
two channels. Then, data points along the main axis are heard from the center, those
on the left (or right) are panned accordingly, 90 degrees being all the way left or right [8].
The points at more than 90 degrees off axis progressively fade out, and those above 135
degrees off axis are silent.
The GUI provides the following sonification parameter controls:
A distance exponent defines how much the loudness for a single data point decreases
with increasing distance. For 2D spaces, 1/distance is physically correct, but stronger
or weaker weightings are interesting to experiment with.
The velocity of the expanding wave in km/second. The default of 50 km/sec scales the
entire area (when played from the centre) into a synakusis-like time scale of 3 seconds.
Slower or faster speeds can be experimented with to zoom further in or out.
The maximum number of communities (Orte in German) that will be played. Selecting
only the nearest 50 or so data points allows for exploration of smaller areas in more
detail.
The decay time for each individual sound grain. At higher speeds, shorter decays create
less overall overlap, and thus provide more clarity; for smaller sets and slower speeds,
longer decay times allow for more detailed pitch perception and thus higher perceptual
resolution.
[8] Note that for stereo speakers at +-30 degrees, the angles within +-90 degrees are scaled together
to +-30 degrees - which we find preferable to keeping the angles intact and only hearing a 60 degree
’slice’ of all the data points, which could be done by leaving the setting at 12 channels, and only using
the first 2.
The direction in which the wave is looking; in the sound, this determines which direction
will be heard from the front. The direction can be rotated through North, West, South
and East.
For more detailed information, the ten data point locations nearest to the clicked point are
shown on a list view.
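The core trigger logic described above (onset from wave-front arrival, loudness from the distance exponent, panning from the angle) can be sketched as follows (illustrative Python, not the original SC3 code; function and parameter names are made up, only the 50 km/s default and the 1/distance weighting come from the text):

```python
# Sketch of the expanding-wave trigger logic in 'Wahlgesänge'.
import math

def wave_events(click, points, velocity_kms=50.0, dist_exp=1.0, max_orte=542):
    """Return (onset_s, amplitude, angle_deg) per data point, nearest first.
    click and points are (x, y) positions in km; angle 0 means due North."""
    events = []
    for (x, y) in points:
        dx, dy = x - click[0], y - click[1]
        dist = math.hypot(dx, dy)
        onset = dist / velocity_kms               # when the wave front arrives
        amp = 1.0 / max(dist, 1e-6) ** dist_exp   # 1/distance is physically
                                                  # correct in 2D
        angle = math.degrees(math.atan2(dx, dy))  # panning direction
        events.append((onset, amp, angle))
    events.sort(key=lambda e: e[0])               # nearest community first
    return events[:max_orte]
```

With the default 50 km/s, a point 150 km from the click sounds 3 s later, matching the 3-second overview time scale mentioned above; pitch (from the party's turnout percentage) would be set per event at playback time.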
6.2.2 Evaluation
This sonification design is a good tool for outlier analysis. It works rather fast at a low
level of aggregation (communities), and outliers are easily identified by tones that are
higher than their surroundings. Typically, these are local outliers: in an area that has a
local average value of say 30%, you can hear a 40% result ’sticking out’; when analysing
the entire dataset statistically, this may not show up as an outlier.
A second strong feature is the ability to get a quick impression of distributions of a data
dimension with their spatial order intact, so achieving the tricky task of developing an
intuitive grasp of the details of one’s data becomes more likely.
This sonification design is not restricted to election data: Other social indicators that are
assessed at the community level (unemployment rates, labor force participation rate of
women, and others) can be included. To represent them in conjunction with e.g. election
results promotes the investigation of local dependencies that might be hidden by higher
aggregation levels or by the mathematical operations of correlation coefficients.
Finally, this sonification design is of course not restricted to the geographical borders of
Styria. It can be used as an exploratory tool enabling researchers to quickly scan social
data in their geographical distribution, at different aggregation levels. Given an interesting
question to address at such higher levels, an adaptation to different geographical
scales, e.g. European and global data distributions, is straightforward, for instance with
nations as the aggregation entity.
When considered from an SDSM perspective, this sonification design respects a number
of SDSM recommendations: It shows the important role of interaction design, while the
sound aspect of the sonification design itself remains rather basic. It also shows the
central importance of time scaling/zooming between overview and details; in fact this
design was the source for recommending this particular time-scaling strategy within the
SDSM concept. The design also demonstrates metaphorical simplicity recommended by
SDSM. An SDSM graph shows that the sonification can render one data property of
the entire set within echoic memory time frame, and zoom into more detail by selecting
subsets, or by slowing down the propagation speed.
Figure 6.6: SDS-Map for Wahlgesänge.
6.3 Social Data Explorer
6.3.1 Background
This sonification design is a study for mapping geographically distributed multidimen-
sional social data to a multiparametric sonification, i.e. a classical parameter mapping
sonification. It offers a number of interaction possibilities, so that sociologists (the
intended user group) can experiment with changing the mappings freely. This serves
both for learning sonification concepts by experimentation and for finding interesting
mappings, for instance, mappings that confirm known correlations between parameters.
The example data file contains the distribution of the working population of all 542
communities in Styria by sectors of economic activities, given in table 6.1.
This data file is quite typical for geographically distributed social data.
6.3.2 Interaction design
A number of interactions can be accessed from the user interface shown in figure 6.7:
’Order’ allows sorting by a chosen parameter (alphabetically or numerically); ’up’ is
ascending, ’dn’ is descending. The number-box is for choosing one data item to inspect
by index in the sorted data, so e.g. 0 is the first data point of the current sorted order.
Table 6.1: Sectors of economic activities
Agrarian, Wood-, and Fishing Industries
Mining
Production of commodities
Energy and Water Industries
Construction
Trade
Hotel and Restaurant Trade
Traffic and Communication
Credit and Insurance
Realty, Company Services
Public Administration, Social Security
Pedagogy
Health, Veterinary, and Social Services
Other Services
Private Households
Exterritorial Organisations
First-time seeking work
Every parameter of the sonification can be mapped by using the elements of a ’mapping
line’: For every synthesis or playback parameter, users can select a data dimension. The
data range in minimum and maximum values is displayed. The data can have a ’warp’
property, i.e. whether the data should be considered linear, exponential, or have another
characteristic mapping function.
The arrow-button below ’pushes’ the range of the current data dimension to the editable
number boxes, as this is the data scaling range (’mimax’) to use for parameter mapping.
This range can be adjusted, in case this becomes necessary to experiment with a specific
hypothesis. The second ’mimax’ range is the Synth parameter range, which is adjustable,
as is the warp factor. Here, the arrow-button also pushes in the default parameter values.
The ’range’ display that follows shows the default synthesis parameter range (e.g. 20-
20000 for frequency), and the popup menu under ’Synth Param’ shows the name of the
parameter chosen for that mapping line.
Setting ’playRange’ determines the range of data point indices to play within the current
sorted data, with 0 being the first datapoint. ’post current range’ posts the current
range in the current order.
The final group of elements, labeled ’styrData’, allows for starting and stopping the
Figure 6.7: GUI Window for the Social Data Explorer.
The top line of elements is used for sorting data by criteria. The five element lines below are for
mapping data dimensions to synth parameters, and scaling the ranges flexibly. The bottom line
allows selecting a range of interest within the sorted data, and sonification playback control.
sonification playback.
6.3.3 Sonification design
The sonification design itself is quite a simple variant of discrete-event parameter map-
ping. Three different synthesis processes (’synthdefs’) are provided, all with control
parameters for freq, amp, pan, sustain. The synthdefs mainly vary in the envelope
they use (one is quasi-gaussian, the other two percussive), and in the panning algo-
rithm (’sineAz’ is for multichannel ring-panning). Which of these sounds is used can be
changed in the code.
The player algorithm iterates over the chosen range of data indices. It maps the values
of each data item to values for the synthesis event’s parameters, based on the current
mapping choices. If nothing is chosen for a given synthesis parameter, a default value is
used (e.g. for duration of the event, 0.1 seconds).
6.3.4 Evaluation
For experimenting with parameter mapping sonification, this design allows for similar
rendering complexity as the Sonification Sandbox (Walker and Cothran (2003)), though
without parallel streams of sounds. Both the user interface and the sonification itself are
sketches rather than polished applications; e.g. the user interface could allow loading
data files and switching between instruments, and could derive its initial display state
from the current state of the model.
Given more development time, it would benefit from multiple and more complex sound
functions, from making more functionality available from GUI elements, and from fuller
visual representation of the ongoing sonification. Nevertheless, according to the sociol-
ogist colleague who experimented with it, it supported exploration of the particular type
of data file well enough to confirm its viability.
While we intended to experiment with designs bridging between the Wahlgesänge design
and the Social Data Explorer, this was not pursued, mainly due to time constraints, and
because other ventures within the SonEnvir project were given higher priority.
Chapter 7
Examples from Physics
In the course of the SonEnvir project, we began with sonifications of quantum spectra,
and later decided to shift the focus to statistical spin models as employed in computa-
tional physics, for various reasons given below.
Sonification has been used in physics rather intuitively, without referring to the term
explicitly. The classical examples are the Geiger counter and the Sonar, both monitoring
devices for physical surroundings. An early example of research using sonification is the
experiment of the inclined plane by Galileo Galilei. Following Drake (1980), it seems
plausible that Galilei used auditory information to verify the quadratic law of falling
bodies (see chapter 3, and figure 3.1.1). In reconstructing the experiment, Riess et al.
(2005) found that time measuring devices of the 17th century (water clocks) were almost
certainly not precise enough for these experiments, while rhythmic perception was.
In modern physics, sonification has already played a role: one example of audification is
given in a paper by Pereverzev et al., where quantum oscillations between two weakly
coupled reservoirs of superfluid helium 3 (predicted decades earlier) were found by lis-
tening: ”Owing to vibration noise in the displacement transducer, an oscilloscope trace
[...] exhibits no remarkable structure suggestive of the predicted quantum oscillations.
But if the electrical output of the displacement transducer is amplified and connected to
audio headphones, the listener makes a most remarkable observation. As the pressure
across the array relaxes to zero there is a clearly distinguishable tone smoothly drifting
from high to low frequency during the transient, which lasts for several seconds. This
simple observation marks the discovery of coherent quantum oscillations between weakly
coupled superfluids.” (Pereverzev et al. (1997))
Next to sonification methods in physics, physics methods found their way into sonifica-
tion, as in the model-based sonification approach by Hermann and Ritter (1999). For
example, in so called data sonograms, physical formalisms are used to explore high-
dimensional data spaces; an adaptation of the data sonogram approach has been used
in the ’Wahlgesänge’ sonification design described in section 6.2.
Physics and sonification
In physics, sonification has particular advantages. First of all, modern particle physics
is usually described in a four-dimensional framework. For a three dimensional space
evolving in time, a complete static visualisation is no longer possible. This makes
it harder to understand and rather abstract; thus, in both didactics and research,
sonification may be useful. In the auditory domain, many sound parameters may be used
to display a four-dimensional space, maintaining symmetry between the four dimensions
by comparing different rotations of their mappings. A feature of auditory dimensions
that has to be taken into account is that these dimensions are generally not orthogonal,
but could rather be compared to mathematical subspaces (see Hollander (1994)). This
concept is very common in physics, and thus easily applicable.
Furthermore in physics, many phenomena are wave phenomena happening in time, just
as sound is. Thus sonification provides a very direct mapping. While scientific graphs
usually map the time direction of physical phenomena onto a geometrical axis, this is
not necessary in a sonification, where physical time persists, and multiple parameters
may be displayed in parallel.
Perceptualisation is not intended to replace classical analytical methods, but rather
to complement them; still, there are examples where visual interpretation is superior to,
or at least precedes, mathematical treatment.
battery of tests for the quality of numerical random number generators. One of these is
the parking lot test, where mappings of randomly filled arrays in one plane are plotted and
visually searched for regularities. He argues that visual tests are striking, but not feasible
in higher dimensions. As nothing is known beforehand about the nature of patterns that
may appear in less than ideal random number generators, there is no all-encompassing
mathematical test for this task. Sonification is a logical continuation of such strategies
which can be applied with multidimensional data from physical research contexts.
The major disadvantage of sonification we encountered is that physicists (and probably
natural scientists in general) are not familiar with it. Visualisation techniques and our
learnt understanding of them have been refined since the beginnings of modern science.
Regarding auditory perception in particular, we were confronted with, e.g., the opinion that the
hearing process is just a Fourier transformation, and could be fully replaced by Fourier
analysis. This illustrates that much work is required before sonification becomes standard
practice in physics.
7.1 Quantum Spectra Sonification [1]
Quantum spectra are essential to understand the structure and interactions of composite
systems in such fields as condensed matter, molecular, atomic, and subatomic physics.
Put very briefly, quantum spectra describe the particular energy states which different
subatomic particles can assume; as these cannot be observed directly, competing models
have been developed that predict the precise values and orderings of these energy levels.
Quantum spectra provide an interesting field for auditory display due to the richness
of their data sets, and their complex inner relations. In our experiments (’us’ refer-
ring to the physics group within SonEnvir), we were concerned with the sonification of
quantum-mechanical spectra of baryons, the most fundamental particles of subatomic
physics observed in nature. The data under investigation stem from different competing
theoretical models designed for the description of baryon properties. This section reports
our attempts at finding valid and useful strategies for displaying, comparing and explor-
ing various model predictions in relation to experimentally measured data by means of
sonification. We investigated the possibilities of sonification in order to develop them as
a tool for classifying and explaining baryon properties in the context of present particle
theory.
Baryons - most prominently among them the proton and the neutron - are considered
as bound systems of three quarks, which are presently the ultimate known constituents.
The forces governing their properties and behaviour are described within the theory of
quantum chromodynamics (QCD). As this theory is up to now not exactly solvable
for baryons (at low and intermediate energies), one resorts to effective models, such as
constituent quark models (CQMs).
CQMs have been suggested in different variants. Existing models differ mainly in which
components they consider to constitute the forces binding the constituent quarks: All
models include a so called confinement component - as the distance between quarks
expands, the forces between them grow, which keeps them confined - and a hyperfine
interaction, which models interactions between quarks by particle exchange. As a result
there is a variety of quantum-mechanical spectra for the ground and excited states of
baryons. The characteristics of the spectra contain a wealth of information important
for the understanding of baryon properties and interactions. Baryons are also classified
by the combinations of quarks they are made up of, and by a number of other properties
such as color, flavor, spin, parity, and angular momentum, which can be arranged in
symmetrical orders. For more background in Constituent Quark Models and baryon
classification, please refer to Appendix C.1.
[1] This section is based on material from two SonEnvir papers: de Campo et al. (2005d) and
de Campo et al. (2006a).
7.1.1 Quantum spectra of baryons
The competing CQMs produce baryon spectra with characteristic differences due to the
different underlying hyperfine interactions. In figure 7.1 the excitation spectra of the
nucleon (N) and delta (∆) particles are shown for three different classes of modern
relativistic CQMs. While the ground states are practically the same (and agree with
experiments) for all CQMs, the excited states show different energies and thus level
orderings. (For instance, in the OGE CQM the first excitation above the N ground state
is J^P = 1/2^-, whereas for the GBE CQM it is J^P = 1/2^+.) Evidently the predictions of the
GBE CQM reach the best overall agreement with the available experimental data.
Figure 7.1: Excitation spectra of N (left) and ∆ (right) particles.
In each column, the three entries left to right are the energies (in MeV, or Mega-electronVolts)
based on One-Gluon exchange (Eidelman (2004)), Instanton-induced (Glozman et al. (1998);
Loering et al. (2001)), and Goldstone-Boson Exchange (Glantschnig et al. (2005)) constituent
quark models. The shaded boxes represent experimental data, or more precisely, the ranges of
imprecision that measurements of these data currently have (Eidelman (2004)).
7.1.2 The Quantum Spectra Browser
Sonifying baryon mass spectra
The baryon spectra as visualised by patterns such as in Fig. 7.1 allow a discrimination of
the qualities of the CQM description of experiment. Also one can read off characteristic
features of the different CQMs such as the distinct level orderings, etc. However, it
is quite difficult to conjecture specific symmetries or other relevant properties in the
dynamics of a given CQM by just looking at the spectra. Thus, there are a number
of open research questions where we expected sonification to be helpful. We began by
identifying phenomena that are likely to be discernible in sonification experiments:
• Is it possible to distinguish e.g. the spectrum of an N 1/2^+ nucleon from, say, a
delta ∆ 3/2^+ by listening only?
• Is there a common family sound character for groups of particles, or for entire
models?
• In the confinement-only model, the intentionally absent hyperfine interaction causes
data points to merge into one: is this clearly audible?
We studied the sonification of baryon spectra with three specific data sets. They contain
the N as well as ∆ ground state and excitation levels for three different dynamical
situations: 1) the GBE CQM (Glozman et al. (1998)), 2) the OGE CQM (Theussl et al.
(2001)), and 3) the case with confinement interaction only, i.e., omitting the hyperfine
interaction component. Each one of these data files is made up of 20 lists, and each
list contains the energy levels of a particular N as well as ∆ multiplet J^P. The lists are
different in length: Depending on the given JP multiplet they contain 2 - 22 entries,
since we only take into account energy levels up to a certain limit.
Sonification design
For sonification of baryon spectra, the most immediately interesting feature is the level
spacing. The quantum-mechanical spectrum is bounded from below and its absolute
position is fixed by the N ground state (at 939 MeV); above that, spectral lines up to
ca 3500 MeV appear for the excited states in the spectrum of each particle. As the
study of these level spacings depends on the precise nature of the distances between
these lines within and across particles, a sonification design demands high resolution for
that parameter; thus we decided to map these differences between the energy levels to
audible frequencies.
Several mapping strategies were tried for an auditory display of the spacings between
the energy levels in the spectra: I) Mapping the mass spectra to frequency spectra
directly, with tunable transposition together with optional linear frequency shift and
spreading, and II) Mapping the (linear) mass spectra to a scalable pitch range, i.e.
using perceptually linear pitch space as representation. Both of these approaches can be
listened to as simultaneous static spectra (of one particle at a time) and as arpeggios
with adjustable temporal spread against a soft background drone of the same spectrum.
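Both strategies can be sketched as simple mapping functions (illustrative Python, not the original SC3 scripts; the 939 MeV to 939 Hz default and the transpose/slope controls come from the text, while the pitch range in strategy II is an assumed example):

```python
# Sketch of the two mapping strategies for baryon mass spectra.
GROUND_STATE_MEV = 939.0   # N ground state fixes the absolute position

def direct_freq(mev, fixed_freq=939.0, f_range_scale=1.0, transpose=0.0):
    """Strategy I: map MeV to Hz directly (939 MeV -> 939 Hz by default),
    with optional linear spreading and semitone transposition."""
    f = fixed_freq + (mev - GROUND_STATE_MEV) * f_range_scale
    return f * 2.0 ** (transpose / 12.0)

def pitch_mapped_freq(mev, mev_hi=3500.0, semitone_lo=45.0, semitone_hi=81.0):
    """Strategy II: map the linear mass axis into a scalable pitch range,
    here MIDI-like semitones (the chosen range is an assumption)."""
    t = (mev - GROUND_STATE_MEV) / (mev_hi - GROUND_STATE_MEV)
    semis = semitone_lo + t * (semitone_hi - semitone_lo)
    return 440.0 * 2.0 ** ((semis - 69.0) / 12.0)
```

Strategy I preserves the spectral proportions exactly (transposition by octaves keeps ratios intact), while strategy II distributes the level spacings over perceptually linear pitch space.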
Interface design
These models are implemented in SuperCollider3 scripts; for more flexible browsing, a
GUI was designed (see figure 7.2). All the tuneable playback settings can be changed
while playing, and they can be saved for reproducibility and an exchange of settings
between researchers. Some tuning options have been included in order to account for
known data properties: E.g., the values calculated for higher excitations in the mass
spectra are considered to be less and less reliable; we modeled this with a tuneable slope
factor that reduces amplitude of the sounds representing the higher excitation levels in
all models.
Figure 7.2: The QuantumSpectraBrowser GUI.
The upper window allows for multiple selection of particles that will be iterated over in 2D
loops; or alternatively, for direct playback of that particle by clicking. The lower window is for
tuning all the parameters of the sonification design interactively.
For static data like these, flexible, interactive comparison between different subsets of the
data is a key requirement; e.g. in order to find out whether discrimination by parity P is
possible with auditory display, one will want to automatically play interleaved sequences
alternating between selected particles with positive and negative parities.
The Quantum Spectra Browser window allows for the following interactions:
The buttons Manual, Autoplay choose between manual mode (where a click on a button
switches to the associated sound) and an autoplay mode that iterates over all the selected
particles, either horizontally (line by line) or vertically (column by column). The buttons
LoopStart, LoopStop stop and start this automatic loop; the numberbox stepTime sets
for how many seconds each spectrum is presented. The three rows of buttons below
Goldstone, OneGluon, Confinement allow for playing individual spectra, or for a multiple
selection of which particles are heard in the loop.
The QSB Sound Editor allows for setting many synthesis/spatialisation parameters:
fixedFreq sets the frequency that corresponds to the ground state; the default value is 939
Hz (for 939 MeV). fRangScale rescales the frequency range the other energy levels are
mapped into: a scale of 1 is original values, 2 expands to twice the linear range. As this
distorts proportions, we mostly left this control at 1. transpose transposes the entire
spectrum by semitones, so a value of -24 is two octaves down. This leaves proportions
intact, and many listeners find this frequency range more comfortable to listen to. slope
determines how much the frequency components for higher energy levels are attenuated;
this models the decreasing validity of higher energy levels. 0 is full level, 0.4 means each
line is softer by a factor of 1 − 0.4 than the previous line. (The frequency-dependent
sensitivity of human hearing is compensated for separately using the AmpComp unit
generator).
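The slope attenuation amounts to a geometric series over the spectral lines; a minimal sketch, reading "softer by a factor of 1 − slope" literally (the formula is my reconstruction):

```python
def line_amplitudes(n_lines, slope):
    """Amplitude for each spectral line: line i is softer than line i-1 by a
    factor of (1 - slope), so the higher (less reliable) excitation levels are
    attenuated; slope = 0 leaves all lines at full level. Frequency-dependent
    loudness compensation (AmpComp) is a separate step."""
    return [(1.0 - slope) ** i for i in range(n_lines)]
```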
panSpread sets how much spectral lines are separated spatially. With a spread of 1, and
stereo playback, the ground state is panned all the way left and the highest excited state
all the way right; less than 1 means they are panned closer together. When using multichannel
playback, this can expand over a series of up to 20 adjacent channels. panCenter sets
where the center line will be panned spatially - 0 is center, -1 is all left, 1 is all right.
The remaining parameters tune the details of an arpeggiation loop: essentially, a loop of
spread-out impulses excites the spectral lines individually, and they ring until a minimum
level is reached. ringTime determines how long each component will take to decay (RT
for -60dB) after an impulse. bgLevel maintains presence of the entire spectrum as one
gestalt: the spectral line sounds will only decay to this minimum level and remain at that
level. attDelay determines when within the loop the first attack will play. attSpread
determines how spread out the attacks will be within the loop time. loopTime determines
the time for one cycle of impulses.
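One plausible reading of these loop parameters, sketched in Python; the exact formula in the original SuperCollider3 scripts may differ:

```python
def attack_times(n_lines, loop_time, att_delay, att_spread):
    """Onset time within one loop cycle for the impulse that excites each
    spectral line: the first attack plays at att_delay * loop_time, and the
    following attacks are spread over att_spread of the cycle (0 means all
    lines attack together as a chord, 1 spreads them over the whole loop)."""
    step = att_spread * loop_time / n_lines
    return [(att_delay * loop_time + i * step) % loop_time for i in range(n_lines)]
```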
7.1.3 The Hyperfine Splitter
Addressing a more subtle issue, we then designed a Hyperfine Level Splitter, which allows
for studying the so called splittings of the energy levels due to a variable strength of the
hyperfine interaction inherent in the CQMs. The hyperfine interaction is needed in order
to describe the binding of three quarks more realistically, i.e. in closer accordance with
experimental observation. When it is absent (in simulations), certain quantum states
are degenerate, meaning that the corresponding energy levels of some particles coincide.
In the first demonstration example, we chose the excitation levels of two different particles
(the Neutron n-1/2+ and the Delta d3/2+), calculated within the same CQM, the
Goldstone-Boson Exchange model (gbe) Glozman et al. (1998). These two particles are
degenerate when there is no hyperfine interaction present.
Sonification design
Mapped into sound, this means that one hears a chord of three tones for the ground
states and the first two excitation levels, which are the same for both particles. Here,
auditory perception is more difficult than in the Quantum Browser, as the mass spectra
are being played as continuous chords, and the hyperfine interaction may be ’turned up’
gradually (to 100 percent). Thereby, the energy levels are pulled apart, and one hears a
complex chord of six tones. The two particles that are compared can be distinguished
acoustically now, as when they are observed in experiments. With the Level Splitter, the
dynamical ingredients leading to these energy splittings may be studied in detail, and
likewise the quantitative differences in distinct CQMs.
The underlying sonification design is an extension of that for the Quantum Browser.
Mainly, some parameters are added to control the number of spectral lines to be rep-
resented at once, and a balance control between the simultaneous or interleaved two
channels that are compared.
Interface design
The Hyperfine Data Player window allows for the following interactions: The sets of
pop-up menus labeled left and right select which model (GBE, OGE), which particle
(Nukleon, Delta, etc.), which state (1/2, 3/2 etc), and which parity (+, -, both) is
chosen for sonification in that audio channel. The slider percent determines where
to interpolate between the model points of choice and their corresponding points in
the Confinement-only model; this is where the hyperfine interaction component to the
model can be gradually turned on or off. The graphical view below, labeled l3, l2, l1 -
l1, l2, l3 shows the precise values for the first several energy states of the two particles
chosen. The very bottom is the ground state (939 MeV); the visible range above goes up
to 3500 MeV. In the state shown, a so-called ’level crossing’ can be seen (and heard): level
3 of GBE nucleon 1/2 (both parities) crosses below level 2; by comparison, in OGE,
the same particle has monotonically ascending spectral energy states. The bottom row
of buttons stops and starts the sonification, posts the current interpolated values, and
recalls a number of prepared demo settings.
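The percent slider amounts to a linear interpolation per energy level between the two model points; a minimal sketch (function and argument names are mine):

```python
def interpolated_levels(model_levels, confinement_levels, percent):
    """Fade the hyperfine interaction in or out: at percent = 0 the levels are
    those of the Confinement-only model (degenerate particles coincide), at
    percent = 100 they are the full CQM values (levels split apart)."""
    t = percent / 100.0
    return [c + t * (m - c) for m, c in zip(model_levels, confinement_levels)]
```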
The Hyperfine Editor allows for setting many synthesis/spatialisation parameters fa-
miliar from the Quantum Spectra Browser, as well as several more: balance sets the
balance between left and right channels. bgLevel sets the minimum level for arpeggiated
Figure 7.3: The Hyperfine Splitter GUI.
The left window is for selecting two particles by model, particle name, spin, and parity; the
hyperfine component is faded in and out with the slider in the middle. The bottom area shows
the audible spectral lines central-symmetrically. The window on the right side is the editor for
the synthesis and spatialisation parameters of the sonification design.
settings, as above. brightness adds harmonic overtones (by frequency modulation) to
the individual lines so that judging their pitch becomes easier. pitStretch rescales the
pitch range the other energy levels are mapped into: a scale of 1 is original values, 2
expands to twice the intervallic range. (This is different from fRangScale above, which
used linear scaling). transpose transposes the entire spectrum by semitones, as above.
melShift determines when within the loop the second channel’s attack will play relative
to the first. 0 means they play together, 3 means they are equally spaced apart (by 3 of
6 subdivisions); the maximum of 6 plays them in sync again. melSpread determines how
much the attacks within one channel are arpeggiated; 3 means such that they appear
equally spread in time. Together, these two controls allow alternating the two spectra
as groups, or interleaving the individual spectral lines across the two spectra. ringAtt
determines how fast the attack times are for both channels. ringDecay sets the decay
time for the spectral lines for both channels. nMassesL, nMassesR are handled automatically
when changing particle types and properties. These set the number of masses audible
in the spectrum, which can be reduced by hand if desired. ampGrid sets the volume for
a reference grid (clicks) which can be turned on for orientation.
7.1.4 Possible future work and conclusions
At this point, the physicists that had worked closely with these data in their own research
agenda unfortunately had to leave the project, which meant that this line of experiments
came to an end before further interesting ideas could be tried. These were intended
to explore a number of additional aspects that may ultimately be relevant in the
scientific study of particle physics by sonification; for completeness, these ’loose
ends’ are given here.
Comparison with experimental data
As can be seen from figure 7.1, several experimental data are available for the
energy levels. However, they are affected by experimental uncertainties. Consequently,
their auditory display needs some adaptations. We intended to differentiate between
(sharp) theoretical data as deduced from the CQMs and (spread) phenomenological
data measured in experiment by adding narrow band modulation to spread-out data
bands. It should be quite interesting to qualify the theoretical predictions vis-a-vis the
experimental data.
Representing symmetries with spatial ordering
Much effort has gone into finding visual representations for the multiple symmetries
between particle groups and families. Arranging the sound representations in 3D-space
with virtual acoustics in a spatial order determined by symmetry properties between
particle groups may well be scientifically interesting; navigating such a symmetry space
could become an experience that lets physicists acquire a more intuitive notion of the
nature of these symmetries.
Temporal aspects
There are plenty of interesting time phenomena in quantum physics, which could
be made use of in numerous ways in further explorations. For example, there is an
enormous variation in the half-life of different particles. This could be expressed quite
directly in differentiated decay times for different spectral lines. In addition, including the
probabilities for transitions between excited states and ground states will open promising
possibilities for demonstrating the dynamical ingredients in the quark interactions inside
baryons.
Conclusions
Our investigations have indicated that sonification is an interesting alternative and a
promising complementary tool for analysing quantum-mechanical data. While many
interesting design ideas came up in this line of research, which may well be useful for
other contexts, the implemented sonification designs were not fully tested by domain
experts in this quite specialised field. Given motivated domain science research partners,
a number of good candidates for sonification approaches remain to be explored further
in the context of quantum spectra.
7.2 Sonification of Spin models2
Spin models provide an interesting test case for sonification in physics, as they model
complex systems that are dynamically evolving and not satisfactorily visualisable. While
the theoretical background is largely understood, their phase transitions have been an
interesting subject of studies for decades, and results in this field can be applied to many
scientific domains. While most classical methods of solving spin models rely on mean
values, their most important feature, especially at the critical point of phase transition,
is the spin fluctuation of single elements.
Therefore we started out with the fluctuations of the spins, and provided auditory in-
formation that can be analysed qualitatively. The goal was to display three-dimensional
dynamic systems, distinguish the different phases and study the order of the phase tran-
sition. Audification and sonification approaches were implemented for the spin models
studied, so that both realtime monitoring of the running model and analysis of pre-
recorded data sets are possible. Sound examples of these sonifications are described in
Appendix C.2.
7.2.1 Physical background
Spin systems describe macroscopic properties of materials (such as ferromagnetism) by
computational models of simple microscopic interactions between single elements of the
material. The principal idea of modeling spin systems is to study a complex system in a
controlled way: the models are theoretically tractable, yet mirror the behaviour of real
compounds.
From a theoretical perspective, these models are interesting because they allow studying
the behaviour of universal properties in certain symmetry groups. This means that some
properties do not depend on details like the kind of material, such as so-called order
parameters giving the order of the phase transition. Already in 1945, E. A. Guggenheim
(cited in Yeomans (1992)) found that the phase diagrams of eight different fluids he
studied show the very same coexistence curve3. A theoretical explanation is given by
a classification in symmetry groups – all of these different fluids belonged to the same
mathematical group.
2This section is based on a SonEnvir ICAD conference paper, Vogt et al. (2007).
3This becomes apparent when plotted in so-called reduced variables: the reduced temperature is
T/Tcrit, the actual temperature relative to the critical one, and pressure is treated likewise.
7.2.2 Ising model
One of the first spin models, the Ising model, was developed by Ernst Ising in 1924 in
order to describe a ferromagnet. Since the development of computational methods, this
model has become one of the best studied models in statistical physics, and has been
extended in various ways.
Figure 7.4: Schema of spins in the Ising model as an example for Spin models.
The lattice size here is 8 x 8. At each lattice location, the spin can have one of two possible
values, or states (up or down).
Its interpretation as a ferromagnet involves a simplified notion of ferromagnetism.4 As
shown in figure 7.4, it is assumed that the magnet consists of simple atoms on a quadratic
(or in three dimensions cubic) lattice. At each lattice point an atom (here, a magnetic
moment with a spin of up or down) is located. In the computational model, neighbouring
spins try to align to each other, because this is energetically more favorable. On the
other hand, the overall temperature causes random spin flips. At a critical temperature
Tcrit, these processes are in a dynamic balance, and there are clusters of spins on all
orders of magnitude. If the temperature is lowered from Tcrit, one spin orientation will
prevail. (Which one is decided by the random initial setting.) Macroscopically, this is
the magnetic phase (T < Tcrit). At T > Tcrit, the thermal fluctuations are too strong
for uniform clusterings of spins. There is no macroscopic magnetisation, only thermal
noise.
4There are many different application fields for systems with next neighbour interaction and random
behaviour. Ising models have even been used to describe social systems, as e.g. in P. Fronczak (2006),
though this is a disputed method in the field.
7.2.3 Potts model
A straightforward generalisation of this model is the admission of more spin states than
just up and down. This was realized by Renfrey B. Potts in 1952, and was accordingly
called the Potts model. Several other extensions of models were studied in the past.
We worked with the q-state Potts model and its special case for q = 2, the Ising model,
both being classical spin models. For mathematical background, see Appendix C.2.
The order of the phase transition is defined by a discontinuity in the derivatives of the free
energy (see figure 7.5). If there is a finite discontinuity in one of the first derivatives,
the transition is called first order. If the first derivatives are continuous, but the second
derivatives are discontinuous, it is a so-called continuous phase transition.
Figure 7.5: Schema of the orders of phase transitions in spin models.
The mean magnetisation is plotted vs. decreasing temperature. (a) shows a continuous phase
transition and (b) a phase transition of first order. In the latter, the function is discontinuous
at the critical temperature. The roughly dotted line gives an approximation on a finite system,
e.g. a computational model. The bigger the system, the better this approximation models the
discontinuous behaviour.
Nowadays, spin models are usually simulated with Monte Carlo algorithms, giving the
most probable system states in the partition function (Yeomans, 1992, p. 96). We
implemented a Monte Carlo simulation for an Ising and Potts model in SuperCollider3
(see figure 7.6). The lattice is represented as a torus (see fig. 7.8) and continually
updated: for each lattice point, a different spin state is proposed, and the new overall
energy calculated. As shown in equation C.3, it depends on the neighbour interactions
(SiSj) and the overall temperature (given by the coupling J ∼ 1/T ). If the new energy is
smaller than the old one, the new state is accepted. If not, there is still a certain chance
that it is accepted, leading to random spin flips representing the overall temperature.
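The update rule described here is the standard Metropolis algorithm. A minimal Python sketch for the 2D Ising case (the SonEnvir implementation is in SuperCollider3; spin values are coded as +1/-1):

```python
import math, random

def metropolis_sweep(lattice, coupling, rng=random.random):
    """One Metropolis sweep over a square Ising lattice with periodic boundaries.
    For each site a spin flip is proposed; flips that lower the energy are always
    accepted, others with probability exp(-coupling * dE), which produces the
    random thermal flips (coupling J ~ 1/T)."""
    n = len(lattice)
    for i in range(n):
        for j in range(n):
            s = lattice[i][j]
            neighbours = (lattice[(i - 1) % n][j] + lattice[(i + 1) % n][j] +
                          lattice[i][(j - 1) % n] + lattice[i][(j + 1) % n])
            dE = 2 * s * neighbours  # energy change a flip would cause
            if dE <= 0 or rng() < math.exp(-coupling * dE):
                lattice[i][j] = -s  # accept the proposed flip
    return lattice
```

At large coupling (low temperature) an aligned lattice stays aligned; at coupling 0 every proposed flip is accepted, giving pure thermal noise.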
To observe the model and draw conclusions from it, usually mean values of observables
are calculated from the Monte Carlo simulation, e.g. the overall magnetisation. The
simulation needs time to equilibrate at each temperature in order to model physical
reality, e.g. with small or large clusters. Big lattices with a length of e.g. 100 need
many equilibration steps. With a typical evolution of the model, critical values or the
order of the phase transition can be deduced. This is not rigorously doable, as on a finite
lattice a function will never be continuous, compare figure 7.5. In a quantised system,
the “jump” in the observable will just look more sudden for a first-order phase transition.
This last point is both an argument for using sonification and a research goal for this
study: by using more information than the mean values, the order of the phase transition
can be more clearly distinguished. Also, we studied different phase transitions with the
working hypothesis that there might be principal differences in the fluctuations, which
can be better heard. (A Potts model with q ≤ 4 states has a continuous phase transition,
whereas with q ≥ 5 states it has a phase transition of first order.) Thus researchers may
gain a quick impression of the order of the phase transition.
Implementing spin models
In all the analytical approaches, the solving procedures of the models are based on abstract
mathematics. This gives great insight into the universal basics of critical phenomena, but
often a quick glance at a graph complements classical analysis, as mentioned above.
Thus in areas where visualisation cannot be done, applying sonification can help to reach
an intuitive understanding with relatively few underlying assumptions. Sonification tools
can also serve as monitoring devices for highly complex and high dimensional simulations.
The phases and the behaviour at the critical temperature can be observed. Finally, we
were particularly interested in sonification of the critical fluctuations with self-similar
clusters on all orders of magnitude.
We wanted to provide for a more or less direct observation of data on all levels of the
analysis, both to verify assumptions and to not overlook new insights. This should be
done by observing the dynamic evolution of the spins, not only mean values. Thus,
the important characteristic of spin fluctuations can be studied and the entire system
continuously observed.
Spin model data features
Spin models have several basic characteristics, which were used in different sonifica-
tion approaches. These properties refer to the structure of the model, the theoretical
background and its interpretation, and they were exploited for the sonification as follows:
• The models are discrete in space, with fixed lattice positions that are filled with
discrete-valued spins. The data sets are rather big, on the order of a lattice size
of 100 in two or three dimensions, and are dynamically evolving. Because of the
Figure 7.6: GUI for the running 4-state Potts Model in 2D.
The GUI shows the model in a state above critical temperature, where large clusters emerge.
The lattice size is 64x64. The averages below the spin frame show the development of the mean
magnetisation for the 4 spin parities over the last 50 configurations. As the temperature is
constant and the system has been equilibrated before, these mean values are rather constant.
specifics of the modeling, the simulations are only correct on the statistical aver-
age, and many configurations have to be taken into account together for correct
interpretation. Considering that a single auditory event has to have some mini-
mum duration to display perceptually distinguishable characteristics, we explored
two options for the auditory display: a fast audification approach, and omission,
i.e. representing only a subset of all spins, using a granular approach.
• The models are calculated by next-neighbour interaction aligning the spins on the
one hand, and random fluctuations on the other. We aimed to preserve the next-
neighbour property at least partially by different strategies of moving through the
data frame: either along a conventional torus path, or along a Hilbert-curve, see
fig. 7.8 (in approaches 7.2.4, 7.2.5 and 7.2.7). For the lossy (omission) approach,
the statistical nature of the model was preserved by picking random elements for
the granular sonification.
• There is a global symmetry in the spins, thus - in the absence of an exterior
magnetic field - no spin orientation is preferred. This was mapped for the Ising
model by choosing the octave for the two spin parities. In the audifications, every
spin orientation is assigned a fixed value, and symmetry is preserved as the sound
wave only depends on the relative difference between consecutive steps in the
lattice.
• At the critical point of phase transition, the clusters of spins become self-similar
on all length scales. We tried to use this feature in order to generate a different
sound quality at the point of phase transition. This would allow a clear distinc-
tion between the two phases and the (third) different behaviour at the critical
temperature itself.
7.2.4 Audification-based sonification
In this approach, we tried to utilise the full available information generated by the model.
As the Sonification Design Space Map suggests audification for higher density auditory
display, we interpreted the spins within each time instant as a waveform (see figure 7.7).
This waveform can be listened to directly or taken as a modulator of a sine wave.5 When
the temperature is lowered, regular clusters emerge, changing only slowly from time step
to time step. Thus, if the audification preserves locality, longer structures will emerge
aurally as well, resulting in more tone-like sounds. When one spin dominates, there is
silence, except for some random thermal fluctuations at non-zero temperature.
5While this would not qualify as an audification by the strictest definition, such a simple modulation
is still conceptually quite close.
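The audification step itself is just a reinterpretation of the sequentialised spins as sample values; a minimal sketch, assuming spin states coded 0 .. q-1:

```python
def audify(spins, states):
    """Interpret one sequentialised lattice configuration as a waveform: each
    spin state gets a fixed sample value in [-1, 1], centered so that the global
    spin symmetry is preserved - the audible result depends only on differences
    between consecutive lattice sites."""
    return [(2 * s - (states - 1)) / (states - 1) for s in spins]
```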
Figure 7.7: Audification of a 4-state Potts model.
The first 3 milliseconds of the audio file of a model with 4 different states in the high temper-
ature phase (noise).
Figure 7.8: Sequentialisation schemes for the lattice used for the audification.
The left scheme shows a torus sequentialisation, where spins at opposed borders are treated
as neighbours. This treats a 2D grid like a torus (a doughnut shape), as it is read row by row.
On the right side a Hilbert curve is shown.
While fig. 7.7 explains handling of one line of data for the sonification, the question
remains how to move through all of them. Different approaches of sequentialisation
are shown in fig. 7.8. The model has periodic boundary conditions, so a torus path is
possible. We also experimented with moving through the lattice along a Hilbert curve.
This is a space filling curve for quadratic geometries, reaching every point without inter-
secting with itself. This was intended to make the audification insensitive to differences
which arise depending on whether rows or columns are read first, which can occur in the
case of symmetric clustering. Eventually, it turned out that symmetric clustering mainly
depends on unfavorable starting conditions and occurs only rarely, so we mostly used a
torus path, as the model does in the calculation.
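A Hilbert-curve read-out can be implemented with the standard index-to-coordinate conversion; a Python sketch for a 2**order x 2**order lattice:

```python
def hilbert_d2xy(order, d):
    """Map a 1D index d (0 .. 4**order - 1) to (x, y) coordinates on a Hilbert
    curve covering a 2**order x 2**order lattice; reading the spins in this
    order keeps lattice neighbours close together in the waveform, and the
    curve never intersects itself."""
    x = y = 0
    t = d
    s = 1
    while s < 2 ** order:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y
```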
The sounds were recorded directly from the interactive model, using the GUI shown in
fig. 7.6 for a specific temperature. In order to judge the phase of the system, this
simple method is most efficient.
At the time of recording, the model has already been equilibrated - its state represents
a typical physical configuration for the specific temperature. When the temperature is
cooled down continually, the system needs several transition steps at each new temper-
ature before the data represents the new physical state correctly. Thus, in a second
approach, data was pre-recorded and stored as a sound-file. Contrary to our assump-
tions, the continuous phase transition is not very clearly distinguishable from the first
order phase transition. This is partly due to the data - on a quantised lattice there
are no truly continuous observables, so the distinction between first and second order
transitions is fuzzy in principle.
A fundamental problem is that the equilibration steps (which are not recorded!) between
the stored configurations cut out the meaningful transitions between them: That these
equilibration steps are needed at all is in fact a common drawback in the established
computational spin models. When one considers every complete lattice state as one
sequence of single audio samples (e.g. 32x32 = 1024 lattice sites), then with a sampling
frequency of 44100 Hz, every 23 ms a potentially completely different state is rendered,
instead of a continuously evolving system with only few changes in the cluster structures
from one frame to the next. This makes it more difficult to understand the dynamic
evolution of the transitions. We tried to leave out as few equilibration steps as possible
to stick closely to a physically relevant state and still keep the transitions understandable.
Consequently, for a 32x32 lattice we recorded e.g. every 32nd step, at 10 different
couplings (temperatures), with 32 recorded steps each. Thus, our soundfiles (described
in appendix C.2) have (32 x 32) lattice sites x 10 couplings x 32 record steps = 327680
samples, and last 7.4 s. Still, when comparing a 4-state Potts model to one with 5 spin
states, the change in the audio pattern is only slightly more sudden in the latter.
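The recording arithmetic can be checked directly:

```python
# Recording scheme for a 32x32 lattice: every 32nd configuration is stored,
# at 10 couplings (temperatures), with 32 stored configurations each; one
# lattice site becomes one audio sample at 44100 Hz.
lattice_sites = 32 * 32
samples = lattice_sites * 10 * 32
assert samples == 327680
duration_s = samples / 44100.0  # about 7.4 seconds of audio
```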
7.2.5 Channel sonification
We refined the audification approach by recording data for each spin separately. This
concept is shown in figure 7.9. All of the lattice is sequentialised like a torus (see fig.
7.8) and read out for every spin state separately. When data of spin A is collected, only
lattice sites with spin A are set to 1; all the others to 0. Conversely, when spin
B data is collected, all lattice sites with spin B are set to 1 and all others to 0; and so forth.
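The per-spin read-out is a simple indicator function over the sequentialised lattice; a minimal sketch, again assuming spin states coded 0 .. q-1:

```python
def spin_channels(spins, states):
    """Channel sonification: one pass over the sequentialised lattice per spin
    state; for channel c, sites carrying spin c become 1.0 and all others 0.0,
    yielding one audio channel per spin state."""
    return [[1.0 if s == c else 0.0 for s in spins] for c in range(states)]
```

For the 2-state Ising model the two channels are exactly reciprocal: at every lattice site they sum to 1.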
Thus, the different spins are separate and can be played on different channels. One
remaining problem is that the channels are highly correlated: in the Ising model with only
2 states, the 2 channels are exactly reciprocal. Thus there may be phase cancellations
in the listening setup that make it harder to distinguish the channels. Still, the overall
impression is clearer than the simple audification, and this approach is the most promising
regarding the order of the phase transition.
Figure 7.9: A 3-state Potts model cooling down from super- to subcritical state.
The three states are recorded as audio channels, shown here with time from left to right.
Toward the end, channel 2 dominates.
7.2.6 Granular sonification
In this approach, the data were pre-processed, which allowed for designing less fatiguing
sounds. Also, more sophisticated considerations can be included in the sonification
design.
In a cloud sonification we first sonified each individual spin as a very short sound grain,
and played them at high temporal density. A 32x32 lattice (1024 points) can be played
within one second, and allowing some overlap, this leaves on the order of 3 ms for each
sound grain. One second is longer than desirable for going through one entire
time instant, but this is simply a trade-off between representing all the available data
for that time instant, and moving forward in time fast enough. For bigger lattices, this
approach is too slow for practical use.
Thus the next step was calculating local mean values. We took random averaged spin
blocks in the Ising model6, see figure 7.10, so the data was pre-processed for the soni-
fication, and we did not use all available information. At first, for each configuration a
6In this sonification we stayed with the simpler Ising model due to realtime CPU limitations, but the
results do transfer to the Potts model.
Figure 7.10: Granular sonification scheme for the Ising model.
The spatial location of each randomly chosen spin block within the grid determines its spa-
tialisation, and its averaged value determines pitch and noisiness of the corresponding grain.
few lattice sites are chosen; then for each site, the average of its neighbouring region
is calculated, giving a mean magnetic moment between −1 (all negative) and +1 (all
positive); 0 meaning the ratio of spins is exactly half/half. This information is used
to determine the pitch and the noisiness of a sound grain. The more the spins in one
block are alike, the clearer the tone (either lower or higher), the less alike, the noisier
the sound. Location of the block in 3D space is given by spatial position of the sound
grain.7 The soundgrains are very short and played quickly after one another from differ-
ent virtual regions. With this setting, a three-dimensional gestalt of the local state of a
cubic lattice is generated around and above the listener.
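The mapping from a spin block to grain parameters can be sketched as follows; the pitch values are placeholders of mine, and spins are coded +1/-1:

```python
def grain_params(block, low_pitch=300.0, high_pitch=1200.0):
    """Map one randomly chosen spin block (values +1/-1) to sound grain
    parameters: the mean magnetic moment in [-1, +1] selects a lower or higher
    pitch, and the less aligned the block, the noisier the grain (noisiness 1.0
    at mean 0, a pure tone at |mean| = 1)."""
    mean = sum(block) / len(block)
    pitch = low_pitch if mean < 0 else high_pitch
    noisiness = 1.0 - abs(mean)
    return pitch, noisiness
```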
Without seeing the state of the model, a clear picture emerges from the granular sound
texture, and also untrained listeners can easily distinguish the phases of the model.
7This spatial aspect can only be properly reproduced with a multi-channel sound system. We
adapted the settings for the CUBE, a multi-functional performance space with a permanent
multi-channel system at the IEM Graz. Using the VirtualRoom class described in section 5.5,
one can also render this sonification for headphones.
7.2.7 Sonification of self-similar structures
To study one particular aspect of the above approach, we used sonification to look at
self-similar structures at the point of phase transition. Music has been considered to exhibit
self-similar structures, beginning with Voss and Clarke (1975, 1978); later on, the
general popularity of self-similarity within chaos theory has also extended to computer
music, and the hypothesis that self-similar structures may be audible has led to a lot of
experimentation and compositions with such conceptual background.
In internal listening tests we tried to display structures on several orders of magnitude in
parallel. These were calculated by a blockspin transformation, which returns essentially
the spin orientation of the majority of points in a region of the lattice. It was our goal to
make such structures of different orders of magnitude recognisable as similarly moving
melodies, or as a unique sound stream with a special sound quality.
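The blockspin transformation used here is majority-vote coarse-graining; a sketch for the Ising case (ties are resolved toward +1 by my convention):

```python
def blockspin(lattice, block):
    """Coarse-grain an Ising lattice (spins +1/-1): each block x block region is
    replaced by the spin orientation of the majority of its sites, giving the
    structure one order of magnitude up."""
    n = len(lattice)
    coarse = []
    for bi in range(0, n, block):
        row = []
        for bj in range(0, n, block):
            total = sum(lattice[i][j]
                        for i in range(bi, bi + block)
                        for j in range(bj, bj + block))
            row.append(1 if total >= 0 else -1)
        coarse.append(row)
    return coarse
```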
Figure 7.11: A self similar structure as a state of an Ising model.
This is used as a test case for detecting self similarity. Blockspins are determined by the
majority of spins of a certain region.
In our design, three orders of magnitude in the Ising model were compared to each
other, as shown in figure 7.11. The whole lattice (on the right, with the least
resolved blockspins) was displayed in the same time span as a quarter of the middle
blockspin structure and an eighth of the finest one (second from the left). The original spins
are shown on the left. Comparing three simultaneous streams for similarities turned out
to be a demanding cognitive task: Trying to follow three streams and comparing their
melodic behaviour at the same time is not trivial, even for trained musicians. Thus
we experimented with an alternative: the three streams representing different orders of
magnitude are interleaved quickly. When the streams are self-similar, one only hears a
single (random) stream; as soon as one stream is recognisably different from the others,
a triple grouping emerges. While this method works well with simple test data as shown
in fig. 7.11, we could not verify self-similarities in noisy data of running spin models. We
suspect that self-similar structures do not persist long enough for detection in running
models, but for time reasons did not pursue this further.
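The interleaving scheme described above reduces to a round-robin merge of the parallel streams: if the streams are statistically alike, the merged sequence sounds like a single stream, whereas a deviating stream produces the triple grouping. A minimal sketch (function name is ours):

```python
def interleave(streams):
    """Round-robin interleave equally long parallel event streams into
    one sequence, as used to compare orders of magnitude by ear."""
    return [x for group in zip(*streams) for x in group]
```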
7.2.8 Evaluation
Domain Expert Opinions
A listening test with statistical analysis was not appropriate as there were not enough
subjects available familiar with researching spin models. Thus, as a qualitative evaluation
we obtained opinions from experts in the field. These were four professors of Theoretical
Physics in Graz, who were not directly involved in the sonification designs. The results
were explained to them, and they were given a few questions on the applicability and
usefulness of the results.
The overall attitude may be summed up as curious but rather sceptical, even if the opin-
ions differed in the details. Asked whether they themselves would use the sonifications,
all of them answered they would do so only for didactic reasons or popular scientific
talks. The possibility of identifying different phases was acknowledged but was not seen
as superior to other methods (e.g. studying graphs of observables, as would be the stan-
dard procedure). One subject remarked that, for research purposes, the ’aha-moment’
was missing. This might be due to the fact that the Ising and Potts models have both
been studied for decades and are well understood. As the data is mainly thermal
noise, there is only little information to extract. Our sonifications reveal no new
physical findings for the models we chose. A three-dimensional display seemed interesting to
the experts, even if the dimensions are not experienced explicitly (in the audification
approach, one dimension is sequentialised for display) and the sound grain
approach as implemented only applies to three physical dimensions.
Another application that was discussed is getting a quick overview of large data sets: e.g.
checking numerical parameters (that there are enough equilibration steps, for instance)
or getting a first impression of the order of the phase transition. This seemed plausible
to all subjects, even if a standard procedure, e.g. a program for pattern recognition,
would still be equivalent and - given their familiarity with such tools - preferable to them.
The main point of criticism was the qualitative rather than quantifiable nature of the approach
towards physics, which is seen as a possible didactics tool but 'not hard science'. General
sonification problems were discussed as well: it was noted that visualisation techniques
play an increasingly important role in science, and that they are tough competitors.
Sonification is also at a disadvantage with regard to the current state of the art in publishing.
Despite this expected scepticism, it can be remarked that all subjects immediately heard
the differences in the sound qualities. Metaphors for the sounds came up spontaneously
during the introduction, e.g. boiling water for the point of phase transition. The
experts came up with several ideas for future projects to discuss; this kind of interest is
an encouraging form of feedback.
Conclusions and Possible Future Work
Spin models are interesting test cases for studying sonification designs for running mod-
els. We implemented both Monte Carlo simulations of Potts and Ising models and
sonification variants in SuperCollider3. These models produce dynamically evolving data
with their main characteristics being fluctuations of single spins; although analytically
well defined, finite computational models can only reproduce a numerical approximation
of the predicted behaviour, which has to be interpreted.
A number of different sonifications were designed in order to study different aspects
of these spin models. We created tools for the perceptualisation of lattice calculations
which are extensible to higher dimensions and a higher number of states. They allow both
observing running models, and analysing pre-recorded data to obtain a first impression
of the order of the phase transition.
Experimenting with alternative sonification techniques for the same models, we found
differing sets of advantages and drawbacks: Granular sonification of spin blocks gives
a reliable classification of the phase the system is in, and allows observing running
simulations, exploiting the random behaviour of spin models. Audification-based tools allow
us to make use of all the available data, and even track each spin orientation separately
in parallel. This tool is used to study the order of the phase transition. Additionally, we
worked on sonifications of self-similar structures.
With this study, sonification was shown to be an interesting complementary data repre-
sentation method for statistical physics. Useful future directions for extending this work
would include increased data quality and choices of different input models, which would
lead to classification tools for phase transitions that allow studying models of higher
dimensionality. Continued work in this direction could lead to applications in current re-
search questions in the field of computational physics. The research project QCDAudio
hosted at IEM Graz with SonEnvir participant Kathi Vogt as lead researcher will explore
some of these directions.
Chapter 8
Examples from Speech Communication and
Signal Processing
The Signal Processing and Speech Communication Laboratory at TU Graz focuses on
research in the area of non-linear signal processing methods, algorithm engineering and
applications thereof in speech communication and telecommunication. After investigat-
ing sonification approaches to the analysis of stochastic processes and wave propagation
in ultra-wide-band communication (briefly mentioned in de Campo et al. (2006a)), the
focus for the last phase in SonEnvir was on the analysis of time series data.
In signal processing and speech communication, most of the data under study are se-
quences of values over time. There are many properties of time series data that interest
the researcher. Besides analysis in the frequency domain, the statistical distribution of
values provides important information about the data at hand. With the Time Series
Analyser, we investigated the use of sonification in analysing the statistical properties
of amplitude distributions in time series data. From the domain science’s point of view,
this can be used as a method for the classification of signals of unknown origin, or for the
classification of surrogate data to be used in experiments in telecommunication systems.
8.1 Time Series Analyser1
The analysis of time series data plays a key role in many scientific disciplines. Time
series may be the result of measurements, unknown processes or simply digitised signals
of a variety of origins. Although usually visualised and analysed through statistics, the
inherent relationship to time makes them particularly suitable for a representation by
means of sound.
1This section is based on the SonEnvir ICAD paper Frauenberger et al. (2007).
8.1.1 Mathematical background
The statistical analysis of time series data is concerned with the distribution of values
without taking into account their sequence in time. As we will see later, changing the
sequence of values in a time series completely destroys the frequency information while
keeping the statistical properties intact. The best-known statistical properties of
time series data are the arithmetic mean (8.1) and the variance (8.2).
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \qquad (8.1)

\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad (8.2)
However, higher order statistics provide more properties of time series data, describing
the shape of the underlying probability function in more detail. They all derive from the
statistical moments of a distribution defined by
\mu'_n = \sum_{i=1}^{n} (x_i - \alpha)^n P(x) \qquad (8.3)
where n is the order of the moment, α the value around which the moment is taken
and P(x) the probability function. The moments are most commonly taken around the
mean, which is equivalent to the first moment µ1. The second moment around the mean
(or second central moment) is equivalent to the variance σ², i.e. the squared
standard deviation σ.
Higher order moments define the skewness and kurtosis of the distribution. The skewness
is a measurement for the asymmetry of the probability function, meaning a distribution
has high skewness if its probability function has a more pronounced tail toward one end
than to the other. The skew is defined by
\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} \qquad (8.4)
with µi being the i-th central moment. The kurtosis describes the 'peakedness' of a
probability function; the more pronounced peaks there are in the probability function,
the higher the kurtosis of the distribution. It is defined by
\beta_2 = \frac{\mu_4}{\mu_2^2} \qquad (8.5)
Both values distinguish time series data and are significant properties in signal processing.
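Equations 8.1-8.5 can be computed directly from the samples. The following is an illustrative Python sketch (the SonEnvir tools themselves are written in SuperCollider); it uses the biased sample moments exactly as defined above.

```python
def moments(xs):
    """Sample mean (8.1), variance (8.2), skewness (8.4) and kurtosis
    (8.5), computed from the central moments of the data."""
    n = len(xs)
    mean = sum(xs) / n
    mu = lambda k: sum((x - mean) ** k for x in xs) / n  # k-th central moment
    var = mu(2)
    skew = mu(3) / var ** 1.5   # gamma_1 = mu_3 / mu_2^(3/2)
    kurt = mu(4) / var ** 2     # beta_2  = mu_4 / mu_2^2
    return mean, var, skew, kurt
```

For a symmetric distribution the skewness vanishes, while the kurtosis reflects how peaked the distribution is.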
From the SDSM point of view, the inherent time line and the typically large numbers of
data values in time series data suggest the use of the most direct approach to auditory
perceptualisation - audification. When interpreted as a sonic waveform the statistical
properties of time series data become acoustical dimensions, which may be perceived:
The variance corresponds directly to the power of the signal, and hence (though non-
linearly) to its perceived loudness. The mean, however, is nothing more than an offset
and is not perceivable. The question of interest is whether the skewness and the kurtosis
of signals can be related to perceptible dimensions as well.
8.1.2 Sonification tools
In order to investigate the statistical properties of time series data by audification, we
first developed a simple tool that allows for defining arbitrary probability functions for
noise. Subsequently, we built a more generic analysis tool that makes it possible to
analyse any kind of signal. This tool was also used as the underlying framework for the
experiment described in section 8.2.
8.1.3 The PDFShaper
The PDFShaper is an interactive audification tool that allows users to draw probability
functions and listen to the resulting distribution as an audification in real-time. Figure
8.1 shows the user interface.
PDFShaper provides four graphs (top down): the probability function, the mapping func-
tion, the measured histogram and the frequency spectrum of the time series synthesised
as specified by the probability function. The tool allows the user to interactively draw
in the first graph to create different kinds of amplitude distributions. It then calculates
a mapping function which is defined by
C(x) = g^{-1}(x) = \int_0^x P(t)\,dt \qquad (8.6)
where C(x) is the cumulative probability function and g(x) is a mapping function that, if
applied to a uniform distribution y, produces values according to the probability function
P(t). This mapping function essentially shapes values from a uniform distribution to
any desired probability function P (t).
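The mapping in equation 8.6 is inverse transform sampling; over a sampled (binned) probability function it can be sketched as follows. This is illustrative Python, not the PDFShaper's SuperCollider implementation, and the function names are ours.

```python
import bisect

def make_mapper(pdf):
    """Given a sampled probability function (non-negative weights over
    equal-width bins), return g: (0,1] -> bin index, mapping uniform
    samples to the target distribution by numerically inverting the
    cumulative C(x) (inverse transform sampling)."""
    total = float(sum(pdf))
    cdf, acc = [], 0.0
    for p in pdf:
        acc += p / total
        cdf.append(acc)                  # C at the right edge of each bin
    def g(u):
        i = bisect.bisect_left(cdf, u)   # smallest bin with C >= u
        return min(i, len(pdf) - 1)
    return g
```

Applying g to white (uniform) noise yields a noise signal whose amplitude histogram approximates the drawn probability function, as shown in the PDFShaper's third graph.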
In the screenshot shown, the probability function is drawn into the top graph as a shifted
exponential function. After applying the mapping function shown in the second graph to
white noise, the third graph shows the real-time histogram of the result. It approximately
resembles the target probability function. Note that both skew and kurtosis are relatively
high in this example as the probability function is shifted to the right and has a sharp
peak.
Figure 8.1: The PDFShaper interface
8.1.4 TSAnalyser
The TSAnalyser is a tool to load any time series data and analyse its statistical properties.
Figure 8.2 shows the user interface.
Besides providing statistical information about the loaded file (aiff format), it shows a
histogram and a spectrum. Its main feature is the ability to 'scramble' the signal.
Figure 8.2: The TSAnalyser interface
That is, it randomly re-orders the values in the time series and hence destroys all
spectral information. When analysing amplitude distributions, the spectral information
is often distracting. Scrambling a signal will result in a noise-like sound with the same
statistical properties as the original. In the screenshot the loaded file is a speech sample
that comes with every SuperCollider installation. When scrambled, the spectrum at the
bottom shows an almost uniform distribution in the frequency domain.
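Scrambling itself is a plain random permutation of the samples; a minimal Python sketch (the TSAnalyser is implemented in SuperCollider):

```python
import random

def scramble(signal, seed=None):
    """Randomly permute the samples of a signal: the amplitude
    histogram (and hence mean, variance, skew and kurtosis) is
    untouched, while all spectral/temporal structure is destroyed."""
    rng = random.Random(seed)
    out = list(signal)
    rng.shuffle(out)
    return out
```

Because only the order of samples changes, every statistic computed from the value distribution is identical before and after scrambling.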
Both PDFShaper and TSAnalyser are implemented in SuperCollider, and are available as
part of the SonEnvir Framework via svn2.
2https://svn.sonenvir.at/svnroot/SonEnvir/trunk/src/Framework/
8.2 Listening test
The experiment described here was designed to investigate whether the higher order
statistical properties of arbitrary time series data are perceptible when rendered by au-
dification. If so, what are the perceptual dimensions that would correlate to these
properties, and what are the just noticeable difference levels?
8.2.1 Test data
The first challenge in designing the experiment was to create appropriate data. They
should not contain any spectral information and the statistical properties should be
fully controllable, ideally independently. Unfortunately, it is a non-trivial task to define
probability functions with certain statistical moments, as this is an ill-defined problem.
We settled on a random number generator for the Lévy skew alpha-stable distribution
(see Wikipedia (2007)). It was chosen because it features parameters that directly control the
resulting skew and kurtosis, which can also be made atypically high. It is defined by the
probability function
f(x; \alpha, \beta, c, \mu) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \varphi(t)\, e^{-itx}\, dt \qquad (8.7)

\varphi(t) = e^{it\mu - |ct|^{\alpha} (1 - i\beta\,\mathrm{sign}(t)\,\Phi)} \qquad (8.8)

\Phi = \tan\!\left(\frac{\pi\alpha}{2}\right) \qquad (8.9)

where α is an exponent, β directly controls the skewness, and c and µ are scaling
parameters. There is no analytic solution to the integral, but there are special cases in
which the distribution behaves in specific ways. For example, for α = 2 the distribution
reduces to a Gaussian distribution. Fortunately, the Lévy distribution is implemented
as a number generator in the GNU Scientific Library (GSL), see GSL Team (2007). It
allows for generating sequences of numbers of any length for a distribution determined
by providing the α and β parameters.
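GSL's stable-distribution generator is based on the Chambers-Mallows-Stuck method; a pure-Python sketch for the standard case (c = 1, µ = 0, α ≠ 1) is given below. This is our illustration of the sampling method, not the code used in the project, and the exact parametrisation may differ in detail from GSL's.

```python
import math, random

def stable_variate(alpha, beta, rng):
    """One sample from a standard Levy alpha-stable distribution via
    the Chambers-Mallows-Stuck method; valid for alpha != 1."""
    u = rng.uniform(-math.pi / 2, math.pi / 2)  # uniform angle
    w = rng.expovariate(1.0)                    # unit exponential
    zeta = beta * math.tan(math.pi * alpha / 2)
    b = math.atan(zeta) / alpha
    s = (1 + zeta ** 2) ** (1 / (2 * alpha))
    return (s * math.sin(alpha * (u + b)) / math.cos(u) ** (1 / alpha)
            * (math.cos(u - alpha * (u + b)) / w) ** ((1 - alpha) / alpha))

# For alpha = 2 this reduces to a Gaussian with variance 2.
```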
For the experiment we generated 24 signals with skew values ranging from -0.19 to 0.25
and kurtosis ranging from 0.17 to 14. It turned out to be impossible to completely de-
couple skew from kurtosis. So, we decided to generate two sets, one that has insignificant
changes in skew, but a range in kurtosis of 0.16 to 14, while the other set covered the full
range for skew and 0.15 to 5 for kurtosis. All signals were normalised to have a variance
of 0.001 and were 3 seconds long (at a samplerate of 44.1 kHz) with 0.2 seconds fade-in
and fade-out times.
8.2.2 Listening experiment
The experiment was designed as a similarity listening test. Participants were listening to
sequences of three signals and had to select the two signals they perceived as being most
similar. Each sequence was composed of the signal under investigation (each of the 24),
a second randomly chosen signal out of the 24, and the first signal scrambled; the three
signals were in random order. It was pointed out to participants that they would not hear
two exactly identical sounds within the sequence, but they were asked to select the two
that sounded most similar. The signal under investigation and its scrambled counterpart
were essentially different signals, but shared identical statistical properties. It was not
specified which quality of the sound they should listen for to make this decision. This
and the scrambling were done to make sure that participants focused on a generic quality
of the noise rather than on specific events within the signals.
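The construction of one trial sequence can be sketched as follows; this is our illustration of the design described above, not the experiment's actual code, and all names are ours.

```python
import random

def make_trial(signals, target_index, rng):
    """One sequence of the similarity test: the target signal, a
    randomly chosen different signal, and the scrambled target, in
    random order. The target/scrambled pair is the 'correct' answer,
    as both share identical statistical properties."""
    others = [i for i in range(len(signals)) if i != target_index]
    distractor = signals[rng.choice(others)]
    scrambled = list(signals[target_index])
    rng.shuffle(scrambled)
    trial = [("target", signals[target_index]),
             ("distractor", distractor),
             ("scrambled", scrambled)]
    rng.shuffle(trial)
    return trial
```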
After a brief written introduction into the problem domain and the nature of the exper-
iment, participants started off with a training phase of three sequences to learn the user
interface. For this training phase, the signals with the largest differences in skew and
kurtosis were chosen to give people an idea of what to expect. Subsequently, each of the
sets was played: set one with 9 sequences, set two with 15. The order of the sets
was alternated with each participant. Participants were able to replay each sequence as
often as they wished and to adjust the volume to their taste. Figure 8.3 shows the user
interface used.
Figure 8.3: The interface for the time series listening experiment.
A post-questionnaire asked for the sound quality participants used to distinguish the
signals and asked them to assign three adjectives to describe this quality. Furthermore,
participants were asked whether they could tell any difference between the sets,
and whether they felt there was any learning effect, i.e., whether the task became easier
during the experiment.
8.2.3 Experiment results
Eleven participants took part in the experiment, most of them working colleagues or
students at the institute. Four participants were members of the SonEnvir team and
had a more substantial background on the topic, which, however, did not seem to have any
impact on their results.
The collected data shows that there is a significant increase in the probability of choosing
the correct signals as the difference in kurtosis and skew increased. Figure 8.4 shows
the average probabilities in four different ranges of ∆ kurtosis. The skew in this set
was nearly constant (±0.001), so the resulting difference in correct answers is related
to the change in ∆ kurtosis.

Figure 8.4: Probability of correctness over ∆ kurtosis in set 1

While up to a difference of 5 in kurtosis the probability
is only insignificantly higher than 0.333 (the probability of random answers), and even
decreases, there is a considerable increase thereafter, topping at over 70% at differences
of around 11. This indicates that 5 is the threshold for just noticeable differences for
kurtosis. This is also supported by the results from set 2 as shown in figure 8.5.
For skewness the matter was more difficult as we had no independent control over
it. Although the data from set 2 suggest that there is an increase in probability with
increasing difference in skew (as shown in figure 8.6), this might also be related to the
difference in kurtosis.
Figure 8.5: Probability of correctness over ∆ kurtosis in set 2

Figure 8.6: Probability of correctness over ∆ skew in set 2

Looking at the probability of correctness over both the difference in kurtosis and the
difference in skew (as in figure 8.7) reveals that it is unlikely that the increase is related
to the change in ∆ skew. While in every spine in which ∆ skew is constant the
probability increases with increasing ∆ kurtosis, this is not the case vice versa.
Figure 8.7: Probability of correctness over ∆ skew and ∆ kurtosis in set 2
Summarising, we found evidence that participants could reliably detect changes in kur-
tosis greater than 5, but we did not find comparable evidence for skewness. This
may indicate that we need a different dataset with bigger differences in
skew and small values for kurtosis. For this, however, another family of
distributions must be found.
The number of times participants used the replay option seemed to have no impact on
their performance. Figure 8.8 shows the number of replays of all data points over ∆
kurtosis. Red crosses indicate correct answers, black dots incorrect answers. Although
participants replayed the sequence more often when the difference in kurtosis was small,
there is no evidence that they were more successful when using more replays.
The answers to the post-questionnaire must be seen in the light of the data analysis
above. The quality participants reported as driving their decisions must be linked to the
kurtosis rather than to the skewness in the signal. The most common answers for this quality
were crackling and the frequency of events. Others included roughness and spikes.
However, some participants also stated that they heard different colours of noise and
other artefacts related to the frequency spectrum. This is a common effect when being
exposed to noise signals for a longer period of time. Even if the spectrum of noise
is not changing at all (as in our case), humans often start to imagine hearing tones
and other frequency-related patterns.

Figure 8.8: Number of replays over ∆ kurtosis in set 2

Asked for adjectives to describe this quality, the
participants provided cracking, clicking, sizzling, annoying, rhythmic, sharp, rough and
bright/dark. In retrospect, this correlates nicely with the kurtosis being the ’peakedness’
of the probability function.
There was no agreement over which set was easier. Most participants said there was
hardly any difference, while some named one or the other. Finally, on average,
people felt that there was no learning curve involved, and the examples were short enough
for them not to tire of listening to them.
8.2.4 Conclusions
In this section we presented an approach for analysing statistical properties of time series
data by auditory means. We provided some background on the mathematics involved
and presented the tools for audification of time series data that were developed. Subse-
quently, we described a listening test designed to investigate the perceptual dimensions
that would correlate with higher order statistical properties like skew and kurtosis. We
discussed the data chosen and the design of the experiment. The results show that
there is evidence that participants improved in distinguishing noise signals as the dif-
ference in kurtosis increased. The data suggests that in this setting the just noticeable
difference was 5. However, for skew we were not able to find similar evidence. In a
post-questionnaire we probed for the qualities that participants used to distinguish the
signals and obtained a set of related adjectives.
Future work will have to investigate why no effect was found for skewness
in the signals. It might have been the case that our range of values did not allow
for segregation by skew, and a different data source will have to be found to have
independent control over skew. However, it might also be the case that skew is not
perceivable in direct audification and a different sonification approach has to be chosen
to make this property perceptible.
In SDSM terms, the listening experiment respected the 3 second echoic memory time
limit, maximising the number of data points to fit into that time frame by audifying at
a samplerate of 44.1 kHz.
Chapter 9
Examples from Neurology
9.1 Auditory screening and monitoring of EEG data
This chapter describes two software implementations for EEG data screening and realtime
monitoring by means of sonification. Both have been designed in close collaboration with
our partner institution, the University Clinic for Neurology at the Medical University Graz.
Both tools were tested in depth with volunteers, and then tested with the expert users
they are intended for, i.e. neurologists who work with EEG data daily. In the course
of these tests, a number of improvements to the designs were realised; both the tests
and the final versions of the tools are described in detail here. The scope of the reported
work is intended to provide an integrated description and analysis of all aspects of the
design process, from sonification design issues, interaction choices, and user acceptance to
steps towards clinical use.
This work is described with much more neurological background in the PhD thesis by
Annette Wallisch (Wallisch (2007), in German). This chapter is based on a SonEnvir
paper for ICAD 2007 (de Campo et al. (2007)), and this work is also briefly documented
online in the SonEnvir data collection1, with accompanying sound examples.
9.1.1 EEG and sonification
As the general background for EEG and sonification is covered extensively in a number
of papers (Baier and Hermann (2004); Hermann et al. (2006); Hinterberger and Baier
(2005); Mayer-Kress (1994); Meinicke et al. (2002)), it is kept rather brief here.
EEG is short for electroencephalogram, i.e. the registration of the electrical signals
coming from the brain that can be measured on the human head. There are standard
systems for locating electrodes on the head, called montages; e.g., the so-called
10-20 system, which spaces electrodes at similar distances over the head (see Ebe and
1 http://sonenvir.at/data/eeg/
Homma (2002) and many other EEG textbooks).
The signal from a single electrode is often analysed in terms of its characteristic frequency
band components: The useful frequency range is typically given as 1-30 Hz, sometimes
extended a little higher and lower. Within this range, different frequency bands have been
associated with particular activities and brain states; e.g., the 'alpha' range lies between 8
and 13 Hz and is associated with a general state of relaxation and inactivity of the brain
region for visual perception; thus alpha activity is most prominent with eyes closed.
For both sonification designs presented, we split the EEG signal into frequency ranges
which closely correspond to the traditional EEG bands2, as shown in table 9.1.
Table 9.1: Equally spaced EEG band ranges.
EEG band name frequency range
deltaL(ow) 1 - 2 Hz
deltaH(igh) 2 - 4 Hz
theta 4 - 8 Hz
alpha (+ mu) 8 - 16 Hz
beta 16 - 32 Hz
gamma 32 - 64 Hz
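The band split of table 9.1 can be illustrated with a naive DFT-based energy measure; this is our Python sketch for clarity only, as the actual players use real-time filter banks in SuperCollider.

```python
import math

# Octave-spaced EEG bands of table 9.1, in Hz
BANDS = {"deltaL": (1, 2), "deltaH": (2, 4), "theta": (4, 8),
         "alpha": (8, 16), "beta": (16, 32), "gamma": (32, 64)}

def band_energies(signal, sr):
    """Energy of `signal` in each EEG band, via a direct (O(n^2)) DFT;
    each frequency bin's energy is added to the band containing it."""
    n = len(signal)
    energies = dict.fromkeys(BANDS, 0.0)
    for k in range(1, n // 2):              # skip DC and negative bins
        f = k * sr / n                      # bin centre frequency in Hz
        re = sum(signal[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(signal[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        for name, (lo, hi) in BANDS.items():
            if lo <= f < hi:
                energies[name] += re * re + im * im
    return energies
```

A 10 Hz test tone, for instance, deposits essentially all of its energy in the alpha band.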
9.1.2 Rapid screening of long-time EEG recordings
For a number of neurological problems, it is standard practice to record longer time
stretches of brain activity. A stationary recording usually lasts more than 12 waking
hours; night recordings are commonly even longer, up to 36 hours. For people with
so-called ’absence’ epileptic seizures (often children), recordings with portable devices
are made over similar stretches of time. These recordings are then visually screened, i.e.
looked through in frames of 20-30 seconds at a time; this process is both demanding
and slow.
For the particular application toward ’absences’, rapid auditory screening is ideal: these
seizures tend to spread over the entire brain, so the risk of choosing only few electrodes
to screen acoustically is not critical; furthermore, the seizures have quite characteristic
features, and are thus relatively easy to identify quickly by listening. For more gen-
eral screening, finding time regions of interest quickly (by auditory screening) potentially
reduces workload and increases overall diagnostic safety. With visual and auditory screen-
ing combined, the risk of failing to notice important events in the recorded brain activity
2The alpha band we employ is slightly wider than the common 8-13 Hz; we merge it with the slightly
higher mu-rhythm band to maintain equal spacing.
is quite likely reduced.
9.1.3 Realtime monitoring during EEG recording sessions
A second scenario that benefits from sonification is realtime monitoring while recording
EEG data. This is a long-term attention task: an assistant stays in a monitor room next
to the room where the patient is being recorded; s/he watches both a video camera view
of the patient, and the incoming EEG data on two screens. In the event of atypical EEG
activity (which must be noticed, so one can intervene if necessary), a patient may or
may not show peculiar physical movements. Watching the video camera, one can easily
miss atypical EEG activity for a while.
Here, sonification is potentially very useful, because it can alleviate constant atten-
tion demands: One can easily habituate to a background soundscape, which is known
to represent ’everything is normal’. When changes in brain activity occur, the sound-
scape changes (in most cases, activity is increased, which increases both volume and
brightness), and this change in the realtime-rendered soundscape automatically draws
attention.
A sonification design that aims to render EEG data in real time is also useful for studying
brain activity as recorded by EEG devices at its natural speed: One can easily portray
activity in the traditional EEG frequency bands acoustically; as many of the phenomena
are commonly considered to be rhythmical phenomena, auditory presentation is partic-
ularly appropriate here, see Baier et al. (2006). Realtime uses of biosignals have other
applications too, see e.g. Hinterberger and Baier (2005); Hunt and Pauletto (2006).
9.2 The EEG Screener
9.2.1 Sonification design
For rapid EEG data screening, there is little need for an elaborate sonification design.
As the signal to be sonified is a time signal, and a playback speed of several tens of thousands of
points per second is deemed useful for screening, straightforward audification is the
obvious choice recommended by the Sonification Design Space Map. Not doing any
other processing allows for keeping the rich detail of the signals entirely intact. With
common EEG sampling rates around 250 Hz, a typical speedup factor is 60x faster than
real time, which transposes our center band (alpha, 8-16Hz) to 480-960 Hz, well in the
middle of the audible range. For more time resolution, one can go down to 10x, or for
more speedup, up to 360x. See Figure 9.1 for locations on the Sonification Design Space
Map.
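The transposition arithmetic above is simple enough to state as code; a sketch (function name is ours):

```python
def transpose_band(band_hz, speedup):
    """Frequency range an EEG band lands in when the recording is
    audified `speedup` times faster than real time."""
    lo, hi = band_hz
    return (lo * speedup, hi * speedup)
```

For example, the alpha band (8-16 Hz) at the default speedup of 60x lands at 480-960 Hz, in the middle of the audible range.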
Figure 9.1: The Sonification Design Space Map for both EEG Players.
As EEG files have no typical total size (they can be anything from a few minutes to 36 hours
and more), Data Anchors are given for one minute and for one hour (center, and far right).
The labels Scr x10, Scr x60, and Scr x360 show the map locations for minimum, default, and
maximum settings of speedUp, i.e. the time scaling of the EEGScreener (bottom right). The
labels RTP 1band and RTP 6bands show the locations for a single band and all six bands of
the EEGRealtimePlayer. Note that the use of two audio channels moves both of these designs
inwards along the ’number of streams’ axis, which is not shown here for simplicity.
This allows for wide ranges of time scales of local structures in the data to be put into
the optimum time window (the ca. 3 second window of echoic memory, see section 5.1
and de Campo (2007b)), while keeping the inner EEG bands well in the audible range;
if needed, one can compensate for reduced auditory sensitivity to the outer bands by
raising their relative amplitudes.
A lowpass filter for the EEG signal is available from 12 to 75 Hz, with a default value at
30 Hz, to provide the equivalent of the visual smoothing used in EEG viewer software. Our
users asked for this feature, and it is a simple way to reduce higher-band activity, which
is mostly considered noise (from a visual perspective, that is).
A choice is provided between the straight audified signal, and a mix of six equal-
bandwidth layers, which can all be individually controlled in volume. This allows both
for focused listening to individual bands of interest, and for identification of the EEG
band in which a particular audible component occurs. A further reason to include this
138
Figure 9.2: The EEGScreener GUI.
The top rows are for file, electrodes, and time range selection. Below the row for playback
and note-taking elements are the playback parameter controls, and band filtering display and
controls.
band-splitting was to introduce the concept in a simpler form, such that users could
transfer the idea to their understanding of the realtime player.
9.2.2 Interface design
The task analysis for the Screener demanded that a graphical user interface be simple
to use (low-effort, little training needed), fast, and able to keep reproducible results
of screening sessions. Furthermore, it should provide choices of what to listen
to, and visual feedback of what exactly one is hearing, and how. The GUI elements are
similar to sound file editors (which audio specialists are familiar with, but EEG specialists
usually are not).
File, electrode, and range selection
The button Load EDF is for selecting a file to be screened. Currently, only .edf3 files
are supported, but other formats are easy to add if needed. The text views next to it
(top line) provide file data feedback: file name, duration, and montage type the file was
recorded with4. The button Montage opens a separate GUI for choosing electrodes by
location on the head (see figure 9.3).
Figure 9.3: The Montage Window.
It allows for electrode selection by their location on the head (seen from above, the triangle
shape on top being the nose). One can drag the light gray labels and drop them on the white
fields ’Left’ and ’Right’.
The popup menus Left and Right let users choose which electrode to listen to on which
of the two audio channels. Like many soundfile editors, the signal views Left and Right
show a full-length overview of the signal of the chosen electrodes. During screening, the
current playback position is indicated by a vertical cursor.
The range slider Selection and the number boxes Start, Duration, End show the current
selection and allow for selecting a range within the entire file to be screened. The number
box Cursor shows the current playback position numerically. The signal views Left Detail
and Right Detail show the waveform of the currently selected electrodes zoomed in for
3 A common format for EEG files, see http://www.edfplus.info/
4 As edf files do not store montage information, this is inferred from the number of EEG channels in the file; at our institution, all the raw data montage types have different numbers of channels.
the current selection.
Playback and note taking
The buttons Play, Pause, Stop start, pause, and stop the sound.
The button Looped/No Loop switches between once-only playback and looped playback
(with a click to indicate when the loop restarts). The button Filters/Bypass switches
playback between Bypass mode (the straight audified signal, only low-pass-filtered), and
Filters mode, the mixable band-split signal.
The button Take Notes opens a text window for taking notes during screening. The edf
file name, selected electrodes and time region, and current date are pasted in as text
automatically. The button Time adds the current playback time at the end of the notes
window’s text, and the button Settings adds the current playback settings (see below)
to the notes window text.
To let the user concentrate on listening while screening a file, it is possible to stay on
the notes window entirely: key shortcuts allow for pausing/resuming playback (e.g. to
type a note), for adding the current time as text (so one can take notes for a specific
time), and for adding the current playback settings as text.
Playback Controls
These control the parameters of the screener’s sound synthesis.
speedUp sets the speedup factor, with a range from 10 to 360; the default value of
60 means that one minute of EEG is presented within one second. Note that this is
straightforward tape-speed acceleration, which preserves full signal detail. The option
to compare different time-scalings of a signal segment allows for learning to distinguish
mechanical (electrode movements) and electrical artifacts (muscle activity) from EEG
signal components. lowPass sets the cutoff frequency for the lowpass filter, range be-
tween 12 and 75 Hz, with a default of 30 Hz. clickVol sets the volume of the loop
marker click, and volume sets the overall volume.
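As a back-of-the-envelope illustration of this tape-speed principle (a Python sketch, not the actual SC3 implementation; the EEG sampling rate of 256 Hz is an assumption for the example), one can compute the playback rate and the audible range into which each EEG band is transposed:

```python
# Illustrative sketch of tape-speed audification, not the actual SC3 code.
# The EEG sampling rate of 256 Hz is an assumption for this example.
EEG_SAMPLE_RATE = 256  # Hz

def playback_rate(speed_up):
    """Audio rate at which the recorded EEG samples are played back."""
    return EEG_SAMPLE_RATE * speed_up

def transpose_band(lo_hz, hi_hz, speed_up):
    """Frequency range an EEG band occupies after speedup."""
    return (lo_hz * speed_up, hi_hz * speed_up)

# At the default speedUp of 60, one minute of EEG plays in one second,
# and e.g. the alpha band (8-13 Hz) lands well inside the audible range:
assert playback_rate(60) == 15360
assert transpose_band(8, 13, 60) == (480, 780)
```

At the minimum setting of 10, the same alpha band appears at 80-130 Hz; at the maximum of 360, at roughly 2.9-4.7 kHz, which illustrates why the inner EEG bands stay audible across the whole speedUp range.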
In Bypass mode, only the meter views are visible in this section, and they display the
amount of energy present in each of the six frequency bands (deltaL, deltaH, theta,
alpha, beta, gamma). In Filters mode, the controls become available, and one can raise
the level of bands one wants to focus on, or turn down bands that distract from details
in other bands. The buttons All On / All Off allow for quickly resetting all levels to
defaults.
9.3 The EEG Realtime Player
The EEGRealtimePlayer allows listening into details of EEG data in real time (or up to
5x faster when playing back files), in order to follow temporal events in or near their
original rhythmic contour. This design (and its eventual distribution as a tool) has been
developed in two stages:
Stage one is a data player, which plays recorded EEG data files at realtime speed with
the same sonification design (and the same adjustment facilities) as the final monitor
application. This allows for familiarising users with the range of sounds the system can
produce, for experimenting with a wide variety of EEG recordings, and for finding settings
which work well for a particular situation and user. This stage is described here.
Stage two is an add-on to the software used for EEG recording, diagnosis, and administration
of patient histories at the institute. Currently, this stage is implemented as a
custom version of the EEG recording software, which simulates data being recorded live
(by reading a data file) and sends the 'incoming' data over the network to a special
version of the Realtime player (i.e. the sound engine and interface). Here, the incoming
data is sonified with the same approach as in the player-only version. Eventually, this
second program is meant to be implemented within the EEG software itself.
9.3.1 Sonification design
The sonification design for realtime monitoring is much more elaborate than the Screener's.
It was prototyped by Robert Holdrich in MATLAB, and subsequently adapted and im-
plemented for realtime interactive use in SC3 by the author. For a block diagram, see
fig. 9.4.
The EEG signal of each channel listened to is split into six bands of equal relative
bandwidth (one octave each: 1-2, 2-4, ... 32-64 Hz). Each band is sonified with its own
oscillator and a specific carrier frequency: based on a user-accessible fundamental
frequency baseFreq, the carriers are by default integer multiples of baseFreq (1, 2, ... 6). If
one wants to achieve more perceptual separation between the individual bands, one can
deform this overtone pattern with a stretch factor harmonic, where 1 is pure overtone
tuning:
carFreq = baseFreq * i * harmonic^(i-1)    (9.1)
The carrier frequency in each band is modulated with the band-filtered EEG signal,
thus creating a representation of the signal shape details as deviation from center pitch.
The amplitude of each oscillator band is determined by the amplitude extracted from
the corresponding filter-band, optionally stretched by an expansion factor contrast; this
creates a stronger foreground/background effect between bands with low energy and
bands with more activity.
For realtime monitoring as a background task, a second option for emphasis exists:
high activity levels activate an additional sideband modulation at carFreq * 0.25, which
creates a new fundamental frequency two octaves lower. This should be difficult to miss
even when not actively attending.
Figure 9.4: EEG Realtime Sonification block diagram.
Figure 9.5: The EEG Realtime Player GUI.
Note the similarities to the EEGScreener GUI; the main difference is the larger number of
synthesis control parameters.
Finally, for file playback, crossing the loop point of the current selection is acoustically
marked with a bell-like tone.
9.3.2 Interface design
Most elements (buttons, text displays, signal views, notes window) have the same func-
tions as in the EEGScreener. The main difference to the EEGScreener is that there are
many more playback controls, since the sonification model (as described above) is much
more complex.
The Playback controls are ordered by importance from top to bottom:
contrast ranges from 1 to 4; values above 1 expand the dynamic range, making active
bands louder and thus moving them to the foreground relative to average-activity bands.
For background monitoring, levels between 2 and 3 are recommended.
baseFreq is the fundamental frequency of the sonification, between 60 and 240 Hz; this can
be tuned to user taste - and our users have in fact expressed strong preferences for their
personal choice of baseFreq.
freqMod is the depth of frequency modulation of the carrier for each band. At 0, one
hears a pure harmonic tone with varying overtone amplitudes; at greater values, the pitch
of each band is modulated up and down, driven by the filtered signal of that band. Thus
the signal details of the activity in that band are rendered in high perceptual resolution.
A value of 1 is normal deviation.
emphasis fades in a new pitch two octaves below baseFreq for very high activity levels;
this can be used for extra emphasis in background monitoring.
harmonic is the harmonicity of the carrier frequencies: A setting of 1 means purely
harmonic carrier frequencies, less compresses the spectrum, and more expands it; this
can be used to achieve better perceptual band separation.
clickVol sets the volume of the loop marker click, volume sets the overall volume of
the sonification, and speed controls an optional speedup factor for file playback, with a
range from 1 to 5, 1 being realtime; in live monitoring mode, this control is disabled.
Band Filter Controls and Views
The buttons All On and All Off allow for setting all levels to medium or zero. The meter
views show the amount of energy present in each of the six frequency bands, and the
sliders next to them set the volume of each frequency band.
9.4 Evaluation with user tests
9.4.1 EEG test data
For development and testing of the sonification players described, a variety of EEG
recordings - containing typical epileptic events and seizures - was collected. This database
was assembled at the Department for Epileptology and Neurophysiological Monitoring
(University Clinic of Neurology, Medical University Graz), by using the in-house archive
system. It contains anonymous data of currently or recently treated patients.
For the expert user tests, three data examples were chosen, suited to each player's
special purpose. For the Screener, rather large data sets were selected, to test with a
realistic usage example. Two measurements of absences and one day/night EEG with
seizures localized in the temporal lobe were prepared. The Realtime Player was tested
with three short data files; one a normal EEG (containing eye movement artefacts and
alpha waves), and two pathological EEGs (generalized epileptic potentials, and fronto-
temporal seizures).
The experts we worked with considered the use of audition in EEG-diagnostics very
unusual. We expected them to find it difficult to associate sounds with the events, so
they did some preliminary sonification training: For all data examples, they could look
at the data with their familiar EEG viewer software after having listened first, and try to
match what they had heard with the visual graphs familiar to them.
9.4.2 Initial pre-tests
An initial round of tests was done to get a first impression of usability and data
appropriateness; it also contained experimental tasks (learning to listen). In order to obtain
independent and unbiased opinions, two interns were invited to test the first versions of
the screener and the realtime player by listening through the entire prepared database
at their own pace. They were instructed to take detailed notes of the phenomena they
heard (including inventing names for them), and where in which files; they spent roughly
40 hours on this task. The documentation of their listening experiments was then ver-
ified in internal re-listening and testing sessions. After these pre-tests, we decided to
reduce some parameter ranges to prevent users from choosing too extreme settings, and
we chose a smaller number of data sets for the second test round with expert users.
9.4.3 Tests with expert users
As the eventual success of these players depends on acceptance by the users in a clinical
setting, it was essential to do an evaluation with medical specialists. This was done
by means of two feedback trials; using the results of the primary expert test round,
the players were then improved in many details. For both players we made pre/post-
comparisons of user ratings between the different versions.
Even though we tested with the complete potential user group at our partner institution,
the test group is rather small (n=4); thus we consider the tests, and especially the open
question/personal interviews section, as more qualitative than quantitative data.
To prepare the four specialists for their separate test sessions, they were introduced to
sonification as a new way of evaluating and experiencing data in a group session.
For each EEG player a separate test session was scheduled, to avoid 'listening overload'
and potential confusion.
Questionnaire
The questionnaire contained the following 11 scales:

Table 9.2: Questionnaire scales for EEG sonification designs
1 Usability
2 Clarity of interface functions
3 Visual design of interface
4 Adjustability of sound (to individual taste)
5 Freedom of irritation (caused by sounds)
6 Good sound experience (i.e. pleasing)
7 Allows for concentration
8 Recognizability of relevant events in data by listening
9 Comparability (of observations) with EEG-Viewer software
10 Practicality in Clinical Use (estimated)
11 Overall impression (personal liking)

The ratings to give for each statement ranged from 1 (strongly disagree) to 5 (strongly
agree). In addition to the 11 standardized questions, space for individual documentation
and description was provided. Moreover, an open question asked for further comments,
observations, and suggestions.
Results of first expert tests
This initial round of tests resulted in a number of improvements in both players: Elab-
orate EEG waveform display and data range selection was added to both; the visual
layout was unified to emphasize elements common to both players; and the screener was
extended with band filtering, which is both useful in itself, and a good mediating step
toward the more complex realtime sonification design.
9.4.4 Analysis of expert user tests - EEG Screener 1 vs. 2
Optimizing the interface and interaction possibilities for version 2 of the Screener
improved most of its ratings substantially: it was considered to offer more comfortable use
(+1) and more attractive visual design (+1). The sound experience for the medical
specialists has improved somewhat (+0.5), while the freedom of irritation experienced
improved very much (+2.0). While all other criteria improved substantially, recognizabil-
ity of events, comparability with viewer software, and clinical practicality received lower
ratings (between -0.5 and -0.25). We suspect that the better ratings in the first test round
may have been due to enthusiasm about the novelty of this tool. In fact, personal conversation
with the expert users after the tests showed how strongly opinions differed: One user
did not feel ’safe’ and comfortable with the screener and could not trust his own hearing
skills enough to discriminate relevant information from (technical) artefacts by listening.
Figure 9.6: Expert user test ratings for both EEGScreener versions.
By contrast, the three others were quite relaxed and felt positively reassured to have
done their listening tasks properly and effectively. Furthermore, the users were probably
less motivated to compare the EEG viewer with the 'listening result' (as asked in one
question), since they had already done that carefully in the first tests.
Overall, all users reported much higher satisfaction with version 2 of the screener (+1).
The answers in the open comments section can be summarized as follows: All users
confirmed better usability, design, clarity and transparency of version 2. Some improve-
ments were suggested in the visualization of the selected EEG channels, in particular
when larger files are analysed. Moreover, integration of the sonification into the real
EEG viewer would be appreciated a lot. A plug-in version of the player for the EEG-
Software used (NeuroSpeed by B.E.S.T. medical) was already in preparation before the
tests; in effect, the expert users confirmed its expected usefulness.
9.4.5 Analysis of expert user tests - RealtimePlayer 1 vs. 2
The mean ratings for the second realtime player version show a positive shift in nearly
all scales of the questionnaire. Moreover, the range of the ratings is smaller than before,
so the answers were more consistent. The best ratings were given for visual design (+1),
Figure 9.7: Expert user test ratings for both RealtimePlayer versions.
adjustability of sound (+1) and comparability to viewer (+1.5), all rated 'good
to very good'. The 'overall impression' was now rated 'good' (+1), as were
'usability' (+0.5), 'clarity of interface' (+0.5), and 'good sound experience' (+1). The
aspects 'recognizability of relevant EEG events' (+1) and 'practical application' (+1)
were rated as similarly satisfying. The only item that remains at the same mean rating
is 'freedom of irritation', rated a little better than average. The same rating was
given for 'allows for concentration' (+1.5), which has improved very much.
Probably, these two aspects correspond to each other: in spite of the improved control
of irritating sounds and a learning effect, the users were still untrained in coping with
the rather complex sound design. This sceptical position was taken in particular by two
users, affecting items 5 to 9. All in all, the ratings indicate good progress in the realtime
player’s design. This may well have been influenced the strong time restraints on these
tests: As our experts have very tight schedules in clinical work, it has been difficult to
obtain enough time for reasonably unhurried, pressure-free testing.
Comparing the ratings across the two first versions, the Realtime Player 1 was not rated
as highly as the Screener 1. We attribute this to the higher complexity of the sound
design (which did not come across very clearly under the time pressure given), the related
non-transparency of some parameter controls, and to ensuing doubts about the practical
benefit of this method of data analysis. Only the rating for irritation is better than for
Screener 1, which indicates that the sound design is aesthetically viable for the users.
All these concerns were addressed in the Realtime Player 2: In order to clarify the band-
splitting technique, GUI elements indicate the amount of power present in each band,
and allow for interactive choice of which bands to listen to; fewer parameter controls are
made available to the user5, with simpler and clearer names. Much more detailed help
info pages are also provided now.
Finally, band-splitting (adapted to audification) was integrated into the Screener 2 as
well, which gives users a clearer understanding of this concept across different sonification
approaches.
9.4.6 Qualitative results for both players (versions 2)
For both players, all users mentioned easy handling (usability), good visual design, and
transparency of functionality. Further positive comments on the Screener were 'higher
creativity' (by using the frequency controls) and that irritating sounds had nearly
disappeared. One user explained this by a training effect, and we agree: it seems that as users
learn to interpret the meaning of ”unpleasant” sounds (such as muscle movements), the
irritation disappears. Regarding the realtime player, users mentioned good visual corre-
lation with the sound, because of the new visual presentation of EEG on the GUI. One
user noted that acoustical side-localisation of the recorded epileptic seizure works well.
Further improvements were suggested: For both players, the main wish is synchronization
of sound and visual EEG representation (within the familiar software). In the case of realtime
monitoring, this would allow better comparison of the relevant activities. As far as screening
is concerned, the visual representation of larger files on the GUI was considered not very
satisfying.
For the realtime player, presets of the complex parameters for specific seizure types
were suggested as very helpful. Moreover, usability could still be improved a bit more
(though no specific wishes were given), and irritating sounds should be reduced further.
This wish may also be due to the fact that the parameter controls offered for reducing
disturbing sounds may not have been used fully; this can likely be addressed by more
training.
9.4.7 Conclusions from user tests
According to the experts’ evaluation of the EEG Screener, intensive listening training
will be essential for its effective use in clinical practice - in spite of improved usability and
5 Version 1 had some visible controls mainly of interest to the developer.
acceptance of the second version. As the visual mode is still dominant in clinical EEG
diagnostics and data analysis, widespread use of sonification tools will require dedicated
time and training management. After such training, our new tools may well
help to successively reduce effort and time in data analysis, decrease clinical diagnostic
risk, and, in the longer term, offer new ways of exploring EEG data.
9.4.8 Next steps
A number of obvious steps could be taken next (given followup research projects):
For the Realtime Player, the top priority would be integration of the network connec-
tion for realtime monitoring during EEG recording sessions. Then, user tests in real
world long-term monitoring settings can be conducted. These tests should result in
recommended synthesis parameter presets for different usage scenarios.
For the sound design, we have experimented with an interesting variant which empha-
sizes the rhythmic nature of the individual EEG bands more (see Baier et al. (2006);
Hinterberger and Baier (2005)). This feature can be made available as an added user
parameter control (’rhythmic’), with a value of 0 maintaining the current sound design,
and 1 accentuating the rhythmic features more strongly.
For both Realtime Player and Screener, eventual integration into the EEG administration
software used at our clinic was planned; however, this can only be done after another
round of longer-term expert user tests, and when the ensuing design changes have been
finalised.
9.4.9 Evaluation in SDSM terms
The main contributions to the Sonification Design Space Map concept resulting from
work on the EEG players were the following lessons:
• Adopt domain concepts and terminology wherever possible (band splitting).
• Make interfaces as simple and user-friendly as possible.
• Provide lots of visual support for what is going on (here, show band amplitudes).
• Provide opportunities to understand complex representations interactively, by providing options to take them apart (here, listening to single bands at a time).
• Give users enough time to learn (this did not happen for the Realtime Player).
Chapter 10
Examples from the Science by Ear Workshop
For more background on the Science By Ear workshop, see section 4.2, and here1.
The dataset LoadFlow and the experiments made with it in the SBE Workshop are
instructive basic examples; they are given as first illustrations of the Sonification Design
Space Map in section 5.1.
Other SBE datasets and topics (EEG, Ising, UltraWideband, Global Social Data) were
elaborated in more depth in mainstream SonEnvir research activities, and are thus cov-
ered in the examples from the SonEnvir research domains. The remaining two datasets,
RainData, and Polysaccharides, are described briefly here for completeness.
10.1 Rainfall data
These data were provided and prepared by Susanne Schweitzer and Heimo Truhetz
of the Wegener Center for Climate and Global Change, Graz. The data describe the
precipitation per day over the European alpine region from 01.01.1980 to 01.01.1991.
Additionally, associated orographic information (i.e. describing the average height of
the area) was provided. Such data are quite common in climate physics research. The
precipitation for 24 hours is measured as the total precipitation between 6:00 UTC2 and
6:00 UTC of the next day.
The data were submitted in a single large binary file of the following format: Each single
number is precipitation data in mm/day over the European alpine region (latitude 49.5N-
43N, longitude 4E-18E) with 78 x 108 grid points. The time range covers 11 years, from
1980-1990; this equals 4018 days. The data is stored in 4018 arrays (one after another)
of 78 x 108 (rows x columns) values. The first array contains precipitation data over the
selected geographic region for day 1 (1.1.1980), the 2nd array for day 2 (2.1.1980),
and so on. A visualisation of the
1 http://sonenvir.at/workshop/
2 Coordinated Universal Time
Figure 10.1: Precipitation in the Alpine region, 1980-1991.
average precipitation over the 11 years given is shown in figure 10.1.
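Given this layout (4018 consecutive 78 x 108 arrays, stored one after another), the byte offset of any (day, row, column) value can be computed directly, which makes seek-based access to single values possible. The following is a hypothetical Python sketch; the assumption that values are 4-byte little-endian floats is illustrative, as the exact number format is not specified here:

```python
# Hypothetical reader for the flat binary layout described above.
# Assumption: values are 4-byte little-endian floats; the text does not
# specify the exact number format.
import io
import struct

ROWS, COLS, DAYS = 78, 108, 4018
VALUE_SIZE = 4  # bytes per value (assumed float32)

def offset(day, row, col):
    """Byte offset of one value: arrays are stored day after day,
    each 78 x 108 array in row-major order (0-based indices)."""
    return VALUE_SIZE * (day * ROWS * COLS + row * COLS + col)

def read_value(f, day, row, col):
    """Seek directly to one precipitation value and decode it."""
    f.seek(offset(day, row, col))
    return struct.unpack('<f', f.read(VALUE_SIZE))[0]

# Self-check on an in-memory file: day 0 all zeros, day 1 all 1.5 mm/day.
fake = io.BytesIO(struct.pack('<f', 0.0) * ROWS * COLS
                  + struct.pack('<f', 1.5) * ROWS * COLS)
assert offset(1, 0, 0) == ROWS * COLS * VALUE_SIZE  # day 1 starts at 33696
assert read_value(fake, 1, 0, 0) == 1.5
```

Team B's later solution of reading only the data points of interest directly from the binary file corresponds to calling such a seek-based reader per point, avoiding loading the full file (roughly 135 MB under the 4-byte assumption) into memory.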
A second file provides associated information on orography of the European alpine region,
i.e., the terrain elevation in meters. This data is stored in one 78 x 108 array.
General questions the domain scientists deemed interesting were whether it would be
possible to hear all three dimensions (geographical distribution and time) simultaneously
and to find a meaningful representation of the distribution of precipitation in space and
time. They also speculated that it might be relaxing to listen to a synthetic rendering
of the sound of rain.
As possible topics to investigate, they suggested:
• 10-year mean precipitation in the seasons
• variability of precipitation via standard deviations (i.e., do neighbouring regions
more often swing together or against each other?)
• identification of regions with similar characteristics via covariances (do different
regions sound different?)
• extreme values (does the rain fall regularly, or are there long droughts in some
regions?)
• correlations in height (does precipitation behave similarly in similar orographic
heights?)
• distribution of precipitation amounts (on how many days is the precipitation higher
than 20mm, 19 mm, 18mm, etc?)
As a test of the proper geometry of the data format, the SC3 starting file for the sessions
provided a graphical representation of the orographic dataset, with higher regions shown
as brighter gray, see figure 10.2. We also provided example reading routines for the data
file itself.
Figure 10.2: Orography of the grid of regions.
Session team A
In the brainstorming phase, team A came up with the idea to use spatial distribution for
the definition of features like variability, entropy, etc., possibly using large regions such as
quarters of the entire grid. The team agreed that the data should be used as time series,
since rhythmical properties are expected to be present. The opinion was that the main
interest is in the deviations from the average yearly shape. Thus, the team decided to
try using an acoustic representation of the data series, conditioned on the average yearly
shape as a benchmark curve, as follows: if the value in question is higher than average,
high-pitched dust (single-sample crackle through a resonance filter) is audible; if the value
is lower than average, lower-pitched dust is heard. The amplitude should scale with the
absolute deviation from average, and the dust density should scale with absolute rain
values.
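Team A's mapping rule can be written as a small pure function; this is only an illustrative sketch, since the design remained at the dummy-data stage, and the two pitch values chosen here are assumptions:

```python
# Illustrative sketch of team A's mapping rule; the design was only
# prototyped with dummy data, so the two pitch values here are assumptions.
HIGH_PITCH = 2000.0  # Hz, dust pitch for above-average precipitation
LOW_PITCH = 500.0    # Hz, dust pitch for below-average precipitation

def dust_params(value, average):
    """Map one precipitation value to dust-synthesis parameters:
    pitch encodes the polarity of the deviation from the average yearly
    shape, amplitude its magnitude, and density the absolute rain value."""
    deviation = value - average
    pitch = HIGH_PITCH if deviation > 0 else LOW_PITCH
    amplitude = abs(deviation)
    density = abs(value)
    return pitch, amplitude, density

# A wetter-than-average day yields high-pitched, louder, denser dust:
pitch, amp, dens = dust_params(value=12.0, average=5.0)
assert (pitch, amp, dens) == (2000.0, 7.0, 12.0)
```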
In this fashion, one could sonify different locations at the same time by assigning the
sonifications of different locations to different audio channels. This should produce
temporal rhythms if there are systematic dependencies between the locations. As data
reading turned out to be more difficult than expected, the team began experimenting
with dummy data to design the sounds and behaviour of the sonification, while the
second programmer worked on data preparation. In the end, the team ran out of time
before the real data became accessible enough to replace the dummy data.
Session team B
Team B discussed many options while brainstorming: as the data set was quite large,
choosing data subsets, e.g. by regions; looking for possible correlations; maybe listening
to the entire time range for a single location; maybe using a random walk as a reading
trajectory; selecting a location by pointing (mouse); comparing a reference time-series
sonification to the data subset under study.
The team found a good solution for the data reading difficulties: they read only the data
points of interest directly from the binary file, as this turned out to be fast enough for
realtime use. The designs written explored comparing the time series for two single locations
over ten years; the sound examples produced demonstrate these pairs played sequentially
and simultaneously on left and right channels. The sounds are produced with discrete
events: each data point is rendered by a Gabor grain with a center frequency determined
by the amount of rain for the day and location of interest.
In the final discussion, the team found that a comparison of different regions would be
valuable, where the mean area over which to average should be flexible. Such averaging
could also be considered conceptually similar to fuzzy indexing into the data; modulating
the averaging range and providing fuzziness in three dimensions would be worth further
exploration.
Session team C
Team C had the most difficulties getting the data loaded properly; this was certainly
a deficiency in the preparation. After converting a subset of the data with Excel, they
decided on comparing the data for January in all years, and listening for patterns and
differences across different regions. Some uncertainty about whether the conversions were
fully correct remained, but this was considered relatively unimportant for the experimen-
tal workshop context.
The sonification design entailed calculating a mean January value for all locations, and
comparing each individual day to the mean value. This was intended to show how the
precipitation varies, and to identify extreme events. All 8424 stations are scanned
along north/south lines, which slowly move from west to east. The ratio of the day's
rainfall to the mean was mapped to the resonance value of a bandpass filter driven by
white noise.
The sound examples provided cover January 5, 15, and 25 for the years 1980 and 1981
scaled into 9 seconds; a much slower variant with only 190 stations is presented as well
for comparison, and this shows a much smoother tendency. Varying filter resonance as
rapidly as described above is not likely to be very clearly audible.
Comparison in SDSM terms
The data set given is quite interesting from an SDSM perspective: it has 2 spatial
indexing dimensions, with 78 * 108 = 8424 geographical locations, for which an orographic
data dimension (average elevation above sea level) is also given. For each location,
data are given for 1 (or maybe 2) time dimensions, namely 365 (resp. 366) days * 11
years = 4018 time steps (days). Thus, multiple locations are possible for its data anchor
(see figure 10.3), depending on the viewpoint taken. From a temporal point of view, one
would treat the 8424 locations as the data size, and create a 'day anchor' at x: 8424, y:
1 and a month anchor at x: 8424, y: 30; the year anchor and the 11-year anchor are both
outside the standard map size. For a single location, an anchor could be at x: 4018, y: 2.
In any case, whatever one considers to be the unit size of this kind of data set is arbitrary,
as both time and space dimensions could have different sizes and/or resolutions.
Team A mapped one year into 7.3 seconds, and presented two streams of two mapped dimensions each (pitch label for deviation polarity, and intensity for deviation amount). These choices put its SDSM point at an expected gestalt size of 150 (x), dimensions at 2 (y), and streams at 2 (z). Continuous parameter mapping is a reasonable choice for this location on the map.
Team B begins with 8424 data points per 4 second loop; this is a rather dense gestalt
size of ca. 6000. The design choice of averaging over 9-10 values scales this to ca.
600, which seems well suited for granular synthesis with a single data dimension used,
mapped to the frequency parameter (y-axis), and using two parallel streams (z).
Team C maps 8424 values into 9 seconds, which creates a gestalt size of ca. 3000 (label C1 on the map); this seems very fast for modulation synthesis of filter bandwidth, although it uses only a single stream and a single dimension, so y and z values are both 1.
Figure 10.3: SDSM map of Rainfall data set.
The slower version (190 values in 9 seconds, C2) is more within SDSM recommended
practice, at a gestalt size of ca. 60. While the SDSM concept recommends making
indexing dimensions available for interaction, this was too complex for the workshop
setting.
10.2 Polysaccharides
This problem was worked on for two two-hour sessions, so the participants had more
time to reflect and consider how to proceed. The data were submitted by Anton Huber
of the Institute of Chemistry at University of Graz.
10.2.1 Polysaccharides - Materials made by nature3
Polysaccharides make up most of the biological substance of plant cells. Their molecular geometries, such as their symmetries, determine the physical properties of most plant-based materials. Even materials from trees of the same kind have different properties because of the environment they come from; so understanding the properties of a given sample is of crucial importance to materials scientists. A typical question that occurs is: Are the given datasets (which should be the same) somehow different?
3 This was the title of Anton Huber's introductory talk.
In aqueous media, polysaccharides form so-called supermolecular structures. Very few of these molecules can structure amazing amounts of water: water clusters can be several millimeters in size. By comparison, the individual molecules are measured in nanometers, so there is a scale difference of six orders of magnitude!
In a given measurement setup, the materials are physically sorted by fraction: on the left side, particles with big molecules (high mol numbers) are found, on the right, small ones. Rather few bins (on the order of 30) of sizes and corresponding weights are conventionally considered sufficiently precise for classification, both in industry and science.
The data for this session were analysis data of four samples of plant materials: beech,
birch, oat and rice. Three different measurements were given, along with their indexing
axes: channel 1 is an index (corresponding to mol size) of the measurement at channel
2, channel 3 is an index of channel 4, and channel 5 is an index of channel 6.
Channels 1 and 2 contain the measured delta-refraction index of electromagnetic radi-
ation aimed at the material sample; i.e. how strongly light of a given wavelength is
diverted from its direction by the size-ordered regions along the sample. (The exact
wavelength used was not given.)
Channels 3 and 4 contain the measured fluorescence index under electromagnetic radi-
ation, again dependent on the size-ordered regions along the sample.
Channels 5 and 6 contain the measured dispersion of the material sample under light,
or more precisely, how much the dispersion differs from that of clear water, based on
molecule size along the size-ordered axis of the sample.
10.2.2 Session notes
The notes for this session were reconstructed shortly after the workshop.
Brainstorming
One of the first observations made was that the data look like FFT analyses - so the
team considered using FFT and Convolution on the bins directly. An alternative could
be a multi-filter resonator with e.g. 150 bands, maybe detuned from a harmonic series.
As was noted several times in the workshop by those favouring audification, it seemed desirable to obtain 'rawer' data as directly as possible from the measurements; these might be interesting to treat as impulse responses.
The first idea the team decided to try was to create a "signature sound" of 1-2 seconds for one channel of each data file by parameter mapping of about 15-20 dimensions; a second step would be to compare two such signature sounds (for two channels of the same file) binaurally.
Experimentation
A look at the data revealed that across all files, channel 1 seemed massively saturated in
the upper half, so we decided to take only the undistorted part of channel 1, downsample
it to e.g. 50 zones, and turn these into 50 resonators, which would ring differently for
the different materials when excited. The resonator frequencies were scaled according
to the index axis, which is roughly equal to particle size: small particles are represented
by high sounds, and big particles by lower resonant frequencies.
Based on this scheme, we proceeded to make short ’sound signatures’ for the four
materials, using delta-refraction index (channel 2) and fluorescence (channel 4) data,
with two different exciter signals: Noise and impulses.
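The resonator-bank scheme can be sketched as follows; this is a minimal offline reconstruction in Python (the original work used SuperCollider), with synthetic stand-in data, and with zone count, frequency range, and ring time chosen for illustration:

```python
import numpy as np
from scipy import signal

SR = 44100
rng = np.random.default_rng(1)

# Hypothetical stand-in for the undistorted part of a measurement channel
channel = np.abs(rng.normal(size=680))

# Downsample to 50 zones by block averaging
zones = channel[: 680 - 680 % 50].reshape(50, -1).mean(axis=1)

# Zone index corresponds to particle size: big particles (left) ring low,
# small particles (right) ring high
freqs = np.geomspace(200.0, 4000.0, num=50)

def signature(zones, exciter, ring=0.3):
    """Excite a bank of 50 two-pole resonators; amplitudes come from the data."""
    r = np.exp(-1.0 / (ring * SR))              # pole radius from ring time
    out = np.zeros_like(exciter)
    for amp, f in zip(zones, freqs):
        theta = 2 * np.pi * f / SR
        b, a = [1.0 - r], [1.0, -2 * r * np.cos(theta), r * r]
        out += amp * signal.lfilter(b, a, exciter)
    return out / np.max(np.abs(out))

impulse = np.zeros(SR); impulse[0] = 1.0        # impulse exciter
noise = rng.normal(size=SR)                     # noise exciter
sig_pulse = signature(zones, impulse)
sig_noise = signature(zones, noise)
```

With real measurement channels substituted for the synthetic data, each material yields its own short 'sound signature' under both exciters.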
The sound examples provided here4 present all four materials in sequence:
Delta refraction index, impulse source: Materials1 Pulse BeechBirchOatRice.mp3
Delta refraction index, noise source: Materials1 Noise BeechBirchOatRice.mp3
Fluorescence, impulse source: Materials2 Pulse BeechBirchOatRice.mp3
Fluorescence, noise source: Materials2 Noise BeechBirchOatRice.mp3
The team also started making these playable from a MIDI drum pad for a more interactive
interface, but did not have enough time to finish this approach.
Evaluation
The group agreed that having time for two sessions was much better for deeper discussion
and more interesting results. Even so, more time would be desirable. In this particular
session, the sound signatures made were easy to distinguish, so in principle, this approach
works.
What could be next steps? It would be useful to implement signatures of more than one channel, to increase the reliability of property tracking; this could be a useful application, e.g. for materials production monitoring.
It would also be interesting to try a nonlinear complex sound generator (such as a
feedback FM algorithm) and control its inputs from the data, using on the order of
20-30 dimensions; this holistic approach would be interesting from the perspective of
sonification research, as it might lead to emergent audible properties without requiring
detailed matching of individual data dimensions to specific sound parameters. While there was no time to attempt this within the workshop setting, the idea would certainly warrant further research.
4 http://sonenvir.at/workshop/problems/biomaterials/sound descr
In SDSM terms, the dimensionality of each data point is unusually high here. The sonifications render a reduced range of the data for each material (consisting of 680 measurements), downsampled to 50 values, as resonator specifications, i.e., intensity and ring time of each band. Given an interactive design, such as one allowing tapping on the different materials or probes, one can easily compare on the order of 5-8 samples within short-term memory limits.
Chapter 11
Examples from the ICAD 2006 Concert
The author was Concert Chair for the ICAD 2006 Conference at Queen Mary, University of London, and together with Christian Daye and Christopher Frauenberger, organized the Concert Call, the review process for the submissions, and the concert itself (see section 4.3 for full details). This chapter discusses four of the eight pieces played in the concert, chosen for diversity of the strategies used, and for clarity and completeness of documentation.
11.1 Life Expectancy - Tim Barrass
This section discusses a sonification piece created by Tim Barrass for the ICAD 2006
Concert, described in Barrass (2006), and available as headphone-rendered audio file1.
Life Expectancy is intended to allow listeners to find relationships between life expectan-
cies and living conditions around the world. The sounds he chooses are quite literal
representations of their meanings, making them relatively easy to ’read’, even though
the piece is quite dense in information.
It is structured in three parts: a 20-second opening section which mainly provides spatial orientation, a long middle section representing living conditions for each country in a dense 2-second soundscape, and a short final section illuminating gender differences in life expectancy.
The opening section presents the spatial locations of all country capitals, ordered by
ascending life expectancy. The speaker ring is treated as if it were a band around the
equator, with the listener inside near the center of the globe. Each capital location is
marked by a bell sound (which is easy to localise), spatialised in the ring of speakers
according to the capital’s longitude; latitude (distance to the equator, North or South)
is represented by the bell's pitch, where North is higher. A whistling tone represents ascending life expectancy for each country, and as it is not spatialised, it is easy to follow as one stream. Each country has roughly 0.1 second for its bell and whistle tone.
1 http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/audio/concert/life.mp3
The main section of the piece is about six minutes long, and presents a rich, complex
audio vignette for every country, at the length of a musical bar of two seconds. The
most intriguing aspect here is the ordering of the countries: First we hear the country
with the highest life expectancy, then the lowest, the second highest, the second lowest,
and so on until the interleaved orders meet in the median.
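This interleaved ordering can be expressed as a small routine; a sketch assuming a plain list of life-expectancy values (the function name and sample data are illustrative):

```python
def interleave_extremes(values):
    """Order values highest, lowest, 2nd highest, 2nd lowest, ...,
    so the two interleaved orders meet at the median."""
    ranked = sorted(values, reverse=True)
    out = []
    lo, hi = 0, len(ranked) - 1
    while lo <= hi:
        out.append(ranked[lo])
        if lo != hi:
            out.append(ranked[hi])
        lo += 1
        hi -= 1
    return out

interleave_extremes([75, 40, 82, 68, 55])  # → [82, 40, 75, 55, 68]
```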
Each sound vignette consists of the following sound components:
Two bell sounds whose pitch indicates latitude, first of the equator, then of the country's capital; their horizontal spatial position represents longitude.
A chorus speaking the country's name, with the number of voices representing the population number, and its spatial extension representing the country's area. The capital name is also spoken, at its spatial location.
A fast ascending major scale represents life expectancy, once for male, once for female inhabitants of the country. The number of notes of the scale fragment represents the number of life decades, so a life expectancy of 75 years would be represented as a scale covering 8 steps (up to the octave), with the last note shortened by 50 percent. The gender differences between each pair of scales, and the alternation of extreme contrasts at the beginning of the sequence, articulate this aspect very interestingly.
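One plausible reading of this mapping, sketched as code (the base pitch, the extended scale, and the function name are assumptions, not taken from the piece's documentation):

```python
import math

MAJOR = [0, 2, 4, 5, 7, 9, 11, 12, 14, 16, 17, 19]  # major scale, extended past the octave

def scale_fragment(life_expectancy, base_midi=60):
    """Map life expectancy to an ascending major-scale fragment:
    one note per decade, the last note shortened to the fractional decade."""
    decades = life_expectancy / 10.0
    n = math.ceil(decades)
    notes = [base_midi + MAJOR[i] for i in range(n)]
    durations = [1.0] * n
    durations[-1] = decades - (n - 1)               # 75 years -> 8 notes, last at 0.5
    return notes, durations

scale_fragment(75)  # 8 notes up to the octave, last note at half length
```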
Clinking coins signify economic aspects: average income by density of the coin sounds,
while gross domestic product (GDP) is indicated by reverb size.
The sound of water filling a vessel indicates access to drinking water and sanitation: a
full vessel indicates good access, an empty vessel little access. Three pulses of this sound
provide total, rural, and urban values. Sanitation is rendered by adding distortion to the
water pulses when sanitation values are low (suggesting ’dirty’ water).
The final short section of the piece focuses on gender differences in life expectancy. As
the position bell moves from the North Pole to the South Pole, life expectancies for
each country are represented with a tied note, going from the value for male to female
(usually rising), and spatialised at the capital’s location.
Tim Barrass is very modest in commenting on the piece (Barrass (2006)):
I have taken a straightforward and not particularly musical approach, in
an attempt to gain a clear impression of the dataset. The sound mapping
is ”brittle”, designed specifically for the dataset. I would not expect this
approach to provide a flexible base to explore the musical, sonic and in-
formational possibilities of similar material, but it may at least serve as an
example of one direction that has been tried.
While the piece may appear 'artless' in representing so much of the dataset with apparently simplistic sound mappings, I find the piece extremely elegant, both as a sonification and as a composition. The sound metaphors are so clear that they almost disappear, as does the spatial representation. It is quite an achievement to create concurrent sound layers that are rich, complex, and dense enough to be demanding to listen to, yet transparent enough to allow for discovering different aspects as the piece proceeds. This piece certainly provided the richest information representation of all entries for the concert.
The beginning and end sections work beautifully as frames for the piece, as orientation
help, and as alternative perspectives on the same questions. For me, the questions that
remain long after listening to the piece come from the strongest intervention in the piece,
the idea of sorting the countries so as to begin with the most extreme contrasts in life
expectancy, and moving toward the average lifespan countries.
11.2 Guernica 2006 - Guillaume Potard
This section discusses a piece created by Guillaume Potard for the ICAD 2006 Concert,
described in Potard (2006), and available as headphone-rendered audio file2.
Guernica 2006 sonifies the evolution of world population and the wars that occurred between the years 1 and 2006. Going far beyond the data supplied with the concert call, Potard compiled a comprehensive list of 507 documented wars, with geographical location, start and end year, and a flag indicating whether each was a civil war or not. He also located estimates for world population for the same time period.
The sonification design represents the temporal and geographical distribution chrono-
logically. The temporal sequence follows historical time: the start year of each war
determines when its representing sound begins. As many more wars have occurred to-
ward the end of the period observed, the time axis was slowed down logarithmically in
the course of the piece, so the duration of a year near the end of the piece is 4 times
longer than at the beginning. This maintains the overall tendency, but still provides
better balance of the listening experience. The years 1, 1000 and 2000 are marked by
gong sounds for orientation. The entire piece is scaled to a time frame of five minutes.
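One way to realise such a time axis, sketched under the assumption of exponential per-year stretching (the piece's concrete warping function is not documented in this detail):

```python
import numpy as np

TOTAL = 300.0                  # five minutes
STRETCH = 4.0                  # a year at the end lasts 4x as long as at the start
years = np.arange(1, 2007)

# Per-year durations growing exponentially, normalized to the piece length
raw = STRETCH ** ((years - years[0]) / (years[-1] - years[0]))
durations = raw / raw.sum() * TOTAL
onsets = np.concatenate([[0.0], np.cumsum(durations)[:-1]])

# Each war's representing sound would then start at onsets[start_year - 1]
```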
The start time of each war is indicated by a weapon sound; the sounds chosen change with the evolution of weapon technology. In the beginning, horses, swords, and punches are heard, while after the invention of gunpowder, cannons, guns, and explosions dominate. Newer technology such as helicopters is heard only toward the end of the piece, after the year 1900. Civil wars are marked independently by the additional sound of breaking glass.
The spatial distribution of the sounds was handled by vector-based amplitude panning
for the directions of the sound sources relative to the reference center, the geographical
location of London. Sound distance was rendered by controlling the ratio of direct to
reverberation sound.
2 http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/audio/concert/guernica.mp3
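A minimal 2-D special case of vector-based amplitude panning on a speaker ring, assuming equally spaced speakers and constant-power gains (the speaker count and gain law are illustrative, not taken from the piece):

```python
import numpy as np

def ring_pan(azimuth_deg, n_speakers=8):
    """Constant-power panning between the adjacent speaker pair on a ring."""
    az = np.radians(azimuth_deg % 360.0)
    spacing = 2.0 * np.pi / n_speakers
    i = int(az // spacing)                  # index of the pair's first speaker
    frac = (az - i * spacing) / spacing     # position within the pair, 0..1
    gains = np.zeros(n_speakers)
    gains[i] = np.cos(frac * np.pi / 2)
    gains[(i + 1) % n_speakers] = np.sin(frac * np.pi / 2)
    return gains
```

Distance would then be rendered separately, e.g. by scaling the ratio of direct signal to a shared reverb bus, as described above.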
The evolution of world population is sonified concurrently as a looping drone, with
playback speed rising as population numbers rise.
Guernica 2006 was certainly the most directly dramatic piece in the concert. The use of
samples communicates the intended context very clearly, without requiring much prior
explanation. As Potard (2006) states, richer data representation with this approach
would certainly be possible; he considers representing war durations, distinguishing more
types of war, and related factors like population migrations in future versions of the
piece.
11.3 ’Navegar E Preciso, Viver Nao E Preciso’
This section discusses a sonification piece created by Alberto de Campo and Christian
Daye for the ICAD 2006 Concert, described in de Campo and Daye (2006), and avail-
able as headphone-rendered audio file3. As this piece was co-written by the author of
this dissertation, much more background can be provided than with the other pieces
discussed.
In this piece, we chose to combine the given dataset containing current (2005) social
data of 190 nations with a time/space coordinates dataset of considerable historical
significance: The route taken by the Magellan expedition to the Moluccan Islands from
1519-1522, which was the first circumnavigation of the Globe.
11.3.1 Navigation
The world data provided by the ICAD 2006 Concert Call all report the momentary state for the year 2005, and are thus free of the idea of historical progression. Also, the choice of which variables to include in the sonification, and how, must be based on theoretical assumptions which are not trivial to formulate at a level of aggregation involving 6,513,045,982 individuals (the number of people estimated to have populated this planet on April 30, 2006, see U.S. Census Bureau (2006)). The data do provide detailed spatial information, so we decided to choose a familiar form of data organization that combines space and time: the journey.
Traveling can be defined as moving through both space and time. While the time
dimension as we experience it is unimpressed by the desires of the traveler, s/he can
decide where to move in space. The art and science that has enabled mankind to find
out where one is, and in which direction to go to arrive somewhere specific, is known as
Navigation.
Navigation as a practice and as a knowledge system has exerted major influence on the development of the world. The Western world was changed drastically by the consequences of the journeys led by explorers like Christopher Columbus or Vasco da Gama. (The art of navigation outside Europe, especially in Polynesia, is covered very interestingly in Conner (2005), pp 41-58.)
3 http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/audio/concert/navegar.mp3
The first successful circumnavigation of the globe,
led by Ferdinand Magellan, proved beyond all scholastic doubts that the earth is in fact round. This would not have happened without the systematic cultivation of all the related sciences in the school for navigation, map-making and ship-building founded by Henry the Navigator, Prince of Portugal, in the 15th century. (Conner (2005) also describes their methods of knowledge acquisition vividly as mainly coercion, appropriation, and information hoarding, see chapter: Blue Water Navigation, pp. 201ff.)
For all these reasons, Magellan’s Route became an interesting choice for temporal and
spatial organization for our concert contribution.
11.3.2 The route
Leaving Seville on August 10, 1519, the five ships led by Magellan (called Trinidad,
San Antonio, Concepcion, Victoria, and Santiago) crossed the Atlantic Ocean to anchor
near present-day Rio de Janeiro after five months (Pigafetta (1530, 2001); Wikipedia
(2006b); Zweig (1983)). Looking for a passage into the ocean later called the Pacific,
they moved further south, where the harsh winter and nearly incessant storms forced
them to anchor and wait for almost six months.
While exploring unknown waters for this passage, the Santiago sank in a sudden storm,
and the San Antonio deserted back to Spain; the remaining three ships succeeded and
found the passage in the southernmost part of South America which was later called the
Magellan Straits, in late October 1520. The ships then headed across the Mar del Sur, the
ocean Magellan named the Pacific, towards the archipelago which is now the Philippines,
where they arrived four months later. Seeking the mythical Spice Islands, Magellan and his crew visited several islands in this area (Limasawa, Cebu, Mactan, Palawan, Brunei, and Celebes); on Mactan, Magellan was killed in a battle, and a monument in Lapu-Lapu City marks the site where he died.
In spite of their leader’s death, the crew decided to fulfil their mission. By now diminished
to 115 persons on just two ships (Trinidad and Victoria), they finally managed to reach
the Spice Islands on November 6, 1521. Due to a leak in the Trinidad, only the Victoria
”set sail via the Indian Ocean route home on December 21, 1521. By May 6, 1522, the
Victoria, commanded by Juan Sebastián Elcano, rounded the Cape of Good Hope, with
only rice for rations. Twenty crewmen died of starvation before Elcano put into Cape
Verde, a Portuguese holding, where he abandoned 13 more crew on July 9 in fear of
losing his cargo of 26 tons of spices (cloves and cinnamon).” Wikipedia (2006b). On
September 6, 1522, more than three years after she left Seville, Victoria reached the
port of San Lucar in Spain with a crew of 18 left. One is reminded of a song by Caetano
Figure 11.1: Magellan’s route in Antonio Pigafetta’s travelogue
(Primo Viaggio Intorno al Globo Terracqueo. - First travel around the terracqueous globe, see
Pigafetta (1530)).
Figure 11.2: Magellan’s route, as reported in wikipedia.
http://wikipedia.org/Magellan
Veloso, who, pondering the mentality and fate of the Argonauts, wrote: ”Navegar e
preciso, viver nao e preciso” - ”Sea-faring is necessary, living is not” (see appendix E).
11.3.3 Data choices
The explorers in the early 15th century were interested in spices (which Europe was
massively addicted to at the time), gold, and the prestige earned by gaining access to
good sources of both. Nowadays, other raw materials are considered premium goods.
What would someone who undertakes such a journey today hope to gain for his or her
exertions; what is as precious today as gold and spices were in the 16th century?
We imagine today’s conquistadores (or globalizadores) would likely ask first about eco-
nomic power: how rich is an area? Second, they would probably check geographical
potential; and chances are that if any one resource will be as central to economic activ-
ity in the future as spices were centuries ago, it will be drinking water resources. Water
might well become the new pepper, the new cinnamon, or even the new gold. (As the
Gulf wars showed, oil would have been the obvious current choice; however, we found the
future perspective more interesting.) Thus we chose to focus on two main dimensions:
one depicting economic characteristics of every country we pass, and another informing
us about its inhabitants’ current access to drinking water.
11.3.4 Economic characteristics
The variable 'GDP per capita' included in the given data set provides some insight into the overall economic performance of a country. Obviously, the 'GDP per capita' variable lacks information about the distribution of income; it only says how much money there would be per person if it were equally distributed. This is never the case; on the contrary, researchers find that the rich get richer and the poor get poorer, both in intra-national and international contexts. For example, in the US of 1980, the head of a company earned on average 42 times as much as an employee; by the year 1999, this ratio was more than ten times higher: a company leader earned 475 times more than an average employee (Anonymous (2001)).
Figure 11.3: The countries of the world and their Gini coefficients.
From http://en.wikipedia.org/wiki/Gini.
A measure that captures aspects of income distribution is the Gini coefficient on in-
come inequality (Wikipedia (2006a)). Developed by Corrado Gini in the 1910s, the Gini
coefficient is defined as the ratio of area between the Lorenz curve of the distribution
and the curve of the uniform distribution, to the area under the uniform distribution.
More common is the Gini index, which is the Gini coefficient times 100. The higher the
Gini index, the higher the income differences between the poorer and the richer parts
of a society. A value of 0 means perfectly equal distribution, while 100 means that one
person gets all the income of the country and the others have zero income. However,
the Gini index does not report whether one country is richer or poorer than the other.
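The definition above can be checked with a small computation; a sketch using the standard discrete formulation on sorted incomes (the function name and sample values are illustrative):

```python
import numpy as np

def gini_index(incomes):
    """Gini index (coefficient * 100), computed from sorted incomes
    via the standard discrete form of the Lorenz-curve area ratio."""
    x = np.sort(np.asarray(incomes, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    coefficient = 2.0 * np.sum(i * x) / (n * x.sum()) - (n + 1) / n
    return 100.0 * coefficient

gini_index([10, 10, 10, 10])   # → 0.0 (perfectly equal distribution)
gini_index([0, 0, 0, 100])     # → 75.0 (one person gets all the income)
```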
Our sonification tries to balance the limitations of these two variables by combining them: We include two factors that go into a Gini calculation, the ratio of the top and bottom 10 percentiles of all incomes in a population, and the ratio of the top to bottom 20%. In Denmark, at Gini index rank 1 of the 124 nations for which Gini data exist, the top 10% earn 4.5x as much as the bottom 10%; for the UK (rank 51), the ratio is 13.8:1; the US (rank 91) ratio is 15.9:1; in Namibia, at rank 124, the ratio is 128.8:1. (In the sonification, missing values here are replaced by a dense cluster of near-center values, which is easy to distinguish acoustically from the known occurring distributions.)
11.3.5 Access to drinking water
An interesting variable provided by the ICAD06 Concert data set is ’Estimated percentage
of population with access to improved drinking water sources total’. Being part of the
so-called "Social Indicators" (UN Statistics Division (1975, 1989, 2006)), the data are reported to the UN Statistics Division by the national statistical agencies of the UN member states. Unfortunately, this indicator has a high percentage of missing values (46 of 190 countries, or 24.2%). This percentage can be reduced to 16.3% (31 countries)
by excluding missing values from countries which are not touched by our Magellanian
route. Still, the problem is fundamental and must be addressed. The strategy we chose
was to estimate the missing values on the basis of the data value of the neighboring
countries, being aware that this procedure does not satisfy scientific rigor. In most cases,
though, we claim that our estimates are likely to match reality: for instance, it is very likely that in France and Germany (as in most EU countries), very close to 100% of the population do have access to "improved drinking water resources", and that this fact is considered too obvious to be statistically recorded.
11.3.6 Mapping choices
We deliberately chose rather high display complexity; while this requires more listener concentration and attention for maximum retrieval of the represented information, a more complex piece hopefully invites repeated listening, as audiences tend to do with pieces of music they enjoy. Every country is represented by a complex sound stream composed of a group of five resonators; the central resonator is heard most often, the outer pairs of resonators ('satellites') sound less often. All parameters of this sound stream are determined by (a) data properties of the associated country and (b) the navigation process, i.e. the ship's current distance and direction relative to this country. At any time, the 15 countries nearest to the route point are heard simultaneously. This both limits display complexity for the sake of clarity, and keeps the sonification within CPU limits for realtime interactive use. The mapping choices are given in detail in table 11.1.
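Selecting the 15 nearest countries at each route point can be sketched with great-circle distances to the capitals; a minimal version assuming capitals given as (latitude, longitude) rows (the haversine step and all names are illustrative, not the piece's actual code):

```python
import numpy as np

def nearest_countries(ship_lat, ship_lon, capitals, k=15):
    """Indices of the k countries whose capitals are nearest the ship,
    by great-circle (haversine) distance."""
    lat1, lon1 = np.radians(ship_lat), np.radians(ship_lon)
    lat2 = np.radians(capitals[:, 0])
    lon2 = np.radians(capitals[:, 1])
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    d = 2 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))  # central angle, radians
    return np.argsort(d)[:k]
```

On each navigation step, streams for countries entering this set would be faded in, and streams for countries leaving it faded out.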
In order to provide a better opportunity to learn this mapping, the author has written a patch which plays only a single sound source/country at a time, and allows switching between the parameters for all 192 countries. This allows comparing the multidimensional changes as one switches from, say, Hong Kong (very dense population, very rich) to Mongolia (very sparse population, poor). In public demonstrations and talks, this has proven to be quite appropriate for this relatively complex mapping. When
Table 11.1: Navegar - Mappings of data to sound parameters
Population density of country -> Density of random resonator triggers
GDP per capita of country -> Central pitch of the resonator group
Ratio of top to bottom 10% -> Pitches of the outermost (top and bottom) 'satellite' resonators
Ratio of top to bottom 20% -> Pitches of the inner two 'satellite' resonators (missing values for these become dense clusters)
Water access -> Decay time of resonators (short tones mean dry)
Distance from ship -> Volume and attack time (far away is 'blurred')
Direction toward ship -> Spatial direction of the stream in the loudspeakers (direction North is always constant)
Ship speed, direction, winds -> Direction, timbre and volume of wind-like noise
hearing the piece after experimenting for a while with an example of its main components,
many listeners report understanding the sonification much more clearly.
It has also been helpful to provide some points of orientation that can be identified while
the piece unfolds, as listed in table 11.2.
11.4 Terra Nullius - Julian Rohrhuber
This section discusses a sonification piece created by Julian Rohrhuber for the ICAD
2006 Concert. It is described in Rohrhuber (2006), and available as headphone-rendered
audio file here4.
11.4.1 Missing values
The concept for ’Terra Nullius’ builds on a problem present (or actually, absent) in
data from many different contexts: missing values. Rohrhuber (2006) states that in
sonification, data are assumed to have implicit meaning, and that sonifications try to
communicate such meaning. In the specific case of the data given for the concert, most
data dimensions are quantitative; thus the data can be ordered along any such dimension,
and the value for one dimension of a given data point can be mapped to a sonic property
of a corresponding sound event. For example, one could order by population size, and
map GDP per capita to the pitch of a short sound event.
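Such a basic quantitative mapping might look like this; a sketch with invented sample records (population, GDP per capita) and an arbitrary MIDI pitch range:

```python
import numpy as np

# Hypothetical records: (population, GDP per capita) per country
countries = np.array([
    (5.4e6, 51000.0),
    (1.3e9, 7100.0),
    (8.2e7, 44000.0),
])

order = np.argsort(countries[:, 0])     # order events by population size
gdp = countries[order, 1]

# Map GDP per capita linearly onto a pitch range, e.g. MIDI 48..84
lo, hi = gdp.min(), gdp.max()
pitches = 48 + (gdp - lo) / (hi - lo) * (84 - 48)
```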
However, with missing values the situation becomes considerably more complicated:
4 http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/audio/concert/terra.mp3
Table 11.2: Some stations along the timeline of ’Navegar’
0:00-0:10 Very slow move from Sevilla to San Lucar
0:20-0:26 Cape Verde: very direct sound (i.e. near the capital), rather low,
dense spectrum (poor country, unknown income distribution)
0:54-1:00 Uruguay/Rio de la Plata: very direct sound, passing close by.
1:05-2:40 Port San Julian, Patagonia: very long stasis, everything is far away,
six months long winter break in Magellan’s travel
2:45-3:00 Moving into Pacific Ocean: new streams, many dense spectra;
unknown income distributions
3:20 Philippines: very direct sound (near capital), high satellites:
unequal income distribution
4:00 Brunei: very direct, high, dense sound: very rich, unknown distribution
... towards Moluccan Islands
4:50 East Timor: direct, mostly clicking, only very low frequency resonances
(very poor, little access to water, unknown income distribution)
5:15 into Indian Ocean: ’openness’, sense of distance
5:50 approaching Africa: more lower centers, with very high satellites:
poor, with very unequal distributions (but at least statistics available)
5:55 Pass Cape of Good Hope: similar to East Timor
6:10 Arrive back at San Lucar, Spain
Rohrhuber states that ”These non-values break gaps into the continuity of evaluation -
they belong to another dimension within their dimension. Missing data not only fail to
belong to the dimension they are missing from, they also fail to belong in any uniform
dimension of ’missing’.” Furthermore, one must consider that there are no fully valid
strategies for dealing with missing values: Removing data points with missing values dis-
torts the comparisons in other data dimensions; substituting likely data values introduces
possible errors and reduces data reliability; marking them by recognizably out-of-range
values may be logically correct, but these special values can be quite distracting in a
sonification rendering.
11.4.2 The piece
The piece consists of multiple cycles, each moving around the globe once. For every
cycle, all countries within a zone parallel to the equator are selected and sonified one at a
time in East-to-West order, as shown in figure 11.4. In the beginning, the zone contains
latitudes similar to that of England, or more precisely London, since each country's capital
determines its geographical position. The sound is spatialised accordingly in the ring of
speakers, so one cycle around
the globe moves around the speaker cycle once. With every cycle, the zone of latitudes
widens until all countries are included.
Figure 11.4: Terra Nullius, latitude zones
To sonify the missing values in the 46 data dimensions given, a noise source is split into
46 frequency bands. When a value for a dimension is present, the corresponding band
remains silent; the band only becomes audible when the value for that dimension in the
current country is missing.
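The band-filter mechanism can be sketched in SuperCollider roughly as follows (a minimal reconstruction, not Rohrhuber's original code; the band frequencies and the example missing-value mask are invented for illustration):

```supercollider
(
// hypothetical sketch: one noise band per data dimension,
// audible only where the current country's value is missing
var numDims = 46;
// band center frequencies spread exponentially, e.g. 100 Hz .. 10 kHz
var freqs = (0..numDims - 1).collect { |i|
    100 * (100 ** (i / (numDims - 1)))
};
// example mask: 1 where a value is missing, 0 where it is present
var missing = { [0, 1].choose }.dup(numDims);
{
    var noise = PinkNoise.ar(0.5);
    Mix(
        freqs.collect { |freq, i|
            BPF.ar(noise, freq, 0.05) * missing[i]
        }
    ).dup
}.play;
)
```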
After all countries are included in the cycle, the latitude zone narrows again over several
cycles, and ends with the latitude and longitude of London. For this second half, the
filters have smaller bandwidth, so there is more separation between the dimensions.
Gradually, constant decorrelated noise fades in on all speakers, which remains for a few
seconds after the end of the last cycle.
’Terra Nullius’ plays very elegantly with different orders of ’missingness’, in fact creating
what could be called ’second-order missing values’ of what is being sonified: ”... A band
of filtered noise is used for each dimension that is missing, i.e. the noisier it is, the
less we know. In the end the missing itself seems quite rich of information - only about
what?” (Rohrhuber (2006))
Personally, I find this the most intriguing work of art in the ICAD 2006 concert. Subtly
shifting the discussion to recursively higher levels of consideration of what it is we do
not know, it is an invitation to deeper reflection on many questions about meaning and
representation.
11.5 Comparison of the pieces
In order to study the variety of approaches that artists and sonifiers took in creating
pieces, the SDSM terminology and viewpoint turned out to be quite useful. For the dataset
given, a clear anchor can be provided at 190 data points and 26 dimensions for the basic
dataset, and 44 for the extended set (see figure 11.5).
Life Expectancy chooses a rather large set of data dimensions, and sonifies aspects of
it in three distinct ways: an overview for spatial orientation, sorted by life expectancy
(LE1), a long sequence of 2-second vignettes, densely packed with information (LE2),
and a final sequence of life expectancies sorted North-South (LE3).
Orientation - LE1
Within 20 seconds, a signal sound is played for each country, ordered by total life ex-
pectancy; this renders 3 mapped dimensions (life expectancy, latitude, longitude).
Vignettes - LE2
Five streams make up each vignette:
two bell sounds - 2 dimensions: latitude and longitude;
spoken country and capital name - 6 dimensions: 2 names, spatial location again (2),
population size, and area;
scale fragment - 2 dimensions: life expectancy for males and females;
clinking coins - 2 dimensions: average income over density and GDP;
water vessel - 3x2 dimensions: 3 pulses with 2 values each, ’fullness’ and distortion.
This combination of parallel/interlocked streams with [2, 6, 2, 2, 6] dimensions each
renders a total of 16 dimensions per vignette of 2 seconds! While these could also be
rendered visually as a ’sideways’ view of the SDS map (showing the Y and Z axes), they
are shown here as 16 parallel dimensions for better comparability.
Figure 11.5: SDSM comparison of the ICAD 2006 concert pieces.
Ending - LE3
This section is again short (30 seconds with intro and ending clicks, 17 without), and
compares the 2 life expectancy values for males and females, with the countries sorted
North/South; including spatial location, it uses 4 dimensions.
Overall, the piece has very literal, easy to read mappings to sounds; it employs a really
complex, differentiated soundscape, and it is very true to the concept of sonification.
Guernica uses its own data, thus requiring its own data anchor. The piece renders
world population as one auditory stream with a single dimension (GUE+), while each
war is its own stream of 3 dimensions; while the maximum number of simultaneous wars
in Potard’s data is around 35, the piece does not use war durations, so the maximum
number of parallel streams is not documented.
The order of the data is chronological, and at 507 wars within 300 seconds, it has an
average event density of 5 within 3 seconds. Three dimensions are used for each event:
the war’s starting year, and its latitude/longitude. The parallelism of streams is roughly
sketched with copies of the label GUE receding along the Z axis; as this is dynamically
changing, there is no satisfying visual representation.
Like Life Expectancy, Guernica features very literal sound mappings (samples of fighting
sounds); it is based on additional data collected on wars and population since year 1,
which extend the starting dataset considerably; and it adds the notion of a timeline and
historical evolution.
Navegar orders the data along a historical time/space route. Within 6 minutes, 134
countries are rendered (the others are too far away from the route to be ’touched’), which
puts the average data point density around 1 per 3 seconds. At any point, the nearest
15 countries are rendered as one stream each, with 7 dimensions per stream (NAV): lat-
itude/longitude (with moving distance and direction), population density, GDP/capita,
top 10 and top 20 richest to poorest ratios, and water access. The parallelism is again
indicated symbolically as multiple NAV labels along the Z axis. Additionally, ship speed,
direction, and weather conditions are represented, based on 76 timeline points (NAV+).
Like Guernica, Navegar introduces a historical timeline; unlike it, it juxtaposes that with
current social data. It uses metaphorically more indirect mappings than most of the other
submissions. Uniquely within the concert context, it creates a soundscape of stationary
sound sources with a subjective perspective: a moving observer (listener), and it also
sonifies context (here, speed and travel conditions).
Terra Nullius organizes the data by two criteria: selection by latitude zone, ordering by
longitude. A maximum of all 46 dimensions is used throughout the piece, which sets its
Y value on the SDSM map. Within 19 cycles, larger and larger data subsets are chosen;
first, 14 countries within 18 seconds, putting it at a gestalt size of 2-3 (TN1 in the map).
This speeds up to 190 countries at a rate of 100/sec, or a gestalt size of ca. 35 (TN2), and
returns to roughly the original rate eventually.
What sets Terra Nullius apart from all other entries is that it assumes a meta-perspective
on data perceptualisation in general by studying missing values exclusively.
Conclusion
While the SDSM map view of all four pieces shows the large differences between the
approaches taken, it cannot fully capture or describe the radical differences in concepts
manifested in this subset of the pieces submitted. On the one hand, that would be asking
a lot of an overview-creating, orientational concept; on the other, it is interesting to
find that even within rather tightly set constraints like a concert call, creativity easily
defies straightforward categorisation.
Chapter 12
Conclusions
This work consists of three interdependent contributions to sonification research: a the-
oretical framework that is intended for systematic reasoning about design choices while
experimenting with perceptualisations of scientific and other data; a software infrastruc-
ture that pragmatically supports the process of fluid iterative prototyping of such designs;
and a body of sonifications realised using this infrastructure. All these parts were created
within one work process in parallel, interleaved streams: design sketches suggested ideas
for infrastructure that would be useful; observing and analysing design sessions led to
deeper understanding which informed the theoretical framework, and both the growing
framework and the theoretical models eventually led to a more effective design workflow.
The body of sonifications created within this system, and the theoretical models derived
from the analyses of this body of practical work (and a few selected other sonification
designs of interest) form the permanent results of this dissertation. They contribute to
the field of sonification research in the following respects:
• The Sonification Design Space Map and the related models provide a sonification-
specific alternative to TaDa Auditory Information Design, and they suggest a
clearer, more systematic methodology for future sonification research, in particular
for sonification design experimentation.
• The SonEnvir framework provided the first large-scale in-depth test of Just In Time
programming for scientific contexts, which was highly successful. The sonification
community, and other research communities have become aware of the flexibility
and efficiency of this approach.
• The theoretical models, the practical methodology and the individual solutions
developed here may help to reduce time spent to cover large design spaces, and
thus contribute to more efficient and fruitful experimentation.
The work presented here was also employed in sonification workshop settings, and nu-
merous talks and demonstrations given by the author. It proved to be helpful in giving
interested non-experts a clear impression of the central issues in sonification design work,
and has been received favourably by a number of experts in the field.
12.1 Further work
Within the SonEnvir project, many compromises had to be made due to time and capacity
constraints. Also, given the breadth of the overall approach chosen, many ideas could
not be fully explored, and would thus warrant further research.
In the theoretical models, the main desirable future research aims would be:
1. Integration of more analyses of the growing body of Model-Based Sonification designs.
2. Expansion of the user interaction model based on a deeper background in HCI
research.
In the individual research domains, several areas would warrant continued exploration.
Here, it is quite gratifying to see that one of the research strands has led to a direct
followup project: The QCDaudio project hosted at IEM Graz continues and extends
research begun by Kathi Vogt within SonEnvir.
For the EEG research activities, two strategies seem potentially fruitful and thus worth
pursuing: continuing the planned integration into the NeuroSpeed software, and starting
closer collaborations with other EEG researchers, such as the Neuroinformatics group in
Bielefeld, and individual experts in the field, such as Gerold Baier.
It is quite unfortunate that none of the designs created within this research context would
be directly usable for visually impaired people. In my opinion, providing better access to
scientific and other data for the visually impaired is one of the strongest motivations for
developing a wider variety of sonification design approaches, and would be well worth
pursuing more deeply. I hope the work presented will be found useful for future research
in that direction.
For me personally, experimenting with different forms of sonification in artistic contexts
has become even more intriguing than it was before embarking on this venture. As the
entries for the ICAD concerts, as well as many current network art pieces show, cre-
ative minds find plenty of possibilities for experimentation with data representation by
acoustic, visual and other means; creating work that is both aesthetically interesting and
scientifically well-informed is still a fascinating activity. When more perceptual modal-
ities are included in more interactive settings, the creative options and the possibility
spaces to explore multiply once again.
Appendix A
The SonEnvir framework structure in
subversion
This section describes which parts of the framework reside in which folders in the So-
nEnvir subversion repository. Note that the state reported below is temporary; pending
discussion with the SC3 community, more SonEnvir work will move into the main distri-
bution, as well as into general SC3 Quarks, or SonEnvir-specific Quarks.
A.1 The folder ’Framework’
This folder contains the central SC3 classes written during the project, and their respec-
tive help files. The sub-folders are structured as follows:
Data: contains all the SC3 classes for different kinds of data (see the Data model
discussion above), such as EEG data in .edf format; it also includes some appli-
cations written as classes: The TimeSeriesAnalyzer (described in section 8), the
EEGScreener and EEGRealTimePlayer (described in section 9).
Interaction: contains the MouseStrum Class. Most of the user interface devices/interaction
classes are covered by the JInT quark written by Till Bovermann, and available
from the SC3 project site at sourceforge.
Patterns: contains the HilbertIndex, a pattern class that generates 2D and 3D in-
dices along Hilbert space filling curves; note that for 4D Hilbert indices there is a
quark package. It also includes support patterns for Hilbert index generation, and
Pxnrand, a pattern that avoids repeating the last n values of its own output.
Rendering: contains two UGen classes, TorusPanAz and PanRingTop, and SpeakerAdjust,
a utility for adjusting the individual speakers of multichannel systems for more
balanced sound. See also section 5.5.
Synthesis: includes a reverb class (AdCVerb, used in the VirtualRoom class), sev-
eral classes for cascaded filters, a UGen to indicate loop ends in buffer playback,
PhasorClick (both are used in the EEG applications); and a dual band compressor.
Utilities: includes a model for QCD simulations, Potts2D, a library of singing voice
formants, and various extension methods.
osx, linux, windows: these folders capture platform-specific development; of these,
only the osx folder is in use, for OSX-specific GUI classes. These will eventually
be converted to a cross-platform scheme.
A.2 The folder ’SC3-Support’
QtSC3GUI: contains GUIs written in Qt, which were considered an option for SC3
on Windows; this strand of development was dropped when sufficiently powerful
versions of the cross-platform GUI extension package SwingOSC became available.
SonEnvirClasses, SonEnvirHelp: these contain essentially obsolete variants of So-
nEnvir classes; they are kept mainly in case some users still need to run examples
using these classes.
A.3 Other folders in the svn repository
CUBE: contains the QVicon2Osc application, which can connect the Vicon tracking
system (which is in use at the IEM Cube) to any software that supports Open-
SoundControl, and a test for that system using the SonEnvir VirtualRoom for
binaural rendering.
Prototypes: contains all the sonification designs (’prototypes’) written, sorted by sci-
entific domain. These are described extensively and analysed in the chapters on
sonification designs for the domain sciences, 6 - 9.
Psychoacoustics: contains some demonstrations of perceptual principles written for
the domain scientists.
SC3-Training: contains a short Introduction to SuperCollider for sonification; this was
written for the domain scientists, both in German and in English.
SOS1, SOS2: contains demo versions of sonification designs for two presentations
(called Sound of Science 1 and 2) at IEM Graz.
testData: contains anonymous EEG data files in .edf format, for testing purposes only.
A.4 Quarks-SonEnvir
This folder contains all the SC3 classes written in SonEnvir that have been migrated
into Quarks packages for specific topics. Each folder can be downloaded and installed
as a Quark.
QCD contains some Quantum Chromodynamics models implemented in SC3.
SGLib contains a port of a 3D graphics library for math operations on tracking data.
gui-addons contains platform-independent gui extensions to SC3.
hilbert contains a file reader for loading pre-computed 4D Hilbert curve indices from
files.
rainData contains a data reader class for the Rain data used in the SBE workshop (see
section 10).
wavesets contains the Wavesets class, which analyses mono soundfiles into Wavesets,
as defined by Trevor Wishart. This can also be used for applying granular synthesis
methods to time-series-like data.
A.5 Quarks-SuperCollider
These extension packages contain all the SC3 classes written in SonEnvir that have
been migrated into general SuperCollider Quarks packages. They can be downloaded and
installed from the sourceforge svn site of SuperCollider.
AmbIEM: This package for binaural sound rendering using Ambisonics has become an
official SuperCollider extension package (’Quark’). ARHeadtracker is an interface
class to a freeware tracking system.
The statistics methods implemented within SonEnvir have moved to the general SC3
quark MathLib, while others have become quarks themselves, such as the JustInTerface
quark (JInT) written by Till Bovermann (within SonEnvir). Finally, the TUIO quark
(Tangible User Interface Objects, also by Till Bovermann, of University Bielefeld) is of
interest for sonification research with strongly interactive approaches.
Appendix B
Models - code examples
B.1 Spatialisation examples
B.1.1 Physical sources
For multiple speaker setups, a simple and very effective strategy is to use individual
speakers as real physical sources. The main advantage is that physics really help in this
case; when locations only serve to identify streams, as with few fixed sources, fixed single
speakers work very well.
SuperCollider supports this directly with the Out UGen: its first argument determines
which bus a signal is written to, and thus which audio hardware output it is heard on.
// a mono source playing out of channel 4 (indices start at 0)
{ Out.ar(3, Ringz.ar(Dust.ar(30), 400, 0.2)) }.play;
The JITLib library in SuperCollider3 supports a more flexible scheme: sound processes
(in JITLib speak, NodeProxies) run on their own private busses by default; when they
should be audible, they can be routed to the hardware outputs with the .play method.
~snd = { Ringz.ar(Dust.ar(30), 400, 0.2) }; // proxy inaudible, but plays
~snd.play(3); // listen to it on hardware output 4.
NodeProxies also support more flexible fixed multichannel mapping very simply: The
.playN method lets one route each audio channel of the proxy to one or several hardware
output channels, each with optional individual level controls.
// a 3 channel source
~snd3ch = { Ringz.ar(Dust.ar([1,1,1] * 30), [400, 550, 750], 0.2) };
// to individual speakers 1, 3, 5:
~snd3ch.playN([0, 2, 4]);
// to multiple speakers, with individual levels:
~snd3ch.playN(outs: [0, [1,2], [3,4]], amps: [1, 0.7, 0.7]);
B.1.2 Amplitude panning
All of the following methods work for both moving and static sources.
1D: In the simplest case the Pan2 UGen is used for equal power stereo panning.
// mouse controlled pan position
{ Pan2.ar(Ringz.ar(Dust.ar(30), 400, 0.2), MouseX.kr(-1, 1)) }.play;
2D: The PanAz UGen pans a single channel to a symmetrical ring of n speakers by
azimuth, with an adjustable width that controls over how many speakers (at most) the
energy is distributed.
(
{ var numChans = 5, width = 2;
var pos = MouseX.kr(0, 2);
var source = Ringz.ar(Dust.ar(30), 400, 0.2);
PanAz.ar(numChans, source, pos, width);
}.play;
)
In case the ring is not quite symmetrical, adjustments can be made by remapping;
however, using the best geometrical symmetry attainable is always superior to post-
compensation. In order to remap dynamic spatial positions to a ring of speakers at
unequal angles such that the resulting directions are correct, the following example
shows the steps needed: Given a five-speaker system, equal speaker angles would
be [0, 0.4, 0.8, 1.2, 1.6, 2.0] with 2.0 being equal to 0.0 (this is the behaviour of the
PanAz UGen); the actual unsymmetric speaker angles could be for example [0, 0.3, 0.7,
1, 1.5, 2.0]; so remapping should map a control value of 0.3 (where speaker 2 actually
is) to a control value of 0.4 (the control value that positions this source directly
in speaker 2). The full map of corresponding values is given in table B.1.
Table B.1: Remapping spatial control values

desired spatial position    mapped control value
0.0                         0.0
0.3                         0.4
0.7                         0.8
1.0                         1.2
1.5                         1.6
2.0 (== 0.0)                2.0 (== 0.0)

( // remapping unequal speaker angles with asMapTable and PanAz:
a = [0, 0.3, 0.7, 1, 1.5, 2.0].asMapTable;
b = Buffer.sendCollection(s, a.asWavetable, 1);
{ |inpos=0.0|
var source = Ringz.ar(Dust.ar(30), 400, 0.2);
var pos = Shaper.kr(b.bufnum, inpos.wrap(0, 2));
PanAz.ar(a.size - 1, source, pos);
}.play;
)
Mixing multiple channel sources down to stereo:
The Splay UGen mixes an array of channels down to 2 channels, at equal pan distances,
with adjustable spread and center position. Internally, it uses a Pan2 UGen.
~snd3ch = { Ringz.ar(Dust.ar([1,1,1] * 30), [400, 550, 750], 0.2) };
~snd3pan = { Splay.ar(~snd3ch.ar, spread: 0.8, level: 0.5, center: 0) };
~snd3pan.playN(0);
Mixing multiple channel sources into a ring of speakers:
The SplayZ UGen pans an array of source channels into a number of output channels
at equal distances; spread and center position can be adjusted. Larger numbers of
channels can be splayed into rings of fewer speakers, and vice versa. Internally, SplayZ
uses a PanAz UGen.
// spreading 4 channels equally into a ring of 6 speakers
~snd4ch = { Ringz.ar(Dust.ar([1,1,1,1] * 30), [400, 550, 750, 900], 0.2) };
~snd4pan = { SplayZ.ar(6, ~snd4ch.ar, spread: 1.0, level: 0.5, center: 0) };
~snd4pan.playN(0);
3D: The SonEnvir extension TorusPanAz does the same for setups with rings of rings of
speakers. Again, the speaker setup should be as symmetrical as possible; compensation
can be trickier here. (In general, even while compensations for less symmetrical se-
tups seem mathematically possible, spatial images will be worse outside the sweet spot.
Maximum attainable physical symmetry cannot be fully substituted by more DSP math.)
( // panning to 3 rings of 12, 8, and 4 speakers, cf. IEM CUBE.
~snd = { Ringz.ar(Dust.ar(30), 550, 0.2) };
~toruspan = {
var hAngle = MouseX.kr(0, 2); // all the way around (2 == 0)
var vAngle = MouseY.kr(0, 1.333); // limited to highest ring
TorusPanAz.ar([12, 8, 4],
~snd.ar(1),
hAngle,
vAngle
);
};
~toruspan.playN(0);
)
Compensating overall vertical ring angles and individual horizontal speaker angles within
each ring is straightforward with the asMapTable method as shown above. For placement
deviations that are both horizontal and vertical, it is preferable to have Vector Based
Amplitude Panning in SC3, which has been implemented recently by Scott Wilson and
colleagues1. However, this was not needed within the context of the SonEnvir project.
B.1.3 Ambisonics
While some Ambisonics UGens previously existed in SuperCollider, the SonEnvir team
decided to write a consistent new implementation of Ambisonics in SC3, based on a
subset of the existing PureData libraries. This package was realised up to third order
Ambisonics by Christopher Frauenberger for the AmbIEM package, available here2. It
supports the main speaker setup of interest (a half-sphere of 12, 8 and 4 speakers, the
CUBE at IEM, with several coefficient sets for different tradeoff choices), and for a setup
with 1-4-7-4 speaker rings, mainly used as a more efficient lower resolution alternative
for headphone rendering, as described below.
( // panning two sources with 3rd order ambisonics into CUBE sphere.
~snd0 = { Ringz.ar(Dust.ar(30), 400, 0.2) };
~snd1 = { Ringz.ar(Dust.ar(30), 550, 0.2) };
~pos0 = [0, 0.01]; // azimuth, elevation
~pos1 = [1, 0.01]; // azimuth, elevation
~encoded[0] = { PanAmbi3O.ar( ~snd0.ar, ~pos0.kr(1, 0), ~pos0.kr(1, 1)) };
~encoded[1] = { PanAmbi3O.ar( ~snd1.ar, ~pos1.kr(1, 0), ~pos1.kr(1, 1)) };
~decode24 = { DecodeAmbi3O.ar(~encoded.ar, 'CUBE_basic') };
~decode24.play(0);
)
1 See http://scottwilson.ca/site/Software.html
2 http://quarks.svn.sourceforge.net/viewvc/quarks/AmbIEM/
B.1.4 Headphones
Ambisonics and Virtual Binaural Rendering
For complex changing scenes, the IEM has developed a very efficient approach for bin-
aural rendering (Musil et al. (2005); Noisternig et al. (2003)): In effect, taking a virtual,
symmetrical speaker setup (such as 1-4-7-4), and spatializing to that setup with Am-
bisonics; then rendering these virtual speakers as point sources with their appropriate
HRIRs, thus arriving at a binaural rendering. This provides the benefit that the Am-
bisonic field can be rotated as a whole, which is really useful when head movements of
the listener are tracked, and the binaural rendering is designed to compensate for them.
Also, the known problems with Ambisonics when listeners move outside the sweet zone
disappear; when one carries a setup of virtual speakers around one’s head, one is always
right in the center of the sweet zone.
This approach has been ported to SC3 by C. Frauenberger; its main use is in the Vir-
tualRoom class, which simulates moving sources within a rectangular box-shaped room.
This class has turned out to be very useful as a simple way to prepare both experiments
and presentations for multi-speaker setups by relatively simple headphone simulation.
(
// VirtualRoom example - adapted from help file.
// preparation: reserve more memory for delay lines, and boot the server
s.options.memSize_(8192 * 16)
.numAudioBusChannels_(1024);
s.boot;
// make a proxyspace
p = ProxySpace.push;
// set the path for the folder with Kemar files.
VirtualRoom.kemarPath = "KemarHRTF/";
)
(
// create a virtual room
v = VirtualRoom.new;
// and start its binaural rendering
v.init;
// set the room properties (reverberation time and gain,
// hf damping on reverb and early reflections gain)
v.revTime = 0.1;
v.revGain = 0.1;
v.hfDamping = 0.5;
v.refGain = 0.8;
)
( // set room dimension [x, y, z, x, y, z]:
// a room 8m wide (y), 5m deep(x) and 5m high(z)
// - nose is always along x
v.room = [0, 0, 0, 5, 8, 5];
// make it play to hardware stereo outs
v.out.play;
// listener is listener position, a controlrate nodeproxy;
// here movable by mouse.
v.listener.source = { [ MouseY.kr(5,0), MouseX.kr(8,0), 1.6, 0] };
)
// add three sources to the scene
( // make three different sounds
~noisy = { Decay.ar(Impulse.ar(10, 2), 0.2) * PinkNoise.ar(1) };
~ringy = { Ringz.ar(Dust.ar(10), [400, 600,950], [0.3, 0.2, 0.05]).sum };
~dusty = { Dust.ar(400) };
)
// add the three sources to the virtual room:
// source, name, xpos, ypos, zpos
v.addSource( ~noisy, \noisy, 1, 2, 2.5); // bottom right corner
v.addSource( ~ringy, \ringy, 1.5, 7, 2.5); // bottom left
v.addSource( ~dusty, \dusty, 4, 5, 2.5); // top, left of center
v.sources[\noisy].set(\xpos, 4, \ypos, 6, \zpos, 2); // set noisy position
v.sources[\noisy].getKeysValues; // check its position values
v.sources[\ringy].set(\xpos, 2.5, \ypos, 4, \zpos, 2);
// remove the sources
v.removeSource(\noisy);
v.removeSource(\ringy);
v.removeSource(\dusty);
v.free; // free the virtual room and its resources
p.pop; // and clear and leave proxyspace
Among other things, the submissions for the ICAD 2006 concert3 (described also in
section 4.3) were rendered from 8 channels to binaural for the reviewers, and for the web
documentation4.
3 http://www.dcs.qmul.ac.uk/research/imc/icad2006/concert.php
One can of course also spatialize sounds on the virtual speakers using any of the simpler
panning strategies given above; this trades off the easy rotation of the entire setup
against better point source localisation.
To support simple headtracking, C. Frauenberger also created the ARHeadTracker
application, available as a package from the SonEnvir website here5.
B.1.5 Handling speaker imperfections
All standard spatialisation techniques work best when speaker setups are as symmetrical
and well-controlled as possible. While it may not always be feasible to adjust mechan-
ical positions of speakers freely for very precise geometry, a number of factors can be
measured and compensated for, and this is supported by several utility classes written in
SuperCollider, which are part of the SonEnvir framework.
Latency
The Latency class plays a test signal for a given number of audio channels, and waits for
the signals to arrive back at an audio input. The resulting list of measured per-channel
latencies can be used to create compensating delay lines, e.g. in the SpeakerAdjust class
described below.
// test 2 channels, max delay expected 0.2 sec,
// take default server, mic is on AudioIn 1:
Latency.test(2, 0.2, Server.default, 1);
// stop measuring and post results
Latency.stop;
// results are posted like this:
// measured latencies:
// in samples: [ 1186.0, 1197.0 ]
// in seconds: [ 0.026893424036281, 0.027142857142857 ]
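The measured per-channel latencies can then be compensated with simple delay lines, e.g. by delaying every channel up to the slowest one (a minimal sketch using the values above; in practice this is part of the per-channel specs of the SpeakerAdjust class described below):

```supercollider
(
// hypothetical sketch: align 2 output channels using measured latencies
var latencies = [0.026893424036281, 0.027142857142857]; // from Latency.test
var delays = latencies.maxItem - latencies; // delay each up to the slowest
{
    var ins = SinOsc.ar([440, 550], 0, 0.1); // any 2-channel test signal
    DelayN.ar(ins, 0.2, delays)
}.play;
)
```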
Spectralyzer
While inter-speaker latency differences are well-known and very often addressed, we have
found another common problem to be more distracting for multichannel sonification:
Each individual channel of the reproduction chain, from D/A converter to amplifier,
cable, loudspeaker, and speaker mounting location in the room, can sound quite different.
4 http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/concert/index.html
5 http://quarks.svn.sourceforge.net/viewvc/quarks/AmbIEM/
When changes in sound timbre can encode meaning, this is potentially really confusing!
To address this, the Spectralyzer class allows for simple analysis of a test signal as
played into a room, with optional smoothing over several measurements, and then tuning
compensating equalizers by hand for reasonable similarity across all speaker channels.
While this could be written to run automatically, we consider it more of an art than
an engineering task; a more detailed EQ intervention will make the frequency response
flatter, but may color the sound more by smearing its impulse behaviour.
x = Spectralyzer.new; // make a new spectralyzer
x.start; x.makeWindow; // start it, open its GUI
x.listenTo({ PinkNoise.ar }); // pink noise should look flat
x.listenTo({ AudioIn.ar(1)}); // should look similar from microphone.
Figure B.1: The Spectralyzer GUI window.
For full details see the Spectralyzer help file.
( // tuning 2 speakers for better linearity
p = ProxySpace.push;
~noyz = { PinkNoise.ar(1) }; // create a noise source
~noyz.play(0, vol: 0.5);
// filter it with two bands of parametric eq
~noyz.filter(5, { |in, f1=100,rq1=1,db1=0,f2=5000,rq2=1,db2=0|
MidEQ.ar(MidEQ.ar(in, f1, rq1, db1), f2, rq2, db2);
});
)
// tweak the two bands for better acoustic linearity
~noyz.set(\f1, 1200, \rq1, 1, \db1, -5); // take out low presence bump
~noyz.set(\f2, 150, \rq2, 0.6, \db2, 3); // boost bass dip
~noyz.getKeysValues.drop(1).postcs; // post settings when done
// move on to speaker 2
~noyz.play(1, vol: 0.5);
// tweak the two bands again for speaker 2
~noyz.set(\f1, 1200, \rq1, 1, \db1, 0); // likely to be different ...
~noyz.set(\f2, 150, \rq2, 0.6, \db2, 0); // from speaker 1.
~noyz.getKeysValues.drop(1).postcs; // post settings.
SpeakerAdjust
Once one has achieved usable EQ curves for every speaker channel, one can begin
to compensate for volume differences between channels (with big timbral differences
between channels, measuring volume or adjusting it by listening is rather pointless).
The SpeakerAdjust class expects simple specifications for each channel:
amplitude (as multiplication factor, typically below 1.0),
optionally: delaytime (in seconds, to be independent of the current samplerate),
optionally: eq1-frequency, eq1-gain, eq1-relative-bandwidth,
optionally: eq2-frequency, eq2-gain, eq2-relative-bandwidth,
and repeat for as many bands as desired.
// From SpeakerAdjust.help:
// adjustment for 2 channels, amp, dtime, eq specs;
// you can add as many triplets of eqspecs as you want.
(
var specs;
specs = [
// amp, dtime, eq1: frq, db, rq; eq2: frq, db, rq
[ 0.75, 0.0, [ 250, 4, 0.5], [ 800, -4, 1]],
[ 1, 0.001, [ 250, 2, 0.5], [ 5000, 3, 1]]
];
{ var ins;
ins = Pan2.ar(PinkNoise.ar(0.05), MouseX.kr(-1, 1));
SpeakerAdjust.ar(ins, specs)
}.play;
)
Such a speaker adjustment can be created and added to the end of the signal chain
to linearise the given speaker setup as much as possible; of course, limiters for
speaker and listener protection can be built into such a master effects unit as well.
Appendix C
Physics Background
C.1 Constituent Quark Models
The concept of constituent quarks was introduced in the 1960s by Gell-Mann (1964)
and Zweig (1964), based on symmetry considerations in the classification of hadrons, the
strongly interacting elementary particles. The first CQMs for the description of hadron
spectra were introduced in the early 1970s by de Rujula et al. (1975). The original CQMs
relied on simple models for the confinement of constituent quarks (such as the harmonic
oscillator potential) and employed rudimentary hyperfine interactions. Furthermore they
were set up in a completely nonrelativistic framework. Since then, CQMs have
undergone a lively development: over the years, more and more notions deriving from
QCD have been implemented, and CQMs are now constructed within a relativistic formalism.
Modern CQMs all use a confinement potential of linear form, as suggested by QCD. For
the hyperfine interaction of the constituent quarks several competing dynamical concepts
have been proposed: A prominent representative is the one-gluon-exchange (OGE) CQM,
whose dynamics for the hyperfine interaction basically relies on the original ideas of
Zweig (1964): the effective interaction between the constituent quarks is generated by
the exchange of a single gluon. For the data we experimented with, we considered a
relativistic variant of the OGE CQM as constructed by Theussl et al. (2001). A different
approach is followed by the so-called instanton-induced (II) CQM (Loering et al. (2001)),
whose hyperfine forces derive from the 't Hooft interaction. Several years ago, the physics
group at the University of Graz suggested a hyperfine interaction based on the exchange
of Goldstone bosons. This type of dynamics is motivated by the spontaneous breaking
of chiral symmetry (SBχS), which is an essential property of QCD at low energies. The
SBχS is considered responsible for the quarks acquiring a (heavier) dynamical
mass, and their interaction should then be generated by the exchange of Goldstone
bosons, the latter being another consequence of SBχS. The Goldstone-boson-exchange
(GBE) CQM was originally suggested in a simplified version, based on the exchange of
pseudoscalar bosons only (Glozman et al. (1998)). In the meantime an extended version
has been formulated by Glantschnig et al. (2005).
Quantum-Mechanical Solution of Constituent Quark Models
Modern CQMs are constructed in the framework of relativistic quantum mechanics
(RQM). They are characterised by a Hamiltonian operator H that represents the to-
tal energy of the system under consideration. For baryons, which are considered as
bound states of three constituent quarks, the corresponding Hamiltonian reads
H = H_0 + ∑_{i<j} [V_conf(i, j) + V_hf(i, j)]   (C.1)
The first term on the right-hand side denotes the relativistic kinetic energy of the sys-
tem (of the three constituent quarks), and the sum includes all mutual quark-quark
interactions. It consists of two parts, the confinement potential V_conf and the hyperfine
interaction V_hf. The confinement potential prevents the constituent quarks from escaping
the volume of the baryon (being of the order of 10^-15 m); no free quarks have ever
been observed in nature. The hyperfine potential provides for the fine structure of the
energy levels in the baryon spectra. Different dynamical models lead to distinct features
in the excitation spectra of baryons.
In order to produce the baryon spectra of the CQMs one has to solve the eigenvalue
problem of the Hamiltonian in equation C.1. Several methods are available to achieve
solutions to any desired accuracy. The Graz group has applied both integral-equation
(Krassnigg et al. (2000)) and differential-equation techniques (Suzuki and Varga
(1998)).
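Conceptually, what these methods deliver can be seen in a deliberately tiny toy model (a hypothetical 2x2 Hamiltonian, not the Graz group's actual techniques): diagonalising H = H_0 + V shifts and splits the unperturbed levels, just as the hyperfine interaction produces the fine structure of the baryon spectra.

```python
import math

def two_level_eigenvalues(e0, e1, v):
    """Eigenvalues of the 2x2 Hamiltonian [[e0, v], [v, e1]]:
    e0, e1 are unperturbed energies (H_0), v is an off-diagonal
    coupling standing in for a hyperfine-like interaction."""
    mean = 0.5 * (e0 + e1)
    split = math.hypot(0.5 * (e0 - e1), v)
    return mean - split, mean + split

# Two degenerate levels at energy 1.0, coupled with v = 0.2,
# split symmetrically into approximately 0.8 and 1.2.
lo, hi = two_level_eigenvalues(1.0, 1.0, 0.2)
```

Realistic CQM calculations diagonalise much larger operators, which is why the integral- and differential-equation techniques cited above are needed.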
Upon solving the eigenvalue problem of the Hamiltonian one ends up with the eigenvalues
(energy levels) and eigenstates (quantum-mechanical wave functions) of the baryons.
They are characterised according to the conserved quantum numbers, the total angular
momentum J (which is half integer in the case of baryons) and the parity P (being
positive or negative). The different baryons are distinguished by the ’flavor’ of their
constituent quarks, which can be u, d, and s (for ’up’, ’down’, and ’strange’). For
example, the proton is uud, the neutron is udd, the ∆++ is uuu, and the Σ0 is uds.
Classification of Baryons
The total baryon wave function Ψ_{XSFC} is composed of spatial (X), spin (S), flavor (F),
and color (C) degrees of freedom, corresponding to the product of symmetry spaces
Ψ_{XSFC} = Ψ_{XSF} Ψ^{singlet}_C   (C.2)
It is antisymmetric under the exchange of any two particles, since baryons must obey
Fermi statistics. There are several visual representations of the symmetries between the
different baryons based on their combinations of quarks; figure C.1 shows one of them.
Figure C.1: Multiplet structure of the baryons as a decuplet.
In this ordering of baryon flavor symmetries, all the light and strange baryons are in the lowest
layer.
Quarks are differentiated by the following properties:
Color The color quantum numbers are r, b, g (for ’red’, ’blue’, and ’green’). Only white
baryons are observed in experiment. Thus the color wave function corresponds to a
color singlet state and is therefore completely antisymmetric. As a consequence the
rest of the wave function (comprising spatial, spin, and flavor degrees of freedom)
must be symmetric.
Flavor According to the Standard Model (SM) of particle physics there are six quark
flavors: up, down, strange, charm, bottom, and top. Quarks of different flavors
have different masses. Normal hadronic matter (i.e. atomic nuclei) is basically
composed only of the so-called light flavors u and d. CQMs consider hadrons with
flavors u, d, and s. These are also the ones that are most affected by the SBχS.
Correspondingly, one works in SU(3)_F and deals with baryons classified within
singlet, octet, and decuplet multiplets. For example, the nucleons (proton and
neutron) are in an octet, together with the Λ, Σ, and Ξ particles.
Spin All quarks have spin 1/2. The spin wave function of the three quarks is constructed
within SU(2)_S and is thus symmetric, mixed symmetric, or mixed antisymmetric.
The total spin of a baryon is denoted by S.
Orbital Angular Momentum and Parity The spatial wave function corresponds to
a given orbital angular momentum L of the three-quark system. Its symmetry
property under spatial reflections determines the parity P .
Total Angular Momentum The total angular momentum J is composed of the total
orbital angular momentum L and the total spin S of the three-quark system accord-
ing to the quantum-mechanical addition rules of angular momenta: J = L + S.
It is always half-integer. The total angular momentum J is a conserved quantum
number and, together with the parity P, serves for the distinction of baryon
multiplets J^P.
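The addition rules above can be made concrete in a few lines; the helper below (hypothetical, purely illustrative) enumerates the allowed values of J from |L - S| to L + S in integer steps, using fractions to handle half-integer spins:

```python
from fractions import Fraction

def allowed_j(l, s):
    """All total angular momenta J obtainable from orbital angular
    momentum l and total spin s: |l - s|, |l - s| + 1, ..., l + s."""
    l, s = Fraction(l), Fraction(s)
    values = []
    j = abs(l - s)
    while j <= l + s:
        values.append(j)
        j += 1
    return values

# L = 1 combined with total quark spin S = 3/2 gives
# J = 1/2, 3/2, 5/2: always half-integer, as for all baryons.
js = allowed_j(1, Fraction(3, 2))
```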
C.2 Potts Model - Theoretical Background
In mathematical terms, the Hamiltonian function H defines the overall energy, which any
physical system, and thus also a Potts model, will try to minimize:
H = -J ∑_{<i,j>} S_i S_j - M ∑_i S_i   (C.3)
where J is the coupling parameter between spin S_i and its neighbouring spin S_j; J
is inversely proportional to the temperature. M is the field strength of an exterior
magnetic field acting on each spin S_i. The first sum runs over nearest neighbours
and describes the coupling term, which is responsible for the phase transition. If J = 0,
only the second term remains, and the Hamiltonian describes a paramagnet, being only
magnetised in the presence of an exterior magnetic field. In our simulations, M was
always 0.
When studying phase transitions macroscopically, the defining term is the free energy
F:
F(T, H) = -k_B T ln Z(T, H)   (C.4)
It is proportional to the logarithm of the so-called partition function Z of statistical
physics, which sums over all possible spin configurations and weights them with a
Boltzmann factor e^{-H/(k_B T)}, where k_B is the Boltzmann constant. Energetically
unfavorable states are less probable in the partition function than energetically
favorable ones.
Z = ∑_{S_n} e^{-H/(k_B T)}   (C.5)
The partition function Z (eq. C.5) is not calculable in practice due to combinatorial
explosion: a three-dimensional lattice with a side length of 100 and two possible spin
states has 2^{100^3} = (2^{10})^{10^5} ∼ 10^{300,000} configurations that would have
to be summed up, at every time step of the simulation. Analytical derivations fare no
better: only few spin models have been solved exactly, and in three dimensions not even
the simple Ising model is analytically solvable. Therefore classical treatment relies
mainly on approximation methods, which partly allow one to estimate critical exponents,
and can be outlined briefly as follows:
Early theories addressing phase transitions, like van der Waals' theory of fluids and
Weiss' theory of magnetism, can be subsumed under Landau or mean-field theory. Mean-field
theory assumes a mean value for the free energy. Landau derived a theory in which the
free energy is expanded as a power series in the order parameter, including only those
terms compatible with the symmetry of the system. The problem is that all of these
approaches ignore fluctuations by relying only on mean values. (For a detailed
review of phase transition theories please refer to Yeomans (1992).)
Renormalization group theory, developed by K. G. Wilson (Wilson (1974)), solved many
problems of critical phenomena, most importantly the understanding of why continuous
phase transitions fall into universality classes. The basic idea is to perform a
transformation that changes the scale of the system but not its partition function.
Only at the critical point do the properties of the system remain unchanged under such
a transformation; the system is then described by so-called fixed points in the
parameter space of all Hamiltonians. This is why critical exponents are universal
across different systems.
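Because Z cannot be summed explicitly, spin systems are studied by Monte Carlo sampling, which draws configurations with their Boltzmann weights instead of enumerating them. The following Metropolis sketch of the 2D Ising model is purely illustrative (it is not the simulation code whose output is sonified below):

```python
import math
import random

def ising_magnetisation(L=16, J=1.3, sweeps=200, seed=1):
    """Metropolis sampling of the 2D Ising model at coupling J,
    with 1/(k_B T) absorbed into J (as in the text, J ~ 1/T).
    Starts fully ordered; returns |magnetisation per spin|."""
    rng = random.Random(seed)
    spins = [[1] * L for _ in range(L)]
    for _ in range(sweeps * L * L):
        i, j = rng.randrange(L), rng.randrange(L)
        # sum over the four nearest neighbours, periodic boundaries
        nb = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j] +
              spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
        dE = 2.0 * J * spins[i][j] * nb  # energy cost of a flip
        if dE <= 0 or rng.random() < math.exp(-dE):
            spins[i][j] *= -1  # accept with Boltzmann probability
    return abs(sum(sum(row) for row in spins)) / (L * L)

# Above the critical coupling (J_c ~ 0.44) the ordered phase
# persists; far below it, the magnetisation decays towards 0.
m_cold = ising_magnetisation(J=1.3)
m_hot = ising_magnetisation(J=0.1)
```

This is the same qualitative contrast the SubCritical and Noise sound examples below make audible.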
C.2.1 Spin model sound examples
The following audio files can be downloaded from
http://sonenvir.at/downloads/spinmodels/.
The first part describes sonifications that enable the listener to classify the phase of the
model (sub-critical, critical, super-critical).
Granular sonifications: Random, averaged spin blocks were used to determine the
sound grains. The spatial setting cannot be reproduced in this recording. But
even without having a clear gestalt of the system, the different characteristics of
IsingHot, IsingCritical and IsingCold may easily be distinguished.
Audification approaches: (Please consider that a few clicks in the audio files below
are artifacts of the data management and buffering in the computer.)
1. Noise: NoiseA gives the audification of a 3-state Potts model in the thermal
noise regime (coupling J = 0.4).
NoiseB gives the same for the 5-state Potts model (J = 0.4); evidently,
the sound becomes smoother the more states are possible, but its overall
character stays the same.
2. Critical behaviour: this example was recorded with a 4-state Potts model at
and near the critical temperature:
SuperCritical - near the critical point clusters emerge. These are rather big
but homogeneous, hence a regularity is still perceivable. (J = 0.95)
Critical - at the critical point itself, clusters of all orders of magnitude emerge,
thus the sound is much more unstable and less pleasant. (J = 1.05)
3. SubCritical - as soon as the system is equilibrated in the subcritical domain
(at T < T_crit), one spin orientation predominates, and only a few random spin
flips occur due to thermal fluctuations. (Recorded with the Ising model at J
= 1.3.)
The next examples study the order of the phase transition.
Direct audification displays only very subtle differences between the two types of
phase transitions:
1. The 4-state Potts model is played in ContinousTransition.
2. A more sudden change can be perceived in FirstOrderTransition for the 5-
state Potts model.
Audification with separate spin channels: For each spin orientation, the lattice is
sequentialised and the resulting audification is played on its own channel. The lattice
size was 32x32, and the system was equilibrated at each step. The examples finish
with one spin orientation prevailing, which means that only random clicks from a
non-vanishing temperature remain.
1. The transitions in the 2-state Ising model and the 4-state Potts model are
continuous; the change is smooth.
2. In the 5-state and 8-state models the phase transition is abrupt (the effect is
more distinct the more states are involved).
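The separate-spin-channel sequentialisation can be sketched as follows (a hypothetical helper, not the actual SonEnvir code): the lattice is raster-scanned, and each spin orientation produces one channel that is nonzero wherever the scan encounters that orientation:

```python
def spin_channels(lattice, q):
    """Sequentialise a 2D lattice of spin states 0..q-1 into q
    audification channels: channel k holds 1.0 at scan positions
    where spin k occurs, and 0.0 elsewhere."""
    flat = [s for row in lattice for s in row]  # row-major scan
    return [[1.0 if s == k else 0.0 for s in flat] for k in range(q)]

# Toy 2x2 lattice of a 3-state Potts model:
lattice = [[0, 1],
           [2, 0]]
chans = spin_channels(lattice, 3)
# chans[0] is [1.0, 0.0, 0.0, 1.0]: spin 0 sounds at scan steps 0 and 3.
```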
Appendix D
Science By Ear participants
The following people took part in the Science By Ear workshop:
SonEnvir members/moderators
Daye, Christian
De Campo, Alberto
Eckel, Gerhard
Frauenberger, Christopher
Vogt, Katharina
Wallisch, Annette
Programming specialists
Bovermann, Till, Neuroinformatics Group, Bielefeld University
De Campo, Alberto
Frauenberger, Christopher
Pauletto, Sandra, Music Technology Group, York University
Musil, Thomas, Audio/DSP, Institute of Electronic Music (IEM) Graz
Rohrhuber, Julian, Academy of Media Arts (KHM) Cologne
Sonification experts
Baier, Gerold, Dynamical systems, University of Morelos, Mexico
Bonebright, Terri, Psychology/Perception, DePauw University
Bovermann, Till
Dombois, Florian, Transdisciplinarity, Y Institute, Arts Academy Berne
Hermann, Thomas, Neuroinformatics Group, Bielefeld University
Kramer, Gregory, Metta Organization
Pauletto, Sandra
Stockman, Tony, Computer Science, Queen Mary Univ. London
Domain scientists
Baier, Gerold
Dombois, Florian
Egger de Campo, Marianne, Sociology, Compass Graz
Fickert, Lothar, Electrical power systems, University of Technology (TU) Graz
Grond, Florian, Chemistry / media art, ZKM Karlsruhe
Grossegger, Dieter, EEG Software, NeuroSpeed Vienna
Hipp, Walter, Electrical power systems, TU Graz
Huber, Anton, Physical Chemistry, University of Graz
Markum, Harald, Atomic Institute of the Austrian Universities, TU Vienna
Plessas, Willibald, Physics Institute, University of Graz
Shutin, Dimitri, Electrical power systems, TU Graz
Schweitzer, Susanne, Wegener Center for Climate and Global Change, University of Graz
Witrisal, Klaus, Electrical power systems, TU Graz
Appendix E
Background on ’Navegar’
The saying has a long history: Plutarch ascribes it to General Pompeius, who said this
line to soldiers he sent off on a suicide mission, and Veloso may well have read it in
a famous poem by Fernando Pessoa. Here are Veloso's lyrics:
Table E.1: Os Argonautas - Caetano Veloso
O barco, meu coração não aguenta the ship, my heart cannot handle it
Tanta tormenta, alegria so much torment, happiness
Meu coração não contenta my heart is discontent
O dia, o marco, meu coração, o porto, não the day, the limit, my heart, the port, no
Navegar é preciso, viver não é preciso sea-faring is necessary, living is not
O barco, noite no céu tão bonito the ship, night in the beautiful sky
Sorriso solto perdido the free smile, lost
Horizonte, madrugada horizon, morning dawn
O riso, o arco, da madrugada the laugh, the arc, of morning
O porto, nada the port, nothing
Navegar é preciso, viver não é preciso sea-faring is necessary, living is not
O barco, o automóvel brilhante the ship, the brilliant automobile
O trilho solto, o barulho the free track, the noise
Do meu dente em tua veia of my tooth in your vein
O sangue, o charco, barulho lento the blood, the swamp, slow soft noise
O porto silêncio the port - silence
Navegar é preciso, viver não é preciso sea-faring is necessary, living is not
(Literal English translation: Alberto de Campo.)
Appendix F
Sound, meaning, language
Sounds can change their meanings in different contexts. This ambiguity has also been
of interest in poetry, as this work by Ernst Jandl shows.
Ernst Jandl - Oberflächenübersetzung ('Surface Translation')
mai hart lieb zapfen eibe hold
er rennbohr in sees kai.
so was sieht wenn mai lauft begehen,
so es sieht nahe emma mahen,
so biet wenn arschel grollt
ohr leck mit ei!
seht steil dies fader rosse mahen,
in teig kurt wisch mai desto bier
baum deutsche deutsch bajonett schur alp eiertier.
Original poem by William Wordsworth
My heart leaps up when I behold
a rainbow in the sky.
so was it when my life began,
so is it now I am a man,
so be it when I shall grow old
or let me die!
The child is father of the man
and I could wish my days to be
bound each to each by natural piety.
Bibliography
Abbott, A. (1990). A Primer on Sequence Methods. Organization Sci-
ence, 1(4):375–392.
Abbott, A. (1995). Sequence Analysis: New Methods for Old Ideas.
Annual Review of Sociology, 21:93–113.
Anderson, M. L. (2003). Embodied cognition: A field guide. Artificial
Intelligence, 149(1):91–130.
Anonymous (March 20, 2001). L’histoire: PDG surpayes. Liberation.
Armstrong, N. (2006). An Enactive Approach to Digital Musical Instru-
ment Design. PhD thesis, Princeton University.
Baier, G. and Hermann, T. (2004). The Sonification of Rhythms in
Human Electroencephalogram. In Proc. Int. Conf. on Auditory Display
(ICAD), Sydney, Australia.
Baier, G., Hermann, T., Sahle, S., and Ritter, H. (2006). Sonified Epileptic
Rhythms. In Proc. Int. Conf. on Auditory Display (ICAD), London, UK.
Baier, G., Hermann, T., and Stephani, U. (2007). Event-based sonifica-
tion of EEG rhythms in real time. Clinical Neurophysiology, 118(6).
Barnes, J. (2007). ”The Odd Couple”. Review of ”That Sweet Enemy:
The French and the British from the Sun King to the Present” by
Robert and Isabelle Tombs. New York Review of Books, LIV(5):4–9.
Barrass, S. (1997). Auditory Information Design. PhD thesis, Australian
National University.
Barrass, S. and Adcock, x. (2004). Sonification Design Patterns. In
Proc. Int. Conf. on Auditory Display (ICAD), Sydney, Australia.
Barrass, S., Whitelaw, M., and Bailes, F. (2006). Listening to the Mind
Listening: An Analysis of Sonification Reviews, Designs and Corre-
spondences. Leonardo Music Journal, 16:13–19.
Barrass, T. (2006). Description of Sonification for ICAD 2006 Concert:
Life Expectancy. In Proc. Int Conf. on Auditory Display (ICAD), Lon-
don, UK.
Beck, U. (1992). Risk Society: Towards a New Modernity. Sage, New
Delhi.
Ben-Tal, O., Berger, J., Cook, B., Daniels, M., Scavone, G., and Cook,
P. (2002). SONART: The Sonification Application Research Toolbox.
In Proc. ICAD, Kyoto, Japan.
Blauert, J. (1997). Spatial Hearing: The Psychophysics of Human Hear-
ing. MIT Press.
Blossfeld, H.-P., Hamerle, A., and Mayer, K. U. (1986). Ereignisanalyse.
Statistische Theorie und Anwendung in den Wirtschafts- und Sozial-
wissenschaften. Campus, Frankfurt.
Blossfeld, H.-P. and Rohwer, G. (1995). Techniques of event history
modeling. New approaches to causal analysis. Lawrence Erlbaum As-
sociates, Mahwah (N. J.).
Borges, J. L. (1980). The analytical language of John Wilkins. In
Labyrinths. Penguin.
Boulanger, R. (2000). The Csound Book: Perspectives in Software
Synthesis, Sound Design, Signal Processing, and Programming. MIT
Press, Cambridge, MA, USA.
Bovermann, T. (2005). MBS-Sonogram. http://www.techfak.uni-bielefeld.de/~tboverma/sc/.
Bovermann, T., de Campo, A., Groten, J., and Eckel, G. (2007). Jug-
gling Sounds. In Proceedings of Interactive Sonification Workshop
ISon2007.
Bovermann, T., Hermann, T., and Ritter, H. (2006). Tangible data
scanning sonification model. In Proc. of the International Conference
on Auditory Display, London, UK.
Bregman, A. S. (1990). Auditory Scene Analysis. Bradford Books, MIT
Press, Cambridge, MA.
Bruce, J. and Palmer, N. (2005). SIFT: Sonification Integrable Flexible
Toolkit. In Proc. Int Conf. on Auditory Display (ICAD), Limerick,
Ireland.
Buxton, Bill with Billinghurst, M., Guiard, Y., Sellen, A., and
Zhai, S. (2008). Human Input to Computer Systems: Theo-
ries, Techniques and Technology. http://www.billbuxton.com/inputManuscript.html.
Candey, R., Schertenleib, A., and Diaz Merced, W. (2006). xSonify:
Sonification Tool for Space Physics. In Proc. Int Conf. on Auditory
Display (ICAD), London, UK.
Conner, C. D. (2005). A People’s History of Science: Miners, Midwives
and ”Low Mechanicks”. Nation Books, New York, NY, USA.
Cooper, D. H. and Shiga, T. (1972). Discrete-Matrix Multichannel
Stereo. J. Audio Eng. Soc., 20:344–360.
Cruz-Neira, C., Sandin, D. J., DeFanti, T. A., Kenyon, R. V., and Hart,
J. C. (1992). The CAVE: Audio Visual Experience Automatic Virtual
Environment. Commun. ACM, 35(6):64–72.
Daye, C. and de Campo, A. (2006). Sounds sequential: Sonification in
the Social Sciences. Interdisciplinary Science Reviews, 31(6):349–364.
Daye, C., de Campo, A., and Egger de Campo, M. (2006). Sonifikationen
in der wissenschaftlichen Datenanalyse. Angewandte Sozialforschung,
24(1/2):41–56.
Daye, C., de Campo, A., Fleck, C., Frauenberger, C., and Edelmayer, G.
(2005). Sonification as a tool to reconstruct user’s actions in unob-
servable areas. In Proceedings of ICAD 2005, Limerick.
de Campo, A. (2007a). A Sonification Design Space Map. In Proceedings
of Interactive Sonification Workshop ISon2007.
de Campo, A. (2007b). Toward a Sonification Design Space Map. In
Proc. Int Conf. on Auditory Display (ICAD), Montreal, Canada.
de Campo, A. and Daye, C. (2006). Navegar E Preciso, Viver Nao E
Preciso. In Proc. Int. Conf. on Auditory Display (ICAD), London, UK.
de Campo, A. and Egger de Campo, M. (1999). Sonification of So-
cial Data. In Proceedings of the 1999 International Computer Music
Conference (ICMC) Beijing.
de Campo, A., Frauenberger, C., and Holdrich, R. (2004). Designing
a Generalized Sonification Environment. In Proceedings of the ICAD
2004, Sydney.
de Campo, A., Frauenberger, C., and Holdrich, R. (2005a). Sonenvir
- a progress report. In Proc. Int. Computer Music Conf. (ICMC),
Barcelona, Spain.
de Campo, A., Frauenberger, C., Vogt, K., Wallisch, A., and Daye,
C. (2006a). Sonification as an Interdisciplinary Working Process. In
Proceedings of ICAD 2006, London.
de Campo, A., Hormann, N., Markum, H., Plessas, W., and Vogt, K. (2006b).
Sonification of lattice data: Dirac spectrum and monopole condensation
along the deconfinement transition. In Proceedings of the Miniconference
in honor of Adriano Di Giacomo on the Sense of Beauty in Physics, Pisa,
Italy.
de Campo, A., Hormann, N., Markum, H., Plessas, W., and Sengl, B.
(2005b). Sonification of Lattice Data: The Spectrum of the Dirac
Operator Across the Deconfinement Transition. In Proc. XXIIIrd Int.
Symposium on Lattice Field Theory, Trinity College, Dublin, Ireland.
de Campo, A., Hormann, N., Markum, H., Plessas, W., and Sengl, B.
(2005c). Sonification of Lattice Observables Across Phase Transitions.
In International Workshop on Xtreme QCD, Swansea.
de Campo, A., Hormann, N., Markum, H., Plessas, W., and Vogt, K.
(2006c). Sonification of Monopoles and Chaos in QCD. In Proc. of
ICHEP’06 - the XXXIIIrd International Conference on High Energy
Physics, Moscow, Russia.
de Campo, A., Sengl, B., Frauenberger, C., Melde, T., Plessas, W., and
Holdrich, R. (2005d). Sonification of Quantum Spectra. In Proc. Int
Conf. on Auditory Display (ICAD), Limerick, Ireland.
de Campo, A., Wallisch, A., Holdrich, R., and Eckel, G. (2007). New
Sonification Tools for EEG Data Screening and Monitoring. In Proc.
Int Conf. on Auditory Display (ICAD), Montreal, Canada.
de Rujula, A., Georgi, H., and Glashow, S. L. (1975). Hadron masses in
a gauge theory. Phys. Rev., D12(147).
Dix, A. (1996). Closing the loop: Modelling action, perception and in-
formation. In Catarci, T., Costabile, M. F., Levialdi, S., and Santucci,
G., editors, AVI’96 - Advanced Visual Interfaces, pages 20–28. ACM
Press.
Dix, A., Finlay, J., Abowd, G., and Beale, R. (2004). Human-Computer
Interaction. Prentice Hall, Harlow, 3rd edition.
Dombois, F. (2001). Using Audification in Planetary Seismology. In
Proc. Int Conf. on Auditory Display (ICAD), Espoo, Finland.
Drake, S. (1980). Galileo. Oxford University Press, New York.
Drori, G. S., Meyer, J. W., Ramirez, F. O., and Schofer, E. (2003).
Science in the Modern World Polity: Institutionalization and Global-
ization. Stanford University Press, Stanford.
Ebe, M. and Homma, I. (2002). Leitfaden fur die EEG-Praxis. Urban
und Fischer bei Elsevier, 3rd edition.
Eidelman, S. et al. (2004). Review of Particle Physics. Phys. Lett.,
B592(1).
Fickert, L., Eckel, G., Nagler, W., de Campo, A., and Schmautzer, E.
(2006). New developments of teaching concepts in multimedia learning
for electrical power systems introducing sonification. In Proceedings
of the 29th ICT International Convention MIPRO, Opatija, Croatia.
Fitch, T. and Kramer, G. (1994). Sonifying the Body Electric: Superi-
ority of an Auditory over a Visual Display in a Complex Multivariate
System. In Kramer, G., editor, Auditory Display. Addison-Wesley.
Frauenberger, C., de Campo, A., and Eckel, G. (2007). Analysing time
series data. In Proc. Int Conf. on Auditory Display (ICAD), Montreal,
Canada.
Gardner, B. and Martin, K. (1994). HRTF measurements of a KEMAR
dummy-head microphone. Online.
Gaver, W. W., Smith, R. B., and O’Shea., T. (1991). Effective Sounds
in Complex Systems: The ARKola Simulation. In Proceedings of CHI
’91, New Orleans, USA.
Gell-Mann, M. (1964). A Schematic Model of Baryons and Mesons.
Phys. Lett., 8:214.
Gerzon, M. (1977a). Multi-System Ambisonic Decoder, Part 1: Basic
Design Philosophy. Wireless World, 83(1499):43–47.
Gerzon, M. (1977b). Multi-System Ambisonic Decoder, Part 2: Main
Decoder Circuits. Wireless World, 83(1500):69–73.
Ghazala, R. (2005). Circuit-Bending: Build Your Own Alien Instruments.
Wiley, Hoboken, NJ.
Giddens, A. (1990). The Consequences of Modernity. Stanford University
Press.
Giddens, A. (1999). Runaway World. A series of lectures on globalisa-
tion for the BBC. http://news.bbc.co.uk/hi/english/static/events/reith_99/.
Glantschnig, K., Kainhofer, R., Plessas, W., Sengl, B., and Wagenbrunn,
R. F. (2005). Extended Goldstone-boson-exchange Constituent Quark
Model. Eur. Phys. J. A.
Glaser, B. and Strauss, A. (1967). The Discovery of Grounded Theory.
Aldine.
Glozman, L., Papp, Z., Plessas, W., Varga, K., and Wagenbrunn, R. F.
(1998). Unified Description of Light- and Strange-Baryon Spectra.
Phys. Rev., D58(094030).
Goodrick, M. (1987). The Advancing Guitarist. Hal Leonard.
GSL Team (2007). GNU Scientific Library. http://www.gnu.org/software/gsl/manual/gsl-ref.html.
Harrar, L. and Stockman, T. (2007). Designing Auditory Graph
Overviews. In Proceedings of ICAD 2007, pages 306–311. McGill
University.
Hayward, C. (1994). Listening to the Earth Sing. In Kramer, G., edi-
tor, Auditory Display, pages 369–404. Addison-Wesley, Reading, MA,
USA.
Hermann, T. (2002). Sonification for Exploratory Data Analysis. PhD
thesis, Bielefeld University, Bielefeld, Germany.
Hermann, T., Baier, G., Stephani, U., and Ritter, H. (2006). Vocal
Sonification of Pathologic EEG Features. In Proceedings of ICAD
2006, London.
Hermann, T. and Hunt, A. (2005). Introduction to Interactive Sonifica-
tion. IEEE Multimedia, Special Issue on Sonification, 12(2):20–24.
Hermann, T., Nolker, C., and Ritter, H. (2002). Hand postures for
sonification control. In Wachsmuth, I. and Sowa, T., editors, Gesture
and Sign Language in Human-Computer Interaction, Proc. Int. Gesture
Workshop GW2001, pages 307–316. Springer.
Hermann, T. and Ritter, H. (1999). Listen to your Data: Model-Based
Sonification for Data Analysis. In Advances in intelligent computing
and multimedia systems, pages 189–194, Baden-Baden, Germany. Int.
Inst. for Advanced Studies in System research and cybernetics.
Hinterberger, T. and Baier, G. (2005). POSER: Parametric Orchestral
Sonification of EEG in Real-Time for the Self-Regulation of Brain
States. IEEE Multimedia, Special Issue on Sonification, 12(2):70–79.
Hollander, A. (1994). An Exploration of Virtual Auditory Shape Percep-
tion. Master’s thesis, Univ. of Washington.
Hunt, A. and Pauletto, S. (2006). The Sonification of EMG data. In Pro-
ceedings of the International Conference on Auditory Display (ICAD),
London, UK.
Hunt, A. D., Paradis, M., and Wanderley, M. (2003). The importance
of parameter mapping in electronic instrument design. Journal of New
Music Research, 32(4):429–440.
Igoe, T. (2007). Making Things Talk. Practical Methods for Connecting
Physical Objects. O’Reilly.
Jorda Puig, S. (2005). Digital Lutherie. Crafting musical computers for
new musics’ performance and improvisation. PhD thesis, Departament
de Tecnologia, Universitat Pompeu Fabra.
Joseph, A. J. and Lodha, S. K. (2002). MUSART: Musical Audio Trans-
fer Function Real-time Toolkit. In Proc. Int. Conf. on Auditory Display
(ICAD), Kyoto, Japan.
Kramer, G. (1994a). An Introduction to Auditory Display. In Kramer,
G., editor, Auditory Display: Sonification, Audification, and Auditory
Interfaces, chapter Introduction. Addison-Wesley.
Kramer, G., editor (1994b). Auditory Display: Sonification, Audification,
and Auditory Interfaces. Addison-Wesley, Reading, Menlo Park.
Krassnigg, A., Papp, Z., and Plessas, W. (2000). Faddeev Approach to
Confined Three-Quark Problems. Phys. Rev., C(62):044004.
Latour, B. and Woolgar, S. (1986). Laboratory Life: The Construction of
Scientific Facts. Princeton University Press, Princeton, NJ, (Revised
edition with an introduction by Jonas Salk and a new postscript by
the authors.) edition.
Leman, M. (2006). The State of Music Perception Research. Talk at
’Connecting Media’ conference, Hamburg.
Leman, M. and Camurri, A. (2006). Understanding musical expressive-
ness using interactive multimedia platforms. Musicae Scientiae, special
issue.
Lodha, S. K., Beahan, J., Heppe, T., Joseph, A., and Zane-Ulman, B.
(1997). MUSE: A Musical Data Sonification Toolkit. In Proc. Int
Conf. on Auditory Display (ICAD), Palo Alto, CA, USA.
Loering, U., Metsch, B. C., and Petry, H. R. (2001). The light baryon
spectrum in a relativistic quark model with instanton-induced quark
forces: The non-strange baryon spectrum and ground-states. Eur.
Phys. J., A10:395.
Madhyastha, T. (1992). Porsonify: A Portable System for Data Sonifi-
cation. Master’s thesis, University of Illinois at Urbana-Champaign.
Malham, D. G. (1999). Higher Order Ambisonic Systems for the Spa-
tialisation of Sound. In Proceedings of the ICMC, Beijing, China.
Marsaglia, G. (2003). DIEHARD: A Battery of Tests for Random Number
Generators. http://www.csis.hku.hk/~diehard/.
Mathews, M. and Miller, J. (1963). Music IV programmer’s manual. Bell
Telephone Laboratories, Murray Hill, NJ, USA.
Mayer-Kress, G. (1994). Sonification of Multiple Electrode Human Scalp
Electroencephalogram. Poster presentation demo at ICAD '94, http://www.ccsr.uiuc.edu/People/gmk/Projects/EEGSound/.
McCartney, J. (2003-2007). SuperCollider3. http://supercollider.sourceforge.net.
McKusick, V. A., Sharpe, W. D., and Warner, A. O. (1957). Harvey
Tercentenary: An Exhibition on the History of Cardiovascular Sound
Including the Evolution of the Stethoscope. Bulletin of the History of
Medicine, 31:463–487.
Meinicke, P., Hermann, T., Bekel, H., Muller, H. M., Weiss, S., and
Ritter, H. (2002). Identification of Discriminative Features in EEG.
Journal for Intelligent Data Analysis.
Milczynski, M., Hermann, T., Bovermann, T., and Ritter, H. (2006).
A malleable device with applications to sonification-based data explo-
ration. In Proc. of the International Conference on Auditory Display,
London, UK.
Moore, B. C. (2004). An Introduction to the Psychology of Hearing.
Elsevier, fifth edition.
Musil, T., Noisternig, M., and Holdrich, R. (2005). A Library for Realtime
3D Binaural Sound Reproduction in Pure Data (PD). In Proc. Int.
Conf. on Digital Audio Effects (DAFX-05), Madrid, Spain.
Neuhoff, J. (2004). Ecological Psychoacoustics. Springer.
Noisternig, M., Musil, T., Sontacchi, A., and Holdrich, R. (June, 2003).
A 3D Ambisonic based Binaural Sound Reproduction System. In Proc.
Int. Conf. Audio Eng. Soc., Banff, Canada.
Fronczak, P., Fronczak, A., and Holyst, J. A. (2006). Ferromagnetic fluid as a
model of social impact. International Journal of Modern Physics,
17(8):1227–1235.
Panek, P., Daye, C., Edelmayer, G., et al. (2005). Real Life Test with
a Friendly Rest Room (FRR) Toilet Prototype in a Day Care Center
in Vienna – An Interim Report. In Proc. 8th European Conference for
the Advancement of Assistive Technologies in Europe, Lille.
Pauletto, S. (2007). Interactive non-speech auditory display of multivari-
ate data. PhD thesis, University of York.
Pauletto, S. and Hunt, A. (2004). A Toolkit for Interactive Sonification.
In Proceedings of ICAD 2004, Sydney.
Pelling, A. E., Sehati, S., Gralla, E. B., Valentine, J. S., and Gimzewski,
J. K. (2004). Local Nanomechanical Motion of the Cell Wall of Sac-
charomyces cerevisiae. Science, 305(5687):1147–1150.
Pereverzev, S. V., Loshak, A., Backhaus, S., Davies, J., and Packard,
R. E. (1997). Quantum Oscillations between two weakly coupled reser-
voirs of superfluid 3He. Nature, 388:449–451.
Piche, J. and Burton, A. (1998). Cecilia: A Production Interface to
Csound. Computer Music Journal, 22(2):52–55.
Pigafetta, A. (1530). Primo Viaggio Intorno al Globo Terracqueo (First
Voyage Around the Terraqueous World). Giuseppe Galeazzi, Milano.
Pigafetta, A. (2001). Mit Magellan um die Erde. (Magellan’s Voyage: A
Narrative Account of the First Circumnavigation). Edition Erdmann,
Lenningen, Germany. (First edition Paris 1525.).
Potard, G. (2006). Guernica 2006: Sonification of 2006 Years of War
and World Population Data. In Proc. Int Conf. on Auditory Display
(ICAD), London, UK.
Pulkki, V. (2001). Spatial Sound Generation and Perception by Ampli-
tude Panning. PhD thesis, Helsinki University of Technology, Espoo.
Raskin, J. (2000). The Humane Interface. Addison-Wesley.
Rheinberger, H.-J. (2006). Experimentalsysteme und Epistemische Dinge
(Experimental Systems and Epistemic Things). Suhrkamp, Germany.
Riess, F., Heering, P., and Nawrath, D. (2005). Reconstructing Galileo’s
Inclined Plane Experiments for Teaching Purposes. In Proc. of the In-
ternational History, Philosophy, Sociology and Science Teaching Con-
ference, Leeds, UK.
Roads, C. (2002). Microsound. MIT Press.
Rohrhuber, J. (2006). Terra Nullius. In Proc. Int Conf. on Auditory
Display (ICAD), London, UK.
Rohrhuber, J., de Campo, A., and Wieser, R. (2005). Algorithms To-
day - Notes on Language Design for Just In Time Programming. In
Proceedings of the ICMC 2005, Barcelona.
Ryan, J. (1991). Some Remarks on Musical Instrument Design at
STEIM. Contemporary Music Review, 6(1):3–17. Also available online:
http://www.steim.org/steim/texts.phtml?id=3.
Saraiya, P., North, C., and Duca, K. (2005). An insight-based methodology
for evaluating bioinformatics visualizations. Transactions on
Visualization and Computer Graphics, 11(4):443–456.
Scaletti, C. (1994). Sound Synthesis Algorithms for Auditory Data Rep-
resentations. In Kramer, G., editor, Auditory Display: Sonification,
Audification, and Auditory Interfaces. Addison-Wesley.
Schaeffer, P. (1997). Traité des objets musicaux. Le Seuil, Paris.
Snyder, B. (2000). Music and Memory. MIT Press.
Speeth, S. D. (1961). Seismometer sounds. J. Acoust. Soc. Am., 33:909–
916.
Stockman, T., Nickerson, L. V., and Hind, G. (2005). Auditory graphs:
A summary of current experience and towards a research agenda. In
Proc. ICAD 2005, Limerick.
Suzuki, Y. and Varga, K. (1998). Stochastic variational approach to
quantum-mechanical few-body problems. Lecture Notes in Physics,
m54.
TAP, ACM (2004). ACM Transactions on Applied Perception. New York,
NY, USA.
Theussl, L., Wagenbrunn, R. F., Desplanques, B., and Plessas, W.
(2001). Hadronic Decays of N and Delta Resonances in a Chiral Quark
Model. Eur. Phys. J., A12:91.
UN Statistics Division (1975). Towards A System of Social Demographic
Statistics. United Nations, Available online at UN Statistics Division
(2006).
UN Statistics Division (1989). Handbook of Social Indicators. UN Statis-
tics website.
UN Statistics Division (2006). Social Indicators.
http://unstats.un.org/unsd/demographic/products/socind/default.htm.
Urick, R. J. (1967). Principles of Underwater Sound. McGraw-Hill, New
York, NY, USA.
U.S. Census Bureau (2006). World POPClock Projection. http://www.
census.gov/ipc/www/popclockworld.html.
Vercoe, B. (1986). CSOUND: A Manual for the Audio Processing System
and Supporting Programs. M.I.T. Media Laboratory, Cambridge, MA,
USA.
Vogt, K., de Campo, A., Frauenberger, C., Plessas, W., and Eckel, G.
(2007). Sonification of Spin Models. Listening to Phase Transitions
in the Ising and Potts Model. In Proc. Int Conf. on Auditory Display
(ICAD), Montreal, Canada.
Voss, R. and Clarke, J. (1975). ’1/f noise’ in music and speech. Nature,
258:317–318.
Voss, R. and Clarke, J. (1978). 1/f Noise in Music: Music from 1/f
Noise. J. Acoust. Soc. Am., 63:258–263.
Walker, B. (2000). Magnitude Estimation of Conceptual Data Dimen-
sions for Use in Sonification. PhD thesis, Rice University, Houston.
Walker, B. and Cothran, J. (2003). Sonification Sandbox: A Graphical
Toolkit for Auditory Graphs. In Proceedings of ICAD 2003, Boston.
Walker, B. N. and Kramer, G. (1996). Mappings and Metaphors in
Auditory Displays: An Experimental Assessment. In Frysinger, S. and
Kramer, G., editors, Proc. Int. Conf. on Auditory Display (ICAD),
pages 71–74, Palo Alto, CA.
Walker, B. N. and Kramer, G. (2005a). Mappings and Metaphors in
Auditory Displays: An Experimental Assessment. ACM Trans. Appl.
Percept., 2(4):407–412.
Walker, B. N. and Kramer, G. (2005b). Sonification Design and
Metaphors: Comments on Walker and Kramer, ICAD 1996. ACM
Trans. Appl. Percept., 2(4):413–417.
Walker, B. N. and Kramer, G. (2006). International Encyclopedia of
Ergonomics and Human Factors (2nd ed.), chapter Auditory Displays,
Alarms, and Auditory Interfaces, pages 1021–1025. CRC Press, New
York.
Wallisch, A. (2007). EEG plus Sonifikation. Sonifikation von EEG-Daten
zur Epilepsiediagnostik im Rahmen des Projekts ’SonEnvir’ (EEG plus
Sonification: Sonification of EEG Data for Epilepsy Diagnostics within
the ’SonEnvir’ Project). PhD thesis, Medical University Graz, Graz,
Austria.
Warusfel, O. (2002-2003). LISTEN HRTF database.
http://recherche.ircam.fr/equipes/salles/listen/.
Wedensky, N. (1883). Die telephonischen Wirkungen des erregten Nerven
(The Telephonic Effects of the Excited Nerve). Centralblatt für die
medicinischen Wissenschaften, (26).
Wessel, D. (2006). An Enactive Approach to Computer Music Perfor-
mance. In GRAME, editor, Proc. of ’Rencontres Musicales Pluridisci-
plinaires’, Lyon, France.
Wikipedia (2006a). Gini Coefficient.
http://en.wikipedia.org/wiki/Gini_coefficient.
Wikipedia (2006b). Magellan. http://en.wikipedia.org/wiki/Magellan.
Wikipedia (2007). Lévy skew alpha-stable distribution.
http://en.wikipedia.org/wiki/Levy_skew_alpha-stable_distribution.
Williams, S. (1994). Perceptual Principles in Sound Grouping. In Kramer,
G., editor, Auditory Display. Addison-Wesley.
Wilson, C. M. and Lodha, S. K. (1996). Listen: A Data Sonification
Toolkit. In Proc. Int Conf. on Auditory Display (ICAD), Santa Cruz,
CA, USA.
Wilson, K. (1974). Renormalization group theory. Physics Reports,
75(12).
Worrall, D., Bylstra, M., Barrass, S., and Dean, R. (2007). SoniPy: The
Design of an Extendable Software Framework for Sonification Research
and Auditory Display. In Proc. Int Conf. on Auditory Display (ICAD),
Montreal, Canada.
Yeo, W. S., Berger, J., and Wilson, R. S. (2004). A Flexible Framework
for Real-time Sonification with SonArt. In Proc. Int Conf. on Auditory
Display (ICAD), Sydney, Australia.
Yeomans, J. M. (1992). Statistical Mechanics of Phase Transitions.
Oxford University Press.
Zouhar, V., Lorenz, R., Musil, T., Zmolnig, J. M., and Holdrich, R.
(2005). Hearing Varèse’s Poème Électronique inside a Virtual Philips
Pavilion. In Proc. Int. Conf. on Auditory Display (ICAD), Limerick,
Ireland.
Zweig, G. (1964). An SU(3) Model for Strong Interaction Symmetry and
its Breaking. CERN Reports 8182/TH.401 and 8419/TH.412.
Zweig, S. (1983). Magellan - Der Mann und seine Tat. (Magellan - The
Man and his Achievement). Fischer, Frankfurt am Main. (First ed.
Vienna 1938).
Zwicker, E. and Fastl, H. (1999). Psychoacoustics: Facts and Models,
2nd edition. Springer, Berlin.