
Institute of Electronic Music and Acoustics - IEM, University for Music and Dramatic Arts Graz

Science By Ear. An Interdisciplinary Approach to Sonifying Scientific Data

Alberto de Campo

Dissertation

Graz, February 23, 2009

Supervisors: Prof. Dr. Robert Höldrich (IEM/KUG), Prof. Dr. Curtis Roads (MAT/UCSB)


Science By Ear. An Interdisciplinary Approach to Sonifying Scientific Data

Author: Alberto de Campo
Contact: [email protected]

Supervisors: Prof. Dr. Robert Höldrich (IEM/KUG), Prof. Dr. Curtis Roads (MAT/UCSB)
Contact: [email protected], [email protected]

Dissertation
Institute of Electronic Music and Acoustics - IEM, University for Music and Dramatic Arts Graz
Inffeldgasse 10, A-8020 Graz, Austria

February 23, 2009, 211 pages

Abstract

Sonification of Scientific Data is intrinsically interdisciplinary: It requires collaboration between experts in the respective scientific domains, in psychoacoustics, in artistic design of synthetic sound, and in working with appropriate programming environments. The SonEnvir project hosted at IEM Graz put this view into practice: in four domain sciences, sonification designs for current research questions were realised.

This dissertation contributes to sonification research in three ways:

The body of sonification designs realised within the SonEnvir context is described, which may be reused in sonification research in different ways.

The software framework built with and for these sonification designs is presented, which supports fluid experimentation with evolving sonification designs.

A theoretical model for sonification design work, the Sonification Design Space Map, was synthesised based on the analysis of this body of sonification designs (and a few selected others). This model allows systematic reasoning about the process of creating sonification designs, and provides concepts for analysing and categorising existing sonification designs more systematically.

Deutsche Zusammenfassung - German abstract

Die Sonifikation von wissenschaftlichen Daten ist intrinsisch interdisziplinär: Sie verlangt Zusammenarbeit zwischen ExpertInnen in den jeweiligen wissenschaftlichen Gebieten, in Psychoakustik, in der künstlerischen Gestaltung von synthetischem Klang, und in der Arbeit mit geeigneten Programmierumgebungen. Das Projekt SonEnvir, das am IEM Graz stattfand, hat diese Sichtweise in die Praxis umgesetzt: in vier wissenschaftlichen Gebieten (domain sciences) wurden Sonifikations-Designs zu aktuellen Forschungsfragen realisiert.


Diese Dissertation trägt drei Aspekte zur Sonifikationsforschung bei:

Der Korpus der im Kontext von SonEnvir entwickelten Sonification Designs wird detailliert beschrieben; diese Designs können in der Forschungsgemeinschaft in verschiedener Weise Weiterverwendung finden.

Das Software-Framework, das für und mit diesen Designs gebaut wurde, wird beschrieben; es erlaubt fließendes Experimentieren in der Entwicklung von Sonifikationsdesigns.

Ein theoretisches Modell für die Gestaltung von Sonifikationen, die Sonification Design Space Map, wurde auf Basis der Analysen dieser (und ausgewählter anderer) Designs synthetisiert. Dieses Modell erlaubt systematisches Nachdenken (reasoning) über den Gestaltungsprozess von Sonifikationsdesigns, und bietet Konzepte für die Analyse und Kategorisierung existierender Sonifikationsdesigns an.

Keywords: Sonification, Sonification Theory, Perceptualisation, Interdisciplinary Research, Interactive Software Development, Just In Time Programming


Acknowledgements

First of all, I would like to thank Marianne Egger de Campo for designing several versions of the XENAKIS proposal with me - a sonification project with European partners that eventually became SonEnvir. Then, I would like to thank my research partners in the SonEnvir project: Christian Daye, Christopher Frauenberger, Kathi Vogt and Annette Wallisch, without whom this work would not have been possible. I would like to thank Robert Höldrich for his collaboration on the grant proposals, and for his contribution to the EEG realtime sonification; and Gerhard Eckel for leading the SonEnvir project for most of its lifetime.

I would like to thank the participants of the Science By Ear workshop, who have been very open to a very particular experimental setup in interdisciplinary collaboration, especially for the discussions which eventually led to formulating the concept of the Sonification Design Space Map. A very special thank you is in order for the brave people who were willing to try programming sonification designs just-in-time within this workshop: Till Bovermann, Christopher Frauenberger, Thomas Musil, Sandra Pauletto, and Julian Rohrhuber.

For the Spin Models, the following Science By Ear participants also worked on a sonification design for the Ising model (besides the SonEnvir team): Thomas Hermann, Harald Markum, Julian Rohrhuber and Tony Stockman. Concerning the background in theoretical physics, we would also like to thank Christof Gattringer, Christian Bernd Lang, Leopold Mathelitsch and Ulrich Hohenester.

For the piece Navegar, I would like to thank Peter Jakober for researching the detailed timeline, and Marianne Egger de Campo for suggesting the Gini index as an interesting variable.

Alberto de Campo
Graz, February 23, 2009


Contents

1 Introduction 1

1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.4 Overview of this thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2 Psychoacoustics, Perception, Cognition, and Interaction 6

2.1 Psychoacoustics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.2 Auditory perception and memory . . . . . . . . . . . . . . . . . . . . . 8

2.3 Cognition, action, and embodiment . . . . . . . . . . . . . . . . . . . . 10

2.4 Perception, perceptualisation and interaction . . . . . . . . . . . . . . . 11

2.5 Mapping, mixing and matching metaphors . . . . . . . . . . . . . . . . 12

3 Sonification Systems 13

3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

3.1.1 A short history of sonification . . . . . . . . . . . . . . . . . . . 14

3.1.2 A taxonomy of intended sonification uses . . . . . . . . . . . . . 17

3.2 Sonification toolkits, frameworks, applications . . . . . . . . . . . . . . 18

3.2.1 Historic systems . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.2.2 Current systems . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.3 Music and sound programming environments . . . . . . . . . . . . . . . 20

3.4 Design of a new system . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.4.1 Requirements of an ideal sonification environment . . . . . . . . 23

3.4.2 Platform choice . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3.5 SonEnvir software - Overall scope . . . . . . . . . . . . . . . . . . . . . 24

3.5.1 Software framework . . . . . . . . . . . . . . . . . . . . . . . . 25


3.5.2 Framework structure . . . . . . . . . . . . . . . . . . . . . . . . 25

3.5.3 The Data model . . . . . . . . . . . . . . . . . . . . . . . . . . 26

4 Project Background 29

4.1 The SonEnvir project . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

4.1.1 Partner institutions and people . . . . . . . . . . . . . . . . . . 29

4.1.2 Project flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

4.1.3 Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4.2 Science By Ear - An interdisciplinary workshop . . . . . . . . . . . . . . 32

4.2.1 Workshop design . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.2.2 Working methods . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.2.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

4.3 ICAD 2006 concert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4.3.1 Listening to the Mind Listening . . . . . . . . . . . . . . . . . . 34

4.3.2 Global Music - The World by Ear . . . . . . . . . . . . . . . . . 34

5 General Sonification Models 37

5.1 The Sonification Design Space Map (SDSM) . . . . . . . . . . . . . . . 38

5.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

5.1.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

5.1.3 The Sonification Design Space Map . . . . . . . . . . . . . . . . 41

5.1.4 Refinement by moving on the map . . . . . . . . . . . . . . . . 43

5.1.5 Examples from the ’Science by Ear’ workshop . . . . . . . . . . . 47

5.1.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

5.1.7 Extensions of the SDS map . . . . . . . . . . . . . . . . . . . . 51

5.2 Data dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

5.2.1 Data categorisation . . . . . . . . . . . . . . . . . . . . . . . . 52

5.2.2 Data organisation . . . . . . . . . . . . . . . . . . . . . . . . . 52

5.2.3 Task Data analysis - LoadFlow data . . . . . . . . . . . . . . . . 53

5.3 Synthesis models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

5.3.1 Sonification strategies . . . . . . . . . . . . . . . . . . . . . . . 57

5.3.2 Continuous Data Representation . . . . . . . . . . . . . . . . . 57

5.3.3 Discrete Data Representation . . . . . . . . . . . . . . . . . . . 61

5.3.4 Parallel streams . . . . . . . . . . . . . . . . . . . . . . . . . . 62


5.3.5 Model Based Sonification . . . . . . . . . . . . . . . . . . . . . 63

5.4 User, task, interaction models . . . . . . . . . . . . . . . . . . . . . . . 64

5.4.1 Background - related disciplines . . . . . . . . . . . . . . . . . . 64

5.4.2 Music interfaces and musical instruments . . . . . . . . . . . . . 65

5.4.3 Interactive sonification . . . . . . . . . . . . . . . . . . . . . . . 66

5.4.4 "The Humane Interface" and sonification . . . . . . . . . . . . 67

5.4.5 Goals, tasks, skills, context . . . . . . . . . . . . . . . . . . . . 69

5.4.6 Two examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

5.5 Spatialisation Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

5.5.1 Speaker-based sound rendering . . . . . . . . . . . . . . . . . 75

5.5.2 Headphones . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

5.5.3 Handling speaker imperfections . . . . . . . . . . . . . . . . . 80

6 Examples from Sociology 81

6.1 FRR Log Player . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

6.1.1 Technical background . . . . . . . . . . . . . . . . . . . . . . . 82

6.1.2 Analysis steps . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

6.1.3 Sonification design . . . . . . . . . . . . . . . . . . . . . . . . . 86

6.1.4 Interface design . . . . . . . . . . . . . . . . . . . . . . . . . . 87

6.1.5 Evaluation for the research context . . . . . . . . . . . . . . . . 88

6.1.6 Evaluation in SDSM terms . . . . . . . . . . . . . . . . . . . . 88

6.2 ’Wahlgesänge’ - ’Election Songs’ . . . . . . . . . . . . . . . . . . . . . 90

6.2.1 Interface and sonification design . . . . . . . . . . . . . . . . . . 91

6.2.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

6.3 Social Data Explorer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

6.3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

6.3.2 Interaction design . . . . . . . . . . . . . . . . . . . . . . . . . 94

6.3.3 Sonification design . . . . . . . . . . . . . . . . . . . . . . . . . 96

6.3.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

7 Examples from Physics 98

7.1 Quantum Spectra sonification . . . . . . . . . . . . . . . . . . . . . . . 100

7.1.1 Quantum spectra of baryons . . . . . . . . . . . . . . . . . . . . 101

7.1.2 The Quantum Spectra Browser . . . . . . . . . . . . . . . . . . 101


7.1.3 The Hyperfine Splitter . . . . . . . . . . . . . . . . . . . . . . . 104

7.1.4 Possible future work and conclusions . . . . . . . . . . . . . . . 107

7.2 Sonification of Spin models . . . . . . . . . . . . . . . . . . . . . . . . 109

7.2.1 Physical background . . . . . . . . . . . . . . . . . . . . . . . . 109

7.2.2 Ising model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

7.2.3 Potts model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

7.2.4 Audification-based sonification . . . . . . . . . . . . . . . . . . 114

7.2.5 Channel sonification . . . . . . . . . . . . . . . . . . . . . . . . 116

7.2.6 Granular sonification . . . . . . . . . . . . . . . . . . . . . . . . 117

7.2.7 Sonification of self-similar structures . . . . . . . . . . . . . . . 119

7.2.8 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

8 Examples from Speech Communication and Signal Processing 122

8.1 Time Series Analyser . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

8.1.1 Mathematical background . . . . . . . . . . . . . . . . . . . . . 123

8.1.2 Sonification tools . . . . . . . . . . . . . . . . . . . . . . . . . 124

8.1.3 The PDFShaper . . . . . . . . . . . . . . . . . . . . . . . . . . 124

8.1.4 TSAnalyser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

8.2 Listening test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

8.2.1 Test data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

8.2.2 Listening experiment . . . . . . . . . . . . . . . . . . . . . . . . 128

8.2.3 Experiment results . . . . . . . . . . . . . . . . . . . . . . . . . 129

8.2.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

9 Examples from Neurology 134

9.1 Auditory screening and monitoring of EEG data . . . . . . . . . . . . . . 134

9.1.1 EEG and sonification . . . . . . . . . . . . . . . . . . . . . . . . 134

9.1.2 Rapid screening of long-time EEG recordings . . . . . . . . . . . 135

9.1.3 Realtime monitoring during EEG recording sessions . . . . . . . . 136

9.2 The EEG Screener . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

9.2.1 Sonification design . . . . . . . . . . . . . . . . . . . . . . . . 136

9.2.2 Interface design . . . . . . . . . . . . . . . . . . . . . . . . . 138

9.3 The EEG Realtime Player . . . . . . . . . . . . . . . . . . . . . . . . 140

9.3.1 Sonification design . . . . . . . . . . . . . . . . . . . . . . . . . 141


9.3.2 Interface design . . . . . . . . . . . . . . . . . . . . . . . . . . 143

9.4 Evaluation with user tests . . . . . . . . . . . . . . . . . . . . . . . . 144

9.4.1 EEG test data . . . . . . . . . . . . . . . . . . . . . . . . . . 144

9.4.2 Initial pre-tests . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

9.4.3 Tests with expert users . . . . . . . . . . . . . . . . . . . . . . 145

9.4.4 Analysis of expert user tests - EEG Screener 1 vs. 2 . . . . . . . . 146

9.4.5 Analysis of expert user tests - RealtimePlayer 1 vs. 2 . . . . . . 147

9.4.6 Qualitative results for both players (versions 2) . . . . . . . . . 149

9.4.7 Conclusions from user tests . . . . . . . . . . . . . . . . . . . . 149

9.4.8 Next steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150

9.4.9 Evaluation in SDSM terms . . . . . . . . . . . . . . . . . . . . 150

10 Examples from the Science by Ear Workshop 151

10.1 Rainfall data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

10.2 Polysaccharides . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

10.2.1 Polysaccharides - Materials made by nature . . . . . . . . . . . . 156

10.2.2 Session notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157

11 Examples from the ICAD 2006 Concert 160

11.1 Life Expectancy - Tim Barrass . . . . . . . . . . . . . . . . . . . . . . 160

11.2 Guernica 2006 - Guillaume Potard . . . . . . . . . . . . . . . . . . . . 162

11.3 ’Navegar É Preciso, Viver Não É Preciso’ . . . . . . . . . . . . . . . . 163

11.3.1 Navigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

11.3.2 The route . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

11.3.3 Data choices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

11.3.4 Economic characteristics . . . . . . . . . . . . . . . . . . . . . 167

11.3.5 Access to drinking water . . . . . . . . . . . . . . . . . . . . . 168

11.3.6 Mapping choices . . . . . . . . . . . . . . . . . . . . . . . . . 168

11.4 Terra Nullius - Julian Rohrhuber . . . . . . . . . . . . . . . . . . . . . 169

11.4.1 Missing values . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

11.4.2 The piece . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170

11.5 Comparison of the pieces . . . . . . . . . . . . . . . . . . . . . . . . . 172

12 Conclusions 175


12.1 Further work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176

A The SonEnvir framework structure in subversion 177

A.1 The folder ’Framework’ . . . . . . . . . . . . . . . . . . . . . . . . . . 177

A.2 The folder ’SC3-Support’ . . . . . . . . . . . . . . . . . . . . . . . . . 178

A.3 Other folders in the svn repository . . . . . . . . . . . . . . . . . . . . . 178

A.4 Quarks-SonEnvir . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

A.5 Quarks-SuperCollider . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

B Models - code examples 180

B.1 Spatialisation examples . . . . . . . . . . . . . . . . . . . . . . . . . . 180

B.1.1 Physical sources . . . . . . . . . . . . . . . . . . . . . . . . . 180

B.1.2 Amplitude panning . . . . . . . . . . . . . . . . . . . . . . . . 181

B.1.3 Ambisonics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

B.1.4 Headphones . . . . . . . . . . . . . . . . . . . . . . . . . . . 184

B.1.5 Handling speaker imperfections . . . . . . . . . . . . . . . . . 186

C Physics Background 189

C.1 Constituent Quark Models . . . . . . . . . . . . . . . . . . . . . . . . . 189

C.2 Potts model - theoretical background . . . . . . . . . . . . . . . . . . 192

C.2.1 Spin models sound examples . . . . . . . . . . . . . . . . . . . . 193

D Science By Ear participants 195

E Background on ’Navegar’ 197

F Sound, meaning, language 198


List of Tables

5.1 Scale types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

5.2 The Keys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

5.3 The Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

5.4 The Data/Information: . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

5.5 The Data: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

6.1 Sectors of economic activities . . . . . . . . . . . . . . . . . . . . . . . 95

9.1 Equally spaced EEG band ranges. . . . . . . . . . . . . . . . . . . . . . 135

9.2 Questionnaire scales for EEG sonification designs . . . . . . . . . . . . . 146

11.1 Navegar - Mappings of data to sound parameters . . . . . . . . . . . . . 169

11.2 Some stations along the timeline of ’Navegar’ . . . . . . . . . . . . . . . 170

B.1 Remapping spatial control values . . . . . . . . . . . . . . . . . . . . . 182

E.1 Os Argonautas - Caetano Veloso . . . . . . . . . . . . . . . . . . . 197


List of Figures

2.1 Some aspects of auditory memory, from Snyder (2000). . . . . . . . . . 9

3.1 Inclined plane for Galilei’s experiments on the law of falling bodies. . . . 15

3.2 UML diagram of the data model. . . . . . . . . . . . . . . . . . . . . . 27

5.1 The Sonification Design Space Map . . . . . . . . . . . . . . . . . . . 42

5.2 SDS Map for designs with varying numbers of streams. . . . . . . . . . . 46

5.3 All design steps for the LoadFlow dataset. . . . . . . . . . . . . . . . . 48

5.4 LoadFlow - time series of dataset (averaged over many households) . . . 55

5.5 LoadFlow - time series for 3 individual households . . . . . . . . . . . . 55

6.1 The toilet prototype system used for the FRR field test. . . . . . . . . . 83

6.2 Graphical display of one usage episode (Excel). . . . . . . . . . . . . . . 85

6.3 FRR Log Player GUI and sounds mixer. . . . . . . . . . . . . . . . . . 87

6.4 SDS Map for the FRR Log Player. . . . . . . . . . . . . . . . . . . . . 89

6.5 GUI Window for the Wahlgesänge Design. . . . . . . . . . . . . . . . . 91

6.6 SDS-Map for Wahlgesänge. . . . . . . . . . . . . . . . . . . . . . . . . 94

6.7 GUI Window for the Social Data Explorer. . . . . . . . . . . . . . . . . 96

7.1 Excitation spectra of N (left) and ∆ (right) particles. . . . . . . . . . . 101

7.2 The QuantumSpectraBrowser GUI. . . . . . . . . . . . . . . . . . . . . 103

7.3 The Hyperfine Splitter GUI. . . . . . . . . . . . . . . . . . . . . . . . . 106

7.4 Schema of spins in the Ising model as an example for Spin models. . . . 110

7.5 Schema of the orders of phase transitions in spin models. . . . . . . . . 111

7.6 GUI for the running 4-state Potts Model in 2D. . . . . . . . . . . . . . . 113

7.7 Audification of a 4-state Potts model. . . . . . . . . . . . . . . . . . . . 115

7.8 Sequentialisation schemes for the lattice used for the audification. . . . . 115

7.9 A 3-state Potts model cooling down from super- to subcritical state. . . 117


7.10 Granular sonification scheme for the Ising model. . . . . . . . . . . . . . 118

7.11 A self similar structure as a state of an Ising model. . . . . . . . . . . . 119

8.1 The PDFShaper interface . . . . . . . . . . . . . . . . . . . . . . . . . 125

8.2 The TSAnalyser interface . . . . . . . . . . . . . . . . . . . . . . . . . 126

8.3 The interface for the time series listening experiment. . . . . . . . . . . 128

8.4 Probability of correctness over ∆ kurtosis in set 1 . . . . . . . . . . . . 129

8.5 Probability of correctness over ∆ kurtosis in set 2 . . . . . . . . . . . . 130

8.6 Probability of correctness over ∆ skew in set 2 . . . . . . . . . . . . . . 130

8.7 Probability of correctness over ∆ skew and ∆ kurtosis in set 2 . . . . . . 131

8.8 Number of replays over ∆ kurtosis in set 2 . . . . . . . . . . . . . . . . 132

9.1 The Sonification Design Space Map for both EEG Players. . . . . . . . . 137

9.2 The EEGScreener GUI. . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

9.3 The Montage Window. . . . . . . . . . . . . . . . . . . . . . . . . . . 139

9.4 EEG Realtime Sonification block diagram. . . . . . . . . . . . . . . . . 142

9.5 The EEG Realtime Player GUI. . . . . . . . . . . . . . . . . . . . . . . 143

9.6 Expert user test ratings for both EEGScreener versions. . . . . . . . . . 147

9.7 Expert user test ratings for both RealtimePlayer versions. . . . . . . . . 148

10.1 Precipitation in the Alpine region, 1980-1991. . . . . . . . . . . . . . . . 152

10.2 Orography of the grid of regions. . . . . . . . . . . . . . . . . . . . . . 153

10.3 SDSM map of Rainfall data set. . . . . . . . . . . . . . . . . . . . . . . 156

11.1 Magellan’s route in Antonio Pigafetta’s travelogue . . . . . . . . . . . . 165

11.2 Magellan’s route, as reported in wikipedia. . . . . . . . . . . . . . . . . 166

11.3 The countries of the world and their Gini coefficients. . . . . . . . . . . 167

11.4 Terra Nullius, latitude zones . . . . . . . . . . . . . . . . . . . . . . . . 171

11.5 SDSM comparison of the ICAD 2006 concert pieces. . . . . . . . . . . . 173

B.1 The Spectralyzer GUI window. . . . . . . . . . . . . . . . . . . . . . . . 187

C.1 Multiplet structure of the baryons as a decuplet. . . . . . . . . . . . . . 191


Chapter 1

Introduction

Sonification of Scientific Data, i.e., the perceptualisation of data by means of sound in order to find structures and patterns within them, is intrinsically interdisciplinary: It requires collaboration between experts in the respective scientific domains the data come from, in psychoacoustics, in the artistic design of synthetic sound, and in working with appropriate programming environments to realise successful sonification designs. The concept of the SonEnvir project (hosted at IEM Graz from 2005 to 2007) has put this view into practice: in four science domains, sonification designs for current research questions were realised in close collaboration with audio programming specialists.

The research reported here mainly took place in the SonEnvir project context. This dissertation contributes to sonification research in three ways:

• The body of sonification designs realised within SonEnvir is described in detail. They may be reused in sonification research by the community, both as concepts and as open-source implementations on which new solutions can be based.

• For realising these sonification designs, a software framework was built in the language SuperCollider3 that allows for flexible, rapid experimentation with evolving sonification designs (in Just In Time programming style). Being open-source, this framework may be reused and possibly maintained by the research community in the future.

• The analysis of this body of sonification designs (and a few others of interest) has eventually led to a general model of sonification design work, the Sonification Design Space Map. This contribution to sonification theory allows systematic reasoning about the process of developing sonification designs; based on data properties and context, it suggests candidates for the next experimental steps in the ongoing design process. It also provides concepts for analysing and categorising existing sonification designs more systematically.


1.1 Motivation

Data are pervasive in modern societies: Science, politics, economics, and everyday life depend fundamentally on data for decisions. Larger and larger amounts of data are being acquired in the hope that they will prove useful, taking advantage of continuing progress in information technology.

While data may contain obvious information (i.e., well-understood ’content’), very often one also assumes they contain implicit or even hidden facts about the phenomena observed; understanding these hitherto unknown facts is highly desired. The research field that most directly addresses this interest is Data Mining, or Exploratory Data Analysis. Two approaches are in common use for extracting new information from data: One is statistical analysis, the other is data perceptualisation, i.e., making data properties perceptible to the human senses; and many existing software tools combine both: from statistics programs like Excel and SPSS, and science and engineering environments like MATLAB and Mathematica, to a host of special-purpose tools for specific scientific or economic domains.

For scientists, perceptualisation of data is of vital interest; it is almost exclusively approached by visual means for a combination of reasons [1]. Visualisation tools have permeated scientific cultures to the point of being invisible; many scientists are well-versed in tools that visualise their results, and rarely do scientists question how accurately and adequately visual representations represent the data content. Many Virtual Reality systems, such as the CAVE (Cruz-Neira et al. (1992)) and others, claim scientific data exploration as one of their stronger usage scenarios. Nevertheless, sound often seems to be added to such systems only as an afterthought, usually with the intention to achieve better ’immersion’ and emotional engagement (sometimes even alluding to cinema-like effects as the inspiration for the approach intended).

Sonification, the representation of data by acoustic means, is a potentially useful alternative and complement to visual approaches that has not reached the same level of acceptance. This is the starting point for the research agenda described here: To create an interdisciplinary research setting where scientists from different domains (’domain scientists’) and specialists in artistic audio design and programming (’sound experts’) work together on auditory representations (’sonification designs’) for specific scientific data sets and their context. Such a venture should be well positioned to contribute to the progress of sonification as a scientific discipline. This has been the guiding strategy for the research project SonEnvir, described in some detail in section 4.1.

The thesis presented here analyses sonification design work done within the SonEnvir project [2]. From these designs, it abstracts a general model for approaching sonification design work, from the general Sonification Design Space Map to detailed models of synthesis, spatialisation, and user interaction, presented in chapter 5. This abstraction process is based on Grounded Theory (Glaser and Strauss (1967)), aiming to design flexible theoretical models that capture and explain as much detail as possible of the observation data collected. Such an integrative approach appears to be the most promising way forward for sonification as a research discipline.

Finally, it should be noted that scientists are not the only social group that is interested in the role of data for modern societies: Artists have always taken part in the general discourse in society, and in recent years, media artists as well as musicians and sound artists have become interested in creating works of art that represent data in artistically interesting ways. This aspect certainly played a role in my personal motivation for this dissertation project.

[1] Availability, traditions of scientific cultures, ease of publishing on paper, and many others.
[2] These analyses follow the notion of providing ’rich context’, taken from Science Studies (see e.g. Latour and Woolgar (1986); Rheinberger (2006)).

1.2 Scope

While multimodal display systems are extremely interesting for data exploration, the complexity of interactions between modalities and individual differences in perception are considerable. Therefore, the research work in this thesis has been intentionally limited to audio-centric data representation; however, simple forms of visual representations and haptic interaction have been provided where it seemed appropriate and helpful.

Abstract representations of data by auditory means are not at all well understood yet; thus providing collections of different approaches for discussion may well be fruitful for the community. Special importance has been given to design methodology, and to considering the human-computer interaction loop, ranging from interaction in the design process to interactive choices and control in a realtime sonification design.

Sonification designs may be intended for several different uses, with different aims. To give a few examples:

Presentation entails clear, straightforward auditory demonstration of finished results; this may be useful in conference lectures, science talks, and similar situations.

Exploration is all about interaction with the data, ’acquiring a feeling for one’s data’; this must necessarily remain informal, as it is a heuristic for generating hypotheses, which will be cross-checked and verified later with every analysis tool available.

Analysis requires well-understood, reliable tools for detecting specific phenomena, accepted by the conventions of the scientific domain they belong to.

In Pedagogy, different students may learn to understand structures/patterns in data better when presented in different modalities; the auditory approach may be more appropriate and useful in some cases, e.g. for people with visual impairments.

This thesis focuses on studying the viability of exploration and analysis of scientific data by means of sonification; thus we (meaning the author and the SonEnvir team) developed exemplary cases in close collaboration with the domain scientists, implemented sonification designs for these cases, and analysed them to understand their general usefulness.

We built a software framework to support the efficient realisation of these sonification designs; this is reported on in section 3.5.1, and available as open-source code here [3]. The sonification design prototypes developed are also accessible online [4] and can be re-used both as concepts and as fully functional code. Note that the SonEnvir software environment is not a complete ’big system’, but a flexible, extensible collection of approaches, and the infrastructure needed to support them.

This software environment is freely extensible by others (being open source), and it aims to shorten development times for Auditory Display design sketches, thus allowing for freely moving between discussion and fast redesign. It also supports Auditory Display design pedagogy, as well as other uses, such as artistic projects involving data-related control of sound and image processes.

[3] https://svn.sonenvir.at/svnroot/SonEnvir/trunk/src/Framework/
[4] https://svn.sonenvir.at/svnroot/SonEnvir/trunk/src/Prototypes/

1.3 Methodology

The methodology employed in the SonEnvir project is centered on interdisciplinary collaboration - domain scientists bring current questions and related data from their research context, and learn the basic concepts of sonification and auditory perception. The questions are addressed with sonification design prototypes which are refined in iterative steps; common understanding and patience while learning is the key to eventual success.

This concept was condensed into an experimental setting of the interdisciplinary work process: The Science By Ear workshop brought together international sonification experts, mostly Austrian domain scientists, and audio programming specialists to work on sonification designs in a very controlled setting, within very short time frames. This workshop has been received very favorably by the participants, and is reported on in section 4.2.

The methodology of the thesis is based on Grounded Theory [5] (Glaser and Strauss (1967), see also section 5.1): By looking at a body of sonification designs, and analysing their context, design approaches and decisions, a general, practice-based model is abstracted: the Sonification Design Space Map (SDSM). Aspects of this model that warrant further detail are given: models for synthesis approaches, spatialisation, and user/task/interaction.

[5] In sociology, Grounded Theory is used inductively to create new hypotheses from observations or data collected with few pre-assumptions; this is in contrast to formulating hypotheses a priori and testing them by experiments.

The sonification designs analysed stem from the following sources:

• Work with SonEnvir domain scientists

• The Science By Ear workshop

• Submissions to the ICAD 2006 concert

1.4 Overview of this thesis

Chapter 2, Psychoacoustics, Perception, Cognition, and Interaction, provides the necessary background, covering mainly the psychoacoustics and auditory cognition literature that is directly relevant to sonification design work, rather than giving a general overview of the psychoacoustics literature.

Chapter 3, Sonification Systems, provides an introduction to sonification and its history, and covers some current systems that support sonification design work. The software system implemented for the SonEnvir project is described here from a more general perspective.

Chapter 4, Sonification and Interdisciplinary Research, provides further details on the interdisciplinary nature of sonification research; here, the research design of the SonEnvir project, and two activities within it, namely the Science By Ear workshop and the ICAD 2006 Concert, are described.

Chapter 5, General Sonification Models, is the main contribution to sonification theory in this thesis. It describes a general model for sonification design work, divided into several aspects: Overall design decisions and strategies are covered by the Sonification Design Space Map (SDSM); appropriate synthesis approaches are covered in the Synthesis model; user interaction is covered in the User Interaction model; and spatial aspects of sonification design are covered in the Spatialisation model.

Chapters 6, 7, 8, and 9 present example sonification designs from the four domain sciences in SonEnvir, chapter 10 presents designs for two datasets explored in the Science By Ear workshop, and chapter 11 discusses and compares four works from the ICAD 2006 concert. This is the main practical and analytic contribution in this thesis. These chapters describe much of the body of sonification designs created within the SonEnvir project, as well as some others; this body of designs provided the background material for creating the General Sonification Models.

Chapter 12, Conclusions, positions the work presented within the wider context of sonification research, and summarises the insights gained.


Chapter 2

Psychoacoustics, Perception, Cognition, and Interaction

2.1 Psychoacoustics

Psychoacoustics is a branch of psychophysics, the psychological discipline which studies the relationship between (objective) physical stimuli and their subjective perception by human beings; psychoacoustics then studies acoustic stimuli and their auditory perception. Consequently, much of its literature is mainly concerned with the physiological basis of auditory perception, i.e., finding out how perception works by creating stimuli that force the auditory system into specific interpretations of what it hears.

When considering the stimuli used in traditional psychoacoustics experiments as a world of sounds, this world has an extremely reduced vocabulary. Of course this reduction makes perfect sense for experiments which try to clarify how (especially lower level, more physiological) perceptual mechanisms (assumed to be hard-coded in the ’neural hardware’) work, but the knowledge thus acquired is often only indirectly useful for sonification design work.

A number of works are considered major references for the field: For psychoacoustics in general, Psychoacoustics - Facts and Models (Zwicker and Fastl (1999)) is very comprehensive; a good introductory textbook that is also accessible for non-specialists is An Introduction to the Psychology of Hearing (Moore (2004)); Bregman thoroughly studies the organisation of auditory perception in more complex (and thus nearer to everyday life) situations in Auditory Scene Analysis (Bregman (1990)); for the spatial aspects of human hearing, the standard reference is Spatial Hearing (Blauert (1997)). The typical background of psychoacoustics research is speech, spatial hearing, and music; sonification is fundamentally different from all of these, possibly with the exception of conceptual similarity to experimental strands of electronic music.

The main concepts in these sources which are relevant for sonification research are:


Just Noticeable Differences (JNDs) for different audible properties of sounds (and consequently, the corresponding synthesis parameters) have been studied extensively; being aware of these helps to make sure that differences in synthetic sounds will be noticeable by users with normal hearing.

Masking Effects can occur when sonifications produce dense streams of events; understanding how these depend on properties of the individual events is important to avoid perceptually ’losing’ information in the soundscape created by a sonification design.

Auditory Stream Formation and its rules are essential for multiple stream sonification; here it is important to control whether streams will tend to perceptually segregate or fuse into merged percepts.

Testing Methodology can be employed to verify that sonification users are physically able to perceive the sensory differences of interest. In effect, this entails writing auditory tests for sonification designs, such that designers can test that they can hear the differentiation they are aiming for, and that users can acquire analytical listening skills from well-controlled examples (a minimal sketch of such a test follows this list).

Cognitive and Memory Limits determine how we understand common musical structures, and in fact, much music intended to be ’accessible’ is created (unknowingly) conforming to these limits. Sonification design issues, from choices of time scaling to user interface options for quick repetitions, choosing segments to listen to, and others, also crucially depend on these limits.
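As an illustration of the Testing Methodology point, here is a minimal sketch of a same/different self-test, written in SuperCollider (the language the SonEnvir framework is built in, see chapter 3). It assumes a running server with the default SynthDef; the half-semitone step is an arbitrary illustrative value, not a recommended threshold:

    // Play two tones that are either identical or differ by a small pitch step;
    // the listener judges "same or different", the post window shows the answer.
    (
    var base = 60, diff = 0.5;                 // MIDI note and test step, purely illustrative
    var second = [base, base + diff].choose;
    fork {
        (instrument: \default, midinote: base, dur: 0.4).play;
        0.6.wait;
        (instrument: \default, midinote: second, dur: 0.4).play;
        ("different pitch: " ++ (second != base)).postln;
    };
    )

Collecting many such trials over a range of step sizes gives a rough, informal picture of which differences a given design (and a given listener) can actually rely on.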

More recent research assumes the perspective of Ecological Psychoacoustics (Neuhoff (2004)), which takes into account that in daily life, hearing usually deals with complex environments of sounds, and thus allows for considering sonification designs from the perspective of ecologies of sounds that coexist.

However, in a way sonification research and design work addresses a problem that is inverse to what psychoacoustics studies: rather than asking how we perceive existing worlds of sounds, the question in sonification is, how can we create a world of sounds that can communicate ’meaning’ by aggregates of finely differentiated streams of sound events?

Bob Snyder actually addresses this inverse problem (i.e., how to create worlds of sounds that can communicate meaning) directly, if for a more traditional purpose: Music and Memory (Snyder (2000)) is a textbook for teaching composition to non-musicians in a perceptually deeply informed way, in a course Snyder gives at the Art Institute of Chicago. He describes how limitations of perception and memory influence artistic choices, and explains and demonstrates these with examples from a very wide range of musical cultures and traditions, almost entirely without traditional (Western) music notation. This is intended to give musicians/composers informed free choice to stay within these limitations (and be ’accessible’), or approach and transgress them intentionally. By covering a wide range of psychoacoustics and auditory perception literature from the perspective of art practice, and describing it in terms accessible for art students, many of whom do not have traditional musical or scientific training, Snyder has created a very useful resource for practicing sonification designers who are willing to learn more about creating perceptually informed (and artistically interesting) worlds of sounds.

2.2 Auditory perception and memory

This section is a brief summary of the first part of Music and Memory, to provide enough background for readers to follow auditory perception-related arguments made later.

Figure 2.1 shows a symbolic representation of the current models of both bottom-up and top-down perceptual processes. Bottom-up processes begin with sound exciting the eardrums, which gets translated into firing patterns of a large number of auditory nerves (ca. 30,000) coming from the ears. For a short time, a ’raw’ representation of the sound just heard remains in echoic memory. This raw signal is held available for many concurrent feature extraction processes: these processes can include rather low-level aspects (which are almost certainly built into the neural ’hardware’) like ascribing sound components coming from the same direction to the same sound source, but also higher-level aspects like a surprising harmonic modulation in a piece of music (which is certainly culturally learned).

The extracted features are then integrated into higher level percepts, often in several stages; in this process of abstraction, finer details are discarded, e.g. pitches in a musical context are categorised into a familiar tuning system, and nuances in rhythm and articulation usually also fade quickly from memory, unless one makes a special effort to retain them.

Feature extraction interacts very strongly with long term memory: personal auditory experience determines what is in long term memory, so for any listener, the extracted features will unconsciously activate related memory content, which may or may not become conscious. Note that unconsciously activated memories feed back into the feature extraction processes, potentially priming the perceptual binding that happens toward specific cultural or personal notions.

Short term memory (STM) is the only conscious part in figure 2.1: perceptual awareness of what one is hearing now, as well as the few related memories that become activated enough, are the only results of perception one becomes consciously aware of. Short term memory content can be rehearsed, and thus kept in working memory for a while, which increases its chance of being committed to long term memory eventually. On average, short term auditory memory can keep several seconds of sound around. This depends on ’chunking’: generally, it is assumed that one can keep 7 ± 2 items in working memory at any moment; however, one can and does increase this number by forming groups of multiple items, which are then treated by memory as single (bigger) items (again with a limit of ca. 7 applying).

Figure 2.1: Some aspects of auditory memory, from Snyder (2000), p. 6. The connections shown are only a momentary configuration of the perceptual system, and will continuously change quite rapidly.


The longer the auditory structures one tries to keep in memory, the more this depends on abstraction, i.e. forming categories, simplifying detail, and grouping into higher level items. This imposes a limit that is relevant for sonification contexts: comparing a hard to categorise structural shape that only becomes recognisable over two minutes to a potentially similar episode of two minutes one hears an hour later is very difficult.

Generally, while bottom-up processes (usually equated with perception) are usually assumed to be properties of the human neural system, and thus quite universal for all people with normal hearing, top-down processes (often equated with cognition) are more personal: they depend on cultural learning and are informed by individual experience, and thus can vary much more between individuals.

2.3 Cognition, action, and embodiment

A closer connection to sonification research, as well as some useful terminology, can be found in Music Cognition research:

Recent work, e.g. by Marc Leman (Leman and Camurri (2006), and Leman (2006)), defines terminology that works well for describing what sonification can achieve. Leman talks of proximal and distal cues: Proximal (near) cues refer to the perceptually relevant features of auditory events, i.e. the audible properties ’on the surface’ of a sound event; by contrast, distal (further away) cues are actions inferred by the listeners that are likely to have caused the proximal cues. One example of distal actions would be a musician’s physical actions; and a little further away, a performer’s likely intentions behind her actions would also be considered distal cues.

In recent years, Cognition research has widely moved away from the traditionally abstract notion of ’cognitive’ (meaning only dealing with symbols, and thus easy to model by computation); today the idea is widely accepted that cognition is deeply intertwined with the body, resulting in the concept of Embodied Cognition (see e.g. Anderson (2003)); applying this idea to auditory cognition, Leman says that the perception of gesture in music involves the whole body (of the performer and the listener). Music listeners who engage with listening may spontaneously express this by moving along with the music; when asked in experimental settings to make movements that correspond to the music they are listening to, even musically untrained listeners can be remarkably good at imitating performer gestures.

Appropriating this terminology and applying it to sonification, one can describe sonification elegantly in these terms: sound design decisions inform details of the created streams of sound, i.e. they determine the proximal cues; ideally, these design decisions lead to perceptual entities (’auditory gestalts’), which can create a sensation of plausible distal cues behind the proximal cues. In case of success, these distal cues, which arise within the listener’s perception, create an ’implied sense’ in the sounds presented (which could be called the ’sonificate’); thus these distal cues are likely to be closely related to ’data meaning’ (the equivalent to performers’ gestures, which are commonly taken to correspond closely to their intentions).

In reflecting on his research on the design of experimental electronic music instruments, David Wessel argues that the equivalent of the ’babbling phase’ (of small infants) is really essential for electronic music instruments: free-form, purpose-free interaction with the full possibility space of an instrument allows for more efficient and meaningful learning of what the instrument is capable of, just like children learn the phonetic possibilities of their vocal tract by (seemingly random) exploration (Wessel (2006)).

He cites a classic experiment by Held and Hein, where two kittens acquire visual perception skills in very different ways: one kitten can move about the space, while the other kitten gets the same visual stimuli, but does not make its own choices of where to move - instead, it has the moving kitten’s choices imposed on it. This second kitten sustained considerable perceptual impairments. Wessel argues that the role of sensory-motor engagement is essential in auditory learning, but not well understood yet; he suggests designing electro-acoustic musical instruments such that they allow for the described forms of interaction by providing ’control intimacy’, in short low-latency, high-resolution, multichannel control data from performer gestures. This strategy should create a long term chance of arriving at the equivalent of virtuosity on (or at least mastery of) that instrument.

Transposed to the context of sonification for scientific data, this is in full agreement with an Embodied Cognition perspective, and is another strong argument for allowing as much ’user’ interaction with sonification tools as possible: from haptic interfaces used e.g. for dynamic selection of data subsets, to access for tuning sound design parameters, to fully accessible code that defines how a particular sonification design operates.
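As a minimal illustration of such continuous, low-latency interaction, the following SuperCollider sketch lets the mouse stand in for an arbitrary controller: the x-axis scrubs through a made-up data series mapped to pitch, while the y-axis tunes a sound design parameter (filter brightness) during listening. It assumes a running server; the data and all ranges are invented for illustration and are not part of the SonEnvir framework:

    (
    var data = Array.rand(64, 200.0, 2000.0);   // hypothetical data values, mapped to Hz
    var buf = Buffer.loadCollection(s, data);
    {
        var index = MouseX.kr(0, BufFrames.kr(buf) - 1);          // scrub position in the data
        var freq  = BufRd.kr(1, buf, index, interpolation: 2);    // read the data as a pitch contour
        LPF.ar(Saw.ar(freq), MouseY.kr(300, 8000, 1)) * 0.1 ! 2   // tune brightness while listening
    }.play;
    )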

2.4 Perception, perceptualisation and interaction

Perception of the physical world is intuitively non-modal and unified: events in the world are synchronous, so sensory input from different modalities is too [1]; many multimodal data exploration projects use virtual environments so that they can provide integrated visual, auditory and haptic modes for perception and interaction. The argument that learning is strongly dependent on sensory-motor involvement has found its way into HCI research literature; here, the common term is ’closing the human-computer interaction loop’ (see e.g. Dix (1996)).

[1] One interesting exception here is far away events that are both visible and audible; the puzzling difference between speeds of sound and light has led to the first measurements of the speed of sound.


In the context of sonification research, this has led to a special conference series, the Interactive Sonification workshops (ISon) [2], so far held at Bielefeld (2004) and York (2007). In a special issue of IEEE Multimedia resulting from ISon 2004, the editors emphasise that learning how to ’play’ a sonification design with physical actions, in fact similar to a musical instrument, really helps for an enactive understanding of both the nature of the perceptualisation processes involved and of the data under study (Hermann and Hunt (2005)). They find that there is a lack of research on how learning in interactive contexts takes place; obviously this applies equally to interactive visual display applications.

2.5 Mapping, mixing and matching metaphors

Mapping data dimensions to representation parameters always involves choices. Walker and Kramer (1996) report interesting experiments on this topic: They played through a number of different permutations of mappings of the same data to the same set of display parameters, rated by the designers as ’intuitive’, ’okay’, ’bad’, and random, and they tested how well users accomplished defined tasks with them. Expert assumptions turned out to be not as accurate as they expected; users could learn quite arbitrary mappings nearly as well as supposedly more ’natural’ ones [3].
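To make the idea of permuted mappings concrete, here is a small SuperCollider sketch; the data record, parameter ranges, and mapping choices are all invented for illustration. The same record is rendered under two different assignments of data dimensions to sound parameters:

    (
    var record   = (temp: 0.7, pressure: 0.2, humidity: 0.9);         // hypothetical data record
    var mappingA = (temp: \midinote, pressure: \amp, humidity: \pan);
    var mappingB = (temp: \pan, pressure: \midinote, humidity: \amp); // one permutation of mappingA
    var render = { |rec, mapping|
        var event = (instrument: \default, dur: 0.3);
        mapping.keysValuesDo { |dim, param|
            event[param] = switch(param,
                \midinote, { rec[dim].linlin(0, 1, 48, 84) },    // pitch range
                \amp,      { rec[dim].linlin(0, 1, 0.05, 0.3) }, // loudness range
                \pan,      { rec[dim].linlin(0, 1, -1, 1) });    // stereo position
        };
        event.play;
    };
    render.(record, mappingA);
    // render.(record, mappingB);   // same data, permuted mapping
    )

Which assignment listeners find more ’natural’, and whether that matters for task performance, is exactly the kind of question the experiments above address.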

Whether this also holds true for exploratory contexts, when there is no pre-defined goal to be achieved, is an open question. Here, performance in an easy-to-measure (but trivial) task is not a very interesting criterion for sonification designs. On the other hand, it is of course good design to reduce cognitive load while users are involved in data exploration (by using cognitively simple mappings). For visualisation systems designed for exploration, the idea of measuring ’insight’ and the number of hypotheses formed in the exploration process has been suggested recently (Saraiya et al. (2005)); as far as we know this evaluation strategy has not been applied to exploratory sonification yet.

In de Campo et al. (2004), we make the case that the impression of perceiving the sources of representations (in Leman’s terms, the distal cues) becomes easier when the metaphorical distance between the data dimension and the audible representation appears smaller; i.e., when a reasonably similar concept in the world of sound was found for the data property to be communicated. For example, almost all time-series data can be treated as if they were acoustic waveforms, which is what ’audification’ essentially does. With more complex data, the option of accessing data subsets by interactive choice, browsing through the data space with different auditory perspectives, can potentially allow forming new hypotheses on the data.
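As a sketch of the audification idea in SuperCollider: a synthetic, made-up time series is loaded into a buffer and simply played back as a waveform. The playback rate is an arbitrary choice that determines how strongly the data are compressed in time; a running server is assumed:

    (
    var series = Array.fill(48000, { |i| sin(i * 0.01) + 0.1.rand2 });  // fake time series
    var buf;
    buf = Buffer.loadCollection(s, series.normalize(-1, 1), action: {
        // play the whole series as one waveform at four times normal speed,
        // freeing the synth when playback reaches the end
        { PlayBuf.ar(1, buf, rate: 4, doneAction: 2) * 0.2 ! 2 }.play;
    });
    )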

[2] http://interactive-sonification.org/
[3] This paper was republished in a recent special issue of IEEE Multimedia on Sonification, with a new commentary (Walker and Kramer (2005a,b)).


Chapter 3

Sonification Systems

In a certain Chinese Encyclopedia, the Celestial Emporium of Benevolent Knowledge, "it is written that animals are divided into: (a) Those that belong to the Emperor, (b) embalmed ones, (c) those that are trained, (d) suckling pigs, (e) mermaids, (f) fabulous ones, (g) stray dogs, (h) those included in the present classification, (i) those that tremble as if they were mad, (j) innumerable ones, (k) those drawn with a very fine camelhair brush, (l) others, (m) those that have just broken a flower vase, (n) those that from a long way off look like flies."

in Jorge Luis Borges - The Analytical Language of John Wilkins
Borges (1980)

Perceptualisation of scientific data by visualisation has been extremely successful. It is

by now completely established scientific practice, and a wide variety of visualisation tools

exist for a wide range of applications. Given the different set of perceptual strengths

of audition compared to vision, sonification has long been considered to have similar

potential as an exploratory tool for scientists which is complementary to visualisation

and statistics.

One strategy to realize more of this potential of sonification is to create a general software

environment that supports fast development of sonification designs for a wide range of

scientific applications, a design process in close interaction with scientific users, and

simple exchange of fully functional sonification designs. This is the central idea of the

SonEnvir project, as described in detail (in advance of the project itself) in de Campo

et al. (2004).

There are a number of software packages for sonification and auditory display (Ben-Tal

et al. (2002); Pauletto and Hunt (2004); Walker and Cothran (2003), and others), all of

which make different choices: whether they are to be used as toolkits to integrate into

applications, or whether they are full applications already; which data formats or real-

time input modalities are supported; what sonification models are assumed (sometimes


implicitly); and what kinds of interaction modes are possible and provided.

This chapter provides a very short overview of the history of sonification, and describes

the most common uses of sonification. Then, some historical and current sonification

toolkits and environments are described, and the main types of audio and music program-

ming environments. Finally, the system developed for the present thesis is described.

3.1 Background

3.1.1 A short history of sonification

The prehistory and early history of sonification is covered very interestingly (within a very

good general overview) in Gregory Kramer’s Introduction to Auditory Display (Kramer

(1994a)).

Employing auditory perception for scientific research was not always as unusual as it is

considered in today’s visually dominated scientific cultures; in fact, sonification can be

said to have had a number of precursors:

In medicine, the practice of auscultation, i.e., listening to the body’s internal sounds for

diagnostic purposes, seems to have been present in Hippocrates’ time (McKusick et al.

(1957)). This was long before Laennec, who is usually credited with the invention of the

stethoscope in 1819.

In engineering, mechanics tend to be very good at hearing which parts of a machine they

are familiar with are not functioning well; consider how much a good car mechanic

can tell just from listening to a running engine.

Moving on to technically mediated acoustic means of measurement, there is evidence

that Galileo Galilei employed listening for scientific purposes: Following Stillman Drake’s

biography of Galilei (Drake (1980)), it seems plausible that Galilei used auditory infor-

mation to verify the quadratic law of falling bodies (see figure 3.1). By running strings

across the plane at distances increasing according to the quadratic law (1, 4, 9, 16,

etc.), the ball running down the plane would ring the bells attached to the strings in a

regular rhythm. In a reconstruction of the experiment, Riess et al. (2005) found that

time measuring devices of the 17th century were likely too imprecise, while listening for

rhythmic precision works well and is thus more plausible to have been used.

An early example of a technical device rendering an environment variable perceptible

which humans do not naturally perceive is the Geiger-Müller counter: Incidence of a

particle generated by radioactive decay on the detector causes an audible click; the

density of the irregular sequence of such clicks informs users instantly about changes in

radiation levels.


Figure 3.1: Inclined plane for Galilei’s experiments on the law of falling bodies.

This device was rebuilt at the Istituto e Museo di Storia della Scienza in Florence.

© Photo Franca Principe, IMSS, Florence.

Sonar is another interesting case to consider: Passive Sonar, where one listens to un-

derwater sound to determine distances and directions of ships, has apparently been

experimented with by Leonardo da Vinci (Urick (1967), cited in Kramer (1994a)); in

Active Sonar, sound pulses are projected in order to penetrate visually opaque volumes of

water, listening to reflections to understand local topography, as well as moving objects

of interest, be they vessels, whales, or fish swarms.

In seismology, Speeth (1961) had subjects try to differentiate between seismograms of

natural earthquakes and artificial explosions by playing them back speeded up. While

subjects could classify the data very successfully, and rapidly (thanks to the speedup),

little use was made of this until Hayward (1994) and later Dombois (2001) revived the

practice and the discussion.


An interesting case of auditory proof of a long-standing hypothesis was reported in

Pereverzev et al. (1997): In the early 1960s, Josephson and Feynman had predicted

quantum oscillations between weakly coupled reservoirs of superfluid helium; 30 years

later, the effect was verified by listening to an amplified vibration sensor signal of these

mass-current oscillations (see also chapter 7).

One can say that the history of sonification research officially began with the first Inter-

national Conference for Auditory Display (ICAD) in 1992, organised by Gregory Kramer

to bring all the researchers working on related topics, but largely unaware of each other,

into one research community. The extended book version of the conference proceedings

(Kramer (1994b)) is considered the main founding document of this research domain,

and the yearly ICAD conferences are still the central event for researchers, generating

much of the body of sonification research literature.

In 1997, the ICAD board wrote a report for the NSF (National Science Foundation) on

the state of the art in sonification1; and more recently, a collection of seminal papers

mostly presented at ICADs between 1992 and 2004 appeared as a special issue of ACM

Transactions on Applied Perception (TAP, ACM (2004)), which shows the range and

quality of related research.

Many interesting applications of sonification for specific purposes have been made:

Fitch and Kramer (1994) showed that an auditory display of medical patients’ life signs

can be superior to visual displays; Gaver et al. (1991) found that monitoring a vir-

tual factory (ArKola) by acoustic means works remarkably well for keeping it operating

smoothly.

The connection between neural signals and audition has its own fascinating history, from

early neurophysiologists like Wedensky (1883) listening to nerve signals by telephone, to

current EEG sonifications like Baier et al. (2007); Hermann et al. (2006); Hinterberger

and Baier (2005); as well as musicians’ fascination with brainwaves, beginning with

Alvin Lucier’s Music for Solo Performer (1965), among many others. (See also the

ICAD concert 2004, described in section 4.3.)

The idea of listening for scientific insight keeps being rediscovered by researchers even

if they seem to be unaware of sonification research; e.g., what James Gimzewski calls

Sonocytology (Pelling et al. (2004), see also here2) is (in auditory display terminology)

a form of audification of signals recorded with an atomic force microscope used as a

vibration sensor.

There are also current uses in Astronomy by NASA (Candey et al. (2006)), where one of

the motivations given is providing better data accessibility for visually impaired scientists;

and at University of Iowa3, mainly dealing with electromagnetic signals.

1 http://icad.org/node/400
2 http://en.wikipedia.org/wiki/James_Gimzewski
3 http://www-pw.physics.uiowa.edu/space-audio/


Nevertheless, a large number of scientists still appear quite surprised when they hear of

the idea of employing sound to understand scientific data.

3.1.2 A taxonomy of intended sonification uses

Sonification designs may be intended for a wide range of different uses, with substantially

different aims4:

Presentation calls for clear, straightforward, auditory demonstration of finished results;

this may be useful in conference lectures, science talks, teaching contexts, and

other situations.

Exploration is very much about interaction with the data, ’acquiring a feeling for the

data’; while this seems a rather fuzzy target, and is in fact hard to measure, it is

actually indispensable and central. Following Rheinberger (2006), exploration must

necessarily remain informal; it is a heuristic for generating hypotheses - once they

appear on the epistemic horizon, they will be cross-checked and verified with every

analysis tool available. So generating some hypotheses that turn out to be wrong

eventually is not a problem at all; in the worst case, if too many hypotheses are

wrong, this can be an efficiency issue.

Analysis requires well-understood, reliable tools for detecting specific phenomena, which

are accepted by the conventions in the scientific domain they belong to. The prac-

tice of auscultation in medicine may be considered to belong into this category,

even though it only relies on physical means, with no electronic mediation. Also

the informal practice of listening to seismic recordings belongs here.

Monitoring is intended for a variety of processes that benefit from continuous moni-

toring by human observers, whether in industrial production, in medical contexts

like intensive care units, or in scientific experiments. Human auditory perception

habituates quickly to soundscapes with little change; any sudden changes, even

of an unexpected nature, in the soundscape are easily noticed, and enable the

observer to intervene if necessary.

Pedagogy - Different students may learn to understand structures/patterns in data

better when presented in different modalities; an auditory approach to presentation

may be more appropriate and useful in some cases. For example, students with

visual impairments may benefit from data representations with sound, as research

on auditory graphs shows (e.g. Harrar and Stockman (2007); Stockman et al.

(2005)).

4Note that the points separated here may overlap; e.g. presentation and pedagogy certainly do.


Artistic Uses - Many works in sound art are sonification-based, whether they are sound-

only installations, or more generally, data-driven multimedia works. The recent

appearance of special topics issues like Leonardo Music Journal, Volume 16 (2006)

confirm this trend, as do sonification research activities at art institutions like the

Bern University of Arts5.

The intended uses for which a specific sonification system has been designed largely determine

the scope of its functionality, and its usefulness for different contexts.

3.2 Sonification toolkits, frameworks, applications

A number of sonification systems have been implemented and described since the 1980s.

They all differ in scope of features, and limitations; some are historic, meaning they run

on operating systems that are obsolete, while others are in current use, and thus alive

and well; most of them are toolkits meant for integration into (usually visualisation)

applications. Few are really open and easily extensible; some are specialised for very

particular types of datasets. Current systems are given more space here, as they are

more interesting to compare with the system developed for this thesis.

3.2.1 Historic systems

The Porsonify toolkit (Madhyastha (1992)) was developed at a time when realtime syn-

thesis was still out of reach on affordable computers; thus Porsonify aimed to provide an

interface for the Sun Sparc’s audio device and two MIDI synthesizers. Behaviour defined

for a single sound event (usually triggered from a single data point) is formulated in sonic

widgets, which generate control commands for the respective sound device. Example

sonifications were created using data comparing living conditions of different U.S. cities

(cf. the accompanying CD to Kramer (1994b)), and multi-processor performance data.

The LISTEN toolkit (Wilson and Lodha (1996)) was written for SGI workstations, using

(alternatively) the internal sound chip, or external MIDI as sound rendering; it was

meant to be easy to integrate into existing visualisation software, which was done for

visualising geometric uncertainty of surface interpolants, and for algorithmic uncertainty

in fluid flow.

The Musical Data Sonification Toolkit, or MUSE (Lodha et al. (1997)), was a followup

project, aiming to map scientific data to musical sound. Also written for SGI, it uses

mapping to very traditional musical notions: timbres are traditional orchestra instruments

and vowel sounds generated with CSound instruments, rhythms come from a choice of

5see http://www.hkb.bfh.ch/y.html


seven dance rhythms, pitch is defined from the major scale, following rules for melodic

shapes, and harmony is based on overtone ratios. It has been applied ”to visualize (sic)

uncertainty in isosurfaces and volumetric data”.

A later incarnation, MUSART (Musical Audio transfer function Real-time Toolkit, see

Joseph and Lodha (2002)) sonifies data by means of musical sound maps. It converts

data dimensions into ’audio transfer functions’, and renders these with CSound instru-

ments. Users can personalise their auditory displays by choosing which data dimensions

to map to which display parameters. In the article cited, the authors report uses for

exploring seismic volumes for the oil industry. Again, the authors emphasize their use of

musical concepts for sonification design.

While not a single software system, Auditory Information Design by Stephen Barrass

(Barrass (1997)), is a fascinating collection of multiple concepts (all with catchy names):

it encompasses a task-data analysis method (’TaDa’), a collection of use cases for finding

auditory metaphors for design (’ear-benders’), a set of design principles (’Hearsay’), a

perceptually linearised information sound space (’GreyMUMS’), and tools for designing

sonifications (’Personify’). The practical implementations described show a wide variety

of approaches; they all share a Unix flavour, often being shell scripts that connect command-

line programs. Thus it is not one consistent framework, but rather a collection of how-to

examples. For data treatment, mostly perl scripts are used; for sound synthesis, CSound,

which at the time was non-realtime. Some examples also appeared in the CSound book

(Boulanger (2000)) mentioned below.

3.2.2 Current systems

xSonify (Candey et al. (2006)) has been developed at NASA; it is also based on Java,

and runs as a web service6. It aims at making space physics data more easily accessible

to visually impaired people. Considering that it requires data to be in a special format,

and that it only features rather simplistic sonification approaches (here called modi), it

will likely only be used to play back NASA-prepared data and sonification designs.

SonART (Ben-Tal et al. (2002); Yeo et al. (2004)) is a framework for data sonifica-

tion, visualisation, and networked multimedia applications. In its latest incarnation, it is

intended to be cross-platform and uses OpenSoundControl for communication between

(potentially distributed) processes for synthesis, visualisation, and user interfaces.

The Sonification Sandbox (Walker and Cothran (2003)) has intentionally limited range,

but it covers that range well: Being written in Java, it is cross-platform; it generates MIDI

output e.g. to any General MIDI synth (such as the internal synth on many soundcards).

One can import data from CSV textfiles, and view these with visual graphs; a mapping

editor lets users choose which data dimension to map to which sound parameter: Timbre

6http://spdf.gsfc.nasa.gov/research/sonification


(musical instruments), pitch (chromatic by default), amplitude, and (stereo) panning.

One can select to hear an auditory reference grid (clicks) as context. It is very useful

for learning basic concepts of parameter mapping sonification with simple data, and it

may be sufficient for many auditory graph applications. Development is still continuing,

as the release of version 4 (and later small updates) in 2007 show.
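
For comparison, a parameter mapping of this kind (data channels mapped to pitch, loudness, and stereo position, played at a fixed rate) can be sketched in a few lines of SuperCollider3; the data, channel assignments, and mapping ranges below are invented purely for illustration and are not taken from the Sandbox:

// Parameter mapping sketch with invented data; assumes the server is booted (s.boot).
(
~data = Array.fill(200, { [1.0.rand, 1.0.rand, 1.0.rand] }); // 200 rows, 3 data channels
~rate = 10; // data points per second, i.e. the 'speed control'
Routine({
    ~data.do { |row|
        (
            note: row[0].linlin(0, 1, -12, 24),   // channel 0 -> pitch (in semitones)
            amp:  row[1].linlin(0, 1, 0.05, 0.3), // channel 1 -> loudness
            pan:  row[2].linlin(0, 1, -1, 1),     // channel 2 -> stereo position
            dur:  0.1
        ).play;
        (1 / ~rate).wait;
    }
}).play;
)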

The Sonification Integrable Flexible Toolkit (SIFT, see Bruce and Palmer (2005)) is

again a toolkit for integration into other applications, typically for visualisation. While

it is also written in Java and uses MIDI for sound rendering, it emphasizes realtime

data input support from network sources. It has been used for oceanographic data sets;

however, the paper cited describes the first prototype of this system, and no later versions

of it seem to have been developed.

Sandra Pauletto’s toolkit for Sonification (Pauletto and Hunt (2004)) is based on Pure-

Data (see section 3.3 below), and has been used for several application domains: Elec-

tromyography data for Physiotherapy (Hunt and Pauletto (2006)), helicopter flight

data, and others. While it supports some data types well, adapting it for new data is

rather cumbersome, mainly because PureData is not a general-purpose programming

language.

SoniPy is a very recent and quite ambitious project, written in the Python language,

and described in Worrall et al. (2007). It is still in the early stages of development at

this time, but may well become interesting. Being an open source project, it is hosted

at sourceforge7; at the beginning of this thesis, it did not exist yet.

All these toolkits and applications are limited in different ways, based on resources for

development available to their creators, and the applications envisioned for them. For

the broad parallel approach we had in mind, and the flexibility required for it, none

of these systems seemed entirely suitable, so we chose to build on a platform that is

both a very efficient realtime performance system for music and audio processing and a

full-featured modern programming language: SuperCollider3 (McCartney (2007)). To

provide some more background, here is an overview of the three main families of music

programming environments.

3.3 Music and sound programming environments

Computer Music has been dealing with programming to create sound and music struc-

tures and processes for over fifty years now; current music and sound programming

environments offer many features that are directly useful for sonification purposes as

well.

Mainly, three big families of programs have evolved; most other music programming

7http://sourceforge.net/projects/sonipy


systems are conceptually similar to one of them:

Offline synthesis - MusicN to CSound

MusicN languages started in 1957/58, from the Music I program developed at Bell Labs

by Max Mathews and others; Music IV (Mathews and Miller (1963)) already featured

many central concepts in computer music languages, such as the idea of a Unit Generator

as the building block for audio processes (unit generators can be e.g. oscillators, noises,

filters, delay lines, and envelopes). As the first widely used incarnation, Music V, was

written in FORTRAN and thus relatively easy to port to new computer architectures, it

spawned a large number of descendants.

The main strand of successors in this family is CSound, developed at MIT Media Lab

beginning in 1985 (Vercoe (1986)), which has been very popular in academic computer

music. Its main approach is to use very reduced language dialects for orchestra files (con-

sisting of descriptions of DSP processes called instruments), and score files (descriptions

of sequences of events that each call one specific instrument with specific parameters

at specific times). A large number of programs were developed as compositional front-

ends, to write score files based on algorithmic procedures, such as Cecilia (Piche and

Burton (1998)), Cmix, Common Lisp Music, and others; so CSound has in fact created

an ecosystem of surrounding software.

CSound has a very wide range of unit generators and thus synthesis possibilities, and a

strong community; e.g. the CSound Book (Boulanger (2000)) demonstrates its scope

impressively. However, for sonification, it has a few disadvantages: Even though it is text-

based, it uses specialised dialects for music, and thus is not a full-featured programming

language. Any control logic and domain-specific logic would have to be built in other

languages/applications, while CSound could provide a sound synthesis back-end. Being

originally designed for offline rendering, and not built for high-performance realtime

demands, it is not an ideal choice for realtime synthesis either. CSound has been ported

to very many platforms.

Graphical patching - Max/FTS to Max/MSP(/Jitter) to PD/GEM

The second big family of music software began with Miller Puckette’s work at IRCAM

on Max/FTS in the mid-1980s, which later evolved into Opcode Max, which eventually

became Cycling’74’s Max/MSP/Jitter environment. In the mid-1990s, Puckette began

developing an open source program called PureData (Pd), later extended with a graphics

system called GEM. All these programs share a metaphor of ’patching cables’, with

essentially static object allocation of both DSP and control graphs.

This approach was never meant to be a full programming language, but a simple facility


to allow for patching multiple DSP processes written in lower-level (and thus more

efficient) languages; with Max/FTS, the programs actually ran on a DSP card built by

IRCAM. Thus, the usual procedure for making patches for more complex ideas often

entails writing new Max or Pd objects in C; while these can run very efficiently if well

written, special expertise is required, and the development process is rather slow.

In terms of sound synthesis, Max/MSP has a much more limited palette than CSound,

though a range of user-written MSP objects exist; support for graphics with Jitter has

become very good recently. Both Max and Pd have a strong (and somewhat overlap-

ping) user base; Pd is somewhat smaller, having started later than Max. While Max is

commercial software with professional support by a company, Pd is open-source software.

Max runs on Mac OS X and Windows, but not on linux, while Pd runs best on linux,

reasonably well on Windows, and less smoothly on OS X.

Realtime text-based environments - SuperCollider, ChucK

The SuperCollider language and realtime system came from the idea of having both

realtime synthesis and musical structure generation in one environment, using the same

language. Like Max/PD, it can be said to be an indirect descendant of CSound. From

SC1 written by James McCartney in 1996, it has gone through three complete rewriting

cycles, thus the current version SC3 is a very mature system. In version 2, SC2, it

inherited much of its language characteristics from Smalltalk; in SC3 the language and

the synthesis engine were split into a client/server architecture, and many syntax features

from other languages were adopted as options. Its sound synthesis is fully dynamic like

CSound, it has been written for realtime use with scientific precision, and being a text-

based, modern, elegant, full programming language, it is a very flexible environment for

very many uses, including sonification.
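
To give a flavour of how synthesis and structure generation coexist in one language, here is a small example (not taken from SonEnvir code, with arbitrary sound choices): a synthesis process is defined and sent to the server, and a pattern written in the same language sequences it.

// Synthesis definition and structure generation in one language (illustrative only).
s.boot;
(
SynthDef(\blip, { |freq = 440, amp = 0.1, pan = 0|
    var sig = SinOsc.ar(freq) * EnvGen.kr(Env.perc(0.01, 0.3), doneAction: 2);
    Out.ar(0, Pan2.ar(sig, pan, amp));
}).add; // the definition is sent to the synthesis server
)
(
Pbind( // the language side sequences events that play the SynthDef above
    \instrument, \blip,
    \freq, Pseq([220, 330, 440, 660], inf),
    \dur, 0.25
).play;
)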

The range of unit generators is quite wide, though not as abundant as in CSound;

synthesis in SC3 is very efficient. SC3 also provides a GUI system with a variety of

interface widgets, but its main emphasis is on stable realtime synthesis. SC3 has a

somewhat smaller user community, which is nevertheless quite active. Having become

open source with version 3, it has since flourished in terms of development activity. SC3

runs very well on OS X, pretty well on Linux, and less well on Windows (though the

SonEnvir team put some effort into improving the Windows port).

The ChucK language has been written by Ge Wang and Perry Cook, starting in 2002.

It is still under development, exploring specific notions such as being strongly-timed,

and others. Like SC3, it is not really intended as a general purpose language, but as a

music-specific environment. While being cross-platform, and having interfacing options

similar to SC3 and Max, it has a considerably smaller palette of unit generator choices.

One possible advantage of ChucK is that it has very fine-grained control over time; both

synthesis and control can have single-sample precision.


3.4 Design of a new system

As the existing systems did not have the scope we required, we designed our own. A full

description of the design of the Sonification Environment as it was before the SonEnvir

project started is given in de Campo et al. (2004); the following section is updated from

a post-project perspective.

3.4.1 Requirements of an ideal sonification environment

The main design aim is to allow fluid development of new and modification of existing

sonification designs. By using modular software design which decouples components like

basic data handling objects, data processing, sound synthesis processes, mappings used,

playback approaches, and real-time interaction possibilities, all the individual aspects of

one sonification design can be re-used as starting points for new designs.

A Sonification Environment should:

• Read data files in various formats. The minimum is human-readable text files for

small data sets, and binary data files for fast handling of large data sets. Reading

routines for special file formats should be writable quickly. Realtime input from

network sources should also be supported.

• Perform basic statistics on the data for user orientation. This includes (for ev-

ery data channel): minimum, maximum, average, standard deviation, and simple

histograms. This functionality should be user-extensible in a straightforward way.

• Provide basic playback facilities like ordered iteration (in effect, a play button with

a speed control), loop playback of user-chosen segments, zooming while playing,

data-controlled playback timing, and 2D and 3D navigation along user-chosen

data dimensions. Later on, navigation along data-derived dimensions such as

lower-dimensional projections of the data space is also desirable.

• Provide a rich choice of interaction possibilities: Graphical user interfaces, MIDI

controllers, graphics tablets, other human interaction devices, and tracking data

should be supported. (The central importance of interaction only became clear in

the course of the project.)

• Provide a variety of possible synthesis approaches, and allow for changing and

refining them while playing. (The initial design suggested a more static library of

synthesis processes, which turned out to be unnecessary.)

• Allow for programming domain-specific models to run and generate data to sonify.

This strongly suggests a full modern programming language. (This requirement

only came up in the course of the project, for the physics sonifications.)


• Store sonification designs in human-readable text format: This allows for long-

term platform independence of designs, provides possibilities for informal rapid

exchange (text is easy to send by e-mail), and can be an appropriate and useful

publication format for sonification designs that employ user interaction.

• Serve to build a library/database of high-quality sonification designs made in this

environment, with real research data coming from a diverse range of scientific

fields, developed in close collaboration with experts from these domains.

More generally, the implementation should be kept as lightweight, open, and flexible as

possible to accommodate evolving new understanding of the design issues involved.

3.4.2 Platform choice

While PureData was a platform option for a while, we soon decided to stay entirely in

SuperCollider3, based on the list of requirements given above. This decision had some

benefits, as well as some drawbacks.

The benefits we experienced were:

• A fully open source programming language is easy to extend in ways that are useful

for a wider community;

• Interpreted languages like SC3 provide a relatively simple entry into programming for users

(starting with little scripts, and changing details for experimentation);

• Readability has turned out to be very useful, as the code itself also serves as full

technical documentation;

• An interactive development environment encourages code literacy, and thus general

competence, of sonification ’users’. In this context, the notion of Just In Time

Programming (as described e.g. in Rohrhuber et al. (2005)) has turned out to be

extremely useful for interdisciplinary team development sessions, see chapter 4; a brief sketch follows below.
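
A minimal sketch of this working style (purely illustrative, not taken from a SonEnvir design) redefines a running node proxy without stopping the sound:

// Just In Time style: refine a sounding process while it keeps playing (illustrative only).
s.boot;
Ndef(\soni).play;
// a first mapping attempt: a slow control signal drives the pitch
Ndef(\soni, { SinOsc.ar(LFNoise1.kr(1).range(300, 800)) * 0.1 });
// refine while playing: different source, different range, added filtering
Ndef(\soni, { RLPF.ar(Saw.ar(LFNoise1.kr(2).range(100, 400)), 1200, 0.2) * 0.1 });
Ndef(\soni).free; // stop when done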

The main drawback we encountered was that SC3 only runs really well on OS X, a

bit more uncomfortably on linux (which was not used by any of the team members),

while on Windows (which we had to support), it was initially quite unusable; this led to

SonEnvir taking care of substantially improving the Windows port.

3.5 SonEnvir software - Overall scope

The main goal of the SonEnvir sonification framework is to make the creation of

meaningful and effective sonifications easier. Such a sonification environment sup-


ports sonification designers by providing software components, and concepts for using

them. It combines all the important aspects that need to be considered: data represen-

tation, interaction, mapping and rendering.

A famous phrase about computer music programming systems is that they are kitchens,

not restaurants, which also applies to SonEnvir: rather than giving users a menu of

finished dishes to choose from (which other people created), it provides ingredients,

utensils, recipes and examples.

3.5.1 Software framework

SuperCollider3 has a very elegant extension system; one can assemble components to be

published in different ways: Classes, their respective Help files, UnitGenerator plugins,

and all kinds of support files can be combined into packages which can be downloaded,

installed, and de-installed directly from within SC3. Such packages are called Quarks.

Currently, most of the code created in the project is under version control with Subversion

at the SonEnvir website8. In order to achieve maximum reuse, some parts have been

converted into Quarks, while for others, this is still in process. Many items of general

usefulness have already been migrated directly into the main SC3 distribution. The

sonification-specific components will remain available at the SonEnvir website, as will

the collection of sonification designs. (For an overview, see the end of this section.)

The subsequent sections briefly describe the overall structure of the framework and

the design and implementation of the data representation module. For reference, the

framework structure in the subversion repository is described in appendix A.

3.5.2 Framework structure

The SonEnvir framework implements a generic sonification model consisting of four

aspects:

Data model The data model unifies the notions of how data are handled in the frame-

work and deals with the diversity of data types that can be used for sonification.

User-Interaction model This aspect deals with all aspects of interaction for the

exploration and analysis of data. It is mainly implemented in the JInT package

(see below).

Synthesis model The mapping onto properties of sound or the creation of more com-

plex structures of sound by a sonification model. As all the needed code infras-

8https://svn.sonenvir.at/svnroot/SonEnvir/trunk/src/


tructure existed in the JITLib library within SC3, it is not coded as classes, but

only a conceptual model, described in section 5.3, Synthesis Models.

Spatialisation model This model takes care of the audio rendering of the designed

sonification for different requirements and playback environments. It is described

in detail in section 5.5, Spatialisation Model. Its code components reside partially

in SC3 itself, in the Framework/Rendering folder, and in the AmbIEM package9,

which is now a SuperCollider quark package.

All these models taken together allow for designing sonifications in a flexible way. As

the data model is the most implementation-related aspect, it is described in detail here,

and not in the more conceptual chapter on the general models (chapter 5).

3.5.3 The Data model

The aim of the data model is to provide a unified representation of different types of

data that can be used in the sonification framework. This demands a highly flexible and

abstract model as data may have very different structures. The data model also provides

functionality for input/output in the original form the data are supplied in, and includes

various statistical functions for data analysis.

All models are object-oriented in design, and the classes and their inter-relations are

described using UML (Unified Modelling Language) charts. In order to avoid possible

name-space conflicts with other class definitions on any target platform, the classes in

the SonEnvir framework have a prefix ”SE”. Figure 3.2 illustrates the design of the data

model in a UML graph.

The SEData class is central to the design of the data model. It is the highest abstraction

of any kind of dataset to be sonified. Besides providing properties for the name and

the data source, the actual data is organised in channels. An SEData object contains

instances of SEDataChannel, which is the base class for all different types of data

channels and represents a single dimension in a dataset. Data channels can be numerical

data, but also any sort of nominal data with the only restriction that they are organised

as a sequence and addressable by index.

SENumDataChan specifies that the data values in the given channel are all numbers,

and provides a basic set of numerical properties of this set of numbers. Besides the usual

minimum, maximum, mean, and standard deviation values, it also implements functions

that proved to be useful for sonifications, such as removing offsets or a drift, as well as

normalising and ’whitening’ the numbers.
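
To make these operations concrete, they can be sketched on a plain array of numbers; note that this is generic SuperCollider3 code for illustration, not the actual SEData method names:

// Typical channel conditioning steps, sketched on a plain array (not the SEData API).
~chan = [3.1, 3.4, 2.9, 3.8, 3.2];
~noOffset = ~chan - ~chan.mean;                      // remove a constant offset
~normalised = ~chan.normalize(0, 1);                 // rescale so minimum -> 0, maximum -> 1
~whitened = ~noOffset / ~noOffset.squared.mean.sqrt; // zero mean, unit variance ('whitening')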

Another important subclass of the numeric data channel covers all time-based data

channels. These basically refer to two types: time series (SETimeSeriesCh) providing

9AmbIEM is a port of a subset of a system by Musil et al. (2005); Noisternig et al. (2003).


Figure 3.2: UML diagram of the data model.

a sample rate, and data with time stamps (SETimeStampsCh). Although these are

basically numeric data channels as well, we decided to introduce another basic type for

vector-based data, with a subclass for 3D spatial data. Any of the data channel types mentioned

above may be combined in order to form a dataset described through SEData. For

convenience, there are two predefined classes derived from SEData that cover some

common combinations of data channels: SETimeData and SESpatialData.

Every SEData instance is associated with a SEDataSource. This class abstracts the


access to the raw data material. It takes care that the space required for big datasets is

made available when needed, and uses different parsers for reading different file formats.

If needed, it can be extended to include network resources and real-time data. Each

SEDataSource also provides information about the type of each data series that is con-

tained in the raw data. This might be available from headers of some data formats, or

it has to be set explicitly such that SEData can create the appropriate SEDataChannels.

Like the entire framework, the data model is provided as a class library for SuperCollider3.

Once the library is brought into place, it is compiled at startup of the SuperCollider3

language. The following listing illustrates using SEData objects in SC3:

// Example listing of data model usage in SC3.
(
// read an ascii data file
~vectors = FileReader.readInterpret(
    "~/data/C179_T_s.dat",
    true, true
);
// supply data channel names by hand
~chanNames = ['temperature', 'solvent',
    'specificHeat', 'marker'];
// make an SEData object
~phaseData = SEData.fromVect(
    'phaseData',
    ~chanNames,
    ~vectors,
    SENumDataChan // all numerical data, so use SENumDataChan class.
);
// provide simple statistics
~phaseData.analyse;
~phaseData.means.postln;
~phaseData.stdDevs.postln;
)


Chapter 4

Project Background

A physicist, a chemist, and a computer scientist try to go up a hill in an

ancient car. The car crawls, stutters, and then stalls. The physicist says,

”The transmission ratio is wrong - I’ll take a look at it.”; the chemist

says, ”No, the fuel mix is wrong, I’ll experiment with it.”; the computer

scientist says, ”why don’t we all get out, close the doors, get back in,

and try again.”.

This chapter describes the research design for and the working methodology developed

within the SonEnvir project, the design and the process of the Workshop ’Science By

Ear’ the project team held in March 2006, and the concert the team organised for the

ICAD 2006 conference in London. As most of the work presented in this dissertation was

done within the context of the SonEnvir project, it is helpful to provide some background

on that context here.

4.1 The SonEnvir project

The central concept of the SonEnvir project was to create an interdisciplinary setting in

which scientists from different domains and sonification researchers could learn how to

work on data perceptualisation by auditory means. The project took place from January

2005 to March 2007, and it was the first collaboration of all four universities in Graz.

SonEnvir was funded by the Future Funds of the Province of Styria.

4.1.1 Partner institutions and people

The project brought together the following institutions as partners:

• the Institute of Electronic Music and Acoustics (IEM), at the University of Music

and Dramatic Arts Graz;


• the Theoretical Physics Group - Institute of Physics, at the University of Graz;

• the Institute for Sociology, at the University of Graz;

• the University Clinic for Neurology, at the Medical University Graz;

• and the Signal Processing and Speech Communication Laboratory SPSC, at the

University of Technology Graz.

The IEM was the host institution coordinating the project, and the source of audio design

and programming as well as sonification expertise in the project. The main researcher

here was the author of this dissertation.

From the Institute of Sociology, Christian Daye provided data from a variety of sociologi-

cal contexts, and co-designed and experimented with sonifications for them, as discussed

in section 6. He was also responsible for feedback and evaluation of the interdisciplinary

work process from the perspective of sociology of science.

The Physics group had changing members in the course of the project: initially Bianka

Sengl provided data from quantum physics research, namely from competing Constituent

Quark models, as discussed in section 7.1 and appendix C.1. Later on, Katharina Vogt

worked on a number of different physics topics and sonifications for them, including the

Ising and Potts models discussed in section 7.2.

The Signal Processing and Speech Communication Laboratory was represented by Chris-

topher Frauenberger, who worked on a number of different sonification experiments,

among others on propagation of electromagnetic waves, and time series classification,

as discussed in section 8. He also contributed substantially to the code implementa-

tions, and has become the main developer for the python-based Windows version of

SuperCollider3.

For the Institute of Neurology, Annette Wallisch was the main researcher. She provided

a variety of EEG data for experimenting with sonification designs for screening and

monitoring, as described in section 9. She also dealt with an industry research partner,

the company BEST medical systems (Vienna), and she wrote a dissertation (Wallisch

(2007), in German) on the research done within SonEnvir.

4.1.2 Project flow

In order to create a broad base of sonification designs for a wide range of data from

the scientific contexts described, the project was structured in three iterations. Each

iteration began with identifying potentially interesting research questions from the do-

mains, and collecting example data for these. Then sonification designs were created

and tested, which became a more collaborative and experimental cooperation as the

project proceeded.


In each of the scientific fields, we started by building simple sonification designs to begin

the discussion process. The key challenge here turned out to be learning how to work

in such a highly interdisciplinary group, how to build bridges for common understanding,

and to develop a common language for collaboration.

We focused on building sonification designs that demonstrate the usefulness of sonifi-

cation by showing practical benefit for the respective scientific field. Identifying good

research questions at this intermediate level of complexity was not trivial. Nevertheless,

being able to come up with sufficiently convincing examples to reach the immediate

partner ’audience’ is very important.

Finally, the project goal was to integrate all the approaches that worked well in one

context into a single software framework that includes all the software infrastructure,

thus making them re-usable for a wide range of applications; this was intended to result

in a meaningful contribution to the sonification community. The diversity of the research

group and their problem domains forced us toward very flexible and re-usable solutions.

By making our collection of implemented sonification designs freely accessible, we hope

to capture much of what we have learned in a form that other researchers can build on.

4.1.3 Publications

Many research results were published in conference and journal papers, which are indi-

cated in the respective chapters, and briefly listed here:

de Campo et al. (2004) was a project plan for SonEnvir before the fact. Papers on

sociological data (Daye et al. (2005)), quantum spectra (de Campo et al. (2005d)), and

the project in general (de Campo et al. (2005a)) were presented at ICAD and ICMC

2005. We wrote some papers with external collaborators, on electrical systems (Fickert

et al. (2006)), and various kinds of lattice data (de Campo et al. (2005c), de Campo

et al. (2006b), de Campo et al. (2006c), de Campo et al. (2005b)).

For ICAD 2006, we contributed an overview paper, de Campo et al. (2006a), and organ-

ised a concert of sonifications described in section 4.3, contributing a piece described in

de Campo and Daye (2006) and in section 11.3.

At ICAD 2007, we presented papers on EEG (de Campo et al. (2007)), time series

(Frauenberger et al. (2007)), Potts models (Vogt et al. (2007)), and on the Design

Space Map concept (de Campo (2007b)). At the ISon workshop in York 2007, we

presented work on juggling sonification (Bovermann et al. (2007)) and the Sonification

Design Space Map (de Campo (2007a)).

Some project results and insights in the sociological context were also presented in two

journal publications: Daye et al. (2006) and Daye and de Campo (2006).


4.2 Science By Ear - An interdisciplinary workshop

This workshop was in our opinion the most innovative experiment in methodology within

SonEnvir. Aiming to intensify the interdisciplinary work setting within SonEnvir, we

brought in both sonification experts and scientists from different domains to spend three

days working on sonification experiments. Considering participant responses (both during

and after the event), this workshop was very successful. Detailed background is available

online here1.

4.2.1 Workshop design

We chose which participants to invite so that they would form an ideal combination of com-

petences: Eight international sonification experts, eight domain scientists (mainly from

Austria), six audio specialists and programmers, and (partially overlapping with the

above) the SonEnvir team itself (see appendix D). This group of ca. 24-28 people was

just large enough to allow for different combinations for three days, but still small enough

to allow for good group cohesion.

The workshop program consisted of five short lectures by the sonification experts, which

served to inform less experienced domain scientists about sonification history, method-

ology, and psychoacoustics. This helped to bring all participants closer to a common

language. Most of the workshop time was spent in sonification design sessions. For

each day, three interdisciplinary teams were formed, composed of the three categories;

2-3 sonification experts, 2-3 domain scientists, 2 audio programmers, 1 moderator (a

SonEnvir member).

These sessions typically lasted 2 hours, after which the group would report to the plenary

about their results. For the first two days, all three teams worked on the same problem

at the same time (in parallel), which allowed for good comparisons of design results. On

the last day, each group worked on a separate problem for two sessions to allow working

more in depth on the exploration of ideas.

4.2.2 Working methods

The design sessions focused on data submitted by the participating domain scientists;

the scientific domains included Electrical Power Systems, EEG Rhythms, Global Social

data, meteorology in the Alpine region, computational Ising models, Ultra-Wide-Band

communication, and research in biological materials called Polysaccharides.

The parallel sessions began with a talk by the submitting domain specialist introducing

the problem dataset to the plenary group. Then the group split into the three teams,

1http://sonenvir.at/workshop/


and the teams began their parallel sessions. The typical sequence in a session was to

do some brainstorming first, to get ideas about which sonification strategies might be applica-

ble. Once a few candidate ideas were around, experimentation began by coding little

sonification designs (some administrative code like data reading routines was prepared

beforehand).

Time tended to be rather short, so decisions what to try first were often based on what

seemed doable within limited time. Toward the end of the session time, the group began

preparing what they would report to the plenary meeting. This usually consisted of little

demos of what the group had tried, many more ideas for experiments to do as follow-up

steps, and an informal evaluation of what the group felt they had learned.

On the final workshop day, spending two sessions on a topic was a welcome change.

Having more time to experiment, and especially taking a break and then continuing

work on a problem allowed for more sophisticated mini-realisations.

Having a wiki set up for the workshop allowed us to distribute the latest versions of information

materials, all the code examples written, and the notes that were taken during all sessions.

Furthermore, most sessions and discussions were recorded (audio, and some video) to

allow later analysis of the working process and the interactions taking place.

4.2.3 Evaluation

Many of the designs ended up being adapted in some form for later work in SonEnvir;

two that were not used elsewhere are described in section 10 for completeness.

Based on feedback given by the workshop participants, it can be considered a highly

successful experiment in methodology. Many participants commented very positively on

the innovative aspects of this workshop: Actually doing design work in an interdisciplinary

group setting rather than going through prepared examples was considered remarkable.

The major design tradeoff that was also discussed in the responses was how much time

to spend on each data problem: time pressure limited the eventual usefulness of the

designs that were created, so the alternative of working on much fewer data sets for

much longer may be worth trying - at the potential risk of having less comprehensive

overall scope.

Christian Daye made a qualitative and quantitative content analysis of the audio record-

ings of the sessions that confirmed the overall positive response (publication still in

progress), and he developed a number of guidelines for future similar events:

Prepare and distribute basic literature on the domains well beforehand. In the

SBE workshop, there was sometimes a tendency that domain scientists would

mainly listen, thus leaving the sonification experts and programmers to do most


of the talking. From an interdisciplinary point of view, this is not ideal, as it does

not create equally shared understanding.

Do more technical preparation together with the programmers beforehand. In

some sessions problems came up with reading and handling data properly, which

made them less practical than intended.

Have a scientist from the problem domain in every group. As the SBE workshop

covered a wide range of problems, this was not feasible in the parallel sessions.

This strategy would work well in combination with a more limited set of problems

to work on.

4.3 ICAD 2006 concert

While the ICAD has been holding conferences since 1992, the first ever concert of

sonifications at an ICAD conference was only in 2004.

4.3.1 Listening to the Mind Listening

For the ICAD conference in Sydney 2004, Stephen Barrass organised a concert of sonifi-

cations of brain activity, called Listening to the Mind Listening2. The concert call3 invited

participants to create sonifications of neural activity: a dataset was provided with five

minutes of multichannel EEG recording of a person listening to a piece of music. A jury

selected ten submissions for the concert which took place in the Sydney Opera House.

Even though the pieces were constrained to adhere to the time axis of the recording, the

diversity of the approaches taken, and the variety in the sounding results was extremely

interesting. The pieces can be listened to here4 and the organisers published an analytical

paper in Leonardo Music Journal comparing all the entries in a number of different ways

(Barrass et al. (2006)).

The concert was a great success, so it seemed likely to become a regular event at ICAD.

4.3.2 Global Music - The World by Ear

In 2006 the author was invited to be Concert Chair for the ICAD conference in London.

Together with SonEnvir colleagues Christopher Frauenberger and Christian Daye, we

agreed that social data would be an interesting and accessible topic for a sonification

2 http://www.icad.org/websiteV2.0/Conferences/ICAD2004/concert.htm
3 http://www.icad.org/websiteV2.0/Conferences/ICAD2004/concert_call.htm
4 http://www.icad.org/websiteV2.0/Conferences/ICAD2004/concert.htm


concert/competition, and we proceeded to collect and prepare social data of 190 nations

represented in the United Nations.

The concert call5 invited participants to contribute a sonification that illuminates aspects

of the social, political and economic circumstances represented in the data. The following

quote is the central part of the concert call.

Motivation

Werner Pirchner, Ein halbes Doppelalbum, 1973: ”The military costs every

person still alive roughly as much as half a kilogram of bread per day.”

Global data are ubiquitous - one finds them in every newspaper, and they

cover a range of themes, from global warming to increasing poverty, from

individual purchasing power to the ageing of the world’s population. Obvi-

ously these data are of a social nature: They describe specific aspects (e.g.

ecological or economic) of the environment in which societies exist, which

taken together determine culture, i.e. the way people live.

Rising awareness of these global interdependencies has led both to fear and

concerns (e.g. captured in the notion of the risk society, see Beck (1992);

Giddens (1990, 1999)), as well as hopes for eventual positive consequences

of globalisation. Along with developments like the scientisation of politics

(see Drori et al. (2003)), this growing understanding of global issues has re-

defined the context of the political discourse in modern societies: As modern

societies claim to steer their own course based on self-observation by means

of data, an information feedback loop is realised.

Alternative choices of data that are important to consider, which data should

be set in relation to each other, and a consideration of how to perceptualise

these data choices meaningfully can enrich this discourse.

Closing the feedback loop by informing society about its current state and

its development is a task that both scientists and artists have responded to,

and this is the key point of this call:

• You can contribute to the discourse by perceptualising aspects of world

societal developments,

• search for data that concern interesting questions, and devise strategies

for investigating them, and

• demonstrate that sound can communicate information in an accessible

way for the general public.

5http://www.dcs.qmul.ac.uk/research/imc/icad2006/concertcall.php


The reference dataset of 190 countries included data ranging from commonly expected

dimensions like geographical data (capital location, area), population number, to basic

social indicators such as GDP, access to sanitation and drinking water, and life ex-

pectancy. An extended dataset included data on education (years in school for males

and females), illiteracy, housing situation, economic independence of males and females,

and others.

The call went on to specify the following constraints:

Using this reference dataset was mandatory, so countries, capital locations, population

and area data should be used. Participants were strongly encouraged to extend this

dataset with more dimensions, and possible resources for such data extensions were

pointed out.

The concert sound system was to be a symmetrical ring of eight speakers, so any spa-

tialisation used in pieces should employ such a configuration.

Finally, participants had to provide a short paper that documents the context and back-

ground of their data choices and sonification design.

An international jury composed of sociologists, computer musicians/composers, and

sonification specialists wrote reviews rating the anonymous submissions, and eight pieces

were finally selected for the concert6.

Four of these pieces are described in more detail in section 11.

6 Papers and headphone-rendered mp3 files for all pieces are available at

http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/concert/index.html.


Chapter 5

General Sonification Models

A British Euro-joke tells of a meeting of officials from various countries who

listen to a British proposal, nodding sagely at its numerous benefits;

the French delegate stays silent until the end, then taps his pencil and

remarks: ”I can see that it will work in practice. But will it work in

theory?”

reported in Barnes (2007)

In this chapter, several models are proposed to allow better understanding of the main

aspects of sonification designs:

Sonification Design Space Map - General orientation in the design process

Synthesis Model - Considerations of and examples for synthesis approaches

User Interaction Model - Understanding sonification usage contexts and users’ goals

and tasks to be achieved

Spatialisation Model - Using spatial distribution of sound for sonification

Note the entangled nature of these aspects: splitting sonification designs into aspects is

only a simplification that is temporarily useful for grasping the concepts. Because of their

close connections, it will be necessary to cross-reference between sections. Generally,

because of these interdependencies the understanding of these sections will benefit from

re-reading.


5.1 The Sonification Design Space Map (SDSM)

5.1.1 Introduction

This section describes a systematic approach for reasoning about experimental sonifi-

cation designs for a given type of dataset. Starting from general data properties, the

approach recommends initial strategies, and lists possible refinements to consider in the

design process. An overview of the strategies included is presented as a mental (and

visual) map called the Sonification Design Space Map (SDSM), and the refinement steps

to consider correspond to movements on this map.

The main purpose of this approach is to extract ’theory’ from ’observation’ (in our case,

of design practice), similar to Grounded Theory in sociology (Glaser and Strauss (1967)):

to make implicit knowledge (often found in ad hoc design decisions which sonification

experts consider ’natural’) explicit and thus available for reflection, discussion, learning,

and application in design work.

This approach is mainly the result of studying design sessions which took place in the

interdisciplinary sonification workshop ’Science By Ear’, described in detail in section

4.2.

In order to explain the concept in practice as well, a set of workshop sessions on one

simple dataset is analysed here in the terms proposed; in the chapters on implemented

designs, many more of these are described in detail using SDSM terms.

5.1.2 Background

When collaborations on sonification for a new field of application start, sonification

researchers may know little about the new domain, its common types of data, and

its interesting research questions; similarly, domain scientists may know little about

sonification, its general possibilities, and its possible benefits for them. In such early

phases of collaboration, the task to be achieved with a single particular sonification is

often difficult to define clearly, so it makes sense to employ an exploratory strategy which

allows for mutual learning and exchange. Eventually, the interesting tasks to achieve

become clearer in the process. Note that even when revisiting familiar domains, it is

good methodological practice to start with as few implicit assumptions as possible, and

introduce any concepts from domain knowledge later, transparently and explicitly,

in the course of the design process.

Rheinberger (2006) describes how researchers deal with ’epistemic things’, which are by

definition vague at first (they can be e.g. physical objects, concepts or procedures whose

usefulness is only slowly becoming clear); they choose ’experimental setups’ (ensembles

of epistemic things and established tools, devices, procedures), which allow for endless


repetitions of experiments with minimal variations. The differential results gained from

this exhaustion of a chosen area in the possibility space can allow for new insights. Then,

an experimental setup can collapse into an established device or practice, and become

part of a later experimental setup.

From this perspective, sonification designs start their lifecycle as epistemic things, which

need to be refined under usage; they may in time become part of experimental setups,

and if successful, eventually ’disappear’ into the background of a scientific culture as

established tools.

Some working definitions

The objects or ’content’ to be perceptualised can be well-known information, or new

unknown data (or shades of gray in between). The aims for these two applications are

very different: for information, establishing easy-to-grasp analogies is central; for data,
it is enabling the perceptual emergence of latent phenomena of unforeseeable type in the
data. As working terminology for the context here, we propose to define the following

three terms:

Auditory Display is the rendering of data and/or information into sound designed for

human listening. This is the most general, all-encompassing term (even though the term

’display’ has a visual undertone to it).

We further propose to differentiate between two subspecies of Auditory Displays:

Auditory Information Display is the rendering of well-understood information into

sound designed for communication to human beings. It includes speech messages such

as in airports and train stations, auditory feedback sounds on computers, alarms and

warning systems, etc.

Sonification or Data Sonification is the rendering of (typically scientific) data into

(typically non-speech) sound designed for human auditory perception. The informational

value of the rendering is often unknown beforehand, particularly in data exploration.

The model described here focuses on Data Sonification in the narrower sense.

These definitions are quite close to the current state of the evolving terminology; in
the International Encyclopedia of Ergonomics and Human Factors, Walker and Kramer

(2006) define the terms quite similarly:

”Auditory display is a generic term including all intentional, nonspeech

audio that is designed to transmit information between a system and a user.

...

Sonification is the use of nonspeech audio to present data. Specifically,

sonification is the transformation of data relations into auditory relations,

for the purpose of studying and interpreting the data.”


Common sonification strategies

The literature usually classifies sonification approaches into Audification and Parame-

ter Mapping (Kramer (1994b)), and Model-Based Sonification (Hermann (2002)). For

the context here, we prefer to differentiate the categories more sharply, which will be-

come clear along the way; so, our three most common approaches are: Sonification (or

generally, perceptualisation) by Continuous Data Representation, Discrete Point Data

Representation, and Model-Based Data Representation.

Continuous Data Representation treats data as quasi-analog continuous signals, and

relies on two preconditions: equal distances along at least one dimension, typically time

and/or space; and sufficient (spatial or temporal) sampling rate, so that the signal is

free of sampling artifacts, and interpolation between data points is smooth. Both simple

audification and parameter mapping involving continuous sounds belong in this category.

Its advantages include: subjective perceptual smoothness; interpolation can make the

sampling interval (which is an observation artifact) disappear; perception of continuous

shapes (curves) can be appropriate; audition is very good at structures in time; mapping

data time to listening time is metaphorically very close and thus easy to understand.

Its drawbacks include: it is often tied to linear movement along one axis only; and events

present in the data (e.g. global state changes in a system) may be difficult to represent

well.

Discrete Point Data Representation creates individual events for every data point.

Here, one can easily arrange the data in different orders, choose subsets based on special

criteria (e.g. based on navigation input), and when special conditions arise, they can be

expressed well.

Its advantages include: more flexibility, e.g. subset selections of changeable sizes, based

on changeable criteria, and random iterations over the chosen subsets; and the lack of

illusion of continuity may be more accurate to the data.

Its drawbacks include: attention may be drawn to data-independent display parame-

ters, such as a fixed grain repetition rate; at higher event rates, interactions between

overlapping sound events may occur, such as phase cancellations.

Model-Based Data Representation employs more complex mediation between data

and sound rendering by introducing a model, whose properties are informed by the data.

Its advantages include: apart from data properties, more domain knowledge can be

captured and employed in the model; and models may be applicable to datasets from a

variety of contexts, as is commonly aimed for in Data Mining.

Its drawbacks include: assumptions built into models may introduce bias leading away

from understanding the domain at hand; there may be a sense of disconnection be-

tween data and sounding representations; higher complexity of model metaphors may be


difficult to understand and interpret.

5.1.3 The Sonification Design Space Map

Task/Data Analysis (Barrass (1997)) focuses on solving well-defined auditory informa-

tion design problems: How to design an Auditory Display for a specific task, based on

systematic descriptions of the task and the data. Here, the phenomena to be perceptu-

alised are known beforehand, and one tries to render them as clearly as possible.

The Sonification Design Space Map given here addresses a similar but different problem:

The aim to be achieved here is to find transformations that let structures/patterns in

the data (which are not known beforehand) emerge as perceptual entities in the sound

which jump to the foreground, i.e. as identifiable ’interesting audible objects’; these are

closely related to ’sound objects’ in the electronic music field (from ’objets sonores’, see

Schaeffer (1997)), and in psychoacoustics literature, ’auditory gestalts’ (e.g. Williams

(1994)).

In other words, the most general task in data sonification designs for exploratory pur-

poses is to detect auditory gestalts in the acoustic representation, which one assumes

correspond to any patterns and structures in the data one wants to find.

SDS Map axes

To facilitate this search for the unknown, the Design Space Map enables a designer,

researcher, or artist to engage in systematic reasoning about applying different sonifica-

tion strategies to his/her task or problem, based on data dimensionality and perceptual

concepts.

Especially while the task is not yet clearly understood and defined (which is often the

case in exploratory contexts), reasoning about data aspects, and making well-informed

initial choices based on perceptual givens can help to develop a clearer formulation of

useful tasks.

So, the proposed map of the Sonification Design Space (see figure 5.1) has these axes:

X-axis: the number of data points estimated to be involved in forming one gestalt, or

’expected gestalt size’;

Y-axis: the number of data dimensions of interest, i.e. to be represented in the current

sonification design;

Z-axis: the number of auditory streams to be employed for data representation.


Figure 5.1: The Sonification Design Space Map

The overlapping zones are fuzzy areas where different sonification approaches apply; the arrows

on the right refer to movements on the map, which correspond to design iterations. For detailed

explanations see sections 5.1.3 and 5.1.4.

To ensure that the auditory gestalts of interest will be easily perceptible, the most

fundamental design decision is the time scale: In auditory gestalts (or sound objects)

of 100 milliseconds and less it becomes more and more difficult to discern meaningful

detail, while following a single gestalt for longer than say 30 seconds is nearly impossible,

or at least takes enormous concentration; thus, a reasonable rule of thumb for single

gestalts is to time-scale their rendering into the duration of echoic memory and short

term memory, i.e. on the order of 1-3 seconds (Snyder (2000)). Sounds up to this

duration can be kept in working memory with much detail information, keeping all the

nuances and inflections while more perceptual processing goes on. This time frame can

be called ’echoic memory time frame’. The ’expected gestalt size’ is the number of data

points (of the dataset under study) that should be represented within this time frame to

allow for perception of individual gestalts at this data subset size.

Note that the three-second time frame does not impose a limit on the number of data

points represented: as a deep exploration of the world of Microsound (Roads (2002))

shows, clouds of short sound events can happen at very high densities in the micro-time

scale; in fact this is a fascinating area for creating sound that is rich in perceptual detail

and artistic possibilities.


SDS Map zones

The zones shown in figure 5.1 do not have hard borders; their extensions are only meant

to give an indication of how close-by (and thus meaningfully applicable) the various strategies

are for a given data ’gestalt size’ and dimensions number. Similarly, the number ranges

given below are only approximate orders of magnitude, and mainly based on personal

experience both in electronic music and sonification research.

The Discrete-Point zone ranges roughly from gestalt size 1 - 1000 and from dimensions

number 1 - 20; the transitions shown in the map from note-like percepts via textures to

granular events which merge into clouds of sound particles are mainly perceptual.

The Continuous zone ranges roughly from gestalt size 10 - 100.000 and from dimensions

number 1 - 20; the main transition here is between parameter mapping and audification,

with various technical choices indicated along the way, such as using the continuous data

signal as a modulation source, band splitting it, and/or applying filtering to it.

The Model-Based zone ranges roughly from gestalt size 10 - 50.000 and from dimensions

number 8 - 128; because the approach is so varied and flexible, there are no further

orientation points in it yet. Existing varieties of model-based approaches are still to be

analysed in the terms of this Sonification Design Space, and can eventually be integrated

in appropriate locations on the map.

All these zones apply mainly for single auditory streams; generally, when multiple streams

are used in a sonification design, the individual streams can and should use fewer dimen-

sions. In fact, using multiple streams is the main strategy for reducing the number of

dimensions while keeping the overall density of presentation constant.

5.1.4 Refinement by moving on the map

In the evolution of a sonification design, all intermediate incarnations can be conceptu-

alised easily as locations on the map, based on how many data points are rendered into

the basic time interval, how many data dimensions are being used in the representation,

and how many perceptual streams are in use. A step from one version to the next can

then be considered analogous to a movement on the map. This mind model aims to

capture the design processes we could observe in concentrated form in the Science by

Ear workshop (’SBE’, described in detail in section 4.2), and in extended form in the

development work in the main strands of the SonEnvir project.

Data anchor

For exploring a dataset, one can start by putting a reference point on the map, which we

call Data Anchor: This is a point on the map corresponding to the full number of data


points and data dimensions. A first synopsis, or more properly Synakusis, of the entire

dataset (within the echoic memory time frame of ca. 3 seconds) can then be created with

one of the nearest sonification strategies on the map. Subsequent sonification designs

and sketches will typically correspond to a movement down from this point, i.e. toward

using fewer dimensions at a time, and to the left, toward listening to less than the total

number of data points in the echoic memory time frame. Of course one can still listen

to the entire dataset; the total presentation time will simply become longer.

Shift arrows

Shift arrows, as shown in figure 5.1 on the right hand side, allow for moving one’s current

’working position’ on the Design Space Map, in effect deploying different sonification

strategies in the exploration process. Note that some shifting operations are used for

’zooming’, and leave the original data untouched, while others employ (temporary) data

reduction, extension, and transformation; in any sonification design one develops, it

is essential to differentiate between these kinds of transformations and document the

steps taken clearly. Finally, one can decide to defer such decisions and turn them into

interaction possibilities, so that e.g. subsets are selected interactively.

A left-shifting arrow can be used to reduce the ’expected gestalt size’, in effect using

fewer data points within the echoic memory time frame. Some options are: investigat-

ing smaller, user-chosen data point subsets (this can be by means of interaction, e.g.

’tapping’ on a data region and hearing that subset); downsampling; choosing subsets by

appropriate random functions; and other forms of data preprocessing.

A down-shifting arrow can be used to reduce the ’dimensions number’, i.e. to employ fewer

data properties (or dimensions) in the presentation. Some options are: dimensionality

reduction by preprocessing (e.g. statistical approaches like Principal Component Analysis

(PCA), or using locality-preserving space-filling curves in higher-dimensional spaces, e.g.

Hilbert curves); and user-chosen data property subsets, keeping the option to explore

others later. Model-based sonification concepts may also involve dimensionality reduction

techniques, yet they are in principle quite different from mapping-based approaches.1

An up-shifting arrow can be used to increase the number of dimensions used in the

sonification design; e.g. for better discrimination of components in mixed signals, or

to increase ’contrast’ by emphasizing aspects with relevance-based weighting. Some

options are: time series data could be split into frequency bands to increase detail

resolution; extracting the amplitude envelope of a time series and using it to accentuate

its dynamic range2; other domain-specific forms of preprocessing may be appropriate for

adding secondary data dimensions to be used in the sonification design.

1 Thomas Hermann, personal communication, Jan 2007.
2 Whether such transformations happen in the data preprocessing stage or in the audio DSP implementation of a sonification design makes no difference to the conceptual reasoning process.


A right-shifting arrow can be used to increase the number of data points used, which

can help to reduce representation artifacts. Some options are: interpolation of signal

shape between data points; repetition of data segments (e.g. granular synthesis with

slower-moving windows); local waveset audification (see section 5.3); and model-based

sonification strategies can be used to create e.g. physical vibrational models, whose state

may be represented in larger secondary datasets informed by comparatively few original

data points.

Interpolation in time-series data is often employed habitually without further notice; the

model proposed here strongly suggests notating this transformation as a right-shifting

arrow. If one is certain that the sampling rate used was sufficient, using cubic (or better)

interpolation instead of the actually measured steps creates a smoother signal which is

nearer to the phenomenon measured than the sampled values. When such a smoothed

signal is used for modulating an audible synthesis parameter, the potentially distracting

presence of the time step unit should be less apparent.

Z axis shifts

So far, all arrows have concerned movement in the front plane of the map, where only a

single auditory stream is used for data representation. After the time scale, the number

of streams is the second most fundamental perceptual design decision. By presenting

some data dimensions in parallel auditory streams (especially data dimensions of the

same type, such as time-series of EEG measurements for multiple electrodes), overall

display dimensionality can be increased in a straightforward way, while dimensionality

in each individual stream can be lowered substantially, thus making each single stream

easier to perceive. (The equivalent movement is difficult to represent well visually on

a 2D map, but easy to imagine in 3D space. Figure 5.2 shows a rotated view.) For

multiple streams, all previous arrow movements apply as above, and two more arrows

become available:

An inward arrow can be used to increase the number of parallel streams in the represen-

tation. Some options are: multichannel audio presentation; and setting one perceptual

dimension of the parallel streams to fixed values with large enough differences to cause

stream separation, thus in effect labelling the streams.

An outward arrow can be used to decrease the number of parallel streams in the repre-

sentation. Some options are: selecting fewer streams to listen to; intentionally allowing

for perceptual merging of streams.

Experimenting with different numbers of auditory streams can be very interesting, as

multiple perspectives on the same data ’content’ may well contribute to more intuitive

understanding of the dataset under study. Figure 5.2 shows the range of hypothetical

variants of a sonification design for a dataset with 16 dimensions; the graph plane is at


Figure 5.2: SDS Map for designs with varying numbers of streams.

Hypothetical variants of a sonification design for a dataset with 16 dimensions; see text.

an expected gestalt size of 100 data points, and the axes shown are Y (number of data

properties mapped) and Z (number of auditory streams employed). Different designs

might employ, for example, one stream with 16 mapped parameters, 2 streams with 8, 4

streams with 4, 8 with 2 and 16 streams with a single parameter. Of course, depending

on the character of the data dimensions, other, more asymmetrical combinations may

be worth exploring; these will typically be located below the diagonal shown.

Note that the map is slightly ambiguous between number of generated versus perceived

streams; parallel streams of generated sound may fuse or separate based on perceptual

context. This is a very interesting phenomenon which can be quite fruitful: perceptual

fusion between streams can be an appropriate expression of data features, e.g. in EEG

recordings, massive synchronisation of signals across electrodes may cause the streams

to fuse, which can represent the nature of some epileptic seizures well.


5.1.5 Examples from the ’Science by Ear’ workshop

In order to clarify the theoretical considerations given so far, we now turn to analysing

design work done in an interdisciplinary setting. We report one exemplary set of design

sessions as they happened, with added after-the-fact analysis in terms of the Sonification

Design Space Map concept (short: SDSM). Where SDSM strongly calls for additional

designs, these are provided and marked as additions. This is intended to demonstrate

the potential of going from practice-grounded theory back to theory-informed practice.

The workshop concept is described in section 4.2.

The workshop setting

True to the inherently interdisciplinary nature of scientific data sonification, the SBE

workshop brought together three groups of people for three days: Domain scientists who

were invited to supply data they usually work with; an international group of sonification

experts; and audio programmers/sound designers. Apart from invited talks by the soni-

fication experts, the main body of work consisted of sonification design sessions, where

interdisciplinary groups (ca. 8 people, domain scientists, sonification experts, program-

mers, and a moderator) spent 2 hours discussing one submitted data set, experimenting

with different sonification designs, and then discussing results across groups in plenary

meetings.

In each session, discussion notes were taken as documentation, where possible the soni-

fication designs were kept as code, and all the sound examples played in the plenary

meetings were rendered as audio files. All this documentation is available online3.

Load Flow - data background

The particular data set serving as a starting example came from electrical power systems:

It captures electrical power usage for one week (December 18 - 24, 2004) across five

groups of power consumers: households, trade and industry, agriculture, heating and

warm water, and street lighting; a sum over all consumer groups was also provided.

Clear daily cycles were to be expected, as well as changes between workdays and week-

ends/holidays. While this is not scientifically challenging, it is a good example of simple

data with everyday relevance. We chose this dataset for the first parallel session, and it

did serve well for exploring basic sonification concepts with novices. The full documen-

tation for these sessions is available online here4.

3 http://sonenvir.at/workshop/
4 http://sonenvir.at/workshop/problems/loadflow/. All sound examples can be found here, in the folders TeamA, TeamB, TeamC, and Extras; for layout reasons, relative links at this site are given here as ./TeamX/name.mp3 etc.


Figure 5.3: All design steps for the LoadFlow dataset.

Steps are shown as locations labeled with team name and step number (A1, B2, C3, etc.),

and arrows between locations.

The dataset was an excel file with 5 columns for the consumer groups, and consumption

values were sampled at 15 minute intervals; so for a week, there are 24 * 4 * 7 = 672

data points for the entire dataset. In SDSM terms, this puts the Data Anchor for this set

right in the middle of the Design Space Map, in the overlap zone between Discrete-Point

and Continuous sonification, see section 5.3.

Sonification designs

All sonification designs are shown as locations on the Design Space Map in figure 5.3,
labeled as A1, B1, C3 etc. Teams A and B created their design sketches in

SuperCollider3, while Team C worked with the PureData environment.

[A1] Team A began by sonifying the entire dataset as five parallel streams, scaled to 13

seconds, i.e. one day scaled to ca. 2 seconds; power values were mapped to frequency

with identical scaling for all channels5. The resulting five parallel streams were panned

into one stereo panorama.

After experimenting with larger and smaller timescales, agreement was reached that the

5 ./Team A/TeamA 1 FiveSines PowersToFreqs.mp3


initial choice of timescale was appropriate and useful. In SDSM terms, this means the

team was looking for auditory gestalts at the scale of single days.

[A+] As SDSM recommends starting with a synakusis into a timeframe of 3 seconds,

this is provided here6. This was only added after the workshop.
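A possible reconstruction of such a synakusis in code (a sketch added here, not the original rendering; it assumes the LoadFlow data loaded as q.data, as shown later in section 5.3.1, and a running server s):

( // load each of the five consumer group channels into its own buffer
q.bufs = q.data.flop.collect { |chan| Buffer.loadCollection(s, chan) };
)

( // render the full week as five parallel streams within ca. 3 seconds:
// power mapped to pitch, each stream panned to its own fixed position
{
    var loopdur = 3;
    var streams = q.bufs.collect { |buf, i|
        var data = PlayBuf.ar(1, buf, buf.duration / loopdur, loop: 1);
        var pitch = data.linlin(0, 2.24, 60, 96);
        Pan2.ar(SinOsc.ar(pitch.midicps), i.linlin(0, 4, -1, 1))
    };
    Mix(streams) * 0.1
}.play;
)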

Then, alternative sound parameter mappings were tried out based on team suggestions:

[A2] Mapping powers to amplitudes of five tones labeled with different pitches7. While

this is closer in metaphorical distance, it is perceptually less successful: one could not

distinguish much shape detail in amplitude changes.

[A3] Mapping powers to amplitudes and the cutoff frequencies of resonant lowpass filters

of five differently pitched tones8. This was clearer again, but still not as differentiated

as mapping to tone frequencies.

[A4] Going back to mapping to frequencies, each tone was labeled with a different

phase modulation index (essentially, different levels of brightness)9. While this allowed

for better stream identification, the (very quickly chosen) scaling was not deemed very

pleasant, if inadvertently amusing.

[A5] Finally, the team tried using less parallel streams, and adding secondary data: the

phase modulation depth (basically, the brightness) of both channels (household and

agriculture) was controlled from the difference between the two data channels10. While

this did not work very well, it seemed promising with better secondary data choices;

however, at this point session time was over. In SDSM terms, design A5 is a move

down - to fewer channels - and a move back up - derived data used to control additional

parameters (the map only shows the resultant move).

Team B chose to do audification (following one sonification expert’s request), and to

use an interactive sonification approach: Their design loaded the entire data for one

channel (672 values, equivalent to one week of data time) into a buffer, and played back

a movable 96-value segment (equal to one day) as a looped waveform. The computer

mouse position was used to control which 24hour-segment is heard at any time. This

maps the signal’s local ’jaggedness’ into spectral richness and its overall daily change

into amplitude. (For the non-interactive sound examples that follow, the mouse is moved

automatically through the week within 14 seconds.)

While the team found the data sample rate and overall data size too low for much

detail, an interesting side effect turned up: when audifying segments in this fashion, the

difference between the same time of day for two adjacent days was emphasized; large

6 ./extras/LoadflowSynakusis.mp3
7 ./Team A/TeamA 2 FiveTones PowersToAmps.mp3
8 ./Team A/TeamA 3 FiveTones PowersToAmpsAndFilterfreqs.mp3
9 ./Team A/TeamA 4 FiveFMSounds IDbyModDepth.mp3
10 ./Team A/TeamA 5 TwoFMSounds DiffToModDepth.mp3


differences at a specific time between adjacent days created strong buzzing11. In the next

design step, 2 channels, households (left) and agriculture (right) were compared side by

side12, and for clearer separation, they were labeled with different loop frequencies 13.

The final design example maps the power values corresponding to the current mouse

position directly to the amplitude of a 50Hz (European mains frequency) filtered pulse

wave 14. As above, in the fixed rendering here, the mouse moves through the week at

constant speed within 14 seconds.

In SDSM terms, the initial choices were to move all the way down on the map (to

only 1, and then 2 out of 5 channels at a time), and essentially a move to the left: a
chosen data subset was played by moving a one-day window within the
data. Note that this move is actually creating an interaction parameter for sonification
design users, which is one of the many advantages of current interactive programming

environments.

Note that the interpolation commonly used in audification is actually slightly dubious

here: There may well have been meaningful short-time fluctuations within 15 minute

intervals which would not have been captured in the data as supplied.

Team C used PureData as programming environment. Their approach was quite similar

to Team A, with interesting differences: They began with scaling each single data channel

into 3 seconds, mapping power in that channel both to frequency and to amplitude, and

subsequently rendered all channels in this fashion15. Finally, this team also produced a

version with six parallel streams (including the sum value), scaled into 12 seconds, and

with different timbres16.

In SDSM terms, they first moved to the bottom of the map, while keeping full data

scale, i.e. a synakusis-sized time window; example 7 moves back up (using all channels),

and to the left (i.e. toward higher time resolution, gestalts on the order of single days

of data).

5.1.6 Conclusions

Conceptualising the sonification design process in terms of movements on a design space

map, one can experiment freely by making informed decisions between different strategies

to use for the data exploration process; this can help to arrive at a representation which

produces perceptible auditory gestalts more efficiently and more clearly. Understanding

the sonification process itself, its development, and how all the choices made influence

11 ./Team B/1 LoadFlow B Households.mp3
12 ./Team B/2 LoadFlow B households agriculture.mp3
13 ./Team B/3 LoadFlow B households agriculture.mp3
14 ./Team B/4 LoadFlow B households agriculture.mp3
15 http://sonenvir.at/workshop/problems/loadflow/Team C/, sound examples 1-6.
16 ./Team C/TeamC AllChannels.mp3


the sound representation one has arrived at, is essential in order to attribute perceptual

features of the sound to their possible causes: They may express properties of the dataset,

they may be typical features of the particular sonification approach chosen, or they can

be artifacts of data transformation processes used.

As these analyses of some rather basic sonification design sessions show, the terminology

and map metaphor provide valuable descriptions of the steps taken; having the map

available (mentally or physically) for a design work session seems very likely to provide

good clues for next experimental steps to take.

Note that the map is open to extensions: As new sonification strategies and techniques

evolve, they can easily be classified as either new zones, areas within existing zones, or as

transforms belonging to one of the directional arrows categories; then their appropriate

locations on the map can easily be estimated and assigned.

5.1.7 Extensions of the SDS map

There are several ways to extend the map, and make it more useful, and this dissertation

aims to provide some of them:

More and richer detail can be added by analysing the steps taken in observed design

sessions, classifying them as strategies, and adding them if new or different. This is the

object of chapters 6, 7, 8, 9, 10, and 11, the example sonification designs from different

SonEnvir research activities.

A more detailed analysis of the existing varieties of model-based sonification can be
undertaken, and that understanding can and should be expressed in the terms of the conceptual

framework of the map; however, this is beyond the scope of this thesis.

Expertise can be integrated by interviewing sonification experts, tapping into their expe-

rience, inquiring about their favorite strategies, or decisions they remember that made

a big difference for a specific design process.

One can imagine building an application that lets designers navigate a design space map,

on which simple example data sets with coded sonification designs are located. When

one moves in an area that corresponds to the dimensionality of the data under study, the

nearest example pops up, and can be adapted for experimentation with one’s own data.

Obviously such examples should be canonical and capture established sonification best

practices and guidelines, e.g. concerning mapping (Walker (2000)), as well as sonification
design patterns (Barrass and Adcock (2004)).

Finally, many of the strategies need not be fixed decisions made once; being able to delay

many of the strategic choices, and to make them available as interaction parameters when

exploring a dataset can be extremely valuable.


5.2 Data dimensions

Before proceeding to synthesis models, it will be helpful to discuss the nature of data

dimensions in more depth.

5.2.1 Data categorisation

In data analysis, data dimensions are classified by scale type: data may capture categorical
differences, or ordered differences, which may additionally have a metric and a natural zero.

Table 5.1: Scale types

Scale      Characteristics                               Example
nominal    difference without order                      kind of animal
ordinal    difference with order                         degrees of sweetness
interval   difference with order and metric              temperature
ratio      difference, order, metric, and natural zero   length

For nominal scales (such as ’kind of animal’) and ordinal scales, it is useful to know

the set of all occurring values, or categories (such as cat, dog, horse). The size of this

set greatly influences the choices of possible representations of the values in this data

dimension.

For metrical scales (interval and ratio), it is necessary to know the numerical range in

order to make scaling choices; also knowing the measurement resolution or increment

(for example, age could be measured in full years, or days since birth) and precision (e.g.

tolerances of a measuring device) is useful.
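As a small illustration (a sketch, not from the original text; the categories and values are made up), the scale type directly determines how a data value can be turned into a synthesis parameter:

( // nominal scale: no order, so values can only select from a fixed set,
// e.g. one labelling pitch per category
var categories = [\cat, \dog, \horse];
var labelPitches = [60, 64, 67];
var pitchFor = { |animal| labelPitches[categories.indexOf(animal)] };
pitchFor.(\dog).postln; // -> 64

// metrical scales: a known numeric range can be scaled into a parameter range,
// linearly or exponentially
0.5.linlin(0, 2.24, 60, 96).postln;       // e.g. LoadFlow power -> midinote
0.5.linexp(0.01, 2.24, 200, 2000).postln; // or -> frequency in Hz
)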

5.2.2 Data organisation

Apart from the phenomena recorded, and their respective values, data may have differ-

ent forms of organisational structure: Individual data points may have different kinds

of neighbour relations to specific other data points. The simplest case would be no

organisation at all: Measuring all the individual weights of a herd of cows is just a set

of measured values with no order. When recording health status at the same time, each

data point has two dimensions, but there is still no order.

If the cows’ identities are recorded as well, similar measurements at different times can

be compared. If the cows have names, the data can be sorted alphabetically (nominal

scale); if the cows’ birth dates are known as well, the data can also be sorted by age


(interval). Both sortings are derived from data dimensions, and there is no obvious

’best’, or preferable order. Often the order in which data are recorded is considered an

implicit order; however, in the example given, it may simply be the order in which the

cows happened to be weighed. In social statistics, data for individuals or aggregates

without obvious neighbour relations are the most frequent case.

When physical phenomena are studied, measurements and simulations are often organ-

ised in time (e.g. time series of temperature) and space (temperature in n measuring

stations in a geographical area, or force field simulations in a 3D grid of a specific res-

olution). These orders can actually be considered separate data dimensions; for clear

differentiation one may call a dimension which expresses a value (such as temperature)

a value dimension, while a dimension that expresses an order (e.g. a position in time or

space) can be called ordering dimension or indexing dimension.
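A tiny sketch of this distinction (with made-up data, not from the original): the same value dimension can be presented in different derived orders:

( // each data point: [name (nominal), birth year (interval), weight in kg (ratio value)]
var cows = [ ["Berta", 2001, 650], ["Alma", 2003, 590], ["Cilli", 1999, 700] ];
// the set has no inherent order; two ordering dimensions can be derived from it:
cows.sort { |a, b| a[0] < b[0] };   // alphabetical by name (nominal order)
cows.collect(_.last).postln;        // the value dimension (weights) in that order
cows.sort { |a, b| a[1] < b[1] };   // by birth year, i.e. by age (interval order)
cows.collect(_.last).postln;        // the same weights in age order
)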

Task/Data analysis by Barrass (1997), chapter 4, provides a template that captures data

dimensions systematically, as well as initial ideas for desirable ways of representation of

and interaction with the data under study. As a practical example, the TaDa Analysis

made for the LoadFlow dataset as a preparation for the Science By Ear workshop is

reproduced here.

5.2.3 Task Data analysis - LoadFlow data

Name of Dataset: Load Flow

Date: March 12, 2006

Authors: Walter Hipp, Alberto de Campo (TaDa)

File: LoadFlow.xls (original), .tab, .mtx.

Format: Excel xls original, tab-delimited for SuperCollider3 (SC3), mtx format for PureData. The

file contains 672 lines with date and time, total electrical power consumption, and

consumption for five groups of power consumers.

Scenario

The Story:

Load Flow describes how the electrical power consumption of different groups of con-

sumers changes in time. A time series was taken for a week (in Winter 2004) of 15

minute average values, documenting date and time, total power consumption, and con-

sumption for a) households, b) trade, c) agriculture, d) heating and warm water, and

e) street lighting.

Tasks for this data set:


• Find out which kinds of patterns can be discerned at which time domain; e.g. daily

cycles versus shorter fluctuations.

• Since all five individual channels have the same unit of measurement, find ways to

represent them in a way that their values and their movements can be compared

directly.

Table 5.2: The Keys

Question: Who uses how much power when?

Are there patterns that recur? At what time scales?

Are there overall periodicities?

Answers: One or several of the channels;

Yes/No, days/hour/times of day;

categories of pattern shapes

Subject: Relative proportions, patterns of change in time

Sounds: ? (none at the time the TaDa analysis was written)

TaDa

Table 5.3: The Task

Generic question: What is it? How does it develop?

Purpose: Identify, compare

Mode: interactive

Type: continuous

Style: exploration

Table 5.4: The Data/Information:

Level: Intermediate and global

Reading: Conventional (possibly direct)

Type: 5 channels, ratio

Range: continuous

Organization: time


Table 5.5: The Data:

Type: 5 channels of ratio scale with absolute zero

Range: Individual channels 0 - 2.24, total power 1.08 - 4.55

Organisation: Time

Appendix

Figure 5.4: LoadFlow - time series of dataset (averaged over many households)

Figure 5.5: LoadFlow - time series for 3 individual households


5.3 Synthesis models

Perceptualisation designs always require decisions about precisely how data values

(the ’sonificate’) determine perceptible representations (in the case of auditory repre-

sentation, the sonifications). While section 5.1 focused on which data subsets are to be

presented in the rendering, this section covers the question of which technical aspects of

the sound synthesis algorithms deployed are to be determined by which data dimensions.

The three sonification strategies defined in section 5.1.2 are discussed in more depth,

and concrete examples of synthesis processes are provided in ascending complexity.

With all strategies from the very simplest to the most complex model-based designs,

decisions of mappings (of data dimensions or model properties) to synthesis parameters

are required; these decisions need to be informed by perceptual principles such as those

covered in chapter 2.

While building sonification designs may be technically simple, mapping choices are by no

means trivial. One aspect to consider is metaphorical proximity: Mappings that relate

closely to concepts in the scientific domain may well reduce cognitive load and thus

allow for better concentration on exploration tasks. (For a discussion of performance of

clearly defined tasks with ’intuitive’, ’okay’, random, and intentionally ’bad’ mappings,

see Walker and Kramer (1996), described in section 2.5.)

Another aspect is the clarity of the communicative function to be fulfilled in the research

context: What will a perceptible aspect of the sound serve as? Some possible categories

are:

analogic display of a data dimension - a value dimension mapped to a synthesis pa-

rameter which is straightforward to recognise and follow

a label identification for a stream - needed when several streams are heard in par-

allel

an indexing strategy - ordering the data by one dimension, then indexing into subsets

context information/orientation - mapping non-data; e.g. using clicks to represent

a time grid
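As an illustration of the last category, here is a minimal sketch (an addition, not taken from the workshop designs): a click marking each day boundary when a week of data is rendered into 3 seconds, i.e. a non-data time grid that could be mixed under a sonification.

( // 7 day boundaries per 3-second loop -> click rate of 7/3 Hz
{
    var loopdur = 3, daysPerLoop = 7;
    var clicks = Decay.ar(Impulse.ar(daysPerLoop / loopdur), 0.01);
    Pan2.ar(PinkNoise.ar * clicks * 0.3, 0)
}.play;
)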

Finally, it is essential to understand the resolution of perceptual dimensions, i.e. their
Just Noticeable Differences (JNDs). Note that sound process

parameters need not be directly perceptible; they may govern aspects of the sound that

will indirectly produce differences that may be described perceptually in other terms.

Perceptual tests can be integrated into the sonification design process, like writing tests

to verify that new code works as intended. Writing examples that test whether a specific

concept produces audible differences for the data differences of interest can provide


such immediate confirmatory feedback, as well as direct learning experience for the test

listeners immediately at hand. Such examples also provide a good base for discussions

with domain specialists.

Similar mapping decisions come up in the process of designing electronic or software-

based music instruments; how the ranges of sensor/controller inputs (the equivalent to

data to be sonified) are scaled into synthesis parameter ranges determines how playing

that instrument will feel to a performer.

5.3.1 Sonification strategies

The three most common concepts, Continuous Data Representation, Discrete Point

Data Representation, and Model-Based Data Representation, correspond closely to the

approaches described first in Scaletti (1994). The examples given again use the LoadFlow

dataset, and loosely follow the order given by Scaletti. Pauletto and Hunt (2004) briefly

describe how different data characteristics sound under different sonification methods:

Static areas, trends, single outliers, discontinuities, noisy sections, periodicities (loops),

or near-periodicities are simple characteristics that may occur in a single data dimension,

and will be used as examples of easily detectable phenomena.

The data for the code examples can be prepared as follows:

( // load data file

q = q ? (); // a dictionary to store things by name

// load tab-delimited data file:

q.text = TabFileReader.read( "LoadFlow.tab".resolveRelative, true, true );

// keep the 5 interesting channels, convert to numbers

q.data = q.text.drop(1).collect { |line| line[3..7].collect(_.asFloat) };

// load one data channel into a buffer on the server

q.buf1 = Buffer.loadCollection(s, q.data.flop[0]); // households

);
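Two optional sanity checks (an addition, not in the original) can confirm that the data arrived as expected:

// 672 data points are expected (one week at 15-minute intervals)
q.data.size.postln;
// per-channel minima and maxima; the TaDa analysis gives 0 - 2.24 for individual channels
q.data.flop.collect { |chan| [chan.minItem, chan.maxItem] }.postln;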

5.3.2 Continuous Data Representation

Audification is the simplest case of continuous data representation: Typically, converting

the numerical values of a long enough time series into a soundfile is a good first pass at

finding structures in the data. Scaletti (1994) calls this 0th order sonification. Scaling

the numerical values is straightforward, as one only needs to fit them into the legal range

for the type of soundfile to be used; for high precision, 32 bit floating point data can be

converted to sound file formats without any loss of information. For audification, one can

simply scale the (expected or actual) maximum and minimum values to the conventional

-1.0 to +1.0 range for audio signals at full level. This maps the data dimension under


study directly to the amplitude of the audible signal.
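A minimal sketch of this scaling step (an addition; q.bufNorm is a new name introduced here):

( // scale one data channel into the -1.0 .. +1.0 audio range and load it into a new buffer
var chan = q.data.flop[0]; // households
q.bufNorm = Buffer.loadCollection(s, chan.normalize(-1.0, 1.0));
)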

Making the playback rate user-adjustable allows for simple time-scaling: one can
change the expected gestalt size interactively. The fastest timescaling value will typically

be around 40-50 kHz, which includes the default sample rates of most common audio

hardware; this puts roughly 100.000 data points into working memory, which makes

audification the fastest option for screening large amounts of data with minimal prepro-

cessing.

Typical further operations to provide are: selection of an index range in the data, options

for looped and non-looped playback, and synchronised visual display of the waveform

under study. The EEGScreener described in chapter 9.1 is an example of a powerful,

flexible audification instrument.

Of the phenomena to be detected, static values will become silent: the human ear does

not hear absolute pressure values, and while audio hardware may output DC offsets,

loudspeakers do not render these as reproducible pressure offsets. Trends are also not

represented clearly: Ramp direction is not an audible property. Single outliers become

sharp clicks, and discontinuities (e.g. large steps) become loud pops. Rapidly fluctuating
sections will sound noisy, and periodicities will be easy to discern even if they are

only weak components in mixed signals.

Code examples for 0th order - audification.

p = ProxySpace.push; // prepare sound

~audif.play; // start an empty sound source

// play entire week once, within 0.05 seconds

~audif = {PlayBuf.ar(1, q.buf1, q.buf1.duration / 0.05) * 0.1 };

// try agriculture data

q.buf1.loadCollection(q.data.flop[2]); // the instance method takes the collection directly

// play the entire week looped

~audif = {PlayBuf.ar(1, q.buf1, q.buf1.duration / 0.05, loop: 1) * 0.1 };

The next example loops over an adjustable range of days; starting day within the week

and loop length can be set in days.

(

~audif = { |dur = 0.05, day=0, length=1|

var stepsPerDay = 96;

var start = day * stepsPerDay;

var rate = q.buf1.duration / dur;

// read position in the data buffer


var phase = Phasor.ar(1, rate, start, start + (length * stepsPerDay));

BufRd.ar(1, q.buf1, phase, interpolation: 4) * 0.1;

};

)

The next example loops a single day, and allows moving the day-long time window, thus navigating by mouse - this is the solution SBE Team B developed.

(

~audif = {

var start = MouseX.kr; // time in the week (0 - 1)

var range = BufFrames.kr(q.buf1); // full range is one week.

var rate = 1 / 10; // guess a usable rate

var phase = Phasor.ar(0, rate, 0, range / 7) + (start * range);

var out = BufRd.ar(1, q.buf1, phase, interpolation: 4);

out = LeakDC.ar(out * 0.5); // remove DC offset

};

)
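To make the audible signatures listed above concrete, a small synthetic test (an addition, not part of the workshop designs) concatenates a static area, a trend, an outlier, a step, a noisy section, and a periodicity into one buffer and audifies it:

( // build a synthetic signal containing the phenomena discussed above
var flat    = 0 ! 4000;                                          // static area -> silence
var trend   = Array.interpolation(4000, 0, 0.5);                 // -> hardly audible
var outlier = (0 ! 1999) ++ [0.9] ++ (0 ! 2000);                 // -> a sharp click
var step    = (0 ! 2000) ++ (0.5 ! 2000);                        // -> a pop
var noise   = { 0.3.rand2 } ! 4000;                              // -> a noisy burst
var period  = Array.fill(4000, { |i| sin(2pi * i / 50) * 0.3 }); // -> a clear tone
q.testbuf = Buffer.loadCollection(s, flat ++ trend ++ outlier ++ step ++ noise ++ period);
)
// audify the test signal, looped
~audiftest = { PlayBuf.ar(1, q.testbuf, 1, loop: 1) * 0.2 };
~audiftest.play;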

Parameter mapping continuous sonification, or what Scaletti calls 1st-order sonification,

maps data dimensions onto synthesis parameters that are directly audible, such as pitch,
amplitude (of a carrier signal), brightness, etc. Here, the simplest case would be mapping
to frequency (a synthesis parameter), or perceptually, pitch (a property of the rendered sound).

The first example maps the data range of 0 - 2.24 into pitch range of (midinote) 60 -

96, or frequencies between ca 260 and 2000 Hz, time-scaled into 3 seconds.

// loop a week’s equivalent of data

(

~maptopitch.play;

~maptopitch = { | loopdur = 3|

var datasignal = PlayBuf.ar(1, q.buf1, q.buf1.duration / loopdur, loop: 1);

var pitch = datasignal.linlin(0, 2.24, 60, 96); // scale into 3 octaves;

var sound = SinOsc.ar(pitch.midicps) * 0.2;

Pan2.ar(sound);

}

)

It may seem a little over-engineered here, but in general, it is a good idea to consider

what the smallest data variations of interest are, and whether they will be audible

in the mapping used.


While data for Just Noticeable Differences for some perceptual properties of sound exist

in the literature, their values will depend on the experimental context and circumstances.

Thus, rather than relying only on experiments which were conducted for other purposes,

it makes sense to do at least some perceptual tests for the intended usage context.

For the example given above, the data resolution is 0.01 units; scaling the range [0, 2.24]
into [60, 96] creates a minimum step of 0.01 * 36 / 2.24, or 0.16 semitones. The literature
agrees that humans are most sensitive to pitch variation when it occurs at a (vibrato) rate
of ca. 5 Hz, so a first test may use a pitch of 78 (the center of the chosen range), a
drift/variation rate of 5 Hz, and a variation depth of +-0.08 semitones; all of these can be
adjusted to find the marginal conditions where pitch variation is just noticeable.

(

~test.play;

~test = { |driftrate = 5, driftdepth = 0.08, centerpitch = 78|

var pitchdrift = LFNoise0.kr(driftrate, driftdepth);

SinOsc.ar( (centerpitch + pitchdrift).midicps) * 0.2

};

)

Changing driftrate, driftdepth and center pitch will give an impression of how this be-

haves; to my ears, 0.08 is in fact very near the edge of noticeability. One could sys-

tematically test this by setting drift depth to random start values above and below the

expected JND, and having test persons do e.g. forced choice tests that would converge

on the border for a given drift rate and center pitch.
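As a simple informal variant (a sketch only, not the forced-choice procedure just described), one can also step through a series of drift depths and note where the variation stops being audible; ~test is the proxy defined above:

(
Tdef(\jndSweep, {
    [0.5, 0.32, 0.16, 0.08, 0.04, 0.02, 0.01].do { |depth|
        "drift depth: % semitones".format(depth).postln;
        ~test.set(\driftdepth, depth);
        2.wait;
    };
    ~test.set(\driftdepth, 0.08); // back to the default value
}).play;
)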

The next example maps the same data values to amplitude, which could seem metaphorically closer - the data value is consumed energy, and amplitude is directly correlated to acoustical energy. However, the rendering is perceptually not very clear: humans are good at filling in dropouts in audio signals, such as speech phonemes masked in noisy environments, or damaged by bad audio connections, such as intermittent telephone lines. The patterns that emerged in the pitch example, where the last three days are clearly different, almost disappear. Changing to linear mapping instead of exponential makes little difference.

(

~maptoamp.play;

~maptoamp = { | loopdur = 3|

var datasignal = PlayBuf.ar(1, q.buf1, q.buf1.duration / loopdur, loop: 1);

var amp = datasignal.linlin(0, 2.24, -60, -10).dbamp;

// var amp = datasignal * 0.2; // linear mapping

var sound = SinOsc.ar(300) * amp; // use the mapped amplitude

Pan2.ar(sound);

}


)

The next example shows what Scaletti calls a second-order mapping. The data are mapped to a parameter that controls another parameter, phase modulation depth; however, perceptually this translates roughly to brightness (which could be considered a first-order audible property).

(

~maptomod.play;

~maptomod = { | loopdur = 3|

var datasignal = PlayBuf.ar(1, q.buf1, q.buf1.duration / loopdur, loop: 1);

var modulator = SinOsc.ar(300) * datasignal * 2;

var sound = SinOsc.ar(300, modulator) * 0.2;

Pan2.ar(sound);

}

)

5.3.3 Discrete Data Representation

As an alternative to creating continuous signals based on data dimensions, one can

also create streams of events, which may sound note-like when slower than ca. 20 events

per second; at higher rates, they can best be described with Microsound terminology, as

granular synthesis.

The example below demonstrates the simplest case: one creates one synthesis event for

each data point, with a single data dimension mapped to one parameter. A duration of

3 seconds will create a continuous-seeming stream; 10 seconds will sound like very fast

grains, while 30 seconds takes the density down to 22.4 events per second, which can

seem like very fast marimba-like sounds.

(

~grain.play;

~grain = { |pitch=60, pan|

var sound = SinOsc.ar(pitch.midicps);

var envelope = EnvGen.kr(Env.perc(0.005, 0.03, 0.2), doneAction: 2);

Pan2.ar(sound * envelope, pan)

};

// ~grain.spawn([\pitch, 79]);

Tdef(\data, {

var duration = 10;

var datachannel = q.data;

var power;


q.data.do { |chans|

power = chans[0]; // households;

~grain.spawn([\pitch, power.linlin(0, 2.24, 60, 96)]);

(duration / datachannel.size).wait;

};

}).play;

)

5.3.4 Parallel streams

When the dimensions in a data set are directly comparable (like here, where they are all

power consumption measured in the same units at the same time instants), it is concep-

tually convincing to render them as parallel streams. Auditory streams, as discussed in

Bregman (1990) and Snyder (2000), are a perceptual concept: a stream is formed when

auditory events are grouped together perceptually, and multiple streams can form when

all the auditory events separate into several groups.

With the example above, a minimal change can be made to create two parallel streams:

Instead of creating one sound event for one data dimension, one creates two, and pans

them left and right for separating the two streams by spatial location.

(

Tdef(\data, {

var duration = 10;

var datachannel = q.data;

var powerHouse, powerAgri;

~grain.play;

q.data.do { |chans|

powerHouse = chans[0];

powerAgri = chans[2];

~grain.spawn([\pitch, powerHouse.linlin(0, 2.24, 60, 96), \pan, -1]);

~grain.spawn([\pitch, powerAgri.linlin(0, 2.24, 60, 96), \pan, 1]);

(duration / datachannel.size).wait;

};

}).play;

)

When presenting several data dimensions simultaneously, one can obviously map them

to multiple parameters of a single synthesis process, thus creating one stream with

multiparametric controls. This makes the individual events fairly complex, and may

require that each event has more time to unfold perceptually. (In the piece Navegar, a

fairly complex mapping is used, see section 11.3.)
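A minimal sketch of such a multiparametric single stream (an assumption for illustration, not the Navegar mapping; ~mgrain is a new name introduced here): households control pitch, agriculture controls brightness, and heating/warm water controls pan position, all within one stream of grains.

(
~mgrain = { |pitch = 60, bright = 0, pan = 0|
    var sound = Blip.ar(pitch.midicps, bright * 8 + 1); // bright -> number of harmonics
    var envelope = EnvGen.kr(Env.perc(0.005, 0.03, 0.2), doneAction: 2);
    Pan2.ar(sound * envelope, pan)
};
~mgrain.play;
Tdef(\multi, {
    var duration = 10;
    q.data.do { |chans|
        ~mgrain.spawn([
            \pitch,  chans[0].linlin(0, 2.24, 60, 96),  // households
            \bright, chans[2].linlin(0, 2.24, 0, 1),    // agriculture
            \pan,    chans[3].linlin(0, 2.24, -1, 1)    // heating and warm water
        ]);
        (duration / q.data.size).wait;
    };
}).play;
)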


It should be noted that what is technically created as one stream of sound events is not

guaranteed to fuse into one perceptual stream - it may split into several layers, just like

separately created multiple streams may perceptually merge into a single auditory stream.

In fact, as perception is strongly influenced by a listener’s attitude, one can intentionally

choose analytic or holistic listening attitudes; either focusing on details of rather few

streams, or listening to the overall flow of the unfolding soundscape - whether it is a

piece of music or a sonification.

5.3.5 Model Based Sonification

In Model Based Sonification (Hermann and Ritter (1999)), the general concept is that the

data values are not mapped directly, but inform the state of a model; properties of that

model (which is a kind of front-end) are then accessed when user input demands it (e.g.

by exciting the model with energy, somewhat akin to playing a musical instrument). The

model properties then determine how the sound engine renders the current user input;

this backend inevitably contains some mapping decisions to which the considerations

given here can be applied.

Till Bovermann’s example implementation of the Data Sonogram (Bovermann (2005)) is

a good compact example for MBS. The approach is to treat the data values as points in

n-dimensional space (n = 4 for the example Iris data set); then user input triggers a circular

energy wave propagating from a current user-determined position, and the reflections of

each data point are simulated by mapping distance (in 4D space) to amplitude and delay

time, as if in natural 3D space. The other parameters for the sound grains (frequency,

number of harmonics) are also determined by data based mappings.

The Wahlgesänge sonification based on this example uses a somewhat more elaborate

mapping: Distance in 2D is mapped to delay and amplitude, with user-tunable scaling;

panning is determined by 2D circular coordinates; the data value of interest (voter

percentage) is mapped to the sound grain parameter pitch, and controls for attack/decay

times make the tradeoff between auditory pitch resolution and time resolution explicit.

Both of these examples are too extended for the context here; but they are both available

online, and Wahlgesange is described in more detail in section 6.2.
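As a rough illustration of the shared mapping idea (a minimal sketch, not Bovermann’s or the Wahlgesange implementation), an excitation of this kind can be written in a few lines of SC3. It assumes that q.data holds the rows of an n-dimensional data set scaled to the range 0..1; the helper name ~exciteAt, the scaling constants, and the \amp grain parameter are illustrative assumptions, and ~grain is the grain player used in the examples above.

(
~exciteAt = { |center, speed = 10|	// center: a user-chosen point in data space
	q.data.do { |point|
		var dist = (point - center).squared.sum.sqrt;	// euclidean distance in data space
		var delay = dist / speed;	// distance determines when the wave reaches the point
		var amp = (1 / (1 + (dist * 4))).min(1);	// and how loud its reflection is
		SystemClock.sched(delay, {
			~grain.spawn([\pitch, point[0].linlin(0, 1, 60, 96), \amp, amp]);
			nil	// do not reschedule
		});
	};
};
)
// e.g. ~exciteAt.value([0.3, 0.5, 0.2, 0.7]);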

While it would be worthwhile to analyse more MBS examples in detail, this is beyond

the scope of the present thesis. Further research will be necessary for a more fine-grained

integration of the model-based approach into the context of the sonification models

given here.


5.4 User, task, interaction models

Humans experience the world with all their senses, and interacting with objects in the

world is the most common everyday activity they are well trained at. For example,

handling physical objects may change their visual appearance, and touching, tapping or

shaking them may produce acoustic responses they can use to learn about objects of

interest. Perception of the world, action in it, and learning are tightly linked in human

experience, as discussed in section 2.3.

In artificial systems that model aspects of the world, from office software to multimodal

display systems, or sonification systems in particular, interaction crucially determines how

users experience such a system: whether they can achieve tasks correctly (effectiveness)

with it, whether they can do so in reasonable amounts of time (efficiency), and whether

they enjoy the working process (positive user experience, pleasantness).

This section looks at potential usage situations of sonification designs and systems: the

people working in these contexts (’sonification users’); the goals they will want to pursue

by means of (or supported by) sonification; the kinds of tasks entailed in pursuing these

goals; the kinds of interfaces and/or devices that may be useful for these goals; and

some notions of how to go about matching all of these.

5.4.1 Background - related disciplines

Interaction is a field where a number of disciplines come into play:

Human Computer Interaction (HCI) studies the alternatives for communication between

humans and computers (from translating user actions into input for a computer system

to rendering computer state into output for the human senses), sometimes to amazing

depths of detail and variety (Buxton et al. (2008); Dix et al. (2004); Raskin (2000)).

Musical instruments are highly developed physical interfaces for creating finely differ-

entiated acoustic sound, with a very long tradition; in electronic music performance,

achieving similar degrees of control flexibility (or better, control intimacy) has long been

desirable. While the mainstream music industry has focused on a rather restricted set

of protocols (MIDI) and devices (mostly piano-like keyboards, simulated tape machines,

and mixing desks), experimental interfaces that allow very specific, sometimes idiosyn-

cratic ideas of musical control have been an interesting source of challenges for engineers. The research done at institutions like STEIM17 (see Ryan (1991)) and CNMAT18 (Wessel (2006)) has made interface and instrument design its own computer music sub-discipline, with its own conference (NIME19, or ’New Instruments/Interfaces for Musical Expression’, since 2001).

17 http://www.steim.nl
18 http://cnmat.berkeley.edu
19 http://www.nime.org

Computer game controllers tend to be highly ergonomic and very affordable; thus they

have become a popular resource for artistic (re-)appropriation as cheap and expressive

music controllers: Gamepads, and more recently, Wii controllers, have both been adopted

as is, and creatively rewired for specialised artistic uses.

This has been part of an emerging movement toward more democratic electronic devices: Beginning with precursors like Circuit Bending (Ghazala (2005)) - extending the design of sound devices by introducing controlled options for what engineers might consider malfunction - designers have created open-source hardware, such as the Arduino microcontroller board20, to simplify experimentation with electronic devices. With these developments, finding ways to create meaningful connections and new usage contexts for object-oriented hardware (Igoe (2007)) has become interesting for a much larger public than strictly electronics engineers and tinkerers.

20 http://www.arduino.cc

5.4.2 Music interfaces and musical instruments

CD/DVD players or MP3 players tend to have rather simple interfaces: play the current

piece, make it louder or softer, go to the next or previous track, use randomised or

ordered playback of tracks.

A piano has a simple interface for playing single notes: one key per note, ordered

systematically, and hitting the key with more energy will make it louder. Thus, beginners

can experience rather fast success at finding simple melodies on this instrument. Playing

polyphonic music really well on piano is a different matter; as Mick Goodrick puts it, in

music there is room for infinite refinement (Goodrick (1987)).

On a violin, learning to produce good tone already takes a lot of practice; and playing in

tune (for whichever musical culture one is in) requires at least as much practice again.

(One is reminded of the joke where a neighbour asks, ”why can’t your children spend

more time practicing later, when they can already play better?”)

Instruments from non-western cultures may provide interesting challenges: the nose flute is a good example of an instrument whose playing involves coordinating unusual combinations of body parts, and thus develops (in Western contexts) rather unique skills.

However, a violin allows very subtle physical interaction with musical sound while it is

sounding, and in fact requires that skill for playing expressively. On piano, each note

sounds by itself once it has been struck, thus the relations between keys pressed, such

as chord balance, micro-timing between notes, and agogics are the main strategies for

playing expressively on the piano.

In Electronic Music performance, mappings between user actions as registered by con-


trollers (input devices like the ones HCI studies, buttons, sliders, velocity-sensitive keys,

sensors for pressure, flexing, spatial position etc.) and the resulting sounds and musical

structures are essentially arbitrary - there are no physical constraints as in physical instru-

ments. Designing satisfying personal instruments with digital technology is an interesting

research topic in music and media art; e.g. Armstrong (2006) bases his approach on

a deep philosophical background, and discusses his example instrument in these terms;

Jorda Puig (2005) provides much historical context of electronic instruments, and dis-

cusses an array of his own developments in that light. Thor Magnusson’s (and others’)

ongoing work with ixi software21 explores applying intentional constraints to interfaces

for creating music in interesting ways.

21 http://www.ixi-software.net/

5.4.3 Interactive sonification

The main researchers who have been raising awareness for interaction in sonification

are Thomas Hermann and Andy Hunt, who started the series of Interactive Sonification

workshops, or ISon22. In the introduction to a special issue of IEEE Multimedia resulting

from ISon2004, the editors give the following definition:

”We define interactive sonification as the use of sound within a tightly closed human-

computer interface where the auditory signal provides information about data under

analysis, or about the interaction itself, which is useful for refining the activity.” (Her-

mann and Hunt (2005), p 20)

22 http://interactive-sonification.org/

In keeping with Hermann’s initial descriptions of Model-Based Sonification (Hermann

(2002)), they maintain that learning to ’play’ a sonification design with physical inter-

action, as with a musical instrument, really helps users acquire an understanding of the

nature of the perceptualisation processes involved and of the data to be explored. They

find that there is not enough research on how learning in interactive contexts actually

occurs.

The Neuroinformatics group at University Bielefeld (Hermann’s research group) has

studied a number of very interactive interfaces in sonification contexts: recognizing

hand postures to control data exploration (Hermann et al. (2002)), a malleable surface

for interaction with model-based sonifications (Milczynski et al. (2006)), tangible data

scanning using a physical object to control movement in model space (Bovermann et al.

(2006)), and others.

At University of York, in Music Technology, Hunt has studied both musical interface

design issues (e.g. Hunt et al. (2003)) and worked on a number of sonification projects,

mainly with Sandra Pauletto (e.g. Hunt and Pauletto (2006)). Pauletto’s PhD thesis,

Interactive non-speech auditory display of multivariate data (Pauletto (2007)), discusses


interaction and sonification in great detail (pp. 56-67), and studies central sonification

issues with user experiments: The first two experiments compare listening to auditory

displays of data (audifications of helicopter flight data, sonifications of EMG (elec-

tromyography) data) with their traditional analysis methods (visually reading spectra,

signal processing analysis). In both cases, auditory display of large multi-variate data

sets turned out to be an effective choice of display.

Her third experiment directly studies the role of interaction in sonification: Three al-

ternative interaction methods are provided for exploring synthetic data sets to locate a

given set of structures. A low interaction method allows selection of data range, play-

back speed, and play/stop commands. For the medium interaction method, a jog wheel

and shuttle control is used to navigate the sonification at different speeds and directions. The

high interaction method lets the analyst navigate by moving the mouse over a screen

area that corresponds to the data, like tape scrubbing.

Both objective measurements and subjective ratings found the low interaction method less effective and less efficient than the two higher interaction modes. Interestingly, users preferred the medium interaction mode for its option to quickly set the sonification parameters and then let it play while concentrating on listening; the high interaction

method requires constant user activity to keep the sound going. It should be noted here

that these results strictly apply only to the specific methods studied, and cannot be

generalised; however, they do provide interesting background.

5.4.4 ”The Humane Interface” and sonification

The field of Human Computer Interaction (HCI) is very wide and diverse, and cannot

be covered here in depth. However, a rather specialised look at some examples of inter-

faces may suffice to provide enough context for discussing the main issues in designing

sonification interfaces.

Rather than attempting to cover the entire field, I will take a strong position statement

by an expert in the field as a starting point: Jef Raskin was responsible for the Macintosh

Human Interface design guidelines that set the de facto standard for best practice in HCI

for a long time, and his book ”The Humane Interface” (Raskin (2000)) is an interesting

mix between best practice patterns and rather provocative ideas.

Here is a brief overview of the main statements by chapter:

1. Background - The central criterion for interfaces is the quality of the interaction; it

should be made as humane as possible. Humane means responsive to human needs, and

considerate of human frailties. As one example, the user should always determine the

pace of interaction.

2. Cognetics - Human beings only have a single locus (or focus) of attention, which in interactions with machines is nearly always on the task they try


to achieve.23 Computers and interfaces should not distract users from their intentions.

Human beings always tend to form habits; user interfaces should allow the formation of

good habits, as through benign habituation competence becomes automatic. A possible

measure of how well an interface supports benign habituation is to imagine whether a

blind user can learn it.

As a more general point, humans mostly use computers to get work done; here, user

work is sacred, and user time is sacred.

3. Modes - Modes are system states where the same user gesture can have different

effects, and are generally undesirable; one should eliminate modes where possible. The

exception to the rule is physically maintained modes, which he calls quasi-modes (entered

e.g. by holding down a special key, and reverted to normal when the key is released).

Visible affordances should provide strong clues as to their operations. If modes cannot be

entirely avoided, monotonic behaviour is the next best solution: a single gesture always

causes the same single operation; and in a mode where the operation is not meaningful

the gesture should do nothing. It is worth keeping in mind that everyone is both expert

and novice at the same time when different aspects of a system are considered.

4. Quantification - Interface efficiency can be measured, e.g. with the GOMS Keystroke

model. For most cases, ’back of the envelope’ calculations give a good first indication

of efficiency; standard times for hitting a key, pointing by mouse, moving from mouse to

keyboard, and mentally preparing an action are sufficient for that. Finding the minimum

combination for a given task is likely to make that task more pleasant to perform.

Obviously, the time a user is kept waiting for software to respond should be as low as

possible; while a user is busy with other things, s/he will not notice waiting times.
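As a rough illustration of such a back-of-the-envelope estimate (this sketch is not from Raskin’s book; the operator times used are the commonly cited approximate values), a keystroke-level estimate takes only a few lines of SC3:

(
// approximate operator times in seconds:
// k = keystroke, p = point with the mouse, h = move hand between mouse and keyboard, m = mental preparation
~klmTimes = (k: 0.2, p: 1.1, h: 0.4, m: 1.35);
~klmEstimate = { |ops| ops.collect { |op| ~klmTimes[op] }.sum };

// e.g. selecting a menu entry by mouse: prepare mentally, reach for the mouse, point, click
~klmEstimate.value([\m, \h, \p, \k]).postln;	// -> 3.05 seconds
)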

5. Unification - This chapter ranges far beyond the scope needed here, eventually making

a case that operating systems and applications should disappear entirely. Fundamental

actions are catalogued, and variants of computer-wide string search are discussed as one

example of how system-wide unified behaviour should work.

6. Navigation - The adjectives ’intuitive’ and ’natural’ when used for interfaces generally

translate to ’familiar’. Navigation, as with the ZoomWorld approach might be interesting

for organising larger collections of sonification designs; for the context of the SonEnvir

project these ideas were not applicable.

7. Interface issues outside the user interface - Programming Environments are notoriously

bad interfaces, and actually have been getting worse: On a 1984 computer, starting up,

running Basic, and typing a line to evaluate 3 + 4 may be accomplished in maybe 30

seconds; on a current (2000) computer, every one of these steps takes much longer,

even for expert users.

23 Raskin’s motto for the chapter is a quote from a character in the TV series Northern Exposure,

Chris: ”I can’t think of X if I’m thinking of Y.”


Relevance to sonification and the SonEnvir project

The most closely related notion to disappearing system software (chapter 5) is the

Smalltalk heritage of SC3. Smalltalk folklore says that ’when Smalltalk was a little

girl, she thought she was an operating system’ - one could do almost everything within

Smalltalk, including one of Raskin’s major desirables, namely, defining new operations by

text at any time, which change or extend the ways things work in a given environment.

The question of what ’user content’ is actually being created is extremely important in

sonification work: In sonification usage situations, ’content to keep’ can comprise uses of

a particular data file, particular settings of the sonification design, perceptual phenomena

observed with these data and settings, and text documentation, i.e., descriptions of all

of the above and possibly user actions to take to cause certain phenomena to emerge.

The text editor and code interface in SC3 is well suited for this: commands to invoke

a sonification design (e.g. written as a class), code to access a specific data file, and

notes of observations can be kept in a single document, as they are all just text. Across

different sonification designs, SC3 behaves uniformly in this respect.

Compared to most programming environments, the SC3 environment allows very fluid

working styles. Documentation within program creation (literate programming, as Don-

ald Knuth called it) is supported directly.
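A hypothetical session document along these lines could look as follows; the class name SomeSonificationDesign, the data file path, and the play method are placeholders rather than actual SonEnvir code:

// --- sonification session document (everything is just text) ---

// invoke the sonification design (assumed to be written as a class)
x = SomeSonificationDesign.new;

// access a specific data file (tab-separated numbers; path is a placeholder)
q = q ? ();
q.data = TabFileReader.read("data/power_consumption.txt".standardizePath, true)
	.collect { |row| row.collect(_.asFloat) };

// play the design on these data
x.play(q.data);

/* notes: perceptual phenomena observed with this data file and
   these settings can be documented here, next to the code. */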

5.4.5 Goals, tasks, skills, context

From a pragmatic point of view, a number of compromises need to be balanced in

interaction design, especially when it is just one of several aspects to be negotiated:

• Simple designs are quicker to implement, test and improve than more complex

designs. Given that one usually understands requirements much better by imple-

menting and discussing sketches, simpler designs will often be better.

• Exotic devices can be very interesting; however, they limit transferability to other

users, and will require extra costs and development time. Even when there is a

strong reason to use a special interface device, including a fallback variant with

standard UI devices is recommended.

• Functions should be clearly made available to the users; usually that means making

them visible affordances. (Buxton et al. (2008) argues here that the attitude ’you can do that already’ - in some arcane way that only experts may know about - means that final users will in practice not use that implemented function.)

Goals are firmly grounded in the application domain, and with the users. What do users

want to achieve with the sonification design to be created? The goals will naturally


be different for different domains, datasets, and contexts (e.g. research prototypes or

applications for professional use); nevertheless these examples may apply to most designs:

• experience the differences between comparable datasets of a given kind

• find phenomena of interest within a given dataset, e.g. at specific locations, with

specific settings

• document such phenomena and their context, as they may become findings

• make situations in which phenomena of interest occurred repeatable for other users

The interaction design of a sonification design should allow the user’s focus of attention

to remain at least close to these top-level goals. Ideally, the design should add as little

cognitive load as possible for the user, to keep her attention free for the goals.

The sonification design’s interface should offer ways to achieve all necessary and useful

actions toward achieving these goals. The concepts for these actions should obviously

be formulated in terms of the mental model the user has of the data and the domain

they come from.

Tasks comprise all the actions users take to achieve their top-level goals. Tasks can

be directly functional for attaining a goal, or necessary to change the system’s state

such that a desired function becomes available. Systems that often require complicated

preparation to get things done tend to distract users from their goals, and are thus

experienced as frustrating. Some example tasks that come up when using a sonification

design are:

• load a sonification design of choice (out of several available)

• load a dataset to explore

• start the sonification

• compare with different datasets

• tune the sonification design while playing

• explore different regions of a dataset by moving through them

• look up documentation of the sonification design details

• start, repeat, stop sonification of different sections

• store a current context: a dataset, current selection, current sonification parameter

settings, and accompanying text/explanation.


For all these tasks, there should be visible affordances that communicate to the user how

the related tasks can be done. Ideally, a single task should be experienced as one clear

sequence of individual actions (or subtasks).

More complex tasks will be composed of a sequence of subtasks. As novice users ac-

quire more expertise, they will form conceptual chunks of these operations that belong

together. As long as these subtasks require meaningful decisions, it is preferable to keep

them separate; if there is only a single course of action, one should consider making it

available as a single task.

Skills are what users need to have or acquire to use an interface efficiently. These can

include physical skills like manual dexterity, knowledge of operating systems, and other

skills. In the HCI literature, two conflicting viewpoints can be found here: a. users

already possess skills that should be re-used; one should add as little learning load as

possible, and enable as many users as possible to use a design quickly; b. interfaces

should allow for long-term improvement, and enable motivated users to learn to do

very complex things very elegantly eventually. Which of these applies will depend on the

context the sonification is designed for; in any case it is advisable to consider well what

one is expecting of users in terms of learning load.

Some necessary knowledge / skills include:

• locating files (e.g. program files, data files)

• reading documentation files

• selecting and executing program text

• using program shortcuts (e.g. start, stop)

• using input devices like mice, trackballs, tablets

Context should be represented clearly to reduce cognitive load: all changeable settings - such as the choice of data file, sonification parameter settings, current subset choice, and others - should be visible, e.g. on a graphical user interface. Often, the display elements for

these can double as affordances that invite experimentation. In some cases, it can be

useful to display the current data values graphically, or to double auditory events visually

as they occur in realtime playback.

5.4.6 Two examples

EEG players

In the course of the SonEnvir project, most of the interaction design was done in collab-

orative sessions. One exception that required more formal procedures was redesigning


the EEG Screener and Realtime Player (discussed in depth in chapter 9.1), as the in-

tended expert users were not available for direct discussion. These designs went through

a full design revision, with a task analysis that is identical for most of the interface.

The informal ’wish list’ included: Simple to use, start in very few steps, low effort, keep

results reproducible; include a small example data file that can be played directly.

The task analysis comprised these items:

Goals:

• quickly screen large EEG data files to find episodes to look at in detail

Tasks:

1. locate and load EEG files in edf format

2. select which EEG electrode channels will be audible

3. select data range to playback:

which time segment within file

speedup factor, filtering

4. play control: play, stop, pause, loop;

feedback current location

5. document current state so it can be reproduced by others

6. include online documentation in German

7. later: prepare example files for different absences

All of these were addressed with the GUI shown in figure 9.2: 1. File selection is done

with a ’Load EDF’ button and regular system file dialog; for faster access, the edf file is

converted to soundfiles in the background, and feedback is given when ready.

2. Initially, this was only planned with popup menus and the electrode names; however,

making a symbolic map of the electrode positions on the head and letting users drag-

and-drop electrode labels to the listening locations (see figure 9.3) proved to be much

appreciated by the users.

3. Time range within the file was realised in multiple ways: graphical selection within a

soundfile view showing the entire file; providing the start time, duration, and end time as

adjustable number boxes; and showing the selected time segment in a magnified second

soundfile view. This largely follows sound editor designs, which EEG experts are typically

not familiar with.


4. Play controls are implemented as buttons; play state is shown by button color (white

font indicates the active state) and by a moving cursor in both soundfile views. The cursor’s location is also given numerically. Looping and filtering are also controlled by buttons; in looped

mode, a click plays when the loop point is crossed. In filter mode, the volume controls

for the individual bands are enabled. When filtering is off, these controls are disabled for

clarity.

Adjustable playback parameters are all available as named sliders, with the exact numer-

ical values and units. (Recommended presets for different usage situations were planned,

but were eventually not realised.)

5. The current state can be documented with buttons and shortcuts: The ’Take Notes’

button opens a text window, which contains the current filename; the current time and

playback settings can be pasted into it, so they can be reconstructed later.

6. The ’Help’ button opens a detailed help page in German.

The EEG Realtime Player re-uses this design with minimal extensions, as shown in figure

9.5; this reduces learning time for both designs, which are intended for the same group

of users. The main differences are the use of different time units (seconds instead of

minutes) and more parameter controls, as the synthesis concept is more elaborate.

Wahlgesange

This design is described in detail in section 6.2; its GUI is shown in figure 6.5. As this

design follows a Model-Based concept, the realtime interaction mode is central:

Goals:

• compare geographical distribution of voters for ca. 12 parties in four elections in

a region of Austria.

Tasks:

1. switch between a fixed range of elections and parties to explore

2. ’inject energy’ by interaction to excite the model at a visually chosen location

3. compare parties and elections quickly one after another

4. adjust free sonification parameters like timescale

1. Choosing which election and party results to explore is done with two groups of

buttons which show all available choices. The currently active button has a white font.

2. As common in Model-Based Sonification, this design requires much more interaction:

to obtain sound, users must click on the geographical map. This causes a circular wave


to emerge from the given location, which spreads over the entire extent of the map.

Each data point is indexed by spatial location on the map; when the expanding wave

hits it, a sound is played based on its value for the current data channel (the voter percentage

for one of the parties).

3. For faster comparisons, switching to a new election or party plays the sonification

for the new choice with the last spatial location; switching between parties can also be

done by typing reasonably mnemonic characters as shortcuts.

4. The free sonification parameters like expansion speed of the wave, number of data

points to play (to reduce spatial range), etc., can be adjusted with sliders which also

show the precise numerical values.

Full explanations are given in a long documentation section before the program code,

which was deemed sufficient at the time.

An interesting possible extension here would be the use of a graphical tablet to obtain a

pressure value when clicking on the map; this would be equivalent to a velocity-sensitive

MIDI keyboard. However, in the interest of easier transfer to other users, we preferred

to keep the design independent of specific non-standard input devices.

5.5 Spatialisation Model

The most immediate quality of a sound event is its localization: What direction did

that sound come from? Is it near or far away? We often spontaneously turn toward an

unexpected sound, even if we were not paying attention earlier.

Spatial direction is also one of the stronger cues for stream separation or fusion (Breg-

man (1990), Snyder (2000), Moore (2004)); when sound events come from different

directions, they are unlikely to be attributed to the same physical source.

Music technology has developed a variety of solutions for spatialising synthesized sound,

and both SuperCollider3 and the SonEnvir software environment support multiple ap-

proaches for different source characteristics, and different reproduction setups.

Sources can either be continuous or short-term single events; while continuous sources

may have fixed or moving spatial positions, streams of individual events may have dif-

ferent spatial positions for each event. In effect, giving each individual sound event its

own static position in space is a granular approach to spatialisation.

(1D) Stereo rendering over loudspeakers works well for few parallel streams, where

spatial location mainly serves to identify and disambiguate streams. The most common

spatialisation method employed is amplitude panning, which relies on the illusion of

phantom sound sources created between a pair of loudspeakers, with the perceived

position depending on the ratio of signal levels between the two speakers. Panorama


potentiometers (pan pots) on mixing desks employ this method. Sound localisation on

such setups is of course compromised at listening positions outside the sweet spot.

(2D) Few channel rendering is typically done with horizontal rings of 4 - 8 speakers.

This has become easier in recent years with 5.1 (by now, up to 7.1) home audio

systems, which can be used with external input from multichannel audio interfaces. Such

systems can spatialize sources on the horizontal plane quite well, and can be used as up

to 7 static physical point sources as well.

(3D) Multichannel systems, such as the CUBE at IEM Graz with 24 speakers, or the An-

imax Multimedia Theater in Bonn with 40 speakers, are usually designed for symmetry,

spreading a number of loudspeakers reasonably evenly on the surface of a sphere. This

allows for good localisation of sources on the sphere, with common spatialisation ap-

proaches including vector based panning, Ambisonics, and Wave Field Synthesis. Source

distances outside the sphere can be simulated well by reducing the level of the direct

sound relative to the reverb signal, and lowpass filtering it.

(1D/3D) Headphones are a special case: they can be used to listen to stereo mixes

for loudspeakers (and most listeners today are well trained at localising sounds with

this kind of spatial information); and they can be used for binaural rendering, i.e. sound

environments that feature the cues which allow for sound localisation in ’normal’ auditory

perception. For music, this may be done with dummy head recordings; for auditory

display, this is done with simulations of these cues applied to all the sound sources

individually to create their spatial characteristics.

5.5.1 Speaker-based sound rendering

Physical sources

For multiple speaker setups, a simple and very effective strategy is to use individual

speakers as real physical sources. The main advantage is that physics really helps in this

case; when locations only serve to identify streams, as with few fixed sources, fixed single

speakers work very well.

Amplitude Panning

The most thorough overview on amplitude panning methods is provided in Pulkki (2001).

Note that all of the following methods work for both moving and static sources. Code

examples for all these are given in Appendix B.1.

1D: In the simplest case of panning between two speakers, equal power stereo panning

is the standard method.
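A minimal SC3 sketch of this case (kept simpler than the fuller examples in Appendix B.1): one source panned with equal power between the two channels.

(
{
	var src = PinkNoise.ar(0.2);
	var pos = MouseX.kr(-1, 1);	// -1 = hard left, 1 = hard right
	Pan2.ar(src, pos)	// equal power panning to two channels
}.play;
)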

2D: The most common case here is panning to a horizontal, symmetrical ring of n


speakers by controlling azimuth; in many implementations, the width over how many

speakers (at most) the energy is distributed can be adjusted.
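A corresponding minimal sketch for a ring of 8 speakers (again simpler than the Appendix B.1 examples): PanAz pans by azimuth, and its width argument sets over how many adjacent speakers the energy is spread.

(
{
	var src = PinkNoise.ar(0.2);
	var azimuth = LFSaw.kr(0.1);	// one revolution every 10 seconds, range -1..1
	PanAz.ar(8, src, azimuth, width: 2)	// 8-channel ring, energy over ca. 2 speakers
}.play;
)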

In case the angles along the ring are not symmetrical, adjustments can be made by

remapping, e.g. with a simple breakpoint lookup strategy. However, using the best

geometrical symmetry attainable is always superior to compensation for asymmetries.

Often it is necessary to mix multiple single-channel sources down to stereo: The most

common technique for this is to create an array of pan positions (e.g. n steps from 80%

left to 80% right), to pan every single channel to its own stereo position, and summing

these stereo signals.
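A minimal sketch of this mixdown; the number of sources and the 80% positions are arbitrary choices for illustration. (The Splay UGen packages the same spread-and-sum idea.)

(
{
	var n = 6;
	var sources = { |i| SinOsc.ar(300 * (i + 1), 0, 0.05) } ! n;	// n mono test sources
	var positions = Array.interpolation(n, -0.8, 0.8);	// n pan positions from 80% left to 80% right
	Mix(sources.collect { |src, i| Pan2.ar(src, positions[i]) })	// sum the stereo signals
}.play;
)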

Mixing multiple channel sources into a ring of speakers can be done the same way; the

array of positions then corresponds to (potentially compensated) equal angular distances

around the ring. Both larger numbers of channels can be panned into rings of fewer

speakers, and vice versa.

3D: For simple geometrical arrangements of speakers, straightforward extensions of am-

plitude panning will suffice. E.g. the CUBE setup at IEM consists of rings of 12,

8, and 4 speakers (bottom, middle, top); the setup at Animax Multimedia Theater in

Bonn adds a bottom ring of 16 speakers. For these systems, having 2 panning axes, one

between the rings for elevation, and one for azimuth in each ring, works well.
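A minimal sketch of this two-axis idea, simplified to two rings of 8 speakers (not the actual CUBE or Animax geometry): azimuth is panned within each ring, and elevation crossfades between the rings with equal power.

(
{
	var src = PinkNoise.ar(0.2);
	var azimuth = MouseX.kr(-1, 1);
	var elevation = MouseY.kr(0, 1);	// 0 = lower ring only, 1 = upper ring only
	var lower = PanAz.ar(8, src * (1 - elevation).sqrt, azimuth);
	var upper = PanAz.ar(8, src * elevation.sqrt, azimuth);
	lower ++ upper	// 16 output channels: lower ring first, then upper ring
}.play;
)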

Again, the speaker setup should be as symmetrical as possible; compensation can be

trickier here. Generally speaking, even while compensations for less symmetrical se-

tups are mathematically plausible, spatial images will be worse outside the sweet spot.

Maximum attainable physical symmetry cannot be fully substituted by more DSP math.

Compensating overall vertical ring angles and individual horizontal speaker angles within

each ring is straightforward with the remapping method described above. For placement

deviations that are both horizontal and vertical, using fuller implementations of Vector

Based Amplitude Panning (VBAP, see e.g. Pulkki (2001)) is recommended24; however,

this was not required within the context of the SonEnvir project, or this dissertation.


Ambisonics

Ambisonics is a multichannel reproduction system developed independently by several

researchers in the 1970s (Cooper and Shiga (1972); Gerzon (1977a,b)), based on the idea that spherical harmonics can be used to encode and decode directions from which sound energy comes; a good basic introduction to Ambisonics math is online here25.

24 VBAP has been implemented for SC3 in 2007 by Scott Wilson and colleagues, see http://scottwilson.ca/site/Software.html
25 http://www.york.ac.uk/inst/mustech/3d audio/ambis2.htm


The simplest form of Ambisonics, first order, can be considered an extension of the

classic Blumlein MS stereo microphone technique: in MS, one uses an omnidirectional

microphone as a center channel (M for Mid), and a figure-of-8 mike to create a Side

signal (S). By adding or subtracting the side signal from the center, one obtains Left

and Right signals; e.g. L = M-S, R = M+S. By using figure-of-8 mikes for Left/Right,

Front/Back, and Top/Bottom signals, one obtains a first order Ambisonic microphone,

such as those made by the Soundfield company26. The channels are conventionally

named W, X, Y, Z. Such an encoded recording can be decoded simply for speaker

positions on a sphere.
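As a minimal sketch of this encoding (first order only, written out directly rather than with the AmbIEM classes mentioned below), a mono source at horizontal angle theta and elevation phi becomes four B-format channels:

(
{
	var src = PinkNoise.ar(0.2);
	var theta = MouseX.kr(0, 2pi);	// azimuth
	var phi = 0;	// elevation
	var w = src * 0.707;	// W: omnidirectional component
	var x = src * cos(theta) * cos(phi);
	var y = src * sin(theta) * cos(phi);
	var z = src * sin(phi);
	[w, x, y, z]	// B-format channels W, X, Y, Z, still to be decoded for a speaker setup
}.play;
)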

In the 1990s, the mathematics for 2nd and 3rd order Ambisonics were developed to

achieve increasingly higher spatial resolution; these are formulated in Malham (1999),

and also available online here27.

Extensions to even higher orders were realised recently by IEM researchers (Musil et al.

(2005); Noisternig et al. (2003)), with multiple DSP optimizations implemented as a

PureData library. Using MATLab tools written by Thomas Musil, coefficients for encod-

ing/decoding matrices for different speaker combinations and tradeoff choices can be

calculated offline, and can then simply be read in from text files in the realtime platform

of choice. The most complex use of this library so far has been the VARESE system

(Zouhar et al. (2005)). This is a dynamic recreation of the acoustics of the Philips pavil-

ion at the Brussels World Fair, for which Edgard Varese’s Poeme Electronique (and Iannis Xenakis’ Concret PH) was composed.

While some Ambisonics UGens previously existed in SuperCollider, the SonEnvir team

decided to write a consistent new implementation of Ambisonics in SC3, based on a

subset of the existing PureData libraries. This package was realised up to third order

Ambisonics by Christopher Frauenberger for the AmbIEM package, available here28.

It supports the main speaker setup of interest, the IEM Cube, as well as a setup for

headphone rendering as described below.

5.5.2 Headphones

For practical reasons, such as when working in one room with colleagues, scientists exper-

imenting with sonifications often need to use headphones. Many standard techniques

work well for lateralising sounds, which can be entirely sufficient for making streams

segregate or fuse as desired. In order to achieve perceptually credible simulations of

auditory cues for full localisation, for example, making sounds appear to come from the

front, or above, more complex approaches are needed; the most common approach is to

model the cues by means of which the human ear determines sound location.

26 http://www.soundfield.com
27 http://www.york.ac.uk/inst/mustech/3d audio/secondor.html
28 http://quarks.svn.sourceforge.net/viewvc/quarks/AmbIEM/


Sound localisation in human hearing depends on the differences between the sound heard

in the left and right ears; in principle, three kinds of cues are involved:

Interaural Level Difference (ILD), which is the level difference of a sound source between

the ears, dependent on the source’s direction. This can roughly be simulated with

amplitude panning, which is however limited to left/right distinction in headphones

(usually called lateralisation). Being so similar to amplitude panning, it is fully compatible

with stereo speaker setups.

Interaural Time Difference (ITD), the difference in arrival time of a sound between the

ears. This is on the order of a maximum of 0.6 msec: at a speed of sound of 340 m/sec,

this is the time equivalent to a typical ear distance of 21 cm. This can be simulated well

for headphones; but because delay panning does not transfer reliably for speakers (one

hardly ever sits exactly on the equidistance symmetry axis of one’s loudspeaker pair), it

is hardly used. Like amplitude panning, delay panning only creates lateralisation cues.
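A minimal headphone sketch of such delay panning: the inter-ear delay of up to roughly 0.6 ms (0.21 m / 340 m/s) is applied to one ear relative to the other.

(
{
	var src = Impulse.ar(3, 0, 0.3);	// test clicks
	var itd = MouseX.kr(-0.0006, 0.0006);	// negative values: left ear leads
	[
		DelayC.ar(src, 0.001, itd.max(0)),	// left channel
		DelayC.ar(src, 0.001, itd.neg.max(0))	// right channel
	]
}.play;	// listen on headphones
)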

Head Related Transfer Functions - HRTF / HRIR

Head Related Transfer Functions (HRTFs) or equivalently, Head Related Impulse Re-

sponses (HRIRs) capture the fact that both ITD and ILD are frequency-dependent: For

every direction of sound incidence, the sound arriving at each ear is colored by reflections

on the human pinna, head, and upper torso; such pairs of filters are quite characteristic

for the particular direction they correspond to. Roughly speaking, localising a heard

sound depends on extracting the effect of the pair of filters that colored it, and inferring

the corresponding direction from the characteristics of this pair of filters; obviously, this

works more reliably on known sources.

HRTFs/HRIRs can be measured by recording known sounds from a set of directions with

miniature microphones at the ear, and extracting the effect of the filters. Obviously,

HRTF filters are different for every person (as are people’s ears and heads), and every

person is completely accustomed to decoding sound directions from her own HRTFs.

Thus, there is no miracle HRTF curve that works perfectly for everyone; however, because

some features in HRTFs are generalizable (such as the directional bands described in

Blauert (1997)), the idea of using HRTFs to simulate sounds coming from different

directions has become quite popular. The KEMAR set of HRIRs (see Gardner and

Martin (1994); the data are available online here29) is based on recordings made with

a dummy head, and is considered to work reasonably well for different listeners. The

IRCAM has also published individual HRIRs of ca. 50 people for the LISTEN project

(Warusfel (2003), online here30), so one can try to find matches to suit a particular

person’s preferences well.

29 http://sound.media.mit.edu/resources/KEMAR/full.tar.Z
30 http://recherche.ircam.fr/equipes/salles/listen/


Implementing fixed HRIRs for fixed source locations is straightforward, as one only needs

to convolve the sound source with one pair of HRIRs. However, this is not sufficient:

static angles tend to sound like colouration (as caused by inferior audio equipment); in

everyday life, we usually move our heads slightly, creating small changes in ITD, ILD

and HRTF which quickly disambiguate any localisation uncertainties. Thus, creating

convincing moving sources with HRTF spatialisation is required, which is not trivial: as

a source’s position changes, its impulse responses must be updated quickly and smoothly.

There is no generally accepted scheme for efficient high-quality HRIR interpolation, and

convolving every source separately is computationally expensive.
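The straightforward static case mentioned at the beginning of this paragraph could look like the following sketch; the file paths are placeholders, and Convolution2 stands in for whichever convolution method one prefers.

(
// load one pair of HRIRs for a fixed direction (placeholder paths)
~hrirL = Buffer.read(s, "hrirs/az090_el00_L.wav");
~hrirR = Buffer.read(s, "hrirs/az090_el00_R.wav");
)
(
{
	var src = Dust.ar(20, 0.3);	// sparse test clicks
	[
		Convolution2.ar(src, ~hrirL, framesize: 512),	// left ear
		Convolution2.ar(src, ~hrirR, framesize: 512)	// right ear
	]
}.play;	// listen on headphones
)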

Ambisonics and Virtual Binaural Rendering

For complex changing scenes, the IEM has developed a very efficient approach for bin-

aural rendering (Musil et al. (2005); Noisternig et al. (2003)): In effect, taking a virtual,

symmetrical speaker setup, and spatializing to that setup with Ambisonics; then render-

ing these virtual speakers as point sources with their appropriate HRIRs, thus arriving at

a binaural rendering. This provides the benefit that the Ambisonic field can be rotated

as a whole, which is really useful when head movements of the listener are tracked, and

the binaural rendering is designed to compensate for them. Also, the known problems

with Ambisonics when listeners move outside the sweet zone disappear; when one carries

a setup of virtual speakers around one’s head, one is always right in the center of the

sweet zone. This approach has been ported to SC3 by C. Frauenberger; its main use is in

the VirtualRoom class, which simulates moving sources within a rectangular box-shaped

room. This class is especially useful for preparing spatialisation with multi-speaker setups

by headphone simulation.

Among other things, the submissions for the ICAD 2006 concert31 (described also in

section 4.3) were rendered from 8 channels to binaural for the reviewers, and for the web

documentation32.

One can of course also spatialize sounds on the virtual speakers by any of the simpler

panning strategies given above as well; this trades off easy rotation of the entire setup

for better point source localisation.

To support simple headtracking, C. Frauenberger also created the ARHeadTracker ap-

plication, also available as a SuperCollider3 Quark.

31 http://www.dcs.qmul.ac.uk/research/imc/icad2006/concert.php
32 http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/concert/index.html


5.5.3 Handling speaker imperfections

All standard spatialisation techniques work best when speaker setups are as symmetrical

and well-controlled as possible. While it may not always be feasible to adjust mechan-

ical positions of speakers freely for very precise geometry, a number of factors can be

measured and compensated for, and this is supported by several utility classes written in

SuperCollider, which are part of the SonEnvir framework.

Latency

The Latency class plays a test signal for a given number of audio channels, and waits for

the signals to arrive back at an audio input. The resulting list of measured per-channel

latencies can be used to create compensating delay lines, e.g. in the SpeakerAdjust class

described below.

Spectralyzer

While inter-speaker latency differences are well-known and very often addressed, we have

found another common problem to be more distracting for multichannel sonification:

Each individual channel of the reproduction chain, from D/A converter to amplifier,

cable, loudspeaker, and speaker mounting location in the room, can sound quite different.

When changes in sound timbre can encode meaning, this is potentially really confusing!

To address this, the Spectralyzer class allows for simple analysis of a test signal as

played into a room, with optional smoothing over several measurements, and then tuning

compensating equalizers by hand for reasonable similarity across all speaker channels.

SpeakerAdjust

Once one has achieved usable EQ curves for every speaker channel, one can begin

to compensate for volume differences between channels (with big timbral differences

between channels, measuring volume or adjusting it by listening is rather pointless).

The SpeakerAdjust class expects specifications for relative amplitude, (optionally) delay

time, and (optionally) as many parametric EQ bands as needed for each channel. Thus, a

speaker adjustment can be created that runs at the end of the signal chain and linearizes

the given speaker setup as much as possible; of course, adding limiters for speaker and

listener protection can be built into such a master effects unit as well.
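The following is a generic sketch of what such a per-channel correction stage does; it is not the SpeakerAdjust implementation, and the parameter names are made up. Each output channel gets a compensating delay, one parametric EQ band, a gain correction, and a protective limiter.

(
~adjustChannel = { |sig, delay = 0, amp = 1, eqFreq = 1000, eqRq = 1, eqDb = 0|
	var out = DelayC.ar(sig, 0.05, delay);	// latency compensation
	out = MidEQ.ar(out, eqFreq, eqRq, eqDb);	// one parametric EQ band
	Limiter.ar(out * amp, 0.95);	// gain correction and speaker/listener protection
};
)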


Chapter 6

Examples from Sociology

Though sociology has early on been considered a promising field of application (Kramer

(1994a)), sonification to date is not widely known within the social sciences. Thus, one

purpose of collaborating with sociologists was to raise the awareness of the potential

benefits sonification can bring to social research.

Three sonification designs and their research context are described and analysed as case

studies here: the FRR Log Player, Wahlgesange (election/whale songs), and the Social

Data Explorer.

Social (or sociological) data generally show characteristics that make them promising

for sonification: They are multi-dimensional, and they usually depict complex relations

and interdependencies (de Campo and Egger de Campo (1999)). We consider the

application of sonification to data depicting historical (or geographical) sequences as

the most promising area within the social sciences. The fact that sound is inherently

time-bound is an advantage here, because sequential information can be conveyed very

directly by mapping the sequences on the implicit time axis of the sonification.

In fact, social researchers are very often interested in events or actions in their temporal

context. The importance of developmental questions is even growing due to the glob-

alized notion of social change. Sequence analysis, the field methodologically concerned

with these kinds of questions, assembles methodologies that are by now rather estab-

lished, like event history analysis, and appropriate techniques to model causal relations

over time (Abbott (1990, 1995); Blossfeld et al. (1986); Blossfeld and Rohwer (1995)).

Like most methods of quantitative (multivariate) data analysis, sequence analysis meth-

ods need to be based on an exploratory phase. The quality of the analysis process as

a whole depends critically on the outcome of this exploratory phase. As the amount of

social data is continuously increasing, effective exploratory methods are needed to screen

these data. On higher aggregation levels (such as global, or UN member states level),

social data have both a time (e. g. year) and a space dimension (e. g. nation) and thus

can be understood both as time and geographical sequences. The use of sonification to

explore data of social sequences was the main focus of the sociological part within the


SonEnvir project.

6.1 FRR Log Player

An earlier stage of this work was described in detail in a poster for ICAD 2005 (Daye

et al. (2005)); it is briefly documented in the SonEnvir sonification data collection here1,

and the full code example is available from the SonEnvir code repository here2.

1 http://sonenvir.at/data/logdata1/
2 https://svn.sonenvir.at/svnroot/SonEnvir/trunk/src/Prototypes/Soziologie/FRR Logs/

Researchers in social fields, be they sociologists, psychologists or design researchers,

sometimes face the problem of studying actions in an area which is not observable for

ethical reasons. This was especially true in the context of the RTD project Friendly

Rest Room FRR3 (see Panek et al. (2005)), which was partly funded by the European

Commission. The project’s aim was to develop an easy to use toilet for older persons,

and persons with (physical) disabilities. In order to meet that objective, an interdisci-

plinary consortium was set up, bringing together specialists of various backgrounds like

industrial design, technical engineering, software engineering, user representation, and

social scientists.

3 http://www.is.tuwien.ac.at/fortec/reha.e/projects/frr/frr.html

In the final stage of the FRR project, a prototype of this toilet was installed at a day

care center for patients with multiple sclerosis (MS) in Vienna, in order to validate the

design concept in daily life use. The sonification design described here was intended

for sonifying the log data gathered during this validation phase, because difficulties had

arisen with these analyses. Being unable, for ethical reasons, to gather observational

data, these log data are the only way to understand the actions taken by the user. The

FRR researchers are interested in these data because they provide information on the

user’s interaction with the given technical equipment, and thus on the usability and

everyday usefulness of the toilet system.

6.1.1 Technical background

The guests of this day care center are patients with varying degrees of Multiple Sclerosis

(MS); some need support from nurses when using the toilet while others can use it

independently. Due to security considerations as well as for pragmatic reasons, not

all components developed within the FRR-project were selected for this field test (see

Panek et al. (2005)). The main features of the installed conceptual prototype are:

• Actuators to change the height of the toilet seat, ranging from 40 to 70 cm.


• Actuators to change the seat’s tilt, ranging from 0 to 7 degrees forward/down.

• Six buttons on a hand-held remote control to use these actuators: toilet up, toilet

down, tilt up, tilt down, as well as flush and alarm triggers.

• Two horizontal support bars next to the toilet that can be folded up manually.

• A door handle of a new type which is easier to use for people with physical dis-

abilities was mounted on the outside of the entrance door.

Figure 6.1: The toilet prototype system used for the FRR field test.

Left to right: the door with specially designed handle, the toilet prototype as installed at the

day center, and an illustration of the tilt and height changing functionality.

As direct observation of the users’ interaction with the toilet system was out of the

question, sensors were installed in the toilet area that continuously logged the current

status of the toilet area. These sensors recorded:

• the height of the toilet seat (in cm, one variable),

• the tilt of the toilet seat (in degree, one variable),

• the status of the remote control buttons (pressed/not pressed, six variables),

• the status of the entrance door (open/not open, one variable); and,

• the presence of RFID tagged smart cards (RFID mid range technology) near the

toilet seat to identify any persons present. The guests and the employees of the

day care center were provided with such smart cards, and an RFID module in the

toilet area registered the identities of up to four cards simultaneously.


The log data matrix recorded from these sensor data is quite unusual for sociological

data, due to its time resolution of about 0.1 sec maximum (which is high for social data),

and the sequential properties of the information captured by the data. One log entry

consists of about 25 variables, of which 11 are relevant for our analysis: A timestamp

for when an entry was logged, and the ten variables described above. Of these eleven

variables, seven are binary. Each log file records the events of one day. In case there is

no event for a longer time (e.g. during the night), a ’watchdog’ in the logging software

creates a blank event every 18 minutes to show the system is still on.

In order to use these log files to understand what the users did, we needed to recon-

struct sequences of actions of a user based on the events registered by the sensors. The

technical data had to be interpreted in terms of users’ interaction with the equipment;

otherwise the toilet prototype could not be evaluated. The technical data themselves are

not sufficient for a validation, as we need to validate whether or not the proposed techni-

cal solution results in an improvement of the users’ quality of life, which is the eventual

social phenomenon of interest here. Due to the sequential nature of the information

contained in the log files, established routines from multivariate statistics could not be

applied, as they usually do not consider the fundamental difference of data composed of

events in temporal sequence.

6.1.2 Analysis steps

Graphical Screening

On a graphical display (which is what the FRR researchers used), it is not at all easy to

follow the sequential order of the events, above all because such a sequence consists of

several variables. Yet, as the first step of analysis, we relied on graphs with the purpose

of identifying episodes. An episode in our context is defined by a single user’s visit to

the toilet. A prototypical minimum episode consists of the following logged events:

door open

door close

tilt down (multiple events)

tilt up (multiple events)

button flush

door open

Note that, in this specific episode, the height and the tilt of the toilet bowl are adjusted

via remote control by the user. Still, this episode is a very simple chain of events. Most

of the logged events for tilt down and tilt up result only from the weight of the person

sitting on the toilet seat.


Figure 6.2: Graphical display of one usage episode (Excel).

The first step in analysing the data material was to use graphical displays to look for

sections that could be identified as one user’s visit to the toilet prototype, and to chunk

the data into such episodes, which formed our new entities of analysis. The episode

displayed graphically in figure 6.2 is an example of a very simple, single episode. It is

obvious that the graph is not easy to interpret due to its complexity (possibly additionally

complicated on black and white printouts). The sequential character of the events can

be read visually, if not very comfortably: One can see that the starting event is that the

door opens, and then closes; followed by the event that the toilet bowl tilts forward (the

tilting degree grows). We can assume that the person is now sitting on the toilet. Then

the height is adjusted, and the tilt as well. After the tilt returns to a lower value (we can

assume the weight has been removed, so we can infer that the person has stood up),

the flush button is pressed, and the door opens and closes again. The other variables

remain unchanged.

Investigating patterns of use

The FRR researchers were not interested primarily in the way a single person behaves in

the Friendly Rest Room, but rather whether different groups of people would be found

who, for instance due to similar physical limitations, show similarities in interacting with

the given technical equipment. Such typical action patterns of various user groups

are interesting to cross-reference with data from other sources: Characteristics like sex,

weight, age of a person, her/his physical and cognitive limitations, additional informa-


tion like whether s/he is using a wheelchair, or crutches, are important to deepen the

interpretation and allow for causal inferences. For this purpose, an identification module

was mounted behind the cover of the water container of the FRR prototype, which was

intended to recognize users wearing RFID tags.

To give just one example how user identification can help: usually, people who use

wheelchairs need more time than non-wheelchair users to open a door, enter the room

and close the door again. This is partly because of the need to manoeuvre around the

door when sitting in a wheelchair, and mainly because standard door handles are hard to use,

especially when, as is the case with MS, people have restricted mobility in their arms.

Thus, if an analysis shows that the time needed by wheelchair users to enter the room is

on average shorter than with a standard door, one can conclude that the FRR-designed

door handle is a usability improvement for wheelchair users.

Similarly, one can identify further patterns of use and possibly relate them to user char-

acteristics as mentioned above. However, these patterns are not only important for the

evaluation of the equipment, but also for figuring out user IDs that were accidentally

not recorded.

Comparing anonymous episodes with patterns

Unfortunately, RFID tag recognition only worked within a range of about 50cm around

the toilet, and so not every person using the toilet was identified. Thus there are

anonymous episodes which cannot be related to personal data from other sources. From

a heuristic perspective, these anonymous data are nearly useless. As this applies to 53% of the 316 episodes, this was a serious concern for the validity of the results.

Thus it was decided to study the episodes of identified users in order to find patterns that

may allow for eventual identification of anonymous episodes. For some of the anonymous

episodes, direct identification was possible. For others, most likely for users who did not

use the prototype often, we could rely on conjecture based on what we could derive from

the episodes of identified users. By comparing them with the patterns identified in step 2, we could still make use of the 'anonymous' episodes, analysing them with empirically found categories.

6.1.3 Sonification design

The repertoire of sounds for the FRR Log player sonification design is:

• Door state (open or closed) is represented by coloured noise similar to diffuse

ambient outside noise; this noise plays when open and fades out when closed.

• Button presses on the remote control for height and tilt change (up or down) play

short glissando events, up or down, identified for height or tilt by different basic

pitch and timbre.


• Alarm button presses are rendered by a doorbell-like sound - this button is mostly

used to inform nurses that assistance is needed; use for emergency is rare.

• Flush button presses are represented with a decaying noise burst.

• Continuous state of height and tilt are both represented as soft background drones;

when their values change, they move to the foreground, and when their values are

static, they recede into the background again.

This design mixes discrete-event sonification (marker sounds for the button presses) and

continuous sonification (tilt and height).
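As an illustration of the marker-sound category, a minimal SuperCollider3 sketch of one such sound follows (a decaying noise burst in the spirit of the flush marker); the SynthDef name and parameter values are assumptions, not the original SonEnvir sound design.

    (
    SynthDef(\frrFlush, { |out = 0, amp = 0.2, decay = 0.8|
        // decaying noise burst; the percussive envelope frees the synth when done
        var env = EnvGen.kr(Env.perc(0.01, decay), doneAction: 2);
        Out.ar(out, (PinkNoise.ar(amp) * env) ! 2);
    }).add;
    )
    // Synth(\frrFlush);  // trigger once whenever a flush event is read from the log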

Figure 6.3: FRR Log Player GUI and sounds mixer.

6.1.4 Interface design

The graphical user interface shown in 6.3 provides visual feedback, and possibilities for

interaction:

A button allows for selection of different episode files to sonify; it shows the filename

when a file has been selected. If a user ID tag has been recorded in the log, that is shown.


For playing the sequence, Start/Stop buttons and a speed control are provided. Speed

is the most useful control, as different patterns may appear on different timescales. A

mixer for the levels of all the sound components is provided; for tuning the details of each sound, a ProxyMixer window can be called up from a button ('px mixer'). This window

allows for storing all tuning details as code scripts, so that useful settings can be fully

documented, communicated and reproduced.

The binary state variables are all visually displayed as buttons, and allow for triggering

from the GUI: a button for the door, and buttons for the remote-control buttons, turn red when

activated in the log. When they are pressed on the GUI, they play their corresponding

sound, so users can learn the repertoire of symbolic sounds very quickly.

The continuous variables are all displayed: time within the log as hours:minutes:seconds;

height and tilt of the seat as labeled numbers and as a line with variable height and tilt.

Finally, the last 5 and the next 5 events in the log are shown as text; this was very useful

for debugging, and it provided an extra layer of available information to the users of the

sonification design.

6.1.5 Evaluation for the research context

For the research context these data came from, this sonification design was successful:

It represented time sequence data with several parallel parameter streams, allowing events to be detected efficiently, and it was straightforward to learn and use.

The researchers reported being able to use rather high speedups, and being able to

achieve good recognition of different user categories. In fact, the time scaling was

essential for understanding the meaning behind the sequential order and timing of events.

Especially the times between events, the breaks, were instructive as they possibly point

to problems of the user with the equipment to be evaluated.

In short, the sonification design solved the task at hand more efficiently than the other

tools previously used by the researchers.

6.1.6 Evaluation in SDSM terms

Within the subset of 30 episodes used for design development (out of 316), the longest

is 1660 lines, and covers 32 minutes, while the shortest ones are ca. 180 lines, and 5

minutes. The SDSM Map shows data anchors for this variety of files, and marks for three

different speedups of these 2 example files, original speed (x1) and speedups of x10 and

x100. At lower speeds, one can leave the continuous sounds (tilt and height) on, while

at high speedups, the rendering is clearer without them. The 8 (or 6 at higher speeds)

data properties used are actually rendered technically as parallel streams; whether they

are perceived as such is a question of the episode under study, the listening attitude, and


the playback speed. For example, one could listen to each button sound individually, but

usually the timed sequence of button presses would be heard as one stream of related,

but disparate sound events.

Figure 6.4: SDS Map for the FRR Log Player.

While this design was created before the SDSM concept existed, it conforms to all basic

SDSM recommendations, as well as secondary guidelines.

Time is represented as time; time scaling as the central SDSM navigation strategy is

available as a user interaction parameter. Thus, users can experiment freely with different

time scales to bring different event patterns into focus; in SDSM terms, the expected

’gestalt number’ (here, the data time rescaling) can be adjusted to fit into the echoic

memory time frame.

This is supported here by adaptive time-scaling of the sound events: as time is sped up,

sound durations shorten by the square root of the speedup factor (see below). Recorded

(binary) events in time are represented by simple, easily recognized marker sounds; they

either sound similar to the original events (the flush button, alarm bell), or they employ

straightforward metaphors consistently (glissando up is up both for tilt or height), thus

minimizing learning load.

Continuous state is represented by a background drone, which is turned louder when

changes happen; this jumping to the foreground amplifies the natural listening behavior

of ’tuning out’ constant background sounds, and being alerted when the soundscape


changes. For higher speedups, researchers reported that they often turned these com-

ponents off completely, so the option to let users do that quickly was useful.

The time scaling of marker sounds is handled in a way that can be recommended for

re-use: Constant sound durations create too much overlap at higher speeds, while pro-

portional scaling to the speedup factor deforms the symbolic marker sounds too much

for easy recognition. So, the strategy invented for this sonification was to scale the du-

rations of the marker sounds by 1/(timeScale^scaleExp), with scaleExp values between

0.0 (no duration scaling) and 1.0 (fully match sequence time scaling). For the time

scaling range desired here, 1 to 100, scaling sound durations by the power of 0.5, i.e.

the square root, has turned out to work well: the sounds are still easily recognized as

transformations of their original type, and one can still follow dense sequences well.
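As a minimal sketch of this duration scaling (the base duration of 0.6 seconds is an assumed example value):

    (
    var scaleExp = 0.5, baseDur = 0.6;
    [1, 10, 100].do { |timeScale|
        var dur = baseDur / (timeScale ** scaleExp);
        ("speedup x%: marker duration % s".format(timeScale, dur.round(0.001))).postln;
    };
    )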

6.2 'Wahlgesänge' - 'Election Songs'

This work is also described in de Campo et al. (2006a), and in the SonEnvir data

collection here4. The SC3 code for running this design can be downloaded from the

SonEnvir svn repository here5. It was designed by Christian Daye and the author, and

it is based on an example for the Model-Based Sonification concept called ’Sonogram’

described by Hermann (2002); Hermann and Ritter (1999) (not to be confused with

standard spectrograms or medical ultrasound-based imaging). The code example by Till

Bovermann is available here6.

With the sonification design presented here, we can explore geographical sequences. As

a straightforward and familiar example for social data with geographical distributions,

we use election results; in particular, from the Austrian province of Styria, for provincial

parliament elections in 2000 and 2005, and the national parliament election in 20067.

Our interest focused on displaying social data both in their geographical distribution, and

at a higher spatial resolution than usual. Whereas most common displays of social data

focus on the level of districts (here, 17), we wanted to design a sonification that displays

spatial distances and similarities in the election results among neighboring communities.

The mind model is that of a journey through Styria. A journey can be defined as the

transformation of a spatial distribution into a time distribution. A traveler who starts at

4 http://sonenvir.at/data/wahlgesaenge/
5 https://svn.sonenvir.at/svnroot/SonEnvir/trunk/src/Prototypes/Soziologie/ElectionsDistMap/
6 http://www.techfak.uni-bielefeld.de/ tboverma/sc/tgz/MBS Sonogram.tgz
7 Styria is one of nine federal states in Austria. It consists of 542 communities grouped in 17 districts,

and about 1 190 000 people live here. In autumn 2005, more than 700 000 Styrian voters elected their

political representatives. The result of this election was politically remarkable: the ruling conservative

party ÖVP (Österreichische Volkspartei: Austrian People's Party) was defeated for the first time since 1945 by the left social-democratic party SPÖ (Sozialdemokratische Partei Österreichs: Social-Democratic Party of Austria).


community A first passes the neighbouring communities, and the longer she is on the way, the greater the distance between her and community A becomes. Hence, in this sonification, the

spatial distances between communities are mapped onto the time axis.

6.2.1 Interface and sonification design

The communities are displayed in a two-dimensional window on a computer screen (see

figure 6.5). For each community, the coordinates of the community’s administrative

offices were determined and used as the geographical reference point of the respective

community. The distances as well as the angles within our data thus correspond with

the real distances and angles between the communities’ administrative offices.

Figure 6.5: GUI Window for the Wahlgesänge Design.

The left hand panel allows switching between different election results (and district/community

levels of aggregation), and between the parties to listen to. It also allows tuning some param-

eters of the sonification, and it displays a short description of the closest ten communities.

The maps window shows a map of Styria with the community borders; this map is the clicking

interface.

This sonification design depends strongly on user interaction: like most Model-Based

Sonifications, it needs to be played, like a musical instrument; without user actions,


there is no sound. Clicking the mouse anywhere in the window initiates a circular wave

that spreads in two-dimensional space. The propagation of this wave is shown on the

window by a red circle. When the wave hits a data point, this point begins to sound in

a way that reflects its data properties. In our case, these data properties are the election

results within each community. Thus, the user first hears the data point nearest to

the clicking point, from the proper spatial direction, with pitch being controlled by the

turnout percentage of the currently selected party in that community (high pitch being

high percentage); then the result for the second-nearest community, and so on. The

researcher can select different parties to listen to their results from the election under

study.
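The arithmetic behind this wave metaphor can be sketched as follows in SuperCollider3: distance from the clicked point becomes onset time, the selected party's result controls pitch, distance attenuates amplitude, and the angle relative to the viewing direction determines panning. The data layout and the pitch range are assumptions for illustration, not the values of the actual design.

    (
    var click = [60, 40];             // clicked point, in km coordinates
    var velocity = 50, distExp = 1;   // wave speed in km/sec; 1/distance amplitude law
    var communities = [               // [x, y, resultPercent] - invented example values
        [62, 41, 28.5], [70, 35, 41.2], [40, 80, 33.0]
    ];
    communities.do { |c|
        var dx = c[0] - click[0], dy = c[1] - click[1];
        var dist = hypot(dx, dy);
        var onset = dist / velocity;                 // seconds after the click
        var amp = 1 / (max(dist, 1) ** distExp);     // avoid a blowup right at the click point
        var freq = c[2].linlin(0, 100, 200, 2000);   // higher percentage -> higher pitch
        var angle = atan2(dx, dy);                   // radians off the viewing direction (here North)
        [onset, freq, amp, angle].round(0.01).postln;
    };
    )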

Further, the researcher can choose a direction in which to look and listen. In figure 6.5,

this direction is North, indicated by the soft radial line within the circular wave. The

line begins at the point where the researcher has initiated the wave, to provide visual

feedback while listening, and to keep a trace of the initial location the current sound

was generated for. Data points along this line are heard from the front, others are

panned to their appropriate directions. While this sonification was designed for a ring of

twelve speakers surrounding the listener, it can be used with standard stereo equipment

as well: For stereo headphones, one changes to a ring of four, and listens to the front

two channels. Then, data points along the main axis are heard from the center, those

on the left (or right) are panned accordingly, 90 degrees being all the way left or right8.

The points at more than 90 degrees off axis progressively fade out, and those above 135

degrees off axis are silent.

The GUI provides the following sonification parameter controls:

A distance exponent defines how much the loudness for a single data point decreases

with increasing distance. For 2D spaces, 1/distance is physically correct, but stronger

or weaker weightings are interesting to experiment with.

The velocity of the expanding wave in km/second. The default of 50 km/sec scales the

entire area (when played from the centre) into a synakusis-like time scale of 3 seconds.

Slower or faster speeds can be experimented with to zoom further in or out.

The maximum number of communities (Orte in German) that will be played. Selecting

only the nearest 50 or so data points allows for exploration of smaller areas in more

detail.

The decay time for each individual sound grain. At higher speeds, shorter decays create

less overall overlap, and thus provide more clarity; for smaller sets and slower speeds,

longer decay times allow for more detailed pitch perception and thus higher perceptual

8Note that for stereo speakers at +-30 degrees, the angles within +-90 degrees are scaled together

to +-30 degrees - which we find preferable to keeping the angles intact and only hearing a 60 degree

’slice’ of all the data points, which could be done by leaving the setting at 12 channels, and only using

the first 2.


resolution.

The direction in which the wave is looking; in the sound, this determines which direction

will be heard from the front. The direction can be rotated through North, West, South

and East.

For more detailed information, the ten data point locations nearest to the clicked point are shown in a list view.

6.2.2 Evaluation

This sonification design is a good tool for outlier analysis. It works rather fast at a low

level of aggregation (communities), and outliers are easily identified by tones that are

higher than their surroundings. Typically, these are local outliers: in an area that has a

local average value of say 30%, you can hear a 40% result ’sticking out’; when analysing

the entire dataset statistically, this may not show up as an outlier.

A second strong feature is the ability to get a quick impression of distributions of a data

dimension with their spatial order intact, so achieving the tricky task of developing an

intuitive grasp of the details of one’s data becomes more likely.

This sonification design is not restricted to election data: Other social indicators that are

assessed at the community level (unemployment rates, labor force participation rate of

women, and others) can be included. To represent them in conjunction with e.g. election

results promotes the investigation of local dependencies that might be hidden by higher

aggregation levels or by the mathematical operations of correlation coefficients.

Finally, this sonification design is of course not restricted to the geographical borders of

Styria. It can be used as an exploratory tool enabling researchers to quickly scan social

data in their geographical distribution, at different aggregation levels. Given an inter-

esting question to address at such higher levels, an adaptation to different geographical

scales, e.g. European or global data distributions, is straightforward to do, with

nations as the aggregation entity.

When considered from an SDSM perspective, this sonification design respects a number

of SDSM recommendations: It shows the important role of interaction design, while the

sound aspect of the sonification design itself remains rather basic. It also shows the

central importance of time scaling/zooming between overview and details; in fact this

design was the source for recommending this particular time-scaling strategy within the

SDSM concept. The design also demonstrates metaphorical simplicity recommended by

SDSM. An SDSM graph shows that the sonification can render one data property of

the entire set within echoic memory time frame, and zoom into more detail by selecting

subsets, or by slowing down the propagation speed.


Figure 6.6: SDS-Map for Wahlgesänge.

6.3 Social Data Explorer

6.3.1 Background

This sonification design is a study for mapping geographically distributed multidimen-

sional social data to a multiparametric sonification, i.e. a classical parameter mapping

sonification. It offers a number of interaction possibilities, so that sociologists (the

intended user group) can experiment with changing the mappings freely. This serves

both for learning sonification concepts by experimentation and for finding interesting

mappings, for instance, mappings that confirm known correlations between parameters.

The example data file contains the distribution of the working population of all 542

communities in Styria by sectors of economic activities, given in table 6.1.

This data file is quite typical for geographically distributed social data.

6.3.2 Interaction design

A number of interactions can be accessed from the user interface shown in figure 6.7:

’Order’ allows sorting by a chosen parameter (alphabetically or numerically); ’up’ is

ascending, ’dn’ is descending. The number-box is for choosing one data item to inspect


Table 6.1: Sectors of economic activities

Agrarian, Wood-, and Fishing Industries

Mining

Production of commodities

Energy and Water Industries

Construction

Trade

Hotel and Restaurant Trade

Traffic and Communication

Credit and Insurance

Realty, Company Services

Public Administration, Social Security

Pedagogy

Health, Veterinary, and Social Services

Other Services

Private Households

Exterritorial Organisations

First-time seeking work

by index in the sorted data, so e.g. 0 is the first data point of the current sorted order.

Every parameter of the sonification can be mapped by using the elements of a ’mapping

line’: For every synthesis or playback parameter, users can select a data dimension. The

data range in minimum and maximum values is displayed. The data can have a ’warp’

property, i.e. whether the data should be considered linear, exponential, or have another

characteristic mapping function.

The arrow-button below ’pushes’ the range of the current data dimension to the editable

number boxes, as this is the data scaling range (’mimax’) to use for parameter mapping.

This range can be adjusted, in case this becomes necessary to experiment with a specific

hypothesis. The second ’mimax’ range is the Synth parameter range, which is adjustable,

as is the warp factor. Here, the arrow-button also pushes in the default parameter values.

The ’range’ display that follows shows the default synthesis parameter range (e.g. 20-

20000 for frequency), and the popup menu under ’Synth Param’ shows the name of the

parameter chosen for that mapping line.

Setting ’playRange’ determines the range of data point indices to play within the current

sorted data, with 0 being the first datapoint. ’post current range’ posts the current

range in the current order.

The final group of elements, labeled ’styrData’, allows for starting and stopping the


Figure 6.7: GUI Window for the Social Data Explorer.

The top line of elements is used for sorting data by criteria. The five element lines below are for

mapping data dimensions to synth parameters, and scaling the ranges flexibly. The bottom line

allows selecting a range of interest within the sorted data, and sonification playback control.

sonification playback.

6.3.3 Sonification design

The sonification design itself is quite a simple variant of discrete-event parameter map-

ping. Three different synthesis processes (’synthdefs’) are provided, all with control

parameters for freq, amp, pan, sustain. The synthdefs mainly vary in the envelope

they use (one is quasi-gaussian, the other two percussive), and in the panning algo-

rithm (’sineAz’ is for multichannel ring-panning). Which of these sounds is used can be

changed in the code.

The player algorithm iterates over the chosen range of data indices. It maps the values

of each data item to values for the synthesis event’s parameters, based on the current

mapping choices. If nothing is chosen for a given synthesis parameter, a default value is

used (e.g. for duration of the event, 0.1 seconds).
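A minimal sketch of one such mapping line in SuperCollider3, using a ControlSpec to realise the 'warp'; the concrete ranges are assumptions, not values taken from the Styrian data file:

    (
    var dataMin = 0, dataMax = 3500;               // assumed data range of one dimension
    var freqSpec = ControlSpec(100, 4000, \exp);   // synth parameter range with exponential warp
    var mapOne = { |value| freqSpec.map(value.linlin(dataMin, dataMax, 0, 1)) };
    [0, 350, 1750, 3500].collect(mapOne).round(0.1).postln;
    )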


6.3.4 Evaluation

For experimenting with parameter mapping sonification, this design allows for similar

rendering complexity as the Sonification Sandbox (Walker and Cothran (2003)), though

without parallel streams of sounds. Both the user interface and the sonification itself are

sketches rather than polished applications, e.g. the user interface could allow loading

data files, switching between instruments, and deriving its initial display state from the

current state of the model.

Given more development time, it would benefit from multiple and more complex sound

functions, from making more functionality available from GUI elements, and from fuller

visual representation of the ongoing sonification. Nevertheless, according to the sociol-

ogist colleague who experimented with it, it supported exploration of the particular type

of data file well enough to confirm its viability.

While we intended to experiment with designs bridging between the Wahlgesänge design

and the Social Data Explorer, this was not pursued, mainly due to time constraints, and

because other ventures within the SonEnvir project were given higher priority.


Chapter 7

Examples from Physics

In the course of the SonEnvir project, we began with sonifications of quantum spectra,

and later decided to shift the focus to statistical spin models as employed in computa-

tional physics, for various reasons given below.

Sonification has been used in physics rather intuitively, without referring to the term

explicitly. The classical examples are the Geiger counter and the Sonar, both monitoring

devices for physical surroundings. An early example of research using sonification is the

experiment of the inclined plane by Galileo Galilei. Following Drake (1980), it seems

plausible that Galilei used auditory information to verify the quadratic law of falling

bodies (see chapter 3, and figure 3.1.1). In reconstructing the experiment, Riess et al.

(2005) found that time measuring devices of the 17th century (water clocks) were almost

certainly not precise enough for these experiments, while rhythmic perception was.

In modern physics, sonification has already played a role: one example of audification is

given in a paper by Pereverzev et al., where quantum oscillations between two weakly

coupled reservoirs of superfluid helium 3 (predicted decades earlier) were found by lis-

tening: ”Owing to vibration noise in the displacement transducer, an oscilloscope trace

[...] exhibits no remarkable structure suggestive of the predicted quantum oscillations.

But if the electrical output of the displacement transducer is amplified and connected to

audio headphones, the listener makes a most remarkable observation. As the pressure

across the array relaxes to zero there is a clearly distinguishable tone smoothly drifting

from high to low frequency during the transient, which lasts for several seconds. This

simple observation marks the discovery of coherent quantum oscillations between weakly

coupled superfluids.” (Pereverzev et al. (1997))

Next to sonification methods in physics, physics methods found their way into sonifica-

tion, as in the model-based sonification approach by Hermann and Ritter (1999). For

example, in so called data sonograms, physical formalisms are used to explore high-

dimensional data spaces; an adaptation of the data sonogram approach has been used

in the 'Wahlgesänge' sonification design described in section 6.2.


Physics and sonification

In physics, sonification has particular advantages. First of all, modern particle physics

is usually described in a four-dimensional framework. For a three dimensional space

evolving in time, a complete static visualisation is not possible any more. This makes

it harder to understand and rather abstract; hence, in both didactics and research,

sonification may be useful. In the auditory domain, many sound parameters may be used

to display a four-dimensional space, maintaining symmetry between the four dimensions

by comparing different rotations of their mappings. A feature of auditory dimensions

that has to be taken into account is that these dimensions are generally not orthogonal,

but could rather be compared to mathematical subspaces (see Hollander (1994)). This

concept is very common in physics, and thus easily applicable.

Furthermore in physics, many phenomena are wave phenomena happening in time, just

as sound is. Thus sonification provides a very direct mapping. While scientific graphs

usually map the time direction of physical phenomena onto a geometrical axis, this is

not necessary in a sonification, where physical time persists, and multiple parameters

may be displayed in parallel.

While perceptualisation is not intended to replace classical analytical methods but rather to complement them, there are examples where visual interpretation is superior to, or at least precedes, mathematical treatment. For instance, G. Marsaglia (2003) describes a

battery of tests for the quality of numerical random number generators. One of these is

the parking lot test, where mappings of randomly filled arrays in one plane are plotted and

visually searched for regularities. He argues that visual tests are striking, but not feasible

in higher dimensions. As nothing is known beforehand about the nature of patterns that

may appear in less than ideal random number generators, there is no all-encompassing

mathematical test for this task. Sonification is a logical continuation of such strategies

which can be applied with multidimensional data from physical research contexts.

The major disadvantage of sonification we encountered is that physicists (and probably

natural scientists in general) are not familiar with it. Visualisation techniques and our learnt understanding of them have been refined since the beginnings of modern science.

Regarding auditory perception especially, we were confronted, for example, with the opinion that the

hearing process is just a Fourier transformation, and could be fully replaced by Fourier

analysis. This illustrates that much work is required before sonification becomes standard

practice in physics.


7.1 Quantum Spectra Sonification1

Quantum spectra are essential to understand the structure and interactions of composite

systems in such fields as condensed matter, molecular, atomic, and subatomic physics.

Put very briefly, quantum spectra describe the particular energy states which different

subatomic particles can assume; as these cannot be observed directly, competing models

have been developed that predict the precise values and orderings of these energy levels.

Quantum spectra provide an interesting field for auditory display due to the richness

of their data sets, and their complex inner relations. In our experiments (’us’ refer-

ring to the physics group within SonEnvir), we were concerned with the sonification of

quantum-mechanical spectra of baryons, the most fundamental particles of subatomic

physics observed in nature. The data under investigation stem from different competing

theoretical models designed for the description of baryon properties. This section reports

our attempts at finding valid and useful strategies for displaying, comparing and explor-

ing various model predictions in relation to experimentally measured data by means of

sonification. We investigated the possibilities of sonification in order to develop them as

a tool for classifying and explaining baryon properties in the context of present particle

theory.

Baryons - most prominently among them the proton and the neutron - are considered

as bound systems of three quarks, which are presently the ultimate known constituents.

The forces governing their properties and behaviour are described within the theory of

quantum chromodynamics (QCD). Since up to now this theory is not exactly solvable

for baryons (at low and intermediate energies), one resorts to effective models, such as

constituent quark models (CQMs).

CQMs have been suggested in different variants. Existing models differ mainly in which

components they consider to constitute the forces binding the constituent quarks: All

models include a so called confinement component - as the distance between quarks

expands, the forces between them grow, which keeps them confined - and a hyperfine

interaction, which models interactions between quarks by particle exchange. As a result

there is a variety of quantum-mechanical spectra for the ground and excited states of

baryons. The characteristics of the spectra contain a wealth of information important

for the understanding of baryon properties and interactions. Baryons are also classified

by the combinations of quarks they are made up of, and by a number of other properties

such as color, flavor, spin, parity, and angular momentum, which can be arranged in

symmetrical orders. For more background in Constituent Quark Models and baryon

classification, please refer to Appendix C.1.

1This section is based on material from two SonEnvir papers: de Campo et al. (2005d) and de Campo

et al. (2006a).


7.1.1 Quantum spectra of baryons

The competing CQMs produce baryon spectra with characteristic differences due to the

different underlying hyperfine interactions. In figure 7.1 the excitation spectra of the

nucleon (N) and delta (∆) particles are shown for three different classes of modern

relativistic CQMs. While the ground states are practically the same (and agree with

experiments) for all CQMs, the excited states show different energies and thus level

orderings. (For instance, in the OGE CQM the first excitation above the N ground state

is J^P = 1/2^−, whereas for the GBE CQM it is J^P = 1/2^+.) Evidently the predictions of the

GBE CQM reach the best overall agreement with the available experimental data.

Figure 7.1: Excitation spectra of N (left) and ∆ (right) particles.

In each column, the three entries left to right are the energies (in MeV, or Mega-electronVolts)

based on One-Gluon exchange (Eidelman (2004)), Instanton-induced (Glozman et al. (1998);

Loering et al. (2001)), and Goldstone-Boson Exchange (Glantschnig et al. (2005)) constituent

quark models. The shaded boxes represent experimental data, or more precisely, the ranges of

imprecision that measurements of these data currently have (Eidelman (2004)).

7.1.2 The Quantum Spectra Browser

Sonifying baryon mass spectra

The baryon spectra as visualised by patterns such as in Fig. 7.1 allow a discrimination of the quality of the CQM descriptions of experiment. Also one can read off characteristic

features of the different CQMs such as the distinct level orderings, etc. However, it

is quite difficult to conjecture specific symmetries or other relevant properties in the

dynamics of a given CQM by just looking at the spectra. Thus, there are a number

of open research questions where we expected sonification to be helpful. We began by

identifying phenomena that are likely to be discernible in sonification experiments:


• Is it possible to distinguish e.g. the spectrum of an N 1/2^+ nucleon from, say, a delta ∆ 3/2^+ by listening only?

• Is there a common family sound character for groups of particles, or for entire

models?

• In the confinement-only model, the intentionally absent hyperfine interaction causes

data points to merge into one: is this clearly audible?

We studied the sonification of baryon spectra with three specific data sets. They contain

the N as well as ∆ ground state and excitation levels for three different dynamical

situations: 1) the GBE CQM (Glozman et al. (1998)), 2) the OGE CQM (Theussl et al.

(2001)), and 3) the case with confinement interaction only, i.e., omitting the hyperfine

interaction component. Each one of these data files is made up of 20 lists, and each

list contains the energy levels of a particular N or ∆ multiplet J^P. The lists differ in length: depending on the given J^P multiplet, they contain 2 - 22 entries,

since we only take into account energy levels up to a certain limit.

Sonification design

For sonification of baryon spectra, the most immediately interesting feature is the level

spacing. The quantum-mechanical spectrum is bounded from below and its absolute

position is fixed by the N ground state (at 939 MeV); above that, spectral lines up to

ca 3500 MeV appear for the excited states in the spectrum of each particle. As the

study of these level spacings depends on the precise nature of the distances between

these lines within and across particles, a sonification design demands high resolution for

that parameter; thus we decided to map these differences between the energy levels to

audible frequencies.

Several mapping strategies were tried for an auditory display of the spacings between

the energy levels in the spectra: I) Mapping the mass spectra to frequency spectra

directly, with tunable transposition together with optional linear frequency shift and

spreading, and II) Mapping the (linear) mass spectra to a scalable pitch range, i.e.

using perceptually linear pitch space as representation. Both of these approaches can be

listened to as simultaneous static spectra (of one particle at a time) and as arpeggios

with adjustable temporal spread against a soft background drone of the same spectrum.
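The two strategies can be sketched as follows in SuperCollider3; the parameter names loosely follow the GUI controls described below, and the energy levels are invented example values, not model predictions:

    (
    var levels = [939, 1450, 1535, 1710];   // MeV, invented example values
    var fixedFreq = 939, fRangScale = 1;    // Hz anchored to the N ground state; linear range scale
    var transpose = -24, pitStretch = 1;    // transposition in semitones; pitch-space stretch factor
    // I) linear: treat MeV differences as Hz differences, then transpose
    var linearFreqs = (fixedFreq + ((levels - 939) * fRangScale)) * transpose.midiratio;
    // II) pitch space: map mass ratios to (scalable) semitone intervals, then transpose
    var pitchFreqs = levels.collect { |e|
        fixedFreq * (((log2(e / 939) * 12 * pitStretch) + transpose).midiratio)
    };
    [linearFreqs, pitchFreqs].do { |f| f.round(0.1).postln };
    )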

Interface design

These models are implemented in SuperCollider3 scripts; for more flexible browsing, a

GUI was designed (see figure 7.2). All the tuneable playback settings can be changed


while playing, and they can be saved for reproducibility and an exchange of settings

between researchers. Some tuning options have been included in order to account for

known data properties: E.g., the values calculated for higher excitations in the mass

spectra are considered to be less and less reliable; we modeled this with a tuneable slope

factor that reduces amplitude of the sounds representing the higher excitation levels in

all models.

Figure 7.2: The QuantumSpectraBrowser GUI.

The upper window allows for multiple selection of particles that will be iterated over in 2D

loops; or alternatively, for direct playback of that particle by clicking. The lower window is for

tuning all the parameters of the sonification design interactively.

For static data like these, flexible, interactive comparison between different subsets of the

data is a key requirement; e.g. in order to find out whether discrimination by parity P is

possible with auditory display, one will want to automatically play interleaved sequences

alternating between selected particles with positive and negative parities.

The Quantum Spectra Browser window allows for the following interactions:

The buttons Manual, Autoplay choose between manual mode (where a click on a button

switches to the associated sound) and an autoplay mode that iterates over all the selected


particles, either horizontally (line by line) or vertically (column by column). The buttons

LoopStart, LoopStop start and stop this automatic loop; the numberbox stepTime sets

for how many seconds each spectrum is presented. The three rows of buttons below

Goldstone, OneGluon, Confinement allow for playing individual spectra, or for a multiple

selection of which particles are heard in the loop.

The QSB Sound Editor allows for setting many synthesis/spatialisation parameters:

fixedFreq sets the frequency that corresponds to the ground state; the default value is 939

Hz (for 939 MeV). fRangScale rescales the frequency range the other energy levels are

mapped into: a scale of 1 is original values, 2 expands to twice the linear range. As this

distorts proportions, we mostly left this control at 1. transpose transposes the entire

spectrum by semitones, so a value of -24 is two octaves down. This leaves proportions

intact, and many listeners find this frequency range more comfortable to listen to. slope

determines how much the frequency components for higher energy levels are attenuated;

this models the decreasing validity of higher energy levels. 0 is full level, 0.4 means each

line is softer by a factor of 1 − 0.4 than the previous line. (The frequency-dependent

sensitivity of human hearing is compensated for separately using the AmpComp unit

generator).

panSpread sets how much spectral lines are separated spatially. With a spread of 1, and

stereo playback, the ground state is all the way left, and the highest excited state is

all right; less than 1 means they are panned closer together. When using multichannel

playback, this can expand over a series of up to 20 adjacent channels. panCenter sets

where the center line will be panned spatially - 0 is center, -1 is all left, 1 is all right.

The remaining parameters tune the details of an arpeggiation loop: essentially, a loop of

spread-out impulses excites the spectral lines individually, and they ring until a minimum

level is reached. ringTime determines how long each component will take to decay (RT

for -60dB) after an impulse. bgLevel maintains presence of the entire spectrum as one

gestalt: the spectral line sounds will only decay to this minimum level and remain at that

level. attDelay determines when within the loop the first attack will play. attSpread

determines how spread out the attacks will be within the loop time. loopTime determines the time for one cycle of impulses.

7.1.3 The Hyperfine Splitter

Addressing a more subtle issue, we then designed a Hyperfine Level Splitter, which allows

for studying the so called splittings of the energy levels due to a variable strength of the

hyperfine interaction inherent in the CQMs. The hyperfine interaction is needed in order

to describe the binding of three quarks more realistically, i.e. in closer accordance with

experimental observation. When it is absent (in simulations), certain quantum states

are degenerate, meaning that the corresponding energy levels of some particles coincide.


In the first demonstration example, we chose the excitation levels of two different particles

(the Neutron n-1/2+ and the Delta d3/2+), calculated within the same CQM, the

Goldstone-Boson Exchange model (gbe) Glozman et al. (1998). These two particles are

degenerate when there is no hyperfine interaction present.

Sonification design

Mapped into sound, this means that one hears a chord of three tones for the ground

states and the first two excitation levels, which are the same for both particles. Here,

auditory perception is more difficult than in the Quantum Browser, as the mass spectra

are being played as continuous chords, and the hyperfine interaction may be ’turned up’

gradually (to 100 percent). Thereby, the energy levels are pulled apart, and one hears a

complex chord of six tones. The two particles that are compared can be distinguished

acoustically now, as when they are observed in experiments. With the Level Splitter, the

dynamical ingredients leading to these energy splittings may be studied in detail, and

likewise the quantitative differences in distinct CQMs.

The underlying sonification design is an extension of that for the Quantum Browser.

Mainly, some parameters are added to control the number of spectral lines to be rep-

resented at once, and a balance control between the simultaneous or interleaved two

channels that are compared.

Interface design

The Hyperfine Data Player window allows for the following interactions: The sets of

pop-up menus labeled left and right select which model (GBE, OGE), which particle

(Nukleon, Delta, etc.), which state (1/2, 3/2 etc), and which parity (+, -, both) is

chosen for sonification in that audio channel. The slider percent determines where

to interpolate between the model points of choice and their corresponding points in

the Confinement-only model; this is where the hyperfine interaction component to the

model can be gradually turned on or off. The graphical view below, labeled l3, l2, l1 -

l1, l2, l3 shows the precise values for the first several energy states of the two particles

chosen. The very bottom is ground state (939MeV), the visible range above goes up

to 3500. In the state shown, a so-called 'level crossing' can be seen (and heard): level

3 of GBE nucleon 1/2 (both parities) crosses below level 2; by comparison, in OGE,

the same particle has monotonically ascending spectral energy states. The bottom row

of buttons stops and starts the sonification, posts the current interpolated values, and

recalls a number of prepared demo settings.
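In essence, the percent slider performs a linear interpolation of each energy level between its confinement-only value and its full-model value; a minimal SuperCollider3 sketch with invented numbers:

    (
    var confinementLevels = [1450, 1450, 1795];   // degenerate levels without hyperfine interaction
    var gbeLevels = [1440, 1535, 1710];           // the same levels with the full interaction
    var percent = 60;                             // slider position
    var interpolated = confinementLevels + ((gbeLevels - confinementLevels) * (percent / 100));
    interpolated.round(0.1).postln;
    )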

The Hyperfine Editor allows for setting many synthesis/spatialisation parameters fa-

miliar from the Quantum Spectra Browser, as well as several more: balance sets the

balance between left and right channels. bgLevel sets the minimum level for arpeggiated


Figure 7.3: The Hyperfine Splitter GUI.

The left window is for selecting two particles by model, particle name, spin, and parity; the

hyperfine component is faded in and out with the slider in the middle. The bottom area shows

the audible spectral lines central-symmetrically. The window on the right side is the editor for

the synthesis and spatialisation parameters of the sonification design.

settings, as above. brightness adds harmonic overtones (by frequency modulation) to

the individual lines so that judging their pitch becomes easier. pitStretch rescales the

pitch range the other energy levels are mapped into: a scale of 1 is original values, 2

expands to twice the intervallic range. (This is different from fRangScale above, which

used linear scaling). transpose transposes the entire spectrum by semitones, as above.

melShift determines when within the loop the second channel’s attack will play relative

to the first. 0 means they play together, 3 means they are equally spaced apart (by 3 of

6 subdivisions); the maximum of 6 plays them in sync again. melSpread determines how

much the attacks within one channel are arpeggiated; 3 means such that they appear

equally spread in time. Together, these two controls allow alternating the two spectra

as groups, or interleaving the individual spectral lines across the two spectra. ringAtt

determines how fast the attack times are for both channels. ringDecay sets the decay

time for the spectral lines for both channels. nMassesL and nMassesR are handled automati-

cally when changing particle types and properties. This is the number of masses audible

in the spectrum, which can be reduced by hand if desired. ampGrid sets the volume for

a reference grid (clicks) which can be turned on for orientation.


7.1.4 Possible future work and conclusions

At this point, the physicists who had worked closely with these data in their own research agenda unfortunately had to leave the project, which meant that this line of experiments came to an end before further interesting ideas could be tried out. These were

intended to explore a number of further aspects that may ultimately be relevant in the

scientific study of particle physics by sonification, and for completeness, these ’loose

ends’ are given here.

Comparison with experimental data

As can be seen from figure 7.1, there are several experimental data available for the

energy levels. However, they are affected by experimental uncertainties. Consequently,

their auditory display needs some adaptations. We intended to differentiate between

(sharp) theoretical data as deduced from the CQMs and (spread) phenomenological

data measured in experiment by adding narrow band modulation to spread-out data

bands. It should be quite interesting to qualify the theoretical predictions vis-a-vis the

experimental data.

Representing symmetries with spatial ordering

Much effort has gone into finding visual representations for the multiple symmetries

between particle groups and families. Arranging the sound representations in 3D-space

with virtual acoustics in a spatial order determined by symmetry properties between

particle groups may well be scientifically interesting; navigating such a symmetry space

could become an experience that lets physicists acquire a more intuitive notion of the

nature of these symmetries.

Temporal aspects

There are plenty of interesting time phenomena in quantum physics, which could

be made use of in numerous ways in further explorations. For example, there is an

enormous variation in the half-life of different particles. This could be expressed quite

directly in differentiated decay times for different spectral lines. In addition, including the

probabilities for transitions between excited states and ground states will open promising

possibilities for demonstrating the dynamical ingredients in the quark interactions inside

baryons.


Conclusions

Our investigations have indicated that sonification is an interesting alternative and a

promising complementary tool for analysing quantum-mechanical data. While many

interesting design ideas came up in this line of research, which may well be useful for

other contexts, the implemented sonification designs were not fully tested by domain

experts in this quite specialised field. Given motivated domain science research partners,

a number of good candidates for sonification approaches remain to be explored further

in the context of quantum spectra.


7.2 Sonification of Spin models2

Spin models provide an interesting test case for sonification in physics, as they model

complex systems that are dynamically evolving and not satisfactorily visualisable. While

the theoretical background is largely understood, their phase transitions have been an

interesting subject of studies for decades, and results in this field can be applied to many

scientific domains. While most classical methods of solving spin models rely on mean

values, their most important feature, especially at the critical point of phase transition,

is the spin fluctuations of single elements.

Therefore we started out with the fluctuations of the spins, and provided auditory in-

formation that can be analysed qualitatively. The goal was to display three-dimensional

dynamic systems, distinguish the different phases and study the order of the phase tran-

sition. Audification and sonification approaches were implemented for the spin models

studied, so that both realtime monitoring of the running model and analysis of pre-

recorded data sets is possible. Sound examples of these sonifications are described in

Appendix C.2.

7.2.1 Physical background

Spin systems describe macroscopic properties of materials (such as ferromagnetism) by

computational models of simple microscopic interactions between single elements of the

material. The principal idea of modeling spin systems is to study complex systems in a controlled way, such that they are theoretically tractable and still mirror the behaviour of real compounds.

From a theoretical perspective, these models are interesting because they allow studying

the behaviour of universal properties in certain symmetry groups. This means that some

properties do not depend on details like the kind of material, such as so-called order

parameters giving the order of the phase transition. Already in 1945, E. A. Guggenheim

(cited in Yeomans (1992)) found that the phase diagrams of eight different fluids he

studied show the very same coexistence curve3. A theoretical explanation is given by

a classification in symmetry groups – all of these different fluids belonged to the same

mathematical group.

2 This section is based on a SonEnvir ICAD conference paper, Vogt et al. (2007).
3 This becomes apparent when plotted in so-called reduced variables, the reduced temperature being T/Tcrit, the actual temperature relative to the critical one; pressure is treated likewise.


7.2.2 Ising model

One of the first spin models, the Ising model, was developed by Ernst Ising in 1924 in

order to describe a ferromagnet. Since the development of computational methods, this

model has become one of the best studied models in statistical physics, and has been

extended in various ways.

Figure 7.4: Schema of spins in the Ising model as an example for Spin models.

The lattice size here is 8 x 8. At each lattice location, the spin can have one of two possible

values, or states (up or down).

Its interpretation as a ferromagnet involves a simplified notion of ferromagnetism.4 As

shown in figure 7.4, it is assumed that the magnet consists of simple atoms on a quadratic

(or in three dimensions cubic) lattice. At each lattice point an atom (here, a magnetic

moment with a spin of up or down) is located. In the computational model, neighbouring

spins try to align to each other, because this is energetically more favorable. On the

other hand, the overall temperature causes random spin flips. At a critical temperature

Tcrit, these processes are in a dynamic balance, and there are clusters of spins on all

orders of magnitude. If the temperature is lowered from Tcrit, one spin orientation will

prevail. (Which one is decided by the random initial setting.) Macroscopically, this is

the magnetic phase (T < Tcrit). At T > Tcrit, the thermal fluctuations are too strong

for uniform clusterings of spins. There is no macroscopic magnetisation, only thermal

noise.

4There are many different application fields for systems with next neighbour interaction and random

behaviour. Ising models have even been used to describe social systems, as e.g. in P. Fronczak (2006),

though this is a disputed method in the field.


7.2.3 Potts model

A straightforward generalisation of this model is the admission of more spin states than

just up and down. This was realized by Renfrey B. Potts in 1952, and was accordingly

called the Potts model. Several other extensions of the model have been studied in the past.

We worked with the q-state Potts model and its special case for q = 2, the Ising model,

both being classical spin models. For mathematical background, see Appendix C.2.

The order of the phase transition is defined by a discontinuity in the derivatives of the free

energy (see figure 7.5). If there is a finite discontinuity in one of the first derivatives,

the transition is called first order. If the first derivatives are continuous, but the second

derivatives are discontinuous, it is a so-called continuous phase transition.

Figure 7.5: Schema of the orders of phase transitions in spin models.

The mean magnetisation is plotted vs. decreasing temperature. (a) shows a continuous phase

transition and (b) a phase transition of first order. In the latter, the function is discontinuous

at the critical temperature. The roughly dotted line gives an approximation on a finite system,

e.g. a computational model. The bigger the system, the better this approximation models the

discontinuous behaviour.

Nowadays, spin models are usually simulated with Monte Carlo algorithms, giving the

most probable system states in the partition function (Yeomans, 1992, p. 96). We

implemented a Monte Carlo simulation for an Ising and Potts model in SuperCollider3

(see figure 7.2.3). The lattice is represented as a torus (see fig. 7.8) and continually

updated: for each lattice point, a different spin state is proposed, and the new overall

energy calculated. As shown in equation C.3, it depends on the neighbour interactions (S_i S_j) and the overall temperature (given by the coupling J ∼ 1/T). If the new energy is

smaller than the old one, the new state is accepted. If not, there is still a certain chance

that it is accepted, leading to random spin flips representing the overall temperature.
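A minimal sketch of one such Metropolis sweep for a 2D q-state Potts model follows. The energy convention (aligned neighbour pairs lower the energy, with the temperature absorbed into the coupling J, cf. equation C.3) and all concrete values are illustrative assumptions; this is not the SonEnvir implementation itself.

    (
    var n = 16, q = 4, j = 1.0;                   // lattice length, number of states, coupling (~ 1/T)
    var lattice = Array.fill(n * n, { q.rand });  // random initial configuration
    var alignedNeighbours = { |site, state|
        var x = site % n, y = site.div(n);
        var nbs = [
            lattice[((x + 1) % n) + (y * n)], lattice[((x - 1 + n) % n) + (y * n)],
            lattice[x + (((y + 1) % n) * n)], lattice[x + (((y - 1 + n) % n) * n)]
        ];
        nbs.count { |s| s == state }              // neighbours aligned with the given state
    };
    (n * n).do { |site|                           // one sweep: propose a new state at every site
        var oldState = lattice[site];
        var newState = (oldState + 1 + (q - 1).rand) % q;
        var deltaE = j * (alignedNeighbours.value(site, oldState) - alignedNeighbours.value(site, newState));
        var accept = (deltaE <= 0) or: { exp(deltaE.neg).coin };
        if (accept) { lattice[site] = newState };
    };
    q.collect { |s| lattice.count(_ == s) }.postln;   // occupation of each spin state after the sweep
    )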

To observe the model and draw conclusions from it, usually mean values of observables

are calculated from the Monte Carlo simulation, e.g. the overall magnetisation. The

simulation needs time to equilibrate at each temperature in order to model physical


reality, e.g. with small or large clusters. Big lattices with a length of e.g. 100 need

many equilibration steps. With a typical evolution of the model, critical values or the

order of the phase transition can be deduced. This is not rigorously doable, as on a finite

lattice a function will never be truly discontinuous (compare figure 7.5). In a quantised system,

the ”jump” in the observable will just look more sudden for a first order phase transition.

This last point is both an argument for using sonification and a research goal for this

study: by using more information than the mean values, the order of the phase transition

can be more clearly distinguished. Also, we studied different phase transitions with the

working hypothesis that there might be principal differences in the fluctuations, which

can be better heard. (A Potts model with q ≤ 4 states has a continuous phase transition,

whereas with q ≥ 5 states it has a phase transition of first order.) Thus researchers may

gain a quick impression of the order of the phase transition.

Implementing spin models

In all the analytical approaches, the solving procedures of models are based on abstract

mathematics. This gives great insight in the universal basics of critical phenomena, but

often a quick glance on a graph complements classical analysis, as mentioned above.

Thus in areas where visualisation cannot be done, applying sonification can help to reach

an intuitive understanding with relatively few underlying assumptions. Sonification tools

can also serve as monitoring devices for highly complex and high dimensional simulations.

The phases and the behaviour at the critical temperature can be observed. Finally, we

were particularly interested in sonification of the critical fluctuations with self-similar

clusters on all orders of magnitude.

We wanted to provide for a more or less direct observation of data on all levels of the

analysis, both to verify assumptions and to not overlook new insights. This should be

done by observing the dynamic evolution of the spins, not only mean values. Thus,

the important characteristic of spin fluctuations can be studied and the entire system

continuously observed.

Spin model data features

Spin models have several basic characteristics, which were used in different sonifica-

tion approaches. These properties refer to the structure of the model, the theoretical

background and its interpretation, and they were exploited for the sonification as follows:

• The models are discrete in space, with fixed lattice positions filled with discrete-valued spins. The data sets are rather big, on the order of a lattice size

of 100 in two or three dimensions, and are dynamically evolving. Because of the


Figure 7.6: GUI for the running 4-state Potts Model in 2D.

The GUI shows the model in a state above critical temperature, where large clusters emerge.

The lattice size is 64x64. The averages below the spin frame show the development of the mean

magnetisation for the 4 spin parities over the last 50 configurations. As the temperature is

constant and the system has been equilibrated before, these mean values are rather constant.


specifics of the modeling, the simulations are only correct on the statistical aver-

age, and many configurations have to be taken into account together for correct

interpretation. Considering that a single auditory event has to have some mini-

mum duration to display perceptually distinguishable characteristics, we explored

two options for the auditory display: a fast audification approach, and omission,

i.e. representing only a subset of all spins, using a granular approach.

• The models are calculated by next-neighbour interaction aligning the spins on the

one hand, and random fluctuations on the other. We aimed to preserve the next-

neighbour property at least partially by different strategies of moving through the

data frame: either along a conventional torus path, or along a Hilbert-curve, see

fig. 7.8 (in approaches 7.2.4, 7.2.5 and 7.2.7). For the lossy (omission) approach,

the statistical nature of the model was preserved by picking random elements for

the granular sonification.

• There is a global symmetry in the spins, thus - in the absence of an exterior

magnetic field - no spin orientation is preferred. This was mapped for the Ising

model by choosing the octave for the two spin parities. In the audifications, every

spin orientation is assigned a fixed value, and symmetry is preserved as the sound

wave only depends on the relative difference between consecutive steps in the

lattice.

• At the critical point of phase transition, the clusters of spins become self-similar

on all length scales. We tried to use this feature in order to generate a different

sound quality at the point of phase transition. This would allow a clear distinc-

tion between the two phases and the (third) different behaviour at the critical

temperature itself.

7.2.4 Audification-based sonification

In this approach, we tried to utilise the full available information generated by the model.

As the Sonification Design Space Map suggests audification for higher density auditory

display, we interpreted the spins within each time instant as a waveform (see figure 7.7).

This waveform can be listened to directly or taken as a modulator of a sine wave.5 When

the temperature is lowered, regular clusters emerge, changing only slowly from time step

to time step. Thus, if the audification preserves locality, longer structures will emerge

aurally as well, resulting in more tone-like sounds. When one spin dominates, there is

silence, except for some random thermal fluctuations at non-zero temperature.

5While this would not qualify as an audification by the strictest definition, such a simple modulation

is still conceptually quite close.


Figure 7.7: Audification of a 4-state Potts model.

The first 3 milliseconds of the audio file of a model with 4 different states in the high temper-

ature phase (noise).

Figure 7.8: Sequentialisation schemes for the lattice used for the audification.

The left scheme shows a torus sequentialisation, where spins at opposite borders are treated as neighbours. This treats the 2D grid like a torus (a doughnut shape), as it is read row by row. On the right side a Hilbert curve is shown.

While fig. 7.7 explains handling of one line of data for the sonification, the question

remains how to move through all of them. Different approaches of sequentialisation

are shown in fig. 7.8. The model has periodic boundary conditions, so a torus path is

possible. We also experimented with moving through the lattice along a Hilbert curve.

This is a space-filling curve for square geometries, reaching every point without intersecting itself. This was intended to make the audification insensitive to differences

which arise depending on whether rows or columns are read first, which can occur in the

case of symmetric clustering. Eventually, it turned out that symmetric clustering mainly

depends on unfavorable starting conditions and occurs only rarely, so we mostly used a

torus path, as the model does in the calculation.
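As a minimal illustration of this torus-path audification (again sketched in Python rather than SuperCollider3, with the amplitude mapping as an assumption), one configuration can be flattened row by row into an audio frame:

    import numpy as np

    def lattice_to_waveform(lattice, q):
        """Read one lattice configuration row by row (the torus path) and map
        the q spin values symmetrically onto amplitudes between -1 and +1."""
        amps = np.linspace(-1.0, 1.0, q)   # e.g. q = 2 gives -1 / +1
        return amps[lattice.ravel()]

    # a 32x32 configuration yields 1024 samples, i.e. about 23 ms at 44.1 kHz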

The sounds were recorded directly from the interactive model, using the GUI shown in

fig. 7.6 for a specific temperature. In order to judge the phase of the system, this

simple method is most efficient.


At the time of recording, the model has already been equilibrated - its state represents

a typical physical configuration for the specific temperature. When the temperature is

cooled down continually, the system needs several transition steps at each new temper-

ature before the data represents the new physical state correctly. Thus, in a second

approach, data was pre-recorded and stored as a sound-file. Contrary to our assump-

tions, the continuous phase transition is not very clearly distinguishable from the first

order phase transition. This is partly due to the data - on a quantised lattice there

are no truly continuous observables, so the distinction between first and second order

transitions is fuzzy in principle.

A fundamental problem is that the equilibration steps (which are not recorded!) between

the stored configurations cut out the meaningful transitions between them: That these

equilibration steps are needed at all is in fact a common drawback in the established

computational spin models. When one considers every complete lattice state as one

sequence of single audio samples (e.g. 32x32 = 1024 lattice sites), then with a sampling

frequency of 44100 Hz, at every 23 ms a potentially completely different state is rendered,

instead of a continuously evolving system with only few changes in the cluster structures

from one frame to the next. This makes it more difficult to understand the dynamic

evolution of the transitions. We tried to leave out as few equilibration steps as possible

to stick closely to a physically relevant state and still keep the transitions understandable.

Consequently, for a 32x32 lattice we recorded e.g. every 32nd step, and in total 10 different couplings (temperatures), with 32 recorded steps each. Thus, our soundfiles (described

in appendix C.2) have (32 x 32) lattice sites x 10 couplings x 32 record steps = 327680

samples, and last 7.4 s. Still, when comparing a 4-state Potts model to one with 5 spin

states, the change in the audio pattern is only slightly more sudden in the latter.
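The bookkeeping behind these numbers is simple enough to restate as a small sketch (values taken from the text above):

    lattice_sites = 32 * 32     # samples per recorded configuration
    couplings = 10              # temperatures recorded
    record_steps = 32           # configurations kept per coupling (every 32nd step)
    sample_rate = 44100         # Hz

    total_samples = lattice_sites * couplings * record_steps   # 327680
    duration = total_samples / sample_rate                     # ~7.4 s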

7.2.5 Channel sonification

We refined the audification approach by recording data for each spin separately. This

concept is shown in figure 7.9. All of the lattice is sequentialised like a torus (see fig.

7.8) and read out for every spin state separately. When data of spin A is collected, only

lattice sites with spin A are set to 1; all the others to 0. Conversely, when spin B data is collected, all lattice sites with spin A are set to 0, and those with spin B to 1; and so forth.
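A compact way to express this per-spin separation (a Python sketch under the same assumptions as above, not the project code) is to turn one flattened configuration into q indicator channels:

    import numpy as np

    def spin_channels(lattice, q):
        """Split one configuration, read along the torus path, into q channels:
        channel s is 1 wherever the lattice holds spin s, and 0 elsewhere."""
        flat = lattice.ravel()
        return np.stack([(flat == s).astype(float) for s in range(q)])

    # for the Ising model (q = 2) the two channels are exact complements,
    # which is the source of the phase cancellation mentioned below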

Thus, the different spins are separate and can be played on different channels. One

remaining problem is that the channels are highly correlated: in the Ising model with only

2 states, the 2 channels are exact complements of each other. Thus there may be phase cancellations in the listening setup that make it harder to distinguish the channels. Still, the overall

impression is clearer than the simple audification, and this approach is the most promising

regarding the order of the phase transition.


Figure 7.9: A 3-state Potts model cooling down from super- to subcritical state.

The three states are recorded as audio channels, shown here with time from left to right.

Toward the end, channel 2 dominates.

7.2.6 Granular sonification

In this approach, the data were pre-processed, which allowed for designing less fatiguing

sounds. Also, more sophisticated considerations can be included in the sonification

design.

In a cloud sonification we first sonified each individual spin as a very short sound grain,

and played them at high temporal density. A 32x32 lattice (1024 points) can be played

within one second, and allowing some overlap, this leaves on the order of 3 ms for each

sound grain. One second is a longer than desirable time for going through one entire

time instant, but this is simply a trade-off between representing all the available data

for that time instant, and moving forward in time fast enough. For bigger lattices, this

approach is too slow for practical use.

Thus the next step was calculating local mean values. We took random averaged spin

blocks in the Ising model6, see figure 7.10, so the data was pre-processed for the soni-

fication, and we did not use all available information. At first, for each configuration a

6In this sonification we stayed with the simpler Ising model due to realtime CPU limitations, but the

results do transfer to the Potts model.


Figure 7.10: Granular sonification scheme for the Ising model.

The spatial location of each randomly chosen spin block within the grid determines its spa-

tialisation, and its averaged value determines pitch and noisiness of the corresponding grain.

few lattice sites are chosen; then for each site, the average of its neighbouring region

is calculated, giving a mean magnetic moment between −1 (all negative) and +1 (all

positive); 0 meaning the ratio of spins is exactly half/half. This information is used

to determine the pitch and the noisiness of a sound grain. The more the spins in one

block are alike, the clearer the tone (either lower or higher), the less alike, the noisier

the sound. The location of the block in 3D space is conveyed by the spatial position of the sound

grain.7 The soundgrains are very short and played quickly after one another from differ-

ent virtual regions. With this setting, a three-dimensional gestalt of the local state of a

cubic lattice is generated around and above the listener.

Without seeing the state of the model, a clear picture emerges from the granular sound

7This spatial aspect can only be properly reproduced with a multi-channel sound system. We

adapted the settings for the CUBE, a multi-functional performance space with a permanent multi-

channel system at the IEM Graz. Using the VirtualRoom class described in section 5.5, one can also

render this sonification for headphones.


texture, and also untrained listeners can easily distinguish the phases of the model.
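The mapping from an averaged spin block to grain parameters can be sketched as follows; this is an illustrative Python reduction to a 2D Ising lattice with hypothetical pitch and position ranges, not the actual SonEnvir grain synthesis.

    import numpy as np

    def grain_params(lattice, block=4, rng=None):
        """Pick one random block of an Ising lattice (+1/-1 spins) and derive
        grain parameters from its mean magnetic moment."""
        rng = rng or np.random.default_rng()
        L = lattice.shape[0]
        i, j = rng.integers(L - block), rng.integers(L - block)
        m = lattice[i:i + block, j:j + block].mean()    # in [-1, +1]
        return {
            'pitch_hz': 440.0 * 2 ** m,     # |m| near 1: clear low or high tone (assumed range)
            'noisiness': 1.0 - abs(m),      # m near 0: mixed spins, noisy grain
            'position': (i / L, j / L),     # block location drives spatialisation
        }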

7.2.7 Sonification of self-similar structures

To study one particular aspect of the above approach, we looked at self-similar structures at the point of phase transition by sonification. Music has been considered to exhibit self-similar structures, beginning with Voss and Clarke (1975, 1978); later on, the

general popularity of self-similarity within chaos theory has also extended to computer

music, and the hypothesis that self-similar structures may be audible has led to a lot of

experimentation and compositions with such conceptual background.

In internal listening tests we tried to display structures on several orders of magnitude in

parallel. These were calculated by a blockspin transformation, which returns essentially

the spin orientation of the majority of points in a region of the lattice. It was our goal to

make such structures of different orders of magnitude recognisable as similarly moving

melodies, or as a unique sound stream with a special sound quality.
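The blockspin transformation itself is straightforward; a minimal Python sketch (majority rule for an Ising lattice of +1/-1 spins, with ties broken towards +1 as an arbitrary choice) reads:

    import numpy as np

    def blockspin(lattice, b=2):
        """Replace each b x b block of an Ising lattice by the majority spin."""
        L = lattice.shape[0]
        sums = lattice.reshape(L // b, b, L // b, b).sum(axis=(1, 3))
        return np.where(sums >= 0, 1, -1)

    # applying this repeatedly yields the coarser structures compared in fig. 7.11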

Figure 7.11: A self similar structure as a state of an Ising model.

This is used as a test case for detecting self similarity. Blockspins are determined by the

majority of spins of a certain region.

In our design, three orders of magnitude in the Ising model were compared to each

other, as shown in figure 7.11. The whole lattice (on the right side - with the least

resolved blockspins) was displayed in the same time as a quarter of the middle structure and an eighth of the finer blockspin structure (second from the left). The original spins

are shown on the left. Comparing three simultaneous streams for similarities turned out

to be a demanding cognitive task: Trying to follow three streams and comparing their

melodic behaviour at the same time is not trivial, even for trained musicians. Thus

we experimented with an alternative: the three streams representing different orders of

magnitude are interleaved quickly. When the streams are self-similar, one only hears a

single (random) stream; as soon as one stream is recognisably different from the others,

a triple grouping emerges. While this method works well with simple test data as shown


in fig. 7.11, we could not verify self-similarities in noisy data of running spin models. We

suspect that self-similar structures do not persist long enough for detection in running

models, but for time reasons did not pursue this further.

7.2.8 Evaluation

Domain Expert Opinions

A listening test with statistical analysis was not appropriate as there were not enough

subjects available familiar with researching spin models. Thus, as a qualitative evaluation

we obtained opinions from experts in the field. These were four professors of Theoretical

Physics in Graz, who were not directly involved in the sonification designs. The results were explained to them, and they were asked a few questions on the applicability and usefulness of the results.

The overall attitude may be summed up as curious but rather sceptical, even if the opin-

ions differed in the details. Asked whether they themselves would use the sonifications,

all of them answered they would do so only for didactic reasons or popular scientific

talks. The possibility of identifying different phases was acknowledged but was not seen

as superior to other methods (e.g. studying graphs of observables, as would be the stan-

dard procedure). One subject remarked that, for research purposes, the 'aha-moment'

was missing. This might be due to the fact that the Ising and Potts model have both

been studied for decades and are well understood. As the data is mainly thermal noise, there is only little information to extract. Our sonifications reveal no new physical findings for the models we chose. A three-dimensional display seemed interesting to

the experts, even if the dimensions are not experienced explicitly (in the audification

approach there is a sequentialisation for displaying one dimension) and the sound grain

approach as implemented only applies for three physical dimensions.

Another application that was discussed is a quick overview over large data sets: e.g.

checking numerical parameters (that there are enough equilibration steps, for instance)

or getting a first impression of the order of the phase transition. This seems plausible

to all subjects, even if the standard procedure, e.g. a program for pattern recognition,

would still be equivalent and - given the familiarity with such tools - preferable to them.

The main point of criticism was the idea of a qualitative rather than quantitative approach towards physics, which is seen as a possible didactic tool but 'not hard science'. General

sonification problems were discussed as well: it was noted that visualisation techniques

play an increasingly important role in science, and that they are tough competitors. Sonification is also at a disadvantage with regard to the current state of publishing.

Besides this expected scepticism, it can be remarked that all subjects immediately heard the differences in the sound qualities. Metaphors for the sounds came up spontaneously during the introduction, e.g. boiling water for the point of phase transition. The


experts came up with several ideas for future projects to discuss; this kind of interest is

an encouraging form of feedback.

Conclusions and Possible Future Work

Spin models are interesting test cases for studying sonification designs for running mod-

els. We implemented both Monte Carlo simulations of Potts and Ising models and

sonification variants in SuperCollider3. These models produce dynamically evolving data

with their main characteristics being fluctuations of single spins; although analytically

well defined, finite computational models can only reproduce a numerical approximation

of the predicted behaviour, which has to be interpreted.

A number of different sonifications were designed in order to study different aspects

of these spin models. We created tools for the perceptualisation of lattice calculations

which are extensible to higher dimensions and a higher number of states. They allow both

observing running models, and analysing pre-recorded data to obtain a first impression

of the order of the phase transition.

Experimenting with alternative sonification techniques for the same models, we found

differing sets of advantages and drawbacks: Granular sonification of spin blocks gives

a reliable classification of the phase the system is in, and allows observing running simulations, exploiting the random behaviour of spin models. Audification-based tools allow

us to make use of all the available data, and even track each spin orientation separately

in parallel. This tool is used to study the order of the phase transition. Additionally, we

worked on sonifications of self similar structures.

With this study, sonification was shown to be an interesting complementary data repre-

sentation method for statistical physics. Useful future directions for extending this work

would include increased data quality and choices of different input models, which would

lead to classification tools for phase transitions that allow studying models of higher

dimensionality. Continued work in this direction could lead to applications in current re-

search questions in the field of computational physics. The research project QCDAudio

hosted at IEM Graz with SonEnvir participant Kathi Vogt as lead researcher will explore

some of these directions.


Chapter 8

Examples from Speech Communication and

Signal Processing

The Signal Processing and Speech Communication Laboratory at TU Graz focuses on

research in the area of non-linear signal processing methods, algorithm engineering and

applications thereof in speech communication and telecommunication. After investigat-

ing sonification approaches to the analysis of stochastic processes and wave propagation

in ultra-wide-band communication (briefly mentioned in de Campo et al. (2006a)), the

focus for the last phase in SonEnvir was on the analysis of time series data.

In signal processing and speech communication, most of the data under study are se-

quences of values over time. There are many properties of time series data that interest

the researcher. Besides analysis in the frequency domain, the statistical distribution of

values provides important information about the data at hand. With the Time Series

Analyser, we investigated the use of sonification in analysing the statistical properties

of amplitude distributions in time series data. From the domain science’s point of view,

this can be used as a method for the classification of signals of unknown origin, or for the

classification of surrogate data to be used in experiments in telecommunication systems.

8.1 Time Series Analyser1

The analysis of time series data plays a key role in many scientific disciplines. Time

series may be the result of measurements, unknown processes or simply digitised signals

of a variety of origins. Although usually visualised and analysed through statistics, the

inherent relationship to time makes them particularly suitable for a representation by

means of sound.

1This section is based on the SonEnvir ICAD paper Frauenberger et al. (2007).


8.1.1 Mathematical background

The statistical analysis of time series data is concerned with the distribution of values

without taking into account their sequence in time. As we will see later, changing the

sequence of values in a time series completely destroys the frequency information while

keeping the statistical properties intact. The best known statistical properties of time series data are the arithmetic mean (8.1) and the variance (8.2).

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \qquad (8.1)

\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad (8.2)

However, higher order statistics provide more properties of time series data, describing

the shape of the underlying probability function in more detail. They all derive from the

statistical moments of a distribution defined by

\mu'_n = \sum_{i=1}^{n} (x_i - \alpha)^n P(x) \qquad (8.3)

where n is the order of the moment, α the value around which the moment is taken

and P(x) the probability function. The moments are most commonly taken around the mean, which is equivalent to the first moment µ1. The second moment around the mean (or second central moment) is equivalent to the variance σ2, i.e. the square of the standard deviation σ.

Higher order moments define the skewness and kurtosis of the distribution. The skewness

is a measure of the asymmetry of the probability function, meaning a distribution

has high skewness if its probability function has a more pronounced tail toward one end

than to the other. The skew is defined by

\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} \qquad (8.4)

with µi being the i-th central moment. The kurtosis describes the 'peakedness' of a

probability function; the more pronounced peaks there are in the probability function,

the higher the kurtosis of the distribution. It is defined by

\beta_2 = \frac{\mu_4}{\mu_2^2} \qquad (8.5)

Both values distinguish time series data and are significant properties in signal processing.
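For reference, the quantities of equations 8.1-8.5 can be computed from a sample directly; a small Python sketch follows (note that equation 8.5 yields 3 for a Gaussian, i.e. it is not the excess kurtosis):

    import numpy as np

    def skew_kurtosis(x):
        """Sample skewness (eq. 8.4) and kurtosis (eq. 8.5) via central moments."""
        x = np.asarray(x, dtype=float)
        d = x - x.mean()
        m2 = (d ** 2).mean()        # variance, second central moment
        m3 = (d ** 3).mean()
        m4 = (d ** 4).mean()
        return m3 / m2 ** 1.5, m4 / m2 ** 2

    # Gaussian noise gives skew ~ 0 and kurtosis ~ 3:
    # skew_kurtosis(np.random.default_rng(0).normal(size=100000))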

From the SDSM point of view, the inherent time line and the typically large numbers of

data values in time series data suggest the use of the most direct approach to auditory

perceptualisation - audification. When interpreted as a sonic waveform the statistical


properties of time series data become acoustical dimensions, which may be perceived:

The variance corresponds directly to the power of the signal, and hence (though non-

linearly) to its perceived loudness. The mean, however, is nothing more than an offset

and is not perceivable. The question of interest is whether the skewness and the kurtosis

of signals can be related to perceptible dimensions as well.

8.1.2 Sonification tools

In order to investigate the statistical properties of time series data by audification, we

first developed a simple tool that allows for defining arbitrary probability functions for

noise. Subsequently, we built a more generic analysis tool that makes it possible to

analyse any kind of signal. This tool was also used as the underlying framework for the

experiment described in section 8.2.

8.1.3 The PDFShaper

The PDFShaper is an interactive audification tool that allows users to draw probability

functions and listen to the resulting distribution as an audification in real-time. Figure

8.1 shows the user interface.

PDFShaper provides four graphs (top down): the probability function, the mapping func-

tion, the measured histogram and the frequency spectrum of the time series synthesised

as specified by the probability function. The tool allows the user to interactively draw

in the first graph to create different kinds of amplitude distributions. It then calculates

a mapping function which is defined by

C(x) = \int_0^x P(t)\, dt, \qquad g(x) = C^{-1}(x) \qquad (8.6)

where C(x) is the cumulative probability function and g(x) is a mapping function that, if applied to a uniformly distributed variable y, produces values according to the probability function P(t). This mapping function essentially shapes values from a uniform distribution into any desired probability function P(t).
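A minimal sketch of this shaping step (in Python, with an interpolated numerical inverse standing in for the PDFShaper internals, which are not documented here) looks as follows:

    import numpy as np

    def make_shaper(p, lo=-1.0, hi=1.0):
        """Given a sampled probability function p over [lo, hi] (as drawn in the
        GUI), return g = C^-1 so that g(uniform noise) is distributed like p."""
        p = np.asarray(p, dtype=float)
        x = np.linspace(lo, hi, len(p))
        c = np.cumsum(p)
        c = c / c[-1]                          # cumulative probability C(x)
        return lambda y: np.interp(y, c, x)    # numerical inverse of C

    # shape one second of uniform noise into a shifted-exponential-like distribution
    rng = np.random.default_rng()
    g = make_shaper(np.exp(np.linspace(-4.0, 0.0, 512)))
    shaped = g(rng.uniform(size=44100))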

In the screenshot shown, the probability function is drawn into the top graph as a shifted

exponential function. After applying the mapping function shown in the second graph to

white noise, the third graph shows the real-time histogram of the result. It approximately

resembles the target probability function. Note that both skew and kurtosis are relatively

high in this example as the probability function is shifted to the right and has a sharp

peak.


Figure 8.1: The PDFShaper interface

8.1.4 TSAnalyser

The TSAnalyser is a tool to load any time series data and analyse its statistical properties.

Figure 8.2 shows the user interface.

Besides providing statistical information about the file loaded (aiff format) it shows a

histogram and a spectrum. Its main feature is the ability to 'scramble' the signal.


Figure 8.2: The TSAnalyser interface

That is, it randomly re-orders the values in the time series and hence, destroys all

spectral information. When analysing amplitude distributions, the spectral information

is often distracting. Scrambling a signal will result in a noise-like sound with the same

statistical properties as the original. In the screenshot the loaded file is a speech sample

that comes with every SuperCollider installation. When scrambled, the spectrum at the

bottom shows an almost uniform distribution in the frequency domain.

Both PDFShaper and TSAnalyser are implemented in SuperCollider, and available as

part of the SonEnvir Framework by svn here2.

2https://svn.sonenvir.at/svnroot/SonEnvir/trunk/src/Framework/


8.2 Listening test

The experiment described here was designed to investigate whether the higher order

statistical properties of arbitrary time series data are perceptible when rendered by au-

dification. If so, what are the perceptual dimensions that would correlate to these

properties, and what are the just noticeable difference levels?

8.2.1 Test data

The first challenge in designing the experiment was to create appropriate data. They

should not contain any spectral information and the statistical properties should be

fully controllable, ideally independently. Unfortunately, it is a non-trivial task to define

probability functions with certain statistical moments, as this is an ill-defined problem.

We settled on a random number generator for the Lévy skew alpha-stable distribution (Wikipedia, 2007). It was chosen because it features parameters that directly control the resulting skew and kurtosis, which can also be made atypically high. It is defined by the

probability function

f(x; \alpha, \beta, c, \mu) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \varphi(t)\, e^{-itx}\, dt \qquad (8.7)

\varphi(t) = \exp\!\left( it\mu - |ct|^{\alpha} \left( 1 - i\beta\, \mathrm{sign}(t)\, \Phi \right) \right) \qquad (8.8)

\Phi = \tan\!\left( \frac{\pi\alpha}{2} \right) \qquad (8.9)

where α is an exponent, β directly controls the skewness, and c and µ are scale and location parameters. There is no analytic solution to the integral, but there are special cases in which the distribution behaves in specific ways. For example, for α = 2 the distribution reduces to a Gaussian distribution. Fortunately, the Lévy distribution was implemented

as a number generator in the GNU Scientific Library GSL, see GSL Team (2007). It

allows for generating sequences of numbers of any length for a distribution determined

by providing the α and β parameters.
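The project used the GSL generator; for readers without GSL, a comparable signal can be sketched in Python with SciPy's levy_stable distribution (assuming it is available; its parameter conventions may differ in detail from equations 8.7-8.9), normalised to the variance used in the experiment:

    import numpy as np
    from scipy.stats import levy_stable

    def stable_noise(alpha, beta, n=3 * 44100, target_var=0.001, seed=0):
        """Draw n alpha-stable samples (alpha: tail exponent, beta: skewness)
        and normalise the finite sample to the target variance."""
        x = levy_stable.rvs(alpha, beta, size=n, random_state=seed)
        x = x - x.mean()
        return x * np.sqrt(target_var / x.var())

    sig = stable_noise(alpha=1.9, beta=0.5)   # heavier tails than Gaussian, slight skew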

For the experiment we generated 24 signals with skew values ranging from -0.19 to 0.25

and kurtosis ranging from 0.17 to 14. It turned out to be impossible to completely de-

couple skew from kurtosis. So, we decided to generate two sets, one that has insignificant

changes in skew, but a range in kurtosis of 0.16 to 14, while the other set covered the full

range for skew and 0.15 to 5 for kurtosis. All signals were normalised to have a variance

of 0.001 and were 3 seconds long (at a samplerate of 44.1 kHz) with 0.2 seconds fade-in

and fade-out times.


8.2.2 Listening experiment

The experiment was designed as a similarity listening test. Participants were listening to

sequences of three signals and had to select the two signals they perceived as being most

similar. Each sequence was composed of the signal under investigation (each of the 24),

a second randomly chosen signal out of the 24, and the first signal scrambled; the three

signals were in random order. It was pointed out to participants that they would not hear

two exactly identical sounds within the sequence, but they were asked to select the two

that sounded most similar. The signal under investigation and its scrambled counterpart

were essentially different signals, but shared identical statistical properties. It was not

specified which quality of the sound they should listen for to make this decision. This

and the scrambling were done to make sure that participants focused on a generic quality

of the noise rather than specific events within the signals.

After a brief written introduction into the problem domain and the nature of the exper-

iment, participants started off with a training phase of three sequences to learn the user

interface. For this training phase, the signals with the largest differences in skew and

kurtosis were chosen to give people an idea of what to expect. Subsequently, each of the

sets was played; set one with 9 sequences, set two with 15. The order of the sets was alternated between participants. Participants were also able to replay the sequence as

often as they wished and adjust the volume to their taste. Figure 8.3 shows the user

interface used.

Figure 8.3: The interface for the time series listening experiment.

A post-questionnaire asked for the sound quality participants used to distinguish the

signals and asked them to assign three adjectives to describe this quality. Furthermore,

participants were asked whether they could tell any difference between the sets,

and whether they felt there was any learning effect, i.e., whether the task became easier

during the experiment.


8.2.3 Experiment results

Eleven participants took part in the experiment, most of them working colleagues or

students at the institute. Four participants were members of the SonEnvir team and

had more substantial background on the topic which, however, did not seem to have any

impact on their results.

The collected data shows that there is a significant increase in the probability of choosing

the correct signals as the difference in kurtosis and skew increased. Figure 8.4 shows

the average probabilities in four different ranges of ∆ kurtosis. The skew in this set

was nearly constant (±0.001), so the resulting difference in correct answers is related

to the change in ∆ kurtosis.

Figure 8.4: Probability of correctness over ∆ kurtosis in set 1

While up to a difference of 5 in kurtosis the probability is only insignificantly higher than 0.333 (the probability of random answers), and even

decreases, there is a considerable increase thereafter, topping at over 70% at differences

of around 11. This indicates that 5 is the threshold for just noticeable differences for

kurtosis. This is also supported by the results from set 2 as shown in figure 8.5.

For skewness the matter was more difficult as we had no independent control over

it. Although the data from set 2 suggest that there is an increase in probability with

increasing difference in skew (as shown in figure 8.6), this might also be related to the

difference in kurtosis.

Figure 8.5: Probability of correctness over ∆ kurtosis in set 2

Figure 8.6: Probability of correctness over ∆ skew in set 2

Looking at the probability of correctness over both the difference in kurtosis and the difference in skew (as in figure 8.7) reveals that it is unlikely that the increase is related

to the change in ∆ skewness. While in every spine in which ∆ skew is constant the


probability increases with increasing ∆ kurtosis, this is not the case vice versa.

Figure 8.7: Probability of correctness over ∆ skew and ∆ kurtosis in set 2

Summarising, we found evidence that participants could reliably detect changes in kur-

tosis greater than 5, but we did not find sufficient evidence for the case of skewness. This

may indicate that we need to use a different dataset which has bigger differences in

skew while having small values for the kurtosis. However, for this another family of

distributions must be found.

The number of times participants used the replay option seemed to have no impact on

their performance. Figure 8.8 shows the number of replays of all data points over ∆

kurtosis. Red crosses indicate correct answers, black dots incorrect answers. Although

participants replayed the sequence more often when the difference in kurtosis was small,

there is no evidence that they were more successful when using more replays.

The answers to the post-questionnaire must be seen in the light of the data analysis

above. The quality participants reported as driving their decisions must be linked to the

kurtosis rather than skewness in the signal. The most common answers for this quality

were crackling and the frequency of events. Others included roughness and spikes.

However, some participants also stated that they heard different colours of noise and

other artefacts related to the frequency spectrum. This is a common effect when being

exposed to noise signals for a longer period of time. Even if the spectrum of noise

is not changing at all (as in our case), humans often start to imagine hearing tones


and other frequency related patterns.

Figure 8.8: Number of replays over ∆ kurtosis in set 2

Asked for adjectives to describe the quality the

participants provided cracking, clicking, sizzling, annoying, rhythmic, sharp, rough and

bright/dark. In retrospect, this correlates nicely with the kurtosis being the 'peakedness'

of the probability function.

There was no agreement over which set was easier. Most participants said there was

hardly any difference, while some named one set or the other. Finally, on average people felt that there was no learning curve involved, and the examples were short enough for them not to get too tired of listening to them.

8.2.4 Conclusions

In this section we presented an approach for analysing statistical properties of time series

data by auditory means. We provided some background on the mathematics involved

and presented the tools for audification of time series data that were developed. Subse-

quently, we described a listening test designed to investigate the perceptual dimensions

that would correlate with higher order statistical properties like skew and kurtosis. We

discussed the data chosen and the design of the experiment. The results show that

there is evidence that participants improved in distinguishing noise signals as the dif-

ference in kurtosis increased. The data suggests that in this setting the just noticeable

difference was 5. However, for skew we were not able to find similar evidence. In a


post-questionnaire we probed for the qualities that participants used to distinguish the

signals and obtained a set of related adjectives.

Future work will have to investigate why there was nothing to be found for skewness

in the signals. It might have been the case that our range of values did not allow

for segregation by skew, and a different source for data will have to be found to have

independent control over skew. However, it might also be the case that skew is not

perceivable in direct audification and a different sonification approach has to be chosen

to make this property perceptible.

In SDSM terms, the listening experiment respected the 3 second echoic memory time

limit, maximising the number of data points to fit into that time frame by audifying at

a samplerate of 44.1 kHz.


Chapter 9

Examples from Neurology

9.1 Auditory screening and monitoring of EEG data

This chapter describes two software implementations for EEG data screening and realtime

monitoring by means of sonification. Both have been designed in close collaboration with

our partner institution, the University Clinic for Neurology at the Medical University Graz.

Both tools were tested in depth with volunteers, and then tested with the expert users

they are intended for, i.e. neurologists who work with EEG data daily. In the course

of these tests, a number of improvements to the designs were realised; both the tests

and the final versions of the tools are described in detail here. This scope of reported

work is intended to provide an integrated description and analysis of all aspects of the

design process from sonification design issues, interaction choices, user acceptance, to

steps towards clinical use.

This work is described with much more neurological background in the PhD thesis by

Annette Wallisch (Wallisch (2007), in German). This chapter is based on a SonEnvir

paper for ICAD 2007 (de Campo et al. (2007)), and this work is also briefly documented

online in the SonEnvir data collection1, with accompanying sound examples.

9.1.1 EEG and sonification

As the general background for EEG and sonification is covered extensively in a number

of papers (Baier and Hermann (2004); Hermann et al. (2006); Hinterberger and Baier

(2005); Mayer-Kress (1994); Meinicke et al. (2002)), it is kept rather brief here.

EEG is short for electroencephalogram, i.e. the registration of the electrical signals

coming from the brain that can be measured on the human head. There are standard

systems specifying where to place electrodes on the head, called montages; e.g., the so-called

10-20 system, which spaces electrodes at similar distances over the head (see Ebe and

1 http://sonenvir.at/data/eeg/


Homma (2002) and many other EEG textbooks).

The signal from a single electrode is often analysed in terms of its characteristic frequency

band components: The useful frequency range is typically given as 1-30 Hz, sometimes

extended a little higher and lower. Within this range, different frequency bands have been

associated with particular activities and brain states; e.g., the 'alpha' range is between 8 and 13 Hz, associated with a general state of relaxation and non-activity of the brain

region for visual perception; thus alpha activity is most prominent with eyes closed.

For both sonification designs presented, we split the EEG signal into frequency ranges

which closely correspond to the traditional EEG bands2, as shown in table 9.1.

Table 9.1: Equally spaced EEG band ranges.

EEG band name frequency range

deltaL(ow) 1 - 2 Hz

deltaH(igh) 2 - 4 Hz

theta 4 - 8 Hz

alpha (+ mu) 8 - 16 Hz

beta 16 - 32 Hz

gamma 32 - 64 Hz

9.1.2 Rapid screening of long-time EEG recordings

For a number of neurological problems, it is standard practice to record longer time

stretches of brain activity. A stationary recording usually lasts more than 12 waking

hours; night recordings are commonly even longer, up to 36 hours. For people with

so-called 'absence' epileptic seizures (often children), recordings with portable devices

are made over similar stretches of time. These recordings are then visually screened, i.e.

looked through in frames of 20-30 seconds at a time; this process is both demanding

and slow.

For the particular application toward 'absences', rapid auditory screening is ideal: these seizures tend to spread over the entire brain, so the risk of choosing only a few electrodes

to screen acoustically is not critical; furthermore, the seizures have quite characteristic

features, and are thus relatively easy to identify quickly by listening. For more gen-

eral screening, finding time regions of interest quickly (by auditory screening) potentially

reduces workload and increases overall diagnostic safety. With visual and auditory screen-

ing combined, the risk of failing to notice important events in the recorded brain activity

2The alpha band we employ is slightly wider than the common 8-13 Hz; we merge it with the slightly

higher mu-rhythm band to maintain equal spacing.


is quite likely reduced.

9.1.3 Realtime monitoring during EEG recording sessions

A second scenario that benefits from sonification is realtime monitoring while recording

EEG data. This is a long-term attention task: an assistant stays in a monitor room next

to the room where the patient is being recorded; s/he watches both a video camera view

of the patient, and the incoming EEG data on two screens. In the event of atypical EEG

activity (which must be noticed, so one can intervene if necessary), a patient may or

may not show peculiar physical movements. Watching the video camera, one can easily

miss atypical EEG activity for a while.

Here, sonification is potentially very useful, because it can alleviate constant atten-

tion demands: One can easily habituate to a background soundscape, which is known

to represent 'everything is normal'. When changes in brain activity occur, the sound-

scape changes (in most cases, activity is increased, which increases both volume and

brightness), and this change in the realtime-rendered soundscape automatically draws

attention.

A sonification design that aims to render EEG data in real time is also useful for studying

brain activity as recorded by EEG devices at its natural speed: One can easily portray

activity in the traditional EEG frequency bands acoustically; as many of the phenomena

are commonly considered to be rhythmical phenomena, auditory presentation is partic-

ularly appropriate here, see Baier et al. (2006). Realtime uses of biosignals have other

applications too, see e.g. Hinterberger and Baier (2005); Hunt and Pauletto (2006).

9.2 The EEG Screener

9.2.1 Sonification design

For rapid EEG data screening, there is little need for an elaborate sonification design.

As the signal to be sonified is a time signal, and a signal speed of several tens of thousands of data points per second is deemed useful for screening, straightforward audification is the

obvious choice recommended by the Sonification Design Space Map. Not doing any

other processing allows for keeping the rich detail of the signals entirely intact. With

common EEG sampling rates around 250 Hz, a typical speedup factor is 60x faster than

real time, which transposes our center band (alpha, 8-16Hz) to 480-960 Hz, well in the

middle of the audible range. For more time resolution, one can go down to 10x, or for

more speedup, up to 360x. See Figure 9.1 for locations on the Sonification Design Space

Map.
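The effect of the speedup factor on the band positions is easy to verify; a one-off Python check, with the EEG sampling rate and band edges as given above:

    eeg_rate = 250                              # Hz, typical EEG sampling rate
    for speedup in (10, 60, 360):
        playback_rate = eeg_rate * speedup      # audified sample rate
        lo, hi = 8 * speedup, 16 * speedup      # transposed alpha band
        print(speedup, playback_rate, lo, hi)   # x60 -> 15000 Hz, alpha at 480-960 Hz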


Figure 9.1: The Sonification Design Space Map for both EEG Players.

As there is no total size for EEG files (they can be anything from a few minutes to 36 hours

and more), Data Anchors are given for one minute and for one hour (center, and far right).

The labels Scr x10, Scr x60, and Scr x360 show the map locations for minimum, default, and

maximum settings of speedUp, i.e. the time scaling of the EEGScreener (bottom right). The

labels RTP 1band and RTP 6bands show the locations for a single band and all six bands of

the EEGRealtimePlayer. Note that the use of two audio channels moves both of these designs

inwards along the ’number of streams’ axis, which is not shown here for simplicity.

This allows for wide ranges of time scales of local structures in the data to be put into

the optimum time window (the ca. 3 second window of echoic memory, see section 5.1

and de Campo (2007b)), while keeping the inner EEG bands well in the audible range;

if needed, one can compensate for reduced auditory sensitivity to the outer bands by

raising their relative amplitudes.

A lowpass filter for the EEG signal is available from 12 to 75 Hz, with a default value at

30 Hz, to provide the equivalent of visual smoothing used in EEG viewer software. Our

users wanted that feature, and it is a simple way to reduce higher band activity, which

is mostly considered noise (from a visual perspective that is).

A choice is provided between the straight audified signal, and a mix of six equal-

bandwidth layers, which can all be individually controlled in volume. This allows both

for focused listening to individual bands of interest, and for identification of the EEG

band in which a particular audible component occurs. A further reason to include this


Figure 9.2: The EEGScreener GUI.

The top rows are for file, electrodes, and time range selection. Below the row for playback

and note-taking elements are the playback parameter controls, and band filtering display and

controls.

band-splitting was to introduce the concept in a simpler form, such that users could

transfer the idea to their understanding of the realtime player.

9.2.2 Interface design

The task analysis for the Screener demanded that a graphical user interface be simple

to use (low-effort, little training needed), fast, and able to keep reproducible

results of screening sessions. Furthermore, it should provide choices of what to listen

to, and visual feedback of what exactly one is hearing, and how. The GUI elements are

similar to sound file editors (which audio specialists are familiar with, but EEG specialists

usually are not).

File, electrode, and range selection


The button Load EDF is for selecting a file to be screened. Currently, only .edf3 files

are supported, but other formats are easy to add if needed. The text views next to it

(top line) provide file data feedback: file name, duration, and montage type the file was

recorded with4. The button Montage opens a separate GUI for choosing electrodes by

location on the head (see figure 9.3).

Figure 9.3: The Montage Window.

It allows for electrode selection by their location on the head (seen from above, the triangle

shape on top being the nose). One can drag the light gray labels and drop them on the white

fields ’Left’ and ’Right’.

The popup menus Left and Right let users choose which electrode to listen to on which

of the two audio channels. Like many soundfile editors, the signal views Left and Right

show a full-length overview of the signal of the chosen electrodes. During screening, the

current playback position is indicated by a vertical cursor.

The range slider Selection and the number boxes Start, Duration, End show the current

selection and allow for selecting a range within the entire file to be screened. The number

box Cursor shows the current playback position numerically. The signal views Left Detail

and Right Detail show the waveform of the currently selected electrodes zoomed in for

3 A common format for EEG files, see http://www.edfplus.info/

4 As edf files do not store montage information, this is inferred from the number of EEG channels in

the file; at our institution, all the raw data montage types have different numbers of channels.


the current selection.

Playback and note taking

The buttons Play, Pause, Stop start, pause, and stop the sound.

The button Looped/No Loop switches between once-only playback and looped playback

(with a click to indicate when the loop restarts). The button Filters/Bypass switches

playback between Bypass mode (the straight audified signal, only low-pass-filtered), and

Filters mode, the mixable band-split signal.

The button Take Notes opens a text window for taking notes during screening. The edf

file name, selected electrodes and time region, and current date are pasted in as text

automatically. The button Time adds the current playback time at the end of the notes

window’s text, and the button Settings adds the current playback settings (see below)

to the notes window text.

To let the user concentrate on listening while screening a file, it is possible to stay on

the notes window entirely: Key shortcuts allow for pausing/resuming playback (e.g. to

type a note), for adding the current time as text (so one can take notes for a specific

time), and for the current playback settings as text.

Playback Controls

These control the parameters of the screener’s sound synthesis.

speedUp sets the speedup factor, with a range between 10-360; the default value of

60 means that one minute of EEG is presented within one second. Note that this is

straightforward tape-speed acceleration, which preserves full signal detail. The option

to compare different time-scalings of a signal segment allows for learning to distinguish

mechanical (electrode movements) and electrical artifacts (muscle activity) from EEG

signal components. lowPass sets the cutoff frequency for the lowpass filter, range be-

tween 12 and 75 Hz, with a default of 30 Hz. clickVol sets the volume of the loop

marker click, and volume sets the overall volume.

In Bypass mode, only the meter views are visible in this section, and they display the

amount of energy present in each of the six frequency bands (deltaL, deltaH, theta,

alpha, beta, gamma). In Filters mode, the controls become available, and one can raise

the level of bands one wants to focus on, or turn down bands that distract from details

in other bands. The buttons All On / All Off allow for quickly resetting all levels to

defaults.

9.3 The EEG Realtime Player

The EEGRealtimePlayer allows listening into details of EEG data in real time (or up to

5x faster when playing back files), in order to follow temporal events in or near their


original rhythmic contour. This design (and its eventual distribution as a tool) has been

developed in two stages:

Stage one is a data player, which plays recorded EEG data files at realtime speed with

the same sonification design (and the same adjustment facilities) as the final monitor

application. This allows for familiarising users with the range of sounds the system can

produce, for experimenting with a wide variety of EEG recordings, and for finding settings

which work well for a particular situation and user. This stage is described here.

Stage two is an add-on to the software used for EEG recording, diagnosis, and admin-

istration of patient histories at the institute. Currently, this stage is implemented as a

custom version of the EEG recording software which simulates data being recorded now

(by reading a data file), and sending the ’incoming’ data by network on to a special

version of the Realtime player (i.e. the sound engine and interface). Here, the incoming

data is sonified with the same approach as in the player-only version. Eventually, this

second program is meant to be implemented within the EEG software itself.

9.3.1 Sonification design

The sonification design for real time monitoring is much more elaborate than that of the screener.

It was prototyped by Robert Holdrich in MATLAB, and subsequently adapted and im-

plemented for realtime interactive use in SC3 by the author. For a block diagram, see

fig. 9.4.

The EEG signal of each channel listened to is split into six bands of equal relative band-

width (one octave, 1-2, 2-4, ... 32-64 Hz). Each band is sonified with its own oscillator

and a specific carrier frequency: based on a user-accessible fundamental frequency base-

Freq, the carriers are by default multiples of baseFreq by integer numbers, 1, 2, ... 6. If

one wants to achieve more perceptual separation between the individual bands, one can

deform this overtone pattern with a stretch factor harmonic, where 1 is pure overtone

tuning:

carFreq = baseFreq ∗ i ∗ harmonic^(i−1) (9.1)

The carrier frequency in each band is modulated with the band-filtered EEG signal,

thus creating a representation of the signal shape details as deviation from center pitch.

The amplitude of each oscillator band is determined by the amplitude extracted from

the corresponding filter-band, optionally stretched by an expansion factor contrast; this

creates a stronger foreground/background effect between bands with low energy and

bands with more activity.

For realtime monitoring as a background task, a second option for emphasis exists:

high activity levels activate an additional sideband modulation at carFreq * 0.25, which

creates a new fundamental frequency two octaves lower. This should be difficult to miss

even when not actively attending.
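A sketch of the resulting carrier layout, again in Python; the default baseFreq of 120 Hz is only an example within the 60-240 Hz range mentioned below:

    import numpy as np

    def band_carriers(base_freq=120.0, harmonic=1.0, n_bands=6):
        """Carrier frequencies per EEG band following eq. 9.1; harmonic = 1 is
        pure overtone tuning, other values stretch or compress the spectrum."""
        i = np.arange(1, n_bands + 1)
        return base_freq * i * harmonic ** (i - 1)

    # band_carriers()              -> 120, 240, ... 720 Hz
    # band_carriers(harmonic=1.05) -> slightly stretched for better band separation
    # the emphasis option adds a sideband at carFreq * 0.25, two octaves below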


Figure 9.4: EEG Realtime Sonification block diagram.


Figure 9.5: The EEG Realtime Player GUI.

Note the similarities to the EEGScreener GUI; the main difference is the larger number of

synthesis control parameters.

Finally, for file playback, crossing the loop point of the current selection is acoustically

marked with a bell-like tone.

9.3.2 Interface design

Most elements (buttons, text displays, signal views, notes window) have the same func-

tions as in the EEGScreener. The main difference to the EEGScreener is that there are

many more playback controls, since the sonification model (as described above) is much

more complex.

The Playback controls are ordered by importance from top to bottom:

contrast ranges from 1-4; values above 1 expand the dynamic range, making active


bands louder and thus moving them to the foreground relative to average-activity bands.

For background monitoring, levels between 2-3 are recommended.

baseFreq is the fundamental frequency of the sonification, between 60-240 Hz; this can

be tuned to user taste - and our users have in fact expressed strong preferences for their

personal choice of baseFreq.

freqMod is the depth of frequency modulation of the carrier for each band. At 0, one

hears a pure harmonic tone with varying overtone amplitudes, at greater values, the pitch

of the band is modulated up and down, driven by the filtered signal of that band. Thus

the signal details of the activity in that band are rendered in high perceptual resolution.

A value of 1 is normal deviation.

emphasis fades in a new pitch two octaves below baseFreq for very high activity levels;

this can be used for extra emphasis in background monitoring.

harmonic is the harmonicity of the carrier frequencies: A setting of 1 means purely

harmonic carrier frequencies, less compresses the spectrum, and more expands it; this

can be used to achieve better perceptual band separation.

clickVol sets the volume of the loop marker click, volume sets the overall volume of

the sonification, and speed controls an optional speedup factor for file playback, with a

range between 1-5, 1 being realtime; in live monitoring mode, this control is disabled.

Band Filter Controls and Views

The buttons All On and All Off allow for setting all levels to medium or zero. The meter

views show the amount of energy present in each of the six frequency bands, and the

sliders next to them set the volume of each frequency band.

9.4 Evaluation with user tests

9.4.1 EEG test data

For development and testing of the sonification players described, a variety of EEG

recordings - containing typical epileptic events and seizures - was collected. This database

was assembled at the Department for Epileptology and Neurophysiological Monitoring

(University Clinic of Neurology, Medical University Graz), by using the in-house archive

system. It contains anonymous data of currently or recently treated patients.

For the expert user tests, three data examples were chosen, suited to each player's

special purpose. For the Screener, rather large data sets were selected, to test with a

realistic usage example. Two measurements of absences and one day/night EEG with

seizures localized in the temporal lobe were prepared. The Realtime Player was tested

with three short data files; one a normal EEG (containing eye movement artefacts and


alpha waves), and two pathological EEGs (generalized epileptic potentials, and fronto-

temporal seizures).

The experts we worked with considered the use of audition in EEG-diagnostics very

unusual. We expected them to find it difficult to associate sounds with the events, so

they did some preliminary sonification training: For all data examples, they could look

at the data with their familiar EEG viewer software after having listened first, and try to

match what they had heard with the visual graphs familiar to them.

9.4.2 Initial pre-tests

An initial round of tests was done to get a first impression of usability and data appropriateness; it also contained experimental tasks (learning to listen). In order to obtain

independent and unbiased opinions, two interns were invited to test the first versions of

the screener and the realtime player by listening through the entire prepared database

at their own pace. They were instructed to take detailed notes of the phenomena they

heard (including inventing names for them), and where in which files; they spent roughly

40 hours on this task. The documentation of their listening experiments was then ver-

ified in internal re-listening and testing sessions. After these pre-tests, we decided to

reduce some parameter ranges to prevent users from choosing too extreme settings, and

we chose a smaller number of data sets for the second test round with expert users.

9.4.3 Tests with expert users

As the eventual success of these players depends on acceptance by the users in a clinical

setting, it was essential to do an evaluation with medical specialists. This was done

by means of two feedback trials; using the results of the primary expert test round,

the players were then improved in many details. For both players we made pre/post-

comparisons of user ratings between the different versions.

Even though we tested with the complete potential user group at our partner institution,

the test group is rather small (n=4); thus we consider the tests, and especially the open

question/personal interviews section, as more qualitative than quantitative data.

To prepare the four specialists for their separate test sessions, they were introduced to

the new aspects of data evaluation and experience by sonification in a group session.

For each EEG player a separate test session was scheduled to avoid ’listening overload’

and potential confusion.

Questionnaire

The questionnaire contained the following 11 scales:

Table 9.2: Questionnaire scales for EEG sonification designs

1 Usability
2 Clarity of interface functions
3 Visual design of interface
4 Adjustability of sound (to individual taste)
5 Freedom of irritation (caused by sounds)
6 Good sound experience (i.e. pleasing)
7 Allows for concentration
8 Recognizability of relevant events in data by listening
9 Comparability (of observations) with EEG-Viewer software
10 Practicality in Clinical Use (estimated)
11 Overall impression (personal liking)

The ratings to give for each statement ranged from 1 (strongly disagree) to 5 (strongly agree). In addition to the 11 standardized questions, space for individual documentation

and description was provided. Moreover, an open question asked for further comments,

observations, and suggestions.

Results of first expert tests

This initial round of tests resulted in a number of improvements in both players: Elab-

orate EEG waveform display and data range selection was added to both; the visual

layout was unified to emphasize elements common to both players; and the screener was

extended with band filtering, which is both useful in itself, and a good mediating step

toward the more complex realtime sonification design.

9.4.4 Analysis of expert user tests - EEG Screener 1 vs. 2

Optimizing the interface and interaction possibilities for version 2 of the Screener im-

proved most of its ratings substantially: it was considered to offer more comfortable use

(+1) and more attractive visual design (+1). The sound experience for the medical

specialists has improved somewhat (+0.5), while the freedom of irritation experienced

improved very much (+2.0). While all other criteria improved substantially, recognizabil-

ity of events, comparability with viewer software, and clinical practicality received lower

ratings (between -0.5 and -0.25). We suspect that the better rating in the first test round

may have been enthusiasm about the novelty of this tool. Indeed, personal conversations

with the expert users after the tests showed how strongly opinions differed: One user

did not feel ’safe’ and comfortable with the screener and could not trust his own hearing

skills enough to discriminate relevant information from (technical) artefacts by listening.


Figure 9.6: Expert user test ratings for both EEGScreener versions.

By contrast, the three others were quite relaxed and felt positively reassured to have

done their listening tasks properly and effectively. Furthermore, the users probably were

less motivated to compare the EEG viewer with the 'listening result' (which was asked

in one question), as they had done that carefully in the first tests already.

Overall, all users reported much higher satisfaction with version 2 of the screener (+1).

The answers in the open comments section can be summarized as follows: All users

confirmed better usability, design, clarity and transparency of version 2. Some improve-

ments were suggested in the visualization of the selected EEG channels, in particular

when larger files are analysed. Moreover, integration of the sonification into the real

EEG viewer would be appreciated a lot. A plug-in version of the player for the EEG-

Software used (NeuroSpeed by B.E.S.T. medical) was already in preparation before the

tests; in effect, the expert users confirmed its expected usefulness.

9.4.5 Analysis of expert user tests - RealtimePlayer 1 vs. 2

Figure 9.7: Expert user test ratings for both RealtimePlayer versions.

The mean ratings for the second realtime player version show a positive shift in nearly all scales of the questionnaire. Moreover, the range of the ratings is smaller than before, so the answers were more consistent. The best ratings were given for visual design (+1), adjustability of sound (+1) and comparability to viewer (+1.5), all estimated with 'good

to very good’. The ’overall impression’ was now estimated as ’good’ (+1), as well as

’usability’ (+0.5), ’clarity of interface’ (+0.5), and ’good sound experience’ (+1). The

aspects ’recognizability of relevant EEG events’ ( +1) and ’practical application’ (+1)

are estimated similarly satisfying. The only item that remains at the same mean rating

is ’freedom of irritation’, estimated as a little better than average. The same rating was

given for ’allowed concentration’ (+1.5), which has improved very much.

Probably, these two aspects correspond to each other: in spite of the improved control

of irritating sounds and a learning effect, the users were still untrained in coping with

the rather complex sound design. This sceptical position was taken in particular by two

users, affecting items 5 to 9. All in all, the ratings indicate good progress in the realtime

player's design. This may well have been influenced by the strong time constraints on these

tests: As our experts have very tight schedules in clinical work, it has been difficult to

obtain enough time for reasonably unhurried, pressure-free testing.

Comparing the ratings across the two first versions, the Realtime Player 1 was not rated

as highly as the Screener 1. We attribute this to the higher complexity of the sound

design (which did not come across very clearly under the time pressure given), the related

non-transparency of some parameter controls, and to ensuing doubts about the practical


benefit of this method of data analysis. Only the rating for irritation is better than for Screener 1, which indicates that the sound design is aesthetically viable for the users.

All these concerns were addressed in the Realtime Player 2: In order to clarify the band-

splitting technique, GUI elements indicate the amount of power present in each band,

and allow for interactive choice of which bands to listen to; fewer parameter controls are made available to the user (in version 1, some visible controls were mainly of interest to the developer), with simpler and clearer names. Much more detailed help

info pages are also provided now.

Finally, band-splitting (adapted to audification) was integrated into the Screener 2 as

well, which gives users a clearer understanding of this concept across different sonification

approaches.

9.4.6 Qualitative results for both players (versions 2)

For both players, all users mentioned easy handling (usability), good visual design, and

transparency of functionality. More positive comments on the Screener were ’higher

creativity’ (by using the frequency controls) and that irritating sounds have nearly disap-

peared. One user explained this by a training effect, and we agree: It seems that as users

learn to interpret the meaning of ”unpleasant” sounds (such as muscle movements), the

irritation disappears. Regarding the realtime player, users mentioned good visual corre-

lation with the sound, because of the new visual presentation of EEG on the GUI. One

user noted that acoustical side-localisation of the recorded epileptic seizure works well.

Further improvements were suggested: For both players, the main wish is synchronization

of sound and visual EEG representation (within the familiar software): In case of realtime

monitoring, this would allow better comparison of the relevant activities. As far as screening

is concerned, the visual representation of larger files on the GUI was considered not very

satisfying.

For the realtime player, presets for the complex parameters in accordance to specific

seizure types were suggested as very helpful. Moreover, usability could still be improved

a bit more (however, no specific wishes were given), and irritating sounds should

be further decreased. This wish may also be due to the fact that the offered parameter-

controls for reducing disturbing sounds may not have been used fully. This can likely be

addressed by more training.

9.4.7 Conclusions from user tests

According to the experts’ evaluation of the EEG Screener, intensive listening training

will be essential for its effective use in clinical practice - in spite of improved usability and


acceptance of the second version. As the visual mode in clinical EEG diagnostics and

data analysis is still dominant, widespread use of sonification tools will require an alternative approach to time and training management. After such training, our new tools may well

help to successively reduce effort and time in data analysis, decrease clinical diagnostic

risk, and in the longer term, offer new ways for exploring EEG data.

9.4.8 Next steps

A number of obvious steps could be taken next (given followup research projects):

For the Realtime Player, the top priority would be integration of the network connec-

tion for realtime monitoring during EEG recording sessions. Then, user tests in real

world long-term monitoring settings can be conducted. These tests should result in

recommended synthesis parameter presets for different usage scenarios.

For the sound design, we have experimented with an interesting variant which empha-

sizes the rhythmic nature of the individual EEG bands more (see Baier et al. (2006);

Hinterberger and Baier (2005)). This feature can be made available as an added user

parameter control (’rhythmic’), with a value of 0 maintaining the current sound design,

and 1 accentuating the rhythmic features more strongly.

For both Realtime Player and Screener, eventual integration into the EEG administration

software used at our clinic was planned; however, this can only be done after another

round of longer-term expert user tests, and when the ensuing design changes have been

finalised.

9.4.9 Evaluation in SDSM terms

The main contributions to the Sonification Design Space Map concept resulting from

work on the EEG players were the following lessons:

• Adopt domain concepts and terminology wherever possible (band splitting).
• Make interfaces as simple and user-friendly as possible.
• Provide lots of visual support for what is going on (here, show band amplitudes).
• Provide opportunities to understand complex representations interactively, by providing options to take them apart (here, listening to single bands at a time).
• Give users enough time to learn (this did not happen for the Realtime Player).


Chapter 10

Examples from the Science by Ear Workshop

For more background on the Science By Ear workshop, see section 4.2 and http://sonenvir.at/workshop/.

The dataset LoadFlow and the experiments made with it in the SBE Workshop are

instructive basic examples; they are given as first illustrations of the Sonification Design

Space Map in section 5.1.

Other SBE datasets and topics (EEG, Ising, UltraWideband, Global Social Data) were

elaborated in more depth in mainstream SonEnvir research activities, and are thus cov-

ered in the examples from the SonEnvir research domains. The remaining two datasets,

RainData, and Polysaccharides, are described briefly here for completeness.

10.1 Rainfall data

These data were provided and prepared by Susanne Schweitzer and Heimo Truhetz

of the Wegener Center for Climate and Global Change, Graz. The data describe the

precipitation per day over the European alpine region from 01.01.1980 to 01.01.1991.

Additionally, associated orographic information (i.e. describing the average height of

the area) was provided. Such data are quite common in climate physics research. The

precipitation for 24 hours is measured as the total precipitation between 6:00 UTC (Coordinated Universal Time) and

6:00 UTC of the next day.

The data were submitted in a single large binary file of the following format: Each single

number is precipitation data in mm/day over the European alpine region (latitude 49.5N-

43N, longitude 4E-18E) with 78 x 108 grid points. The time range covers 11 years, from

1980-1990; this equals 4018 days. The data is stored in 4018 arrays (one after another)

of 78 x 108 (rows x columns) values. The first array contains precipitation data over the

selected geographic region of day 1 (1.1.1980), the 2nd array is precipitation data over

the selected geographic region of day 2 (2.1.1980), and so on. A visualisation of the average precipitation over the 11 years given is shown in figure 10.1.

Figure 10.1: Precipitation in the Alpine region, 1980-1991.

A second file provides associated information on orography of the European alpine region,

i.e., the terrain elevation in meters. This data is stored in one 78 x 108 array.
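To make the file layout concrete, here is a SuperCollider sketch of reading one day's grid from the raw file; it assumes 4-byte big-endian floats and an illustrative file name, since the exact number format was part of the data preparation.

// Read the 78 x 108 grid for one day (0 .. 4017) from the raw data file.
~rows = 78; ~cols = 108;
~readDay = { |path, day|
    var file, values;
    file = File(path, "rb");
    file.seek(day * ~rows * ~cols * 4, 0);        // skip to the requested day
    values = Array.fill(~rows * ~cols, { file.getFloat });
    file.close;
    values.clump(~cols);                           // 78 rows of 108 columns
};
// e.g. ~readDay.("precipitation.bin", 0) would return the grid for 1.1.1980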

General questions the domain scientists deemed interesting were whether it would be

possible to hear all three dimensions (geographical distribution and time) simultaneously

and to find a meaningful representation of the distribution of precipitation in space and

time. They also speculated that it might be relaxing to listen to a synthetic rendering

of the sound of rain.

As possible topics to investigate, they suggested:

• 10-year mean precipitation in the seasons

• variability of precipitation via standard deviations (i.e., do neighbouring regions

more often swing together or against each other?)

• identification of regions with similar characteristics via covariances (do different

regions sound different?)


• extreme values (does the rain fall regularly, or are there long droughts in some

regions?)

• correlations in height (does precipitation behave similarly in similar orographic

heights?)

• distribution of precipitation amounts (on how many days is the precipitation higher

than 20mm, 19 mm, 18mm, etc?)

As a test of the proper geometry of the data format, the SC3 starting file for the sessions

provided a graphical representation of the orographic dataset, with higher regions shown

as brighter gray, see figure 10.2. We also provided example reading routines for the data

file itself.

Figure 10.2: Orography of the grid of regions.

Session team A

In the brainstorming phase, team A came up with the idea to use spatial distribution for

the definition of features like variability, entropy, etc., possibly using large regions such as

quarters of the entire grid. The team agreed that the data should be used as time series,

since rhythmical properties are expected to be present. The opinion was that the main

interest is in the deviations from the average yearly shape. Thus, the team decided to


try using an acoustic representation of the data series conditioned to the average yearly

shape as a benchmark curve as follows: if the value in question is higher than average,

high pitched dust (single sample crackle through a resonance filter) is audible; if the value

is lower than average, lower pitched dust is heard. The amplitude should scale with the

absolute deviation from average, and the dust density should scale with absolute rain

values.

In this fashion, one could sonify different locations at the same time by assigning the

sonifications of different locations to different audio channels. This should produce

temporal rhythms if there are systematic dependencies between the locations. As data

reading turned out to be more difficult than expected, the team began experimenting

with dummy data to design the sounds and behaviour of the sonification, while the

second programmer worked on data preparation. In the end, the team ran out of time

before the real data became accessible enough for replacement.
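A rough SuperCollider sketch of the sound behaviour team A had in mind might look as follows; the frequencies, ranges, and synth interface are assumptions (in the session, dummy data stood in for the conditioned rainfall series).

// Dust crackle through a resonant filter: pitch marks the polarity of the deviation
// from the yearly average, amplitude its size, and density the absolute rain amount.
SynthDef(\rainDust, { |out = 0, rain = 5, deviation = 0|
    var density = rain.linlin(0, 50, 1, 200);            // more rain -> denser crackle
    var freq = Select.kr(deviation >= 0, [500, 2000]);   // below / above yearly average
    var amp = deviation.abs.linlin(0, 20, 0, 0.5);        // louder for larger deviations
    Out.ar(out, Ringz.ar(Dust.ar(density), freq, 0.05) * amp);
}).add;
// one such synth per location, each routed to its own output channel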

Session team B

Team B discussed many options while brainstorming: as the data set was quite large,

choosing data subsets, e.g. by regions; looking for possible correlations; maybe listening

to the entire time range for a single location; maybe use a random walk for a reading tra-

jectory; select location by pointing (mouse), compare a reference time-series sonification

to the data subset under study.

The team found a good solution for the data reading difficulties: they read only the data

points of interest directly from the binary file, as this turned out to be fast enough for real

time use. The designs written explored comparing the time-series for two single locations

over ten years; the sound examples produced demonstrate these pairs played sequentially

and simultaneously on left and right channels. The sounds are produced with discrete

events: each data point is rendered by a gabor grain with a center frequency determined

by the amount of rain for the day and location of interest.
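In the spirit of team B's design, a grain-per-data-point rendering could be sketched as below; the frequency range, grain duration, and the helper function are assumptions.

// A short sine grain per data point; its frequency follows the day's rainfall.
SynthDef(\rainGrain, { |out = 0, freq = 400, sustain = 0.05, pan = 0, amp = 0.1|
    var env = EnvGen.kr(Env.sine(sustain), doneAction: 2);
    Out.ar(out, Pan2.ar(SinOsc.ar(freq), pan, env * amp));
}).add;

// play one location's series within a 4-second loop, panned to one side
~playLocation = { |values, pan = -1|
    Routine {
        values.do { |rain|
            Synth(\rainGrain, [\freq, rain.linlin(0, 50, 200, 2000), \pan, pan]);
            (4 / values.size).wait;
        };
    }.play;
};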

In the final discussion, the team found that a comparison of different regions would be

valuable, where the mean area over which to average should be flexible. Such averaging

could also be considered conceptually similar to fuzzy indexing into the data; modulating

the averaging range and providing fuzziness in three dimensions would be worth further

exploration.

Session team C

Team C had the most difficulties getting the data loaded properly; this was certainly

a deficiency in the preparation. After converting a subset of the data with Excel, they

decided on comparing the data for January in all years, and listening for patterns and dif-


ferences across different regions. Some uncertainty about whether the conversions were

fully correct remained, but this was considered relatively unimportant for the experimen-

tal workshop context.

The sonification design entailed calculating a mean January value for all locations, and

comparing each individual day to the mean value. This was intended to show how the

precipitation varies, and to identify extreme events. All 8424 stations are scanned along north/south lines, which slowly move from west to east. The ratio of the day's

rainfall to the mean was mapped to the resonance value of a bandpass filter driven by

white noise.

The sound examples provided cover January 5, 15, and 25 for the years 1980 and 1981

scaled into 9 seconds; a much slower variant with only 190 stations is presented as well

for comparison, and this shows a much smoother tendency. Varying filter resonance as

rapidly as described above is not likely to be very clearly audible.
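For comparison, team C's mapping could be sketched roughly as follows in SuperCollider; the station count corresponds to the slower (C2) variant, and the data, centre frequency, and scaling are placeholders.

// Scan the stations in 9 seconds and map each rainfall/mean ratio to filter resonance.
~ratios = Array.rand(190, 0.0, 3.0);    // placeholder: one ratio per scanned station
{
    var buf = LocalBuf.newFrom(~ratios);
    var index = Phasor.ar(0, ~ratios.size / (9 * SampleRate.ir), 0, ~ratios.size);
    var ratio = Index.ar(buf, index);
    // wetter than the January mean -> narrower, more resonant band
    Resonz.ar(WhiteNoise.ar(0.5), 800, (1 / (1 + ratio)).clip(0.02, 1)) ! 2
}.play;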

Comparison in SDSM terms

The data set given is quite interesting from an SDSM perspective: it has 2 spatial

indexing dimensions, with 78 * 108 = 8424 geographical locations, for which an orographic data dimension (average elevation above sea level) is also given. For each location,

data are given for 1 (or maybe 2) time dimensions, namely, 365 (resp. 366) days * 11

years = 4018 time steps (days). Thus, multiple locations are possible for its data anchor

(see figure 10.3), depending on the viewpoint taken. From a temporal point of view, one

would treat the 8424 locations as the data size, and create a ’day anchor’ at x: 8424, y:

1, and a month anchor at x: 8424, y: 30; the year anchor and the 11-year anchor are both outside

the standard map size. For a single location, an anchor could be at x: 4018, y: 2. In

any case, whatever one considers to be the unit size of this kind of data set is arbitrary,

as both time and space dimensions could be different sizes and/or resolutions.

Team A mapped one year into 7.3 seconds, and presented two streams of two mapped

dimensions each (pitch label for deviation polarity, and intensity for deviation amount).

These choices put its SDSM point at an expected gestalt size of 150 (x), dimensions at

2 (y), and streams at 2 (z). Continuous parameter mapping is a reasonable choice for

this location on the map.

Team B begins with 8424 data points per 4 second loop; this is a rather dense gestalt

size of ca. 6000. The design choice of averaging over 9-10 values scales this to ca.

600, which seems well suited for granular synthesis with a single data dimension used,

mapped to the frequency parameter (y-axis), and using two parallel streams (z).

Team C maps 8424 values into 9 seconds, which creates a gestalt size of ca. 3000

(label C1 on the map); this seems very fast for modulation synthesis of filter bandwidth,

although it uses only a single stream and dimensions, so y and z values are both 1.


Figure 10.3: SDSM map of Rainfall data set.

The slower version (190 values in 9 seconds, C2) is more within SDSM recommended

practice, at a gestalt size of ca. 60. While the SDSM concept recommends making

indexing dimensions available for interaction, this was too complex for the workshop

setting.

10.2 Polysaccharides

This problem was worked on for two two-hour sessions, so the participants had more

time to reflect and consider how to proceed. The data were submitted by Anton Huber

of the Institute of Chemistry at University of Graz.

10.2.1 Polysaccharides - Materials made by nature
(This was the title of Anton Huber's introductory talk.)

Polysaccharides make up most of the biological substance of plant cells. Their molecular

geometries, such as their symmetries, determine the physical properties of most plant-

based materials. Even materials from trees of the same kind have different properties

because of the environment they come from; so understanding the properties of a given



sample is of crucial importance to Materials scientists. A typical question that occurs

is: Are the given datasets (which should be the same) somehow different?

In aqueous media, polysaccharides form so-called supermolecular structures. Very few of

these molecules can structurise amazing amounts of water: water clusters can be several

millimeters large. By comparison, the individual molecules are measured in nanometers,

so there is a scale difference of six orders of magnitude!

In a given measurement setup the materials are physically sorted by fraction: on the left

side particles with big molecules (high mol numbers) are found, on the right small ones.

Rather few bins (on the order of 30) of sizes and corresponding weights are conventionally

considered sufficiently precise for classification, both in industry and science.

The data for this session were analysis data of four samples of plant materials: beech,

birch, oat and rice. Three different measurements were given, along with their indexing

axes: channel 1 is an index (corresponding to mol size) of the measurement at channel

2, channel 3 is an index of channel 4, and channel 5 is an index of channel 6.

Channels 1 and 2 contain the measured delta-refraction index of electromagnetic radi-

ation aimed at the material sample; i.e. how strongly light of a given wavelength is

diverted from its direction by the size-ordered regions along the sample. (The exact

wavelength used was not given.)

Channels 3 and 4 contain the measured fluorescence index under electromagnetic radi-

ation, again dependent on the size-ordered regions along the sample.

Channels 5 and 6 contain the measured dispersion of the material sample under light,

or more precisely, how much the dispersion differs from that of clear water, based on

molecule size along the size-ordered axis of the sample.

10.2.2 Session notes

The notes for this session were reconstructed shortly after the workshop.

Brainstorming

One of the first observations made was that the data look like FFT analyses - so the

team considered using FFT and Convolution on the bins directly. An alternative could

be a multi-filter resonator with e.g. 150 bands, maybe detuned from a harmonic series.

As was noted several times in the workshop by those favouring audification, it seemed

desirable to obtain ’rawer’ data as directly as possible from the measurements; these

might be interesting to treat as impulse responses.

The first idea the team decided to try was to create a ”signature sound” of 1-2 seconds

for one channel of each data file by parameter mapping to about 15-20 dimensions; a


second step should be to compare two such signature sounds (for two channels of the

same file) binaurally.

Experimentation

A look at the data revealed that across all files, channel 1 seemed massively saturated in

the upper half, so we decided to take only the undistorted part of channel 1, downsample

it to e.g. 50 zones, and turn these into 50 resonators, which would ring differently for

the different materials when excited. The resonator frequencies were scaled according

to the index axis, which is roughly equal to particle size: small particles are represented

by high sounds, and big particles by lower resonant frequencies.

Based on this scheme, we proceeded to make short ’sound signatures’ for the four

materials, using delta-refraction index (channel 2) and fluorescence (channel 4) data,

with two different exciter signals: Noise and impulses.
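A compact SuperCollider sketch of such a 'sound signature' is given below; random numbers stand in for the 50 downsampled channel values, and the exact frequency scaling is an assumption (following the rule that big particles map to low resonances).

// A 50-band resonator bank as a material signature, excited by impulses
// (swap in PinkNoise.ar(0.01) as the exciter for the noise variant).
{
    var values = Array.rand(50, 0.0, 1.0);   // placeholder for one downsampled channel
    var freqs = (1..50) * 100;                // bin 1 (big particles) low, bin 50 high
    var exciter = Impulse.ar(2, 0, 0.3);
    Klank.ar(`[freqs, values.normalizeSum * 8, 0.5 ! 50], exciter) ! 2
}.play;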

The sound examples provided here4 present all four materials in sequence:

Delta refraction index, impulse source: Materials1 Pulse BeechBirchOatRice.mp3

Delta refraction index, noise source: Materials1 Noise BeechBirchOatRice.mp3

Fluorescence, impulse source: Materials2 Pulse BeechBirchOatRice.mp3

Fluorescence, noise source: Materials2 Noise BeechBirchOatRice.mp3

The team also started making these playable from a MIDI drum pad for a more interactive

interface, but did not have enough time to finish this approach.

Evaluation

The group agreed that having time for two sessions was much better for deeper discussion

and more interesting results. Even so, more time would be desirable. In this particular

session, the sound signatures made were easy to distinguish, so in principle, this approach

works.

What could be next steps? It would be useful to implement signatures of more than

one channel to increase reliability of properties tracking; e.g. for materials production

monitoring, this could be a useful application.

It would also be interesting to try a nonlinear complex sound generator (such as a

feedback FM algorithm) and control its inputs from the data, using on the order of

20-30 dimensions; this holistic approach would be interesting from the perspective of

sonification research, as it might lead to emergent audible properties without requiring

detailed matching of individual data dimensions to specific sound parameters. While there was no time to attempt this within the workshop setting, the idea would certainly warrant further research.

4 http://sonenvir.at/workshop/problems/biomaterials/sound descr

In SDSM terms, the dimensionality of each data point is unusually high here. The

sonifications render each material (consisting of 680 measurements) to a reduced range

of the data, downsampled to 50 values, as resonator specifications, i.e., intensity and

ringtime of each band. Given an interactive design, such as one allowing tapping on the

different materials or probes, one can easily compare on the order of 5-8 samples within

short term memory limits.


Chapter 11

Examples from the ICAD 2006 Concert

The author was Concert Chair for the ICAD 2006 Conference at Queen Mary University

London, and together with Christian Daye and Christopher Frauenberger, organized

the Concert Call, the review process for the submissions, and the concert itself (see

section 4.3 for full details). This chapter discusses four of the eight pieces played in

the concert, chosen for diversity of the strategies used, and clarity and completeness of

documentation.

11.1 Life Expectancy - Tim Barrass

This section discusses a sonification piece created by Tim Barrass for the ICAD 2006

Concert, described in Barrass (2006), and available as headphone-rendered audio file1.

Life Expectancy is intended to allow listeners to find relationships between life expectan-

cies and living conditions around the world. The sounds he chooses are quite literal

representations of their meanings, making them relatively easy to ’read’, even though

the piece is quite dense in information.

It is structured in three parts, beginning with a 20 second section which mainly provides

spatial orientation, a long middle section representing living conditions for each country

in a dense 2 second soundscape, and a short final section illuminating gender differences

in life expectation.

The opening section presents the spatial locations of all country capitals, ordered by

ascending life expectancy. The speaker ring is treated as if it were a band around the

equator, with the listener inside near the center of the globe. Each capital location is

marked by a bell sound (which is easy to localise), spatialised in the ring of speakers

according to the capital’s longitude; latitude (distance to the equator, North or South)

is represented by the bell’s pitch, where North is higher. A whistling tone represents

ascending life expectancy for each country, and as it is not spatialised, it is easy to follow as one stream. Each country has roughly 0.1 second for its bell and whistle tone.

1 http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/audio/concert/life.mp3

The main section of the piece is about six minutes long, and presents a rich, complex

audio vignette for every country, at the length of a musical bar of two seconds. The

most intriguing aspect here is the ordering of the countries: First we hear the country

with the highest life expectancy, then the lowest, the second highest, the second lowest,

and so on until the interleaved orders meet in the median.

Each sound vignette consists of the following sound components:

Two bell sounds whose pitch indicates latitude, first of the equator, then of the country's capital, with their horizontal spatial position representing longitude.

A chorus speaking the country's name, with the number of voices representing the population number, and its spatial extension representing the country's area. The capital

name is also spoken, at its spatial location.

A fast ascending major scale represents life expectancy, once for male, once for female

inhabitants of the country. The number of notes of the scale fragment represents the

number of life decades, so a life expectancy of 75 years would be represented as a scale

covering 8 steps (up to the octave) with the last note shortened by 50 percent. The

gender differences between each pair of scales, and the alternation of extreme contrasts

in the beginning of the sequence articulate this aspect very interestingly.

Clinking coins signify economic aspects: average income by density of the coin sounds,

while gross domestic product (GDP) is indicated by reverb size.

The sound of water filling a vessel indicates access to drinking water and sanitation: a

full vessel indicates good access, an empty vessel little access. Three pulses of this sound

provide total, rural, and urban values. Sanitation is rendered by adding distortion to the

water pulses when sanitation values are low (suggesting ’dirty’ water).

The final short section of the piece focuses on gender differences in life expectancy. As

the position bell moves from the North Pole to the South Pole, life expectancies for

each country are represented with a tied note, going from the value for male to female

(usually rising), and spatialised at the capital’s location.

Tim Barrass is very modest in commenting on the piece (Barrass (2006)):

I have taken a straightforward and not particularly musical approach, in

an attempt to gain a clear impression of the dataset. The sound mapping

is ”brittle”, designed specifically for the dataset. I would not expect this

approach to provide a flexible base to explore the musical, sonic and in-

formational possibilities of similar material, but it may at least serve as an

example of one direction that has been tried.

While the piece may appear ’artless’ in representing so much of the dataset with appar-

ently simplistic sound mappings, I find the piece extremely elegant, both as a sonification,

and as a composition. The sound metaphors are so clear that they almost disappear, as


does the spatial representation. It is quite an achievement to create concurrent sound

layers that are both rich, complex, dense enough to be demanding to listen to, and trans-

parent enough to allow for discovering different aspects as the piece proceeds. This piece

certainly provided the richest information representation of all entries for the concert.

The beginning and end sections work beautifully as frames for the piece, as orientation

help, and as alternative perspectives on the same questions. For me, the questions that

remain long after listening to the piece come from the strongest intervention in the piece,

the idea of sorting the countries so as to begin with the most extreme contrasts in life

expectancy, and moving toward the average lifespan countries.

11.2 Guernica 2006 - Guillaume Potard

This section discusses a piece created by Guillaume Potard for the ICAD 2006 Concert,

described in Potard (2006), and available as headphone-rendered audio file2.

Guernica 2006 sonifies the evolution of world population and the wars that occurred

between the year 1 and 2006. Going far beyond the data supplied with the concert call,

Potard has compiled a comprehensive list of 507 documented wars, with geographical

location, start and end year, and a flag indicating whether it was a civil war or not. He

also located estimates for world population for the same time period.

The sonification design represents the temporal and geographical distribution chrono-

logically. The temporal sequence follows historical time: the start year of each war

determines when its representing sound begins. As many more wars have occurred to-

ward the end of the period observed, the time axis was slowed down logarithmically in

the course of the piece, so the duration of a year near the end of the piece is 4 times

longer than at the beginning. This maintains the overall tendency, but still provides

better balance of the listening experience. The years 1, 1000 and 2000 are marked by

gong sounds for orientation. The entire piece is scaled to a time frame of five minutes.

The start time of each war is indicated by a weapon sound; the sounds chosen change with

the evolution of weapon technology. In the beginning, horses, swords, and punches are

heard, while after the invention of gunpowder cannons, guns, and explosions dominate.

Newer technology such as helicopters is heard only toward the end of the piece, after

the year 1900. Civil wars are marked independently by the additional sound of breaking

glass.

The spatial distribution of the sounds was handled by vector-based amplitude panning

for the directions of the sound sources relative to the reference center, the geographical

location of London. Sound distance was rendered by controlling the ratio of direct to

reverberation sound.

2 http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/audio/concert/guernica.mp3


The evolution of world population is sonified concurrently as a looping drone, with

playback speed rising as population numbers rise.

Guernica 2006 was certainly the most directly dramatic piece in the concert. The use of

samples communicates the intended context very clearly, without requiring much prior

explanation. As Potard (2006) states, richer data representation with this approach

would certainly be possible; he considers representing war durations, distinguishing more

types of war, and related factors like population migrations in future versions of the

piece.

11.3 ’Navegar E Preciso, Viver Nao E Preciso’

This section discusses a sonification piece created by Alberto de Campo and Christian

Daye for the ICAD 2006 Concert, described in de Campo and Daye (2006), and avail-

able as headphone-rendered audio file3. As this piece was co-written by the author of

this dissertation, much more background can be provided than with the other pieces

discussed.

In this piece, we chose to combine the given dataset containing current (2005) social

data of 190 nations with a time/space coordinates dataset of considerable historical

significance: The route taken by the Magellan expedition to the Moluccan Islands from

1519-1522, which was the first circumnavigation of the Globe.

11.3.1 Navigation

The world data provided by the ICAD 2006 Concert Call all report the momentary

state for the year 2005, and are thus free of the idea of historical progression. Also, the

choice of which variables to include in the sonification, and how, must be based on

theoretical assumptions which are not trivial to formulate on a level of aggregation

involving 6.513.045.982 individuals (the number of people estimated to have populated

this planet on April 30, 2006, see U.S. Census Bureau (2006)). The data do provide

detailed spatial information, so we decided to choose a familiar form of data organization

that combines space and time: the journey.

Traveling can be defined as moving through both space and time. While the time

dimension as we experience it is unimpressed by the desires of the traveler, s/he can

decide where to move in space. The art and science that has enabled mankind to find

out where one is, and in which direction to go to arrive somewhere specific, is known as

Navigation.

3 http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/audio/concert/navegar.mp3

Navigation as a practice and as a knowledge system has exerted major influence on the development of the world. The Western world was changed drastically by the consequences of the journeys led by explorers like Christopher Columbus or Vasco da Gama.

(The art of navigation outside Europe, especially in Polynesia, is covered very interest-

ingly in Conner (2005), pp 41-58.) The first successful circumnavigation of the globe,

led by Ferdinand Magellan, proved beyond all scholastic doubts that the earth in fact

appears to be round. This would not have happened without the systematic cultivation

of all the related sciences in the school for navigation, map-making and ship-building

founded by Henry the Navigator, King of Portugal in the 15th century. (Conner (2005)

also describes their methods of knowledge acquisition vividly as mainly coercion, appro-

priation, and information hoarding, see chapter: Blue Water Navigation, pp. 201ff.)

For all these reasons, Magellan’s Route became an interesting choice for temporal and

spatial organization for our concert contribution.

11.3.2 The route

Leaving Seville on August 10, 1519, the five ships led by Magellan (called Trinidad,

San Antonio, Concepcion, Victoria, and Santiago) crossed the Atlantic Ocean to anchor

near present-day Rio de Janeiro after five months (Pigafetta (1530, 2001); Wikipedia

(2006b); Zweig (1983)). Looking for a passage into the ocean later called the Pacific,

they moved further south, where the harsh winter and nearly incessant storms forced

them to anchor and wait for almost six months.

While exploring unknown waters for this passage, the Santiago sank in a sudden storm,

and the San Antonio deserted back to Spain; the remaining three ships succeeded and

found the passage in the southernmost part of South America which was later called the

Magellan Straits, in late October 1520. The ships then headed across the Mar del Sur, the

ocean Magellan named the Pacific, towards the archipelago which is now the Philippines,

where they arrived four months later. Seeking the mythical Spice Islands, Magellan and

his crew visited several islands in this area (Limasawa, Cebu, Mactan, Palawan, Brunei,

and Celebes); on Mactan, Magellan was killed in a battle, and a monument in Lapu-Lapu

City marks the site where he died.

In spite of their leader’s death, the crew decided to fulfil their mission. By now diminished

to 115 persons on just two ships (Trinidad and Victoria), they finally managed to reach

the Spice Islands on November 6, 1521. Due to a leak in the Trinidad, only the Victoria

”set sail via the Indian Ocean route home on December 21, 1521. By May 6, 1522, the

Victoria, commanded by Juan Sebastián Elcano, rounded the Cape of Good Hope, with

only rice for rations. Twenty crewmen died of starvation before Elcano put into Cape

Verde, a Portuguese holding, where he abandoned 13 more crew on July 9 in fear of

losing his cargo of 26 tons of spices (cloves and cinnamon).” Wikipedia (2006b). On

September 6, 1522, more than three years after she left Seville, Victoria reached the

port of San Lucar in Spain with a crew of 18 left. One is reminded of a song by Caetano Veloso, who, pondering the mentality and fate of the Argonauts, wrote: ”Navegar e preciso, viver nao e preciso” - ”Sea-faring is necessary, living is not” (see appendix E).

Figure 11.1: Magellan's route in Antonio Pigafetta's travelogue (Primo Viaggio Intorno al Globo Terracqueo - First travel around the terracqueous globe, see Pigafetta (1530)).

Figure 11.2: Magellan's route, as reported in wikipedia. http://wikipedia.org/Magellan

11.3.3 Data choices

The explorers in the early 15th century were interested in spices (which Europe was

massively addicted to at the time), gold, and the prestige earned by gaining access to

good sources of both. Nowadays, other raw materials are considered premium goods.

What would someone who undertakes such a journey today hope to gain for his or her

exertions; what is as precious today as gold and spices were in the 16th century?

We imagine today’s conquistadores (or globalizadores) would likely ask first about eco-

nomic power: how rich is an area? Second, they would probably check geographical

potential; and chances are that if any one resource will be as central to economic activ-

ity in the future as spices were centuries ago, it will be drinking water resources. Water

might well become the new pepper, the new cinnamon, or even the new gold. (As the

Gulf wars showed, oil would have been the obvious current choice; however, we found the

future perspective more interesting.) Thus we chose to focus on two main dimensions:

one depicting economic characteristics of every country we pass, and another informing

us about its inhabitants’ current access to drinking water.


11.3.4 Economic characteristics

The variable ’GDP per capita’ included in the given data set provides some insights in

the overall economic performance of a country. Obviously, the ’GDP per capita’ variable

lacks information about the distribution of the income; it only says how much money

there would be per person if it were equally distributed. This is never the case; on the

contrary, scientists find that the rich get richer and the poor get poorer both in intra-

national and international contexts. E.g. in the US of 1980, the head of a company

earned on average 42 times as much as an employee; by the year 1999, this ratio was

more than ten times higher: a company leader earned 475 times more than an average

employee (Anonymous (2001)).

Figure 11.3: The countries of the world and their Gini coefficients.

From http://en.wikipedia.org/wiki/Gini.

A measure that captures aspects of income distribution is the Gini coefficient on in-

come inequality (Wikipedia (2006a)). Developed by Corrado Gini in the 1910s, the Gini

coefficient is defined as the ratio of area between the Lorenz curve of the distribution

and the curve of the uniform distribution, to the area under the uniform distribution.

More common is the Gini index, which is the Gini coefficient times 100. The higher the

Gini index, the higher the income differences between the poorer and the richer parts

of a society. A value of 0 means perfectly equal distribution, while 100 means that one

person gets all the income of the country and the others have zero income. However,

the Gini index does not report whether one country is richer or poorer than the other.

Our sonification tries to balance the limitations of these two variables by combining

them: We include two factors that go into a Gini calculation; the ratio of the top and

bottom 10 percentile of all incomes in a population, and the ratio of the top to bottom

20%. In Denmark, at Gini index rank 1 of 124 nations for which Gini data exist, the top

10% earn 4.5x as much as the bottom 10%; for the UK (rank 51), the ratio is 13.8:1;


the US (rank 91) ratio is 15.9:1; in Namibia, at rank 124, the ratio is 128.8:1. (In the

sonification, missing values here are replaced by a dense cluster of near-center values,

which is easy to distinguish acoustically from the known occurring distributions.)

11.3.5 Access to drinking water

An interesting variable provided by the ICAD06 Concert data set is ’Estimated percentage

of population with access to improved drinking water sources total’. Being part of the

so-called ”Social Indicators” (UN Statistics Division (1975, 1989, 2006)), the data are

reported to the UN Statistics Division by the national statistic agencies of the UN

member states. Unfortunately, this indicator has a high percentage of missing values

(46 of 190 countries, or 24.2%). This percentage can be reduced to 16.3% (31 countries)

by excluding missing values from countries which are not touched by our Magellanian

route. Still, the problem is fundamental and must be addressed. The strategy we chose

was to estimate the missing values on the basis of the data value of the neighboring

countries, being aware that this procedure does not satisfy scientific rigor. In most cases,

though, we claim that our estimates are likely to match reality: for instance, it is very

likely that in France and Germany (as in most EU countries), very close to 100% of

the population do have access to ”improved drinking water resources”, and that this

fact is considered too obvious to be statistically recorded.

11.3.6 Mapping choices

We deliberately chose rather high display complexity; while this requires more listener

concentration and attention for maximum retrieval of represented information, hopefully

a more complex piece invites repeated listening, as audiences tend to do with pieces

of music they enjoy. Every country is represented by a complex sound stream composed

of a group of five resonators; the central resonator is heard most often, the outer pairs

of resonators (’satellites’) sound less often. All parameters of this sound stream are

determined by (a) data properties of the associated country and (b) the navigation

process, i.e. the ship’s current distance and direction towards this country. At any time,

the 15 countries nearest to the route point are heard simultaneously. This is both to

limit display complexity for the sake of clarity, and to keep the sonification within CPU

limits for realtime interactive use. The mapping choices are given in detail in table 11.1.

Table 11.1: Navegar - Mappings of data to sound parameters

Population density of country: Density of random resonator triggers
GDP per capita of country: Central pitch of the resonator group
Ratio of top to bottom 10%: Pitches of the outermost (top and bottom) 'satellite' resonators
Ratio of top to bottom 20%: Pitches of the inner two 'satellite' resonators (missing values for these become dense clusters)
Water access: Decay time of resonators (short tones mean dry)
Distance from ship: Volume and attack time (far away is 'blurred')
Direction toward ship: Spatial direction of the stream in the loudspeakers (direction North is always constant)
Ship speed, direction, winds: Direction, timbre and volume of wind-like noise

In order to provide a better opportunity to learn this mapping, the author has written a patch which plays only a single sound source/country at a time, where it is possible to switch between the parameters for all 192 countries. This allows comparing the multidimensional changes as one switches from, say, Hongkong (very dense population, very rich) to Mongolia (very sparse population, poor). In public demonstrations and talks, this has proven to be quite appropriate for this relatively complex mapping. When

hearing the piece after experimenting for a while with an example of its main components,

many listeners report understanding the sonification much more clearly.

It has also been helpful to provide some points of orientation that can be identified while

the piece unfolds, as listed in table 11.2.

Table 11.2: Some stations along the timeline of 'Navegar'

0:00-0:10 Very slow move from Sevilla to San Lucar
0:20-0:26 Cape Verde: very direct sound (i.e. near the capital), rather low, dense spectrum (poor country, unknown income distribution)
0:54-1:00 Uruguay/Rio de la Plata: very direct sound, passing close by
1:05-2:40 Port San Julian, Patagonia: very long stasis, everything is far away, six months long winter break in Magellan's travel
2:45-3:00 Moving into Pacific Ocean: new streams, many dense spectra; unknown income distributions
3:20 Philippines: very direct sound (near capital), high satellites: unequal income distribution
4:00 Brunei: very direct, high, dense sound: very rich, unknown distribution
... towards Moluccan Islands
4:50 East Timor: direct, mostly clicking, only very low frequency resonances (very poor, little access to water, unknown income distribution)
5:15 into Indian Ocean: 'openness', sense of distance
5:50 approaching Africa: more lower centers, with very high satellites: poor, with very unequal distributions (but at least statistics available)
5:55 Pass Cape of Good Hope: similar to East Timor
6:10 Arrive back at San Lucar, Spain

11.4 Terra Nullius - Julian Rohrhuber

This section discusses a sonification piece created by Julian Rohrhuber for the ICAD 2006 Concert. It is described in Rohrhuber (2006), and available as headphone-rendered audio file here4.

4 http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/audio/concert/terra.mp3

11.4.1 Missing values

The concept for 'Terra Nullius' builds on a problem present (or actually, absent) in data from many different contexts: missing values. Rohrhuber (2006) states that in sonification, data are assumed to have implicit meaning, and that sonifications try to communicate such meaning. In the specific case of the data given for the concert, most data dimensions are quantitative; thus the data can be ordered along any such dimension, and the value for one dimension of a given data point can be mapped to a sonic property of a corresponding sound event. For example, one could order by population size, and map GDP per capita to the pitch of a short sound event.

However, with missing values the situation becomes considerably more complicated:

Rohrhuber states that ”These non-values break gaps into the continuity of evaluation -

they belong to another dimension within their dimension. Missing data not only fail to

belong to the dimension they are missing from, they also fail to belong in any uniform

dimension of ’missing’.” Furthermore, one must consider that there are no fully valid

strategies for dealing with missing values: Removing data points with missing values dis-

torts the comparisons in other data dimensions; substituting likely data values introduces

possible errors and reduces data reliability; marking them by recognizably out-of-range

values may be logically correct, but these special values can be quite distracting in a

sonification rendering.

11.4.2 The piece

The piece consists of multiple cycles, each moving around the globe once. For every

cycle, all countries within a zone parallel to the equator are selected and sonified one at a

time in East to West order, as shown in figure 11.4. In the beginning, the zone contains

latitudes similar to England, or actually London, as the capitals determine geographical

position. The sound is spatialised accordingly in the ring of speakers, so one cycle around


the globe moves around the speaker cycle once. With every cycle, the zone of latitudes

widens until all countries are included.

Figure 11.4: Terra Nullius, latitude zones

To sonify the missing values in the 46 data dimensions given, a noise source is split into

46 frequency bands. When a value for a dimension is present, the corresponding band

remains silent; the band only becomes audible when the value for that dimension in the

current country is missing.
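A rough SuperCollider sketch of this mapping for a single country is given below, assuming its values arrive as an array in which nil marks a missing entry; the band layout and levels are assumptions.

// Noise bands sound only for the dimensions whose values are missing.
~playCountry = { |values|
    {
        var freqs = Array.interpolation(values.size, 100.log, 8000.log).collect(_.exp);
        Mix.fill(values.size, { |i|
            BPF.ar(PinkNoise.ar, freqs[i], 0.05) * values[i].isNil.binaryValue
        }) * 0.5 ! 2
    }.play;
};
~playCountry.([1.2, nil, 0.7, nil, nil]);   // three of five dimensions missing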

After all countries are included in the cycle, the latitude zone narrows again over several

cycles, and ends with the latitude and longitude of London. For this second half, the


filters have smaller bandwidth, so there is more separation between the dimensions.

Gradually, constant decorrelated noise fades in on all speakers, which remains for a few

seconds after the end of the last cycle.

’Terra Nullius’ plays very elegantly with different orders of ’missingness’, in fact creating

what could be called ’second-order missing values’ of what is being sonified: ”... A band

of filtered noise is used for each dimension that is missing, i.e. the noisier it is, the

less we know. In the end the missing itself seems quite rich of information - only about

what?” (Rohrhuber (2006))

Personally, I find this the most intriguing work of art in the ICAD 2006 concert. Subtly

shifting the discussion to recursively higher levels of consideration of what it is we do

not know, it is an invitation to deeper reflection on many questions about meaning and

representation.

11.5 Comparison of the pieces

In order to study the variety of approaches that artists and sonifiers took in creating

pieces, SDSM terminology and viewpoint turned out to be quite useful. For the dataset

given, a clear anchor can be provided at 190 data points and 26 dimensions for the basic

dataset, and 44 for the extended set (see figure 11.5).

Life Expectancy chooses a rather large set of data dimensions, and sonifies aspects of

it in three distinct ways: an overview for spatial orientation, sorted by life expectancy

(LE1), a long sequence of 2 second vignettes, densely packed with information (LE2),

and a final sequence of life expectancies sorted North-South (LE3).

Orientation - LE1

Within 20 seconds, a signal sound is played for each country, ordered by total life ex-

pectancy; this renders 3 mapped dimensions (life expectancy, latitude, longitude).

Vignettes - LE2

Five streams make up each vignette:

two bell sounds - 2 dimensions: latitude and longitude;

spoken country and capital name - 6 dimensions: 2 names, spatial location again (2),

population size, and area;

scale fragment - 2 dimensions: life expectancy for males and females;

clinking coins - 2 dimensions: average income over density and GDP;

water vessel - 3x2 dimensions: 3 pulses with 2 values each, ’fullness’ and distortion.

This combination of parallel/interlocked streams with [2, 6, 2, 2, 6] dimensions each renders a total of 16 distinct dimensions per vignette of 2 seconds (the two spatial dimensions are shared between streams)! While these could also be

rendered visually as a ’sideways’ view of the SDS map (showing the Y and Z axes), they

are shown here as 16 parallel dimensions for better comparability.


Figure 11.5: SDSM comparison of the ICAD 2006 concert pieces.

Ending - LE3

This section is again short (30 seconds with intro and ending clicks, 17 without), and

compares the 2 life expectancy values for males and females, with the countries sorted

North/South; including spatial location, it uses 4 dimensions.

Overall, the piece has very literal, easy to read mappings to sounds; it employs a really

complex, differentiated soundscape, and it is very true to the concept of sonification.

Guernica uses its own data, thus requiring its own data anchor. The piece renders

world population as one auditory stream with a single dimension (GUE+), while each

war is its own stream of 3 dimensions; while the maximum number of simultaneous wars

in Potard’s data is around 35, the piece does not use war durations, so the maximum

number of parallel streams is not documented.

The order of the data is chronological, and at 507 wars within 300 seconds, it has an

average event density of 5 within 3 seconds. Three dimensions are used for each event:

the war’s starting year, and its latitude/longitude. The parallelism of streams is roughly

sketched with copies of the label GUE receding along the Z axis; as this is dynamically

changing, there is no satisfying visual representation.

Like Life Expectancy, Guernica features very literal sound mappings (samples of fighting

sounds); it is based on additional data collected on wars and population since year 1,


which extend the starting dataset considerably; and it adds the notion of a timeline and

historical evolution.

Navegar orders the data along a historical time/space route. Within 6 minutes, 134

countries are rendered (the others are too far away from the route to be ’touched’), which

puts the average data point density around 1 per 3 seconds. At any point, the nearest

15 countries are rendered as one stream each, with 7 dimensions per stream (NAV): lat-

itude/longitude (with moving distance and direction), population density, GDP/capita,

top 10 and top 20 richest to poorest ratios, and water access. The parallelism is again

indicated symbolically as multiple NAV labels along the Z axis. Additionally, ship speed,

direction, and weather conditions are represented, based on 76 timeline points (NAV+).

Like Guernica, Navegar introduces a historical timeline; unlike it, it juxtaposes that with

current social data. It uses metaphorically more indirect mappings than most of the other

submissions. Uniquely within the concert context, it creates a soundscape of stationary

sound sources with a subjective perspective: a moving observer (listener), and it also

sonifies context (here, speed and travel conditions).

Terra Nullius organizes the data by two criteria: selection by latitude zone, ordering by

longitude. A maximum of all 46 dimensions is used throughout the piece, which sets its

Y value on the SDSM map. Within 19 cycles, larger and larger data subsets are chosen;

first, 14 countries within 18 seconds, putting it at a gestalt size of 2-3 (TN1 in the map).

This speeds up to 190 countries at a rate of 100/sec, or ca 35 gestalt size (TN2), and

returns to roughly the original rate eventually.

What sets Terra Nullius apart from all other entries is that it assumes a meta-perspective

on data perceptualisation in general by studying missing values exclusively.

Conclusion

While the SDSM map view of all four pieces shows the large differences between the

approaches taken, it cannot fully capture or describe the radical differences in concepts

manifested in this subset of the pieces submitted. On the one hand, that would be asking

a lot of an overview-creating, orientational concept; on the other, it is interesting to

find that even within rather tightly set constraints like a concert call, creativity easily

defies straightforward categorisation.


Chapter 12

Conclusions

This work consists of three interdependent contributions to sonification research: a the-

oretical framework that is intended for systematic reasoning about design choices while

experimenting with perceptualisations of scientific and other data; a software infrastruc-

ture that pragmatically supports the process of fluid iterative prototyping of such designs;

and a body of sonifications realised using this infrastructure. All these parts were created

within one work process in parallel, interleaved streams: design sketches suggested ideas

for infrastructure that would be useful; observing and analysing design sessions led to

deeper understanding which informed the theoretical framework, and both the growing

framework and the theoretical models eventually led to a more effective design workflow.

The body of sonifications created within this system, and the theoretical models derived

from the analyses of this body of practical work (and a few selected other sonification

designs of interest) form the permanent results of this dissertation. They contribute to

the field of sonification research in the following respects:

• The Sonification Design Space Map and the related models provide a sonification-

specific alternative to TaDa Auditory Information Design, and they suggest a

clearer, more systematic methodology for future sonification research, in particular

for sonification design experimentation.

• The SonEnvir framework provided the first large-scale in-depth test of Just In Time

programming for scientific contexts, which was highly successful. The sonification

community, and other research communities have become aware of the flexibility

and efficiency of this approach.

• The theoretical models, the practical methodology and the individual solutions

developed here may help to reduce time spent to cover large design spaces, and

thus contribute to more efficient and fruitful experimentation.

The work presented here was also employed in sonification workshop settings, and in numerous talks and demonstrations given by the author. It proved to be helpful in giving


interested non-experts a clear impression of the central issues in sonification design work,

and has been received favourably by a number of experts in the field.

12.1 Further work

Within the SonEnvir project, many compromises had to be made due to time and capacity

constraints. Also, given the breadth of the overall approach chosen, many ideas could

not be fully explored, and would thus warrant further research.

In the theoretical models, the main desirable future research aims would be:

1. Integration of more analyses of the growing body of Model-Based Sonification designs.

2. Expansion of the user interaction model based on a deeper background in HCI

research.

In the individual research domains, several areas would warrant continued exploration.

Here, it is quite gratifying to see that one of the research strands has led to a direct

followup project: The QCDaudio project hosted at IEM Graz continues and extends

research begun by Kathi Vogt within SonEnvir.

For the EEG research activities, two strategies seem potentially fruitful and thus worth

pursuing: continuing the planned integration into the NeuroSpeed software, and starting

closer collaborations with other EEG researchers, such as the Neuroinformatics group in

Bielefeld, and individual experts in the field, such as Gerold Baier.

It is quite unfortunate that none of the designs created within this research context would

be directly usable for visually impaired people. In my opinion, providing better access to

scientific and other data for the visually impaired is one of the strongest motivations for

developing a wider variety of sonification design approaches, and would be well worth

pursuing more deeply. I hope the work presented will be found useful for future research

in that direction.

For me personally, experimenting with different forms of sonification in artistic contexts

has become even more intriguing than it was before embarking on this venture. As the

entries for the ICAD concerts, as well as many current network art pieces show, cre-

ative minds find plenty of possibilities for experimentation with data representation by

acoustic, visual and other means; creating work that is both aesthetically interesting and

scientifically well-informed is still a fascinating activity. When more perceptual modal-

ities are included in more interactive settings, the creative options and the possibility

spaces to explore multiply once again.


Appendix A

The SonEnvir framework structure in

subversion

This section describes which parts of the framework reside in which folders in the So-

nEnvir subversion repository. Note that the state reported below is temporary; pending

discussion with the SC3 community, more SonEnvir work will move into the main distri-

bution, as well as into general SC3 Quarks, or SonEnvir-specific Quarks.

A.1 The folder ’Framework’

This folder contains the central SC3 classes written during the project, and their respec-

tive help files. The sub-folders are structured as follows:

Data: contains all the SC3 classes for different kinds of data (see the Data model

discussion above), such as EEG data in .edf format; it also includes some appli-

cations written as classes: The TimeSeriesAnalyzer (described in section 8), the

EEGScreener and EEGRealTimePlayer (described in section 9).

Interaction: contains the MouseStrum Class. Most of the user interface devices/interaction

classes are covered by the JInT quark written by Till Bovermann, and available

from the SC3 project site at sourceforge.

Patterns: contains the HilbertIndex, a pattern class that generates 2D and 3D in-

dices along Hilbert space filling curves; note that for 4D Hilbert indices there is a

quark package. It also includes support patterns for Hilbert index generation, and

Pxnrand, a pattern that avoids repeating the last n values of its own output.

Rendering: contains two UGen classes, TorusPanAz and PanRingTop, and a utility for

adjusting the individual speakers of multichannel systems for more balanced sound,

SpeakerAdjust. See also section 5.5.


Synthesis: includes a reverb class (AdCVerb, used in the VirtualRoom class), sev-

eral classes for cascaded filters, a UGen to indicate loop ends in buffer playback,

PhasorClick (both are used in the EEG applications); and a dual band compressor.

Utilities: includes a model for QCD simulations, Potts2D, a library of singing voice

formants, and various extension methods.

osx, linux, windows: these folders capture platform-specific development; of these, only the OSX folder is in use for OSX-specific GUI classes. These will eventually be converted to a cross-platform scheme.

A.2 The folder ’SC3-Support’

QtSC3GUI: contains GUIs written in Qt, which were considered an option for SC3

on Windows; this strand of development was dropped when sufficiently powerful

versions of the cross-platform GUI extension package swingOSC became available.

SonEnvirClasses, SonEnvirHelp: these contain essentially obsolete variants of So-

nEnvir classes; they are kept mainly in case some users still need to run examples

using these classes.

A.3 Other folders in the svn repository

CUBE: contains the QVicon2Osc application, which can connect the Vicon tracking

system (which is in use at the IEM Cube) to any software that supports Open-

SoundControl, and a test for that system using the SonEnvir VirtualRoom for

binaural rendering.

Prototypes: contains all the sonification designs (’prototypes’) written, sorted by sci-

entific domain. These are described extensively and analysed in the chapters on

sonification designs for the domain sciences, 6 - 9.

Psychoacoustics: contains some demonstrations of perceptual principles written for

the domain scientists.

SC3-Training: contains a short Introduction to SuperCollider for sonification; this was

written for the domain scientists, both in German and in English.

SOS1, SOS2: contains demo versions of sonification designs for two presentations

(called Sound of Science 1 and 2) at IEM Graz.

testData: contains anonymous EEG data files in .edf format, for testing purposes only.


A.4 Quarks-SonEnvir

This folder contains all the SC3 classes written in SonEnvir that have been migrated

into Quarks packages for specific topics. Each folder can be downloaded and installed

as a Quark.

QCD contains some Quantum Chromodynamics models implemented in SC3.

SGLib contains a port of a 3D graphics library for math operations on tracking data.

gui-addons contains platform-independent gui extensions to SC3.

hilbert contains a file reader for loading pre-computed 4D Hilbert curve indices from

files.

rainData contains a data reader class for the Rain data used in the SBE workshop (see

section 10).

wavesets contains the Wavesets class, which analyses mono soundfiles into Wavesets,

as defined by Trevor Wishart. This can also be used for applying granular synthesis

methods on time series-like data.

A.5 Quarks-SuperCollider

These extension packages contain all the SC3 classes written in SonEnvir that have

been migrated into Quarks packages for specific topics. They can be downloaded and

installed from the sourceforge svn site of SuperCollider.

AmbIEM: This package for binaural sound rendering using Ambisonics has become an

official SuperCollider extension package (’Quark’). ARHeadtracker is an interface

class to a freeware tracking system.

The statistics methods implemented within SonEnvir have moved to the general SC3

quark MathLib, while others have become quarks themselves, such as the JustInTerface

quark (JInT) written by Till Bovermann (within SonEnvir). Finally, the TUIO quark

(Tangible User Interface Objects, also by Till Bovermann, of University Bielefeld) is of

interest for sonification research with strongly interactive approaches.


Appendix B

Models - code examples

B.1 Spatialisation examples

B.1.1 Physical sources

For multiple speaker setups, a simple and very effective strategy is to use individual

speakers as real physical sources. The main advantage is that physics really help in this

case; when locations only serve to identify streams, as with few fixed sources, fixed single

speakers work very well.

SuperCollider supports this directly with the Out Ugen: it determines which bus a signal

is written on, and thus, which audio hardware output it is heard on.

// a mono source playing out of channel 4 (indices start at 0)

{ Out.ar(3, Ringz.ar(Dust.ar(30), 400, 0.2)) }.play;

The JITLib library in SuperCollider3 supports a more flexible scheme: sound processes

(in JITLib speak, NodeProxies) run on their own private busses by default; when they

should be audible, they can be routed to the hardware outputs with the .play method.

~snd = { Ringz.ar(Dust.ar(30), 400, 0.2) }; // proxy inaudible, but plays

~snd.play(3); // listen to it on hardware output 4.

NodeProxies also support more flexible fixed multichannel mapping very simply: The

.playN method lets one route each audio channel of the proxy to one or several hardware

output channels, each with optional individual level controls.

// a 3 channel source

~snd3ch = { Ringz.ar(Dust.ar([1,1,1] * 30), [400, 550, 750], 0.2) };


// to individual speakers 1, 3, 5:

~snd3ch.playN([0, 2, 4]);

// to multiple speakers, with individual levels:

~snd3ch.playN(outs: [0, [1,2], [3,4]], amps: [1, 0.7, 0.7]);

B.1.2 Amplitude panning

All of the following methods work for both moving and static sources.

1D: In the simplest case the Pan2 UGen is used for equal power stereo panning.

// mouse controlled pan position

{ Pan2.ar(Ringz.ar(Dust.ar(30), 400, 0.2), MouseX.kr(-1, 1)) }.play;

2D: The PanAz UGen pans a single channel to a symmetrical ring of n speakers by azimuth, with an adjustable width controlling over how many speakers (at most) the energy is distributed.

(

{ var numChans = 5, width = 2;

var pos = MouseX.kr(0, 2);

var source = Ringz.ar(Dust.ar(30), 400, 0.2);

PanAz.ar(numChans, source, pos, width);

}.play;

)

In case the ring is not quite symmetrical, adjustments can be made by remapping;

however, using the best geometrical symmetry attainable is always superior to post-

compensation. In order to remap dynamic spatial positions to a ring of speakers at

unequal angles such that the resulting directions are correct, the following example

shows the steps needed: Given a five-speaker system, equal speaker angles would

be [0, 0.4, 0.8, 1.2, 1.6, 2.0] with 2.0 being equal to 0.0 (this is the behaviour of the

PanAz UGen); the actual unsymmetric speaker angles could be for example [0, 0.3, 0.7,

1, 1.5, 2.0]; so remapping should map a control value of 0.3 (where speaker 2 actually

is) to a control value of 0.4 (the control value that positions this source directly

in speaker 2). The full map of corresponding values is given in table B.1.

Table B.1: Remapping spatial control values

list of breakpoints for desired spatial position    equally spaced output / mapped control values
0.0    0.0
0.3    0.4
0.7    0.8
1.0    1.2
1.5    1.6
0.0 == 2.0    2.0 == 0.0

( // remapping unequal speaker angles with asMapTable and PanAz:
a = [0, 0.3, 0.7, 1, 1.5, 2.0].asMapTable;
b = Buffer.sendCollection(s, a.asWavetable, 1);
{ |inpos=0.0|
    var source = Ringz.ar(Dust.ar(30), 400, 0.2);
    var pos = Shaper.kr(b.bufnum, inpos.wrap(0, 2));
    PanAz.ar(a.size - 1, source, pos);
}.play;
)

Mixing multiple channel sources down to stereo:

The Splay UGen mixes an array of channels down to 2 channels, at equal pan distances,

with adjustable spread and center position. Internally, it uses a Pan2 UGen.

~snd3ch = { Ringz.ar(Dust.ar([1,1,1] * 30), [400, 550, 750], 0.2) };

~snd3pan = { Splay.ar(~snd3ch, spread: 0.8, level: 0.5, center: 0) };

~snd3pan.playN(0);

Mixing multiple channel sources into a ring of speakers:

The SplayZ UGen pans an array of source channels into a number of output channels

at equal distances; spread and center position can be adjusted. Both larger numbers of

channels can be splayed into rings of fewer speakers, and vice versa. Internally, SplayZ

uses a PanAz UGen.

// spreading 4 channels equally into a ring of 6 speakers

~snd4ch = { Ringz.ar(Dust.ar([1,1,1,1] * 30), [400, 550, 750, 900], 0.2) };

~snd4pan = { SplayZ.ar(6, ~snd4ch, spread: 1.0, level: 0.5, center: 0) };

~snd4pan.playN(0);

3D: The SonEnvir extension TorusPanAz does the same for setups with rings of rings of

speakers. Again, the speaker setup should be as symmetrical as possible; compensation

can be trickier here. (In general, even while compensations for less symmetrical se-

tups seem mathematically possible, spatial images will be worse outside the sweet spot.

Maximum attainable physical symmetry cannot be fully substituted by more DSP math.)


( // panning to 3 rings of 12, 8, and 4 speakers, cf. IEM CUBE.

~snd = { Ringz.ar(Dust.ar(30), 550, 0.2) };

~toruspan = {

var hAngle = MouseX.kr(0, 2); // all the way around (2 == 0)

var vAngle = MouseY.kr(0, 1.333); // limited to highest ring

TorusPanAz.ar([12, 8, 4],

~snd.ar(1),

hAngle,

vAngle

);

};

~toruspan.playN(0);

)

Compensating overall vertical ring angles and individual horizontal speaker angles within

each ring is straightforward with the asMapTable method as shown above. For placement

deviations that are both horizontal and vertical, it is preferable to have Vector Based

Amplitude Panning in SC3, which has been implemented recently by Scott Wilson and

colleagues1. However, this was not needed within the context of the SonEnvir project.

B.1.3 Ambisonics

While some Ambisonics UGens previously existed in SuperCollider, the SonEnvir team

decided to write a consistent new implementation of Ambisonics in SC3, based on a

subset of the existing PureData libraries. This package was realised up to third order

Ambisonics by Christopher Frauenberger for the AmbIEM package, available here2. It

supports the main speaker setup of interest (a half-sphere of 12, 8 and 4 speakers, the

CUBE at IEM, with several coefficient sets for different tradeoff choices), and a setup with 1-4-7-4 speaker rings, mainly used as a more efficient lower-resolution alternative

for headphone rendering, as described below.

( // panning two sources with 3rd order ambisonics into CUBE sphere.

~snd0 = { Ringz.ar(Dust.ar(30), 400, 0.2) };

~snd1 = { Ringz.ar(Dust.ar(30), 550, 0.2) };

~pos0 = [0, 0.01]; // azimuth, elevation

~pos1 = [1, 0.01]; // azimuth, elevation

~encoded[0] = { PanAmbi30.ar( ~snd0.ar, ~pos0.kr(1, 0), ~pos0.kr(1, 1)) };

~encoded[1] = { PanAmbi30.ar( ~snd1.ar, ~pos1.kr(1, 0), ~pos1.kr(1, 1)) };

~decode24 = { DecodeAmbi3O.ar(~encoded.ar, 'CUBE_basic') };

~decode24.play(0);

)

1 See http://scottwilson.ca/site/Software.html

2 http://quarks.svn.sourceforge.net/viewvc/quarks/AmbIEM/

B.1.4 Headphones

Ambisonics and Virtual Binaural Rendering

For complex changing scenes, the IEM has developed a very efficient approach for bin-

aural rendering (Musil et al. (2005); Noisternig et al. (2003)): In effect, taking a virtual,

symmetrical speaker setup (such as 1-4-7-4), and spatializing to that setup with Am-

bisonics; then rendering these virtual speakers as point sources with their appropriate

HRIRs, thus arriving at a binaural rendering. This provides the benefit that the Am-

bisonic field can be rotated as a whole, which is really useful when head movements of

the listener are tracked, and the binaural rendering is designed to compensate for them.

Also, the known problems with Ambisonics when listeners move outside the sweet zone

disappear; when one carries a setup of virtual speakers around one’s head, one is always

right in the center of the sweet zone.

This approach has been ported to SC3 by C. Frauenberger; its main use is in the Vir-

tualRoom class, which simulates moving sources within a rectangular box-shaped room.

This class has turned out to be very useful as a simple way to prepare both experiments

and presentations for multi-speaker setups by relatively simple headphone simulation.

(

// VirtualRoom example - adapted from help file.

// preparation: reserve more memory for delay lines, and boot the server

s.options.memSize_(8192 * 16)

.numAudioBusChannels_(1024);

s.boot;

// make a proxyspace

p = ProxySpace.push;

// set the path for the folder with Kemar files.

VirtualRoom.kemarPath = "KemarHRTF/";

)

(

// create a virtual room

v = VirtualRoom.new;

// and start its binaural rendering

v.init;

// set the room properties (reverberation time and gain,

// hf damping on reverb and early reflections gain)

v.revTime = 0.1;


v.revGain = 0.1;

v.hfDamping = 0.5;

v.refGain = 0.8;

)

( // set room dimension [x, y, z, x, y, z]:

// a room 8m wide (y), 5m deep(x) and 5m high(z)

// - nose is always along x

v.room = [0, 0, 0, 5, 8, 5];

// make it play to hardware stereo outs

v.out.play;

// listener is listener position, a controlrate nodeproxy;

// here movable by mouse.

v.listener.source = { [ MouseY.kr(5,0), MouseX.kr(8,0), 1.6, 0] };

)

// add three sources to the scene

( // make three different sounds

~noisy = { Decay.ar(Impulse.ar(10, 2), 0.2) * PinkNoise.ar(1) };

~ringy = { Ringz.ar(Dust.ar(10), [400, 600,950], [0.3, 0.2, 0.05]).sum };

~dusty = { Dust.ar(400) };

)

// add the three sources to the virtual room:

// source, name, xpos, ypos, zpos

v.addSource( ~noisy, \noisy, 1, 2, 2.5); // bottom right corner

v.addSource( ~ringy, \ringy, 1.5, 7, 2.5); // bottom left

v.addSource( ~dusty, \dusty, 4, 5, 2.5); // top, left of center

v.sources[\noisy].set(\xpos, 4, \ypos, 6, \zpos, 2); // set noisy position

v.sources[\noisy].getKeysValues; // check its position values

v.sources[\ringy].set(\xpos, 2.5, \ypos, 4, \zpos, 2);

// remove the sources

v.removeSource(\noisy);

v.removeSource(\ringy);

v.removeSource(\dusty);

v.free; // free the virtual room and its resources

p.pop; // and clear and leave proxyspace

Among other things, the submissions for the ICAD 2006 concert3 (described also in section 4.3) were rendered from 8 channels to binaural for the reviewers, and for the web documentation4.

3 http://www.dcs.qmul.ac.uk/research/imc/icad2006/concert.php

One can of course also spatialize sounds on the virtual speakers by any of the simpler

panning strategies given above as well; this trades off easy rotation of the entire setup

for better point source localisation.

To support simple headtracking, C. Frauenberger also created the ARHeadTracker ap-

plication, also available as a package from the SonEnvir website here5.

B.1.5 Handling speaker imperfections

All standard spatialisation techniques work best when speaker setups are as symmetrical

and well-controlled as possible. While it may not always be feasible to adjust mechan-

ical positions of speakers freely for very precise geometry, a number of factors can be

measured and compensated for, and this is supported by several utility classes written in

SuperCollider, which are part of the SonEnvir framework.

Latency

The Latency class plays a test signal for a given number of audio channels, and waits for

the signals to arrive back at an audio input. The resulting list of measured per-channel

latencies can be used to create compensating delay lines, e.g. in the SpeakerAdjust class

described below.

// test 2 channels, max delay expected 0.2 sec,

// take default server, mic is on AudioIn 1:

Latency.test(2, 0.2, Server.default, 1);

// stop measuring and post results

Latency.stop;

// results are posted like this:

// measured latencies:

in samples: [ 1186.0, 1197.0 ]

in seconds: [ 0.026893424036281, 0.027142857142857 ]

Spectralyzer

While inter-speaker latency differences are well-known and very often addressed, we have

found another common problem to be more distracting for multichannel sonification:

Each individual channel of the reproduction chain, from D/A converter to amplifier,

cable, loudspeaker, and speaker mounting location in the room, can sound quite different.

When changes in sound timbre can encode meaning, this is potentially really confusing!

4 http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/concert/index.html

5 http://quarks.svn.sourceforge.net/viewvc/quarks/AmbIEM/

To address this, the Spectralyzer class allows for simple analysis of a test signal as

played into a room, with optional smoothing over several measurements, and then tuning

compensating equalizers by hand for reasonable similarity across all speaker channels.

While this could be written to run automatically, we consider it more of an art than

an engineering task; a more detailed EQ intervention will make the frequency response

flatter, but may color the sound more by smearing its impulse behaviour.

x = Spectralyzer.new; // make a new spectralyzer

x.start; x.makeWindow; // start it, open its GUI

x.listenTo({ PinkNoise.ar }); // pink noise should look flat

x.listenTo({ AudioIn.ar(1)}); // should look similar from microphone.

Figure B.1: The Spectralyzer GUI window.

For full details see the Spectralyzer help file.

( // tuning 2 speakers for better linearity

p = ProxySpace.push;

~noyz = { PinkNoise.ar(1) }; // create a noise source

~noyz.play(0, vol: 0.5);

// filter it with two bands of parametric eq

~noyz.filter(5, { |in, f1=100,rq1=1,db1=0,f2=5000,rq2=1,db2=0|

MidEQ.ar(MidEQ.ar(in, f1, rq1, db1), f2, rq2, db2);

});

)

// tweak the two bands for better acoustic linearity

~noyz.set(\f1, 1200, \rq1, 1, \db1, -5); // take out low presence bump

~noyz.set(\f2, 150, \rq2, 0.6, \db2, 3); // boost bass dip


~noyz.getKeysValues.drop(1).postcs; // post settings when done

// move on to speaker 2

~noyz.play(1, vol: 0.5);

// tweak the two bands again for speaker 2

~noyz.set(\f1, 1200, \rq1, 1, \db1, 0); // likely to be different ...

~noyz.set(\f2, 150, \rq2, 0.6, \db2, 0); // from speaker 1.

~noyz.getKeysValues.drop(1).postcs; // post settings.

SpeakerAdjust

Once one has achieved usable EQ curves for every speaker channel, one can begin

to compensate for volume differences between channels (with big timbral differences

between channels, measuring volume or adjusting it by listening is rather pointless).

The SpeakerAdjust class expects simple specifications for each channel:

amplitude (as multiplication factor, typically below 1.0),

optionally: delaytime (in seconds, to be independent of the current samplerate),

optionally: eq1-frequency, eq1-gain, eq1-relative-bandwidth,

optionally: eq2-frequency, eq2-gain, eq2-relative-bandwidth,

and repeat for as many bands as desired.

// From SpeakerAdjust.help:

// adjustment for 2 channels, amp, dtime, eq specs;

// you can add as many triplets of eqspecs as you want.

(

var specs;

specs = [

// amp, dtime, eq1: frq, db, rq; eq2: frq, db, rq

[ 0.75, 0.0, [ 250, 4, 0.5], [ 800, -4, 1]],

[ 1, 0.001, [ 250, 2, 0.5], [ 5000, 3, 1]]

];

{ var ins;

ins = Pan2.ar(PinkNoise.ar(0.05), MouseX.kr(-1, 1));

SpeakerAdjust.ar(ins, specs)

}.play;

)

Such a speaker adjustment can be created and added to the end of the signal chain

to linearise the given speaker setup as much as possible; of course, adding limiters for

speaker and listener protection can be built into such a master effects unit as well.
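Such a master stage could be sketched like this, reusing the specs from the example above (the limiter settings here are arbitrary illustrative choices, not SonEnvir defaults):

(
var specs = [
    [ 0.75, 0.0, [ 250, 4, 0.5], [ 800, -4, 1]],
    [ 1, 0.001, [ 250, 2, 0.5], [ 5000, 3, 1]]
];
{
    var sig = Pan2.ar(PinkNoise.ar(0.05), MouseX.kr(-1, 1)); // any stereo signal to be played
    Limiter.ar(SpeakerAdjust.ar(sig, specs), 0.95) // per-speaker correction, then protective limiting
}.play;
)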


Appendix C

Physics Background

C.1 Constituent Quark Models

The concept of constituent quarks was introduced in the 1960s by Gell-Mann (1964)

and Zweig (1964), based on symmetry considerations in the classification of hadrons, the

strongly interacting elementary particles. The first CQMs for the description of hadron

spectra were introduced in the early 1970s by de Rujula et al. (1975). The original CQMs

relied on simple models for the confinement of constituent quarks (such as the harmonic

oscillator potential) and employed rudimentary hyperfine interactions. Furthermore they

were set up in a completely nonrelativistic framework. In the meantime CQMs have

undergone a lively development. Over the years more and more notions deriving from

QCD have been implemented, and CQMs are constructed within a relativistic formalism.

Modern CQMs all use a confinement potential of linear form, as suggested by QCD. For

the hyperfine interaction of the constituent quarks several competing dynamical concepts

have been proposed: A prominent representative is the one-gluon-exchange (OGE) CQM,

whose dynamics for the hyperfine interaction basically relies on the original ideas of

Zweig (1964): the effective interaction between the constituent quarks is generated by

the exchange of a single gluon. For the data we experimented with, we considered a

relativistic variant of the OGE CQM as constructed by Theussl et al. (2001). A different

approach is followed by the so-called instanton-induced (II) CQM (Loering et al. (2001)),

whose hyperfine forces derive from the ’t Hooft interaction. Several years ago the physics

group at Graz University suggested a hyperfine interaction based on the exchange

of Goldstone bosons. This type of dynamics is motivated by the spontaneous breaking

of chiral symmetry (SBχS), which is an essential property of QCD at low energies. The

SBχS is considered to be responsible for the quarks to acquire a (heavier) dynamical

mass, and their interaction should then be generated by the exchange of Goldstone

bosons, the latter being another consequence of SBχS. The Goldstone-boson-exchange

(GBE) CQM was originally suggested in a simplified version, based on the exchange of

pseudoscalar bosons only (Glozman et al. (1998)). In the meantime an extended version


has been formulated by Glantschnig et al. (2005).

Quantum-Mechanical Solution of Constituent Quark Models

Modern CQMs are constructed in the framework of relativistic quantum mechanics

(RQM). They are characterised by a Hamiltonian operator H that represents the to-

tal energy of the system under consideration. For baryons, which are considered as

bound states of three constituent quarks, the corresponding Hamiltonian reads

H = H_0 + \sum_{i<j} \left[ V_{\mathrm{conf}}(i,j) + V_{\mathrm{hf}}(i,j) \right] . \qquad (C.1)

The first term on the right-hand side denotes the relativistic kinetic energy of the sys-

tem (of the three constituent quarks), and the sum includes all mutual quark-quark

interactions. It consists of two parts, the confinement potential Vconf and the hyperfine

interaction Vhf . The confinement potential prevents the constituent quarks from escap-

ing the volume of the baryon (being of the order of 10−15 m); no free quarks have ever

been observed in nature. The hyperfine potential provides for the fine structure of the

energy levels in the baryon spectra. Different dynamical models lead to distinct features

in the excitation spectra of baryons.

In order to produce the baryon spectra of the CQMs one has to solve the eigenvalue

problem of the Hamiltonian in equation C.1. Several methods are available to achieve

solutions to any desired accuracy. The Graz group has applied both integral-equation

(Krassnigg et al. (2000)) as well as differential-equation techniques (Suzuki and Varga

(1998)).
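Schematically, the eigenvalue problem in question is

H \, \Psi_n = E_n \, \Psi_n ,

with the E_n the energy levels and the \Psi_n the corresponding wave functions discussed below.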

Upon solving the eigenvalue problem of the Hamiltonian one ends up with the eigenvalues

(energy levels) and eigenstates (quantum-mechanical wave functions) of the baryons.

They are characterised according to the conserved quantum numbers, the total angular

momentum J (which is half integer in the case of baryons) and the parity P (being

positive or negative). The different baryons are distinguished by the ’flavor’ of their

constituent quarks, which can be u, d, and s (for ’up’, ’down’, and ’strange’). For

example, the proton is uud, the neutron is udd, the ∆++ is uuu, and the Σ0 is uds.

Classification of Baryons

The total baryon wave function ΨXSFC is composed of spatial (X), spin (S), flavor (F ),

and color (C) degrees of freedom corresponding to the product of symmetry spaces

\Psi_{XSFC} = \Psi_{XSF} \, \Psi_C^{\mathrm{singlet}} , \qquad (C.2)


It is antisymmetric under the exchange of any two particles, since baryons must obey

Fermi statistics. There are several visual representations of the symmetries between the

different baryons based on their combinations of quarks; figure C.1 shows one of them.

Figure C.1: Multiplet structure of the baryons as a decuplet.

In this ordering of baryon flavor symmetries, all the light and strange baryons are in the lowest

layer.

Quarks are differentiated by the following properties:

Color The color quantum numbers are r, b, g (for ’red’, ’blue’, and ’green’). Only white

baryons are observed in experiment. Thus the color wave function corresponds to a

color singlet state and is therefore completely antisymmetric. As a consequence the

rest of the wave function (comprising spatial, spin, and flavor degrees of freedom)

must be symmetric.

Flavor According to the Standard Model (SM) of particle physics there are six quark

flavors: up, down, strange, charm, bottom, and top. Quarks of different flavours

have different masses. Normal hadronic matter (i.e. atomic nuclei) is basically

composed only of the so-called light flavors u and d. CQMs consider hadrons with

flavors u, d, and s. These are also the ones that are most affected by the SBχS.

Correspondingly, one works in SU(3)F and deals with baryons classified within

singlet, octet, and decuplet multiplets. For example, the nucleons (proton and

neutron) are in an octet, together with the Λ, Σ, and Ξ particles.

Spin All quarks have spin 1/2. The spin wave function of the three quarks is constructed


within SU(2)S and is thus symmetric or mixed symmetric or mixed antisymmetric.

The total spin of a baryon is denoted by S.

Orbital Angular Momentum and Parity The spatial wave function corresponds to

a given orbital angular momentum L of the three-quark system. Its symmetry

property under spatial reflections determines the parity P .

Total Angular Momentum The total angular momentum J is composed of the total

orbital angular momentum L and the total spin S of the three-quark system accord-

ing to the quantum-mechanical addition rules of angular momenta: J = L + S.

It is always half-integer. The total angular momentum J is a conserved quan-

tum number and, together with the parity P , serves for the distinction of baryon

multiplets JP .
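(More precisely, the quantum-mechanical addition rule allows the values J \in \{ |L - S|, |L - S| + 1, \ldots, L + S \}; these are always half-integer here, since the total spin S of three spin-1/2 quarks is 1/2 or 3/2 while L is an integer.)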

C.2 Potts model - theoretical background

In mathematical terms, the Hamilton-function H defines the overall energy, which any

physical system, and thus also a Potts model, will try to minimize:

H = -J \sum_{\langle i,j \rangle} S_i S_j - M \sum_i S_i \qquad (C.3)

where J is the coupling parameter between spin Si and its neighbouring spin Sj. J

is inversely proportional to the temperature; M is the field strength of an exterior

magnetic field acting on each spin Si. The first sum runs over nearest-neighbour pairs <i, j>

and describes the coupling term. It is responsible for the phase transition. If J = 0,

only the second term remains, and the Hamiltonian describes a paramagnet, being only

magnetised in the presence of an exterior magnetic field. In our simulations, M was

always 0.
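To make the coupling term concrete, a single Metropolis update step for the two-state (Ising) case can be sketched directly in SC3 (a minimal sketch, not the SonEnvir Potts2D class; the coupling j here stands for J/(kB T), and M = 0 as in the simulations above):

(
var n = 32, j = 1.0;
var lattice = { { [-1, 1].choose } ! n } ! n; // random initial spin configuration
var step = {
    var x = n.rand, y = n.rand;
    var old = lattice[x][y];
    var nb = lattice[(x+1) % n][y] + lattice[(x-1) % n][y] +
        lattice[x][(y+1) % n] + lattice[x][(y-1) % n]; // four nearest neighbours, periodic boundaries
    var dE = 2 * j * old * nb; // energy change if this spin were flipped
    if (dE <= 0 or: { 1.0.rand < exp(dE.neg) }) { lattice[x][y] = old.neg }; // Metropolis acceptance
};
10000.do(step);
lattice.flat.sum.postln; // net magnetisation after 10000 update steps
)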

When studying phase transitions macroscopically, the defining term is the free energy

F:

F(T, H) = -k_B T \, \ln Z(T, H) \qquad (C.4)

It is proportional to the logarithm of the so-called partition function Z of statistical

physics, which sums up all possible spin configurations and weights each with a Boltzmann factor e^{-H/(k_B T)} (k_B being the Boltzmann constant). Energetically unfavorable states are less probable in the partition function than energetically favorable ones.

Z = \sum_{\{S_n\}} e^{-H / (k_B T)} \qquad (C.5)


The partition function Z (eq. C.5) is not calculable in practice due to combinatorial

explosion: a three dimensional lattice with a length of 100 and two possible spin states

has 2^(100^3) = (2^10)^(10^5) ∼ 10^300,000 configurations that would have to be summed up - at every time step of the simulation. Also, analytically only a few spin models

have been solved exactly, and in three dimensions not even the simple Ising model

is analytically solvable. Therefore classical treatment relies mainly on approximation

methods, which partly allow one to estimate critical exponents, and can be outlined briefly

as follows:

Early theories addressing phase transitions, like Van der Waals theory of fluids and Weiss

theory of magnetism, can be subsumed under Landau theory or mean-field theory. Mean-

field theory assumes a mean value for the free energy. Landau derived a theory, where

the free energy is expanded as a power series in the order parameter, and only terms are

included which are compatible with the symmetry of the system. The problem is that all

of these approaches ignore fluctuations by relying only on mean values. (For a detailed

review of phase transition theories please refer to Yeomans (1992).)

Renormalization group theory by K. G. Wilson (1974) solved many problems

of critical phenomena, most importantly the understanding of why continuous phase

transitions fall into universality classes. The basic idea is to do a transformation that

changes the scale of the system but not its partition function. Only at the critical point

the properties of the system will not change under such a transformation, and it is then

described by so-called fixed points in the parameter space of all Hamiltonians. This is

why critical exponents are universal for different systems.

C.2.1 Spin models sound examples

The following audio files can be downloaded from

http://sonenvir.at/downloads/spinmodels/.

The first part describes sonifications that enable the listener to classify the phase of the

model (sub-critical, critical, super-critical).

Granular sonifications: Random, averaged spin blocks were used to determine the

sound grains. The spatial setting cannot be reproduced in this recording. But

even without having a clear gestalt of the system, the different characteristics of

IsingHot, IsingCritical and IsingCold may easily be distinguished.

Audification approaches: (Please consider that a few clicks in the audio files below

are artifacts of the data management and buffering in the computer.)

1. Noise: NoiseA gives the audification of a 3-state Potts model at thermal

noise (coupling J = 0.4)


NoiseB gives the same for the 5-state Potts model (J = 0.4), evidently

the sound becomes smoother the more states are possible, but its overall

character stays the same.

2. Critical behaviour: this example was recorded with a 4-state Potts model at

and near the critical temperature:

SuperCritical - near the critical point clusters emerge. These are rather big

but homogeneous, hence a regularity is still perceivable. (J = 0.95)

Critical - at the critical point itself, clusters of all orders of magnitude emerge,

thus the sound is much more unstable and less pleasant. (J = 1.05)

3. SubCritical - as soon as the system is equilibrated in the subcritical domain

(at T < Tcrit), one spin orientation predominates, and only few random spin

flips occur due to thermal fluctuations. (Recorded with the Ising model at J

= 1.3.)

The next examples study the order of the phase transition.

Direct audification displays only very subtle differences between the two types of

phase transitions:

1. The 4-state Potts model is played in ContinousTransition.

2. A more sudden change can be perceived in FirstOrderTransition for the 5-

state Potts model.

Audification with separate spin channels: For each spin orientation the lattice is sequentialised and the resulting audification is played on its own channel (see the sketch after this list). The lattice

size was 32x32, and the system was equilibrated at each step. The examples finish

with one spin orientation prevailing, which means that only random clicks from a

non-vanishing temperature remain.

1. The transitions in the 2-state Ising model and the 4-state Potts model are continuous; the change is smooth.

2. In the 5-state and 8-state models the phase transition is abrupt (the data is

more distinct the more states are involved).
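A rough sketch of the sequentialisation itself (with a random 32x32 lattice, not the SonEnvir code) could look like this:

(
var n = 32;
var lattice = { { [-1, 1].choose } ! n } ! n; // random spin configuration
var upChannel = lattice.flat.collect { |spin| if (spin == 1) { 0.5 } { 0.0 } }; // one spin orientation only
Buffer.loadCollection(s, upChannel, action: { |buf|
    { PlayBuf.ar(1, buf, rate: 0.1, loop: 1) ! 2 }.play; // slowed down and looped for listening
});
)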


Appendix D

Science By Ear participants

The following people took part in the Science By Ear workshop:

SonEnvir members/moderators

Daye, Christian

De Campo, Alberto

Eckel, Gerhard

Frauenberger, Christopher

Vogt, Katharina

Wallisch, Annette

Programming specialists

Bovermann, Till, Neuroinformatics Group, Bielefeld University

De Campo, Alberto

Frauenberger, Christopher

Pauletto, Sandra, Music Technology Group, York University

Musil, Thomas, Audio/DSP, Institute of Electronic Music (IEM) Graz

Rohrhuber, Julian, Academy of Media Arts (KHM) Cologne

Sonification experts

Baier, Gerold, Dynamical systems, University of Morelos, Mexico

Bonebright, Terri, Psychology/Perception, DePauw University

Bovermann, Till

Dombois, Florian, Transdisciplinarity, Y Institute, Arts Academy Berne

Hermann, Thomas, Neuroinformatics Group, Bielefeld University


Kramer, Gregory, Metta Organization

Pauletto, Sandra

Stockman, Tony, Computer Science, Queen Mary Univ. London

Domain scientists

Baier, Gerold

Dombois, Florian

Egger de Campo, Marianne, Sociology, Compass Graz

Fickert, Lothar, Electrical power systems, University of Technology (TU) Graz

Grond, Florian, Chemistry / media art, ZKM Karlsruhe

Grossegger, Dieter, EEG Software, NeuroSpeed Vienna

Hipp, Walter, Electrical power systems, TU Graz

Huber, Anton, Physical Chemistry, University of Graz

Markum, Harald, Atomic Institute of the Austrian Universities, TU Vienna

Plessas, Willibald, Physics Institute, University of Graz

Shutin, Dimitri, Electrical power systems, TU Graz

Schweitzer, Susanne, Wegener Center for Climate and Global Change, University of Graz

Witrisal, Klaus, Electrical power systems, TU Graz


Appendix E

Background on ’Navegar’

The saying has a long history. Plutarch ascribes it to General Pompeius saying this line

to soldiers he sent off on a suicide mission, and Veloso may well have read it in a famous

poem by Fernando Pessoa. Here are Veloso’s lyrics:

Table E.1: Os Argonautas - Caetano Veloso

O barco, meu coração não aguenta    the ship, my heart cannot handle it
Tanta tormenta, alegria    so much torment, happiness
Meu coração não contenta    my heart is discontent
O dia, o marco, meu coração, o porto, não    the day, the limit, my heart, the port, no
Navegar é preciso, viver não é preciso    sea-faring is necessary, living is not
O barco, noite no céu tão bonito    the ship, night in the beautiful sky
Sorriso solto perdido    the free smile, lost
Horizonte, madrugada    horizon, morning dawn
O riso, o arco, da madrugada    the laugh, the arc, of morning
O porto, nada    the port, nothing
Navegar é preciso, viver não é preciso    sea-faring is necessary, living is not
O barco, o automóvel brilhante    the ship, the brilliant automobile
O trilho solto, o barulho    the free track, the noise
Do meu dente em tua veia    of my tooth in your vein
O sangue, o charco, barulho lento    the blood, the swamp, slow soft noise
O porto silêncio    the port - silence
Navegar é preciso, viver não é preciso    sea-faring is necessary, living is not

(Literal English translation: Alberto de Campo.)


Appendix F

Sound, meaning, language

Sounds can change their meanings in different contexts. This ambiguity has also been

interesting for poetry, as this work by Ernst Jandl shows.

Ernst Jandl - Oberflächenübersetzung ('Surface Translation')

mai hart lieb zapfen eibe hold
er rennbohr in sees kai.
so was sieht wenn mai läuft begehen,
so es sieht nahe emma mähen,
so biet wenn ärschel grollt
ohr leck mit ei!
seht steil dies fader rosse mähen,
in teig kurt wisch mai desto bier
baum deutsche deutsch bajonett schur alp eiertier.

Original poem by William Wordsworth

My heart leaps up when I behold

a rainbow in the sky.

so was it when my life began,

so is it now I am a man,

so be it when I shall grow old

or let me die!

The child is father of the man

and I could wish my days to be

bound each to each by natural piety.


Bibliography

Abbott, A. (1990). A Primer on Sequence Methods. Organization Sci-

ence, 1(4):375–392.

Abbott, A. (1995). Sequence Analysis: New Methods for Old Ideas.

Annual Review of Sociology, 21:93–113.

Anderson, M. L. (2003). Embodied cognition: A field guide. Artificial

Intelligence, 149(1):91–130.

Anonymous (March 20, 2001). L'histoire: PDG surpayés. Libération.

Armstrong, N. (2006). An Enactive Approach to Digital Musical Instru-

ment Design. PhD thesis, Princeton University.

Baier, G. and Hermann, T. (2004). The Sonification of Rhythms in

Human Electroencephalogram. In Proc. Int. Conf. on Auditory Display

(ICAD), Sydney, Australia.

Baier, G., Hermann, T., Sahle, S., and Ritter, H. (2006). Sonified Epilec-

tic Rhythms. In Proc. Int Conf. on Auditory Display (ICAD), London,

UK.

Baier, G., Hermann, T., and Stephani, U. (2007). Event-based sonifica-

tion of EEG rhythms in real time. Clinical Neurophysiology, 118(6).

Barnes, J. (2007). ”The Odd Couple”. Review of ”That Sweet Enemy:

The French and the British from the Sun King to the Present” by

Robert and Isabelle Tombs. New York Review of Books, LIV(5):4–9.

Barrass, S. (1997). Auditory Information Design. PhD thesis, Australian

National University.

Barrass, S. and Adcock, x. (2004). Sonification Design Patterns. In

Proc. Int. Conf. on Auditory Display (ICAD), Sydney, Australia.


Barrass, S., Whitelaw, M., and Bailes, F. (2006). Listening to the Mind

Listening: An Analysis of Sonification Reviews, Designs and Corre-

spondences. Leonardo Music Journal, 16:13–19.

Barrass, T. (2006). Description of Sonification for ICAD 2006 Concert:

Life Expectancy. In Proc. Int Conf. on Auditory Display (ICAD), Lon-

don, UK.

Beck, U. (1992). Risk Society: Towards a New Modernity. Sage, New

Delhi.

Ben-Tal, O., Berger, J., Cook, B., Daniels, M., Scavone, G., and Cook,

P. (2002). SONART: The Sonification Application Research Toolbox.

In Proc. ICAD, Kyoto, Japan.

Blauert, J. (1997). Spatial Hearing: The Psychophysics of Human Hear-

ing. MIT Press.

Blossfeld, H.-P., Hamerle, A., and Mayer, K. U. (1986). Ereignisanalyse.

Statistische Theorie und Anwendung in den Wirtschafts- und Sozial-

wissenschaften. Campus, Frankfurt.

Blossfeld, H.-P. and Rohwer, G. (1995). Techniques of event history

modeling. New approaches to causal analysis. Lawrence Erlbaum As-

sociates, Mahwah (N. J.).

Borges, J. L. (1980). The analytical language of john wilkins. In

Labyrinths. Penguin.

Boulanger, R. (2000). The Csound Book: Perspectives in Software

Synthesis, Sound Design, Signal Processing, and Programming. MIT

Press, Cambridge, MA, USA.

Bovermann, T. (2005). MBS-Sonogram. http://www.techfak.uni-bielefeld.de/~tboverma/sc/.

Bovermann, T., de Campo, A., Groten, J., and Eckel, G. (2007). Jug-

gling Sounds. In Proceedings of Interactive Sonification Workshop

ISon2007.

Bovermann, T., Hermann, T., and Ritter, H. (2006). Tangible data

scanning sonification model. In Proc. of the International Conference

on Auditory Display, London, UK.

Bregman, A. S. (1990). Auditory Scene Analysis. Bradford Books, MIT

Press, Cambridge, MA.


Bruce, J. and Palmer, N. (2005). SIFT: Sonification Integrable Flexible

Toolkit. In Proc. Int Conf. on Auditory Display (ICAD), Limerick,

Ireland.

Buxton, Bill with Billinghurst, M., Guiard, Y., Sellen, A., and

Zhai, S. (2008). Human Input to Computer Systems: Theo-

ries, Techniques and Technology. http://www.billbuxton.com/inputManuscript.html.

Candey, R., Schertenleib, A., and Diaz Merced, W. (2006). xSonify:

Sonification Tool for Space Physics. In Proc. Int Conf. on Auditory

Display (ICAD), London, UK.

Conner, C. D. (2005). A People’s History of Science: Miners, Midwives

and ”Low Mechanicks”. Nation Books, New York, NY, USA.

Cooper, D. H. and Shiga, T. (1972). Discrete-Matrix Multichannel

Stereo. J. Audio Eng. Soc., 20:344–360.

Cruz-Neira, C., Sandin, D. J., DeFanti, T. A., Kenyon, R. V., and Hart,

J. C. (1992). The CAVE: Audio Visual Experience Automatic Virtual

Environment. Commun. ACM, 35(6):64–72.

Daye, C. and de Campo, A. (2006). Sounds sequential: Sonification in

the Social Sciences. Interdisciplinary Science Reviews, 31(6):349–364.

Daye, C., de Campo, A., and Egger de Campo, M. (2006). Sonifikationen

in der wissenschaftlichen Datenanalyse. Angewandte Sozialforschung,

24(1/2):41–56.

Daye, C., de Campo, A., Fleck, C., Frauenberger, C., and Edelmayer, G.

(2005). Sonification as a tool to reconstruct user’s actions in unob-

servable areas. In Proceedings of ICAD 2005, Limerick.

de Campo, A. (2007a). A Sonification Design Space Map. In Proceedings

of Interactive Sonification Workshop ISon2007.

de Campo, A. (2007b). Toward a Sonification Design Space Map. In

Proc. Int Conf. on Auditory Display (ICAD), Montreal, Canada.

de Campo, A. and Daye, C. (2006). Navegar E Preciso, Viver Nao E

Preciso. In Proc. Int. Conf. on Auditory Display (ICAD), London, UK.

de Campo, A. and Egger de Campo, M. (1999). Sonification of So-

cial Data. In Proceedings of the 1999 International Computer Music

Conference (ICMC) Beijing.


de Campo, A., Frauenberger, C., and Holdrich, R. (2004). Designing

a Generalized Sonification Environment. In Proceedings of the ICAD

2004, Sydney.

de Campo, A., Frauenberger, C., and Holdrich, R. (2005a). Sonenvir

- a progress report. In Proc. Int. Computer Music Conf. (ICMC),

Barcelona, Spain.

de Campo, A., Frauenberger, C., Vogt, K., Wallisch, A., and Daye,

C. (2006a). Sonification as an Interdisciplinary Working Process. In

Proceedings of ICAD 2006, London.

de Campo, A., Hormann, N., Markum, H., Plessas, W., and Vogt, K. (2006b). Soni-

fication of lattice data: Dirac spectrum and monopole condensation

along the deconfinement transition. In Proceedings of the Minicon-

ference in honor of Adriano Di Giacomo on the Sense of Beauty in

Physics, Pisa, Italy.

de Campo, A., Hormann, N., Markum, H., Plessas, W., and Sengl, B.

(2005b). Sonification of Lattice Data: The Spectrum of the Dirac

Operator Across the Deconfinement Transition. In Proc. XXIIIrd Int.

Symposium on Lattice Field Theory, Trinity College, Dublin, Ireland.

de Campo, A., Hormann, N., Markum, H., Plessas, W., and Sengl, B.

(2005c). Sonification of Lattice Observables Across Phase Transitions.

In International Workshop on Xtreme QCD, Swansea.

de Campo, A., Hormann, N., Markum, H., Plessas, W., and Vogt, K.

(2006c). Sonification of Monopoles and Chaos in QCD. In Proc. of

ICHEP’06 - the XXXIIIrd International Conference on High Energy

Physics, Moscow, Russia.

de Campo, A., Sengl, B., Frauenberger, C., Melde, T., Plessas, W., and

Holdrich, R. (2005d). Sonification of Quantum Spectra. In Proc. Int

Conf. on Auditory Display (ICAD), Limerick, Ireland.

de Campo, A., Wallisch, A., Holdrich, R., and Eckel, G. (2007). New

Sonification Tools for EEG Data Screening and Monitoring. In Proc.

Int Conf. on Auditory Display (ICAD), Montreal, Canada.

de Rujula, A., Georgi, H., and Glashow, S. L. (1975). Hadron masses in

a gauge theory. Phys. Rev., D12(147).

Dix, A. (1996). Closing the loop: Modelling action, perception and in-

formation. In Catarci, T., Costabile, M. F., Levialdi, S., and Santucci,


G., editors, AVI’96 - Advanced Visual Interfaces, pages 20–28. ACM

Press.

Dix, A., Finlay, J., Abowd, G., and Beale, R. (2004). Human-Computer

Interaction. Prentice Hall, Harlow, 3rd edition.

Dombois, F. (2001). Using Audification in Planetary Seismology. In

Proc. Int Conf. on Auditory Display (ICAD), Espoo, Finland.

Drake, S. (1980). Galileo. Oxford University Press, New York.

Drori, G. S., Meyer, J. W., Ramirez, F. O., and Schofer, E. (2003).

Science in the Modern World Polity: Institutionalization and Global-

ization. Stanford University Press, Stanford.

Ebe, M. and Homma, I. (2002). Leitfaden fur die EEG-Praxis. Urban

und Fischer bei Elsevier, 3rd edition.

Eidelman, S. et al. (2004). Review of Particle Physics. Phys. Lett.,

B592(1).

Fickert, L., Eckel, G., Nagler, W., de Campo, A., and Schmautzer, E.

(2006). New developments of teaching concepts in multimedia learning

for electrical power systems introducing sonification. In Proceedings

of the 29th ICT International Convention MIPRO, Opatija, Croatia.

Fitch, T. and Kramer, G. (1994). Sonifying the Body Electric: Superi-

ority of an Auditory over a Visual Display in a Complex Multivariate

System. In Kramer, G., editor, Auditory Display. Addison-Wesley.

Frauenberger, C., de Campo, A., and Eckel, G. (2007). Analysing time

series data. In Proc. Int Conf. on Auditory Display (ICAD), Montreal,

Canada.

Gardner, B. and Martin, K. (1994). HRTF measurements of a KEMAR

dummy-head microphone. Online.

Gaver, W. W., Smith, R. B., and O’Shea., T. (1991). Effective Sounds

in Complex Systems: The ARKola Simulation. In Proceedings of CHI

’91, New Orleans, USA.

Gell-Mann, M. (1964). A Schematic Model of Baryons and Mesons.

Phys. Lett., 8:214.

Gerzon, M. (1977a). Multi-System Ambisonic Decoder, Part 1: Basic

Design Philosophy. Wireless World, 83(1499):43–47.

Gerzon, M. (1977b). Multi-System Ambisonic Decoder, Part 2: Main

Decoder Circuits. Wireless World, 83(1500):69–73.

Ghazala, R. (2005). Circuit-Bending: Build Your Own Alien Instruments.

Wiley, Hoboken, NJ.

Giddens, A. (1990). The Consequences of Modernity. Stanford University

Press.

Giddens, A. (1999). Runaway World. A series of lectures on globalisa-

tion for the BBC. http://news.bbc.co.uk/hi/english/static/

events/reith_99/.

Glantschnig, K., Kainhofer, R., Plessas, W., Sengl, B., and Wagenbrunn,

R. F. (2005). Extended Goldstone-boson-exchange Constituent Quark

Model. Eur. Phys. J. A.

Glaser, B. and Strauss, A. (1967). The Discovery of Grounded Theory.

Aldine.

Glozman, L., Papp, Z., Plessas, W., Varga, K., and Wagenbrunn, R. F.

(1998). Unified Description of Light- and Strange-Baryon Spectra.

Phys. Rev., D58(094030).

Goodrick, M. (1987). The Advancing Guitarist. Hal Leonard.

GSL Team (2007). GNU Scientific Library. http://www.gnu.org/software/gsl/manual/gsl-ref.html.

Harrar, L. and Stockman, T. (2007). Designing Auditory Graph

Overviews. In Proceedings of ICAD 2007, pages 306–311. McGill

University.

Hayward, C. (1994). Listening to the Earth Sing. In Kramer, G., edi-

tor, Auditory Display, pages 369–404. Addison-Wesley, Reading, MA,

USA.

Hermann, T. (2002). Sonification for Exploratory Data Analysis. PhD

thesis, Bielefeld University, Bielefeld, Germany.

Hermann, T., Baier, G., Stephani, U., and Ritter, H. (2006). Vocal

Sonification of Pathologic EEG Features. In Proceedings of ICAD

2006, London.

Hermann, T. and Hunt, A. (2005). Introduction to Interactive Sonifica-

tion. IEEE Multimedia, Special Issue on Sonification, 12(2):20–24.

Hermann, T., Nolker, C., and Ritter, H. (2002). Hand postures for

sonification control. In Wachsmuth, I. and Sowa, T., editors, Gesture

and Sign Language in Human-Computer Interaction, Proc. Int. Gesture

Workshop GW2001, pages 307–316. Springer.

Hermann, T. and Ritter, H. (1999). Listen to your Data: Model-Based

Sonification for Data Analysis. In Advances in intelligent computing

and multimedia systems, pages 189–194, Baden-Baden, Germany. Int.

Inst. for Advanced Studies in System research and cybernetics.

Hinterberger, T. and Baier, G. (2005). POSER: Parametric Orchestral

Sonification of EEG in Real-Time for the Self-Regulation of Brain

States. IEEE Multimedia, Special Issue on Sonification, 12(2):70–79.

Hollander, A. (1994). An Exploration of Virtual Auditory Shape Percep-

tion. Master’s thesis, Univ. of Washington.

Hunt, A. and Pauletto, S. (2006). The Sonification of EMG data. In Pro-

ceedings of the International Conference on Auditory Display (ICAD),

London, UK.

Hunt, A. D., Paradis, M., and Wanderley, M. (2003). The importance

of parameter mapping in electronic instrument design. Journal of New

Music Research, 32(4):429–440.

Igoe, T. (2007). Making Things Talk. Practical Methods for Connecting

Physical Objects. O’Reilly.

Jorda Puig, S. (2005). Digital Lutherie. Crafting musical computers for

new musics’ performance and improvisation. PhD thesis, Departament

de Tecnologia, Universitat Pompeu Fabra.

Joseph, A. J. and Lodha, S. K. (2002). MUSART: Musical Audio Trans-

fer Function Real-time Toolkit. In Proc. Int. Conf. on Auditory Display

(ICAD), Kyoto, Japan.

Kramer, G. (1994a). An Introduction to Auditory Display. In Kramer,

G., editor, Auditory Display: Sonification, Audification, and Auditory

Interfaces, chapter Introduction. Addison-Wesley.

Kramer, G., editor (1994b). Auditory Display: Sonification, Audification,

and Auditory Interfaces. Addison-Wesley, Reading, Menlo Park.

Krassnigg, A., Papp, Z., and Plessas, W. (2000). Faddeev Approach to

Confined Three-Quark Problems. Phys. Rev., C(62):044004.

Latour, B. and Woolgar, S. (1986). Laboratory Life: The Construction of

Scientific Facts. Princeton University Press, Princeton, NJ, (Revised

edition with an introduction by Jonas Salk and a new postscript by

the authors.) edition.

Leman, M. (2006). The State of Music Perception Research. Talk at

’Connecting Media’ conference, Hamburg.

Leman, M. and Camurri, A. (2006). Understanding musical expressive-

ness using interactive multimedia platforms. Musicae Scientiae, special

issue.

Lodha, S. K., Beahan, J., Heppe, T., Joseph, A., and Zane-Ulman, B.

(1997). MUSE: A Musical Data Sonification Toolkit. In Proc. Int

Conf. on Auditory Display (ICAD), Palo Alto, CA, USA.

Loering, U., Metsch, B. C., and Petry, H. R. (2001). The light baryon

spectrum in a relativistic quark model with instanton-induced quark

forces: The non-strange baryon spectrum and ground-states. Eur.

Phys. J., A10:395.

Madhyastha, T. (1992). Porsonify: A Portable System for Data Sonifi-

cation. Master’s thesis, University of Illinois at Urbana-Champaign.

Malham, D. G. (1999). Higher Order Ambisonic Systems for the Spa-

tialisation of Sound. In Proceedings of the ICMC, Beijing, China.

Marsaglia, G. (2003). DIEHARD: A Battery of Tests for Random Number

Generators. http://www.csis.hku.hk/~diehard/.

Mathews, M. and Miller, J. (1963). Music IV programmer’s manual. Bell

Telephone Laboratories, Murray Hill, NJ, USA.

Mayer-Kress, G. (1994). Sonification of Multiple Electrode Human Scalp

Electroencephalogram. Poster presentation demo at ICAD ’94, http://www.ccsr.uiuc.edu/People/gmk/Projects/EEGSound/.

McCartney, J. (2003-2007). SuperCollider3. http://supercollider.sourceforge.net.

McKusick, V. A., Sharpe, W. D., and Warner, A. O. (1957). Harvey

Tercentenary: An Exhibition on the History of Cardiovascular Sound

Including the Evolution of the Stethoscope. Bulletin of the History of

Medicine, 31:463–487.

Meinicke, P., Hermann, T., Bekel, H., Muller, H. M., Weiss, S., and

Ritter, H. (2002). Identification of Discriminative Features in EEG.

Journal for Intelligent Data Analysis.

Milczynski, M., Hermann, T., Bovermann, T., and Ritter, H. (2006).

A malleable device with applications to sonification-based data explo-

ration. In Proc. of the International Conference on Auditory Display,

London, UK.

Moore, B. C. (2004). An Introduction to the Psychology of Hearing.

Elsevier, fifth edition.

Musil, T., Noisternig, M., and Holdrich, R. (2005). A Library for Realtime

3D Binaural Sound Reproduction in Pure Data (PD). In Proc. Int.

Conf. on Digital Audio Effects (DAFX-05), Madrid, Spain.

Neuhoff, J. (2004). Ecological Psychoacoustics. Springer.

Noisternig, M., Musil, T., Sontacchi, A., and Holdrich, R. (2003).

A 3D Ambisonic based Binaural Sound Reproduction System. In Proc.

Int. Conf. Audio Eng. Soc., Banff, Canada.

Fronczak, P., Fronczak, A., and Holyst, J. A. (2006). Ferromagnetic fluid as a

model of social impact. International Journal of Modern Physics,

17(8):1227–1235.

Panek, P., Daye, C., Edelmayer, G., et al. (2005). Real Life Test with

a Friendly Rest Room (FRR) Toilet Prototype in a Day Care Center

in Vienna – An Interim Report. In Proc. 8th European Conference for

the Advancement of Assistive Technologies in Europe, Lille.

Pauletto, S. (2007). Interactive non-speech auditory display of multivari-

ate data. PhD thesis, University of York.

Pauletto, S. and Hunt, A. (2004). A Toolkit for Interactive Sonification.

In Proceedings of ICAD 2004, Sydney.

Pelling, A. E., Sehati, S., Gralla, E. B., Valentine, J. S., and Gimzewski,

J. K. (2004). Local Nanomechanical Motion of the Cell Wall of Sac-

charomyces cerevisiae. Science, 305(5687):1147–1150.

Pereverzev, S. V., Loshak, A., Backhaus, S., Davies, J., and Packard,

R. E. (1997). Quantum Oscillations between two weakly coupled reser-

voirs of superfluid 3He. Nature, 388:449–451.

Piche, J. and Burton, A. (1998). Cecilia: A Production Interface to

Csound. Computer Music Journal, 22(2):52–55.

Pigafetta, A. (1530). Primo Viaggio Intorno al Globo Terracqueo (First

Voyage Around the Terraqueous World). Giuseppe Galeazzi, Milano.

Pigafetta, A. (2001). Mit Magellan um die Erde. (Magellan’s Voyage: A

Narrative Account of the First Circumnavigation). Edition Erdmann,

Lenningen, Germany. (First edition Paris 1525.).

Potard, G. (2006). Guernica 2006: Sonification of 2006 Years of War

and World Population Data. In Proc. Int Conf. on Auditory Display

(ICAD), London, UK.

Pulkki, V. (2001). Spatial Sound Generation and Perception by Ampli-

tude Panning. PhD thesis, Helsinki University of Technology, Espoo.

Raskin, J. (2000). The Humane Interface. Addison-Wesley.

Rheinberger, H.-J. (2006). Experimentalsysteme und Epistemische Dinge

(Experimental Systems and Epistemic Things). Suhrkamp, Germany.

Riess, F., Heering, P., and Nawrath, D. (2005). Reconstructing Galileo’s

Inclined Plane Experiments for Teaching Purposes. In Proc. of the In-

ternational History, Philosophy, Sociology and Science Teaching Con-

ference, Leeds, UK.

Roads, C. (2002). Microsound. MIT Press.

Rohrhuber, J. (2006). Terra Nullius. In Proc. Int Conf. on Auditory

Display (ICAD), London, UK.

Rohrhuber, J., de Campo, A., and Wieser, R. (2005). Algorithms To-

day - Notes on Language Design for Just In Time Programming. In

Proceedings of the ICMC 2005, Barcelona.

Ryan, J. (1991). Some Remarks on Musical Instrument Design at

STEIM. Contemporary Music Review, 6(1):3–17. Also available online:

http://www.steim.org/steim/texts.phtml?id=3.

Saraiya, P., North, C., and Duca, K. (2005). An insight-based method-

ology for evaluating bioinformatics visualizations. Transactions on Vi-

sualization and Computer Graphics, 11(4):443– 456.

Scaletti, C. (1994). Sound Synthesis Algorithms for Auditory Data Rep-

resentations. In Kramer, G., editor, Auditory Display: Sonification,

Audification, and Auditory Interfaces. Addison-Wesley.

Schaeffer, P. (1997). Traite des objets musicaux. Le Seuil, Paris.

Snyder, B. (2000). Music and Memory. MIT Press.

Speeth, S. D. (1961). Seismometer sounds. J. Acoust. Soc. Am., 33:909–

916.

Stockman, T., Nickerson, L. V., and Hind, G. (2005). Auditory graphs:

A summary of current experience and towards a research agenda. In

Proc. ICAD 2005, Limerick.

Suzuki, Y. and Varga, K. (1998). Stochastic variational approach to

quantum-mechanical few-body problems. Lecture Notes in Physics,

m54.

TAP, ACM (2004). ACM Transactions on Applied Perception. New York,

NY, USA.

Theussl, L., Wagenbrunn, R. F., Desplanques, B., and Plessas, W.

(2001). Hadronic Decays of N and Delta Resonances in a Chiral Quark

Model. Eur. Phys. J., A12:91.

UN Statistics Division (1975). Towards A System of Social Demographic

Statistics. United Nations. Available online at UN Statistics Division

(2006).

UN Statistics Division (1989). Handbook of Social Indicators. UN Statis-

tics website.

UN Statistics Division (2006). Social Indicators.

http://unstats.un.org/unsd/demographic/products/socind/default.htm.

Urick, R. J. (1967). Principles of Underwater Sound. McGraw-Hill, New

York, NY, USA.

U.S. Census Bureau (2006). World POPClock Projection. http://www.census.gov/ipc/www/popclockworld.html.

Vercoe, B. (1986). CSOUND: A Manual for the Audio Processing System

and Supporting Programs. M.I.T. Media Laboratory, Cambridge, MA,

USA.

Vogt, K., de Campo, A., Frauenberger, C., Plessas, W., and Eckel, G.

(2007). Sonification of Spin Models. Listening to Phase Transitions

in the Ising and Potts Model. In Proc. Int Conf. on Auditory Display

(ICAD), Montreal, Canada.

Voss, R. and Clarke, J. (1975). 1/f noise in speech and music. Nature,

(258):317–318.

Voss, R. and Clarke, J. (1978). 1/f Noise in Music: Music from 1/f

Noise. J. Acoust. Soc. Am., 63:258–263.

Walker, B. (2000). Magnitude Estimation of Conceptual Data Dimen-

sions for Use in Sonification. PhD thesis, Rice University, Houston.

Walker, B. and Cothran, J. (2003). Sonification Sandbox: A Graphical

Toolkit for Auditory Graphs. In Proceedings of ICAD 2003, Boston.

Walker, B. N. and Kramer, G. (1996). Mappings and Metaphors in

Auditory Displays: An Experimental Assessment. In Frysinger, S. and

Kramer, G., editors, Proc. Int. Conf. on Auditory Display (ICAD),

pages 71–74, Palo Alto, CA.

Walker, B. N. and Kramer, G. (2005a). Mappings and Metaphors in

Auditory Displays: An Experimental Assessment. ACM Trans. Appl.

Percept., 2(4):407–412.

Walker, B. N. and Kramer, G. (2005b). Sonification Design and

Metaphors: Comments on Walker and Kramer, ICAD 1996. ACM

Trans. Appl. Percept., 2(4):413–417.

Walker, B. N. and Kramer, G. (2006). International Encyclopedia of

Ergonomics and Human Factors (2nd ed.), chapter Auditory Displays,

Alarms, and Auditory Interfaces, pages 1021–1025. CRC Press, New

York.

Wallisch, A. (2007). EEG plus Sonifikation. Sonifikation von EEG-Daten

zur Epilepsiediagnostik im Rahmen des Projekts ’SonEnvir’. PhD the-

sis, Medical University Graz, Graz, Austria.

Warusfel, O. (2002-2003). LISTEN HRTF database.

http://recherche.ircam.fr/equipes/salles/listen/.

Wedensky, N. (1883). Die telefonische Wirkungen des erregten Ner-

ven - The Telephonic Effects of the Excited Nerve. Centralblatt fur

medizinische Wissenschaften, (26).

Wessel, D. (2006). An Enactive Approach to Computer Music Perfor-

mance. In GRAME, editor, Proc. of ’Rencontres Musicales Pluridisci-

plinaires’, Lyon, France.

Wikipedia (2006a). Gini Coefficient.

http://en.wikipedia.org/wiki/Gini_coefficient.

Wikipedia (2006b). Magellan. http://en.wikipedia.org/wiki/Magellan.

Wikipedia (2007). Levy skew alpha-stable distribution.

http://en.wikipedia.org/wiki/Levy_skew_alpha-stable_distribution.

Williams, S. (1994). Perceptual Principles in Sound Grouping. In Kramer,

G., editor, Auditory Display. Addison-Wesley.

Wilson, C. M. and Lodha, S. K. (1996). Listen: A Data Sonification

Toolkit. In Proc. Int Conf. on Auditory Display (ICAD), Santa Cruz,

CA, USA.

Wilson, K. (1974). Renormalization group theory. Physics Reports,

75(12).

Worrall, D., Bylstra, M., Barrass, S., and Dean, R. (2007). SoniPy: The

Design of an Extendable Software Framework for Sonification Research

and Auditory Display. In Proc. Int Conf. on Auditory Display (ICAD),

Montreal, Canada.

Yeo, W. S., Berger, J., and Wilson, R. S. (2004). A Flexible Framework

for Real-time Sonification with SonArt. In Proc. Int Conf. on Auditory

Display (ICAD), Sydney, Australia.

Yeomans, J. M. (1992). Statistical Mechanics of Phase Transitions.

Oxford University Press.

Zouhar, V., Lorenz, R., Musil, T., Zmolnig, J. M., and Holdrich, R.

(2005). Hearing Varese’s Poeme Electronique inside a Virtual Philips

Pavilion. In Proc. Int. Conf. on Auditory Display (ICAD), Limerick,

Ireland.

Zweig, G. (1964). An SU(3) Model for Strong Interaction Symmetry and

its Breaking. CERN Reports 8182/TH.401 and 8419/TH.412.

Zweig, S. (1983). Magellan - Der Mann und seine Tat. (Magellan - The

Man and his Achievement). Fischer, Frankfurt am Main. (First ed.

Vienna 1938).

Zwicker, E. and Fastl, H. (1999). Psychoacoustics: Facts and Models,

2nd ed. Springer, Berlin.