Institute for Human-Machine Communication Activity Report 1997–2000


Page 1: Institute for Human-Machine Communication · Published by Lehrstuhl für Mensch-Maschine-Kommunikation (Institute for Human-Machine Communication) Prof. Dr. rer. nat. Manfred K. Lang


Published by

Lehrstuhl für Mensch-Maschine-Kommunikation
(Institute for Human-Machine Communication)
Prof. Dr. rer. nat. Manfred K. Lang
Technische Universität München

D - 80290 München, Germany

Visitor address

Arcisstr. 16, 80333 München, Building S6, 2nd floor

Phone ++49 / 89 / 289-28541

Fax ++49 / 89 / 289-28535

E-mail [email protected]

Web URL http://www.mmk.ei.tum.de

The Institute for Human-Machine Communication belongs to the Department of Electrical Engineering and Information Technology of the Technische Universität München. In the course of reorganizing the structure of the department in 1998 it became one of its five Institutes for Information and Communication Technology.

The Institute is situated in the southeastern part of the main campus of the Technische Universität München, close to the center of Munich. The institute can easily be reached by public transport (U-Bahn U2, stop Königsplatz). Please use the map on the back side of the cover to find your way.


Contents

Preface

Staff

Facilities
    Usability Lab
    Navigation Lab for Usability Studies in Cars
    Computer Vision Lab
    Anechoic Chamber and Reverberation Chamber

Lectures and Seminars

Research Topics
    Usability Engineering and Multimodal Human-Machine Communication
    Visual Human-Machine Interaction Utilizing Image Interpretation
    Adaptation and Learning Strategies
    Emotional Aspects in Human-Machine Communication
    Automatic Speech Recognition and Speech Understanding
    Technical Acoustics and Noise Evaluation
    Psychoacoustics and Audiological Acoustics
    Acoustical Communication

Cooperative Research Projects
    Industry Partners
    Collaborative Research Projects

Student Research Work
    Diploma Theses
    Student Research Projects

Miscellaneous Events
    Conferences and Symposia
    Retirement of Professor Ernst Terhardt
    Honors and Awards

Publications
    Doctoral Dissertations
    Scientific Publications


Preface

The integration of computer and communication technologies has offered a rapidly growing number of information services since the invention of Morse telegraphy. Networked information and communication systems for data, text, voice and image make high-quality multimedia services and multimodal dialog possible across geographical and political borders. Owing to technological and system-oriented developments, these systems are becoming more efficient and more cost-effective, but also more complex. In practice, computer-operated systems now concern not only engineers and computer experts but also a growing number of users with widely differing professional backgrounds and manifold tasks. New ways of obtaining information are becoming available to everyone.

Nevertheless, the users of modern information technologies must not be overstrained by the methods of handling the systems or by the flood of available information. For that reason, system operability that is adequate to the user is not only a high-priority research and development aim but also a decisive quality criterion in the competition for market success. Progress in developing user-friendly, cooperative interfaces depends essentially on an optimal adaptation of the human-machine interaction to the user's sensory, motor, and cognitive capabilities and limits. The crucial importance of investing in adequate human-machine interaction has also been emphasized by recent government-supported research and development projects.

The fields of application for user-friendly interfaces are varied: network systems in the office; expert workstations for teaching, training, production, quality control and maintenance; mobile applications; medical services; and, last but not least, the home and entertainment sector.

The Institute for Human-Machine Communication is concerned with research and teaching in the above-mentioned field. The present activity report gives a brief overview of our recent investigations and contributions to user-adequate human-machine communication for different areas of application. It follows our preceding activity report 1990-1996, and it supplements the current internet pages of our institute.

Consistently continuing and including usability engineering methods in our research topics, and reinforcing multimodal human-machine interaction that combines tactile modes with natural language and speech, handwriting, vision, and gesture, have been considerable challenges during the reporting period. We also continued our activities in technical acoustics, noise evaluation, and audiological acoustics. We adapted our usability laboratory, our anechoic chamber, and our reverberation chamber to the settings of new tasks. In order to guarantee reproducible investigations and interpretations of visual information in the context of multimodal human-machine dialogs, we established a new computer vision laboratory. As future automotive systems are a most important area of application where very high requirements must be met, we also installed a new navigation laboratory for usability studies in cars.

We are very grateful for, and proud of, the remarkable response and approval our research results have received at national and international conferences as well as from our cooperation partners in application-oriented research and development laboratories. Details of our investigations are documented in a series of doctoral dissertations, in theses submitted for a diploma degree, and in a number of student research projects. Persons, parties, and friends interested in more details of our teaching and research work are kindly invited to get in touch with us.

Manfred K. Lang

Technische Universität München Institute for Human-Machine Communication Activity Report 1997-2000 p. 1


1 Staff

Faculty Members and Lecturers

Manfred K. Lang, Univ.-Prof. Dr. rer. nat., head of the chair
Ernst Terhardt, Univ.-Prof. Dr.-Ing., adjunct professor, retired 3/99
Hugo Fastl, Prof. Dr.-Ing., academic director
Günther Ruske, Prof. Dr.-Ing., academic director
Mathias Schneider-Hufschmidt, Dr., lecturer (Siemens AG)

Administrative and Technical Staff

Peter Brand, Dipl.-Inf. (FH), system administrator
Ernst Ertl, mechanic
Gertrud Günther, technical assistant
Heinrich Hundhammer, electronics technician
Claus von Rücker, Dr.-Ing., software coordinator
Melitta Schubert, secretary

Scientific Staff

Frank Althoff, Dipl.-Inf.
Josef Chalupper, Dipl.-Ing.
Stephan Demmerer, Dipl.-Phys.
Robert Faltlhauser, Dipl.-Ing.
Michael Geiger, Dipl.-Ing.
Karla Geiss, Dipl.-Inf.
Marc Hofmann, Dipl.-Ing.
Jörg Hunsinger, Dipl.-Phys.
Gregor McGlaun, Dipl.-Math.
Fred Nentwich, Dipl.-Ing.
Ralf Nieschulz, Dipl.-Ing.
Christine Patsouras (formerly Huth), Dipl.-Ing.
Björn Schuller, Dipl.-Ing.
Bernhard Seeber, Dipl.-Ing.
Martin Zobl, Dipl.-Ing.


External Postgraduates

Sergey Astrov, Dipl.-Ing. (Siemens AG)
Josef Bauer, Dipl.-Ing. (Siemens AG)
Frank Forster, Dipl.-Inf. (Siemens AG)
Hans-Peter Grabsch, Dipl.-Ing. (Bosch)
Bernhard Niedermaier, Dipl.-Ing. (BMW AG)
Ronald Römer, Dipl.-Ing. (Infineon Technologies AG)
Wolfgang Schmid, Dipl.-Ing. (Gasteig GmbH)

Former Staff

Thomas Filippou, Dipl.-Ing. (until 12/99)
Gerd Gottschling, Dipl.-Ing. (until 6/97)
Frank Haferkorn, Dipl.-Phys. (until 3/98)
Dietmar Mass, Dipl.-Phys. (until 6/99)
Peter Morguet, Dr.-Ing. (until 1/00)
Johannes Müller, Dr.-Ing. (until 5/99)
Robert Neuss (formerly Grau), Dipl.-Ing. (until 9/00)
Thilo Pfau, Dr.-Ing. (until 10/00)
Christine Reischer, secretary (until 4/00)
Wolfgang Schmid, Dipl.-Ing. (until 4/98)
Holger Stahl, Dr.-Ing. (until 4/97)
Ingeborg Stemplinger, Dr.-Ing. (until 12/97)
Miriam Valenzuela, Dr.-Ing. (until 12/97)
Hans-Jürgen Winkler, Dr.-Ing. (until 1/97)
Lars Witta, Dipl.-Ing. (until 8/99)
Robert Zwickenpflug, Dr.-Ing. (until 7/97)

Former External Postgraduates

Udo Bub, Dr.-Ing. (Siemens AG)
Angela Engels (formerly Schreyer), Dr.-Ing. (Siemens AG)
Jochen Junkawitsch, Dr.-Ing. (Siemens AG)
Joachim Köhler, Dr.-Ing. (Siemens AG)
Christian Krapichler, Dr.-Ing. (GSF medis)
Henning Lenz, Dr.-Ing. (Siemens AG)
Helmut Spannheimer, Dr.-Ing. (BMW AG)
Christoph Wagner, Dr.-Ing. (Siemens AG)


2 Facilities

2.1 Usability Lab

The main goal of the research work at the institute is to improve human-machine interaction by developing new pattern-recognition-based input modalities and new intelligent dialog strategies entailing advanced interface concepts. As there is no experience yet of how people use and accept these new technologies, usability studies have to be carried out from an early stage of system development. Therefore the institute has recently installed a special laboratory for studying and evaluating new interaction methods and techniques.

To carry out such investigations, it is usually necessary to observe and record the behavior of experimental subjects. Of interest are both goal-oriented actions for solving a problem and spontaneous actions and reactions to external stimuli that occur while handling technical systems. For later analysis, the actions of the experimental subject can be recorded with video and audio equipment and processed with computers.

The usability laboratory of our institute serves a number of different purposes in our research:

• Studies on usability, user friendliness, and acceptance of user interfaces

• Gathering of practically relevant training data for new interface concepts, pattern-recognition-based input modalities and dialog strategies

• Gathering of empirical data for developing and optimizing interface concepts and dialog strategies, also considering ergonomic viewpoints

The usability lab consists of three separate rooms (see fig. 1, left):

Observation room: It is equipped with several remote-controlled video cameras and microphones for observation and recording, and with loudspeakers to provide any acoustical environment. In this room the subject works with the experimental setup.

Control room: It is provided with video and audio mixers, computers and various studio equipment (see fig. 1, right). The subject is additionally monitored through a one-way mirror. From this room a scientist controls the whole experiment.

Recording room: A sound-proofed booth is used for special examinations which require audio signals with a high signal-to-noise ratio.

Marc Hofmann, Ralf Nieschulz

Fig. 1: Outline (left) and control room (right) of the usability lab. Labels in the outline: observation room, control room, sound-proofed booth, one-way mirror.


2.2 Navigation Lab for Usability Studies in Cars

This usability lab is used to test and optimize new approaches for multimodal man-machine interfaces (MMI) in the environment of a car. It is called the “navigation lab” because its primary use was to investigate the MMI of automotive navigation systems.

Because the environment strongly influences test persons, MMIs have to be tested in a realistic environment. It obviously makes little sense to evaluate the usability of a car mobile phone, which actually has to be operated while driving, in a desktop scenario. Unfortunately, it is difficult to fit the required technology into a car boot, and some test scenarios might even lead to dangerous situations in real traffic. The navigation lab was built to replicate real conditions. However, real conditions affect not only the test person but also the technology used.

Image processing suffers from changing illumination and background setups, and conditions in the car environment are especially rough in this respect. Speech recognition is affected by driving noise or additional passengers in the car. Drivers may change, but the system still has to function correctly. These conditions call for robust algorithms and statistical methods for user and situational modelling. Naturally, operating the MMI is not the primary task of the driver. Thus a number of precautions (e.g., adaptation to user and situation, structure of dialog, guidance to the user) have to be taken to support or even assist the driver.

This can all be simulated and tested in the navigation lab.

Of course, not all the technologies have to be implemented before testing the usability of an MMI. We want to know exactly how the whole system has to function before wasting time inventing technologies that do not satisfy the user’s needs.

The real car environment is simulated in a car with force-feedback steering, automatic transmission, hand brake, pedals, knobs, car audio etc. (see fig. 2, right).

An LCD monitor with a haptic input device displays the user interface. Several cameras and microphones are installed inside the car in order to observe the test person’s activities throughout a usability test. The driving simulation is projected (5 m projection diagonal) by an LCD projector onto the wall in front of the car (see fig. 3, right). Four speakers ensure that acoustical events can be positioned in every direction.

The simulator is surrounded by molton walls to separate the test person from the lab environment.

A control room (see fig. 3, left) is also located in the navigation lab; it can be fully separated (acoustically and visually) from the test scenario. It is equipped with several monitors, video recorders, a video mixer, a scan converter, audio compressors, an audio mixer, audio amplifiers for car audio, monitoring and driving simulation, an MMI computer (high-speed PC with two monitors) and the driving-simulation computer (high-speed PC with a fast graphics engine). This equipment allows a maximum of flexibility for all conceivable setups and scenarios.

More computers are planned, e.g. for gesture recognition, adaptive components, emotion estimation, multimodal dialog concepts or speech recognition (cf. sections 4.2-4.5.3). They can easily be integrated into the setup. The navigation lab is connected to the data processing equipment of the institute through a 100 Mb/s LAN, which allows the use of practically any desired computing performance.

Fig. 2: Left: Outline of the navigation lab. Right: Car equipment. Labels: speech mic, gesture cam, wizard cam, face cam, wheelhold indicator, display 1, display 2, force feedback wheel, speaker, transmission, handbrake.

Michael Geiger, Martin Zobl

Fig. 3: Navigation lab. Left: Control room. Right: Car and projection from behind.


2.3 Computer Vision Lab

The computer vision lab was established at the Institute for Human-Machine Communication at the beginning of the year 2000. In this lab, analytical investigations concerning both digital image processing and image synthesis are undertaken.

Currently, the technical equipment of the lab consists of two Silicon Graphics workstations (Indigo2 Impact and Indy with special grabbing and video compression devices), three high-performance PCs (with standard frame grabber cards) and several high-quality video cameras. Moreover, the laboratory is equipped with a video cassette recorder, a studio TV monitor and an audio mixing device for basic usability studies. To avoid interference effects on picture sampling, flicker-free high-frequency lighting was installed.

On the image processing side, the following steps are usually carried out to digitize images. The scene is recorded by one or more cameras. The analog signal of the camera(s) is sent to the A/D converter of the computer, which is called a frame grabber card. The sequence of digitized pictures, i.e. the video frames, can be viewed on the monitor, and optionally single frames or a sequence of frames can be saved in a computer-compatible file format (e.g. common image formats like JPEG or PNG, or video formats like AVI or MPEG). For further evaluation the frames can be improved by image preprocessing (e.g. segmentation or texture analysis). By overlaying a rectangular pattern on the image, it is subdivided into single sections. This process is called screening. From these frames, some special (densitometric, geometrical etc.) features can be extracted. By applying special determination algorithms the single features can be classified. Motion can be detected by comparing the frames of a sequence. Thus a set of translation vectors can be defined and rated.
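Two of the steps described above, subdividing a frame with a rectangular grid and detecting motion by comparing successive frames, can be illustrated with a minimal Python sketch. It is not the lab's software: the function names, the use of the mean gray value as the only feature, and the threshold are assumptions made for illustration, and it only flags which sections changed rather than estimating translation vectors. Frames are plain lists of gray-value rows, and the grid must not have more rows or columns than the frame has pixels.

```python
def grid_means(frame, rows, cols):
    """Subdivide a grayscale frame (a list of equal-length pixel rows) into
    rows x cols rectangular sections and return each section's mean gray value."""
    height, width = len(frame), len(frame[0])
    means = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        row_means = []
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            block = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row_means.append(sum(block) / len(block))
        means.append(row_means)
    return means


def changed_sections(prev_frame, next_frame, rows, cols, threshold=10.0):
    """Return (row, col) indices of the sections whose mean gray value differs
    by more than `threshold` (an assumed value) between the two frames."""
    before = grid_means(prev_frame, rows, cols)
    after = grid_means(next_frame, rows, cols)
    return [(r, c) for r in range(rows) for c in range(cols)
            if abs(after[r][c] - before[r][c]) > threshold]


if __name__ == "__main__":
    # Two hypothetical 4x4 frames: a bright object appears in the lower right.
    dark = [[0] * 4 for _ in range(4)]
    moved = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 200, 200], [0, 0, 200, 200]]
    print(changed_sections(dark, moved, rows=2, cols=2))
```

A full motion estimate would go one step further and match sections between frames to obtain the translation vectors mentioned in the text; the sketch stops at detecting where something changed.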

On the image synthesis side, research is done on virtual 3D scenarios that are modelled using the Virtual Reality Modeling Language (VRML). One of the current endeavors in this field is to build a prototypical virtual model of several facilities of the institute. In this context, a system is being developed and tested which enables the user to navigate through the virtual room with the help of natural speech utterances. Moreover, with regard to multimodal handling, both the usual haptic interfaces (keyboard and mouse) and a dynamic gesture-recognition module have been integrated.

Gregor McGlaun, Frank Althoff

Fig. 4: Left: Experimental setup for gesture recognition techniques. Right: VRML model of the lab.


2.4 Anechoic Chamber and Reverberation Chamber

Many physical-technical, electroacoustical, and psychoacoustical measurements rely on environments with defined room-acoustical parameters. Thus an anechoic chamber and a reverberation chamber, both available at the Institute for Human-Machine Communication, form the standard equipment for acoustical measurements.

Although the two rooms define opposite acoustic environments, both need to fulfill the following two requirements: noise, even infrasonic noise, must have a very low level in the chambers, and noise transmitted from the chambers to the control room must not annoy or even endanger the operators. A combination of room-acoustics and structural-acoustics measures, such as the costly room-in-room construction, ensures that these requirements are fulfilled.

The anechoic chamber should support the generation of a free, undisturbed sound field. The sound intensity reflected from the walls should therefore be minimal. For this reason all walls, the ceiling, and the floor are covered with an absorbing material. Nested wedges of mineral fibre arranged side by side achieve a high sound absorption coefficient over a wide frequency range through a continuous transition of the sound wave from the air to the absorber. The lower cutoff frequency of the anechoic chamber is defined as the lowest frequency at which the absorption coefficient at normal incidence is at least 0.99. For the arrangement of absorption wedges used in this anechoic chamber, the lower cutoff frequency of the room is about 125 Hz, and its wavelength corresponds to four times the length of the absorption wedges. The usable volume of the room is L × W × H = 7.5 m × 4.2 m × 2.8 m = 88.2 m³. Extensive measurements were carried out to optimize loudspeaker and microphone positions with regard to standing waves at frequencies below the cutoff frequency of the anechoic chamber.
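The relation between the cutoff frequency and the wedge length can be checked with a short calculation. The sketch below assumes a speed of sound of 343 m/s (air at about 20 °C), which the report does not state; under that assumption the wavelength at 125 Hz is about 2.7 m, which implies wedges roughly 0.7 m long. The function names are illustrative only.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at about 20 °C; not stated in the report


def wavelength_m(frequency_hz, speed_of_sound=SPEED_OF_SOUND_M_S):
    """Wavelength of a sound wave in air, in metres."""
    return speed_of_sound / frequency_hz


def wedge_length_m(cutoff_hz, speed_of_sound=SPEED_OF_SOUND_M_S):
    """Wedge length implied by the quarter-wavelength relation at the cutoff."""
    return wavelength_m(cutoff_hz, speed_of_sound) / 4.0


if __name__ == "__main__":
    cutoff_hz = 125.0  # lower cutoff frequency given in the text
    print(f"wavelength at cutoff: {wavelength_m(cutoff_hz):.2f} m")
    print(f"implied wedge length: {wedge_length_m(cutoff_hz):.2f} m")
    print(f"usable volume: {7.5 * 4.2 * 2.8:.1f} m^3")
```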

Sound fields in the reverberation chamber should be statistical, i.e. the temporal mean of the sound intensity should be equal for all directions at all places in the room and for all measurement frequencies. Therefore the walls are built non-parallel and all surfaces are covered with sound-reflective material. Furthermore, perspex reflectors are installed as sound diffusors. The reverberation time of the room at low frequencies is longer than 10 s and can be reduced for experiments by mountable plate absorbers. Mounting bars allow the fast attachment of different materials whose acoustical properties are to be investigated. The size of the reverberation chamber is L × W × H = 5.5 m × 4.9 m × 3.9 m ≈ 106 m³.
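To give a feeling for what a reverberation time above 10 s means, the figures can be put into Sabine's formula, T = 0.161 · V / A, where V is the room volume and A the equivalent absorption area. The report does not mention this formula; it is used here as a common textbook approximation for diffuse fields, and the 10 m² of added absorber in the example is a hypothetical value. With V = 106 m³ and T = 10 s, the whole room may absorb no more than about 1.7 m² of equivalent area.

```python
SABINE_CONSTANT_S_PER_M = 0.161  # common textbook value for air at room temperature


def reverberation_time_s(volume_m3, absorption_area_m2):
    """Sabine estimate of the reverberation time for a diffuse sound field."""
    return SABINE_CONSTANT_S_PER_M * volume_m3 / absorption_area_m2


def absorption_area_m2(volume_m3, reverberation_time):
    """Equivalent absorption area that yields the given reverberation time."""
    return SABINE_CONSTANT_S_PER_M * volume_m3 / reverberation_time


if __name__ == "__main__":
    volume = 106.0  # m^3, chamber volume as stated in the text
    empty_room_area = absorption_area_m2(volume, 10.0)
    print(f"absorption area for T = 10 s: {empty_room_area:.2f} m^2")

    added_absorber = 10.0  # m^2, hypothetical plate absorber or test sample
    shortened = reverberation_time_s(volume, empty_room_area + added_absorber)
    print(f"reverberation time with absorber: {shortened:.2f} s")
```

The second figure shows why the chamber is suited to material tests: even a modest amount of added absorption shortens the reverberation time drastically, so the change is easy to measure.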

In a control room adjacent to both chambers, the experimenter finds systems to generate, analyze, control and document measurements and experiments. Besides different electroacoustical transducers, systems for generating and analyzing measurement signals and procedures are available.

Bernhard Seeber

Fig. 5: Acoustical measurements are carried out in the anechoic chamber.


3 Lectures and Seminars

Lectures | Lecturer | Credit hrs | Subject (in English)
Signaldarstellung (undergraduate) | Lang | 4 | Signal Representations
Mensch-Maschine-Kommunikation 1 | Lang | 3 | Human-Machine Communication 1
Mensch-Maschine-Kommunikation 2 | Lang | 3 | Human-Machine Communication 2
Audiokommunikation | Fastl | 3 | Audio Communication
Technische Akustik und Lärmbekämpfung | Fastl | 2 | Technical Acoustics and Noise Abatement
Musikalische Akustik | Fastl | 2 | Musical Acoustics
Automatische Mustererkennung in der Sprachverarbeitung | Ruske | 2 | Automatic Pattern Recognition in Speech Processing
Datenanalyse und Informationsreduktion | Ruske | 2 | Data Analysis and Information Reduction
Digitale Verarbeitung von Sprachsignalen | Ruske | 2 | Digital Processing of Speech Signals
Gestaltung ergonomischer Benutzungsoberflächen | Schneider-Hufschmidt | 2 | Design of Ergonomic User Interfaces

Laboratories | Lecturer | Credit hrs | Subject (in English)
Praktikum Mensch-Maschine-Kommunikation | Lang | 4 | Laboratory “Human-Machine Communication”
Praktikum “Praxis der Mensch-Maschine-Kommunikation” | Lang, Fastl, Ruske | 4 | Laboratory “Practical Aspects of Human-Machine Communication”
Praktikum System- und Schaltungstechnik 1, 2 | Lang, Ruske, and others | 2 | Laboratory “Systems and Circuitry”


Teaching Scripts

[U-F1] Fastl, H.: Umdrucke zur Vorlesung Technische Akustik und Lärmbekämpfung (1999).
[U-F2] Fastl, H.: Umdrucke zur Vorlesung Musikalische Akustik (2000).
[U-F4] Fastl, H.: Umdrucke zur Vorlesung Audiokommunikation (2000).
[U-L1] Lang, M.: Skript zur Vorlesung Signaldarstellung (1994).
[U-L2] Lang, M.; Stahl, H.; Winkler, H.-J.: Übungskatalog zur Signaldarstellung (1994).
[U-L3] Lang, M.: Skript zur Vorlesung Mensch-Maschine-Kommunikation 1 (1999).
[U-L4] Lang, M.; Müller, J.; Winkler, H.-J.: Übungs- und Prüfungsaufgaben Mensch-Maschine-Kommunikation 1 (1994).
[U-L5] Lang, M.; Morguet, P.: Versuchsmanuskript zum Praktikum Mensch-Maschine-Kommunikation (1995).
[U-L6] Lang, M.: Kurzmanuskript zur Vorlesung Mensch-Maschine-Kommunikation 2 (1996).
[U-L7] Lang, M.; Althoff, F.; Geiger, M.: Ergänzungen und Übungen zur Vorlesung Mensch-Maschine-Kommunikation (1999).
[U-L8] Lang, M.: Ringpraktikum Elektrotechnik und Informationstechnik (1996).
[U-R1] Ruske, G.: Skriptum zur Vorlesung Datenanalyse und Informationsreduktion (1999).
[U-R2] Ruske, G.: Skriptum zur Vorlesung Automatische Mustererkennung in der Sprachverarbeitung (2000).
[U-R3] Ruske, G.: Skriptum zur Vorlesung Digitale Verarbeitung von Sprachsignalen (2000).

Regular Seminars | Lecturer | Subject (in English)
Hauptseminar “Aktuelle Fragen der Mensch-Maschine-Kommunikation” | Lang, Fastl, Ruske | Advanced Seminar “Current Topics in Human-Machine Communication”
Oberseminar Mensch-Maschine-Kommunikation | Lang, Fastl, Ruske | Seminar Human-Machine Communication
Oberseminar “Laufende Arbeiten zur Mensch-Maschine-Kommunikation” | Lang, Fastl, Ruske | Seminar “Ongoing Work in Human-Machine Communication”
Oberseminar “Interdisziplinäre Grundlagen der Mensch-Maschine-Kommunikation” | Lang, Fastl, Ruske, and others | Seminar “Interdisciplinary Foundations of Human-Machine Communication”
Kolloquium Informationstechnik | Lang and others | Seminar Information Technology
Seminar “Umweltaspekte der Verkehrstechnik” | Fastl | Seminar “Environmental Aspects of Transportation Technology”


4 Research Topics

Modern systems for communication and information processing are prerequisites for high-quality interpersonal communication and information exchange. Moreover, those systems enable us to interact with all kinds of computers and computer-controlled machines, e.g. to operate entertainment electronics, to access the internet, to use information services, or even to navigate a car. With ongoing technological progress, these systems become not only more capable and efficient, but also more complex. Nowadays these systems have become a part of everyday life, in contrast to former times, when only engineers and experts had to operate them. For this reason, an adequate and efficient user interface is a major goal of research and development, so that everyone can participate in modern communication infrastructure and technology.

The research topics at the Institute for Human-Machine Communication deal with the fundamentals of a largely intuitive, natural, and therefore multimodal interaction between humans and complex information processing systems. All forms of interaction, i.e. modalities, that are available to humans are to be investigated for this purpose. Both the machine’s representation of information and the interaction technique are to be considered in this context, for example

• text and speech,
• sound and music,
• haptics,
• graphics and vision,
• gesture and facial expression,
• and emotions.

The improvement of a single method of interaction is important, but it is not our main research goal. The interplay of different modalities is most promising for enhancing the efficiency of human-machine communication. Even a combination of only two modalities, e.g. speech and haptics, can be more efficient and less error-prone than a single way of interaction.

To ensure the end user’s acceptance of these new ways of human-machine communication, it is of prime interest to investigate the usability of new interfaces at an early stage of development. This usability engineering process yields numerous hints that enable developers to produce a more efficient and less cryptic dialog between humans and machines.

Since the user’s knowledge about a system and its properties changes over time and differs for each individual, it is advisable to create adaptive user interfaces. Therefore, our research also investigates the foundations of adaptivity and learning systems to enable us to develop man-machine interfaces that really take the user’s variable skills into account.

Human-machine communication is an interdisciplinary field of research. Therefore many different subjects are involved in reaching the long-term research objective of a natural, intuitive way of interacting with “machines”. This chapter gives an overview of the current research topics at the Institute for Human-Machine Communication, but cannot be complete. Please see the list of scientific publications in section 8.2 for an exhaustive view of our research work in the four-year period of this report.

Claus von Rücker


4.1 Usability Engineering and Multimodal Human-Machine Communication

4.1.1 System Architecture for Multimodal User Interfaces

Human beings process several interfering perceptions at a high level of abstraction so that they can meet the demands of the prevailing situation. Most of today’s technical systems are not yet capable of emulating this ability. Another problem of current systems is that, owing to their growing functionality, their interfaces are often complex to handle and require a high degree of adaptation by the user.

However, interfaces are particularly desirable whose handling can be learned in a short time and that can be worked with quickly, easily and, above all, intuitively. Therefore we advocate the design of multimodal system interfaces, as they provide the user with greater naturalness, expressive power and flexibility. Moreover, multimodally operated systems probably function more robustly than their unimodal counterparts because they integrate redundant information shared between the individual input modalities.

We are especially interested in the design of a generic multimodal system architecture which can easily be adapted to various conditions and applications and thus serve as a basis for multimodal interaction systems. In contrast to most existing architectures, our design philosophy is to merge several competing modalities (i.e. two or more speech modules, dynamic and static gesture modules, etc.) instead of using a specially designed set of combined modalities.

To integrate the various information contents of the individual input modalities we are following a new approach. By employing biologically motivated evolutionary strategies, the core algorithm comprises a population of individual solutions to the problem at hand and a set of operators defined over the population itself. According to evolutionary theories, only the best-suited elements in a population are likely to survive and generate offspring, transmitting their biological heredity to new generations and thus leading to stable and robust solutions.
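As an illustration only, and not the institute's implementation, the evolutionary idea described above can be sketched as follows. The recognizer scores, the three command names, the agreement bonus in `fitness` and all parameter values are assumptions made for this example; an individual is simply one candidate command per modality.

```python
import random

# Hypothetical recognizer scores: one score per candidate command and modality.
MODALITIES = ["speech", "gesture", "gui"]
COMMANDS = ["forward", "backward", "left"]
SCORES = {
    "speech": {"forward": 0.6, "backward": 0.3, "left": 0.1},
    "gesture": {"forward": 0.5, "backward": 0.1, "left": 0.4},
    "gui": {"forward": 0.4, "backward": 0.4, "left": 0.2},
}


def fitness(individual):
    """Sum of recognizer scores plus an (assumed) bonus for agreeing modalities."""
    score = sum(SCORES[m][c] for m, c in zip(MODALITIES, individual))
    agreement = len(individual) - len(set(individual))
    return score + 0.5 * agreement


def evolve(pop_size=20, generations=40, mutation_rate=0.1, seed=0):
    """Evolve a population of joint interpretations and return the fittest one."""
    rng = random.Random(seed)
    n = len(MODALITIES)
    population = [[rng.choice(COMMANDS) for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: only the fitter half of the population survives.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        offspring = []
        while len(survivors) + len(offspring) < pop_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: occasionally replace a gene by a random command.
            child = [rng.choice(COMMANDS) if rng.random() < mutation_rate else g
                     for g in child]
            offspring.append(child)
        population = survivors + offspring
    return max(population, key=fitness)


best = evolve()
print(best, fitness(best))
```

Because the survivors are carried over unchanged, the best fitness in the population never decreases from one generation to the next, which is one simple way to obtain the stable solutions mentioned above.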

The concepts of the developed multimodal system architecture are validated in various scenarios, in both research and industry-sponsored projects. One application, for example, enables the user to navigate in arbitrary virtual worlds by freely combining natural and command speech, dynamic hand gestures and conventional graphical interfaces.

Frank Althoff, Gregor McGlaun


4.1.2 Usability Engineering for Adaptive Dialog Procedures in the Automobile (ADVIA Project)

Due to the multiplicity of coexisting electronic devices in modern luxury and upper-class automobiles, the limit of usability has been reached for the standard user. Examples of these devices are navigation and telematics systems, audio and video components such as CD changer, radio and television, mobile phone, car computer, air conditioning, and any additional conceivable unit such as internet applications. As the devices of different manufacturers not only look different but also have company-specific user interfaces and control elements, they can hardly be operated by a standard user, particularly while driving.

For some years now, car manufacturers have been trying to solve this problem by developing multi-functional integrated man-machine interfaces (MMI), which employ a common graphical display and a reduced number of haptic control elements.

Reducing the number of control elements, however, increases the complexity of the menu structures. The user still has to read lengthy manuals to find his way through the menus of the resulting user interface. In order to make operating the MMI more intuitive, alternative basic approaches and studies are essential.

We consider the introduction of familiar human communication modalities like natural speech and gestures a precondition for an intuitive dialog with machines, which also applies to car MMIs. Additionally, the user should get further support from an adaptive help system (cf. section 4.3.2) and an assistance system (cf. section 4.3.1). However, further problems appear.

How does such an integrated user interface have to be structured? Which modalities can or have to be used for which function? How and when should acoustic or visual feedback be given? How and when can the system sensibly adapt to the user? How can the user interface be designed to be user-friendly, intuitively usable and assisting (but not dominating) the user, all without drawing the driver’s attention away from traffic?

Our research in this domain mainly takes place in our navigation lab (see 2.2), a usability lab (see 2.1) specially fitted for automotive use, which permits us to carry out usability tests in an automotive environment. The MMI is simulated by a computer. With the “Wizard-of-Oz methodology” we test new concepts in a kind of rapid prototyping, with a “wizard” observing the test person and controlling the MMI. The test person gets the impression of controlling the MMI with gestures, speech or haptics. Promising concepts are then implemented in real systems.

In this way we have developed, tested and implemented haptic, gesture, speech and multimodally controlled MMIs with adaptive, incremental help systems and adaptive assistance systems.

Michael Geiger, Martin Zobl


4.1.3 A Multimodal Mathematical Formula Editor for Audio-Visual Natural Interaction

Introduction

Electronic acquisition of mathematical formulas via conventional tools is a time-consuming and complicated task. Therefore, a soft-decision solution for online handwritten formula recognition was successfully demonstrated in a former project [97win2]. Recently we developed a novel approach towards a multimodal analysis of natural interaction, especially speech and handwriting, these being the fastest and most intuitive channels for entering mathematical expressions into a computer [00hun1]. We utilize an integrated, multilevel probabilistic architecture with a joint semantic model and two distinct syntactic models describing speech and script properties, respectively. Basic arithmetic operations, roots, indexed sums, integrals, trigonometric functions, logarithms, convolutions, Fourier transforms, exponentiations, and indexing (among others) are supported. Compared to classical multistage solutions, our single-stage strategy benefits from an implicit transfer of higher-level contextual information into the lower-level segmentation and pattern recognition processes involved. For visualization and postprocessing purposes, a transformation into Adobe™ FrameMaker™ documents is performed.

Methods

The syntactic-semantic attributes of spoken and handwritten mathematical formulas are represented by the parameters of a so-called Multimodal Probabilistic Grammar. It combines properties of context-free phrase structure grammars with those of graph grammars by allowing for word-type, symbol-type, and position-type terminals.

The grammar is implemented in a single-stage semantic decoder by means of a compact semantic representation called Semantic Structure S [99mue1]. It is given by a hierarchically structured combination from a predefined inventory of Semuns s (semantic units) with corresponding types, values, and successor attributes, every unit referring to a certain mathematical operator or operand. On the syntactic level every Semun of a given semantic hypothesis is assigned to a so-called Syntactic Module (SM). It consists of an advanced transition network which enables two distinct stochastic processes: 1) transitions from one node to another, 2) emissions of spoken words, handwritten symbols, or local offsets between associated symbols or symbol groups [00hun2]. Transitions are responsible for modelling speaking and writing order, whereas emissions account for varying word, symbol, or position choice, respectively. All the necessary transition and emission probabilities as well as semantic type, value, and successor probabilities were estimated from training corpora obtained from separate speech and handwriting usability tests.

An extended Earley-type top-down chart parser performs a MAP (maximum a-posteriori) classification across all abstraction levels. On the signal-near levels, the preprocessed speech or handwriting input sequences are rated using 30-dimensional phoneme-based semi-continuous HMMs (Hidden Markov Models) or 7-dimensional DTW (Dynamic Time Warping) matching, respectively. In a one-pass search algorithm all possible semantic hypotheses are successively tested until the best overall semantic representation of a given input is found. Due to a breadth-first search strategy, inline first-last processing is enabled.
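The DTW matching mentioned above can be illustrated with a minimal textbook sketch. It is not the decoder used in the project: the two-dimensional feature vectors, the template names and the nearest-template rule in `classify` are assumptions for the example (the report states 7-dimensional features for handwriting).

```python
import math


def dtw_distance(seq_a, seq_b):
    """Dynamic Time Warping cost between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j]: best accumulated cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])  # Euclidean distance
            cost[i][j] = local + min(cost[i - 1][j],      # stretch seq_b
                                     cost[i][j - 1],      # stretch seq_a
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def classify(sample, templates):
    """Return the label of the template with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(sample, templates[label]))


# Hypothetical 2-dimensional pen trajectories used as symbol templates.
TEMPLATES = {
    "stroke_up": [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)],
    "stroke_right": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
}
slow_up = [(0.0, 0.0), (0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (0.0, 2.0)]
print(classify(slow_up, TEMPLATES))
```

The warping path absorbs differences in writing speed, so a slowly written stroke with repeated samples still matches its template at zero cost.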

The most significant advantage of a single-stage semantic decoding architecture as used in this work results from the simultaneous evaluation of knowledge belonging to all the abstraction levels involved: apart from self-focussing effects achieved by restricting the search process to locally consistent sub-hypotheses only, semantically corrupt recognition results are prohibited by the integrated expectation-driven classification scheme.

The overall system architecture is sketched in fig. 6.

Results and Conclusions

For evaluation purposes we performed independent test classifications in either modality. Fully spoken or handwritten realistic formulas were examined, yielding a structural recognition accuracy of 61.1 % for speech and 83.3 % for handwriting (note: these numbers refer to complete formula correctness). For the future we wish to support freely interfering speech and handwriting interactions including mutual coreferencing through deictic wording and pen gesturing. To this end, the use of speech will presumably be focussed on subterm input and error corrections, so that we anticipate robust and approximately real-time performance of the forthcoming system. Via a back and forth transformation to FrameMaker™’s formula editor, natural speech and handwriting interaction may also be complemented by conventional input modes in the future.

Jörg Hunsinger

Fig. 6: System overview of a multimodal mathematical formula editor. [Block diagram: speech and handwriting inputs (example utterance: „Three times x square equals square root t“) pass from the signal level through preprocessing and segmentation, using acoustic-phonetic and optic-graphemic representation and knowledge, to the syntactic level (transition, emission and offset probabilities obtained by iterative training) and are fused on the semantic level by multimodal semantic decoding based on the Multimodal Probabilistic Grammar G = < Σ, P, V, T >.]


4.1.4 FERMUS (Fehlerrobuste Multimodale Sprachdialoge – Error Robust Multimodal Speech Dialogs)

The FERMUS project was started in March 2000 in cooperation with the industry partners BMW AG, DaimlerChrysler AG, Siemens AG and Mannesmann VDO AG, with the primary intention of localizing and evaluating various strategies to analyze errors in information systems that use various, mainly recognition-based, modalities.

In this regard, an effective error-management module is considered to play a very important role. Ad hoc, three potential approaches can be identified:

• the extensive use of multimodal information sources

• generation of new pieces of information by using adaptive dialog structures

• context-sensitive interpretation of the available information.

A special goal is to investigate how the robustness of technical systems can be enhanced by using multimodal information already on the recognition level. By developing specific dialogue techniques and special strategies of system adaptation to the current situation and to the intention of the user, we expect an additional increase of system functionality because of a dynamic specification. Moreover, the influence of emotions, especially with regard to stress situations, is intensively researched to enable a reliable separation and, in some cases, transformation of unusable into usable information.

The primary test domain is the handling of diverse communication facilities in an upper-class automobile (such as radio/CD, telephone, internet etc.) in connection with the variety of potential error sources caused by internal and external disturbances. In this context we are especially interested in the modality-specific impacts on the performance of the overall system. For the examinations, a simplified man-machine interface which facilitates the operation of basic information and communication devices is used.

The project builds on results of already completed projects with various industry partners (mainly BMW AG and Siemens AG); this holds particularly for the examination results concerning adaptation mechanisms as well as usability tests with regard to multimodal communication.

Frank Althoff, Gregor McGlaun


4.2 Visual Human-Machine Interaction Utilizing Image Interpretation

Introduction

Gestures and facial expressions are important components of interpersonal communication. Using these visual modalities, human-machine dialog too can be given a more natural and intuitive form. Here we exclusively consider vision-based methods in order to achieve non-intrusive gesture and facial expression recognition. The major topics in this field of research are specification, implementation and evaluation of the human-machine interface and development of image-based methods for automatic gesture recognition. It has turned out that the user’s gestures can be massively influenced by the graphical design of the visual interface; it is therefore very important to coordinate the menu-driven handling and the visual feedback in order to create a gesture-optimized application. During the research and development process, frequent usability tests have been carried out, for which we principally use the “Wizard-of-Oz methodology” (cf. section 4.1.2). Most human gestures are movements; therefore, a lot of information is transferred by motion.

System Overview

The process of automatic gesture recognition is described in the following; fig. 7 shows the system overview. The whole system does not need any model of the target object. Thus the system can easily be transferred to other objects or domains. The object in question (e.g. hand or face) is separated from the background by spatial segmentation. This is done by color segmentation in an uncluttered environment with defined lighting, and by a combination of low-level image-processing algorithms with object tracking in cluttered scenes. The movement of the segmented object is then classified with stochastic models. Hidden Markov Models (HMMs), which can reproduce non-stationary temporal processes, are used for this. The central problem with this model is to find suitable features for transforming the spatio-temporal image sequence (fig. 8, left) into a time sequence of feature vectors (fig. 8, right). Different features, extracted from object area or object contour, have been implemented and tested. A further problem is to separate the continuous video stream of the observation camera into meaningful sections and non-meaningful sections, like coincidental movements or pauses. Temporal segmentation is done either with a fast but less robust two-level approach or with a robust single-level approach with large computational cost (HMM-based spotting). The recognition methods developed in this project allow, depending on the features used, segmentation and classification for each type of movement. Feature extraction methods have thereby been found which are suitable for recognizing dynamic human gestures and facial expressions. A real-time demonstration system has been implemented, the complex functionality of which can be controlled exclusively with dynamic gestures [98mor1, 98mor2, 98mor3, 99mor1, P-L13].

Fig. 7: Block diagram of the system for automatic gesture recognition. [Visual cycle: USER → visual input (camera) → image processing and gesture recognition (spatial segmentation + further preprocessing → pixel-based feature extraction → HMM-based, continuous recognition using Hidden Markov models) → gesture indexes and gesture end times → application (dialog + internal state control → functionality + graphics generation) → visual output (display) → USER.]

Applications

Gesture recognition has been extended to other domains like controlling electronic car devices (see ADVIA project, section 4.1.2) and navigation in VRML worlds (see section 4.1.1) with specifically adapted, usability-tested visual interfaces and gesture vocabulary. In order to permit more intuitive handling of an application that is better fitted to humans, the gesture recognition system has to be modified and improved. Currently, a gesture is defined as a hand movement with a defined start and end position. The response of an application takes place after the recognition of such a completed sequence. Analyzing everyday situations shows that, in most cases, this form of indirect manipulation is used with gestures. But in some cases it is reasonable to analyze and process the visual input directly while detecting a directional hand movement, for example when setting analogue quantities or moving objects to a precise point on the screen. The main problem here is to make the recognition system distinguish automatically between direct and indirect control modes. Further information sources in gestures are speed, amplitude and repetition frequency, which have not been specifically analyzed by the system so far. In addition to carrying important information about the amount by which the user wants to change a parameter, these features carry information about the user’s habits and emotional state.

Martin Zobl, Michael Geiger

Fig. 8: Temporal image sequence of grabbed gesture (left) and corresponding set of features (right). [Plots: image sequence over time (left); features over time (right).]


4.3 Adaptation and Learning Strategies

4.3.1 Probabilistic Modelling of the User Intention for Adaptive Human-Machine Interaction

The information sources which need to be exploited for user adaptation strongly depend on the goal of adaptation and the application itself. Tutoring systems, for example, provide some kind of guidance or training on a special topic; the user’s goal is obvious as he uses the tutoring system. Other applications allow a variety of different intentions, and their complexity doesn’t allow direct inference of the user intention. However, knowing the user intention is the key to appropriate adaptive dialog and system modelling. To calculate an estimate of the user’s goal, we developed two different approaches, both based on probabilistic networks, which provide methods of dealing with uncertain and incomplete information.

A plan recognizer is used to infer the user intention by considering recent user actions. Depending on the application, the user’s plans often differ very much from optimal plans, i.e. users behave suboptimally. Our plan-based approach has been optimized for such applications, considering that even suboptimal behavior may support plan/intention hypotheses to a certain extent. Trying to determine the user intention merely from a small number of actions obviously entails the risk of failing, i.e. of not estimating the real user intention. To reduce the risk that the system does not offer help or assistance for the real user intention, a help system was developed that is capable of creating help texts covering a number of nearly equally likely plans without overstraining the user’s cognitive capabilities.

The second approach has been developed for scenarios/applications with varying situations entailing changes of the user’s preferences and intentions. In a car, the driver’s intentions and preferences strongly depend on location, weather, speed, traffic and passengers, for example. Therefore a number of different probabilistic expert systems based on probabilistic networks have been created which are capable of learning typical user behavior and intentions according to the current situation. Real-time online training of the influence of each situation parameter on the user behavior allows the expert system to infer the user intention in similar situations. As a result, reasoning about a certain situation doesn’t require the expert system to be trained on that exact situation.
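How online training per situation parameter allows inference in situations never seen exactly can be illustrated with a naive Bayes model, one of the simplest probabilistic networks. This is a stand-in chosen for brevity, not the networks built in the project; the class name `IntentionEstimator`, the situation parameters and the intention labels are hypothetical.

```python
from collections import defaultdict


class IntentionEstimator:
    """Naive Bayes estimate of the user intention from situation parameters."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing                # Laplace smoothing constant
        self.intention_counts = defaultdict(float)
        self.value_counts = defaultdict(float)    # (intention, parameter, value)
        self.values_seen = defaultdict(set)       # parameter -> observed values

    def observe(self, situation, intention):
        """Online training step: count one observed (situation, intention) pair."""
        self.intention_counts[intention] += 1
        for parameter, value in situation.items():
            self.value_counts[(intention, parameter, value)] += 1
            self.values_seen[parameter].add(value)

    def posterior(self, situation):
        """Return P(intention | situation) under the naive Bayes assumption."""
        if not self.intention_counts:
            return {}
        total = sum(self.intention_counts.values())
        scores = {}
        for intention, count in self.intention_counts.items():
            score = count / total  # prior from observed frequencies
            for parameter, value in situation.items():
                n_values = len(self.values_seen[parameter] | {value})
                seen = self.value_counts.get((intention, parameter, value), 0.0)
                score *= (seen + self.smoothing) / (count + self.smoothing * n_values)
            scores[intention] = score
        norm = sum(scores.values())
        return {intention: s / norm for intention, s in scores.items()}


estimator = IntentionEstimator()
estimator.observe({"traffic": "jam", "weather": "rain"}, "reroute")
estimator.observe({"traffic": "jam", "weather": "sun"}, "reroute")
estimator.observe({"traffic": "free", "weather": "sun"}, "music")
estimator.observe({"traffic": "free", "weather": "rain"}, "music")

# "fog" was never observed, yet the traffic parameter still supports "reroute".
estimate = estimator.posterior({"traffic": "jam", "weather": "fog"})
print(estimate)
```

Because each parameter contributes its own factor, a situation with an unseen value for one parameter is still scored using what was learned about the others.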

Having an estimate of the user intention offers a lot of potential for enhancing the human-machine dialog. For example, knowing the user intention helps to reduce system requests and to formulate system requests with regard to the situation and the user’s goal. Additionally, we use the estimate of the user intention not only for user adaptation, but also for a user- and situation-specific evaluation of pattern-recognition-based inputs to improve recognition rates.

Marc Hofmann


4.3.2 Neuro-Fuzzy based Architecture for Help and Tutoring Systems

In order to facilitate the interaction with modern software systems, an adaptation to the user is to be achieved. Therefore the system must be capable of gaining and evaluating information about the user. Fundamental methods and approaches to the adaptation of a software system to a user can be developed and demonstrated best within the area of „intelligent help and tutoring systems“. The central aspect is the modelling of the knowledge status of the user. We examine both statistical and rule-based methods for learning user models, which are to enable an independent, user-adequate adaptation of the help or tutoring system. Essentially, the architecture underlying the system consists of a combination of neural networks and fuzzy logic, also referred to as „neuro-fuzzy and soft computing“.

In the first phase of system creation (training) we use the information gained from user actions, which is afflicted with a degree of uncertainty (preprocessing), in order to generate and train an easily modified neural network. Depending on the application, a generalized regression network, a probabilistic neural network, a competitive neural network or a self-organizing map is taken as a basis. In order to model different characteristics of the users optimally, a combination of several networks can also be used. In the phase of use it is then possible with this network to classify a user into different categories in order to give him appropriate assistance embedded in the context. With the user actions evaluated in this phase we re-train the network online. Thus a gradually increasing adaptation of the system to the individual user is achieved.
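Of the network types listed above, the probabilistic neural network is the easiest to sketch, because online re-training amounts to adding one pattern unit per new example. The following is a generic textbook version under stated assumptions, not the system described here: the two features, the categories "novice" and "expert", and the kernel width `sigma` are invented for illustration.

```python
import math


class ProbabilisticNeuralNetwork:
    """Parzen-window classifier with one Gaussian pattern unit per example."""

    def __init__(self, sigma=0.5):
        self.sigma = sigma   # kernel width (assumed value)
        self.patterns = {}   # category -> list of stored feature vectors

    def add_example(self, features, category):
        """Training and online re-training: store the example as a pattern unit."""
        self.patterns.setdefault(category, []).append(tuple(features))

    def activations(self, features):
        """Summation layer: mean Gaussian kernel response per category."""
        result = {}
        for category, examples in self.patterns.items():
            total = sum(
                math.exp(-sum((a - b) ** 2 for a, b in zip(features, example))
                         / (2 * self.sigma ** 2))
                for example in examples
            )
            result[category] = total / len(examples)
        return result

    def classify(self, features):
        """Decision layer: category with the largest activation."""
        acts = self.activations(features)
        return max(acts, key=acts.get)


# Hypothetical features: (recognition confidence, normalized execution duration)
network = ProbabilisticNeuralNetwork()
network.add_example((0.3, 0.9), "novice")
network.add_example((0.4, 0.8), "novice")
network.add_example((0.9, 0.2), "expert")
network.add_example((0.8, 0.3), "expert")

print(network.classify((0.85, 0.25)))
# Online re-training with an action evaluated during use:
network.add_example((0.85, 0.25), "expert")
```

Storing every example keeps re-training trivial but lets the network grow with use; a deployed system would presumably need to prune or merge pattern units.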

We develop and test the fundamental architecture in different projects. As a part of the ADVIA project (see 4.1.2) an adaptive help system is being created as a supporting introduction to the gesture-controlled operation of an MMI in the vehicle. In order to infer the need for help and the knowledge status of the driver, in particular the execution quality of the gestures (confidence measure of the gesture recognition), the execution duration of the gestures, and the number of assistance requests are analyzed here, in each case as a function of the context. Our goal is to support the user with automatic but unobtrusive help in order to avoid or minimize the need for help requests. Beyond that, an expansion of the help system to the entire multimodal operation of the MMI is conceivable. Furthermore, we are developing a tutoring system for an introduction to creating HTML pages.

Ralf Nieschulz


4.4 Emotional Aspects in Human-Machine Communication

Nowadays, it is well known that emotions play a fundamental role in perception, attention, reasoning, learning, memory, decision making and other human abilities and mechanisms we generally associate with rational and intelligent behavior. In addition to verbal communication, nonverbal communication represents an important factor in enabling a reasonable, purposeful and efficient interaction between people. Besides these communicative functions, nonverbal behavior and especially facial expression can provide information about the current affective state of a person. Correspondingly, we can no longer afford to neglect emotions if we want to develop future computer systems capable of solving complex problems or of interacting with humans in an intelligent way, that is, to create human-like behavior in machines.

Numerous questions to be dealt with are raised in the context of emotions in human-machine communication: which emotions occur when humans interact with computer systems, how can they be recognized, which stimuli cause these emotions, in which way should a computer system respond to emotional reactions of the user, etc. Once these problems are solved, a computer system capable of emotions would register the signals sent out by the user, recognize the inherent patterns and assimilate this data in a model of the user’s emotional reactions. The system would then be able to transmit useful information about the user, for example to applications that can use such data.

Probably the largest application area will be future generations of human-machine interfaces, which will be able to recognize the emotional states of the users and react to them adequately. For example, if a user becomes frustrated or annoyed while interacting with an application, the system might respond to these emotional states, preferably in such a way that the user would perceive it as intuitive. In this way, following the model of interaction between humans, it might be possible to achieve a more natural human-machine interaction.

By integrating emotions as an additional modality alongside haptics, speech, gesture, etc., it might be possible to improve the error robustness of the control of technical systems. Within the framework of the FERMUS project (as described in section 4.1.4) research on this topic will be done. The operation of communication devices in automobiles, like phone, radio/CD and internet, will serve as the application area. Emotional states of the user will be used as a supplementary control element to achieve a better adaptation of the system to the current situation of the user. Within the studies it is of particular interest to detect in which cases system errors lead to negative emotional reactions of the user and how it is possible to avoid them or at least to weaken them.

Karla Geiss


4.5 Automatic Speech Recognition and Speech Understanding

In many applications of modern human-machine interaction it seems to be necessary to allow continuous speech as input and output of the system. The typical task of continuous speech recognition consists in evaluating the phonetic information of an utterance (e.g. a sentence) and representing the result as a chain of words which are defined in a lexicon. This task is usually carried out on the basis of single procedures or “modules” which perform the preprocessing of the acoustic speech signal, the application of phonetic units (phoneme models), classification of words, utilization of syntactic constraints, recognition of the complete sentence, and analysis of the semantic meaning of the spoken utterance with respect to a given, well-defined task.

The research tasks carried out at the Institute for Human-Machine Communication are:

• Preprocessing of the speech signal

• Analysis and extraction of acoustic-phonetic features

• Defining proper acoustic-phonetic decision units

• Modeling of acoustic-phonetic units by Hidden Markov Models (HMM)

• Application of Neural Nets (NN)

• Pronunciation rules

• Stochastic language models

• Algorithms for fast search

• Training procedures

• Adaptation methods (e.g. speaker adaptation, channel normalization)

• Stochastic methods for speech understanding

• Development of complete systems

Since speech sounds are characterized especially by their spectral properties, a preprocessing step calculates short-time spectra at time intervals of about 10 ms. Beyond that, it is advantageous to take into account the properties of the auditory system. For this purpose the Bark or mel scale should be chosen instead of a linear frequency axis. A basic model of human auditory frequency analysis can provide so-called “loudness spectra” which are especially suited for speech recognition.
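The two ingredients named above, a frame raster of about 10 ms and a perceptual frequency axis, can be sketched as follows. The mel formula is the commonly used 2595·log10(1 + f/700) approximation; the 25 ms window length, the 16 kHz sampling rate and the function names are assumptions for this example, and the loudness model itself is not reproduced here.

```python
import math


def hz_to_mel(f_hz):
    """Common mel-scale approximation (assumed variant): 2595*log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def frame_starts(num_samples, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Start indices of analysis frames taken every hop_ms milliseconds."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))
    return list(range(0, num_samples - frame_len + 1, hop)), frame_len


def mel_spaced_edges(f_min, f_max, num_bands):
    """Band edges in Hz that are equally spaced on the mel axis."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / num_bands
    return [mel_to_hz(lo + i * step) for i in range(num_bands + 1)]


# One second of audio at an assumed sampling rate of 16 kHz.
starts, frame_len = frame_starts(16000, 16000)
edges = mel_spaced_edges(0.0, 8000.0, 20)
print(len(starts), frame_len, [round(e) for e in edges[:4]])
```

Equal steps on the mel axis give narrow bands at low frequencies and wide bands at high frequencies, which mirrors the frequency resolution of the auditory system more closely than a linear axis.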

A fundamental problem is posed by coarticulation effects, which cause a strong mutual influence between neighboring speech sounds. As a result, the feature vectors can be strongly dependent on the phonetic context, so that a phoneme unit cannot be described by a single pattern. In our work we tried to use syllables or parts of syllables as decision units; these are syllabic consonant clusters and syllabic nuclei (vowels and vowel clusters). By use of these clusters the main coarticulation effects are contained within the units. As an alternative, so-called triphones can be introduced, each of which represents a phoneme together with a specific left and right phoneme context.
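The triphone idea can be made concrete with a few lines of code. The `left-center+right` notation and the `sil` boundary symbol are common conventions assumed here for illustration; the phoneme symbols are likewise only an example.

```python
def to_triphones(phonemes, boundary="sil"):
    """Expand a phoneme sequence into context-dependent triphone labels."""
    padded = [boundary] + list(phonemes) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


triphones = to_triphones(["k", "a", "t"])
print(triphones)
```

Each phoneme thus gets one model per left and right neighbor, which captures coarticulation at the price of a much larger inventory of units to train.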

The classification of phonemes is usually based on stochastic modeling by means of “Hidden Markov Models” (HMM). These models consist of states within a left-to-right Markov graph, whereby the states contain the distributions (pdfs) of the feature vectors observed in these states. During the training step these distributions are determined from a large training set. Research work nowadays is concentrated on fast estimation methods and the adaptation of the HMMs to new speakers and new environments.
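How a left-to-right HMM rates an observation sequence can be shown with the standard forward algorithm. For brevity this sketch uses discrete emission tables instead of pdfs over feature vectors, and the two-state model with its probabilities is a made-up example rather than a trained phoneme model.

```python
def forward_likelihood(observations, initial, transitions, emissions):
    """P(observations | model), computed with the forward algorithm."""
    states = range(len(initial))
    # Initialization with the first observation.
    alpha = [initial[s] * emissions[s][observations[0]] for s in states]
    # Recursion over the remaining observations.
    for obs in observations[1:]:
        alpha = [
            sum(alpha[p] * transitions[p][s] for p in states) * emissions[s][obs]
            for s in states
        ]
    return sum(alpha)


# Hypothetical two-state left-to-right model: state 0 may loop or move on to
# state 1, and state 1 can only loop (no transition back to the left).
INITIAL = [1.0, 0.0]
TRANSITIONS = [[0.5, 0.5],
               [0.0, 1.0]]
EMISSIONS = [{"a": 0.9, "b": 0.1},
             {"a": 0.2, "b": 0.8}]

likelihood = forward_likelihood(["a", "b"], INITIAL, TRANSITIONS, EMISSIONS)
print(likelihood)
```

In recognition, one such likelihood is computed per competing model and combined with prior knowledge; a practical implementation would work with log probabilities to avoid numerical underflow on long sequences.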

It seems rather attractive to combine the HMM approach with Neural Nets (NN), giving hybrid models. The pdfs within the HMMs can be properly calculated by means of NNs. In this way it is possible to get a discriminant representation of the feature vector distributions. These NNs can also favorably be used for determining the syllable nuclei and for measuring the speaking rate.

It is important to allow for pronunciation variants, which may depend on the speaking style and the speaking rate. These variants can be determined by inspection of large speech corpora or on the basis of rules.

Speech understanding can be logically divided into a speech recognition phase and an interpretation phase. It is important to evaluate both tasks jointly. This means that the search for the chain of spoken words and the corresponding semantic structure is carried out simultaneously. This is facilitated if the semantic structure is described by stochastic modeling techniques, too. In our work the semantic structure is built up from semantic units which contain the single parts of the semantic meaning. The semantic understanding module describes the syntactic and semantic structure within a probabilistic framework. Thus, recognition and understanding are able to yield a common optimal result.

Günther Ruske

4.5.1 Robustness to Speaking Rate in Automatic Speech Recognition

The processing of spontaneously spoken human-to-human dialogues is a special challenge for automatic speech recognition systems. Compared to read speech, where the speaking mode is well defined, in spontaneous speech several sources of variability contribute to changes in the speech signal. Among others, the speaking rate is an important factor influencing the speech signal. Unlike human listeners, who can cope with a large range of speaking rates without any problem, state-of-the-art automatic speech recognition systems show severe degradations in recognition performance when the speaking rate is higher than normal. Therefore several approaches to improve the robustness of hidden Markov model (HMM) based automatic speech recognition systems towards speaking rate were evaluated.

The implications of the speaking rate for the speech signal can be divided into three categories: timing effects, acoustic-phonetic effects and phonological effects resulting from varying speech rates. Whereas HMM-based recognizers can handle different lengths of phonetic units without any problem, the speech rate specific changes in the spectral domain as well as the use of variants differing from the standard pronunciation generally lead to more confusions in the pattern recognition process. To capture these effects, the corresponding knowledge sources of the recognizer have to be adapted. These are the acoustic models, which represent the spectral characteristics of the phonetic units, and the pronunciation dictionary, which includes the pronunciations of the vocabulary of the recognition task. In comparison to speaker adaptation this adaptation problem is more difficult to handle, as the speaking rate can change even within one sentence, whereas the speaker’s identity remains constant.

Firstly, in an explicit adaptation strategy, a rule and feature based estimation of the speaking rate [98pfa1] was combined with switching between several speech rate specific recognition systems. The acoustic models were retrained and the pronunciation dictionary was changed to build the speech rate specific recognizers. Special emphasis was put on robust reestimation procedures for HMM parameters such as maximum a posteriori (MAP) training [98pfa2]. This is a very important issue, as the speech rate specific speech material for parameter reestimation is generally very limited.
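Why MAP training is robust with little data can be seen from the textbook MAP update of a Gaussian mean, sketched below. This is the standard formula, not necessarily the exact procedure of [98pfa2]; the function name, the prior weight `tau` and the example numbers are assumptions.

```python
def map_mean(prior_mean, frames, weights, tau=10.0):
    """MAP re-estimate of one Gaussian mean vector.

    prior_mean : mean of the rate-independent (prior) model
    frames     : adaptation feature vectors
    weights    : occupation probability of this Gaussian for each frame
    tau        : prior weight; larger values trust the prior model more
    """
    occupancy = sum(weights)
    dim = len(prior_mean)
    weighted_sum = [
        sum(w * x[d] for w, x in zip(weights, frames)) for d in range(dim)
    ]
    # Interpolation between the prior mean and the adaptation data.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


# Ten adaptation frames at (2, 4) pull a prior mean at (0, 0) halfway.
print(map_mean([0.0, 0.0], [[2.0, 4.0]] * 10, [1.0] * 10, tau=10.0))
```

With no adaptation frames the estimate falls back to the prior mean, and it moves towards the data mean only as the occupancy grows relative to `tau`.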

Secondly, different types of normalization procedures were carried out to reduce the variations which have to be captured by the parameters of the HMMs. It proved useful both to reduce speech rate specific or speaker specific variations [99pfa1] and to capture pronunciation variants [99pfa1] in order to increase robustness towards speaking rate. Finally, it was also shown that the improvements of the different normalization procedures are almost cumulative [00pfa1].

Thilo Pfau

4.5.2 Robustness to Adverse Conditions: Speaker and Speaking Mode

In the past years there has been tremendous progress in the performance of speech recognition systems. Nevertheless there is still a variety of unsolved problems, such as different speakers, speaking style (fast, slow) and noise.

It is well known that speaker-dependent systems (i.e. systems trained specifically for one speaker) outperform their speaker-independent counterparts. In recent years there has therefore been a turnaround in the design of speech recognizers: most of them now incorporate some kind of speaker dependency.

Especially for closed-speaker systems, i.e. systems with a limited number of users, a combination of speaker identification and speaker adaptation is very effective. The identification part allows the system to determine which speaker is currently using the recognizer, enabling the system to successively collect enrolment data for this particular speaker. In this way the system can be adapted step by step to the individual users, asymptotically reaching the performance of a speaker-dependent system.

The key part of an identification system is the set of speaker models. Commonly, either Hidden Markov Models or some kind of vector quantization (VQ) codebooks are used. A main advantage of the latter models is that they do not rely on any phonetic segmentation of the speech signal, which is particularly advantageous for on-line application: these models can be trained on whole utterances. Such codebooks are usually built of K centroids, which can be trained using clustering techniques or sophisticated discriminative training algorithms.
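A VQ-codebook speaker model can be sketched with plain k-means clustering. This is an illustrative sketch only: the function names, the squared Euclidean distortion and the synthetic two-dimensional "features" are assumptions, and the discriminative training mentioned above is not covered.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(frames, k, iterations=20, seed=0):
    """Plain k-means: returns k centroids for one speaker's feature frames."""
    rng = random.Random(seed)
    centroids = rng.sample(frames, k)
    for _ in range(iterations):
        cells = [[] for _ in range(k)]
        for f in frames:
            nearest = min(range(k), key=lambda c: sq_dist(f, centroids[c]))
            cells[nearest].append(f)
        for c, cell in enumerate(cells):
            if cell:  # keep the old centroid if a cell ran empty
                centroids[c] = [sum(col) / len(cell) for col in zip(*cell)]
    return centroids


def distortion(frames, codebook):
    """Average squared distance of each frame to its nearest centroid."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)


def identify(frames, codebooks):
    """Return the speaker whose codebook quantizes the utterance best."""
    return min(codebooks, key=lambda spk: distortion(frames, codebooks[spk]))


# Synthetic enrolment data for two hypothetical speakers.
rng = random.Random(1)
speaker_a = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
speaker_b = [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(50)]
books = {"A": train_codebook(speaker_a, 2), "B": train_codebook(speaker_b, 2)}
print(identify([[0.1, -0.2], [0.3, 0.0]], books))
```

No phonetic segmentation appears anywhere in the sketch: every frame of an utterance is simply matched against the centroids, which is the property the text highlights.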

With an increasing number of speakers the computational load of the search process grows, since all models have to be evaluated in parallel. In this case efficient pruning strategies for discarding improbable speakers have to be applied. Another effective strategy lies in the application of tree-based speaker clusters, which can be pre-computed on the training data.

By collecting more and more enrolment data, the actual Hidden Markov Models can be adapted towards the speakers and their mode of speaking. The adaptation itself can be performed using algorithms such as MLLR (Maximum Likelihood Linear Regression) while the enrolment data is still sparse. If more data is available, Bayesian techniques like MAP (Maximum A Posteriori) or discriminative algorithms like MCE (Minimum Classification Error) or MMI (Maximum Mutual Information) can be applied effectively.

Robert Faltlhauser

4.5.3 Understanding Natural Speech

Human-machine interfaces may become much more intuitive and efficient if we enhance them with established modalities of interpersonal communication.

Speech is considered the most common and therefore most natural communication medium between human beings. Used as an input medium, it offers advantages such as hands-free operation and the ability to work in the dark. State-of-the-art speech recognizers in general allow only single-word commands or phrases containing a certain keyword. The research work carried out at the Institute for Human-Machine Communication led to a different approach which encourages the user to speak spontaneously in a most natural manner without requiring any learning process [P-L4, P-L5, 97mue, 98mue3, 97sta]. The only restriction of our system is the limitation to a single domain and a single language at a time. An average recognition rate of 90% for the correct user intention was reached in different domains [98mue1].

The core of the system is realized in a top-down architecture consisting of a one-stage maximum a posteriori semantic decoder and a signal preprocessor. The stochastic semantic decoder utilizes pre-trained probabilistic knowledge on the semantic, syntactic, phonetic and acoustic levels. An integrated chart parser makes use of the Viterbi algorithm, calculating semantic, syntactic, and acoustic probabilities on the basis of Hidden Markov Models (HMM) and similar network structures. Its output, a semantic structure, is the input for a rule-based intention decoder which is placed “on top” of the system core. This decoder communicates bidirectionally with an external application and allows for online control, provided that the application has a well-defined command interface. One of the basic advantages of such an approach is its portability to different domains. Front-ends for several applications have already been successfully realized following this principle, e.g. for a graphics editor, a service robot, medical image visualization, scheduling dialogues and a mathematical formula editor. It will also be applied in the ongoing projects “speech in the automotive environment” (cf. 4.1.2) and “navigation in VRML worlds” (cf. 4.1.1).

Even the difficult task of automatic translation was successfully examined with this approach [99mue1]. The speech understanding system plus a language production module are capable of translating German phrases of a special domain into English, French or other languages. This module contains a word chain generator with the syntactic model and a linguistic postprocessor including grammar rules and the inflection model of the target language. Another add-on is an automatic HMM-based speech detection module. It allows the user to speak at any time without pushing a button to indicate the beginning or end of dialogue instances.

Future research will deal with unknown words, confidence measures and the idea of integrating the intention decoder into the semantic decoder. The latter idea promises higher recognition performance due to the more abstract and less ambiguous nature of the intention compared with the semantic structure. We also aim to exploit the detailed contextual knowledge obtained from the current dialogue state to integrate constraints for even more robustness.

Björn Schuller

[Figure: block diagram of the speech understanding system. Input: speech signal. Blocks: preprocessor, semantic decoder, intention decoder, external application. Knowledge sources: reference model, phonetic model and acoustic model with P(O | Ph) and P(Ph | W); semantic-syntactic model with P(W | S) and P(S); application specific rules. Legend: O observation, S semantic structure, I intention.]

Fig. 9: Overview of a speech understanding system. The frames indicate parts that have to be adjusted for new domains.
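The probabilities labelled in Fig. 9 can be read as the factors of a single decision rule. The formula below is a reconstruction from the figure labels and the phrase “one-stage maximum a posteriori semantic decoder”, not an equation taken from the report; the use of the maximum over word chains W and phoneme sequences Ph (the Viterbi approximation) is an assumption.

```latex
\hat{S} \;=\; \arg\max_{S}\; \max_{W,\,Ph}\;
  P(O \mid Ph)\; P(Ph \mid W)\; P(W \mid S)\; P(S)
```

Here O is the observation sequence delivered by the preprocessor and S the semantic structure. The intention I is then derived from the decoded structure by the application specific rules of the intention decoder.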

4.6 Technical Acoustics and Noise Evaluation

4.6.1 Advanced Numerical Techniques in the Field of Vibro-Acoustics

Noise and vibration in the passenger compartment of a vehicle contribute substantially to the overall impression of a car's quality and therefore have a great influence on the buying decision of a customer. Because the time to market is constantly decreasing while the pressure to save costs is increasing, development engineers seek to define and optimize a vehicle's vibrational and acoustic comfort characteristics as early as possible, even before the first prototype has been built. Due to the advances in the field of numerical structural acoustics and in computer technology, the finite element models used for this type of calculation have increased in size continuously, nearly overcompensating the fast-growing performance of modern computers. One major deficiency of the FE method is therefore the very long computing time, which ranges from several hours for a simple frequency response function up to several days for an optimization task, even on a state-of-the-art supercomputer.

This fact, together with the batch-oriented computing style and the unwieldy and non-intuitive user interfaces of commercial FE calculation programs, prevents a flexible and creative use of FE methods and a detailed understanding of the structural-acoustic coupling phenomena.

These problems have been successfully addressed in a cooperation with BMW during the last three years.

Using a new approach for the description of the vibro-acoustic equations of motion of model variants, the computing time for modifications and optimization of coupled fluid-structure systems could be drastically reduced. The so-called modal correction technique makes it possible to implement the entire optimization loop using generalized coordinates for the description of the dynamic state of the system, thus reducing the problem size roughly by a factor of 1000. An additional decrease of problem size and computation time was achieved by applying an adaptive mode reduction technique during the solution stage of the modal equations and by streamlining the dataflow and workflow.
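The benefit of generalized (modal) coordinates can be illustrated with ordinary modal superposition on a toy system. The sketch below is not the modal correction technique itself, whose details the report does not give. It assumes an undamped two-mass chain with unit masses and unit springs, for which the mass-normalized modes are known in closed form.

```python
def frf_direct(omega):
    """Receptance H11 of the 2-DOF chain from the full physical matrices."""
    # Dynamic stiffness Z = K - omega^2 * M with K = [[2, -1], [-1, 2]], M = I.
    z11 = z22 = 2.0 - omega ** 2
    z12 = -1.0
    det = z11 * z22 - z12 * z12
    return z22 / det  # element (1, 1) of the inverse of Z


def frf_modal(omega, modes):
    """Same receptance as a sum over modes (generalized coordinates)."""
    return sum(phi[0] * phi[0] / (w2 - omega ** 2) for w2, phi in modes)


# Mass-normalized modes of the chain: (eigenvalue omega_r^2, mode shape).
S = 2 ** -0.5
MODES = [(1.0, (S, S)), (3.0, (S, -S))]

print(frf_direct(0.5), frf_modal(0.5, MODES))
```

In modal coordinates each frequency line costs one term per retained mode instead of a matrix solve over all physical degrees of freedom. For a large FE model the number of modes in the frequency band of interest is far smaller than the number of degrees of freedom, which is where the size reduction comes from.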

The improvements make it possible to carry out vibro-acoustic calculations on a workstation with FE models which up to now could only be handled by supercomputers. As a consequence, it is now possible to run the FE calculations and optimizations in a quasi on-line, interactive way. This is still between 5 and 120 times faster than the conventional approach on a supercomputer.

To demonstrate the benefits of this new approach, a software package called VAO (Vibro-Acoustic Optimization) was developed, which offers a full palette of tools for the visualization, investigation, modification and optimization of coupled structure-fluid systems.

Lars Witta

4.6.2 A New Transfer Path Technique for the Analysis and Optimization of Vehicle Exterior Noise Characteristics

In most countries, the level of vehicle exterior noise is regulated by legislative limits. These restrictions have been tightened over the last two decades, and further reductions of the exterior noise level will be ratified. This trend requires much more acoustical optimization of cars. On the other hand, development periods in the automotive industry are decreasing, and a modification in the pre-production state is hardly possible. To meet this challenge, the department is developing a new method for analyzing the exterior noise in cooperation with the BMW Group, Munich. The method provides an easy way of calculating the exterior noise of a vehicle in the early development stage.

The ISO R 362 regulation specifies the required measuring procedure for vehicle exterior noise. The car has to accelerate on a 20 m track, where a microphone measures the noise 7.5 m beside the lane. According to this pass-by test procedure, the maximum noise level is restricted to 74 dB(A) [1]. For acoustical analysis and development, the BMW Group set up a pass-by test chamber, where the pass-by test can be simulated. In principle, it is the same situation as in ISO R 362, but with a standing car and a moving microphone [2]. All necessary measurements will be performed in this test chamber.

The exterior noise is generated by a number of noise sources such as the muffler, the orifice, the engine, etc. In order to reduce the exterior noise level effectively, the noisiest source has to be reduced. Therefore it is necessary to know the noise ranking of all components. A lot of effort has to be spent to determine this noise ranking [3]. The new method offers a quick solution for this task [4]. It is divided into 3 main steps:

Step 1: Measurement of the operational state

The noise field is recorded in the far field and near field by about 100 microphones, giving the sound pressures $p_k^{\mathrm{op}}$. For this measurement, the car is in an operational state.

Step 2: Measurement of transfer functions

The noise sources are replaced by loudspeakers (see Fig. 10). Each loudspeaker is driven with white noise, and the transfer function from each speaker to every microphone is measured. The loudspeaker parameter for the transfer functions is the membrane acceleration $a_j$:

$T_{kj} = p_k^{\mathrm{LS}} / a_j$.

Step 3: Synthesis of the operational sound field

The basic idea is to replicate the operational sound field by loudspeakers. In order to synthesize the microphone sound pressures measured in step 1, the necessary speaker adjustments have to be calculated. The sound pressure synthesis is carried out only virtually: in step 2 the transfer function of each loudspeaker is measured, and in step 3 all loudspeakers are adjusted and activated simultaneously. In this way, each speaker sounds like the component which it has replaced.

With this result, the noise ranking of all noise sources during the pass-by test can be determined. Furthermore, the pass-by noise of a car with modified sources can be calculated. The only adjustments required are the loudspeaker sounds, according to the modified components.
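The three steps can be sketched for a single frequency line. The report does not state how the loudspeaker adjustments are computed; the sketch assumes a least-squares fit of the membrane accelerations to the operational pressures via the normal equations. All function names and the small example matrix are hypothetical.

```python
def solve(A, b):
    """Solve the square complex system A x = b by Gaussian elimination."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0j] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def synthesize_sources(T, p_op):
    """Least-squares source strengths a with T a ~ p_op (one frequency line).

    T[k][j] : transfer function from loudspeaker j to microphone k (step 2)
    p_op[k] : sound pressure at microphone k in the operational state (step 1)
    """
    n_mics, n_src = len(T), len(T[0])
    # Normal equations: (T^H T) a = T^H p_op
    TH_T = [[sum(T[k][i].conjugate() * T[k][j] for k in range(n_mics))
             for j in range(n_src)] for i in range(n_src)]
    TH_p = [sum(T[k][i].conjugate() * p_op[k] for k in range(n_mics))
            for i in range(n_src)]
    return solve(TH_T, TH_p)


def rank_sources(T, a, mic):
    """Sort sources by their partial pressure magnitude at one microphone."""
    contrib = {j: abs(T[mic][j] * a[j]) for j in range(len(a))}
    return sorted(contrib, key=contrib.get, reverse=True)


# Three microphones, two sources; p_op is generated from known strengths.
T = [[1 + 0j, 0.5j], [0.2 + 0j, 1 + 0j], [0.5 + 0.5j, 0.3 + 0j]]
a_true = [2 + 0j, 0.5 + 0j]
p_op = [sum(T[k][j] * a_true[j] for j in range(2)) for k in range(3)]
a = synthesize_sources(T, p_op)
print(a, rank_sources(T, a, mic=2))
```

Once the source strengths are known, the effect of a modified component can be estimated by changing the corresponding entry of `a` and recomputing the microphone pressures, without a new vehicle measurement.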

References

[1] Kevin, J.: Measuring Vehicle Pass-by Noise. Automotive Engineering, March 1995, pp. 28-32.

[2] Eilker, R. et al.: Die neuen BMW-Versuchseinrichtungen für Akustik. Automobiltechnische Zeitschrift ATZ 88 (1986), Heft 4, pp. 219-230.

[3] Buschmann, J.: Vorbeifahrtgeräuschmessung mit simultaner Erfassung von Drehmoment und Schlupf an einem geräuschisolierten Fahrzeug. VDI-Berichte Nr. 1224 (1995).

[4] Freymann, R.; Stryczek, R.; Riess, M. and Demmerer, S.: A new CAT-Technique for the Analysis and Optimization of Vehicle Exterior Noise Characteristics. IMechE Conf. Trans. “Vehicle Noise and Vibration 2000”, London 2000.

Stephan Demmerer

Fig. 10: Setup of step 2. Far field and near field microphones are placed around the car. The noise sources are replaced by loudspeakers.

4.6.3 Noise Evaluation

The concept that physical noise evaluation has to be based on features of the human hearing system was developed further and described in detail in invited review papers [96fas2, 97fas1, 97fas3, 97fas4, 99fas1, 00cha1]. The description of noise emissions by loudness as standardized in DIN 45 631 is nowadays commonplace in most acoustic labs worldwide.

On the other hand, a firm psychoacoustic basis had to be established with respect to the physical measurement of noise immissions. Numerous studies were performed on industrial noise [97ste2] and traffic noise [98got2, 99got1, 00kuw1]. In particular, for the first time, the “railway bonus” could also be confirmed in laboratory studies [98fas1, 00kuw1]. In addition, the noise immission from leisure noise was studied using tennis noise as an example [98ste1, 99fil1].

As a global result it turned out that the percentile loudness N5, i.e. the loudness which is reached or exceeded in 5% of the measurement time, is a good indicator for the impact of noise immissions [00fas10]. Even the effects of “railway bonus” and “aircraft malus” can be predicted on the basis of N5 [00fas7].
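The definition of the percentile loudness translates directly into a few lines of code. The sketch assumes a loudness time series sampled at equal time intervals, so that a share of samples equals a share of the measurement time; the function name and the rounding convention are illustrative choices.

```python
def percentile_loudness(loudness, percent=5):
    """Loudness value reached or exceeded in `percent` % of the samples."""
    ranked = sorted(loudness, reverse=True)
    # Number of samples that make up the loudest `percent` % (rounded up).
    count = max(1, -(-len(ranked) * percent // 100))
    return ranked[count - 1]


# 100 equally spaced samples with loudness values 1..100 (arbitrary units).
print(percentile_loudness(range(1, 101)))  # 96: five samples are >= 96
```

Because N5 is taken from the loudest few percent of the time course, it is governed by the loud events of a noise immission rather than by its average level.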

Although loudness constitutes a dominant feature in the rating of sound quality, other hearing sensations like sharpness, fluctuation strength, or roughness may play an important role, in particular for sounds with similar loudness [00fas11].

For physical evaluation of sounds, new metrics were proposed which are described in an overview paper [98fas4]. In particular with respect to temporal aspects, signal processing algorithms in loudness analysis systems have to mimic features of the human hearing system in great detail [98wid1, 98wid2, 98wid3].

With respect to the loudness of stationary sounds, data of different analysis systems available on the market are in good agreement, with deviations of less than 5 % [97fas2]. In contrast, with respect to temporal processing, huge differences may occur between instruments of different manufacturers, depending on the degree of sophistication of the algorithms implemented [98fas3].

Concerning the physical measurement of noise immissions, statistical procedures were developed and implemented which predict the measurement accuracy that can be achieved in a given measurement time [97ste4].

Hugo Fastl

4.7 Psychoacoustics and Audiological Acoustics

4.7.1 Psychoacoustics

A new, updated and extended edition of the book “Psychoacoustics – Facts and Models” was published. Some older material was rearranged, text was adapted to current terminology, and new results were added, in particular in the chapters on pitch, fluctuation strength, roughness, and practical application (Zwicker and Fastl [99zwi1]).

The hearing sensation “pitch strength” was studied in great detail and the results are compiled in an overview paper (Schmid [99sch1]).

Among other things it could be shown that modulations enhance the pitch strength of low pass noise [98sch3], but reduce the pitch strength of pure tones [97sch3]. For complex tones, the interaction of “pointing tones” and pitch strength was studied in detail [97cha, 98sch1, 98sch2]. Correlations between pitch strength and frequency discrimination [98fas6] as well as effects of vibrato on the pitch strength were established [98hut1].

With respect to the Zwicker-Tone it could be demonstrated that combinations of pure tone plus low pass noise, pure tone plus band pass noise, as well as pure tone plus band stop noise can also produce a Zwicker-Tone [00fas11]. Current psychoacoustic models of the Zwicker-Tone could be confirmed. In addition, neurologically based models of the Zwicker-Tone nicely account for the psychoacoustical facts (paper in preparation).

Hugo Fastl

4.7.2 Sound Quality Design for High Speed Train Interior Noise

In cooperation with Müller BBM and Deutsche Bahn AG, investigations on the topic “Sound quality design for high speed train interior noise” were carried out.

For the specification of the sound quality inside future high speed trains, tonal components produced by the motors or corrugated rails can play an important role. Therefore, in psychoacoustic experiments, the dominance of tonal components at 630 Hz or 1250 Hz was assessed [99hut1]. For an increase of the corresponding 1/3-octave band by 20 dB, a clear tonal character is audible, which is only half as pronounced for an increase of 12.5 dB at 630 Hz or 10 dB at 1250 Hz. In line with expectation, no tonal quality is perceived if the 1/3-octave band in question is not enhanced. However, a decrease of sound energy in a 1/3-octave band by 20 dB can also produce a faint tonal sensation with a magnitude of about 1/10 of the tonal sensation produced by an increase of 20 dB. The results obtained with stimuli simulating the sound quality inside high speed trains are in good agreement with data from basic psychoacoustic experiments. Therefore, it is expected that sound quality evaluation of high speed train interior noise can profit from the multitude of psychoacoustic data available.

Another aspect investigated is the disturbance of privacy in high speed trains, which has become more evident in recent years due to successful noise reduction measures [00pat1]. In psychoacoustic investigations, the conflicting requirements of avoiding unwanted speech intelligibility, which disturbs privacy, on the one hand, and achieving the desired sound quality on the other hand, were assessed. Early results show that intensive shielding measures would be necessary in order to ensure privacy across the tiers in large cabins of high speed trains without reducing sound quality much. By means of basic psychoacoustic tools, quantitative results are obtained on which a reasonable cost-benefit calculation could be based.

Christine Patsouras

4.7.3 Audiological Acoustics

The method of “line length”, which has proven very successful for the evaluation of noise immissions, was adapted for use in audiology [98got1]. In comparison to the presently used categorical scaling [97bau], advantages for clinical applications were verified.

For the Ukrainian language, a speech test was developed, realized and recorded on CD [98cha1]. Moreover, the intelligibility of monosyllables in background noise was tested for the languages German, Hungarian, and Slovene [97ste1].

For patients with cochlear implants, the ability to understand speech was tested both in quiet and in background noise [98fas2]. While in quiet surroundings the speech perception of cochlear implant patients can be restored nearly to normal, in noisy environments they experience extreme problems and need a signal-to-noise ratio that is more than 15 dB better than that required by normal hearing persons [98fas1].

Hugo Fastl

4.7.4 Perceptual Consequences of Hearing Impairment: Modeling, Simulation and Rehabilitation

Nowadays about 12 % of the population in Germany suffer from hearing disorders. In order to develop signal processing algorithms and fitting procedures for hearing aids, detailed knowledge about the perceptual consequences of hearing impairment is very important. However, this knowledge is still far from complete.

Therefore, based on results of psychoacoustic experiments, Zwicker’s model of loudness [99zwi1] was modified such that the loudness for hearing impaired listeners can also be predicted. This is achieved by fitting the loudness function to a specific hearing loss. Since the specific loudness time pattern can be regarded as an aurally adequate representation of sound, various other psychoacoustic hearing sensations like “fluctuation strength” can be modeled on the basis of this pattern. Results from hearing experiments concerning “loudness fluctuation”, which is nearly the same as “fluctuation strength”, can be accounted for by the same model for normal and hearing impaired listeners if loudness fluctuation is calculated from the modified specific loudness time pattern [99cha1, 00cha2].

From the specific loudness functions of normal and hearing impaired listeners, input-output functions for a signal processing system which simulates hearing impairment are deduced [00cha1]. Time-frequency analysis and synthesis within this system are done by the Fourier Time Transformation and its inverse [1, 98mum1]. Simulation of hearing impairment is helpful in evaluating hearing aid algorithms and fitting procedures for certain hearing losses as well as for providing normal hearing listeners with realistic demonstrations of the auditory consequences of hearing impairment.

In cooperation with “GEERS Hörakustik” it was investigated how the models of loudness and loudness fluctuation can be applied in an interactive psychoacoustic fitting procedure (“A-Life”).

Moreover, in order to improve speech intelligibility in noise for hearing impaired listeners, the influence of so-called “psychoacoustic processors” on speech intelligibility was assessed [00cha3] and, in cooperation with “Gasteig München”, magnetic induction loops, which are commonly used for speech transmission in churches and concert halls, were optimized [00cha4].

Supplementary reference

[1] Terhardt, E.: Fourier transformation of time signals: Conceptual revision. Acustica 57, 242-256 (1985).

Sepp Chalupper

4.8 Acoustical Communication

The main subject of research in the field of Acoustical Communication is the relation between the properties of acoustical signals on the one hand, and the sensory analysis techniques of the human ear on the other hand. The book “Akustische Kommunikation”, published by Professor Ernst Terhardt in 1998, treats almost every aspect of this interdisciplinary field of research and includes an audio CD with demonstrations of various phenomena of auditory perception [98ter1].

In the four-year period of this report, research focused on the foundations of ear-adapted spectral representations, their relation to hearing perception, and their technical application. The following paragraphs give an overview of the research topics.

Linear Model of Peripheral-Ear Transduction

A system of linear filters was designed for modelling auditory preprocessing of sound [97ter]. It consists of a filter that accounts for the first two resonances of the ear canal, and a filterbank that accounts for the spectral analysis of the inner ear. The design of the cochlear filter, which forms an element of this filterbank, takes advantage of knowledge about properties of the ear, in particular the threshold of hearing and the characteristics of tuning curves. Since the signal processing of the system allows easy digital computation, it is especially suited for use as a front-end for models of more complex features of auditory perception.

Speech Coding

Speech coding algorithms were developed and investigated that are based on contours of auditory spectrograms, which are computed with an advancement of the Fourier Time Transformation (FTT) [1, 98mum1]. These contours are defined as “ridges” in the 3D representation of the magnitude spectrogram. Contours bearing relevant information correspond, among others, to part tones perceptible by the ear. Starting from a known representation, additional ridges (“time contours”) and a new signal reconstruction procedure are introduced. A classification of ridges makes it possible to separate tonal from noise-like signal components. Applying these foundations to data reduction algorithms, speech codecs with data rates down to 4 kbps were presented and evaluated.
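The notion of a ridge can be illustrated on a generic magnitude spectrogram. The sketch below only marks local maxima along the frequency axis of each frame. It does not reproduce the FTT analysis, the additional “time contours” or the ridge classification described above; the function name and the `floor` parameter are assumptions.

```python
def spectral_ridges(spectrogram, floor=0.0):
    """Mark local maxima along frequency in each frame of a magnitude spectrogram.

    spectrogram : list of frames, each a list of magnitudes per frequency bin
    returns     : list of frames, each a list of bin indices that form ridges
    """
    ridges = []
    for frame in spectrogram:
        peaks = [
            k for k in range(1, len(frame) - 1)
            if frame[k] > floor and frame[k - 1] < frame[k] >= frame[k + 1]
        ]
        ridges.append(peaks)
    return ridges


frames = [[0, 1, 3, 1, 0, 2, 0], [0, 0, 1, 4, 1, 0, 0]]
print(spectral_ridges(frames))
```

Keeping only the ridge positions and their magnitudes, instead of every spectrogram value, is the kind of data reduction that contour-based coding relies on.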

Sound Quality of Piano Tones

Models and algorithms were developed to extract those sound signal parameters that are characteristic for the specific sound of a piano tone and its quality [98val1, 98val2, 00val1]. Just like the speech coding application mentioned before, the algorithms rely on the significance of contours of FTT spectrograms. Listening experiments were carried out to investigate the perceived dissimilarity and the quality of the sound of different piano tones. The results were utilized to design a model for the sound quality of piano tones and to validate the predictions. The new methods for the measurement of the discrimination criteria can serve to enhance the quality of electronic and acoustic pianos and could be used for automatic quality control in musical instrument manufacturing.

Pitch Determination

A system for determination of the virtual and spectral pitches of non-stationary sound signals was developed [98rue1, 00rue1, 00rue2]. It takes into account those essential properties of the human ear that can be observed in psychoacoustical experiments concerning pitch perception. Besides the elementary feature of frequency selectivity, spectro-temporal contrast effects in particular should be mentioned in this regard. These effects are not addressed by previously known systems. The developed system is able to model both the existence and the prominence of perceived pitches in time-variant sound signals. This system also uses contours of an auditory FTT spectrogram as a basis.

Graphic Sound Modification

A method to resynthesize CD-quality sounds from their auditory FTT spectrogram images was found, suggesting sophisticated sound modification via image processing [98hor1]. In particular, resynthesis of specifically selected spectrogram areas has confirmed the validity of strong audio-visual gestalt analogies.

Supplementary reference

[1] see section 4.7.4.

Claus von Rücker

5 Cooperative Research Projects

5.1 Industry Partners

BMW AG, München

Several projects were carried out in co-operation with BMW AG, München. Some of them are still in progress.

Speech Dialog Design. We analyzed and developed models and procedures for speech dialogs for the BMW navigation system. The main focus was on the use of natural speech and the adaptation to specific situations [D-97.14, D-99.16].

ADVIA. The goal of the ADVIA project is to investigate the prerequisites of an intuitive, adaptive human-machine dialog in automobiles. The project is described in detail in section 4.1.2.

Active Resonator. In co-operation with BMW AG we developed and realized an active resonator in order to reduce low-frequency noise in luxury cars [94spa]. Applying the active resonator, the sound quality of the interior noise of luxury cars can be improved [00fas8]. In addition, after long hours of driving, even in high-quality cars, a feeling of “dizziness” can occur, which is substantially reduced if low frequencies are removed from the spectrum.

HPC-VAO. Within the scope of the EU project “High Performance Computational Environment for Vibro-Acoustic Optimization” a simulation tool was developed, which allows the fast optimization of the vibro-acoustic properties of cars (see section 4.6.1). For that purpose new optimization methods were developed that drastically reduce the computation time for changes to existing models [P-L14, P-L16, 98wit1, 98mas1].

Exterior Noise Optimization. A new transfer path technique for the analysis and optimization of vehicle exterior noise characteristics is being developed in co-operation with BMW AG. Please read section 4.6.2 for a detailed description.

Siemens AG

Speech Recognition. In co-operation with Siemens ZT, application-specific methods for in-service adaptation of hidden Markov models in automatic speech recognition systems were developed [P-L10, 97bub1, 98bub1]. In another co-operation, a statistically modelled multilingual phoneme inventory for speech recognition systems was created [P-L9, 98koe1, 00koe1].

Document Analysis. A technique for attention-based localization and interpretation of information in paper documents was developed [P-L12, 99sch2].

Traffic Flow Control. Models for traffic flow were investigated in co-operation with Siemens AG. Nonlinear, discrete controllers were designed to reduce inhomogeneities in traffic flow models [P-L7, P-L8, 97len1-4, 97wag1-2, 98wag1].

Consortium, consisting of BMW AG, DaimlerChrysler AG, Siemens AG and VDO Mannesmann

The above-mentioned companies co-operate with the Institute for Human-Machine Communication in the FERMUS project, which deals with the investigation of error-robust, multimodal speech dialogs. Please read section 4.1.4 for a detailed description of the goals of this joint project.

DaimlerChrysler AG, Ulm

In co-operation with FAU Erlangen, we carried out a study for DaimlerChrysler about the role of emotions in human-machine interaction. Please read section 4.4 for further information about this topic.

Audi AG, Ingolstadt

In co-operation with Audi AG we contributed to an integrated, user-adequate man-machine interface for automotive applications [D-98.14].

Cellway GmbH, Hallbergmoos

Noise suppression methods for mobile phones used in the automotive environment were investigated in co-operation with Cellway [D-97.7].

Deutsche Bahn AG

In co-operation with Deutsche Bahn AG, the effects of noise reduction for freight trains, e.g. by noise barriers, were studied. For practical purposes, two modes have to be distinguished, namely “to listen” and “to hear”. As expected, difference limens in the mode “to hear” are substantially larger than in the mode “to listen” [97jae].

GSF medis, Neuherberg

In co-operation with GSF medis, we worked on a new, user-friendly man-machine interface for a system for the visualization and analysis of medical 3D image data [D-98.12, P-L11, 97kra, 97kra2, 98kra1, 98mue2, 99kra1, 99kra2].

Institut für Rundfunktechnik (IRT), München

In co-operation with the Institute for Broadcasting Research (IRT), it was demonstrated that the conventional definition of directivity patterns of electroacoustical transducers needs further clarification. Encouraging results are obtained when the directivity pattern is divided into a vertical and a horizontal part [97wol].

Leicher GmbH, Unterföhring

In co-operation with Leicher, we worked on the analysis, implementation and evaluation of biometric identification systems for high-security areas [D-98.19].

Mannesmann VDO Car Communication GmbH

A study about the state of the art of automatic speech recognition was conducted for Mannesmann/VDO.

In the SOMMIA project, we develop and evaluate a concept for a speech-based man-machine interface (MMI) for several facilities in an automobile. Since it covers a large variety of functions, the MMI has to be both efficient and intuitive to handle.

Müller-BBM, Planegg

In co-operation with Müller-BBM, active noise control in ducts was studied both theoretically and experimentally. With a Swinbanks array, tonal components of noises propagating in ducts can be reduced substantially [97sch1].

MVP München

In co-operation with MVP München, the noise produced by the German Magnetic Levitation Train (TRANSRAPID) was studied in psychoacoustic experiments. With respect to loudness, approximately the same noise immission is obtained for a TRANSRAPID at 400 km/h as for an ICE at 250 km/h. However, if the TRANSRAPID also runs at 250 km/h, the advantage of the Magnetic Levitation System with respect to noise immission is about 15 dB(A). Therefore, as concerns noise immissions, the TRANSRAPID would be well suited to connect an airport with the center of a city [97got].
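To give a feeling for the size of a 15 dB(A) level difference, the following short calculation converts it into linear ratios using the standard decibel definitions. It is only an arithmetic illustration; the function names are hypothetical, and the ratios refer to A-weighted sound power and sound pressure, not to perceived loudness.

```python
def power_ratio_from_db(level_difference_db):
    """Ratio of (A-weighted) sound powers for a given level difference in dB."""
    return 10 ** (level_difference_db / 10)


def pressure_ratio_from_db(level_difference_db):
    """Ratio of (A-weighted) sound pressures for a given level difference in dB."""
    return 10 ** (level_difference_db / 20)


advantage_db = 15.0  # level advantage quoted in the text
power_ratio = power_ratio_from_db(advantage_db)
pressure_ratio = pressure_ratio_from_db(advantage_db)
print(f"power ratio: {power_ratio:.1f}, pressure ratio: {pressure_ratio:.2f}")
```

Under these definitions, a difference of 15 dB corresponds to a factor of roughly 32 in sound power and roughly 5.6 in sound pressure.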

5.2 Collaborative Research Projects

Sonderforschungsbereich 204 “Nachrichtenaufnahme und -verarbeitung im Hörsystem von Vertebraten”

Collaborative Research Center “Reception and processing of information in the auditory system of vertebrates”

Sponsored by the Deutsche Forschungsgemeinschaft (DFG) from 1983 to 1997, three long-term projects within this research center were carried out at the institute. These projects dealt mainly with hearing sensations as a basis of acoustic information transfer, and with complex perceptual processes in speech and music. The results of this prolific research center were published in numerous papers and are summarized in a substantial final report [00man1].

BMBF-Project “Verbmobil”

Verbmobil is a speaker-independent, bi-directional speech-to-speech translation system for spontaneous dialogs in mobile applications; the project ran from 1993 to 2000. This multilingual system is able to handle German, English and Japanese spoken utterances. In this project 33 research groups worked together. The speech processing group of the Institute for Human-Machine Communication developed, in particular, methods for robust speech recognition, for adaptation to new speakers and for tolerating various speaking styles [00hai1].

Graduiertenkolleg „Sprache, Mimik und Gestik im Kontext technischer Informationssysteme“

Graduate School “Speech, Mimic and Gestures in the context of technical information systems”

This graduate school, funded by the DFG, aims at improving the interaction between man and machine in complex information systems. This goal is pursued jointly by an interdisciplinary team of scientists from the Ludwig-Maximilians-Universität and the Technische Universität München. More information is available at http://www.phonetik.uni-muenchen.de/GradKoll/

Graduiertenkolleg „Interaktion sensorischer Systeme“

Graduate School “Interaction of Sensory Systems”

The group “Technical Acoustics” (Professor H. Fastl) of the Institute for Human-Machine Communication is part of the Graduate School “Interaction of Sensory Systems”, sponsored by the Deutsche Forschungsgemeinschaft (DFG). In particular, interactions of acoustic and visual as well as acoustic and somatosensory inputs are studied. Visit http://www.nefo.med.uni-muenchen.de/kolleg/ for more information.

Forschergruppe „Hörobjekte“

Research Group “Auditory Objects”

Within the Research Group “Auditory Objects”, sponsored by the Deutsche Forschungsgemeinschaft (DFG), project 6 (Fastl) studies questions of basic psychoacoustics. In particular, new evidence about the Zwicker-Tone is compiled and implemented in psychoacoustic models [00fas11]. In addition, in co-operation with colleagues from project 7 (van Hemmen), neurophysiologically based models of the Zwicker-Tone are put forward (Fastl, Patsouras, Franosch, van Hemmen NL). Please visit http://www.hoerobjekte.vo.tu-muenchen.de/ for more information.

Klinikum Großhadern, LMU München

In an extended co-operation with the ENT Department of the Klinikum Großhadern, centered on the loudness perception of patients with different audiological diseases, new methods were developed which improve the understanding of loudness deficits in patients with different pathologies [97bau, 98got1].

University of Cambridge, UK

In co-operation with the lab of Roy Patterson at the University of Cambridge, UK, the pitch strength of regular interval noise (RIN) was studied. The results can be described by a neurophysiologically based model that draws heavily on the autocorrelation function [99hir1, 99wie1].
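The link between regular interval noise and the autocorrelation function can be sketched in a few lines. The example assumes that RIN is generated by repeatedly delaying white noise and adding it back to itself; this delay-and-add construction, the function names and all parameter values are illustrative assumptions and are not taken from [99hir1, 99wie1]. The normalized autocorrelation of such a signal shows a clear peak at the delay, which is the kind of cue an autocorrelation-based pitch model can exploit.

```python
import random


def make_regular_interval_noise(n_samples, delay, iterations, seed=0):
    """White Gaussian noise passed through `iterations` delay-and-add stages."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    for _ in range(iterations):
        # Add a copy of the signal delayed by `delay` samples (zero-padded start).
        x = [x[i] + (x[i - delay] if i >= delay else 0.0) for i in range(n_samples)]
    return x


def normalized_autocorrelation(x, max_lag):
    """Autocorrelation for lags 0..max_lag, normalized to 1 at lag 0."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    energy = sum(v * v for v in centered)
    return [
        sum(a * b for a, b in zip(centered, centered[lag:])) / energy
        for lag in range(max_lag + 1)
    ]


delay = 80  # delay in samples; 1/delay sets the regular interval
rin = make_regular_interval_noise(n_samples=4000, delay=delay, iterations=4)
acf = normalized_autocorrelation(rin, max_lag=200)

# Ignore very short lags and look for the strongest remaining peak.
peak_lag = max(range(20, 201), key=lambda lag: acf[lag])
print(peak_lag, round(acf[peak_lag], 2))
```

In this sketch the height of the peak grows with the number of delay-and-add iterations, which is one simple way to vary the regularity of the stimulus.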

Osaka University, Japan

The long-standing, very effective co-operation with Professor Namba and Professor Kuwano of Osaka University, Japan, was continued and extended. Topics of common interest are in particular the evaluation of noise immissions, sound quality rating, and cultural differences in the interpretation of warning signals [99kuw1, 99kuw2].

6 Student Research Work

At the Institute for Human-Machine Communication, we incorporate our students into research work as early as possible. The students’ contributions are reflected in diploma theses and research projects, which are listed in the following sections.

Please visit our website http://www.mmk.ei.tum.de if you are interested in an up-to-date list of available topics for student research work.

Note: After the subject of the work (in German) the name of the supervisor is given in parentheses.

6.1 Diploma Theses

1997

[D-97.1] Philip Mackensen: Messung der Außenohrübertragungsfunktion eines Kunstkopfes bei verschiedenen Entfernungen (Haferkorn).

[D-97.2] Thomas Filippou: Beurteilung der globalen Lautheit von Freizeitgeräuschen (Stemplinger).

[D-97.3] Jochen Vallen: Klassifikationsverfahren zur Unterscheidung von Schallquellen in mechanisch belasteten Materialien (Ruske, Wolfertstetter).

[D-97.4] Thomas Bachun: Implementierung und Erprobung von Verfahren zur Glättung von Spektralfunktionen als Hilfsmittel der gehörgerechten Konturierung (von Rücker).

[D-97.5] Roland Trißl: Einfluß des Klangspektrums auf die subjektive Beurteilung von Klavierklängen (Valenzuela).

[D-97.6] Tibor Fabian: Implementierung von Methoden der Sprecheradaption in einem automatischen Spracherkennungssystem (Ruske, Pfau).

[D-97.7] Ramon Trombetta: Beiträge zur Grundlagenentwicklung für digitale Sprachfiltersysteme innerhalb einer Freisprecheinrichtung (Grau).

[D-97.8] Dirk Vollmerhaus: Implementierung und Evaluierung von 3D-Applikationen für den gestikbasierten Mensch-Maschine-Dialog (Morguet).

[D-97.9] Wolfgang Zimprich: Akustisch-mechanische Meßverfahren für die Komponenten eines implantierbaren Hörgeräts (Terhardt).

[D-97.10] Klaus Menzl: Extraktion und Untersuchung gehörrelevanter Schallsignalparameter von Flügelklängen (Valenzuela).

[D-97.11] Matthias Puchner: Visualisierung und numerische Auswertung dynamischer Mithörschwellenmuster (Schmid).

[D-97.12] Matthias Bayerlein: MATLAB-basiertes Rechenschema zur Bestimmung der Ausgeprägtheit der Tonhöhe grenzfrequenzmodulierter Tiefpaßrauschen (Schmid).

[D-97.13] Christine Huth: Die Ausgeprägtheit der Tonhöhe als Funktion psychoakustischer Empfindungsgrößen (Schmid).

[D-97.14] Max Kreilein: Anpassung einer Bedienoberfläche für das Navigationssystem CARIN an eine Sprachsteuerung (Grau).

1998

[D-98.1] Christina Niehaus: Untersuchungen zur erweiterten linearen Diskriminanzanalyse für die Spracherkennung (Ruske, Pfau).

[D-98.2] Thorsten Ansahl: Untersuchungen eines Echokompensationsalgorithmus (Ruske).

[D-98.3] Lam Song Nguyen: Integration natürlichsprachlicher Interaktionen in eine virtuelle Umgebung für die medizinische Bildverarbeitung (Müller).

[D-98.4] Ralf Nieschulz: Verarbeitung von Zeitstrukturen in FTT-Spektrogrammen (von Rücker).

[D-98.5] Pierre Eric Gerard: Physical and subjective evaluation of speech quality for Hands-Free-Terminals (Fastl).

[D-98.6] Dimitrios Haratsis: Portierung eines sprachübersetzenden Systems auf die Domäne gesprochener Terminvereinbarungen (Müller).

[D-98.7] Lars Eric Fiedler: Programm zur Visualisierung mehrdimensionaler Meßdaten (Schmid).

[D-98.8] Konstantin Spasokukotskij: Entwicklung eines Sprachtests für Ukrainisch (Fastl).

[D-98.9] Gerald Stöckl: Merkmalsextraktionsverfahren für die Bildsequenzerkennung (Morguet).

[D-98.10] Ralf Schiffert: Einfluß zeitlicher Hüllkurvenverzerrungen auf die Sprachverständlichkeit (Chalupper).

[D-98.11] Selim Ben Yedder: Labelung, zeitliche Segmentierung und Klassifikation zusammenhängender Gesten (Morguet).

[D-98.12] Thomas Volk: Entwicklung und Integration einer menübasierten 3D-Bedienoberfläche für die medizinische Bildanalyse in virtuellen Umgebungen (Müller, Hunsinger).

[D-98.13] Dimitrios Patsouras: Zur Ausgeprägtheit der Tonhöhe und ihrer Abhängigkeit von anderen Empfindungsgrößen (Schmid).

[D-98.14] Michael Geiger: Grundlagen für ein integriertes Bedienkonzept im Fahrzeug (Grau).

[D-98.15] Stefan Fissek: Informationstheoretische Untersuchung menschlichen Fahrverhaltens (Grau).

[D-98.16] Martin Zobl: Sprachverständlichkeitsvorhersage in einem perzeptiven Modell (Chalupper).

[D-98.17] Tarek Said: Entwicklung und Test eines Verfahrens zur Aktivierung eines Einzelworterkennungssystems durch ein spezielles Kommandowort (Ruske).

[D-98.18] Stefan Hirsch: Experimental Investigation and Modelling of the Pitch Strength of Regular Interval Sounds (Fastl).

[D-98.19] Meinrad Kranewitter: Analyse, Implementierung und Evaluierung biometrischer Identifikationssysteme im Hochsicherheitsbereich (Morguet).

1999

[D-99.1] Christian Hansen: Entwicklung von Lösungsansätzen zur Nutzung des Internet im Fahrzeug (Grau).

[D-99.2] Markus Fruhmann: Stationäres Lautheitsmodell für Innenohrschwerhörige (Chalupper).

[D-99.3] Bernhard Seeber: Zur Frequenzselektivität des Gehörs: Zusammenhänge zwischen Mithörschwellen und Tuningkurven (Terhardt).

[D-99.4] Lutz Hönisch: Kontextabhängige Modellierung von sprachlichen Einheiten in der automatischen Spracherkennung mit Hilfe von Polyphonen (Faltlhauser, Pfau, Ruske).

[D-99.5] Matthias Thomae: Optimierung der erweiterten linearen Diskriminanzanalyse durch Modelltransformation (Ruske, Pfau).

[D-99.6] Zeljko Miletic: Einkanalige Geräuschreduktion für Mobiltelefon-Freisprechanlagen im Auto (Ruske).

[D-99.7] Jörg Franke: Untersuchungen zur Halligkeit von zeitvarianten Schallen (Chalupper).

[D-99.8] Markus Szymkowiak: Computergestützte Erkennung der Position und der Abmessung des menschlichen Gesichtes im digitalen Bild für die automatische Bildverarbeitung (Morguet).

[D-99.9] Thilo Hinz: An interactive support environment for figuring, programming, and using the PLC Siemens SIMATIC S7-2000 (Hofmann).

[D-99.10] Jasmin Jin Qian: Implementierung eines Clustering-Verfahrens zur Inferenz in Bayes’schen Netzen (Hofmann).

[D-99.11] Andrés Ucke: Entwicklung eines Algorithmus zur automatischen Generierung einer Planbibliothek (Hofmann).

[D-99.12] Günther Steinmaßl: Evaluierung und Optimierung einer stochastischen Gestikerkennung für den visuellen Dialog (Morguet).

[D-99.13] Björn Schuller: Automatisches Verstehen gesprochener mathematischer Formeln (Hunsinger).

[D-99.14] Jürgen Straßer: Einsatz des HTK-Toolkits für die kontextabhängige akustische Modellierung in der automatischen Spracherkennung (Ruske, Pfau, Faltlhauser).

[D-99.15] Richard Stenzel: Natürlichsprachliche Eingabe für das Automobil (Hunsinger, Grau).

[D-99.16] Bernhard Niedermaier: Entwicklung und Bewertung eines nutzerorientierten Dialogkonzepts zur Sprachbedienung eines Autotelefons (Grau).

[D-99.17] Mark Schützendorf: Integration eines DTW-Musterbewertungsverfahrens in einen statistischen semantischen Decoder zur Erkennung handgeschriebener mathematischer Formeln (Hunsinger).

2000

[D-00.1] Christian Lorenz: Elektro- und psychoakustische Untersuchungen an einem Oberwellengenerator (Chalupper).

[D-00.2] Stephan Bayer: Subjektive Beurteilung von Reifen-/Fahrbahngeräuschen (Fastl, Patsouras).

[D-00.3] Philipp Detemple: Natürlichsprachliche Bedienung im Kraftfahrzeug – Erweiterung des Dialogkonzeptes um eine Komponente zur Steuerung von Mensch-Maschine-Dialogen (Neuss).

[D-00.4] Martin Rosnitschek: Multimodale Mensch-Maschine-Kommunikation im Fahrzeug (Neuss).

[D-00.5] Rainer Schiller: Entwicklung und Evaluierung eines adaptiven Benutzerassistenzsystems (Hofmann).

[D-00.6] Thomas Filippou: Audio-visuelle Interaktion beim Lautheitsurteil (Patsouras).

[D-00.7] Oliver Thost: Resynthese von Getriebegeräuschen aus dem Betragsspektrum (Fastl, Chalupper).

[D-00.8] Michael Kellnberger: FTT-basierte Simulation und Rehabilitation von Schwerhörigkeit (Chalupper).

[D-00.9] Robert Lieb: Einstufig-probabilistische semantische Decodierung handgeschriebener mathematischer Formeln (Hunsinger).

[D-00.10] Derik Schröter: An Integral Approach to Real-time Segmentation and Tracking of Objects without explicit Modeling (Zobl, Althoff).

[D-00.11] Hubert Wimmer: Sprachübertragung mit Induktionsschleifen: Messtechnische und psychoakustische Untersuchungen (Chalupper).

6.2 Student Research Projects

1997 - 2000

[S-97.1] Michael Geiger: Erstellen eines Programmes zur Berechnung des Vertrauensbereiches von Lautheitsperzentilen (Stemplinger).

[S-97.2] Dimitrios Patsouras: Zur Ausgeprägtheit der Tonhöhe grenzfrequenzmodulierter Tiefpaßrauschen bei unterschiedlicher Filter-Flankensteilheit (Schmid).

[S-97.3] Bernhard Niedermaier: Drehtisch-Schrittmotorantrieb zur Positionierung elektroakustischer Wandler (Schmid).

[S-98.1] Tarek Said: Differenztonverfahren - Realisierung mit “System One” (Schmid).

[S-98.2] Bernard Payer: Erzeugung eines variablen Bandpasses mit großer Flankensteilheit (Schmid).

[S-98.3] Robert Lieb: Integration neuer Hidden-Markov-Modelle in ein sprachverstehendes System (Ruske, Hunsinger).

[S-00.1] Andrea Nobbe: Aufnahme, Analyse und Resynthese von Instrumententönen (Patsouras).

7 Miscellaneous Events

7.1 Conferences and Symposia

7.1.1 ICASSP, April 21-24, 1997

The 22nd IEEE International Conference on Acoustics, Speech, and Signal Processing was held in Munich in 1997, for the first time in Germany. Staff members of the Institute for Human-Machine Communication took an active part in the conference committee: it was chaired by M. Lang, G. Ruske was in charge of publications, and H. Fastl took care of the registration procedure. More than 1000 authors presented their contributions to research in the fields of digital signal processing, speech, and acoustics, and some 1800 participants attended this top event.

Summaries of the most relevant new ideas were presented by 8 leading experts from each technical committee on the last day of the conference. These summary sessions were highly appreciated by the ICASSP 97 attendees.

Other highlights included the combined opening and awards ceremony in the Gasteig concert hall, where the 1996 IEEE Signal Processing Society awards were presented. The Bavarian Government invited ICASSP attendees to a State Reception in the “Kaisersaal” of the Munich Residence, emphasizing the significance of the event. A Bavarian evening was arranged in one of the Munich brewery halls to introduce the international audience to typical entertainment and folklore. Proceedings in print and on CD-ROM were produced to summarize the results. As the positive response from the participants showed, ICASSP 97 was a success, and it paved the way for other conference venues outside of the US, e.g. Istanbul, host of ICASSP 2000.

7.1.2 WE-Heraeus-Seminar “Speech Recognition and Speech Understanding”, April 3-5, 2000

Sponsored by the WE-Heraeus-Stiftung, Professor M. Lang chaired a seminar about Speech Recognition and Speech Understanding at the Deutsche Physikalische Gesellschaft (DPG) in Bad Honnef. Professor G. Ruske and Professor H. Fastl were members of the organizing committee. 16 internationally renowned experts were invited to give lectures on the state of the art and current problems in the field of speech processing. For instance, Professor G. Ruske contributed a lecture about “Robust Speech Recognition.” Moreover, representatives of information technology companies gave an insight into industrial applications of speech technology as well as marketing aspects. It was generally agreed that the seminar gave all attendees the opportunity for fruitful discussions and mutual learning.

7.1.3 euro-noise 1998

The European Noise Conference euro-noise 1998 was held for the first time in Germany, and co-chaired by Dr. Joachim Scheuren from Müller-BBM and Professor Hugo Fastl from the Institute for Human-Machine Communication. The conference had the motto “Designing for Silence”, and internationally renowned experts discussed questions of prediction, measurement and evaluation of noise and vibration. In addition, an exhibition showed the latest developments in materials for noise abatement as well as new algorithms for assessing noise problems. The results are compiled in the proceedings edited by H. Fastl and J. Scheuren [98fas7].

7.1.4 Japan in Deutschland

Within the framework of “Japan in Deutschland”, Professor Sonoko Kuwano from the Laboratory of Environmental Psychology of Osaka University, Japan, and Professor Hugo Fastl from the Institute for Human-Machine Communication organized a joint symposium of the Acoustical Society of Japan (ASJ) and the German Acoustical Society (DEGA). Experts from both countries discussed questions of noise measurement and evaluation. The results are available in the volume Fortschritte der Akustik – DAGA 2000, published by DEGA, Oldenburg. In addition, a publication of these papers in the Journal of the Acoustical Society of Japan, English Version (JASJ(E)) is planned.

7.1.5 Miscellaneous Conferences and Symposia

Professor M. Lang was involved in the organization or was a member of the program committee of the following conferences:

D-CSCW 2000: “Verteiltes Arbeiten – Arbeit der Zukunft”, 11.-13.9.2000, München.

Münchner Kreis: “Anwenderfreundliche Kommunikationssysteme”, 16.-17.6.1999, München.

VDE/ITG-Fachtagung: “Technik für den Menschen”, 26.-27.10.1998, Eichstätt.

Workshops des Arbeitskreises “Physik, Informatik, Informationstechnik” der Trägergesellschaften DPG, GI, VDE/ITG:

• Optimierung in Physik und Informatik, 8.-11.9.1997, Rostock

• Quanten-Computing, Dynamische Systeme, Datenaquisition, Computeranwendungen, 15.3.1999, Heidelberg

• Optik in der Rechnertechnik, 7.10.1999, Jena

• Optik in der Rechnertechnik und Mikrooptik, 27.-29.9.2000, Hagen.

Professor G. Ruske was Head of the organizing committee of the “Verbmobil Akustik-Workshop”, held in Munich, 25.-26.11.1999.

7.1.6 Invited Talks

Professor Manfred Lang gave the following invited talks:

• Gibt es den Computer, der sich auf den Nutzer einstellt? BMBF, Bonn, 25.2.1997.

• Entwicklungen und Visionen für eine benutzeradäquate Mensch-Maschine-Kommunikation. Siemens AG Schweiz, World Trade Center, Zürich, 24.9.1997.

• Beherrschung der Komplexität bei multimodaler Mensch-Maschine-Kommunikation (with Professor B. Radig). Symposium “Realität und Abstraktion”, TU München, 2.3.1998.

• Digitale Entdeckungsreisen – Fernsehen als Datenstrom. Panel discussion at Systems 98, München, 23.10.1998.

• Aspekte einer natürlichen Mensch-Maschine-Kommunikation. Betaresearch, Andechs, 8.-9.4.1999.

• Mensch-Maschine-Kommunikation. Diehl-Stiftung, Lengenfeld, 16.-17.11.1999.

• Können Maschinen wie Menschen kommunizieren? Tag der Fakultät Elektrotechnik und Informationstechnik, TU München, 7.7.2000.

• Television interview by Focus TV and presentation of ongoing research work in the institute’s laboratories (20.7.1999).

Professor Hugo Fastl presented keynote lectures at the DAGA 1997 in Kiel, Germany, at the Sound Quality Symposium 1998 in Ypsilanti, USA, and at the WESTPRAC 2000 in Kumamoto, Japan.

Professor Günther Ruske gave an invited lecture about “Robust Speech Recognition” at the GuHT-Workshop in Munich, 20.10.2000, organized by the “Gesellschaft & High-Tech eV.”

7.2 Retirement of Professor Ernst Terhardt

On the evening of 23 April 1999, some 90 invited guests attended the celebration to mark the retirement of Professor Ernst Terhardt, Professor of Acoustical Communication and Musical Acoustics at the Technische Universität München.

On this momentous occasion, Professor Hugo Fastl, a long-standing colleague, spoke about Professor Terhardt’s scientific career in the field of acoustics.

Professor Terhardt earned his degree in Communications Engineering at the University of Stuttgart, where he also graduated. He was promoted to Professor at the Technische Universität München in 1970. His research experience is wide-ranging: auditory roughness and pitch perception, auditory time and frequency resolution, musical consonance, the physics of string instruments, and signal theory were major topics of his research work. His concept of “virtual pitch,” established in the early 1970s to explain the pitch perception of complex tones, is certainly his most influential work. In 1998, Professor Terhardt published the essence of his scientific achievements in his substantial book “Akustische Kommunikation” [98ter1].

The music for the festive occasion was provided by Manfred Seewann, a former student and co-worker of Professor Terhardt, who is a professional musician. He played pieces by Chopin on the grand piano for the pleasure of all those present.

After the presentations, the evening was completed by a buffet supper, which gave all guests the opportunity to both utilize and talk about Acoustical Communication. The evening was an enormous pleasure for all those present, and much appreciation goes to all who contributed to the organisation of the event. We all wish Professor Terhardt a very happy and positive retirement from University and – as a somewhat selfish wish – that he will continue his productivity.

Claus von Rücker

7.3 Honors and Awards

Professor M. Lang frequently re-views research projects for the fol-lowing organizations:

• Bundesministerium für Bildungund Forschung – BMBF

• Deutsche Forschungsgemein-schaft – DFG

• several state governments• VW-Stiftung• Karl Heinz Beckurts-Stiftung

Professor H. Fastl was one of three scientists from Germany who were granted the Research Award of the Japan Society for the Promotion of Science (JSPS) in 1998. He also frequently reviews research projects for the DFG.

Professor G. Ruske is a permanent reviewer for the Australian Research Council (ARC) for large research grants.

Johannes Müller and Holger Stahl received the Rohde & Schwarz Prize in 1998 for their outstanding doctoral dissertations [P-L4, P-L5], which present a single-stage stochastic approach to natural speech understanding. H. Stahl has since taken up a professorship at the Fachhochschule Rosenheim.


8 Publications

8.1 Doctoral Dissertations

8.1.1 Supervised by Professor Lang

[P-L3] Hans-Jürgen Winkler (1996):

Entwurf und Realisierung eines auf statistischen Ansätzen basierenden Systems zur Erkennung handgeschriebener mathematischer Formeln

Design and Realization of a System for the Recognition of Handwritten Mathematical Formulas Based on a Statistical Approach

This thesis presents a system for recognizing handwritten mathematical formulas. The task, which comprises symbol segmentation, symbol recognition, and structural analysis, is described by a statistical approach and handled with knowledge-based and stochastic methods. In contrast to previously proposed analysis methods, alternative decisions can thus be tolerated within the individual processing stages and resolved automatically later on, as new knowledge is acquired. The recognition results obtained demonstrate the performance of the implemented system.

[P-L4] Johannes Müller (1997):

Die semantische Gliederung zur Repräsentation des Bedeutungsinhalts innerhalb sprachverstehender Systeme

The Semantic Structure for the Representation of Meaning in Speech Understanding Systems

The semantic structure is introduced as a novel representation of the meaning of a spoken utterance from a given domain within a speech understanding system. Because it permits a probabilistic statement about the underlying word chain, a sequence of speech-signal feature vectors can be decoded directly into such a semantic structure by a purely stochastic algorithm. As an example application, a “speech understanding graphic editor” was implemented, with which three-dimensional objects on the screen can be created, modified, or deleted by natural-language commands. Transferring the algorithms to a “speech understanding service robot” gave a vivid demonstration of the system’s portability. In addition, the semantic structure, used as an interlingua level, enables the automatic translation of natural spoken or written language.


[P-L5] Holger Stahl (1997):

Konsistente Integration stochastischer Wissensquellen zur semantischen Decodierung gesprochener Äußerungen

Consistent Integration of Stochastic Knowledge for Semantic Decoding of Spoken Utterances

This thesis describes the development of a system for understanding natural, fluently spoken language. The core of the system is a semantic decoder that maps the speech signal of an utterance onto its meaning. For this purpose a maximum a posteriori classification is carried out, i.e., the most probable meaning for the given speech signal is determined on the basis of stochastic knowledge. Introducing the semantic structure to represent the meaning, and linking the stochastic knowledge sources consistently and seamlessly, made possible an extremely efficient implementation of the semantic decoder with high accuracy.
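
The maximum a posteriori classification named in this abstract can be written in its generic textbook form as follows. The symbols are notation assumed here for illustration only: S stands for a candidate meaning (semantic structure) and O for the observed sequence of speech-signal feature vectors. The abstract does not state how the individual stochastic knowledge sources enter the two factors, so that is left open.

```latex
\[
\hat{S} \;=\; \arg\max_{S} P(S \mid O)
        \;=\; \arg\max_{S} \frac{P(O \mid S)\, P(S)}{P(O)}
        \;=\; \arg\max_{S} P(O \mid S)\, P(S)
\]
```

The last step holds because P(O) does not depend on S and therefore does not affect which S attains the maximum.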

[P-L6] Robert Zwickenpflug (1997):

Entwurf und Realisierung eines Systems zur Erstellung von verteilten Anwendungen für kontinuierliche Medien

Design and Implementation of a System for the Creation of Distributed Applications for Continuous Media

A client-server system is presented for building modular distributed applications for continuous media. It allows services to be distributed across a computer network and made accessible to several users in a way that is easy for the end user to handle. Services can communicate with one another through defined ports. Any user can introduce new services into the network at a location of their choice and connect them with one another and with services already present. These connections may also include services introduced by another user.

[P-L7] Christoph Wagner (1997):

Verkehrsflußmodelle unter Berücksichtigung eines internen Freiheitsgrades

Traffic Flow Models Allowing for an Inner Degree of Freedom

Starting from a kinetic traffic equation on a position-velocity phase space extended by the drivers’ desired velocity, an improved macroscopic traffic flow model is derived by taking moments. The model shows realistic dynamic behavior over the entire density range. Besides the exact form and functional dependence of terms in the model equations that had previously been introduced only heuristically, it also yields the associated transport coefficients. Furthermore, the additional degree of freedom and the quantities derived from it now allow control interventions to be modeled directly.


[P-L8] Henning Lenz (1999):

Entwicklung nichtlinearer, diskreter Regler zum Abbau von Verkehrsflußinhomogenitäten mithilfe makroskopischer Verkehrsflußmodelle

Design of Nonlinear, Discrete Controllers to Reduce Inhomogeneities in Traffic Flow Models

A scheme for designing nonlinear controllers is presented, with the aim of reducing inhomogeneities in road traffic. The requirements for such a controller were formulated independently of any particular model. A data analysis showed that speed limits can be switched in such a way that they meet these requirements. For efficiently dissolving stop-and-go waves, a strategy that looks ahead in space is a natural choice. Further applications in traffic engineering are presented.

[P-L9] Joachim Köhler (1999):

Erstellung einer statistisch modellierten multilingualen Lautbibliothek

Creation of a Statistically Modelled Multi-lingual Phoneme Inventory for Speech Recognition

This thesis describes the development of a multilingual phoneme inventory for statistical speech recognition. To this end, the acoustic-phonetic similarities between different languages are exploited. Based on HMM technology, methods are developed for converting the language-specific models into multilingual phoneme models. This yields a drastic reduction in the number of model parameters without a significant drop in the word recognition rate. The second part of the thesis develops and describes methods for porting the multilingual phonemes to new languages.

[P-L10] Udo Bub (1998):

Anwendungsspezifische Online-Anpassung von Hidden-Markov-Modellen in automatischen Spracherkennungssystemen

Application-Specific In-Service Adaptation of Hidden-Markov Models in Automatic Speech Recognition Systems

This thesis deals with the problems that arise in automatic speech recognition when there is a mismatch between the training and test data sets. In particular, differences in the acoustic-phonetic phoneme contexts degrade recognition. This effect is counteracted by novel learning algorithms that can run online during the application phase. With 6000 adaptation words, the error rate can be reduced by 56 % with unsupervised learning and by 67 % with supervised learning. This corresponds to the recognition performance of a model that could have been trained had suitable speech databases been available.


[P-L11] Christian Krapichler (1999):

Eine neue Mensch-Maschine-Schnittstelle für die Analyse medizinischer 3D-Bilddaten in einer virtuellen Umgebung

A New Man-Machine Interface for the Analysis of Medical 3D-Image Data in Virtual Reality

By developing new methods for 3D visualization and intuitive spatial interaction, a VR system was created with which all steps of digital medical image analysis can be carried out. The new interaction methods include the analysis of hand gestures, speech understanding, and the use of VR input devices as well as innovative virtual tools. Compared with today’s usual presentation of countless slice images, the VR system makes it easier to grasp and analyze spatial relationships and to process the tomographic image data further. Because the forms of presentation and interaction are adapted to human senses and abilities, the physician can complete the entire workflow in a time span that permits use in everyday clinical practice.

[P-L12] Angela Engels (formerly Schreyer, 2000):

Aufmerksamkeitsbasierte Lokalisierung und Bewertung relevanter Information auf Papierdokumenten

Attention-Based Localization and Interpretation of Information in Paper Documents

Using a sender-receiver model, this thesis describes a new, attention-based view of documents: the author marks relevant information in the document with conspicuous design features that attract a reader’s attention at first glance and thus enable efficient information extraction. The thesis turns this mechanism into a technical method that finds relevant information using only the image of a scanned paper document and assesses the relevance of the individual pieces of information relative to one another. Important steps in the implementation are the formalization of a psychological theory of texture perception and a survey on the perception of design features.

[P-L13] Peter Morguet (2000):

Stochastische Modellierung von Bildsequenzen zur Segmentierung und Erkennung dynamischer Gesten

Stochastic Modelling of Image Sequences for the Segmentation and Recognition of Dynamic Gestures

This thesis presents the development of an image-processing-based system for human-machine dialog controlled by hand gestures. Two alternative approaches, both based on stochastic modeling with partly extended hidden Markov models, segment gestural movements in the continuous video stream in time and classify them. To adapt the spatio-temporal image sequences to serial processing, several feature extraction methods are developed and compared. As an example application, the implementation of a real-time three-dimensional scene editor is described. Through the concept of indirect manipulation, even complex actions can be controlled intuitively by gestures.


[P-L14] Dietmar Mass (work in progress):

Schnelle rechnerische Komfortoptimierung von Kraftfahrzeugen mittels modaler Korrektur

Fast Computational Optimization of Automotive Comfort by Means of Modal Correction

Starting from an energy formulation of the equations of motion of a coupled structure-cavity system, a method is developed that uses modal corrections to compute modifications in drastically less time than a conventional finite element calculation. It also allows experimental methods to be incorporated in order to improve the modeling quality of the computational model. Basic and practice-oriented examples demonstrate the performance of the method and illustrate its possible applications in computing comfort-relevant vehicle properties such as vibration behavior and interior acoustics.

[P-L15] Robert Neuss (formerly Grau, work in progress):

Usability Engineering als Ansatz zum multimodalen Mensch-Maschine Dialog

Usability Engineering as Approach to Multimodal Human-Machine Interaction

Multimodal human-machine communication is intended to make software easier to use by allowing free use of communication channels such as speech, gestures, etc. This thesis first examines individual modalities in order to postulate the properties of a multimodal system. Following the principles of usability engineering, a prototype is then built to carry out user tests. The system, which is integrated into a driving simulator and allows the operation of components such as radio, navigation system, and telephone, is developed cyclically through tests and improvements. The results are a user-adequate design as well as practical experience with the new techniques.

[P-L16] Lars Witta (work in progress):

Entwurf und Realisierung interaktiver modaler Berechnungs- und Optimierverfahren für gekoppelte Struktur-Fluid-Systeme

Design and Implementation of Interactive Modal Calculation and Optimization Methods for Coupled Structure-Fluid Systems

Using a novel coupling condition, the equation of motion of a structure-cavity system lined with sound-absorbing material is set up and solved modally. The modal solution method is extended to the so-called “modal correction method”, which drastically reduces the computing time needed to calculate model variants and to optimize such systems automatically. The approximation errors caused by the modal methods are examined and quantified. The implementation of an interactive program system is described that demonstrates the advantages gained by using the methods developed.


8.1.2 Supervised by Professor Terhardt

[P-T13] Markus Mummert (1997):

Sprachcodierung durch Konturierung eines gehörangepaßten Spektrogramms und ihre Anwendung zur Datenreduktion

Speech Coding by Contourizing an Ear-adapted Spectrogram and its Application to Data Reduction

In auditory perception, contours as carriers of the relevant information correspond, among other things, to the audible partials. This thesis deals with audio representations based on contours defined as the ‘ridge lines’ of an ear-adapted spectrogram. Starting from a known representation, additional ridge lines and a new signal reconstruction are introduced. A classification of the lines separates tonal and noise-like signal components. With this, speech coding at data rates down to 4 kbit/s is achieved.

[P-T14] Miriam Noemí Valenzuela (1998):

Untersuchungen und Berechnungsverfahren zur Klangqualität von Klaviertönen

Investigations and Calculation Methods Concerning the Sound Quality of Piano Tones

In this thesis, models and methods were developed for determining the sound-signal parameters that are characteristic of the specific sound of a piano tone and of its quality. Listening experiments were used to investigate what the audible dissimilarity in the sound of different piano tones consists of. The methods worked out for measuring the distinguishing criteria enable targeted improvement of both electronic and acoustic pianos. The model developed for calculating the sound quality of piano tones could be used for automatic sound quality control of single tones as well as of instruments.

[P-T15] Claus von Rücker (1999):

Ein Verfahren zur Tonhöhenanalyse unter Berücksichtigung zeitlich-spektraler Kontrasteffekte

A Pitch Determination Algorithm Allowing for Temporal and Spectral Contrast Effects

This thesis describes a method for the pitch analysis of nonstationary sound signals. It is distinguished by taking into account those essential properties of hearing that can be observed in psychoacoustic experiments on pitch perception. Besides the elementary properties of the ear’s frequency analysis, these include in particular temporal-spectral contrast effects, which previous methods do not capture. The method is able to reproduce both the pitches and the time course of their prominence for time-variant sounds.


8.1.3 Supervised by Professor Fastl

[P-F4] Helmut Spannheimer (1997):

Geräuschminderung im Kraftfahrzeug mit aktiven Resonatoren

Noise Abatement in Cars Utilizing Active Resonators

For noise reduction in motor vehicles, an active resonator was developed that, in a frequency range from 50 Hz to 200 Hz, reproduces the properties of a Helmholtz resonator at its resonance frequency. The system was implemented with a digital controller that drives a loudspeaker, using a microphone as sensor. The achievable sound pressure reduction, the optimal placement, and the most effective design of the resonator were determined with a modal sound pressure calculation and verified on a model cavity. Finally, the system was integrated into vehicles for various applications and tested while driving, and its effectiveness was assessed subjectively and objectively.

[P-F5] Ingeborg Stemplinger (1999):

Beurteilung, Messung und Prognose der Globalen Lautheit von Geräuschimmissionen

Rating, Measurement and Prediction of the Global Loudness of Noise Immissions

The central question of this thesis is the analysis of the subjectively perceived global loudness of noise immissions as a measure of noise exposure, and its reproduction by measurement. The data obtained from the psychoacoustic experiments can be reproduced in accordance with hearing by measuring loudness according to DIN 45631 and subsequently calculating percentile values. A statistical method for calculating the confidence interval of loudness percentiles from the loudness-time function makes quality-assured measurement of these percentiles possible for the first time. With a newly developed prediction method, the global loudness can be estimated as a function of the existing noise exposure of the area.
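
The percentile step mentioned in this abstract can be illustrated with a small sketch. It assumes that the loudness-time function has already been measured (for instance according to DIN 45631) and is available as equally spaced samples in sone, and it uses the common convention that the percentile loudness N_x is the value exceeded during x % of the measurement time. The function name, the nearest-rank estimate, and the sample data are hypothetical choices made here, not the procedure of the thesis.

```python
import math


def loudness_percentile(loudness_sone, x_percent):
    """Return N_x, the loudness exceeded by x_percent % of the samples.

    Nearest-rank estimate on equally spaced samples (an assumption of
    this sketch; the thesis does not specify its estimator here).
    """
    if not loudness_sone:
        raise ValueError("empty loudness-time function")
    if not 0 < x_percent < 100:
        raise ValueError("x_percent must lie strictly between 0 and 100")
    ordered = sorted(loudness_sone, reverse=True)  # loudest sample first
    # Number of samples that are allowed to lie above the returned value.
    n_exceed = math.floor(len(ordered) * x_percent / 100)
    return ordered[min(n_exceed, len(ordered) - 1)]


# Hypothetical loudness-time function: 20 samples from 1 to 20 sone.
samples = list(range(1, 21))
n5 = loudness_percentile(samples, 5)    # exceeded by 1 of 20 samples
n50 = loudness_percentile(samples, 50)  # exceeded by 10 of 20 samples
```

A confidence interval for such a percentile, as described in the abstract, would require an additional statistical model of the loudness-time function and is not attempted in this sketch.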


8.1.4 Supervised by Professor Ruske

[P-R4] Franz Wolfertstetter (1996):

Verallgemeinerte stochastische Modellierung für die automatische Spracherkennung

Generalized Stochastic Modeling for Automatic Speech Recognition

Using natural speech as an example, this thesis deals with the problems and solutions in the stochastic modeling and classification of signals that are strongly determined by random processes. The focus is on reproducing the signal trajectory with novel stochastic Markov graphs, which can be interpreted as a system of branching and recombining paths with states of variable variance. For training the model parameters, the so-called maximum likelihood method is compared with discriminative methods. For processing fluent speech, a system is developed for training and recognition with arbitrarily structured stochastic models.

[P-R5] Jochen Junkawitsch (2000):

Detektion von Schlüsselwörtern in fließender Sprache

Keyword-Spotting in Fluent Speech

The subject of this thesis is the development of a novel keyword-spotting method that is geared to the special requirements of keyword detection and is based on the direct optimization of a confidence measure. Four different ways of defining confidence measures are derived, and two alternative search algorithms are developed that ensure these confidence measures are optimized. Extensive experiments confirm the effectiveness of the proposed method: the figure of merit rises from 81.5 % to 87.9 %.

[P-R6] Thilo Pfau (2000):

Methoden zur Erhöhung der Robustheit automatischer Spracherkennungssysteme gegenüber Variationen der Sprechgeschwindigkeit

Methods for Improving the Robustness of Automatic Speech Recognition Systems against Variations of Speech Rate

This thesis examines various approaches to making automatic speech recognition systems more robust against variations in speech rate. The basis is formed by hidden Markov models (HMMs). To reduce intra-model variations, a speech rate normalization by interpolation, a method for speaker normalization, and two methods for modeling pronunciation variants are presented. For adapting the system to different speech rates, maximum a posteriori training for estimating HMM parameters is discussed in detail, and a novel feature- and rule-based method for determining the speech rate is presented.


8.2 Scientific Publications

Abbreviations

DAGA: Fortschritte der Akustik – Jahrestagung der Deutschen Gesellschaft für Akustik e.V. (DEGA)
DFG: Deutsche Forschungsgemeinschaft
EEAA: East European Acoustical Association
EIS: International Symposium on Engineering of Intelligent Systems
Euro-Noise: European Conference on Noise Control
Eurospeech: European Conference on Speech Communication and Technology
ICASSP: IEEE International Conference on Acoustics, Speech, and Signal Processing
ICIP: IEEE International Conference on Image Processing
ICSLP: IEEE International Conference on Spoken Language Processing
IEEE: Institute of Electrical and Electronics Engineers, Inc.
IFAC: International Federation of Automatic Control
ITG: Informationstechnische Gesellschaft
JASA: Journal of the Acoustical Society of America
SPECOM: International Workshop “Speech and Computer”
VDE: Verband der Elektrotechnik Elektronik Informationstechnik e.V.

1997

[97bau] U. Baumann, I. Stemplinger, B. Arnold, K. Schorn: Bezugskurven für die Hörflächenskalierung in der klinischen Anwendung. Laryngo-Rhino-Otologie, Heft 8, 1997, S. 458-465.

[97bub1] U. Bub, J. Köhler, B. Imperl: In-Service Adaptation of Multilingual Hidden-Markov-Models. Tagungsband ICASSP 97 (München, 21.-24.4.1997), Vol. 2, S. 1451-1454.

[97cha] J. Chalupper, W. Schmid: Akzentuierung und Ausgeprägtheit von Spektraltonhöhen bei harmonischen Komplexen Tönen. Tagungsband DAGA 97 (Kiel, 1997), S. 357-358.

[97fas1] H. Fastl: Gehörgerechte Geräuschbeurteilung. Tagungsband DAGA 97 (Kiel, 1997), S. 57-64.

[97fas2] H. Fastl, W. Schmid: Comparison of Loudness Analysis Systems. Tagungsband inter-noise 97 (Budapest, Ungarn, 1997), Band II, S. 981-986.

[97fas3] H. Fastl: The Psychoacoustics of Sound-Quality Evaluation. Acustica/Acta Acustica, Band 83 (1997), S. 754-764.

[97fas4] H. Fastl: Psychoacoustic Noise Evaluation. Tagungsband Acoustics - High Tatras’97 (Vysoké Tatry, Slowakei, 1997), S. 21-26.

[97got] G. Gottschling, H. Fastl: Akustische Simulation von 6-Sektionen-Fahrzeugen des Transrapid. Tagungsband DAGA 97 (Kiel, 1997), S. 254-255.

[97hau1] M. Haubner, C. Krapichler, A. Lösch et al.: Virtual reality in medicine-computer graphics and interaction techniques. IEEE Transactions on Information Technology in Biomedicine, März 1997, 1(1), S. 61-72.

[97hoj] E. Hojan, I. Stemplinger, H. Fastl: Zur Verständlichkeit deutscher Sprache im Störgeräusch nach Fastl durch polnische Hörer mit verschiedenen Deutschkenntnissen. Audiologische Akustik - Audiological Acoustics, Band 36 (1997), Heft 1, S. 32-37.

[97hol] M. Holzapfel, G. Ruske, H. Höge: Failure Simulation for a Phoneme HMM Based Keyword Spotter. Tagungsband ICASSP 97 (München, 1997), S. 911-914.


[97jae] K. Jäger, H. Fastl, F. Schöpf, G. Gottschling, U. Möhler: Wahrnehmung von Pegeldifferenzen bei Vorbeifahrten von Güterzügen. Tagungsband DAGA 97 (Kiel, 1997), S. 228-229.

[97jun] J. Junkawitsch, G. Ruske, H. Höge: Efficient Methods for Detecting Keywords in Continuous Speech. Tagungsband Eurospeech ’97 (Rhodos, Griechenland, 1997), S. 259-262.

[97kra] C. Krapichler, M. Haubner, A. Lösch, M. Lang, K.-H. Englmeier: Human-Machine Interface for a VR-based Medical Imaging Environment. Tagungsband SPIE Medical Imaging ’97 (Newport Beach, USA, 1997), S. 527-534.

[97kra2] C. Krapichler, M. Haubner, A. Lösch, K.-H. Englmeier: A human-machine interface for medical image analysis and visualization in virtual environments. Tagungsband ICASSP 97 (München, 21.-24.4.1997), Vol. 4, S. 2613-2616.

[97kuw] S. Kuwano, S. Namba, H. Fastl, A. Schick: Evaluation of the Impression of Danger Signals - Comparison between Japanese and German Subjects. In A. Schick (Hrsg.): 7. Oldenburger Symposium, BIS Oldenburg, 1997, S. 115-118.

[97len1] H. Lenz, D. Obradovic: Effectivity of the Stabilization of Higher Period Orbits of Chaotic Maps. Proc. of Int. Conf. on Control of Oscillations and Chaos (COC’97, St. Petersburg, Rußland, 27.-29.08.1997), Vol. 1, S. 152-155.

[97len2] H. Lenz, R. Berstecher: Sliding-Mode Control of Chaotic Pendulum: Stabilization and Targeting of an Unstable Periodic Orbit. Proc. of Int. Conf. on Control of Oscillations and Chaos (COC’97, St. Petersburg, Rußland, 27.-29.08.1997), Vol. 3, S. 586-589.

[97len3] H. Lenz, D. Obradovic: Global Control of Lorenz Chaos. Proc. of IEEE Conf. on Decision and Control (CDC 97, San Diego, USA, 10.-12.12.1997), Vol. 2, S. 1486-1487.

[97len4] H. Lenz, D. Obradovic: Robust Control of the Chaotic Lorenz System. Int. Journal of Bifurcation and Chaos, 1997, Vol. 7, Nr. 12, S. 2847-2854.

[97mor1] P. Morguet, M. Lang: Feature Extraction Methods for Consistent Spatio-Temporal Image Sequence Classification Using Hidden-Markov-Models. Tagungsband ICASSP 97 (München, 1997), S. 2893-2896.

[97mor2] P. Morguet, M. Lang: A Universal HMM-Based Approach to Image Sequence Classification. Tagungsband ICIP 97 (Santa Barbara, USA, 1997), S. III/146-III/149.

[97mue] J. Müller, H. Stahl: The Semantic Structure in Comparison with Other Semantic Representations. Tagungsband SPECOM 97 (Cluj-Napoca, Rumänien, 1997), S. 7-12.

[97obr1] D. Obradovic, H. Lenz: When is OGY Control more than just Pole Placement. Int. Journal of Bifurcation and Chaos, 1997, Vol. 7, Nr. 3, S. 691-699.

[97pfa] T. Pfau, M. Beham, W. Reichl, G. Ruske: Creating Large Subword Units for Speech Recognition. Tagungsband Eurospeech ’97 (Rhodos, Griechenland, 1997), S. 1191-1194.

[97rue] C. von Rücker: Berechnung von Erregungsverteilungen aus FTT-Spektren. Tagungsband DAGA 97 (Kiel, 1997), S. 484-485.

[97sch1] R. Schirmacher, P. Maier, H. Fastl, J. Scheuren: Aktive Geräuschminderung an einem Lüftungskanal. Tagungsband DAGA 97 (Kiel, 1997), S. 199-200.

[97sch2] W. Schmid, J. Chalupper: Die Ausgeprägtheit der Tonhöhe als psychoakustisches Kriterium zur Qualitätsbeurteilung elektroakustischer Komponenten - warum ist der Phasengang des Differenztons 2. Ordnung so wichtig? 19. Tonmeistertagung - International Convention on Sound Design (Karlsruhe, 1996), Bildungswerk des Verbandes Deutscher Tonmeister (VDT), Berlin, 1997, S. 861-874.


[97sch3] W. Schmid: Zur Ausgeprägtheit der Tonhöhe gedrosselter und amplitudenmodulierter Sinustöne. Tagungsband DAGA 97 (Kiel, 1997), S. 355-356.

[97sta] H. Stahl, J. Müller, M. Lang: Controlling Limited-Domain Applications by Probabilistic Semantic Decoding of Natural Speech. Tagungsband ICASSP 97 (München, 1997), S. 1163-1166.

[97ste1] I. Stemplinger, M. Schiele, B. Meglic, H. Fastl: Einsilberverständlichkeit in unterschiedlichen Störgeräuschen für Deutsch, Ungarisch und Slowenisch. Tagungsband DAGA 97 (Kiel, 1997), S. 77-78.

[97ste2] I. Stemplinger: Beurteilung der Globalen Lautheit bei Kombination von Verkehrsgeräuschen mit simulierten Industriegeräuschen. Tagungsband DAGA 97 (Kiel, 1997), S. 353-354.

[97ste3] I. Stemplinger, G. Gottschling: Auswirkungen der Bündelung von Verkehrswegen auf die Beurteilung der Globalen Lautheit. Tagungsband DAGA 97 (Kiel, 1997), S. 401-402.

[97ste4] I. Stemplinger, H. Fastl: Accuracy of Loudness Percentile Versus Measurement Time. Tagungsband inter-noise 97 (Budapest, Ungarn, 1997), Band III, S. 1347-1350.

[97ter] E. Terhardt: Lineares Modell der peripheren Schallübertragung im Gehör. Tagungsband DAGA 97 (Kiel, 1997), S. 367-368.

[97val] M.N. Valenzuela: Extraktion gehörrelevanter Schallsignalparameter aus Flügelklängen. Tagungsband DAGA 97 (Kiel, 1997), S. 321-322.

[97wag1] C. Wagner: Successive deceleration in Boltzmann-like traffic equations. Physical Review E, 1997, Vol. 55, S. 6969.

[97wag2] C. Wagner: A Navier-Stokes-like traffic model. Physica A, 1997, Vol. 245, S. 124.

[97win1] H.-J. Winkler, M. Lang: On-line Symbol Segmentation and Recognition in Handwritten Mathematical Expressions. Tagungsband ICASSP 97 (München, 1997), S. 3377-3380.

[97win2] H.-J. Winkler, M. Lang: Symbol Segmentation and Recognition for Understanding Handwritten Mathematical Expressions. In: A.C. Downton, S. Impedovo (Hrsg.): „Progress in Handwriting Recognition“, Tagungsband 5th International Workshop on Frontiers in Handwriting Recognition IWFHR-5 (Essex, England, 1996), World Scientific, 1997, S. 407-412.

[97wol] H. Wollherr, S. Goossens, K.D. Ruth, H. Fastl: Horizontal / Vertikal differenziertes Bündelungsmaß. Tagungsband DAGA 97 (Kiel, 1997), S. 123-124.

1998

[98bub1] U. Bub, H. Höge: Boosting Long-Term Adaptation of Hidden-Markov-Models: Incremental Splitting of Probability Density Functions. Proc. of ICASSP 98 (Seattle, USA, 12.-15.5.1998), Vol. 1, S. 429-432.

[98cha1] J. Chalupper, K. Spasokukotskij, I. Stemplinger, H. Fastl: Ein Zweisilber-Sprachtest für Ukrainisch. Tagungsband DAGA 98 (Zürich, Schweiz, 1998), S. 310-311.

[98eng1] K.-H. Englmeier, C. Krapichler, M. Haubner et al.: Virtual reality and multimedia human-computer interaction in medicine. Proc. of IEEE Workshop on Multimedia Signal Processing (MMSP ’1998, Los Angeles, USA), S. 193-202.

[98eng2] K.-H. Englmeier, C. Krapichler et al.: A New Hybrid Renderer for Virtual Bronchoscopy. J.D. Westwood et al. (Eds.): Studies in Health Technology and Informatics: Medicine Meets Virtual Reality, IOS Press Amsterdam, 1999, Band 62, S. 109-115.

[98fal1] R. Faltlhauser, G. Ruske: Automatische Topologiegenerierung für kontinuierliche Hidden-Markov-Modelle. Tagungsband ITG-Fachtagung „Sprachkommunikation“ (Dresden, 1998), S. 33-36.

[98fas1] H. Fastl, Th. Filippou, W. Schmid, S. Kuwano, S. Namba: Psychoakustische Beurteilung der Lautheit von Geräuschimmissionen verschiedener Verkehrsträger. Tagungsband DAGA 98 (Zürich, Schweiz, 1998), S. 70-71.

[98fas2] H. Fastl, H. Oberdanner, W. Schmid, I. Stemplinger, I. Hochmair-Desoyer, E. Hochmair: Zum Sprachverständnis von Cochlea-Implantat-Patienten bei Störgeräuschen. Tagungsband DAGA 98 (Zürich, Schweiz, 1998), S. 358-359.

[98fas3] H. Fastl, W. Schmid: Vergleich von Lautheits-Zeitmustern verschiedener Lautheits-Analysesysteme. Tagungsband DAGA 98 (Zürich, Schweiz, 1998), S. 466-467.

[98fas4] H. Fastl: Psychoacoustics and Sound Quality Metrics. Proceedings of the 1998 Sound Quality Symposium (Ypsilanti, Michigan, USA, 1998), S. 3-10.

[98fas5] H. Fastl: Intelligibility of Speech in Noise by Cochlea-Implant Patients. Proceedings Fall Meeting, Acoustical Society of Japan (Yonezawa, Japan, 1998), Vol. 1, S. 359-360.

[98fas6] H. Fastl: Pitch Strength and Frequency Discrimination for Noise Bands or Complex Tones. "Psychophysical and Physiological Advances in Hearing", Whurr Publishers Ltd. London, England, S. 238-245.

[98fas7] H. Fastl, J. Scheuren (Eds.): euro-noise 98, Designing for Silence. Proceedings of Euro-Noise 98 (München 1998).

[98got1] G. Gottschling, W. Schmid, H. Fastl: Vergleich psychoakustischer Methoden zur Skalierung der Lautstärke: I. Grundlagen. Tagungsband DAGA 98 (Zürich, Schweiz, 1998), S. 476-477.

[98got2] G. Gottschling, H. Fastl: Prognose der globalen Lautheit von Geräuschimmissionen anhand der Lautheit von Einzelereignissen. Tagungsband DAGA 98 (Zürich, Schweiz, 1998), S. 478-479.

[98hau1] M. Haubner, C. Krapichler et al.: Neue Techniken der virtuellen Realität für den Einsatz in der Medizin. EDV-Jahrbuch 1998, Hüthig Fachverlage, Heidelberg, 1998, Computer in Zahnarztpraxis und Dentallabor, S. 90-93.

[98hor1] T. Horn: Image Processing of Speech with Auditory Magnitude Spectrograms. Acustica/Acta Acustica, Band 84 (1998), S. 175-177.

[98hut1] Ch. Huth, W. Schmid: Psychoakustische Untersuchungen zur Ausgeprägtheit der Tonhöhe bei musikalischen Klängen mit und ohne Vibrato. Tagungsband DAGA 98 (Zürich, Schweiz, 1998), S. 448-449.

[98koe1] J. Köhler: Language Adaptation of Multilingual Phoneme Models for Vocabulary Independent Speech Recognition Tasks. Proc. of ICASSP 98 (Seattle, USA, 12.-15.5.1998), Vol. 1, S. 417-420.

[98kra1] C. Krapichler, M. Haubner et al.: VR interaction techniques for medical imaging applications. Computer Methods and Programs in Biomedicine, April 1998, 56(1), S. 65-74.

[98len1] H. Lenz, R. Berstecher, M. Lang: Adaptive Sliding-Mode Control of the Absolute Gain. Proc. of 4th IFAC Nonlinear Control System Design Symposium (NOLCOS 98, Enschede, Niederlande, 01.-03.07.1998), Vol. 3, S. 667-672.

[98mas1] D. Mass, M. Lang: Effects of Porous Absorbing Material on the Coupling of Mode Shapes via the Modal Damping Matrix of a Fluid. Tagungsband Euro Noise 98, Vol. 2 (München, 1998), S. 883-888.

[98mor1] P. Morguet, M. Lang: An Integral Stochastic Approach to Image Sequence Segmentation and Classification. Tagungsband ICASSP 98 (Seattle, USA, 1998), S. 2705-2708.

[98mor2] P. Morguet, M. Lang: Spotting Dynamic Hand Gestures in Video Image Sequences Using Hidden Markov Models. Tagungsband ICIP 98 (Chicago, USA, 1998), Vol. 3, S. 193-197.

[98mor3] P. Morguet, M. Lang: Dynamische Gesten und indirekte Manipulation als Grundlage für eine intuitive Mensch-Maschine-Kommunikation. Tagungsband der ITG-Fachtagung: Technik für den Menschen - Gestaltung und Einsatz benutzerfreundlicher Produkte, 1988 (Eichstätt, 1998), S. 131-139.

[98mue1] J. Müller, H. Stahl: Speech Understanding and Speech Translation in Various Domains by Maximum a-posteriori Semantic Decoding. Tagungsband EIS 98 (La Laguna, Spanien, 1998), Band 2 „Neural Networks“, S. 256-267.

[98mue2] J. Müller, C. Krapichler, L.S. Nguyen, K.H. Englmeier, M. Lang: Speech Interaction in Virtual Reality. Tagungsband ICASSP 98 (Seattle, USA, 1998), S. 3757-3760.

[98mue3] J. Müller, M. Lang: Verstehen natürlicher Sprache für den Mensch-Maschine-Dialog. In H. Bubb (Hrsg.): „Bay-FORERGO - Plädoyer für einen bayerischen Forschungsverbund Ergonomie“, Herbert Utz Verlag, München, 1998, S. 91-113.

[98mum1] M. Mummert: Sprachcodierung durch Konturierung eines gehörangepaßten Spektrogramms und ihre Anwendung zur Datenreduktion. Düsseldorf: VDI Verlag, 1998, 240 S., 42 Abb., 11 Tab. (VDI Reihe 10: Informatik/Kommunikationstechnik, Nr. 522)

[98pfa1] T. Pfau, G. Ruske: Estimating the Speaking Rate by Vowel Detection. Tagungsband ICASSP 98 (Seattle, USA, 1998), S. 945-948.

[98pfa2] T. Pfau, G. Ruske: Creating Hidden Markov Models for Fast Speech. Tagungsband ICSLP 98 (Sydney, Australien, 1998), Paper no. 255.

[98rue1] C. von Rücker: Spektraltonhöhenanalyse unter Berücksichtigung von Akzentuierung. Tagungsband DAGA 98 (Zürich, Schweiz, 1998), S. 498-499.

[98rus1] G. Ruske, R. Faltlhauser, T. Pfau: Extended Linear Discriminant Analysis (ELDA) for Speech Recognition. Tagungsband ICSLP 98 (Sydney, Australien, 1998), Paper no. 100.

[98sch1] W. Schmid, J. Chalupper: Spektraltonhöhen Komplexer Töne: Psychoakustische Experimente und Berechnung der Ausgeprägtheit der Tonhöhe. Tagungsband DAGA 98 (Zürich, Schweiz, 1998), S. 480-481.

[98sch2] W. Schmid: Die akzentuierende Wirkung auf Spektraltonhöhen Komplexer Töne. Tagungsband DAGA 98 (Zürich, Schweiz, 1998), S. 468-469.

[98sch3] W. Schmid: Zur Ausgeprägtheit der Tonhöhe von Rauschen mit zeitvarianter Bandbegrenzung. Tagungsband DAGA 98 (Zürich, Schweiz, 1998), S. 470-471.

[98sch4] A. Schreyer, G. Suda, G. Maderlechner: Font Style Detection Using Textons. Proc. of Document Analysis Systems Workshop (DAS 98, Nagano, Japan, 4.-6.11.1998), S. 99-108.

[98sch5] A. Schreyer, G. Suda, G. Maderlechner: The Idea of Attention-Based Document Analysis. Proc. of Document Analysis Systems Workshop (DAS 98, Nagano, Japan, 4.-6.11.1998), S. 214-217.

[98ste1] I. Stemplinger, Th. Filippou: Psychoakustische Untersuchungen zur Lautheit und zur Lästigkeit von Tennislärm. Tagungsband DAGA 98 (Zürich, Schweiz, 1998), S. 66-67.

[98ter1] E. Terhardt: Akustische Kommunikation. Springer, Berlin/Heidelberg, 1998, 505 S., 221 Abb., 15 Tab., 1 DDD Audio-CD.

[98val1] M.N. Valenzuela: Bewertung der Gehörrelevanz von Partialtonzeitstrukturen in Klaviertönen. Tagungsband DAGA 98 (Zürich, Schweiz, 1998), S. 632-633.

[98val2] M.N. Valenzuela: Untersuchungen und Berechnungsverfahren zur Klangqualität von Klaviertönen. München: Herbert Utz Verlag Wissenschaft, 1998, 154 S.

[98val3] J. Vallen, G. Ruske: Acoustic Emission Source Discrimination by Analyzing Short-term Frequency Spectra. Intern. Symposium on Acoustic Emission: Standards and Technology Update, American Standards for Testing Materials ASTM STP 1353 (Plantation, Florida, USA, 1998), Paper ID #4604.

[98wag1] C. Wagner: Traffic flow models considering an internal degree of freedom. Journal of Statistical Physics, 1998, Vol. 90, S. 1251.

[98wid1] U. Widmann, R. Lippold, H. Fastl: Ein Computerprogramm zur Simulation der Nachverdeckung für Anwendungen in Akustischen Meßsystemen. Tagungsband DAGA 98 (Zürich, Schweiz, 1998), S. 96-97.

[98wid2] U. Widmann, R. Lippold, H. Fastl: A Computer Program Simulating Post-Masking for Applications in Sound Analysing Systems. Proceedings of NOISE-CON ‘98 (Ypsilanti, Michigan, USA, 1998), S. 451-456.

[98wid3] U. Widmann, H. Fastl: Calculating Roughness using Time-Varying Specific Loudness Spectra. Proceedings of the 1998 Sound Quality Symposium (Ypsilanti, Michigan, USA, 1998), S. 55-60.

[98wit1] L. Witta, M. Lang: A Finite Element for Bulk Reacting Porous Sound Absorbers. Tagungsband Euro Noise 98, Vol. 2 (München, 1998), S. 943-948.

1999

[99cha1] Chalupper, J.: Loudness fluctuation and temporal masking in normal and hearing-impaired listeners. JASA Vol. 105, No. 2, Pt. 2 (1999), S. 1023.

[99fal1] Faltlhauser, R.; Pfau, T.; Ruske, G.: Hidden Markov Models for Fast Speech by Optimized Clustering. Proc. of the 6th European Conf. on Speech Communication and Technology, Budapest, Ungarn, 5.-9.9.1999. Hrsg.: G. Olaszy et al. Bonn: ESCA, 1999, S. 407-410. (EUROSPEECH’99; Vol. 1)

[99fas1] Fastl, H.: Analysis Systems for Psychoacoustic Magnitudes. 8th Oldenburg Symposium on Psychological Acoustics, Bad Zwischenahn, 29.8.-1.9.1999. Eds.: A. Schick, M. Meis, C. Reckhardt. Oldenburg: BIS-Verlag, 2000, S. 85-101. (Contributions to Psychological Acoustics)

[99fas2] Fastl, H.: Psychoacoustic evaluation of noise emissions. JASA Vol. 105, No. 2, Pt. 2 (1999), S. 1082-1083.

[99fil1] Filippou, Th.: Comparison of Subjective and Physical Evaluation of Tennis-Noise. Proceedings of the 28th Int. Congress on Noise Control Engineering, Fort Lauderdale, Florida, USA, 6.-8.12.1999. Eds.: J. Cushieri, St. Glegg, Y. Yong. Washington DC: Inst. of Noise Control Engineering, 1999. S. 1881-1886. (INTER-NOISE 99; Vol. 3)

[99fil2] Filippou, Th.; Fastl, H.: Estimates of the instantaneous versus overall loudness of noise emissions. JASA Vol. 105, No. 2, Pt. 2 (1999), S. 1298.

[99got1] Gottschling, G.: On the relations of instantaneous and overall loudness. Acustica/acta acustica, 1999, Vol. 85, Nr. 3, S. 427-429.

[99hir1] Hirsch, H.S.; Wiegrebe, L.; Patterson, R.D.; Fastl, H.: Temporal dynamics of pitch strength; frequency effects and auditory modelling. British J. Audiology, 33.2 (1999), S. 111.

[99hut1] Huth, Ch.; Fastl, H.; Hölzl, G.; Widmann, U.: Sound quality design for high-speed trains’ indoor noise: Psychoacoustic evaluation of tonal components. JASA Vol. 105, No. 2, Pt. 2 (1999), S. 1280.

[99kra1] Krapichler, Ch.: Eine neue Mensch-Maschine-Schnittstelle für die Analyse medizinischer 3D-Bilddaten in einer virtuellen Umgebung. München: Herbert Utz Verlag Wissenschaft, 1999, 130 S.

[99kra2] Krapichler, Ch.; Haubner, M. et al.: Physicians in virtual environments: Multimodal human-computer interaction. Interacting with Computers, 1999, Vol. 11, 4, S. 427-452.

[99kuw1] Kuwano, S.; Namba, S.; Florentine, M.; Zheng, D.R.; Fastl, H.; Schick, A.: A cross-cultural study of the factors of sound quality of environmental noise. JASA Vol. 105, No. 2, Pt. 2 (1999), S. 1081.

[99kuw2] Kuwano, S.; Namba, S.; Florentine, M.; Zheng, D.R.; Fastl, H.; Schick, A.; Weber, R.; Höge, H.: A cross-cultural study of the factors of sound quality of environmental noise. ASA/DEGA/EAA Conference FORUM ACUSTICUM, Berlin, 14.-19.3.1999. Hrsg.: DEGA, Oldenburg, als CD-ROM (Collected Papers from the Joint Meeting „Berlin 99“).

[99kuw3] Kuwano, S.; Fastl, H.; Namba, S.: Loudness, Annoyance and Unpleasantness of Amplitude Modulated Sounds. Proceedings of the 28th Int. Congress on Noise Control Engineering, Fort Lauderdale, Florida, USA, 6.-8.12.1999. Eds.: J. Cushieri, St. Glegg, Y. Yong. Washington DC: Inst. of Noise Control Engineering, 1999. S. 1195-1200. (INTER-NOISE 99; Vol. 2)

[99len1] Lenz, H.; Obradovic, D.: Stabilizing Higher Periodic Orbits of Chaotic Discrete-Time Maps. Int. Journal of Bifurcation and Chaos, 1999, Vol. 9, Nr. 1, S. 251-266.

[99len2] Lenz, H.; Sollacher, R.: Nonlinear Speed-Control for a Continuum Theory of Traffic Flow. Proc. of the 14th World Congress of Int. Federation of Automatic Control, IFAC, Peking, China, 05.-09.01.1999, Vol. Q, S. 67-72.

[99len3] Lenz, H.; Wagner, C.; Sollacher, R.: Multi-Anticipative Car-Following Model. European Physical Journal B, Vol. 7, S. 331-335.

[99mad1] Maderlechner, G.; Schreyer, A.; Suda, P.: Information Extraction from Document Images Using Attention Based Layout Segmentation. Proc. of Document Layout Interpretation and its Applications Workshop (DLIA 99, Bangalore, Indien, 18.09.1999), Online-Proceedings, Beitrag Ib: http://www.science.uva.nl/events/dlia99/.

[99mor1] Morguet, P.; Lang, M.: Comparison of Approaches to Continuous Hand Gesture Recognition for a Visual Dialog System. Proceedings ICASSP 99 (Phoenix, Arizona, USA, 15.-19.3.1999), IEEE, Vol. 6, S. 3549-3552.

[99mue1] Müller, J.; Stahl, H.: Speech Understanding and Speech Translation by Maximum a-Posteriori Semantic Decoding. Artificial Intelligence in Engineering, Vol. 13 (1999), No. 4, Elsevier-Verlag, S. 373-384.

[99pfa1] Pfau, T.; Faltlhauser, R.; Ruske, G.: Speaker Normalization and Pronunciation Variant Modeling: Helpful Methods for Improving Recognition of Fast Speech. Proc. of the 6th European Conf. on Speech Communication and Technology, Budapest, Ungarn, 5.-9.9.1999. Hrsg.: G. Olaszy et al. Bonn: ESCA, 1999, S. 299-302. (EUROSPEECH ’99; Vol. 1)

[99rus1] Ruske, G.; Lee, K.Y.: Speech recognition and enhancement by a nonstationary AR HMM with gain adaptation under unknown noise. Proceedings ICASSP 99 (Phoenix, Arizona, USA, 15.-19.3.1999). IEEE, 1999, Vol. 1, S. 441-444.

[99sch1] Schmid, W.: Zur Ausgeprägtheit der Tonhöhe: Konzepte und neuere Ergebnisse psychoakustischer Experimente. Tagungsband der 20. Tonmeistertagung - International Convention on Sound Design, Karlsruhe, 20.-23.11.1998. Hrsg.: Bildungswerk des Verbandes Deutscher Tonmeister (VDT), Bergisch-Gladbach. München: Sauer-Verlag, 1999, S. 1051-1066.

[99sch2] Schreyer, A.; Suda, P.; Maderlechner, G.: A Formal Approach To Textons and its Application to Font Style Detection. Lee, S.-W.; Nakano, Y. (Eds.): Document Analysis Systems; Theory and Practice, Third IAPR Workshop, DAS’98, Nagano, Japan, 04.-06.11.1998, Selected Papers, Springer-Verlag, 1999, Lecture Notes in Computer Science, Vol. 1655.

[99ste1] Stemplinger, I.: Beurteilung, Messung und Prognose der Globalen Lautheit von Geräuschimmissionen. München: Herbert Utz Verlag Wissenschaft, 1999, 112 S.

[99wie1] Wiegrebe, L.; Hirsch, H.S.; Patterson, R.D.; Fastl, H.: Time constants of pitch processing arising from auditory filtering. JASA Vol. 105, No. 2, Pt. 2 (1999), S. 1234.

[99zwi1] Zwicker, E.†; Fastl, H.: Psychoacoustics Facts and Models. 2nd updated edition. Berlin/Heidelberg: Springer-Verlag, 1999, 416 S., 289 Abb.

2000

[00bau1] Baumann, U.: Identification and segregation of multiple auditory objects. In: Auditory Worlds: Sensory Analysis and Perception in Animals and Man (siehe [00man1]), S. 274-278.

[00cha1] Chalupper, J.; Fastl, H.: Simulation of Hearing Impairment Based on the Fourier Time Transformation. Proc. ICASSP’2000, Istanbul, Türkei, 5.-9.6.2000. IEEE, Vol. 2, S. 857-860.

[00cha2] Chalupper, J.: Modellierung der Lautstärkeschwankung für Normal- und Schwerhörige. Tagungsband DAGA 2000, 26. Jahrestagung Akustik, Oldenburg, 20.-24.3.2000, S. 254-255.

[00cha3] Chalupper, J.: Aural Exciter and Loudness Maximizer: What’s Psychoacoustic about "Psychoacoustic Processors"? 109th AES Convention in Los Angeles, USA, 22.-25.9.2000, Audio Engineering Society Preprint, Nr. 5208.

[00cha4] Chalupper, J.; Wimmer, H.; Schmid, W.: Sprachübertragung mit Induktionsschleifen (Speech Transmission by Induction Loops: Physical and Psychoacoustic Measurements). Tagungsband der 21. Tonmeistertagung – VDT International Audio Convention, Hannover, 24.-27. November 2000, Verband Deutscher Tonmeister.

[00fal1] Faltlhauser, R.; Pfau, T.; Ruske, G.: On-line Speaking Rate Estimation using Gaussian Mixture Models. Proc. ICASSP’2000, Istanbul, Türkei, 5.-9.6.2000. IEEE, Vol. 3, S. 1355-1358.

[00fal2] Faltlhauser, R.; Pfau, T.; Ruske, G.: On-line Speaking Rate Estimation Using a GMM/NN Approach. Tagungsband ITG-Fachtagung "Sprachkommunikation", Ilmenau, 9.-12.10.2000, VDE Verlag, Berlin, Offenbach 2000, Hrsg.: ITG, S. 101-105.

[00fal3] Faltlhauser, R.; Pfau, T.; Ruske, G.: On the Use of Speaking Rate as a Generalized Feature to Improve Decision Trees. Proc. of ICSLP 2000, Peking, China, 16.-20.10.2000, China Military Friendship Publish, Vol. 1, S. 317-320.

[00fas1] Fastl, H.: The presentation of stimuli in psychoacoustics. In: Auditory Worlds: Sensory Analysis and Perception in Animals and Man (siehe [00man1]), S. 246-247.

[00fas2] Fastl, H.: Masking effects. In: Auditory Worlds: Sensory Analysis and Perception in Animals and Man (siehe [00man1]), S. 247-251.

[00fas3] Fastl, H.: Basic hearing sensations. In: Auditory Worlds: Sensory Analysis and Perception in Animals and Man (siehe [00man1]), S. 251-258.

[00fas4] Fastl, H.: Loudness and noise evaluation. In: Auditory Worlds: Sensory Analysis and Perception in Animals and Man (siehe [00man1]), S. 258-267.

[00fas5] Fastl, H. for Zwicker, E.†: Psychoacoustically-based models of the inner ear. In: Auditory Worlds: Sensory Analysis and Perception in Animals and Man (siehe [00man1]), S. 76-80.

[00fas6] Fastl, H. for Zwicker, E.†: Otoacoustic Emissions in human test subjects. In: Auditory Worlds: Sensory Analysis and Perception in Animals and Man (siehe [00man1]), S. 120-127.

[00fas7] Fastl, H.: Railway Bonus and Aircraft Malus: Subjective and Physical Evaluation. Proc. of the 5th Int. Symposium Transport Noise and Vibration, TRANSPORT NOISE 2000, 6.-8.6.2000, St. Petersburg, Russland. CD-ROM publ. by EEAA.

[00fas8] Fastl, H.: Sound Measurements Based on Features of the Human Hearing System. Tagungsband DAGA 2000, 26. Jahrestagung Akustik, Oldenburg, 20.-24.3.2000, S. 90-91.

[00fas9] Fastl, H.; Patsouras, D.: Pure tone plus bandlimited noise as Zwicker-tone-exciter. Tagungsband DAGA 2000, 26. Jahrestagung Akustik, Oldenburg, 20.-24.3.2000, S. 306-307.

[00fas10] Fastl, H.: Noise Evaluation Based on Hearing Sensations. Proc. of the 7th Western Pacific Regional Acoustics Conference, WESTPRAC VII, 3.-5.10.2000, Kumamoto, Japan. The Acoustical Society of Japan, Tokyo, Vol. 1, S. 33-41.

[00fas11] Fastl, H.: Sound Quality of Electric Razors – Effects of Loudness. Proc. INTER-NOISE’2000, 28.-30.8.2000, Nizza, Frankreich. CD-ROM.

[00hai1] Haiber, U.; Mangold, H.; Pfau, T.; Regel-Brietzmann, P.; Ruske, G.; Schleß, V.: Robust Recognition of Spontaneous Speech. W. Wahlster (Ed.): Verbmobil: Foundations of Speech-to-Speech Translation. Springer-Verlag Berlin, Heidelberg, 2000, S. 46-62.

[00hof1] Hofmann, M.; Lang, M.: Belief Networks for a Syntactic and Semantic Analysis of Spoken Utterances for Speech Understanding. Proc. ICSLP 2000, Peking, China, 16.-20.10.2000, China Military Friendship Publish, Vol. II, S. 875-878.

[00hun1] Hunsinger, J.; Lang, M.: A Speech Understanding Module for a Multimodal Mathematical Formula Editor. Proceedings ICASSP 2000, Istanbul, Türkei, 5.-9.6.2000. IEEE, Vol. 4, S. 2413-2416.

[00hun2] Hunsinger, J.; Lang, M.: A Single-Stage Top-Down Probabilistic Approach towards Understanding Spoken and Handwritten Mathematical Formulas. Proc. ICSLP 2000, Peking, China, 16.-20.10.2000, China Military Friendship Publish, Vol. 4, S. 386-389.

[00jun1] Junkawitsch, J.: Detektion von Schlüsselwörtern in fließender Sprache. Aachen: Shaker Verlag, 2000, 149 S. (Berichte aus der Informatik)

[00koe1] Köhler, J.: Erstellung einer statisch modellierten multilingualen Lautbibliothek für die Spracherkennung. Aachen: Shaker Verlag, 2000, 158 S. (Berichte aus der Informatik)

[00kru1] Krump, G.: Der akustische Nachton: Beschreibung und Funktionsschema. Beiträge zur Vibro- und Psychoakustik 3/00. Neubiberg: Universität der Bundeswehr München, 2000, Hrsg.: H. Fleischer, H. Fastl, 101 S.

[00kuw1] Kuwano, S.; Namba, S.; Schick, A.; Höge, H.; Fastl, H.; Filippou, Th.; Florentine, M.; Muesch, H.: The Timbre and Annoyance of Auditory Warning Signals in Different Countries. Proc. INTER-NOISE’2000, 28.-30.8.2000, Nizza, Frankreich. CD-ROM.

[00lan1] Lang, M.; Reichwald, R. (Hrsg.): Anwenderfreundliche Kommunikationssysteme. Tagungsband vom Kongress vom 17.-19.6.1999 in München. Heidelberg: Hüthig Verlag, 2000, 376 S. (Forum Telekommunikation des Münchner Kreises, Bd. 17)

[00mad1] Maderlechner, G.; Schreyer, A.; Suda, P.: Extraction of Relevant Information from Document Images Using Measures of Visual Attention. Proc. of the 15th Int. Conf. on Pattern Recognition, ICPR’2000, Barcelona, Spanien, 03.-08.09.2000, Int. Assoc. of Pattern Recognition IAPR, Surrey, UK, 2000, Vol. 4, S. 385-388.

[00man1] Manley, G.A.; Fastl, H.; Kössl, M.; Oeckinghaus, H.; Klump, G. (Eds.): DFG: Auditory Worlds: Sensory Analysis and Perception in Animals and Man. Final report of the collaborative research centre 204, „Nachrichtenaufnahme und -verarbeitung im Hörsystem von Vertebraten (Munich)“, 1983-1997. Weinheim: Wiley-VCH Verlag, 2000, 359 S.

[00nie1] Niedermaier, B.: "Eyes free - Hands free" oder "Zeit der Stille". Ein Demonstrator zur multimodalen Bedienung im Automobil. DGLR-Bericht 2000-02 der 42. Fachausschusssitzung Anthropotechnik, München, 24.-25.10.2000, "Multimodale Interaktion im Bereich der Fahrzeug- und Prozessführung", S. 299-307.

[00pat1] Patsouras, Ch.: Zur Unterscheidbarkeit und Bevorzugung der Werckmeisterschen Temperatur gegenüber der Gleichschwebung. Tagungsband DAGA 2000, 26. Jahrestagung Akustik, Oldenburg, 20.-24.3.2000, S. 224-225.

[00pat2] Patsouras, Ch.; Fastl, H.; Widmann, U.; Hölzl, G.: Privacy versus Sound Quality in High Speed Trains. Proc. INTER-NOISE’2000, 28.-30.8.2000, Nizza, Frankreich. CD-ROM.

[00pfa1] Pfau, T.; Faltlhauser, R.; Ruske, G.: A Combination of Speaker Normalization and Speech Rate Normalization for Automatic Speech Recognition. Proc. of ICSLP 2000, Peking, China, 16.-20.10.2000, China Military Friendship Publish, Vol. 4, S. 362-365.

[00rue1] von Rücker, C.: Ein Verfahren zur Tonhöhenanalyse unter Berücksichtigung zeitlich-spektraler Kontrasteffekte. München: Herbert Utz Verlag Wissenschaft, 2000, 131 S.

[00rue2] von Rücker, C.: The role of accentuation of spectral pitch in auditory information processing. In: Auditory Worlds: Sensory Analysis and Perception in Animals and Man (siehe [00man1]), S. 278-285.

[00sai1] Said, A.; Fleischer, D.; Fastl, H.; Grütz, H.-P.; Hölzl, G.: Laborversuche zur Ermittlung von Unterschiedsschwellen bei der Wahrnehmung von Erschütterungen im Schienenverkehr. Tagungsband DAGA 2000, 26. Jahrestagung Akustik, Oldenburg, 20.-24.3.2000, S. 496-497.

[00sch1] Schorn, K.; Fastl, H.: Hearing impairment: Evaluation and rehabilitation. In: Auditory Worlds: Sensory Analysis and Perception in Animals and Man (siehe [00man1]), S. 286-310.

[00see1] Seeber, B.: Zum Zusammenhang zwischen Mithörschwellen und Tuningkurven. Tagungsband DAGA 2000, 26. Jahrestagung Akustik, Oldenburg, 20.-24.3.2000, S. 290-291.

[00spa1] Spannheimer, H.; Freymann, R.; Fastl, H.: An Active Absorber to Improve the Sound Quality in the Passenger Compartment of Vehicles. Proc. INTER-NOISE’2000, 28.-30.8.2000, Nizza, Frankreich. CD-ROM.

[00ter1] Terhardt, E.: Linear model of peripheral-ear transduction (PET). In: Auditory Worlds: Sensory Analysis and Perception in Animals and Man (siehe [00man1]), S. 81-89.

[00tho1] Thomae, M.; Ruske, G.; Pfau, T.: A New Approach to Discriminative Feature Extraction using Model Transformation. Proc. ICASSP’2000, Istanbul, Türkei, 5.-9.6.2000. IEEE, Vol. 3, S. 1615-1618.

[00val1] Valenzuela, M.N.: Perceived differences and quality judgement of piano sounds. In: Auditory Worlds: Sensory Analysis and Perception in Animals and Man (siehe [00man1]), S. 268-278.

[00wie1] Wiegrebe, L.; Hirsch, H.S.; Patterson, R.D.; Fastl, H.: Temporal dynamics of pitch strength in regular-interval noises: Effect of listening region and an auditory model. JASA Vol. 107, No. 6, S. 3343-3350.

Imprint

Published by
Prof. Dr. rer. nat. Manfred K. Lang
Lehrstuhl für Mensch-Maschine-Kommunikation
Technische Universität München
D - 80290 München

Revision and Layout
Claus von Rücker
December 2000

PDF Version
revised February 2001

[Map: location of the Institute for Human-Machine Communication on the main campus, building S6, 2nd floor. Streets shown: Gabelsbergerstraße, Theresienstraße, Luisenstraße, Arcisstraße, Barerstraße, Briennerstraße, Karolinenplatz. U-Bahn stop: Königsplatz.]

www.mmk.ei.tum.de