© 2012 The University of Sheffield

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 1

Some insights into talker-listener-environment coupling, energetics and the contrastive particulate structure of spoken language

Prof. Roger K. Moore

Chair of Spoken Language Processing

Dept. Computer Science, University of Sheffield, UK

(Visiting Prof., Dept. Phonetics, University College London)

(Visiting Prof., Bristol Robotics Laboratory)

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 2

‘hyper’ speech

“close” /kləʊs/

“close” /kləʊz/

“league” /liːɡ/

“leak” /liːk/

Local Timescale Variability Analysis

Moore, R. K., Russell, M. J., & Tomlinson, M. J. (1982). Locally constrained dynamic programming in automatic speech recognition, IEEE Int. Conf. on Acoustics, Speech and Signal Processing. Paris.

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 3

H&H Theory

Lindblom, B. (1990). Explaining phonetic variation: a sketch of the H&H theory. In W. J. Hardcastle & A. Marchal (Eds.), Speech Production and Speech Modelling (pp. 403-439): Kluwer Academic Publishers.

"The speaker ... dynamically tunes the production ... for either output-oriented control (hyperspeech) or system-oriented control (hypospeech)."

"Speakers can, and typically do, tune their performance according to communicative and situational demands."

"What he/she needs to control is ... that their signal attributes possess sufficient contrast ... discriminative power that is sufficient for lexical access."

"... the lack of invariance that speech signals commonly exhibit is a direct consequence of this adaptive organization ..."

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 4

Evidence for H&H Behaviour

• People naturally tend to speak louder/differently in noise (Lombard, 1911)

• Caregivers talk differently to children (Fernald, 1985)

• Users talk differently to machines (Moore & Morris, 1992)

• Being able to hear your own voice has a profound effect on speaking (as evidenced by the need for sidetone on a telephone)

• Hearing-impaired individuals can have great difficulty maintaining clear pronunciations (or level control)

• Delayed auditory feedback causes stuttering-like behaviour

• Altered auditory feedback evokes compensations (Munhall et al, 2009; MacDonald et al, 2011)

• People with speaking difficulties (e.g. caused by cerebral palsy) report that it takes immense effort to produce even the simplest utterance

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 5

Speech Energetics

Robin Hofe

Hofe, R., & Moore, R. K. (2008). Towards an investigation of speech energetics using 'AnTon': an animatronic model of a human tongue and vocal tract. Connection Science, 20(4), 319–336.

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 6

‘AnTon’ – Animatronic Tongue

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 7

What Principle is Operating Here?

• The effort/energy involved in production is being traded against the clarity/discriminability of the end product

• Why? Because living systems have to move in order to signal, and movement costs energy

• Large movements create clear signals but use more energy (and vice versa)

• The efficient management of energy is a major evolutionary factor in the behaviour of living systems (not just for signalling)

• Sometimes large movements are necessary due to the need to overcome an obstacle in the environment

This is an ‘optimisation’ problem

This is a ‘compensation’ problem
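The trade-off on this slide can be illustrated with a toy optimisation. The following Python sketch is not from the talk: the logistic listener model, the `error_penalty` weight and the effort grid are all assumptions chosen purely for illustration. It selects the effort that minimises energy spent plus the expected cost of being misperceived, and shows that the selected effort rises when noise (an obstacle in the environment) increases.

```python
import math

def misperception_probability(effort, noise):
    """Hypothetical listener model: effort adds contrast, noise removes it."""
    contrast = effort - noise
    return 1.0 / (1.0 + math.exp(4.0 * contrast))

def total_cost(effort, noise, error_penalty=10.0):
    """Energy spent on production plus the expected cost of being misperceived."""
    return effort + error_penalty * misperception_probability(effort, noise)

def best_effort(noise, candidates=None):
    """Grid search for the effort level with the lowest total cost."""
    if candidates is None:
        candidates = [i / 100 for i in range(0, 501)]  # assumed effort range 0..5
    return min(candidates, key=lambda e: total_cost(e, noise))

quiet = best_effort(noise=0.0)   # 'hypo' end: little effort is needed
noisy = best_effort(noise=2.0)   # 'hyper' end: more effort to overcome the noise
print(quiet, noisy)
```

Under these assumed parameters the selected effort tracks the noise level (the ‘compensation’ case), while for any fixed noise level the talker spends no more effort than the cost of misperception justifies (the ‘optimisation’ case).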

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 8

Analogy with Walking

Which situations are ‘natural’?

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 9

Behaviour

“… principal determinant of a system is not a stimulus, an event in the past, but a future result of the behaviour …”

Alexandrov, Y. I., & Sams, M. E. (2005). Emotion and consciousness: ends of a continuum. Cognitive Brain Research, 25, 387-405.

“feedback … is the central and determining factor in all observed behavior”

W. T. Powers (1973). Behavior: The Control of Perception, Aldine, Chicago.

These are examples of ‘regulatory’ processes …

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 10

A Regulatory Mechanism: Feedback Control

Lindblom, B. (1990). Explaining phonetic variation: a sketch of the H&H theory. In W. J. Hardcastle & A. Marchal (Eds.), Speech Production and Speech Modelling (pp. 403-439): Kluwer Academic Publishers.

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 11

Perceptual Control Theory (PCT)

W. T. Powers (1973). Behavior: The Control of Perception, Aldine, Chicago.

[Diagram labels: Intention, Realisation, Sensation, Perception]
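A minimal sketch of the idea that behaviour controls perception rather than output, assuming a single scalar perception and a simple integrating controller. The function name, the `gain` value and the additive disturbance are illustrative assumptions, not taken from Powers or from the talk. The loop compares the intended perception (the reference) with the sensed perception and adjusts its output until the two match, whatever the disturbance.

```python
def run_perceptual_control(reference, disturbances, gain=0.5):
    """Return (output, perception) pairs for a loop that controls its perception.

    perception = output + disturbance (assumed additive, for illustration only).
    """
    output = 0.0
    history = []
    for d in disturbances:
        perception = output + d            # sensation of the realised action
        error = reference - perception     # intention minus perception
        output += gain * error             # integrate the error into the action
        history.append((output, output + d))
    return history

# Quiet conditions for 20 steps, then a disturbance that masks the signal.
history = run_perceptual_control(reference=1.0,
                                 disturbances=[0.0] * 20 + [-3.0] * 20)
final_output, final_perception = history[-1]
print(final_output, final_perception)
```

In this toy run the output changes substantially once the disturbance appears, while the perception returns to the reference: the action varies so that the perception need not.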

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 12

Communicative Intentions?

• The ‘target’ is a perception not a signal

• So optimisation is over competing perceptions not competing signals (e.g. not signal-to-noise ratio)

• The intention is sufficient contrast at the pragmatic level (which will lead to suitable compensations at the semantic, syntactic, lexical, phonemic, phonetic and acoustic levels)

• The obstacles are …
– alternative interpretations (internal)
– competing signals (external)

Hawkins, S. (2003). Roles and representations of systematic fine phonetic detail in speech understanding. Journal of Phonetics, 31, 373-405.

“I … do … not … know”

“I do not know”

“I don’t know”

“I dunno”

“dunno”

[ə̃ə̃ə̃]

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 13

An Affective Model of Behaviour

[Diagram labels: Intention, Attention, Consequence, Emotion (valence), Motivation/Effort (arousal), Kp, Motor Command, Action, Appraisal, Disturbance]

DiStefano III, J. J., Stubberud, A. R., & Williams, I. J. (1990). Feedback and Control Systems (2nd ed.). New York: McGraw-Hill.

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 14

Reactive Speech Synthesis

[Diagram labels: TTS, HSR, ASR, s, w, w, d, n+d, +, −]

Moore, R. K. (2007). Spoken language processing: piecing together the puzzle. Speech Communication, 49, 418-435.

This is ‘synthesis-by-recognition’

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 15

Reactive Speech Synthesis

[Diagram labels: FEEDFORWARD, FEEDBACK]

Mauro Nicolao

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 16

Reactive Speech Synthesis

Mauro Nicolao

Moore, R. K., & Nicolao, M. (2011). Reactive speech synthesis: actively managing phonetic contrast along an H&H continuum, 17th International Congress of Phonetic Sciences (ICPhS). Hong Kong.

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 17

Emulation Mechanisms

[Diagram labels: CONTROLLER, PLANT, EMULATOR; goal, goal, behaviour, control signal, duplicate control signal]

Inhibition of motor commands facilitates mental imagery and planning

Grush, R. (2004). The emulation theory of representation: motor control, imagery, and perception. Behavioral and Brain Sciences, 27, 377-442.

Forward model facilitates prediction

• Negative feedback control works OK as long as the loop delays are not too high, but neural delays can be quite significant in living systems

• Delays can be overcome using an ‘emulation’ mechanism (pseudo-closed-loop control)
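The two bullets above can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the model from Grush (2004) or from the talk: the plant is a simple accumulator, the emulator is assumed to be a perfect forward model driven by a duplicate of the control signal, and `gain` and `delay` are arbitrary values chosen so that the delayed loop misbehaves.

```python
from collections import deque

def simulate(goal, steps, gain, delay, use_emulator):
    """Drive an accumulator plant toward `goal`; return the plant trajectory."""
    x = 0.0                            # true plant state
    x_hat = 0.0                        # emulator's estimate of the plant state
    in_transit = deque([0.0] * delay)  # sensory signals still travelling back
    trajectory = []
    for _ in range(steps):
        in_transit.append(x)
        delayed_x = in_transit.popleft()   # plant state as it was `delay` steps ago
        feedback = x_hat if use_emulator else delayed_x
        u = gain * (goal - feedback)       # control signal
        x += u                             # plant responds to the control signal
        x_hat += u                         # emulator gets a duplicate of it
        trajectory.append(x)
    return trajectory

delayed = simulate(goal=1.0, steps=60, gain=0.8, delay=3, use_emulator=False)
emulated = simulate(goal=1.0, steps=60, gain=0.8, delay=3, use_emulator=True)
print(max(abs(v - 1.0) for v in delayed), abs(emulated[-1] - 1.0))
```

With these assumed values the loop that waits for delayed feedback overshoots and oscillates with growing amplitude, whereas the loop that consults the emulator settles on the goal. A real emulator would be imperfect and would itself need correcting from the delayed sensory signal, which this sketch omits.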

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 18

Emulate ⇒ Predict ⇒ Understand

• Simulations (forward models) generate top-down expectations/predictions

• The emulation of an organism’s abilities can be recruited to emulate the behaviour of others for:
– predicting their behaviour
– mimicry
– learning

• The (emergent) consequences would be:
– an ability to interpret the behaviour of other(s) by referencing self
– an ability to estimate the ‘beliefs, desires and intentions’ of other(s)
– i.e. recognition and understanding

Wilson, M., & Knoblich, G. (2005). The case for motor involvement in perceiving conspecifics. Psychological Bulletin, 131(3), 460-473.

This is ‘recognition-by-synthesis’

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 19

PreSenCE

Predictive Sensorimotor Control and Emulation

Moore, R. K. (2007). Spoken language processing: piecing together the puzzle. Speech Communication, 49, 418-435.

Moore, R. K. (2007). PRESENCE: A human-inspired architecture for speech-based human-machine interaction. IEEE Trans. Computers, 56(9), 1176-1188.

‘SbR’ + ‘RbS’

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 20

PreSenCE

[Diagram: PreSenCE architecture. Node labels: S:i, S:m, S:n, S:E(U:i), S:E(U:m), S:E(U:n), S:E(U:E(S:i)), S:E(U:E(S:m)). Signal labels: motivation, feeling, sensitivity, attention, interpretation, action, needs, intention; noise, distortion, reaction, disturbance.]

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 21

PreSenCE

[PreSenCE diagram repeated from slide 20, annotated:]

Primary route for System’s motor behaviour

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 22

PreSenCE

[PreSenCE diagram repeated from slide 20, annotated:]

Emulation of System’s possible motor actions

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 23

PreSenCE

[PreSenCE diagram repeated from slide 20, annotated:]

System’s emulation of User’s emulation of System

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 24

PreSenCE

[PreSenCE diagram repeated from slide 20, annotated:]

System’s interpretation of User behaviour

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 25

PreSenCE

[PreSenCE diagram repeated from slide 20, annotated:]

Affective signals drive the control loop(s)

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 26

PreSenCE

Fundamental Dimensions of Behaviour

• Energy management
– effort
– attention

• Time management
– synchrony
– coordination

• Entropy management
– complexity
– combinatorics

• Emotion management
– needs/drives
– personality/mood

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 27

Predict ⇒ Optimise ⇒ Survive

• Predictive models facilitate planning and optimisation of behaviour (along the dimensions of energy, time, entropy and emotion)

• Predictive models of an organism’s interaction with the environment facilitate efficient manipulation

• Predictive models of an organism’s interaction with another organism facilitate efficient communications

• The fidelity of the predictors governs the degree of efficiency that is possible

• Matched predictors in two organisms facilitate high levels of coupling with low information rates

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 28

Enactivism

[Diagram labels: organism, physiology, nervous system, behaviour, environment]

A ‘cognitive unity’ (self-regulating self-generating closure)

Maturana, H. R., & Varela, F. J. (1987). The Tree of Knowledge: The Biological Roots of Human Understanding. Boston, MA: New Science Library/Shambhala Publications.

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 29

Enactivism

Maturana, H. R., & Varela, F. J. (1987). The Tree of Knowledge: The Biological Roots of Human Understanding. Boston, MA: New Science Library/Shambhala Publications.

organism

environment

organism

3rd-order coupling

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 30

Emulation

Predicting other’s external behaviour

Moore, R. K. (2012). Extending Maturana and Varela's symbolic representation of autopoiesis to create a rich visual language for envisioning a wide range of enactive systems with different degrees of complexity, Foundations of Enactive Cognitive Science. Cumberland Lodge, Great Park of Windsor.

Modelling the ‘surface behaviour’ of another agent

Similar to Dennett’s ‘physical stance’

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 31

Empathy

Predicting other’s affective state(s)

Modelling the ‘internal states’ of another agent

Moore, R. K. (2012). Extending Maturana and Varela's symbolic representation of autopoiesis to create a rich visual language for envisioning a wide range of enactive systems with different degrees of complexity, Foundations of Enactive Cognitive Science. Cumberland Lodge, Great Park of Windsor.

Similar to Dennett’s ‘design stance’

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 32

Theory of Mind

Predicting other’s beliefs & intentions

Modelling the ‘perspective’ of another agent

Moore, R. K. (2012). Extending Maturana and Varela's symbolic representation of autopoiesis to create a rich visual language for envisioning a wide range of enactive systems with different degrees of complexity, Foundations of Enactive Cognitive Science. Cumberland Lodge, Great Park of Windsor.

Similar to Dennett’s ‘intentional stance’

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 33

Languaging

Predicting other’s knowledge of self’s knowledge of other’s knowledge of …

Modelling the ‘recursive particulate coupling’ between one agent and another

Moore, R. K. (2012). Extending Maturana and Varela's symbolic representation of autopoiesis to create a rich visual language for envisioning a wide range of enactive systems with different degrees of complexity, Foundations of Enactive Cognitive Science. Cumberland Lodge, Great Park of Windsor.

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 34

Predictive Coding

If a sender knows that the receiver has a predictive model of the talker, …

… then only the difference between the intentions and the predictions needs to be sent (thereby minimising the entropy)
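A minimal sketch of this predictive-coding idea, assuming both sender and receiver run the same trivial predictor ("the next value repeats the last one") over a short integer sequence. The predictor, the example `signal` and the use of empirical symbol entropy as the measure are illustrative assumptions, not part of the talk. The sender transmits only the residual between each intended value and the shared prediction; the receiver adds the residual back to its own prediction.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Empirical Shannon entropy (bits per symbol) of a sequence."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def predict(history):
    """Shared, hypothetical predictor: the next value repeats the last one."""
    return history[-1] if history else 0

def encode(signal):
    """Sender: transmit only the difference between intention and prediction."""
    history, residuals = [], []
    for value in signal:
        residuals.append(value - predict(history))
        history.append(value)
    return residuals

def decode(residuals):
    """Receiver: rebuild the signal from its own predictions plus the residuals."""
    history = []
    for r in residuals:
        history.append(predict(history) + r)
    return history

signal = [0, 1, 2, 3, 4, 5, 6, 7, 7, 7, 6, 5, 4, 3, 2, 1]
residuals = encode(signal)
print(entropy_bits(signal), entropy_bits(residuals))
```

For this example the residual stream uses fewer distinct symbols than the raw signal and so has lower empirical entropy, yet the receiver recovers the signal exactly. The saving depends entirely on how well the shared predictor matches the sender's behaviour.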

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 35

The Listening Talker

declarative

change state of other

interrogative

change state of self

change state of world

imperative

dialogue

conversational interaction

These are being optimised by the talker

These are being optimised by the talker

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 36

And Finally …

Would we study walking by suspending someone in the air and asking them to walk?

No? So why do we put people in a recording booth and ask them to speak?

In both cases the subject is obliged to imagine a crucial conditioning aspect of their behaviour

An appropriate experimental methodology is the key to future progress

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 37

Thanks for Listening to this Talker

http://www.dcs.shef.ac.uk/~roger

LISTA Workshop, Edinburgh, 2-3 May 2012 slide 38

Universals in Vocal Behaviour?

http://www.youtube.com/watch?v=E4aMHK7AH5M