lista workshop, edinburgh, 2-3 may 2012 slide 1 © 2012 the ...roger/publications/moore -...
TRANSCRIPT
1
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 1
Some insights into talker-listener-environment coupling, energetics and the contrastive particulate structure of
spoken language
Prof. Roger K. MooreProf. Roger K. Moore
Chair of Spoken Language Processing
Dept. Computer Science, University of Sheffield, UK
(Visiting Prof., Dept. Phonetics, University College London)
(Visiting Prof., Bristol Robotics Laboratory)
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 2
‘hyper’ s
peech
“close” .jk?Tr.
“close” .jk?Ty.
“league .kh9f.
“leak” .kh9j.
Local Timescale Variability Analysis
Moore, R. K., Russell, M. J., & Tomlinson, M. J. (1982). Locally constrained dynamic programming in automatic speech recognition, IEEE Int. Conf. on Acoustics, Speech and Signal Processing. Paris.
2
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 3
H&H Theory
Lindblom, B. (1990). Explaining phonetic variation: a sketch of the H&H theory. In W. J. Hardcastle & A. Marchal (Eds.), Speech Production and
Speech Modelling (pp. 403-439): Kluwer Academic Publishers.
"The speaker ... dynamically tunes the production ... for either output-oriented
control (hyperspeech) or system-oriented control (hypospeech)."
"Speakers can, and typically do, tune their performance
according to communicative and situational demands."
"What he/she needs to control is ... that their signal attributes possess sufficient
contrast ... discriminative power that is sufficient for
lexical access."
"... the lack of invariance that speech signals commonly
exhibit is a direct consequence of this adaptive organization ..."
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 4
Evidence for H&H Behaviour• People naturally tend to speak louder/differently in noise
(Lombard, 1911)
• Caregivers talk differently to children (Fernald, 1985)
• Users talk differently to machines (Moore & Morris, 1992)
• Being able to hear your own voice has a profound effect on speaking (as evidenced by the need for sidetone on a telephone)
• Hearing-impaired individuals can have great difficulty maintaining clear pronunciations (or level control)
• Delayed auditory feedback causes stuttering-like behaviour
• Altered auditory feedback evokes compensations (Munhall et al, 2009; MacDonald et al, 2011)
• People with speaking difficulties (e.g. caused by cerebral palsy) report that it takes immense effort to produce even the simplest utterance
3
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 5
Speech Energetics
Robin Hofe
Hofe, R., & Moore, R. K. (2008). Towards an investigation of speech energeticsusing 'AnTon': an
animatronic model of a human tongue and
vocal tract. Connection Science,
20(4), 319–336.
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 6
‘AnTon’ – Animatronic Tongue
4
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 7
What Principle is Operating Here?
• The effort/energy involved in production is being tradedagainst the clarity/discriminability of the end product
• Why? Because living systems have to move in order to signal, and movement costs energy
• Large movements create clear signals but use more energy (and vice versa)
• The efficient management of energy is a major evolutionary factor in the behaviour of living systems (not just for signalling)
• Sometimes large movements are necessary due to the need to overcome an obstacle in the environment
This is an ‘optimisation’
problem
This is a ‘compensation’
problem
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 8
Analogy with WalkingWhich
situations are ‘natural’?
5
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 9
Behaviour
“… principal determinant of a system is not a stimulus, an event in the past, but a future result
of the behaviour …”
Alexandrov, Y. I., & Sams, M. E. (2005). Emotion and consciousness: ends of a continuum. Cognitive Brain Research, 25, 387-405.
“feedback … is the central and determining factor in all observed behavior”
W. T. Powers (1973). Behaviour: The Control
of Perception, Aldine, Chicago.
These are examples of ‘regulatory’ processes …
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 10
A Regulatory Mechanism:Feedback Control
Lindblom, B. (1990). Explaining phonetic variation: a sketch of the H&H theory. In W. J. Hardcastle & A. Marchal (Eds.), Speech Production and
Speech Modelling (pp. 403-439): Kluwer Academic Publishers.
6
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 11
Perceptual Control Theory (PCT)
W. T. Powers (1973). Behaviour:
The Control of
Perception, Aldine, Chicago.
Intention
RealisationSensation
Perception
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 12
Communicative Intentions?
• The ‘target’ is a perception not a signal
• So optimisation is over competing perceptions not competing signals (e.g. not signal-to-noise ratio)
• The intention is sufficient contrast at the pragmatic level (which will lead to suitable compensations at the semantic, syntactic, lexical, phonemic, phonetic and acoustic levels)
• The obstacles are …– alternative interpretations (internal)– competing signals (external)
Hawkins, S. (2003). Roles and representations of systematic fine phonetic detail in speech
understanding. Journal of
Phonetics, 31, 373-405.
“I … do … not … know”“ I do not know”“I don’t know”
“I dunno”“dunno”Z?}?}?}\
7
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 13
An Affective Model of Behaviour
Intention
Attention
Consequence
Emotion(valence)
Motivation/Effort
(arousal)Kp
Motor Command
ActionAppraisal
DiStefano III, J. J., Stubberud, A. R., & Williams, I. J. (1990). Feedback and Control Systems (2nd ed.). New York: McGraw-Hill.
Disturbance
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 14
TTS HSRs ww
dn+d
ASR
+
-
Reactive Speech Synthesis
Moore, R. K. (2007). Spoken
language processing:
piecing together the puzzle.
Speech
Communication, 49, 418-435.
This is ‘synthesis-by-recognition’
8
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 15
Reactive Speech Synthesis
FE
ED
FO
RW
AR
D
FE
ED
BA
CK
Mauro Nicolao
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 16
Reactive Speech Synthesis
Mauro Nicolao
Moore, R. K., & Nicolao, M. (2011). Reactive speech synthesis: actively managing phonetic contrast along
an H&H continuum, 17th International Congress of Phonetics Sciences (ICPhS). Hong Kong.
9
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 17
Emulation Mechanisms
CONTROLLER PLANT
EMULATOR
goalgoal behaviourcontrol signal
duplicate
control
signal
Inhibition of motor commands facilitates mental imagery and
planning
Grush, R. (2004). The emulation
theory of representation: motor control, imagery, and perception.
Behavioral and
Brain Sciences, 27, 377-442.
Forward model facilitates prediction
• Negative feedback control works OK as long as the loop delays are not too high, but neural delays can be quite significant in living systems
• Delays can be overcome using an ‘emulation’ mechanism (pseudo-closed-loop control)
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 18
Emulate ⇒ Predict ⇒ Understand• Simulations (forward models) generate top-down
expectations/predictions
• The emulation of an organism’s abilities can be recruited to emulate the behaviour of others for:– predicting their behaviour– mimicry– learning
• The (emergent) consequences would be:– an ability to interpret the behaviour of other(s) by
referencing self– an ability to estimate the ‘beliefs, desires and
intentions’ of other(s)– i.e. recognition and understanding
Wilson, M., & Knoblich, G.
(2005). The case for motor
involvement in perceiving
conspecifics. Psychological
Bulletin, 131(3), 460-473.
This is ‘recognition-by-synthesis’
10
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 19
PreSenCEPredictive Sensorimotor Control and Emulation
Moore, R. K. (2007). Spoken language processing: piecing together the puzzle.
Speech Communication, 49, 418-435.
Moore, R. K. (2007). PRESENCE: A human-inspired architecture for speech-based human-machine interaction. IEEE
Trans. Computers, 56(9), 1176-1188.
‘SbR’+
‘RbS’
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 20
PreSenCE
S:i S:mx-x -
S:E(U:m)
S:E(U:E(S:m))
S:E(U:m)
S:E(U:E(S:i))
S:E(U:i)
-
-
x
S:E(U:n)
-
S:n
-mo
tivation
feeling
sensitivity
atte
ntion
interpretation
actionneeds
nois
e, dis
tort
ion, re
action, dis
turb
ance
intention
11
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 21
PreSenCE
S:i S:mx-x -
S:E(U:m)
S:E(U:E(S:m))
S:E(U:m)
S:E(U:E(S:i))
S:E(U:i)
-
-
x
S:E(U:n)
-
S:n
-mo
tivation
feeling
sensitivity
atte
ntion
interpretation
actionneeds
nois
e, dis
tort
ion, re
action, dis
turb
ance
intention
Primary route for System’s
motor behaviour
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 22
PreSenCE
S:i S:mx-x -
S:E(U:m)
S:E(U:E(S:m))
S:E(U:m)
S:E(U:E(S:i))
S:E(U:i)
-
-
x
S:E(U:n)
-
S:n
-mo
tivation
feeling
sensitivity
atte
ntion
interpretation
actionneeds
nois
e, dis
tort
ion, re
action, dis
turb
ance
intention
Emulation of System’s
possible motor
actions
12
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 23
PreSenCE
S:i S:mx-x -
S:E(U:m)
S:E(U:E(S:m))
S:E(U:m)
S:E(U:E(S:i))
S:E(U:i)
-
-
x
S:E(U:n)
-
S:n
-mo
tivation
feeling
sensitivity
atte
ntion
interpretation
actionneeds
nois
e, dis
tort
ion, re
action, dis
turb
ance
intention System’s emulation of User’s emulation of System
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 24
PreSenCE
S:i S:mx-x -
S:E(U:m)
S:E(U:E(S:m))
S:E(U:m)
S:E(U:E(S:i))
S:E(U:i)
-
-
x
S:E(U:n)
-
S:n
-mo
tivation
feeling
sensitivity
atte
ntion
interpretation
actionneeds
nois
e, dis
tort
ion, re
action, dis
turb
ance
intention
System’s interpretation
of User behaviour
13
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 25
PreSenCE
S:i S:mx-x -
S:E(U:m)
S:E(U:E(S:m))
S:E(U:m)
S:E(U:E(S:i))
S:E(U:i)
-
-
x
S:E(U:n)
-
S:n
-mo
tivation
feeling
sensitivity
atte
ntion
interpretation
actionneeds
nois
e, dis
tort
ion, re
action, dis
turb
ance
intention
Affective signals drive the control
loop(s)
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 26
PreSenCE• Energy management
– effort– attention
• Time management– synchrony– coordination
• Entropy management– complexity– combinatorics
• Emotion management– needs/drives– personality/mood
Fundamental Dimensions of
Behaviour
14
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 27
Predict ⇒ Optimise ⇒ Survive
• Predictive models facilitate planning and optimisation of behaviour (along the dimensions of energy, time, entropy and emotion)
• Predictive models of an organism’s interaction with the environment facilitate efficient manipulation
• Predictive models of an organism’s interaction with another organism facilitate efficient communications
• The fidelity of the predictors governs the degree of efficiencies that are possible
• Matched predictors in two organisms facilitates high levels of coupling with low information rates
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 28
Enactivism
physiologyorganism
A ‘cognitive unity’(self-regulating self-generating closure)
environment
behaviournervous system
Maturana, H. R., & Varela, F. J. (1987). The Tree of Knowledge: The Biological Roots of Human Understanding. Boston, MA:
New Science Library/Shambhala Publications.
15
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 29
Enactivism
Maturana, H. R., & Varela, F. J. (1987). The Tree of Knowledge:
The Biological Roots of Human Understanding. Boston, MA: New Science Library/Shambhala Publications.
organism
environment
organism
3rd-order coupling
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 30
Emulation
Predicting other’s external behaviour
Moore, R. K. (2012). Extending Maturana and Varela's symbolic representation of autopoiesis to create a rich visual language for
envisioning a wide range of enactive systems with different degrees of complexity, Foundations of Enactive Cognitive
Science. Cumberland Lodge, Great Park of Windsor.
Modelling the ‘surface behaviour’ of another agent
Similar to Dennett’s ‘physical stance’
16
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 31
Empathy
Predicting other’s affective state(s)
Modelling the ‘internal states’ of
another agent
Moore, R. K. (2012). Extending Maturana and Varela's symbolic representation of autopoiesis to create a rich visual language for
envisioning a wide range of enactive systems with different degrees of complexity, Foundations of Enactive Cognitive
Science. Cumberland Lodge, Great Park of Windsor.
Similar to Dennett’s ‘design stance’
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 32
Theory of Mind
Predicting other’s beliefs & intentions
Modelling the ‘perspective’ of another agent
Moore, R. K. (2012). Extending Maturana and Varela's symbolic representation of autopoiesis to create a rich visual language for
envisioning a wide range of enactive systems with different degrees of complexity, Foundations of Enactive Cognitive
Science. Cumberland Lodge, Great Park of Windsor.
Similar to Dennett’s
‘intentional stance’
17
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 33
Languaging
Predicting other’s knowledge of self’s
knowledge of other’s knowledge of …
Modelling the ‘recursive particulate
coupling’ between one agent and
another
Moore, R. K. (2012). Extending Maturana and Varela's symbolic representation of autopoiesis to create a rich visual language for
envisioning a wide range of enactive systems with different degrees of complexity, Foundations of Enactive Cognitive
Science. Cumberland Lodge, Great Park of Windsor.
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 34
Predictive Coding
… then only the difference between the
intentions and the predictions need to be sent
(thereby minimising the entropy)
If a sender knows that the receiver has a predictive model of the talker, …
18
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 35
The Listening Talkerdeclarative
change state of other
interrogative
change state of self
change state of world
imperative
dialogue
conversational interaction
These are being optimised by the talker
These are being optimised by the talker
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 36
Would we study walking by suspending someone in the air
and asking them to walk?
No? So why do we put people in a recording booth
and ask them to speak?
In both cases the subject is obliged to imagine a crucial conditioning aspect of their behaviour
And Finally …
An appropriate experimental methodology
is the key to future progress
19
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 37
Thanks for Listeningto this Talker
http://www.dcs.shef.ac.uk/~roger
© 2012 The University of Sheffield
LISTA Workshop, Edinburgh, 2-3 May 2012 slide 38
Universals in Vocal Behaviour ?
http://www.youtube.com/watch?v=E4aMHK7AH5M