GEPPETO¹: A modeling approach to study the production of speech gestures
Pascal Perrier (ICP – Grenoble)
with Stéphanie Buchaillard (PhD), Matthieu Chabanas (ICP), Ma Liang (PhD), Yohan Payan (TIMC – Grenoble)
¹ GEstures shaped by the Physics and by a PErceptually oriented Targets Optimization
Outline
• Introduction
• Current hypotheses implemented in GEPPETO
• Some results obtained with a 2D biomechanical tongue model
• New issues raised by the use of a 3D biomechanical tongue model
Basic issues in Speech Production Research
• Phonology/Phonetics Interface
– Link between discrete representations and continuous physical signals
– Nature of the physical correlates of speech units
Basic issues in Speech Production Research
• Control and Production of Speech Gestures
– Control variables
– Central representations of the physical characteristics of the speech production apparatus
– Perception-action interaction
Basic issues in Speech Production Research
• From Gestures to Speech Sounds
– Nature of acoustic sources
– Relations between motor commands and acoustics
– Interaction between airflow and articulatory gestures
What is GEPPETO?
• An evolving modeling framework to quantitatively test hypotheses about the control and the production of speech gestures.
• It includes:
– Hypotheses about the physical correlates of phonological units
– Models of motor control
– Physical models of the speech production apparatus
Current Hypotheses
• Phonology/Phonetics Interface
– The smallest phonological unit is the phoneme
– Phonemes are associated with target regions in the auditory domain
– Larger phonological units are associated with speech sequences for which specific constraints exist on target optimization or on the sequencing of motor commands
Current Hypotheses
• Control of speech gestures
– Control variables: motor commands as defined by the Equilibrium Point (EP) hypothesis (Feldman, 1966)
– No online use of feedback loops passing through the cortex
– Short-delay orosensory and proprioceptive feedback is taken into account
– The brain holds internal representations of the speech apparatus (internal models)
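Under the EP hypothesis, the controlled variables are thresholds (λ) of the stretch reflex rather than forces: force emerges from the difference between the actual muscle length and λ. The sketch below illustrates this with one formulation of the λ-model found in the jaw and tongue control literature (activation = max(0, l − λ + μ·dl/dt), force = ρ·(exp(c·activation) − 1)). The constants ρ, c, μ, the 1-D agonist/antagonist layout and all function names are hypothetical, not GEPPETO's parameters.

```python
import math

# Hypothetical constants, for illustration only
RHO = 1.0    # force scale (N)
C = 0.112    # force-activation form factor (1/mm)
MU = 0.01    # velocity sensitivity of the reflex (s)


def lambda_model_force(length, velocity, lam, rho=RHO, c=C, mu=MU):
    """Active force of one muscle under a lambda-model sketch.

    length (mm), velocity (mm/s): current muscle length and lengthening rate.
    lam (mm): central command, i.e. the stretch-reflex threshold length.
    """
    # The muscle is recruited only once it is stretched beyond lam
    activation = max(0.0, length - lam + mu * velocity)
    return rho * (math.exp(c * activation) - 1.0)


def pair_equilibrium(lam_ag, lam_ant, total_length=100.0, tol=1e-9):
    """Static equilibrium of a hypothetical agonist/antagonist pair on a 1-D segment.

    The agonist has length x and the antagonist total_length - x; at rest
    (zero velocity) their active forces must balance. Solved by bisection.
    """
    def imbalance(x):
        return (lambda_model_force(x, 0.0, lam_ag)
                - lambda_model_force(total_length - x, 0.0, lam_ant))

    lo, hi = 0.0, total_length
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if imbalance(mid) > 0.0:
            hi = mid  # agonist pulls harder: equilibrium lies at a shorter agonist
        else:
            lo = mid
    return 0.5 * (lo + hi)


print(pair_equilibrium(40.0, 40.0))  # about 50.0
print(pair_equilibrium(35.0, 40.0))  # about 47.5
```

Lowering the agonist threshold from 40 to 35 mm moves the equilibrium from about 50 mm to about 47.5 mm: the command specifies where the system settles, and the movement toward that point is left to the mechanics.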
Current Hypotheses
• Control of speech gestures
– Internal representations do not account for the full physical complexity of the speech production apparatus
– Kinematic characteristics are not directly controlled: they result from the interaction between motor control settings and the physical phenomena of speech production
• Which characteristics of speech signals are specifically controlled?
Application to the generation of speech gestures with a 2D biomechanical tongue model
• Implementation of the control model
• Inversion from desired perceptual objectives to motor commands
• Generation of gestures
2D Biomechanical Model
• Finite element structure
• Linear elasticity (small deformations)
• Gravity not taken into account
2D Biomechanical Model
Modeled muscles: posterior genioglossus, anterior genioglossus, hyoglossus, verticalis, styloglossus, inferior longitudinalis
Learning a static internal model: from commands to formants
Step 1:
- Uniform sampling of the command space
- Generation of the corresponding tongue shapes (9000 simulations)
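Step 1 can be sketched as drawing command vectors uniformly inside a box of admissible values, one dimension per muscle; each row would then be sent to the biomechanical model to obtain a tongue shape (that simulation step is not shown). Random uniform draws are one reading of "uniform sampling" (a regular grid would be another), and the bounds below are hypothetical values for the six muscles of the 2D model.

```python
import numpy as np


def sample_commands(lower, upper, n_samples, seed=0):
    """Uniformly sample motor commands in a box; returns an (n_samples, n_muscles) array."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    rng = np.random.default_rng(seed)
    # low/high broadcast over rows: column j is drawn between lower[j] and upper[j]
    return rng.uniform(lower, upper, size=(n_samples, lower.size))


# Hypothetical command bounds (e.g. lambda values in mm) for posterior genioglossus,
# anterior genioglossus, hyoglossus, styloglossus, verticalis, inferior longitudinalis
lower = [60.0, 50.0, 40.0, 70.0, 20.0, 55.0]
upper = [90.0, 80.0, 70.0, 100.0, 40.0, 85.0]
commands = sample_commands(lower, upper, n_samples=9000)
print(commands.shape)  # (9000, 6)
```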
Learning a static internal model: from commands to formants
Step 2: Computation of the area function
Step 3: Computation of the formants for 2 lip apertures (red dots: spread lips; blue dots: rounded lips)
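Steps 2 and 3 turn a tongue shape into an area function and the area function into formants. The slides do not say which acoustic model GEPPETO uses; as a stand-in, the sketch below treats the area function as a chain of lossless cylindrical sections (no wall, viscous or radiation losses, ideal open end at the lips) and finds the formants as the zeros of the (2,2) element of the chain matrix, i.e. the poles of the glottis-to-lips volume-velocity transfer function. The speed of sound and the example areas are assumptions; ρc is set to 1 because it cancels out of the formant frequencies.

```python
import numpy as np

SOUND_SPEED = 35000.0  # cm/s, assumed value for warm, humid air


def chain_k22(freqs, areas, lengths, c=SOUND_SPEED):
    """(2,2) element of the glottis-to-lips chain matrix of a lossless tube.

    areas (cm^2) and lengths (cm) list the sections from glottis to lips.
    With zero pressure at the lips, U_glottis = K22 * U_lips, so the
    formants are the zeros of K22. rho*c is set to 1 (it cancels out).
    """
    k = 2.0 * np.pi * np.asarray(freqs, dtype=float) / c
    # Running product [[a, b], [cc, d]], one value per frequency
    a = np.ones_like(k, dtype=complex)
    b = np.zeros_like(k, dtype=complex)
    cc = np.zeros_like(k, dtype=complex)
    d = np.ones_like(k, dtype=complex)
    for area, length in zip(areas, lengths):
        cs, sn = np.cos(k * length), np.sin(k * length)
        # Section matrix [[cos, j*sin/A], [j*A*sin, cos]]
        s11, s12, s21, s22 = cs, 1j * sn / area, 1j * area * sn, cs
        a, b, cc, d = (a * s11 + b * s21, a * s12 + b * s22,
                       cc * s11 + d * s21, cc * s12 + d * s22)
    return d.real  # purely real for a lossless tube


def formants(areas, lengths, n_formants=3, f_max=5000.0, df=1.0):
    """First formants (Hz): sign changes of K22 on a grid, refined linearly."""
    freqs = np.arange(df, f_max, df)
    vals = chain_k22(freqs, areas, lengths)
    v0, v1 = vals[:-1], vals[1:]
    idx = np.nonzero((v0 * v1 < 0.0) | (v0 == 0.0))[0]
    f0, f1 = freqs[idx], freqs[idx + 1]
    roots = f0 - vals[idx] * (f1 - f0) / (vals[idx + 1] - vals[idx])
    return roots[:n_formants]


# Uniform 17.5 cm tube: quarter-wave resonances near 500, 1500, 2500 Hz
print(formants([3.0] * 10, [1.75] * 10))
# Two-tube [a]-like shape (narrow back, wide front): F1 rises
print(formants([1.0] * 5 + [7.0] * 5, [1.75] * 10))
```

Lip rounding, as in the two lip apertures of Step 3, could be mimicked by narrowing (and possibly lengthening) the last sections; that mapping is also an assumption.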
Learning a static internal model: from commands to formants
Step 4: Learning and generalizing with radial basis functions
[Figure: two-layer network; inputs X1 ... Xn; 1st layer of m neurons (weights 1W11 ... 1Wnm); 2nd layer of p neurons (weights 2W11 ... 2Wmp); outputs Y1 ... Yp]
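Step 4 can be illustrated with a Gaussian radial basis function network: a hidden layer of Gaussian units centered on a subset of the training commands, followed by a linear output layer fitted by least squares. How GEPPETO chose the centers, widths and number of units is not stated here, so those choices, and the toy 2-D function standing in for the finite element plus acoustic pipeline, are assumptions.

```python
import numpy as np


def rbf_features(X, centers, width):
    """Gaussian activations of the hidden layer: one column per center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))


def fit_rbf(X, Y, n_centers, width, seed=0):
    """Pick centers among the training inputs, then solve the linear output layer."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_centers, replace=False)]
    W, *_ = np.linalg.lstsq(rbf_features(X, centers, width), Y, rcond=None)
    return centers, W


def predict_rbf(X, centers, W, width):
    return rbf_features(X, centers, width) @ W


def toy_direct_model(U):
    """Hypothetical smooth 2-D -> 2-D map standing in for commands -> formants."""
    return np.column_stack([np.sin(3 * U[:, 0]) + U[:, 1],
                            np.cos(2 * U[:, 1]) * U[:, 0]])


rng = np.random.default_rng(1)
U_train = rng.uniform(0.0, 1.0, size=(400, 2))
U_test = rng.uniform(0.0, 1.0, size=(200, 2))
centers, W = fit_rbf(U_train, toy_direct_model(U_train), n_centers=60, width=0.2)
err = np.abs(predict_rbf(U_test, centers, W, 0.2) - toy_direct_model(U_test)).mean()
print(err)  # mean absolute error on unseen commands
```

In the model itself the inputs would be the muscle commands and the outputs F1, F2, F3, learned from the 9000 simulated samples; generalization is checked here on commands that were not used for fitting.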
Inversion: from target regions to commands
Target regions for some non-rounded French phonemes
Target regions
• Dispersion ellipses in the (F1, F2, F3) space
• Currently defined by their centers (Fc1, Fc2, Fc3) and dispersions (ΔF1, ΔF2, ΔF3)
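A target region defined by a center and three dispersions can be read as an axis-aligned ellipsoid in (F1, F2, F3). The sketch below measures how far a formant vector lies from such a region in dispersion units (1.0 on the boundary); the axis-aligned shape and the example values for [i] are assumptions for illustration, not the targets used in GEPPETO.

```python
import numpy as np


def normalized_distance(F, Fc, dF):
    """Distance from formants F = (F1, F2, F3) to a target center Fc, in units of
    the per-axis dispersions dF = (ΔF1, ΔF2, ΔF3). Equals 1.0 on the boundary."""
    F, Fc, dF = (np.asarray(v, dtype=float) for v in (F, Fc, dF))
    return float(np.sqrt(np.sum(((F - Fc) / dF) ** 2)))


def in_target(F, Fc, dF):
    """True if F lies inside (or on) the target ellipsoid."""
    return normalized_distance(F, Fc, dF) <= 1.0


# Hypothetical target region for [i], in Hz
Fc_i = (280.0, 2250.0, 2900.0)
dF_i = (40.0, 150.0, 200.0)
print(in_target((300.0, 2200.0, 2950.0), Fc_i, dF_i))  # True
print(in_target((400.0, 2250.0, 2900.0), Fc_i, dF_i))  # False
```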
Inversion: from target regions to commands
Cost for a sequence made of N phonemes
with
Optimization: cost minimization (gradient descent technique)
Speaker-oriented / Listener-oriented
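The cost formula itself did not survive in this transcript. As a hypothetical illustration of the idea, the sketch below combines a listener-oriented term (squared normalized distance of each phoneme's formants to its target center) with a speaker-oriented term (squared step lengths between successive command vectors, weighted by lam) and minimizes the sum by plain gradient descent with finite-difference gradients. The form of both terms, lam, the starting command and the identity "direct model" standing in for the learned network are all assumptions.

```python
import numpy as np


def sequence_cost(U, direct_model, centers, half_axes, u_start, lam=0.1):
    """Hypothetical cost for N phonemes; U holds one command vector per phoneme (N x m)."""
    F = np.array([direct_model(u) for u in U])
    # Listener-oriented term: distance of each phoneme's formants to its target center
    listener = np.sum(((F - centers) / half_axes) ** 2)
    # Speaker-oriented term: squared length of the path in command space from u_start
    path = np.vstack([u_start, U])
    speaker = np.sum(np.diff(path, axis=0) ** 2)
    return listener + lam * speaker


def finite_diff_grad(f, U, eps=1e-6):
    """Central-difference gradient of a scalar function of the array U."""
    g = np.zeros_like(U)
    for idx in np.ndindex(U.shape):
        step = np.zeros_like(U)
        step[idx] = eps
        g[idx] = (f(U + step) - f(U - step)) / (2.0 * eps)
    return g


def gradient_descent(cost, U0, lr=0.05, n_iter=500):
    U = np.array(U0, dtype=float)
    for _ in range(n_iter):
        U -= lr * finite_diff_grad(cost, U)
    return U


def identity_model(u):
    """Stand-in for the learned direct model (hypothetical): commands == formants."""
    return u


centers = np.array([[0.5, 1.5], [0.4, 2.0], [0.7, 1.2]])  # normalized target centers
half_axes = np.ones_like(centers)
u_start = np.zeros(2)


def cost(U):
    return sequence_cost(U, identity_model, centers, half_axes, u_start, lam=0.1)


U0 = np.zeros((3, 2))
U_opt = gradient_descent(cost, U0)
print(cost(U0), cost(U_opt))
```

With lam = 0 each phoneme's command would land exactly on its target center; a larger lam pulls successive commands together, trading target accuracy for a shorter command path.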
Inversion: from target regions to commands
Example 1: sequence [œ-e-k-i]
Example 2: sequence [œ-e-k-a]
Production of tongue movements from inferred commands: serial command patterns
No difference between vowels and consonants
[œ] [e] [k] [a]
Execution of tongue movements from inferred commands: Öhman's model (vowel-to-vowel basis)
Consonants are treated as perturbations of the vowel-to-vowel (V-V) gesture
[œ] [e] [k] [a]
Observed flesh point
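The two command patterns can be contrasted as time courses in command space. In the serial sketch every phoneme, vowel or consonant, is a target reached in turn; in the Öhman-style sketch the commands follow a vowel-to-vowel interpolation and the consonant is superimposed with a weight k(t) that rises to 1 at the consonant. Linear interpolation, the raised-cosine k(t), the timings and the 2-D command values are assumptions for illustration.

```python
import numpy as np


def serial_commands(t, times, targets):
    """Serial pattern: linear transitions between successive phoneme targets.

    t: sample times (s); times: increasing anchor time of each phoneme (s);
    targets: one command vector per phoneme, shape (n_phonemes, n_commands).
    """
    targets = np.asarray(targets, dtype=float)
    return np.column_stack([np.interp(t, times, targets[:, j])
                            for j in range(targets.shape[1])])


def ohman_commands(t, vowel_times, vowel_targets, cons_time, cons_target, half_width):
    """Öhman-style pattern: V-to-V carrier plus a consonant perturbation.

    u(t) = v(t) + k(t) * (cons_target - v(t)), where v(t) interpolates the vowels
    and k(t) is a raised cosine equal to 1 at cons_time and 0 beyond +/- half_width.
    """
    v = serial_commands(t, vowel_times, vowel_targets)
    x = np.clip((np.asarray(t, dtype=float) - cons_time) / half_width, -1.0, 1.0)
    k = 0.5 * (1.0 + np.cos(np.pi * x))
    return v + k[:, None] * (np.asarray(cons_target, dtype=float) - v)


# Hypothetical 2-D commands for [œ] [e] [k] [a]
oe, e, k_cons, a = [70.0, 60.0], [55.0, 75.0], [80.0, 40.0], [90.0, 65.0]
t = np.linspace(0.0, 0.3, 31)
serial = serial_commands(t, [0.0, 0.1, 0.2, 0.3], [oe, e, k_cons, a])
ohman = ohman_commands(t, [0.0, 0.1, 0.3], [oe, e, a], 0.2, k_cons, 0.08)
```

Between [e] and [k], the serial pattern moves straight toward the consonant target, whereas the Öhman-style pattern keeps following the [e]-to-[a] carrier, with the consonant blended in only around its peak.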
Execution of tongue movements from inferred commands
Production of tongue movements from inferred commands: serial command patterns ([a], [i])
Production of tongue movements from inferred commands: Öhman's command patterns ([a], [i])
R. Houde (1969): [aka], [ika]
Example: the articulatory loops
Interaction control/physics: influence on the shapes of the articulatory paths
Fluid-Wall Interaction
• Flow model, driven by an imposed pressure difference
• Forces exerted on the tissues
• Mechanics of the tissues (finite element model)
• Deformation
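The slides do not detail the flow model. A common first approximation, used here only as an assumption, is quasi-static Bernoulli flow up to the narrowest constriction, with the whole imposed pressure difference dropped across it (no pressure recovery downstream). The resulting wall pressures are the kind of load a finite element tissue model would receive; the air density and the area values are hypothetical, and 800 Pa is one of the pressure values appearing on the articulatory-loop plots.

```python
import numpy as np

RHO_AIR = 1.2  # kg/m^3, assumed air density


def bernoulli_pressures(areas, p_sub):
    """Quasi-static pressures along a duct, up to and including its narrowest section.

    areas: cross-sectional areas (m^2), listed from the upstream side to the constriction.
    p_sub: imposed upstream pressure (Pa); the pressure at the constriction exit
    is taken as 0 Pa (atmospheric), assuming no pressure recovery.
    """
    areas = np.asarray(areas, dtype=float)
    a_min = areas.min()
    # Volume flow fixed by the narrowest section: p_sub = 0.5 * rho * (flow / a_min)^2
    flow = a_min * np.sqrt(2.0 * p_sub / RHO_AIR)
    # Bernoulli: p + 0.5 * rho * velocity^2 is constant along the duct
    pressures = p_sub - 0.5 * RHO_AIR * (flow / areas) ** 2
    return flow, pressures


flow, p = bernoulli_pressures([3e-4, 1e-4, 2e-5], p_sub=800.0)
print(flow, p)
```

Multiplying each pressure by the wall surface it acts on would give the forces passed to the tissue model; that coupling step, and its iteration with the resulting deformation, is not shown.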
[Figure, [aka]: X-Y displacement (X and Y in mm); PS = 3000 Pa, PS = 800 Pa, no aerodynamics]
Example: the articulatory loops
Interaction control/physics: influence on the shapes of the articulatory paths
[aka]: no aerodynamics vs. with aerodynamics
Interaction control/physics: influence on the shapes of the articulatory paths
Example: the articulatory loops
[Figure, [ika]: X-Y displacement (X and Y in mm); PS = 1600 Pa, no aerodynamics]
Interaction control/physics: influence on the shapes of the articulatory paths
Example: the articulatory loops
[ika]: no aerodynamics vs. with aerodynamics
A 3D biomechanical tongue model: for a better account of physics
• Visible Human Project® data (Wilhelms-Tricarico, 2003)
• Finite element mesh made of hexahedra
• Adaptation of the mesh to a specific speaker (PB)
Wilhelms-Tricarico R., 1995; Gerard et al., ICP Grenoble
Inner muscle structure of the tongue
Genioglossus (medium), Genioglossus (anterior), Styloglossus, Geniohyoid, Genioglossus (posterior), Hyoglossus, Verticalis, Transversus, Inferior longitudinalis, Mylohyoid, Superior longitudinalis
Vocal tract structure
Hyoid bone, mandible, palate, other muscles, tongue body
Elastic properties of tongue muscles
[Figure: force vs. displacement measured with a tongue indenter; linear vs. non-linear behavior]
• Hyperelastic material (2nd-order Yeoh model) under a large-deformation hypothesis
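For reference, the 2nd-order Yeoh law writes the strain energy density of an incompressible material as W = C10(I1 − 3) + C20(I1 − 3)², where I1 is the first invariant of the right Cauchy-Green tensor. The sketch below compares its uniaxial nominal stress with the small-strain linear law of the same initial stiffness (E = 6·C10), the kind of comparison shown on the "Influence of elasticity modeling" slide; C10 and C20 are hypothetical values, not the tongue parameters of the model.

```python
def yeoh_energy(I1, C10, C20):
    """2nd-order Yeoh strain energy density (incompressible material)."""
    return C10 * (I1 - 3.0) + C20 * (I1 - 3.0) ** 2


def yeoh_uniaxial_stress(stretch, C10, C20):
    """Nominal stress in uniaxial loading: P = 2 (lam - lam^-2) dW/dI1."""
    I1 = stretch ** 2 + 2.0 / stretch  # incompressible uniaxial stretch
    dW_dI1 = C10 + 2.0 * C20 * (I1 - 3.0)
    return 2.0 * (stretch - stretch ** -2) * dW_dI1


def linear_uniaxial_stress(stretch, C10):
    """Small-strain linear law with the same initial Young's modulus, E = 6 * C10."""
    return 6.0 * C10 * (stretch - 1.0)


C10, C20 = 1000.0, 1000.0  # Pa, hypothetical
for lam in (1.001, 1.2, 1.5):
    print(lam, yeoh_uniaxial_stress(lam, C10, C20), linear_uniaxial_stress(lam, C10))
```

For small stretches the two laws coincide; as the stretch grows, the C20 term stiffens the hyperelastic response, which a linear law cannot reproduce.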
Effect of gravity
[1 s] [300 ms]
Dealing with gravity with the EP hypothesis
• Activation of GGp (posterior genioglossus) and MH (mylohyoid)
• Increase of reflex activity
[300 ms]
Dealing with gravity with the EP hypothesis
GGp activation
[300 ms]
Dealing with gravity with the EP hypothesis
Example of a good choice of control parameters
Conclusions
• A control model based on perceptual objectives, specified as formant target regions associated with motor commands, and on an optimization process using a static model of the motor-perception relations, can generate realistic speech movements when it is applied to a realistic physical model of speech production.
Conclusions
• It supports our hypothesis that there is no need to assume a central optimization process applying to the articulatory trajectories as a whole (e.g. minimum jerk, minimum torque…).
Conclusions
• It gives an interesting account of coarticulation phenomena by separating the effects of planning from those of physics.
• It makes it possible to test hypotheses about phonological units (see the serial model versus Öhman's model).
Conclusions
However:
• A systematic comparison with data is required (currently in progress for French, German, Chinese and Japanese)
• No account of time control, or of hypo/hyperspeech
• No account of gravity
Conclusions
• It is necessary to work on more complex internal representations that integrate some aspects of articulatory dynamics.
Influence of elasticity modeling
Hyperelastic vs. linear (small deformations) vs. linear (large deformations)
Activation of the hyoglossus (2 N)
EP Hypothesis (Feldman, 1966)
Perrier, Ostry, Laboissière, 1996
Static Internal Models
[Figure: block diagram with the Central Nervous System containing a Direct Model and an Inverse Model (each a two-layer network: inputs X1 ... Xn, 1st layer of m neurons, 2nd layer of p neurons, outputs Y1 ... Yp), the peripheral motor system, the formants yi(t) and the desired formants d]