Dept. for Speech, Music and Hearing
Quarterly Progress andStatus Report
Speech production at theneuro-muscular level
Ohman, S. and Persson, A. and Leanderson,R.
journal: STL-QPSRvolume: 8number: 2-3year: 1967pages: 015-019
http://www.speech.kth.se/qpsr
11. SPEECH PRODUCTION
A. SPEECH PRODUCTION 1-T THE NETJRO-MUSCULAR LEVEL* -
The p,ionological output of a generative g r a m m a r may be regarded
a s an abstract description of a se t of input commands to a speech syn-
thesizer . This synthesizer should be a t rue model of human speech
production. The question a s to what fo rm the outputs of the phono-
logical component should take i s thus dependent on the nature of the
parameters controlling the synthesizer.
Modern acoustic phonetics has provided a f i r m understanding of
the physical bas is relating articulatory configurations to the resulting
sound p res su re wave. On the other hand, r e sea rch at this level has
demonstrated an overwhelming inconstancy in the acoustic cor re la tes
of the phonological invariants. Studies of articulatory dynamics sug-
gest , however, that much of this variability may be due to built -in
physiological propert ies of the articulatory organs and of the neural c i r -
cuits controlling them. A speech synthesizer fo r phonology should
thus incorporate a neuro-motor level exhibiting these properties.
An attempt to devise a par t ia l model of this so r t is summarized i n
Fig. II-A-1. The upper left co rne r shows schematized sound spectro-
g rams of the VCV utterances /#:g#:/, /b:ga :/, /a :gjd:/, and /a :ga :/ a s spoken by a male Swedish talker. These data have been extracted
(1) f rom a l a r g e r study described elsewhere ,
Note that the formant transit ions f rom the initial /b / into the me-
dial /g/ a r e different when /b/ and /a / occupy the final position of the
utterance. Similarly, the transit ions f r o m the medial /g/ into the
final /a/ a r e different when / b / and /a/ occupy the initial position of ( 2 ) the utterance. Thus, a s was also observed by Menzerath and Lacerda ,
the production of an intervocalic consonant i s great ly modified by the
vowel context. In fact, the variability of the formant transit ions con-
tained i n the initial vowel suggests that the speaker s t a r t s a ges ture
towards the final vowel while he is making the consonant gesture. It
is a s if the consonant gesture i s superimposed on a diphthong movement.
* Paper read at the VIth International Congress of Phonetic Sciences, Prague, Sept. 7-13, 1967
** Central Neurophysiological Laboratory, Karolins ka Sjukhuset, Stockholm
*** Phoniatric Clinic, Karolinska Sjukhuset, Stockholm
STL-QPSR 2-3/1967 16.
The block d iagram of the lower par t of Fig. 11-A-1 summarizes a
s t rategy for the synthesis of t ime varying vocal t r ac t shapes that re -
produce coarticulation. This strategy has been implemented on a
digital computer and tested against data collected f r o m X-ray motion
pictures of a human talker. Without going into details that have been
published elsewhere ( 3 ) , I should like only to draw attention to the
separate representations of the commands fo r the timing of the vowels
and the consonants, and the commands specifying which consonants
and vowels a r e to be synthesized. Target configurations fo r the initial
vowel, the medial consonant, and the final vowel a r e fed to the model
in the f o r m of se t s of numbers representing distances along the coordi-
nate l ines shown i n the upper right corner of the figure. The consonant
and vowel timing pulses shown on the left of the block d iagram a r e then
passed through the smoothing f i l te rs and enter the coarticulation model
a s one-dimensional signals marked K(t) and Q(t). A VCV ges ture com-
plex is then calculated according to the formulas shown below the block
diagram. In this process the vowel diphthong gesture is governed by
Q(t) and the superimposed consonant ges ture is governed by K(t). The
calculation is done i n such a way that, a t the moment of consonantal
c losure, a residue of the underlying vowel ges ture is always present
i n the vocal-tract configuration, so that the consonant becomes colored
by the vowel environment. Hence variance i s reproduced a t the output
while invariance is preserved a t the input.
The articulatory mechanisms enclosed by dashed lines in Fig.
11-A- 1 have been represented by a single "black box" in Fig. 11-A-2.
Here, again, the timing commands a r e fed over two separate channels
corresponding to consonants and vowels, and they a r e responsible to-
gether fo r the temporal integration of the syllable. The feature com-
mands, on the other hand, specify the target configurations that the
initial, medial, and final ges tures of the syllable a r e aiming at.
In t e r m s of this model the phonetic variance of phonological enti-
t i es a s observed a t the acoustic level may be related to th ree types
of physiological processes: coarticulation, undershoot, and reorgani-
zation.
We have already discussed coarticulation and this concept is i l lus-
t ra ted again i n the uppermost par t of Fig. 11-A-2. Coarticulation r e -
sults when the ar t iculators a r e moving i n response to distinct but tem-
porally ove r-lapping commands.
CAUSES OF PHONETIC VARIABILITY
COMMAND VOWEL j-'J, ARTICULA - OUTPUT GESTURE
COMMAND MECHANISMS - time 7-7-T
UNDER-SHOOT
CONSONANT^ [ I - '
COMMAND - - VOWEL COMMAND 7
ARTICULA -
MECHANISMS
REORGANIZATION
VOWEL COMMAND -----"--------
ARTICULA -
MECHANISMS R, z 0 2
s + z W 3 0 W In
t
OUTPUT GESTURE
OUTPUT GESTURE
Fig. 11-A-2.
STL-QPSR 2-3/1967 17.
Undershoot i s indicated in the middle part of the figure by the
difference i n timing of the second consonant pulse. Undershoot thus
results when an incomplete articulatory gesture i s interrupted by a
neural command that brings about the next gesture of the utterance.
Examples of undershoot a r e found in the neutralization of vowels and
consonants under increased rates of speech, a s demonstrated by
Lindblom (4). This type of variance may be accounted for quantita-
tively in t e rms of the numerical model discussed i n connection with
Fig. 11-A-1.
Reorganization, finally, i s the result of a context dependent change
of the feature specification, a s illustrated in the bottom part of Fig.
11-A-2. This sor t of effect i s found, f o r example, i n the devoicing of
final voiced consonants in German and Russian, and i n the quality al-
ternations of vowels under vowel-harmony in a great many languages.
With respect to the underlying neural control there i s thus an es-
sential difference between the phonetic variance due to reorganization
on the one hand and the variance typified by coarticulation and under-
shoot on the other hand. The sequence of feature specifications may
be viewed a s a phonological signal that modulates a phonetic c a r r i e r
consisting of the standard timing pattern of the syllable. F rom this
point of view reorganization i s a perturbation of the modalating signal
while coarticulation and undershoot a r e due to the s tructure of the
ca r r i e r .
The model discussed so f a r has grown out of analyses of acoustic
records and X-ray motion pictures. It i s therefore of considerable
interest to compare the picture of speech production summarized by
the model with data obtained at the peripheral neural level by means
of electromyographic methods. W e shall here examine a few examples
f rom a study i n which this concentric needle electrodes were used to
record the motor unit activity f rom the facial muscles of a Swedish
subject.
The top part of Fig. 11-A-3 shows a t race f rom a muscle that lifts
the upper lip. In this record each motor unit potential i s represented
by a vertical line that incidates the amplitude of the spike. Thc VCV
utterance /y:hi:/ i s embedded in the f rame /seja d3:hf:dare-/.
t I - I l & J & & I l l :
LEV LAB. SUF!
LEV LAB. SUP rr' ,
1 I l l # I I I I I . . . . I I
ORB. INF:
Fig. 11-A-3.
STL-QPSR 2-3/1967 18.
Note that this muscle i s tonically active between the utterances of
the l is t that the subject read in the recording session. During the ut-
terances, however, this background activity i s depressed o r enhanced
in synchrony with the rounding and spreading gestures of the lips.
This behavior i s typical and shows that in speech the articulators
assume a basic and apparently fixed posture on top of which excitatory
and inhibitory motor commands a r e superposed a s a modulating signal.
This phonetic modulation of the basic speech posture occurs, of course,
a t a lower level than the phonological modulation of the syllable timing
ca r r i e r , discussed earl ier .
The phonetic modulation i s also shown in the lower part of Fig.
XI-A-3. The upper t race derives f r om the muscle just mentioned that
lifts the upper lip, and the lower t race was picked up f rom the muscle
that shortens and protrudes the lower lip. The utterance contains the
VCV sequence /y:hu:/.
Note the reciprocal nature of these two signals. Whenever the I
lower t r ace shows activity for rounding the upper t race i s depressed,
and vice versa. In this way the labial configuration a s a whole i s made
to fluctuate about a constant average posture, so to speak.
Do we find invariance of motor commands at the peripheral neural
level? Fig. 11-A-4 gives a negative answer to this question. The up-
per t race shows the motor unit activity of the lower l ip muscle referred
to earl ier . Here the VCV sequence /y:hu:/ i s embedded in the stand-
ard frame. In the utterance of the lower t race, recorded f rom the same
lower lip muscle, the vowels of the VCV sequence have the opposite
order , /u:hy:/. The degree of protrusion for these two vowels is in-
dicated schematically by the straight lines below the electromyographic
records.
When the vowel /u/ i s preceded by the l e ss protruded vowels of the
frame, a transitional overshoot appears at the beginning of the motor
command. This overshoot is absent when /u/ follows the vowel /y/
which i s more protruded. Hence the commands a r e not invariant. The
steady state activity levels of these vowels s eem to be l e ss variable,
however.
It i s evident that the peripheral motor commands a r e calculated by
the brain not only with respec t to the constant effort necessary to main-
tain a target configuration, but also with respect to the variable effort
EMG I1 I ORB. INF I I
PROTRUSION
ORB. INF:
PROTRUSION 7
Fig. 11-A-4.
PHONOLOGICAL 1 SYLLABI- FEATURE F I CAT ION SEQUENCE MECHANISMS
STAGES OF MODULATION IN SPEECH PRODUCTION
SYLLABLE TIMING
CARRIER
BASK: SPEECH POSTURE
MOTOR COMMAND
ACOUSTIC SOUND
SOURCES
SOUND PRESSURE WAVE
PERIPHERAL NEUROMOTOR MECHANISMS
Fig. 11-A-5.
ARTICULA - ' TORY GESTURE
ACOUSTIC VOCALTRACT MECHANISMS
STL-QPSR 2-3/1967 19.
needed to - move the ar t iculators f r o m wherever they a r e to the desired
target. It is quite likely that the l a s t mentioned phase of the calcula-
tion takes place a t a ra ther per ipheral level involving feedback f rom the
many sensory receptors in the o r a l region.
To s u m up, a model of speech production - adequate for phonologic-
a l purposes - should incorporate a sequence of s tages of modulation
a s suggested i n Fig. 11-A-5. As was emphasized by Dudley (5) the
acoustic sound p res su re wave resul ts f r o m the modulation of quasi-
periodic o r noisy sound sources by the relatively slow ar t iculatory
gestures . The la t te r gesture sequence comes about through the modula-
tion of a basic speech posture by a sequence of excitatory and inhibi-
tory peripheral commands. These commands, finally, may be regarded
a s the outputs of a se t of more-central neural c i rcui ts through which a
complex timing c a r r i e r is channeled by means of a cer ta in phonological
signal. This signal determines specific phonetic features of the var i-
ous phases of the resulting syllables.
References
(1) ahman, S. E. G. : "Coarticulation i n VCV Utterances: Spectro- . .
graphic Measurements", J. Acoust. Soc. Am. , - 39 (1 966), pp. 151-168.
(2) Menzerath, P. and de Lacerda, A. : Koartikulation, Steuerung und Lautabgrenzung (Berlin 1933).
( 3 ) ahman, S. E. G. : "Numerical Model of Coarticulation", J.Acoust.Soc.Am., - 41 (1967), pp. 310-320.
(4) Lindblom, B. : "Spectrographic Study of Vowel Reduction", J. Acoust. Soc. Am., - 35 (1963), pp. 1773- 1781.
(5) Dudley, H. : "The C a r r i e r Nature of Speech", Bell System Techn. J . , 19 (1940), pp. 495-515. -