Download - Speech production at the neuro-muscular · PDF fileSpeech production at the neuro-muscular level ... The p,ionological output of a generative grammar may be ... the vowels of the VCV

Dept. for Speech, Music and Hearing

Quarterly Progress andStatus Report

Speech production at theneuro-muscular level

Ohman, S. and Persson, A. and Leanderson,R.

journal: STL-QPSRvolume: 8number: 2-3year: 1967pages: 015-019

http://www.speech.kth.se/qpsr

http://www.speech.kth.se

http://www.speech.kth.se/qpsr

11. SPEECH PRODUCTION

A. SPEECH PRODUCTION 1-T THE NETJRO-MUSCULAR LEVEL* -

The p,ionological output of a generative g r a m m a r may be regarded

a s an abstract description of a se t of input commands to a speech syn-

thesizer . This synthesizer should be a t rue model of human speech

production. The question a s to what fo rm the outputs of the phono-

logical component should take i s thus dependent on the nature of the

parameters controlling the synthesizer.

Modern acoustic phonetics has provided a f i r m understanding of

the physical bas is relating articulatory configurations to the resulting

sound p res su re wave. On the other hand, r e sea rch at this level has

demonstrated an overwhelming inconstancy in the acoustic cor re la tes

of the phonological invariants. Studies of articulatory dynamics sug-

gest , however, that much of this variability may be due to built -in

physiological propert ies of the articulatory organs and of the neural c i r -

cuits controlling them. A speech synthesizer fo r phonology should

thus incorporate a neuro-motor level exhibiting these properties.

An attempt to devise a par t ia l model of this so r t is summarized i n

Fig. II-A-1. The upper left co rne r shows schematized sound spectro-

g rams of the VCV utterances /#:g#:/, /b:ga :/, /a :gjd:/, and /a :ga :/ a s spoken by a male Swedish talker. These data have been extracted

(1) f rom a l a r g e r study described elsewhere ,

Note that the formant transit ions f rom the initial /b / into the me-

dial /g/ a r e different when /b/ and /a / occupy the final position of the

utterance. Similarly, the transit ions f r o m the medial /g/ into the

final /a/ a r e different when / b / and /a/ occupy the initial position of ( 2 ) the utterance. Thus, a s was also observed by Menzerath and Lacerda ,

the production of an intervocalic consonant i s great ly modified by the

vowel context. In fact, the variability of the formant transit ions con-

tained i n the initial vowel suggests that the speaker s t a r t s a ges ture

towards the final vowel while he is making the consonant gesture. It

is a s if the consonant gesture i s superimposed on a diphthong movement.

* Paper read at the VIth International Congress of Phonetic Sciences, Prague, Sept. 7-13, 1967

** Central Neurophysiological Laboratory, Karolins ka Sjukhuset, Stockholm

*** Phoniatric Clinic, Karolinska Sjukhuset, Stockholm

STL-QPSR 2-3/1967 16.

The block d iagram of the lower par t of Fig. 11-A-1 summarizes a

s t rategy for the synthesis of t ime varying vocal t r ac t shapes that re -

produce coarticulation. This strategy has been implemented on a

digital computer and tested against data collected f r o m X-ray motion

pictures of a human talker. Without going into details that have been

published elsewhere ( 3 ) , I should like only to draw attention to the

separate representations of the commands fo r the timing of the vowels

and the consonants, and the commands specifying which consonants

and vowels a r e to be synthesized. Target configurations fo r the initial

vowel, the medial consonant, and the final vowel a r e fed to the model

in the f o r m of se t s of numbers representing distances along the coordi-

nate l ines shown i n the upper right corner of the figure. The consonant

and vowel timing pulses shown on the left of the block d iagram a r e then

passed through the smoothing f i l te rs and enter the coarticulation model

a s one-dimensional signals marked K(t) and Q(t). A VCV ges ture com-

plex is then calculated according to the formulas shown below the block

diagram. In this process the vowel diphthong gesture is governed by

Q(t) and the superimposed consonant ges ture is governed by K(t). The

calculation is done i n such a way that, a t the moment of consonantal

c losure, a residue of the underlying vowel ges ture is always present

i n the vocal-tract configuration, so that the consonant becomes colored

by the vowel environment. Hence variance i s reproduced a t the output

while invariance is preserved a t the input.

The articulatory mechanisms enclosed by dashed lines in Fig.

11-A- 1 have been represented by a single "black box" in Fig. 11-A-2.

Here, again, the timing commands a r e fed over two separate channels

corresponding to consonants and vowels, and they a r e responsible to-

gether fo r the temporal integration of the syllable. The feature com-

mands, on the other hand, specify the target configurations that the

initial, medial, and final ges tures of the syllable a r e aiming at.

In t e r m s of this model the phonetic variance of phonological enti-

t i es a s observed a t the acoustic level may be related to th ree types

of physiological processes: coarticulation, undershoot, and reorgani-

zation.

We have already discussed coarticulation and this concept is i l lus-

t ra ted again i n the uppermost par t of Fig. 11-A-2. Coarticulation r e -

sults when the ar t iculators a r e moving i n response to distinct but tem-

porally ove r-lapping commands.

CAUSES OF PHONETIC VARIABILITY

COMMAND VOWEL j-'J, ARTICULA - OUTPUT GESTURE

COMMAND MECHANISMS - time 7-7-T

UNDER-SHOOT

CONSONANT^ [ I - '

COMMAND - - VOWEL COMMAND 7

ARTICULA -

MECHANISMS

REORGANIZATION

VOWEL COMMAND -----"--------

ARTICULA -

MECHANISMS R, z 0 2

s + z W 3 0 W In

t

OUTPUT GESTURE

OUTPUT GESTURE

Fig. 11-A-2.

STL-QPSR 2-3/1967 17.

Undershoot i s indicated in the middle part of the figure by the

difference i n timing of the second consonant pulse. Undershoot thus

results when an incomplete articulatory gesture i s interrupted by a

neural command that brings about the next gesture of the utterance.

Examples of undershoot a r e found in the neutralization of vowels and

consonants under increased rates of speech, a s demonstrated by

Lindblom (4). This type of variance may be accounted for quantita-

tively in t e rms of the numerical model discussed i n connection with

Fig. 11-A-1.

Reorganization, finally, i s the result of a context dependent change

of the feature specification, a s illustrated in the bottom part of Fig.

11-A-2. This sor t of effect i s found, f o r example, i n the devoicing of

final voiced consonants in German and Russian, and i n the quality al-

ternations of vowels under vowel-harmony in a great many languages.

With respect to the underlying neural control there i s thus an es-

sential difference between the phonetic variance due to reorganization

on the one hand and the variance typified by coarticulation and under-

shoot on the other hand. The sequence of feature specifications may

be viewed a s a phonological signal that modulates a phonetic c a r r i e r

consisting of the standard timing pattern of the syllable. F rom this

point of view reorganization i s a perturbation of the modalating signal

while coarticulation and undershoot a r e due to the s tructure of the

ca r r i e r .

The model discussed so f a r has grown out of analyses of acoustic

records and X-ray motion pictures. It i s therefore of considerable

interest to compare the picture of speech production summarized by

the model with data obtained at the peripheral neural level by means

of electromyographic methods. W e shall here examine a few examples

f rom a study i n which this concentric needle electrodes were used to

record the motor unit activity f rom the facial muscles of a Swedish

subject.

The top part of Fig. 11-A-3 shows a t race f rom a muscle that lifts

the upper lip. In this record each motor unit potential i s represented

by a vertical line that incidates the amplitude of the spike. Thc VCV

utterance /y:hi:/ i s embedded in the f rame /seja d3:hf:dare-/.

t I - I l & J & & I l l :

LEV LAB. SUF!

LEV LAB. SUP rr' ,

1 I l l # I I I I I . . . . I I

ORB. INF:

Fig. 11-A-3.

STL-QPSR 2-3/1967 18.

Note that this muscle i s tonically active between the utterances of

the l is t that the subject read in the recording session. During the ut-

terances, however, this background activity i s depressed o r enhanced

in synchrony with the rounding and spreading gestures of the lips.

This behavior i s typical and shows that in speech the articulators

assume a basic and apparently fixed posture on top of which excitatory

and inhibitory motor commands a r e superposed a s a modulating signal.

This phonetic modulation of the basic speech posture occurs, of course,

a t a lower level than the phonological modulation of the syllable timing

ca r r i e r , discussed earl ier .

The phonetic modulation i s also shown in the lower part of Fig.

XI-A-3. The upper t race derives f r om the muscle just mentioned that

lifts the upper lip, and the lower t race was picked up f rom the muscle

that shortens and protrudes the lower lip. The utterance contains the

VCV sequence /y:hu:/.

Note the reciprocal nature of these two signals. Whenever the I

lower t r ace shows activity for rounding the upper t race i s depressed,

and vice versa. In this way the labial configuration a s a whole i s made

to fluctuate about a constant average posture, so to speak.

Do we find invariance of motor commands at the peripheral neural

level? Fig. 11-A-4 gives a negative answer to this question. The up-

per t race shows the motor unit activity of the lower l ip muscle referred

to earl ier . Here the VCV sequence /y:hu:/ i s embedded in the stand-

ard frame. In the utterance of the lower t race, recorded f rom the same

lower lip muscle, the vowels of the VCV sequence have the opposite

order , /u:hy:/. The degree of protrusion for these two vowels is in-

dicated schematically by the straight lines below the electromyographic

records.

When the vowel /u/ i s preceded by the l e ss protruded vowels of the

frame, a transitional overshoot appears at the beginning of the motor

command. This overshoot is absent when /u/ follows the vowel /y/

which i s more protruded. Hence the commands a r e not invariant. The

steady state activity levels of these vowels s eem to be l e ss variable,

however.

It i s evident that the peripheral motor commands a r e calculated by

the brain not only with respec t to the constant effort necessary to main-

tain a target configuration, but also with respect to the variable effort

EMG I1 I ORB. INF I I

PROTRUSION

ORB. INF:

PROTRUSION 7

Fig. 11-A-4.

PHONOLOGICAL 1 SYLLABI- FEATURE F I CAT ION SEQUENCE MECHANISMS

STAGES OF MODULATION IN SPEECH PRODUCTION

SYLLABLE TIMING

CARRIER

BASK: SPEECH POSTURE

MOTOR COMMAND

ACOUSTIC SOUND

SOURCES

SOUND PRESSURE WAVE

PERIPHERAL NEUROMOTOR MECHANISMS

Fig. 11-A-5.

ARTICULA - ' TORY GESTURE

ACOUSTIC VOCALTRACT MECHANISMS

STL-QPSR 2-3/1967 19.

needed to - move the ar t iculators f r o m wherever they a r e to the desired

target. It is quite likely that the l a s t mentioned phase of the calcula-

tion takes place a t a ra ther per ipheral level involving feedback f rom the

many sensory receptors in the o r a l region.

To s u m up, a model of speech production - adequate for phonologic-

a l purposes - should incorporate a sequence of s tages of modulation

a s suggested i n Fig. 11-A-5. As was emphasized by Dudley (5) the

acoustic sound p res su re wave resul ts f r o m the modulation of quasi-

periodic o r noisy sound sources by the relatively slow ar t iculatory

gestures . The la t te r gesture sequence comes about through the modula-

tion of a basic speech posture by a sequence of excitatory and inhibi-

tory peripheral commands. These commands, finally, may be regarded

a s the outputs of a se t of more-central neural c i rcui ts through which a

complex timing c a r r i e r is channeled by means of a cer ta in phonological

signal. This signal determines specific phonetic features of the var i-

ous phases of the resulting syllables.

References

(1) ahman, S. E. G. : "Coarticulation i n VCV Utterances: Spectro- . .

graphic Measurements", J. Acoust. Soc. Am. , - 39 (1 966), pp. 151-168.

(2) Menzerath, P. and de Lacerda, A. : Koartikulation, Steuerung und Lautabgrenzung (Berlin 1933).

( 3 ) ahman, S. E. G. : "Numerical Model of Coarticulation", J.Acoust.Soc.Am., - 41 (1967), pp. 310-320.

(4) Lindblom, B. : "Spectrographic Study of Vowel Reduction", J. Acoust. Soc. Am., - 35 (1963), pp. 1773- 1781.

(5) Dudley, H. : "The C a r r i e r Nature of Speech", Bell System Techn. J . , 19 (1940), pp. 495-515. -

Download - Speech production at the neuro-muscular · PDF fileSpeech production at the neuro-muscular level ... The p,ionological output of a generative grammar may be ... the vowels of the VCV

Top Related