


Haskins Laboratories Status Report on Speech Research 1989, SR-99/100, 69-101

Articulatory Gestures as Phonological Units*

Catherine P. Browman and Louis Goldstein†

1 A GESTURAL PHONOLOGY

Over the past few years, we have been investigating a particular hypothesis about the nature of the basic 'atoms' out of which phonological structures are formed. The atoms are assumed to be primitive actions of the vocal tract articulators that we call 'gestures.' Informally, a gesture is identified with the formation (and release) of a characteristic constriction within one of the relatively independent articulatory subsystems of the vocal tract (i.e., oral, laryngeal, velic). Within the oral subsystem, constrictions can be formed by the action of one of three relatively independent sets of articulators: the lips, the tongue tip/blade and the tongue body. As actions, gestures have some intrinsic time associated with them; they are characterizations of movements through space and over time (see Fowler et al., 1980).

Within the view we are developing, phonological structures are stable 'constellations' (or 'molecules', to avoid mixing metaphors) assembled out of these gestural atoms. In this paper, we examine some of the evidence for, and some of the consequences of, the assumption that gestures are the basic atoms of phonological structures. First, we attempt to establish that gestures are pre-linguistic discrete units of action that are inherent in the maturation of a developing child and that therefore can be harnessed as elements of a phonological system in the course of development (§ 1.1). We then give a more detailed, formal characterization of gestures as phonological units within the context of a computational model (§ 1.2), and show that a number of phonological regularities can be captured by representing constellations of gestures (each having inherent duration) using gestural scores (§ 1.3). Finally, we show how the proposed gestural structures relate to proposals of feature geometry (§§ 2-3).

* This paper has benefited from criticisms by Cathi Best, Alice Faber, Elliot Saltzman, Michael Studdert-Kennedy, Eric Vatikiotis-Bateson, Doug Whalen and two anonymous reviewers. Our thanks to Mark Tiede for help with manuscript preparation, and Zefang Wang for help with the graphics. This work was supported by NSF grant BNS-8520709 and NIH grants HD-01994 and NS-13617 to Haskins Laboratories.

1.1 Gestures as pre-linguistic primitives

Gestures are units of action that can be identified by observing the coordinated movements of vocal tract articulators. That is, repeated observations of the production of a given utterance will reveal a characteristic pattern of constrictions being formed and released. The fact that these patterns of (discrete) gestures are similar in structure to the nonlinear phonological representations being currently postulated (e.g., Clements, 1985; Hayes, 1986; Sagey, 1986), together with some of the evidence presented in Browman and Goldstein (1986, in press), leads us to make the strong hypothesis that gestures themselves constitute basic phonological units. This hypothesis has the attractive feature that the basic units of phonology can be identified directly with cohesive patterns of movement within the vocal tract. Thus, the phonological system is built out of inherently discrete units of action. This state of affairs would be particularly useful for a child learning to speak. If we assume that discrete gestures (like those that will eventually function as phonological units) emerge in the child's behavioral repertoire in advance of any specifically linguistic development, then it is possible to view phonological development as harnessing these action units to be the basic units of phonological structures.

The idea that pre-linguistic gestures are employed in the service of producing early words has been proposed and supported by a number of writers, for example, Fry (1966, in Vihman in press), Locke (1983), Studdert-Kennedy (1987) and Vihman (in press), where what we identify as 'gestures' are referred to as 'articulatory routines' or the like. The view we are proposing extends this approach by hypothesizing that these pre-linguistic gestures actually become the units of contrast. Additional phonological developments involve differentiating and tuning the gestures, and developing patterns of intergestural coordination that correspond to larger phonological structures.

The evidence that gestures are pre-linguistic units of action can be seen in the babbling behavior of young infants. The descriptions of infant babbling (ages 6-12 months) suggest a predominance of what are transcribed as simple CV syllables (Locke, 1986; Oller & Eilers, 1982). The 'consonantal' part of these productions can be analyzed as simple, gross, constriction maneuvers of the independent vocal tract subsystems and (within the oral subsystem) the separate oral articulator sets. For example, based on frequency counts obtained from a number of studies, Locke (1983) finds that the consonants in (1) constitute the 'core' babbling inventory: these 12 'consonants' account for about 95% of the babbles of English babies. Similar frequencies obtain in other language environments.

(1) h b d g p t k m n j w s

These transcriptions are not meant to be either systematic phonological representations (the child doesn't have a phonology yet), or narrow phonetic transcriptions (the child cannot be producing the detailed units of its 'target' language, because, as noted below, there do not seem to be systematic differences in the babbles produced by infants in different language environments). Others have noted the problems inherent in using a transcription that assumes a system of units and relations to describe a behavior that lacks such a system (e.g., Kent & Murray, 1982; Koopmans-van Beinum & van der Stelt, 1986; Oller, 1986; Studdert-Kennedy, 1987). As Studdert-Kennedy (1987) argues, it seems likely that these transcriptions reflect the production by the infant of simple vocal constriction gestures, of the kind that evolve into mature phonological structures (which is why adults can transcribe them using their phonological categories). Thus, /h/ can be interpreted as a laryngeal widening gesture and /b d g/ as 'gross' constriction gestures of the three independent oral articulator sets (lips, tongue tip, and tongue body). /p t k/ combine the oral constriction gestures with the laryngeal maneuver, and /m n/ combine oral constrictions with velic lowering. These combinations do not necessarily indicate an ability on the part of the infant to coordinate the gestures. Rather, any accidental temporal coincidence of two such gestures would be perceived by the listener as the segments in question.

The analysis outlined above suggests that babbling involves the emergence, in the infant, of simple constriction gestures of independent parts of the vocal tract. As argued by Locke (1986), the pattern of emergence of these actions can be viewed as a function of anatomical and neurophysiological developments, rather than the beginning of language acquisition, per se. This can be seen, first of all, in the fact that the babbling inventory and its developmental sequence have not been shown to vary as a function of the particular language environment in which the child finds itself (although individual infants may vary considerably from one another in the relative frequencies of particular gestures: Studdert-Kennedy, 1987; Vihman, in press). In fact, in the large number of studies reviewed by Locke (1983), there appear to be no detectable differences (either instrumentally or perceptually) in the 'consonantal' babbling of infants reared in different language environments. (More recent studies have found some language environment effect on the overall long term spectrum of vocalic utterances (de Boysson-Bardies et al., 1986) and on prosody (de Boysson-Bardies et al., 1984). Other subtle effects may be uncovered with improvement of analytic techniques.)

Secondly, Locke (1983) notes that the developmental changes in frequency of particular babbled consonants can likely be explained by anatomical developments. Most of the consonants produced by very young infants (less than six months) involve tongue body constrictions, usually transcribed as velars. Some time shortly after the beginning of repetitive canonical babbling (usually in the seventh month), tongue tip and lip constrictions begin to outnumber tongue body constrictions, with tongue tip constrictions eventually dominating. Even deaf infants show a progression that is qualitatively similar, at least in early stages, although their babbling can be distinguished from that of hearing infants on a number of acoustic measures (Oller & Eilers, 1988). Locke suggests an explanation in terms of vocal tract maturation. At birth, the infant's larynx is high, and the tongue virtually fills the oral cavity (Lieberman, 1984). This would account for the early dominance of tongue body constrictions. After the larynx drops, tongue tip and lip constrictions, without simultaneous tongue constrictions, are more readily formed. In particular, the closing action of the mandible will then contribute to constrictions at the front of the mouth.

Finally, Locke (1986) notes that the timing of the development of repetitive 'syllabic' babbling coincides with the emergence of repetitive motor behaviors generally. He cites Thelen's (1981) observation of 47 different rhythmic activities that have their peak frequency at 6-7 months. Locke concludes (1986, p. 145) that 'it thus appears that the syllabic patterning of babble-like the phonetic patterning of its segments-is determined mostly by nonlinguistic developments of vocal tract anatomy and neurophysiology.'

The pre-linguistic vocal gestures become linguistically significant when the child begins to produce its first few words. The child seems to notice the similarity of its babbled patterns to the speech s/he hears (Locke 1986), and begins to produce 'words' (with apparent referential meaning) using the available set of vocal gestures. It is possible to establish that there is a definite relationship between the (nonlinguistic) gestures of babbling and the gestures employed in early words by examining individual differences among children. Vihman et al. (1985) and Vihman (in press) find that the particular consonants that were produced with high frequency in the babbling of a given child also appear with high frequency in that child's early word productions. Thus, the child is recruiting its well-practiced action units for a new task. In fact, in some early cases (e.g., 'baby' words like mama, etc.), 'recruiting' is too active a notion. Rather, parents are helping the child establish a referential function with sequences that already exist as part of the babbling repertoire (Locke, 1986).

Once the child begins producing words (complex units that have to be distinguished one from another) using the available gestures as building blocks, phonology has begun to form. If we compare the child's early productions (using the small set of pre-linguistic gestures) to the gestural structure of the adult forms, it is clear that there are (at least) two important developments that are required to get from one to the other: (1) differentiation and tuning of individual gestures and (2) coordination of the individual gestures belonging to a given word. Let us examine these in turn.

Differentiation and tuning. While the repertoire of gestures inherent in the consonants of (1) above employs all of the relatively independent articulator sets, the babbled gestures involve just a single (presumably gross) movement. For example, some kind of closure is involved for oral constriction gestures. In general, however, languages employ gestures produced with a given articulator set that contrast in the degree of constriction. That is, not only are closure gestures produced but also fricative and wider (approximant) gestures. In addition, the exact location of the constriction formed by a given articulator set may contrast, e.g., in the gestures for /θ/, /s/ and /ʃ/. Thus, a single pre-linguistic constriction gesture must eventually differentiate into a variety of potentially contrastive gestures, tuned with different values of constriction location and degree. Although the location and degree of the constriction formed by a given articulator set are, in principle, physical continua, the differentiated gestures can be categorically distinct. The partitioning of these continua into discrete categories is likely aided by quantal (i.e., nonlinear) articulatory-auditory relations of the kind proposed by Stevens (1972, 1989). In addition, Lindblom (1986) has shown how the pressures to keep contrasting words perceptually distinct can lead to discrete clustering along some articulatory/acoustic continua. Even so, the process of differentiation may be lengthy. Nittrouer, Studdert-Kennedy, and McGowan (1989) present data on fricative production in American English children. They find that differentiation between /s/ and /ʃ/ is increasing in children from ages three to seven, and hasn't yet reached the level shown by adults. In addition, tuning may occur even where differentiation is unnecessary. That is, even if a language has only a single tongue tip closure gesture, its constriction location may be tuned to a particular language-specific value. For example, English stops have an alveolar constriction location, while French stops are more typically dental.

Coordination. The various gestures that constitute the atoms of a given word must be organized appropriately. There is some evidence that a child can know what all the relevant gestures are for some particular word, and can produce them all, without either knowing or being able to produce the appropriate organization. Studdert-Kennedy (1987) presents an example of this kind from Ferguson and Farwell (1975). They list ten attempts by a 15-month-old girl to say the word pen in a half-hour session, as shown in (2):

(2) [md:!, wA, dedo , hm, mbo, phm, thl)thl)thl), bah,q,hauo, bud]

While these attempts appear radically different, they can be analyzed, for the most part, as the set of gestures that constitute pen misarranged in various ways: glottal opening, bilabial closure, tongue body lowering, alveolar closure and velum lowering. Eventually, the 'right' organization is hit upon by the child. The search is presumably aided by the fact that the coordinated structure embodied in the target language is one of a relatively small number of dynamically stable patterns (see also Boucher, 1988). The formation of such stable patterns may ultimately be illuminated by research into stable modes in coordinated human action in general (e.g., Haken et al., 1985; Schmidt et al., 1987) being conducted within the broad context of the non-linear dynamics relevant to problems of pattern formation in physics and biology (e.g., Glass & Mackey, 1988; Thompson & Stewart, 1976). In addition, aspects of coordination may emerge as the result of keeping a growing number of words perceptually distinct using limited articulatory resources (Lindblom et al., 1983).

If phonological structures are assumed to be organized patterns of gestural units, a distinct methodological bonus obtains: the vocal behavior of infants, even pre-linguistic behavior, can be described using the same primitives (discrete units of vocal action) that are used to describe the fully elaborated phonological system of adults. This allows the growth of phonological form to be precisely monitored, by observing the development of the primitive gestural structures of infants into the elaborated structures of adults (Best & Wilkenfeld, 1988 provide an example of this). In addition, some of the thorny problems associated with the transcription of babbling can be obviated. Discrete units of action are present in the infant, and can be so represented, even if adult-like phonological structures have not yet developed. A similar advantage applies to describing various kinds of 'disordered' speech (e.g., Kent, 1983; Marshall et al., 1988), which may lack the organization shown in 'normal' adult phonology, making conventional phonological/phonetic transcriptions inappropriate, but which may, nevertheless, be composed of gestural primitives. Of course, all this assumes that it is possible to give an account of adult phonology using gestures as the basic units; it is to that account that we now turn.

1.2 The nature of phonological gestures

In conjunction with our colleagues Elliot Saltzman and Philip Rubin at Haskins Laboratories, we are developing a computational model that produces speech beginning with a representation of phonological structures in terms of gestures (Browman et al., 1984; Browman et al., 1986; Browman & Goldstein, 1987; Saltzman et al., 1987), where a gesture is an abstract characterization of coordinated task-directed movements of articulators within the vocal tract. Each gesture is precisely defined in terms of the parameters of a set of equations for a 'task-dynamic' model (Saltzman, 1986; Saltzman & Kelso, 1987). When the control regime for a given gesture is active, the equations regulate the coordination of the model's articulators in such a way that the gestural 'task' (the formation of a specified constriction) is reached as the articulator motions unfold over time. Acoustic output is obtained from these articulator motions by means of an articulatory synthesizer (Rubin et al., 1981). The gestures for a given utterance are themselves organized into a larger coordinated structure, or constellation, that is represented in a gestural score (discussed in § 1.3). The score specifies the sets of values of the dynamic parameters for each gesture, and the temporal intervals during which each gesture is active. While we use analyses of articulatory movement data to determine the parameter values for the gestures and gestural scores, there is nevertheless a striking convergence between the structures we derive through these analyses and phonological structures currently being proposed in other frameworks (e.g., Anderson & Ewen, 1987; Clements, 1985; Ewen, 1982; Lass, 1984; McCarthy, 1989; Plotkin, 1976; Sagey, 1986).

Within task dynamics, the goal for a given gesture is specified in terms of independent task dimensions, called vocal tract variables. Each tract variable is associated with the specific sets of articulators whose movements determine the value of that variable. For example, one such tract variable is Lip Aperture (LA), corresponding to the vertical distance between the two lips. Three articulators can contribute to changing LA: the jaw, vertical displacement of the lower lip with respect to the jaw, and vertical displacement of the upper lip. The current set of tract variables in the computational model, and their associated articulators, can be seen in Figure 1. Within the task dynamic model, the control regime for a given gesture coordinates the ongoing movements of these articulators in a flexible, but task-specific manner, according to the demands of other concurrently active gestures. The motion associated with each of a gesture's tract variables is specified in terms of an equation for a second-order dynamical system.1 The equilibrium position parameter of the equation (x0) specifies the tract variable target that will be achieved, and the stiffness parameter k specifies (roughly) the time required to get to target. These parameters are tuned differently for different gestures. In addition, their values can be modified by stress.

Gestures are currently specified in terms of one or two tract variables. Velic gestures involve a single tract variable of aperture size, as do glottal gestures. Oral gestures involve pairs of tract variables that specify the constriction degree (LA, TTCD, and TBCD) and constriction location (LP, TTCL, and TBCL). For simplicity, we will refer to the sets of articulators involved in oral gestures using the name of the end-effector, that is, the name of the single articulator at the end of the chain of articulators forming the particular constriction: the LIPS for LA and LP, the tongue tip (TT) for gestures involving TTCD and TTCL, and the tongue body (TB) for gestures involving TBCD and TBCL. As noted above, each tract variable is modelled using a separate dynamical equation; however, at present the paired tract variables use identical stiffness and are activated and de-activated simultaneously. The damping parameter b for oral gestures is always set for critical damping: the gestures approach their targets, but do not overshoot them, or 'ring.' Thus, a given oral gesture is specified by the values of three parameters: target values for each of a pair of tract variables, and a stiffness value (used for both equations).
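To make the dynamical specification concrete, the following sketch (ours, for illustration only; it is not the task-dynamic implementation itself, and the parameter values are invented) integrates a single critically damped tract variable obeying m x'' + b x' + k (x - x0) = 0, showing how the target x0 and the stiffness k jointly determine the movement and its time course:

# Minimal sketch of one critically damped tract-variable gesture
# (illustrative only; parameter values are hypothetical).

def simulate_gesture(x_init, x0, k, m=1.0, dt=0.001, duration=0.3):
    """Integrate m*x'' + b*x' + k*(x - x0) = 0 with critical damping b = 2*sqrt(m*k)."""
    b = 2.0 * (m * k) ** 0.5          # critical damping: approach the target without overshoot
    x, v = x_init, 0.0
    trajectory = []
    for _ in range(int(duration / dt)):
        a = (-b * v - k * (x - x0)) / m   # acceleration from the second-order equation
        v += a * dt
        x += v * dt
        trajectory.append(x)
    return trajectory

# Example: lip aperture (LA) closing toward its target; a stiffer gesture gets there faster.
slow = simulate_gesture(x_init=10.0, x0=0.0, k=100.0)   # lower stiffness
fast = simulate_gesture(x_init=10.0, x0=0.0, k=400.0)   # higher stiffness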

This set of tract variables is not yet complete, of course. Other oral tract variables that need to be implemented include an independent tongue root variable (Ladefoged & Halle, 1988), and (as discussed in § 2.1) variables for controlling the shape of TT and TB constrictions as seen in the third dimension. Additional laryngeal variables are required to allow for pitch control and for vertical movement of the larynx, required, for example, for ejectives and implosives.

  tract variable                         articulators involved
  LP    lip protrusion                   upper & lower lips, jaw
  LA    lip aperture                     upper & lower lips, jaw
  TTCL  tongue tip constrict location    tongue tip, body, jaw
  TTCD  tongue tip constrict degree      tongue tip, body, jaw
  TBCL  tongue body constrict location   tongue body, jaw
  TBCD  tongue body constrict degree     tongue body, jaw
  VEL   velic aperture                   velum
  GLO   glottal aperture                 glottis

Figure 1. Tract variables and contributing articulators of computational model.


Figure 2. Inventory of articulator sets and associated parameters.

Figure 2 displays the inventory of articulator sets and associated parameters that we posit are required for a general gestural phonology. The parameters correspond to the particular tract variables of the model shown below them. Those parameters with asterisks are not currently implemented.

The representations of gestures employing distinct sets of tract variables are categorically distinct within the system outlined here. That is, they are defined using different variables that correspond to different sets of articulators. They provide, therefore, an inherent basis for contrast among gestures (Browman & Goldstein, in press). However, for contrasting gestures that employ the same tract variables, the difference between the gestures is in the tuned values of the continuous dynamic parameters (for oral gestures: constriction degree, location and stiffness). That is, unlike the articulator sets being used, the dynamic parameters do not inherently define categorically distinct classes. Nonetheless, we assume that there are stable ranges of parameter values that tend to contrast with one another repeatedly in languages (Ladefoged & Maddieson, 1986; Vatikiotis-Bateson, 1988). The discrete values might be derived using a combination of articulatory and auditory constraints applied across the entire lexicon of a language, as proposed by Lindblom et al. (1983). In addition, part of the basis for the different ranges might reside in the nonlinear relation between the parameter values and their acoustic consequences, as in Stevens' quantal theory (Stevens 1972, 1989). In order to represent the contrastive ranges of gestural parameter values in a discrete fashion, we employ a set of gestural descriptors. These descriptors serve as pointers to the particular articulator set involved in a given gesture, and to the numerical values of the dynamical parameters characterizing the gestures. In addition, they can act as classificatory and distinctive features for the purposes of lexical and phonological structure. Every gesture can be specified by a distinct descriptor structure. This functional organization can be formally represented as in (3), which relates the parameters of the dynamical equations to the symbolic descriptors. Contrasting gestures will differ in at least one of these descriptors.

(3) Gesture = articulator set (constriction degree, constriction location, constriction shape, stiffness)

Constriction Degree is always present, and refers to the x0 value for the constriction degree tract variables (LA, TTCD, TBCD, VEL, or GLO).

Constriction Location is relevant only for oral gestures, and refers to the x0 value for the constriction location tract variables (LP, TTCL, or TBCL).

Constriction Shape is relevant only for oral gestures, and refers to the x0 value of constriction shape tract variables. It is not currently implemented.

Stiffness refers to the k value of the tract variables.

For the present, we list without comment the possible descriptor values for the constriction degree (CD) and constriction location (CL) dimensions in (4). In § 2.1, we will discuss the gestural dimensions and these descriptors in detail, including a comparison to current proposals of featural geometry.


(4) CD descriptors: closed critical narrow mid wide

CL descriptors: protruded labial dental alveolar post-alveolar palatal velar uvular pharyngeal
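As an informal illustration of the descriptor structure in (3) and the value sets in (4), a gesture can be encoded as an articulator set plus symbolic descriptors; the Python class below is our own sketch (the field and set names are illustrative, not part of the model):

from dataclasses import dataclass
from typing import Optional

CD_DESCRIPTORS = {"closed", "critical", "narrow", "mid", "wide"}          # from (4)
CL_DESCRIPTORS = {"protruded", "labial", "dental", "alveolar",
                  "post-alveolar", "palatal", "velar", "uvular", "pharyngeal"}
ARTICULATOR_SETS = {"LIPS", "TT", "TB", "VEL", "GLO"}

@dataclass
class Gesture:
    """Gesture = articulator set (constriction degree, location, shape, stiffness), as in (3)."""
    articulator_set: str
    con_degree: str                      # pointer to an x0 value for the CD tract variable
    con_location: Optional[str] = None   # relevant for oral gestures only
    con_shape: Optional[str] = None      # not currently implemented in the model
    stiffness: Optional[str] = None      # pointer to a k value

# A bilabial closure gesture and a glottal opening gesture:
bilabial_closure = Gesture("LIPS", con_degree="closed", con_location="labial")
glottal_opening = Gesture("GLO", con_degree="wide")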

In the phonology of dynamically defined articulatory gestures that we are developing, gestures are posited to be the atoms of phonological structure. It is important to note that such gestures are relatively abstract. That is, the physically continuous movement trajectories are analyzed as resulting from a set of discrete, concurrently active gestural control regimes. They are discrete in two senses: (1) the dynamic parameters of a gesture's control regime remain constant throughout the discrete interval of time during which the gesture is active, and (2) gestures in a language may differ from one another in discrete ways, as represented by different descriptor values. Thus, as argued in Browman and Goldstein (1986) and Browman and Goldstein (in press), the gestures for a given utterance, together with their temporal patterning, perform a dual function. They characterize the actual observed articulator movements (thus obviating the need for any additional implementation rules), and they also function as units of contrast (and more generally capture aspects of phonological patterning). As discussed in those papers, the gesture as a phonological unit differs both from the feature and from the segment (or root node in current feature geometries). It is a larger unit than the feature, being effectively a unitary constriction action, parameterized jointly by a linked structure of features (descriptor values). Yet it is a smaller unit than the segment: several gestures linked together are necessary to form a unit at the segmental, or higher, levels.

1.3 Gestural scores: Articulatory tiers and internal duration

In the preceding section, gestures were defined with reference to a dynamical system that shapes patterns of articulatory movements. Each gesture possesses, therefore, not only an inherent spatial aspect (i.e., a tract variable goal) but also an intrinsic temporal aspect (i.e., a gestural stiffness). Much of the power of the gestural approach follows from these basic facts about gestures (combined with their abstractness), since they allow gestures to overlap in time as well as in articulator and/or tract variable space (see also Bell-Berti & Harris, 1981; Fowler, 1980, 1983; Fujimura, 1981a,b). In this section, we show how overlap among gestures is represented, and demonstrate that simple changes in the patterns of overlap between neighboring gestural units can automatically produce a variety of superficially different types of phonetic and phonological variation.

Within the computational model described above, the pattern of organization, or constellation, of gestures corresponding to a given utterance is embodied in a set of phasing principles (see Kelso & Tuller, 1987; Nittrouer et al., 1988) that specify the spatiotemporal coordination of the gestures (Browman & Goldstein, 1986). The pattern of intergestural coordination that results from applying the phasing principles, along with the interval of active control for individual gestures, is displayed in a two-dimensional gestural score, with articulatory tiers on one dimension and temporal information on the other (Browman et al., 1986; Browman & Goldstein, 1987). A gestural score for the word palm (pronounced [pam]) is displayed in Figure 3a. As can be seen in the figure, the tiers in a gestural score, on the vertical axis, represent the sets of articulators (or the relevant subset thereof) employed by the gestures, while the horizontal dimension codes time.

The boxes in Figure 3a correspond to individual gestures, labelled by their descriptor values for constriction degree and constriction location (where relevant). For example, the initial oral gesture is a bilabial closure, represented as a constriction of the LIPS, with a [closed] constriction degree, and a [labial] constriction location. The horizontal extent of each box represents the interval of time during which that particular gesture is active. During these activation intervals, which are determined in the computational model from the phasing principles and the inherent stiffnesses of each gesture, the particular set of dynamic parameter values that defines each gesture is actively contributing to shaping the movements of the model articulators. In Figure 3b, curves are added that show the time-varying tract variable trajectories generated by the task dynamic model according to the parameters indicated by the boxes. For example, during the activation interval of the initial labial closure, the curve representing LA (the vertical distance between the lips) decreases. As can be seen in these curves, the activation intervals directly capture something about the durations of the movements of the gestures.

Figure 3. Gestural score for palm [pam] using box notation. (a) Activation intervals only; (b) model-generated tract variable motions added. [Tiers shown: VEL, TB (TBCD), LIPS (LA), GLO; time axis in ms.]
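To suggest how the information in a gestural score such as Figure 3a could be written down explicitly, the following sketch lists each gesture with its tier, descriptors, and activation interval. The millisecond values are invented for illustration; in the model the intervals are derived from the phasing principles and gestural stiffnesses rather than stipulated:

# Sketch of the gestural score for 'palm' [pam] in box notation.
# Each entry is (tier, descriptors, activation onset in ms, activation offset in ms);
# box labels such as 'clo' in Figure 3 abbreviate the 'closed' descriptor.
palm_score = [
    ("LIPS", {"con_degree": "closed", "con_location": "labial"},       0, 100),
    ("GLO",  {"con_degree": "wide"},                                   20, 140),
    ("TB",   {"con_degree": "narrow", "con_location": "pharyngeal"},   60, 340),
    ("VEL",  {"con_degree": "wide"},                                  240, 400),
    ("LIPS", {"con_degree": "closed", "con_location": "labial"},      280, 400),
]

def active_gestures(score, t):
    """Return the gestures whose activation intervals include time t (ms)."""
    return [(tier, desc) for tier, desc, on, off in score if on <= t < off]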


Figure 4 presents an alternative symbolic redisplay2 of the gestural score. Instead of the 'box' notation of Figure 3, a 'point' notation is employed that references only the gestural descriptors and relative timing of their 'targets.' The extent of activation of the gestures in the box notation is not indicated. In addition, association lines between the gestures have been added. These lines indicate which gestures are phased with respect to each other. Thus, the display is a shorthand representation of the phasing rules discussed in Browman and Goldstein (1987). The pair of gestures connected by a given association line are coordinated so that a specified phase of one gesture is synchronized with some phase of the other. For example, the peak opening of the GLO [wide] gesture (180 degrees) is synchronized with the release phase (290 degrees) of the LIPS [clo labial] gesture. Also important for the phase rules is the projection of oral constriction gestures onto separate Vowel and Consonant tiers, which are not shown here (Browman & Goldstein, 1987, 1988; see also Keating, 1985).
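The phase-based coordination just described can be given a procedural reading. The sketch below makes the simplifying assumption (ours, not a claim about the published phasing rules) that a gesture's phase advances linearly over one period of its underlying undamped oscillator, and computes an onset for one gesture so that its 180-degree point coincides with the 290-degree point of another; the stiffness values are invented:

import math

def natural_period(k, m=1.0):
    """Period of the gesture's underlying undamped oscillator, T = 2*pi*sqrt(m/k)."""
    return 2.0 * math.pi * math.sqrt(m / k)

def time_of_phase(onset, k, phase_deg):
    """Time at which a gesture reaches a given phase, assuming phase grows linearly over one period."""
    return onset + (phase_deg / 360.0) * natural_period(k)

def onset_for_phasing(other_onset, other_k, other_phase, own_k, own_phase):
    """Choose an onset so that own_phase of this gesture coincides with other_phase of the other."""
    target_time = time_of_phase(other_onset, other_k, other_phase)
    return target_time - (own_phase / 360.0) * natural_period(own_k)

# Example: synchronize the peak opening (180 deg) of a GLO [wide] gesture with the
# release phase (290 deg) of a LIPS [clo labial] gesture (stiffness values hypothetical).
lips_onset, lips_k, glo_k = 0.0, 400.0, 200.0
glo_onset = onset_for_phasing(lips_onset, lips_k, 290.0, glo_k, 180.0)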

The use of point notation highlights the association lines, and, as we shall see in § 2.2.2, is useful for the purpose of comparing gestural organizations with feature geometry representations in which individual units lack any extent in time. For the remainder of this section, however, we will be concerned with showing the extent of temporal overlap of gestures, and therefore will employ the box notation form of the gestural score.

The information represented in the gestural score serves to identify a particular lexical entry. The basic elements in such a lexical unit are the gestures, which, as we have already seen, can contrast with one another by means of differing descriptor values. In addition, gestural scores for different lexical items can contrast in terms of the presence vs. absence of particular gestures (Browman & Goldstein, 1986, in press; Goldstein & Browman, 1986). Notice that one implication of taking gestures as basic units is that the resulting lexical representation is inherently underspecified, that is, it contains no specifications for irrelevant features. When a given articulator is not involved in any specified gesture, it is attracted to a 'neutral' position specific to that articulator (Saltzman et al., 1988; Saltzman & Munhall, 1989).

The fact that each gesture has an extent in time, and therefore can overlap with other gestures, has a variety of phonological and phonetic consequences. Overlap between invariantly specified gestures can automatically generate contextual variation of superficially different sorts: (1) acoustic noninvariance, such as the different formant transitions that result when an invariant consonant gesture overlaps different vowel gestures (Liberman & Mattingly, 1985); (2) allophonic variation, such as the nasalized vowel that is produced by overlap between a syllable-final velic opening gesture and the vowel gesture (Krakow, 1989); and (3) various kinds of 'coarticulation,' such as the context-dependent vocal tract shapes for reduced schwa vowels that result from overlap by the neighboring full vowels (Browman & Goldstein, 1989; Fowler, 1981). Here, however, we will focus on the implications of directly representing overlap among phonological units for the phonological/phonetic alternations in fluent speech.

Figure 4. Gestural score for palm [pam] using point notation, with association lines added.


Browman and Goldstein (1987) have proposed that the wide variety of differences that have been observed between the canonical pronunciation of a word and its pronunciation in fluent contexts (e.g., Brown, 1977; Shockey, 1974) all result from two simple kinds of changes to the gestural score: (1) reduction in the magnitude of individual gestures (in both time and space) and (2) increase in overlap among gestures. That paper showed how these two very general processes might account for variations that have traditionally been described as segment deletions, insertions, assimilations and weakenings. The reason that increased overlap, in particular, can account for such different types of alternations is that the articulatory and acoustic consequences of increased overlap will vary depending on the nature of the overlapping gestures. We will illustrate this using some of the examples of deletion and assimilation presented in Browman and Goldstein (1987), and then compare the gestural account with the treatment of such examples in non-linear phonological theories that do not directly represent overlap among phonological units.

Let us examine what happens when a gestural score is varied by increasing the overlap between two oral constriction gestures, for example from no overlap to complete synchrony. This 'sliding' will produce different consequences in the articulatory and acoustic output of the model, depending on whether the gestures are on the same or different articulatory tiers, i.e., whether they employ the same or different tract variables. If the gestures are on different articulatory tiers, as in the case of a LIPS closure and a tongue tip (TT) closure, then the resulting tract variable motion for each gesture will be unaffected by the other concurrent gesture. Their tract variable goals will be met, regardless of the amount of overlap. However, with sufficient overlap, one gesture may completely obscure the other acoustically, rendering it inaudible. We refer to this as gestural 'hiding.' In contrast, when two gestures are on the same articulatory tier, for example, the tongue tip constriction gestures associated with /t/ and /θ/, they cannot overlap without perturbing each other's tract variable motions. The two gestures are in competition: they are attempting to do different tasks with the identical articulatory structures. In this case, the dynamical parameters for the two overlapping gestural control regimes are 'blended' (Saltzman et al., 1988; Saltzman & Munhall, in press).
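The contrast between hiding and blending can be sketched schematically as follows; the simple averaging of targets during co-activation is only a stand-in for the actual blending of dynamical parameters in the task-dynamic model, and the numerical values are invented:

def combine_overlapping(gesture_a, gesture_b):
    """Schematic contrast between hiding and blending for two co-active gestures.

    Each gesture is a dict with a 'tract_variable' name and a numerical 'target'.
    Averaging stands in for the actual task-dynamic blending rule.
    """
    if gesture_a["tract_variable"] != gesture_b["tract_variable"]:
        # Different tiers (e.g., LIPS closure over TT closure): both goals are met;
        # one gesture may merely be acoustically hidden by the other.
        return {"outcome": "hiding", "targets_met": [gesture_a["target"], gesture_b["target"]]}
    # Same tier (e.g., the TT gestures of an alveolar stop and a dental fricative):
    # the control regimes compete and their parameters are blended.
    blended = (gesture_a["target"] + gesture_b["target"]) / 2.0
    return {"outcome": "blending", "blended_target": blended}

# Hypothetical constriction-location values (arbitrary units): alveolar vs. dental.
alveolar_t = {"tract_variable": "TTCL", "target": 1.0}
dental_th = {"tract_variable": "TTCL", "target": 0.0}
print(combine_overlapping(alveolar_t, dental_th))   # blended location falls in between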

Browman and Goldstein (1987) presented examples of articulations in fluent speech that showed the hiding and blending behavior predicted by the model. Examples of hiding are transcribed in (5).

(5) (a) /pɚfəkt ˈmɛməri/ → [pɚfəkˈmɛməri]

(b) /sɛvn̩ ˈplʌs/ → [sɛvm̩ ˈplʌs]

In (5a), careful listening to a speaker's production of perfect memory produced in a fluent sentence context failed to reveal any audible /t/ at the end of perfect, although the /t/ was audible when the two words were produced as separate phrases in a word list. This deletion of the final /t/ is an example of a general (variable) process in English that deletes final /t/ or /d/ in clusters, particularly before initial obstruents (Guy, 1980). However, the articulatory data for the speaker examined in Browman and Goldstein (1987) showed that nothing was actually deleted from a gestural viewpoint. The alveolar closure gesture at the end of perfect was produced in the fluent context, with much the same magnitude as when the two words were produced in isolation, but it was completely overlapped by the constrictions of the preceding velar closure and the following labial closure. Thus, the alveolar closure gesture was acoustically hidden. This increase in overlap is represented in Figure 5, which shows the (partial) gestural scores posited for the two versions of this utterance, based on the observed articulatory movements. The gestures for the first syllable of memory (shown as shaded boxes) are well separated from the gestures for the last syllable of perfect (shown as unshaded boxes) in the word list version in Figure 5a, but they slide earlier in time as shown in Figure 5b, producing substantial overlap among three closure gestures. Note that these three overlapping gestures are all on separate tiers.

In (5b), seven plus shows an apparent assimilation, rather than deletion, and was produced when the phrase was produced at a fast rate. Assimilation of final alveolar stops and nasals to a following labial (or velar) stop is a common connected speech process in English (Brown, 1977; Gimson, 1962). Here again, however, the articulatory data in Browman and Goldstein (1987) showed that the actual change was 'hiding' due to increased overlap: the alveolar closure gesture at the end of seven was still produced by the speaker in the 'assimilated' version, but it was hidden by the preceding labial fricative and following labial stop.

Figure 5. Partial gestural score for two versions of perfect memory, posited from observed articulator movements. Last syllable of perfect shown in unshaded boxes; first syllable of memory shown in shaded boxes. Only oral tiers are shown. (a) Words spoken in word list; (b) words spoken as part of fluent phrase.

The changes in the posited gestural score can be seen in Figure 6. Because the velum lowering gesture (VEL [wide]) at the end of seven in Figure 6b overlaps the labial closure, the hiding is perceived as assimilation rather than deletion. Evidence for such hidden gestures (in some cases having reduced magnitude) has also been provided by a number of electropalatographic studies, where they have been taken as evidence of 'partial assimilation' (Barry, 1985; Hardcastle & Roach, 1979; Kohler, 1976 (for German)). Thus, from a gestural point of view, deletion (5a) and assimilation (5b) of final alveolars may involve exactly the same process: increase of overlap resulting in a hidden gesture.

When overlap is increased between two gestures on the same articulatory tier, rather than on different articulatory tiers as in the above examples, the increased overlap results in blending between the dynamical parameters of the two gestures rather than hiding. The trajectories of the tract variables shared by the two gestures are affected by the differing amounts of overlap. Evidence for such blending can be seen in examples like (6).

(6) /tɛn θimz/ → [tɛn̪θimz]

Figure 6. Partial gestural score for two versions of seven plus, posited from observed articulator movements. Last syllable of seven shown in unshaded boxes; plus shown in shaded boxes. The starred [alveolar clo] gesture indicates that laterality is not represented in these scores. (a) Spoken at slow rate; (b) spoken at fast rate.

The apparent assimilation in this case, as well as in many other cases that involve conflicting requirements for the same tract variables, has been characterized by Catford (1977) as involving an accommodation between the two units. This kind of accommodation is exactly what is predicted by parameter blending, assuming that the same underlying mechanism of increased gestural overlap occurs here as in the examples in (5). The (partial) gestural scores hypothesized for (6), showing the increase in overlap, are displayed in Figure 7. The hypothesis of gestural overlap, and consequent blending, makes a specific prediction: the observed motion of the TT tract variables resulting from overlap and blending should differ from the motion exhibited by either of the individual gestures alone. In particular, the location of the constriction should not be identical to that of either an alveolar or a dental, but rather should fall somewhere in between. If this prediction is confirmed, then three (superficially) different fluent speech processes (deletion of final alveolar stops in clusters, assimilation of final alveolar stops and nasals to following labials and velars, and assimilation of final alveolar stops to other tongue tip consonants) can all be accounted for as the consequence of increasing overlap between gestures in fluent speech.

Figure 7. Hypothesized gestural score for two versions of ten themes. ten shown in unshaded boxes; themes shown in shaded boxes. Only oral tiers are shown. (a) Spoken with pause between words; (b) spoken as part of fluent phrase.

How does the gestural analysis of these fluent speech alternations compare with analyses proposed by other theories of non-linear phonology? Assimilations such as those in (6) have been analyzed by Clements (1985) as resulting from a rule that operates on sequences of alveolar stops or nasals followed by [+coronal] consonants. The rule, whose effect is shown in (7), delinks the place node of the first segment and associates the place node of the second segment to the first (by spreading).

(7) [Autosegmental diagram: on the Manner tier, [-cont] precedes [+cons]; each segment has a Supralaryngeal node dominating a Place node; the Place node of the first segment ([+cor, -ant]) is delinked, and the Place node of the second segment ([+cor]) spreads to the first via a dashed association line.]

Since the delinked features are assumed to be deleted, by convention, and not realized phonetically, the analysis in (7) predicts that the place of articulation of the assimilated sequence should be indistinguishable from that of the second consonant when produced alone. This claim differs from that made by the blending analysis, which predicts that the assimilated sequence should show the influence of both consonants. These conflicting predictions can be directly tested.

The overlap analysis accounts for a wider range of phenomena than just the assimilations in (6). The deletion of final alveolars (in clusters) and the assimilation of final alveolars to following stops in (5) were also shown to be cases of hiding due to increased gestural overlap. Clements' analysis does not handle these additional cases, and cannot be extended to do so without major reinterpretation of autosegmental formalisms. To see that this is the case, suppose Clements' analysis is extended by eliminating the [+coronal] requirement on the second segment (for fluent speech). This would produce an assimilation for a case like (5b), but it would not be consistent with the data showing that the alveolar closure gesture is, in fact, still produced. To interpret a delinked gesture as one that is articulatorily produced, but auditorily hidden, would require a major change in the assumptions of the framework.

Within the framework of Sagey (1986), the cases of hiding in (5) could be handled, but the analysis would fail to capture the parallelism between these cases and the blending case in (6). In Sagey's framework, there are separate class nodes for each of the independent oral articulators, corresponding to the articulatory tiers. Thus, an assimilation like that in (5b) could be handled as in (8): the labial node of the second segment is spread to the preceding segment's place node, effectively creating a 'complex' segment involving two articulations.

(8) [Diagram: two Root nodes, [-cont] and [+cons], each dominating a Supralaryngeal node and a Place node; the first Place node dominates Coronal and the second Labial; a dashed association line spreads the Labial node to the preceding segment's Place node.]

However, the example in (6) could not be handled in this way. In Sagey's framework, complex segments can only be created in case there are different articulator nodes, whereas in (6), the same articulator is involved. Thus, an analysis in Sagey's framework will not treat the examples in (5) and (6) as resulting from a single underlying process. If the specific prediction made by the overlap and blending analysis for cases like (6) proves correct, this would be evidence that a unitary process (overlap) is indeed involved, a unity that is directly captured in the overlapping gesture approach, but not in Sagey's framework.

Finally we note that, in general, the gestural approach gives a more constrained and explanatory account of the casual speech changes. All changes are hypothesized to result from two simple mechanisms, which are intrinsically related to the talker's goals of speed and fluency: reduce the size of individual gestures, and increase their overlap. The detailed changes that emerge from these processes are epiphenomenal consequences of the 'blind' application of these principles, and they are explicitly generated as such by our model. Moreover, the gestural approach makes predictions about the kinds of fluent speech variation expected in other languages. Given differences between languages in the canonical gestural scores for lexical items (due to language-specific phasing principles), the same casual speech processes are predicted to have different consequences in different languages. For example, in a language such as Georgian in which stops in the canonical form are always released, word-finally and in clusters (Anderson, 1974), the canonical gestural score should show less overlap between neighboring stops than is the case in English. The gestural model predicts that overlap between gestures will increase in casual speech. But in a language such as Georgian, an increase of the same magnitude as in English would not be sufficient to cause hiding. Thus, no casual speech assimilations and deletions would be predicted for such a language, at least when the increase of gestural overlap is the same magnitude as for English.

The usefulness of internal duration and overlap among phonological elements has begun to be recognized by phonologists (Hammond, 1988; Sagey, 1988). For example, Sagey (1986, pp. 20-21) has argued that phonological association lines, which serve to link the features on separate tiers, 'represent the relation of overlap in time... Thus... the elements that they link... must have internal duration.' However, in nongestural phonologies, the consequences of such overlap cannot be evaluated without additional principles specifying how overlapping units interact. In contrast, in a gestural phonology, the nature of the interaction is implicit in the definition of the gestures themselves, as dynamical control regimes for a (model) physical system. Thus, one of the virtues of the gestural approach is that the consequences of overlap are tightly constrained and made explicit in terms of a physical model. Explicit predictions (e.g., about the relation between the constrictions formed by the tongue tip in themes and in ten themes, as well as about language differences) can be made that test hypotheses about phonological structures.


In summary, a gestural phonology characterizes the movements of the vocal tract articulators during speech in terms of a minimal set of discrete gestural units and patterns of spatiotemporal organization among those units, using dynamic and articulatory models that make explicit the articulatory and acoustic consequences of a particular gestural organization (i.e., gestural score). We have argued that a number of phonological properties of utterances are inherent in these explicit gestural constellations, and thus do not require postulation of additional phonological structure. Distinctiveness (Browman & Goldstein, 1986, in press) and syllable structure (Browman & Goldstein, 1988) can both be seen in gestural scores. In addition, a number of postlexical phonological alternations (Browman & Goldstein, 1987) can be better described in terms of gestures and changes in their organization (overlap) than in terms of other kinds of representations.

2 RELATION BETWEEN GESTURAL STRUCTURES AND FEATURE GEOMETRY

In the remainder of this paper, we look more closely at the relation between gestural structures and recent proposals of feature geometry. The comparison shows that there is much overall similarity and compatibility between feature geometry and the geometry of phonological gestures (§ 2.1). Nevertheless, there are some differences. Most importantly, we show that the gesture is a cohesive unit, that is, a coordinated action of a set of articulators, moving to achieve a constriction that is specified with respect to its degree as well as its location. We argue that the gesture, and gestural scores, could usefully be incorporated into feature geometry (§ 2.2). The gestural treatment of constriction degree as part of a gestural unit leads, however, to an apparent disparity with how manner features are currently handled in feature geometry. We conclude, therefore (§ 3), by proposing a hierarchical tube geometry that resolves this apparent disparity, and that also, we argue, clarifies the nature of manner features and how they should be treated within feature geometry, or any phonological approach.

2.1 Articulatory geometry and gestural descriptors

In this section, the details of the gestural descriptor structures (the distinctive categories of the tract variable parameters) are laid out. Many aspects of these structures are rooted in long tradition. Jespersen (1914), for example, suggested an 'analphabetic' system of specifying articulatory place along the upper tract, the articulatory organ involved, and the degree of constriction. Pike (1943), in a more elaborated analphabetic system, included variables for impressionistic characterization of the articulatory movements, including 'crests,' 'troughs,' and 'glides' (movements between crests and troughs). More recently, various authors (e.g., Campbell, 1974; Halle, 1982; Sagey, 1986; Venneman & Ladefoged, 1973) have argued that phonological patterns are often formed on the basis of the moving articulator used (the articulator set, in the current system). The gestural structures described in this paper differ from these accounts primarily in the explicitness of the functional organization of the articulators into articulatory gestures, and in the use of a dynamic model to characterize the coordinated movements among the articulators.

The articulatory explicitness of the gestural approach leads to a clear-cut distinction between features of input and features of output. That is, a feature such as 'sonority' has very little to do with articulation, and a great deal to do with acoustics (Ladefoged, 1988a,b). This difference can be captured by contrasting the input to the speech production mechanism (the individual gestures in the lexical entry) and the output (the articulatory, aerodynamic and acoustic consequences of combining several gestures in different parts of the vocal tract). The gestural descriptors, then, characterize the input mechanism: they are the 'features' of a purely articulatory phonology. Most traditional feature systems, however, represent a conflation of articulatory and acoustic properties. In order to avoid confusion with such combined feature systems, and in order to emphasize that gestural descriptors are purely articulatory, we retain the non-standard terminology developed in the computational model for the category names of the gestural descriptor values.

In § 2.1.1, we discuss the descriptors corresponding to the articulator sets and show how they are embedded in a hierarchical articulatory geometry, comparable to recent proposals of feature geometry. In §§ 2.1.2-5, we examine the other descriptors in turn, demonstrating that they can be used to define natural classes, and showing how the differences between these descriptors and other feature systems stem from their strictly articulatory and/or dynamic status.

2.1.1 Anatomical hierarchy and articulator sets

The gestural descriptors listed in Figure 2 are not hierarchically organized. That is, each descriptor occupies a separate dimension describing the movement of a gesture. However, there is an implicit hierarchy for the sets of articulators involved. This can be seen in Figure 8, which redisplays (most of) the inventory of articulator sets and associated parameter dimensions from Figure 2, and adds a grouping of gestures into a hierarchy based on articulatory independence. Because the gestures characterize movements within the vocal tract, they are effectively organized by the anatomy of the vocal tract. TT and TB gestures share the individual articulators of tongue body and jaw, so they define a class of Tongue gestures. Similarly, both LIPS and Tongue gestures use the jaw, and are combined in the class of Oral gestures. Finally, Oral, velic (VEL), and glottal (GLO) constitute relatively independent subsystems that combine to form a description of the overall Vocal Tract. This anatomical hierarchy constitutes, in effect, an articulatory geometry.

Figure 8. Articulatory geometry tree.

The importance of an articulatory geometry has repeatedly been noted, for example in the use of articulatory subsystems by Abercrombie (1967) and Anderson (1974). More recently, it has formed the basis of a number of proposals of feature geometry (Clements, 1985; Ladefoged, 1988a,b; Ladefoged & Halle, 1988; McCarthy, 1988; Sagey, 1986). The hierarchy of the moving articulators is central to all these proposals of featural organization, although the evidence discussed may be phonological patterns rather than physiological structure. Leaving aside manner features (to be discussed in §§ 2.2.1 and 3), the various hierarchies differ primarily in the inclusion of a Supralaryngeal node in the feature geometries of Clements (1985) and Sagey (1986), and a Tongue node in the articulatory geometry diagrammed in Figure 8.
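To make the grouping in Figure 8 concrete, the following minimal Python sketch (not part of the published computational model; the nested-dictionary encoding and the helper function are illustrative assumptions) represents the articulator-set hierarchy, with each leaf articulator set paired with representative descriptor dimensions.

# Sketch of the articulator-set hierarchy of Figure 8 as a nested dictionary.
# Leaves are articulator sets; each is paired with illustrative descriptor dimensions.
ARTICULATORY_GEOMETRY = {
    "Vocal Tract": {
        "Oral": {
            "LIPS": ["CL", "CD"],
            "Tongue": {
                "TT": ["CL", "CD"],
                "TB": ["CL", "CD"],
            },
        },
        "VEL": ["CD"],
        "GLO": ["CD"],
    }
}

def articulator_sets(tree):
    """Return the leaf articulator sets (LIPS, TT, TB, VEL, GLO)."""
    leaves = []
    for name, value in tree.items():
        if isinstance(value, dict):
            leaves.extend(articulator_sets(value))
        else:
            leaves.append(name)
    return leaves

print(articulator_sets(ARTICULATORY_GEOMETRY))   # ['LIPS', 'TT', 'TB', 'VEL', 'GLO']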

There is no Supralaryngeal node included in the articulatory geometry of Figure 8, because this geometry effectively organizes the vocal tract input. As we will argue in § 3, the Supralaryngeal node is important in characterizing the output of the vocal tract. However, using the criterion of articulatory independence, we see no anatomical reason to combine any of the three major subsystems into a higher node in the input hierarchy. Rather than being part of a universal geometry, further combinations of the laryngeal, velic, and Oral subsystems into class nodes such as Supralaryngeal, or Central (Oral) vs. Peripheral (velic and laryngeal), should be invoked where necessitated by language-particular organization. For example, the central-peripheral distinction was argued to exist for Toba Batak (Hayes, 1986).

The Tongue node in Figure 8 is proposed on anatomical grounds, again using the criterion of articulatory independence (and relatedness). That is, the TT articulator set shares two articulators with the TB articulator set-the tongue body and the jaw. Put another way, the tongue is an integral structure that is clearly separate from the lips. There is some evidence for this node in phonological patterns as well. A number of articulations made with the tongue cannot be clearly categorized as being made with either TT or TB. For example, laterals in Kuman (Papua New Guinea) alternate between velars and coronals (Lynch, 1983, cited in McCarthy, 1988). They are always Tongue articulations but are not categorizable as exclusively TT or TB. Rather, they require some reference to both articulator sets. This is also the case for English laterals, which can alternate between syllable-initial coronals and syllable-final velars (Ladefoged, 1982).

Palatal and palatoalveolar consonants are another type of articulation that falls between TT and TB articulations (Keating, 1988; Ladefoged & Maddieson, 1986; Recasens, in preparation). Keating (1988) suggests that the intermediate status of palatals-partly TT and partly TB-can be handled by treating palatal consonants as complex segments, represented under both the tongue body and tongue tip/blade nodes. However, without the higher level Tongue node, such a representation equates complex segments such as labio-velars, consisting of articulations of two separate articulators (lips and tongue), with palatals, arguably a single articulation of a single predorsal region of the tongue (Recasens, in preparation). With the inclusion of the Tongue node, labio-velars and palatals would be similar in being double Oral articulations, but different in being LIPS plus Tongue articulations vs. two Tongue articulations. Thus, the closeness of the articulations is reflected in the level of the nodes.

This evidence is suggestive, but not conclusive, on the role of the Tongue node in phonological patterns. We predict that more evidence of phonological patterns based on the anatomical interdependence of the parts of the tongue should exist. One type of evidence should result from the fact that one portion of the tongue cannot move completely independently of the other portions. This lack of independence can be seen, for example, in the suggestion that the tongue body may have a characteristically more backed shape for consonants using the tip/blade in dentals as opposed to alveolars (Ladefoged & Maddieson, 1986; Stevens et al., 1986). We would expect to find blocking rules based on such considerations.

2.1.2 Constriction degree (CD)

CD is the analog within the gestural approach of the manner feature(s). However, it is crucial to note that, unlike the manner classes in feature geometry, CD is first and foremost an attribute of the gesture-the constriction made by the moving set of articulators-and therefore, at the gestural level, is solely an articulatory characterization. (This point will be elaborated in § 2.2.1). In our model, CD is a continuum divided into the following discrete ranges: [closed], [critical], [narrow], [mid], and [wide] (Ladefoged, 1988b, refers to such a partitioning of a continuum as an 'ordered set' of values). The two most closed categories correspond approximately to acoustic stops and fricatives; the names used for these categories indicate that these values are articulatory rather than acoustic. Thus, the second degree of constriction, labelled [critical], indicates that critical degree of constriction for a gesture at which some particular aerodynamic consequences could obtain if there were appropriate air flow and muscular tension. That is, the critical constriction value permits frication (turbulence) or voicing, depending on the set of articulators involved (oral or laryngeal). Similarly, [closed] refers to a tight articulatory closure for that particular gesture; the overall state of the vocal tract might cause this closure to be, acoustically, either a stop or a sonorant. This, in turn, will be determined by the combined effects of the concurrently active gestures. § 3 will discuss in greater detail how we account for natural classes that depend on the consequences of combining gestures.
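As an informal illustration of CD as an 'ordered set', the following short Python sketch (an expository assumption, not the model's own notation; combined intermediate values such as [narrow mid] are not represented) orders the five ranges and defines the comparisons that the percolation principles of § 3 rely on.

# The CD descriptor as an ordered set, from most constricted to most open.
# 'clo' and 'crit' abbreviate [closed] and [critical], as elsewhere in the paper.
CD_SCALE = ["clo", "crit", "narrow", "mid", "wide"]

def wider(a, b):
    """Return the wider (more open) of two CD values."""
    return max(a, b, key=CD_SCALE.index)

def narrower(a, b):
    """Return the narrower (more constricted) of two CD values."""
    return min(a, b, key=CD_SCALE.index)

print(wider("clo", "wide"))      # 'wide'
print(narrower("crit", "mid"))   # 'crit'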

The categorical distinctions among [closed], [critical], and the wider values (as a group) are clearly based on quantal articulatory-acoustic relations (Stevens, 1972). The basis for the distinction among wider values is not as easy to find in articulatory-acoustic relations, although [narrow] might be identified with Catford's (1977) [approximant] category. Catford is also careful to define 'articulatory stricture' in solely articulatory terms. He defines [approximant] as a constriction just wide enough to yield turbulent flow under high airflow conditions such as in open glottis for voicelessness, but laminar flow under low airflow conditions such as in voicing. The other descriptors are required to distinguish among vowels. For example, contrasts among front vowels differing in height are represented as [palatal] constriction locations (cf. Wood, 1982) with [narrow], [mid], or [wide] CDs, where these categories might be established on the basis of sufficient perceptual and articulatory contrast in the vowel system (Lindblom, 1986; Lindblom et al., 1983). If additional differentiation is required, values are combined to indicate intermediate values (e.g., [narrow mid]). In addition, [wide] vs. [narrow] can be used to distinguish the size of glottal aperture-the CD for GLO-associated with aspirated and unaspirated stops, respectively.

2.1.3 Constriction location (CL)

Unlike CD, which differs from its featural analog of manner by being articulatory rather than acoustic, both CL and its featural analog of place are articulatory in definition. However, CL differs from place features in not being hierarchically related to the articulator set. That is, the set of articulators moving to make a constriction, and the location of that constriction, are two independent (albeit highly related) dimensions of a gesture. Thus, we use the label 'Oral' rather than 'place' in the articulatory hierarchy, not only to emphasize the anatomical nature of the hierarchy, but also to avoid the conflation of moving articulator and location along the tract that 'place' conveys.

The constriction location refers to the location on the upper or back wall of the vocal tract where a gestural constriction occurs, and thus is separate from, but constrained by, the articulator set moving to make the constriction. Figure 9 expands the articulator sets, the CL descriptor values and the possible relations between them: the locations where a given set of articulators can form a constriction. Notice that each articulator set maps onto a subset of the possible constriction locations (rather than all possible CLs), where the subset is determined by anatomical possibility. Notice also that there is a non-unique mapping between articulator sets and CL values. For example, both TT and TB can form a constriction at the hard palate; indeed, it may be possible for TB to form a constriction even further forward in the mouth. Thus constriction location cannot be subsumed under the moving articulator hierarchy, contrary to the proposals by Ladefoged (1988a) and Sagey (1986), among others.

Figure 9. Possible mappings between articulator sets and constriction locations.

This can be seen most clearly in the case of the [labial] and [dental] locations, where either the LIPS or TT articulator sets can form constrictions, as shown in (9) (the unusual linguo-labials are discussed in Maddieson, 1987).

(9)
                      CL
Articulator set       [labial]          [dental]
LIPS                  bilabial          labio-dental
TT                    linguo-labial     dental

(A similar matrix is presented in Ladefoged & Maddieson, 1986, but not as part of a formal representation.) That is, the [labial] constriction location cannot be associated exclusively with the LIPS articulator set. Similarly, the [dental] constriction location cannot be associated exclusively with the TT articulator set. Thus, we view CL as an independent cross-classifying descriptor dimension whose values cannot be hierarchically subsumed uniquely under particular articulator sets.

We use multivalued (rather than binary-valued) CL descriptors, following Ladefoged (1988b), since the CL descriptor values correspond to categorical ranges of the continuous dynamic parameter. In the front of the mouth, the basis for the discrete ranges of CL presumably involves the actual differentiated anatomical landmarks, so that parameter values are tuned with respect to them. There may, in addition, be relatively invariant auditory properties associated with the different locations (Stevens, 1989). For the categories that are further back (palatal and beyond), Wood (1982) hypothesizes that the distinct CLs emerge from the alignment of Stevens' quantal considerations with the positioning possibilities allowed by the tongue musculature. To the extent that this set of descriptor values is too limited, it can, again, be extended by combining descriptors, e.g., [palatal velar], or by using 'retracted' and 'advanced' (Principles of the IPA, 1949).
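The non-unique mapping can be made explicit with a small Python sketch; the per-articulator CL inventories below are approximations read off Figure 9 and matrix (9), not an exhaustive statement of the model.

# Many-to-many mapping between articulator sets and constriction locations (CL),
# approximating Figure 9 and matrix (9).
CL_BY_ARTICULATOR = {
    "LIPS": {"labial", "dental"},                                          # bilabial, labio-dental
    "TT":   {"labial", "dental", "alveolar", "post-alveolar", "palatal"},  # incl. linguo-labial
    "TB":   {"palatal", "velar", "uvular", "pharyngeal"},
}

def articulators_for(cl):
    """Which articulator sets can form a constriction at this location?"""
    return {a for a, locs in CL_BY_ARTICULATOR.items() if cl in locs}

# Neither [labial] nor [dental] is exclusive to a single articulator set,
# which is why CL cannot be subsumed under the articulator hierarchy.
print(articulators_for("dental"))   # {'LIPS', 'TT'}
print(articulators_for("palatal"))  # {'TT', 'TB'}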

2.1.4 Constriction shape (CS)

In some cases, gestures involving the same articulator sets and the same values of CL and CD may differ in the shape of the constriction, as looked at in the frontal, rather than sagittal, plane. For example, constrictions involving TT gestures may differ as to whether they are formed with the actual tip or blade of the tongue, the shape of the constriction being 'wider' in the third dimension if produced with the blade. The importance of this difference has been built into recent feature systems as apical vs. laminal (Ladefoged, 1988a), or [distributed] (Sagey, 1986).

Some method for controlling such differences needs to be built into our computational model. An additional TT tract variable (TTR) that specifies the orientation (angle) of the tongue tip in the sagittal plane with respect to the CL and CD axes is currently being incorporated into the task dynamic model. It is possible that different settings of this variable will be able to produce the required apical/laminal differences, as well as allowing the sublaminal contact involved in extreme retroflex stops (Ladefoged & Maddieson, 1986).

For TB gestures, an additional tract variable is also required to control cross-sectional shape. One of the relevant shape differences involves the production of laterals, in which at least one of the sides of the tongue does not make firm contact with the molars. Ladefoged (1980) suggests that the articulatory maneuver involved is a narrowing of the tongue volume, so that it is pulled away from the sides of the mouth. Such narrowing is an attractive option for an additional TB shaping tract variable. In this proposal, an alveolar lateral would essentially involve two gestures: a TT closure, and a TB gesture with a [narrowed] value for CS and perhaps defaults for CL and CD. (Some further specification of TBCD would clearly be required for lateral fricatives, as opposed to lateral approximants). In this sense, laterals would be complex constellations of gestures, as suggested by Ladefoged and Maddieson (1986), and similar to the proposal of Keating (1988) for treating palatals as complex segments. Another role for a TB shaping tract variable might involve bunching for rhotics (Ladefoged, 1980). Finally, it is unclear whether an additional shape parameter is required for LIPS gestures. It might be required to describe differences between the two kinds of rounding observed in Swedish (e.g., Lindau, 1978; Linker, 1982). On the other hand, given proper constraints on lip shape (Abry & Boë, 1986), it may be possible to produce all the required lip shapes with just LA and LP.

2.1.5 Dynamic descriptors

The stiffness (k) of a gesture is a dynamical parameter, inferred from articulatory motions, that has been shown to vary as a function of gestural CD, stress and speaking rate (Browman & Goldstein, 1985; Kelso et al., 1985; Ostry & Munhall, 1985). In addition, however, we hypothesize that stiffness may be tuned independently, so that it can serve as the primary distinction between two gestures. /j/ and /w/ are two cases in which gestural stiffness as an additional independent parameter may be specified. /j/ is a TB [narrow palatal] gesture, and /w/ is a complex formed by a TB [narrow velar] gesture and a LIPS [narrow protruded] gesture. Our current hypothesis is that these gestures have the same CD and CL as for the corresponding vowels (/i/ and /u/), but that they differ in having an [increased] value of stiffness. This is similar to the articulatory description of glides in Catford (1977). Finally, it is possible that the stiffness value that governs the rate of movement into a constriction is related to the actual biomechanical stiffness of the tissues involved. If so, it would be relevant to those gestures that, in fact, require a specific muscular stiffness: trills and taps likely involve characteristic values of oral gesture stiffness, and pitch control and certain phonation types require specified vocal fold stiffnesses (Halle & Stevens, 1971; Ladefoged, 1988a).

2.2 The representation of phonological units

In § 2.1, we laid out the details of gestural descriptors and how they are organized by an articulatory geometry. Setting aside for the moment the critical aspects of gestures discussed in § 1.3 (internal duration and overlap), the differences between feature geometry and the organization of gestural descriptors have so far been comparatively minor-primarily the proposed Tongue node and the non-hierarchical relation between constriction location and the moving articulators. In this section, we consider two ways in which a gestural analysis suggests a different organization of phonological structure from that proposed by the feature geometries referred to in § 2.1.1. First, we present evidence that gestures function as cohesive units of phonological structure (§ 2.2.1). Second, we discuss the advantages of the gestural score as a phonological notation (§ 2.2.2).

2.2.1 The gesture as a unit in phonological patterns

In Figure 8, gestures are in effect the terminal nodes of a feature tree. Notice that the constriction degree, as well as constriction location, constriction shape and stiffness, is part of the descriptor bundle. That is, constriction degree is specified directly at the articulator node-it is one dimension of gestural movement. Looked at in terms of the gestural score (e.g., Figure 3), successive units on a given oral tier contain specifications for both constriction location and degree of that articulator set. This positioning of CD differs from that proposed in current feature geometries. While CL and CS (or their analogs) are typically considered to be dependents of the articulator nodes, CD (or its analogs) is not. Rather, some of the closest analogs to CD-[stricture], [continuant], and [sonorant]-are usually associated with higher levels, either the Supralaryngeal node (Clements, 1985) or the Root node (Ladefoged, 1988a,b; Ladefoged & Halle, 1988; McCarthy, 1988; Sagey, 1986). Is there any evidence, then, that the unit of the gesture plays a role in phonological organization? Within the approach of feature geometry, such evidence would consist, for example, of rules in which an articulator set and the degree and location of the constriction it forms either spread or delete together, as a unitary whole.

It is in fact generally assumed, implicitly although not explicitly, that velic and glottal features have this type of unitary gestural organization. That is, features such as [+nasal] and [-voice] combine the constriction degree (wide) along with the articulator set (velum or glottis). Assuming a default specification for these features ([-nasal] and [+voice]), then denasalization consists of the deletion of a nasal gesture, and intervocalic voicing of the deletion of a laryngeal gesture. Additional implicit use of a gestural unit can be found in Sagey's (1986) proposal that [round] be subordinate to the Labial node. This is exactly the organization that a gestural analysis suggests, since [round] is effectively a specification of the degree and nature of the constriction of the lips.

It is in the case of primary oral gestures that proposals positioning CD at the gestural level and those positioning it at higher levels contrast most sharply. The inherent connection among all the component aspects of making a constriction, or in other words, the unitary nature of the gesture, can be seen most clearly when oral gestures are deleted, a phenomenon sometimes described as delinking of the Place (or Supralaryngeal) node-debuccalization. In a gestural analysis, when the movement of an articulator to make a constriction is deleted, everything about that constriction is deleted, including the constriction degree.

For example, Thrainsson (1978, cited in Clements & Keyser, 1983) demonstrates that in Icelandic, the productive phenomenon whereby the first of a sequence of two identical voiceless aspirated stops is replaced by [h], consists of deleting the first set of supralaryngeal features. This set of supralaryngeal features corresponds to the unit of an oral gesture; it includes the constriction degree, whether described as [clo], [stop] or [-continuant]. Thus, in this example, the entire oral gesture is deleted.

Another example is cited in McCarthy (in press), using data from Straight (1976) and Lombardi (1987). In homorganic stop clusters and affricates (with an intervening word boundary) in Yucatec Maya, the oral portion of the initial stop is deleted-for example, /k#k/ → /h#k/, and (affricate) /ts#t/ → /s#t/. Again in this case the constriction degree is deleted along with the place, supporting a gestural analysis: the entire oral closure gesture is deleted. Note that McCarthy effectively supports this analysis, in spite of his positing [continuant] as dependent on the Root node, when he says 'a stop becomes a segment with no value for [continuant], which is incompatible with supraglottal articulation.'

Thus, there is some clear evidence in phonological patterning for the association of CD with the articulator node. This aspect of gestural organization is totally compatible with the basic approach of feature geometry, requiring only that CD be linked to the articulator node rather than to a higher level node. In § 3, we will show how this gestural affiliation of CD is also compatible with phonological examples in which CD is seemingly separable from the gesture-where it is 'left behind' when a particular articulator-CL combination is deleted, or where it appears not to spread along with the articulatory set and CL. For now, we turn to a second aspect of the gestural approach that could usefully be incorporated into feature geometry, this one involving the use of the gestural score as a two-dimensional projection of the inherently three-dimensional phonological representation.

2.2.2 Gestural scores, articulatory geometry, and phonological units

The gestural score, in particular the 'point' form in Figure 4, is topologically similar to a nonlinear phonological representation. When combined with the hierarchical articulatory geometry, the gestural score captures several relevant dimensions of a nonlinear representation simultaneously, in a clear and revealing fashion. Figure 10 shows a gestural score for palm, using point notation and association lines, combined with the articulatory geometry on the left. Note that the geometry tree is represented on its side, rather than the more usual up-down orientation.


Figure 10. Gestural score for palm [pam] in point notation, with articulatory geometry.

This seemingly trivial point in fact is a very useful consequence of using gestural scores as phonological notation, as we shall see. It permits spatial organization such as the articulatory geometry, which is represented on the vertical axis, to be separated from temporal information including sequencing of phonological units, which is represented on the horizontal axis. Such a separation is particularly useful in those instances in which the gestures in a phonological unit are not simultaneous.

For example, prenasalized stops are single phonological units consisting of a sequence of nasal specifications. Figure 11a depicts a gestural score for prenasalized [mb], which contains a closure gesture on the LIPS tier associated with a sequence of two gestures on the VEL tier, one with CD = [wide] followed by one with CD = [clo]. Double articulations also constitute a single phonological unit. Figure 11b displays a gestural score for [gb], which consists of a [clo labial] gesture on the LIPS tier associated with a [clo velar] gesture on the TB tier. Sagey (1986) terms the first type of unit a contour segment, and the second type a complex segment, a terminology that is an apt description of the two figures.
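For concreteness, here is one way the scores in Figures 10 and 11a-b might be encoded in Python; the tier and descriptor values are read off the figures, while the data structure itself (dictionaries keyed by tier, with ordered gesture lists and no phasing information) is only an expository assumption, not the model's own notation.

# 'palm' [pam] (Figure 10): bilabial closures for /p/ and /m/, a pharyngeal vowel
# gesture, glottal opening for voiceless /p/, and velic opening for nasal /m/.
palm = {
    "VEL":  [{"CD": "wide"}],
    "GLO":  [{"CD": "wide"}],
    "TB":   [{"CD": "narrow", "CL": "pharyngeal"}],
    "LIPS": [{"CD": "clo", "CL": "labial"}, {"CD": "clo", "CL": "labial"}],
}

# Contour segment [mb] (Figure 11a): one LIPS closure associated with a
# *sequence* of two velic gestures, [wide] then [clo].
prenasalized_mb = {
    "LIPS": [{"CD": "clo", "CL": "labial"}],
    "VEL":  [{"CD": "wide"}, {"CD": "clo"}],
}

# Complex segment [gb] (Figure 11b): simultaneous closures on two tiers.
labiovelar_gb = {
    "LIPS": [{"CD": "clo", "CL": "labial"}],
    "TB":   [{"CD": "clo", "CL": "velar"}],
}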

Compare the representations in Figures 11a and 11b to those in Figures 11c and 11d, which are Sagey's (1986) feature geometry specifications of the same two phonological units. In the feature geometry representation, the two types of segments appear to be equally sequential, or equally non-sequential. This is a consequence-an unfortunate consequence-of conflating two uses of branching notation, that of indicating (order-free) hierarchical information and that of indicating sequencing information. Thus, in Figure 11d (on the right), the branching lines represent order-free branching in a hierarchical tree. In Figure 11c, however, the branching lines represent the associations between two ordered elements and a single node on another tier. This conflation results from the particular choice of how to project an inherently three-dimensional phonological structure (feature hierarchy x phonological unit constituency x time) onto two dimensions.

In the gestural score, the two-dimensional projection avoids this conflation of the two uses of branching notation. Here nodes always represent gestures and lines are always association (or phasing) lines. Thus any branching, as in Figure 11a, always indicates temporal information about sequencing. The hierarchical information about the articulatory geometry is present in the organization of the articulatory tiers, and is thus represented (once) by a sideways tree all the way at the left of the gestural score (as in Figure 10). In this way, the gestural score retains the virtues of earlier forms of phonological notation in which sequencing information was clearly distinguishable, while also providing the benefits of the spatial geometry that organizes the tiers hierarchically. It would, of course, be possible to adopt this kind of representation within feature geometry.


Finally, the reader may have observed that, in discussions to now, constituency in phonological units has been indicated solely by using association lines between gestures. In particular, there has been no separate representation of prosodic phonological structure. This is not an inherent aspect of a phonology of articulatory gestures; rather, it is the result of our current research strategy, which is to see how much structure inheres directly in the relations among gestures, without recourse to higher level nodes. However, once again, it is possible to integrate the gestural score with other types of phonological representation. To exemplify how the gestural score can be integrated with explicit phonological structure, Figure 12 displays a mapping between the gestural score for palm and a simplified version of the prosodic structure of Selkirk (1988), with the articulatory geometry indicated on the left.


Figure 11. Comparison of gestural score and feature geometry representations for contour and complex segments. (a) Gestural score for contour segment [mb]; (b) gestural score for complex segment [gb]; (c) feature geometry for contour segment [mb]; (d) feature geometry for complex segment [gb].


Figure 12. Gestural score for palm [pam] in point notation, mapped onto prosodic structure, with articulatory hierarchy on left.


In short, although gestures and gestural scores originated in a description of articulator movement patterns, they nevertheless provide constructs that are useful for phonological representation. Gestures constitute an organization (combining a moving articulator set with the degree and location of the constriction formed) that can act as a phonological unit. Gestural scores provide a useful notation for nonlinear phonological representations, one that permits phonological constituency to be expressed in the same representation with 'phonetic' order information, which is indicated by the relative positions of the gestures in the temporal matrix. The gestures can be grouped into higher-level units, using either association lines or a mapping onto prosodic structure, regardless of the simultaneity, sequentiality, or partial overlap of the gestures. In addition, the articulatory geometry organizing the tiers can be expressed in the same representation.

3 TUBE GEOMETRY AND CONSTRICTION DEGREE HIERARCHY

So far, we have been treating individual gestures as the terminal nodes of an anatomical hierarchy. From this perspective, articulatory geometry serves as a way of creating natural classes of gestures on anatomical grounds. In this section, we turn to the vocal tract as a whole, and consider how additional natural classes emerge from the combined effect (articulatory, aerodynamic and acoustic consequences) of a set of concurrent gestures. We argue that these natural classes can best be characterized using a hierarchy for constriction degree within the vocal tract based on tube geometry.

Rather than viewing the vocal tract as a set of articulators, organized by the anatomy, tube geometry views it as a set of tubes, connected either in series or in parallel. The sets of articulators move within these tubes, creating constrictions. In other words, articulatory gestures occur within the individual tubes. More than one gesture may be simultaneously active; together, these gestures interact to determine the overall aerodynamic and acoustic output of the entire linked set of tubes. Similarities in the tube consequences of a set of different gestures may lead to similar phonological behavior: this is the source of acoustic features (Ladefoged, 1988a) such as [grave] and [flat] (Jakobson, Fant, & Halle, 1969) that organize different gestures (in our terms) according to standing wave nodes in the oral tube (Ohala, 1985; Ohala & Lorentz, 1977).

Within this tube perspective, there is a vocal tract hierarchy that characterizes an instantaneous time slice of the output of the vocal tract. Such a hierarchy may appear identical to the feature geometry of Sagey (1986). There are, however, two crucial differences. Unlike Sagey's root node, which 'corresponds neither to anatomy of the vocal tract nor to acoustic properties' (1986, p. 16), the highest node in the vocal tract hierarchy characterizes the physical state of the vocal tract at a single instant in time. In addition, tube geometry characterizes manner-CD-at more than just the highest level node. CD is characterized at each level of the hierarchy. Thus, each of the tubes and sets of compound tubes in the hierarchy will have its own effective constriction degree that is completely predictable from the CD of its constituents, and hence ultimately from the CD of the currently active gestures. At the vocal tract level, the effective CD of the supralaryngeal tract, taken together with the initiator power provided by the lungs, determines the nature of the actual airflow through the vocal tract: none (complete occlusion), turbulent flow, or laminar flow. This vocal tract hierarchy thus serves as the basis for a hierarchy for CD.

The important point here for phonology is that, in the output system, CD exists simultaneously at all the nodes in the vocal tract hierarchy-it is not isolable to any single node, but rather forms its own CD hierarchy. As we argue in § 3.2, all levels of the hierarchy are potentially important in accounting for phonological regularities, and may form the basis of a natural class. First, however, in § 3.1, we expand on the nature of tube geometry.

3.1 How tube geometry works

Figure 13 provides a pictorial representation of the tubes that constitute the vocal tract, embedded in the space defined by the anatomical hierarchy of articulatory geometry, in the vertical dimension, and by the constriction degree hierarchy of tube geometry, in the horizontal dimension. The constriction action of individual gestures is schematically represented by the small grey disks. Thus, the two dimensions in the figure serve to organize the gestures occurring in the vocal tract tubes, vertically in terms of articulatory geometry, and horizontally in terms of tube geometry. Looking at the tube structure in the center, we can see that the vocal tract branches into three distinct tubes or airflow channels: a Nasal tube, a Central tongue channel and a Lateral tongue channel. The complex is terminated by GLO gestures at one end. At the other end, the Central and Lateral channels are together terminated by LIPS gestures.

The tube geometry tree, shown at the top of the figure, reflects the combinations of the tubes and their terminators. The tube level of the tree has five nodes, one for each of the three basic tubes and the two terminators. Within each of these basic tubes, the CD will be determined by the CD of the gestures acting within that tube, which are shown as subordinates to the tube nodes. For example, the CD of TT gestures will contribute to determining the CD of the Central tube. TB gestures will contribute to the Central and/or Lateral tube, depending on the constriction shape of the TB gesture. Each of the superordinate nodes corresponds to a tube junction, forming a compound tube from simpler tubes and/or the termination of a tube. Thus, the Central and Lateral tubes together form a compound tube labelled the Tongue tube. The compound Tongue tube is terminated by the LIPS configuration, which combines with the Tongue tube to form an Oral tube. The Oral and Nasal tubes form another compound tube, the Supralaryngeal, which is terminated by GLO gestures to form the overall Vocal Tract compound tube.

Figure 13. Vocal tract hierarchy: Articulatory and tube geometry.

The effective CD at each superordinate node can be predicted from the CD of the tubes being joined and the way they are joined. When tubes are joined in parallel, the effective CD of the compound tube has the CD of the widest component tube, that is, the maximum CD. When they are joined in series, the compound tube has the CD of the narrowest component tube, that is, the minimum CD. Terminations and multiple constrictions within the same tube work like tubes connected in series. Using these principles, it is possible to 'percolate' CD values from the values for individual gestures up to the various nodes in the hierarchy. Table 1 shows the possible values for CD of individual gestures, and Table 2 shows how to determine the CD values at each successive node up through the Supralaryngeal node, referred to hereafter as the Supra node (the Vocal Tract node will be discussed below).

Table 1. Possible constriction degree (CD) values at gestural tract variable level: open = (narrow OR mid OR wide). GLO [crit] is the value appropriate for voicing.

a. LIPS [CD]                 = clo, crit, open
b. TT [CD]                   = clo, crit, open
c. TB [CD, CS = normal]      = clo, crit, open
d. TB [CD, CS = narrowed]    = clo, crit, open
e. VEL [CD]                  = clo, open
f. GLO [CD]                  = clo, crit, open

Table 2. Percolation of CD up through the Supralaryngeal node.

a. Nasal [CD]    = VEL [CD]
   Lateral [CD]  = TB [CD, CS = narrowed]
   Central [CD]  = MIN (TT [CD], TB [CD, CS = normal])
b. Tongue [CD]   = MAX (Central [CD], Lateral [CD])
c. Oral [CD]     = MIN (Tongue [CD], LIPS [CD])
d. Supra [CD]    = MAX (Oral [CD], Nasal [CD])
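The percolation of Table 2 is simple enough to state as a short executable sketch. The following Python (an illustrative rendering, not the published GEST implementation; the default values, the treatment of the Lateral channel, and the use of [wide] to stand in for [open] are simplifying assumptions) computes the Supra CD from a set of active gestural CDs.

CD_SCALE = ["clo", "crit", "narrow", "mid", "wide"]   # most constricted -> most open

def _idx(cd):
    return CD_SCALE.index(cd)

def cd_min(*cds):   # tubes in series / terminations: the narrowest constriction wins
    return min(cds, key=_idx)

def cd_max(*cds):   # tubes in parallel: the widest channel wins
    return max(cds, key=_idx)

def percolate(gestures):
    """Percolate gestural CDs up to the Supra node, following Table 2.
    `gestures` maps tract variables to CD values, e.g. {"LIPS": "clo", "VEL": "wide"};
    unspecified variables fall back to simplified defaults (cf. Table 4 below)."""
    g = {"LIPS": "wide", "TT": "wide", "TB": "wide", "TB_narrowed": None, "VEL": "clo"}
    g.update(gestures)
    nodes = {}
    nodes["Nasal"] = g["VEL"]                                      # Table 2a
    nodes["Central"] = cd_min(g["TT"], g["TB"])                    # Table 2a
    nodes["Lateral"] = g["TB_narrowed"] or nodes["Central"]        # Table 2a, default as in Table 4
    nodes["Tongue"] = cd_max(nodes["Central"], nodes["Lateral"])   # Table 2b
    nodes["Oral"] = cd_min(nodes["Tongue"], g["LIPS"])             # Table 2c
    nodes["Supra"] = cd_max(nodes["Oral"], nodes["Nasal"])         # Table 2d
    return nodes

# A nasal stop: a LIPS closure overlapped by an open velum.  The Oral tube is
# closed, but the parallel Nasal tube is open, so Supra percolates to open.
print(percolate({"LIPS": "clo", "VEL": "wide"}))
# {'Nasal': 'wide', 'Central': 'wide', 'Lateral': 'wide', 'Tongue': 'wide', 'Oral': 'clo', 'Supra': 'wide'}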

The percolation principles follow from aerodynamic considerations. Basically, airflow through a tube system will follow the path of least resistance, as we can illustrate with examples of nasals and laterals. The Oral and Nasal tubes are connected in parallel, forming a compound Supralaryngeal tube. If the Oral tube is [closed] but the Nasal tube is [open], Table 2d indicates that the CD of the combined Supra tube will be the wider of the two openings, i.e., [open], as it is in a nasal stop. To take a less obvious example, consider the case where the Nasal tube is [open], but the Oral tube is [crit], i.e., appropriate for turbulence generation under the appropriate airflow conditions. Table 2d predicts that in this case as well the Supra CD is [open], implying that there will be no turbulence generated. This in turn predicts that nasalized fricatives, which consist of exactly the VEL [open] and Oral [crit] gestures, should not exist. That is, given that, at normal airflow rates, the air in this configuration will tend to follow the path of least resistance through the [open] nasal passage, nasalized fricatives would require abnormal degrees of total airflow in order to generate airflow through the [crit] constriction that is sufficient to produce turbulence.

Ohala (1975) has proposed that such nasalized fricatives are, indeed, rare, precisely because they are hard to produce. As Ladefoged and Maddieson (1986) argue, this could account for alternations in which the nasalized counterparts of voiced fricatives are voiced approximants, such as in Guarani (Gregores & Suarez, 1967, in Ladefoged & Maddieson, 1986). However, they also present evidence from Schadeberg (1982) for a nasalized labio-dental fricative in Umbundu. This suggests that the percolation principles may be overridden in certain special cases, perhaps by increased airflow settings.

At the highest level, that of the Vocal Tract, the glottal (GLO) CD and stiffness and the Supra CD combine with initiator (pulmonic) action to determine the actual aerodynamic and acoustic characteristics of airflow through the vocal tract. At this point, then, it becomes more appropriate to label the states of the Vocal Tract in terms of the characteristics of this 'output' airflow, rather than in terms of the CD, which is one of its determining parameters. A gross, but linguistically relevant, characterization of the output distinguishes the three states defined in Table 3a: occlusion, noise and resonance. Assuming some 'average' value for initiator power, Table 3b shows how GLO and Supra CD jointly determine these properties. A complete closure of either system results in occlusion. If GLO is [crit] (appropriate position for voicing, assuming also appropriate stiffness), then it combines with an open Supralaryngeal tract to produce resonance. Any other condition produces noise. In some cases, e.g., GLO [open] and Supra [open], this is weak noise generated at the glottis (aspiration). In other cases, e.g., Supra [crit], the noise is generated in the Oral tube (frication). Further details of the output, such as where the turbulence is generated, and whether voicing accompanies occlusion or noise, are beyond our scope here. Ohala (1983) gives many examples of how the principles involved are relevant to aspects of phonological patterning.

Table 3. Acoustic consequences at Vocal Tract level.

a. VT outputs
   occlusion: no airflow through VT; silence or low-amplitude voicing
   noise: turbulent airflow
   resonance: laminar airflow with voicing; formant structure

b. VT [CD] = occlusion / Supra [clo] OR GLO [clo]
           = resonance / Supra [open] AND GLO [crit]
           = noise / otherwise

Finally, note that the percolation of CD through levels of the tube geometry can be defined at any instant in time, and depends on the actual size of the constriction within each tube at that point in time, regardless of whether some gesture is actively producing the constriction. Thus, the instantaneous CD is the output consequence of the default values for articulators not under active control, the history of the articulator movements, and the constellation of gestures currently active. Table 4 shows the default value we are assuming for each basic tube and terminator. (The default for Lateral is seriously oversimplified, and does not give the right Lateral CD in the case of Central fricatives, although percolation to the Oral level will work correctly).

Table 4. Default CD values for basic tubes and terminators. Default values are determined by the effect of the model articulators' neutral configuration within a tube.

Nasal [CD]    = clo
Lateral [CD]  = Central [CD]
Central [CD]  = open
LIPS [CD]     = open
GLO [CD]      = crit
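Tables 3 and 4 can likewise be rendered as a small self-contained sketch; the function below follows Table 3b, and the defaults follow Table 4 (again as an illustrative assumption, with narrow/mid/wide collapsed into 'open').

DEFAULTS = {             # Table 4: defaults for basic tubes and terminators
    "Nasal":   "clo",    # velum raised unless a VEL gesture opens it
    "Lateral": None,     # defaults to the Central CD (oversimplified, as noted in the text)
    "Central": "open",
    "LIPS":    "open",
    "GLO":     "crit",   # glottis positioned for voicing
}

def vt_output(supra_cd, glo_cd):
    """Table 3b: classify the Vocal Tract output from Supra and GLO constriction degree.
    Arguments take the values 'clo', 'crit', or 'open' (narrow/mid/wide collapsed)."""
    if supra_cd == "clo" or glo_cd == "clo":
        return "occlusion"   # no airflow through VT; silence or low-amplitude voicing
    if supra_cd == "open" and glo_cd == "crit":
        return "resonance"   # laminar airflow with voicing; formant structure
    return "noise"           # turbulent airflow (aspiration or frication)

print(vt_output("open", "crit"))   # 'resonance'  (vowel, nasal, lateral)
print(vt_output("crit", "crit"))   # 'noise'      (oral fricative)
print(vt_output("clo",  "crit"))   # 'occlusion'  (oral stop)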

3.2 The constriction degree hierarchy

The importance of tube geometry for phonology resides in the fact that constriction degree is not isolable to any single level of the vocal tract. Rather, constriction degree exists at a number of levels simultaneously, with the CD at each level in this hierarchy defining a potential natural class. In this section, we present some examples in which CD from different levels is phonologically relevant, and argue that the CD hierarchy is important and clarifying for any featural system, as well as for gestural phonology.

We are proposing (1) that there is a universal geometry for CD in which all nodes in the CD hierarchy are simultaneously present, and (2) that different CD nodes are used to represent different natural classes. This approach differs from that adopted in current feature geometries, in which manner features such as [nasal], [sonorant], [continuant] or [consonantal] are usually represented at a single level in the feature hierarchy. Different geometries, however, choose different levels. For example, [nasal] has been variously considered to be a feature dependent on the highest (Root) node (McCarthy, 1988), on the Supralaryngeal node (via a manner node) (Clements, 1985), or on a Soft Palate node (Ladefoged, 1988a,b; Ladefoged & Halle, 1988; Sagey, 1986). It is possible that this variability in the treatment of [nasal], and other features, reflects the natural variation in the CD hierarchy, such that different phonological phenomena are captured using CD at different levels in the hierarchy.

To see how this might work, consider the four hierarchies in Figure 14, which use the tube geometry at the top of Figure 13 to characterize the linked CD structures for a vowel, a lateral, a nasal, and an oral stop. Since tube geometry plays the same role in the representation as the articulatory geometry in Figure 10, the tube hierarchies are rotated 90 degrees just as the anatomical hierarchy was. The circles are a graphic representation of the CD for each node, with the filled circles indicating CD = [clo], the wavy lines indicating CD = [crit], and the open circles indicating CD = [open] (the symbols indicate occlusion, noise, and resonance, respectively, at the Vocal Tract level). These CD hierarchies express the various classificatory (featural) similarities and dissimilarities among the four segments, but they do so at different levels.

Figure 14. CD hierarchies for vowel, lateral, nasal and oral stop.

For example, the nasal has the same characterization as the vowel and lateral at the Supralaryngeal and Vocal Tract levels, but diverges at lower levels. This aspect of the CD hierarchy thus defines a phonological natural class consisting of nasals, laterals, and vowels, where the class is characterized by Supra [open] and VT ['open' = resonance]. This natural class is typically represented by the feature [sonorant], although different feature systems select one or the other of these levels in their definition of sonority. For example, Ladefoged's (1988a,b) acoustic feature distinction of [sonorant] utilizes the identity of the value in the CD hierarchy at the VT level. That is, for Ladefoged [+sonorant] differs from [-sonorant] in terms of output at the VT level. However, for Chomsky and Halle (1968) and Stevens (1972), it is the identity at the Supralaryngeal level that characterizes sonority, since they consider /h/ and /ʔ/ to be sonorants even though they differ from the nasal, vowel and lateral at the VT level.
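The level-by-level values of Figure 14 can also be tabulated directly; in the small Python sketch below the values are hand-transcribed from the figure and the surrounding discussion (only a subset of the levels is shown), and a helper picks out a natural class by the CD value at a chosen level of the hierarchy.

SEGMENTS = {
    "/a/ vowel":   {"VT": "resonance", "Supra": "open", "Oral": "open", "Nasal": "clo",  "VEL": "clo"},
    "/l/ lateral": {"VT": "resonance", "Supra": "open", "Oral": "open", "Nasal": "clo",  "VEL": "clo"},
    "/n/ nasal":   {"VT": "resonance", "Supra": "open", "Oral": "clo",  "Nasal": "open", "VEL": "open"},
    "/d/ stop":    {"VT": "occlusion", "Supra": "clo",  "Oral": "clo",  "Nasal": "clo",  "VEL": "clo"},
}

def natural_class(level, value):
    """All segments whose CD (or output) at the given level matches the given value."""
    return {seg for seg, cds in SEGMENTS.items() if cds[level] == value}

print(natural_class("Supra", "open"))   # sonorants: vowel, lateral, nasal
print(natural_class("Oral", "clo"))     # nasal and oral stop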


At the Oral level (and below), the nasal and stop display the same linked CD structure, which differs from that of the lateral and that of the vowel. That is, nasals and stops form a natural class in having Oral [clo], whereas laterals and vowels form a class in having Oral [open]. This difference in CD resides at a lower level of the hierarchy than the Vocal Tract or Supralaryngeal levels. And at the lowest levels, the Tube and Gestural, the nasal differs from the stop, lateral, and vowel in having both a Nasal [CD] and VEL [CD] that are [open]. Thus, constriction degree at various levels can serve to categorize and distinguish phonological units in different ways.

Within traditional feature systems, each of the natural classes described above receives a separate name, which obscures the systematic relation among the various classes. Moreover, even in feature geometry, when manner features are restricted to a single level there is no principled representation of the hierarchical relation among the natural classes. However, feature geometries sometimes incorporate pieces of the CD hierarchy, for example, in the assignment of [nasal] and [continuant] as dependents of two different nodes in the hierarchy (Soft Palate and Root: Sagey, 1986), or the assignment of [sonorant] as part of the Root node itself, but [continuant] as a dependent on the Root node (McCarthy, 1988). Note, however, that hierarchical relations among manner classes have to be stipulated in feature geometries, whereas such relations are inherent in the CD hierarchy. In addition, the percolation principles of tube geometry provide a mechanism for relating values of the different levels to each other, again something that would need to be stipulated in a hierarchy not based on tube geometry.

All the levels of the CD hierarchy appear to be useful for establishing natural classes, and for relating the CD natural classes to one another. In addition, various kinds of phonological patterns, such as phonological alternations, can be examined in light of this hierarchy. In particular, we can investigate whether regularities are best expressed as processes that treat CD as tied to a particular gesture (as an 'input' parameter) or as a consequence of gestural combination at the various 'output' levels of the CD hierarchy. For example, the cases discussed in § 2.2.1 (e.g., /k#k/ → /h#k/) can be best described as processes that delete entire gestures. CD is tied to the gesture and is deleted along with the articulator set and CL. This deletion automatically accounts (by percolation) for the CD changes at other levels of the hierarchy.

In other cases, however, there is much overdetermination-the phonological behavior can be described equally well from more than one perspective. For example, McCarthy (1988) discusses the common instances in which /s/ → /h/, and glottalized consonants /p' t' k'/ → /ʔ/. Both of these examples can be straightforwardly analyzed as deletion of the oral gesture, as in the cases in § 2.2.1. But it is also true that, in both cases, the Vocal Tract output CD is unchanged by the gestural deletion. In the first case, it remains noisy, and in the second it remains an occlusion. Thus, the description of the phonological behavior could also focus on the (apparent) relative independence of the VT [CD] and the articulator set involved-the articulator set(s) changes, but CD does not. The equivalence of the two perspectives results from the fact that the percolated values of VT [CD] are the same for the two gestures alone and for their combination. Thus, deletion of one or the other will not change the VT [CD]. Examples of this kind, of which there are likely to be many, can be viewed from a dual perspective.

One example of a process for which it seems (at first glance) that a dual perspective cannot be maintained involves assimilation. Steriade (in press) has argued that assimilation is problematic for the gestural approach, since sometimes only the place, and not the manner, features are assimilated. For example, in Kpelle nasals assimilate in place but not manner to a following stop or fricative, so that /N-f/ becomes a sequence (broadly) transcribed as [mv] (Sagey, 1986). As Steriade points out, this separation of the place and manner features appears to argue against a gestural analysis, in which the assimilation results from increased overlap between the oral gesture for the fricative (LIPS [crit dent]) and the velum lowering gesture (VEL [wide]). Since the nasal does not become a nasal fricative, it would appear either that there is no increased overlap, or that the LIPS [crit] gesture changes its CD when it overlaps the velum lowering gesture. From the perspective of the CD hierarchy, the Supra CD of the nasal is the same before and after the assimilation ([open]), and therefore the Supra level (or VT level) is the significant one for CD, rather than the Gestural level. However, the percolation principles from tube geometry predict that the Supra CD will be [open] even if the nasal is overlapped by a fricative gesture, as derived in (10).

(10)

Supra [open]
    Oral [crit]
        Tongue [open]
            Central [open]
        LIPS [crit]
    VEL [wide]

There could also be a TT gesture in the derivation that is either hidden or deleted-the Supra CD would still be [open]. Thus, an increase in gestural overlap could produce both the place assimilation as well as the correct CD at the Supra level, without any change of the Gestural CD. (An articulatory study is needed to determine whether the Gestural CD does indeed change).

The examples presented so far are processes that are either best described with CD attached firmly to a given gesture as it undergoes change (e.g., deletion or sliding), or are at least equally well described that way. However, there are phenomena whose description requires a 'loosening' of the relation between CD and the gesture. One such situation, in which the Gestural CD is effectively separated from its articulator set, occurs in historical change (Browman & Goldstein, in press), in particular in the types of historical change that Ohala (1981) terms 'listener-based' sound changes. Such cases arise particularly when gestural overlap leads to ambiguity in the acoustic signal that the listener can parse into one of two gestural patterns. In such cases, Browman and Goldstein (in press) argue that there is a (historical) 'reassignment' of the constriction degree parameter between overlapping gestures. This analysis was used to account for the /x/ → /f/ changes in English words like cough, based on an argument originally put forth by Pagliuca (1982). Briefly, the analysis proposed that the [crit] descriptor for the TB gesture for /x/ was re-assigned to an overlapping LIPS gesture.

A related synchronic situation, involving two overlapping gestures in a complex segment, is discussed in Sagey (1986). Her analysis suggests that manner features must be represented on more than a single node in a feature hierarchy.

Specifically, Sagey demonstrates that multiple oral gestures cohering in a complex segment may be restricted to a single distinctive constriction degree. She represents this single distinctive constriction degree at the highest level in the hierarchy, but still must represent which particular articulation bears the contrastive CD. Thus, CD is specified at two different levels. Sagey diagrams the relation between the two levels by a looping arrow drawn between the distinctive level and the 'major' articulator making the distinctive constriction. In the CD hierarchy, the Oral CD would be the lowest possible distinctive level for these double articulations. The gesture that carries the distinctive CD could be marked as [head], and would automatically agree in CD with the mother node, as in (11).

(11)

Oral [αCD]
    Gesture [αCD]
        [head]
    Gesture

Moreover, in examples such as this, the percolation principles do not contribute to determining the CD of the distinctive node, except in a negative way to be discussed shortly. Sagey argues that the distinctive CD cannot be predicted from physical principles, since in Margi labio-coronals the coronal articulation is major, and hence in /ps/ the less radical constriction is the distinctive constriction, rather than the more radical one. Thus, the relation between the Oral and Gestural levels in (11) must be a statement about a functional phonological unit, a gestural constellation, rather than a statement characterizing an instantaneous time slice in terms of tube geometry. That is, only one of the gestures in the constellation may bear a distinctive CD. Nevertheless, the importance of physical overlap relations can also be seen in this example. Maddieson and Ladefoged (1989) have shown that complex segments such as [gb] are not completely synchronous, apparently lending support to the distinction between phonetic ordering, on the one hand, and phonological unordering, on the other hand. However, we argue that the ordering of gestures within a complex segment is phonologically important in exactly the case of a phonological unit like Margi /ps/, where the distinctive value of CD at higher levels is not predicted from the lower level CDs by the percolation principles. If two gestures overlap completely (i.e., are precisely coextensive), and there are no additional phonetic cues of the sort discussed in Maddieson & Ladefoged, then the percolation principles will determine the higher level constriction degree throughout the entire time-course of the phonological unit, and the distinctive CD will fail to be conveyed. For example, in the case of /ps/, if the two gestures were precisely aligned, the (distinctive) frication would never appear at the VT level. Only if the gestures are slightly offset can the distinctive CD be communicated.

In general, the CD hierarchy affords a structure within which a typology of phonological processes can be developed, based on the CD level that seems most relevant. It is an interesting research challenge to develop this typology, and to ask how it is related to other ways of categorizing the processes. For example, are there systematic differences in relevant CD level that correlate with whether the process involves spreading (sliding) rules or deletion rules, or with whether the process is prelexical or postlexical?

4 SUMMARY

We have argued that dynamically-defined articulatory gestures are the appropriate units to serve as the atoms of phonological representation. Gestures are a natural unit, not only because they involve task-oriented movements of the articulators, but because they arguably emerge as prelinguistic discrete units of action in infants. The use of gestures, rather than constellations of gestures as in Root nodes, as basic units of description makes it possible to characterize a variety of language patterns in which gestural organization varies. Such patterns range from the misorderings of disordered speech through phonological rules involving gestural overlap and deletion to historical changes in which the overlap of gestures provides a crucial explanatory element.

Gestures can participate in language patterns involving overlap because they are spatiotemporal in nature and therefore have internal duration. In addition, gestures differ from current theories of feature geometry by including the constriction degree as an inherent part of the gesture. Since the gestural constrictions occur in the vocal tract, which can be characterized in terms of tube geometry, all the levels of the vocal tract will be constricted, leading to a constriction degree hierarchy. The values of the constriction degree at each higher level node in the hierarchy can be predicted on the basis of the percolation principles and tube geometry. In this way, the use of gestures as atoms can be reconciled with the use of constriction degree at various levels in the vocal tract (or feature geometry) hierarchy.

The phonological notation developed for the gestural approach might usefully be incorporated, in whole or in part, into other phonologies. Five components of the notation were discussed, all derived from the basic premise that gestures are the primitive phonological unit, organized into gestural scores. These components include (1) constriction degree as a subordinate of the articulator node and (2) stiffness (duration) as a subordinate of the articulator node. That is, both CD and duration are inherent to the gesture. The gestures are arranged in gestural scores using (3) articulatory tiers, with (4) the relevant geometry (articulatory, tube or feature) indicated to the left of the score and (5) structural information above the score, if desired. Association lines can also be used to indicate how the gestures are combined into phonological units. Thus, gestures can serve both as characterizations of articulatory movement data and as the atoms of phonological representation.

REFERENCES
Abercrombie, D. (1967). Elements of general phonetics. Edinburgh: Edinburgh University Press.
Abry, C., & Boë, L.-J. (1986). 'Laws' for lips. Speech Communication, 5, 97-104.
Anderson, J., & Ewen, C. J. (1987). Principles of dependency phonology. Cambridge: Cambridge University Press.
Anderson, S. R. (1974). The organization of phonology. New York: Academic Press.
Barry, M. C. (1985). A palatographic study of connected speech processes. Cambridge Papers in Phonetics and Experimental Linguistics, 4, 1-16.
Bell-Berti, F., & Harris, K. S. (1981). A temporal model of speech production. Phonetica, 38, 9-20.

Best, C. T., & Wilkenfeld, D. (1988). Phonologically motivated substitutions in a 20-22 month old's imitations of intervocalic alveolar stops. Journal of the Acoustical Society of America, 84 (S1), S114. (Abstract)
Boucher, V. (1988). For a phonology without 'rules': Peripheral timing-structures and allophonic patterns in French and English. In S. Embleton (Ed.), The Fourteenth LACUS Forum 1987 (pp. 123-140). Lake Bluff, IL: Linguistic Association of Canada and the U.S.
Boysson-Bardies, B. de, Sagart, L., & Durand, C. (1984). Discernible differences in the babbling of infants according to target language. Journal of Child Language, 11, 1-15.
Boysson-Bardies, B. de, Sagart, L., Halle, P., & Durand, C. (1986). Acoustic investigations of cross-linguistic variability in babbling. In B. Lindblom & R. Zetterstrom (Eds.), Precursors of early speech (pp. 113-126). Basingstoke, Hampshire: MacMillan.
Browman, C. P., & Goldstein, L. (1985). Dynamic modeling of phonetic structure. In V. Fromkin (Ed.), Phonetic linguistics (pp. 35-53). New York: Academic.
Browman, C. P., & Goldstein, L. (1986). Towards an articulatory phonology. Phonology Yearbook, 3, 219-252.
Browman, C. P., & Goldstein, L. (1987). Tiers in articulatory phonology, with some implications for casual speech. Haskins Laboratories Status Report on Speech Research, SR-92 (pp. 1-30). To appear in J. Kingston & M. E. Beckman (Eds.), Papers in laboratory phonology I: Between the grammar and the physics of speech. Cambridge: Cambridge University Press.
Browman, C. P., & Goldstein, L. (1988). Some notes on syllable structure in articulatory phonology. Phonetica, 45, 140-155.
Browman, C. P., & Goldstein, L. (1989). 'Targetless' schwa: An articulatory analysis. Paper presented at the Second Conference on Laboratory Phonology, Edinburgh, June 29-July 2, 1989.
Browman, C. P., & Goldstein, L. (in press). Gestural structures and phonological patterns. In I. G. Mattingly & M. Studdert-Kennedy (Eds.), Modularity and the motor theory of speech perception. Hillsdale, NJ: Lawrence Erlbaum Associates.
Browman, C. P., Goldstein, L., Kelso, J. A. S., Rubin, P., & Saltzman, E. (1984). Articulatory synthesis from underlying dynamics. Journal of the Acoustical Society of America, 75, S22-S23. (Abstract)
Browman, C. P., Goldstein, L., Saltzman, E., & Smith, C. (1986). GEST: A computational model for speech production using dynamically defined articulatory gestures. Journal of the Acoustical Society of America, 80, S97. (Abstract)
Brown, G. (1977). Listening to spoken English. London: Longman.
Campbell, L. (1974). Phonological features: Problems and proposals. Language, 50, 52-65.
Catford, J. C. (1977). Fundamental problems in phonetics. Bloomington: Indiana University Press.
Chomsky, N., & Halle, M. (1968). The sound pattern of English. New York: Harper and Row.
Clements, G. N. (1985). The geometry of phonological features. Phonology Yearbook, 2, 225-252.
Clements, G. N., & Keyser, S. J. (1983). CV phonology: A generative theory of the syllable. Cambridge, MA: MIT Press.
Ewen, C. J. (1982). The internal structure of complex segments. In H. van der Hulst & N. Smith (Eds.), The structure of phonological representations (pp. 27-67). Cinnaminson, NJ: Foris Publications U.S.A.
Ferguson, C. A., & Farwell, C. B. (1975). Words and sounds in early language acquisition. Language, 51, 419-439.
Fowler, C. A. (1980). Coarticulation and theories of extrinsic timing. Journal of Phonetics, 8, 113-133.
Fowler, C. A. (1981). A relationship between coarticulation and compensatory shortening. Phonetica, 38, 35-50.
Fowler, C. A. (1983). Converging sources of evidence on spoken and perceived rhythms of speech: Cyclic production of vowels in sequences of monosyllabic stress feet. Journal of Experimental Psychology: General, 112, 386-412.
Fowler, C. A., Rubin, P., Remez, R. E., & Turvey, M. T. (1980). Implications for speech production of a general theory of action. In B. Butterworth (Ed.), Language production (pp. 373-420). New York: Academic Press.
Fry, D. B. (1966). The development of the phonological system in the normal and the deaf child. In F. Smith & G. Miller (Eds.), The genesis of language: A psycholinguistic approach (pp. 187-206). Cambridge, MA: MIT Press.
Fujimura, O. (1981a). Temporal organization of articulatory movements as a multidimensional phrasal structure. Phonetica, 38, 66-83.
Fujimura, O. (1981b). Elementary gestures and temporal organization-What does an articulatory constraint mean? In T. Myers, J. Laver, & J. Anderson (Eds.), The cognitive representation of speech (pp. 101-110). Amsterdam: North Holland.
Gimson, A. C. (1962). An introduction to the pronunciation of English. London: Edward Arnold Publishers, Ltd.
Glass, L., & Mackey, M. C. (1988). From clocks to chaos. Princeton: Princeton University Press.
Goldstein, L., & Browman, C. P. (1986). Representation of voicing contrasts using articulatory gestures. Journal of Phonetics, 14, 339-342.
Gregores, E., & Suarez, J. A. (1967). A description of colloquial Guaraní. The Hague: Mouton.
Guy, G. R. (1980). Variation in the group and the individual: The case of final stop deletion. In W. Labov (Ed.), Locating language in time and space (pp. 1-36). New York: Academic Press.
Haken, H., Kelso, J. A. S., & Bunz, H. (1985). A theoretical model of phase transitions in human hand movements. Biological Cybernetics, 51, 347-356.
Halle, M. (1982). On distinctive features and their articulatory implementation. Natural Language and Linguistic Theory, 1, 91-105.
Halle, M., & Stevens, K. (1971). A note on laryngeal features. MIT-RLE Quarterly Progress Report, 11, 198-213.
Hammond, M. (1988). On deriving the well-formedness condition. Linguistic Inquiry, 19, 319-325.
Hardcastle, W. J., & Roach, P. J. (1979). An instrumental investigation of coarticulation in stop consonant sequences. In P. Hollien & H. Hollien (Eds.), Current issues in the phonetic sciences (pp. 531-540). Amsterdam: John Benjamins AG.
Hayes, B. (1986). Assimilation as spreading in Toba Batak. Linguistic Inquiry, 17, 467-499.
Jakobson, R., Fant, C. G. M., & Halle, M. (1969). Preliminaries to speech analysis: The distinctive features and their correlates. Cambridge, MA: The MIT Press.
Jespersen, O. (1914). Lehrbuch der Phonetik. Leipzig: B. G. Teubner.
Keating, P. A. (1985). CV phonology, experimental phonetics, and coarticulation. UCLA Working Papers in Phonetics, 62, 1-13.
Keating, P. A. (1988). Palatals as complex segments: X-ray evidence. UCLA Working Papers in Phonetics, 69, 77-91.
Kelso, J. A. S., & Tuller, B. (1987). Intrinsic time in speech production: Theory, methodology, and preliminary observations. In E. Keller & M. Gopnik (Eds.), Motor and sensory processes of language (pp. 203-222). Hillsdale, NJ: Erlbaum.
Kelso, J. A. S., Vatikiotis-Bateson, E., Saltzman, E. L., & Kay, B. (1985). A qualitative dynamic analysis of reiterant speech production: Phase portraits, kinematics, and dynamic modeling. Journal of the Acoustical Society of America, 77, 266-280.

Kent, R. D. (1983). The segmental organization of speech. In P. F. MacNeilage (Ed.), The production of speech (pp. 57-89). New York: Springer-Verlag.
Kent, R. D., & Murray, A. D. (1982). Acoustic features of infant vocalic utterances at 3, 6, and 9 months. Journal of the Acoustical Society of America, 72, 353-365.
Kohler, K. (1976). Die Instabilität wortfinaler Alveolarplosive im Deutschen: Eine elektropalatographische Untersuchung [The instability of word-final alveolar plosives in German: An electropalatographic investigation]. Phonetica, 33, 1-30.
Koopmans-van Beinum, F. J., & Van der Stelt, J. M. (1986). Early stages in the development of speech movements. In B. Lindblom & R. Zetterstrom (Eds.), Precursors of early speech (pp. 37-50). Basingstoke, Hampshire: MacMillan.
Krakow, R. A. (1989). The articulatory organization of syllables: A kinematic analysis of labial and velar gestures. Doctoral dissertation, Yale University.
Ladefoged, P. (1980). What are linguistic sounds made of? Language, 56, 485-502.
Ladefoged, P. (1982). A course in phonetics (2nd ed.). New York: Harcourt Brace Jovanovich.
Ladefoged, P. (1988a). Hierarchical features of the International Phonetic Alphabet. UCLA Working Papers in Phonetics, 70, 1-12.
Ladefoged, P. (1988b). The many interfaces between phonetics and phonology. UCLA Working Papers in Phonetics, 70, 13-23.
Ladefoged, P., & Halle, M. (1988). Some major features of the International Phonetic Alphabet. Language, 64, 577-582.
Ladefoged, P., & Maddieson, I. (1986). Some of the sounds of the world's languages. UCLA Working Papers in Phonetics, 64, 1-137.
Lass, R. (1984). Phonology: An introduction to basic concepts. Cambridge: Cambridge University Press.
Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception revised. Cognition, 21, 1-36.
Lieberman, P. (1984). The biology and evolution of language. Cambridge: Harvard University Press.
Lindau, M. (1978). Vowel features. Language, 54, 541-563.
Lindblom, B. (1986). Phonetic universals in vowel systems. In J. J. Ohala & J. Jaeger (Eds.), Experimental phonology (pp. 13-44). Orlando: Academic Press.
Lindblom, B., MacNeilage, P. F., & Studdert-Kennedy, M. (1983). Self-organizing processes and the explanation of phonological universals. In B. Butterworth, B. Comrie & Ö. Dahl (Eds.), Explanations of linguistic universals (pp. 182-203). The Hague: Mouton.
Linker, W. (1982). Articulatory and acoustic correlates of labial activity in vowels: A cross-linguistic study. UCLA Working Papers in Phonetics, 56, 1-134.
Locke, J. L. (1983). Phonological acquisition and change. New York: Academic Press.
Locke, J. L. (1986). The linguistic significance of babbling. In B. Lindblom & R. Zetterstrom (Eds.), Precursors of early speech (pp. 143-160). Basingstoke, Hampshire: MacMillan.
Lombardi, L. (1987). On the representation of the affricate. Ms., University of Massachusetts, Amherst.
Lynch, J. (1983). On the Kuman liquids. Language and Linguistics in Melanesia, 14, 98-112.
Maddieson, I. (1987). Revision of the IPA: Linguo-labials as a test case. Journal of the International Phonetic Association, 17, 26-30.
Maddieson, I., & Ladefoged, P. (1989). Multiply articulated segments and the feature hierarchy. UCLA Working Papers in Phonetics, 72, 116-138.
Marshall, R. C., Gandour, J., & Windsor, J. (1988). Selective impairment of phonation: A case study. Brain and Language, 35, 313-339.
McCarthy, J. J. (1988). Feature geometry and dependency. Phonetica, 45, 84-108.
McCarthy, J. J. (1989). Linear ordering in phonological representation. Linguistic Inquiry, 20, 71-99.
Nittrouer, S., Studdert-Kennedy, M., & McGowan, R. S. (1989). The emergence of phonetic segments: Evidence from the spectral structure of fricative-vowel syllables spoken by children and adults. Journal of Speech and Hearing Research, 32, 120-132.
Nittrouer, S., Munhall, K., Kelso, J. A. S., Tuller, B., & Harris, K. S. (1988). Patterns of interarticulator phasing and their relation to linguistic structure. Journal of the Acoustical Society of America, 84, 1653-1661.
Ohala, J. J. (1975). Phonetic explanations for nasal sound patterns. In C. A. Ferguson, L. M. Hyman, & J. J. Ohala (Eds.), Nasalfest (pp. 289-316). Stanford: Stanford University.
Ohala, J. J. (1981). The listener as a source of sound change. In C. S. Masek, R. A. Hendrick, & M. F. Miller (Eds.), Papers from the parasession on language and behavior (pp. 178-203). Chicago: Chicago Linguistic Society.
Ohala, J. J. (1983). The origin of sound patterns in vocal tract constraints. In P. F. MacNeilage (Ed.), The production of speech (pp. 189-216). New York: Springer-Verlag.
Ohala, J. J. (1985). Around 'flat'. In V. A. Fromkin (Ed.), Phonetic linguistics: Essays in honor of Peter Ladefoged (pp. 223-241). New York: Academic Press.
Ohala, J. J., & Lorentz, J. (1977). The story of [w]: An exercise in the phonetic explanation for sound patterns. Proceedings, Annual Meeting of the Berkeley Linguistic Society, 3, 577-599.
Oller, D. K. (1986). Metaphonology and infant vocalizations. In B. Lindblom & R. Zetterstrom (Eds.), Precursors of early speech (pp. 21-36). Basingstoke, Hampshire: MacMillan.
Oller, D. K., & Eilers, R. E. (1982). Similarity of babbling in Spanish- and English-learning babies. Journal of Child Language, 9, 565-577.
Oller, D. K., & Eilers, R. E. (1988). The role of audition in infant babbling. Child Development, 59, 441-449.
Ostry, D. J., & Munhall, K. (1985). Control of rate and duration of speech movements. Journal of the Acoustical Society of America, 77, 640-648.
Pagliuca, W. (1982). Prolegomena to a theory of articulatory evolution. Doctoral dissertation, State University of New York at Buffalo. Ann Arbor, MI: University Microfilms International.
Pike, K. L. (1943). Phonetics. Ann Arbor: University of Michigan Press.
Plotkin, V. Y. (1976). Systems of ultimate units. Phonetica, 33, 81-92.
Principles of the International Phonetic Association (1949, reprinted 1978). London: University College.
Recasens, D. (in preparation). The articulatory characteristics of alveolopalatal and palatal consonants. Haskins Laboratories, New Haven, CT.
Rubin, P., Baer, T., & Mermelstein, P. (1981). An articulatory synthesizer for perceptual research. Journal of the Acoustical Society of America, 70, 321-328.
Sagey, E. C. (1986). The representation of features and relations in non-linear phonology. Unpublished doctoral dissertation, MIT.
Sagey, E. C. (1988). On the ill-formedness of crossing association lines. Linguistic Inquiry, 19, 109-118.
Saltzman, E. (1986). Task dynamic coordination of the speech articulators: A preliminary model. In H. Heuer & C. Fromm (Eds.), Generation and modulation of action patterns (pp. 129-144). (Experimental Brain Research Series 15). New York: Springer-Verlag.
Saltzman, E., Goldstein, L., Browman, C. P., & Rubin, P. E. (1988). Dynamics of gestural blending during speech production. Presented at the 1st Annual International Neural Network Society (INNS) meeting, Boston, September 6-10, 1988.

Saltzman, E., & Kelso, J. A. S. (1987). Skilled actions: A task dynamic approach. Psychological Review, 94, 84-106.
Saltzman, E., & Munhall, K. (1989). A dynamical approach to gestural patterning in speech production. Ecological Psychology.
Saltzman, E., Rubin, P. E., Goldstein, L., & Browman, C. P. (1987). Task-dynamic modeling of interarticulator coordination. Journal of the Acoustical Society of America, 82, S15. (Abstract)
Schadeburg, T. C. (1982). Nasalization in UMbundu. Journal of African Languages and Linguistics, 4, 109-132.
Schmidt, R. C., Carello, C., & Turvey, M. T. (1987). Visual coupling of biological oscillators. PAW Review, 2(1), 22-24.
Selkirk, E. (1988). A two-root theory of length. Presented at the 19th Meeting of the New England Linguistic Society (NELS), November 1988.
Shockey, L. (1974). Phonetic and phonological properties of connected speech. Ohio State Working Papers in Linguistics, 17, iv-143.
Steriade, D. (in press). Gestures and autosegments: Comments on Browman and Goldstein's 'Gestures in Articulatory Phonology'. In J. Kingston & M. E. Beckman (Eds.), Papers in laboratory phonology I: Between the grammar and the physics of speech. Cambridge: Cambridge University Press.
Stevens, K. N. (1972). The quantal nature of speech: Evidence from articulatory-acoustic data. In E. E. David & P. B. Denes (Eds.), Human communication: A unified view (pp. 51-66). New York: McGraw Hill.
Stevens, K. N. (1989). On the quantal nature of speech. Journal of Phonetics, 17, 113-45.
Stevens, K. N., Keyser, S. J., & Kawasaki, H. (1986). Toward a phonetic and phonological theory of redundant features. In J. S. Perkell & D. H. Klatt (Eds.), Invariance and variability in speech processes (pp. 426-449). Hillsdale, NJ: Lawrence Erlbaum Associates.
Straight, H. S. (1976). The acquisition of Maya phonology: Variation in Yucatec child language. New York: Garland.
Studdert-Kennedy, M. (1987). The phoneme as a perceptuomotor structure. In A. Allport, D. MacKay, W. Prinz & E. Scheerer (Eds.), Language perception and production (pp. 67-84). London: Academic Press.
Thelen, E. (1981). Rhythmical behavior in infancy: An ethological perspective. Developmental Psychology, 17, 237-257.
Thompson, J. M. T., & Stewart, H. B. (1976). Nonlinear dynamics and chaos. New York: John Wiley & Sons.
Thrainsson, H. (1978). On the phonology of Icelandic preaspiration. Nordic Journal of Linguistics, 1(1), 3-54.
Vatikiotis-Bateson, E. (1988). Linguistic structure and articulatory dynamics. Bloomington: Indiana Linguistics Club.
Vennemann, T., & Ladefoged, P. (1973). Phonetic features and phonological features. Lingua, 32, 61-74.
Vihman, M. M. (in press). Ontogeny of phonetic gestures: Speech production. In I. G. Mattingly & M. Studdert-Kennedy (Eds.), Modularity and the motor theory of speech perception. Hillsdale, NJ: Lawrence Erlbaum Associates.
Vihman, M. M., Macken, M. A., Miller, R., Simmons, H., & Miller, J. (1985). From babbling to speech: A re-assessment of the continuity issue. Language, 61, 397-445.
Wood, S. (1982). X-ray and model studies of vowel articulation. Working Papers in Linguistics, 23, Lund University.

FOOTNOTES

* Phonology, 6, 201-251 (1989).
† Also, Department of Linguistics, Yale University.
1 A simple dynamical system consists of a mass attached to the end of a spring (a damped mass-spring model). If the mass is pulled, stretching the spring beyond its rest length (equilibrium position), and then released, the system will begin to oscillate. The resultant movement pattern of the mass will be a damped sinusoid described by the solution to the equation below. When such an equation is used to model the movements of coordinated sets of articulators, the 'object' (the motion variable) in the equation is considered to be the tract variable, for example, lip aperture (LA). Thus, the sinusoid's trajectory would describe how lip aperture changes over time:

mẍ + bẋ + k(x - x₀) = 0

where m = mass of the object, b = damping of the system, k = stiffness of the spring, x₀ = rest length of the spring (equilibrium position), x = instantaneous displacement of the object, ẋ = instantaneous velocity of the object, ẍ = instantaneous acceleration of the object.
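A short numerical sketch of this equation follows. The parameter values and the simple semi-implicit Euler integrator are arbitrary choices for illustration, not values fitted to articulatory data.

# Sketch: trajectory of a tract variable (e.g. lip aperture) governed by
# m*x'' + b*x' + k*(x - x0_rest) = 0, integrated with semi-implicit Euler.
# Parameter values are illustrative, not estimated from speech data.

def mass_spring(x0_rest, x_init, m=1.0, b=2.0, k=100.0, dt=0.001, steps=3000):
    x, v = x_init, 0.0
    trajectory = []
    for _ in range(steps):
        a = -(b * v + k * (x - x0_rest)) / m   # acceleration from the equation
        v += a * dt                            # update velocity first
        x += v * dt                            # then update displacement
        trajectory.append(x)
    return trajectory

# The tract variable is 'pulled' away from its rest value and released; it
# oscillates (underdamped here) and settles back toward the rest position.
traj = mass_spring(x0_rest=0.0, x_init=1.0)
print(round(traj[0], 3), round(traj[-1], 3))   # final value is near rest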

2 For ease of reference to the gesture, it is possible to use either a bundle of gestural descriptors (or a selected subset) or gestural symbols. We have been trying different approaches to the question of what gestural symbols should be; our present best estimate is that gestures should be treated like archiphonemes. Thus, our current proposal is to use the capitalized form of the voiced IPA symbol for oral gestures, capitalized and diacritized {H} for glottal gestures, and {±N} for [velic clo] and [velic open] gestures, respectively. In order to clearly distinguish gestural symbols from other phonetic symbols, we enclose them in curly brackets: { }. This approach should permit gestural descriptions to draw upon the full symbol resources of the IPA, rather than attempting to develop an additional set of symbols. However, we welcome comments from others on this decision, particularly on the proposal to capitalize gestural symbols, a choice that minimizes confusion with phonemic transcriptions but that leads to a conflict with uvular symbols in the current IPA system.