
Articulatory Phonology: A phonology for public language use

    Louis Goldstein and Carol A. Fowler1

    1. Introduction

The goals of the theoretical work that we describe here are twofold. We intend first to develop a realistic understanding of language forms as language users know them, produce them and perceive them. Second we aim to understand how the forms might have emerged in the evolutionary history of humans and how they arise developmentally, as a child interacts with speakers in the environment.

A seminal idea is that language forms (that is, those entities of various grain sizes that theories of phonology characterize) are the means that languages provide to make between-person linguistic communication possible, and, as such, they are kinds of public action, not the exclusively mental categories of most theories of phonology, production and perception. A theory of phonology, then, should be a theory about the properties of those public actions, a theory of speech production should be about how those actions are achieved, and a theory of speech perception should be about how the actions are perceived.

A theory of the emergence of phonological structure in language, from this perspective, is about how particulate language forms emerged in the course of communicative exchanges between people. Therefore, it predicts that the forms will have properties that adapt them for public language use: for speaking and for being perceived largely from acoustic speech signals.


Articulatory Phonology provides the foundation on which we build these theoretical ideas.

    2. Articulatory Phonology

    2.1. Phonology as a combinatoric system

The most fundamental property of speech communication is its phonological structure: it allows a small (usually


articulatory, electromyographic, aerodynamic, and auditory domains (but cf. Stevens' [1989, 1999] quantal theory, which attempts to isolate some invariant acoustic properties). Continuous, context-dependent motion of a large number of degrees of freedom is what we find in physical records. As a response to this failure, phonological units (segments) have been removed from the domain of publicly observable phenomena, and have been hypothesized to be fundamentally mental units that are destroyed or distorted in the act of production, only to be reconstructed in the mind of the perceiver (e.g., Hockett 1955; Ohala 1981).

Articulatory Phonology (Browman and Goldstein 1992a, 1995a) has proposed, following Fowler (1980), that the failure to find phonological units in the public record was due to looking at too shallow a description of the act of speech production, and that it is, in fact, possible to decompose vocal tract action during speech production into discrete, re-combinable units. The central idea is that while the articulatory and acoustic products of speech production actions are continuous and context-dependent, the actions themselves that engage the vocal tract and regulate the motions of its articulators are discrete and context-independent. In other words, phonological units are abstract with respect to the articulatory and acoustic variables that are typically measured, but not so abstract as to leave the realm of the vocal tract and recede into the mind. They are abstract in being coarse-grained (low dimensional) with respect to the specific motions of the articulators and to the acoustic structure that may specify the motions.

2.2. Units of combination (atoms) are constriction actions of vocal organs

Articulatory Phonology makes three key hypotheses about the nature of phonological units that allow these units to serve their dual roles as units of action and units of combination (and contrast). These are: that vocal tract activity can be analyzed into constriction actions of distinct vocal organs, that actions are organized into temporally overlapping structures, and that constriction formation is appropriately modeled by dynamical systems.

    2.2.1. Constriction actions and the organs that produce them

It is possible to decompose the behavior of the vocal tract during speech into the formation and release of constrictions by six distinct constricting devices or organs: lips, tongue tip, tongue body, tongue root, velum, and larynx. Although these constricting organs share mechanical degrees of freedom (articulators and muscles) with one another (for example, the jaw is part of the lips, tongue tip, and tongue body devices), they are intrinsically distinct and independent. They are distinct in the sense that the parts of the vocal anatomy that approach one another to form the constrictions are different, and they are independent in the sense that a constriction can be formed by one of these devices without necessarily producing a constriction in one of the others (a point made by Halle 1983, who referred to organs as articulators). Thus, constricting actions of distinct organs, actions known as gestures, can be taken as atoms of a combinatoric system: they satisfy the property of discrete differences (Browman and Goldstein 1989; Studdert-Kennedy 1998). Two combinations of gestures can be defined as potentially contrasting with one another if, for example, they include at least one distinct constriction gesture. The words pack and tack contrast with one another in that the former includes a lips gesture and the latter a tongue tip gesture.

The words hill, sill, and pill contrast with one another in the combination of constricting organs engaged at the onset of the word. Hill begins with a gesture of the larynx (glottal abduction); sill adds a gesture of the tongue tip organ to the laryngeal one and pill adds a lip gesture. These three gestures (larynx, tongue tip, and lips) all combine to create the contrasting molecule spill. The analysis of the onset of spill as composed of three gestures differs from an analysis of this form as composed of a sequence of two feature bundles (/s/ and /p/), in which each of those bundles has some specification for the larynx and a supralaryngeal gesture. Evidence in favor of the three-gesture specification has been discussed in Browman and Goldstein (1986).

Of course, not all phonological contrasts involve gestures of distinct organs. For example, pin and fin differ in the nature of the lip gestures at the beginning of the words. The discrete differentiation of the gestures involved in such contrasts critically depends on the public properties of a phonological system, and is discussed in the last section of this paper. However, it can also be argued that between-organ contrasts are the primary ones within phonological systems. For example, while all languages contrast constriction gestures of the lips, tongue tip, and tongue body, within-organ contrasts (such as [p-f], or [t-]) are not universal.
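To make the combinatoric idea concrete, the following sketch (a minimal illustration in Python, with hypothetical labels; it is not part of the formal apparatus of Articulatory Phonology) represents word onsets as sets of (organ, constriction goal) gestures and treats a difference in at least one gesture as a potential contrast.

```python
# Illustrative sketch only: gestures as (organ, constriction goal) atoms,
# word onsets as sets of such atoms; labels are hypothetical shorthand.

onsets = {
    "pack":  {("lips", "clo")},
    "tack":  {("tongue tip", "clo")},
    "hill":  {("larynx", "wide")},
    "sill":  {("larynx", "wide"), ("tongue tip", "crit")},
    "pill":  {("larynx", "wide"), ("lips", "clo")},
    "spill": {("larynx", "wide"), ("tongue tip", "crit"), ("lips", "clo")},
}

def potentially_contrast(a, b):
    """Two forms potentially contrast if their onsets differ in at least one gesture."""
    return onsets[a] != onsets[b]

print(potentially_contrast("pack", "tack"))                 # True: lips vs. tongue tip
print(onsets["spill"] == onsets["sill"] | onsets["pill"])   # True: spill's onset is the
                                                            # three-gesture combination
```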

    2.2.2. Coordination of gestures and overlap

While traditional theories of phonology hypothesize that the primitive units combine by forming linear sequences, Articulatory Phonology hypothesizes that gestures are coordinated into more elaborated molecular structures in which gestures can overlap in time. Such coproduction of gestures can account for much of the superficial context-dependence that is observed in speech. The reason for this can be found in the nature of the distinct constricting organs, which share articulators and muscles. When two gestures overlap, the activities of the individual mechanical degrees of freedom will depend on both (competing) gestures. For example, consider the coproduction of a tongue tip constriction gesture with the tongue body gesture for different vowels (as in /di/ and /du/). The same (context-independent) tongue tip gesture is hypothesized to be produced in both cases, but the contribution of the various articulatory degrees of freedom (tongue tip, tongue body) will differ, due to the differing demands of the vowel gestures. The theory of task dynamics (Saltzman 1986, 1995, as discussed below) provides a formal model of such context-dependent variability.

Gestures are hypothesized to combine into larger molecules by means of coordinating (bonding) individual gestures to one another. This coordination can be accomplished by specifying a phase relation between the pair of coupled dynamical systems that control the production of the gestures within a task dynamic model (Browman and Goldstein 1990; Saltzman and Byrd 2000). Figure 1 shows how the gestures composing the utterance team are arranged in time, using a display called a gestural score. The arrows connect individual gestures that are coordinated with respect to one another. Recent work within Articulatory Phonology (Browman and Goldstein 2000; Byrd 1996) has shown that the pairs of coordinated gestures vary in how tightly they are bonded. Bonding strength is represented in the figure by the thickness of the arrows. Note that while there is no explicit decomposition of this molecule into traditional segmental units, the gestures that constitute segments (tongue tip and larynx gesture for /t/, lip and velum gesture for /m/) are connected by the thicker lines. High bonding strength is also found for gestures that compose a syllable onset, even when they do not constitute a single segment (e.g., tongue tip and lip gestures in the onset cluster /sp/), and it is not known at present whether their bonding differs from the intrasegmental bonds. In principle, bonding strength can be used to define a hierarchy of unit types, including segments, onsets and rimes, syllables, feet, and words. The more tightly bonded units are those that we would expect to cohere in speech production and planning, and therefore, in errors.


Figure 1. Gestural score for the utterance team as generated automatically by the model of Browman and Goldstein (1992a). The rows correspond to distinct organs (TB = Tongue Body, TT = Tongue Tip). The labels in the boxes stand for gestures' goal specifications for that organ. For example, alveolar stands for a tongue tip constriction location 56 degrees up from the horizontal, and clo stands for a tongue tip constriction degree of 3.5 mm (the tongue tip compresses against the palate). The arrows connect gestures that are critically coordinated, or phased, with respect to one another. The thicker arrows represent tighter bonding strengths between coordinated gestures.
[Figure 1 shows rows labeled VELUM, LIPS, TB, TT, and LARYNX, with gesture boxes labeled wide, bilabial clo, palatal narrow, alveolar clo, and wide, for the utterance "team".]
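Read this way, a gestural score is simply a structured object: a set of gestures, each pairing an organ with a constriction goal and an activation interval, plus coordination links carrying bonding strengths. The Python sketch below is a hypothetical rendering of that idea for team; the interval times, goal labels, and bonding numbers are invented for illustration and are not output of the Browman and Goldstein model.

```python
# Hypothetical sketch of a gestural score as a data structure, loosely
# following Figure 1 for "team". Times (s), goals, and bonding values
# are invented for illustration.

gestures = {
    "velum_wide":  {"organ": "velum",       "goal": "wide",           "activation": (0.26, 0.40)},
    "lips_clo":    {"organ": "lips",        "goal": "bilabial clo",   "activation": (0.28, 0.40)},
    "tb_palatal":  {"organ": "tongue body", "goal": "palatal narrow", "activation": (0.08, 0.30)},
    "tt_alveolar": {"organ": "tongue tip",  "goal": "alveolar clo",   "activation": (0.00, 0.12)},
    "larynx_wide": {"organ": "larynx",      "goal": "wide",           "activation": (0.00, 0.10)},
}

# Coordination (phase) links; larger bonding values stand for the thicker arrows.
coordination = [
    ("tt_alveolar", "larynx_wide", 0.9),   # gestures constituting /t/
    ("lips_clo",    "velum_wide",  0.9),   # gestures constituting /m/
    ("tt_alveolar", "tb_palatal",  0.4),   # onset consonant to vowel
    ("tb_palatal",  "lips_clo",    0.4),   # vowel to coda consonant
]

def overlap(g1, g2):
    """True if the two gestures' activation intervals overlap in time."""
    (s1, e1), (s2, e2) = gestures[g1]["activation"], gestures[g2]["activation"]
    return s1 < e2 and s2 < e1

print(overlap("tt_alveolar", "tb_palatal"))   # True: the /t/ and vowel gestures coproduce
```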

Bonding strength can also be seen to correlate with the combinatoric structure of gestures and may, in fact, emerge from that combinatoric structure developmentally. Syllable onsets are tightly bonded, and the number and nature of consonant combinations that can occur as syllable onsets (in English and in languages generally) is highly restricted (usually in accordance with the sonority hierarchy). The coordination of the onset gestures with the vowel is much looser, and there are virtually no constraints on combinations of onsets with vowels in languages; it is here that systems typically exhibit free substitutability. Thus, each onset recurs frequently with different vowels, a distribution that could allow the coordination of the onsets to be highly stabilized (through frequency of experience), but would not afford such stabilization for particular onset-vowel combinations (since they occur relatively infrequently). Thus, onsets (and in the limiting case, segments) can be considered ions of a combinatoric system: internally cohesive structures of atoms that recombine readily with other such structures.

    2.2.3. Dynamical specification

One aspect of the physical description of speech seems particularly at odds with a description of discrete units: the fact that articulators move continuously over time. This continuous motion is modeled in most theories of phonology or speech production by assuming that there are targets for phonological units, and that speech production involves interpolation between the targets (e.g., Keating 1990; Perrier, Loevenbruck and Payan 1996). In this view, the phonological information (the target) is not part of the production itself; it is hidden within the mind of the producer, and is only communicated through the resulting interpolation.

In Articulatory Phonology, gestural units are modeled as dynamical systems. A dynamical system is characterized by an equation (or set of equations) that expresses how the state of a system changes over time. Crucially, production of a gestural constriction can be modeled as a (mass-spring) dynamical system with a fixed set of parameter values (Browman and Goldstein 1995a; Saltzman 1995). During the time that a gesture is active, its equation is fixed and time-invariant, even though the articulators are moving continuously. Moreover, the way in which the articulators move over time is specific to the particular dynamical system involved, and therefore, the gestural unit is specified directly in the motion itself.
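For reference, the second-order, damped mass-spring dynamics standardly assumed for a gesture in this line of work can be written as follows (the notation here is a common textbook presentation, not a quotation from this chapter):

\[
m\,\ddot{x} + b\,\dot{x} + k\,(x - x_0) = 0,
\]

where \(x\) is a tract variable such as lip aperture, \(x_0\) is the gesture's constriction target (a point attractor), and \(m\), \(b\), and \(k\) are held fixed for the duration of the gesture's activation. With damping at or near critical (\(b = 2\sqrt{mk}\)), \(x\) moves smoothly toward \(x_0\) without oscillating: the articulatory record changes continuously even though the equation generating it is constant while the gesture is active.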


    2.3. Evidence for gestures as units of speech production

How can we test the hypothesis that the act of speech production can be decomposed into gestures, actions that form local constrictions? There is little direct evidence in the products of speech production (acoustics or articulator movements) for discrete units, as has been long observed. One kind of indirect evidence involves analysis-by-synthesis. It is possible to examine articulator motions and to infer from them a plausible gestural structure (such as that in Figure 1). The resulting structure can then be used as input to a gestural production model in order to test whether motions matching those of the original data are generated. This strategy has, in limited contexts at least, been used successfully to support particular hypotheses about gestural decomposition (Browman 1994; Browman and Goldstein 1992b; Saltzman and Munhall 1989).

Recently, some more direct evidence for decomposing speech production and/or planning into gestural structures has been obtained from speech production errors. Errors have long been used as evidence for linguistic units of various types (e.g., Fromkin 1973). The underlying assumption is that errors can result from systematic misplacement of linguistic units within a larger structure that is active during speech planning. For example, errors have been used to support the reality of abstract segments (internal, non-public units) as units of phonological encoding in speech production (e.g., Shattuck-Hufnagel 1983; Shattuck-Hufnagel and Klatt 1979). Transcriptions of errors suggest that segments are the most common units involved in errors (exhibiting changes in position, such as anticipations, perseverations, and exchanges). Evidence that the segments involved in errors are abstract (and not phonetic) is found in the fact that segments appear to be phonetically accommodated to the new contexts created for them by the errors. For example, in the error slumber party → lumber sparty (Fromkin 1973), the /p/ is aspirated in party but unaspirated in sparty. In addition, errors involving segments have also been claimed to be (almost always) phonotactically well-formed strings of the language. These facts have been modeled (Shattuck-Hufnagel 1983) as arising during phonological planning by the insertion of a segmental unit into an incorrect slot in the phonological frame. Though incorrect, the slot is of the appropriate type for the segment in question, which accounts for the observed phonotactic well-formedness. Thus, utterances with errors are thought to be generally well-formed, both phonetically and phonotactically.

That this view of speech errors is complete was called into question by Mowrey and MacKay (1990), who collected EMG data during the production of tongue twisters and found anomalous patterns of muscle activity that were not consistent with the notion that utterances with errors are phonetically and phonotactically well formed. Experiments in our laboratory have generated kinematic data that confirm and extend their results (Goldstein, Pouplier, Chen, Saltzman and Byrd submitted; Pouplier and Goldstein submitted). These experiments involve repetition of simple two-word phrases such as cop top, in which the syllables are identical except for their initial consonants. Speakers who produce these phrases consistently produce errors of gestural intrusion, in which a copy of the oral constriction gesture associated with one of the initial consonants is produced during the other initial consonant. For example, in cop top a copy of the tongue dorsum gesture for the initial /k/ in cop can be observed during the /t/ in top (and conversely, a copy of the tongue tip gesture for /t/ can be observed during the /k/). These intrusions are usually reduced in magnitude compared to an intended gesture (and therefore such errors are often not perceived), but they can become as large as intended gestures, in which case a speech error is perceived.

These gestural intrusion errors cannot result from moving an abstract segment to the wrong position within a phonotactically well-formed frame. First, an intruded gesture is typically partial in magnitude. If an abstract segment were to move to an incorrect position, there is no reason why it should result in a reduced gesture. Second, intruded gestures are produced concurrently with the gesture(s) of the intended consonant, which may not exhibit reduction (for example, when the tongue dorsum gesture of /k/ intrudes on the /t/ of top, the tongue tip gesture for /t/ is still produced). Every speaker exhibits more errors of gestural intrusion than reduction. Thus, the speaker appears to be producing the gestures of the two words at the same time, which is not a phonotactically well-formed structure. In contrast, gestural intrusion errors can be readily accounted for if the structures involved in speech production and/or planning are gestural. The intrusion errors can be viewed as spontaneous transitions to a more intrinsically stable mode of coordination in which all gestures are produced in a 1:1 frequency-locking mode (which is known to be the most stable coordinative state in motor activities generally, e.g., Haken et al. 1996). For example, in the correct production of cop top, the tongue dorsum gesture for /k/ is coordinated in a 1:2 pattern with the gestures for the rime /p/ (op), as is the tongue tip gesture for /t/. In productions with gestural intrusion, the relation with the rime gestures becomes 1:1, and, if both initial consonants exhibit intrusion errors (which does occur), then all gestures are produced in a 1:1 pattern.

It is important to note that gestural structures remain discrete in space and time in gestural intrusion errors, even when they are partial. Spatially, the articulatory record shows the simultaneous production of two distinct gestures, not a blended intermediate articulation. Temporally, the erroneous gesture is synchronized with the other initial consonant gesture, and does not slide continuously between its home and its intrusion site.

In some cases, a reduction of an intended gesture is observed when a gestural intrusion occurs. Such errors can be modeled as resulting from the competition between the intrinsically stable 1:1 mode and the learned gestural coordination patterns (molecules) of the language. These learned coordination patterns (for English) do not allow for both gestures (e.g., tongue tip for /t/ and tongue dorsum for /k/) to occur together.

How does this approach to speech errors account for cases like slumber party → lumber sparty that have been used to argue for the role of abstract mental units in speech production? The oral constriction gesture for /s/ (and possibly its laryngeal gesture) could be assumed to intrude at the beginning of the word party, and to reduce at the beginning of the word slumber. Alternatively, the gestural ion for /s/ may be mis-coordinated or mis-selected in the production of the phrase (the possible existence of such errors has not been ruled out). Under either interpretation, the resulting pattern of coordination of the oral and laryngeal gestures for /s/ and /p/ could combine to produce an organization that would be identified as an unaspirated [p], though articulatory data would be required to determine exactly how this comes about (see Browman and Goldstein 1986; Munhall and Löfqvist 1992; Saltzman and Munhall 1989).

2.4. Phonological knowledge as (abstract) constraints on gestural coordination

The goal of a phonological grammar is to account for native speakers' (implicit) knowledge of phonological structure and regularities in a particular language, including an inventory of lexically contrastive units, constraints on phonological forms, and systematic alternations to lexical forms that result from morphological combination and embedding in a particular prosodic context. If phonological forms are structures of coordinated gestures, as hypothesized in Articulatory Phonology, then gestural analysis should reveal generalizations (part of speakers' knowledge) that are obscured when phonological form is analyzed in some other way (in terms of features, for example). The property that most saliently differentiates gestural analyses from featural analyses is that gestural primitives are intrinsically temporal and thus can be explicitly coordinated in time. We therefore expect to find phonological generalizations that refer to patterns or modes of coordination, abstracting away from the particular actions that are being coordinated. Several examples of abstract coordination constraints have, in fact, been uncovered.


The coordination relation specified between two closure gestures will determine the resulting aerodynamic and acoustic consequences: whether there is an audible release of trapped pressure between the closures or not. So in principle one could characterize a sequence of closures either in terms of their abstract coordination pattern or their superficial release characteristics. Which characterization do speakers employ in their phonological knowledge? Gafos (2002) has shown that a crucial test case can be found when the same abstract coordination pattern can give rise to different consequences depending on whether the closures employ the same or distinct organs. In such cases, he finds languages in which the generalization must be stated in terms of coordination pattern, not in terms of the superficial release properties. In Sierra Popoluca (Clements 1985; Elson 1947), releases are found between sequences of heterorganic consonants, but lack of release is found for homorganic sequences. A generalization can be stated in terms of an abstract coordination pattern (the onset of movement for the second closure beginning just before the release of the first closure), but not in terms of release. Note that an abstract decomposition of the articulatory record into gestures is required here. In the case of a homorganic sequence, it is not possible to observe the onset of movement of the second closure in the articulatory record; a single closure is observed. But the temporal properties of this closure fall out of a gestural analysis, under the assumption that there are two gestural actions in the homorganic case, coordinated just as they are in the heterorganic case (where they can be observed). In Sierra Popoluca, the generalization is a relatively superficial phonetic one (characterizing the form of the consonant sequence) that does not have consequences for the deeper (morpho)-phonology of the language. But Gafos then shows that in Moroccan Arabic, a similar coordination constraint interacts with other constraints (such as a gestural version of the obligatory contour principle) in determining the optimal set of (stem) consonants to fill a morphological template. In this case, constraints on the coordination of gestures contribute to an account of morphophonological alternations.


Another example of abstract coordination modes in phonology involves syllable structure. Browman and Goldstein (1995b) have proposed that there are distinct modes of gestural coordination for consonant gestures in an onset versus those in a coda, and that these modes are the public manifestations of syllable structure. In an onset, a synchronous mode of coordination dominates: consonant gestures tend to be synchronous with one another (to the extent allowed by an overall constraint that the gestures must be recoverable by listeners), while in a coda, a sequential mode dominates. Synchronous production is most compatible with recoverability when a narrow constriction is coproduced with a wider one (Mattingly 1981), and therefore is most clearly satisfied in the gestures constituting single segments, which typically involve combinations of a narrow constriction of the lips, tongue tip, or tongue body with a wider laryngeal or velic gesture (e.g., voiceless or nasal stops) or a wider supralaryngeal gesture (e.g., secondary palatalization, velarization, or rounding). Indeed, the compatibility of synchronous production with recoverability may be what leads to the emergence of such segmental ions in phonology. But there is evidence that even multi-segment gestural structures (e.g., /sp/) exhibit some consequences of a tendency to synchronize onsets (Browman and Goldstein 2000).

Browman and Goldstein show that there are several featurally distinct types of syllable position-induced allophony that can be modeled as lawful consequences of these different coordination styles of onsets and codas. For example, vowels in English are nasalized before coda nasal consonants (but not before onset nasals). Krakow (1993) has shown that this can be explained by the fact that the oral constriction and velum-lowering gestures are synchronous in onsets, but sequential in codas, with the velum gesture preceding. The lowered velum superimposed on an open vocal tract is what is identified as a nasalized vowel. A very similar pattern of intergestural coordination can account for the syllable position differences in [l]: lighter in onsets and represented featurally as [-back], darker in codas and represented as [+back] (Browman and Goldstein 1995b; Krakow 1999; Sproat and Fujimura 1993). In this case the two coordinated gestures are the tongue tip closure and tongue dorsum retraction. Thus, two processes that are quite distinct in terms of features are lawful consequences of a single generalization about gestural coordination: the distinct styles of coordination that characterize onsets and codas. Compatible differences in coordination have been found for [w] (Gick in press) and for [r] (Gick and Goldstein 2002).

    3. Parity in public language use

Articulatory Phonology provides a foundation on which compatible theories of speech production and perception may be built. Because these compatible theories are about between-person communication, a core idea that they share is that there must be a common phonological currency among knowers of language forms, producers of the forms and perceivers of them. That is, the language forms that language users know, produce and perceive must be the same.2

3.1. The need for a common currency in perceptually guided action, including speech

The common currency theme emerges from research and theories spanning domains that, in nature, are closely interleaved, but that may be traditionally studied independently. In speech, the need for a common currency so that transmitted and received messages may be the same is known as the parity requirement (e.g., Liberman and Whalen 2000).

The various domains in which the need for a common currency has been noted are those involving perception and action. An example is the study of imitation by infants. Meltzoff and Moore (1977, 1983, 1997, 1999) have found that newborns (the youngest 42 minutes after birth) are disposed to imitate the facial gestures of an adult. In the presence of an adult protruding his tongue, infants attempt tongue protrusion gestures; in the presence of an adult opening his mouth, infants attempt a mouth opening gesture.

It is instructive to ask how they can know what to do. As Meltzoff and Moore (e.g., 1997, 1999) point out, infants can see the adult's facial gesture, say, tongue protrusion, but they cannot see their own tongue or what it is doing. They can feel their tongue proprioceptively, but they cannot feel the adult's tongue. Meltzoff and Moore (1997, 1999) suggest that infants employ a supramodal representation, that is, a representation that transcends any particular sensory system. To enable the infant to identify his or her facial or vocal tract organs with those of the model, the representation has to reflect what is common between the visually perceived action of the adult model and the proprioceptively perceived tongue and its action by the infant. The commonality is captured if the supramodal representation is of distal objects and events (tongues, protrusion gestures), not of the perceptual-system-specific proximal properties (reflected light patterns, proprioceptive feels) of the stimulation.

We will use the term "common currency" in place of "supramodal representation" to refer, not to properties of a representation necessarily, but merely to what is shared in information acquired cross-modally, and, as we will see next, to what is shared in perceptual and action domains. To be shared, the information acquired cross-modally and shared between perception and action plans has to be distal; that is, it has to be about the perceived and acted-upon world. It cannot be about the proximal stimulation.

Imitation is perceptually guided action, and Meltzoff and Moore might have noted that a similar idea of a common currency is needed to explain how what infants perceive can have an impact on what they do. Hommel, Müsseler, Aschersleben and Prinz (2001) remark that if perceptual information were coded in some sensory way, and action plans were coded in some motoric way, perceptual information could not guide action, because there is no common currency. They propose a common coding in which features of the perceived and acted upon world are coded both perceptually and in action plans. Here, as for Meltzoff and Moore, common currency is achieved by coding distal, not proximal, properties in perception, and, in Hommel et al.'s account, by coding distal action goals, not, say, commands to muscles, in action plans.

In speech, two sets of ideas lead to a conclusion that speech production and perception require a common currency.

First, infants imitate speech (vowels) beginning as young as 12 weeks of age (Kuhl and Meltzoff 1996), and they integrate speech information cross-modally. That is, they look longer at a film of a person mouthing the vowel they are hearing than at a film of a model mouthing some other vowel (Kuhl and Meltzoff 1982). Both findings raise issues that suggest the need for a common currency, first between perceived infant and perceived adult speech, and second between speech perceived by eye and by ear.

Although imitation of vowels may involve information obtained intra- rather than cross-modally (as in tongue protrusion), how infants recognize a successful imitation remains a challenging puzzle. The infant has a tiny vocal tract, whereas the adult model has a large one. Infants cannot rely on acoustic similarity between their vowels and the model's to verify that they are imitating successfully. They need some way to compare their vowels with the adult's. Likewise, they need a way to compare visible with acoustically specified vowels. How can the infant know what facial gesture goes with what acoustic signal? The problem is the same one confronted by infants attempting to imitate facial gestures.

A solution analogous to that proposed by Meltzoff and Moore (1997, 1999) is to propose that listeners extract distal properties and events from information that they acquire optically and acoustically.3 In this case, the distal events are gestural. Infants can identify their vowel productions with those of the adult model, because they perceive actions of the vocal tract (for example, tongue raising and backing and lip protrusion). In the gestural domain, these actions are the same whether they are achieved by a small or a large vocal tract. (This is not to deny that a nonlinear warping may be required to map an infant's vocal tract onto an adult's. It is to say that infants have lips, alveolar ridges on their palates, soft palates, velums, etc. They can detect the correspondence between the organs of their vocal tract and regions of their vocal tract with those of an adult. They can detect the correspondence of their actions using their organs in those regions with the actions of an adult.) The actions are also the same whether they are perceived optically or acoustically.

A second route to a conclusion that speaking and listening require a common currency is different from the first, but not unrelated to it. It concerns the fact that language serves a between-person communicative function. Liberman and colleagues (e.g., Liberman and Whalen 2000) use the term parity to refer to three requirements of language if it is to serve its communicative function. The first two relate to between-person communication. (The third is that, within a language user, specializations for speaking and listening must have co-evolved, because neither specialization would be useful without the other.) The first is that what counts for the speaker as a language form also has to count for the listener. As Liberman and Whalen put it, /ba/ counts, a sniff does not. If speakers and listeners did not share recognition of possible language forms, listeners would not know which of the noises produced by a speaker should be analyzed for its linguistic content. The second requirement is more local. It is that, for language to serve as a communication system, characteristically, listeners have to perceive accurately the language forms that talkers produce. There has to be parity or sufficient equivalence between phonological messages sent and received.

Articulatory Phonology provides a hypothesis about a common currency for speaking and listening. Like the other common currencies that we have considered, this one is distal.

3.2. Articulatory Phonology provides the common currency for speech


To communicate, language users have to engage in public, perceivable activity that counts as doing something linguistic for members of the language community. Listeners have to perceive that activity as linguistic and to perceive it accurately for communication to have a chance of taking place.

Language forms are the means that languages provide for making linguistic messages public. If language is well adapted to its public communicative function, then we should expect the forms to be (or to be isomorphic with) the public actions that count as talking. In particular, we should expect the forms to be such that they can be made immediately available to a listener, available, that is, without mediation by something other than the language forms. This is their nature in Articulatory Phonology.

However, in most phonologies, as we noted earlier (section 2.1), this is not their nature. In phonologies other than Articulatory Phonology, atomic language forms are mental categories. Moreover, in accounts of speech production and perception, the activities of the vocal tract that count as speaking are not isomorphic with those language forms, due in part to the ostensibly destructive or distorting effects of coarticulation. This means that elements of phonological competence are only hinted at by vocal tract actions. Further, because vocal tract actions cause the acoustic signals that constitute the listener's major source of information about what was said, the acoustic signals likewise do not provide information that directly specifies the elements of phonological competence. Perception has to be reconstructive. In this system, talkers' phonological messages remain private. If communication succeeds, listeners represent the speaker's message in their head. However, transmission of the message is quite indirect. Language forms go out of existence as the message is transformed into public action and come back into existence only in the mind of the listener. This is not a parity-fostering system.

In contrast, Articulatory Phonology coupled with compatible theories of speech production and perception represents a parity-fostering system. Elements of the phonological system are the public actions of the vocal tract that count as speaking. What language users may have in their heads is knowledge about those phonological elements, not the elements themselves. If phonological atoms are public actions, then they directly cause the structure in acoustic speech signals, which, then, provides information directly about the phonological atoms. In this theoretical approach, language forms are preserved throughout a successful communicative exchange; they are not lost in the vocal tract and reconstituted by the perceiver.

Note that this hypothesis is not incompatible with the possibility that there are certain types of phonological processes that depend on the coarser topology of the gestural structures involved (e.g., syllable count, syllabification, foot structure and other prosodic domains) and not on the detailed specification of the actions involved. The hypothesis would claim that these coarser properties are ultimately derivable from the more global organization of public actions and do not represent purely mental categories that exist independently of any actions at all.

3.3. An integrative theory of phonological knowing, acting and perceiving

The gestures of Articulatory Phonology are dynamical systems that, as phonological entities, are units of contrast; as physical entities, they are systems that create and release constrictions in the vocal tract. A compatible theory of speech production is one that spells out how those systems generate constrictions and releases and how the gestures that form larger language forms are sequenced. To maintain the claim that atomic language forms are preserved from language planning to language perception, the account has to explain how coarticulated and coarticulating gestures nonetheless maintain their essential properties.

    One such account is provided by task dynamics theory.


    3.3.1. Task dynamics

Speech production is coordinated action. Like other coordinated actions, it has some properties for which any theory of speech production needs to provide an account. One property is equifinality. This occurs in systems in which actions are goal-directed and in which the goal is abstract in relation to the movements that achieve it. In Articulatory Phonology, the gesture for /b/ is lip closure. A gesture counts as a lip closure whether it is achieved by a lot of jaw closing and a little lip closing or by jaw opening accompanied by enough lip closing to get the lips closed.

Talkers exhibit equifinality in experiments in which an articulator is perturbed on line (e.g., Gracco and Abbs 1982; Kelso, Tuller, Vatikiotis-Bateson and Fowler 1984; Shaiman 1989). They also exhibit it in their coarticulatory behavior. In the perturbation research of Kelso et al., for example, a speaker repeatedly produced /bæb/ or /bæz/ in a carrier sentence. On a low proportion of trials, unpredictably, the jaw was mechanically pulled down during closing for the final consonant in the test syllable. If the consonant was /b/, extra downward movement of the upper lip achieved lip closure on perturbed as compared to unperturbed trials. If the consonant was /z/, extra activity of a muscle of the tongue allowed the tongue to compensate for the low position of the jaw. Equifinality suggests that gestures are achieved by systems of articulators that are coordinated to achieve the gestures' goals. These systems are known as synergies or coordinative structures. The synergy that achieves bilabial closure includes the jaw and the two lips, appropriately coordinated.

It is the equifinality property of the synergies that achieve phonetic gestures that prevents coarticulation from destroying or distorting essential properties of gestures. An open vowel coarticulating with /b/ may pull the jaw down more or less as the mechanical jaw puller did in the research of Kelso et al. However, lip closure, the essential property of the labial stop gesture, is nonetheless achieved.4


The task dynamics model (e.g., Saltzman 1991, 1995; Saltzman and Kelso 1987; Saltzman and Munhall 1989) exhibits equifinality and therefore the nondestructive consequences of coarticulation needed to ensure that gestures are achieved in public language performance.

In the task dynamics model, the synergies that achieve gestures are modeled as dynamical systems. The systems are characterized by attractors to the goal state of a phonetic gesture. These goal states are specified in terms of tract variables. Because, in Articulatory Phonology, gestures create and release constrictions, tract variables define the constriction space: they are specific to the organs of the vocal tract that can achieve constrictions, and they specify a particular constriction degree and location for that constricting organ. For example, /d/ is produced with a constriction of the tongue tip organ; the relevant tract variables are TTCL and TTCD (tongue tip constriction location and constriction degree). They are parameterized for /d/ so that the constriction location is at the alveolar ridge of the palate and constriction degree is a complete closure. In task dynamics, gestural dynamics are those of a damped mass-spring, and the dynamics are such that gesture-specific tract variable values emerge as point attractors. The dynamical systems exhibit the equifinality property required to understand how coarticulation can perturb production of a gesture without preventing achievement of its essential properties.
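The following Python sketch illustrates the equifinality property in miniature. It is not the Saltzman task dynamics model: the parameter values, initial positions, and the simple articulator-weighting rule are invented for illustration. A single lip-aperture tract variable is driven toward closure by critically damped mass-spring dynamics; when the jaw is "frozen" (perturbed), the lips absorb the entire change, yet the same tract-variable goal is reached.

```python
import numpy as np

def simulate_closure(jaw_frozen=False, dt=0.001, duration=0.3):
    """Toy bilabial closure gesture: one tract variable (lip aperture),
    three articulators (jaw, lower lip, upper lip). Values are illustrative."""
    m, k = 1.0, 900.0
    b = 2.0 * np.sqrt(m * k)              # critical damping
    target = 0.0                          # lip aperture at closure (mm)

    jaw, ll, ul = 10.0, 2.0, 18.0         # initial articulator positions (mm)
    la = ul - (jaw + ll)                  # lip aperture tract variable
    la_vel = 0.0

    # How a change in the tract variable is shared out over the articulators.
    weights = ({"jaw": 0.0, "ll": 0.5, "ul": 0.5} if jaw_frozen
               else {"jaw": 0.6, "ll": 0.2, "ul": 0.2})

    for _ in range(int(duration / dt)):
        la_acc = (-b * la_vel - k * (la - target)) / m
        la_vel += la_acc * dt
        d_la = la_vel * dt
        jaw += -d_la * weights["jaw"]     # raising the jaw narrows the aperture
        ll += -d_la * weights["ll"]       # raising the lower lip narrows it too
        ul += d_la * weights["ul"]        # lowering the upper lip narrows it
        la = ul - (jaw + ll)

    return {"jaw": round(jaw, 2), "lower lip": round(ll, 2),
            "upper lip": round(ul, 2), "lip aperture": round(la, 2)}

print("free jaw:     ", simulate_closure(jaw_frozen=False))
print("perturbed jaw:", simulate_closure(jaw_frozen=True))
# In both runs lip aperture ends near 0 (closure); the articulator
# contributions differ, which is the equifinality property.
```

With the jaw frozen, the upper and lower lips travel farther than in the free condition, paralleling the perturbation results summarized above and in Figure 2.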

In one version of task dynamics, gestures are sequenced by the gestural scores generated by Articulatory Phonology from a specification of coordination (phase) relations among the gestures. Each gesture is associated with an activation wave that determines the temporal interval over which the gesture exerts an influence on the dynamics of the vocal tract. Due to coarticulation, activation waves overlap.


Figure 2. Simulation of lip and jaw behavior under task dynamic control. The dashed line shows the position of the jaw and lips when the jaw is perturbed (by being mechanically restrained in a low position), and the solid line shows the positions when the jaw is free to move. The lips close in both cases, but increased movement of both upper and lower lips contributes to the closure goal in the perturbed condition (after Saltzman 1986).
[Figure 2 plots traces labeled UL (upper lip), LL (lower lip), and jaw for the free and perturbed conditions, with the point where the lips meet marked.]

Figure 2 shows a model vocal tract (lips and jaw only) under task dynamic control achieving lip closure under perturbed and unperturbed (free) conditions. When the jaw is prevented from closing (the dashed line in Figure 2), the lips compensate by moving farther than on unperturbed productions (the solid line). Figure 3 shows equifinal production of /d/ during coarticulation with different vowels. The top of the figure shows the shape of the front part of the tongue during the /d/ closure in the utterances /idi/, /ada/, /udu/ (as measured by x-ray microbeam data). The bottom of the figure shows the front part of the tongue during closure for /idi/, /ada/, /udu/ as generated by the computational gestural model (Browman and Goldstein 1992a), incorporating the task-dynamics model. The constriction of the tip of the tongue against the palate is the same (in location and degree) across vowel contexts, even though the overall shape of the tongue varies considerably as a function of vowel.


Figure 3. Equifinality of tongue tip closure: data and simulation. Top: Spatial positions of X-ray pellets during medial consonants of /pitip/, /patap/, and /putup/. Pellets on the surface of the tongue are connected by line segments and are numbered from 1 (anterior) to 5 (posterior). The curve is an estimate of the position of the speaker's palate. Units are millimeters. Bottom: Simulation of /idi/, /ada/, /udu/ using the computational gestural model (Browman and Goldstein 1992a), incorporating task dynamics. The front part of the tongue during /d/ is shown superimposed on the overall outline of the vocal tract. (The simulation was not an attempt to model the specific data shown, but employed general principles of gestural score construction for English.)

Equifinality (or motor equivalence) in the task space is also a key property of other models of speech production, for example that developed by Guenther and his colleagues (e.g., Guenther 1995; this volume). A salient difference between Guenther's model and Saltzman's task dynamics model is that Guenther's model employs acoustic states in addition to orosensory states (which include constriction parameters) to specify speech goals, while the task dynamics model is completely constriction based. Summarizing the arguments for or against acoustic goals for various types of phonological units goes well beyond the scope of this paper. Suffice it to say that the final experiments on this question have not yet been performed.

    3.3.2. The direct realist theory of speech perception

In the task dynamics account, gestures, the atoms of the phonology of languages, directly structure the acoustic speech signal. This allows, but does not guarantee, that listeners receive acoustic structure having sufficient information to specify the atoms. If this happens, then, in the context of the theories of Articulatory Phonology and task dynamics, language forms are preserved throughout a communicative exchange. Gestures are the common currency.

In James Gibson's direct realist theory of perception (e.g., Gibson 1966, 1979), informational media, such as light for seeing and acoustic energy for hearing, do specify their distal sources. This is because there is a causal relation between properties of distal objects and events and structure in those media, and because distinctive properties of objects and events tend to structure the media distinctively. In this way, structure in media imparted to sensory systems serves as information for its distal source. In the account, whereas perceivers detect patterned energy distributions that stimulate their sense organs (the proximal stimulus), they perceive the properties and events in the world (distal events) that the energy distributions provide information about.

In the direct realist theory of speech perception (Best 1995; Fowler 1986, 1996), the gestures (distal events) that causally structure acoustic speech signals (proximal stimulation) are, in turn, specified by them. When the same gestures causally structure reflected light, they may also be specified by reflected light structure. This gives rise to the common currency that allows infants as well as adults to integrate speech information cross-modally.

Readers unfamiliar with direct realism sometimes ask how infants learn to connect the patterns of stimulation at their sense organs with the properties that stimulation specifies in the world.


This question, however, has to be ill-posed. The only way that perceivers can know the world is via the information in stimulation. Therefore, the infant cannot be supposed to have two things that need to be connected: properties in the world and stimulation at sense organs. They can only have stimulation at the sense organs that, from the start, gives them properties in the world. This is not to deny that perceptual learning occurs (e.g., E. Gibson and Pick 2000; Reed 1996). It is only to remark that learning to link properties of the world with stimulation at the sense organs is impossible if infants begin life with no such link enabled.

The claim that listeners to speech perceive gestures has not been accepted in the speech community, but there is evidence for it. Summaries of this evidence are available elsewhere. Here we provide just a few examples.

3.3.2.1. Perception tracks articulation, I: Findings that led to Liberman's motor theory

Two findings were pivotal in Liberman's development of his motor theory of speech perception (e.g., Liberman 1957, 1996; Liberman, Cooper, Shankweiler, and Studdert-Kennedy 1967). Both findings reflect how listeners perceive stop consonants. Liberman and colleagues had found that identifiable stop consonants in consonant-vowel syllables could be synthesized by preserving either of two salient cues for it, the second formant transition into the vowel or the stop burst.

In a study in which consonants were specified by their second formant transitions, Liberman and colleagues (Liberman, Delattre, Cooper, and Gerstman 1954) found that the transitions for the /d/ in /dV/ syllables were quite different depending on the vowel. /di/ and /du/ provided a striking comparison in this respect. The second formant transition of /di/ is high and rising in frequency to the level of the high second formant (F2) for /i/. The transition of /du/ is low and falling in frequency down to the level of the low F2 for /u/. Isolated from the rest of the syllable, the transitions sound different from one another in just the ways that their acoustic properties suggest; that is, they sound like high and low pitch glides. In context, however, they sound alike. Liberman (1957) recognized that the property that the /d/s in /di/ and /du/ share is articulatory. They are produced as constriction gestures of the tongue tip against the alveolar ridge of the palate. Because of coarticulation with the vowel, the transitions, which are generated after the constriction is released, are determined not only by the consonantal gesture, but by the vowel gesture or gestures as well.

Liberman et al. (Liberman, Delattre, and Cooper 1952) reported a complementary finding. /di/ and /du/ provide instances in which very different acoustic cues lead to identical perceptions of consonants that were produced in the same way. The complementary finding was that identical acoustic cues can signal very different percepts in different vocalic contexts. In this study, voiceless stops were signaled by the bursts that are produced just as stop constriction gestures are released. Liberman et al. found that the same stop burst, centered at 1440 Hz, was identified as /p/ before vowels identified as /i/ and /u/, but as /k/ before /a/. Because of coarticulation with the following vowel, a stop burst at 1440 Hz had to be produced with different constriction gestures in the context of /i/ and /u/ versus /a/. These two findings led Liberman (1957) to conclude that when articulation and acoustics go their separate ways, perception tracks articulation.

    3.3.2.2. Perception tracks articulation, II: Parsing

That perception tracks articulation has been shown in a different way. Coarticulation results in an acoustic signal in which the information for a given consonant or vowel is distributed, often over a substantial interval, and in that interval, production of other gestures also shapes the acoustic signal. Listeners parse the signal along gestural lines, extracting information for more than one gesture from intervals in which the acoustic signal was shaped by more than one gesture. Perception of fundamental frequency (F0) provides a striking example. The fundamental frequency of a speech signal is the result of converging effects of many gestures. Most notably, the fundamental frequency contour marks an intonational phrase. In addition, however, it can be raised or lowered locally by production of high or low vowels, which have, respectively, high and low intrinsic F0s (e.g., Whalen and Levitt 1995; Whalen, Levitt, Hsaio, and Smorodinsky 1995). (That is, when talkers produce /i/ and /a/, attempting to match them in pitch, /i/ has a higher F0 than /a/.) The F0 contour may also be raised on a vowel that follows a voiceless obstruent (e.g., Silverman 1986, 1987). Remarkably, listeners do not perceive the F0 contour as the intonation contour; they perceive it as the intonation contour after parsing from F0 the effects of intrinsic F0 of vowels and of consonant devoicing. That is, two equal F0 peaks (marking pitch accents in the intonation contour) are heard as different in pitch height if one is produced on an /i/ and one on an /a/ vowel (Silverman 1987), or if one occurs on a vowel following an unvoiced and one a voiced obstruent (Pardo and Fowler 1997). The parsed F0 information is not discarded by perceivers. Parsed information about vowel intrinsic F0 is used by listeners as information for vowel height (Reinholt Peterson 1986); parsed information about consonant voicing is used as such (Pardo and Fowler 1997).
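As a purely arithmetic illustration of the parsing idea (the numbers below are invented and are not from the cited studies), perceived intonational height can be thought of as the measured F0 minus the contributions attributed to segmental causes:

```python
# Invented, illustrative numbers only; a toy rendering of F0 parsing.
intrinsic_f0 = {"i": 10.0, "a": 0.0}   # hypothetical vowel-intrinsic F0 offsets (Hz)
voiceless_raising = 8.0                # hypothetical post-voiceless raising (Hz)

def parsed_intonational_f0(f0_peak_hz, vowel, follows_voiceless):
    """Subtract the F0 attributed to vowel height and to consonant (de)voicing."""
    segmental = intrinsic_f0[vowel] + (voiceless_raising if follows_voiceless else 0.0)
    return f0_peak_hz - segmental

# Two physically equal 130 Hz peaks:
print(parsed_intonational_f0(130.0, "i", False))   # 120.0
print(parsed_intonational_f0(130.0, "a", False))   # 130.0 -> heard as the higher pitch accent
```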

Nor are findings of gestural parsing restricted to perception of information provided by F0. Listeners use coarticulatory information for a forthcoming vowel as such (e.g., Fowler and Smith 1986; Martin and Bunnell 1981, 1982), and that coarticulatory information does not contribute to the perceived quality of the vowel with which it overlaps temporally (e.g., Fowler 1981; Fowler and Smith 1986). That is, information in schwa for a forthcoming /i/ in an /əbi/ disyllable does not make the schwa vowel sound high; rather, it sounds just like the schwa vowel in /əba/, despite substantial acoustic differences between the schwas. It sounds quite different from itself cross-spliced into an /əba/ context, where parsing will pull the wrong acoustic information from it.

3.3.2.3. The common currency underlying audiovisual speech perception

In the McGurk effect (McGurk and MacDonald 1976), an appropriately selected acoustic syllable or word is dubbed onto a face mouthing something else. For example, acoustic /ma/ may be dubbed onto video /da/. With eyes open, looking at the face, listeners report hearing /na/; with eyes closed, they report hearing /ma/. That is, visible information about consonantal place of articulation is integrated with acoustic information about voicing and manner. How can the information integrate?

Two possibilities have been proposed. One invokes associations between the sights and sounds of speech in memory (e.g., Diehl and Kluender 1989; Massaro 1998). The other invokes common currency. Visual perceivers are well understood to perceive distal events (e.g., lips closing or not during speech). If, as we propose, auditory perceivers do too, then integration is understandable in the same way that, for example, Meltzoff and Moore (1997) propose that cross-person imitation of facial gestures is possible. There is common currency between events seen and heard.

As for the association account, it is disconfirmed by findings that replacing a video of a speaker with a spelled syllable can eliminate any cross-modal integration, whereas replacing the video with the haptic feel of a gesture retains the effect (Fowler and Dekle 1991). There are associations in memory between spellings and pronunciations (e.g., Stone, Vanhoy and Van Orden 1997; Tanenhaus, Flanigan and Seidenberg 1980). Any associations between the manual feel of a speech gesture and its acoustic consequence should be considerably weaker than those between spelling and pronunciation. However, the magnitudes of cross-modal integration worked the other way.


Returning to the claim of the previous sections that listeners track articulations very closely, we note that they do so cross-modally as well. Listeners' parsing of coarticulated speech leads to compensation for coarticulation (e.g., Mann 1980). For example, in the context of a preceding /l/, ambiguous members of a /da/ to /ga/ continuum are identified as /ga/ more often than in the context of /r/. This may occur because listeners parse the fronting coarticulatory effects of /l/ on the stop consonant, ascribing the fronting to /l/, not to the stop. Likewise, they may parse the backing effects of /r/ on the stop (but see Lotto and Kluender 1998 for a different account). Fowler, Brown and Mann (2000) found qualitatively the same result when the only information distinguishing /r/ from /l/ was optical (because an ambiguous acoustic syllable was dubbed onto a face mouthing /r/ or /l/), and the only information distinguishing /da/ from /ga/ was acoustic. Listeners track articulation cross-modally as well as unimodally.

    3.3.2.4. Infants perceive gestures audiovisually

We have already cited findings by Kuhl and Meltzoff (1982) showing that four- to five-month-old infants exhibit cross-modal matching. Given two films of a speaker mouthing a vowel, infants look longer at the film in which the speaker is mouthing the vowel they hear emanating from a loudspeaker situated between the film screens. Converging with this evidence from cross-modal matching that infants extract supramodal, that is, distal-world, properties both when they see and when they hear speech is evidence that five-month-olds exhibit a McGurk effect (Rosenblum, Schmuckler and Johnson 1997). Thus, infants extract gestural information from speech events before they can appreciate the linguistic significance of the gestures. Development of that appreciation will accompany acquisition of a lexicon (e.g., Beckman and Edwards 2000).


    3.3.3. A short detour: Mirror neurons

Our chapter is about the character of public language use. However, by request, we turn from that topic to consider a mechanism that, by some accounts, might plausibly underlie maintenance of a common currency between production and perception of speech. That is the mirror neuron, uncovered in the research of Rizzolatti and colleagues (e.g., Gallese, Fadiga, Fogassi and Rizzolatti 1996; Rizzolatti, Fadiga, Gallese and Fogassi 1996).

Mirror neurons, found in area F5 of the monkey cortex, fire both when the monkey observes an action, for example, a particular kind of reaching and grasping action, and when the monkey performs the same action. This is, indeed, a part of a mechanism that connects action to corresponding perceptions, and so part of a mechanism that arguably might underlie speech perception and production if those skills have the character that we propose. Indeed, it is notable that mirror neurons are found in the monkey homologue of Broca's area in the human brain, an area that is active when language is used.

However, we hasten to point out that mirror neurons in no way explain how production and perception can be linked or can be recognized to correspond. Indeed, mirror neurons pose essentially the same theoretical problem that we are addressing in this chapter, but now at the level of neurons rather than of whole organisms. How can a neuron recognize the correspondences that mirror neurons do? They fire in the brain of a monkey when a human (or another monkey) performs an appropriate action or when the same action is performed by the monkey itself. But how can the same action be recognized as such? Presumably the theory of mirror neuron performance will have to include some concept of common currency of the sort we have proposed for human speaker/listeners. In short, mirror neurons must constitute the culmination of the operations of a very smart neural mechanism, the workings of which we understand no better than we understand the achievements of perceiver/actors.


    3.3.4. Articulatory Phonology, task dynamics and direct realism

Articulatory Phonology, task dynamics and direct realism together constitute a mutually consistent account of public communication of language forms. Articulatory Phonology provides gestures that are public actions as well as being phonological forms. Task dynamics shows how these forms can be implemented nondestructively despite coarticulation. Direct realism suggests that gestures are perceptual objects.

Together, these accounts of phonological competence, speech production and speech perception constitute a perspective on language performance as parity-fostering.

    4. Public language and the emergence of phonological structure

We have argued that the most basic phonological units must be discrete and recombinable, and also that phonological units should provide a currency common to speech production and perception. Putting these two desiderata together means that we should find the same categorical units common to both speech perception and production, units that can serve as phonological primitives. In this section, we review some preliminary work that investigates the basis for decomposing production and perception into discrete units. This leads, in turn, to some predictions about the emergence of phonological categories in the course of phonological development, and these appear to be borne out.

    4.1. Distinct organs

We argued that one basis for discrete units in speech production could be found in the constricting organs of the vocal tract (lips, tongue tip, tongue body, tongue root, velum, larynx). They are anatomically distinct from one another and capable of performing constricting actions independently of one another. Note, too, that there is generally no continuum on which the organs lie, so that the notion of a constriction intermediate between two organs is not defined, e.g., between a velic constriction and a laryngeal constriction. (Although one could argue that the boundaries between the tongue organs are less sharp in this particular sense, they still exhibit distinctness and independence.) Studdert-Kennedy has proposed that this particulation of the vocal tract into organs could have provided the starting point for the evolution of (arbitrary) discrete categories in language (Studdert-Kennedy 1998; Studdert-Kennedy and Goldstein in press).

But what of discrete organs in speech perception? Evidence for perception of discrete organs can be found in the experiments on facial mimicry in infants described above (Meltzoff and Moore 1997). The imitations by infants are specific to the organ used by the adult model, even though they are not always correct in detail (for example, an adult gesture of protruding the tongue to the side might be mimicked by tongue protrusion without the lateral movement). Meltzoff and Moore report that when a facial display is presented to infants, all of their facial organs stop moving except the one involved in the adult gesture.

So infants are able to distinguish facial organs (lips, tongue, eyes) as distinct objects, capable of performing distinct classes of actions. The set of organs involved in speech intersects with those on the face (lips and tongue are common to both, though the tongue is further decomposed into tongue tip and tongue body), and also includes the glottis and the velum. As gestures of the speech organs are usually specified by structure in the acoustic medium (in addition to, or instead of, the optic medium), we would predict that the acoustic medium could also be used to trigger some kind of imitative response on the part of infants to gestures of the speech-related organs (not necessarily speech gestures). So, for example, we would expect that infants would move their lips if presented with an auditory lip smack, or with [ba] syllables. Such experiments are being pursued in our laboratory.5 Prima facie evidence that infants can, in fact, use acoustic structure in this way can be found in their categorical response to speech sounds produced with distinct organs, e.g., place distinctions (oral gestures of lips vs. tongue tip vs. tongue body) in newborns (Bertoncini et al. 1987). While it had been claimed (Jusczyk 1997) that young infants exhibit adult-like perception of all phonological contrasts tested (not just between-organ contrasts), some recent reports suggest that certain contrasts may not be so well perceived by young infants, or may show decreased discriminability at ages 10-12 months, even when they are present in the ambient language. These more poorly discriminated contrasts are all within-organ contrasts. For example, Polka, Colantonio, and Sundara (2001) found that English-learning infants aged 6-8 months showed poor discrimination of a /d/-/ð/ contrast, a within-organ distinction that is contrastive in English. Best and McRoberts (in press) report decreased discriminability for a variety of within-organ contrasts at ages 10-12 months, regardless of whether they are contrastive in the language environment in which the child is being raised, but good discrimination of between-organ contrasts, even when the segments in question are not contrastive in the learning environment (e.g. labial vs. coronal ejective stops for English-learning infants).

The remarkable ability of infants to distinguish organs leads naturally to a view that distinct organs should play a key role in the emergence of phonology in the child. Recent analyses of early child phonology provide preliminary support for such a view. Ferguson and Farwell (1975), for example, showed that narrow phonetic transcriptions of a child's earliest words (the first 50) are quite variable. The initial consonant of a given word (or set of words) was transcribed as different phonetic units on different occasions, and Ferguson and Farwell (1975) argue that the variability is too extreme for the child to have a coherent phonemic system, with allophones neatly grouped into phonemes, and that the basic unit of the child's production must therefore be the whole word, not the segment. However, if the child's word productions are re-analyzed in terms of the organs involved, it turns out that children are remarkably consistent in the organ they move at the beginning of a given word (Studdert-Kennedy 2000, in press; Studdert-Kennedy and Goldstein, in press), particularly the oral constriction organ (lips, tongue tip, tongue body). Thus, children appear to be acquiring a relation between actions of distinct organs and lexical units very early in the process of developing language. Organ identity is common to production and perception and is used very early on for lexical contrast.

    4.2. Mutual attunement and the emergence of within-organ contrasts

As discussed earlier, distinct actions of a given organ can also function as contrastive gestures. Such contrastive gestures typically differ in the attractor states of the tract variables that control the particular organ. For example, bet, vet, and wet all begin with gestures of the lips organ, but the gestures contrast in the state (value) of the Lip Aperture (LA) tract variable (degree of lip constriction). The lips are most constricted for bet, less so for vet, and least constricted for wet. Tongue tip gestures at the beginning of the words thick and sick differ in the value of the Tongue Tip Constriction Location (TTCL) tract variable (position of the tongue tip along the upper teeth and/or palate). The contrasting attractor values along LA or TTCL are in principle points along a continuum. How are these continua partitioned into contrasting states?
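As an illustration of this way of talking about contrast, the sketch below (in Python) represents the word-initial gestures of bet, vet, and wet as targets of a single tract variable. The representation, the field names, and the numeric target values are our own illustrative assumptions, not parameters taken from the task-dynamic model.

    # A minimal sketch, assuming hypothetical LA targets: contrastive gestures of
    # one organ (the lips) differ only in the attractor value of the controlled
    # tract variable, here Lip Aperture (LA). The numbers are illustrative only.
    from dataclasses import dataclass

    @dataclass
    class Gesture:
        organ: str           # constricting organ (e.g., "lips", "tongue tip")
        tract_variable: str  # controlled tract variable (e.g., "LA", "TTCL")
        target: float        # attractor state (hypothetical value, in mm)

    # Word-initial lip gestures: same organ and tract variable, different targets.
    initial_gestures = {
        "bet": Gesture("lips", "LA", -1.0),  # full closure (stop)
        "vet": Gesture("lips", "LA", 2.0),   # critical constriction (fricative)
        "wet": Gesture("lips", "LA", 8.0),   # wide constriction (glide)
    }

On this view, the three words contrast only in where along the LA continuum the target of the initial gesture falls; the question taken up next is how a community of speakers comes to settle on a small set of such target regions.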

One hypothesis is that the categories emerge as a consequence of satisfying the requirement that phonological actions be shared by members of a speech community. In order to satisfy that requirement, members must attune their actions to one another. Such attunement can be seen in the spread of certain types of sound changes in progress (e.g. Labov 1994), in the gestural drift found when a speaker changes speech communities (Sancier and Fowler 1997), and in the babbling behavior of infants as young as 6 months old (de Boysson-Bardies et al. 1992; Whalen, Levitt and Wang 1991).


Mutual attunement must be accomplished primarily through the acoustic medium. Because the relation between constriction parameters and their acoustic properties is nonlinear (Stevens 1989), certain regions of a tract variable continuum will afford attunement, while others will not. Thus, the categories we observe could represent just those values (or regions) of the tract variable parameters that afford attunement. They are an example of self-organization through the public interaction of multiple speakers.

It is possible to test this hypothesis through computational simulation of a population of agents that acts randomly under a set of constraints or conditions. (For examples of self-organizing simulations in phonology, see Browman and Goldstein 2000; deBoer 2000; Zuraw 2000.) In a preliminary simulation designed to investigate the partitioning of a tract variable constriction continuum into discrete regions, agents interacted randomly under the following three conditions: (a) agents attempt to attune their actions to one another; (b) agents recover the constriction parameters used by their partners from the acoustic signal, and that recovery is assumed to be noisy; (c) the relation between constriction and acoustics is nonlinear.

The simulation investigated an idealized constriction degree (CD) continuum and how it is partitioned into three categories (corresponding to stops, fricatives, and glides). Figure 4 shows the function used to map constriction degree to a hypothetical acoustic property, which could represent something like the overall amplitude of acoustic energy that emerges from the vocal tract during the constriction. The crucial point is that the form of the nonlinear function follows that hypothesized by Stevens (1989) for constriction degree and several other articulatory-acoustic mappings. Regions of relative stability (associated with stops, fricatives, and glides) are separated by regions of rapid change. The constriction degree continuum was divided into 80 equal intervals.

Two agents were employed. Each agent produced one of the 80 intervals at random, with some a priori probability associated with each interval. At the outset, all probabilities were set to be equal. The simulation then proceeded as follows. On each trial, the two agents each produced one of the 80 intervals at random. Each agent then recovered the CD produced by its partner from its acoustic property and compared that CD value to the one it had produced itself. If they matched, within some criterion, the agent incremented the probability of producing that value of CD again. The recovery process works as follows. The function in Figure 4 is a true function, so an acoustic value can be mapped uniquely onto a CD value. However, since we assume the acoustics to be noisy in the real world, a range of CD values is actually recovered: all those whose acoustic values lie within +/- 3 acoustic units of the one actually produced.

Figure 4. Mapping between Constriction Degree and a hypothetical Acoustic Property used in the agent-based computational experiment. Shape of the function after Stevens (1989).
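To make the procedure concrete, the following minimal sketch (in Python) implements a two-agent version of the simulation just described. The piecewise quantal_map function merely stands in for the Stevens-style curve of Figure 4, and the specific slopes, the matching criterion, and the size of the probability increment are illustrative assumptions rather than the parameters of the simulation reported here.

    # A minimal sketch of the two-agent attunement simulation (assumed, illustrative
    # parameters). Each agent produces a constriction degree (CD), recovers its
    # partner's CD from a noisy acoustic property, and boosts the probability of a
    # CD that matches what the partner was heard to produce.
    import random

    N_INTERVALS = 80       # CD continuum divided into 80 equal intervals
    ACOUSTIC_NOISE = 3.0   # +/- 3 acoustic units of recovery noise
    N_TRIALS = 60_000
    INCREMENT = 0.1        # assumed size of the probability boost on a match

    def quantal_map(cd: int) -> float:
        """Map a CD interval to a hypothetical acoustic property: three stable
        plateaus (stop-, fricative-, glide-like) separated by rapid change."""
        if cd < 25:
            return 300.0 + cd                   # stop-like plateau: slow change
        elif cd < 30:
            return 400.0 + 150.0 * (cd - 25)    # rapid transition
        elif cd < 55:
            return 1100.0 + (cd - 30)           # fricative-like plateau
        elif cd < 60:
            return 1200.0 + 150.0 * (cd - 55)   # rapid transition
        return 1900.0 + (cd - 60)               # glide-like plateau

    def recover_cds(acoustic_value: float) -> set:
        """Noisy recovery: every CD whose acoustic property lies within
        +/- ACOUSTIC_NOISE units of the value that was heard."""
        return {cd for cd in range(N_INTERVALS)
                if abs(quantal_map(cd) - acoustic_value) <= ACOUSTIC_NOISE}

    class Agent:
        def __init__(self) -> None:
            self.weights = [1.0] * N_INTERVALS  # all CDs equally probable at first

        def produce(self) -> int:
            return random.choices(range(N_INTERVALS), weights=self.weights)[0]

        def attune(self, own_cd: int, heard_acoustic: float) -> None:
            # If the partner's (noisily recovered) CD matches the agent's own
            # production, make that CD more likely in the future.
            if own_cd in recover_cds(heard_acoustic):
                self.weights[own_cd] += INCREMENT

    t1, t2 = Agent(), Agent()
    for _ in range(N_TRIALS):
        cd1, cd2 = t1.produce(), t2.produce()
        t1.attune(cd1, quantal_map(cd2))
        t2.attune(cd2, quantal_map(cd1))

Because nearby CD values in a plateau map to nearly identical acoustic values, matches are far more frequent there than in the steep regions, so each agent's production weights should gradually concentrate into a few modes, qualitatively like the three modes shown in Figure 5.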

Typical results of such a simulation (60,000 trials) are shown in Figure 5. For both agents (T1 and T2), the CD productions are now partitioned into three modes corresponding to the stable states of Figure 4. Values of CD in these regions are relatively frequently matched because of their acoustic similarity: several values of CD fall into the +/- 3 acoustic unit noise range in these regions, while only a few fall within the +/- 3 range in the unstable regions. Thus, mutual attunement, a concept available only in public language use, gives rise to discrete modes of constriction degree under the influence of a nonlinear articulatory-acoustic map.

Figure 5. Results of the agent-based experiment. Probability distributions of the two agents (T1 and T2) emitting particular values of constriction degree (60,000 iterations).

The role of attunement in partitioning within-organ contrasts is further supported by the developmental data described above. While children's early words are consistent in the oral constriction organ employed, and match the adult models in this regard, they are quite variable in within-organ properties, such as constriction degree (or constriction location). The simulations suggest that within-organ categories emerge only from attunement, which presumably takes some time. This conclusion is further bolstered by the recent perceptual findings with infants 10-12 months of age (Best and McRoberts in press), showing reduced discrimination for within-organ contrasts, even when the contrasts can be found in the language the child is about to acquire. At this age, infants have only begun to attune their vocal behavior to the language environment (de Boysson-Bardies et al. 1992), and therefore partitioning of within-organ categories is expected to be incomplete.

    5. Conclusions

In the fields of linguistics and psycholinguistics, there is an almost exclusive focus on the individual speaker/hearer and on the covert knowledge (linguistics) or the covert mental categories and processes (psycholinguistics) that support language use. In these approaches, there has appeared to be no scientific harm in studying phonological competence, speech production and speech perception independently, and that is how research in these domains has proceeded for the most part. Independent investigation of these domains means that theories of phonology, speech production and speech perception are each largely unconstrained by the others. Phonological forms are not constrained to be producible in a vocal tract, descriptions of vocal tract activities need not be, and are not, descriptions of phonological forms, and neither phonological forms nor vocal tract activities need to be perceivable.

There is scientific harm in this approach, however, because language use is almost entirely a between-person activity, and it matters whether or not listeners perceive the language forms that speakers intend to convey. In our view, the prevalent views that language forms are mental categories, that coarticulation ensures that vocal tract activity is not isomorphic with the forms, and that listeners perceive acoustic cues are erroneous consequences of the exclusive focus on the individual language user.

We start from the premise that languages evolved to be spoken and heard and, therefore, that language forms, the means that languages provide for making language use public, are likely to be public events. We have shown that a phonological system can be composed of public vocal events, that is, gestures. The gestures, not acoustic cues, can be supposed to be perceived. And public language forms can be shown to emerge, with the properties of ready producibility and perceivability that language forms do have, from imitative communicative exchanges between people. This, we argue, is real language.

    Notes

1. Preparation of the manuscript was supported by NICHD Grant HD-01994, and NIDCD grants DC-03782 and DC-02717 to Haskins Laboratories.

2. Obviously, this sameness has to be within some tolerance. For example, not everyone who speaks the same language speaks the same dialect.

3. This is not the solution adopted by Meltzoff and Kuhl (1994) to explain audiovisual integration of speech. They propose that, during cooing and babbling, as infants produce speech-like acoustic signals, they learn a mapping from articulations to acoustic signals. Articulations, like other facial or vocal tract actions, can be perceived supramodally. Accordingly, the articulation-acoustic mapping can underlie audiovisual integration of speech.

4. It is, of course, possible that the essential properties of a gesture may fail to be completely achieved in some prosodic, stylistic, or informational contexts. For example, a closure gesture may be reduced and fail to result in complete closure. In such cases, the reduction can serve as information about the context. In the case of coarticulation, however, the essential properties of a gesture would be systematically obscured (and never achieved) if it were not for equifinality.

5. MacNeilage (1998) has argued that speech emerges from oscillatory movements of the jaw without specific controls for lips, tongue tip, and tongue body. It is possible that infants' early productions of utterances with the global rhythmical properties of speech have the properties he proposes. However, infants may have some control of individual movements of the separate organs; it may be just their integration into a more global structure that occurs only after that global structure is established through mandibular oscillation.


    References

Abler, William
1989 On the particulate principle of self-diversifying systems. Journal of Social and Biological Structures 12: 1-13.

Beckman, Mary and Jan Edwards
2000 The ontogeny of phonological categories and the primacy of lexical learning in linguistic development. Child Development 71: 240-249.

Bertoncini, Josiane, Ranka Bijeljac-Babic, Sheila E. Blumstein and Jacques Mehler
1987 Discrimination in neonates of very short CVs. Journal of the Acoustical Society of America 82: 31-37.

Best, Catherine T.
1995 A direct realist perspective on cross-language speech perception. In: Winifred Strange and James J. Jenkins (eds.), Cross-Language Speech Perception, 171-204. Timonium, MD: York Press.

Best, Catherine T. and Gerald W. McRoberts
in press Infant perception of nonnative contrasts that adults assimilate in different ways. Language and Speech.

deBoer, Bart
2000 Self-organization in vowel systems. Journal of Phonetics 28: 441-465.

de Boysson-Bardies, Bénédicte, Marilyn Vihman, Liselotte Roug-Hellichius, Catherine Durand, Ingrid Landberg, and Fumiko Arao
1992 Material evidence of infant selection from the target language: a cross-linguistic study. In: Charles Ferguson, Lise Menn, and Carol Stoel-Gammon (eds.), Phonological Development: Models, Research, Implications, 369-391. Timonium, MD: York Press.

Browman, Catherine P.
1994 Lip aperture and consonant releases. In: Patricia Keating (ed.), Phonological Structure and Phonetic Form: Papers in Laboratory Phonology III, 331-353. Cambridge: Cambridge University Press.

Browman, Catherine P. and Louis Goldstein
1986 Towards an articulatory phonology. Phonology Yearbook 3: 219-252.

Browman, Catherine P. and Louis Goldstein
1989 Articulatory gestures as phonological units. Phonology 6: 151-206.

Browman, Catherine P. and Louis Goldstein
1990 Gestural specification using dynamically-defined articulatory structures. Journal of Phonetics 18: 299-320.

Browman, Catherine P. and Louis Goldstein
1992a Articulatory phonology: An overview. Phonetica 49: 155-180.

Browman, Catherine P. and Louis Goldstein
1992b Targetless schwa: An articulatory analysis. In: Gerard J. Docherty and D. Robert Ladd (eds.), Papers in Laboratory Phonology II: Gesture, Segment, Prosody, 26-56. Cambridge: Cambridge University Press.

Browman, Catherine P. and Louis Goldstein
1995a Dynamics and articulatory phonology. In: Robert Port and Timothy van Gelder (eds.), Mind as Motion: Explorations in the Dynamics of Cognition, 175-193. Cambridge, MA: MIT Press.

Browman, Catherine P. and Louis Goldstein
1995b Gestural syllable position effects in American English. In: Fredericka Bell-Berti and Lawrence Raphael (eds.), Producing Speech: Contemporary Issues, 19-33. New York: American Institute of Physics.

Browman, Catherine P. and Louis Goldstein
2000 Competing constraints on intergestural coordination and self-organization of phonological structures. Bulletin de la Communication Parlée 5: 25-34.

Byrd, Dani
1996 Influences on articulatory timing in consonant sequences. Journal of Phonetics 24: 209-244.

Clements, G. N.
1985 The geometry of phonological features. Phonology Yearbook 2: 225-252.

Cooper, Franklin S., Pierre C. Delattre, Alvin M. Liberman, John M. Borst and Louis J. Gerstman
1952 Some experiments on the perception of synthetic speech sounds. Journal of the Acoustical Society of America 24: 597-606.

Diehl, Randy L. and Keith R. Kluender
1989 On the objects of speech perception. Ecological Psychology 1: 121-144.

Elson, Ben
1947 Sierra Popoluca syllable structure. International Journal of American Linguistics 13: 13-17.

Ferguson, Charles A. and Carol B. Farwell
1975 Words and sounds in early language acquisition. Language 51: 419-439.

Fontana, Walter and Leo Buss
1996 The barrier of objects: From dynamical systems to bounded organizations. In: John Casti and Anders Karlquist (eds.), Boundaries and Barriers, 56-116. Reading, MA: Addison-Wesley.

Fowler, Carol A.
1980 Coarticulation and theories of extrinsic timing. Journal of Phonetics 8: 113-133.

Fowler, Carol A.
1981 Production and perception of coarticulation among stressed and unstressed vowels. Journal of Speech and Hearing Research 46: 127-139.

Fowler, Carol A.
1986 An event approach to the study of speech perception from a direct-realist perspective. Journal of Phonetics 14: 3-28.

Fowler, Carol A.
1996 Listeners do hear sounds, not tongues. Journal of the Acoustical Society of America 99: 1730-1741.

Fowler, Carol A., Julie Brown and Virginia Mann
2000 Contrast effects do not underlie effects of preceding liquid consonants on stop identification in humans. Journal of Experimental Psychology: Human Perception and Performance 26: 877-888.

Fowler, Carol A. and Dawn J. Dekle
1991 Listening with eye and hand: Crossmodal contributions to speech perception. Journal of Experimental Psychology: Human Perception and Performance 17: 816-828.

Fowler, Carol A. and Mary Smith
1986 Speech perception as vector analysis: An approach to the problems of invariance and segmentation. In: Joseph S. Perkell and Dennis H. Klatt (eds.), Invariance and Variability in Speech Processes, 123-136. Hillsdale, NJ: Lawrence Erlbaum Associates.

Fromkin, Victoria
1973 Speech Errors as Linguistic Evidence. The Hague: Mouton.

Gafos, Adamantios
2002 A grammar of gestural coordination. Natural Language and Linguistic Theory 20: 269-337.

Gallese, Vittorio, Luciano Fadiga, Leonardo Fogassi and Giacomo Rizzolatti
1996 Action recognition in the premotor cortex. Brain 119: 593-609.

Gibson, Eleanor and Anne Pick
2000 An Ecological Approach to Perceptual Learning and Development. Oxford: Oxford University Press.

Gibson, James J.
1966 The Senses Considered as Perceptual Systems. Boston, MA: Houghton Mifflin.

Gibson, James J.
1979 The Ecological Approach to Visual Perception. Boston, MA: Houghton Mifflin.

Gick, Bryan
in press Articulatory correlates of ambisyllabicity in English glides and liquids. In: John Local, Richard Ogden and Rosalind Temple (eds.), Laboratory Phonology VI: Constraints on Phonetic Interpretation. Cambridge: Cambridge University Press.

Gick, Bryan and Louis Goldstein
2002 Relative timing of the gestures of North American English /r/. Journal of the Acoustical Society of America 111: 2481.

Goldstein, Louis, Marianne Pouplier, Larissa Chen and Dani Byrd
submitted Dynamic action units slip in speech production errors. Nature.

Gracco, Vincent L. and James H. Abbs
1982 Compensatory response capabilities of the labial system to variation in onset of unanticipated loads. Journal of the Acoustical Society of America 71: S34.

Guenther, Frank
1995 Speech sound acquisition, coarticulation, and rate effects in a neural network model of speech production. Psychological Review 102: 594-621.

Haken, Hermann, Lieke Peper, Peter J. Beek and Andreas Dafferthofer
1996 A model for phase transitions. Physica D 90: 179-196.

Halle, Morris
1983 On distinctive features and their articulatory implementation. Natural Language and Linguistic Theory 1: 91-105.

Harris, Cyril M.
1953 A study of the building blocks in speech. Journal of the Acoustical Society of America 25: 962-969.

Hockett, Charles
1955 A Manual of Phonetics. Bloomington, IN: Indiana University Press.

Hommel, Bernhard, Jochen Müsseler, Gisa Aschersleben and Wolfgang Prinz
2001 The theory of event coding (TEC): A framework for perception and action planning. Behavioral and Brain Sciences 24: 849-937.

Jusczyk, Peter
1997 The Discovery of Spoken Language. Cambridge, MA: MIT Press.

Keating, Patricia A.
1990 The window model of coarticulation: Articulatory evidence. In: John Kingston and Mary E. Beckman (eds.), Papers in Laboratory Phonology I: Between the Grammar and Physics of Speech, 451-470. Cambridge: Cambridge University Press.

Kelso, J. A. Scott, Betty Tuller, Eric Vatikiotis-Bateson and Carol A. Fowler
1984 Functionally specific articulatory cooperation following jaw perturbations during speech: Evidence for coordinative structures. Journal of Experimental Psychology: Human Perception and Performance 10: 812-832.

Krakow, Rena
1993 Nonsegmental influences on velum movement patterns: Syllables, segments, stress and speaking rate. In: Marie Huffman and Rena Krakow (eds.), Phonetics and Phonology 5: Nasals, Nasalization and the Velum, 87-116. New York: Academic Press.

Krakow, Rena
1999 Physiological organization of syllables: a review. Journal of Phonetics 27: 23-54.

Kuhl, Patricia and Andrew Meltzoff
1982 The bimodal perception of speech in infancy. Science 218: 1138-1141.

Kuhl, Patricia and Andrew Meltzoff
1996 Infant vocalizations in response to speech: Vocal imitation and developmental change. Journal of the Acoustical Society of America 100: 2425-2438.

Labov, William
1994 Principles of Linguistic Change. Volume 1: Internal Factors. Oxford: Basil Blackwell.

Liberman, Alvin M.
1957 Some results of research on speech perception. Journal of the Acoustical Society of America 29: 117-123.

Liberman, Alvin M.
1996 Speech: A Special Code. Cambridge, MA: Bradford Books.

Liberman, Alvin, Franklin S. Cooper, Donald Shankweiler and Michael Studdert-Kennedy
1967 Perception of the speech code. Psychological Review 74: 431-461.

Liberman, Alvin, Pierre Delattre and Franklin S. Cooper
1952 The role of selected stimulus variables in the perception of the unvoiced-stop consonants. American Journal of Psychology 65: 497-516.

Liberman, Alvin M., Pierre Delattre, Franklin S. Cooper and Louis Gerstman
1954 The role of consonant-vowel transitions in the perception of the stop and nasal consonants. Psychological Monographs: General and Applied 68: 1-13.

Liberman, Alvin M. and Douglas H. Whalen
2000 On the relation of speech to language. Trends in Cognitive Sciences 4: 187-196.

Lotto, Andrew and Keith Kluender
1998 General contrast effects in speech perception: Effect of preceding liquid on stop consonant identification. Perception & Psychophysics 60: 602-619.

MacNeilage, Peter F.
1998 The frame/content theory of evolution of speech production. Behavioral and Brain Sciences 21: 499-511.

Mann, Virginia
1980 Influence of preceding liquid on stop-consonant perception. Perception & Psychophysics 28: 407-412.

Martin, James G. and Timothy Bunnell
1981 Perception of anticipatory coarticulation effects in /stri, stru/ sequences. Journal of the Acoustical Society of America 69: S92.

Martin, James G. and Timothy Bunnell
1982 Perception of anticipatory coarticulation effects in vowel-stop consonant-vowel syllables. Journal of Experimental Psychology: Human Perception and Performance 8: 473-488.

Massaro, Dominic
1998 Perceiving Talking Faces. Cambridge, MA: MIT Press.

Mattingly, Ignatius
1981 Phonetic representation and speech synthesis by rule. In: Terry Myers,