

    Element Theory and the Structure of English Vowels

    Phillip Backley

    Tohoku Gakuin University, Japan

    February 2009


    Contents

    Chapter 1. Background and Introduction

    Chapter 2. Representing Segmental Structure

    2.1 Segments have internal structure

    2.2 Articulation versus perception

    2.3 Elements as patterns in the speech signal

    2.4 Monovalency versus bivalency

    2.5 Elements and the grammar

    2.6 Summary

    Chapter 3. Element Theory and the Representation of Vowels

    3.1 Introduction

    3.2 What makes |A I U| special?

    3.3 |A I U| as simplex expressions

    3.4 |A I U| in compounds

    3.4.1 Phonetic evidence for element compounds

    3.4.2 Phonological evidence for element compounds

    3.5 Central vowels

    3.5.1 Phonetic evidence for empty vowels

    3.5.2 Phonological evidence for empty vowels

    Chapter 4. English Vowel Structure

    4.1 Introduction

    4.2 Front rounding in vowels

    4.3 Element dependency

    4.4 The representation of English vowels

    4.4.1 Introduction

    4.4.2 Short vowels

    4.4.3 Long monophthongs

    4.4.4 Weak vowels

    4.4.5 Diphthong structure

    4.4.6 |I| diphthongs

    4.4.7 |U| diphthongs

    4.4.8 |A| diphthongs

    Chapter 5. Summary


    in terms of grammaticality, for instance. The Optimality view sees grammaticality as

    being determined by a once-only evaluation of some lexical input, whereas in standard theory

    a grammatical form corresponds to the final stage of a serial derivation process. Yet when it

    comes to segmental structure, the two approaches usually converge in the sense that they both

    employ distinctive features and they both admit lexical forms comprising linear strings of

    segments from which prosodic structure is largely predictable.

    Distinctive features are undeniably part of the fabric of mainstream phonology. This

    is not an a priori reason to accept their validity as units of linguistic structure, however. In

    fact this paper claims that features do not provide the most suitable means of representing the

    internal structure of language sounds. Instead, I argue that segmental representations are built

    from an alternative set of units called elements, which are mapped onto patterns that humans

    perceive in the speech signal. Clearly this departs from the standard view that features are

    associated with the articulatory properties of speech production. Below I illustrate the use of

    elements in representations by analysing the internal structure of vowels.

    The discussion is organised as follows. Section 2 considers some of the problems

    associated with distinctive features. In particular, it questions two common assumptions

    about the nature of features: their bias towards articulation and their reliance on binary values.

    (Readers who are already familiar with these issues and with the thinking behind the Element

    Theory approach may skip this section altogether, and proceed to section 3). Then section 3

    introduces Element Theory as an alternative way of describing segmental structure. It focuses

    on the representation of vowels using the elements |A I U|. Section 4 offers an Element

    Theory analysis of the vowel system(s) of English. It shows how an approach based on

    phonological elements can shed light on patterns that characterise the shape and behaviour of

    vowels in present-day English. Finally, section 5 summarises the main points.


    2: Representing Segmental Structure

    2.1 Segments have internal structure

    There is a long tradition of using segments to describe language sounds. For example, dictionaries provide segmental (i.e. phonemic) information to show the pronunciation of a word (in the form of a phonemic transcription), and linguists refer to inventories of segments when

    comparing one language with another, or when discussing the set of contrastive sounds in a

    language. Yet there is overwhelming evidence that segments are not the primary units of

    sound structure. Rather, by observing how sounds behave in languages we can uncover a set

    of more basic sound properties which collectively describe the internal make-up of segments;

    and it is this assumption which has driven the study of segmental phonology since the time of Trubetzkoy.

    According to this view, segments with one or more of the same basic properties in

    common are expected to show similar phonological behaviour, whereas segments with little

    or no shared internal structure should show quite different behaviour. Identifying these basic

    sound properties is therefore central to the task of explaining segmental patterns and

    groupings. As any introductory course in phonology attempts to show, understanding the

    nature of segment-internal properties should reveal why segments regularly cluster together

    only in certain combinations and why segments interact in predictable ways as a result of

    coming into contact with each other. So, although the term segment continues to serve as a

    convenient label for referring to language sounds, segments themselves should not be seen as

    having the status they once had as formal units of linguistic structure.

    The standard approach views segments as bundles of co-occurring features, where

    each feature picks out one aspect of a segment's behaviour. This means that one feature alone

    cannot define any individual segment; in order to characterise a segment in full we must refer

    to its combined feature specification, that is, to the sum of its phonological properties.

    Nevertheless, single features do have a role in representation systems: each defines an entire

    class of segments, where every member of the class shares the same phonological property by

    virtue of having the same feature in its representation. With this one property in common,

    segments from the same class should, in principle, display similar phonological behaviour

    with respect to this property. For example, the feature [+coronal] unites a range of otherwise

    disparate sounds including [tʃ θ l z t n], all of which may follow the vowel [aʊ] in English: the words couch, mouth, owl, blouse, shout, count contain well-formed sequences of [aʊ]


    plus coronal, whereas a segment from any other class is banned from this position (*[aʊp], *[aʊk], *[aʊm], etc.).
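    As a concrete illustration of how a single feature names a class that a phonotactic statement can then refer to, here is a minimal sketch in Python (my own, not from the paper); the membership list for the coronal class is simplified and assumed purely for illustration.

        # A feature defines a class of segments; the restriction on what may follow
        # [aʊ] is stated over the class, not over individual segments.
        CORONALS = {"t", "d", "s", "z", "n", "l", "tʃ", "dʒ", "θ", "ð"}  # simplified list

        def licit_after_au(consonant: str) -> bool:
            """A consonant may close a syllable after [aʊ] only if it is coronal."""
            return consonant in CORONALS

        for c in ["tʃ", "θ", "l", "z", "t", "n", "p", "k", "m"]:
            print(c, licit_after_au(c))   # the final three (non-coronals) come out False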

    2.2 Articulation versus perception

    Because every language shows distributional regularities of the kind just described, there is

    little reason to doubt that segments have internal structure. What still remains unresolved,

    however, is the question of the nature of this internal structure. In particular, what are the

    linguistic units which represent the sub-segmental properties of speech sounds? As I have

    noted, the standard approach assumes a set of features adapted from those employed in SPE.

    From their labels alone (e.g. [high], [voice], [lateral], etc.) it is clear that features can be

    traced back to phonetic properties, primarily to properties referring to articulation such as

    glottal state and tongue position. When they are used to analyse linguistic patterns in speech,

    however, they are also associated with the kinds of phonological properties that describe

    segmental contrasts and dynamic processes. So there is an underlying assumption that

    phonological phenomena are motivated by phonetics, and more specifically by speech

    production, that is, by articulation.

    Yet the association between phonology and articulation is not a necessary one. The

    authors of Fundamentals of Language (Jakobson & Halle 1956) argued that phonological

    features should be defined in auditory-acoustic terms, and this view had a major influence on

    phonological studies until the time of SPE. For instance, they propose the feature pair

    [compact]/[diffuse], where these labels reflect the acoustic properties of the sound classes

    they represent. Specifically, these features describe how acoustic energy is distributed across

    the spectrum. In compact sounds such as low vowels and back consonants it is concentrated

    in the central area of the spectrum; that is, the energy has a [compact] distribution in this

    acoustic region; whereas in diffuse sounds such as high vowels and front consonants it

    extends more widely across the spectrum; in other words, the energy has a [diffuse]

    distribution. The other eight feature pairs proposed in Fundamentals of Language have a

    similar acoustic or hearer-oriented characterisation.

    The tradition of describing segmental structure in auditory-acoustic terms came to an

    abrupt end with the publication of SPE. This was despite the authors of SPE having given

    little justification for rejecting auditory-acoustic features or for adopting articulatory features

    instead. But such was the influence of SPE on the development of phonological theory that its

    preference for articulatory features quickly caught on. And to this day most analyses of


    In short, there seems little support for the assumption that speech sounds should be

    represented in terms of articulatory properties. If anything, the arguments point towards

    speech perception as being primary and speech production only secondary. This was indeed

    the accepted position before SPE, as documented in the work of Sapir and Jakobson. It is also

    the position that Element Theory attempts to revive. As just indicated, the acquisition facts

    suggest that infant learners begin by perceiving adult input forms; on the basis of these input

    forms they build mental representations, which serve as the beginnings of their native

    lexicon; and only later do they go on to reproduce these stored forms as spoken language. But

    while the former (perception) stage is necessary for successful acquisition, the latter

    (production) stage is not, as confirmed by the ability of mutes and those with abnormalities of

    the vocal apparatus to acquire a native grammar; evidently, the inability to articulate normally

    is not a bar to perceiving speech. Conversely, speech production in the profoundly deaf rarely

    develops to a native-like level, presumably because their means of perceiving language lacks

    the necessary input from the speech signal.

    Having argued that speech perception is more fundamental to the grammar than

    speech production, it is natural to assume that segments should be formally described in

    terms of their perceptual (i.e. auditory) properties, that is, from the hearer's point of view.

    Recall, however, that this paper is attempting to develop a representation system which

    favours neither the speaker nor the hearer, but which instead models the linguistic knowledge

    common to both. As suggested above, this means focusing on the speech signal: the set of

    acoustic events which involves the transmission of sound waves through the air and which

    acts as an intermediary between the origin of a sound (the vocal organs of the speaker) and its

    target (the auditory system of the hearer). This approach is motivated in Harris & Lindsey

    (2000), where it is proposed that the speech signal be understood as a channel through which

    speakers transmit and monitor [linguistic] information and listeners receive it (Harris &

    Lindsey 2000: 185).

    As a physical phenomenon, the speech signal is something that can be measured in

    concrete terms. So when an utterance is transmitted between speaker and hearer it is possible

    to describe its acoustic properties (e.g. amplitude, formant values). However, it seems that

    most of these properties are irrelevant to the grammar, and as such, need not be encoded by

    features in phonological representations. Indeed, the extensive literature on segmental

    structure gives no indication that raw acoustic data such as formant values or voice onset

    measurements have any place in formal phonological theory. A simple parallel can be found

    in music: although the notes of a musical phrase can be described by referring to their


    physical attributes (e.g. frequency in hertz), a musician does not need precise information of

    this kind in order to perceive that phrase, store it in memory, or reproduce it as a melody. Nor

    do these physical characteristics need to be written on the page of a musical score. A musical

    note is identified not by raw acoustic values, but rather, by its overall acoustic shape and its

    relation to other notes in the musical context.

    Like musicians, language users do not classify sounds according to their acoustic

    properties. It is true that phoneticians may use phonetic data such as formant frequency to

    describe the sounds of a language, or to compare different languages; importantly, however,

    these data do not constitute linguistic information, and as such, do not identify segmental

    features. But if the speech signal is the medium by which language is transferred between

    speaker and hearer, then which aspects of the signal are relevant to the grammar and to the

    communication process? The claim made by Element Theory is that humans perceive specific

    information-bearing patterns in the speech signal, and that each pattern is represented by an

    element, where an element is taken to be the smallest unit of segmental structure present in

    mental representations. This is the position motivated in Harris & Lindsey (2000) and

    summarised in Nasukawa & Backley (2008).

    2.3 Elements as patterns in the speech signal

    The Element-based approach assumes that hearers instinctively seek out linguistic

    information: when decoding speech, they ignore most of the incoming acoustic stream and

    focus only on the specifically linguistic information contained within the speech signal. Thus

    Element Theory recognizes the human ability to extract from running speech only those

    acoustic patterns that are relevant to language. And, as just mentioned, it further assumes that

    the mental phonological categories represented by elements are mapped directly on to those

    same acoustic patterns. So although elements are associated with certain physical patterns in

    the speech signal, they exist primarily as mental constructs, that is, as units of phonological structure, in the internalized grammar. In order to highlight the way the term element can

    refer to both the physical and the mental, Harris & Lindsey (2000) describe elements as

    'auditory images'. This label suggests that an element is primarily a grammar-internal object (a mental image of some linguistically significant information), but that it is also a grammar-external object (a physical pattern in the speech signal which hearers use to cue that mental image). The defining characteristics of these speech signal patterns are described

    in section 3 below.


    So far, the discussion has given only a hearer-oriented view of elements, in which

    hearers perceive the speech signal, recover information-bearing patterns from it, and then

    associate those patterns with particular elements in phonological structure. But the speech

    signal is a neutral medium, and must therefore carry linguistic information which is also

    relevant to speakers. In the case of speakers, the same information-bearing patterns function

    not as perceptual cues but as production (i.e. articulation) targets. It must be assumed that a

    speaker's internalized grammar includes knowledge of the mapping between elements in

    lexical representation and their associated acoustic patterns in the speech signal. So in order

    to phonetically interpret a word, speakers must access the lexical form of that word, associate

    the elements it contains with their corresponding speech signal patterns, and use the vocal

    organs to reproduce those target acoustic patterns in an utterance.

    Importantly, this process of reproducing an acoustic target succeeds without the need

    for an element to contain information about speech production. For the grammar to specify

    any mapping between elements and articulation would be at best unnecessary, and at worst

    counter-productive, since there is not always a one-to-one correspondence between the shape

    of the vocal tract and the resulting sound. Consider a trained ventriloquist, for example, who

    can reproduce the speech signal pattern associated with bilabial stops but without using

    conventional lip closure. Even untrained speakers typically have available to them a choice of

    different articulatory configurations for creating the same acoustic result. For example, to

    bring about a general downward shift in vowel formant values, creating a 'flattening' of the sound spectrum (Jakobson, Fant & Halle 1952: 31), speakers may employ lip rounding, or a contraction of the pharynx, or a combination of the two.1 In sum, an element in

    phonological representation establishes which signal pattern a speaker must aim for, but it

    does not prescribe what the speaker must do to reach the target. A suitable articulation is

    something that speakers master only through being experienced users of their native language.

    Before returning to the issue of distinctive features, let us review the way some basic

    phonological concepts should be (re)defined in light of the preceding discussion on the nature

    of Element Theory. First, the elements themselves are to be seen as acoustic images:

    primarily as cognitive objects which are present in lexical representations and which serve to

    encode contrasts and alternations. However, elements also connect to the external world by

    having a direct physical interpretation: they are mapped onto certain acoustic patterns in

    the speech signal which carry linguistic information. Thus a phonological representation may

    1 For further examples, see Harris & Urua (2001: 79).


    be thought of as a code which allows language users to store and identify these mental

    acoustic patterns.

    In contrast, speech production is an aspect of language use which is not controlled by

    the grammar. Tongue position, glottal state, lip attitude and the like do not constitute

    linguistic information; rather, they provide a way of delivering the speech signal. So

    articulation serves as a vehicle for carrying the linguistic message, but it does not constitute

    the message itself. To reinforce this point, we need only consider the communication process:

    when a hearer perceives information-bearing patterns in the speech signal, each pattern acts

    consistently and reliably as a cue to its associated element: it makes no difference whether

    the signal originates from the articulation of an actual utterance, or from the recording of an

    actual utterance, or from a synthesized, unarticulated voice on a computer. In each case the

    linguistic message is the same, regardless of whether the vocal organs are involved or not,

    since articulation is not a component of the mental grammar.

    In conclusion, there is little evidence to support the prevailing view that the basic

    units of segmental structure are defined in articulatory terms. For this reason, section 3 will

    argue for an alternative view of phonological representations in which features or elements

    are mapped onto certain patterns in the speech signal. Although these patterns can be

    characterized by their acoustic properties, they are to be understood primarily as cognitive

    units which carry linguistic information about the identity of morphemes.

    2.4 Monovalency versus bivalency

    Before going on to introduce the elements in detail, this section addresses another issue

    concerning the use of distinctive features: should features (or elements) in representations be

    monovalent (single-valued) or bivalent (binary-valued)? The standard model follows a

    tradition of employing bivalent features, meaning that the grammar marks the presence of a

    phonological property by specifying a positive feature value, while the absence of that

    property is shown by the corresponding negative value. For example, l-sounds are specified

    as [+lateral] while all other sounds are [-lateral]; this creates an equipollent distinction between lateral and non-lateral, according to which [+lateral] and [-lateral] appear to have

    equal status because the grammar is able to refer to either category. But alongside bivalent

    features such as [lateral] we also find a number of monovalent features being used in some

    versions of the standard model (Steriade 1995). Unlike [lateral], a monovalent feature such

    as [round] can only refer to the presence of a given property, not to its absence. This creates a


    privative distinction between the opposing categories, because only a single value of the

    feature can be expressed in representational terms.

    (1)
                          [u] vs. [ɯ]                 non-lateral vs. lateral
        a. bivalency      [+round] vs. [-round]       [-lateral] vs. [+lateral]
        b. monovalency    [round] vs. ∅               ∅ vs. [lateral]

    As (1) shows, there are two ways of referring to the same phonological contrast,

    because there are two ways of expressing the absence of a certain property. For example, to

    describe a back unrounded vowel such as [ɯ] we can either use [-round] (i.e. the negative value of the bivalent feature [round]) or we can choose to make no reference to rounding, as indicated in (1) by ∅ (i.e. the monovalent feature [round] is absent from the segment's representation). At first sight, the difference between [-round] and ∅ seems trivial, because

    the same contrast can be expressed in both systems. However, several authors including

    Durand (1995), Kaye (1989), Harris (1994) and Roca (1994) have noted that the choice

    between bivalency and monovalency affects our predictions about how language sounds are

    grouped into natural classes and how they participate in phonological processes. That is, the two systems make different grammatical statements.
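    The contrast in (1) can be sketched informally in code. The fragment below (my own illustration, not part of the theory) encodes the same two vowels under both systems; only the bivalent encoding gives the grammar something to refer to when picking out the unrounded member.

        # Bivalency: every segment carries a value for [round], so the absence of
        # rounding is itself a specification the grammar can refer to.
        bivalent = {
            "u": {"round": "+"},
            "ɯ": {"round": "-"},
        }

        # Monovalency: only present properties are listed; [ɯ] simply has nothing.
        monovalent = {
            "u": {"round"},
            "ɯ": set(),
        }

        # A class of "[-round] segments" is statable only in the bivalent system:
        minus_round = [s for s, f in bivalent.items() if f.get("round") == "-"]   # ['ɯ']
        rounded_only = [s for s, f in monovalent.items() if "round" in f]         # ['u']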

    To illustrate this point, consider the representation of nasal vowels.

    These belong to a natural class: a non-random group whose members all share some

    physical characteristic (nasal resonance) and, more importantly, some pattern of phonological

    behaviour (e.g. vowel lowering, trigger of nasal harmony). It is assumed that these shared

    physical and phonological characteristics are an indication that the same structural property, in this case nasality, is specified in the representation of each member of the natural

    class. In other words, the common structural property defines the natural class. Furthermore,

    most theories of segmental structure assume that this class-defining property corresponds to a

    basic, indivisible unit in phonological representation, typically a feature or an element. In this

    example the basic property is nasality, so it follows that every nasal vowel must have a

    nasality feature/element in its segmental make-up.

    The Amerindian language Warao (Osborn 1966) illustrates how monovalency and

    bivalency make different grammatical predictions (data from Botma 2005):


    (2)  a. 'sun'         c. 'summer'
         b. 'walking'     d. 'kind of tree'

    As (2a-b) show, this language has a lexical contrast between oral and nasal vowels. So in a

    monovalent system of representation, the feature [nasal] appears in the structure of the nasalised vowel in (2b), while the vowel in (2a) makes no reference to [nasal] and is therefore interpreted as an oral vowel. Alternatively, under bivalency the former is specified as [+nasal] while the latter has [-nasal]. (2c-d) show

    that Warao also has a process of nasal harmony, where the presence of a nasal trigger (a nasal

    vowel or nasal consonant) causes all target sounds (vowels, laryngeals, glides) to its right to

    be nasalised within the word domain. Any harmonic trigger in Warao is characterised as a

    segment with [nasal]/[+nasal] in its lexical representation, where this feature defines a natural

    class of nasals all united by similar (harmonic) behaviour.
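    The different predictions can be made concrete with a small sketch of the harmony process itself. The code below is my own simplification (the segment types and property sets are assumed for illustration, not taken from Osborn's description): nasality is a single-valued property that spreads rightward onto targets and is blocked by anything else, so only segments that already carry the property can ever start the spreading.

        # Rightward nasal spreading over monovalent representations: 'nasal' is
        # either present in a segment's property set or simply absent.
        TARGETS = {"vowel", "glide", "laryngeal"}   # segment types that undergo harmony

        def nasal_harmony(word):
            """word: list of (segment_type, set_of_properties); returns the nasalised word."""
            spreading = False
            output = []
            for seg_type, props in word:
                props = set(props)
                if spreading and seg_type in TARGETS:
                    props.add("nasal")          # a target acquires the spreading property
                elif spreading:
                    spreading = False           # any non-target blocks further spreading
                if "nasal" in props:
                    spreading = True            # only a nasal segment acts as a trigger
                output.append((seg_type, props))
            return output

        word = [("consonant", {"nasal"}), ("vowel", set()), ("glide", set()),
                ("consonant", set()), ("vowel", set())]
        print(nasal_harmony(word))   # the first vowel and the glide gain 'nasal'

    Because oral vowels carry no specification at all in this system, there is nothing in the representations that could drive a mirror-image 'oral harmony'.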

    As expected, oral vowels do not act as harmonic triggers in this language, because

    they have no [nasal]/[+nasal] specification. Moreover, they do not constitute a natural class

    because they display no unified active behaviour.2 Importantly, the fact that [-nasal] (i.e.

    oral) vowels collectively do not do something provides no justification for grouping them

    together as a natural class. Yet this is exactly what the bivalent feature system does. Allowing

    [-nasal] to appear in representations gives it a grammatical status equal to that of [+nasal], making it possible for the phonology to refer to [-nasal] as well as [+nasal] as an active

    property in some phonological process. However, the evidence does not support this position:

    for example, we find no comparable process of oral harmony in which [-nasal] acts as a

    harmonic trigger and oralises nasal vowels. In short, it is difficult to motivate the bivalency

    prediction that [-nasal] and [+nasal] both exist as basic structural properties, and hence, as

    two separate natural classes.

    It seems, then, that the problem with bivalent features arises from their ability to refer

    to negative categories, that is, to properties which are absent from a segment's structure. To reinforce this point, consider other negative features besides [-nasal] that characterise oral

    vowels under a bivalent feature system. Oral vowels are all non-lateral, for example. But the

    feature [-lateral] does not define a natural class either, because it identifies a whole range of

    sound classes besides oral vowels (e.g. obstruents, nasal stops, rhotics) which cannot be

    unified by the presence of even a single common property. Compare this with a true natural

    class such as [+nasal], whose members comprise nasal vowels and consonants; all and only

    2 Note that [-nasal] fails to capture the class of segments targeted by nasal harmony in Warao, because this set

    includes some non-nasals (e.g. glides) but excludes other non-nasals (e.g. obstruents).


    these sounds act as harmonic triggers in Warao because only these sounds possess the active,

    class-defining feature [+nasal].

    By contrast, the use of monovalent features makes it possible for the segmental

    structure itself to show that nasal vowels form a grammatical set whereas oral vowels do not.

    [nasal] identifies the nasal vowels as a natural class, while the lack of any equivalent feature

    specification for oral vowels indicates that they have no common behaviour; furthermore, it

    prevents the grammar from referring to them as a unified set. In more general terms, the

    monovalent feature [nasal] groups together nasal vowels and consonants as a natural class, as

    evidenced by Warao nasal harmony, whereas the arbitrary set of non-nasal segments (oral

    vowels and all non-nasal consonants) displays no common properties and consequently has

    no feature specification to indicate natural class status.

    The conclusion to be drawn from this comparison between monovalent and bivalent

    features is that bivalency makes for an altogether less restrictive system. Since bivalency

    forces representations to specify either the presence or the absence of a given property, the

    number and nature of specified (and therefore potentially active) phonological properties

    exceeds what is actually observed in natural languages. In other words, it predicts the

    possibility of many phonological processes and therefore many grammars that would

    presumably be ruled out by a more constrained theory. Of course, the notion of restrictiveness

    now plays a relatively minor role in theory building. By contrast, in early generative theory

    the issue of restricting the generative capacity of the grammar was of central concern, when

    the focus was on developing a model that could generate any possible grammar and at the

    same time rule out any impossible one.

    Even the authors of SPE recognised that the use of bivalent features did not square

    easily with the generative ideal. This is clear from the final chapter of SPE, where they

    acknowledge an asymmetry between the two values of a feature which cannot be expressed

    simply by plus or minus. Their response was to propose a theory of markedness: an

    independent mechanism for calculating the grammatical significance of different feature

    values, these calculations being based on cross-linguistic generalisations about the choice of a

    default or unmarked value over its opposite value. According to their proposal, the relative

    markedness of [+feature] or [-feature] could be determined on the basis of, for example, how

    widely a feature value was distributed across languages and the stage of acquisition when a

    feature value is first used. However, the elaborate way in which markedness theory was

    formulated does little to disguise its true identity as a repair strategy and an admission that


    valency of a feature appears to be an inherent and unpredictable property of that feature,

    simply an observation about its behaviour in the phonology.

    But if the task of identifying the basic units of segmental structure comes down to one

    of observing active properties, then it is logical to assume that we can observe only what is

    there, not what is absent. This means that if [+ant] and [-ant] are both active in the grammar,

    they must represent two distinct, equal and independent (albeit complementary) properties

    that are both in some sense positive. As such, they are better expressed as a pair of

    monovalent features such as [anterior] and [posterior].3 Moreover, if the same idea can also

    be extended to other cases where polar values are typically used, then it becomes feasible to

    dispense with bivalency altogether: each negative feature displaying active phonological

    behaviour is replaced with an equivalent monovalent feature, as illustrated by the

    hypothetical example [-ant] → [posterior], while redundant negative features are simply

    ignored because they are linguistically insignificant. The result is a wholly monovalent

    approach to the representation of segmental properties. This is the position taken in Element

    Theory. The following sections will show how the notion of element is entirely consistent

    with the theoretical conclusions drawn above: units in segmental representation should be

    monovalent and should map onto linguistically significant patterns in the speech signal.

    2.5 Elements and the grammar

    From the way phonological representations are formulated in the standard approach, it is easy

    to gain the impression that features occupy a separate and autonomous level of structure. Of

    course they do show a direct relation with prosodic structure, by virtue of being associated to

    syllabic constituents or to intervening timing units. But they appear to play no role in

    determining or even influencing other aspects of the phonology. This is clear from the fact

    that features have been transferred from the standard approach to quite different theoretical

    models like Optimality Theory (Kager 1999, McCarthy 2002) without the need for any

    modification. In the case of elements, however, the same is not true: here I show how the

    decision to employ elements in representations goes hand in hand with other decisions about

    the shape of the grammar. In 2.3 it was argued that elements should map onto patterns in the

    acoustic signal, and in addition, in 2.4 it was claimed that they should be single-valued. Let

    us now consider the effects of these two conditions on the phonological model as a whole. It

    3 To my knowledge, [posterior] has never been seriously considered as a member of the feature set. However, we do find legitimate cases where the standard approach has recast a single bivalent feature in monovalent terms: for example, [±ATR] may be redefined as [ATR] and [RTR].


    marked or positive property. Although [i] does have other phonetic qualities including (in traditional feature terms) [+high] and [-round], Element Theory treats these as unmarked and phonologically inactive;4 as such, they are not specified in this vowel's structure. When a speaker interprets |I| as [i] the result in phonological terms is pure frontness (a raised F2), since no other elements are present to indicate other marked properties. This is also the reason why [i] is interpreted with the default phonetic qualities [+high] and [-round]: a [+high] vowel results from the absence of the open element |A| (see footnote 4), while a [-round] vowel is the phonetic byproduct of there being no round element |U| in the representation of [i]. The

    elements |I A U| are discussed fully in the following section.
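    A rough way to picture independent interpretability is as a spell-out function over element sets, where anything not contributed by an element falls out as an unmarked default. The sketch below is my own illustration (the property names are informal stand-ins, not part of the theory):

        # Spelling out an element expression: marked properties come from the
        # elements present; everything else is an unmarked default.
        def interpret(expression):
            return {
                "front": "I" in expression,    # |I| contributes frontness (raised F2)
                "round": "U" in expression,    # no |U| -> unrounded by default
                "low":   "A" in expression,    # no |A| -> high (non-low) by default
            }

        print(interpret({"I"}))        # {'front': True, 'round': False, 'low': False}  ~ [i]
        print(interpret({"A", "I"}))   # front, unrounded, non-high: a compound (see section 3.4)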

    The preceding discussion has outlined one of the distinguishing properties of Element Theory, namely, the independent phonetic interpretability of elements. Yet an element's ability to be interpreted in isolation is something which relates not only to segmental structure

    but more generally to the organization of the phonology as a whole. If phonological

    representations are pronounceable as they stand, then in principle Element Theory needs no

    separate level of phonetic representation. In other words, the use of elements implies a

    monostratal organisation of the phonology. Once again this marks a significant departure

    from the standard approach, which assumes a bi-stratal (or multi-stratal) model in which two

    (or more) levels of representation are required because each serves a different function:

    (4)  underlying representation      function: lexical storage
                                        (units: abstract, contrastive)
                    |
                    |  structure-changing operations
                    v
         surface representation         function: input to articulation/perception
                                        (units: concrete, phonetic)

    The traditional arrangement in (4) presents phonology as a device for creating

    phonetic objects, that is, for taking abstract phonological forms and converting them into

    concrete phonetic forms that can serve as the input to external language processes such as

    articulation and perception. As Harris (1994) points out, however, this renders phonology a

    performance system, its purpose being to generate phonetic representations and check the

    4 To capture the height dimension in vowels, Element Theory posits |A| as the marked property. The element |A| loosely equates with the feature [+low], therefore high (i.e. non-low) vowels like [i] make no reference to |A|

    in their representations. Section 3 describes the vowel elements in detail.



    grammaticality of utterances. In effect, it places phonology outside linguistic competence and

    thus outside the confines of the grammar. Yet treating phonology as extra-grammatical

    clearly goes against our understanding of what language users know. We assume, for instance,

    that linguistic knowledge includes knowledge of certain phonological generalisations like

    patterns of alternation and distribution, which are evidently part of linguistic competence

    because they exist independently of articulation and/or perception.5

    So by assuming a derivational model as in (4), the standard approach gives phonology

    a somewhat ambiguous status with respect to its role in the grammar. At best, we might say

    that the standard approach allows phonology to straddle both sides of the traditional division

    between competence and performance: by capturing a language's structure-changing

    operations (i.e. rules or constraints) it relates to competence, whereas by preparing lexical

    forms for articulation and/or perception (i.e. derivational output) it relates to performance.

    Clearly, however, this situation is at odds with the general assumption that phonology should

    be treated as part of the core grammar.

    In response, Element Theory avoids this ambiguity by keeping phonology entirely

    within the domain of linguistic competence. In an element-based phonology, therefore,

    phonological processes do not create phonetic or pronounceable forms; in fact, they have no

    direct connection with utterances. Unlike in derivational models, their role is not to take an

    abstract representation and convert it into something more physical; rather, they take an

    abstract phonological form, such as a stored lexical representation, and impose structural

    regularities on it so that it conforms to the grammar of a given language. For example, they

    may force contiguous consonants to agree in voicing, or they may cause vowels to shorten in

    closed syllables. In other words, phonological processes control grammaticality by generating

    the set of grammatical phonological structures of a language. Importantly, however, the

    output of such processes will be no less abstract than the input: an element-based process can

    only change a phonological object into another phonological object.
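    To make the 'object in, object of the same type out' point concrete, here is a small sketch (my own, with informal property names) of a process that forces adjacent consonants to agree in voicing; both the input and the output are the same kind of abstract structure, and nothing phonetic is produced.

        # A phonological process as a structure-to-structure mapping: adjacent
        # consonants come to agree in voicing (a regressive assimilation sketch).
        def voice_agreement(segments):
            """segments: list of property sets; the second of two adjacent consonants
            imposes its voicing on the first."""
            output = [set(s) for s in segments]
            for i in range(len(output) - 1):
                left, right = output[i], output[i + 1]
                if "consonant" in left and "consonant" in right:
                    left.discard("voice")
                    if "voice" in right:
                        left.add("voice")      # copy the voicing property leftwards
            return output

        word = [{"consonant", "voice"}, {"consonant"}, {"vowel"}]
        print(voice_agreement(word))   # the first consonant loses 'voice' to agree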

    Of course, the inability of an element-based phonology to generate phonetic forms is

    countered by the phonetic interpretability of elements. As discussed above, it is proposed that

    any element expression can be mapped onto its corresponding physical pattern in the speech

    signal; moreover, this can take place at any stage of derivation, since lexical representations

    and derived representations are assumed to be of the same type. In principle, then, any lexical

    5 The traditional bi-stratal model in (4) is also motivated by the supposed advantage of separating idiosyncratic information (in lexical storage) from predictable information (in the structure-changing component). As Harris

    (1994) points out, however, this position has never been strongly defended in the psycholinguistics literature.


    form may be interpreted by a speaker or hearer as it stands. In practice, however, the result is

    likely to be an ungrammatical string, because in such cases the phonology has not imposed its

    characteristic effects on the grammaticality of the structure in question. So although lexical

    forms in Element Theory have much in common with derived forms (for example, both involve abstract phonological representations, both employ the same structural units, and both can be pronounced as they are), it is derived forms which are consistently grammatical

    and thus relevant to the process of information exchange via the speech signal.

    2.6 Summary

    What this discussion has shown is that the Element Theory approach to representation takes a

    more abstract view of phonology than we find in the standard approach, in the sense that

    phonology itself is seen as being concerned only with abstract or cognitive objects. On the

    one hand, the standard approach operates primarily as a performance system, generating

    phonetic forms and thereby bridging the divide between the cognitive and the physical. On

    the other hand, the element-based approach operates exclusively within the cognitive domain,

    providing a system for organising language users' knowledge about phonological strings and

    about the internal structure of morphemes. So Element Theory incorporates phonology into

    the competence grammar as follows:

    (5)

    component      controls             determining
    syntax         sentence structure   how words behave in sentences
    morphology     word structure       how morphemes behave in words
    phonology      morpheme structure   how elements behave in morphemes

    As a component of the cognitive grammar, phonology in Element Theory has little to

    say about raw phonetics. Like other theoretical approaches, it does recognise the role of

    phonetic factors such as ease of articulation and/or perception in shaping the phonology; but

    unlike most other approaches, it does not see any place for phonetic factors in mental

    phonological representations. Similarly, speech production is viewed as a grammar-external

    process, specifically, as a system for transmitting linguistic information; this effectively

    puts articulation on a par with writing, since both of these media function as vehicles for

    delivering language but neither actually constitutes the linguistic information itself. After all,


    the inability to write does not prevent a person from acquiring a normal grammar, and neither

    does the inability to speak.

    Taking all these points into consideration, this paper develops a model of segmental

    representation which uses monovalent elements as the basic units of phonological structure.

    Elements represent the cognitive categories that are responsible for conveying linguistic

    information about the structure of morphemes. For the purposes of communication, elements

    also connect to the physical world by mapping onto information-bearing patterns that humans

    perceive in the speech signal. However, their cognitive function remains primary. This means

    that the process of identifying elements should begin with an analysis of phonological

    behaviour (e.g. distribution, alternation, natural classes); only after an element has been

    identified as a grammatical unit can it be associated with a particular speech signal pattern. In

    other words, phonological structure is determined primarily through data analysis, and only

    secondarily through listening.


    3: Element Theory and the Representation of Vowels

    3.1 Introduction

    Section 2 considered some of the problems inherent in the standard feature-based approach to

    segmental representation. It also claimed that these problems could be overcome by imposing

    certain conditions on the way the basic units of segmental structure are formulated. In

    particular, it advocated single-valued features which stand for abstract phonological

    categories. These features, which I will refer to as elements, are the units which characterize

    the lexical shape of morphemes but which also map onto information-bearing acoustic

    patterns in the speech signal.

    Element Theory claims that the segmental properties of all languages are described using the set of six elements |A I U ʔ H N|. These fall naturally into two subgroups, |A I U| and |ʔ H N|, the former being associated primarily with vowel structure and the latter with

    consonant structure. Admittedly, this split between vocalic and consonantal elements is

    something of an oversimplification, since vowel elements do appear in the representation of

    consonants, and vice versa. Indeed, as a consequence of abandoning distinctive features, it

    becomes possible to play down the importance of the traditional categories vowel and consonant and instead treat these terms simply as informal labels. So for the sake of

    convenience I will continue to refer to vowels and consonants as segment types, but this does

    not imply any formal bifurcation in terms of their segmental structure. This paper will focus

    on vowel representations and therefore on the role of the elements |A I U|. For a description

    of consonant representations and the remaining elements |ʔ H N|, see Backley (in prep).

    Before discussing the structure of vowels in detail, it is worth making the point that

    the set of vowel elements in (6a) is smaller than an equivalent set of features such as (6b):

    (6) a. elements for vowels: |A|, |I|, |U|

    b. features for vowels: [high], [low], [back], [round], [ATR]

    In fact, this difference reflects a more general divergence between the two approaches over

    the issue of generative capacity: namely, feature systems tend to over-generate while element

    systems tend to under-generate. A single feature usually represents a very specific segmental

    (typically articulatory) property, so in order to describe (the articulation of) a segment in full,


    the grammar must call upon a sizeable number of different features. For example, Odden

    (2005) uses 17 features to describe English consonants and a further 5 features to describe the

    vowels. Unfortunately, however, having so many features available opens the door to serious

    levels of over-generation, where the set of possible combinations of feature values and

    thus, the set of possible segmental contrastsis far larger than that required by the grammar

    of any one language. To address this problem, the phonology must restrict combinability in

    some way; restrictions have come in the form of feature-geometric relations (see 2.4 above)

    or negative constraints such as *[+ATR, +low] (Archangeli & Pulleyblank 1994).6

    In contrast to feature theories, which generate too many segmental expressions and

    thus have to impose constraints on their output, Element Theory takes the opposite position

    of first generating a minimal set of contrasts capable of describing only the simplest and most

    common segmental inventories. As (6) shows, this is made possible by recognizing a

    relatively small number of basic structural units. Now, with only a small set of elements to

    hand, the phonology must have ways of expanding its generative capacity to accommodate

    larger and more complex systems of contrast. Yet according to Element Theory this is the

    preferred position: the claim is that this under-generation approach is more restrictive because it

    gives the grammar greater control over the size and shape of segmental systems. So the

    function of an element-based grammar is to generate a small set of attested forms rather than

    to eliminate a potentially large set of unattested ones. In this way, the set of vowel elements

    in (6a) is intentionally small, a fact which reflects the way Element Theory is committed to

    addressing the issue of excessive generative capacity that continues to characterize feature-

    based models.
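    The difference in raw generative capacity is easy to quantify. Using the feature counts cited above from Odden (2005) for English, a back-of-the-envelope comparison (my own, purely illustrative) looks like this:

        # Over-generation vs. under-generation: binary feature bundles vs. |A I U| sets.
        from itertools import combinations

        n_features = 17 + 5                 # features for English consonants + vowels (Odden 2005)
        feature_bundles = 2 ** n_features   # every +/- combination, before any filtering
        print(feature_bundles)              # 4194304 possible bundles

        elements = ["A", "I", "U"]
        expressions = [frozenset(c) for r in range(len(elements) + 1)
                       for c in combinations(elements, r)]
        print(len(expressions))             # 8 expressions (including the empty set)

    The point is only the order-of-magnitude contrast: a feature system must then filter out most of its output, whereas an element system must be enriched (by compounding and dependency, as the later sections show) to reach larger inventories.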

    3.2 What makes |A I U| special?

    For the reasons just outlined, the set of vowel elements should initially be capable of

    generating vowel systems that are typologically unmarked, that is, structurally simple and

    cross-linguistically widespread. Why then should |A|, |I|, and |U| qualify as the most basic

    segmental properties in such systems? Crothers (1978) and other vowel typology surveys

    confirm that the universally preferred inventory has the following five-vowel arrangement:

    6 Although the filter *[+ATR, +low] succeeds in capturing a distributional regularity, it is nonetheless arbitrary in that it fails to explain why this combination is ungrammatical whereas, for example, [+ATR, -low] is widespread. Even illogical combinations such as *[+high, +low] cannot simply be dismissed as ungrammatical

    if the features in question really do stand for abstract phonological categories rather than articulatory properties.


    (7)    i           u
              e     o
                 a

    Yet despite the unmarked status of (7), it cannot be assumed that this system of five vowels

    corresponds to the presence of five basic phonological properties. For instance, we cannot

    automatically treat [i e a o u] as the phonetic instantiation of a corresponding set of elements such as |A I U E O|. In fact, there are strong arguments to indicate that the mid vowels [e o] belong to more than one natural class (Harris 1994), which in turn suggests that [e] and [o] are each represented by more than one element. In other words, the phonological structure of the mid vowels [e o] is apparently not as basic as that of the corner vowels [i u a].

    Treating [i u a] as the least marked vowels follows naturally from their unique

    properties. In describing these properties, let us begin with language typology, and with the

    fact that [i u a] are cross-linguistically very common, indeed present in almost every known

    language. When we examine the smallest attested vowel systems, which usually comprise

    only three vowels, we find such systems regularly employing only these corner vowels. The

    examples in (8) are from Lass (1984):

    (8) Three-vowel systems of this kind are reported for Tamazight, Quechua, Moroccan Arabic, Greenlandic, Amuesha and Gadsup.

    A comment is in order about phonetic vowel quality. On the understanding that the vowel symbols used to transcribe such systems stand for phonological categories rather than phonetic tokens, we do expect to find some cross-linguistic variation in the way the same contrastive system is interpreted phonetically. This applies not only to the systems in (8) but also to 5-vowel systems. Take the five-vowel systems of Spanish and Zulu, for example. A comparison of corresponding vowels in the two systems would show that these sounds have similar phonological properties

    and play the same role in their respective systems. What counts in Element Theory (and in

    related theories such as Dependency Phonology) is the behaviour of a sound with respect to

    (i) natural classes and (ii) other contrastive sounds in the same system. Phonetic values are

    not taken to be the main criterion for identifying melodic representations, which, of course,



    The arrangement in (10) has an articulatory bias, as it reflects tongue position, specifically,

    the height and degree of backness of the tongue needed to produce different vowel sounds.

    However, a vowel square fails to capture the special status of [i u a], thereby missing an

    important generalization concerning typological markedness. Moreover, if Dispersion Theory

    is correct in assuming that languages prefer vowels which are maximally distinct, then from

    (10) we can infer that the vowels at each of the four corners of the vowel square are equally

    unmarked. Yet this is clearly not the case: the low front ([-hi, -bk]) corner vowel is cross-linguistically less common than [i] ([+hi, -bk]) or [u] ([+hi, +bk]), for example.

    Here I have reviewed some of the reasons for treating [i u a] as basic vowels.

    Element Theory characterizes the special status of these vowels by equating each with an

    element from the set |A I U|, where these elements function as active phonological units in

    vowel contrasts and vocalic processes. It should be noted that Element Theory is by no means

    the first to recognize the significance of |A I U| as phonological primes. The vowel elements

    are pre-dated by the 'particles' of Particle Phonology (Schane 1984) and by the 'components' of

    Dependency Phonology (Anderson & Ewen 1987), both of which can be traced back to

    three principal underlying and abstract 'characteristics' involved in vowel formation, |u| 'roundness', |i| 'frontness', and |a| 'lowness', first proposed by Anderson & Jones (1974: 16).

    What sets Element Theory apart from these other models of vowel representation, however,

    is its claim that elements are associated specifically with properties of the speech signal.

    Further discussion of the motivation for |A I U| can be found in Rennison (1986).

    3.3 |A I U| as simplex expressions

    Elements are primarily abstract units of linguistic structure: they determine the lexical shape

    of morphemes, and they behave as active properties in phonological processes such as

    assimilation and lenition. So we identify individual elements by studying language data, by

    analyzing sound contrasts, distributional patterns and dynamic phonological changes. But in

    addition, elements connect to the physical world through their association with certain

    patterns in the acoustic speech signal. Once an element has been identified through its

    phonological properties, an analysis of its phonetic characteristics may be carried out in order

    to establish its unique acoustic signature. The typological evidence reviewed in 3.2 pointed

    to the existence of three vowel elements |A I U|. This section examines the speech signal


    patterns represented by these elements; then, to reinforce the status of |A I U| as phonological

    primes, it considers their roles in linguistic structures and dynamic phenomena.

    Element Theory assumes that language users focus on three specific patterns in the

    speech signal when producing or perceiving vowels. These patterns are revealed by analysing

    the distribution of energy across the frequency band from zero to around 3kHz, the

    frequency range which contains the first three formants and which is therefore crucial for

    perceiving vowel sounds. The figures in (11) show the signal patterns that speakers and

    hearers associate with the three abstract phonological categories |A I U|. Spectrograms of the

    corresponding vowel sounds [i a u] are given in (12).

    (11) Spectral patterns for |I|, |A| and |U|

    Figure 1: |I| as a dIp Figure 2: |A| as a mAss Figure 3: |U| as a rUmp

    (12) Spectrograms of [ ] showing the first three formants

    Figure 4: [i] Figure 5: [a] Figure 6: [u]

    The pattern for |I| in figure 1 consists of two energy peaks with a characteristic dip in

    between. One peak is located at the lower end of the vowel spectrum at around 500Hz (on the

    horizontal axis), and the other is at the upper end at approximately 2.5kHz. The peaks

    themselves represent bands of energy, typically resulting from the convergence of two

    formants; so the same pattern can also be extracted from the spectrogram for [i] in figure 4.


    This figure shows a low F1 value for the high vowel [i], as indicated by the concentration of

    energy in the 0-500Hz range (cf. the leftmost peak in figure 1). This vowel also has a high F2

    converging with F3 at around 2.5kHz, which creates a concentration of energy at the top of

    the spectrum (cf. the rightmost peak in figure 1). The sharp drop in energy in the middle of the spectrum, corresponding to the lighter area between 1-2kHz in figure 4, gives |I| its

    mnemonic label dIp.7

    The signal pattern for the element |A|, on the other hand, has the informal label mAss.

    This term describes a mass of energy located in the centre of the spectrum, peaking at around

    1kHz. As figure 2 shows, there is a drop in energy on either side of this mass. The same

characteristic mAss pattern is reflected in the spectrogram for [a] in figure 5, where the

energy peak results from a high F1 value converging with F2 in the 1kHz region. Finally, the

    speech signal pattern for the element |U| is characterised by a concentration of energy at the

    lower end of the spectrum. In figure 3 the energy peaks are contained within the 0-1kHz band,

    while across the higher frequency range we observe a steady fall. This falling spectral shape

has been dubbed rUmp. Again, the pattern is visible in the spectrogram for the corresponding

vowel: figure 6 shows how [u] involves a lowering of all formants, with F1 at around 500Hz

and F2 at around 1kHz.

    Of course, the formant patterns in figures 4-6 are subject to some inter-speaker (as

    well as intra-speaker) variation. Nevertheless, the above samples taken from my own speech

    should illustrate the general physical correlates of the phonological categories |A I U| when

    each element is interpreted in isolation. In fact, from an Element Theory point of view such

    variation is of no linguistic consequence, since the theory defines elements only in terms of

their overall spectral pattern (i.e. dIp, mAss and rUmp) and not by referring to raw

    acoustic data such as precise formant values. In the preceding paragraphs I have used specific

    frequency values to describe each spectral pattern in a precise way; but it must be stressed

that numerical data of this kind is for descriptive purposes only; it has no formal place in

the Element Theory grammar.8 A fuller description of the spectral properties of |A I U| can be

    found in Harris & Lindsey (1995).
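As a purely illustrative aside, the link between the three spectral patterns and rough formant values can be sketched in a few lines of Python. The frequency thresholds below are expository assumptions chosen only to mimic the dIp, mAss and rUmp descriptions given above; they have no status in the theory itself.

    # Illustrative sketch only: rough formant-based detection of the three
    # simplex vowel patterns. Thresholds are assumptions, not theoretical claims.
    def classify_simple_vowel(f1, f2):
        """Guess |I|, |A| or |U| for a corner vowel from F1/F2 values (Hz)."""
        if f1 < 400 and f2 > 2000:
            return "I"   # energy at both spectral edges with a dip between: dIp
        if f1 > 600 and (f2 - f1) < 800:
            return "A"   # F1 and F2 converge around 1kHz into a central mass: mAss
        if f1 < 500 and f2 < 1200:
            return "U"   # all formants lowered; energy gathers at the bottom: rUmp
        return "?"       # not a simplex |I|, |A| or |U| (e.g. a compound or central vowel)

    print(classify_simple_vowel(300, 2500))   # I  (an [i]-like vowel)
    print(classify_simple_vowel(750, 1300))   # A  (an [a]-like vowel)
    print(classify_simple_vowel(350, 800))    # U  (an [u]-like vowel)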

    3.4 |A I U| in compounds

    3.4.1 Phonetic evidence for element compounds

7 The labels dIp, mAss and rUmp are taken from Harris (1994: 139).

8 Not all models of segmental structure take this position. For example, Flemming (2002) proposes that scales of formant values be incorporated directly into vowel representations.


    The definition of elements as speech signal patterns appears to be consistent with the Quantal

    Theory explanation for why languages favour triangular vowel systems bounded by |A I U|.

    As noted above, Quantal Theory assumes that each corner of the vowel triangle is associated

with a unique and unambiguous acoustic pattern, which is exactly what the vowel elements

    represent. The original Quantal Theory descriptions, which refer to patterns of converging

    vowel formants, are redefined in (13) in terms of the impressionistic spectral shapes shown in

    figures 1-3:

(13)
                            |I|             |A|             |U|
     position of peak(s)    low + high      centre          low
     position of trough(s)  centre          low + high      centre + high

    The summary in (13) shows that each vowel element has a pattern which is not only unique

    but also highly distinct, given the small number of variables involved. So the three-way

contrast between [i], [a] and [u] should be easy to recognise and, moreover, difficult to

confuse, just as the quantal approach predicts. However, most languages have vowel systems

containing more than just [i a u], which means they must allow elements to combine into

    compound expressions. Let us now look at compounding in more detail. We first examine the

    effects of compounding on the speech signal, and then consider the phonological properties

    of compounds.

    It will be recalled from 3.2 that the universally unmarked vowel system consists of

the corner vowels [i a u] plus the mid vowels [e o]. It has already been argued that [i a u]

have a special status as basic vowels, which is reflected in the way each corresponds to a

primary unit of phonological structure, i.e. an element. In contrast, the mid vowels do not

share this status. Instead, the phonological evidence indicates that [e o] are each the result of

combining two elements and interpreting these simultaneously: [e] is represented by the

compound |I A| while [o] comes from |U A|. Now, assuming that every element is associated

    with a spectral pattern, and further assuming that all information relating to element structure

    is transmitted via the speech signal, we can expect the speech signal itself to contain complex

    spectral patterns when a mid vowel is interpreted. The spectral patterns for mid vowels are

    shown in (14) and (15):


    (14) Spectral pattern for |I A| (versus |I|)

Figure 7: |I A| ([e]) versus Figure 8: |I| ([i])

The mid vowel [e] results from the interpretation of the compound expression |I A|,

with both elements contributing to the overall shape of the composite spectral pattern in

figure 7. In the centre of the spectrum we find the dip between F1 and F2 that characterises |I|,

though this is both narrower and shallower than in the pure dIp pattern in figure 8 (repeated

    from figure 1). The difference introduced in figure 7 is accounted for by the presence of |A|,

    which produces an energy mass in the same central region with troughs on either side. In

short, the |I A| compound creates a dIp within a mAss: a large central mass of energy

    containing a dip inside it.

    (15) Spectral pattern for |U A| (versus |U|)

Figure 9: |U A| ([o]) versus Figure 10: |U| ([u])

The mid vowel [o] is the result of interpreting the compound expression |U A|. In

figure 9 the presence of |U| ensures that a concentration of energy is maintained at the lower

end of the spectrum, as we find with the pure rUmp pattern in figure 10 (repeated from figure

3). Unlike [u], however, where the energy peak is located very near the bottom of the

spectrum, the mid vowel [o] shows a concentration of energy somewhat closer to the central


region; as Harris & Lindsey (2000) point out, the energy peak in [o] is far enough above the

bottom of the frequency range to constitute a mAss, with troughs above and below (Harris &

Lindsey 2000: 196). So the |U A| compound produces a rUmp within a mAss: a centralised

    mass of energy which falls as the frequency increases.
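Before turning to that phonological evidence, the compounding idea itself can be pictured very simply: an element expression is an unordered set of elements, and a compound is interpreted by realising all of its members at once. The short Python sketch below encodes the five-vowel pattern discussed so far; the mapping from sets to broad vowel symbols is an expository assumption, not part of the formal theory.

    # Illustrative sketch only: element expressions as unordered sets mapped
    # onto broad vowel qualities; |I A| and |A I| are the same expression.
    FIVE_VOWELS = {
        frozenset({"I"}):      "i",
        frozenset({"A"}):      "a",
        frozenset({"U"}):      "u",
        frozenset({"I", "A"}): "e",   # a dIp within a mAss
        frozenset({"U", "A"}): "o",   # a rUmp within a mAss
    }

    def interpret(expression):
        """Map an element expression onto a broad vowel symbol."""
        return FIVE_VOWELS[frozenset(expression)]

    print(interpret({"A", "I"}))   # e
    print(interpret({"U"}))        # u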

    3.4.2 Phonological evidence for element compounds

    So there is phonetic evidence to indicate that mid vowels are complex structures: the spectral

pattern for |I A| (= [e]) combines mAss and dIp, while the pattern for |U A| (= [o]) combines

mAss and rUmp. But structural complexity is primarily a phonological property, which means

    that support for the existence of element compounds like |I A| and |U A| should come

    primarily from phonological evidence. In the case of mid vowels, the evidence focuses on the

    way the individual elements in a compound become visible under certain phonological

    conditions. In other words, the phonology allows us to see inside complex expressions and

    observe their internal composition.

    The following examples are, above all, intended to support the existence of element

    compounds in the grammar. Additionally, however, they reinforce the status of |A I U| as

    phonological primes, since they demonstrate how these elements regularly participate as

    active units in various dynamic phenomena. In this section I shall discuss examples of vowel

    processes which make reference only to the five vowels [ ] introduced so far. In

    general, these processes cause the internal (element) structure of a vowel to be reorganised or

    reinterpreted in some way. This is illustrated by processes such as monophthongisation,

    diphthongisation and vowel coalescence. Other process types that demonstrate the workings

    of element-based representations include vowel harmony and vowel reduction; I shall touch

    on these below, after having discussed the structure of element compounds in more detail.

    The history of English provides numerous cases of monophthong formation and

    diphthong formation. Following Harris (1994: 100), I describe these two processes together,

    since one is essentially a reversal of the other. Many dialects of late Middle English had the

    diphthongs [](~[]) and [] in the following words (data from Jones 1989):

    (16) a. Middle English []/[] b. Middle English []

    day [] day law [] law

    eight [] eight dauhter [] daughter

vain [] vain naught [] not


    pay [] pay baul [] ball

    During the sixteenth and seventeenth centuries, however, these diphthongs began to develop

    the monophthongal realisations [] and [], respectively, which survive in some dialects of

Modern English: for example, British English retains [] in law [] and ball [], while

some regions in northern England also pronounce [] in eight [] and pay []. Expressed

    in |A I U| terms, this monophthongisation process involves a simple reorganisation of the

    elements in the original diphthong:

    (17) a. [] [] b. [] []

    N N N N

    x x x x x x x x

    |A| |A| |A| |A|

    |I| |I| |U| |U|

    (17a) shows how the interpretation of the expression |A I| has changed during the

    development of the English vowel system. In late Middle English |A| and |I| were interpreted

    separately, resulting in a diphthong []. In this case, speakers distributed |A| and |I| across the

    two prosodic positions in the nuclear domain. Later, however, language users began to

interpret the same elements simultaneously, thereby producing a mid vowel [].9 Segmental

    reconfiguration of this kind typically leaves the prosodic structure untouched, so the later

    interpretation [] is still tied to a long nucleus. (17b) shows how back diphthongs also

    underwent a similar reconfiguration process.

    Importantly, monophthong formation comes about as a result of speakers and hearers

    adjusting their interpretation of the original diphthong structures. The lexical structures

themselves are unchanged; nothing has been added or removed. In the absence of any

    representational changes, then, what we see in (17) is the mid vowel interpretations [] of

    the compound expressions |A I| and |A U|, respectively. On this basis, it should come as no

    surprise that other ways of reinterpreting the same structures have also emerged. For example,

9 The compound expression |A I| can be interpreted as either [] or []. Clearly, in languages with a []~[]

    contrast these vowels must have distinct representations. This will be discussed below.


    Estuary English (South-East England) has since reverted to a diphthong realisation of |A I|:

day [], eight []. By contrast, in RP and many other dialects we also find a diphthongal

reinterpretation: day [], eight []. These are illustrated in (18):

    (18) a. Estuary English: day[i] b. RP English day[]

    N N N N

    x x x x x x x x

    d |A| d |A| d |A| d |A|

    |I| |I| |I| |I|
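The reorganisation in (17) and (18) can also be stated procedurally. In the sketch below (Python; the data structures and labels are assumptions made purely for illustration), a long nucleus is a pair of element sets, one per skeletal position: if both positions carry the same compound, the elements are interpreted simultaneously as a long monophthong; if the elements are parcelled out across the positions, the result is a diphthong.

    # Illustrative sketch only: two interpretations of the same element material
    # in a two-position nucleus, as in (17)-(18).
    def interpret_long_nucleus(pos1, pos2):
        """pos1 and pos2 are the element sets linked to the two nuclear positions."""
        if pos1 == pos2:
            # the compound is linked to both positions and interpreted at once
            return "long monophthong |" + " ".join(sorted(pos1)) + "|"
        # the elements are distributed across the two positions in sequence
        return "diphthong |" + " ".join(sorted(pos1)) + "| + |" + " ".join(sorted(pos2)) + "|"

    # late Middle English day-type nucleus: |A| followed by |I|
    print(interpret_long_nucleus({"A"}, {"I"}))
    # later reinterpretation (17a): |A I| spread over both positions
    print(interpret_long_nucleus({"A", "I"}, {"A", "I"}))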

    So, historical and dialectal evidence indicates that mid vowels are represented by

    compound element expressions. Further support for the structures |A I| and |A U| comes from

    other cases of English dialect variation, and in particular from the simplification (in effect,

    monophthongisation) patterns found in various African Englishes. The examples in (19) are

    taken from Simo Bobda (2007):

(19) a. [ ] → [ ] diphthong simplification

like [] Sierra Leone, Liberia

finding [] Zambia

primary [] Kenya

tribe [] Uganda

b. [ ] → [ ] diphthong simplification

round [] Kenya

mouth []~[] West African Pidgin

town [] Liberia

house [] Krio

The process of diphthong simplification in African Englishes seems to be accompanied by

vowel shortening, as these cases of monophthongisation tend to result in a short


    vowel. Nevertheless, as far as their segmental structure is concerned they reinforce the

    patterns described in (17), and provide additional evidence for (i) the primary status of the

    vowel elements |A I U| as active phonological units, and (ii) the representation of mid vowels

    as the compounds |A I| and |A U|.

    Looking beyond English, we see further evidence for the mid vowel structures |A I|

    and |A U| in languages as diverse as Japanese and Maga Rukai. Kubozono (2001) describes

    two processes of monophthong formation in Japanese, one historical and the other synchronic.

    Towards the end of the Middle Japanese period, the diphthong [] in Sino-Japanese words

    underwent monophthongisation to []:

    (20) Middle Japanese monophthongisation

    [] [] cherry tree ()

    [] [] high(), fidelity ()

    [] [] capital (), home town ()

    The output forms in (20) are subject to an analysis similar to that shown in (17b) for early

    English. Meanwhile, in present-day Tokyo Japanese the reinterpretation process described in

(17a) has become a characteristic of casual speech (Kubozono 2001: 63), with [] being monophthongised to []. The diphthong [] is retained in formal speech, however, resulting

    in the alternations shown in (21):10

    (21) Tokyo Japanese monophthongisation

    []~[] usually

    []~[] siblings

    []~[] painful

In view of the Japanese patterns in (20) and (21), it is clear that analysing [e o] as the

    element compounds |A I| and |A U|, respectively, does not just capture mid vowel behaviour

    in English; rather, it describes a property of the vowel elements themselves. This point is

    reinforced by the fact that similar behaviour is also observed in other, unrelated languages. In

    Maga Rukai, an Austronesian language spoken in Taiwan, a synchronic process of vowel

10 Hirayama (2003) analyses the Japanese data in (21) using traditional features.


    coalescence has created mid vowels that were not present in the proto-language (Hsin 2003).

The nouns in (22a) have the heterosyllabic vowel sequence [a][i] in the root of the

negative form, which corresponds to [e] in the positive. This [e] is the result of merging the

phonological properties [a] and [i]. In (22b) we find a parallel alternation between [a][u] and [o]:

    (22) a. [][] coalescence b. [][] coalescence

    negative positive negative positive

    bee hemp

    bridge tooth

    pan excrement

    Maga Rukai has a pattern of vowel syncope determined by its iambic foot structure (Hsin

2003: 64). In (22) this is shown as the loss of [a] in the root-initial syllable of the positive

    form. Yet although the nuclear position itself is suppressed, its segmental content |A| is

    retained; this stray element is then interpreted in the adjacent nucleus:

    (23) Maga Rukai vowel coalescence: []~[]

    N N N N

    x x x x x x

    c |A| k c |A|k |A|

    |I| |I|

    So Maga Rukai provides another example of a process which reconfigures a representation in

    such a way as to reveal the internal structure of mid vowels. The merger of |A| and |I| in (23)

produces [e] in [], while the same analysis also applies to the merger of |A| and |U| to

create [o] (e.g. []).
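The same point can be made with a small procedural sketch (again, the data structures are illustrative assumptions, not Hsin's formalism): syncope removes the weak nucleus from the sequence, but its stray element is docked onto the neighbouring nucleus before interpretation.

    # Illustrative sketch only: syncope plus coalescence as in (23).
    INTERPRET = {frozenset({"I"}): "i", frozenset({"U"}): "u", frozenset({"A"}): "a",
                 frozenset({"A", "I"}): "e", frozenset({"A", "U"}): "o"}

    def syncopate_and_coalesce(nuclei, weak_index=0):
        """Delete the weak nucleus but relink its elements to the following one."""
        stray = nuclei[weak_index]
        remaining = nuclei[:weak_index] + nuclei[weak_index + 1:]
        remaining[weak_index] = remaining[weak_index] | stray   # dock the stray element(s)
        return [INTERPRET[frozenset(n)] for n in remaining]

    # negative-form root vowels ...a-i... correspond to a single [e] in the positive
    print(syncopate_and_coalesce([{"A"}, {"I"}]))   # ['e']
    # and ...a-u... to [o]
    print(syncopate_and_coalesce([{"A"}, {"U"}]))   # ['o']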

    The representations shown here follow the conventions of autosegmental phonology

    in having individual elements occupy separate structural levels or tiers; in (23) for instance,

    |A| and |I| reside on independent tiers. Although this arrangement is not crucial, it does offer a


    position. This difference between schwa and other vowels is to be expected, however, if

    Element Theory is correct in its claim that vowel properties are mapped onto the acoustic

    signal. The presence of |A|, |I| or |U| is associated with a strong, characteristic spectral pattern;

    and to produce such a pattern speakers must adopt a distinct, non-neutral vocal tract shape.

On the other hand, the absence of any characteristic spectral pattern, such as we find in [ə], is

    naturally paired with a vocal tract configuration lacking any distinct shape. A uniformly

    shaped tube is unable to manipulate formant values in any linguistically meaningful way, and

the phonetic result is schwa, a central vowel of a neutral or indistinct quality.

So the spectral shape for [ə] shows none of the characteristic vocalic patterns (dIp,

mAss or rUmp), suggesting that [ə] has no vowel elements in its representation. The absence

of |A I U| effectively leaves an unspecified or representationally empty vowel. As indicated above, the Element Theory literature also considers schwa to be informationally empty

    (Harris & Lindsey 2000), in the sense that having no element structure means it contains no

    linguistic information. In Element Theory, representational emptiness and informational

    emptiness amount to the same thing.

    But if schwa has no element structure, how can it be heard and pronounced? Harris &

    Lindsey (1995) argue that the spectral pattern in figure 11 may be viewed as a baseline

    resonance that exists latently in all vowels. Usually this pattern is not heard, because in the

presence of |A I U| it is overridden by the more marked patterns dIp, mAss and rUmp. In the

    case of most vowels, these marked patterns are superimposed onto the baseline resonance and

    have the effect of masking it entirely. In the case of schwa, however, which has no elements,

    the baseline resonance is exposed. Language users associate this resonance with the central

region of the acoustic space: more specifically, with the only area of the vowel space not

    occupied by |A I U|:

    (26) |A I U| areas of the vowel space

    |I| |U|

    |A|


    It has already been noted that any vowel system may contain a neutral vowel, which

    can vary phonetically between [] and []12. Now consider the stylised vowel space in (26),

    which demonstrates why this phonetic variation is possible, or perhaps even expected. The

absence of |A I U| corresponds to a central area of the vowel space covering a sizeable range of different vowel qualities, any of which may be targeted by individual languages as the

    interpretation of an unspecified vowel. Importantly, phonetic differences such as [] versus

[] are trivial in most languages,13 because these variants refer to the same linguistic object,

    namely a phonologically empty vowel. Harris & Lindsey (1995) liken the empty vowel to a

blank canvas: a neutral background which becomes hidden when different colours are

    painted on to it. And no matter what shade of white or grey the original canvas may be, it is

    still interpreted as having no colour as long as it remains empty (i.e. unpainted).

    3.5.2 Phonological evidence for empty vowels

    It has been stressed that elements should be treated primarily as units of phonological

    structure, and that their existence should therefore be supported by evidence from the

    phonology. At first sight, however, it seems that a different approach may be needed in the

    case of schwa, the empty expression ||, because it contains no elements in its representation

and thus amounts to nothing in phonological terms. In fact this is not the case. Although [ə]

has no segmental content, it is still linked to the prosodic structure (specifically, to a

syllable nucleus), which is clearly within the scope of phonology. If [ə] is to be viewed as

    the interpretation of an empty nucleus, then it should receive a phonological analysis like any

    other nucleus. Another reason for treating || as a phonological object is that this empty

    expression is often the result of a phonological process that removes element structure (e.g.

    from weak syllables). If elements are removed from a vowel expression until nothing remains,

    then it becomes possible for the baseline resonance of an empty nucleus to be interpreted.

    The following examples from Bulgarian and Turkish illustrate the phonological identity of

    empty nuclei.

    Like English, Bulgarian (Pettersson & Wood 1987) has a full set of vowel contrasts in

    stressed positions but only a reduced set in unstressed positions, as shown in (27). Examples

of these alternating vowels are given in (28) (data from Crosswhite 2004):

12 Other realisations of an unspecified vowel are also possible: e.g. [] in the Jivaro system [ ].

13 In 4.4.4 it will be argued that this is not true of English.


    (27) Vowel system(s) of Bulgarian

    stressed:

    unstressed:

    (28) Vowel reduction in Bulgarian (data from Crosswhite 2004)

    stressed unstressed

    [] village [] villages

    [] of horn [] horned

    [] work [] worker

    Bulgarian illustrates a common pattern whereby unstressed syllables support only a

    subset of the vowel contrasts that are possible in stressed syllables: [ ] are neutralised to []

    in weak syllables, [ ] become [], and [ ] merge as []. Using traditional features it is not

easy to express these vowel reduction effects as a single process: [ ] → [ ] and [ ] → [ ] are

captured by [-high] → [+high], whereas the same feature [high] is irrelevant to [ ] → [ ] as

both are [-high]; instead, the change from [ ] to [ ] must be described as [+low] → [-low].

Yet it is clear that the alternations in (27) are all motivated by the same conditioning factor,

namely the inability of an unstressed nucleus to support certain vowel properties. Restated in

    terms of Element Theory, however, the generalisation becomes formally simple: |A| is not

    licensed in unstressed syllables. As such, the element |A| is suppressed in those contexts but

    language users still interpret any remaining elements.

(29) a. high vowels are unchanged (|A| not present)

        [ ] → [ ]     |I| → |I|

        [ ] → [ ]     |U| → |U|

     b. mid vowels are raised (|A| suppressed)

        [ ] → [ ]     |A I| → |I|

        [ ] → [ ]     |A U| → |U|

     c. central vowels become unspecified (|A| suppressed)

        [ ] → [ ]     | | → | |

        [ ] → [ ]     |A| → | |


    Bulgarian vowel reduction is a process that targets |A|, and because the high vowels in

    (29a) lack |A|, they are unaffected. By contrast, the mid vowel compounds [ ] in (29b) do

    contain |A|; this element is interpreted in stressed positions, but is suppressed in weak

positions; the loss of |A| leaves a sole |I| or |U| remaining, which is interpreted as the high vowel [i] or [u] respectively. Turning to the patterns in (29c), these provide evidence to

    support the analysis of [] as an unspecified vowel. As a structurally empty vowel, [] has no

|A| and is thus unaffected by vowel reduction: [ ] → [ ]. On the other hand, [] has |A| in its

    representation, this element being interpreted in stressed syllables. But in unstressed positions

    [] loses its entire element structure through the |A|-suppression process, leaving behind an

empty nucleus which is interpreted phonetically as baseline resonance: [ ] → [ ]. What (29)

    shows is that these vowel reduction effects can be unified as a single process only if the

    grammar allows for an unspecified vowel to appear in representations. In the absence of any

    positive vowel properties (i.e. elements), this vowel is interpreted as neutral or baseline

resonance, typically [ə].
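Stated procedurally, the whole of (29) reduces to a single operation. The Python sketch below (with the same assumed mapping from element sets to broad vowel symbols as in the earlier sketches; the empty set surfaces as a neutral vowel) simply suppresses |A| in unstressed nuclei and interprets whatever remains.

    # Illustrative sketch only: Bulgarian-style reduction as |A|-suppression.
    INTERPRET = {frozenset({"I"}): "i", frozenset({"U"}): "u", frozenset({"A"}): "a",
                 frozenset({"A", "I"}): "e", frozenset({"A", "U"}): "o",
                 frozenset(): "ə"}   # empty expression: baseline resonance

    def reduce_vowel(elements, stressed):
        """Interpret a nucleus, suppressing |A| when the nucleus is unstressed."""
        surviving = set(elements) if stressed else set(elements) - {"A"}
        return INTERPRET[frozenset(surviving)]

    print(reduce_vowel({"A", "I"}, stressed=True))    # e
    print(reduce_vowel({"A", "I"}, stressed=False))   # i  (mid vowel raised)
    print(reduce_vowel({"A"}, stressed=False))        # ə  (all element content lost)
    print(reduce_vowel({"I"}, stressed=False))        # i  (high vowel unaffected)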

    The interpretation of phonologically empty nuclei is also observed in Turkish. This

    language, like a number of other Altaic systems, has a well-documented process of vowel

    harmony in which suffix vowels agree in backness with root vowels. In traditional analyses

    the active property is assumed to be the feature [back], whereas in Element Theory it is the

    element |I|. Recall that |I| identifies those vowels with a dIp spectral pattern; these have a

    relatively high second formant, which places them in the front area of the vowel space. In

    Turkish vowel harmony, when a root vowel contains |I| then the same element is also

    interpreted in suffixes. For example, the genitive singular suffix in (30a) has a lexically

empty vowel, so the suffix vowel is pronounced [ɯ]. Under harmony conditions, shown in (30b), it

copies |I| from the root and the suffix vowel is interpreted as [i]:

    (30) |I| harmony in Turkish

              Nom. sg.      Gen. sg.       Nom. pl.

     a.       [kɯz]         [kɯzɯn]        [kɯzlar]      girl
              [sap]         [sapɯn]        [saplar]      stalk

     b.       [ip]          [ipin]         [ipler]       rope
              [ev]          [evin]         [evler]       house


The nominative plural suffix also alternates between its lexical form [lar] (with a vowel

containing |A|) and its harmonising form [ler] (with an additional |I|). Example structures are

    shown in (31):

    (31) a. b. c.

    N N N N N N

    x x x x x x

    k | | z | | n p n |A| v l |A| r

    |I| |I| |I| |I|
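The harmony pattern in (30)-(31) can likewise be sketched as a single copying operation (the element sets, symbols and names below are expository assumptions only, not Charette & Göksel's analysis): if the root vowel contains |I|, that element is also interpreted in each suffix nucleus, whose lexical content is either empty (genitive) or |A| (plural).

    # Illustrative sketch only: |I| harmony copying the element into suffix nuclei.
    INTERPRET = {frozenset(): "ɯ", frozenset({"I"}): "i",
                 frozenset({"A"}): "a", frozenset({"A", "I"}): "e"}

    def harmonise_suffix(root_vowel, suffix_vowel):
        """Copy |I| from the root vowel into the suffix nucleus, then interpret it."""
        harmonised = set(suffix_vowel) | ({"I"} if "I" in root_vowel else set())
        return INTERPRET[frozenset(harmonised)]

    GENITIVE, PLURAL = set(), {"A"}              # lexical content of the suffix vowels

    print(harmonise_suffix(set(), GENITIVE))     # ɯ   (empty root vowel, as in (31a))
    print(harmonise_suffix({"I"}, GENITIVE))     # i   (front root: suffix harmonises)
    print(harmonise_suffix({"A", "I"}, PLURAL))  # e   (front root: plural vowel, cf. -ler)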

    The forms in (30) present a somewhat simplified picture of the facts relating to vowel

harmony in Turkish.14 Nevertheless, they are consistent with the analysis of [ɯ]/[i] given

above, and with the claim that some grammars allow representations to contain structurally

empty nuclei. But if | | really has no element content, then why is it not interpreted as

    silence? Having no elements means that | | cannot be mapped on to any linguistically

    significant patterns in the acoustic signal; that is, it cannot carry segmental information.

However, | | is associated with a nuclear position, and this nucleus plays an important role in the formation of prosodic structure. In combination with other nuclei, it contributes to the

construction of higher prosodic domains such as feet and words, units which convey

    linguistic information deemed essential for speech perception and efficient lexical access

    (Cutler & Norris 1988). There is evidence, for example, that listeners pay particular attention

    to the beginnings of foot and word domains when processing running speech. So, one

    consequence of not interpreting an empty nucleus is to reduce the amount of linguistic

    (specifically, prosodic) information being transmitted via the speech signal.

    This is not to say that empty nuclei can never be silent. In fact, uninterpreted empty

nuclei are a grammatical possibility in many languages, including English (e.g. []

unclear, which contains a silent nucleus). Importantly, however, their appearance needs to be

    controlled in order to avoid the emergence of unmanageable sequences of consonants.

    Grammars which allow silent empty nuclei must therefore impose restrictions on their

    distribution (Charette 1991, Scheer 2004). But if a nucleus is silent, how can we be sure it is

    there at all? English provides an answer to this question by showing how the same nucleus is

14 See Charette & Göksel (1996) for a more detailed account.


    silent under certain conditions but phonetically interpreted under other conditions. The

    following example illustrates the point.

    According to one innovative approach to syllable structure, all well-formed lexical

    representations end in a nucleus (Kaye 1990). Some languages such as Italian require this

    final nucleus to be interpreted, with the result that words must end phonetically in a vowel.

    For example, all native Italian words are vowel-final: casa house, case housing, caso

    chance (but *cas); additionally, many loanwords in Italian have become vowel-final

    through adaptation:gallon(English) gallone(Italian). By contrast, other languages allow a

    final empty nucleus to be silent. As a result, they admit words ending phonetically in a

    consonant: peach [] (English), schlimm [] bad (German), rhad [] cheap

    (Welsh). Following Kaye (1990), the structure of the English word peachis shown in (32a),

    where the word-final empty nucleus is licensed to remain silent.

(32) a. peach b. plural c. peaches

    O N O N O N O N O N O N

    x x x x x x x x

    p |I| | | z | | p |I| | | z | |

    As an independent lexical structure, the plural suffix in (32b) also has a final empty

    nucleus which is not phonetically interpreted; in segmental terms, the plural marker consists

solely of its onset fricative [z].15 And when a language user constructs the plural noun

peaches by concatenating the two forms (32a) and (32b), the result is the structure in (32c).

Since resyllabification is not permitted in Kaye's model, the plural noun peaches ends up

with two empty nuclei: one from the stem peach, the other from the suffix. It also contains

the two sibilant consonants [tʃ] and [z], which are phonetically adjacent and thus create an

    unmanageable sequence of the kind mentioned above. Specifically, when these sounds are

    adjacent, their similar acoustic properties make them perceptually almost indistinguishable.

Yet the perceptibility of [tʃ] and [z], and therefore the linguistic information

associated with these segments, can be recovered by exploiting the lexical structure itself.

By phonetically interpreting the intervening empty nucleus | | as a neutral vowel [ə], as was

15 The voicing properties of English obstruents are discussed in Backley (in prep).


observed for Turkish in (31a), important acoustic cues carried by the C-to-V [tʃə] transition

and the V-to-C [əz] transition can be easily perceived; as a result, the linguistic information

carried by [tʃ] and [z] is transmitted in full. So, without recourse to arbitrary measures such

as the insertion of an epenthetic vowel, we get the form peaches [piːtʃəz]. This analysis of the [əz] plural departs from the usual textbook explanation in two respects. First, [ə] is seen here

    as a product of the existing representation rather than as a newly introduced addition to the

    structure. This is presumably a gain for restrictiveness, in that the distribution of empty nuclei

    is strictly controlled by the grammar whereas epenthesis can in principle be applied anywhere.

    Second, interpreting || as a neutral vowel has a clear linguistic motivation, since it enhances

    the perceptibility and recoverability of linguistic information. By contrast, the traditional

vowel epenthesis account is typically concerned with notions such as ease of articulation,

which, following the discussion in 2.3 above, is best seen as non-linguistic in nature.
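The recoverability condition itself can be summed up in a few lines. The sketch below is only a string-level approximation of the observable outcome (the grammar states the condition over representations, and the sibilant test and transcriptions are simplifying assumptions made for the illustration): an empty nucleus flanked by two sibilants is interpreted as a neutral vowel, otherwise it may remain silent.

    # Illustrative sketch only: realising the empty nucleus before the plural [z].
    SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}

    def realise_empty_nucleus(before, after):
        """Return a neutral vowel between two sibilants, otherwise silence."""
        return "ə" if (before in SIBILANTS and after in SIBILANTS) else ""

    print("piːtʃ" + realise_empty_nucleus("tʃ", "z") + "z")   # piːtʃəz  (peaches)
    print("dɒg" + realise_empty_nucleus("g", "z") + "z")      # dɒgz     (dogs)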

    The behaviour of the English plural suffix provides further evidence for the existence

    of empty nuclei in representations. It also shows how linguistic conditions can cause an

    empty nucleus to be phonetically interpreted in a language-specific way. One aspect of the

analysis of [əz] should be clarified, however. I have claimed that | | is interpreted as [ə]