acoustic analysis of some american english vowels - aleksandar belic - master thesis

UNIVERSITY OF BELGRADE

FACULTY OF PHILOLOGY

Aleksandar Belić

ACOUSTIC ANALYSIS OF SOME AMERICAN ENGLISH VOWELS

Master's thesis

Supervisor: Prof. Dr. Biljana Čubrović

Belgrade, 2012

ACOUSTIC ANALYSIS OF SOME AMERICAN ENGLISH VOWELS

ABSTRACT

The aim of this paper was to measure and compare the vowel formant frequencies of three

native speakers of American English. The speakers were recorded while reading from a

list of randomly chosen words, and the acoustic analysis was then carried out with the help of a

computer program, measuring the frequencies of the first three formants. The formant frequencies were

measured either at the point where the formants reach a steady state, or at the midpoint of the articulation if no steady

state was visible. Vowel charts were made to illustrate the vowel positions

graphically. The data showed that certain vowels exhibit slightly different

properties from speaker to speaker. Some of the observed differences were explained as variants within

the speaker's regional dialect, and some as individual idiolects. The analysis also

showed that some speakers exhibit a certain degree of diphthongization when

the vowel in question precedes certain consonants in word-final position. Nevertheless, most

vowels show no significant differences in quality between the speakers.

Key words:

acoustic phonetics, vowel formants, American English vowels, formant frequency,

acoustic analysis, regional dialects

ACOUSTIC ANALYSIS OF SOME AMERICAN ENGLISH VOWELS

ABSTRACT

The objective of this paper is to measure and compare the frequencies of vowel formants

of three native speakers of American English. The speakers were recorded while reading from a

list of randomly chosen words, and afterwards the acoustic analysis was conducted with the help

of a computer program, measuring the frequencies of the first three formants. The formant

frequencies were measured either at a point where formants reach their steady state, or in the

middle of the articulation if the steady state was not visible. Vowel charts were made to illustrate

the vowel positions graphically. The data showed that certain vowels exhibit slightly different

qualities from speaker to speaker. Some of the differences observed were explained as being

varieties within the speakers’ regional dialects, and some as their individual idiolects. The

analysis also showed that a certain amount of diphthongization is present for some speakers

when the vowel in question precedes certain consonants in word-final position. However, the

majority of the vowels showed no significant difference in quality between the speakers.

Key words:

acoustic phonetics, vowel formants, American English vowels, formant frequency, acoustic

analysis, regional dialects

CONTENTS

1. INTRODUCTION

1.1. METHOD

2. WASHINGTON STATE

2.1. FRONT VOWELS (the vowels of bleed, tip, bed and trap)

2.2. BACK VOWELS (the vowels of goose, took, top and war)

2.3. CENTRAL VOWELS (the vowels of run, first and cannon)

3. GEORGIA

3.1. FRONT VOWELS (the vowels of bleed, tip, bed and trap)

4. ALABAMA

4.1. FRONT VOWELS (the vowels of bleed, tip, bed and trap)

5. CONCLUSION

REFERENCES

1. INTRODUCTION

Vowel sounds are arguably the backbone of every language in the world.

In fact, there is a strong consensus among linguists that languages without vowels are not only

non-existent, but also impossible. This is, of course, only logical, because vowels are considered

the least marked sounds, and therefore there is no reason why at least some vowels would not be

incorporated into the sound system of a particular language. The number of vowels, however,

may vary. The most common vowel system has five vowels, but there are languages with three,

or even fewer vowels, although they are very rare (O’Grady/Dobrovolski/Katamba 1997:375).

The number of vowels in English (not including diphthongs), and more importantly, their

quality, can vary, depending on the country and the accent spoken in that particular region.

American English, for example, has a wide variety of vowels, some of which can be

regarded as mere variants of the same vowel, although certain authors regard them as

individual phonemes. It follows that every dialect has its own vowel system. However,

even when trying to describe all the vowels in all of the American dialects, not all authors

operate with the same number of vowels. Kenyon lists sixteen vowels (1964:28-29), Wells lists

eleven (1999:472)[1], and Thomas lists seventeen (1958:128). On the other hand, Ladefoged and

Johnson (2010:90) mention only nine, while Olive, Greenwood and Coleman (1993:20) operate

with twelve. These examples show that there is no definite way of determining the exact number

of vowels in American English, since different authors have different views on whether

certain sounds ought to be classified as being only one vowel or more. It would be difficult to

determine the exact status of each of the sounds without going deeper into the study of dialectal

origins of the differences that caused them.

This paper will operate with a system of 11 monophthongs, duly recognizing differences

that some vowel variations exhibit in certain contexts. For example, Wells (1999) regards the

vowel in the word sport as being of a different quality than the vowel in the word short. While

this is undeniably true, it makes analysis more complicated, which this paper will try to avoid.

Having this in mind, the symbols used in this paper should be understood as symbols for a

particular “group” of vowels, where each symbol represents a group of possible vowel variations

found in different accents. Thus, a single symbol will represent both of the vowel variants found in

sport and short.

[1] But in his Longman Pronunciation Dictionary (2000), he lists 12 vowels (diphthongs excluded).

The traditional view on the analysis of vowel sounds recognizes two distinctive methods:

articulatory and acoustic. The former is, perhaps, more “anatomical”, for it deals with the actual

position and/or movement of the articulatory organs within the vocal tract. The latter, on the

other hand, is founded in physics, and is primarily concerned with the acoustic properties of

sounds, which may or may not coincide with the articulatory descriptions.

The problem with the traditional distinctive feature framework, as presented by Olive, Greenwood and

Coleman (1993:28-32), is its inability to provide sufficiently precise descriptions

when more subtle differences between vowels are in question. For example, in the traditional

binary classification, vowels are described in terms of the features high, low, back, round and tense,

where the presence of a particular feature marks the sound as +feature, and its absence

marks it as –feature. However, certain vowels are neither high nor low, but

somewhere in between, making them difficult to describe using only this system.

Acoustic analysis, on the other hand, provides a more precise method of description,

where more subtle changes caused by the movement of the articulators are visible, more easily

tracked, measured, and therefore described. With the use of a spectrogram, minute differences in

the quality of the sound can be analyzed, and also graphically presented, which is difficult to do

using the traditional binary classification.

The principal component that needs to be taken into account when analyzing vowels is

the frequency of their formants. Formants can be defined as “resonances of the vocal tract that have

a specific frequency expressed in hertz (Hz). In most cases, the first two formants are sufficient

to characterize speech sounds, but occasionally the third formant is also useful for description”

(Olive/Greenwood/Coleman 1993:80).

Before the analysis itself, a certain geographical identification of speech varieties needs

to be made. Since the speakers whose speech will be analyzed in this paper come from different

states (Alabama, Georgia, and Washington), some kind of geographical labeling needs to be

established. In order to place the speakers into established groups, in this case speech areas, one

needs to determine them exactly. The literature on this matter offers a wide variety of solutions,

and maps of the USA that portray America’s three major speech areas existed even before

WWII. From this simple three-way division (Eastern, Southern, General American) to the more

complex eight-way divisions of the 1950s (Thomas 1958:232) and the 1970s (Wood 1972, as cited in

Wells 1999:528), the general understanding that American pronunciation is in no way uniform in

all parts of the United States has been evident from the start.

The issue becomes even more complicated in modern times, when considerable accent

and population shifts have taken place. This has led to further fragmentation of speech areas,

which has made precise dialect division more difficult to determine. Although certain general

characteristics of local speech that differentiate it from other areas still very much exist, they are not

as evident and clear-cut today as they were in the past.

The majority of dialectologists, however, would agree in assigning the speech of Alabama and Georgia

to the Southern area, and that of Washington to the Western or, more precisely, the Pacific Northwest area.

Fig. 1: Major dialect areas of the United States (Thomas 1958)

Having all this in mind, the intention of this paper will be to analyze and compare the

vowel articulation of three speakers of American English, measure the frequencies of the first

three formants in all vowels, and draw general conclusions on whether these dialects differ in the

way their vowels are pronounced, and to what extent.

Therefore, this paper will use acoustic analysis with the help of a computer program to

describe vowel articulation of three native speakers of American English. The previously

determined “target” values found in textbooks and other sources will serve as a reference point,

but only to some degree, for it would be misleading to take these values as “absolute”. It must

be noted that “formant values vary across speakers and depend on many variables. Even for a

given speaker, formants may change according to phonetic contexts, manner of speaking and rate

of speech. In fact, it should be stressed that there are no absolutely rigid descriptions of

phonemes” (Olive/Greenwood/Coleman 1993:81).

1.1. METHOD

Three female speakers each recorded a total of 77 words containing the 11 vowels of

American English in various positions and phonetic contexts. The words were chosen randomly,

making sure that at least two phonetic contexts were present. The recording was made in one

afternoon using a Shure PG47 microphone and a laptop computer. The analysis was done with

the help of the computer program “Praat”. The maximum formant frequency setting was changed

when and where it was necessary; in most cases the default setting of 4500 Hz was used. The

formants were measured in the usual way, at the place where all three formants exhibit a “steady

state”, or, in cases when this was not possible, in the middle of the vowel articulation. The

speakers are referred to by the states they come from; the terms “the

Alabama speaker” or “the Washington speaker” will therefore be used throughout this paper.
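
The measurement rule described above (take the values where all three formants are steady, otherwise take them at the middle of the vowel) can be sketched in a few lines of Python. This is only an illustration of the rule, not the procedure actually carried out in Praat: the function name, the five-frame window, the 50 Hz tolerance and both sample formant tracks are assumptions introduced here.

```python
from statistics import mean


def measure_formants(frames, window=5, tolerance_hz=50.0):
    """Pick one (F1, F2, F3) reading from a vowel's formant track.

    frames: list of (F1, F2, F3) tuples in Hz, sampled at equal time steps.
    window and tolerance_hz are hypothetical settings, not values from this paper.
    """
    if not frames:
        raise ValueError("empty formant track")
    best = None  # (largest spread inside the window, frames of that window)
    for start in range(len(frames) - window + 1):
        chunk = frames[start:start + window]
        # Spread (max - min) of each formant inside the candidate window
        spreads = [max(col) - min(col) for col in zip(*chunk)]
        worst = max(spreads)
        if worst <= tolerance_hz and (best is None or worst < best[0]):
            best = (worst, chunk)
    if best is not None:
        # Steady state found: report the mean of each formant over that window
        values = tuple(round(mean(col)) for col in zip(*best[1]))
        return values, "steady state"
    # No steady state: fall back to the frame in the middle of the vowel
    return frames[len(frames) // 2], "midpoint"


# Invented track with a stable stretch in its second to sixth frames
steady = [(400, 1500, 2600), (300, 2800, 3400), (302, 2810, 3390),
          (298, 2790, 3410), (300, 2800, 3400), (301, 2805, 3395),
          (350, 2500, 3300)]
# Invented track in which F1 and F2 glide throughout the vowel
gliding = [(300 + 20 * i, 1200 + 100 * i, 2500) for i in range(7)]

print(measure_formants(steady))
print(measure_formants(gliding))
```

With these invented tracks, the first call reports a steady-state reading and the second falls back to the midpoint frame, mirroring the two cases named in the method.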

It is important to emphasize that not all phonetic contexts have been taken into account,

due to restrictions on the length and scope of this paper. Nevertheless, even without analyzing all the

possible phonetic contexts, combinations and changes a vowel might manifest when influenced

by a neighboring sound, it is still possible to draw a general conclusion on how different (or

similar) certain vowels are in terms of their formants’ frequencies.

2. WASHINGTON STATE

2.1. FRONT VOWELS (the vowels of bleed, tip, bed and trap)

The speaker from Washington is a female in her mid 30s, born in Arlington, WA. She

works in education, and has a college degree. Since she has never lived outside Washington, the

possible influence of other regional dialects on her own is minimal.

The first thing that is immediately noticeable in the spectrographic representation of her

articulation is the relative steadiness of the formants in the vowel of bleed, especially of the

first formant. The values for the first formant are relatively close to the values suggested by

Olive, Greenwood & Coleman (1993:104) with the average of about 300 Hz. The frequency of

the first formant seems to have little or no variation throughout the articulation, regardless of the

phonetic environment. Even in the instances when F2 and F3 move as a result of co-articulation,

F1 retains its approximate value.

In the words bleed and fleet, F2 rises significantly from the target value for /l/ and almost

merges with the third formant. This is especially visible in the former, where the values of F2

and F3 differ by only 45 Hz. In other examples, neighboring sounds influence the frequency of

the F2 in the expected manner. After fricative sounds, F2 has a slight rising movement, and the

same is visible in instances where the preceding sound is a bilabial stop /b/. The voicing,

however, seems to have no influence on the frequencies of F1-F3, since their relative values

are the same for the vowel in both deep and peak. In all of the examples, F2 remains

significantly high in comparison to the data provided by Olive, Greenwood and Coleman

(1993:104). However, data from Hillenbrand et al. (1995:3103) suggest that F2 values are

somewhat higher than stated by Olive, Greenwood and Coleman (1993:104), a discrepancy that may

stem from speaker sex, which Olive, Greenwood and Coleman do not state. Stevens (1998:288) gives

data from both male and female speakers, and the relative values of F2 for female speakers

resemble greatly the data in this paper.

Although F3 is usually not essential for vowel identification, it was measured

nevertheless. Spectrograms show that F3 is the least prominent of all three, often hardly even

noticeable, and rarely with the steadiness in frequency found in F1. With the average value of

around 3,440 Hz, its approximate value seems to be consistent with the data by Stevens

(1998:288) and Hillenbrand et al. (1995:3103).

Fig. 2: The vowel in bleed

Fig. 3: The vowel in deep

For front vowels, F1 becomes lower when the constriction in the oral cavity increases.

The vowel of bleed is the most constricted vowel. F1 increases as the tongue position gets lower. In addition,

the vowel of bleed has the highest F2 and […] has the lowest F2 (Chen/Wang 2012). Consequently, the vowel of tip is

expected to have a higher F1 value than the vowel of bleed, and the data confirm this. The average value of F1

for the vowel of tip is around 450 Hz, the highest being measured in the word kit (510 Hz), and the

lowest in the word rip (380 Hz). There are no significant formant variations in any of the

examples. The articulation is short, between 60 and 70 milliseconds. The strongest signal appears

to be in the word rip, where the frequency of the first formant resembles the F1 of /r/, and the

signal becomes darker as the articulation of the vowel begins.

What is also noticeable from the spectrogram is the rise of the F2 and F3 in the word rip.

F2 starts at around 1,200 Hz at the beginning of /r/, and immediately starts to rise until reaching

its steady state at around 1,870 Hz. In instances when a velar sound precedes the vowel, F2 and

F3 are close at the beginning of the articulation, and then start to move away from each other, as

can be seen in the word kit. This is the result of a velar pinch, which is characterized by the

coming together of F2 and F3 during the articulation of a velar consonant

(Olive/Greenwood/Coleman 1993:85). In addition, F2 and F3 exhibit a slight rising movement in

instances when the vowel is preceded by a nasal sound. In other phonetic contexts, namely when

preceded by a fricative /s/ or a voiceless stop /p/, F2 and F3 seem to have a steady frequency

throughout the articulation, with little or no variation. The average value of 2,190 Hz for F2 is

consistent with the data from Stevens (1998:288), and the average value for F3 (3,030 Hz) is

almost identical with the findings of Hillenbrand et al. (1995:3103).

Fig. 3: The vowel in rip

Fig. 4: The vowel in kit

The vowel of bed is further back and also lower than the vowels of bleed or tip, as suggested by Ladefoged

and Johnson (2010:90). As a result, F1 will be higher, and F2 lower, in comparison to those two

vowels. All of the examples show the steadiness of F1 during most of the articulation. A slight

rising movement is visible in instances when a voiced bilabial /b/ precedes the vowel, and F1

falls if the sound following it is also a voiced stop. In the words red, let and led, F1 retains its

frequency throughout the articulation, while F2 and F3 move upwards to reach their target

values. The average value for F1 is around 620 Hz.

All three formants are usually visible in the spectrogram, F3 being the least prominent.

F2 appears to be the least stable one, often having a rising or falling movement because of the

phonetic context. Its average frequency is 2,080 Hz. The average duration of the vowel seems to

be somewhat longer than for the vowel of tip, often being more than 100 ms.

Fig. 5: The vowel in red

Fig. 6: The vowel in bet

The maximum separation (for the front vowels) between F1 and F2 occurs with the

highest vowel, and is the smallest with the lowest (Olive/Greenwood/Coleman 1993:102). This is

clearly noticeable from the data in this research. While the separation between F1 and F2, i.e. the

difference in their frequencies, was around 2,500 Hz for the vowel of bleed, for the vowel of trap it was only around 1,200

Hz. Not all possible phonetic contexts were taken into account for the vowel of trap. The focus was the

influence of nasal sounds on the vowel in instances when they follow it.

The preceding sound in these examples is usually a stop, voiced or voiceless; in one instance, the

voiceless sound is in an unaccented position to show the influence of aspiration (or the lack of it)

on the visibility and movement of the formants.

In the word trap, F1 starts to rise immediately after becoming visible in the spectrogram

at the onset of /r/, and quickly reaches its steady state at around 950 Hz. The F1 value measured

in the middle of the articulation was 970 Hz. In other examples, F1 seems to be rather stable

throughout the articulation, with the exception of the word stamp, where F1 seems to be rising at

the beginning of the articulation, possibly as a result of the transition from an unaspirated /t/ to

the vowel. F3 is hardly even noticeable in trap, and its projected value of 2,850 Hz is, to some extent,

disputable. What is also typical of F2 is its fall before nasal sounds, in words such as candle,

stamp, or sand.

Fig. 7: the word trap

Fig. 8: The vowel in stamp

Vowel of   bleed    tip    bed   trap
F1           300    454    618    848
F2          2840   2187   2083   2027
F3          3447   3033   2689   3028

Table 1: Average formant frequencies (in Hz) for front vowels (Washington speaker)
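
The F1-F2 separation discussed above can be recomputed directly from the averages in Table 1. The short Python sketch below does this, assuming that the columns of Table 1 follow the order of the section heading (bleed, tip, bed, trap); the dictionary and function names are introduced here for illustration only.

```python
# Average formant frequencies from Table 1 (Washington speaker), in Hz,
# keyed by the example word that identifies each front vowel.
# The column-to-word mapping is an assumption based on the section heading.
FRONT_VOWELS = {
    "bleed": (300, 2840, 3447),
    "tip": (454, 2187, 3033),
    "bed": (618, 2083, 2689),
    "trap": (848, 2027, 3028),
}


def f1_f2_separation(table):
    """Return F2 - F1 in Hz for every vowel in the table."""
    return {word: f2 - f1 for word, (f1, f2, _f3) in table.items()}


separation = f1_f2_separation(FRONT_VOWELS)
for word, gap in separation.items():
    print(f"{word:<6} F2 - F1 = {gap} Hz")
```

The separation shrinks steadily from the highest vowel (bleed) to the lowest (trap), which matches the figures of around 2,500 Hz and around 1,200 Hz quoted in the text.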

2.2. BACK VOWELS (the vowels of goose, took, top and war)

The back vowels differ from the front vowels in that F2 is much lower and closer to F1

for the back vowels than for the front (Olive/Greenwood/Coleman 1993:103). This is evident

from the spectrograms for this speaker as well. In words like fool and pool, F1 and F2 are

especially close to each other, with the difference in frequency of some 400 Hz. In goose, this is

not the case, since F2 has a falling movement from a high position after the velar pinch.

This speaker pronounces the word new with a palatal glide before the vowel, [nju], keeping the glide

and the vowel clearly distinct, and not as [nu], which is also a common pronunciation of this word in American

English. This kind of pronunciation influences the shape of F2, since /j/ normally has a higher

F2 than what is usual for the vowel of goose (Olive/Greenwood/Coleman 1993:118). This results in a

downward movement of F2 towards its target value, which, in this case, is around 970 Hz.

The sounds /l/ and /ʃ/ create a similar result in clue and shoe, where F2 first rises for

/l/ and starts off high for /ʃ/, but then gradually falls. F3 usually retains its initial value

throughout the articulation, although some rising movement is noticeable in rude and shoe.

Fig. 9: the word pool

Fig. 9: the word shoe

For the vowel of took, no significant changes to the formants can be seen in most examples. After

/w/ in would and woman, F2 rises rapidly, although this rise is more apparent in would. All

articulations are short, usually around 50 ms long. The average frequency of F1 is around 400

Hz. F1 in most cases retains its stable position and does not exhibit any significant movements,

regardless of the environment. F2 is close to F1, although not as close as with the vowel of goose. In addition,

no noticeable diphthongization occurs in any of the articulations for this sound.

Fig. 10: The vowel in would

In words such as rot, lot, top or dot, where RP predominantly has the rounded vowel [ɒ],

the unrounded [ɑ] is more common in American English (Cruttenden 2008:84). This vowel has a slightly

higher F1 than […] and […]; for this speaker, 870 Hz was the average value that was measured.

F1 and F2 are close to each other, and mostly hold their frequencies steady in the phonetic

contexts examined in this paper. In rot, F3 exhibits a sharp rise in frequency at the beginning of

the articulation, after being very low and close to F1 and F2 through most of the

articulation of /r/. In most words, F3 has rather weak energy and is often barely visible in the

spectrogram. In addition, no diphthongization was found in the articulation of this vowel for this

speaker.

Fig. 11: The vowel in lot

Fig. 12: The vowel in top

The vowel of war does not usually appear in American English in contexts without

/r/ following it. The whole issue involving this vowel and other sounds that may be

pronounced in its place is rather complex, and it varies from speaker to speaker. For

some speakers, there is a difference in vowel quality between the words force and north (Wells

1999:483). For the purposes of this paper, we will consider both words as having the same vowel.

Since /r/ is involved in all contexts for this vowel, a great deal of rhotic coloring

(Olive/Greenwood/Coleman 1993:220) is present in all examples. In fact, all the words show a

similar pattern, and what is said for one word can easily apply to other words as well. F1 has an

average value of 524 Hz. It is stable during the articulation of the vowel, but it can have a slight

rise near the transition towards /r/. F2 usually starts off low and close to the first formant, but

then gradually rises, while F3 falls. The duration of the vowel is not long, although it is not short

either. In four and score, it is around 150 ms long. F3 is high, often not entirely distinguishable.

Fig. 13: The vowel in four

Fig. 14: The vowel in score

Vowel of   goose   took    top    war
F1           341    402    871    524
F2          1040   1194   1251   1037
F3          2646   2731   2727   3021

Table 2: Average formant frequencies (in Hz) for back vowels (Washington speaker)
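
The claim that F2 is lower and closer to F1 for the back vowels than for the front vowels can be checked against the averages in Tables 1 and 2. The following Python sketch is only a rough check on one speaker's averages; it assumes that the table columns follow the order of the section headings, and the names used are illustrative.

```python
from statistics import mean

# (F1, F2) averages in Hz for the Washington speaker, from Tables 1 and 2.
# The column-to-word mapping is an assumption based on the section headings.
FRONT = {"bleed": (300, 2840), "tip": (454, 2187), "bed": (618, 2083), "trap": (848, 2027)}
BACK = {"goose": (341, 1040), "took": (402, 1194), "top": (871, 1251), "war": (524, 1037)}


def mean_separation(table):
    """Mean F2 - F1 distance in Hz across all vowels in the table."""
    return mean(f2 - f1 for f1, f2 in table.values())


highest_back_f2 = max(f2 for _f1, f2 in BACK.values())
lowest_front_f2 = min(f2 for _f1, f2 in FRONT.values())

print("mean F2 - F1, front vowels:", mean_separation(FRONT), "Hz")
print("mean F2 - F1, back vowels: ", mean_separation(BACK), "Hz")
print("highest back F2:", highest_back_f2, "Hz; lowest front F2:", lowest_front_f2, "Hz")
```

For this speaker, even the highest back-vowel F2 lies well below the lowest front-vowel F2, and the mean F1-F2 distance of the back vowels is roughly a third of that of the front vowels.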

2.3. CENTRAL VOWELS (the vowels of run, first and cannon)

According to Olive, Greenwood, and Coleman, “the most central vowel is […], the

vowel in bud. This vowel is recognized by having formant values that most resemble the values

of a neutral vocal tract; the first three formants are at approximately equal intervals”

(Olive/Greenwood/Coleman 1993:103-104). These statements, as can be seen from the table, are

not entirely consistent with the data from this measurement. Although it is true that the vowel is

central, it is not “the most central”, since both the vowel of first and the vowel of cannon appear to be closer to

that relative position (around 1,500 Hz for the second formant). Even if we disregard the

difference of around 100 Hz, which is admittedly not big, we still cannot claim that, in this case,

the vowel of run is “the most central vowel”, since two more vowels occupy the same approximate position.

All this, naturally, applies to this speaker only, and may not be true for the other speakers.
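
The “approximately equal intervals” in the quotation follow from a standard textbook idealization that is not spelled out in this paper: a neutral vocal tract treated as a uniform tube, closed at the glottis and open at the lips, resonates at odd multiples of c/4L. The Python sketch below assumes a speed of sound of 35,000 cm/s and a tract length of 17.5 cm (typical textbook figures, not values measured in this study) and compares the resulting spacing with the Table 3 averages for the vowel of run.

```python
SPEED_OF_SOUND_CM_S = 35_000  # assumed textbook value
TRACT_LENGTH_CM = 17.5        # assumed textbook value, not measured in this study


def neutral_tube_formants(length_cm, count=3, speed=SPEED_OF_SOUND_CM_S):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * speed / (4 * length_cm) for n in range(1, count + 1)]


def intervals(formants):
    """Differences between neighbouring formants, in Hz."""
    return [b - a for a, b in zip(formants, formants[1:])]


ideal = neutral_tube_formants(TRACT_LENGTH_CM)
measured = [819, 1395, 2628]  # Table 3 averages for the vowel of run (Washington speaker)

print("ideal tube:", ideal, "intervals:", intervals(ideal))
print("measured:  ", measured, "intervals:", intervals(measured))
```

Under these assumptions the idealized formants are evenly spaced, whereas the two measured intervals differ from each other by several hundred hertz, which is in line with the reservation expressed above.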

The average value of F1 for the vowel of run is around 820 Hz, which is significantly higher than the

data from Olive, Greenwood and Coleman (1993:104) and Yao et al. (2010:87), and somewhat

higher than the data from Hillenbrand et al. (1995:3103) and Peterson and Barney (1952:183).

This value of F1 suggests a somewhat lower position of the vowel, which is in fact almost as low as

[…]. Since all sample words except hut include a final alveolar nasal, all spectrograms have

similar-looking patterns. All of the usual formant movements triggered by the preceding sounds

are present: F2 and F3 moving away from each other after an initial velar consonant, the rise of

F2 and F3 after a liquid, and the obvious nasalization of the vowel characterized by the

presence of the nasal formant. There are no indications that any form of diphthongization has

taken place.

Fig. 15: The vowel in hut

Fig. 16: The vowel in gun

In the vowel of first, there is a large amount of r-coloring, as can be expected from a rhotic accent

of English. For this speaker, the articulation of the vowel is not long, and the pronunciation is

systematic, with no apparent diphthongization. The average value of F1 is around 520 Hz, which

is within the usual range. This vowel is mid-central, and its approximate position is very close to

that of the vowel of cannon. F3 is close to F2, sometimes even merging with it, as in church.

In fact, this vowel looks and sounds like a reversed variant of […]; there are no distinct areas

within the spectrogram that might be characterized as a pure […] sound. This is probably

why many classifications of American English vowels do not list it as a distinct vowel.

However, “there is no acoustically distinct consonant area in the region of […], and, therefore,

in a strictly concatenative-segmental analysis, we must consider this sound as part of the

American English vowel system” (Olive/Greenwood/Coleman 1993:104).

Fig. 17: The vowel in church

Fig. 18: The vowel in first

The sample words containing the vowel of cannon all include contexts in which it is found in an

unstressed position. In such a position, the articulation is very short, usually around 40 ms,

with up to 70 ms in Canada, cannon and comma.

Even in such short articulations, formants are systematic; in around, the formants even start to

fall in anticipation of /r/, however short and barely visible this movement may be. The average

value of F1 is around 630 Hz, and for F2 it is 1,470 Hz. Since the lowest and the highest second

formants measured for this speaker were 1,040 Hz for […] and 2,840 Hz for the vowel of bleed, the vowel of cannon is placed

in the mid-central area.

Fig. 19: The vowel in Canada

Fig. 20: The vowel in appear

Vowel of     run  first  cannon
F1           819    523     632
F2          1395   1452    1472
F3          2628   1832    2560

Table 3: Average formant frequencies (in Hz) for central vowels (Washington speaker)
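
The earlier remark that the vowel of run is not “the most central” of the three can be made concrete by measuring how far each average F2 in Table 3 lies from the reference position of around 1,500 Hz mentioned in the text. The sketch below is a simple illustration under that assumption; the column-to-word mapping follows the section heading and the names are introduced here.

```python
REFERENCE_F2_HZ = 1500  # the "around 1,500 Hz" central position mentioned in the text

# Table 3 averages (F1, F2, F3) in Hz for the Washington speaker.
CENTRAL_VOWELS = {
    "run": (819, 1395, 2628),
    "first": (523, 1452, 1832),
    "cannon": (632, 1472, 2560),
}


def distance_from_centre(table, reference=REFERENCE_F2_HZ):
    """Absolute distance of each vowel's F2 from the reference value, in Hz."""
    return {word: abs(f2 - reference) for word, (_f1, f2, _f3) in table.items()}


distances = distance_from_centre(CENTRAL_VOWELS)
ranking = sorted(distances, key=distances.get)  # closest to the centre first

for word in ranking:
    print(f"{word:<7} {distances[word]:>4} Hz from {REFERENCE_F2_HZ} Hz")
```

On this measure the vowel of run is the furthest of the three from the reference value, by roughly the 100 Hz noted in the text.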

3. GEORGIA

3.1. FRONT VOWELS (the vowels of bleed, tip, bed and trap)

The speaker from Georgia is a female in her mid forties, from Atlanta. What is

immediately noticeable is that, in conversation, she does not have the accent typical of someone

coming from the South, and says that she lost it through education and by moving around

the USA and abroad. Her regional accent, therefore, might be influenced by other regional accents to

an extent that cannot be easily determined.

For the vowel of bleed, F1 does not show any notable movement, either up or down. With an average

frequency of around 320 Hz, it is slightly higher than the data from Olive, Greenwood and

Coleman (1993:104), but also slightly lower than the measurements conducted by Hillenbrand et

al.(1995:3103), and Yao et al.(2010:87), respectively.

In bleed and fleet, as with the previous speaker, F2 and F3 exhibit a

rising movement after the articulation of /l/. On the other hand, both formants fall before /m/

in seem, and /k/ in peak. The relative duration of the articulation is long, usually longer than

200 ms. There is no indication of diphthongization of this vowel in any of the examples.

Fig. 21: The vowel in fleet

Fig. 22: The vowel in peak

In instances where a nasal sound both precedes and follows the vowel of tip, as in the word nymph,

F1 first rises to reach its target frequency, and immediately after that falls in anticipation of the

nasal sound /m/. In this example, the values for all three formants resemble the […] sound more

than a typical tip vowel. This similarity is even audible. In the word tip, there

is a noticeable diphthongization of the vowel, where towards the end of the articulation a more

centralized sound resembling [ə] is heard, resulting in the pronunciation [tɪəp]. This allophone

is mentioned by Wells (1999:485) as being present mostly in the southern parts of the USA,

although he found it only in environments where the following final sound is a voiced consonant.

F2 and F3, if not found in front of a nasal sound, usually retain their value throughout the

articulation, with small variations, depending on the phonetic context. The average value of the

F2 formant is around 1,860 Hz and the frequency of F3 is around 2,940 Hz.

Fig. 23: The vowel in nymph

Fig. 24: The vowel in tip

The vowel of bed has a higher F1 and only a slightly lower F2 than the vowel of tip.

The sample words chosen for this research include a limited number of phonetic contexts for

this vowel, with only /t/ and /d/ in syllable-final position, while the preceding sounds include

/p/, /t/, /b/, /f/, /l/, and /r/.

The previously mentioned phenomenon of inserting [ə] after certain vowels does

not seem to be present in the articulation of the vowel of bed, except perhaps in the word bed itself. It is possible

that this allophone was more frequently present in the pronunciation of this speaker in the past,

but because of the influence of other accents is now present only in certain words. The average

F1 value of 645 Hz is consistent with the findings of Yao et al. (2010:87).

F1 seems to be rather stable in all of the examples, without any noticeable variations in

frequency regardless of the phonetic environment. In instances where /l/ precedes the vowel, F1 rises

in order to attain its target value, and this rise is almost instantaneous. F2 has an expected rise in

instances when /r/ precedes the sound, and F3 is clearly visible in all examples.

Fig. 25: The vowel in bed

Fig. 26: The vowel in led

The average value of F1 for the vowel of trap is around 703 Hz, which is the highest value of F1 among

the front vowels. This makes it the lowest of the four on the traditional articulatory vowel

chart. Its shape seems to be uniform and its frequency steady throughout the articulation. In the

environment where a final /d/ follows the vowel, a slight falling movement of F1 is visible. F2

is stable in the words trap and bad, while in the words with a nasal sound following the vowel F2

usually has a falling movement. In these examples, the influence of nasalization is clearly visible

in the presence of a nasal formant, which is characterized by a prominent low-frequency F1

(Olive/Greenwood/Coleman 1993:97).

Centralization of the vowel towards the end of its articulation is also noticeable and audible

upon closer inspection. Forms such as [bæənd] or [stæəmp] seem to occur regularly.

Another allophonic variation noticed by Wells, involving an “assimilatory off-glide to the […]

area” (Wells 1999:486), is also present with this speaker in the pronunciation of the word tank.

This feature is most certainly attributable to this speaker’s regional phonetic

heritage.

Fig. 27: /æ/ in band

Fig. 28: /æ/ in tank

Vowel    /i/    /ɪ/    /ɛ/    /æ/
F1       317    525    645    703
F2      2470   1865   1841   2036
F3      2997   2939   2917   2727

Table 4: Average formant frequencies (in Hz) for front vowels (Georgia speaker)


as in war)

F1 and F2, expectedly, are close in the articulation of /u/, as is the case with all other

back vowels. F1 is low, similar to /i/, only slightly higher. It holds a steady frequency of

around 340 Hz on average throughout the articulation, and in all of the sample-words. Depending

on the environment, F2 can have a larger separation from F1, namely in words in which ,

, and precede the vowel. In goose, F2 starts from a high position as a result of a

preceding /g/ sound and the velar pinch associated with its production. F3 rises in goose and

rude, and falls in new. Similar to the Washington speaker, this speaker also pronounces new with

the /j/ sound, which then has an identical effect on the formants as previously stated. When

follows the vowel, it does not seem to have the same effect on F2 as it does in cases when it

precedes it. There is no apparent movement of F2 and, in fact, all three formants have a rather

steady frequency.

Fig. 29: /u/ in goose

Fig. 30: /u/ in rude

In crooked, F2 and F3 are close, but the vowel itself is very short: formants are visible for

only 40 ms before fading out quickly and completely. F1 and F2 are not as close as in ,

although in full they almost merge. There is almost no movement of formants when the

following sound is . The influence of in the word crooked seems to be minimal in the

area where the vowel has already started its articulation.

The duration of the articulation is generally short, with the exception of /ʊ/ in could,

which is over 200 ms long, but it cannot be said that this vowel has become long in this context

since the durations in the data from Hillenbrand et al. (1995:3103) and Yao et al. (2010:87), to name but a few,

are even longer. The average value of F1 is 430 Hz, and of F2 1,250 Hz.

Fig. 31: /ʊ/ in crooked

Fig. 32: /ʊ/ in could

For /ɑ/, F1 is usually around 780 Hz and is steady, with no significant movement. There

is, however, a small rising movement at the onset of the vowel preceded by /dʒ/, as in the word

jot. Here, F1 and F2 start away from each other and then move closer to reach their target values.

A similar situation is visible in dot. There is a gradual rise of F2 in situations where a liquid

precedes /ɑ/, as evident from the spectrograms in lot and rot. The average value of F2 is

around 1,260 Hz, which is lower than the data from Hillenbrand et al. (1995:3103)

and Yao et al. (2010:87), but consistent with the data from Peterson and Barney (1952:183). F3

is weak, but usually steady in its frequency, except after a liquid, when a sharp rising movement

is visible. There are no indications of diphthongization in the articulation of this vowel for this

speaker.

Fig. 33: /ɑ/ in jot

Fig. 34: /ɑ/ in dot

In /ɔ/, F1 is low for this speaker; in fact, it is the lowest among all three speakers. The average

frequency of F1 is only 366 Hz, which is significantly lower than the data from Hillenbrand et al.

(1995:3103), Yao et al. (2010:87), and Olive, Greenwood and Coleman (1993:104). The

following /r/ always has a similar effect on the formants, usually increasing the value of F2 and

decreasing the value of F3. In many sample-words, formants do not seem to be particularly

steady, often having rapid movements up or down. F1 and F2 are close for most of the

articulation, usually with a difference of between 400 and 500 Hz. F3 is the least prominent of all

three formants, with the least amount of energy.

Fig. 35: the word more

Fig. 36: the word war

Vowel    /u/    /ʊ/    /ɑ/    /ɔ/
F1       340    434    781    366
F2      1288   1252   1267    785
F3      2614   2609   2696   2420

Table 5: Average formant frequencies (in Hz) for back vowels (Georgia speaker)
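
The closeness of F1 and F2 noted for the back vowels can be checked directly against the averages in Tables 4 and 5. The following is a minimal Python sketch and not part of the original analysis; the dictionary layout, the function name and the vowel labels used as keys are assumptions made for illustration, while the numbers are the tabulated averages for the Georgia speaker.

```python
# Average (F1, F2) frequencies in Hz for the Georgia speaker, copied from
# Tables 4 and 5. The vowel labels used as keys are an assumed labeling
# of the table columns.
FRONT = {"i": (317, 2470), "ɪ": (525, 1865), "ɛ": (645, 1841), "æ": (703, 2036)}
BACK = {"u": (340, 1288), "ʊ": (434, 1252), "ɑ": (781, 1267), "ɔ": (366, 785)}


def f2_f1_gap(vowels):
    """Return the F2 - F1 separation (Hz) for each vowel."""
    return {label: f2 - f1 for label, (f1, f2) in vowels.items()}


front_gap = f2_f1_gap(FRONT)
back_gap = f2_f1_gap(BACK)

for label, gap in {**front_gap, **back_gap}.items():
    print(f"{label}: F2 - F1 = {gap} Hz")

# True when every back vowel has a smaller separation than every front vowel.
print(max(back_gap.values()) < min(front_gap.values()))
```

With these averages, the separation ranges from 419 Hz to 948 Hz for the back vowels and from 1,196 Hz to 2,153 Hz for the front vowels.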


cannon)

The average value of F1 measured for /ʌ/ is 740 Hz, which places this vowel rather low,

almost to the level of . The vowel is in the central position in the vowel chart, with the

average value of F2 around 1,400 Hz. The articulations are usually not long, which is normal

since /ʌ/ is considered a lax vowel (O’Grady/Dobrovolsky/Katamba 1997:42). Apart from

being slightly lower than what would be usual, there are no other noticeable differences between

this vowel and current relevant phonetic descriptions of this sound.

Fig. 37: /ʌ/ in fun

Fig. 38: /ʌ/ in hut

As with the Washington speaker, the Georgia speaker also has a rhotacized /ɝ/, evident

from the large amount of r-coloring. Apart from being slightly more front, there are no other

significant differences between the articulations of /ɝ/ for these two speakers. The average

value of F1 is around 550 Hz, which is consistent with both Hillenbrand et al. (1995:3103) and

Peterson and Barney (1952:183). The second formant’s average value is around

1,380 Hz, which is somewhat lower than the data from the previously mentioned sources.

Characteristically, F2 and F3 are very close, sometimes even barely distinguishable from one

another.

Fig. 39: /ɝ/ in curse

Fig. 40: /ɝ/ in journey

The /ə/ sound for this speaker is short; in initial position, in words like approve, above, etc.,

it is around 50 ms long. It appears that this sound is somewhat longer if found in medial or

final position. In the words Canada, cannon, and comma, the articulation is between 70 and 80

ms long, depending on the particular word. The average value of F1 is around 650 Hz, which is a

little higher than for , and thus a little lower on the vowel chart. F2 is around 1,360 Hz.

Fig. 41: /ə/ in above

Fig. 42: /ə/ in cannon

Vowel    /ʌ/    /ɝ/    /ə/
F1       738    556    651
F2      1397   1379   1362
F3      2569   1828   2745

Table 6: Average formant frequencies (in Hz) for central vowels (Georgia speaker)

4. ALABAMA

4.1. FRONT VOWELS (/i/ as in bleed, /ɪ/ as in tip, /ɛ/ as in bed and /æ/ as in trap)

The speaker from Alabama is a female in her late forties, college educated, who has lived

in Mobile, Alabama, her whole life. Her accent is noticeably Southern, with all of its typical

features. She admits that in certain parts of the USA, speaking in a Southern accent still carries a

level of social stigma, but by her own account, she has never tried to change it. This, among

other obvious reasons, makes this speaker a relevant representative of the Southern/Alabama accent.

The formants are not always easily visible in /i/. They often show discontinuities and

sometimes completely disappear. In those instances it was necessary to rely only on the

computer program in measuring them, thus certain incorrect measurements are unfortunately

possible. Another difficulty stems from the relative closeness of F2 and F3. In words like beat or

deep, F2 and F3 are especially close to one another, making them more difficult to differentiate,

and thus, properly measure.

All articulations of this vowel are long, much longer than for the other two speakers. In

bleed, for example, the vowel articulation is around 350 ms long. The F1 frequency is low,

which is typical of this vowel. The average value of F1 is around 320 Hz, which corresponds

with the values from Peterson and Barney (1952:183) and Hillenbrand et al. (1995:3103). F2

exhibits a rising movement in bleed and fleet, but also in beat. F3 usually does not vary too

much; however, it is sometimes difficult to distinguish. The largest separation among the first

three formants is between F1 and F2, with F2 being especially high, but not higher in comparison

to the Washington speaker.

Fig. 43: /i/ in beat

Fig. 44: /i/ in deep

In nymph, F1 has a somewhat lower frequency value, possibly because of the

inability of the computer to properly measure it, since the nasal formant is present throughout the

articulation, suggesting a strong nasalization. Because the nasal formant seems to have a similar,

but somewhat lower value, it has possibly influenced the measurement of F1 by the computer

that was, it seems, unable to distinguish between F1 and the nasal formant. The spectrogram itself

is also not conclusive, but visual analysis seems to suggest the value of F1 at around 525 Hz.

In myth and rip, F2 and F3 show a sharp rise in frequency as a result of a nasal sound

preceding the vowel in the former, and a liquid in the latter. In almost all words, a mild diphthongization

is audible, usually involving a glide towards the sound . In other sample-words, formants

seem to have a more or less steady frequency throughout the articulation, with minor adaptations

in order to reach their target values.

Fig. 45: /ɪ/ in nymph

Fig. 46: /ɪ/ in myth

In most cases, is heard at the end of the articulation of the sound . This is most

evident in words like bed, red or ted, i.e. before final voiced consonants. F1 is high, the highest

for all three speakers with the average value of around 704 Hz. The formants are not always

clearly visible: F1 is not especially prominent in let, and neither are F2 and F3 in bet.

The duration of the vowel articulation is long, usually between 250 and 300 ms, but

sometimes even longer.

Fig. 47: /ɛ/ in bed

Fig. 48: /ɛ/ in let

This speaker exhibits the largest amount of diphthongization of the vowel /æ/ in

comparison with the other two speakers. Words like trap, bad or stamp sound diphthongized

rather than matching the pronunciations provided by Wells in his pronunciation

dictionary (Wells 2000:793, 61, and 729, respectively). The diphthongized variant is a common

substandard substitution for /æ/ (Kenyon 1964:156).

In most sample-words, /æ/ is found preceding a nasal consonant, thus F2 and F3 often

move up or down in response to the phonetic environment. In stamp, F1 is masked by the

presence of the nasal formant, i.e. it is almost indistinguishable in the spectrogram. The computer

measured it at around 285 Hz, but this is highly doubtful, for it is probably the frequency of the

nasal formant that the computer measured, and not the F1 of /æ/. Visual analysis seems to

suggest F1 being at around 900 Hz, but this is open to some debate since the formant is only

faintly visible, mostly at the beginning of the articulation. The same problem arose in all other

words where a nasal followed /æ/, thus the computer measurement could not be taken as

reliable. In those cases, visual analysis was the primary means of measurement.
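
The cross-check described above, in which a doubtful automatic reading is compared with a visual estimate from the spectrogram, could be sketched as follows. This is only an illustration under stated assumptions: the function name and the 25% tolerance are hypothetical and were not part of the original procedure; the two frequencies are the ones reported for stamp.

```python
def is_suspect(auto_hz, visual_hz, tolerance=0.25):
    """Flag an automatic reading that deviates from the visual estimate.

    The 25% default tolerance is an arbitrary, assumed threshold.
    """
    relative_error = abs(auto_hz - visual_hz) / visual_hz
    return relative_error > tolerance


# F1 in "stamp": about 285 Hz from the program, about 900 Hz by eye.
auto_f1, visual_f1 = 285, 900
print(is_suspect(auto_f1, visual_f1))  # relative error is about 0.68
```

A check of this kind only signals disagreement between the two readings; deciding which value to keep still depends on inspecting the spectrogram.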

Fig. 49: /æ/ in bad

Fig. 50: /æ/ in stamp

Vowel    /i/    /ɪ/    /ɛ/    /æ/
F1       323    500    704    814
F2      2626   2157   2049   2026
F3      3164   2935   2947   2718

Table 7: Average formant frequencies (in Hz) for front vowels (Alabama speaker)


as in war)

In goose, formants F2 and F3 move rapidly away from each other and quickly attain their target

values. F2 is high, and it never comes close to F1, which would normally be expected in back

vowels. In fool and pool, a centring off-glide into the [ə] sound is heard in the second half of the

articulation. This pronunciation is typical for some

speakers, especially in Midwestern speech (Wells 1999:487), but throughout the country as

well (Kenyon 1964:172). In clue and rude, F2 rises after /l/ and /r/, and new is pronounced

[nu], which is why formants seem to have a much more stable and steady frequency as opposed

to what was visible for the other two speakers. The average value of F1 is around 425 Hz, and

for F2 it is around 1,490 Hz. In instances where neighboring sounds do not have significant

influence on F2, F1 and F2 are close. The duration of the vowel is longer than with the other two

speakers, often more than 400 ms.

Fig. 51: the word fool

Fig. 52: the word new

In full, the vowel pronounced is closer to than to , the sound that would

normally be expected in American English (Wells 2000:311). In could, an off glide into is

evident, resulting in []. In this situation, the vowel is pronounced longer than usual, more

than 300 ms in this particular example. F1 and F2 are close with little or no movement, except in

sugar, where F2 starts high and then falls in order to attain its target value. The average

frequency of F1 is around 470 Hz, which is normal for a female speaker. F3 usually has weak

energy, and sometimes it is barely visible in the spectrogram.

Fig. 53: the word full

Fig. 54: the word could

The average value of F1 in /ɑ/ is 990 Hz. F2 is very close at 1,340 Hz. In lot, F1 and F2

are so close that they merge, making them indistinguishable from one another. F3 is only barely

visible in almost all sample-words; only in rot can a slightly stronger F3 be seen, with a

rising movement as a result of a liquid preceding the vowel. The first two formants have a steady

value throughout the articulation, except in those instances where preceding sounds caused a

movement, as for example in jot and dot, or in the already mentioned liquid-to-vowel sequences.

Fig. 55: /ɑ/ in dot

Fig. 56: /ɑ/ in jot

In most of the sample-words, but most notably in four and score, this speaker produces

an off-glide. In north, F2 and F3

start the articulation at almost the same height, and then separate, F2 moving down and F3 up.

The average F1 frequency is 430 Hz, which is low, but can be explained by the influence of

/r/ on the preceding vowel. F2 is close to F1, usually with a 400 Hz difference in frequency. The

duration of the vowel is predictably long, with only /ɔ/ in boring being less than 200 ms.

Fig. 57: the word four

Fig. 57: the word score

Vowel    /u/    /ʊ/    /ɑ/    /ɔ/
F1       424    470    992    432
F2      1489   1200   1340    848
F3      2276   2374   1897   2127

Table 8: Average formant frequencies (in Hz) for back vowels (Alabama speaker)


cannon)

For this speaker, /ʌ/ is articulated rather long, which seems to be a general feature for

this speaker, observed for almost every vowel. The rate of speaking is obviously slower, so

every vowel appears to be much longer if compared with the other two speakers’ pronunciation.

This speaker, however, retains the normal long vs. short vowel distinction, with the difference of

having longer articulations than usual for short vowels, and even longer ones for long vowels.

The duration of /ʌ/ for this speaker ranges from 150 ms to 250 ms. Surprisingly, in hut, the

articulation is around 250 ms long, which is much longer than for the other two speakers:

the articulation in hut is 50 ms for the Washington speaker and 80 ms for the Georgia speaker. The

signal for the first formant is not prominent throughout the articulation. In many cases it almost

disappears or becomes faintly visible at best, making it more difficult to measure. The average

formant value is around 715 Hz for F1, and 1,415 Hz for F2. This confirms /ʌ/ as a central

vowel, being somewhat “more front” than.

Fig. 58: /ʌ/ in hut

Fig. 59: /ʌ/ in sun

This speaker, like the rest, produces a large amount of r-coloring of the vowel when

pronouncing words like worse, church, etc. The average value of F1 for the vowel /ɝ/ is around 485

Hz, which is consistent with the data from Hillenbrand et al. (1995:3103) and Peterson and

Barney (1952:183). F2 is around 1,530 Hz, which places this vowel firmly in the central area of

the vowel chart. F2 and F3 are almost merged, and there is no obvious diphthongization of this

sound visible in the spectrogram.

Fig. 60: /ɝ/ in worse

Fig. 61: /ɝ/ in church

The sound /ə/ is pronounced short, its length being approximately the same as for the

other two speakers, which is not always the case when other short vowels are in question. The

average value of F1 is around 500 Hz, which is similar to the value of . However, these two

vowels do not occupy the same position within the vowel chart, since the lower F2 in placed

it somewhat behind . Although the final sound in Canada was not part of the original

measurement, it is interesting to note that the Georgia and the Alabama speakers pronounce it as

what is best transcribed as [], while the Washington speaker uses , thus

pronouncing [].

Fig. 62: /ə/ in Canada

Fig. 63: /ə/ in appear

Vowel    /ʌ/    /ɝ/    /ə/
F1       715    484    497
F2      1415   1533   1284
F3      2286   1806   1862

Table 9: Average formant frequencies (in Hz) for central vowels (Alabama speaker)

5. CONCLUSION

By looking at the data presented so far, it is possible to draw general conclusions about

the personal and/or regional characteristics of speech of the analyzed speakers. Because of the

limitations in length and volume, we will not be looking at all the noticeable differences visible

for each speaker. Hence, it must be noted that this paper is not a complete analysis and that it

does not deal with all the regional and individual characteristics that the regional dialects of

Alabama, Georgia and Washington normally exhibit.

Fig. 64: The Alabama speaker vowel chart

One general conclusion can be made by looking at the data. Voicing seems to have no

influence on the frequency of F1-F3. Measurements showed similar values in both voiced and

voiceless environments.

It is also noticeable that F1 is the lowest for front, high vowels and the highest for front,

low vowels. F2 behaves in the opposite manner: the value decreases as the tongue moves lower

in the mouth. The separation between F1 and F2 is the largest with high vowels, and decreases

towards low positions.
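
These tendencies can be checked against the tabulated averages. The sketch below is an illustrative assumption rather than part of the original analysis; it uses the front-vowel averages of the Georgia and Alabama speakers (Tables 4 and 7), listed from the highest to the lowest vowel, and the helper names are hypothetical.

```python
# Front-vowel averages (Hz) from Tables 4 and 7, ordered from the highest
# vowel to the lowest. Assumed layout: speaker -> (F1 values, F2 values).
FRONT_VOWELS = {
    "Georgia": ([317, 525, 645, 703], [2470, 1865, 1841, 2036]),
    "Alabama": ([323, 500, 704, 814], [2626, 2157, 2049, 2026]),
}


def strictly_increasing(values):
    return all(a < b for a, b in zip(values, values[1:]))


def strictly_decreasing(values):
    return all(a > b for a, b in zip(values, values[1:]))


def summarize(f1, f2):
    """Report whether F1 rises, F2 falls and F2 - F1 shrinks down the chart."""
    gaps = [hi - lo for lo, hi in zip(f1, f2)]
    return {
        "f1_rises": strictly_increasing(f1),
        "f2_falls": strictly_decreasing(f2),
        "gap_shrinks": strictly_decreasing(gaps),
        "gaps": gaps,
    }


for speaker, (f1, f2) in FRONT_VOWELS.items():
    print(speaker, summarize(f1, f2))
```

On these averages F1 rises steadily for both speakers, while the F2 and separation trends hold strictly only for the Alabama speaker; for the Georgia speaker the lowest front vowel has a higher average F2 than the two vowels above it, so the tendency there is approximate.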

As far as individual words are concerned, a difference in the pronunciation of new and

Canada was observed. The speakers from Washington and Georgia pronounced the word new as

[nju], while the Alabama speaker pronounced it [nu]. In Canada, the speakers from Alabama

and Georgia seem to have an [] sound at the end of the pronunciation, pronouncing it

[], while the Washington speaker pronounced it [].

One common thing observed for all three speakers is the amount of r-coloring for

sounds and , which is usual and normal for most speakers of American English.

Further analysis shows that even though articulation of some short vowels was rather

long in the case of the Alabama speaker, the long vs. short relation between the vowels was still

preserved. The long vowels simply had an even longer articulation. In addition, the length of the

vowel did not affect the behavior of the formants. The articulation of was very short for all

three speakers, however formants still moved in anticipation of neighboring sounds, like .

Upon examination of the vowel charts created for each individual speaker’s first and

second formant values, it is noticeable that, in the case of certain vowels, the relative position of

the vowel is different from speaker to speaker. /i/, for example, is similar, and its position as

being the highest and also the most front vowel for all three speakers is, therefore, confirmed. On

the other hand, some vowels seem to be pronounced at completely different positions. The most

obvious difference is the relative position of articulation for the vowel /u/. We can see that, for

the Alabama speaker, /u/ is heavily centralized, and basically not very far from . The

centralization of /u/ is noticeable for the Georgia speaker as well, although not in such an

extreme way. Its short and lax counterpart, the vowel /ʊ/, seems to be at the position for all

three speakers.
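
One way to make such comparisons of relative position more concrete is to compute distances in the F1-F2 plane from the tabulated averages. The following sketch is an assumption for illustration, not the method used to draw the charts: it takes the averages in the first column of the back-vowel tables (Tables 5 and 8) and those of the r-colored central vowel (Tables 6 and 9) and measures the Euclidean distance between them in Hz. The choice of the r-colored vowel as the central reference point and all names are hypothetical.

```python
import math

# (F1, F2) averages in Hz. "high_back" is the first column of Tables 5 and 8;
# "central" is the r-colored vowel from Tables 6 and 9 (an assumed reference).
AVERAGES = {
    "Georgia": {"high_back": (340, 1288), "central": (556, 1379)},
    "Alabama": {"high_back": (424, 1489), "central": (484, 1533)},
}


def formant_distance(a, b):
    """Euclidean distance in Hz between two (F1, F2) points."""
    return math.dist(a, b)


for speaker, vowels in AVERAGES.items():
    d = formant_distance(vowels["high_back"], vowels["central"])
    print(f"{speaker}: {d:.0f} Hz")
```

The distance is roughly 74 Hz for the Alabama speaker and 234 Hz for the Georgia speaker, in line with the stronger centralization described for the Alabama speaker. A raw distance in Hz weights F1 and F2 equally, which is a simplification.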

Fig. 65: The Georgia speaker vowel chart

The most peculiar-looking vowel chart is seen for the Alabama speaker. This speaker

seems to have a number of vowels grouped together and not very far from each other, all in the

central, mid-high area of the vowel chart. What is surprising is the fact that, unlike the other two

speakers, for this speaker seems to be not as central as the vowel , which might suggest

that this speaker clearly differentiates between these two vowels.

Fig. 66: The Washington speaker vowel chart

The only vowel that is truly back for the two speakers belonging to the Southern dialect is

the vowel /ɔ/. For the Washington speaker, alongside /ɔ/, the vowel also exhibits the

same degree of backness. remains the lowest vowel for all three speakers, although the

relative position of is somewhat different for the Alabama speaker, being somewhat higher.

Front vowels and are close in the vowel chart for the two speakers from the South.

This suggests that in instances where vowel is normally found, these speakers show

inclination towards pronouncing a sound similar to instead. As mentioned previously, the

Alabama speaker exhibits strong diphthongization of /æ/, a variant

associated with eastern New England and the South (Wells 1999:477). Other

differences, usually involving off-gliding into other sounds, are presented in the main discussion

for each individual speaker and need not be repeated here.

It can be concluded that the vowel system for these speakers, for the most part, does not differ

significantly. Differences that were found may be explained as either individual idiolects, or

instances of regional variation.

REFERENCES

Chen, H. C. & M. J. Wang. 2012. An Acoustic Analysis of Chinese and English Vowels. Retrieved from: http://184.168.176.242/files/lebanon/An%20Acoustic%20Analysis%20of%20Chinese%20and%20English%20Vowels.pdf [August 15, 2012].

Cruttenden, A. 2008. Gimson’s Pronunciation of English. Oxford: Oxford University Press.

Hillenbrand, J., L. A. Getty, M. J. Clark, and K. Wheeler. 1995. Acoustic characteristics of

American English vowels. Journal of the Acoustical Society of America 97(5),

3099-3111.

Kenyon, J. S. 1964. American Pronunciation, tenth edition. Ann Arbor: George Wahr Publishing

Company.

Ladefoged, P. & K. Johnson. 2010. A Course in Phonetics, sixth edition. Wadsworth: Cengage

Learning.

O’Grady, W., M. Dobrovolsky, and F. Katamba. 1997. Contemporary Linguistics: An

Introduction. Harlow: Pearson Education Limited.

Olive, J.P., A. Greenwood, and J. Coleman. 1993. Acoustics of American English Speech: A

Dynamic Approach. New York: Springer-Verlag.

Peterson, G.E., and H.L. Barney. 1952. Control methods used in a study of the vowels. Journal of the Acoustical Society of America 24, 175-184.

Thomas, C.K. 1958. An Introduction to the Phonetics of American English, second edition. New

York: The Ronald Press Company.

Wells, J. C. 1999. Accents of English – Beyond the British Isles, volume 3. Cambridge:

Cambridge University Press.

Wells, J.C. 2000. Longman Pronunciation Dictionary. Harlow: Pearson Education Limited.

Yao, Y., S. Tilsen, R.S. Sprouse, and K. Johnson. 2010. Automated Measurement of vowel formants in the Buckeye Corpus. UC Berkeley Phonology Lab Annual Report. Retrieved from: http://conf.ling.cornell.edu/~tilsen/papers/Yao%20et%20al.%20-%202010%20-%20buckeye%20vowels.pdf [August 20, 2012].
