Acoustic Analysis of Some American English Vowels - Aleksandar Belić - Master's Thesis
UNIVERSITY OF BELGRADE
FACULTY OF PHILOLOGY
Aleksandar Belić
ACOUSTIC ANALYSIS OF SOME AMERICAN ENGLISH
VOWELS
Master's thesis
Mentor: Prof. Dr. Biljana Čubrović
Belgrade, 2012
ACOUSTIC ANALYSIS OF SOME AMERICAN ENGLISH
VOWELS
ABSTRACT (translated from the Serbian)
The aim of this paper was to measure and compare the vowel formant frequencies of three
native speakers of American English. The speakers were recorded while reading from a list of
randomly selected words, and the acoustic analysis was carried out afterwards with the help of a
computer program, measuring the frequencies of the first three formants. The formant
frequencies were measured either at the point where the formants reach a steady state, or in the
middle of the articulation if no steady state was visible. Vowel charts were made to illustrate the
vowel positions graphically. The data showed that certain vowels exhibit slightly different
properties from speaker to speaker. Some of the observed differences were explained as variants
within the speaker's regional dialect, and some as individual idiolects. The analysis also showed
that some speakers exhibit a certain degree of diphthongization when the vowel precedes certain
consonants in word-final position. Nevertheless, most vowels show no significant differences in
quality between the speakers.
Key words:
acoustic phonetics, vowel formants, American English vowels, formant frequency, acoustic
analysis, regional dialects
ACOUSTIC ANALYSIS OF SOME AMERICAN ENGLISH VOWELS
ABSTRACT
The objective of this paper is to measure and compare the frequencies of vowel formants
of three native speakers of American English. The speakers were recorded while reading from a
list of randomly chosen words, and afterwards the acoustic analysis was conducted with the help
of a computer program, measuring the frequencies of the first three formants. The formant
frequencies were measured either at a point where formants reach their steady state, or in the
middle of the articulation if the steady state was not visible. Vowel charts were made to illustrate
the vowel positions graphically. The data showed that certain vowels exhibit slightly different
qualities from speaker to speaker. Some of the differences observed were explained as being
varieties within the speakers’ regional dialects, and some as their individual idiolects. The
analysis also showed that a certain amount of diphthongization is present in certain speakers
when the vowel in question precedes certain consonants in word-final position. However, the
majority of the vowels showed no significant difference in quality between the speakers.
Key words:
acoustic phonetics, vowel formants, American English vowels, formant frequency, acoustic
analysis, regional dialects
CONTENTS
1. INTRODUCTION
1.1. METHOD
2. WASHINGTON STATE
2.1. FRONT VOWELS (/i/ as in bleed, /ɪ/ as in tip, /ɛ/ as in bed and /æ/ as in trap)
2.2. BACK VOWELS (/u/ as in goose, /ʊ/ as in took, /ɑ/ as in top, and /ɔ/ as in war)
2.3. CENTRAL VOWELS (/ʌ/ as in run, /ɝ/ as in first and /ə/ as in cannon)
3. GEORGIA
3.1. FRONT VOWELS (/i/ as in bleed, /ɪ/ as in tip, /ɛ/ as in bed and /æ/ as in trap)
3.2. BACK VOWELS (/u/ as in goose, /ʊ/ as in took, /ɑ/ as in top, and /ɔ/ as in war)
3.3. CENTRAL VOWELS (/ʌ/ as in run, /ɝ/ as in first and /ə/ as in cannon)
4. ALABAMA
4.1. FRONT VOWELS (/i/ as in bleed, /ɪ/ as in tip, /ɛ/ as in bed and /æ/ as in trap)
4.2. BACK VOWELS (/u/ as in goose, /ʊ/ as in took, /ɑ/ as in top, and /ɔ/ as in war)
4.3. CENTRAL VOWELS (/ʌ/ as in run, /ɝ/ as in first and /ə/ as in cannon)
5. CONCLUSION
REFERENCES
1. INTRODUCTION
It can be argued that vowel sounds are the backbone of every language in the world.
In fact, there is a strong consensus among linguists that languages without vowels are not only
non-existent, but also impossible. This is only logical, because vowels are considered
the least marked sounds, and therefore there is no reason why at least some vowels would not be
incorporated into the sound system of a particular language. The number of vowels, however,
may vary. The most common vowel system has five vowels, but there are languages with three,
or even fewer vowels, although they are very rare (O’Grady/Dobrovolsky/Katamba 1997:375).
The number of vowels in English (not including diphthongs), and more importantly, their
quality, can vary, depending on the country and the accent spoken in that particular region.
American English, for example, can have a wide variety of vowels, some of which can be
regarded as being only the variants of the same vowel, but certain authors regard them as being
individual phonemes. The conclusion is that every dialect has a separate vowel system. However,
even when trying to describe all the vowels in all of the American dialects, not all authors
operate with the same number of vowels. Kenyon lists sixteen vowels (1964:28-29), Wells lists
eleven (1999:472)¹, and Thomas lists seventeen (1958:128). On the other hand, Ladefoged and
Johnson (2010:90) mention only nine, while Olive, Greenwood and Coleman (1993:20) operate
with twelve. These examples show that there is no definite way of determining the exact number
of vowels in American English, since different authors have a different understanding of whether
certain sounds ought to be classified as being only one vowel or more. It would be difficult to
determine the exact status of each of the sounds without going deeper into the study of dialectal
origins of the differences that caused them.
This paper will operate with a system of 11 monophthongs, while acknowledging the differences
that some vowel variants exhibit in certain contexts. For example, Wells (1999) regards the
vowel in the word sport as being of a different quality than the vowel in the word short. While
this is undeniably true, it makes the analysis more complicated, which this paper will try to avoid.
With this in mind, the symbols used in this paper should be understood as symbols for a
particular “group” of vowels, where each symbol represents a group of possible vowel variations
found in different accents. Thus, the symbol /ɔ/ will represent both vowel variations found in
sport and short, respectively.
¹ But in his Longman Pronunciation Dictionary (2000), he lists 12 vowels (diphthongs excluded).
The traditional view on the analysis of vowel sounds recognizes two distinctive methods:
articulatory and acoustic. The former is, perhaps, more “anatomical”, for it deals with the actual
position and/or movement of the articulatory organs within the vocal tract. The latter, on the
other hand, is grounded in physics, and is primarily concerned with the acoustic properties of
sounds, which may or may not coincide with the articulatory descriptions.
The problem with the traditional distinctive feature framework, as Olive, Greenwood and
Coleman (1993:28-32) point out, is its inability to provide precise descriptions when more
subtle differences between vowels are in question. For example, in the traditional
binary classification, vowels are described with the features high, low, back, round and tense:
the presence of a particular feature marks the sound as +feature, and its absence as
–feature. However, certain vowels are neither high nor low, but
somewhere in between, making them difficult to describe using only this system.
Acoustic analysis, on the other hand, provides a more precise method of description,
where more subtle changes caused by the movement of the articulators are visible, more easily
tracked, measured, and therefore described. With the use of a spectrogram, minute differences in
the quality of the sound can be analyzed, and also graphically presented, which is difficult to do
using the traditional binary classification.
The principal component that needs to be taken into account when analyzing vowels is
the frequency of their formants. Formants can be defined as “resonances of the vocal tract that have
a specific frequency expressed in hertz (Hz). In most cases, the first two formants are sufficient
to characterize speech sounds, but occasionally the third formant is also useful for description”
(Olive/Greenwood/Coleman 1993:80).
Before the analysis itself, a certain geographical identification of speech varieties needs
to be made. Since the speakers whose speech will be analyzed in this paper come from different
states (Alabama, Georgia, and Washington), some kind of geographical labeling needs to be
established. In order to place the speakers into established groups, in this case speech areas, one
needs to determine them exactly. The literature on this matter offers a wide variety of solutions,
and maps of the USA that portray America’s three major speech areas existed even before
WWII. From this simple 3-way division (Eastern, Southern, General American), to a more
complex 8-way division from the 50’s (Thomas 1958:232) and the 70’s (Wood 1972 as cited in
Wells 1999:528), the general understanding that American pronunciation is in no way uniform in
all parts of the United States has been evident from the start.
The issue becomes even more complicated in modern times, when considerable accent
and population shifting have taken place. This has led to further fragmentation of speech areas,
which has made precise dialect division more difficult to determine. Although certain general
characteristics of local speech that differentiate it from other areas still very much exist, they are
not as evident and clear-cut today as they were in the past.
The majority of dialectologists, however, would agree to place Alabama and Georgia
speech in the Southern area, and Washington speech in the Western or, more precisely, the
Pacific Northwest area.
Fig. 1: Major dialect areas of the United States (Thomas 1958)
With all this in mind, the intention of this paper is to analyze and compare the
vowel articulation of three speakers of American English, measure the frequencies of the first
three formants in all vowels, and draw general conclusions on whether these dialects differ in the
way their vowels are pronounced, and to what extent.
Therefore, this paper will use acoustic analysis with the help of a computer program to
describe vowel articulation of three native speakers of American English. The previously
determined “target” values found in textbooks and other sources will serve as a reference point,
but only to some degree, for it would be misleading to take these values as “absolute”. It must
be noted that “formant values vary across speakers and depend on many variables. Even for a
given speaker, formants may change according to phonetic contexts, manner of speaking and rate
of speech. In fact, it should be stressed that there are no absolutely rigid descriptions of
phonemes” (Olive/Greenwood/Coleman 1993:81).
1.1. METHOD
Three female speakers each recorded a total of 77 words containing the 11 vowels of
American English in various positions and phonetic contexts. The words were chosen randomly,
making sure that at least two phonetic contexts were present. The recordings were made in one
afternoon using a Shure PG47 microphone and a laptop computer. The analysis was done with
the help of the computer program “Praat”. The maximum formant frequency setting was changed
when and where necessary; in most cases, the default setting of 4500 Hz was used. The
formants were measured in the usual way, at the place where all three formants exhibit a “steady
state”, or, in cases when this was not possible, in the middle of the vowel articulation. The
speakers are named after the states they come from; therefore, terms such as “the
Alabama speaker” or “the Washington speaker” will be used throughout this paper.
It is important to emphasize that not all phonetic contexts have been taken into account,
due to restrictions on the length and scope of this paper. Nevertheless, even without analyzing all the
possible phonetic contexts, combinations and changes a vowel might manifest when influenced
by a neighboring sound, it is still possible to draw a general conclusion on how different (or
similar) certain vowels are in terms of their formants’ frequencies.
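The measurement rule described above (read the formants at the steady state or, failing that, in the middle of the vowel) can be illustrated with a short sketch. It is not the procedure followed in Praat for this paper, where the measurement point was chosen by inspecting the spectrogram. The function name, the five-frame window, the 30 Hz tolerance and the sample track are all assumptions made only for the illustration.

```python
from statistics import pstdev


def pick_measurement_frame(track, window=5, max_spread_hz=30.0):
    """Return the index of the frame at which F1-F3 would be read.

    track: list of (time_ms, f1_hz, f2_hz, f3_hz) tuples for one vowel.
    window and max_spread_hz are illustrative assumptions, not Praat settings.
    """
    best_index, best_total = None, None
    for start in range(len(track) - window + 1):
        frames = track[start:start + window]
        # Spread (population standard deviation) of each formant in the window.
        spreads = [pstdev([frame[k] for frame in frames]) for k in (1, 2, 3)]
        if max(spreads) <= max_spread_hz:
            total = sum(spreads)
            if best_total is None or total < best_total:
                # Remember the centre frame of the steadiest window so far.
                best_index, best_total = start + window // 2, total
    if best_index is not None:
        return best_index
    # No steady state found: use the frame closest to the temporal midpoint.
    midpoint = (track[0][0] + track[-1][0]) / 2
    return min(range(len(track)), key=lambda i: abs(track[i][0] - midpoint))


# Invented track: F2 rises at the start of the vowel and then levels off.
f2_values = [1200, 1400, 1600, 1750, 1850, 1870, 1870, 1870, 1870, 1870]
rising_then_steady = [(10 * i, 450, f2, 3000) for i, f2 in enumerate(f2_values)]
print(pick_measurement_frame(rising_then_steady))
```

For a track that never settles, the function falls back to the frame nearest the middle of the vowel, which mirrors the second half of the rule.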
2. WASHINGTON STATE
2.1. FRONT VOWELS (/i/ as in bleed, /ɪ/ as in tip, /ɛ/ as in bed and /æ/ as in trap)
The speaker from Washington is a female in her mid-30s, born in Arlington, WA. She
works in education and has a college degree. Since she has never lived outside Washington, the
possible influence of other regional dialects on her own is minimal.
The first thing that is immediately noticeable in the spectrographic representation of her
articulation is the relative steadiness of the formants of /i/, especially of the
first formant. The values for the first formant are relatively close to the values suggested by
Olive, Greenwood & Coleman (1993:104) with the average of about 300 Hz. The frequency of
the first formant seems to have little or no variation throughout the articulation, regardless of the
phonetic environment. Even in the instances when F2 and F3 move as a result of co-articulation,
F1 retains its approximate value.
In the words bleed and fleet, F2 rises significantly from the target value for /l/ and almost
merges with the third formant. This is especially visible in the former, where the values of F2
and F3 differ by only 45 Hz. In other examples, neighboring sounds influence the frequency of
the F2 in the expected manner. After fricative sounds, F2 has a slight rising movement, and the
same is visible in instances where the preceding sound is the bilabial stop /b/. Voicing,
however, seems to have no influence on the frequencies of F1-F3, since the relative values of
F1-F3 are the same for /i/ in both deep and peak. In all of the examples, F2 remains
significantly high in comparison to the data provided by Olive, Greenwood and Coleman
(1993:104). However, data from Hillenbrand et al. (1995:3103) suggest that F2 values are
somewhat higher than stated by Olive, Greenwood and Coleman (1993:104), probably because
Olive, Greenwood and Coleman never stated the sex of their speakers. Stevens (1998:288) gives
data from both male and female speakers, and the relative values of F2 for female speakers
resemble greatly the data in this paper.
Although F3 is usually not essential for vowel identification, it was measured
nevertheless. Spectrograms show that F3 is the least prominent of all three, often hardly even
noticeable, and rarely with the steadiness in frequency found in F1. With the average value of
around 3,440 Hz, its approximate value seems to be consistent with the data by Stevens
(1998:288) and Hillenbrand et al. (1995:3103).
Fig. 2: /i/ in bleed
Fig. 3: /i/ in deep
For front vowels, F1 becomes lower when the constriction in the oral cavity increases.
/i/ is the most constricted vowel. F1 increases as the tongue position gets lower. In addition,
has the highest F2 and has the lowest F2.(Chen/Wang 2012) Consequently, is
expected to have a higher F1 value than /i/, and the data confirm it. The average value of F1
for the vowel /ɪ/ is around 450 Hz, the highest being measured in the word kit (510 Hz), and the
lowest in the word rip (380 Hz). There are no significant formant variations in any of the
examples. The articulation is short, between 60 and 70 milliseconds. The strongest signal appears
to be in the word rip, where the frequency of the first formant resembles the F1 of /r/, and the
signal becomes darker as the articulation of the vowel begins.
What is also noticeable from the spectrogram is the rise of F2 and F3 in the word rip.
F2 starts at around 1,200 Hz at the beginning of /r/, and immediately starts to rise until reaching
its steady state at around 1,870 Hz. In instances when a velar sound precedes the vowel, F2 and
F3 are close at the beginning of the articulation, and then start to move away from each other, as
can be seen in the word kit. This is the result of a velar pinch, which is characterized by the
coming together of F2 and F3 during the articulation of a velar consonant
(Olive/Greenwood/Coleman 1993:85). In addition, F2 and F3 exhibit a slight rising movement in
instances when /ɪ/ is preceded by a nasal sound. In other phonetic contexts, namely when
preceded by the fricative /s/ or the voiceless stop /p/, F2 and F3 seem to have a steady frequency
throughout the articulation, with little or no variation. The average value of 2,190 Hz for F2 is
consistent with the data from Stevens (1998:288), and the average value for F3 (3,030 Hz) is
almost identical with the findings of Hillenbrand et al. (1995:3103).
Fig. 3: /ɪ/ in rip
Fig. 4: /ɪ/ in kit
The vowel /ɛ/ is more back and also lower than /i/ or /ɪ/, as suggested by Ladefoged
and Johnson (2010:90). As a result, F1 will be higher, and F2 lower, in comparison to /i/ or
/ɪ/. All of the examples show the steadiness of F1 during most of the articulation. A slight
rising movement is visible in instances when the voiced bilabial /b/ precedes the vowel, and F1
falls if the sound following it is also a voiced stop. In the words red, let and led, F1 retains its
frequency throughout the articulation, while F2 and F3 move upwards to reach their target
values. The average value for F1 is around 620 Hz.
All three formants are usually visible in the spectrogram, F3 being the least prominent.
F2 appears to be the least stable one, often having a rising or falling movement because of the
phonetic context. Its average frequency is 2,080 Hz. The average duration of the vowel seems to
be somewhat longer than for /ɪ/, often being more than 100 ms.
Fig. 5: /ɛ/ in red
Fig. 6: /ɛ/ in bet
The maximum separation (for the front vowels) between F1 and F2 occurs with the
highest vowel, and is the smallest with the lowest (Olive/Greenwood/Coleman 1993:102). This is
clearly noticeable from the data in this research. While the separation between F1 and F2, i.e. the
difference in their frequencies, was around 2,500 Hz for /i/, for /æ/ it was only around 1,200
Hz. Not all possible phonetic contexts were taken into account for /æ/. The focus was the
influence of nasal sounds on the vowel in instances when they follow the vowel in question.
The preceding sound in these examples is usually a stop, voiced or voiceless; in one instance, the
voiceless sound is in an unaccented position to show the influence of aspiration (or the lack of it)
on the visibility and movement of the formants.
In the word trap, F1 starts to rise immediately after becoming visible in the spectrogram
at the onset of /r/, and quickly reaches its steady state at around 950 Hz. The F1 value measured
in the middle of the articulation was 970 Hz. In other examples, F1 seems to be rather stable
throughout the articulation, with the exception of the word stamp, where F1 seems to be rising at
the beginning of the articulation, possibly as a result of the transition from an unaspirated /t/ to
/æ/. F3 is hardly even noticeable in trap, and its projected value of 2,850 Hz is, to some extent,
disputable. What is also typical of F2 is its fall before nasal sounds, in words such as candle,
stamp, or sand.
Fig. 7: the word trap
Fig. 8: /æ/ in stamp
      /i/    /ɪ/    /ɛ/    /æ/
F1    300    454    618    848
F2   2840   2187   2083   2027
F3   3447   3033   2689   3028
Table 1: Average formant frequencies (in Hz) for front vowels (Washington speaker)
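The separation between F1 and F2 discussed above is simple arithmetic on the averages in Table 1. A minimal sketch follows; the dictionary layout and the use of the example words as labels are choices made for the illustration only.

```python
# Average formant frequencies (Hz) copied from Table 1 (Washington speaker).
# The example words serve as labels: a naming choice made for this sketch.
TABLE_1 = {
    "bleed": {"F1": 300, "F2": 2840, "F3": 3447},
    "tip":   {"F1": 454, "F2": 2187, "F3": 3033},
    "bed":   {"F1": 618, "F2": 2083, "F3": 2689},
    "trap":  {"F1": 848, "F2": 2027, "F3": 3028},
}


def f1_f2_separation(formants):
    """Distance between the first two formants, in Hz."""
    return formants["F2"] - formants["F1"]


separations = {word: f1_f2_separation(values) for word, values in TABLE_1.items()}
for word, separation in separations.items():
    print(f"{word}: {separation} Hz")
```

The result is 2,540 Hz for the vowel of bleed and 1,179 Hz for the vowel of trap, which correspond to the rounded figures of about 2,500 Hz and 1,200 Hz given in the text, and the separation shrinks steadily from the highest front vowel to the lowest.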
2.2. BACK VOWELS (/u/ as in goose, /ʊ/ as in took, /ɑ/ as in top, and /ɔ/ as in war)
The back vowels differ from the front vowels in that F2 is much lower and closer to F1
for the back vowels than for the front (Olive/Greenwood/Coleman 1993:103). This is evident
from the spectrograms for this speaker as well. In words like fool and pool, F1 and F2 are
especially close to each other, with the difference in frequency of some 400 Hz. In goose, this is
not the case, since F2 has a falling movement from a high position after the velar pinch.
This speaker pronounces the word new as [nju], with a clear distinction between /j/
and /u/, and not as [nu], which is also a common pronunciation of this word in American
English. This kind of pronunciation influences the shape of F2, since /j/ normally has a higher
F2 than what is usual for /u/ (Olive/Greenwood/Coleman 1993:118). This results in a
downward movement of F2 towards its target value, which, in this case, is around 970 Hz.
The sounds /l/ and /ʃ/ create a similar result in clue and shoe, where F2 first rises for
/l/ and starts off high for /ʃ/, but then gradually falls. F3 usually retains its initial value
throughout the articulation, although some rising movement is noticeable in rude and shoe.
Fig. 9: the word pool
Fig. 9: the word shoe
For /ʊ/, no significant changes to the formants can be seen in most examples. After /w/
in would and woman, F2 rises rapidly, although this rise is more apparent in would. All
articulations are short, usually around 50 ms long. The average frequency of F1 is around 400
Hz. F1 in most cases retains its stable position and does not exhibit any significant movements,
regardless of the environment. F2 is close to F1, although not as close as with /u/. In addition,
no noticeable diphthongization occurs in any of the articulations of this sound.
Fig. 10: /ʊ/ in would
In words such as rot, lot, top or dot, where the sound /ɒ/ is predominantly found in RP,
the sound /ɑ/ is more common in American English (Cruttenden 2008:84). /ɑ/ has a slightly
higher F1 than and , for this speaker 870 Hz was the average value that was measured.
F1 and F2 are close to each other, and mostly holding their frequencies steadily in phonetic
contexts examined in this paper. In rot, F3 exhibits a sharp rise in frequency at the beginning of
the articulation, after being very low and close to F1 and F2 through most of the
articulation of /r/. In most words, F3 has rather weak energy and is often barely visible in the
spectrogram. In addition, no diphthongization was found in the articulation of /ɑ/ for this
speaker.
Fig. 11: /ɑ/ in lot
Fig. 12: /ɑ/ in top
The vowel /ɔ/ does not usually appear in American English in contexts without the
sound /r/ following it. The whole issue involving /ɔ/ and other sounds that may be
pronounced in its place is rather complex, and it varies from speaker to speaker. For
some speakers, there is a difference in vowel quality between the words force and north (Wells
1999:483). For the purposes of this paper, we will consider both words as having the same vowel,
/ɔ/.
Since /r/ is involved in all contexts for /ɔ/, a great deal of rhotic coloring
(Olive/Greenwood/Coleman 1993:220) is present in all examples. In fact, all the words show a
similar pattern, and what is said for one word can easily apply to other words as well. F1 has an
average value of 524 Hz. It is stable during the articulation of the vowel, but it can have a slight
rise near the transition towards /r/. F2 usually starts off low and close to the first formant, but
then gradually rises, while F3 falls. The duration of the vowel is not long, although it is not short
either. In four and score, it is around 150 ms long. F3 is high, often not entirely distinguishable.
Fig. 13: /ɔ/ in four
Fig. 14: /ɔ/ in score
      /u/    /ʊ/    /ɑ/    /ɔ/
F1    341    402    871    524
F2   1040   1194   1251   1037
F3   2646   2731   2727   3021
Table 2: Average formant frequencies (in Hz) for back vowels (Washington speaker)
2.3. CENTRAL VOWELS (/ʌ/ as in run, /ɝ/ as in first and /ə/ as in cannon)
According to Olive, Greenwood, and Coleman, “the most central vowel is /ʌ/, the
vowel in bud. This vowel is recognized by having formant values that most resemble the values
of a neutral vocal tract; the first three formants are at approximately equal intervals”
(Olive/Greenwood/Coleman 1993:103-104). These statements, as can be seen from Table 3, are
not entirely consistent with the data from this measurement. Although it is true that the vowel is
central, it is not “the most central”, since both /ɝ/ and /ə/ appear to be closer to
that relative position (around 1,500 Hz for the second formant). Even if we disregard the
difference of around 100 Hz, which is admittedly not big, we still cannot claim that, in this case,
/ʌ/ is “the most central vowel”, since two more vowels occupy the same approximate position.
All this, naturally, applies to this speaker only, and may not be true for the other speakers.
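The reference to a neutral vocal tract can be made concrete with the standard textbook approximation of the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances lie at F_n = (2n - 1)c / 4L. The sketch below is only an illustration: the tube length of 17.5 cm and the speed of sound of 35,000 cm/s are conventional round values assumed here, not measurements of this speaker, and a shorter vocal tract would give higher resonances. The F2 averages are those reported for this speaker in Table 3, labeled by the example words.

```python
SPEED_OF_SOUND_CM_S = 35_000  # assumed round value, not measured in this study


def neutral_tube_formants(length_cm, count=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other."""
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_S / (4 * length_cm)
            for n in range(1, count + 1)]


# Average F2 (Hz) of the central vowels, Washington speaker (Table 3).
CENTRAL_F2 = {"run": 1395, "first": 1452, "cannon": 1472}
REFERENCE_F2 = 1500  # the approximate central F2 position mentioned in the text

distance_from_reference = {
    word: abs(f2 - REFERENCE_F2) for word, f2 in CENTRAL_F2.items()
}

print(neutral_tube_formants(17.5))  # [500.0, 1500.0, 2500.0]
print(distance_from_reference)      # {'run': 105, 'first': 48, 'cannon': 28}
```

Under these assumptions the three resonances are equally spaced, and the vowel of run lies about 100 Hz below the 1,500 Hz reference, while the vowels of first and cannon lie closer to it.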
The average value of F1 for /ʌ/ is around 820 Hz, which is significantly higher than the
data from Olive, Greenwood and Coleman (1993:104) and Yao et al.(2010:87), and somewhat
higher than the data from Hillenbrand et al.(1995:3103) and Peterson and Barney (1952:183).
This value of F1 suggests a somewhat lower position of , which is in fact almost as low as
. Since all sample words except hut include a final alveolar nasal, all spectrograms have
similar-looking patterns. All of the usual formant movements triggered by the preceding sounds
are present: the F2 and F3 moving away from each other after initial velar consonant, the rise of
F2 and F3 after the liquid , and the obvious nasalization of the vowel characterized by the
presence of the nasal formant. There are no indications that any form of diphthongization has
taken place.
Fig. 15: /ʌ/ in hut
Fig. 16: /ʌ/ in gun
In /ɝ/, there is a large amount of r-coloring, as can be expected from a rhotic accent
of English. For this speaker, the articulation of the vowel is not long, and the pronunciation is
consistent, with no apparent diphthongization. The average value of F1 is around 520 Hz, which
is within the usual range. This vowel is mid-central, and its approximate position is very close to
that of /ə/. F3 is close to F2, sometimes even merging with it, as in church.
In fact, looks and sounds like a reversed variant of , there are no distinct areas
within the spectrogram that might be characterized as being pure sound. This is probably
why many classifications of American English vowels do not list /ɝ/ as a distinct vowel.
However, “there is no acoustically distinct consonant area in the region of , and, therefore,
in a strictly concatenative-segmental analysis, we must consider this sound as part of the
American English vowel system” (Olive/Greenwood/Coleman 1993:104).
Fig. 17: /ɝ/ in church
Fig. 18: /ɝ/ in first
The sample words containing /ə/ all include contexts in which /ə/ is found in an
unstressed position. Being in such a position, the articulation is very short, usually around 40 ms,
with up to 70 ms in Canada, cannon and comma.
Even in such short articulations, the formants are consistent; in around, they even start to
fall in anticipation of /r/, however short and barely visible this movement may be. The average
value of F1 is around 630 Hz, and for F2 it is 1,470 Hz. Since the lowest and the highest second
formants measured for this speaker were 1,040 Hz for and 2,840 Hz for , is placed
in the mid-central area.
Fig. 19: /ə/ in Canada
Fig. 20: /ə/ in appear
      /ʌ/    /ɝ/    /ə/
F1    819    523    632
F2   1395   1452   1472
F3   2628   1832   2560
Table 3: Average formant frequencies (in Hz) for central vowels (Washington speaker)
3. GEORGIA
3.1. FRONT VOWELS (/i/ as in bleed, /ɪ/ as in tip, /ɛ/ as in bed and /æ/ as in trap)
The speaker from Georgia is a female in her mid-forties, from Atlanta. What is
immediately noticeable is that, in conversation, she does not have the accent typical of someone
coming from the South; she claims that she lost it through education and by moving around
the USA and abroad. Her regional accent, therefore, might be influenced by other regional accents to
an extent that cannot be easily determined.
For /i/, F1 does not have any notable movement, either up or down. With an average
frequency of around 320 Hz, it is slightly higher than the data from Olive, Greenwood and
Coleman (1993:104), but also slightly lower than the measurements conducted by Hillenbrand et
al. (1995:3103) and Yao et al. (2010:87).
In bleed and fleet, as in the case of the previous speaker, F2 and F3 exhibit a
rising movement after the articulation of /l/. On the other hand, both formants fall before /m/
in seem, and /k/ in peak. The relative duration of the articulation is long, usually longer than
200 ms. There is no indication of diphthongization of this vowel in any of the examples.
Fig. 21: /i/ in fleet
Fig. 22: /i/ in peak
In instances where a nasal sound both precedes and follows /ɪ/, as in the word nymph,
F1 first rises to reach its target frequency, and immediately after that falls in anticipation of a
nasal sound m. In this example, the values for all three formants resemble sound more
than a typical sound. This similarity is even possible to notice audibly. In the word tip, there
is a noticeable diphthongization of /ɪ/, where towards the end of the articulation a more
centralized sound resembling /ə/ is heard, resulting in the pronunciation [tɪəp]. This allophone
is mentioned by Wells (1999:485) as being present mostly in the southern parts of the USA,
although he found it only in environments where the following final sound is a voiced consonant.
F2 and F3, if not found in front of a nasal sound, usually retain their value throughout the
articulation, with small variations, depending on the phonetic context. The average value of the
F2 formant is around 1,860 Hz and the frequency of F3 is around 2,940 Hz.
Fig. 23: /ɪ/ in nymph
Fig. 24: /ɪ/ in tip
/ɛ/ has a larger value of F1 and only a slightly lower value of F2 compared with /ɪ/.
The sample words chosen for this research include a limited number of phonetic contexts for
/ɛ/, with only /t/ and /d/ in syllable-final position, while the preceding sounds include
/p/, /t/, /b/, /f/, /l/, and /r/.
The previously mentioned phenomenon of inserting after the vowels , , , does
not seem to be present in the articulation of /ɛ/, except perhaps in the word bed. It is possible
that this allophone was more frequently present in the pronunciation of this speaker in the past,
but because of the influence of other accents is now present only in certain words. The average
F1 value of 645 Hz is consistent with the findings of Yao et al. (2010:87).
F1 seems to be rather stable in all of the examples, without any noticeable variations in
frequency regardless of the phonetic environment. In instances where l precedes , F1 rises
in order to attain its target value, and this rise is almost instantaneous. F2 has an expected rise in
instances when r precedes the sound, and F3 is clearly visible in all examples.
Fig. 25: the vowel in bed
Fig. 26: the vowel in led
The average value of F1 for is around 703 Hz, which is the largest value of F1 for
all front vowels. This makes the lowest of the four on the traditional articulatory vowel
chart. Its shape seems to be uniform and its frequency steady throughout the articulation. In the
environment where a final d follows the vowel, a slight falling movement of F1 is visible. F2
is stable in the words trap and bad, while in the words with a nasal sound following the vowel F2
usually has a falling movement. In these examples, the influence of nasalization is clearly visible
in the presence of a nasal formant, which is characterized by a prominent low-frequency F1
(Olive/Greenwood/Coleman 1993:97).
Centralization of towards the end of its articulation is also noticeable and audible
upon closer inspection. Forms such as bnd or stmp seem to be occurring normally.
Another allophonic variation noticed by Wells, involving an “assimilatory off-glide to the
area” (Wells 1999:486), is also present in this speaker’s pronunciation of the word tank. This
feature is most certainly an attribute specific to this speaker’s regional phonetic
heritage.
Fig. 27: the vowel in band
Fig. 28: the vowel in tank
Vowel as in    fleet     tip     bed    trap
F1               317     525     645     703
F2              2470    1865    1841    2036
F3              2997    2939    2917    2727
Table 4: Average formant frequencies (in Hz) for front vowels (Georgia speaker)
3.2. BACK VOWELS (as in goose, took, top, and war)
F1 and F2, expectedly, are close in the articulation of , as is the case with all other
back vowels. F1 is low, similar to , only slightly higher. It holds a steady frequency of
around 340 Hz on average throughout the articulation, and in all of the sample-words. Depending
on the environment, F2 can have a larger separation from F1, namely in words in which ,
, and precede the vowel. In goose, F2 starts from a high position as a result of a
preceding sound and the velar pinch associated with its production. F3 rises in goose and
rude, and falls in new. Similar to the Washington speaker, this speaker also pronounces new with
the sound, which then has an identical effect on the formants as previously stated. When
follows the vowel, it does not seem to have the same effect on F2 as it does in cases when it
precedes it. There is no apparent movement of F2 and, in fact, all three formants have a rather
steady frequency.
Fig. 29: the vowel in goose
Fig. 30: the vowel in rude
In crooked, F2 and F3 are close, but the vowel itself is very short: the formants are visible for
only 40 ms before fading out quickly and completely. F1 and F2 are not as close as in ,
although in full they almost merge. There is almost no movement of formants when the
following sound is . The influence of in the word crooked seems to be minimal in the
area where the vowel has already started its articulation.
The duration of the articulation is generally short, with the exception of the vowel in could,
which is over 200 ms long. It cannot be said, however, that this vowel has become long in this
context, since the durations in the data from Hillenbrand et al. (1995:3103) and Yao et al.
(2010:87), to name but a few, are even longer. The average frequency of F1 is 430 Hz, and that
of F2 is 1,250 Hz.
Fig. 31: the vowel in crooked
Fig. 32: the vowel in could
For the vowel of top, F1 is usually around 780 Hz and steady, with no significant movement. There
is, however, a small rising movement at the onset of the vowel preceded by , as in the word
jot. Here, F1 and F2 start away from each other and then move closer to reach their target values.
A similar situation is visible in dot. There is a gradual rise of F2 in situations where a liquid
precedes , as evident from the spectrograms in lot and rot. The average value of F2 is
around 1,260 Hz, which is lower if compared with the data from Hillenbrand et al. (1995:3103)
and Yao et al. (2010:87), but consistent with the data from Peterson and Barney (1952:183). F3
is weak, but usually steady in its frequency, except after a liquid, when a sharp rising movement
is visible. There are no indications of diphthongization in the articulation of this vowel for this
speaker.
Fig. 33: the vowel in jot
Fig. 34: the vowel in dot
For the vowel of war, F1 is low for this speaker; in fact, it is the lowest among all three speakers. The average
frequency of F1 is only 366 Hz, which is significantly lower than the data from Hillenbrand et al.
(1995:3103), Yao et al. (2010:87), and Olive, Greenwood and Coleman (1993:104). The
following always has a similar effect on the formants, usually increasing the value of F2 and
decreasing the value of F3. In many sample-words, formants do not seem to be particularly
steady, often having rapid movements up or down. F1 and F2 are close for the most part of the
articulation, usually with a difference of between 400 and 500 Hz. F3 is the least prominent of all
three formants, with the least amount of energy.
Fig. 35: the word more
Fig. 36: the word war
Vowel as in    goose    took     top     war
F1               340     434     781     366
F2              1288    1252    1267     785
F3              2614    2609    2696    2420
Table 5: Average formant frequencies (in Hz) for back vowels (Georgia speaker)
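The closeness of F1 and F2 in back vowels can be checked directly against Table 5. The following is a minimal Python sketch, not part of the original analysis: the dictionary and function names are illustrative, and the vowels are labeled with the keywords from the section heading (goose, took, top, war) rather than with phonetic symbols.

```python
# Average (F1, F2) in Hz for the Georgia speaker's back vowels, copied from Table 5.
# The keys are keywords, not phonetic symbols (an illustrative labeling choice).
GEORGIA_BACK = {
    "goose": (340, 1288),
    "took": (434, 1252),
    "top": (781, 1267),
    "war": (366, 785),
}


def f2_f1_separation(formants):
    """Return the F2 - F1 distance in Hz for each vowel."""
    return {vowel: f2 - f1 for vowel, (f1, f2) in formants.items()}


separation = f2_f1_separation(GEORGIA_BACK)
for vowel, hz in separation.items():
    print(f"{vowel}: {hz} Hz")
```

For the vowel of war this gives 419 Hz, which falls within the 400 to 500 Hz difference reported above.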
3.3. CENTRAL VOWELS (as in run, first and cannon)
The average value of F1 measured for is 740 Hz, which places this vowel rather low,
almost to the level of . The vowel is in the central position in the vowel chart, with the
average value of F2 around 1,400 Hz. The articulations are usually not long, which is normal
since this vowel is considered a lax vowel (O’Grady/Dobrovolsky/Katamba 1997:42). Apart from
being slightly lower than what would be usual, there are no other noticeable differences between
this vowel and current relevant phonetic descriptions of this sound.
Fig. 37: the vowel in fun
Fig. 38: the vowel in hut
As with the Washington speaker, the Georgia speaker also has a rhotacized vowel here, as is evident
from the large amount of r-coloring. Apart from being slightly more front, there are no other
significant differences between the articulations of this vowel for these two speakers. The average
value of F1 is around 550 Hz, which is consistent with both Hillenbrand et al. (1995:3103) and
Peterson and Barney (1952:183). The second formant’s average value is around
1,380 Hz, which is somewhat lower than the data from the previously mentioned sources.
Characteristically, F2 and F3 are very close, sometimes even barely distinguishable from one
another.
Fig. 39: the vowel in curse
Fig. 40: the stressed vowel in journey
For this speaker, the vowel as in cannon is short: in initial position, in words like approve and above,
it is around 50 ms long. It appears that this sound is somewhat longer if found in medial or
final position. In the words Canada, cannon, and comma, the articulation is between 70 and 80
ms long, depending on the particular word. The average value of F1 is around 650 Hz, which is a
little higher than for , and thus a little lower on the vowel chart. F2 is around 1,360 Hz.
Fig. 41: the unstressed vowel in above
Fig. 42: the unstressed vowel in cannon
Vowel as in      run   first  cannon
F1               738     556     651
F2              1397    1379    1362
F3              2569    1828    2745
Table 6: Average formant frequencies (in Hz) for central vowels (Georgia speaker)
4. ALABAMA
4.1. FRONT VOWELS (as in bleed, tip, bed and trap)
The speaker from Alabama is a female in her late forties, college educated, who has lived
in Mobile, Alabama, her whole life. Her accent is noticeably Southern, with all of its typical
features. She admits that in certain parts of the USA, speaking with a Southern accent still carries a
level of social stigma, but by her own account, she has never tried to change it. This, among
other obvious reasons, makes this speaker a relevant representative of the Southern/Alabama accent.
The formants are not always easily visible in . They often show discontinuation and
sometimes completely disappear. In those instances it was necessary to rely only on the
computer program in measuring them, thus certain incorrect measurements are unfortunately
possible. Another difficulty stems from the relative closeness of F2 and F3. In words like beat or
deep, F2 and F3 are especially close to one another, making them more difficult to differentiate,
and thus, properly measure.
All articulations of this vowel are long, much longer than for the other two speakers. In
bleed, for example, the vowel articulation is around 350 ms long. The F1 frequency is low,
which is typical of this vowel. The average value of F1 is around 320 Hz, which corresponds
with the values from Peterson and Barney (1952:183) and Hillenbrand et al. (1995:3103). F2
exhibits a rising movement in bleed and fleet, but also in beat. F3 usually does not vary too
much; however, it is sometimes difficult to distinguish. The largest separation among the first
three formants is between F1 and F2, with F2 being especially high, but not higher in comparison
to the Washington speaker.
Fig. 43: the vowel in beat
Fig. 44: the vowel in deep
In nymph, F1 has a somewhat lower frequency value, possibly because the computer was
unable to measure it properly: the nasal formant is present throughout the
articulation, suggesting strong nasalization. Because the nasal formant seems to have a similar,
but somewhat lower, value, it has possibly influenced the measurement of F1 by the computer,
which was, it seems, unable to distinguish F1 from the nasal formant. The spectrogram itself
is also not conclusive, but visual analysis seems to suggest a value of F1 of around 525 Hz.
In myth and rip, F2 and F3 show a sharp rise in frequency as a result of a nasal sound
preceding the vowel in the former, and a liquid in the latter. In almost all words, a mild diphthongization
is audible, usually involving a glide towards the sound . In other sample-words, formants
seem to have a more or less steady frequency throughout the articulation, with minor adaptations
in order to reach their target values.
Fig. 45: the vowel in nymph
Fig. 46: the vowel in myth
In most cases, is heard at the end of the articulation of the sound . This is most
evident in words like bed, red or ted, i.e. before final voiced consonants. F1 is high, the highest
for all three speakers, with an average value of around 704 Hz. The formants are not always
clearly visible: F1 is not especially prominent in let, nor are F2 and F3 in bet.
The duration of the vowel articulation is long, usually between 250 and 300 ms, but
sometimes even longer.
Fig. 47: the vowel in bed
Fig. 48: the vowel in let
This speaker exhibits the largest amount of diphthongization of the vowel in
comparison with the other two speakers. Words like trap, bad or stamp, sound more like[],
[] or [] rather than, and, the pronunciations provided by Wells in his pronunciation
dictionary: (Wells 2000:793), (Wells 2000:61), and (Wells
2000:729). [] is a common substandard substitution for (Kenyon 1964:156).
In most sample-words, is found preceding the nasal consonant, thus F2 and F3 often
move up or down in response to the phonetic environment. In stamp, F1 is masked by the
presence of the nasal formant, i.e. it is almost indistinguishable in the spectrogram. The computer
measured it at around 285 Hz, but this is highly doubtful, for it is probably the frequency of the
nasal formant that the computer measured, and not the F1 of . Visual analysis seems to
suggest F1 being at around 900 Hz, but this is open to some debate since the formant is only
faintly visible, mostly at the beginning of the articulation. The same problem arose in all other
words where a nasal followed , thus the computer measurement could not be taken as
reliable. In those cases, visual analysis was the primary means of measurement.
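The problem described above, where the program appears to report the nasal formant instead of F1, could in principle be screened for automatically before falling back on visual analysis. The sketch below is only an illustration under stated assumptions: the function name, the idea of comparing against a reference value, and the 0.5 threshold are all hypothetical and were not part of the procedure used in this paper.

```python
def is_suspect_f1(auto_hz, reference_hz, min_ratio=0.5):
    """Flag an automatic F1 reading that is implausibly low.

    auto_hz      -- F1 as reported by the analysis program
    reference_hz -- an independent estimate, e.g. from visual inspection
    min_ratio    -- hypothetical cut-off; readings below this fraction of
                    the reference are treated as possible nasal-formant hits
    """
    if reference_hz <= 0:
        raise ValueError("reference_hz must be positive")
    return auto_hz < min_ratio * reference_hz


# stamp, Alabama speaker: about 285 Hz measured by the computer,
# about 900 Hz suggested by visual analysis of the spectrogram.
stamp_is_suspect = is_suspect_f1(285, 900)
print(stamp_is_suspect)
```

A flagged token would still need to be inspected by hand, as was done here; the check only indicates which measurements deserve a second look.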
Fig. 49: the vowel in bad
Fig. 50: the vowel in stamp
Vowel as in    bleed     tip     bed    trap
F1               323     500     704     814
F2              2626    2157    2049    2026
F3              3164    2935    2947    2718
Table 7: Average formant frequencies (in Hz) for front vowels (Alabama speaker)
4.2. BACK VOWELS (as in goose, took, top, and war)
In goose, formants F2 and F3 move rapidly from each other and quickly attain their target
values. F2 is high, and it never comes close to F1, which would normally be expected in back
vowels. In fool and pool, a centring off glide into sound is heard in the second half of the
articulation, resulting in [] and [], respectively. This pronunciation is typical for some
speakers, especially in Midwestern speech (Wells 1999:487), but throughout the country as
well (Kenyon 1964:172). In clue and rude, F2 rises afterand , and new is pronounced
[], which is why formants seem to have a much more stable and steady frequency as opposed
to what was visible for the other two speakers. The average value of F1 is around 425 Hz, and
for F2 it is around 1,490 Hz. In instances where neighboring sounds do not have significant
influence on F2, F1 and F2 are close. The duration of the vowel is longer than with the other two
speakers, often more than 400 ms.
Fig. 51: the word fool
Fig. 52: the word new
In full, the vowel pronounced is closer to than to , the sound that would
normally be expected in American English (Wells 2000:311). In could, an off glide into is
evident, resulting in []. In this situation, the vowel is pronounced longer than usual, more
than 300 ms in this particular example. F1 and F2 are close with little or no movement, except in
sugar, where F2 starts high and then falls in order to attain its target value. The average
frequency of F1 is around 470 Hz, which is normal for a female speaker. F3 usually has a weak
energy, and sometimes it is barely visible in the spectrogram.
Fig. 53: the word full
Fig. 54: the word could
The average value of F1 in is 990 Hz. F2 is very close at 1,340 Hz. In lot, F1 and F2
are so close that they merge, making them indistinguishable from one another. F3 is only barely
visible in almost all sample-words, where only in rot a slightly stronger F3 can be seen having a
rising movement as a result of a liquid preceding the vowel. The first two formants have a steady
value throughout the articulation, except in those instances where preceding sounds caused a
movement, as for example in jot and dot, or as in already mentioned liquid-to-vowel sequences.
Fig. 55: the vowel in dot
Fig. 56: the vowel in jot
In most of the sample-words, but most notably in four and score, this speaker produces
an off glide to the area, resulting in [] and [], respectively. In north, F2 and F3
start the articulation at almost the same height, and then separate, F2 moving down while F3 up.
The average F1 frequency is 430 Hz, which is low, but can be explained by the influence of
on the preceding vowel. F2 is close to F1, usually with a 400 Hz difference in frequency. The
duration of the vowel is predictably long, with only the vowel in boring being shorter than 200 ms.
Fig. 57: the vowel in four
Fig. 57: the vowel in score
Vowel as in    goose    took     top     war
F1               424     470     992     432
F2              1489    1200    1340     848
F3              2276    2374    1897    2127
Table 8: Average formant frequencies (in Hz) for back vowels (Alabama speaker)
4.3. CENTRAL VOWELS (as in run, first and cannon)
For this speaker, the vowel as in run has a rather long articulation, which seems to be a general
feature of her speech, observed for almost every vowel. The rate of speaking is obviously slower, so
every vowel appears to be much longer if compared with the other two speakers’ pronunciation.
This speaker, however, retains the normal long vs. short vowel distinction, with the difference of
having longer articulations than usual for short vowels, and even longer ones for long vowels.
The duration of this vowel for this speaker ranges from 150 ms to 250 ms. Surprisingly, in hut, the
articulation is around 250 ms long, which is much longer than for the other two speakers whose
articulation in hut is 50 ms for the Washington speaker, and 80 ms for the Georgia speaker. The
signal for the first formant is not prominent throughout the articulation. In many cases it almost
disappears or becomes faintly visible at best, making it more difficult to measure. The average
formant value is around 715 Hz for F1, and 1,415 Hz for F2. This confirms as a central
vowel, being somewhat “more front” than.
Fig. 58: the vowel in hut
Fig. 59: the vowel in sun
This speaker, like the rest, produces a large amount of r-coloring of the vowel when
pronouncing words like worse and church. The average value of F1 for this vowel is around 485
Hz, which is consistent with the data from Hillenbrand et al. (1995:3103) and Peterson and
Barney (1952:183). F2 is around 1,530 Hz, which places this vowel firmly in the central area of
the vowel chart. F2 and F3 are almost merged, and there is no obvious diphthongization of this
sound visible in the spectrogram.
Fig. 60: the vowel in worse
Fig. 61: the vowel in church
The vowel as in cannon is pronounced short, its length being approximately the same as for the
other two speakers, which is not always the case when other short vowels are in question. The
average value of F1 is around 500 Hz, which is similar to the value of . However, these two
vowels do not occupy the same position within the vowel chart, since the lower F2 in placed
it somewhat behind . Although the final sound in Canada was not part of the original
measurement, it is interesting to note that the Georgia and the Alabama speakers pronounce it as
what is best transcribed as [], while the Washington speaker uses , thus
pronouncing [].
Fig. 62: the unstressed vowel in Canada
Fig. 63: the unstressed vowel in appear
Vowel as in      run   first  cannon
F1               715     484     497
F2              1415    1533    1284
F3              2286    1806    1862
Table 9: Average formant frequencies (in Hz) for central vowels (Alabama speaker)
5. CONCLUSION
By looking at the data presented so far, it is possible to draw general conclusions about
the personal and/or regional characteristics of speech of the analyzed speakers. Because of the
limitations in length and volume, we will not be looking at all the noticeable differences visible
for each speaker. Hence, it must be noted that this paper is not a complete analysis and that it
does not deal with all the regional and individual characteristics that the regional dialects of
Alabama, Georgia and Washington normally exhibit.
Fig. 64: The Alabama speaker vowel chart
One general conclusion can be made by looking at the data. Voicing seems to have no
influence on the frequency of F1-F3. Measurements showed similar values in both voiced and
voiceless environments.
It is also noticeable that F1 is the lowest for front, high vowels and the highest for front,
low vowels. F2 behaves in the opposite manner: the value decreases as the tongue moves lower
in the mouth. The separation between F1 and F2 is the largest with high vowels, and decreases
towards low positions.
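These trends can be verified against the tabulated averages. The sketch below does so for the Alabama speaker's front vowels only, using the values from Table 7 ordered from the highest vowel (bleed) to the lowest (trap). It is an illustrative check rather than part of the original analysis; the helper names are arbitrary and the vowels are labeled by keyword.

```python
# (keyword, F1, F2) in Hz for the Alabama speaker, copied from Table 7,
# ordered from the highest front vowel to the lowest.
ALABAMA_FRONT = [
    ("bleed", 323, 2626),
    ("tip", 500, 2157),
    ("bed", 704, 2049),
    ("trap", 814, 2026),
]


def strictly_increasing(values):
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


def strictly_decreasing(values):
    """True if every value is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


f1 = [row[1] for row in ALABAMA_FRONT]
f2 = [row[2] for row in ALABAMA_FRONT]
f2_minus_f1 = [high - low for low, high in zip(f1, f2)]

print(strictly_increasing(f1))           # F1 rises as the vowel gets lower
print(strictly_decreasing(f2))           # F2 falls as the vowel gets lower
print(strictly_decreasing(f2_minus_f1))  # the F1-F2 separation shrinks
print(f2_minus_f1)
```

All three checks hold for this speaker's front vowels, with the separation falling from 2,303 Hz for the vowel of bleed to 1,212 Hz for the vowel of trap.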
As far as individual words are concerned, the difference in the pronunciation of new and
Canada was observed. The speakers from Washington and Georgia pronounced the word new as
[], while the Alabama speaker pronounced it []. In Canada, the speakers from Alabama
and Georgia seem to have an [] sound at the end of the pronunciation, pronouncing it
[], while the Washington speaker pronounced it [].
One common thing observed for all three speakers is the amount of r-coloring for
sounds and , which is usual and normal for most speakers of American English.
Further analysis shows that even though articulation of some short vowels was rather
long in the case of the Alabama speaker, the long vs. short relation between the vowels was still
preserved. The long vowels simply had an even longer articulation. In addition, the length of the
vowel did not affect the behavior of the formants. The articulation of was very short for all
three speakers, however formants still moved in anticipation of neighboring sounds, like .
Upon examination of the vowel charts created for each individual speaker’s first and
second formant values, it is noticeable that, in the case of certain vowels, the relative position of
the vowel is different from speaker to speaker. , for example, is similar, and its position as
being the highest and also the most front vowel for all three speakers is, therefore, confirmed. On
the other hand, some vowels seem to be pronounced at completely different positions. The most
obvious difference is the relative position of articulation for the vowel . We can see that, for
the Alabama speaker, is heavily centralized, and basically not very far from . The
centralization of is noticeable for the Georgia speaker as well, although not in such an
extreme way. Its short and lax counterpart, the vowel , seems to be at the position for all
three speakers.
Fig. 65: The Georgia speaker vowel chart
The most peculiar-looking vowel chart is that of the Alabama speaker. This speaker
seems to have a number of vowels grouped together and not very far from each other, all in the
central, mid-high area of the vowel chart. What is surprising is the fact that, unlike the other two
speakers, for this speaker seems to be not as central as the vowel , which might suggest
that this speaker clearly differentiates between these two vowels.
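How closely vowels are grouped on a chart can be expressed as a distance in the F1/F2 plane. The sketch below is a rough illustration only, assuming a plain straight-line distance in Hz between the average values in Tables 7 to 9; this treats F1 and F2 differences as equally important, which is a simplification, and the names used are illustrative. Vowels are again labeled by keyword rather than by phonetic symbol.

```python
import math
from itertools import combinations

# Average (F1, F2) in Hz for the Alabama speaker, copied from Tables 7-9.
ALABAMA = {
    "bleed": (323, 2626),
    "tip": (500, 2157),
    "bed": (704, 2049),
    "trap": (814, 2026),
    "goose": (424, 1489),
    "took": (470, 1200),
    "top": (992, 1340),
    "war": (432, 848),
    "run": (715, 1415),
    "first": (484, 1533),
    "cannon": (497, 1284),
}


def closest_pairs(vowels, n=3):
    """Return the n vowel pairs with the smallest F1/F2 distance in Hz."""
    pairs = []
    for (a, (f1a, f2a)), (b, (f1b, f2b)) in combinations(vowels.items(), 2):
        pairs.append((math.hypot(f1a - f1b, f2a - f2b), a, b))
    return sorted(pairs)[:n]


nearest = closest_pairs(ALABAMA)
for distance, a, b in nearest:
    print(f"{a} / {b}: {distance:.1f} Hz")
```

On these averages the two closest vowels are those of goose and first, about 74 Hz apart, followed by those of took and cannon.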
Fig. 66: The Washington speaker vowel chart
The only vowel that is truly back for the two speakers belonging to the Southern dialect is
the vowel . For the Washington speaker, alongside , the vowel also exhibits the
same degree of backness. remains the lowest vowel for all three speakers, although the
relative position of is somewhat different for the Alabama speaker, being somewhat higher.
Front vowels and are close in the vowel chart for the two speakers from the South.
This suggests that in instances where vowel is normally found, these speakers show
inclination towards pronouncing a sound similar to instead. As mentioned previously, the
Alabama speaker exhibits strong diphthongization of , resulting in [], which is a variant
associated with eastern New England and the South (Wells 1999:477). Other
differences, usually involving off gliding into other sounds, are presented in the main discussion
for each individual speaker and need not be repeated here.
It can be concluded that the vowel system for these speakers, for the most part, does not differ
significantly. Differences that were found may be explained as either individual idiolects, or
instances of regional variation.
REFERENCES
Chen, H. C. & M. J. Wang. 2012. An Acoustic Analysis of Chinese and English Vowels.
Retrieved from:
http://184.168.176.242/files/lebanon/An%20Acoustic%20Analysis%20of%20Chinese%20and%20English%20Vowels.pdf
[August 15, 2012].
Cruttenden, A. 2008. Gimson’s Pronunciation of English. Oxford: Oxford University Press.
Hillenbrand, J., L. A. Getty, M. J. Clark, and K. Wheeler. 1995. Acoustic characteristics of
American English vowels. Journal of the Acoustical Society of America 97(5),
3099-3111.
Kenyon, J. S. 1964. American Pronunciation, tenth edition. Ann Arbor: George Wahr Publishing
Company.
Ladefoged, P. & K. Johnson. 2010. A Course in Phonetics, sixth edition. Wadsworth: Cengage
Learning.
O’Grady, W., M. Dobrovolsky, and F. Katamba. 1997. Contemporary Linguistics: An
Introduction. Harlow: Pearson Education Limited.
Olive, J.P., A. Greenwood, and J. Coleman. 1993. Acoustics of American English Speech: A
Dynamic Approach. New York: Springer-Verlag.
Peterson, G. E., and H. L. Barney. 1952. Control methods used in a study of the vowels.
Journal of the Acoustical Society of America 24, 175-184.
Thomas, C.K. 1958. An Introduction to the Phonetics of American English, second edition. New
York: The Ronald Press Company.
Wells, J. C. 1999. Accents of English – Beyond the British Isles, volume 3. Cambridge:
Cambridge University Press.
Wells, J.C. 2000. Longman Pronunciation Dictionary. Harlow: Pearson Education Limited.
Yao, Y., S. Tilsen, R. S. Sprouse, and K. Johnson. 2010. Automated measurement of vowel formants
in the Buckeye Corpus. UC Berkeley Phonology Lab Annual Report.
Retrieved from:
http://conf.ling.cornell.edu/~tilsen/papers/Yao%20et%20al.%20-%202010%20-%20buckeye%20vowels.pdf
[August 20, 2012].