s ,s m t :a a s i - skidmore college · phenomenon in french—the melodic cliche´ of the tri- ......

22
S PEECH ,S ONG , AND THE M INOR T HIRD :A N A COUSTIC S TUDY OF THE S TYLIZED I NTERJECTION J EREMY DAY-O’C ONNELL Knox College THIS PAPER DESCRIBES THE FIRST LABORATORY study of an idiosyncratic linguistic form that represents a crucial point of contact between speech and song: what is referred to here as the stylized interjection. The stylized interjection, as described throughout the musi- cological and linguistic literature, is associated with a par- ticular intonational formula—the calling contour—and intriguingly, with a purportedly cross-cultural musical fingerprint: the interval of the minor third. A reading task was used to systematically compare the stylized interjection to four other linguistic forms, and to com- pare spoken to called production. Analysis of several acoustic variables (involving pitch, duration, intensity, and timbre) demonstrates many significant effects of sentence-type and production, which together establish the characteristics of the English stylized interjection and suggest its interpretation as sung speech. The unique sound-meaning correspondence of the stylized interjec- tion is thereby elucidated. Implications for music- language studies (especially vis-a-vis the minor third) are also discussed. Received: January 2, 2012, accepted August 7, 2012. Key words: music and language, minor third interval, calling contour, speech intonation, universals V OCABULARY AND SYNTAX, THOUGH THE chief concerns of any foreign language student, comprise only part of the requisite knowledge that constitutes full linguistic competence. Among many other, seemingly dispensable, aspects of speech that separate native from non-native speakers, are those that also convey important shades of meaning: the prosodic elements of pitch, rhythm, and intensity (including voice quality). Listeners’ sensitivity to these elements—what are, arguably, the most overtly musical features of speech—facilitates a decoding of not only the paralin- guistic emotional state of the speaker, but also of truly linguistic (and sometimes crucial) information related to the utterance itself and its place in the discourse. A complete description of linguistic structure, therefore, necessarily entails an account of prosody, including acoustic markers related to grouping, focus, and sen- tence-type. In English, for instance, manipulations of pitch can mark moments of continuation and finality within a sen- tence (though exactly how is a matter of debate; see Grover, Jamieson, & Dobrovolsky, 1987); pitch-raising and syllable lengthening are both associated with information-heavy words in contrastive sentences (Eady & Cooper, 1986); and several studies have described the sentence-final rising intonation that typically charac- terizes English yes/no questions, compared to the nor- mative sentence-final falling intonation of focus-neutral declarative statements (Hadding-Koch & Studdert- Kennedy, 1964; see also, review in Eady & Cooper, 1986). The robust intelligibility of prosody is readily apparent in cases of syntactic and/or pragmatic ambigu- ity, as illustrated by the following sentence-pairs, in which identical strings of words support multiple mean- ings that can be disambiguated only through punctua- tion and typography (in writing) or prosody (in speech). 1a ‘‘The man walked down the street, festooned with garlands.’’ 1b ‘‘The man walked down the street-festooned-with-garlands.’’ 2a ‘‘The man walked down the street.’’ 2b ‘‘The man walked down the street.’’ 3a ‘‘He did.’’ 3b ‘‘He did?’’ It will be observed that intonation plays an important role in resolving competing meanings here, most obvi- ously in the case of overall sentence-type, which can be signaled by sentence-final pitch movement (3a versus 3b); intonation, along with rhythm and intensity, can likewise clarify constituent structure (1a versus 1b) and identify discourse-sensitive information (2a versus 2b). Intonational markers of sentence-type, of course, are often redundant with syntactic and other markers, for which reason these prosodic cues can be thought of more as ‘‘regularities’’ (Bolinger 1989, p. 40) than as necessities. Moreover, regional and dialectic varieties are often characterized by prosodic idiosyncrasies, such as certain dialects of English that violate the standard fall- ing intonation for unmarked statements (Cruttenden, 1995). In any event, general phonetic descriptions of Music Perception, VOLUME 30, ISSUE 5, PP. 441–462, ISSN 0730-7829, ELECTRONIC ISSN 1533-8312. © 2013 BY THE REGENTS OF THE UNIVERSITY OF CALIFORNIA ALL RIGHTS RESERVED. PLEASE DIRECT ALL REQUESTS FOR PERMISSION TO PHOTOCOPY OR REPRODUCE ARTICLE CONTENT THROUGH THE UNIVERSITY OF CALIFORNIA PRESS S RIGHTS AND PERMISSIONS WEBSITE, HTTP:// WWW. UCPRESSJOURNALS . COM/ REPRINTINFO. ASP. DOI: 10.1525/ MP.2013.30.5.441 Speech, Song, and the Minor Third 441

Upload: phungkhanh

Post on 04-May-2018

221 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: S ,S M T :A A S I - Skidmore College · phenomenon in French—the melodic cliche´ of the tri- ... that certain ubiquitous ‘‘melodic embryos’’ involving the minor third ‘‘defy

SPE E CH, SON G, AND T HE MINOR THIRD: AN ACOUSTIC STUDY OF

THE ST YLIZE D INTERJECTION

JE REM Y DAY-O’CON NEL L

Knox College

THIS PAPER DESCRIBES THE FIRST LABORATORY

study of an idiosyncratic linguistic form that representsa crucial point of contact between speech and song:what is referred to here as the stylized interjection. Thestylized interjection, as described throughout the musi-cological and linguistic literature, is associated with a par-ticular intonational formula—the calling contour—andintriguingly, with a purportedly cross-cultural musicalfingerprint: the interval of the minor third. A readingtask was used to systematically compare the stylizedinterjection to four other linguistic forms, and to com-pare spoken to called production. Analysis of severalacoustic variables (involving pitch, duration, intensity,and timbre) demonstrates many significant effects ofsentence-type and production, which together establishthe characteristics of the English stylized interjectionand suggest its interpretation as sung speech. The uniquesound-meaning correspondence of the stylized interjec-tion is thereby elucidated. Implications for music-language studies (especially vis-a-vis the minor third)are also discussed.

Received: January 2, 2012, accepted August 7, 2012.

Key words: music and language, minor third interval,calling contour, speech intonation, universals

V OCABULARY AND SYNTAX, THOUGH THE

chief concerns of any foreign language student,comprise only part of the requisite knowledge

that constitutes full linguistic competence. Amongmany other, seemingly dispensable, aspects of speech thatseparate native from non-native speakers, are those thatalso convey important shades of meaning: the prosodicelements of pitch, rhythm, and intensity (including voicequality). Listeners’ sensitivity to these elements—whatare, arguably, the most overtly musical features ofspeech—facilitates a decoding of not only the paralin-guistic emotional state of the speaker, but also of trulylinguistic (and sometimes crucial) information relatedto the utterance itself and its place in the discourse.

A complete description of linguistic structure, therefore,necessarily entails an account of prosody, includingacoustic markers related to grouping, focus, and sen-tence-type.

In English, for instance, manipulations of pitch canmark moments of continuation and finality within a sen-tence (though exactly how is a matter of debate; seeGrover, Jamieson, & Dobrovolsky, 1987); pitch-raisingand syllable lengthening are both associated withinformation-heavy words in contrastive sentences (Eady& Cooper, 1986); and several studies have described thesentence-final rising intonation that typically charac-terizes English yes/no questions, compared to the nor-mative sentence-final falling intonation of focus-neutraldeclarative statements (Hadding-Koch & Studdert-Kennedy, 1964; see also, review in Eady & Cooper,1986). The robust intelligibility of prosody is readilyapparent in cases of syntactic and/or pragmatic ambigu-ity, as illustrated by the following sentence-pairs, inwhich identical strings of words support multiple mean-ings that can be disambiguated only through punctua-tion and typography (in writing) or prosody (in speech).

1a ‘‘The man walked down the street, festooned with garlands.’’1b ‘‘The man walked down the street-festooned-with-garlands.’’2a ‘‘The man walked down the street.’’2b ‘‘The man walked down the street.’’3a ‘‘He did.’’3b ‘‘He did?’’

It will be observed that intonation plays an importantrole in resolving competing meanings here, most obvi-ously in the case of overall sentence-type, which can besignaled by sentence-final pitch movement (3a versus3b); intonation, along with rhythm and intensity, canlikewise clarify constituent structure (1a versus 1b) andidentify discourse-sensitive information (2a versus 2b).Intonational markers of sentence-type, of course, areoften redundant with syntactic and other markers, forwhich reason these prosodic cues can be thought ofmore as ‘‘regularities’’ (Bolinger 1989, p. 40) than asnecessities. Moreover, regional and dialectic varieties areoften characterized by prosodic idiosyncrasies, such ascertain dialects of English that violate the standard fall-ing intonation for unmarked statements (Cruttenden,1995). In any event, general phonetic descriptions of

Music Perception, VOLUME 30, ISSUE 5, PP. 441–462, ISSN 0730-7829, ELEC TRONIC ISSN 1533-8312. © 2013 BY THE REGENTS OF THE UNIVERSIT Y OF CALIFORNIA ALL

RIGHTS RESERVED. PLEASE DIRECT ALL REQUESTS FOR PERMISSION TO PHOTOCOPY OR REPRODUC E ARTICLE CONTENT THROUGH THE UNIVERSIT Y OF CALIFORNIA PRESS’S

RIGHTS AND PERMISSIONS WEBSITE, HT TP://WWW.UCPRESSJOURNALS.COM/REPRINTINFO.ASP. DOI: 10.1525/MP.2013.30.5.441

Speech, Song, and the Minor Third 441

Page 2: S ,S M T :A A S I - Skidmore College · phenomenon in French—the melodic cliche´ of the tri- ... that certain ubiquitous ‘‘melodic embryos’’ involving the minor third ‘‘defy

sentence-type intonation have been proposed for manylanguages and have been studied systematically, dem-onstrating a variety of intonational systems (see, forinstance, Ladd, 2001, Section 3).

It is therefore notable that a distinct type of syntax-freeutterance has never been studied acoustically, despite itshaving been cited by many authors and in several lan-guages. Attempting to synthesize and refine theseauthors’ various observations, I will ultimately define thestylized interjection as a brief, attention-getting, butendearing (and, occasionally, mock-endearing) exclama-tion. In contrast to the widespread acknowledgement ofthe phenomenon, there has been virtually no concernwith precise acoustic description, and it is this neglectthat the present study seeks to rectify, through a con-trolled elicitation and phonetic analysis.

A ‘‘well-known intonational contour’’ . . . a ‘‘very naturalinterval.’’ The omission in the phonetic literature regard-ing this linguistic form is all the more striking, and allthe more regrettable, given that many of the authors inquestion have in fact proposed informal, quasi-musical,characterizations of its ‘‘well-known intonational con-tour’’ (Fagyal, 1997, p. 77), with a high degree of con-vergence. Varga (2008), for instance, describes the‘‘calling contour’’ in Hungarian and English as a terracedfall in pitch, and he invokes a particular (and particularlymusical) measure of pitch distance: the minor third.Dascalu-Jinga (1998) also mentions the ‘‘call contour’’and ‘‘other gentle-shaded announcements’’ in Romanian‘‘which typically [correspond] to a fall with an interval ofa minor third’’ (p. 254). With respect to English, Liber-man (1979), calls this the ‘‘vocative chant,’’ but describesit in the same musical terms, even adding tangentiallythat the minor third ‘‘seems to be a very natural intervalfor people to sing, despite being mathematically a quitecomplex proportion’’ (p. 30). Fonagy, Berard, andFonagy (1983: 156) describe a related ‘‘cliche melodique’’in Parisian French, again involving a minor third—indeed, transcribed in musical staff notation. Gibbon(1998) refers to various instances of

a ‘chanted,’ flat or stylised pitch contour . . . used inGerman to signal the opening, sustaining, andclosing of a channel of communication. It has oftenbeen pointed out that the pitch drop in this contourmay approximate . . . a musical interval, such asa minor third, though this has not been experi-mentally established (p. 91).

Like Gibbon, ‘t Hart (1998)—now concerning Dutch—repeats the by now familiar music-theoretical label evenas he questions it:

Indeed, there is a tendency to keep the pitch steadyafter the half-fall (i.e. no declination) [but] there isno need to believe that the interval is always neara minor third, as was our original impression. As wepointed out a few years ago (Cohen, Collier, &‘t Hart, 1982), the main cue for a fall of this type isthat after its occurrence, F0 has audibly not reachedthe baseline. (p. 107)

And several writers discuss apparently similar phenom-ena in other languages (Chao, 1956, on Mandarin;Bolinger, 1986, on English; Hayes & Lahiri, 1991, onBengali; chapters in Hirst & Di Cristo, 1998, by Cruz-Ferreira on Portuguese, Luksaneeyanawin on Thai, andAbe on Japanese), but with even less precise descrip-tions, leaving the reader unable to judge the phoneticsimilarity among them.

Ladd (1978, 1980) has provided an unusually carefuldescription of the phenomenon under consideration(rivaled only by Gibbon, 1976, Section 4.3), with whomhe shares a good deal), and his analysis with respect toEnglish remains the most complete and influential.(See also Ladd [2008: Section 3.2.3], for brief compar-isons to other languages, particularly regarding usage.)Ladd (1980) proposes that the calling contour is part ofa more general attitudinal effect on intonation, whichhe calls stylization: ‘‘there are certain stylized [con-tours], characterized by relatively level final pitch,which are related to other ‘plain’ [contours] character-ized by rising or falling final pitch’’ (p. 190). (In sup-port of this proposal, Ladd cites non-falling stylizedcontours, none of which, however, is as clear-cut andrecognizable as the calling contour’s stylized fall.)Moreover, Ladd emphasizes that the relationshipbetween the ‘‘calling’’ mode and the use of this stylizedintonation is ‘‘incidental’’ (Ladd, 1978). Truly urgentcalls would use non-stylized contours, as he illustratesby imagining the call, ‘‘Daddy fell downstairs,’’ in twoseparate scenarios: a genuine emergency on the onehand (non-stylized), and an embarrassing but incon-sequential cause for good-natured teasing on the other(stylized). Instead, Ladd interprets stylized intonationas indicating ‘‘a certain element of predictability orstereotype in the message,’’ in short, ‘‘routine[ness]’’(pp. 520, 522).

As for phonetics, Ladd (1978) broadly, though cau-tiously, concurs with the descriptions cited above.

The interval between these pitches is often, as bothLiberman and Gibbon observe, about a minor third;but Liberman’s implication that this interval hasprofound significance seems unwarranted. As Iwrite this, a student out on the quadrangle is calling

442 Jeremy Day-O’Connell

Page 3: S ,S M T :A A S I - Skidmore College · phenomenon in French—the melodic cliche´ of the tri- ... that certain ubiquitous ‘‘melodic embryos’’ involving the minor third ‘‘defy

her dog [‘Mor-gan’] with the two tones only a majorsecond apart. (p. 518)

Ladd is quite right in tempering any notion of a fixed,absolute interval. An apparently related intonationalphenomenon in French—the melodic cliche of the tri-angle assertif—has been analyzed as a fall of a majorthird (Fonagy, Fonagy, & Sap, 1979). And the relevantaudio examples in Ladd (2008: Section 3.2.3)—spokenby English, French, German, Hungarian, and Italianspeakers—use intervals between about two and foursemitones (a major second to a major third). But Ladd’sdismissal of Liberman’s (1979) interest in the intervalitself seems hasty. Liberman’s claim that the minor thirdis ‘‘a very natural interval . . . to sing,’’ while unsubstan-tiated, is provocative in light of the apparently wide-spread occurrence of stylized intonation as a linguisticphenomenon. Given the extent of variability across theworld’s languages, even a rough uniformity in the realmof intonation would be remarkable and could provide(as Liberman implies) some insight into a physiologicalbasis for a linguistic constraint.

Some music scholars have gone further than Liber-man in attributing deep psychic significance to theinterval of the minor third, which earned the praise ofno less a musician than Leonard Bernstein (1976, pp.16-17): ‘‘these are two very special notes,’’ he wrote,postulating a ‘‘musico-linguistic universal’’ (citing—though no less anecdotally than did Liberman,1979—the purported universal use of the minor thirdin children’s taunts, a gesture closely related to stylizedintonation, as I explain below). Brailiou (1984) insiststhat certain ubiquitous ‘‘melodic embryos’’ involvingthe minor third ‘‘defy space’’ (p. 129). Ringer (2001)calls the minor third ‘‘the basic singsong interval’’(p. 363). And van der Merwe (1992) simply muses,‘‘one of the great puzzles of music is the mysteriouslysatisfying quality of the minor third’’ (p. 120).

Clearly, the minor third has attained the status ofmyth, inspiring more poetry than proof: neither lin-guists nor musicologists have forwarded any concreteand objective evidence relevant to cross-cultural com-parisons of stylized intonation, despite the potentiallyprofound anthropological resonances of the phenom-enon. And other dimensions of this singsong speechhave likewise gone unstudied, even as a burgeoningbody of literature investigates the many points of con-tact between music and language (see Patel, 2008).Casual observation suggests that stylized speech repre-sents a ‘‘special effect’’ that is conveyed acousticallythrough some as yet unidentified combination of pitch,duration, and vocal quality.

Prospectus. The current paper forms part of a largerstudy of stylized intonation, its uses and characteristicscross-culturally, and what it might tell us about therelationship between music and language. This paperconcerns the phonetics of stylized intonation in Englishand, indeed, contains the first ‘‘instrumental’’ (i.e., auto-mated, objective) phonetic description of stylized into-nation in any language.

Guided by the spirit of phonetic discovery, I assumenothing about the intonation in question but ratherstart with the utterances themselves. I therefore intro-duce the term stylized interjection in order to classify thephenomenon under investigation. The stylized interjec-tion is a brief, attention-getting, but endearing (and,occasionally, mock-endearing) exclamation. It encom-passes short- and long-distance calls (yoo-hoo; dinner),infant-directed formulas (peek-a-boo) and certain adult-directed derivatives (bye-bye), playground taunts (nya-nya), ad hoc group chants (air-ball), and various otherplayful exchanges (uh-oh).

The question that motivates this study, then, iswhether the English stylized interjection bears any dis-tinctive acoustic characteristics, marking it as differentfrom other sorts of utterances (questions, statements,and so on). Of chief concern is the intonational contourand the matter of the minor third, but other acousticcues are investigated as well, in order to give as completea picture as possible. Furthermore, incidental questionsalluded to by some of the writers cited earlier will beconsidered, above all the status of calling, singing, andspeaking in realizations of the stylized interjection. AsLadd wrote (1978), ‘‘What exactly constitutes the ‘chant-ing’ nature of this contour is . . . not clear’’ (p. 518). Thestudy will thus not only redress a significant omission inthe musicological and phonetic literature but will formthe foundation for future cross-cultural work on a pecu-liar intonational mannerism that lies squarely at theintersection of music and language.

Method

MATERIALS

A collection of sentences was constructed with the aimof manipulating sentence-type (and hence phoneticrealization) in a controlled way, thus testing Ladd’s(1978) proposal that stylized speech involves a prosodicmodification of ‘‘plain’’ (declarative) speech, and furthertesting Gibbon’s (1976) characterization of stylizedspeech as ‘‘conspicuously different from other intona-tion patterns found both in conversational and formaldelivery’’ (p. 275). The collection was built around tentarget words (occasionally, word-pairs), each of which

Speech, Song, and the Minor Third 443

Page 4: S ,S M T :A A S I - Skidmore College · phenomenon in French—the melodic cliche´ of the tri- ... that certain ubiquitous ‘‘melodic embryos’’ involving the minor third ‘‘defy

was embedded in five different types of sentence: declar-ative statement, contrastive statement, yes/no question,wh-question, and stylized interjection. Target words werechosen that could easily and plausibly occupy thesentence-final position in each of those differentsentence-types. (These constraints resulted in the rejec-tion of some of the most obviously stylized target words,such as ‘‘yoo-hoo,’’ which were, however, used as materi-als for a practice round.) All targets were bi- or trisyllabicwith initial stress, thus facilitating a simple phoneticdescription (that is, eliminating any phonological com-plexities—unstressed low tones, etc.—that might arisefrom pre-accentual intonational features). Tables 1 and2 show the ten target words and, for two of those words,the five corresponding sentences. Complete materials aregiven in Appendix A.

SUBJECTS

The ten participants (age range ¼ 18–20 years; 5 male,5 female) were native speakers of Southern BritishEnglish who were recruited from an introductory lin-guistics course at the University of Edinburgh. (Thestudents had dealt only briefly with intonation in thatcourse, and not at all with stylized intonation.) Thestudy was described to them only as ‘‘linguistic,’’ withno indication of any concern with intonation. Each par-ticipant was paid a nominal fee for participation.

TASKS

The subjects were recorded reading the materials ina sound-insulated recording booth using an AKG CK98 hypercardioid microphone recorded via a MOTU828 firewire interface to digital (.wav) files using Sonar4 Studio Edition software at 48kHz, 16 bit mono. Thesentences and all instructions were presented on a laptoprunning a PowerPoint slideshow, which the subjects

advanced with keystrokes. The presentation involveda training phase, followed by the experiment.

A considerable challenge in the design and executionof this experiment was the difficulty of ensuring thatspeakers delivered the sentences naturally—an inevita-ble difficulty in experimental linguistics but one that isperhaps especially acute in the case of stylized intonation,which is to some extent ritualistic, artificial, and ‘‘per-formed.’’ In addition, subjects could not be expected tohave a ready concept of stylized intonation that could bestraightforwardly summoned, as they could for state-ments and questions. The target words and sentenceswere thus chosen carefully so as to minimize any attitu-dinal ambiguity or any latitude the speaker might feel indelivery. Furthermore, the presentation of sentencesincluded typographic cues in order to highlight the dif-ferences among them: statements and questions werefurnished with the appropriate end punctuation, whilethe stylized utterances were italicized and followed byan exclamation mark.

During a brief training phase, verbal and graphicalprompts were given (see Appendix B), including instruc-tions describing ‘‘statements and questions [to be] spo-ken as if making a genuine statement or question,’’ and‘‘interjections . . . [to be] spoken in a carefree, friendlymanner—to someone close by but who may or may notbe looking at you.’’ At no time were the sentence-typesdemonstrated orally, nor was further prompting or feed-back given. The training phase concluded with a practiceround of twelve sentences (four questions, four state-ments, four stylized interjections), presented in the man-ner of the subsequent experiment. Neither thosesentences nor their target words were used in the exper-iment itself; the practice round was intended, rather, toprime the subjects for the various sentence-types. Thesubjects were invited to repeat the practice round if sodesired (only Subject 10 did).

During the experiment, subjects were recorded read-ing the 50 sentences, which were each presented ona separate slide. Sentences were blocked by target word,with a randomized ordering of sentence-type varyingfrom block to block. (This randomization was intended

TABLE 1. The Target Words.

dinner Fido peek-a-boo see you thank youJohnny pie’s done careful telephone morning

TABLE 2. Sentences for Two of the Target Words.

Dinner Johnny

Declarative I’ll cook dinner. I’m looking for Johnny.Contrastive You cook lunch, I’ll cook dinner. I’m not looking for Jennifer, I’m looking for Johnny.Y/N question Shall I cook dinner? Have you seen Johnny?Wh-question What shall I cook for dinner? Where’s Johnny?Stylized Dinner! Johnny!

444 Jeremy Day-O’Connell

Page 5: S ,S M T :A A S I - Skidmore College · phenomenon in French—the melodic cliche´ of the tri- ... that certain ubiquitous ‘‘melodic embryos’’ involving the minor third ‘‘defy

to prevent the speakers from falling into a predictable,automatic delivery; the ordering of sentences was con-sistent from speaker to speaker.) After speaking thecomplete set of 50 sentences, subjects were then askedto repeat the task using a raised (‘‘called’’) voice; thesetwo ‘‘production’’ conditions (‘‘spoken’’ and ‘‘called’’)constituted a further control that aimed to address theunsettled relationship between calling and stylizedintonation. The ‘‘called’’ condition was preceded bya practice round as before, including the followinginstructions on the computer: ‘‘The text items will nowbe repeated. Please speak just as you did before, but nowin a raised voice, calling out loudly so as to be heardthrough the glass [window of the recording booth].’’Figure 1 shows the waveform and pitch-track for thecomplete set of utterances for a single subject and targetword.

MEASUREMENT AND ANALYSIS

In all, 1,000 tokens were recorded: 10 subjects x 10target words x 5 sentence-types x 2 production condi-tions. Each of these 1,000 tokens was isolated as anindividual audio file and manually segmented into twosyllables using the phonetics software package, Praat(version 5.1.20; Boersma & Weenink, 2010). Syllabicsegmentations were based on the spectrogram andincluded all surrounding phonemes, with the followingsimplifications: 1) in all cases, the first syllable began atthe point of voicing onset; 2) for the three cases ofinternal nasals, those nasals were not included in eithersyllable (‘‘di-nn-er,’’ ‘‘Joh-nn-y,’’ and ‘‘mor-n-ing’’); 3) inthe two cases in which the target words were not them-selves bisyllabic, a bisyllabification was neverthelesscontrived (joining syllables in ‘‘teleph-one’’ and omit-ting the middle syllable in ‘‘pee-ka-boo’’). (The othertarget words were segmented as ‘‘see-you,’’ ‘‘than-kyou,’’‘‘Fi-do,’’ ‘‘care-ful,’’ ‘‘pie’s-done.’’) These syllabificationswere adopted for the sake of reliably and reproduciblyprocessing what was a large number of tokens.

A variety of acoustical measurements involving pitch,harmonicity, intensity, and duration (described below)were taken and compared across sentence-types andproduction conditions. The statistical analysis consistedof within-subjects analysis of variance on each acousticvariable, with factors sentence-type and production;the analysis included pairwise comparisons amongsentence-types (using the Bonferroni correction), andfurther within-subjects analysis of variance in order toverify any sentence-production interactions. (Effectsand interactions involving the third manipulated vari-able—target word—were not considered of interest tothis particular experimental design and are not reported

here.) For one analysis—interval—significant datareduction was carried out that necessitated a modifiedtest of effects using Welch’s ANOVA.

Syllabic pitch analysis. For each token, a pitch-track wasgenerated using Praat’s autocorrelation algorithm. As istypical of automated pitch extraction, however, obviousanomalies (such as octave errors) arose; to ensure fidelityto the original sound, such anomalies were cautiouslyadjusted.1 For a number of tokens, the pitch-track wasseriously incomplete or otherwise judged to be insuffi-cient due to weak phonation on the target word. Sincea majority (56%) of Subject 5’s spoken-condition pitch-tracks were so judged, all of that subject’s tokens wererejected. Of the remaining 9 subjects (and 900 tokens),only 25 (2.78%) of the pitch-tracks were rejected (nomore than 6% for any one subject), and imputed valueswere substituted (corresponding to the subject’s meanfor the given variable2), in order to enable within-subjects ANOVA. For all other, non-pitch-related anal-yses described below, all 10 subjects and all 1,000 tokenswere retained.

Based on the pitch-track, each syllable was assigneda pitch corresponding to that syllable’s mean funda-mental frequency (F0) on a semitone scale (relative to1 Hz), which was calculated automatically by Praat. Inorder to minimize inclusion of syllable-boundary pitchperturbations (which could be artifacts of obstruentconsonants and other vocal-articulatory phenomena),a conservative analysis window was used: the syllableswere trimmed so that only the middle 80% of each wasanalyzed (omitting the first 15% and last 5% of syllable1, and the first 10% and last 10% of syllable 2; thedifference in the location of these two windows was

1 Adjustments consisted of some combination of the followingtechniques: 1) The speaker’s range settings were occasionally tailored tothe given token so as to exclude inappropriately low or (more often) highpitch candidates. 2) Outlying F0 points were transposed by octave and/orfifth to better match the prevailing track whenever the transposed F0 wasalready a pitch candidate (this is an adjustment that Praat facilitates). 3)When such a transposition was not possible, F0 points that were obviousoutliers (more than a perfect fourth away from neighboring trackedpoints) were discarded. 4) In addition, for utterance-final F0 points, someless egregious outliers were deemed errant and discarded on the basis ofa visibly degraded waveform and/or an immediately preceding stretch ofphonation that had fallen beneath the algorithm’s analysis threshold (thatis, a preceding stretch that was judged by Praat as voiceless). F0 pointswere never added—only discarded. Other parameters adjustable in Praatwere kept at their default values (silence threshold¼ 0.03, voicing thresh-old ¼ 0.45, octave cost ¼ 0.01, octave-jump cost ¼ 0.35, voiced/unvoicedcost ¼ 0.14).

2 An alternative imputation, corresponding to the target word’s meanvalue across subjects, yielded the same statistically significant main effectsand interactions as are reported here.

Speech, Song, and the Minor Third 445

Page 6: S ,S M T :A A S I - Skidmore College · phenomenon in French—the melodic cliche´ of the tri- ... that certain ubiquitous ‘‘melodic embryos’’ involving the minor third ‘‘defy

FIGURE 1. Waveform and pitch-track for Subject 8’s “dinner” utterances in the spoken condition (left) and called condition (right).

446 Jeremy Day-O’Connell

Page 7: S ,S M T :A A S I - Skidmore College · phenomenon in French—the melodic cliche´ of the tri- ... that certain ubiquitous ‘‘melodic embryos’’ involving the minor third ‘‘defy

motivated by the presence of noisy onsets). Thesetrimmed syllables were taken as the basic units of thepitch analysis. (See Figure 2 for a schematic illustrationof the analysis.) Subsequently, for each syllable, the stan-dard deviation of pitch was used as a measure of syllabicpitch variability.

Harmonicity analysis. Harmonicity—the ratio of har-monics to noise—was taken as an indicator of vocalquality, to test the effect of sentence-type on vocal pro-duction (for instance, the status of stylized speech as‘‘singsong’’). Mean harmonicity, again automated byPraat, was computed for each token as well as for eachsyllable, according to the following definition: ‘‘Harmo-nicity is expressed in dB: if 99% of the energy of thesignal is in the periodic part, and 1% is noise, the Har-monics-to-Noise-Ratio is 10*log10(99/1) ¼ 20dB. AHNR of 0dB means that there is equal energy in theharmonics and in the noise’’ (Boersma & Weenink,2010). Since consonants are by their nature more acous-tically noisy than vowels, there existed the possibility

that the consonant boundaries within each token mightinterfere with harmonicity measurements in a non-random way (for instance, in proportion to word dura-tion). Therefore, as with the pitch analysis (describedabove), a window was constructed within each syllableso as to isolate the vowel itself: for the syllabic harmo-nicity analysis, the corresponding pitch-analysis win-dows were used, but further trimmed to their middle50% (thus resulting in 40% of each original syllable).Syllabic harmonicity data also required imputation,though much less so than did the pitch data: Praat’sanalysis of syllable 2 harmonicity yielded seven ‘‘unde-fined’’ values (i.e., 0.7% of the 1,000 under analysis),which were imputed as described above.

Intensity analysis. Since recording levels had beenadjusted from speaker to speaker and between produc-tion conditions, absolute intensity could not be usefullycompared. Instead, the intersyllabic intensity differen-tial was calculated—that is, the absolute difference inmean intensity between the syllables of a given token.

FIGURE 2. Schematic illustration of the basic pitch analysis: the phonemic segmentation (see spectrogram, bottom) yields two syllables; each of these

is trimmed by a fixed proportion to form an analysis window; and syllabic pitch is then calculated by averaging across the respective window (see pitch-

track, top).

Speech, Song, and the Minor Third 447

Page 8: S ,S M T :A A S I - Skidmore College · phenomenon in French—the melodic cliche´ of the tri- ... that certain ubiquitous ‘‘melodic embryos’’ involving the minor third ‘‘defy

This measure was taken as an indication of intersyllabicevenness. (Intensity was measured using the same anal-ysis windows as the harmonicity analysis.)

Duration analysis. Durations for each token were com-puted for the target word as well as for the constituentsyllables (according to the original syllabification).

Interval analysis. While raw pitch data can be readily, ifcrudely, extracted (as above), the musical concept ofinterval—that is, the measure of distance between dis-crete pitches—requires a more refined approach, sincespeech typically involves a continuous, and continu-ously varying intonational contour. In measuring inter-val, therefore, substantial data reduction was carried outso as to confine the analysis to sufficiently level-tonedtokens. (Levelness has been, after all, the chief definingcriterion for stylized speech in the literature, as men-tioned above.) First of all, the pitch-tracks were simpli-fied using an automated pitch-processing module thatruns in Praat: the Prosogram (Mertens, 2004). (This isthe approach endorsed by Patel, 2008, and followed byboth Patel, Iversen, & Rosenberg, 2006, and by Tierney,Russo, & Patel, 2008.) The prosogram algorithm appliesstatistical and speech-perceptual calculations that pro-duce perceptually satisfactory straight-line approxima-tions of a pitch-track input; any resulting prosodicsegments having F0 slope below the algorithm’s ‘‘glis-sando threshold’’ are simplified into a level tone. (SeeFigure 3 for sample prosogram outputs.) In order tocapture a robust data set, constraints on token inclusionwere relaxed slightly: 1) all tokens with valid pitch-tracks were taken as candidates (recall that for a previouswithin-subjects ANOVA on pitch, Subject 5 had beenexcluded altogether); 2) the algorithm’s ‘‘glissandothreshold’’ was raised from the preset value of 0.32 to0.48 semitones/s. Tokens that produced level prosogramoutputs for both syllable 1 and syllable 2 were classifiedas ‘‘level-toned.’’ In all, 558 tokens passed this levelnesstest (of which declaratives comprised 21%, contrastives18%, yes/no questions 15%, wh-questions 19%, andstylized 27%).3 Consequently, interval was defined asthe difference, in semitones, between the two level tonesof the token’s prosogram output.

The literature on the stylized interjection has almostexclusively concentrated on falling contours, and

indeed, the stylized tokens in the present study wereoverwhelmingly (97%) falling. Furthermore, a decidedmajority of the other sentence-types (79% of declara-tives, 89% of contrastives, 62% of yes/no questions, and91% of wh-questions) likewise had falling contours.4 Itwas assumed that rising and falling contours (especiallyin utterance-final position) represented categoricallydifferent pragmatic intents by the speakers and hencewere not comparable; in any case, rises and falls wouldseem to represent categorically different sorts of vocaleffort (Gussenhoven, 2004, pp. 85-89). For these rea-sons—in short, the desire for consistency with previousstudies, the desire for comparability among tokens, andthe sheer prevalence of falling contours—the intervalanalysis was further confined to the 476 of the level-toned tokens (85% of the 558) with falling contours:94 declaratives, 92 contrastives, 50 yes/no questions,97 wh-questions, and 143 stylized interjections. Thisreduced data set could not support a within-subjectsanalysis of variance because of the abundance of miss-ing values, and furthermore, the data set was found tohave non-homogeneous variances; consequently,a between-subjects analysis of variance was conductedusing Welch’s ANOVA (and Games-Howell correctionfor multiple comparisons).

Results

As described above, the statistical analysis consistedof: 1) analysis of variance on each acoustic variable(syllabic pitch, syllabic pitch variability, syllabic andwhole-word harmonicity, intensity differential, syllabicand whole-word duration, and interval) with factorssentence-type (declarative, contrastive, yes/no question,wh-question, stylized) and production (spoken, called);2) further analysis of variance in order to verify anysentence-production interactions; and 3) pairwise com-parisons among sentence-types. Of the 36 significancetests conducted (data sets for the twelve variables, withtwo factors and one interaction for each), eight failedMauchly’s Test of Sphericity at the .05 level; in thosecases, statistics were computed using the Greenhouse-Geisser correction.

Most of the variables just described showed some effectof sentence-type and/or production at the .01 significancelevel, and a few showed a sentence-production interac-tion. Pairwise comparisons (reported here using the Bon-ferroni correction, i.e., by multiplying the p-value of each

3 For the sake of comparing prosogram-diagnosed levelness with themore primitive pitch variability data mentioned earlier, independent-samples t-tests were conducted to assess the correlation of pitch variabil-ity with ‘‘level’’ and ‘‘non-level’’ tones. For both syllables, a significantnegative correlation was confirmed between prosogram-diagnosed level-ness and raw F0 variability: for syllable 1, t(873) ¼ �24.62, p < .001; forsyllable 2, t(848) ¼ �20.42, p < .001.

4 The relatively high proportion of falls among yes/no questions mightbe surprising to American speakers of English; British English allows thepossibility of falling intonation for these questions.

448 Jeremy Day-O’Connell

Page 9: S ,S M T :A A S I - Skidmore College · phenomenon in French—the melodic cliche´ of the tri- ... that certain ubiquitous ‘‘melodic embryos’’ involving the minor third ‘‘defy

FIGURE 3. Prosogram outputs for Subject 8’s “see you” tokens in the spoken condition (right), derived from original pitch-track (left).

Speech, Song, and the Minor Third 449

Page 10: S ,S M T :A A S I - Skidmore College · phenomenon in French—the melodic cliche´ of the tri- ... that certain ubiquitous ‘‘melodic embryos’’ involving the minor third ‘‘defy

analysis by the corresponding total number of pairwisecomparisons) demonstrated particularly distinctive fea-tures in the stylized interjection among the fivesentence-types. Table 3 gives the mean values for eachacoustic measure, organized by sentence-type and pro-duction (and in all cases calculated from the original,un-imputed, data sets); Table 4 summarizes the maineffects, while Table 5 lists the pairwise differences.

Before detailing these results, a methodological worrymust be addressed: the systematic comparison of thestylized interjection to other sentence-types necessarilyinvolved target words that came at the ends of sentencesof varying length. Since this uncontrolled variable ofsentence-length varies according to sentence-type (thefactor of primary interest), the possibility of a confoundarises. Mean lengths (in syllables) for each sentence-type

TABLE 3. Mean Values for all Dependent Variables, by Sentence-type and Production.

measure units production Declarative Contrastive Y/N Question Wh-Question Stylized

Pitch (syl1) st re: 1Hz spoken 87.2 87.3 90.8 89.5 94.4 ycalled 93.2 93.1 95.3 94.1 97.4 y

Pitch (syl2) st re: 1Hz spoken 84.4 83.6 87.8 84.4 91.1 ycalled 87.4 85.9 91.7 86.6 95.0 y

Pitch variability (syl1) SD (st) spoken 0.92 1.03 1.16 1.02 0.69 ycalled 0.83 0.93 1.00 1.02 0.65 y

Pitch variability (syl2) SD (st) spoken 0.80 0.97 1.40 1.05 0.80 ycalled 1.46 1.68 1.82 1.86 0.72 y

Harmonicity (word) dB spoken 8.63 7.80 8.62 9.24 13.36 ycalled 9.52 8.03 9.25 8.65 13.60 y

Harmonicity (syl1) dB spoken 11.23 11.13 11.85 12.35 13.63 ycalled 12.17 y 11.19 11.93 11.60 11.81

Harmonicity (syl2) dB spoken 8.77 6.69 7.71 9.09 15.65 ycalled 8.29 6.28 8.45 7.08 17.11 y

Intensity differential dB spoken 6.21 7.59 6.92 6.73 5.23 ycalled 7.36 7.51 5.74 7.13 4.08 y

Duration (word) ms spoken 468 458 461 475 537 ycalled 555 533 546 542 634 y

Duration (syl1) ms spoken 170 167 170 y 167 167called 215 y 210 208 211 210

Duration (syl2) ms spoken 220 213 220 231 308 ycalled 261 245 269 258 369 y

Interval size (falls) st spoken 4.37 4.17 7.96 5.85 3.14 ycalled 6.57 6.70 8.23 7.09 2.30 y

Note: For each row, one extreme value (either a high or a low) is indicated (y). (See also Appendix C for supplemental pitch data, including measurements in hertz.)

TABLE 4. Summary Table of Effects and Interactions for Each of the Twelve Measures.

Effect of Sentence-typea Effect of Production Sentence-Production Interaction

df F p df F p df F p

Pitch (Syl 1) (1.73, 13.85) 41.74 < .001 (1, 8) 38.83 < .001 (4, 32) 7.64 < .001Pitch (Syl 2) (1.46, 11.68) 64.07 < .001 (1, 8) 25.29 .001 (4, 32) 3.43 .02Variability (Syl 1) (2.12, 17.03) 2.99 .08 (1, 8) 0.89 .37 (4, 32) 0.46 .77Variability (Syl 2) (2.06, 16.49) 6.76 .007 (1, 8) 10.22 .01 (4, 32) 3.73 .01Harmonicity (Word) (1.55, 14.01) 25.14 < .001 (1, 9) 0.95 .36 (4, 36) 1.43 .25Harmonicity (Syl 1) (4, 36) 1.91 .13 (1, 9) 0.73 .41 (4, 36) 2.38 .07Harmonicity (Syl 2) (1.50, 13.57) 30.42 < .001 (1, 9) 0.05 .84 (4, 36) 3.23 .02Intensity differential (4, 36) 10.90 < .001 (1, 9) 0.14 .72 (4, 36) 5.17 .002Duration (Word) (1.47, 13.29) 13.72 < .001 (1, 9) 17.42 .002 (4, 36) 2.09 .10Duration (Syl 1) (4, 36) 0.32 .86 (1, 9) 36.24 < .001 (4, 36) 0.88 .49Duration (Syl 2) (1.30, 11.70) 38.14 < .001 (1, 9) 15.91 .003 (4, 36) 4.25 .006Intervalb (4, 176.56) 49.78 < .001 (1, 415.13) 3.94 .05 (4, 379) 10.89 < .001

a Non-integer degrees of freedom indicate non-spherical data sets; Greenhouse-Geisser corrected values are given. b Data reduction for interval analysis required Welch’sANOVA.

450 Jeremy Day-O’Connell

Page 11: S ,S M T :A A S I - Skidmore College · phenomenon in French—the melodic cliche´ of the tri- ... that certain ubiquitous ‘‘melodic embryos’’ involving the minor third ‘‘defy

were as follows: contrastives, 11.8; wh-questions, 6.1;declaratives, 5.7; yes/no questions, 5.7; stylized interjec-tions, 2.2. Both contrastives and stylized interjections,that is, represent outliers along this dimension, each toa comparable degree. Yet, tallies of significant pairwisedifferences between sentence-types, while modestlyhigher for contrastives, are pronouncedly so for stylizedinterjections: out of 96 comparisons for each sentence-type (12 variables x 2 conditions, x 4 sentence-types),the stylized interjection had 51 significant pairwise dif-ferences; the contrastive, 22; the declarative, 19; the yes/no question, 18; the wh-question, 14. Furthermore, ten-tative comparisons between the stylized interjection andnon-stylized bisyllabic utterances in another recentstudy (Curtis & Bharucha 2010, to be discussed below),likewise suggest that communicative intent, more thansentence-length, provides the basis for the acoustic dif-ferences in question.

Syllabic pitch results. Significant main effects of produc-tion (p < .001) and of sentence-type (p < .001) on thepitch of syllable 1 were found, as was a sentence-production interaction (p < .001). In both productionconditions, the stylized interjection had the highestmean value among the five sentence-types. In both con-ditions, significant pairwise differences were foundbetween the stylized interjection and all othersentence-types (p � .006), with the exception of theyes/no question in the called condition; the only othersignificant pairwise difference was that between spokendeclaratives and yes/no questions (p ¼ .004). The effectof production was significant for all sentence-types(p � .001), with ‘‘calling’’ consistently associated withhigher mean pitch.

Similar main effects of production (p ¼ .001) and ofsentence-type (p < .001) on the pitch of syllable 2 werefound, though in this case, the sentence-production inter-action was not significant at this alpha level. The stylizedinterjection again had the highest mean value among thefive sentence-types. There were again significant pairwisedifferences between the stylized interjection and all other

sentence-types (p < .001), with the exception of the yes/no question, as well as between contrastives and twoother sentence-types (declaratives and yes/no questions,both with p ¼ .005). ‘‘Calling’’ was again consistentlyassociated with higher mean pitch.

No effects of either production or sentence-type on thepitch variability of syllable 1 were found. On the otherhand, a significant effect of sentence-type on the pitchvariability of syllable 2 was found (p < .007). The styl-ized interjection had the lowest mean value among thefive sentence-types, but no pairwise differences werefound to be significant.

Harmonicity results. A significant main effect of sentence-type on word harmonicity was found (p < .001). Thestylized interjection had the highest mean value amongthe five sentence-types. Significant pairwise differenceswere found between the stylized interjection and eachof the other sentence-types (p � .006); the only othersignificant pairwise difference was that between declara-tives and contrastives (p ¼ .003).

Analysis of syllabic harmonicity shows that thiseffect of sentence-type was confined to the second syl-lable of the target word: whereas no significant effectwas found for sentence-type on the harmonicity of syl-lable 1, there was a significant main effect of sentence-type on the harmonicity of syllable 2 (p < .001). Thestylized interjection had the highest mean value amongthe five sentence-types, by a factor of roughly two. Sig-nificant pairwise differences were found, exclusivelybetween the stylized interjection and each of the othersentence-types (p < .003).

Intensity results. A significant main effect of sentence-type on intersyllabic intensity differential was found(p < .001), as was a sentence-production interaction(p ¼ .002). (Subsequent within-subjects ANOVAs foreach sentence-type failed to confirm any significanteffects of production, however.) In both production con-ditions, the stylized interjection had the lowest meanvalue among the five sentence-types—that is, the highest

TABLE 5. List of Pairwise Differences Among Sentence-types, with Respect to the Phonetic Measures (at .01 Significance Level).

declarative contrastive yes/no question wh-question

contrastive P2,H

yes/no question (P1),(I) P2,[ID],[D2],(I)

wh-question

stylized P1,P2,H,H2,ID,D2,[I] P1,P2,H,H2,ID,D2,[I] (P1),H,H2,ID,D2,I P1,P2,H,H2,ID,D2,I

Note: P1/2 ¼ pitch of syllable 1/2; V1/2 ¼ pitch variability of syllable 1/2; H ¼ whole-word harmonicity; H1/2 ¼ harmonicity of syllable 1/2; ID ¼ intensity differential;D ¼ whole-word duration; D1/2 ¼ duration of syllable 1/2; I ¼ interval; () ¼ spoken condition only; [] ¼ called condition only.

Speech, Song, and the Minor Third 451

Page 12: S ,S M T :A A S I - Skidmore College · phenomenon in French—the melodic cliche´ of the tri- ... that certain ubiquitous ‘‘melodic embryos’’ involving the minor third ‘‘defy

degree of intersyllabic evenness—and the contrastive hadthe highest. In both production conditions, significantpairwise differences were found between the stylizedinterjection and all other sentence-types (p� .009); theonly other significant pairwise difference was thatbetween called contrastives and yes/no questions(p ¼ .008).

Duration results. Significant main effects of production(p ¼ .002) and of sentence-type (p ¼ .001) on wordduration were found. The stylized interjection had thelongest mean value among the five sentence-types,though no pairwise differences were found to be signif-icant. For each sentence-type, ‘‘calling’’ was consistentlyassociated with longer mean duration.

Analysis of syllabic duration shows that these effectswere dominated by the second syllable of the targetword. While significant main effects of production onduration were found for both syllable 1 and syllable 2,(p < .001 and p ¼ .003, respectively), an effect ofsentence-type was confined to the duration of syllable2 (p < .001), in which a sentence-production interactionwas also found (p¼ .006). (The lengthening of syllable 2is consistent with the anecdotal claims of Hayes andLahiri, 1991.) In both production conditions, the styl-ized interjection had the longest mean value among thefive sentence-types. In both conditions as well, signifi-cant pairwise differences were found between thestylized interjection and each of the other sentence-types (p � .009); the only other pairwise difference wasthat between called contrastives and yes/no questions(p ¼ .008). The effect of production was significant forall sentence-types but contrastives (p� .009), with ‘‘call-ing’’ consistently associated with longer mean duration.

Interval results. As described above, interval data wascollected for the 558 tokens for which such data wereinterpretable—that is, tokens with sufficiently leveltones—and the analysis itself was restricted to the 476of these that used a falling interval.

A significant main effect of sentence-type (p < .001)on interval was found, as was a sentence-productioninteraction (p < .001).5 For both production conditions,the stylized interjection had the smallest mean valueamong the five sentence-types—in the called condition,smallest by a factor of 35%. In the spoken condition,significant pairwise differences were found between

stylized interjections and both types of questions(p < .001), and between yes/no questions and all othersentence-types but wh-questions (p < .001); in the calledcondition, significant pairwise differences were foundexclusively between the stylized interjection and allother sentence-types (p < .001). The effect of productionwas significant for declaratives (p ¼ .001), contrastives(p < .001), and stylized interjections (p ¼ .003), but notfor questions. This effect, furthermore, interactedinconsistently with sentence-type: ‘‘calling’’ was associ-ated with a smaller mean interval in the case of thestylized interjection but a larger one in the case of theother sentence-types.

Figure 4 compares the distribution of interval sizesacross sentence-types to reveal a uniquely pronouncedpeak for the stylized interjection, centered close toa minor third: 49.7% of interjections had an intervalsize of 3+1st, and the median interval for the stylizedtokens in this bin was 2.70 st, close to their overallmedian (2.39 st) and mean (2.72 st). (A more finelybinned distribution shows a comparably exaggeratedpeak for the stylized interjection at 2.5 [+.5] st, witha median interval in that bin of 2.46 st.) In contrast, thedistributional peaks for the other sentence-types allremained well below 25%. The stylized interjection’sexceptional distribution curve is also apparent in a com-parison of the interval standard deviations for eachsentence-type: declarative, 3.25; contrastive, 3.17; yes/no question, 3.97; wh-question, 3.76; stylized, 1.70.

The interaction of production is well illustrated inFigure 5, which compares the stylized interjection to allother sentence-types en masse between production con-ditions; and, in both conditions, the 3+1st bin stillrepresents a marked distributional peak for the stylizedinterjection (and, as it happens, a local trough for thenon-stylized sentence-types). At least as noteworthy isthe precipitous drop-off in the stylized interjection’sdistribution at the major third (four semitones): fewerthan 15% of stylized interjections used an interval span-ning a major third or greater, whereas more than 68% ofthe non-stylized tokens did.

Since the stylized interjection was distinguished bya higher overall pitch level (see above; and likewise forcalling production in general), one must consider thepossibility that its smaller mean interval is an artifact ofpitch heightening. Musical (i.e., logarithmic) pitch scalingentails a dependency between these two measures: a givenhertz difference between two pitches will representa smaller interval size, the higher the absolute registerof those pitches. To explore this possibility, a Pearson’scorrelation test was conducted, which showed no signif-icant correlation between a token’s interval size and its

5 The non-parametric Welch’s ANOVA, which was required for theinterval data, analyzes only one factor at a time, but a comparison ofresults for each factor strongly implied a sentence-production interaction;that interaction was therefore measured using a standard factorialANOVA.

452 Jeremy Day-O’Connell

Page 13: S ,S M T :A A S I - Skidmore College · phenomenon in French—the melodic cliche´ of the tri- ... that certain ubiquitous ‘‘melodic embryos’’ involving the minor third ‘‘defy

overall mean pitch—whether that mean pitch was mea-sured on a semitone scale, r(466) ¼ .03, p ¼ .54, or ona hertz scale, r(466)¼ .02, p¼ .64. Differences in intervalsize, that is, appear to represent a genuine effect of sen-tence-type.

Finally, since the stylized interjection’s interval dis-tribution was largely confined to the lowest two bins(1+1st and 3+1st) and because the nature of the dis-tributional peak is of central interest to the research

question, the tallies of those bins were scrutinized usinga binomial test under the assumption of a uniformprobability distribution. That is, the noteworthinessof the distributional peak was tested via the nullhypothesis that the stylized tokens should fall intoeither bin with equal probability. This test revealeda divergence from uniformity approaching significance(p ¼ .05); further tests confined to each productioncondition found a significant divergence in the case of

FIGURE 5. Distribution of falling intervals, comparing the stylized interjection to all others, in the two production conditions.

FIGURE 4. Distribution of falling intervals by sentence-type across production conditions.

Speech, Song, and the Minor Third 453

Page 14: S ,S M T :A A S I - Skidmore College · phenomenon in French—the melodic cliche´ of the tri- ... that certain ubiquitous ‘‘melodic embryos’’ involving the minor third ‘‘defy

the spoken condition (p ¼ .03) but not in the calledcondition (p ¼ .36).

Discussion

SUNG SPEECH

The foregoing results demonstrate the ‘‘phonetic real-ity’’ of the stylized interjection, which distinguishes itselffrom other established sentence-types along a numberof acoustical dimensions. Furthermore, pairwise com-parisons indicate that the stylized interjection is farmore distinct from other sentence-types than are thoseother sentence-types from one another. Finally, giventhe inevitable and methodologically treacherous vari-ability that can be expected from context-free (i.e., lab-oratory-elicited) speech, the categorically smallervariability in interval size for stylized interjectionsunderscores the specialness of these utterances. Themere fact of these numerous and striking phonetic dif-ferences would seem to mark the stylized interjection aslinguistically anomalous, aptly serving its purportedfunction as a call to attention by virtue of its saliencealone, its difference from ‘‘normal’’ sentential speech.

But the precise character of those phonetic differencescalls for further interpretation. Recall that the stylizedinterjection was performed, on average, with: 1) higherpitch, 2) higher pitch stability in the second syllable,3) higher harmonicity, 4) greater evenness of intensity,5) longer duration (especially in the second syllable),and 6) smaller interval than all of the other sentence-types. These features may be summarized as heighten-ing, leveling, refining, lengthening, and modulating—allof which are conventionally associated with the vocalproduction of singing as opposed to speaking.6 Theperformance of the stylized interjection, that is, maybe thought of as sung speech.

The interpretation of the stylized interjection as sungspeech helps to elaborate an important insight of Ladd’s(1978), mentioned above: that stylization, rather thanstrictly a calling gesture, can be thought of as a softening

gesture, modifying a message by instilling ‘‘a certainelement of predictability,’’ ‘‘stereotype,’’ or ‘‘routineness.’’I propose that English speakers (and, quite possibly,speakers in cultures throughout the world) possess a par-ticular intuition concerning vocal sound, which can bedistilled as an opposition between seriousness andplayfulness:

speech () song(serious) (playful)

If we understand the stylized interjection as singing,and we understand singing as playful—as ‘‘performed’’and involving artifice—then we can explain the partic-ular (and peculiar) relationship between form and func-tion in stylized intonation: a speaker ‘‘breaks into song,’’as it were, precisely to indicate that a particular request,call, or other utterance is not urgent (and perhaps noteven informative). A converse example supports thepoint: in the early days of rock-n-roll, singers wouldoccasionally deliver one verse of a song in a self-consciously spoken mode (e.g., Elvis Presley, ‘‘Are youlonesome tonight?’’); this ‘‘breaking into speech’’ seemsto convey a special intimacy and earnestness.

This is not to deny the role that singing has in themost serious of music-making—from the heart-wrenching confessions of Romantic Lied to the mysticalincantations of sacred chant. ‘‘Artifice,’’ after all, canencompass everything from the trivial to the sublime.I propose that particular musical, contextual, and con-ventional details can position a given musical utterancevariously within that continuum, but that all of music isultimately grounded in a common principle of song asoutside-of-reality. This view is consistent with that ofHuizinga (1949), who sees play as a primal element ofhuman culture: ‘‘in everything that pertains to music wefind ourselves in the play-sphere’’ (p. 164); music‘‘transports audience and performers alike out of ‘ordi-nary’ life’’ (p. 42).

As in human interaction more generally, the conceptof ‘‘play’’ in speech is complicated and at times contra-dictory. For instance, stylized intonation accompaniesnot only the cheerful greetings of adults, as emphasizedin my examples above, but also the sometimes crueltaunts of schoolchildren. The former style of play isgenuine; the latter, sinister. Ultimately, the ‘‘softening’’of speech into play might amount to a sort of process ofinfantilization, explaining its use in these varied prag-matic situations. Consider a more lexical example: whileadult lovers might refer to one another affectionately as‘‘baby’’ (a conventional, and entirely welcome instanceof infantilization), that same locution among childrenon the school yard would be intended as anything but

6 Unfortunately, systematic explorations of acoustic differencesbetween singing and speech have been rare; furthermore, such researchhas tended to focus on trained singers, consequently emphasizingcultivated singerly devices such as vibrato and the ‘‘singer’s formant.’’Nevertheless, some prior researchers have identified higher pitch (Asoet al., 2010), higher harmonicity (Lundy, Roy, Casiano, Xue, & Evans,2000), higher pitch stability (Lindblom & Sundberg, 2007), andrhythmic simplicity (Gerhard, 2003) as distinctive characteristics ofsinging as compared to speaking. It is also relevant in this regard thatpitch stability seems to play an important role in driving the speech-to-song illusion described by Deutsch, Lapidis, and Henthorn (2008); seealso Falk and Rathcke (2010).

454 Jeremy Day-O’Connell

Page 15: S ,S M T :A A S I - Skidmore College · phenomenon in French—the melodic cliche´ of the tri- ... that certain ubiquitous ‘‘melodic embryos’’ involving the minor third ‘‘defy

affectionate (but rather, as mock-affectionate—an alto-gether unwelcome instance of infantilization). The usesof playful intonation might well be similarly flexible,from the stylized ‘‘bye-bye’’ of both infant-directed and(infantilizing) adult-directed speech, to the acousticallycomparable schoolyard perversion, ‘‘nya-nya.’’

The notion of sung speech also connects with theidiosyncratic sentence-production interaction found inthe case of interval size: ‘‘calling’’ resulted in a smallermean interval for the stylized interjection, but a largermean interval for statements. I hold that singing, almostby definition, requires more vocal control than doesspeaking; and likewise, the singing of large intervalsrequires more vocal control than the singing of smallintervals. It may be that there is a coupling, in terms ofmotor control, between voice-raising and intervalwidening—as one speaks more loudly, one naturallyadopts wider pitch movements—but that this couplingis severed or even reversed when vocal control is ata premium, as it is in the case of singing. Indeed, it isnot uncommon for voice teachers and choral conduc-tors to coax a larger vocal range out of their singersthrough the use of exaggerated spoken (rather thansung) vocal gestures.

All the same, it is also notable that two of the phoneticfeatures that characterize the stylized interjection—higher pitch and longer duration—are also significantlyassociated with ‘‘calling’’ across sentence-types. Thiswould affirm an intuition encountered throughout theliterature, namely that the stylized interjection, evenwhen performed in close-range situations, involvessomething of the quality of calling. The stylized inter-jection thus appears to convey two distinct pragmaticcues simultaneously, by manipulating two broad dimen-sions of vocal production: I hereby request your atten-tion (signaled by an increase in vocal energy—i.e.,higher pitch height and longer duration), and I say thisendearingly (signaled by an increase in vocal control—i.e., higher harmonicity, lower intensity differential, andhigher pitch stability).

THE MINOR THIRD

One of the chief motivations for this study was theintriguing possibility of (in Liberman’s, 1979, words)a ‘‘very natural interval’’ in speech and/or song. Theinterval distribution given above (Figure 4) certainlyvindicates the more cautious of the writers reviewedearlier: speakers’ intonational realizations encompassa good deal of variability. Nevertheless, the distributionof intervals for the stylized interjection does clearlystand out among the other, relatively flat distributions;its most populous bin, comprising nearly 50% of the

stylized tokens, far surpasses any of the distributionalpeaks for the other sentence-types. And indeed, thisdistributional peak, at 3+1 semitones, correspondsroughly to the minor third of lore. Although Liberman’sand others’ grand claims can hardly be said to be con-firmed, we now have strong empirical evidence of animportant interval-range in English (sung) speech. Themean interval size (2.72 st) and the finer-binned distri-bution peak mentioned earlier (at 2.5 [+.5] st) suggestthat Liberman’s and others’ categorization of the inter-val in question as a minor third (i.e., 3 st) representssomething of an overestimate (though the current studyobviously has nothing to say about the possible effect ofvariation between linguistic communities, or of varia-tion between natural and laboratory-elicited speech,both of which could explain slight discrepanciesbetween my reported results and anecdotal accounts).And Liberman’s hypothesis can be further refined inlight of the unique distributional drop-off for the styl-ized interjection: rather than the interval of three semi-tones being itself particularly ‘‘natural to sing,’’ it wouldappear that intervals greater than about four semitonesare less natural to sing, presumably requiring categori-cally greater vocal effort.

The question of a ‘‘very natural interval’’ becomes evenmore interesting when one considers it from the originaldisciplinary context of the term ‘‘minor third’’: musictheory. The minor third can be thought of as a defininginterval of the pentatonic scale, which forms the basis foran astounding diversity of musical traditions throughoutthe world. The pentatonic scale is a five-tone scale con-structed as a series of perfect fifths modulo the octave,a structure in which adjacent tones are separated bymajor seconds and minor thirds (in contrast to the dia-tonic scale, which instead features major and minor sec-onds) (Day-O’Connell, 2001, 2007; see pp. 2-5). Theethnomusicologist Bruno Nettl (2000) calls the penta-tonic scale a rare ‘‘statistical universal’’ (p. 570) in musicand claims that ‘‘all societies have some music that usesonly three or four pitches, usually combining major sec-onds and minor thirds’’ (p. 468). The results of the pres-ent study, with the mean and the median intervals forthe stylized interjection between a major second anda minor third, offer a striking parallel to what Nettlreports—hence suggesting a possible linguistic (appar-ently vocal) basis for a musical universal, analogous todifferent (auditory) explanations for certain other prev-alent intervals in the world’s scales, such as the octaveand perfect fifth (Patel, 2008; see pp. 88-92).

Comparing the interval distributions of the presentstudy with those from other studies yields furtherinsights. First of all, intervals in running speech more

Speech, Song, and the Minor Third 455

Page 16: S ,S M T :A A S I - Skidmore College · phenomenon in French—the melodic cliche´ of the tri- ... that certain ubiquitous ‘‘melodic embryos’’ involving the minor third ‘‘defy

generally have been shown to yield a roughly (negative)exponential distribution—that is, a distribution witha peak at 0 semitones and a consistent inverse relation-ship between interval size and occurrence (see Figure6a, from Tierney et al., 2008; see also Patel, 2008). (Thediscrepancy with the present study—where intervalswere more widely distributed, with peak distributionsfor some sentence-types as high as 8 to 10 semitones—can be explained by the fact that these tokens were all inthe sentence-final position, where a relatively largeinterval would be expected.) Intervals in music, on theother hand, yield a roughly unimodal distribution withheavy density below six semitones, a peak at two semi-tones, and a thin right tail (see Figure 6b, from Vos &Troost, 1989; see also Tierney et al., 2008). That peak, ofcourse, in part reflects an inherent structural character-istic of the diatonic scale and other common scales: thesheer combinatorial prevalence of major seconds, ascompared to minor seconds, between notes of the scale.(The pentatonic scale, which lacks minor seconds alto-gether, is a special case of this principle.) It is not clearwhy such an intervallic disposition should be an effica-cious quality of musical scales in the first place. But the

strong resemblance between this musical distributionand that of the stylized intervals in the present studydoes provide further indication of the stylized interjec-tion’s relationship to singing—even as it raises furtherquestions about the relationship between the two, andconsequently about the provenance of scales in humanculture.

Despite this similarity between song and stylizedspeech, there is a fundamental (and inevitable) differ-ence in the nature of these two types of data: discrete inthe case of music, continuous in the case of speech.Hence, while the musical distribution’s peak lies at(exactly) two semitones, the stylized speech distribu-tion’s peak lies between two and three semitones: mean¼ 2.72 st, median ¼ 2.39 st. The question of the cate-gorical perception of musical intervals is relevant here:although categorical perception of intervals has beendemonstrated in musicians (Burns & Ward, 1978), indi-vidual variation in categorical perception and categori-cal identification is not well understood. One of thesubjects described in Siegel and Siegel (1977), forinstance, was found to map intervals as small as 2.40semitones onto a minor third, while for others of their

Speech intervals

Positive intervalsNegative intervals

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

40

Freq

uenc

y of

Occ

urre

nce

(%)

30

20

10

00 1 2 3 4 5

Interval Size in Semitones6 7 8 9 10 11 12

(a)

(b)

FIGURE 6. Distribution of intervals in running speech (a) and in song (b). Figure 6a (from Tierney et al., 2008) redrawn with permission of the authors.

Figure 6b (from Vos & Troost, 1989) redrawn with the permission of University California Press.

456 Jeremy Day-O’Connell

Page 17: S ,S M T :A A S I - Skidmore College · phenomenon in French—the melodic cliche´ of the tri- ... that certain ubiquitous ‘‘melodic embryos’’ involving the minor third ‘‘defy

subjects, such an intermediate interval was ambiguous.The identification curves reported in Rakowski (1990)are more straightforward, with a relatively clear percep-tual transition from major second to minor third ataround 2.5 semitones. Such individual variation in cat-egorical perception is part of what motivated the con-servative, rough binning of interval data reported here(3+1 semitones).

Returning to semantics, the minor third might repre-sent a happy medium between the urgent and the pro-saic, related to the aforementioned cues of ‘‘calling’’ andyet ‘‘routineness.’’ Indeed, the distributional peaks forthe five sentence-types occupy a variety of bins: declara-tives at 1+1 semitones; stylized interjections at 3+1semitones; contrastives and wh-questions at 5+1 semi-tones; and yes/no questions at 7+1 (tied with 9+1)semitones. The mapping between (modal) interval sizeand sentence-type, that is, suggests a plausible orderingof intent/urgency for these sentence-types, in a way thatcould be consistent with Gussenhoven’s ‘‘effort code’’(2004).

Finally, I must acknowledge the apparent inconsis-tency between my earlier interpretation of the stylizedinterjection as ‘‘playful’’ and a recent study of vocalemotion. Analyzing elicited bisyllabic utterances acrossfour intended emotions (anger, happiness, pleasantness,and sadness), Curtis and Bharucha (2010) demon-strated a correlation between the descending minorthird and vocal expressions of sadness. Several method-ological aspects of that study make comparisons to thepresent study hazardous,7 and furthermore, the seman-tic categories of the two studies are somewhat incom-mensurate: ‘‘sadness’’ is a genuine emotion, whereas‘‘playfulness,’’ as I have construed it, is more properlya mode of delivery, a ‘‘perlocutionary’’ aspect of thespeech act. In any case, vocal signaling can hardly beexpected to depend on any one factor alone (such asinterval), but likely involves a complex interplay of mul-tiple acoustic cues (see Gabrielsson & Lindstrom 2001).This point is borne out by the two studies. Sadness, forinstance, was associated with the lowest mean F0 amongCurtis and Bharucha’s four emotions; and at 250 Hz,that figure was distinctly lower than the mean F0 forstylized utterances in the present study (352 Hz; 331 Hz

if confined to the spoken condition). Likewise, meanduration for Curtis and Bharucha’s sad utterances(544 ms) fell far below mean durations for stylizedutterances in the present study (655 Hz; 585 Hz if con-fined to the spoken condition). (The data quoted herefrom my study have been confined to women partici-pants, so as to allow better comparison to Curtis &Bharucha.) Other significant acoustic factors measuredin the current study, but not considered in Curtis &Bharucha—for instance, syllable 2 duration and harmo-nicity—would be expected to further differentiate ‘‘play-fulness’’ from ‘‘sadness.’’ Harmonicity, for one, shouldbe particularly sensitive to the sort of vocal tension thattypically accompanies the experience of sadness. It iseven possible that the interval of the minor third perse is circumstantial to both types of semantic meaning,but is instead related to a more general narrowing ofinterval in these types of speech.

These qualifications and explanations aside, it remainsplausible that the minor third does figure as an essentialcommunicator of both ‘‘sadness’’ and ‘‘playfulness,’’despite the apparent incongruity between the two mean-ings. I will therefore propose an interpretation that beginsto reconcile the two results. This unintuitive connectionhinges on the idea that sadness, though a negative emo-tion, necessarily entails a distinctively non-aggressiveattitude—an attitude that is also integral to play. More-over, insofar as sadness (in contrast to another negativeemotion, anger) entails resignation, it again shares somecorner of semantic space with playfulness: a sense thatthere is nothing (or no longer anything) at stake. In short,the standard dimensions used to categorize emotion—valence and arousal are the most favored—fail to discernthese more subtle commonalities. Further untangling thevarious semantic and acoustic factors of speech will,needless to say, require additional study.

SUMMARY AND FUTURE DIRECTIONS

This study has established the basic phonetic identity ofthe stylized interjection, empirically confirming and atthe same time augmenting the various subjectivedescriptions in the literature. The paper has argued foran interpretation of this linguistic form as a ‘‘breakinginto song’’ and thus recommends it as a potent locus forinquiry in music-language studies (and not only onconceptual grounds, but on methodological groundsas well: as noted earlier, stylized interjections dispropor-tionately passed the level-tone prosogram test). Finally,the paper has verified that realizations of the stylizedinterjection are strongly disposed toward descendingintervals of two to four semitones (i.e., a minor third+1 semitone), providing an initial data point in support

7 In Curtis and Bharucha (2010): 1) the speech was performed bytrained actresses (and, furthermore, Americans) in a dramatic situation;2) the recordings (as excerpted online) have a good deal of reverberation;3) a small number of compound intervals were simplified, modulo theoctave (a perfectly reasonable musical operation, but one that begs theresearchers’ fundamental question: whether speech intervals behave likemusical intervals); 4) the analysis apparently included tokens with glidingpitch segments.

Speech, Song, and the Minor Third 457

Page 18: S ,S M T :A A S I - Skidmore College · phenomenon in French—the melodic cliche´ of the tri- ... that certain ubiquitous ‘‘melodic embryos’’ involving the minor third ‘‘defy

(however modest and provisional) of the widespread andgrand supposition of a ‘‘musico-linguistic universal.’’

It should be considered that the study’s results, par-ticularly in the case of the stylized interjection, areremarkably strong given the experiment’s conservativeelicitation method: without explicit prompting or inter-vention by the experimenter, the subjects neverthelessreadily produced the ‘‘well-known intonational contour,’’and with extremely well-defined phonetic features. It isexpected that a more manipulative elicitation paradigmwould yield even more starkly defined phonetic differ-ences (including in the distribution of interval sizes),and such a paradigm would be valuable. The challengefor the experimenter is to strike the appropriate balancebetween heavy-handedness in the method and reliabil-ity in the resultant data; the present study has empha-sized the latter.

The study’s focus on production has left the ‘‘percep-tual reality’’ of the stylized interjection unstudied. It isexpected that listeners would be able to distinguisha curt statement from a stylized interjection on the basisof the phonetic differences in question, something thatcould be easily tested. Furthermore, listeners’ categori-zation of interval size from the continuous pitch move-ments of speech—even in the most ‘‘musical’’ of speech,as here—begs for study (a point alluded to above). Doesthe insistent reference by so many writers to the minorthird, when the current study has found the actual meanand median intervals to be slightly smaller than a minorthird, suggest nothing more than an inevitable ‘‘round-ing’’ upwards? Or might it also be that the minor third,as an interval that simultaneously represents melodyand harmony (unlike the major second), is in somesense a ‘‘prototype’’ in Western music? Is there anyrelationship between the present findings and those ofLottermoser and Meyer (1960), who found the tuning of(harmonic) minor thirds in professional choirs to besmall (with a mean of 2.76 semitones)? What shouldbe said, furthermore, about speech tokens that do notyield straightforward, level tones?8

The universality hypothesis, of course, remains themost pressing subject for future work on the stylizedinterjection: do all cultures have ‘‘sung speech,’’ and ifso, how similar are the intonational profiles (and,indeed, the meanings) of sung speech from languageto language? (A pilot project using informants from

a number of genetically and geographically distantlanguages suggests that the universalist hypothesis is,to be sure, more than mere myth.) Careful and sys-tematic elicitations will be needed to establish the pho-netic characteristics of the stylized interjection inother languages, hence enabling cross-linguistic com-parisons. Speech related to childhood and child-rearing may prove to be the most promising area ofinvestigation (indeed, use of the stylized interjection inEnglish can be reasonably reduced to child-directedspeech, even when applied to adults). Should the uni-versality hypothesis be refuted, the stylized interjec-tion could nevertheless offer a fruitful gauge of theinterpenetration between musical and linguistic sys-tems (along the lines of one of the dominantapproaches in Patel, 2008): Do the intervallic realiza-tions of sung speech in a culture reflect features of thatculture’s musical system? Such a refutation could like-wise illuminate sociolinguistic issues: Does the pres-ence or absence of sung speech in a culture or inparticular speech contexts reflect culture-specificbeliefs about acceptable speaker intentions, aboutinter-speaker dynamics, or about singing in general?On the other hand, strong support of Bernstein’s‘‘musico-linguistic universal’’—the proper confirma-tion of a cultural universal is impractical if not strictlyimpossible—would provide evidence of a phylogeneticlink between music and language (such as has beenhypothesized throughout the ages, most recently byMithen, 2006), whether in the form of some innatecommunicative drive to sing intervals of 3+1 semi-tones, or more likely in the form of some psycholog-ical, anatomical, or otherwise physical constraints onvocal communication in humans.

Author Note

I conducted this research while visiting the University ofEdinburgh as a Fulbright Distinguished Scholar. I wishto thank the US-UK Fulbright Commission for their

8 As an initial answer to this question, a crude alternate analysis of thepresent data was conducted, which defined interval as simply thedifference between the mean F0 of each syllable. 875 tokens wereanalyzed in this way (excluding Subject 5 and the other 25 problematicpitch-tracks), of which 761 had falling intervals. Within-subjects ANOVA

confirmed the same effects and interactions here as were reported abovefor the level-toned tokens, though, not unexpectedly, the two methods ofmeasurement yielded different interval sizes for the same tokens. Amongtokens that permitted both methods, a majority (78%) yielded a smallerinterval size as measured using the alternate analysis (smaller on averageby 0.6 st); this could be related to the use of raw syllabic F0, which (beingan averaged value) would have tempered the perceptually relevant pitchmaxima and minima. On the other hand, for 22% of the tokens, thealternate analysis yielded a larger interval size (larger on average by 1.0st). In the case of the stylized interjections, the alternate analysis yieldeda higher mean (2.93 st) but lower median (2.07 st) than were reportedabove, apparently related to the greater inclusion of both large and smallsized intervals.

458 Jeremy Day-O’Connell

Page 19: S ,S M T :A A S I - Skidmore College · phenomenon in French—the melodic cliche´ of the tri- ... that certain ubiquitous ‘‘melodic embryos’’ involving the minor third ‘‘defy

generous support; the University of Edinburgh’s Schoolof Philosophy, Psychology, and Language Sciences, fortheir gracious hospitality; and the Dean’s Office of KnoxCollege for approving and facilitating my leave ofabsence. I am also grateful to Heather Hoffmann andKevin Hastings for their generous and patient help withstatistical matters. Above all, I owe hearty thanks and

appreciation to D. R. Ladd for his abundant advice andencouragement throughout the project and for his care-ful reading of the manuscript.

Correspondence concerning this article should beaddressed to Jeremy Day-O’Connell, Music Department,Knox College, Box K-95, 2 E. South St., Galesburg, IL61401. E-mail: [email protected]

References

ASO, S., SAITOU, T., GOTO, M., ITOYAMA, K., TAKAHASHI, T.,KOMATANI, ET AL. (2010). Speakbysinging: Converting singingvoices to speaking voices while retaining voice timbre. InProceedings of the 13th International Conference on DigitalAudio Effects (DAFx-10). Graz, Austria: Institut furElektronische Musik und Akustik.

BERNSTEIN, L. (1976). The unanswered question. Cambridge,MA: Harvard University Press.

BOERSMA, P., & WEENINK, D. (2010). Praat: Doing phonetics bycomputer [Computer program]. Version 5.1.21, retrievedOctober 31, 2009 from http://www.praat.org/.

BOLINGER, D. (1986). Intonation and its parts. Melody in spokenEnglish. Stanford, CA: Stanford University Press.

BOLINGER, D. (1989). Intonation and its uses. Stanford, CA:Stanford University Press.

BRAILIOU, C. (1984). Concerning a Russian melody. In A. L.Lloyd (Ed.), Problems of Ethnomusicology (pp. 259-83). NewYork: Cambridge University Press.

BURNS, E. M., & WARD, W. D. (1978). Categorical perception—Phenomenon or epiphenomenon. Journal of the AcousticalSociety of America, 63, 456-68.

CHAO, Y. R. (1956). Tone, intonation, singsong, chanting,recitative, tonal composition, and atonal composition inChinese. In M. Halle (Ed.), For Roman Jakobson. Essays on theoccasion of his sixtieth birthday (pp. 52-59). The Hague:Mouton.

CRUTTENDEN, A. (1995). Rises in English. In J. Windsor Lewis(Ed.), Studies in general and English phonetics: Essays in honorof Professor J. D. O’Connor (pp. 155-73). London, UK:Routledge.

CURTIS, M. E., & BHARUCHA, J. J. (2010). The minor thirdcommunicates sadness in speech, mirroring its use in music.Emotion, 10, 335-48.

DASCALU-JINGA, L. (1998). Intonation in Romanian. In D. Hirst& A. Di Cristo (Eds.), Intonation systems. A survey of twentylanguages (pp. 239-60). Cambridge, UK: CambridgeUniversity Press.

DAY-O’CONNELL, J. (2001). Pentatonic. In S. Sadie (Ed.), The newGrove dictionary of music and musicians (2nd ed., Vol. 19,pp. 315-17). London, UK: Macmillan.

DAY-O’CONNELL, J. (2007). Pentatonicism from the eighteenthcentury to Debussy. Rochester, NY: Rochester University Press.

DEUTSCH, D., LAPIDIS, R., & HENTHORN, T. (2008). The speech-to-song illusion [Abstract]. Journal of the Acoustical Society ofAmerica, 124, 2471.

EADY, S. J., & COOPER, W. E. (1986). Speech intonation andfocus location in matched statements and questions. Journal ofthe Acoustical Society of America, 80, 402-15.

FAGYAL, Z. (1997). Chanting intonation in French. University ofPennsylvania Working Papers in Lingistics, 4, 77-90.

FALK, S., & RATHCKE, T. (2010). On the speech-to-song illusion:Evidence from German. In Proceedings of Speech Prosody 2010Chicago. Chicago, IL: Speech Prosody Interest Group (SProsIG).

FONAGY, I., BERARD, E., & FONAGY, J. (1983). Cliches melodiques[Melodic cliches]. Folia Linguistica, 17, 153-185.

FONAGY, I., FONAGY, J., & SAP, J. (1979). A la recherche de traitspertinents prosodiques du français parisien [A study of theprosodic traits of Parisian French]. Phonetica, 36, 1-20.

GABRIELSSON, A., & LINDSTROM, E. (2001). The influence ofmusic structure on emotional expression. In P. N. Juslin & J. A.Sloboda (Eds.), Music and emotion (pp. 223-248). Oxford, UK:Oxford University Press.

GERHARD, D. B. (2003). Computationally measurable temporaldifferences between speech and song (Doctoral dissertation).Simon Fraser University, Burnaby, British Columbia.

GIBBON, D. (1976). Perspectives of intonation analysis. Bern:Lang.

GIBBON, D. (1998). Intonation in German. In D. Hirst & A. DiCristo (Eds.), Intonation systems. A survey of twenty languages(pp. 78-95). Cambridge, UK: Cambridge University Press.

GROVER, C., JAMIESON, D. G., & DOBROVOLSKY, M. B. (1987).Intonation in English, French, and German: Perception andProduction. Language and Speech, 30, 277-95.

GUSSENHOVEN, C. (2004). The phonology of tone and intonation.Cambridge, UK: Cambridge University Press.

Hadding-Koch, K., & Studdert-Kennedy, M. (1964). Anexperimental study of some intonation contours. Phonetica,11, 175-85.

‘T HART, J. (1998). Intonation in Dutch. In D. Hirst & A. DiCristo (Eds.), Intonation systems. A survey of twenty

Speech, Song, and the Minor Third 459

Page 20: S ,S M T :A A S I - Skidmore College · phenomenon in French—the melodic cliche´ of the tri- ... that certain ubiquitous ‘‘melodic embryos’’ involving the minor third ‘‘defy

languages (pp. 96-11). Cambridge, UK: Cambridge UniversityPress.

HAYES, B., & LAHIRI, A. (1991). Durationally specifiedintonation in English and Bengali. In J. Sundberg, L. Nord, &R. Carlson (Eds.), Music, language, speech, and brain(pp. 78-91). London, UK: Macmillan.

HIRST, D., & DI CRISTO, A. (Eds.) (1998). Intonation systems.A survey of twenty languages. Cambridge, UK: CambridgeUniversity Press.

HUIZINGA, J. H. (1949). Homo ludens: A study of the play-elementin culture. London, UK: Routledge.

LADD, D. R. (1978). Stylized intonation. Language, 54,517-40.

LADD, D. R. (1980). The structure of intonational meaning.Bloomington, IN: Indiana University Press.

LADD, D. R. (2001). Intonation. In M. Haspelmath, E. Konig,W. Oesterreicher, & W. Raible (Eds.), Language typology andlanguage universals. An international handbook. (Vol. 2,pp. 1380-90). New York: Walter de Gruyter.

LADD, D. R. (2008). Intonational phonology (2nd ed.)Cambridge, UK: Cambridge University Press.

LIBERMAN, M. (1979). The intonational system of English. NewYork: Garland.

LINDBLOM, B., & SUNDBERG, J. (2007). The human voice inspeech and singing. In T. Rossing (Ed.), Springer handbook ofacoustics (pp. 669-712). New York: Springer.

LOTTERMOSER, W., & MEYER, F-J. (1960). Frequenzmessungenan gesungenen Akkorden. Acustica, 10, 181-184.

LUNDY, D. S., ROY, S., CASIANO, R. R., XUE, J. W., & EVANS, J.(2000). Acoustical analysis of the singing and speaking voice insinging students. Journal of Voice, 14, 490-493.

MERTENS, P. (2004). The prosogram: Semi-automatic transcrip-tion of prosody based on a tonal perception model. In

Proceedings of Speech Prosody 2004 (pp. 23-26). Nara, Japan:ISCA.

MITHEN, S. (2006). The singing Neanderthals: The origins of music,language, mind and body. London, UK: Weidenfeld & Nicolson.

NETTL, B. (2000). An ethnomusicologist contemplates universalsin musical sound and musical culture. In N. Wallin & B.Merker (Eds.), The origins of music (pp. 463-72). Cambridge,MA: MIT Press.

PATEL, A. (2008). Music, language, and the brain. New York:Oxford University Press.

PATEL, A., IVERSEN, J., & ROSENBERG, J. (2006). Comparing therhythm and melody of speech and music: the case of BritishEnglish and French. Journal of the Acoustical Society ofAmerica, 119, 3034-3047.

RAKOWSKI (1990). Intonation variants of musical intervals.Psychology of Music, 18, 60-72.

RINGER, A. (2001). Melody. In S. Sadie (Ed.), The new Grovedictionary of music and musicians (2nd ed., V. 16, pp. 363-73).London: Macmillan.

SIEGEL, J. A., & SIEGEL, W. (1977). Absolute identification oftones and intervals by musicians. Perception andPsychophysics, 21, 143-152.

TIERNEY, A., RUSSO, F. A., & PATEL, A. (2008, July). Empiricalcomparisons of pitch patterns in music, speech, and birdsong.Paper presented at Acoustics ‘08 (155th Meeting of theAcoustical Society of America), Paris, France.

VAN DER MERWE, P. (1992). Origins of the popular style. New York:Oxford University Press.

VARGA, L. (2008). The calling contour in Hungarian and English.Phonology, 25, 469-497.

VOS, P. G., & TROOST, J. M. (1989). Ascending and descendingmelodic Intervals: Statistical findings and their perceptualrelevance. Music Perception, 6, 383-396.

460 Jeremy Day-O’Connell

Page 21: S ,S M T :A A S I - Skidmore College · phenomenon in French—the melodic cliche´ of the tri- ... that certain ubiquitous ‘‘melodic embryos’’ involving the minor third ‘‘defy

Appendix A: The Complete Sentence Collection

Appendix B: Four Slides from the Practice Round

Declarative Contrastive Yes/no question Wh-question Stylized

I’ll cook dinner. You cook lunch; I’ll cook dinner. Shall I cook dinner? What shall I cook fordinner?

Dinner!

I’m looking forJohnny.

I’m not looking for Jennifer;I’m looking for Johnny.

Have you seenJohnny?

Where’s Johnny? Johnny!

The baby likes‘‘peek-a-boo.’’

The baby doesn’t like ‘‘patty-cake’’;she likes ‘‘peek-a-boo.’’

Does the baby like‘‘peek-a-boo’’?

How do you play‘‘peek-a-boo’’?

Peek-a-boo!

Tell me when thepie’s done.

I’ll tell you when the table’s set; youtell me when the pie’s done.

Are the pies done? Why aren’t the piesdone?

Pie’s done!

This is my dogFido.

This is Rover, and this is Fido. Would you like to petFido?

Whose dog is Fido? Fido!

I hear thetelephone.

That’s not the alarm clock, it’s thetelephone.

Was that thetelephone?

Who’s on the telephone? Telephone!

I hope you’ll becareful.

I may not be fast, but I’m careful. Are you being careful? Why aren’t you beingmore careful?

Careful!

It’s nice to see you. I don’t want to talk on the phone;I want to see you.

May I see you? When may I see you? See you!

He’d like to thankyou.

He’s not angry at you; in fact, hewants to thank you.

Did he thank you? When did he thank you? Thank you!

Let’s meet in themorning.

I can’t meet you this evening; let’smeet in the morning.

Can you meet me inthe morning?

What appointments do Ihave this morning?

Morning!

Speech, Song, and the Minor Third 461

Page 22: S ,S M T :A A S I - Skidmore College · phenomenon in French—the melodic cliche´ of the tri- ... that certain ubiquitous ‘‘melodic embryos’’ involving the minor third ‘‘defy

Appendix C: Supplemental Pitch Data

I. F0 means by sentence-type and production (across gender)

measure unit production Declarative Contrastive Y/N Q Wh Q Stylized

Syllable 1 pitch Hz spoken 154.0 154.9 189.6 175.9 233.4called 217.8 216.5 245.9 229.4 277.6

Syllable 2 pitch Hz spoken 131.0 125.1 159.4 131.0 192.9called 155.8 142.8 199.7 148.7 241.6

Token-wide pitch Hz spoken 157.1 157.9 198.7 177.6 242.7called 211.0 201.6 250.5 217.3 285.2

II. F0 means by sentence-type and gender (across production)

measure unit gender Declarative Contrastive Y/N Q Wh Q Stylized

Syllable 1 pitch Hz males 140.1 139.3 163.2 146.1 187.4females 283.4 289.3 332.4 322.0 371.3

Syllable 2 pitch Hz males 106.0 99.3 134.8 104.6 159.9females 220.2 204.9 280.9 209.5 316.3

Token-wide pitch Hz males 124.4 120.6 149.8 127.8 177.2females 255.9 253.6 310.3 275.0 351.8

III. F0 means by production and gender (across sentence-type)

measure unit gender spoken called across production

Syllable 1 pitch Hz males 131.2 178.6 155.2females 289.3 348.6 321.5

Syllable 2 pitch Hz males 108.1 133.4 121.0females 234.8 260.1 248.5

Token-wide pitch Hz males 121.4 158.1 140.0females 268.0 310.9 291.3

462 Jeremy Day-O’Connell