On the Boundaries of Phonology and Phonetics



Sponsored by

Nederlandse Vereniging voor Fonetische Wetenschappen

Center for Language and Cognition Groningen

Stichting Groninger Universiteitsfonds

Department of Linguistics, University of Groningen

1st edition, January 2004
2nd edition, February 2004

ISBN 90 367 1930 5

University of Groningen


Edited by

Dicky Gilbers
Maartje Schreuder

Nienke Knevel

To honour Tjeerd de Graaf

Contents

On the Boundaries of Phonology and Phonetics 7
The Editors: Dicky Gilbers, Maartje Schreuder and Nienke Knevel

Tjeerd de Graaf 15
Markus Bergmann, Nynke de Graaf and Hidetoshi Shiraishi

Tseard de Graaf 31
Translated by Jurjen van der Kooi

Boundary Tones in Dutch: Phonetic or Phonological Contrasts? 37
Vincent J. van Heuven

The Position of Frisian in the Germanic Language Area 61
Charlotte Gooskens and Wilbert Heeringa

Learning Phonotactics with Simple Processors 89
John Nerbonne and Ivilin Stoianov

Weak Interactions 123
Tamás Bíró

Prosodic Acquisition: a Comparison of Two Theories 147
Angela Grimm

Base-Identity and the Noun-Verb Asymmetry in Nivkh 159
Hidetoshi Shiraishi

The Influence of Speech Rate on Rhythm Patterns 183
Maartje Schreuder and Dicky Gilbers

List of Addresses 203

On the Boundaries of Phonology and Phonetics

The Editors: Dicky Gilbers, Maartje Schreuder and Nienke Knevel

In this volume a collection of papers is presented in which the boundaries of phonology and phonetics are explored. In current phonological research, the distinction between phonology, as the study of the sound systems of languages, and phonetics, as the study of the characteristics of human (speech) sound making, seems to be blurred.

Consider an example of the phonological process of /l/-substitution as exemplified in the data in Table 1.

Table 1. /l/ substitutions: /l/ → [w]

Historical Dutch data:
alt/olt → oud 'old'
kalt/kolt → koud 'cold'
schoo[l] → schoo[w] 'school'

First language acquisition data (Dutch):
hallo → ha[w]o 'hello'
lief → [w]ief 'sweet'
blauw → b[w]auw 'blue'

In phonology, the substitution segment is expected to be a minimal deviation from the target segment. For example, boot 'boat' could be realized as [pot], but not as [lot], since the target /b/ and the output [l] differ in too many dimensions. In other words, sound substitutions should be characterized more commonly by single feature changes than by several feature changes. The widely attested substitution of /l/ by [w], however, cannot be accounted for adequately as a minimal deviation from the target based on articulatorily defined features, as shown in Figure 1.


/l/ → [w]

[+son]   [+son]
[+cons]  [-cons]
[+cont]  [+cont]
[+lat]   [-lat]
[-lab]   [+lab]
[+ant]   [-ant]
[+cor]   [-cor]
[-high]  [+high]
[-back]  [+back]
[-round] [+round]

Figure 1. /l/-substitutions
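The mismatch in Figure 1 can be made concrete by simply counting disagreeing feature values. The following sketch is illustrative only: the /l/ and [w] specifications are taken from Figure 1, while the /b/ and /p/ entries (and the helper name `feature_distance`) are hypothetical additions made here for the boot 'boat' example.

```python
# Hypothetical sketch: counting distinctive-feature differences between a
# target segment and its substitute. The /l/ and [w] values follow Figure 1;
# the /b/ and /p/ specifications are standard textbook values, added only
# for illustration.

FEATURES = {
    "l": {"son": +1, "cons": +1, "cont": +1, "lat": +1, "lab": -1,
          "ant": +1, "cor": +1, "high": -1, "back": -1, "round": -1},
    "w": {"son": +1, "cons": -1, "cont": +1, "lat": -1, "lab": +1,
          "ant": -1, "cor": -1, "high": +1, "back": +1, "round": +1},
    # Stops, specified only for the features relevant to the /b/-[p] contrast:
    "b": {"son": -1, "cons": +1, "cont": -1, "voice": +1},
    "p": {"son": -1, "cons": +1, "cont": -1, "voice": -1},
}

def feature_distance(target: str, output: str) -> int:
    """Number of shared features on which the two segments disagree."""
    t, o = FEATURES[target], FEATURES[output]
    shared = t.keys() & o.keys()
    return sum(1 for f in shared if t[f] != o[f])

print(feature_distance("b", "p"))  # -> 1  (only [voice] differs)
print(feature_distance("l", "w"))  # -> 8  (eight features differ)
```

Under these specifications /b/ → [p] changes a single feature, whereas /l/ → [w] changes eight, which is exactly why the substitution resists a minimal-deviation account in articulatory features.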

From an acoustic point of view, liquid-glide alternations can be described as minimal changes. The differences between the individual glides and liquids can be related to their relative second and third formant locus frequencies. In a perceptual identification experiment, Ainsworth and Paliwal (1984) found that synthetic stimuli were classified as liquids such as [l] when they had a mid F2 locus frequency, as [w] when they had a low F2 locus frequency, and as [j] when they had a high F2 locus frequency.

3160 Hz          w w w l l l l j j j
   ↑             w w w l l l l j j j
F3 locus freq.   w w w r r r l j j j
   ↓             w w w r r r j j j j
1540 Hz          w w r r r r r j j j

                 760 Hz ← F2 locus freq. → 2380 Hz

Figure 2. Typical set of responses obtained from listening to glide/liquid-vowel synthetic stimuli (after Ainsworth & Paliwal, 1984; simplified)

Based on these acoustic characteristics, liquid-glide substitutions can be described as a minimal change from the target, which cannot be done in the phonological representation of these sounds. Obviously, phonology needs phonetic information to explain a phonological process of this kind (cf. Gilbers, 2002).
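The perceptual pattern above can be caricatured as a simple threshold rule on the F2 locus frequency. The cut-off values in this sketch are hypothetical round numbers chosen to fit the figure's axis range, not values reported by Ainsworth and Paliwal, and the F3 locus, which separates [l] from [r] in the full experiment, is ignored.

```python
# Illustrative sketch of the perceptual finding of Ainsworth and Paliwal
# (1984): identification of a synthetic glide/liquid driven by its F2 locus
# frequency. The two thresholds (1100 Hz, 1900 Hz) are hypothetical values
# for illustration only; the F3 locus dimension is left out.

def classify_by_f2_locus(f2_locus_hz: float) -> str:
    """Map an F2 locus frequency (Hz) to the segment most often heard."""
    if f2_locus_hz < 1100:      # low F2 locus
        return "w"
    elif f2_locus_hz < 1900:    # mid F2 locus
        return "l"
    else:                       # high F2 locus
        return "j"

print(classify_by_f2_locus(900))   # -> w
print(classify_by_f2_locus(1500))  # -> l
print(classify_by_f2_locus(2200))  # -> j
```

The point of the sketch is that one acoustic dimension suffices to separate the three percepts, whereas the featural representation in Figure 1 offers no comparably small distance.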

Now consider the Dutch process of schwa insertion as exhibited in Table 2.

Table 2. Schwa insertion in Dutch

helm [hɛləm] 'helmet'    darm [dɑrəm] 'intestine'
half [hɑləf] 'half'      durf [dʏrəf] 'courage'
melk [mɛlək] 'milk'      hark [hɑrək] 'rake'

not in: vals 'out of tune', hals 'neck', hart 'heart', start 'start'

Schwa may be inserted between a liquid /l,r/ and a non-homorganic consonant (i.e. a consonant that differs in place of articulation from /l,r/) at the end of a syllable. Therefore, schwa may be inserted between coronal /l/ or /r/ and non-coronal /m/, /f/, /k/, etc. Schwa is not allowed, however, between /l/ or /r/ and a coronal obstruent /s/ or /t/. Now, Dutch has at least two different varieties of /r/: an alveolar [r] and a uvular [ʁ]. Since there is no functional difference between realizations such as [rɑt] and [ʁɑt] for rat 'rat', however, there is only one phoneme /r/ in the Dutch system, with its allophones [r] and [ʁ]. Interestingly, even Dutch speakers with a uvular [ʁ] do not show schwa insertion between their [ʁ] and a coronal obstruent /s/ or /t/, although [ʁ] and /s/ or /t/ are phonetically non-homorganic. The process of schwa insertion apparently takes place before the phonetic level of actual realization of segments, i.e. on the abstract phonological level, where /r/, /s/ and /t/ share their place feature [coronal]. Synchronically, the process can only be described in a phonological way, even though it may originally have had a phonetic (articulatory) basis. We assume that uvular [ʁ] is a later variant of Dutch /r/ than coronal [r], just like the even younger, recently observed allophonic variant [ɹ] in Western Dutch dialects: raar 'strange' realized as [ra:ɹ]. These allophones date from times when the process of schwa insertion in non-homorganic, syllable-final liquid-consonant clusters was already 'fossilized' in the Dutch system.
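As a rough illustration, the synchronic rule can be stated over abstract phonemes, so that the allophonic realization of /r/ never enters into it. The transcriptions and the helper below are simplified, hypothetical stand-ins (one ASCII character per segment, `@` for schwa), not an implementation from the literature.

```python
# A minimal sketch of the Dutch schwa-insertion rule described above,
# operating on abstract phonemes (so the choice between alveolar and uvular
# realizations of /r/ never matters). One ASCII character per segment,
# with "@" standing in for schwa; all names here are hypothetical.

CORONALS = set("tdsznlr")   # segments bearing the place feature [coronal]
LIQUIDS = set("lr")
SCHWA = "@"

def insert_schwa(syllable: str) -> str:
    """Insert schwa inside a syllable-final liquid + non-homorganic cluster."""
    if (len(syllable) >= 2
            and syllable[-2] in LIQUIDS
            and syllable[-1] not in CORONALS):
        return syllable[:-1] + SCHWA + syllable[-1]
    return syllable

for word in ["hElm", "dArm", "mElk", "hAls", "hArt"]:
    print(word, "->", insert_schwa(word))
# hElm -> hEl@m, dArm -> dAr@m, mElk -> mEl@k; hAls and hArt are unchanged
```

Because the rule consults only the phonological place feature, a speaker's uvular [ʁ] behaves exactly like [r] here, matching the observation that such speakers also lack schwa before /s/ and /t/.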

The two accounts of phonological processes above indicate the way many phonologists approach their research objects nowadays. More and more, the distinction between phonology and phonetics is challenged in attempts to provide adequate accounts of phonological phenomena. In this way, the phonologists of the so-called CLCG Klankleer group in Groningen study the phonology-phonetics interface, whereas other members of the group cross the boundaries of phonology and phonetics by combining the study of sound patterns with dialectology, computational linguistics, musicology, first language acquisition or ethnolinguistics.

The Center for Language and Cognition Groningen (CLCG) is a research institute within the Faculty of Arts of the University of Groningen. It comprises most of the linguistic research that is being carried out within the Faculty of Arts. One of the research groups of CLCG is this 'Klankleer' group (Phonology and Phonetics), which focuses on the structure and contents of the sounds of language.

This volume of papers by members of the Klankleer group is dedicated to Tjeerd de Graaf, who was the coordinator of this group from 1999 until 2003. This does not mean that Tjeerd no longer participates in the group: he still supervises two PhD projects. These projects, by Hidetoshi Shiraishi and Markus Bergmann, combine phonetics and phonology with ethnolinguistics. As mentioned above, the research of most members of the group involves combinations of different (linguistic) areas. Wilbert Heeringa, Charlotte Gooskens and Roberto Bolognesi apply phonetics to the study of dialectology. Nanne Streekstra was one of the first linguists in our group to take an interest in the phonology-phonetics interface. Wouter Jansen's work is exemplary for this so-called 'laboratory phonology': he provides acoustic studies of voicing assimilation in obstruent clusters in Germanic languages. Maartje Schreuder and Dicky Gilbers combine phonetics and phonology with areas beyond linguistics, such as music theory. Former member Klarien van der Linde and Angela Grimm study first language acquisition, whereas Wander Lowie studies second language acquisition. Finally, Tjeerd de Graaf started his academic life as a researcher in theoretical physics and switched to phonetics, whereas his main interest is now in ethnolinguistics. This homo universalis also plays piano and oboe and speaks nine different languages. This Festschrift, however, is dedicated to the phonetician Tjeerd de Graaf. The papers cover a wide range of topics, varying from ethnolinguistics to computational linguistics and from first language acquisition to dialectology. The common denominator is that all researchers work on the boundaries of phonology and phonetics.

Vincent van Heuven, as a guest author from the University of Leiden, wonders whether certain distinctions in the speech signal are phonological or phonetic. He investigates whether different prosodic boundary tones form a continuum or whether they are categorical. He finds a categorical division between low (declarative) and non-low tones, but within the non-low category the cross-over from continuation to question is rather gradual.

Charlotte Gooskens and Wilbert Heeringa measured linguistic distances between Frisian dialects and the other Germanic languages in order to get an impression of the effect of genetic relationship and language contact on the position of the modern Frisian language on the Germanic language map. Wilbert is a member of the CLCG group 'Computational Linguistics'.

John Nerbonne participates as head of CLCG. His paper with Ivilin Stoianov explores the learning of phonotactics in neural networks, in particular the so-called Simple Recurrent Networks (SRNs). SRNs provide a valuable means of exploring what information in the linguistic signal could in principle be acquired by a very primitive learning mechanism.

Tamás Bíró, who is also a member of 'Computational Linguistics' and interested in phonology, claims that the types of interactions between languages can be extremely diverse, depending on a number of factors. The paper analyses three case studies, namely the influence of Yiddish on Hungarian, Modern Hebrew and Esperanto.

Angela Grimm discusses a number of empirical and theoretical problems with respect to two models of prosodic acquisition: a template mapping model and a prosodic hierarchy model. Both models assume that the acquisition of word prosody is guided by universal prosodic principles.

Toshi Shiraishi discusses phonological asymmetries between nominal and verbal stems of Nivkh, a minority language spoken on the island of Sakhalin. These asymmetries are observed in two phonological phenomena: consonant alternation and final fricative devoicing. Though the asymmetries themselves look very different on the surface, Toshi's paper makes explicit that they are subject to a common generalization, Base-Identity.

Maartje Schreuder and Dicky Gilbers wondered whether a higher speech rate leads to adjustment of the rhythmic pattern, as it does in music, or just to 'phonetic compression' with preservation of the phonological structure. An example of an item they examined is the Dutch word perfèctioníst, which can get the rhythmic structure pèrfectioníst in fast tempo. The results indeed showed a preference for restructured rhythms in fast speech.

With this very diverse collection of papers, we hope to present the phonetician Tjeerd de Graaf with a representative selection of the current activities of his CLCG Klankleer group.


In the 1970s and 1980s, Tjeerd's phonetic research stood miles away from the feature geometries and grid representations that were customary in phonology. He used to make sonagrams, i.e. visual displays of sound spectrograms, of e.g. [pa], [si] and [ra]. But when the violin string of his sonagraph broke, he was no longer able to do phonetic research, and that is when ethnolinguistics stole his heart. Nowadays, it is much easier to do phonetic analyses on the computer using programs such as PRAAT (Boersma and Weenink, 1992-2003). Whereas phonetics and phonology grew apart after they were installed as two distinct disciplines of linguistics at the First International Congress of Linguists (The Hague, 1928), current laboratory phonological research may even suggest that phonetics and phonology coincide. However, as shown by the two examples in this introductory paper, /l/-substitution and schwa insertion, the roles of the two disciplines are still distinguishable. That does not alter the fact that co-operation between phoneticians and phonologists must be an integral part of the study of sound patterns. Some sound phenomena, such as ethnolinguistic and dialect differences or acquisition data, can only be explained adequately if both the phonological and the phonetic characteristics of sounds are considered.

University of Groningen, January 2004

This volume was presented to Tjeerd de Graaf on January 30, 2004 at the workshop 'On the Boundaries of Phonology and Phonetics'. The CLCG and the Department of Linguistics of the University of Groningen, the Nederlandse Vereniging voor Fonetische Wetenschappen and GUF (Stichting Groninger Universiteitsfonds) sponsored this workshop. Keynote speakers were Vincent van Heuven and Carlos Gussenhoven.

References

Ainsworth, W.A. & K.K. Paliwal (1984). Correlation between the production and perception of the English glides /w,r,l,j/. Journal of Phonetics, 12: 237-243.

Boersma, Paul, and David Weenink (1992-2003). PRAAT, phonetics by computer. Available at http://www.praat.org. University of Amsterdam.

Gilbers, D.G. (2002). Conflicting phonologically based and phonetically based constraints in the analysis of /l/-substitutions. In: M. Beers, P. Jongmans & A. Wijnands (eds). Netwerk Eerste Taalverwerving, Net-bulletin 2001. Leiden, 22-40.

Tjeerd de Graaf

Markus Bergmann, Nynke de Graaf and Hidetoshi Shiraishi

Tjeerd de Graaf was born on January 27th, 1938 in Leeuwarden, the capital of the province of Fryslân in the Netherlands. Fryslân is the largest of several regions on the North Sea where Frisian is spoken, a West Germanic language whose genetically closest relative is English.

Tjeerd's parents were both Frisians, and at home they spoke exclusively Frisian. Like most other children in Fryslân at that time, Tjeerd grew up bilingually. His first native language was Frisian, and at school he learned Dutch, the official language of the Netherlands.

The coexistence of Frisian at home and Dutch at school was Tjeerd's first experience of a fascinating world of different languages. For Tjeerd, the difference between the two languages had a very illustrative spatial implication: when he and the other children in his neighborhood went to school in the mornings, there was a railway crossing along the way. Once they had crossed it they stopped speaking Frisian and switched to Dutch, their official school language.

At the age of 18, in 1956, Tjeerd graduated from the Leeuwarden high school and became interested in languages. His other big passion was physics and astronomy. The oldest planetarium in the world is located in Franeker, an old academic town in Fryslân. Intrigued by the laws governing space and time, Tjeerd studied physics at the University of Groningen from 1956 to 1963. In 1963 he received his master's degree in science (Doctoraal examen) in theoretical physics, a combination of physics, mathematics and astronomy. From 1963 until 1969 he continued as a research associate at the Institute of Theoretical Physics at the University of Groningen.

Tjeerd was already a "polyglot" at that time, speaking not only Frisian and Dutch, but also German, English and French. Other languages would follow. In the former Soviet Union, the study of the astronomical sciences was enjoying an era of superiority. Tjeerd understood that learning Russian and other East European languages would be the key to entering this field of scientific knowledge. Along with his theoretical physics studies, he also enrolled in the study of Slavic languages. The new technologies and their application to future research fascinated him. In 1967 he received his Master of Arts degree (Kandidaatsexamen) in Slavic languages and computational linguistics. In the meantime, after having obtained his MSc, he continued his research in theoretical physics, combined with a study abroad in Poland, where he lived for half a year and mastered the language.

By 1969, he had finished his dissertation, entitled "Aspects of Neutrino Astrophysics".

The cover page of Tjeerd's dissertation in theoretical physics, 1969

Tjeerd's quenchless thirst for knowledge led him, together with his wife Nynke and their children, to England, where they spent a year from 1970 to 1971 and where he worked as a research associate at the Institute of Theoretical Physics at the University of Cambridge.

Upon their return to Groningen, Tjeerd became assistant professor in physics at the Institute of Astronomy, a post he held until 1975. That year marked a turning point in his professional career, when he decided to switch to his second passion, namely the study of languages. One of his dissertation theses dealt with the question of how exactly a person's identity can be determined from his or her speech. This thesis symbolically defined one of Tjeerd's later linguistic interests: the aspects of spoken language, the study of phonetics.

In 1975, Tjeerd became associate professor at the Institute of PhoneticSciences, Department of Linguistics, University of Groningen.

Being a native bilingual in Frisian and Dutch, Tjeerd was aware of the numerous phonetic differences between the two languages. Having studied many other languages as well, Tjeerd understood how important phonetic descriptions are, not only for theoretical linguistics, but also for learning and teaching foreign languages.

Language coexistence and language change would become another focal point of his research. In most regions of the world, people are bilingual or even multilingual. Language variety appears both in space and in time. Listening to radio programs or TV broadcasts dating back ten or twenty years reveals a distinct difference in speech as compared with today's way of speaking. It is still the same language, the same place, and yet the speech is not the same as before. Not only the lexicon of a language changes, but also the manner in which people speak, their pronunciation and intonation. This is an extremely intriguing topic for a person interested in languages and their varieties.

Tjeerd started to trace the oldest recordings of spoken examples of languages. He analyzed Frisian recordings from the province of Fryslân as well as recordings from the North and East Frisian regions. Recordings of the spoken language of former times are not only a historically important heritage; they also offer valuable information pertaining to language shift processes. A practical problem with the oldest sound recordings is that they were made on wax cylinders, whose quality decreases tremendously every time they are played. Tjeerd was aware that one of the main tasks was to transfer these recordings to modern media in order to preserve them. At the beginning of the 1990s, together with Japanese colleagues, Tjeerd started to investigate the possibilities of preserving old language recordings with modern audio technology. At that time, Tjeerd acquired yet another language, namely Japanese.

Tjeerd working on wax cylinders with old recordings of Dutch


Tjeerd started to contact the most important sound archives of the world, which are in Vienna, Berlin and St. Petersburg. Through his collaboration with the sound archive of the Academy of Sciences in St. Petersburg in the 1990s, he renewed his contact with Russia, which had begun with his studies of Slavic languages in the 1960s.

After 1990, the world experienced dramatic changes. The Iron Curtain had disappeared and Russia had once again opened her "Window to the West". When Tjeerd came back to St. Petersburg in the 1990s, he was immediately fascinated by this city, which he had visited for the first time some twenty years earlier, when it was still known as Leningrad. As a Frisian and a Dutchman, he felt at home there. The picturesque canals and paths along the wide boulevards reminded him of his home region. This was no coincidence: Czar Peter the Great, some 300 years ago, had chosen Holland as the model for his new capital.

In the following years, Tjeerd organized joint projects with the Russian Academy of Sciences and St. Petersburg State University to preserve old Russian sound recordings and transfer them onto modern digital audio media. Research on a vast collection of the most varied sound recordings, resulting from many linguistic fieldwork expeditions from the late 19th and the 20th century, served as an incentive for several projects related to different languages spoken in Russia.

Tjeerd started to initiate research projects on the language spoken by the Mennonites, a group of people in Siberia who had originally come from regions in the northern Netherlands and Germany and who still speak the language of their ancestors, in fact a language with great similarities to the modern dialects spoken in northern Germany and the northern parts of the Netherlands. The Dutch press even reported that "Siberians speak Gronings".

Languages not only divide people of different nations, but also build a bridge between them. Tjeerd showed this with his research work. Even in far-away Siberia there are people speaking almost the same language as in Groningen. When planning his expeditions, Tjeerd was concerned with both the scientific aims and the organization of humanitarian aid from Groningen to the Siberian villages he visited.

Language as a cultural heritage became the core of Tjeerd's linguistic activities. With his bilingual origin, he set the perfect example. Throughout his life, he showed that each individual can contribute to the survival of a language. With his Frisian wife Nynke, whom he met in his student years, Tjeerd used to converse in Dutch. After their parents had passed away, they decided to switch to Frisian. They personally experienced how a language slowly starts to become extinct if the children do not carry it on.

This attitude defined Tjeerd's subsequent research activities in Russia. The projects he now coordinated had two goals: documenting endangered languages, and revitalizing and preserving them for future generations. In the following projects, both aspects, preservation and further development, were present. Tjeerd made several expeditions, among others to Yakutia and the island of Sakhalin, where he and other linguists recorded the speech of the local indigenous peoples.

Tjeerd de Graaf with a group of speakers of indigenous languages of the island of Sakhalin in the Far East of Russia, Uiltas and Nivkhs, in the 1990s

In the second half of the 1990s, Tjeerd coordinated several projects with institutions throughout the Russian Federation, funded by the Netherlands Organization for Scientific Research and the EU INTAS organization in Brussels.

His main goal was to make young people aware of their unique linguistic heritage and to stimulate them to support minority and regional languages. In 1998, Tjeerd was appointed Knight in the Order of the Dutch Lion for his research and his contribution to the preservation and construction of databases for the minority languages of Russia. Later that same year, Tjeerd was awarded an honorary doctorate by the University of St. Petersburg for his contribution to the joint language preservation projects.


Tjeerd de Graaf is appointed Doctor Honoris Causa at the University of St. Petersburg, November 1998

Tjeerd retired from the University of Groningen in 2003 and vacated the chair of coordinator of the 'Klankleer' (Phonology and Phonetics) group of the CLCG (Center for Language and Cognition Groningen). Therefore, his colleagues compiled this Festschrift, exhibiting a diversity of research subjects on the boundaries of phonology and phonetics.

This is not a goodbye to our former coordinator. Tjeerd's passionate engagement with languages and linguistic projects continues. Since his retirement he has become an active honorary member of the Frisian Academy in Leeuwarden, and he is still in contact with the University of St. Petersburg about future research projects. That means more than enough commitments for Tjeerd, combined with his role as a grandfather to his five grandchildren. Tjeerd's enthusiasm is an inspiration for other researchers and the younger generation to continue his research.

Publications by Tjeerd de Graaf

1966
The Annihilation of a Neutrino-antineutrino Pair into Photons and the Neutrino Density in the Universe. (With H.A. Tolhoek). Nuclear Physics, 81: 596 and 99: 695.
Neutrinoprocessen en Neutrino-astronomie [Neutrino Processes and Neutrino Astronomy]. Internal Report IR 68, Natuurkundig Laboratorium Groningen, 58 pp.


1968
De Rol van het Neutrino in de Astrofysica [The Role of the Neutrino in Astrophysics]. Nederlands tijdschrift voor natuurkunde, 34: 329.
Phase Factors in Discrete Symmetry Operations. (With H.A. Tolhoek). Intern Rapport IR 85, Natuurkundig Laboratorium Groningen, 96 pp.
Detectie van Neutrino's uit de Zon [Detection of Solar Neutrinos]. Nederlands tijdschrift voor natuurkunde, 34: 357.

1969
Phase Factors in Quantum Field Theory. Physica, 43: 142.
Muonen uit Kosmische Straling: het Utah Experiment [Muons from Cosmic Radiation: the Utah Experiment]. (With J. van Klinken). Nederlands tijdschrift voor natuurkunde, 36: 301.
Aspects of Neutrino Astrophysics. Dissertation, University of Groningen. Groningen, 119 pp.
Syllabus Beknopte Theoretische Natuurkunde [Syllabus Summary of Theoretical Physics]. Natuurkundig Laboratorium Groningen, 190 pp.

1970
On a Cosmic Background of Low-energy Neutrinos. Astronomy and Astrophysics, 5: 335.
Neutrino Processes in the Lepton Era of the Universe. Lettere al Nuovo Cimento, 4: 638.
Cosmological Neutrinos. Proceedings of the Cortona Meeting on Astrophysical Aspects of the Weak Interactions, 81.

1971
Nucleaire Astrofysica in het Laboratorium [Laboratory Nuclear Astrophysics]. Nederlands tijdschrift voor natuurkunde, 38: 107.
The Astrophysical Importance of Heavy Leptons. Lettere al Nuovo Cimento, 2: 979.

1972
Lecture Notes on Nuclear Astrophysics. Scuola Normale Superiore, Pisa, 45 pp.
The Lepton Era of the Big Bang. Proceedings of the Europhysics Conference Neutrino'72. Budapest, 167.


1973
Neutrinos in the Universe. Vistas in Astronomy, 15: 161.

1974
Nuclear Processes in the Early Universe. VIth International Seminar on Nuclear Reactions in the Cosmos. Leningrad, 329.
Kernenergie in de Kosmos [Nuclear Energy in the Cosmos]. Atoomenergie en haar Toepassingen, 81.
De Heliumabundantie in het Heelal [The Helium Abundance in the Universe]. (With W.J. Weeber). Nederlands tijdschrift voor natuurkunde, 40: 183.

1977
De Computer en de Faculteit der Letteren [The Computer and the Faculty of Arts]. Informatiebulletin Computercommissie FdL. Groningen, 38 pp.

1978
Vowel Analysis with the Fast Fourier Transform. Acustica, 41: 41.
Ienlûd, twa lûden, twalûden [Monophthongs, Two Sounds, Diphthongs]. (With G.L. Meinsma). Us Wurk, 27: 81.
Analyse de voyelles avec des méthodes digitales [Vowel Analysis with Digital Methods]. Actes des 9èmes Journées d'Etude sur la Parole. Lannion, 233.
Linear Prediction in Speech Research. Prace XXV Seminarium Otwartego z Akustyki. Poznań, 19.

1979
Het kenmerk <+kort> bij hoge gespannen vokalen [The Feature <+short> in High Tense Vowels]. (With N. Streekstra). TABU, 8: 40.
De Computer en Fonetisch Onderzoek [The Computer and Phonetic Research]. Informatiebulletin Computercommissie FdL. Groningen, 5 pp.
Vowel Analysis with Linear Prediction. Proceedings of the 9th International Congress of Phonetic Sciences. Copenhagen, 265.
Digital Methods for the Analysis of Speech. Proceedings of the 7th Colloquium on Acoustics. Budapest, 289.

1980
Phonetic Aspects of Breaking in West Frisian. (With P. Tiersma). Phonetica, 37: 109.
De brekking fan sintralisearjende twalûden yn it Frysk [Breaking of Centralizing Diphthongs in Frisian]. (With G.L. Meinsma). Us Wurk, 29: 131.
Vannak-e Diftongusok a Magyar Köznyelvben? [Are there Diphthongs in Standard Hungarian?]. (With A.D. Kylstra). Nyelvtudományi Közlemények, 82: 313.
Applications of Linear Predictive Coding in Speech Analysis. Proceedings of the Symposium on Speech Acoustics, 57.

1981
Wiskundige Modellen in het Spraakonderzoek [Mathematical Models in Speech Research]. Wiskundige Modellen: Cursusboek Stichting TELEAC, 165.
Syllabegrenzen en Fonetische Experimentatie [Syllable Boundaries and Phonetic Experiments]. GLOT, Tijdschrift voor Taalwetenschap, 4: 229.
Book Review of: Metrical Myths, An Experimental-Phonetic Investigation into the Production and Perception of Metrical Speech. Spectator, 10: 385.

1982
Vowel Contrast Reduction in Japanese Compared to Dutch. (With F.J. Koopmans-van Beinum). Proceedings of the Institute of Phonetic Sciences. Amsterdam, 7: 27.
A Sociophonetic Study of Language Change. Proceedings of the 13th International Conference of Linguistics. Tokyo, 602.

1983
Phonetic Sciences in the Netherlands, Past and Present. (With other authors). Publication of the Netherlands Association for Phonetic Sciences. Dordrecht, 32 pp.
On the Reliability of the Intraoral Measuring of Subglottal Pressure. (With G.L.J. Nieboer and H.K. Schutte). Proceedings of the 10th International Congress of Phonetic Sciences. Utrecht, 367.
Phonetic Aspects of Vowels and Breaking of Diphthongs. Fifth International Phonology Meeting. Eisenstadt, 98.
Vowel Contrast Reduction in Finnish, Hungarian and Other Languages. Dritte Tagung für Uralische Phonologie. Eisenstadt, 11.

1984
Vowel Contrast Reduction in Terms of Acoustic System Contrast. (With F.J. Koopmans-van Beinum). Proceedings of the Institute of Phonetic Sciences. Amsterdam, 8: 41.
Vokaalduur en Breking van Diftongen in het Fries [Vowel Duration and Breaking of Diphthongs in Frisian]. Verslagen van de Nederlandse Vereniging voor Fonetische Wetenschappen, 54.
The Acoustic System Contrast and Vowel Contrast Reduction in Various Languages. Proceedings of the 23rd Acoustic Conference on Physiological and Psychological Acoustics. Madrid, 76.
Vowel Data Bases. (With A. Bladon and M. O'Kane). Speech Communication, 3: 169.
Nederlandse Leerboeken voor de Fonetiek van het Engels [Dutch Teaching Methods on the Phonetics of English]. (With A. van Essen and J. Posthumus). Toegepaste Taalwetenschap in Artikelen, 20: 123-154.

1985
Phonetic Aspects of the Frisian Vowel System. NOWELE, 5: 23-42.
Review of: Spreken en Verstaan, een nieuwe Inleiding tot de Experimentele Fonetiek [Speaking and Understanding, a New Introduction to Experimental Phonetics]. (By S. Nooteboom and A. Cohen). Logopedie en Foniatrie, 57: 106.
De Groninger Button [The Groningen Button]. (With G.L.J. Nieboer and H.K. Schutte). Verslagen van de Nederlandse Vereniging voor Fonetische Wetenschappen, 57-62.

1986
Sandhi Phenomena in West Frisian. (With G. van der Meer). Sandhi Phenomena in the Languages of Europe. Berlin, 301-328.
Review of: The Production of Speech. (By P.F. MacNeilage). Studies in Language, 10: 273-277.
Production of Different Types of Esophageal Voice Related to the Quality and the Intensity of the Sound Produced. Folia Phoniatrica, 38: 292.
De Uitspraak van het Nederlands door Buitenlanders [The Pronunciation of Dutch by Foreigners]. Logopedie en Foniatrie, 58: 343-349.
Sociophonetic Aspects of Frisian. Friser Studier IV/V. Odense, 3-21.
Een contrastief fonetisch onderzoek Japans-Nederlands [A Contrastive Phonetic Study Japanese-Dutch]. Verslagen van de Nederlandse Vereniging voor Fonetische Wetenschappen, 15-24.


1987
The Retrieval of Dialect Material from Old Phonographic Wax Cylinders. Proceedings of the Workshop on "New Methods in Dialectology". Amsterdam, 117-125.
Acoustic and Physiological Properties of the Laryngeal and Alaryngeal (Esophageal) Voice. Proceedings of the XXXIVth Open Seminar on Acoustics. Wrocław, 10-16.
A Contrastive Study of Japanese and Dutch. Proceedings of the XIth International Congress of Phonetic Sciences. Tallinn, 124-128.

1988
His Master's Voice: Herkenning van de Spraakmaker [His Master's Voice: Recognition of the Speech Producer]. TER SPRAKE: SPRAAK als betekenisvol geluid in 36 thematische hoofdstukken. Dordrecht, 200-208.
Book Review: Fonetiek en Fonologie [Phonetics and Phonology]. (By R. Collier and F.G. Droste). Logopedie en Foniatrie, 60: 195.
The Frisian Language in America. (With T. Anema and H. Schatz). NOWELE, 6: 91-108.
Esophageal Voice Quality Judgements by Means of the Semantic Differential. (With G.L.J. Nieboer and H.K. Schutte). Journal of Phonetics, 16: 417-436.
Book Review: Sprechererkennung [Speaker Recognition]. (By Hermann J. Künzel). Journal of Phonetics, 16: 459-463.

1989
Reconstruction, Signal Enhancement and Storage of Sound Material in Japan. Proceedings of the 2nd International Conference on Japanese Information in Science, Technology and Commerce. Berlin, 367-374.
Aerodynamic and Psycho-acoustic Properties of Esophageal Voice Production. (With G.L.J. Nieboer and H.K. Schutte). Proceedings of the Conference on Speech Research '89. Budapest, 53-58.
A Data Base of Old Sound Material. Proceedings of the ESCA Workshop on Speech Input/Output Assessment and Speech Data Bases. Noordwijk, 2.14.1-5.

1990
Een contrastief fonetisch onderzoek, in het bijzonder Japans-Nederlands [Contrastive Phonetic Research, in Particular Japanese-Dutch]. Neerlandica Wratislaviensia IV. Wrocław, 140-148.
Book Review: To Siberia and Russian America, Three Centuries of Russian Eastward Expansion. Circumpolar Journal, 7: 41-46.
New Technologies in Sound Reconstruction and their Applications to the Study of the Smaller Languages of Asia. Proceedings of the IVth International Symposium "Uralische Phonologie". Hamburg, 15-19.
GARASU-GLAS: Fonetische contrasten Japans-Nederlands [GARASU-GLAS: Phonetic Contrasts Japanese-Dutch]. TABU. Bulletin voor Taalwetenschap, 20: 49-57.

1991
Aerodynamic and Phonetic Properties of Voice Production with the Groningen Button. TENK jaarboek, 91-97.
Laser-beam Technology in Diachronic Phonetic Research and Ethnolinguistic Field Work. Proceedings of the XIIth International Congress of Phonetic Sciences. Amsterdam, 114-118.
Laut aus Wachs: Der Übergang von stoffgebundenen zum elektronischen und optischen Informationstransport [Sound from Wax: The Transition from Material-Bound to Electronic and Optical Information Transport]. TU International. Berlin, 14/15: 63-66.

1992
The Languages of Sakhalin. Small Languages and Small Language Communities: News, Notes, and Comments. International Journal of the Sociology of Language, 94: 185-200.
Dutch Encounters with Sakhalin and with the Ainu People. Proceedings of the International Conference 125th Anniversary of the Birth of Bronisław Piłsudski. Sapporo, 108-137.
The Ethnolinguistic Situation on the Island of Sakhalin. Circumpolar Journal, 6: 32-58.
Aerodynamische en fonetische eigenschappen van verschillende soorten slokdarmstem [Aerodynamic and Phonetic Features of Different Kinds of Esophageal Voice]. (With G.L.J. Nieboer and H.K. Schutte). Klinische Fysica, 8: 64-66.
The Dutch Role in the Border Area between Japan and Russia. Round Table Conference "The Territorial Problem in Russo-Japanese Relations". Moscow, 20-26.
De Taal der Mennonieten [The Language of the Mennonites]. Syllabus NOMES Symposium Groningen, 42 pp.


1993
Saharin ni okeru shoosuu minzoku no gengo jookyoo [The Status of Minority Languages on Sakhalin]. (With K. Murasaki). Japanese Scientific Monthly, 46: 18-24.
The Ethnolinguistic Situation on the Island of Sakhalin. Ethnic Minorities on Sakhalin. Yokohama, 13-32.
Vstrechi gollandtsev s Sakhalinom i Ainami [Meetings of the Dutch with Sakhalin and the Ainu Population]. Proceedings of the International Conference "B.O. Pilsudski - issledovatel' narodov Sakhalina". Yuzhno-Sakhalinsk, 92-99.
De taal der Mennonieten in Siberië en hun relatie met Nederland [The Language of the Siberian Mennonites and their Relation with the Netherlands]. (With R. Nieuweboer). Doopsgezinde Bijdragen, 19: 175-189.
Languages and Cultures of the Arctic Region in the Former Soviet Union. (With R. Nieuweboer). Circumpolar Journal, 1-2: 29-42.

1994
The Dutch Role in the Border Area between Japan and Russia. Circumpolar Journal, 3-4: 1-12.
Nederlands in Siberië [Dutch in Siberia]. (With R. Nieuweboer). TABU Taalkundig Bulletin, 24: 65-75.
The Language of the West Siberian Mennonites. (With R. Nieuweboer). RASK, Internationalt tidsskrift for sprog og kommunikation, 1: 47-63.

1995
Het territoriale geschil tussen Japan en Rusland over de Koerilen [The Territorial Dispute between Japan and Russia about the Kuril Islands]. (With I. van Oosteroom). Internationale Spectator, 49: 41-46.
Dutch Encounters with Sakhalin and with the Ainu People. Linguistic and …, 35-61.
The Language of the West Siberian Mennonites. (With R. Nieuweboer). Proceedings of the XIIIth Congress of Phonetic Sciences. Stockholm, 4: 180-184.
Pitch Stereotypes in the Netherlands and Japan. (With R. van Bezooijen and T. Otake). Proceedings of the XIIIth Congress of Phonetic Sciences. Stockholm, 680-684.
The Reconstruction of Acoustic Data on the Ethnic Minorities of Siberia. Proceedings of the International Conference on "The Indigenous Peoples of Siberia: Studies of Endangered Languages and Cultures". Novosibirsk, 1: 381-383.

1996
Book Review of: Joshua A. Fishman, Yiddish: Turning to Life. Studies in Language, 20,1: 191-196.
Language Minorities in the Sakha Republic (Yakutia). Report Nagoya City University. Nagoya, 165-179.
Dutch Encounters with the Peoples of Eastern Asia. A Frisian and Germanic Miscellany, published in Honour of Nils Århammar on his Sixty-Fifth Birthday. Odense, 377-386.
Dutch Immigrants in Siberia? The Language of the Mennonites. Charisteria viro doctissimo Přemysl Janota oblata, Acta Universitatis Carolinae Philologica. Prague, 75-86.
Archives of the Languages of Russia. (With L.V. Bondarko). Reports on the INTAS Project No. 94-4758. St.-Petersburg, 120 pp.

1997
The Reconstruction of Acoustic Data and Minority Languages in Russia. Proceedings of the 2nd International Congress of Dialectologists and Geolinguists. Amsterdam, 44-54.
Language and Culture of the Russian Mennonites. Around Peter the Great. Three Centuries of Russian-Dutch Relations. Groningen, 132-142.
Resten van het Jiddisch in Groningen en Sint-Petersburg [Remnants of the Yiddish Language in Groningen and Saint-Petersburg]. VDW-berichten, Vereniging voor Dialectwetenschap, 1: 6-7.
The Reconstruction of Acoustic Data and the Study of Language Minorities in Russia. Language Minorities and Minority Language. …

1998
Linguistic Databases and Language Minorities around the North Pacific Rim. Lecture on the Occasion of the Doctorate Honoris Causa, St.-Petersburg, 14 pp.
Linguistic Databases: A Link between Archives and Users. Journal of the International Association of Sound Archives, 27-34.


1999
Russian-Yiddish: Phonetic Aspects of Language Interference. (With N. Svetozarova, Yu. Kleiner and R. Nieuweboer). Proceedings of the 14th International Congress of Phonetic Sciences. San Francisco, 1397-1401.
Language Contact and Sound Archives in Russia. (With L. Bondarko). Proceedings of the 14th International Congress of Phonetic Sciences. San Francisco, 1401-1404.
Lingvisticheskie bazy dannykh i yazykovye men'shinstva po obeim storonam severnogo tikho-okeanskogo poyasa [Linguistic Databases and Language Minorities at Both Sides of the North Pacific Rim]. Yazyk i rechevaya deyatel'nost', 2: 8-18.

2000
Scientific Links between Russia and The Netherlands: A Study of the Languages and Cultures in the Russian Federation. Proceedings of the Conference on the Netherlands and the Russian North. Arkhangelsk. To be published.
The Language of the Siberian Mennonites. (With R. Nieuweboer). New Insights in Germanic Linguistics II. Frankfurt am Main, 21-34.

2001
Nivkh and Kashaya: Two Endangered Languages in Contact with Russian and English. Materialy mezhdunarodnoy konferentsii "100 let eksperimental'noy fonetike v Rossii". St.-Petersburg, 78-83.
Data on the Languages of Russia from Historical Documents, Sound Archives and Fieldwork Expeditions. Recording and Restoration of Minority Languages, Sakhalin Ainu and Nivkh, ELPR Report A2-009. Kyoto, 13-37.
Kashaya Pomo and the Russian Influence around the North Pacific. Materials … Scholarly Heritage. Kraków, 385-395.

2002
Yazyk i etnos [Language and Ethnos]. (With A.S. Gerd and M. Savijärvi). Texts and Comments on Balto-Finnic and Northwestern Archaic Russian Dialects. St.-Petersburg, 206 pp.
Voices from Tundra and Taiga: Endangered Languages in Russia on the Internet. Conference Handbook on Endangered Languages. Kyoto, 57-79.
Phonetic Aspects of the Frisian Language and the Use of Sound Archives. Problemy i metody eksperimental'no-foneticheskikh issledovaniy. St.-Petersburg, 52-57.
Voices from the Shtetl: The Past and Present of the Yiddish Language in Russia. Final Report NWO Russian-Dutch Research Cooperation. Groningen, 143 pp.
The Use of Sound Archives in the Study of Endangered Languages. Music Archiving in the World, Papers Presented at the Conference on the Occasion of the 100th Anniversary of the Berlin Phonogramm-Archiv. Berlin, 101-107.
The Use of Acoustic Databases and Fieldwork for the Study of the Endangered Languages of Russia. Proceedings of the International LREC Workshop on Resources and Tools in Field Linguistics. Las Palmas, 29.1-4 (CD-ROM).
Yiddish in St.-Petersburg: The Last Sounds of a Language. Proceedings of the Conference "Klezmer, Klassik, jiddisches Lied. Jüdische Musik-Kultur in Osteuropa". Potsdam. To be published.

2003
Yazyki severnoy i vostochnoy Tartarii – o yazykovykh svedeniyakh v knige N. Vitsena [The Languages of North and East Tartary – About the Linguistic Data in the Book of N. Witsen]. (With M. Bergmann). Proceedings of the Conference on General Linguistics. St.-Petersburg. To be published.
Description of Minority Languages in Russia on the Basis of Historical Data and Fieldwork. Proceedings of the XVIth International Congress of Linguists. Prague. To be published.
Voices of Tundra and Taiga: Data on Minority Languages in Russia from Historical Data and Fieldwork. Proceedings of the Conference "Formation of Educational Programs Aimed at a New Type of Humanitarian Education in Siberian Polyethnic Society". Novosibirsk. To be published.
Endangered Languages in Europe and Siberia: State of the Art, Needs and Solutions. International Expert Meeting on UNESCO Programme "Safeguarding of Endangered Languages". Paris. To be published.
Presentation of the UNESCO Document "Language Vitality and Endangerment". Focus on Linguistic Diversity in the New Europe. European Bureau for Lesser Used Languages, Brussels. To be published.

Tseard de Graaf

Translated by Jurjen van der Kooi

Tseard de Graaf was born on 27 January 1938 in Ljouwert, the capital of the Dutch province of Fryslân, the largest region on the North Sea coast where Frisian, a West Germanic language whose closest relative is English, is spoken.

Tseard's parents were both Frisian, and at home only Frisian was spoken. His first language was therefore Frisian; at school he learned Dutch, the official language of the Netherlands.

At that time Frisian was not yet taught at school, and the children had to learn Dutch there. Through this coexistence of Frisian at home and Dutch at school, Tseard first became acquainted with the fascinating world of different languages. For Tseard the distinction between the two languages also had a remarkably spatial aspect. When he and the other children from his neighbourhood went to school in the morning, they had to cross the railway line. Once across, they switched from Frisian to Dutch, the language of the school.

He was eighteen years old when he passed his secondary-school final examinations in 1956, and he had taken a keen interest in languages. His other great loves were physics and astronomy. The oldest planetarium in the world is in Frjentsjer, the old university town of Fryslân. Captivated by the laws that govern time and space, Tseard studied physics at the University of Groningen from 1956 to 1963. In that last year he took his 'doctoraal' (master's) examination in theoretical physics, a combination of physics, mathematics and astronomy, and subsequently, until 1969, he was a research associate at the Institute for Theoretical Physics of the University of Groningen.

Tseard was already a 'polyglot' by then. He spoke not only Frisian and Dutch, but also German, English and French. Other languages would follow. In the Soviet Union the study of astronomy was then at a superior level. Tseard saw this and learned Russian and other Eastern European languages, because they opened the road to new insights in those fields of science. Alongside his study of theoretical physics he attended lectures in the Slavic languages. He was fascinated by the new technologies and their possible applications in future research, and in 1967 he took his 'kandidaats' (bachelor's) examination in Slavic languages and computational linguistics. In the meantime, after his master's degree in physics, he had continued his research in theoretical physics, which he combined with half a year of study in Poland, where he also learned Polish.

In 1969 he completed his dissertation, entitled "Aspects of Neutrino Astrophysics".

The title page of Tseard's 1969 dissertation in theoretical physics

His unquenchable thirst for knowledge took Tseard to England, where he spent a year with his wife and children from 1970 to 1971, working as a researcher at the Institute for Theoretical Physics of the University of Cambridge.

Back in Groningen, Tseard became a member of the academic staff in physics at the Institute for Astronomy. He remained there until 1975. That year became a turning point in his scientific career, because he decided to follow the path of his second great love: the study of languages. One of the propositions accompanying his dissertation concerned the question of exactly how a person's identity can be defined by his or her language. That proposition can be seen as a symbolic signpost to his later road in linguistics, which would lead him to the study of aspects of the spoken language: to phonetics.

In 1975 Tseard joined the Institute of Phonetics of the Faculty of Arts of the University of Groningen.

Because he had been bilingual (Frisian-Dutch) from an early age, Tseard had a sharp eye for the countless phonetic differences between the two languages. And because he had studied other languages, he knew how important phonetic descriptions are, not only for theoretical linguistics but equally for the learning and teaching of foreign languages.


Language contact and language change would become further focal points of his research. All over the world people are bilingual or even multilingual. There is language variation in space and in time. If one listens to radio programmes or television broadcasts from ten or twenty years ago, one hears a different spoken language from what is current today. It is still the same language and the same place, and yet the language is no longer the same. Not only the lexicon of a language changes, but also people's way of speaking, their pronunciation and intonation: an exceptionally interesting subject for anyone with a passion for languages and their variability.

Tseard began with a search for the oldest recordings of spoken language on sound carriers. He analysed material not only from West Frisia but also from North and East Frisia. Such recordings of earlier spoken language are not only important historical heritage; they also provide valuable information about processes of language change. A practical problem with these oldest sound recordings is that they were made on wax cylinders and that their quality deteriorates rapidly every time they are played. Tseard realized that it was of the utmost importance to transfer these recordings to modern sound carriers, so that they would be preserved. In the early 1990s Tseard, together with colleagues from Japan, started an investigation into the possibilities of doing so. In that period he acquired yet another language, Japanese, which he learned to speak fluently.

Tseard at work with wax cylinders containing older Dutch

Tseard sought contact with the most important sound archives in the world, those in Vienna, Berlin and St. Petersburg. Through his collaboration in the 1990s with the sound archive of the Academy of Sciences in the last of these cities, he renewed the ties with Russia that had begun in the 1960s with his study of the Slavic languages.

Since 1990 the world has changed dramatically. The Iron Curtain is gone and Russia has reopened its 'Window on the West'. When Tseard returned to St. Petersburg after 1990, he was immediately fascinated by this city, which he had first seen some twenty years earlier, when it was still called Leningrad. As a Frisian and a Dutchman he felt at home there. The picturesque canals and the paths along the wide boulevards reminded him of home. That was no coincidence: some 300 years earlier, Tsar Peter the Great had chosen Holland as the model for his new capital.

Tseard now organized joint projects with the Russian Academy of Sciences and the State University of St. Petersburg. The aim was to preserve old Russian sound recordings and transfer them to modern digital audio media. Research on a large collection of sound recordings of every kind, the result of countless linguistic expeditions of the late nineteenth and the twentieth century, set new projects in motion concerning various languages spoken in Russia.

He himself embarked on research into the language of the Siberian Mennonites, whose origins must be sought in the northern Netherlands and Germany and who still speak the language of their forefathers: in fact a language that has much in common with the dialects of present-day Northern Germany and the northern parts of the Netherlands. The Dutch press even ran the headline "Siberians speak Groningen dialect".

Languages not only separate peoples and nations; they also build bridges between them. Tseard showed this with his research as well. Even in far-away Siberia there live people who speak more or less the same language as the people in Groningen. When planning his expeditions Tseard thought not only of science but also of people: he also organized humanitarian aid from Groningen for the villages he visited in Siberia.

Language as cultural heritage became the core of his linguistic activities. Thanks to his bilingual background he could set an excellent example. All his life he has shown that every individual can contribute to the survival of a language. With his wife Nynke, whom he had met as a student and with whom he had long spoken only Dutch, he now agreed to switch to Frisian. Nynke is herself a Frisian, and after the death of her parents she experienced at first hand how a language slowly begins to die out when the children no longer pass it on.

That attitude to language became the guideline for Tseard's subsequent research activities in Russia. The projects he coordinated from then on had two objectives: not only to document endangered languages, but also to revitalize them and maintain them for generations to come. Tseard took part in several expeditions, among others to Yakutia and the island of Sakhalin, where together with other linguists he documented the languages of the local peoples.

Tseard de Graaf with speakers of languages of the island of Sakhalin in the Russian Far East: Uilta and Nivkh (1990s)

In the second half of the 1990s Tseard coordinated various projects with institutes all over the Russian Federation, funded by Dutch scientific organizations and the EU's INTAS programme.

First and foremost he wanted to teach young people that their language is a unique heritage and that they should support minority and regional languages. In 1998 Tseard was made a Knight in the Order of the Netherlands Lion for his research into, and his efforts towards, the preservation of the minority languages of Russia and the creation of databases for them. Later that same year he received an honorary doctorate from the University of St. Petersburg for his contributions to the joint language-preservation projects.


Tseard de Graaf receiving an honorary doctorate at the University of St. Petersburg, November 1998.

In 2003 Tseard had to retire and give up his position as coordinator of the 'Klanklear' (Phonology and Phonetics) group of the CLCG (Center for Language and Cognition Groningen) of the University of Groningen. On that occasion his colleagues compiled this volume in his honour, with a variety of contributions on research in the border areas of phonology and phonetics.

Yet this is no farewell to our former coordinator. Tseard's passionate involvement with languages and linguistic projects has remained. Since his retirement he has been active as an honorary research fellow of the Fryske Akademy in Ljouwert, and the contact with the University of St. Petersburg about forthcoming research projects has also continued. This means that, besides his role as grandfather to his five grandchildren, Tseard still has plenty to do. His enthusiasm is an encouragement to present and future researchers to carry on the research that he set in motion.

Boundary Tones in Dutch: Phonetic or Phonological Contrasts?

Vincent J. van Heuven

1. Introduction1

1.1. Linguistic categorization of sound

A basic problem of linguistic phonetics is to explain how the infinite variety of speech sounds in actual utterances can be described with finite means, such that they can be dealt with in the grammar, i.e. phonology, of a language. The crucial concept that was developed to cope with this reduction problem is the sound category, or – when applied to the description of segmental phenomena – the phoneme. This is best conceived of as an abstract category that contains all possible sounds that are mutually interchangeable in the context of a minimal word pair. That is, substitution of one token (allophone) of a phoneme for another does not yield a different word (i.e., a string of sounds with a different lexical meaning).2

The phonemes in a language differ from one another along a finite number of phonetic dimensions, such as degree of voicing, degree of noisiness, degree of nasality, degree of openness, degree of backness, degree of rounding, etc. Each phonetic dimension, in turn, is subdivided into a small number (two to four) of phonologically functional categories, such as voiced/voiceless, (half)closed/(half)open, front/central/back, etc. Phonetic dimensions generally have multiple acoustical correlates. For instance, degree of voicing correlates with a multitude of acoustic cues such as voice onset time, duration of preceding vowel, steepness of intensity decay and of formant bends in preceding vowel, duration of intervocalic (near) silence, duration and intensity of noise burst, steepness of intensity attack and formant bends of following vowel. These acoustic properties typically co-vary in preferred patterns, but may be manipulated independently through speech synthesis. When non-typical ('conflicting') combinations of parameter values are generated in the laboratory, some cues prove to be more influential than others; so-called 'cue trading relationships' have been established for many phonemic contrasts. In Dutch, for instance, vowel quality (acoustically defined by F1 and F2, i.e., the centre frequencies of the lowest two resonances in the vocal tract) and vowel duration were found to be equally influential in cuing the tense/lax contrast between /aː/ and /ɑ/: a duller vowel quality (lower F1 and F2 values), normally cuing /ɑ/, could be compensated for by increasing the duration of the vowel so that native listeners still perceive /aː/ (and vice versa, van Heuven, 1986).
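A cue-trading relation of this kind can be pictured as a weighted decision over two cues. The sketch below is a toy model: the weights, threshold and cue values are invented for illustration and are not the estimates reported in van Heuven (1986).

```python
def perceived_tense(f1_hz, duration_ms,
                    w_quality=0.01, w_duration=0.02, threshold=9.0):
    """Toy linear cue-integration rule for a tense/lax vowel contrast.
    A higher F1 (brighter quality) and a longer duration both count as
    evidence for tense /a:/; a duller (lower-F1) quality can therefore be
    traded off against extra duration, as in a cue-trading relationship."""
    evidence = w_quality * f1_hz + w_duration * duration_ms
    return evidence > threshold  # True = tense /a:/, False = lax

# A dull (low-F1) vowel is heard as lax when it is short ...
print(perceived_tense(f1_hz=600, duration_ms=80))   # prints: False (lax)
# ... but lengthening the same vowel quality tips the percept to tense.
print(perceived_tense(f1_hz=600, duration_ms=160))  # prints: True (tense)
```

The point of the linear form is only that the two cues enter a single decision variable, so a deficit on one cue can be made up on the other.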

Categorization of sounds may proceed along several possible lines. First, many differences between sounds are simply too small to be heard at all: these are subliminal. The scientific discipline of psycho-acoustics provides a huge literature on precisely what differences between sounds can and cannot be heard with the naked ear. Moreover, research has shown that the human hearing mechanism (and that of mammals in general) has developed specific sensitivities to certain differences between sounds and is relatively deaf to others. These predilections have been shown to be present at birth (probably even in utero), and need not be acquired through learning. However, human categorization of sound is further shaped by exposure to language. As age progresses from infancy to adulthood, sound differences that were still above threshold shortly after birth quickly lose their distinctivity. An important concept in this context is the notion of categorical perception. This notion is best explained procedurally in terms of a laboratory experiment.

Imagine a minimal word pair such as English back ~ pack. One important difference between these two tokens is that the onset of voicing in back is more or less coincident with the plosive release, whilst the voice onset in pack does not start until some 50 ms after the release. It is not too difficult in the laboratory to create a series of exemplars by interpolating the voice onset time of a prototypical back (0-ms delay) and that of a prototypical pack (70-ms delay) in steps of, say, 10 ms, so that we now have an 8-step continuum ranging over 0, 10, 20, 30, 40, 50, 60, and 70 ms. These eight exemplars are shuffled in random order and played to an audience of native English listeners for identification as either back or pack (forced choice). The 0-ms voice delay token will naturally come out with exclusively back-responses (0% pack); the 70-ms token will have 100% pack-responses. But what results will be obtained for the intermediate exemplars? If the 10-ms changes in voice delay are perceived continuously, one would predict a constant, gradual increase in %-pack responses for each 10-ms increment in the delay. I.e., when the stimulus increment (from left to right) is plotted against the response increment (from bottom to top), the psychometric function (the line that captures the stimulus-response relationship) is essentially a straight line (open symbols in Figure 1B). The typical outcome of experiments with voiced/voiceless continua, however, is non-continuous. For the first part of the continuum all exemplars are perceived as back-tokens; the rightmost two or three exemplars are near-unanimously perceived as pack. Only for one or two exemplars in the middle of the continuum do we observe uncertainty on the part of the listener: here the distribution of responses is more or less ambiguous between back and pack. The psychometric function for this so-called categorical perception is sigmoid, i.e., has the shape of an S (big solid symbols in Figure 1B). In the idealized case of perfect categorical perception we would, in fact, expect to see a step function jumping abruptly from (almost) 0 to (almost) 100% pack-responses somewhere along the continuum (thin black line with small solid symbols in Figure 1B).

The category boundary (at 35-ms VOT in Figure 1B) is defined as the (interpolated) point along the stimulus axis where the distribution of responses is completely ambiguous, i.e., 50-50%. For a well-defined cross-over from one category to the other there should be a point along the stimulus axis where 75% of the responses agree on one category, and a second point where there is 75%-agreement on the other category. The uncertainty margin is defined in absolute terms as the distance along the stimulus axis between the two 75%-points; equivalent relative measures can be derived from the steepness of the psychometric function (e.g. the slope coefficient or the standard deviation of the cumulative normal distribution fitted to the data points).
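These definitions can be made concrete with a small fit. The sketch below uses invented identification percentages (not data from the experiments discussed here), fits a logistic psychometric function by a crude grid search, and reads off the 50% boundary and the distance between the two 75%-agreement points.

```python
import math

def logistic(x, b, s):
    """Psychometric function: proportion 'pack' responses at VOT x,
    with category boundary b (the 50% point) and spread s."""
    return 1.0 / (1.0 + math.exp(-(x - b) / s))

def fit_logistic(vots, p_pack):
    """Least-squares fit by brute-force grid search (no external libraries);
    a real analysis would use a maximum-likelihood logit/probit fit."""
    best_err, best_b, best_s = float("inf"), None, None
    for b in (i / 2.0 for i in range(0, 141)):     # boundary 0 .. 70 ms, 0.5-ms steps
        for s in (j / 2.0 for j in range(1, 41)):  # spread 0.5 .. 20 ms
            err = sum((logistic(x, b, s) - p) ** 2 for x, p in zip(vots, p_pack))
            if err < best_err:
                best_err, best_b, best_s = err, b, s
    return best_b, best_s

# Invented identification data: proportion 'pack' responses per VOT step (ms).
vots   = [0, 10, 20, 30, 40, 50, 60, 70]
p_pack = [0.00, 0.02, 0.05, 0.25, 0.80, 0.95, 0.98, 1.00]

b, s = fit_logistic(vots, p_pack)
# For a logistic, the p-quantile lies at b + s*ln(p/(1-p)); the uncertainty
# margin is the distance between the 25% and 75% points (75% agreement on
# one or the other category), i.e. 2*s*ln(3).
margin = 2 * s * math.log(3)
print(f"boundary = {b:.1f} ms VOT, uncertainty margin = {margin:.1f} ms")
```

The slope parameter s here plays the role of the "equivalent relative measure" mentioned in the text: a steeper function (smaller s) means a narrower uncertainty margin.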


Figure 1. Panel A. Hypothetical discrimination function for physically same and different pairs of stimuli (one-step difference) reflecting categorical perception. Panel B. Illustration of continuous (open squares) versus categorical (big solid squares) perception in the identification and discrimination paradigm. The thin line with small squares represents the ideal step function that should be obtained when categorical perception is absolute. Category boundary and uncertainty margin are indicated (further, see text).

Although a pronounced sigmoid function (such as the one drawn in Figure 1B) is a clear sign of categorical perception, researchers have always been reluctant to consider it definitive proof. Listeners, when forced to, tend to split any continuum down the middle. For a continuum to be perceived categorically, therefore, two conditions should be met:


- results of an identification experiment should show a clear sigmoid function, and

- the discrimination function should show a local peak for stimuli straddling the category boundary.

The discrimination function is determined in a separate experiment in which either (i) identical or (ii) adjacent tokens along the stimulus continuum are presented pair-wise. Listeners then decide for each pair whether the two tokens are 'same' or 'different'. Two kinds of error may occur in a discrimination task:

- a physically different pair may be heard as 'same', and
- a pair of identical tokens may be called 'different'.

The results of a discrimination task are best expressed as the percentage of correct decisions obtained for a 'different' stimulus pair minus the percentage of errors for 'same' pairs constructed from these stimuli (the latter percentage is often called the response bias). In the case of true categorical perception the discrimination scores show a pronounced peak for the stimulus pair straddling the category boundary, whilst all other pairs are discriminated at or only little above chance level (see panel A in Figure 1). Physically different sounds that fall in the same perceptual category are hard to discriminate. In the case of continuous perception, there is no local peak in the discrimination function.
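The bias-corrected score can be computed per pair of adjacent stimuli. The counts below are invented for illustration (they are not results from any experiment cited here); under categorical perception only the pair straddling the boundary should stand clearly above zero.

```python
def discrimination_scores(hits_different, false_alarms_same, n_trials):
    """Per stimulus pair: % correct 'different' judgements for physically
    different pairs, minus the response bias (% of physically identical
    pairs nevertheless judged 'different')."""
    return [100.0 * (hit - fa) / n_trials
            for hit, fa in zip(hits_different, false_alarms_same)]

# Invented counts out of 40 trials for the adjacent pairs of an 8-step
# 0-70 ms VOT continuum: (0,10), (10,20), ..., (60,70).
hits_different    = [6, 7, 8, 34, 9, 7, 6]  # 'different' pair judged 'different'
false_alarms_same = [5, 6, 6,  5, 6, 5, 5]  # 'same' pair judged 'different'

scores = discrimination_scores(hits_different, false_alarms_same, 40)
peak = max(range(len(scores)), key=scores.__getitem__)
print(scores)            # only index 3, the (30, 40) ms pair, is far above zero
print("peak at pair", peak)
```

Subtracting the 'same'-pair false-alarm rate keeps a listener who simply calls everything 'different' from producing a spuriously high score.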

1.2. Categorical nature of intonational contrasts

By intonation or speech melody we mean the pattern of rises and falls in the time-course of the pitch of spoken sentences. Melodic patterns in speech vary systematically across languages, and even within languages across dialects. The cross-linguistic differences can be parameterized and described in much the same way as has been done for the segmentals in language: a set of distinctive features defines an inventory of abstract units, which can be organized in higher-order units subject to wellformedness constraints. Moreover, intonational contrasts are used to perform grammatical functions that can also be expressed by lexico-syntactic means, such as turning statements into questions, and putting constituents in focus. For these reasons it has become widely accepted that intonation is part of the linguistic system (Ladd, 1996: 8). Yet, there have always been adherents of the view that speech melody should be considered as something outside the realm of linguistics proper, i.e., that intonation is a paralinguistic phenomenon at best, to be treated on a par with the expression of attitudes or emotions. Typically, the communication of emotions (such as anger, fear, joy, surprise) or of attitudes (such as sarcasm) is non-categorical: the speaker shows himself more or less angry, fearful, or sarcastic in a continuous, gradient fashion.

A relatively recent insight, therefore, is that a division should be made in melodic phenomena occurring in speech between linguistic versus paralinguistic contrasts. Obviously, only the former but not the latter type of phenomena should be described by the grammar and explained by linguistic theory. This, however, raises the question of how the distinction can be drawn between linguistic and paralinguistic phenomena within the realm of speech melody.3 Ladd & Morton (1997) were the first to suggest that the traditional diagnostic for categorical perception should be applicable to intonational categories in much the same way as it works for segmental contrasts. Only if a peak in the discrimination function is found for adjacent members on a tone continuum straddling a boundary between tonal categories, are the categories part of the linguistic system, i.e., phonological categories. If no categorical perception of the tone categories can be established, the categories are ‘just’ the extremes of a paralinguistic or phonetic tonal continuum. Ladd & Morton tested the traditional diagnostic on a tone continuum between normal and emphatic accent in English and noted that it failed. This – to me – indicates that the contrast is not part of the phonology of English.

Remijsen & van Heuven (1999, 2003) tested the traditional diagnostic on a tone continuum between ‘L%’ and ‘H%’ in Dutch, and showed that indeed there was a discrimination peak for adjacent members along the continuum straddling the boundary – indicating that the ‘L%’ and ‘H%’ categories are part of the phonology of Dutch. At the same time, however, we had to have recourse to listener-individual normalization of the category boundary, a complication that is not generally needed when dealing with contrasts in the segmental phonology.4

Van Heuven & Kirsner (2002) suggested that the relatively weak categorical effects in Remijsen & van Heuven could have been the result of an incorrect subdivision of the ‘L%’ to ‘H%’ tone range. Van Heuven & Kirsner (2002) showed that Dutch listeners were perfectly able to categorize a range of final pitches between low and high in terms of three categories, functionally denoted as command intonation, continuation, and question. However, we did not run the full diagnostic involving both identification and discrimination procedures. Moreover, Van Heuven & Kirsner forced their listeners to choose between three response alternatives, viz. command, conditional and question. Although the extremes of the range, i.e. command versus question, are unchallenged categories, it may well be the case that the conditional is not necessarily distinct from the question type. After all, in the grammar developed by ‘t Hart, Collier & Cohen (1990) any type of non-low terminal pitch falls into the same category, indicating non-finality. It occurred to us that we should take the precaution of running the experiment several times, using different response alternatives, such that two separate binary (‘command’ ~ ‘no command’ and ‘question’ ~ ‘no question’) response sets as well as the ternary response set (‘command’ ~ ‘conditional’ ~ ‘question’) were used by the same set of listeners. If the intermediate ‘conditional’ response category does constitute a clearly defined notion in the listeners’ minds, the binary and ternary divisions of the stimulus range should converge on the category boundaries.

The present paper seeks to remedy the infelicities of Van Heuven & Kirsner (2002). However, before I deal with the experiments, it is necessary to introduce the inventory of the domain-final boundary configurations that can be found in Dutch.

1.3. Dutch domain-final boundary tones

Over the past decades a major research effort has been spent on the formal description of the sentence melody of Dutch. In the present paper we concentrate on one small part of the intonation system of Dutch: the options that are available to the speaker to terminate an intonation phrase. It has become customary to model the intonation system of a language as a hierarchically organized structure in which the tonal primitives (or ‘atoms’) are combined into tonal configurations, which in turn combine into intonation phrases. One or more such intonation phrases are combined into an utterance, which may combine with other utterances to form a prosodic paragraph. The intonation phrase (henceforth IP), then, is situated roughly in the middle of the prosodic hierarchy. Note that a short utterance may consist of just one IP. An IP is characterized as a stretch of speech between two IP boundaries, i.e., a break in the segment string that is signaled by a pause (a physical interruption of the sound stream), pre-boundary lengthening and/or a boundary-marking tone. If the boundary is sentence-medial, then yet another IP must follow in order to finish the utterance.

The first explicit and experimentally verified grammar of Dutch intonation was developed at the Institute for Perception Research in Eindhoven (‘t Hart et al., 1990; Rietveld & van Heuven, 2001: 263-270). This grammar models the sentence melody of Dutch as a system of two gently declining reference lines, nominally 6 semitones (half an octave) apart, between which the pitch rises and falls in a limited number of patterns. The grammar provides for three different ways in which an IP may be terminated: (i) on the low reference line (‘0’), (ii) on the high reference line (‘∅’), or (iii) by executing a steep pitch rise (‘2’). Although the grammar is not completely explicit on this point, it appears that the offset of rise ‘2’ may exceed the level of the high reference line, specifically when the rise starts at the high reference line. The grammar then allows IPs to end at three different pitches: low, high, and extra high.

A more recent account of Dutch intonation is given by Gussenhoven and co-workers (Gussenhoven, Rietveld & Terken, 1999; Rietveld & van Heuven, 2001: 270-277). This model is constructed along the principles adopted by autosegmental intonologists, in which a sentence melody is basically a sequence of tonal targets of two types: ‘H’ (high) and ‘L’ (low). The ToDI system (Transcription of Dutch Intonation), which is an inventory of tonal configurations for surface-level transcriptions of Dutch sentence melodies using the autosegmental H/L notation format, provides three symbols for marking IP boundaries: (i) ‘L%’, i.e., the final pitch target extends below the baseline, (ii) ‘%’, i.e., the absence of a tonal IP boundary marker, and (iii) ‘H%’, i.e., the final pitch is higher than the preceding pitch.5 For details of the ToDI transcription system I refer to the ToDI website (www.lands.kun.nl/todi) or to Rietveld & van Heuven (2001: 399-401).

Remijsen & van Heuven (1999, 2003) report an experiment which sought to establish the perceptual boundary between sentence-final statement and question intonation. They did this by varying the pitch configuration on the utterance-final syllable of the verb-less phrase De Dennenlaan(?) ‘Pine Lane(?)’ between a fall and a steep rise in eleven perceptually equal steps. Listeners were then asked to decide for each of the eleven pitch patterns whether they perceived it as a statement or a question. At the time we tacitly assumed that the continuum spanned just two pragmatic categories, i.e. statement versus question, and that there was no relevant intermediate category that could be interpreted as ‘non-finality’. In fact, Kirsner & van Heuven (1996) suggested a single abstract meaning for the non-low tonal category: ‘appeal (by the speaker to the hearer)’, asking for the hearer’s continued attention, for a verbal response to a question, or for non-verbal compliance with a request. However, Caspers (1998) suggested that there is a functional difference between the non-tonal boundary (‘%’) following an earlier ‘H*’ target and the high boundary (‘H%’) following an earlier ‘H*’. She synthesized stimuli in which the terminal pitch after the accent-marking ‘H*’ was followed by either ‘H%’ (where the final pitch was raised further) or just ‘%’ (where the pitch remained high but level after the accent). Her results indicate that listeners unequivocally expect the speaker to continue after the ‘H* ... %’ configuration, in contradistinction to the ‘H* ... H%’ pattern, for which the responses were equally divided between ‘same speaker will continue’ and ‘interlocutor will take over (with a response)’.

Note that the ‘%’ tone-less boundary as studied by Caspers is found only after a preceding ‘H*’ accent. Strictly speaking, then, the ‘%’ boundary cannot be used as an intermediate category between ‘L%’ and ‘H%’ when the preceding pitch is low. After ‘L’, any rise in pitch, whether strong or intermediate, is a perceptually relevant change in pitch, which must be coded by an ‘H%’ target. On the other hand, this formal constraint stands in the way of an attractive generalization which would allow us to view the high level pitch (‘H* ... %’) pattern as a surface realization of the ‘H*L ... %’ pattern from which the L target has been deleted – in much the same way as was suggested by Haan (2002) in order to account for the functional similarity between the ‘H* ... H%’ and the ‘H*L … H%’ interrogative patterns, as exemplified in Figure 2.

Figure 2. Underlying tonal shape (dotted) and surface realization after ‘L’-deletion (solid) of an ‘H*L … H%’ sequence.

There seems to be a mismatch between the functions expressed by Caspers’ ‘%’ and ‘H%’ after ‘H*’. If we assume an iconic relationship between the terminal pitch of the utterance and the degree of submissiveness of the speaker towards the hearer, then we would reason that ‘H%’ should make more of an appeal to the hearer (expressing greater submissiveness) than just ‘%’. On the other hand, answering a question seems a bigger favor on the part of the hearer than merely waiting for the speaker to continue the utterance. It could be the case, of course, that even the highest terminal pitches used by Caspers were not high enough to elicit unambiguous ‘other speaker will take over’ (i.e. ‘question’) responses. Also, it is unclear whether the unambiguous ‘same speaker will continue’ response crucially depends on a flat stretch of high declination (as is the case after an ‘H*’ accent) or whether any terminal pitch of intermediate height would yield the same response.

In Caspers’ analysis the ‘%’ boundary – and arguably an ‘L … H%’ sequence with a moderately high terminal pitch – unambiguously signals continuation. This category would then be expected to be firmly represented in the listener’s cognitive system. Varying the terminal pitch from low to extremely high should then elicit three categories: (i) unambiguous statements for low pitches, (ii) unambiguous continuations for intermediate terminal pitches, and (iii) a poorly defined or non-unique interrogative category, which is also compatible with a continuation reading.

At this time, then, we do not know whether two or three formal tone categories should be postulated in IP-final position. It seems that the status of ‘L%’ as a linguistic category is unchallenged, but the non-low part of the IP-final tone range is very much a matter of debate. Does the non-low part of the range form a continuum expressing lesser or greater appeal by the speaker in a paralinguistic manner, or should this part of the range be split into two discrete phonological categories, each expressing a distinct meaning of its own (i.e. ‘continuation’ ~ ‘question’), or – even worse – into two categories of which one is specific for ‘continuation’ and the other underspecified and compatible with both ‘question’ and ‘continuation’? These meanings, and a possible way of testing the categorical nature of tonal contrasts expressing them, are the topic of the next section.

1.4. Clause typing

Dutch, like any other language, has lexico-syntactic means to express a range of clause types, such as statement, command, exclamation and question. Although the lexico-syntactic means are generally adequate and sufficient to express the speaker’s pragmatic intention to the hearer, several – if not all – clause types are supported by prosodic means, specifically by appropriate intonation patterns. In fact, exceptional situations may arise where there is no lexico-syntactic differentiation between the clause types, and where the speaker’s intention can only be recovered from melodic cues. For the purposes of the present experiment we have looked for a situation in which the three prosodic categories may serve as the only cue to a ternary choice among clause types, so that prosody will be exploited to the utmost, and the listener’s choice will not be co-determined by lexical and/or syntactic cues. Such a situation may be obtained in a V1 sentence, where the finite verb has been moved into sentence-initial position.6 In the sentence Neemt u de trein naar Wageningen ‘Take you the train to Wageningen’ the lexico-syntactic information is compatible with at least three interpretations:7

- A polite imperative (Kirsner, van Heuven & Caspers, 1998)
- A conditional clause similar in meaning to ‘If you take the train to Wageningen ...’
- A yes/no question ‘Do you take the train to Wageningen?’

Which of the three readings is intended by the speaker is expressed through prosody only. In setting up the experiment we assumed that there is no difference in principle in the speech melody between a statement and a command in Dutch.8 Using a range of terminal pitch patterns on the single phrase Neemt u de trein naar Wageningen, we can determine the category boundaries between command (for statement), conditional (for continuation), and question without any interfering differences in lexico-syntactic structure.

We may conclude this introduction by summarizing the research questions that we will address:

1. Are the domain-final boundaries ‘L%’ ~ ‘%’ ~ ‘H%’ contiguous categories along a single tonal dimension?

2. Is there a one-to-one correspondence between ‘L%’ and ‘command’, ‘%’ and ‘conditional’, and ‘H%’ and ‘question’?

3. Where are the category boundaries – if any – along the continuum between (i) ‘L%’ and ‘%’ and (ii) ‘%’ and ‘H%’?

4. Are the category boundaries at the same positions along the stimulus range irrespective of the binary versus ternary response mode?

5. Are both boundaries truly categorical in the sense that there are discrimination peaks for adjacent stimulus pairs straddling the category boundaries?

2. Methods

2.1. Stimuli

A male native speaker of standard Dutch read the sentence Neemt u de trein naar WAgeningen? with a single ‘H*L’ accent on the first syllable of Wageningen. The utterance was recorded onto digital audio tape (DAT) using a Sennheiser MKH 416 unidirectional condenser microphone, transferred to computer disk (16 kHz, 16 bits) and digitally processed using the Praat speech processing software (Boersma & Weenink, 1996; Boersma & van Heuven, 2001). The intonation pattern of the utterance was stylized by hand as a sequence of straight lines in an ERB-by-linear-time representation. Nine intonationally different versions were then generated using the PSOLA analysis-resynthesis technique (e.g. Moulines & Verhelst, 1995; Rietveld & van Heuven, 2001: 379-380) implemented in the Praat software. The nine versions were identical up to and including the ‘H*L’ configuration on Wageningen. From that point onwards the nine versions diverged into two falls and seven rises. The terminal frequencies of the nine versions were chosen to be perceptually equidistant, i.e., the difference between any two adjacent terminal frequencies was equal in terms of the ERB scale.9 The terminal pitch of version 1 equaled 80 Hz; the increment in terminal frequency for each following version was 0.25 ERB. The nine pitch patterns are shown in Figure 3.
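The stimulus design can be reproduced numerically. The following sketch (not the authors' script) derives the nine terminal frequencies from the 80-Hz starting point and the 0.25-ERB increment, using the Hz-to-ERB conversion formula given in note 9:

```python
import math

# ERB conversion from note 9: E = 16.6 * log10(1 + f / 165.4), and its inverse.
def hz_to_erb(f):
    return 16.6 * math.log10(1.0 + f / 165.4)

def erb_to_hz(e):
    return 165.4 * (10.0 ** (e / 16.6) - 1.0)

# Nine terminal pitches, perceptually equidistant: 0.25-ERB steps from 80 Hz.
base_erb = hz_to_erb(80.0)
terminal_hz = [erb_to_hz(base_erb + 0.25 * step) for step in range(9)]
# Step 1 is 80 Hz; step 9 comes out just under 160 Hz.
```

The equal ERB spacing means the Hz differences between adjacent steps grow toward the top of the continuum, which is why the range in Hz is roughly an octave.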

Figure 3. Steps 1 through 9 along the resynthesized continuum differing in terminal F0 by 0.25 ERB increments. Intensity contour (dB) and segmentation (by syllables) are indicated.

2.2. Tasks and experimental procedures

For the discrimination task, which was the first task imposed on the subjects, we followed Ladd and Morton (1997) in using the AX discrimination paradigm. Stimuli were presented in pairs that were either the same or one step apart on the continuum. In the latter case, the second member could be higher or lower than the first (hereafter AB and BA, respectively). The eight AB stimulus types ran from pair {1,2} to {8,9}; the eight corresponding BA types from {2,1} to {9,8}. This yielded 9 identical pairs and 2 x 8 = 16 different pairs, which occurred in random order, yielding a set of 25 trials in all, which was presented to each listener four times in different random orders, preceded by five practice trials. Stimuli within pairs were separated by a 500-ms silence; the pause between pairs was 3000 ms. A short warning tone was sounded after every tenth trial.
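The trial set described above can be sketched as follows (a hypothetical helper, not the authors' presentation software): 9 identical pairs plus 8 ascending and 8 descending one-step pairs, shuffled into a 25-trial block.

```python
import random

# Sketch of the AX trial inventory for a 9-step continuum.
def ax_trials(n_steps=9, seed=None):
    same = [(i, i) for i in range(1, n_steps + 1)]        # 9 AA pairs
    ab = [(i, i + 1) for i in range(1, n_steps)]          # 8 ascending pairs
    ba = [(i + 1, i) for i in range(1, n_steps)]          # 8 descending pairs
    trials = same + ab + ba                               # 25 trials in all
    random.Random(seed).shuffle(trials)
    return trials

block = ax_trials(seed=1)
# Each listener heard four such blocks in different random orders.
```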

For the identification task listeners responded to individual stimuli from the 9-step continuum by classifying each in terms of either a binary or a ternary choice:

1. ‘Command’ ~ ‘no command’. In one task the listeners were instructed to decide for each stimulus whether they interpreted it as a command or not.

2. ‘Question’ ~ ‘no question’. An alternative task involved the decision whether the stimulus sounded like a question or not.

3. ‘Command’ ~ ‘condition’ ~ ‘question’. The third task was identical to the task imposed in van Heuven & Kirsner (2002).

Half of the listeners first performed task (1); the other half began with task (2). Task (3) was always the last identification procedure in the array of tests. For each task, the set of nine stimuli was presented five times to each listener, in different random orders, and preceded by five practice items, yielding sets of 50 identification stimuli per task.

Twenty native Dutch listeners, ten males and ten females, took part in the experiment on a voluntary basis. Participants were university students or members of their families. None of them reported any perceptual deficiencies.

The experiments were run with small groups of subjects, who listened to the stimuli at a comfortable loudness level over Quad ESL-63 electrostatic loudspeakers, while seated in a sound-treated lecture room. Subjects marked their responses on printed answer sheets provided to them, always taking the discrimination task first and the identification tasks last.

3. Results

3.1. Identification

Figures 4 and 5 present the results obtained in the binary identification tasks, i.e., the forced choice between ‘command’ ~ ‘no command’ (Figure 4) and between ‘question’ ~ ‘no question’ (Figure 5).

Figure 4. Percent ‘command’ responses as a function of stimulus step (terminal F0 increments in 0.25 ERB steps) in a binary identification task (‘command’ ~ ‘no command’).

The psychometric function for the ‘command’ responses is very steep. The category boundary between ‘command’ and ‘no command’ is located at step 2.7, and the margin of uncertainty runs between 2.2 and 3.7, i.e., the cross-over from 75% to 25% ‘command’ responses is effected by an increase in the terminal pitch of the stimulus of 1.5 steps (i.e., 0.37 ERB).
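The boundary and the margin of uncertainty can be read off the identification curve by interpolating the 50%, 75% and 25% crossing points. A minimal sketch, using invented identification percentages rather than the measured data:

```python
# Locate where a monotonically falling psychometric curve crosses a level,
# by linear interpolation between adjacent stimulus steps.
def crossing(steps, pct, level):
    for (x0, y0), (x1, y1) in zip(zip(steps, pct), zip(steps[1:], pct[1:])):
        if y0 >= level >= y1 and y0 != y1:
            return x0 + (y0 - level) * (x1 - x0) / (y0 - y1)
    return None

steps = list(range(1, 10))
command_pct = [100, 100, 40, 5, 0, 0, 0, 0, 0]   # invented '% command'
boundary = crossing(steps, command_pct, 50)       # 50% cross-over point
margin = (crossing(steps, command_pct, 75),       # 75%-to-25% interval =
          crossing(steps, command_pct, 25))       # margin of uncertainty
```

A steep function gives a narrow 75%-to-25% interval; a shallow one gives a wide margin of uncertainty.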

Figure 5. Percent ‘question’ responses as a function of stimulus step (terminal F0 increments in 0.25 ERB steps) in a binary identification task (‘question’ ~ ‘no question’).

A complete cross-over is also found for the ‘question’ ~ ‘no question’ task. The category boundary lies at a stimulus value of 3.6, whilst the margin of uncertainty runs between 2.3 and 4.9, i.e., an interval of 2.6 increments of 0.25 ERB. We may note that the category boundaries in the ‘command’ and the ‘question’ tasks do not coincide, but are separated along the stimulus axis by almost a complete step: 2.7 versus 3.6, or 0.9 step. Note, once more, that none of the subjects had been alerted to the possible existence of an intermediate category between ‘command’ and ‘question’. Therefore, the emergence of the interval between the ‘command’ and the ‘question’ boundaries might be taken as justification for such an intermediate category.

Let us now turn to the results of the ternary identification task, in which all the listeners who had already responded to the stimuli were now required to classify the nine stimulus types as either ‘command’, ‘conditional subclause’ or ‘question’. These results are shown in Figure 6.

Figure 6. Ternary identification of stimuli as ‘command’, ‘conditional clause’ or ‘question’. Category boundaries are indicated.

The boundary between the ‘command’ and ‘continuation’ categories is at 2.8; this is hardly different from the ‘command’ ~ ‘no command’ boundary that was found in the binary response task. This, then, would seem to be a very robust boundary, showing that at least ‘command’ intonation has well-defined linguistic status. The boundary between ‘continuation’ and ‘question’ is less clearly defined. Also, the maximum scores in these two categories are around 80% rather than 90% or more. Although there is no ambiguity in the listeners’ minds whether a stimulus is a command or something else, the choice between ‘continuation’ and ‘question’ seems more ambiguous, leaving room for a minority response in the order of 20%. This would indicate to us that we are dealing here with a continuum rather than with a dichotomy. Finally, we may note that the (soft) category boundary between ‘continuation’ and ‘question’ is located at a stimulus value of 7.2. The boundary, then, that sets off ‘question’ from ‘no question’ responses proves very unstable: there is a shift from the binary response task (3.6) to the ternary task (7.2) of no less than 3.6 points along the stimulus continuum.

It would seem, then, that the ‘command’ category is highly stable and well established in the minds of the listeners. The ‘question’ boundary, however, is rather poorly defined, as a result of several circumstances. The cross-over points for the ‘question’ category of individual listeners vary over a wide range of stimulus values, i.e., between step 2.2 and step 8.5, with a fairly even spread of values in between these extremes. Moreover, for two listeners no cross-over to the ‘question’ category could be found at all; for these listeners the ‘question’ response never exceeded 75%. Also, some listeners have extremely sharp cross-overs to the ‘question’ category, while others show large margins of uncertainty.

3.2. Discrimination

Figure 7 presents the mean percentage of successfully discriminated stimuli that were actually different (hereafter ‘hits’), and the percentage of false alarms, i.e. ‘different’ responses to (identical) AA stimuli. The false-alarm rate is roughly 20% across the entire stimulus continuum. This value can be seen as a bias for responding ‘different’. Generally, an increment of 0.25 ERB is discriminated above the 20% bias level, with the exception of the difference between stimulus steps 5 and 6. The discrimination function shows two local peaks. The first one is very large, and is located between stimulus steps 2 and 3. This peak obviously coincides with the stable category boundary found between ‘command’ and the non-command responses (whether binary or ternary). A much smaller second discrimination peak may be observed between stimulus steps 6 and 7, whose location may well reflect the rather poorly defined category boundary between ‘continuation’ and ‘question’.

Figure 7. Percent ‘different’ judgments to nine identical stimulus pairs (false alarms) and eight pairs differing by one step (hits).

4. Conclusions and discussion

Let us now try to formulate answers to the research questions that we asked in section 1.4. The first two questions, which I will attempt to answer together, asked whether the domain-final boundary tones are contiguous categories along a single tonal dimension, and map onto the command, continuation and question meanings in a one-to-one fashion. The results of our experiments clearly indicate that this is indeed the case. Our listeners had no difficulty in using the three response alternatives provided to them. When the terminal pitch was lower than the preceding pivot point in the contour, the responses were almost unanimously for ‘command’. When the IP-final pitch was higher than the preceding pivot point, the incidence of ‘continuation’ responses increased up to and including step 4, and decreased for higher terminal pitches, which were the more readily identified as questions the higher the terminal pitch was. Although there was always some ambiguity between the ‘continuation’ and ‘question’ alternatives, the results clearly indicate that ‘continuation’ is signaled by moderate final pitch, and ‘question’ by (extra) high pitch.

The latter finding corresponds with our suggestion that asking a question involves a higher degree of appeal by the speaker to the hearer than asking for the listener’s continued attention. We may also note that our result clashes with Caspers (1998). She found that the intermediate final pitch (or high level pitch in her experiment) was unambiguously identified as continuation; extra high final pitch ambiguously coded either continuation or question. Comparison of Caspers’ and our own results is hazardous, since the utterance-final tone configurations differ, not so much at the underlying tone level, but at the surface. It seems to me that the discrepancy between Caspers’ and our own findings can be resolved if we accept the possibility that Caspers’ extra high terminal pitch was simply not high enough to elicit the 80% ‘question’ responses that we got in our experiment.

The results so far concur with van Heuven & Kirsner (2002). However, we may now go on to consider the third, fourth and fifth questions, which asked where the category boundaries are located along the final pitch continuum between ‘L%’, ‘%’ and ‘H%’, in the binary and ternary response tasks, and to what extent the boundaries coincide with a peak in the discrimination function.

The results obtained in the binary (‘command’ ~ ‘no command’) and ternary (‘command’ ~ ‘continuation’ ~ ‘question’) identification tasks are virtually the same, yielding the same location of the boundary (at step 2.7) separating the ‘command’ category from the rest of the stimulus continuum. However, a very unstable boundary is found in the binary ‘question’ ~ ‘no question’ task (at step 3.6), which is reflected in the poorly defined boundary separating the ‘continuation’ and ‘question’ categories in the ternary response task (at step 7.2). Moreover, we have seen that the category boundary between ‘command’ and ‘no command’ coincides with a huge peak in the discrimination function. Although there is a modest local maximum in the discrimination function that may be associated with a boundary between ‘continuation’ and ‘question’, this peak is not very convincing.

I take these findings as evidence that there is a linguistic, or phonological, categorization of the IP-final boundary tone continuum into just two types, which is best characterized as low and non-low. The low boundary tone signals dominance or superiority on the part of the speaker. This is the boundary tone that is suited for issuing statements and commands. The non-low boundary tone signals subservience of the speaker to the hearer; the speaker appeals to the hearer for his continued attention or for an answer to a question.

The non-low part of the boundary opposition, however, represents a gradient, paralinguistic continuum between a moderate appeal (asking for the hearer’s continued attention) and a stronger appeal (asking the hearer for a verbal reply to a question). Here the lower terminal pitches are associated with weaker degrees of appeal (or subservience), and the higher levels with strong appeal, but in a continuous, gradient, non-phonological manner.

Our results indicate that earlier findings reported by Remijsen & van Heuven (1999, 2003) are to be viewed with caution. We now know that the proper task to be imposed on listeners should not be to decide whether the stimulus is a statement (or a command) versus a question. If binary response alternatives are required, then the categories should be ‘statement’ versus ‘no statement’, but a better procedure would be to ask the listener to respond by choosing from three categories: ‘statement’ (equivalent to ‘command’ in our experiments) ~ ‘continuation’ ~ ‘question’. Had such precautions been taken by Remijsen & van Heuven, their category boundary would have been much better defined, with less listener-individual variation.

Methodologically, we argue that the classical identification-cum-discrimination paradigm is a useful diagnostic tool in intonation research, which allows linguists to decide experimentally whether a melodic contrast is categorical, and therefore part of the phonology, or continuously gradient, and therefore phonetic or even paralinguistic.

Notes

1 The experiments reported in this chapter were run by Susanne Strik and Josien Klink in partial fulfillment of the course requirements for the Experimental Phonetics Seminar taught by the Linguistics Programme at the University of Leiden.

2 This commutation procedure is best viewed as a mental experiment; when the exchange is implemented through actual digital tape splicing, the result is more often than not an uninterpretable stream of sound.

3 The nature of the distinction between intonational categories is problematic for a further reason: inter-listener agreement on the identity of intonational events is low (Pitrelli et al., 1994), particularly in comparison with the self-evident consensus on segmental distinctions. This lack of consistency has led Taylor (1998) to reject a basic principle of (intonational) phonology, namely its categorical nature. With respect to methodology, researchers tend to act as expert listeners, linking contours that sound distinct to pragmatic meaning in an intuitive fashion. Accordingly, inter-researcher agreement may be low, too (e.g. Caspers, 1998).

4 Nevertheless, large between-listener variability has been reported, for instance, in the cuing of the voiced/voiceless contrast by the duration of the pre-burst silent interval: the boundary was at 70 ms for subject #1 and over 100 ms for subject #7 (Slis & Cohen, 1969). These results are commented on by Nooteboom & Cohen (1976: 84) as follows: ‘Although the cross-over from /d/ to /t/ proceeds rather gradually when averaged over all listeners, the boundary is quite sharply defined for individual listeners’ (my translation, VH).

5 The ‘%’ sign following the tone letter (as in ‘L%’, ‘H%’) denotes a domain-final boundary; domain-initial boundaries are coded by the ‘%’ sign preceding a tone letter (as in ‘%L’, ‘%H’). A ‘%’ sign unaccompanied by a tone letter may only occur in domain-final position, where it is phonetically coded by a physical pause and/or pre-boundary lengthening only.

6 It has been argued by structuralists at least as far back as Merckens (1960) that V1 (‘verb first’) is directly opposed to V2 (‘verb second’) in signaling, for example, ‘non-assertion’ rather than ‘assertion’, since neither a command nor a question nor a condition expresses an ongoing state of affairs.

7 A sequence like Neemt u de trein naar Wageningen might in addition be interpretable as a topic-drop sentence (e.g. [Dan/Daar] neemt u de trein naar Wageningen ‘[Then/There] you take the train to Wageningen’), analogous to Doen we! ‘We'll do [it]’ or Weet ik! ‘[That] I know’. Although this added interpretation (with a ‘deleted’ element) is theoretically possible, we believe that it was highly unlikely under the controlled conditions of the experiment. Furthermore, none of the experimental subjects volunteered the information that we had forgotten such an extra interpretation.

8 This position does not exclude the possibility that statement and imperative are subtly different in their paralinguistic use of prosody. For instance, the overall pitch of the imperative may be lower, and it may be said with greater loudness and larger/higher pitch excursions on the accented syllables. This does not invalidate our claim that both statements and imperatives are coded by the ‘L%’ terminal boundary.

9 The ERB scale (Equivalent Rectangular Bandwidth) is currently held to be the most satisfactory psychophysical conversion for pitch intervals in human speech (Hermes & van Gestel, 1991; Ladd & Terken, 1995). The conversion from Hertz (f) to ERB (E) is achieved by a simple formula: E = 16.6 * log(1 + f / 165.4).
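As a quick illustration, the conversion in note 9 and its inverse can be sketched as follows (a minimal sketch of ours, assuming that ‘log’ in the formula denotes the base-10 logarithm; the function names are not from the original):

```python
import math

def hertz_to_erb(f):
    """Convert a frequency f in Hz to ERB-rate units, following the
    formula quoted in note 9: E = 16.6 * log10(1 + f / 165.4)."""
    return 16.6 * math.log10(1.0 + f / 165.4)

def erb_to_hertz(e):
    """Inverse conversion, solving the formula above for f."""
    return 165.4 * (10.0 ** (e / 16.6) - 1.0)
```

For example, the interval between 100 Hz and 200 Hz spans more ERB units at the bottom of the scale than the same 100 Hz difference does higher up, reflecting the quasi-logarithmic nature of pitch perception.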


References

Boersma, P. and Heuven, V.J. van (2001). Speak and unSpeak with Praat. Glot International, 5: 341-347.

Boersma, P. and Weenink, D. (1996). Praat, a System for Doing Phonetics by Computer. Report of the Institute of Phonetic Sciences Amsterdam, 132.

Caspers, J. (1998). Who’s Next? The Melodic Marking of Question vs. Continuation in Dutch. Language and Speech, 41: 375-398.

Gussenhoven, C., Rietveld, T. and Terken, J.M.B. (1999). Transcription of Dutch Intonation. http://lands.let.kun.nl/todi.

Haan, J. (2002). Speaking of Questions. An Exploration of Dutch Question Intonation. LOT Dissertation Series, nr. 52, Utrecht: LOT.

Hart, J. 't, Collier, R. and Cohen, A. (1990). A Perceptual Study of Intonation. An Experimental-phonetic Approach to Speech Perception. Cambridge: Cambridge University Press.

Hermes, D.J. and Gestel, J.C. van (1991). The Frequency Scale of Speech Intonation. Journal of the Acoustical Society of America, 90: 97-102.

Heuven, V.J. van (1986). Some acoustic characteristics and perceptual consequences of foreign accent in Dutch spoken by Turkish immigrant workers. In: J. van Oosten, J.F. Snapper (eds.), Dutch Linguistics at Berkeley, papers presented at the Dutch Linguistics Colloquium held at the University of California, Berkeley on November 9th, 1985. Berkeley: The Dutch Studies Program, U.C. Berkeley, 67-84.

Heuven, V.J. van and Kirsner, R.S. (2002). Interaction of tone and particle in the signaling of clause type in Dutch. In: H. Broekhuis, P. Fikkert (eds.), Linguistics in the Netherlands 2002. Amsterdam/Philadelphia: John Benjamins, 73-84.

Kirsner, R.S. and Heuven, V.J. van (1996). Boundary Tones and the Semantics of the Dutch Final Particles hè, hoor, zeg and joh. In: M. den Dikken, C. Cremers (eds.), Linguistics in the Netherlands 1996. Amsterdam/Philadelphia: John Benjamins, 133-146.

Kirsner, R.S., Heuven, V.J. van, and Caspers, J. (1998). From Request to Command: An Exploratory Experimental Study of Grammatical Form, Intonation, and Pragmatic Particle in Dutch Imperatives. In: R. van Bezooijen, R. Kager (eds.), Linguistics in the Netherlands 1998. Amsterdam/Philadelphia: John Benjamins, 135-148.


Ladd, D.R. (1996). Intonational Phonology. Cambridge: Cambridge University Press.

Ladd, D.R. and Morton, R. (1997). The perception of intonational emphasis: continuous or categorical? Journal of Phonetics, 25: 313-342.

Ladd, D.R. and Terken, J.M.B. (1995). Modelling intra- and inter-speaker pitch range variation. Proceedings of the 13th International Congress of Phonetic Sciences, Stockholm, 2: 386-389.

Merckens, P.J. (1960). De plaats van de persoonsvorm: een verwaarloosd code-teken [The position of the finite verb: a neglected code sign]. De nieuwe taalgids, 53: 248-254.

Moulines, E. and Verhelst, E. (1995). Time-domain and frequency-domain techniques for prosodic modification of speech. In: W.B. Kleijn and K.K. Paliwal (eds.), Speech Coding and Synthesis. Amsterdam: Elsevier Science, 519-555.

Nooteboom, S.G. and Cohen, A. (1976). Spreken en verstaan. Een inleiding tot de experimentele fonetiek [Speaking and understanding. An introduction to experimental phonetics]. Assen: Van Gorcum.

Pitrelli, J.F., Beckman, M.E. and Hirschberg, J. (1994). Evaluation of prosodic transcription reliability in the ToBI framework. Proceedings of the 3rd International Conference on Spoken Language Processing, Yokohama, 1: 123-126.

Remijsen, A.C. and Heuven, V.J. van (1999). Gradient and categorical pitch dimensions in Dutch: Diagnostic test. Proceedings of the 14th International Congress of Phonetic Sciences, San Francisco, 1865-1868.

Remijsen, A.C. and Heuven, V.J. van (2003). Linguistic versus paralinguistic status of prosodic contrasts: the case of high and low pitch in Dutch. In: J.M. van de Weijer, V.J. van Heuven, H.G. van der Hulst (eds.), The Phonological Spectrum. Volume II: Suprasegmental Structure. Current Issues in Linguistic Theory nr. 235. Amsterdam/Philadelphia: John Benjamins, 225-246.

Rietveld, A.C.M. and Heuven, V.J. van (2001). Algemene Fonetiek [General Phonetics]. Bussum: Coutinho.

Slis, I.H. and Cohen, A. (1969). On the complex regulating the voiced-voiceless distinction. Language and Speech, 12: 80-102, 137-155.

Taylor, P. (1998). Analysis and synthesis of intonation using the TILT model. Unpublished manuscript, Centre for Speech Technology Research, University of Edinburgh.

The Position of Frisian in the Germanic Language Area

Charlotte Gooskens and Wilbert Heeringa

1. Introduction

Among the Germanic varieties, the Frisian varieties in the Dutch province of Friesland have a position of their own. The Frisians are proud of their language, and more than 350,000 inhabitants of the province of Friesland speak Frisian every day. Heeringa (2004) shows that among the dialects in the Dutch language area the Frisian varieties are the most distant from standard Dutch. This may justify the fact that Frisian is recognized as a second official language in the Netherlands. In addition to Frisian, in some towns and on some islands a mixed variety is used which is an intermediate form between Frisian and Dutch. The variety spoken in the Frisian towns is known as Town Frisian1.

The Frisian language has existed for more than 2000 years. Genetically, the Frisian dialects are most closely related to the English language. However, historical events have caused English and Frisian to diverge, while Dutch and Frisian have converged. The linguistic distance to the other Germanic languages has also altered in the course of history due to different degrees of linguistic contact. As a result, traditional genetic trees do not give an up-to-date representation of the distance between the modern Germanic languages.

In the present investigation we measured linguistic distances between Frisian and the other Germanic languages in order to get an impression of the effect of genetic relationship and language contact on the position of the modern Frisian language on the Germanic language map. We included six Frisian varieties and one Town Frisian variety in the investigation. Furthermore, eight Germanic standard languages were taken into account. Using this material, we firstly wished to obtain a hierarchical classification of the Germanic varieties. From this classification the position of (Town) Frisian became clear. Secondly, we ranked all varieties with respect to each of the standard Germanic languages as well as to (Town) Frisian. The rankings showed the position of (Town) Frisian with respect to the standard languages and the position of the standard languages with respect to (Town) Frisian.

In order to obtain a classification of varieties and establish rankings, we needed a tool that can measure linguistic distances between the varieties. Bolognesi and Heeringa (2002) investigated the position of Sardinian dialects with respect to different Romance languages using the Levenshtein distance, an algorithm with which distances between word pronunciations are calculated. In our investigation we used the same methodology.

In Section 2, we will present the traditional ideas about the genetic relationship between the Germanic languages and discuss the relationship between Frisian and the other Germanic languages. At the end of the section we will discuss the expected outcome of the linguistic distance measurements between Frisian and the other Germanic languages. In Section 3 the data sources are described, and in Section 4 the method for measuring linguistic distances between the language varieties is presented. The results are presented in Section 5 and discussed in Section 6.

2. Frisian and the Germanic languages

2.1. History and classification of the Germanic languages2

The Germanic branch of the Indo-European languages has a large number of speakers, approximately 450 million native speakers, partly due to the colonization of many parts of the world. However, the number of different languages within the Germanic group is rather limited. Depending on the definition of what counts as a language, there are about 12 different languages. Traditionally, they are divided into three subgroups: East Germanic (Gothic, which is no longer a living language), North Germanic (Icelandic, Faeroese, Norwegian, Danish, and Swedish), and West Germanic (English, German, Dutch, Afrikaans, Yiddish, and Frisian). Some of these languages are so similar that they are only considered independent languages because of their position as standardized languages spoken within the limits of a state. This goes for the languages of the Scandinavian countries, Swedish, Danish and Norwegian, which are mutually intelligible. Other languages consist of dialects which are in fact so different that they are no longer mutually intelligible but are still considered one language because of standardization. Northern and southern German dialects are an example of this situation.

Figure 1. The genetic tree of Germanic languages.

In Figure 1, a traditional Germanic genetic tree is shown. We constructed this tree on the basis of data in the literature. The tree gives just a rough division, and linguistic distances should not be derived from it. It is commonly assumed that the Germanic languages originate from the southern Scandinavian and northern German region. After the migration of the Goths to the Balkans towards the end of the pre-Christian era, North-West Germanic remained uniform till the 5th century AD, after which a split between North and West Germanic occurred owing to dialectal variation and the departure of the Anglo-Saxons from the Continent and the colonization of Jutland.

During the Viking Age, speakers of North Germanic settled in a large geographic area, which eventually led to the five modern languages (see above). Of these languages, Icelandic (and to a lesser degree Faeroese), which is based on the language of southwestern Norway where the settlers came from, can be considered the most conservative language (Sandøy, 1994). Of the three mainland Scandinavian languages, Danish has moved farthest away from the common Scandinavian roots due to influences from the south.

The parentage of the West Germanic languages is less clear. Different tribal groups representing different dialect groups spread across the area, which eventually resulted in the modern language situation. Historically, Frisian and English both belong to the Ingwaeonic branch of the West Germanic language group. Originally the Frisian speech community extended from the present Danish-German border along the coast to the French-Belgian border in the south. However, expansion by Saxons and Franconians from the east and the south throughout the medieval period resulted in a loss of large Frisian areas and a division into three mutually intelligible varieties: West Frisian (spoken in the northern Dutch province of Friesland by more than 350,000 people), East Frisian or Saterlandic (spoken by a thousand speakers in three villages west of Bremen) and North Frisian (spoken by fewer than ten thousand people on the islands off the north-western coast of Germany).

The English language came into being as a result of immigrations of tribal Anglo-Saxon groups from the North Sea coast during the fifth and sixth centuries. Whereas other insular Germanic varieties are in general rather conservative, insular English lacked this conservatism. English is considered most closely related to Frisian on every linguistic level due to their common ancestry and to continued language contact over the North Sea.

The German language is spoken in many European countries in a large number of dialects and varieties, which can be divided into Low German and High German. Yiddish, too, can be regarded as a German variety. Dutch is mainly based on the western varieties of the Low Franconian area, but Low Saxon and Frisian elements are also found in this standard language. Scholars disagree about the precise position of Dutch and Low German in the language tree. They can be traced back to a common root often referred to as the Ingwaeonic language group, but are often grouped together with High German as a separate West Germanic group. This grouping with High German might be the best representation of the modern language situation, given that the individual dialects spoken in the area in fact form a dialect continuum. Afrikaans, finally, is a contemporary West Germanic language, developed from seventeenth-century Dutch as a result of colonization, but with influences from African languages.


2.2. The relationship between Frisian and the other Germanic languages

This short outline of the relationships among the Germanic languages shows that English is the language which is genetically closest to Frisian, and still today English is considered to be most similar to Frisian. For example, The Columbia Encyclopedia (2001) says: “Of all foreign languages, [Frisian] is most like English”. Pei (1966, p. 34) summarizes the situation as follows: “Frisian, a variant of Dutch spoken along the Dutch and German North Sea coast, is the foreign speech that comes closest to modern English, as shown by the rhyme: ‘Good butter and good cheese is good English and good Fries’”. This rhyme refers to the fact that the words for butter and cheese are almost the same in the two languages. However, in the course of history, contact with other Germanic languages has caused Frisian to converge to these languages. The Frisians have a long history of trade, and in early medieval times they were one of the leading trading nations in Europe due to their strategic geographic position close to major trade routes along the rivers and the North Sea. Also, the Vikings and the English were frequent visitors to the Frisian language area. This intensive contact with both English and the North Germanic languages, especially Danish, resulted in linguistic exchanges (see Feitsma, 1963; Miedema, 1966; Wadstein, 1933). Later in history, the Frisian language was especially influenced by the Dutch language (which itself contains many Frisian elements). For a long period, Frisian was stigmatized as a peasant language, and due to the weak social position of the Frisian language in the Dutch community it was often suppressed, resulting in a strong Dutch impact on the Frisian language. Nowadays, Dutch as the language of the administration still has a large influence on the media, and there has been substantial immigration of Dutch-speaking people to Friesland. However, the provincial government has decided to promote Frisian at all levels of society.

When investigating the position of the Frisian language within the Germanic language group, there are clearly two forces which should be taken into account. On the one hand, Frisian and English are genetically closely related and share sound changes which do not occur in the other Germanic languages. This yields the expectation that the linguistic distance between these two languages is relatively small. On the other hand, the close contact with Dutch makes it plausible that the Dutch and the Frisian languages have converged. Also, the distance to Danish might be smaller than expected from the traditional division of Germanic into a North Germanic and a West Germanic branch at an early stage, because of the intensive contacts in the past.

3. Data sources

In this section, we will first give a short characterization of the language varieties and the speakers who were recorded for our investigation. Next, we will present the nature of the recordings and the transcriptions which formed the basis for the linguistic distance measurements.

3.1. Language varieties

Since our main interest was the Frisian language and its linguistic position within the Germanic language group, we wished to represent this language as well as possible. For this reason, we included seven Frisian varieties, spread over the Frisian language area. Furthermore, our material contained eight Germanic standard languages. First, we will describe the Frisian varieties and next the standard languages.

As far as the Frisian varieties are concerned, we chose varieties from different parts of the province, both from the coastal area and from the inland. The varieties are spoken in different dialect areas according to the traditional classification (see below), and they represent different stages of conservatism. The precise choice of the seven varieties was determined by speaker availability for recordings in our vicinity and at the Fryske Akademy in Leeuwarden. In Figure 2, the geographical position of the seven Frisian language varieties in the province of Friesland is shown.

Due to the absence of major geographical barriers, the Frisian language area is relatively uniform. The major dialectal distinctions are primarily phonological. Traditionally, three main dialect areas are distinguished (see e.g. Hof, 1933; Visser, 1997): Klaaifrysk (clay Frisian) in the west, Wâldfrysk (forest Frisian) in the east and Súdwesthoeksk (southwest quarter) in the southwest. In our material Klaaifrysk is represented by the dialects of Oosterbierum and Hijum, Wâldfrysk by Wetsens and Westergeest, and Súdwesthoeksk by Tjerkgaast. Hindeloopen is in the area of Súdwesthoeksk. However, this dialect represents a highly conservative area. The phonological distance between Hindeloopen and the main dialects is substantial (van der Veen, 2001). Finally, our material contains the variety spoken in Leeuwarden (see note 1). This is an example of Town Frisian, which is also spoken in other cities of Friesland. Town Frisian is a Dutch dialect strongly influenced by Frisian but stripped of the most characteristic Frisian elements (Goossens, 1977).


Figure 2. The geographical position of the seven Frisian language varieties in the province of Friesland.

In addition to the Frisian dialects, the following eight standard languages were included: Icelandic, Faroese, Norwegian, Swedish, Danish, English, Dutch, and German. We had meant to include all standard Germanic languages in our material. However, due to practical limitations a few smaller languages were not included.

As for Norwegian, there is no official standard variety. The varieties spoken around the capital of Oslo in the southeast, however, are often considered to represent the standard language. We based the present investigation on prior research on Norwegian dialects (see Heeringa and Gooskens, 2003; Gooskens and Heeringa, submitted), and we chose the recording which to Norwegians sounded most standard, namely the Lillehammer recording3. It was our aim to select standard speakers from all countries, but it is possible that the speech of some speakers contains slight regional influences. The speakers from Iceland, the Faroe Islands and Sweden spoke the standard varieties of the capitals. The Danish speaker came from Jutland, the German speaker from Kiel, the English speaker from Birmingham, and the Dutch speaker had lived at different places in the Netherlands, including a long period in the West during adolescence.

3.2. Phonetic transcriptions

The speakers all read aloud translations of the same text, namely the fable ‘The North Wind and the Sun’. This text has often been used for phonetic investigations; see for example The International Phonetic Association (1949 and 1999), where the same text has been transcribed in a large number of different languages. A database of Norwegian transcriptions of the same text has been compiled by J. Almberg (see note 3). As mentioned in the previous section, we only used the transcription of Lillehammer from this database. In future, we would like to investigate the relations between Norwegian and other Germanic varieties, using the greater part of the transcriptions in this database. Therefore, our new transcriptions should be as comparable as possible with the existing Norwegian ones. To ensure this, our point of departure was the Norwegian text. This text consists of 91 words (58 different words) which were used to calculate Levenshtein distances (see Section 4). The text was translated word for word from Norwegian into each of the Germanic language varieties. We are aware of the fact that this may result in less natural speech: sentences were often syntactically wrong. However, it guarantees that for each of the 58 words a translation was obtained. The words were not recorded as a word list, but as sentences. Therefore in the new recordings words appear in a similar context as in the Norwegian varieties. This ensures that the influence of assimilation phenomena on the results is as comparable as possible.

Most new recordings were transcribed phonetically by one of the authors. To ensure consistency with the existing Norwegian transcriptions, our new transcriptions were corrected by J. Almberg, the transcriber of the Norwegian recordings. In most cases we incorporated the corrections. The transcription of the Faroese language was completely done by J. Almberg. The transcriptions were made in IPA as well as in X-SAMPA (eXtended Speech Assessment Methods Phonetic Alphabet). This is a machine-readable phonetic alphabet, which is also readable by people. Basically, it maps IPA symbols to the 7-bit printable ASCII/ANSI characters4. The transcriptions were used to calculate the linguistic distances between varieties (see Section 4).
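To give an impression of how X-SAMPA encodes IPA in plain ASCII, a small excerpt of the correspondence table can be sketched as follows (an illustration of ours; the full alphabet covers every IPA symbol, and plain-ASCII symbols such as [p] and [s] are identical in both notations):

```python
# A small excerpt of the IPA -> X-SAMPA correspondence.
IPA_TO_XSAMPA = {
    "ɛ": "E",   # open-mid front unrounded vowel
    "ø": "2",   # close-mid front rounded vowel
    "ʃ": "S",   # voiceless postalveolar fricative
    "ŋ": "N",   # velar nasal
    "θ": "T",   # voiceless dental fricative
}

def to_xsampa(ipa_string):
    """Rewrite an IPA string in X-SAMPA, leaving symbols that are
    already printable ASCII unchanged."""
    return "".join(IPA_TO_XSAMPA.get(ch, ch) for ch in ipa_string)
```

For example, IPA [ʃɛ] comes out as the ASCII string "SE", which a machine (and a trained reader) can process without any special character encoding.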

4. Measuring distances between varieties

In 1995 Kessler introduced the use of the Levenshtein distance as a tool for measuring linguistic distances between language varieties. The Levenshtein distance is a string edit distance measure, and Kessler applied this algorithm to the comparison of Irish dialects. Later on, this approach was applied by Nerbonne, Heeringa, Van den Hout, Van der Kooi, Otten, and Van de Vis (1996) to Dutch dialects. They assumed that the distances between all possible pairs of segments are the same, i.e. every substitution of one segment for another counts equally. Both Kessler (1995) and Nerbonne and Heeringa (1997) also experimented with more refined versions of the Levenshtein algorithm, in which gradual segment distances were used which were found on the basis of the feature systems of Hoppenbrouwers (1988) and Vieregge et al. (1984).

In this paper we use an implementation of the Levenshtein distance in which sound distances are used which are found by comparing spectrograms. In Section 4.1 we account for the use of spectral distances and explain how we calculate them. Comparisons are made on the basis of the audiotape The Sounds of the International Phonetic Alphabet (Wells and House, 1995). In Section 4.2 we describe the Levenshtein distance and explain how spectral distances can be used in this algorithm.
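To make the algorithm concrete, here is a minimal dynamic-programming sketch of the Levenshtein distance with a pluggable segment-distance function (our own illustration, not the authors' implementation; by default every substitution of non-identical segments costs 1, and insertions and deletions always cost 1, as in the simplest version described above):

```python
def levenshtein(a, b, dist=None):
    """Edit distance between two segment sequences a and b.

    `dist(x, y)` gives the cost of substituting segment x for y;
    gradual (e.g. spectral) segment distances can be plugged in here."""
    if dist is None:
        dist = lambda x, y: 0 if x == y else 1
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i                     # delete all of a[:i]
    for j in range(1, n + 1):
        d[0][j] = j                     # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + 1,                             # deletion
                d[i][j - 1] + 1,                             # insertion
                d[i - 1][j - 1] + dist(a[i - 1], b[j - 1]),  # substitution
            )
    return d[m][n]
```

With the default cost function this reproduces the classic edit distance; passing a gradual `dist` based on spectral comparisons gives the refined variant the paper uses.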

4.1. Gradual segment distances

When acquiring language, children learn to pronounce sounds by listening to the pronunciation of their parents or other people. The acoustic signal seems to be sufficient to find the articulation which is needed to realize the sound. Acoustically, speech is just a series of changes in air pressure, quickly following each other. A spectrogram is a “graph with frequency on the vertical axis and time on the horizontal axis, with the darkness of the graph at any point representing the intensity of the sound” (Trask, 1996, p. 328).

In this section we present the use of spectrograms for finding segment distances. Segment distances can also be found on the basis of phonological or phonetic feature systems. However, we prefer the use of acoustic representations since they are based on physical measurements. In Potter, Kopp and Green’s (1947) Visible Speech, spectrograms are shown for all common English sounds (see pp. 54-56). Looking at the spectrograms we already see which sounds are similar and which are not. We assume that visible (dis)similarity between spectrograms reflects perceptual (dis)similarity between segments to some extent. In Figure 3 the spectrograms of some sounds are shown as pronounced by John Wells on the audiotape The Sounds of the International Phonetic Alphabet (Wells and House, 1995). The spectrograms are made with the computer program PRAAT5.

Figure 3. Spectrograms of some sounds pronounced by John Wells. The upper panels show the [i] (left) and the [e] (right); the lower panels show the [p] (left) and the [s] (right).

4.1.1. Samples

For finding spectrogram distances between all IPA segments we need samples of one or more speakers for each of them. We found the samples on the tape The Sounds of the International Phonetic Alphabet, on which all IPA sounds are pronounced by John Wells and Jill House. On the tape the vowels are pronounced in isolation. The consonants are sometimes preceded, and always followed, by an [a]. We cut out the part preceding the [a], or the part between the [a]’s. We realize that the pronunciation of sounds depends on their context. Since we use samples of vowels pronounced in isolation and samples of consonants selected from a limited context, our approach is a simplification of reality. However, Stevens (1998, p. 557) observes that

“by limiting the context, it was possible to specify rather precisely the articulatory aspects of the utterances and to develop models for estimating the acoustic patterns from the articulation”.

The burst in a plosive of the IPA inventory is always preceded by a period of silence (voiceless plosives) or a period of murmur (voiced plosives). When a voiceless plosive is not preceded by an [a], it is not clear how long the period of silence which really belongs to the sound lasts. Therefore we always cut out each plosive in such a way that the time span from the beginning to the middle of the burst is equal to 90 ms. Among the plosives which were preceded by an [a] or which are voiced (so that the real duration of the start-up phase can be found), we found no sounds with a period of silence or murmur which was clearly shorter than 90 ms.

In voiceless plosives, the burst is followed by an [h]-like sound before the following vowel starts. A consequence of including this part in the samples is that bursts often do not match when comparing two voiceless plosives. However, since aspiration is a characteristic property of voiceless sounds, we retained aspiration in the samples. In general, when comparing two voiced plosives, the bursts match. When comparing a voiceless plosive and a voiced plosive, the bursts do not match.

To keep trills comparable to each other, we always cut three periods, even when the original samples contained more periods. When there were more periods, the most regular-looking sequence of three periods was cut.

The Levenshtein algorithm also requires a definition of ‘silence’. To get a sample of ‘silence’ we cut a small silent part from the IPA tape. This assures that silence has approximately the same background noise as the other sounds.

To make the samples as comparable as possible, all vowel and extracted consonant samples are monotonized on the mean pitch of the 28 concatenated vowels. The mean pitch of John Wells was 128 Hertz; the mean pitch of Jill House was 192 Hertz. In order to monotonize the samples, the pitch contours were changed to flat lines. The volume was not normalized, because volume contains too much segment-specific information. For example, it is characteristic of the [v] that its volume is greater than that of the [f].

4.1.2. Acoustic representation

In the most common type of spectrogram the linear Hertz frequency scale is used. The difference between 100 Hz and 200 Hz is the same as the difference between 1000 Hz and 1100 Hz. However, our perception of frequency is non-linear. We hear the difference between 100 Hz and 200 Hz as an octave interval, but the difference between 1000 Hz and 2000 Hz is also perceived as an octave. Our ear evaluates frequency differences not absolutely but relatively, namely in a logarithmic manner. Therefore, in the Barkfilter, the Bark scale is used, which is roughly linear below 1000 Hz and roughly logarithmic above 1000 Hz (Zwicker and Feldtkeller, 1967).
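The text does not spell out the Hz-to-Bark conversion itself; one widely used analytic approximation, sketched here purely for illustration, is the Zwicker & Terhardt (1980) formula (PRAAT's internal conversion may differ in detail):

```python
import math

def hertz_to_bark(f):
    """Approximate Bark rate for a frequency f in Hz, using the
    Zwicker & Terhardt (1980) formula. The result is roughly linear
    below 1000 Hz and roughly logarithmic above, with 1 kHz falling
    near 8.5 Bark."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)
```

Doubling the frequency from 100 Hz to 200 Hz roughly doubles the Bark value, while the same step higher up the scale yields a much smaller increase, matching the linear-then-logarithmic behaviour described above.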

In the commonly used type of spectrogram the power spectral density is represented per frequency per time. The power spectral density is the power per unit of frequency as a function of the frequency. In the Barkfilter the power spectral density is expressed in decibels (dB). “The decibel scale is a way of expressing sound amplitude that is better correlated with perceived loudness” (Johnson, 1997, p. 53). The decibel scale is a logarithmic scale. Multiplying the sound pressure ten times corresponds to an increase of 20 dB. On a decibel scale, intensities are expressed relative to the auditory threshold. The auditory threshold of 0.00002 Pa corresponds with 0 dB (Rietveld and Van Heuven, 1997, p. 199).
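The relation between sound pressure and decibels described above can be verified with a one-line conversion (an illustration of ours, using the 0.00002 Pa auditory threshold as the 0 dB reference):

```python
import math

AUDITORY_THRESHOLD = 0.00002  # Pa, corresponds with 0 dB

def pressure_to_db(p):
    """Sound pressure level in dB, relative to the auditory threshold.
    A tenfold increase in pressure adds 20 dB."""
    return 20.0 * math.log10(p / AUDITORY_THRESHOLD)
```

For example, a pressure of 0.0002 Pa (ten times the threshold) comes out at 20 dB, exactly as the paragraph states.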

A Barkfilter is created from a sound by band filtering in the frequency domain with a bank of filters. In PRAAT the lowest band has a central frequency of 1 Bark by default, and each band has a width of 1 Bark. There are 24 bands, corresponding to the first 24 critical bands of hearing as found along the basilar membrane (Zwicker and Fastl, 1990). A critical band is an area within which two tones influence each other’s perceptibility (Rietveld and Van Heuven, 1997). Due to the Bark scale, the higher bands summarize a wider frequency range than the lower bands.

In PRAAT we used the default settings of the Barkfilter. The sound signal is probed every 0.005 seconds with an analysis window of 0.015 seconds. Other settings may give different results, but since it was not a priori obvious which results are optimal, we restricted ourselves to the default settings. In Figure 4 Barkfilters for some segments are shown.

Figure 4. Barkfilter spectrograms of some sounds pronounced by John Wells. The upper panels show the [i] (left) and the [e] (right); the lower panels show the [p] (left) and the [s] (right).

4.1.3. Comparison

In this section, we explain the comparison of segments in order to get distances between segments that will be used in the Levenshtein distance measure. In a Barkfilter, the intensities of frequencies are given for a range of times. A spectrum contains the intensities of frequencies at one time. The smaller the time step, the more spectra there are in the acoustic representation. We consistently used the same time step for all samples.

It appears that the duration of the segment samples varies. This may be explained by variation in speech rate. Duration is also a sound-specific property; e.g., a plosive is shorter than a vowel. The result is that the number of spectra per segment may vary, although the same time step was used for each segment. Since we want to normalize for speech rate and regard segments as linguistic units, we made sure that two segments get the same number of spectra when they are compared to each other.

74 Charlotte Gooskens and Wilbert Heeringa

When comparing one segment of m spectra with another segment of n spectra, each of the m elements is duplicated n times, and each of the n elements is duplicated m times. Thus both segments get a length of m × n.
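A minimal sketch of this equalization step (the spectra are stand-in strings here; real spectra would be lists of intensities):

```python
def stretch(spectra, factor):
    """Repeat each spectrum `factor` times, preserving order."""
    return [s for s in spectra for _ in range(factor)]

def equalize(seg_a, seg_b):
    """Give two segments the same number of spectra: m * n each."""
    m, n = len(seg_a), len(seg_b)
    return stretch(seg_a, n), stretch(seg_b, m)

a, b = equalize(['a1', 'a2'], ['b1', 'b2', 'b3'])
print(a)  # ['a1', 'a1', 'a1', 'a2', 'a2', 'a2']
print(b)  # ['b1', 'b1', 'b2', 'b2', 'b3', 'b3']
```

Both stretched segments have length 2 × 3 = 6, so they can be compared spectrum by spectrum.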

In order to find the distance between two sounds, the Euclidean distance is calculated between each pair of corresponding spectra, one from each of the sounds. Assume two spectra e1 and e2, each with n frequencies; then the Euclidean distance is:

Equation 1. Euclidean distance

    d(e1, e2) = sqrt( sum_{i=1}^{n} (e1_i − e2_i)^2 )

The distance between two segments is equal to the sum of the spectrum distances divided by the number of spectra. In this way we found that the greatest distance occurs between [a] and 'silence'. We regard this maximum distance as 100%. Other segment distances are divided by this maximum and multiplied by 100. This yields segment distances expressed as percentages. Word distances, and the distances between varieties which are based on them, may also be given in terms of percentages.
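Equation 1 and the percentage normalization might be implemented as follows (a sketch; `max_distance` stands for the [a]-versus-silence distance mentioned above):

```python
import math

def spectrum_distance(e1, e2):
    """Euclidean distance between two spectra (Equation 1)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(e1, e2)))

def segment_distance_pct(seg_a, seg_b, max_distance):
    """Mean spectrum distance over two equally long segments,
    expressed as a percentage of the maximum distance."""
    dists = [spectrum_distance(s, t) for s, t in zip(seg_a, seg_b)]
    return 100.0 * (sum(dists) / len(dists)) / max_distance

print(spectrum_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```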

In perception, small differences in pronunciation may play a relatively strong role in comparison with larger differences. Therefore we used logarithmic segment distances. The effect of using logarithmic distances is that small distances are weighed relatively more heavily than large distances. Since the logarithm of 0 is not defined, and the logarithm of 1 is 0, distances are increased by 1 before the logarithm is calculated. To obtain percentages, we calculate ln(distance + 1) / ln(maximum distance + 1).
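The logarithmic rescaling can be written directly from the formula above (the function name is ours):

```python
import math

def log_distance_pct(distance, max_distance):
    """ln(d + 1) / ln(max + 1), scaled to a percentage."""
    return 100.0 * math.log(distance + 1) / math.log(max_distance + 1)

# Small raw distances are weighed relatively more heavily:
# 10% of the raw scale maps to over half of the logarithmic scale.
print(round(log_distance_pct(10, 100), 1))  # 52.0
```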

4.1.4. Suprasegmentals and diacritics

The sounds on the tape The Sounds of the International Phonetic Alphabet are pronounced without suprasegmentals and diacritics. However, a restricted set of suprasegmentals and diacritics can be processed in our system.

Length marks and syllabification are processed by changing the transcription beforehand. In the X-SAMPA transcription, extra-short segments are kept unchanged, sounds with no length indication are doubled, half-long sounds are trebled, and long sounds are quadrupled. Syllabic sounds are treated as long sounds, so they are also quadrupled.
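This expansion scheme can be sketched as follows; the length-category labels are our own stand-ins, not actual X-SAMPA diacritics:

```python
def expand_length(segments):
    """Duplicate segments according to their length marking:
    extra-short x1, plain x2, half-long x3, long/syllabic x4."""
    copies = {'extra-short': 1, 'plain': 2, 'half-long': 3,
              'long': 4, 'syllabic': 4}
    return [seg for seg, kind in segments for _ in range(copies[kind])]

# A long vowel, a plain consonant and an extra-short vowel:
word = [('a', 'long'), ('f', 'plain'), ('t', 'extra-short')]
print(expand_length(word))  # ['a', 'a', 'a', 'a', 'f', 'f', 't']
```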

When processing the diacritics voiceless and/or voiced, we assume that a voiced voiceless segment (e.g. [t̬]) and a voiceless voiced segment (e.g. [d̥]) are intermediate pronunciations of a voiceless segment ([t]) and a voiced segment ([d]). Therefore we calculate the distance between a segment x and a voiced segment y as the average of the distance between x and y and the distance between x and the voiced counterpart of y. Similarly, the distance between a segment x and a voiceless segment y is calculated as the mean of the distance between x and y and the distance between x and the voiceless counterpart of y. For voiced sounds which have no voiceless counterpart (the sonorants), and for voiceless sounds which have no voiced counterpart (the glottal stop), the sound itself is used.

The diacritic apical is only processed for [s] and [z]. We calculate the distance between [s̺] and e.g. [f] as the average of the distance between [s] and [f] and the distance between [ʃ] and [f]. Similarly, the distance between [z̺] and e.g. [v] is calculated as the mean of the distance between [z] and [v] and the distance between [ʒ] and [v].

The thought behind the way in which the diacritic nasal is processed is that a nasal sound is more or less intermediate between its non-nasal version and [n]. We calculate the distance between a segment x and a nasal segment y as the average of the distance between x and y and the distance between x and [n].
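All three diacritic rules share one "intermediate sound" pattern, which can be sketched with a toy distance table (the helper name and the numbers are ours):

```python
def diacritic_distance(d, x, y, intermediate):
    """Distance from x to a segment y carrying a diacritic, where the
    diacritic makes y intermediate between itself and another sound:
    its voiced/voiceless partner, [S]/[Z] for apical s/z, or [n] for
    nasalization. Falls back to y itself when no partner exists
    (sonorants, the glottal stop)."""
    if intermediate is None:
        intermediate = y
    return (d[(x, y)] + d[(x, intermediate)]) / 2.0

# Toy numbers only: distance from [f] to apical s = mean of f-s and f-S.
d = {('f', 's'): 0.4, ('f', 'S'): 0.2}
print(round(diacritic_distance(d, 'f', 's', 'S'), 2))  # 0.3
```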

4.2. Levenshtein distance

Using the Levenshtein distance, two dialects are compared by comparing the pronunciation of a word in the first dialect with the pronunciation of the same word in the second. It is determined how one pronunciation is changed into the other by inserting, deleting or substituting sounds. Weights are assigned to these three operations. In the simplest form of the algorithm, all operations have the same cost, e.g. 1. Assume afternoon is pronounced as [æːftənən] in the dialect of Savannah, Georgia, and as [æftərnun] in the dialect of Lancaster, Pennsylvania⁶. Changing one pronunciation into the other can be done as in Table 1 (ignoring suprasegmentals and diacritics for the moment)⁷:


Table 1. Changing one pronunciation into another using a minimal set ofoperations.

æːftənən    delete ː      1
æftənən     insert r      1
æftərnən    subst. ə/u    1
æftərnun
──────────────────────────
                          3

In fact, many sequences of operations map [æːftənən] to [æftərnun]. The power of the Levenshtein algorithm is that it always finds the cost of the cheapest mapping.

Comparing pronunciations in this way, the distance between longer pronunciations will generally be greater than the distance between shorter pronunciations. The longer the pronunciation, the greater the chance of differences with respect to the corresponding pronunciation in another variety. Because this does not accord with the idea that words are linguistic units, the sum of the operation costs is divided by the length of the longest alignment which gives the minimum cost. The longest alignment has the greatest number of matches. In our example we have the following alignment:

Table 2. Alignment which gives the minimal cost. The alignment corresponds with Table 1.

æ ː f t ə   n ə n
æ   f t ə r n u n
─────────────────
  1       1   1

The total cost of 3 (1 + 1 + 1) is now divided by the alignment length of 9. This gives a word distance of 0.33, or 33%.
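The whole procedure, unit costs normalized by the length of the longest minimal-cost alignment, can be sketched as follows (our own implementation of the standard dynamic-programming scheme):

```python
def normalized_levenshtein(a, b):
    """Levenshtein distance with unit costs, divided by the length
    of the longest alignment that achieves the minimal cost."""
    # dp[i][j] = (minimal cost, longest alignment at that cost)
    dp = [[None] * (len(b) + 1) for _ in range(len(a) + 1)]
    dp[0][0] = (0, 0)
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            if i == j == 0:
                continue
            candidates = []
            if i > 0:                    # deletion
                c, l = dp[i - 1][j]
                candidates.append((c + 1, l + 1))
            if j > 0:                    # insertion
                c, l = dp[i][j - 1]
                candidates.append((c + 1, l + 1))
            if i > 0 and j > 0:          # match or substitution
                c, l = dp[i - 1][j - 1]
                candidates.append((c + (a[i - 1] != b[j - 1]), l + 1))
            # minimal cost first, then the longest alignment
            dp[i][j] = min(candidates, key=lambda t: (t[0], -t[1]))
    cost, length = dp[len(a)][len(b)]
    return cost / length if length else 0.0

# The afternoon example: cost 3, alignment length 9, distance 33%.
print(round(normalized_levenshtein('æːftənən', 'æftərnun'), 2))  # 0.33
```

Replacing the unit costs with the spectrogram-based segment distances described below yields the refined version of the algorithm.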

In Section 4.1.3 we explained how distances between segments can be found using spectrograms. This makes it possible to refine our Levenshtein algorithm by using the spectrogram distances as operation weights. Now the cost of insertions, deletions and substitutions is not always equal to 1, but varies: it is equal to the spectrogram distance between the segment and 'silence' (insertions and deletions) or between two segments (substitutions).

To reckon with syllabification in words, the Levenshtein algorithm is adapted so that only a vowel may match with a vowel, a consonant with a consonant, the [j] or [w] with a vowel (or vice versa), the [i] or [u] with a consonant (or vice versa), and a central vowel (in our research only the schwa) with a sonorant (or vice versa). In this way unlikely matches (e.g. a [p] with an [a]) are prevented.

In our research we used 58 different words. When a word occurred in the text more than once, the mean over the different pronunciations was used. So when comparing two dialects we get 58 Levenshtein distances. The dialect distance is then equal to the sum of the 58 Levenshtein distances divided by 58. When the word distances are presented in terms of percentages, the dialect distance is also presented in terms of percentages. All distances between the 15 language varieties are arranged in a 15 × 15 matrix.

5. Results

The results of the Levenshtein distance measurements are analyzed in two ways. First, on the basis of the distance matrix we applied hierarchical cluster analysis (see Section 5.1). The goal of clustering is to identify the main groups, called clusters. Clusters may consist of subclusters, subclusters may in turn consist of subsubclusters, and so on. The result is a hierarchically structured tree in which the dialects are the leaves (Jain and Dubes, 1988). Several clustering alternatives exist. We used the Unweighted Pair Group Method using Arithmetic averages (UPGMA), since dendrograms generated by this method reflected distances which correlated most strongly with the original Levenshtein distances (r = 0.9832); see Sokal and Rohlf (1962).
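UPGMA itself is simple enough to sketch directly (a naive O(n³) version of the standard algorithm; in practice one would use a statistics package):

```python
def upgma(labels, dist):
    """Naive UPGMA on a full symmetric distance matrix.
    Returns the merge history as (cluster, cluster, distance) tuples."""
    clusters = {i: (labels[i],) for i in range(len(labels))}
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(labels)
    merges = []
    while len(clusters) > 1:
        i, j = min(d, key=d.get)           # closest pair of clusters
        merges.append((clusters[i], clusters[j], d.pop((i, j))))
        ni, nj = len(clusters[i]), len(clusters[j])
        for k in clusters:
            if k in (i, j):
                continue
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            # arithmetic average over all member pairs
            d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        next_id += 1
    return merges

# Toy matrix: A and B are close; C is distant from both.
for left, right, height in upgma(['A', 'B', 'C'],
                                 [[0, 2, 8], [2, 0, 8], [8, 8, 0]]):
    print(left, right, height)  # ('A',) ('B',) 2  then  ('C',) ('A', 'B') 8.0
```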

Second, we ranked all varieties in order of relationship with the standard languages, Frisian and Town Frisian (see Section 5.2). When ranking with relation to Frisian, we looked at the average over all Frisian dialects. Since the rankings with respect to each of the Frisian varieties individually were very similar, averaging was justified.


5.1. The classification of the Germanic languages

Looking at the clusters of language varieties in Figure 5, we note that our results reflect the traditional classification of the Germanic languages to a large extent (see Figure 1). On the highest level there is a division between English and the other Germanic languages. When we examine the group of other Germanic languages, we find a clear division between the North Germanic languages and the West Germanic languages. Within the North Germanic group, we see a clear division between the Scandinavian languages (Danish, Norwegian and Swedish) on the one hand and Faroese and Icelandic on the other. In the genetic tree (see Figure 1), Norwegian is clustered with Icelandic and Faroese. However, due to the isolated position of Iceland and the Faroes and intensive language contact between Norway and the rest of Scandinavia, modern Norwegian has become very similar to the modern languages of Denmark and Sweden. All varieties spoken in the Netherlands, including the Frisian varieties, cluster together, and German clusters more closely to these varieties than English does.

Figure 5. Dendrogram showing the clustering of the 14 language varieties in our study. The scale distance shows average Levenshtein distances in percentages.

All Frisian dialects form a cluster. This clustering corresponds well with the traditional classification as sketched in Section 3.1. The dialects of Hijum and Oosterbierum belong to Klaaifrysk, and these dialects form a cluster. The Wâldfrysk dialects of Westergeest and Wetsens also cluster together. The Levenshtein distances between the four dialects are small, ranging from 19.6% between Hijum and Oosterbierum to 23.8% between Oosterbierum and Westergeest. The Súdwesthoeksk dialects, represented by the Tjerkgaast dialect, are also rather close to the Klaaifrysk and Wâldfrysk dialects (distances between 21.6% and 26.4%). The highly conservative dialect of Hindeloopen is more deviant from the other dialects (distances between 29.8% and 32.5%), and this is also the case for the Town Frisian dialect of Leeuwarden, which is more similar to Dutch (20.3%) than to Frisian (between 32.3% and 35.8%). This confirms the characterization of Town Frisian by Kloeke (1927) as 'Dutch in Frisian mouth'.

5.2. The relationship between Frisian and the other Germanic languages

From Tables 3 and 4 it is possible to determine the distance between all Germanic standard languages. We are especially interested in the position of Frisian within the Germanic language group. For this purpose the mean distance over the six Frisian dialects (excluding the dialect of Leeuwarden, which is considered Dutch) has been added. This makes it possible to treat Frisian as one language. Examining the column which shows the ranking with respect to Frisian, we find that Dutch is most similar to Frisian (a mean distance of 38.7%). Clearly the intensive contact with Dutch throughout history has had a great impact on the distance between the two languages. Moreover, German appears to be closer to Frisian than any other language outside the Netherlands. Looking at the ranking with respect to Dutch, it appears that Town Frisian is most similar (Leeuwarden, 20.3%), followed by the Frisian varieties (an average of 38.7%). Next, German is most similar, due to common historical roots and continuous contact (a distance of 53.3%).

As discussed in the introduction, Friesland has a long history of language contact with the Scandinavian countries, and traces of Scandinavian influence can be found in the Frisian language. The impact of this contact is reflected in our results only to a limited extent. Remarkably, the distances to the mainland Scandinavian languages (Danish, Norwegian and Swedish) are smaller (between 60.7% and 63.3%) than to English (65.3%), even though the Frisian language is genetically more closely related to English than to Scandinavian (see Section 2.1).


Table 3. Ranked Levenshtein distances in percentages between each of the five West Germanic languages and the other language varieties in the investigation.

Frisian          Leeuwarden       Dutch            English          German
                 Dutch     20.3   Leeuw     20.3   Hindel    63.1   Dutch     53.3
                 Wetsens   32.3   Hindel    37.5   Wetsens   64.4   Leeuw     54.2
                 Westerg   32.7   Westerg   37.7   Dutch     64.7   Hindel    56.2
                 Frisian   34.2   Wetsens   38.3   Swedish   64.9   Westerg   56.9
                 Oosterb   34.3   Tjerkg    38.5   Leeuw     65.1   Oosterb   57.2
                 Hindel    34.9   Frisian   38.7   Tjerkg    65.2   Tjerkg    57.3
Leeuw     34.2   Tjerkg    35.3   Hijum     38.9   Frisian   65.3   Frisian   57.3
Dutch     38.7   Hijum     35.8   Oosterb   41.3   Hijum     65.8   Hijum     57.5
German    57.3   German    54.2   German    53.3   Westerg   65.8   Wetsens   58.6
Swedish   60.7   Swedish   59.2   Swedish   60.9   Danish    66.7   Swedish   61.0
Norweg    60.9   Norweg    60.0   Norweg    61.4   Faroese   67.1   Danish    63.5
Danish    63.3   Danish    61.1   Danish    63.4   Oosterb   67.2   Norweg    64.0
English   65.3   English   65.1   English   64.7   German    68.1   Faroese   67.1
Faroese   67.7   Faroese   67.5   Faroese   66.1   Norweg    68.6   English   68.1
Icelandic 70.0   Icelandic 69.6   Icelandic 69.2   Icelandic 69.1   Icelandic 68.5

Table 4. Ranked Levenshtein distances in percentages between each of the five North Germanic languages and the other language varieties in the investigation.

Danish           Swedish          Norwegian        Icelandic        Faroese
Norweg    43.8   Norweg    43.4   Swedish   43.4   Faroese   54.1   Swedish   53.6
Swedish   47.0   Danish    47.0   Danish    43.8   Swedish   58.7   Icelandic 54.1
Faroese   58.5   Faroese   53.6   Faroese   57.2   Norweg    62.6   Norweg    57.2
Leeuw     61.1   Icelandic 58.7   Westerg   59.6   Danish    62.7   Danish    58.5
Westerg   62.2   Hindel    59.2   Leeuw     60.0   German    68.5   Dutch     66.1
Wetsens   62.3   Leeuw     59.2   Hindel    60.2   Tjerkg    69.1   Hindel    67.0
Icelandic 62.7   Westerg   59.6   Tjerkg    60.6   English   69.1   English   67.1
Hijum     62.9   Tjerkg    60.0   Wetsens   60.7   Dutch     69.2   German    67.1
Frisian   63.3   Frisian   60.7   Frisian   60.9   Leeuw     69.6   Westerg   67.4
Hindel    63.4   Dutch     60.9   Dutch     61.4   Hijum     69.8   Leeuw     67.5
Dutch     63.4   German    61.0   Oosterb   61.9   Frisian   70.0   Tjerkg    67.5
German    63.5   Wetsens   61.1   Hijum     62.6   Wetsens   70.1   Frisian   67.5
Tjerkg    63.8   Oosterb   61.4   Icelandic 62.6   Hindel    70.1   Oosterb   67.7
Oosterb   65.2   Hijum     62.7   German    64.0   Oosterb   70.3   Wetsens   68.1
English   66.7   Icelandic 64.9   English   68.6   Westerg   70.3   Hijum     68.2


So, when looking at the results from a Frisian perspective, the close genetic relationship with English is not reflected in our results. Of the Germanic languages in our investigation, only Icelandic and Faroese are less similar to Frisian than English. However, when looking at the results from an English perspective, we discover that of all Germanic language varieties in our material the Frisian dialect of Hindeloopen is most similar to English. As mentioned before, this dialect is highly conservative, and furthermore it is spoken in a coastal place, which provided for easy contact with England. The Frisian dialect of Wetsens is also more similar to English than the remaining Germanic languages. The other Frisian varieties are found in the middle of the ranking. Among the non-Frisian varieties, Dutch appears to be most similar to English. However, all Germanic languages, including Frisian and Dutch, show a large linguistic distance to English, all distances being above 60%. The development of the English language has thus clearly taken place independently of the other Germanic languages, which can be explained by the strong influence of non-Germanic languages, especially French.

Icelandic also shows a large distance to all other Germanic languages (from 54.1% to 70.0%), but in the Icelandic case this is explained by the conservative nature of the language rather than by language contact. Faroese is somewhat less conservative, but still shows rather large distances to the other languages (between 53.6% and 67.7%). The distances between the other Nordic languages are smaller (between 43.4% and 47%), as was expected given that the three Scandinavian languages are mutually intelligible.

6. Conclusions and discussion

Overall, the classification of the Germanic languages resulting from our distance measurements supports our predictions. This holds for the classification of the Frisian dialects as well as for the rest of the Germanic languages. We interpret this as a confirmation of the suitability of our material, showing that it is possible to measure Levenshtein distances on the basis of whole texts with the assimilation phenomena typical of connected speech and with a rather limited number of words.

The aim of the present investigation was to get an impression of the position of the Frisian language in the Germanic language area on the basis of quantitative data. The fact that Frisian is genetically most closely related to English yields the expectation that these two languages may still be linguistically similar. However, the distance between English and the Frisian dialects is large. We can thus conclude that the close genetic relationship between English and Frisian is not reflected in the linguistic distances between the modern languages. Geographical and historical circumstances have caused the two languages to drift apart linguistically. Frisian has been strongly influenced by Dutch, whereas English has been influenced by other languages, especially French.

It would have been interesting to include these languages in our material. This would have given an impression of their impact on the English language. At the same time it would have given us the opportunity to test the Levenshtein method on a larger language family than the Germanic family, with its relatively closely related languages. It would also be interesting to include Old English in our material, since this would give us an impression of how modern Frisian is related to the English language at a time when it had only recently separated from the common Anglo-Saxon roots to which Old Frisian also belonged.

For many centuries Frisian has been under strong influence from Dutch, and the Frisian and Dutch language areas share a long common history. It therefore comes as no surprise that Dutch is the Germanic language most similar to the language varieties spoken in Friesland.

It may be surprising that the linguistic distances between Dutch and the Frisian dialects are smaller than the distances between the Scandinavian languages (a mean difference of 6%). The Scandinavian languages are known to be mutually intelligible. This means that when, for example, a Swede and a Dane meet, they mostly communicate each in their own language. This kind of communication, known as semi-communication (Haugen, 1966), is not typical of the communication between Dutch-speaking and Frisian-speaking citizens of the Netherlands. The two languages are considered so different that it is not possible for a Dutch-speaking person to understand Frisian, and consequently the Frisian interlocutor will have to speak Dutch to a non-Frisian person. Our results raise the question whether semi-communication would also be possible in a Dutch-Frisian situation. If this is not the case, we may explain this by linguistic and non-linguistic differences between the Frisian-Dutch situation and the Scandinavian situation. The Levenshtein distance processes lexical, phonetic and morphological differences. All three types are present in our transcriptions, since the word lists are derived from running texts. Syntactic characteristics are completely excluded from the analysis. It might be the case that certain characteristics play a larger role in the Levenshtein distances than would be desirable in the case of the Scandinavian languages, if we were to use the method for explaining mutual intelligibility. For example, it is well known among speakers of Scandinavian languages that many words end in an 'a' in Swedish while ending in an 'e' in Danish. Probably people use this knowledge in an inter-Scandinavian situation. However, this difference is included in the Levenshtein distances between Swedish and Danish. It is possible that Frisian-Dutch differences are less predictable or less well known by speakers of the two languages. It is also possible that the difference in communication in the Netherlands and in Scandinavia should be sought at the extra-linguistic level. Scandinavian research on semi-communication has shown that the willingness to understand and the belief that it is possible to communicate play a large role in mutual intelligibility between speakers of closely related languages.

Staying with the Scandinavian languages, it should be noted that the mainland Scandinavian languages are in fact closer to Frisian than English is, even though the Scandinavian languages belong genetically to another Germanic branch than English and Frisian. This can probably be explained by the intensive contacts between Frisians and Scandinavians over many centuries. However, the common idea among some speakers of Frisian and Scandinavian that the two languages are so close that they are almost mutually intelligible is not confirmed by our results, at least not as far as the standard Scandinavian languages are concerned. Probably this popular idea is built on the fact that a few frequent words are identical in Frisian and Scandinavian. It is possible, however, that this picture would change if we were to include more Danish dialects in our material. For example, it seems to be relatively easy for fishermen from Friesland to talk to their colleagues from the west coast of Denmark. Part of the explanation might be that fishermen share a common vocabulary of professional terms. The frequent contact and a strong motivation to communicate successfully are also likely to be important factors.

As we mentioned in the introduction, among the dialects in the Netherlands and Flanders, the Frisian varieties are most deviant from Standard Dutch. However, among the varieties which are recognized as languages in the Germanic language area, Frisian is most similar to Dutch. The smallest distance between two languages, apart from Frisian and Dutch, was found between Norwegian and Swedish: 43.4%. The distance between Frisian and Dutch is smaller: 38.7%. The Town Frisian variety of the capital of Friesland (Leeuwarden) has a distance of only 20.3% to Dutch. Although the recognition of Frisian as a second official language in the Netherlands is right in our opinion, we found that the current linguistic position of Frisian provides too little foundation for becoming independent from the Netherlands, as some Frisians may wish⁸.

Acknowledgements

This research would have been impossible without informants who were willing to translate the story of 'the North Wind and the Sun'. We wish to thank G. Blom (Hindeloopen), J. Spoelstra (Hijum) and W. Visser (Oosterbierum), all of whom are affiliated with the Fryske Akademy in Leeuwarden. We also thank S. van Dellen (Wetsens), T. de Graaf (Leeuwarden), F. Postma (Tjerkgaast) and O. Vries (Westergeest), all of them employees of the University of Groningen. We thank J. Allen (England), A. Mikaelsdóttir (Iceland), Vigdis Petersen (the Faroes), R. Kraayenbrink (the Netherlands), K. Sjöberg (Sweden) and R. Schmidt (Germany). We are also very grateful to Jørn Almberg for making available the recording of Lillehammer (Norway). The recordings and transcriptions of the Frisian varieties were made by the second author, and those of the standard languages (except for Norway and the Faroes) by the first author. The transcriptions were subsequently checked by Jørn Almberg, whom we gratefully thank for correcting them. Furthermore, we wish to express our gratitude to Peter Kleiweg for his software for creating the map (Figure 2) and visualizing the dendrogram (Figure 5). Finally, we thank Maartje Schreuder for reading an earlier version of this article and giving useful comments, and Angeliek van Hout for reviewing our English.

Notes

1 Dr. Tjeerd de Graaf, the central figure in this volume, was born in Leeuwarden, the capital of Friesland. Leeuwarden is one of the places where Town Frisian is spoken. Tjeerd de Graaf is a native speaker of this dialect, but later on he also learned (Standard) Frisian. The Leeuwarden speaker in the present investigation was Tjeerd de Graaf (see Section 3.1).

2 Most of this section is based on König and Van der Auwera (1994).

3 The Lillehammer recording can be found at http://www.ling.hf.ntnu.no/nos/ together with 52 recordings of other Norwegian dialects.


4 Since our material included two toneme languages, Swedish and Norwegian, the two tonemes I and II were also transcribed. For the other varieties primary stress was noted. Stress and tonemes were, however, not included in the calculation of linguistic distances.

5 The program PRAAT is a free public-domain program developed by Paul Boersma and David Weenink at the Institute of Phonetic Sciences of the University of Amsterdam, available at http://www.fon.hum.uva.nl/praat.

6 The data are taken from the Linguistic Atlas of the Middle and South Atlantic States (LAMSAS) and available via http://hyde.park.uga.edu/lamsas/.

7 The example should not be interpreted as a historical reconstruction of the way in which one pronunciation changed into another. From that point of view it might be more obvious to show how [æftərnuːn] changed into [æːftənən]. We just show that the distance between two arbitrary pronunciations is found on the basis of the least costly set of operations mapping one pronunciation onto the other.

8 Tjeerd de Graaf has never taken such an extreme position. Possibly speakers of Town Frisian have a more moderate opinion on this issue, since Town Frisian is more closely related to Standard Dutch, as appears from Figure 5 and Table 3.

References

Bolognesi, R. and W. Heeringa (2002). De invloed van dominante talen op het lexicon en de fonologie van Sardische dialecten. In: D. Bakker, T. Sanders, R. Schoonen and Per van der Wijst (eds.). Gramma/TTT: tijdschrift voor taalwetenschap. Nijmegen University Press, Nijmegen, 9 (1): 45-84.

Feitsma, T. (1963). Sproglige berøringer mellem Frisland og Skandinavien. Sprog og kultur, 23: 97-121.

Gooskens, Ch. and W. Heeringa (submitted). Perceptive Evaluation of Levenshtein Dialect Distance Measurements Using Norwegian Dialect Data. Submitted to Language Variation and Change.

Goossens, J. (1977). Inleiding tot de Nederlandse Dialectologie. Wolters-Noordhoff, Groningen.

Haugen, E. (1966). Semicommunication: The Language Gap in Scandinavia.Sociological Inquiry, 36 (2): 280-297.

Heeringa, W. (2004). Measuring Dialect Pronunciation Differences using Levenshtein Distance. Doctoral dissertation, University of Groningen.

Heeringa, W. and C. Gooskens (2003). Norwegian Dialects Examined Perceptually and Acoustically. In: J. Nerbonne and W. Kretzschmar (eds.). Computers and the Humanities. Kluwer Academic Publishers, Dordrecht, 37 (3): 293-315.

Hof, J. J. (1933). Friesche Dialectgeographie. ‘s Gravenhage (Noord- en Zuid-Nederlandse Dialectbibliotheek 3).

Hoppenbrouwers, C. and G. Hoppenbrouwers (1988). De featurefrequentiemethode en de classificatie van Nederlandse dialecten. TABU: Bulletin voor Taalwetenschap, 18 (2): 51-92.

Jain, A.K. and R.C. Dubes (1988). Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs, New Jersey.

Johnson, K. (1997). Acoustic and Auditory Phonetics. Blackwell Publishers, Cambridge etc.

Kessler, B. (1995). Computational dialectology in Irish Gaelic. In: Proceedings of the 7th Conference of the European Chapter of the Association for Computational Linguistics. EACL, Dublin, 60-67.

Kloeke, G. G. (1927). De Hollandsche expansie in de zestiende en zeventiende eeuw en haar weerspiegeling in de hedendaagsche Nederlandsche dialecten. Nijhoff, 's-Gravenhage.

König, E. and J. van der Auwera (eds.) (1994). The Germanic Languages. Routledge, London.

Miedema, H.T.J. (1966). Van York naar Jorwerd. Enkele problemen uit de Friese taalgeschiedenis. J.B. Wolters, Groningen.

Nerbonne, J., W. Heeringa, E. van den Hout, P. van der Kooi, S. Otten and W. van de Vis (1996). Phonetic Distance between Dutch Dialects. In: G. Durieux, W. Daelemans and S. Gillis (eds.). CLIN VI, Papers from the Sixth CLIN Meeting. University of Antwerp, Center for Dutch Language and Speech, Antwerpen, 185-202.

Nerbonne, J. and W. Heeringa (1997). Measuring Dialect Distances Phonetically. In: J. Coleman (ed.). Workshop on Computational Phonology. Madrid, 11-18.

Pei, M. (1966). The Story of Language. Allen & Unwin, London.

Potter, R.K., G.A. Kopp and H.C. Green (1947). Visible Speech. The Bell Telephone Laboratories Series. Van Nostrand, New York.

Rietveld, A.C.M. and V.J. van Heuven (1997). Algemene fonetiek. Coutinho, Bussum.

Sandøy, H. (1994). Utan kontakt og endring? In: U.-B. Kotsinas and J. Helgander (eds.). Dialektkontakt, språkkontakt och språkförändring i Norden. Almqvist & Wiksell International, Stockholm, 38-51.

Sokal, R.R. and F.J. Rohlf (1962). The comparison of dendrograms by objective methods. Taxon, 11: 33-40.


Stevens, K.N. (1998). Acoustic Phonetics. MIT Press, Cambridge.

The Columbia Encyclopedia (2001). www.bartleby.com/65/fr/Frisianl.html

The International Phonetic Association (1949). The principles of the International Phonetic Association: being a description of the International Phonetic Alphabet and the manner of using it, illustrated by texts in 51 languages. International Phonetic Association, London.

The International Phonetic Association (1999). Handbook of the International Phonetic Association: a guide to the use of the International Phonetic Alphabet. Cambridge University Press, Cambridge.

Trask, R.L. (1996). A Dictionary of Phonetics and Phonology. Routledge,London and New York.

Van der Veen, K. F. (2001). West Frisian Dialectology and Dialects. In: H. H. Munske (ed.). Handbook of Frisian Studies. Niemeyer, Tübingen, 83-98.

Vieregge, W. H., A.C.M. Rietveld and C. Jansen (1984). A distinctive feature based system for the evaluation of segmental transcription in Dutch. In: M.P.R. van den Broecke and A. Cohen (eds.). Proceedings of the 10th International Congress of Phonetic Sciences. Foris Publications, Dordrecht and Cinnaminson, 654-659.

Visser, W. (1997). The syllable in Frisian. Holland Academic Graphics, The Hague.

Wadstein, E. (1933). On the Relations between Scandinavians and Frisians in Early Times. University of London, London.

Wells, J. and J. House (1995). The Sounds of the International Phonetic Alphabet. UCL, London.

Zwicker, E. and H. Fastl (1990). Psychoacoustics: Facts and Models. Springer Verlag, Berlin.

Zwicker, E. and R. Feldtkeller (1967). Das Ohr als Nachrichtenempfänger. Monographien der elektrischen Nachrichtentechnik 19, 2nd revised edition. Hirzel, Stuttgart.

Learning Phonotactics with Simple Processors

John Nerbonne and Ivilin Stoianov

Abstract

This paper explores the learning of phonotactics in neural networks. Experiments are conducted on the complete set of over 5,000 Dutch monosyllables extracted from CELEX, and the results are shown to be accurate to within 5% error. Extensive comparisons to human phonotactic learning conclude the paper. We focus on whether phonotactics can be effectively learned and on how the learning which is induced compares to human behavior.

1. Introduction1

Phonotactics concerns the organization of the phonemes in words and syllables. The phonotactic rules of a language constrain how phonemes combine to form larger linguistic units (syllables and words) in that language (Laver, 1994). For example, Cohen, Ebeling & van Holk (1972) describe the phoneme combinations possible in Dutch, which will be the language in focus in this study.

Phonotactic rules are implicit in natural languages, so that humans require no explicit instruction about which combinations are allowed and which are not. An explicit phonotactic grammar can of course be abstracted from the words in a language, but this is an activity linguists engage in, not language learners in general. Children normally learn a language's phonotactics in their early language development and probably update it only slightly once they have mastered the language.

Most work on language acquisition has arisen in linguistics and psychology, and that work employs mechanisms that have been developed for language: typically, discrete, symbol-manipulation systems. Phonotactics in particular has been modeled with n-gram models, Finite State Machines, Inductive Logic Programming, etc. (Tjong Kim Sang, 1998; Konstantopoulos, 2003). Such approaches are effective, but a cognitive scientist may ask whether the same success could be possible using less custom-made tools. The brain, viewed as a computational machine, exploits other principles, which have been modeled in the approach known as Parallel Distributed Processing (PDP), thoroughly described in the seminal work of Rumelhart & McClelland (1986). Computational models inspired by brain structure and neural processing principles are Neural Networks (NNs), also known as connectionist models.

Learning phonotactic grammars is not an easy problem, especially when one restricts one's attention to cognitively plausible models. Since languages are experienced and produced dynamically, we need to focus on the processing of sequences, which complicates the learning task. The history of research in connectionist language learning shows both successes and failures, even when one concentrates on simpler structures, such as phonotactics (Stoianov, Nerbonne & Bouma, 1998; Stoianov & Nerbonne, 2000; Tjong Kim Sang, 1995; Tjong Kim Sang & Nerbonne, 1999; Pacton, Perruchet, Fayol & Cleeremans, 2001).

This paper will attack phonotactics learning with models that have no specifically linguistic knowledge encoded a priori. The models naturally do have a bias, viz., toward extracting local conditioning factors for phonotactics, but we maintain that this is a natural bias for many sorts of sequential behavior, not only linguistic processing. A first-order Discrete Time Recurrent Neural Network (DTRNN) (Carrasco, Forcada & Neco, 1999; Tsoi & Back, 1997) will be used: the Simple Recurrent Network (SRN) (Elman, 1988). SRNs have been applied to different language problems (Elman, 1991; Christiansen & Chater, 1999; Lawrence, Giles & Fong, 1995), including learning phonotactics (Shillcock, Levy, Lindsey, Cairns & Chater, 1993; Shillcock, Cairns, Chater & Levy, 1997). With respect to phonotactics, we have also contributed reports (Stoianov et al., 1998; Stoianov & Nerbonne, 2000; Stoianov, 1998).

SRNs have been shown capable of representing regular languages (Omlin & Giles, 1996; Carrasco et al., 1999). Kaplan & Kay (1994) demonstrated that the apparently context-sensitive rules that are standardly found in phonological descriptions can in fact be expressed within the more restrictive formalism of regular relations. We thus begin with a device which is in principle capable of representing the needed patterns.


We then simulate the language learning task by training networks to produce context-dependent predictions. We also show how the continuous predictions of trained SRNs - likelihoods that a particular token can follow the current context - can be transformed into more useful discrete predictions, or, alternatively, string recognitions.

In spite of the above claims about representability, the Back-Propagation (BP) and Back-Propagation Through Time (BPTT) learning algorithms used to train SRNs do not always find optimal solutions - SRNs that produce only correct context-dependent successors or recognize only strings from the training language. Hence, section 3 focuses on the practical demonstration that a realistic language learning task may be simulated by an SRN. We evaluate the network learning from different perspectives - grammar learning, phonotactics learning, and language recognition. The last two methods need one language-specific parameter - a threshold - that distinguishes successors/words allowed in the training language. This threshold is found with a post-training procedure, but it could also be sought interactively during training.

Finally, section 4 assesses the networks from linguistic and psycholinguistic perspectives: a static analysis extracts acquired linguistic knowledge from network weights, and the network performance is compared to humans' in a lexical decision task. The network performance, in particular the distribution of errors as a function of string position, will be compared to alternative construals of Dutch syllabic structure - following a suggestion from discussions of psycholinguistic experiments about English syllables (Kessler & Treiman, 1997).

1.1. Motivations for a Phonotactic Device

This section will review standard arguments that demonstrate the cognitive and practical importance of phonotactics. English phonotactic rules such as:

‘/s/ may precede, but not follow /t/ syllable-initially’

(ignoring loanwords such as `tsar' and `tse-tse') may be adduced by judging the well-formedness of sequences of letters/phonemes, taken as words in the language, e.g. /stɒp/ vs. */tsɒp/. There may also be cases judged to be of intermediate acceptability. So, even if all of the following are English words:


/mʌðər/ `mother', /fɑðər/ `father', /sɪstər/ `sister'

None of the following are, however:

*/mʌðr/, */fɑðrə/, */tssɪst/

None of these sound like English words. However, the following sequences:

/mɪðər/, /fuðər/, /sæntər/

"sound" much more like English, even if they mean nothing and therefore are not genuine English words. We suspect that, e.g., /sæntər/ `santer' could be used to name a new object or a concept.

This simple example shows that we have a feeling for word structure, even if no explicit knowledge of it. Given the huge variety of words, it is more efficient to put this knowledge into a compact form - a set of phonotactic rules. These rules would state which phonemic sequences sound correct and which do not. In the same vein, second language learners experience a period when they recognize that certain phonemic combinations (words) belong to the language they are learning without knowing the meaning of these words.

Convincing psycholinguistic evidence that we make use of phonotactics comes from studying the information sources used in word segmentation (McQueen, 1998). In a variety of experiments, this author shows that word boundary locations are likely to be signaled by phonotactics. The author rules out the possibility that other sources of information, such as prosodic cues, syllabic structure and lexemes, are sufficient for segmentation. Similarly, Treiman & Zukowski (1990) had shown earlier that phonotactics plays an important role in the syllabification process. According to McQueen (1998), phonotactic and metrical cues play complementary roles in the segmentation process. In accordance with this, some researchers have elaborated a model for word segmentation: the Possible Word Constraints Model (Norris, McQueen, Cutler & Butterfield, 1997), in which likely word-boundary locations are marked by phonotactics, metrical cues, etc., and in which they are further fixed by using lexicon-specific knowledge.


Exploiting the specific phonotactics of Japanese, Dupoux, Pallier, Kakehi & Mehler (2001) conducted an experiment with Japanese listeners who heard stimuli that contained illegal consonant clusters. The listeners tended to hear an acoustically absent vowel that brought their perception into line with Japanese phonotactics. The authors were able to rule out lexical influences as a putative source for the perception of the illusory vowel, which suggests that speech perception must use phonotactic information directly.

Further justification for the postulation of a neurobiological device that encodes phonotactics comes from neurolinguistic and neuroimaging studies. It is widely accepted that the neuronal structure of Broca's area (in the brain's left frontal lobe) is used for language processing, and more specifically that it represents a general sequential device (Stowe, Wijers, Willemsen, Reuland, Paans & Vaalburg, 1994; Reilly, 2002). A general sequential processor capable of working at the phonemic level would be a plausible realization of a neuronal phonotactic device.

Besides cognitive modeling, there are also a number of practical problems that would benefit from effective phonotactic processing. In speech recognition, for example, a number of hypotheses that explain the speech signal are created, from which the impossible sound combinations have to be filtered out before further processing. This exemplifies a lexical decision task, in which a model is trained on a language L and then tests whether a given string belongs to L. In such a task a phonotactic device would be of use. Another important problem in speech recognition is word segmentation. Speech is continuous, but we divide it into psychologically significant units such as words and syllables. As noted above, there are a number of cues that we can use to distinguish these elements - prosodic markers, context, but also phonotactics. Similarly to the former problem, an intuitive strategy here is to split the phonetic/phonemic stream at points where phonotactic constraints are violated (see Shillcock et al. (1997) and Cairns, Shillcock, Chater & Levy (1997) for connectionist modeling). Similarly, the constraints on the letters forming words in written languages (graphotactics) are useful in word processing applications, for example spell-checking.

There is another, more speculative aspect to investigating phonotactics. Searching for an explanation of the structure of natural languages, Carstairs-McCarthy presented in his recent book (1999) an analogy between syllable structure and sentence structure. He argues that sentences and syllables have a similar type of structure. Therefore, if we find a proper mechanism for learning syllabic structures, we might apply a similar mechanism to learning syntax as well. Of course, syntax is much more complex and more challenging, but if Carstairs-McCarthy is right, the basic principles of both devices might be the same.

2. Simple Recurrent Networks

This section will briefly present Simple Recurrent Networks (Elman, 1988; Robinson & Fallside, 1988) and will review earlier studies of sequential, especially phonotactic, learning. Detailed descriptions of the SRN processing mechanisms and of the Back-Propagation Through Time learning algorithm used to train the model are available elsewhere (Stoianov, 2001; Haykin, 1994), and will be reviewed only superficially.

Figure 1. Learning phonotactics with SRNs. If the training data set contains the words /nɛt#/, /nɛts#/ and /nɛtwɛrk#/, then after the network has processed a left context /nɛ/, the reaction to an input token /t/ will be active neurons corresponding to the symbol '#' and the phonemes /s/ and /w/.

Simple Recurrent Networks (SRNs) were invented to encode simple artificial grammars, as an extension of the Multilayer Perceptron (Rumelhart, Hinton & Williams, 1986) with an extra input - a context layer that holds the hidden layer activations from the previous processing cycle. After training, Elman (1988) conducted investigations on how context evolves in time. The analysis showed graded encoding of the input sequence: similar items presented to the input were clustered at close, but different, shifting positions. That is, the network discovered and implicitly represented in a distributed way the rules of the grammar generating the training sequences. This is noteworthy, because the rules for context were not encoded, but rather acquired through experience. The capacity of SRNs to learn simple artificial languages was further explored in a number of studies (Cleeremans, Servan-Schreiber & McClelland, 1989; Gasser, 1992).

SRNs have the structure shown in Figure 1. They operate as follows: input sequences SI are presented to the input layer, one element SI(t) at a time. The purpose of the input layer is just to transfer activation to the hidden layer through a weight matrix. The hidden layer in turn copies its activations after every step to the context layer, which provides an additional input to the hidden layer - i.e., information about the past, after a brief delay. Finally, the hidden layer neurons output their signal through a second weight matrix to the output layer neurons. The activation of the latter is interpreted as the product of the network. Since the activation of the hidden layer depends both on its previous state (the context) and on the current input, SRNs have the theoretical capacity to be sensitive to the entire history of the input sequence. However, practical limitations restrict the time span of the context information to at most 10-15 steps (Christiansen & Chater, 1999). The size of the layers does not restrict the range of temporal sensitivity.
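The forward pass just described can be made concrete in a few lines. The following is an illustrative reconstruction in Python/NumPy, not the authors' implementation; the sigmoid activation is our assumption, while the layer sizes (45 input/output, 80 hidden) and the initial weight range (-0.1...+0.1) echo figures reported later in this paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SRN:
    """Minimal Elman-style Simple Recurrent Network (forward pass only)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        r = 0.1  # initial weight range, as reported in section 3.1
        self.W_ih = rng.uniform(-r, r, (n_hidden, n_in))      # input -> hidden
        self.W_ch = rng.uniform(-r, r, (n_hidden, n_hidden))  # context -> hidden
        self.W_ho = rng.uniform(-r, r, (n_out, n_hidden))     # hidden -> output
        self.context = np.zeros(n_hidden)                     # previous hidden state

    def step(self, x):
        """Process one input symbol; return the output activations."""
        h = sigmoid(self.W_ih @ x + self.W_ch @ self.context)
        self.context = h.copy()  # hidden activations become the next context
        return sigmoid(self.W_ho @ h)

net = SRN(n_in=45, n_hidden=80, n_out=45)
x = np.zeros(45)
x[5] = 1.0        # one-hot vector for some phoneme
y = net.step(x)   # graded activations over possible successors
```

A word would be processed by calling `step` once per phoneme, ending with the `#` marker; the context layer carries the left-context information between calls.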

The network operates in two working regimens - supervised training and network use. In the latter, the network is presented the sequential input data SI(t) and computes the output N(t) using contextual information. The training regimen involves the same sort of processing as network use and also includes a second, training step, which compares the network reaction N(t) to the desired one ST(t), and which uses the difference to adjust the network behavior in a way that improves future network performance on the same data.

The two most popular supervised learning algorithms used to train SRNs are the standard Back-Propagation algorithm (Rumelhart et al., 1986) and the Back-Propagation Through Time algorithm (Haykin, 1994). While the former is simpler because it uses information from one previous time step only (the context activation, the current network activations, and error), the latter trains the network faster, because it collects errors from all time steps during which the network processes the current sequence and therefore adjusts the weights more precisely. However, the BPTT learning algorithm is also cognitively less plausible, since the collection of the time-spanning information requires mechanisms specific to symbolic methods. Nevertheless, this compromise allows more extensive research, and without it the problems discussed below would require much longer training time when using standard computers for simulations. Therefore, in the experiments reported here the BPTT learning algorithm will be used. In brief, it works in the following way: the network reaction to a given input sequence is compared to the desired target sequence at every time step and an error is computed. The network activation and error at each step are kept in a stack. When the whole sequence has been processed, the error is propagated back through space (the layers) and time, and weight-updating values are computed. Then, the network weights are adjusted with the values computed in this way.

2.1. Learning Phonotactics with SRNs

Dell, Juliano & Govindjee (1993) showed that words can be described not only with symbolic approaches, using word structure and content, but also by a connectionist approach. In this early study of learning word structure with neural nets (NNs), the authors trained SRNs to predict the phoneme that follows the current input phoneme, given context information. The data sets contained 100-500 English words. An important issue in their paper is the analysis and modeling of a number of speech-error phenomena, which were taken as strong support for parallel distributed processing (PDP) models, in particular SRNs. Some of these phenomena were: phonological movement errors (reading list - leading list), manner errors (department - jepartment), phonotactic regularity violations (dorm - dlorm), consonant-vowel category confusions, and initial consonant omissions (cluster-initial consonants dropping, as when `stop' is mispronounced [tɒp]).

Aiming at segmentation of continuous phonetic input, Shillcock et al. (1997) and Cairns et al. (1997) trained SRNs with a version of the BPTT learning algorithm on English phonotactics. They used 2 million phonological segments derived from a transcribed speech corpus and encoded with a vector containing nine phonological features. The neural network was presented a single phoneme at a time and was trained to produce the previous, the current and the next phonemes. The output corresponding to the predicted phoneme was matched against the following phoneme, measuring cross-entropy; this produced a varying error signal with occasional peaks corresponding to word boundaries. The SRN reportedly learned to reproduce the current phoneme and the previous one, but was poor at predicting the following phoneme. Correspondingly, the segmentation performance was quite modest, predicting only about one-fifth of the word boundaries correctly, but it was more successful in predicting syllable boundaries. It was significantly improved by adding other cues such as prosodic information. This means that phonotactics might be used alone for syllable detection, but polysyllabic word detection needs extra cues.

In another connectionist study on phonological regularities, Rodd (1997) trained SRNs on 602 Turkish words; the networks were trained to predict the following phonemes. Analyzing the hidden layer representations developed during training, the author found that hidden units came to correspond to graded detectors for natural phonological classes such as vowels, consonants, voiced stops, and front and back vowels. This is further evidence that NN models can capture important properties of the data they have been trained on without any prior knowledge, based only on statistical co-occurrences.

Learning the graphotactics and phonotactics of Dutch monosyllables with connectionist models was first explored by Tjong Kim Sang (1995) and Tjong Kim Sang & Nerbonne (1999), who trained SRNs to predict graphemes/phonemes based on preceding segments. The data was orthogonally encoded, that is, for each phoneme or grapheme there was exactly one neuron activated at the input and output layers (see below, 3.1). To test the knowledge learned by the network, Tjong Kim Sang and Nerbonne tested whether the activations of the neurons corresponding to the expected symbols are greater than a threshold, determined as the lowest activation for any correct sequence encountered in the training data. This resulted in almost perfect acceptance of unseen Dutch words (generalization), but also in negligible discrimination with respect to (ill-formed) random strings. The authors concluded that "SRNs are unfit for processing our data set" (Tjong Kim Sang & Nerbonne, 1999).

These early works on learning phonotactics with SRNs prompted the work reported here. First, Stoianov et al. (1998) demonstrated that the SRNs in Tjong Kim Sang and Nerbonne's work were learning phonotactics rather better than those authors had realized. By analyzing the error as a function of the acceptance threshold, Stoianov et al. (1998) were able to demonstrate the existence of thresholds successful at both the acceptance of well-formed data and the rejection of ill-formed data (see below, 3.6.2, for a description of how we determine such thresholds). The interval of high-performing thresholds is narrow, which is why earlier work had not identified it (see Figure 2 on how narrow the window is). More recently, Stoianov & Nerbonne (2000) have studied the performance of SRNs from a cognitive perspective, attending to the errors produced by the network and the extent to which they correlate with the performance of humans on related lexical decision tasks. The current article ties these two strands of work together and presents them systematically.
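The post-training threshold search can be illustrated schematically: sweep candidate cutoffs over the scores the network assigns to positive and negative test strings, and keep the cutoff that best separates them. The scores below are invented for illustration; only the procedure is meant to reflect the text.

```python
def best_threshold(pos_scores, neg_scores):
    """Return the cutoff maximizing accuracy: accept a string iff score >= t."""
    candidates = sorted(set(pos_scores) | set(neg_scores))

    def accuracy(t):
        hits = sum(s >= t for s in pos_scores) + sum(s < t for s in neg_scores)
        return hits / (len(pos_scores) + len(neg_scores))

    return max(candidates, key=accuracy)

pos = [0.61, 0.55, 0.58, 0.70]  # hypothetical scores for well-formed words
neg = [0.20, 0.52, 0.31, 0.54]  # hypothetical scores for random strings
t = best_threshold(pos, neg)    # here, a cutoff between 0.54 and 0.55 separates them
```

When the well-performing interval is narrow, as reported above, only a fine-grained sweep of this kind will find it.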

3. Experiments

The challenge in connectionist modeling is not only developing theoretical frameworks, but also obtaining the most from the network models during experimentation. This section focuses on experiments on learning the phonotactics of Dutch syllables with Simple Recurrent Networks and discusses a number of related problems. It will be followed by a study of the network behavior from a linguistic point of view.

3.1. Some implementation decisions

SRNs were presented in section 2. A first implementation decision concerns how sounds are to be represented. A simple orthogonal strategy is to choose a vector of n neurons to represent n phonemes, to assign each phoneme (e.g. /ɛ/) to a neuron (e.g., neuron 5 in a sequence of 45), and then to activate that one neuron and deactivate all the others whenever the phoneme is to be represented (so /ɛ/ is represented by four deactivated neurons, a single activated one, and then forty more deactivated neurons). This orthogonal strategy makes no assumptions about phonemes being naturally grouped into classes on the basis of linguistic features such as consonant/vowel status, voicing, place of articulation, etc. An alternative strategy exploits such features by assigning each feature to a neuron and then representing a phoneme via a translation of its feature description into a sequence of corresponding neural activations.
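As a concrete illustration, orthogonal (one-hot) encoding and decoding can be written as follows. This is a sketch with a toy six-symbol inventory; the actual model uses 44 phonemes plus the end-of-word marker '#'.

```python
import numpy as np

# Toy inventory; the real inventory has 45 symbols (44 phonemes plus '#').
symbols = ['#', 'n', 'e', 't', 's', 'w']
index = {s: i for i, s in enumerate(symbols)}

def encode(symbol):
    """One neuron per symbol: activate exactly one unit."""
    v = np.zeros(len(symbols))
    v[index[symbol]] = 1.0
    return v

def decode(vector):
    """Read off the most active output unit as the predicted symbol."""
    return symbols[int(np.argmax(vector))]
```

Note that nothing in this encoding tells the network that, say, /n/ and /t/ are both consonants; any such natural classes must be discovered from the data.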

In phonotactics learning, the input encoding method might be feature-based or orthogonal, but the output decoding should be orthogonal in order to obtain a simple prediction of successors, and to avoid a bias induced from the peculiarities of the feature encoding scheme used. The input encoding chosen was also orthogonal, which requires the network to discover natural classes of phonemes by itself.

The orthogonal encoding implies that we need as many neurons as we have phonemes, plus one for the end-of-word '#' symbol. That is, the input and output layers will have 45 neurons. However, it is usually difficult to choose the right size of the hidden layer for a particular learning problem. That size is only indirectly related to the learning task and the encoding chosen (as a subcomponent of the learning task). A linguistic bias in the encoding scheme, e.g., feature-based encoding, would simplify the learning task and decrease the number of hidden neurons required to learn it (Stoianov, 2001). Intuition tells us that hidden layers that are too small lead to an overly crude representation of the problem and larger error. Larger hidden layers, on the other hand, increase the chance that the network wanders aimlessly, because the space of possibilities it needs to traverse is too large. Therefore, we sought an effective size in a pragmatic fashion. Starting with a plausible size, we compared its performance to nets with double and half the number of neurons in the hidden layer. We repeated this in the direction of the better behavior, keeping track of earlier bounds in order to home in on an appropriate size. In this way we settled on a range of 20-80 neurons in the hidden layer, and we continued experimentation on phonotactic learning using only nets of this size.
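The doubling/halving search just described can be sketched as a simple loop. Here `evaluate` stands in for the expensive step of training a pool of networks at a given hidden-layer size and measuring its error; the toy error curve at the end is invented purely for illustration.

```python
def search_hidden_size(evaluate, start=40, max_steps=5):
    """Home in on a hidden-layer size by repeatedly trying double and half
    the current best size and moving in the direction of lower error."""
    best, best_err = start, evaluate(start)
    for _ in range(max_steps):
        candidates = {best // 2, best * 2}
        errs = {n: evaluate(n) for n in candidates}
        n, err = min(errs.items(), key=lambda kv: kv[1])
        if err >= best_err:
            break  # neither halving nor doubling improves: stop
        best, best_err = n, err
    return best

# With a toy error curve minimized at 80 units, the search homes in on 80:
toy_error = lambda n: abs(n - 80)
```

As the text notes, this procedure is rough; in practice each `evaluate` call is noisy, so one needs to guard against settling prematurely on one size.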

However, even given the right size of the hidden layer, the training will not always result in an optimal weight set W*, since network learning is nondeterministic - each network training process depends on a number of stochastic variables, e.g., the initial network weights and the order of presentation of examples. Therefore, in order to produce more successful learning, several SRNs with different initial weights were trained in a pool (group).

The back-propagation learning algorithm is controlled by two main parameters - a learning coefficient η and a smoothing parameter α. The first controls the speed of the learning and is usually set within the range (0.1...0.3). It is advisable to choose a smaller value when the hidden layer is larger. Also, this parameter may vary in time, starting with a larger initial value that decreases progressively (as suggested in Kuan, Hornik & White (1994) for the learning algorithm to improve its chances of attaining a global minimum in error). Intuitively, such a schedule helps the network first to locate approximately the region with the global minimum and later to make more precise steps in searching for that minimum (Haykin, 1994; Reed & Marks II, 1999). The smoothing parameter α will be set to 0.7, which also allows the network to escape from local minima during the search walk over the error surface.

The training process also depends on the initial values of the weights. They are set to random values drawn from a region (-r...+r). It is also important to find a proper value for r, since large initial weight values will produce chaotic network behavior, impeding the training. We used r = 0.1.

The SRNs used for this problem are schematically represented in Fig. 1, which gives the SRN reaction to an input sequence /nɛ/ after training on an exemplary set containing the sequences /nɛt#/, /nɛts#/, /nɛtwɛrk#/. For this particular database, the network has experienced the tokens '#', /s/ and /w/ as possible successors to /nɛt/ during training, and therefore it will activate them in response to this input sequence.

3.2. Linguistic Data - Dutch syllables

A database L_M of all Dutch monosyllables - 5,580 words - was extracted from the CELEX (1993) lexical database. CELEX is a difficult data source because it contains many rare and foreign words among its approximately 350,000 Dutch lexical entries, which additionally complicates the learning task. Filtering out non-typical words is a formidable task and one which might introduce experimenter prejudice, and therefore all monosyllables were used. The monosyllables have a mean length of 4.1 tokens (σ = 0.94; min = 2; max = 8) and are built from a set of 44 phonemes plus one extra symbol (#), representing space, used as a filler specifying end-of-word.

The main dataset is split into a training (L1) and a testing (L2) database, in a proportion of approximately 85% to 15%. The training database will be used to train a Simple Recurrent Network and the testing one will be used for evaluating the success of word recognition. Negative data will also be created for test purposes. The complete database L_M will be used for some parts of the evaluation.

In language modeling it is important to explore the frequencies of word occurrences, which naturally bias humans' linguistic performance. If a model is trained on data in proportion to its empirical frequency, this focuses the learning on the more frequent words and thus improves the performance of the model. This also makes feasible a comparison of the model's performance with that of humans performing various linguistic tasks, such as a lexical decision task. For these reasons, we used the word frequencies given in the CELEX database. Because the frequencies vary greatly ([0...100,000]), we presented training data items in proportion to the natural logarithms of their frequencies, in accordance with standard practice (Plaut, McClelland, Seidenberg & Patterson, 1996). This approach resulted in frequencies in a range of [1...12].
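The logarithmic compression can be sketched as follows. Note that the clipping of zero- and low-frequency words to a single presentation is our assumption, made so that every word is seen at least once; the paper itself only states the resulting range [1...12].

```python
import math

def presentations(freq):
    """Presentations per epoch: the natural log of the CELEX frequency,
    rounded, with an assumed floor of 1 for rare and zero-frequency words."""
    return max(1, round(math.log(freq))) if freq > 0 else 1

# ln compresses the raw range [0...100,000] into roughly [1...12]:
# ln(100,000) ~ 11.5, so the most frequent words get about 12 presentations.
```

This keeps frequent words dominant in training without letting the most frequent items (frequencies near 100,000) swamp the epoch.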

3.3. Difficulty

One way to characterize the complexity of the training set is to compute the entropy of the distribution of successors for every available left context. The entropy of a language L viewed as a stochastic process measures the average surprise value associated with each element (Mitchell, 1997). In our case, the language is a set of words and the elements are phonemes; hence the appropriate entropy measures the average surprise value for phonemes c preceded by a context s. Entropy is measured for a given distribution, which in our case is the set of all possible successors. We compute the entropy Entr(s) for a given context s with (1):

Entr(s) = − Σ_{c ∈ Σ} p(c|s) · log₂ p(c|s)     (1)

Equation 1. Entropy

where Σ is the alphabet of segment symbols, and p(c|s) the probability of segment c following the given context s. Then the average entropy over all available contexts s∈L, weighted by their frequencies, will be the measure of the complexity of the words. The smaller this measure, the less difficult the words. The maximal possible value for one context would be log₂(45), that is, 5.49, and this would only obtain in the unlikely case that each phoneme was equally likely in that context. The actual average value of the entropy measured for the Dutch monosyllables is 2.24 (σ = 1.32). The minimal value was 0.0, and the maximal value was 3.96. These values may be interpreted as follows: the minimal value of 0.0 means that there are left contexts with only one possible successor (log₂(1) = 0). A maximal value of 3.96 means that there is one context which is as unpredictable as one in which 2^3.96 ≈ 16 successors were equally likely. The mean entropy is 2.24, which is to say that on average 2^2.24 ≈ 4.7 phonemes follow a given left context.
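For a toy lexicon, the successor entropy of a left context can be computed directly from this definition, estimating p(c|s) by the relative frequencies of observed successors. The three-word lexicon below is illustrative only.

```python
import math
from collections import Counter

def successor_entropy(words, context):
    """Entropy (bits) of the successor distribution after a left context,
    using maximum-likelihood estimates of p(c|s) from the word list."""
    succ = Counter(w[len(context)] for w in words
                   if w.startswith(context) and len(w) > len(context))
    total = sum(succ.values())
    return -sum((n / total) * math.log2(n / total) for n in succ.values())

# Toy lexicon with '#' as the end-of-word marker:
words = ['net#', 'nets#', 'netwerk#']
# After 'net' the successors are '#', 's' and 'w', each once,
# so the entropy is log2(3), about 1.58 bits.
e = successor_entropy(words, 'net')
```

A context with a single possible successor yields 0 bits, matching the minimal value reported in the text.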


3.4. Negative Data

We noted above that negative data is also necessary for evaluation. Since we are interested in models that discriminate more precisely the strings from L (the Dutch syllables), the negative data for the following experiments will be biased toward L.

Three negative testing sets were generated and used. First, a set R_M containing strings with syllabic form [C]0...3 V [C]0...4, based on the empirical observation that Dutch monosyllables have up to three onset (word-initial) consonants and up to four coda (word-final) consonants. The second group consists of three subsets of R_M: {R_M^1, R_M^2, R_M^3+}, with fixed distances of the random strings to any existing Dutch word of 1, 2, and 3+ phonemes, respectively (measured by edit distance (Nerbonne, Heeringa & Kleiweg, 1999)). Controlling for the distance to any training word allows us to assess more precisely the performance of the model. And finally, a third group: random strings built of concatenations of n-grams picked randomly from Dutch monosyllables. In particular, two sets - R_N^2 and R_N^3 - were randomly developed, based on bigrams and trigrams, respectively.

The latter groups are the most "difficult" ones, especially R_N^3, because it consists of strings that are closest to Dutch. They are also useful for the comparison of SRN methods to n-gram modeling. The corresponding n-gram models will always wrongly recognize these random strings as words from the language. Where the connectionist predictor recognizes them as non-words, it outperforms the corresponding n-gram models, which are considered benchmark models for prediction tasks such as phonotactics learning.
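The machinery behind the second group of sets is Levenshtein edit distance to the nearest real word, which is how random strings could be binned into the 1, 2, and 3+ distance classes. The sketch below uses a toy stand-in lexicon; the cited work computes the distance over phoneme sequences.

```python
def edit_distance(a, b):
    """Standard Levenshtein distance via dynamic programming."""
    # d[i][0] = i and d[0][j] = j (distance to the empty string).
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1,                       # deletion
                          d[i][j - 1] + 1,                       # insertion
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))  # substitution
    return d[len(a)][len(b)]

def distance_to_lexicon(s, lexicon):
    """Distance from a candidate string to the nearest existing word."""
    return min(edit_distance(s, w) for w in lexicon)

lexicon = ['net', 'nets', 'wet']  # toy stand-in for the Dutch monosyllables
```

A random string would then be assigned to R_M^1, R_M^2 or R_M^3+ according to whether its distance to the lexicon is 1, 2, or at least 3.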

3.5. Training

This section reports on network training. We will add a few more details about the training procedure, then we will present pilot experiments aimed at determining the hidden layer size. The later parts will analyze the network performance.


3.5.1. Procedure

The networks were trained in a pool on the same problem, and independently of each other, with the BPTT learning algorithm. The training of each individual network was organized in epochs, in the course of which the whole training data set is presented in accordance with the word frequencies. The total of the logarithms of the frequencies in the training database L1 is about 11,000, which is also the number of presentations of sequences per epoch, drawn in random order. Next, for each word, the corresponding sequence of phonemes is presented to the input, one at a time, followed by the end-of-sequence marker `#'. Each time step is completed by copying the hidden layer activations to the context layer, which is used in the following step.

The parameters of the learning algorithm were as follows: the learning coefficient η started at 0.3 and dropped by 30% each epoch, finishing at 0.001; the momentum (smoothing) term α = 0.7. The networks required 30 epochs to complete training. After this point, very little improvement is noted.

3.5.2. Pilot experiments

Pilot experiments aiming at searching for the most appropriate hidden layer size were done with 20, 40 and 80 hidden neurons. In order to avoid other nondeterminism which comes from the random selection of negative data, during the pilot experiments the network was tested solely on its ability to distinguish admissible from inadmissible successors. Those experiments were done with a small pool of three networks, each of them trained for 30 epochs, which resulted in approximately 330,000 word presentations, or 1,300,000 segments. The total number of individual word presentations ranged from 30 to 300, according to the individual word frequencies. The results of the training are given in Table 1, under the group of columns "Optimal phonotactics". In the course of the training, the networks typically started with a sharp error drop to about 13%, which soon turned into a very slow decrease (see Table 2, left 3 columns).

The training of the three pools with hidden layer sizes 20, 40 and 80 resulted in networks with similar performance, with the largest network performing best. Additional experiments with SRNs with 100 hidden neurons resulted in larger errors than a network with 80 hidden neurons, so that we settled experimentally on 80 hidden neurons as the likely optimal size. It is clear that this procedure is rough, and that one needs to be on guard against premature concentration on one size of model.

104 John Nerbonne and Ivilin Stoianov

Table 1. Results of a pilot study on phonotactics learning by SRNs with 20, 40, and 80 (rows) hidden neurons. Each network is independently trained on language LM three times (columns). The performance is measured (left 3 columns) using the error in predicting the next phoneme, and (right 3 columns) using the L2 (semi-Euclidean) distance between the empirical context-dependent predictions and the network predictions for each context in the tree. Those two methods do not depend on randomly chosen negative data.

                     Optimal Phonotactics          ||SRNL, TL||L2

Hidden Layer Size    SRN1     SRN2     SRN3        SRN1     SRN2     SRN3

20                   10.57%   10.65%   10.57%      0.0643   0.0642   0.0642
40                   10.44%   10.51%   10.44%      0.0637   0.0637   0.0637
80                   10.00%    9.97%   10.02%      0.0634   0.0634   0.0632

Table 2. A typical shape of the SRN error during training. The error drops sharply in the beginning and then slowly decreases to convergence.

Epoch      1     2-4   5-10  11-15  16-30
Error (%)  15.0  12.0  10.8  10.7   10.5

3.6. Evaluation

The performance of a neural predictor trained on phonotactics may be evaluated with different methods, depending on the particular task the network is applied to. In this section we evaluate the neural networks performing best during the pilot studies.

3.6.1. Likelihoods

The direct outcome of training on the sequential prediction task is learning the successors' distribution. This will therefore be used as a basic evaluation method: the empirical context-dependent successor distribution PsL(C) will be matched against the network context-dependent predictions NPsL(C). For this purpose, the output of the network will be normalized and matched against the distribution in the language data. This procedure resulted in a mean L2 (semi-Euclidean) distance of 0.063 - 0.064, where the optimal value would be zero (see Table 1, right 3 columns).2 These values are close to optimal, but baseline models (completely random networks) also result in approximately 0.085 L2 distance.
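For concreteness, the comparison can be sketched as below. We read "semi-Euclidean" as the Euclidean distance normalized by the number of dimensions - an assumption on our part (see also note 2) - and normalize the network output into a distribution first.

```python
import math

def normalize(v):
    """Scale a non-negative activation vector so it sums to one."""
    s = sum(v)
    return [x / s for x in v]

def l2_semi_euclidean(p, q):
    """Distance between two n-dimensional prediction vectors, normalized
    by dimensionality (our reading of 'semi-Euclidean')."""
    n = len(p)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) / n)
```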

3.6.2. Phonotactic Constraints

To evaluate the network's success in becoming sensitive to phonotactic constraints, we first need to judge how well it predicts individual phonemes. For this purpose we seek a threshold above which phonemes are predicted to be admissible and below which they are predicted to be inadmissible. This is done empirically - we perform a binary search for an optimal threshold, i.e. the threshold θ that minimizes the network error E(θ). The classification obtained in this fashion constitutes the network's predictions about phonotactics.
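The threshold search can be sketched as follows: E(θ) is the mean of the false-negative rate on admissible successors and the false-positive rate on inadmissible ones, and the best θ is found empirically. The authors describe a binary search; since E need not be unimodal in general, the simplest robust stand-in is a grid search, shown here with names of our own choosing.

```python
def network_error(theta, admissible, inadmissible):
    """E(theta): mean of the false-negative rate (admissible activations
    rejected) and the false-positive rate (inadmissible ones accepted)."""
    fn = sum(a < theta for a in admissible) / len(admissible)
    fp = sum(a >= theta for a in inadmissible) / len(inadmissible)
    return (fn + fp) / 2.0

def optimal_threshold(admissible, inadmissible, grid=1000):
    """Evaluate E on a grid of candidate thresholds in [0, 1] and keep
    the best one (a simple stand-in for the empirical search)."""
    best_t, best_e = 0.0, float('inf')
    for k in range(grid + 1):
        t = k / grid
        e = network_error(t, admissible, inadmissible)
        if e < best_e:
            best_t, best_e = t, e
    return best_t
```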

We now turn to evaluating the network's predictions: the method to evaluate the network from this point of view compares the context-dependent network predictions with the corresponding empirical distributions. For this purpose, the method described by Stoianov (2001) will be used. The algorithm traverses a trie (Aho, Hopcroft & Ullman, 1983: 163-169), which is a tree representing the vocabulary in which initial segments are the first branches; words are paths through this data structure. The algorithm computes the performance at the optimal threshold determined using the procedure described in the last paragraph, i.e., at the threshold which determines which phonemes are admissible and which inadmissible (see also 2.1). This approach compares the actual distribution with the learned distribution, and we normally use the complete database LM for training and testing.

Figure 2 shows the error of SRN1 with 80 hidden neurons at different values of the threshold. The optimal threshold searching procedure resulted in 6.0% erroneous phoneme prediction at a threshold of 0.0175. This means that if we want to predict phonemes with this SRN, they would be accepted as allowed successors if the activations of the corresponding neurons are higher than 0.0175.


3.6.3. Word Recognition

Using an SRN trained on phoneme prediction as a word recognizing device shifts the focus from phoneme prediction to sequence classification. We wish to see whether it can classify sequences of phonemes into well-formed words on the one hand and ill-formed non-words on the other. To do this we need to translate the phoneme (prediction) values into sequence values. We do this by taking the sum of the phoneme error values for the sequence of phonemes in the string, normalized to correct for length effects. But to translate this sum into a classification, we again need to determine an acceptability threshold, and we use a variant of the same empirical optimization described above. The threshold arrived at for this purpose is slightly lower than the optimal threshold from the previous algorithm. This means that the network accepts more phonemes, which, however, is compensated for by the fact that a string is accepted only if all its phonemes are predicted. In string recognition it is better to increase the phoneme acceptance rate, because the chance to detect a non-word is larger when more tokens are tested.
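A sketch of the resulting decision procedure: while predicting through a string, we record the output activation assigned to each actually occurring next phoneme, and the string is accepted only if every such activation clears the sequence-acceptance threshold. The function names are ours; the length-normalized error sum mentioned above is included for comparing strings.

```python
def accept_string(successor_activations, theta):
    """Classify a string as a possible word: every phoneme of the
    sequence must have been predicted with activation >= theta."""
    return all(a >= theta for a in successor_activations)

def string_error(successor_activations):
    """Sum of per-phoneme error values, normalized for string length."""
    return sum(1.0 - a for a in successor_activations) / len(successor_activations)
```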

Figure 2. SRN error (in %) as a function of the threshold θ. The False Negative Error increases as the threshold increases, because more and more admissible phonemes are incorrectly rejected. At the same time, the False Positive Error decreases, because fewer unwanted successors are falsely accepted. The mean of those two errors is the network error, which finds its minimum of 6.0% at threshold θ = 0.0175. Notice that the optimal threshold is limited to a small range. This illustrates how critical the exact setting of the threshold is for good performance.


Since the performance measure here is the mean percentage of correctly recognized monosyllables and correctly rejected random strings, we incorporate both in seeking the optimal threshold. The negative data is as described above in 3.4. Concerning the positive data, this approach allows us to test the generalization capacity of the model, so that the training L1M and testing L2M subsets may be used here - the first for training the model and evaluating it during training, and the second to test the generalization capacity of the trained network.

Once we determine the optimal sequence-acceptance threshold (0.016), we obtain 5% error on the positive training data set L1M and the negative strings from RM, where the error varied 0.5% depending on the random data set generated.

The model was tested further on the second group of negative data sets. As expected, strings which are more unlike Dutch resulted in smaller error. Performance on random strings from R3+N is almost perfect. In the opposite case, the strings close to real words (from R1N) resulted in larger error.

The generalization capabilities of the network were tested on the L2M positive data, unseen during training. The error on this test set was about 6%. An explanation of the increase of the error will be presented later, when the error is studied as a function of word properties.

Another interesting issue is how SRN performance compares to other known models, e.g. n-grams. The trained SRN definitely outperformed bigrams and trigrams, which was shown by testing the trained SRNs on the non-words from the R2N and R3N sets, yielding 19% and 35% error, respectively. This means that the SRN correctly rejected four out of five non-word strings composed of correct bigrams and two out of three non-word strings made of trigrams. To clarify, note that bigram models would have 100% error on R2N, and trigram models 100% error on R3N.

4. Network Analysis

The distributed representations in Neural Networks prevent the analysis of generalizations in trained models by simple observation, which symbolic learning methods allow. Smaller NNs may be analyzed to some extent by examination, but for larger networks this is practically impossible.

It is possible, however, to analyze trained networks to extract abstract knowledge about their behavior. Elman (1988), for example, trained an SRN to learn sentences and then analyzed the hidden layer activations of that SRN in various contexts, from which he showed that the network had internally developed syntactic categories. Similarly, we trained SRNs on phonotactics (Stoianov et al., 1998), and then analyzed the network statically, by viewing the weight vectors of each neuron as pattern classifiers. We showed that the SRN had induced generalizations about phonetic categories. We follow that earlier work in order to study network behavior, and we present the results of this study in the first subsection.

Another approach to the analysis of connectionist models assumes that they are black boxes and examines the variation of network performance while varying some properties of the data (Plaut et al., 1996; Stoianov, Stowe & Nerbonne, 1999). For example, one can vary word frequency, length, etc., and study the network error. When modeling human cognitive functions with this approach, one can compare the behavior of the cognitive system and its artificial models. For example, in phonotactic modeling, one can compare results from psycholinguistic studies of a lexical decision task with the network's reactions. This will be the subject of study in the rest of the section.

4.1. Weight Analysis

The neurons of a neural network act as pattern classifiers. The inputs selectively activate one or another neuron, depending on the weight vectors. This means that information about network structure may be extracted from the weight vectors.

In this section we will present a cluster analysis of the neurons in the output layer. For that purpose, the mean weight vectors of the output layer of one of the networks - SRN2 with 40 hidden neurons (from Table 1) - were clustered using a minimum variance (Ward's) method, and each vector in the resulting dendrogram was labeled with the phoneme it corresponds to.3 The resulting diagram is shown in Figure 3.
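The clustering step can be reproduced with any statistics package; the sketch below implements Ward's minimum-variance linkage directly via the Lance-Williams update, returning the dendrogram as nested tuples of phoneme labels. This is our own toy stand-in for the software actually used (see note 3).

```python
def ward_cluster(vectors, labels):
    """Agglomerative clustering with Ward's minimum-variance linkage,
    via the Lance-Williams update on squared Euclidean distances.
    Returns the dendrogram as nested (left, right) tuples of labels."""
    clusters = {i: (labels[i], 1) for i in range(len(vectors))}
    dist = {(i, j): sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j]))
            for i in range(len(vectors)) for j in range(i + 1, len(vectors))}
    active = set(clusters)
    nxt = len(vectors)
    while len(active) > 1:
        # merge the closest pair of active clusters
        i, j = min((p for p in dist if p[0] in active and p[1] in active),
                   key=lambda p: dist[p])
        (li, ni), (lj, nj) = clusters[i], clusters[j]
        clusters[nxt] = ((li, lj), ni + nj)
        for k in active - {i, j}:
            nk = clusters[k][1]
            dik = dist[min(i, k), max(i, k)]
            djk = dist[min(j, k), max(j, k)]
            # Lance-Williams update for Ward linkage
            dist[k, nxt] = ((ni + nk) * dik + (nj + nk) * djk
                            - nk * dist[i, j]) / (ni + nj + nk)
        active -= {i, j}
        active.add(nxt)
        nxt += 1
    return clusters[active.pop()][0]
```

Labeling each output-layer weight vector with its phoneme and clustering, as here, is what produces dendrograms like the one in Figure 3.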


Figure 3. Cluster analysis of the weight vectors of the output neurons, labeled with the phonemes they correspond to. The weight vectors are split into clusters which roughly correspond to existing phonetic categories.


We can see that the weight vectors (and, correspondingly, the phonemes) cluster into some well-known major natural classes - vowels (at the bottom) and consonants (the upper part). The vowels are split into two major categories: low vowels and semi-low, front vowels (/��, �, a, e/), and high, back ones. The latter, in turn, are clustered into round+ and round- classes. Consonants appear to be categorized in a way less congruent with phonetics. But here, too, some established groups are distinguished. The first subgroup contains non-coronal consonants (/f, k, m, p, x/), with the exceptions of /l/ and /n/. Another subgroup contains voiced obstruents (/�, d, �, ���/). The delimiter '#' is also clustered as a consonant, in a group with /t/, which is also natural. The upper part of the figure seems to contain phonemes from different groups, but we can recognize that most of these phonemes are quite rare in Dutch monosyllables, e.g. /�/, perhaps because they have been 'loaned' from other languages, e.g. /g/.

4.2. Functional analysis

We may also study NNs by examining their performance as a function of factors such as word frequency, similarity neighborhood, and word length. Such an analysis relates computational language modeling to psycholinguistics, and we submit that it is useful to compare the models' performance with humans'. In this section we introduce several factors which have played a role in psycholinguistic theorizing. We then examine the performance of our model as a function of these factors.

4.2.1. Psycholinguistic Factors

Frequency is one of the most thoroughly investigated characteristics of words that affect performance. Numerous previous studies have demonstrated that the ease and the time with which spoken words are recognized are monotonically related to the experienced frequency of words in the language environment (Luce, Pisoni & Goldinger, 1990; Plaut et al., 1996). The general tendency found is that the more frequent words are, the faster and the more precisely they are recognized.

Our perception of a word is likewise known to depend on its similarity to other words. The similarity neighborhood of a word is defined as the collection of words that are phonetically similar to it. Some neighborhoods are dense, with many phonetically similar words, while others are sparse, with few.

The so-called Coltheart-N measure of a word w counts the number of words that might be produced by replacing a single letter of w with some other. We modify this concept slightly to make it sensitive to similarity of sub-syllabic elements, so that we regard words as similar when they share two of the three sub-syllabic elements - onset, nucleus and coda. Empty onsets or codas are counted as the same. The word neighborhood is computed by counting the number of similar words. If implemented precisely, the complexity of the measuring process just explained is high, so we reduce it by probing for sub-syllables rather than for units of variable size, starting from a single phoneme. This simplifies and speeds up processing. The neighborhood size in the corpus we used ranged from 0 to 77, with mean μ = 30 and standard deviation σ = 13.

For example, the phonological neighborhood of the Dutch word broeds /bruts/ is given below. Note that the neighborhood contains only Dutch words.

/br�ts/, /brots/, /bruj/, /brujt/, /bruk/, /brur/, /brus/, /brut/, /buts/, /kuts/, /puts/, /tuts/

These represent the pronunciations of Brits 'British', broods 'bread' (gen.sg.), broei 'brew', broeit 'brew' (3rd sg.), broek 'pants', broer 'brother', broes 'spray nozzle', broed 'brood', boots 'boots' (Eng. loan), koets 'coach', poets 'clean' and toets 'test'. Among the words with a very poor neighborhood are /����/ schwung, /b�rts/ boards, /�����jnt/ joint, and /sk�rs/ squares, all of which are of foreign origin. Words such as /hɛk/ hek, /bɑs/ bas, /lɑxt/ lacht, and /bɑkt/ bakt have large neighborhoods.
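The modified neighborhood measure can be sketched as follows, representing each monosyllable as an (onset, nucleus, coda) triple with '' for an empty onset or coda. The function name and representation are our own.

```python
def neighborhood_size(word, lexicon):
    """Count lexicon entries sharing at least two of the three
    sub-syllabic elements (onset, nucleus, coda) with `word`."""
    count = 0
    for other in lexicon:
        if other == word:
            continue                  # a word is not its own neighbor
        shared = sum(a == b for a, b in zip(word, other))
        if shared >= 2:
            count += 1
    return count
```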

It is still controversial how similarity neighborhood influences cognitive processes (Balota, Paul & Spieler, 1999). Intuitively, it seems likely that words with larger neighborhoods are easier to access due to the many similar items, but from another perspective these words might be more difficult to access due to the nearby competitors and the longer selection process. However, in the more specific lexical decision task, the overall activity of many candidates has been shown to facilitate lexical decisions, so we will look for the same effect here.

The property word length might affect performance in the lexical decision task in two different ways. On the one hand, longer words provide more evidence, since more phonemes are available to decide whether the input sequence is a word, so that we expect higher precision for longer words and lower precision for particularly short words. On the other hand, network error accumulated in iteration increases the error in phoneme predictions at later positions, which in turn will increase the overall error for longer words. For these reasons we expect U-shaped patterns of error as word length increases. Such a pattern was observed in a study on modeling grapheme-to-phoneme conversion with SRNs (Stoianov et al., 1999). Static NNs are less likely (than dynamic models such as SRNs) to produce such patterns.

So far we have presented three main characteristics of the individual words which we expect to affect the performance of the model. However, a statistical correlation analysis (bivariate Spearman test) showed that they are not independent, which means that an analysis of the influence of any single factor should control for the rest. In particular, there is a high negative correlation between word neighborhood and word length (r = -0.476), a smaller positive correlation between neighborhood and frequency (r = 0.223), and a very small negative correlation between frequency and word length (r = -0.107). Because of the large amount of data, all these coefficients are significant at the 0.001 level.

Finally, it will be useful to seek a correlate in the simulation for reaction time, which psycholinguists are particularly fond of using as a probe to understanding linguistic structure. Perhaps we can find an SRN correlate of Reaction Time (RT) for the lexical decision task in network confidence, i.e., the amount of evidence that the test string is a word from the training language. The less confident the network, the slower the reaction, which can be implemented with lateral inhibition (Haykin, 1994; Plaut et al., 1996). The network confidence for a given word might be expressed as the product of the activations of the neurons corresponding to the phonemes of that word. A similar measure, which we call uncertainty U, is the negative sum of the (output) neuron activation logarithms, normalized with respect to word length |w| (2). Note that U varies inversely with confidence: less certain sequences get higher (positive) scores.


Equation 2.

    U(w) = - (1/|w|) Σ_{i=1..|w|} log a_i

where a_i is the activation of the output neuron corresponding to the i-th phoneme of w.
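In code, with a_i the output activation recorded for the i-th phoneme of the word (a minimal sketch; the function name is ours):

```python
import math

def uncertainty(activations):
    """U(w): negative mean log of the output activations recorded for
    the word's phonemes; higher U means a less confident network."""
    return -sum(math.log(a) for a in activations) / len(activations)
```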

To analyze the influence of these parameters, the network scores and U-values were recorded for each monosyllabic word at the optimal threshold θ = 0.016. The data were then submitted to the statistical package SPSS for analysis of variance using SPSS's General Linear Model (GLM). When analyzing the network score, the analysis revealed main effects of all three parameters discussed above: word neighborhood size (F = 18.4; p < 0.0001), word frequency (F = 19.2; p < 0.0001), and word length (F = 11.5; p < 0.0001). There were also interactions between neighborhood size and the other parameters: the interaction with word frequency had an F-score of 6.6 and the interaction of neighborhood with word length had an F-score of 4.9, both significant at the 0.0001 level. Table 3 summarizes the findings. Error decreases both as neighborhood size and as frequency increase, and error as a function of length shows the predicted U-shaped form (Table 3c).

Table 3. Effect of (a) frequency, (b) neighborhood density and (c) length on word uncertainty U and word error.

a. Frequency     Low   Mid   High
   U             2.30  2.20  2.18
   Error (%)     8.6   4.1   1.5

b. Neighb. size  Low   Mid   High
   U             2.62  2.30  2.21
   Error (%)     12.7  3.9   0.8

c. Length        Low   Mid   High
   U             2.63  2.20  2.13
   Error (%)     5.2   4.4   13.1


Analysis of variance on the U-values revealed similar dependencies. There were main effects of word neighborhood size (F = 58.2; p < 0.0001), word frequency (F = 45.9; p < 0.0001), and word length (F = 137.5; p < 0.0001), as well as the earlier observed interactions between neighborhood density and the other two variables: word length (F = 10.4; p < 0.001) and frequency (F = 5.235; p < 0.005).

The frequency pattern of error and uncertainty variance was expected, given the increased evidence available to the network for more frequent words. The displayed length effect showed that the influence of error gained in recursion is weaker than the effect of stronger evidence for longer words. Also, the pattern of performance when varying neighborhood density confirmed the hypothesis of the lexical decision literature that larger neighborhoods make it easier for words to be recognized as such.

4.3. Syllabic structure

Phonotactic constraints might hint at how the stream of phonemes is organized in the language processing system. The popular phoneme, syllable and word entities may not be the only units that we use for lexical access and production. There are suggestions that, in addition, some sub-syllabic elements are involved in those processes; that is, the syllables might not have a linear structure, but more complex representations (Kessler & Treiman, 1997). For that purpose, we will analyze how the phoneme prediction error at a threshold of 0.016 - where the network achieved its best word recognition - is located within words with respect to the following sub-syllabic elements - onset, nucleus and coda. The particular hypothesis that will be tested is whether Dutch monosyllables follow the structure below, which was found in English as well (Kessler & Treiman, 1997).

( Onset - Rhyme (Nucleus - Coda) )

The distribution of phoneme error within words (Table 4a) shows that the network makes more mistakes at the beginning than at the end of words, where the SRN becomes more confident in its decision. This could be explained by increasing contextual information that more severely restricts the possible phonemic combinations. A more precise analysis of the error position in the onset, the nucleus and the coda further reveals other interesting phenomena (Table 4b).


Table 4. Distribution of phoneme prediction error at a threshold of 0.016 by (a) phoneme position within words and (b) phoneme position within sub-syllables. Word and onset positions start from 2, because the prediction starts after the first phoneme.

a. Word Position  2    3    4    5    6    7    8
   Error (%)      4.3  1.7  1.4  0.6  0.3  0.3  0.00

b. Sub-syllable       Onset      Nucleus   Coda
   Relative Position  2    3     1         1    2    3    4
   Error (%)          2.6  0.0   4.5       1.0  1.5  2.0  2.6

First, error within the coda increases at the coda's end. We attribute this to error accumulated toward the end of words, as was predicted earlier. The mean entropy in the coda (1.32; σ = 0.87) is smaller than the mean entropy in the onset (1.53; σ = 0.78), where we do not observe such effects, so looser constraints are not the reason for the relatively greater error in the coda. Next, the error at the onset-nucleus transition is much higher than the error at the surrounding positions, which means that the break between onset and rhyme (the conjunction nucleus-coda) is significant. This distribution is also consistent with the statistical finding that the entropy is larger in the body (the onset-nucleus transition point) (3.45; σ = 0.39) than in the rhyme (1.94; σ = 1.21). All these data support the hypothesis that onset and rhyme play significant roles in lexical access and that the syllabic structure confirmed for English by Kessler & Treiman (1997) is valid for Dutch, too.

5. Conclusions

Phonotactic constraints restrict the way phonemes combine to form words. These constraints are empirical and can be abstracted from the lexicon - either by extracting rules directly, or via models of that lexicon. Existing language models are usually based on abstract symbolic methods, which provide good tools for studying such knowledge. But linguistic research from a connectionist perspective can provide a fresh perspective on language, because the brain and artificial neural networks share principles of computation and data representation.


Connectionist language modeling, however, is a challenging task. Neural networks use distributed processing and continuous computations, while languages have a discrete, symbolic nature. This means that some special tools are necessary if one is to model linguistic problems with connectionist models. The research reported in this paper attempted to provide answers to two basic questions: first, whether phonotactic learning is possible at all in connectionist systems, which had been doubted earlier (Tjong Kim Sang, 1995; Tjong Kim Sang, 1998). In the case of a positive answer, the second question is how NN performance compares to human ability. In order to draw this comparison, we needed to extract the phonotactic knowledge from a network which has learned the sequential structure. We proposed several ways of doing this.

Section 3 studied the first question. Even if there are theoretical results demonstrating that NNs have the needed finite-state capacity for phonotactic processing, there are practical limitations, so that we needed experimental support to demonstrate the practical capability of SRNs to learn phonotactics. A key to solving the problems of earlier investigators was to focus on finding a threshold that optimally discriminates the continuous neuron activations with respect to phoneme acceptance and rejection simultaneously. The threshold range at which the network achieves good discrimination is very small (see Figure 2), which illustrates how critical the exact setting of the threshold is. We also suggested that this threshold might be computed interactively, after processing each symbol, which is cognitively plausible, but we postpone a demonstration of this to another paper.

The network's performance on word recognition - a word acceptance rate of 95% and a random string rejection rate of 95% at a threshold of 0.016 - competes with the scores of symbolic techniques such as Inductive Logic Programming and Hidden Markov Models (Tjong Kim Sang, 1998), both of which reflect low-level human processing architecture with less fidelity.

Section 4 addressed the second question of how other linguistic knowledge encoded in the networks can be extracted. Two approaches were used. Section 4.1 clustered the weights of the network, revealing that the network has independently become sensitive to established phonetic categories.

We went on to analyze how various factors which have been shown to play a role in human performance find their counterparts in the network's performance. Psycholinguistics has shown, for example, that the ease and the time with which spoken words are recognized are monotonically related to the frequency of words in language experience (Luce et al., 1990). The model likewise reflected the importance of neighborhood density in facilitating word recognition, which we speculated stems from the supportive evidence which more similar patterns lend to the words in their neighborhood. Whenever network and human subjects exhibit a similar sensitivity to well-established parameters, we see a confirmation of the plausibility of the architecture chosen.

Finally, the distribution of the errors within the words showed another linguistically interesting result. In particular, the network tended to err more often at the onset-nucleus transition - which is also typical of transitions between adjacent words in the speech stream and is used for speech segmentation. Analogously, we can conclude from this that the nucleus-coda unit - the rhyme - is a significant linguistic unit for the Dutch language, a result suggested earlier for English (Kessler & Treiman, 1997).

We wind up this conclusion with one disclaimer and a repetition of the central claim. We have not claimed that SRNs are the only (connectionist) model capable of dynamic processing, nor that they are biologically the most plausible neural network. Our central claim is to have demonstrated that relatively simple connectionist mechanisms have the capacity to model and learn phonotactic structure.

Notes

1 The authors are particularly pleased to offer this piece to a Festschrift honoring Dr. Dr. h.c. Tjeerd de Graaf, who graciously agreed to cooperate in the supervision of Stoianov's Ph.D. project 1997-2001 at the University of Groningen. Even if Tjeerd is best known for his more recent work on descriptive linguistics, minority languages and language documentation, his early training in physics and earlier research on acoustic phonetics made him one of the best-suited supervisors for projects such as the one reported on here involving advanced learning algorithms. Tjeerd's sympathy with Eastern European languages and cultures is visceral and might have led him to agree in any case, but we particularly appreciated his phonetic acumen.

2 The distance is related to the Euclidean distance; more exactly, the distance between two n-dimensional vectors a and b is taken here to be sqrt((1/n) Σ_{i=1..n} (a_i - b_i)²), i.e. the Euclidean distance normalized by the dimensionality.


3 The cluster analysis in Figure 3 was produced by programs written by Peter Kleiweg, available at http://www.let.rug.nl/alfa.

References

Aho, Alfred, John Hopcroft & Jeffrey Ullman (1983). Data Structures and Algorithms. Addison Wesley.

Balota, David, Stephen Paul & Daniel Spieler (1999). Attentional control of lexical processing pathways during word recognition and reading. In: S. Garrod & M. Pickering (eds). Studies in cognition: Language processing. UCL Press, London, England, 15-57.

Cairns, Paul, R. Shillcock, Nick Chater & Joe Levy (1997). Bootstrapping word boundaries: A bottom-up corpus-based approach to speech segmentation. Cognitive Psychology, 33(2): 111-153.

Carrasco, Rafael, Mikel Forcada & Ramon Neco (1999). Stable encoding of finite-state machines in discrete-time recurrent neural networks with sigmoid units. Neural Computation, 12(9): 2129-2174.

Carstairs-McCarthy, Andrew (1999). The Origins of Complex Language. Oxford University Press.

CELEX (1993). The CELEX Lexical Data Base (cd-rom), Linguistic Data Consortium. http://www.kun.nl/celex.

Christiansen, Morton H. & Nick Chater (1999). Toward a connectionist model of recursion in human linguistic performance. Cognitive Science, 23: 157-205.

Cleeremans, A., D. Servan-Schreiber & J.L. McClelland (1989). Finite state automata and simple recurrent networks. Neural Computation, 1(3): 372-381.

Cohen, A., C. Ebeling & A.G.F. van Holk (1972). Fonologie van het Nederlands en het Fries. Martinus Nijhoff, The Hague.

Dell, Gary, Cornell Juliano & Anita Govindjee (1993). Structure and content in language production: A theory of frame constraints in phonological speech errors. Cognitive Science, 17: 145-195.

Dupoux, Emmanuel, Christophe Pallier, Kazuhiko Kakehi & Jacques Mehler (2001). New evidence for prelexical phonological processing in word recognition. Language and Cognitive Processes, 5(16): 491-505.

Elman, Jeffrey L. (1988). Finding structure in time. Technical Report 9901, Center for Research in Language, UCSD, CA.

Elman, Jeffrey L. (1991). Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7(2/3): 195-226.

Gasser, Michael (1992). Learning distributed representations for syllables. In: Proc. of 14th Annual Conference of the Cognitive Science Society, 396-401.

Haykin, Simon (1994). Neural Networks. Macmillan Publ, NJ.

Kaplan, Ronald & Martin Kay (1994). Regular models of phonological rule systems. Computational Linguistics, 20/3: 331-378.

Kessler, Brett & Rebecca Treiman (1997). Syllable structure and the distribution of phonemes in English syllables. Journal of Memory and Language, 37: 295-311.

Konstantopoulos, Stasinos (2003). Using Inductive Logic Programming to Learn Local Linguistic Structures. PhD thesis, Rijksuniversiteit Groningen.

Kuan, Chung-Ming, Kurt Hornik & Halbert White (1994). A convergence result for learning in recurrent neural networks. Neural Computation, 6: 420-440.

Laver, John (1994). Principles of Phonetics. Cambridge University Press, Cambridge.

Lawrence, Steve, C. Lee Giles & S. Fong (1995). On the applicability of neural networks and machine learning methodologies to natural language processing. Technical report, Univ. of Maryland.

Luce, Paul L., David B. Pisoni & Steven D. Goldinger (1990). Similarity neighborhoods of spoken words. In: G. T. M. Altmann (ed.). Cognitive Models of Speech Processing. A Bradford Book, Cambridge, Massachusetts, USA, 122-147.

McQueen, James (1998). Segmentation of continuous speech using phonotactics. Journal of Memory and Language, 39: 21-46.

Mitchell, Thomas (1997). Machine Learning. McGraw Hill College.

Nerbonne, John, Wilbert Heeringa & Peter Kleiweg (1999). Edit distance and dialect proximity. In: D. Sankoff & J. Kruskal (eds). Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison, 2nd ed. CSLI, Stanford, CA, v-xv.

Norris, D., J.M. McQueen, A. Cutler & S. Butterfield (1997). The possible-word constraint in the segmentation of continuous speech. Cognitive Psychology, 34: 191-243.

Omlin, Christian W. & C. Lee Giles (1996). Constructing deterministic finite-state automata in recurrent neural networks. Journal of the ACM, 43(6): 937-972.

Pacton, S., P. Perruchet, M. Fayol & A. Cleeremans (2001). Implicit learning in real world context: The case of orthographic regularities. Journal of Experimental Psychology: General, 130(3): 401-426.

Plaut, D.C., J. McClelland, M. Seidenberg & K. Patterson (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103: 56-115.

Reed, Russell D. & Robert J. Marks II (1999). Neural Smithing. MIT Press, Cambridge, MA.

Reilly, Ronan (2002). The relationship between object manipulation and language development in Broca's area: A connectionist simulation of Greenfield's hypothesis. Behavioral and Brain Sciences, 25: 145-153.

Robinson, A. J. & F. Fallside (1988). Static and dynamic error propagation networks with application to speech coding. In: D. Z. Anderson (ed.). Neural Information Processing Systems. American Institute of Physics, NY.

Rodd, Jennifer (1997). Recurrent neural-network learning of phonologicalregularities in Turkish. In: Proc. of Int. Conf. on ComputationalNatural Language Learning. Madrid, 97-106.

Rumelhart, David E. & James A. McClelland (1986). Parallel DistributedProcessing: Explorations of the Microstructure of Cognition. TheMIT Press, Cambridge, MA.

Rumelhart, D.E., G.E. Hinton & R.J. Williams (1986). Learning internalrepresentations by error propagation. In: D. E. Rumelhart & J. A.McClelland (eds.). Parallel Distributed Processing: Explorationsof the Microstructure of Cognition, Volume 1, Foundations . TheMIT Press, Cambridge, MA, 318-363.

Shillcock, Richard, Paul Cairns, Nick Chater & Joe Levy (1997). Statistical andconnectionist modelling of the development of speechsegmentation. In: Broeder & Murre (eds.). Models of LanguageLearning. MIT Press.

Shillcock, Richard, Joe Levy, Geoff Lindsey, Paul Cairns & Nick Chater(1993). Connectionist modelling of phonological space In: T. M.Ellison & J. Scobbie (eds.). Computational Phonology.Edinburgh Working Papers in Cognitive Science, Edinburgh, 8:179-195

Stoianov, Ivilin Peev (1998). Tree-based analysis of simple recurrent networklearning. In: 36 Annual Meeting of the Association forComputational Linguistics and 17 Int. Conference onCompuational Linguistics. Vol. 2, Montreal, Canada, 1502-1504.

Learning Phonotactics with Simple Processors 121

Stoianov, Ivilin Peev (2001). Connectionist Lexical Modelling. PhD thesis,Rijksuniversiteit Groningen.

Stoianov, Ivilin Peev & John Nerbonne (2000). Exploring phonotactics withsimple recurrent networks. In: F. van Eynde, I. Schuurman & N.Schelkens (eds.). Computational Linguistics in the Netherlands,1998. Rodopi, Amsterdam, NL, 51-68.

Stoianov, Ivilin Peev, John Nerbonne & Huub Bouma (1998). Modelling thephonotactic structure of natural language words with simplerecurrent networks. In: P.-A. Coppen, H. van Halteren & L.Teunissen (eds.). Computational Linguistics in the Netherlands,1997. Rodopi, Amsterdam, NL, 77-96.

Stoianov, Ivilin Peev, Laurie Stowe & John Nerbonne (1999). Connectionistlearning to read aloud and correlation to human data. In: 21stAnnual Meeting of the Cognitive Science Society, Vancouver,Canada. Lawrence Erlbaum Ass., London, 706-711.

Stowe, Laurie, Anton Wijers, A. Willemsen, Eric Reuland, A. Paans & WimVaalburg (1994). Pet studies of language: An assessment of thereliability of the technique. Journal of Psycholinguistic Research,23(6): 499-527.

Tjong Kim Sang, Erick (1995). The limitations of modeling finite stategrammars with simple recurrent networks. In: Proceedings of the5th Computational Linguistics in The Netherlands, 133-143.

Tjong Kim Sang, Erick (1998). Machine Learning of Phonotactics. PhD thesis,Rijksuniversiteit Groningen.

Tjong Kim Sang, Erik & John Nerbonne (1999). Learning simple phonotactics.In: Proceedings of the Workshop on Neural, Symbolic, andReinforcement Methods for Sequence Processing, MachineLearning Workshop at IJCAI '99, 41-46.

Treiman, R. & A. Zukowski (1990). Toward an understanding of Englishsyllabification. Journal of Memory and Language, 34: 66-85.

Tsoi, Ah Chung & Andrew Back (1997). Discrete time recurrent neuralnetwork architectures: A unifying review. Neurocomputing, 15:183-223.

Weak Interactions

Yiddish influence in Hungarian, Esperanto and Modern Hebrew

Tamás Bíró

When I arrived in Groningen, I was introduced to Tjeerd de Graaf as somebody speaking Hungarian. Then it turned out that both of us were interested in Yiddish. Furthermore, we shared the fact that we had started our scientific lives within physics, although, unlike Tjeerd, I have not worked as a physicist since my graduation. Nevertheless, as a second-year physics student I received a research question from the late leading Hungarian physicist George Marx that was also somehow related to Tjeerd's earlier research topic, neutrino astrophysics.

Neutrinos are funny particles. They are extremely light, if they have any mass at all.1 Therefore, they hardly interact through gravitation. Because they do not have any electrical charge either, electromagnetic interaction is also unknown to them. The only way they can interact with the universe is the so-called weak interaction, one of the four fundamental forces.2

Nowadays physicists spend inconceivable sums building gigantic underground basins containing millions of liters of heavy water, just to try to detect a few neutrinos per year out of the very intense stream of neutrinos flowing constantly from the Sun and going through the Earth, that is, through us. Even though they almost never interact with regular matter, through the weak interaction neutrinos play a fundamental role both in shaping what the universe looks like and in the Sun's energy production. Therefore our life would not be possible without neutrinos and without the weak interaction.

Something similar happens in ethnolinguistics. The interaction between two languages may not always be very salient, and it cannot necessarily be explained by the most famous types of interactions. A weak interaction in linguistics might be an interaction which is not acknowledged by the speakers' community, for instance for ideological reasons.

In the present paper I shall present three cases of weak interaction between languages, understood in this sense, namely Yiddish affecting Hungarian, Modern Hebrew (Israeli Hebrew) and Esperanto. All the stories take place in the late nineteenth or early twentieth century, when a new or modernized language had to be created. We shall observe what kinds of interactions took place under which conditions. A model for interactions, combined with a better understanding of the social-historical setting, will enable us to do so.

1. Language interactions within a given socio-historical setting

1.1. Modeling interactions

In physics, the interaction between two bodies depends on three factors: the two “eligibilities” of the parties to interact, as well as their distance. For gravity and electromagnetism, the formula probably familiar from high-school physics states that the force is proportional to the product of the “eligibilities” - mass or electric charge - of the two bodies, divided by the square of their distance. In other words, the larger the two masses (or electric charges) and the smaller the distance, the stronger the interaction.
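For concreteness, the two high-school formulas alluded to here (Newton's law of gravitation and Coulomb's law) share the same inverse-square form:

```latex
F_{\text{grav}} = G\,\frac{m_1 m_2}{r^2},
\qquad
F_{\text{el}} = k\,\frac{q_1 q_2}{r^2}
```

where the masses $m_1, m_2$ and the charges $q_1, q_2$ play the role of the “eligibilities”, and $r$ is the distance between the two bodies.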

For Newton, who first formulated this law, gravity was a long-range interaction. Modern physics has completed this picture by introducing exchange particles that mediate between the interacting bodies.3 In that way, contemporary science has also incorporated the view of Newton's opponents, who argued that only short-range interactions are possible.

To transplant this image, however vaguely, onto the phenomenon of language interaction, we have to identify the eligibilities of the two interacting languages, their distance and the exchange particles. In fact, we can do that on two levels. On a purely linguistic level, one can easily point to words and grammatical phenomena - “exchange particles” - wandering from language to language. But it would be harder to identify in general the properties of the phenomena and of the given languages that make the interaction more or less probable.

The sociolinguistic level is more promising for such an approach. In this case, human beings are the exchange particles: people who leave one linguistic community in order to join a new one. By the very fact of their move, they affect their new language by a linguistic quantum. The closer the two language communities, the more people will act as exchange particles. Here distance should be understood not only in terms of geography, but also in terms of the intensity of the social network. Thus, the more people wander to the target community, the more linguistic impulse is brought to the second language, and therefore the stronger the interaction. Note that the physical analogy is not complete, since the symmetry of action and reaction is not guaranteed for interacting languages.
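As a rough numerical illustration of the analogy sketched above (not part of the original argument - the function name and all values here are hypothetical):

```python
# Toy sketch of the interaction model described in the text: the strength
# of the influence grows with the two languages' "eligibilities" (to
# transmit and to adopt features) and shrinks with their social distance.

def interaction_strength(eligibility_source: float,
                         eligibility_target: float,
                         social_distance: float) -> float:
    """Inverse-square-style analogy: product of eligibilities over distance squared."""
    return eligibility_source * eligibility_target / social_distance ** 2

# Hypothetical settings: a handful of language changers at the same social
# distance versus a mass movement with a highly "eligible" target language.
few_particles = interaction_strength(0.1, 0.2, 1.0)
many_particles = interaction_strength(0.8, 0.9, 1.0)
assert many_particles > few_particles
```

The asymmetry noted in the text (no guaranteed action-reaction symmetry) is reflected in the fact that swapping source and target eligibilities need not describe the reverse influence.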

The three cases to be discussed share the feature that the role of the carriers of the interaction is played by late nineteenth-century Eastern European Jews. In order to understand the historical background, we have to recall what is called the Haskala, or Jewish Enlightenment.

1.2. The Haskala

By the late eighteenth century, the French and German Aufklärung had raised the question of whether to emancipate and integrate - or assimilate - the Jewish population on the one hand, and had fostered an increasing wish among Jews to join European culture on the other. Although in the second half of the siècle des lumières there were only a few Jewish intellectuals who articulated these ideas, most of them belonging to the circle of the philosopher Moses Mendelssohn (1729-1786) in Berlin, the next decades witnessed the acculturation of a growing segment of the Jewish population in the German territories, as well as within the Austrian Empire. The eighteenth-century Berlin Haskala is called the first stage of the Jewish Enlightenment, whereas the early nineteenth-century social and cultural developments represent its second stage.

What the first two stages of the Haskala yielded was the inclusion of a Jewish color on the contemporary Western European cultural palette. “Jewish” was understood exclusively as one possible faith within the list of European religions, and nothing more than a religious conviction. An enlightened Jew was supposed to fully master the educated standard variant of the language of the society he lived in (Hochdeutsch, in most cases), without any “Jewish-like” feature. Propagating the knowledge of Hochdeutsch and rolling back Jüdischdeutsch had already been the program of Moses Mendelssohn when he began writing a modern targum4 of the Bible, the Biur. Further, the same Jew was expected to fully master contemporary European culture, including classical languages, sciences and arts. The only sphere in which this Jew could express his or her Jewishness was the diminished and Europeanized arena of religious life. Diminished, because of a secularization of life style; and Europeanized, due to the inclusion of the philosophical ideals of the Enlightenment together with the aesthetic models of Romanticism. The traditional religious duty of constantly learning the traditional texts with the traditional methods was sublimated into the scholarly movement of the Wissenschaft des Judentums.

The picture changed dramatically in the middle of the nineteenth century, when the Haskala, in its third stage, reached Eastern European Jewry, including Jews in Poland and Lithuania (under Russian government), Eastern Hungary, and Rumania. Here the Jewish population was far denser, whereas the surrounding society was far behind Western Europe in the process of social and economic development. In fact, Jews would play an important role in the modernization of those areas. Therefore, several people of Jewish origin could take the initiative and invent absolutely new alternatives to the social constructs that people had been living with so far.

One type of those social alternatives still preserved the idea of the earlier Haskala according to which Jews should become and remain an organic part of universal human culture. These alternatives thus proposed some form of revolutionary change for the whole of humankind, as was the case in the different types of socialist movements, in which Jews unquestionably played an important role. Esperantism also belongs here, for its father, Ludwig Zamenhof, was a Polish-Lithuanian Jew proposing an alternative to national language as yet another social construct.

The second type of radical answer that Eastern European Jews gave to the emergence of Enlightenment in the underdeveloped Eastern European milieu was the creation of a new kind of Jewish society. Recall that there was a dense Jewish population living within a society that itself did not represent a modern model to which most Jews wished to acculturate. Different streams of this type of answer emerged, although they did not mutually exclude each other. The many varieties of political activism, such as early forms of Zionism, political Zionism, territorialism or cultural autonomism, embody one level of creating an autonomous Jewish society.

The birth of a new Jewish secular culture, including literature, newspapers and klezmer music, is another one. The question then arose whether the language of this new secular culture should be Yiddish - in which case a standardized, literary version of Yiddish was to be developed - or Hebrew - in which case a renewal of the Hebrew language was required. In the beginning, this point was not such an enormous matter of dispute as it would later develop into, when “Hebraists”, principally connected with Zionism, confronted “Yiddishists”, who generally claimed cultural and/or political autonomy within Eastern Europe. It is the irony of history that the far more naïve and seemingly unrealistic ideology, calling for the revival of an almost unspoken language in distant Palestine, was the one that would later become reality.

1.3. Language interactions in the Haskala

Let us now return to our model of language interactions. As we have seen, the intensity of the interaction depends on the number of “exchange particles” - language-changing individuals - that is, on a kind of “distance” measured in the social network; and furthermore on the “eligibility” of the languages to transmit and to adopt features. We shall now confront this model with the linguistic reality of the different stages of the Haskala.

Concerning the first stage, when only a handful of followers of Moses Mendelssohn rejected Jüdischdeutsch and started speaking Hochdeutsch, our model correctly predicts that the number of exchange particles was insufficient to affect German in any perceptible way.

The number of exchange particles increases dramatically when we reach the first half of the nineteenth century. However, the people changing language had more or less consciously adopted the idea that their original idiom was an unclean and corrupt version of the target language. Consequently, their language change by its very nature brought no influence to bear on the target language. Applying our vague physical model to this situation, we might say that although the two languages were indeed close - from the viewpoints of geography, linguistic similarity and social contacts - Hochdeutsch was not “eligible” enough to be seriously affected.

What happened in the third stage of the Haskala? The following three case studies represent three possibilities. The first one, the influence of Yiddish on Hungarian, was actually a case where some elements of the stage 2 Haskala were still present. The emancipation of the Jews was closely related to their assimilation into Hungarian society, culture and language. As Jews wished to become an equal part of that society, let us call this case type e. Each of the many people brings only a very “light” quantum of influence, similar to the very little mass, if any, of the electron neutrino. The type µ designates a case when Jews migrated to a newly created Jewish “land, language and culture”, namely to Modern Hebrew. Here fewer people each carry possibly more “weight”, which is why they can be paralleled with the heavier muon neutrinos. In the third case, the birth of Esperanto, only one person of Jewish cultural background wished to transform the entire world, with a total rejection of any reference to any form of Jewishness, at least on a conscious level (type τ, referring to the probably heaviest type of neutrino).

2. Three examples of weak interaction

2.1. Type e: Yiddish and Hungarian

Nineteenth-century Hungary was situated on the border between Western European Jewry, already affected by the first two stages of the Haskala, and Eastern European Jewry, which would be reached only by its third phase. From the second half of the previous century onward, Jewish immigration from Bohemia and Moravia had been importing a rather urbanized population speaking Western Yiddish, or even Jüdischdeutsch, whereas the Eastern Yiddish speaking Galician Jews inhabiting Eastern Hungary represented the westernmost branch of Eastern European Jewry. Not only were the linguistic features of the two groups strikingly different, but so were their social, economic and cultural backgrounds.

In the social and economic fields, Hungary met a first wave of modernization in the 1830s and 1840s, which is referred to as the reform age, reaching its peak in the 1848-49 revolution. After the so-called Compromise with Austria in 1867, the consequence of which had been the creation of the Austro-Hungarian Empire with its dualistic system, the most urbanized parts of the country showed an especially remarkable economic and cultural growth.

Parallel to the phenomenon of general modernization, the Jewish population underwent a process similar to the one we have already seen apropos of the French and German Jewry that had gone through these social changes fifty years earlier. The second quarter of the century already witnessed a few Jewish thinkers, mainly rabbis arriving from Germany or Bohemia, bringing modern ideals with them. Yet their effect could not be perceived on a larger social scale before the last third of the century.

A few differences should, however, be noted between the German and the Hungarian Haskala. First, for the larger society into which Hungarian Jews wished to integrate, Enlightenment was not so much the consequence of embourgeoisement as its catalyst. Enormous heterogeneities in the degree of development could be found within the country, in social as well as economic terms. This general picture was paralleled by a heterogeneous distribution of the Eastern and Western types of Jewry. Thus, even if the most Europeanized Jews had wished to, they could not disown their pre-Haskala coreligionists living close to them.

Moreover, modern Hungarian society and culture had to be created in spite of the Austrian occupation. Social constructs underwent huge changes, and any group of people identifying themselves as Hungarian - and not Austrian - could influence the new shapes of society and culture. Immigrants from all directions played a fundamental role in laying down the bases of modern Hungarian urban culture. These are the circumstances under which most of the Jews chose Hungarian, rather than German or Yiddish, culture and language. This decision was far from evident. Even most of the orthodoxy adopted Hungarian, though more slowly and while simultaneously keeping Yiddish.

By putting the pieces together, we obtain an image in which the dynamically changing Hungarian culture and society is searching for new, modern forms, and is ready to integrate foreign influences - as long as the carriers identify themselves as new Hungarians. Further, a major part of the Jewish population is seeking its place in this new society and wants to adopt the new culture, but is still strongly connected - often against its will - to the pre-Haskala Jewry living not so far away. Consequently, we have both a high “eligibility” for being influenced on the part of the Hungarian language, and a large number of “exchange particles” flowing from Yiddish to Hungarian.5

What is the outcome of such a situation? Let us consider a few examples of Yiddishisms in Hungarian. I shall distinguish between three registers that Yiddishisms entered considerably: the Jewish sociolect of Hungarian, argot (slang), and standard Hungarian.

The vocabulary of Hungarian-speaking Jews unsurprisingly includes a large number of words specific to domains of Jewish culture and religion.

In some cases only phonological assimilation takes place. The Hungarian phonological system lacks a short /a/, and the short counterpart of /aː/ is /ɒ/. Therefore the Yiddish word rosheshone (‘Rosh Ha-shana, name of the Jewish New Year’, from Hebrew rosh ha-shana) optionally adapts its vowels accordingly. Although the original Yiddish pronunciation is still possible, it emphasizes the foreign origin of the word. An analogous example is the word barchesz (‘chala, a special bread used on Shabbat and holidays’), which is clearly of Yiddish origin, but is unknown outside Hungary; it may have belonged to the vocabulary of Hungarian Yiddish.

Other words immediately underwent Hungarian morphological processes. In fact, it is a well-known phenomenon in many languages of the world that borrowed verbs, unlike borrowed nouns, cannot be integrated directly into the vocabulary of a given language. This is the case in words like lejnol (‘to read the Torah-scroll in the synagogue’), lejnolás (‘the reading of the Torah-scroll’) as well as snóder (‘money given as a donation’), snóderol (‘to donate money, especially after the public Torah-reading’), snóderolás (‘the act of money donation’). In the first case, the Yiddish verb leyenen (‘idem’)6 was borrowed and one of the two most frequent denominal verbal suffixes, -l, was added.7 The word lejnolás is the nomen actionis formed with the suffix -ás. The expression tfilint légol (‘to put on the phylacteries’) originates from German and Yiddish legen, and has gone through the same processes. For snóderol, Hungarian borrowed a Yiddish noun,8 which then served as the base of further derivations.
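The two-step derivational chain described above (borrowed stem → verb in -l → nomen actionis in -ás) can be sketched as follows; the segmentation of the stems and the linking vowel are my own simplification, not the author's analysis:

```python
# Hedged sketch of the Hungarian derivational steps described in the text.
# A borrowed stem first receives the verbal suffix -l (here with a linking
# vowel o, a simplification), and the verb then takes -ás to form the
# nomen actionis (action noun).

def verbalize(stem: str) -> str:
    """Integrate a borrowed stem as a Hungarian verb with the suffix -l."""
    return stem + "ol"          # e.g. lejn- -> lejnol, snóder -> snóderol

def nomen_actionis(verb: str) -> str:
    """Form the action noun with the suffix -ás."""
    return verb + "ás"          # e.g. lejnol -> lejnolás

assert verbalize("lejn") == "lejnol"
assert nomen_actionis(verbalize("snóder")) == "snóderolás"
```

The sketch mirrors the observation that the noun snóder must first be verbalized before the action noun snóderolás can be derived.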

The Jewish sociolect of Hungarian includes further lexical items which do not belong to the domain of religious practice or Jewish culture. One such word is unberufn (‘without calling [the devil]’), which should be added out of superstition to any positive statement that the speaker hopes will remain true in the future. For instance: ‘My child grows in beauty, unberufn’ (Blau-Láng, 1995:66). Nowadays, many people of the generation born after World War II and raised in an already almost non-Yiddish-speaking milieu judge this expression as having nothing to do with superstition, but as qualifying a situation as surprisingly good, like ‘You don't say so! It's incredible!’, definitely including some irony as well.9 Others of that generation say in the same surprised-ironic context: “My grandma would have said: unberufn…”, even if Grandma had used the word in a slightly different way. This second meaning of unberufn clearly lacks any reference to superstition, since the same people would use another expression (lekopogom) to say ‘touch wood! knock on wood!’.

Unlike the previous interjections, the adjective betámt (‘nice, intelligent, smart, sweet, lovely’) already enters the “real” syntax of the target language, even though morphological and phonological changes have not taken place, as they did in the case of lejnol and snóderol. The word betámt consists of the Hebrew root taam (‘taste’), together with the Germanic verbal prefix be- and the past participle ending -t. The resulting word denotes a person who “has some taste”: somebody who has characteristic traits, who is interesting, who has style and some sense of humour, who is kind, polite, and so on. It is typically used by “Yiddishe mammes” describing the groom they wish their daughter had.

So far, we have seen examples where the language-changing population has kept its original expressions to denote things that could best be expressed using items of its old vocabulary. This Jewish sociolect has become an organic part of modern Hungarian, acknowledged, and partially known, by many non-Jewish speakers as well. But do we also find influences of Yiddish outside the Jewish sociolect?

The register most likely to be affected under such circumstances is probably always slang: it is non-conformist by definition, and, therefore, the least conservative. Slang is also the field where social norms, barriers and older prejudices play the smallest role. This may be the reason why the Hungarian slang created in the nineteenth century borrowed so much from the languages of two socially marginal groups: the Gipsy (Roma) languages and Yiddish. In contemporary Hungarian slang, one can find well-known words of Yiddish origin such as: kóser (‘kosher’, meaning ‘good’ in slang); tré (‘bad, crappy, grotty’, from Hebrew-Yiddish-Hungarian tréfli ‘ritually unclean, non-kosher food’); majré (‘fear, dread, rabbit fever’, from Hebrew mora ‘fear’, via its Ashkenazi pronunciation, > Yiddish moyre > Hungarian majré), further derived to majrézik (‘to fear, to be afraid of sg.’); szajré (‘swag, loot, hot stuff’, from Hebrew sehora ‘merchandise’; cf. … et al., 1967-76). An interesting construction is stikában, meaning ‘on the sly, in secret, quietly’. Its origin is the Aramaic-Hebrew noun shtika ‘remaining silent’, which receives the Hungarian inessive case ending, meaning ‘in’.

Through slang, some of the Yiddish words have infiltrated into the standard language and become quasi-standard. Thus, the word haver - from the Hebrew chaver ‘friend’ - is used nowadays as an informal synonym for ‘a good acquaintance, a friend’. Similarly, dafke means in spoken Hungarian ‘For all that! Only out of spite!’. Furthermore, there are words of Yiddish origin which did not enter Hungarian through slang, but through cultural interaction: macesz (‘matzo, unleavened bread’, from Hebrew matzot, the plural form of matza; its ending clearly shows that the word arrived in Hungarian through Yiddish) or sólet (‘tsholent’, a typically Hungarian Jewish bean dish, popular among non-Jews, too).10


To summarize, the large number of “exchange particles”, that is, Jewish people gradually changing their language from Yiddish to Hungarian, has affected the target language in three ways. One of them has been the creation of a special Jewish sociolect. This was not a secret language, though, and non-Jews have borrowed quite a few expressions. This fact led to the second form of influence, namely the large number of Yiddish words entering the slang. Some of these words have infiltrated even the relatively more informal registers of the standard language. The third form is cultural interaction: the exchange of cultural goods - for instance in the field of gastronomy - has inevitably resulted in the exchange of the vocabulary designating those goods.

2.2. Type µ: Yiddish and Modern Hebrew

The fruit of the Western European Haskala in the field of science was the birth of the Wissenschaft des Judentums. The Jewish scholars belonging to this movement aimed to introduce modern approaches in dealing with traditional texts, Jewish history, and so forth. Their approach contrasted with traditional rabbinical activity in the same way as the romanticist cantorial compositions by Salomon Sulzer and Louis Lewandowski contrasted with traditional synagogal music: the modernists aimed to produce cultural goods that were esteemed by modern society, by Jews and the recipient country alike. A further motivation of the Wissenschaft des Judentums was to expose the values of post-Biblical Jewish culture, and to present them as an organic part of universal culture: by emancipating the Jewish past, they hoped to be emancipated themselves by contemporary society.

This background illuminates why the early Haskala honored Hebrew so much - the language of the contribution par excellence of the Jewish nation to universal culture, the Hebrew Bible, and a language that had long been studied by Christian Hebraists. And also why Yiddish, the supposed jargon of uneducated Jews and a corrupt version of German, was so scorned at the same time.

Although the goal of the earlier phases of the Haskala was to promote the literary language of the recipient country among Jews - in practice, Hochdeutsch - and Hebrew was principally only the object of scholarly study, some attempts were still made to use the language in modern domains, at least for some restricted purposes. After a few pioneering experiments to establish Hebrew newspapers in the middle of the eighteenth century, the Hebrew literary quarterly Ha-Meassef appeared as early as 1784 (Sáenz-Badillos, 1993:267).

However, it was not until the middle of the next century, when the Haskala reached Russia, that the need to revive the Hebrew language was really articulated. As already discussed, the major reasons for this switch were that the Jewish population did not see the underdeveloped surrounding society as a model to which it wanted to assimilate; the Russian society and polity did not show any real sign of wanting to emancipate and integrate the Jews, either; furthermore, the huge Jewish population had reached the critical mass required to develop something of its own. The combination of these factors led to the idea of seeing Jewry as a separate nation in the modern sense. A further factor reinforcing Jewish national feelings both in Eastern and Western Europe was the emergence of modern political anti-Semitism in the 1870s in the West, accompanied by events such as the huge Russian pogroms of 1881, the blood libel of Tiszaeszlár, Hungary (1882-3), and the Dreyfus affair in France (starting in 1894).

The claims following from this idea were that the Jewish nation had the right to a country - in Palestine or elsewhere, but at the very least to some local autonomy - and also that the Jewish nation must have its own national language. The two major candidates for the Jewish national language were Yiddish and Hebrew, although German was not out of the competition, either (cf. e.g. Shur, 1979:VII-VIII).

The first wave of attempts to revive Hebrew consisted mainly of purists, who saw Biblical Hebrew as the most precious layer of the language: some of them went so far as to prefer creating very complicated expressions to designate modern concepts rather than using non-Biblical vocabulary. The fruits of this early period include the first regular Hebrew weekly, Ha-Maggid (1856), the first modern play by D. Zamoscz (1851), novels by A. Mapu, as well as the works of S. J. Abramowitsch (Mendele Moykher Seforim), who can be considered one of the founders of both modern Hebrew and modern Yiddish literature.

134 Tamás Bíró

The real upswing came in the last quarter of the century, especially after the 1881 pogroms, when Haskala had also reached the broadest masses. Traditionally, the publication of Eliezer Ben-Yehuda's 1879 article entitled 'A burning question' is considered the opening of the new era (Sáenz-Badillos, 1993:269). Ben-Yehuda (1858-1922) has been portrayed as the hero of the revival: he moved to Jerusalem in 1881, where he forced himself and his family to speak Hebrew. To speak a language, that is, to produce everyday, spontaneous sentences "in real-time", in a language that had mostly been used for writing and reading, and only in restricted domains. His son, Ithamar (1882-1943), was the first person in millennia who grew up in an exclusively Hebrew-speaking environment. Ben-Yehuda constantly introduced new words designating everyday concepts, while editing a newspaper and working on his monumental Thesaurus, which incorporated material from ancient and medieval literature. In 1890, he founded the Va'ad ha-Lashon ('Language Committee'), the forerunner of the Hebrew Language Academy, thereby creating a quasi-official institution for language planning.

However, Shur (1979) has argued against overestimating Ben-Yehuda's role. Of Fishman's five stages of language planning (in Shur, 1979) (1. code selection; 2. ideologization of the choice; 3. codification; 4. elaboration and modernization; 5. standardization, i.e. acceptance by the community), Ben-Yehuda was salient especially in codification and elaboration, as well as in vitalization, which was also necessary under the given circumstances. But for socio-political reasons, he had little influence on the initial language choice and its ideologization, or on the final acceptance of the codified and elaborated standard.

It is clear that Yiddish was the mother tongue, or one of the main languages, for a major fraction of the members of the Va'ad ha-Lashon, including Ben-Yehuda himself. Moreover, people with Yiddish as their first language represented an important part of the speaker community of the old-new tongue in the first half of the twentieth century. Yiddish was no longer scorned, as it had been a century before, but neither was it considered a major source for language reform. Especially for the later generations, Yiddish would symbolize the Diaspora left behind by the Zionist movement.

Yiddish-speaking "exchange particles" dominated the community, much more than in the Hungarian case. Yet a very conscious ideology required changing the previous ethnic language to the old-new national language, especially after the 1913-14 "Language Quarrel", in which the defenders of Hebrew defeated those of German and Yiddish (Shur, 1979:VII-VIII, X). This ideology was actively present in almost every individual who had chosen to move to the Land of Israel in a given period - contrary to the European case, where the ideology of changing the language was explicit only in the cultural elite. Further, the language change was not slow and gradual, but drastic in the lives of the people emigrating to Palestine, combined with a simultaneous radical change in geographical location, social structure and lifestyle. What phenomena would this constellation involve?


Yiddish influence on Modern Hebrew vocabulary has been investigated by - among others - Haim Blanc. For instance, the Modern Hebrew interjection davka (approx. 'necessarily, for all that') is clearly a Hebraisation of Yiddish dafke, itself of Hebrew origin, and mentioned also in relation to Hungarian. Similarly, kumzitz 'get-together, picnic, campfire' undoubtedly originates from the Yiddish expression 'come [and] sit down!', since only in Yiddish do we find [u] in the verb 'to come'. However, the expression was probably coined in Hebrew, as standard Yiddish dictionaries do not mention it. One can easily imagine the early pioneers sitting around a campfire in the first kibbutzim, chatting in a mixture of Yiddish and Hebrew, and inviting their comrades to join them.

Nissan Netzer (1988) analyses the use of the Modern Hebrew verb firgen and the corresponding deverbal noun firgun. Officially, the word is still not considered to belong to the language, for it is not attested in any dictionary of Hebrew that I know of. Definitions for this word that I have found on the Internet are: "the ability to allow someone else to enjoy if his or her enjoyment does not hurt one," "to treat favorably, with equanimity, to bear no grudge or jealousy against somebody," and "to be delighted at the success of the other". The word can be traced back to Yiddish farginen 'not begrudge, not envy, indulge'. As Netzer has demonstrated, there is a linguistic gap in Hebrew, for the expressions darash et tovato shel… and lo hayta eno tsara be- that should bear this meaning are cumbersome, circuitous, overly sophisticated in style, and seem to cloud the true linguistic message. Therefore, they were not accepted by the linguistic community. When a leading Hebrew linguistics professor used the Yiddish equivalent in the early sixties, it made the listeners of an academic lecture smile, because at that time the Yiddishism was considered a folk idiom that would eventually withdraw in favor of a "real Hebrew expression". By the eighties, however, firgen had become more and more accepted in daily conversation and even in journalistic writings.11

This example has led us to the issue of the sociolinguistic status of Yiddish words in Modern Hebrew. Ora Schwarzwald (1995) shows that the vocabulary of the most used classical texts, such as the Hebrew Bible and the liturgy, has become the base of Modern Hebrew, in all its registers. Furthermore, loanwords from European languages are also used in both formal and non-formal language. However, from less esteemed languages, such as Jewish languages (e.g. Yiddish and Ladino) as well as Arabic, words infiltrate primarily into lower registers and everyday informal speech.


For instance, chevre 'friends' is used mainly when informally addressing a group of people, and it is a borrowing of the similar word in Yiddish (khevre 'gang, bunch of friends, society'). The latter obviously comes from Hebrew chevra 'society, company, gathering', whose root is chaver 'friend', a word well known to speakers of Hungarian and Dutch (gabber), too. The originally Hebrew word thus arrived back in Modern Hebrew, but keeping the phonological traces of its trajectory. Also note the minor semantic shifts during the two borrowings.

Another example of Yiddish influence on informal speech is the use of the -le diminutive suffix: abale from aba 'dad', Sarale 'little Sarah', Chanale 'little Hanah', and so forth. Observe that the suffix follows the Hebrew word, whereas in Yiddish one would expect Sorele and Chanele.

Thus, the influence of Yiddish on Modern Hebrew is indeed similar to its influence on Hungarian: lower registers and informal speech constitute one of the channels through which this interaction takes place. To make the similarity even more prominent, we can point to two further channels shared by the Modern Hebrew case and the Hungarian case. As in Hungarian, the designation of goods of general culture, such as food names (beygelach 'bagels or pretzels'), represents a domain for word borrowings. Moreover, Yiddish loanwords, or Hebrew words with a Yiddish or Ashkenazi pronunciation, are likely to appear in religious vocabulary (e.g. rebe 'Chasidic charismatic leader'); typically in the sociolect of religious groups (especially within ultra-orthodox society), and in the language used by secular Israelis to mock the stereotypically Yiddish-speaking ultra-orthodox Jews (e.g. dos 'an ultra-orthodox person', from Hebrew dat 'religion'; vus-vus-im 'the Ashkenazi ultra-orthodox Jews', who often say Vus? Vus? 'What? What?', followed by the Hebrew plural ending -im).

2.3. Type τ: Yiddish and Esperanto

Esperanto emerged in the very same context as Modern Hebrew. Its creator, Lazar Ludwik Zamenhof (1859-1917), was born one year after Eliezer Ben-Yehuda, similarly into a Jewish family living in a small Lithuanian town, whose population was composed of Russians, Poles and Lithuanians, but dominated by a Jewish majority. The Litvak (Lithuanian-Jewish) Haskala background of both men encouraged traditional Jewish education combined with studies in a secular Gymnasium; both of them went on to study medicine. Following the 1881 wave of pogroms, in the year in which Ben-Yehuda moved to Jerusalem, Zamenhof published an article calling for mass emigration to a Jewish homeland. For a few years, he became one of the first activists of the early Zionist movement Hovevei Tzion ("Lovers of Zion"). Berdichevsky (1986) points out the similarities even in the mentality and the physical appearance of Zamenhof and Ben-Yehuda.

Nevertheless, two key differences should be pointed out. The first is Zamenhof's pragmatism. In his 1881 article, Zamenhof imagined the Jewish homeland in the western part of the United States, a relatively unsettled area in those days, which would have aroused much less sensitivity on all sides. Furthermore, Zamenhof shared the skepticism of many of his contemporaries about the feasibility of reviving the Hebrew language. According to the anecdote, Theodor Herzl once said that he could not even buy a train ticket in Hebrew. Leading Jewish writers, such as Mendele Moykher Seforim, oscillated between writing in Yiddish and in Hebrew; both of these languages called for the establishment of a modern, secular literary tongue. The young and pragmatic Zamenhof chose to reform Yiddish, the language with millions of native speakers; whereas the first native speaker of Modern Hebrew, the son of Ben-Yehuda, was not yet born.

In his early years, Zamenhof wrote a comprehensive Yiddish grammar (completed in 1879, partially published in 1909 in the Vilna journal Lebn un Vissenschaft, and fully published only in 1982). He argued for the modernization of the language and fought for the use of the Latin alphabet instead of the Hebrew one. How is it possible, then, that a few years later Zamenhof changed his mind and switched to Esperanto (1887)?

Here the second key difference comes into the picture. Ben-Yehuda was sent by his orthodox family to a yeshiva (a traditional school teaching mainly the Talmud), where one of the rabbis secretly introduced him to the revolutionary ideas of the Haskala. By contrast, Zamenhof's father and grandfather were enlightened high-school teachers of Western languages (French and German). For him, being Jewish probably meant a universal mission to make the world a better place for the whole of humankind. This idea originates with eighteenth-century German Haskala philosophers, who claimed that Judaism is the purest embodiment yet of universal morality and of the faith of Pure Reason; even today a major part of Jews worldwide perceive Judaism this way.

Zamenhof therefore did not content himself with the goal of creating a Jewish national language. For him, as for his semi-secularized coreligionists joining the socialist movement in the same decades, unifying the human race and building a new world order presented the solution to - among others - the problems of the oppressed Eastern European Jewry. And also the other way around: the secular messianic idea of the unification of the dispersed and oppressed Jews into a Jewish nation was just one step away from the secular messianic idea of the unification of the whole of mankind into a supra-national unit. This explains not only the motivations of Zamenhof himself, but also why Jews played such an important role in the pre-World War II Esperanto movement in Central and Eastern Europe (Berdichevsky, 1986:60). Whereas socialists fought for the social-economic liberation of the oppressed, Zamenhof spoke about the liberation of humans from cultural and linguistic barriers. It is no coincidence that the twentieth-century history of the Esperantist movement was so intermingled with that of the socialist movements.

Zamenhof's initiative was to create a language that would be equally distant from and equally close to each ethnic language, so that each human being would have an equal chance of using this bridge connecting cultures and peoples. Hence Zamenhof created a vocabulary and a grammar using elements of languages he knew: Russian (the language his father spoke at home and the language of his high school), German and French (the languages his father and grandfather taught), Polish (the language of his non-Jewish playmates), Latin and Greek (from high school), as well as English and Italian. Note that the resulting language, like most artificial languages, is inherently European and Indo-European in character, though extremely simplified.

However, one should not forget that Zamenhof's native tongue was Yiddish: this was the language he used with his schoolmates in the Jewish primary school (kheyder, cf. Piron, 1984), and for most of his life he kept contact with circles where Yiddish was alive. So one might wonder why Yiddish is not overtly mentioned among the source languages of Esperanto. Given Zamenhof's former devotion to the Jewish cause and the Yiddish language, as well as his later remark that Yiddish is a language like any other (in Homo Sum (1901), cf. Piron (1984:17) and Berdichevsky (1986:70)), the possibility that he despised "the corrupt version of German" or that he felt shame at his Yiddish origins is out of the question.

The challenging task now is to find at least covert influences of Yiddish on Esperanto.

As strange as it may sound, a considerable literature has been devoted to etymology within Esperanto linguistics. One of the biggest mysteries is the morpheme edz. As a root, it means 'married person' (edzo 'husband'; edzino 'wife', by adding the feminine suffix -in-), while as a suffix it turns the word's meaning into the wife or husband of the stem: lavistino 'washerwoman' vs. lavistinedzo 'washerwoman's husband'; doktoro 'doctor' vs. doktoredzino 'doctor's wife'. Hungarian Esperantists have tried to use this suffix to translate the Hungarian suffix -né ('wife of…', e.g. Deákné 'wife of Deák, Mrs. Deák'; cf. Goldin (1982:28)). The phonemic content of the morpheme is not similar to any word with a related meaning in any of the languages that Zamenhof might have taken into consideration.

Zamenhof himself wrote in a letter to Émile Boirac that the morpheme was the result of backformation, and that it was originally a bound form (Goldin, 1982:22f). Boirac suggested the following reconstruction in 1913: if the German Kronprinz ('heir apparent') became kronprinco in Esperanto, while Kronprinzessin ('wife of a crown prince'; note the double feminine ending: the French feminine suffix -esse is followed by the Germanic feminine -in) turns into kronprincedzino, then the ending -edzin- can be identified as 'a woman legally bound to a man'. By removing the feminine suffix -in-, we obtain the morpheme -edz-. Goldin adds to this theory that the morphemes es and ec had already been used with other meanings, which is why the surprising [dz] combination appeared. In sum, the etymology of the Esperanto morpheme edz would be the French feminine ending -esse, reanalyzed with a different meaning due to the additional feminine suffix in German.

However, this is not the end of the story. Other alternatives have also been proposed. Waringhien and others (in Goldin, 1982) have put forward the idea that the word serving as the base of the backformation was the Yiddish word rebetsin ('wife of a rabbi'). In fact, this word can be reanalyzed as reb+edz+in, and we obtain the edz morpheme by the same logic as above. Goldin's counterargument that the Yiddish word is actually rebetsn with a syllabic [n̩] is not at all convincing: old Yiddish spelling often uses the letter yod to designate a schwa, or indeed the syllabicity of an [n̩], similarly to the <e> in German spelling, as in wissen. Consequently, I can indeed accept the idea that a pre-YIVO spelling rebetsin was in Zamenhof's mind.
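Boirac's reconstruction can be replayed as a small script. This is only an illustrative sketch: the function names and the affix-stripping logic are my own, not part of the cited analyses; it merely makes the segmentation explicit.

```python
# Sketch of Boirac's backformation analysis of Esperanto -edz-.
# All helper names are hypothetical; the segmentation follows the text.

def strip_noun_ending(word: str) -> str:
    """Esperanto nouns end in -o; remove it to expose the stem."""
    assert word.endswith("o"), "expected an Esperanto noun in -o"
    return word[:-1]

def strip_feminine(stem: str) -> str:
    """Remove the feminine suffix -in- if present at the end of the stem."""
    return stem[:-2] if stem.endswith("in") else stem

def isolate_residue(base: str, derived: str) -> str:
    """Whatever the derived stem adds to the base stem is the
    candidate bound morpheme (here: -edz-)."""
    assert derived.startswith(base)
    return derived[len(base):]

base = strip_feminine(strip_noun_ending("kronprinco"))          # 'kronprinc'
derived = strip_feminine(strip_noun_ending("kronprincedzino"))  # 'kronprincedz'
print(isolate_residue(base, derived))  # -> 'edz'
```

The same two-step stripping applied to Yiddish rebetsin (reb+edz+in) yields the morpheme by Waringhien's alternative route.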

Piron (1984) adds further cases of possible Yiddish influence. In words taken from German, the affricate [pf] always changes to [f]: German pfeifen 'to whistle' became Esperanto fajfi. This coincides with Yiddish fayfn. Still, one is not compelled to point to Yiddish as the origin of this word: the reason may simply be that the affricate [pf] is too typically German, not occurring in any of the other languages that served "officially" as examples for Zamenhof. In other words, [pf] was not seen as universal enough. But what about the consonant clusters […], […], […], which are also characteristic solely of German (and of Yiddish)? May the solution be that while [pf] becomes [f] in Yiddish, these clusters are unchanged; therefore Zamenhof felt less discomfort with the latter clusters than with [pf], which truly occurs exclusively in German? I do not believe that we can do more than speculate about the different unconscious factors acting within a person more than a hundred years ago. The only claim we can make is that some of these factors must have been related to Yiddish, as expected from the fact that Yiddish was one of Zamenhof's major tongues.

In the field of semantics, Piron brings up the differentiation in Esperanto between landa ('national, related to a given country', an adjective formed from lando 'country') and nacia ('national, related to a given nation', an adjective from nacio 'nation'). This differentiation exists in Yiddish (landish and natsional), but not in any other language that Zamenhof might have taken into consideration. Piron also argues against the possible claim that this is not a Yiddish influence but rather an internal development driven by the inner logic of Esperanto.

Piron's most evident example is Esperanto superjaro 'leap year', a compound of super 'on, above' and jaro 'year'. No known language uses the preposition on or above to express this concept. However, Yiddish has iberyor for 'leap year', from Hebrew ibbur ('making pregnant'), the term used in rabbinic literature for intercalating an extra month and thus making the year a leap year (e.g. Tosefta Sanhedrin 2:1-7). On the other hand, iber also means 'above' in Yiddish, which explains the strange expression in Esperanto. I do not know whether Zamenhof realized that the Yiddish expression iberyor is not related to German über, but this is probably not relevant.

Let us summarize this section. Yiddish influence on Esperanto is a case where there is only one exchange particle - in the first-order approximation, at least, since we have not dealt with the possible influences of the numerous later speakers of Esperanto with a Yiddish background. Yet this one particle had a huge impact on the language, for a very obvious reason. Even though Zamenhof did not overtly acknowledge that Yiddish had played a role in creating Esperanto, it is possible to discover the - either consciously hidden or unconscious - traces of Yiddish.

Did Zamenhof want to deny that he had also used Yiddish as a building block of Esperanto? Perhaps because his goal was indeed to create a universal, supra-national language, and not the language of the Jewish nation? Or, alternatively, was this influence unconscious? I do not dare to give an answer.

3. Conclusion

In linguistics, we could define weak interaction as an interaction that is not overtly acknowledged. No one would deny the influence of the French-speaking ruling class on medieval English, or the impact of the Slavic neighbors on Hungarian. But sometimes conscious factors hide the effect. Yet weak interactions are as crucial for the development of a language as the nuclear processes emitting neutrinos in the core of the Sun that produce the energy vital to us.

We have seen three cases of weak interaction between languages. In fact, all three stories were about the formative phase of a new or modernized language in the midst of late nineteenth-century Eastern European Jewry. In the cases of Yiddish influencing Hungarian and Modern Hebrew, the number of "exchange particles", that is, the number of initially Yiddish-speaking people joining the new language community, was extremely high: roughly one tenth of the Hungarian-speaking population in nineteenth-century Hungary, and probably above 50% of the Jews living in early twentieth-century Palestine. Nonetheless, in both cases we encounter an ideology promoting the new language and disfavoring Yiddish.

Because the level of consciousness of this ideology seems to be inversely proportional to the ratio of "exchange particles" - stronger in Palestine than in Hungary - the two factors cancel each other out, and we find similar phenomena. For instance, Yiddish has affected first and foremost the lower registers, which are less censored by society; from there it infiltrates into the informal standard language. Additional trends are Yiddish words entering specific domains, such as gastronomy or Jewish religious practice. It is essential to note, though, that not all concepts that are new in the target culture are expressed by their original Yiddish word: many new expressions in these domains have been coined in Hungarian and Modern Hebrew and accepted by the language community.

The third case that we have examined is different. Zamenhof was a single person, but as the creator of Esperanto, he had an enormous influence on the new language. The influence of Yiddish was again weak in the sense that it was not overtly admitted; however, we could present examples where Zamenhof's native tongue influenced the new language. We could have cited, as the articles mentioned have done, numerous further instances where the influence of Yiddish cannot be proven directly, since the given phenomenon could have been taken from other languages as well; however, one can hypothesize that Yiddish played - consciously or unconsciously - a reinforcing role in Zamenhof's decisions.

I do hope that I have been able to show the reader that seemingly very remote fields, such as physics, social history and linguistics, can be interconnected, at least for the sake of a thought experiment. Furthermore, "exchange particles" in the field of science - and Tjeerd is certainly among them - have hopefully brought about at least some weak interaction among the different disciplines.

Notes

1 According to http://cupp.oulu.fi/neutrino/nd-mass.html, the mass of the electron neutrino (νe) is less than 2.2 eV, the mass of the muon neutrino (νµ) does not exceed 170 keV, while the mass of the tau neutrino (ντ) is reported to be below 15.5 MeV. For the sake of comparison, the mass of an electron is 511 keV, while the mass of a proton is almost 940 MeV.

2 Physical phenomena are thought to be reducible to four fundamental forces: gravity, electromagnetism, the weak interaction and the strong interaction. The last two play a role in sub-atomic physics.

3 The photons (particles of light) are the exchange particles of the electromagnetic interaction; the hypothetical gravitons should transmit gravitation; in the case of the weak interaction, the W+, W− and Z vector bosons play that role; whereas the strong interaction is mediated by pions.

4 Targumim (plural of targum) are the Jewish Aramaic versions of the Hebrew Bible from late antiquity, including many commentaries beside the pure translation. Just as late antique Jews created a commented translation of the Holy Scriptures in their native tongue and according to their way of thinking, Moses Mendelssohn expected his version of the Bible to fit the modern way of thinking and the "correct language" of its future readers. Obviously, the Biur first had to fulfil its prior task, namely to teach the modern way of thinking and the "correct tongue" to the first generation of its readers. Interestingly enough, script was not as major an issue for Mendelssohn as "language purity": he wrote Hochdeutsch in Hebrew characters, in order to better disseminate his work among the Jewish population.

5 I assume that the formative phase of modern Dutch society and culture in the 17th and 18th centuries is comparable to that of 19th-century Hungary; even more so is the role of Jewry in both countries, as a group which was simultaneously integrating into the new society and also shaping it. In both cases, the presence of a continuous spectrum from the pre-Haskala Yid to the self-modernizing Israelite led to a gradual, though determined, giving up of the Yiddish language. This socio-historical parallelism could partially explain why the phenomena of Yiddish influence on Dutch are often similar to those on Hungarian. Concerning Dutch-Jewish linguistic interactions, readers interested in Jewish aspects of Papiamentu, a creole language spoken in the Netherlands Antilles, are referred to Richard E. Wood's article in Jewish Language Review 3 (1983): 15-18.

6 The etymology of the Yiddish word itself is also interesting. The origin is the late Latin or Old French root […] 'to read' (cf. Latin lego, legere, modern French je lis, lire), which was borrowed by the Jews living in early medieval Western Europe. The latter would later change their language to Old High German, the ancestor of Yiddish. At some point, the meaning of the Old French word was restricted to the public reading of the Torah scroll in the synagogue.

7 Compare sí 'ski' > síel 'to ski', tűz 'fire' > tüzel 'to fire'; also printel 'to print with a computer printer'. It is extremely surprising that the word lejnol does not follow vowel harmony; one would expect *lejnel. Even though the [e] sound can be transparent for vowel harmony, this fact is not enough to explain the word lejnol. Probably the dialectal Yiddish laynen was borrowed originally, and this form served as the base for word formation before the official Yiddish form leynen influenced the Hungarian word. Some people still say lájnol.

8 When one is called to the Torah during the public reading, one recites a blessing, the text of which says: "He Who blessed our forefathers Abraham, Isaac and Jacob, may He bless [the name of the person] because he has come up to the Torah / who has promised to contribute to charity on behalf of… etc." The part of the text 'who has promised' sounds in the Ashkenazi pronunciation like [ʃenoder]. This is most probably the source of the word snóder, after the vowel in the unstressed last syllable became a schwa, a process that is crucial for understanding the Yiddishization of Hebrew words. The exciting part of the story is that the proclitic [ʃe] ('that') was kept together with the following finite verbal form ([noder] 'he promised'), and they were reanalysed as one word.

9 When I asked about the meaning of unberufn on the mailing list [email protected], somebody reported that her non-Jewish grandmother also used to say unberufn with a similar meaning.

10 Other Hungarian words of Hebrew origin do not come from Yiddish, as shown by their non-Ashkenazi pronunciation: Tóra ('Torah', as opposed to its Yiddish counterpart Toyre) or rabbi (and not rov or rebe). Words like behemót ('big hulking fellow'), originally from Biblical Hebrew behema ('cattle', plural: behemot; appearing also as a proper name in both Jewish and Christian mythology), should rather be traced back to the Christian Biblical tradition.

11 Note that the word has kept its original word-initial [f], without transforming it into [p], as Hebrew phonology would have predicted. Although this is a remarkable fact for Netzer, it turns out that almost no word borrowed by Modern Hebrew changes its initial [f] to [p]. Not even verbs that have had to undergo morpho-phonological processes (e.g. fibrek from English to fabricate). The only exception I have found in dictionaries is the colloquial form pilosofiya for filosofiya 'philosophy', as well as the verb formed from it, pilsef 'to philosophise'. Furthermore, it can be argued that pilosofiya is not even a modern borrowing. The only reason why one would still expect firgen […]. On the other hand, one may claim that /f/ and /p/ should be considered distinct phonemes in Modern Hebrew, even if no proposed minimal pair that I know of is really convincing.

References

[…] A Magyar Nyelv Történeti-Etimológiai Szótára [The Historical-Etymological Dictionary of the Hungarian Language]. Akadémiai Kiadó, Budapest.

Berdichevsky, Norman (1986). Zamenhof and Esperanto. Ariel, A Review of Arts and Letters in Israel, 64: 58-71.

Blau Henrik and Láng Károly (1995). Szájról-szájra. Magyar-jiddis […].

Goldin, Bernard (1982). The Supposed Yiddish Origin of the Esperanto Morpheme edz. Jewish Language Review, 2: 21-33.

Graaf, Tjeerd de (1969). Aspects of Neutrino Astrophysics. Wolters-Noordhoff nv, Groningen.

Netzer, Nissan (1988). "Fargen" - Employing a Yiddish Root to Bridge a Linguistic Gap in the Hebrew Language (in Hebrew, with English abstract). Hebrew Computational Linguistics, 26: 49-58.

Piron, Claude (1984). Contribution à l'étude des apports du yidiche à l'espéranto. Jewish Language Review, 4: 15-29.

Sáenz-Badillos, Angel (1993). A History of the Hebrew Language. Cambridge University Press, Cambridge.

Schwarzwald, Ora (Rodrigue) (1995). The Components of the Modern Hebrew Lexicon: The Influence of Hebrew Classical Sources, Jewish Languages and Other Foreign Languages on Modern Hebrew (in Hebrew, with English abstract). Hebrew Linguistics, 39: 79-90.

Shur, Shimon (1979). Language Innovation and Socio-political Setting: The Case of Modern Hebrew. Hebrew Computational Linguistics, 15: IV-XIII.

Prosodic Acquisition: a Comparison of Two Theories

Angela Grimm

1. Introduction

During language development, children's word productions are the target of a variety of prosodic processes, e.g. syllable deletions, syllable additions and stress shift. Using current phonological theory, investigators have explained these production patterns in a number of different ways.

In this paper, I review two approaches to the development of word stress: Fikkert's (1994) theory of trochaic template mapping and Demuth and Fee's (1995) prosodic hierarchy account. Both theories assume that children build up the prosodic representation of words step by step, starting with the smallest unit and ending with an adult-like representation. I argue that both theories are problematic because they overgenerate certain structures (e.g. level stress), but that the model of Demuth and Fee can better account for the data presented so far.

This paper is organized as follows: since it is crucial to both theories, section 2 briefly introduces the basic assumptions of the prosodic hierarchy. In section 3, I give a survey of Fikkert's (1994) model of stress development and of Demuth and Fee's (1995) model based on the prosodic hierarchy. In section 4, I discuss the problems arising with the models, and section 5 concludes.

2. The prosodic hierarchy of words

The prosodic hierarchy up to the word level consists of four constituents. The lowest element of the prosodic hierarchy is the mora (µ). Since there are often no segmental slots in moraic models, the mora has a double function, as the unit of syllable weight and as the unique sub-syllabic constituent. The moraic level is dominated by the syllable level (σ), and syllables are parsed into feet (F) at the foot level above. The highest unit is the prosodic word (Wd), which directly dominates the foot level (see Figure 1):

Prosodic word (Wd)

Foot (F)

Syllable (σ)

Mora (µ)

Figure 1. The prosodic hierarchy (Selkirk, 1980)

Syllables differ with respect to the number of moras they contain. Light syllables contain one mora, while heavy syllables contain at least two. The tendency of languages to assign stress to heavy syllables is expressed by the Weight-to-Stress Principle (WSP). In a parametric approach to word stress (cf. Hayes, 1995), languages either respect this principle (quantity-sensitive languages) or do not (quantity-insensitive languages).
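The weight distinction and the WSP can be made concrete in a small computational sketch. This is my illustration, not part of either model under review: syllables are encoded as (nucleus, coda) pairs, a long nucleus and a coda each add a mora, and the WSP is checked against a single stress position. All names are invented for the sketch.

```python
# Toy mora counting and Weight-to-Stress check (illustration only;
# a setting where long vowels and coda consonants both add weight).
def mora_count(nucleus: str, coda: str) -> int:
    """One mora for the nucleus, plus one for length, plus one for a coda."""
    return 1 + (len(nucleus) > 1) + bool(coda)

def is_heavy(nucleus: str, coda: str) -> bool:
    """Heavy = at least bimoraic."""
    return mora_count(nucleus, coda) >= 2

def wsp_satisfied(syllables, stressed_index) -> bool:
    """WSP: every heavy syllable must be stressed (single stress here)."""
    return all(not is_heavy(n, c) or i == stressed_index
               for i, (n, c) in enumerate(syllables))
```

On this toy setting, a quantity-insensitive grammar would simply ignore `is_heavy` when placing stress, which is the behavior Fikkert attributes to the earliest stages.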

The next constituent of the prosodic organization above the syllable level is the foot. Ideally, the foot is binary branching, which implies that it should consist of two moras or of two syllables. Thus, a binary foot can be monosyllabic if it contains two moras (e.g. ����� ‘duck’) or disyllabic if it consists of two syllables (e.g. ������ ‘papa’). The head constituent of the foot receives stress.

The prosodic word is the domain of stress application. It can also coincide with a single foot. Because a single foot is the smallest shape a prosodic word can have, it is called the Minimal Word. Many languages have restrictions such that content words must not be smaller than the minimal word. There is ample evidence that the minimal word restriction also governs the shape of the early words in language acquisition (Demuth & Fee, 1995; Demuth, 1996; Fikkert, 1994; Ota, 2001).

A very important principle of the prosodic hierarchy is the Strict Layer Hypothesis (Selkirk, 1984), which demands that layers must not be skipped, i.e. that a given prosodic constituent of level n-1 is contained in the constituent of level n

immediately above. Furthermore, it requires that constituents have one and only one head, which implies that there is always a difference in prominence among the elements forming a given prosodic unit.
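The two requirements of the Strict Layer Hypothesis (no skipped layers, exactly one head per constituent) can be stated as a mechanical check on a prosodic tree. The sketch below is mine, not Selkirk’s formalization; the tuple encoding and layer names are assumptions.

```python
# Toy Strict Layer Hypothesis check (illustration only).
# A node is (layer, is_head, children); layers listed top-down.
LAYERS = ["Wd", "F", "syll", "mora"]

def strict_layer_ok(node) -> bool:
    layer, _is_head, children = node
    if not children:                       # terminal nodes are moras
        return layer == "mora"
    below = LAYERS[LAYERS.index(layer) + 1]
    if any(child[0] != below for child in children):
        return False                       # a layer was skipped
    if sum(1 for child in children if child[1]) != 1:
        return False                       # not exactly one head
    return all(strict_layer_ok(child) for child in children)

# A well-formed trochee: one head foot over strong-weak syllables.
trochee = ("Wd", True,
           [("F", True,
             [("syll", True,  [("mora", True, [])]),
              ("syll", False, [("mora", True, [])])])])
```

A word with two head feet under one Wd fails the one-head check, which is exactly the grammatical objection raised against level stress representations later in this paper.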


3. The acquisition of word stress: two current models

3.1. Fikkert (1994)

Fikkert’s study of Dutch children is the most detailed research on stress acquisition to date. Fikkert mainly focused on disyllabic words and argued for the foot as the basic unit of development.

Although Fikkert’s model is based on Dutch, she claims that the trochaic template is universal in child language since it is the only quantity-insensitive foot in the typology of Hayes (1991). Thus, children should not show sensitivity to syllable weight at the earliest stages of prosodic acquisition. The postulation of a universal foot template implies that the child always makes reference to the foot level in the word productions. Consequently, it is a foot, not a syllable, that is being truncated in forms like those below:

Example 1.

child form	adult target	gloss
�����	��������	‘balloon’
����� ��	������	‘holiday’

Fikkert assumes that the output a child produces is directed by the mapping of a melody template onto a trochaic template via prosodic circumscription. Based on phenomena such as truncation, stress shift and epenthesis, four different stages of prosodic development are postulated.

Stage 1

According to Fikkert, the child circumscribes the stressed syllable of the adult form together with its segmental material and maps it onto a trochaic template. The presumed representation of the child is given in Figure 2 (‘S’ denotes the prominent position and ‘W’ the non-prominent position within the foot):


Wd

F

σS σW

�����

Figure 2. The prosodic representation at stage 1

Prosodic circumscription forces the child to divide the input into two parts, the kernel (i.e. the stressed syllable) and the residue. In the mapping process, the kernel (�����) is mapped onto the strong position in the prosodic template. The residue (/��/) becomes truncated because there are no empty positions in the template. The mapping onto the trochaic template accounts for the fact that, if the result of prosodic circumscription is a monosyllabic foot, sometimes a syllable is added to yield a disyllabic output, for example ������ instead of ��������.

Stage 2

At stage 2, the child circumscribes a trochaic foot. Thus, if the prosodic circumscription already results in a trochee, as in ���� ����/ ‘holiday’, the trochee remains unchanged in the output and appears as �� �����. Words consisting of more than a single foot are circumscribed differently. Fikkert argues that the child selects the next stressed syllable to the left in addition to the stressed final syllable. For instance, Dutch /� ��� ������ ‘crocodile’ should be realized as �� �� ���� because the ultimate, main-stressed syllable and the antepenultimate, secondary-stressed syllable are kept. The disyllabic representation is then mapped onto the trochaic template, resulting in a trochaic pattern. Since the production template still consists of one single trochaic foot, stress shifts to the initial syllable. The representation of the child is depicted in Figure 3:


Wd

F

σS σW

����� ����

������ ���

Figure 3. The prosodic representation at stage 2

Stage 3

At stage 3, the productions are extended to two feet. According to Fikkert, the children have noticed that the target words can consist of more than a single foot. She claims that her subjects realized two syllables of the target word with equal prominence (level stress). However, her argument for the level stress stage is rather weak: she stipulates that the children have to produce two equally stressed feet because they are unable to realize stress at word level.

The prosodic representation at stage 3 is depicted in Figure 4 below:

Wd

F F

σS σW σS σW

������ ���

Figure 4. The prosodic representation at stage 3

Since the trochaic foot still governs the productions, weak positions in the template can be filled with extra syllables.


Stage 4

The representations are now adult-like. Word-level stress has been acquired and the child is able to operate at the level of the prosodic word.

3.2. Demuth & Fee (1995)

Demuth & Fee propose a more abstract approach which, although primarily based on data from English-acquiring children, aims to capture prosodic development universally. The basic assumption in Demuth & Fee’s model is that prosodic development proceeds along the prosodic hierarchy (see Figure 1). In contrast to Fikkert, Demuth & Fee avoid the notions of prosodic circumscription and trochaic template mapping. According to them, sensitivity to the moraic structure of the mother tongue is already present from the onset of word production. They distinguish between the following stages:

Stage 1

The first stage is characterized by sub-minimal (monomoraic) words. The productions consist of a single CV syllable and there are no vowel length distinctions yet. Thus, the phonological representation of the words is also CV.

Stage 2

At stage 2, children realize words of foot size (Minimal Words). Stage 2 is characterized by three successive sub-stages: at the beginning, the foot is disyllabic, as for example in ������ ‘papa’. Second, as soon as the child is able to produce coda consonants, the foot can also have a monosyllabic form, e.g. ����� ‘duck’. Third, the vowel length distinction becomes phonemic. The child is now aware of the fact that the stressed syllable of Dutch ������� ‘banana’ has to be realized with a long vowel �����, while in � ����� ‘giraffe’ the second vowel remains short (examples from Robin, see Fikkert, 1994). Demuth & Fee assume a direct relationship between distinctive vowel length and the appearance of coda consonants. Thus, a CVV structure counts as sub-minimal, and a CVVCVV structure as minimal, as long as the child does not produce coda consonants.


Stage 3

Beyond the minimal word stage, syllable structure can be more complex and words can have a larger size than a single foot. This is also the stage where the largest progress in the development of word stress is predicted. The child seems to become aware that feet have to be stressed and that there are language-specific stress rules. Demuth & Fee do not assume a trochaic template. However, they adopt Fikkert’s assumption of an obligatory intermediate stage of level stress where two feet are produced with primary stress.

At the end of stage 3, children acquire stress at the word level and realize one primary stress per word.

Stage 4

At the final stage, extrametrical (i.e. unfooted) syllables are permitted. Children at this stage operate at the level of the prosodic word.

4. Discussion of the models

Although both models can explain a number of frequently observed patterns like syllable deletions and word size restrictions, there are a number of empirical and theoretical problems related to the models.

First, Fikkert and Demuth & Fee assume that prosodic development proceeds bottom-up, i.e. from a lower level of representation (the foot or the mora) to the top of the prosodic hierarchy (the prosodic word). Children invariably have to pass through one stage before they can go on to the next. For example, multisyllabic words like ‘elephant’ or ‘crocodile’ have to show a level stress pattern before they can be produced adult-like.

Fikkert explicitly points to that fact. Missing evidence in her data is explained by the recording modalities or by the fact that a given stage lasted only a very short time. Demuth & Fee, in contrast, are not explicit with respect to the ordering of the stages. However, they claim that prosodic development proceeds along the prosodic hierarchy. Since in the prosodic hierarchy one constituent strictly dominates the constituent below, stages cannot vary with respect to their temporal order. According to the models, the following realizations of /����������� ‘crocodile’ by Jule, a girl acquiring German, should be chronologically impossible (data from my own corpus):


Example 2.

child form	age	description
�����	(1;08,12)	the main stressed syllable is realized
��������	(1;08,29)	a foot with final stress is realized
��������	(1;10,14)	level stress emerges

As the examples illustrate, level stress can occur after a finally stressed variant of the target word has been produced, contrary to the predictions of the models. Such an acquisition order provides empirical evidence against level stress as an obligatory component of prosodic development. Additional empirical support comes from the data of English-acquiring children examined by Kehoe & Stoel-Gammon (1997), who also could not find a systematic emergence of level stress.

Level stress as assumed in the models above is also problematic from a grammatical point of view: the representation intended to create level stress (see Figure 4 above) essentially violates the Strict Layer Hypothesis because the two feet are not correctly bound into the prosodic word. The problem is that the Strict Layer Hypothesis can never be satisfied by such a representation because there is no gradation in prominence at the word level. According to prosodic theory, two equally stressed feet must not occur within a single prosodic word:

*Wd

FS FS

σS σW σS σW

������ �� �

Figure 5. The ill-formed representation of the prosodic hierarchy as implied by Fikkert (1994) and Demuth & Fee (1995)

Both models remain vague with respect to the source of level stress: it is unclear how the stages of level stress fit with the assumption that prosodic development is directed by universal prosodic principles. Since they do not discuss the possibility of a child-specific representation, the representation according to the prosodic hierarchy should look as illustrated in Figure 6:


Wd Wd

FS FS

σS σW σS σW

������ �����

Figure 6. A prosodic representation that incorporates the requirements of the prosodic hierarchy and that allows for level stress

The representation in Figure 6 admits the co-occurrence of two equally stressed feet because every foot projects its own prosodic word. The drawback is that this assumption is ad hoc. There is no motivation for separating a single prosodic word like Dutch /���������� ‘crocodile’ into two prosodic words. In addition, it is an open question which factors could later trigger the merging of the two prosodic words into a single one.

Another problem is that the models described above are primarily based on truncation patterns in multisyllabic words. This is critical from a methodological point of view because it is presupposed that the truncation of syllables is exclusively triggered by prosodic size restrictions. Recent evidence, however, suggests that segmental properties of syllables can also affect the truncation rate. For example, syllables with sonorant onsets seem to be more prone to truncation than syllables with obstruent onsets (Kehoe & Stoel-Gammon, 1997).

A comparison of both models suggests that the predictions of the template mapping model of Fikkert (1994) are sometimes too strong. Thus, the prosodic hierarchy model of Demuth & Fee (1995) seems to be superior because of its greater flexibility. First of all, it avoids Fikkert’s circular process of assigning via prosodic circumscription a trochaic structure that actually should be created by the foot template. Furthermore, the prosodic hierarchy model allows for more variability in the productions of children. For example, it allows for the co-occurrence of monosyllabic and disyllabic feet, in contrast to Fikkert’s model, which only proposes disyllabic trochees for a very long period of time. As the data of children acquiring English suggest, there are doubts about Fikkert’s view that the disyllabic trochee is the unique representation at the early stages (Kehoe & Stoel-Gammon, 1997;


Salidis & Johnson, 1997). Moreover, Fikkert predicts a systematic stress shift to the left in disyllabic iambs, a pattern that still needs empirical evaluation. It is also possible that stress shift is the result of a complex interplay of factors like edge preferences, weight sensitivity and segmental factors rather than of a simple template mapping mechanism. If this is true, stress shift can be bidirectional, to the left or to the right, depending on the relative importance of the factors involved.

Fikkert’s model is more detailed than the model of Demuth & Fee. It is best elaborated for stages 1 and 2. With respect to the later stages she remains somewhat inconsistent. For example, she strongly argues for the foot as the relevant prosodic unit, but already at stage 2 the syllable, not the foot, becomes the target of circumscription:

“[...] the child realises both syllables of the target word. However, stress falls on the first syllable. The segmental material of both syllables of the adult word is taken out and mapped onto the child’s trochaic template [...]” (p. 210).

Fikkert also considers the possibility of circumscribing a foot. She concludes that the children circumscribe syllables because the surviving syllables do not constitute a foot in the adult word. But the examples she presents (p. 211) do form two feet within a weight-sensitive model, with each foot containing at least two moras (Example 3):

Example 3.

child form	adult target	gloss
���������	����������	‘elephant’
�� �������	���������	‘pelican’
����������	��������������	‘locomotive’
���������	����������	‘farm’

Fikkert cannot account for this fact because she exclusively assumes weight-insensitive trochees at stage 2. Demuth & Fee’s model, in contrast, would allow for the retention of the foot as the relevant unit, since it assumes sensitivity to syllable weight with the emergence of foot structure.

Demuth & Fee, in turn, have problems explaining the stress shift to the first syllable in the examples above, for two reasons. First, recent evidence suggests that the relationship between distinctive vowel length and the emergence of coda consonants is not as categorical as they claim. In an examination of Fikkert’s data, Salidis & Johnson (1997) found that,


contrary to their English-acquiring child, vowel length was not controlled by the Dutch children even when they correctly produced coda consonants. If, in turn, children cannot control vowel length appropriately, they cannot assign two moras to a long vowel. The authors relate the divergence between the languages to the impact of vowel quantity on the stress pattern: in English, the long vowels in (C)VV(C) syllables count as heavy and thus attract stress, whereas Dutch rather relies on the open-closed distinction. In Dutch, a (C)VC syllable counts as heavy, while a (C)VV does not. Thus, a learner of Dutch presumably does not rely on vowel length as an indicator of stress, while it is crucial for a learner of English to identify the relationship between vowel quantity and stress. Second, suppose that the absence of the vowel length distinction is an artifact of the investigation and that children have mastered the vowel length distinctions once they produce bimoraic feet. Then neither universal nor language-specific constraints could account for the fact that the superheavy finals lose their primary stress in favor of less heavy syllables, because Dutch follows the universal generalization that a (C)VCC (e.g. /����/) or a (C)VVC (e.g. /����/) syllable is heavier than a (C)VV syllable (/��/, /��/).
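The contrast between the two weight systems described above can be sketched as follows. This is my illustration, not part of Salidis & Johnson’s analysis: the per-language settings follow the description in the text, while the encoding and the function name are assumptions.

```python
# Toy language-specific weight criteria (illustration only):
# English: a long vowel makes a (C)VV(C) syllable heavy and attracts stress;
# Dutch: weight rests on the open-closed distinction, so only a coda counts.
def is_heavy_in(language: str, nucleus: str, coda: str) -> bool:
    if language == "english":
        return len(nucleus) > 1        # long vowel => heavy
    if language == "dutch":
        return bool(coda)              # closed syllable => heavy
    raise ValueError(f"no weight setting for {language!r}")
```

A Dutch learner thus gets no stress cue from vowel length alone, which is the divergence between the two learner groups that the text appeals to.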

The observation that Dutch children need more time to acquire vowel length distinctions indicates that language-specific properties may influence the prosodic representation in a more detailed way than assumed so far. Thus, further empirical work is needed to shed light on the interplay of universal principles and language-specific conditions in prosodic development.

5. Conclusion

In the present paper, two models of prosodic development have been introduced and examined. As both models attest, the acquisition of word prosody largely conforms to the prosodic hierarchy, in the sense that universal prosodic constituents such as the foot or the mora govern children’s word productions. This is essential in both models. However, the discussion has revealed empirical, theoretical and methodological shortcomings. Common problems of both accounts are the absence of empirical and theoretical motivation for level stress and the reliance on truncations as the primary diagnostic of prosodic development.

In sum, the evidence so far rather supports the prosodic hierarchy model of Demuth & Fee (1995), because it is more flexible than Fikkert’s template mapping model.


References

Demuth, K. (1996). The prosodic structure of early words. In: J. Morgan & K. Demuth (eds.) From signal to syntax: Bootstrapping from speech to grammar in early acquisition. Lawrence Erlbaum Associates, Hillsdale, N.J., 171-184.

Demuth, K. & Fee, J. (1995). Minimal words in early phonological development. Ms., Brown University and Dalhousie University.

Fikkert, P.M. (1994). On the acquisition of prosodic structure. Holland Institute of Generative Linguistics, Dordrecht.

Hayes, B. (1991). Metrical stress theory: principles and case studies. Ms., UCLA.

Hayes, B. (1995). Metrical stress theory. Chicago University Press, Chicago.

Johnson, J. & Salidis, J.S. (1997). The production of minimal words: A longitudinal case study of phonological development. Language Acquisition, 6 (1): 1-36.

Kehoe, M. & Stoel-Gammon, C. (1997). The acquisition of prosodic structure: An investigation of current accounts of children’s prosodic development. Language, 73 (1): 113-144.

Ota, M. (2001). Phonological Theory and the Development of Prosodic Structure: Evidence from Child Japanese. Available at http://www.ling.ed.ac.uk/~mits/downloadables.shtml

Selkirk, E. (1980). The role of prosodic categories in English word stress. Linguistic Inquiry, 11: 563-605.

Selkirk, E. (1984). Phonology and Syntax: The Relation between Sound and Structure. MIT Press, Cambridge, MA.

Base-Identity and the Noun-Verb Asymmetry in Nivkh

Hidetoshi Shiraishi

1. Introduction

1.1. Background

Morphologically complex words often exhibit phonological similarities with the morphologically related base forms from which they are derived. In a number of cases, these similarities yield a marked phonological pattern given the general rules or phonotactics of the language (Kenstowicz, 1996; Burzio, 1997, 2002, etc.). In Optimality Theory (OT), similarity between existing words is captured by Output-to-Output (OO) correspondence constraints (Burzio, 1996, 2002; Kenstowicz, 1996, 1997; Benua, 1997ab; Ito and Mester, 1997; Steriade, 2000, etc.). The marked phonological pattern arises when similarity between words takes priority over the canonical phonology of the language. OT expresses this situation by ranking OO-correspondence constraints above phonological markedness constraints. OO-correspondence constraints evaluate the output candidates and select the one which is most similar to the base.

Since the base plays a crucial role in computing the phonology of its derivatives, it is important to identify the correct surface form as the base. Many authors have observed that OO-constraints have access to the base only if the latter occurs as an independent word (Kenstowicz, 1996; Benua, 1997a; Ito and Mester, 1997).1 Consider the s-voicing observed in the northern dialects of Italian. In these dialects, s and z are in complementary distribution. Z appears intervocalically, when the flanking vowels belong to the same phonological word (examples from Kenstowicz, 1996: 373-374).


1.1a. a[z]ola ‘button hole’
a[z]ilo ‘nursery school’
ca[z]-a ‘house’
ca[z]-ina ‘house - diminutive’

b. lo [s]apevo ‘I knew it’
telefonati [s]i ‘having called each other’

The distribution of s-voicing in lexical items containing a prefix is more complicated. When the target precedes the boundary, s-voicing applies (1.2a). But when the target follows the boundary, s-voicing may or may not apply, even if the structural description of s-voicing is met (1.2b, c).

1.2a. di[z]-onesto ‘dishonest’
di[z]-uguale ‘unequal’

b. re-[z]istenza ‘resistance’
pre-[z]entire ‘to have a presentiment’

c. a-[s]ociale ‘asocial’
bi-[s]essuale ‘bisexual’
pre-[s]entire ‘to hear in advance’

The unexpected blocking of s-voicing in 1.2c is in sharp contrast with the items in 1.2b, where z surfaces intervocalically, following the phonological norm of the language. Nespor and Vogel (1986) pointed out that the crucial difference between the items in 1.2b and 1.2c lies in the lexical status of the stem to which the prefix is attached; in 1.2c the stem occurs as an independent word (sociale, sessuale, etc.) whereas in 1.2b it does not (*sistenza, etc.). Following this view, Kenstowicz (1996) claimed that there is a lexico-morphological pressure from the independently occurring stem for its derivative to surface as similar to it as possible. The presence of such an independently occurring immediate constituent is thus crucial in computing the phonology of a morphologically complex item. Kenstowicz dubbed this generalization Base-Identity; the base forces its derivative to be formally as similar as possible in order to “improve the transparency of morphological relationships between words and enhance lexical access” (Kenstowicz, 1996: 372).

Base-Identity and the Noun-Verb Asymmetry in Nivkh 161

1.3 Base-Identity: Given an input structure [X Y], output candidates are evaluated for how well they match [X] and [Y] if the latter occur as independent words. (Kenstowicz, 1996: 372)

The languages of East Asia provide an interesting test for this generalization. Languages such as Korean and Japanese show a systematic difference in the composition of verbs and nouns: while verbal stems always surface with a morphological extension, nominal stems may surface without such an extension. This means that complex words formed from a nominal stem always have an independently occurring base to which they should phonologically conform, whereas verbal derivatives lack such a base and hence should not show such conformity. This prediction is borne out in Korean, in which derivatives of nominal and verbal stems are subject to different phonology (Kenstowicz, 1996; see section 2.3 below). In this paper, I discuss another language of East Asia, Nivkh, which has an asymmetric composition of nouns and verbs like Korean and Japanese. I will focus on two phonological phenomena, Consonant Alternation and Final Fricative Devoicing, and show that both phenomena exhibit asymmetries between nominal and verbal phonology. I will discuss each case in detail and argue that Base-Identity is the driving force of these asymmetries.

The article is organized as follows. I will start with a descriptive sketch of Consonant Alternation (section 2.1) and then illustrate the exceptional behavior of nominal stems as a case of noun-verb asymmetry (section 2.2). While most previous works, including my own, somehow stipulated the asymmetric behavior of nominal and verbal stems, I will argue that Base-Identity provides a superior analysis which is free from such a stipulation. Section 3 discusses the second phenomenon, Final Fricative Devoicing. I will illustrate the asymmetric behavior of fricative-final nominal and verbal stems when followed by a suffix. The pattern of asymmetry is as in CA: while verbal phonology is subject to canonical phonology, nominal phonology is not. Section 4 concludes.

1.2. About Nivkh

Nivkh (also called Gilyak) is an isolated language spoken by the Nivkh people, who live on the island of Sakhalin and in the lower reaches of the Amur River in the Russian Far East. The language has four dialects, and the


major discrepancy is between the Amur dialect, spoken in the Amur area on the continent and on the west coast of north Sakhalin, and the Sakhalin dialect, spoken on the east coast of Sakhalin. Nivkh is listed in the UNESCO Red Book on endangered languages as being seriously endangered. According to the census of 1989, speakers make up 23.3% of the total population of 4,681.2 This article concerns the phonology of the Amur dialect spoken by the continental Nivkh. All the examples are from the following sources, unless otherwise mentioned: Krejnovich (1937), and Saveleva and Taksami (1970).

2. Consonant Alternation

2.1. A descriptive sketch

I will first outline the segmental inventory of Nivkh.

2.1 Consonantal inventory of Nivkh
(I) aspirated plosives: p� t� c� k� q�
(II) non-aspirated plosives: p t c k q
(III) voiceless fricatives: f r� s x �
(IV) voiced fricatives: v r 3 z � �
nasals: m n � �
lateral: l
glides: j h

2.2 Vowels: i � u e o a

Consonant Alternation (henceforth CA) is a phonological process which changes the feature [continuant] in obstruents when they are placed in certain phonological and morphosyntactic contexts. Descriptively, CA consists of two processes: spirantization, in which a plosive changes to a fricative, and hardening, in which a fricative changes to a plosive. Laryngeal features are also relevant, since aspirated plosives only alternate


with voiceless fricatives and non-aspirated plosives with voiced fricatives, i.e. the alternation is strictly between the obstruents of rows (I) and (III), or (II) and (IV).4, 5

2.3 Spirantization: (I) > (III), (II) > (IV)
a. (I) > (III)
mac�a [r�]om (< t�om) ‘fat of a seal’ (seal fat)
c�ol�i [�]os (< q�os) ‘neck of a reindeer’ (neck reindeer)

b. (II) > (IV)
p�eq [v]��x (< p��x) ‘chicken soup’ (chicken soup)
mac�a [z]us (< cus) ‘meat of a seal’ (seal meat)

2.4 Hardening: (III) > (I), (IV) > (II)
a. (III) > (I)
c�x�f [q�]a- (< �a-) ‘to shoot a bear’ (bear shoot)
cus [t�]a- (< r�a-) ‘to bake meat’ (meat bake)

b. (IV) > (II)
tux [k]e- (< �e-) ‘to take an axe’ (axe take)
p�n�nx [t]�u- (< r�u-) ‘to teach one's sister’ (one's sister teach)

The phonological contexts of spirantization and hardening are in complementary distribution. Spirantization takes place when the target (a plosive) follows a vowel, a glide, or a plosive (2.5). There is no spirantization when the target follows a fricative or a nasal (2.6).

2.5 Spirantization, by preceding segment
Vowel: mac�a [r�]om ‘fat of a seal’
Glide: k��nraj [r�]om ‘fat of a duck’; k��nraj [v]��x ‘duck soup’
Plosive: �t [r�]om ‘fat of a species of duck’; amsp [v]��x ‘soup of a species of seal’


2.6 No spirantization, by preceding segment
Fricative: c�x�f t�om ‘bear fat’; c�x�f p��x ‘bear soup’
Nasal: k�e� t�i ‘sun ray’; rum d�f ‘Rum (person)’s house’

Hardening, on the other hand, occurs when the target (a fricative) follows either a fricative or a nasal (2.7). When any other segment precedes the target, hardening does not occur (2.8).

2.7 Hardening, by preceding segment
Fricative: cx�f [q�]a- (< �a-) ‘to shoot a bear’; lovr� [c]osq- (< zosq-) ‘to break a spoon’
Nasal: qan [d]�u-6 (< r�u-) ‘to teach a dog’

2.8 No hardening, by preceding segment
Vowel: �� �a- ‘to shoot an otter’; ma r�a- ‘to bake dried fish’
Plosive: �t �a- ‘to shoot a species of duck’
Glide: k�j seu- ‘to dry a sail’

Although the phonological conditions of these alternations seem complex, they turn out to be less so once we focus on the output strings they create. Namely, the resulting segmental sequence is always vowel-fricative, glide-fricative or plosive-fricative on the one hand, and fricative-plosive or nasal-plosive on the other. In sum, spirantization and hardening conspire to achieve the segmental sequences illustrated below.

2.9 Structural goals of spirantization and hardening
a. vowel / glide / plosive + fricative
b. fricative / nasal + plosive

Whether this sequence is accomplished by spirantization or hardening is a matter of input. Spirantization is activated when a plosive is in the input,


whereas hardening is activated when a fricative is in the input. In the past, many approaches have overlooked this generalization and described the rules as if they had independent structural goals. This is not the case.
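The generalization that both processes serve the single structural goal in 2.9 can be sketched directly. This is my toy model, not the paper's formalism: the segment symbols and the plosive-fricative pairings are ASCII placeholders (the actual Nivkh symbols are not reproducible in this copy), and laryngeal detail is ignored.

```python
# Toy model of Consonant Alternation (illustration only): the surface
# manner of a morpheme-initial obstruent is chosen so that the output
# sequences of 2.9 result, whichever process that takes.
PLOSIVE_TO_FRICATIVE = {"p": "v", "t": "r", "q": "x"}   # placeholder pairs
FRICATIVE_TO_PLOSIVE = {v: k for k, v in PLOSIVE_TO_FRICATIVE.items()}

def manner(seg: str) -> str:
    if seg in PLOSIVE_TO_FRICATIVE:  return "plosive"
    if seg in FRICATIVE_TO_PLOSIVE:  return "fricative"
    if seg in {"m", "n"}:            return "nasal"
    if seg in {"j", "h"}:            return "glide"
    return "vowel"

def alternate(preceding: str, target: str) -> str:
    """Surface form of a target obstruent after a preceding segment."""
    if manner(preceding) in {"vowel", "glide", "plosive"}:
        # goal 2.9a: the target must surface as a fricative (spirantization)
        return PLOSIVE_TO_FRICATIVE.get(target, target)
    else:
        # goal 2.9b: the target must surface as a plosive (hardening)
        return FRICATIVE_TO_PLOSIVE.get(target, target)
```

Which of the two processes fires depends only on the input manner of the target; the structural goal is the same, matching the conspiracy described above.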

Let us now move to the morphosyntactic conditioning. CA targets a segment at the left edge of a derived morphosyntactic unit in the presence of a preceding segment. CA applies cyclically to every left edge of a morphosyntactic unit until the maximal projection (NP, VP) is reached.

2.10 Means of derivation
Prefixation: p�-[r�]u (< t�u) ‘one’s own sledge’ (REF-sledge)
Postposition: t��x-tox ‘towards the top’ (top-ALL); tu-rox ‘towards a lake’; qan-dox ‘towards a dog’
Reduplication: t�k[r�]�k- ‘to be silent’ (Sakhalin dialect, Hattori, 1962: 107)
NP formation: mac�a [r�]om ‘fat of a seal’
VP formation: cx�f [q�]a- (< �a-) ‘to shoot a bear’

On the other hand, CA never targets segments in a non-derived environment, nor does it apply across an XP boundary, as shown in 2.11 and 2.12, respectively.

2.11 CA does not apply in a non-derived environment
utku, *ut[�]u ‘man’
n��s, *n��[c�] ‘teeth’

2.12 No CA across an XP boundary (subject-predicate)
e�l� r�o- ‘The child holds (something)’ = [NP e�l�] [VP r�o-] (‘child’ is subject)

Example 2.13 below differs minimally from example 2.12 above with respect to the application of CA. In 2.13, CA applies since the noun is the object of the following predicate; the two words thus form a VP.


2.13 e�l� [t�]o- ‘(Someone) holds the child’ = [VP [NP e�l�] [V t�o-]] (‘child’ is object)

2.2. The spirantization – hardening asymmetry

There is one environment in which the regular pattern of CA as depicted above fails to apply: nouns beginning with a fricative never undergo hardening. In such a case, the structural goal of CA (2.9) is not achieved, and the otherwise illicit fricative-fricative or nasal-fricative sequence appears.

2.14
a. t�ulv vo, *t�ulv [b]o ‘winter village’ (winter village)
b. c���r vox, *c���r [b]ox ‘a hill covered with grass’ (grass hill)
c. t�f r��, *t�f [t�]� ‘entrance door’ (house door)
d. t�e� vaqi, *t�e� [b]aqi ‘coal box’ (coal box)

Previous works have either described this context as an exception to CA or did not discuss it at all. In most cases, these works simply stipulated that a) nouns do not undergo hardening, or alternatively b) only transitive verbs undergo hardening. Once stated as a condition in this way, the application of hardening to nouns can indeed be avoided. However, adding such a condition (in either form) to a phonological rule pairs prosodic phonology with specific category labels (transitive verb, noun), which is unlikely to occur in natural languages (Nespor and Vogel, 1986; Selkirk, 1986, etc.).7 Most critically, it is explanatorily unsatisfying: why should hardening be restricted to transitive verbs (or alternatively, why should nouns be an exception to hardening)? No literature provides a satisfactory answer to this question.

The tacit assumption prevailing in the previous works is that the input to CA is the citation form, i.e. the form that appears in isolation. Following this assumption, the transitive verbs ought to undergo hardening since they initiate with a fricative in the citation form. However, there is no a priori reason that the citation form should be the underlying form. In Shiraishi (2000), I defended the position that the citation form of these transitive verbs cannot be the underlying form if we want to advocate a phonologically plausible analysis of the observed spirantization-hardening asymmetry. The lack of hardening in nouns can be interpreted as evidence that CA consists solely of spirantization, without hardening. I argued that transitive verbs of Nivkh initiate with a plosive at the underlying level, instead of the fricative that appears in the citation form. Initiating with a plosive, transitive verbs undergo spirantization in the same way as nouns do.8, 9

2.15
                  Previous analyses                      Shiraishi (2000)
                  VP ‘shoot a bear’   NP ‘bird soup’     VP ‘shoot a bear’   NP ‘bird soup’
Underlying form   cx�f �a-            p�eq p��x          cx�f q�a-           p�eq p��x
Spirantization    not applicable      p�eq [v]��x        blocked             p�eq [v]��x
Hardening         cx�f [q�]a-                            not applicable
Surface form      cx�f q�a-           p�eq v��x          cx�f q�a-           p�eq v��x

The analysis in Shiraishi (2000) leaves hardening out of the list of phonological processes; nouns do not undergo hardening since there is no hardening in the phonology of the language.

2.16
                  Previous analyses   Shiraishi (2000)
Underlying form   t�ulv vo            t�ulv vo
Spirantization    not applicable      not applicable
Hardening         t�ulv [b]o
Surface form      �t�ulv bo           t�ulv vo
(�: incorrect output)

This analysis is free from category-specific specification in the structural description of the rule, which was inevitable in the previous analyses.

Although this analysis explains nicely why fricative-initial nouns never undergo hardening in Nivkh, it is not without problems. First, it manipulates the underlying form of a specific lexical category (transitive verbs) in order to explain phonologically exceptional behavior. Although such 'prespecification' at the underlying level is not an uncommon way to approach phonological exceptions (cf. Inkelas, Orgun and Zoll, 1997, amongst others), such an approach does not explain why only this particular class of words needs to undergo such manipulation. Since prespecification puts unpredictable information into the lexicon, it is a strong descriptive device which leaves little space for phonological generalizations. Contrary to what seems to be the case at first glance, the analytical gain of Shiraishi (2000) over previous analyses is therefore not so obvious. One may rightly ask what the difference is between the two analyses, which claim that a) nouns are exceptions to hardening (previous analyses) or b) transitive verbs undergo spirantization because they initiate with plosives underlyingly (Shiraishi 2000). In other words, it remains an arbitrary choice that only transitive verbs, and not other categories, undergo prespecification.

Secondly, the relationship between the underlying form and the citation form is obscured in transitive verbs. By positing a form other than the citation form as the underlying form, the citation form must always be derived from the underlying form by some morphological operation. That is, Shiraishi (2000) created an asymmetry between the morpholexical make-up of nominal and verbal stems.

2.17
                  Nominal stem   Verbal stem
Underlying form   p��x           q�a-
Surface form      p��x           �a-

In fact, this asymmetry describes the historical path of derivation of transitive verbs (Jakobson, 1957; Austerlitz, 1977). On synchronic grounds, however, it is highly doubtful whether such a morphological operation can be justified.

In the next section I propose an alternative approach to the spirantization-hardening (or noun-transitive verb) asymmetry, which makes use neither of prespecification nor of information about category labels. Instead, I will argue that a correspondence relation between output forms plays a decisive role in distinguishing the phonological behavior of the two groups. Once stated this way, nothing needs to be stipulated in order to derive the surface form; it follows naturally from the phonological principles of the language.

2.3. Noun-verb asymmetry as Base-Identity

In Nivkh, verbal and nominal stems differ from each other in one crucial morphological aspect: verbal stems must always end in a morphological extension, but nominal stems need not. Put differently, verbal stems never surface in isolation, whereas nominal stems do. This means that bare verbal stems cannot function as citation forms. Usually, the form with an infinitival suffix (-d�, -t�) provides the citation form.

2.18
Stem                         /�a/ ‘to shoot~’   /r�o/ ‘to take’
Infinitive (citation form)   �a-d�              r�o-d�
‘when~’                      �a-�an             r�o-�an

2.19
                /vo/ ‘village’   /�ota/ ‘town’
Citation form   vo               �ota
Allative        vo-rox           �ota-rox

As mentioned in section 1, independent forms often exercise special influence on the realization of morphologically related forms in derived contexts. For instance, in certain varieties of English the existence of the form condense guarantees that the vowel of the second syllable in the morphologically related word condensation does not reduce to a schwa.

2.20
co�nd[�]nsa�tion   co�mp[�]nsa�tion
cond[��]nse        co�mp[�]nsa�te

Phonology would expect the unstressed vowel of condensation to surface with a schwa, as is the case with the structurally similar compensation. The usual explanation for this asymmetry is that vowel reduction in condensation is blocked by virtue of the existence of the morphologically related form condense, which appears with a full vowel [�] (Chomsky and Halle, 1968: 110-116). Compensation, on the other hand, lacks such a morphologically related form with a full vowel. Hence the unstressed vowel reduces to a schwa, following the phonological norm of the language.

Another example comes from Korean. In Korean, a stem-final consonant cluster surfaces only when it is followed by a vowel-initial suffix. In combination with a consonant-initial suffix, the cluster is simplified to a single consonant (Kenstowicz, 1996: 375).

2.21
Stem            /kaps/ ‘price’   /talk/ ‘chicken’
Citation form   kap              tak
Nominative      kaps-i           talk-i
Comitative      kap-k'wa         tak-k'wa

In the speech of the younger generation of Seoul, however, simplification over-applies to contexts where a vowel-initial suffix follows the stem.

2.22
Nominative   kap-i   tak-i

Interestingly, this overgeneralization does not apply to verbal stems. Here the consonant cluster surfaces.

2.23
Stem              /�ps/ ‘not have’        /palk/ ‘be bright’
Past-informal     �ps-�ss-� (*�p-�ss-�)   palk-�ss-� (*pak-�ss-�)
Non-past-formal   �p-t'a                  pak-t'a

Kenstowicz analyzed the absence of cluster simplification in verbal stems as due to the lack of corresponding citation forms. As in Nivkh, verbal stems in Korean never appear in isolation; they must always appear with an inflectional ending. In contrast, nominal stems are free to appear without any inflectional ending, so they exercise strong influence on the realization of their derivatives. Verbal stems, on the other hand, surface with consonant clusters since there is no isolated counterpart which forces conformity to it. This is an instance of Base-Identity, which requires forms in derived contexts to be formally similar to the base. This is the generalization captured in the Base-Identity constraint of Kenstowicz (1.3), repeated below.

2.24 (=1.3) Base-Identity: Given an input structure [X Y], output candidates are evaluated for how well they match [X] and [Y] if the latter occur as independent words. (Kenstowicz, 1996: 372)

We can account for the noun-verb asymmetry in Korean using Base-Identity as a high-ranked constraint. By ranking Base-Identity above a faithfulness constraint which prohibits deletion of a segment in the input (MAX), nominal stems surface with a single consonant, in concordance with the base.

2.25
/kaps+i/   base: kap    Base-Identity   *CLUSTER   MAX
    kapsi                *!
�   kapi                                           *

Base-Identity is vacuously satisfied in verbal stems. Since there is no base to which verbal stems should conform, verbal stems exhibit canonical phonology. Consonant clusters surface only if a vowel-initial suffix follows; elsewhere they are simplified. A phonological markedness constraint *CLUSTER penalizes every output candidate containing a tri-consonantal cluster.

2.26
/�ps+�ss+�/   base: ø    Base-Identity   *CLUSTER   MAX
�   �ps-�ss-�
    �p-�ss-�                                        *!

2.27
/�ps-t'a/   base: ø      Base-Identity   *CLUSTER   MAX
    �ps-t'a                              *!
�   �p-t'a                                          *

The noun-verb asymmetry of hardening in Nivkh is strikingly similar to the Korean case. As in Korean, verbal stems of Nivkh are not allowed to surface in isolation; they always require a morpho-syntactic extension (2.18). This is in contrast to nominal stems, which may surface in isolation (2.19). The difference is reflected directly in their phonological behavior: verbal stems undergo hardening, nominal stems do not. In the next section I will show how this analysis works formally.

2.4. Base-Identity blocks hardening

I assume the phonological markedness constraint that induces hardening to be the Obligatory Contour Principle (OCP) [fric].10 OCP [fric] prohibits adjacent fricatives. Base-Identity, as defined in the previous section, prefers output candidates which are similar to the base. With the ranking Base-Identity >> OCP [fric], we obtain the desired output: hardening does not apply to nominal stems.

2.28
/tulv vo/   base: vo    Base-Identity   OCP [fric]   IDENT [cont]
�   tulv vo                             *
    tulv bo             *!                           *

Base-Identity is satisfied vacuously in verbal stems since they lack a base. Being free from Base-Identity, an initial fricative now hardens to a plosive in order to circumvent an OCP violation.


2.29
/c�x�f �a-/   base: ø    Base-Identity   OCP   IDENT [cont]
    c�x�f �a-                            *!
�   c�x�f [q�]a-                               *

Since Base-Identity refers to the base and not to the input, this ranking always derives the correct output regardless of the input value. This is illustrated in the tableau below, in which the verbal stem initiates with a plosive in the input (cf. Shiraishi, 2000).

2.30
/c�x�f q�a-/   base: ø   Base-Identity   OCP   IDENT [cont]
�   c�x�f q�a-
    c�x�f [�]a-                          *!    *

The present analysis correctly derives the observed output regardless of the input. There is thus no prespecification, in which input strings are fixed to take a particular form. Nor does it make use of information about category labels, a condition that was inevitable in previous descriptions in order to let hardening apply appropriately. The current analysis makes a totally different claim. There is no exception to the hardening rule (nominal stems), nor need the specific undergoer (verbal stems) be prespecified at the underlying level. Rather, the asymmetry between nominal and verbal stems follows from the existence of a base, which is an independent fact of the language. By making use of such morpho-lexical information, the current analysis accounts for the noun-verb asymmetry without appealing to language-specific stipulations.
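The evaluation logic of tableaux 2.28-2.30 can be sketched in code. The following is my own illustrative sketch, not part of the original analysis: strict constraint ranking is modeled as lexicographic comparison of violation tuples, and the ASCII forms (e.g. "cxyf", with "g" standing in for the voiced fricative) are simplified stand-ins for the Nivkh transcriptions.

```python
# Illustrative sketch only: OT evaluation under a strict ranking.
# ASCII letters stand in for Nivkh segments; the fricative class is a toy set.
FRICS = set("vzfsxg")  # toy fricative class, not the full Nivkh inventory

def profile(inp, cand, base):
    """Violation tuple under the ranking Base-Identity >> OCP[fric] >> Ident[cont]."""
    w1, w2 = cand
    base_id = 0 if base is None else int(w2 != base)   # Base-Identity
    ocp = int(w1[-1] in FRICS and w2[0] in FRICS)      # OCP[fric]: adjacent fricatives
    ident = sum(a != b for a, b in zip(inp[1], w2))    # Ident[cont] (segment-mismatch proxy)
    return (base_id, ocp, ident)

def winner(inp, candidates, base):
    # Lower tuples win; tuple position encodes constraint rank.
    return min(candidates, key=lambda c: profile(inp, c, base))

# 2.28: nominal stem with base 'vo' -> hardening blocked by Base-Identity.
print(winner(("tulv", "vo"), [("tulv", "vo"), ("tulv", "bo")], base="vo"))
# 2.29/2.30: verbal stem, no base -> the hardened candidate wins
# whether the input is set up fricative-initial or plosive-initial.
print(winner(("cxyf", "ga"), [("cxyf", "ga"), ("cxyf", "qa")], base=None))
print(winner(("cxyf", "qa"), [("cxyf", "ga"), ("cxyf", "qa")], base=None))
```

The last two calls illustrate the point made above: because Base-Identity refers to the base rather than the input, the verbal-stem winner is the same for either assumed input.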

3. Final Fricative Devoicing

Base-Identity plays a crucial role in another phonological phenomenon of Nivkh. In this section, I will discuss such a case.


3.1. Distribution of laryngeal features

As in Danish, a full contrast of laryngeal features in Nivkh obstruents is realized only in stem-initial position, which is the most prominent position, as in many other languages (cf. Beckman, 1996). In other positions, laryngeal features do not exercise a phonemic contrast, and the feature value at the surface level is predictable from the context (Jakobson, 1957: 83). In principle, non-prominent (stem-medial and final) positions only allow non-aspirated plosives and voiced fricatives. Aspirated plosives and voiceless fricatives, on the other hand, are excluded from these positions. Following Jakobson (1957), I will call them the lenis and fortis series, respectively.

3.1
Lenis obstruents    non-aspirated plosives : p t c k q
                    voiced fricatives      : v r z � �
Fortis obstruents   aspirated plosives     : p� t� c� k� q�
                    voiceless fricatives   : f r s x �

3.2
pal ‘forest’         �t�k ‘father’
p�al ‘floor’         �k�n ‘mother’
ra-d� ‘to drink’     ova ‘flour’
r�a-d� ‘to bake’     muvi ‘porridge’
                     eri ‘river’

There are two exceptional contexts in which a voiceless fricative appears in a non-prominent position: i) when preceding a plosive, and/or ii) before an I[ntonational] P[hrase] boundary (Jakobson, 1957: 83).

3.3
a. esqa-d� ‘to hate’
   taft� ‘salt’
   kins ‘evil spirit’    kins k�u-d� ‘to kill an evil spirit’
   c�x�f ‘bear’          c�x�f k�u-d� ‘to kill a bear’
   als ‘berry’           als p�e- ‘to pick berries’

b. nivx ‘human’
   erx ‘to him/her’

The examples in 3.3b indicate that it is only the absolute final position that matters; the fricative second from the right appears as voiced. In Nivkh, there are no words ending in consecutive voiceless fricatives, indicating that voicelessness is required only for the very last fricative in an IP. I assume this to be due to a restriction which I will call Final Fricative Devoicing (FFD). FFD targets every final fricative within an IP.

Stem-final voiceless fricatives appear as voiced, however, as soon as the above-mentioned conditions are removed. Thus, if a stem-final fricative is embedded in an IP, i.e. not final in the domain, and if it is not adjacent to a plosive, it becomes voiced (3.4a). This is in concordance with the phonotactics of stem-medial fricatives, which are always voiced (3.4b) unless adjacent to a plosive. This distribution is not surprising, since stem-medial fricatives are expected not to coincide with an IP-boundary.

3.4
a. [kinz it-]I ‘to go insane’
   [c�x�v l�j-]I ‘to kill a bear’
   [alz �a-]I ‘to pick berries’

b. ezmu- ‘to like~’
   urla ‘good’
   pa�la ‘red’

Outside of these two contexts, only lenis obstruents appear in non-prominent positions. Apparently, lenis obstruents have more distributional freedom than fortis obstruents, indicating their unmarked status in the phonology of Nivkh. Since non-prominent positions are predictably occupied by lenis obstruents, I assume that obstruents in these positions are unspecified for laryngeal features in the underlying form. Unless context-sensitive requirements contravene, obstruents without laryngeal specifications surface as lenis, the unmarked obstruents of the language.


3.2. Base-Identity in suffixation

Having discussed the unmarked nature of the lenis obstruents, we are now ready to look at the way FFD interacts with Base-Identity. Such a case arises when a suffix attaches to a fricative-final stem.

As in stem-medial and final positions, the initial obstruent of a suffix does not exhibit a laryngeal contrast, indicating that it is a non-prominent position. Except for a few exceptional cases, only lenis obstruents are allowed.11

3.5
-tox/rox/dox   allative (case suffix)
-�u/gu/ku      plural
-t�/d�         infinitive
-gu/ku         causative

When a suffix attaches to a stem, the redundant [+voice] specification of the stem-final segment spreads to the initial obstruent of the suffix.

3.6
ra-d�       ‘to drink-INF’
pil-d�      ‘big-INF’
amam-d�     ‘to walk-INF’
ifk-t�      ‘to harness-INF’
jup-t�      ‘to bind-INF’
ro-gu-d�    ‘to help-CAU-INF’
l�t-ku-d�   ‘to do-CAU-INF’
c�am-gu     ‘shaman-PL’
c�am-dox    ‘shaman-ALL’

There is an interesting discrepancy between fricative-final nominal and verbal stems in this context: following a verbal stem, the initial segment of a suffix is always voiced (3.7a), while following a nominal stem, it is always voiceless (3.7b).

3.7
a. fuv-d�       ‘to blow/to saw-INF’
   i�-d�        ‘to kill-INF’
   t�v�-d�      ‘to go inside the house-INF’
   jar-d�       ‘to feed-INF’
   roz-gu-d�    ‘to divide-CAU-INF’
   t�mz-gu-d�   ‘to drop-CAU-INF’

b. kins-ku      ‘evil spirit-PL’
   c�x�f-ku     ‘bear-PL’
   or�r�-ku     ‘Uilta-PL’
   t�f-tox      ‘house-ALL’
   ti�r�-tox    ‘wood-ALL’

The reason for this discrepancy is not immediately clear. In particular, the final voiceless fricative of nominal stems is a mystery. Once a suffix is attached, it is no longer in the context of FFD, so nothing prevents it from appearing as the unmarked voiced fricative. In fact, this is the case with verbal stems: final fricatives of verbal stems are systematically voiced (3.7a). The other context-sensitive requirement, namely precedence to a plosive, cannot be the reason either, since these suffixes have a voiced variant, which surfaces when following a (redundantly) [+voice] segment (3.6, 3.7a). The derivatives of verbal stems in 3.7a show that the initial plosive of these suffixes can accommodate a (preceding) voiced fricative, unlike plosives in a stem. Yet this option is not adopted in nominal stems. In short, these context-sensitive requirements cannot explain the different behavior of final fricatives in nominal and verbal stems.

Under Base-Identity, however, such a discrepancy is explicable. Recall that nominal and verbal stems have different morpho-lexical compositions. Nominal stems can surface without any morphological ending, making the last fricative a target of FFD. In contrast, the final fricative of a verbal stem is always followed by a morphological extension, making it irrelevant to FFD. Since Base-Identity demands that derivatives phonologically conform to the base, nominal derivatives conform to their base, which ends in a voiceless fricative (due to FFD). This is not the case, however, for verbal stems, since they have no base and are therefore not subject to such pressure. As a consequence, verbal stems undergo canonical phonology, and fricatives in non-prominent positions appear as lenis, the unmarked obstruents of the language.
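This prediction can be made concrete with a small sketch. The following is my own toy illustration, not the author's formalism: the segment mappings are a simplified two-pair set, "kinz"/"fuv" are ASCII stand-ins echoing kins 'evil spirit' (3.7b) and fuv- 'to blow' (3.7a), and the infinitive suffix is rendered as a toy "ta".

```python
# Toy illustration (mine, not the paper's): nominal stems conform to their
# FFD-devoiced base; verbal stems lack a base, so the suffix voicing splits.
FRIC_DEVOICE = {"v": "f", "z": "s"}   # voiced -> voiceless fricatives (toy set)
SUFF_VOICE = {"t": "d", "k": "g"}     # suffix-initial voicing variants (toy set)

def base_form(stem, is_noun):
    """Isolation form: only nouns have one, and FFD devoices its final fricative."""
    if not is_noun:
        return None
    return stem[:-1] + FRIC_DEVOICE.get(stem[-1], stem[-1])

def add_suffix(stem, suffix, is_noun):
    """Base-Identity: a nominal derivative copies its base; the suffix-initial
    obstruent then agrees in voicing with the preceding stem-final segment."""
    base = base_form(stem, is_noun)
    stem_out = base if base is not None else stem
    if stem_out[-1] in FRIC_DEVOICE:  # stem ends in a voiced fricative
        suffix = SUFF_VOICE.get(suffix[0], suffix[0]) + suffix[1:]
    return stem_out + "-" + suffix

print(add_suffix("kinz", "ku", is_noun=True))   # noun: voiceless suffix, as in 3.7b
print(add_suffix("fuv", "ta", is_noun=False))   # verb: voiced suffix, as in 3.7a
```

The noun conforms to its devoiced base ("kins-ku"), while the baseless verb keeps its voiced final fricative and voices the suffix ("fuv-da").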

Finally, it is important to note that reference to laryngeal specifications using Input-to-Output correspondence constraints is not a viable option in this context. Recall that there is no laryngeal contrast in stem-final position in Nivkh. A phonological theory which minimizes the specification of predictable features in underlying representations, which is the one adopted here, makes it impossible for Input-to-Output constraints to refer to the voiceless status of stem-final fricatives.12 Thus their voicelessness must come from somewhere else. According to the current analysis, it originates from the base, the independently occurring isolated form.

4. Conclusion

In this paper I have discussed phonological asymmetries between nominal and verbal stems of Nivkh, as observed in two phonological phenomena, CA and FFD. Though the asymmetries themselves look very different on the surface, this article has made explicit that they are subject to a common generalization, Base-Identity. Given the asymmetric composition of nouns and verbs, Base-Identity makes two predictions: i) nominal and verbal derivatives exhibit different phonological patterns, and ii) it is the nominal stem which exhibits the non-canonical phonology, given the strong pressure from the base. Both predictions were borne out in the phonological phenomena discussed above. The base plays a decisive role in computing the phonology of nominal and verbal derivatives in both CA and FFD. As for CA, the current analysis correctly predicts that nominal derivatives accommodate the otherwise illicit segmental sequences (fricative-fricative, nasal-fricative), while verbal derivatives do not. This analysis is superior to previous accounts since it makes no direct use of the notion of exception, which was inevitable in previous works. Rather, the suggested analysis relates the asymmetry in phonology to the compositional asymmetry between nouns and verbs.

As for FFD, nominal derivatives showed conformity to their base in ending in a voiceless fricative. Verbal stems, on the other hand, do not show such conformity since they lack a base. Unlike nominal derivatives, the stem-final fricatives of verbal derivatives appear as lenis, following the canonical phonology of Nivkh. Base-Identity provides us with the mechanism underlying the noun-verb asymmetry, and it correctly predicts their phonological behavior with respect to the canonical phonology of the language.


Acknowledgements

I would like to thank Dicky Gilbers, Angela Grimm, Maartje Schreuder, Jeroen van de Weijer and the audiences of the ULCL Phonology meeting at Leiden (27-05-2003) and the TABU dag (20-06-2003, Groningen) for comments on parts of this article. I bear all responsibility for errors.

Notes

1 “…identity effects will come into play only to the extent that the immediate constituents composing the complex structure constitute independently occurring outputs…” (Kenstowicz 1996: 373); “The base of an OO-correspondence relation is a licit output word, which is both morphologically and phonologically well-formed” (Benua 1997a: 29); “The bound form of a stem is segmentally identical with its corresponding free form” (Ito and Mester 1997: 431).

2 See www.let.rug.nl/~toshi/ for more information.
3 The rhotic r of Nivkh is classified here and elsewhere in the literature (e.g. Trubetzkoj 1939) as a voiced fricative since it patterns as such in the CA system. Its voiceless counterpart r� is an apical trill containing portions without vocal cord vibration (Ladefoged and Maddieson 1996: 236).

4 Regarding this nature of CA, one may postulate a single laryngeal feature (rather than two) for both plosives and fricatives, e.g. [+spread glottis] for both aspirated plosives and voiceless fricatives. Such an analysis is proposed by Jakobson (1957) and Blevins (1993). See also section 3 below.

5 Segments that underwent CA are put in square brackets. Abbreviations are: ALL = allative, asp = aspiration, I = Intonational phrase, INF = infinitive, NP = noun phrase, PL = plural, VP = verb phrase, XP = maximal projection.

6 The alternation (r >) t > d is due to post-nasal voicing.
7 CA exhibits aspects of prosodic phonology (I am using this term to contrast with lexical phonology); it is sensitive to pause insertion and to speech rate. I would classify it as a P-structure rule in the terminology of Selkirk (1986). P-structure rules exhibit phonological properties of prosodic phonology, yet they are sensitive to syntactic bracketing (Selkirk 1986).

8 This line of analysis has antecedents, amongst them Kenstowicz and Kisseberth (1979), Rushchakov (1981), Kaisse (1985), and Blevins (1993). Interestingly, Lev Shternberg, the pioneer of Nivkh study, also assumed plosive-initial forms to be the input to transitive structures (Shternberg 1908).
9 Spirantization and hardening are not ordered relative to each other in the tableau below.
10 The post-nasal context requires a different markedness constraint, but I omit it from the discussion below. See Shiraishi (2000) for details.
11 Following a velar or a uvular plosive, the initial velar of a suffix appears as [x], spirantizing the former at the same time: �t�x-xu < �t�k+PL ‘fathers’.
12 On the other hand, OO-constraints are known to be able to make reference to non-contrastive features. See Benua (1997b) and Steriade (2000) for such cases.

References

Austerlitz, R. (1977). The study of Paleosiberian languages. In: D. Armstrong and C. van Schoonveld (eds.), Roman Jakobson: Echoes of His Scholarship. Peter de Ridder Press, Lisse, 13-20.

Beckman, J. (1996). Positional Faithfulness. PhD dissertation, University of Massachusetts, Amherst.

Benua, L. (1997a). Transderivational Identity: Phonological Relations between Words. PhD dissertation, University of Massachusetts, Amherst.

Benua, L. (1997b). Affix classes are defined by faithfulness. In: V. Miglio and B. Moren (eds.), 1-26.

Blevins, J. (1993). Gilyak lenition as a phonological rule. Australian Journal of Linguistics, 13: 1-21.

Burzio, L. (1996). Surface constraints versus underlying representation. In: J. Durand and B. Laks (eds.), 125-144.

Burzio, L. (1997). Strength in Numbers. In: V. Miglio and B. Moren (eds.), 27-52.

Burzio, L. (2002). Missing players: Phonology and the past-tense debate. Lingua, 112: 157-199.

Chomsky, N. and M. Halle (1968). The Sound Pattern of English. Harper & Row, New York.

Durand, J. and B. Laks (eds.) (1996). Current Trends in Phonology: Models and Methods. University of Salford Publications, Salford, Manchester.

Hattori, T. (1962). Versuch einer Phonologie des Südostgiljakischen (II): Alternation. Journal of Hokkaido Gakugei University (Sapporo), 13-2: 29-96.

Inkelas, S., O. Orgun and C. Zoll (1997). The Implications of Lexical Exceptions for the Nature of Grammar. In: I. Roca (ed.), 393-418.

Ito, J. and A. Mester (1997). Correspondence and Compositionality: The Ga-gyo Variation in Japanese Phonology. In: I. Roca (ed.), 419-462.

Jakobson, R. (1957). Notes on Gilyak. In: Roman Jakobson, Selected Writings II: Word and Language. Mouton, The Hague and Paris, 72-102.

Kaisse, E. (1985). Connected Speech. Academic Press, Orlando.

Kenstowicz, M. (1996). Base-Identity and Uniform Exponence: Alternatives to Cyclicity. In: J. Durand and B. Laks (eds.), 365-395.

Kenstowicz, M. (1997). Uniform exponence: Exemplification and extension. In: V. Miglio and B. Moren (eds.), 139-155.

Kenstowicz, M. and C. Kisseberth (1979). Generative Phonology: Description and Theory. Academic Press, New York.

Krejnovich, E. (1937). Fonetika nivxskogo (giljackogo) jazyka [Phonetics of the Nivkh (Gilyak) language]. Uchpedgiz, Moskva-Leningrad.

Ladefoged, P. and I. Maddieson (1996). The Sounds of the World’s Languages. Blackwell, Oxford.

Miglio, V. and B. Moren (eds.) (1997). University of Maryland Working Papers in Linguistics, vol. 5.

Nespor, M. and I. Vogel (1986). Prosodic Phonology. Foris, Dordrecht.

Roca, I. (ed.) (1997). Derivations and Constraints in Phonology. Clarendon Press, Oxford.

Rushchakov, V. (1981). Akusticheskie xarakteristiki soglasnyx nivxskogo jazyka (avtoreferat). PhD dissertation, Akademija Nauk SSSR, Leningradskoe otdelenie instituta jazykoznanija.

Savel’eva, V. and C. Taksami (1970). Nivxsko-russkij slovar’ [Nivkh-Russian dictionary]. Sovetskaja Enciklopedija, Moskva.

Selkirk, E. (1986). On derived domains in sentence phonology. Phonology Yearbook, 3: 371-405.

Shiraishi, H. (2000). Nivkh consonant alternation does not involve hardening. Journal of Chiba University Eurasian Society, 3: 89-119 (also available at www.let.rug.nl/~toshi/list_of_publication.htm). An abridged version appeared in the Proceedings of the 120th meeting of the Linguistic Society of Japan, 42-47.

Shternberg, L. (1908). Materialy po izucheniju gilijackogo jazyka i fol’klora. In: Obrazcy narodnoj slovesnosti, Vol. 1, Part I. Imper. Akademii Nauk, St. Petersburg.

Steriade, D. (2000). Paradigm Uniformity and the Phonetics-Phonology Boundary. In: M. Broe and J. Pierrehumbert (eds.), Papers in Laboratory Phonology 5. Cambridge University Press, Cambridge, 313-334.

Trubetzkoj, N. (1939). Grundzüge der Phonologie. Travaux du Cercle Linguistique de Prague, Prague.

The Influence of Speech Rate on Rhythm Patterns

Maartje Schreuder and Dicky Gilbers

1. Introduction1

The topic of this paper is how rhythmic variability in speech can be accounted for both phonologically and phonetically. The question is whether a higher speech rate leads to adjustment of the phonological structure, or just to 'phonetic compression', i.e. shortening and merging of vowels and consonants, with preservation of the phonological structure. We claim that the melodic content of a phonological domain is indeed optionally adjusted when the speech rate increases. In other words, every speech rate has its own preferred register: in terms of Optimality Theory (Prince and Smolensky, 1993), its own ranking of constraints.

We will investigate prosodic variability as part of our main research project, which involves a comparison of the analyses of music and language. Our ultimate aim is to provide evidence for the assumption that all temporal behavior is structured similarly (cf. Liberman, 1975). Gilbers and Schreuder (to appear) show that Optimality Theory owes a lot to the constraint-based music theory of Lerdahl and Jackendoff (1983). Based on the great similarities between language and music, we claim that musical knowledge can help in solving linguistic issues.

In this paper, we will show that clashes are avoided in allegro tempo. In both language and music, distances between beats are enlarged, i.e. there appears to be more melodic content between beats. To illustrate this, we ran a pilot experiment in which we elicited fast speech. As expected, speech rate plays a role in rhythmic variability.

The paper is organized as follows. In section 2 the data of the experiment are introduced. Section 3 addresses the phonological framework of Optimality Theory and the different rankings of andante and allegro speech. The method of the experiment is discussed in section 4, and the auditive and acoustic analyses plus the results follow in section 5. The perspectives of our analysis are discussed in the final section.


2. Data

We will discuss three types of rhythmic variability in Dutch. The first we will call "stress shift to the right"; the second "stress shift to the left"; and the third "beat reduction". In the first type, as exemplified in stúdietòelage (s w s w w) ‘study grant’, we assume that this compound can be realized as stúdietoelàge (s w w s w) in allegro speech. Perfèctioníst (w s w s) is an example of "stress shift to the left", and we expect the realization pèrfectioníst (s w w s) in allegro speech. The last type does not concern a stress shift, but a stress reduction. In zùidàfrikáans (s s w s) ‘South African’, compounding of zuid and afrikaans results in a stress clash. In fast speech this clash is avoided by means of reducing the second beat: zùidafrikáans (s w w s). Table 1 shows a selection of our data.

Table 1. Data

Type 1: stress shift to the right (andante: s w s w w; allegro: s w w s w)
stu die toe la ge      ‘study grant’
weg werp aan ste ker   ‘disposable lighter’
ka mer voor zit ter    ‘chairman of the House of Parliament’

Type 2: stress shift to the left (andante: w s w s; allegro: s w w s)
per fec tio nist   ‘perfectionist’
a me ri kaan       ‘American’
vi ri li teit      ‘virility’

Type 3: beat reduction (andante: s s w s; allegro: s w w s)
zuid a fri kaans     ‘South African’
schier mon nik oog   ‘name of an island’
gre go ri aans       ‘Gregorian’

The different rhythmic patterns are accounted for phonologically within the framework of OT.

3. Framework and phonological analysis

The mechanism of constraint interaction, the essential characteristic of OT, is also used in the generative theory of tonal music (Lerdahl and Jackendoff, 1983). In both frameworks, constraint satisfaction determines grammaticality, and in both frameworks the constraints are potentially conflicting and soft, which means violable. Violation, however, is only allowed if it leads to satisfaction of a more important, higher-ranked constraint. The great similarities between these theoretical frameworks make comparison and interdisciplinary research possible.

For example, restructuring rhythm patterns as a consequence of a higher playing rate is a very common phenomenon in music. In Figure 1 we give an example of re-/misinterpretation of rhythm in accelerated or sloppy playing.

Dotted notes rhythm → triplet rhythm

Figure 1. Rhythmic restructuring in music

In Figure 1, the "dotted notes rhythm" (left of the arrow) is played as a triplet rhythm (right of the arrow). In the dotted notes rhythm, the second note has a duration three times as long as the third; in the triplet rhythm, the second note is twice as long as the third. In fast playing it is easier to have equal durations between note onsets. Clashes are thus avoided, and one tries to distribute the notes, the melodic content, over the measures as evenly as possible, even if this implies a restructuring of the rhythmic pattern. To ensure that the beats do not come too close to each other in fast playing, the distances are enlarged, thus avoiding a staccato-like rhythm. In short, in fast tempos the musical equivalents of the Obligatory Contour Principle (OCP), a prohibition on adjacency of identical elements in language (McCarthy, 1986), become more important.

We claim that - just as in music - the allegro patterns in all the different types of data in Table 1 are caused by clash avoidance. There is a preference for beats that are more evenly distributed over the phrase. The different structures can be described phonologically as a conflict between markedness constraints, such as FOOT REPULSION (Kager, 1994), and OUTPUT-OUTPUT CORRESPONDENCE constraints (cf. Burzio, 1998) within the framework of OT. FOOT REPULSION prohibits adjacent feet and consequently prefers a structure in which feet are separated from each other by an unparsed syllable. This constraint is in conflict with PARSE-σ, which demands that every syllable is part of a foot. OUTPUT-OUTPUT CORRESPONDENCE compares the structure of a phonological word with the structure of its individual parts. For example, in a word such as fototoestel 'photo camera', OUTPUT-OUTPUT CORRESPONDENCE demands that the rhythmic structure of its part tóestel 'camera' with a stressed first syllable is reflected in the rhythmic structure of the output. In other words, OUTPUT-OUTPUT CORRESPONDENCE prefers fótotòestel, with secondary stress on toe, to fótotoestèl, with secondary stress on stel.

Whereas the normal patterns in andante speech satisfy OUTPUT-OUTPUT CORRESPONDENCE, the preference for triplet patterns in fast speech is accounted for by means of dominance of the markedness constraint, FOOT REPULSION, as illustrated in Table 2.²

Table 2. Rhythmic restructuring in language

a. ranking in andante speech:

      /fototoestel/       | OUTPUT-OUTPUT CORR. | FOOT REPULSION | PARSE-σ
    ☞ (fóto)(tòestel)     |                     |       *        |
      (fóto)toe(stèl)     |         *!          |                |    *

b. ranking in allegro speech:

      /fototoestel/       | FOOT REPULSION | OUTPUT-OUTPUT CORR. | PARSE-σ
      (fóto)(tòestel)     |       *!       |                     |
    ☞ (fóto)toe(stèl)     |                |          *          |    *
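The reranking argument in Table 2 can be illustrated with a minimal evaluator. This is our own sketch of strict-domination evaluation, not the authors' formalism; the violation counts follow the tableaux, with constraint names abbreviated (OO for OUTPUT-OUTPUT CORRESPONDENCE, FTREP for FOOT REPULSION, PARSE for PARSE-σ).

```python
# Sketch of OT evaluation by strict domination (our illustration; the
# violation profiles are taken from Table 2).
VIOLATIONS = {
    "(fóto)(tòestel)": {"OO": 0, "FTREP": 1, "PARSE": 0},
    "(fóto)toe(stèl)": {"OO": 1, "FTREP": 0, "PARSE": 1},
}

def evaluate(ranking):
    """The winner is the candidate whose violation vector, read in ranking
    order, is lexicographically smallest: a violation of a higher-ranked
    constraint is fatal no matter what happens lower down."""
    return min(VIOLATIONS, key=lambda cand: [VIOLATIONS[cand][c] for c in ranking])

andante = evaluate(["OO", "FTREP", "PARSE"])   # correspondence dominates
allegro = evaluate(["FTREP", "OO", "PARSE"])   # markedness dominates
assert andante == "(fóto)(tòestel)"
assert allegro == "(fóto)toe(stèl)"
```

Swapping only the two top constraints flips the winner, which is exactly the tempo effect the analysis encodes.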

Dutch is described as a trochaic language (Neijt and Zonneveld, 1982). Table 2a shows a preference for an alternating rhythm. The dactyl pattern as preferred in Table 2b, however, is a very common rhythmic pattern of prosodic words in languages such as Estonian and Cayuvava: every strong syllable alternates with two weak syllables (cf. Kager, 1994). We assume that the rhythm grammar, i.e. constraint ranking, of Dutch allegro speech resembles the grammar of these languages. In the next section we will explore whether we can find empirical evidence for our hypothesis.

4. Method

To find out whether people indeed prefer triplet patterns in allegro speech, we ran a pilot experiment in which we tried to elicit fast speech. Six subjects participated in a multiple-choice quiz in which they competed with each other in answering twenty simple questions as quickly as possible. In this way, we expected them to speak fast without concentrating too much on their own speech. In Table 3 one of the quiz items is depicted.

Table 3. Quiz item

Q4   President Bush is een typische     ‘President Bush is a typical ’
A1   intellectueel                      ‘intellectual’
A2   amerikaan                          ‘American’
A3   taalkundige                        ‘linguist’

We categorized the obtained data as allegro speech. As a second task the subjects were asked to read out the answers at a normal speaking rate, embedded in the sentence ik spreek nu het woord … uit 'now I pronounce the word … '. This normal speaking rate generally means that the subjects produce the words at a rate of approximately 180 words per minute, which we categorize as andante speech. All data were recorded on minidisk in a soundproof studio and normalized in CoolEdit; normalizing to 100% scales the peak level of the signal to approximately 0 dB full scale.

Six trained listeners judged the data auditively and indicated where they perceived secondary stress. After this auditive analysis the data were phonetically analyzed in PRAAT (Boersma and Weenink, 1992). We compared the andante and allegro data by measuring duration, pitch, intensity, spectral balance and rhythmic timing (Sluijter, 1995; Couper-Kuhlen, 1993; Cummins & Port, 1998; Quené & Port, 2002; a.o.). Sluijter claims that duration and spectral balance are, in that order, the main correlates of primary stress. In our experiment, we are concerned with secondary stress.


For the duration measurements, the rhymes of the relevant syllables were observed. For example, in the allegro style answer A2 amerikaan in Table 3, we measured the first two rhymes and compared the values in ms with the values for the same rhymes at the andante rate. In order to make this comparison valid, we equalized the total durations of both realizations by multiplying the duration of the allegro version with a so-called 'acceleration factor', i.e. the duration of the andante version divided by the duration of the allegro version. According to Eefting and Rietveld (1989) and Rietveld and Van Heuven (1997), the just noticeable difference for duration is 4.5%. If the difference in duration between the andante and the allegro realization did not exceed this threshold, we considered the realizations as examples of the same speech rate and neglected them for further analysis.
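The duration comparison described above can be sketched as follows. This is our reconstruction of the arithmetic, not the authors' actual script; the word durations at the end are hypothetical values chosen only to show the 4.5% threshold at work.

```python
# Sketch (ours) of the duration comparison: equalize allegro rhyme
# durations with the acceleration factor, then apply the 4.5% just
# noticeable difference for duration (Eefting & Rietveld, 1989).

JND_DURATION = 0.045  # just noticeable difference for duration

def acceleration_factor(andante_total, allegro_total):
    """Duration of the andante version divided by the allegro version."""
    return andante_total / allegro_total

def equalize(allegro_rhymes, factor):
    """Scale allegro rhyme durations (in seconds) up by the acceleration
    factor so both realizations have the same total duration."""
    return [d * factor for d in allegro_rhymes]

def same_rate(andante_total, allegro_total):
    """True if the realizations differ by less than the duration JND,
    in which case the pair is discarded from further analysis."""
    return abs(andante_total - allegro_total) / andante_total < JND_DURATION

# Hypothetical totals in seconds: a clear tempo difference ...
assert not same_rate(0.90, 0.70)
# ... and a pair that does not exceed the 4.5% threshold.
assert same_rate(0.90, 0.88)
```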

For the pitch measurements, we took the value in Hz in the middle of the vowel. The just noticeable difference for pitch is 2.5% ('t Hart et al., 1990). For the intensity measurements, we registered the mean value in dB of the whole syllable.

The next parameter we considered concerns spectral balance. Sluijter (1995) claims that the spectral balance of the vowel of a stressed syllable is characterized by more perceived loudness in the higher frequency region, because of the changes in the source spectrum due to a more pulse-like shape of the glottal waveform. The vocal effort, which is used for stress, generates a strongly asymmetrical glottal pulse. As a result of the shortened closing phase, there is an increase of intensity around the four formants in the frequency region above 500 Hz. Following Sluijter (1995) we compared the differences in intensity of the higher and lower frequencies of the relevant syllables in both tempos.
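The higher-versus-lower-frequency comparison can be sketched as a band-energy ratio. This is our own stdlib illustration, not the PRAAT script the authors used; the 500 Hz split point follows the text, while the synthetic signals and sampling rate are invented for the demonstration.

```python
import math

# Rough sketch (ours) of a spectral balance measure: energy above 500 Hz
# relative to energy below it, computed from a naive DFT.

def dft_power(signal):
    """Power per frequency bin (naive O(N^2) DFT, positive bins only)."""
    n = len(signal)
    power = []
    for k in range(n // 2):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(signal))
        im = sum(-s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(signal))
        power.append(re * re + im * im)
    return power

def spectral_balance(signal, fs, split=500.0):
    """High-band minus low-band energy in dB around the split frequency."""
    power = dft_power(signal)
    df = fs / len(signal)
    low = sum(p for k, p in enumerate(power) if k * df < split)
    high = sum(p for k, p in enumerate(power) if k * df >= split)
    return 10 * math.log10(high / low)

fs = 8000
t = [i / fs for i in range(400)]
# Two synthetic "vowels" with a 200 Hz fundamental; the second has a
# stronger 2 kHz component, mimicking the stressed vowel's extra
# high-frequency loudness.
unstressed = [math.sin(2 * math.pi * 200 * x) + 0.05 * math.sin(2 * math.pi * 2000 * x) for x in t]
stressed = [math.sin(2 * math.pi * 200 * x) + 0.5 * math.sin(2 * math.pi * 2000 * x) for x in t]
assert spectral_balance(stressed, fs) > spectral_balance(unstressed, fs)
```

A real analysis would of course work on recorded vowels and a proper FFT; the point is only that "more tilt toward the high band" is a single comparable number per vowel.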

Finally, we considered rhythmic timing. The idea is that the beats in speech are separated from each other at an approximately equal distance independent of the speech rate. In other words, a speaker more or less follows an imaginary metronome. If he/she speaks faster, more melodic content will be placed between beats, which results in a shift of secondary stress. This hypothesis will be confirmed if the distance between the stressed syllables in the andante realization of an item, e.g. stu and toe in studietoelage, approximates the distance between the stressed syllables in the allegro realization of the same item, e.g. stu and la. If the quotient of the andante beat interval duration divided by the allegro beat interval duration approximates 1, we expect perceived restructuring.


5. Results

5.1. Auditive analysis

Before we can present an auditive analysis of the data, we have to find out whether or not the quiz design was successful. The results show that the quiz indeed triggers faster speech by all subjects. Figure 2 shows their acceleration factors. Subjects 1, 2 and 4 turned out to be the best accelerating speakers, whereas subjects 3, 5 and 6 showed less difference in duration between andante and allegro realizations. The mean acceleration factor for the three fast speakers is 1.31, whereas the mean acceleration factor for the three slow speakers is 1.13.

Figure 2. Acceleration factors of all subjects

Figure 3 shows the mean durations of the items at both speech rates. It shows that the best accelerating speakers are also the fastest speakers. We expect to find more restructured patterns for these speakers, mainly subjects 1 and 4, in comparison to the slower speakers, such as subjects 3 and 6.



Figure 3. Mean word durations

Figure 4 shows that most subjects prefer patterns in which, from a phonological point of view, markedness constraints dominate the correspondence constraints at both rates for right and left shift data, but not for beat reduction data. There are slightly more restructured patterns in allegro tempo, although the differences are quite small.

Figure 4. All subjects: Number of restructured items per type

When we consider the results of the two fast subjects, subjects 1 and 4, separately, we observe a stronger preference for restructuring in allegro speech and no restructuring in andante speech, as shown in Figure 5. In other words, the fast subjects display both a greater difference in word durations between andante and allegro speech, and more variability in their speech patterns due to tempo than the slow subjects do.

[Figure 3 plots the mean word durations in seconds per subject (p1-p6), andante versus allegro. Figure 4 plots, per type (Right Shifts, N=42; Left Shifts, N=36; Beat Reductions, N=42), the number of items realized as Not Shifted Andante, Shifted Andante, Shifted Allegro and Not Shifted Allegro.]


Figure 5. Fast subjects: Number of restructured items per type

Obviously, the preference for restructuring the rhythmic pattern in allegro speech is not an absolute preference. Sometimes restructuring does not take place in allegro speech, but on the other hand restructured patterns also show up in andante speech.³ Some items were realized with the same rhythmic pattern irrespective of the tempo. Therefore, we also looked at the word pairs with a different rhythmic pattern in both tempos for each subject. We observe that the relatively fast speakers p1, p2 and p4 show the expected pattern according to our hypothesis, which means that they show a restructured pattern in allegro tempo, as shown in Figure 6 for the right shifts.

Figure 6. Right Shifts: Expected combinations

Two of the relatively slow speakers, p3 and p6, show one counterexample each, where the subject prefers the restructured patterns in andante tempo. The other slow speaker, p5, displays no different patterns in andante and allegro at all. Clearly, we have two different groups of speakers and this observation strengthens our claim that restructuring relates to speech rate.

[Figure 5 plots, per type for the fast subjects (Right Shifts, N=14; Left Shifts, N=12; Beat Reductions, N=14), the number of items realized as Not Shifted Andante, Shifted Andante, Shifted Allegro and Not Shifted Allegro. Figure 6 plots the number of expected combinations and counterexamples per subject (p1-p6) for the right-shift word pairs.]

Some items, such as hobbywerkruimte (Type 1) 'hobby room', never show a stress shift and other items, such as viriliteit (Type 2) 'virility', prefer the shifted pattern in both tempos for all subjects. Possibly, the syllable structure plays an important role; open syllables seem to lose stress more easily than closed ones.

5.2. Acoustic analysis

In the current state of phonological research, embodied in e.g. laboratory phonology, much value is set on acoustic evidence for phonological analyses. Studies such as Sluijter (1995) and Sluijter and Van Heuven (1996) provide acoustic correlates for primary stress. In our study we are concerned with beat reduction and secondary stress shifts and we wonder whether or not the same acoustic correlates hold for secondary stress. Shattuck-Hufnagel et al. (1994) and Cooper and Eady (1986) do not find acoustic correlates of rhythmic stress at all. They claim that it is not entirely clear which acoustic correlates are appropriate to measure, since these correlates are dependent on the relative strength of the syllables of an utterance. The absolute values of a single syllable can hardly be compared without reference to their context and the intonation pattern of the complete phrase. Huss (1978) claims that some cases of perceived rhythmic stress shift may be perceptual rather than acoustic in nature. Grabe and Warren (1995) also suggest that stress shifts can only be perceived in rhythmic contexts. In isolation, the prominence patterns are unlikely to be judged reliably. In the remainder of this paper we try to find out if we can support one of these lines of reasoning. In other words, are we able to support our perceived rhythmic variability with a phonetic analysis? To this end, we measured the duration, pitch, intensity, spectral balance and rhythmic timing of the relevant syllables as realized by subject P1.

Because Dutch is a quantity-sensitive language, the duration of the relevant syllable rhymes was considered. Onsets do not contribute to the weight of a syllable. In Figure 7, the duration analysis is shown for Type 2 data (left shifts). The four columns indicate, respectively, the duration of the rhyme of the first and second syllable in andante speech, and the duration of the first and second one in allegro speech. According to Sluijter (1995), duration is the main correlate of primary stress. As a starting point, we adopt her claim for our analysis of secondary stress. Our measurements would confirm our hypothesis and our auditive analysis if the second column were higher than the first one and if the fourth column were lower than the third one. In that case, the subject would realize a word such as perfectionist as perfèctioníst in andante tempo and as pèrfectioníst in allegro tempo.

In the andante tempo, three out of six items show the dominant correspondence pattern and in the allegro tempo, four out of six items show the dominant markedness pattern. That is hardly a preference and it does not confirm our auditive analysis of the same data. Furthermore, if we consider the word pairs with different patterns, there is only one pair that has the ideal ratio: the patterns of amerikaan.

Figure 7. Duration (Left Shifts by Subject P1)

If duration does not enable us to confirm our auditive findings, maybe pitch is the main stress correlate for this speaker. However, pitch measurements reveal the same fuzzy result as the duration measurements. Again, only one pattern confirms the auditive analysis. This time it is not the item amerikaan, but the item perfectionist. Moreover, the differences in pitch in this item do not exceed the 2.5% threshold, which is the just noticeable difference for pitch. We also analyzed the mean intensity value of the relevant vowels without finding recognizable patterns between allegro and andante style. These results support the analyses of Sluijter (1995) and Sluijter and Van Heuven (1996), who also claim that the intensity parameter does not contribute much to the perception of stress.

[Figure 7 plots the rhyme durations in seconds of the first two syllables (andante r1, andante r2, allegro r1, allegro r2) for the items perfectionist, amerikaan, piraterij, verbaliseren, banaliteit and viriliteit.]

Next, we considered the spectral balance. In order to rule out the influence of the other parameters, we monotonized the data for volume and pitch. Then we selected the relevant vowels and analyzed them as a cochleagram in PRAAT. The cochleagram simulates the way the tympanic membrane functions, in other words the way in which we perceive sounds. In Figure 8 we show two cochleagrams of the vowel [a] in the fourth syllable of, respectively, stúdietòelage 'study grant' (Type 1) in andante tempo and stúdietoelàge in allegro tempo. This item was taken from a pre-study. The allegro data show the expected increased perceived loudness in the higher frequencies, indicated by means of shades of gray; the darker the gray, the more perceived loudness.

Figure 8. Cochleagrams of [a] in studietoelage

The right cochleagram (stressed [a]) in Figure 8 shows increased perceived loudness in the regions of approximately 5 to 22 Bark in the allegro version of [a] in comparison with the left cochleagram (unstressed [a]). This confirms the results of the study of primary stress in Sluijter (1995). If we convert this perceptive, almost logarithmic, Bark scale into its linear counterpart, the Hertz scale, this area correlates with the frequency region of 3 to 10 kHz.
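The Bark-Hertz relation can be made explicit in closed form. As an assumption, the sketch below uses Traunmüller's (1990) approximation of the Bark scale; the paper does not state which Bark definition underlies PRAAT's cochleagram axis, so exact boundary values may differ slightly.

```python
# Bark <-> Hz conversion using Traunmüller's (1990) approximation
# (an assumption; other Bark formulas give slightly different values).

def hz_to_bark(f):
    """Critical-band rate in Bark for a frequency in Hz."""
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hz(z):
    """Inverse of hz_to_bark."""
    return 1960.0 * (z + 0.53) / (26.81 - (z + 0.53))

# The round trip is consistent, and the scale is roughly logarithmic:
# equal Bark steps span ever larger Hz ranges toward the top of the scale.
for f in (100.0, 500.0, 1000.0, 4000.0, 10000.0):
    assert abs(bark_to_hz(hz_to_bark(f)) - f) < 1e-3
assert bark_to_hz(22) - bark_to_hz(21) > bark_to_hz(6) - bark_to_hz(5)
```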

In order to measure perceived secondary stress, we will measure the relative loudness in the different frequency regions in Phon.⁴ According to Sluijter (1995) stressed vowels have increased loudness above 500 Hz compared to the same vowel in an unstressed position. This can be shown if we take a point in time from both cochleagrams in Figure 8 in which the F1 reaches its highest value (following Sluijter, 1995). In Figure 9 the values in Phon are depicted for these points and plotted against the Bark values in 25 steps.

[The two cochleagram panels span 0.169 s (andante) and 0.143 s (allegro) on the time axis, with the vertical axis running from 0 to 25 Bark.]


Figure 9. Loudness in Phon

The white line in Figure 9 indicates the pattern of the allegro stressed [a] in studietoelage and the black line indicates the pattern of the andante unstressed [a]. We see increased loudness in the region of 13 to 21 Bark, which correlates with the most sensitive region of our ear. The mean Phon value in Figure 9 between 5 and 21 Bark is 43.6 Phon for the andante unstressed [a] and 47.4 Phon for the allegro stressed [a]; a mean difference of 3.8 Phon.

Now, let us see whether or not we can find similar results for our subject P1. Figure 10 shows that the spectral balance confirms the leftward stress shift we perceived in the allegro realization of amerikaan. The first syllable vowel in allegro tempo is characterized by more loudness in the higher frequency regions than its andante counterpart. In the second syllable vowel it is just the other way around.

Figure 10. Spectral balance comparison of the first two vowels of amerikaan

[Figure 9 plots loudness in Phon against the Bark scale (1-25) for the andante unstressed and allegro stressed [a]. Figure 10 gives the same Phon-by-Bark comparison, andante versus allegro, for the first two vowels of amerikaan: [a] in [a]merikaan and [e] in a[me]rikaan.]

Unfortunately, not all spectral balance data confirm our auditive analysis. For example, we claimed that the pitch analysis of the stress shift in perfectionist did confirm our auditive analysis. Therefore, we expected more loudness in the allegro realization of the first vowel and less loudness in the allegro realization of the second vowel, but it appeared that there is relatively more loudness in the andante realization of per. This result contradicts our auditive and our pitch analysis.

We have to conclude that the different phonetic analyses contradict each other. Sometimes the perceived stress shift is characterized by a longer duration of the stressed syllable; sometimes a relatively higher pitch characterizes it. The results of our spectral balance analysis show that the differences in loudness pattern with differences in duration. In our perceived stress shift in allegro perfectionist, pitch turned out to be the decisive correlate, whereas duration and spectral balance measurements indicated no shift at all. On the other hand, the perceived shift in allegro amerikaan was confirmed by the duration and spectral balance analyses together, whereas pitch measurements indicated the opposite pattern. For most perceived stress shifts, however, the acoustic correlates did not give any clue.

Finally, we will consider whether the perception of restructuring depends on rhythmic timing. Just as in music, speech can be divided into a melodic string and a rhythmic string as partly independent entities. With respect to speech, the melodic string seems to be more flexible than the rhythmic one. Imagine that the rhythm constitutes a kind of metronome pulse to which the melodic content has to be aligned. The listener expects prominent syllables to occur with beats. This behavior is formulated as the Equal Spacing Constraint: prominent vowel onsets are attracted to periodically spaced temporal locations (Couper-Kuhlen, 1993; Cummins & Port, 1998; Quené & Port, 2002; a.o.). Dependent on speech rate the number of intervening syllables between beats may differ. Suppose the beat interval is constant at 300 ms: there will be more linguistic material in between in allegro speech, e.g. the two syllables die and toe in stúdietoelàge, than in andante speech, e.g. only one syllable die in stúdietòelage.

If indeed the perception of secondary stress shifts depends on rhythmic timing, i.e. the beat interval between prominent syllables in andante and allegro speech is approximately equal, then we expect that the duration quotient of the interval between, for example, stu and toe in the andante realization of studietoelage and stu and la in the allegro realization approximates 1.


In our pre-study, the interval between the vowel onsets of the first and third syllable in studietoelage (andante) is 0.358 sec, whereas the interval between the first and the fourth syllable in the allegro realization of the same word is 0.328 sec. This means that the duration quotient is 1.091, which indeed approximates 1. In other words, this example supports the idea of the Equal Spacing Constraint.
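The quotient computation in this pre-study example can be spelled out as follows; this is a sketch of the arithmetic only, using the measured interval values quoted above.

```python
# Sketch of the beat-interval quotient used as a restructuring cue.

def duration_quotient(andante_interval, allegro_interval):
    """Andante beat interval divided by the allegro beat interval;
    values near 1 are taken to predict perceived restructuring."""
    return andante_interval / allegro_interval

# studietoelage: stu-toe (andante, syllables 1-3) vs. stu-la (allegro, 1-4)
q = duration_quotient(0.358, 0.328)
assert round(q, 3) == 1.091  # approximates 1, as reported in the text
```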

Does the same result hold for our present data? We measured the beat intervals between all possible stress placement sites for all six subjects. Figure 11 depicts the duration quotients for subject 1. Figure 12 shows the beat intervals of the same data. It depicts the duration interval between both the first and the third, and the first and the fourth syllable for both speech rates. We expect restructuring for those data in which the line of the first-to-third syllable interval (andante, black line) coincides with the line of the first-to-fourth syllable interval (allegro, white line).

Figure 11. Quotient beat intervals of Subject P1

Figure 12. Beat intervals of Subject P1

[Figure 11 plots the andante:allegro beat-interval quotient per right-shift item; Figure 12 plots the beat intervals in seconds (andante s1-3, andante s1-4, allegro s1-4, allegro s1-3). The items are studietoelage, wegwerpaansteker, trimesterindeling, kamervoorzitter, hobbywerkruimte, gemeente-inschrijving and winkelopheffing.]


Figures 11 and 12 indicate that the relevant beat intervals of items 1, 4 and 7, studietoelage 'study grant', kamervoorzitter 'chairman of the House of Parliament' and winkelopheffing 'closing down of a shop', respectively, coincide. In other words, we expect to hear restructuring in exactly these three items.

Unfortunately, our auditive analysis indicates only attested combinations of restructuring in items 2 and 6: wegwerpaansteker 'disposable lighter' and gemeente-inschrijving 'municipal registration', respectively. Obviously, rhythmic timing is not the decisive characteristic of perceived restructuring in allegro speech either.

6. Discussion and Conclusion

In section 3, we presented our phonological account of the restructuring within the framework of OT. Our main conclusion is that phonetic compression cannot be the sole explanation of the different rhythm patterns. Although the results cannot really confirm our hypothesis that there are different grammars, i.e. constraint rankings, for different rates of speaking, there seems to be something that relates to speech rate. The fast speakers display different grammars, i.e. constraint rankings, for different rates of speaking. In their andante tempo, correspondence constraints prevail, whereas in allegro tempo markedness constraints dominate the correspondence ones. These preferences resemble the preferences of andante and allegro music. In both disciplines clashes are avoided in allegro tempo by means of enlarging the distances between beats.

In section 5, we attempted to confirm our phonological account with a phonetic analysis. Unfortunately, the phonetic correlates of stress - duration, pitch, intensity and spectral balance - do not show the expected and perceived differences in rhythm patterns in all pairs. Sluijter (1995) found that duration is the main correlate of primary stress, with spectral balance as an important second characteristic. In our analysis, however, neither differences in duration nor differences in spectral balance could identify secondary stress. Therefore, we have to conclude that our analysis supports earlier work by Shattuck-Hufnagel et al. (1994), Cooper and Eady (1986), Huss (1978) and Grabe and Warren (1995), who all claim that acoustic evidence for secondary stress cannot be found unambiguously. Although we did find some differences in duration, spectral balance or pitch, these differences were not systematically found in all pairs in which we perceived rhythmic variability. Finally, we discussed rhythmic timing as a cue for variable patterns. However, the hypothesis that the duration between prominent syllables is approximately equal in both andante and allegro speech was not confirmed by the auditive analysis of the data. It seems that rhythmic restructuring is more a matter of perception than of production. At this point, the question remains: are we fooled by our brains and is there no phonetic correlate of the perceived phonological stress shifts in the acoustic signal, or do we have to conclude that the real phonetic correlate of secondary stress has yet to be found?

Notes

1 This paper is an extension of our paper "Restructuring the melodic content of feet", which is submitted to the proceedings of the 9th International Phonology Meeting: Structure and melody, Vienna 2002. We wish to thank Grzegorz Dogil, Hidetoshi Shiraishi and the participants of the 9th International Phonology Meeting, Vienna 2002, as well as the participants of the 11th Manchester Phonology Meeting, Manchester 2003, for their useful comments. We are also grateful to Sible Andringa, Nynke van den Bergh, Gerlof Bouma, John Hoeks, Jack Hoeksema, Wander Lowie, Dirk-Bart den Ouden, Joanneke Prenger, Ingeborg Prinsen and Femke Wester for participating in our experiment. We especially thank Wilbert Heeringa and Hugo Quené for supplying us with the PRAAT scripts that we could use for our spectral balance and rhythmic timing analyses.

2 For reasons of clarity, we abstract away from constraints such as FOOT BINARITY (FTBIN) and the WEIGHT-TO-STRESS PRINCIPLE in Table 2. Although these constraints play an important role in the Dutch stress system (cf. Gilbers & Jansen, 1996), the conflict between OUTPUT-OUTPUT CORRESPONDENCE and FOOT REPULSION is essential for our present analysis.

3 With respect to the phonological analysis of the data, we suggest a random ranking of weighted correspondence and markedness constraints. By weighting constraints we adopt an OT variant that more or less resembles the analyses in OT's predecessor Harmonic Grammar (cf. Legendre, Miyata & Smolensky, 1990). Note that we do not opt for a co-phonology for allegro-style speech in our analysis. In a co-phonology, the output of the andante-style ranking is input or base for the allegro-style ranking. We opt for a random ranking with different preferences for allegro and andante speech, because our data show variable rhythmic structures at both rates. Both rankings evaluate the same input form.


4 The perceived loudness depends on the frequency of the tone. The Phon unit is defined using the 1 kHz tone and the decibel scale: a pure sine tone at any frequency with 100 Phon is as loud as a pure tone with 100 dB at 1 kHz (Rietveld and Van Heuven, 1997: 199). We are most sensitive to frequencies around 3 kHz. The hearing threshold rises rapidly near the lower and upper frequency limits, which are about 20 Hz and 16 kHz respectively.

References

Boersma, Paul, and David Weenink (1992-2002). PRAAT, phonetics by computer. Available at http://www.praat.org, University of Amsterdam.

Burzio, Luigi (1998). Multiple Correspondence. Lingua, 104: 79-109.

Cooper, W., and J. Eady (1986). Metrical phonology in speech production. Journal of Memory and Language, 25: 369-384.

Couper-Kuhlen, Elizabeth (1993). English speech rhythm: form and function in everyday verbal interaction. Benjamins, Amsterdam.

Cummins, Fred, and Robert Port (1998). Rhythmic constraints on stress timing in English. Journal of Phonetics, 26(2): 145-171.

Eefting, Wieke, and Toni Rietveld (1989). Just noticeable differences of articulation rate at sentence level. Speech Communication, 8: 355-351.

Gilbers, Dicky, and Wouter Jansen (1996). Klemtoon en ritme in Optimality Theory, deel 1: hoofd-, neven-, samenstellings- en woordgroepsklemtoon in het Nederlands [Stress and rhythm in Optimality Theory, part 1: primary stress, secondary stress, compound stress and phrasal stress in Dutch]. TABU, 26(2): 53-101.

Gilbers, Dicky, and Maartje Schreuder (to appear). Language and Music in Optimality Theory. Proceedings of the 7th International Congress on Musical Signification 2001, Imatra, Finland. Extended manuscript available as ROA-571.

Grabe, Esther, and Paul Warren (1995). Stress shift: do speakers do it or do listeners hear it? In: Connell, Bruce, and Amalia Arvaniti (eds.), Phonology and phonetic evidence. Papers in Laboratory Phonology IV.

Hart, Johan 't, René Collier, and Antonie Cohen (1990). A perceptual study of intonation. An experimental-phonetic approach to speech melody. Cambridge University Press, Cambridge.

Huss, V. (1978). English word stress in the postnuclear position. Phonetica, 35: 86-105.

Kager, René (1994). Ternary rhythm in alignment theory. ROA-35.

Legendre, Geraldine, Yoshiro Miyata, and Paul Smolensky (1990). Harmonic Grammar - A formal multi-level connectionist theory of linguistic well-formedness: An application. In: Proceedings of the Twelfth Annual Meeting of the Cognitive Science Society, 884-891.

Lerdahl, Fred, and Ray Jackendoff (1983). A Generative Theory of Tonal Music. The MIT Press, Cambridge, Massachusetts, London, England.

Liberman, Mark (1975). The Intonational System of English. Garland, New York and London.

McCarthy, John J. (1986). OCP Effects: Gemination and antigemination. Linguistic Inquiry, 17: 207-263.

Neijt, Anneke, and Wim Zonneveld (1982). Metrische fonologie - De representatie van klemtoon in Nederlandse monomorfematische woorden [Metrical phonology - The representation of stress in Dutch monomorphemic words]. De nieuwe Taalgids, 75: 527-547.

Prince, Alan, and Paul Smolensky (1993). Optimality Theory: constraint interaction in generative grammar. Ms., ROA-537.

Quené, Hugo, and Robert F. Port (2002). Rhythmical factors in stress shift. Paper presented at the 38th Meeting of the Chicago Linguistic Society, Chicago.

Rietveld, Toni, and Vincent van Heuven (1997). Algemene Fonetiek [General Phonetics]. Dick Coutinho, Bussum.

Schreuder, Maartje, and Dicky Gilbers (submitted). Restructuring the melodic content of feet. In: Proceedings of the 9th International Phonology Meeting 2002, Vienna, Austria.

Shattuck-Hufnagel, Stephanie, Mari Ostendorf, and Ken Ross (1994). Stress shift and early pitch accent placement in lexical items in American English. Journal of Phonetics, 22: 357-388.

Sluijter, Agaath (1995). Phonetic Correlates of Stress and Accent. HIL dissertations 15, Leiden University.

Sluijter, Agaath, and Vincent van Heuven (1996). Spectral balance as an acoustic correlate of linguistic stress. Journal of the Acoustical Society of America, 100(4): 2471-2485.

List of Addresses

Drs. Markus Bergmann
University of Groningen, Faculty of Arts, Department of Linguistics
Oude Kijk in 't Jatstraat 26, 9712 EK Groningen, The Netherlands
+31 50 3635982, [email protected]

Drs. Tamás Bíró
University of Groningen, Faculty of Arts, Department of Computational Linguistics
Oude Kijk in 't Jatstraat 26, 9712 EK Groningen, The Netherlands
+31 50 3636852, [email protected]

Dr. Dicky Gilbers
University of Groningen, Faculty of Arts, Department of Linguistics
Oude Kijk in 't Jatstraat 26, 9712 EK Groningen, The Netherlands
+31 50 3635983, [email protected]

Dr. Charlotte Gooskens
University of Groningen, Faculty of Arts, Department of Scandinavian Languages and Cultures
Oude Kijk in 't Jatstraat 26, 9712 EK Groningen, The Netherlands
+31 50 3635827, [email protected]

Dr. Dr. Tjeerd de Graaf and Drs. Nynke de Graaf
University of Groningen, Faculty of Arts, Department of Linguistics
Oude Kijk in 't Jatstraat 26, 9712 EK Groningen, The Netherlands
+31 50 3635982, [email protected]

Drs. Angela Grimm
University of Groningen, Faculty of Arts, Department of Linguistics
Oude Kijk in 't Jatstraat 26, 9712 EK Groningen, The Netherlands
+31 50 3635920, [email protected]

Dr. Ing. Wilbert Heeringa
University of Groningen, Faculty of Arts, Department of Computational Linguistics
Oude Kijk in 't Jatstraat 26, 9712 EK Groningen, The Netherlands
+31 50 3635970, [email protected]

Prof. Dr. Vincent J. van Heuven
University of Leiden, Faculty of Arts, Department of Linguistics
Van Wijkplaats 4, 2311 BX Leiden, The Netherlands
+31 71 5272105, [email protected]

Nienke Knevel
p/a University of Groningen, Faculty of Arts, Department of Linguistics
Oude Kijk in 't Jatstraat 26, 9712 EK Groningen, The Netherlands
+31 50 3635983, [email protected]

Dr. Jurjen van der Kooi
University of Groningen, Faculty of Arts, Department of Frisian
Oude Kijk in 't Jatstraat 26, 9712 EK Groningen, The Netherlands
+31 50 3635966, [email protected]

Prof. Dr. Ir. John Nerbonne
University of Groningen, Faculty of Arts, Department of Computational Linguistics
Oude Kijk in 't Jatstraat 26, 9712 EK Groningen, The Netherlands
+31 50 3635815, [email protected]

Drs. Maartje Schreuder
University of Groningen, Faculty of Arts, Department of Linguistics
Oude Kijk in 't Jatstraat 26, 9712 EK Groningen, The Netherlands
+31 50 3635920, [email protected]

Drs. Hidetoshi Shiraishi
University of Groningen, Faculty of Arts, Department of Linguistics
Oude Kijk in 't Jatstraat 26, 9712 EK Groningen, The Netherlands
+31 50 3635982, [email protected]

Dr. Ivilin Stoianov
University of Padova, Department of General Psychology
Via Venezia 8, 35100 AS Padova, Italy
+39 049 8276676, [email protected]