

Expecting the Unexpected: Exceptions in Grammar


Trends in Linguistics
Studies and Monographs 216

Editor
Volker Gast

Founding Editor
Werner Winter

Editorial Board
Walter Bisang
Hans Henrich Hock
Matthias Schlesewsky
Niina Ning Zhang

Editor responsible for this volume
Walter Bisang

De Gruyter Mouton


Expecting the Unexpected: Exceptions in Grammar

Edited by

Horst J. Simon
Heike Wiese

De Gruyter Mouton


ISBN 978-3-11-021908-1
e-ISBN 978-3-11-021909-8
ISSN 1861-4302

Library of Congress Cataloging-in-Publication Data

Expecting the unexpected : exceptions in grammar / edited by Horst J. Simon, Heike Wiese.

p. cm. -- (Trends in linguistics. Studies and monographs ; 216)
Includes bibliographical references and index.
ISBN 978-3-11-021908-1 (alk. paper)

1. Grammar, Comparative and general -- Grammatical categories. 2. Generative grammar. 3. Functionalism (Linguistics) I. Simon, Horst J. II. Wiese, Heike.

P283.E97 2011
415 -- dc22

2010039874

Bibliographic information published by the Deutsche Nationalbibliothek

The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.d-nb.de.

© 2011 Walter de Gruyter GmbH & Co. KG, Berlin/New York

Typesetting: PTP-Berlin Protago TEX-Production GmbH, Berlin
Printing: Hubert & Co. GmbH & Co. KG, Göttingen
Printed on acid-free paper

Printed in Germany.

www.degruyter.com


Contents

Preface

Introductory overview

What are exceptions? And what can be done about them?
Horst J. Simon and Heike Wiese

Coming to grips with exceptions
Edith Moravcsik

Classical loci for exceptions: morphology and the lexicon

Exceptions to stress and harmony in Turkish: co-phonologies or prespecification?
Barış Kabak and Irene Vogel

Lexical exceptions as prespecification: some critical remarks
T.A. Hall

Feature spreading, lexical specification and truncation
Barış Kabak and Irene Vogel

Higher order exceptionality in inflectional morphology
Greville G. Corbett

An I-language view of morphological 'exceptionality': Comments on Corbett's paper
Stephen R. Anderson

Exceptions and what they tell us: reflections on Anderson's comments
Greville G. Corbett

How do exceptions arise? On different paths to morphological irregularity
Damaris Nübling

On the role of subregularities in the rise of exceptions
Wolfgang U. Dressler


Statement on the commentary by Wolfgang U. Dressler
Damaris Nübling

Taking into account interactions of grammatical sub-systems

Lexical variation in relativizer frequency
Thomas Wasow, T. Florian Jaeger, and David M. Orr

Corpus evidence and the role of probability estimates in processing decisions
Ruth Kempson

Response to Kempson's comments
Thomas Wasow, T. Florian Jaeger and David Orr

Structured exceptions and case selection in Insular Scandinavian
Jóhannes Gísli Jónsson and Thórhallur Eythórsson

Remarks on two kinds of exceptions: arbitrary vs. structured exceptions
Susann Fischer

Response to Susann Fischer
Jóhannes Gísli Jónsson and Thórhallur Eythórsson

Loosening the strictness of grammar

Three approaches to exceptionality in syntactic typology
Frederick J. Newmeyer

Remarks on three approaches to exceptionality in syntactic typology
Artemis Alexiadou

A reply to the commentary by Artemis Alexiadou
Frederick J. Newmeyer

Three types of exceptions – and all of them rule-based
Sam Featherston

Anomalies and exceptions
Hubert Haider

Distinguishing lexical and syntactic exceptions
Sam Featherston


Disagreement, variation, markedness, and other apparent exceptions
Ralf Vogel

What is an exception to what? – Some comments on Ralf Vogel's contribution
Henk van Riemsdijk

Response to van Riemsdijk
Ralf Vogel

Describing exceptions in a formal grammar framework
Frederik Fouvry

Explanation and constraint relaxation
Pius ten Hacken

Unexpected loci for exceptions: languages and language families

Quantitative explorations of the worldwide distribution of rare characteristics, or: the exceptionality of northwestern European languages
Michael Cysouw

Remarks on rarity
Östen Dahl

Some more details about the definition of rarity
Michael Cysouw

Subject index
Language index


Preface

The present volume contains a variety of contributions: some have evolved from a selection of contributions to a workshop at the 27th Annual Meeting of the German Society for Linguistics (DGfS) in Cologne in 2005; others were invited by the editors. We have decided to introduce a somewhat exceptional – or at least rare – structural feature to this volume: each main article is complemented by an invited critical commentary and by a response from the original author(s) (with the exception of the two introductory chapters, which thus constitute a small exceptional subset within the broader exceptional pattern of this book). We believe that enhancing the discursivity of the book in this way makes for a livelier and more fruitful discussion, in particular in the case of a topic that is as central to theory and practice in our field, and accordingly as controversial, as that of exceptions.

The beginnings of this book reach back to a time when we were both Research Fellows of the Alexander-von-Humboldt Foundation, at the University of Vienna and at Yale University respectively, on leave from our shared home affiliation at the Department of German Language and Linguistics at Humboldt University; we gratefully acknowledge the support of these institutions.

Horst J. Simon & Heike Wiese
London & Potsdam, 2010


Introductory overview


What are exceptions? And what can be done about them?

Horst J. Simon and Heike Wiese

la question de l'exception est un point névralgique de la linguistique
[the question of the exception is a neuralgic point of linguistics]

(Danjou-Flaux and Fichez-Vallez 1985: 99)

1. Exceptions and rules

When modelling data, we want the world to be nice and simple. We would like the phenomena we encounter to be easily categorised and neatly related to each other, perhaps even placed into causal or at least implicational relationships. However, the world is more complicated. More often than not, when we propose rules in order to capture the observed facts, we find problems. Certain pieces of data refuse to submit to the generalisations we propose; they stand out as exceptions. Or, to put it the other way round, an 'exception' necessarily implies a rule, which it violates. In what follows we illustrate four central aspects of the complex relationship between exceptions and rules: (i) the underdetermination of rules, and hence the impossibility of avoiding exceptions, (ii) the formation of 'exceptional rules' in subsystems, (iii) the interaction of different grammatical levels influencing rules and exceptions, and (iv) the possibility of having more exceptions than rule-governed instances.

1.1. The underdetermination of rules

In a general sense, a rule is a generalisation over empirical observations that allows predictions with regard to data yet to be collected.1 The basic problem with generalisations is, of course, that we never know the future for certain: one can never know that the next bit of data one examines will be like the

1. Thus, the concept of 'rule' in an empirical science like linguistics must be distinguished from the concept of a 'social rule', which people are expected to adhere to.


data considered before. The reason for this is the fact that a rule is underdetermined by its extension, i.e. by the instantiations of its application. An example of what it means to follow a rule has been discussed by Wittgenstein (1953: §143ff., in particular §185f.): Consider a case where you try to teach someone the rule 'add 2' for natural numbers by showing her the series '0, 2, 4, 6, 8'. The pupil then correctly writes '0, 2, 4, 6, 8, 10, 12, …', that is, she can apply the rule to new instances. But when reaching 1000, she might go on '1004, 1008, 1012, …'. In such a case, the pupil might have extrapolated a rule "Add 2 up to 1000, 4 up to 2000, 6 up to 3000, and so on." (§185). Both the pupil's rule and our rule were compatible with the initial data, i.e. with the series from 0 to 8; hence, an extrapolation of a rule from these data (its instantiations) is underdetermined. Now, since the available data underlying any generalisation are of necessity finite, this is a fundamental problem for the empirical sciences.2

Now imagine a slightly different case (not Wittgenstein's example anymore): The pupil sees the same series '0, 2, …, 8' and this time extrapolates from this data the rule 'add 2'. However, she then discovers that the series goes on '10, 12, …, 1000, 1004'. In order to account for this new data, one option she now has is to keep the rule 'add 2' and mark '1004' as an exception. Another option is to assume a more complex rule, e.g. one along the lines of 'Add 2 up to 1000, 4 up to 2000, 6 up to 3000, …'. In this simple case, the two different rules would make two different predictions that could be tested by further data: In the first case, the series should then go on '1006, 1008, 1010, …'; in the second case, it should go on '1004, 1008, 1012, …, 2000, 2006, 2012, …'. Or it might be the case that something in between is correct: it might turn out that the series from 1000 to 2000 forms an irregular, exceptional subsystem with a special rule 'add 4' that only holds in this domain; then the series would go on '1008, 1012, …, 2000, 2002, 2004, 2006, …'.
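The diverging predictions can be made concrete in a small sketch (our illustration, not part of the original text): rule A keeps 'add 2' and lists 1004 as an isolated exception, while rule B lets the step grow by 2 after each multiple of 1000. Both agree on the data observed so far but predict different continuations.

```python
def rule_a(n):
    """'Add 2' everywhere, with 1004 treated as an isolated exception."""
    series = [0]
    while series[-1] < n:
        nxt = series[-1] + 2
        if nxt == 1002:          # the observed datum forces the exception
            nxt = 1004
        series.append(nxt)
    return series

def rule_b(n):
    """'Add 2 up to 1000, 4 up to 2000, 6 up to 3000, ...'."""
    series = [0]
    while series[-1] < n:
        step = 2 * (series[-1] // 1000 + 1)
        series.append(series[-1] + step)
    return series

# Both rules are compatible with the data seen so far ...
assert rule_a(1004)[-2:] == rule_b(1004)[-2:] == [1000, 1004]
# ... but make different, testable predictions about what comes next:
assert rule_a(1010)[-3:] == [1006, 1008, 1010]
assert rule_b(2006)[-3:] == [1996, 2000, 2006]
```

Collecting further data then decides between the candidate generalisations, exactly as in the text's discussion.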

1.2. Exceptional rules

Such in-between phenomena, which illustrate the dialectical nature of the relationship between rules and exceptions, can be found, for instance, in a linguistic counterpart of numbers: the formation of number words in natural languages. In most languages of the world, the following generalisation holds: in complex number words of an additive make-up, the constituent referring to the larger

2. There are, of course, general methodological considerations to guide one's generalisation process, for instance Occam's Razor, which basically advises one not to add complications to an analysis unless absolutely necessary.


number comes first (cf. Hurford's 1975 'Packing Strategy'). For instance, a decade word (words for the decades 10, 20, 30, …, 90) should come before a word for ones (1, …, 9), as in English forty-two, not *two-forty, so that we have an order "H-L" of constituents, where H is the higher number word and L is the lower one. However, the English teens represent an exception to this rule: number words from thirteen to nineteen follow the pattern 'L-H', where the lower constituent, namely the expression for the ones, precedes the higher constituent, i.e. the decade word (hence, we have thir-teen, four-teen, … nine-teen). This is in contrast to, say, French, where the order is H-L (dix-sept, dix-huit, dix-neuf), in keeping with the general rule for the order of additive constituents. The English teens hence form a small, exceptional class of their own: given their unified pattern, we can formulate a sub-rule for them, stating that 'the order of constituents is L-H for teens'. What we have here is then an 'exceptional rule'. This rule is restricted to only a few words and deviates from the general pattern of number words in English, which follows the usual H-L pattern found in the world's languages. However, there are also languages where the kind of irregular pattern we find in English teens is more generalised and is used in all number word constructions consisting of a decade word and a word for ones. Examples are other Germanic languages like German or Dutch, but also genetically and typologically unrelated languages like Arabic. In these languages, the L-H pattern holds not only for the teens, but extends to 1–20, 2–20, … 9–90. Thus, despite the obvious exceptionality from a typological point of view, we can still find internal regularity in these languages: for a large 'exceptional' class, we can formulate a rule 'LO-HD', where 'O' is a number word for ones and 'D' is one for decades, as a well-defined deviation from the general H-L rule. This rule then supports a special, exceptional subsystem, a subsystem that covers a larger domain than the one in English, and that is absent in French altogether. In this sense, exceptionality is a gradable and context-dependent concept: elements can be more or less exceptional, and they can be exceptional with respect to a general rule that governs the system as a whole, but non-exceptional with respect to a rule that governs a subsystem.
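The three systems can be summarised in a toy rule table (a deliberately simplified sketch of our own; real French numerals above sixty-nine are of course messier than this):

```python
def additive_order(n, language):
    """Constituent order of an additive two-constituent numeral for n.
    English uses L-H only in the teens (the exceptional sub-rule); German
    extends L-H to all ones+decade combinations; French keeps the general
    H-L order throughout (simplified)."""
    if not (13 <= n <= 99 and n % 10 != 0):
        raise ValueError('only additive ones+decade numerals are modelled')
    if language == 'french':
        return 'H-L'                        # dix-sept, dix-huit, dix-neuf
    if language == 'german':
        return 'L-H'                        # sieb-zehn, zwei-und-vierzig
    if language == 'english':
        return 'L-H' if n <= 19 else 'H-L'  # seven-teen vs. forty-two
    raise ValueError('unknown language')

assert additive_order(17, 'english') == 'L-H'   # seven-teen (exception)
assert additive_order(42, 'english') == 'H-L'   # forty-two (general rule)
assert additive_order(42, 'german') == 'L-H'    # zwei-und-vierzig
assert additive_order(17, 'french') == 'H-L'    # dix-sept
```

The English exception is a single conditional clause; in German the same clause covers the whole 13–99 range, which is exactly the sense in which the 'exceptional rule' supports a larger subsystem there.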

1.3. The interaction of different grammatical levels

The interplay of rule and exception is of methodological and theoretical significance for any linguistic analysis. It therefore comes as no surprise that the first major methodological debate in modern linguistics, in the 1870s, centred exactly around this problem. In compliance with 19th-century linguists' preoccupation with diachronic issues, this so-called Neogrammarian Controversy


focused on the hypothesis that Sound Laws are without exceptions.3 Following up on previous achievements of Comparative Indo-European Linguistics, and inspired by possible parallels with the Laws of Physics, the Neogrammarians maintained that:

Aller Lautwandel, so weit er mechanisch vor sich geht, vollzieht sich nach ausnahmslosen gesetzen, d.h. die richtung der lautbewegung ist bei allen angehörigen einer sprachgenossenschaft, ausser dem fall, dass dialektspaltung eintritt, stets dieselbe, und alle wörter, in denen der der lautbewegung unterworfene laut unter gleichen verhältnissen erscheint, werden ohne ausnahme von der änderung ergriffen. (Osthoff and Brugmann 1878: XIII)
[All sound change, insofar as it is mechanical, takes place under exceptionless laws, i.e. the direction of the sound movement is always the same with all members of a speech community – unless dialect split occurs – and all words in which the sound undergoing the sound movement occurs in the same circumstances are without exception affected by the change.]

The main initial idea here was that at a certain place in a certain period all words containing the relevant sound (in the relevant phonological environment) would have undergone a particular sound change captured by a certain 'law'; the motivation for such a general change was primarily seen in physiological factors. Later on, the hypothesis was somewhat relaxed by reducing it to a 'working hypothesis' – and one which was motivated by considerations from psychology.

The greatest triumph of the rigorous Neogrammarian methodology – and a confirmation of their basic idea – was accomplished by the discovery of Verner's Law. Initially, there had remained an embarrassing exception to the outcomes of the First (Germanic) Consonant Shift, or Grimm's Law: in this sound shift, the Indo-European voiceless plosive consonants /p, t, k/ were fricativised to /f, þ, h/ (as exemplified by the correspondence of Ancient Greek phrator and Gothic broþar 'brother'). However, unexpectedly, the equivalent of Greek pater was Gothic faðar 'father' with a voiced fricative.4 Working within the 'exceptionlessness paradigm',5 Verner (1877) could reconcile the deviant facts with Grimm's Law decades after its initial formulation. He showed how

3. Neatly documented in Wilbur (1977) and discussed at length in Jankowsky (1972).

4. Modern German still evinces differing consonants in this case: Bruder and Vater, albeit with different voicedness values due to subsequent developments.

5. His main tenet was: "Bei der annahme eines zufalls darf man jedoch nicht beharren. […] Es muss in solchem falle so zu sagen eine regel für die unregelmässigkeit da sein; es gilt nur diese ausfindig zu machen" (Verner 1877: 101). [However, one must not be content with the assumption of chance. In such a case, there must be, so to speak, a rule for the irregularity; it is just necessary to find it.]


these exceptions could be explained by taking into account the position of the word accent in the proto-language: Grimm's Law proper applies only if the accent was on the immediately preceding syllable in Proto-Indo-European; otherwise the fricatives are voiced in Germanic: /b, d, g/.6 This case nicely illustrates that exceptions on one linguistic level (in this case, the segmental-phonological system) can be accounted for by competing rules from other linguistic – or non-linguistic – levels (in this case, prosodic phonology).
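The conditioning just described can be rendered schematically (our simplification for illustration; the lookup tables are merely the correspondences cited in the text, not a full historical reconstruction):

```python
# Grimm's Law maps PIE voiceless stops to voiceless fricatives; Verner's
# Law yields voiced outcomes when the PIE accent did not immediately
# precede (outcomes as given in the text: /f, þ, h/ vs. /b, d, g/).

GRIMM = {'p': 'f', 't': 'þ', 'k': 'h'}    # accent on preceding syllable
VERNER = {'p': 'b', 't': 'd', 'k': 'g'}   # otherwise: voiced fricative

def germanic_reflex(stop, accent_precedes):
    """Return the Germanic reflex of a PIE voiceless stop."""
    return (GRIMM if accent_precedes else VERNER)[stop]

# Greek phrator ~ Gothic broþar: the accent precedes, Grimm applies.
assert germanic_reflex('t', True) == 'þ'
# Greek pater ~ Gothic faðar: the accent follows, Verner voices the output.
assert germanic_reflex('t', False) == 'd'
```

The 'exception' to Grimm's Law disappears once the prosodic condition is added as a second input to the rule, which is the point of the passage above.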

Meanwhile, a great many diachronic sound laws have been advanced that apply this idea of 'blind', exceptionless sound changes.7 This is not to say, by the way, that all sound change is exceptionless. In fact, sometimes even the exact opposite occurs: so-called 'sporadic' change – one that exceptionally occurs in a single example, both unexplained and inexplicable – as for instance the loss of /r/ in Modern English speech from Old English spræc.8 What is more, there are also many examples where non-phonological factors interfere with the regularity of a sound change. The most notable of these are analogy,9 lexical diffusion and general sociolinguistic factors.10 McMahon (1994: 21) captures the dialectic relationship of phonology and paradigmatic morphology in what she calls 'Sturtevant's Paradox': "sound change is regular but creates irregularity, whereas analogy is irregular but creates regularity."

1.4. Exceptions in the majority

One important factor in the interplay of rules and exceptions is that it is not at all trivial to decide which is which, given a mass of initially unstructured facts. Often it turns out that what appears to be an exception in one scientific account is an instantiation of the rules in a competing analysis. A case in point is the system of plural formation in German nouns. Nominal plural in German is expressed by a variety of suffixes (-e, -en, -er etc.) as well as by umlaut and zero-suffixation, leading to eight different forms of plural formation. In order to account for the distribution of plural markers over nouns, a number of rules have been proposed in traditional German grammar, making use of features

6. Apparently, this correlation between voicedness and accent is still applicable in Modern German, cf. Hannó[f]er vs. Hanno[v]eráner (Udolph 1989).

7. Many of the sound changes inside Indo-European are discussed in Collinge (1985).

8. In other words, the initial consonant cluster has been retained in Modern English (as can be deduced from examples such as spring, spray, sprawl etc.), so there is no sound law in the history of English pertaining to the loss of r in speech.

9. Already alluded to in the above quote from Osthoff and Brugmann (1878).

10. Those factors have been discussed amply, and non-conclusively, in the literature, e.g. in Labov (1981) and de Oliveira (1991).


from different grammatical levels, like nominal gender, number of syllables, or the ending of the singular form. However, these rules can only account for part of the nominal inventory and do not work very well for predictions. There is one plural ending, though, whose distribution can be accounted for more straightforwardly, namely the suffix -s. This plural suffix is used as a default; it turns up whenever there is no existing form already, or none that can be formed by analogy, as in many loan words, in abbreviations, and also in proper names. This has led to accounts that characterise the -s suffix as the regular form, while the seven other classes of nominal plural are considered irregular ones that are driven by analogy (Janda 1991, R. Wiese 1996: 136–143, Pinker 1999: 211–239). Additional support for the 'regular' status of the -s plural comes from overgeneralisations in first language acquisition (Clahsen et al. 1992, Marcus et al. 1995). However, the -s suffix is statistically the least common plural form; hence, under this view, only a small part of plural formation is regularly rule-governed, while most of it is exceptional: exceptions are more commonly realised than rules – the statistical relationship of specific rule and Elsewhere-rule is turned upside down. Curiously, such an analysis echoes a remark in Mark Twain's essay 'The Awful German Language':

Surely there is not another language that is so slipshod and systemless, and so slippery and elusive to the grasp. One is washed about in it, hither and thither, in the most helpless way; and when at last [the language learner] thinks he has captured a rule which offers firm ground to take a rest on amid the general rage and turmoil of the ten parts of speech, he turns over the page and reads, "Let the pupil make careful note of the following exceptions." He runs his eye down and finds that there are more exceptions to the rule than instances of it. So overboard he goes again, to hunt for another Ararat and find another quicksand. (Twain [1880] 1907: 267)

Thus, maybe unfortunately for the language learner (and the language teacher) and fortunately for the linguist who is interested in complex structures, language is not parsimonious.
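The specific-rule/Elsewhere-rule analysis of German plurals sketched above can be pictured in a few lines (a minimal sketch; the handful of listed irregulars is our own illustrative assumption, not an exhaustive classification of the seven irregular classes):

```python
# Irregular plurals are lexically stored, analogy-driven forms; the -s
# suffix applies as the default whenever no listed form exists.

IRREGULAR_PLURALS = {      # tiny illustrative sample, not a full lexicon
    'Kind': 'Kinder',
    'Frau': 'Frauen',
    'Hund': 'Hunde',
}

def german_plural(noun):
    """Apply the listed (exceptional) form if there is one; otherwise
    fall back on the default -s suffix (the Elsewhere case)."""
    return IRREGULAR_PLURALS.get(noun, noun + 's')

assert german_plural('Kind') == 'Kinder'   # listed exception
assert german_plural('Auto') == 'Autos'    # default -s: loans, names, etc.
```

Note the statistical twist discussed in the text: in such a model the 'regular' default branch is the one taken least often, since most German nouns end up in the lexically listed classes.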

As has already become clear from the examples discussed so far, there are different ways in which linguists typically handle the exceptions they encounter in their analyses. In the following sections, we will discuss these in turn.


2. Approaches to exceptions

2.1. Ignoring exceptions

A common approach to the problems posed by exceptions is to simply ignore them. This can be achieved through more or less sophisticated argumentation. For example, when confronted with an exception to the rule that one has proposed, often the easiest way out is to say that this apparently disturbing fact does not belong to the linguistic system one analyses, using the infamous answering technique: 'Well, in my dialect …'.11 As Labov (1972: 292) has noted, "'[m]y dialect' turns out to be characterized by all sentence types that have been objected to by others."

In the statistical analysis of data, doing away with exceptions is part of a reasonable methodology: in any empirical study, one has to take into account that the collected data can be 'spoiled' for a variety of reasons. In order to minimise unwanted statistical effects due to 'bad' data (which appear as a kind of exception to the general picture), one usually abstracts away from what are called 'outliers', i.e. the most deviant pieces of data on any given test item; they are held to be likely mistakes or other 'irrelevant' phenomena.
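One standard way to operationalise this trimming (our sketch of common practice; the text itself prescribes no particular procedure) is to discard observations lying more than k standard deviations from the sample mean:

```python
from statistics import mean, stdev

def trim_outliers(data, k=2.0):
    """Keep only data points within k standard deviations of the mean."""
    m, s = mean(data), stdev(data)
    return [x for x in data if abs(x - m) <= k * s]

# Hypothetical acceptability ratings with one likely mistake (9.8):
ratings = [4.1, 3.9, 4.0, 4.2, 4.0, 3.8, 9.8]
assert trim_outliers(ratings) == [4.1, 3.9, 4.0, 4.2, 4.0, 3.8]
```

The choice of k (and of the criterion itself: absolute cut-offs, interquartile ranges, etc.) is a methodological decision, which is precisely why such 'exceptions' can be abstracted away on principled rather than ad hoc grounds.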

2.2. Re-analysing exceptions

Another type of exceptional data – not mentioned so far – can be entire languages. In cross-linguistically informed typological linguistics, where correlations between logically independent facts are investigated, one rarely finds downright 'universal' phenomena; the formulations of the 'statistical universals' are usually hedged by phrases like 'with overwhelmingly greater than chance frequency', thereby allowing for a small number of languages that do not behave as expected (cf. Dryer 1998 for discussion).

One type of exception one frequently encounters in linguistics is the odd language that does not follow the generalisations made in large-scale cross-linguistic investigations. Thus, in linguistic typology, Greenberg-type universals are extrapolated from large databases: basically, they are predictions of occurrences of a certain structure, or rather predictions that such structures do not occur, or can occur only under specific conditions. However, there almost

11. Obviously, the background assumption in such a strategy is that data from a different micro-variety need not be taken into account, since "[l]inguistic theory is concerned primarily with an ideal speaker-listener, in a completely homogeneous speech-community, who knows its language perfectly …" (Chomsky 1965: 3).


always appear to be a few, some, or just single examples of languages where the structure in question does in fact exist in a 'forbidden' context.

While singleton languages not displaying the usual phenomena are interesting laboratories for the typologist, who can then seek an explanation as to what functional or other factors have played a role in creating such a peculiar system, such languages can be a great challenge for the formal linguist. Especially in the Chomskyan tradition, with its strong emphasis on explanations on the grounds of a genetically endowed Universal Grammar (UG), languages with exceptional grammatical peculiarities pose problems.12 Since grammatical distinctions, be they universal or only the speciality of a single language, should be captured by UG mechanisms in order to be acquirable by the child, the UG component becomes more unwieldy if it has to cater for all those exceptional characteristics. Because of this extra burden that exceptions put on the language faculty, within this framework it is very much desirable to show that any account that assumes exceptions is flawed and can be replaced by one that re-analyses the phenomena in such a way that the exceptions disappear.

To give an example of an allegedly exceptional trait that turned out to be a chimera on closer inspection:13 all recently proposed morphological feature inventories designed to capture the various systems of person-number combinations in the pronouns of the world (cf. Harley and Ritter 2002 and subsequent work) have difficulties when it comes to distinguishing the putative clusivity contrast in second person plural pronouns, i.e. the difference between a set of only addressees on the one hand and a group comprising addressee(s) and other(s) (non-speech-act participants) on the other. While some authors have categorically denied the existence of such a distinction, others have claimed to have found exceptions to the statement 'no language distinguishes clusivity in the second person'. A closer look at these purported exceptional systems revealed, however, that in each case there had been some kind of mistake in the transmission of the data: in the case of South-East Ambrym, the original author had inadvertently conflated two geographically distinct dialects in his paradigms; with regard to Abkhaz (and a number of other, unrelated languages), an essentially emphatic suffix had been misinterpreted as involving clusivity; in the descriptions of Ojibwe (and a few other languages), the term 'second person inclusive'

12. In this context, a reviewer has mentioned Newmeyer's (2005) discussion of the relationship between typology and UG. In our understanding, however, this is somewhat beside the point, since Newmeyer seems to be concerned with broad-sweeping typological generalisations and how they can be accounted for, not necessarily with the explanation of potential individual counter-examples.

13. This example is more thoroughly discussed and documented in Simon (2005).


had been used in a terminologically confused (and confusing) way. In short, a careful study of the details of the particular pronominal systems (aided by philological and dialect-geographic information) could show that the 'exception' in question actually vanished when studied more closely.14 In this case, then, exceptions could be 'analysed away' by a more careful look at the data; there is no need to complicate the model of morphological features (Simon 2005).

Nonetheless, exceptions frequently do refuse to go away. In a typological vein, several phenomena have been recorded among the languages of the world that are 'rara, rarissima or even none-suchs'.15

2.3. Integrating exceptions

In the study of a single language, one also frequently encounters a set of data that cannot be handled straightforwardly by the usual grammatical rules of that language. In that case, at least two principal options arise for the researcher: s/he can restrict the domain of the main rule and allocate the exceptions to a special sub-component of the grammar (a sub-rule, e.g. the one discussed above for decade number words, or, more extremely, the Lexicon); or s/he can design the grammatical apparatus in a softer fashion, so that the exceptions can be accommodated in the main component itself. We will demonstrate these options in turn.

In the first kind of approach, one reduces the scope of the main grammatical rule when one finds data contradicting it. In this case, one defines a smaller domain for the rule and makes no prediction for the rest, i.e. for the space of the exceptions. Thus one gets a 'core grammar' and some kind of periphery where the normal rules simply do not apply. Cases in point are interjections, response particles and the like, which allow for phonological, morphological, syntactic and semantic structures that are otherwise ruled out in the language.

14. Ironically, new possible evidence for the category just discussed – and thus possibly an exceptional linguistic trait – which was brought forward by Simon (2005) from Bavarian, has meanwhile also been disputed by Gehling (2006). Here, in fact, the debate revolves around the question of where to draw the boundary between grammar and pragmatics – the question of how much of politeness and other pragmatic factors needs to be incorporated into a referentially-oriented grammatical system.

15. They have been collected in the internet archive 'Das grammatische Raritätenkabinett' ('The grammatical rarity cabinet') at the University of Constance, searchable at: http://typo.uni-konstanz.de/rara/intro/. There one can also find the Universals Archive, which lists not only the typological universals proposed but also pertinent counter-examples that have been noted in the literature. Cf. also the recent collections of studies on this matter: Wohlgemuth and Cysouw (2010a, b).

For instance, this is the domain where German allows for non-vowel syllable nuclei of whole words ([pst], [′P ′P] / [′Pm′Pm])16.

A version of that domain-restricting strategy is the strong reliance on a notion of the Lexicon as a storage space for all the peculiar characteristics of lexical items. In consequence, the syntactic (or phonological, or semantic) component proper is freed from all complications. Any idiosyncrasies are relegated to the individual lexical entries, where they are, by definition, not exceptions but only specific lexicalised properties.

Rather than relegating exceptions to the lexicon altogether, one can define a specific rule for them that creates an exceptional subset, and an Elsewhere-rule for the rest. This can account for exceptions that constitute sub-systems of their own, which is quite common, since exceptions tend to cluster. However, as the example of German plural formation illustrated, it may not always be an easy task to decide which sub-system represents the main rule and which one the exceptional rule.17

An altogether different approach to the problem of exceptions is the second kind of approach mentioned above, which is based on the notion of a softer grammar, a grammar without hard contrasts, where exceptions pose much less of a problem. One way of achieving this is to build the model of grammar on prototypes. In doing so, one defines focal elements, which combine a number of key characteristics. Grammatical items will be more or less similar to these prototypes; those that are least similar are what used to be called exceptions; they have now acquired the status of 'non-prototypical members of their category'. The obvious advantage of such an approach is the great flexibility and cross-categorial cohesion it creates. A potential disadvantage is that its lack of clear-cut distinctions makes it hard to formalise, and brings with it the risk that useful distinctions might be blurred (for discussion cf., e.g., Tsohatzidis 1990 and Aarts et al. 2004).

Another flexible approach to the problem of exceptions in grammar is to allow different rules to compete with each other. This means that one will not have a single, definite prediction but that several, possibly graded, alternatives arise.

16. The second example is the phonological representation of a colloquial variant of the negative response particle (i.e., the counterpart of 'no') (cf. H. Wiese 2003 for a detailed discussion of the exceptional status of interjections).

17. A comparable situation holds for diachronic facts: system-internally, it is far from clear whether English has a rule that results in the loss of /r/ in certain syllable positions (cf. bass, equivalent to Modern German Barsch), and the r is occasionally retained as in horse, or whether horse is what one expects and r was lost exceptionally in bass; only facts of extra-linguistic history (and consequently probable language contact scenarios) help to clarify the situation (cf. Hoenigswald 1978: 26).

In a subcase of this scenario, rules from different grammatical sub-systems access the same domain so that, for instance, regularities from semantics and from syntax are in competition. Consequently, the phenomenon at hand is an exception in one system, but is predicted in the other system.

Take, for instance, case assignment of some psych-verbs in German, like frieren 'to feel cold'. A sentence with a psych-verb like (1) poses a problem for a syntactic account of German.

(1) Mich     friert.
    1sg.acc  freeze.3sg
    'I am freezing.' (lit.: "Me freezes.")

In (1) the only argument slot is occupied by an accusative pronoun. So, do we have an accusative (or ergative) subject here, in contrast to what we find in German sentences as a rule? Against this analysis, we find no person-number agreement between the pronoun and the finite verb. So, have we instead got an entirely subject-less clause? This would constitute an exception in the syntax of German as well.18 But despite this syntactic anomaly, the structure makes perfect sense from a semantic point of view: The Experiencer-role is typically coded by dative or accusative case,19 whereas the nominative subject of a clause is typically an Agent. In this example, there is a mismatch between the syntactic and semantic requirements of a 'normal' clause; the two components compete with each other. The syntactic requirements are fulfilled when the sentence is coded as in (2), a construction that is more common in modern German, replacing the subject-less alternative illustrated in (1). In this case, we also get the syntactically expected subject-verb agreement. The morphosyntactic system's gain is, however, semantics' loss, because of the unusual correlation of case and semantic role.

18. The variant Es friert mich gives evidence of a rescue strategy available in this case: the use of an expletive subject whose only function seems to be to rectify the exceptionality of (1).

19. As in Ich streichle ihm (dat.exp) den Bart. (lit.: "I stroke him the beard.", 'I am stroking his beard') and Ich lehre ihn (acc.exp) singen. (lit.: "I teach him sing.", 'I teach him to sing').

(2) Ich      friere.
    1sg.nom  freeze.1sg
    'I am freezing.'

The observed pattern extends to other examples of relatively recent change as well, showing that we are dealing with a real – if exceptional – sub-system in the case system of German. (3a) vs. (3b) shows a similar phenomenon for the case of denken 'to think', where the development towards morpho-syntactic regulation is even more advanced – that is, (3a) already sounds archaic and is rarely used in contemporary German anymore – presumably driven by the more agentive status of the (Experiencer-) role that denken assigns compared to frieren:20

(3) a.   Mich     dünkt.
         1sg.acc  think.3sg
    b. → Ich      denke.
         1sg.nom  think.1sg
    'I think.'

What this development illustrates, then, is the interplay not only of exceptions and rules, but of exceptions, rules, and grammatical (sub)systems: what appears as an exception in one system can be perfectly in accordance with a rule from another system. Such an interlocking network of rules (or rather: constraints) – each of them violable – is focussed on in approaches within Optimality Theory: in this framework, the rules themselves need not be modified; they are just seen to be operating on different levels of grammar, and to be ranked differently with respect to each other.

3. Why are there exceptions? How do they arise, and how do they disappear?

3.1. The emergence of exceptions

At first glance, one would think that a language system without exceptions would be best. And indeed, that is roughly what one gets – at least at the beginning – when people invent an artificial language, such as Esperanto.21

20. Moreover, the obsolete form of the verb is replaced by a newer, more regular one.
21. Cf. Hagège (2005) for discussion.

However, since natural languages are biological systems, they are susceptible to evolutionary change (cf. e.g. Ritt 2004); it is only natural that they evolve gradually. In such a view of language change that crucially involves the idea of bricolage – tinkering with what happens to be at hand (cf. Lass 1997: 313–316) – small-scale incremental changes necessarily produce structures that are exceptions to the system before the change.22

So how exactly do exceptions come into existence? We will discuss two major scenarios: first, the interplay of different levels of grammar can create complexity and irregularity on one level when changes occur on another level; second, changes due to extra-grammatical factors can unbalance the distribution of forms in a grammar.

An illustration for the first kind of scenario comes from a part of the grammar of English and German that appears quite confusing and exception-laden today, but started out as a fairly regular component in Proto-Indo-European: the group of so-called 'irregular verbs'. This group comprises for the most part what historical linguists call 'strong verbs', i.e. those verbs which form their past tense and their past participle forms with ablaut of the stem vowel. In Contemporary German, this area seems to be hardly rule-governed at all. The 5th edition of the Duden-grammar of Modern German (Duden 1995), for instance, lists as many as 39 ablaut-classes for the ca. 170 ablauting verbs (p. 125),23 several with only one verb that follows the particular pattern – each of them being an exception to all others, so to speak. By contrast, the system of ablaut was entirely regular in an early variety of Indo-European,24 and still fairly predictable

22. Taken seriously, this fact contradicts the research methodology of strict structuralism (purporting to analyse 'un système où tout se tient') as it is most succinctly stated by Beedham (2005: 153): "Yet exceptions do exist, so how do they arise? It seems to me that they arise to the extent that we, the grammarians, have got it wrong. We introduce them from outside with rules that are not quite right. If a rule is 100% correct it will have no (unexplained) exceptions whatsoever, if it is almost right it will have a smaller number of exceptions, and if it is badly wrong it will have lots of exceptions." Reasonable as such a view may seem as a methodological premise, in the light of the inevitability of exceptions in diachrony, it will have to be discarded.

23. To be fair, the most recent edition brings some systematisation into this list (Duden 2005: 458–461).

24. That is at least the picture one gets if one subscribes to the not uncontroversial laryngeal-hypothesis for Pre- or Proto-Indo-European (cf. Lehmann 1993); otherwise, more traditionally, some form of accentual difference will have to be taken as the decisive factor. For a description of the fate of ablaut in the history of German cf. Nübling et al. (2006: 199–209); Mailhammer (2007) provides a new systematisation of Germanic ablauting verbs.

in Old High German, when the phonological make-up of the stem determined to which of the seven ablaut-classes a verb would belong. The break-up of this old morphologically regular system seems to be due to a phonological change: the loss of laryngeals or a prosodic change. Hence in this case, an independent development in phonology creates exceptions on the morphological level.

Similarly, the loss of phonological distinctions in final syllables between Old and Middle High German obscured the phonological trigger for umlauting in German morphology. Therefore umlauting became 'free' to be a purely lexically based morphological process, for example in the formation of nominal plurals.25 In this way an irregularity effect was created: only some nouns take umlaut in their plural, cf. Faden (sg.) – Fäden (pl.) 'thread' vs. Fladen (sg.) – Fladen (pl.) 'flat bread'.26

An example for the second kind of scenario is provided by the virtual disappearance of the second person singular pronoun thou in Standard Modern English, where pragmatic and sociolinguistic factors were responsible for the spread of one form at the expense of another, thus creating a typological exception in the English pronominal system. In Middle English, and well into Shakespeare's time, there was a politeness distinction in English pronouns of address comparable to that of Modern French or Russian. There were two second person pronouns: the (informal) singular form thou and ye/you, which was employed in second person plural reference and also when a single person was to be addressed politely. In a kind of inflationary process, the usage of you-forms then became more and more generalised, so that thou was relegated to the fringe, used only in very restricted circumstances, such as certain religious contexts. Therefore, a kind of markedness reversal due to a sociolinguistic overgeneralisation took place: the relationship between marked and unmarked, between exception and rule was turned upside-down, so that what used to be the unmarked form, the informal thou, became an exception, while the more marked form, the formal ye/you, became the rule. A side-effect of this generalisation of you is that Standard Modern English stands out among the languages of the world as having number distinctions in nouns and pronouns in general, but not in second person pronouns.27

25. Cf. Sonderegger (1979: 297–319) for a detailed description of this development.
26. With the additional complication that for some nouns there exists regional variation as to whether they take their plural with or without umlaut, e.g. Wagen 'car'.
27. According to Cysouw (2003: 118), this situation "is not common at all."

3.2. The disappearance of exceptions

Given the situation in Modern English with a two-fold exception in the pronominal system – the lack of a number opposition in the second person plural is unusual both from a cross-linguistic point of view and language-internally, since English does otherwise encode number quite firmly – it is not surprising that many non-standard varieties of English 'repair' their systems: they create new plural forms by morphological reinforcement: y'all, youse etc. (cf. Hickey 2003). Thus, the exceptional gap in the paradigm is filled again.

In general, two potential diachronic scenarios for the gradual disappearance of an exceptionality in a language system are conceivable: either the sub-class forming the exceptional trait loses some or all of its members, or the sub-class is strengthened, thereby creating a stronger, less exceptional sub-system of the language. The basic mechanism here is that a set of exceptions exhibits a certain internal regularity, which is significant enough to attract new members gravitating to that group. Again, we illustrate these two possibilities with developments in the verbal system of German.

An example for the first case is the abovementioned exceptional class of strong verbs, which is overall on the decline in German: Sound change has obscured its phonological basis; new verbs entering the language as loan words are automatically assigned to the weak class; some strong verbs, mostly the less frequent ones, undergo inflection class changes and lose their ablaut-formation over time, as for example in a relatively recent case with backen 'to bake' and melken 'to milk'.28 Thus, the relative frequency of the two sub-classes of verbs (strong vs. weak) has reversed since the creation of the latter in Proto-Germanic.29

A phenomenon illustrating the second case is the integration of the German verb brauchen 'need' into the group of modal verbs. Unlike in the case of psych-verbs discussed above, where syntactic regulation overruled semantics, in this case the morphosyntactic development is driven by semantic pressure. Since this development might be less well-known than some of our other examples, and because it is currently happening under our very eyes, we discuss it in somewhat more detail.

28. Here, the old past tense forms buk and molk have practically died out, in favour of regular backte and melkte.

29. On a comparative note it is worth mentioning that the other Germanic languages follow a similar diachronic drift; in the extreme case of Afrikaans the strong-weak (i.e. irregular-regular) distinction has been lost almost completely, viz. outside the auxiliary system.

Three core grammatical properties of modal verbs in German are important for our understanding here. First, they are exceptional with respect to inflection: given their origin as old preterite presents, modals lack the usual final morpheme -t in the third person singular of the present tense indicative (5a), in contrast to regular verbs (5b):

(5) a. Sie  muss / kann / darf …
       she  must / can  / may
    b. Sie  sagt / macht / singt …
       she  says / makes / sings

Second, modals display a syntactic peculiarity in that they subcategorise infinitive phrases without the complementiser zu 'to' (6a), unlike many, but not all, non-modal verbs with infinitival complement (6b):

(6) a. Sie  muss   singen.          [modal verb]
       she  must   sing
    b. Sie  hofft  zu  singen.      [non-modal verb]
       she  hopes  to  sing

Third, when used in the (periphrastic) perfect tense, modals exhibit the so-called IPP-effect:30 basically this means that instead of an expected past participle, the modal occurs in the infinitive:

(7) a.  Er   hat  singen  müssen.
        3sg  has  sing    must.inf
    b. *Er   hat  singen  gemusst.
        3sg  has  sing    must.pII
    'He has had to sing.'

brauchen 'need' shares a central meaning aspect 'modality' with modal verbs when used with an infinitive: apart from its use with a nominal complement (as in Sie braucht einen Regenschirm. 'She needs an umbrella.'), this verb can also be used with an infinitival complement, in particular under negation (or in the context of a restrictive particle like nur).31 Unlike the core set of modal

30. I.e., 'infinitivus pro participio', also known as 'Ersatzinfinitiv'.
31. Unlike its English counterpart, the negation of müssen usually takes wide scope over the whole sentence, not just over its complement; hence Er muss nicht singen. (lit.: 'He must not sing.') does not mean 'He must: not sing.', but rather 'Not: he must sing.'. Negation of brauchen takes wide scope as well, while having a weaker meaning, along the lines of English 'He need not sing.' The domain of English 'must not'

verbs, however, 'brauchen' does not go back to an old preterite present, and accordingly, in compliance with its more distant origins as a normal transitive verb, it should behave as a regular verb morpho-syntactically; that is, suffix final -t in the third person singular and select an infinitive with zu. And this regular behaviour is exactly what one finds in most instances of written language usage, as in (8):

(8) Er   braucht  nicht  zu  singen.
    3sg  needs    not    to  sing
    'He need not sing.'

This construction, however, is presently developing into one that agrees with the irregular modal verb pattern, as illustrated in (9): no -t and no zu:

(9) Er   brauch  nicht  singen.
    3sg  need    not    sing
    'He need not sing.'

Moreover, in perfect tense constructions, the IPP-effect comes into force:

(10) Er   hat  nicht  singen  brauchen.
     3sg  has  not    sing    need.inf
     'He hasn't needed to sing.'

At present, this is found predominantly in Spoken German, but it appears more and more in written varieties as well (cf. Askedal 1997). Note that there is no phonological motivation for the loss of final -t in German, which is shown by the fact that -t never fails to occur with, e.g., rauchen 'to smoke', despite the phonological near-identity of the verbs:

(11) Sie  raucht  / *rauch.      [regular verb]
     3sg  smokes  /  smoke
     'She smokes.'

Thus, what we are witnessing at the moment with the spread of the type Er brauchØ nicht Ø singen is the integration of a regular verb into a morpho-syntactically irregular, exceptional subsystem, based on shared semantic features: From the general point of view of the morpho-syntax of German verbs, brauchen becomes exceptional – it develops from a regular verb into one with

with narrow scope is covered by German dürfen 'may / to be allowed to', e.g. Er darf nicht singen. (i.e.: 'He must not sing. / He is not allowed to sing.')

irregular features – but from the point of view of modal verbs, brauchen becomes regularised, being integrated into their specific, exceptional subsystem. This development demonstrates the power of the system not only in the case of the overall, more general system – here: verbal morpho-syntax – but also in the case of subsystems constituted by irregular forms that present an exception from the point of view of this general system.

In sum, brauchen exemplifies the interaction of different grammatical levels in the development of exceptions, in this case semantically-driven morpho-syntactic integration.32

3.3. Morphology as a locus of exceptions

Exceptions typically spread unevenly over the grammatical system as a whole, i.e. not all grammatical sub-systems are equally prone to exceptionality. In particular, the status and make-up of morphology as a central organisational device in the interaction of grammar and lexicon makes it open to the development of exceptions.

Morphology is often considered an evolutionarily earlier domain for the construction of complex linguistic elements than, say, syntax (cf. Fanselow 1985; Jackendoff 2002). In comparison to syntax, the interpretation of complex forms in word formation is underdetermined by their constituent structure and less driven by strict rules of syntactic-semantic co-composition; instead, it makes more use of contextual information. This is evident, for instance, in the case of determining the semantic relation between constituents of a compound. Take again a German example, the nominal compound Fischfrau 'fish-woman'. This word can mean 'woman who sells fish', 'wife of a fish', 'woman whose zodiac sign is pisces', 'mermaid', and a number of other things – Heringer (1984: 2) lists ten possible meanings – all we know from the make-up of the compound is that there has to be some relation between a woman and a fish or fishes, but not which one.33

In comparison to syntax, morphology is also less characterised by clear-cut classes with particular defining features and more often based on proto-patterns that form the basis for classes that are driven by associations. This can often lead to deviations from general patterns and the formation of exceptional

32. The import of semantics on the development of this particular domain of German morphology is further shown by the following fact: Old High German had a few more verbs which behaved morphologically as preterite presents (e.g. turran 'to dare'); among those verbs only the ones that belonged to the semantic sub-class of modals have survived into the present form of the language.

33. Similarly, note in English the difference between a pork butcher and a family butcher.

subsystems. An example coming from inflectional morphology is the formation of tense forms of irregular verbs in English (cf. Jackendoff 2002: ch. 6) and German (cf. Beedham 2005).

Since complex morphological constructions are often semantically underdetermined, the formation of such patterns can be based on aspects of meaning of the elements involved. This holds for semantic as well as pragmatic aspects. The example of brauchen 'need' above illustrated a case where the development into an inflectionally irregular verb is driven by the semantic affiliation with elements of a morpho-syntactically exceptional subsystem. An interesting example from morphopragmatics comes from diminutives in English and German (cf. H. Wiese 2006).

The diminutive affixes -chen and -i in Contemporary German, and similarly -ish in English, exhibit some exceptional, erratic behaviour from the morpho-syntactic point of view, although they present a unified picture on the morpho-pragmatic side. On the morpho-syntactic level, no clear classification of diminutive suffixes as heads or modifiers is possible. They act as prototypical heads (not just relativised heads in the sense of Di Sciullo and Williams 1987) with some stems, while with other stems, they behave like prototypical modifiers.

In (12), English -ish and the German diminutives -chen and -i behave as adjectival or nominal heads, respectively, with adjectives, nouns, quantifiers, and verbs as a basis:

(12) a. [[yellow]A ish]A, [[child]N ish]A, [[fifty]Q ish]A
     b. [[Hünd]N chen]N / [[Hund]N i]N 'dog-dim, i.e. doggy', [[Lieb]A chen]N 'dear-dim, i.e. dearie', [[Schnäpp]V chen]N 'grab-dim, i.e. bargain'

However, in (13), German diminutive suffixes behave as prototypical modifiers with particles as a basis, in particular with greeting particles (GP) and answer particles (INT) in informal speech:

(13) [[Tschüss]GP chen]GP / [[Tschüss]GP i]GP 'bye-dim', [[OK]GP chen]GP 'OK-dim', [[OK]INT chen]INT 'OK-dim', [[Jau]INT i]INT 'yes-dim'

And, likewise in informal contexts, English -ish can be used as a modifier, albeit as one that is even more of an outlier from a morphosyntactic point of view: it can be used not only with a morphological stem, but also with a syntactically complex phrase, thus neglecting a crucial syntactic distinction:

(14) a. Nikki and I woke up at quarter-to-eight-ish.
        [data from internet forum: http://www.exposedbrain.com/archives/000301.html; 4/5/2005]

     b. Breakfast: 8am – 2pm ish
        [menu of Little Deb's Café, Provincetown, MA, 2000]

This makes these diminutives highly exceptional suffixes from the morpho-syntactic point of view. However, their erratic behaviour turns out to be more systematic when viewed from a morphopragmatic perspective: on the pragmatic level, diminutives contribute the notions of 'informality' or 'intimacy' (cf. Dressler and Merlini Barbaresi 1994), and it is this expressive component that the morphosyntactically exceptional distribution of -chen and -i in (13) and of -ish in (14) draws on. Hence, the possibility of directly involving pragmatic aspects in morphology can lead to the establishment of morpho-syntactically exceptional subsystems: in this case, the hybrid syntactic status of diminutive suffixes in between head and modifier.

Strangely, there are also cases where morphology itself seems to be the source of exceptional behaviour at the higher syntactic level – a phenomenon which runs counter to the otherwise well-established, though not universally accepted,34 principle that syntax cannot 'see' the internal word-formational make-up of the lexical items it deals with, known as the 'Lexical Integrity Hypothesis' (Di Sciullo and Williams 1987: 48). Perhaps the best example for this is the erratic behaviour of certain complex verbs in German that fail to appear in V2-position (that is, in the standard position for verbs in assertive main clauses) (16a, 16b), but are perfectly fine at the end of a clause (the position of verbs in subordinate clauses) (16c):

(16) a. *Das  Flugzeug  not-landet       in  Paris.
         the  plane     emergency-lands  in  Paris
         'The plane makes an emergency landing in Paris.'
     b. *Das  Flugzeug  landet  in  Paris  not.
         the  plane     lands   in  Paris  emergency
         'The plane makes an emergency landing in Paris.'
     c. …, weil     das  Flugzeug  in  Paris  not-landet.
           because  the  plane     in  Paris  emergency-lands
         'because the plane makes an emergency landing in Paris.'

The verbs concerned are word formation products in some way or other (from back formation, conversion, incorporation or double-prefixation). So, what we see here is a syntactic exception that is governed by the morphological make-up of its constituent parts. But this alone cannot be sufficient; other factors such

34. Cf. Spencer and Zwicky (1998: 4–6).

as potential analogy to particle verbs seem to play a role as well, cf. the much better acceptability of (17):35

(17) Das  Flugzeug  landet  in  Paris  zwischen.
     the  plane     lands   in  Paris  between
     'The plane makes a stop-over landing in Paris.'

In sum, morphology appears to be a prime locus for exceptions. It is the central part of the grammatical system, determined by, and partially determining, exceptionality in grammatical structure.

4. The significance of exceptions – what this book has to offer

As we have seen, the study of exceptions is relevant for linguistic theory on a substantial level. In linguistics, as in all areas of science, the pursuit of scientific knowledge implies the creation of abstractions, which are then formalised in rules, constraints, etc. This will, as a matter of principle, lead to a potential for exceptions at all levels involved, i.e. on all grammatical levels – phonology, morphology, syntax, semantics – as well as in their interaction with each other and with pragmatics and other extragrammatical areas. Even if the researcher takes it as a methodological principle that exceptions must not be postulated unless absolutely necessary, there are many cases when deviant facts cannot be accommodated in a simple and elegant model.36 Such a challenge generates a range of approaches and can lead to new insights into the nature of the linguistic system and its (internal and external) interfaces.

The analysis of exceptions can be instructive in at least two respects. Firstly, from a methodological point of view, the treatment of exceptions will highlight different ways of dealing with empirical data, each leading to a different status of the concept of 'rule' in the respective theory. Secondly, from the point of view of the linguistic system, exceptions show us what kind of system language is: an arrangement of interlocking structures, each of them more or less flexible, always in flux such that variation and change are possible.

In the present book, we have collected studies that tackle the problem of exceptions from a number of different angles. Most papers (and the commentaries we have invited on them) focus on syntactic phenomena, but there are also discussions of morphology and of phonology, as well as of languages as macro-structures.

35. Contrary to what some studies suggest, it is still not clear what exactly it is that determines the status of a given lexical item as a non-V2-verb; cf. Freywald and Simon (2007) for a brief overview and some empirical investigations.

36. Maybe this holds even more for linguistics than for other areas of science, given the curious duality of language as both biologically and culturally determined.

The introduction of this book consists of two parts. The present introductory chapter is complemented by a paper by Edith Moravcsik, who surveys possible approaches to the problems exceptions pose, with a focus on syntactic theory; her taxonomy of exceptionality in language, and of how linguists cope with it, can serve as a basis for all further discussion of the subject.

The main body of the book is then divided into four parts. The papers in the first part take a closer look at the area where exceptions are traditionally taken to be stored: the lexicon. This is the designated location of word-based exceptionality in a language,37 comprising morphological as well as phonological idiosyncrasies. The papers in the second part discuss the interrelation of grammatical subsystems, in particular syntax and semantics, but also syntax and extra-grammatical aspects such as processing. The third part is dedicated to a common method to accommodate exceptions: relaxing the system-constituting elements of grammatical structure, be they conceptualised as e.g. rules or as constraints. The fourth and final part provides a statistically informed consideration of wholesale exceptionality (or unusualness) of languages as such.

On the whole, the papers in this book (and the respective comments and responses) offer a multi-faceted body of work on the significance of exceptions for linguistic theory. They show the potential of different approaches to capture grammatical exceptions; and they demonstrate how the study of exceptions can be productive for the development of new grammatical models and new perspectives on grammatical systems. This is true for a number of controversial claims in current linguistic research:

First, there is more to the systematic study of language than just grammar: aspects of linguistic structure interact with external systems, and thus exceptions can be explained. For instance, morphologically exceptional (i.e. irregular) structures emerge because processing pressures – such as the production need to be phonologically brief – act on the demand to produce informationally distinct forms, as discussed by Damaris Nübling in her contribution on Germanic verbal morphology. Similarly, Frederick J. Newmeyer invokes parsing strategies in his explanation of cross-linguistic variation and exceptional patterns therein.

Second, within the system, grammatical subcomponents can interact in such a way as to enhance the stability of exceptions, which then resist regularization: e.g. a certain class of oblique subjects in Icelandic and Faroese was reinforced by its semantic coherence and has thus survived into the present systems, as is argued by Jóhannes Gísli Jónsson & Thórhallur Eythórsson in their contribution. A grammar-internal view on exceptionality can also lead to the adoption of softer models of grammar, which can incorporate seemingly exceptional cases as instances of less central structures. As Sam Featherston demonstrates, such a way of thinking is well-qualified to tackle the problem of grammatical gradience – the fact that there are grey zones of more or less severe awkwardness between ‘fully acceptable’ and ‘unacceptable’ structures.

37. And some linguists would maintain that all exceptionality is word-based.

Several contributions in this book are apt to challenge our traditional views of the notion of exception as they discover new kinds of exceptions. Thus, Thomas Wasow, T. Florian Jaeger and David M. Orr identify exceptions in language use that one notices when taking into account quantitative data; again, these exceptions can be accounted for in terms of processing and other extra-grammatical factors. Ralf Vogel, by contrast, invokes a population-based notion of ‘exception’: he uncovers different tolerance levels on the part of native speakers of German with regard to contradictory case-information in free relative clauses; from such a perspective, exceptions exist in the minds of certain speakers, but not others. Greville G. Corbett discerns a kind of (higher-order) hyper-exception that occurs when different types of exception come together and interact with each other; he thus underlines their great importance – especially those linguistic structures that are extraordinarily rare – for the understanding of what is possible in human language. Frederik Fouvry, in turn, takes a broad view of exceptions: while traditional approaches in computational linguistics tend to treat both production errors and linguistic exceptions as extra-grammatical structures that are neither covered, nor possible to cover, by the grammar formalism, the alternative apparatus he proposes captures all types of deviance from the expected data: exceptional but acceptable idiosyncrasies of a data set and mere errors are both encompassed within one arrangement of constraints that can be relaxed as needed to accommodate them.

A more conservative line of reasoning is followed by Barış Kabak and Irene Vogel. Concentrating on patterns of vowel harmony and stress assignment in Turkish, they maintain that it is not possible to determine an externally motivated sub-class of exceptional words, such as loan words or names; therefore employing a component of lexical prespecification in the grammar is unavoidable, which basically reverts to the traditional idea that every word must be treated on its own terms.

Finally, Michael Cysouw has a different focus in his contribution: instead of looking at instances of exceptional grammar in individual languages, he zooms out and takes on a macro-perspective by looking at patterns of co-occurrence of rare, exceptional traits in a multitude of languages: according to his findings, there are some clusters of linguistic exceptionality, or unusualness, in certain areas of the world, among them North-Western Europe, whose languages form the basis of most of the theorising in contemporary linguistics.

Taken together, the contributions to this volume explore a range of new avenues to an understanding of exceptions: they probe deeper into the analysis of already established grammatical exceptions, they re-define and develop further the notion of exceptionality, and they invoke a variety of concepts to describe the formation of exceptions and to explain their existence in grammatical systems. While they are understood to be rare and thus in need of special efforts to be grasped, exceptions are expected in the various models utilized – either because of some grammar-internal competition or because of extra-grammatical factors bearing on grammar proper. Needless to say, because of the exceptionally complex phenomenon of exceptions, it can be expected that not every linguist will agree with the analyses and models offered. But in any case, we expect exceptions to keep fascinating linguists who are keen to understand the workings of language.

‘Il serait absurde de dire que l’exception est mieux traitée dans une perspective que dans l’autre.’ [‘It would be absurd to say that the exception is better treated in one perspective than in the other.’]

(Danjou-Flaux and Fichez-Vallez 1985: 116)

Abbreviations

a     adjective
acc   accusative
dim   diminutive
gp    greeting particle
inf   infinitive
int   interjection
n     noun
nom   nominative
p ii  2nd participle
sg    singular

References

Aarts, Bas, David Denison, Evelien Keizer, and Gergana Popova. 2004. Fuzzy Grammar. A Reader. Oxford: Oxford University Press.

Askedal, John Ole. 1997. brauchen mit Infinitiv. Aspekte der Auxiliarisierung. Jahrbuch der ungarischen Germanistik 1997: 53–68.

Beedham, Christopher. 2005. Language and Meaning. The Structural Creation of Reality. Amsterdam/Philadelphia: Benjamins (Studies in Functional and Structural Linguistics 55).

Chomsky, Noam. 1965. Aspects of the Theory of Syntax. Cambridge (MA): MIT Press.

Clahsen, Harald, Monika Rothweiler, Andreas Woest, and Gary F. Marcus. 1992. Regular and irregular inflection in the acquisition of German noun plurals. Cognition 45: 225–255.

Collinge, N.E. 1985. The Laws of Indo-European. Amsterdam/Philadelphia: Benjamins.

Cysouw, Michael. 2003. The Paradigmatic Structure of Person Marking. Oxford: Oxford University Press.

Danjou-Flaux, Nelly, and Élisabeth Fichez-Vallez. 1985. Linguistique taxonomique et grammaire générative. Le traitement de l’exception. Langue Française 66: 99–116.

Di Sciullo, Anna Maria, and Edwin Williams. 1987. On the Definition of Word. Cambridge (MA)/London: MIT Press (Linguistic Inquiry Monographs 14).

Dressler, Wolfgang U., and Lavinia Merlini Barbaresi. 1994. Morphopragmatics. Diminutives and Intensifiers in Italian, German, and Other Languages. Berlin/New York (Trends in Linguistics. Studies and Monographs 76).

Dryer, Matthew. 1998. Why statistical universals are better than absolute universals. Chicago Linguistic Society 33: 123–145.

Duden. 1995. Grammatik der deutschen Gegenwartssprache, Günther Drosdowski (ed.). 5th edition. Mannheim: Dudenverlag (Duden 4).

Duden. 2005. Die Grammatik, Dudenredaktion (ed.). 7th edition. Mannheim: Dudenverlag (Duden 4).

Fanselow, Gisbert. 1985. Die Stellung der Wortbildung im System kognitiver Module. Linguistische Berichte 96: 91–126.

Freywald, Ulrike, and Horst J. Simon. 2007. Wenn die Wortbildung die Syntax stört: Über Verben, die nicht in V2 stehen können. Verbale Wortbildung im Spannungsfeld zwischen Wortsemantik, Syntax und Rechtschreibung, Maurice Kauffer and René Métrich (eds.), 181–194. Tübingen: Stauffenburg (Eurogermanistik 26).

Gehling, Thomas. 2006. Die Suche nach dem anderen ihr. Zur Inklusiv-Exklusiv-Distinktion in der Zweiten Person. Einblicke in Sprache. Festschrift für Clemens-Peter Herbermann zum 65. Geburtstag, Thomas Gehling, Viola Voss and Jan Wohlgemuth (eds.), 153–180. Berlin: Logos.

Hagège, Claude. 2005. Le défi de la langue ou la souillure de l’Exception. Faits de langues 25: 53–60 (special issue: L’exception entre les théories linguistiques et l’expérience, Irina Vilkou-Poustovaïa (ed.)).

Harley, Heidi, and Elizabeth Ritter. 2002. Person and number in pronouns. A feature-geometric analysis. Language 78: 482–526.

Heringer, Hans-Jürgen. 1984. Wortbildung: Sinn aus dem Chaos. Deutsche Sprache 12: 1–13.

Hickey, Raymond. 2003. Rectifying a standard deficiency: Second-person pronominal distinction in varieties of English. Diachronic Perspectives on Address Term Systems, Irma Taavitsainen and Andreas H. Jucker (eds.), 345–374. Amsterdam/Philadelphia: Benjamins (Pragmatics and Beyond N.S. 107).

Hoenigswald, Henry M. 1978. The Annus Mirabilis 1876 and posterity. Transactions of the Philological Society 76: 17–35.

Hurford, James R. 1975. The Linguistic Theory of Numerals. Cambridge: Cambridge University Press (Cambridge Studies in Linguistics 16).

Jackendoff, Ray S. 2002. Foundations of Language. Oxford: Oxford University Press.

Janda, Richard D. 1991. Frequency, markedness, and morphological change. On predicting the spread of noun-plural -s in Modern High German and West Germanic. Proceedings of the 7th Eastern States Conference on Linguistics, Yongkyoon No and Mark Libucha (eds.), 136–153. Columbus (OH): Ohio State University.

Jankowsky, Kurt R. 1972. The Neogrammarians. A Re-Evaluation of Their Place in the Development of Linguistic Science. The Hague/Paris: Mouton.

Labov, William. 1972. Language in the Inner City. Studies in the Black English Vernacular. Philadelphia: University of Pennsylvania Press.

Labov, William. 1981. Resolving the Neogrammarian controversy. Language 57: 267–308.

Lass, Roger. 1997. Historical Linguistics and Language Change. Cambridge: Cambridge University Press (Cambridge Studies in Linguistics 81).

Lehmann, Winfred P. 1993. Theoretical Bases of Indo-European Linguistics. London: Routledge.

Mailhammer, Robert. 2007. Islands of resilience. The history of German strong verbs from a systematic point of view. Morphology 17: 77–108.

Marcus, Gary F., Ursula Brinkmann, Harald Clahsen, and Richard Wiese. 1995. German inflection: The exception that proves the rule. Cognitive Psychology 29: 189–256.

McMahon, April M.S. 1994. Understanding Language Change. Cambridge: Cambridge University Press.

Newmeyer, Frederick J. 2005. Possible and Probable Languages. A Generative Perspective on Linguistic Typology. Oxford: Oxford University Press.

Nübling, Damaris, Antje Dammel, Janet Duke, and Renata Szczepaniak. 2006. Historische Sprachwissenschaft des Deutschen. Eine Einführung in die Prinzipien des Sprachwandels. Tübingen: Narr.

de Oliveira, Marco Antonio. 1991. The neogrammarian controversy revisited. International Journal of the Sociology of Language 89: 93–105.

Osthoff, Hermann, and Karl Brugmann. 1878. Morphologische Untersuchungen auf dem Gebiete der indogermanischen Sprachen. Erster Theil. Leipzig: Hirzel.

Pinker, Steven. 1999. Words and Rules. The Ingredients of Language. New York: Basic Books.

Ritt, Nikolaus. 2004. Selfish Sounds and Linguistic Evolution. A Darwinian Approach to Language Change. Cambridge: Cambridge University Press.

Simon, Horst J. 2005. Only you? Philological investigations into the alleged inclusive-exclusive distinction in the second person plural. Clusivity. Typology and Case Studies of the Inclusive-Exclusive Distinction, Elena Filimonova (ed.), 113–150. Amsterdam/Philadelphia: Benjamins (Typological Studies in Language 63).

Sonderegger, Stefan. 1979. Grundzüge deutscher Sprachgeschichte. Diachronie des Sprachsystems. Berlin/New York: de Gruyter.

Spencer, Andrew, and Arnold M. Zwicky. 1998. Introduction. The Handbook of Morphology, Andrew Spencer and Arnold M. Zwicky (eds.), 1–10. Oxford/Malden (MA): Blackwell.

Tsohatzidis, Savas L. (ed.). 1990. Meanings and Prototypes. Studies in Linguistic Categorization. London: Routledge.

Twain, Mark. 1907. The awful German language. In A Tramp Abroad, 267–284. New York: P.F. Collier & Son [originally: 1880].

Udolph, Jürgen. 1989. Verners Gesetz im heutigen Deutsch. Zeitschrift für Dialektologie und Linguistik 56: 156–170.

Verner, Karl. 1877. Eine ausnahme der ersten lautverschiebung. Zeitschrift für vergleichende Sprachforschung 23: 97–130.

Wiese, Heike. 2003. Sprachliche Arbitrarität als Schnittstellenphänomen. Habilitation thesis, Humboldt-University Berlin.

Wiese, Heike. 2006. Partikeldiminuierung im Deutschen. Sprachwissenschaft 31: 457–489.

Wiese, Richard. 1996. The Phonology of German. Oxford: Oxford University Press.

Wilbur, Terence H. (ed.). 1977. The Lautgesetz-Controversy. A Documentation. Amsterdam/Philadelphia: Benjamins (Amsterdam Studies in the Theory and History of Linguistic Science, Series 1, 9).

Wittgenstein, Ludwig. 1953. Philosophical Investigations. Transl. by G.E.M. Anscombe. Oxford: Blackwell.

Wohlgemuth, Jan, and Michael Cysouw (eds.). 2010a. Rethinking Universals. How Rarities Affect Linguistic Theory. Berlin/New York: Mouton de Gruyter (Empirical Approaches to Language Typology 45).

Wohlgemuth, Jan, and Michael Cysouw (eds.). 2010b. Rara & Rarissima. Documenting the Fringes of Linguistic Diversity. Berlin/New York: Mouton de Gruyter (Empirical Approaches to Language Typology 46).


Coming to grips with exceptions

Edith Moravcsik

Abstract. Based on a general definition of the concept of exception, the problematic nature of exceptions is made explicit by showing how they weaken the generality of descriptions: they disrupt a superclass without forming a principled subclass. Focusing on examples from syntax, three approaches to dealing with exceptions are identified.

1. Why are exceptions a problem?

1.1. Defining exceptions

Typical exceptions are a small subclass of a class where this subclass is not otherwise definable. What this means is that apart from their deviant characteristic that renders them exceptional, there is no additional property that distinguishes them from the regular cases. Given also that the exceptional subclass has generally much fewer members than the regular one, exceptions can be characterized as a subclass of a class that is weak both quantitatively (fewer members) and qualitatively (only a single distinguishing characteristic).

The description of an exception must include five components:

– the pertinent domain;
– the class within which the items in question are exceptional, which we will call superordinate class (or superclass for short);
– the regular subclass and the irregular subclass;
– the characteristic in which the two subclasses differ; and
– the relative size of the two subclasses.

This is shown in (1) on the example of English nominal plurals, where RSC labels the regular subclass and ESC is the label for the exceptional one.1

1. A large inventory of lexical exceptions in English is cited and their exceptionality relative to transformational rules discussed in Lakoff (1970: 14–21, 30–43 et passim).


(1) – domain: English
    – superordinate class: plural nouns
    – subclasses: RSC: apples, cats, pencils, etc.
                  ESC: oxen, children, brethren
    – distinguishing property: plural suffix is {s} versus /ən/
    – relative size of membership: RSC > ESC
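The five-component schema can be rendered as a small data structure. The following is an illustrative sketch only (all names are our own invention, not from the chapter), encoding example (1) and checking the size criterion RSC > ESC on a toy sample:

```python
from dataclasses import dataclass

@dataclass
class ExceptionSchema:
    """Hypothetical encoding of the five descriptive components of an exception."""
    domain: str
    superclass: str
    regular_subclass: list        # RSC: sample members
    exceptional_subclass: list    # ESC: sample members
    distinguishing_property: str

    def relative_size_ok(self) -> bool:
        # The regular subclass must outnumber the exceptional one (RSC > ESC).
        return len(self.regular_subclass) > len(self.exceptional_subclass)

english_plurals = ExceptionSchema(
    domain="English",
    superclass="plural nouns",
    regular_subclass=["apples", "cats", "pencils"],
    exceptional_subclass=["oxen", "children"],
    distinguishing_property="plural suffix {s} versus /ən/",
)
print(english_plurals.relative_size_ok())  # True for this toy sample
```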

Three components of the schema call for comments. Starting with domain: a structure may be exceptional within a language, a dialect of a language, a language family, a language area, or across languages. M. Cysouw’s paper in this volume is a study of crosslinguistic exceptionality and so is part of S. Featherston’s article.2 It is important to indicate the domain within which an exception holds because exceptionality is relative to it. First, what is an exceptional structure in one language may not be exceptional in another. An example is the morphosyntactic alignment of subjects of one-place predicates with patient-like arguments of two-place predicates: this is regular in ergative languages but exceptional in accusative languages. Second, language-internal and crosslinguistic exceptionality do not necessarily coincide. For example, click sounds are very numerous in Zulu but very rare across languages; and passive constructions are infrequent in Kirghiz, but frequent across languages.

A second set of comments has to do with the distinguishing property of the exceptional class. Several papers in this volume emphasize the unique nature of exceptions. B. Kabak and I. Vogel are very explicit about this point as they analyze Turkish vowel harmony and stress assignment and argue for the need for lexical pre-specification of the irregular items as both necessary and sufficient for an adequate account. J.G. Jónsson and Th. Eythórsson also emphasize that truly exceptional structures have no correlating properties. They show genitive objects in Icelandic to be clearly exceptional by this criterion, as opposed to accusative subjects, which show subregularities.

As two of the papers in the volume show, items may differ from the regular class in more than one characteristic. G. Corbett discusses lexemes that show higher-order exceptionality by multiply violating normal morphological patterns. Utilizing the WALS database, M. Cysouw computes rarity indices for languages and language areas and shows that they may be multiply exceptional to varying degrees. Paradoxically, exceptions that differ from the regular subclass in more than one way are less exceptional by our definition since each exceptional property finds its correlates in the other deviant characteristics.

2. For a rich collection of crosslinguistically rare grammatical constructions, see the Grammatisches Raritätenkabinett at http://lang.uni-konstanz.de/pages/proj/sprachbau.htm. On the inherent difficulties of establishing a grammatical structure as crosslinguistically rare, see Cysouw (2005).

Lexical items may be exceptional not by structurally deviating from others but by exhibiting skewed, rather than balanced, frequency patterns of their alternative forms. For example, the passive form of the English verb convict occurs with unusual frequency relative to the passive of other verbs. Such “soft exceptions” are in the focus of Th. Wasow, F. Jaeger, and D. Orr’s paper (this volume) as they explore correlates for the omission of the conjunction that in English relative clauses.

The third comment pertains to relative size. Note that having fewer members is a necessary but not sufficient characteristic of an exceptional subclass. That it is necessary can be shown by the example in (1): without reserving the label “exception” for the smaller subclass, English nouns whose plural is formed with {s} would qualify for being the exceptions even though intuitively we do not consider them exceptional.

But being a small subclass is not sufficient for exceptionality. For example, of the English verbs whose past tense form ends in {d}, relatively few employ the allomorph /əd/. But this subclass of verbs is not exceptional because the members have a phonological property in common that defines them as a principled, rather than random, class.

An apparent counterexample to the regular class having more members than the exceptional class(es) is nominal plural marking in German. There are five plural markers: -Ø, -e, -er, -(e)n, and -s; which – if any – should be considered the regular one? Although most nouns of the German lexicon take -(e)n, Clahsen, Rothweiler, and Woest (1992) argue convincingly that -s is actually the default form: it is the only productive one, used with names (e.g. die Bäckers) and with newly-minted words such as clippings (e.g. Loks for Lokomotiven) or loan words (e.g. Kiosks). Given that relatively few existing nouns are pluralized with -s, declaring this form to be the regular ending would seem to conflict with the general pattern of the regular class having a larger membership than the exceptional ones. However, there is in fact no conflict: the very fact that -s is productive expands indefinitely the class of nouns that take it as their plural suffix.

1.2. Two problems with exceptions

Why are exceptions a problem? The short answer is that they fly in the face of generalizations. This is so due to two aspects of their definition. First, by token of the very fact that they form a subclass of a class, they conflict with a generalization that would otherwise hold for the entire superordinate class.

This problem so far is not specific to exceptions: it is posed by all instances of subclassification. Subclasses, by definition, compromise the homogeneity of a superclass. But as long as the subclasses have at least one characteristic other than the one that the split is based on, the loss of the supergeneralization is compensated for by a sub-generalization that describes the subclasses.

For an example of regular subclasses, let us consider those English nouns that form their plural with the suffix {s}. This is not an undivided class in that the particular shape of the suffix is variable: -/s/, -/z/, and -/əz/. However, each subclass is definable by phonological context: /əz/ after alveolar and palatal fricatives and affricates, /s/ after other voiceless sounds and /z/ after other voiced sounds. Thus, none are exceptions.
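Because each subclass is defined by phonological context, the allomorph is fully predictable. The following rough sketch (our own simplification, not from the chapter; the sound inventories are deliberately abridged) makes that predictability explicit as a decision rule:

```python
# Abridged, illustrative sound classes (assumptions, not an exhaustive inventory).
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}   # alveolar/palatal fricatives and affricates
VOICELESS = {"p", "t", "k", "f", "θ"}           # other voiceless sounds

def plural_allomorph(final_sound: str) -> str:
    """Return the regular English plural allomorph for a noun ending in final_sound."""
    if final_sound in SIBILANTS:
        return "əz"                 # e.g. bush -> bushes
    if final_sound in VOICELESS:
        return "s"                  # e.g. cat -> cats
    return "z"                      # all other (voiced) sounds, e.g. dog -> dogs

print(plural_allomorph("ʃ"))  # əz
print(plural_allomorph("t"))  # s
print(plural_allomorph("g"))  # z
```

Since the rule needs no list of individual nouns, none of the three subclasses counts as exceptional in the sense defined above.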

Exceptional subclasses are different from normal subclasses of this sort because they have no additional characteristics to independently identify them. This is the second reason why exceptions pose a problem: they not only scuttle a generalization that would otherwise hold for the entire superordinate class but also do not allow for a generalization about their subclass. The fact that exceptions have much fewer members than their sister-classes compounds the problem: their sporadicity suggests that correlating properties may not exist at all: they may be random chance phenomena.3

All in all: exceptions disrupt supergeneralizations without supporting subgeneralizations. In the case of English noun plurals, the two generalizations that the exceptions disallow are given in (2).

(2) a. supergeneralization lost:
       **All English nouns form their plural with {s}.
    b. subgeneralization not possible:
       **All those English nouns that form their plural with /ən/ have property P.

The two problems posed by exceptions can be similarly illustrated with a crosslinguistic example: phoneme inventories that lack nasal consonant phonemes.

(3) – domain: a sample of languages
    – superordinate class: consonant phoneme inventories
    – subclasses: RSC: consonant phoneme inventories of English, Irish, Amharic, etc.
                  ESC: consonant phoneme inventories of Quileute, Puget Sound, Duwamish, Snoqualmie, Mura, Rotokas
    – distinguishing property: presence versus absence of nasal consonant phonemes
    – relative membership: RSC > ESC

3. Regarding crosslinguistic exceptionality, compare Haiman (1974: 341): “If a word exhibits polysemy in one language, one may be inclined, or forced, to dismiss its various meanings as coincidental; if a corresponding word in another language exhibits the same, or closely parallel polysemy, it becomes an extremely interesting coincidence; if it displays the same polysemy in four, five, or seven genetically unrelated languages, by statistical law it ceases to be a coincidence at all.”

The two generalizations disabled by the exceptional consonant phoneme inventories are as follows:

(4) a. supergeneralization lost:
       **All consonant phoneme inventories of languages include nasal consonant phonemes.
    b. subgeneralization not possible:
       **All those languages that lack nasal consonant phonemes have property P.4

The lesser number of nasal-less languages suggests once again that their occurrence is for no reason: it may be an accident.

How are the twin problems posed by exceptions responded to in linguistic analysis? The purpose of this paper is to address this question by surveying the various ways in which exceptions have been dealt with in syntax. The alternatives fall into three basic types. First, many descriptive frameworks represent exceptional structures as both exceptional and non-exceptional. What this means is that the representation of the exceptional structure is split into two parts: one shows it to be exceptional but the other part draws it into the regular class. Second, there are proposals for regularizing exceptions: re-analyzing them so that they turn out to be fully unexceptional. And third, some accounts acknowledge exceptions as such and try to explain why they are exceptional.

The three options of accommodating, regularizing, and explaining exceptions will be discussed in the next three sections in turn.

2. Accommodating exceptions in syntax

Let us consider ways of representing syntactic exceptions as hybrid structures, part exceptional and part regular. The idea is similar to psychiatrists ascribing deviant behavioral traits of people to a separate persona coexisting with the normal personality. Four such approaches may be identified in the literature:

– two faces of a single representation
– two strata in a single representation
– separate representations in a single component
– separate representations in separate components

We will take a closer look at each.

4. Note that the class of languages that have no nasal consonant phonemes is not defined either by genetic or by areal relationship: while Quileute (Chimakuan) and the Salish languages Puget Sound, Duwamish, and Snoqualmie are geographically close, Mura is spoken in Brazil and Rotokas in New Guinea. For some Niger-Congo languages without nasal consonant phonemes, see Bole-Richard (1985).

2.1. Two faces of a single representation

In this type of account, exceptional and non-exceptional characteristics of a construction are represented on opposite sides of the same structural diagram. An example is Katalin É. Kiss’s transformational generative account of long-distance agreement in Hungarian (É. Kiss 1987: 226–243).

In Hungarian, the verb agrees with both its subject and its direct object. Person agreement with the object is illustrated in (5).

(5) a. Én   szeretném                őt.
       I    would.like-S1SUBJ.3OBJ   him
       ‘I would like him.’

    b. Én   szeretné-lek             téged.
       I    would.like-S1SUBJ.2OBJ   youS
       ‘I would like youS.’

However, verb agreement in sentences such as (6) is unexpected.

(6) a. Én   szeretném                látni    őt.
       I    would.like-S1SUBJ.3OBJ   to:see   him
       ‘I would like to see him.’

    b. Én   szeretné-lek             lát-ni   téged.
       I    would.like-S1SUBJ.2OBJ   to:see   youS.ACC
       ‘I would like to see youS.’

The problem is that the verb in the main clause – ‘would like’ – has a suffix selected by the direct object of the subordinate clause ‘you’ rather than by its own object, which would be the entire subordinate clause, as in (7).5

5. The verb-agreement pattern in Hungarian is actually more complex than shown by these examples but the details are not relevant here.


(7) Én   szeretném,               ha   láthatnálak.
    I    would.like-S1SUBJ.3OBJ   if   I.could.see.you

(6) is an exception since agreement in general is local: controller and target are clause-mates. Because of the type of agreement shown in (6), the supergeneralization according to which all agreement is local is lost and no subgeneralization is apparent holding for cases such as (6) where agreement is not local.

One might try to define the subclass of structures that exhibit this kind of long-distance agreement by the schema main verb + infinitive complement. If this definition were successful, the structures would form a regular, rather than exceptional, subclass since they would have a common denominator other than showing long-distance agreement. However, not all verb + infinitive constructions show this kind of agreement: transitive verbs (such as ‘want’ or ‘try’) and some intransitive ones (such as ‘strive’) do but other intransitive ones (e.g. ‘be ready’) do not (cf. É. Kiss 1987: 227–229, 2002: 203–205).

Exceptionality would be eliminated if we could analyze the entire sentences in (6) as a single clause because then agreement controller and agreement target would be clause-mates. There is indeed some evidence indicating the monoclausality of the sentence even apart from agreement. For example, if the subordinate object ‘youS’ is to be focused it may occur in immediately pre-verbal position relative to the main verb. However, there is also evidence that this sentence is bi-clausal: the subordinate object ‘youS’ may also be focused by being placed in front of the subordinate verb.

Thus, sentences like (6) are exceptional when considered as biclausal structures but regular when considered as monoclausal. Since there are arguments for both analyses, É. Kiss concludes as follows (1987: 237, 239; emphasis added): “It appears that the monoclausal and biclausal properties are equally weighty; neither can be ignored or explained away. What is more, they are simultaneously present; consequently, the biclausal structure and monoclausal structure that can be associated with [this construction] cannot represent two subsequent stages of the derivation, but must hold simultaneously …”

Here is a simplified version of the representation suggested by É. Kiss (1987: 238).


(8) [tree diagram, simplified from É. Kiss (1987: 238): the string szeretnélek én látni téged carries NP and Inf projections on both the top face and the bottom face of a double-sided tree]

The sentence is shown as both exceptional and not exceptional depending on which face of the tree we consider. The top face represents the biclausal, exceptional structure: agreement controller and agreement target are in separate clauses. The bottom face in turn is monoclausal, rendering the agreement configuration regular, with controller and target situated in the same clause. Thus, the supergeneralization according to which agreement is local is denied by the top half of the tree but it is saved with respect to the bottom half.

2.2. Two strata in a single representation

In É. Kiss’s account, the exceptional and non-exceptional “personalities” of the construction are co-present at a single stage of the grammatical derivation. In other types of accounts, the regular and irregular facets of the construction are separated by derivational distance.

For an example, let us first consider the analysis of passives in Relational Grammar. In this framework, passive sentences are viewed as exceptional relative to actives. The example sentence skeletally represented in (9) is The woman was eaten by the crocodile (Blake 1990: 1–2). (P stands for predicate, 1 stands for subject, 2 stands for direct object, Cho (“chomeur”) stands for the demoted subject: the by-phrase.)

(9)  P     1           2
     P     Cho         1
     eat   crocodile   woman


The structural representation shows the sentence as having the passive structure on the final (lower) stratum but it has the active – i.e., regular – structure on the initial (upper) stratum. The passive structure is derived by a grammatical-relations-changing rule: advancement of the initial object and demotion of the initial subject.

The supergeneralizations that are lost due to the existence of passives are the alignment of the more active participant of a two-place predicate with the grammatical subject and the alignment of the less active participant with the object. There is also no subgeneralization that would render the alternative, passive-type alignment predictable. The label “passive” would not provide an independent characterization of this subclass since it is simply a label for the exceptional structure. The Relational Grammar account restores the supergeneralization in that it holds for the initial stratum of passive sentence representations, although not for the final one.

The derivational distance between the irregular and regular facets of the sentence is more pronounced when they are represented as two separate tree structures. This will be illustrated next.

2.3. Separate representations in a single component

This approach to exceptions is familiar from various versions of Transformational Generative Grammar: exceptional structures are represented by two or more trees within the syntactic component of the grammar connected by transformational rules. We will look at two examples, one involving a movement rule, the other, raising.

The first example has to do with verb-object-particle order in English. Given the generalization that components of a lexical entry must be adjacent, and given that the verb and the particle – e.g. wipe and off – form a single lexical item, the prediction is that the verb will be immediately followed by the particle, as is the case in Megan wiped off the table.

However, this prediction is not always valid since Megan wiped the table off is also grammatical. The verb & object & particle order thus contradicts the supergeneralization about components of lexical items being adjacent and, in the absence of some condition under which the exceptional order occurs, there is no subgeneralization possible, either. The descriptor “particle verb” would not define the class independently of the deviant order since this label is based on the separability of the two elements.

In some versions of Transformational Grammar, sentences where verb and particle are not adjacent are shown as having the regular order on the underlying level with the particle directly following the verb, while the exceptional order


40 Edith Moravcsik

is shown on the surface level (see for example Jacobs and Rosenbaum 1968: 100–106).

(10)  underlying structure:  Megan wiped off the table.
            ⇓
      surface structure:     Megan wiped the table off.

Thus, the supergeneralization is restored with respect to the underlying structure, with exceptionality relegated to surface structure.

A second example of this approach to exceptions is the analysis of long-distance agreement in languages such as Imbabura Quechua proposed by Maria Polinsky (2003). This case is similar to the one seen in Hungarian: the verb of the main clause shows agreement with the object of the subordinate clause rather than with its own object, which would be the entire subordinate clause. (11) is an example (NMLS stands for ‘nominalizer’) (Polinsky 2003: 292).

(11)  Jose   yacha-wa-n          ñuca-ta   maria-ta    juya-j-ta.
      Jose   know-S1OBJ-S3SUBJ   me-ACC    Maria-ACC   love-PRES.NMLS-ACC
      ‘Jose knows that I love Maria.’

Polinsky acknowledges that the controller is in the subordinate clause on an underlying level but argues that it is subsequently raised into the main clause. This means that controller and target end up in the same clause and thus the supergeneralization about the locality of agreement is preserved intact on the surface level although it is violated in underlying structure.

The anomaly that the grammatical operation of raising solves here is anomalous agreement. In addition, as is well-known, raising has also been adopted for resolving anomalous case marking. In English sentences like Mary believes him to have won the race, the accusative case of him is problematic. First, it thwarts the supergeneralization according to which verbs assign case to their own arguments because this noun phrase is the semantic subject of to have won and not an argument of any kind of believes. Second, no general conditions are apparent under which this anomaly crops up and thus no subgeneralization is possible.

In Government and Binding theory, the exceptionality of such instances is explicitly acknowledged by the label “Exceptional Case Marking”, attributed to the exceptional nature of the main verbs that allow for this case assignment pattern (Chomsky 1984: 64–74, 98–101; Webelhuth 1995: 35–38). An alternative tack is taken in Paul Postal’s classic account of 1974, as well as in Howard Lasnik’s more recent proposal (Lasnik 1999): both opt for the raising analysis,


whereby the main verb and the subordinate subject are in separate clauses on the underlying level but in the same clause in surface structure. Since surface structure legitimizes the assignment of the accusative by the main verb to the underlying lower subject that has been raised into the main clause, the supergeneralization regarding case marking is upheld in surface structure.6

The distribution of regular and exceptional over underlying and surface structure is not the same in these accounts. In the analysis of passives in Relational Grammar and in the analysis of particle constructions, the exceptional structures are shown to be regular underlyingly and irregular on the surface, while in the case of long-distance agreement in Imbabura Quechua and of exceptional case marking in English, it is the opposite: the irregular structure is shown underlyingly and the derived structure – the result of raising – is regular. What is nonetheless common to all of these analyses is that there is a derivational split between two facets of the exceptional construction, only one of which is exceptional.

2.4. Separate representations in separate components

In the long-distance agreement pattern of Hungarian discussed above, the exceptional and regular patterns are simultaneously present: neither is derivationally prior to the other (see (8) above). In the account of English passives in Relational Grammar ((9) above) and in the accounts of English verb-particle order ((10)), of long-distance agreement in Imbabura Quechua ((11)), and of exceptional case marking in English, out of the two representations – one regular, one exceptional – one derivationally precedes the other within the same component.

In yet another type of representation of exceptional structures, the “distance” between the exceptional and regular facets of the construction is widened. This is illustrated by Jerrold Sadock’s Autolexical Grammar analysis of particle order in English (Sadock 1987: 296–297, 2003). Here, the regular and irregular representations of an irregular structure are in different components, with the two linked by non-directional lines of association. The verb & object & particle order is shown as exceptional in syntax but regular in semantics. Thus, the supergeneralization that lexical items are contiguous holds true in semantics and is violated only in syntax. This is shown in (12).

6. For discussion, see Newmeyer (2003: 157–160).


(12)  SYNTAX:     [S [NP [N Megan]] [VP [V wiped] [NP [Article the] [N counter]] [Particle off]]]
      SEMANTICS:  Megan wiped-off the counter.
      (dotted lines of association link the corresponding items of the two components)

Let us summarize the four basic ways of accommodating exceptional structures discussed in section 2. In the diagrams below, R stands for ‘regular’, E stands for ‘exceptional’.

(13)  a. two faces of a single representation
            R
            E

      b. two strata in a single representation

      c. separate representations in a single component
            R  →  E


      d. separate representations in separate components
            R ···· E

The four ways of splitting exceptional structures differ in the amount of independent support available for the two contradictory representations of the construction. If there is independent evidence for the existence of the two faces, strata, levels, or components that the structures are split into, the analysis is more convincing. Thus, Sadock’s account, where discontinuous particle structures are regular in their meanings but irregular in their forms, rests on the firmest ground: the basis of the split is meaning versus syntactic form – a dichotomy that is widely supported and the mismatches between the two multiply evidenced. Different levels in the same component and different strata in a single tree may or may not be justified depending on the amount of independent evidence for the existence of the levels and strata. The most conflicted representation is the bifacial tree – although, given the facts and the framework assumed, it seems indeed fully unavoidable.

3. Regularizing exceptions

The analyses we have surveyed so far go half-way towards eliminating exceptionality: they represent exceptional structures as exceptional in part of the account but regular in another part. An alternative approach taken in the literature is re-analyzing exceptions as fully regular.

As noted in section 1.2, there are two problems with exceptions. First, they split the superclass and thus disable a general rule that would hold for that class. Second, since the regular and exceptional subclasses are not otherwise identifiable, no sub-generalization is possible either. It follows then that exceptions may be regularized in two ways: either by restoring the homogeneity of the superclass by abolishing the subclasses (since in that case, the supergeneralization can be maintained); or, somewhat paradoxically, by strengthening the subclasses through identifying a correlating property which renders subgeneralizations possible. In other words, one tack is to show that the regular and irregular distinction does not exist: there are no subclasses within the class; the other is to acknowledge that there are indeed subclasses and to show that they are all robust. We will now see examples of both kinds of solution.


3.1. Restoring the superclass

There are two ways of eliminating subclasses within a superclass. One is by reanalyzing the subclasses so that there is no difference between them, after all. The general schema is shown in (14). RSC stands for the regular subclass, ESC stands for the exceptional one.

(14)  [ RSC  ESC ]  →  [ RSC ]

The other way of eliminating subclasses amounts to deepening the difference between the regular and exceptional cases so that the exceptional cases fall outside the superclass. This is diagrammed in (15).

(15)  [ RSC  ESC ]  →  [ RSC ]   ESC

Let us see examples of each approach.

A. Restoring the superclass by unifying the subclasses

English verb-particle constructions once again offer an example. Their transformational analysis was described above; here is an alternative account. Pauline Jacobson proposes that the single rule – the direct object immediately follows the verb – holds both for the regular and the seemingly exceptional cases (Jacobson 1987: 32–39). This is made possible by assuming that the lexicon lists both call and call up as verbs. The seemingly exceptional order call Sue up is therefore not exceptional since it obeys the same rule as the regular order call up Sue: in both cases, the direct object immediately follows the verb.
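The mechanics of Jacobson’s single-rule account can be sketched computationally. The toy lexicon and the realize function below are illustrative assumptions, not her formalism; they merely show how listing both call and call up as verbs lets one ordering rule license both surface orders.

```python
# A toy sketch (not Jacobson's formalism) of the single-rule account:
# the lexicon lists both "call" and "call up" as verbs, and one rule
# places the direct object immediately after the verb.

LEXICON = ["call", "call up"]  # both listed as verbs

def realize(verb: str, obj: str) -> str:
    """The single ordering rule: the direct object immediately follows
    the verb.  With the two-word listing "call up", the object follows
    the whole complex verb; with the one-word listing "call", the
    particle "up" is left to follow the object."""
    if len(verb.split()) == 2:
        return f"{verb} {obj}"      # -> "call up Sue"
    return f"{verb} {obj} up"       # -> "call Sue up"

for verb in LEXICON:
    print(realize(verb, "Sue"))
```

On this sketch, call Sue up is no longer exceptional: under its own lexical listing, its object stands immediately after the verb, just as in call up Sue.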

Other examples of resolving exceptions by showing them to be regular come from long-distance agreement. In her paper of 2003 cited above, Maria Polinsky surveys several languages where the same exceptional pattern crops up: the main verb agreeing with an argument of the subordinate clause. Her proposed solutions fall into three types. For Imbabura Quechua – as was discussed above (see (11)) – she proposes a raising analysis which puts the agreement controller from the subordinate clause into the main clause and thus halfway regularizes the construction.

For the other two kinds of long-distance agreement (in Algonquian languages and in Tsez, respectively) she proposes two avenues of full regularization. (16) and (17) present examples of the two patterns (Polinsky 2003: 285, 303).


(16)  Blackfoot (glossing is simplified)
      nit-wikixtatwaa-wa   n-oxko-wa   m-áxk-a’po’takixsi
      1SUBJ-want-3OBJ      my-son-3    3SUBJ-might-work
      ‘I want my son to work.’

(17)  Tsez (A stands for a long /a/; II indicates Class II)
      užir   y-iyx      kidbA   kaγat           t’At’ruli
      boy    II-knows   girl    letter.II.ABS   read
      ‘The boy knows that the girl has read the letter.’

For Blackfoot and other Algonquian languages, Polinsky proposes that the controller in the subordinate clause has a “proxy” in the main clause and cites independent evidence. The main verb thus agrees with this proxy – a clause-mate. This analysis merges the exceptional cases into the regular class so that the supergeneralization about the clause-mate-hood of controller and target stands unimpaired.

For Tsez, she suggests that the very domain of agreement be re-defined: rather than controller and target having to be clause-mates, both have to occur in the domain of head-government. This amounts to formulating a new, broader generalization into which both regular and irregular cases fit with their difference wiped out.

Both solutions amount to eliminating the boundary between the regular and exceptional subclasses.

B. Restoring the superclass by exempting the exceptions

The examples above showed two ways in which the boundary between regular and exceptional subclasses can be eliminated: either by reanalyzing the exceptions, as Jacobson does for verb-particle constructions and Polinsky for the Algonquian-type long-distance agreement, or by reformulating the supergeneralization, as Polinsky does for Tsez. As noted in the beginning of this section, the unity of the superclass can also be restored by more dramatically re-analyzing the exceptions so that they do not even belong to the superclass.

An example is Ivan Sag’s analysis of English verb-particle constructions (Sag 1987: 329–333). In Sag’s analysis, when the particle is separated from the verb, it is not a particle but a prepositional phrase. For instance, in the sentence Megan wiped off the table, off is a particle but in Megan wiped the table off, off is a prepositional phrase. Thus, this off is simply not beholden to the generalization according to which lexical items – such as wipe off – have to be continuous since it does not form a single lexeme with wipe.


Another proposal that resolves exceptionality by removing the apparent exception from the superclass within which it might be seen as exceptional is by Peter Cole and Gabriella Hermon (Cole and Hermon 1998). The problematic structure is long-distance reflexives in Singapore Malay: as shown in (18), the pronoun diri-nya can take either a local or a long-distance antecedent (Cole and Hermon 1998: 61; Ahmad is male, Salmah is female).

(18)  Ahmad   tahu   Salmah   akan   membeli   baju      untik   diri-nya.
      Ahmad   know   Salmah   will   buy       clothes   for     self-S3
      ‘Ahmad knows Salmah will buy clothes for him.’ OR
      ‘Ahmad knows Salmah will buy clothes for herself.’

The word diri-nya is a crosslinguistic anomaly both in its internal structure and in its distribution: it does not exhibit the usual characteristics of long-distance reflexives in other languages (such as Mandarin). Two of the generalizations that it is an exception to are that long-distance reflexives are monomorphemic and that they require a subject antecedent.

Cole and Hermon propose that diri-nya’s properties deviate from long-distance reflexives in other languages not because it is an exceptional long-distance reflexive but because it is not a long-distance reflexive at all: instead, it is a structure indeterminate between a reflexive and a pronoun. They offer various bits of evidence in support of the proposal that will not be reproduced here; what is important is the type of argument employed to deal with the exception. As in Sag’s analysis of discontinuous verb-particle constructions, the offending exception is lifted out of the superclass and thus freed of the obligation to conform.

As noted in the beginning of section 3, there are two basic ways of regularizing exceptions. One is by eliminating the regular-exceptional distinction within the superclass and thus restoring the supergeneralization. The other is by strengthening the subclasses and thus making subgeneralizations possible. So far we have seen examples of the first approach; we will now turn to examples of the second.

3.2. Strengthening the subclasses

As discussed in section 1.2, exceptions form a subclass that is both small and undefined. Thus, strengthening the subclass of exceptions may be achieved in two ways. First, exceptions may be strengthened quantitatively. If the number of exceptions can be shown to be larger than first thought, it is more likely that the exceptions are principled rather than chance phenomena. Second, the exceptional subclass may be strengthened qualitatively if additional characteristics


can be identified other than the one on which the regular-irregular distinction rides: correlating properties that render the exceptions predictable. (19) diagrams the two approaches; r1 and r2 stand for properties of the regular subclass and e1 and e2 are properties of the exceptional subclass.

(19)  Strengthening the exceptional subclass …

      a. … quantitatively (the exceptional subclass gains members)
            RSC  esc   →   RSC  ESC

      b. … qualitatively (each subclass gains a correlating property)
            r1  e1   →   r1 r2   e1 e2

A recent study that quantitatively strengthens a crosslinguistically exceptional subclass is Rachel Nordlinger and Louisa Sadler’s article on nominal tense (2004). It has been generally assumed that nouns are time-stable entities and therefore tensed nouns are exceptional across languages. Nordlinger and Sadler, however, marshal evidence for tensed nouns from ten languages, some of them areally and genetically distant (e.g. Hixkaryana, Potawatomi, and Somali). The fact that tensed nouns are more frequent than generally believed makes it likely that their occurrence is not just a freak accident: there may be a structural condition to predict their existence.

Let us now turn to proposals that shore up exceptional subclasses qualitatively. For the first example, we will return once more to English verb-particle constructions.

In her book on English verb-particle constructions, Nicole Dehé (2002) assumes a transformational account whereby the contiguous verb-particle construction is underlying and the disjoint structure is derived. The additional step that she takes is probing into the conditions under which the discontinuous construction is used. She finds that this exceptional structure does have an information-theoretical correlate (103–207, 279–283). In particular, a noun-headed object follows the particle if the object is part of the focus of the sentence. If, however, the object is known to the speaker and hearer and the focus is on the complex verb, the object intervenes between the verb and the particle. Thus, Andrew handed out the papers to the students is used if the papers is new information and Andrew handed the papers out to the students is used if the papers is topical.
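Dehé’s conditioning factor can be stated as a small decision procedure. The function below is a hypothetical sketch of the correlate she identifies, not her own formulation: it derives the order pattern from the information status of the object.

```python
# A toy decision rule (an illustrative sketch, not Dehé's analysis):
# a noun-headed object follows the particle when it is part of the
# focus, but intervenes between verb and particle when it is topical
# and the focus falls on the complex verb.

def verb_particle_order(verb: str, particle: str, obj: str,
                        obj_in_focus: bool) -> str:
    if obj_in_focus:
        return f"{verb} {particle} {obj}"   # V + Prt + Obj: focused object
    return f"{verb} {obj} {particle}"       # V + Obj + Prt: topical object

print(verb_particle_order("handed", "out", "the papers", obj_in_focus=True))
# handed out the papers   (the papers is new information)
print(verb_particle_order("handed", "out", "the papers", obj_in_focus=False))
# handed the papers out   (the papers is topical)
```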


According to this account, the two order patterns of the English verb-particle construction do not form arbitrary subclasses. Their dichotomy is maintained but since an information-structural correlate has been identified for each class, the order patterns are predictable rather than random.7

Proposing correlations for exceptions and thus showing that they are regular is a focus of several papers in this volume. As already mentioned above, J. Jónsson and Th. Eythórsson propose that the apparently exceptional Icelandic verbs that take accusative subjects form a syntactically and semantically coherent class; Th. Wasow, F. Jaeger, and D. Orr’s study reveals that the exceptional omission of the conjunction that in English relative clauses is correlated with the predictability of the conjunction in those structures; and M. Cysouw and G. Corbett describe clusterings of exceptional properties in and across languages.

Finding correlates to structures that are crosslinguistically exceptional is the central goal of language typology. A recent study that exemplifies this endeavor with respect to crosslinguistic exceptions is Masayuki Onishi’s (2001). Onishi’s concern is with non-canonical case marking – i.e., patterns that depart from the normal case marking of intransitive subjects, transitive subjects, and direct objects in a language. He finds that non-canonical case marking is not random across languages: it correlates with certain semantico-syntactic predicate types; for example, stative verbs expressing physiological states and events or psychological experiences such as ‘enjoy’, ‘be happy’, and ‘be pleased’.

4. Explaining exceptions

In the preceding two sections, we have seen two basic ways in which exceptions can be dealt with: representing them as both exceptional and regular; and reanalyzing them as fully regular. A third alternative of dealing with exceptions is accepting them as fully – or partially – exceptional and finding reasons why they are so; i.e., explaining them.

This is an extension of identifying correlating properties since such properties are required for explaining exceptions. They are, however, not quite sufficient: for a maximally convincing explanation, there has to be a causal relation between a correlating property and the exceptional feature. The basic idea is diagrammed in (20), where r1 and e1 are the properties in terms of which the exceptional subclass is exceptional and, as before, r2 and e2 are the correlating properties. The arrow stands for explanatory deduction.

7. For several alternative accounts of verb-particle constructions in English and other languages, see Dehé et al. (2002).


(20)  Explaining exceptions

      r1  e1   →   r2 → r1   e2 → e1

The studies by J. Jónsson and Th. Eythórsson on Icelandic accusative subjects and by Th. Wasow, T. Jaeger, and D. Orr on English relative clauses mentioned above are explanatory if we take meaning and processing ease to be explanations of form. An example from outside this volume is Langacker’s account of the raising structures that were discussed in section 2.3. Rather than an arbitrary exception within the class of subordinate constructions, he recognizes this type of construction as an instance of a widespread structural pattern in language.

The analysis is based on an observation made by Relational Grammarians under the label Relational Succession Law. What it says is that a noun phrase raised into the main clause inherits the grammatical role of the clause that it is raised from (cf. Blake 1990: 94). Thus, noun phrases that come from subject clauses are raised to subject (as Fred in Fred seems to be happy, derived from [[Fred is happy]S seems]S) and if they come from an object clause, they are raised to object – as the computer in Fred believes the computer to have been delivered, derived from [Fred believes [that the computer has been delivered]S]S.
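The inheritance claim of the law can be stated as a trivial mapping. The sketch below is only an illustration, with hypothetical string labels for the grammatical relations:

```python
# Toy statement of the Relational Succession Law (an illustration):
# an NP raised into the main clause assumes the grammatical relation
# that its source clause bore in that main clause.

def raised_relation(source_clause_relation: str) -> str:
    """The raised NP simply inherits its source clause's relation."""
    return source_clause_relation

# "Fred seems to be happy": Fred comes from a subject clause.
print(raised_relation("subject"))   # subject
# "Fred believes the computer to have been delivered":
# the computer comes from an object clause.
print(raised_relation("object"))    # object
```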

Langacker goes a step further by showing that raising structures are an instance of “pars-pro-toto” constructions: a part stands for the whole, as in Give me a hand, where hand stands for manual assistance by a person, or So you got new wheels, where wheels stands for a car. Given the generalization that the whole may be represented either by the whole itself or by a part, both raised and unraised constructions are brought under a single generalization and are both explained as regular instances of a very general, independently attested pattern.

However, synchronic observations cannot provide direct causes for language structure; they act only indirectly on language processing, language acquisition, and ultimately on historical change. D. Nübling’s paper in this volume proposes to explain the morphological irregularity of four German verbs by tracing their historical origins and relating them to well-known pathways of change.

But even diachronic explanations cannot do more than render exceptionality possible or perhaps probable rather than necessary. To see this, let us return to the two examples given in the beginning of this paper. The historical background of English nouns with /ən/ plural (see (1)) is that they were weak nouns in Old English and for weak nouns, /ən/ was the regular plural. But this fact does not predict that this suffix should have been retained by any noun at all and even less that it should have been retained by the three nouns where it occurs today.


Similarly, languages that have no nasal consonant phonemes (see (3)) are said to have had them at some point in their history before the nasals turned into voiced oral consonants (Hockett 1955: 119). But the availability of such a process does not predict that it should actually have happened in any language at all and even less that it should have happened in those particular languages where it has.8

5. Conclusions

In this paper, exceptions were characterized as posing a conflict in categorization. All instances of subclassification disrupt the homogeneity of a class; but if the subclasses are characterized by clusters of properties, they can be described in terms of subgeneralizations. Exceptions, however, form a rogue subclass that is both quantitatively and qualitatively lean and thus not subsumable under a subgeneralization.

Various ways of coming to grips with exceptions were surveyed; here is a summary of the approaches discussed above.

(A) Representing exceptions as both exceptional and regular by means of
    (a) two faces of a single representation, or
    (b) two strata in a single representation, or
    (c) separate representations in a single component, or
    (d) separate representations in separate components

(B) Regularizing exceptions by
    (a) restoring the homogeneity of the superclass
        – by unifying the regular and exceptional subclasses,
            – through re-analyzing the exceptions as regular, or
            – through positing a new, more comprehensive superclass within which both the erstwhile regular and erstwhile exceptional cases turn out to be regular; or
        – by assigning the exceptions to a different superclass; or by

8. Exceptions, also known as irregularities, anomalies, or simply counterexamples to generalizations, loom large in all sciences, both social and natural. For relevant discussions in the philosophical literature about ceteris paribus generalizations, see Cartwright (1988a, 1988b); Hempel (1988); Darden (1992); and Carroll (no date). Whether this paper might contribute to a general account of how exceptions are dealt with across sciences remains to be seen.


    (b) strengthening the exceptional subclass
        – quantitatively, and/or
        – qualitatively

(C) Explaining why the exceptions are exceptional

While we have seen that solutions to exceptions vary with the theoretical framework, it is important to recognize that the very status of a grammatical pattern – whether it is or is not exceptional to begin with – is also highly theory-dependent (Plank 1981: 4–7). The most fundamental variable across different approaches is whether the empirical domain in question is assumed to be well-regulated so that generalizations are expected to hold exceptionless, or whether the domain is seen as a less tidy sort without tight rules. If structural patterns are assumed to be mere probabilistic tendencies, what would otherwise count as exceptions will be “automatically anticipated” (Hempel 1988: 152). If there is no strict regularity, there cannot be irregularity, either.

The last four papers in this volume propose to change the theoretical assumptions in the light of which certain phenomena are exceptional. R. Vogel’s paper about alternative case assignment to relative pronouns in German free relatives argues that none of the alternatives is the norm; instead, variation itself is the norm in the grammar of German, resulting from the conflicting desiderata that case assignment needs to satisfy. Somewhat in the same vein, F. Fouvry suggests that grammatical rules be relaxed to operate probabilistically so that exceptions are still rule-governed.

F. Newmeyer similarly calls into question the very concept of regularity within the superclass that exceptional phenomena are generally assumed to belong to. He suggests that typological correlations in syntax are performance-based rather than stemming from principles of linguistic competence. Given that the domain of performance is less constrained overall, there is no reason to expect typological generalizations to be free of exceptions. The competence-performance distinction is also central to S. Featherston’s paper. He proposes that if well-formedness is allowed to be gradient, rather than binary, grammars have no exceptions. Exceptions are in turn the result of output selection by the speaker – a function of language processing.

These proposals are akin to the way of dealing with exceptions that we saw above: separating them out of the superclass within which they would appear to be exceptions (e.g. Sag’s analyzing particles that are separated from the verb not as irregularly placed particles but as regularly ordered prepositional phrases). The difference is that in these accounts, not individual exceptions but entire classes of exceptions are lifted out of the broader domain of strictly regulated phenomena.


In sum: just as no grammatical construction is exceptional all by itself but only in comparison with other, similar constructions, so too a construction is exceptional only if the theoretical framework assumed would expect it to be regular.

Acknowledgement

For their thoughtful comments, I am grateful to a reviewer, to the editors of this volume, to Professor Márta Fehér, to the participants of the 27th Annual Meeting of the Deutsche Gesellschaft für Sprachwissenschaft in Cologne, February ’05, and to the audience at the University of Debrecen, Hungary, where I presented a version of this paper in November ’05. Many thanks also to Michael Liston for directing me to relevant literature in the philosophy of science.

References

Blake, Barry J.
    1990    Relational Grammar. London/New York: Routledge.

Bole-Richard, Rémy
    1985    Hypothèse sur la genèse de la nasalité en Niger-Congo. Journal of West African Languages 15 (2): 3–28.

Carroll, John
    no date Laws of nature. [Available at http://plato.stanford.edu/entries/laws-of-nature]

Cartwright, Nancy
    1988a   Truth does not explain much. In How the Laws of Physics Lie, Nancy Cartwright, 44–53. Oxford: Clarendon Press.

Cartwright, Nancy
    1988b   Do the laws of physics state the facts? In How the Laws of Physics Lie, Nancy Cartwright, 54–73. Oxford: Clarendon Press.

Cartwright, Nancy
    1988c   How the Laws of Physics Lie. Oxford: Clarendon Press.

Chomsky, Noam
    1984    Lectures on Government and Binding. The Pisa Lectures. Dordrecht: Foris.

Clahsen, Harald, Monika Rothweiler, and Andreas Woest
    1992    Regular and irregular inflection in the acquisition of German noun plurals. Cognition 45: 225–255.

Cole, Peter, and Gabriella Hermon
    1998    Long distance reflexives in Singapore Malay: An apparent typological anomaly. Linguistic Typology 2: 57–77.

Cysouw, Michael
    2005    What it means to be rare: The variability of person marking. In Linguistic Diversity and Language Theories, Zygmunt Frajzyngier, Adam Hodges and David S. Rood (eds.), 235–258. Amsterdam/Philadelphia: Benjamins.

Darden, Lindley
    1992    Strategies for anomaly resolution. In Cognitive Models of Science, Ronald N. Giere (ed.), 251–273. Minneapolis: University of Minnesota Press.

Dehé, Nicole
    2002    Particle Verbs in English. Syntax, Information Structure and Intonation. Amsterdam/Atlanta: Benjamins.

Dehé, Nicole, Ray Jackendoff, Andrew McIntyre, and Silke Urban (eds.)
    2002    Verb-particle Explorations. Berlin/New York: Mouton de Gruyter.

É. Kiss, Katalin
    1987    Configurationality in Hungarian. Budapest: Akadémiai Kiadó.

É. Kiss, Katalin
    2002    The Syntax of Hungarian. Cambridge: Cambridge University Press.

Francis, Elaine J., and Laura A. Michaelis (eds.)
    2003    Mismatch. Form-function Incongruity and the Architecture of Grammar. Stanford, CA: Center for the Study of Language and Information.

Haiman, John
    1974    Concessives, conditionals, and verbs of volition. Foundations of Language 11: 341–359.

Hempel, Carl
    1988    Provisoes: A problem concerning the inferential function of scientific theories. Erkenntnis 28: 147–164.

Hockett, Charles
    1955    Manual of Phonology. Baltimore, MD: Indiana University Publications in Anthropological Linguistics, Memoir 11.

Huck, Geoffrey, and Almerindo E. Ojeda (eds.)
    1987    Discontinuous Constituency. (Syntax and Semantics 20) Orlando, FL: Academic Press.

Jacobs, Roderick A., and Peter S. Rosenbaum
    1968    English Transformational Grammar. Waltham, MA: Blaisdell.

Jacobson, Pauline
    1987    Phrase structure, grammatical relations, and discontinuous constituents. In Discontinuous Constituency, Geoffrey Huck and Almerindo E. Ojeda (eds.), 27–69. (Syntax and Semantics 20) Orlando, FL: Academic Press.

Lakoff, George
    1970    Irregularity in Syntax. New York/Chicago: Holt, Rinehart and Winston.

Langacker, Ronald W.
    1995    Raising and transparency. Language 71 (1): 1–62.

Lasnik, Howard
    1999    Minimalist Analysis. Malden, MA/Oxford: Blackwell.

Newmeyer, Frederick J.
    2003    Theoretical implications of grammatical category – grammatical relation mismatches. In Mismatch. Form-function Incongruity and the Architecture of Grammar, Elaine J. Francis and Laura A. Michaelis (eds.), 149–178. Stanford, CA: Center for the Study of Language and Information.

Nordlinger, Rachel, and Louisa Sadler
    2004    Nominal tense in crosslinguistic perspective. Language 80 (4): 776–806.

Onishi, Masayuki
    2001    Non-canonically marked subjects and objects: Parameters and properties. In Non-canonical Marking of Subjects and Objects, Alexandra Y. Aikhenvald, R. M. W. Dixon and Masayuki Onishi (eds.), 1–51. Amsterdam/Philadelphia: Benjamins.

Plank, Frans
    1981    Morphologische (Ir-)Regularitäten. Aspekte der Wortstrukturtheorie. Tübingen: Narr.

Polinsky, Maria
    2003    Non-canonical agreement is canonical. Transactions of the Philological Society 101: 279–312.

Postal, Paul M.
    1974    On Raising. One rule of English and its theoretical implications. Cambridge, MA: MIT Press.

bridge/London: MIT Press.

Sadock, Jerrold M.1987 Discontinuity in autolexical and autosemantic syntax. In Discontin-

uous Constituency, Geoffrey Huck and Almerindo E. Ojeda (eds.),283–301. (Syntax and Semantics 20) Orlando, FL: Academic Press.

Sadock, Jerrold M.2003 Mismatches in autonomous modular versus derivational grammar. In

Mismatch. Form-function Incongruity and the Architecture of Gram-

Page 66: 89378429 Exceptions in Grammar

Coming to grips with exceptions 55

mar, Elaine J. Francis and Laura A. Michaelis (eds.), 333–353. Stan-ford, CA: Center for the Study of Language and Information.

Sag, Ivan A.1987 Grammatical hierarchy and linear precedence. In Discontinuous Con-

stituency, Geoffrey Huck and Almerindo E. Ojeda (eds.), 283–301.(Syntax and Semantics 20) Orlando, FL: Academic Press.

Webelhuth, Gert1995 Government and Binding Theory and the Minimalist Program. Ox-

ford/Cambridge: Blackwell.

Page 67: 89378429 Exceptions in Grammar
Page 68: 89378429 Exceptions in Grammar

Classical loci for exceptions:morphology and the lexicon


Exceptions to stress and harmony in Turkish: co-phonologies or prespecification?

Barış Kabak and Irene Vogel

Abstract. We examine the nature of regularities and exceptions in Turkish vowel harmony and stress assignment and show how the use of lexical strata or co-phonologies fails to determine classes of exceptions or sub-regularities in a principled manner. Instead, we propose a single mechanism, lexical pre-specification, to handle exceptions in both processes, and advance an analysis based on truncation to mark disharmonic root vowels while maintaining the phonological integrity of the vowels in question. We also show that pre-specification provides the most viable and unified treatment for all types of irregularly stressed roots in Turkish, rather than singling out specific categories such as place names.

1. Introduction

Exceptions within phonology have plagued all modern phonological theories. Indeed, they have been the impetus behind such major theoretical developments as Kiparsky's well-known "Elsewhere Principle" and the entire model of Lexical Phonology. Examination of the phonological phenomena that are considered "exceptional" reveals a variety of properties, ranging from isolated cases to fairly general sub-regularities, from forms that are considered atypical by native speakers to forms that are not found to be noteworthy in any way.

In this paper, we first briefly examine the various types of phonological exceptions as well as several of the mechanisms that have been proposed to handle them. We then focus on two well-known phenomena of Turkish, Vowel Harmony and Stress Assignment, and show how these quite regular phenomena are subject to specific types of exceptions. It is shown that previous analyses of these regular phenomena and their exceptions have serious drawbacks. An alternative proposal is advanced and shown to be simpler and more comprehensive than previous analyses.


2. Types of phonological exceptions and treatments

2.1. Overview of types of exceptions

It is possible to distinguish two broad groups of phonological exceptions in terms of whether they involve a) phonotactic constraints or b) (morpho-)phonological rules. Within each type, we can find exceptions that are fairly isolated cases and others that represent fairly widespread patterns.

With regard to phonotactic constraints, let us first consider several well-known cases in English. There is a fairly limited set of Yiddish borrowings that include initial clusters such as [ʃl-] and [ʃm-] (e.g. schlep, schmuck), which violate the phonotactic constraints of English. Those speakers who use the pronunciations in question are aware of their foreign nature, and such forms do not appear to have any effect on the nature of English phonology per se. For example, speakers would still judge a nonce word such as *schlick unacceptable, as opposed to a word such as blick. Furthermore, many speakers ignore the violation and regularize the clusters to fit the phonotactics of English (i.e. [sl-] and [sm-]).

More interesting are cases involving the initial cluster [sf-] (e.g. sphinx, sphere). While such words are historically loan words of Greek origin, the typical native speaker of English is unaware of this. Aside from the somewhat uncommon spelling of [f] as "ph", such words are not felt to be exceptional. Differently from the Yiddish case, however, we do not find English speakers "regularizing" such words, for example with [sp-] in place of [sf-] (i.e. *[spir] for sphere). It might thus seem that, while limited in distribution, the [sf-] onset does not actually constitute a phonotactic exception in English. Nevertheless, native English speakers typically reject nonce words with the same onset (e.g. *sfick as opposed to blick), and may have problems pronouncing the same cluster in a foreign language (e.g. Italian sfortuna 'misfortune').

To the extent that Vowel Harmony represents a phonotactic constraint on the possible vowel combinations in a word, so-called "disharmonic" words constitute phonotactic exceptions. Interestingly, in Turkish, such exceptions to Vowel Harmony run the range from a) words felt to be foreign and thus marginal to the system (e.g. randevu 'appointment', rötar 'delay', monitör 'monitor'), analogous to the Yiddish clusters in English, to b) words that are felt to be perfectly fine words of the language (e.g. radyo 'radio', gazete 'newspaper', siyah 'black'), even though similar nonce items might be rejected as potential words, analogous to the case of [sf-] in English (see Yavaş 1980 for a psycholinguistic investigation).

Exceptions with respect to Phonological Rule (P-Rule) application typically involve morphophonological alternations. As in the case of phonotactic exceptions, these exceptions may constitute isolated cases and be recognized as marginal, or they may represent more general patterns and not be recognized as being atypical.

Certain irregular plurals are recognized as exceptions to the general rules that determine the surface form of the plural as [-z], [-s] or [-əz] (e.g. tooth – teeth vs. booth – booths). Like the case of the Yiddish clusters, such forms are felt to be external to the system of English. Thus, when forming the plural of a nonce item such as looth, speakers will use the regular pattern (i.e. looths) as opposed to an irregular form (i.e. *leeth).

Other morphophonological exceptions, however, are not noted as such by native speakers. In a substantial number of words, a final /d/ is pronounced as [ʒ] when followed by the suffix [-ən] '-ion' (e.g. divi[ʒ]ion, corro[ʒ]ion, elu[ʒ]ion, inva[ʒ]ion). The phonetically quite similar suffix [-əv] '-ive', however, causes the appearance of [s] in the same words (e.g. divi[s]ive, corro[s]ive, elu[s]ive, inva[s]ive). Both of these patterns are in a sense exceptional, since other suffixes that begin with schwa typically do not cause the /d/ to undergo any changes, for example [-ər] '-er' (e.g. divi[d]er, corro[d]er, etc.) or [-əbəl] '-ible/-able' (e.g. divi[d]able, corro[d]ible, etc.). There are, however, also several exceptions involving additional changes with [-əbəl] (e.g. divi[z]ible). Despite the various treatments of word-final /d/ when followed by different suffixes, speakers do not identify the forms in question as exceptions but rather consider them part of the basic system of English. Likewise, the loan suffix -istan in Turkish, which derives country names, contains vowels that are disharmonic, violating the vowel harmony rules of Turkish (e.g., Türkmen-istan 'Turkmenistan', Hind-istan 'India', Macar-istan 'Hungary'). The suffix, however, is felt to be perfectly regular by native speakers, as evinced especially by the fact that it can be used productively to create made-up country names (e.g., hayvan-istan 'the country of animals'; çocuk-istan 'the country of children').

2.2. Overview of treatment of exceptions

In modern phonological theories, a number of proposals have been advanced to account for the fact that, in addition to the general patterns of a language, there tend to be not only sporadic exceptional items but also classes of items that exhibit their own sets of patterns, systematically different from those of the core phonology. It is possible to distinguish two general approaches to treating exceptions: a) those that simply mark individual items in some way as "foreign" (e.g. SPE), thus exempting them from the regularities of the language, and b) those that attempt to enrich the structure of the phonology of the language so as to recognize and formalize the role of exceptions in the make-up of the language as a whole (e.g. Lexical Phonology). The problem with the former is that while some exceptions are felt to be marginal, so that excluding them from the phonological system of the language seems reasonable (e.g. the Yiddish borrowings), where more widespread patterns exist and are not interpreted as "foreign" by native speakers, simply excluding the items in question is not a viable option.

In the SPE model of phonology, in addition to exempting individual items from undergoing phonological rules, more general structural mechanisms were available for handling exceptions. By using abstract underlying representations (URs), it was possible to distinguish items based on differences in UR such that a particular rule (or rules) would apply or fail to apply in order to arrive at the appropriate surface form. In addition, to account for the fact that certain P-Rules might apply only in certain morphophonological contexts, different types of boundary symbols were used, most notably '+' and '#'. Such mechanisms, however, were criticized as being merely diacritic notations.

Attempts were thus made to a) constrain the nature of URs and b) eliminate the arbitrariness of boundary symbols. The latter in particular gave rise to Lexical Phonology, in which different P-Rules were associated with different levels of representation; roughly, the '+' and '#' boundary phenomena were located on Levels 1 and 2, respectively. It was still necessary to stipulate at which level a given rule was operative; however, the levels themselves imposed a principled ordering relationship among the sets of rules applying at the different levels. In addition to the claim that irregularities were relegated to Level 1, it was noted that a strong correlation existed between the Latinate and Germanic components of English and Levels 1 and 2, respectively. While the rules of Level 2 (and possibly beyond, depending on the number of levels in the model) were the ones considered synchronically productive, it could also be seen that many of the rules of Level 1 were quite productive, but only among the Latinate structures. Thus, the P-Rules of Level 2 could be interpreted as characterizing the basics of English phonology, while those of Level 1 could be seen as constituting a more exceptional component of the phonology, albeit a substantial one. Analogous situations were also identified in other languages, in which the different levels in Lexical Phonology appear to correspond, at least roughly, to phenomena with different linguistic origins (e.g. Sanskrit and Dravidian in Malayalam; cf. Mohanan 1986).

While the ordering relationships among phonological forms and phenomena inherent in Lexical Phonology provided a number of important insights, they also introduced their own problems. One problem was that it was not clear how many levels were needed and on what grounds they were established. Furthermore, any attempt at ordering phenomena led to the well-known "ordering paradoxes". Co-phonologies (cf. Inkelas 1998; Anttila 2002; Inkelas and Orgun 2003) were introduced to account for the fact that different phonological phenomena may apply to different items in a language without, however, requiring that the items be relegated to different levels in the lexicon. In fact, while certain words might behave in a similar fashion with respect to one rule, they might not share the same property with respect to another rule. For example, in Turkish, certain place names might exhibit an atypical stress pattern and thus be placed in a given co-phonology for this purpose (cf. Inkelas 1999; Inkelas and Orgun 2003). At the same time, however, these same words might behave regularly with respect to other phenomena such as Vowel Harmony. By associating the appropriate words with a particular co-phonology, no assumptions are made with respect to other phenomena, whereas in Lexical Phonology the levels correspond to clusters of phenomena. The main problem, however, is that it is unclear how many co-phonologies (like lexical levels) a language may have, and on what grounds they are established.

Finally, within Optimality Theory, exceptions are typically addressed either in terms of the nature of the representation or in terms of the ordering of constraints with regard to the exceptional items. In the former case, a particular property of an item is specified in such a way that it is "protected" by a Faithfulness constraint, with the result that the more typical form appears to be a less highly valued option. In the latter case, a particular lexical item may be specified for a special ordering of constraints, either in terms of a re-ranking of certain constraints or in terms of observation of a lexically specific constraint. Both of these possibilities treat exceptions as isolated items. While this might be appropriate for such cases as the Yiddish clusters in English, it fails to capture the fact that certain exceptions are found in groups of items, an insight inherent in both the Lexical Phonology and co-phonology approaches. Only to the extent that phonological patterns are associated with specific types of morphological junctures can OT account for at least some general patterns of exceptions, via the use of Alignment constraints (cf. McCarthy and Prince 1993).

2.3. How different are the approaches to exceptions?

Despite the apparent differences among the previous approaches to exceptions, they all share several fundamental properties. That is, they all crucially rely on prespecification of a) particular features in the underlying representation of a lexical item, or b) a property of the item that causes it to exhibit atypical behavior with regard to the general rules (or constraints) of the language. While the SPE approach used "diacritics", this is not all that different from the use of item-specific rewrite rules (e.g. tooth + pl → teeth) and the application of the Elsewhere Condition in Lexical Phonology. Similarly, in OT the specification of particular features in a UR and the related Faithfulness constraints "protect" a given atypical surface form. In each case, individual lexical items are specified with whatever properties are needed so that they do not participate in what are considered the general patterns/phenomena of the language.

Both Lexical Phonology and co-phonologies attempt, furthermore, to address the fact that certain exceptions are not isolated cases but constitute patterns themselves. The use of diacritics is not avoided, however, since they are required to indicate the location of items in their appropriate portion of the phonological system. These diacritics do not refer to the nature of the UR and thus provide no information about the nature of the exceptionality; they merely serve as a type of tag indicating which set of rules is applicable. Furthermore, it has been pointed out (e.g., Inkelas, Orgun, and Zoll 1996, 1997, 2004) that establishing co-phonologies to capture static regularities in grammar leads to a proliferation of co-phonologies, and similar objections were raised to the proliferation of levels in Lexical Phonology.

In OT, too, the mechanism for capturing exceptions and static patterns is prespecification, in the context of the 'richness of the base' (McCarthy 1995; Prince and Smolensky 1993; Smolensky 1996). Lexicon Optimization (Prince and Smolensky 1993) ensures that underlying representations are posited such that the fewest highly ranked constraints are violated in the determination of surface forms. That is, it is essentially the surface forms of morphemes that determine their underlying specification. Thus, "for structures which exceptionally fail to show the expected alternations and instead maintain a constant surface form, Lexicon Optimization will naturally choose them to be stored underlyingly in their surface forms – that is, to be prespecified" (Inkelas, Orgun, and Zoll 2004: 549).

3. Two types of phonological exceptions and treatments

We now consider the treatment of exceptions to two general phonological phenomena in Turkish: Vowel Harmony (VH) and (final) Stress Assignment (SA). We will show how previous analyses have required multiple mechanisms, including the use of diacritic specifications, and we will evaluate the consequences of using multiple mechanisms for handling phonological exceptions. This evaluation leads to the proposal that a single mechanism, lexical prespecification of the exceptional behavior, provides the simplest and most efficient way to handle exceptions.


3.1. Vowel harmony

The Turkish vowel system comprises eight vowels with symmetrical high-low, front-back, and round vs. non-round oppositions, as seen in Table 1.¹

Table 1. Turkish vowel phonemes.

                Front                   Back
         unrounded   rounded     unrounded   rounded
High         i          ü            1          u
Low          e          ö            a          o
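For concreteness, the symmetry of Table 1 can be encoded as three binary oppositions. The encoding below is our own illustrative sketch, not part of the chapter:

```python
# Illustrative encoding (ours) of the Turkish vowel system in Table 1.
# "1" stands for the high back unrounded vowel, following the chapter's
# convention of writing barred-i as 1.
VOWELS = {
    "i": dict(front=True,  round=False, high=True),
    "ü": dict(front=True,  round=True,  high=True),
    "1": dict(front=False, round=False, high=True),
    "u": dict(front=False, round=True,  high=True),
    "e": dict(front=True,  round=False, high=False),
    "ö": dict(front=True,  round=True,  high=False),
    "a": dict(front=False, round=False, high=False),
    "o": dict(front=False, round=True,  high=False),
}

# Full symmetry: the three oppositions cross-classify without gaps, so the
# eight feature combinations yield exactly the eight vowel phonemes.
combos = {(v["front"], v["round"], v["high"]) for v in VOWELS.values()}
assert len(combos) == 8
```

This symmetry is what the harmony patterns below exploit: palatal harmony manipulates only the front/back opposition, labial harmony only rounding.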

These vowels may combine only in certain ways, exhibiting Vowel Harmony (VH) within a specific domain, roughly the non-compound word. Abstracting away from theoretical differences with respect to the nature of the phonological features assumed for the Turkish vowel system, it is generally assumed that there are two patterns of VH: (i) palatal harmony, which requires that all vowels agree in the frontness-backness dimension, and (ii) labial harmony, which requires that high vowels agree with the preceding vowel in roundness (e.g., Kardestuncer 1982; Clements and Sezer 1982; van der Hulst and van de Weijer 1991; Polgárdi 1999; Kaye, Lowenstamm and Vergnaud 1985).

It can be seen that VH applies within roots, as in (1), as well as in morphologically complex words, as in (2).

(1) Vowel Harmony in Roots
    a. barış 'peace'
    b. yedi 'seven'
    c. soğuk 'cold'
    d. kötü 'bad'

1. In this paper, we mostly use the general conventions of Turkish orthography in Turkish examples. Accordingly, ü and ö represent high and low front rounded vowels, respectively. Instead of the orthographic ı, a barred-i (1) is used for the high back unrounded vowel (IPA: [ɯ]) to avoid possible confusion with the symbol i, which represents the high front unrounded vowel. The symbol ş represents the voiceless palato-alveolar fricative (IPA: [ʃ]); y the palatal glide (IPA: [j]); ç and c indicate voiceless and voiced palatal affricates (IPA: [tʃ] and [dʒ]), respectively. The letter known as soft-g (ğ), which corresponds to a voiced velar fricative in Anatolian dialects, is argued not to be produced in standard Istanbul Turkish, but to hold an empty consonantal position in the underlying representation.


(2) Vowel Harmony in Morphologically Complex Words

    Nom Sing       Acc Sing   Nom Plural   Acc Plural
    k1l 'hair'     k1l-1      k1l-lar      k1l-lar-1
    il 'city'      il-i       il-ler       il-ler-i
    kuş 'bird'     kuş-u      kuş-lar      kuş-lar-1
    süt 'milk'     süt-ü      süt-ler      süt-ler-i
    kat 'floor'    kat-1      kat-lar      kat-lar-1
    tel 'wire'     tel-i      tel-ler      tel-ler-i
    yol 'road'     yol-u      yol-lar      yol-lar-1
    çöl 'desert'   çöl-ü      çöl-ler      çöl-ler-i
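The alternations in (2) follow mechanically from the two harmony patterns: the plural vowel, being low and unrounded, copies only frontness from the preceding vowel (palatal harmony), while the high accusative vowel copies both frontness and rounding (labial harmony). The sketch below is our own illustration of this logic; the function names and feature encoding are invented, not the authors':

```python
# Illustrative sketch (ours) of the suffix alternations in (2).
# Features per vowel: (front, round, high); "1" = barred-i, as in the chapter.
FEATS = {"i": (1, 0, 1), "ü": (1, 1, 1), "1": (0, 0, 1), "u": (0, 1, 1),
         "e": (1, 0, 0), "ö": (1, 1, 0), "a": (0, 0, 0), "o": (0, 1, 0)}

def last_vowel(word):
    # Harmony on a suffix is triggered by the rightmost vowel of the base.
    return next(c for c in reversed(word) if c in FEATS)

def plural(word):
    # Plural -lar/-ler: a low unrounded vowel, so only palatal harmony
    # (frontness agreement) is at stake.
    front, _, _ = FEATS[last_vowel(word)]
    return word + ("-ler" if front else "-lar")

def accusative(word):
    # Accusative -i/-1/-ü/-u: a high vowel, so it agrees in both
    # frontness (palatal harmony) and rounding (labial harmony).
    front, rnd, _ = FEATS[last_vowel(word)]
    suffix = {(1, 0): "i", (0, 0): "1", (1, 1): "ü", (0, 1): "u"}[(front, rnd)]
    return word + "-" + suffix

for w in ["k1l", "il", "yol", "süt"]:
    # Reproduces the corresponding rows of (2), e.g. süt-ler, süt-ü, süt-ler-i.
    print(plural(w), accusative(w), accusative(plural(w)))
```

Note how the accusative of a plural copies from the plural suffix vowel, deriving k1l-lar-1 vs. il-ler-i without further stipulation.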

Despite such regularities, certain exceptions also exist, as detailed below.

3.1.1. Disharmonic roots: a subregular pattern?

Clements and Sezer (1982) observe that vowels from the set /i, e, a, o, u/ may combine freely in roots, leading to violations of palatal and labial harmony, as in (3).

(3) Root disharmony

                          Violations
    a. sa:hip 'owner'     Palatal Harmony
    b. polis 'police'     Palatal and Labial Harmony
    c. büro 'office'      Palatal and Labial Harmony
    d. pilot 'pilot'      Palatal and Labial Harmony

In all of these items, palatal harmony is violated. Labial harmony is additionally violated in (3b), where we would expect the non-initial vowel to be round since it is high, and in (3c, d), where the non-initial vowels are rounded although they are low.

By contrast, vowels from the set /ü ö 1/ have been observed not to occur in disharmonic roots (e.g., Clements and Sezer 1982), although several words, all of which are borrowings, contain a combination of /i/ and /ü/ (e.g., virüs 'virus', ümit 'hope'). Goldsmith (1990: 304–309) points out that the first group, /i, e, a, o, u/, coincides with the cross-linguistically favoured five-vowel system, where the specification for the frontness/backness dimension (i.e., [back] according to Goldsmith) is fully predictable from rounding. According to Goldsmith, [back] is underspecified at this level, allowing these five vowels to combine freely within a root without violating frontness-backness harmony. The specification of [back] is required, however, when the root contains /ü/, /ö/, or /1/, since these vowels involve marked combinations of frontness and rounding.


As Clements and Sezer (1982) observe, furthermore, these vowels fail to surface with other vowels in disharmonic roots, since they tend to be regularized in such cases, as illustrated in (4).

(4) a. mersörize ∼ merserize 'mercerized'
    b. kupür ∼ küpür 'clipping'
    c. bisküvit ∼ büsküvüt 'biscuit'
    d. püro ∼ puro 'cigar'
    e. komünist ∼ kominist 'communist'

Although Clements and Sezer (1982) note such regularizations, they nevertheless suggest that Vowel Harmony is not an active process in roots. The opposite conclusion is drawn by others such as van der Hulst and van de Weijer (1991), who argue that since VH is independently required for suffixes, harmonic roots get a 'free ride' with regard to the harmony rules. In addition, they require a set of restrictions on the vowel combinations permitted in disharmonic roots, which would presumably be part of the (phonological) grammar of Turkish speakers:

(5) Restrictions on Disharmonic Roots:

    a. /1/ does not occur disharmonically.
    b. /ü/ and /ö/ do not occur with back vowels.
    c. Non-initial syllables do not contain round vowels.

3.1.2. Previous accounts of vowel harmony exceptions

Van der Hulst and van de Weijer (1991) assume unary vowel components (primitives) that either regularly extend over the word domain or are linked to specific vowel (V) positions. Accordingly, Turkish vowels are assumed to occupy a V position and are classified in terms of L(ow), F(ront), and R(ound). V positions with no further information are interpreted as high and back, presumably yielding the unmarked vowel in Turkish. The combinations of R and F with V, or the specification of L, provide the remaining seven vowel phonemes of Turkish. For example, the vowel /ö/ exhibits the maximum number of components (i.e., L, F, and R), a reflection of the marked status of this vowel. Vowel Harmony is essentially viewed as the process of associating a vowel primitive to every V or L following it in the relevant domain, as illustrated in (6). Since low rounded vowels are unattested in non-initial position, an additional condition is posited to ban the association of R to a non-initial L. If no F or R element is available to be associated to subsequent positions, the regular patterns surface, as illustrated in (7).

(6) [autosegmental association diagrams not reproduced]

(7)   V L        L V        L L        V V
     /1 a/      /a 1/      /a a/      /1 1/
     g1rtlak    alt1       kara       k1s1m
     'throat'   'six'      'black'    'part'

Disharmonic roots, by contrast, may not contain a bare V position, which would yield /1/, in violation of the generalization in (5a) above, or a V position with two specified elements, which would result in a violation of (5b), as shown in (8). Thus, the disharmonic roots containing the combinations /i-o/, /e-u/, /e-o/, and /i-u/, in either order, contain V positions with only one specified element.

(8) Properties of disharmonic roots (after van der Hulst and van de Weijer 1991)

    a. Disharmonic roots do not contain a bare V position.
       Example (ill-formed): */e 1/, where the second vowel is a bare V
       [autosegmental diagrams not reproduced]

    b. Disharmonic roots do not contain vowels with two prosodies.
       Examples (ill-formed): */ü a/, */u ü/
       [autosegmental diagrams not reproduced]

The same claims are made within an OT framework by Polgárdi (1999). The constraint HARMONY essentially associates every vowel element (A = Low, U = Round, I = Front) to a vowel position. An additional constraint, *LICENSE(A,U), ensures that combinations of the elements A (Low) and U (Round) are only licensed in initial position. A third constraint, *MULTIPLE(α), conflicts with HARMONY, banning the multiple association of elements. Since A (Low) cannot spread in Turkish, *MULTIPLE(A) must outrank HARMONY and the more general *MULTIPLE(α) constraint. Polgárdi (1999) blocks VH in roots by assuming the DEC (Derived Environment Constraint), which restricts harmony to derived environments. Specifically, by ranking the DEC above the constraints that require the association of elements (i.e., HARMONY), VH will only be observed in derived environments.

Beyond using the DEC to block harmony in roots, Polgárdi employs another set of constraints to account for the additional generalizations regarding disharmonic roots given above in (5), ensuring that impossible disharmonic roots do not surface. The constraint *ELEMENTS prohibits the presence of elements (prosodies such as I, U, A), with the effect of banning /ü/ and /ö/ in disharmonic roots. Similarly, /1/ is prohibited in disharmonic roots by FILL, which avoids empty positions. Finally, *LICENSE(I,U) ensures that the combination of I and U (i.e., /ü/) is only licensed by multiple association of I, which serves to ban the combination of /ü/ and /ö/ with back vowels. The tableau in (9) illustrates how these constraints regularize an ungrammatical disharmonic form containing /ü-1/ to a form with [ü-i]. In the tableau, A, U, and I stand for Low, Round, and Front, respectively, and 'v' indicates the absence of a particular element. Each vowel is represented by its elements placed on top of one another (e.g., [v I U] = /ü/; [v v v] = /1/; [A v v] = /a/). Spreading is indicated by '»', and the large dot indicates the target affected by the spreading element. Three dots '…' between the vowels show that there may be an intervening segment (e.g., a consonant).


(9) Regularization of the ungrammatical disharmonic root /ü 1/ (adapted from Polgárdi 1999: 196)

    Input: ü [v I U] … 1 [v v v]
    Constraint columns, left to right: *ELEMENTS, LICENSE(I,U), FILL, DEC, HARMONY, *MULTIPLE(α)

    a.   ü…ü  (I and U both spread)   – fatal violation of LICENSE(I,U)
    b. ☞ ü…i  (only I spreads)        – optimal
    c.   ü…u  (only U spreads)        – fatal violation of LICENSE(I,U)
    d.   ü…1  (no spreading)          – fatal violations of LICENSE(I,U) and FILL
    e.   ü…a                          – fatal violation of *ELEMENTS

    [The full columns of violation marks in the original tableau are not reproduced here.]
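Abstracting away from the specific constraints, the selection logic of such a tableau is a lexicographic comparison: candidates are scored per constraint in ranking order, and a violation of a higher-ranked constraint is fatal no matter how well a candidate fares further right. The following generic sketch is ours; the violation counts are invented for illustration and do not reproduce Polgárdi's actual marks:

```python
# Generic sketch (ours) of tableau evaluation in Optimality Theory.
# Each candidate maps to a tuple of violation counts ordered by the
# constraint ranking; Python compares tuples lexicographically, so min()
# picks the candidate that does best at the highest-ranked point of
# difference. Counts below are invented, not Polgárdi's actual marks.
RANKING = ("*ELEMENTS", "LICENSE(I,U)", "FILL", "DEC", "HARMONY", "*MULT")

def evaluate(tableau):
    return min(tableau, key=tableau.get)

tableau = {
    "ü…ü": (2, 2, 0, 1, 0, 2),  # loses at LICENSE(I,U)
    "ü…i": (2, 1, 0, 1, 1, 1),  # winner
    "ü…u": (2, 2, 0, 1, 1, 1),
    "ü…1": (2, 2, 1, 0, 2, 0),
    "ü…a": (3, 1, 0, 0, 2, 0),  # loses already at *ELEMENTS
}
print(evaluate(tableau))         # → ü…i
```

The lexicographic comparison is why a single high-ranked violation cannot be compensated for by perfect behavior on lower-ranked constraints, exactly the "fatal violation" (!) convention of tableau notation.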

Finally, a recent approach to exceptions in phonology, the introduction of co-phonologies intended to represent sub-regularities in the lexicon, can also be applied to the case of VH exceptions in Turkish. In particular, to capture Goldsmith's (1990) observation that /i, e, a, o, u/ may freely combine in disharmonic roots, different sets of roots would need to be relegated to at least the two co-phonologies outlined in (10). In addition, it would still be necessary to resort to lexical specification to account for exceptions to these co-phonologies, as presented in (11).

(10) Co-phonology A:

    – This co-phonology is subject to the feature values imposed by the 5-vowel system, where vowels are underspecified for the frontness-backness dimension.
    – Its members are roots that contain any of the vowels from the set /i, e, a, o, u/, which are seemingly harmonic with respect to palatal harmony, but otherwise disharmonic in the context of the 8-vowel system.
    – Examples: kitap 'book', kalem 'pencil', etc.

Co-phonology B:

    – This co-phonology is subject to the feature values imposed by the 8-vowel system, where vowels must be specified for the frontness-backness dimension, rounding, and height.
    – Its members are roots that obey palatal and rounding harmony.
    – Examples: bal1k 'fish', ekşi 'sour', konu 'topic', etc.

(11) Lexical specification:

    – Vowels in roots that belong to neither of the above co-phonologies must be associated with lexically specified feature values.
    – Examples: k1rlent 'pillow', büro 'office', etc.

3.1.3. Drawbacks of previous treatments of vowel harmony exceptions

One problem with the above analyses is that they do not crucially distinguish between harmonic roots and those disharmonic roots that fail to obey the properties given above in (8). That is, both types of roots may contain a) V positions with two elements (e.g. /ö-ü/ and /ü-ü/), as in the examples in (6), and b) bare V positions, as in (7). Nothing in the representation of the disharmonic cases indicates the nature of their exceptionality, so some type of additional marking is required to distinguish the two categories of roots.

A second type of problem is the existence of disharmonic items with sequences that violate the generalization in (8b), some of which are quite frequent (e.g., virüs ‘virus’, küllah ‘cone’, mahküm ‘convict’, rötar ‘delay’, büro ‘office’, etc.). All such items (i.e. those with combinations of /ü-i/, /i-ü/, /ü-a/, /a-ü/, /ü-o/, /o-ü/, /ö-a/, and /a-ö/) contain Vs with two prosodies. In addition, there are several disharmonic roots with bare V positions, in violation of the generalization given in (8b). These include many commonly used borrowings in which an epenthetic vowel is introduced to break up onset clusters that are otherwise illicit in Turkish (e.g., [kırem] from cream; [kıredi] from credit), as well as those borrowings where certain vowels were replaced with [ı] for various reasons during the process of nativization (e.g., [kırlent] ‘pillow’ from ghirlanda (Italian); [kıdem] ‘rank’ from kidem (Arabic)). Such cases would need some idiosyncratic marking to allow them to surface as words of Turkish.

The problem is not resolved by the OT treatment, as Polgardi’s (1999) constraint ranking in (9) runs into the same difficulty. As can be seen in the tableau,


the sequence /ü-a/ in (9d) incurs multiple violations of the unranked constraint *elements because the sequence contains three prosodies. Likewise, all the other sequences noted above also incur multiple violations of this constraint, incorrectly predicting that such sequences should fail to appear in the surface forms of disharmonic roots in Turkish. Furthermore, the constraint fill always militates against disharmonic roots with bare V positions, thus incorrectly disallowing the combination of [ı] with vowels other than [a] from surfacing in the Turkish lexicon. In fact, due to recent borrowings, Turkish is replete with such words where the vowel [ı] is the result of epenthesis to break up onset consonant clusters, as mentioned above. In spoken Turkish, these vowels are typically produced, although they are not always specified in the spellings in current dictionaries.

The co-phonology approach entertained above also encounters some substantial problems. As argued by Inkelas, Orgun and Zoll (1996, 1997, 2004), as soon as a grammar divides morphemes into distinct classes based on some detectable pattern, it permits the proliferation of (potentially uninteresting) co-phonologies. One might, however, invoke the concept of statistical significance here, particularly to restrict morpheme-specific co-phonologies. A quantitative approach might support Co-phonology A in (10), since there are many items that follow this pattern, while less frequent patterns would simply need to be handled by lexical prespecification of some sort. Inkelas, Orgun and Zoll point out, furthermore, that even statistically significant patterns might nevertheless not be of phonological importance in a language.

Another way to limit co-phonologies suggested by Inkelas et al. is to restrict their nature, for example, by allowing them to involve only the non-native portion of the lexicon of a language (cf. Ito and Mester 1993, 1995). This would not solve the problem in Turkish, however, since there is no obvious way to separate the vocabulary into native vs. non-native categories, except possibly in the case of certain very recent borrowings.2 Furthermore, such groupings would not result in a clear-cut distinction between disharmonic and harmonic roots, and in fact, there exist certain non-native words that are harmonic (e.g., lise ‘lycée’) as well as native Turkish words that are disharmonic (e.g., anne ‘mother’).

Despite differences in their approach, all of the previous models have in common that their treatment of VH requires multiple types of idiosyncratic marking: (a) the specification of a particular class of disharmonic words as being

2. See Lightner (1972) for an attempt to distinguish non-native words from native ones in the Turkish lexicon on a number of phonological properties. The results do not, however, provide the distinctions needed here.


subject to the observed sub-regularity, and (b) a lexical specification of words that violate the sub-regularity. As we will see below, a similar degree of complexity arises in previous treatments of irregular stress patterns in Turkish.

3.2. Stress

It is well known that Turkish regularly places primary stress on the final syllable of a word, whether this is a root or a combination of root and suffixes (e.g., Lees 1961; Lewis 1967; Underhill 1967; Inkelas 1996; Kabak and Vogel 2001), as illustrated in (12).

(12) Regular Word-Final Stress
a. kedí ‘cat’
b. kedi-lér ‘cats’
c. kedi-ler-ím ‘my cats’
d. kedi-ler-im-íz ‘our cats’
e. kedi-ler-im-iz-ín ‘of our cats’
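As a minimal sketch (our own illustration, not the authors’ formalization), the regular pattern in (12) amounts to stressing the last syllable of the fully suffixed word:

```python
def final_stress(syllables):
    """Regular Turkish word stress: primary stress falls on the final
    syllable of the word, root plus any suffixes (a sketch; the input
    is assumed to be already syllabified)."""
    return len(syllables) - 1  # index of the stressed syllable

# (12c) kedi-ler-ím ‘my cats’: stress on the last syllable, -im
assert final_stress(["ke", "di", "ler", "im"]) == 3
```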

Despite this regularity, however, exceptions are also observed. While irregular stress exists in both roots and morphologically complex items, the focus here is on irregular stress in roots, illustrated in (13). The reader is referred to Kabak and Vogel (2001, 2005) for a discussion of other types of stress irregularities.

(13) Irregular Stress: exceptional lexical items
a. Edírne
b. Kastámonu
c. Üsküdar
d. tiyátro ‘theatre’
e. fabríka ‘factory’
f. négatif ‘negative’

It has been widely noted that a number of words exhibit irregular stress, following a quantity-sensitive pattern similar to the Latin Stress Rule, often referred to as the Sezer Stress Rule (cf. Sezer 1981; Kaisse 1985; Barker 1989; Çakır 2000, among others). These words are typically, though not exclusively, native or foreign place names, personal names and other loan words. According to the Sezer Stress Rule (SSR), in these irregularly stressed words stress falls on the antepenultimate syllable if it is heavy and the penultimate syllable is light; otherwise it falls on the penultimate syllable. It is easily seen, however, that such a generalization does not do justice to the facts of irregular root stress in Turkish, since a variety of irregularly stressed words fail to follow the SSR: (13b) and (13f) are stressed on the antepenultimate rather than the predicted penultimate syllable; (13c) and (13e) are stressed on the penultimate syllable rather than the predicted antepenultimate syllable. Recently, the argument has been advanced that the SSR is productive exclusively in place names (Inkelas and Orgun 1998; Inkelas 1999; Inkelas and Orgun 2003). It is argued, furthermore, that derived words which do not themselves follow the SSR exhibit this pattern when used as place names (e.g., ulús ‘nation’ vs. Úlus (place name); sirke-cí ‘vinegar seller’ vs. Sírkeci (place name); kulak-síz ‘without ear’ vs. Kuláksız (place name)). It should be noted, however, that several exceptions to this generalization exist. As noted above, place names such as those in (13b) and (13c) fail to conform to this pattern. In addition, Demircan (1976) lists several morphologically complex place names with final stress that thus also fail to exhibit the SSR (e.g. Adalár, Yıldırán, Savaşır, Arabacılár, Yumurtalík, Degirmencí, etc.). There are also other morphologically complex place names with non-final stress where the SSR is violated in other ways (e.g., Ármaganlı, Óklavalı, Seméreci, etc.). Furthermore, numerous place names follow the regular compound stress rule, where the stress is retained on the first member of the compound (e.g., Fenér-bahçe; Kadí-köy; Kırík-kale).3 Finally, it should also be noted that there is considerable variation with respect to stress placement in certain place names (e.g., Söylemez ∼ Söyleméz; Emírgan ∼ Emirgán; Bálaban ∼ Balában ∼ Balabán; Égridir ∼ Egrídir; see Demircan 1976: 410 for further examples), although it

3. For completeness, it should be added that there are also compounds that are stressed on the final word of the whole compound construction in Turkish (e.g., ak-ar+yak-ít (flow-Aor+fuel) ‘fuel oil’; imam+bayıl-dí (imam+faint-Past) ‘a pot roast of lamb with eggplant puree’; uyu-r+gez-ér (sleep-Aor+wander-Aor) ‘sleepwalker’; gök+del-én (sky+pierce-Rel) ‘sky-scraper’; bilgi+say-ár (information+count-Aor) ‘computer’). The same pattern is also manifested in several proper names (Son+gül (last+rose); Gül+áy (rose+moon); Bin+náz (thousand+caprice)), as well as a few place names (e.g., Fener+bahçé (lantern+garden)) that are formed through compounding. In the dialect of the first author, a native speaker of Istanbul Turkish, however, some of these compounds, except for proper names, follow the regular compound stress pattern (e.g., imám+bayıldı; uyúr-gezer, Fenér+bahçe). The fact that stress appears on the last syllable of the rightmost word in certain compound formations, especially in the case of proper names, suggests that these constructions have been grammaticalized as simplex nouns, and are no longer analyzed as compounds in the synchronic grammar. The distinction between compounds that are stressed on the leftmost word vs. the rightmost element also seems to be grounded in the morphosyntactic and semantic properties of compounds (e.g., endocentricity vs. exocentricity; see Demircan 1996: 147–148 for details). The discussion of these properties, however, is outside the scope of this paper.


is not clear whether this is due to dialectal, or even idiolectal, variation. While some of these variants follow the SSR, clearly others do not.
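The quantity-sensitive generalization itself can be sketched as follows. This is a simplified illustration, not the authors’ formalization: syllable weights (‘H’ heavy, ‘L’ light) are supplied directly, and words of fewer than three syllables, which the rule as stated does not cover, default to the penult.

```python
def sezer_stress(syllables, weights):
    """Sezer Stress Rule (SSR) sketch: stress the antepenult if it is
    heavy and the penult is light; otherwise stress the penult.
    Returns the index of the stressed syllable."""
    n = len(syllables)
    if n == 1:
        return 0
    if n >= 3 and weights[-3] == "H" and weights[-2] == "L":
        return n - 3  # antepenultimate
    return n - 2      # penultimate

# (13a) Edírne: light antepenult, so the penult is stressed
assert sezer_stress(["e", "dir", "ne"], ["L", "H", "L"]) == 1
# (13d) tiyátro: likewise penult stress
assert sezer_stress(["ti", "yat", "ro"], ["L", "H", "L"]) == 1
```

As the text observes, items such as (13b) Kastámonu receive antepenultimate stress even though this rule predicts the penult, which is precisely what makes them exceptions to the SSR.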

3.2.1. Stress irregularities and co-phonologies

In several recent analyses of Turkish stress, it has been proposed that the different irregular stress patterns, in particular those found in place names, be treated as separate co-phonologies, beyond the general co-phonology that accounts for the regular stress pattern (cf. among others Inkelas, Orgun, and Zoll 1997; Inkelas 1999; Inkelas and Orgun 2003). In such proposals, roots as well as morphologically complex place names are grouped together with regard to the SSR. Several problems are immediately apparent, however, with such an approach.

First, as was noted, irregular stress is not limited to place names, so treating place names as distinct from other items requires additional mechanisms for the latter, and creates the expectation that the different types of words are crucially distinct in some way. Second, the problem of the proliferation of co-phonologies arises with regard to place names. As was also noted, while some irregularly stressed place names exhibit the SSR, some show other irregular stress patterns, while a considerable number of place names actually follow the regular Turkish stress patterns (e.g., final stress and compound stress). This suggests that lexical items that represent place names require a multiplicity of co-phonologies: minimally, one for the regular stress pattern, one for the SSR, and another one for place names that do not fit into either of these two co-phonologies. In fact, examination of the various patterns discussed in Inkelas and Orgun (2003) reveals seven distinct categories of place names with regard to their stress assignment (cf. Kabak and Vogel 2001). If each of these categories is permitted to constitute a separate co-phonology, we would expect that the introduction of a new item that does not fit any of these patterns would then also be allowed to form yet another co-phonology. It is not clear on what grounds one may establish a co-phonology, and if certain restrictions are imposed, what would happen to forms that fall outside the co-phonologies that are established in accordance with the restrictions. The obvious solution would be to permit one co-phonology that is a type of catch-all for any items that are not otherwise accounted for. In this case, it would be necessary to simply specify the location of stress individually for all of the forms in this “overflow co-phonology”. The question such an approach raises is whether it makes sense for a grammar to invoke a variety of different mechanisms, including co-phonologies and whatever means are required to a) identify the items that are associated with each co-phonology and b) ensure that the necessary phonological rules operate in the appropriate order in each co-phonology.


A third, and more general, concern regarding the co-phonology approach is what it implies about the nature of the grammar itself. It has been suggested that the SSR is psychologically real and operates productively in Turkish (cf. Inkelas and Orgun 2003). Thus, we might interpret the presence of a given co-phonology as an indication that the relevant portion of the grammar is synchronically “active” in the language. It is also claimed that speakers follow this pattern in stressing new place names, although this claim has not been substantiated with experimental data. Furthermore, based on an examination of the 948 irregularly stressed place names listed in the TELL data base (= Turkish Electronic Living Lexicon: http://ist-socrates.berkeley.edu:7037/TELLhome.html), Kabak and Vogel (2005) determine that only 19% of the items unquestionably conform to the Sezer Stress pattern. The other irregular stress patterns are represented by even smaller percentages. While numerical data are not necessarily indicative of productivity (or its absence), it is also the case that the informal observations of the first author do not support the suggestion that native speakers of Turkish regularly apply the Sezer stress pattern to new place names. Thus, if the existence of a Sezer co-phonology implies that this stress rule actively determines the pronunciation of place names in Turkish, such a claim is at best questionable without experimental data to support it. Note that speakers might generalize the fact that place names simply have non-final stress, which may not necessarily correspond to the SSR, and apply this to novel place names via analogy to the existing ones.

In addition, the introduction of a “Sezer co-phonology” into the grammar of Turkish, through which all place names must pass, would obscure several facts of Turkish that a grammar would normally be expected to capture. For example, we would lose the information that a considerable number of place names in Turkish actually exhibit regular stress, or follow other regular patterns of the language. Indeed, not only is Turkish morphology extremely productive, the same mechanisms (i.e., suffixation, compounding) used in regular word formation processes are also employed in the formation of complex place names. If the place names are isolated from the rest of the lexicon in some way, this generalization is missed.

Moreover, if it is assumed that the SSR is what accounts for stress placement in place names, the grammar must somehow be even further complicated to handle those that do not follow the SSR. As mentioned above, such items actually constitute the majority of place names, so we must ask at this point to what extent the grammar is really capturing and representing the phonological generalizations of Turkish.

In an effort to maintain the concept of co-phonologies, it might be possible to simplify the system by allowing the Sezer co-phonology only to apply to place


names that in fact exhibit the SSR. Thus we could limit the place names to only three categories: a) those that are predicted via the regular stress patterns of Turkish, b) those that follow the SSR, and c) those that are irregular but follow neither the SSR nor the regular stress patterns of Turkish. The third category would inevitably require some type of stress prespecification, as is, in fact, the case in Inkelas and Orgun’s co-phonology treatment.

The question that arises at this point is whether the introduction of co-phonologies offers any substantive advantages to the grammar of Turkish. As we have seen, in the case of place names, the result is a fairly complex system which requires the segmentation of the Turkish lexicon into a number of components, with the consequence that certain generalizations about stress and word formation processes are missed. Furthermore, the introduction of co-phonologies does not free the grammar of the need for overt prespecification of irregularities in the stress system, as even in a model with a Sezer co-phonology, the majority of place names with irregular stress will need some direct specification of where this stress falls. Given that such a direct method of prespecification is required in any case, the question that arises is whether the additional machinery involved in co-phonologies is necessary. That is, it appears to add “cost” to the system without reducing the need for the mechanism of lexical prespecification of idiosyncratic properties.

4. Motivating prespecification

It seems uncontroversial that the inclusion of item-specific information in a lexical representation is an indication of some form of irregularity. As we have shown, in Turkish such information is required in both disharmonic roots and atypical stress patterns. Since the introduction of additional constructs such as lexical levels or co-phonologies does not eliminate the necessity of lexical specification, these constructs can be seen as merely adding “cost” to the system. We propose, instead, that lexical specification be used as the sole means of representing phonological exceptions.

As indicated above, this is essentially the approach taken within OT, where exceptions are handled via the relatively high ranking of faithfulness constraints that protect underlying structure (e.g., Inkelas 1999). Indeed, given two competing input forms, one fully specified (i.e. prespecified as containing a particular property), and one partially specified (i.e. prespecified as belonging to a special co-phonology), Lexicon Optimization ensures that the fully specified alternative will be preferred. Thus, this use of prespecification in fact makes co-phonologies superfluous within OT.


In the following sections, we show in detail how a prespecification model accounts for the facts of disharmonic roots and exceptional stress. In fact, we propose that this type of model is potentially extendable in a simple and straightforward way to other types of phonological exceptions as well.

4.1. Prespecification in disharmonic roots

In a model that uses prespecification as the sole mechanism for treating disharmonic roots in Turkish, there are at least three options for analyzing the vowel system. In the context of the overall system of Turkish phonology, we will exclude two of these options. We will then present our proposal, Option 3, in which we argue that all that is required is that atypical roots be lexically specified as being excluded from the progressive Vowel Harmony rules via the marking of the precise exceptionality – or disharmony. It is, furthermore, the model advanced as Option 3 that will be used below in accounting for the exceptional stress patterns.

4.1.1. Prespecification in disharmonic roots: two potential problems

As indicated above, while we propose to treat phonological exceptions via the mechanism of prespecification, the choices of what to prespecify, and in what way, are not trivial. In this regard, we first consider two options that turn out to be untenable, despite what appear to be reasonable assumptions.

The first case, Option 1, accounts for disharmonic roots by underlyingly specifying those features that do not undergo Vowel Harmony, making use of opaque segments and their interaction with the general autosegmental association conventions (e.g., Clements 1981). The standard view of opaque segments entails that such segments not only fail to undergo VH themselves, but block the spreading of a given feature(s) and at the same time initiate the spreading of a different feature(s). According to Clements and Sezer (1982), all root vowels, whether harmonic or not, are opaque (i.e. fully specified). Similarly, in Inkelas (1995) (a restricted version of) Lexicon Optimization determines that predictable feature values are underspecified only when they enter into surface alternations. Since root vowels never alternate, this approach entails that all root vowels, regardless of whether they are harmonic on the surface, must be specified for the relevant features (e.g., backness, rounding). Thus, only suffix vowels may be the targets of VH since these are the only ones that exhibit predictable alternations for backness and rounding in Turkish. As shown in (14), the root vowels are all specified since they themselves do not alternate; only the last feature specifications spread to subsequent morphemes (cf. Clements and Sezer 1982).


(14) Option 1: Lexical specification of all root vowel features

Although root vowels do not alternate, there is experimental research that indicates that such vowels may nevertheless be underspecified (cf. Harrison and Kaun 2000, 2001). In an initial study involving a language game, Turkish speakers were taught a reduplication rule which involved replacing the initial vowel of a set of real (Turkish) words, both harmonic and disharmonic, with [a] or [u]. Harrison and Kaun (2001) report that while the subjects tended to re-harmonize the harmonic roots according to the pattern of backness harmony (e.g., kibrit ‘match’ → [kabrıt], not *[kabrit]), they failed to show the same pattern with the disharmonic roots (e.g., butik ‘boutique’ → [batik], not [batık]). More systematic experimentation is needed; however, the fact that similar results were also obtained from other languages, including Finnish and Tuvan, suggests that pervasive surface-true patterns, such as the harmonic vowel sequences in Turkish roots, should in fact be analyzed as underspecified; only idiosyncratic patterns, such as vowel sequences in disharmonic roots, require full specification (cf. Harrison and Kaun 2001).

The second drawback of Option 1 is its lack of representational economy. While VH is successfully blocked by the full specification of non-harmonizing root vowels (and suffixes), specifying the features of all root vowels misses a crucial generalization. That is, it fails to show that, in fact, most roots share vowel features. This, in turn, leads to excessive feature specification, where the harmonizing root vowels could otherwise benefit from a ‘free ride’ (cf. van der Hulst and van de Weijer 1991).

Since Option 1 is not feasible, let us now consider Option 2, where lexical prespecification of vowel features is allowed only in disharmonic roots (15a).


Harmonic roots are underspecified except for the initial vowel, which bears the features for frontness-backness and rounding that spread throughout the rest of the item (15b).

(15) Option 2: Lexical specification of vowel features only in disharmonic roots

a.  +R      -R      +R
    k E r t I z E n - l I     [kortizon-lu] ‘with cortisone’
    +B      -B      +B

b.  -R
    p I r l E n t E - l I     [pırlanta-lı] ‘with brilliants’
    +B

While this approach avoids the drawbacks associated with specifying all the root vowel features seen in relation to Option 1, it nevertheless gives rise to several other problems. These center on the feature (i.e. [-Back], or [Coronal] in other frameworks)4 specification of front vowels in disharmonic roots. First, it should be noted that the prespecification of [Coronal] (i.e., [-Back] in Clements and Sezer 1982) in disharmonic roots is inconsistent with a redundancy-free lexicon, where predictable and redundant features are excluded from underlying representations. The Articulator feature [Dorsal] (corresponding to [+Back] in Clements and Sezer 1982) and the Tongue Height features (e.g. [Low]) are sufficient to distinguish all the vowel phonemes of Turkish. Thus, [Coronal] is essentially redundant and should be excluded from the underlying representation.

Second, underlying specification of [Coronal] conflicts with the cross-linguistically unmarked status of this feature. That is, if the vowels are fully specified, this would require that the unmarked [Coronal] feature also be specified,

4. We opt to use [Coronal], corresponding to [-Back] in Clements and Sezer (1982), to characterize front vowels. This choice is based on the original feature organization proposed by Jakobson et al. (1952), where vowels and consonants share the same place features such as [Labial], [Coronal], and [Dorsal] (see also Lahiri and Evers (1991) for a similar view). The theoretical consequences of the choice between [-Back] and [Coronal] are, however, tangential to the purposes of this paper.


contrary to an approach in which only marked feature values are specified (for discussion of the status of [Coronal], see among others Lahiri 2000 for a summary of the issues; Paradis and Prunet 1991 for several papers on coronals; and Kabak 2007 for further arguments pertaining specifically to the phonological inertness of [Coronal] in Turkish).

Furthermore, the prespecification of front vowels for [Coronal] in disharmonic roots makes the prediction that these front vowels behave differently from the same vowels in harmonic roots. In fact, there is at least one phonological process, Vowel Assimilation (VA), where such a distinction, in fact, makes an incorrect prediction. VA optionally eliminates vowel sequences ([V1.V2]) by the assimilation of V2 to V1, yielding a long vowel (cf. Sezer 1986, Kabak 2007). Typically, V2 must be high, and both V1 and V2 must share backness and rounding features in order for VA to apply (e.g., [yourt] → [yoort] ‘yoghurt’; [aır] → [aar] ‘heavy’; [göüs] → [göös] ‘breast’). Although the sequence [e.i] satisfies these conditions, VA fails to apply in this case (e.g., [beit] → *[beet] ‘couplet’, [deil] → *[deel] ‘not’). This difference is not limited to roots, but is also observed between a root and a suffix (e.g. [aya-ı] (foot-Poss.3S) → [ayaa], but [bebe-i] (baby-Poss.3S) → *[bebee]).

If we assume that VA applies only when the vowels in question share the same Place feature (cf. Kabak 2007)5, and we assume as in Option 2 (and Option 1) that [Coronal] is underlyingly specified in disharmonic roots, the prediction is made that VA should apply in sequences of [e.i] as in other sequences in disharmonic roots. Consider the representation of the disharmonic root kafein ‘caffeine’ in (16).

(16) The representation of kafein ‘caffeine’

         [Low]    [High]
     k V f V V n           [kafein] ‘caffeine’
         [Dor.]   [Cor.]

5. Kabak (2007) argues that the two vowels in a sequence must share the same specified Place feature in order to undergo VA. The seemingly exceptional nature of the sequence [e.i] is then straightforwardly accounted for in terms of the assumption that vowels are not specified as [Coronal] in Turkish. Since [Coronal] is absent in underlying representations, and neither [e] nor [i] is [Labial], there is indeed no Place feature that is shared by the members of the [e.i] sequence, and hence no motivation for VA.


Given that the final two vowels share the feature [Coronal], it is predicted that VA will apply, yielding *[kafeen]. In fact, this is not the correct result, since VA does not apply here, or in other disharmonic roots with the sequence [e.i] (e.g., ateist ‘atheist’ → *[ateest]). Thus, while the prespecification of all vowel features in disharmonic roots does provide a mechanism for blocking VH, it results in an incorrect analysis of another phenomenon, VA.
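The place-sharing condition on VA invoked here (cf. footnote 5) can be sketched as follows. The feature sets are our own simplified assumptions based on the discussion above, and the further conditions on VA (e.g., that V2 must be high) are omitted; the point is only that leaving [Coronal] unspecified correctly blocks VA in [e.i] sequences.

```python
# Specified Place features per vowel (illustrative; [Coronal] is a
# default and is therefore deliberately left unspecified on /e, i/).
PLACE = {
    "a": {"Dorsal"}, "ı": {"Dorsal"},
    "o": {"Dorsal", "Labial"}, "u": {"Dorsal", "Labial"},
    "ö": {"Labial"}, "ü": {"Labial"},
    "e": set(), "i": set(),
}

def va_place_condition(v1, v2):
    """VA can apply only if V1 and V2 share a specified Place feature."""
    return bool(PLACE[v1] & PLACE[v2])

assert va_place_condition("o", "u")        # [yourt] -> [yoort] ‘yoghurt’
assert va_place_condition("a", "ı")        # [aır] -> [aar] ‘heavy’
assert not va_place_condition("e", "i")    # [beit] -> *[beet]: blocked
```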

4.1.2. Prespecification in disharmonic roots: maximum underspecification

The third alternative, Option 3, not only provides coverage for the full range of facts of Turkish phonology, it also avoids the problems seen in the previous analysis. In this option, disharmonic root vowels are represented with the minimal number of features necessary to capture their lexical status.6 The only lexical marking required for a disharmonic root is whether or not it obeys the general principles of VH. As mentioned above, we are assuming that [Coronal] is not specified underlyingly since it is a default feature, and will be filled in at a subsequent stage. As shown in (17), roots that disobey VH are, however, marked as carrying a [Dorsal] (or [Labial], in the case of disharmonic roots violating labial harmony) specification. This feature, however, is prohibited from spreading rightward, as indicated here by the association line truncated with “x”.

(17) Disharmonic Root (spreading truncated): maximum underspecification

[Dorsal] cannot spread. [Coronal] is inserted later as default on the last V.

It should be noted that when the [Dorsal] (or [Labial]) feature is associated with the final vowel of a root, spreading is not blocked by a truncated association line. This different behavior of the final root vowel is, in fact, precisely what we would expect, since suffixes achieve their full feature specifications by VH,

6. This is not quite the same as Radical Underspecification (cf. Archangeli 1988; Archangeli and Pulleyblank 1994); however, the details of this difference are not relevant here.


which spreads the requisite features rightward from the root, as seen in (18a). In this respect, while a root-final [Dorsal] (or [Labial]) feature is free to spread, a [Coronal] feature in the case of a previously truncated feature is inserted as a default feature, as in (18b). In both cases, the spreading applies from relevant starting points as it does in regular harmonic roots, illustrated in (18c) and (18d).7

(18) a.       [Dorsal]
     V ş y V - d V      [eşya-da] ‘furniture-Loc’
     [Low]     [Low]

Root-final specified [Dorsal] is free to spread onto the suffix vowel. [Coronal] is inserted as default on the first V later at the phonetic level.

     b.       [Dorsal]

VH is blocked within the root. [Coronal] is inserted later as default on the root-final V as well as the suffix V at the phonetic level.

     c.       [Dorsal]
     V r k V - d V      [arka-da] ‘behind-Loc’
     [Low]     [Low]

VH applies: the [Dorsal] feature spreads to all following Vs.

7. It should be noted that suffix vowels unexpectedly surface as coronal, instead of dorsal, in a certain set of words (e.g., alkol-den (alcohol-Abl); saat-im (watch-Poss.1S), etc.). In such cases, too, the feature [Dorsal] in the final syllable of the root is prohibited from spreading onto the suffix vowels via truncation (see Kabak 2007 for details).


     d.
     k V s V r - d V      [keser-de] ‘adze-Loc’
     [Low]      [Low]

VH applies: [Coronal] is inserted on all underspecified Vs via a redundancy rule later.
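A procedural sketch of this system may be helpful (our own illustration of the mechanics of (17)–(18), not the authors’ formalism): each V position carries its specified Place features, if any, plus a flag recording whether its association line is truncated; spreading proceeds left to right, and [Coronal] is filled in as the default.

```python
def spread(v_slots, default="Coronal"):
    """v_slots: list of (features, may_spread) pairs, one per V position.
    `features` is the set of lexically specified Place features (empty if
    underspecified); may_spread=False encodes a truncated association
    line, as in (17)/(18b). Returns the surface feature set of each V."""
    current, surface = None, []
    for features, may_spread in v_slots:
        if features:                 # lexically specified vowel
            value = set(features)
        elif current is not None:    # target of VH spreading
            value = set(current)
        else:                        # default fill-in at a later stage
            value = {default}
        surface.append(value)
        if features:
            current = set(features)
        if not may_spread:
            current = None           # truncation blocks further spreading
    return surface

# (18c) arka-da: root-initial [Dorsal] spreads to all following Vs
assert spread([({"Dorsal"}, True), (set(), True), (set(), True)]) == \
    [{"Dorsal"}, {"Dorsal"}, {"Dorsal"}]
# (18b)-style root: truncated [Dorsal] does not spread; later Vs default
assert spread([({"Dorsal"}, False), (set(), True)]) == \
    [{"Dorsal"}, {"Coronal"}]
```

The (18a) pattern also falls out: an underspecified initial V receives default [Coronal], while a specified root-final [Dorsal] spreads onto the suffix.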

It should be noted that marking disharmonic roots as exceptions to VH does not exclude them from the application of the so-called epenthesis-driven Vowel Harmony. In this type of VH, vowels epenthesized to break up consonant clusters receive their specification from neighboring vowels (and consonants); however, the pattern is different from that observed in the more usual case of (progressive) Vowel Harmony. That is, unlike progressive VH, epenthesis-driven VH operates from right to left, and is sensitive to the types of consonants within the clusters. For example, in /k/-clusters, the epenthesized vowel is always back (e.g. [kırem] vs. *[kirem] ‘cream’). In addition, it is observed that even though spreading generally proceeds from right to left, low round vowels tend not to trigger rounding on a preceding vowel (e.g., [kırom] vs. *[kurom] ‘chrome’).
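This right-to-left pattern can be sketched as follows. The sketch is our own simplification: the vowel classes and the /k/-cluster condition are assumptions generalized only from the two examples just given, not a full statement of the rule.

```python
BACK, ROUND, LOW = set("aıou"), set("oöuü"), set("aeoö")

def epenthetic_vowel(next_vowel, in_k_cluster):
    """Quality of an epenthetic high vowel, determined right-to-left by
    the following vowel; /k/-clusters force a back vowel, and low round
    vowels do not pass on their rounding."""
    back = in_k_cluster or next_vowel in BACK
    rnd = next_vowel in ROUND and next_vowel not in LOW
    if back:
        return "u" if rnd else "ı"
    return "ü" if rnd else "i"

assert epenthetic_vowel("e", True) == "ı"   # [kırem] ‘cream’, not *[kirem]
assert epenthetic_vowel("o", True) == "ı"   # [kırom] ‘chrome’, not *[kurom]
```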

As mentioned, we are assuming that [Coronal] is not specified in underlying representations, but rather arises as a default feature. At first glance, it might seem that [Coronal] must be specified to account for a phenomenon of palatalization, whereby the velars and /l/ are fronted when followed by a front vowel within the same syllable (e.g. [sa.kız] ‘bubble-gum’, [kat] ‘floor’ vs. [se.kjiz] ‘eight’, [kjir] ‘dirt’; [par.lak] ‘bright’, [mal] ‘property’ vs. [ka.lje] ‘castle’, [kelj] ‘bald’). We assume such fronting is a matter of surface phonetic form and thus does not depend on the presence of [Coronal] in the underlying representation. Even if this approach is not taken, it is not necessary to provide the feature [Coronal] in the underlying representation. That is, palatalization can be analyzed as the delinking of [Dorsal] from velar consonants in the context of a front vowel. This by no means requires an extra mechanism: feature delinking is one of the well-established notions within autosegmental phonology. After delinking, the tongue height specification (i.e. [High]) of the velar consonants in question (both the stops and the lateral) suffices to yield the requisite palatalized variants on the surface. It should be noted that certain loan words in Turkish have palatalized segments where this cannot be due to the vocalic context (e.g. [be.kjar] ‘single’, [se.ljam] ‘greeting’). In these cases, the palatalized sounds are in contrast with velars, as can be observed in several minimal pairs (e.g. [kar] ‘snow’ vs. [kjar] ‘profit’; [sol] ‘left’ vs. [solj] ‘the musical note G’). In these special cases, the palatal consonants must be afforded special status (cf.


Exceptions to stress and harmony in Turkish: co-phonologies or prespecification? 85

Kabak 2007, where the underlyingly velar consonants are specified for [Dorsal] and [High] while those that are underlyingly palatal are only specified for [High], the Articulator node being underspecified), like the case of the Yiddish clusters in English, and as such, do not require a revision of the core phonology of the language.

In sum, the prespecification model proposed here has several advantages over the other two prespecification options. Specifically, it ensures that underlying representations are redundancy-free, giving vowel features a “free ride” to the largest extent possible via VH. Furthermore, since the same features that are assumed to be underspecified in the general system of Turkish are also not specified in disharmonic roots, generality is achieved in addition to economy. Thus, the proposed model succeeds in respecting and expressing the overall phonological structure of Turkish in a simple and insightful manner.

4.2. Prespecification of exceptional root stress

As was seen above, lexical prespecification of irregularly stressed syllables is required in the different approaches considered. That is, both the co-phonology and OT approaches invoke underlying specification of idiosyncratically stressed syllables in addition to the other mechanisms they include. The proposal advanced here requires only the mechanism of prespecification, and thus shares this property with previous proposals. At the same time, it also avoids the complexities associated with the other proposals by not requiring a combination of other mechanisms as well. Furthermore, the approach advanced here provides a unified treatment for irregularly stressed roots in Turkish, instead of singling out specific categories such as place names.

As indicated above, in both place names and other types of roots (e.g., words of foreign origin) with irregular stress, non-final stress sometimes falls on the syllable identified by the quantity-sensitive Sezer Stress Rule, as in (19), but sometimes it falls on other syllables, as in (20).

(19) Irregularly stressed roots that follow the SSR

a. Place names: Ánkara, Kanáda ‘Canada’, Edírne, etc.

b. Other roots of foreign origin (proper names, loan words, etc.): Doróti ‘Dorothy’, Katarína, Tosíba ‘Toshiba’, sandálye ‘chair’, kafetérya ‘cafeteria’, fakülte ‘faculty’, şampánya ‘champagne’, gazéte ‘newspaper’, etc.


(20) Irregularly stressed roots that do not follow the SSR

a. Place names: Üsküdar, Belçíka ‘Belgium’, Afríka ‘Africa’, Avrúpa ‘Europe’, Bermúda, etc.

b. Other roots of foreign origin (proper names, loan words, etc.): Gorbáçov ‘Gorbachov’, Mandéla, négatif ‘negative’, pózitif ‘positive’, fabríka8 ‘factory’, etc.

If the place names are treated separately from other categories of words, the generalization that similar patterns are found across the lexicon is missed. Furthermore, the establishment of a representation that focuses on the fact that a number of place names happen to follow a quantity-sensitive stress pattern obscures the facts that a) not all irregularly stressed place names follow this pattern, and b) not all words that happen to follow the stress pattern in question are place names.

The mechanism we propose for representing irregular root stress of any sort is the prespecification or marking of the relevant syllable as being stress-bearing. Possibilities for such a representation include the use of a grid mark, special foot structure, or some type of diacritic stress feature. We opt for the use of grid structure since it permits the unification of the representation of stress within words as well as within larger phonological constituents. Examples are shown in (21), where ‘*’ above a syllable indicates that it bears exceptional stress.

(21) Exceptional Stress

  a.  *
      Ankara

  b.    *
      Kanada ‘Canada’

  c.     *
      Belçika ‘Belgium’

  d.     *
      fabrika ‘factory’

  e.  *
      negatif ‘negative’

8. It should be noted that the [b] in fabrika functions as the coda of the first syllable rather than the onset of the second syllable since complex onsets are impermissible in Turkish.
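The division of labour between the regular final-stress rule and a prespecified grid mark can be sketched as follows; representing the grid mark as a syllable index and the name `word_stress` are illustrative simplifications, not K&V's notation.

```python
# Toy model: regular roots receive final stress; an exceptional root
# stores a grid mark (index of its stressed syllable) in its lexical entry.

def word_stress(syllables, grid_mark=None):
    """Return the index of the syllable bearing primary stress."""
    if grid_mark is not None:
        return grid_mark           # prespecified (exceptional) stress
    return len(syllables) - 1      # regular word-final stress

assert word_stress(["ki", "tap"]) == 1                     # regular: kitáp 'book'
assert word_stress(["An", "ka", "ra"], grid_mark=0) == 0   # Ánkara
assert word_stress(["fab", "ri", "ka"], grid_mark=1) == 1  # fabríka
```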


While the focus here is on roots, it should be noted that the mechanism of prespecification of the locus of irregular stress in terms of a grid mark automatically also accounts for irregular stress in more complex word structures. For example, a word such as Avrupa-lı-laş-arak ‘while/by becoming European’ contains both an irregularly stressed root (Avrúpa ‘Europe’) and an irregularly stressed suffix (-(y)ÉrEk ‘while/by’). This is shown with the relevant grid markings in (22).

(22)   *           *
     Avrupa-lı-laş-arak ‘while/by becoming European’
     Europe-Der-Der-while/by

While this is a well-formed word in terms of its morpho-syntactic structure, it is not well-formed as a Phonological Word (PW), since PWs may only contain a single primary stress. In Kabak and Vogel (2005), it is argued that such items, in fact, constitute a Clitic Group (CG), the constituent in the phonological hierarchy between the PW and the Phonological Phrase. In Turkish, the stress assignment rule for the CG assigns prominence to the leftmost lexical stress, thus yielding a representation such as (23).9

(23)   *
       *           *
     Avrupa-lı-laş-arak ‘while/by becoming European’

Since there is currently no systematic work on secondary stress in Turkish, it is not clear whether the rightmost stress is lost in such structures, or whether it remains as a type of secondary stress. In either case, there is independent motivation for the CG stress rule (cf. Kabak and Vogel 2001, 2005). Furthermore, the specification of idiosyncratic stress as part of the underlying representation permits a simple and straightforward account of irregular stress in both roots and morphologically complex items, and avoids the introduction of additional representations or operations. Thus, stress is generally assigned via the regular stress rules (i.e., PW stress rule, CG stress rule), or is realized as specified in the underlying representation in the case of exceptional stress. In both cases, maximum use is made of the general principles of Turkish stress assignment.

9. This fact is handled by a principle of “Leftmost Stress Wins” in Inkelas and Orgun (2003); however, there are fundamental differences between their approach and an analysis involving Clitic Group stress, as discussed in detail in Kabak and Vogel (2005).
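The CG stress rule described above reduces to picking the leftmost grid mark; a minimal sketch, with grid marks encoded as syllable indices (an illustrative encoding, not K&V's representation):

```python
# Toy Clitic Group stress rule: with several lexically specified (grid-
# marked) stresses in one CG, the leftmost mark surfaces as primary stress.

def cg_stress(grid_marks):
    """grid_marks: syllable indices carrying a lexical grid mark."""
    return min(grid_marks) if grid_marks else None

# Avrupa-lı-laş-arak: the root mark (Av.rú.pa) precedes the suffix mark,
# so the root's stressed syllable wins.
assert cg_stress([1, 5]) == 1
```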


5. Conclusion

In this paper, we have argued that prespecification is the only descriptively adequate and theoretically viable means of handling various kinds of phonological exceptions. On the basis of evidence from disharmonic and exceptionally stressed roots in Turkish, we have shown that previous approaches do not allow us to determine classes of exceptions or subregularities in a principled manner, either in terms of lexical strata or co-phonologies. Furthermore, lexical specification is inevitable in any model of exceptions.

The lexical (pre-)specification model proposed here maximizes representational economy by requiring a minimum of underlying feature specification. With respect to Vowel Harmony, we have shown that maximum underspecification of vowel features can be applied in the same way to both disharmonic and harmonic roots. We have employed feature truncation as a means of lexical specification, preventing the spreading of the lexically specified (exceptional) features (e.g., [Dorsal]) to unspecified segments. This permits any redundant features to remain unspecified in the underlying representation. As such, a single mechanism, lexical specification, is used to capture two separate objectives: (i) to mark the exceptional property in question, that is, disharmony, as well as (ii) to ensure that a given redundant feature has the same property throughout the lexicon. While the truncation convention may seem to add an extra mechanism to grammar, it should be viewed as a variant of autosegmental tools that are already in use for lexical specification. This type of lexical marking is crucially needed to separate the marking of the exceptional pattern in question from the representation of phonological features. Within the phonological grammar of Turkish, there is no reason to believe that /e/ in keman ‘violin’ (a disharmonic root) has a different mental status than in kefil ‘guarantor’ (a harmonic loan from Arabic), or in kemik ‘bone’ (a harmonic native root), or even in kel ‘bald’ for that matter. Thus, any model that aims to capture this generalization must inevitably resort to different conventions to mark disharmony while maintaining the phonological integrity of the vowels within the root in question. In our case, we accomplish both objectives by using a single principle, lexical specification, within a single phonological grammar.

Likewise, in the case of exceptionally stressed roots, we have shown that the lexical specification of atypical stress is required in addition to other mechanisms proposed in previous models, such as co-phonologies. In the present proposal, the only mechanism required is prespecification in the form of a grid mark associated with the exceptionally stressed syllable. Not only does this approach avoid the complexities associated with previous models that include prespecification alongside other mechanisms, it provides a unified treatment for all


types of irregularly stressed items in Turkish, rather than singling out specific categories such as place names.

In conclusion, we would like to point out that by adopting a model of lexical specification, we are by no means suggesting that exceptions constitute uninteresting phenomena. In fact, such a proposal suggests a fundamental distinction in the grammar between the “core” phonology and any other phenomena or subsystems. It furthermore permits a principled means of assessing the extent to which an item is exceptional in terms of the number of prespecified features it requires. Thus, an item might be exceptional with regard to either VH or stress, or it may be exceptional with regard to both. The isolation of exceptional items in separate parts of the phonology makes it clear that they do not adhere to the general phonological principles of the language; however, it does not provide information as to how and to what extent those items constitute exceptions.
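The point about assessing degrees of exceptionality can be made concrete with a toy lexical-entry format: the number of prespecified properties serves as a measure. The entry keys used here are hypothetical.

```python
# Toy measure of exceptionality: count the prespecified properties
# in a (hypothetical) lexical entry.

def degree_of_exceptionality(entry):
    score = 0
    if entry.get("truncated"):              # exception to Vowel Harmony
        score += 1
    if entry.get("grid_mark") is not None:  # exceptional stress
        score += 1
    return score

assert degree_of_exceptionality({}) == 0                   # fully regular item
assert degree_of_exceptionality({"truncated": True}) == 1  # disharmonic only
assert degree_of_exceptionality({"truncated": True, "grid_mark": 1}) == 2
```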

The issues raised here ultimately need to be examined on the basis of experimental research aimed at investigating the acquisition of the exceptional properties in question as well as their productivity and extension to novel items. The dichotomy between arbitrary exceptions and those exceptional patterns that display restricted productivity has been noted in other linguistic areas such as morphology and syntax (e.g., Jónsson and Eythórsson, this volume), including those that exhibit extreme instances of exceptionality (e.g., Corbett, this volume). It is generally noted, for example, that semi-productive morphological patterns can be extended to already existing or new words based on linguistic similarity. For example, Jónsson and Eythórsson (this volume) attribute the diachronic stability and partial productivity of verbs with accusative subjects in the history of Icelandic to the fact that such verbs form coherent and homogeneous subclasses due to the syntactic and semantic similarities that exist between them. It seems that such similarities have been transparent to the learner, leading to the maintenance of this subclass of verbs for generations. How exceptional stress and harmony patterns arise and why they are maintained by learners remains to be explored in Turkish. In the absence of positive psycholinguistic evidence for partitioning the phonology of Turkish into some indeterminate number of components, the use of lexical specification of exceptional features remains the simplest and most direct means of accounting for all types of exceptional phonological behavior. It should be noted that recent psycholinguistically oriented approaches towards patterned exceptions also assert that information about exceptionality must be listed in lexical entries. For example, Zuraw (2000) marks exceptions to nasal substitution in Tagalog in the lexicon, but allows the exceptional pattern in question to perpetuate into new words through stochastically (low-)ranked markedness constraints within a single grammar. This approach is in line with our proposal that lexical specification is at the foundation of the account of phonological exceptionality across languages.

Acknowledgements

The research was supported in part by SFB 471 “Variation and Evolution in the Lexicon” at the University of Konstanz, funded by the German Research Foundation (Deutsche Forschungsgemeinschaft).

Abbreviations

Acc   Accusative
Dat   Dative
Der   Derivational morpheme
Loc   Locative
Nom   Nominative
Pl    Plural

References

Anttila, Arto
2002 Morphologically conditioned phonological alternations. Natural Language and Linguistic Theory 20: 1–42.

Archangeli, Diana
1988 Aspects of underspecification theory. Phonology 5: 183–207.

Archangeli, Diana, and Douglas Pulleyblank
1994 Grounded Phonology. Cambridge, MA: MIT Press.

Barker, Christopher
1989 Extrametricality, the cycle, and Turkish word stress. In Phonology at Santa Cruz, J. Ito, and J. Runner (eds.), 1: 1–33. University of California, CA: Syntax Research Center.

Clements, George N.
1981 Akan vowel harmony: a nonlinear analysis. In Harvard Studies in Phonology. Vol. 2, George N. Clements (ed.), 108–177. Bloomington: Indiana University Linguistics Club.

Clements, George N., and Engin Sezer
1982 Vowel and consonant disharmony in Turkish. In The Structure of Phonological Representations, Part II, Harry van der Hulst, and Norval Smith (eds.), 213–255. Dordrecht: Foris.

Çakır, Cem
2000 On non-final stress in Turkish simplex words. In Studies on Turkish and Turkic Languages, Aslı Göksel, and Celia Kerslake (eds.), 3–10. Wiesbaden: Harrassowitz.

Demircan, Ömer
1976 Türkiye yer adlarında vurgu. Türk Dili 300: 402–411.

Demircan, Ömer
1996 Türkçenin Sesdizimi. Istanbul: Der Yayınevi.

Goldsmith, John A.
1990 Autosegmental and Metrical Phonology. Oxford: Blackwell.

Harrison, David, and Abigail Kaun
2000 Pattern-responsive lexicon optimization. Proceedings of NELS 30.

Harrison, David, and Abigail Kaun
2001 Patterns, pervasive patterns, and feature specification. In Distinctive Feature Theory, Tracy A. Hall (ed.), 211–236. Berlin: Mouton de Gruyter.

Hulst, Harry van der, and Jeroen van de Weijer
1991 Topics in Turkish phonology. In Turkish Linguistics Today, H. E. Boeschoten, and L. T. Verhoeven (eds.), 11–59. Leiden: Brill.

Inkelas, Sharon
1995 The consequences of optimization for underspecification. In Proceedings of the Northeastern Linguistics Society 25, Jill Beckman (ed.), 287–302. Amherst: GLSA.

Inkelas, Sharon
1996 The interaction of phrase and word rules in Turkish: An apparent paradox in the prosodic hierarchy. Linguistic Review 13: 193–217.

Inkelas, Sharon
1998 The theoretical status of morphologically conditioned phonology: A case study from dominance. Yearbook of Morphology 1997, 121–155.

Inkelas, Sharon
1999 Exceptional stress-attracting suffixes in Turkish: representations vs. the grammar. In The Prosody-Morphology Interface, René Kager, Harry van der Hulst, and Wim Zonneveld (eds.), 134–187. Cambridge: Cambridge University Press.

Inkelas, Sharon, and Cemil Orhan Orgun
1999 Level (non)ordering in recursive morphology: evidence from Turkish. In Morphology and its Relation to Phonology and Syntax, Steven Lapointe, Diane Brentari, and Patrick Farrell (eds.), 360–392. Stanford: CSLI.

Inkelas, Sharon, and Cemil Orhan Orgun
2003 Turkish stress: A review. Phonology 20 (1): 139–161.

Inkelas, Sharon, Cemil Orhan Orgun, and Cheryl Zoll
1996 Exceptions and static phonological patterns: cophonologies vs. prespecification. ROA-124-0496.

Inkelas, Sharon, Cemil Orhan Orgun, and Cheryl Zoll
1996 Implications of lexical exceptions for the nature of grammar. In Derivations and Constraints in Phonology, Iggy Roca (ed.), 393–418. Oxford: Clarendon Press.

Inkelas, Sharon, Cemil Orhan Orgun, and Cheryl Zoll
2004 Implications of lexical exceptions for the nature of grammar. In Optimality Theory in Phonology: A Reader, John J. McCarthy (ed.), 542–551. Malden: Blackwell.

Ito, Junko, and Armin Mester
1993 Japanese phonology: Constraint domains and structure preservation. University of California, Santa Cruz: Linguistics Research Center Publication (LRC-93-06).

Ito, Junko, and Armin Mester
1995 Japanese phonology. In The Handbook of Phonological Theory, John Goldsmith (ed.), 817–838. Cambridge: Blackwell.

Jakobson, Roman, Gunnar Fant, and Morris Halle
1952 Preliminaries to Speech Analysis: The Distinctive Features and their Correlates. Cambridge, MA: MIT Press.

Kabak, Barış
2007 Hiatus resolution in Turkish: an underspecification account. Lingua 117: 1378–1411.

Kabak, Barış, and Irene Vogel
2001 Phonological word and stress assignment in Turkish. Phonology 18: 315–360.

Kabak, Barış, and Irene Vogel
2005 Irregular stress in Turkish. Unpublished ms., University of Konstanz / University of Delaware.

Kaisse, Ellen
1985 Some theoretical consequences of stress rules in Turkish. In Papers from the General Session of the 21st Regional Meeting, W. Eilfort, P. Kroeber, and K. Peterson (eds.), 199–209. Chicago, IL: Chicago Linguistic Society.

Kardestuncer, Aino
1982 A Three-Boundary System for Turkish. Linguistic Analysis 10 (2): 95–117.

Kaye, Jonathan, Jean Lowenstamm, and Jean-Roger Vergnaud
1985 The internal structure of phonological elements: A theory of charm and government. Phonology Yearbook 2: 305–328.

Lahiri, Aditi
2000 Phonology: Structure, representation, and process. In Aspects of Language Production, Linda Wheeldon (ed.), 165–225. Hove/Philadelphia: Psychology Press.

Lahiri, Aditi, and Vincent Evers
1991 Palatalization and coronality. In The Special Status of Coronals, C. Paradis, and F. Prunet (eds.), 79–100. London: Academic Press.

Lees, Robert B.
1961 The Phonology of Modern Standard Turkish. (Uralic and Altaic Series 6) Bloomington: Indiana University Publications.

Lees, Robert B.
1966 On the interpretation of a Turkish vowel alternation. Anthropological Linguistics 8: 32–39.

Lewis, Geoffrey L.
1967 Turkish Grammar. Oxford: Oxford University Press.

Lightner, Theodor
1972 Problems in the Theory of Phonology. Vol. 1: Russian Phonology and Turkish Phonology. Edmonton, Champaign: Linguistic Research Inc.

McCarthy, John
1995 Extensions of Faithfulness: Rotuman Revisited. ROA-110.

McCarthy, John, and Alan S. Prince
1993 Prosodic Morphology I: Constraint Interaction and Satisfaction. Unpublished ms.

Mohanan, K. P.
1986 The Theory of Lexical Phonology. Dordrecht: Reidel.

Paradis, Carole, and Jean-François Prunet (eds.)
1991 The Special Status of Coronals. Vol. 2: Phonetics and Phonology. New York: Academic Press.

Polgárdi, Krisztina
1999 Vowel harmony and disharmony in Turkish. The Linguistic Review 16 (2): 187–204.

Prince, Alan S., and Paul Smolensky
1993 Optimality Theory: Constraint Interaction in Generative Grammar. Ms., Rutgers University and the University of Colorado, Boulder.

Sezer, Engin
1981 On non-final stress in Turkish. Journal of Turkish Studies 5: 61–69.

Sezer, Engin
1985 An autosegmental analysis of compensatory lengthening in Turkish. In Studies in Compensatory Lengthening, W. L. Wetzels, and E. Sezer (eds.), 227–250. Dordrecht: Foris.

Smolensky, Paul
1996 The initial state and “Richness of the Base” in Optimality Theory. Technical Report JHU-CogSci-96-4. Baltimore: Department of Cognitive Science, Johns Hopkins University.

Underhill, Robert
1976 Turkish Grammar. Cambridge, MA: The MIT Press.

Yavaş, Mehmet S.
1980 Borrowing and its implications for Turkish phonology. Ph.D. diss., University of Kansas.

Zuraw, Kie
2000 Patterned exceptions in phonology. Ph.D. diss., University of California, Los Angeles.


Lexical exceptions as prespecification: some critical remarks

T.A. Hall

1. Introduction: Lexical exceptions as prespecification

In their article on exceptions to Vowel Harmony and Stress Assignment in Turkish, Kabak and Vogel (henceforth K&V) discuss and reject two competing theories of lexical prespecification and then propose an alternative model which they ultimately adopt. Let us review briefly one of the models they reject and compare it with the one they endorse.

According to the former, the exceptionality of certain segments to Vowel Harmony (VH) in disharmonic roots is captured by lexically prespecifying these exceptional vowels with the features which propagate in VH. The structures in (1) are my interpretations of representations at the stage before VH; cf. (15) in K&V for the equivalent structures at the point when VH applies. On the approach in (1), VH spreads [+B(ack)] (via palatal harmony) and [−R(ound)] (via labial harmony) in (1a) for the word [pırlanta-lı] ‘with brilliants’. The spreading of [+Back] and [−Round] affects all of the vowels to the right of the first /I/ in this example because they are not specified for the features that spread. The disharmonic root [kortizon-lu] ‘with cortisone’ is presented in (1b) at the stage before VH.

(1) Lexical Prespecification with a Distinctive Feature (LPDF):

        –R
        |
  a.  p I r l E n t E - l I
        |
        +B

        +R    –R  +R
        |     |   |
  b.  k E r t I z E n - l I
        |     |   |
        +B    –B  +B

The exceptionality of the second vowel in (1b) – referred to henceforth as the opaque vowel – is captured by prespecifying it for the two features which spread in VH. In a rule-based approach the structure-building rule of VH cannot spread [+Round] and [+Back] to the opaque vowel because it is underlyingly


[−Round] and [−Back]. In an OT-style treatment there are high-ranking faithfulness constraints (e.g. Max-[−Round], Max-[−Back]) which protect the underlying features and therefore ensure that they surface as such. I refer to the approach in (1b) henceforth as Lexical Prespecification with a Distinctive Feature (LPDF).

As an alternative to the LPDF treatment, K&V endorse an analysis in which opaque vowels are represented with the minimal number of features necessary to capture their lexical status. For the examples in (1) I interpret the input to VH in K&V’s approach to be the representations in (2). On this approach VH spreads the privative feature [D(orsal)] to the right in (2a).1 The exceptional example is given in (2b). Here we see that a crucial difference between this structure and the one in (1b) is that the opaque vowel in (2b) is underspecified for all features that spread in VH.

(2) Maximum Underspecification (MU):

  a.  p I r l E n t E - l I
        |
        [D]

        [R]       [R]
        |         |
  b.  k E r t I z E n - l I
        |         |
        [D]       [D]

The approach to lexical exceptionality in (2b) can be referred to as the Maximum Underspecification (MU) model.

Note that the approach to VH and its exceptions in the LPDF model in (1) makes crucial use of binary features, whereas the MU treatment in (2) employs privative features. A consequence of the privative approach is that the exceptionality of opaque vowels cannot in principle be captured by specifying them for the opposite values of the features which spread in VH because these values simply do not exist. What this suggests is that the approach to prespecification in (1b) seems to crucially depend on binary features, whereas the treatment in (2b) cannot make use of this type of prespecification on a priori grounds. My conclusion is that a truly thorough comparison of LPDF and MU clearly exceeds the goals of the present commentary because it would necessarily involve a discussion of the merits and drawbacks of privative vs. binary features.
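The asymmetry can be made concrete in a toy simulation: with binary features, a structure-building rule automatically skips a vowel that already bears a value, whereas with privative features an underspecified opaque vowel offers nothing to block spreading, so a lexical diacritic must do the work. All names (`spread_binary`, `spread_privative`, `no_spread`) are illustrative assumptions, not notation from either paper.

```python
# Toy contrast between the two prespecification strategies for opacity.

def spread_binary(vowels, feature):
    """LPDF-style: structure-building spreading of a binary feature.
    A prespecified (opaque) vowel blocks spreading without a diacritic
    and itself becomes the new source of the spreading value."""
    out = [dict(v) for v in vowels]
    current = out[0].get(feature)
    for v in out[1:]:
        if feature in v:
            current = v[feature]   # already specified: blocks, then respreads
        elif current is not None:
            v[feature] = current
    return out

def spread_privative(vowels, feature):
    """MU-style: spreading of a privative feature. Blocking must be
    stipulated by a diacritic ('no_spread') on the sponsor vowel."""
    out = [dict(v) for v in vowels]
    spreading = feature in out[0] and not out[0].get("no_spread")
    for v in out[1:]:
        if feature in v:
            spreading = not v.get("no_spread")
        elif spreading:
            v[feature] = True
    return out

# binary: the prespecified [-Back] vowel blocks [+Back] spreading by itself
b = spread_binary([{"Back": True}, {"Back": False}, {}], "Back")
assert [v.get("Back") for v in b] == [True, False, False]
# privative: only the diacritic on the sponsor stops [Dorsal] spreading
p = spread_privative([{"Dorsal": True, "no_spread": True}, {}, {}], "Dorsal")
assert ["Dorsal" in v for v in p] == [True, False, False]
```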

In the remainder of this commentary I focus instead on two more specific questions: First, how does the MU model account for the blockage of VH in (2b) from spreading [Dorsal] and [Round] from /E/ onto the opaque /I/ (section 2)?

1. Since the model in (2a) has privative features, labial harmony cannot involve the spreading of [−Round], as in (1a). This point is not important in the following discussion.


And second, how do K&V’s arguments against the LPDF model fare when confronted with examples of lexical exceptions from other languages (section 3)? I conclude with some brief comments about a possible rule type which might require some of the mechanics necessary in the MU treatment for exceptionality (section 4).

2. Truncation

An obvious question with respect to the MU treatment in (2b) is how it can block VH from spreading [Dorsal] and [Round] from the first /E/ onto the opaque /I/. In the examples provided by K&V (see (17) and (18) in their article) the authors write that the VH features do not spread because spreading is ‘truncated’, but as I show below this point needs explication within the context of (2b).

According to Kabak (2007: note 28) ‘truncation’ is supposed to indicate that “the feature in question is only linked to the segment that it is associated with in the underlying structure but is banned from being further realized on other segments.” But a moment’s reflection reveals that this proposal can only work if the word-initial vowel in (2b) is equipped with a diacritic feature which says that [Dorsal] and [Round] cannot spread. Although K&V do not use the term ‘diacritic feature’, they seem to be aware of its necessity when they write (section 4.1) that “[t]he only lexical marking required for a disharmonic root is whether or not it obeys the general principles of VH.”2

Truncation as described above is assumed to have precedents in the literature on intonation (Kabak 2007: note 28). However, once we consider how this term is employed in this area of phonology we will see that those authors use the word ‘truncation’ in a very different sense than K&V.

According to Ladd (1996: 132–136) ‘truncation’ is one of the strategies employed in the literature on intonational phonology (in addition to ‘compression’) to adapt intonational contours to short utterances, such as monosyllabic words. Thus, if an intonational contour consists of two or more pitch accents and if the utterance consists of only one syllable, languages can either associate all of the pitch accents with this syllable (compression) or delete one of them (truncation). Ladd mentions English as an example of a compression language and Hungarian as a truncation language. In the latter language question intonation involves the tonal sequence L*..H..L%, where the first (H) edge tone is preferentially

2. The diacritic feature necessary in (2b) is very different from an SPE-style diacritic which would be associated with the entire morpheme. The reason is that the diacritic feature for (2b) must be attached only to the first root vowel because the features [Dorsal] and [Round] on the final root vowel must be allowed to spread to the suffix.


associated with the penultimate syllable (Ladd 1996: 132). In monosyllables this tonal sequence is reduced to a simple rise. Hence, in the underlying tonal sequence for the monosyllable sör ‘beer’ in (3) only the first two tones of the three-tone question tune are realized.

(3)  L* H L%

     sör

In the phonetic representation the final tone (L%) in (3) is unrealized, i.e. it is ‘truncated’.

Ladd (1996: 135) suggests that truncation could be formalized as a phonological rule which deletes (i.e. truncates) an unassociated tone. See, for example, Grice (1995: 171ff.), who does something similar in her analysis of Palermo Italian intonation contours. This being said, Ladd concedes that in its original usage (Grønnum 1991, which I have not seen) compression and truncation are intended as phonetic and not phonological descriptions. Since K&V are assuming that truncation is phonological and not phonetic, let us consider in greater detail what the phonological analysis of a language like Hungarian would entail.

With respect to (3) truncation in intonational phonology refers to the deletion of an unassociated autosegment, but in (2b) truncation refers to the failure of an autosegment to spread. A proponent of the MU approach might argue that the two operations are related because examples like the one in (3) also need to capture the fact that the final tone fails to associate with the syllable. Pursuing this line of thought reveals that truncation in (3) really involves two steps: (a) the failure of an underlying autosegment to associate to a syllable, and (b) the subsequent deletion of the same autosegment.

In any phonological analysis one needs to say what the motivation is for (a) and (b). In the Hungarian example in (3) the lack of spreading, i.e. the (a) clause, makes sense because the association of all three of these autosegments to the same syllable would violate a constraint motivated in many other languages which bans three tones on one syllable. The actual deletion of the unassociated autosegment, i.e. the (b) clause, occurs by rule, which is similar to the kinds of rules one encounters in African tone languages.
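The two-step analysis of Hungarian truncation (failure to associate, then deletion) can be sketched as follows, with the ban on three tones per syllable modelled as a cap of two; the names are hypothetical and the association procedure is deliberately crude.

```python
# Toy two-step truncation for intonational tunes.

MAX_TONES_PER_SYLLABLE = 2  # a third tone on one syllable is banned

def realize_tune(tones, n_syllables):
    """Step (a): tones associate left to right until the syllables' tone
    capacity is exhausted; the remainder stay unassociated.
    Step (b): unassociated tones are deleted (truncated) by rule."""
    capacity = n_syllables * MAX_TONES_PER_SYLLABLE
    associated = tones[:capacity]   # (a) association
    return associated               # (b) the rest never surface

# three-tone question tune on monosyllabic sör 'beer', cf. (3):
# L% cannot associate and is deleted, leaving a simple rise.
assert realize_tune(["L*", "H", "L%"], 1) == ["L*", "H"]
```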

The comparison between the Turkish example in (2b) and the Hungarian one in (3) is important because it reveals a crucial difference between the two: In Hungarian there is a reason why clause (a) does not apply, but in Turkish there is no reason why [Dorsal] and [Round] on the first /E/ in (2b) cannot spread by VH. Put differently, for Turkish truncation must be captured with a diacritic feature, but for Hungarian it need not (and should not) be. My conclusion is that the blockage of VH from applying to the opaque vowel in (2b)

Page 110: 89378429 Exceptions in Grammar

Lexical exceptions as prespecification: some critical remarks 99

is due to the presense of the diacritic feature and not to some general princi-ple/convention of truncation.

One might point out that spreading and deleting tones in intonational phonology is not always as straightforward as in Hungarian; hence, a more apt comparison would be between the Turkish example in (2b) and some language in which the spreading of tones is unpredictable. An apparent example of such a language is provided by the dialects of Catalan described by Prieto (2002). In these dialects there are two deletion strategies when two pitch accents and an edge tone sequence adapt to short utterances: either the first or the second pitch accent deletes. However, Prieto (2002) argues that the choice of which pitch accent to delete is a consequence of a more enriched phonological representation of tone, suggesting that Catalan does not require a diacritic feature saying which tone needs to be deleted.

3. The LPDF model reconsidered

K&V present three arguments against the LPDF model in (1b). Let us consider the third one, which I take to be the most convincing.

According to this argument the LPDF approach is not a desirable theory because it cannot account for the phonological process of Vowel Assimilation in Turkish. The reason is that the rule requires that front vowels in harmonic roots be underspecified for [Coronal], but the representation in (1b) has this feature present in disharmonic roots. The prediction the LPDF model makes is that front vowels in harmonic and disharmonic roots behave differently with respect to Vowel Assimilation, but K&V show that this is not the case.

To a non-specialist in Turkish this language-specific argument against LPDF seems sound, and I see it as being the kind of argument necessary to refute the LPDF approach. However, I see both LPDF and MU as being very general theories for capturing lexical exceptions. This means that both models need to be tested with rules other than the language-specific processes of Vowel Harmony, Stress Assignment and Vowel Assimilation. Seen in this broader context, one needs to take care not to reject a particular model on the basis of a single example from one language.

Imagine some language with a structure-building rule like VH which happens to have a handful of idiosyncratic exceptions. The difference between this hypothetical language and Turkish is that there is no equivalent rule of Vowel Assimilation which can be used to argue against the LPDF representation in (1b). Assuming there were such a language, the argument against representing the lexical exceptions as in (1b) would vanish.


Consider the case of German. In that language there is a very regular process referred to in the literature as s-Dissimilation, which converts a word-initial /s/ to [ʃ] before [−High] consonants like /p t m n l ʁ/, i.e. all consonants except for ([+High]) velars. This is a neutralization rule which suspends the lexical contrast between /s/ and /ʃ/ to [ʃ], e.g. Specht [ʃpeçt] ‘woodpecker’ and schmal [ʃma:l] ‘narrow’ with [ʃ] (from /s/) represent the normal case. Significantly, the rule has a small number of lexical exceptions, e.g. the initial sibilant in Smaragd [smaʁakt] ‘emerald’ surfaces as [s] and not as [ʃ].

In the LPDF approach one might analyze s-Dissimilation as a rule adding the feature [+High] to a voiceless sibilant unspecified for that feature (see Wiese 1991, Hall 1992, Alber 2001 for various treatments along these lines), while deviant words like Smaragd would require the relevant segment to exceptionally be prespecified for [−High]. On this approach s-Dissimilation applies to the /s/ in (4a) (for schmal) and is blocked in (4b) (for Smaragd):

(4) a.  / s m a: l /            b.  / s m a ʁ a k d /
          :                            :
       [+High]                      [−High]

Thus, s-Dissimilation fails to apply to the /s/ in Smaragd because the rule is structure-building and is therefore blocked from applying to a segment that is already marked for the feature that is added.
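The blocking logic can be made concrete with a small sketch (my illustration, not Hall’s formalism): segments are modelled as feature bundles, and the structure-building rule fills in [High] only where it is absent, so a lexically prespecified [−High] blocks it.

```python
# Toy model of a structure-building rule blocked by prespecification.
# Feature names follow the German s-Dissimilation example in the text.

def s_dissimilation(segments):
    """Add [+High] to a word-initial /s/ unspecified for [High]."""
    first = segments[0]
    if first["seg"] == "s" and "High" not in first:
        first = {**first, "High": "+"}  # structure-building: fills in a blank
    return [first] + segments[1:]

# schmal: initial /s/ underspecified for [High] -> rule applies (surface [ʃ])
schmal = [{"seg": "s"}, {"seg": "m"}, {"seg": "a:"}, {"seg": "l"}]
# Smaragd: initial /s/ prespecified [-High] -> rule blocked (surface [s])
smaragd = [{"seg": "s", "High": "-"}] + [{"seg": c} for c in "maʁakt"]

assert s_dissimilation(schmal)[0]["High"] == "+"
assert s_dissimilation(smaragd)[0]["High"] == "-"
```

The crucial point is that the rule never overwrites a value: exceptionality resides entirely in the lexical entry, not in a diacritic.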

The s-Dissimilation example is important because in German there is no equivalent rule like Vowel Assimilation in Turkish which would cause one to reject the LPDF representation in (4b) for an exceptional item.3

3. It was noted at the beginning of this section that K&V present three arguments against (1b), but I have only shown that the third one does not hold for (4b). An inspection of their article reveals that K&V’s first two arguments against (1b) do not apply to (4b) either: The first argument is that structures like the one in (4b) (or (1b)) are inconsistent with a redundancy-free lexicon, but the representation in (4b) is not more redundant than the MU alternative, which would require instead of [−High] an /s/ underspecified for that feature and a diacritic feature (presumably attached to the /s/) saying that s-Dissimilation does not apply to that segment. K&V’s second argument against (1b) is that it requires a segment to be underlyingly specified for [Coronal], which they see as being a feature which should be underspecified (if possible). However, this is a feature-specific argument, which clearly does not hold against (4b) because [−High] is not [Coronal].


Clearly the analysis of non-Turkish examples goes beyond the goals of K&V’s analysis, but in future work one might want to investigate a wider spectrum of rules with exceptions in order to evaluate LPDF and MU.

4. A problem for LPDF?

The LPDF approach to exceptionality works well in the case of structure-building operations like Turkish VH or German s-Dissimilation; however, a potential problem for that model involves lexical exceptions to structure-changing rules. For example, if a rule deletes a segment [A] and if there are some words in which [A] is exceptionally not deleted, one cannot capture the deviant forms in the LPDF model by prespecifying these words because [A] has no opposite.

An example of a structure-changing rule with exceptions is (intervocalic) Velar Deletion in Turkish. Inkelas and Orgun (1995: 767–768) and Inkelas, Orgun and Zoll (1997: 405) cite examples like [bebek] ‘baby’ vs. [bebe-i] ‘baby-accusative’, where the latter word illustrates the rule, but there are lexical exceptions, e.g. [tahakkuk-u] ‘verification-accusative’ (Kabak 2007: note 4). Velar Deletion poses an apparent problem for the LPDF approach because there does not appear to be anything with which one could prespecify the exceptional sounds. Inkelas, Orgun and Zoll (1997: 409, note 15) recognize the problem and suggest that the exceptional velars are prespecified for syllable structure and that Velar Deletion only affects velars which are not syllabified in the input. What this implies is that a true problem for LPDF would involve a deletion rule which is not sensitive to syllable structure, e.g. a rule which deletes a segment [A] only in word-initial or word-final position. If there were such a rule and if it had exceptions, then it is not clear how the LPDF model could prespecify the exceptional forms, but in the MU treatment one would simply posit a diacritic feature attached to the exceptional [A]’s, ensuring that they are not deleted.

Whether or not examples like the one just described exist is a question I leave open, but they would provide crucial evidence for capturing exceptions with a diacritic feature.
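The diacritic treatment of a structure-changing rule can be sketched as follows (my illustration; the segment inventory and the diacritic name `no_VD` are invented for the example): the deletion rule consults the diacritic before applying.

```python
# Toy MU-style treatment of Velar Deletion: exceptional segments carry a
# diacritic that exempts them from an otherwise regular deletion rule.

def velar_deletion(segments):
    """Delete intervocalic velars unless they carry the exception diacritic."""
    vowels = set("aeiouüöı")
    out = []
    for i, seg in enumerate(segments):
        intervocalic = (0 < i < len(segments) - 1
                        and segments[i - 1]["seg"] in vowels
                        and segments[i + 1]["seg"] in vowels)
        if seg["seg"] == "k" and intervocalic and not seg.get("no_VD"):
            continue  # rule applies: the velar is deleted
        out.append(seg)
    return "".join(s["seg"] for s in out)

bebek_acc = [{"seg": c} for c in "bebeki"]          # regular: /bebek-i/
kok_acc = ([{"seg": c} for c in "kö"]
           + [{"seg": "k", "no_VD": True}]          # diacritic marks the exception
           + [{"seg": "ü"}])                        # exceptional: /kök-ü/

assert velar_deletion(bebek_acc) == "bebei"   # velar deletes: [bebe-i]
assert velar_deletion(kok_acc) == "kökü"      # deletion blocked: [kök-ü]
```

Unlike prespecification, the diacritic carries no phonological content of its own, which is exactly the property Hall’s discussion turns on.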

References

Alber, Birgit
2001 Regional variation and edges: glottal stop epenthesis and dissimilation in standard and Southern varieties of German. Zeitschrift für Sprachwissenschaft 20: 3–41.

Grice, Martine
1995 The Intonation of Interrogation in Palermo Italian: Implications for Intonation Theory. Tübingen: Niemeyer.

Grønnum, Nina
1991 Prosodic parameters in a variety of regional Danish standard languages with a view towards Swedish and German. Phonetica 47: 188–214.

Hall, T.A.
1992 Syllable Structure and Syllable Related Processes in German. Tübingen: Niemeyer.

Inkelas, Sharon, and Orhan Orgun
1995 Level ordering and economy in the lexical phonology of Turkish. Language 71: 763–793.

Inkelas, Sharon, Orhan Orgun, and Cheryl Zoll
1997 The implications of lexical exceptions for the nature of grammar. In Derivations and Constraints in Phonology, Iggy Roca (ed.), 393–418. Oxford: Oxford University Press.

Kabak, Barış
2007 Hiatus resolution in Turkish: an underspecification account. Lingua 117: 1378–1411.

Ladd, D. Robert
1996 Intonational Phonology. Cambridge: Cambridge University Press.

Prieto i Vives, Pilar
2002 Tune-text association patterns in Catalan: An argument for a hierarchical structure of tunes. Probus 14: 173–204.

Wiese, Richard
1991 Was ist extrasilbisch im Deutschen und warum? Zeitschrift für Sprachwissenschaft 10: 112–133.


Feature spreading, lexical specification and truncation

Barış Kabak and Irene Vogel

1. Introduction

Tracy Hall in his commentary presents a clear and succinct summary of two of the ways of handling Vowel Harmony exceptions discussed in Kabak & Vogel (this volume). We will use Hall’s terms to refer to these two approaches, namely (i) Lexical Prespecification with a Distinctive Feature (LPDF) and (ii) Lexical Specification with Maximum Underspecification (MU).

Hall also introduces independent considerations from German, in particular the rule of s-Dissimilation, in order to examine the broader applicability of these approaches. He also briefly mentions another phonological phenomenon of Turkish in this regard: Velar Deletion. In addition, Hall addresses the truncation mechanism introduced in Kabak & Vogel as a means for blocking Vowel Harmony. While not all of these considerations lead to conclusive findings, they do provide the grounds for both broader and deeper consideration of the issues at hand.

2. Two approaches to exceptional features

LPDF and MU primarily differ with respect to the conditions under which they lexically mark a given distinctive feature. The LPDF model ensures the surface realization of an exceptional feature by lexically marking the feature in question regardless of its status. The MU model, by contrast, cannot resort to lexical marking in similar situations if the feature in question is inactive in the phonological system. In this regard, MU respects the phonological status of a given feature throughout the entire lexicon.

To enrich the discussion of these two models, Hall introduces some observations regarding German s-Dissimilation, which neutralizes the distinction between initial /s/ and /ʃ/ to [ʃ], a [+High] segment, before [−High] consonants (e.g., [ʃ]muck ‘jewelry’; [ʃ]lange ‘snake’). This is considered a structure-building rule, and Hall shows that the LPDF model provides sufficient theoretical coverage to account for noted exceptions (e.g., [s]mart ‘a car brand’; [s]lip ‘briefs/panties’). In particular, pre-specifying /s/ as [−High] blocks the application of the rule that adds [+High] to an initial /s/. Thus, Hall suggests we should not exclude the LPDF model – and retain the MU model – since there is no empirical evidence against the LPDF in the German data. We agree with Hall that s-Dissimilation does not demonstrate a need for the MU model; however, we would not like to suggest that it favors the LPDF model. Instead, it seems merely to be an instance of a phenomenon that can be handled by either model, and thus does not provide a test ground to choose between the two.

Returning to Turkish, Hall considers an additional phenomenon involving a structure-changing rule, Velar Deletion (VD), whereby intervocalic velars are productively deleted (e.g., /inek-I/ → [ine-i] ‘cow-Acc’; /gök-I/ → [gö-ü] ‘sky-Acc’), although there are several exceptions (e.g., /kök-I/ → [kök-ü] (*[kö-ü]) ‘root-Acc’). Hall indicates that this case poses a serious challenge for the LPDF model. The problem arises because this model involves lexical marking of features, but there is no way to pre-specify the exceptional velars by using distinctive features. The MU approach, however, does not encounter this problem since it crucially adopts mechanisms other than feature marking in order to account for analogous types of exceptions in other instances. Hall claims that the MU model would need to use a diacritic feature to ensure that the velar is not deleted in such cases. While it is beyond the scope of this response to discuss the mechanisms needed to handle the additional phenomenon of VD, we would like to suggest that use of diacritic features is not in itself unacceptable, as long as it can be reasonably restricted cross-linguistically, and validated on independent grounds.1

1. One such mechanism is proposed in models where lexical representations comprise two dimensions: one to determine the anchoring of phonological elements (tones, accents, features) in underlying structure; another to dictate how and where such elements are pronounced (e.g., Revithiadou 2007). Thus, VD could be blocked if the two dimensions are pre-specified to match.

3. Truncation in Turkish Vowel Harmony

An additional point raised by Hall involves Kabak & Vogel’s use of a truncation rule to block VH in the exceptional cases in which its spread is restricted. According to Hall, this truncation convention is a form of diacritic specification, which merely reflects the necessity to identify the exceptional behavior at hand. We agree that the sole purpose of marking exceptions is to show exceptional behavior; however, we would like to emphasize again that this is not necessarily problematic, as long as the means for determining the diacritic features rests on general principles.

In fact, the truncation analysis is consistent with our treatment of exceptions to regular stress assignment, involving lexical marking of accent, a mechanism of metrical phonology incorporated in most of the approaches we examined. With regard to VH, we simply extended the fundamental concept, expressing exceptionality by means of a truncation mechanism, as opposed to the specification of accent position. We find this an advantageous means for capturing VH exceptionality since truncation has been introduced in other autosegmental analyses involving spreading and its interruption, in particular in intonation phenomena. As Hall points out, the application of truncation in intonational phonology may be different from our use in relation to VH. In the case of intonation, truncation often corresponds to deletion of an unassociated autosegment (tone), while in the case of VH, it corresponds to the failure of an autosegmental feature (e.g., a distinctive feature) to spread. Rather than being a problem, we believe this difference is, in fact, quite interesting.

Specifically, we find a crucial difference in the fact that the features of VH, as opposed to intonation, involve a dual function. First, they characterize the way vowels are realized on the surface in terms of their specific articulatory properties. Second, they determine which features can spread to other vowels within the same domain. The two functions, however, need not co-occur. For example, in the case of disharmony, pre-specified features are only realized on the segments with which they are associated; they do not exhibit spreading behavior. The tones in intonation patterns, however, may at times not be realized at all; a feature is thus lost, not just prevented from spreading.

Furthermore, there is an interesting difference regarding the representation of truncation in VH and in intonation phenomena. We assume for VH that truncation is manifested in the underlying representation of segments that do not participate in spreading. By contrast, in intonation phenomena, there is nothing in the representation of tones that causes their deletion. Instead, truncation simply operates to eliminate a tone that fails to associate within a particular domain.

Beyond the differences, however, we observe that in both cases truncation essentially results in the same generalization: an underlying autosegmental feature fails to spread. Since this convention has been independently motivated as a component of autosegmental phonology, we find it interesting that it lends itself easily to resolving challenges in different realms of autosegmental spreading.


4. Conclusions

In sum, we find that Tracy Hall has clearly summarized two possible models for handling Vowel Harmony exceptions discussed in Kabak & Vogel (this volume): (i) Lexical Prespecification with a Distinctive Feature and (ii) Lexical Specification with Maximum Underspecification. Two additional phenomena were also considered in this regard: German s-Dissimilation and Turkish Velar Deletion. While the former appears to be inconclusive with regard to the two models, the latter seems to lend support to the second one. Hall also addresses the truncation mechanism proposed for treating exceptions to Vowel Harmony, and its somewhat different use in intonation phenomena. We suggest that this difference, rather than being problematic, reveals interesting differences between spreading in the two domains – harmony and intonation.

Reference

Revithiadou, Anthi
2007 Colored turbid accents and containment: a case study from lexical stress. In Freedom of Analysis?, Sylvia Blaho, Patrik Bye and Martin Krämer (eds.), 149–174. Berlin/New York: Mouton de Gruyter.


Higher order exceptionality in inflectional morphology

Greville G. Corbett

Abstract. We start from the notion of ‘canonical’ inflection, and we adopt an inferential-realizational approach. We assume that we have already established the features and their values for a given system (while acknowledging that this may be a substantial analytic task). In a canonical system, feature values “should” multiply out so that all possible cells exist. Paradigms “should” be consistent, both internally (within the lexeme) and externally (across lexemes). Such a scheme would make perfect sense in functional terms: it provides maximal differentiation for minimal phonological material. However, real systems show great divergences from this idealization. A typology of divergences from the canonical scheme situates the types of morphological exceptionality, including: periphrasis, anti-periphrasis, defectiveness, overdifferentiation, suppletion, syncretism, heteroclisis and deponency.

These types of exceptionality provide the basis for an investigation of higher order exceptionality, which results from interactions of these phenomena, where the exceptional phenomena target the same cells of the paradigm. While some examples are vanishingly rare, they are of great importance for establishing what is a possible word in human language, since they push the limits considerably beyond normal exceptionality.*

1. Introduction

We propose a part of a typology of inflectional morphology, and within it we concentrate on extreme instances of exceptionality.

* A version of this paper was presented at the Arbeitsgruppe “Auf alles gefasst sein: Ausnahmen in der Grammatik” at the 27th Annual Meeting of the Deutsche Gesellschaft für Sprachwissenschaft, Cologne, 23–25 February 2005. I wish to thank those present and the two anonymous referees for their suggestions. The support of the ESRC under grants RES-000-23-0375 and RES-051-27-0122 and of the ERC (grant ERC-2008-AdG-230268 MORPHOLOGY) is gratefully acknowledged.


1.1. Canonicity in typology

If we are to tackle some of the most difficult areas of language from a typological perspective, we shall need new methods. The one suggested here is the ‘canonical’ approach (Corbett 2005). The basic idea is that we define carefully a theoretical space, and only then situate the real language phenomena within it. The canonical point, specified by converging definitions, is where we find the best, clearest, most indisputable examples (for applications of the approach see Seifart 2005: 156–74; Suthar 2006: 178–98; Corbett 2006, 2007a). However, canonical examples may be rare or even non-existent, hence it is vital to maintain a distinction between what is canonical, and what is usual or frequent. What is canonical gives us the measure against which real examples can be situated, and from which different degrees of irregularity can be calibrated. It also gives us a way of analyzing and celebrating the diversity of inflectional morphology by confronting it with an elegant order.

1.2. Canonical inflection

Linguists are interested in what is a possible human language. A part of that account is coming to understand what is a possible word. In this paper we narrow that question down to looking at possible words from the point of view of inflection. We set up a framework of canonical inflection, within which we can situate different morphological phenomena. The system of terms for inflectional morphology is still inconsistent in places, despite interesting work by Mel’čuk (1993) and others. Greater consistency in terminology gives us a surer way to identify exceptions. All the predicted individual deviations from canonicity are found, and we shall illustrate only some of these types of possible word (for illustration of some other types see Corbett 2007b). This is because we are concerned in this paper with even less canonical items.

1.3. Higher order exceptionality

Our specific focus is on ‘higher order’ exceptionality. By this we mean the interaction of exceptional phenomena. These examples are of interest because they show us extreme cases of possible word. Here too we must look at a subset of the possible interactions. Examples are very scarce, partly because they are genuinely rare, but also because they have been little discussed, and so linguists have not been on the lookout for them. It is hoped that this discussion will lead specialists working on various languages to be aware of them, so that the general inventory of these examples is increased.


2. Assumptions

We start from the point where the features and their values are established for the language in question; in other words, analysis of the ‘syntactic’ part of morphosyntax is well advanced. This is not to minimize the problems; this task can involve complex analytical decisions (see Zaliznjak 1973 [2002]; Comrie 1986; Corbett 1991: 145–188 for examples). Our general stance will be that of inferential-realizational morphology, as defined and discussed in Stump (2001: 1–30). The specific variant in mind is Network Morphology (for which see Corbett and Fraser 1993; Evans, Brown and Corbett 2002, and references there). It is important for the reader to be aware of this orientation, but the main points of this general typology could be restated in other frameworks. We assume further that geometry is not relevant to inflectional morphology, but that nevertheless presenting paradigms in tabular form is a helpful method of representation. The final assumption is that when discussing particular phenomena we always imply “all other things being equal”. For instance, when discussing whether inflections are the same or different in particular cells of the paradigm we assume, unless specifically mentioned, that the stem remains the same.

3. Canonical inflection

We will now outline the notion of canonical inflection, which will serve as the basis for approaching various interesting deviations from canonicity in §5. As noted earlier, we assume that we have the features and their values established. Given that, in a canonical system these should ‘multiply out’, so that all possible cells in a paradigm exist. For example, if a given language has four cases and three numbers in its nominal system, the paradigm of a noun should have twelve cells. (This is equivalent to Spencer’s notion of ‘exhaustivity’, 2003: 252.)

Furthermore, to be fully canonical, a paradigm should be ‘consistent’, according to the following criteria:


(1) Canonical inflection

                                      comparison across cells    comparison across
                                      of a lexeme                lexemes
                                      (level one comparison)     (level two comparison)
  1. composition/structure            same                       same (cf. §4.2.1)
  2. lexical material                 same (cf. §4.1.1)          different
     (≈ shape of stem)
  3. inflectional material            different (cf. §4.1.2)     same (cf. §4.2.2)
     (≈ shape of inflection)
  outcome                             different                  different
     (≈ shape of inflected word)

This schema implies two levels of comparison:

level one: we start from the abstract paradigm gained by multiplying out the features and their values. We then examine any one lexeme fitted within this paradigm. The centre column of (1) compares cell with cell, within a single paradigm. We take in turn the criteria in the left column:

1. we look at the composition and structure of the cells; suppose the first consists of a stem and a prefix: for this lexeme to have a canonical paradigm, every other cell must be the ‘same’ in this regard. Finding a suffix, or a clitic, or any different means of exponence would reveal non-canonicity.

2. in terms of the lexical material in the cell, we require absolute identity (the stem should remain the same).

3. on the other hand, the inflectional material ‘should’ be different in every cell.

The outcome for such a lexeme (last row) is that every cell in its paradigm will realize the morphosyntactic specification in a way distinct from that of every other cell.

level two: this involves comparing lexemes with lexemes within the given language (right column). We use the same criteria as before:

1. a canonical system requires that the composition and structure of each cell remains the same, comparing across lexemes.

2. we require that the lexical information be different (we are, after all, comparing different lexemes).


3. in the canonical situation, the inflectional material is identical. That is, if our first lexeme marks dative plural in -du, so does every other.

The outcome is that every cell of every lexeme is distinct. We illustrate this with a hypothetical example:

(2) Illustration (hypothetical)

DOG-a DOG-i CAT-a CAT-i

DOG-e DOG-o CAT-e CAT-o

This system of canonical inflection would make perfect sense in functional terms. There is perfect differentiation within the morphology, while using the minimal material.
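The ‘multiplying out’ of feature values and the two distinctness requirements can be rendered as a toy computation (my illustration, using the hypothetical stems and endings of example (2); nothing here is part of Corbett’s formal proposal):

```python
# Toy model of canonical inflection: features multiply out into cells,
# and every cell of every lexeme is realized by a distinct form.
from itertools import product

features = {"case": ["nom", "acc"], "number": ["sg", "pl"]}
endings = {("nom", "sg"): "-a", ("acc", "sg"): "-e",
           ("nom", "pl"): "-i", ("acc", "pl"): "-o"}  # shared by all lexemes

cells = list(product(*features.values()))  # all possible cells exist
assert len(cells) == 4                     # 2 cases x 2 numbers

def paradigm(stem):
    """Canonical: invariant stem, one shared set of inflections."""
    return {cell: stem + endings[cell] for cell in cells}

forms = [f for stem in ("DOG", "CAT") for f in paradigm(stem).values()]
assert len(forms) == len(set(forms))  # every cell of every lexeme is distinct
```

The final assertion is exactly the functional point: maximal differentiation is achieved although the stems and endings are each reused systematically.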

4. Deviations from canonical inflection

Real systems, however, show great divergences from this idealization. Its value is that we can use the notion of canonicity as a way of calibrating the phenomena we find. We look at the deviations from canonicity first internally, comparing the cells of a single lexeme, then externally, comparing across lexemes. It is the typology of these divergences which allows us to move towards a consistent set of terms. A general pattern is that where we actually find ‘same’ in place of canonical ‘different’ this will give a non-functional outcome. If we find ‘different’ in place of canonical ‘same’ this will lead to increased complexity and/or redundancy.

Working through the different deviations gives us an overall classification of the phenomena of inflectional morphology. That is a long undertaking, and space does not allow us to complete it here. Instead we will take some illustrative instances, selecting as examples those that we shall need for the discussion of higher order exceptionality.

4.1. Internal non-canonicity

We start with phenomena that can be defined within the lexeme, and we take two key types.

4.1.1. Lexical material

In the canonical situation, lexical meaning (and only that) is conveyed by lexical material, the stem; grammatical meaning, and only that, is conveyed by the inflection. Thus the stem is inert, and all the differentiation in the paradigm is due to the inflectional material. Contrary to this canonical situation, we find all sorts of alternations of stem, from the predictable, through the less regular, right up to full suppletion as, for example, in Russian rebenok ‘child’, deti ‘children’. Suppletion has rightly attracted a good deal of interest, as in Carstairs-McCarthy (1994), Mel’čuk (1994), Corbett (2007a); see Chumakina (2004) for an annotated bibliography, and Brown, Chumakina, Corbett and Hippisley (2004) for an on-line typological database. In terms of possible words, suppletion is of particular interest because it means that there are lexemes which have forms with no phonological shape in common.

4.1.2. Inflectional material

Since inflectional material conveys grammatical meaning, in the canonical situation we find a different inflection in each cell. Contrast this with the following paradigm from Slovene:

(3) Paradigm of Slovene kot ‘corner’ (Priestly 1993: 400–402)

               singular   dual      plural
nominative     kot        kota      koti
accusative     kot        kota      kote
genitive       kota       kotov     kotov
dative         kotu       kotoma    kotom
instrumental   kotom      kotoma    koti
locative       kotu       kotih     kotih

A morphosyntactic analysis of Slovene produces good evidence for six cases and three numbers. We therefore expect a paradigm with eighteen cells. This particular lexeme has only nine phonologically distinct forms filling these cells. It shows numerous examples of syncretism, that is, instances where we have a single form which realizes more than one morphosyntactic specification. We use syncretism as a cover term; different examples may be analysed in different ways (see Baerman, Brown and Corbett 2005 for extensive discussion).
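The cell count and the degree of syncretism in (3) can be checked mechanically (a toy illustration of the definition; the data are Priestly’s as cited above):

```python
# The Slovene paradigm of kot 'corner' as a mapping from cells to forms.
kot = {
    ("nom", "sg"): "kot",   ("nom", "du"): "kota",   ("nom", "pl"): "koti",
    ("acc", "sg"): "kot",   ("acc", "du"): "kota",   ("acc", "pl"): "kote",
    ("gen", "sg"): "kota",  ("gen", "du"): "kotov",  ("gen", "pl"): "kotov",
    ("dat", "sg"): "kotu",  ("dat", "du"): "kotoma", ("dat", "pl"): "kotom",
    ("ins", "sg"): "kotom", ("ins", "du"): "kotoma", ("ins", "pl"): "koti",
    ("loc", "sg"): "kotu",  ("loc", "du"): "kotih",  ("loc", "pl"): "kotih",
}

assert len(kot) == 18               # 6 cases x 3 numbers: eighteen cells
assert len(set(kot.values())) == 9  # only nine distinct forms fill them

# A form is syncretic if it realizes more than one morphosyntactic specification.
syncretic = {f for f in kot.values() if list(kot.values()).count(f) > 1}
```

Of the nine forms, only kote realizes a single cell; the other eight are each syncretic in this sense.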

4.2. External non-canonicity

We now move on to deviations which are to be defined in terms of comparisons across lexemes.


4.2.1. Composition/structure

In the canonical situation, the composition and structure of a lexeme’s paradigm will be constant when we compare across the class. For instance, if we find that nouns in a given language distinguish singular and plural, in the canonical situation this will hold generally true. One of the deviations from this canonical situation is overdifferentiation (Bloomfield 1933: 223–224; Nübling, this volume). Lexemes which are overdifferentiated stand out from the rest of the group in that they have an additional cell in their paradigm. For example, in Maltese most nouns distinguish singular from plural. Now consider uqija ‘ounce’:

(4) Example of the Maltese dual

singular   dual      plural
uqija      uqitejn   uqijiet

Around 30 nouns distinguish singular from dual from plural; this is a ‘minor number’ (Corbett 2000: 96). With only eight of them, according to Fenech (1996), is the use of the dual obligatory. Uqija ‘ounce’ is overdifferentiated in having a dual, but its use is not obligatory; for ‘two ounces’ one can use either the dual uqitejn or the form with the numeral: żewġ uqijiet.

4.2.2. Inflectional material

In the canonical situation, inflectional material is the same across lexemes. We can specify that the first singular present tense active has a particular form just once in the grammar. Of course there are many deviations from this. One of the most interesting, and least studied, is deponency, for which see Embick (1998, 2000), Corbett (1999), Sadler and Spencer (2001), Stump (2002), Kiparsky (2005), Baerman, Corbett, Brown and Hippisley (2007), and for on-line typological material see Baerman (2005).

Deponency goes against the notion of ‘regularity of inflection’: in particular the expectation that certain forms have certain functions. Consider the partial paradigms of two Latin verbs (Kennedy 1955: 72, 82):


114 Greville G. Corbett

(5) Partial paradigm of a regular Latin verb

     amare ‘to love’
             active    passive
     1 sg    amo       amor
     2 sg    amas      amaris
     3 sg    amat      amatur
     1 pl    amamus    amamur
     2 pl    amatis    amamini
     3 pl    amant     amantur

Here we see a regular differentiation of active and passive. There are many verbs like this one. In principle, given a particular inflection, one can tell immediately whether the form is active or passive. Now contrast this with a deponent verb:

(6) Partial paradigm of a deponent Latin verb

     venari ‘to hunt’
             active
     1 sg    venor
     2 sg    venaris
     3 sg    venatur
     1 pl    venamur
     2 pl    venamini
     3 pl    venantur

With this verb we have the forms which ‘ought’ to be passive taking the role of active inflections. We can say this only by comparison across lexemes: there are many verbs with the pattern of amare ‘to love’ and relatively few like venari ‘to hunt’.

Deponency is generally discussed with reference to Latin. Indeed it is sometimes even defined as being a phenomenon found in Latin: “Class of verbs in Latin, intransitive or active in syntax but with inflections that usually mark passives” (Matthews 1997: 93). However, the basic phenomenon, which we shall call ‘extended deponency’, need not be restricted to Latin, to voice, or even to verbs. The phenomenon consists of inflections which have an established function in the morphological system being used in a minority of instances for the opposite function. This covers the Latin deponent verbs, and extends to a range of interesting phenomena which, because they have had no name, have been little studied. For a range of examples see Baerman (2005); an example of deponency in this wider sense will also be analysed in § 5.4.
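The mismatch can be sketched as follows (a toy model of the 1sg present forms in (5)–(6); the stem-plus-ending treatment is an illustrative simplification, not the chapter’s analysis):

```python
# Regular 1sg present endings by voice (from paradigm (5)): active -o, passive -or.
ENDINGS = {"active": "o", "passive": "or"}

def realize(stem, voice, deponent=False):
    """Deponent verbs use the 'passive' ending for active function and
    (in this simplification) have no passive at all."""
    if deponent:
        if voice == "passive":
            raise ValueError("deponent verbs have no passive")
        return stem + ENDINGS["passive"]   # established form, opposite function
    return stem + ENDINGS[voice]

assert realize("am", "active") == "amo"
assert realize("am", "passive") == "amor"
assert realize("ven", "active", deponent=True) == "venor"
```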


5. Interactions

Some of the examples examined so far are well known and present fairly minor instances of exceptionality. However, they provide the basis for an investigation of higher order exceptionality, which results from interactions of these phenomena. By interactions, we mean not simply that a given lexical item shows more than one type of exceptionality, but that the exceptional phenomena target the same cells of the paradigm. That is, we are dealing not just with a small subclass (Moravcsik, this volume) but with the intersection of small subclasses.

5.1. Suppletion and syncretism

One interaction that has been discussed is from the South Slavonic language Slovene, found in the noun clovek ‘man, person’; see Priestly (1993: 401), Plank (1994), Corbett and Fraser (1997), Evans, Brown and Corbett (2001: 215), and Baerman, Brown and Corbett (2005: § 5.1.1). This is a particularly interesting case, which deserves further mention here. It shows an interaction of suppletion and syncretism. The suppletion involves a plural stem as opposed to that for singular and dual. This interacts with a more general syncretism: Slovene nouns always have the genitive dual syncretic with the genitive plural (similarly the locative dual is syncretic with the locative plural). This is one of the syncretisms in (3) above. Clearly, then, the genitive and locative dual will involve an interaction of this suppletion and syncretism. The effect can be seen in (7):

(7) Slovene clóvek ‘man, person’ (Priestly 1993: 401)

singular dual pluralnominative clovek cloveka ljudjeaccusative cloveka cloveka ljudigenitive cloveka ljudi ljudidative cloveku clovekoma ljudeminstrumental clovekom clovekoma ljudmilocative cloveku ljudeh ljudeh

In this interesting paradigm certain cells are targeted both by suppletion and by syncretism. The interaction creates an unusual pattern of stems; the general rule of syncretism seems to ‘win out’ over the suppletion.
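How the syncretism rule overrides stem choice in (7) can be sketched as follows (forms from the table above; treating the syncretism as an override rule is an illustrative assumption):

```python
# Slovene clovek 'man, person', example (7) (transliteration as in the text).
# Dual cells have their own clovek-stem forms, except where the general rules
# gen.du = gen.pl and loc.du = loc.pl impose the (suppletive ljud-) plural form.
PLURAL = {"nom": "ljudje", "acc": "ljudi", "gen": "ljudi",
          "dat": "ljudem", "ins": "ljudmi", "loc": "ljudeh"}
DUAL_OWN = {"nom": "cloveka", "acc": "cloveka",
            "dat": "clovekoma", "ins": "clovekoma"}

def dual(case):
    """Syncretism 'wins out' over suppletion in the gen/loc dual."""
    if case in ("gen", "loc"):
        return PLURAL[case]   # plural form imposed by the syncretism rule
    return DUAL_OWN[case]     # otherwise the singular/dual stem

assert dual("gen") == "ljudi"
assert dual("loc") == "ljudeh"
assert dual("dat") == "clovekoma"
```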


5.2. Suppletion and overdifferentiation

Our second example also concerns suppletion, this time interacting with overdifferentiation. Consider these East Norwegian dialect forms for the adjective ‘small’:

Norwegian (East Norwegian dialect, Hans-Olav Enger, personal communication)

(8)  en             lit-en           gutt¹
     art.m.sg.indf  small-m.sg.indf  boy(m)[sg.indf]
     ‘a small boy’

(9)  den             vesle         gutt-en
     art.m/f.sg.def  small.sg.def  boy(m)-sg.def
     ‘the small boy’

(10) ei             lit-a            jent-e
     art.f.sg.indf  small-f.sg.indf  girl(f)-sg.indf
     ‘a small girl’

(11) den             vesle         jent-a
     art.m/f.sg.def  small.sg.def  girl(f)-sg.def
     ‘the small girl’

(12) et             lit-e            barn
     art.n.sg.indf  small-n.sg.indf  child(n)[indf]
     ‘a small child’

(13) det           vesle         barn-et
     art.n.sg.def  small.sg.def  child(n)-sg.def
     ‘the small child’

This adjective has three suppletive stems: lit- in the singular indefinite, vesle in the singular definite,² and in the plural there is små. This latter also deserves illustration:

(14) små       gutt-er
     small.pl  boy(m)-pl.indf
     ‘small boys’

1. The Leipzig Glossing Rules are adopted (for details see http://www.eva.mpg.de/lingua/index.html).

2. In the dialect cited these forms are obligatory. Various other Norwegian speakers I have asked accept these forms, but for them vesle is optional.


(15) små       jent-er
     small.pl  girl(f)-pl.indf
     ‘small girls’

(16) små       barn
     small.pl  child(n)[indf]
     ‘small children’

(17) de          små       gutt-ene
     art.pl.def  small.pl  boy(m)-pl.def
     ‘the small boys’

(18) de          små       jent-ene
     art.pl.def  small.pl  girl(f)-pl.def
     ‘the small girls’

(19) de          små       barn-a
     art.pl.def  small.pl  child(n)-pl.def
     ‘the small children’

We can see the evidence for suppletion just by looking within this one lexeme. To demonstrate that this adjective is also overdifferentiated, we need to compare it with an ordinary adjective:

(20) Regular tjukk ‘thick, fat’ and liten ‘small’ in East Norwegian (Hans-Olav Enger, personal communication)

          singular            plural        singular            plural
          indf      def                     indf      def
     m    tjukk                             liten
     f    tjukk     tjukke    tjukke        lita      vesle     små
     n    tjukt                             lite

This dialect has three genders, as shown by the articles. Yet a normal adjective like tjukk ‘thick, fat’ does not distinguish all three; rather, it makes only one distinction, masculine and feminine together versus neuter (Enger and Kristoffersen 2000: 104). The instance of overdifferentiation involving liten ‘small’ is within one of the suppletive stems. Besides this, tjukk ‘thick, fat’ and other normal adjectives do not distinguish definite plural from definite singular; tjukk-e functions for both. However, vesle is the definite singular, but in the plural små is used. This distinction, not made by normal adjectives, is between the suppletive stems which bring about the overdifferentiation. Putting all this together we see that in the positive, a normal adjective has three forms, while liten has five forms, resulting from the interaction of suppletion and overdifferentiation.
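The form counts just derived can be checked mechanically (forms from table (20); the cell labels are mine, and ‘pl’ stands for the plural column, which these adjectives do not subdivide):

```python
# Forms from table (20).  For tjukk, masculine and feminine are identical in
# the indefinite singular, and tjukke serves as definite singular and plural
# alike; liten distinguishes all three genders plus vesle and små.
tjukk = {"m.sg.indf": "tjukk", "f.sg.indf": "tjukk", "n.sg.indf": "tjukt",
         "sg.def": "tjukke", "pl": "tjukke"}
liten = {"m.sg.indf": "liten", "f.sg.indf": "lita", "n.sg.indf": "lite",
         "sg.def": "vesle", "pl": "små"}

assert len(set(tjukk.values())) == 3   # a normal adjective: three forms
assert len(set(liten.values())) == 5   # liten: five, via suppletion +
                                       # overdifferentiation
```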


5.3. Overdifferentiation and syncretism

Given the number of relevant morphological phenomena, the number of potential interactions, in addition to those we have seen, is potentially rather large. It is therefore an attractive idea to ask whether there are logical restrictions on which two-way interactions are possible. To date, none has been established. On the contrary, one of the most likely restrictions is disproved by data already available.

At first sight it would seem impossible to have an interaction of overdifferentiation and syncretism. After all, one creates ‘too many’ forms, and the other ‘too few’. They would therefore, apparently, cancel each other out. The data are more complex than that. They involve the Russian ‘second genitive’. Russian unarguably has six primary cases. But there are additional forms which are harder to analyse (see Zaliznjak 1973; Worth 1984; Comrie 1986 for discussion). Contrast these forms of the nouns kisel’ ‘kissel’ (a Russian fruit drink, a bit like thin blancmange) and caj ‘tea’. First, both have a regular genitive:

(21) vkus   kiselj-a
     taste  kissel-sg.gen
     ‘the taste of kissel’

(22) vkus   caj-a
     taste  tea-sg.gen
     ‘the taste of tea’

However, in certain partitive expressions we find a contrast:

(23) stakan  kiselj-a
     glass   kissel-sg.gen
     ‘a glass of kissel’

(24) stakan  caj-u
     glass   tea-sg.gen2
     ‘a glass of tea’

Here kisel’ ‘kissel’ is now an example of a normal regular noun, while caj ‘tea’ is one of a subclass which has a separate form, the so-called second genitive. The number of nouns with this second genitive is restricted, but they number dozens rather than a handful.³ Within those nouns which have a second genitive, for some of them the second genitive is normally used in partitive expressions, while for the others the second genitive is a possibility in competition with the ordinary genitive; for data on this see Panov (1968: 180), Graudina, Ickovic and Katlinskaja (1976: 121–125) and Comrie, Stone and Polinsky (1996: 124–125).

3. Ilola and Mustajoki (1989: 41–41) identify 396. However, some of these are rather rare nouns. Moreover, Ilola and Mustajoki’s source is Zaliznjak (1977), and the form has been in decline since then. Thus kisel’ ‘kissel’ is given as having the second genitive; however, my consultants do not accept this form, and Google gives over 200 examples of stakan kiselja ‘glass of kissel’ and none of stakan kiselju. Thus the 396 figure is rather high.

What concerns us particularly is the form of the second genitive, caju. Consider the following partial paradigms:

(25) Russian partial singular paradigms

     nominative   kisel’        caj
     genitive     kiselja       caja
     genitive 2   as genitive   caju
     dative       kiselju       caju

Here we see that the ‘extra’ form of caj ‘tea’, the second genitive, is syncretic with the dative. Note that we cannot push the problem into syntax and claim that the form used is the dative, since any agreements are indeed genitive. This is not obvious, since in the modern language the inclusion of an agreeing modifier strongly disfavours the use of the second genitive; instead the ordinary genitive is more likely:

(26) stakan  zelen-ogo       caj-a
     glass   green-m.sg.gen  tea(m)-sg.gen
     ‘a glass of green tea’

Here the presence of the modifier zelenogo ‘green-m.sg.gen’ makes the use of the ordinary genitive caja much more likely. However, in those instances where the noun stands in the less likely second genitive in an expression similar to (26), genitive agreement is still required. Thus zelenogo caju ‘green tea’ is possible – if rare – as a second genitive. We should therefore test what happens if we put the attributive modifier in the dative:

(27) zelen-omu       caj-u
     green-m.sg.dat  tea(m)-sg.dat
     ‘green tea’

(27) can be used only in syntactic positions where a dative is required. It is not a second genitive, and could not be used in (26). The problem is therefore a morphological one and not a syntactic one: second genitives are not syntactic datives. We can conclude that the nouns with a second genitive are overdifferentiated, and that the additional form is expressed by syncretism (with the dative). We do indeed have an interaction of overdifferentiation and syncretism. This in turn means that the most promising suggestion for a logical restriction on two-way interactions (that we could not find an interaction of overdifferentiation and syncretism) does not in fact hold.
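The analysis — an extra cell realized by the dative form — can be sketched as follows (forms from (25); the fallback function is an illustrative device, not the chapter’s formalism):

```python
# Partial singular paradigms from (25); gen2 is an extra cell whose form is
# syncretic with the dative.  Nouns lacking the cell use the ordinary genitive.
kisel = {"nom": "kisel'", "gen": "kiselja", "dat": "kiselju"}
caj   = {"nom": "caj", "gen": "caja", "dat": "caju"}

def partitive_genitive(noun, has_gen2):
    """Second genitive where available (realized by the dative form),
    otherwise the ordinary genitive."""
    return noun["dat"] if has_gen2 else noun["gen"]

assert partitive_genitive(caj, True) == "caju"        # stakan caju
assert partitive_genitive(kisel, False) == "kiselja"  # stakan kiselja
```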

5.4. (Extended) deponency, suppletion and overdifferentiation

A second natural way in which we might hope to constrain the possibilities for interactions is simply in terms of quantity. The examples we have seen have been of two-way interaction. Can we state that as the limit? Clearly, if three-way interactions are found, then the space of possibilities expands dramatically. The laws of chance are likely to make three-way interactions rare, but an example has been found:

(28) Serbian dete ‘child’ and žena ‘woman, wife’

                   singular             singular
     nominative    dete      deca       žena
     vocative      dete      deco       ženo
     accusative    dete      decu       ženu
     genitive      deteta    dece       žene
     dative        detetu    deci       ženi
     instrumental  detetom   decom      ženom
     locative      detetu    deci       ženi

Consider the forms in the unlabelled column (deca and so on). These function as the plural of dete ‘child’. Viewed against the rest of the inflectional system they look odd. First there is a problem with the stem (dec- instead of det-). This is not a possible alternation in modern Serbian, and so we must recognize the stems as being suppletive. Not fully suppletive of course, but partially suppletive (or as showing a completely irregular alternation, if preferred). Second, and rather worse, are the inflections. They are apparently completely out of place as plural; the plural inflections look rather different from these.⁴ A comparison with the singular forms of žena ‘woman, wife’, a regular noun of a different inflectional class, shows what is going on. We have a set of inflections which have an established function in the morphological system being used in a minority of instances for the opposite function. That is, an instance of extended deponency. And third, a noun in the plural in Serbian normally distinguishes three case forms (nominative-vocative-accusative versus genitive versus dative-instrumental-locative), though one large group has four forms (this group also has a unique form for the accusative). Deca ‘children’ has six forms and so is overdifferentiated. Thus it is possible to find an instance of a three-way interaction. This means that the space of possible items which we characterize as showing higher order exceptionality is potentially very large.

4. Agreements are complex and interesting. In brief, there are some instances in which an unambiguously feminine singular form is used. There are others where a clear plural is used, and still others where a gender/number form is used and where it can be argued that this is best analysed as neuter plural. Personal pronouns with a noun phrase headed by deca ‘children’ as antecedent can stand in the neuter plural or the masculine plural, dependent on the type of reading, which means that overall it can control three different types of agreement (feminine singular, neuter plural and masculine plural), if personal pronouns are counted as agreement targets. See Corbett (1983: 76–86), Wechsler and Zlatic (2000: 816–821) and Corbett (2006) for details. In part the patterns fall under the typological regularities governing the distribution of syntactic and semantic agreement. However, there are remaining issues, notably the interaction of these choices with case, which make deca ‘children’ problematic for agreement. While particular items may be highly irregular in morphological terms, this does not normally lead to any impact on syntax. Deca ‘children’ is particularly challenging in that its aberrant behaviour appears not to be restricted to morphology.

6. Conclusion

The paper represents part of a new attempt to bring the phenomena of inflection into a coherent scheme. This is done within a canonical approach to typology. Such an approach has the advantage of conceptual clarity.⁵ It allows us to systematize the various minor irregularities of inflectional morphology. However, our focus was rather on those lexemes that are more than merely exceptional. We concentrated on those which show interactions of non-canonical phenomena and so represent a higher order of exceptionality. Such examples are of great importance for establishing what is a possible word in human language, since they push the limits considerably beyond normal exceptionality. In terms of the theoretical possibilities, we were not able to eliminate any of the possible two-way interactions of non-canonicity, which shows that there are a good many potential types. Furthermore, we identified a three-way interaction, which demonstrates that the potential space is large. The initial picture that emerges is that individual lexemes can indeed be exceptionally exceptional: they can show higher order exceptionality in various ways. The range of possible words is remarkably broad. As yet only some of the potential types have been found, but it seems likely that several others exist. From the perspective of a language’s lexicon as a whole, however, lexemes showing higher order exceptionality are – not surprisingly – rare.

5. It also has the practical advantage of providing a good basis for typological databases; see http://www.smg.surrey.ac.uk/ for examples.

Abbreviations

art    article
dat    dative
def    definite
f      feminine
gen    genitive
indf   indefinite
m      masculine
n      neuter
pl     plural
sg     singular

References

Baerman, Matthew
2005 A survey of deponency in a sample of 100 languages. [Available online at: http://www.surrey.ac.uk/LIS/MB/WALS/WALS.htm]

Baerman, Matthew, Dunstan Brown, and Greville G. Corbett
2005 The Syntax-Morphology Interface: A Study of Syncretism. Cambridge: Cambridge University Press.

Baerman, Matthew, Greville G. Corbett, Dunstan Brown, and Andrew Hippisley (eds.)
2007 Deponency and Morphological Mismatches. Oxford: Oxford University Press (Proceedings of the British Academy 145).

Bloomfield, Leonard
1933 Language. New York: Holt, Rinehart and Winston.

Brown, Dunstan, Marina Chumakina, Greville G. Corbett, and Andrew Hippisley
2004 The Surrey Suppletion Database. [Available online at: http://www.smg.surrey.ac.uk/]

Carstairs-McCarthy, Andrew
1994 Suppletion. In Encyclopedia of Language and Linguistics. Vol. 8, R. E. Asher (ed.), 4410–4411. Oxford: Pergamon.


Chumakina, Marina
2004 An annotated bibliography of suppletion. [Available online at: http://www.surrey.ac.uk/LIS/SMG/Suppletion_BIB/WebBibliography.htm]

Comrie, Bernard
1986 On delimiting cases. In Case in Slavic, Richard D. Brecht, and James Levine (eds.), 86–106. Columbus, OH: Slavica.

Comrie, Bernard, Gerald Stone, and Maria Polinsky
1996 The Russian Language in the Twentieth Century. Oxford: Clarendon Press.

Corbett, Greville G.
1983 Hierarchies, Targets and Controllers: Agreement patterns in Slavic. London: Croom Helm.

Corbett, Greville G.
1991 Gender. Cambridge: Cambridge University Press.

Corbett, Greville G.
1999 Defectiveness, syncretism, suppletion, ‘deponency’: four dimensions for a typology of inflectional systems. Guest lecture at The Second Mediterranean Meeting on Morphology, 10–12 September 1999, University Residence, Lija, Malta.

Corbett, Greville G.
2000 Number. Cambridge: Cambridge University Press.

Corbett, Greville G.
2003 Agreement: Canonical instances and the extent of the phenomenon. In Topics in Morphology: Selected papers from the Third Mediterranean Morphology Meeting (Barcelona, September 20–22, 2001), Geert Booij, Janet DeCesaris, Angela Ralli, and Sergio Scalise (eds.), 109–128. Barcelona: Universitat Pompeu Fabra.

Corbett, Greville G.
2005 The canonical approach in typology. In Linguistic Diversity and Language Theories, Zygmunt Frajzyngier, Adam Hodges, and David S. Rood (eds.), 25–49. (Studies in Language Companion Series 72) Amsterdam: Benjamins.

Corbett, Greville G.
2006 Agreement. Cambridge: Cambridge University Press.

Corbett, Greville G.
2007a Canonical typology, suppletion and possible words. Language 83, 8–42.

Corbett, Greville G.
2007b Deponency, syncretism, and what lies between. In Baerman et al. (eds.), 21–43.


Corbett, Greville G., and Norman M. Fraser
1993 Network Morphology: A DATR account of Russian inflectional morphology. Journal of Linguistics 29: 113–142. [Reprinted 2003 in Morphology: Critical Concepts in Linguistics, VI: Morphology: Its Place in the Wider Context, Francis X. Katamba (ed.), 364–396. London: Routledge.]

Corbett, Greville G., and Norman M. Fraser
1997 Vycislitel’naja lingvistika i tipologija <Computational linguistics and typology>. Vestnik MGU: Serija 9: Filologija 2: 122–140.

Embick, David
1998 Voice systems and the Syntax/Morphology Interface. In MIT Working Papers in Linguistics 32: Papers from the Penn/MIT Roundtable on Argument Structure and Aspect, Heidi Harley (ed.), 41–72. Cambridge, MA: MIT.

Embick, David
2000 Features, syntax and categories in the Latin perfect. Linguistic Inquiry 31: 185–230.

Enger, Hans-Olav, and Kristian E. Kristoffersen
2000 Innføring i norsk grammatikk: Morfologi og syntaks [Introduction to Norwegian Grammar: Morphology and syntax]. Oslo: Landslaget for Norskundervisning / Cappelen Akademisk Forlag.

Evans, Nicholas, Dunstan Brown, and Greville G. Corbett
2001 Dalabon pronominal prefixes and the typology of syncretism: a Network Morphology analysis. In Yearbook of Morphology 2000, Geert Booij and Jaap van Marle (eds.), 187–231. Dordrecht: Kluwer.

Evans, Nicholas, Dunstan Brown, and Greville G. Corbett
2002 The semantics of gender in Mayali: Partially parallel systems and formal implementation. Language 78: 111–155.

Fenech, Edward
1996 Functions of the dual suffix in Maltese. Rivista di Linguistica 8: 89–99.

Graudina, L. K., V. A. Ickovic, and L. P. Katlinskaja
1976 Grammaticeskaja pravil’nost’ russkoj reci: opyt castotno-stilisticeskogo slovarja variantov [Norms in Russian: a frequency dictionary of stylistic variants]. Moscow: Nauka.

Ilola, Eeva, and Arto Mustajoki
1989 Report on Russian Morphology as it appears in Zaliznyak’s Grammatical Dictionary. (Slavica Helsingiensia 7) Helsinki: Department of Slavonic Languages, University of Helsinki.


Kennedy, Benjamin H.
1955 The Revised Latin Primer. [Edited and further revised by James Mountford.] London: Longmans, Green and Co.

Kiparsky, Paul
2005 Blocking and periphrasis in inflectional paradigms. In Yearbook of Morphology 2004, Geert Booij, and Jaap van Marle (eds.), 113–135. Dordrecht: Kluwer.

Matthews, P. H.
1997 The Concise Oxford Dictionary of Linguistics. Oxford: Oxford University Press.

Mel’cuk, Igor
1993 Cours de morphologie générale (théorique et descriptive). I: Introduction et Première partie: Le mot [A Course in General Morphology (theoretical and descriptive). Vol. I: Introduction and first part: The word]. Montréal: Les Presses de l’Université de Montréal.

Mel’cuk, Igor
1994 Suppletion: toward a logical analysis of the concept. Studies in Language 18: 339–410.

Panov, M. V. (ed.)
1968 Morfologija i sintaksis sovremennogo russkogo literaturnogo jazyka (Russkij jazyk i sovetskoe obšcestvo: Sociologo-lingvisticeskoe issledovanie: III) [The morphology and syntax of the modern Russian standard language (Russian language and Soviet society: A sociolinguistic investigation: III)]. Moscow: Nauka.

Plank, Frans
1994 Homonymy vs. suppletion: A riddle (and how it happens to be solved in …). Agreement Gender Number Genitive &, 81–86. (EUROTYP Working Papers VII/23) Konstanz: University of Konstanz.

Priestly, T. M. S.
1993 Slovene. In The Slavonic Languages, Bernard Comrie, and Greville G. Corbett (eds.), 388–451. London: Routledge.

Sadler, Louisa, and Andrew Spencer
2001 Syntax as an exponent of morphological features. In Yearbook of Morphology 2000, Geert Booij, and Jaap van Marle (eds.), 71–96. Dordrecht: Kluwer.

Seifart, Frank
2005 The structure and use of shape-based noun classes in Miraña (North West Amazon). Ph.D. diss., Radboud University, Nijmegen.


Spencer, Andrew
2003 Periphrastic paradigms in Bulgarian. In Syntactic Structures and Morphological Information, Uwe Junghanns, and Luka Szucsich (eds.), 249–282. (Interface Explorations 7) Berlin: Mouton de Gruyter.

Stump, Gregory T.
2001 Inflectional Morphology: A theory of paradigm structure. Cambridge: Cambridge University Press.

Stump, Gregory T.
2002 Morphological and syntactic paradigms: arguments for a theory of paradigm linkage. In Yearbook of Morphology 2001, Geert Booij, and Jaap van Marle (eds.), 147–180. Dordrecht: Kluwer.

Suthar, Babubhai Kohyabhai
2006 Agreement in Gujarati. Ph.D. diss., University of Pennsylvania.

Wechsler, Stephen, and Larisa Zlatic
2000 A theory of agreement and its application to Serbo-Croatian. Language 76: 799–832.

Worth, Dean S.
1984 Russian gen2, loc2 revisited. In Signs of Friendship: To Honour A. G. F. van Holk, Slavist, Linguist, Semiotician, J. J. van Baak (ed.), 295–306. Amsterdam: Rodopi.

Zaliznjak, Andrej A.
1973 [2002] O ponimanii termina ‘padež’ v lingvisticeskix opisanijax [Understanding the term ‘case’ in linguistic descriptions]. In Russkoe imennoe slovoizmenenie, Andrej A. Zaliznjak (ed.), 613–647. Moscow: Jazyki slavjanskoj kul’tury. [Originally in Problemy grammaticeskogo modelirovanija, Andrej A. Zaliznjak (ed.), 53–87. Moscow: Nauka.]

Zaliznjak, Andrej A.
1977 Grammaticeskij slovar’ russkogo jazyka: slovoizmenenie [A Grammatical Dictionary of Russian: Inflection]. Moscow: Russkij jazyk.


An I-language view of morphological ‘exceptionality’: comments on Corbett’s paper

Stephen R. Anderson

1. Introduction

Corbett starts from what he calls “the notion of ‘canonical’ inflection,” corresponding closely to an ideal “one form – one meaning” pattern with respect to the roots and formal markers found in inflected words. He discusses some of the myriad ways in which actual inflectional paradigms are found to deviate from this simple schema, and provides a number of important and thought-provoking examples.

Many linguists have felt that something like this ‘canonical’ inflection must in some way characterize morphological structure in general. Some might see this as an ideal form underlying the apparent complexity of surface shapes, as in Lounsbury’s (1953) notion of an idealized “agglutinating analog,” while others (such as the practitioners of “Natural Morphology”) regard it as a fundamental constraint on linguistic structure. Examples such as Corbett’s make it clear that any claim to the effect that inflectional paradigms are “basically” regular must at a minimum be carefully hedged to allow for all sorts of deviation from regularity in practice.

I address here two consequences that follow from observations such as those in Corbett’s paper. I first note, in section 2, that exceptionality in inflectional morphology finds its importance not directly in terms of comparisons between surface forms, but rather in the grammar that underlies them: in I-language rather than E-language, to put the matter in Chomsky’s terms (cf. Anderson & Lightfoot 2002). In section 3 I then suggest that the range of exceptionality referred to in Corbett’s discussion argues that morphological theory, per se, has no place for the notion of such an ideal structural type. To the extent that much (indeed, most) actual inflectional structure matches it, the explanation is to be found outside of the theory of word structure itself, in areas such as the patterns of diachronic change that lead to observed synchronic systems.


2. The locus of exceptionality

Corbett discusses ways in which observed inflectional paradigms can deviate from the pattern of canonical inflection, including the traditional notions of suppletion, syncretism, overdifferentiation, and deponency. He discusses these notions, as is quite standard in traditional grammar, in terms of surface word forms: thus, syncretism is described as “instances in which we have a single form which realizes more than one morphosyntactic specification.”

This approach, however, leads to a certain amount of indeterminacy. For instance, how do we know in a given case whether we are dealing with “overdifferentiation” in one subset of the lexicon of a language as opposed to “syncretism” in a complementary subset? Is it the case that some Maltese nouns are overdifferentiated for number, or rather that the others (the great majority) show syncretism of the dual and plural? How could we differentiate these accounts, and does it actually matter that we do so? Simply observing that nouns with a distinct dual constitute a small minority makes the difference a matter of mere numbers and seems to trivialize the issue, but it is hard to see how we can improve on that so long as our attention is confined to patterns in surface forms. If the difference between syncretism and overdifferentiation is genuinely significant, this must be because they correspond to distinct mechanisms in the grammar of a language. In every case, the observation of a surface pattern deviating from the canonical one only raises the question of what lies behind it, rather than serving as a (self-confirming) diagnosis of the nature of the exceptionality.

2.1. Identifying true suppletion

For example, “suppletion” cannot be identified in any significant sense as mere non-identity of the lexical bases of two (or more) morphosyntactically distinct forms within a paradigm. In some instances, quite considerable differences in the shape of the base among paradigmatically related forms may follow directly from the phonological regularities of the language. For instance, yhden, the genitive form of the Finnish numeral ‘one’, differs substantially from the nominative yksi. These differences follow from the phonology of Finnish, however, given a basic stem such as /ükte/. In the nominative, /e/ is raised to [i] in final position, which results in spirantization of the /t/ to [s]. In the genitive, the addition of the regular ending /n/ prevents the stem-final vowel from raising. It also closes the second syllable, resulting in consonant gradation of the stem /t/ to /d/, and concomitant dissimilation of continuance which causes the /k/ of the stem to be realized as [h]. Surely we should not speak of suppletion here,


despite the disparity of stem shape: true suppletion corresponds to the case in which an alternation in form is lexically idiosyncratic, and thus must be represented by distinct memorized forms, rather than mere difference in the surface form of the base.
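Anderson’s Finnish derivation can be traced rule by rule in a small sketch (ü stands for orthographic y; the string-rewriting implementation is only a convenient stand-in for the phonological rules he describes):

```python
# Stem /ükte/ 'one'; rules and ordering as described in the text.
def nominative(stem):          # /ükte/
    s = stem[:-1] + "i"        # word-final /e/ raises to [i]        -> ükti
    s = s.replace("ti", "si")  # /t/ spirantizes before raised [i]   -> üksi
    return s

def genitive(stem):            # /ükte/ + /n/
    s = stem + "n"             # regular ending; no raising, syllable closed
    s = s.replace("t", "d")    # consonant gradation /t/ -> /d/      -> ükden
    s = s.replace("kd", "hd")  # dissimilation: /k/ realized as [h]  -> ühden
    return s

assert nominative("ükte") == "üksi"   # orthographic yksi
assert genitive("ükte") == "ühden"    # orthographic yhden
```

Since both surface forms fall out of general rules from a single stem, no suppletive listing is needed, which is exactly Anderson’s point.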

We often proceed as if we could identify genuine cases of suppletion in terms of the distance between variants of the base, and the phonological naturalness of their connection. Sometimes, though, quite minor alternations in shape can have the lexical character that leads us to call them suppletive. An example of this is furnished by the Surmiran form of Rumantsch. Here, as in many other Romance languages, verbal endings differ as to whether or not they bear stress, and the vowels of the stem may change in limited ways depending on whether a form takes stress on the stem or on the ending. For instance, in the paradigm of cantar ‘to sing’ we find cantas [’kant@s] ‘sing (2sg)’ alternating with cantagn [k@n’tañ] ‘sing (1pl)’. Given that Surmiran only has the vowels [i, @, u] in unstressed syllables, as opposed to a full set of seven vowels (short and long) plus several diphthongs in stressed syllables, it seems that this alternation must be a purely phonological matter of vowel reduction in unstressed syllables. In some instances, such as the verb eir ‘to go’, we certainly find suppletion: the alternation of vast [vast] ‘go (2sg)’ with giagn [dZañ] ‘go (1pl)’ is unconnected with any general phonological rule(s). Surely the alternation in forms of cantar is phonological, though.

Closer examination shows that this cannot be correct. As a consequence of wholesale re-structuring of vocalic patterns in individual verbs, there is no longer a predictable correspondence between stressed and unstressed vowels. For any one of the three unstressed vowels, there are seven or eight possible corresponding stressed vowels (or diphthongs). This would hardly be unusual if the correspondence were unique in the other direction, but that is not the case. In fact, there is no stressed vowel whose unstressed correspondent is unique; and some of the stressed vowels (e.g., [a] and [o]) can correspond to any one of the unstressed vowels, depending on the lexical identity of the verb in question. Some of these patterns are commoner than others, but not at all to the extent that the simple and phonologically natural alternations can be derived by rule. Although there is no doubt that these alternations originated historically in a straightforward phonological rule reducing vowels in unstressed syllables, the pattern in the language today is arguably such that every verb must have its two alternants (stem-stressed vs. desinential-stressed) indicated in its lexical entry: suppletion by definition, if we think of morphology in terms of grammars (see Anderson 2008 for further discussion of these facts).

130 Stephen R. Anderson

2.2. Sources of syncretism in grammar

If we see syncretism pre-systematically as the coincidence in a single surface form of multiple morphosyntactic possibilities, as Corbett's definition (quoted above) suggests, then we must realize that this can have several different sources in the grammar of a language.

In some instances, the overlap of multiple morphosyntactically distinct specifications in a single surface form is surely a matter of simple homophony, not to be confused with syncretism. In Icelandic, for example, the genitive singular of strong nouns is marked either with -s or with -ar. The nominative (and in some cases, accusative) plural of strong masculine and feminine nouns is marked with -ar, -ir, -ur or -r (of which this last may disappear phonetically after stem-final r or assimilate to a preceding sonorant). There are some principles governing the distribution of each of these sets of alternants, but they are quite independent of one another. In particular, while some nouns do show -ar both in the genitive singular and the nominative plural (e.g., kerling 'old woman'), there is no reason to treat this as a systematic syncretism, because many others show -ar only in one or the other (e.g., hlutur 'thing', gsg. hlutar, npl. hlutir; hestur 'horse', gsg. hests, npl. hestar). The grammar, that is, establishes no systematic connection between nouns with a genitive in -ar and those with an identical nominative(/accusative) plural. The two categories simply have markers that may accidentally coincide, leading to word forms that are homophonous and not syncretic.

In other cases, surface identity of morphosyntactically distinct forms is a result of the phonology. For example, a substantial subset of the "irregular" verbs of English (i.e., those that do not form their past and past participle by adding /d/) involve the addition of a very similar ending, /t/, at an earlier level of the lexical phonology (see Anderson 1973 for some discussion), as in the (now somewhat archaic) burn/burnt, learn/learnt. As opposed to the 'regular' ending /d/, this /t/ has the effect of devoicing a preceding voiced fricative and shortening the vowel of the stem, as in leave/left, lose/lost, and other verbs. The vowel-shortening effect, regular before syllable-final clusters at this level, can occur by itself (creep/crept, mean/meant, deal/dealt). When the stem ends in /t/, the cluster is simplified, but not before triggering vowel shortening: bite/bit, meet/met. For stems ending in /d/, the same cluster reduction occurs (without regressive assimilation of [Voice]), again with vowel shortening: lead/led, slide/slid.

This cluster of phonological effects is quite regular and characteristic of the appropriate level of the phonology. What is interesting is the consequence of adding this /t/ to stems ending in a dental stop and containing a basic short vowel. In such forms (e.g. set/set, fit/fit; rid/rid, wed/wed) the vowel shortening

An I-language view of morphological ‘exceptionality’ 131

has no visible effect (since the vowel is already short), and the reduction of the final stop cluster results in the complete loss of any surface reflex of the ending. The consequence is a phonologically derived homophony of the present and past forms, but surely not a morphologically governed syncretism.

Somewhat similar consequences can follow from the operation of morphological regularities that are sensitive to phonological form. It is of course well known that the regular plural and possessive forms of English nouns are generally identical phonologically: both boys and boy's are pronounced [bojz]. This is probably a matter of simple homophony, though the fact that essentially all productive inflection in English involves the two phonological forms /z/ and /d/, each adjoined (not simply concatenated) at the right edge of the word, suggests that some more general principle may be at work.

What is somewhat more interesting is the fact that the possessive form of a word ending in the regular plural is also homophonous with the simple plural and possessive: boys' is also phonetically [bojz], not the expected *[bojzz]. This cannot be an instance of syncretism, because exactly where the plural of a noun is formed in some way other than with /z/, the possessive plural is distinct: children's, women's. Furthermore, the same homophony is found (for at least some speakers, and perhaps more manuals of English usage) with a class of proper names ending in /z/: Jones' theory. Consider also that for some speakers, the presence of the 3rd person singular /z/ also blocks the overt expression of the possessive: The girl Harry adores' long hair [is actually a wig] (cf. Zwicky 1987). The regularity is thus not the morphologically conditioned one "singular and plural possessive are identical" that the syncretism analysis predicts.

There are various ways to describe these facts (see Anderson 2005: 89–94 for some discussion). We might say, for instance, that the plural, possessive and 3sg present /z/'s are adjoined to the word with which they appear, and the phonology then reduces a pair of identical adjoined elements to a single instance. Or perhaps the possessive rule adjoins /z/ unless its host already ends in an adjoined /z/. Or perhaps, within an Optimality-theoretic framework, the possessive rule simply says: the last word of a DP with the property [Poss] must end in adjoined /z/ (a condition satisfied without change if the host word already contains a /z/). On any of these analyses, we also have to say that for (style books and) speakers who prefer Jones' theory to Jones's theory, the proper names in question contain a semantically vacuous and purely formal adjoined /z/.

The differences among these accounts, while undoubtedly significant, do not matter to the present point. On any one of them, the homophony of the regular plural and its possessive form follows not from a rule of syncretism, but rather from the morphophonology of the rule of phrasal inflection that marks the possessive form.

It is the existence of these various circumstances that can give rise to surface identity among morphosyntactically distinct forms that heightens the significance of examples of genuine syncretisms, such as the paradigm of človek 'man, person' in Slovene discussed by Corbett. Here the fact that the genitive and locative dual forms are built in the same way as the corresponding plurals, while the remaining forms of the dual are built on the singular stem, shows that the morphological rule of syncretism cited by Corbett ("Slovene nouns always have the genitive [and locative] dual syncretic with the genitive [and locative] plural") accurately captures the generalization at work. Accidental homophony or phonologically derived neutralization are not serious candidates in such a case, which shows that morphological structure must countenance such "rules of referral" (as some have called them).

I do not by any means intend to suggest that Corbett is unaware of the differences I have pointed to here. Indeed, one finds much in work of his and his colleagues referred to in the paper under discussion that is of great importance for making exactly these distinctions. I mean only to emphasize that in any discussion of "non-canonical" patterns in inflectional morphology, it is essential to keep one's focus on just where in the grammar a given (apparently) exceptional pattern has its source, and not only on the surface forms that realize it. Exceptionality in morphology does not wear its diagnosis on its sleeve.

3. The significance of exceptionality for morphological theory

Returning to the conceptual framework of Corbett's paper, we can ask what status should be accorded to the notion of "canonical inflection" in terms of which exceptionality is defined. In line with the remarks of the previous section, I suggest that if this notion has systematic status, it should be in terms of the architecture of a grammar, and not directly in terms of patterns among surface forms.

In linguistic theory it has been common to posit an innate structural preference for paradigmatic relations of the sort Corbett designates as "canonical," under a variety of names: Uniform Exponence, Paradigm Coherence or Uniformity, Output-Output Correspondence, Natural Morphology, etc. For such a principle to have the status of a constraint on grammars represented in Universal Grammar, it ought to have the properties of what Kiparsky (2008) calls "true universals," and not simply those of "typological generalizations." Corbett, however, demonstrates here for us that there is actually no limit in principle to the range of exceptions that languages may display to such a pattern; and thus, that it cannot constitute the kind of characterization of the human language capacity that is the business of Universal Grammar. Along similar lines, Garrett (2008) demonstrates that an independent synchronic constraint on non-uniform paradigms does not serve as a mechanism in historical change (driving instances of paradigmatic leveling); rather, the effect is in the opposite direction, with observed instances of "leveling" following from independent mechanisms of change (paradigm extension).

These observations are in line with evidence accumulating in a variety of areas that many observed typological regularities are to be attributed not to the structure of Universal Grammar, but to logically quite separate external sources such as linguistic change or processing preferences. Such arguments have been made for phonology, for example, by Blevins (2004); in syntax, by Newmeyer (2006); and for morphology in Anderson (2004). Corbett's discussion reinforces this conclusion by demonstrating the extent to which any such regularities of word structure are subject to pervasive exceptions of fundamental sorts.

Whatever the architecture of morphological theory, it is quite unlikely that a notion such as Corbett's canonical inflection has systematic status within it. The study of surface patterns manifested in E-languages, of course, provides vital evidence about the nature of language, but these patterns are not themselves the explananda of grammatical theory. It is, rather, the structure of I-language objects (grammars) that we seek to account for.

Of course, we are left here (as in other areas of grammar) with the problem of how to identify genuine constraints on the human cognitive capacity for language, constraints that are true universals and thus appropriate for incorporation into the theory of synchronic grammars. There is little doubt that such constraints exist, but the primary source of evidence for proposals in this area has been the identification of widespread patterns – typological generalizations, in Kiparsky's formulation. Once we recognize that these are actually due, at least in a great many cases, to factors other than the structure of UG, our search for an appropriate theory of grammar becomes much harder – but no less fundamental.

References

Anderson, Stephen R. 1973. Remarks on the phonology of English inflection. Language & Literature I (4): 33–52.

Anderson, Stephen R. 2004. Morphological universals and diachrony. Yearbook of Morphology 2004: 1–17.

Anderson, Stephen R. 2005. Aspects of the Theory of Clitics. Oxford: Oxford University Press.

Anderson, Stephen R. 2008. Phonologically conditioned allomorphy in Surmiran (Rumantsch). Word Structure 1: 109–134.

Anderson, Stephen R., and David W. Lightfoot. 2002. The Language Organ: Linguistics as Cognitive Physiology. Cambridge: Cambridge University Press.

Blevins, Juliette. 2004. Evolutionary Phonology: The Emergence of Sound Patterns. Cambridge: Cambridge University Press.

Garrett, Andrew. 2008. Paradigmatic uniformity and markedness. In Language Universals and Language Change, Jeff Good (ed.), 125–143. Oxford: Oxford University Press.

Kiparsky, Paul. 2008. Universals constrain change; change results in typological generalizations. In Language Universals and Language Change, Jeff Good (ed.), 23–53. Oxford: Oxford University Press.

Lounsbury, Floyd Glen. 1953. Oneida Verb Morphology (Yale University Publications in Anthropology 48). New Haven: Yale University Press.

Newmeyer, Frederick J. 2006. Possible and Probable Languages. Oxford: Oxford University Press.

Zwicky, Arnold M. 1987. Suppressing the Zs. Journal of Linguistics 23: 133–148.

Exceptions and what they tell us: reflections on Anderson's comments1

Greville G. Corbett

1. Introduction

When you have a paper that you have researched, drafted, redrafted, polished, received helpful comments on, rewritten … and now wish to send off, there are (at least) three questions you may ask. Where (to send it)? Why (have I got these results)? So what?

When the paper is for an edited collection, the first question is already answered. The second question, one may hope at least, has an answer along the lines that the paper represents a small step beyond what was there before, and in turn poses new issues. It fits in the circle of problems and partial solutions, through which we spiral upwards. It is the last question – 'So what?' – which can be daunting. It is therefore a wonderful luxury to have someone else ask the question and attempt to answer it, with the interest and persistence shown in this case by Anderson. It is tempting to say 'Thanks, Steve, you've done the job, and let everyone read your suggestions.' But that would be missing the opportunity for debate which the editors have offered.

Anderson asks 'so what?', and offers two answers to the question: first that "exceptionality in inflectional morphology finds its importance not directly in terms of comparison between surface forms, but rather in the grammar that underlies them" (section 2), and second that "morphological theory, per se, has no place for the notion of such an ideal structural type", where the ideal is the fully canonical system, set up as a standard of comparison (section 3). At the risk of ruining the suspense, let me say that I agree with these two points, and will develop them, following Anderson's sections.

1. I thank Stephen Anderson for his comments, and Matthew Baerman for some notes on my response.

2. The locus of exceptionality

Anderson points out that: "the observation of a surface pattern deviating from the canonical one only raises the question of what lies behind it, rather than serving as a (self-confirming) diagnosis of the nature of the exceptionality." That is, we find numerous deviations from the canonical, but so what? Anderson concentrates on two phenomena, suppletion and syncretism. For suppletion he points out that (a) there are forms which are so far apart in phonological terms as to lead us to believe they are suppletive, and yet may follow from phonological regularities; (b) conversely, apparently close forms may not be subject to any phonological rule and may be – surprisingly – suppletive (see Corbett 2007 for fuller discussion of suppletion). Equally, concerning syncretism, he notes that when a single surface form realizes distinct morphosyntactic specifications this may be a matter of simple homophony or it may be an instance of systematic syncretism (see Baerman, Brown and Corbett 2005 for examples and discussion). His general conclusion is that: "Exceptionality in morphology does not wear its diagnosis on its sleeve."

This is right, and how boring morphology would be if it were otherwise. As a consequence, we need to be able to talk in a scientific way about 'symptoms'. Usage in this area is confused and confusing; introducing canonicity is intended to make our system of concepts sharper and more consistent, and so to contribute to our understanding. However, the issue goes deeper. It is not just linguists who are confronted by 'symptoms'. Hearers are too. When a hearer is confronted by a form which has more than one morphosyntactic description, there is theoretically a possibility of miscommunication. The strategies hearers use in this circumstance are a concern of linguistics, and it is an interesting research question if and when hearers pay attention to the source of the coincidence of form (whether it is an individual homophony or a systematic syncretism). Thus even when the surface coincidence is a result of totally regular rules, it is still of interest, in that we have to ask what such instances tell us about comprehension.

The speaker is a problem too, since speakers (at least most speakers) don't do theoretical linguistics. Let us go back to the example of forms that have the symptoms of suppletion (they are distant from each other in phonological terms) and yet can be related by regular phonological rules. Though we can write the rules, we may not be able to demonstrate that speakers use them. It may be that for (some) speakers the forms are stored, just as indisputably suppletive forms are. This is an area where eventually we may hope for help from neurolinguists.

3. The significance of exceptionality

Anderson argues that: "Whatever the architecture of morphological theory, it is quite unlikely that a notion such as Corbett's canonical inflection has systematic status within it." Indeed, the examples of exceptions I provided, showing that there is no principled limit to the range of exceptionality, help confirm Anderson's view that the regularities we do observe are to be attributed to external sources rather than to Universal Grammar.

Anderson's conclusion that canonicity does not have a systematic status within morphological theory is also reasonable, in my view. To change the analogy, in many fields of investigation we may observe that the units of measurement and the means to facilitate measurement have no systematic status within the theory. Theories do not incorporate millimetres or microscopes. Yet having generally agreed standardized units of measurement and tools which aid in measurement is of immense value in moving research forward. The canonical standard offers a point of reference, from which we can calibrate the real language examples we discover, and in particular those which are most relevant for morphological theory. Deviations from canonicity all demand an explanation, and instances of higher order exceptionality may well prove particularly significant. We investigate them in order both to map out the extent of the exceptionality we find, and to enable our thinking about what this tells us about human linguistic ability.

References

Baerman, Matthew, Dunstan Brown, and Greville G. Corbett. 2005. The Syntax-Morphology Interface: A Study of Syncretism. Cambridge: Cambridge University Press.

Corbett, Greville G. 2007. Canonical typology, suppletion and possible words. Language 83: 8–42.


How do exceptions arise? On different paths to morphological irregularity

Damaris Nübling

Abstract. In order to understand the function of a certain phenomenon, it is instructive to analyze how this phenomenon emerged. Inflectional irregularities are often understood as residues of phonological change which were not subject to analogical leveling. This article describes four different paths through which irregularity may arise and reveals several connections with other linguistic phenomena. The results are based on a study of the diachronic development of six highly frequent verbs in ten Germanic languages. In addition, the precise position of the irregularity in the word form as well as in the paradigm is examined. Furthermore, the impact of the inflectional category on the degree of irregularity is discussed. Finally it is shown that irregularity often leads to overdifferentiated paradigms. Therefore, it is argued that irregularity is more adequately described as differentiation, leading to a higher degree of morphological distinctiveness.

1. Introduction

Many, if not all, inflecting languages have a certain amount of irregularity in their grammatical forms. For some time it has been recognized that the most central part of the language, i.e. the most frequently used units, constitutes most of the exceptions of a grammatical system, or, in the terms of Corbett (this volume), of the "canonical inflection". This article focuses on verbs in Germanic languages. Almost all articles about irregularity give the impression that irregular forms are very old, if not present from the first written records of a language.

Many historical grammars end with the so-called "anomalia", a group of exceptions in the sense of extremely irregular, sometimes even suppletive, verbs which do not share many common characteristics within their paradigms. In most of the Germanic languages, the so-called athematic verbs form this group, but sometimes other verbs such as contracted or otherwise "deformed" ones are included in this wastepaper basket. Most important is the fact that they occur very frequently; thus their grammatically marginal status does not correspond at all to their importance on the performance level.

Yet the question as to how and why these verbs became irregular never arises. In the case of the aforementioned athematic verbs (be, do, go, stand), the answer can only be given by Indo-European linguistics. But in many other cases, irregularization developed only after the first written records of the respective language and can thus be documented step by step. For example, every Germanic language has the verbs have or say, two originally weak (i.e. regular) verbs which developed highly irregular patterns in each of the Germanic languages, especially have, which is involved in the grammaticalization of the present perfect; for the irregularization process of these two verbs in particular, I refer – in order to avoid repetition – to Nübling (2001a). Establishing the chronology and the most important steps of this development involves diving into the footnotes of historical grammars, if they mention these processes at all. No historical grammar is really interested in deviations from the rules. This makes it very difficult to document irregularizations. However, studying the different instances of irregularization reveals a sort of "regularity of irregularity", meaning that, on the one hand, it is often the same type of verbs which becomes exceptional and, on the other hand, a number of special paths to irregularization which lead to the same goal can be observed.

In the following, the most important paths will be described briefly (Section 2). Then, the precise place of irregularization in the word as well as in the paradigm will be defined (Section 3). Section 4 looks at the categories which are affected by irregularity. It will be shown that tense, for instance, tends to show more exceptions than person. Finally, the substitution of the negatively connotated term irregularity by the more appropriate term differentiation or distinctiveness (Section 5) is proposed. None of the studied cases developed syncretisms – quite the contrary: paradigms which are affected by irregularities are more differentiated than completely regular paradigms. Section 6 summarizes the most important results.

All the following data are based on the systematic investigation of the diachronic development of the six irregular verbs have, become, give, take, come, and say in ten Germanic languages, which was conducted by Nübling (2000). These verbs underwent a visible process of irregularization: roughly 1000 years ago they belonged to the regular strong or even weak verbs. Four traditionally irregular verbs (be, do, go, stand) were added there in an appendix because they sometimes influenced these six verbs in departing from their original inflection. In the following, only some instructive examples can be presented. Since significant cross-linguistic investigations of the emergence of exceptional morphology have yet to be conducted, the main focus of this article is empirical rather than theoretical. Nevertheless, it should be stressed that the development of irregularity violates what Naturalness Theory (Mayerthaler 1981) terms universally valid "naturalness principles". This theory postulates clear one-to-one relations in morphology, comparable to what Corbett describes as "canonical inflection": uniform stems (lexemes) in combination with uniform and transparent affixes. Corbett, however, calls it an 'idealization' (this volume) from which real inflectional systems usually diverge. For Naturalness Theory, this scheme constitutes the goal of language change. The response of this theory to irregularity is the elimination of deviant forms through analogical leveling. Even system-dependent morphological naturalness, which introduced so-called system-defining structural properties, cannot solve the problem of real inflectional exceptions without any integration into an inflectional class (Wurzel 1984/2001, 1994a). Later, as a reaction to an intensive debate with representatives of the Economy Theory (Werner 1987a, 1987b, 1989; Ronneberger-Sibold 1980), a further natural principle, Distinctiveness in the "Me-First" domain, was introduced (Wurzel 1990, 1994b), allowing exceptions in the basic vocabulary. However, this domain was never thoroughly defined, and there was no interest in asking where these irregularities come from. Therefore more appropriate theories will be considered, such as the Dual Processing Model, the Associative Network Model and the Economy Theory. Incidentally, most of these linguists1 are not interested in the emergence of irregular forms. Even Bybee is mainly concerned with the mental representation of irregular morphology and almost always speaks of the "maintenance of irregularity and suppletion" (Bybee 1988: 132). Therefore, this article concentrates on the diachronic emergence of irregularity.

2. Different ways of becoming an exception

There are at least four different paths through which exceptions arise. The most traditional one consists in the preservation of the effects of sound changes, which tend to lead automatically to morphological diversity and heterogeneity (2.1). A second path is what we call accelerated sound change, i.e. deviant phonological changes which occur only in conjunction with high token frequency (2.2). Here, the effect of irregularity is reached faster than in the first case. Still more striking are morphological changes which lead to disorder. Usually, it is assumed that morphological change, mostly in the form of analogy, leads to paradigmatic order and homogeneity, but here the reverse case can also be observed (2.3). The most drastic method is the mixing of different lexemes to form one paradigm (strong suppletion). Here, the effect of maximal irregularity is achieved the fastest. Only extremely token-frequent items are affected by this extraordinary strategy (2.4). It is obvious that the strategies from 2.1 to 2.4 require progressively less time to create short and differentiated forms, which is the more positively connotated term for irregularity.

1. The rise of irregularity through sound change is documented in Werner (1977). Ronneberger-Sibold (1987, 1990) deals with the diachronic emergence of suppletion.

2.1. Accumulated sound shift (without analogical leveling)

This path to irregularization is the most frequently described one and can be subsumed under the "passive" type, meaning that the effects of sound change are accepted or accumulated by morphology. Every word undergoes phonological change, which, in most cases, is reductive. Therefore, it should be expected that all the forms of a paradigm change in the same way, leading to shorter, but not more differentiated forms. Since many sound changes consist of assimilations, in the sense that they are conditioned by the phonological surroundings, the outputs consist not only of shorter but also of more heterogeneous forms. Of the many possible examples, we will discuss only a few. One of the most prominent of these is NHG sein 'be' with the two finite forms ist (3rd sg.pres.) 'is' and sind (3rd pl.pres.) 'are', which synchronically show the maximum possible irregularity, i.e. total suppletion, although they diachronically developed in a completely regular fashion (the IE forms still are regular):

Table 1. Suppletion through regular sound change.

Num. Pers.   IE        >   GMC       >   OHG    >   NHG

Sg. 3        *és-ti    >   *ist(i)   >   ist    >   is(t)
Pl. 3        *s-énti   >   *sinti    >   sint   >   sind

In IE the accent position could change depending on the ablaut stage of the respective verb (here: full grade *és- in the singular and zero grade *s- in the plural). The following developments consist of completely regular sound changes. The decisive point is that analogy never took place to level the highly diverged forms. Suppressed analogy would therefore be the more appropriate term for this path. This also holds for Engl. was vs. were, going back to Verner's Law, which was effective more than two thousand years ago and which was never levelled.

Another example of accumulated sound change is hebben 'have' in Dutch (Table 2).

Table 2. The paradigm of hebben in Dutch.

tense        number   person   hebb-en [hEb@(n)]
pres.        Sg.      1        heb [hEp]
                      2        hebt [hEpt]
                      3        heeft [he:ft]
             Pl.      1–3      hebben [hEb@(n)]
past         Sg.      1–3      had [hat]
             Pl.      1–3      hadden [had@(n)]
past part.                     gehad [X@"hat]

The irregular finite form heeft (3rd sg.pres.) 'has', containing the fricative -f-, is the regular result of the preservation of Middle Dutch hevet < *habid. Middle Dutch hevet [he:v@t] was later syncopated to heeft, whereby the [v] before [t] became voiceless. The other forms were affected by the West Germanic consonant gemination and therefore show [b] today. Today, heeft is preserved whereas all the other paradigms (except for zijn 'be') show syncretisms in the 2nd and 3rd sg. (therefore *hebt would be the expected 3rd sg.): the analogical change did not take place for the other forms. This is what we call the "passive way" to irregularity: the effects of sound changes are preserved while analogy is blocked.

The same holds for the past, which shows vowel change from e to a. Hebben is the one and only remainder of the so-called rückumlaut verbs. This special group of weak verbs underwent i-umlaut in the present and no umlaut in the past. German still has brennen – brannte 'burn – burned' and five further examples. In Dutch, all but one of these rückumlaut verbs were leveled through analogy. The only exception is hebben, which synchronically resembles the strong verbs: not only because of its tense-related vowel change, but also because of its monosyllabic past sg. form had instead of the obligatory bisyllabic weak forms (such as werkte 'worked'). Due to the difference between had and the corresponding present form heeft, apocope could take place as a further regular sound change: Middle Dutch hadde > Dutch had [hat]. The only irregular reduction is the elimination of the root-final consonant in the past and in the past participle, which was already absent in Middle Dutch; this is an example of accelerated sound change, the subject of Section 2.2.

The selective effect of analogy depending on frequency rather than on markedness principles is described by Bybee:

144 Damaris Nübling

It is important to note that analogical leveling does not take place in all forms with the relevant alternation at once. Rather, it tends to affect the less frequent lexical items first: thus weep/weeped is susceptible to change, but keep/kept is not likely to change to keep/keeped immediately. This is because the model for the past tense of keep is highly available, while the model for the past of weep does not occur as often. […] While some have argued that it is the system-internal markedness structure that determines the allomorph that survives in leveling, there is evidence that the relative frequency of the forms in which the allomorphs occur is the determining factor. (Bybee 1994: 2560)

With regard to Dutch hebben, it can be shown how a rather regular and well-integrated verb became an exception because it resisted analogical change.

2.2. Accelerated sound change (without analogical leveling)

In the above-mentioned sample of ten verbs in ten Germanic languages, accelerated sound change plays a very important role on the road to irregularity. One rather frequently occurring example is the loss of the stem-final consonant -b(b)- in the past of Dutch {hebb}{en}, which is {ha_}{d} in the singular and {ha_}{dden} in the plural; here, it lacks the -b-, indicated by "_". The fact that a part of the lexeme is affected by reduction leads to "non-canonicity" in the sense of Corbett (in this volume): "2. in terms of lexical material in the cell, we require absolute identity (the stem should remain the same)". Most of these sound changes occur only in selected forms of the paradigm. This leads to paradigmatic "disorder".

In many Germanic languages, have provides evidence for this, cf. Engl. has instead of the regular form *haves (cf. behaves), and had, NHG hast, hat (2nd/3rd sg.pres.), hatte (past). In present-day spoken German, a further widespread irregular development is at work, namely the contraction of haben > ham. This contraction does not occur with verbs of the same pattern such as graben 'dig', which never becomes *gram (except in some dialects). The same holds for Swedish ha (inf.), har (pres.), and hade [ˈhadːe]. In the supine haft, however, the old -f- is preserved. In Danish and Faroese, this loss is hidden by orthography but nonetheless exists: Dan. have [hæ] 'have' and havde [hæ(ː)ðə] 'had' (vs. haft [hafd] in the supine), Far. hevði [hɛijə] (past sg.) and høvdu [hœdːʊ] (past pl.), where <v> is no longer pronounced. In Faroese, even a vocalic number distinction emerged, which usually occurs only with strong verbs. Other languages such as Luxembourgish and Norwegian jettisoned the root-final consonant in every position, so that this loss does not help in differentiating the paradigm, only in shortening the forms. In these languages, however, further developments brought about irregularity. For similar developments with say, cf. Nübling (2001a).

A very instructive example is come in such different languages or dialects as Swiss German, Luxembourgish, North Frisian, and Icelandic, where the stem-final consonant [m] was assimilated to [n] before the dental of the inflectional endings -s and -t, respectively: Lux. *këmm-s > kënn-s (2nd sg.) and *këmm-t > kënn-t (3rd sg.). This, however, only happened in the present singular, not in the 2nd plural (komm-t) nor in the past (koum-s: 2nd sg.past). Obviously, these assimilations are driven by high token frequency. There are no frequency lists for Swiss German, Luxembourgish, and North Frisian, but in spoken German kommen constitutes the third most frequently used verb (Ruoff ²1991), and West Frisian komme is the fifth most frequent one (Van der Veen 1984). Table 3 demonstrates this cross-linguistic parallel:

Table 3. Selective assimilations (bold print) in the present of come in Swiss German, Luxembourgish, and North Frisian (Wiedinghard).

number   person   Swiss German   Luxembourgish   North Frisian
                  cho            kommen          käme
sg.      1        chum-e         komm-en         käm
         2        chun-sch       kënn-s          kän-st
         3        chun-t         kënn-t          kän-t
pl.      1        chöm-e         komm-en         käm-e
         2        chöm-et        komm-t          käm-e
         3        chöm-e         komm-en         käm-e

In Icelandic, it is the 2nd sg. imperative which contains this exceptionality, although hidden by writing: komdu [ˈkʰɔndʏ].

In Swiss German, the very frequent verb nää 'take' shows the same exception (nin-sch, nin-t), but no other verb with stem-final [m] does: In all these cases, the partial assimilation worked regressively. Some Low German dialects, however, show the reverse direction, in this case again restricted to kuemmen 'come' and niëmen 'take', where it is the inflectional ending -t (3rd sg.) which is bilabialized to -p after -m: kümp < kümt 'comes', and nimmp < nimmt 'takes' (Lindow et al. 1998: 122). As these examples prove, it is not only the lexical but also the categorial token frequency which promotes assimilations and brevity; the present is always more strongly affected than the past tense. The same holds for the singular vs. the plural and the indicative vs. the subjunctive. This refutes the causes often mentioned in historical grammars, such as "unstressed position" and the like (Paul 1989: §§23, 109, 287). It is already hard to believe that verbs like come and take should be stressed less than other verbs, but it is still harder to believe that only some parts of a paradigm should be affected by this.


In the case of come, a further reduction can be observed: This verb goes back to the Germanic stem *kwem- (cf. OHG queman 'come').2 In most Germanic languages the stem was first assimilated to *kwom- (progressive rounding of e > o after w) and then reduced to kom-. Other verbs with the same onset, such as OHG quedan 'speak' (which later died out) and quelan 'well, pour', did not undergo this process (cf. NHG quellen and not *kollen). In most of the languages, this onset simplification spread to the whole paradigm, except in Dutch komen, where the past still preserves the old cluster: kwam – kwamen (past sg. – pl.). It does not appear to be a coincidence that the semantically marked and, above all, less token-frequent past preserved the longer (more complex) form.

A third example is presented by what are termed short or contracted verbs, which lost their stem-final consonant and later contracted to a monosyllabic word, such as MHG hân < OHG haben 'have', and lân < OHG lazzan 'let'. Neither [b] nor (geminated) [sː] is normally affected by loss. Here, usually two explanations are offered by historical grammars: an extraordinary loss of the consonant, or an analogical process directed by the traditionally short athematic verbs, such as gan 'go' and stan 'stand' in OHG (Paul 1989: §§283–288; Michels 1979: §284; Mettke 2000: §145). Some further similarities to the athematic verbs, such as (temporary) strong past forms with ablaut in hie 'had' (after gie 'went'), point in the second direction and should therefore be treated in 2.3, since they are morphologically conditioned.

The same holds for the Swedish short verbs, such as ha < hava 'have', ge < giva 'give', bli < bliva 'become', ta < taga 'take', dra < draga 'draw', etc. Here, the new short forms clearly occur more frequently than the still co-existing long forms (Allén 1972; Östman 1992; Teleman et al. 1999, vol. 2). Other forms of the paradigms, especially semantically marked forms (past, supine), preserved the old forms with root-final consonant: haft, gav – givit, blev – blivit, tog – tagit, drog – dragit. As already illustrated, the stem-final consonant plays an important role in the irregularization process.

2.3. Morphological change

As already mentioned, there are contracted verbs such as MHG hân 'have' and lân 'let' which borrowed irregularities from other verbal paradigms. This sort of irregularization can be called the 'active' type because the processes involved are located on the morphological level. However, this term should not be misunderstood in the sense that verbs are acting subjects.

2. There is still one remnant of OHG queman, the NHG adjectival derivation bequem 'comfortable', which preserved the old complex onset.

Frisian provides a striking example of a morphologically conditioned irregularization: In the past tense, all four athematic verbs cluster together by forming a schema of a monosyllabic past tense ending in -ie (without any stem-final consonant) in the singular and with an epenthetic -n- (whose origin remains rather unclear) in the plural. The origin of the diphthong ie has yet to be determined definitively, but it is certain that it arose through various analogical processes. This is confirmed by the weak verb ha 'have', which also shows a short, strong past tense: hie 'had' (Table 4).

Table 4. Analogical processes in the past tense of Frisian verbs.

nr.   infinitive       pret. sg.   pret. pl.
1.    gean 'go'        gie         giene(n)
2.    stean 'stand'    stie        stiene(n)
3.    dwaan 'do'       die         diene(n)
4.    wêze 'be'        wie         wiene(n)
5.    ha 'have'        hie         hiene(n)

In the Frisian case, irregularity is concentrated within a small group of similar verbs whose only common trait is high token frequency. Even originally weak verbs entered this small class and achieved brevity and distinctiveness through the most direct path. Many such irregularizations via analogy to more irregular patterns can be found, especially in Frisian, e.g. the past of jaan 'give', which is joech [ju(ː)χ] and which is the result of a partial analogy to sloech [slu(ː)χ], the past of slaan 'hit'. The same is true for droech [dru(ː)χ], the past of drage 'carry'. There are no other verbs which form their past like this.

The same holds for the infinitive jaan in relation to the infinitives slaan 'hit' and/or dwaan 'do': The Old Frisian form for 'give' was jeva (inf.), which developed to jewa, later contracted to jâ, and eventually adopted the n from the short verbs slân 'hit' or dwân 'do'. The Old Frisian stem forms of the strong verb 'give' were jeva (inf./pres.) – jef (pret.sg.) – jeven (pret.pl.) – (e)jeven (past part.), i.e. the ablaut distinction had been completely eliminated, as the vocalic contrast was leveled by regular sound change. Today this verb, containing the stem forms jaan – joech – jûn, is once again highly differentiated thanks to this astonishing, interparadigmatically motivated irregularization strategy. This is termed "differentiation analogy" (cf. the dashed bold arrows in Figure 1). Most decisive is the fact that not only 'structure' was transferred (as, e.g., in bring – brang after sing – sang) but real substance (jef → joech from droech, sloech; underlined in Figure 1).


Figure 1. Irregularization strategies of Frisian jaan ‘give’.

Another example is the present of Frisian dwaan 'do': doch- in the singular and dogge in the plural. This is the result of analogy to the respective present forms of Frisian sjen 'see' and tsjen 'draw', sjoch-/sjogge and tsjoch-/tsjogge. It is characteristic that only the beginning of the word forms, the most salient part, remains unchanged (cf. Section 3).

These examples of interparadigmatic analogy could be regarded as evidence for the usage-based Associative Network Model in the sense proposed by Bybee (1985, 1988, 1991, 1995, 1996). Due to their high token frequency, the lexical strength of Old Frisian sjen 'see' and tsjen 'draw' presumably was so high that they constituted a type of schema for dwaan 'do' (there is no frequency list for Old Frisian; for Modern West Frisian cf. the "frekwinsjelist" in Van der Veen 1984). In addition, these verbs shared a similar phonological pattern (monosyllables ending in -n), which generally strengthens network connections. Still more remarkable is the case of West Fris. jaan 'give', which was associated with slaan 'hit' and drage 'carry'. In this exceptional case it must be taken into consideration that jâ(n) lacked intraparadigmatic differentiation. The Network Model does not make a strict distinction between regular and irregular morphology, in contrast to other models such as the Dual-Processing Model, which postulates that regularity and irregularity are processed in different modules of grammar. Transitions from regular to irregular and even suppletive verbs can be integrated more easily in the first model, although the latter clearly predicts that high-frequency items behave differently from low-frequency ones. Yet neither of these models was concerned with the emergence of irregularities.


Another way to irregularity via morphological processes is represented by the mixed paradigm of German haben 'have' (the short forms are underlined):

– present: habe vs. hast, hat vs. haben, habt, haben
– past: hatte etc.; subjunctive: hätte etc.
– past participle: gehabt

Only language history tells us the story of this mixed paradigm, which emerged from the combination of two originally complete paradigms, one with the old long forms (MHG haben) and the other with the contracted forms (MHG hân). Thus, OHG haben was split into two MHG paradigms, the first serving as a lexical verb, the second as an auxiliary. In ENHG these paradigms were reunified into one with the above-sketched distribution. Another morphologically based process in this paradigm is umlaut in the subjunctive form hätte, which developed in analogy to the strong verbs, e.g. gäbe 'would give'. Weak verbs never developed umlauted subjunctives.

In a similar way, Swedish ge 'give' mixed short and long variants: There were two variants, geva and giva, and their respective short forms, ge and gi, of which the first (short) one is used to form the infinitive and present, and the other (long) one the past and the supine, i.e. ge and giva were combined. The result is a highly differentiated paradigm: Swedish ge [jeː]/ger [jeːr] – gav [gɑːv] – givit [ˈjiːvɪt] (instead of *gevit). Thus, the paradigm received three different vowels (e – a – i) instead of the original two (e – a – e), and there is in addition an alternation between short and long forms. Here, brevity correlates with the more frequently used categories (present).

2.4. Lexical fusion

As shown for the cases of morphological change, some verbs become interparadigmatically "active", following the example of other verbs. This is the case with lexical mixing, i.e. the combination of two different verbs into one paradigm (inflectional split). Here, English offers a famous example with go – went – gone, which combines go and ME wend, the latter providing the past went and replacing the former OE form eode. The already irregular form eode was replaced by a shorter form.

Janzing (1999) describes two cases of mixing in East Frisian paradigms, loope 'go' and sjoo 'see':

– loope (inf.)/loopt (3rd sg.pres.) versus ron (3rd sg. pret.) – ronen (past part.)


The ron-forms derive from a verb meaning 'run' (cf. NHG rennen). In the case of sjoo, the border separates the past participle from the rest:

– sjoo (inf.)/sjucht (3rd sg.pres.) – saach (3rd sg.pret.) versus bloouked (past part.).

For the second verb there is a corresponding verb in German, blicken 'look'. The most frequent verb, be, is even based on three different IE verbal roots: IE *es- 'be', *bhû- 'grow', and *wes- 'stay'. *Bhû- is only preserved in the West Germanic languages, e.g. in the onset of Fris./NHG bin, bist, Dutch ben, bent, Lux. bas, and in Engl. be, been. Thus, forms such as bin, bist even consist of the syntagmatic combination of two verbs, *bhû- and *es-. This is another (and only rarely described) path to irregularization: Different verbs fuse not only on the paradigmatic but even on the syntagmatic level.

The most extreme case of interparadigmatic fusion is the mixing of borrowed and inherited verbs, such as Swed. bli < bliva, a loan from Middle Low German blîven, which, in spoken Swedish (and Norwegian), is combined with its native predecessor varda to bli/blir – vart – blivit (on the emergence of and some reasons for suppletion, cf. Werner 1977; Ronneberger-Sibold 1987, 1990; Nübling 1999; Mel'čuk 2000).

3. Position of irregularity in the word and in the paradigm

The position of irregularity can be understood both syntagmatically and paradigmatically, i.e. in the word form as well as in the paradigm.

The comparison of six irregular verbs in ten Germanic languages led to the clear result that the place of the deviation in most cases is the stem-final consonant, which is either modified (as in cases of Verner's Law: Engl. was vs. were) or deleted (underlined): Engl. I have vs. she ha_s, ha_d; NHG habe vs. ha_st/ha_t, ha_tte, gehabt. Due to ablaut, the stem vowel often is modified, too, but this only counts as irregularity if there is a unique, isolated alternation, such as NHG kommen – kam – gekommen and werden – wurde – geworden; there is no further [ɔ]–[aː]–[ɔ] or [eː]–[uː]–[oː] alternation in German. Another example is worden – werd – geworden in Dutch. Some languages use quantitative instead of qualitative vocalic alternations, e.g. Engl. say [sei] vs. says, said [sez]: Here, we can observe an alternation between [ei] and [e]. In NHG h[aː]be vs. h[a]st, h[a]t, there is an opposition between long and short vowels. Sometimes both positions are used. Thus the onset remains the only stable part of the whole paradigm. This leads to what is called weak or partial suppletion. In the extreme case of strong suppletion, even the onset is modified. It is characteristic that the effects of phonological change at the beginning of the word often are extended to the whole paradigm, as could be seen in the case of Engl. come, NHG kommen < GMC *kwem-, whereas the results of changes in the stem-final position often are maintained if the word form occurs frequently. Even in the above-mentioned cases of differentiation analogy (Figure 1), it is always the first segment which remains stable. This fact corresponds to word recognition tests which indicate that the word-initial position is most salient (Cutler et al. 1985; Hawkins and Cutler 1988; Fenk-Oczlon 1989).

There also were clear results with regard to the paradigm: The irregular forms nearly always reflect high token frequency, i.e. significantly more exceptions could be expected in the singular than in the plural (there are many uniform plurals, but uniform singulars are not as frequent). The same holds for the present vs. the past and for the indicative vs. the subjunctive, and even for the 3rd and 1st person vs. the 2nd person. This is supported by the fact that the less frequent forms often are the longer forms, as already seen in Dutch with kom-t – kwam ('come': 3rd sg.pres. vs. 3rd sg.past) or NHG steh-t [ʃteːt] – stand [ʃtant] ('stand': 3rd sg.pres. vs. 3rd sg.past) and geh-t [geːt] – ging [gɪŋ] ('go': 3rd sg.pres. vs. 3rd sg.past). In NHG haben, the short forms of the present are only found in the singular, not in the plural. The same holds for Dutch, where komen [koːmən] (as the one and only verb with quantitative distinctions in the present) shows short [ɔ] in the singular, but long [oː] in the plural: ik kom, jij/zij/hij komt [ɔ] vs. wij/jullie/zij komen [oː]. Thus, a type of diagrammatic iconism is observed, which corresponds primarily not to markedness but rather to token frequency, whereby the two often cannot be separated neatly. However, there is evidence to support a higher ranking of the frequency factor (Fenk-Oczlon 1991). Table 5 presents the token frequency of grammatical categories in spoken German.

Table 5. Token frequencies of the grammatical categories of the verb in spoken German (based on Tomczyk-Popińska 1987).

person/number    person       number     tense             mood
cat.      %      cat.   %     cat.  %    cat.        %     cat.      %
3rd sg.   47.8   3rd    66.3  sg.   76   pres.       76.9  ind.      90
1st sg.   23.4   1st    28.7  pl.   24   past        10.3  subj.II   7.3
3rd pl.3  18.5   2nd    5.0              pres.perf.  9.35  imp.      2.7
1st pl.   5.3                            pluperf.    1.25  subj.I    0.06
2nd sg.   4.8                            future      1.13
2nd pl.   0.2

As can be seen, the 3rd sg.pres.ind. constitutes the most frequently used verb form. Indeed, it is precisely this form which carries most of the irregularities, which are usually combined with brevity of expression, cf. Engl. say vs. says [sez], do [duː] vs. does [dʌz], am, are vs. is. German even preserved the so-called wechselflexion, a systematic vowel change still found in the present singular of roughly 55 strong verbs, which separates the first from the third (and the second) person: werfe – wirfst/wirft 'throw' (Nübling 2001b). This wechselflexion pattern, although unproductive for quite some time, is reflected in irregularizations: habe vs. ha_st, ha_t, Swiss German ha vs. hesch/het, gib vs. gi_sch/gi_t 'give', sag vs. saisch/sait 'say', Lux. soën vs. sees/seet 'say', huelen vs. hëls/hëlt 'take, fetch' (an originally weak verb which became completely strong). Dutch did not develop wechselflexion, but the present singular of the second most frequently used verb hebben 'have' contains the special form heeft (cf. Section 2.1 and Table 2). In many cases these processes lead to over-differentiated paradigms, i.e. syncretisms can be broken (cf. Section 5). This even holds for English and Swiss German, which usually don't have wechselflexion distinctions.

With regard to lexical token frequency, there is also a strong, well-known relationship with irregularity. Lexical frequency can change because certain verbs are replaced by new expressions (this happened, e.g., with OHG quedan 'speak'), or because they simply become rarer, as the corresponding activity isn't carried out as often as in earlier times (many manual activities such as 'bake', 'milk', 'clip', etc.). Such verbs with decreasing frequency often develop from strong or even irregular verbs into uniform weak verbs. On the other hand, grammaticalizing verbs increase quickly in frequency. These verbs take the opposite path by becoming strong or even irregular. This is clearly confirmed by have, say, and make in many languages.

4. The categorial split caused by irregularization

Not every exception is accepted by the speakers. A strong correlation between grammatical category and irregular expression can be observed. In most of the cases, tense (and/or aspect) are the categories which often are expressed by exclusive means. All the above-mentioned instances of suppletion involve this category (Engl. go – went, Swed. bli – vart). In the case of the most frequently used verb, Engl. am – are – is, NHG bin – ist – sind, it is, however, person and number – in addition to tense (Engl. is – was, NHG ist – war). In those cases, it could be found that a suppletive expression of less relevant categories (in Bybee's sense) always implies suppletion in more relevant categories (cf. Figures 2 and 3). The same also holds for lower degrees of irregularity: Irregular expression of a less relevant category usually implies irregular expression of more relevant categories; usually the degree of irregularity (and thus fusion) increases from right to left on the scale in Figures 2 and 3.

3. The polite form Sie is included.

Figure 2. Relevance degree of some inflectional categories in Germanic languages (based on Bybee 1985). [Scale from very high to medium relevance: TENSE – MOOD – NUMBER – PERSON]

Figure 3. Inflectional categories and their (suppletive, irregular, regular) expression depending on token frequency and relevance. [Axes: + relevant/fused vs. – relevant/concatenative; low vs. high token frequency]

In the case of mood, languages with synthetic mood expression (such as German) should be looked at. Here, the most frequent verb sein 'be' provides evidence for the implicative scale above (here for the 3rd person sg.): ist – sei – wäre (ind. – subj.I – subj.II). Suppletion is nearly omnipresent in this paradigm. The Faroese past of hava, hevði [hɛijɪ] (past sg.) and høvdu [hœdːʊ] (past pl.), serves as an example of the strengthening of the number distinction. Other languages also have a tendency to separate the singular from the plural. Looking at both the relevance and the frequency aspect, it can be concluded that the degree of irregularity depends first and foremost on the relevance degree of the respective category (Figure 3). Within this category, there are at least two (if not more) subsets, e.g. singular vs. plural within number and different time stages within tense. The borders between these different options are sharpened if the verb increases in its usage frequency. Here, the more frequently used subset of the category is expressed by more irregular and shorter means (3rd person vs. the others, singular vs. plural, present vs. other tenses). Figure 3 shows the dependency of irregularity and suppletion on token frequency and relevance.

5. Irregularization as (over)differentiation

When the forms of a paradigm become shorter, syncretism should be expected. However, in the whole sample of the six verbs investigated here, there was not a single instance of the emergence of homonymy. On the contrary, these frequently used verbs are – although often strongly reduced – sometimes more differentiated than their regular counterparts or predecessors. This can be easily illustrated by some of the examples mentioned already, such as Engl. be, which is more distinctive than all other verbs not only in the present singular (sg.: am, are, is; pl.: are) but also in the past (sg.: was, were, was; pl.: were). In Dutch, the 2nd and 3rd sg.pres. always share the same ending -t, while the 1st sg. has no ending (ik werk-Ø – jij werk-t/hij werk-t). Only in the case of hebben 'have' and zijn 'be' is this syncretism broken: ik heb – jij hebt – hij heeft (stem alternation) and ik ben – jij bent – hij is (different stems, no ending). In the latter case this is due to lexical suppletion, in the former to the preservation of an earlier assimilation. In Frisian, the infinitive is always identical with the uniform present plural (meitsje – meitsje 'make'), except for wêze 'be' vs. binne (pl.), jaan 'give' vs. jouwe (pl.), dwaan 'do' vs. dogge (pl.), and some more examples. In German, the short form hat (3rd sg.) breaks the regular syncretism with the 2nd pl. habt. The same is true for Lux. huelen – hëls, hëlt 'take' (1st–3rd sg.pres.), an originally weak verb (cf. NHG holen, Dutch halen) which adopted wechselflexion; here, hël-t (instead of regular *huelt) breaks the homophony with the 2nd pl. huel-t.

The past houl- shows the typical (analogical) uniform ablaut vowel -ou- (which can be traced back to an overgeneralization of the second ablaut class – cf. Werner 1990); the past participle geholl contains another vowel. Here, an additional form of over-differentiation can be described: In Luxembourgish, a strong tendency to jettison the past and to replace it by the present perfect ("preterite loss") can be observed. Today only a few verbs (between 10 and 20) show the old synthetic forms, above all auxiliaries, including the modal verbs and some highly frequent strong verbs such as goën 'go' and huelen. Thus, huelen has a quantitatively more differentiated paradigm than the average verb. This is the type of overdifferentiation described by Corbett: "Lexemes which are overdifferentiated stand out from the rest of the group in that they have an additional form" (this volume).

English offers a further means of over-differentiation by the morphologization of a category which is typically expressed syntactically: negation on (or even in) auxiliaries. In cases such as don't < do not, shan't < shall not, won't < will not, ain't < am not, the expression of negation even affects the lexical part of the verb. In other cases, the negation particle only cliticizes to the verb (hasn't, isn't, wasn't). Here we can observe a categorial over-differentiation: highly frequent verbs express more grammatical information than average verbs.

Another aspect of over-differentiation concerns the extent to which irregularity affects the word. As already shown, the degree is often much higher than the usual differences between the categories, with suppletion as the extreme. Therefore, the term irregularity, which only reflects the perspective of the language describer, should be replaced by a term which denotes the real function of this phenomenon. Irregularity means distinctiveness and allows for two contradictory but highly functional advantages from the speaker's perspective: brevity without loss of information. This is the main topic of Economy Theory as proposed by Ronneberger-Sibold (1980, 1987, 1990), Werner (1987a, 1987b, 1989, 1991), Harnisch (1988), and Fenk-Oczlon (1990, 1991). Irregular forms deserve our special interest and should not be seen as an accident of language history which has been forgotten by analogy. Irregularity does not happen by chance. It creates specific intraparadigmatic distinctions which follow strict principles such as the relevance of the respective category (e.g. tense before number) and their token frequency (present before past, singular before plural). Above all, lexical token frequency determines whether irregularity, or rather distinctiveness, occurs at all.

Finally, another interesting case of over-differentiation, which cannot be explained by the above-mentioned advantages, should be presented: In our sample, several instances of interparadigmatic irregularization could be found, i.e. some verbs leave their original class and continue as disintegrated and isolated "loners" – without the advantages of increased brevity and distinctiveness. For example, NHG werden – wurde – geworden (as well as Dutch worden – werd – geworden and Frisian wurde – waard – wurden) left its ablaut class (3b) without any apparent reason; all the other members, such as werfen – warf – geworfen ('throw'), continue to use the vowel -a- of the old past singular stem (the second ablaut stage), whereas werden chose the third ablaut stage with -u-. This confirms the results of Bybee (1995), who shows that members of a class with high token frequency are not associated with this class because they are stored separately in the mental lexicon of the speaker. The so-called Network Model does not separate morphology into regular and irregular modules, but rather regards inflected forms as lexical units. Depending on their type and token frequency, they have different lexical connections to other forms (high token frequency leads to lexical strength, high type frequency to schemas). The Dual-Processing Model, proposed by Pinker (1991), Prasada and Pinker (1993), and Pinker and Prince (1988), considers regular morphology as rule-governed and irregular morphology as memory-driven and thus lexically represented. Between both domains, there is a large scale: The left pole is connected with high productivity and high type frequency, the right pole with restricted productivity and high token frequency. German werden was polygrammaticalized in ENHG (as auxiliary for the passive, for the future, as copula, etc.) and thus became very frequent. Although originally formally well integrated in the 3rd ablaut class, it left this class when it became autonomous. When the reduction from four to three ablaut stages took place in ENHG, this fact became obvious: Initially werden did not take part in this far-reaching development, i.e. it resisted analogical leveling.4 Eventually, it chose a different model, the third instead of the second ablaut stage.

The fact that even completely regular but highly frequent verb forms can be stored unanalyzed is important. This must have happened to the originally weak verb have in all Germanic languages, resulting in extremely different paradigms. Every language used its own special irregularization strategies, but not even one “waited” for accumulated sound change.

6. Summary

The most important results from the comparative study of the irregularization strategies of six verbs in ten Germanic languages can be summarized as follows:

– Exceptions are to be expected under high token frequency; these exceptions are highly functional.

4. Even today, speakers know the old four ablaut stages of werden: werden – ward – wurden – geworden, at least passively. This is not the case with any other verb.

How do exceptions arise? On different paths to morphological irregularity 157

– Exceptions emerge not only through accumulated sound changes which were “forgotten” by analogical leveling. They are also sometimes created by extraordinary (accelerated) sound change, by morphological change, and even by lexical fusion (suppletion).

– The position of irregularity is highly predictable and depends on two factors: semantic relevance and token frequency. It could be demonstrated that exceptionality follows the relevance degree of the respective category. Within the same category, the more frequently used subset is expressed exceptionally.

– The term irregularity should be replaced by brevity and distinctiveness. Suppletion, as the most extreme form of irregularity, allows for brevity of expression without the risk of information loss (syncretism). Thus, irregularity must be understood as a protection against syncretism. In contrast to Corbett (this volume), there was no interaction of suppletion and syncretism, but many examples of the interaction of suppletion and over-differentiation.

– In many cases, irregularity even creates more distinctions than are usually made in a paradigm, i.e. many instances of over-differentiation were observed, even including the morphological expression of categories which are usually realized syntactically (negation on English auxiliaries). The formal distinctions often are deeper and sharper than in common paradigms. The expression of the category at the word level tends to modify the lexical stem, starting with the stem-final consonant and the vowel and ending with the word onset (suppletion). On the whole, allomorphy and the dissolution of morphological structures (amorphous morphology) emerge.

Abbreviations

Dan.     Danish
ENHG     Early New High German
Far.     Faroese
Fris.    (West) Frisian
GMC      Germanic
IE       Indo-European
Lux.     Luxembourgish
MHG      Middle High German
ModFris. Modern (West) Frisian
NHG      New High German
OFris.   Old Frisian
OHG      Old High German
Swed.    Swedish


References

Bybee, Joan
1985 Morphology. A Study of the Relation between Meaning and Form. Amsterdam: Benjamins.

Bybee, Joan
1988 Morphology as lexical organization. In Theoretical Morphology. Approaches in Modern Linguistics, Michael Hammond and Mickey Noonan (eds.), 119–141. San Diego: Academic Press.

Bybee, Joan
1994 Morphological universals and change. In The Encyclopedia of Language and Linguistics, Vol. 5, R.E. Asher (ed.), 2557–2562. Oxford: Pergamon.

Bybee, Joan
1995 Regular morphology and the lexicon. Language and Cognitive Processes 10 (5): 425–455.

Bybee, Joan
1996 Productivity, regularity and fusion: How language use affects the lexicon. In Trubetzkoy’s Orphan, R. Singh (ed.), 247–269. Berlin: Mouton de Gruyter.

Bybee, Joan, and Dan I. Slobin
1982 Rules and schemas in the development and use of the English past tense. Language 58: 265–289.

Bybee, Joan, and Jean E. Newman
1995 Are stem changes as natural as affixes? Linguistics 33: 633–654.

Clahsen, Harald, and Monika Rothweiler
1992 Inflectional rules in children’s grammars: Evidence from the development of participles in German. Yearbook of Morphology 1992: 1–34.

Corbett, Greville G.
this vol. Higher order exceptionality in inflectional morphology.

Cutler, Anne, John Hawkins, and Gary Gilligan
1985 The suffixing preference: A processing explanation. Linguistics 23: 723–758.

Fenk-Oczlon, Gertraud
1989 Geläufigkeit als Determinante von phonologischen Backgrounding-Prozessen. Papiere zur Linguistik 40 (1): 91–103.

Fenk-Oczlon, Gertraud
1990 Ökonomieprinzipien in Kognition und Kommunikation. In Spielarten der Natürlichkeit – Spielarten der Ökonomie. Beiträge zum 5. Essener Kolloquium über “Grammatikalisierung: Natürlichkeit und Systemökonomie”, Norbert Boretzky, Werner Enninger and Thomas Stolz (eds.), 37–51. Bochum: Brockmeyer.

Fenk-Oczlon, Gertraud
1991 Frequenz und Kognition – Frequenz und Markiertheit. Folia Linguistica 25: 361–394.

Fertig, David
1998 Suppletion, natural morphology, and diagrammaticity. Linguistics 36: 1065–1091.

Harnisch, Rüdiger
1988 Natürliche Morphologie und morphologische Ökonomie. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 41: 426–437.

Hawkins, John, and Anne Cutler
1988 Psycholinguistic factors in morphological asymmetry. In Explaining Language Universals, John Hawkins (ed.), 280–317. Oxford: Blackwell.

Janzing, Gereon
1999 Das Friesische unter den germanischen Sprachen. Freiburg: Gaggstatter.

Lindow, Wolfgang, Dieter Möhn, Hermann Niebaum, Dieter Stellmacher, Hans Taubken and Jan Wirrer
1998 Niederdeutsche Grammatik. Leer: Schuster.

Maiden, Martin
1992 Irregularity as a determinant of morphological change. Journal of Linguistics 28: 285–312.

Mayerthaler, Willi
1981 Morphologische Natürlichkeit. Wiesbaden: Athenaion.

Mel’čuk, Igor
2000 Suppletion. In Morphology. Vol. 1, Geert E. Booij, Christian Lehmann and Joachim Mugdan (eds.), 510–521. Berlin/New York: de Gruyter.

Mettke, Heinz
2000 Mittelhochdeutsche Grammatik. Tübingen: Niemeyer.

Michels, Victor
1979 Mittelhochdeutsche Grammatik. 5th ed. Heidelberg: Winter.

Nübling, Damaris
1999 Zur Funktionalität von Suppletion. In Variation und Stabilität in der Wortstruktur, Matthias Butt and Nanna Fuhrhop (eds.), 77–101. Hildesheim/Zürich/New York: Olms.

Nübling, Damaris
2000 Prinzipien der Irregularisierung. Eine kontrastive Untersuchung von zehn Verben in zehn germanischen Sprachen. Tübingen: Niemeyer.

Nübling, Damaris
2001a The development of “junk”. Irregularization strategies of have and say in the Germanic languages. In Yearbook of Morphology 1999, Geert Booij and Jaap van Marle (eds.), 53–74. Dordrecht: Springer.

Nübling, Damaris
2001b Wechselflexion Luxemburgisch – Deutsch kontrastiv: ech soen – du sees/si seet vs. ich sage, du sagst, sie sagt. Zum sekundären Ausbau eines präsentischen Wurzelvokalwechsels im Luxemburgischen. Sprachwissenschaft 26 (4): 433–472.

Östman, Carin
1992 Den korta svenskan. Om reducerade ordformers inbrytning i skriftspråket under nysvensk tid. Diss. Uppsala.

Paul, Hermann
1989 Mittelhochdeutsche Grammatik. 23rd ed. Tübingen: Niemeyer.

Pinker, Steven
1991 Rules of language. Science 253: 530–535.

Pinker, Steven, and Alan Prince
1988 Regular and irregular morphology and the psychological status of rules of grammar. In Proceedings of the 17th Annual Meeting of the Berkeley Linguistics Society, L.A. Sutton, C. Johnson, and R. Shields (eds.), 321–251. Berkeley: BLS.

Prasada, Sandeep, and Steven Pinker
1993 Generalisation of regular and irregular morphological patterns. Language and Cognitive Processes 8: 1–56.

Ronneberger-Sibold, Elke
1980 Sprachverwendung – Sprachsystem. Ökonomie und Wandel. Tübingen: Niemeyer.

Ronneberger-Sibold, Elke
1987 Verschiedene Wege zur Entstehung von suppletiven Flexionsparadigmen. Deutsch gern – lieber – am liebsten. In Beiträge zum 3. Essener Kolloquium über Sprachwandel und seine bestimmenden Faktoren, Norbert Boretzky, Werner Enninger and Thomas Stolz (eds.), 243–264. Bochum: Brockmeyer.

Ronneberger-Sibold, Elke
1990 The genesis of suppletion through morphological change. In Proceedings of the 14th International Congress of Linguists, Berlin/GDR, August 10–15, 1987. Vol. 1, Werner Bahner, Joachim Schildt and Dieter Viehweger (eds.), 628–631. Berlin: Akademie-Verlag.

Ruoff, Arno
1990 Häufigkeitswörterbuch gesprochener Sprache. 2nd ed. Tübingen: Niemeyer.

Teleman, Ulf, Staffan Hellberg, and Erik Andersson
1999 Svenska Akademiens grammatik. Vol. 2: Ord. Stockholm: Svenska Akademien.

Tomczyk-Popińska, Ewa
1987 Linguistische Merkmale der deutschen gesprochenen Standardsprache. Deutsche Sprache 15: 336–357.

van der Veen, K.F.
1984 Frekwinsjeûndersyk foar it Frysk. In Miscellania Frisica, in nije bondel Fryske stúdzjes, N.R. Århammer et al. (eds.), 205–218. Assen: Van Gorcum.

Werner, Otmar
1977 Suppletivwesen durch Lautwandel. In Akten der 2. Salzburger Frühlingstagung für Linguistik, Gaberell Drachmann (ed.), 269–283. Tübingen: Narr.

Werner, Otmar
1987a The aim of morphological change is a good mixture – not a uniform language type. In Papers from the 7th International Conference on Historical Linguistics, Anna Giacalone Ramat, Onofrio Carruba and Giuliano Bernini (eds.), 591–616. Amsterdam: Benjamins.

Werner, Otmar
1987b Natürlichkeit und Nutzen morphologischer Irregularität. In Beiträge zum 3. Essener Kolloquium über Sprachwandel und seine bestimmenden Faktoren, Norbert Boretzky, Werner Enninger and Thomas Stolz (eds.), 289–316. Bochum: Brockmeyer.

Werner, Otmar
1989 Sprachökonomie und Natürlichkeit im Bereich der Morphologie. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 42: 34–47.

Werner, Otmar
1990 Die starken Präterita im Luxemburgischen: Ideale Analogie oder vergeblicher Rettungsversuch? German Life and Letters 43: 182–190.

Werner, Otmar
1991 Sprachliches Weltbild und/oder Sprachökonomie. In Akten des VIII. Internationalen Germanisten-Kongresses, Tokyo 1990. Vol. 4, Eijiro Iwasaki and Yoshinori Shichiji (eds.), 305–315. München: iudicium.

Weyerts, Helga, and Harald Clahsen
1994 Netzwerke und symbolische Regeln im Spracherwerb: Experimentelle Ergebnisse zur Entwicklung der Flexionsmorphologie. Linguistische Berichte 154: 430–460.

Wurzel, Wolfgang Ullrich
1984/2001 Flexionsmorphologie und Natürlichkeit. Berlin: Akademie-Verlag.

Wurzel, Wolfgang Ullrich
1990 Gedanken zu Suppletion und Natürlichkeit. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 43: 86–91.

Wurzel, Wolfgang Ullrich
1994a Skizze der natürlichen Morphologie. Papiere zur Linguistik 50: 23–50.

Wurzel, Wolfgang Ullrich
1994b Grammatisch initiierter Wandel. (Sprachdynamik. Auf dem Wege zu einer Typologie sprachlichen Wandels. Vol. 1), Benedikt Jeßing (ed.). Bochum: Brockmeyer.


On the role of subregularities in the rise of exceptions

Wolfgang U. Dressler

It is Damaris Nübling’s great merit to have focused, in a series of articles, in her book (Nübling 2000), and now in this paper, on certain diachronic developments and their synchronic results which have been marginalised or even neglected in much of the morphological literature. It is an even greater merit of hers to have elaborated well-thought-out and well-documented solutions. Several of these solutions find parallels in other languages.

For example, her case of the “irregularizing” expansion of German “Wechselflexion”, which sets the second and third person singular present apart from the rest of the present paradigm, as in German helfen ‘to help’, 1.sg. helf-e, 2.sg. hilf-st, 3.sg. hilf-t, finds many parallels in the Romance languages (cf. Maiden 2004, first thematized by Matthews 1981). Thus, in the paradigm of Italian sap-ere ‘to know’, only the singular and the third plural present indicative have short forms: so, sai, sa, sanno, as a result of pattern extension.

Or her instances (§5) of formal over-differentiation in irregular short verbs have parallels in many languages, most spectacularly in the present indicative of the French verb ‘to be’: je suis, tu es = il/elle est, nous sommes, vous êtes, ils sont, where two forms are differentiated in the singular, which happens only in two other short verbs, in contrast to the syncretism of the whole singular paradigm in other verbs, plus the third plural as the default. Moreover, all non-homophonous forms (with the exception of the second plural) are in relations of strong or weak suppletion.

Lexical fusion of different roots in the verb ‘to be’, i.e. of the Proto-Indo-European roots *es-, *bhu-, *wes- (§2.4), in the West Germanic languages finds a parallel in the fusion of the first two roots in Latin as well as in the Italic, Celtic, Slavic and Baltic languages.

Thus I can largely agree with many of her solutions. In comparison with these agreements, my disagreements with her paper (in line with Gaeta 2006: 18–20) are less numerous but still worth putting forward:

First of all, in regard to most of her examples and even types of examples, her term irregularity is a misnomer. It should rather be called subregularity,


because she mostly describes patterns (and their diachronic origins) which hold for several items, at least for a small group of them. Examples which Nübling (1995, 2000) has studied in a pioneering way are short verbs in several Germanic languages, such as OHG han ‘have’ and lan ‘let’, a notion that we have applied to Italian verbs: Inf. ess-ere, 3.sg.pres. è, avere – ha ‘have’, fare – fa ‘make’, and-are – va ‘go’, pot-ere – può ‘can’, sap-ere – sa ‘know’, vol-ere – 2.sg. vuoi ‘want’ (Pöchtrager et al. 1998). Nübling’s (§§1 and 5) proposal to replace the notions of irregularity and irregularization with those of differentiation and distinctiveness adds relevant properties or functions but does not change the non-distinction between irregularity as the impossibility of accounting for a pattern by rules only and subregularity as accountability by minor rules.

Second, when she deals with the traditionally best-studied problem of irregularity, her “passive type” of irregularity (Nübling §2.1), i.e. the maintenance of irregularity due to high token frequency of the irregular forms, I cannot see why this should involve a “path to irregularization/exceptionality”. This is an instance of preservation, but not of origin, of irregularity. What is more interesting is her subtype of a gradual loss of members of a small subregular group, until only one member remains, which is then truly irregular. This is exemplified (§2.1) with Dutch hebben ‘to have’, but what is missing is a demonstration that such a longest-persisting member of a small subregular group of items is always, or at least usually, the member with the highest token frequency of all the items of this group. Next, it is unclear whether what counts should be the high token frequency of the lexeme, i.e. of its whole inflectional paradigm, or the high token frequency of the subregular/irregular parts of the paradigm, or the ratio between the irregular/subregular and the more regular/general parts of the paradigm.

Third, there is the recently more and more discussed problem of whether the explanatory factor is high token frequency or the degree of markedness. Nübling (§§2.1, 2.2, 3) seems to favour frequency, but does not go as far as Haspelmath (2006) in his attack on markedness. What the advocates of “frequency first” tend to overlook is, first, that ease of access in processing is more influenced by the markedness effect of early age of acquisition than by token frequency (cf. Bonin et al. 2004). Second, they nearly always have to rely on frequency measures that are not at all representative of the token frequencies of the items in the real input of native listeners of the respective language. And third, the impact of token frequencies on our mental lexicon is derived not only from what we hear or read, but also from what we produce and think about. And this depends on the pragmatic importance of concepts, which is the base of markedness (cf. Mayerthaler 1981).


Fourth, her reliance on Bybee’s (1985) explanatory principle of relevance for explaining the distribution of irregularity between regularity and suppletion cannot be transferred to all similar instances in other languages. For example, as in the case of over-differentiation cited for the French verb ‘to be’, the most striking cases of Italian suppletion and irregularity occur in the categories of person and number, and not in aspect and tense, as predicted by Bybee and Nübling. This holds particularly for the opposition between root- and stem-based inflection, regular in, e.g., Italian 2.sg.pres. am-i, pl. am-a-te ‘you love’, but suppletive in 2.sg.pres. esc-i ‘you go out’, pl. usc-i-te (cf. Dressler and Thornton 1991, Maiden 2004).

Fifth, Nübling (§1) criticizes adherents of Natural Morphology for not having tried to account for irregularity. This is, however, true neither for the second point above, the preservation of irregularity, nor for many subtypes of subregularity (the first point above), which covers the greatest part of Nübling’s areas of irregularity. Particularly in the subtheory of static morphology (Dressler 2003, Kilani-Schoch and Dressler 2002, 2005, Aguirre and Dressler 2006), i.e. of lexically stored, prototypically unproductive morphology, we have focused on parameters of phonological similarities (cf. Nübling §2.3), for example on rime words such as the two Latin verb pairs f/caveo, f/cavi, f/cautum and m/voveo, m/vovi, m/votum. Or Nübling’s (§3) remarks on the word-initial consonantal onset as the preferred position of lexical identification of a lexeme have been antedated by the same authors (cf. also Dressler 1987: 116f.). Or Nübling’s (§5) notion of “isolated loners” among verbs is paralleled by the notion of isolated paradigms proposed by Dressler (since 1997).

Also, categorical subgroups of verbs are a feature of static morphology. Thus the secondary umlaut in the German second conjunctive hätte (preterite indicative hatte ‘had’) should be due to analogy with the fellow auxiliary wäre ‘were’ rather than with strong verbs, such as gäbe ‘gave’, as Nübling (§2.3) proposes.

An important difference in our views regards storage: whereas Nübling (§5) appears to equate stored with unanalyzed, the model of static morphology allows partial analysis of fully stored forms, e.g. by establishing relations based on rimes or other similarities.

When Nübling (§2.2) argues for accelerated sound change as one path of irregularization, this finds a parallel in the notion of lexicalisation out of fast/sloppy speech of very frequent words, especially of function words such as auxiliaries (cf. Nübling’s table 4), as discussed in Dressler (1973) and Dressler and Moosmüller (1992).

These remarks may show that Nübling’s object of study has not been neglected within Natural Morphology and Phonology, although it has clearly not been focused upon in such a systematic way as in her own investigations.


References

Aguirre, Carmen, and Wolfgang U. Dressler
2006 On Spanish verb inflection. Folia Linguistica 40: 75–96.

Bonin, Patrick, Christopher Barry, Alain Méot and Marylène Chalard
2004 The influence of age of acquisition in word reading and other tasks: A never ending story? Journal of Memory and Language 50: 456–476.

Bybee, Joan
1985 Morphology. Amsterdam: Benjamins.

Dressler, Wolfgang U.
1973 Pour une stylistique phonologique du latin. Bulletin de la Société de Linguistique de Paris 68: 129–145.

Dressler, Wolfgang U.
1987 Word formation (WF) as part of natural morphology. In Leitmotifs in Natural Morphology, Wolfgang U. Dressler, Willi Mayerthaler, Oswald Panagl and Wolfgang Ullrich Wurzel (eds.), 99–126. Amsterdam: Benjamins.

Dressler, Wolfgang U.
2003 Latin static morphology and paradigm families. In Language in Time and Space, Brigitte L.M. Bauer and Georges-Jean Pinault (eds.), 87–99. Berlin/New York: Mouton de Gruyter.

Dressler, Wolfgang U., and Sylvia Moosmüller
1992 Sociolinguistic parameters of spoken Austrian German. In Studies in Spoken Languages: English, German, Finno-Ugric, Miklós Kontra and Tamás Váradi (eds.), 61–81. Budapest: TA Nyelvtudományi Intézet.

Dressler, Wolfgang U., and Anna M. Thornton
1991 Doppie basi e binarismo nella morfologia italiana. Rivista di Linguistica 3: 3–22.

Dressler, Wolfgang U., and Anna M. Thornton
1997 On productivity and potentiality in inflectional morphology. CLASNET Working Paper 7. Montréal.

Gaeta, Livio
2006 How to live naturally and not be bothered by economy. Folia Linguistica 40: 7–28.

Haspelmath, Martin
2006 Against markedness (and what to replace it with). Journal of Linguistics 42: 25–70.

Kilani-Schoch, Marianne, and Wolfgang U. Dressler
2002 Affinités phonologiques dans l’organisation de la morphologie statique: l’exemple de la flexion verbale française. Folia Linguistica 36: 297–312.

Kilani-Schoch, Marianne, and Wolfgang U. Dressler
2005 Morphologie naturelle et flexion du verbe français. Tübingen: Narr.

Maiden, Martin
2004 When lexemes become allomorphs. On the genesis of suppletion. Folia Linguistica 38: 227–256.

Matthews, Peter
1981 Present stem alternations in Italian. In Logos Semantikos. Vol. IV, Horst Geckeler, Brigitte Schlieben-Lange, Jürgen Trabant and Harald Weydt (eds.), 57–65. Berlin and Madrid: de Gruyter and Gredos.

Mayerthaler, Willi
1981 Morphologische Natürlichkeit. Wiesbaden: Athenaion.

Nübling, Damaris
1995 Die Kurzverben im Schweizerdeutschen. In Alemannische Dialektforschung, Heinrich Löffler (ed.), 165–179. Tübingen: Niemeyer.

Nübling, Damaris
2000 Prinzipien der Irregularisierung. Eine kontrastive Untersuchung von zehn Verben in zehn germanischen Sprachen. Tübingen: Niemeyer.

Pöchtrager, Markus A., Csanád Bodó, Wolfgang U. Dressler and Teresa Schweiger
1998 On some inflectional properties of the agglutinating type illustrated from Finnish, Hungarian and Turkish inflection. Wiener Linguistische Gazette 62–63: 57–92.


Statement on the commentary by Wolfgang U. Dressler

Damaris Nübling

I thank Wolfgang U. Dressler for his useful commentary on my paper and fully agree with him in claiming that many more languages still have to be investigated regarding irregularities and irregularization strategies. The articles of Martin Maiden concerning the Romance languages are highly instructive in showing the different paradigmatic patterns and their emergence. Dressler’s comment that the Romance languages show special classes of highly irregular short verbs also confirms my impression that the described irregularization paths are not restricted to the family of the Germanic languages. Nevertheless, one should be careful about drawing further, possibly universal, conclusions. Therefore, a comparison of these ten verbs in as many (genetically and typologically) different languages as possible is a desideratum.

In the following, I respond to his disagreements with my paper:

1. “The term ‘irregularity’ as a misnomer”: There are many concepts of irregularity, ranging from unproductive paradigms which are still integrated in big classes to strong suppletion, i.e. completely isolated paradigms with idiosyncratic behaviour. Dressler favours the latter notion: only isolated paradigms with unique inflectional behaviour are labelled “irregular”. I think that the notion of irregularity should be kept flexible. If it were synonymous with suppletion, we would not need the term irregularity. In my opinion, irregularity includes different stages within so-called static (non-productive) morphology, and especially the term irregularization should relate to different stages of this process. This can be compared to the notion of grammaticalization, which also refers to a large number of rather disparate phenomena (which, at first sight, sometimes even seem unrelated) and which does not only describe the final stages of grammaticalization. Irregularization comprises phenomena such as: first of all, the absence of morphological rules; furthermore, morphological unproductivity; decreasing type frequency; decreasing intraparadigmatic as well as interparadigmatic similarities; increasing allomorphy, including stem variants; increasing affection of the whole word form through the fusion of grammatical and lexical material; and reduced morphological segmentability – in sum: the item behaves more and more like a lexical unit and is processed as such. The development of – in Dressler’s terms – subregularity is the first, obligatory step towards irregularity. Thus, subregularity only constitutes an early stage on the long irregularization scale.

2. “The maintenance of irregularity due to high token frequency is not a path to exceptionality”: Analogy is the most powerful means to produce regularity in the sense of intra- and/or interparadigmatic similarity. Sometimes, analogy becomes so frequent that it nearly looks like a rule, cf. the levelling of the number ablaut grades in Early New High German. Another example is superstable markers spreading to nearly all paradigms except for a few items – basically the most frequently occurring ones – which resist this levelling process. In Middle High German, there still were four so-called athematic verbs with the suffix -n (< OHG -m) in the 1.sg. of the present (ga-n, sta-n, tuo-n, bi-n ‘I go, stand, do, am’). Afterwards, three of these four verbs changed to the “common” strong verb suffix, with the result that NHG bin ‘I am’ is the only remnant of the old “regular” form. It is not by chance that bin, which belongs to the most frequent verb sein ‘to be’, was “forgotten” by analogy. There are many irregular and even suppletive verb forms which developed completely regularly but never underwent analogy. This “passive” way to irregularity should be taken seriously because it often leads to complete isolation of the respective items. Dutch hebben is another example; hebben is the second most frequently used verb (after zijn ‘to be’). High token frequency is the best protection against analogy. The question of what counts as high token frequency, whether the whole paradigm or only some parts of it, can be clearly answered: it is both, the so-called lexical and the categorial or grammatical token frequency. Thus, the 3.sg.present form of a frequently used verb is most prone to be an irregular one and, at the same time, to exhibit short expression. This connection is explained in Chapter 2.2 and demonstrated by extraordinary assimilations in the 3.sg.present of Come and Take in Swiss German, Luxembourgish, North Frisian, Icelandic, and Low German (cf. Table 3 and the part below).

3. Concerning the explanatory factor of high token frequency, I must admit that I fully agree with Haspelmath’s and, by the way, Fenk-Oczlon’s (1991) concept of “frequency first” instead of different markedness degrees and conceptions (compare, e.g., the discussion of whether the 1st or the 3rd or the 2nd person is the least marked one). Fenk-Oczlon provides many examples of the frequency-first principle making cases of so-called “local markedness” (Tiersma 1982) and “markedness reversals” (Mayerthaler 1981, Wurzel 1984/2001) unnecessary. I am convinced that token frequency counts are indeed highly reliable and much more exact and measurable than pragmatic considerations and markedness determinations, which rarely are defined precisely (and hence vary depending on the linguist’s aim). Moreover, token frequency even allows for predictions. In order to explain, e.g., accelerated sound change such as the above-mentioned extraordinary assimilations in the most frequently used paradigmatic slots of frequent verbs such as Come and Take, the respective forms have to be really applied, i.e. materialized (pronounced), to undergo these actual (articulatory) changes, which often consist of ease of articulation. Token frequencies of heard and read items may be important for mental storage, but they do not directly lead to what is described in my paper. Surely, it may be interesting to know the many sources of high token frequency, which partly may be found in pragmatics. However, this does not restrict the fact that frequency of usage is one of the most important driving forces of language change.

4. “Bybee’s principle of relevance cannot be transferred to other languages”: My claim that irregularity in less relevant categories includes irregularity in more relevant categories was only deduced from the Germanic languages. The existence of some exceptions in other languages does not disprove this principle. It would be very worthwhile to test these correlations systematically on the basis of the Romance (and other inflecting) languages. In addition to these functional “regularities”, there often are paradigmatic patterns which vary from language to language. In most cases, their emergence can only be explained by language history. Maiden describes such patterns for the Romance languages, e.g. the special behaviour of the 1st and 2nd pl. in French and Spanish verbs due to different accent positions in Latin. In German and Luxembourgish, it is the so-called Wechselflexion pattern which often exposes and differentiates the 3rd and 2nd sg.present from the remaining paradigm by umlauted vowels (the original reason for this are OHG endings containing -i-). This is the reason for the fact that the 2nd sg. always follows the 3rd sg. in its inflectional behaviour, although it is only the 3rd sg. which is extremely frequent (for further details, see Nübling 2000, 2001).

5. “It is not true that Natural Morphology is not interested in irregularity”: There are different camps in Natural Morphology. I referred mostly to Wurzel’s (1984/2001) concept because it is most concerned with German and other Germanic languages. Firstly, Wurzel claimed that irregularity, including suppletion, is unnatural, i.e. highly marked. In a later phase, he established a so-called suppletion domain which was excluded from natural principles. This domain was never defined exactly, and gradual transitions to less irregular conditions were not considered. These clear-cut distinctions, instead of flexible scales and schemes, can be found again in the inflectional macro- and microclass systems of Kilani-Schoch & Dressler (2005). I think it would be very promising to compare the average token frequency rates of the members of the different classes, ranging from huge productive classes (such as French verbs in -er) to so-called “ssssscl[asses]” (“s” = ‘sub’) (such as French courir, venir, tenir). My interest is to show how and possibly why so many small subsub- and so forth classes develop, how they are interrelated, how they cluster with other miniclasses, and which categories are most affected by irregular expression. Thus, the “dynamics of static morphology” is the main topic of my paper.

References (in addition to the references in my paper)

Kilani-Schoch, Marianne, and Wolfgang U. Dressler
2005 Morphologie naturelle et flexion du verbe français. Tübingen: Narr.

Tiersma, Pieter Meijes
1982 Local and general markedness. Language 58: 832–849.


Taking into account interactions of grammatical sub-systems


Lexical variation in relativizer frequency

Thomas Wasow, T. Florian Jaeger, and David M. Orr

Abstract. An exception to a non-categorical generalization consists of a lexical item that exhibits the general pattern at a rate radically different – either far higher or far lower – from the norm. Lexical differences in noun phrases containing non-subject relative clauses (NSRCs) correlate with large differences in the likelihood that the NSRC will begin with that. In particular, the choices of determiner, head noun, and prenominal adjective in an NP containing an NSRC may dramatically raise or lower rates of that in the NSRC. These lexical variations can be partially explained in terms of predictability: more predictable NSRCs are less likely to begin with that. This generalization can be plausibly explained in terms of processing, assuming that that facilitates processing and/or signals difficulty. The correlations between lexical choices in the NP and the predictability of an NSRC can, in turn, be explained in terms of the semantics of the lexical items and the pragmatics of reference.*

1. Introduction

The notion of exception presupposes that of rule; as Webster (http://www.m-w.com/dictionary) puts it, an exception is “a case to which a rule does not apply”. Linguistic rules (and, more recently, constraints, principles, parameters, etc.) are usually taken to be categorical, at least in the generative tradition. Quantitative data like frequency of usage are widely considered irrelevant to grammar, and gradient theoretical notions like degrees of exceptionality have remained outside of the theoretical mainstream.

* This paper is dedicated to Professor Günter Rohdenburg of Paderborn University, whose sixty-fifth birthday coincided with the completion of the first draft of the paper. Professor Rohdenburg’s seminal studies on English usage and structure have been an inspiration to many data-oriented students of language, ourselves included.

We received help and advice on this work from many people. Paul Fontes did essential work on the maximum entropy predictability model described at the end of section 3. Sandy Thompson was generous in sharing an early version of Fox and Thompson (2007) with us, and in giving us very useful feedback on earlier versions of this work. Additional help and advice was provided by at least the following people: David Beaver, Joan Bresnan, Brady Clark, Liz Coppock, Vic Ferreira, Edward Flemming, Ted Gibson, Jack Hawkins, Irene Heim, Dan Jurafsky, Rafe Kinsey, Roger Levy, Chris Manning, Tanya Nikitina, Doug Rohde, Doug Roland, Neal Snider, Laura Staum, Michael Wagner, and Annie Zaenen. Special thanks also to Heike Wiese and Horst Simon, first for organizing the workshop at which this material was originally presented, and for comments on the written version.

This antipathy towards things quantitative probably has its origins in Chomsky’s early writings, which dismissed the significance of frequency data and statistical models (see, e.g., Chomsky 1955/75: 145–146; 1957: 16–17; 1962: 128; 1966: 35–36). But recently, the availability of large on-line corpora and computational tools for working with them has led some linguists to question the exclusion of frequency data and non-categorical formal mechanisms from theoretical discussions (for example, Wasow 2002 and Bresnan et al. 2007). Moreover, corpus work has revealed that natural-sounding counterexamples to many purportedly categorical generalizations can be found in usage data (Bresnan and Nikitina 2003).

If categorical rules are replaced by gradient models, what becomes of the notion of exceptionality? The paradigmatic instance of an exception is a lexical item that satisfies the applicability conditions of a (categorical) rule, but cannot undergo it. (When rules are categorical, so are exceptions.) The obvious analogue for a non-categorical generalization would be a lexical item whose frequency of occurrence in a given environment is dramatically different from that of other lexical items that are similar in relevant respects.

For example, whereas about 8 % (11,405/146,531) of the occurrences of transitive verbs in the Penn Treebank III corpora (Marcus et al. 1999) are in the passive voice, certain verbs occur in the passive far more frequently, and others far less frequently. Among the former is convict, which occurs in the passive in 33 % (25/76) of its occurrences as a verb; the latter is represented by read, fewer than 1 % (6/788) of whose occurrences as a transitive verb are passive.1
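The notion at work here can be made concrete with a small sketch. This is our illustration, not part of the paper’s method: the counts are the ones reported above, but the four-fold odds threshold standing in for “radically different” is an arbitrary choice of ours.

```python
# Flag verbs whose passive rate differs radically from the corpus-wide norm.
# Counts are those reported in the text for the Penn Treebank III corpora;
# the 4x-odds threshold is illustrative, not the authors'.

def passive_rate(passives, total):
    return passives / total

def is_soft_exception(rate, baseline, factor=4.0):
    # "radically different": here, odds at least `factor` times higher or lower
    odds = lambda p: p / (1 - p)
    return odds(rate) >= factor * odds(baseline) or odds(rate) <= odds(baseline) / factor

baseline = passive_rate(11405, 146531)   # ~8% of transitive-verb tokens are passive
convict  = passive_rate(25, 76)          # ~33%
read     = passive_rate(6, 788)          # <1%

print(is_soft_exception(convict, baseline), is_soft_exception(read, baseline))
```

On these counts both convict and read come out as soft exceptions, in opposite directions, while a verb near the 8 % baseline would not.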

Such skewed distributions, which we will call “soft exceptions”, are by no means uncommon. For grammarians who make use of non-categorical data and mechanisms, soft exceptions constitute a challenge. Simply recording statistical biases in individual lexical entries may be feasible and useful in applications to language technologies. But it is theoretically unsatisfying: we would like to explain why words show radically different proclivities towards particular constructions.

1. These numbers are based on searches of the parsed portions of the Wall Street Journal, Brown, and Switchboard corpora, looking at the ratio of passive verb phrases to the total number of VPs directly dominating the verb in question and an NP (possibly a trace).

The remainder of this paper examines one set of soft exceptions and offersan explanation for them in terms of a combination of semantic/pragmatic andpsycholinguistic considerations.

2. Background

The particular phenomenon we examine is the optionality of relativizers (that or wh-words) in the initial position of certain relative clauses (RCs). This is illustrated in the following examples:

(1) a. That is certainly one reason (why/that) crime has increased.
b. I think that the last movie (which/that) I saw was Misery.
c. They have all the water (that) they want.

We have been exploring what factors correlate with relativizer occurrence in RCs, using syntactically annotated corpora from the Penn Treebank III. The results presented below were obtained using the Switchboard corpus, which consists of about 650 transcribed telephone conversations between pairs of strangers (on a list of selected topics), totalling approximately 800,000 words.

Certain factors make relativizers obligatory, or so strongly preferred as to mask the effects of other factors. As is well-known (see Huddleston and Pullum 2002: 1055), if the RC’s gap is the subject of the RC, then the relativizer cannot be omitted:2

(2) I saw a movie *(that) offended me.3

We have excluded these from our investigations, concentrating instead on what we will call non-subject extracted relative clauses, or NSRCs. We have also excluded examples involving what Ross (1967) dubbed “pied piping”, as in (3):

(3) a. a movie to *(which) we went
b. a movie *(whose) title I forget

2. There are dialects that permit relativizer omission in some RCs with subject gaps, as in the children’s song, There was a farmer had a dog …

3. An asterisk outside parentheses is used to indicate that the material inside the parentheses is obligatory.


Non-restrictive relative clauses are conventionally claimed (Huddleston and Pullum 2002: 1056) to require a wh-relativizer, and this seems to be correct in clear cases:

(4) a. Goodbye Lenin, which I enjoyed, is set in Berlin
b. *Goodbye Lenin, (that) I enjoyed, is set in Berlin

The converse – that wh-relativizers may not appear in restrictive RCs – is a well-known prescription (e.g., Fowler 1944: 635), though it does not appear to be descriptively accurate. Evaluating these claims is complicated by the fact that the boundary between restrictive and non-restrictive modifiers seems to be quite fuzzy. Instead of trying to identify all and only non-restrictive RCs, we excluded all examples with wh-relativizers. This decision was also motivated in part by our observation that disproportionately many of the examples with wh-relativizers were questionable for other reasons (e.g. some embedded questions were misanalyzed as RCs). Thus, our results are based on the comparison between NSRCs with that relativizers and those with no overt relativizer.4

In addition, we excluded reduced subject-extracted and infinitival RCs, since they never allow relativizers (except for infinitival RCs with pied-piping – where the relativizer is obligatory):

(5) a. a movie (*that) seen by millions
b. a movie (*that) to see
c. a movie in *(which) to fall asleep

After these exclusions, our corpus contained 3,701 NSRCs, of which 1,601 (43 %) begin with that and the remaining 2,100 (57 %) have no relativizer. A variety of factors seem to influence the choice between that and no relativizer in these cases. These include the length of the NSRC, properties of the NSRC subject (such as pronominality, person, and number), and the presence of disfluencies nearby. We discuss these elsewhere (Jaeger and Wasow 2006; Jaeger, Orr, and Wasow 2005; Jaeger 2005), exploring interactions among the factors and seeking to explain the patterns on the basis of processing considerations.

The focus of the present paper is on how lexical choices in an NP containing an NSRC can influence whether a relativizer is used. We show that particular choices of determiner, noun, or prenominal adjective may correlate with exceptionally high or exceptionally low rates of relativizers. We then propose that this correlation can be explained in terms of the predictability of the NSRC, which in turn has a semantic/pragmatic explanation.

4. The studies were replicated including the NSRCs with wh-relativizers. The results are qualitatively the same, though the numbers are of course different.

3. Lexical choices and relativizer frequency

Early in our investigations of relativizer distribution in NSRCs we noticed that relativizers are far more frequent in NPs introduced by a or an than in those introduced by the. Specifically, that occurs in 74.8 % (226/302) of the NSRCs in a(n)-initial NPs and in only 34.2 % (620/1813) of those in the-initial NPs. Puzzled, we checked the relativizer frequency for NSRCs in NPs introduced by other determiners. The results are summarized in Table 1, where the numbers in parentheses indicate the total number of examples.

Table 1. NSRC that Rate by NP Determiner.

DETERMINER (FREQUENCY) NSRC WITH THAT

a or an (302) 74.8 %
Possessive pronoun (37) 64.9 %
some (67) 64.2 %
No determiner (428) 63.1 %
this, that, these, those (106) 61.3 %
Numeral (177) 53.1 %
any (55) 49.1 %
no (34) 38.2 %
the (1813) 34.2 %
all (206) 24.3 %
every (68) 14.7 %

The variation in these numbers is striking, but it is by no means obvious why they are distributed as they are. Curious whether other lexical choices within NPs containing NSRCs might be correlated with relativizer frequency, we compared rates of relativizer occurrence for the nouns most commonly modified by NSRCs. Again, we found a great deal of variation, with no obvious pattern.

If individual determiners and head nouns are correlated with such highly variable rates of relativizer presence, we reasoned that the words that come between determiners and head nouns – namely, prenominal adjectives – might show similar variation. And indeed they do: Table 3 shows the relativizer frequencies for the prenominal adjectives that occur most frequently in NPs with NSRCs.


Table 2. NSRC that Rate by NP Head Noun.

HEAD NOUN (FREQUENCY) NSRC WITH THAT

stuff (46) 62.8 %
people (64) 57.1 %
one (106) 51.5 %
problem (44) 50.0 %
something (171) 44.7 %
thing (523) 43.7 %
kind (49) 43.2 %
anything (48) 38.0 %
place (99) 34.4 %
everything (60) 24.6 %
reason (91) 24.0 %
time (247) 14.0 %
way (325) 13.0 %

The differences in relativizer frequency based on properties of the modified NP are immense. For example, NSRCs modifying NPs with the adjective little are on average over eight times more likely to have a relativizer than NSRCs modifying NPs with the adjective last. These differences are not due to chance; chi-square tests on all three of these distributions are highly significant.
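As a rough illustration of such a test (our sketch, not the authors’ code), one can compare just the two extreme adjectives, reconstructing the that-counts from the percentages reported for them (little: 30 of 41 with that; last: 7 of 79):

```python
# A stdlib chi-square test of independence on a 2x2 table. Counts are
# reconstructed from the reported percentages for the adjectives "little"
# and "last"; the full tests in the text cover entire distributions.

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    observed = [a, b, c, d]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

chi2 = chi_square_2x2(30, 11, 7, 72)   # little: 30 that / 11 zero; last: 7 / 72
print(round(chi2, 1))                  # far above 10.83, the df=1 critical value at p = .001
```

Even this two-adjective comparison alone is significant well beyond the p = .001 level.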

Table 3. NSRC that Rate by Prenominal Adjective.

ADJECTIVE (FREQUENCY) NSRC WITH THAT

little (41) 73.2 %
certain (19) 68.4 %
few (20) 65.0 %
different (19) 63.2 %
big (15) 60.0 %
other (87) 49.4 %
same (47) 46.8 %
best (24) 25.0 %
only (158) 24.7 %
first (99) 18.2 %
last (79) 8.9 %

Why should lexical choices in the portion of an NP preceding an NSRC make such a dramatic difference in whether the NSRC begins with that or has no relativizer? How can we explain soft exceptions to the optionality of that in NSRCs? That is, why does the presence of words like a(n), every, stuff, way, little, and last correlate with exceptionally high or low rates of that in NSRCs that follow them within an NP?

4. Predictability

An example from Fox and Thompson (2007) provided a crucial clue. They observed that the following sentence sounds quite awkward with a relativizer.5

(6) That was the ugliest set of shoes (that) I ever saw in my life.

Moreover, the sentence seems incomplete without the relative clause:

(7) That was the ugliest set of shoes.

(7) would be appropriate only in a context in which some comparison collection of sets of shoes is clear to the addressee.

These observations led us to conjecture that the strong preferences in (6) for a relative clause in the NP and for no relativizer in the relative clause might be connected. Looking at the vs. a(n) in our corpus (the contrast that first got us started on this line of inquiry), we found that, of the 30,587 NPs beginning with the, 1813 (5.93 %) contain NSRCs, whereas only 302 (1.18 %) of the 45,698 NPs beginning with a(n) contain NSRCs. This difference (χ2 = 812, p < 0.001) lent plausibility to our conjecture.

Hence, we propose the following hypothesis:

(8) The Predictability Hypothesis: In environments where an NSRC is more predictable, relativizers are less frequent.

This formulation is somewhat vague, since neither the notion of “environment” nor that of “predictability” is made precise. Our initial tests of the hypothesis use simple operationalizations of these notions: the environments are the NPs containing the determiners, nouns, and adjectives described in the previous section, and an NSRC’s predictability in the environment of one of these words is measured by the percentage of the NPs containing that word that are also modified by an NSRC.

5. Fox and Thompson’s account of the preference for no relativizer in (6) is based on the claim that (6) falls at the monoclausal end of a “continuum of monoclausality to bi-clausality”. We discuss this idea in section 5 below.


Figures 1–3 plot cooccurrence with NSRCs against frequency of relativizer absence in NSRCs. The points in Figure 1 represent the eleven determiner types given in Table 1; the points in Figure 2 represent the thirteen head nouns given in Table 2; and the points in Figure 3 represent the eleven adjectives given in Table 3.6 The lines represent linear regressions – that is, the lines represent the best (linear) generalization over the data points in that the total squared distance between the points and the lines is minimized (other tests showed that the trend is indeed linear and not of a higher order). The correlation between NSRC cooccurrence and relativizer absence is significant for all three categories. Correlating the predictability of NSRCs for all 35 words (the determiners, adjectives, and head nouns in our sample) against frequency of relativizer absence is also significant (adjusted r2 = .36, F(1,33) = 19.9, p < .001).7
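The direction of this correlation can be reproduced from numbers given in the text alone. A minimal sketch (ours, using a hand-rolled Pearson r rather than the authors’ regression models) takes the four determiners for which both an NSRC-cooccurrence rate (reported in section 5) and a that-rate (Table 1) are available:

```python
from math import sqrt

# Predictability of an NSRC given the determiner (% of NPs with that determiner
# that contain an NSRC) against relativizer *absence* (100 minus the "that"
# rates in Table 1), for every, all, some, and a(n).
nsrc_rate = [10.40, 6.92, 2.10, 1.18]
absence   = [100 - 14.7, 100 - 24.3, 100 - 64.2, 100 - 74.8]

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson_r(nsrc_rate, absence)
print(round(r, 2))   # strongly positive: more predictable NSRCs drop "that" more often
```

Four points are of course far too few for inference; the point is only that the published figures pattern exactly as the Predictability Hypothesis requires.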

Figure 1. Relativizer Frequency and NSRC Cooccurrence by Determiner; adjusted r2 = .918; F(1,9) = 105.1, p < .001.

6. The mean plots in the three figures represent rather different sample sizes. Determiners are a closed class, so Figure 1 includes almost all NSRCs, whereas Figures 2 and 3 are based on just the head nouns and adjectives that cooccur most frequently with NSRCs. And since almost all NPs include a head noun but most do not have prenominal adjectives, the sample size in Figure 3 is far lower than in Figure 2.

7. After removing two extreme outliers, the adjusted r2 = .56, F(1,31) = 36.1, p < 0.001.

8. Adjusted r2s provide a more reliable measure of the goodness of fit of the model compared to normal, unadjusted r2s, which usually are too optimistic. Generally, r2 estimates the amount of variation in the data accounted for by the model, e.g. an r2 of .91 means that the model accounts for 91 % of the variation.

Figure 2. Relativizer Frequency and NSRC Cooccurrence by Head Noun; adjusted r2 = .35; F(1,11) = 7.4, p = .02.

Figure 3. Relativizer Frequency and NSRC Cooccurrence by Adjectives; adjusted r2 = .32; F(1,9) = 5.8, p = .04.

These results support the Predictability Hypothesis: on average, if a determiner, prenominal adjective, or head noun within an NP increases the likelihood that the NP will contain an NSRC, then it also increases the likelihood that an NSRC in the NP will lack a relativizer.


The evidence presented above supports the Predictability Hypothesis, but the predictability measures employed are rather simple. We used one word at a time in the modified NP to estimate the predictability of an NSRC, and we only used the most frequent types of determiners, adjectives, and head nouns.9 There are several ways to develop more sophisticated models of an NSRC’s predictability that (i) take into account more than one word in the NP at a time, and (ii) are not limited to the most frequent types. We present one such approach, using a machine learning technique. This approach would also enable us to include information relevant to NSRC predictability that is not due to lexical properties of NPs (such as their grammatical function), but the study we report on here is limited to lexical factors.10

We created a maximum entropy classifier (see Ratnaparkhi 1997), which used features of an NP to predict how likely a relative clause was in that NP.11 Features included the type of head noun, any prenominal adjectives, and the determiner, as well as some additional properties, such as whether the head noun was a proper name, and whether the modified NP contained a possessive pronoun. Based on these features, the classifier assigned to each NP in the corpus a probability of having an RC, which we will refer to as its “predictability index”. We then grouped NPs according to these predictability indices, and examined how the relativizer likelihood in an NSRC varied across the groups.12
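In the two-class case, a maximum entropy model over binary features is equivalent to logistic regression. The following toy sketch (entirely invented feature vectors, not the Switchboard data or the authors’ feature set) shows the shape of such a classifier and of the “predictability index” it assigns:

```python
# Toy stand-in for the maximum-entropy classifier described above: binary
# logistic regression trained by gradient ascent on made-up NP feature
# vectors. Labels say whether the (imaginary) NP contained an RC.
from math import exp

# feature order: [bias, det_is_the, det_is_a, head_is_light, has_superlative_adj]
train = [
    ([1, 1, 0, 1, 0], 1),   # "the thing (that) I saw"       -> has RC
    ([1, 1, 0, 0, 1], 1),   # "the ugliest shoes I ever saw" -> has RC
    ([1, 0, 1, 0, 0], 0),   # "a movie"                      -> no RC
    ([1, 0, 1, 0, 0], 0),
    ([1, 1, 0, 0, 0], 0),   # "the movie"                    -> no RC
    ([1, 1, 0, 1, 0], 1),
]

def sigmoid(z):
    return 1 / (1 + exp(-z))

def predict(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

w = [0.0] * 5
for _ in range(2000):                    # plain gradient ascent on log-likelihood
    for x, y in train:
        p = predict(w, x)
        w = [wi + 0.1 * (y - p) * xi for wi, xi in zip(w, x)]

# "predictability index": model probability that an NP contains an RC
light_np = predict(w, [1, 1, 0, 1, 0])   # "the thing ..."
plain_a  = predict(w, [1, 0, 1, 0, 0])   # "a movie"
print(light_np > plain_a)
```

On this toy data the light-noun definite NP receives a much higher predictability index than the bare indefinite, mirroring the groupings described in the text.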

9. Furthermore, we used means to predict means (i.e. we used the mean predictability of an NSRC given a certain word in the modified NP and correlated that against the mean relativizer likelihood for NSRCs modifying those NPs). This method arguably inflates our r2s (i.e. the measure of how much of the variation in relativizer omission is captured by predictability). Elsewhere (Jaeger, Levy, Wasow, and Orr 2005), we address this issue by using binary logistic regressions that predict the presence of a relativizer based on the predictability of the NSRC on a case-by-case basis.

10. Studies involving non-lexical factors are in progress.

11. This study differs from the earlier ones in that it considered the predictability of any relative clause, not just of non-subject relative clauses. This broader criterion provided the classifier with more data on which to base its classifications; the narrower criterion would have required a larger corpus in order to get reliable classifications. So this study is testing for a slightly different correlation than the one stated in the Predictability Hypothesis. However, since the probability that an NP will contain an NSRC and the probability that an NP will contain an RC are highly correlated, a correlation between RC predictability and relativizer absence still supports our claims (cf. also footnote 14). Future research may determine which of the two measures is the better predictor of relativizer frequency.

12. Here we present the result of a classifier trained on the Switchboard corpus; similar results were found for the parsed Wall Street Journal (Penn Treebank III release).


Before checking on relativizer presence, however, we needed to test the accuracy of the predictability indices our classifier assigned. We did this by comparing the predictability index range of each of the groups with the actual rates of RCs in the NPs in the groups. That is, we compared the fraction of the NPs in each group that contained an RC with the range of predictability indices the group represented. As can be seen in Figure 4, the occurrences of RCs in the NPs in each group were consistently within or close to the range assigned by the classifier. This indicates that the predictability indices that the classifier was assigning to the NPs were generally reasonable estimates.

Figure 4. Accuracy of Classifier.

For the NPs containing NSRCs, we then used the classifier’s predictability indices to test whether relativizers are less frequent where RCs are more predictable. We did this by examining the rates of relativizer absence for each of our groupings of NPs. As Figure 5 shows, the results are similar to what we found looking at the most frequent determiners, adjectives, and nouns separately: NSRCs in NPs whose features make them more likely to contain RCs are less likely to have relativizers.

This result provides more support for the Predictability Hypothesis. Furthermore, the fact that a simple maximum entropy classifier provides reasonable measurements of the predictability of relative clauses suggests that predictability in this sense can be computed by means of a standard machine-learning method. Hence, it is reasonable to assume that speakers have access to estimates of how likely an RC is in a given context.


Figure 5. Predictability Index and Relativizer Absence; adjusted r2 = .86; F(1,5) = 36.9, p = .002.

5. Explaining the Correlation

The Predictability Hypothesis seems to be correct: NSRCs evidently begin with that less frequently in environments where an NSRC (or any RC) is more likely to occur. But we have still not answered our original question: Why do different lexical choices correlate with such large differences in relativizer rates? Our answer involves two steps. First, we suggest a processing explanation for the correlation between NSRC predictability and relativizer absence. Second, we argue that there are semantic/pragmatic reasons why certain determiners, head nouns, and adjectives tend to cooccur with NSRCs relatively frequently. Put together, these will constitute an account of why those lexical choices lead to low relativizer rates.

Explaining the presence vs. absence of relativizers in NSRCs in terms of processing can involve considerations of comprehension, production, or a combination of the two. Relativizers could facilitate comprehension by marking the beginning of a relative clause and thereby helping the parser recognize dependencies between the head NP and elements in the NSRC (see Hawkins 2004, for an account along these lines). Relativizers could facilitate production, e.g. by providing the speaker with extra time to plan the upcoming NSRC (see Race and MacDonald 2003, for an account along these lines). Both types of explanation predict that relativizers should occur more frequently in more complex NSRCs (though the factors contributing to comprehension complexity and production complexity might not be identical). Teasing apart the predictions of these different kinds of processing explanations is by no means straightforward (see Jaeger 2005, for much more detailed discussion of this issue).

Whatever kind of processing explanation one adopts, it can be employed to explain why predictability of the NSRC influences relativizer frequency. In a context in which an NSRC has a relatively high probability, the listener gets less useful information from having the beginning of the NSRC explicitly marked. Hence, relativizers do less to facilitate comprehension where NSRCs are predictable. And in environments where NSRCs are likely, speakers would begin planning the NSRC earlier (on average) than in environments where they are less likely. Consequently, they would be less likely to need to buy time by producing a relativizer at the beginning of the NSRC. In short, the correlation between predictability and relativizer absence follows from the hypothesis that relativizers aid processing.

But why do certain lexical choices early in an NP have such a strong effect on the likelihood of there being an NSRC later in the NP? To answer this, it is useful to consider the semantic function of restrictive relative clauses. As the term “restrictive” implies, such clauses characteristically serve to limit the possible referents of the NPs in which they occur. For example, in (8), the NSRC that I listen to restricts the denotation of the NP to a proper subset of music, namely, the music the speaker listens to; without the NSRC, the NP could refer to any or all music.

(8) music that I listen to

Certain determiners, nouns, and adjectives have semantic properties that make this sort of further restriction very natural or even preferred.

Consider, for example, the determiners all and every, which express universal quantification. Universal assertions are generally true of only restricted sets.13 Thus, (9a) is true for many more VPs than (9b).

(9) a. Every linguist we know VP
b. Every linguist VP

13. Students in elementary logic classes are taught that sentences beginning with a universal quantifier almost always have a conditional as their main connective. The antecedent of this conditional is needed to restrict the set of entities of which the consequent is claimed to hold. That is, for a sentence of the form ∀xP(x) to be true, P should include some contingencies. In natural language, NSRCs are one way of expressing such contingencies.
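In first-order notation (the predicate names are ours), the restriction contributed by the NSRC in (9a) simply strengthens the antecedent of that conditional:

```latex
% (9b) "Every linguist VP" -- unrestricted antecedent:
\forall x\,\bigl(\mathrm{linguist}(x) \rightarrow \mathrm{VP}(x)\bigr)

% (9a) "Every linguist we know VP" -- the NSRC adds a conjunct to the
% antecedent, so the claim quantifies over a smaller set:
\forall x\,\bigl(\bigl(\mathrm{linguist}(x) \land \mathrm{know}(\mathit{we}, x)\bigr)
                \rightarrow \mathrm{VP}(x)\bigr)
```

A stronger antecedent makes the conditional easier to satisfy, which is why (9a) is true for more VPs than (9b).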


More generally, universal assertions are more likely to be true if the quantification is restricted, and NSRCs are one natural way to impose a restriction.14 Hence, in order to avoid making excessively general claims, people frequently use NSRCs with universal quantifiers.

Notice that the opposite is true for existentials: (10a) is true for many more VPs than (10b), since (10a) is true if VP holds of any linguist, whereas (10b) is true only if it holds of a linguist we know.

(10) a. A linguist VP
b. A linguist we know VP

So while restricting a universally quantified assertion increases its chances of being true, restricting an existentially quantified assertion reduces its chances of being true. Correspondingly, every and all cooccur with NSRCs relatively frequently (10.40 % and 6.92 %, respectively), whereas a(n) and some rarely cooccur with NSRCs (1.18 % and 2.10 %, respectively).

The definite determiner generally signals that the referent of the NP it is introducing is contextually unique – that is, the listener has sufficient information from the linguistic and non-linguistic context to pick out the intended referent uniquely. But picking out a unique referent often requires specifying more information about it than is expressed by a common noun. NSRCs can remedy this: for example, there are many situations in which (11a) but not (11b) can be used to successfully refer to a particular individual.

(11) a. the linguist I told you about
b. the linguist

Even when the is used with plural nouns (e.g. the linguists) a contextually unique set of individuals is the intended referent. Hence the denotation of the head noun often needs to be restricted, and NSRCs are consequently relatively common.

The pragmatic uniqueness associated with the definite article is very often a result of the fact that the referent of the NP introduced by the has recently been mentioned or is otherwise contextually very salient. In these cases, no restriction of the noun phrase is needed, so NSRCs would not be expected. And while the cooccurs with NSRCs at about three times the baseline rate for all (nonpronominal) NPs, the vast majority – about 94 % – of NPs beginning with the have no NSRC.

14. Other kinds of restrictive modifiers such as subject-extracted relative clauses, prenominal restrictive adjectives, and postnominal PPs are also options. Whenever there is a need to restrict the reference of an NP, each of these options becomes more likely. For the current purpose, it only matters that NSRCs constitute one of these options.

Certain adjectives, however, involve a uniqueness claim for the referent of NPs in which they appear, and these cooccur with NSRCs at far higher rates.15 The most frequent of these is only; others are superlatives like first, last, and ugliest. Our arguments for the relatively high rate of cooccurrence of the with NSRCs apply equally to these adjectives. And since superlatives make sense only with respect to some scale of comparison, the reference set that the scale orders often needs to be explicitly mentioned. Consequently, it is not surprising that these words cooccur with NSRCs at a very high rate. Indeed, we noted in connection with example (6) (following Fox and Thompson 2007) that NPs containing these adjectives sometimes sound incomplete without a modifying relative clause.

The dark bars in Figure 6 show that NPs with the “uniqueness adjectives” only and superlatives have far higher rates of cooccurrence with NSRCs than NPs with other adjectives. And, as the Predictability Hypothesis leads us to expect, the same applies to relativizer absence in those NSRCs (see the lighter bars in Figure 6).

Figure 6. Cooccurrence with NSRC and Relativizer Frequency by Adjective Type.

15. This was pointed out by Fox and Thompson (2007). As noted above, it was their discussion of this observation that led us to the Predictability Hypothesis.


Turning now to the head nouns, one striking fact about the ones that cooccur with NSRCs most frequently is their semantic lightness – that is, nouns like thing, way, time, etc. intuitively seem exceptionally non-specific in their reference.16 Again, there is a semantic/pragmatic explanation for why semantically light nouns would cooccur with NSRCs more than nouns with more specific reference. In order to use these nouns successfully to refer to particular entities, some additional semantic content often needs to be added, and an NSRC is one way of doing this. For example, saying (12a) is less likely to result in successful communication than saying (12b):

(12) a. The thing is broken.
b. The thing you hung by the door is broken.

Testing this intuition requires some basis for designating a noun as semanti-cally light. As a rough first stab, we singled out the non-wh counterparts of thequestion words, who, what, where, when, how, and why. That is, we looked athow often NSRCs occur in NPs headed by person/people, thing, place, time,way, and reason, and compared the results to the occurrence of NSRCs in NPsheaded by anything else. And, of course, we also compared the frequency of rel-ativizers in those NSRCs. The results, shown in Figure 7, are as we expected,

Figure 7. Cooccurrence with NSRC and Relativizer Frequency by Head Noun Type.

16. This was noticed independently (and first) by Fox and Thompson (2007).


Lexical variation in relativizer frequency 191

with a far higher percentage of NSRCs in the NPs headed by the light nouns and a far lower percentage of NSRCs introduced by that.
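The counting procedure just described can be sketched in a few lines of code. The records and counts below are invented purely for illustration (the actual figures come from the corpus study reported in the text, not from this toy data); the two rates per noun class are computed exactly as described: the proportion of NPs containing an NSRC, and the proportion of those NSRCs introduced by that.

```python
# Sketch of the head-noun counts behind Figure 7 (toy data, NOT the
# actual corpus counts). Each record is an NP: its head noun, whether it
# contains a non-subject relative clause (NSRC), and, if so, whether
# that NSRC begins with the relativizer "that".
LIGHT_NOUNS = {"person", "people", "thing", "place", "time", "way", "reason"}

nps = [
    {"head": "thing",  "nsrc": True,  "that": False},
    {"head": "way",    "nsrc": True,  "that": False},
    {"head": "time",   "nsrc": True,  "that": True},
    {"head": "car",    "nsrc": False, "that": None},
    {"head": "house",  "nsrc": True,  "that": True},
    {"head": "people", "nsrc": True,  "that": False},
    {"head": "book",   "nsrc": False, "that": None},
]

def rates(records):
    """Return (NSRC cooccurrence rate, that-relativizer rate within NSRCs)."""
    with_nsrc = [r for r in records if r["nsrc"]]
    nsrc_rate = len(with_nsrc) / len(records)
    that_rate = sum(r["that"] for r in with_nsrc) / len(with_nsrc)
    return nsrc_rate, that_rate

light = [r for r in nps if r["head"] in LIGHT_NOUNS]
other = [r for r in nps if r["head"] not in LIGHT_NOUNS]
print("light nouns:", rates(light))  # higher NSRC rate, lower that-rate
print("other nouns:", rates(other))
```

On this toy sample the light nouns show a higher NSRC rate and a lower that-rate, mirroring the qualitative pattern reported for Figure 7; a real replication would of course iterate over parsed corpus NPs rather than hand-built records.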

6. Concluding Remarks

Summing up, the variation in relativizer frequency associated with particular lexical choices of determiners, prenominal adjectives, and head nouns in NPs with NSRCs can be explained in terms of two observations. First, whether a word is likely to cooccur with an NSRC depends in part on the semantics of the word and on what people tend to need to refer to. Second, the more predictable an NSRC is, the less useful a relativizer is in utterance processing. Thus, determiners, adjectives, and nouns that increase the likelihood of a following NSRC decrease the likelihood that the NSRCs following them will begin with relativizers.

Our focus has been on how lexical choices influence relativizer frequency. But many non-lexical factors are also known to be relevant. Ideally, a theory of this phenomenon would bring all of these together and explain variation in relativizer use in terms of a single generalization.

One attempt at a unified account of several diverse factors influencing relativizer frequency is Fox and Thompson (2007). They conducted a detailed analysis of a corpus of 195 NSRCs from informal speech, identifying a variety of factors that correlate with relativizer presence or absence. Adapting a suggestion from Jespersen (1933), they argue that their examples fall at different points along “a continuum of monoclausality”, with more monoclausal utterances being less likely to have relativizers. Among the factors contributing to monoclausality, in their sense, are semantic emptiness of the clause containing the NP that the NSRC modifies (which subsumes semantic lightness of the head noun), simplicity of the head NP, and shortness of the NSRC.

The idea of a one-dimensional scale combining various factors relevant to relativizer omission has obvious appeal, particularly if it can be characterized precisely. However, we have two reservations about Fox and Thompson’s notion of “monoclausality”. First, their characterization is rather vague, and they give no independent way of assessing degree of “monoclausality”. Second, the terminology is confusing, since even the most “monoclausal” of their examples contain (at least) two clauses, in the sense that they have two verbs and two subjects. Nevertheless, we share the intuition that the contents of the two clauses in the more “monoclausal” examples are more closely connected.

We believe that the notion of predictability might provide a precisely definable scale that can do the work of Fox and Thompson’s “monoclausality”.


Predictability has the further advantages that its influence on relativizer absence can be explained in processing terms and that it is often possible to explain why some NSRCs are more predictable than others, as we did above.

Some of the utterances Fox and Thompson consider the most monoclausal are stock phrases or frequently used patterns (e.g. the way it is), which they suggest may be stored as units. Stock phrases are by definition highly predictable, so they fit well with our account. Some higher-level grammatical patterns17 might not be covered by a simple, lexically-based characterization of predictability like the ones we employed. If so, it would suggest that more sophisticated metrics of predictability should be explored. In short, the Predictability Hypothesis of relativizer variation provides testable questions for future research. Next we briefly mention some of them.

First, we believe it is important to investigate what information speakers use to determine the predictability of an NSRC. For example, does the grammatical function of the modified NP matter? Or do speakers only use ‘local’ information to predict NSRCs (i.e. lexical properties of the NP)?18 More specifically, it will be relevant for our understanding of predictability to see whether the factors investigated in this paper interact. In other words, do speakers use simple heuristics like the association of a particular lexical item with the likelihood of an NSRC, or do speakers compute the overall predictability of an NSRC given the combination of lexical items in the modified NP? A further question that deserves attention is whether speakers use some sources of information more than others to compute the predictability of a construction (here: NSRCs). As we have seen in Section 3, predictability information related to determiners seems to correlate much more strongly with relativizer absence than information related to adjectives and the head noun of the modified NP. This may simply be due to the larger sample size available for the estimation of the mean for each of the words. But it is also possible that probability distributions for closed-class items (like determiners) are easier to acquire or are more efficient to use, since there are fewer items in those classes. We hope future research will discover generalizations that go beyond the particular phenomenon discussed here. Ongoing research that addresses some of the above issues and investigates a related phenomenon, complementizer omission, is presented in Jaeger et al. (2005) and Jaeger (2006).

17. We know of no clear cases of such patterns that don’t have any identifying lexical items associated with them. One possible one is the X-er S1, the Y-er S2, as in The bigger they are, the harder they fall. But it is not clear that the two Ss (they are and they fall) should be analyzed as relative clauses here.

18. In this context, it is interesting that research on the effect of predictability on phonetic reduction (e.g., Bell et al. 2003) finds that the best measures of predictability are also the most local (i.e. bigrams).
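The contrast raised above, between a simple per-item heuristic and an estimate computed over the full combination of lexical items, can be made concrete with relative-frequency estimates. The counts below are invented for illustration (the text reports no raw counts, and the study's own estimates came from a maximum entropy classifier, not from this direct relative-frequency method):

```python
# Invented counts of modified-NP contexts: keys are (determiner,
# adjective, head noun), values are (total occurrences, occurrences
# followed by an NSRC). These numbers are hypothetical.
counts = {
    ("the", "only", "way"): (50, 45),
    ("the", "only", "car"): (10, 7),
    ("a",   None,   "way"): (40, 8),
    ("a",   None,   "car"): (60, 3),
}

def p_nsrc_given_det(det):
    """Single-cue heuristic: P(NSRC | determiner) by relative frequency."""
    n = sum(tot for (d, _, _), (tot, _) in counts.items() if d == det)
    k = sum(rc for (d, _, _), (_, rc) in counts.items() if d == det)
    return k / n

def p_nsrc_given_np(det, adj, noun):
    """Combined estimate: P(NSRC | det, adj, noun) by relative frequency."""
    tot, rc = counts[(det, adj, noun)]
    return rc / tot

print(p_nsrc_given_det("the"))                # pooled over all "the" NPs
print(p_nsrc_given_np("the", "only", "way"))  # specific to this combination
```

The empirical question sketched in the text is, in these terms, whether speakers track something like `p_nsrc_given_det` for each item separately or something like `p_nsrc_given_np` over the whole NP; in practice the combined estimate needs smoothing or a model, since most specific combinations are sparse.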

Finally, let us return to the theme of this volume: exceptions. We have shown that the notion of exception can be generalized from hard (categorical) to soft (probabilistic) rules. We explored some soft exceptions to the optionality of relativizers in NSRCs, ultimately concluding that they could be explained in terms of the interaction of the semantics of the “exceptional” words, the pragmatics of referring, and processing considerations.

Those who question the use of gradient models in syntax might suggest that this illustrates an important difference between hard and soft generalizations, namely, that the latter reflect facts about linguistic performance, not competence, and will hence always be explainable in terms of extra-grammatical factors, like efficiency of communication. In contrast, they might argue, many categorical generalizations are reflections of linguistic competence, and hard exceptions to them may be as well.

We would respond that it is always preferable to find external explanations that tie properties of language structure to the functions of language and to characteristics of language users. There is no basis for bifurcating linguistic phenomena a priori into those that are and those that are not amenable to external explanation. In particular, such explanations should be sought for both hard and soft exceptions. We know of no reason to believe that they will always be possible for the soft cases, but not the hard cases.

References

Bell, Alan, Daniel Jurafsky, Eric Fosler-Lussier, Cynthia Girand, Michelle Gregory, and Daniel Gildea
2003 Effects of disfluencies, predictability, and utterance position on word form variation in English conversation. Journal of the Acoustical Society of America 113 (2): 1001–1024.

Bresnan, Joan, Anna Cueni, Tatiana Nikitina, and Harald Baayen
2007 Predicting the dative alternation. In Cognitive Foundations of Interpretation, G. Bouma, I. Kraemer and J. Zwarts (eds.), 69–97. Amsterdam: Royal Netherlands Academy of Science Workshop on Cognitive Foundations of Interpretation.

Bresnan, Joan, and Tatiana Nikitina
2003 On the Gradience of the Dative Alternation. Ms.

Chomsky, Noam
1955/75 The Logical Structure of Linguistic Theory. Chicago: University of Chicago Press.

Chomsky, Noam
1957 Syntactic Structures. The Hague: Mouton.

Chomsky, Noam
1962 A Transformational Approach to Syntax. In Third Texas Conference on Problems of Linguistic Analysis in English, A. Hill (ed.), 124–169. Austin: The University of Texas.

Chomsky, Noam
1966 Topics in the Theory of Generative Grammar. The Hague: Mouton.

Fowler, H. W.
1944 A Dictionary of Modern English Usage. Oxford: Oxford University Press.

Fox, Barbara A., and Sandra A. Thompson
2007 Relative clauses in English conversation: Relativizers, frequency and the notion of construction. Studies in Language 31: 293–326.

Hawkins, John A.
2004 Efficiency and Complexity in Grammars. Oxford: Oxford University Press.

Huddleston, Rodney, and Geoffrey K. Pullum
2002 The Cambridge Grammar of the English Language. Cambridge: Cambridge University Press.

Jaeger, T. Florian
2005 Optional that indicates production difficulty: Evidence from disfluencies. Paper presented at Workshop on ‘Disfluencies in Spontaneous Speech’. Aix-en-Provence.

Jaeger, T. Florian
2006 Probabilistic syntactic production: Expectedness and syntactic reduction in spontaneous speech. Ph.D. diss., Stanford University.

Jaeger, T. Florian, Roger Levy, Thomas Wasow, and David Orr
2005 The absence of ‘that’ is predictable if a relative clause is predictable. Paper presented at conference ‘Architectures and Mechanisms of Language Processing’. Ghent.

Jaeger, T. Florian, David Orr, and Thomas Wasow
2005 Comparing and combining frequency-based and locality-based accounts of complexity. Poster presented at the 18th CUNY Sentence Processing Conference. Tucson, Arizona.

Jaeger, T. Florian, and Thomas Wasow
2006 Processing as a source of accessibility effects on variation. In Proceedings of the 31st Meeting of the Berkeley Linguistic Society, R. T. Cover and Y. Kim (eds.), 169–180. Ann Arbor: Sheridan.

Jespersen, Otto
1933 Essentials of English Grammar. London: Allen & Unwin.

Marcus, Mitchell P., Beatrice Santorini, Mary Ann Marcinkiewicz, and Ann Taylor
1999 Treebank III. Linguistic Data Consortium, University of Pennsylvania.

Race, David, and Maryellen MacDonald
2003 The use of ‘that’ in the production and comprehension of object relative clauses. Paper presented at the 26th Annual Meeting of the Cognitive Science Society.

Ratnaparkhi, Adwait
1997 A simple introduction to Maximum Entropy Models for Natural Language Processing. Technical Report 97-08, Institute for Research in Cognitive Science, University of Pennsylvania.

Ross, John R.
1967 Constraints on variables in syntax. Ph.D. diss., MIT.

Wasow, Thomas
2002 Postverbal Behavior. Stanford: CSLI Publications.


Corpus evidence and the role of probability estimates in processing decisions

Ruth Kempson

Wasow, Jaeger and Orr (WJO) address the phenomenon of exceptions from a background of increasing interest in models of language where generalizations about natural languages are made on the basis of probabilistic generalizations, rather than on categorical distinctions.∗ What they provide is a case for a concept of gradient exceptionality, expressed in terms of what is unlikely to occur – the other side of the coin from what does occur with high predictability. The example is the correlation between the predictability of a given determiner, or adjective, or noun occurring with a relative clause and the likelihood of that relative occurring without a relativizer: an expression which is likely to occur with a relative is unlikely to occur with a relativizer. In this demonstration and the consequences they draw from it, the window of focus is deliberately narrow, with subject relatives, relatives with a wh-relativizer, non-finite relatives, and pied-piping constructions all left on one side in the corpus cull they make, as displaying different idiosyncrasies which detract from the primary issue of what makes the relativizer preferred or dispreferred when it is essentially optional.

It is a little disappointing that the variety of relative-clause types considered is so narrow, since the distinction between restrictive and nonrestrictive relatives, one of the primary features supposedly distinguishing that- and wh-marked relatives in English is, as they note, not clear-cut. Relatives with that exceptionally allow non-restrictive construals, particularly if they occur second in a sequence of relatives:

(1) There was that man at the party that you had introduced me to, that annoyed me enormously by his pompous posturing.

(2) Last week I bought this game pie for the party, that went bad on me before the end of the week.

* I am grateful to Jieun Kiaer, Nancy Kula and Lutz Marten for comments on this note, and the issues which the WJO paper raises.


(3) I am thinking of buying a piece of land, that I hope you like.

So, of the finite relative clauses, it is only relativizer-less relatives which require restrictive construal.

Despite the restrictions on their corpus cull, WJO provide what are nonetheless fascinating tables displaying how individual determiners vary in their likelihood of co-occurrence with a that relativizer, and adjectives, and also nouns, with, in the determiner class, the indefinite a being the determiner to occur most frequently with an accompanying that relativizer; in the noun class it is the indefinite stuff which is the most common, way that is least common; and in the adjective class it is the adjective little that strikingly comes highest in the list, being an average of over eight times more likely to have a relativiser than non-subject relatives modifying NPs with the adjective last. Some of these distributions seem more puzzling than others; however, in all cases, as WJO demonstrate, there is a regular correlation between predictability in the corpus of the particular word being associated with a relative, overall phrasal predictability of relative clause modification, and predictability of the relativiser. On these gradience lists, the quantifiers present perhaps the least obvious distribution, with a at the top of the list with the highest proportion of relative clauses with that, with some coming lower in the list, but nevertheless twenty-five percent more than the numerals to occur with the relativiser. On the other hand, every and all come out bottom, with any displaying three times the proportion of that-specific relative clauses than every, and double that of all. This makes any simple-minded account of quantification based exclusively on quantificational properties seem unlikely.
Based on these differential probabilities of occurrence with relative clauses, WJO provide a measure of the cumulative predictability of relative clause construal of the determiner-adjective-noun sequences so collected; and from this basis, they pose the claim central to their paper, the so-called Predictability Hypothesis: the more predictable a non-subject relative, the less frequent is its co-occurrence with the relativiser that. Whatever the surprises there may be in the probability estimates associated with individual words, this is an intuitive result; and it is extremely good to see this properly quantitatively confirmed, buttressing what is otherwise no more than an intuition.

The question, then, is why there should be such a strong correlation between predictability and lack of relativiser? And this is where the interest of probability-based results arises: should such correlations be explicable solely in terms of the interaction of other pragmatic, semantic, or processing-oriented considerations – with probability assessments themselves playing no part in the explanation; or does the predictability itself have a role to play? The starting point for the analysis which WJO provide is their conjecture that because the ‘maximal entropy classification’ which they used to provide the accumulated measurements of the predictability of relative clauses can be computed by standard machine-learning methods, it is plausible to assume that “speakers have access to estimates of how likely a relative clause is in a given context.” They go on from there to explore the composite effect of parsing and/or production considerations in conjunction with the intrinsic content of the various determiners/adjectives/nouns, and from there consider how these might in part explain the probability distributions. One such factor is that both determiners and nouns which allow anaphoric, context-dependent interpretations will not need a relative clause modifier whenever they can be so identified. The other is that those nouns which are semantically “light” but do not allow anaphoric forms of construal almost must occur with a relative clause modifier. The processing explanation they offer is that in such cases, the presence of the relativizer has relatively low functional load. In this connection, one factor which might influence processing considerations over and above occasion-specific functional-load considerations is the effect of routinization. That is, where there is common co-occurrence of determiner/adjective/noun and the presence or absence of the relativizer, e.g. in the predictability of way and lack of relativizer and predictability of the indefinite article a and presence of the relativizer, such co-occurrence might become stored as a routinized strategy associated with that particular item, thereby accentuating the frequency distribution results for cases at either end of the continuum (see Cann and Kempson 2008 for arguments that routinization is a force in syntactic change).
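For readers unfamiliar with the technique WJO invoke: a binary maximum entropy classifier is equivalent to logistic regression over indicator features of the context, and is indeed computable by standard machine-learning methods. The sketch below fits such a model by stochastic gradient ascent on the conditional log-likelihood; the data and feature names are invented for illustration and are not WJO's actual model, features, or corpus.

```python
import math

# Toy training data: each context is a set of lexical indicator features
# of the modified NP; the label records whether an NSRC followed.
data = [
    ({"det=the", "noun=thing"}, 1),
    ({"det=the", "noun=thing"}, 1),
    ({"det=the", "noun=car"},   0),
    ({"det=a",   "noun=thing"}, 1),
    ({"det=a",   "noun=car"},   0),
    ({"det=a",   "noun=car"},   0),
]

weights = {}  # one weight per indicator feature (the maxent parameters)

def p_nsrc(features):
    """P(NSRC | context) under the current model (binary maxent)."""
    score = sum(weights.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-score))

# Stochastic gradient ascent on the conditional log-likelihood.
for _ in range(500):
    for features, label in data:
        error = label - p_nsrc(features)
        for f in features:
            weights[f] = weights.get(f, 0.0) + 0.1 * error

print(p_nsrc({"det=the", "noun=thing"}))  # high
print(p_nsrc({"det=a", "noun=car"}))      # low
```

The point at issue in this commentary is not whether such conditional probability estimates are mechanically computable from corpus data (they clearly are), but whether speakers themselves manipulate anything like them.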

WJO note with approval the Fox and Thompson observation of ‘monoclausality’ of relativizer-less relatives, but without exploring any semantic analogue to this, they suggest that their account in terms of predictability might take the place of this “rather vague” mono-clausal notion; and they proceed to set out explanations that might confirm such a stance. However, this move is too swift. Rather than simply seeking to replace this observation altogether, the authors might have considered the semantic analogue to the Fox and Thompson observation. This is that relatives can be used either to build up a complex restrictor for a quantifying expression within a single clause, i.e. a restrictive relative clause construal, or, conversely, they can be used to provide an adjunct, independent structure, a nonrestrictive relative clause construal. Indeed, whatever the difficulties of formally characterising nonrestrictive relative clause construal (see Potts 2004 for a re-analysis in terms of ‘supplements’ for which he gives a conventional-implicature analysis), it is not in question that, unlike in restrictive relative clause construals, the two clauses give rise to two independent propositions, and in some analyses, the distinctiveness of the two is made explicit (Potts 2004, Kempson, Meyer-Viol and Gabbay 2001, Cann, Kempson and Marten 2005).1 This distinction is often reported to be only disambiguated by intonation. In writing, where relativizers may play the role of defining a clausal edge but cannot disambiguate between restrictive and nonrestrictive construals, it is only the lack of any such indicator that can unambiguously indicate a restrictive construal. The “monoclausal” observation of Fox and Thompson thus has a natural counterpart in a semantic characterisation of relative clause construal: relativizer-less relatives uniquely identify a single overall assertion, a distinctive attribute of restrictive relative-clause construal which has independent syntactic and semantic motivation. If, then, a speaker is planning a relative clause sequence indicating a restrictive construal, they may not even consider the possibility of using a form which would allow the alternative form of construal: certainly the most secure way of ensuring the appropriate construal is to select a form which precludes it, and of the finite forms, only the relativizer-less form definitively does so. Hence the more likely the form is to be associated with a restrictive form of construal, the less likely it is to be introduced with a form which allows for any other form of construal. By comparison, the move from the demonstration of the statistical correlation between probability of a relative clause and inverse probability of the relativizer, to the assumption that calculations of probability might drive production decisions, is a leap which needs substantial independent argumentation. At the very least, there is a well-motivated alternative to be aired.

There is in any case linguistic evidence from other languages which tends to favour the explanation of relativizer-less relatives in terms of definitively indicating the singleton status of the overall propositional structure. In some languages boundary marking of structure can be made by tone. One such is Bemba. Bemba is a tone language which marks relative clauses in one of two ways, by tone or by pronominal marking. Relative clauses marked by tone alone are exclusively associated with restrictive construal and have to coincide with what is called the conjoint form of the verb, the low tone of the conjoint verb-form determining that the noun head and verb initiating the relative clause will be processed as a single prosodic unit. In consequence the construal of the relative as an integral part of the containing structure is unambiguously indicated. Relative clauses involving pronominal marking, being morphologically marked, can be construed restrictively or nonrestrictively, these construals being distinguished by use of the conjoint form (low tone) and the disjoint verb (high tone).2

1. Some authors have argued that nonrestrictive relatives are presuppositional, but there are many examples to the contrary:
(i) John ignored Mary, who burst into tears.

The striking aspect of the two strategies, tonal vs pronominal, is that they do not distribute in a complementary fashion. Rather, just one of those strategies provides unambiguous indication that the producer is continuing immediately with construction of a complex restrictor, i.e. a restrictive relative clause. Thus it is the use of the conjoint verb form with its low tone that forces restrictive construal in Bemba, analogous to the relativizer-less relatives of English. Such parallels from analysis of one language to another have to be treated with some caution, of course. In principle, the correlation between Bemba tone and morphologically explicit relative-clause marking might well be characterisable in terms of probability of co-occurrence. Nevertheless, the explanation of such conjoint low tone in terms of phonological indication of the mode of compositionality seems much more consistent with orthodox assumptions about how to explain encoded properties of natural language (see Cheng and Kula 2006 for independent arguments of the feeding relation between phonological marking and Bemba relative-clause structure). And this, by analogy, favours the explanation of the distribution of relativizer-less relatives in English in terms of their unambiguous correspondence with restrictive relative clause construal.

WJO are careful to keep the Predictability Hypothesis as a claim restricted only to relative clauses of a particular type, and applied only to English. However, they end by asking questions of a much more general nature that presume the relevance of predictability weightings in the making of speakers’ decisions. They ask how do English speakers determine the predictability of a non-subject relative clause; and do the speakers compute overall predictabilities or do they rather manipulate locally available heuristics of particular items? Further questions might be whether there are speed-up or conversely lengthening phenomena associated with presence or absence of complementizer choice. Are there also any correlations between how many average words follow after each determiner and whether this affects the occurrence of the relativizer? There are also more general questions. Predictability correlations are string-based observations and not category-specific, and one might expect that if they can be manipulated constructively by speakers, they should provide a basis for explaining distributions in other cases where two options are apparently equally available. This raises fascinating new research questions. Is it the case, by analogy with these cases, that where two alternative forms are possible, but one much more probable than the other, the morphologically more marked form is less likely to be chosen, being unnecessary? One such case is structural vs prosodic indication of question-hood. Questions, incidentally like nonrestrictive relatives, are invariably marked by intonation, a para-linguistic marking characteristically recognisable early on in a parse sequence. By analogy, if such prosodic form is so reliably associated with question construal, one might expect that speakers of a language might deem it inessential to provide morphologically explicit forms of interrogative; and indeed in many languages they commonly use declarative rather than interrogative forms, relying solely on the prosody. However, as the authors are well aware, the relevance of probability results has to be treated with caution: probability of occurrence cannot in general be a guide as to whether or not a simpler form will be used. Take the case of approaching an information desk in an airport. The speaker has two ways of asking a yes-no question, either the declarative form (without auxiliary) or an inverted form with an auxiliary. Does the very fact that you are highly likely to be construed by the person at the desk as asking a question influence your decision to present it in one form rather than another, with a tendency to choose the simpler declarative form? One might seek an empirical test of this prediction, but intuition would surely suggest the answer is “No”.

2. There are differences between object and subject marking, with morphological marking of object relatives taking several forms but with restrictions on the availability of the tonal strategy. However, all that is relevant here is that the low-tone strategy, which is the conjoint form of the verb, is invariably associated with a restrictive construal (see Cheng and Kula 2006 for details). These observations are due to Nancy Kula; and I am grateful to both her and Lutz Marten for discussing these data with me and reminding me of their relevance to this issue.

The moral to be drawn from this fascinating setting out of data and probability assignments thus seems to be two-fold. It is clear on the one hand that probability distributions over corpus evidence, if reliably replicable, provide fascinating new data which anyone facing up to the challenge of articulating grammar interfaces will be interested in mulling over. On the other hand, the conclusion that speakers manipulate probability estimates as input to the decisions as to how to say what they do would seem to be as yet premature. While there are clear probabilistic distributions to be culled from language data to great effect, providing new impetus for theoretical explanations of a subtlety most frameworks do not make provision for, it remains far from obvious that probabilistic distributions constitute part of the explanation. The test of such putative explanations will be their generalisability to explain optional distributions on a broad cross-linguistic basis.


References

Cann, Ronnie, and Ruth Kempson
2008 Production pressures, syntactic change and the emergence of clitic pronouns. In Language in Flux, Robin Cooper and Ruth Kempson (eds.), 179–220. London: College Publications.

Cann, Ronnie, Ruth Kempson, and Lutz Marten
2005 The Dynamics of Language. Oxford: Elsevier.

Cheng, Lisa, and Nancy C. Kula
2006 Syntactic and phonological phrasing in Bemba relatives. ZAS Papers in Linguistics 43: 31–54.

Fox, Barbara A., and Sandra A. Thompson
2007 Relative clauses in English conversation: Relativizers, frequency, and the notion of construction. Studies in Language 31: 293–326.

Kempson, Ruth, Wilfried Meyer-Viol, and Dov Gabbay
2001 Dynamic Syntax: The Flow of Language Understanding. Oxford: Blackwell.

Potts, Christopher
2002 The Logic of Conventional Implicatures. Oxford: Oxford University Press.


Response to Kempson’s comments

Thomas Wasow, T. Florian Jaeger and David Orr

Kempson’s interesting commentary raises two important points.∗ First, while extolling the value of probabilistic corpus data, she is not ready to accept “that speakers manipulate probability estimates as input to the decisions as to how to say what they do”. Second, she suggests an alternative to our attempt to explain the correlation between predictability of non-subject relative clauses and the absence of that in such clauses. We discuss these points in reverse order and raise some additional questions for future research.

Our proposed explanation of the correlation, which is admittedly somewhat programmatic, is that more predictable NSRCs are easier to produce and/or comprehend than less predictable ones, and hence do not need the extra function word. Kempson’s alternative explanation is based on the fact that relative clauses without a relativizer must be interpreted as restrictive, whereas non-restrictive construals are often possible when a relativizer is present. She suggests that relativizer omission is used as a way of disambiguating the intended construal of the relative clause.

She points out that another method of disambiguation can be intonation. Since intonation is not marked in writing, her reasoning predicts that relativizer omission should be more common in writing than in speech. As we first noted in Jaeger and Wasow (2005), this does indeed seem to be the case. NSRCs in the parsed portions of the Wall Street Journal (WSJ) and the Brown corpus (BC) are significantly less likely to have a that relativizer (24% and 11%, respectively) than NSRCs in the parsed Switchboard corpus (SWBD: 43%; χ² = 453.0, p < 0.0001). This difference decreases but prevails even when all relativizer types are counted (WSJ: 47%, BC: 36%, SWBD: 52%; χ² = 79.8, p < 0.0001) and after other factors influencing that are controlled for (see also Fox and Thompson 2007, who report a 60% relativizer rate for NSRCs in informal conversations). Note in particular that NSRCs in the two written corpora are on average 21–36% longer than NSRCs in the Switchboard. A priori, this would suggest the opposite of the observed pattern, since longer NSRCs are more likely to contain a relativizer (Race and MacDonald 2003; Jaeger 2006). The observed distributional differences between speech and written texts hence are in line with Kempson’s hypothesis (see Jaeger and Wasow 2005 for an alternative explanation based on the hypothesis that relativizer mentioning is driven by production pressures).

* This reply benefited immensely from the feedback by Harry Tily, whose challenging comments led us to entertain additional alternatives to our hypothesis that NSRC predictability drives that-omission.

Thomas Wasow, T. Florian Jaeger and David Orr
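For readers who wish to reconstruct this kind of comparison, the reported χ² values come from an ordinary Pearson chi-square test on a 2×2 contingency table of relativizer counts. The sketch below is ours, not the authors’ analysis script, and the corpus counts are hypothetical stand-ins: the text reports only percentages and test statistics, not the raw counts.

```python
# Pearson chi-square for a 2x2 contingency table [[a, b], [c, d]]:
# a, b = "that" vs. zero-relativizer NSRC counts in one corpus,
# c, d = the same counts in another corpus.
def chi_square_2x2(a, b, c, d):
    n = a + b + c + d
    # Shortcut form of sum((observed - expected)**2 / expected) for 2x2 tables.
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical counts: 24% "that" in 1,000 written NSRCs (cf. the WSJ rate)
# vs. 43% "that" in 1,000 spoken NSRCs (cf. the Switchboard rate).
chi2 = chi_square_2x2(240, 760, 430, 570)
print(chi2 > 3.84)  # True: significant at p < 0.05 with 1 degree of freedom
```

With the real corpus counts the statistic will of course differ, but the decision rule is the same: compare the statistic against the χ² critical value for one degree of freedom.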

As intriguing as it is, Kempson’s ambiguity avoidance hypothesis leads to a prediction that is inconsistent with the data discussed in our paper. The problem with the ambiguity account is related to the link between predictability and restrictiveness. Kempson does not discuss this link. The discussion in section 4 of our paper, on the other hand, provides a natural link between restrictiveness and predictability: when the content of an NP minus its relative clause is insufficient to pick out the intended referent, some kind of additional modifier is likely to be included; an NSRC is one of the options, so the probability of an NSRC is relatively high. To be more precise, it is the probability of a restrictive NSRC that is relatively high in such contexts. After all, it is restrictive NSRCs rather than non-restrictive NSRCs that serve to provide additional information necessary to identify a referent. In other words, the need for sufficient identifiability influences the distribution of restrictive NSRCs and hence is a cause for increased predictability of restrictive NSRCs in such contexts. Note that there may be other reasons why RCs are more predictable in some contexts than in others. Here and in our paper, we focus on increases in NSRC predictability due to the pragmatically motivated need for certain referents to be identifiable. Crucially, it is not restrictiveness that causes greater NSRC predictability.

If the need for identifiability is one of the major factors determining NSRC predictability, this means that more predictable NSRCs are likely to be restrictive. The predictable NSRCs discussed here occur in contexts where they will naturally be interpreted as restrictive, irrespective of whether a relativizer is present. If disambiguation between restrictive and non-restrictive construals is one of the functions of relativizer omission, then we should expect omission to occur most when the possibility of a non-restrictive interpretation is greatest. By much the same reasoning that led to the prediction of more relativizer omission in writing than in speech, Kempson’s disambiguation account would predict that less predictable restrictive NSRCs would have higher rates of relativizer omission. Since restrictive NSRCs in contexts that don’t require further identifying information are more likely to be misconstrued as non-restrictive, speakers should be more likely to omit the relativizer to guarantee the intended (restrictive) reading. And this is of course the exact opposite of our central empirical finding.


Response to Kempson’s comments 207

The point here is that the correlation between predictability of an NSRC and absence of a relativizer seems natural from a processing perspective, but not if relativizer omission is thought of as a disambiguation strategy along the lines Kempson suggests. There is at least preliminary evidence that relativizers facilitate processing. There is some debate as to whether relativizers help production or comprehension (or both). On the one hand, relativizer presence has been shown to facilitate comprehension (e.g. Race and MacDonald 2003). On the other hand, there is evidence that relativizer omission is correlated with production complexity (Jaeger and Wasow 2005; see also Ferreira and Dell 2000 for complementizers), but also that relativizers do not seem to alleviate production difficulty (Jaeger 2005). While future studies are necessary to test whether speakers insert relativizers to facilitate production or comprehension, there is an established link between relativizer presence and processing (for further discussion, see Jaeger 2006; Levy and Jaeger 2007). Similarly, high predictability of a parse can alleviate or avoid comprehension difficulties (see Jurafsky 2003 for references). Thus, providing relativizers for less predictable NSRCs seems like a reasonable hypothesis, although, admittedly, future work is necessary to test it.

The discussion of Kempson’s proposal brings up another interesting point. Our work so far does not show that there is a direct causal link between NSRC predictability and relativizer omission. Could it be that it is the need for identifiability that directly causes relativizer omission? While we are not aware of any theory that would predict this, it is a testable question that should be addressed in future research. As Harry Tily also points out to us, it would be worth investigating to what extent variance in the predictability of NSRCs is explained by the need for identifiability, and to what extent other factors determine NSRC predictability. If other factors influence NSRC predictability and if increases in NSRC predictability due to these other factors correlate with relativizer omission, this would provide strong evidence for a direct causal link between NSRC predictability and relativizer omission.

Turning to the question of whether “speakers manipulate probability estimates”, we are puzzled why Kempson seems so reluctant to think that they do. In many other areas of cognitive science, including motor control (Trommershäuser et al. 2005), visual inference (Kersten 1999), concept learning (Tenenbaum 1999), and reasoning (Anderson 1990), there is little controversy over the fact that human information processing involves access to probabilistic distributions. Why should language be so different? Indeed, research over the past few years has revealed many cases of probabilistically-conditioned language production. For example, predictable syllables (Aylett and Turk 2004) and more predictable words (Bell et al. 2003) are pronounced shorter and with less articulatory detail. Similarly, vowels that are more predictable given the preceding segments in a word are produced shorter and with less distinct formants (van Son and Pols 2003). And cases of probabilistically-conditioned reduction are not limited to the phonetic level. Jaeger (2006) provides evidence that complementizer omission is correlated with predictability of a complement clause. Even phrasal omission has been linked to probabilistic distributions (see Resnik 1996 on the distribution of implicit objects, as in John ate (dinner) before Mary arrived). For a more detailed discussion of these phenomena as well as an information-theoretic account that links probabilistically-conditioned reduction to efficiency and successful information transfer, see Jaeger (2006: Chapter 6).

As far as we can tell, the widespread assumption that knowledge of language must consist of categorical mechanisms is a legacy of half a century dominated by grammatical theories built with tools borrowed from logic. That assumption was generally accepted for many years in part because the computations needed to develop serious quantitative models of language were infeasible with the technologies of the time. Over the past twenty years or so that has changed, and there is now a wealth of interesting results on language built with the tools of statistics and probability.

This has led to a vigorous debate within linguistics over the role of probabilistic findings in the theory of language; see, for example, Newmeyer (2003, 2006), Gahl and Garnsey (2004, 2006), and Jaeger (2007), among others. Kempson does not actually commit herself to one side or the other in this debate, but she makes it clear which side she thinks bears the burden of proof.

There are, however, other passages in her comments in which her prose suggests just the opposite. One example is her suggestion that a correlation between a lexical item and either presence or absence of relativizers “might become stored as a routinized strategy associated with that particular item”. The examples she gives (way is associated with relativizer absence and a with relativizer presence) are not categorical constraints, as our corpus studies demonstrate; so the stored strategies she posits would have to be probabilistic.1 Similarly, in arguing for the role of restrictiveness in the correlation between predictability and relativizer omission, she writes the following:

“… the more likely the form is to be associated with a restrictive form of construal, the less likely it is to be introduced with a form which allows for any other form of construal.”

1. Incidentally, the discussion in Jaeger (2006: Chapter 6.2.3) contains control studies suggesting that the effect of predictability on relativizer omission holds beyond a few conventionalized tokens.


This is a manifestly probabilistic claim.2 Since relativizer absence categorically entails restrictiveness, she needs a probabilistic formulation in order to avoid the false prediction that all restrictive NSRCs lack relativizers. But if speakers do not “manipulate probability estimates”, Kempson needs to explain where the non-categorical nature of this correlation comes from.

We hasten to add that we are in broad agreement with much of what Kempson says. She is quite right that the focus of our paper is narrow, and that much is to be learned from broader investigations looking at the effects of predictability in a wider range of constructions and languages (for examples of such work, see Jaeger 2006 on English complementizer omission; Jaeger 2007 on reduced subject relatives – so-called whiz-deletion; and ongoing work on relativizer omission in Danish). We also agree that restrictiveness may be an important factor in relative clause structure. In both connections, her discussion of Bemba is fascinating and illuminating.

2. A clarification may be in order. We use the term probabilistic to refer to events that are conditioned by a probability of another event. This use of ‘probabilistic’ is different from its use in, for example, the Stochastic OT literature, where an event is called probabilistic when it occurs with a certain probability. In the latter sense, the claim that relativizer omission is probabilistic is almost trivially true. Even if all variation in relativizer omission were determined by absolutely categorical contrasts – which is extremely unlikely given that the same speaker will sometimes say the same sentence with and sometimes without a relativizer – the resulting distribution would still be binomial (with p being either 0 or 1, depending on the context). Our question here is different. Our specific hypothesis is that it is the probability of an RC that influences relativizer omission. But we can ask more generally whether probabilities are part of the predictors of relativizer omission (cf. Jaeger 2006, 2007).

References

Anderson, John Robert
    1990  The Adaptive Character of Thought. Hillsdale, NJ: Lawrence Erlbaum.

Aylett, Matthew, and Alice Turk
    2004  The Smooth Signal Redundancy Hypothesis: A functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language and Speech 47: 31–56.

Bell, Alan, Daniel Jurafsky, Eric Fosler-Lussier, Cynthia Girand, Michelle Gregory, and Daniel Gildea
    2003  Effects of disfluencies, predictability, and utterance position on word form variation in English conversation. Journal of the Acoustical Society of America 113: 1001–1024.

Ferreira, Victor S., and Gary S. Dell
    2000  The effect of ambiguity and lexical availability on syntactic and lexical production. Cognitive Psychology 40: 296–340.

Fox, Barbara A., and Sandra A. Thompson
    2007  Relative clauses in English conversation: Relativizers, frequency, and the notion of construction. Studies in Language 31: 293–326.

Gahl, Susanne, and Susan Garnsey
    2004  Knowledge of grammar, knowledge of usage: Syntactic probabilities affect pronunciation variation. Language 80: 748–775.

Gahl, Susanne, and Susan Garnsey
    2006  Knowledge of grammar includes knowledge of syntactic probabilities. Language 82: 405–410.

Jaeger, T. Florian
    2006  Redundancy and Syntactic Reduction in Spontaneous Speech. Stanford University dissertation.

Jaeger, T. Florian
    2007  Usage or grammar? Comprehension and production share access to same probabilities. Paper presented at the 81st Annual Meeting of the Linguistic Society of America (LSA), Anaheim.

Jaeger, T. Florian, Roger Levy, Thomas Wasow, and David Orr
    2005  The absence of ‘that’ is predictable if a relative clause is predictable. Paper presented at the Architectures and Mechanisms for Language Processing conference, Ghent.

Jaeger, T. Florian, and Thomas Wasow
    2005  Production-complexity driven variation: Relativizer omission in non-subject-extracted relative clauses. Paper presented at the 18th CUNY Sentence Processing Conference, Tucson, AZ.

Kersten, Daniel
    1999  High-level vision as statistical inference. In The New Cognitive Neurosciences, 2nd ed., Michael S. Gazzaniga (ed.), 353–364. Cambridge, MA: MIT Press.

Levy, Roger, and T. Florian Jaeger
    2007  Speakers optimize information density through syntactic reduction. In Advances in Neural Information Processing Systems (NIPS) 19, B. Schölkopf, J. Platt, and T. Hoffman (eds.), 849–856. Cambridge, MA: MIT Press.

Newmeyer, Frederick J.
    2003  Grammar is grammar and usage is usage. Language 79: 682–707.

Newmeyer, Frederick J.
    2006  On Gahl and Garnsey on usage and grammar. Language 82: 399–404.

Race, David, and Maryellen MacDonald
    2003  The use of ‘that’ in the production and comprehension of object relative clauses. Paper presented at the 26th Annual Meeting of the Cognitive Science Society.

Resnik, Philip
    1996  Selectional constraints: An information-theoretic model and its computational realization. Cognition 61: 127–159.

Tenenbaum, Joshua B.
    1999  Bayesian modeling of human concept learning. In Advances in Neural Information Processing Systems (NIPS) 11, M. S. Kearns, S. A. Solla, and D. A. Cohn (eds.). Cambridge, MA: MIT Press.

Trommershäuser, Julia, Sergei Gepshtein, Laurence T. Maloney, Michael S. Landy, and Martin S. Banks
    2005  Optimal compensation for changes in task-relevant movement variability. The Journal of Neuroscience 25 (31): 7169–7178.

van Son, Rob J. J. H., and Louis C. W. Pols
    2003  How efficient is speech? Proceedings of the Institute of Phonetic Sciences 25: 171–184.


Structured exceptions and case selection in Insular Scandinavian

Jóhannes Gísli Jónsson and Thórhallur Eythórsson

Abstract. The diachronic development of case selection in Insular Scandinavian (Icelandic and Faroese) provides strong support for a dichotomy of structured exceptions, which display partial productivity, and arbitrary exceptions, which are totally unproductive. Focusing on two kinds of exceptional case, we argue that verbs taking accusative experiencer subjects form a similarity cluster on the basis of shared lexical semantic properties, thus enabling new lexical items to be attracted to the cluster. By contrast, verbs taking genitive objects have no common semantic properties that could be the source of partial productivity.∗

1. Introduction

The syntax of natural languages is characterized by general mechanisms that operate independently of particular lexical items and enable the speaker to produce and understand an infinite number of sentences. Thus, it is fair to say that syntax, more than any other component of grammar, illustrates the regular and creative aspect of language. Still, syntax is not entirely free of irregularities, especially in the domain of argument realization. To take one example, the fact that envy can have two objects in English (e.g. I envy you your good looks) is an exception to the generalization that only verbs denoting transfer of some kind can be ditransitive in English (see Goldberg 1995: 131–132 for relevant discussion).

* The work reported on here was funded in part by a three-year research grant from Rannís – The Icelandic Centre for Research during 2004–06, which is gratefully acknowledged. We wish to thank Heimir Freyr Viðarsson, our research assistant at the University of Iceland, for his invaluable assistance in preparing this paper. We would also like to thank an anonymous reviewer and the editors for useful comments, and the latter in particular for their patience. The authors bear a joint responsibility for the paper, but divided their labor in such a way that Jóhannes Gísli largely took care of the Icelandic part and Thórhallur of the Faroese part.


In this paper we argue that exceptions to general patterns of argument realization are of two kinds. First, there are exceptions that are stored in the lexicon without any associative links between them, i.e. links which make it easier for speakers to memorize the exceptions. These can be referred to as arbitrary exceptions as they are based on an arbitrary list of lexical items. Second, there are exceptions which involve clustering of lexical items on the basis of shared semantic properties. These can be called structured exceptions and they display partial productivity in contrast to arbitrary exceptions. Thus, arbitrary exceptions are totally unproductive whereas structured exceptions can be extended to new lexical items, provided that these exceptions have sufficient token frequency.1

As we illustrate below, the diachronic development of case selection in Insular Scandinavian (Icelandic and Faroese) provides strong support for the proposed dichotomy between structured and arbitrary exceptions. The discussion will focus on two kinds of exceptional case selection, accusative subjects and genitive objects. It will be shown that accusative subjects, especially experiencer subjects, have been semi-productive in the history of Insular Scandinavian whereas genitive objects have been completely unproductive. To account for this difference, we argue that verbs with accusative experiencer subjects form a similarity cluster on the basis of shared lexical semantic properties. This enables new lexical items to be attracted to the cluster. By contrast, verbs with genitive objects are a disparate group with no common semantic properties that could be the source of partial productivity.

Exceptional but semi-productive classes are probably best known in inflectional morphology. For instance, the class of strong verbs in English exemplified by cling/clung has been shown to be productive in experiments where speakers are asked to produce past tense forms of nonce verbs (Bybee and Slobin 1982, and Bybee and Moder 1983). This class has also attracted some new members, e.g. the originally weak verbs dig, fling and string (Jespersen 1942: 49–53), despite the sharp reduction in the overall number of strong verbs in the history of English. Bybee and Moder (1983) argue that the productivity of the cling/clung class is based on the phonetic similarity between the verbs in this class. They claim that cling/clung verbs are organized by family resemblance around a prototypical member with a velar nasal word-finally; thus, there is no rule at work here since there is no single phonetic feature that all these verbs have in common. Moreover, many verbs that are phonetically similar to the cling/clung verbs have a different inflection (e.g. rig, bring and ring in the sense ‘encircle, put a ring on’). Clearly, our claims about exceptional and semi-productive case in Insular Scandinavian are similar in spirit to this proposal although we will not make any use of prototype theory.2

1. Our use of the term “new lexical item” in this context lumps together verbs that are truly new in the language as well as verbs that are attested in Old Icelandic but with a different case frame.

The paper is organized as follows. In section 2, we provide some background information on the Icelandic case system. Section 3 discusses the decline of genitive objects in the history of Icelandic. The diachronic development of accusative subjects is discussed in section 4, where it is shown that accusative case has been extended to the subjects of some new verbs. Comparative data from Faroese are presented in section 5 and shown to follow the Icelandic pattern discussed in sections 3 and 4. Finally, some concluding remarks are offered in section 6.

2. The case system of Icelandic

There are two basic types of case in Icelandic: structural case and lexical case. Structural case is determined by syntactic position whereas lexical case is selected by particular lexical items. The main evidence for this dichotomy comes from the fact that lexical case is preserved in passives and ECM-infinitives but structural case is not (see e.g. Zaenen, Maling and Thráinsson 1985). Using these diagnostics, nominative subjects and accusative objects represent structural case whereas oblique subjects and dative and genitive objects exemplify lexical case.

Nominative is by far the most common subject case in Icelandic, as illustrated in (1). However, numerous non-agentive verbs take oblique subjects, as in (2). The verb langa ‘want’, for example, selects an accusative subject and leiðast ‘be bored’ takes a dative subject and a nominative object.

(1) a. Nemendurnir lásu bókina.
       the.students-NOM read-3.PL the.book-ACC
       ‘The students read the book.’
    b. Við hjálpuðum nágrönnunum.
       we-NOM helped-1.PL the.neighbours-DAT
       ‘We helped the neighbours.’
    c. Faðirinn saknar barnanna.
       the.father-NOM misses the.children-GEN
       ‘The father misses the children.’

2. We have also been influenced by Pinker’s (1999) discussion of strong verbs in English which, in turn, draws on ideas from connectionist psychology.


(2) a. Mig langar að fara.
       me-ACC wants to go
       ‘I want to go.’
    b. Sumum leiðist þessi hávaði.
       some-DAT bores this-NOM noise-NOM
       ‘Some people are tired of this noise.’

Nominative subjects trigger number and person agreement with the finite verb but oblique subjects do not. Apart from this difference, oblique subjects behave syntactically very much like nominative subjects in Icelandic (see Zaenen, Maling and Thráinsson 1985, Sigurðsson 1989: 204–209, and Jónsson 1996: 110–119 among others).

Accusative is clearly the most common object case in Icelandic, but many verbs take dative objects, e.g. hjálpa ‘help’, as in (1b). Only a handful of verbs select genitive objects (see Appendix for a list), including sakna ‘miss’, as shown in (1c). Nominative objects occur almost exclusively with two-place verbs taking dative subjects, such as leiðast ‘be bored’, as in (2b).

Lexical case in Icelandic is semantically predictable in some instances and this is most evident with dative indirect objects (see Yip, Maling and Jackendoff 1987, Jónsson 2000, and Maling 2002). It can also be argued that dative case with experiencer subjects is largely predictable from lexical semantics (Jónsson 1997–98, 2003). The focus of this paper is on lexical case that is idiosyncratically associated with particular lexical items and therefore has to be learned on an item-to-item basis. It is impossible, for example, to predict the accusative subject with langa ‘want’ or the genitive object with sakna ‘miss’ from the lexical semantics of these verbs. Hence, it must be specified in the lexical entries of these verbs that they select an accusative subject and a genitive object, respectively.3

Still, as we will show in this paper, the semantic similarity between verbs taking accusative experiencer subjects has enabled these verbs to display some productivity in the history of Icelandic. This means that the productivity of a particular case frame does not require semantic (or syntactic) predictability. On the other hand, the status of accusative experiencer subjects in current-day Icelandic is quite weak as there is a strong tendency to replace them by dative subjects, a phenomenon often referred to as Dative Sickness or Dative Substitution.4 Contrast the example in (3) with the one in (2a):

3. We will not concern ourselves here with the issue of how idiosyncratic case arises diachronically. For interesting discussion on “irregularization” in inflectional morphology, see Nübling (this volume).

(3) Mér langar að fara.
    me-DAT wants to go
    ‘I want to go.’

It has been shown that idiosyncratic case is acquired rather late in Icelandic (Sigurðardóttir 2002 and Björgvinsdóttir 2003) and for some speakers it may never be acquired with certain verbs. For example, a child that fails to acquire the standard accusative case with langa ‘want’ during the critical period of language acquisition is likely to use a dative subject instead, as in (3). Thus, Dative Substitution is an ongoing diachronic change that results from the unsuccessful transmission of a grammar from one generation of speakers to the next. We will assume that the loss of genitive objects in the history of Icelandic also has its roots in language acquisition but lack of historical data makes it nearly impossible to argue for this on empirical grounds.

3. Genitive objects

3.1. Old Icelandic

The number of verbs taking genitive objects has been significantly reduced in the history of Icelandic, from about 100 verbs in Old Icelandic to about 30 in Modern Icelandic (see Appendix).5 With many of these verbs, the genitive object has been replaced by a PP or an object bearing a different case. In some cases, the verb has simply become obsolete, at least in the use where a genitive object was possible. We have not systematically investigated all the genitive object verbs in Old Icelandic but we suspect that frequency is the most important factor in explaining why some of these verbs have survived but others have not.

4. There is also a tendency to substitute nominative for oblique case on theme/patient subjects in Modern Icelandic – so-called Nominative Substitution (see Eythórsson 2002, Jónsson 2003, and Jónsson and Eythórsson 2003, 2005 and references cited there).

5. The loss of genitive objects is well known from other Germanic languages (see e.g. Delsing 1991 on Swedish, Allen 1995: 217–219 on English, and Donhauser 1998 on German).

To judge by the textual sources, many of the genitive object verbs were already quite rare in Old Icelandic. Genitive objects were also losing ground in that some verbs could occur with other cases or PPs instead of the genitive (see Nygaard 1906: 142–148 for examples). One example is the verb missa ‘miss, lose’ which alternated between genitive and accusative objects in Old Icelandic. In Modern Icelandic the object must be accusative, except for a few idiomatic phrases which preserve the genitive, e.g. missa marks ‘be to no avail’ (literally ‘miss the target’), missa sjónar á ‘lose sight of’ and missa fótanna ‘trip’ (literally ‘lose the feet’).

Verbs with genitive objects in Old Icelandic can be divided into five syntactic classes, depending on the number and case marking of other arguments of the verb. The three biggest classes are exemplified in (4):6

(4) a. Nominative subject + Genitive object (NG-verbs)
       Ásgerður var þá eftir og gætti bús þeirra.
       Ásgerður was then left and guarded farm-GEN their
       ‘Ásgerður then stayed behind and looked after their farm.’
       (Egils saga, p. 455)
    b. Nominative subject + Accusative object + Genitive object (NAG-verbs)
       Þorgeir latti hann utanferðar.
       Þorgeir discouraged him-ACC travel.abroad-GEN
       ‘Þorgeir discouraged him from going abroad.’
       (Finnboga saga ramma, p. 651)
    c. Nominative subject + Dative object + Genitive object (NDG-verbs)
       Hann kvaðst ekki varna mundu henni máls.
       he said not prevent would her-DAT speech-GEN
       ‘He said he would not prevent her from speaking.’
       (Brennu-Njáls saga, p. 160)

There was also a small class of verbs taking an accusative experiencer subject + genitive object (fylla ‘become full of’, fýsa ‘want’, girna ‘desire’, minna ‘(seem to) remember’, vara ‘expect’, vilna ‘expect, hope’, and vænta/vætta ‘expect’), and an even smaller class of verbs with a dative experiencer subject + genitive object (batna ‘get better’, bætast ‘recover from’, fá ‘suffer’, létta ‘recover from’ and ljá ‘get’). This is exemplified in (5):

6. Most of the examples from Old Icelandic in this paper are cited from editions using Modern Icelandic spelling (see bibliography) but they have all been checked for authenticity against critical editions or manuscripts.


(5) a. Accusative subject + Genitive object (AG-verbs)
       Þess minnir mig að þú mundir þá koma.
       it-GEN remembers me-ACC that you would then come
       ‘I seem to remember that you would come then.’
       (Heiðarvíga saga, p. 1391)
    b. Dative subject + Genitive object (DG-verbs)
       Þuríði batnaði sóttarinnar.
       Þuríður-DAT improved the.illness-GEN
       ‘Þuríður recovered from her illness.’
       (Eyrbyggja saga, p. 608)

Most of the AG-verbs and DG-verbs have either become obsolete (in the relevant use) or have ceased to select genitive case. For instance, the DG-verb batna ‘get better’ can only take a nominative object in Modern Icelandic.7 The only verb that still takes a genitive object is vænta ‘expect’, but the subject case has shifted to nominative. However, four of the AG-verbs still select accusative subjects (i.e. fylla ‘become full of (water)’, fýsa ‘want’, minna ‘(seem to) remember’ and vara ‘expect’). This is consistent with the major empirical claim of this paper that accusative subjects have been more resistant to diachronic change than genitive objects in the history of Icelandic.

The historical decline of genitive objects can also be seen with NAG-verbs. Members of this class in Old Icelandic include the following verbs:8

(6) beiða ‘request’, biðja ‘request’, dylja ‘hide’, eggja ‘incite’, firna ‘blame’, fregna ‘ask’, frétta ‘ask’, fylla ‘fill with’, fyrirkunna ‘blame for’, fýsa ‘incite’, krefja ‘demand’, kveðja ‘demand’, letja ‘discourage’, minna ‘remind’, saka ‘accuse of’, spyrja ‘ask’, æsa ‘incite’

Of all these verbs, the only ones that are still regularly used with genitive objects are biðja ‘request’, krefja ‘demand’ and spyrja ‘ask’. This is exemplified in (7):

7. Barðdal (2001: 197–198) claims that DG-verbs disappeared in Icelandic because of their low type frequency. An alternative explanation is that the token frequency of genitive objects with each DG-verb was simply too low for the genitive to be successfully acquired.

8. As already mentioned, fylla ‘fill with’, fýsa ‘incite’ and minna ‘remind’ also occur as AG-verbs in Old Icelandic.


(7) Ég þarf örugglega að biðja þig einhvers á morgun.
    I need surely to ask you-ACC something-GEN tomorrow
    ‘I will surely need to ask you to do something tomorrow.’

With the other NAG-verbs, the genitive has been replaced by PPs, unless the verb has become obsolete or lost the sense indicated in (6). One example of this is minna ‘remind’. In Old Icelandic this verb could be used either with a genitive object, as in (8a), or a PP complement, but in Modern Icelandic it only takes a PP, as in (8b):

(8) a. Hón hefir minnt mik þeirra hluta.
       she has reminded me-ACC those-GEN things-GEN
       (Fornmannasögur i.3)
    b. Hún hefur minnt mig á þá hluti.
       she has reminded me-ACC of those-ACC things-ACC
       ‘She has reminded me of those things.’

With some verbs, the genitive has become more or less restricted to formal registers in Modern Icelandic, e.g. bíða 'wait for'. For example, while the PP-variant in (9b) sounds very natural in all kinds of registers, the genitive variant in (9a) is clearly rather formal. This is very different from Old Icelandic, where the genitive was the norm and the PP-variant was extremely rare.

(9) a. Margir biðu Jóns.
       many waited John-GEN

    b. Margir biðu eftir Jóni.
       many waited for John-DAT
       'Many waited for John.'

The story of genitive objects in Icelandic is a story of continuous loss and no gain. In fact, we are aware of only one verb with a genitive object that has been added to the vocabulary of Icelandic in the last centuries.9 This is the verb óska 'wish', a variant of æskja 'wish' which takes a genitive object and is already attested in Old Icelandic. Moreover, we only know of one example

9. The verbs iðra 'repent' and krefjast 'demand' are attested with a genitive object in Modern Icelandic but not in Old Icelandic. Still, they can hardly be seen as new additions since the variants iðrast 'repent' (with the suffix -st) and krefja 'demand' (without the suffix -st) are attested with a genitive object in Old Icelandic.

Structured exceptions and case selection in Insular Scandinavian

where a genitive object has become a variant alongside an original accusative or dative object. This happened in the Eastern fjords of Iceland with the verb nenna 'bother', where genitive replaced dative.10 We suspect that this development has its roots in phonology: the most common dative object with nenna is þessu 'this', which becomes homophonous with the genitive form þess 'this' if the final vowel is elided. Deletion of the final vowel takes place regularly before a word that begins with a vowel, e.g. before the negation ekki 'not', which quite often follows the verb nenna. Thus, the change from dative to genitive with nenna seems to be a case of reanalysis triggered by phonological neutralization of the contrast between dative and genitive.

3.2. Analysis

The facts discussed above show that genitive objects have been extremely unproductive in the history of Icelandic. This has manifested itself in the following ways:

(10) a. The number of verbs taking genitive objects has sharply declined from the Old Icelandic period.

     b. There are hardly any cases where genitive has become a variant with verbs originally taking accusative or dative objects.

     c. Virtually no new verbs with genitive objects have been added to the Icelandic lexicon since the Old Icelandic period.

We claim that this lack of productivity is because verbs selecting genitive objects in Old (and Modern) Icelandic do not form any semantically coherent subclasses.11 In other words, they are arbitrary exceptions to general case selection rules in Icelandic. Thus, Nygaard (1906: 142–148), in his classic syntax of Old Icelandic, is at pains to classify verbs with genitive objects, presenting no fewer than eight subclasses, most of which are neither well-defined nor coherent.

One may wonder about the NAG-verbs listed in (6), which form a reasonably coherent class, as most of these verbs denote communication which the referent of the indirect object is expected to respond to in some way. The problem with this class may be that the genitive is not common enough to provide a real basis

10. Using flýta sín 'hurry' with a genitive (literally 'hurry self-GEN') instead of the standard flýta sér with a dative is also a known dialectal feature of the Eastern fjords of Iceland. However, little is known about the details of this change; it awaits further investigation.

11. Verbs with genitive objects are all non-telic, i.e. they denote events that do not have a natural endpoint. Still, that property does not distinguish them from other transitive verbs, as many non-telic verbs take accusative or dative objects.

for productivity. To take one example, the verb eggja 'incite' only occurs with genitive objects that denote some kind of action or undertaking (e.g. atganga 'attack', verk 'deed', ferð 'trip' or útganga 'walking out'), and even these kinds of noun phrases are often expressed as objects of the preposition til 'to' when they are used with eggja.12

Being arbitrary exceptions, verbs with genitive objects are stored in the lexicon without any associative links between them. Such links between lexical items make them easier to memorize and therefore more learnable and less likely to undergo diachronic change. However, this may not be sufficient to explain why verbs with genitive objects have failed to attract new members. We seem to need the additional assumption that new verbs entering the language always follow some general pattern with respect to case selection. Thus, a new verb cannot easily be analogized to an established verb on the basis of semantic similarity between the two verbs, a phenomenon referred to as isolate attraction by Barðdal (2001). This can be seen with the verb passa 'guard, take care of', an 18th century borrowing from Danish. Since this verb is more or less synonymous with the NG-verb gæta 'guard, take care of', it looks like a good candidate for isolate attraction. Still, despite its obvious affinity with gæta, the verb passa has occurred with an accusative object from its earliest attestation.

The accusative with passa represents the default object case, and this option seems to be chosen whenever there is no semantic class of dative verbs that the new verb could be attracted to. However, a new verb may vary between accusative and dative object if the semantic basis for dative case is unclear. A case in point is the loan verb transportera 'transport', which is possible both with a dative and an accusative object. Arguably, the dative is chosen by those speakers who feel that transportera belongs semantically to the class of motion verbs selecting a dative object (e.g. hrinda 'push', kasta 'throw', lyfta 'lift', sveifla 'swing' and ýta 'push'; see Maling 2002 and Svenonius 2002 for a discussion of these verbs). By contrast, speakers who do not share this intuition will opt for the default accusative. The latter choice may be influenced by the fact that the verb flytja 'move, transport', taking an accusative object, is semantically closer to transportera than any other verb. However, this would not necessarily be a case of isolate attraction, as the existence of flytja might simply prevent speakers from placing transportera in the same class as motion verbs with dative objects.

12. An informal count in the electronic corpus of Old Icelandic texts at http://www.lexis.hi.is (Textasafn Orðabókar Háskólans) reveals that genitive occurs with about 12% of the examples of eggja.

4. Accusative subjects

Verbs with accusative subjects divide into two main types semantically: experiencer verbs and verbs taking theme/patient subjects. These classes will be discussed separately in 4.1 and 4.2 below, since there are important differences between them in terms of their diachronic development.

4.1. Verbs with experiencer subjects

For convenience, all the verbs with accusative experiencer subjects in Old Icelandic are shown in (11) with some preliminary semantic subclassification. This list, like the list in (18) below, is quite extensive, as it includes all verbs that are attested with an accusative subject in a critical edition of Old Icelandic texts. Note also that some of these verbs could occur with a nominative or a dative subject in Old Icelandic (see Viðarsson 2006 for a detailed discussion):

(11) Verbs with an accusative experiencer subject in Old Icelandic:

     a. Verbs of physical discomfort: hungra 'be hungry', kala 'suffer frostbites', saka 'be hurt', skaða 'be hurt', stinga 'feel pain', sundla 'become dizzy', svimra 'become dizzy', syfja 'become sleepy', velgja 'feel nausea', verkja/virkja 'ache', þyrsta 'feel thirsty'

     b. Verbs of lacking: bila 'fail, lack', bresta 'run out of', nauðsynja 'need', skorta 'lack', vanta 'lack, need', þrota/þrjóta 'lack, run out of'

     c. Verbs denoting feelings: angra 'grieve', ánægja 'become happy', forvitna 'be curious', fýsa 'want', girna 'desire', harma 'vex', heimta 'want', hryggja 'grieve', langa 'want', lysta 'desire, want', muna 'want', ógleðja 'become unhappy', ótta 'fear', slægja til 'want (to have)', tíða 'want', trega 'grieve (over)', ugga 'fear', undra 'be surprised', vilna 'hope', öfunda 'envy'

     d. Verbs of cognition: dreyma 'dream', greina á um 'disagree on', gruna 'suspect', minna 'remember vaguely', misminna 'remember wrongly', vara 'expect', væna 'expect', vænta/vætta 'expect'

     e. Verbs with affected experiencers: henda 'happen to, concern', kosta 'cost', skipta 'matter to', tíma 'happen to', varða 'concern'

Some representative examples from the first three classes are provided in (12):

(12) a. og mun þig ekki saka.
        and will you-ACC not be.hurt
        'and you will be all right' (Víga-Glúms saga, p. 1926)

     b. mig og fólk mitt skortir aldrei mat.
        me-ACC and people-ACC my-ACC lacks never food
        'Me and my people never run out of food' (Bandamanna saga, p. 21)

     c. en ekki slægir mig hér til langvista.
        but not wants me-ACC here for long.stay
        'But I am not tempted to stay here for a long time' (Grettis saga, p. 1094)

As the overview in (11) illustrates, these verb classes are fairly coherent semantically. This can be seen e.g. in the number of verbs having roughly the same meaning, e.g. the verbs in (11b) and all the verbs glossed as 'want' in (11c). This raises the question whether these verbs really are exceptional rather than following a rule linking accusative subjects and verbs meaning 'want'. We doubt that there can be such a narrow lexical rule, because it would hardly be learnable. In any case, this hypothetical rule would not apply to the verb vilja 'want', which takes a nominative subject in Old Icelandic as well as in Modern Icelandic.

Thus, we conclude that verbs with accusative experiencer subjects are structured exceptions, i.e. they are not stored as isolated items in the lexicon but linked via shared lexical semantic properties. Therefore, it is unsurprising that these verbs have displayed some productivity in the history of Icelandic. First, among the many new verbs that have been added to the Icelandic lexicon since the Old Icelandic period, there are some that take accusative experiencer subjects, e.g. hrylla við 'be horrified by', óra fyrir 'dream of' and ráma í 'have a vague recollection of':

(13) a. Nemendurna hryllir við þessari tilhugsun.
        the.students-ACC horrifies at this thought
        'The students are horrified by the thought of this.'

     b. Engan hefði getað órað fyrir þessu.
        nobody-ACC had could dreamed for this
        'Nobody could have dreamed of this.'

     c. Mig rámar í að hafa hitt hann einu sinni.
        me-ACC recollects in to have met him one time
        'I have a vague recollection of having met him once.'

None of these verbs is attested in Old Icelandic, but the oldest examples that we have found of hrylla við and óra fyrir are from the 17th century, and the oldest example of ráma í is from the 19th century.13

Second, we know of one loan verb taking an accusative subject, the verb ske 'happen'. According to Óskarsson (1997–1998), the oldest example of this verb is from the end of the 14th century. In that particular example, and many later ones, the verb takes an accusative experiencer subject. The following example is from the middle of the 17th century:

(14) eins og mig hafði skeð fyrir átta árum.
     as me-ACC had happened for eight years
     'As had happened to me eight years earlier' (Píslarsaga séra Jóns Magnússonar, p. 60)

The youngest example of an accusative subject with ske that we have found is from the middle of the 19th century. In Modern Icelandic, the affected experiencer is always expressed by a PP with the preposition fyrir 'for' (if it is expressed at all).

Third, accusative has become a variant with some experiencer verbs. With the verbs hlakka til 'look forward to' and kvíða fyrir 'dread, be anxious about', both accusative and dative as well as the original nominative are possible subject cases in Modern Icelandic. This is shown for kvíða fyrir in (15):

(15) a. Hún kveið fyrir prófunum.
        she-NOM was.anxious for the.exams

     b. Hana kveið fyrir prófunum.
        her-ACC was.anxious for the.exams

     c. Henni kveið fyrir prófunum.
        her-DAT was.anxious for the.exams
        'She was anxious about the exams.'

Fourth, an accusative subject used to be possible with the verbs vona 'hope' (Old Icelandic vána) and skynja 'sense', but nominative is the original subject case with these verbs and the only possible subject case in Modern Icelandic.14

13. These examples are in the electronic corpus of Icelandic texts at http://www.lexis.hi.is (Ritmálssafn Orðabókar Háskólans (ROH)).

14. Interestingly, there are many examples of a genitive object with vona in ROH, mostly dating from the 19th century. It seems that the verb is used in the sense 'expect' in these examples, similar to the genitive object verb vænta 'expect'.

(16) a. Þá vonar mig að þær smámsaman fjölgi.
        then hopes me-ACC that they gradually increase
        'Then I hope that they increase in number.' (Alþingistíðindi 1859, 466; ROH)

     b. mig skiniar ecki sannara en seigi.
        me-ACC senses not truer than say
        'I do not sense more truthfully than I say.' (GAndrDeil, 45; ROH)

Our final example here is klæja 'itch', which only takes a dative subject in Old Icelandic (17a) but also occurs with an accusative subject in Modern Icelandic (17b):

(17) a. því að mér klæjar þar mjög.
        since me-DAT itches there much
        (Sturlunga saga, p. 560)

     b. því að mig klæjar þar mjög.
        since me-ACC itches there much
        'because I am itching there so much.'

The examples in (13)–(17) illustrate the partial productivity of verbs with accusative experiencer subjects in the history of Icelandic. However, it seems that this class has become unproductive in present-day Icelandic. This can be seen most clearly in Dative Substitution, as in (3) above, which first became common in the middle of the 19th century and is widespread in present-day Icelandic. The overall number of verbs with accusative experiencer subjects has also declined somewhat since the Old Icelandic period.

The only clear sign of productivity of this class in present-day Icelandic is the occasional use of accusative for the regular nominative with the verbs finna til 'feel pain' and kenna til 'feel pain' (literally 'feel to'). Using accusative for nominative with these verbs is not unexpected, as many verbs denoting physical discomfort take accusative experiencer subjects (see (11a)).

4.2. Verbs with theme/patient subjects

Verbs with accusative theme/patient subjects in Old Icelandic can be divided into three semantic classes. As shown in (18), the class of verbs denoting change of state is by far the biggest:

(18) Verbs with theme/patient subjects in Old Icelandic

     a. Motion verbs: bera 'carry', draga 'pull', hefja 'raise; begin', kefja 'sink', keyra 'drive', reiða 'move about', reka 'drift', velkja 'be tossed about', víkja 'be moved to one side'

     b. Verbs denoting change of state: belgja 'blow out', birta 'become clear', blása 'swell, blow', brjóta 'break', brydda 'arise', byrja 'begin', daga uppi 'dawn up (turn to stone)', deila 'divide', drepa 'be knocked down', dökkva 'darken', enda 'finish', endurnýja 'renew', eyða 'be destroyed', fenna 'be covered with snow', festa 'fasten', fjara 'ebb', fjölga 'increase', frjósa 'freeze', fylla 'fill', gera 'become', grynna 'become shallow', herða 'become hard', knýta/hnýta 'become crooked', kreppa 'become crippled', kvelda 'become evening', kyrra 'calm', leggja 'become covered with ice', leiða af 'follow from', lengja 'lengthen', leysa 'dissolve', líða 'come to an end', lýsa 'shine', lægja 'lower', minnka 'decrease', nátta 'be overtaken by the night', opna 'open', ómætta 'lose strength', ónýta 'become unusable', rifna 'tear', rjúfa 'split', ryðja 'disperse', rýma 'become wider', ræsa 'come true', setja 'become', skemma 'become short', skera 'cut', skilja 'divide', slíta 'cut', stemma 'be obstructed', stækka 'become bigger', stæra 'swell', sækja 'be affected by', taka 'be taken', vatna 'disappear in water', vekja upp 'awaken', verpa 'be thrown', vægja 'suppurate', þröngva 'diminish', þynna 'become thin', æsa 'be stirred'

     c. Stative verbs: bíða 'exist', fá 'be available', geta 'exist', hafa út 'blow through', heyra 'be heard', sjá 'be seen', skara 'protrude', sýna 'be seen'

Representative examples from all three classes are provided in (19) below. Note that the singular agreement on the verb in (19b) is crucial in showing that the subject is accusative rather than nominative.

(19) a. Þá velkti lengi úti í hafi.
        them-ACC tossed long out in ocean
        'They were in rough seas for a long time.' (Eiríks saga rauða, p. 526)

     b. Fraus að honum klæðin öll.
        froze-3.SG at him the.clothes-ACC all-ACC
        'All his clothes froze to his body.' (Finnboga saga ramma, p. 635)

     c. svo að ógerla sá veguna
        so that unclearly saw the.roads-ACC
        'So that it was difficult to see the roads.' (Egils saga, p. 478)

Verbs with accusative theme/patient subjects have shown very limited productivity in the history of Icelandic. In fact, of the 77 verbs listed in (18), only about 10–15 are still regularly used with an accusative subject in Modern Icelandic, and even these verbs are increasingly used with nominative subjects. This number includes none of the stative verbs and only two motion verbs, bera 'carry' and reka 'drift'. Moreover, we know of only one verb where accusative seems to have replaced nominative case with theme/patient subjects. This is the verb drífa að 'come flocking', exemplified below, where Old Icelandic (20a) is contrasted with Modern Icelandic (20b):

(20) a. Aðfangadag jóla drífa flokkarnir að bænum.
        Christmas Eve flock the.bands-NOM to the.farm
        (Svarfdæla saga, p. 1788)

     b. Aðfangadag jóla drífur flokkana að bænum.
        Christmas Eve flocks the.bands-ACC to the.farm
        'On Christmas Eve the bands come flocking to the farm.'

It is also worth noting that accusative is sometimes used instead of the original nominative case with the verb taka niðri 'touch bottom' in Modern Icelandic, as shown in (21). The use of the accusative here is presumably influenced by all the accusative subject verbs denoting phenomena involving natural forces (see further discussion below).

(21) a. Báturinn tók niðri.
        the.boat-NOM took down

     b. Bátinn tók niðri.
        the.boat-ACC took down
        'The boat touched bottom.'

It could be argued that verbs with accusative theme/patient subjects in Old Icelandic formed semantically coherent classes just like verbs with accusative experiencer subjects. Nevertheless, these verbs have shown very little productivity in the history of Icelandic. We hypothesize that there are two reasons for this lack of productivity. First, the token frequency of these verbs was quite low since they had a very restrictive usage.15 For instance, many of the verbs listed in (18b) were primarily used to describe phenomena involving natural forces, e.g. the verb lægja 'lower'. Some fairly typical examples of this verb are shown in (22):

(22) a. En þegar um vorið er sjó tók að lægja.
        but already in the.spring as sea-ACC began to lower
        'But already in the spring as the sea got calmer.' (Egils saga, p. 408)

     b. þegar er sólina lægði.
        already as the.sun-ACC lowered
        'already when the sun set' (Eyrbyggja saga, p. 579)

     c. Þá tók að lægja veðrið.
        then began to lower the.weather-ACC
        'Then the storm subsided' (Brennu-Njáls saga, p. 219)

The other reason for the lack of productivity of verbs taking accusative theme/patient subjects is competition from verbs with the "middle" suffix -st. The regular way of forming causative pairs in Old and Modern Icelandic is by marking the intransitive (inchoative) variant with the suffix -st, in which case the subject must be nominative. This is exemplified by the verb opna 'open' in (23):

(23) a. Jón opnaði hurðina.
        John-NOM opened the.door-ACC

     b. Hurðin opnaðist.
        the.door-NOM opened

Many of the verbs listed in (18) could in fact take the suffix -st in Old Icelandic, and in some cases this variant would encroach upon the semantic territory of these verbs so that the form with -st prevailed. This is the case, for example, with the verbs endurnýja 'renew', grynna 'become shallow', opna 'open', sjá 'be seen' and velkja 'be tossed about', which have all been ousted as intransitive verbs by endurnýjast, grynnast, opnast, sjást and velkjast.

15. Low type frequency cannot be the explanation here since the number of verbs with accusative theme/patient subjects was quite high; in fact, it was clearly higher than the number of verbs in the class of verbs with accusative experiencer subjects.

5. Comparison with Faroese

The Faroese case system is quite similar to the Icelandic one. However, an important difference is that lexical case has been lost to a much greater extent in Faroese than in Icelandic, both with subjects and objects. In particular, genitive case has more or less fallen out of use in Faroese and has been replaced by other case forms or by prepositional constructions (see Thráinsson et al. 2004: 248–252).

Given the close relations of the two Insular Scandinavian languages, an investigation of the changes in lexical case in Faroese is interesting for the purpose of testing the predictions of our hypothesis for Icelandic that a semantically coherent class is more resistant to change than a non-coherent class.

However, there is a problem with a diachronic investigation of Faroese in that this language is poorly documented in its older periods. Therefore, it is not possible to trace the changes Faroese has undergone as thoroughly as in Icelandic, which is well documented from the 12th century onwards. Already in early texts, i.e. the Faroese ballads and other texts from the late 18th century and the early 19th century, Faroese was in the process of losing some of the case patterns that are still preserved in Icelandic (Thráinsson et al. 2004: 426–436).

As a result of these changes, in Modern Faroese no verbs take genitive objects, whereas accusative case on subjects is still found, although only to a very limited degree (see Barnes 1986, Petersen 2002, Eythórsson and Jónsson 2003, Thráinsson et al. 2004, and Jónsson and Eythórsson 2005). Thus, the Faroese facts are comparable to the Icelandic ones discussed in section 4 above, although the development in Faroese is in a sense more "progressed" than in Icelandic. This means that the situation in Faroese is consistent with the hypothesis that a semantically coherent class is more resistant to change than a non-coherent class.

5.1. Genitive objects

Whereas adnominal genitives and genitive objects of prepositions still occur to some extent in Modern Faroese, genitive objects of verbs have completely disappeared. Modern Faroese verbs corresponding to verbs taking genitive objects in Old and Modern Icelandic typically take accusative objects or PP complements. Examples from Faroese involving the monotransitives freista 'tempt' and njóta 'enjoy', both with an accusative object, are given in (24) (Poulsen et al. 1998):

(24) Faroese

     a. um høgra eyga títt freistar teg, tá slít tað út.
        if right eye your tempts you-ACC, then tear it out
        'if your right eye tempts you, then tear it out.'

     b. Hann neyt gott av hennara strevi.
        he enjoyed good-ACC of her hard.work
        'He benefited from her hard work.'

As shown in (25), these verbs take genitive objects in Modern Icelandic, as was also the case in Old Icelandic:

(25) Icelandic

     a. ef hægra auga þitt freistar þín, þá slít það út.
        if right eye your tempts you-GEN, then tear it out
        'if your right eye tempts you, then tear it out.'

     b. Hann naut góðs af hennar striti.
        he enjoyed good-GEN of her hard.work
        'He benefited from her hard work.'

However, there are a few examples of genitive objects preserved in older Faroese, as evidenced in the ballads (cf. Thráinsson et al. 2004: 431, ex. (120)). The relevant verbs are all monotransitives, e.g. goyma 'watch (over)',16 hevna 'avenge', vitja 'visit', vænta 'expect' and bíða 'wait for'.

(26) Older Faroese

     a. tann ið duranna goymir.
        he who the.doors-GEN watches
        'he who watches the door.'

     b. hevna mín.
        avenge me-GEN

     c. hennar reið at vitja.
        her-GEN rode to visit
        'rode to visit her.'

16. Note that in Old Icelandic, the verb geyma (corresponding to Faroese goyma) also occurs with the accusative in the meaning 'keep'. In Modern Icelandic geyma only means 'keep' and only takes an accusative object.

     d. aftur skalt tú vænta mín.
        back shall you-SG expect me-GEN
        'you shall expect me (to come) back.'

     e. kirkjumaður bíðar tín.
        churchman waits you-GEN
        'the church man waits for you.'

Already in the ballads and 19th century texts there are also examples of the innovative accusative with some of these verbs, as in (27) (cf. Thráinsson et al. 2004: 431, ex. (121)).17

(27) Older Faroese

     a. hevna tap faðir síns.
        avenge loss-ACC father his
        'avenge the loss of his father.'

     b. kom pápin at vitja hana.
        came the.father to visit her-ACC
        'the father came to visit her.'

     c. hann væntar ringt veður seinnapartin
        he expects bad-ACC weather-ACC in.afternoon
        'He expects bad weather in the afternoon.'

There are no known examples of genitive objects of ditransitives in the ballads. Apparently, these had already been replaced by accusative objects or PP complements by the time of their composition, as in the examples in (28) involving biðja 'ask' and krevja 'demand' (cf. Thráinsson et al. 2004: 433, ex. (125a, 125c–d)).

(28) Older Faroese

     a. Eg bað hann eina bøn.
        I asked him-ACC a-ACC favor-ACC
        'I asked him a favor.'

     b. Teir kravdu hann eftir lyklinum til húsið.
        they demanded him-ACC after the.key-DAT to the.house
        'They demanded the key to the house from him.'

17. Interestingly, two verbs, bíða 'wait for' and goyma 'watch', which today govern accusative, could earlier also take dative (cf. Thráinsson et al. 2004: 431). This indicates that, with these two verbs, genitive was first replaced by dative case, and only later by accusative.

     c. Teir kravdu lykilin til húsið frá honum.
        they demanded the.key-ACC to house from him
        'They demanded the key to the house from him.'

The question arises why genitive objects were lost earlier with ditransitives than with monotransitives. Presumably, genitive objects were preserved longer with monotransitives than with ditransitives simply because the former had a higher token frequency.18 As a result, there would have been less evidence for the language learner of genitive case with ditransitive verbs, which would have made it more difficult to preserve this type of genitive from one generation to the next. Moreover, it can also be seen in Modern Icelandic that genitive objects are better preserved with monotransitives than with ditransitives. In particular, the replacement of genitive objects by PPs is very common, and is attested already in Old Icelandic as well (see section 3).

5.2. Accusative subjects

Around fifty verbs with oblique subjects are documented in Faroese sources, all of them involving experiencers, whereas no verbs taking oblique theme/patient subjects are attested (cf. Petersen 2002, Thráinsson et al. 2004). However, most of the relevant verbs have fallen into disuse, occurring only in fixed expressions that have a literary or an archaic flavor. Therefore, the token frequency of the oblique subject verbs in current spoken Faroese is a lot lower than the above figure indicates.

There is a strong tendency in Faroese to substitute nominative case for oblique case with subjects (Nominative Substitution). Thus, for example, the original accusative case with droyma 'dream' (29a) has been virtually eliminated in favor of nominative case (29b):

(29) Faroese

     a. Meg droymdi ein sáran dreym.
        me-ACC dreamt-3.SG a bad dream-ACC

     b. Eg droymdi ein sáran dreym.
        I-NOM dreamt-1.SG a bad dream-ACC
        'I had a bad dream.'

18. See Bybee (1994) for the relevance of lexical and categorial token frequency in inflectional morphology. Thus, analogical leveling has been observed to affect the less frequent lexical items first, while the more frequent ones persist longer.

Verbs taking dative subjects in Faroese include verbs originally taking subjects in the accusative that was replaced by dative (Dative Substitution), e.g. lysta 'want' in (30) (cf. Barnes 1986, Petersen 2002, Eythórsson and Jónsson 2003).

(30) Faroese

     a. Meg lystir at vita.
        me-ACC wants to know

     b. Mær lystir at vita.
        me-DAT wants to know
        'I want to know.'

There are very few speakers of Modern Faroese who use accusative as a possible subject case with experiencer verbs (Eythórsson and Jónsson 2003, Jónsson and Eythórsson 2005). However, there is evidence that it was productive to some extent in earlier Faroese. This evidence involves a few verbs that are likely to be new creations in Faroese: hugbíta (eftir) 'long for', nøtra 'shudder', skríða (í feginsbrúgv, ófeginsbrúgv) 'tickle (in the left/right eyebrow)', i.e. 'expect (something good/bad)', and minnast 'remember'. The following examples are from Thráinsson et al. (2004: 253):

(31) Faroese

     a. Meg nøtrar í holdið.
        me-ACC shudders in the.flesh
        'I shudder.'

     b. Meg skríður í feginsbrúgv.
        me-ACC tickles in left.eyebrow
        'I expect something good.'

These verbs either did not exist, or did not take an oblique subject, in Old Icelandic, and the same is true of Modern Icelandic. Particularly telling in this respect is the verb minnast, an -st-verb which has replaced the active minna 'remember' (with accusative subject). Old and Modern Icelandic -st-verbs are incompatible with accusative subjects, so the occurrence of this verb with an accusative must be a Faroese innovation. In any case, the existence of verbs taking accusative experiencer subjects in Faroese that do not have a counterpart in Old and Modern Icelandic indicates the partial productivity of such verbs at an earlier stage of the language. This is compatible with the hypothesis (cf. 4.1 above) that verbs taking accusative subjects form a coherent semantic class whereas verbs taking genitive objects do not.

The fact that oblique experiencer subjects were preserved longer than theme/patient subjects in Faroese is likely to be due to the higher token frequency of the former, thus making them easier for children to acquire during the acquisition period. For example, the verbs that originally took accusative experiencer subjects include some very common ones (e.g. droyma 'dream', minnast 'remember'), whereas the verbs that may be assumed to have taken accusative theme subjects in earlier Faroese (e.g. reka 'drift', taka út 'take out') appear to be infrequent in the spoken language (cf. Thráinsson et al. 2004: 276–277).

6. Conclusion

As we have amply illustrated in this paper, there are good reasons for distinguishing between two kinds of exceptions to general patterns of argument realization: what we have termed structured exceptions, which involve clustering of lexical items on the basis of shared properties, and arbitrary exceptions, which involve an arbitrary list of lexical items. Structured exceptions display partial productivity and can be extended to new items, whereas arbitrary exceptions are totally unproductive. We have argued that the diachronic development of case selection in Insular Scandinavian (Icelandic and Faroese) provides strong support for this dichotomy. The discussion has focused on two cases of exceptional case selection: accusative subjects and genitive objects. We showed that accusative experiencer subjects have been semi-productive in the history of Insular Scandinavian, whereas genitive objects have been completely unproductive.

Appendix

In these lists we have left out all verbs that only occur with the relevant case in idiomatic expressions such as nema staðar ‘stop’ (literally ‘hold place-GEN’). We have also omitted verbs that are listed in dictionaries of Old Icelandic but only attested in Norwegian texts.

(A) Verbs with genitive objects in Old Icelandic
afla ‘acquire’, árna ‘wish’, batna ‘recover from’, beiða ‘request’, beiðast ‘request’, biðja ‘ask’, bindast ‘refrain from’, bíða ‘wait for’, blinda ‘make blind to’, bæta ‘improve’, bætast ‘recover from’, dirfast ‘dare’, dylja ‘hide, deny’, efast/ifast ‘change one’s mind about’, eggja ‘incite’, endurminnast ‘remember (again)’, fá ‘suffer’, firna ‘blame for’, forvitnast ‘enquire’, foryflast ‘refrain from’, fregna ‘ask’, freista ‘tempt, try’, frétta ‘hear, ask about’, frýja ‘challenge, question’, fylla ‘become full of (water), fill with’, fyllast ‘become full of’, fyrirkunna ‘blame for’, fyrirmuna ‘envy’, fýsa ‘want; incite’, fýsast ‘want’, gá ‘pay attention to, beware of’, geta ‘guess, solve’, geta ‘mention’, geyma ‘pay attention to, take care of, keep’, girna ‘desire’, girnast ‘desire’, gjalda ‘pay for’, gleyma ‘forget’, gæta ‘guard, take care of’, gætast ‘mention’, hafa (ekki) ‘miss’, hefna ‘revenge for’, hefnast ‘suffer revenge for’, heitast ‘threaten’, hræðast ‘fear’, hræra ‘set in motion’, hvetja ‘encourage, incite’, iðrast ‘regret’, kenna ‘feel, touch’, klifa ‘repeat’, kosta ‘try, pay’, krefja ‘demand’, kunna ‘be angry for’, kveðja ‘demand, request’, leita ‘search for’, letja ‘discourage’, létta ‘recover from’, ljá ‘give, get’, meta ‘value’, minna ‘remember vaguely, remind of’, minnast ‘remember, visit’, missa ‘miss, lose, be without’, neita ‘refuse’, neyta ‘make use of’, njóta ‘profit from, enjoy’, orka ‘effect, cause’, orkast ‘get, obtain’, óminnast/úminnast ‘be unmindful of, neglect’, órvilnast/örvilnast ‘despair of’, órvænta/örvænta/örvætta ‘despair, lose hope’, órvæntast ‘despair, lose hope’, reka ‘avenge for’, saka ‘accuse of’, sakna ‘miss’, skammast ‘be ashamed of’, spyrja ‘ask about’, sverja ‘swear’, svífast ‘refrain from’, synja ‘acquit of’, sýsla ‘do, get’, unna ‘grant’, vangeyma ‘neglect’, vara ‘expect’, varleita ‘search insufficiently for’, varna ‘prevent’, vá ‘blame for’, vána ‘expect, hope for’, villast ‘get lost’, vilna ‘expect, hope’, vilnast ‘expect, hope for’, virða ‘value’, vita ‘signal; know’, vitja ‘visit, go to’, væna ‘give hope of’, vænast ‘hope for’, vænta/vætta ‘expect’, æskja ‘wish’, þarfa ‘need’, þarfnast ‘need’, þegja ‘refrain from saying’, þræta ‘deny’, þurfa ‘need’, æsa ‘incite’, æskja ‘wish’, æsta ‘ask for, demand’

(B) Verbs with genitive objects in Modern Icelandic
afla ‘acquire’, árna ‘wish’, biðja ‘ask’, bíða ‘wait for’, dirfast ‘dare’, freista ‘tempt, try’, frýja ‘challenge, question’, geta ‘mention’, gjalda ‘pay for’, gæta ‘guard, take care of’, hefna ‘revenge for’, iðra ‘regret’, iðrast ‘repent, regret’, kenna ‘feel’, krefja ‘demand’, krefjast ‘claim’, leita ‘search for’, meta ‘value’, minnast ‘remember, visit’, neyta ‘make use of’, njóta ‘profit from, enjoy’, óska ‘wish’, sakna ‘miss’, spyrja ‘ask about’, synja ‘acquit of’, unna ‘grant’, varna ‘prevent’, virða ‘value’, vitja ‘visit, go to’, vænta ‘expect’, æskja ‘wish’, þarfnast ‘need’, þurfa ‘need’

(C) Verbs with accusative subjects in Old Icelandic
angra ‘grieve’, ánægja ‘become happy’, belgja ‘blow out’, bera ‘carry’, bila ‘fail’, birta ‘become clear’, bíða ‘exist’, blása ‘swell, blow’, bresta ‘run out of’, brjóta ‘break’, brydda ‘arise’, byrja ‘begin; be required to’, daga uppi ‘dawn up (turn to stone)’, deila ‘divide’, draga ‘pull’, dreyma ‘dream’, drepa ‘be knocked down’, dökkva ‘darken’, enda ‘finish’, endurnýja ‘renew’, fá ‘get’, fenna ‘be covered with snow’, festa ‘fasten’, fjara ‘ebb’, fjölga ‘increase’, forvitna ‘be curious’, frjósa ‘freeze’, fylla ‘fill’, fýsa ‘want’, gera ‘do, make’, geta ‘exist’, girna ‘desire’, greina á um ‘disagree on’, gruna ‘suspect’, grynna ‘become shallow’, hafa út ‘blow through’, harma ‘vex’, hefja ‘raise; begin’, heimta ‘want’, henda ‘happen, concern’, herða ‘become hard’, heyra ‘be heard’, hindra ‘be hindered’, hryggja ‘grieve’, hungra ‘feel hungry’, iðra ‘regret’, kala ‘suffer frostbites’, kefja ‘sink’, keyra ‘drive’, knýta/hnýta ‘become crooked’, kosta ‘cost’, kreppa ‘become crippled’, kvelda ‘become evening’, kyrra ‘calm’, langa ‘want’, leggja ‘become covered with ice’, leiða af ‘follow from’, lengja ‘lengthen’, leysa ‘be dissolved’, líða ‘come to end’, lysta ‘desire, want’, lýsa ‘shine’, lægja ‘lower’, minna ‘remember vaguely’, minnka ‘decrease’, misminna ‘remember wrongly’, muna ‘want’, nauðsynja ‘necessitate’, nátta ‘be overtaken by the night’, opna ‘open’, ógleðja ‘become unhappy’, ónýta ‘become unusable’, ómætta ‘lose strength’, ótta ‘fear’, reiða ‘move about’, reka ‘drift’, rifna ‘tear’, rjúfa ‘split’, ryðja ‘disperse’, rýma ‘become wider’, ræsa ‘come to pass’, saka ‘hurt’, setja ‘become’, sjá ‘be seen’, skaða ‘hurt’, skara ‘protrude’, skemma ‘become short’, skera ‘cut’, skilja ‘divide’, skipta ‘concern’, skorta ‘lack’, slíta ‘be cut’, slægja ‘be tempted’, stemma ‘be obstructed’, stinga ‘sting’, stækka ‘become bigger’, stæra ‘swell’, sundla ‘become dizzy’, svimra ‘become dizzy’, syfja ‘become sleepy’, sýna ‘be seen’, sækja ‘be affected by’, taka ‘be taken’, tíða ‘want’, tíma ‘happen to’, trega ‘regret’, ugga ‘fear’, undra ‘wonder’, vanta ‘lack, need’, vara ‘expect’, varða ‘concern’, vatna ‘disappear in water’, vekja upp ‘awaken’, velgja ‘feel nausea’, velkja ‘toss’, verkja/virkja ‘ache’, verpa ‘be thrown’, vilna ‘hope’, víkja ‘be moved to one side’, vægja ‘suppurate’, væna ‘expect’, vænta/vætta ‘expect’, þrjóta ‘run out of’, þrota ‘lack’, þröngva ‘force’, þynna ‘become thin’, þyrsta ‘feel thirsty’, æsa ‘be stirred’, öfunda ‘envy’


References

Primary sources

Fornmanna sögur. Copenhagen 1825–1835.

Heimskringla (Ólafs saga helga)
1991 Bergljót Kristjánsdóttir, Bragi Halldórsson, Jón Torfason and Örnólfur Thorsson (eds.). Reykjavík: Mál og menning.

Píslarsaga séra Jóns Magnússonar
2001 Matthías Viðar Sæmundsson sá um útgáfuna. Reykjavík: Mál og menning.

Íslendinga sögur (Bandamanna saga, Brennu-Njáls saga, Egils saga, Eiríks saga rauða, Eyrbyggja saga, Finnboga saga ramma, Grettis saga, Svarfdæla saga, Víga-Glúms saga)
1985/86 Bragi Halldórsson, Jón Torfason, Sverrir Tómasson and Örnólfur Thorsson (eds.). Reykjavík: Svart á hvítu.

Sturlunga saga
1988 Bergljót Kristjánsdóttir, Bragi Halldórsson, Gísli Sigurðsson, Guðrún Ása Grímsdóttir, Guðrún Ingólfsdóttir, Jón Torfason, Sverrir Tómasson and Örnólfur Thorsson (eds.). Reykjavík: Svart á hvítu.

ROH = Ritmálssafn Orðabókar Háskóla Íslands [Corpus of Written Icelandic of the University of Iceland Dictionary Project], see: http://www.lexis.hi.is (Ritmálssafn).

Secondary sources

Allen, Cynthia L.
1995 Case Marking and Reanalysis. Grammatical Relations from Old to Early Modern English. Oxford: Oxford University Press.

Barðdal, Jóhanna
2001 Case in Icelandic – A Synchronic, Diachronic and Comparative Approach. Doctoral dissertation, Lund: Department of Scandinavian Languages.

Barnes, Michael
1986 Subject, Nominative and Oblique Case in Faroese. Scripta Islandica 37: 13–46.

Björgvinsdóttir, Ragnheiður
2003 Frumlagsfall í máli barna [Subject Case in Child Language]. B.A. thesis, University of Iceland, Reykjavík.

Bybee, Joan
1994 Morphological Universals and Change. In The Encyclopedia of Language and Linguistics 5, R. E. Asher (ed.), 2557–2562. Oxford: Pergamon Press.

Bybee, Joan, and Dan I. Slobin
1982 Rules and Schemas in the Development and Use of the English Past Tense. Language 58: 265–289.

Bybee, Joan, and Carol Lynn Moder
1983 Morphological Classes as Natural Categories. Language 59: 251–270.

Delsing, Lars-Olof
1991 Om genitivens utveckling i fornsvenskan [On the Development of the Genitive in Old Swedish]. In Studier i svensk språkhistoria 2, Sven-Göran Malmgren and Bo Ralph (eds.), 12–30. Göteborg: Acta Universitatis Gothoburgensis.

Donhauser, Karin
1998 Das Genitivproblem und (k)ein Ende? Anmerkungen zur aktuellen Diskussion um die Ursachen des Genitivschwundes im Deutschen. In Historische germanische und deutsche Syntax, John Ole Askedal (ed.), 69–86. Frankfurt am Main: Lang.

Eythórsson, Thórhallur
2002 Changes in Subject Case Marking in Icelandic. In Syntactic Effects of Morphological Change, David Lightfoot (ed.), 196–212. Oxford: Oxford University Press.

Eythórsson, Thórhallur, and Jóhannes Gísli Jónsson
2003 The Case of Subject in Faroese. Working Papers in Scandinavian Syntax 72: 207–232.

Goldberg, Adele E.
1995 Constructions: A Construction Grammar Approach to Argument Structure. Chicago: University of Chicago Press.

Jespersen, Otto
1942 A Modern English Grammar on Historical Principles, IV: Morphology. Copenhagen: Munksgaard.

Jónsson, Jóhannes Gísli
1996 Clausal Architecture and Case in Icelandic. Doctoral dissertation, University of Massachusetts, Amherst.

Jónsson, Jóhannes Gísli
1997/98 Sagnir með aukafallsfrumlagi [Verbs Taking Oblique Subject]. Íslenskt mál og almenn málfræði 19–20: 11–43.

Jónsson, Jóhannes Gísli
2000 Case and Double Objects in Icelandic. In Leeds Working Papers in Linguistics and Phonetics, Diane Nelson and Paul Foulkes (eds.), 71–94. (Also available at http://www.leeds.ac.uk/linguistics/index1.htm)

Jónsson, Jóhannes Gísli
2003 Not so Quirky: On Subject Case in Icelandic. In New Perspectives on Case Theory, Ellen Brandner and Heike Zinsmeister (eds.), 127–164. Stanford, California: CSLI.

Jónsson, Jóhannes Gísli, and Thórhallur Eythórsson
2003 Breytingar á frumlagsfalli í íslensku [Changes in Subject Case in Icelandic]. Íslenskt mál og almenn málfræði 25: 7–40.

Jónsson, Jóhannes Gísli, and Thórhallur Eythórsson
2005 Variation in Subject Case Marking in Insular Scandinavian. Nordic Journal of Linguistics 28: 223–245.

Maling, Joan
2002 Það rignir þágufalli á Íslandi. Verbs with Dative Objects in Icelandic. Íslenskt mál og almenn málfræði 24: 31–105.

Nübling, Damaris
this vol. How do exceptions arise? On different paths to morphological irregularity.

Nygaard, Marius
1906 Norrøn syntax. Oslo: Aschehoug.

Óskarsson, Veturliði
1997/98 Ske. Íslenskt mál og almenn málfræði 19–20: 181–207.

Pinker, Steven
1999 Words and Rules. The Ingredients of Language. New York: Basic Books.

Petersen, Hjalmar P.
2002 Quirky Case in Faroese. Fróðskaparrit 50: 63–76.

Poulsen, Jóhan Hendrik W., Marjun Simonsen, Jógvan í Lon Jacobsen, Anfinnur Johansen, and Zakaris Svabo Hansen (eds.)
1998 Føroysk orðabók [A Dictionary of Faroese]. Tórshavn: Føroya Fróðskaparfelag & Fróðskaparsetur Føroya.

Sigurðardóttir, Herdís Þ.
2002 Fallmörkun í barnamáli: Hvernig læra íslensk börn að nota föll? [Case Marking in Child Language: How do Icelandic Children Learn to Use Cases?] M.A. thesis, University of Iceland, Reykjavík.

Sigurðsson, Halldór Ármann
1989 Verbal Syntax and Case in Icelandic. Doctoral dissertation, Lund University.

Svenonius, Peter
2002 Icelandic Case and the Structure of Events. Journal of Comparative Germanic Linguistics 5: 197–225.

Thráinsson, Höskuldur, Hjalmar P. Petersen, Jógvan í Lon Jacobsen, and Zakaris S. Hansen
2004 Faroese: An Overview and Reference Grammar. Tórshavn: Føroya Fróðskaparfelag.

Viðarsson, Heimir Freyr
2006 Breytilegt frumlagsfall í forníslensku: athugun á breytileika í fallmörkun skynjandafrumlaga [Variable Subject Case in Old Icelandic: a Study of Variation in the Case Marking of Experiencer Subjects]. B.A. thesis, University of Iceland, Reykjavík.

Yip, Moira, Joan Maling, and Ray Jackendoff
1987 Case in Tiers. Language 63: 217–250.

Zaenen, Annie, Joan Maling, and Höskuldur Thráinsson
1985 Case and Grammatical Functions: The Icelandic Passive. Natural Language and Linguistic Theory 3: 441–483.

Zaenen, Annie, and Joan Maling
1990 Unaccusative, Passive and Quirky Case. In Modern Icelandic Syntax, Joan Maling and Annie Zaenen (eds.), 137–152. San Diego: Academic Press.


Remarks on two kinds of exceptions: arbitrary vs. structured exceptions

Susann Fischer

Jóhannes Gísli Jónsson and Thórhallur Eythórsson (this volume) argue for a dichotomy between structured and arbitrary exceptions with respect to case selection in Insular Scandinavian. Their main claim is that structured exceptions share semantic properties and display a partial productivity which enables them to attract other lexical items to this group by analogy. Arbitrary exceptions, on the other hand, do not share semantic properties and are unproductive. Jónsson and Eythórsson draw their arguments from the diachronic development of case selection in Icelandic and Faroese and show that experiencer accusative subjects have been semi-productive in the history of Insular Scandinavian whereas genitive objects have been totally unproductive. It has been suggested that three kinds of case with arguments have to be recognized in Icelandic (Yip, Maling and Jackendoff 1987, Jónsson 2000, etc.). The theoretical consequence that seems to follow from Jónsson and Eythórsson’s argument is to divide idiosyncratic case again into structured and totally arbitrary idiosyncratic case.

The dichotomy between structured and arbitrary exceptions seems well motivated when argued on the basis of productivity. It is an interesting and new observation, and the explanation given seems reasonable. Nevertheless, it would be interesting to see in what way the semi-productivity of accusative subjects in Icelandic has to do with the fact that these oblique subjects are syntactic subjects and not only logical subjects like e.g. in Modern German. In Modern German we find the same groups of accusative experiencer subjects, e.g. verbs of physical discomfort (i), verbs denoting feelings (ii), and verbs of cognition (iii):

(i) mich dürstet ‘be thirsty’, mich friert ‘be cold’, mich hungert ‘be hungry’, mich schmerzt ‘feel pain’, mich schauert / mich schaudert ‘to tremble’;

(ii) mich wundert ‘be surprised’, mich fürchtet ‘be afraid’, mich erstaunt ‘be surprised’, mich gelüstet ‘have a craving for’, mich erheitert / mich belustigt ‘be amused’, mich verlangt ‘to long for’, mich erfreut ‘be happy’, mich langweilt ‘be bored’, mich erbost ‘be furious’, mich ärgert ‘be angry’, mich erzürnt ‘be infuriated’, mich deprimiert ‘be depressed’;

(iii) mich dünkt / mich deucht ‘me thinks’, mich überrascht ‘be surprised’, mich wundert / mich verwundert ‘be astonished’, mich verblüfft ‘be amazed’.

However, these verbs – even though they obviously share semantic properties – do not attract new members to this class. On the contrary, they lose ground¹ and we find more and more verbs that not only allow for the original accusative, but also for dative or nominative case within one and the same speaker:

(1) mich schaudert     (2) mich verlangt
    mir schaudert          mir verlangt
    ich schaudere          ich verlange

It seems as if, next to the similar semantic properties of accusative subjects, something else were at stake. Most of the old Germanic languages display accusative experiencer subjects. Most of the modern Germanic languages have either lost experiencer subjects altogether, e.g. English, or still display accusative experiencer subjects, e.g. German; however, they are no longer syntactic subjects but only logical subjects. As has been argued by Fischer (2004) and also Hrafnbjargarson (2004), these non-nominative subjects in Old Germanic, Modern Icelandic and Modern Faroese seem to make use of additional functional material in the left periphery. Under this view the loss of accusative experiencer subjects in e.g. English and the change from syntactic subjects to logical subjects in German is explained by the loss of this additional functional category. It seems plausible to assume that the difference between e.g. German and Icelandic/Faroese accusative experiencer verbs – where Icelandic and Faroese verbs are productive and German verbs are unproductive even though they share semantic properties – might depend on the difference with respect to phrase structure. It seems that only as long as accusative experiencers are real syntactic subjects do they attract new members to their class (see also Dative Sickness in Icelandic and Faroese, Eythórsson 2002), and when they lose the capacity to appear as syntactic subjects they also lose ground in the respective languages.

Another point I want to mention regards their argument with respect to arbitrary exceptions, i.e., verbs selecting genitive case.

Jónsson and Eythórsson show that genitive objects have seen a steady decline in the history of Icelandic, from formerly 100 verbs selecting genitive to now only 35 verbs. According to their argumentation this is due to the fact that these verbs do not form any semantically coherent subclass; instead they select idiosyncratic/exceptional case that has to be learnt on an item-to-item basis and is therefore highly susceptible to change. I will not enter the discussion here that obviously generations of Icelanders were able to correctly acquire verbs selecting genitive case before they lost this capacity. Instead I would like to point out that the loss of genitive objects might be connected to other changes that have been going on in the history of Icelandic and Faroese, and therefore might not have anything to do with the fact that genitive objects represent arbitrary exceptions to case selection.

1. More and more speakers of Modern German avoid using oblique experiencer subjects and choose instead a verb with a nominative subject.

Verbs selecting genitive objects represent a crosslinguistic phenomenon. To name only a few modern languages that allow for genitive objects, i.e. that show a morphological case distinction between accusative vs. genitive/partitive in the object domain: Russian, Polish, Serbo-Croatian, Finnish, Estonian, Turkish, etc. It has long been noticed that there seems to be a connection between case alternation on objects and the fact that these languages do not have articles (King 1995, Neidle 1988, among many others), and that additionally in most of these languages there is some interaction between case morphology, reference and aspect (Kiparsky 1998, de Hoop 1992, Leiss 2000). See below in (1) the semantic contrast in Russian with respect to reference and the interaction with aspect.

(1) a.  Ja dobavil     saxar     v  èaj
        I  added.perf  sugar.acc in tea
        ‘I added the sugar to the tea.’

    a.’ Ja dobavil saxara     v èaj
                   sugar.gen
        ‘I added some of the sugar to the tea.’

    b.  Ja dobavljal  saxar     / *saxara²  v  èaj
        I  added.imp  sugar.acc /  -gen     in tea

For other languages again it was shown that the old stratum possessed genitive objects whereas the modern languages have lost this possibility altogether, or at least in the spoken language – Old English, Old Swedish, Old High German, Middle German, etc. The loss of genitive objects in these languages has been of paramount interest within historical linguistics and has resulted in an abundance of speculations about what exactly triggered this change. We find approaches explaining the loss of genitive morphology by the phonological reduction of end syllables (e.g. Behaghel 1923). These approaches do not differentiate between the loss of genitive-inflected objects and the general loss of case morphology on NPs. Others connect the loss of genitive objects to a change in the conception of the world, i.e., to the “Verkümmerung partitiver Denkformen” [degeneration of partitive forms/ways of thinking] (Wolff 1954). Additionally, it has been convincingly argued – on the basis of the modern Germanic languages that still use the accusative/genitive distinction on objects – that there is an interaction between the loss of case and aspect morphology and the availability of articles. In other words, case morphology is used to express reference and to a certain degree interacts with aspect, if no article system is available (Abraham 1997, Leiss 2000, Fischer 2005).

2. The English progressive is a rather new development that only started during Early Modern English.

Let us go back in time: Proto-Germanic had a highly developed verbal aspect system and case morphology, but no articles. During the development of the Germanic languages, they lost their case morphology and their verbal aspect morphology³ to a greater (English, Dutch, etc.) or lesser (Icelandic, German, etc.) degree, but developed definite articles (all) and indefinite articles (all but Icelandic).

With respect to German it has been argued that the loss of genitive case on objects and the weakening of the aspectual morphology interacted with the emergence of the articles (Donhauser 1998, Abraham 1997, Leiss 2000). Donhauser (1992) even proposed to see genitive on objects as a structural case because it only alternates with accusatives and only ever appears in the direct object position. Abraham (1997) observes that the [+/-def] interpretation of the object NP in Old High German was the result of the interplay between aspectual and case morphology, similar to the modern Russian system (see also Fischer 2005). A genitive NP did not combine with an imperfective verb; in the scope of a perfective verb it always received a [-def] reading. An accusative-marked NP, however, could receive a [+def] reading in the scope of a perfective verb and a [+/-def] reading in the scope of an imperfective verb. Only accusatives combined with both perfective and imperfective verbs and could receive both a [+def] interpretation and a [-def] interpretation. After aspectual marking disappeared, genitive lost its status as being opposed to the accusative-marked objects, and as a result the verbally governed genitive case disappeared, i.e. the definite/indefinite reading of the object NP could no longer be obtained through the interplay between case opposition and aspectual conditions. The interplay weakened and finally disappeared completely; in its place, the determiner category was lexically filled, first with a definite and later with the indefinite article (Abraham 1997: 59).

3. Genitive case is usually excluded with imperfective verbs. Some verbs do occur with imperfective morphology and genitive case on the object; these verbs, however, get an iterative interpretation (Fischer 2003).

With respect to Icelandic, maybe a similar development took place. According to Leiss (2000), Proto-Nordic encoded definiteness only in indefinite contexts (i.e., in rheme position) by the alternation of SVO to SOV. Additionally, we know that quite a lot of verbs in Proto-Nordic allowed the alternate use of accusative case next to genitive case in the position of the direct object. In Old Icelandic, word order gets fixed towards V2, and from the 7th century onwards preverbal aspect markers started to disappear. Since the verb had to appear in second position, definiteness in rheme positions could no longer be encoded by word-order alternations – it needs to be encoded now by the use of definite articles (according to Leiss 2000). However, the alternating use of accusative vs. genitive is still available in Old Icelandic (Nygaard 1906). Modern Icelandic does not use preverbal markers in order to denote aspect, and it still does not use an indefinite article, but it still allows for some verbs to appear with a genitive object. So it seems possible that the loss of aspectual morphology and the emergence of the definite article somehow triggered the loss of genitive objects in most verbs. This seems especially plausible since we do know that genitive in those languages that do not have article systems is used in order to denote indefiniteness with respect to the object NP (Neidle 1988, among many others) and also interacts with aspectual morphology, or in some languages even denotes verbal aspectual differences (e.g. Finnish, cf. de Hoop 1992, Kiparsky 1998).

Of course it is impossible, without a thorough investigation of the Old Icelandic data, to argue that the loss of genitive objects is definitely connected to the loss of aspectual morphology and to the emergence of articles. However, the previous discussion is meant to at least cast some doubt on the claim that genitive objects get lost only because they represent arbitrary exceptions.

References

Abraham, Werner
1997 The interdependence of case, aspect and referentiality in the history of German: the case of the verbal genitive. In Parameters of Morphosyntactic Change, Ans van Kemenade and Nigel Vincent (eds.), 29–61. Cambridge: Cambridge University Press.

Behaghel, Otto
1923 Deutsche Syntax. Eine geschichtliche Darstellung. Bd. I. Heidelberg: Carl Winter’s Universitätsbuchhandlung.

de Hoop, Helen
1992 Case Configuration and NP Interpretation. Doctoral dissertation, Rijksuniversiteit Groningen. – Published New York: Garland 1996.

Donhauser, Karin
1992 Das Genitivproblem in der historischen Kasusforschung. Ein Beitrag zur Diachronie des deutschen Kasussystems. Habilitationsschrift, Passau.

Donhauser, Karin
1998 Das Genitivproblem und (k)ein Ende? Anmerkungen zur aktuellen Diskussion um die Ursachen des Genitivschwundes im Deutschen. In Historische germanische und deutsche Syntax, John Ole Askedal (ed.), 69–86. Bern: Lang.

Eythórsson, Thórhallur
2002 Changes in subject case marking in Icelandic. In Syntactic Effects of Morphological Change, David Lightfoot (ed.), 196–212. Oxford: Oxford University Press.

Fischer, Susann
2003 Partitive vs. Genitive in Russian and Polish: an empirical study on case alternation in the object domain. In Experimental Studies I (Linguistics in Potsdam 21), Susann Fischer, Ruben van de Vijver and Ralf Vogel (eds.), 73–89. Potsdam: Universität Potsdam.

Fischer, Susann
2004 The diachronic relationship between quirky subjects and stylistic fronting. In Non-Nominative Subjects, Vol. 1, Karumuri Venkata Subbarao and Peri Bhaskarao (eds.), 193–212. Amsterdam/Philadelphia: Benjamins.

Fischer, Susann
2005 The interplay of reference and aspect. In Specificity and the Evolution/Emergence of Nominal Determination Systems in Romance (Konstanzer Arbeitspapiere zur Sprachwissenschaft 119), Elisabeth Stark, Klaus von Heusinger and Georg Kaiser (eds.), 1–18. Universität Konstanz.

Hrafnbjargarson, Gunnar Hrafn
2004 Oblique subjects and stylistic fronting in the history of Scandinavian and English. Ph.D. dissertation, Aarhus Universitet.

King, Tracy Holloway
1995 Configuring Topic and Focus in Russian (Dissertations in Linguistics). Stanford, CA: Center for the Study of Language and Information.

Kiparsky, Paul
1998 Partitive Case and Aspect. In The Projection of Arguments, Miriam Butt and Wilhelm Geuder (eds.), 265–307. Stanford, CA: CSLI Publications.

Leiss, Elisabeth
2000 Artikel und Aspekt. Die grammatischen Muster von Definitheit. Berlin/New York: de Gruyter.

Meyer-Lübke, Wilhelm
1888 Die lateinische Sprache in den romanischen Ländern. In Grundriß der romanischen Philologie, Gustav Gröber (ed.), 351–382. Straßburg: Trübner.

Neidle, Carol
1988 The Role of Case in Russian Syntax. Dordrecht: Kluwer.

Nygaard, M.
1906 Norrøn syntax. Oslo: Aschehoug.

Wolff, Ludwig
1954 Über den Rückgang des Genitivs und die Verkümmerung der partitiven Denkformen (Helsinki Annales Academiae Scientiarum Fennicae, Series B). Helsinki.


Response to Susann Fischer

Jóhannes Gísli Jónsson and Thórhallur Eythórsson

In our paper we focus on two instances of idiosyncratic case in Insular Scandinavian (Icelandic and Faroese): (i) genitive case with objects and (ii) accusative case with subjects. We argue that there is a difference between these two types in terms of their productivity. Thus, while genitive objects have been totally unproductive in the recorded history of Icelandic, accusative subjects have displayed some productivity in the same period. The development has gone even further in Faroese in that genitive case has been completely lost as an object case, but accusative can still be found with subjects to a very limited degree. On the basis of the observed difference between accusative subjects and genitive objects we argue for a dichotomy of structured and arbitrary exceptions. Structured exceptions share semantic properties and display partial productivity which enables them to attract new lexical items into their group. Arbitrary exceptions, on the other hand, do not share any semantic properties and are entirely unproductive.

In her remarks on our paper, Fischer grants that the dichotomy between structured and arbitrary exceptions is well motivated. Nevertheless, on the basis of comparative evidence from German, she claims that some other factors in the historical development of case should be considered. Thus, the loss of genitive case with objects in German may have interacted with a “weakening” of the aspectual morphology of the verb and the emergence of the articles. Fischer suggests that a similar development may have taken place in Icelandic. In fact, however, no such development occurred in Icelandic in the period under investigation in our paper (i.e. from the 13th century to the present day). More generally, we are not aware of any morphosyntactic changes in the relevant period that could have contributed to the decline of genitive objects.

The other major point made by Fischer is that it would be interesting to see how the semi-productivity of accusative experiencers in Icelandic is connected to the fact that they are syntactic subjects. By contrast, accusative experiencers have a much weaker status in Modern German, where subject-like obliques are usually assumed to be non-subjects. This hypothesis is clearly undermined by Faroese, where accusative experiencers have more or less disappeared despite the fact that their subject status is not in doubt (Barnes 1986, Eythórsson and Jónsson 2003, Thráinsson et al. 2004). Thus, both Faroese and German have undergone more changes in their case systems than Icelandic, but we will not speculate here why this is so.

Fischer’s hypothesis is further weakened by the fact that German has been argued to have oblique subjects (e.g. Eythórsson and Barðdal 2005), contrary to the standard view in Germanic linguistics. But even if we accept the standard view, it is unclear why the productivity of accusative experiencers in Icelandic (as opposed to German) should be enhanced by their subject status. In fact, acquisition studies show that accusative experiencer subjects are acquired fairly late in Icelandic (Björgvinsdóttir 2003, Sigurðardóttir 2002) – much later than e.g. accusative or dative objects. The conclusion, then, is that subject status does not provide any defence for accusative experiencers against diachronic change.

References

Barnes, Michael
1986	Subject, nominative and oblique case in Faroese. Scripta Islandica 37: 13–46.

Björgvinsdóttir, Ragnheiður
2003	Frumlagsfall í máli barna [Subject case in child language]. B.A. thesis, University of Iceland, Reykjavík.

Eythórsson, Thórhallur, and Jóhanna Barðdal
2005	Oblique subjects: A Common Germanic inheritance. Language 81: 824–881.

Eythórsson, Thórhallur, and Jóhannes Gísli Jónsson
2003	The case of subject in Faroese. Working Papers in Scandinavian Syntax 72: 207–232.

Sigurðardóttir, Herdís Þ.
2002	Fallmörkun í barnamáli: Hvernig læra íslensk börn að nota föll? [Case marking in child language: How do Icelandic children learn to use cases?] M.A. thesis, University of Iceland, Reykjavík.

Thráinsson, Höskuldur, Hjalmar P. Petersen, Jógvan í Lon Jacobsen and Zakaris S. Hansen
2004	Faroese: An Overview and Reference Grammar. Tórshavn: Føroya Fróðskaparfelag.


Loosening the strictness of grammar


Three approaches to exceptionality in syntactic typology

Frederick J. Newmeyer

Abstract. In this paper, I contrast three approaches to handling exceptionality in syntactic typology: the ‘macroparametric’ approach associated with the Government-Binding Theory; the ‘microparametric’ approach associated with the Minimalist Program; and an extrasyntactic approach, in which parsing and other performance principles account for typological variation and exceptions to typological generalizations. I conclude that the extrasyntactic approach is best motivated.∗

1. Introductory remarks

Most generative grammarians have taken what one might call a strongly ‘deterministic’ approach to language. The methodological strategy of generative grammar has always been to push to the side what seems non-deterministic, irregular, and unpredictable. That is, in the search for maximally general principles, generative grammarians have often ignored the messiness typically found in linguistic data. Consider the most famous (some would say ‘notorious’) passage in all of Chomsky’s writings:

Linguistic theory is concerned primarily with an ideal speaker-listener, in a completely homogeneous speech-community, who knows its language perfectly and is unaffected by … grammatically irrelevant conditions … in applying his knowledge of the language in actual performance. (Chomsky 1965: 3)

That passage seems to imply a mechanical approach, at least in practice, to the question of how one’s knowledge of language affects one’s use of language. And as far as the internal structure of the grammar is concerned, we again see a methodology that abstracts away from the messier facts. After all, with few and entirely recent exceptions, all formal linguists have taken an algebraic view of grammar, rather than a stochastic one. Chomsky could not have been more explicit on the question of determinism when he wrote: “The principles of universal grammar are exceptionless” (Chomsky 1988: 62).

* I would like to thank Edith Moravcsik, Thomas Wasow, and two anonymous referees for their comments on an earlier draft of this paper. Portions have appeared in Newmeyer (2004, 2005) and are reprinted with permission.

In early transformational grammar, language-particular rules were assumed to admit exceptions (see, for example, Lakoff 1965/1970). However, with the introduction of the general ‘all-purpose’ movement rule, Move-α, in the mid-1970s, exceptionality was removed entirely from the transformational component. Problematic cases such as (1) through (3) below, which seem to be prima-facie counterexamples to the idea of exception-free rules, were either consigned to the lexicon or (more often) ceased to be subject matter for theoretical discussion:

(1) a. He is likely to be late.
    b. *He is probable to be late. (likely, but not probable, allows raising)

(2) a. He allowed the rope to go slack.
    b. *He let the rope to go slack. (let does not take the infinitive marker)

(3) a. He isn’t sufficiently tall.
    b. *He isn’t enough tall. / He isn’t tall enough. (enough is the only degree modifier that occurs post-adjectivally)

It is also no secret that there is indecisiveness in judgments about the acceptability of sentences. Chomsky wrote about the problem in his earliest work (Chomsky 1957). He was aware that judgments are not a yes/no matter and hoped that analyses could be arrived at on the basis of totally clear and uncontroversial judgments. The theory would then decide the status of the unclear cases. But in fact that has happened only rarely. Indeed, we have seen the precise reverse numerous times, in which appeal is made to the most unclear and most controversial data. To give one example, Lasnik and Saito (1984) consider it a major selling point of their theory of proper government that it accounts for the ambiguity of sentence (4):

(4) Why do you think that he left?

In their view, why can be understood as questioning the reason for thinking or questioning the reason for leaving. Its supposed ambiguity was a crucial piece of data in their analysis. But for Aoun, Hornstein, Lightfoot and Weinberg (1987), a major advantage of their account of proper government is that it accounts for the lack of ambiguity of the very same sentence. Examples of this state of affairs are all too frequent.


The grammatical indeterminism and exceptionality that will be the focus of the remainder of this paper is the sort that we find in typological generalizations. There is a tradition over a century old of dividing languages into broad types. Until recently, the types were morphologically based: inflecting, analytic, agglutinative, polysynthetic, and so on (see Sapir 1921). For the past forty years, the types have generally been syntactically based. Since the publication of Greenberg (1963), most linguists have taken the order of heads and complements to be the most revealing for purposes of syntactic typology. That is, one says that if a language is head-initial, then a particular constellation of properties follows and that if a language is head-final, then a different constellation of properties follows.

When one looks at things closely, however, it is not just the supposed correlates of head-initiality and head-finality that are full of exceptionality, but the very notion of ‘head-initial language’ itself and the notion of ‘head-final language’ itself. Consider the statistics in (5), all of which but (5e) are drawn from Dryer (1991):

(5) For verb-final languages:

    a. 96 % are postpositional
    b. 85 % have predicate-copula order
    c. 73 % have sentence-question particle order
    d. 71 % are wh-in-situ
    e. 64 % have case marking (Siewierska and Bakker 1996)
    f. 43 % have relative-noun order

It is the theoretical status of typological generalizations – and, in particular, exceptions to them – that constitutes the subject matter of the remainder of this paper. In other words, I take on the question of how linguistic theory should best handle, say, the 4 % of prepositional verb-final languages, the 15 % of verb-final languages with copula-predicate order, and so on.

In the following sections, I contrast three approaches to handling exceptionality in syntactic typology: the ‘macroparametric’ approach associated with the Government-Binding Theory (GB) (§ 2); the ‘microparametric’ approach associated with the Minimalist Program (MP) (§ 3); and an extrasyntactic approach, in which parsing and other performance principles account for typological variation and exceptions to typological generalizations (§ 4). I conclude that the extrasyntactic approach is best motivated. Section 5 is a brief conclusion.


2. The macroparametric approach of GB

In broad outline, the GB program for typology was very simple (I should perhaps write ‘is’ rather than ‘was’, since Mark Baker’s recent book Atoms of Language [Baker 2001] reasserts it). The idea is that the principles of UG are associated with a small number of broad-scope macroparameters, each of which admits of a small number of settings. A language as a whole is specified for each setting. So English might be set positively for the Overt-Wh-Movement parameter, negatively for the Verb Raising parameter, and with the setting ‘S and NP’ for the Subjacency parameter. In the GB view, the interactions of these settings combine to generate the diversity of human languages. Furthermore, since the parameters themselves are highly abstract, the idea is that unexpected clusterings of typological features should follow automatically.

Now, what about typological exceptionality in this model? In fact, there are several methods proposed for its treatment in classical GB. One is by means of markedness relations among parameter settings. For example, Chinese is consistently head-final except in the rule expanding X′ to X0 (if the head is verbal it precedes the complement). So, as noted in Huang (1982: 46), Chinese manifests the ordering V-NP, but NP-N:

(6) a. you          sangè  rén  mai-le   shu
       EXISTENTIAL  three  man  buy-ASP  book
       ‘Three men bought books’

    b. Zhangsan  de   sanben  shu
       Zhangsan  NOM  three   book
       ‘Zhangsan’s three books’

Travis (1989) suggested that Chinese has a marked parameter setting for word order. Normally, if a language is head-final, it assigns Case and Theta-Role to the left, as in (7a). However, Chinese has a special setting that violates this default ordering, namely (7b):

(7) a. Unmarked setting: HEAD-RIGHT ⊃ THETA-ASSIGNMENT TO LEFT & CASE-ASSIGNMENT TO LEFT
    b. Marked setting (Chinese): HEAD-RIGHT & THETA-ASSIGNMENT TO RIGHT & CASE-ASSIGNMENT TO RIGHT

In other words, Chinese grammar is more complicated than the grammar of a consistent language and is therefore required to ‘pay’ for its typological exceptionality.


Another strategy within GB was to assign typologically exceptional processes to the marked periphery, namely a system lying outside the principles and parameters of core grammar. Some candidates proposed for the marked periphery in Chomsky (1981) are the following:

(8) a. Elliptical expressions (He is seeing someone, but I don’t know who)
    b. ‘Exceptional Case Marking’ (I believe her to be clever)
    c. Picture noun reflexives (John thinks that the picture of himself is not very flattering)
    d. Preposition-stranding (van Riemsdijk 1978) (Who did you talk to?)

The GB program for handling typological generalizations and exceptions to these generalizations has not worked out as originally envisioned. To a large extent, this is because some of the most discussed ‘generalizations’ turned out to be spurious. It does not make much sense to talk about an ‘exception’ to a non-existent generalization. Most seriously, the hoped-for clustering of typological properties characterizable by a simple parameter setting seems not to exist. I illustrate this first with the Null-Subject Parameter, which is by far the best-studied parameter in GB. The theory of Rizzi (1982) predicts the following possible clustering of features:1

(9) NULL TS   NULL NTS   SI    THAT-T
    yes       yes        yes   yes
    no        yes        yes   yes
    no        no         no    no

But still other language types exist, or at least appear to. In particular, we find languages such as Brazilian Portuguese (Chao 1981) and Chinese (Huang 1982, 1984) that have null subjects, but not subject inversion. Taking such language types into account, Safir (1985) broke the Null Subject Parameter into three parts, dissociating null nonthematic subjects, null thematic subjects, and subject inversion, thereby predicting a wider set of languages than did Rizzi, namely the following:

1. In this and in the following examples, the following abbreviations are used: NULL TS = null thematic subjects; NULL NTS = null nonthematic subjects; SI = subject inversion; THAT-T = the possibility of that-trace filter violations.


(10) NULL TS   NULL NTS   SI    THAT-T
     yes       yes        yes   yes
     yes       yes        no    no
     no        yes        yes   yes
     no        no         yes   yes
     no        no         no    no

If Safir’s predictions were correct, then an ‘exceptional’ language would be one that had, say, null thematic subjects, no null nonthematic subjects, subject inversion, and no that-trace effects.

Rizzi’s and Safir’s predictions were put to the test by Gilligan (1987), who worked with a 100-language sample, which he attempted to correct for areal and genetic bias.2 Gilligan devotes many pages of discussion to the problems involved in determining whether a language manifests one of the four properties or not. His final determination was often based on the results of then-current generative analyses, rather than on mere surface facts about the language in question. For example, he excluded Chinese, Thai, Indonesian, Burmese and other languages that lack agreement morphology from the ranks of those permitting null thematic subjects on the basis of the analysis of Chinese in Huang (1984), which takes the empty subject in that language to be a null topic, rather than a pro. Gilligan found the following correlations of properties in his sample (languages for which there was not sufficient data are excluded):

(11)                      yes–yes   yes–no   no–yes   no–no
     NULL TS – NULL NTS   24        0        15       2
     NULL TS – SI         22        49       11       15
     NULL TS – THAT-T     5         3        2        1
     NULL NTS – SI        14        25       1        1
     NULL NTS – THAT-T    7         2        0        1
     SI – THAT-T          4         0        3        4

According to Gilligan, the data in (11) reveal that the only robust correlations among the four features are the following:

(12) a. NULL TS → NULL NTS
     b. SI → NULL NTS
     c. SI → THAT-T
     d. THAT-T → NULL NTS

2. The most extensive published discussion of Gilligan’s work that I am aware of is found in Croft (2003: 80–84).


These results are not very heartening for either Rizzi’s theory or Safir’s, nor, indeed, for any theory which sees in null subject phenomena a rich clustering of properties. In three of the four correlations, null nonthematic subjects are entailed, but that is obviously a simple consequence of the virtual nonexistence of languages that manifest overt nonthematic subjects. Even worse, five language types are attested whose existence neither theory predicts. Current work on null subjects pretty much ignores the clustering issue and therefore (necessarily) the questions of exceptions to the predicted clusterings.
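The logic by which the implications in (12) fall out of the counts in (11) can be made explicit. The sketch below is a hypothetical illustration of my own, not part of Gilligan’s methodology: an implication A → B is falsified by every language showing A but lacking B, so it counts as ‘robust’ when that cell of the table is empty or nearly so.

```python
# Hypothetical sketch: recovering the robust implications in (12)
# from the pairwise counts in (11). For each ordered pair (A, B),
# the implication "A -> B" is falsified by languages with A = yes
# and B = no; we treat it as robust if at most one language does so.
counts = {  # (A, B): {(value of A, value of B): number of languages}
    ("NULL TS", "NULL NTS"): {("y","y"):24, ("y","n"):0,  ("n","y"):15, ("n","n"):2},
    ("NULL TS", "SI"):       {("y","y"):22, ("y","n"):49, ("n","y"):11, ("n","n"):15},
    ("NULL TS", "THAT-T"):   {("y","y"):5,  ("y","n"):3,  ("n","y"):2,  ("n","n"):1},
    ("NULL NTS", "SI"):      {("y","y"):14, ("y","n"):25, ("n","y"):1,  ("n","n"):1},
    ("NULL NTS", "THAT-T"):  {("y","y"):7,  ("y","n"):2,  ("n","y"):0,  ("n","n"):1},
    ("SI", "THAT-T"):        {("y","y"):4,  ("y","n"):0,  ("n","y"):3,  ("n","n"):4},
}

def robust_implications(counts, max_exceptions=1):
    robust = []
    for (a, b), c in counts.items():
        if c[("y", "n")] <= max_exceptions:  # A = yes, B = no falsifies A -> B
            robust.append(f"{a} -> {b}")
        if c[("n", "y")] <= max_exceptions:  # B = yes, A = no falsifies B -> A
            robust.append(f"{b} -> {a}")
    return robust

print(robust_implications(counts))
# The four survivors are exactly (12a-d): NULL TS -> NULL NTS,
# SI -> NULL NTS, THAT-T -> NULL NTS, and SI -> THAT-T.
```

Note that SI → NULL NTS survives only because a single counterexample is tolerated; with a strict zero-exception threshold it would drop out, which is precisely the sense in which these are robust tendencies rather than exceptionless laws.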

To take another example of a failed prediction of clustering within GB, Kayne (1984) links parametrically the following four properties of French, all of which differ from their English counterparts:

(13) a. The assigning of oblique case by prepositions (as opposed to objective case) (avec lui/*le)
     b. The impossibility of Preposition-stranding (*Qui as-tu parlé à?)
     c. The impossibility of Exceptional Case Marking (*Je crois Jean être sage)
     d. The impossibility of Dative Shift (*J’ai donné Marie un livre)

For Kayne, then, it would be ‘exceptional’ to find a language that allowed Preposition-Stranding, but disallowed Dative Shift.

Unfortunately, Kayne’s parameter appears to make incorrect predictions crosslinguistically. For example, many English-based creoles lack stranding, as the following examples from Sranan illustrate (Muysken and Law 2001: 53):

(14) a. nanga  san   u    koti  a    brede?
        with   what  you  cut   the  bread

     b. *san   u    koti  a    brede  nanga?
         what  you  cut   the  bread  with

Yet in Sranan there is no evidence for distinguishing objective from oblique case. Saramaccan is also in conflict with the parameter, in that oblique Case and stranding are missing, yet it does have Exceptional Case Marking and double object constructions (Veenstra 1996). Also, Kayne’s parametric account does not elegantly distinguish Icelandic, a case-rich stranding language, from Southern German and some Slavic languages, also case-rich, but non-stranding. Chinese and Indonesian have Dative Shift, but no stranding, while Prince Edward Island French has stranding but no Exceptional Case Marking. And finally, there is experimental work by Stromswold (1988, 1989) and Sugisaki and Snyder (2001) that shows that acquisitional data do not bear out the idea that one parameter is implicated in these processes.


Further impeding an adequate GB-based approach to typological exceptionality is the fact that the notion ‘setting for a particular parameter’ is not necessarily constant within a single language. The original vision of parameters was an extremely attractive one, in that the set of their settings was conceived of as a checklist for a language as a whole. But the Lexical Parameterization Hypothesis (LPH) has put an end to this vision:

(15) Lexical Parameterization Hypothesis (Borer 1984; Manzini and Wexler 1987): Values of a parameter are associated not with particular grammars, but with particular lexical items.

Something like the LPH is certainly necessary. For example, different anaphoric elements in the same language can have different binding domains, as is the case with Icelandic hann and sig. But the LPH forces us to give up the idea that the child simply checks off one parameter setting after another in the process of language acquisition. What is even worse is that different structures in the same language seem to have different settings. For example, Rizzi (1978) tried to capture the differences in extraction possibilities between English and Italian by positing that S is a bounding node for Subjacency in English, but S′ in Italian. But observe in (16) that English is as permissive as Italian when the extracted wh-element is a direct object, especially if the lowest clause is nonfinite:

(16) This is the car that I don’t know how to fix.

There has also been very little support for a GB-style parameter-setting model from language acquisition. Actually, most work in the generative tradition simply assumes that acquiring a language is a matter of parameter-setting, rather than providing evidence directly bearing on the question. That is, it takes parameters as a given and raises questions such as: ‘Do parameters have default values?’, ‘Can parameters be reset in the course of acquisition?’, and so on. Yet a number of factors suggest that a parameter-setting strategy for first language acquisition is far from the simple task portrayed in much of the literature.

Several of the problems for parameters result from the fact that what the child hears are sentences (or, more correctly, utterances), rather than structures. But any given utterance is likely to massively underdetermine the particular structural property that the child needs to set some particular parameter. The greater the number of parameters to be set, the greater the problem, particularly given that few of the parameter settings appear to have unambiguous triggers. Citing Clark (1994), Janet Fodor points out that there is an “exponential explosion from the parameters to the number of learning steps to set them … If so, the learner might just as well check out each grammar, one by one, against the input; nothing has been gained by the parameterization. … [to] set one parameter could cost the learner thousands or millions of input sentences” (Fodor 2001b: 736). What makes the problem even more serious is the fact that children are obviously not born with the ability to recognize triggers for any one particular language. English-speaking, Chinese-speaking, and Japanese-speaking children all need to arrive at a negative setting for the Ergativity Parameter, given its existence, but it is by no means obvious what feature common to the three languages would lead the very young child to arrive at that particular setting.
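The combinatorial side of Fodor’s worry is simple arithmetic, sketched below as an illustration (the function name is my own, not from the acquisition literature): n independent binary parameters define 2^n candidate grammars, so a learner who could do no better than test whole grammars against the input would face an exponentially growing search space.

```python
# Illustrative arithmetic only: n independent binary parameters
# define 2**n distinct grammars for a whole-grammar learner to test.
def grammar_space(n_parameters: int) -> int:
    """Number of grammars definable by n binary parameters."""
    return 2 ** n_parameters

for n in (5, 10, 20, 30):
    print(n, grammar_space(n))
# With 30 binary parameters (roughly the number proposed for DP
# alone in some work) the space already exceeds a billion grammars.
```

This is, of course, only the worst case: the promise of trigger-based parameter setting was precisely to avoid searching this space grammar by grammar, which is why the apparent absence of unambiguous triggers is so damaging.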

In other words, the fundamental problem is that parameter-setting presupposes some non-negligible degree of prior structural assignment. To illustrate the problem, Hyams (1986) speculates that a child sets the Null Subject Parameter with a negative value when it hears an expletive. But how can the child know what an ‘expletive’ is without already having a syntax in place? ‘Expletive’ is not an a priori construct available to the newborn, but is interpreted only with respect to an already existing grammar. But if the grammar is already in place, then why do we need parameters at all?3

Given the hypothesis that parameters are complemented by rules in the marked periphery, the learner’s task is not simplified by the positing of parameters. As pointed out by Foley and Van Valin (1984: 20), it is made more complicated. Since learners have to acquire rules anyway, they have a double burden: acquiring both rules and parameter settings and figuring out which phenomena are handled by which. And one would assume (along with Culicover 1999: 16) that any learning mechanism sophisticated enough to acquire the ‘hard stuff’ in the periphery would have no trouble acquiring the ‘easy stuff’ at the core, thereby rendering the notion ‘parameter’ superfluous.

Along the same lines, there is no evidence that ‘peripheral’ knowledge is stored and/or used any differently from that provided by the system of principles and parameters per se. When head-directionality or V2-ness are at stake, do German speakers perform more slowly in reaction time experiments than do speakers of head-consistent non-V2 languages? Do they make more mistakes in everyday speech, say by substituting unmarked constructions for marked ones? Do the marked forms pose comprehension difficulties? In fact, is there any evidence whatsoever that such knowledge is dissociable in some way from more ‘core’ knowledge? As far as I am aware, the answers to all of these questions are ‘no’. As Janet Fodor has stressed: “The idea that there are two sharply different syntax learning mechanisms at work receives no clear support that I know of from theoretical, psychological, or neurological studies of language” (Fodor 2001a: 371).

3. For similar arguments see Nishigauchi and Roeper (1987); Haider (1993); Valian (1990); and Mazuka (1996).

Finally, there is little credence to the idea that there is a robust correlation between the order of acquisition of some feature and its typological status, a fact which casts into doubt the idea that parameters are organized in an implicational hierarchy. Some late-acquired features are indeed typologically relatively rare, as appears to be the case for the verb-raising that derives VSO order from SVO order (see Guilfoyle 1990 for Irish; Radford 1994 for Welsh; Ouhalla 1991b for Arabic). But other grammatical features appear to be acquired relatively late, without being typologically rare (see Eisenbeiss 1994 on scrambling in German).

In general, however, children acquire the relevant structures of their language quite early, regardless of how common that structure is crosslinguistically. Hence English-speaking children acquire P-stranding before pied-piping (Sugisaki and Snyder 2001). French-speaking children have verb-raising from the earliest multi-word utterances (Déprez and Pierce 1993; Pierce 1992; Meisel and Müller 1992; Verrips and Weissenborn 1992). English-speaking children never manifest verb-raising (Stromswold 1990; Harris and Wexler 1996). Furthermore, children figure out very early whether their language is null subject or not (Valian 1991) and children acquiring English, German, and French evidence strong knowledge of locality in wh-extraction domains at early ages (Roeper and De Villiers 1994).4

Before leaving GB, I want to mention one more proposal for handling typological exceptionality within that framework. Baker (2001) suggests attributing typological consistency to purely grammatical factors and typologically inconsistent behavior to extragrammatical causes, just as physicists attribute to extraneous factors such as air resistance the fact that objects are not observed to fall to earth at the rate of 9.8 meters per second squared. Baker noted that the Ethiopian language Amharic is SOV, yet is exceptional for an SOV language in having prepositions. Baker writes:

Languages that are close to the ideal types are much more common than languages that are far from them. According to the statistics of Matthew Dryer, only 6 percent of languages that are generally verb final are like Amharic in having prepositions rather than postpositions. … The conflict of historical and geographical influences could partially explain why Amharic is a mixed case. (Baker 2001: 82–83)

4. On the other hand, a number of language acquisition researchers continue to provide evidence for the idea, first articulated in Borer and Wexler (1987), that principles of grammar ‘mature’ with age (see, for example, Babyonyshev, Ganger, Pesetsky and Wexler 2001). For an interesting, albeit brief, overview of the issues involved see Smith and Cormack (2002).

As an initial point to make in response to this quote, Baker’s “6 percent” is somewhat misleading, perhaps inviting the reader to conclude that 94 percent of languages are typologically consistent. But when the totality of the typological generalizations is taken into account, very few, if any, languages are exception-free typologically. In fact, I agree with Baker’s point that historical and geographical influences are at the root of much typological inconsistency. But the analogy between Amharic’s having prepositions and leaves not falling to earth at the predicted rate seems far-fetched. Principles that predict the rate of falling bodies and those that predict the effects of air resistance belong to two different physical systems. Is there any evidence that an Amharic speaker’s knowledge that auxiliaries follow verbs in that language (as is typically the case for SOV languages) and their knowledge that it is prepositional (which is rare for SOV languages) belong to two different cognitive systems? No, there is absolutely no evidence whatsoever for such a hypothesis.

In summary, the GB program for typology has been abandoned by all but a few scholars. The hypothesized clustering of typological properties based on the positing of abstract parameter settings appears not to exist. Even worse, it appears that parameter settings need to be attributed to individual lexical items (and possibly even to constructions), rather than to entire languages. Since this program has not succeeded in capturing typological regularity, it can be dismissed as a contender for a theory capable of capturing exceptions to typological generalizations.

3. The microparametric approach of the MP

In the MP, parameter settings are not associated with principles of UG that hold for an entire language, but rather with particular functional projections present or not present in a particular language. Languages are posited to differ in terms of which functional projections are manifest and in terms of the featural content of these projections. By way of illustration, McCloskey (2002) argues that whether or not a language makes productive use of resumptive pronouns depends on the inventory of Complementizer-type elements in the language, and in the analysis of Pesetsky and Torrego (2001), whether or not a language has Subject-Aux Inversion depends on the featural properties of the COMP node. Going along with this shift in the interpretation of where parameters reside is a shift from a focus on ‘macroparameters’ to one on ‘microparameters’. The latter are, essentially, slight differences in the properties of functional heads that are responsible for minute differences in structure between closely related languages and dialects.

But what about the handling of typological exceptionality in the MP? In fact, it is not at all clear. The basic ontology of the MP is very sparse, consisting essentially of the basic operations of Merge and Move, subject to economy conditions of various sorts. The inventory of basic operations is so pared down and minimal that there is no elegant way for the syntactic component per se to distinguish typologically regular processes from typologically exceptional ones. Everything essentially boils down to idiosyncratic properties of the lexicon, in particular, to functional heads in the lexicon and their projections.

Let me provide a concrete example based on Cinque (1994). Cinque presents a minimalist analysis of certain differences between French and English, as depicted in (17)–(19):

(17) a. un gros ballon rouge
     b. ‘a big red ball’

(18) a. un tissu anglais cher
     b. ‘an expensive English fabric’

(19) a. an old friend (= friend who is aged or friend for a long time)
     b. une vieille amie (= friend for a long time)
     c. une amie vieille (= friend who is aged)

Cinque’s analysis of (17)–(19) is summarized in (20a–c):

(20) a. French has postnominal adjectives (as in 17a) because of a parametric difference with English that allows N-movement to a higher functional projection in the former language, but not in the latter.

     b. Cher has scope over anglais in (18a) because French has a parametric difference with English that triggers movement of an N-ADJ constituent.

     c. In (19), the two positions for vieille in French, but only one for old in English, result from a parametric difference between the two languages regarding the feature-attraction possibilities of functional categories.

As Bouchard (2003) points out, the problem with such an account is that the word ‘parameter’ is used as nothing more than a synonym for the word ‘rule’. There is no increase in descriptive elegance, economy, or whatever in Cinque’s account over an account which does no more than say that English and French have different rules of adjective placement. And most importantly for our purposes, there is nothing in Cinque’s minimalist account that would begin to explain why N-Adj order is quite a bit more common than Adj-N order crosslinguistically. Consider some data from Dryer (1988: 188–189), which show that for all three basic word orders, more languages manifest N-before-Adj than Adj-before-N order:

(21) a. SOV & AdjN 64 languages
     b. SOV & NAdj 94 languages

(22) a. SVO & AdjN 23 languages
     b. SVO & NAdj 67 languages

(23) a. VSO & AdjN 15 languages
     b. VSO & NAdj 24 languages

Note that an SVO language with AdjN order (like English) is exceptional – only 23 % of languages manifest that correlation. That result does not follow from Cinque’s MP-based approach.

In GB, at least in principle, the number of parameters and their settings was small. In the MP, the number seems open-ended. How many parameters are in fact necessary in the MP? It is possible to make a rough count, given the assumption that there is one binary setting for each functional head. And how many functional heads are there? If Cinque (1999) is right, there are at least 32 functional heads in the IP domain alone. On the basis of a look at fifteen languages, fourteen of them Indo-European (from only four subfamilies), Longobardi (2003) proposes 30 binary parameters for DP. Cinque (1994) divides Adjective Phrase into at least five separate maximal projections encoding Quality, Size, Shape, Color, and Nationality. Beghelli and Stowell (1997) break down Quantifier Phrase into projections headed by Wh, Neg, Distributive, Referential, and Share. CP has also been split into a dozen or more projections, including ForceP, FocusP, and an indefinite number of Topic Phrases (Rizzi 1997). Facts pertaining to clitic inversion and related phenomena in some northern dialects of Italian have led to the positing of Left Dislocation Phrase, Number Phrase, Hearer Phrase, and Speaker Phrase (Poletto 2000). Damonte (2004) proposes projections corresponding to the set of thematic roles, including Reciprocal, Benefactive, Instrumental, Causative, Comitative, and Reversive Phrases. We have seen Verb Phrase split into two projections, one headed by V and the other by ‘v’ (Chomsky 1995). Zanuttini (2001) posits four distinct Negative Phrase projections for Romance alone and McCloskey (1997) argues that at least three subject positions are needed. The positing of a new functional projection (and hence a new parameter) to capture any structural difference between two languages has led to what Ackerman and Webelhuth (1998: 225) have aptly called the “diacriticization of parameters”.

Other proposals have led to a potentially exponential increase in the number of functional projections and their interrelationships, and hence in the number of parameters. For example, Giorgi and Pianesi (1997) have mooted the possibility of ‘syncretic categories’, that is, those that conflate two or more otherwise independent ones, as, for example, TP/AgrP. Along similar lines, Bobaljik (1995); Thráinsson (1996); and Bobaljik and Thráinsson (1998) suggest that languages differ not only in terms of the settings of their parameters, but also in terms of the presence, or not, of particular functional categories (see also Fukui 1995). Such a proposal leads to at least a ternary value for each parameter: positive, negative, or not applicable. Complicating things still further, Ouhalla (1991a) argues that an important dimension of parametric variation among languages is the relative ordering of embedding of functional categories. So for example, in his analysis, in Berber and Chamorro, the AgrP projection is below the TnsP projection, while in English and Welsh, TnsP is below AgrP.

One might, of course, argue along with Cinque and contra Ouhalla that the ordering among functional categories is universal. In that view, languages would differ parametrically in their lexicalization possibilities, some functional categories being lexicalized in some languages, but not in others. However, transferring the parametric choice to the lexicon neither decreases the number of potential parameters nor gives them an edge over rules. First, the number of parameters is not reduced, since the burden of specifying whether a functional category is present in a particular language or not has merely been transferred to the lexicon. Second, the statement that some language makes the parametric choice that lexical item L licenses functional projection P is indistinguishable from the statement that there is a language-particular rule involving L that specifies P.

In order to account for parametric variation and the “some number substantially greater than 5 billion” grammars that might exist in the world (Kayne 2000: 8), Kayne calculates that only 33 binary-valued parameters would be needed. His math may be right, but from that fact it does not follow that only 33 parameters would be needed to capture all of the microvariation that one finds in the world’s languages and dialects. In principle, the goal of a parametric approach is to capture the set of possible human languages, not the set (however large) of actually existing ones. One can only speculate that the number of such languages is in the trillions or quadrillions. In any event, Kayne’s own work suggests that the number of parameters is vastly higher than 33. Depending on precisely what counts as a parameter (Kayne is not always clear on that point), just to characterize the difference among the Romance dialects discussed in the first part of Kayne (2000) with respect to clitic behavior, null subjects, verb movement, and participle agreement would require several dozen distinct parameters. It is hard to avoid the conclusion that characterizing just a few more differences among the dialects would lead to dozens of new parameters.
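Kayne’s arithmetic itself can be checked directly: $n$ independent binary parameters distinguish at most $2^n$ grammars, and 33 is the smallest exponent whose power of two clears five billion.

```latex
% n binary parameters yield at most 2^n distinct grammars.
% 2^32 falls short of 5 billion, while 2^33 exceeds it:
\[
2^{32} = 4{,}294{,}967{,}296 \;<\; 5 \times 10^{9} \;\le\; 2^{33} = 8{,}589{,}934{,}592,
\]
\[
\text{hence } n_{\min} = \lceil \log_{2} (5 \times 10^{9}) \rceil = 33.
\]
```

The objection, then, is not that the calculation is wrong, but that counting actually existing grammars is the wrong target: a parametric theory must cover possible languages, not merely attested ones.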

If the number of parameters needed to handle the different grammars of the world’s languages, dialects, and (possibly) idiolects is in the thousands (or, worse, millions), then ascribing them to an innate UG to my mind loses all semblance of plausibility. True, we are not yet at the point of being able to ‘prove’ that the child is not innately equipped with 7846 (or 7,846,938) parameters, each of whose settings is fixed by some relevant triggering experience. I would put my money, however, on the fact that evolution has not endowed human beings in such an exuberant fashion.

In other words, despite its claims to the contrary, the MP takes us back to the old idea that languages differ from each other simply by having different rules – a solution that does nothing to distinguish typologically common processes from the exceptional ones. Recall that the great promise of parametric theory was its seeming ability to provide a generative approach to language typology, that is, to be able to characterize the difference from one language to the next by means of differences in parameter settings. The LPH, which is central to the MP, dashes all hope that this promise might be fulfilled. Puzzlingly from my point of view, relocating the site of parametric variation from grammars of entire languages to lexical items and their associated functional categories is often portrayed as a major step forward. For example, Pierre Pica writes that this move “allows a radical simplification of the nature and design of UG” (Pica 2001: vi). But the price paid for this “radical simplification” is both an explosion in the number of functional categories needed to be posited within UG and, more seriously, the transfer of the burden for accounting for language-particular differences from properties of UG per se to idiosyncratic properties of lexical entries in particular languages. In earlier versions of principles-and-parameters syntax (and in current versions such as Mark Baker’s) a given language L was posited to have a particular setting for the Head Directionality Parameter, the Serial Verb Parameter, and so on. But now, in principle it is individual lexical items in L that need to be specified as to how they relate to head directionality, serial verbs, and so on. That brings us back in effect to the earliest versions of transformational grammar, where each lexical item bore a set of tags indicating each rule that it governed or failed to govern. I certainly agree with Pica (2001) that twenty years of intensive descriptive and theoretical research has shown that macroparameters do not exist. But we have to regard that conclusion as a cause for disappointment, not rejoicing.


To summarize, as far as the ability to capture typological exceptionality is concerned, the MP represents a step backward from GB. The latter approach had a program for capturing exceptionality, albeit a flawed one. The MP does not even have a program aimed at such a result.

4. An extrasyntactic approach to typological generalizations and their exceptions

In this section I advocate a very different approach to capturing typological generalizations and exceptions to them. The burden for handling both is shifted from UG to performance principles that are sensitive to grammatical structure.

A wide variety of performance principles have been proposed in the literature – some, in my view, convincing, and some less so – and it is not my purpose here to review them all. In fact, I will focus on only one, the parsing principle of Minimize Domains, proposed in Hawkins (2004):

(24) Minimize Domains (Hawkins 2004): The hearer (and therefore the parsing mechanism) prefers orderings of elements that lead to the most rapid recognition possible of the structure of the sentence.

In short, there is performance-based pressure for language users to identify the constituents of a phrase as rapidly as possible. To illustrate, consider the tendency of heads consistently to precede complements or to follow complements. As we have seen, formal approaches capture this generalization by means of a head parameter provided by UG. But its performance basis seems quite straightforward and follows directly from Minimize Domains. Consider a VO language like English, where heads typically precede complements:

(25) V-NP, P-NP, A-of-NP, N-of-NP

In each case a ‘lighter’ head precedes a ‘heavier’ complement; putting the heavier phrasal complement after the lighter lexical head allows for a quicker recognition of all of the constituents of the dominating phrase. In fact, the light-before-heavy tendency in the grammar involves far more than the head-complement relation. For example, the canonical order of VP constituents is relentlessly lighter-to-heavier:

(26) VP[V-NP-PP-CP] (convince my students of the fact that all grammars leak)


Also notice that single adjectives and participles can appear in pre-head position in English:

(27) a. a silly proposal
     b. the ticking clock

But if these adjectives and participles themselves have complements, the complements have to appear in post-head position:

(28) a. *a sillier than any I’ve ever seen proposal
     b. a proposal sillier than any I’ve ever seen

(29) a. *the ticking away the hours clock
     b. the clock ticking away the hours

The evidence for a performance, rather than a UG, basis for the light-before-heavy tendency comes from the fact that when speakers have a choice in a VO-type language, they tend to put shorter before longer constituents. So, except for cases in which there is a strong lexical relation between V and P, PP’s can typically occur in any order after the verb:

(30) a. Mary talked to John about Sue.
     b. Mary talked to Sue about John.

But all other things being equal, the greater the length differential between the two PP’s, the more likely speakers will be to put the shorter one first (Hawkins 1994).5 Interestingly, Hawkins’s approach makes precisely the opposite length and ordering predictions for head-final languages. And to be sure, there is a heavy-before-light effect in those languages, both in language use and in the grammar itself.

Now then, where do exceptions fit into the picture? Minimize Domains predicts straightforwardly that a VO language should be prepositional and that an OV language should be postpositional. And indeed, such is generally the case. As is shown in Dryer (1992), 94 % of OV languages are postpositional and 85 % of VO languages are prepositional.6 The exceptional nature of a prepositional OV language (like Amharic) and a postpositional VO language (like Finnish) follows directly. To illustrate, consider the four logical possibilities, illustrated in (31a–d): VO and prepositional (31a); OV and postpositional (31b); VO and postpositional (31c); and OV and prepositional (31d):

5. The discourse status of the elements involved also plays a role in ordering (see Arnold, Wasow, Losongco and Ginstrom 2000; Hawkins 2003).

6. To be accurate, Dryer’s count involves ‘genera’ – genetic groups roughly comparable in time depth to subfamilies of Indo-European – not languages per se.

(31) a. VP[V PP[P NP]] (VO and prepositional)
     b. VP[PP[NP P] V] (OV and postpositional)
     c. VP[V PP[NP P]] (VO and postpositional)
     d. VP[PP[P NP] V] (OV and prepositional)

Let us assume with Hawkins that grammars are organized so that users can recognize the major constituents of a phrase as rapidly as possible. In (31a) and (31b), the two common structures, the recognition domain for the VP is just the distance between V and P, crossing over the object NP. But in (31c) and (31d), the uncommon structures, the recognition domain is longer, in that it involves the object of the preposition as well. So both regularity and exceptionality follow naturally in this approach. The exceptional cases are simply those that fail to be in accord with the principle of Minimize Domains.
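On one simple way of counting (an illustration only, not Hawkins’s exact metric), suppose the object NP in (31) contains $n$ words. The V-to-P recognition span then comes out as follows:

```latex
% Words in the V-to-P recognition span, assuming an n-word object NP:
\[
\text{(31a), (31b) (common orders):}\quad 2 \text{ words}
\qquad
\text{(31c), (31d) (uncommon orders):}\quad n + 2 \text{ words}
\]
```

The common orders keep the span constant however heavy the NP is; the uncommon orders make it grow with the NP, which is exactly the configuration that Minimize Domains penalizes.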

One might object that exceptions pose as great a challenge for parsing principles as for UG principles – after all, in both cases, some theory-based generalization has been violated. But one expects performance principles to admit exceptions. Rather than being like the either-or (or yes-no) switch settings inherent to UG parameters, they are part-and-parcel of a theory of language use. And nobody, as far as I know, believes that an algebraic theory suffices to explain facts about language use. Rather, usage-based generalizations are generalizations about populations (whether of speakers or of languages). To give an analogy, the generalization that cigarette smoking causes lung cancer is not threatened by the fact that there exist (exceptional) individuals who smoke five packs of cigarettes per day over their lifetimes and do not develop lung cancer. The rare OV, yet prepositional, languages are parallel, in crucial respects, to these individuals.7

Consider another example of a robust, but not exception-free, typological generalization. Hawkins (1983) proposed the following hierarchy:

(32) Prepositional Noun Modifier Hierarchy (PrNMH): If a language is prepositional, then if RelN then GenN, if GenN then AdjN, and if AdjN then DemN.

The PrNMH states that if a language allows long things to intervene between a preposition and its object, then it also allows short things. This hierarchy predicts the possibility of prepositional phrases with the structures depicted in (33) (along with an exemplifying language for each):

(33) a. PP[P NP[___ N…] (Arabic, Thai)
     b. PP[P NP[___ N…]; PP[P NP[Dem N…] (Masai, Spanish)
     c. PP[P NP[___ N…]; PP[P NP[Dem N…]; PP[P NP[Adj N…] (Greek, Maya)
     d. PP[P NP[___ N…]; PP[P NP[Dem N…]; PP[P NP[Adj N…]; PP[P NP[PossP N…] (Maung)
     e. PP[P NP[___ N…]; PP[P NP[Dem N…]; PP[P NP[Adj N…]; PP[P NP[PossP N…]; PP[P NP[Rel N…] (Amharic)

The Minimize Domains-based explanation of the hierarchy is straightforward. The longer the distance between the P and the N in a structure like (34), the longer it takes to recognize all the constituents of the PP. Given the idea that grammars try to reduce the recognition time, the hierarchy follows:

(34) PP[P NP[X N]]

Since relative clauses tend to be longer than possessive phrases, which tend to be longer than adjectives, which tend to be longer than demonstratives, which are always longer than ‘silence’, the hierarchy is predicted on parsing grounds.

7. See the Featherston and Wasow et al. papers in this volume for interesting discussions of how stochastic generalizations bear on the handling of seeming typological exceptionality.


It is far from clear how this generalization might be captured by means of parameters, whether macroparameters or microparameters.

There are a few exceptions to the PrNMH. Hawkins (1994) reports that in the prepositional Sino-Tibetan language Karen, genitives are the only daughters of NP to precede N, and (citing unpublished work by Matthew Dryer) he points to a small number of prepositional languages (e.g. Sango) in which AdjN co-occurs with NDem. Again, a small number of exceptions to a performance-based principle are entirely to be expected.

Let us finish this section with one more example of a typological generalization that has a performance-based explanation. As observed in (5d) above, verb-finality is accompanied by wh-elements being in situ, though there are a significant number of exceptions (29 %). The parsing explanation of this generalization is straightforward. Heads, in general, are the best identifiers of their subcategorized arguments. If one hears the verb give, for example, one is primed to expect two associated internal arguments, one representing a recipient and the other an object undergoing transfer. On the other hand, a human NP might or might not be a recipient, and an inanimate NP might or might not be an object undergoing transfer. Hence, if arguments precede their heads, as they do in SOV languages, extra cues are useful to identify their thematic status. Such can be accomplished by keeping them contiguous to the head (that is, by restricting their movement possibilities) and / or by endowing them with case marking that uniquely identifies their thematic role or helps to narrow down the possibilities.

The question naturally arises (both for parametric accounts and for performance-based accounts) of why there are so many exceptions to this generalization. I have nothing to offer in terms of an answer to this question, except to suggest that there must be a countervailing performance pressure for all languages to front wh-elements. The focusing property of wh-elements immediately comes to mind as a basis for why they so often occur fronted. However, I readily concede that without a precise characterization of the nature of the pressure to front the focus of questioning and why this pressure tends to be weaker than the pressure for arguments to remain in situ in OV languages, my suggestion amounts to little more than hand-waving.

It is worth pointing out by way of summary that there is a ‘built-in’ advantage to parsing-based explanations of grammatical structure that one does not find with UG-based explanations. In a nutshell, the advantage to parsing rapidly can hardly be controversial. We know that parsing is fast and efficient. Every word has to be picked out from an ensemble of 50,000, identified in one third of a second, and put into the right structure. It simply makes sense that parsing pressure would have left its mark on grammatical structure. Furthermore, performance-based solutions allow the grammar itself to be kept cleaner. As Stefan Frisch has noted:

For the traditional formalist, it is actually desirable for some linguistic patterns, especially those that are gradient, to be explained by functional principles. The remainder, once language processing influences are factored out, might be a simpler, cleaner, and more accurate picture of the nature of the innate language faculty and its role in delimiting the set of possible human languages. (Frisch 1999: 600)

I agree completely.

5. Conclusion

This paper has focused on exceptionality in syntactic typology, that is, the means of handling exceptions to broad typological generalizations. Three approaches were considered: the macroparametric approach of the Government-Binding theory; the microparametric approach of the Minimalist Program; and an approach that attempts to handle typological generalizations (and exceptions to them) by parsing and other extra-syntactic mechanisms. The GB approach, as a priori appealing as it is, has simply not been borne out by the empirical evidence. On the other hand, the MP approach to typology seems to boil down to nothing more than saying that some languages have one set of functional projections and other languages have another set of functional projections, without explaining why more languages would do things one way than another way. I hope to have shown that a processing-based approach shows the greatest degree of promise in handling the exceptionality that one finds in syntactic typology.

References

Ackerman, Farrell, and Gert Webelhuth
1998 A Theory of Predicates. Stanford, CA: CSLI Publications.

Aoun, Joseph, Norbert Hornstein, David Lightfoot, and Amy Weinberg
1987 Two types of locality. Linguistic Inquiry 18: 537–578.

Arnold, Jennifer E., Thomas Wasow, Anthony Losongco, and Ryan Ginstrom
2000 Heaviness vs. newness: The effects of structural complexity and discourse status on constituent ordering. Language 76: 28–55.

Babyonyshev, Maria, Jennifer Ganger, David M. Pesetsky, and Ken Wexler
2001 The maturation of grammatical principles: Evidence from Russian unaccusatives. Linguistic Inquiry 32: 1–43.

Baker, Mark C.
2001 The Atoms of Language: The Mind’s Hidden Rules of Grammar. New York: Basic Books.

Beghelli, Filippo, and Timothy A. Stowell
1997 Distributivity and negation: The syntax of each and every. In Ways of Scope Taking, Anna Szabolcsi (ed.), 71–107. Dordrecht: Kluwer.

Bobaljik, Jonathan D.
1995 Morphosyntax: The syntax of verbal inflection. Ph.D. diss., MIT.

Bobaljik, Jonathan D., and Höskuldur Thráinsson
1998 Two heads aren’t always better than one. Syntax 1: 37–71.

Borer, Hagit
1984 Parametric Syntax: Case Studies in Semitic and Romance Languages. Dordrecht: Foris.

Borer, Hagit, and Kenneth Wexler
1987 The maturation of syntax. In Parameter Setting, Thomas Roeper, and Edwin Williams (eds.), 123–172. Dordrecht: Reidel.

Bouchard, Denis
2003 The origins of language variation. Linguistic Variation Yearbook 3: 1–41.

Chao, Wynn
1981 PRO-drop languages and nonobligatory control. University of Massachusetts Occasional Papers 6: 46–74.

Chomsky, Noam
1957 Syntactic Structures. The Hague: Mouton.

Chomsky, Noam
1965 Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.

Chomsky, Noam
1981 Lectures on Government and Binding. Dordrecht: Foris.

Chomsky, Noam
1988 Language and Problems of Knowledge: The Managua Lectures. Cambridge, MA: MIT Press.

Chomsky, Noam
1995 The Minimalist Program. Cambridge, MA: MIT Press.

Cinque, Guglielmo
1994 On the evidence for partial N movement in the Romance DP. In Paths towards Universal Grammar, Guglielmo Cinque, Jan Koster, Jean-Yves Pollock, Luigi Rizzi, and Raffaella Zanuttini (eds.), 85–110. Washington: Georgetown University Press.

Cinque, Guglielmo
1999 Adverbs and Functional Heads: A Cross-linguistic Perspective. Oxford: Oxford University Press.

Clark, Robin
1994 Finitude, boundedness, and complexity. In Syntactic Theory and First Language Acquisition: Cross-linguistic Perspectives. Vol. 2: Binding, Dependencies, and Learnability, Barbara Lust, Gabriella Hermon, and Jaklin Kornfilt (eds.), 473–489. Hillsdale, NJ: Erlbaum.

Croft, William
2003 Typology and Universals. 2nd ed. Cambridge: Cambridge University Press.

Culicover, Peter W.
1999 Syntactic Nuts: Hard Cases, Syntactic Theory, and Language Acquisition. Oxford: Oxford University Press.

Damonte, Federico
2004 The thematic field: The syntax of valency-enriching morphology. Ph.D. diss., University of Padua.

Déprez, Viviane, and Amy Pierce
1993 Negation and functional projections in early grammar. Linguistic Inquiry 24: 25–67.

Dryer, Matthew S.
1988 Object-verb order and adjective-noun order: Dispelling a myth. Lingua 74: 185–217.

Dryer, Matthew S.
1991 SVO languages and the OV:VO typology. Journal of Linguistics 27: 443–482.

Dryer, Matthew S.
1992 The Greenbergian word order correlations. Language 68: 81–138.

Eisenbeiss, Sonia
1994 Kasus und Wortstellungsvariation im deutschen Mittelfeld. In Was determiniert Wortstellungsvariation?, Brigitta Haftka (ed.), 277–298. Opladen: Westdeutscher Verlag. [Special issue of Linguistische Berichte]

Fodor, Janet D.
2001a Parameters and the periphery: Reflections on syntactic nuts. Journal of Linguistics 37: 367–392.

Fodor, Janet D.
2001b Setting syntactic parameters. In The Handbook of Contemporary Syntactic Theory, Mark Baltin, and Chris Collins (eds.), 730–767. Oxford: Blackwell.

Foley, William A., and Robert D. Van Valin
1984 Functional Syntax and Universal Grammar. Cambridge: Cambridge University Press.

Frisch, Stefan
1999 [Review of Thomas Berg, Linguistic Structure and Change: An Explanation from Language Processing]. Journal of Linguistics 35: 597–601.

Fukui, Naoki
1995 Theory of Projection in Syntax. Stanford, CA: CSLI Publications.

Gilligan, Gary M.
1987 A cross-linguistic approach to the pro-drop parameter. Ph.D. diss., University of Southern California.

Giorgi, Alessandra, and Fabio Pianesi
1997 Tense and Aspect: From Semantics to Morphosyntax. Oxford: Oxford University Press.

Greenberg, Joseph H.
1963 Some universals of language with special reference to the order of meaningful elements. In Universals of Language, Joseph Greenberg (ed.), 73–113. Cambridge, MA: MIT Press.

Guilfoyle, Eithne
1990 Functional categories and phrase structure parameters. Ph.D. diss., McGill University.

Haider, Hubert
1993 Principled variability: Parameterization without parameter fixing. In The Parametrization of Universal Grammar, Gisbert Fanselow (ed.), 1–16. Amsterdam: John Benjamins.

Harris, Tony, and Ken Wexler
1996 The optional-infinitive stage in Child English: Evidence from negation. In Generative Perspectives on Language Acquisition: Empirical Findings, Theoretical Considerations, and Crosslinguistic Comparisons, Harald Clahsen (ed.), 1–42. Amsterdam: John Benjamins.

Hawkins, John A.
1983 Word Order Universals. New York: Academic Press.

Hawkins, John A.
1994 A Performance Theory of Order and Constituency. Cambridge: Cambridge University Press.

Hawkins, John A.
2004 Efficiency and Complexity in Grammars. Oxford: Oxford University Press.

Huang, C.-T. James
1982 Logical relations in Chinese and the theory of grammar. Ph.D. diss., MIT.

Huang, C.-T. James
1984 On the distribution and reference of empty pronouns. Linguistic Inquiry 15: 531–574.

Hyams, Nina M.
1986 Language Acquisition and the Theory of Parameters. Dordrecht: Reidel.

Kayne, Richard S.
1984 Connectedness and Binary Branching. Dordrecht: Foris.

Kayne, Richard S.
2000 Parameters and Universals. Oxford: Oxford University Press.

Lakoff, George
1965/1970 Irregularity in Syntax. New York: Holt, Rinehart, and Winston.

Lasnik, Howard, and Mamoru Saito
1984 On the nature of proper government. Linguistic Inquiry 15: 235–290.

Longobardi, Giuseppe
2003 Methods in parametric linguistics and cognitive history. Linguistic Variation Yearbook 3: 101–138.

Manzini, M. Rita, and Kenneth Wexler
1987 Parameters, binding, and learning theory. Linguistic Inquiry 18: 413–444.

Mazuka, Reiko
1996 Can a grammatical parameter be set before the first word? Prosodic contributions to early setting of a grammatical parameter. In Signal to Syntax: Bootstrapping from Speech to Grammar in Early Acquisition, Jerry L. Morgan, and Katherine Demuth (eds.), 313–330. Mahwah, NJ: Erlbaum.

McCloskey, James
1997 Subjecthood and subject positions. In A Handbook of Theoretical Syntax, Liliane Haegeman (ed.), 197–236. Dordrecht: Kluwer.

McCloskey, James
2002 Resumption, successive cyclicity, and the locality of operations. In Derivation and Explanation, Samuel David Epstein, and Daniel Seeley (eds.), 184–226. Oxford: Blackwell.

Meisel, Jürgen, and N. Müller
1992 Finiteness and verb placement in early child grammars. In The Acquisition of Verb Placement: Functional Categories and V2 Phenomena in Language Acquisition, Jürgen Meisel (ed.), 109–138. Dordrecht: Kluwer.

Muysken, Pieter, and Paul Law
2001 Creole studies: A theoretical linguist’s field guide. Glot International 5: 47–57.

Newmeyer, Frederick J.
2004 Against a parameter-setting approach to language variation. Linguistic Variation Yearbook 4: 181–234.

Newmeyer, Frederick J.
2005 Possible and Probable Languages: A Generative Perspective on Linguistic Typology. Oxford: Oxford University Press.

Nishigauchi, Taisuke, and Thomas Roeper
1987 Deductive parameters and the growth of empty categories. In Parameter Setting, Thomas Roeper, and Edwin Williams (eds.), 91–121. Dordrecht: Reidel.

Ouhalla, Jamal
1991a Functional Categories and Parametric Variation. London: Routledge.

Ouhalla, Jamal
1991b Functional categories and the head parameter. Paper presented at the 14th GLOW Colloquium.

Pesetsky, David M., and Esther Torrego
2001 T-to-C movement: Causes and consequences. In Ken Hale: A Life in Language, Michael Kenstowicz (ed.), 355–426. Cambridge, MA: MIT Press.

Pica, Pierre
2001 Introduction. Linguistic Variation Yearbook 1: v–xii.

Pierce, Amy
1992 Language Acquisition and Syntactic Theory: A Comparative Analysis of French and English Child Grammars. Dordrecht: Kluwer.

Poletto, Cecilia
2000 The Higher Functional Field: Evidence from Northern Italian Dialects. Oxford: Oxford University Press.

Radford, Andrew
1994 Clausal projections in early child grammars. Essex Research Reports in Linguistics 3: 32–72.

Rizzi, Luigi
1978 Violations of the wh-island constraint in Italian and the subjacency condition. In Montreal Working Papers in Linguistics No. 11, C. Dubuisson, David Lightfoot, and Y. C. Morin (eds.), 49–76. (Reprinted in Luigi Rizzi, Issues in Italian Syntax. Dordrecht: Foris, 1982.)

Rizzi, Luigi
1982 Issues in Italian Syntax. Dordrecht: Foris.

Rizzi, Luigi
1997 The fine structure of the left periphery. In Elements of Grammar: Handbook of Generative Syntax, Liliane Haegeman (ed.), 281–337. Dordrecht: Kluwer.

Roeper, Thomas, and Jill De Villiers
1994 Lexical links in the wh-chain. In Syntactic Theory and First Language Acquisition: Cross-linguistic Perspectives. Vol. 2: Binding, Dependencies, and Learnability, Barbara Lust, Gabriella Hermon, and Jaklin Kornfilt (eds.), 357–390. Hillsdale, NJ: Erlbaum.

Safir, Kenneth J.
1985 Syntactic Chains. Cambridge: Cambridge University Press.

Sapir, Edward
1921 Language. New York: Harcourt, Brace, and World.

Siewierska, Anna, and Dik Bakker
1996 The distribution of subject and object agreement and word order type. Studies in Language 20: 115–161.

Smith, Neil, and Annabel Cormack
2002 Parametric poverty. Glot International 6: 285–287.

Stromswold, Karin
1988 The acquisitional implications of Kayne’s theory of prepositions. Unpublished ms., MIT.

Stromswold, Karin
1989 Using naturalistic data: Methodological and theoretical issues (or how to lie with naturalistic data). Paper presented at the 14th Annual Boston University Child Language Conference, October 13–15, 1989.

Stromswold, Karin
1990 Learnability and the acquisition of auxiliaries. Ph.D. diss., MIT.

Sugisaki, Koji, and William Snyder
2001 Preposition stranding and double objects in the acquisition of English. Proceedings of the Second Tokyo Conference on Psycholinguistics, 209–225.

Thráinsson, Höskuldur
1996 On the (non)-universality of functional categories. In Minimal Ideas: Syntactic Studies in the Minimalist Framework, Werner Abraham, Samuel David Epstein, Höskuldur Thráinsson, and C. Jan-Wouter Zwart (eds.), 253–281. Amsterdam: John Benjamins.

Travis, Lisa
1989 Parameters of phrase structure. In Alternative Conceptions of Phrase Structure, Mark R. Baltin, and Anthony S. Kroch (eds.), 263–279. Chicago: University of Chicago Press.

Valian, Virginia V.
1990 Logical and psychological constraints on the acquisition of syntax. In Language Processing and Language Acquisition, Lyn Frazier, and Jill De Villiers (eds.), 119–145. Dordrecht: Kluwer.

Valian, Virginia V.
1991 Syntactic subjects in the early speech of Italian and American children. Cognition 40: 21–81.

van Riemsdijk, Henk
1978 A Case Study in Syntactic Markedness: The Binding Nature of Prepositional Phrases. Dordrecht: Foris.

Veenstra, Tonjes
1996 Serial verbs in Saramaccan: Predication and creole genesis. Ph.D. diss., Leiden University.

Verrips, M., and Jürgen Weissenborn
1992 The acquisition of functional categories reconsidered. Ms.

Zanuttini, Raffaella
2001 Sentential negation. In The Handbook of Contemporary Syntactic Theory, Mark Baltin, and Chris Collins (eds.), 511–535. Oxford: Blackwell.


Remarks on three approaches to exceptionality in syntactic typology

Artemis Alexiadou

1. Introduction

Newmeyer (this volume) contrasts three approaches to handling exceptionality in syntactic typology: the ‘macroparametric’ approach associated with the Government-Binding Theory (GB); the ‘microparametric’ approach associated with the Minimalist Program (MP); and an extrasyntactic approach, in which parsing and other performance principles account for typological variation and exceptions to typological generalizations. He argues in detail that the extrasyntactic approach is best motivated.

Newmeyer’s paper proposes to change the general theoretical assumptions according to which certain phenomena are exceptional. The main intuition is that typological generalizations are not free of exceptions. This is unexpected under a parametric approach, while it is expected under a performance-based approach, since the domain of performance is less constrained. Since the exceptional or non-exceptional status of grammatical patterns is highly dependent on the theoretical framework one assumes, a change of assumptions leads to a different picture and analysis of the empirical data.

But is this a necessary step in order to deal with exceptions? Roberts and Holmberg (2005) have already raised several points of criticism against the parsing approach, among them the mere fact that it is difficult to evaluate. Since I completely agree with their points, in this brief commentary I would like to concentrate on two phenomena discussed by Newmeyer as providing evidence both against the macro-parameter approach and the micro-parameter approach: the null subject parameter and adjective placement crosslinguistically. I show that a correct examination of the data and of the claims made by the authors working within the parameters model does not necessarily lead to the conclusions Newmeyer drew.

2. Null subject parameter

Newmeyer’s point is that the correlations proposed by Rizzi (1982) in connection with the null subject parameter have been shown not to hold. As Roberts and Holmberg (2005) point out, the version of the parameter adopted by Newmeyer is that the possibility of null thematic subjects in tensed clauses, null non-thematic subjects, free subject inversion and apparent that-trace effect violations were typologically connected. Naturally, the strongest hypothesis is that any language must have all or none of these properties. Newmeyer cites Gilligan’s (1987) study as showing that these tight correlations do not actually hold. He further mentions languages such as Brazilian Portuguese and Chinese, which lack subject inversion but are still considered to be null subject languages, as examples illustrating the falsity of the proposals made within the parameters framework.

There are two remarks to make here. First, what needs to be stressed as far as Gilligan’s study is concerned is that it did not show that no correlation is possible; it only showed that a different arrangement exists than perhaps the one initially assumed. The fact that the correlations go a different way does not falsify the validity of Rizzi’s claims; instead, it supports them.

Second, while it is true that not all pro-drop languages have identical properties, it is a bit puzzling to mention as counterexamples to the generalization two languages for which it can be established that they are non-pro-drop. That Brazilian Portuguese is not a pro-drop language can be seen in (2), where the presence of an overt subject is necessary although it has already been mentioned in the discourse; see Britto (2000), from which the data come. Data such as (2) are not found in languages like Spanish or Greek, which are characterized as null subject languages.

(1) O João vai trazer a salada?
    the João will bring the salad

(2) O João, o VINHO *pro/ele vai trazer
    the João the wine *pro/he will bring
If Brazilian Portuguese is not a pro-drop language, then the fact that it lacks subject inversion is not so surprising.

As for Chinese, it is debatable whether such languages are subject drop or rather topic drop; if the latter holds, they should not be analysed on a par with languages such as Spanish or Italian. In fact, Huang (1984) argued that null subjects in Chinese are identified by an NP in a superordinate clause, while others have argued that Chinese pro-drop is actually Topic NP deletion. Hence, again, the fact that this language lacks inversion is not surprising.

3. Adjective placement crosslinguistically

One other concrete case that Newmeyer discusses in order to present arguments against microparametric approaches involves adjective placement facts of the type discussed in Cinque (1994). The data below illustrate some differences between French and English:

(3) a. un gros ballon rouge
    b. ‘a big red ball’

(4) a. un tissu anglais cher
    b. ‘an expensive English fabric’

(5) a. an old friend (= friend who is aged / friend for a long time)
    b. une vieille amie (= friend for a long time)
    c. une amie vieille (= friend who is aged)

Newmeyer briefly summarizes Cinque’s original proposal. The pattern in (3) is to be understood as resulting from the parametric availability of N-movement: N-movement takes place in French, but not in English. The facts in (4) can likewise be made sense of if N-movement takes place in French but not in English. In (5) we see that both pre-nominal and post-nominal adjective placement is available in French. Newmeyer says that for Cinque “the two positions for vieille in French, but only one for old in English, result from a parametric difference between the two languages regarding the feature attraction possibilities of functional categories in the two languages”.

To begin with, the English facts in (5a) are discussed in Larson (1998), who makes the point that the two readings cannot be the result of the same base structure. Larson argues in detail that in examples like this it is the properties of the N that give rise to the ambiguity. In particular, when a noun contains an event argument, the adjective can be conceived of as modifying either the event or the individual to which a certain property is being attributed.

Others have made the point that Cinque’s particular analysis of this set of data is not efficient. Apart from the fact that Cinque himself (2005) has revised his analysis, other authors have emphasized that the pattern in (5) cannot be made sense of by appealing to N-movement. Rather, what is required is to associate patterns such as the above with two different syntactic patterns for modification (see Alexiadou 2001 and Larson 2004). On this view, a version of which is pursued by Cinque in his most recent work, UG makes two structures available for modification, and different options are available in the different languages. One structure involves a relative clause (building on Kayne 1994 and Jacobs and Rosenbaum 1968), giving rise to N-Adj orders in languages like Romance, and the other structure involves so-called direct modification, i.e. some form of A-N compound formation.

While it is correct that Cinque’s original account does not explain why N-A orders are more common, two observations are in order here. First, Cinque’s account did not aim at explaining this. Second, if N-A orders are the result of relative clause formation, we come to a different understanding of the typological tendencies. Since most languages of the world lack adjectives and make use of relative clauses for modification, it does not come as a surprise that exactly the pattern that is related to relative clause formation is the most common crosslinguistically.

Thus, as soon as we have identified the relevant structures that are available for the different readings, we can get a better grasp of the phenomena involved and the cause of variation.

To conclude, what the research within the parameters approach has taught us is that there is systematic variation and systematic similarity among unrelated languages. It is precisely this fact that provides the strongest argument possible in favor of UG; assuming language-specific rules or extrasyntactic approaches would leave this factor unaccounted for, or a mere accident of nature. Exceptions, to the extent that they can be identified, can be shown to follow from some other, perhaps yet undetected correlation. Thus a change of framework does not seem a necessary step in order to deal with exceptions.

References

Alexiadou, Artemis
2001 Adjective syntax and noun raising: Word order asymmetries in the DP as the result of adjective distribution. Studia Linguistica 55: 217–248.

Britto, Helena
2000 Syntactic codification of categorical and thetic judgements in Brazilian Portuguese. In Brazilian Portuguese and the Null Subject Parameter, Mary Kato and E. Negrão (eds.). Frankfurt/Madrid: Vervuert/IberoAmericana.

Cinque, Guglielmo
1994 On the evidence for partial N movement in the Romance DP. In Paths towards Universal Grammar, Guglielmo Cinque, Jan Koster, Jean-Yves Pollock, Luigi Rizzi and Raffaella Zanuttini (eds.), 85–110. Washington: Georgetown University Press.

Cinque, Guglielmo
2005 The dual source of adjectives and XP vs. N-raising in the Romance DP. LSA 2005 class notes.

Gilligan, Gary M.
1987 A cross-linguistic approach to the pro-drop parameter. Ph.D. dissertation, University of Southern California.

Huang, C.T. James
1984 On the distribution and reference of empty pronouns. Linguistic Inquiry 15: 531–574.

Jacobs, Roderick, and Peter Rosenbaum
1968 English Transformational Grammar. Waltham, MA: Ginn and Company.

Kayne, Richard
1994 The Antisymmetry of Syntax. Cambridge, MA: MIT Press.

Larson, Richard
1998 Events and modification in nominals. In Proceedings from Semantics and Linguistic Theory (SALT) VIII, Devon Strolovitch and Aaron Lawson (eds.). Ithaca, NY: Cornell University.

Larson, Richard
2004 The projection of DP. Talk given in the guest lecture series at the University of Stuttgart, January 2004.

Rizzi, Luigi
1982 Issues in Italian Syntax. Dordrecht: Foris.

Roberts, Ian, and Anders Holmberg
2005 On the role of parameters in Universal Grammar: A reply to Newmeyer. In Organizing Grammar: Linguistic Studies in Honor of Henk van Riemsdijk, Hans Broekhuis, Norbert Corver, Riny Huybregts, Ursula Kleinhenz and Jan Koster (eds.), 538–553. Berlin/New York: Mouton de Gruyter.

A reply to the commentary by Artemis Alexiadou

Frederick J. Newmeyer

My principal goal in ‘Three Approaches to Exceptionality in Syntactic Typology’ (henceforth ‘TAEST’) was to motivate a parsing account of typological generalizations and exceptions to them. Surprisingly, Artemis Alexiadou (henceforth ‘AA’) has virtually nothing to offer by way of criticism of such an account, remarking that “Roberts and Holmberg 2005 have already raised several points of criticism against the parsing approach, among which the mere fact that it is difficult to evaluate” (AA). But AA is mistaken; Roberts and Holmberg offer no criticism at all against such an approach. Their comments, like AA’s, are devoted entirely to a defense of the parametric approach. They have nothing to say about parsing.1

AA replies to two of the arguments that I advanced in TAEST against the parametric approach to typological generalizations. One of my points was to stress, using the Null-Subject Parameter as an example, that “the hoped for clustering of typological properties characterizable by a simple parameter setting seems not to exist”. However, AA, in her reply, does not rebut my point by providing a version of this parameter (or any other) where we find a robust example of clustering of abstract properties. The reader is left wondering whether she even knows of such examples.

My other argument addressed by AA was to point out that in much minimalist work “the word ‘parameter’ is used as nothing more than a synonym for the word ‘rule’”. My concrete example (following Bouchard 2003) was based on the treatment of adjective placement in Cinque (1994). AA challenges Cinque’s analysis, but does not address the broader point about parameters and rules. In the course of her presentation, AA hypothesizes that “N-A orders are the result of relative clause formation”, remarking that “since most languages of the world lack adjectives and make use of relative clauses for modification, it does not come as a surprise that exactly the pattern that is related to relative clause formation is the most common crosslinguistically”. On the other hand, A-N orders “involve so called direct modification, i.e. some form of A-N compound formation”. Even if AA’s facts were correct, the conclusion would be a non sequitur. One could argue just as easily that N-A orders should be rare, since there is no semantic or discourse need for such orders, given the availability of the relative clause option. And A-N orders should be common, since, being quite different in meaning from A-N compounds, they fill a semantic gap. But in any event, AA’s facts are wrong. In his introductory overview to the most extensive work on adjectives to date, Dixon (2004) argues that a formal class of ‘Adjective’ can be identified in every language in the world.

1. For a reply to Roberts and Holmberg see Newmeyer (2005).

AA concludes her piece by remarking that “extrasyntactic approaches would leave [systematic variation and systematic similarity among unrelated languages] unaccounted for or a mere accident of nature”. But the entirety of section 4 of TAEST (which AA ignores) is devoted to demonstrating that an extrasyntactic approach accounts beautifully for systematic variation and systematic similarity. Parsing principles are universal, so it follows necessarily that languages that are unrelated genetically and/or areally would respond to such principles identically, as far as typological generalizations are concerned.

References

Bouchard, Denis
2003 The origins of language variation. Linguistic Variation Yearbook 3: 1–41.

Cinque, Guglielmo
1994 On the evidence for partial N movement in the Romance DP. In Paths towards Universal Grammar, Guglielmo Cinque, Jan Koster, Jean-Yves Pollock, Luigi Rizzi and Raffaella Zanuttini (eds.), 85–110. Washington: Georgetown University Press.

Dixon, R.M.W.
2004 Adjective classes in typological perspective. In Adjective Classes: A Cross-Linguistic Typology, R.M.W. Dixon and Alexandra Y. Aikhenvald (eds.), 1–49. Oxford: Oxford University Press.

Newmeyer, Frederick J.
2005 Newmeyer’s rejoinder to Roberts and Holmberg on parameters. [http://ling.auf.net/lingBuzz/000248]

Roberts, Ian, and Anders Holmberg
2005 On the role of parameters in Universal Grammar: A reply to Newmeyer. In Organizing Grammar: Linguistic Studies in Honor of Henk van Riemsdijk, Hans Broekhuis, Norbert Corver, Riny Huybregts, Ursula Kleinhenz and Jan Koster (eds.). Berlin/New York: Mouton de Gruyter.

Three types of exceptions – and all of them rule-based

Sam Featherston

Abstract. A basic premise of this paper is that a simpler grammar is a more adequate one, and that exceptions are thus undesirable. We present studies concerning three different grammatical structures which contain phenomena standardly regarded as exceptions, and show how, in all three cases, the attribution of the status of an exception was unnecessary. In each case, the collection of better data and the explanatory advantages of, firstly, a model of gradient grammaticality and, secondly, the distinction between the effects of the grammar and the effects of production processing, reveal the phenomenon to be rule-governed.∗

1. Introduction

For a model of generative grammar, exceptions are anathema. The overriding aim of the generative project is to attain explanatory adequacy, specifically, to account for the fact that most three-year-olds exhibit more grasp of the language system than linguists have been able to gain in decades of research effort. The standard account of this is to assume that the acquisition task must be much simpler than the task of description facing the linguist. The research programme thus consists of the ambition to design or discover a grammatical system so simple that it can realistically either be acquired by a toddler or else be part of the human genetic inheritance. This simplicity criterion forces the generative linguist to assume that the basis of linguistic patterning is a rule system, which must operate blindly and indiscriminately, the apparent complexity of language springing from the interactions of these wider generalizations.

* This work took place within the project Suboptimal Syntactic Structures of the SFB 441, supported by the Deutsche Forschungsgemeinschaft. Thanks are due to project leader Wolfgang Sternefeld, Tanja Kiziak and to Frank Keller for WebExp. Thanks too to Horst Simon and Heike Wiese for comments and arranging the workshop. All remaining weaknesses are my own.

Exceptions are the deadly enemy of simplicity, since they are by definition not rule-governed, and must be memorized or processed individually, which complicates both acquisition and use. For this reason, linguistic phenomena which offer apparent exceptions must be regarded as problem cases: the ideal grammar should be exceptionless.

Unfortunately for generative grammar, linguistic data is peppered with exceptions, and most grammars can only deal with generalizations about the observed data, rather than with the raw data itself. In order to address this, linguistic theory has tended to work on idealized data, which does not show so many exceptions. There is nevertheless significant interest in dealing with this manifest problem, as the papers in this volume demonstrate. There is good reason for this interest, since alternative, non-generative analyses are breathing down our neck. For as soon as the weight of learning exceptions simply by exposure reaches a certain point, the case for rules breaks down. If we can learn so much by simple exposure and frequency, then we can do without the additional mechanism of rules, since regularities can be seen as mere epiphenomena of local probabilities, the argument runs (e.g. Bybee and Hopper 2001).

In this paper we shall attempt to show that many apparent exceptions are in fact rule-governed phenomena. We do this by addressing three different sorts of exceptions, and set forth how, in our example phenomena, characteristics of data and the assumptions of linguistic theory are obscuring regularities. Our conclusion will be that at least these parts of the primary linguistic data are more exceptionless and rule-governed than they appear. Our findings can thus be seen as supporting the generative approach, as the problem of exceptions is less severe than is often thought.

There are three keys to this identification of wider, unrecognized generalizations. First, we must pay far more attention to the data base on which we construct our theories, taking both judgements of well-formedness and corpus-derived frequencies into account, and looking at both in detail. This approach makes it clear that far more factors play a role in influencing the perceived well-formedness of a structure than is generally assumed, and multiple factors can be causing apparent differences between even a minimal pair of structures. With this insight, certain phenomena thought to be showing exceptional behaviour can be seen to be rule-governed, but the basket of factors affecting the phenomenon is larger than had been thought.

The next step is motivated by findings about the contrast of frequency and judgement data: structures can appear although they are relatively poorly formed, or not occur even though they are relatively well-formed. From this finding, we conclude that an empirically adequate architecture of grammar requires us to distinguish two separate modules of the grammar: Constraint Application and Output Selection. The first is responsible for the determination of well-formedness; the second selects structures for output, operating competitively on the basis of well-formedness weightings. When we have differentiated these, we see that the exceptional occurrence of some structures generally categorized as ungrammatical becomes explicable and compatible with a blind and exceptionless rule system.

These steps additionally permit us to rethink our concept of well-formedness and consider more carefully what happens when a structure breaks a rule. It appears that such violations do not directly exclude the structure from being part of the language, but merely reduce the well-formedness of the structure. This requires us to accept that there is a variable of violation cost, which can vary over constraints. Every violation cost naturally makes the structure less likely to occur, but the link between violation and non-occurrence is not direct; rather, it is mediated by cumulative, gradient well-formedness. It follows also that the rules we find in the grammar are less like (1); rather, they resemble (2).

(1) Structure XYZ does not occur in / is no part of language L.

(2) Structure XYZ in language L incurs the violation cost in well-formedness V.

These changes in the architecture of the grammar prove their value by allowing us to account for phenomena which, on current assumptions about the architecture of the grammar, appear to be exceptions.
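This two-module architecture can be given a schematic rendering. The sketch below is not from the paper: the function names, the toy constraints (a light-before-heavy ordering preference and a dative-before-accusative preference, echoing ordering effects discussed later in this chapter) and the numeric violation costs are all invented for illustration. It simply instantiates the idea that Constraint Application sums gradient violation costs into a well-formedness score, as in rule (2), while Output Selection then chooses among candidates competitively.

```python
# Hypothetical sketch of the two-module architecture; names and costs invented.

def well_formedness(structure, constraints):
    """Constraint Application: sum the (negative) costs of violated constraints.

    Each constraint is a (violates, cost) pair; `violates` returns True when
    the candidate structure breaks the constraint. A violation merely lowers
    the score (cf. rule (2)); it does not exclude the structure outright.
    """
    return -sum(cost for violates, cost in constraints if violates(structure))

def select_output(candidates, constraints):
    """Output Selection: competitively pick the best-formed candidate."""
    return max(candidates, key=lambda s: well_formedness(s, constraints))

# Toy soft constraints over feature dictionaries; the costs are made up.
CONSTRAINTS = [
    (lambda s: s["heavy_before_light"], 0.5),  # light constituents precede heavy ones
    (lambda s: s["acc_before_dat"], 0.3),      # datives precede accusatives
]
```

On this picture a structure that violates a constraint is still generated, just rarely selected: gradient well-formedness mediates between violation and non-occurrence.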

The net effect of these example studies is to show that the use of inadequate data and erroneous assumptions about the architecture of linguistic theory are throwing up ‘ghost’ exceptions, which in reality are not exceptions at all. We aim to show that the grammar contains fewer exceptions and that grammatical rule systems have greater coverage than is at first glance apparent.

1.1. Experimentally obtained judgements

A crucial factor in our argumentation will be the collection of more detailed information about the perceived well-formedness of key examples and example sets. To allow us to concentrate on the data at hand in the individual studies below, we shall briefly outline our data collection method here. We gather this judgement data using a variant of the magnitude estimation methodology (Bard et al. 1996). This is a procedure for obtaining judgements from naive informants with the greatest possible degree of differentiation and reliability. It varies from the simple elicitation of standard categorical judgements (“Is this grammatical or not?”) in several ways. First, subjects are asked to provide purely relative judgements: at no point is an absolute criterion of grammaticality applied. Judgements are relative to a reference example and the informant’s own previous judgements. Second, all judgements are proportional; i.e. subjects are asked to state how much better or worse sentence A is than sentence B. Next, the scale along which judgements are made is open-ended: subjects can always add an additional higher or lower score. Last, the scale has no minimum division: participants can always place a score between two previous ratings. The task thus has the form “You gave this example 20 and that example 30, so how much would you give this one?” The result is that subjects are able to produce judgements which distinguish all the differences in well-formedness they perceive, with no interference from an imposed scale.

This approach produces judgement data of much higher definition and quality than traditional techniques and permits much greater insights into the factors which affect perceived well-formedness (e.g. Cowart 1997, Keller 2000). We shall argue that many apparent exceptions in the grammar are only epiphenomena of inadequate data, a problem compounded by the inappropriate idealization of data and the insufficiently articulated model of the grammar which has resulted from this data poverty.
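To make the data type concrete, here is a hypothetical sketch of the kind of post-processing that raw magnitude estimates typically undergo before analysis. The paper does not spell out its normalization procedure, so the specific steps below (log-transforming each rating, z-scoring within each subject, then aggregating per condition with a normal-approximation 95 % confidence interval) are assumptions for illustration, not the authors' actual pipeline.

```python
import math
from statistics import mean, stdev

def normalize_by_subject(ratings):
    """z-score each subject's log-transformed ratings (an assumed scheme).

    `ratings` maps subject -> {condition: raw magnitude estimate}. Because
    subjects choose their own open-ended scales, scores only become
    comparable across subjects after per-subject normalization.
    """
    normalized = {}
    for subject, scores in ratings.items():
        logs = {cond: math.log(r) for cond, r in scores.items()}
        m, s = mean(logs.values()), stdev(logs.values())
        normalized[subject] = {cond: (v - m) / s for cond, v in logs.items()}
    return normalized

def condition_stats(normalized):
    """Per-condition mean with an approximate 95 % confidence interval."""
    pooled = {}
    for scores in normalized.values():
        for cond, v in scores.items():
            pooled.setdefault(cond, []).append(v)
    stats = {}
    for cond, vals in pooled.items():
        m = mean(vals)
        half = 1.96 * stdev(vals) / math.sqrt(len(vals))  # normal approximation
        stats[cond] = (m, m - half, m + half)
    return stats
```

Dividing by each subject's own spread removes differences in how expansively individual informants use the open-ended scale, so that only relative differences between conditions survive.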

The location of the first exception type is within the grammar, and it consists of a phenomenon which appears to satisfy the structural description for the application of the Binding Conditions, but where they seem nevertheless not to apply. The second exception type is a language. What shall we make of it when apparent cross-linguistic generalizations seem not to apply to a particular language? Can a language be an exception? Our third and last exception type is that of the structure which appears in language output even though there are well-recognized restrictions which should forbid it. Briefly, why do ungrammatical structures occur? What mechanism permits exceptional occurrence?

2. Exception type one: incomplete generalization

Our first case study concerns anaphoric binding in object coreference constructions in German, a set of phenomena characterized as a “problem” for generative grammar (Grewendorf 1985, cf. the related “dilemma” for transformational grammar, Reis 1976). This phenomenon is exceptional because the simple Binding Conditions (Chomsky 1981) seem not to work in these cases. This is therefore an exception which consists of an apparent limit to the generality of application of a rule within the grammar.

The third-person reflexive sich in German can have either dative or accusative case. It regularly occurs when the subject and a subsequent other clause-mate NP are coreferent, in line with Binding Condition A. When two non-subject NPs are coreferent, however, the facts become much less clear. It might be expected that the relationship between direct and indirect objects would be either symmetrical, so that each of the two would systematically c-command, and be able to bind, the other, or else asymmetrical, so that only one would be able to bind the other. This issue is potentially of great interest, because of the insight that it offers into the relative hierarchical positions of these constituents and thus the structure of the clause.

Unfortunately the data offers a much less clear picture than might be hoped, and coreference structures in German which do not involve a subject all seem rather marked. Authors have suggested a number of ways in which the Binding Conditions might be improved in order to account for the data. Grewendorf (1985, cf. Primus 1987) attributes the restrictions on the binding of reflexives by objects to a hierarchy of grammatical functions, arguing that the binder of a pair must always be higher up in the hierarchy than the bindee. Since most binders are subjects and the most oblique grammatical functions are never binders, this works well for the vast majority of the time, but it also predicts that direct objects and indirect objects should clearly either bind or fail to bind each other (depending on where these functions are located in the hierarchy). Grewendorf (1988) argues that this is the case, and offers the following judgements.

(3) Der Arzt zeigte den Patienten_i sich_i/*ihm_i im Spiegel.
    the doctor showed the patient.acc REFL/PRN.dat in.the mirror

(4) Der Arzt zeigte dem Patienten_i *sich_i/ihn_i im Spiegel.
    the doctor showed the patient.dat REFL/PRN.acc in.the mirror

Example (3) indicates that a dative reflexive may be bound by an accusative binder, but a dative pronominal cannot be. Example (4) shows that an accusative reflexive cannot be bound by a dative antecedent, but that an accusative pronominal can. This account is superficially attractive since it links in to the noun phrase accessibility hierarchy, which has been advanced for other purposes (Keenan and Comrie 1977). In addition, it captures the three-way split between subject, objects and obliques quite well.

However, other authors have analysed these structures quite differently: Sternefeld, for example, contests some of the judgements (Sternefeld and Featherston 2003), while Reinhart and Reuland (Reinhart and Reuland 1993; Reuland and Reinhart 1995) offer a completely different account of reflexivity. They distinguish between simplex SE-type and complex SELF-type anaphors, and argue that the former are in fact pronouns, not reflexives, but that they can occur as co-arguments where pronouns cannot because they are underspecified for phi-features, which makes them non-referential and thus feasible feet of chains. Both of these types appear as sich in German, though they have different forms cross-linguistically. The disjoint distribution of SE- and SELF-type anaphors is achieved by a rewriting of Binding Condition B, which stipulates that a semantically reflexive predicate must be reflexive-marked. Predicates which are inherently reflexive are lexically reflexive-marked (5); other predicates must be reflexive-marked by having a SELF-anaphor as an argument (6).

(5) Max benimmt/schämt sich
    Max behaves/shames SE-REFL
    ‘Max behaves himself / is ashamed’

(6) Max hasst/liebt sich
    Max hates/loves SELF-REFL

One final puzzling feature of these constructions is noted by Elena Anagnostopoulou (p.c.). It seems that pronominals are better than full NPs as antecedents in these structures, so that (7) is less acceptable than (8):

(7) ?Die Friseurin zeigte dem Kunden_i sich_i im Spiegel.
     the hairdresser showed the customer himself in.the mirror

(8) Die Friseurin zeigte ihm_i sich_i im Spiegel.
    the hairdresser showed him himself in.the mirror

Now the NP type of the antecedent is not generally thought to play a role in binding structures, which implies that we have not yet attained a full understanding of the issues. In the light of this confusing situation, in which the standard Binding Conditions appear not to hold, Featherston and Sternefeld (2003) carried out an experimental study eliciting judgements on the relevant structures to obtain a clearer view of what is happening.

2.1. Investigating object coreference in German

In this study we used a variant of the magnitude estimation methodology as described above. We tested sixteen conditions, of which we shall report just eight here for illustrative purposes (see Featherston and Sternefeld 2003 for full details). These eight structures varied on three binary parameters: antecedent NP type (pronoun, full NP), relative linear order of dative and accusative case (dat>acc, acc>dat) and anaphor type (reflexive, pronoun). We present the forms of these conditions in Table 1.1

Table 1. Eight syntactic conditions tested in our first experiment on object coreference. The sense is always the same. Note that “>” means “linearly precedes”.

Code  Antecedent  Case order  Anaphor    Structure form
ndr   NP          dat>acc     reflexive  dem NP sich selbst gezeigt
ndp   NP          dat>acc     pronoun    dem NP ihn selbst gezeigt
nar   NP          acc>dat     reflexive  den NP sich selbst gezeigt
nap   NP          acc>dat     pronoun    den NP ihm selbst gezeigt
pdr   pronoun     dat>acc     reflexive  ihm sich selbst gezeigt
pdp   pronoun     dat>acc     pronoun    ihm ihn selbst gezeigt
par   pronoun     acc>dat     reflexive  ihn sich selbst gezeigt
pap   pronoun     acc>dat     pronoun    ihn ihm selbst gezeigt

The high-quality data we collected under strictly controlled conditions allows us to distinguish the various factors affecting the judgements of these structures. We present (a subset of) the results in Figure 1, which shows the mean normalized grammaticality judgement score and 95 % confidence interval for each experimental condition. The error bars show the confidence intervals; their midpoints show the mean values. The syntactic conditions are arranged along the horizontal axis. Let us briefly recall the form which this data type takes. In our experiment we collected judgements of “naturalness”, expressed in numerical form, and anchored by reference to other judgements, but with no reference to a concept of absolute (un)grammaticality. So results graphs such as Figure 1 show us how good or bad subjects judged example structures to be, with higher numerical judgements indicating that a structure is more natural (up on the graph). The zero on the scale shows the mean of all judgements. The syntactic conditions on the horizontal axis are identified by the codes shown in Table 1.

Looking at the graph we can clearly see three effects at work. All effectsthat we mention here are statistically significant (see Featherston and Sterne-feld 2003). The first effect relates to the NP type of the antecedent (full NP vspronoun). The four conditions on the right-hand side of the chart (whose codesbegin with a p) have pronouns as antecedents. They are judged better than those

1. The codes for the conditions are made up as follows. The first letter indicates theantecedent type (NP or pronoun), the second letter shows the case of the antecedent(dative, accusative) and the third letter specifies the anaphor type (reflexive, pro-noun).

Page 309: 89378429 Exceptions in Grammar

298 Sam Featherston

Figure 1. Results of experiment on object coreference, showing mean normalized judgement scores by syntactic variant of the object coreference structures. Higher scores indicate judgements that the structure is “more natural”, but since these are relative judgements, the precise scores refer only to the group means.

on the left-hand side which have full lexical NPs as antecedents (and thus have codes beginning with n). The reason for this is simple and it has nothing to do with binding. Since an antecedent normally linearly precedes an anaphor (in this experiment they always do), and an anaphor is necessarily a pro-form (i.e. not a full NP), it thus follows that when the antecedent is a full NP, then a word order restriction is violated, namely, that light (short) sister-like constituents linearly precede heavier (longer) ones, for whatever reason (Behaghel 1909; Lenerz 1977). The finding is therefore real but irrelevant to conditions on binding, since it relates to linear ordering preferences.

The second visible effect is a case order effect. Of each minimal pair of structures, that with dative antecedent and accusative anaphor is judged better than that with accusative antecedent and dative anaphor. To see this we compare the first and second conditions from the left (with second letter d in their codes) with the third and fourth (with second letter a), and similarly the fifth and sixth with the seventh and eighth. This preference has a robust effect in the data, but we should be clear that it too has probably nothing to do with binding, since a preference for datives to linearly precede accusatives can be found in structures without object coreference (Behaghel 1932; Lenerz 1977; Uszkoreit 1986), as our own studies have replicated in this data type. The fact that exactly those examples in which the antecedent is dative and the anaphor is accusative are judged better is thus merely an epiphenomenon; the interaction of two factors:

Three types of exceptions – and all of them rule-based 299

first, that antecedents more naturally linearly precede anaphors, and second, that datives more naturally linearly precede accusatives, in the default case. Here too therefore we find no specific effect of binding or coreference.
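The claim that this pattern is an epiphenomenon of two independent ordering preferences can be made concrete with a toy additive model. The weights below are invented for illustration; only their relative sizes matter:

```python
# Two independent linear-order preferences, with invented illustrative weights.
W_LIGHT_FIRST = 0.5   # a pronoun (light) antecedent precedes heavier material
W_DAT_FIRST = 0.3     # a dative linearly precedes an accusative

def order_score(code):
    """Additive score for a condition code (e.g. 'pdr'); no binding factor at all."""
    score = 0.0
    if code[0] == "p":    # antecedent is a pronoun
        score += W_LIGHT_FIRST
    if code[1] == "d":    # dat>acc order
        score += W_DAT_FIRST
    return score

# The four reflexive conditions come out ranked pdr > par > ndr > nar, i.e. the
# observed dat>acc advantage emerges without any coreference-specific constraint.
ranking = sorted(["ndr", "nar", "pdr", "par"], key=order_score, reverse=True)
print(ranking)  # → ['pdr', 'par', 'ndr', 'nar']
```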

The third effect we see in this data is that of the anaphor type (reflexive vs pronoun). In each of the four adjacent minimal pairs of conditions in the chart, the first, with a reflexive as anaphoric element (and with third letter r in its code), is judged clearly better than the second of the pair, which has a pronoun (and third letter p in its code). This is exactly what Binding Condition B would predict, since there is an accessible binder. In fact the pronouns are rather better than one might have predicted, but this is perhaps because these anaphoric elements are all followed by an adverbial selbst (‘self’) (see forms in Table 1), which improves the reflexivizability of the pronoun (how this works is controversial, e.g. Primus 1992).

Let us sum up. In the experiment which we present a part of here, we showed that there are a number of factors affecting the apparent well-formedness of object coreference structures, but none of them relate specifically to binding except the known Binding Conditions A and B, which are fully operative here. There is thus no reason to see this data set as being in any way an exception to binding theory. Our experimentally obtained judgements demonstrate that the standard binding constraints apply here, but other irrelevant yet nevertheless systematic constraints, operating cumulatively (as Keller 2000 so clearly demonstrates), are confusing the picture. There is in this data therefore no problem of generative grammar, no dilemma and no exception. The simple picture was being obscured by the large number of additional factors affecting these structures.

Improved data collection techniques, making use of the great strides forward which have been achieved in the gathering and analysis of judgements, can reveal the predictions of generative theory to be validated. This situation, in which irrelevant factors are fogging the wider picture, is much more common than is generally realised. In particular, many syntacticians have still failed to integrate into their perception of judgement data the finding that grammatical (and other) constraints operate cumulatively – the difference between two structures is often not just the effect of one constraint but of several of them additively. Irrelevant factors can tip a structure into apparent ungrammaticality which would without these noise factors be good (cf. Sternefeld 2000). Failure to recognize these facts can result in exceptions being mistakenly identified. Syntacticians’ most effective weapon against being misled into identifying exceptions is improved data with much finer differentiation, and the testing of sets of materials. This approach allows us to distinguish between the effects of the phenomenon we are interested in and other irrelevant factors.


3. Exception type two: cross-linguistic variation

Our second study relates to a difference between languages. While languages such as English exhibit island constraints, in others, such as German, they are not so apparent. This finding raises the question whether German should be seen as an exception, because it does not conform to the expectations to which the analysis of restrictions like the Empty Category Principle (ECP) as universals gives rise. The ECP is an account of such phenomena as subject-object asymmetries, operating at an abstract level. As such it can and perhaps must be hypothesized to be universal. But this means that German, in which these effects do not seem to appear, would have to be regarded as an exceptional language (cf. Haider 1993, preface). This needs to be explained within the system of universals, and in fact it is precisely this sort of finding which motivated the inclusion of parameters in the Principles and Parameters model (Chomsky 1981). Parameters are the escape hatch which permits inter-language exceptions to be accounted for: not the effects of the ECP, but the option of having the ECP is universal, on this analysis. This approach to cross-linguistic variation, although tenable, must be recognized to be weaker than a position in which restrictions apply across languages without exception.

We shall argue here that German is not an exceptional language, and more generally sketch out why we think that the whole treatment of inter-language exceptional behaviour using the mechanism of parameters is unnecessary. To do this we shall draw on data from our studies of the whole group of phenomena gathered under the heading of “constraints” in the sense of Chomsky (1973) and Ross (1967), that is, those limitations upon structure which are insufficiently general to be regarded as rules. We shall note that such constraints, frequently island constraints, are the archetypal structural exceptions in the grammar: such constraints are postulated for exactly those effects which are not otherwise predicted. The use of such mechanisms presupposes some form of an ‘overgenerate and filter’ grammar architecture, which is itself uneconomical, since it requires two component parts to the grammar with quite different functions and characteristics.

3.1. Superiority in German and English

In English, while multiple wh-questions are generally possible, it has been noted that certain wh-items cannot be moved to the initial position when certain others are in situ (e.g. Chomsky 1973). Most generally it can be stated that in-situ wh-subjects are not possible when other wh-items are in raised position. So while (9a) is a perfect sentence of English, (9b) would not normally occur at all. The precise motor of this effect is to this day unclear (see Ginzburg and Sag 2000 for thoughtful discussion), but we can distinguish two groups of grammatical accounts. Chomsky (e.g. 1993) has suggested that it is an economy effect, and that structurally more distant wh-items cannot move to satisfy feature requirements when a closer one could do so as well. The alternative is the ECP (e.g. Lasnik and Saito 1984), which has accounted for this and other asymmetries between subjects, objects, and adjuncts with restrictions on which positions can be lexically and antecedent governed.

(9) a. Who brought what to the party?
    b. *What did who bring to the party?

The situation in German is different. The consensus position has been that the equivalent effect does not occur in German, for instance: “German lacks the set of simple ECP effects like superiority, … *[that-t]-effects …” (Lutz 1996: 35). This is illustrated in the examples below, where (10a) is the most normal form but (10b), unlike (9b), can be fairly readily found.

(10) a. Wer hat was zur    Party gebracht?
        who has what to.the party brought
     b. Was  hat wer zur    Party gebracht?
        what has who to.the party brought

To test for superiority, we applied our magnitude estimation methodology to twenty-six different multiple wh-question structures, hoping to establish whether German has such an effect, and if so, which combinations of grammatical functions as wh-items would trigger it (for detail: Featherston 2005). The twenty-six multiple wh-questions consisted of every pair of wh-subject wer (“who”), wh-direct object was (“what”), d-linked wh-direct object welches X (“which X”), wh-indirect object wem (“to whom”), d-linked wh-indirect object welchem X (“to which X”), and wh-adjunct wann (“when”). We present the relevant part of the results in Figure 2.
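For readers unfamiliar with the data type: magnitude estimation judgements are standardly normalized before cross-subject comparison. The scheme below is one common procedure in this literature (an assumption on our part, not a report of the exact pipeline used in this study): divide each rating by the subject's reference-sentence rating, log-transform, and z-score within subject.

```python
import math
import statistics

def normalize(subject_ratings, modulus_rating):
    """Common magnitude-estimation normalization (assumed, illustrative):
    ratio to the reference (modulus) rating, log transform, per-subject z-score."""
    logs = [math.log(r / modulus_rating) for r in subject_ratings]
    m, s = statistics.mean(logs), statistics.stdev(logs)
    return [(x - m) / s for x in logs]

# One hypothetical subject who rated four sentences on an open-ended scale,
# with the reference sentence rated 20.
z = normalize([10, 20, 40, 80], modulus_rating=20)
print([round(v, 2) for v in z])  # → [-1.16, -0.39, 0.39, 1.16]
```

After this step, zero is the subject's own mean, which is why the group mean sits at zero in the results graphs.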

Figure 2. Results of experiment on superiority in German, showing mean normalized judgement scores by in-situ wh-item. Conditions with in-situ subjects are judged significantly worse than all others. Abbreviations: wh-DO means ‘bare direct object wh-item’, wx-IO means ‘which X-type indirect object wh-item’, etc.

This graph shows the same data type and uses the same conventions as Figure 1 above; mean normalized judgements are measured on the vertical scale, syntactic conditions are distinguished on the horizontal scale. In this chart we have grouped the conditions by the in-situ wh-item. It is clear that the group of conditions represented by the left-most error bar, which is all those conditions with in-situ wh-subjects, is judged worse than all the others. There is a degree of variation between the other (groups of) conditions, but this is due to independent effects which need not concern us here (for full details see Featherston 2005). Since the presence in situ of a wh-subject when another wh-item is in initial position is the structural description of superiority, and the perceived unacceptability of such examples is the characteristic symptom of the superiority effect, it would appear that we observe a superiority effect in German too.

In order to confirm that what we found in German was indeed the same effect as the phenomenon familiar from English, we repeated the experiment on English data. Again we tested 26 conditions, amongst which were all possible combinations of wh-subject who, d-linked wh-subject which (person), wh-direct object what, d-linked wh-direct object which (thing), wh-indirect object (to) who(m), and d-linked wh-indirect object (to) which (person). We present the results by in-situ wh-item as before.

Figure 3 shows the judgements of multiple wh-questions in English, parallel to the findings on German in Figure 2. Again one group of structures is judged clearly worse than all the others, and again it is precisely those structures which have an in-situ bare wh-subject, as the superiority phenomenon describes. The very close correspondence of the results on German and English leaves no doubt that the effect that we observed in German is of the same type as the superiority effect which we see in English.

Figure 3. Results of experiment on superiority in English, showing mean normalized judgement scores by in-situ wh-item. As in German, conditions with in-situ subjects are judged clearly worse than all others. wh-DO means ‘bare direct object wh-item’, wx-IO means ‘which X-type (d-linked) indirect object wh-item’, etc.

This has clear implications but also raises important questions. The most important implication for our purposes here is that German is not an exceptional language, because German has the same effect that other languages have. Why therefore has there been a consensus among linguists which doubted the existence of this effect? The answer seems to be that the perceived strength of the relevant constraint is less in German than in English. Given the general assumption within syntactic theory that only those effects which are strong enough for their violation to cause absolute ungrammaticality are narrowly syntactic, and that all weaker effects are mere markedness or stylistics, linguists have tended to discount weaker effects as irrelevant. We can see this assumption in the argumentation that syntacticians use. Haider reveals this assumption explicitly when he argues in a related question (1993: 159):

If clausal subjects occupy the spec-IP position in German, then the Condition on Extraction Domains forbids extraction, and that without exception. But only one single example is sufficient to refute this. [our translation]²

Arguments of this sort presuppose that well-formedness is dichotomous, but evidence such as our experimental data on superiority in English and German would seem to show that this assumption of a binary model of well-formedness is erroneous. In this case of German superiority, the idealization to binary well-formedness is hiding generalizations.

2. “Wenn im Deutschen Subjektsätze die Spec-I Position einnehmen, verbietet CED Extraktion, und zwar ausnahmslos. Um dies zu widerlegen genügt aber schon ein einziges Beispiel.”

We think that this is much more commonly the case. In our project SFB 441 A3 Suboptimal syntactic structures (project leader Wolfgang Sternefeld) we have carried out numerous studies gathering introspective judgement data of experimental quality, controlling for irrelevant factors and standardizing materials. The picture of perceived well-formedness that this data reveals is unambiguous. We consistently find that non-categorical constraints can be syntax-relevant, and indeed that the idea that any constraints are categorical is probably false. There are constraints which are sufficiently strong to appear categorical, in the sense that speakers would not choose to use structures violating them, but if we present informants with structures which are unambiguously ungrammatical, but nevertheless comprehensible, so that the error can be seen to be syntactic and not semantic, then they consistently rate such sentences relative to the perceived severity of the rule violation leading to ungrammaticality. They do not simply reject them absolutely. These sanctions can be stronger or weaker, but are constraint-specific. They are also cumulative: a structure violating two rules is judged worse than one which violates only one of them (Keller 2000). To illustrate this further we will briefly present another study on an island constraint which we have conducted.
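The constraint-specificity and cumulativity just described can be captured in a toy scoring model. The constraint names and weights below are illustrative inventions, not estimates from our data:

```python
# Constraint-specific violation costs; the values are illustrative only.
COSTS = {
    "superiority": 0.9,         # strong constraint: near-categorical in English
    "that-trace": 0.6,
    "heavy-before-light": 0.3,  # weak linearization preference
}

def wellformedness(violations, baseline=1.0):
    """Perceived well-formedness as a baseline minus summed violation costs."""
    return baseline - sum(COSTS[v] for v in violations)

# Cumulativity: two violations are judged worse than either alone.
assert (wellformedness(["that-trace", "heavy-before-light"])
        < wellformedness(["that-trace"])
        < wellformedness([]))
print(round(wellformedness(["superiority", "heavy-before-light"]), 2))  # → -0.2
```

Because the costs are quantified and summed, the model yields a continuum of scores rather than a binary verdict.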

3.2. The that-trace effect in German and English

This phenomenon is very clear in English. While both subject and object can be equally well extracted from a complementizerless complement clause (11a, b), extraction from a clause introduced by a complementizer reveals a subject-object asymmetry: standardly the object extraction is judged acceptable (11c), but the subject extraction is much worse (11d).

(11) a. Who does Hillary think Bill loves?
     b. Who does Hillary think loves Bill?
     c. Who does Hillary think that Bill loves?
     d. *Who does Hillary think that loves Bill?

This effect has been tested extensively by Cowart (e.g. 1997), who has found it to be consistent and pervasive. As is the case with many extraction restrictions, the fundamental cause of the effect is obscure. The classic ECP account (Chomsky 1981; Lasnik and Saito 1984) motivates the asymmetry in the same way as the ECP-related accounts of other constraints which contain a subject-object asymmetry, such as the superiority effect. It is perhaps fair to say that we do not yet have a complete understanding or fully satisfactory account of the that-trace effect.

This constraint has been generally held not to exist in German. There are certain differences between embedded clause structures in English and German which make it credible that different constraints on movement apply. Most importantly, German complement clauses come in two types, which we shall refer to as V-final and V2. The first type has a complementizer in initial position while the second never has one, and the verb in the V-final type is clause-final, while the verb in the V2 type is near the beginning of the clause, generally as second constituent after a phrasal topic. The contrast between a complement clause with and without a complementizer is thus in German part of a larger syntactic contrast, unlike in English, where the complementizers sometimes appear to be optional elements.

(12) a. Wen_i meint  Doris, liebt Gerhard t_i?
        whom  thinks D.     loves G.
        ‘Who does D. think G. loves?’
     b. Wer_i meint  Doris, liebt t_i Gerhard?
        who   thinks D.     loves     G.
        ‘Who does D. think loves G.?’
     c. ?Wen_i meint  Doris, dass Gerhard t_i liebt?
         whom  thinks D.     that G.          loves
        ‘Who does D. think that G. loves?’
     d. ?Wer_i meint  Doris, dass t_i Gerhard liebt?
         who   thinks D.     that     G.      loves
        ‘Who does D. think that loves G.?’

If (12a), (12b), and (12c) were all grammatical, but (12d) were not, then we could say that German has a that-trace effect. However, the consensus view seems to be that standard German has no that-trace effect (Haider 1983; Grewendorf 1988; Stechow and Sternefeld 1989; Bayer 1990; Haider 1993; Lutz 1996). The grammaticality status of structures of types (12c) and (12d) may be said to be marginal, as they are not generally felt to be part of the standard language, although they occur in speech in southern varieties. In our experiment we aimed to test whether this effect would be identifiable in German using our more sensitive experimental judgement elicitation methods. Will German here turn out to be an exceptional language? We tested the four structures each in eight different lexical forms. The results are presented in Figure 4 together with the results of a parallel experiment on English by Cowart (1997).

Figure 4. Results of our experiment on that-trace in German (on the left), with Cowart’s (1997) results on English for comparison (on the right). The pattern of results from the two languages is very similar.

This result shows a very clear picture. First, there is indeed a that-trace effect in German, for the resemblance of the German data to the English is remarkable. The existence of the effect in German can thus not be in doubt. In both languages the subject and object extractions from complementizerless clauses are judged about equally good, and are clearly better than the extractions from clauses with complementizers. The extraction of a subject from a clause with a complementizer is judged much worse than the extraction of an object, in both languages. It therefore seems safe to assert that the basic factors affecting this set of structures are the same in the two languages.

3.3. Implications for theory I: gradience

Since closer inspection of the data has shown that German has both superiority and that-trace effects, there can be no question of German being an exceptional language in that the ECP (if that is indeed the causal factor) does not apply in it. Precisely the same phenomena can be found cross-linguistically, and we do not need the mechanism of parameter setting to account for the apparent non-existence of presumed universals in a given language, at least in this case. We consider it highly likely that this finding will prove to be the rule, not the exception: effects found in one language will generally be found in others too (cf. Bresnan et al. 2001). This must reinforce the hypothesis that there is such a thing as a universal grammar. But we still have to explain why the superiority effect and the that-trace effect are uncontroversial in English, but usually thought to be absent from German. Why did German look like an exception?

A look back at Figure 4 offers some insight. In English, the best three structures are regarded as well-formed, while the worst one, the extraction of a subject over a complementizer, is regarded as ill-formed. That is the that-trace effect. In German, by contrast, the top two are regarded as well-formed, and the lower two both as marginal; hence no that-trace effect is recognized. The most likely explanation of this mismatch between the standard assumption and the empirical data is that this data set, in English and in German, demands no less than three degrees of well-formedness, but the standard idealized model of grammaticality provides only two.


In both languages, there is a clear difference between the V2 extractions and the V-final extractions; the first pair is plainly better. But there is also a clear difference within the lower pair, that is, between the two V-final extractions. We thus have three distinct levels, but the binary model of grammaticality allows the linguist to capture only one of these differences at a time. In German the difference between the extractions with and without complementizers is recognized (no doubt because it is slightly greater), which means that the difference between the two extractions with complementizers becomes effectively invisible. In English, the difference between the subject and object extractions over ‘that’ is recognized (again, it is the greater of the two), which means that the fact that all extractions over complementizers are worse than the others is invisible.
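This point can be made mechanical: with three empirical levels but one binary cut-off, each placement of the cut-off registers exactly one of the two differences. The scores below are invented; only their ordering matters:

```python
# Invented scores encoding the three observed levels for the four extractions.
scores = {
    "V2 object": 1.0, "V2 subject": 0.9,   # both good
    "dass object": 0.1,                    # intermediate
    "dass subject": -0.8,                  # worst
}

def binary(threshold):
    """Collapse gradient scores onto a two-valued grammaticality judgement."""
    return {k: ("ok" if v > threshold else "*") for k, v in scores.items()}

# Cut-off between V2 and dass-clauses (the "German" reading): both
# dass-extractions come out '*', so the that-trace contrast disappears.
print(binary(0.5))
# Cut-off between the two dass-extractions (the "English" reading): only
# 'dass subject' is '*', so the general complementizer penalty disappears.
print(binary(-0.3))
```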

This explanation of how and why linguists have misread the data rests entirely on the assumption that well-formedness is a gradient, not a dichotomy. But this conclusion is forced upon us by the data anyway. The more detailed data revealed by our controlled methods of collecting judgements can only be faithfully represented on a continuum of well-formedness. The data simply has this form: there is not only ‘well-formed’ and ‘ill-formed’, there is also ‘better-formed’ and ‘more ill-formed’. These two example studies show that the effects of constraint violations can be larger or smaller, can vary cross-linguistically, and can be added together. For this to happen, these effects must have quantifiable values, and this is only possible in a model of gradient well-formedness. The idealization to a binary opposition of well-formed and ill-formed can thus be seen to be obscuring important information. It is hiding what we would have predicted, namely that German too has ECP effects. This effect of the binary model reveals it to be an abstraction from primary data, not a feature of the primary data.

Let us be clear what we are arguing for. Certain sorts of idealization are desirable and necessary. The idealization described by Chomsky in his famous “ideal speaker-listener” paragraph (1965: 3) we consider useful and indeed essential. But it should be noted that the idealization of well-formedness to a binary opposition does not occur in that paragraph. On the contrary, Chomsky explicitly avows in that text that “grammaticality is a matter of degree” (1965: 11). Chomsky explicitly limits his chosen low-data approach to “clear cases”, to the “masses of evidence that are hardly open to serious question” (1965: 19). The idealization of well-formedness to a binary model, we argue, may be feasible with data which are not open to question, but is inappropriate to data where finer distinctions are to be made, as in these cases. Over-zealous idealization of well-formedness has brought about an assumption that well-formedness really is binary, contrary to fact. German too has island constraints, but their weaker violation costs make them less visible. We are creating ghost exceptions when we impose an unempirical single possible violation cost on the data.

Let us briefly review our findings so far. First, some apparent exceptions are merely due to inadequate data. More and better data makes for fuller coverage of theory. Next, some idealizations of data can cause phantom exceptions. Ignoring non-categorical constraints on structure, the assumption of a single constraint violation cost, and the idealization of well-formedness to a binary scale can all obscure important evidence and make exceptions appear to occur where in fact there are none. This can be avoided by the adoption of a model of gradient well-formedness.

Admittedly, a gradient grammar requires several additional features in a theory. It must allow constraint-specific violation costs, which in turn necessitates that judged well-formedness be represented as a continuum. Violation costs must be quantified, so that they can be cumulative. All this adds complexity to the linguist’s task, but provides a more explanatory grammar, since the grammar produces fewer exceptions; and these features are anyway robustly present in the primary language data, so their inclusion in our grammar also increases its empirical adequacy. Our task is not yet finished, however, for our grammar must also allow exceptions in the output to be produced, which we shall argue requires the architecture of the grammar to include probabilistic competition for output. We turn to this last feature now.

4. Exception type three: exceptional occurrence

This section addresses a very different type of exception, namely the occurrence in naturalistic output, such as corpus data, of structures which our grammar would exclude. Every linguist will have had the experience of finding examples of structures that they would have predicted not to occur. We shall give just one example here, superiority violations.

A search in the British National Corpus (Oxford, 100 million words) reveals two examples of structures which violate superiority. Both have a direct object wh-item in clause-initial position and an in-situ wh-subject (What.DO … who.SUBJ). Searching the internet with Google reveals more examples.³

A search in Google UK (google.co.uk, February 2005, English language only, UK-based sites only) for “what did who” yielded 371 hits, of which 236 were non-repeats. Detailed inspection of each of these revealed 112 apparent real examples. A similar search for “who did who” yielded 831 hits, of which 486 were non-repeats. The exclusion of linguistics sites and non-anglophone sites and the like revealed five apparent real examples, of which three had the form who.DO … who.SUBJ, and two others who.IO … who.SUBJ.⁴

3. For the effectiveness of internet search engines and the validity of the results, we find Keller et al. (2002) very convincing, in which it is shown that internet search engine results can match introspective judgements no less accurately than can even carefully compiled corpus data. See Featherston (2005) for further details, which show clearly that the corpus search and the web search produce closely matching patterns.

Now, occurrences of structures which we would predict would not appear are common, and nothing hangs on this particular example. But this phenomenon poses a real problem for linguists, since grammatical models generally cannot account for it. Our own model of a grammar incorporating gradient well-formedness, which we have noted above is anyway required to deal with other phenomena, does however permit the occurrence of exceptions. It does this by introducing a differentiation into the architecture of the grammar. It distinguishes two modules: Constraint Application and Output Selection. We shall lay out roughly how these two operate and see how this arrangement predicts exceptional occurrence.

4. It is worth noting here that some of them were unambiguously not echo questions. This is important because it establishes that the examples were originally generated in this form. Echo question examples could be argued to just have one element questioned in an otherwise quoted string. For instance, if a child recites the two times table ‘2, 4, 6, 9, 12 …’, their parent can ask: ‘2, 4, 6, what, 12?’ without putting into question the syntactic generalization that wh-items are fronted in English. Echo questions, as quoted strings, are thus only weak examples of the generability of superiority violations. It is thus important that we find examples which the context shows are not echo questions.

The first of these carries out the function of structure building, essentially in the form of a constraint satisfaction model. This stage develops the form of the structure, being guided by the requirements of the semantic content but at the same time constrained at every step by the application of the constraints on linguistic form. This process is roughly equivalent to the stage of ‘grammatical encoding’ in the ‘formulator’ in Levelt’s (1989) ‘blueprint for the speaker’. But the process of applying constraints to structures involves trade-offs, and with each violated constraint the nascent structure incurs a violation cost. Let us note that this process no doubt takes place incrementally, on roughly phrase-sized utterance planning chunks, but we shall abstract from this for simplicity of presentation here. The result of the application of constraints is that each candidate structure receives a well-formedness weighting which can be accessed in introspective judgements: structures breaking more rules/preferences are judged worse. Of the possible output forms for a given semantic content, one will usually be better than the others, but it may happen that two (or more) of the best are roughly equally good. We can illustrate this with sets of examples such as in (13).

(13) a. Jack looked the word up in the dictionary.
     b. Jack looked up the word in the dictionary.

Both phrasal verb particles and NP complements are preferred adjacent to their head verb (see excellent discussion in Wasow 2002), but only one of them at a time can appear there. The violation costs of these two structural preferences must be about equal, however, since both (13a) and (13b) are fairly natural and both occur, more or less in free variation.

How does the human language production system choose between the pair (13a) and (13b)? There must necessarily be some module selecting output among options, since we never find both being output when only one would do. Let us next note that this Output Selection procedure must also take note of the well-formedness status of the competing alternatives, since better-formed structures are apparently preferred to less well-formed ones: we do not usually find examples of Jack looked the word in the dictionary up. It seems economical to assume that this Output Selection module selects a single form for output on the basis of the well-formedness weightings assigned by the first, Constraint Application, module. Since both are about equally well-formed, as our perceptions confirm, we therefore regularly find both (13a) and (13b).

How does this account for exceptional occurrence? Well, to err is human, and human linguistic behaviour is probabilistic. This can be readily verified by taking a look at the frequent experimental studies in the ‘Journal of Second Language Acquisition’ where the performance of second language learners is compared with that of a native speaker control group. The native speaker control group never get everything right; most commonly they attain 90–95 % of the target behaviour. It is thus unsurprising that output selection too operates probabilistically. When two forms are equally good, Output Selection chooses one of them more or less randomly. If one were just slightly better than the other, then we would find this reflected in their distribution frequencies, but the slightly less good candidate would still occur. The key point: every now and again, we select for output a candidate structure which is significantly less good than some other. Exceptional occurrence is thus merely a slip in operation, probably in the assessment of the well-formedness of a candidate. Some noise in the appreciation of well-formedness is not stipulation but a well-attested fact: introspective judgements of well-formedness are well known to have a degree of random variation in the individual judgement event (Schütze 1996). It is therefore not at all surprising that the output selection function, which must use perceived


Three types of exceptions – and all of them rule-based 311

well-formedness as its criterion for selection, exhibits some degree of variability in its choices.

We may thus summarize our account of exceptional occurrence as follows. Examples such as those in (13) demonstrate that it is at least sometimes the case that speakers must choose between two or more equally legal structures in production processing. It follows that we have such a thing as an Output Selection function: if we did not, any pair of equally good forms would crash the production system. It will also be fairly uncontroversial that this function makes use of well-formedness information about competing candidates: slightly better forms are selected more frequently than less good forms. This demonstrates clearly that our Output Selection module functions probabilistically; if it did not, even very slightly less good forms would never occur at all. These assumptions are sufficient to account for exceptional occurrence: when a system functions probabilistically, improbable outcomes occasionally occur; in our specific case, substandard structures are occasionally selected for output. The good news is that this account of exceptional occurrence in no way complicates our grammar or forces us to include a probabilistic element in the grammar, for the grammar and the selection of output are completely separate functions. In the next section we look in a little more detail at what this might mean for the architecture of the grammar and how it is used.

4.1. Implications for theory II: Well-formedness is not identical to output

We have argued for two features of the grammar which allow us to account for exceptions. Many apparent exceptions in the grammar can be seen to be mere phantom exceptions if we assume a gradient model of well-formedness, while exceptional occurrence is predictable probabilistic behaviour, if we distinguish between Constraint Application and Output Selection. In this section we shall attempt to show that these two features of language behaviour fit well together into a coherent model of the architecture of human language computation, and are directly motivated by judgement and corpus frequency data. We summarize the features and functions of the two modules that we distinguish in (14) and (15). We refer to this as the Decathlon Model because the selection of structures for output takes place in two separate stages (see footnote for the reason for this name).5

5. We called this the Decathlon Model because of the similarity of the scoring system in the athletic discipline decathlon and the way that well-formedness and output operate. In the decathlon, competitors take part in ten separate events and receive a points score from each. The points reflect not their performance relative to the other competitors, but their absolute performance. It therefore does not matter whether a


(14) Constraint Application

a. applies rules,
b. takes note of rule violations, and
c. applies violation costs (well-formedness weightings)

– blindly and exceptionlessly,
– all constraints applying to all structures.

(15) Output Selection
     selects structures for output
     – on the basis of well-formedness weightings,
     – competitively, and
     – probabilistically.
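The division of labour between (14) and (15) can be sketched in a few lines of code. The following is only an illustrative sketch: the constraint names and cost values are invented for the particle-verb examples in (13), and a softmax-style weighted choice stands in for whatever probabilistic mechanism Output Selection actually uses.

```python
import math
import random

# Hypothetical constraints with constraint-specific violation costs
# (the names and the values are invented for illustration).
CONSTRAINTS = {
    "np_adjacent_to_verb": 0.3,        # mild: NP complement not next to the verb
    "particle_adjacent_to_verb": 0.3,  # mild, and about equal in severity
    "particle_after_pp": 2.0,          # severe: 'looked the word in the dictionary up'
}

def well_formedness(violations):
    """Constraint Application (14): blind, exceptionless, cumulative costs."""
    return -sum(CONSTRAINTS[c] for c in violations)

def select_output(candidates, temperature=0.5):
    """Output Selection (15): competitive and probabilistic over well-formedness."""
    scores = {form: well_formedness(v) for form, v in candidates.items()}
    weights = [math.exp(s / temperature) for s in scores.values()]
    return random.choices(list(scores), weights=weights)[0]

candidates = {
    "looked the word up in the dictionary": ["np_adjacent_to_verb"],
    "looked up the word in the dictionary": ["particle_adjacent_to_verb"],
    "looked the word in the dictionary up": ["particle_adjacent_to_verb",
                                            "particle_after_pp"],
}
```

Sampling select_output(candidates) repeatedly yields (13a) and (13b) in roughly free variation, while the severely penalized variant surfaces only now and again: exceptional occurrence as a rare outcome of a probabilistic selector, with the cost table, the ‘grammar’, left untouched.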

A key point in this model is the non-identity of well-formedness and occurrence. Syntactic realisations of a given semantic content compete for output on the basis of their well-formedness ratings. This means that a given syntactic realisation of a semantic content can be fairly well-formed (as perceived in judgements), but virtually never appear, simply because better syntactic realisations exist. Similarly, a syntactic realisation can be judged to be fairly poor, but nevertheless appear in linguistic output (e.g. corpus data) because it is the best of the set of structural alternatives.

competitor comes first, third or sixth in the sprint, what matters is the absolute time achieved. These points are summed, and the athlete with the highest total gets the gold.

The grammar and production systems work in a similar way. All structural alternatives are subject to all linguistic constraints, and all violations cause reductions in their well-formedness rating (these are presumably caused by ease of processing at some level, which would explain why well-formedness is perceptible, but not explicable, to the speaker and can be accessed through intuitive judgements). This well-formedness is the equivalent of athletes’ points scores. The structural alternatives compete to be selected for output on the basis of their well-formedness in the same way as athletes compete to win the gold medal on the basis of the points they have gathered in the individual events. In language this competition is probabilistic, probably at the stage of the perception of well-formedness.

To continue the sports analogy, the architecture of generative grammar resembles the slalom, in that the candidates must pass through all the gates to win; missing even one gate (violating even one restriction) causes categorical exclusion. OT is like the high jump: the bar is put at a certain level and all competitors try to jump over it. All who fail are excluded, and the bar is put higher. This continues until only one candidate remains, who becomes the optimal high jumper.


Note that this contrasts with both the traditional maxims of generative theory and with the precepts of Optimality Theory (OT, Prince and Smolensky 1993). In generative grammar the standard supposition has been that any successfully generated structure will be used and will thus appear in the output. More recently the idea of competition in syntax has gained favour, appearing in the Minimalist Program and occupying a central place in OT (for review Müller and Sternefeld 2001). But in both of these types competition takes place in the grammar and contributes to the definition of well-formedness. In our own Decathlon Model (Featherston 2006) the situation is different. Well-formedness is not a result of competition but the result of cumulative violation costs. Competition steers only the choice of output from among candidate syntactic realisations. This competition is based upon the well-formedness weightings of the candidates, but has a probabilistic element. Although the best realization of a content will normally win the race to be output, it is predicted that less optimal candidates will sometimes be produced, with a frequency proportional to their degree of ill-formedness relative to the most well-formed candidate. Exceptional occurrence is thus the effect of the probabilistic element in Output Selection.

One of the major advantages of this grammar architecture is the fact that it is directly related to the evidence of the primary language data, both judgements and corpus frequencies. We illustrate this in Figure 5. This graph shows the results of two studies on object coreference structures in German, the first using experimentally obtained judgements, the second using corpus frequencies (COSMAS I, Institut für Deutsche Sprache, Mannheim). For full details of this study see Featherston (2002). In this graph the judgements are represented by error bars and refer to the left-hand scale. These judgements show clear differences between the sixteen syntactic variants. These vary on four binary parameters, and their perceived well-formedness is a product of the number and severity of the constraints that each violates. This is thus the same sort of finding as that which we found in our magnitude estimation studies reported above; in fact this data type always reveals this picture of well-formedness as a gradient phenomenon, reflecting cumulative quantifiable violation costs.

The frequency information was gathered from the corpus COSMAS I (W-PUB Archiv, 530 million word forms), and relates to the right-hand scale. The syntactic variant judged best occurred fourteen times, that judged second best occurred just once, and no other form was found at all. This is thus a very different pattern to that of the judgements: the frequency data shows strong evidence of probabilistic competition for output, unlike the perceived well-formedness data which shows no sign of competition (others find similar patterns, e.g. Kempen and Harbusch 2005).


Figure 5. Experimental judgements and corpus frequencies of a set of syntactic variants (from our work on object coreference). The judgements reveal a continuum of perceived well-formedness. The frequencies show that just the best and second best structure ever occur (W-PUB Archive, 530 million word forms). This pattern reveals that competition for output occurs over well-formedness values.

On the basis of data like this, we suggest that perceived well-formedness and occurrence are not identical. They are of course related, and it seems to be a natural assumption that the competition for output proceeds on the basis of the well-formedness values: we produce the best syntactic variant available to us (probably because it is easiest or the first available, which may be the same thing). But notice that even in this limited data set, competition for output is probabilistic: not only the best but also the second best structure occurs, occasionally. This illustrates variation. Exceptional occurrence results when one of the weaker candidates is selected; this will be rare, but it will occasionally occur.6

6. Frequency data only ever contains the very best alternatives and fails to distinguish variants with zero occurrences. It is perhaps worth noting here that it is this very limited range of alternatives present (which we dub the Iceberg Effect) which makes important factors such as cumulativity difficult to spot in frequency data (e.g. van der Feen, Hendriks and Hoeks 2006). Experimental judgements, which provide evidence from the full range of possible and impossible structures, make the reality of cumulativity perfectly clear (Keller 2000). In the sporting analogy, frequency data only shows you the medal winners. But to see what it takes to become a medal winner, you gain much more information by comparing medal winners with athletes who precisely didn’t win medals.
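This Iceberg Effect can be simulated directly. In the sketch below (purely illustrative: the five variants and their well-formedness scores are hypothetical, not the experimental values), a fully gradient scale feeds a probabilistic selection, yet the resulting ‘corpus’ is dominated by the best variant, with the second best appearing sporadically and the rest effectively invisible:

```python
import math
import random

# Hypothetical gradient well-formedness scores (0 = best); invented values.
scores = {"variant_1": 0.0, "variant_2": -1.0, "variant_3": -2.0,
          "variant_4": -3.0, "variant_5": -4.0}

def sample_corpus(n, temperature=0.35):
    """Draw n outputs by weighted random choice over the gradient scores."""
    forms = list(scores)
    weights = [math.exp(scores[f] / temperature) for f in forms]
    counts = dict.fromkeys(forms, 0)
    for _ in range(n):
        counts[random.choices(forms, weights=weights)[0]] += 1
    return counts
```

A sample as small as sample_corpus(15) typically reproduces the Figure 5 pattern: the best variant around fourteen times, the second best once or not at all, the others never, even though the underlying well-formedness scale is gradient through and through.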


4.2. Architectural simplicity vs coverage: Contrasting grammatical models

To clarify the nature and implications of our analysis we shall compare our own model with classic generative grammar and stochastic OT (henceforth often: StOT; Boersma and Hayes 2001) in order to bring out its features and reveal how it combines architectural simplicity with empirical coverage. In Table 2 we see the relevant characteristics of the three grammar models contrasted. The details are of course greatly simplified, but readers will be able to supply the details of their own favourite grammar architecture on the basis of the information given. We shall first make clear what the chart shows by contrasting the two familiar models.

First row: Traditional generative grammar applies all constraints to all structures, blindly and exceptionlessly. OT is more complex. The application of constraints to candidate structures is ranked, that is effectively ordered; in stochastic OT this application order is additionally stochastically variable (Boersma and Hayes 2001).

Table 2. The internal architectures of two familiar models of grammar, showing how they compare with our own Decathlon Model. Traditional generative grammar has a simple architecture, but cannot account for exceptional occurrence. Stochastic OT can account for this, but requires ordered and conditional functioning and a relative, unstable well-formedness model. Our own Decathlon Model requires no complex architecture to account for exceptional occurrence.

                          Generative grammar      Stochastic OT             Decathlon Model

Constraint application    blind, exceptionless    ordered, probabilistic    blind, exceptionless

Violation cost            blind, exceptionless    ‘smart’, conditional      blind, exceptionless
application

Violation costs           only one value:         only one value:           constraint-specific,
                          ‘ungrammatical’         ‘ungrammatical’           cumulative

Well-formedness model     dichotomous,            dichotomous,              gradient,
                          absolute, stable        relative, unstable        absolute, stable

Output selection          trivial: all non-bad    trivial: single non-bad   competitive,
                          structures output       structure output          probabilistic

Exceptional occurrence    not accounted for       accounted for             accounted for


Second row: Generative grammar also has a simple violation cost application function, again, blind and exceptionless. If a structure breaks a rule then the structure is automatically penalized. OT is again more complex, since the application of violation costs is not automatic but conditional: conditional upon whether it will distinguish the candidates; if not, no penalty is applied, and the violation has no effect upon the outcome.

Third row: On the other hand, OT and generative grammar both have only one violation cost value. There is only one possible outcome of a violation cost being applied: the attribution of the status ‘ungrammatical’.

Fourth row: The features so far result in both models having a dichotomous model of well-formedness. But generative grammar’s well-formedness is absolute and inherent, while OT’s well-formedness is always relative to a comparison set (hence ‘optimality’). The status of a given structure in generative grammar is stable, while the same structure in StOT can vary between good or bad even within the same comparison set, depending on the effect of the random factor in the application order.

Fifth row: Both familiar models assume that well-formedness is sufficient to license output. All and only well-formed structures should appear. In simple OT this will always be just a single candidate, the optimal structure. In StOT this single structure may vary between evaluation events.

Sixth row: Stochastic OT can thus capture the empirical reality that not only one form of a competition set in practice appears in the language. Since constraint ordering has a random weighting applied, occasionally a candidate will win which would normally lose: ‘exceptions’ may therefore win through the competition and thus occur. Traditional generative grammar permits multiple acceptable candidates to occur, but does not easily permit forms to occur which are other than fully acceptable. It thus has no ready account of exceptional occurrence. It is, however, considerably simpler than any version of OT as a theory. It is perhaps not surprising that the ability to deal with exceptional occurrence incurs a cost in terms of complexity of the architecture.

Our own Decathlon Model contrasts with both of these, but also shares aspects of both.7 Our model patterns with generative grammar in that it applies

7. I am often asked how my model relates to Jäger and Rosenbach’s (2006) MaxEnt model. It is a step closer than Boersma and Hayes’ StOT, but no more. It is still effectively a method of modelling frequency distributions only, and its concept of well-formedness is still categorical in any single evaluation event, as any model will be which confounds well-formedness and occurrence. The Decathlon Model is based upon evidence from both judged well-formedness and frequency data, and accounts for the contrasts in findings in these two data types by distinguishing constraint application and output selection. It is also much more empirically grounded than any


all constraints to all structures, requiring no ‘smart’ application function. The application of violation costs is simple, since it too is blind and exceptionless. However, instead of having a simple binary contrast of ‘good’ and ‘bad’, our model reflects the data from our judgement experiments in requiring constraint-specific violation costs which vary in severity. A violation does not make a structure bad, it merely makes it worse, by a fixed, constraint-specific amount. What is more, these violation costs are cumulative, so that if a structure violates two restrictions, it is ‘worse’ than if it merely violates one. These two factors, constraint-specific violation costs and cumulativity, demand a gradient model of well-formedness, which is of course more complex than the familiar dichotomous model. On the positive side, this allows gradient well-formedness to be absolute and stable; structures are inherently good, bad or marginal, on a continuum scale.

This is the basic operation of our Constraint Application module, which roughly corresponds to what is generally thought of as the ‘grammar’. It is the weightings assigned to structures by this module that we believe can be measured with the elicitation of judgements. The Output Selection function is, we argue, no part of the grammar but merely a facet of production. It works very simply. In production processing, we have to choose between different structural variants in just the same way as we must choose between non-structural variants. We generally choose the best structural variant, probably because it is the easiest for us to compute (in fact the causal factor of ‘well-formedness’ is probably ‘ease/speed of computation’, and the ‘competition’ is no doubt a ‘race’). But since we are humans and we do not process deterministically, we sometimes perceive well-formedness inconsistently or make mistakes. It follows that ill-formed structures are occasionally produced, essentially for the same reasons that we sometimes call our mother when we intended to call our sister, or scrape the car driving out of the garage. The grammar need not generate exceptional occurrence, and in our model it does not.

5. Conclusions

Our aim in this paper was to argue that the grammar has many fewer exceptions than it sometimes appears. We looked at examples of three different sorts of exceptions, and tried to show that each of them is, on closer inspection, not

variant of OT since it makes no use of empirically unobservable parameters such as constraint rankings. Our own violation costs are directly obtained from experiments of well-formedness judgements. It is therefore able to predict frequency distribution, rather than being obtained from frequency distribution.


exceptional. In each case we were able to account for the phenomena in a systematic, rule-governed way, in each case backing up our explanation with hard data from our experimental judgement studies.

Our first example was one of incomplete generalization. The conditions of the binding theory have considerable descriptive adequacy, but they do not seem to apply in German object coreference structures. We therefore carried out an experiment in which we gathered informants’ judgements using the magnitude estimation methodology. The results superficially support the exception hypothesis as they show a much more complex pattern than the binding theory alone would predict. However, the far greater differentiation in the data made available by the experimental technique and the testing of many different minimally different structural variants allowed us to identify the factors affecting the perceived well-formedness of the structures. Surface factors such as constituent weight and constraints on linear precedence can be discounted from the data set, leaving only the relevant information. This winnowing process reveals that the relevant binding condition is fully active in the data. Better data with finer differentiation thus allows us to discount the suspicion that these structures were an exception to the binding theory. We think that syntacticians will find this to be quite commonly the case. Perceived well-formedness is sensitive to grammatical and non-grammatical effects, acting cumulatively, but these irrelevant factors can be disentangled with high quality data, when it is recognized that even between structural minimal pairs, not just the one factor of interest may be active.

Our second type of exception was that of ECP phenomena in German, a language in which these effects have been consensually thought not to apply. Again improved data allowed us to confirm that the ECP does indeed operate, but that it seems to cause weaker violation costs than in English. In such a case it is necessary to take a critical look at the idealizations embedded in the standard theoretical assumptions. The main reason why these ECP effects were thought not to apply in German seems to be that their violation costs do not cause categorical ungrammaticality: structures violating superiority or the that-trace effect may be more readily found in German corpuses than in English. Working on the assumption that constraints on structure which can be violated cannot be narrowly grammatical, linguists had tended to deny the existence of these ECP effects. However, in this case it seems very clear that this assumption is causing exceptions to be identified where none in fact are present. This idealization to a dichotomous model of well-formedness is revealed in such cases as an abstraction applied to the primary language data, not a generalization derived from it. The conclusion must be that ECP effects do indeed apply in German and that German is in no way an exceptional language in this regard. In this case, the


simplifying idealization to a binary model of well-formedness was causing the ghost exception, which must throw its usefulness as a ‘simplifying’ assumption into some doubt.

Let us hasten to add that we do not consider all idealization of the data base of syntax to be erroneous or unnecessary. Precisely those abstractions from the raw data of language use or language intuition which relate to the difference between competence and performance seem to us to be fully justified, for the discarded information about the speaker-listener’s state of mind or dialectal idiosyncrasies is of little relevance in ascertaining the structure of linguistic expressions. This is not true of the idealization to a binary well-formedness model, however, for the information discarded in this simplification can indeed be relevant to structure, as is clearly demonstrated in this paper. We suspect that the abandonment of this admittedly long-standing but ultimately unmotivated idealization will cause other ghost exceptions to be revealed for what they are.

The last exception type we addressed was that of exceptional occurrence. Every linguist has had the experience of seeing or hearing structures that theory would suggest should not occur. Traditional generative grammar has no real account of this, since it assumes that all and only grammatical structures should ever make it through to output. Stochastic OT does offer a mechanism which can model occurrence variation and occasional exceptional occurrence, but does this at a very high price in complexity of grammar architecture. This approach is forced to abandon both the blind and exceptionless application of constraints to candidate structures and the blind and exceptionless assignment of violation costs to violating structures. Additionally it must assume a model of well-formedness in which the well-formedness status of a structure is not inherent, but only ever relative to a comparison set, and not stable, as it may change between different competition events. Put briefly, stochastic OT discards the assumption that the grammar is maximally general and instead provides it with a mechanism to allow exceptions. Our own architecture preserves the blind and exceptionless grammar and accounts for exceptional occurrence as a product of Output Selection, a function of human language processing independently motivated by our ability to choose between equally well-formed alternative structures. We therefore consider our model to be more explanatorily adequate, since it preserves the simple grammar and accounts for the data using only assumptions which are independently motivated.

We noted at the beginning of this paper that exceptions are the deadly enemy of generative grammar, since they offend against the simplicity criterion which is an essential component of an explanatory grammar. We hope to have shown that many exceptions are merely apparent, the epiphenomena of assumptions about data, the nature of well-formedness, and the architecture of the grammar,


which are not empirically motivated but rather apocryphal. Without this theoretical baggage generative grammar will move further and faster, in my opinion.

References

Bard, Ellen, Dan Robertson, and Antonella Sorace
1996    Magnitude estimation of linguistic acceptability. Language 72: 32–68.

Bayer, Josef
1990    Notes on the ECP in English and German. Groninger Arbeiten zur Germanischen Linguistik 30: 1–55.

Behaghel, Otto
1909    Beziehungen zwischen Umfang und Reihenfolge von Satzgliedern. Indogermanische Forschungen 25: 110–142.

Behaghel, Otto
1932    Deutsche Syntax. Vol. 4: Wortstellung, Periodenbau. Heidelberg: Winter.

Boersma, Paul, and Bruce Hayes
2001    Empirical tests of the Gradual Learning Algorithm. Linguistic Inquiry 32: 45–86.

Bresnan, Joan, Shipra Dingare, and Chris Manning
2001    Soft constraints mirror hard constraints: Voice and person in English and Lummi. In Proceedings of the LFG01 Conference, Miriam Butt and Tracy King (eds.), 13–32. Stanford: CSLI.

Bybee, Joan, and Paul Hopper
2001    Frequency and the Emergence of Linguistic Structure. Amsterdam: Benjamins.

Chomsky, Noam
1965    Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.

Chomsky, Noam
1973    Conditions on transformations. In A Festschrift for Morris Halle, Stephen Anderson, and Paul Kiparsky (eds.), 232–286. New York: Holt, Rinehart & Winston.

Chomsky, Noam
1981    Lectures on Government and Binding: The Pisa Lectures. Berlin: Mouton de Gruyter.

Chomsky, Noam
1993    A minimalist program for linguistic theory. In The View from Building 20, Ken Hale, and Samuel Keyser (eds.), 1–52. Cambridge, MA: MIT Press.


Cowart, Wayne
1997    Experimental Syntax: Applying Objective Methods to Sentence Judgements. Thousand Oaks, CA: Sage.

Featherston, Sam
2002    Coreferential objects in German: Experimental evidence on reflexivity. Linguistische Berichte 192: 457–484.

Featherston, Sam
2005    Universals and grammaticality: Wh-constraints in German and English. Linguistics 43: 667–711.

Featherston, Sam
2006    The Decathlon Model: Design features for an empirical syntax. In Linguistic Evidence – Empirical, Theoretical, and Computational Perspectives, Stephan Kepser, and Marga Reis (eds.), 187–208. Berlin: Mouton de Gruyter.

Featherston, Sam, and Wolfgang Sternefeld
2003    Experimental evidence for Binding Condition B: The case of coreferential arguments in German. In Arbeiten zur Reflexivierung (Linguistische Arbeiten 481), Lutz Gunkel, Gereon Müller, and Gisela Zifonun (eds.), 25–50. Tübingen: Niemeyer.

Ginzburg, Jonathan, and Ivan Sag
2000    Interrogative Investigations. Stanford, CA: CSLI Publications.

Grewendorf, Günther
1985    Anaphern bei Objekt-Koreferenz im Deutschen: Ein Problem für die Rektions-Bindungs-Theorie. In Erklärende Syntax des Deutschen, Werner Abraham (ed.), 137–171. Tübingen: Narr.

Grewendorf, Günther
1988    Aspekte der Deutschen Syntax. Eine Rektions-Bindungs-Analyse. Tübingen: Narr.

Haider, Hubert
1993    Deutsche Syntax – Generativ. Tübingen: Narr.

Jäger, Gerhard, and Anette Rosenbach
2006    The winner takes it all – almost: Cumulativity in grammatical variation. Linguistics 44: 937–971.

Keenan, Edward, and Bernard Comrie
1977    Noun phrase accessibility and universal grammar. Linguistic Inquiry 8: 63–99.

Keller, Frank
2000    Gradience in grammar: Experimental and computational aspects of degrees of grammaticality. Ph. D. diss., University of Edinburgh.


Keller, Frank, Maria Lapata, and Olga Ourioupina
2002    Using the web to overcome data sparseness. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Jan Hajic and Yuji Matsumoto (eds.), 230–237. Philadelphia.

Kempen, Gerard, and Karin Harbusch
2005    The relationship between grammaticality ratings and corpus frequencies: A case study into word order variability in the midfield of German clauses. In Linguistic Evidence – Empirical, Theoretical, and Computational Perspectives, Stephan Kepser, and Marga Reis (eds.), 329–350. Berlin: Mouton de Gruyter.

Lasnik, Howard, and Mamoru Saito
1984    On the nature of proper government. Linguistic Inquiry 15: 235–289.

Lenerz, Jürgen
1977    Zur Abfolge nominaler Satzglieder im Deutschen. Tübingen: Narr.

Levelt, Willem
1989    Speaking: From Intention to Articulation. Cambridge, MA: MIT Press.

Lutz, Uli
1996    Some notes on extraction theory. In On Extraction and Extraposition in German (Linguistik Aktuell 11), Uli Lutz, and Jürgen Pafel (eds.), 1–44. Amsterdam/Philadelphia: Benjamins.

Müller, Gereon, and Wolfgang Sternefeld (eds.)
2001    Competition in Syntax. Berlin: Mouton de Gruyter.

Primus, Beatrice
1987    Grammatische Hierarchien (Studien zur Theoretischen Linguistik 7). München: Fink.

Primus, Beatrice
1992    ‘Selbst’ – Variants of a scalar adverb in German. Linguistische Berichte, Sonderheft 4: 54–88.

Prince, Alan, and Paul Smolensky
1993    Optimality Theory: Constraint Interaction in Generative Grammar. Technical Report No. 2, Center for Cognitive Science, Rutgers University.

Reinhart, Tanya, and Eric Reuland
1993    Reflexivity. Linguistic Inquiry 24: 657–720.

Reuland, Eric, and Tanya Reinhart
1995    Pronouns, anaphors and Case. In Studies in Comparative Germanic Syntax, Hubert Haider, Susan Olson and Sten Vikner (eds.), 241–269. Dordrecht: Kluwer.


Reis, Marga
1976    Reflexivierung in deutschen ACI-Konstruktionen: Ein transformationsgrammatisches Dilemma. Papiere zur Linguistik 9: 5–82.

Ross, John
1967    Constraints on variables in syntax. Ph. D. diss., MIT.

Schütze, Carson T.
1996    The Empirical Base of Linguistics: Grammaticality Judgements and Linguistic Methodology. Chicago: University of Chicago Press.

Stechow, Arnim von, and Wolfgang Sternefeld
1989    Bausteine syntaktischen Wissens. Opladen: Westdeutscher Verlag.

Sternefeld, Wolfgang
2000    Grammatikalität und Sprachvermögen. Anmerkungen zum Induktionsproblem in der Syntax. In Von der Philologie zur Grammatiktheorie: Peter Suchsland zum 65. Geburtstag, Josef Bayer, and Christine Römer (eds.), 15–44. Tübingen: Niemeyer.

Sternefeld, Wolfgang, and Sam Featherston
2003    The German Reciprocal “einander” in Double Object Constructions. In Arbeiten zur Reflexivierung (Linguistische Arbeiten 481), Lutz Gunkel, Gereon Müller, and Gisela Zifonun (eds.), 239–266. Tübingen: Niemeyer.

Uszkoreit, Hans
1987    Word Order and Constituent Structure in German (CSLI Lecture Notes 8). Stanford, CA: CSLI Publications.

van der Feen, Marieka, Petra Hendriks, and John Hoeks
2006    Constraints in language processing: Do grammars count? In Proceedings of the COLING-ACL Workshop on Constraints and Language Processing (CSLP-06). Sydney: Association for Computational Linguistics.

Wasow, Thomas
2002    Postverbal Behavior. Stanford, CA: CSLI Publications.


Anomalies and exceptions

Hubert Haider

1. Anomalies and exceptions

Anomalies in the observed data patterns are usually construed as exceptions in the grammar of the data patterns. ‘Anomaly’ is a characterization of data properties in terms of ‘normalization’ expectations. ‘Exception’ is the reconstruction of an anomaly in terms of a rule with a restriction. Whether an anomaly is best characterized as the local effect of an exceptional rule or as a global system effect (the result of conflicting but otherwise exceptionless rules) is an empirical issue. Not every anomaly is an exception, though. It may be a mere processing difficulty (see below).

The conceptual difference between anomaly judgements sampled from performance data and exceptionality ascriptions to rules of grammar must not be obscured by equivocation. The former rests on a self-evaluation of the mental processing of a stimulus; the latter, namely the grammar ascription, is an attempt to model the grammar-related properties of the performance data. Note that the grammar-related aspects are only a subsystem of the complex cognitive computations whose composite output is the global acceptability judgement for a given stimulus.

Anomalies may be the result of processing difficulties in the absence of any exceptional trait in the grammar of the given construction. Strong garden path effects are the best examples:

(1) a. Man glaubt, dass Max Musiker vorgestellt bekamen.¹
       one believes that Max musicians introduced got.3PL

1. Scrambling of an object without overt case across the subject in the German get-passive variant produces strong deviance feelings in informants. Thanks to M. Schlesewsky for the datum.


    b. Das sind es.²
       this are it

The perceived anomaly of (1a) is the difficulty of identifying scrambling in the absence of overt case markings. If you replace Max by den.ACC unbekannten Max ‘the unknown Max’, and (optionally) Musiker by viele Musiker ‘many musicians’, the anomaly disappears. In (1b), both pronouns are singular, but the finite copula is marked for plural. This mismatch is perceived as an anomaly. Only when the informant is pushed to realize that the predicate es ‘it’ may as a predicate refer to a plural entity does the judgement change from deviant to perfect.

An exception is a restriction on the range of a rule to a subset of its possible range of application (2b). In (2b), x_n is exempted from the range of the universally quantified rule predicate P.

(2) a. ∀x P(x)
    b. ∀x ∈ (X − {x_n}) P(x)
       (exception: the range is the set X minus the member x_n)
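The schema in (2) can be stated concretely. The sketch below is illustrative only; the toy rule and the item set are invented for the purpose, not taken from the chapter. It checks a universally quantified predicate over a domain, optionally minus an exception set, as in (2b):

```python
# A rule as a universally quantified predicate (2a), and its exception-
# restricted variant (2b): the range is the domain minus the exempted items.

def holds_universally(predicate, domain, exceptions=frozenset()):
    """True iff predicate(x) holds for all x in (domain - exceptions)."""
    return all(predicate(x) for x in domain if x not in exceptions)

# Invented toy rule: an adverbial modifier precedes the item it modifies.
# In Germanic, 'genug' is the lexical exception (cf. (3) below).
precedes_modified_item = {"sehr": True, "ziemlich": True, "genug": False}

domain = set(precedes_modified_item)
rule = lambda m: precedes_modified_item[m]

print(holds_universally(rule, domain))             # False: falsified by 'genug'
print(holds_universally(rule, domain, {"genug"}))  # True: (2b) with x_n = 'genug'
```

The exempted member does not falsify the restricted rule; it is simply outside the rule's range of application.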

There is no denying that grammars are exceptional to a certain extent. Even though exceptions exist on the level of individual items (3), the more frequent case is a restriction on a subclass (4). Here is an example of a restriction on the level of an individual lexeme. In every Germanic language, the cognate of genug ‘enough’ is anomalous, since it follows rather than precedes the modified item:

(3) G. gross genug, E. big enough, D. groot genoeg …

An example of a subclass restriction is the restriction on the German ‘Ersatzinfinitiv’ construction,³ which is itself triggered by the exceptional avoidance of the participle form for verbs that select a bare infinitive (modal, perception, and causative verbs). In German, but not in Dutch, the construction is restricted to the finite clause. It is ungrammatical in infinitival clauses in German (4b), but not in Dutch (4c).

(4) a. dass er es hat essen wollen
       that he it has eat want

2. Thanks to M. Bierwisch for reporting this datum to me (source: E. Lang). Informants reject it at first, but they accept it immediately once you ask, for instance: “Is this a possible answer to: Are these really 47 envelopes?”

3. Descriptively speaking, the (finite) auxiliary that would trigger the participial form on the preceding verb is fronted, and the would-be participle appears as the bare infinitive form.


    b. *ohne es zu haben essen wollen
        without it to have eat want
    c. zonder het te hebben willen eten
       without it to have want eat

Binding by pronouns in German provides examples of an anomaly caused by inconsistent grammar requirements: on the one hand, an antecedent must c-command the bindee (5a);⁴ on the other hand, a dative pronoun must follow an accusative pronoun in German. So the German restriction on pronoun order rules out binding of an accusative reflexive by a dative pronoun, since the dative would have to precede (in order to fulfil binding) and would thereby violate the order restriction embodied in the grammar of German pronouns (5b). This leaves (5c) as the only option and rules out (5b), in accordance with Featherston’s experimental findings, reported in his Figure 1.⁵

(5) a. *Wir haben sich_i.ACC (selbst) ihnen_i.DAT überlassen.
        we have them.REFL (selves) to.them left
    b. *Wir haben ihnen_i.DAT sich_i.ACC (selbst) überlassen.
    c. Wir haben sie_i.ACC sich_i.DAT (selbst) überlassen.
       we have them to.them.REFL (selves) left

However, this account does not fully cover the dative-accusative anomaly with respect to binding in German. Although a dative may be a binder for an anaphor (6a), it appears to be disqualified if the bindee is a co-argument of the dative binder (6b). In addition, there is another anomaly involved: a dative not only fails to bind in these cases, it also interferes with binding between the subject and the object if it intervenes (6d).

(6) a. Er hat den Leuten_i.DAT über einander_i/sich_i erzählt / Biografien voneinander_i/sich_i gezeigt.
    b. *Er hat den Leuten_i.DAT einander_i/sich_i.ACC vorgestellt.
    c. Er hat die Leute_i.ACC einander_i/sich_i.DAT vorgestellt.

4. The only apparent exception is binding by the nominative subject: Hat sich_i.REFL jemand_i geirrt? But in this case there is an agreeing item, i.e. the finite verb, that c-commands (see Frey 1993).

5. Note that the subjects in the experiment seem to treat binding by objects as an undefined case, since they allow personal pronouns (Principle B) on a par with reflexives (Principle A) in this case (see Featherston’s Figure 1, ‘ndp’ and ‘nar’). This would follow if the core case of binding in German is binding between the subject and an object, while binding between objects is un-/ill-defined in the informants’ application of German grammar.


    d. ??Wir_i haben diesem Mann einander_i/uns_i vorgestellt.
    e. Wir_i haben einander_i/uns_i diesem Mann vorgestellt.

These data were not part of Featherston’s experiment, but they are crucial for the dative anomaly. In my view, the anomaly is not yet fully understood.⁶ We do not yet see clearly enough whether it involves a genuine exception or is just a result of conflicting interactions of otherwise unexceptional rules of binding.

2. Discreteness or gradience

Data judgements are necessarily gradient; grammars are discrete. Judgements are the composite, unconscious result of the interaction of various components (grammar, information structure, stylistic preferences, implicit comparison with potential paraphrase variants, ease of parsing and interpretation, anticipative judgements of the experimenter’s expectations, …). Even if each of these components produced a discrete evaluation value, the aggregate would be a gradient function.
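The arithmetic of this point can be illustrated with a toy calculation. The component names and weights below are invented for illustration; nothing here is a claim about the actual composition of judgements:

```python
# Each evaluative component delivers a discrete verdict (1 = pass, 0 = fail),
# yet the weighted aggregate acceptability score is gradient.
# Components and weights are invented for illustration.
WEIGHTS = {
    "grammar": 0.5,
    "information_structure": 0.2,
    "stylistic_preference": 0.15,
    "ease_of_parsing": 0.15,
}

def acceptability(verdicts):
    """Weighted sum of discrete 0/1 component verdicts -> value in [0, 1]."""
    return sum(WEIGHTS[c] * v for c, v in verdicts.items())

# A stimulus that is grammatical but hard to parse and stylistically marked
# receives an intermediate score, neither 'perfect' nor fully 'deviant':
print(acceptability({"grammar": 1, "information_structure": 1,
                     "stylistic_preference": 0, "ease_of_parsing": 0}))  # 0.7
```

Even with strictly binary components, the aggregate ranges over many intermediate values, which is the shape gradient judgement data take.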

Well-formedness as defined by human grammars is discrete, that is, a matter of yes or no. There is no such thing as ‘75% grammatical’. Crucially, well-formedness as a property determined by grammar must not be equated with the introspective attribution of the quality of well-formedness by informants. Their judgements are reports on introspection experiences, and these are surely not discrete. From a theoretical point of view, grammatical well-formedness is not adequately characterized as a matter of cumulated (relative) weights.⁷ The fact that informants asked for grammaticality judgements are at a loss in certain cases does not prove that grammaticality is gradient. Discreteness shows up in the majority of cases if ‘clear cases’ are tested. To make an issue testable is difficult,

6. Müller (1995, sect. 4.5) suggests that the dative is base-generated lower than the accusative and then raised, and therefore could not serve as a binder for the accusative. Immediate counterevidence for this claim is the fact that (i) Dutch does not allow object scrambling but has dative < accusative as the obligatory order, and (ii) a raised argument ought to be a possible binder (see: The men_i seem to each other_i to be incompetent).

7. Optimality Theory suggests discrete rules with a relative weighting in terms of their violability ranking. Until now, however, the proponents have not produced a universal theory of weighting. Without a UG of ranking, a weighted system is not learnable: if the perceived input deviates from the child’s interim grammar output, the necessary changes to the interim grammar involve intractable computations comparing alternative rankings.


of course. It is a matter of meticulous test design, including pilot studies, and it ultimately also requires a lot of experience, giftedness and even luck on the experimenters’ side.

There is no need to resort to experimentally testing the robustness of the contrast between (7a) and (7b). It is evident. The head of the attribute phrase must be adjacent to the head-initial phrase it modifies (see Haider 2004a on the ‘edge effect’). Second, an adverbial modifier precedes the modified element, with the exception of genug (see (3) above). Because of the adjacency restriction, (7b) is deviant. Some speakers resort to (7d), but they never use (7e). Is the difference between (7b), (7d) and (7e) a gradient one? Yes, it is in terms of acceptability, but it is not in terms of well-formedness: (7b) and (7d) are equally ungrammatical. Why is (7d) considered less deviant? It is felt to be the best solution in a no-win situation: if you stick to the order required by the idiosyncrasy of genug, you have to inflect it and thereby turn it into a kind of fake head of the attribute; otherwise it violates the adjacency requirement (7d). What remains is the violation of inflecting an uninflectable item that is not the head. In sum, gradience is often the result of dealing with a situation in which two requirements are in conflict.

(7) a. ein genügend deutliches Beispiel
       a sufficiently clear example
    b. *ein deutliches genug Beispiel
        a clear enough example
    c. *ein deutliches genügend Beispiel
    d. ??ein deutlich genuges Beispiel
    e. *ein deutlich genügendes Beispiel (*, in the reading of (7a))

It is an undeniable fact that the results gained in psycholinguistic experiments are heavily influenced by the design of the experiment. If the experimenter is in the lucky situation of being able to control most of the potentially intervening variables, the results will be close to the ideal of being representative of the issue under experimentation, but in most cases this is hard to achieve.

Featherston’s experiment on superiority is a good case. The experimental data document a preference for the patterns congruent with superiority in German as well as in English, but for German, the preference is characterized as not ‘categorical’ in the sense that “speakers would not choose to use structures violating” the superiority constraint. This characterization does not pay enough attention to intervening variables, however. The experiment tests only a subset of superiority cases, namely direct questions. Indirect questions (8c, d) would


have been the better choice. The contrast between English and German would become even clearer.

(8) a. It is unclear what belongs to whom.
    b. *It is unclear to whom what belongs.
    c. Es ist unklar, was wem gehört.
       it is unclear what whom.DAT belongs
    d. Es ist unklar, wem was gehört.
       it is unclear whom.DAT what belongs
    e. Wem gehört was?
       whom.DAT belongs what
    f. *To whom does what belong?

Direct questions presuppose a context, since they presuppose that a possible answer exists and that the answer is an appropriate choice of discourse participants for the questioned elements. Second, for this choice, the order of the wh-elements provides the sorting key. For instance, in (8e), the sorting key is wem ‘to whom’. So the elements of a subset of the set of possessors are mapped on elements of the set over which was ‘what’ ranges, namely the set of possessed elements. Indirect questions do not require an answer, so they do not need a discrete choice of potential discourse participants; hence the question of the sorting key is not relevant, and the information-structure effect of the order of wh-elements is less salient.

What the experimental data confirm is this: if an informant is presented with a sentence in isolation, (s)he implicitly embeds it in a potential discourse situation and judges the information structure. If you have to choose between (9a) and (9b), you will easily identify the order in (9a) as congruent with the base order. The preference for (9a) is a preference for the contextually unmarked information structure.

(9) a. Wer hat was zur Party mitgebracht?
       who has what to.the party brought
    b. Was hat wer zur Party mitgebracht?
       what has who to.the party brought

    c. Ich möchte gar nicht wissen, was wer zur Party mitgebracht hat.
    d. *I do not want to know what who has brought along to the party.

What a simple comparison between English and German fails to honour is a cross-linguistic generalization. Superiority does not only hold for English; it holds for any VO language (Haider 2004b), but not for OV languages. The


absence of superiority is not just a property of German; it is a property of OV languages in general, cf., for instance, Japanese (10a, b).

(10) a. Nani o dare ga katta no? [Japanese]
        what-obj who-sub bought Q-PRT
        ‘What did who buy?’
     b. Dare ga naze kita no?
        who-sub why came Q-PRT
        ‘*Who came why?’
     c. Wer hat wie/weshalb/wann/wo protestiert?
        who has how/why/when/where protested
     d. *Who protested when/where/*why/*how?

The reason is this (see Haider 2004a): in VO, the VP-internal subject is preverbal and therefore not in the directionality domain of the verb. In OV, any argument, and in particular the subject too, is in the directionality domain of the verbal head. So the subject in VO, but not in OV, needs a functional head as a directional licenser. This is the grammar-theoretic source of the existence of obligatory functional subject positions in VO and of the peculiar behaviour of an in-situ wh-subject (see Haider 2004b, 2005, for details).

The that-trace effect is just a facet of this phenomenon, but an ill-understood one. First, in English, the effect is absent if an adverbial intervenes (Browning 1996).⁸ Second, neither German nor any other OV language punishes that-trace structures. Third, languages differ with respect to the general transparency of C-introduced clauses. In German, speakers of northern varieties object to any wh-extraction out of dass-clauses, whereas southerners are free extractors. This was documented already by Paul (1919: 321f.). He devoted a subsection of his German grammar to long-distance wh-dependencies and referred to them justly as ‘Satzverschlingung’ (sentence intertwining). In his collection, he documents plenty of cases of that-t violations in interrogative and relative clauses. Andersson and Kvam (1984) tested extractions out of that-clauses in various locations in Germany and not only showed a contrast between extractors and non-extractors, but also a kind of adaptation effect for non-extractors who tolerate extractions by others.

So, that-t violations need to be checked as carefully as Torris (1984) did in her dissertation. She showed that systematic extractors for wh-constructions are systematic extractors for the other cases (comparatives, relative clauses, long

8. Here is an example:
   (i) Who do you think that *(under these circumstances) would disagree?


distance topicalization), too. On the other hand, there are extraction admitters, and they show an unsystematic behaviour: they tolerate extractions by others, but long-distance extraction out of dass-clauses is not part of their grammar. This is reminiscent of Andersen’s (1973) concept of ‘via-rules’ for communities in which the common vernacular language is fed by microparametrically contrasting grammars.

In sum, subject extraction out of a C-introduced finite clause is, on the one hand, subject to constraints on extraction out of C-introduced finite clauses in general; on the other hand, it is restricted by constraints on traces in functional spec positions. The former restriction applies to variants of German; the latter one does not.

Just like experimental data, corpora do not speak for themselves. They require evaluation and interpretation. We have learnt that water freezes at 0 °C. If you want to test this, you will find out that the results are gradient, and you may be puzzled by exceptions and learn about the influence of intervening third factors, like impurities, pressure fluctuations, etc. Corpora are performance records, and performance is unavoidably prone to influences of third factors, including simple mistakes and imperfections. A strict methodology of corpus evaluation is still wanting. Nevertheless, corpora may serve heuristic purposes. If something is a robust phenomenon, a big enough corpus will reflect this. More subtle relations are hard to assess immediately by mere corpus inspection.

3. Methods and theories

The ideal grammar is exceptionless (Featherston, sect. 1). But grammars of human languages are not ideal. Unlike Platonic objects (e.g. logical calculi), they are biologically grounded cognitive systems for a culturally formed dynamic behaviour, namely human languages. What we perceive as exceptions are compromises in the fine-tuning of a complex modular system. Some of them are externally geared (diachronic relics), some of them internally (inconsistent rule demands).

From the methodological point of view, it is a crucial question whether a perceived anomaly is the reflex of an exceptional trait of the system or just apparent. It is apparent if what we perceive as an exception is the result of inadequately modelling a phenomenon that would turn out to be regular in the wider context of an adequate account. This is exception by error or, in other words, a scientist’s deficiency. A main business of science is to remove such ‘exceptions’ by testing and changing its theories.


How can we distinguish between ‘real’ exceptions and apparent exceptions? Featherston’s claims rest on the interpretation of the results of a refined method of gaining introspection data from informants (‘magnitude estimation’). This method, in my opinion, is not reliable and valid enough to call into question generalizations arrived at by the systematic comparative study of grammar by the scientific community of experts.

First, naive introspection is notoriously erratic (across individuals, and across categories for a single individual), except for the most robust contrasts. Second, an informant’s judgement is an aggregate of all the factors that influence a ‘this is how I myself would (not) say it’ judgement. So, third, informants’ judgements would have to be accompanied by a protocol of what the informants report as the crucial traits of the stimulus on which they based their judgement. Fourth, the patterns of reaction have to be tested for consistency (across the categories under examination) and for retest stability. Fifth, the minimal battery of statistical analysis tools needs to be employed in order to guarantee that the sample is representative, that the correlations are significant, that the results are solid enough to stand up against the null hypothesis, and so on. If linguists apply psychological or sociological methods, they are bound to comply with the methodological standards developed for these methods.

In sum, in the face of the results and interpretations of Featherston’s investigations, I do not feel compelled to give up my conviction that grammars determine discrete characteristic functions (‘well-formed’ vs. ‘ill-formed’) for linguistic expressions. What appears to be gradient is not the grammar but the reactions of the test subjects. Models that employ weighted rules (‘violation costs’) necessarily obscure this important difference: discrete systems produce gradient outputs if the output is mediated by additional interacting systems. This is the case for human languages. Grammar theory models a cognitive capacity for a discrete symbol-management algorithm. Grammar theory does not model the cognitive architecture of language production and perception. That is the realm of processing theories. Performance data bear only in a highly indirect way on competence issues.

References

Andersen, Henning
1973 Abductive and deductive change. Language 49: 765–793.

Andersson, Sven-Gunnar, and Sigmund Kvam
1984 Satzverschränkung im heutigen Deutsch. Tübingen: Narr.

Browning, Margaret A.
1996 CP-recursion and ‘that-t’-effects. Linguistic Inquiry 27: 237–255.

Frey, Werner
1993 Syntaktische Bedingungen für die semantische Interpretation (Studia Grammatica 35). Berlin: Akademie Verlag.

Haider, Hubert
2004a Pre- and postverbal adverbials in VO and OV. Lingua 114: 779–807.

Haider, Hubert
2004b The superiority conspiracy. In The Minimal Link Condition, Arthur Stepanov, Gisbert Fanselow and Ralf Vogel (eds.), 147–175. Berlin: Mouton de Gruyter.

Haider, Hubert
2005 How to turn German into Icelandic – and derive the VO-OV contrasts. The Journal of Comparative Germanic Linguistics 8: 1–53.

Paul, Hermann
1919 Deutsche Grammatik. Vol. III, Part IV: Syntax. Halle an der Saale: Niemeyer.

Torris, Thérèse
1984 Configurations syntaxiques et dépendances discontinues en allemand contemporain. Ph.D. diss., Université de Paris VIII-Vincennes.


Distinguishing lexical and syntactic exceptions

Sam Featherston

A quality that I value in Hubert Haider’s work is the consistency he shows where others seek refuge in fuzziness. It was Haider who correctly stated that, in a binary model of grammar, a single counter-example falsifies a rule (see the ‘exceptionlessness’ quote in my paper in this volume). This fundamental fact is far too often finessed round, and it is particularly important in the context of a discussion of the role of exceptions in the grammar. If it is empirically correct that grammatical rules apply exceptionlessly, then this reveals important information about the nature of the grammatical system. If it does not hold, when all the irrelevant factors which Haider correctly mentions are controlled for, then this finding too has important implications.

I agree with most of what Haider says in his commentary: the criteria he applies, the distinctions he makes, and the type of data he considers relevant. He calls for appropriate methodological standards, for careful distinctions between the various factors which can influence introspective judgements, and for the careful selection of the syntactic conditions in syntactic studies like our work on binding and superiority. All of these are concerns which we entirely share.

There are, however, one or two points which would divide us, and it is these which I shall discuss here. Haider’s distinction between an anomaly (an irregularity in the data) and an exception (a restriction on the applicability of a rule) is useful. I should like to see this extended further, however, so as to differentiate between lexical effects and rule-based effects. In my view, the lexicon is the location of all effects which are related to or restricted to specific lexical items; only patterning which is independent of lexis need be included in the rule system. The rule system, I would argue, should in the ideal case be exceptionless; lexical effects, on the other hand, need not be. There are naturally patterns in lexically-driven behaviour, but the existence of exceptions to these is in no way problematic for our conception of the linguistic system. If the lexicon is learnt, then the patterns we find in it are mere association-based generalizations, not rules. Lexicon-based exceptions are thus to be expected and rather unexciting.


I would therefore hesitate before attributing any great importance to the position of genug (‘enough’) after a modified adjective (Haider’s example (3)), since we would assume this to be lexical. There are enough other examples of similar behaviour for this to be fairly clear (e.g. Engl. ago, Germ. entlang ‘along’). Potentially more interesting cases of linear ordering are those where structures can optionally appear before or after heads, independent of lexis, such as complements to adjectives in German (stolz auf seine Kinder, auf seine Kinder stolz ‘proud of his children’), especially when this behaviour is not even marginally possible in closely related languages (Engl. *of his children proud). A thorough investigation of this phenomenon could reveal insights into head-complement order, I suspect.

I would also take issue with the status of Haider’s example (1b), my (1), as acceptable in German.

(1) A: Sind es wirklich 47 Umschläge?
       are it really 47 envelopes
       ‘Is that really 47 envelopes?’
    B: Das sind es.
       that are it
       ‘That it is.’

Here I would argue that the effect is not just a garden path, as Haider suggests, but a lexical exception, or rather perhaps two. The use of expletive es (‘it’) in presentational structures with apparent plural verb agreement, if not reference (Es sind derer zwei ‘There are two of them’), is certainly exceptional, but it is not systematic, for the exception is specific to this lexical item.¹ The particular example Haider advances is yet more exceptional, however, because these copula structures with es normally contain a plural which can license, in whatever way, the plural verb form (Es kommen jede Woche 47 Studenten ‘There come each week 47 students’). Das sind es has no overt plural NP, since the pro-form das is not marked for plural. This structure is additionally limited to sein (‘to be’), or perhaps just to copula verbs; see (2) and (3).

(2) Wenn ich zehn hinzufüge? Werden es 47 Umschläge?
    if I ten add become it 47 envelopes
    ??Das werden es.
      that become it

1. The pronominal das seems to allow this more restrictedly too.


(3) Es kommen jede Woche 47 Studenten zu deiner Sprechstunde?
    it come each week 47 students to your office.hour
    *Das kommen es.
     that come it

The main aim of my paper was to show linguists shy of gradience that gradience can be good news for syntactic theory. A more empirically adequate model of grammar is at the same time more descriptively adequate but also more explanatorily adequate, because it permits us to reduce the number of exceptions within the rule system, which means that the grammar can be more general, more learnable and/or more universal. Haider and I are largely in agreement about most of the terms of this important debate, but I have argued here that syntactic and lexical restrictions may have very different qualities.


Disagreement, variation, markedness, and other apparent exceptions

Ralf Vogel

Abstract. With the example of case conflicts in German free relative constructions, I discuss three problems which might count as grammatical exceptions: constraint violation, systematic variation within a speech community, and contradicting evidence from different empirical domains. The systematic nature of these phenomena calls for a more complex conception of grammar and language in linguistic theory. What is called into question are some of the idealisations that figure frequently in linguistic work.

1. Introduction

Whether a particular linguistic observation is classified as an exception depends on what we assume to be the rule. Background assumptions about the nature of grammars and languages guide this classification. A perhaps naive or “folk” linguistic characterisation of languages and grammars might include the following statements:

– Grammatical rules and constraints are obeyed by all expressions in a language. Otherwise they would not be rules and constraints of that language.

– If two speakers speak the same language, they speak the same language in every respect. Otherwise they would not speak the same language.

– Linguists have an objective elicitation method at hand which allows them to figure out exactly which expressions are well-formed in a language, and which are not.

It is common wisdom among linguists that none of these three assumptions can seriously be upheld. But I have the impression that the consequences of this insight are farther-reaching than linguists are usually willing to admit. If we give up the idea that there is a clear-cut boundary between well-formed and ill-formed, for instance, our notion of language changes: there are better and worse examples of expressions of a language, and an expression might be seen as belonging to that language only to a particular degree.


A grammatical model that accounts for this has to embody the means to deal with gradient well-formedness. Exceptions to empirical generalisations have a different status in such a grammar. If expressions are seen to be ‘good’ members of a language to varying degrees, then in fact every expression is an exception to a certain degree.

Such a line of reasoning has quite a long research tradition within linguistic theory, one which mostly revolves around the notion of markedness. An important insight from markedness theory is the idea that a linguistic generalisation, stated in the form of a grammatical constraint or rule, is not falsified by counterexamples. Rather, constraints and rules are seen as violable tendencies, or so-called soft constraints. A very influential recent development that emerged from this line of grammatical thinking is Optimality Theory (its founding document is Prince and Smolensky 2004, which had been circulated as a manuscript from 1993 on).

The violability of constraints is a core assumption of Optimality Theory (OT). In this model, constraints frequently come into conflicts which are resolved by prioritisation. Every expression that violates a markedness constraint can be seen as an exception to the linguistic generalisation that motivates the constraint. So exceptions are expected to occur quite frequently, and in general they are easier to accommodate in OT than in a theory that assumes inviolable constraints, where every exception would at the same time be a counterexample. An exception, i.e. the violation of a constraint, occurs in order to fulfil another, more important grammatical constraint.
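The evaluation mechanism just described can be sketched in a few lines. The constraint names and candidates below are invented placeholders loosely modelled on a case conflict of the kind discussed in this paper, not Vogel's actual OT analysis:

```python
# Minimal sketch of OT evaluation: violable constraints, with conflicts
# resolved by strict ranking (lexicographic comparison of violation profiles).
def optimal(candidates, ranking):
    """Return the candidate whose violation profile is best under the ranking."""
    return min(candidates, key=lambda c: tuple(con(c) for con in ranking))

# Invented constraints for a case conflict: one verb demands dative, the
# other accusative; each constraint returns a violation count for a candidate.
def faith_embedded(c):   # violated if the embedded verb's dative is not realised
    return 0 if c["case"] == "dat" else 1

def faith_matrix(c):     # violated if the matrix verb's accusative is not realised
    return 0 if c["case"] == "acc" else 1

candidates = [{"form": "wem", "case": "dat"}, {"form": "wen", "case": "acc"}]

print(optimal(candidates, [faith_embedded, faith_matrix])["form"])  # wem
print(optimal(candidates, [faith_matrix, faith_embedded])["form"])  # wen
```

Both candidates violate one constraint; reversing the ranking reverses the winner. The losing constraint's violation is exactly the kind of licensed "exception" described above: it is incurred in order to satisfy the higher-ranked constraint.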

Not only can our conception of the grammar be liberalised in order to deal with exceptions, but also our views of what a language is. The study of the diachronic development of language, together with sociolinguistic and dialectological research, has led to a conception of languages as constantly changing continua with unsharp boundaries. Nevertheless, particular stages in the historical development or particular dialectal varieties are often described as if their speakers' linguistic behaviour were uniform. This picture might be too idealised.

That corpus frequencies and psycholinguistic experiments mirror grammar only in a distorted way was an important argument in early generative grammar for discarding this kind of work as irrelevant for grammatical theory. But no alternative elicitation method has been developed within that framework. Generative linguistics, especially in syntax, often seems to presuppose that we already know which expressions are well-formed in a language. Very often, this is not the case. The development of solid elicitation methods continues to be a central task in linguistic research.

Disagreement, variation, markedness, and other apparent exceptions

The studies that I want to present in this paper touch on each of these issues. The morpho-syntactic phenomenon that we will explore is case conflicts in German free relative constructions (FRs). This construction is one of the rare cases where conflicting constraints can really be observed, as I will briefly sketch in Section 2. The wh-pronoun in (1) is sensitive to the case requirements of both the matrix verb besuchen (here: accusative) and the verb inside the FR, vertrauen (here: dative). The pronoun can only realise one case morphologically.

(1)  Ich  besuche  wem       ich  vertraue.
     I    visit    [who-DAT  I    trust]-ACC
     'I visit who I trust.'

This conflict situation leads to ungrammaticality in far fewer cases than one might expect. However, we also observe disagreement among both linguists and native speakers about these structures, which nevertheless follows a systematic pattern, as I will show in Section 3. The generalisation about this systematic pattern can be phrased in terms of tolerance of markedness. It would get lost if German were particularised into several unconnected sub-variants, each with its own exceptionless grammar. We gathered empirical evidence with psycholinguistic experiments and corpus studies (see Sections 4 and 6) which largely confirms the Optimality Theoretic analysis that I will present below (see Section 5). But we also found counterevidence in one interesting case pattern. Perfectly well-formed expressions can be quite rare if they contain redundant material. This sheds some light on how representative corpus data are for the grammar of a language.

2. Case realisation as violable grammatical constraint

The morphological non-realisation of an assigned case on a noun phrase usually leads to ungrammaticality, as in the German examples in (2), where nominative is required on the subject:1

(2)  a.   Der  Schiedsrichter  hat  gepfiffen.
          the  referee-NOM     has  whistled
     b.  *Den  Schiedsrichter  hat  gepfiffen.
          the  referee-ACC     has  whistled
     c.  *Dem  Schiedsrichter  hat  gepfiffen.
          the  referee-DAT     has  whistled

1. I will follow here the standard assumption that nominative case is assigned to the subject by the finite verb – here, the auxiliary.


Not all instances of non-realisation of case lead to ungrammaticality. It can happen that a noun is confronted with two conflicting case requirements. An example is the relative pronoun of argument free relative clauses (FRs), as in the following examples from Modern Greek:

(3)  a.  Agapo     opjon/*opjos      me  agapa.
         love-1Sg  whoever-ACC/*NOM  me  loves
         'I love whoever loves me.'
     b.  Opjon/opjos      piaso      tha  timorithi.
         whoever-ACC/NOM  catch-1Sg  FUT  be punished-3Pl
         'Whoever I catch will be punished.'
         (Alexiadou and Varlokosta 1995)

In (3a) we see that the FR pronoun realises the case assigned by the matrix verb ("m-case", here: accusative) and suppresses the one assigned by the relative-clause-internal verb ("r-case", here: nominative). If the FR pronoun has the chance of retaining r-case, it can nevertheless do so, as shown in (3b). The two options for (3b) result from the fact that Modern Greek is a pro-drop language: (3b) can be given two different syntactic analyses, where only one of them actually displays a case conflict:

(4)  a.  [IP [FR … ] [IP pro ] … ] = FR is left dislocated, an empty
         pronoun is the subject; no case conflict, r-case required
     b.  [IP [FR … ] … ] = FR is the subject; case conflict, m-case
         required

When the FR is in the syntactic position of the subject, it is assigned nominative case by the finite verb. This case surfaces on the FR pronoun, yielding opjos in (3b). When the FR is left dislocated, an empty pronoun serves as subject and is assigned nominative. Now the FR pronoun is free to realise r-case, and so it does, yielding opjon in (3b).

These examples show, on the one hand, that the morphological realisation of accusative case is a constraint of Greek grammar. On the other hand, this constraint is obviously violable; otherwise, case conflicts would not be tolerated at all.

As already explained in the first section of this chapter, constraint violation is a kind of exception that is expected under an Optimality Theoretic perspective on grammar.2 It is the result of a situation where different constraints come into conflict, and only one of them can be fulfilled by an expression. The conflict is resolved by giving one of the constraints higher priority.

2. For representative work in OT syntax, see, for instance, Legendre et al. (2001). I developed an OT analysis of FR constructions in Vogel (2001, 2002, 2003).


That the FR pronoun in German, which obligatorily realises r-case, is sensitive to both case requirements has already been reported by Pittner (1991). According to her, (5c) and (5d) are ungrammatical:

(5)  a.   Ich  lade ein      wen      ich  treffe.
          I    invite[acc]   who-ACC  I    meet
     b.   Ich  lade ein      wem      ich  begegne.
          I    invite[acc]   who-DAT  I    meet
     c.  *Ich  lade ein      wer      mir     begegnet.
          I    invite[acc]   who-NOM  me-DAT  meets
     d.  *Ich  helfe      wen      ich  treffe.
          I    help[dat]   who-ACC  I    meet

The crucial difference between (5b) and (5c) does not lie in the fact that the case assigned by the matrix verb, accusative, is left unrealised. What matters is in favour of which case it remains unrealised: accusative may be suppressed in favour of dative, but not in favour of nominative.

In general, a FR with a case conflict is well-formed, according to Pittner, only if the suppressed case is not higher on the following case hierarchy than the case that is actually realised:

(6)  The German case hierarchy:
     NOM < ACC < OBLIQUE (DAT, GEN, PP)

This hierarchy goes hand in hand with further distinctions among the case forms. Nominative is assumed to be the default case; nominative and accusative are structural cases, while dative, genitive and PPs are usually assumed to be oblique case forms. While the dative can be argued to make a certain semantic contribution to the meaning of the clause it occurs in, and seems to be limited to thematic roles of a certain kind, no such restrictions can be proposed for nominative and accusative.
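Pittner's generalisation can be stated compactly as a predicate over this hierarchy. The following sketch is my own illustrative encoding, not part of the chapter (the function and variable names are invented); it assumes that the oblique cases share a single rank:

```python
# Illustrative encoding of the German case hierarchy (6):
# NOM < ACC < OBLIQUE (DAT, GEN, PP); obliques share one rank.
CASE_RANK = {"NOM": 0, "ACC": 1, "DAT": 2, "GEN": 2, "PP": 2}

def fr_acceptable(realised: str, suppressed: str) -> bool:
    """Sketch of Pittner's generalisation: a free relative with a case
    conflict is well-formed only if the suppressed case is not higher
    on the hierarchy than the case that surfaces on the wh-pronoun."""
    return CASE_RANK[suppressed] <= CASE_RANK[realised]

print(fr_acceptable("DAT", "ACC"))  # (5b): True
print(fr_acceptable("NOM", "ACC"))  # (5c): False
print(fr_acceptable("ACC", "DAT"))  # (5d): False
```

On this encoding, the judgements for (5b)-(5d) come out as reported by Pittner.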

3. Variation – among linguists

The judgements given by Pittner are not shared by all linguists. In the view of Groos and van Riemsdijk (1981), structures like (5b) are already ill-formed, in addition to (5c) and (5d). But in Vogel (2001), I nevertheless observe that for many, though not all, speakers, only (5d) is bad, while the other three clauses in (5) are acceptable.

There is, thus, an obvious disagreement among linguists about the facts. In Vogel (2001), I deal with this disagreement in terms of variation. German might have three "variants", German A, B, and C, which differ in their "tolerance" of case conflicts. The four types of FRs in (5) can be distinguished by how "serious" the case conflict is. They differ in their relative markedness, which can be described as in (7).

(7)  a.  The two case requirements match; no case conflict.
     b.  Case conflict where the hierarchically lower case – here:
         accusative – is suppressed.
     c.  Case conflict where the hierarchically higher case – a
         structural case, here: accusative – is suppressed.
     d.  Case conflict where the hierarchically higher case – an
         oblique case, here: dative – is suppressed.

No dialectal or sociolectal factor could be identified for the three "variants" German A, B, and C. The best interpretation of this observation is perhaps statistical: it might be an unrealistic expectation that a speech community is homogeneous in every grammatical respect. But still, the variation follows a characteristic pattern which itself can be described in grammatical terms, as sketched in (7). Furthermore, it might be the case that German speakers agree on the relative acceptability of the structures. The "variants" only differ from each other in their tolerance of markedness. But what counts as marked seems to be determined in the same way in all variants.

Under a categorical view of grammar, two of the three variants of German must be seen as exceptions to the third variant, which would have to be identified as the 'norm'. But the decision which variant represents the norm can only be arbitrary. Alternatively, and this is the point of view that I prefer, the observed variation, which follows a systematic pattern of markedness, could be seen as the "norm", the "reality" of German grammar.

If we rely on absolute grammaticality, we deal with three ununifiable 'dialects'. But if we use relative acceptability as our empirical base, we might only have one single language. The three apparent variants, empirical 'exceptions' to one another, might result from the use of the wrong empirical base in grammar modelling, namely absolute, rather than relative, acceptability. Whether an observation counts as exceptional or regular is sometimes only a matter of theoretical background assumptions.

In the next section, I will briefly present the results of elicitation experiments on case conflicts in German FR constructions that I undertook in collaboration with Stefan Frisch and Jutta Boethke at the University of Potsdam (see Boethke 2005), in order to get a clearer picture of the empirical side of the phenomenon.


4. Empirical exploration

We carried out three speeded acceptability judgement experiments, exploring different case conflicts in FR constructions: nominative versus dative (experiment 1), nominative versus accusative (experiment 2), and accusative versus dative (experiment 3).

The participants were 24 (different) students in each of the experiments. Stimulus sentences were presented wordwise on a computer screen, in randomised order, mixed with test sentences from three other experiments, which served as distractor items. After the sentence was finished, subjects had to press one of two buttons for (non-)acceptance. Two of the experiments were carried out by Jutta Boethke as part of her diploma thesis (Boethke 2005).

Each experiment contained eight conditions. Among them were four sentences with FRs in the four possible case patterns. In experiment 1, all FRs were clause-initial, and each of the four sentences was paired with a correlative variant which avoids the case conflict with an additional resumptive d-pronoun. In experiments 2 and 3, the FRs were tested in clause-initial and in clause-final position.

4.1. Experiment 1 – nominative versus dative

The first experiment dealt with the conflict between nominative and dative. The FRs were clause-initial in each of the eight test conditions. The four logically possible case patterns appeared in two versions, one with (correlative) and one without (FR) a resumptive d-pronoun following the FR. Each participant saw eight items of each of the eight conditions. Lexical variation between the blocks and the item sets that the participants saw ensured that there was no confounding by the lexical material. The test conditions were constructed as in (8).

(8)  a.  Wer      uns     hilft,  (der)           wird  uns     vertrauen.
         who-NOM  us-DAT  helps   (that-one)-NOM  will  us-DAT  trust
     b.  Wem      wir     helfen,  (dem)           werden  wir     vertrauen.
         who-DAT  we-NOM  help     (that-one-DAT)  will    we-NOM  trust
     c.  Wem      wir     helfen,  (der)           wird  uns     vertrauen.
         who-DAT  we-NOM  help     (that-one-NOM)  will  us-DAT  trust
     d.  Wer      uns     hilft,  (dem)           werden  wir     vertrauen.
         who-NOM  us-DAT  helps   (that-one-DAT)  will    we-NOM  trust

Subjects were presented each sentence wordwise on a computer screen and were finally prompted to press one of two buttons for (non-)acceptability. We recorded both the judgements and the reaction times. I will only present the judgement data here.

Table 1 displays the results of the four conditions without resumptive pronoun.3 In the statistical analysis, we found that matching FRs are significantly more likely to be accepted than non-matching FRs. Furthermore, suppression of nominative is significantly more likely to be accepted than suppression of dative, and finally, clause-initial matching nominative FRs are more likely to be accepted than clause-initial matching dative FRs. This latter observation is presumably due to an additional factor which was not controlled for in this experiment. The clause-initial position is the default position for subjects, but not for objects. Matching dative FRs are objects of their matrix clause, and so should occur in object position. For subordinate object clauses, this is the clause-final position. We controlled for this factor in the second and third experiments.

Table 1. Experiment 1, FRs only, average acceptance in %.

    m-case   r-case   % accepted
    nom      NOM      86.98
    dat      DAT      70.83
    nom      DAT      61.98
    dat      NOM      16.67

3. Here and throughout, I abbreviate case patterns in the form "case1-CASE2", where the first case, in lowercase letters, is the suppressed m-case, while the case in uppercase letters is the case that surfaces on the wh-pronoun, r-case.


The results confirm the picture that has been drawn in the literature. FRs without a case conflict have a higher probability of being accepted than FRs with a conflict, and among the conflicting FRs, suppression of nominative is more often accepted than suppression of dative.

4.2. Experiment 2 – nominative versus accusative

The second experiment again contained eight test conditions, but this time each of the four case patterns varied with the FR in initial and final position. There were no correlative structures. (9) illustrates this with the two case patterns nom-NOM and nom-ACC. In addition to these conditions, we constructed parallel items with the patterns acc-ACC and acc-NOM.

(9)  a.  Wer       uns     vermisst,    wird  uns     suchen.
         [who-NOM  us-ACC  misses]-nom  will  us-ACC  search
         'Who misses us will look for us.'
     b.  Uns     wird  suchen,  wer       uns     vermisst.
         us-ACC  will  search   [who-NOM  us-ACC  misses]-nom
     c.  Wer       uns     vermisst,    werden  wir     suchen.
         [who-NOM  us-ACC  misses]-acc  will    we-NOM  search
     d.  Wir     werden  suchen,  wer       uns     vermisst.
         we-NOM  will    search   [who-NOM  us-ACC  misses]-acc

Apart from the different construction of the test items, the experiment had the same design as experiment 1. The acceptability results are displayed in Table 2.

Table 2. Experiment 2, average acceptance in %.

    m-case   r-case   initial   final   total
    nom      NOM      94.27     81.25   87.76
    acc      ACC      83.85     91.15   87.50
    nom      ACC      69.27     78.65   73.96
    acc      NOM      49.48     70.83   60.16

The results again show that matching FRs are more likely to be judged acceptable. Interestingly, our suspicion that the syntactic position is a relevant factor could be confirmed: matching nominative FRs are significantly better in initial position, while matching accusative FRs are significantly better in final position.

The case hierarchy could also be shown to play a role. Suppression of nominative is significantly more likely to be judged acceptable than suppression of accusative. Interestingly, non-matching FRs are significantly more likely to be judged acceptable in final position, no matter which grammatical function they serve in the matrix clause. Even a non-matching FR that serves as subject of the matrix clause has a higher probability of acceptance if it occurs in final position. It seems that the initial position is only advantageous if the case of the wh-pronoun does not provide contradictory information.

In a statistical analysis across the two experiments, we found a significant contrast between initial FRs where accusative is suppressed (49 % in Exp. 2) and initial FRs where dative is suppressed (17 % in Exp. 1). This result, together with the previous findings, confirms the proposal by Pittner (1991) that the case hierarchy 'NOM < ACC < OBLIQUE (DAT, GEN, PP)' is a crucial factor for the acceptability of FRs with case conflicts.

4.3. Experiment 3 – accusative versus dative

The third experiment dealt with the two object cases, accusative and dative. The design was the same as in experiment 2: eight conditions, namely FRs in the four logically possible case patterns in clause-initial and clause-final position. Examples for the test conditions with the dat-DAT and dat-ACC patterns are given in (10).

(10)  a.  Ich  helfe  wem       ich  vertraue.
          I    help   [who-DAT  I    trust]-dat
          'I help whom I trust.'
      b.  Wem       ich  vertraue    helfe  ich.
          [who-DAT  I    trust]-dat  help   I
      c.  Ich  besuche  wem       ich  vertraue.
          I    visit    [who-DAT  I    trust]-acc
      d.  Wem       ich  vertraue    besuche  ich.
          [who-DAT  I    trust]-acc  visit    I

      (plus four conditions with the acc-ACC and dat-ACC patterns)

The default position for both accusative and dative object clauses is the clause-final position. We therefore leave out the results for clause-initial FRs here. The results for the clause-final FRs are given in Table 3.

This experiment replicates the results of our earlier pilot study (Vogel and Frisch 2003) on the same case pattern. Matching FRs are at the same high level of acceptability, which is significantly higher than for non-matching FRs, where suppression of accusative has higher acceptability than suppression of dative.4

4. The non-significant contrast between matching accusative and matching dative FRs might be an effect of the overall lower frequency of dative in German.


Table 3. Experiment 3, average acceptance in %.

    m-case   r-case   % accepted
    acc      ACC      91.67
    dat      DAT      86.46
    acc      DAT      72.40
    dat      ACC      54.17

This result also very clearly confirms the two factors that have been identified as crucial for the acceptability of FRs: having no case conflict is better than having one, and suppressing the less important case is better than suppressing the more important one.

The still quite high acceptability rate for the worst structure in this experiment might cause some worries. If suppression of dative counts as ungrammatical in German, why, then, have such structures been accepted at a rate of 54 %? Can we take the percentages that we get in such experiments at face value? Certainly not. An experiment creates quite special laboratory conditions which influence linguistic intuitions in many ways. Because such factors usually influence all our experimental conditions to the same degree, this does not affect the relation between the conditions. But we cannot conclude from the experiment's results that such structures are grammatical or ungrammatical, not even that they have gradient acceptability of a certain degree. All we can trust is the fact that there are statistically significant contrasts given our experimental conditions.

A second, sociolinguistic factor is that most subjects came from the Brandenburg area, where the local dialect mixes up the two object cases dative and accusative. Hence, we expect a certain amount of uncertainty and confusion when subjects from this area deal with a case conflict between dative and accusative. To clarify this, the experiment will have to be replicated in a different area of Germany.

4.4. General results of the experiments

On the basis of these results, we can identify four types of case patterns for FRs. These four types differ significantly in their relative probability of being accepted by native speakers. We thus get an acceptability hierarchy of case patterns. The three observed "variants" of German can be classified along this hierarchy. As illustrated in (11), each "variant" correlates with one of the markedness contrasts that showed up as statistically significant in our experiments.


(11)  Markedness hierarchy of German FR constructions:

      (i)    Matching FRs. (acceptable in German A, B, C)
      (ii)   Non-matching FRs where the suppressed case is lower on
             the case hierarchy. (acceptable in German A, B)
      (iii)  Non-matching FRs where the suppressed case is higher on
             the case hierarchy, but not an oblique case (i.e., the
             acc-NOM pattern). (acceptable in German A)
      (iv)   Non-matching FRs where an oblique case is suppressed.
             (not acceptable in German, according to the common view)

Assume that our statistical findings correctly characterise the German facts. Then one prediction would be that in an arbitrarily selected group of informants, more of them would accept FRs with the nom-ACC pattern than FRs with the acc-NOM pattern. If only absolute acceptability were elicited in such a population, with only one item per case pattern, then this elicitation would reproduce the three variants German A, B, and C. Thus, the finding that there are three variants is in fact compatible with our statistical findings. If there were three variants, and if they were characterised as given in the literature, then, under the assumption that each variant shows up equally in a given population, the results should come out as they did in our experiment.

There is, however, another important observation. Each test condition was elicited eight times with every participant of our experiments. It was not the case that participants consistently judged all items of one condition alike. Rather, the structures with intermediate acceptability status were also accepted to an intermediate degree by many participants; for instance, we got three rejections and five acceptances from one and the same participant. In statistical terms, the population had a normal distribution – which is, in fact, a prerequisite for the application of the statistical tests we used, analyses of variance (ANOVA).

If one wants to make this finding compatible with the idea that there are three variants of German, one would have to postulate that individual speakers constantly shift between the three variants. Consider, however, that speakers had no way to express an intermediate acceptability status of structures. They only had two buttons, for acceptance and rejection. The four-point scale in (11) has to be mapped onto the two-point scale of our experiment. There are three possible options for this mapping, and these correspond exactly to our three variants. Are there three variants, or is there only one variant with differing probabilities of acceptance? Given the above reasoning, both characterisations are equally acceptable. They are both valid descriptions of the same underlying grammar.
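This mapping argument can be made concrete. In the sketch below (my own illustrative encoding, not from the chapter), the four levels of the markedness hierarchy in (11) are collapsed onto a binary judgement by choosing an acceptance threshold; the three possible thresholds reproduce exactly the three "variants":

```python
# The four case-pattern types of (11), ordered from least to most marked.
LEVELS = ["matching", "lower-suppressed", "acc-NOM", "oblique-suppressed"]

def binary_judgements(threshold: int) -> dict:
    """Map the four-point scale onto accept/reject: accept every
    pattern whose markedness level lies below the chosen threshold."""
    return {level: i < threshold for i, level in enumerate(LEVELS)}

# The three possible thresholds correspond to the three "variants":
german_a = binary_judgements(3)  # rejects only oblique-suppressed FRs
german_b = binary_judgements(2)  # additionally rejects acc-NOM
german_c = binary_judgements(1)  # accepts only matching FRs
```

Nothing in the binary data distinguishes "three categorical grammars" from "one gradient grammar read off at three different thresholds", which is the point made above.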


In the next section, we will briefly sketch an Optimality Theoretic description of this underlying grammar.

5. A brief OT reconstruction

The constraints that we need for the Optimality Theoretic reconstruction of the grammar of German FRs must encode two things: (i) the fact that having a case conflict is worse than not having one, and (ii) the case hierarchy. With a slight modification of the constraint system I previously proposed (Vogel 2001, 2002, 2003), these tasks are fulfilled by the following three constraints:5


Realise Case (RC): An assigned case requires a morphological instantiation.

Realise Case (relativised) (RCr): An assigned case requires a morphological instantiation of itself or of a case that is higher on the case hierarchy.

Realise Oblique (RO): Oblique case must be morphologically realised.

These constraints evaluate the morphological realisation of syntactically assigned case features. They are seen as markedness constraints insofar as they evaluate properties of the expression itself, in particular the correspondence between syntactic case relations and their morphological expression.6

The constraint 'Realise Oblique' refers especially to the realisation of dative case. It is the most respected constraint: none of our observed variants accepts the non-realisation of dative case. The three constraints tolerate a situation where one phrase, the FR pronoun, serves two case assigners at the same time, as when both case requirements match. The constraint 'Realise Case' is fulfilled under such conditions. 'Realise Case (relativised)' is a liberal version of that constraint, which accepts a case being realised by a case that is higher on the case hierarchy. The ranking of these constraints in German is as in (12).

5. In Vogel (2001, 2002, 2003) I do not use the constraint 'Realise Oblique'. Its function is taken over by more complex mechanisms of the OT grammar which I will not go into here. The constraint 'Realise Oblique' might be seen as a shortcut for those complex mechanisms.

6. However, whether a constraint counts as a markedness constraint or as a faithfulness constraint is also a matter of the architecture of the OT model: faithfulness constraints evaluate the preservation of certain features of the input in output candidates; markedness constraints do not refer to the input. In our case, the constraints on case realisation are markedness constraints because the candidates are syntactic structures with morphologically inflected words as terminal nodes. Thus, the constraint violations can be checked without reference to the input. A different scenario is possible, however, where the syntactic structure is given in the input and the word forms are given in the output candidates. In that case, the constraints compare the syntactic case configuration given in the input with the morphological case instantiations in the candidates, and they would have to be seen as faithfulness constraints.

(12)  RO ≫ RCr ≫ RC

FRs which fulfil RO are more likely to be accepted than those which violate RO. FRs which additionally fulfil RCr fare even better, and those which also fulfil RC fare best. Thus, this ranking models the empirical findings presented in the previous section.

If we also want to reconstruct the three German variants, we have to specify our OT model in the standard way, predicting optimal structures as grammatical. We need an input specification and a set of competing candidate outputs. Suppose that we have two competing candidate structures, a FR structure and a correlative structure (CORR), the latter one where the FR is accompanied by an additional resumptive d-pronoun, as in the first experiment presented in the previous section.

Let us assume that the input is a syntactic specification, including the case pattern and the structure of a FR. In such an OT competition, the CORR structure violates a constraint on input preservation, a faithfulness constraint, because its syntactic structure differs from the FR structure specified in the input.

(13) Faithfulness (F): The input is preserved in the output.

Different rankings of faithfulness now yield the three variants. The higher the rank of F, the more FR structures are allowed. We only have to integrate F into the constraint hierarchy in (12), as in (14).

(14)  German A: RO ≫ F ≫ RCr ≫ RC
      German B: RO ≫ RCr ≫ F ≫ RC
      German C: RO ≫ RCr ≫ RC ≫ F

The interaction of faithfulness and markedness constraints is the standard approach to optionality and ineffability within Optimality Theory (see Legendre et al. 1998; Bakovic and Keer 2001, for applications of faithfulness in OT syntax). Each variant, for instance, allows for CORR structures, as these perform perfectly when they do not violate F. This is the case when the input is specified for CORR rather than FR. As there is no restriction on possible inputs in OT, such a competition also has to be considered.

In a sense, the task of faithfulness is the reconstruction of an 'old style' set of grammatical expressions. The substance of the grammar, however, lies in the markedness constraints, here: the three constraints on case realisation.


The tableaux in (15) show how the acceptability patterns of German A are derived.

(15) German A:

Only suppression of dative is ruled out in that variant, so F is ranked immediately below RO. In less problematic case conflicts, like the acc-DAT and the acc-NOM patterns, the FR is given an advantage by faithfulness. Only in the case of non-realisation of dative, as in the dat-ACC conflict, is the unmarkedness of CORR crucial.

When we run an elicitation experiment, we do not know in advance which variant an informant belongs to. Likewise, as the same candidate structure might appear in many different OT competitions, and violations of faithfulness vary with different competitions (faithfulness evaluates input-output relations!), faithfulness constraints are not very informative. What remains constant across different competitions are the violations of markedness constraints, which are summed up in Table 4. The result is a relative ranking of our test structures.

Table 4. Relative markedness of FRs with different case conflicts.

                      RO   RCr   RC
    1. FR: acc-ACC
    2. FR: acc-DAT                *
    3. FR: acc-NOM          *     *
    4. FR: dat-ACC    *     *     *


Our OT grammar predicts that the relative markedness of these structures, as determined by their markedness profiles, should result in relative acceptability in judgement experiments, relative frequency in corpora, and similar contrasts in other empirical studies. The facts presented thus far confirm this prediction. But there is contradictory evidence, which will be discussed in the next section.

6. Contradictory evidence

6.1. FR vs. CORR in experiment 1

In the first experiment reported in Section 4, we contrasted clause-initial FRs in the four case patterns with nominative and dative with the corresponding correlative structures, as in (16).

(16)  Wer      uns     hilft,  (der)           wird  uns     vertrauen.
      who-NOM  us-DAT  helps   (that-one)-NOM  will  us-DAT  trust
      'Whoever helps us will trust us.'

The acceptability results are displayed in Table 5. The correlative structures avoid the case conflict by providing one pronoun for each of the two assigned cases. Consequently, they receive similarly high acceptability in all four case patterns. The contrast between FR and CORR is significant for all case patterns except the least problematic one, nom-NOM, where the acceptability of CORR is also higher, but the acceptability of the FR is already too high to yield a significant contrast between the two structures.

The lack of a significant contrast might not be problematic here. But an earlier corpus study shows that this case pattern is more problematic for the model proposed here.

Table 5. Mean acceptability percentages for FR and CORR in different case configurations.

    nom-NOM       dat-DAT       nom-DAT       dat-NOM
    FR    CORR    FR    CORR    FR    CORR    FR    CORR
    87    95      71    91      62    92      17    90


Disagreement, variation, markedness, and other apparent exceptions 355

6.2. A corpus study

In Vogel and Zugck (2003), we studied the relative corpus frequency of FR and CORR. We used the publicly available Cosmas II corpus of the Institut für Deutsche Sprache in Mannheim, Germany (an extremely large corpus, consisting mainly of newspaper texts). Random samples of 500 sentences each were selected for the wh-pronouns wer, wem, wen ('who' in nominative, dative, accusative). The FR and CORR usages in each of the samples were counted and sorted into the different case patterns.

The relevant results of this count are displayed in Table 6.7 We see that FRs are quite rare when m-case is dative or accusative. With nominative as m-case, CORR is more frequent than FR in case conflict configurations. But, very surprisingly, in the nom-NOM pattern the frequency of FR is about nine times as high as that of CORR. The average distance between the wh-pronoun and the first word of the matrix clause in the nom-NOM context was 6.02 (FR) vs. 12.04 (CORR). There was a highly significant correlation of length and clause type.

Table 6. Frequencies of clause-initial FR and CORR in the context of different case patterns.

m-case   r-case   FR            CORR
nom      NOM      274 (89.8 %)   31 (10.2 %)
nom      DAT       33 (34.4 %)   63 (65.6 %)
nom      ACC        5 (25 %)     15 (75 %)
acc      ACC        1 (20 %)      4 (80 %)
dat      DAT        1 (5.6 %)    17 (94.4 %)

For this case pattern, it seems that FR is less marked than CORR. This contradicts the tendency we found in the elicitation experiment, and it also contradicts the expectations generated from our OT model.

The results of two different empirical methods thus contradict each other. Should the results of one method be treated as the exception and those of the other as the rule? Which results should be assumed to reflect the underlying grammar more appropriately, and how can we justify such a decision?

Given our grammatical description of FR and CORR, there is no doubt that CORR should count as the less marked structure. This is also clear from the corpus counts for conflicting case patterns. The typological perspective also

7. Pittner (2003) reports another corpus study on this phenomenon, with largely equivalent results.


provides arguments in favour of our view, as the languages that have FRs seem to build a proper subset of those that have CORRs (see Vogel 2002).

The explanation for the exceptionally low frequency of nom-NOM correlatives is that they are "over-correct": the resumptive pronoun is redundant, because both the position of the FR and the case of the FR pronoun already signal the correct m-case. In judgement experiments, the resumptive pronoun makes grammatical information explicit. Judgements are more accurate and easier to make, and so the resumptive pronoun is rewarded. But in production, there seems to be a tendency to avoid redundant material, if possible.8

This effect might be observable with long-winded expressions in general. Consider the two English questions in (17):

(17) a. Who stole my car?
     b. Who was it that stole my car?

These examples are semantically equivalent. Though both questions are well-formed English, the cleft construction (17b) is certainly much less frequent. In the case of long-winded expressions, low frequency is no sign of reduced acceptability, and thus should not be reflected in a grammar model.

It is necessary to have a theory about how grammatical properties enter corpus frequencies. No empirical method mirrors only the properties of the grammar. This is also true of acceptability judgements. It is a well-known fact that acceptability judgements are strongly influenced by properties of the human parser, the limitations of working memory, and other psychological factors (see Fanselow and Frisch 2006 for a recent discussion of these issues).

Consider the question of how many grades of acceptability there are. In our experiments we only used two, but these yielded a four-way distinction among types of case conflicts. Should we therefore use a four-point scale in the future? Perhaps, when dealing with other phenomena, we might find three-way, five-way etc. distinctions. It is impossible to postulate a priori how fine-grained acceptability is, or even whether it is categorical or gradient. Consequently, there cannot be a 'right' elicitation method. Results of experiments have to be interpreted in the light of independently developed linguistic theories.

7. Conclusion

Exceptions to grammatical rules and constraints are expected under an Optimality-theoretic perspective that assumes that these can come into conflict without leading to ill-formedness. Such a perspective is empirically well-motivated. One obvious example of such a situation is case conflicts in FR constructions.

8. See Vogel (2006) for further discussion of this issue.

Variation is not simply "exception" if it follows the pattern of relative markedness observed here. It only reflects tolerance of markedness defined on the basis of the same underlying grammar (understood as a system of ranked markedness/well-formedness constraints). Hence, the same grammar might lead to different empirical outcomes. Individual members of a speech community might contradict each other, though it can be shown that the community as a whole follows the same grammatical system.

Discrepancies between different empirical methods do not necessarily constitute grammatical exceptions. As in our case, they sometimes only reflect that studying grammar cannot be reduced to analysing one single empirical domain, and that empirical methods have their limits.

Violation profiles assigned to structures by an OT grammar can be used rather successfully for empirical predictions. This increases falsifiability. A comparison of the relative markedness assigned by the grammar with the relative acceptabilities, frequencies, and preferences observed with different empirical methods, not to forget the typology of a given phenomenon, leads to deeper insights into the nature of both the grammar and those empirical domains.

Acknowledgements

I want to thank my collaborators Stefan Frisch, Jutta Boethke, and Marco Zugck, without whom the empirical research presented in this paper would not have been undertaken. I am also grateful to the audience of the DGfS workshop on exceptions in February 2005 and its organisers, Horst Simon and Heike Wiese, for a fruitful discussion and helpful suggestions. This work has been supported by a grant from the Deutsche Forschungsgemeinschaft, grant FOR-375/2-A3, for the interdisciplinary research group "Conflicting Rules in Language and Cognition" at the University of Potsdam.

References

Alexiadou, Artemis, and Spyridoula Varlokosta
1995 The syntactic and semantic properties of free relatives in Modern Greek. ZAS Working Papers in Linguistics 5: 1–30.

Bakovic, Eric, and Edward Keer
2001 Optionality and ineffability. In Optimality Theoretic Syntax, Géraldine Legendre, Jane Grimshaw, and Sten Vikner (eds.), 97–112. Cambridge, MA: MIT Press.

Boethke, Jutta
2005 Kasus im Deutschen: Eine empirische Studie am Beispiel freier Relativsätze. Diploma thesis, Institute of Linguistics, University of Potsdam.

Fanselow, Gisbert, and Stefan Frisch
2006 Effects of processing difficulty on judgments of acceptability. In Gradience in Grammar: Generative Perspectives, Gisbert Fanselow, Caroline Féry, Matthias Schlesewsky, and Ralf Vogel (eds.), 291–316. Oxford: Oxford University Press.

Groos, Anneke, and Henk van Riemsdijk
1981 Matching effects with free relatives: A parameter of core grammar. In Theories of Markedness in Generative Grammar, Adriana Belletti, Luciana Brandi, and Luigi Rizzi (eds.), 171–216. Pisa: Scuola Normale Superiore di Pisa.

Legendre, Géraldine, Jane Grimshaw, and Sten Vikner (eds.)
2001 Optimality Theoretic Syntax. Cambridge, MA: MIT Press.

Legendre, Géraldine, Paul Smolensky, and Colin Wilson
1998 When is less more? Faithfulness and minimal links in WH-chains. In Is the Best Good Enough? Optimality and Competition in Syntax, Pilar Barbosa, Danny Fox, Paul Hagstrom, Martha McGinnis, and David Pesetsky (eds.), 249–289. Cambridge, MA: MIT Press.

Pittner, Karin
1991 Freie Relativsätze und die Kasushierarchie. In Neue Fragen der Linguistik, Elisabeth Feldbusch (ed.), 341–347. Tübingen: Niemeyer.

Pittner, Karin
2003 Kasuskonflikte bei freien Relativsätzen – eine Korpusstudie. Deutsche Sprache 31: 193–208.

Prince, Alan, and Paul Smolensky
2004 Optimality Theory: Constraint Interaction in Generative Grammar. Cambridge, MA: MIT Press. [The 1993 manuscript is available at http://roa.rutgers.edu.]

Vogel, Ralf
2001 Case conflict in German free relative constructions: An Optimality Theoretic treatment. In Competition in Syntax, Gereon Müller and Wolfgang Sternefeld (eds.), 341–375. Berlin: Mouton de Gruyter.

Vogel, Ralf
2002 Free relative constructions in OT syntax. In Resolving Conflicts in Grammars: Optimality Theory in Syntax, Morphology, and Phonology, Gisbert Fanselow and Caroline Féry (eds.), 119–162. [Linguistische Berichte, Sonderheft 11] Hamburg: Buske.

Vogel, Ralf
2003 Surface matters: Case conflict in free relative constructions and Case Theory. In New Perspectives on Case Theory, Ellen Brandner and Heike Zinsmeister (eds.), 269–299. Stanford: CSLI Publications.

Vogel, Ralf
2006 Degraded acceptability, markedness, and the stochastic interpretation of Optimality Theory. In Gradience in Grammar: Generative Perspectives, Gisbert Fanselow, Caroline Féry, Matthias Schlesewsky, and Ralf Vogel (eds.), 246–269. Oxford: Oxford University Press.

Vogel, Ralf, and Stefan Frisch
2003 The resolution of case conflicts: A pilot study. In Experimental Studies in Linguistics 1, Susann Fischer, Ruben van de Vijver, and Ralf Vogel (eds.), 91–103. [Linguistics in Potsdam 21] Potsdam: Institute of Linguistics, University of Potsdam.

Vogel, Ralf, and Marco Zugck
2003 Counting markedness: A corpus investigation on German free relative constructions. In Experimental Studies in Linguistics 1, Susann Fischer, Ruben van de Vijver, and Ralf Vogel (eds.), 105–122. [Linguistics in Potsdam 21] Potsdam: Institute of Linguistics, University of Potsdam.


What is an exception to what? – Some comments on Ralf Vogel's contribution

Henk van Riemsdijk

1. Do exceptions exist?

The main message in Vogel's article, if I interpret him correctly, is that exceptions as such do not exist. There is a lot of variation, among languages, among dialects or sociolects, among speakers. But in each case we should refrain from defining a norm, and thereby it is inappropriate to say that other variants are exceptions to that norm. What I will try to do in these comments is to argue that, perhaps, the notion of exception is not to be dismissed out of hand.

This point of disagreement does not detract from the value of Vogel's contribution. He is quite right in showing that there is much more variation at all levels than many, perhaps most, grammarians would admit, and that there are interesting links between strictly grammatical variation and statistical data. Indeed, things are always more complex than one first suspects, but the question remains how we should deal with this complexity.

I intend my comments to be largely independent of the issue of Optimality Theory (OT), but their implication may well be that OT is too flexible and powerful an instrument to raise a number of fundamental questions about the sorts of phenomena that Vogel (henceforth V) discusses.

My main focus will be on the question of whether there are reasons to suppose that variation is truly symmetrical, as Vogel argues. I believe that there are good reasons to doubt that that is the case. My doubts have to do, on the one hand, with the concept of markedness, and on the other with the fact that certain theoretical choices force the linguist to commit himself/herself as to what is the expected pattern and what is a deviation from that pattern.

2. Markedness

Variation is intricately linked with the notion of markedness. And indeed, V brings up the concept of markedness. But I confess that I do not fully understand


what he means. Before looking at his formulation, let us clarify the notion to some extent. Among a variety of uses, I detect three major interpretations of 'markedness':

1. Markedness as an evaluation of the status of (introspective) data, as in I am working home being more marked than I am working at home,1 meaning that the first variant is less acceptable, rare, stylistically special, etc.

2. Markedness as a tool to rank grammars, that is, as a tool to help the child acquiring the language to choose the optimal grammar, cf. Kean (1980), Van Riemsdijk (1978).

3. Markedness as a tool to rank values of parameters, default vs. marked. This is a kind of local or micro-variant of the interpretation in 2.

Most likely, interpretation 1 is the one that V has in mind. It is indeed an interesting question to what extent the type of evaluation that violable constraints impose on syntactic structures has any bearing on or resemblance to the interpretations intended in 2 and 3.

Questions that typically arise in the context of markedness considerations of type 3 do indeed come to mind quite directly when we confront the sort of data that constitutes the core empirical object of V's article. Suppose matching is a binary parameter (undoubtedly a simplified assumption); then we should ask if the child acquiring the language in question, say German, starts out with a default hypothesis. Most plausibly, the default would be matching. Any data indicating a deviation from the matching pattern would lead to a resetting of the parameter in that child's grammar. In a more fine-grained system, certain violations of matching in the primary data would lead to a resetting in a limited sector of the relevant part of the parameter system while retaining the general matching pattern elsewhere. The opposite approach, assuming that non-matching is the default value, leads immediately to the difficult question of how matching patterns found in the primary data could be properly evaluated by the child and, in the absence of negative data, lead to a resetting of the parameters.

Relative rankings of constraints in an OT framework are not, as far as I am aware, linked to markedness considerations of this type. Instead, a subset of the constraints, the markedness constraints, is assumed to be ranked low, but subject to raising in the constraint hierarchy on the basis of relevant primary data. It is not clear to me, however, whether the constraints V uses to derive the matching/non-matching data and their variation are in any way linked to the OT way of dealing with markedness and acquisition.

1. The examples are taken from Collins (2008: 18), in particular example (67a) and footnote 16, where the observation is attributed to Paul Postal (p.c.).


I raise this issue because it is a possible way of comparing the OT approach to other frameworks.

3. Free relatives as grafts

In my own recent work on free relatives (cf. Van Riemsdijk 2006a, Van Riemsdijk 2006b and references cited there) I have suggested that free relatives should be treated as grafts. By this I mean that the matrix tree and the relative clause tree are built up independently from one another and that the wh-word in the relative clause is merged into a position in the matrix. The effect of this is that the wh-word is truly a shared element and that there is no empty head position in the matrix clause that the relative clause is an adjunct to. Clearly, an analysis like this predicts that matching must be complete. The fact that deviations from matching are found has been known (within the generative literature) since the seventies (cf. Groos and Van Riemsdijk 1981, Hirschbühler 1977). Any deviation from the matching pattern therefore constitutes a serious problem for the graft analysis.

It is, indeed, the position of Grosu (2003) that a graft analysis should be rejected and that free relatives (including transparent free relatives) should all be analyzed essentially like regular relative clauses. From a perspective like his, however, any matching effect comes as a real surprise, and artifacts must be introduced into the theory to account for them.

This is not the place to continue this debate. But what I am trying to say here is that the choice of framework or analysis yields immediate predictions as to whether matching effects should or should not be found. In each case, a default pattern is predicted, and the non-default pattern is the exception.

The approach suggested by V seems to me to be diametrically opposed to such proposals, be it Grosu's or mine, in that variation is taken to be the central given; nothing is exceptional with respect to anything else. While it is perfectly possible that this is indeed the way things are, it does appear to lead to a rather unconstrained view of what is possible and what is not, a view that, without further elaboration, would seem to lead to considerable problems when we think about the acquisition process, as discussed in section 2.

Pursuing my own interpretation of the matching facts, i.e. that matching is the norm, a norm that is entirely unexpected on any analysis that makes use of essentially the same structures as headed relatives, the onus of dealing with deviations from the matching pattern is on me. And I am still looking for a good solution, but I regard the fact that I am forced into such a situation as an advantage rather than as a disadvantage, as it is illustrative of the inherently constrained nature of my overall approach.

Note, in fact, that the issue of matching vs. non-matching in 'shared constituents' is by no means unique to the grammar of free relatives. As another notorious example, consider Right Node Raising (RNR). Leaving open the question of whether RNR should be treated as ATB-movement, multiple dominance, backward ellipsis or something else, it is quite clear that the phenomenon is characterizable in terms of shared material at the right edge of two or more conjuncts. That morphological identity sometimes is and sometimes is not required is shown in the following examples, taken from Boškovic (2004: ex. (8)):

(1) a. ?John will (sleep in her office), and Peter definitely was, sleeping in her office
    b. John will (sleep in her house), and Peter already has, slept in her house
    c. John hasn't (questioned our motives), but Bill may be, questioning our motives
    d. John has (slept in her house), and Peter definitely will, sleep in her house
    e. *John is (entering the championship), but Jane won't, enter the championship
    f. *John will (be obnoxious), and Jane actually was, being obnoxious

4. Charting the variation

V has done an outstanding job in painting a fine-grained picture of the variation found among speakers with respect to matching phenomena. The outcome of his experimental research is very interesting. Unfortunately, experimental work is inherently somewhat impeded by the fact that you can never take into account all the factors that might possibly affect the results. The one factor that he shows has significant effects on speakers' judgments is whether the free relative is fronted in its clause or in situ.

Let me suggest two more factors that in my own judgments play a role. I mention them because both suggest that the domain in which deviation from matching is allowed has to be further limited. But before turning to these two factors, let me make another point. It is true that I am in the 'strictest dialect', that is, among the four examples in V's (5), repeated here for convenience,

(2) a. Ich lade ein wen ich treffe
       I invite [acc] who-ACC I meet
    b. Ich lade ein wem ich begegne
       I invite [acc] who-DAT I meet
    c. *Ich lade ein wer mir begegnet
       I invite [acc] who-NOM meets me
    d. *Ich helfe wen ich treffe
       I help [dat] who-ACC I meet

I also reject (2b). However, I perceive a clear difference in acceptability between (2b) and (2c/d), in the sense that I find (2b) significantly less bad than (2c/d). What this suggests to me is that perhaps the differences among the various individual grammars of the subjects tested may well be smaller than suspected, in that the differences might be more about the cutoff point at which individuals switch from grammatical to ungrammatical in assessing examples. You might then call those with a strict cutoff point the normative group and those with a liberal cutoff point the anti-authoritarian group.2

The first factor that I would like to suggest can influence judgments on matching is that of definiteness. Note indeed that, due to the choice of the present tense in the examples in (2), the interpretation that is most prominent is that of the universally quantified (free choice) free relative; that is, (2a) would be taken to mean 'I invite whoever I meet'. When we force the other, definite, interpretation, for example by switching to the past tense, I feel that the non-matching pattern of (2b) is harder to accept. Consider (3):3

(3) a. #Ich lade ein wem auch immer ich begegne
       I invite [acc] whomever-DAT I meet
    b. ##Ich habe eingeladen wem ich gestern abend begegnet bin
       I have invited [acc] who-DAT I last night met have
       'I have invited the person that I met last night'

2. It may well be the case that my own normative judgment is due to the fact that I grew up in the German-speaking part of Switzerland, where children learn Swiss German at home and outside school, but learn Standard German at school and from the media only.

3. I use the symbol '#' to indicate that speakers may vary in their judgment, but the distinction between one and two #s signals relative acceptability, quite likely for all speakers.


Needless to say, these are just the introspective judgments of a single individual. Further experimental work would be needed to decide whether this factor does indeed systematically affect the acceptability of non-matching examples.

The second factor that I submit influences relative acceptability is the distinction between the direct object accusative as in (2b) and the prepositional accusative.4 Compare (2b) with the following examples.

(4) a. ##Ich habe mich an wem ich begegnet bin gewandt
       I have myself to [acc] whom-DAT I met have turned
       'I turned to whom I met'
    b. ##Der Bürgermeister hat den ganzen Abend auf wem der Preis verliehen werden sollte gewartet
       the mayor has the whole evening for [acc] whom-DAT the prize awarded be should waited
       'The mayor waited the whole evening for who the prize was supposed to be conferred on'

Again, I believe that if we look beyond the simplest of examples the matching effect establishes itself more strongly.5 But, assuming that these factors do indeed play a role, the tendency that we observe is that matching is the default case while non-matching is the 'marked' variant or, if you wish, the exception.

4. It is true that in V's Case Hierarchy (6) PPs are listed among the oblique cases in the lowest position, but I believe that the PP as a whole is meant and that, therefore, this statement is relevant for the matching of prepositions (cases like I talk to whom you talk vs. *I count on/to whom you talk), and not for the cases that the prepositions govern.

5. Observe, somewhat paradoxically, that there is one other construction in which the prepositional accusative seems to be less rigid than the direct object accusative: nominal appositives. Leirbukt (1978) shows that prepositional accusatives can take non-agreeing appositives in the dative, while direct object accusatives can never do that. Here are two examples of this phenomenon, from Van Riemsdijk (1983: exx. (48/50)):

(i) Der Verkauf des Grundstücks an den Komponisten, dem späteren Ehrenbürger der Stadt …
    the sale of the land to the composer-ACC, the later honorary citizen-DAT of the city

(ii) *Ich besuchte dann Herrn Müller, unserem Vertreter in Pforzheim.
     I visited then Mr. Müller-ACC, our representative-DAT in Pforzheim

5. Conclusion

My comments in these few pages amount to the following points.

– It is important to map out fine-grained patterns of variation, and V's experimental work constitutes a fine illustration of the techniques that one may (among others) resort to in doing so.

– It would be wrong to underestimate the number of factors that may influence introspective judgments.

– The more work that is done along such lines, the greater the risk of being overwhelmed by the impression that everything varies endlessly. But not only is there structure in the variation, there are also patterns that suggest that some facts are more representative of the underlying grammatical system than others.

It would not be wrong to use the term 'exception' for the latter category of facts. There are, indeed, as I have tried to show, both theoretical and empirical reasons to believe that variational scales are (or at least can be) asymmetrical, with the options at one end being the default and the options at the other end being marked exceptions.

References

Boškovic, Željko
2004 Two notes on right node raising. In Cranberry Linguistics 2 [UConn Working Papers in Linguistics 12], Miguel Rodriguez-Mondoñedo and Emma Ticio (eds.), 13–24. Storrs: University of Connecticut.

Collins, Chris
2008 Home sweet home. NYU Working Papers in Linguistics 1: 1–34.

Groos, Anneke, and Henk C. van Riemsdijk
1981 Matching effects in free relatives: A parameter of core grammar. In Theory of Markedness in Generative Grammar: Proceedings of the 1979 GLOW Conference, Adriana Belletti, Luciana Brandi, and Luigi Rizzi (eds.), 171–216. Pisa: Scuola Normale Superiore.

Grosu, Alexander
2003 A unified theory of 'standard' and 'transparent' free relatives. Natural Language and Linguistic Theory 21: 247–331.

Hirschbühler, Paul
1976 Headed and headless free relatives: A study in Modern French and Classical Greek. In Les contraintes sur les règles, Philippe Barbaud (ed.), 176–229. Montréal: Université du Québec à Montréal.

Kean, Mary-Louise
1980 The Theory of Markedness in Generative Grammar. Bloomington: Indiana University Linguistics Club.

Leirbukt, Oddleif
1978 Über dativische Appositionen bei akkusativischem Bezugswort im Deutschen. Linguistische Berichte 55: 1–17.

Riemsdijk, Henk C. van
1978 A Case Study in Syntactic Markedness: The Binding Nature of Prepositional Phrases. Dordrecht: Foris.

Riemsdijk, Henk C. van
1983 The case of German adjectives. In Linguistic Categories: Auxiliaries and Related Puzzles, Frank Heny and Barry Richards (eds.), 223–252. Dordrecht: Reidel.

Riemsdijk, Henk C. van
2006a Grafts follow from Merge. In Phases of Interpretation, Mara Frascarelli (ed.), 17–44. Berlin: Mouton de Gruyter.

Riemsdijk, Henk C. van
2006b Free relatives. In The Blackwell Companion to Syntax, Martin Everaert and Henk C. van Riemsdijk (eds.), 338–382. Oxford: Blackwell.

Response to van Riemsdijk

Ralf Vogel

First of all, I would like to thank Henk van Riemsdijk (henceforth HvR) for his insightful comments on my contribution. He raises more issues than space allows me to address, so I will concentrate on those I find most important in the context of this book.

1. Are there no exceptions?

According to HvR, I make the claim that "exceptions as such do not exist". This is more than I actually wanted to say. My major concern is that we have to distinguish exceptions from predictable variation within a speech community. The example that I chose is instructive here in several respects.

Two different understandings of the term 'exception' have to be kept apart. There is an empirical notion of exception that is instantiated by deviating observations. Typical examples of such cases come from morphology, like the case of regular vs. irregular inflection. We have learned to deal with such cases in our grammar models, for instance in the form of rule ordering, blocking or constraint interaction. For an Optimality Theoretic approach to grammar, such observational exceptions are welcome, as they are empirical evidence for the violability of grammatical constraints, a core feature of OT grammars.

Secondly, there is a notion of exception which might better be termed (potential) 'counterevidence', i.e. observations that plainly contradict the predictions of a particular grammar. The linguist defending that grammar has the choice to either 'explain away' these cases and treat them as exceptions arising from a different cause, or accept them as counterexamples and modify her theory.

The possibility of non-matching free relative clauses in German is, first of all, a fact. Non-matching FRs are observed to be exceptions: usually, case is morphologically realised in German, and it cannot be 'substitutionally realised' by some other case. A clausal subject usually bears nominative case, and it would be unacceptable if it was realised with dative case instead. But a non-matching FR with a wh-pronoun in the dative case can nevertheless serve as grammatical subject.


Pittner (2003), Vogel and Zugck (2003), Vogel (2006), Vogel, Frisch, and Zugck (2006), as well as the studies reported in my contribution repeatedly make this observation. Hence, Groos and van Riemsdijk (1981) as well as some other linguists working on German were simply wrong in their claim that non-matching free relative clauses are impossible in German.

If a grammar crucially relies on this factual error, the observation of non-matching free relative clauses is true counterevidence that either needs to be explained away, or requires a revision of the theory in question. HvR admits that this is true and that he has no answer yet. When he additionally reports feeling comfortable with this situation, as it shows how restrictive his theory is, then it might be important to remind him that descriptive adequacy is the very least level a grammar should reach, even on a Chomskyan approach. Without empirical adequacy, higher explanatory features of a theory are inapplicable.

A more interesting question, to my mind, is whether we are really forced to separate German speakers into three groups of German A, B, and C. This partition can be seen as an artefact of the elicitation method. Suppose we elicited judgements from German speakers for the following three FR clauses:

(1) a. Ich besuche, wen ich nett finde.
       I visit who-acc [ACC] I nice find
       'I visit who I find nice.'
       matching

    b. Ich besuche, wem ich vertraue.
       I visit who-dat [DAT] I trust
       'I visit whom I trust.'
       non-matching, obeying case hierarchy

    c. Ich besuche, wer mir vertraut.
       I visit who-nom [NOM] me-dat [DAT] trusts
       'I visit who trusts in me.'
       non-matching, disobeying case hierarchy

German A speakers will accept all three clauses, German B speakers will reject (1c), and German C speakers will only accept (1a). But, as HvR himself also confirms, German C speakers (like him) nevertheless have the intuition that (1c) is worse than (1b). Likewise, German A and B speakers both will find (1a) to be the best and (1c) to be the relatively worst example. Hence, if we asked for relative rather than absolute acceptability judgements, the difference between the three groups would very likely disappear.

Would it then even be legitimate to speak of different variants? This is the question that I am addressing. Empirical claims about the German language are


Response to van Riemsdijk 371

claims about the speech community as a whole, and they are usually based on observations of representative samples of this speech community. We should expect that grammatical generalisations show up as statistical tendencies only, and not in an all-or-nothing fashion.

A generative grammar, as standardly conceived, assigns a grammaticality value to an expression. This value is Boolean: an expression is either grammatical or ungrammatical. Optimality theory deviates from the generative tradition in two respects: one aspect is that the grammaticality of an expression E is not determined by inspecting E in isolation, but in a holistic fashion: E is grammatical if it performs better than all possible alternatives in an evaluation based on a hierarchy of violable constraints.

The constraint hierarchy contains markedness and faithfulness constraints. Markedness constraints formulate various aspects of well-formedness, while faithfulness constraints mainly determine which aspects of markedness are tolerated within a language. Thus, grammaticality is derived from markedness. This is the second aspect where OT leaves the generative tradition. Contrary to grammaticality, markedness is a relative concept, and therefore it is much better suited to construct a descriptively adequate grammar, if description includes empirical studies like those under discussion here. So how shall we understand markedness?

2. Markedness and Optimality Theory

HvR is quite right in assuming that, under the definitions of “markedness” that he offers, his option (1) is the one that comes closest to the way I see it. But I am interested in exploring to what extent the OT conception of markedness correlates with this empirical conception of markedness.

The OT understanding of markedness is very much inspired by the traditional use of this concept in phonology and morphology, as well as in language typology. It is not based on gradient acceptability data, but on distributional facts, along a typological and a language-internal dimension:

Typological dimension: The number of languages that admit the unmarked form is larger than the number of languages that allow for the marked form. In many cases, the languages that admit the marked form are a proper subset of the languages that admit the unmarked form. That is, if a language admits the marked form then it usually also admits the unmarked form, but not vice versa.

Language-internal dimension: The number of contexts that allow for the unmarked form is larger than the number of contexts for the marked form. Very often,


372 Ralf Vogel

contexts that allow for the marked form also allow for the unmarked form. But some contexts only allow for the unmarked form.

Such typological and distributional entailments are among the core empirical phenomena that an OT model aims at reconstructing. While it is true that such typological observations mostly go hand in hand with acceptability status within a single language, this is not a necessity.

The issue that I raise in my contribution is: to what extent can the typology-based conception of markedness of standard OT be related to graded acceptability as we are able to elicit it using psycholinguistic methods? Here, it is also important to note which method we chose in our experiments: graded acceptability has only been measured indirectly. The subjects had only two choices for their acceptability judgement: ‘yes’ and ‘no’. Gradient acceptability shows up as a statistical tendency both within individual subjects (because of inconsistent judgements for the same clause types in repeated measuring) and within the whole group of participants. In other words, two effects have been attested:

a. Marked patterns are more likely to be rejected by the same subject.
b. Marked patterns are more likely to be rejected by the whole group of participants.

So, we still are making distributional observations, though these are observations of introspective judgements, not of the expressions themselves.
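The way gradient acceptability emerges from such binary judgements can be sketched in a few lines. This is a hypothetical illustration with invented toy data; the function `acceptance_rates` and the condition labels are not taken from the study itself.

```python
from collections import defaultdict

def acceptance_rates(judgements):
    """Aggregate binary yes/no judgements into per-condition acceptance rates.

    `judgements` is a list of (subject, condition, accepted) triples,
    possibly with repeated measurements per subject and condition.
    """
    counts = defaultdict(lambda: [0, 0])          # condition -> [yes, total]
    for _subject, condition, accepted in judgements:
        counts[condition][0] += int(accepted)
        counts[condition][1] += 1
    return {cond: yes / total for cond, (yes, total) in counts.items()}

# Invented toy data: the matching pattern is accepted consistently,
# the two non-matching patterns are rejected with increasing frequency.
data = [
    ("s1", "matching", True), ("s1", "matching", True),
    ("s2", "matching", True), ("s2", "matching", True),
    ("s1", "non-matching/hierarchy", True), ("s1", "non-matching/hierarchy", False),
    ("s2", "non-matching/hierarchy", True), ("s2", "non-matching/hierarchy", True),
    ("s1", "non-matching/violation", False), ("s2", "non-matching/violation", False),
]
rates = acceptance_rates(data)
assert rates["matching"] > rates["non-matching/hierarchy"] > rates["non-matching/violation"]
```

The inconsistent judgements of subject `s1` for the second pattern are what turns a binary measure into a gradient one at the aggregate level, as described above.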

3. Learning constraint rankings and “marked grammars”

HvR asks for the relation of the proposed model to learning theory. He wonders how a definition of markedness as a measure to rank grammars as a whole, as in earlier generative models of language acquisition, is related to either my or the OT notion of markedness. In particular, if this measure is understood as in his definition (3), how can we establish the difference between default and marked state in OT?

HvR takes the example of matching vs. non-matching in FRs as a parameter, stating that a grammar that restricts FRs to those with case matching should be the default. Does the OT grammar that I presented capture this intuition? Yes, it does. We have three constraints, RC, RCr and RO. What does this constraint system predict for cases where there is no case conflict? Consider the German clause (2):

(2) Der  Hund           bellte.
    the  dog-NOM [NOM]  barked


The subject has nominative case. The OT competition for this structure will have candidates that differ in the case of the subject. Why do such candidates lose? Because of their constraint violations:

Table 1. OT competition for the case morphology of a German subject noun phrase

  [NOM]    RC    RCr    RO
+ nom
  acc      *
  dat      *

The constraint “Realise Case” directly implements matching as the default. Without a case conflict, there cannot be a situation where an assigned or required case does not surface, independent of how the constraints are ranked.

In case of conflict, so HvR’s argument goes, the default should be matching, i.e. the non-acceptability of an FR. Does this follow from what I proposed? Here, I would like to refer to Table 4 of my contribution where the constraint violation profiles of a matching and three non-matching FRs are compared. As one can figure out easily, the matching FR violates none of the above three constraints, while the non-matching ones have more violations. One can therefore state that a matching FR will always have a better violation profile than non-matching ones, irrespective of the particular constraint rankings. Even more so, as the violations rise cumulatively between the candidates, their relative markedness can be determined independently of the constraint ranking, and therefore will be the same in every language. Non-matching FRs will always come out as more marked than matching FRs. A grammar that allows for non-matching FRs can be called “more marked” in the sense that it allows for more marked structures. If we were to assume that the initial state of the language faculty of the language learner were the unmarked grammar, this would imply that, for the OT model, faithfulness constraints are initially ranked lower than markedness constraints.
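The ranking-independence of this markedness comparison can be illustrated with a small sketch. The constraint names follow the text, but the violation counts assigned to the candidates are invented for illustration; only the cumulative pattern matters.

```python
from itertools import permutations

CONSTRAINTS = ["RC", "RCr", "RO"]

# Hypothetical violation profiles: the matching FR violates nothing,
# and the non-matching candidates accumulate violations.
profiles = {
    "matching":       {"RC": 0, "RCr": 0, "RO": 0},
    "non-matching-1": {"RC": 1, "RCr": 0, "RO": 0},
    "non-matching-2": {"RC": 1, "RCr": 1, "RO": 0},
}

def better(a, b, ranking):
    """True if candidate a beats candidate b under the given ranking:
    compare violations constraint by constraint, highest-ranked first."""
    for c in ranking:
        if profiles[a][c] != profiles[b][c]:
            return profiles[a][c] < profiles[b][c]
    return False  # identical profiles: neither is better

# Because the violations are cumulative, the relative ordering of the
# candidates is the same under every possible ranking of the constraints.
for ranking in permutations(CONSTRAINTS):
    assert better("matching", "non-matching-1", ranking)
    assert better("non-matching-1", "non-matching-2", ranking)
```

This is exactly the point made above: with cumulative violation profiles, relative markedness does not depend on the ranking, so it comes out the same in every language.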

Learning the grammar of German with respect to FRs involves, for instance, learning a constraint hierarchy that excludes case attraction as in the example (3a) from Modern Greek in my contribution. The constraint rankings that refer to German A, B and C only rank faithfulness differently. Standard OT learning theory, as outlined by Tesar and Smolensky (2000), uses the constraint demotion algorithm to derive such rankings: when you observe an expression E, rerank the constraints violated by E such that each of E’s competitors has a worse violation profile than E. In our example, the crucial rerankings target the


position of faithfulness. From this perspective, the constraint system proposed here is as learnable as any other standard OT constraint system.
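A rough sketch of one demotion step may help to make the idea concrete. This is a simplified illustration only: Tesar and Smolensky's actual algorithm operates on mark-data pairs and stratified constraint hierarchies, and the constraints `M` (markedness) and `F` (faithfulness) below are invented.

```python
def constraint_demotion(ranking, winner, loser):
    """One (simplified) constraint demotion step.

    `winner` and `loser` map constraint names to violation counts;
    the observed form `winner` must beat its competitor `loser`.
    Constraints preferring the loser are demoted to just below the
    highest-ranked constraint that prefers the winner.
    """
    prefer_winner = [c for c in ranking if winner[c] < loser[c]]
    prefer_loser = [c for c in ranking if winner[c] > loser[c]]
    if not prefer_loser:
        return ranking  # the winner already beats this competitor
    if not prefer_winner:
        raise ValueError("no ranking can make the winner optimal")
    pivot = min(ranking.index(c) for c in prefer_winner)
    new = [c for c in ranking if c not in prefer_loser]
    insert_at = new.index(ranking[pivot]) + 1
    return new[:insert_at] + prefer_loser + new[insert_at:]

# Invented example: markedness M initially outranks faithfulness F,
# but the observed (faithful) form violates M, so M is demoted below F.
ranking = ["M", "F"]
winner = {"M": 1, "F": 0}   # observed form
loser  = {"M": 0, "F": 1}   # its competitor
ranking = constraint_demotion(ranking, winner, loser)
assert ranking == ["F", "M"]
```

The toy run mirrors the learning scenario described above: observing a marked (but faithful) form forces faithfulness above markedness, whereas an unmarked starting grammar needs no reranking.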

One advantage of the constraint system that I propose can be seen in the fact that there is no particular case matching constraint. The constraints are very general and simple constraints on case realisation. A particular constraint ranking can be seen as the OT correlate of “parameter setting” in traditional generative models of language acquisition. In Vogel (2001, 2002), I use the constraint set introduced here together with a few additional ones to derive the whole attested typology of case conflict resolutions in FR constructions. I thus use a typologically motivated grammar here to predict the outcomes of empirical studies.

4. Syntactic Analysis

HvR offers his own analysis of FRs as grafts, the basic idea being that the wh-pronoun of the FR is at the same time contained in the FR and in the matrix clause, and this is the structural basis for the case conflict. I would like to briefly introduce results from a further experimental study that we undertook, and which is reported in detail in Vogel, Frisch and Zugck (2006). In this study we tested a case conflict that occurs on a noun phrase that is the complement of two coordinated verbs at the same time, as in (3):

(3) a.  Maria  mochte  und  unterstützte  den Arzt.
        Maria  liked   and  supported     the doctor-ACC

    b. *Maria  mochte  und  half    den Arzt.
        Maria  liked   and  helped  the doctor-ACC [DAT]

    c. *Maria  half    und  mochte  dem Arzt.
        Maria  helped  and  liked   the doctor-DAT [ACC]

The two verbs in (3a) both assign accusative to their object, and the clause is well-formed. This is an instance of case matching. In (3b) and (3c), the coordinated verbs differ in their case requirements, as ‘helfen’ assigns dative to its object. No matter which of the two cases is realised on the object noun phrase, the clause is ungrammatical. This could be confirmed in our experiment, again a speeded grammaticality judgement study, that took into account all six logical possibilities for the pattern in (3). We found no effect of case hierarchy here. It made no difference whether dative was realised on the object NP or accusative. With the case conflicts in FRs in mind, this is unexpected. How can this be explained? The route that we took, in accord with my earlier analysis of FRs (e.g., Vogel 2002), is the following: while in (3bc), the conflict is syntactic, in FRs it is only morphological.


The object NP in (3bc) is assigned two different cases syntactically. This unavoidably leads to ill-formedness, as an NP can only be assigned one case, no matter which case this is. The wh-item of the FR is only assigned r-case. The m-case is assigned to the FR as a whole. The case conflict comes about because the wh-item is the element that has to realise morphologically the case assigned to the FR. Thus, case conflict in an FR is a morphological case conflict only, while the configuration in (3) leads to a truly syntactic case conflict.

The analysis that HvR offers for FRs creates a situation for the wh-item that is like the one of the object NP in (3). Consequently, HvR argues that matching should be the norm, just as it is for (3). But our experiments confirmed this for (3), while they did not do so for FRs! Hence, our results do not support HvR’s syntactic analysis of FRs.

References

Pittner, Karin
    2003    Kasuskonflikte bei freien Relativsätzen. Eine Korpusstudie. Deutsche Sprache 31: 193–208.

Tesar, Bruce, and Paul Smolensky
    2000    Learnability in Optimality Theory. Cambridge, MA: MIT Press.

Vogel, Ralf
    2001    Towards an optimal typology of the Free Relative Construction. In IATL 8. Papers from the 16th Annual Conference and from the Research Workshop of the Israel Science Foundation “The Syntax and Semantics of Relative Clause Constructions”, Alex Grosu (ed.), 107–119. Tel Aviv: Tel Aviv University.

Vogel, Ralf
    2002    Free Relative Constructions in OT syntax. In Resolving Conflicts in Grammars: Optimality Theory in Syntax, Morphology, and Phonology, Gisbert Fanselow and Caroline Féry (eds.), 119–162. (Linguistische Berichte, Sonderheft 11) Hamburg: Helmut Buske Verlag.

Vogel, Ralf
    2006    Degraded acceptability and markedness in syntax, and the stochastic interpretation of Optimality Theory. In Gradience in Grammar: Generative Perspectives, Gisbert Fanselow, Caroline Féry, Matthias Schlesewsky, and Ralf Vogel (eds.), 246–269. Oxford: Oxford University Press.

Vogel, Ralf, Stefan Frisch, and Marco Zugck
    2006    Case matching. An empirical study on the distinction between abstract case and case morphology. Linguistische Berichte 208: 357–384.

Vogel, Ralf, and Marco Zugck
    2003    Counting markedness. A corpus investigation on German Free Relative Constructions. In Experimental Studies in Linguistics 1, Susann Fischer, Ruben van de Vijver, and Ralf Vogel (eds.), 105–122. (Linguistics in Potsdam 21) Potsdam: University of Potsdam.


Describing exceptions in a formal grammar framework

Frederik Fouvry

Abstract. Phenomena that a grammar does not describe cannot be analysed by a system or theory that uses the grammar. It is impossible to offer clues as to what may be going on: is it an error in the input, an omission in the grammar (intended or not), an extra-grammaticality, or an exception? One family of formal frameworks that has been developed and used to write natural language grammars is Typed Feature Logic (TFL). In this paper, we propose an extension to such a formalism to ensure that there always is a minimal analysis of the input. It relaxes the constraints on the information that is associated with the input just enough to make rule applications succeed, and ranks the results based on how much had to be relaxed. Analyses for input that the unrelaxed grammar does not describe contain the precise location of the error (in the analysis tree and in the tree nodes), as well as the set of values that are involved.

1. Introduction

The goal of linguistic theory is to discover regularities and develop generalisations in the description of facts about natural language. The form and nature of the generalisations help us to understand how language works. Unfortunately, these generalisations are not perfect. Exceptions pose a challenge because they disrupt the structure of theoretical descriptions. The descriptions have to be revised to incorporate the exceptions – if that is at all possible. When we consider the problem from the viewpoint of the theory, and not as natural language experts, we can reformulate it in a simplified form: any utterance that is acceptable to language users but not completely described by a linguistic theory is an exception (in the wider sense) to the rules of that theory. In what follows, we will present a technique that enables a grammar writer to obtain a grammatical description for exceptions of this general nature, while at the same time postponing the need for their detailed and precise treatment. In this paper, therefore, exceptions, also called extra-grammaticalities, are phenomena that are not covered by the rules of the theory.


The ideas in this paper and the need for this solution originated from work in computational linguistics. While working with linguistic grammars, computational linguists typically face the problem that when the input cannot be described by the grammar, the quality of the output drops dramatically. We first present a computational linguist’s view on exceptions. Then, we describe our method, and finally, some implications of this technique are discussed.

2. Exceptions in computational linguistics

Whereas in the development of linguistic theory on paper, dealing with exceptions can be conveniently postponed, there is a strong need for some way of treating them in applications that have to deal with real-life texts, such as systems that use grammar implementations. Although the frequency of any single exception type may be relatively low, the numbers add up, and on the token level exceptions can be quite frequent.1 When we take into account that implemented grammars tend to break down when descriptions are incomplete, the need for a treatment is clear.

Often the distinction is made between “real errors” and “things that the system does not get right”. We would like to point out that it is impossible for the kind of system we assume to distinguish between those cases. In order to be able to do so, it would need to contain at least two grammars: one for the natural language (as humans do in their capacity as speakers of a natural language) and one for the (implemented) theory. We assume that a system only has one such grammar. That is the case in all systems we know of. We furthermore assume that the utterances the system will be dealing with are grammatical or at least acceptable. Dropping the latter assumption makes any kind of tolerant processing practically impossible, because too many parameters can be varied.

Exceptions have to get a place in the linguistic description in a system. What are the options for dealing with them? Much of the concrete answer to this question depends on the level in the linguistic description the phenomena belong to. Mostly, the formalisms (formal frameworks that are used for theory development and implementation) determine what can be done and how it should be done. The presence of a default mechanism, for instance, makes a great difference in the treatment of exceptions: defaults and exceptions are each other’s complement, and therefore go together very well. Without defaults, other techniques have to be used. An overview of how exceptions have been integrated in linguistic theory is given in Moravcsik (this volume), but without touching on formal issues.

1. Even though pure lack of lexical coverage is the most serious problem, the incomplete grammar coverage is considerable as well (Baldwin et al. 2004).

Morphological analysis components form an example of a concrete formalism. They are often realised with pure table lookup (chosen for reasons of computing efficiency). In that case, anything that is not in the table is an exception; there are no other rules. This approach must not be confused with the generation of tables for this lookup: here rules will be used, and exceptions are likely to occur. An example of this generation is described in Corbett (this volume). We will return to it later on.

For more complex levels in natural language such as semantics, pragmatics or stylistics, the rules are not sufficiently clear and explicit for one to speak of exceptions. In this paper, we limit ourselves mainly to syntax, as there are certain exceptions there (although not to the extent of, e.g., morphology) and because the framework we work in has been developed for that purpose.

2.1. Grammars

A grammar implementation typically consists of a grammar and a parser. The grammar consists of a lexicon and a set of syntactic rules with which to combine the elements of the lexicon. The parser reads the input, looks up the words in the lexicon and then tries to apply all rules to the constituents (the words and the results of earlier rule applications). In the case of a context-free grammar (CFG) and bottom-up processing, a rule application consists of combining constituents with each of the rule daughters. If the combination is successful, a new constituent has been found. The combination is the point where a constituent is found to be good or bad for this rule application.
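The lookup-and-combine cycle described above can be sketched for a toy context-free grammar. This is an invented illustration, not code from the paper; the grammar is assumed to be in binary-branching form so that each rule application combines exactly two constituents.

```python
# Minimal bottom-up (CKY-style) recogniser for a toy grammar with
# binary rules. The grammar and lexicon are invented examples.
lexicon = {"they": {"NP"}, "walk": {"VP"}, "dogs": {"NP"}}
rules = {("NP", "VP"): "S"}  # S -> NP VP

def parse(words):
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                 # lexical lookup
        chart[i][i + 1] = set(lexicon.get(w, ()))
    for span in range(2, n + 1):                  # combine constituents
        for i in range(n - span + 1):
            k = i + span
            for j in range(i + 1, k):
                for left in chart[i][j]:
                    for right in chart[j][k]:
                        if (left, right) in rules:
                            chart[i][k].add(rules[(left, right)])
    return "S" in chart[0][n]                     # binary decision

assert parse(["they", "walk"]) is True
assert parse(["walk", "they"]) is False
```

The final line of `parse` is exactly the binary distinction criticised in the next paragraph: the recogniser returns only success or failure, with no partial description for input the grammar does not cover.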

We assume – as in linguistic theory – that a sentence that is not described by the grammar does not belong to the language. Many parsers also operate on this principle, and return the failure to find a description as the result: they make a binary distinction. The grammar in the linguistic theory is assumed to be a complete model of the grammar of the speaker or of the language, and the input is assumed to conform to that grammar. The same holds for grammar implementations, with the additional complication that it is not possible to distinguish between phenomena that should have been in the grammar and phenomena that do not belong in the grammar.

The assumption that the grammar is a complete and correct model of the grammar of the speaker is almost certainly false. It therefore makes sense to extend the processing formalism to catch cases where the assumptions may be violated. That is what we will be doing in the next section. For the remainder


Figure 1. The relation between the natural language and the language of an implemented grammar. Left: the traditional relation between the languages. Right: the relation between the natural language and the language of the relaxed implemented grammar.

of this section, we discuss the existing situation and approaches to deal with exceptions in it.

In Figure 1 (left), a pictorial view of the current situation is shown (i.e. with binary distinctions): the natural language LN is not a sharply defined concept, while the implementation makes a very sharp distinction (LI). The intersection of LN and LI is what the system can successfully and correctly deal with. Ungrammatical is what lies outside of LN. The difference of LN and the implemented grammar LI contains what is extra-grammatical (for the implementation).2

Exceptions are defined as deviations from the rules, and therefore they belong to the set of extra-grammaticalities (insofar as they have not been added as rules to the grammar, see section 2.2).

The situation that we want to attain is shown in Figure 1 (right). The system grammar for LI is more permissive: the grammar is defined for LI, but the formalism can relax it so that it can describe all of LI′ (the coverage of the relaxed grammar). Descriptions for structures outside the language LI are also possible (in LI′), only they will not be as informative as the structures for grammatical sentences (see section 3.2.1). They are however available, which is an improvement over the current situation. It is possible to cover a much larger part of the natural language. Some constructions may of course still be missing or not applicable, but in principle the grammar can be relaxed to the point that everything is covered.

The undesired coverage may (will) also increase. It will be the task of the grammar writer on the one hand, and of refinement techniques on the other, to try to keep this as small as possible.

2. The term “extra-grammatical” is sometimes also used for phenomena that lie beyond what one wants to describe in a grammar, such as layout. This is not so here.


In computational linguistics, the implemented grammar always makes binary distinctions. There are however ways to make this distinction less sharp or to deal with exceptions in some other way. Exceptions may be removed by modifying the grammar, by modifying the grammar formalism, or by adding statistical information. We discuss these in the next paragraphs.

2.2. Treating exceptions in the grammar

For grammarians, treating exceptions in the grammar is the ideal solution, especially when incorporation of the exception into the grammar leads to the formulation of a better generalisation. In practice, exceptions are most often integrated into the grammar, sometimes without a linguistic improvement of the rules, i.e. without a better generalisation. This is very likely due to the fact that there are very few formal devices available to linguists. With the integration into the grammar, an exception is treated as a separate phenomenon which should be modelled in the grammar. It is not guaranteed that the relationship to the rule of which it is an exception can be retained in the description, although it is certainly desirable to make it so.

A disadvantage of this way of working is that the exceptions take up a large part of the grammar, relative to the other phenomena. (The severity of this is inversely proportional to the frequency of the exception.) With the size of the grammar, the odds of complicated interactions as well as the maintenance effort increase.

Here the notion of grammar is the traditional one: the grammar describes the language competence of the user. Exceptions that are worthwhile treating in the grammar are the very frequent ones, because the returns for the descriptive work are high.

2.3. Treating exceptions in the formalism

Instead of modifying the grammar every time an exception is discovered, it is preferable to have a system that can deal with all cases as well as possible. It is however only feasible to develop such devices when the mechanisms with which the grammar is processed are very strictly defined. The solutions are extensions to the standard, non-tolerant formalism, and are often formulated in such a way that the non-tolerant functioning of the grammar is a special case of the extension. That is only possible when it is well-defined.

This solution requires fewer direct efforts from the linguist to deal with extra-grammatical phenomena. The soft failure does not hide other (perhaps more important) problems from view, which is a great advantage.


Some concrete possible solutions are: default rules, mal-rules, relaxation rules, and deeper modification of the formalism. We discuss these now.

2.3.1. Mal-rules

A solution that conceptually occupies the middle ground between grammar and formalism changes is the introduction of “mal-rules”. These are rules that describe a specific error, for instance lack of subject–verb agreement, verb-second instead of verb-final positioning, and so on.

Mal-rules are often used in computer-aided language learning (CALL), where it is important to detect the errors correctly. In such a setting, each error has its own rule. This approach is not free of problems, however. Sometimes a mistake can be explained in several ways. To appreciate this point, consider the following example (James 1998: 200–201):

(1) a.  “having *explain my motives”
    b. (i)  “having explained my motives”
       (ii) “having to explain my motives”

There are two explanations for the error in example (1a): the language learner who produced the sentence wanted to use either the past participle (1b-i) or an infinitive construction (1b-ii). This is a serious problem that we face when dealing with extra-grammaticalities. In the best case, the error can be described with one rule for all alternative descriptions, but that is unlikely to be useful in many cases.

An advantage of this solution is that the explanation of the mistake is easy: mal-rules are associated with a description of the extra-grammaticalities they are intended for. That every extra-grammaticality has to be described first is however a disadvantage.

Exceptions can also be treated in this way. In that case there is not much difference from the solution in section 2.2. The main difference is that mal-rules have a special status among the formal devices. Therefore, they can be applied in a very controlled way, typically when no analysis was found with the normal rules. They function as fallback rules. They take as much space in the grammar as other rules, but remain unused as long as they are not needed.


2.4. Relaxation rules

Douglas presents an extension to a unification formalism where the rules are annotated with relaxation principles (Douglas and Dale 1992; Douglas 1995). They are similar to mal-rules, except that there is a stepwise and monotonic retraction of the constraints in a rule, until a solution can be found.

Suppose there is the rule in (2a) and the input in (2b).

(2) a.  s --> np vp:
            <np per> = <vp per>
            <np num> = <vp num>

    b.  “*They walks”

The rule rewrites S (sentence) to a sequence of a noun phrase NP and a verb phrase VP. The constraints (the part after the colon in the rule) restrict a certain feature path (e.g. <np per>) to have a certain value (e.g. 3), or to be re-entrant with another feature path, e.g. <vp per>. The input in example (2b) cannot be analysed with the rule in (2a) because <np num> = plural and <vp num> = singular, and these values are not the same. Only when the requirement is relaxed that the number values should be unifiable can the rule application succeed. That is what Douglas does: when no analysis can be found, the constraints are relaxed, but only stepwise. For each rule, the order in which the constraints should be relaxed is indicated by a cost. First only the constraints with a lower cost are relaxed, and this is recorded as the penalty.

This still requires a good deal of work from the grammar writer. (For instance, in the described approach, the linguist has to understand the interactions between the rules due to relaxations, in order to assign the right costs.)
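The stepwise relaxation with costs can be sketched as follows. This is only a hypothetical reconstruction of the idea: the constraint representation, the cost values, and the function names are invented, not Douglas's actual notation.

```python
def try_parse(constraints, np, vp, max_cost):
    """Attempt a rule application, allowing constraints whose cost is
    at most `max_cost` to be violated. Returns (success, penalty)."""
    penalty = 0
    for cost, feature in constraints:
        if np[feature] != vp[feature]:
            if cost > max_cost:
                return False, 0       # violated constraint is not relaxable
            penalty += cost           # relaxed: record the penalty
    return True, penalty

def parse_with_relaxation(constraints, np, vp):
    """Relax stepwise: raise the cost threshold until a parse succeeds,
    so the cheapest constraints are relaxed first."""
    for max_cost in sorted({0} | {cost for cost, _ in constraints}):
        ok, penalty = try_parse(constraints, np, vp, max_cost)
        if ok:
            return penalty
    return None

# Invented agreement constraints for "*They walks": number disagrees,
# and relaxing number (cost 1) is preferred over relaxing person (cost 2).
constraints = [(1, "num"), (2, "per")]
np = {"per": 3, "num": "plural"}
vp = {"per": 3, "num": "singular"}
assert parse_with_relaxation(constraints, np, vp) == 1
```

The returned penalty of 1 records that only the cheap number constraint had to be given up, which is the kind of ranking information the main proposal of this paper also relies on.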

2.5. Only changing the underlying formalism

The last technique in this section is one where there is no need to change the grammar to be able to deal with extra-grammaticalities, because the formalism takes care of it.

Krieger (1995) describes an approach in a typed formalism where all type combinations can be allowed for user-defined sets of types. His direct aim was to aid the grammar writer in debugging. The relaxation simply lies in the free combination of types from certain sets. (In principle, it can be done for all types, but then all rules would apply in all cases.)


Others (notable is the collection of papers in Schöter and Vogel 1995) have made similar attempts, but did not exploit the use of types. They can therefore only deal with a very limited set of cases.

In the following sections, we present a further solution in this category, but one with more general applicability. First, however, we briefly discuss statistical processing.

2.6. Statistical processing

One approach that has to be mentioned, as it is very successful in delivering analyses for unrestricted text, is statistical processing. Statistical parsers take as their basis an existing formalism, and extend it with automatically acquired probabilities. The strict comparison of the equality of categories is replaced by a softer one, one of probabilistic preference. Statistical parsers can easily deal with extra-grammaticalities and undescribed events like exceptions, and a grammar does not need to be adapted for parsing other texts and domains.

There is however no notion of an exception in the statistical component of a parser. A rule application or a constituent is more or less probable or preferable, but such parsers cannot make a distinction between grammaticality and extra-grammaticality. Statistical processing also relaxes the boundaries of Figure 1 by turning them into probabilities. The picture in Figure 1 (right) applies in the same way, except that there is no distinction between LI and LI′.3

3. Generalised unification

Having reviewed the existing solutions, we now present generalised unification. It modifies the definition of unification such that the rules of the grammar can also describe input which does not quite fit the grammar. This is achieved by relaxing the rules, and assigning a penalty to relaxations. Both actions are based on the type hierarchy.

3.1. Setting

As formal setting, we are using Carpenter’s Typed Feature Logic (1992). A linguistic theory that is based on Typed Feature Logic or a similar formalism, such as Head-driven Phrase Structure Grammar (HPSG) (Pollard and Sag 1994), consists of two parts: a type hierarchy (also called “signature”) and a grammar. The grammar contains the various rules and the lexical entries. The rules and

3. A distinction could be introduced by defining a probability threshold.


the lexical entries are all feature structures. The shape of the feature structures is determined by the type hierarchy.

3.1.1. Types

A type hierarchy is a finite partial order (more specifically a meet semi-lattice), i.e. a set that consists of a finite number of elements (types), which are ordered with respect to each other. The ordering relation is subsumption (⊑). If a type a subsumes another type b, then a is equal to or more general than b. An example of a hierarchy is shown in Figure 2.

category
├── verbal
│   ├── verb
│   └── gerund-as-verb   (also a subtype of gerund)
├── nominal
│   ├── noun
│   └── gerund-as-noun   (also a subtype of gerund)
├── gerund
│   ├── gerund-as-verb
│   └── gerund-as-noun
├── determiner
└── adjective

Figure 2. A sample hierarchy.

The most general type is category: everything in this hierarchy is a category. The other types are more specific. If there is no relation between two types, e.g. between determiner and adjective, then they have nothing in common (except for the supertype). All types (except for the most general one) have to have at least one supertype. Gerund-as-noun for instance has more than one supertype. That means that if we find something of which we know that it is nominal and gerund, we know it is gerund-as-noun. Each type inherits the properties of its supertypes and adds its own. Without features (see section 3.1.2), properties are only conceptual and not visible.

Two types are compatible if one subsumes the other, or if they have a common subtype, for instance nominal and gerund-as-noun, or verbal and gerund. The unification (⊔) of two types is the most general subtype that they have in common. For the pairs of types that we just mentioned, the unifications are gerund-as-noun and gerund-as-verb respectively. If there is no such common subtype, then the unification fails.4

From a practical viewpoint, typedness helps to ensure the correctness of a grammar.

4. The type hierarchy is constructed such that, if there are common subtypes, there is always a most general one. This can be done automatically, and need not be the concern of the grammar writer.
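As an illustration, the hierarchy of Figure 2 and the two type operations just described can be sketched in a few lines of Python. This is a hypothetical reconstruction for exposition, not the author's implementation; all names are invented.

```python
# Hierarchy of Figure 2, encoded as a map from each type to its supertypes.
SUPERTYPES = {
    "category": [],
    "verbal": ["category"], "nominal": ["category"], "gerund": ["category"],
    "determiner": ["category"], "adjective": ["category"],
    "verb": ["verbal"],
    "gerund-as-verb": ["verbal", "gerund"],
    "noun": ["nominal"],
    "gerund-as-noun": ["nominal", "gerund"],
}

def ancestors(t):
    """All types that subsume t, including t itself."""
    result = {t}
    for s in SUPERTYPES[t]:
        result |= ancestors(s)
    return result

def subsumes(a, b):
    """a subsumes b iff a is equal to or more general than b."""
    return a in ancestors(b)

def unify(a, b):
    """Most general common subtype of a and b, or None (failure)."""
    common = [t for t in SUPERTYPES if subsumes(a, t) and subsumes(b, t)]
    # Footnote 4: the hierarchy guarantees a most general common subtype.
    for t in common:
        if all(subsumes(t, u) for u in common):
            return t
    return None
```

With this fragment, `unify("nominal", "gerund")` yields `"gerund-as-noun"`, while `unify("determiner", "adjective")` fails.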


386 Frederik Fouvry

3.1.2. Features

It is also possible to define features on types. A type noun could for instance have a feature CASE. All subtypes of noun will then inherit this feature. When a feature is defined, the possible values of the feature also have to be determined. It makes sense to limit them to cases that exist in the language (so that, say, a noun with case singular is impossible). To that effect, a case hierarchy is defined, and CASE is required to have a value that is subsumed by case. With this information, we can build a feature structure, for instance for a nominative noun (see Figure 3):

[ noun
  CASE  nominative ]

Figure 3.

The feature structure can alternatively be represented as a graph as in Figure 4:

noun ──CASE──→ nominative

Figure 4.

There are no limitations on the number of features; the only requirement is that every type carries all the features that it was defined with or that it inherited. Feature structures can be nested as well. In Typed Feature Logic, all grammatical categories are feature structures, such as in Figure 5:

[ sign
  ORTHOGRAPHY  "him"
  HEAD  [ noun
          CASE  accusative ] ]

Figure 5.

3.1.3. Unification

In a unification-based grammar, the rules are similar to the rules in a CFG.5

With CFGs, the applicability of rules is determined by an equality test between the category of the constituent and the rule daughter. With feature structures,

5. This is only true of most implementations. In theoretical HPSG for instance, the word order is assumed to be specified independently from the rules (Pollard and Sag 1987).


the comparison is done by compatibility. Two feature structures are compatible when they do not contain any conflicting properties. The absence of a certain property is not considered to be significant. With typed feature structures, the type values have to be compatible as well.

[ f              [ f            [ f
  F [ a ]    ⊔     F 1      =     F 1 [ b ]
  G [ b ]          G 1 ]          G 1
  H [ c ] ]                       H [ c ] ]

Figure 6.

In Figure 6, H is not present in the second unificand, but it is in the result. Unification is monotonic: it keeps all information from its input. In the second feature structure, we see a reentrancy. This means that the value of F and G is one and the same object (token identity). The two features are only different ways to reach that feature structure (here of unspecified value). The graph representation for this feature structure is shown in Figure 7:

      F
  f ○ ───→ ○
    └────↗
      G

Figure 7.

A consequence is that any property that is imposed on the value of F also holds for the value of G and vice versa. The result of the unification as it is shown in Figure 6 is correct if b is a subtype of a. If that were not the case, and a and b were not compatible, the unification would fail.
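The effect of a reentrancy under unification can be sketched as follows. This is a simplified, hypothetical model in which atomic values must be equal rather than merely compatible in a type hierarchy; the `Node` class and the destructive `unify` are inventions for illustration.

```python
class Node:
    """A feature-structure node: an optional atomic value plus features."""
    def __init__(self, value=None, **feats):
        self.value = value
        self.feats = dict(feats)

def unify(a, b):
    """Destructively unify b into a (simplified: atoms must be equal)."""
    if a is b:
        return a
    if a.value is not None and b.value is not None and a.value != b.value:
        raise ValueError("unification failure: %r / %r" % (a.value, b.value))
    if a.value is None:
        a.value = b.value
    for f, n in b.feats.items():
        a.feats[f] = unify(a.feats[f], n) if f in a.feats else n
    return a

# Reentrancy: F and G point to the same node, so information imposed on
# one is automatically imposed on the other.
shared = Node()
reentrant = Node(F=shared, G=shared)
other = Node(F=Node("b"), H=Node("c"))
result = unify(reentrant, other)
assert result.feats["G"].value == "b"   # G received b's value via F
assert result.feats["H"].value == "c"   # H is carried over (monotonicity)
```

Had `other` also specified an incompatible value on G, the shared node would receive both values and the unification would fail, as the text describes.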

Rules in unification-based grammars are normally specified such that information is shared (through a reentrancy) between the mother node (the left-hand side of the rule) and the daughters. This way, properties of the daughters can be passed up in the tree, and that can be exploited for linguistic description or for bookkeeping information. The fact that both the mother and one daughter in the rule in Figure 8 (A) are verbal (V stands for a verb) can easily be expressed by a reentrancy between the category features, as in (B).

(A) VP → V NP

(B) [ CATEGORY 1        [ CATEGORY 1 verb        [ CATEGORY noun
      LEVEL phrase ] →    LEVEL non-phrase ]       LEVEL phrase ]

Figure 8.

Figure 8.


Before continuing, we summarise what has been presented. The rules in our formalism are like context-free rules, except that the categories are not atomic values, but typed feature structures. To test whether a rule can be applied using a certain constituent, it is unified with one of the rule daughters. If the unification succeeds, the application can proceed with other constituents for the remaining rule daughters, until a new constituent is found. Otherwise, other constituents and other rules need to be tried out.
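The rule-application test just summarised, compatibility rather than category equality, can be sketched for flat feature structures. This is a hypothetical helper, not the author's code; with full typed feature structures the test would recurse and consult the type hierarchy.

```python
def compatible(rule_daughter, constituent):
    """Two flat feature structures (dicts) are compatible when they share
    no conflicting values; an absent feature is not significant."""
    return all(f not in constituent or constituent[f] == v
               for f, v in rule_daughter.items())

# A rule daughter asking only for a noun accepts a case-marked noun:
assert compatible({"CATEGORY": "noun"}, {"CATEGORY": "noun", "CASE": "nom"})
# ... and the missing CASE on the constituent is no obstacle either:
assert compatible({"CATEGORY": "noun", "CASE": "nom"}, {"CATEGORY": "noun"})
# A conflicting value blocks the rule application:
assert not compatible({"CATEGORY": "verb"}, {"CATEGORY": "noun"})
```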

3.2. The approach

Our goal is to be able to describe extra-grammaticalities without the necessity to extend our grammar. Therefore, we need the following:

– Every sentence or utterance can be described.
– An analysis has to contain a measure of how much the grammar had to be relaxed.
– Extra-grammaticalities can be distinguished from each other.

The intuition behind our solution is that we keep track of all information that the grammar provides in the form of lexical entries and rules for a given input. The information in the grammar consists of the feature structures and the types that were used to create them. In grammatical sentences, the information remains intact, while in extra-grammatical sentences, some of it has to be discarded in order to comply with the grammar. One might say that the grammar rules act as a filter.

3.2.1. An analysis for every sentence

In order to be able to describe every sentence, we need to make any unification possible. Then there will always be at least one tree that spans the input. The unification for every set of types needs to be defined (and is thereby allowed). That is achieved by an automatic (order-preserving) extension of the type hierarchy to a lattice. Unifications like determiner ⊔ adjective are now possible. The notation for the result is: determiner ∨ adjective.
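This lattice extension can be sketched by representing an otherwise undefined unification result as a set of types. The hierarchy fragment and the names below are hypothetical illustrations, not the formalism's actual machinery.

```python
# Minimal hierarchy fragment: direct subtype links only (hypothetical).
DIRECT_SUBTYPE = {
    ("category", "determiner"), ("category", "adjective"),
    ("category", "nominal"), ("nominal", "noun"),
}

def subsumes(a, b):
    """a subsumes b in this fragment (direct links or identity)."""
    return a == b or (a, b) in DIRECT_SUBTYPE

def relaxed_unify(a, b):
    """Ordinary unification where defined; otherwise the lattice extension
    supplies a new element, the disjunction of the incompatible types."""
    if subsumes(a, b):
        return b
    if subsumes(b, a):
        return a
    return frozenset({a, b})          # e.g. determiner ∨ adjective
```

So `relaxed_unify("nominal", "noun")` is just `"noun"`, while `relaxed_unify("determiner", "adjective")` no longer fails but returns the new lattice element determiner ∨ adjective.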

A second kind of unification is necessary. With the unification of the previous paragraph, rule applications will only collect all values that are encountered during the analysis. This cannot tell us how much a value deviates from the grammar, since we do not have any way of distinguishing between the rule values and the input values. Therefore, the values in rules are set up as filters for the values coming from constituents (see Figure 9).

If the value in the rule is more general than the value in the rule daughter, then the latter stays as it was. For instance, a rule that specifies that a category


Figure 9. A grammar rule acts as a filter. The grey square is the rule. The unificands represent fragments of a type hierarchy. The unification of the two only leaves the compatible information visible.

should be of type category does not impose any restrictions. Therefore all values are possible, as in the following coordination rule (Figure 10):

[ category 1 ] → [ category 1 ] “and” [ category 1 ]

Figure 10.

Determiners and adjectives can be coordinated: the category will be determiner ∨ adjective. When however the rule value is not compatible with the constituent that is used, then the incompatible information is removed through a generalisation:

[ category 1 ] → [ category 1 verb ] [ category noun ]

Figure 11.

In Figure 11, determiner fits neither daughter, and therefore the type of the result would be the generalisation (⊓) of determiner and, for instance, verb, which is category. Because the grammar rules need to be maintained, the ultimate category value of the new constituent will be verb.

For every sentence there is a description, but type information is lost, either through a generalisation or because incompatible type values are put together.
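The filter behaviour of Figures 10 and 11 can be sketched as follows. The fragment is hypothetical (determiner, verb, and noun sitting directly under category), and the helper simply pairs the resulting value with the number of generalisation steps lost.

```python
# Hypothetical hierarchy fragment: each type's single parent.
PARENT = {"verb": "category", "noun": "category", "determiner": "category"}

def filter_value(rule_val, input_val):
    """Apply a rule value as a filter to an input value.
    Returns (resulting value, generalisation steps lost)."""
    if rule_val == "category":        # Figure 10: no restriction imposed
        return input_val, 0
    if input_val == rule_val:
        return input_val, 0
    # Figure 11: incompatible input. The rule's value is maintained;
    # the input's information is generalised away, and we count the steps.
    steps, t = 0, input_val
    while t != "category":
        t = PARENT[t]
        steps += 1
    return rule_val, steps

assert filter_value("category", "determiner") == ("determiner", 0)
assert filter_value("verb", "determiner") == ("verb", 1)
```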

3.2.2. A measure for extra-grammaticality

The disappearance of type information can be quantified by counting how much of the information that was supplied by the input has to be discarded. The measure we shall use here is the number of steps that need to be taken in the type hierarchy to keep the feature structures compatible with the grammar. (Other measures are equally possible, as long as the mapping between the type hierarchy and the ordering on the information quantity is order-preserving. That means that a supertype should always have a smaller weight than a subtype.)


With a generalisation, the "amount of extra-grammaticality" is the number of types between the result of the generalisation and the original type.

For a set of incompatible types, such as determiner ∨ adjective, several ways of counting are possible. The most conservative one is the sum of the number of steps that need to be taken to make the set compatible with the desired result. Since we do not know that value, we can take the smallest of all possible values. For determiner ∨ adjective, that is 1: regardless of whether determiner or adjective is taken out, the removed information is 1, which is the distance to the most specific common supertype, category.
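This most conservative count can be sketched in a few lines. The hierarchy fragment is hypothetical; `loss` keeps the most expensive member of the incompatible set and sums the cost of removing the rest.

```python
# Hypothetical fragment: determiner and adjective sit directly under category.
PARENT = {"determiner": "category", "adjective": "category", "category": None}

def steps_up(t, target):
    """Number of generalisation steps from t up to target."""
    n = 0
    while t != target:
        t = PARENT[t]
        n += 1
    return n

def loss(types):
    """Minimal information loss for an incompatible set of types: remove
    all but one of them, each costed as its distance to the most specific
    common supertype (here always 'category' in this fragment)."""
    costs = [steps_up(t, "category") for t in types]
    return sum(costs) - max(costs)    # keep the most expensive one

assert loss({"determiner", "adjective"}) == 1
```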

Additionally, we count the number of times we have seen every type value, and multiply this with the loss for every type. This expresses that values that have been provided more frequently in the input sentence should count for more, also after unification. In the sentence in (3), there is twice evidence for feminine ("la" and "voiture"), and only once for masculine ("abandonné"). Discarding more frequent information should be penalised more heavily.

(3) La   voiture  était  *abandonné.
    the  car      was     abandoned

    'The car was abandoned.'

For each type that is encountered, not only the type itself has to be counted but also all its supertypes, as shown in Figure 12:

plural:1          singular:1          plural:1  singular:1
   |         ⊔       |          =         \        /
number:1          number:1              number:2

Figure 12.

The value of number in the result is 2 because that is the sum for the values on number in the unificands. Plural and singular only occurred once, and therefore their occurrences do not change in the result. This is done to make sure that no information is lost unnecessarily: supertypes that are compatible with a rule value are always kept.

The occurrence values are in principle independent of each other, with the restriction that the occurrence of a supertype always has to be at least as great as that of any of its subtypes. This reflects the fact that when a type is used, the properties that were inherited from the supertypes are used as well. A generalisation should not throw this away.
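The occurrence counting of Figure 12 can be sketched with counters. The fragment of the number hierarchy below is a hypothetical illustration.

```python
from collections import Counter

# Fragment of the number hierarchy (hypothetical).
SUPERS = {"plural": ["number"], "singular": ["number"], "number": []}

def with_supertypes(t):
    """A type together with all its supertypes."""
    out = [t]
    for s in SUPERS[t]:
        out += with_supertypes(s)
    return out

def count_unify(a, b):
    """Combine the occurrence counts of two unificands: each type and all
    of its supertypes are counted on both sides and summed."""
    return Counter(with_supertypes(a)) + Counter(with_supertypes(b))

counts = count_unify("plural", "singular")
assert counts["number"] == 2     # evidence for number from both sides
assert counts["plural"] == 1 and counts["singular"] == 1
```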


When there is no information loss in an analysis, it means that the input could be described by the grammar, and the result looks precisely as if it had been constructed by a non-tolerant grammar.

3.2.3. Distinction between extra-grammaticalities

The requirement that extra-grammaticalities are distinguishable is needed to avoid the trivial solution that a single or a few failure types are defined to which all failures are reduced. In some cases, this may be a valid approach, but for our needs it is necessary to make the distinction between masculine ∨ feminine ∨ neuter and feminine ∨ neuter: the first unification is clearly worse.

The distinction is also useful when it comes to diagnosis or corrections. Values that were not used at all are unlikely candidates for a correction.

3.2.4. Reentrancies

Reentrancies have a special status. They are information carriers, but at the same time, they reduce the information weight in a feature structure. This is demonstrated by the following. According to the definition of subsumption in Carpenter (1992), the feature structure in Figure 13 (A) is subsumed by the feature structure in Figure 13 (B). When we add up the total weight for the two feature structures, assuming that a has an occurrence of 1 and a weight of 3, and the top node simply 1 and 1, we get for Figure 13 (A): 1×3 + 1×1 = 4; for Figure 13 (B): 1×3 + 1×3 + 1×1 = 7.

(A) [ F 1 a        (B) [ F a
      G 1 ]              G a ]

Figure 13.

This is justified because the feature structure nodes are the basis for counting the weight. When there are fewer nodes, the weight should be smaller. It does not make a difference over how many paths the nodes are accessible.
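Per-node counting, so that a shared (reentrant) node is counted only once however many paths reach it, can be sketched as follows. The weights 3 and 1 are the hypothetical ones from the example; the dict-based node representation is an invention for illustration.

```python
def total_weight(root, weights):
    """Sum node weights over a feature structure, counting each node once
    regardless of how many paths lead to it."""
    seen = set()
    def visit(node):
        if id(node) in seen:          # reachable via several paths:
            return 0                  # counted only once
        seen.add(id(node))
        return weights[node["type"]] + sum(
            visit(v) for v in node.get("feats", {}).values())
    return visit(root)

weights = {"top": 1, "a": 3}
shared = {"type": "a"}
# Figure 13 (A): F and G share one node of type a.
reentrant = {"type": "top", "feats": {"F": shared, "G": shared}}
# Figure 13 (B): F and G have separate nodes of type a.
separate = {"type": "top", "feats": {"F": {"type": "a"}, "G": {"type": "a"}}}

assert total_weight(reentrant, weights) == 4   # 1×3 + 1×1
assert total_weight(separate, weights) == 7    # 1×3 + 1×3 + 1×1
```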

As separate information carriers, reentrancies could be weighted separately, but not in all cases: only information that can be lost should be taken into account. Not all reentrancies may be lost, since that would deeply affect processing with the grammar rules. Some reentrancies, viz. the ones between mother and daughters, are fundamentally responsible for keeping the rule application procedure in good order. The other reentrancies might be counted explicitly.


It is unfortunately not clear how the weights should be assigned, counted and processed, especially in combination with the type weights.

3.3. A few examples

Some examples will show how the technique can be used in grammar writing.

In Ancient Greek, the form of the verb for third person plural shows some

idiosyncrasy: for the genders masculine and feminine, it has the normal plural form (e.g. for the copula eisi(n)), but for neuter, the verb form is the same as the one for third person singular (esti(n)).6 Specifying a reentrancy between the agreement features of the verb and the subject does not work correctly (in a non-tolerant formalism): it fails for third person plural verbs with neuter subjects. There are a few alternative analyses. One works with a special form of the third person plural for neuter. Another states that neuter plural is in some sense (e.g. semantically) really felt as a singular, such as a collective noun (this is the "traditional" explanation). Implementations would probably tend to favour the first solution. There is a third analysis, which is only possible in a framework such as the one we are presenting: there, the phenomenon is left out of the grammar stricto sensu. The phenomenon is only treated by a tolerant formalism (it belongs to the union of LN and the set difference of LI′ with LI). A value clash will be detected when the agreement features are unified: singular ∨ plural, and in the interpretation of the results, this specific incompatibility may be allowed, by explaining it as an exception, not an error. The extra-grammaticality can be described, whereas it was not with previous formalisms. The explanation of the analysis however is not part of the presented formalism extension, since it requires more knowledge than is present in a grammar.

The situation is somewhat more complicated when it is necessary to rule out the combination of a neuter plural subject with the third person plural of the copula. In that case, all agreement checks must be treated in the same way, i.e. they are all treated inside or outside of the grammar. That restriction is due to the way in which this phenomenon is analysed here: reentrancies cannot be considered as exceptions, and therefore the fact that there is an exceptional non-reentrancy cannot be expressed. In that case, other means have to be chosen to describe the phenomenon, e.g. ones where the reentrancies are not the direct reflection of the linguistic description. We do not pursue this specific problem further here, as it would take us too far.

6. An agreement example may be misleadingly simple; many exceptions are more complex. However, in a unification-based grammar framework, everything is described as an agreement phenomenon: the values have to unify.


Another example is the use of transitive verbs as intransitives in English, such as to eat. Normally, the grammar requires that transitive verbs have a direct object. When there is none, as in Is he eating again?, the verb frame is incomplete. When that has not been treated in the grammar, this sentence would be impossible to describe. The reason for this defect cannot be given by the grammar, since there is no means of doing so, except by having an (external) expert manually peruse the analysis. With tolerant parsing, an analysis can be given, and it states that a direct object was expected by the verb, but not given in the sentence. Concretely, it looks like the following: in the rule that combines subject and verb, the verb is required to have an empty complements list (all objects have been found). The verb from the input on the other hand still has an object slot open. These two values, an empty list and a singleton list, clash. This example also shows that this technique does not make a distinction between errors and exceptions.
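The complements-list clash can be sketched schematically. This is a hypothetical check standing in for the unification machinery, simply contrasting strict failure with the tolerant recording of the clash.

```python
def comps_check(rule_comps, verb_comps, tolerant=False):
    """The subject–verb rule expects an empty complements list; a verb
    with an object slot still open clashes with that expectation."""
    if rule_comps == verb_comps:
        return "ok"
    if tolerant:
        # Tolerant processing records the clash instead of failing.
        return "clash recorded: expected %r, found %r" % (rule_comps, verb_comps)
    raise ValueError("unification failure")

assert comps_check([], []) == "ok"                       # intransitive use OK
assert "clash recorded" in comps_check([], ["NP"], tolerant=True)
```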

There also exist verbs that cannot be used intransitively (e.g. to take, to lift). When such a verb is used incorrectly, the same description will be found: "a direct object was expected by the verb, but not given in the sentence." It is important to use a description of the problem, because in the system there is not enough information to distinguish between errors or mistakes on the one hand and grammatical but unusual behaviour on the other. That the formalism can return a description at all that relates to the grammar is novel. Further steps, such as correcting the sentence or the grammar, and making a diagnosis, are not within the scope of this paper.

We can also take an example from morphology, e.g. from Corbett (this volume) the inflection of the Slovene človek. The exception in this paradigm consists of two factors: suppletion and syncretism. In this case, the (somewhat unusual) goal is to generate the table describing the inflection, and not to analyse input containing the word. The exception lies in the relation between the different word forms in the paradigm (knowledge of which is usually not available in a grammar). The linguistic analysis where the phenomenon is integrated into the grammar describes the combination possibilities of the stem človek with the different case endings: it combines with singular and dual nominative, accusative, dative or instrumental endings. The other stem, ljud, only combines with plural and dual genitive and locative endings. When the stem is combined with an ending, their compatibility is checked, and if their requirements unify, the word can be formed. This is the situation where the exception has been incorporated into the grammar. When on the other hand one of the forms has not been entered in the grammar, the default case will occur, i.e. the same stem is used for all forms. (We do not take into account any phonological or orthographic changes.) If for instance only človek is known for the entire paradigm, then there


are two different scenarios. In the first, človek creates too many forms (e.g. also *človeki for genitive dual and plural). That is overgeneration, and it cannot be automatically detected. Undergeneration however can. In the second case, ljudi is found in the table and cannot be assigned a proper analysis, because it is not known. This can be detected, and a possible analysis proposed based on the endings. It is not possible to automatically relate the two stems as belonging to one paradigm, since that is an inductive step. The rule that suppletion violates was expressed above by requiring that all forms should use the same stem. As for the syncretism, we can describe it as a rule itself (genitive dual and plural have to be the same form), or as an exception, in which case the rule states that the forms have to be different.

We have described the exceptions by creating rules for the unexceptional case, here relations of stems within the paradigm, and where exceptions occur, we find them as value clashes. Higher-order exceptions are treated by providing one or more rules for each dimension. The interaction of the dimensions is visible in the different explanations that can be given for the exception (the genitive dual and plural peculiarity is either a violation of the rule that the stems have to be identical, or of the rule that the forms have to be different).

4. Discussion

In the present section, we discuss some interesting properties of the proposed formalism.

4.1. Relation to the original formalism

The proposed changes only extend Typed Feature Logic. The original results of the grammar remain precisely the same, with the difference that many failures are now turned into a linguistic analysis. The most important property of unification is monotonicity. This means that information that has been added to the system does not disappear. Do the proposed changes violate monotonicity, since information is removed and sometimes even changed (see Figure 11)? In the extended formalism, the original coverage of the grammar is not touched, hence monotonicity is maintained. Input that cannot be analysed by the grammar was rejected anyway. (Strictly speaking, that also breaks monotonicity: the sentence is thrown away.) In these cases, and only in these cases, the value accumulation is not monotonic (in the case depicted in Figure 11, determiner becomes verb). The information weight however always monotonically decreases, in the case of both grammatical and extra-grammatical input.


We have not investigated the relation to formalisms that use defaults (Bouma 1992; Briscoe, Copestake and de Paiva 1993; Lascarides et al. 1996). Untyped formalisms such as the one used by Lexical-Functional Grammar (LFG) (Bresnan 1982; Dalrymple et al. 1995) can be treated as well, as an untyped system is essentially a typed one with a very flat hierarchy.

A nice consequence of the fact that the formalism is only an extension is that there is no need to change existing grammars that have been written for the original formalism.

4.2. Philosophical status of the grammar

With the proposed formalism, extra-grammatical input is within the reach of linguistic description, without being in the grammar. We believe this is novel. Without a (correct) target structure, description of ill-formed input was previously impossible. Now alternatives can be computed on the basis of the input (and the grammar). This has interesting consequences for the status of a grammar. Traditionally, a grammar should describe everything that is inside the language. Now, the definition of "grammar" can become more flexible. Linguists may decide for themselves what status they assign to the tolerant module in the grammar formalism. The two extremes of the spectrum are:

– the grammar is a minimal well-defined core, and all deviations should be taken care of by tolerant processing
– the tolerance module is just a fall-back for cases that were missed in the grammar

Not a great deal needs to be done to realise this choice: the decision on what should be put into the grammar is the only difference. The first extreme position is realised when no more grammar work takes place, the second one when every newly discovered extra-grammaticality is integrated into it (the traditional grammar implementation work). The best practical choice probably lies somewhere in the middle.

This approach explicitly takes into account the expectation of an incomplete grammar. It does not deal with extra-grammaticalities to the end (see section 4.4), but it makes clear, in a consistent and systematic way, what the grammar would do with them.

The proposed model makes no principled distinction between exceptions and errors. It is up to the grammar writer to decide on their status. An error lies outside the grammar, an exception lies inside it. After deciding where a phenomenon lies, its place needs to be explicitly defined in the grammar. When an error is more regular than the exception, then it will not be found if it is


in the grammar (detection of overgeneration within the system is impossible), but the exception will be detected as a problematic case, which may point the grammar writer to the overgeneration. If the exception is in the grammar, the overgeneration may still exist, in which case it is harder to detect.

4.3. Implications of the use of the module

Because the extension has taken place in the formalism, not the entire grammar search space needs to be checked. There is no hard need for a preliminary description of extra-grammatical phenomena, since descriptions are generated anyway. The results of the analyses remain to be interpreted, and if necessary, converted back to a normal description. How this is precisely done depends on the context in which the system is used (see section 4.4). The interpretation of descriptions containing a problem is very flexible, but complex at the same time.

The strength of the approach is at the same time also its weakness: it exploits the properties of the formalism to provide a treatment for exceptions. Phenomena that cannot be described with this formalism fall outside the scope of this solution. We believe however that the technique is generally applicable to other formalisms that are based on a partial order.

4.4. Diagnosis and correction

Obtaining an analysis that has dealt with extra-grammatical phenomena is not the final goal. At that point, the system has only made the distinction between grammatical and extra-grammatical input. The analysis still needs to be interpreted, to see what the extra-grammaticality precisely consists of, in order to decide whether the input contained an error or an exception. If it is the former, there is no guarantee that the analysis will be the correct or wanted solution. If it is a real exception, nothing more needs to be done. The construction of the right description is the task of an interpretation module. On the basis of the obtained solutions, the interpretation module has to find what the problem really is.

It is possible to tune the grammar and the type weights in order to obtain a differently ranked set of results. What the best strategy is to do this remains to be investigated. Weights could for instance be automatically learnt on the basis of a manually re-ranked test corpus (similar to the work in the Redwoods project (Oepen et al. 2002)).

4.5. Moravcsik

Where does this paper fit in the classification of Moravcsik (this volume)? She discusses how exceptions are dealt with in linguistic descriptions. We have developed a technique to detect rule deviations, and to describe precisely where the problem lies (from the viewpoint of the implemented grammar). Whether it is an exception and how it should be treated is left to the grammar writer.

5. Conclusion

We have presented an extension of Typed Feature Logic to deal with extra-grammatical phenomena, such as exceptions. Through an enlarged type hierarchy, it is made sure that there is at least one analysis. The set of analyses is ranked according to ungrammaticality, which is defined as deviance from the grammar (in terms of type distance and lost information). Some examples have shown that it can deal with more than only agreement extra-grammaticalities. Contrary to standard formalisms, our proposal makes a qualified distinction between alternatives. This gives the linguist, both computational and theoretical (or also the system that processes the output from the grammar), the option of having a closer look at extra-grammaticalities. The relaxation has been obtained not by changing the grammar, but by modifying the formalism, exploiting the fact that types provide a natural carrier for graceful degradation.

Acknowledgements

This research was funded by a University of Essex Studentship and by the German Research Fund DFG (through the projects SFB 340 B 4/B 8 and SFB 378 B 4/MI 1 perform). Thanks for helpful comments go to the two anonymous reviewers, the editors, Doug Arnold, Péter Dienes, and the audiences of the talks I gave on this topic.

References

Baldwin, Timothy, Emily M. Bender, Dan Flickinger, Ara Kim, and Stephan Oepen
2004 Road-testing the English Resource Grammar over the British National Corpus. In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004), 2047–2050.

Bouma, Gosse
1992 Feature structures and nonmonotonicity. Computational Linguistics 18 (2): 183–204.

Bresnan, Joan (ed.)
1982 The Mental Representation of Grammatical Relations. (Series on Cognitive Theory and Mental Representation) Cambridge, Massachusetts/London, England: MIT Press.

Briscoe, Ted, Ann Copestake, and Valeria de Paiva (eds.)
1993 Inheritance, Defaults and the Lexicon. (Studies in Natural Language Processing) Cambridge: Cambridge University Press.

Carpenter, Bob
1992 The Logic of Typed Feature Structures: With Applications to Unification Grammars, Logic Programs and Constraint Resolution. (Cambridge Tracts in Theoretical Computer Science 32) Cambridge: Cambridge University Press.

Dalrymple, Mary, Ronald M. Kaplan, John T. Maxwell III, and Annie Zaenen (eds.)
1995 Formal Issues in Lexical-Functional Grammar. (CSLI Lecture Notes 47) Stanford: CSLI Publications.

Douglas, Shona
1995 Robust PATR for error detection and correction. In Nonclassical Feature Systems, Andreas Schöter, and Carl Vogel (eds.), 139–156. (Edinburgh Working Papers in Cognitive Science 10) Edinburgh: Centre for Cognitive Science, University of Edinburgh.

Douglas, Shona, and Robert Dale
1992 Towards robust PATR. In Proceedings of the Fifteenth International Conference on Computational Linguistics. Nantes, 23–28 August 1992. Vol. 2, Hans Karlgren (ed.), 468–474. International Committee on Computational Linguistics.

James, Carl
1998 Errors in Language Learning and Use: Exploring Error Analysis. (Applied Linguistics and Language Study) London/New York: Longman.

Krieger, Hans-Ulrich
1995 TDL: A type description language for constraint-based grammars. Foundations, implementation, and applications. Ph.D. diss., Department of Computational Linguistics and Phonetics, Saarland University, Saarbrücken, Germany. [Published in 1998 as Vol. 2 of the Saarbrücken Dissertations in Computational Linguistics and Language Technology.]

Lascarides, Alex, Ted Briscoe, Nicholas Asher, and Ann Copestake
1996 Order independent and persistent typed default unification. Linguistics and Philosophy 19 (1): 1–90. [Revised version of ACQUILEX II WP 34 (August 1994/March 1995). Also chapter 3 in Schöter and Vogel (1995: 61–136).]

Oepen, Stephan, Kristina Toutanova, Stuart Shieber, Christopher Manning, Dan Flickinger, and Thorsten Brants
2002 The LinGO Redwoods Treebank: Motivation and preliminary applications. In Proceedings of the 19th International Conference on Computational Linguistics. Taipei, Taiwan: International Committee on Computational Linguistics.

Pollard, Carl, and Ivan A. Sag
1987 Information-Based Syntax and Semantics. Vol. 1: Fundamentals. (CSLI Lecture Notes) Stanford: CSLI Publications.

Pollard, Carl, and Ivan A. Sag
1994 Head-Driven Phrase Structure Grammar. (Studies in Contemporary Linguistics) Chicago/London/Stanford: University of Chicago Press.

Schöter, Andreas, and Carl Vogel (eds.)
1995 Nonclassical Feature Systems. (Edinburgh Working Papers in Cognitive Science 10) Edinburgh: Centre for Cognitive Science, University of Edinburgh.


Explanation and constraint relaxation

Pius ten Hacken

1. The special position of computational linguistics

Fouvry discusses the treatment of exceptions in the context of language processing. This raises interesting issues, because this perspective combines two rather different conceptions of science. On the one hand, language processing can be studied as an empirical phenomenon taking place in humans. In this approach, linguistics is an empirical science aiming to describe part of the observable world and explain it in terms of its underlying system. On the other hand, language processing can be seen as a task to be formalized in the sense that a computer can perform it. As I have argued elsewhere (ten Hacken 2001, 2007a), computational linguistics is an applied science. This means that it is not concerned with describing and explaining the world, but with solving practical problems and explaining why the solution works.

In the case of language processing, the empirical science approach asks for the formulation of a hypothesis about the human parser, whereas the applied science approach requires the specification of a working computer program. Both approaches are valid from a scientific point of view, but it is important to distinguish them, because an optimal solution for the one is typically not adequate for the other. An explanation of human language processing does not have to be the basis for a computer program. Conversely, a running computer program does not have to explain anything about how humans process language. In fact, a computer program mimicking human language processing is likely to combine the worst of both worlds. There is no reason to expect that computers as we know them today have the same limitations as human brains.

It is important to keep this in mind when we consider the recognition problem. In the shape in which it is studied by Fouvry, this problem concerns the matching of written input with the structures generated by a formal grammar as programmed on a computer.

The computational perspective also opens up an ambiguity in the use of the term exception. Apart from the familiar linguistic sense of ‘irregularity’, this term is also used in computer science in the sense of ‘undescribed event’. The latter means that the machine enters a state that is not foreseen in the programme. If nothing special has been done about it, an exception in this sense leads to a crash.
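The computer-science sense can be illustrated with a minimal sketch (the fragment is my own, in Python purely for concreteness; nothing about Fouvry's system is implied):

```python
# An 'exception' in the computer-science sense: an event the program
# does not foresee. If nothing special has been done about it, the
# program terminates with an error ("crash"); handling it avoids that.
def lookup_plural(lexicon, stem):
    return lexicon[stem]  # raises KeyError for any unlisted stem

lexicon = {"cat": "cats"}  # hypothetical toy lexicon

try:
    plural = lookup_plural(lexicon, "ox")  # 'ox' is an undescribed event
except KeyError:
    plural = None                          # caught instead of crashing

print(plural)  # -> None
```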

2. Computational linguistics, competence, and exceptions

Fouvry’s Figure 1 gives a good intuitive representation of the problem of grammar writing. In the light of the preceding discussion of the status of computational linguistics, however, it is useful to distinguish two interpretations of this figure.

In the context of linguistics as an empirical science, the grammar is a theory of the speaker’s competence. The competence is not directly observable, and the techniques for collecting relevant data contribute to the vagueness at the borders of the object. The grammar need not be a fully formalized system. It can leave open more than one option where not enough data are available to choose one.1

In the context of computational linguistics (CL) as an applied science, the nature of the objects represented in Figure 1 is entirely different. The solid object with vague boundaries represents the potential input to the processing system. The clear object with precise boundaries is the coverage of the formal grammar as implemented in the processing system. Therefore, in the CL perspective we have a partial match between two sets of sentences. The vagueness of the boundaries of the input set should be interpreted as a result of the uncertainty about which sentences may actually appear in the input.

It might be tempting to collapse the two interpretations into one, taking competence as the underlying source of the input. In trying to do so, however, a number of problems emerge. One category of problems concerns the relation between the sentences found in the input and the competence. Another category concerns the type of entity the competence belongs to.

The first category relates to the distinction between competence and performance. As explained by Chomsky (1965), the input sentences of a processing system belong to performance, which does not reflect competence directly.

1. Of course, the grammar should not depend on undefined intuitive notions, as in some traditional approaches to grammar. The approach to grammar intended here, and described in more detail in ten Hacken (2006, 2007b), corresponds to the one adopted in a wide variety of approaches to generative grammar, although it excludes strictly formalized approaches such as Generalized Phrase Structure Grammar, cf. Gazdar et al. (1985).


There are at least three factors that lead to discrepancies. First, other types of knowledge are involved in constraining and selecting what sentences are produced, e.g. pragmatic knowledge. Secondly, competence is used creatively and according to free will. This can also lead to less than fully grammatical sentences used for a particular effect. Finally, on the track from the mental production of a sentence to its realization, distracting factors may lead to errors.

The second category of problems is more fundamental, because it relates to the nature of competence itself. In CL, competence is mapped to a set of grammatical sentences. This step is necessary for a match with the language produced by the formal grammar, which also constitutes a set of sentences. However, as elaborated by Chomsky (1980), it raises a number of problems. First, grammaticality is not a binary property. Although there are clear cases of grammatical and ungrammatical sentences, there are also many shades in between. Arguably, this can be represented by the shades in Figure 1. However, there is another problem. Competence is strictly individual. As Chomsky argues repeatedly, competence does not correspond to the notion of a named language, e.g. English, but to the state of an individual’s mind/brain.2 No two individuals can share a competence, because no two individuals share a mind/brain. They can only be similar. The problems this causes are compounded by the interaction of competence with other components of knowledge and distracting factors.

This analysis of competence leads to three types of exception, which seem to be collapsed in Fouvry’s discussion:

– idiosyncrasies in the central concept of competence, e.g. the fact that the plural of ox is oxen rather than *oxes;

– cases of more or less marginal grammaticality, resulting from the inherent grammaticality cline in competence, the use of pragmatic and creative resources in interaction with competence, and the degree of individual variation involved;

– errors occurring in the realization.

The first category, idiosyncrasies, is the more traditional linguistic notion of exception. The other two categories correspond to the undescribed events in computer science. The difference between them is that in the former case, marginal cases, a human reader (for text) will be able to explain the event in terms of various factors intended to interact with competence, whereas in the latter case, errors, the human reader is unable to do so.

2. In HPSG, Pollard and Sag (1994) try to avoid committing themselves to either a mentalist or a non-mentalist position, without however formulating a convincing alternative view of the nature of language. They use the notion of ‘shared knowledge’, but it remains unclear how this notion should be interpreted in the context of divergences between individuals and/or groups of them.

3. Strategies for exceptions in computational linguistics

Fouvry proposes to treat exceptions by the relaxation of constraints imposed by the grammar. He also considers encoding exceptions in the grammar directly. In his section 4.2 he presents these two strategies as two extremes and claims that “The best practical choice probably lies somewhere in the middle.” This constitutes a rather one-dimensional approach to the treatment of the types of exceptions analysed above as idiosyncrasies, marginal cases, and errors. In fact, a more balanced view takes into account the specific properties of these types.

In the case of idiosyncrasies such as oxen, the exceptions are actually part of the ‘well-defined core’ to be described by the grammar. There is no good reason to treat such exceptions in any other way. If nothing is stated about the plural of ox, the correct form will be rejected and the form oxes will wrongly be accepted. Constraint relaxation is a very inefficient way to counter this. First, oxes will still be accepted and indeed preferred as the plural form. Second, oxen will achieve the same status as a number of other forms that are neither regular nor correct. Which other forms have this status depends on the details of the statements of the pluralization rules. In the simplest case, the final -n will be treated as a typo, so that oxen has the same status as oxec, oxej and oxel. Surely, this is far from ideal.
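The inefficiency can be made concrete with a toy sketch (my own construction, not Fouvry's implementation): the only pluralization constraint is that the form equals the expected regular plural, and relaxation drops the constraint on the final character.

```python
def parse_plural(form, expected, relax=False):
    """Toy plural recogniser with a single constraint: form == expected.

    With relax=True, the constraint on the final character is dropped,
    crudely mimicking constraint relaxation on a suffix."""
    if form == expected:
        return "regular"
    if relax and len(form) == len(expected) and form[:-1] == expected[:-1]:
        return "relaxed"  # final character treated as a typo
    return None

# With no exception listed, the wrong form is the one accepted as regular:
assert parse_plural("oxes", "oxes") == "regular"
assert parse_plural("oxen", "oxes") is None          # correct form rejected

# Relaxation admits 'oxen', but only with the same degraded status as
# arbitrary misspellings such as 'oxec', 'oxej' and 'oxel':
for form in ["oxen", "oxec", "oxej", "oxel"]:
    assert parse_plural(form, "oxes", relax=True) == "relaxed"
```

The sketch makes ten Hacken's point directly: under relaxation, oxen is recoverable only as a near-miss of oxes, indistinguishable from noise.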

The most widespread approach to exceptions in the grammar is the adoption of a default mechanism. An example is DATR, as applied, for instance, by Cahill & Gazdar (1999). Although listed in the introduction to section 2.3, defaults are not discussed as an alternative to constraint relaxation. The example of the Ancient Greek number agreement would be much better served in an approach in which the clearly delimited exception is stated as a subregularity than by treating it on a par with other cases in which subject and verb do not agree in number.
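A default mechanism, by contrast, states the idiosyncrasy once and keeps it inside the grammar. The following is a minimal sketch in the spirit of default inheritance (not actual DATR syntax; all names are mine):

```python
# Default inheritance: a noun node supplies the default plural rule,
# and an exceptional lexeme overrides just that one statement.
def default_plural(stem):
    return stem + "s"

LEXICON = {
    "cat": {},                               # inherits the default
    "ox":  {"plural": lambda stem: "oxen"},  # idiosyncrasy stated once
}

def plural(stem):
    # use the lexeme's own rule if present, otherwise fall back on the default
    rule = LEXICON[stem].get("plural", default_plural)
    return rule(stem)

assert plural("cat") == "cats"
assert plural("ox") == "oxen"   # part of the grammar, not a relaxed near-miss
```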

Marginal cases may be a better area for the application of the constraint relaxation technique. A more principled approach would of course try to disentangle the different influences that determine the judgement that an expression is less than fully grammatical. However, this would involve an analysis of these factors which may be beyond what we can expect in a fully formalized computational system. The question remains how efficient constraint relaxation is for mimicking the types of divergence from full grammaticality that occur in actual


text. While constraint relaxation guarantees robustness, much of the intended content is likely to be missed in the analysis.

Fouvry’s primary examples used to illustrate constraint relaxation clearly fall into the category of errors. While examples such as his (2b) and (8) illustrate the technique quite clearly, no evidence is given that they represent a significant proportion of actually found errors. The alternative treatment discussed by Fouvry is so-called malrules. In a sense, malrules and constraint relaxation approach the problem from opposite sides. Malrules look for particular, well-specified errors. They are based on systematic error analysis and geared towards feedback about the errors. Constraint relaxation is very unspecific in what it looks for. Its main purpose is not to let errors in the input produce a crash. It might have been fairer to compare constraint relaxation with chunk parsing as proposed by Abney (1996). In chunk parsing, analyses for parts of a sentence are returned when no full analysis can be produced.

4. The purpose of constraint relaxation

If Fouvry had presented constraint relaxation as nothing more than a general technique for increasing robustness, I would probably not have objected to it. What I find problematic in his article, however, is his suggestion in several places that constraint relaxation is more than an ad hoc mechanism to avoid crashes.

In the first few sentences of his article, Fouvry presents the problem to be dealt with as “to offer clues as to what may be going on” if the input is not recognized by the grammar. As far as I can see, if there is anything constraint relaxation does not do, it is offering any such clues. What it can do is to state which constraints were relaxed and in what ways. This is not sufficient to know whether the input contains an idiosyncrasy not covered by the grammar, a marginal case of creative use of language, or an error.

A more specific description of Fouvry’s goal is given at the start of section 3.2: “to describe extra-grammaticalities without the necessity to extend the grammar”. This can be a legitimate goal in a context in which a grammar is given and robustness has to be increased for practical reasons. It is not part of linguistics as an empirical science, because it imposes an arbitrary restriction and is not geared towards general explanations. Moreover, as Fouvry notes correctly in section 4.2, his model does not make a principled distinction between exceptions and errors. In any competence-based approach to linguistics, this distinction is essential, because exceptions belong to competence and errors do not.


It may be possible to increase the interest of Fouvry’s proposal by limiting its scope to computational linguistics as an applied science. However, as I argue in ten Hacken (2001, 2007a), in order to be an applied science as opposed to mere technology, computational linguistics has to provide explanations of why and to what extent the solution to a particular practical problem works. The first step is then to specify the problem. A practical problem is more like how to give useful feedback to language learners in CALL, as discussed in Fouvry’s section 3.2, than like how to increase robustness without changing the grammar. The latter is far too dependent on theory-internal constraints for an explanatory account to be of general interest.

References

Abney, Steven
1996    Partial parsing via finite-state cascades. Natural Language Engineering 2: 337–344.

Cahill, Lynne, and Gerald Gazdar
1999    German noun inflection. Journal of Linguistics 35: 1–42.

Chomsky, Noam
1965    Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.

Chomsky, Noam
1980    Rules and Representations. New York: Columbia University Press.

Gazdar, Gerald, Ewan Klein, Geoffrey Pullum, and Ivan Sag
1985    Generalized Phrase Structure Grammar. Oxford: Blackwell.

ten Hacken, Pius
2001    Revolution in computational linguistics: Towards a genuinely applied science. In Computational Linguistics in the Netherlands 2000: Selected Papers from the Eleventh CLIN Meeting, Walter Daelemans, Khalil Sima’an, Jorn Veenstra, and Jakub Zavrel (eds.), 60–72. Amsterdam: Rodopi.

ten Hacken, Pius
2006    Formalism/formalist linguistics. In Encyclopedia of Language and Linguistics, 2nd ed., Keith Brown (ed.), Vol. 4, 558–564. Oxford: Elsevier.

ten Hacken, Pius
2007a   Computational linguistics as an applied science. In Computation, Information, Cognition: The Nexus and the Liminal, Gordana Dodig Crnkovic and Susan Stuart (eds.), 260–269. Cambridge: Cambridge Scholars Press.

ten Hacken, Pius
2007b   Chomskyan Linguistics and its Competitors. London: Equinox.

Pollard, Carl, and Ivan A. Sag
1994    Head-Driven Phrase Structure Grammar. Chicago and Stanford, CA: University of Chicago Press and Center for the Study of Language and Information.


Unexpected loci for exceptions: languages and language families


Quantitative explorations of the worldwide distribution of rare characteristics, or: the exceptionality of northwestern European languages

Michael Cysouw

Abstract. In this article, the distribution of rare features among the world’s languages is investigated based on the data from the World Atlas of Language Structures (Haspelmath et al. 2005). A Rarity Index for a language is defined, resulting in a listing of the world’s languages by mean rarity. Further, a Group Rarity Index is defined to be able to measure the average rarity of genealogical or areal groups. One of the most exceptional geographical areas turns out to be northwestern Europe. A closer investigation of the characteristics that make this area exceptional concludes this article.∗

1. Introduction

From a cross-linguistic perspective, the notion of exceptionality is intricately intertwined with assumptions about (ab)normality. A language showing an ‘exceptional’ characteristic is much too often just a language that differs from the few ‘normal’ European national standard languages widely investigated in current linguistics. Unfortunately, from a worldwide perspective it is these European national standard languages that often turn out to be atypical – as will be shown later on in this article. Instead of assuming knowledge about what is normal or exceptional for a human language, I will investigate exceptionality empirically by taking account of the worldwide linguistic diversity.

One way to empirically approach the notion of exceptionality is to replace it with the notion of rarity. Strictly speaking, exceptionality is a more encompassing concept than rarity. However, rarity is much easier to operationalise when dealing with large amounts of data. In this article, a trait will be considered exceptional when it is rare with regard to the known worldwide diversity. Such an approach can only be taken given a large amount of data about the world’s linguistic diversity. Such a database has recently become available in the form of the World Atlas of Language Structures (WALS, Haspelmath et al. 2005), and I will gratefully draw on this enormous dataset for the present investigation of rarity among the world’s languages.

* I thank Bernard Comrie, the editors of the present volume, and one anonymous reviewer for their comments and input on an earlier version of this paper.

This paper is organised as follows. First, in Section 2, I will introduce the World Atlas of Language Structures, from which the typological data are drawn that form the basis for my calculations of rarity. In the following Section 3, the quantitative approach to computing rarity from typological data is explained. Section 4 then looks at the overall rarity for individual languages, claiming the South American language Wari’ to be one of the languages with the highest index level of rare characteristics. In Section 5, the calculation of rarity is extended to encompass groups of languages, and this calculation is applied to genealogical families. The Kartvelian and Northwest Caucasian language families turn out to be the families with the highest index level of rare characteristics. In Section 6, the calculation of group rarity is used to investigate areal centres of high rarity. Various geographical areas with a high level of rarity are identified. Most fascinatingly, northwestern Europe ends up on top as the linguistically rarest geographical area in the world. Section 7 investigates the exceptionality of northwestern Europe more closely, identifying twelve features that make this area so unusual from a worldwide perspective. These characteristics are all linguistically independent from each other, indicating that the exceptionally high level of rarity is probably a historical coincidence, possibly enlarged by a structural bias of the European scholarly tradition in linguistics.

2. Using the World Atlas of Language Structures

The World Atlas of Language Structures (WALS, Haspelmath et al. 2005) is a large database of structural (phonological, grammatical, and lexical) properties of languages gathered from descriptive materials (such as reference grammars) by a team of more than 40 authors, many of them the leading authorities on the subject.1 It is published as a printed book in traditional atlas format, but also accompanied by a fully searchable electronic version of the database. The atlas consists of 142 maps with accompanying texts on diverse features of human language (such as vowel inventory size, noun-genitive order, passive constructions, and ‘hand’/‘arm’ polysemy), each of which is the responsibility of a single author (or team of authors). Each map shows between 120 and 1,370 languages. Altogether more than 2,600 languages are shown on the maps, and more than 55,000 dots give information on structural characteristics of these languages.2

In informal discussion, some doubts have been uttered as to the reliability of the data in WALS. The reason for these doubts is that most data points have been coded by typologists on the basis of extant descriptive material, and not by specialists in the languages in question. As a test case, Wälchli (2005) checked the 119 coding points for Latvian and found these WALS data to be reasonably representative of the language. Latvian is a ‘hard’ case for reliability, because the editors urged all authors to include this language in their map (Latvian is one of the so-called ‘basic 100-language sample’). Further, Latvian is a well-known and well-described language, but the problem for typologists is that there is no central reference work to check for any information on this language. This led to a few errors in WALS, because authors sometimes based their judgements on sources that were not the best for their particular question. Wälchli (2005) notes five errors (= 4.2%) in which it is understandable from the sources used that a linguist might be led to the wrong conclusions. Further, Wälchli found two errors in WALS that appear to be practical mistakes (= 1.7%). For these, it is clear from all the information supplied by the authors (e.g. from the examples included) that the author knew the right coding. However, through some unidentifiable problem in the long chain of preparations, from the collection of the data to the final publication of the atlas, an error arose somewhere. In a large-scale enterprise like WALS, it is impossible to avoid such practical errors completely. The low number of practical errors for Latvian even argues for the high reliability standard of WALS.3

1. The WALS is an exceptionally large collaborative project, involving many different authors. Because I have been using the complete data as supplied by WALS for the calculations of the rarity indices, I take this opportunity to thank the editors and all the authors for making this kind of research possible (in alphabetical order): Andreas Ammann, Matthew Baerman, Dik Bakker, Balthasar Bickel, Cecil H. Brown, Dunstan Brown, Bernard Comrie, Greville G. Corbett, Sonia Cristofaro, Michael Cysouw, Östen Dahl, Michael Daniel, Ferdinand de Haan, Holger Diessel, Nina Dobrushina, Matthew S. Dryer, Orin D. Gensler, David Gil, Rob Goedemans, Valentin Goussev, Martin Haspelmath, Johannes Helmbrecht, Oliver A. Iggesen, Paul Kay, Ekkehard König, Maria Koptjevskaja-Tamm, Tania Kuteva, Ludo Lejeune, Ian Maddieson, Luisa Maffi, Elena Maslova, Matti Miestamo, Edith Moravcsik, Vladimir P. Nedjalkov, Johanna Nichols, Umarani Pappuswamy, David Peterson, Maria Polinsky, Carl Rubino, Peter Siemund, Anna Siewierska, Jae Jung Song, Leon Stassen, Thomas Stolz, Cornelia Stroh, Stephan Töpper, Aina Urdze, Johan van der Auwera, Harry van der Hulst, Viveka Velupillai, Ljuba N. Veselinova and Ulrike Zeshan. Further, I would like to thank Hans-Jörg Bibiko for supplying the WALS Interactive Reference Tool, with which the maps in this paper are made.

2. Note that with about 142 features and 2,600 languages, there should be as many as 369,000 datapoints. With the actually available 55,000 datapoints, ‘only’ about 15% of the data matrix is filled. For many statistical approaches this low coverage is a problem, and normally only carefully selected parts of the data can be used. In the approach presented in this paper, I will attempt to use the complete data, notwithstanding the many missing values. However, special statistical corrections, as described in Section 3, are needed to work around the problem of missing values.

3. Computing a rarity index

The principal idea of the present investigation is to use this enormous WALS database for ‘holistic’ typology. In WALS, there are features coded from all areas of linguistic structure, so it is possible to look for correlations between widely different aspects of linguistic structure. For the present analysis, I will not look at the content of the features, but only consider their relative ubiquity. Are there languages, families or areas that have more rare characteristics than others? To investigate this question, I devised a rarity index – a calculation to estimate the relative ubiquity of characteristics of a language, as measured by the data in WALS. The basic idea behind the rarity index is to compute the chance of occurrence for all characteristics of a particular language, and then take the mean over all these chances of occurrence. In essence, this results in an average rarity for a language. However, there are various confounding factors mediating between chance and rarity, which make it necessary to introduce a few extra steps in the evaluation of the chances of occurrence.

Before I explain these confounding factors and the resolution used, let me first introduce some WALS terminology. The data in WALS is organised into features and values. A feature is a parameter of linguistic variation, shown as a double-paged map in the printed atlas (e.g. the first map depicts the size of the consonant inventory, Maddieson 2005a). Within each feature, each language has a value. A value is the characterisation of the language for the feature in question (e.g. in the first map on consonant inventories, English – with 24 consonants – has the value ‘average’, defined as the range between 19 and 25 consonants). As a first approach to a rarity index, the rarity of a value might be formalised by simply taking the chance occurrence of that value. For example, the value ‘average’ of the feature ‘consonant inventories’ occurs in 181 languages out of a total of 561 languages coded for this feature. There is thus a chance occurrence of 181/561 = 0.322 for this value. However, this chance cannot simply be interpreted as an indication of the rarity of the value.

3. The data as brought together in WALS is beyond doubt the largest and best organised survey of structural linguistic characteristics of the world’s languages. However, there are various problems with the coding structure of the data that make it difficult to use the data for large-scale quantitative investigations without recoding them (cf. Cysouw et al. 2005). In this paper, I disregarded these problems and took the data as supplied in the atlas without doing any recoding.

The first problem is that different maps distinguish different numbers of values, and the chance occurrences thereby have a different impact on the evaluation of rarity. For example, in the map on consonant inventories there are five different values distinguished (small, moderately small, average, moderately large, large), but in the next map on vowel quality inventories (Maddieson 2005b) there are only three different values distinguished (small, average, large). Now, consider the value ‘large’ of the feature ‘vowel quality inventory’. This value has a chance occurrence of 183/563 = 0.325, almost exactly the same as for the ‘average’ consonant inventory discussed previously. However, with only three values distinguished for vowel quality inventories, such a chance of around one-third should count as just average rarity. In contrast, with the five values distinguished for consonant inventories, a chance of one-third is actually higher than expected from an equal distribution (in which the chance would be one-fifth), and should thus be counted as relatively low rarity (or ‘common’). Conversely, in a hypothetical feature with only two values distinguished, a chance expectation of around one-third would count as relatively high rarity (or ‘unusual’).

The simplest solution to this problem is to multiply the chance occurrence of each value by the number of values distinguished, as shown in the definition of the Rarity Index in (1). The feature ‘consonant inventories’ distinguishes five different values, so the rarity index for the value ‘average’ is 5 · 0.322 = 1.61, which is higher (and thus less rare) than the index for the value ‘large’ of the feature ‘vowel quality inventory’, 3 · 0.325 = 0.975. Note that a rarity index of around 1.0 means that the chance occurrence of a particular value approaches the chances for equally distributed features. For a feature with x values, an equal distribution would mean a chance of occurrence for each value of 1/x. If the empirically established chance occurrence of a particular value approaches 1/x, the rarity index for this value approaches x · (1/x) = 1. For practical reasons, I used the inverse of this index, as shown in (2). The higher this index, the higher the rarity of the value in the WALS data. Using this inverse has the nice effect that the mean of all indices over all languages coded for a particular feature is also exactly one, as shown in (3). The equation in (3) can easily be verified by writing out the terms in the summation.

(1)   R(fi) = n · fi / ftot

      where
      n    = number of values of a particular feature
      fi   = frequency of value i
      ftot = total number of languages coded for this feature

(2)   R(fi) = ftot / (n · fi)

(3)   Σi=1..n [ R(fi) · fi ] / ftot = 1
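For concreteness, the index in (2) can be computed as follows (a sketch of mine based on the figures quoted in the text, not code from the WALS project):

```python
def rarity_index(f_i, f_tot, n):
    """Rarity index of a value as in (2): R = f_tot / (n * f_i)."""
    return f_tot / (n * f_i)

# 'average' consonant inventory: 181 of 561 languages, feature with 5 values.
print(round(rarity_index(181, 561, 5), 2))   # -> 0.62 (commoner than chance)

# 'large' vowel quality inventory: 183 of 563 languages, 3 values.
print(round(rarity_index(183, 563, 3), 2))   # -> 1.03 (about chance level)

# Check of (3): the frequency-weighted mean of the indices is exactly 1,
# whatever the distribution (these frequencies are invented for the check).
freqs = [181, 120, 100, 90, 70]
f_tot, n = sum(freqs), len(freqs)
mean = sum(rarity_index(f, f_tot, n) * f for f in freqs) / f_tot
assert abs(mean - 1.0) < 1e-9
```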

The formula in (2) thus defines the rarity index of a value. The next step is now to compute a rarity index for a language on this basis. The basic idea for computing a rarity index of a language is to take the mean of all rarity indices for all the characteristics of this language, throughout all the maps in WALS. However, a second confounding factor is the number of maps in which a particular language occurs. The data of WALS is not complete, meaning that not every language is coded in every map. Many languages are only coded in very few maps. For this reason, simply taking the mean rarity over all values is not a good measure to evaluate which language has the most unusual characteristics. If a particular language is only coded for few features in WALS, there will be strong random effects. Languages with few code-points in WALS will show more extreme values of mean rarity, both to the high and the low side. This effect can be observed in Figure 1, in which the mean rarity for all 2,600 languages in WALS is plotted against the number of features coded (each point in the figure represents one language).4 The fewer features are coded for a language, the more extreme mean rarities occur.

On the distribution of rare characteristics 417

Figure 1. Plot of mean rarity indices against the number of features coded, with lines indicating 1 % (outer lines) and 5 % (inner lines) extremes as measured by a randomization procedure.

To normalize this effect, I evaluated the distribution of mean rarity by a randomization technique. The randomization proceeded as follows. For each number of features coded (ranging between 1 and 139),5 a thousand fictitious languages were created. For each invented language, a set of features was selected completely at random. Within each feature, a value was selected semi-randomly: the value selection was guided by the actual chance occurrences of each value in WALS. In this way, each set of a thousand fictitious languages has the same distribution of values as the real WALS. For example, the number of languages with an average consonant inventory will be around 32.2 % in each set of a thousand languages. One such set of a thousand languages was made with each language being coded for one feature only. Then one set was made with each language being coded for two features, and so on, finishing with a set of a thousand languages in which each language was coded for 139 features. The mean rarity for all these invented languages was computed, thus giving a thousand mean rarity values for each number of features.

4. For clarity of depiction, the logarithm of mean rarity is shown in this figure. Using the logarithm has the visual effect of separating out the values some more, thereby showing more clearly the distribution of the points in the figure. Another effect is that the mean rarity now centres around zero, because log(1) = 0.

5. WALS has 142 maps, but for the present investigation the two maps on sign languages and the map on writing systems have been disregarded, leaving a maximum of 139 features available.
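The randomization procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the two-feature dataset, the function names, and all counts are made up for the example, and are not the paper's actual WALS data or code.

```python
import random

def fictitious_language(features, k, rng):
    """Create one fictitious language coded for k features: pick k features
    at random, then pick each value with its observed proportion."""
    chosen = rng.sample(list(features), k)
    lang = {}
    for feat in chosen:
        values, counts = zip(*features[feat].items())
        lang[feat] = rng.choices(values, weights=counts, k=1)[0]
    return lang

def mean_rarity(lang, features):
    """Mean of the inverse rarity indices (2) over a language's values."""
    total = 0.0
    for feat, val in lang.items():
        counts = features[feat]
        n, f_tot = len(counts), sum(counts.values())
        total += f_tot / (n * counts[val])
    return total / len(lang)

# Hypothetical mini-dataset with two features (value -> frequency).
features = {
    "consonant inventories": {"small": 89, "average": 181, "large": 76},
    "uvular consonants": {"none": 320, "stops": 50, "continuants only": 8},
}
rng = random.Random(1)
sample = [mean_rarity(fictitious_language(features, 2, rng), features)
          for _ in range(1000)]
```

Because each value is drawn with its empirical proportion, the expected mean rarity of a fictitious language is exactly 1, so the simulated distribution centres around 1 while its spread shrinks as more features are coded.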

Using all these fictitious languages, the mean rarity of a real language can be evaluated. For example, Dutch is coded for 67 features and has a mean rarity of 1.66. The question now is how extreme this value is. The mean rarity is higher than 1.00, so there appears to be a relatively high level of rarity in this language. But is this really much higher than 1.00, or is a value of 1.66 still within the expected variation? To evaluate this, the set of a thousand fictitious languages coded for 67 features was used. Among this set of a thousand made-up languages, there turned out to be 96 (= 9.6 %) with a mean rarity higher than 1.66; thus 904 (= 90.4 %) fictitious languages had a smaller mean rarity. From this it can be concluded that the mean rarity of Dutch is indeed rather high (even among the highest 10 %). Note that this value is not a real significance value as given by statistical analyses, although it is a somewhat similar concept: it indicates the relative unusualness of a particular language within the WALS dataset. Using such evaluations, lines representing the 1 % and 5 % extremes can be drawn in Figure 1. These lines show the boundary between the extremes in the fictitious languages, indicating which of the real languages (represented by the dots) belong to these extremes.
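The evaluation just described amounts to a percentile rank within the fictitious sample. A minimal sketch (the simulated values below are invented for illustration, not the actual WALS simulations):

```python
def index_level(observed, simulated):
    """Index level: percentage of fictitious languages (with the same number
    of features coded) whose simulated mean rarity lies below the observed
    mean rarity of a real language."""
    below = sum(1 for s in simulated if s < observed)
    return 100.0 * below / len(simulated)

# Hypothetical simulated mean rarities for one number-of-features bin.
simulated = [i / 1000 for i in range(1000)]
level = index_level(0.9, simulated)   # analogous to Dutch's 90.4 %
```

For Dutch, with 904 of 1,000 simulated languages below its mean rarity of 1.66, this function would return 90.4.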

4. Rarity indices for individual languages

Using this evaluation of mean rarity by randomization, the languages with the most extreme mean rarity are shown in Table 1. In this table, a mean rarity 'index level' is indicated by a percentage in the last column. For example, 100 % means that this particular mean rarity is higher than that of all thousand fictitious languages for the number of features coded. The first six languages all fall in the level of this most extreme mean rarity. As can be seen in the penultimate column, the actual values of mean rarity differ widely. Winnebago has a very high mean rarity (11.37), which is high even considering that this language is only coded for 7 features (judging from the index level of 100 %). In contrast, Wari' is also included among the most extreme index levels with a mean rarity of 'only' 2.36 (remember that the mean over all the data in WALS is 1.00). However, this value is achieved with as many as 115 features coded, and for that many features a mean rarity of 2.36 is apparently still highly significant.

Table 1. Top 15 of languages according to mean rarity index level. Within each level, the languages are ordered by the number of features coded, though this is for presentational purposes only.

Language             Genus               Features Coded   Mean Rarity   Index Level
Wari'                Chapacura-Wanhan    115              2.36          100
Dinka                Nilotic             45               3.45          100
Jamul Tiipay         Yuman               44               3.76          100
Nuer                 Nilotic             28               3.42          100
Karó (Arára)         Tupi-Guarani        24               6.16          100
Winnebago            Siouan              7                11.37         100
Chalcatongo Mixtec   Mixtecan            113              2.05          99.9
Kutenai              Kutenai             113              2.02          99.9
Kombai               Awju-Dumut          38               3.27          99.9
Dahalo               Southern Cushitic   17               5.86          99.9
Maxakali             Maxakali            15               6.95          99.9
Warrwa               Nyulnyulan          20               3.74          99.8
Bunuba               Bunuban             16               4.21          99.8
Eyak                 Eyak                16               4.05          99.8
Yawuru               Nyulnyulan          15               4.51          99.8

Although such a listing of the world's languages as to their level of rarity satisfies a currently widespread felt need for rankings, its merits are doubtful. It would be interesting if particular genealogical or areal groups showed up high in this listing; however, on first inspection this is not the case. There are two Nilotic and two Nyulnyulan languages among the top 15, which is indicative, though not convincing. Areally, among the top 15 as presented in Table 1, only languages from Eurasia are absent. The majority of the top 15 is from the Americas (eight languages), three are from Africa and four from Australia/New Guinea. However, this is partly an effect of the arbitrary cut-off point of the top 15, chosen here for reasons of space. In Figure 2, a world map is presented, showing the geographical distribution of the top 5 % of languages (i.e. all languages with an index level of 95 % and higher). There appears to be a relatively high density of such languages in Africa (around the equator) and northern Australia/New Guinea, but these are also regions with a high number of languages represented in the WALS data (and with many languages in general). I would argue that from this distribution alone, there does not appear to be any reason to declare a group of languages to stand out as showing a particularly high level of unusualness.

Figure 2. World map showing the top 5 % on the rarity index level of the languages in the WALS.

5. Rarity indices for groups of languages

To further investigate the distribution of rarity among the world's languages, I computed rarity for groups of languages, based on the index levels for each language (as discussed in the previous section). Such values for Group Rarity (GR) are useful to evaluate the relative rarity of a genealogical or an areal group of languages. As a measure of Group Rarity, I have used a weighted mean of the rarity index levels of the individual languages. Basically, to compute this weighted mean, I took the mean of all index levels of the individual languages (not the mean rarity itself), and weighted the languages according to the logarithm of the number of features coded, as shown in the formula in (4). Because of this logarithm, the languages with more features coded have slightly less influence on the resulting value. Also, languages that are only coded for one feature do not have any influence, because log(1) = 0.

GR = \frac{\sum_{i=1}^{n} \log(L_i) \cdot \%R_i}{\sum_{i=1}^{n} \log(L_i)}    (4)

n = number of languages in a group
L_i = number of features coded for language i
%R_i = rarity index level for language i
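Formula (4) can be transcribed directly. In this sketch (mine, not the paper's code), each language of a group is given as a hypothetical pair of (number of features coded, rarity index level in per cent):

```python
import math

def group_rarity(langs):
    """Weighted mean of rarity index levels, formula (4): each language is
    weighted by the logarithm of its number of features coded, so languages
    coded for a single feature get weight log(1) = 0 and drop out."""
    num = sum(math.log(L) * pct for L, pct in langs)
    den = sum(math.log(L) for L, _ in langs)
    return num / den

# Hypothetical (features coded, index level %) pairs for one family.
family = [(115, 100.0), (45, 100.0), (28, 99.9), (1, 50.0)]
gr = group_rarity(family)
```

Note that the last language, coded for only one feature, contributes nothing: removing it from the list leaves the result unchanged, exactly as the text describes.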

Using the measure of group rarity on genealogical groups results in an interesting set of linguistic families showing a high level of rarity. The top 10 linguistic families as to group rarity are shown in Table 2. Only families with more than three languages included in WALS are shown, because I want to show effects at the level of the family. In families with only a few members coded in WALS (or few members existing in the world), high rarity of individual languages will raise the level of the whole family disproportionately.

Table 2. Top 10 of weighted rarity for linguistic families (only families with more than 3 languages included in the WALS data are shown).

Family                No. of Languages   Group Rarity
Northwest Caucasian   7                  87.8
Kartvelian            4                  83.7
Caddoan               5                  82.2
Wakashan              7                  80.2
Iroquoian             8                  76.3
Khoisan               11                 74.5
Arauan                6                  71.8
Salishan              24                 71.2
Na Dene               23                 70.2
Algic                 31                 69.9

Two families from the Caucasus (Northwest Caucasian and Kartvelian) take the first two positions in the ranking of families (the third indigenous family of the Caucasus, Nakh-Dagestanian, has only slightly higher than average rarity). Further, families from Northern America are strongly represented: Caddoan, Wakashan, Iroquoian, Salishan, Na Dene and Algic all made it into the top 10. Hokan, Eskimo-Aleut, Kiowa-Tanoan and Penutian did not quite make it all the way up, though they still show an extremely high level of group rarity. From a genealogical perspective, the Caucasus and Northern America clearly stand out as having families showing a high level of group rarity.

6. Areal distribution of rarity

To evaluate whether there are geographical areas with a high preponderance of rare features, I investigated groups of languages that are geographically contiguous. For each language in the database, I took the thirty nearest languages (using a simple Euclidean distance, not taking account of natural barriers) and computed the rarity for all such areal groups. The rarity index for each group is plotted on a map at the location of the centre of the group. Such an approach will necessarily show some areal consistency, because two neighbouring languages share many of their neighbours. However, it is interesting to see where the centres of areally consistent groups are. These centres are indicative of the location of geographical areas with a high level of rarity. The higher the rarity index for a group around a particular language, the darker the dot on the map shown in Figure 3.
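The areal grouping step can be sketched as follows. The coordinates below are invented for illustration, and the sketch includes the target language in its own group, a detail the text leaves unspecified:

```python
def nearest_group(target, coords, k=30):
    """Areal group: the k languages nearest to the target language, by plain
    Euclidean distance on (latitude, longitude) coordinates -- deliberately
    ignoring natural barriers, as described in the text."""
    tx, ty = coords[target]
    def dist(name):
        x, y = coords[name]
        return ((x - tx) ** 2 + (y - ty) ** 2) ** 0.5
    return sorted(coords, key=dist)[:k]

# Hypothetical (lat, lon) coordinates for a handful of languages.
coords = {"Frisian": (53.1, 5.8), "Dutch": (52.0, 5.0),
          "English": (52.0, 0.0), "German": (52.0, 10.0),
          "French": (48.0, 2.0), "Zulu": (-28.0, 31.0)}
group = nearest_group("Frisian", coords, k=4)
```

Group rarity as in (4) can then be computed over each such group, and the result plotted at the coordinates of the central language.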

In this map, there are fifteen centres of high rarity, as summarised in Table 3. For all these areas, a centre is indicated. These centre languages are the first languages that show up in the ranking of group rarity for the areal groups. The central language is not necessarily of any importance itself. For example, Frisian only turns out to be the centre of the Northwest European cluster because it is roughly in the middle of the area including English, French, German and Norwegian. The fact that there are fifteen centres (and not more or fewer) depends on the decision to compute group rarity for areal groups of thirty languages around each centre. More centres of rarity appear when, for example, groups of only ten languages are taken; however, these centres mostly split up groups found in the map shown here. When groups larger than thirty languages are used in the computations, the clear distinctions between the various centres start to diminish. For the current purpose of investigating worldwide areal patterns in the WALS data, a group size of about thirty appears to be most suitable.

Table 3. Areas of high rarity, grouped by macroarea.

Macroarea   Location of area with high rarity   Centre language
Eurasia     North-western Europe                Frisian
            Caucasus                            Adyghe
Oceania     Philippines                         Bikol
            Sumatra                             Minangkabau
            Pacific                             East Futuna
            Northern Australia                  Walmatjarri
            Southeast Australia                 Ngiyambaa
America     Northwest America                   Lummi
            Northeast America                   West Greenlandic
            Western North America               Havasupai
            Central America                     Zapotec
            Amazonia                            Pirahã
Africa      West Africa                         Guro
            Central Africa                      Mende
            Southern Africa                     Zulu

Figure 3. World map showing areal centres of rarity.

It is interesting to speculate why these centres appear in this worldwide survey of rarity. Several of these areal groups are considered to be typological areas (or 'Sprachbünde'). However, some areas with high rarity have no accompanying claim for areality, and many traditionally claimed linguistic areas do not show up as areas with high rarity. Although it is tempting to hypothesize that strong influence between languages might lead to the spreading of otherwise rare phenomena, the overlap between rare areas and known areal groupings is at present only approximate. However, the quantitative notion of rarity as used in this paper might be particularly useful for investigating linguistic areas, as the strongest evidence for areality stems from traits that are common in a particular area, but rare elsewhere.

7. Rare characteristics of northwestern Europe

Probably the most surprising area to appear in the list of geographical areas with a high level of rarity is northwestern Europe. This area is centred on Frisian. Many of the thirty languages around Frisian are variants that are often considered West Germanic dialects. These are only coded for a few features in WALS, and do not have much impact on the rarity measure. When these are removed, the remaining languages in this area, all with a relatively high coverage in the WALS data, are English, German, Dutch, Frisian, and French.

The pressing question now, of course, is what makes these languages so exceptional. To investigate which features caused the high rarity index for this group, I considered each feature individually. Depending on the values for each feature, I took the original rarity index, as shown in (2), for each value of each language in the area. Then the mean of these rarity indices was computed, and the features were ordered according to this mean. This resulted in a list of the most exceptional characteristics of this area. The top ten of this list is shown in Table 4 (the mean rarity of each feature for this area is shown in the first column).

This list of exceptional characteristics of northwestern European languages will be quickly reviewed here. For more details on the coding and the decisions to distinguish between various values, please refer to the relevant texts accompanying the maps in WALS. A summary of the presence of these exceptional traits in northwestern European languages is given in Table 5, alongside the basic percentages of these exceptional features among all the world's languages.

Table 4. Top 10 of the rarest characteristics as found in northwestern Europe.

Rarity   Feature                        Exceptional value present in Europe
8.39     Polar Questions                Interrogative word order
7.96     Uvular Consonants              Uvular continuants only
7.93     The Perfect                    Perfect of the 'have'-type
7.56     Coding of Evidentiality        Modal morpheme
4.58     Demonstratives                 No distance contrast
4.32     Negative Indefinite Pronouns   No predicate negation present
4.15     Front Rounded Vowels           High and mid
3.46     Relativization on Subjects     Relative pronoun
3.14     Weight-Sensitive Stress        Right-oriented, antepenultimate involved
2.86     Order of Object and Verb       Both orders, neither order dominant

The exceptional features of northwestern Europe are the following. First, the marking of polar questions is unusual. In most of the world's languages, polar questions are constructed by using a question particle. Two other major marking patterns are polar questions marked solely by intonation or by special verb morphology. The typical northwest European change in word order to mark polar questions is extremely uncommon worldwide, with only a few attestations outside of Europe (Dryer 2005e).

Uvular consonants are not very widespread among the world's languages. Maddieson (2005d) finds them in only 17 % of the world's languages. Most of these languages have at least some kind of uvular stop, possibly alongside other kinds of uvular consonants. The situation found in northwestern Europe, namely the existence of uvular continuants (in the form of a voiceless fricative) without the existence of uvular stops as well, is highly uncommon. Outside Europe this is mainly attested in a few isolated languages scattered throughout central Asia.

A perfect (as in English I have read the book), defined as a construction combining resultative and experiential meanings, is reasonably widespread throughout the world's languages. Dahl and Velupillai (2005) find a construction with similar semantics in almost half of the world's languages. However, the typical European perfect construction of the 'have'-type (derived from a possessive construction) is a European quirk, unparalleled elsewhere in the world.

Evidentiality is the marking of the evidence a speaker has for his/her statement. Grammatical devices to code this are reasonably widespread among the world's languages. De Haan (2005) finds some kind of evidentiality in slightly more than half of the world's languages. However, the use of a modal verb for this purpose, as found in northwestern Europe (e.g. Dutch het moet een goede film zijn, French il aurait choisi la mort), is extremely uncommon worldwide.

Table 5. Occurrence of rare characteristics in northwestern Europe compared to their worldwide frequency.

Unusual characteristic                     French   English   German   Dutch   Frisian   World
Word order in polar questions              –        +         +        +       +         1.4 %
Uvular continuants only                    +        –         +                          2.1 %
Perfect of the 'have'-type                 +        +         +                          3.2 %
Modal morpheme for evidentiality           +        –         +        +                 1.7 %
No distance contrast in demonstratives     +        –         +                          3.0 %
No negation with negative indefinites      –        –         +        +                 5.3 %
High and mid front rounded vowels          +        –         +                          4.1 %
Relative pronoun                           +        +         +                          7.2 %
Right-oriented stress, antepenultimate     –        +         +        +                 5.4 %
Both orders of object and verb             –        –         +        +       +         6.6 %
No productive reduplication                +        +         +                          15.3 %
Comparative particle                       +        +         +                          13.2 %

[Note: Blank cells in this table are not coded in the data from WALS. Informal inspection and personal knowledge of the present author indicate that they are almost all to be marked as present ('plus').]

Demonstratives are normally expected to show some distinctions as to distance, like English this vs. that. In a survey of such distance contrasts in adnominal usage, e.g. this book vs. that book, Diessel (2005) finds distance contrasts in almost all of the world's languages. However, there are a few languages that do not have such distance contrasts in adnominal usage. Some examples are found in western Africa and, somewhat surprisingly, in French (ce) and German (dies- or das; note that jen- does not mark a distance contrast in modern German, although it did in older stages of the language).

Negative indefinite pronouns, like nobody, nothing or nowhere, are in most of the world's languages accompanied by a regular predicate negation. Haspelmath (2005) finds predicate negation to be obligatorily present in 83 % of the world's languages. There are only very few languages in which a negative indefinite pronoun can occur (or even has to occur) without the predicate negation. This unusual phenomenon is mainly found in a few languages of Mesoamerica and in northwestern Europe.

Front rounded vowels, like high [y] or mid [ø], are highly unusual as phonemes in a language. Maddieson (2005e) finds them in only 7 % of the world's languages. Both the high and the mid front rounded vowels are mostly found in some languages of northern Eurasia, among them French and German. Related to this unusual characteristic are the exceptionally high number of vowel quality distinctions (Maddieson 2005b) and the low consonant-vowel ratio (Maddieson 2005c) of northwestern European languages. These two related characteristics just did not make it into the top ten of rare features of northwestern European languages.

Relative clauses are a much debated and widely investigated aspect of human language. It might come as a surprise to many linguists that the typical European usage of a relative pronoun is found only very sporadically outside of Europe (Comrie and Kuteva 2005).

There is a large variety of stress systems attested among the world's languages. The typical northwestern European system is a weight-sensitive stress system in which the antepenultimate syllable is also involved (Goedemans and Van der Hulst 2005). Such a system is unusual, though it is also found in the Near East and sporadically throughout the world's languages.

The last rare characteristic in the top ten of rarest traits in northwestern Europe is the variable order of verb and object (Dryer 2005c). This variability is paralleled by the likewise rare trait of having variable order of genitive and noun (Dryer 2005d), which, however, did not make it into the top ten of rare characteristics of northwestern Europe.

Finally, two interesting characteristics of northwestern European languages that also did not make it into the top ten of rarity deserve quick mention here. First, the languages of northwestern Europe are exceptional because they do not allow productive reduplication (Rubino 2005) and, second, because they use a special particle in comparative constructions (Stassen 2005).


Going through this list of rare characteristics of northwestern European languages, it is important to realize that there are no worldwide correlations between any pair of these features. From a typological perspective, all these features appear to be independent parameters of linguistic variation. At least, I have not been able to find any clearly significant correlation between any two features in this list in the WALS data. Not even the presence of a 'have'-perfect correlates with the presence of a 'have'-possessive. This would mean that there are no internal linguistic reasons for these features to co-occur in northwestern Europe. It is probably an accidental effect of historical contingency that exactly these rare features, and not others, are found in this area.

As can be seen from the summary in Table 5, the exceptional characteristics are basically found in Continental West Germanic, with English and French sharing these unusual traits in about half of the cases. This areal centre roughly coincides with the Charlemagne Sprachbund, or Standard Average European (SAE), as summarised in Haspelmath (2001). Some of the typical characteristics of SAE languages, as described by Haspelmath (2001), are also found in the present investigation. In particular, the word order in polar questions, the perfect of the 'have'-type, the absence of negation with negative indefinites, the special structure of the relative clause, and the usage of comparative particles are noted both in Haspelmath's and in the current investigation. However, there are also clear differences between my claim that northwestern Europe has many unusual characteristics and Haspelmath's claim that the European languages share many traits. For example, the existence of definite and indefinite articles is a clear case of a pan-European characteristic (Haspelmath 2001: 1494). This areality is also found in the WALS maps on articles (Dryer 2005a, 2005b). However, articles are nowhere near rare enough on a worldwide basis to show up in the present investigation. In contrast, the presence of the rare uvular continuants cannot be claimed to be a typical European characteristic. In fact, almost no European languages have such consonants (except for Continental West Germanic and French), but their presence is exceptional enough from a worldwide perspective to end up as a rare trait of northwestern Europe. Summarising, the claim for SAE as a linguistic area and the claim that this area shows many exceptional characteristics are complementary claims, probably both to be explained by long-term mutual influence between the languages in question.

There are a few words of caution to be added to these results. Matthew Dryer, one of the WALS editors, warns (in personal communication) that in some cases the exceptionality of northwestern Europe in the WALS data might have been enlarged by more or less deliberate decisions. He suggests that the WALS editors and authors might have included typical European oddities as separate values, thereby enhancing the exceptional profile of this area. This might indeed be the case for polar questions, modal evidentials, the 'have'-perfect, relative pronouns and particle comparatives. These characteristics are really European quirks. They are common in Europe, and any linguist with a training based on European languages (which means almost all linguists) will at first consider them to be the norm. When investigating worldwide typological diversity, it will probably come as a surprise that European languages are exceptional in these respects. This might have raised the interest in investigating these characteristics of human language, eventually leading to their inclusion in WALS. Though this process might have had some effect, there are still numerous rare features in Europe that do not seem to have been influenced by this bias.6

8. Conclusion

The usage and interpretation of large linguistic typological databases is still in its infancy. In this paper, I have laid out a preliminary attempt to approach a new large-scale typological database, the World Atlas of Language Structures, using quantitative methods. As a showcase, I have taken the notion of rarity and investigated the distribution of rare characteristics among the world's languages.

Individual languages and linguistic families were ranked according to their level of rarity. Rarity appears to be rather evenly distributed throughout the world's languages, though there are, of course, some languages and groups of languages that have more of it than others. The remaining question, which has to be answered by future research, is whether these languages or language groups with relatively many rare features are really 'rare languages'. This would only be the case if the same languages showed a high level of rarity in a completely different dataset as well. Personally, I do not believe that this will be the case. Circumstantial evidence for this can be discerned in Figure 1: with a rising number of characteristics considered, the mean rarity seems to approach normality. This might indicate that throughout all the structures of a whole language, rare and common characteristics are kept in balance.

6. In this same vein, it might also be speculated that the strong influence of Russian and North American linguists on research in typology in recent decades has led to the introduction of features that enlarge the exceptionality of the languages of the Caucasus and North America. However, even if true, the presence of these exceptional features is still highly interesting. And there are still other areas with high rarity that show up in the present investigation. Any such scientific-historical influence is probably only a minor factor in the results as presented in this paper.


Still, it is interesting to interpret the distribution of rare traits in the current data. The most fascinating result is that the northwestern European area, centred on Continental West Germanic, turns out to be one of the most linguistically unusual geographical areas worldwide. Many of the rare characteristics attested in this area might have been considered the norm from a European perspective, but the typological data show that these characteristics are special structures of European languages, and not of human language in general.

References

Comrie, Bernard, and Tania Kuteva
2005 Relativization strategies. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 494–501. Oxford: Oxford University Press.

Cysouw, Michael, Jeff Good, Mihai Albu, and Hans-Jörg Bibiko
2005 Can GOLD "cope" with WALS? Retrofitting an ontology onto the World Atlas of Language Structures. Proceedings of the E-MELD workshop 'Linguistic Ontologies and Data Categories for Language Resources'.

Dahl, Östen, and Viveka Velupillai
2005 Tense and aspect. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 266–281. Oxford: Oxford University Press.

de Haan, Ferdinand
2005 Coding of evidentiality. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 318–321. Oxford: Oxford University Press.

Diessel, Holger
2005 Distance contrasts in demonstratives. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 170–173. Oxford: Oxford University Press.

Dryer, Matthew S.
2005a Definite articles. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 154–157. Oxford: Oxford University Press.
2005b Indefinite articles. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 158–161. Oxford: Oxford University Press.
2005c Order of object and verb. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 338–341. Oxford: Oxford University Press.
2005d Order of genitive and noun. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 350–353. Oxford: Oxford University Press.
2005e Polar questions. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 470–473. Oxford: Oxford University Press.

Goedemans, Rob, and Harry van der Hulst
2005 Weight-sensitive stress. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 66–69. Oxford: Oxford University Press.

Haspelmath, Martin
2001 The European linguistic area: Standard Average European. In Language Typology and Language Universals, Vol. 2, Martin Haspelmath, Ekkehard König, Wulf Oesterreicher, and Wolfgang Raible (eds.), 1492–1510. Berlin: Walter de Gruyter.
2005 Negative indefinite pronouns and predicate negation. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 466–469. Oxford: Oxford University Press.

Haspelmath, Martin, Ekkehard König, Wulf Oesterreicher, and Wolfgang Raible (eds.)
2001 Language Typology and Language Universals. Vol. 2. (Handbooks of Linguistics and Communication Science 20.2) Berlin: Walter de Gruyter.

Haspelmath, Martin, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.)
2005 The World Atlas of Language Structures. Oxford: Oxford University Press.

Maddieson, Ian
2005a Consonant inventories. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 10–13. Oxford: Oxford University Press.
2005b Vowel quality inventories. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 14–17. Oxford: Oxford University Press.
2005c Consonant-vowel ratio. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 18–21. Oxford: Oxford University Press.
2005d Uvular consonants. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 30–33. Oxford: Oxford University Press.
2005e Front rounded vowels. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 50–53. Oxford: Oxford University Press.

Rubino, Carl
2005 Reduplication. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 114–117. Oxford: Oxford University Press.

Stassen, Leon
2005 Comparative constructions. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 490–493. Oxford: Oxford University Press.

Wälchli, Bernhard
2005 Par tipologijas atlantu un latviešu valodas materialu taja. [About the typological atlas and the Latvian material in it.] Paper presented at the Letonistu seminars [Letonists' seminar], August 6–13, 2005, Mazsalaca, Latvia.


Remarks on rarity

Östen Dahl

In his paper, Cysouw proposes to approach the notion of exceptionality by operationalising it with the help of the notion of rarity. He considers a trait to be exceptional when “it is rare with regard to the known worldwide diversity”. Although he does not define the term ‘rare’, one can deduce from his way of using it that it means ‘less common than expected, given the number of alternatives’. ‘Rare’ is a word belonging to everyday language, and its meaning is accordingly somewhat fluid. For instance, it is not always clear whether what is crucial is frequency or absolute numbers. In some domains, however, we find explicit definitions of ‘rare’. Thus, according to standards applied in information about drugs in the EU, a rare side-effect of a drug is one which appears in less than one case in 1,000. If we applied these standards to languages, no feature could be regarded as rare that is manifested in more than six or seven languages in the world. In a language sample of the size that is average in the WALS maps – about 400 languages – such a feature would normally show up in one language or none at all, and it would be unlikely to be represented by a separate value in WALS. In other words, if we are to study rarity in the WALS data, we must have more modest demands on what counts as rare. Indeed, the ‘rare’ or ‘exceptional’ features discussed by Cysouw in the paper tend to have a much higher incidence. In the most extreme case, Cysouw says that northwestern European languages are ‘exceptional’ because they lack productive reduplication and employ comparative particles – properties which are found in 15 and 13 per cent of the respective world-wide samples in WALS. (Note that any side-effect of a drug which has an incidence of more than one per cent is labeled ‘common’.) Admittedly, these properties are not among the ‘Top 10 of the rarest characteristics as found in northwestern Europe’ – among those, the highest global incidence is 7.2 % (relative pronouns). The question where to draw the borderline for ‘rarity’ and ‘exceptionality’ is perhaps of no great theoretical significance, but there are some other problems here, which I shall now turn to.
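The arithmetic behind this estimate can be checked with a simple binomial sketch (the 400-language sample size and the 1-in-1,000 incidence come from the discussion above; the independence assumption across sampled languages is mine):

```python
from math import comb

# Binomial sketch: how often does a trait with incidence 1/1000 show up
# in a 400-language sample?
n, p = 400, 1 / 1000

def binom_pmf(k):
    # Probability that exactly k of the n sampled languages show the trait
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_zero_or_one = binom_pmf(0) + binom_pmf(1)
print(round(p_zero_or_one, 2))  # → 0.94: such a feature indeed shows up
                                # in one language or none at all
```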

For something to be called ‘exceptional’, it would seem necessary for it to be an exception to some generalization. In the case of reduplication, the lack


of which was said to be exceptional, the generalization would be “Languages have productive reduplication”. Even if this is true only of 85 per cent of all languages, as the WALS data suggest, it still looks like a legitimate generalization. The case of the use of comparative particles in European languages is more problematic. The generalization would have to be negatively formulated: “Languages do not use comparative particles”. To my mind, it is a bit counterintuitive to speak of ‘exceptions’ to such negative generalizations, in particular when we are dealing with one among several ways of realizing a certain kind of construction. In this case, the WALS map shows four types of comparative constructions, and admittedly, ‘Particle Comparative’ is the least frequent among those – it is found in 22 languages, but it is not drastically less frequent than the two middle ones (‘Exceed Comparative’ and ‘Conjoined Comparative’), which occur in 33 and 34 languages. Thus, it may well happen that several of the alternatives have relatively low individual frequencies and will all be seen as ‘exceptional’. An example of this would be the feature ‘Weight-sensitive stress’. In the sample of 500 languages, only 219, that is roughly 44 per cent, have weight-sensitive stress, and these are distributed over seven different types, none of which has more than 13 per cent of the total.

Another problem is exemplified by the second item on the top-ten list of rare characteristics in NW Europe, “Uvular continuants only”. Cysouw says that “the existence of uvular continuants … without the existence of uvular stops as well, is highly uncommon”. Well, it depends on how you count. It is true that out of 566 languages, only twelve show this value of the feature ‘Uvular consonants’, but on the other hand, that makes up 20 per cent of the 60 languages with uvular continuants. And actually, only two of the twelve languages with ‘Uvular continuants only’ are found in NW Europe. (We do know that there are more of them outside the sample, but it would be ‘cheating’ to include them in the count.) The problem is really a general one: if a set S1 of languages is the intersection of two other sets S2 and S3, the assessment of the rarity of S1 should not be based on its absolute frequency but rather on the relative frequency of S2 given S3, or vice versa.

Two notions that have interested me are that of linguistic complexity (Dahl 2004) and that of typological diversity (Dahl 2008). Both of these do in fact have non-trivial relations to rarity. Consider, to begin with, complexity, and as a concrete example, vowel systems. It is reasonable to assume that a vowel system with a larger number of distinctions is more complex than one with a smaller number. In the paper, Cysouw says that the presence of rounded front vowels in the languages of northwestern Europe is related to “the exceptionally high number of vowel quality distinctions” in those languages. It seems that there are certain generalizations we can make about systems that are made


up of elements of varying frequencies. Thus, given two distinctions a and b in a vowel system, if a is rarer (less frequent) than b, then systems that contain a will on average contain a larger number of distinctions and thus be more complex than systems that contain b. This is a claim that depends solely on probability-theoretical considerations. However, the connection between rarity and complexity is enhanced by the universality or near-universality of some vowel quality distinctions (“All languages have some variations in vowel quality that indicate contrasts in the vowel height dimension” (Ladefoged & Maddieson 1996: 286)) and by the existence of implicational universals to the effect that the presence of a less frequent element entails the presence of more frequent ones. But in Cysouw’s paper, rarity is a property that pertains to values of features in WALS rather than to distinctions or elements in a system, and some of the values concern the absence rather than the presence of elements, such as the feature value ‘No adpositions’, which is found in 28 out of 1074 languages and would thus be fairly rare. If we assume that languages without adpositions are ceteris paribus less complex than the ones that have them, this will be an example of a rare trait that is not connected with higher complexity.
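The probability-theoretical point above can be illustrated with a small Monte Carlo sketch (the five distinctions and their inclusion probabilities are invented for illustration; all that matters is that ‘a’ is rarer than ‘b’):

```python
import random

random.seed(42)

# Hypothetical vowel distinctions with independent inclusion probabilities;
# 'a' (p = 0.1) is rarer than 'b' (p = 0.6)
probs = {"a": 0.1, "b": 0.6, "c": 0.5, "d": 0.5, "e": 0.5}

def sample_system():
    # A random vowel system: each distinction is included independently
    return {v for v, pr in probs.items() if random.random() < pr}

systems = [sample_system() for _ in range(50_000)]

def mean_size(distinction):
    # Average system size among the systems that contain the distinction
    sizes = [len(s) for s in systems if distinction in s]
    return sum(sizes) / len(sizes)

# Systems containing the rarer distinction are larger on average
print(mean_size("a") > mean_size("b"))  # → True
```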

Turning now to typological diversity, I demonstrate in Dahl (2008) that at least to a certain extent, linguistically diverse parts of the world are also the places where rare features show up most. Thus, I argue in the paper that the indigenous languages of the Americas contribute about 40 per cent of the total typological or structural diversity of the languages of the world, although they make up only about 15 per cent of those languages. Comparing this to Cysouw’s result, eight languages on his list of the 15 languages with the highest ‘mean rarity level’ are from the Americas. Looking at families with at least three languages included in the WALS data, out of the top ten having the highest ‘weighted rarity’, six are from North America and one from South America. Finally, of the 15 ‘areas of high rarity’, four are in North America and one in South America.

References

Dahl, Östen
2004 The Growth and Maintenance of Linguistic Complexity. (Studies in Language Companion Series 71.) Amsterdam/Philadelphia: Benjamins.

Dahl, Östen
2008 An exercise in a posteriori language sampling. Sprachtypologie und Universalienforschung 61: 208–220.


Ladefoged, Peter, and Ian Maddieson
1996 The Sounds of the World’s Languages. Oxford: Blackwell.


Some more details about the definition of rarity

Michael Cysouw

Replying to the many stimulating comments raised by Dahl, I am first rather astounded by his assertion that I did not define the term ‘rare’. In fact, the whole of Section 3 defines the precise mathematical operationalisation of my notion of rarity. And indeed, my notion of rarity is a relative one (and I would even go as far as to argue that a notion of ‘absolute rarity’ is meaningless, cf. Cysouw 2003). What is more, even the evaluation of the (relatively defined) Rarity Indices is relative. I explicitly do not presuppose any absolute norm separating ‘low’ from ‘high’ Rarity Indices, because I do not know of any data that could help us set such a norm. Thus, the only observations I make in the paper are about the most extreme (relative) rarities as compared to all other (relative) rarities. The list of rare traits of northwestern European languages in Section 7 is thus a list of ‘relative relative rarity’. Whether these traits are really all noteworthy is of course open to interpretation. Looking at the values of the Mean Group Rarity Index for the traits themselves (as reported in the first column of Table 4), I would suggest that the first four are really much more significant rarities in northwestern Europe than the others in the list. Still, I find it highly stimulating to know what other European characteristics should be considered rare when the notion of rarity is interpreted a bit more leniently. Just to take up the least extreme case of relative pronouns (as referred to by Dahl), this trait is indeed found in 7.2 % of the world’s languages, which one might (or might not) find rare. However, looking at the worldwide distribution of relative pronouns, shown here in Figure 1 (Comrie & Kuteva 2005 = WALS 122), it is clear that it is actually a prime example of a regionally bound rarity.

Next, Dahl discusses two possible problems with my notion of rarity. First, from the context of the theme of the present collection of papers he warns that the intuitive notions of rarity and exceptionality do not necessarily coincide. In principle, I completely agree with this comment, as I write in the introduction to the paper: “exceptionality is a more encompassing term than rarity.” However, I think that the difference proposed by Dahl does not differentiate the two. For something to be called an exception, Dahl argues, there has to be some presupposed generalization relative to which it can be an exception. Now, when a trait


Figure 1. Usage of relative pronouns (dots) compared with other relativization strategies (squares) for the relativization of subjects (adapted from Comrie & Kuteva 2005).

‘X’ is rare, but the opposite trait ‘not-X’ is not definable (or only negatively definable by saying that it is not X), then it is difficult to argue relative to what X is an exception. Here I disagree. The only generalization that is necessary is the presence of one trait (or a group of traits) that is common, and then everything else can be declared both exceptional and rare relative to the common case(s). One example discussed by Dahl concerns the typology of comparative constructions (Stassen 2005 = WALS 121). There are four types distinguished, one of which is more common than the others: Locational (47%), Exceed (20%), Conjoined (20%), and Particle (13%). Now, relative to the Locational strategy, all others are (more or less) rare and (more or less) exceptional. The radical situation would be an extremely fine-grained typology of the world’s languages in which all types are rare (implying of course that there are very many different types). In this situation (which would probably be an anomaly in itself, cf. Cysouw 2010), I do not think anybody would want to claim that all types are exceptions, because indeed there is nothing to be an exception against. However, in my operationalisation of rarity this situation would also not result in the presence of any rare types. In the Rarity Index, as proposed in (3) in the paper, the proportion of occurrence is taken relative to the number of types that are distinguished. The result is that in the hypothetical situation with very many roughly equally frequent small types, the Rarity Index will consider all types to be not rare. So, as far as there are problems with the definability of the ‘non-rare’ counterpart, I think the interpretations of rarity and exceptionality coincide.


Secondly, Dahl argues that a trait might be a composition of various independent characteristics, only the combination of which is rare. In such situations rarity should be assessed relative to the expected intersection of the traits in isolation. I completely agree with this, but the problem is caused by the unstructured coding of the values of WALS. Unfortunately, WALS does not include explicit information on the finer-grained structure of the traits distinguished. For the present paper, I decided not to perform any recoding of the WALS data, as this would be a project in its own right (see Footnote 4 of the paper and the reference therein). But suppose one would perform such a recoding, as suggested by Dahl; then the computation of the Rarity Index for composed traits would indeed change. As an example, let us consider the WALS map on uvular consonants (Maddieson 2005 = WALS 6) that was brought up by Dahl. There are four different types distinguished in this map that can easily be decomposed as an intersection of two binary parameters, as shown in Table 1. There is a strong correlation between these parameters (Fisher’s Exact p < 10−7). This implies that the twelve cases of ‘uvular continuants without uvular stops’ are actually far fewer than would be expected by chance alone (the expected frequency is 480 × 60/566 = 50.9).

Table 1. Typological distribution of uvular consonants

                               Uvular Stops
                            No      Yes     Total
   Uvular Continuants  No   468      38      506
                       Yes   12      48       60
   Total                    480      86      566
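The expected frequency and the exact test can be recomputed from the counts in Table 1 using only the standard library (a sketch; the one-sided hypergeometric tail stands in for Fisher’s Exact test):

```python
from math import comb

# Counts from Table 1 (WALS feature 6, uvular consonants)
no_stops_yes_cont = 12                 # uvular continuants without uvular stops

n_total = 468 + 38 + 12 + 48           # 566 languages in the sample
n_continuants = 12 + 48                # 60 languages with uvular continuants
n_no_stops = 468 + 12                  # 480 languages without uvular stops

# Expected count of 'uvular continuants without uvular stops' under independence
expected = n_continuants * n_no_stops / n_total
print(round(expected, 1))  # → 50.9

# One-sided exact test: probability of 12 or fewer such languages given the
# margins (hypergeometric tail)
def hypergeom_pmf(k):
    return (comb(n_continuants, k)
            * comb(n_total - n_continuants, n_no_stops - k)
            / comb(n_total, n_no_stops))

p_value = sum(hypergeom_pmf(k) for k in range(no_stops_yes_cont + 1))
print(p_value < 1e-7)  # → True
```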

The Rarity Index, as shown in (3) in the original paper, is actually of the form “expected proportion divided by observed proportion” (E/O). The observed proportion (O) is the frequency of a trait fi divided by the total number of languages ftot (i.e. 12/566 in the current example). The expected proportion (E) that I used in the paper was simply the expectation under the assumption of independence, viz. 1/n, where n is the number of values distinguished (i.e. 1/4 in the current example). The Rarity Index for this trait is thus E/O = ftot/(n × fi) = 566/(4 × 12) = 11.8. However, when the feature is decomposed as shown in Table 1, then the expected proportion changes: the expected proportion is the


product of the independent proportions of the decomposed traits. In the example the expected proportion is the proportion of ‘no uvular stops’ times the proportion of ‘uvular continuants’ (i.e. 480/566 × 60/566 = 0.09, which is notably smaller than the 1/4 assumed in the paper). In this way, composed traits that have a lower expectation than 1/n get a lower ‘Composed’ Rarity Index. For the present example this index would be 480/566 × 60/566 × 566/12 = 4.24, which is clearly smaller than the 11.8 from the index as used in the paper. In general, when a feature f is decomposed into a set of co-occurring features f1, f2, f3, … ft, then the expected proportion for fi is the product of all independent proportions, see (1), and the Rarity Index (RI) changes accordingly, as shown in (2). However, all this of course highly depends on the proposed decomposition of WALS features. In the current example the decomposition is rather unproblematic, but for many other features in WALS this is not as easy.

E(fi) = ∏(s=1…t) fsi/ftot                                        (1)

RI(fi) = E/O = E × 1/O = ( ∏(s=1…t) fsi/ftot ) × ftot/fi         (2)
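Both versions of the index can be checked numerically. The sketch below (the function names are ad hoc) reproduces the two figures computed above for the ‘uvular continuants only’ trait:

```python
# Rarity Index as in the original paper: E/O with E = 1/n and O = f_i/f_tot
def rarity_index(f_i, f_tot, n_values):
    return f_tot / (n_values * f_i)

# Composed Rarity Index per equations (1) and (2): E is the product of the
# proportions of the decomposed traits
def composed_rarity_index(f_i, f_tot, component_freqs):
    expected = 1.0
    for f_s in component_freqs:
        expected *= f_s / f_tot
    return expected * f_tot / f_i

# 'Uvular continuants only': 12 of 566 languages, 4 values on the map,
# decomposable into 'no uvular stops' (480) and 'uvular continuants' (60)
simple = rarity_index(12, 566, 4)
composed = composed_rarity_index(12, 566, [480, 60])
print(round(simple, 1), round(composed, 2))  # → 11.8 4.24
```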

Finally, building on the discussion in Dahl’s reply, I would like to suggest that the relation between complexity and rarity is of an implicational nature, in the sense that complexity probably implies rarity, but clearly not vice versa. As for the relation between areal diversity and rarity, I am not convinced that there should be any relation. Of course, in highly diverse areas more rarities will be found, but so will common traits. The real question should be whether the proportion of rare traits to common traits correlates with diversity. As far as I am concerned, the verdict on this matter is still open.

References

Comrie, Bernard, and Tania Kuteva
2005 Relativization on subjects. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 494–497. Oxford: Oxford University Press.

Cysouw, Michael
2003 Against implicational universals. Linguistic Typology 7: 89–101.


Cysouw, Michael
2010 On the probability distribution of typological frequencies. In The Mathematics of Language, Christian Ebert, Gerhard Jäger, and Jens Michaelis (eds.), 29–35. Berlin: Springer.

Maddieson, Ian
2005 Uvular consonants. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 30–33. Oxford: Oxford University Press.

Stassen, Leon
2005 Comparative constructions. In The World Atlas of Language Structures, Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.), 490–493. Oxford: Oxford University Press.


Subject index

ablaut 15, 146, 150, 154–156
   ablaut classes 15
   ablaut formation 17
acceptability
   acceptability hierarchy 295, 349
   relative acceptability 344
   see also speakers’ judgements; grammaticality judgements
accusative languages 32
acquisition 262–264, 291, 374
agreement 13, 37–38, 40, 44–45, 119–121, 392, 404
   long-distance agreement 36, 37, 40, 41, 44, 45
analogy 7, 141, 143, 147, 154, 170
anaphor 296, 298
areal relationships 35, 47, 421
   Sprachbünde 424, 428
argument realisation 213
aspect 245–247, 251
Associative Network Model 148
Autolexical Grammar 41
autosegmental phonology 84

binding 294–295, 299, 318, 327
borrowings 60, 72
brevity 157

canonicity 107–126, 136
   canonical inflection 108, 139, 141
   non-canonicity (external, internal) 111–112
   see also case: non-canonical case marking
case 13, 40, 41, 215, 243, 341–343
   accusative case for subjects 32, 48, 214, 223, 230, 233, 244
   accusative case for experiencers 244, 251
   case alternation on objects 245
   case conflict 344, 351
   case hierarchy 386
   case matching 372
   Dative Sickness 217, 244
   Dative Substitution 234
   Exceptional Case Marking 40
   experiencer verbs 223
   genitive case 32, 214, 217, 230, 244–245, 251
   Nominative Substitution 217
   non-canonical case marking 48
   second genitive 118
   see also ergativity
categorial split 152
clusivity 10
clusterings 48
comparative constructions 427, 434, 438
competition 7, 12, 13, 26
   see also rules: competing rules
complexity 434
   see also morphology: morphological complexity; production: production complexity
comprehension see processing
consonants
   consonant phoneme inventories 34–35
   nasal consonants 34–35, 50
   uvular consonants 425, 434, 439
constraints 300, 304, 315
   Alignment Constraints 63
   conflicting constraints 341
   Constraint Application 309–310, 312, 317
   constraint relaxation 405


   Faithfulness Constraint 63, 351–352, 373
   soft constraints 340
context-free grammar 379
core grammar 11, 259, 404
   core phonology 89
counterevidence 369

dative see case
Decathlon Model 311, 313, 315
default 8, 378, 395
deponency 113–114, 120
derivation 38–39, 41, 47, 49
diachronic change 5, 15, 17, 127, 133, 213–241
   see also sound change; sporadic change
   see also grammaticalisation
diacritic feature 63, 86, 97, 99, 101, 105
dialect 177, 349
   dialectal or idiolectal variation 75
   dialect geography 11
differentiation see overdifferentiation
diminutives 21–22
distinctiveness 155, 157, 164
diversity 434–435, 440

E-language 127
Economy Theory 141, 155
Elsewhere rule see rules
Empty Category Principle (ECP) 300, 304, 318
ergativity 32
evidentiality 425
exceptions
   accommodating exceptions 35
   arbitrary exceptions 214
   higher-order exceptionality 32, 108, 394
   hyper-exception 25
   lexical exceptions 31
   regularising exceptions 35, 43, 50
   soft exceptions 176, 181
   structured exceptions 214
   see also case: Exceptional Case Marking
   see also rules: dialectical nature of the relationship between rules and exceptions
exhaustivity 109
explanations 35, 48, 50
extra-grammaticality 15, 23, 26, 380, 382, 389–392, 394–396, 405

Faithfulness Constraint see constraints
family resemblance 214
folk linguistics 339
frequency 17, 143, 148, 152–153, 155–157, 164, 169–171, 176, 179–180, 191, 219, 229, 233, 235, 292, 311, 340, 354–356, 434
   see also processing: statistical processing; universals: statistical universals
functional categories 267–268

gender 117
genealogical groups 35, 47, 421
generalisation 132–133, 257
   sociolinguistic overgeneralisation 16
   usage-based generalisations 272
   see also typology: typological generalisations
Generative Grammar 36, 39, 255–256, 291–323, 340, 371, 402
Government and Binding Theory 40, 275, 283
gradience 5, 12, 23, 25, 51, 176, 293, 294, 306–308, 328, 333, 337, 339–340, 371
grafts 363, 374
grammatical indeterminism 257
   see also underdetermination
grammatical levels 8, 15
grammaticality 306
   grammaticality judgement 291–323, 374


   see also acceptability; ideal speaker-listener; extra-grammaticality
grammaticalisation 169
   see also diachronic change
Grimm’s Law 6–7

homophony 131–132, 136
HPSG 403

I-language 127, 133
Iceberg Effect 314
ideal speaker-listener 9, 255, 307
inflection
   inflectional split 149
   verbal morphology 24
   Wechselflexion 151, 163
   see also aspect; case; gender; morphology; paradigms; person-number systems; tense; verb classes: strong verbs
IPP-effect 18
irregularisation 140, 156, 169
   see also regularity
irregularity 149, 155, 157, 163

language production see production
lexical fusion 149
Lexical Integrity Hypothesis 22
Lexical Parametrisation Hypothesis see parameters: Lexical Parametrisation Hypothesis
Lexical Phonology 59, 62–63
lexicon 11–12, 24, 122, 256, 268, 335
   Lexicon Optimisation 64, 77–78
   lexical (pre-)specification 71, 95, 103
   redundancy-free lexicon 80
   see also exceptions: lexical exceptions
loan elements 61, 85
locality 37–38, 40

markedness 143, 164, 170, 340, 361–362, 371
   markedness constraints 351–352, 362, 373
   markedness hierarchy 350
   markedness profiles 354
maximal entropy classification 184, 199
Maximum Underspecification (MU) model 96, 103
Minimalist Program 265–267, 269, 270, 275, 283, 289, 313
Minimise Domains 270, 272–273
morphology 20
   morphological complexity 74
   morphophonological alternations 60
   morphological naturalness 141
   morphosyntactic specification 110
   Natural Morphology 165, 171
   preterite present 19
   static morphology 165
   word formation 20, 22
   see also inflection
naturalness 140–141, 297
Neogrammarian Controversy 5
Network Model 109, 156
non-canonicity see canonicity
non-coherent class 230
norm 344
number, grammatical 7–8, 12, 16–17, 31, 33–34, 49, 113

objects see syntactic relations
Occam’s Razor 4
Optimality Theory (OT) 14, 63, 67, 69, 71, 77, 96, 131, 293, 313, 340, 342, 351, 357, 361–362, 369, 371–374
   Stochastic OT 209, 315, 316
ordering paradoxes 62–63
outliers 9
Output Selection 309–310, 312, 317
overdifferentiation 113, 116–117, 120, 128, 155, 157, 163
   differentiation 164


overgeneralisation see generalisation
overgeneration 396

paradigms 109, 115, 139, 150, 393
parameters 258, 263, 265, 268–269, 272, 283–284, 286, 289, 306, 362, 372, 374
   Lexical Parametrisation Hypothesis (LPH) 262, 269
   macroparameters 258, 269
   microparameters 265, 285, 332
   Null-Subject Parameter 259, 284, 289
   see also Principles and Parameters model
parsing 270, 274–275, 283, 289–290, 328, 356, 379
particles 51
   comparative particles 434
   particle constructions 41
   particle order 41
   particle structures 43
   response particle 12
   see also verb-particle constructions
passive 1, 4, 32–33, 38, 39
performance 51, 270–271, 275, 283, 325, 333, 402
periphery 11, 259
person-number systems 10
politeness 11, 16
polysemy 34
pragmatics 11, 23
predictability 3, 179, 181, 183–184, 186, 189, 192, 197, 201, 207–208
Principles and Parameters model 300
probability 197, 202, 205, 207–209, 313, 384, 435
processing 24–25, 49, 51, 186, 207, 401
   Dual-Processing Model 148, 156
   statistical processing 384; see also probability
   tolerant processing 378
   see also parsing
production 186, 206
   production complexity 207
   production errors 25
productivity 89, 156, 169, 214, 221, 229, 243, 251
   partial productivity 226
   semi-productivity 214, 235, 243
pronouns 10, 17
   negative indefinite pronouns 427
   see also relative clause: relative pronouns
proper names 74, 85
proto-patterns 20
prototypes 12, 214–215

raising 39–40, 44, 49
Raritätenkabinett, Grammatisches 11, 32
rarity 411–412, 433, 435, 437
   centres of rarity 422–423
   Group Rarity 420, 437
   mean rarity 418
   rare languages 429
   Rarity Index 414–416, 437–438, 440
reanalysis 9, 221
reduplication 427, 433
reflexivity 294, 296
   long-distance reflexives 46
regularity
   subregularities 59, 66, 163, 165, 170; see also subsystem
   typological regularities 133
   see also exceptions: regularising exceptions
   see also irregularity
Relational Grammar 38, 41, 49
relative clause 33, 41, 49, 177, 200
   free relatives 339–359, 363, 370, 374
   non-restrictive relative clauses 178


   non-subject relative clauses 177, 205
   relative pronouns 51, 342, 427, 437
   relative-clause types 197
   relativiser 177–178, 197–198, 205, 207–209
   restrictive relative clauses 178
relevance 153, 157, 165, 171
repair 17
routinisation 199
rules 3, 175, 266
   competing rules 7; see also competition
   dialectical nature of the relationship between rules and exceptions 4
   Elsewhere Rule 8, 12, 59
   mal-rules 382, 405
   movement rule 39
   P-Rules 62
   relaxation rules 383; see also constraints: constraint relaxation
   rules of referral 132
   Sezer Stress Rule 73, 85
   transformational rules 31, 39, 47
salience 188
schemas 156
sound change 142, 144, 156
   Sound Laws 6–7
   sporadic change 7
speakers’ judgements 293, 311, 328, 333, 366, 370, 372, 404
   see also acceptability
specification 67
   Radical Underspecification 82
   see also lexicon: lexical (pre-)specification
   see also Maximum Underspecification (MU) model
   see also morphology: morphosyntactic specification
Sprachbünde see areal relationships
Standard Average European 428
storage 165
Stress Assignment 32, 59–87
Sturtevant’s Paradox 7
subclass 25, 31–34, 37, 43–46, 50, 115, 221, 326
   see also superclass
subjects see syntactic relations
subsystem 4–5, 12–14, 17, 20, 22, 24, 89
   see also regularity: subregularities
   see also subclass
superclass 31–34, 43–46, 50–51
   see also subclass
superiority 300, 306, 318, 329–330
suppletion 115, 120, 128–129, 136, 139, 141–142, 150, 153, 171
syncretism 112, 115, 128, 130–132, 136, 154
syntactic relations
   object coreference 296, 313, 318
   oblique subjects 216, 252
   theme/patient subjects 223
   see also case

tense
   nominal tense 47
   past tense 33
   perfect 425
   preterite loss 155
   tense forms 21
that-trace effect 304–306, 318
Typed Feature Logic 384, 386, 394
typology
   syntactic typology 257
   typological consistency 264
   typological generalisations 132–133, 257
   see also regularity: typological regularities

underdetermination 3, 20
   see also grammatical indeterminism
underlying representation 62, 84
underspecification see specification


unification 386–387, 392, 394
Universal Grammar (UG) 10, 132–133, 137, 258, 265, 269–270, 272, 274, 286, 328
universals
   Greenberg-type universals 9
   statistical universals 9
   see also probability
   see also Universal Grammar (UG)

V2-position 22–23
variation 331, 343–344, 357, 361–364, 367, 369
verb classes
   athematic verbs 139–140, 146, 170
   modal verbs 18
   psych-verbs 13
   rückumlaut verbs 143
   strong verbs 15, 17
verb-particle constructions 39, 44–48
Verner’s Law 6, 142, 150
vowel harmony 32, 59–87

weight-sensitive stress system 427
well-formedness 311, 313–314, 328, 339


Language index

Abkhaz 10
Afrikaans 17
Algonquian languages 44–45
Ambrym 10
Amharic 34, 264–265, 271
Arabic 5, 264

Bemba 200–201, 209
Berber 268
Blackfoot 45
Burmese 260

Catalan 99
Caucasian, Northwest 412
Chamorro 268
Chinese, Mandarin 46, 258–261, 284

Danish 144, 222
Dravidian 62
Dutch 5, 142–144, 150–151, 154–155, 164, 246, 326, 418, 424, 426
   Middle 142–143
Duwamish 35

English 5, 12, 15, 17–18, 20–21, 31, 33–34, 39, 41, 44–45, 47–49, 60–63, 97, 130–131, 142, 144, 149, 150–152, 154–155, 175–195, 201, 213–214, 217, 244, 246, 256, 264, 266, 268, 271, 285, 300, 302–304, 326, 329–330, 336, 356, 364, 366, 393, 403, 422–426, 428
   Early Modern 245
   Middle (ME) 16, 149
   Old (OE) 7, 149, 245
Esperanto 14
Estonian 245

Faroese 25, 144, 213–241, 243–244, 251–252
Finnish 79, 128, 245, 247, 271
French 5, 16, 163, 165, 171–172, 261, 264, 266, 285, 390, 422, 424, 426, 427–428
   Prince Edward Island 261
Frisian 147–148, 150, 155, 422, 424
   West 145, 148
   North 144–145, 170
   Old 147–148

German 5–8, 12–15, 17–18, 20–21, 25, 33, 51, 100–101, 103–104, 106, 142, 149–150, 153, 156, 163, 165, 171, 217, 243–244, 246, 251, 252, 264, 294, 325, 327, 329, 330–331, 336, 364, 366, 369–370, 372–373, 422, 424, 427
   Bavarian 11
   Early New High (ENHG) 149, 156, 170
   Low 145, 170
   Middle 245
   Middle High (MHG) 16, 146, 149, 170
   Middle Low 150
   New High (NHG) 142, 146, 150–152, 154, 155
   Old High (OHG) 16, 20, 142, 146, 149, 152, 164, 170–171, 245–246
   Southern 261
   Spoken 19, 144–145, 151
   Swiss German 144–145, 151, 170, 365
Germanic (GMC) 15, 17, 24, 142, 169, 326
   Continental West 428
Germanic languages 139–162, 217
Gothic 6


Greek 60, 284, 342, 373, 392, 404
   Ancient 6, 392, 404
   Modern 342, 373

Hixkaryana 47
Hungarian 36, 40–41, 97–99

Icelandic 25, 32, 48–49, 89, 130, 144–145, 170, 213–241, 243–244, 246–247, 251–252, 261
Indo-European (IE) 267
   Proto- 7, 15, 142, 150, 163
Indonesian 260–261
Irish 34, 264
Italian 60, 163–165, 267, 284
   Palermo dialect 98

Japanese 331

Karen 274
Kartvelian 412
Kirghiz 32

Latin 73, 113–114, 163, 165, 171
Latvian 413–414
Luxembourgish 144–145, 150, 154, 170–171

Malay, Singapore 46
Malayalam 62
Maltese 113, 128
Mura 35

Niger-Congo languages 35
Nordic, Proto- 247
Norwegian 144, 150, 422
   East Norwegian dialect 116

Ojibwe 10

Polish 245
Portuguese, Brazilian 259, 284
Potawatami 47
Puget Sound 35

Quechua 40–41, 44
Quileute 35

Romance 163, 169, 171, 267, 286
   dialects 268
Rotokas 35
Rumantsch 129
Russian 16, 118, 245–246

Sango 274
Sanskrit 62
Saramaccan 261
Serbian 120–121
Serbo-Croatian 245
Slovene 112, 115, 132, 393
Snoqualmie 35
Somali 47
Spanish 171, 284
Sranan 261
Swedish 144, 146, 149–150, 152, 217, 245
   Old 245

Tagalog 89
Thai 260
Tsez 44–45
Turkish 25, 32, 59–94, 245
   Anatolian dialects 65
   Istanbul dialect 65, 74
Tuvan 79

Wari’ 412
Welsh 264, 268

Yiddish 60–61, 63

Zulu 32