A Computational Theory of Writing Systems
Contents
1 Reading Devices
1.1 Text-to-Speech Conversion: a Brief Introduction
1.2 The Task of Pronouncing Aloud: a Model
1.2.1 A simple example from Russian
1.2.2 Formal definitions
1.2.2.1 AVM's and Annotation Graphs
1.2.2.2 Definitions
1.2.2.3 Axioms
1.2.3 Central claims of the theory
1.2.3.1 Regularity
1.2.3.2 Consistency
1.2.4 Further issues
1.2.4.1 Why a constrained theory of writing systems?
1.2.4.2 Orthography and the "segmental" assumption
1.3 Terminology and Conventions
1.A Overview of FSA's and FST's
1.A.1 Regular languages and finite-state automata
1.A.2 Regular relations and finite-state transducers
2 Regularity
2.1 Planar Regular Languages
2.2 The Locality Hypothesis
2.3 Planar Arrangements: Examples
2.3.1 Korean Hankul
2.3.2 Devanagari
2.3.3 Pahawh Hmong
2.3.4 Chinese
2.3.5 A counterexample from Ancient Egyptian
2.4 Cross-Writing-System Variation in the SLU
2.5 Macroscopic Catenation: Text Direction
2.A Chinese Characters
3 ORL Depth and Consistency
3.1 Russian and Belarusian Orthography
3.1.1 Vowel reduction
3.1.2 Regressive palatalization
3.1.3 Lexical marking in Russian
3.1.4 Summary of Russian and Belarusian
3.2 English
3.3 Serbo-Croatian Devoicing
3.3.1 Methods and materials
3.3.2 Results
3.4 Cyclicity in Orthography
3.5 Surface Orthographic Constraints
3.A English Deep and Shallow ORL's
3.A.1 Lexical representations
3.A.2 Rules for the deep ORL
3.A.3 Rules for the shallow ORL
4 Linguistic Elements
4.1 Taxonomies
4.1.1 Gelb
4.1.2 Sampson
4.1.3 DeFrancis
4.1.3.1 No full writing system is semasiographic
4.1.3.2 All full writing is phonographic
4.1.3.3 Hankul is not featural
4.1.4 A new proposal
4.1.5 Summary
4.2 Chinese Writing
4.3 Japanese Writing
4.4 Some Further Examples
4.4.1 Syriac syame
4.4.2 Reduplication markers
4.4.3 Cancellation signs
5 Psycholinguistic Evidence
5.1 Orthographic Depth
5.1.1 Evidence for the Orthographic Depth Hypothesis
5.1.2 Evidence against the Orthographic Depth Hypothesis
5.2 "Shallow" Processing in "Deep" Orthographies
5.2.1 Phonological access in Chinese
5.2.2 Phonological access in Japanese
5.2.3 Evidence for the function of phonetic components in Chinese
5.2.4 Summary
5.3 Connectionist Approaches
5.3.1 Outline of the model
5.3.2 What is wrong with the model
5.4 Summary
6 Further Issues
6.1 Adaptation of Writing Systems
6.2 Orthographic Reforms
6.2.1 The 1954 spelling rules
6.2.2 The 1995 spelling rules
6.3 Numerical Notation
6.4 Abbreviatory Devices
6.5 Non-Bloomfieldian Views on Writing
6.6 Postscript
List of Figures
1.1 A partial linguistic representation for the sentence in (1.1). Shown are a phonetic transcription, a prosodic analysis into two intonational phrases (ι) and one utterance (U), accent assignment (*), a set of part of speech tags, and a simple phrase-structure analysis. Phonetic symbols are IPA. Note that "MP" means "measure phrase".
1.2 A simple FST implementing the rewrite rules in (1.5). In this example the machine has a single state (0), which is both an initial and a final state. The labels on the individual arcs consist of an input label (to the left of the colon) and an output label (to the right). Here, capital Roman letters are used to represent the equivalent Cyrillic letters.
1.3 An acceptor for […]. The heavy-circled state (0) is (conventionally) the initial state, and the double-circled state is the final state.
1.4 An FST that accepts a:c (b:d)*.
1.5 Three transducers, where […].
2.1 […]
2.2 Another figure described by […].
2.3 The five planar concatenation operations: (a) […]; (b) […]; (c) […]; (d) […]; (e) […].
2.4 A 2FSA that recognizes Figure 2.2. The labels "R" and "D" on the arcs denote reading direction; "left" on state 0 (the initial state) denotes the position at which scanning begins.
2.5 A 2FST that maps the expression […] to […].
2.6 The syllables ⟨mos⟩ /mo/ "cannot", and ⟨cal⟩ /cal/ "well" in Hankul.
2.7 A two-word Mayan glyph representing the adjective sequence yax ch'ul "green sacred", after (Macri, 1996, page 179), with the arrangement […]. By convention, the capitalized gloss represents a logographic element, and the lower case glosses phonographic elements.
2.8 Three methods of wrapping the virtual tape: (a) standard non-boustrophedon; (b) boustrophedon; (c) inverted boustrophedon.
3.1 Waveform and voicing profile for one utterance of gradski "urban". The closure for the /d/ is labeled as "dcl", the voicing offset is labeled as "v", and the start of the /s/ is labeled as "s". The voicing profile is the third plot from the top in the second window, labeled as prob voice.
3.2 Barplot showing the proportions of voicing for all samples of underlying /d/ (black bars), versus /t/ (shaded bars).
4.1 The taxonomy of Gelb (1963), along with examples of writing systems that belong to each case.
4.2 The taxonomy of Sampson (1985).
4.3 Featural representation of Korean Hankul, from (Sampson, 1985, page 124), Figure 19. (Presented with permission of Routledge/Stanford University Press.)
4.4 DeFrancis' classification of writing systems.
4.5 A non-arboreal classification of writing systems. On Perso-Aramaic, see Section 6.1.
5.1 A model of reading a word aloud, simplified from (Besner and Smith, 1992).
5.2 The Seidenberg and McClelland model of lexical processing, (Seidenberg and McClelland, 1989, page 526), Figure 1. Used with permission of the American Psychological Association, Inc.
5.3 The implemented portion of Seidenberg and McClelland's model of lexical processing, (Seidenberg and McClelland, 1989, page 527), Figure 2. Used with permission of American Psychological Association, Inc.
5.4 Replication of the (Seidenberg, McRae, and Jared, 1988) experiment, from (Seidenberg and McClelland, 1989, page 545), Figure 19. Used with permission of American Psychological Association, Inc.
6.1 A numeral factorization transducer for numbers up to 999.
6.2 Two-dimensional layout in an e-mail signature.
List of Tables
2.1 Chinese characters illustrating the five modes of combination of semantic (underlined) and phonetic components.
2.2 Full and diacritic forms for Devanagari vowels, classified by the position of expression of the diacritic forms. Thus "after" means that the diacritic occurs after the consonant cluster, "below", below it, and so forth.
2.3 Distribution of placement of semantic component (Kang-Xi Radical) among 12,728 characters from the Taiwan Big5 character set.
2.4 Distribution of placement of semantic component (Kang-Xi Radical) among 2,596 characters from the Taiwan Big5 character set.
2.5 Linguistic units corresponding to Mayan complex glyphs, from (Macri, 1996); glossing conventions follow those of Macri. In some phrasal cases it is unclear whether the unit in question is really a constituent: such cases are indicated with a question mark.
3.1 The ten most frequent lexical markings for the shallow ORL in the English fragment.
3.2 The ten most frequent lexical markings for the deep ORL in the English fragment.
4.1 Disyllabic morphemes collected from the ROCLING corpus (10 million characters) and 10 million characters of the United Informatics corpus. This set consists of pairs of characters occurring at least twice, and where each member of the pair only cooccurs with the other.
4.2 Further disyllabic morphemes collected from the ROCLING corpus (10 million characters) and 10 million characters of the United Informatics corpus. This set consists of other pairs of characters that do not exclusively occur with each other, but where there is nonetheless a high mutual information for the pair. Note that LV […] indicates that the phonetic component in question occurs 9 out of 38 times in characters pronounced with initial /l/ followed by some vowel.
4.3 A sample of Japanese kokuji (second column), with their componential analysis (third column). The first column is the entry number in Alexander's list (Lehman and Faust, 1951). The fourth column lists the phonetic, if any. The fifth column lists the kun pronunciation, and the sixth column the on pronunciation, if any: in one case, goza, there is no kun pronunciation. The last three kokuji shown are formed as semantic-phonetic constructs, with the last two being based on the kun pronunciation: note that the phonetic component of masa also means "straight", so it is possible that this one is also a semantic-semantic construct.
6.1 Total counts in Cregeen's dictionary of words spelled with initial ⟨Ch⟩, where C denotes a consonant, and the number and percentages of those words that are minimal pairs with homophonic or close-homophonic words spelled without the ⟨h⟩.
Preface
Most general books on writing systems are written or edited by scholars who are specialists in a small subset of the writing systems that they cover, and who have developed their views on writing in general based on their own experience in their particular specialized area.
This book is different: I cannot claim to be an expert on any particular writing system. My interest in writing systems stems in part from my interest in text-to-speech synthesis systems, and in particular the problem of converting from written text into a linguistic representation that represents how that text would be read. Given that problem, it is natural to inquire about the formal nature of the relationship between the written form and the linguistic representation that the written form encodes: What linguistic elements do written symbols encode? Do writing systems differ in the abstractness of the linguistic representation encoded by orthography, and if so how? What are the formal constraints on the mapping between linguistic representation and writing? Some of these issues have, of course, been addressed elsewhere, though usually in an informal fashion. This book is an attempt to answer these questions in the context of a formal, computational theory of writing systems.
One point that needs to be made at the outset is that this book is not intended as an introduction to the topic of writing systems. There are many excellent books that serve that purpose, including (Sampson, 1985), (Coulmas, 1989) and (DeFrancis, 1989). Special mention must be given to the superb collection in (Daniels and Bright, 1996), without which the present book would not have been possible. Thus, while I do discuss aspects of several writing systems in some amount of detail, there are also a number of writing systems that are discussed in less detail. The reader unfamiliar with the general properties of the writing systems discussed here is urged to consult one of the many general introductions to the topic, such as those cited above.
In preparing this work, I have benefited greatly from discussions with and comments from a number of colleagues, listed here in alphabetical order: Harald Baayen, Alan Black, Wayles Browne, Roy Harris, Leonard Katz, George Kiraz, Kazuaki Maeda, Anneke Neijt, Elena Pavlova, Geoffrey Sampson, Chilin Shih, Brian Stowell, Robert Thomson, J. Marshall Unger, and Jennifer Venditti. I would especially like to thank Steven Bird, who read through two whole drafts of this work, and gave me extensive comments on both. I also acknowledge an anonymous reviewer for Cambridge University Press. Portions of this work were presented at the University of Arizona, and at Charles University in Prague, and I thank audiences there for useful comments and questions. I also thank Juergen Schroeter for help in using his recording setup for the experiment reported in Section 3.3.
The technical production of this book depended heavily upon several free or public domain resources, including databases and software. I am indebted to Rick Harbaugh (developer of www.zhongwen.com) for kindly allowing me access to his data on Chinese character structure. Several of the more detailed analyses in this book, including the treatment of English in Section 3.2 and of Chinese in Section 2.3.4, were implemented, and these implementations depended upon the fsm library developed by my colleagues at AT&T Labs, Michael Riley, Fernando Pereira and Mehryar Mohri. Chinese characters were incorporated into LaTeX using Stephen Simpson's PMC package; for Devanagari I used Frans Velthuis's devtex package; Visible Speech fonts are due to Mark Shoulson. Editing of figures and graphics was done using Vectaport Inc.'s idraw, John Bradley's xv, and Davor Matic's bitmap.
Finally I would like to thank my editor at Cambridge University Press, Christine Bartels, for her support for this project.
Richard Sproat
Florham Park, New Jersey
September 1, 1999
Chapter 1
Reading Devices
Our starting point for this study of writing systems is text-to-speech synthesis (TTS), and more specifically the computational problem of converting from written text into a linguistic representation. While the connection between TTS systems on the one hand, and writing systems on the other, may not be immediately apparent, a moment's reflection will make it clear that the problem to be solved by a TTS system, namely the conversion of written text into speech, is exactly the same problem as a human reader must solve when presented with a text to be read aloud. And just as writing systems, their properties, and the ways in which they encode linguistic information are of interest to psycholinguists who study how people read, so (in principle) should such considerations be of interest to those who develop TTS technology: at the very least, they ought to be of as much interest as, for example, understanding the physiology and acoustics underlying speech production, something that early speech synthesis researchers such as Fant (Fant, 1960) were heavily involved in.¹
Since my starting point is TTS, and since I assume that most readers will not be familiar with this field, I will start this chapter with a review of some of the issues relevant to the development of TTS systems, particularly as they relate to the problem of analyzing input text. This will be the topic of Section 1.1. In Section 1.2 I will informally introduce, by way of a simple example, the model that I shall be developing throughout the rest of this book. Finally, Section 1.3 will introduce some aspects of the formalism, and the conventions that will be used throughout this book.
¹ It will perhaps come as no surprise that TTS researchers have not, in fact, generally been overly interested in writing systems. This is undoubtedly due in part to the relatively low interest in text-analysis issues in general in the TTS literature, at least as compared to the high level of interest in such matters as prosody, intonation, voice quality and synthesis techniques. It also is undoubtedly related to the fact that much of the work on TTS is driven by rather practical aims (e.g. building a working system), where an overactive interest in theories of writing systems might appear to be an unnecessary luxury.
1.1 Text-to-Speech Conversion: a Brief Introduction
As noted above, the task of a TTS system is to convert written text into speech. Normally the written representation is in the form of an electronic text, coded in ASCII, ISO, JIS, UNICODE or some other standard depending upon the language and system being used; this circumvents one problem that humans must solve, namely that of visually recognizing characters printed on a page.² Similarly the output is a digital representation of speech. Between these two representations are numerous stages of processing, which it is profitable to classify into two broad stages. The first stage is the conversion of the written text into an internal linguistic representation; and the second is the conversion from that linguistic representation into speech. The latter consists of computing various phonetic and acoustic parameters, including segmental duration, F0 ("pitch") trajectory, properties of the output speech such as spectral tilt or glottal open quotient, and (in concatenative speech synthesis systems) selection of appropriate acoustic units, or (in formant-based synthesis systems) the generation of vocal-tract transfer functions appropriate to the intended sounds. We will have nothing further to say about these issues here; the reader is referred to (Dutoit, 1997) for a good general introduction to these issues, and also to (Allen, Hunnicutt, and Klatt, 1987; Sproat, 1997b) for an overview of how two particular systems (the MITalk system, and the Bell Labs TTS system) work.
In any TTS system the output speech will be generated from an annotated linguistic representation, which is in turn derived from input text via the first stage of processing defined above. How rich a linguistic representation is presumed (and in terms of which linguistic theories and assumptions it is couched) differs from system to system, of course, but we may at least assume that the linguistic representation will include information on the sequence of sounds to be enunciated (usually allophones of phonemes, but in some systems whole syllable-sized units), lexical stress or tone information, word and phrase-level accentuation and emphasis, and the location of various prosodic boundaries, including syllable and prosodic phrase boundaries. Thus for an input such as that in (1.1), we might presume as a plausible (partial) linguistic representation the representation in Figure 1.1.
(1.1) I need 2 oz. of Valrhona and 6 anchos for the mole.
In the particular rendition of the sentence presumed in Figure 1.1 there are two intonational phrases (denoted by ι) grouped into a single utterance (U). Lexical stress is indicated by a metrical tree dominating individual syllables (σ) and dominated itself by a prosodic word (ω); we assume that proclitics form a prosodic word with the following content word. Also indicated are lexical accents for the words need, two, ounces, Valrhona, six, anchos and mole.
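To make this inventory concrete, the following Python sketch encodes part of such a representation as plain data classes. It is purely illustrative. The class and field names (Word, IntonationalPhrase, Utterance, reading, accented) are hypothetical, syllables are written in ordinary spelling rather than as phonetic segments, and syntax is left out entirely. It is not the internal representation of any particular TTS system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Word:
    spelling: str                 # the written token, e.g. "oz."
    reading: str                  # how the token is read, e.g. "ounces"
    pos: str                      # part-of-speech tag, e.g. "N"
    syllables: List[str]          # syllabification of the reading (spelled, not IPA)
    stress: Optional[int] = None  # index of the lexically stressed syllable, if any
    accented: bool = False        # whether the word bears a pitch accent


@dataclass
class IntonationalPhrase:
    words: List[Word] = field(default_factory=list)


@dataclass
class Utterance:
    phrases: List[IntonationalPhrase] = field(default_factory=list)

    def reading(self) -> str:
        """Concatenate the readings of all words, ignoring phrase boundaries."""
        return " ".join(w.reading for p in self.phrases for w in p.words)

    def accented_words(self) -> List[str]:
        """List the readings of the words marked as accented."""
        return [w.reading for p in self.phrases for w in p.words if w.accented]


# The beginning of the first intonational phrase of (1.1), partially annotated.
first = IntonationalPhrase([
    Word("I", "I", "Pro", ["I"]),  # stress left unspecified in this sketch
    Word("need", "need", "V", ["need"], 0, True),
    Word("2", "two", "Num", ["two"], 0, True),
    Word("oz.", "ounces", "N", ["oun", "ces"], 0, True),
])
utt = Utterance([first])
print(utt.reading())  # I need two ounces
```

Note that the spelling field keeps "2" and "oz." while the reading field holds "two" and "ounces". The distance between those two fields is exactly the information that, as discussed next, a reader must reconstruct.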
Figure 1.1: A partial linguistic representation for the sentence in (1.1). Shown are a phonetic transcription, a prosodic analysis into two intonational phrases (ι) and one utterance (U), accent assignment (*), a set of part of speech tags, and a simple phrase-structure analysis. Phonetic symbols are IPA. Note that "MP" means "measure phrase".

² Of course, it is possible to hook up a TTS system to an optical character recognition (OCR) system; such systems have in fact been available for several years in the form of page-readers for the blind (e.g. Kurzweil's reader); and there has been much recent interest in conversion of FAX into speech, which adds yet a further complication, namely messy input.

In order to produce this representation, or any equally plausible representation, for this sentence, a reader must "reconstruct" a great deal of linguistic information that is simply not represented in the written form. Naturally all syntactic information, including both the morphosyntactic part of speech tags and the phrase structure, must be computed. So must a great deal of the phonological information. For instance, the sequence of phonetic segments is only somewhat indirectly represented in English orthography: in some written forms, such as 2, 6 and oz., it cannot be said to be represented at all. In the latter case the linguistic form must be reconstructed entirely from the reader's knowledge of the language, and often depends upon information about context (does one say ounce or ounces?). In some cases readers may need to make educated guesses about the pronunciations of some words, though if these follow the normal pronunciation conventions of the language they will usually guess correctly: even readers who had not previously seen the words anchos or Valrhona could nonetheless probably have guessed the correct pronunciation. For mole, in the sense of a Mexican sauce, and pronounced /ˈmoleɪ/, the situation is more complex since the pronunciation here does not follow standard English conventions: in this case one would simply have to be familiar with the word. But there is of course an additional problem here in that, as in the case of oz., one must also disambiguate this word, so that one does not pronounce it as the homographic /ˈmol/ (e.g., in the sense of a species of insectivore).
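The ounce/ounces question can often be settled by looking at the neighboring token. Below is a minimal Python sketch of such a context heuristic. The cue list and the function names are hypothetical illustrations, not taken from any particular system.

```python
from typing import List

# Hypothetical cues after which "oz." should be read in the singular.
SINGULAR_CUES = {"1", "one", "a", "an"}


def expand_oz(previous: str) -> str:
    """Read "oz." as "ounce" after a singular cue, otherwise as "ounces"."""
    return "ounce" if previous.lower() in SINGULAR_CUES else "ounces"


def expand_abbreviations(tokens: List[str]) -> List[str]:
    """Replace every "oz." token by its reading; leave other tokens untouched."""
    out = []
    for i, tok in enumerate(tokens):
        if tok == "oz.":
            out.append(expand_oz(tokens[i - 1] if i > 0 else ""))
        else:
            out.append(tok)
    return out


print(expand_abbreviations(["I", "need", "2", "oz.", "of", "Valrhona"]))
# ['I', 'need', '2', 'ounces', 'of', 'Valrhona']
```

This heuristic inspects only the single preceding token, so it would need extending for cases such as fractions. It also offers no help with mole, where the choice depends on meaning rather than on an adjacent numeral.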
Prosodic phrasing is rarely represented; note that punctuation is only partly used in this function (Nunberg, 1995), and in any case it is by no means consistently used in every case where one might plausibly find a prosodic boundary. Lexical accentuation is almost never indicated.³
Thus, if one is designing a TTS system that can handle arbitrary text in a given language, it is generally necessary for the system to possess a large amount of linguistic knowledge, including knowledge about the lexical and phrasal phonology of the language in question, and at the very minimum a set of heuristics for determining plausible locations for accents and prosodic phrase boundaries (Dutoit, 1997; Sproat, 1997b).
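As a deliberately crude illustration of such heuristics, the Python sketch below accents words whose part-of-speech tag marks them as content words and starts a new intonational phrase before each conjunction. Both rules and the tag set in ACCENTABLE are assumptions made for this example; the tags on the sentence follow our reading of Figure 1.1.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]

# Hypothetical heuristic: words with these tags receive a pitch accent.
ACCENTABLE = {"N", "V", "Num", "Adj", "Adv"}


def assign_accents(tagged: Tagged) -> List[str]:
    """Return the words the heuristic would accent (content words only)."""
    return [word for word, tag in tagged if tag in ACCENTABLE]


def split_phrases(tagged: Tagged) -> List[List[str]]:
    """Start a new intonational phrase before each conjunction (tag "Cnj")."""
    phrases: List[List[str]] = [[]]
    for word, tag in tagged:
        if tag == "Cnj" and phrases[-1]:
            phrases.append([])
        phrases[-1].append(word)
    return phrases


# Sentence (1.1) after expansion of "2", "oz." and "6", with part-of-speech tags.
SENTENCE: Tagged = [
    ("I", "Pro"), ("need", "V"), ("two", "Num"), ("ounces", "N"),
    ("of", "P"), ("Valrhona", "N"), ("and", "Cnj"), ("six", "Num"),
    ("anchos", "N"), ("for", "P"), ("the", "Det"), ("mole", "N"),
]

print(assign_accents(SENTENCE))
print(split_phrases(SENTENCE))
```

On this one sentence the two rules reproduce the accented words listed for Figure 1.1 and its two-phrase grouping. On arbitrary text, rules this crude would misfire often; for example, every conjunction would trigger a phrase break.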
³ It is generally true that suprasegmental and prosodic information is systematically omitted from the orthographies of a large variety of languages. This is particularly true for high level prosodic information such as prosodic phrase boundary placement, and accentuation and prominence. But it extends to purely lexically determined features such as lexical tone. Thus while some languages, such as Thai, Vietnamese or Navajo, do indicate lexically distinctive tone in their orthographies, it seems to be far more common to omit this feature: for example many orthographies developed for tonal languages of Africa omit marks of tone, though it should be noted that many of these scripts were developed by European missionaries who had no understanding of tone: see (Bird, 1999) for a discussion of more recently developed African orthographies where tone is marked.

A related point, as Geoffrey Sampson has noted (personal communication), is that Latin did not mark length in vowels (though gemination in consonants was marked).

If one, furthermore, is developing a TTS system that is intended to be adaptable to more than one language, then there is an additional consideration: not only do the written forms of utterances systematically fail to indicate many aspects of the spoken forms, but different writing systems present different sets of problems. Thus, if one designs a TTS system with European languages in mind, one might reasonably assume (as many have done) that words in the input text are separated by whitespace. But this assumption will fail with writing systems like those of Chinese, Japanese or Thai, where word boundaries are never written. (See the discussion of various Asian scripts in (Daniels and Bright, 1996), and see (Sproat et al., 1996) for a discussion of the issue in a computational setting.) Similarly, for many languages one may assume that abbreviations and numbers can be expanded in a "preprocessing" phase prior to full linguistic analysis. For English (or Chinese) this (almost) works, in that an abbreviation such as oz. has only two plausible translations, namely ounce or ounces, and in most cases some simple heuristics based on the context can tell you which one it should be. But as I have discussed at length elsewhere (Sproat, 1997b; Sproat, 1997a), such a simple approach cannot work for Russian, where in order to decide how to pronounce a seemingly innocuous sequence such as 5%, one needs to determine such things as whether the percentage expression is modifying a following noun ("a 5% discount") or not ("I need 5%"). In the former case the "5%" phrase is an adjective agreeing in case, number and gender with the following noun; in the latter case, it is a noun, and its case, number and gender are determined by the syntactic context in which it occurs. Thus the expression 5% скидка ⟨5% skidka⟩ "5% discount" is read as pjat+i-procent+n+aja skidka (five+Gen-percent+Adj+NomFem discount), with an adjectival form procentnaja agreeing in number, gender and case with the following noun. The simple expression ⟨5%⟩ on its own would be read as pjat' procent+ov (five+Nom percent+GenPl), with a nominal procentov in the genitive plural form. If the example were ⟨4%⟩, the word for "percent" would have to be in the genitive singular form: četyre procent+a (four+Nom percent+GenSg). If the "percent phrase" is governed by an element, such as a preposition, requiring an oblique case, then the entire phrase, including the number and the word for "percent", must appear in that oblique case: thus с 5% ⟨s 5%⟩ "with 5%" is s pjat'ju procent+ami (with five+Instr percent+InstrPl).
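Even the purely nominal reading depends on the number. The Python sketch below uses Latin transliteration to encode the standard Russian agreement pattern for a numeral followed by procent "percent" in a nominative or accusative context (e.g. "I need 5%"). The function names and the five-entry number table are hypothetical. The sketch deliberately leaves out the adjectival reading before a noun and oblique contexts such as "with 5%"; handling those would require the syntactic information that, as argued above, a preprocessor cannot see.

```python
# Nominative forms of 1 to 5 (transliterated); a real system needs all numbers.
NUMBER_WORDS = {1: "odin", 2: "dva", 3: "tri", 4: "četyre", 5: "pjat'"}


def percent_form(n: int) -> str:
    """Form of "procent" after the integer n, in a nominative/accusative context."""
    if n % 10 == 1 and n % 100 != 11:
        return "procent"    # nominative singular: 1, 21, 31, ...
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "procenta"   # genitive singular: 2-4, 22-24, ...
    return "procentov"      # genitive plural: 0, 5-20, 25-30, ...


def read_percent(n: int) -> str:
    """Read "n%" standing on its own (only 1 <= n <= 5 are covered here)."""
    return f"{NUMBER_WORDS[n]} {percent_form(n)}"


print(read_percent(5))  # pjat' procentov
print(read_percent(4))  # četyre procenta
```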
Considerations such as these inevitably lead one to ask what commonalities there are among the diverse written representations of language, and whether a single computational model can encompass all systems that one might encounter. A model of this kind for TTS text analysis, one that has been applied to languages and writing systems as diverse as German, Spanish, Russian, Hindi, Chinese and Japanese, is discussed elsewhere (Sproat, 1997b; Sproat, 1997a). The purpose of this book is to present a computational theory of writing systems that was motivated by the work on TTS, and that is at least to some extent consistent with the model presented in this previous work.
1.2 The Task of Pronouncing Aloud: a Model
We turn now to sketching the model of the relation between written and linguistic form that we will develop in this book. As implied by our discussion in the last section, we will, at least initially, be concerned with specifying a computational model whose task is to pronounce text aloud. Thus the problem we start out with is essentially what psychologists who study reading term naming: the pronunciation aloud of a written form. This is in principle a different task from the task of lexical access via a written form, and from the task of deciding how to spell a given linguistic form. The computational model of writing that we will propose will nonetheless have implications for these issues also: indeed a large portion of the discussion in Chapter 3 will focus on a model of spelling for English. We start here with an example that will illustrate the model to be developed.
1.2.1 A simple example from Russian
Most literate people, even those who are monolingual, are broadly aware that some orthographies are more "regular" than others; that, for example, Spanish orthography is highly regular ("written as it sounds"), and that English orthography, on the other hand, is highly irregular. This naive notion of regularity corresponds roughly to what psychologists term orthographic depth. That is, psychologists often refer to an orthography as deep if it is not generally possible to reconstruct the pronunciation of a word by simply looking at the string of symbols and applying general "letter-to-sound" rules; see (Frost, Katz, and Bentin, 1987; Besner and Smith, 1992; Katz and Frost, 1992; Seidenberg, 1992), inter alia, as well as the discussion in Chapter 5.⁴ Thus, in terms of the metaphor of depth, the orthography of Spanish is shallower than that of English (or Hebrew). With some legitimacy we can consider Spanish and English as being near two ends of a spectrum of possible orthographic depths.
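To make "applying general letter-to-sound rules" concrete for a shallow orthography, here is a toy Python converter for a handful of Spanish spellings. The rule tables are a hypothetical and deliberately incomplete subset of conservative Castilian correspondences: they ignore, for instance, ⟨r⟩ versus ⟨rr⟩, ⟨gu⟩, ⟨x⟩ and stress. Still, every rule they do contain consults at most the next letter, and this is the sense in which such an orthography is shallow.

```python
from typing import Dict

# Toy subset of conservative Castilian correspondences (hypothetical tables).
DIGRAPHS: Dict[str, str] = {"qu": "k", "ch": "tʃ", "ll": "ʎ"}
SINGLE: Dict[str, str] = {"ñ": "ɲ", "h": "", "j": "x", "v": "b", "z": "θ"}
FRONT_VOWELS = ("e", "i")


def letters_to_sounds(word: str) -> str:
    """Scan left to right, trying digraphs before single letters."""
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
            continue
        letter = word[i]
        following = word[i + 1] if i + 1 < len(word) else ""
        if letter == "c":    # ⟨c⟩ is /θ/ before a front vowel, /k/ elsewhere
            out.append("θ" if following in FRONT_VOWELS else "k")
        elif letter == "g":  # ⟨g⟩ is /x/ before a front vowel, /g/ elsewhere
            out.append("x" if following in FRONT_VOWELS else "g")
        else:
            out.append(SINGLE.get(letter, letter))
        i += 1
    return "".join(out)


for w in ["queso", "cena", "casa", "hijo", "gente", "chico"]:
    print(w, letters_to_sounds(w))
```

No comparable table could choose between the two pronunciations of the Russian string ⟨goroda⟩ discussed below, because nothing in the letter string distinguishes them.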
Russian falls somewhere in between these two extremes: it is not nearly as irregular as English, but at the same time it is not possible to do as one can in Spanish, and predict the pronunciation of a word purely by looking at the orthographic string. Russian orthography is often described as morphological (Cubberley, 1996, page 352), meaning that the spelling system attempts to represent morphologically related forms consistently, abstracting away from at least some phonological changes. As a corollary, a reader of Russian needs access to this morphological information in order to pronounce words correctly.
To see what is meant by this, consider the problem of pronouncing a particular letter string, say города ⟨goroda⟩. As it happens, this can represent one of two lexical forms in standard Russian: "of a city" (city+gen.sg.), in which case it is pronounced with initial stress /ˈgorədɐ/; or "cities" (city+pl.nom./acc.), in which case it is pronounced with final stress /gərɐˈda/. The fact that there are two possible pronunciations for the string города ⟨goroda⟩ shows immediately that it is not possible to pronounce this string merely by looking at the sequence of letters: one must have access to lexical information, and in this case one presumably needs access to some information about the context in which the word occurs, since the reader needs to determine whether the genitive singular or the plural nominative/accusative is the more appropriate interpretation.⁵ Not surprisingly, given the high degree of lexical competence needed to be able to assign lexical stress in Russian words, pedagogical grammars of Russian routinely mark stress placement. Thus the genitive singular form would be written го́рода ⟨goroda⟩, whereas the nominative/accusative plural form would be written города́ ⟨goroda⟩. But such marks of stress are rarely used in non-pedagogical contexts. In not marking stress, Russian orthography thus fails to mark information that is important for getting the reading correct; to use a term suggested to me by Anneke Neijt, its coverage of the phonological information is incomplete.
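The one-to-many relation between spelling and pronunciation can be pictured as a lexicon that returns several analyses for one string, with context selecting among them. The Python sketch below is a toy illustration with hypothetical names. It assumes that some upstream analysis (not shown) has already worked out which morphosyntactic features the context requires.

```python
from typing import Dict, FrozenSet, List, Tuple

# Toy lexicon: spelling -> list of (morphosyntactic features, pronunciation).
LEXICON: Dict[str, List[Tuple[FrozenSet[str], str]]] = {
    "города": [
        (frozenset({"city", "gen", "sg"}), "ˈgorədɐ"),
        (frozenset({"city", "nom/acc", "pl"}), "gərɐˈda"),
    ],
}


def readings(spelling: str) -> List[str]:
    """All pronunciations the spelling could stand for, out of context."""
    return [pron for _, pron in LEXICON.get(spelling, [])]


def pronounce(spelling: str, required: FrozenSet[str]) -> str:
    """Return the unique pronunciation whose analysis has every required feature."""
    matches = [pron for feats, pron in LEXICON.get(spelling, []) if required <= feats]
    if len(matches) != 1:
        raise ValueError(f"{spelling!r} is not uniquely determined by {sorted(required)}")
    return matches[0]


print(readings("города"))                      # two candidates
print(pronounce("города", frozenset({"gen"})))  # ˈgorədɐ
```

Asking only for the feature city raises an error, since the letters and the lexeme alone leave the reading open.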
But Russian orthography, in addition to its incomplete coverage, is also relatively
⁴ An alternative term to orthographic depth, namely orthographic transparency, is gaining some currency; Leonard Katz, personal communication.
⁵ Note also that this particular ambiguity between genitive singular and plural nominative/accusative, with concomitant shift in lexical stress, is by no means general in Russian: only a subset of nouns show this particular ambiguity, though other cases of stress-related minimal pairs are rife in the language.
1.2. THE TASK OF PRONOUNCINGALOUD: A MODEL 7
"deep" in that there are stress-related vowel reductions that are not marked in Russian orthography: note, for example, that the first /o/ in города ⟨goroda⟩ shows up as /o/ when stressed, as in the genitive singular form, but as /ə/ when destressed (more correctly, when in the syllable antepenultimate to the stress (Wade, 1992)), as in the nominative/accusative plural. These alternations are quite regular and predictable, but they are never marked in the orthography, which means that Russian orthography represents a level that is somewhat more abstract than a surface phonemic level. As we shall see in Section 3.1, the standard orthography for Belarusian does orthographically represent these vowel reductions, and is therefore somewhat more shallow than the orthography of standard Russian. (Belarusian is like Russian in terms of coverage, though, in that it too fails to mark stress in the orthography.)
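The claim that Russian spelling encodes a level deeper than surface phonemics can be made concrete by deriving both surface forms of goroda from one deep segment list. The Python sketch below assumes a simplified reduction scheme of our own (stressed vowels unchanged; unstressed /o, a/ as [ʌ] immediately before the stress or in word-final position, [ə] elsewhere); it is an approximation for illustration, not the author's analysis, and the name `reduce_vowels` is hypothetical.

```python
# Hypothetical, simplified model of Russian unstressed /o, a/ reduction.
# Assumed scheme (an approximation, not a full description of Russian):
#   stressed vowel                            -> unchanged
#   unstressed /o, a/ immediately pretonic    -> ʌ
#   unstressed /o, a/ in word-final position  -> ʌ
#   any other unstressed /o, a/               -> ə
VOWELS = {"a", "o", "e", "i", "u"}
REDUCIBLE = {"a", "o"}


def reduce_vowels(segments, stressed_index):
    """Map a deep segment list to a surface transcription.

    segments: deep segments, e.g. ["g", "o", "r", "o", "d", "a"]
    stressed_index: position of the stressed vowel within `segments`
    """
    vowel_positions = [i for i, s in enumerate(segments) if s in VOWELS]
    rank = {pos: r for r, pos in enumerate(vowel_positions)}
    stress_rank = rank[stressed_index]
    out = list(segments)
    for pos in vowel_positions:
        if pos == stressed_index or segments[pos] not in REDUCIBLE:
            continue
        pretonic = rank[pos] == stress_rank - 1
        word_final = pos == len(segments) - 1
        out[pos] = "ʌ" if (pretonic or word_final) else "ə"
    # Put the IPA stress mark before the onset consonant, if there is one.
    mark_at = stressed_index
    if stressed_index > 0 and segments[stressed_index - 1] not in VOWELS:
        mark_at = stressed_index - 1
    out.insert(mark_at, "ˈ")
    return "".join(out)


deep = ["g", "o", "r", "o", "d", "a"]   # the single deep form spelled города
print(reduce_vowels(deep, 1))           # gen.sg.: ˈgorədʌ
print(reduce_vowels(deep, 5))           # nom./acc.pl.: gərʌˈda
```

Both surface forms come from the same deep segments; only the stress position differs, and stress is exactly what the spelling leaves out.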
Before we proceed further, we need to define a little more precisely what we mean when we speak of an orthographic object representing a linguistic object. Let us start with what I take to be a fairly uncontroversial (partial) representation of the genitive singular form goroda 'of a city', namely the Attribute-Value Matrix (AVM) in (1.2). (On the use of AVMs in phonological representations see, inter alia, (Bird and Klein, 1994; Mastroianni and Carpenter, 1994; Bird, 1995).)
(1.2)  [ PHON    ⟨g, ˈo, r, o, d, a⟩
         SYNSEM  [ CAT   noun
                   GEN   masc
                   CASE  gen
                   NUM   sing
                   SEM   city ] ]
First of all, a few comments on (1.2). The primary stress on the first syllable is indicated here with the standard diacritic 'ˈ', rather than by an explicit hierarchical prosodic structure within the AVM; this is purely a notational convenience. For similar reasons of notational convenience, the phonological representation is given, in this example, as a list of segments, with no indication of higher level prosodic structure, such as syllables or feet. (Indeed, we are taking some amount of liberty by even allowing segments into our ontology, given the growing body of phonological work that views segments as epiphenomena of temporally overlapping collections of features. We return to this point later on.) Also worthy of note is the fact that the segmental representation presented is what traditionally would be termed a relatively "deep" representation, since it abstracts away from various low-level phonological processes, such as the vowel reductions we have discussed; this is intentional, since I shall argue that it is this deep phonological level that is represented by the orthography of Russian. Finally, the representation in (1.2) fails to indicate that goroda is morphologically complex, arguably consisting of a stem gorod- and an inflectional affix -a. Perhaps surprisingly, I will have relatively little to say about morphology in this book, though I will return briefly in Section 3.4 to the relation between orthography and morphological structure.
Where does orthography fit into (1.2)? An obvious first cut at a representation
8 CHAPTER1. READING DEVICES
would be simply to assume another attribute "ORTH" with an associated list of orthographic elements.
(1.3)  [ PHON    ⟨g, ˈo, r, o, d, a⟩
         ORTH    ⟨г, о, р, о, д, а⟩
         SYNSEM  [ CAT   noun
                   GEN   masc
                   CASE  gen
                   NUM   sing
                   SEM   city ] ]
But this representation is inadequate for several reasons. First of all, while it represents the fact that города ⟨goroda⟩ is the orthographic representation of goroda, it fails to indicate the obvious fact that the individual letters of the orthographic representation each correspond to a particular linguistic unit, in this case a segment: thus г ⟨g⟩ clearly represents /g/, and о ⟨o⟩ clearly represents /o/. Second, it fails to represent the kind of relation between (in this case) the phonological portions of the representation and the orthographic portion. It seems reasonable to view this relation as one of licensing, where particular (sets of) linguistic elements license the occurrence of (sets of) orthographic elements. Thus /g/ licenses the occurrence of г ⟨g⟩ in this example. Third, and finally, by presenting the value of ORTH as an ordered list, we are redundantly specifying information that is specified elsewhere in the AVM: the phonological segments are ordered with respect to one another, and the linear ordering of the licensed orthographic elements ought to follow in some fashion from that.
These considerations lead us to propose, instead, the representation in (1.4). We represent licensing using numerical coindexation, where the index of the licenser is marked with an asterisk. The value for ORTH is itself an unordered list of objects: we indicate this using the standard curly-brace notation for sets.
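The point that the order of the letters should follow from the order of their licensers can be illustrated with a small data-structure sketch. The Python below is a hypothetical encoding of (1.4), assuming PHON is a list of (segment, index) pairs and ORTH a set of (letter, index) pairs; `avm_1_4` and `linearize_orth` are illustrative names, not part of the theory.

```python
# Hypothetical encoding of the AVM in (1.4). PHON is an ordered list of
# (segment, index) pairs; every PHON element here is a licenser (the
# asterisk in (1.4)). ORTH is an unordered set of (letter, index) pairs:
# its linear order is not stored but recovered from PHON.
avm_1_4 = {
    "PHON": [("g", 1), ("ˈo", 2), ("r", 3), ("o", 4), ("d", 5), ("a", 6)],
    "ORTH": {("г", 1), ("о", 2), ("р", 3), ("о", 4), ("д", 5), ("а", 6)},
    "SYNSEM": {"CAT": "noun", "GEN": "masc", "CASE": "gen",
               "NUM": "sing", "SEM": "city"},
}


def linearize_orth(avm):
    """Order ORTH elements by the position of their coindexed licenser in PHON."""
    position = {index: i for i, (_, index) in enumerate(avm["PHON"])}
    ordered = sorted(avm["ORTH"], key=lambda item: position[item[1]])
    return "".join(letter for letter, _ in ordered)


print(linearize_orth(avm_1_4))
```

Because the two instances of о carry different indices (2 and 4), they remain distinct members of the set, and the spelling города is recovered without ORTH itself being ordered.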
(1.4)  [ PHON    ⟨g₁*, ˈo₂*, r₃*, o₄*, d₅*, a₆*⟩
         ORTH    {г₁, о₂, р₃, о₄, д₅, а₆}
         SYNSEM  [ CAT   noun
                   GEN   masc
                   CASE  gen
                   NUM   sing
                   SEM   city ] ]
As we have seen, we have assumed a relatively abstract phonological representation in the Russian example that we have been discussing. In general we will assume that the orthography of a language represents a particular linguistic level of representation. For phonological information that is orthographically encoded we can speak of this level as being relatively "deep" compared to a "surface phonemic" representation, or relatively shallow. We will term the linguistic level represented by the orthography of a language the Orthographically Relevant Level (ORL).⁶ Note that we are not claiming that every symbol in the spelling of a word necessarily has a (non-orthographic) linguistic counterpart at the ORL: as we shall argue in Section 3.2, many aspects of the spellings of words in English are arbitrary and simply must be listed as part of the word's spelling. Nonetheless, even in an orthography as irregular as that of English there are regular correspondences between linguistic elements and their orthographic expression: the ORL is simply that linguistic level of representation at which those regular correspondences are most succinctly stated. Note that for expository reasons we will typically present as the ORL just that portion of the linguistic representation that is relevant to the particular orthographic phenomenon under discussion. Thus, for most purely phonographic scripts, information associated with the SYNSEM portion of the AVM is not typically relevant (though in some cases it might be, as for example in German, where capitalization is sensitive to whether or not the word is a noun). In such cases the SYNSEM information would be omitted from the representation: it should be understood, however, that the information is still present, just not germane to the discussion at hand.
Returning to (1.4), we note that there is still some redundancy that can be removed. Russian orthography is largely regular in the sense that a given (abstract) phoneme is typically only spelled in one way. This in turn implies that we should not need to explicitly list the orthographic elements in the AVM; indeed in the example in (1.4) all of the letters are completely predictable, and could be derived via a set of rewrite rules as follows:
(1.5)  g → г ⟨g⟩
       o → о ⟨o⟩
       r → р ⟨r⟩
       d → д ⟨d⟩
       a → а ⟨a⟩
Such rules can be viewed as filling positions in the orthography portion of the AVM and hence licensing the material in those positions. Of course even in fairly regular spelling systems, and certainly in complex systems such as English, some lexical specification of spelling is necessary. This can be handled either by simply listing the irregular spelling, or else by a lexically specific spelling rule. Thus for the English word knit, for instance, we might assume a lexical representation as in (1.6a), or else a rule as in (1.6b), in either case specifying the spelling of /n/ as ⟨kn⟩; we assume that the remaining /It/ is regularly spelled:
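The division of labor between default rules like (1.5) and a lexically specific rule like (1.6b) can be sketched as a lookup with an override. The Python below is a minimal illustration, assuming rules are one-segment-to-string tables and that a lexical rule is keyed by (word, segment); the English default table and all names are hypothetical.

```python
# Default rules of (1.5): deep segment -> Cyrillic letter. Stress is not
# part of the input here, since Russian spelling does not mark it.
RUSSIAN_RULES = {"g": "г", "o": "о", "r": "р", "d": "д", "a": "а"}

# Hypothetical default rules for the English segments of knit
# ("I" stands for the vowel of knit, following the notation of (1.6)).
ENGLISH_RULES = {"n": "n", "I": "i", "t": "t"}

# Lexically specific rule (1.6b): in the word knit, /n/ is spelled ⟨kn⟩.
LEXICAL_RULES = {("knit", "n"): "kn"}


def spell(word, segments, rules, lexical_rules=None):
    """Spell each segment by its default rule unless a lexically specific
    rule keyed by (word, segment) overrides it."""
    lexical_rules = lexical_rules or {}
    return "".join(lexical_rules.get((word, seg), rules[seg])
                   for seg in segments)


print(spell("goroda", list("goroda"), RUSSIAN_RULES))
print(spell("knit", ["n", "I", "t"], ENGLISH_RULES, LEXICAL_RULES))
print(spell("nit", ["n", "I", "t"], ENGLISH_RULES, LEXICAL_RULES))
```

The override applies only to knit; the homophonous nit falls through to the default spelling of /n/.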
⁶ This level is roughly equivalent to what I have referred to as the morphologically motivated annotation (MMA) in previous work on text analysis for TTS (Sproat, 1997b; Sproat, 1997a).
(1.6)  (a)  [ PHON    ⟨n₁*, I, t⟩
              ORTH    {kn₁}
              SYNSEM  [ CAT  verb
                        SEM  knit ] ]
       (b)  n → ⟨kn⟩ in knit
As we will discuss further, we will follow Nunn (1998) in assuming that rules are used not only in the initial graphemic licensing phase that we have been discussing, but also in a subsequent phase of what Nunn terms autonomous spelling rules. We will expand her notion of autonomous spelling rule to include what we will term surface orthographic constraints; see Section 3.5.
1.2.2 Formal definitions
In this section we expand the formalism further, and introduce some additional formal notations, as well as some axioms that control the mapping between linguistic information and orthography. We will also introduce the central theses of this study.
1.2.2.1 AVMs and Annotation Graphs
Let us return to the AVM representation from (1.4), repeated here as (1.7):
(1.7)  [ PHON    ⟨g₁*, ˈo₂*, r₃*, o₄*, d₅*, a₆*⟩
         ORTH    {г₁, о₂, р₃, о₄, д₅, а₆}
         SYNSEM  [ CAT   noun
                   GEN   masc
                   CASE  gen
                   NUM   sing
                   SEM   city ] ]
In the Russian example, orthographic elements are licensed purely by phonological elements. In partly logographic writing systems like Chinese, we propose that part of a complex glyph may be licensed by a portion of the SYNSEM part of the representation. Thus consider the character 蟬 ⟨INSECT+CHAN⟩ chan 'cicada' (see Section 1.3 for a detailed discussion of our conventions for glossing Chinese characters), where the INSECT component (lefthand portion of the character) is the so-called "semantic radical" and the righthand component 單 chan cues the pronunciation. For this case we assume an AVM as in (1.8), where the INSECT portion is licensed by the SEM entry, and the phonological portion is licensed by the syllable:
(1.8)  [ PHON    [ SYL₂*  [ SEG   [ ONS   ch
                                    RIME  an ]
                            TONE  2 ] ]
         SYNSEM  [ CAT  noun
                   SEM  cicada₁* ]
         ORTH    {虫₁, 單₂} ]
An equivalent representation that we will use, and which will more directly form the basis for our axioms, is the annotation graph; see (Bird and Liberman, 1999) and also (Bird, 1995). The annotation graphs in (1.9) and (1.10) are equivalent (omitting some detail) to the AVMs in (1.7) and (1.8), respectively.
(1.9)
  SEM:       | city                              |

  PHON:      | g:г | o:о | r:р | o:о | d:д | a:а |

(1.10)
  SEM:       | cicada:虫 |

  TONE:      | 2         |
  SYL:       | σ:單      |
  ONS-RIME:  | ch  | an  |
The representations of the annotation graphs in (1.9) and (1.10) are to be interpreted as follows. First of all, the annotations such as "SEM", "SYL", and so forth in the lefthand column mark arc sequences that encode values of the thus-named attribute(s) in the corresponding AVM. Thus in (1.10), for instance, the SEM arc represents the value cicada for the attribute SEM. Second, the vertical marks indicate vertices of the graph out of which the horizontal arcs emanate. The vertices are assumed to be temporally anchored, with vertices on the left preceding vertices on the right. Thus the source vertex of the ONS arc labeled ch in (1.10), source(ONS), precedes its destination vertex, dest(ONS); it also precedes the destination vertex of the SEM arc cicada:虫. We will denote precedence in the standard fashion with $\prec$, so that $a \prec b$ is read "a precedes b"; $\preceq$ will be used to mean "precedes or is cotemporaneous with"; finally $\succ$ and $\succeq$ will also be used with the obvious meanings.
Sets of arcs that are in a dominance relation (i.e. that form a graph-based hierarchy in the sense of Bird and Liberman (1999)) are (vertically) adjacent to each other and are joined at at least one vertex. On the other hand, sets of arcs that are not in a dominance relation are separated by a blank line. These dominance relations correspond to relations of dominance in the corresponding AVM. So, in (1.10) the SYL and ONS-RIME arc sequences are in a dominance relation: this corresponds to the fact that in the AVM in (1.8), the SYL attribute has an AVM containing the onset and rime AVMs, and thus dominates those AVMs. (Similarly, SYL dominates TONE, though TONE is not in a dominance relation with ONS-RIME, a point not well represented in the graph.) On the other hand, SEM is not in a dominance relation with SYL. Rather, the SEM and SYL arcs merely temporally overlap (see below). Finally, we indicate licensing by placing the licensed element on the same arc as its licenser. Thus "g:г" means that the phoneme /g/ licenses the letter г ⟨g⟩.
1.2.2.2 Definitions
We now state some definitions and axioms over the annotation graph representation that we have just developed.
First of all some definitions, starting with two versions of temporal overlap:
Definition 1.1 (Overlap) Arc $\alpha$ overlaps arc $\beta$ ($\alpha \bigcirc \beta$) if either:
1. $\mathrm{source}(\alpha) \preceq \mathrm{source}(\beta)$ and $\mathrm{dest}(\alpha) \succ \mathrm{source}(\beta)$, or
2. $\mathrm{dest}(\alpha) \succeq \mathrm{dest}(\beta)$ and $\mathrm{source}(\alpha) \prec \mathrm{dest}(\beta)$.

Definition 1.2 (Complete Overlap) Arc $\alpha$ completely overlaps arc $\beta$ ($\alpha \circledcirc \beta$) if $\mathrm{source}(\alpha) \preceq \mathrm{source}(\beta)$ and $\mathrm{dest}(\alpha) \succeq \mathrm{dest}(\beta)$.

Note that while overlap is symmetric, complete overlap is not. (Note that we use the symbol $\bigcirc$ for overlap, rather than the more normal $\circ$: this latter symbol is used here for composition.)

Following Bird and Liberman's notion of graph-based hierarchy, we define immediate dominance both in terms of the graph, and in terms of the types of arcs involved.
Definition 1.3 (Immediate Dominance) Arc $\alpha$ immediately dominates arc $\beta$ ($\alpha \triangleright \beta$) if $\alpha \circledcirc \beta$ and the type of $\beta$ is (a list element of) a value of an attribute in AVMs of type $\alpha$.
Thus a SYL arc that completely overlaps an ONS arc would immediately dominate the ONS arc, assuming that in the associated AVM the SYL AVM has an attribute (e.g. SEG) whose value is a list containing the AVM for ONS; cf. (1.8). On the other hand, SEM would not dominate ONS.
We will also need a definition of path-precedence on arcs, denoting a situation where two arcs join at the same vertex, such that the second immediately follows the first within the same path through the graph.
Definition 1.4 (Immediate Path-Precedence) Arc $\alpha$ immediately path-precedes arc $\beta$ ($\alpha \lessdot \beta$) if $\mathrm{dest}(\alpha)$ is identical to $\mathrm{source}(\beta)$.
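Definitions 1.1 through 1.4 are simple predicates over time-anchored arcs, which the following Python sketch makes explicit. It assumes arcs carry numeric time anchors, identifies a vertex with its time anchor (a simplification: distinct vertices could share a time), and uses a hypothetical `VALUE_KINDS` table in place of the AVM type information that Definition 1.3 consults; the overlap test transcribes Definition 1.1 as stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    kind: str      # attribute the arc encodes, e.g. "SYL", "ONS", "SEM"
    label: str     # e.g. "ch"
    source: float  # time anchor of the source vertex
    dest: float    # time anchor of the destination vertex


def overlaps(a, b):
    """Definition 1.1, transcribed as stated."""
    return ((a.source <= b.source and a.dest > b.source) or
            (a.dest >= b.dest and a.source < b.dest))


def completely_overlaps(a, b):
    """Definition 1.2."""
    return a.source <= b.source and a.dest >= b.dest


# Hypothetical schema read off (1.8): arc kinds that may occur as (list
# elements of) attribute values inside an AVM of the given kind.
VALUE_KINDS = {"SYL": {"ONS", "RIME", "TONE"}}


def immediately_dominates(a, b, value_kinds=VALUE_KINDS):
    """Definition 1.3."""
    return completely_overlaps(a, b) and b.kind in value_kinds.get(a.kind, set())


def immediately_path_precedes(a, b):
    """Definition 1.4; vertices are identified by their time anchors here."""
    return a.dest == b.source


# The arcs of (1.10), with made-up time anchors.
sem = Arc("SEM", "cicada", 0, 2)
syl = Arc("SYL", "chan", 0, 2)
ons = Arc("ONS", "ch", 0, 1)
rime = Arc("RIME", "an", 1, 2)

print(immediately_dominates(syl, ons), immediately_dominates(sem, ons))
```

As in the text, SYL immediately dominates ONS while SEM, which merely overlaps the syllable, does not; ONS immediately path-precedes RIME because they share a vertex.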
1.2.2.3 Axioms
This section introduces the axioms that form the core of the theory that we will be defending. Before we do that, we will formalize a few ideas somewhat further. We have already introduced the notion of Orthographically Relevant Level (ORL) as being the level of linguistic representation that is encoded orthographically by a particular writing system. We will denote the output of the mapping from the ORL to spelling (i.e., the spelling itself) as $\Gamma$. As we have already said, we follow Nunn (1998) in assuming that this mapping can be decomposed into a set of graphic encoding rules and a set of autonomous spelling rules; again, see Section 3.5. Each of these sets of mapping rules implements a relation (we will be more specific on what kind of relation momentarily), the former of which we'll notate as $\mathcal{M}_{encoding}$ and the latter as $\mathcal{M}_{spelling}$. The entire mapping, which we will denote as $\mathcal{M}_{ORL \rightarrow \Gamma}$, is simply the composition of these two relations: $\mathcal{M}_{ORL \rightarrow \Gamma} = \mathcal{M}_{encoding} \circ \mathcal{M}_{spelling}$.
We will use the expression $\Gamma(\alpha)$ to denote the image of linguistic element $\alpha$ under $\mathcal{M}_{ORL \rightarrow \Gamma}$. The axioms make use of two further concepts. The first is the notion of catenation.
Informally, $\alpha$ catenates with $\beta$, denoted $\alpha \cdot \beta$, if $\alpha$ is adjacent to $\beta$. The most familiar notion of catenation is the string-based notion of concatenation in formal language theory (Harrison, 1978; Hopcroft and Ullman, 1979; Lewis and Papadimitriou, 1981), where $\alpha \cdot \beta$ constructs a string by concatenating $\alpha$ with $\beta$, in that order. In Chapter 2, we will generalize this notion to planar (two-dimensional) catenation. In the discussion in this section, for simplicity's sake, we will assume what we shall later term left-to-right catenation, denoted $\cdot_{\rightarrow}$: $\alpha \cdot_{\rightarrow} \beta$ simply denotes a string $\alpha\beta$, where $\alpha$ immediately precedes $\beta$.
The second concept is the idea that the spellout of a linguistic sequence under $\mathcal{M}_{ORL \rightarrow \Gamma}$ may be lexically specified, as already introduced above. We illustrate this point immediately after the statement of Axiom 1.1:
Axiom 1.1 If $\alpha \lessdot \beta$ then, if $\Gamma(\alpha\beta)$ is not otherwise defined, $\Gamma(\alpha\beta) = \Gamma(\alpha) \cdot \Gamma(\beta)$. (If $\alpha$ immediately path-precedes $\beta$, then the image of $\alpha\beta$ under $\mathcal{M}_{ORL \rightarrow \Gamma}$ is simply the catenation of $\Gamma(\alpha)$ with $\Gamma(\beta)$.)

Thus in English, the spellout of the phoneme sequence /bo/ would, according to Axiom 1.1, be $\Gamma(b) \cdot_{\rightarrow} \Gamma(o)$, or ⟨bo⟩ (assuming the default ways of spelling those phonemes). On the other hand, lexical specification may override this: /ks/ is frequently spelled ⟨x⟩, preempting spellout as $\Gamma(k) \cdot_{\rightarrow} \Gamma(s)$.
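Axiom 1.1 amounts to "catenate the default images unless the sequence has its own image". A minimal Python sketch follows, assuming a hypothetical default table and a lexical table that gives /ks/ the spellout ⟨x⟩; in a fuller model the lexical entry would be tied to particular words rather than applied globally.

```python
# Hypothetical default spellings for a few English phonemes.
DEFAULT = {"b": "b", "o": "o", "k": "k", "s": "s"}

# Hypothetical lexical specification: the sequence /ks/ has its own
# spellout ⟨x⟩, so Axiom 1.1's default catenation does not apply to it.
LEXICAL = {("k", "s"): "x"}


def spell_sequence(phonemes, default=DEFAULT, lexical=LEXICAL):
    """Spell a phoneme sequence: use a lexically defined spellout for a
    pair if one exists, otherwise catenate the default spellouts."""
    out, i = [], 0
    while i < len(phonemes):
        pair = tuple(phonemes[i:i + 2])
        if pair in lexical:
            out.append(lexical[pair])
            i += 2
        else:
            out.append(default[phonemes[i]])
            i += 1
    return "".join(out)


print(spell_sequence(["b", "o"]))                         # default catenation
print(spell_sequence(["b", "o", "k", "s"]))               # /ks/ preempted by ⟨x⟩
print(spell_sequence(["b", "o", "k", "s"], lexical={}))   # no lexical entry
```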
The second axiom describes the mechanism of inheritance of graphical spellout for a complex linguistic construction that immediately dominates other (possibly complex) linguistic constructions:
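Axiom 1.2 If $\alpha \triangleright \beta$ ($\beta$ possibly a sequence) then, if $\Gamma(\alpha)$ is not otherwise defined, $\Gamma(\alpha) = \Gamma(\beta)$. (If $\alpha$ immediately dominates $\beta$, then the image of $\alpha$ under $\mathcal{M}_{ORL \rightarrow \Gamma}$ is simply the image of $\beta$ under $\mathcal{M}_{ORL \rightarrow \Gamma}$.)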
Thus, for instance, the spellout of the syllable dominating /kæt/ would consist of the spellout of the onset dominating /k/ and the spellout of the rime dominating /æt/. In turn, the former consists of the spellout of /k/, and the latter of the spellout of the sequence /æt/.
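The inheritance described by Axiom 1.2 is naturally recursive. The Python sketch below assumes a hypothetical tree encoding, with nonterminals as (kind, children) pairs and leaves as phoneme strings, and hypothetical default spellings for /k/, /æ/ and /t/; lexically defined images for nonterminals are omitted for brevity.

```python
# Nonterminal nodes are (kind, children); leaves are phoneme strings.
# Hypothetical default spellings for the phonemes of /kæt/.
DEFAULT = {"k": "c", "æ": "a", "t": "t"}


def spell_tree(node, default=DEFAULT):
    """Spell out a node: a leaf via its default spelling; a nonterminal via
    the spellout of the sequence it immediately dominates (Axiom 1.2),
    whose elements are catenated left to right (Axiom 1.1)."""
    if isinstance(node, str):
        return default[node]
    _kind, children = node
    return "".join(spell_tree(child, default) for child in children)


cat = ("SYL", [("ONS", ["k"]), ("RIME", ["æ", "t"])])
print(spell_tree(cat))
```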
Finally, we introduce Axiom 1.3, which defines the spellout of two overlapping elements. The functionality of this axiom will be illustrated with data from Chinese in Section 4.2:
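Axiom 1.3 If $\alpha \bigcirc \beta$, then $\Gamma(\alpha \bigcirc \beta) = \Gamma(\alpha) \cdot \Gamma(\beta)$. (If $\alpha$ overlaps $\beta$, then the image of $\alpha$ together with $\beta$ under $\mathcal{M}_{ORL \rightarrow \Gamma}$ is simply the image of $\alpha$, catenated with the image of $\beta$.)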
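Applied to the overlapping SEM and SYL arcs of (1.10), Axiom 1.3 yields the two components of the cicada character. The Python sketch below is only an illustration under stated assumptions: the images in `GAMMA` are hypothetical, and catenation is modeled as plain left-to-right string concatenation.

```python
# Hypothetical images for the two overlapping arcs of (1.10): the SEM arc
# licenses the INSECT component and the SYL arc the phonetic component.
GAMMA = {("SEM", "cicada"): "虫", ("SYL", "chan"): "單"}


def spell_overlapping(alpha, beta, gamma=GAMMA):
    """Axiom 1.3: the image of two overlapping elements is the image of the
    first catenated with the image of the second (here, left to right)."""
    return gamma[alpha] + gamma[beta]


print(spell_overlapping(("SEM", "cicada"), ("SYL", "chan")))
```

The components come out as two separate characters; combining them into the single glyph 蟬 requires the planar catenation developed in Chapter 2.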
An important point to note about these axioms is that they do not preclude regular (i.e. non-lexically-specified) context-dependent spellout. For instance, the default spelling of /k/ before ⟨i⟩, ⟨e⟩ or ⟨y⟩ in English is as ⟨k⟩, whereas in other contexts it is ⟨c⟩. Axiom 1.1 merely requires that whatever spells out /k/ catenate with whatever spells out the vowel.
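Such a context-dependent default can be sketched by spelling the right-hand context first and then choosing the letter for /k/. The Python below assumes hypothetical default spellings for the other phonemes and deliberately ignores complications such as ⟨ck⟩ or ⟨qu⟩.

```python
# Hypothetical default spellings for the non-/k/ phonemes used below.
OTHER = {"I": "i", "E": "e", "æ": "a", "O": "o", "t": "t", "n": "n"}


def spell_with_k(phonemes):
    """Spell /k/ as ⟨k⟩ before a following ⟨i⟩, ⟨e⟩ or ⟨y⟩, otherwise ⟨c⟩.
    Segments are processed right to left so that the letter spelling the
    following segment is already known when /k/ is reached."""
    letters = [""] * len(phonemes)
    for i in reversed(range(len(phonemes))):
        p = phonemes[i]
        if p == "k":
            following = letters[i + 1] if i + 1 < len(phonemes) else ""
            letters[i] = "k" if following[:1] in ("i", "e", "y") else "c"
        else:
            letters[i] = OTHER[p]
    return "".join(letters)


print(spell_with_k(["k", "I", "t"]), spell_with_k(["k", "æ", "t"]))
```

Whatever letter is chosen for /k/ is still simply catenated with the spelling of the vowel, as Axiom 1.1 requires.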
1.2.3 Central claims of the theory
We now come to the core proposals that I wish to defend in the remainder of this work:
• Regularity: The mapping $\mathcal{M}_{ORL \rightarrow \Gamma}$ is a regular relation.
• Consistency: The ORL for a given writing system (as used for a particular language) represents a consistent level of linguistic representation.
We describe these claims in the next two sections. Here, and elsewhere in this work, I will capitalize the terms "Regular", "Regularity", "Consistent" and "Consistency" when they are used in these technical senses, and otherwise lowercase them.
1.2.3.1 Regularity
The first of the core proposals states that $\mathcal{M}_{ORL \rightarrow \Gamma}$ is a regular relation or, equivalently, that $\mathcal{M}_{ORL \rightarrow \Gamma}$ can be implemented as a finite-state transducer (FST); readers not familiar with FSTs may wish to consult Appendix 1.A, though a short synopsis is given immediately below.
Our route to the claim of Regularity comes about in two ways. First of all, we have assumed that the mapping between linguistic representation and orthography can be handled by context-sensitive rewrite rules, an assumption that is held by others including (Venezky, 1970) and (Nunn, 1998), and one which naturally fits well with the standard notion of "spelling rule". Now, as has been shown in (Johnson, 1972; Kaplan and Kay, 1994), as long as certain constraints on non-application to their own output are observed, such rules are formally equivalent to regular relations, and can therefore be implemented using FSTs. Indeed, practical compilers have been built that compile from rewrite rule representations into transducers (Karttunen and Beesley, 1992; Kaplan and Kay, 1994; Karttunen, 1995; Mohri and Sproat, 1996).
An instance of an FST, one implementing the simple set of rules in (1.5), is shown in Figure 1.2.
[Figure 1.2 diagram: a single state 0 with self-loop arcs labeled g:G, o:O, r:R, d:D, a:A]

Figure 1.2: A simple FST implementing the rewrite rules in (1.5). In this example the machine has a single state (0), which is both an initial and a final state. The labels on the individual arcs consist of an input label (to the left of the colon) and an output label (to the right). Here, capital Roman letters are used to represent the equivalent Cyrillic letters.
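The machine in Figure 1.2 is small enough to write down as data and run directly. The Python sketch below assumes a deterministic transducer stored as a dictionary from (state, input) to (output, next state); `transduce` is a hypothetical helper, not a library function.

```python
# Figure 1.2 as data: one state (0) that is both initial and final, with a
# self-loop for each rule in (1.5). Capital Roman letters stand in for
# Cyrillic, as in the figure.
ARCS = {(0, "g"): ("G", 0), (0, "o"): ("O", 0), (0, "r"): ("R", 0),
        (0, "d"): ("D", 0), (0, "a"): ("A", 0)}
INITIAL = 0
FINALS = {0}


def transduce(arcs, initial, finals, inp):
    """Run a deterministic FST; return None if the input is not accepted."""
    state, out = initial, []
    for sym in inp:
        if (state, sym) not in arcs:
            return None
        label, state = arcs[(state, sym)]
        out.append(label)
    return "".join(out) if state in finals else None


print(transduce(ARCS, INITIAL, FINALS, "goroda"))
```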
Second, Regularity follows from the axioms introduced in Section 1.2.2.3. To see this, consider that each of the axioms states that in $\Gamma(\alpha\beta)$, composed of $\Gamma(\alpha)$ and $\Gamma(\beta)$, $\Gamma(\alpha)$ is catenated with $\Gamma(\beta)$. The definition of regular relations (see Appendix 1.A.2) states first of all that a mapping between a pair of symbols is a regular relation, and furthermore that the concatenation of two regular relations is itself a regular relation. It is therefore easy to see that one can provide a constructive proof whereby Regularity follows from the stated axioms. In one sense, the axioms provide a rather restrictive notion of Regularity. Consider a writing system in which a linguistic object $\alpha\beta\gamma\delta$ is spelled out as $\Gamma(\alpha)\Gamma(\gamma)\Gamma(\delta)\Gamma(\beta)$: for example, the writing system might have the (bizarre) property that the second phoneme is always spelled out at the end of the word. This would certainly be a violation of the axioms insofar as the spelled string is not formed by concatenating either $\Gamma(\alpha)$ or $\Gamma(\gamma)$ with $\Gamma(\beta)$. However, this example can be handled by a regular relation that in effect maps a symbol (here $\beta$) to nothing ($\epsilon$) on the output side, "remembers" that it has seen $\beta$, and then spells it out as $\Gamma(\beta)$ at the end of the string. But such "memory" comes at some cost in finite-state machinery, since such a machine must represent intervening material multiple times: in addition to mapping $\gamma$ to $\Gamma(\gamma)$ and $\delta$ to $\Gamma(\delta)$, the machine must also remember which second phoneme ($\beta$) it had seen, and the only way to do this is to have separate paths through the remaining portions of the transducer, one path for each phoneme that might have been deleted. Memory in finite-state devices can only be encoded in states: if one wishes to delete $\beta$ with a view to inserting $\Gamma(\beta)$ later on, then one must have the arc that deletes $\beta$ end in a state $s_{\beta}$ distinct from the state $s_{\beta'}$ that terminates an arc that, for instance, deletes $\beta'$ (inserted later on as $\Gamma(\beta')$). $s_{\beta}$ and $s_{\beta'}$ would in turn be the source states for arcs that map $\gamma$ to $\Gamma(\gamma)$ and $\delta$ to $\Gamma(\delta)$, and would each have their own private copies of these arcs.
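The cost of this memory can be seen by building the bizarre "second symbol last" transducer explicitly. The Python sketch below assumes a subsequential transducer (one that may emit a final string when the input ends) in place of an ε-input arc, and uses hypothetical names throughout; it shows one memory state per possible deleted symbol, each with its own copy of every arc.

```python
def build_move_second_to_end(alphabet):
    """Build a subsequential transducer mapping x1 x2 x3 ... xn to
    x1 x3 ... xn x2. `arcs` maps (state, symbol) to (output, next_state);
    `final_output` gives the string emitted when input ends in a state."""
    arcs = {}
    final_output = {"start": "", "seen1": ""}
    for x in alphabet:
        arcs[("start", x)] = (x, "seen1")
        # Delete the second symbol and remember it in the target state.
        arcs[("seen1", x)] = ("", ("remember", x))
    for b in alphabet:
        state = ("remember", b)
        for x in alphabet:
            # Each memory state needs its own private copy of every arc.
            arcs[(state, x)] = (x, state)
        final_output[state] = b
    return arcs, final_output


def run(arcs, final_output, string):
    state, out = "start", []
    for x in string:
        label, state = arcs[(state, x)]
        out.append(label)
    return "".join(out) + final_output[state]


def count_states(arcs):
    return len({s for s, _ in arcs} | {t for _, t in arcs.values()})


arcs, final_output = build_move_second_to_end("abcd")
print(run(arcs, final_output, "abcd"))
print(count_states(arcs))
```

With an alphabet of k symbols the machine needs k + 2 states and k² arcs among the memory states alone, whereas the single-state machine of Figure 1.2 needs one arc per symbol.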
Writing systems generally do not seem to require this kind of memory. At first one might think such cases are common. Consider, for instance, the spelling of English /eI/ as ⟨aCe⟩, where "C" is a consonant (make) or sequence of consonants (taste). If ⟨e⟩ is somehow part of the spelling of /eI/, then this would seem to be a violation of the axioms. However, it seems perfectly reasonable to assume that /eI/ is in fact spelled by ⟨a⟩, and that ⟨e⟩ is merely introduced by rule to "support" the spelling of /eI/ as ⟨a⟩ in certain environments; see (Cummings, 1988).
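The analysis of ⟨aCe⟩ as encoding plus a supporting rule can be sketched as a two-stage pipeline, with no memory of deleted material needed. In the Python below, the intermediate symbol "A" (marking an ⟨a⟩ that spells /eI/), the encoding table and the regular-expression rule are all hypothetical simplifications, not the author's formulation.

```python
import re

# Hypothetical graphic encoding rules. "A" is an intermediate symbol of our
# own standing for the ⟨a⟩ that spells /eI/; it is realized as plain ⟨a⟩
# by the autonomous rule below.
ENCODING = {"m": "m", "eI": "A", "k": "k", "t": "t", "s": "s", "d": "d"}


def encode(phonemes):
    return "".join(ENCODING[p] for p in phonemes)


def support_e(spelling):
    """Hypothetical autonomous spelling rule: an ⟨a⟩ spelling /eI/ that is
    followed by word-final consonant letters gets a supporting final ⟨e⟩."""
    spelling = re.sub(r"A([^aeiouA]+)$", r"a\1e", spelling)
    return spelling.replace("A", "a")


def spell(phonemes):
    return support_e(encode(phonemes))


print(spell(["m", "eI", "k"]), spell(["t", "eI", "s", "t"]))
```

Each stage is a simple left-to-right mapping, so the pair stays within the restrictive notion of Regularity described above.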
An important feature of regular relations is that they are closed under composition. Suppose we have two regular relations $R_1$ and $R_2$, and suppose that the domain of $R_1$ is (the set of strings) $A$ and its range $B$, and suppose further that the domain of $R_2$ is $B$ and its range is $C$. Then the composition of these two relations, denoted $R_1 \circ R_2$, is also a regular relation, whose domain is $A$ and range is $C$. (The notion of composition here is exactly the same notion as that of function composition in algebra.) This property of closure under composition has an important implication. Since single rewrite rules can be represented computationally as FSTs, one can also represent an ordered series of such rewrite rules as a single FST, by merely composing together the FSTs for the individual rules.
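Closure under composition is constructive: the states of the composed machine are pairs of states, and an arc exists wherever the first machine's output symbol matches the second machine's input symbol. The Python sketch below implements this product construction for epsilon-free transducers only (an assumption; general composition must also handle ε labels), with $R_1 \circ R_2$ read as "apply $R_1$, then $R_2$". The stand-in-to-Cyrillic machine `r2` is hypothetical.

```python
from collections import defaultdict


class FST:
    """Epsilon-free transducer: every arc reads one symbol and writes one."""

    def __init__(self, initial, finals, arcs):
        self.initial = initial
        self.finals = set(finals)
        self.arcs = defaultdict(list)          # state -> [(in, out, next)]
        for src, i, o, dst in arcs:
            self.arcs[src].append((i, o, dst))

    def transduce(self, inp):
        """Return the set of all outputs for `inp`."""
        frontier = {(self.initial, "")}
        for sym in inp:
            frontier = {(dst, acc + o)
                        for state, acc in frontier
                        for i, o, dst in self.arcs[state] if i == sym}
        return {acc for state, acc in frontier if state in self.finals}


def compose(t1, t2):
    """R1 ∘ R2 (apply t1, then t2) via the product construction."""
    start = (t1.initial, t2.initial)
    arcs, finals, stack, seen = [], [], [start], {start}
    while stack:
        p, q = stack.pop()
        if p in t1.finals and q in t2.finals:
            finals.append((p, q))
        for i, mid, p2 in t1.arcs[p]:
            for j, o, q2 in t2.arcs[q]:
                if mid == j:
                    arcs.append(((p, q), i, o, (p2, q2)))
                    if (p2, q2) not in seen:
                        seen.add((p2, q2))
                        stack.append((p2, q2))
    return FST(start, finals, arcs)


# R1: the FST of Figure 1.2 (phoneme -> capital Roman stand-in).
r1 = FST(0, [0], [(0, c, c.upper(), 0) for c in "gorda"])
# R2: hypothetical stand-in -> Cyrillic letter.
r2 = FST(0, [0], [(0, "G", "г", 0), (0, "O", "о", 0), (0, "R", "р", 0),
                  (0, "D", "д", 0), (0, "A", "а", 0)])
print(compose(r1, r2).transduce("goroda"))
```

The composed machine maps phonemes straight to Cyrillic, with no trace of the intermediate stand-in letters.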
A second important property of regular relations and FSTs is that they are invertible. That is, by switching the input and output labels, one switches the domain and range of a relation. In the case at hand, if one has a transducer $T$ that maps from the ORL to $\Gamma$, then the inverse of $T$, denoted $T^{-1}$, will map from $\Gamma$ to the ORL. This is clearly a useful property, since it means that a model of spelling can also serve (inverted) as a model of reading, in the limited sense of decoding a linguistic structure from a written text.
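Inversion is just a label swap, and running the inverted machine shows why reading is harder than spelling in Russian. The Python sketch below assumes a hypothetical spelling FST in which stressed vowels (ó, á) are spelled like their unstressed counterparts; the arc format and helper names are illustrative.

```python
# Arcs are (source, input, output, destination). Hypothetical spelling FST
# from deep segments to Cyrillic, in which stressed vowels (ó, á) are
# spelled like their unstressed counterparts, since stress is unmarked.
SPELL = [(0, "g", "г", 0), (0, "o", "о", 0), (0, "\u00f3", "о", 0),
         (0, "r", "р", 0), (0, "d", "д", 0), (0, "a", "а", 0),
         (0, "\u00e1", "а", 0)]


def invert(arcs):
    """Swap input and output labels: T becomes T^-1."""
    return [(src, out, inp, dst) for src, inp, out, dst in arcs]


def transduce(arcs, initial, finals, inp):
    """All outputs of a (possibly nondeterministic) epsilon-free FST."""
    frontier = {(initial, "")}
    for sym in inp:
        frontier = {(dst, acc + out)
                    for state, acc in frontier
                    for src, i, out, dst in arcs
                    if src == state and i == sym}
    return {acc for state, acc in frontier if state in finals}


readings = transduce(invert(SPELL), 0, {0}, "города")
print(len(readings), sorted(readings)[:3])
```

The inverse returns every candidate deep form, including ones with no stress or with two stresses; picking out góroda or gorodá still requires lexical and contextual information, as discussed for this example above.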
In addition to using regular relations and FSTs to implement the mapping between the ORL and $\Gamma$, one can also implement constraints using regular languages and finite-state automata (FSAs). Finite-state constraint-based systems have been used widely in other areas of linguistic description, such as phonology (Bird and Ellison, 1994) and syntax (Voutilainen, 1994; Mohri, 1994). In writing systems, surface spelling constraints can be modeled in this fashion. For instance, if a certain written symbol $x$ is disallowed in word-final position, one might write a constraint such as the following (where "#" denotes a word boundary):
(1.11)  $\neg(\Sigma^{*}\, x\, \#\, \Sigma^{*})$

See Section 3.5 for some discussion of real examples of surface orthographic constraints.
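A constraint like (1.11) compiles to a three-state deterministic automaton: a normal state, a state entered just after reading the banned symbol, and a dead state entered if a word boundary follows. The Python sketch below assumes word boundaries are written explicitly as "#" and uses a hypothetical banned symbol "x".

```python
def make_constraint(banned, boundary="#"):
    """Build an acceptor for ¬(Σ* banned boundary Σ*): strings in which
    `banned` never directly precedes a word boundary."""
    DEAD = 2

    def step(state, sym):
        if state == DEAD:
            return DEAD
        if state == 1 and sym == boundary:
            return DEAD             # banned symbol followed by a boundary
        return 1 if sym == banned else 0

    def accepts(string):
        state = 0
        for sym in string:
            state = step(state, sym)
        return state != DEAD

    return accepts


ok = make_constraint("x")
print(ok("#xa#"), ok("#ax#"))
```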
As we have already discussed, we follow Nunn (1998) in our assumption that the relation $\mathcal{M}_{ORL \rightarrow \Gamma}$ can be decomposed into a composition of the set of graphic encoding rules $\mathcal{M}_{encoding}$ and the set of autonomous spelling rules $\mathcal{M}_{spelling}$. At this point we can be more specific in our claim: $\mathcal{M}_{encoding}$ and $\mathcal{M}_{spelling}$ both implement regular relations, and $\mathcal{M}_{ORL \rightarrow \Gamma}$ is the composition of those two regular relations: $\mathcal{M}_{encoding} \circ \mathcal{M}_{spelling}$. Surface orthographic constraints are clearly a component of $\mathcal{M}_{spelling}$: one can factor $\mathcal{M}_{spelling}$ into two components, one that implements a mapping $\mathcal{M}_{spelling\text{-}map}$, and the other that implements a set of constraints $\mathcal{M}_{spelling\text{-}constraints}$. $\mathcal{M}_{spelling}$ itself is then just the composition of these two, or more formally:
(1.12)  $\mathcal{M}_{spelling} = \mathcal{M}_{spelling\text{-}map} \circ Id(\mathcal{M}_{spelling\text{-}constraints})$

Here, $Id$ is an operation that converts an FSA into an equivalent FST, where the input and output labels on each arc are identical (Kaplan and Kay, 1994, page 341).
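The effect of composing a mapping with $Id$ of a constraint is to filter the mapping's outputs. The Python sketch below models relations as functions from a string to the set of its images, rather than as automata; the optional ⟨c⟩/⟨k⟩ map and the "no ⟨c⟩ before #" constraint are hypothetical, and `m_spelling` merely plays the role of $\mathcal{M}_{spelling}$ in (1.12).

```python
from itertools import product

# Relations are modeled as functions from a string to the set of its images.

# Hypothetical spelling map: each ⟨k⟩ slot may surface as ⟨c⟩ or ⟨k⟩.
OPTIONS = {"k": ["c", "k"]}


def spelling_map(s):
    choices = [OPTIONS.get(ch, [ch]) for ch in s]
    return {"".join(p) for p in product(*choices)}


def no_final_c(s):
    """Hypothetical surface constraint: ⟨c⟩ may not precede '#'.
    Membership test for the regular language ¬(Σ* c # Σ*)."""
    return "c#" not in s


def identity(accepts):
    """Id: turn an acceptor into the identity relation on its language."""
    return lambda s: {s} if accepts(s) else set()


def compose(r1, r2):
    """r1 ∘ r2: apply r1, then r2 to each of its outputs."""
    return lambda s: {z for y in r1(s) for z in r2(y)}


m_spelling = compose(spelling_map, identity(no_final_c))
print(sorted(m_spelling("#kak#")))
```

Of the four candidate spellings produced by the map, only the two that satisfy the constraint survive the identity relation.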
Finally, we have been implicitly assuming in this discussion a model of regular relations that contains a standard string-based "left-to-right" concatenation operator. As we have already noted, we will need to extend the notion of catenation to handle various forms of two-dimensional combination. We will discuss this in Chapter 2.
1.2.3.2 Consistency
In Section 1.2.1 we introduced the notion of the Orthographically Relevant Level, and we suggested that, depending upon the writing system, the ORL could represent a relatively deep or relatively shallow orthographic level. The thesis of Consistency simply states that this level is consistent across the entire vocabulary of the language. As should be clear, and as we will discuss further below, this notion presumes a classical derivational model of phonology.
Consider a sequence of phonological rules $R_1, R_2, \ldots, R_n$, which applies in the derivation of every word of some language: we will define $I$ to be the input level to the sequence of rules. For such a system, there are $n+1$ consistent levels of representation, namely $I$ itself, and $I$ composed with $R_1 \circ \cdots \circ R_i$, $1 \le i \le n$. The Consistency hypothesis requires the ORL to be picked from one of these consistent levels $i$. A violation of Consistency would be a system where one portion of the vocabulary (e.g. all nouns, or all words having a particular phonological structure) picks a level $i$, and the remainder of the vocabulary picks a level $j$, $i \neq j$.
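Consistency can be phrased as a search over these $n+1$ levels: find the levels at which a single spelling function reproduces every attested spelling. The Python sketch below uses a hypothetical toy transcription in which a capital vowel marks stress, two toy rules (unstressed o reduces to ə; word-final d devoices) and a toy spelling that simply drops stress; none of this is the author's analysis of Russian.

```python
import re

# Hypothetical rules over a toy transcription in which a capital vowel
# marks stress: R1 reduces unstressed o to ə, R2 devoices a word-final d.
RULES = [
    lambda s: s.replace("o", "ə"),
    lambda s: re.sub(r"d$", "t", s),
]


def levels(form, rules=RULES):
    """The n+1 consistent levels: I, then I after R1, after R1 and R2, ..."""
    out = [form]
    for rule in rules:
        out.append(rule(out[-1]))
    return out


def spell(form):
    """Hypothetical spelling: drop the stress marking, keep everything else."""
    return form.lower()


def consistent_orl(lexicon, rules=RULES):
    """Return the levels k at which spelling reproduces every attested form.
    Consistency requires a single such k for the whole vocabulary."""
    n_levels = len(rules) + 1
    return {k for k in range(n_levels)
            if all(spell(levels(deep, rules)[k]) == orth
                   for deep, orth in lexicon)}


lexicon = [("gOrod", "gorod"), ("gOroda", "goroda")]
print(levels("gOrod"))
print(consistent_orl(lexicon))
```

For this toy lexicon only the input level works, mirroring the claim that Russian spelling represents a level deeper than the surface. An empty result would indicate that no single level fits the whole vocabulary, i.e. a violation of Consistency.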
The model described in the last paragraph could be expanded to support more intricate notions of consistency. For instance, in a Lexical Phonology-based theory (Mohanan, 1986), instead of sequences of rules, we might think in terms of sequences of strata. The ORL could then be picked to be either the input level, or else the output of one of the strata. This would of course be a more constrained theory of consistency, and probably one that should be favored over the looser model previously described. We will not, however, attempt to choose between these variant models here. Note that a similar question was raised by Klima (1972), who asked (page 67) "which levels of linguistic structure . . . are then most readily accessible to the process of reading and writing?" (italics original).
An additional issue is cyclicity. If a morphologically complex word is constructed in a cyclic fashion, might it be the case that orthographic features of the morphemes are also added cyclically? In what sense then could we speak of orthography mapping to a single level? See Section 3.4 for further discussion.
Consistency will be exemplified in Chapter 3 with a comparison of Russian and Belarusian orthographies, as well as a discussion of (American) English orthography. We will also examine an apparent counterexample to Consistency from Serbo-Croatian: as we shall see, Consistency forces a reanalysis of the Serbo-Croatian data, which leads in turn to a more insightful description of the phenomenon than the traditional description. Even in quite regular systems such as Russian, one does in fact find cases where the orthography would appear to map to a deeper or shallower level of representation than would be expected on the basis of the posited ORL for the remainder of the vocabulary. We shall see such examples in the discussion of Russian and Belarusian in Chapter 3. As long as the exceptions constitute a small minority, as is the case in the Russian and Belarusian examples that we shall discuss, they can always be handled by means of lexical marking, though naturally this device comes at some cost. The examples in Chapter 3 will thus be seen generally to support Consistency, but we will necessarily leave it as a topic for future research to determine whether Consistency is supported more broadly across the world's writing systems.
The assumption that orthography may represent a particular level (deep or shallow) of a language is implicit in many discussions of reading in the psycholinguistics literature; it is arguably implicit in Venezky's (1970) classic analysis of English orthography; and it is a claim also made in The Sound Pattern of English (Chomsky and Halle, 1968), where English orthography is described as a "near perfect" representation of an underlying phonological representation.
As we have already noted, we take as the basis for Consistency a traditional derivational model of phonology. This is surely a controversial move: naturally it would seem desirable, in light of modern non-derivational theories of phonology, to cast our analysis in terms of a non-derivational paradigm. For example, it would be natural to seek an account of the phenomena that we will discuss in terms of a monostratal theory such as those of (Bird and Ellison, 1994; Bird and Klein, 1994; Bird, 1995).
Similarly, one might desire an account in terms of Optimality Theory (Prince and Smolensky, 1993), where the only linguistic level in effect is the output level at which the rank-ordered constraints are evaluated (at least on one version of the theory). It is not at present clear to me how to do this: while I do not doubt, for example, that an analysis of the Russian and Belarusian facts that I will discuss in Chapter 3 could be recast in terms of such a framework, they seem to be describable most naturally within a model wherein one can speak of different levels of representation. Once again, I leave it as a topic for future research to work out analyses within more current phonological frameworks.
1.2.4 Further issues
In this section we discuss two issues that the current work and the model presented herein raise. The first issue (Section 1.2.4.1) concerns the following question: given that writing, unlike natural language, is an artefact, one that (again unlike natural language) must be explicitly taught, why should one believe that a constrained model of the kind typically applied to language would apply to writing? The second issue (Section 1.2.4.2) relates to our adoption of a phonological model that includes segments: given that many phonographic writing systems are essentially "segmental" (the basic symbols representing segment-sized units), this is certainly a convenient choice, yet it seems to fly in the face of more recent models of phonology that eschew segments.
1.2.4.1 Why a constrained theory of writing systems?
It may have occurred to the reader to wonder why a constrained theory of writing systems should have any chance of being correct. To be sure, such models have been applied in linguistics with great success. But writing is crucially different from other aspects of linguistic knowledge. Language occurs naturally in all human communities: writing, in contrast, is a technological development that was apparently only independently invented four times in history (in Egypt, Sumer, China and Central America) and has only been used by a minority of languages and people throughout most of history. With few exceptions all humans learn to speak (or sign) at least one language without any special instruction; in contrast, reading and writing must be taught explicitly, and in many cases take years of special instruction to master. Writing is therefore not "natural" in the same sense as language.
One might even go further than this: writing systems are developed for particular languages, with more or less care being taken to ensure that they reflect the linguistic properties of the language in question. Furthermore, at least a few writing systems have undergone reforms over the years, in an attempt to bring the system more in line with the language (see Section 6.2). Orthography, then, can be thought of as a kind of practical linguistic theory.
This latter view has been expressed perhaps best by Aronoff (1985) in a paper describing the punctuation system of Masoretic Hebrew. Masoretic annotations evolved as a way of marking various information about Biblical Hebrew text, in particular information about how to pronounce, and accent or intone, the text. The system was based on diacritics, with annotations being added to, but not altering, the core consonantal text, which was considered sacred. The system for marking vowels survives as the (optional) vowel points of Modern Hebrew. The notation for accent, which is the topic of Aronoff's discussion, is only used in the Bible. Aronoff argues that the accentual marking system in fact marks "a complete unlabeled binary phrase-structure analysis of every verse" of the Bible (page 28). It thus represents the end-product of conscious linguistic analysis, and in effect encodes a linguistic theory of what the phrase structure of Hebrew should look like. Furthermore, like any linguistic theory, the Masoretic annotation system can be incorrect in the structures it presumes for particular constructions: indeed Aronoff argues that the analysis implicit in the annotation is in some cases incorrect.
As Aronoff notes, the Masoretic system is quite unusual in the richness of linguistic structure that is marked: certainly no orthographic system that is in wide use has conventions for marking constituent structure. (One might think of normal punctuation symbols as marking some level of syntactic or phonological phrasing, but Nunberg (1995) effectively argues against this.) And of course the Masoretic system is atypical in another respect: it was not an orthographic system used by native speakers of a language for everyday communication, but rather a system designed specifically to give precise guidance in the pronunciation of sacred texts to non-native speakers (since Hebrew was, during the relevant period, nobody's mother tongue). From that point of view, the system has more in common with systems of annotation for marking scansion in poetry than it does with the orthographic system of, say, Modern English. Nonetheless, to the extent that conscious effort goes into the design of more typical orthographies, Aronoff's points remain valid. These considerations would thus appear to argue against applying the same kinds of methods in the study of writing as in the study of language more generally. There are, however, at least a couple of basic reasons why such pessimism is ill-founded.
Firstly, while writing surely must be learned, and while writing systems are often consciously designed, they must also be used, which means that to be practical they must bear some sensible relation to the languages that they represent. Presumably by "sensible" we imply non-arbitrary, and by "non-arbitrary" we mean that it should be possible to state formal constraints. Whether Consistency and Regularity, introduced in Section 1.2.3, are reasonable constraints is an empirical question. What is not in doubt, in my view, is that some such constraints must exist.
Secondly, while orthographic systems certainly depend upon the linguistic knowledge of their creators, influences in the other direction are also found. First of all, as Wells (1982) notes, the orthographic representation of words is often the basis for speakers' conscious beliefs about their pronunciation: naive English speakers may believe that tow and toe are pronounced differently because they are spelled differently. Secondly, there are so-called spelling pronunciations, such as /ˈvɪktʃuəlz/ (rather than /ˈvɪtəlz/) for victuals, where the phonological representation of individual words has been modified over time on the basis of spelling. Thirdly, one also finds more systematic effects on the phonology on the basis of written form. Thus, according to Serianni (1989), Northern Italian dialects historically lack gemination (termed raddoppiamento in the Italian literature), both within words and across words (the so-called raddoppiamento sintattico). Cross-word gemination is not written in standard Italian orthography, and Northern dialects continue to lack raddoppiamento sintattico. However, word-internal raddoppiamento, as in the second /m/ of mamma 'mama', is consistently spelled. As a result, northern dialects, which historically lacked word-internal raddoppiamento, now possess it. Linguistic knowledge is often assumed to be in some sense primary, or at least more basic than orthographic knowledge. Spelling pronunciations and examples like those in Italian show that in some cases particular bits of linguistic knowledge can best be explained on the basis of orthography. This in turn suggests the need to understand the relation between orthography and linguistic structure, and the formal constraints on that relation.
1.2.4.2 Orthography and the "segmental" assumption
In the discussion above, we assumed that the graphemes in a segmental phonographic system like Russian are licensed by phonological segments in the traditional sense. In making this assumption, we may seem to be taking two steps backwards. Several strands of work in phonology over the past decade and a half, including Feature Geometry (Clements, 1985; Sagey, 1986), Declarative Phonology (Coleman, 1998), and Articulatory Phonology (Browman and Goldstein, 1989), have converged on the conclusion that segments are epiphenomena, the result of overlapping gestures. Indeed, there seems to be a widely accepted dogma that the very notion of segment in the Western phonological tradition derives from segmental alphabetic writing.
We should note at the outset that in one sense this issue is orthogonal to the model being developed here. That is, I have chosen to represent the licenser of Russian г ⟨g⟩ as the segment /g/, but I could just as easily not have. If we have instead a set of overlapping gestures (e.g. VELAR, +VOICE, −CONTINUANT, −NASAL), each on its own arc in an annotation graph representation, then we can assume that this collection of features together licenses г ⟨g⟩. One implementation of this idea would be to assume that the timing slot or syllable position that is linked to the overlapping set of features is the licenser of г ⟨g⟩, and can only license this grapheme by virtue of the collection of features that it is associated with. Those who are bothered by my use of segmental phonological representations are invited to think of them as a shorthand for the more articulated view I have just sketched.
On the other hand, the view that the notion "segment" in phonology derives from segmental writing is overly facile, and should not be uncritically accepted, I believe. Perhaps the best articulated presentation of this concept is a paper by Faber (1992), where she sets herself the task of explaining the following paradox. The notion of segment is unnatural, and derives in part from alphabetic writing: "investigations of language use suggest that many speakers do not divide words into phonological segments unless they have received explicit instruction in such segmentation comparable to that involved in teaching an alphabetic writing system" (page 111). On the other hand, alphabetic writing systems do exist. How could they have come about in the first place if the principles upon which they are based are so unnatural?
Faber's answer makes use of the standard view that when the Greeks adopted the Phoenician script, they misinterpreted some of the consonantal symbols as representing vowels. Thus the use of alpha to represent /a/ was a misinterpretation of Phoenician /ʔalpa/, representing /ʔ/. This much is widely accepted, and it is therefore possible that the Greek inventors of the segmental alphabet did not have an a priori notion of segment: on the contrary, they thought they were borrowing a system of writing that represented both vowels and consonants.
A reasonable question at this point is why Faber is focussing on the Greek alphabet (and its derivatives): after all, there are many apparently segmental systems in the world, including numerous South Asian scripts such as Devanagari (see Section 2.3.2), Korean Hankul (Section 2.3.1), and Ethiopic (Haile, 1996). Some of these, such as the South Asian scripts, may have had a Semitic origin (Salomon, 1996), like Greek, though surely independently of Greek. For others, like Hankul, which is a totally endemic Korean invention, the external inspiration (if any) for designing a segmental system is unclear (King, 1996). Indeed, even unvocalized Semitic writing systems could be considered segmental, though they traditionally omit marks for vowels: as with the non-representation of lexical stress in Russian (Section 1.2.1), we can say that the coverage of traditional Semitic scripts is incomplete.
Faber concentrates on Greek because she takes a rather narrow view of the notion of "alphabetic writing", and it is only "alphabetic writing", according to Faber, that engenders the paradoxical situation introduced above. For her an alphabet is a "segmentally linear script" that represents "vowels and consonants both as separate and equal". The latter requirement, of course, eliminates traditional Semitic scripts from consideration: they do not represent vowels. "Segmentally linear" scripts are scripts where the elements are arranged in a more or less linear fashion, without any significant use of two-dimensional layout: only in such scripts are all elements on a par with each other. Thus South Asian scripts and Hankul are eliminated, since in both cases the consonant and vowel symbols are laid out in two-dimensional (syllable-sized) chunks (Sections 2.3.2 and 2.3.1); furthermore, in many South Asian scripts (though not so clearly in the case of Hankul), the vowels are frequently diacritic symbols written around the consonantal core, and thus vowels and consonants are not on a par with each other. If one narrows the field in this fashion, then, it would seem that segmental writing was really only invented once, by accident, and we do not need to attribute any "naturalness" to the notion of segment.
Still, one might wonder about the justification for the limitations that Faber imposes. Why is Devanagari any less segmental than Greek, just because it happens to represent /e/ as a stroke above the temporally preceding /k/, whereas Greek arranges the symbols by left-to-right concatenation? Faber's point, not surprisingly, is that scripts like Devanagari (or Ethiopic, or Hankul) arrange their segmental elements in syllable-sized chunks (in Chapter 2 we will say that in such scripts the Small Linguistic Unit is the syllable), which are themselves linearly arranged ("segmentally coded, syllabically linear"). In other words, the syllable has a special status in such scripts that is seemingly lacking in Greek-derived ("segmentally coded, segmentally linear") scripts.
Now, one cannot deny the importance of the syllable as an organizing principle in orthographies: we will see several instances of this in Chapter 2, and syllables even show themselves to be important in "segmentally linear" scripts; see Section 3.5, and (Nunn, 1998). But Ethiopic, South Asian scripts, Hankul and other scripts also encode segmental information. This point, it seems to me, is not nullified by the fact that the scripts also encode syllabic information. Segmental systems have evolved, or been developed, in a variety of different cultures, speaking a wide variety of languages, and under a variety of different conditions. The notion "segment" may be an unnatural epiphenomenon, but if so, then at least it is one that is fairly widespread.
1.3 Terminology and Conventions
This section outlines the terminology and conventions that we will use throughout this book.
First of all, we will use the terms "script", "orthography" and "writing system" in their conventional senses, as follows. A "script" is just a set of distinct marks conventionally used to represent the written form of one or more languages: crucially, one can speak of a script without implying its use for a given language. Thus we will speak of the "Roman script", or the "Chinese script". On the other hand, a writing system is a script used to represent a particular language. Thus "writing system" implies "writing system for a given language".7 We will use the terms "orthography" and "writing system" interchangeably;8 in some of the literature, the term "orthography" implies "standardized orthography", such as the standard system of spelling used in American English, and this implicitly excludes systems of writing that have not been standardized (as was the case in, say, Elizabethan English). Though we will primarily be discussing standardized orthographies in this work, we do not intend the term to carry with it any implication of standardization.
The following notational conventions will be observed:

• Angle brackets will be used to enclose orthographic representations in Roman script. Note that this will only be the case when, in the discussion at hand, the focus is on the orthographic representation. For example, in a discussion of a linguistic example containing the word frog, that word will be italicized as per normal linguistic convention if we are merely referring to the linguistic object (word, morpheme, ...) frog. However, if we are specifically interested in the string of characters "f", "r", "o" and "g", then angle brackets will be used: ⟨frog⟩.
• Examples in non-Roman scripts will generally be transliterated, with the transliteration given in angle brackets. Phonemic transcriptions and translations will be given where relevant. Inevitably some single characters of a non-Roman script will need to be transliterated with a sequence of characters in Roman script: in such cases, the sequence of characters will be underlined in order to indicate that it is a unit. For example: (Cyrillic) я ⟨ja⟩.
7 One could go further and define the notion of writing system at a more abstract level whereby, for example, the Braille encoding of the Roman alphabet, as used for English, is an instance of the same writing system as is used in printed English, though obviously the script is quite different. (Actually, in order to make this connection, one would have to gloss over the fact that Braille has various lexical and string-based abbreviatory conventions that have no direct counterpart in standard print.) We will not be concerned with this level of abstraction here.

8 Though properly an orthography is really merely one type of writing system; see (Mountford, 1996).
• For scripts that run from right to left, I will indicate this by marking the string of graphemes with the symbol "←".
• For Chinese writing I will adopt a slightly more complex strategy, at least in cases where the internal structure of Chinese characters is under discussion. As many as 97% of Chinese characters can be analyzed as being composed of a semantic radical plus a phonetic component (DeFrancis, 1984). In cases where this decomposition is feasible I will "gloss" the character in small capitals thus: ⟨SEMANTIC+PHONETIC⟩. Here SEMANTIC will be a conventional term to describe the semantic radical in question, and PHONETIC will be a phonetic transcription in pinyin of the pronunciation of the phonetic component; more on the transcription of the phonetic component momentarily. Following this will be given a phonetic transcription in pinyin of the whole character, and an English gloss where possible and relevant.
Choosing the appropriate transcription for the phonetic component is not as straightforward as it might seem. First of all, many phonetic components have more than one pronunciation as independent characters. For example, the phonetic component of 蟬 chán 'cicada', namely 單, has two independent pronunciations, namely dān and chán. Secondly, in a number of cases, no independent pronunciation of the phonetic component is particularly similar to the pronunciation of the semantic-phonetic compound, but a significant fraction of the characters that contain that phonetic component have an identical pronunciation, possibly ignoring tone, to the character of interest. A particularly striking instance involves the phonetic component 丑, which as an independent character is pronounced chǒu, but as a phonetic component is always pronounced niu (with various tones). In such a case, one is arguably justified in transcribing the phonetic component as niu rather than chou.
In deciding how to transcribe the phonetic component of a character, we therefore adopt the strategy of finding the closest match to the pronunciation of the semantic-phonetic compound among:
- the attested independent pronunciations of the phonetic, and
- the pronunciations of well-populated subsets of those characters sharing the same phonetic component.
In case we make use of the second of these options, we indicate the ratio of:
- the number of characters listed in (Wieger, 1965) with the phonetic component and the pronunciation of interest, and
- the total number of characters in Wieger's lists with that phonetic component.
We also list the page number(s) in Wieger where one can find the characters with that phonetic component. If the tones differ among the members of the subset, and only in that case, we omit tone marks from the transcription.
For instance, for 蝗 huáng 'locust' the transcription would be ⟨INSECT+HUANG⟩,9 where huáng happens to be the independent pronunciation of the phonetic component 皇 'emperor'. For 蟬 chán 'cicada' we transcribe ⟨INSECT+CHAN⟩, where chán is one of the independent pronunciations of 單, though not the most frequent. For 醍 tí, the phonetic component is 是, whose only independent pronunciation is shì. However, a significant number of characters listed in (Wieger, 1965) with 是 as a phonetic component have the pronunciation tí, and thus we transcribe 醍 as ⟨WINE+TI 9/19 (p. 498)⟩, meaning that in nine out of nineteen characters with 是 as a phonetic component (page 498), Wieger lists the pronunciation as tí. This method of transcription, while surely not uncontroversial, is at least replicable.
In cases where the internal structure of the Chinese character is not at issue, I will in general dispense with the detailed character-structure gloss, and merely give a phonetic transliteration in pinyin and (where possible or relevant) an English gloss.10
• I will use the term grapheme to denote a basic symbol of a writing system; this, despite the valid objections to the use of that term outlined in (Daniels, 1991a; Daniels, 1991b). However, note that Daniels' objections are aimed at the use of the term grapheme as an implicit parallel of phoneme: Daniels' contention is that there is no "systematic graphemics" parallel to a systematic phonemic level. I do not wish to contest this point, and merely use the term grapheme as a convenient short way of saying "basic symbol of a writing system".
Note that in discussing some writing systems we may use the term grapheme in slightly different ways, depending upon how fine-grained an analysis is being assumed. For instance, it is convenient to refer to a single Chinese character as being a grapheme in some contexts: in particular, in the electronic coding of texts it is invariably the case that single Chinese characters constitute separate codes, and thus from the point of view of a computational system (such as a TTS system), Chinese characters are unanalyzable basic units. On the other hand, there is clearly important internal structure in Chinese characters (cf. the semantic+phonetic composition of Chinese characters alluded to above), and from the point of view of a finer-grained analysis of Chinese writing, these smaller units would certainly be called graphemes.
I will also use the term glyph conventionally to refer to a written symbol with a particular shape, independently of whether it corresponds to a single grapheme or multiple graphemes. Thus in my discussion of Korean Hankul (Section 2.3.1) I will refer to "syllable-sized glyphs" as well as consonant and vowel glyphs; the latter correspond to single graphemes, whereas the former are polygraphemic.
• Where there is unlikely to be confusion I will use the name of language X to denote "the orthography of language X". Thus "Chinese" will denote Chinese orthography, except where this usage is likely to be confusing.

9 I.e. 虫+皇.

10 Note that throughout this work, I will use traditional Chinese characters as used in Taiwan and Hong Kong, and eschew the use of simplified characters as used on the Mainland and Singapore, except where the structure of such simplified characters is at issue.
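The transcription strategy for phonetic components is in effect a small decision procedure. The Python sketch below shows one way it might be mechanized. The simplifications are mine, not the author's. Readings are written in numbered pinyin (e.g. `ti2`). "Closest match" is reduced to an exact match on the toneless syllable. Independent readings are tried before sibling subsets. The function names and the sample readings in the usage note are hypothetical and are not drawn from Wieger.

```python
def split_tone(syllable):
    """Split numbered pinyin such as 'ti2' into ('ti', '2'); tone may be ''."""
    if syllable and syllable[-1].isdigit():
        return syllable[:-1], syllable[-1]
    return syllable, ""


def transcribe_phonetic(compound, independent, siblings):
    """Gloss the phonetic component of a semantic-phonetic compound (sketch).

    compound    -- reading of the whole character, e.g. 'ti2'
    independent -- attested independent readings of the phonetic component
    siblings    -- readings of every listed character sharing that phonetic
    """
    base, _ = split_tone(compound)
    # Option 1: an independent reading of the phonetic matches (tone ignored).
    for reading in independent:
        if split_tone(reading)[0] == base:
            return reading.upper()
    # Option 2: the subset of sibling characters read like the compound.
    subset = [r for r in siblings if split_tone(r)[0] == base]
    if not subset:
        return None  # no match under this simplified criterion
    tones = {split_tone(r)[1] for r in subset}
    # Tone is omitted if, and only if, the members of the subset disagree on it.
    label = base.upper() if len(tones) > 1 else base.upper() + tones.pop()
    return f"{label} ({len(subset)}/{len(siblings)})"
```

For example, `transcribe_phonetic("ti2", ["shi4"], ["ti2", "ti2", "ti1", "shi4", "chi2"])` returns `"TI (3/5)"`. No independent reading matches, three of the five hypothetical sibling readings do, and because their tones disagree the tone is dropped.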
1.A Appendix: An Overview of Finite-State Automata and Transducers
In this appendix I give an overview of regular languages and relations, and their associated computational devices, finite-state acceptors (FSAs) and finite-state transducers (FSTs). The coverage here is necessarily brief, and for further discussion other sources are recommended. Finite-state acceptors and regular languages are discussed in any good introduction to the theory of computation: see, for example, (Harrison, 1978; Hopcroft and Ullman, 1979; Lewis and Papadimitriou, 1981). There are fewer introductory works on transducers. One reasonably accessible discussion can be found in (Kaplan and Kay, 1994). One might also consult the third chapter of (Sproat, 1992) for an in-depth introduction to the use of finite-state transducers in computational phonology and morphology. For transducers (as well as weighted acceptors), there is a recent paper by Mohri (1997) that discusses various formal properties and algorithms, and various other relevant works are cited therein.
1.A.1 Regular languages and finite-state automata
Basic to the theory of automata is the notion of an alphabet of symbols; the entire alphabet is conventionally denoted $\Sigma$. The empty string is denoted by $\epsilon$, which is not an element of $\Sigma$; also, the empty string is distinct from the empty set $\emptyset$. $\Sigma^*$ denotes the set of all strings (including $\epsilon$) over the alphabet $\Sigma$.
It is usual to define a regular language with a recursive definition such as the following (modeled on that of (Kaplan and Kay, 1994, page 338)):
1. $\emptyset$ is a regular language
2. For all symbols $a \in \Sigma \cup \{\epsilon\}$, $\{a\}$ is a regular language
3. If $L_1$, $L_2$ and $L$ are regular languages, then so are
   (a) $L_1 \cdot L_2$, the concatenation of $L_1$ and $L_2$: for every $w_1 \in L_1$ and $w_2 \in L_2$, $w_1 w_2 \in L_1 \cdot L_2$
   (b) $L_1 \cup L_2$, the union of $L_1$ and $L_2$
   (c) $L^*$, the Kleene closure of $L$. Using $L^i$ to denote $L$ concatenated with itself $i$ times, $L^* = \bigcup_{i=0}^{\infty} L^i$.
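To make the recursive definition concrete, here is a minimal Python sketch that models languages as finite sets of strings. Because $L^*$ is infinite, the Kleene closure is cut off at a maximum string length. That bound and all of the function names are assumptions of the sketch, not part of the definition.

```python
from itertools import product

EMPTY_SET = frozenset()   # clause 1: the empty language
EPSILON = ""              # the empty string


def symbol(a):
    """Clause 2: the language {a}, for a symbol a (or EPSILON)."""
    return frozenset({a})


def concat(l1, l2, max_len=None):
    """Clause 3(a): all w1 w2 with w1 in l1 and w2 in l2 (optionally length-bounded)."""
    return frozenset(w1 + w2 for w1, w2 in product(l1, l2)
                     if max_len is None or len(w1 + w2) <= max_len)


def union(l1, l2):
    """Clause 3(b): the union of two languages."""
    return frozenset(l1) | frozenset(l2)


def star(l, max_len):
    """Clause 3(c): L^0 | L^1 | L^2 | ..., keeping only strings of length <= max_len."""
    result = {EPSILON}      # L^0 = {epsilon}
    frontier = {EPSILON}
    while frontier:         # stops once no new string within the bound appears
        frontier = set(concat(frontier, l, max_len)) - result
        result |= frontier
    return frozenset(result)
```

For example, `concat(symbol("a"), star(symbol("b"), 3), max_len=4)` yields `{"a", "ab", "abb", "abbb"}`, the first few strings of ab*.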
While the above definition is complete, regular languages observe additional closure properties:
• Intersection: if $L_1$ and $L_2$ are regular languages then so is $L_1 \cap L_2$.
• Difference: if $L_1$ and $L_2$ are regular languages then so is $L_1 - L_2$, the set of strings in $L_1$ that are not in $L_2$.
• Complementation: if $L$ is a regular language, then so is $\Sigma^* - L$, the set of all strings over $\Sigma$ that are not in $L$. (Of course, complementation is merely a special case of difference.)
• Reversal: if $L$ is a regular language, then so is $\mathrm{Rev}(L)$, the set of reversals of all strings in $L$.

[Figure 1.3: an acceptor with states 0 and 1 and arcs labeled a and b]
Figure 1.3: An acceptor for ab*. The heavy-circled state (0) is (conventionally) the initial state, and the double-circled state is the final state.
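The four closure properties can be illustrated on finite slices of languages, i.e. strings up to a chosen length. That restriction is an assumption of this Python sketch. It shows what each operation computes, not why the result stays regular. The regularity proofs go through automata; for intersection, for instance, the standard route is a product construction.

```python
from itertools import product


def sigma_star(alphabet, max_len):
    """All strings over alphabet of length <= max_len: a finite slice of Sigma*."""
    return frozenset("".join(p) for n in range(max_len + 1)
                     for p in product(sorted(alphabet), repeat=n))


def intersection(l1, l2):
    return frozenset(l1) & frozenset(l2)


def difference(l1, l2):
    """Strings in l1 that are not in l2."""
    return frozenset(l1) - frozenset(l2)


def complement(l, alphabet, max_len):
    """Sigma* - L, computed as a special case of difference (within the slice)."""
    return difference(sigma_star(alphabet, max_len), l)


def reversal(l):
    return frozenset(w[::-1] for w in l)


AB_UPTO_3 = frozenset({"a", "ab", "abb"})   # ab* restricted to length <= 3
```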
Regular languages are sets of strings, and they are usually notated using regular expressions. Fundamental results of automata theory include the so-called Kleene's theorems, which demonstrate that regular languages are exactly the languages that can be recognized using finite-state automata, where this computational device can be defined as follows (Harrison, 1978; Hopcroft and Ullman, 1979; Lewis and Papadimitriou, 1981):
A finite-state automaton is a quintuple $M = (Q, s, F, \Sigma, \delta)$ where:
1. $Q$ is a finite set of states
2. $s$ is a designated initial state
3. $F$ is a designated set of final states
4. $\Sigma$ is an alphabet of symbols, and
5. $\delta$ is a transition relation from $Q \times (\Sigma \cup \{\epsilon\})$ to $Q$

As a simple example, consider the (infinite) set of strings $\{a, ab, abb, abbb, \ldots\}$, i.e. the set consisting of $a$ followed by zero or more $b$s. The most compact regular expression denoting this set is ab*. Furthermore, the language can be recognized by the finite-state machine given in Figure 1.3.
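The quintuple translates almost directly into code. In the Python sketch below, the class and method names are illustrative. $\delta$ is stored as a set of (state, symbol, next state) triples, with the empty string standing for $\epsilon$. Acceptance is computed nondeterministically by tracking the set of reachable states. I assume the arcs of Figure 1.3 are an a-arc from state 0 to state 1 and a b-loop on state 1, which is what the caption and the expression ab* imply.

```python
from dataclasses import dataclass

EPS = ""  # the empty string epsilon, used as a transition label


@dataclass(frozen=True)
class FSA:
    states: frozenset    # Q
    start: int           # s, the initial state
    finals: frozenset    # F
    alphabet: frozenset  # Sigma
    delta: frozenset     # transition relation: (state, symbol or EPS, next state)

    def _eps_closure(self, qs):
        """All states reachable from qs using epsilon transitions only."""
        seen, stack = set(qs), list(qs)
        while stack:
            q = stack.pop()
            for (p, sym, r) in self.delta:
                if p == q and sym == EPS and r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    def accepts(self, string):
        current = self._eps_closure({self.start})
        for ch in string:
            step = {r for (p, sym, r) in self.delta if p in current and sym == ch}
            current = self._eps_closure(step)
        return bool(current & self.finals)


# The acceptor for ab* of Figure 1.3 (arc layout assumed, see above).
AB_STAR = FSA(states=frozenset({0, 1}), start=0, finals=frozenset({1}),
              alphabet=frozenset({"a", "b"}),
              delta=frozenset({(0, "a", 1), (1, "b", 1)}))
```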
1.A.2 Regular relations and finite-state transducers
Regular n-relations can be defined in a way entirely parallel to regular languages. Again, the definition given here is modeled on that of Kaplan and Kay (1994):
1. $\emptyset$ is a regular n-relation
2. For all symbols $a \in (\Sigma \cup \{\epsilon\}) \times \cdots \times (\Sigma \cup \{\epsilon\})$ (an $n$-fold product), $\{a\}$ is a regular n-relation
3. If $R_1$, $R_2$ and $R$ are regular n-relations, then so are
   (a) $R_1 \cdot R_2$, the (n-way) concatenation of $R_1$ and $R_2$: for every $r_1 \in R_1$ and $r_2 \in R_2$, $r_1 r_2 \in R_1 \cdot R_2$
   (b) $R_1 \cup R_2$, the union of $R_1$ and $R_2$
   (c) $R^*$, the n-way Kleene closure of $R$.
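The only real change from the language case is that concatenation works component by component. In this Python sketch, a 2-relation is a finite set of (input, output) pairs, and the closure is truncated at a fixed number of repetitions. The truncation and the names are assumptions of the sketch. It rebuilds the relation a:c (b:d)* that serves as the running example below.

```python
from itertools import product


def concat_n(r1, r2):
    """n-way concatenation: join tuples component by component."""
    return frozenset(tuple(x + y for x, y in zip(t1, t2))
                     for t1, t2 in product(r1, r2))


def star_n(r, n, max_reps):
    """n-way Kleene closure, truncated to at most max_reps repetitions."""
    identity = ("",) * n          # the n-tuple of empty strings
    result, layer = {identity}, {identity}
    for _ in range(max_reps):
        layer = set(concat_n(layer, r))
        result |= layer
    return frozenset(result)


a_c = frozenset({("a", "c")})     # the singleton relation {a:c}
b_d = frozenset({("b", "d")})     # the singleton relation {b:d}
rel = concat_n(a_c, star_n(b_d, n=2, max_reps=3))
```

Here `rel` is `{("a", "c"), ("ab", "cd"), ("abb", "cdd"), ("abbb", "cddd")}`. An insertion such as ε:j would simply be the pair `("", "j")`.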
One can think of regular n-relations as accepting strings of a relation stated over an m-tuple of symbols, and mapping them to strings of a relation stated over a k-tuple of symbols, where $m + k = n$. We can therefore speak more specifically of $m \times k$-relations. As in the case of regular languages, there are further closure properties that regular n-relations obey:11
• Composition: if $R_1$ is a regular $m \times k$-relation and $R_2$ is a regular $k \times p$-relation, then $R_1 \circ R_2$ is a regular $m \times p$-relation. Composition will be explained below.
• Reversal: if $R$ is a regular n-relation, then so is $\mathrm{Rev}(R)$.
• Inversion: if $R$ is a regular $m \times k$-relation, then $R^{-1}$, the inverse of $R$, is a regular $k \times m$-relation.
One computes the inverse of a transducer by simply switching the input and output labels. The fact that regular relations are closed under inversion has an important practical consequence for systems based on finite-state transducers, namely that they are fully bidirectional. Thus, as we noted in Section 1.2.2, a model of spelling (mapping from the ORL to the written form) can be turned into a model of reading (mapping from the written form to the ORL) by simply inverting the FST implementing that mapping.
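The following Python sketch shows inversion as label swapping. It is a toy stand-in, not the ORL model of Section 1.2.2. It reuses the transducer a:c (b:d)* of Figure 1.4 and treats it as a hypothetical "spelling" direction. Transducers are lists of (source, input, output, destination) arcs. For simplicity, `transduce` assumes every arc consumes exactly one input symbol. That holds here, but would fail after inverting a machine with ε outputs.

```python
def invert(arcs):
    """Inverse of a transducer: swap the input and output label on every arc."""
    return [(src, out, inp, dst) for (src, inp, out, dst) in arcs]


def transduce(arcs, start, finals, string):
    """All outputs produced for `string` (no epsilon-input arcs assumed)."""
    configs = {(start, "")}
    for ch in string:
        configs = {(dst, out + o) for (q, out) in configs
                   for (src, i, o, dst) in arcs if src == q and i == ch}
    return {out for (q, out) in configs if q in finals}


SPELL = [(0, "a", "c", 1), (1, "b", "d", 1)]   # a:c (b:d)*, hypothetical "spelling"
READ = invert(SPELL)                            # c:a (d:b)*, the "reading" direction
```

Here `transduce(SPELL, 0, {1}, "abb")` gives `{"cdd"}`, and running the inverted machine on `"cdd"` gives back `{"abb"}`.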
For most practical applications of n-relations, $n = 2$ (so that $m$ and $k$ are obviously both 1).12 In this case we can speak of a relation as mapping from strings of one regular language into strings of another. In this work we will be concerned exclusively with 2-relations, and we will use the term regular relations with that meaning throughout.
The computational device corresponding to a regular relation is a finite-state transducer. The definition of FSTs can be modeled on the definition of FSAs given above, so we will merely illustrate by example, rather than essentially repeat the definition. Say we have an alphabet $\Sigma = \{a, b, c, d\}$ and a regular relation over that alphabet expressed by the set $\{(a, c), (ab, cd), (abb, cdd), (abbb, cddd), \ldots\}$. This relation thus consists of $a$ mapping to $c$, followed by zero or more $b$s mapping to $d$. This relation can be represented compactly by the two-way regular expression a:c (b:d)*. Figure 1.4 depicts an FST that computes this relation. We refer to the expressions on the left-hand side of the ":" as the input side, and the expressions on the right-hand side as the output side. Thus, in Figure 1.4, the input side is characterizable by the regular expression ab*, and the output side by the expression cd*.
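As a check on this example, the following Python sketch enumerates the (input, output) pairs an FST accepts, up to a maximum input length. It assumes Figure 1.4 has an a:c arc from state 0 to the final state 1 and a b:d loop on state 1. The table layout and names are illustrative. The enumeration reproduces the set of pairs given in the text.

```python
from collections import deque

# Figure 1.4 (assumed layout): (state, input symbol) -> (output symbol, next state)
ARCS = {(0, "a"): ("c", 1), (1, "b"): ("d", 1)}
START, FINALS = 0, {1}


def pairs_up_to(max_len):
    """Breadth-first enumeration of accepted (input, output) pairs."""
    found = set()
    queue = deque([(START, "", "")])
    while queue:
        state, inp, out = queue.popleft()
        if state in FINALS:
            found.add((inp, out))
        if len(inp) == max_len:
            continue
        for (src, sym), (o, dst) in ARCS.items():
            if src == state:
                queue.append((dst, inp + sym, out + o))
    return found
```

Taking just the first components of `pairs_up_to(n)` gives a slice of the input side ab*. Taking the second components gives a slice of the output side cd*.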
11 The omission of difference, complementation and intersection is intentional. In general, regular relations are not closed under these operations, though some important subclasses of regular relations are. See (Kaplan and Kay, 1994) for further discussion.

12 One exception is the work of Kiraz (1999).

[Figure 1.4: a transducer with states 0 and 1 and arcs labeled a:c and b:d]
Figure 1.4: An FST that accepts a:c (b:d)*.

Composition of regular relations has the same interpretation as composition of functions: if $R_1$ and $R_2$ are regular relations, then applying $R_1 \circ R_2$ to an input expression $I$ is the same as applying $R_1$ to $I$ first and then applying $R_2$ to the output. Figure 1.5 depicts two transducers, labeled $T_1$ and $T_2$. $T_1$ computes the relation expressible as (a:c (b:d)*) ∪ ((e:g)* f:h) (where ∪ denotes disjunction), whereas $T_2$ computes (g:i ε:j h:k)* (with the ε:j term inserting a j). The result of composing the two transducers together, $T_1 \circ T_2$, is a transducer that computes the trivial relation e:i ε:j f:k. In this particular case, though both $T_1$ and $T_2$ express relations with infinite domains and ranges, the result of composition merely maps the string ef to ijk.
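Composition can also be computed directly on the machines. The standard approach is a product construction: pair up states, and join an arc of $T_1$ with an arc of $T_2$ whenever $T_1$'s output label equals $T_2$'s input label. The Python sketch below is a naive, unweighted version. It omits the epsilon filter that weighted implementations use to avoid redundant paths; such paths do not change the relation computed. The arc lists for $T_1$ and $T_2$ are read off the residue of Figure 1.5 and are assumptions. In particular, $T_2$ is taken to be the cycle g:i ε:j h:k returning to its final start state.

```python
from collections import deque


def states_of(arcs, start, finals):
    return {start, *finals} | {a[0] for a in arcs} | {a[3] for a in arcs}


def compose(t1, t2):
    """Naive product construction for T1 o T2; '' stands for epsilon.

    Each machine is (arcs, start, finals); arcs are (src, input, output, dst)."""
    arcs1, s1, f1 = t1
    arcs2, s2, f2 = t2
    q1s, q2s = states_of(*t1), states_of(*t2)
    arcs = set()
    for (p, a, b, p2) in arcs1:
        if b == "":       # T1 emits nothing, so T2 stays where it is
            arcs |= {((p, q), a, "", (p2, q)) for q in q2s}
        else:             # match T1's output symbol against T2's input symbol
            arcs |= {((p, q), a, c, (p2, q2)) for (q, b2, c, q2) in arcs2 if b2 == b}
    for (q, b2, c, q2) in arcs2:
        if b2 == "":      # T2 inserts c without reading, so T1 stays where it is
            arcs |= {((p, q), "", c, (p, q2)) for p in q1s}
    finals = {(x, y) for x in f1 for y in f2}
    return arcs, (s1, s2), finals


def transduce(machine, s):
    """All outputs for input s. Assumes no cycles of epsilon-input arcs."""
    arcs, start, finals = machine
    results, queue = set(), deque([(start, 0, "")])
    while queue:
        q, i, out = queue.popleft()
        if i == len(s) and q in finals:
            results.add(out)
        for (src, a, b, dst) in arcs:
            if src != q:
                continue
            if a == "":
                queue.append((dst, i, out + b))
            elif i < len(s) and s[i] == a:
                queue.append((dst, i + 1, out + b))
    return results


T1 = ([(0, "a", "c", 1), (1, "b", "d", 1), (0, "e", "g", 2), (2, "e", "g", 2),
       (2, "f", "h", 3), (0, "f", "h", 3)], 0, {1, 3})
T2 = ([(0, "g", "i", 1), (1, "", "j", 2), (2, "h", "k", 0)], 0, {0})
T3 = compose(T1, T2)
```

Running `transduce(T3, "ef")` gives `{"ijk"}`, the same as feeding `transduce(T1, "ef")` (that is, `"gh"`) into `T2`.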
One other notion that is worth mentioning is the notion of projection onto one dimension of a relation. For example, for a 2-way relation $R$, $\pi_1(R)$ projects $R$ onto the first dimension and $\pi_2(R)$ projects onto the second dimension. Projection applied to an FST produces an FSA corresponding to one side of the transducer. Thus the first projection ($\pi_1$) of the transducer in Figure 1.4 is the acceptor in Figure 1.3.
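Projection is just as simple on the machine itself: keep one side of each arc label and drop the other. In this Python sketch the function names are illustrative and ε-arcs are not handled. It projects the Figure 1.4 transducer (arc layout assumed as before). The check is that $\pi_1$ yields the ab* acceptor and $\pi_2$ an acceptor for cd*.

```python
def project(arcs, side):
    """pi_1 (side=1) keeps input labels, pi_2 (side=2) keeps output labels.

    Transducer arcs are (src, input, output, dst); acceptor arcs are (src, label, dst)."""
    if side not in (1, 2):
        raise ValueError("side must be 1 or 2")
    return {(src, inp if side == 1 else out, dst) for (src, inp, out, dst) in arcs}


def accepts(arcs, start, finals, s):
    """Nondeterministic acceptance by an FSA without epsilon arcs."""
    current = {start}
    for ch in s:
        current = {dst for (src, label, dst) in arcs if src in current and label == ch}
    return bool(current & finals)


FST_ARCS = [(0, "a", "c", 1), (1, "b", "d", 1)]   # a:c (b:d)*, Figure 1.4
pi1 = project(FST_ARCS, 1)   # the ab* acceptor of Figure 1.3
pi2 = project(FST_ARCS, 2)   # an acceptor for cd*
```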
[Figure 1.5: T1 has states 0 to 3 with arcs a:c, b:d, e:g, e:g, f:h, f:h; T2 has states 0 to 2 with arcs g:i, ε:j, h:k; T3 has states 0 to 3 with arcs e:i, ε:j, f:k]
Figure 1.5: Three transducers, where $T_3 = T_1 \circ T_2$.
Chapter 2
Regularity
In this chapter we defend the first hypothesis that was introduced in Section 1.2.3, namely Regularity.
It is obvious at the outset that the normal notion of a regular language, where the catenation operator "·" denotes simple left-to-right concatenation, will not suffice. This can be seen easily enough with the Chinese character 醬 ⟨WINE+JIANG⟩ jiàng 'sauce', where the semantic radical 酉 ⟨WINE⟩1 occurs below the phonetic portion 將 ⟨JIANG⟩. This contrasts with the case of 鯉 ⟨FISH+LI⟩ lǐ 'carp', where the semantic radical 魚 ⟨FISH⟩ occurs to the left of the phonetic component 里 ⟨LI⟩; with 鴨 ⟨BIRD+JIA⟩ yā 'duck', where the semantic radical 鳥 ⟨BIRD⟩ occurs to the right of the phonetic component 甲 ⟨JIA⟩; with 草 ⟨GRASS+ZAO⟩ cǎo 'grass', where the semantic radical 艹 ⟨GRASS⟩ occurs above the phonetic component 早 ⟨ZAO⟩; and with 國 ⟨SURROUND+HUO⟩ guó 'country', where the semantic radical 囗 ⟨SURROUND⟩ surrounds the phonetic component 或 ⟨HUO⟩. These data are summarized in Table 2.1.
Clearly we need a more powerful notion than simple concatenation to handle such cases. We will therefore introduce the notion of planar regular languages, which differ from ordinary (string-based) regular languages only in defining a richer set of concatenation operations. The definition of planar regular languages will be given immediately in Section 2.1; we will also introduce (in Section 2.2) the notion of Small Linguistic Unit (SLU), the linguistic unit within which variation from the macroscopic (line- and document-level) order of a script is possible. In subsequent sections we will show the applicability of the expanded formalism to various phenomena that arise in a variety of scripts. It will be clear that the extended formalism is capable of providing straightforward analyses of these phenomena, which lends support to the Regularity hypothesis. Problematic examples from Ancient Egyptian will be discussed in Section 2.3.5. In Section 2.4 we briefly survey the possible instantiations of the SLU in different writing systems. Finally, we end the chapter with the implications of the theory for the macroscopic arrangement of scripts, and in particular for the instantiations of boustrophedon writing.

1 Used alone, this character, pronounced yǒu, is used mostly as a term in the calendrical cycle, though in archeological usage it retains its original meaning of 'amphora'.

鯉 = 魚 left of 里          鴨 = 鳥 right of 甲
草 = 艹 above 早            醬 = 酉 below 將
國 = 囗 surrounding 或
Table 2.1: Chinese characters illustrating the five modes of combination of semantic (underlined) and phonetic components.
2.1 Planar Regular Languages and Planar Regular Relations
Planar grammars of various kinds have been used both in two-dimensional pattern recognition and in building generative models of two-dimensional layouts. For instance, two-dimensional context-free grammars have been used in the recognition of printed mathematical equations (Chou, 1989), and in formal descriptions of Chinese character construction (Fujimura and Kagaya, 1969; Wang, 1983). Planar finite-state models have also been used, mainly in pattern recognition: for instance, Levin and Pieraccini (1991) developed a planar hidden Markov model approach to optical character recognition. A comprehensive review of two-dimensional finite-state models and their properties is given in (Giammarresi and Restivo, 1997). Giammarresi and Restivo's discussion focuses on two-dimensional languages (also termed, for obvious reasons, pictures) that can be represented with symbols on a rectangular grid. For instance, the following would be a picture over the alphabet {a, b}:
(2.1)
a a a a a a b
a a a a a b a
a a a a b a a
a a a b a a a
a a b a a a a
a b a a a a a
b a a a a a a
This view is not really adequate for our purposes, however, since we would like to view the primitive elements of the alphabet as being, in effect, geometrical figures that might occupy more than one "square" in such a two-dimensional grid. For example, in Figure 2.2 below, the basic element γ(α) is left-adjoined with the entire complex consisting of γ(β), γ(γ), and γ(δ), and in particular is directly to the left of both γ(β) and γ(γ), something that is not easily represented in an arrangement such as that in (2.1).
[Figure 2.1: four blocks in a two-by-two grid, γ(α) and γ(β) on top, γ(γ) and γ(δ) below]
Figure 2.1: γ(α) → γ(β) ↓ γ(γ) → γ(δ).

[Figure 2.2: γ(α) on the left, spanning the figure; γ(β) at the top right; γ(γ) and γ(δ) below γ(β)]
Figure 2.2: Another figure described by γ(α) → γ(β) ↓ γ(γ) → γ(δ).

The notion of planar regular languages that we have in mind here can be described informally as follows. Suppose you have a set of two-dimensional figures arranged in some fashion on a flat surface: consider for example the four rectangles labeled γ(α), γ(β), γ(γ) and γ(δ) in Figure 2.1. We assume for simplicity's sake that we are told what the subfigures are and where they are relative to one another: that is, our task is not to compute that there are four blocks in Figure 2.1, and that they are arranged in some pattern, but rather, given a predetermined layout, to describe that layout in formal terms. The analogy in the one-dimensional case is between, say, optical character recognition and string matching: in the former case one must discover what characters are in a text; in the latter case one already knows the characters and their relative orders, and one merely has to, for example, find patterns in this already known sequence of characters.
There are a number of ways in which one could describe Figure 2.1, but supposing we start in the upper left-hand corner, we might say that γ(α) left catenates with γ(β); that this pair downwards catenates with the pair γ(ζ) γ(δ); and that γ(ζ) left catenates with γ(δ). If we use '→' for 'left catenates with' and '↓' for 'downwards catenates with', we could describe the layout succinctly as γ(α) → γ(β) ↓ γ(ζ) → γ(δ). Of course, other patterns are consistent with this formula: consider Figure 2.2. This brings up the point that, unlike one-dimensional concatenation, planar catenation operators are not in general associative. More specifically, a sequence of within-operator catenations is associative: (γ(α) ↓ γ(β)) ↓ (γ(ζ) ↓ γ(δ)) is equivalent to γ(α) ↓ (γ(β) ↓ (γ(ζ) ↓ γ(δ))); but cross-operator catenations are not in general associative. There are a couple of possible solutions that allow us to describe a particular layout more precisely. One approach is to make brackets an explicit part of the formalism: thus Figure 2.1 could be described as [γ(α) → γ(β)] ↓ [γ(ζ) → γ(δ)], as distinct from γ(α) → [γ(β) ↓ [γ(ζ) → γ(δ)]], which would describe Figure 2.2. An alternative that can be adopted in some cases (see Section 2.3.1, for instance) is to define a precedence on operators. So Figure 2.1 can be described as simply γ(α) → γ(β) ↓ γ(ζ) → γ(δ), if we have the understanding that '→' has precedence over '↓', so that the groups γ(α) → γ(β) and γ(ζ) → γ(δ) will form first, and only then will ↓ join the two groups together. Such an approach would not allow us to describe Figure 2.2, since no definition of precedence between '→' and '↓' will allow us to group the components appropriately. In such cases one would have to resort to bracketing. For example, the Chinese 鱗 lin 'fish scale' is composed of the components 魚, 米, 夕 and 㐄, arranged as follows: 魚 → [米 ↓ [夕 → 㐄]].
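To see concretely why cross-operator catenation is not associative, here is a minimal Python sketch (not part of the original analysis) that assigns each component a cell on an integer grid. The classes `Leaf` and `Cat` and the function `layout` are hypothetical names, and the placement policy (abut at the top-left corner, never stretch) is a simplifying assumption; only the four directional operators are handled, and ⊙ is left aside.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

Box = Tuple[int, int, int, int]  # (x, y, width, height); y grows downwards


@dataclass(frozen=True)
class Leaf:
    name: str


@dataclass(frozen=True)
class Cat:
    op: str  # "→" left, "←" right, "↓" downwards, "↑" upwards catenation
    first: "Expr"
    second: "Expr"


Expr = Union[Leaf, Cat]


def _shift(boxes: Dict[str, Box], dx: int, dy: int) -> Dict[str, Box]:
    return {n: (x + dx, y + dy, w, h) for n, (x, y, w, h) in boxes.items()}


def layout(e: Expr) -> Tuple[int, int, Dict[str, Box]]:
    """Return (width, height, boxes), with every leaf a 1x1 cell."""
    if isinstance(e, Leaf):
        return 1, 1, {e.name: (0, 0, 1, 1)}
    w1, h1, b1 = layout(e.first)
    w2, h2, b2 = layout(e.second)
    if e.op == "→":  # first to the left of second
        return w1 + w2, max(h1, h2), {**b1, **_shift(b2, w1, 0)}
    if e.op == "←":  # first to the right of second
        return w1 + w2, max(h1, h2), {**_shift(b1, w2, 0), **b2}
    if e.op == "↓":  # first above second
        return max(w1, w2), h1 + h2, {**b1, **_shift(b2, 0, h1)}
    if e.op == "↑":  # first below second
        return max(w1, w2), h1 + h2, {**_shift(b1, 0, h2), **b2}
    raise ValueError(f"unsupported operator {e.op!r}")


a, b, z, d = Leaf("γ(α)"), Leaf("γ(β)"), Leaf("γ(ζ)"), Leaf("γ(δ)")
fig_2_1 = Cat("↓", Cat("→", a, b), Cat("→", z, d))  # [α → β] ↓ [ζ → δ]
fig_2_2 = Cat("→", a, Cat("↓", b, Cat("→", z, d)))  # α → [β ↓ [ζ → δ]]
print(layout(fig_2_1)[2]["γ(ζ)"])  # (0, 1, 1, 1): directly below γ(α)
print(layout(fig_2_2)[2]["γ(ζ)"])  # (1, 1, 1, 1): below γ(β), right of γ(α)
```

Both groupings flatten to the same unbracketed formula, yet γ(ζ) lands in a different cell; a column built only from ↓ comes out identical under either grouping.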
We turn now to a formal definition of planar regular languages. The definitions of regular languages introduced in Appendix 1.A carry over directly to planar regular languages, the only novel feature being the splitting of concatenation '·' into five operations, each of which is needed to describe the Chinese character component layouts illustrated in the introduction to this chapter:2
Left catenation: →
Right catenation: ←
Downwards catenation: ↓
Upwards catenation: ↑
Surrounding catenation: ⊙
Note that ⊙ does not have a dual: we discuss this point further in Section 2.3.4.
Thus, we can emend the relevant portions of the definition of regular languages given in Appendix 1.A.1 to read as follows:
3. If L₁ and L₂ are regular languages, then so are
(a) L₁ → L₂; L₁ ← L₂; L₁ ↓ L₂; L₁ ↑ L₂; L₁ ⊙ L₂
Each of these catenation operations is illustrated in Figure 2.3.
Figure 2.3: The five planar concatenation operations: (a) γ(α) → γ(β); (b) γ(α) ← γ(β); (c) γ(α) ↓ γ(β); (d) γ(α) ↑ γ(β); (e) γ(α) ⊙ γ(β).
2 Note that Coleman (1998, pages 27–28) uses downwards concatenation (which he terms cocatenation) as part of his description of the formal syntax of IPA symbols.
Of course, for the implementation of M_ORL→Γ in a given writing system, we will be interested not only in planar regular languages, but more generally in planar regular relations. On the orthographic side of the mapping, one is clearly mapping to planar objects built using some combination of planar catenation operations. On the linguistic side things are perhaps less clear. Although linguistic objects such as the annotation graphs introduced in Section 1.2.2 are displayed in two dimensions, they are not really planar objects: there is no sense in which the SEM arc is, say, above the TONE arc in (1.10). For the present discussion we will assume, for simplicity, that graph-theoretic objects such as (1.10) have been 'linearized' into strings, so that we can think of their construction as being in terms of simple string concatenation '·'. So we will be interested in planar regular relations that involve mappings between strings constructed using '·' and planar objects constructed using some combination of planar catenation operations. Thus we might want to state, for instance, that α·β·δ transduces to γ(α) ↓ γ(β) → γ(δ). We can straightforwardly redefine the normal notion of concatenation in regular relations to implement the case we are interested in (Appendix 1.A.2):
3. If R₁ and R₂ are regular relations, then so are:
(a) R₁ ⟨·, →⟩ R₂; R₁ ⟨·, ←⟩ R₂; R₁ ⟨·, ↓⟩ R₂; R₁ ⟨·, ↑⟩ R₂; R₁ ⟨·, ⊙⟩ R₂
Here, the notation ⟨op₁, op₂⟩ means that we combine the input side of the relation using op₁ and the output side using op₂.
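The ⟨op₁, op₂⟩ combination can be sketched on finite relations represented as sets of (string, planar formula) pairs. This is a toy Python illustration under stated assumptions: real regular relations are generally infinite and would be built with transducers, `combine` is a hypothetical name, and outputs are rendered with explicit brackets. Reading the example α·β·δ ↦ γ(α) ↓ γ(β) → γ(δ) as γ(α) ↓ [γ(β) → γ(δ)] is likewise an assumption.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (string side, planar side)


def combine(r1: Set[Pair], r2: Set[Pair], planar_op: str) -> Set[Pair]:
    """R1 ⟨·, op⟩ R2: join input sides with '·' and output sides with planar_op."""
    return {
        (f"{s1}·{s2}", f"[{p1} {planar_op} {p2}]")
        for s1, p1 in r1
        for s2, p2 in r2
    }


alpha = {("α", "γ(α)")}
beta = {("β", "γ(β)")}
delta = {("δ", "γ(δ)")}
result = combine(alpha, combine(beta, delta, "→"), "↓")
print(result)  # {('α·β·δ', '[γ(α) ↓ [γ(β) → γ(δ)]]')}
```

The input side always grows by plain concatenation, while the output side records which planar operator was used at each step.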
It should be stressed that the planar catenation operations are not generally intended to describe the exact placement of one element relative to another. Thus stating a formula such as γ(α) ↓ γ(β) merely entails that γ(α) is placed somewhere above γ(β), but it says nothing about whether the center of gravity of the visible glyph representing γ(α) is exactly centered on the visible glyph representing γ(β), or is perhaps, say, a little to the right. Of course, sometimes such differences correlate with a difference in meaning. To take an obvious example, in a number of scripts the apostrophe ⟨’⟩ and the comma ⟨,⟩ are almost or completely identical in form, the only difference being the vertical placement. In both ⟨Jones’⟩ and ⟨Jones,⟩ we would say that the comma/apostrophe is catenated to the right of ⟨s⟩, so it would seem as if the current formalism can say nothing about how these two cases are distinguished. The solution, of course, is to assume that glyphs are not merely a collection of black bits, but in general also include a block of white bits within which the black bits are situated. Thus apostrophe and comma are really represented as in (2.2) and (2.3), respectively:
(2.2) ’ (ink at the top of its white box)
(2.3) , (ink at the bottom of its white box)
Thus we can preserve the simple statement that apostrophes and commas alike catenate in a left-to-right fashion with their neighbors, and at the same time guarantee that they will be positioned appropriately in the vertical dimension relative to those neighbors. In many other cases, though, issues of exact placement of glyphs relate to written stylistics, and in general may vary substantially depending upon the style of font, and whether one is dealing with ordinary printed text, ordinary handwritten text, or calligraphy. Such stylistic concerns are outside the scope of the present study.
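The white-box solution can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that every glyph box is 10 rows tall and that boxes are top-aligned; the `Glyph` class, `left_catenate`, and the particular ink rows are hypothetical. Apostrophe and comma are combined with the same left-to-right catenation and differ only in where their ink sits inside the box.

```python
from dataclasses import dataclass
from typing import List, Tuple

LINE_HEIGHT = 10  # assumption: every white box is 10 rows tall (row 0 = top)


@dataclass(frozen=True)
class Glyph:
    name: str
    width: int
    ink: Tuple[int, int]  # (top, bottom) rows of black ink inside the white box

    def __post_init__(self) -> None:
        top, bottom = self.ink
        if not 0 <= top <= bottom < LINE_HEIGHT:
            raise ValueError("ink must lie inside the box")


def left_catenate(glyphs: List[Glyph]) -> List[Tuple[str, int, int, int]]:
    """Abut white boxes left to right, tops aligned; return (name, x, ink_top, ink_bottom)."""
    placed, x = [], 0
    for g in glyphs:
        placed.append((g.name, x, g.ink[0], g.ink[1]))
        x += g.width
    return placed


S = Glyph("s", 5, (4, 8))
APOSTROPHE = Glyph("’", 2, (0, 2))  # ink at the top of its box
COMMA = Glyph(",", 2, (7, 9))       # ink at the bottom of its box

print(left_catenate([S, APOSTROPHE])[-1])  # ('’', 5, 0, 2)
print(left_catenate([S, COMMA])[-1])       # (',', 5, 7, 9)
```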
A further point needs to be made about the use of brackets to indicate association, discussed above. In principle, the unbounded use of paired brackets introduces non-regularity, since non-finite sets of well-formed bracketings are well known to require context-free power; see, e.g., (Harrison, 1978, pages 312ff.). We can keep the language within the set of regular languages, however, if we limit the depth of bracketing that we allow, thus also limiting the number of switches between catenation operators that we allow. (Since bracketing is not involved when we combine elements within a given catenation operator, for example when we combine γ(α) → γ(β) and γ(δ) → γ(ζ) using →, there are no restrictions on the 'depth' of combination in such cases.) It is unclear what the maximum depth should be, but a reasonable setting might be seven.3 This would be more than sufficient to allow for an exhaustive structural analysis of the most complex Chinese characters (Rick Harbaugh, personal communication); see Section 2.3.4.
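Why a depth bound restores regularity can be seen from a recognizer whose only memory is a counter capped at the bound: with a bound of k the counter takes at most k + 1 values, so the whole device is a finite-state machine. The Python sketch below is illustrative; `within_depth` is a hypothetical name and tokens are single characters for convenience.

```python
from typing import Iterable


def within_depth(tokens: Iterable[str], max_depth: int) -> bool:
    """Accept iff brackets are balanced and never nest deeper than max_depth.

    The only state is `depth`, an integer in 0..max_depth, so this is a
    finite-state recognizer with max_depth + 1 live states (plus rejection).
    """
    depth = 0
    for t in tokens:
        if t == "[":
            depth += 1
            if depth > max_depth:
                return False
        elif t == "]":
            if depth == 0:
                return False
            depth -= 1
    return depth == 0


print(within_depth(list("α[β[ζδ]]"), 2))  # True: the Figure 2.2 bracketing
print(within_depth(list("α[β[ζδ]]"), 1))  # False: needs depth 2
```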
The computational devices corresponding to planar regular languages and relations are planar (or 'two-dimensional') finite-state automata (2FSA) and planar finite-state transducers (2FST), respectively. We can define a planar finite-state acceptor along the lines of the definition of (one-dimensional) finite-state automata from Appendix 1.A.1, adding to the definition a set of directions, a start position, and a set of grouping brackets; computationally it is easier to define the machines using brackets rather than in terms of operator precedence.
A planar finite-state acceptor is an octuple M = (Q, s, p, B, F, Δ, Σ, δ) where:
1. Q is a finite set of states
2. s is a designated initial state
3. p is the starting position (in the planar figure) for s, chosen from the set {left, top, right, bottom}
4. B is the set of grouping brackets {[, ]}
5. F is a designated set of final states
6. Δ is the set of directions {R(ight), L(eft), D(own), U(p), I(nwards)} (corresponding to the catenation operators →, ←, ↓, ↑ and ⊙, respectively)
7. Σ is an alphabet of symbols, and
8. δ is a transition relation between Q × (Σ ∪ B) × Δ and Q
To recognize the figure in Figure 2.2 we effectively need to have a 2FSA that recognizes the description γ(α) → [γ(β) ↓ [γ(ζ) → γ(δ)]]. So we need to have a machine where scanning begins at the left-hand side of the figure, proceeds rightwards reading γ(α), reads rightwards across one grouping bracket, reads rightwards across γ(β), reads downwards across one grouping bracket, reads rightwards across γ(ζ), reads rightwards across γ(δ), and finally reads rightwards across two grouping brackets. A 2FSA that accomplishes this is given in Figure 2.4.
Figure 2.4: A 2FSA that recognizes Figure 2.2. The labels 'R' and 'D' on the arcs denote reading direction; 'left' on state 0 (the initial state) denotes the position at which scanning begins. (A chain of states 0 to 8 with arcs, in order: γ(α) R, [ R, γ(β) R, [ D, γ(ζ) R, γ(δ) R, ] R, ] R.)
3 In a similar vein, Church (1980) proposed a hard limit on the depth of embedding in syntactic structure in order to be able to implement a finite-state syntactic analyzer.
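The machine of Figure 2.4 can be simulated once the figure has been scanned into (symbol, direction) steps. This Python sketch makes two simplifying assumptions: the scan itself is given rather than computed from a real image, and state 8 is taken to be the only final state. The table `FIG_2_4` and the function `accepts` are hypothetical names.

```python
from typing import Dict, List, Tuple

Step = Tuple[str, str]  # (symbol read, reading direction)

# Figure 2.4 as a transition table: (state, symbol, direction) -> next state.
FIG_2_4: Dict[Tuple[int, str, str], int] = {
    (0, "γ(α)", "R"): 1,
    (1, "[", "R"): 2,
    (2, "γ(β)", "R"): 3,
    (3, "[", "D"): 4,
    (4, "γ(ζ)", "R"): 5,
    (5, "γ(δ)", "R"): 6,
    (6, "]", "R"): 7,
    (7, "]", "R"): 8,
}
START_STATE = 0
START_POSITION = "left"  # recorded for completeness; the scan below already starts there
FINAL_STATES = {8}       # assumption: the end of the chain is the final state


def accepts(steps: List[Step]) -> bool:
    state = START_STATE
    for symbol, direction in steps:
        key = (state, symbol, direction)
        if key not in FIG_2_4:
            return False
        state = FIG_2_4[key]
    return state in FINAL_STATES


FIG_2_2_SCAN: List[Step] = [
    ("γ(α)", "R"), ("[", "R"), ("γ(β)", "R"), ("[", "D"),
    ("γ(ζ)", "R"), ("γ(δ)", "R"), ("]", "R"), ("]", "R"),
]
print(accepts(FIG_2_2_SCAN))  # True
```

Changing the downwards move after γ(β) into a rightwards one makes the scan fail, which is exactly what distinguishes the Figure 2.2 arrangement.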
A 2FST can be defined similarly to a 2FSA. For our purposes we are interested in machines that map from expressions constructed using string catenation to expressions constructed using planar catenation operators. The only part of the definition that changes is item 8:
A planar finite-state transducer is an octuple T = (Q, s, p, B, F, Δ, Σ, δ) where:
1. Q is a finite set of states
2. s is a designated initial state
3. p is the starting position (in the planar figure) for s, chosen from the set {left, top, right, bottom}
4. B is the set of grouping brackets {[, ]}
5. F is a designated set of final states
6. Δ is the set of directions {R(ight), L(eft), D(own), U(p), I(nwards)} (corresponding to the catenation operators →, ←, ↓, ↑ and ⊙, respectively)
7. Σ is an alphabet of symbols, and
8. δ is a transition relation from Q × Σ* × (Σ ∪ B) × Δ to Q
In general, arcs that are labeled with brackets on the 'planar side' will be labeled with ε on the 'string side' of the transduction. A 2FST that maps the expression α·β·ζ·δ to γ(α) → [γ(β) ↓ [γ(ζ) → γ(δ)]] (Figure 2.2) is given in Figure 2.5.
Figure 2.5: A 2FST that maps the expression α·β·ζ·δ to γ(α) → [γ(β) ↓ [γ(ζ) → γ(δ)]]. (A chain of states 0 to 8, scanning beginning at the left; arcs, in order: α:γ(α) R, ε:[ R, β:γ(β) R, ε:[ D, ζ:γ(ζ) R, δ:γ(δ) R, ε:] R, ε:] R.)
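Because the transducer of Figure 2.5 is a single chain, it can be run by walking its arcs in order, consuming an input symbol on each non-ε arc. The Python below is a sketch under that assumption; a general 2FST may be nondeterministic and would need a search over paths. `FIG_2_5` and `transduce` are hypothetical names, and ε is represented by the empty string.

```python
from typing import List, Optional, Tuple

EPS = ""  # ε on the string side

# Figure 2.5, arcs in order along the chain 0 -> 1 -> ... -> 8:
# (input symbol or EPS, output symbol, reading direction).
FIG_2_5: List[Tuple[str, str, str]] = [
    ("α", "γ(α)", "R"),
    (EPS, "[", "R"),
    ("β", "γ(β)", "R"),
    (EPS, "[", "D"),
    ("ζ", "γ(ζ)", "R"),
    ("δ", "γ(δ)", "R"),
    (EPS, "]", "R"),
    (EPS, "]", "R"),
]


def transduce(symbols: List[str]) -> Optional[List[Tuple[str, str]]]:
    """Return the (output, direction) steps for the input, or None if it is rejected."""
    pos, out = 0, []
    for inp, outp, direction in FIG_2_5:
        if inp != EPS:
            if pos >= len(symbols) or symbols[pos] != inp:
                return None
            pos += 1
        out.append((outp, direction))
    return out if pos == len(symbols) else None


steps = transduce(["α", "β", "ζ", "δ"])
print("".join(o for o, _ in steps))  # γ(α)[γ(β)[γ(ζ)γ(δ)]]
```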
2.2 The Locality Hypothesis
How are the various catenation operators actually distributed in a writing system that uses more than one? Invariably one finds a situation such as the following. At a macroscopic level, the script runs in a particular direction, say left to right (→) or top to bottom (↓) (see Section 2.5); the particular choice may be somewhat free, as it is in Chinese, but whatever is chosen is fixed for a given text. Alterations of this macroscopic order occur only locally. Thus in Chinese, the construction of a particular character may involve various combinations of the catenation operators that we have described, but there is (for a given text) only one choice available for catenating that character with the following one. This observation leads to the following claim, which we shall term Locality:
(2.4) Locality
Changes from the macroscopic catenation type can only occur within a graphic unit that corresponds to a small linguistic unit (SLU).
As we shall see, in many writing systems the SLU in question is the syllable, though in some cases (Section 2.3.2) the 'orthographic syllable' is non-isomorphic to the phonological syllable; in other cases the unit seems to be the word (as we shall see in a case from Aramaic in Section 4.4.1). This issue is discussed further in Section 2.4.
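Locality (2.4) can be phrased as a check on a layout tree in which the subtrees corresponding to SLUs are explicitly marked: outside any SLU, only the macroscopic operator may appear. The Python below is an illustrative sketch; the node classes and `respects_locality` are hypothetical names, and the Hankul syllable structures used as data follow the rules given in Section 2.3.1.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Glyph:
    name: str


@dataclass(frozen=True)
class SLU:
    body: "Node"  # a graphic unit corresponding to a small linguistic unit


@dataclass(frozen=True)
class Cat:
    op: str
    first: "Node"
    second: "Node"


Node = Union[Glyph, SLU, Cat]


def respects_locality(node: Node, macro_op: str, inside_slu: bool = False) -> bool:
    """True if every non-macroscopic operator occurs inside some SLU."""
    if isinstance(node, Glyph):
        return True
    if isinstance(node, SLU):
        return respects_locality(node.body, macro_op, inside_slu=True)
    if not inside_slu and node.op != macro_op:
        return False
    return (respects_locality(node.first, macro_op, inside_slu)
            and respects_locality(node.second, macro_op, inside_slu))


mos = Cat("↓", Cat("↓", Glyph("ㅁ"), Glyph("ㅗ")), Glyph("ㅅ"))
cal = Cat("↓", Cat("→", Glyph("ㅈ"), Glyph("ㅏ")), Glyph("ㄹ"))
print(respects_locality(Cat("→", SLU(mos), SLU(cal)), "→"))  # True
print(respects_locality(Cat("→", mos, cal), "→"))            # False
```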
In the remaining sections of this chapter, we turn to an application of the formalism and theory developed here to various phenomena found in writing systems.
2.3 Planar Arrangements: Examples
In this section we discuss four writing systems (Korean Hankul, Devanagari, Pahawh Hmong and Chinese) that make substantial use of more than one planar catenation operator. In each of these cases the SLU is the syllable, though in Devanagari the relevant notion of 'syllable' is orthographically rather than phonologically defined. In the final subsection we discuss an apparent counterexample to the claim of regularity from Ancient Egyptian.
2.3.1 Korean Hankul
The discussion of Korean Hankul here draws upon the description presented by King (1996) (see also Sampson, 1985). The following summarizes the facts discussed in detail by King. The letters of Hankul are arranged into 'syllable-sized' glyphs.4 The syllable-sized glyphs are catenated with either left catenation or downwards catenation. Within the syllable-sized glyphs, however, both left and downwards catenation are used in ways that are predictable given the particular segments being combined. Vowel and diphthong glyphs are classified into two classes, VERTICAL and HORIZONTAL; examples of each of these will be given momentarily. All orthographic syllables in Hankul must have onsets: if the corresponding phonological syllable lacks an onset, then a 'placeholder' glyph ㅇ is used to represent the empty onset.5 That is: γ(empty onset) = ㅇ.
4 Note that by 'syllable' here we mean syllable at a morphophonemic (rather than surface phonemic) level of representation: in many cases, units that are represented orthographically as syllables do not represent single syllables in the surface phonology. The ORL in Modern Korean orthography would appear to be fairly deep. King observes (page 223) that:
Hankul orthography drifts from a more or less consistently phonemic approach in the fifteenth century, to an increasingly morphophonemic one by the twentieth century.
Sampson (1985, pages 135ff.) gives a detailed discussion of this issue.
5 In coda position, this symbol represents /ŋ/, which does not occur in syllable-initial position.
As examples of the construction of Hankul orthographic syllables, consider the glyphs 못 ⟨mos⟩ and 잘 ⟨cal⟩. The first is constructed out of three components arranged in vertical descending order as follows:
• ㅁ ⟨m⟩
• ㅗ ⟨o⟩
• ㅅ ⟨s⟩
For 잘 ⟨cal⟩, the component glyphs are as follows, with the first two arranged horizontally with respect to each other, but above the third glyph:
• ㅈ ⟨c⟩
• ㅏ ⟨a⟩
• ㄹ ⟨l⟩
The glyph ㅗ ⟨o⟩ belongs to the horizontal class whereas ㅏ ⟨a⟩ belongs to the vertical class (note the largely horizontal orientation of ㅗ ⟨o⟩ as opposed to ㅏ ⟨a⟩), and this in turn correlates with the fact that ㅗ ⟨o⟩ is arranged vertically with respect to the preceding onset, whereas ㅏ ⟨a⟩ is arranged horizontally.
The macroscopic arrangement of Hankul syllable glyphs is traditionally top-to-bottom, though left-to-right arrangement is becoming much more common. Deviations from this macroscopic order only occur within the syllable, prompting the following statement for Hankul:
(2.5) The SLU is the syllable.
The full set of rules for the arrangement of glyphs in Hankul is as follows; see (Sampson, 1985, page 132) and (King, 1996, page 222). In this version of the rules, we assume that syllable-sized units are arranged in a left-to-right fashion:
• For syllables σ₁ and σ₂, γ(σ₁·σ₂) = γ(σ₁) → γ(σ₂).
• For onset-nucleus cluster ON and coda C, γ(ON·C) = γ(ON) ↓ γ(C).
• If coda C is complex, consisting of (maximally) two consonants C₁ and C₂, then γ(C) = γ(C₁·C₂) = γ(C₁) → γ(C₂).
• For onset O and nucleus N:
– if N belongs to the VERTICAL class then γ(O·N) = γ(O) → γ(N);
– else γ(O·N) = γ(O) ↓ γ(N).
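The rules above can be sketched as a function from an (onset, nucleus, coda) triple of Hangul compatibility jamo to a bracketed planar formula. This Python sketch is illustrative only: `syllable` and `layout_text` are hypothetical names, the vowel classes list only simple vowels (an assumption; compound vowels such as ㅘ combine both orientations and are left out), and a complex coda is passed as a two-character string.

```python
from typing import List, Tuple

# Partial, illustrative vowel classes (compound vowels are not covered).
VERTICAL = {"ㅏ", "ㅐ", "ㅑ", "ㅓ", "ㅔ", "ㅕ", "ㅣ"}
HORIZONTAL = {"ㅗ", "ㅛ", "ㅜ", "ㅠ", "ㅡ"}
PLACEHOLDER = "ㅇ"  # written for an empty onset


def syllable(onset: str, nucleus: str, coda: str = "") -> str:
    """Bracketed planar formula for one Hankul syllable block."""
    o = onset or PLACEHOLDER
    if nucleus in VERTICAL:
        on = f"[{o} → {nucleus}]"      # γ(O) → γ(N)
    elif nucleus in HORIZONTAL:
        on = f"[{o} ↓ {nucleus}]"      # γ(O) ↓ γ(N)
    else:
        raise ValueError(f"unclassified nucleus {nucleus!r}")
    if not coda:
        return on
    if len(coda) == 1:
        c = coda
    elif len(coda) == 2:               # complex coda: γ(C1) → γ(C2)
        c = f"[{coda[0]} → {coda[1]}]"
    else:
        raise ValueError("codas have at most two consonants")
    return f"[{on} ↓ {c}]"             # γ(ON) ↓ γ(C)


def layout_text(syllables: List[Tuple[str, ...]]) -> str:
    """Syllable blocks joined by left catenation (left-to-right layout)."""
    return " → ".join(syllable(*s) for s in syllables)


print(syllable("ㅁ", "ㅗ", "ㅅ"))  # [[ㅁ ↓ ㅗ] ↓ ㅅ]   (못 mos)
print(syllable("ㅈ", "ㅏ", "ㄹ"))  # [[ㅈ → ㅏ] ↓ ㄹ]   (잘 cal)
```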
Figure 2.6: The syllables 못 ⟨mos⟩ /mos/ 'cannot' and 잘 ⟨cal⟩ /cal/ 'well' in Hankul. (Tree diagrams: ⟨mos⟩ = [⟨m⟩ ↓ ⟨o⟩] ↓ ⟨s⟩; ⟨cal⟩ = [⟨c⟩ → ⟨a⟩] ↓ ⟨l⟩.)
The principles of Hankul graphic syllable construction are illustrated in Figure 2.6, again using the syllables /mos/ and /cal/. Recall that ⟨o⟩ belongs to the horizontal class, whereas ⟨a⟩ belongs to the vertical class. Figure 2.6 makes use of brackets to indicate grouping. Thus, in ⟨cal⟩, ⟨c⟩ is catenated to the left of ⟨a⟩, and then this whole group is catenated on top of ⟨l⟩: the alternative bracketing would result in a different arrangement of symbols. However, in Hankul it is possible to dispense with brackets in favor of operator precedence, as discussed in Section 2.1: giving left catenation higher precedence than downwards catenation (→ > ↓) yields the desired result.
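The precedence alternative can be sketched as a small operator-precedence (shunting-yard style) parser that turns an unbracketed formula into a bracketed one, with → binding tighter than ↓ and both operators grouping to the left. The Python below is illustrative; `PRECEDENCE` and `parse` are hypothetical names, and tokens must be separated by spaces.

```python
from typing import List

PRECEDENCE = {"→": 2, "↓": 1}  # higher binds tighter: → > ↓


def parse(formula: str) -> str:
    """Insert brackets into a flat formula according to PRECEDENCE (left-associative)."""
    operands: List[str] = []
    operators: List[str] = []

    def reduce_top() -> None:
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append(f"[{left} {op} {right}]")

    for tok in formula.split():
        if tok in PRECEDENCE:
            while operators and PRECEDENCE[operators[-1]] >= PRECEDENCE[tok]:
                reduce_top()
            operators.append(tok)
        else:
            operands.append(tok)
    while operators:
        reduce_top()
    return operands[0]


print(parse("ㅈ → ㅏ ↓ ㄹ"))                 # [[ㅈ → ㅏ] ↓ ㄹ]
print(parse("γ(α) → γ(β) ↓ γ(ζ) → γ(δ)"))  # [[γ(α) → γ(β)] ↓ [γ(ζ) → γ(δ)]]
```

The second example yields the Figure 2.1 grouping; no assignment of precedences can make the same flat string yield Figure 2.2. Because syllables are themselves joined with →, a sketch like this should be applied one syllable at a time, which is where Locality places the within-syllable operators.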
2.3.2 Devanagari
The Devanagari script is a modern Indian script derived originally from the Brahmi script (Bright, 1996). It is used to represent Hindi, Nepali and Marathi, as well as a variety of local languages of North India; it is also the usual script used in representing Sanskrit. In the present discussion, we will assume the use of the Devanagari script as a writing system for Hindi, though everything that we will discuss carries over, mutatis mutandis, to the script's use for other languages.
Bright describes Devanagari as an alphasyllabary, meaning that the system is basically alphabetic, but that the symbols are arranged in syllable-sized units. As we shall see momentarily, the relevant syllables for the orthography are not isomorphic to phonological syllables.
The basic features of the script that will concern us here are the following:
• (Phonological) syllable-initial vowels are represented as full symbols, but when combined with a preceding consonant they take diacritic forms that appear above, below, before or after the consonant in question. The vowel /ə/ has no diacritic form, so that a consonant without a vowel mark has an inherent schwa. The forms of the independent and diacritic vowels (with onset consonant क ⟨k⟩) are given in Table 2.2.
• Consonant clusters are represented by ligatured groups that behave as units for the purposes of vowel placement. Thus स ⟨s⟩ + क ⟨k⟩ yields स्क ⟨ska⟩; क ⟨k⟩ + स ⟨s⟩ + म ⟨m⟩ yields क्स्म ⟨ksma⟩.6 A sequence /skI/ is represented स्कि ⟨I+sk⟩, with the ⟨I⟩ occurring in the position before the cluster. The ligatured-unit-plus-vowel combination forms an orthographic syllable.
• A preconsonantal initial /r/ in an orthographic syllable is represented by a superscript symbol occurring at the end of the orthographic syllable. Thus /vərma/ is represented as वर्मा ⟨vəma+r⟩.
6 The rules for ligature formation are somewhat complex and will not concern us here.
Expression   Full form   Diacritic form
Null         अ ⟨ə⟩        क ⟨kə⟩
After        आ ⟨a⟩        का ⟨ka⟩
             ओ ⟨o⟩        को ⟨ko⟩
             औ ⟨ɔ⟩        कौ ⟨kɔ⟩
             ई ⟨i⟩        की ⟨ki⟩
Above        ए ⟨e⟩        के ⟨ke⟩
             ऐ ⟨ɛ⟩        कै ⟨kɛ⟩
Below        उ ⟨U⟩        कु ⟨kU⟩
             ऊ ⟨u⟩        कू ⟨ku⟩
             ऋ ⟨ri⟩       कृ ⟨kri⟩
Before       इ ⟨I⟩        कि ⟨kI⟩
Table 2.2: Full and diacritic forms for Devanagari vowels, classified by the position of expression of the diacritic forms. Thus 'after' means that the diacritic occurs after the consonant cluster, 'below', below it, and so forth.
An algorithm and a set of mapping rules for Devanagari that handle these facts follow:
• Divide the phonological string into orthographic syllables by placing a syllable boundary:
– At the beginning of the word;
– Between each pair of adjacent vowels;
– Before the first consonant of a cluster.
Thus the SLU in Devanagari is the orthographic syllable.
• Assume a function Lig, which forms the ligatured form of a sequence of consonant glyphs. Then γ(C₁C₂…Cₙ) = Lig(γ(C₁)γ(C₂)…γ(Cₙ)).
• Let Fullᵢ be the full vowel glyph for vowel Vᵢ; then γ(Vᵢ) = Fullᵢ when Vᵢ immediately follows a syllable boundary.
• For consonant cluster C and vowel V, γ(C·V) is:
– γ(C) if V = /ə/;
– γ(C) → γ(V) if V ∈ /a, o, ɔ, i/;
– γ(C) ↑ γ(V) if V ∈ /e, ɛ/;
– γ(C) ↓ γ(V) if V ∈ /u, U, ri/;
– γ(C) ← γ(V) if V = /I/.
• For an orthographic syllable starting with /r/ and remainder CV with non-null consonant C, γ(/r/·CV) = γ(/r/) ↓ γ(CV).
The properties of Devanagari that we have just analyzed are common among other Indian and Indian-derived scripts. Indeed, compared with those of some other scripts, Devanagari diacritic vowels are relatively simple: Thai, for example (Diller, 1996), has vowel symbols that occur not only above, below, before and after the consonant symbol, but also surrounding it.
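The Devanagari procedure can be sketched as two functions: one that inserts the orthographic syllable boundaries, and one that builds a planar formula for each syllable. This is an illustrative Python sketch, not the author's implementation: segments follow the romanization of Table 2.2, `Lig(...)` and `Full(...)` are hypothetical string stand-ins for the ligature function and the full vowel glyphs, and syllables that end in a consonant (no following vowel) are deliberately left unhandled.

```python
from typing import List

VOWELS = {"ə", "a", "o", "ɔ", "i", "e", "ɛ", "u", "U", "ri", "I"}
# Operator joining γ(C) and γ(V); None means no vowel sign (inherent schwa).
PLACEMENT = {"ə": None, "a": "→", "o": "→", "ɔ": "→", "i": "→",
             "e": "↑", "ɛ": "↑", "u": "↓", "U": "↓", "ri": "↓", "I": "←"}


def syllabify(segments: List[str]) -> List[List[str]]:
    """Boundaries at word start, between adjacent vowels, and before a cluster's first consonant."""
    syllables: List[List[str]] = []
    for i, seg in enumerate(segments):
        prev = segments[i - 1] if i > 0 else None
        boundary = (
            i == 0
            or (seg in VOWELS and prev in VOWELS)
            or (seg not in VOWELS and prev in VOWELS)
        )
        if boundary:
            syllables.append([])
        syllables[-1].append(seg)
    return syllables


def layout(syllable: List[str]) -> str:
    """Planar formula for one orthographic syllable (consonants plus a final vowel)."""
    *cons, vowel = syllable
    if vowel not in VOWELS:
        raise ValueError("syllables without a final vowel are not handled here")
    if not cons:
        return f"Full({vowel})"                  # syllable-initial vowel: full glyph
    if cons[0] == "r" and len(cons) > 1:         # γ(/r/·CV) = γ(/r/) ↓ γ(CV)
        return f"[r ↓ {layout(syllable[1:])}]"
    cluster = cons[0] if len(cons) == 1 else f"Lig({' '.join(cons)})"
    op = PLACEMENT[vowel]
    return cluster if op is None else f"[{cluster} {op} {vowel}]"


print(syllabify(["v", "ə", "r", "m", "a"]))  # [['v', 'ə'], ['r', 'm', 'a']]
print(layout(["s", "k", "I"]))               # [Lig(s k) ← I]
```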
2.3.3 Pahawh Hmong
The Pahawh Hmong messianic script, invented in 1959 by Shong Lue Yang, a Hmong peasant, is described at length in a fascinating study by Smalley, Vang and Yang (1990); a more concise description can be found in (Ratliff, 1996). There were actually four stages of the script, which evolved as Shong Lue Yang refined his original design: we will be concerned with the Third Stage, which is the version that received the widest acceptance and use. There are two sets of glyphs in Pahawh, the first representing onset consonant clusters, and the second the rime, i.e. the vowel plus the lexical tone.7 Pahawh is thus sometimes described as a demisyllabic system, though this is really a misnomer. The writing runs from left to right, with spaces separating syllables, making the syllable-sized chunks quite easy to identify. What is notable about Pahawh is that the glyph representing the rime is systematically written to the left of the glyph representing the onset, contravening the overall left-to-right order of the script. For example, in Pahawh Hmong writing each syllable of Shong Lue's name is spelled rime glyph + onset glyph, the onset glyphs being ⟨s⟩ and ⟨l⟩, respectively. Clearly, as others have noted, one can view this as a generalization of the property of many Indian and Indian-derived Southeast Asian scripts to allow some vowel glyphs to precede consonant clusters that, on the basis of phonological ordering, they logically follow; we saw an example of this with Devanagari in the previous section. However, Pahawh is the only known writing system that consistently has this reversal.8
7 In the final (fourth) stage, the vowel and tone symbols had become completely separate, and even in the Third Stage there is a partial separation, with some tonal information being represented by diacritic symbols written over the vowel symbol. We will not be concerned with the representation of tone here, and for the purposes of this discussion, we will consider the vowel-plus-tone combination as a single unit.
While the origin of this unique feature of Pahawh is mysterious, its implementation within the current framework is simple. As for Hankul, the SLU is the syllable:
(2.6) The SLU is the syllable.
The rules describing Hmong glyph arrangements are as follows:
• For syllables σ₁ and σ₂, γ(σ₁·σ₂) = γ(σ₁) → γ(σ₂).
• For onset O and nucleus N, γ(O·N) = γ(O) ← γ(N).
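The two Pahawh rules can be sketched as follows. This Python illustration uses hypothetical glyph names (RIME_1 and RIME_2 stand in for rime glyphs, which are not reproduced here); `formula` builds the planar expression and `visual_order` reads off the resulting left-to-right order on the page.

```python
from typing import List, Tuple

Syllable = Tuple[str, str]  # (onset glyph, rime glyph), in phonological order


def formula(syllables: List[Syllable]) -> str:
    """γ(σ1·σ2·...) = γ(σ1) → γ(σ2) → ..., with γ(O·N) = γ(O) ← γ(N)."""
    return " → ".join(f"[{onset} ← {rime}]" for onset, rime in syllables)


def visual_order(syllables: List[Syllable]) -> List[str]:
    """Glyphs in left-to-right page order: O ← N puts the rime left of its onset."""
    order: List[str] = []
    for onset, rime in syllables:
        order.extend([rime, onset])
    return order


name = [("s", "RIME_1"), ("l", "RIME_2")]  # hypothetical rime glyph names
print(formula(name))       # [s ← RIME_1] → [l ← RIME_2]
print(visual_order(name))  # ['RIME_1', 's', 'RIME_2', 'l']
```

Only the within-syllable operator departs from the macroscopic →, as Locality requires.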
2.3.4 Chinese
As will be recalled from Section 1.2.2 (see also Section 4.2), Chinese is a partly logographic writing system in which most individual characters are made up of a component that gives some information about the pronunciation (the 'phonetic' component) and another component (the 'semantic' component) that gives clues to the meaning. These two components can be arranged in a number of ways relative to each other, as we discussed briefly in the introduction to this chapter. From the annotation-graph representation of a Chinese morpheme such as that in (1.10), repeated below as (2.7), we have assumed that the semantic information associated with that morpheme overlaps, but does not dominate, the phonological information:
(2.7)
SEM: cicada: 虫
TONE: 2
SYL: σ: 單
ONS-RIME: ch an
Given Axiom 1.3 from Section 1.2.2, it follows that the image of the semantic portion under M_ORL→Γ must catenate with the image of the phonological portion under M_ORL→Γ. As in the three writing systems we have just discussed, the SLU in Chinese is the syllable, and (with a single exception that need not concern us here) syllables are in turn implemented using single characters. In principle, therefore, the catenation operator chosen to implement the within-character combination of the semantic and phonetic elements can differ from the macroscopic catenation, whether that be the traditional downwards catenation or the more modern left-to-right catenation. In fact, the particular catenation operator chosen depends upon a relatively complex set of rules and lexical specifications. This section presents a preliminary analysis of the internal structure of Chinese characters in terms of the present planar grammar formalism.
8 Furthermore, since Shong Lue Yang was illiterate when he first began to create the script, it is hard to see how he could have known about this tendency to reverse the logical order in the region's writing systems.
There has been a long history of structural analysis of Chinese characters, starting with the Shuo Wen Jie Zi (c. AD 100); see (Wieger, 1965) for a brief history. One important point in the history of Chinese character studies is the compendious dictionary compiled during the reign of the Kang-Xi emperor (r. 1661–1722): the modern classification of characters according to semantic radicals largely follows the usage in that dictionary. In more modern times, there have been various generative analyses. One such study was that of Fujimura and Kagaya (1969), who constructed a program that was capable of generating (and outputting on an oscilloscope) not only real Chinese characters, but also possible characters, that is, characters that are non-existent but obey the structural constraints of Chinese characters. Wang (1983) presented a generative-grammar-based model of Chinese character structure that predicted the relative placement of semantic 'classifiers' and phonetic 'specifiers' within characters, and also provided a model of the actual writing of the characters, with special attention being paid to the analysis of the stroke order; we will discuss Wang's analysis in more depth momentarily.
An interesting study by Myers (1996) argues for the relevance of prosodic headedness in Chinese character construction. By assuming that the structural head of a character is (depending upon the overall composition of the character) on the bottom, the right or the bottom right, Myers is able to explain several robust features of Chinese characters: for instance, the largest component or stroke tends to be on the bottom or on the right, i.e., in the head position; the leftmost stroke in a character with a significant amount of structure to its right is curved, i.e., a non-head vertical stroke is curved; and there is a strong preference for semantic components to occur on the left or on the top, i.e., in a non-head position. This latter point is completely in accord with our observations, reported below. Finally, Myers notes a tendency for reduced forms of radicals (discussed below) to occur on the top or the left, i.e., in non-head position.
Finally, there is a website, www.zhongwen.com (Harbaugh, 1998), which produces structural analyses for selected characters. One of the points nicely brought out by this website is that Chinese characters are tree-structured objects, and complex characters can be analyzed into many levels. Thus in the character 楓 ⟨TREE+FENG⟩ feng 'maple', which is analyzed at the level we are interested in as 木 ⟨TREE⟩ → 風 ⟨FENG⟩, we can further break down the right-hand component into the components 凡 and 虫. Thus an exhaustive analysis would be 木 → [凡 ↓ 虫]; examples more complex than this are not hard to find (cf. the example 鱗 lin 'fish scale', introduced earlier). In the present discussion we will only be concerned with the top level, namely the combination of semantic with phonetic, or semantic with semantic, in characters that have such analyses.
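The tree view connects directly to the bracketing-depth bound proposed in Section 2.1: the brackets needed to write an analysis grow only when a daughter is built with a different operator from its mother. The Python sketch below computes that depth; `bracket_depth` and the tuple encoding (operator, first, second) are hypothetical choices made for illustration, and the examples use the abstract layouts of Figures 2.1 and 2.2 (the latter has the same shape as the analysis of 鱗 lin 'fish scale').

```python
from typing import Tuple, Union

Tree = Union[str, Tuple[str, "Tree", "Tree"]]  # leaf name, or (operator, first, second)


def bracket_depth(tree: Tree) -> int:
    """Bracket nesting needed to write `tree` without operator precedence.

    A daughter joined by the same operator as its mother needs no brackets
    (within-operator catenation is associative); a daughter built with a
    different operator must be bracketed, adding one level.
    """
    if isinstance(tree, str):
        return 0
    op, first, second = tree

    def daughter(d: Tree) -> int:
        extra = 0 if isinstance(d, str) or d[0] == op else 1
        return bracket_depth(d) + extra

    return max(daughter(first), daughter(second))


FIG_2_1 = ("↓", ("→", "γ(α)", "γ(β)"), ("→", "γ(ζ)", "γ(δ)"))
FIG_2_2 = ("→", "γ(α)", ("↓", "γ(β)", ("→", "γ(ζ)", "γ(δ)")))
print(bracket_depth(FIG_2_1), bracket_depth(FIG_2_2))  # 1 2
```

Under a bound of seven, any analysis whose computed depth is at most seven stays within the regular fragment.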
We return now to the study by Wang (1983). The analysis of semantic-phonetic component placement in Wang's study is feature-based, using the features [high], [low], [left] and [right]. A semantic component such as 門 ⟨DOOR⟩, which typically takes its phonetic component inside (e.g. 閨 ⟨DOOR+GUI⟩ gui 'door to women's apartments'), is analyzed as [+high, +left, +right]. Similarly 力 ⟨FORCE⟩, which typically occurs on the right (勁 ⟨FORCE+JING⟩ jin 'strength'), is analyzed as [+right]. Finally 囗 ⟨SURROUND⟩, which completely surrounds its phonetic (國 ⟨SURROUND+HUO⟩ guo 'country'), is analyzed as [+high, +low, +right, +left]. Phonetic components may also have specifications: 壯 zhuang, as in 裝 ⟨CLOTHING+ZHUANG⟩ zhuang 'pack, contain', is [+high, −low].
Placement   Characters
left         8,303
top          1,964
bottom       1,246
right          882
surround       159
other          174
Table 2.3: Distribution of placement of semantic component (Kang-Xi radical) among 12,728 characters from the Taiwan Big5 character set.
Wang assumes a series of rules that use the feature specifications to determine the actual placement of the components:
• If neither component is specified, a default semantic-left, phonetic-right placement is used;
• If one component is specified, the opposite features are filled in for the other component;
• If both components are specified, and if there is a conflict, the semantic component wins.
There are various interesting aspects to Wang's analysis, and on the whole it seems to be on the right track. It is certainly true, for example, that the semantic-left/phonetic-right placement is in some sense the default. This is shown in Table 2.3, which gives the frequencies, for 12,728 characters from the Taiwan Big5 character set, of various semantic radical placements.9 One problem with Wang's analysis, though, is that it is too powerful. In particular, the featural system he develops would predict that one might have a component that is specified [−left, −right, −high, −low], and would thus select to be in the middle of whatever it combines with. However, the only cases where such placement occurs are fossilized forms. Thus 東 dong 'east' is traditionally analyzed as being composed of 日 ri 'sun, day' placed in the middle of 木 mu 'tree, wood', but 日 ri does not in fact select for this position: there is no productive character-formation process that can account for 東 dong. This is why the surrounding catenation operator ⊙ does not have a dual: the notion of 'inside catenation' does not appear to be necessary.
It also seems that the difference between 門 ⟨DOOR⟩ as [+high, +left, +right], versus 囗 ⟨SURROUND⟩ as [+high, +low, +left, +right], is rather redundant: presumably the fact that the latter completely surrounds its sister component, whereas the former only partially does so, follows from the shapes of the two components. To handle both cases, it ought to be sufficient to say that they surround their sister.
9 Here, I took at face value the traditional 214 Kang-Xi radicals, assuming that these are in fact the correct semantic components in all cases. Occasionally this assumption can be misguided.
Placement   Characters
left         1,745
top            313
bottom         313
right          166
surround        51
Table 2.4: Distribution of placement of semantic component (Kang-Xi radical) among 2,596 characters from the Taiwan Big5 character set.
We therefore dispense with the featural approach, and present instead a preliminary analysis based on the catenation operators introduced in this chapter. Our analysis is based upon 2,596 characters from the Taiwan Big5 character set for which we know the breakdown into semantic and phonetic/semantic components.10 As we see in Table 2.4, the relative magnitudes of the different placements of the components are roughly the same as for the fuller Big5 character set (Table 2.3), so this smaller sample may be taken as representative.11
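The claim that the two distributions are roughly the same can be checked by converting the counts of Tables 2.3 and 2.4 to proportions. The Python below is a small sketch using only the published counts; it normalizes each table over the five shared categories (dropping 'other' from Table 2.3, which has no counterpart in Table 2.4) and by the sum of the listed counts, which for Table 2.4 is 2,588. The function name `shares` is hypothetical.

```python
from typing import Dict, List

TABLE_2_3 = {"left": 8303, "top": 1964, "bottom": 1246, "right": 882, "surround": 159, "other": 174}
TABLE_2_4 = {"left": 1745, "top": 313, "bottom": 313, "right": 166, "surround": 51}
FIVE: List[str] = ["left", "top", "bottom", "right", "surround"]


def shares(counts: Dict[str, int], categories: List[str]) -> Dict[str, float]:
    """Proportion of each category, normalized over the given categories only."""
    total = sum(counts[c] for c in categories)
    return {c: counts[c] / total for c in categories}


full = shares(TABLE_2_3, FIVE)
sample = shares(TABLE_2_4, FIVE)
for c in FIVE:
    print(f"{c:<9}{full[c]:7.1%}{sample[c]:7.1%}")
```

In both tables left placement of the semantic component accounts for about two thirds of the characters, and surround placement is the rarest.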
I now turn to a description of the rules developed to handle these characters. In the following discussion, I dispense with the normal glosses, except where critical, so as not to overly clutter the text. Note that some semantic radicals are marked as full or red (reduced): in these cases there are alternative forms, 'full' and 'reduced', for the radical, and these two forms behave differently. Thus 心 xīn 'heart' has two forms as the radical ⟨HEART⟩, one being more or less the same shape as the full character (placed underneath the second component), and the other a reduced three-stroke component, as in the lefthand portion of 忙 máng 'busy' (placed to the left of the second component). We assume for the present that it is part of the lexical orthographic specification of a morpheme whether a particular character has the full or reduced variant, though to some extent one can predict this from the position of the radical in the character (Myers, 1996). Which of the two options, 'full' or 'red', is marked depends upon which one is reasonably regarded as the default; only the non-default is marked in the glosses. Finally, as indicated in the descriptions below, we assume that one of the components, usually the semantic radical but sometimes the phonetic radical, is the 'determining component': it is the one that will be listed first in the formulae, so that an expression such as A ↓ B, with A the determining component, will unambiguously mean that A occurs above B:
(2.8) (a) The following, as phonetic components, take precedence, i.e., determine the placement of the components in the character: [characters not recoverable]. In other cases the semantic component determines the placement. As noted above, the 'determining component' occurs to the left of the catenation operator in the formulae.
10 These breakdowns were taken from the raw data used in the www.zhongwen.com (Harbaugh, 1998) website. I am grateful to Rick Harbaugh for making these data available to me.
11 For the purposes of the present discussion, we eliminated characters, such as 東 dōng 'east', which do not belong to one of the five categories in Table 2.4. Invariably such characters are old constructions that are not built out of their supposed components by any productive process.
2.3. PLANAR ARRANGEMENTS: EXAMPLES 51
(b) A ↓ B, if any of the following is the determining component A: [characters not recoverable].
(c) A ← B, if any of the following is the determining component A: [characters not recoverable].
(d) A ↑ B, if any of the following is the determining component A: [characters not recoverable].12
(e) A surrounds B, if any of the following is the determining component A: [characters not recoverable], including ⟨SURROUND⟩.
(f) Otherwise A → B.
The current ruleset is able to analyze 88% of the 2,596 characters, leaving the remaining 12% to be lexically specified exceptions; examples of 196 of the analyzed, and 38 of the unanalyzed, characters are given in Appendix 2.A. Although further tuning of the rules could undoubtedly increase the coverage further, it must be borne in mind that some amount of lexical specification is always going to be necessary: for example, 目 ⟨EYE⟩ and 亡 ⟨MANG⟩ can combine in two ways, yielding the characters 盲 máng 'blind' and a second máng, a variant spelling of 盲.
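The decision procedure in (2.8) can be sketched as an ordered lookup: first decide which component is the determining one, then pick the operator from the first rule whose list contains it, falling back to left-to-right catenation. This is a minimal sketch, not the author's implementation. The component names (`PHON_A`, `TOP_1` and so on) are hypothetical stand-ins, since the actual lists of Chinese components in (2.8) did not survive in this copy.

```python
from typing import NamedTuple


class Analysis(NamedTuple):
    determining: str  # component written to the left of the operator
    other: str
    operator: str


# Hypothetical stand-ins for the component lists of (2.8).
PRECEDENCE_PHONETICS = {"PHON_A"}     # (2.8a): phonetics that determine placement
RULES = [                             # (2.8b)-(2.8e), tried in order
    ({"TOP_1", "PHON_A"}, "↓"),      # determining component on top
    ({"RIGHT_1"}, "←"),              # determining component on the right
    ({"BOTTOM_1"}, "↑"),             # determining component at the bottom
    ({"SURROUND"}, "surround"),      # determining component surrounds the other
]
DEFAULT = "→"                         # (2.8f): determining component on the left


def analyze(semantic: str, phonetic: str) -> Analysis:
    # (2.8a): listed phonetics take precedence; otherwise the semantic component determines.
    if phonetic in PRECEDENCE_PHONETICS:
        determining, other = phonetic, semantic
    else:
        determining, other = semantic, phonetic
    for components, operator in RULES:
        if determining in components:
            return Analysis(determining, other, operator)
    return Analysis(determining, other, DEFAULT)


print(analyze("SEM_X", "PHON_A"))  # the listed phonetic determines placement
```

Characters that such rules analyze wrongly (the 12% above) would then be listed individually as lexical exceptions consulted before `analyze`.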
One interesting observation about these data is that if one considers only the characters for which the phonetic component is a perfect predictor of the pronunciation (including tone) of the whole character (624 characters in this set), the accuracy of the ruleset presented increases to 92%. What this means is that 'regularly pronounced' characters are also more regular in structure. This result is perhaps not surprising: it parallels commonly observed patterns in morphology, where semantically transparent and morphologically productive constructions also tend to be the ones that are more phonologically regular. Presumably this increase in regularity for phonologically regular compounds is useful for the Chinese reader. As we discuss elsewhere (Section 5.2), there is psycholinguistic evidence that Chinese readers make use of the phonetic component of characters when it is useful; in order to be able to reliably locate and identify the phonetic component, it helps if the placement of this component is more regular, and less subject to idiosyncratic lexical specifications. In Appendix 2.A, regularly pronounced characters are marked with a special symbol.
One additional topic that Wang (1983) discusses, and which we need to provide an account for, is what he terms classifier raising: in some cases the semantic classifier of the phonetic component is 'raised' to become the semantic classifier of the entire character. This is a relatively rare phenomenon, restricted in part to spelling variants of certain characters. However, there are a couple of phonetic components that seem to undergo this process as a matter of course. One such component is the top portion of 螣 ⟨INSECT+TENG⟩ téng 'serpent', 縢 ⟨SILK+TENG⟩ téng 'bind' or 騰 ⟨HORSE+TENG⟩ téng 'mount'. The phonetic component in each case consists of the portion above the semantic radical, plus the lefthand component 月, which is at least structurally the phonetic component's own semantic radical. This 月 has been raised to become the semantic radical of the whole character. In order to account for this in the current formalism, we need to assume a rule such as the following, which reassociates the components of characters containing this phonetic:13
12 I.e., doubled ⟨INSECT⟩.
(2.9) [rule not recoverable]
2.3.5 A counterexample from Ancient Egyptian
The hieroglyphic writing system of Ancient Egyptian (Gardiner, 1982; Ritner, 1996) presents one case that is problematic for Regularity. In Old Kingdom Egyptian, plural number was indicated in the orthography by the double copying of the base word, or a portion thereof. The copying could be implemented in various ways, including duplicating the entire phonographically written word as in (2.10), duplicating the entire logograph as in (2.11), or duplicating only the semantic classifier of the word as in (2.12):
(2.10) r n    rnw 'names'
(2.11) ntr    ntrw 'gods'
(2.12) n h w t TREE    nhwt 'trees'
The problem lies in the fact that the Egyptian plural was morphologically marked by suffixation, -w with masculine nouns and -wt with feminine nouns.14 This suffix -wt is actually spelled out in the orthography in the representation in (2.12), though it was not in general required to spell out the plural suffix. Since the orthographic duplication does not represent any linguistic duplication, it must be part of the mapping between linguistic representation and orthography, or in other words it must be handled by M_ORL→Γ. Since regular relations are not powerful enough to handle arbitrary copying, this orthographic practice stands as a counterexample to the claim of Regularity.
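The contrast between copying an arbitrary base and listing copies of bounded bases can be made concrete. In the sketch below Latin letters stand in for hieroglyphic signs (an illustrative assumption), and the default of three copies follows the tripled spellings discussed here. The unbounded relation pairing a base w with www is the one regular relations cannot express; any finite listing of pairs is trivially regular, but the listing only covers bases up to a fixed length and grows exponentially with that length. The sketch illustrates the point; it does not prove non-regularity.

```python
from itertools import product


def plural_by_copying(base: str, copies: int = 3) -> str:
    """Old Kingdom-style plural spelling: the base written `copies` times."""
    return base * copies


def copy_table(alphabet: str, max_len: int, copies: int = 3) -> dict:
    """List the copying relation for every base of length 1..max_len.

    A finite list of pairs is trivially a regular relation, but it only
    covers bounded bases, and its size grows exponentially with max_len.
    """
    table = {}
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            base = "".join(letters)
            table[base] = plural_by_copying(base, copies)
    return table


for n in range(1, 5):
    print(n, len(copy_table("abc", n)))
```

With a three-sign alphabet the table already holds 3, 12, 39 and 120 entries for maximum base lengths 1 to 4.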
It is interesting to note that this copying was only used during the Old Kingdom (prior to 2240 BC); by the Middle Kingdom it had been replaced by a logographic symbol involving three strokes (Gardiner, 1982, page 59), apparently derived from the original repetition. An example is given in (2.13):
13 In this and similar cases, then, we actually need to delve slightly deeper than the top-level componential analysis that we have dealt with in this section, and at least recognize a second tier of structure.
14 The same point can be made for the orthographic representation of dual number, which was also represented by copying, in this case a single copy. As in the case of the plural, the dual was encoded morphologically via suffixation.
2.4. CROSS-WRITING-SYSTEM VARIATION IN THE SLU 53
(2.13) ntr Pl    ntrw 'gods'
One assumes that this device was introduced by the scribes as an abbreviatory aid to save them writing (or carving) three copies of a set of glyphs. But it is interesting that this abbreviatory device also rendered Egyptian orthography more Regular, in the formal sense defined here.
The most similar example I am aware of in a modern orthography is the encoding of plurality in Spanish initials by doubling the letters corresponding to the plural noun: thus Estados Unidos 'United States' is abbreviated as EE.UU., and Fuerzas Armadas 'armed forces' is FF.AA. But although this process is productive in Spanish, since it only involves doubling individual letters of the alphabet, it is feasible to simply list all doubled letters with an indication that these doubles are to be used to abbreviate plurals. (Alternatively one could write a rule that copies individual letters of the alphabet, and this rule could be represented as a transducer: note though that this would be computationally equivalent to simply listing the doubles.)
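Because the Spanish device only ever doubles a single letter, the finite list of doubles is enough. The sketch below builds that list for the 26 unaccented ASCII capitals and uses it to abbreviate a plural phrase. The function name is hypothetical, and the sketch makes the simplifying assumption that every word in the phrase is a content word (function words such as de would need to be skipped in real use).

```python
import string

# The finite list of doubles: one entry per letter of the (here, ASCII) alphabet.
DOUBLED = {letter: letter * 2 for letter in string.ascii_uppercase}


def abbreviate_plural(phrase: str) -> str:
    """Abbreviate a plural phrase by doubling each word's initial letter.

    Simplifying assumption: every word is a content word.
    """
    return "".join(DOUBLED[word[0].upper()] + "." for word in phrase.split())


print(abbreviate_plural("Estados Unidos"))   # EE.UU.
print(abbreviate_plural("Fuerzas Armadas"))  # FF.AA.
```

The table lookup is exactly the "listing" option in the text; nothing in it requires copying an unbounded string.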
Apart from the Spanish example, which is easily handled within the theory, I am aware of no other cases of such duplication in a modern writing system. It is tempting therefore to conclude that the Egyptian example was marked, as the theory predicts, and that the rarity of such cases is a consequence of this markedness.
2.4 Cross-Writing-System Variation in the SLU
All of the writing systems that we have examined in the previous section involve syllable-sized SLUs. This is no accident. Writing systems where basic glyphs are organized into syllable-sized units seem to be quite prevalent among the world's writing systems, as has often been noted in the literature: for instance, Faber (1992) assigns a node in her arboreal taxonomy of writing systems to what she terms segmentally coded, syllabically linear scripts, which include Hankul and Devanagari.
However, larger SLUs certainly seem to be possible. For example, Mayan writing appears to have had an SLU at the level of the word or small phrase. Mayan complex glyphs were typically arranged in paired columns that were to be read left to right, top to bottom (Macri, 1996). Thus, the reading order of the following example would be A B C D E F:
A B
C D
E F
The basic reading order is therefore left-to-right, with each line of text consisting of two glyphs. Within each glyph, the arrangement of basic glyphs was somewhat freer, with signs running 'loosely from upper left corner to lower right corner, with generous allowances for artistic convention' (Macri, 1996, page 178). The complex glyph is therefore clearly the graphic unit corresponding to the SLU, but what kind of linguistic unit is this? The single example presented in Macri's discussion (1996, page 178) suggests that in many cases the SLU is a single (potentially morphologically complex, usually polysyllabic) word, but in some cases it seems to be a small phrase. Table 2.5 lists each of the linguistic elements corresponding to a single complex glyph in Macri's sample text. Figure 2.7 illustrates the two-word glyph yax ch'ul 'green sacred'.
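The reading order of complex glyphs described above can be sketched as a small function over a grid of glyph blocks. The function name and the list-of-rows input are illustrative assumptions. The handling of texts wider than one pair of columns (finish a pair of columns before moving to the next pair on the right) follows the usual convention for Maya inscriptions rather than anything stated in this section.

```python
from typing import List


def paired_column_order(grid: List[List[str]]) -> List[str]:
    """Reading order for glyph blocks laid out in paired columns.

    `grid` is a list of rows, each with an even number of glyph labels.
    Each pair of columns is read left to right, top to bottom, before
    moving on to the next pair of columns to the right.
    """
    order = []
    n_cols = len(grid[0]) if grid else 0
    for left in range(0, n_cols, 2):
        for row in grid:
            order.extend(row[left:left + 2])
    return order


print(paired_column_order([["A", "B"], ["C", "D"], ["E", "F"]]))
# ['A', 'B', 'C', 'D', 'E', 'F']
```

Note that this macroscopic order says nothing about the freer arrangement of signs inside each block, which is where the SLU-internal catenation operators apply.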
element        gloss                size of element
y-ak'aw        he.presents          word
u-pi(s)        his-cycle            word
pixol          hat                  word
u-ha(l)        his-necklace         word
u-tup          his-earrings         word
yax            green                word
u-kawaw        his-helmet           word
ch'ok          young.one            word
Kawil          (a name)             word
y-ak'a-w       it.is.given          word
u-sak hunal    his-white headband   (adjective-noun) phrase
chan tun       sky stone            (noun-noun) phrase
hun winik      one twenty           phrase?
yax ch'ul      green sacred         (adjective-adjective) phrase?
Table 2.5: Linguistic units corresponding to Mayan complex glyphs, from (Macri, 1996); glossing conventions follow those of Macri. In some phrasal cases it is unclear whether the unit in question is really a constituent: such cases are indicated with a question mark.
Needless to say, a more thorough survey of the deciphered corpus of Mayan texts will be necessary to determine the maximal size of the SLU in that writing system, and what constraints, if any, existed on valid SLUs.15 This small text does, however, at least show that the SLU in Mayan is larger than a single syllable.
15 The data that would allow one to investigate this question already exist as part of the Maya Hieroglyphic Database Project at the University of California, Davis, Department of Native American Studies. I had hoped to be able to examine some of these data, and had on more than one occasion requested access to a portion of the database. Unfortunately, these requests have led nowhere. The resolution of this question will therefore have to wait until these potentially valuable data are made available to a wider range of scholars.
[Figure: complex glyph composed of the signs YAX, ch'u and li.]
Figure 2.7: A two-word Mayan glyph representing the adjective sequence yax ch'ul 'green sacred', after (Macri, 1996, page 179). Per convention, the capitalized gloss represents a logographic element, and the lowercase glosses phonographic elements.
Not only is the size of the SLU writing-system dependent, but it seems also to be construction dependent: one finds cases where the catenation operator is changed only in certain kinds of (local) constructions. One example of such construction-particular reordering is found in Ancient Egyptian. This is the example of 'honorific inversion' (Ritner, 1996, page 80), whereby terms for gods or kings would be written before terms that they logically follow.16 Thus the phrase mdw ntr (words god) 'god's words' would be written as in (2.14), with the logograph for 'god' being written before the duplicated (hence plural) phonograph for /md/:
(2.14) ntr md
This can be accounted for if we assume that the SLU can be a word or small phrase. In that case we can assume that there is simply reversal, within that SLU, of the overall catenation operator in use for the text. Thus if the text is being written from left to right, and therefore generally uses the catenation operator →, we can assume a special principle like that in (2.15) that in honorific contexts implements catenation as ← rather than →:
(2.15) γ, logically followed by η, is written γ ← η, if η is to be honored.
As Ritner notes, the Egyptian example is really no different from the situation in many modern writing systems with currency amounts. Thus consider English examples like ⟨$1000⟩ for one thousand dollars or, more significantly, ⟨$1 million⟩ for one million dollars. The 'symbolic abbreviation' (to coin a term) '$' for dollar(s) is written before the number phrase which it logically follows. (Note that the currency term that is 'moved' may itself be complex: one hundred US dollars can be written as ⟨US$100⟩.) This prompts a construction-particular statement similar to the Egyptian statement in (2.15):
(2.16) γ, logically followed by η, is written γ ← η, if:
• γ is a number phrase, and the written form of γ starts with a digit, and
• η is a currency term, and the written form of η is a symbolic abbreviation for that currency term.
16 Additional rearrangements of symbols for artistic reasons were also found (Ritner, 1996). Plausibly such cases are due to stylistic considerations and thus fall outside the range of the present theory.
The first constraint captures the fact that symbolic abbreviations for currencies must precede a number expressed as a digit: one does not find such expressions as *⟨$twenty five⟩. Indeed this in turn suggests that the true source of the reordering may actually be a surface orthographic constraint that requires the currency symbol to immediately precede a digit:
(2.17) If the written form of η is a symbolic abbreviation for a currency term η, then that written form must precede a digit.
This constraint then forces the catenation operator to be ←, as in (2.16). Note though, in any case, that the SLU must be defined for this construction to be the whole number-plus-currency phrase, in order to account for the deviation from the normal ordering.
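The construction-particular statement (2.16) and the surface constraint (2.17) can both be sketched in a few lines. This is an illustration under stated assumptions: the abbreviation table `SYMBOLS` is hypothetical and tiny, the number-plus-currency phrase is treated as a single SLU, and the check for (2.17) only looks at the '$' sign.

```python
import re

# Hypothetical table of symbolic abbreviations for currency terms.
SYMBOLS = {"dollars": "$", "US dollars": "US$"}


def spell_amount(number: str, currency: str) -> str:
    """Spell the logically ordered SLU `number currency`, following (2.16)."""
    symbol = SYMBOLS.get(currency)
    if symbol is not None and number[:1].isdigit():
        # Right-to-left catenation inside the SLU: the symbol is written
        # before the number phrase it logically follows.
        return symbol + number
    # Otherwise the default left-to-right catenation applies.
    return f"{number} {currency}"


def violates_2_17(text: str) -> bool:
    """True if some '$' is not immediately followed by a digit, as (2.17) requires."""
    return re.search(r"\$(?!\d)", text) is not None


print(spell_amount("1 million", "dollars"))    # $1 million
print(spell_amount("twenty five", "dollars"))  # twenty five dollars
```

Because the digit test is part of the condition, the sketch never produces the unattested *⟨$twenty five⟩, which is the effect the text attributes to (2.17).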
The examples discussed in this section and elsewhere in this chapter suggest that there is quite a range of variation in the definition of the SLU across writing systems, and even within different components of the same writing system. We will leave it as a topic for future research to provide a complete taxonomy of the possible instantiations of the SLU.
2.5 Macroscopic Catenation: Text Direction
...their manner of writing is very peculiar, being neither from the left to the right, like the Europeans; nor from the right to the left, like the Arabians; nor from up to down, like the Chinese; but aslant from one corner of the paper to the other...
Swift, Jonathan. 1726. Gulliver's Travels: A Voyage to Lilliput, chapter 6.
Many of the examples discussed in this chapter relate to the micro-arrangement of graphical symbols. Naturally, in addition to specifying how, for example, the glyphs arrange themselves into orthographic syllables, any writing system must also specify the overall direction of the script. As Harris (1995, chapter 19) usefully points out, the notion of direction is more complex than it first appears to be. In characterizing the default directionality of text in English, for instance, the following specifications need to be made:
• Each line of text is composed of glyphs arranged from left to right.
• Each page of text is composed of lines arranged from top to bottom.
• Each multipage document is (when correctly bound) composed of pages bound on the lefthand side.
These specifications are of course script-dependent. The traditional statements for Chinese run as follows:
2.5. MACROSCOPICCATENATION: TEXT DIRECTION 57
• Each line of text is composed of glyphs arranged from top to bottom.
• Each page of text is composed of lines arranged from right to left.
• Each multipage document is (when correctly bound) composed of pages bound on the righthand side.
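Harris's three specifications can be treated as three independent fields of a small record, and the first two are enough to lay out a page. The sketch below is illustrative: the class and function names are hypothetical, only the English and traditional Chinese combinations are handled, and the binding field is recorded but not used for rendering.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TextDirection:
    glyphs_in_line: str  # order of glyphs within a line (or column)
    lines_on_page: str   # order of lines (or columns) on a page
    binding_side: str    # side on which multipage documents are bound


ENGLISH = TextDirection("left-to-right", "top-to-bottom", "left")
TRADITIONAL_CHINESE = TextDirection("top-to-bottom", "right-to-left", "right")


def layout(lines: List[str], spec: TextDirection) -> List[str]:
    """Return the printed rows of a page holding `lines` under `spec`."""
    if (spec.glyphs_in_line, spec.lines_on_page) == ("left-to-right", "top-to-bottom"):
        return list(lines)
    if (spec.glyphs_in_line, spec.lines_on_page) == ("top-to-bottom", "right-to-left"):
        height = max(len(line) for line in lines)
        columns = [line.ljust(height) for line in lines]
        # The first column is placed rightmost, so read the columns in reverse.
        return ["".join(col[i] for col in reversed(columns)).rstrip()
                for i in range(height)]
    raise ValueError("combination not covered by this sketch")


for row in layout(["abc", "de"], TRADITIONAL_CHINESE):
    print(row)
```

Keeping the fields separate mirrors the point that they are logically independent, even though, as discussed next, only some combinations are attested.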
In principle the three types of specification (line level, page level, and document level) are also logically independent of each other. Still, as Harris notes, there are plausible biomechanical or other reasons for the elimination of some combinations. So once one has fixed one's script as running in horizontal lines, the option of arranging those lines from bottom to top seems not to be generally available, presumably because the production of such text would require one to cover up what one had previously written.17 The Lilliputians' diagonal arrangement of lines is presumably disfavored simply because it would force constantly changing line lengths on a rectangular page.
Similarly, multipage document binding practices are not independent of the direction of the script. If one's script runs from left to right across the page, there is a natural tendency to want to read a multipage document from 'left to right', whereas if one's script runs from right to left across the page (whether right-to-left in horizontal lines as in Hebrew, or right-to-left in vertical columns as in Chinese), there is a tendency to want to read from 'right to left'. So, holding an English book with the spine pointing away from you, you start reading on the leftmost page and continue to the rightmost page. For a Chinese or Hebrew book, holding the book in the same configuration, you start from the rightmost page.
We will have nothing further to say about binding practices here, but it will be useful to dwell for a moment on Harris' second characterization of text direction, namely the arrangement of lines on a page. We have of course dealt with the issue of the macroscopic arrangement of symbols within a line (or column) of text by assuming a single macroscopic catenation operator such as → or ↓ for a given script or style of writing within a script. One is therefore tempted to deal with the arrangement of lines of text in a similar fashion. Such an account for English would run as follows:
(2.18) (a) LINE = γ₁ → γ₂ → … → γₙ, for γ a letter, symbol or space.
(b) PAGE = LINE₁ ↓ LINE₂ ↓ … ↓ LINEₙ
So a line would be an arrangement of basic symbols using → and a page would be an arrangement of lines using ↓.
The problem with this account is that it would appear to violate Locality: a line of text does not constitute a Small Linguistic Unit, and therefore the switch from → to ↓ would constitute a violation. This consideration would appear to force an alternative view, one which also has the benefit of being more intuitively appealing. Under this alternative view, text is written on a virtual tape in one direction only: in English this would be from left to right, in Chinese from top to bottom. This tape is then 'pasted' to a physical surface, with the inevitable consequence, given a sufficiently long tape, that one will run out of space on a line (or column) and have to wrap the tape to the next line (or column).
17 Similar considerations presumably also account for the extreme rarity (as the survey in (Daniels and Bright, 1996) shows) of scripts where columns run from bottom to top.
There are only three reasonable ways to perform this wrapping. The first, practiced by every modern writing system, involves 'cutting' the virtual tape, and continuing on the next line or column near where one started the previous line; see (a) in Figure 2.8. The second and third methods both involve starting at the side of the page where one finished the preceding line, and heading back across the page. In order to do this, one must bend the tape around, with the immediate consequence that the face of the glyphs must also be turned around; note that the term 'face' is suggested by Harris (1995, page 132). The most common way to perform this bending is to fold the tape over itself so that the glyphs, now running in the opposite direction across the physical surface, are flipped around the vertical axis; see diagram (b) in Figure 2.8. This is the standard boustrophedon writing found in several ancient eastern Mediterranean scripts, all of which involve this change of face around the vertical axis; see various chapters in (Daniels and Bright, 1996). The other way to bend the tape is to twist it around so that the glyphs running in the backwards direction are upside down: this 'inverted boustrophedon' system seems to be found in only a handful of scripts, one being the Easter Island rongorongo script (Fischer, 1997b), and another being the ancient Italian script Venetic (Lejeune, 1974; Bonfante, 1996). See (c) in Figure 2.8; note that rongorongo actually runs from bottom to top across the surface, rather than top to bottom as diagrammed here. (Venetic apparently had both kinds of boustrophedon, either flipping the face of the characters when switching direction, or else inverting them: (Lejeune, 1974, pages 180-181).)
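The three wrapping methods can be sketched as operations on a one-directional tape of glyphs. In this illustrative sketch each cell on the page is a (glyph, face) pair, since plain text cannot actually mirror or rotate a glyph; the names `wrap_tape`, `show` and the face labels are hypothetical. Only top-to-bottom wrapping of a left-to-right tape is modeled, so rongorongo's bottom-to-top direction is not covered.

```python
from typing import List, Tuple

Cell = Tuple[str, str]  # (glyph, face) as it appears on the page
BACKWARD_FACE = {"boustrophedon": "mirrored", "inverted": "rotated"}


def wrap_tape(tape: str, width: int, method: str = "standard") -> List[List[Cell]]:
    """Paste a left-to-right virtual tape onto lines of `width` cells.

    standard: cut the tape and restart at the left margin;
    boustrophedon: fold the tape, mirroring backward lines about the vertical axis;
    inverted: twist the tape, rotating backward lines by 180 degrees.
    """
    if method != "standard" and method not in BACKWARD_FACE:
        raise ValueError(f"unknown method: {method}")
    rows = []
    for line_no, start in enumerate(range(0, len(tape), width)):
        segment = tape[start:start + width]
        if method == "standard" or line_no % 2 == 0:
            # Forward line: glyphs left to right with their normal face.
            rows.append([(g, "normal") for g in segment])
        else:
            # Backward line: it begins at the right margin, so on the page its
            # glyphs appear in reverse order, each with a changed face.
            face = BACKWARD_FACE[method]
            padding = [(" ", "blank")] * (width - len(segment))
            rows.append(padding + [(g, face) for g in reversed(segment)])
    return rows


def show(rows: List[List[Cell]]) -> List[str]:
    """Glyph sequence of each row as seen on the page (faces dropped)."""
    return ["".join(g for g, _ in row) for row in rows]


print(show(wrap_tape("abcdef", 3, "boustrophedon")))  # ['abc', 'fed']
```

In this model the change of face is not an extra rule: it falls out of running the same tape backward, which is the point made in the next paragraph.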
It is important to understand that the virtual tape model, and the consequences that follow from it which we have just seen, are a direct consequence of the theory adopted here: according to Locality one cannot model boustrophedon writing as a line-by-line switch of catenation type, any more than one can model the arrangement of lines on a page by modeling within-line catenation as (e.g.) → and across-line catenation as ↓. The flip of face in boustrophedon systems, which follows from the virtual tape model, is thus effectively forced by the theory. A boustrophedon system that does not involve a change of face would thus be a problem for the theory. It is therefore interesting to note that such systems do not appear to be common, if indeed they exist at all: various authors, including Harris (1995), allude to the existence of such non-flipping boustrophedon systems, yet I have been unable to find specific examples of this phenomenon.
[Figure 2.8 illustration: the sample text "This is a test of the emergency broadcast system. This is only a test. Had this been a real emergency, you would have been" wrapped onto three lines by each of the three methods.]
Figure 2.8: Three methods of wrapping the virtual tape: (a) standard non-boustrophedon; (b) boustrophedon; (c) inverted boustrophedon.
Nonetheless, one must point out that higher level script-dependent generalizations are often much freer than the micro-constraints we have discussed elsewhere in this chapter. Thus, with the exception of the relatively few Chinese characters that have alternate forms, there is generally only one way to arrange the components within a Chinese character. On the other hand, modern Chinese allows two schemes for arranging the symbols on the page, the traditional one described above, and the Western-influenced left-to-right/top-to-bottom arrangement found in English; the same facts hold for other East Asian scripts.18 Once one moves out of the domain of printed texts, other arrangements often become possible. Shop signs in English may be arranged with the characters running from top to bottom, for instance, and more novel arrangements are possible in other contexts, as long as some notion of sequentiality is preserved. This loosening of constraints as one moves from the micro to the macro level is hardly surprising and has an exact analog in linguistic structure: the syntactic possibilities for combining morphemes within words are generally highly constrained across languages, and in many 'fixed word order' languages the possibilities for rearrangements of words or phrases within sentences are also limited. Beyond the sentence, however, the interrelation between units (sentences, paragraphs, turns in a dialogue, etc.) is much more loosely constrained by purely formal linguistic considerations, and much more governed by considerations of how language is used. Similarly, the macro level of written text is constrained by considerations of usage as much as by formal orthographic constraints, a point that Harris' (1995) discussion brings out nicely.
18 A third scheme, a Semitic-style right-to-left/top-to-bottom arrangement, is also found in Chinese, though it is perhaps not as common in ordinary text as the other two.
2.A. CHINESECHARACTERS 61
2.A Sample Chinese Characters and their Analyses
The following pages contain a randomly selected sample of 194 Chinese characters that are accounted for by (2.8), and an additional 38 that are not. In the analyses, the determining component is listed first, after the equals sign, and before the catenation operator. The semantic component is boxed. A special symbol marks characters where the independent pronunciation of the phonetic component is a perfect predictor (including tone) of the pronunciation of the complex character. This is perhaps a more stringent definition of phonological regularity than is strictly speaking necessary, but it has the advantage of being easy to compute.
Note that what is listed for the incorrectly analyzed characters is the correct analysis: the predicted (but incorrect) catenation operator is shown in parentheses after each example. In these examples we list the semantic component uniformly first, since it is unclear in many cases what the determining component is.
Correctly Analysed Characters
[Character analyses not recoverable in this copy.]
Incorrectly Analysed Characters
[Character analyses not recoverable in this copy.]
Chapter 3
ORL Depth and Consistency
In this chapter we address the second of the two proposals presented in Section 1.2.3, namely Consistency. We will first examine the orthographies of Russian and Belarusian, which form a near minimal pair from the point of view of the level of the ORL. We will show that a simple coherent analysis of the two systems can be obtained if we assume that in each case the ORL is a Consistent level, the only difference between Russian and Belarusian being the depth of that level.
We then turn our attention to English. In light of the analysis of Russian and Belarusian, how deep is the ORL for English, and can one assume Consistency for the ORL? The evidence we will examine suggests that Consistency is possible. Not surprisingly, the analysis is simpler if we assume a relatively deep ORL, though perhaps unexpectedly the evidence is not as clear-cut as in the case of Russian versus Belarusian.
An apparent counterexample to Consistency is found in the orthographic representation of obstruents in Serbo-Croatian: these data are discussed in Section 3.3, and data from a small phonetic experiment are presented that suggest that in fact the data are not a counterexample, but rather offer support for Consistency.
Another potential problem for Consistency would be evidence that the spelling of a word needs to be constructed in a cyclic fashion, since this would seem to suggest that there might in effect be several 'ORLs' for a given morphologically complex word, one for each cycle. A potential example of this is discussed in Section 3.4.
Finally, as we discussed in the introduction, we assume, similar to Nunn (1998), that M_ORL→Γ can be split into two components, as defined there. One component of the second of these is what we can term surface orthographic constraints, and for lack of a better place to discuss them, we turn to a short discussion of this topic in Section 3.5.
68 CHAPTER3. ORL DEPTHAND CONSISTENCY
3.1 Russian and Belarusian Orthography: A Case Study
One way to illustrate the functionality of the proposed model is to compare two languages that have similar phonologies, but select different levels for the ORL. An almost ideal pair of languages for this purpose is Russian and Belarusian. These languages share many phonological features, including strong vowel reduction in unstressed syllables, and palatalization assimilation in consonant clusters. The orthographic representation of these phonological phenomena is, however, quite different in the two writing systems.
3.1.1 Vowel reduction
One way in which the Russian and Belarusian orthographies differ is in the treatment of the vowel reduction process known in the Slavic literature by the names akan'je, ikan'je and jakan'je. We have already seen an instance of akan'je (reduction of /a/ and /o/) in the example города ⟨goroda⟩ /gərʌˈda/ 'cities' in Section 1.2.1. Ikan'je involves the reduction of /(j)e/ and /(j)a/ after soft consonants1 to /ɪ/: two examples are язык ⟨jazyk⟩ /jɪˈzɨk/ 'language', and перевод ⟨perevod⟩ /p,ɪr,ɪˈvot/ 'translation'. (We use a ',' to denote a palatalized consonant in the phonetic transcription.) The details of Russian akan'je/ikan'je are well known (Wade, 1992, pages 5-7):
• In pretonic position (i.e., in the syllable preceding the lexically stressed syllable), word-initially in an unstressed syllable, or word-finally in an open unstressed syllable, underlying /o/ and /a/ (after hard consonants) are reduced to /ʌ/.
• In all other unstressed syllables underlying /o/ and /a/ (after hard consonants) are reduced to /ə/.
• Underlying /(j)e/ and /(j)a/ (after soft consonants) are reduced to /ɪ/ in unstressed syllables.
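The three reduction rules above can be sketched as a function over syllabified phonemic forms. The representation is an assumption made for illustration: each syllable is an (onset, vowel, coda) triple, ',' after a consonant marks palatalization as in the transcriptions above, and an onset ending in ',' or 'j' counts as soft. Only vowel reduction is modeled; final devoicing (/vod/ → /vot/) and everything else is ignored.

```python
from typing import List, Tuple

Syllable = Tuple[str, str, str]  # (onset, vowel, coda); "," marks palatalization


def is_soft(onset: str) -> bool:
    """A vowel follows a soft consonant (or /j/) if the onset ends in ',' or 'j'."""
    return onset.endswith((",", "j"))


def reduce_russian(sylls: List[Syllable], stress: int) -> str:
    """Apply the three akan'je/ikan'je rules above; other processes are ignored."""
    last = len(sylls) - 1
    out = []
    for i, (onset, vowel, coda) in enumerate(sylls):
        if i != stress:
            if is_soft(onset) and vowel in {"e", "a"}:
                vowel = "ɪ"                                   # ikan'je
            elif not is_soft(onset) and vowel in {"o", "a"}:
                strong = (i == stress - 1                     # pretonic
                          or (i == 0 and onset == "")         # word-initial
                          or (i == last and coda == ""))      # word-final, open
                vowel = "ʌ" if strong else "ə"                # akan'je
        out.append(("ˈ" if i == stress else "") + onset + vowel + coda)
    return "".join(out)


print(reduce_russian([("g", "o", ""), ("r", "o", ""), ("d", "a", "")], 2))  # gərʌˈda
print(reduce_russian([("j", "a", ""), ("z", "ɨ", "k")], 1))                # jɪˈzɨk
```

Since none of these reductions is written in Russian, the orthography corresponds to the input of such a function rather than its output.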
Russian represents neither reduction process in its orthography, and so it seems reasonable to suppose, as is typically done (cf. again (Cubberley, 1996)), that Russian orthography is morphological in the sense that it represents an underlying phonological level (UL), though this is not necessarily the most abstract phonological level one could posit. The table in (3.1) gives the levels of representation for the words for 'cities' and 'translation'. Here, and in the tables that follow, the row labeled Surface gives the actual surface phonemic representation (i.e., the pronunciation):
(3.1)
              'cities'     'translation'
ORL (= UL)    goroˈda      pereˈvod
Γ             города       перевод
Surface       gərʌˈda      p,ɪr,ɪˈvot

1 Russian and Belarusian phonemically distinguish soft or palatalized consonants from hard or non-palatalized consonants.
3.1. RUSSIAN AND BELARUSIAN ORTHOGRAPHY 69
A set of spelling rules, M_ORL→Γ^spelling, including the rules in (3.2) (copied in part from (1.5)), is sufficient to accomplish this mapping:2
(3.2) g → г ⟨g⟩
      o → о ⟨o⟩ / C[−palatal] ___
      r → р ⟨r⟩
      d → д ⟨d⟩
      a → а ⟨a⟩ / C[−palatal] ___
      p → п ⟨p⟩
      e → е ⟨e⟩ / C ___
      v → в ⟨v⟩
Note the restrictions on the rewriting of vowels: /o/ and /a/ appear as о ⟨o⟩ and а ⟨a⟩ only after hard consonants (C[−palatal]); and in the majority of Russian words, /e/ appears as е ⟨e⟩ after all consonants, whereas in syllable-initial position /e/ (as opposed to /je/) appears as the non-palatal э, which we will notate here as ⟨e⟩ (see also Section 3.5).
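As a rough check on how little machinery (3.2) requires, the sketch below applies those rules left to right to the ORL forms in (3.1). It is a hypothetical fragment, not a full Russian speller: the string encoding is assumed, only the segments listed in (3.2) are covered, stress is ignored, and anything else raises an error.

```python
import re

# Fragment of (3.2); a trailing "," would mark a palatalized consonant.
CONSONANTS = {"g": "г", "r": "р", "d": "д", "p": "п", "v": "в"}
AFTER_HARD = {"o": "о", "a": "а"}

def spell_russian(orl):
    """Rewrite an ORL form left to right with the rules in (3.2)."""
    out, prev = [], None
    for seg in re.findall(r"ˈ|[^ˈ,],?", orl):
        if seg == "ˈ":
            continue  # stress is not written
        base = seg.rstrip(",")
        after_c = prev is not None and prev.rstrip(",") in CONSONANTS
        if base in CONSONANTS:
            out.append(CONSONANTS[base])
        elif base == "e" and after_c:
            out.append("е")  # e -> е / C ___
        elif base in AFTER_HARD and after_c and not prev.endswith(","):
            out.append(AFTER_HARD[base])  # o, a -> о, а / C[-palatal] ___
        else:
            raise ValueError(f"no rule in this fragment for {seg!r}")
        prev = seg
    return "".join(out)

print(spell_russian("goroˈda"))   # города
print(spell_russian("pereˈvod"))  # перевод
```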
Belarusian also has akan'je and ikan'je (the latter called jakan'je), which behave very similarly to their Russian counterparts. However, unlike the situation in Russian, Belarusian orthography generally reflects these processes; see (Carlton, 1990, pages 299–301). The rules can be stated as follows (following Carlton):
• In pretonic position, or word-initially in an unstressed syllable, underlying /e/ and /o/ are reduced to /a/.3
• In all other unstressed syllables, underlying /o/ and /e/ (after hard consonants) are reduced to /a/.
(We return at the end of this section to the case of non-pretonic unstressed /e/ after soft consonants.) Examples (from (Krivickij and Podluzhnyj, 1994, pages 15, 22)) are вецер ⟨vecer⟩ /ˈv,etˢ,er/ 'wind (noun)', versus вятры ⟨vjatry⟩ /v,aˈtrɨ/ 'winds'; ногі ⟨nogi⟩4 /ˈnogi/ 'feet', versus нага ⟨naga⟩ /naˈga/ 'foot'; and цэгла ⟨cegla⟩ /ˈtˢegla/ 'brick', versus цагляны ⟨cagljany⟩ /tˢaˈgl,anɨ/ 'made of brick'.
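The two Belarusian statements can be sketched in the same way as the Russian ones. In this hypothetical Python fragment (same assumed encoding: 'ˈ' before the stressed syllable, ',' for palatalization), unstressed /o/ and /e/ become /a/ in pretonic or word-initial position or after a hard consonant, while a non-pretonic unstressed /e/ after a soft consonant is deliberately left untouched. The affrication of /t,/ seen in вецер is a separate process and is not modeled.

```python
import re

VOWELS = set("aeiouɨ")

def reduce_belarusian(form):
    """Map an underlying form to an ORL by (j)akan'je as stated above."""
    toks = re.findall(r"ˈ|[^ˈ,],?", form)
    vowels = [i for i, t in enumerate(toks) if t in VOWELS]
    stressed = next(i for i in vowels if i > toks.index("ˈ"))
    pretonic = max((i for i in vowels if i < stressed), default=None)
    out = list(toks)
    for i in vowels:
        if i == stressed or toks[i] not in ("o", "e"):
            continue
        left = [t for t in toks[:i] if t != "ˈ"]
        prev = left[-1] if left else None
        after_hard = prev is not None and not (prev.endswith(",") or prev == "j")
        if i == pretonic or prev is None or after_hard:
            out[i] = "a"
        # else: non-pretonic unstressed /o/, /e/ after a soft consonant is kept
    return "".join(out)

print(reduce_belarusian("v,eˈtrɨ"))   # v,aˈtrɨ
print(reduce_belarusian("noˈga"))     # naˈga
print(reduce_belarusian("ˈv,et,er"))  # ˈv,et,er
```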
Similarly to the table given for the Russian cases, we can assume the tables in (3.3) for the Belarusian examples we have just discussed. In this case the ORL reflects the application of vowel reduction, and the surface representation is effectively the same as the ORL:5
2 While we express the rules here, and elsewhere, as a set of ordered rewrite rules, there is often no crucial ordering to such rules. When ordering is not crucial they are best viewed as a set of parallel two-level rules in the sense of (Koskenniemi, 1983).
3 It is unclear whether this is really /a/ or something more akin to /ʌ/, detailed phonetic descriptions of Belarusian akan'je having proved elusive.
4 The Russian symbol и ⟨i⟩ is not used in Belarusian. Note also that г, represented phonemically here as /g/, is actually a voiced fricative, often transliterated as /h/ (Wayles Browne, personal communication).
5 On the change from /t/ to /tˢ/ in the form вецер ⟨vecer⟩ 'wind', which is also reflected in the orthography, see Section 3.1.2. In the underlying representations we assume an underlying /t/ for surface /tˢ/.
(3.3)
            'wind'        'winds'
UL          ˈv,et,er      v,eˈtrɨ
ORL         ˈv,etˢ,er     v,aˈtrɨ
Γ           вецер         вятры
Surface     ˈv,etˢ,er     v,aˈtrɨ

            'feet'        'foot'
UL          ˈnogi         noˈga
ORL         ˈnogi         naˈga
Γ           ногі          нага
Surface     ˈnogi         naˈga

            'brick'       'made of brick'
UL          ˈt,egla       t,eˈgl,anɨ
ORL         ˈtˢegla       tˢaˈgl,anɨ
Γ           цэгла         цагляны
Surface     ˈtˢegla       tˢaˈgl,anɨ
The spelling rules necessary to map from ORL to Γ for Belarusian include those in (3.4):

(3.4) v → в ⟨v⟩
      e → е ⟨e⟩ / C[+palatal] ___
      t → т ⟨t⟩
      tˢ → ц ⟨c⟩
      r → р ⟨r⟩
      a → я ⟨ja⟩ / C[+palatal] ___
      ɨ → ы ⟨y⟩
      n → н ⟨n⟩
      o → о ⟨o⟩
      g → г ⟨g⟩
      i → і ⟨i⟩
      a → а ⟨a⟩ / C[−palatal] ___
      e → э ⟨e⟩ / C[−palatal] ___
      l → л ⟨l⟩
The encoding of consonants and most vowels is identical to that in Russian: the only differences evident in these cases are the encoding of /e/ following hard consonants, which is э ⟨e⟩ in Belarusian (a spelling that Russian generally disallows except in syllable-initial position), and the different symbol used for /i/.
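The rules in (3.4) can likewise be read as a left-to-right mapping in which only е/э and я/а depend on the palatality of the preceding segment. The sketch below assumes the same hypothetical encoding (with 'tˢ' treated as a single segment) and covers only the segments listed in (3.4); on the ORL rows of (3.3) it yields the Γ rows.

```python
import re

# Fragment of (3.4); palatalized and plain consonants share a letter.
CONSONANTS = {"v": "в", "t": "т", "tˢ": "ц", "r": "р", "n": "н", "g": "г", "l": "л"}
CONTEXTUAL = {"e": ("э", "е"), "a": ("а", "я")}  # (after hard, after soft)
PLAIN_VOWELS = {"o": "о", "i": "і", "ɨ": "ы"}

def spell_belarusian(orl):
    """Rewrite a Belarusian ORL form left to right with the rules in (3.4)."""
    out, prev = [], None
    for seg in re.findall(r"ˈ|tˢ,?|[^ˈ,],?", orl):
        if seg == "ˈ":
            continue  # stress is not written
        base = seg.rstrip(",")
        if base in CONSONANTS:
            out.append(CONSONANTS[base])
        elif base in CONTEXTUAL:
            hard, soft = CONTEXTUAL[base]
            out.append(soft if prev is not None and prev.endswith(",") else hard)
        elif base in PLAIN_VOWELS:
            out.append(PLAIN_VOWELS[base])
        else:
            raise ValueError(f"no rule in this fragment for {seg!r}")
        prev = seg
    return "".join(out)

for form in ["ˈv,etˢ,er", "v,aˈtrɨ", "ˈnogi", "naˈga", "ˈtˢegla", "tˢaˈgl,anɨ"]:
    print(form, spell_belarusian(form))
```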
It is worth noting that, at least in some versions of Belarusian, akan'je and jakan'je occur not only within words but also within clitic groups, and are likewise reflected in the orthography. Thus in the text of (Lyosik, 1926), one finds examples such as the following, involving the negative clitic /ne/, which in many environments is written as не ⟨ne⟩, but in the jakan'je environment is written as ня ⟨nja⟩, evidently reflecting the pronunciation /n,a/. Contrast the examples in (3.5a) with those in (3.5b):6
(3.5) (a) ня толькі ⟨nja tol'ki⟩          'not only'            pages 75, 83
          ня ўсе ⟨nja wse⟩                'not all'             page 85
          ня пісацца ⟨nja pisacca⟩        'is not written'      page 109
      (b) не маглі ⟨ne magli⟩             'they were not able'  page 82
          неславянскі ⟨neslavjanski⟩      'not Slavic'          page 98
          небеларускі ⟨nebelaruski⟩       'not Belarusian'      page 95
However, the more recent discussion in (Krivickij and Podluzhnyj, 1994, page 22) explicitly denies that Lyosik's examples are correct, and contrasts the behavior of не ⟨ne⟩ 'not' and без ⟨bez⟩ 'without' as separate clitics and as prefixes:
Regularly written with [⟨e⟩] are the particle не [⟨ne⟩] and the preposition без [⟨bez⟩]; if, however, they appear as prefixes, then they obey the general rule of jakan'je: без людзей [⟨bez ljudzej⟩ 'without people'], but бязлюдны [⟨bjazljudny⟩ 'unpopulated'], не шмат [⟨ne šmat⟩] 'not much', but няшмат [⟨njašmat⟩ 'a little'] . . .
Evidently this difference is due to a change in Belarusian orthographic norms since Lyosik's day (rather than to an actual change in the language).7 Krivickij and Podluzhnyj's choice of wording, "regularly written with . . ." ("регулярно пишутся через . . ."), certainly suggests this possibility. If this is the case, and this does reflect a spelling reform in Belarusian, it is interesting to note that, like the most recent Dutch spelling reform (which we will discuss in Chapter 6), it is a reform that favors morphological rather than phonetic regularity: as a result of this orthographic principle, не ⟨ne⟩ 'not' and без ⟨bez⟩ 'without' are spelled the same (at least when they are used as separate words), even though they may change in pronunciation. However this may be, the most direct implementation of the version of Belarusian described by Krivickij and Podluzhnyj in terms of our model is to assume that bez 'without' and ne 'not' are simply lexically marked to always be spelled with е ⟨e⟩.
Over and above the spelling conventions for не ⟨ne⟩ and без ⟨bez⟩, and despite Belarusian's general tendency to have a shallow, phonemically based orthography, there are a few lexical exceptions to the orthographic conventions for akan'je and jakan'je. Krivickij and Podluzhnyj note that unstressed ⟨e⟩ is written in дзевяты ⟨dzevjaty⟩ 'ninth' (from дзевяць ⟨dzevjac'⟩ 'nine'), дзесяты ⟨dzesjaty⟩ 'tenth' (from дзесяць ⟨dzesjac'⟩ 'ten'), and in some other numerals. And etymological ⟨o⟩ can be found in unstressed syllables in loan words: этымолёгічны
6 Nonetheless, about 25% of thirty-one examples collected from Lyosik's text seem to be inconsistent with the correct application of jakan'je for /ne/. For example, he uses ня было ⟨nja bylo⟩ 'was not' (page 63). Stress must be on the second syllable of было ⟨bylo⟩ 'was' (as it is in Russian), as evidenced by the non-application of akan'je to the /o/. Therefore, the /e/ of /ne/ should in principle be written as е ⟨e⟩, since it is not in a pretonic syllable.
7 It seems plausible that Lyosik's orthographic scheme was in fact experimental, since Belarusian orthography had probably not been standardized when Lyosik wrote (Elena Pavlova, personal communication). Also, it seems that the spelling system assumed by Krivickij and Podluzhnyj was imposed by Stalin by decree in 1933, replacing an earlier popular spelling system (Maksymiuk, 1999).
⟨etymoljogičny⟩ 'etymological'. Such cases must presumably simply be lexically marked. For example, for the word 'nine', we can assume a (partial) lexical representation as in (3.6a), where only the ⟨e⟩ is lexically specified. (Recall that the orthographic specifications in the Russian example in (1.4) were redundant: we assume, in fact, that the orthographic representation of города ⟨goroda⟩ 'cities' is regularly derived.) The lexical specification will then carry over to the derived form 'ninth', as shown in (3.6b):
(3.6) (a) [ PHON  ˈdᶻ,ev,atˢ,
            ORTH  …е…        ]
      (b) [ PHON  dᶻ,ev,aˈtɨ
            ORTH  …е…        ]
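One way to picture the lexical specification in (3.6) is as a partial orthographic map attached to a lexical entry, consulted before the regular spelling rules and inherited by derived forms. The Python sketch below is a hypothetical illustration. The data structure, the tiny spelling fragment (including a word-final soft sign in the spirit of the soft-sign rule of Section 3.1.2), and the assumption that the ORL of 'ninth' already has /a/ for the pretonic /e/ by jakan'je are all choices made for the example.

```python
from dataclasses import dataclass, field

CONSONANTS = {"dᶻ": "дз", "v": "в", "t": "т", "tˢ": "ц"}

@dataclass
class Entry:
    segs: list                                # ORL segments (hypothetical encoding)
    orth: dict = field(default_factory=dict)  # segment index -> lexically specified letter

def regular_spelling(segs, i):
    """A toy fragment of the regular Belarusian spelling rules."""
    base, soft = segs[i].rstrip(","), segs[i].endswith(",")
    if base in CONSONANTS:
        # a word-final palatalized consonant gets a soft sign
        return CONSONANTS[base] + ("ь" if soft and i == len(segs) - 1 else "")
    after_soft = i > 0 and segs[i - 1].endswith(",")
    return {"a": "я" if after_soft else "а",
            "e": "е" if after_soft else "э",
            "ɨ": "ы"}[base]

def spell(entry):
    """Lexically specified letters win; everything else is spelled by rule."""
    return "".join(entry.orth.get(i) or regular_spelling(entry.segs, i)
                   for i in range(len(entry.segs)))

def derive(stem, segs, stem_length):
    """A derived form inherits the stem's ORTH marks on the stem's segments."""
    return Entry(segs, {i: c for i, c in stem.orth.items() if i < stem_length})

nine = Entry(["dᶻ,", "e", "v,", "a", "tˢ,"], {1: "е"})
ninth = derive(nine, ["dᶻ,", "a", "v,", "a", "t", "ɨ"], stem_length=4)
print(spell(nine), spell(ninth), spell(Entry(ninth.segs)))
```

The last call drops the lexical mark and so shows what the regular rules alone would produce for 'ninth' (дзявяты), which is exactly the form the lexical specification is there to block.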
The remaining cases of potential jakan'je, namely cases where /e/ occurs in non-pretonic unstressed syllables after a palatal consonant, are of uncertain status. In such cases, Carlton notes (1990, page 300) that Belarusian "specialists differ . . . [s]ome recommend[ing] 'a as the correct pronunciation . . .", others recommending /e/, or "even a vowel between 'a and 'e". While the two unequivocal instances of akan'je/jakan'je are reflected in Belarusian spelling, this latter instance is not. The spelling that is chosen, е ⟨e⟩ or я ⟨ja⟩, depends upon the spelling of the vowel in the root in question in a form where that vowel is stressed. Thus (to use Carlton's examples) we have лес ⟨les⟩ 'forest', лясні́к ⟨ljasnik⟩ 'woodsman', but леснікі́ ⟨lesniki⟩ 'woodsmen'; but цяжкаваты ⟨tsjažkavaty⟩ 'somewhat heavy', derived from цяжка ⟨tsjažka⟩ 'heavily'. In any case, the pronunciation of the vowels in question (the е of леснікі and the я of цяжкаваты) is the same, though, as we have noted, Belarusian specialists differ as to what it should be. On the face of it, then, we would appear to have a case where Belarusian spelling behaves more like that of Russian, in representing an underlying rather than a surface vowel, something that would appear to be in direct violation of Consistency.
However, a possible solution to this dilemma suggests itself: suppose that the jakan'je of non-pretonic unstressed post-palatal /e/'s (call it "jakan'je-B") is a different process from the remaining cases of akan'je/jakan'je ("(j)akan'je-A"), and suppose further that jakan'je-B occurs later than (j)akan'je-A. Then one could assume that the ORL for Belarusian represents a stage at which (j)akan'je-A has applied, but jakan'je-B has not yet applied. What lends plausibility to this suggestion is precisely the disagreement among Belarusian specialists as to what the vowel in such cases should be, which contrasts with their (apparent) agreement about all other instances of akan'je and jakan'je. If this disagreement reflects a true phonetic variation in the implementation of jakan'je-B (one that is not in evidence for (j)akan'je-A), then it is quite possible that these do in fact represent two stages of vowel reduction, one ((j)akan'je-A) which is firmly rooted in the phonology of the language and the other,
which is less firmly established and which is subject to more variation across speakers.8 Further evidence for this position comes from the history and present distribution of akan'je and ikan'je in Russian dialects (Avanesov, 1974, and Elena Pavlova, personal communication). Ikan'je definitely postdated akan'je in Moscow dialects of Russian, due in part to the fact that the distinction between hard and soft consonants had not stabilized until quite late (the 14th century). In modern Russian dialects there is still a great deal of variation in ikan'je, compared with akan'je, suggesting that even in Russian the two processes may be at different levels of the phonology. The ultimate correctness of this suggestion necessarily awaits further study, but if it can be maintained, then these facts do not constitute a counterexample to Consistency.
3.1.2 Regressive palatalization
Another difference between Russian orthography and at least some versions of Belarusian orthography is in the treatment of regressive palatalization of consonants.
In Russian, a dental or alveolar consonant becomes palatalized if the following adjacent dental or alveolar consonant is also palatalized. More specifically (Wade, 1992, pages 9–10):9
• Dental stops (/t/, /d/, /n/) become palatalized before a palatalized dental or alveolar: thus /dn,i/ 'days' is pronounced /d,n,i/.
• Alveolar fricatives (/s/, /z/) become palatalized when followed by a palatalized dental stop, alveolar fricative or lateral: thus /vʌˈzn,ik/ 'arose' is /vʌˈz,n,ik/.
We may assume for the sake of concreteness that the assimilation involves spreading of the feature [+high] within the sequence of consonants.10
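Spreading of this kind is naturally implemented as a right-to-left scan, so that a consonant palatalized by its right neighbor can in turn palatalize the consonant to its left. The sketch below is an illustration under stated assumptions: it uses the same ','-marking of palatalization as the transcriptions, and it treats /t, d, n, s, z, l/ as the set of palatalized triggers for both classes of target, which somewhat simplifies the two statements above.

```python
import re

DENTAL_STOPS = {"t", "d", "n"}
ALVEOLAR_FRICATIVES = {"s", "z"}
TARGETS = DENTAL_STOPS | ALVEOLAR_FRICATIVES
# Assumed trigger set: palatalized dentals/alveolars (stops, fricatives, lateral).
TRIGGERS = {"t", "d", "n", "s", "z", "l"}

def spread_palatalization(form):
    """Spread palatalization leftward through a dental/alveolar cluster."""
    toks = re.findall(r"ˈ|[^ˈ,],?", form)
    segs = [i for i, t in enumerate(toks) if t != "ˈ"]
    # Scan adjacent segment pairs from right to left, so that a newly
    # palatalized consonant can palatalize the consonant before it.
    for left_i, right_i in reversed(list(zip(segs, segs[1:]))):
        left, right = toks[left_i], toks[right_i]
        if left.endswith(",") or not right.endswith(","):
            continue
        if (left in TARGETS) and (right[0] in TRIGGERS):
            toks[left_i] = left + ","
    return "".join(toks)

print(spread_palatalization("jest,"))     # jes,t,
print(spread_palatalization("dn,i"))      # d,n,i
print(spread_palatalization("vʌˈzn,ik"))  # vʌˈz,n,ik
```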
While palatalization is marked in the orthography of Russian, there is no special mark of the spreading itself. That is, the final consonant of the cluster is orthographically marked as palatal, either by virtue of its occurring before one of the "soft" vowels, е ⟨(j)e⟩, и ⟨i⟩, ю ⟨ju⟩, ё ⟨jo⟩ or я ⟨ja⟩, or else explicitly by means of the soft sign ь ⟨'⟩. But consonants internal to the cluster are not marked as palatal.11 Thus in the word есть ⟨jest'⟩ 'is, are', the final /t/ is orthographically marked as palatal with the soft sign ь, but in fact the entire /st/ cluster is palatal: /jes,t,/. We assume the table in (3.7) for this case:
8 One might suppose, along these lines, that (j)akan'je-A is a lexical phonological process, whereas jakan'je-B is a postlexical or phonetic process.
9 Wayles Browne notes (personal communication) that this spreading of palatalization across dental/alveolar consonant clusters is a feature of older dialects, and is becoming less prevalent in contemporary Russian.
10 Note, though, that words admit of alternate pronunciations, including some cases that do not fall under the rubric of the two classes listed above: these include cases like /dv,er,/ 'door', which may be either /dv,er,/ or /d,v,er,/. Many such exceptions can be found in (Avanesov, 1983).
11 Consonants internal to clusters can be marked with a soft sign, but in that case they are lexically palatal, and this has nothing to do with the assimilation process that we are discussing now: an example is судьба ⟨sud'ba⟩ 'fate'.
(3.7)
              'is, are'
ORL (= UL)    jest,
Γ             есть ⟨jest'⟩
Surface       jes,t,
Palatalization and regressive palatalization in Belarusian are on the whole similar to those of Russian (see (Krivickij and Podluzhnyj, 1994, pages 55–57)), but there are a couple of notable differences. One difference in the palatalization process itself is that palatalized /t/ and /d/ become palatalized affricates, /tˢ,/ and /dᶻ,/ respectively. Thus for Russian /d,at,ka/ 'uncle' we have in Belarusian /dᶻ,atˢ,ka/; for Russian /t,es,t,/ 'father-in-law', Belarusian has /tˢ,es,tˢ,/.
A second difference is that dental stops in Belarusian regularly palatalize before a palatalized /v/. Thus alongside the masculine/neuter form /dva/ 'two', we have feminine /dᶻ,v,e/ 'two'; alongside /čatyry/ 'four', we have the collective form /čatˢ,v,ora/.
Once again unlike the situation in Russian, Belarusian orthographically marks regressive palatalization, at least in cases involving /tˢ,/ and /dᶻ,/, which have a separate orthographic representation, namely ц(ь) ⟨c(')⟩ and дз(ь) ⟨dz(')⟩ respectively. Thus /dᶻ,v,e/ 'two (feminine)' is written as дзве ⟨dzve⟩, and /čatˢ,v,ora/ 'four (collective)' is written as чацвёра ⟨čacvjora⟩. So although Krivickij and Podluzhnyj note that the effects of regressive palatalization are not indicated in writing (page 56), this is not strictly correct, since palatalized /t/ and /d/ are orthographically marked as affricates, even though they are not followed by a soft sign ь ⟨'⟩. However, for a form like ёсць ⟨josc'⟩ 'is, are', which is pronounced /jos,tˢ,/, the palatal /s/ is certainly not marked orthographically in any way. The explanation for the difference in behavior is presumably that the affricates /tˢ/ and /dᶻ/, whether palatalized or not, have a standard orthographic representation as ц ⟨c⟩ and дз ⟨dz⟩, whereas /s/, for instance, has only one representation, namely с ⟨s⟩, and there is no separate symbol for a palatalized /s/. But why is the soft sign not written cluster-internally?12 We can presume that in modern Belarusian orthography, a soft sign's function is merely to mark a cluster of consonants as being palatalized. Thus we could write a rule, forming part of M_ORL→Γ^spelling for Belarusian, that would simply insert a soft sign after a palatalized consonant whenever it is not followed by a vowel (since in that case one uses one of the soft vowel symbols, which implicitly mark palatality) or by another palatalized consonant:
(3.8) ∅ → ь ⟨'⟩ / C[+palatal] ___ {#, C[−palatal]}

The table in (3.9) gives the representations for два ⟨dva⟩ 'two' (masculine/neuter) and дзве ⟨dzve⟩ 'two' (feminine):
(3.9)
            'two' (masculine, neuter)    'two' (feminine)
UL          dva                          dv,e
ORL         dva                          dᶻ,v,e
Γ           два ⟨dva⟩                    дзве ⟨dzve⟩
Surface     dva                          dᶻ,v,e
12 That is, besides cases where the consonant is lexically marked as palatal, as in Russian.
This case is interesting because the representation of regressive palatalization in Belarusian orthography would appear to be prima facie evidence against Consistency: on the one hand, regressive palatalization is represented when it involves /t, d/, which become affricates; on the other hand, it is not represented for any other consonant. But as we can see, the Inconsistency is only apparent: the reason that the soft sign is not used to mark regressive assimilation in general merely relates to the statement of the rule that spells out the soft sign.
Interestingly, Lyosik's usage is again different from that of Krivickij and Podluzhnyj. In Lyosik's usage, in clusters of palatal consonants, all assimilated consonants are marked with a soft sign. Thus whereas Krivickij and Podluzhnyj have цвёрды ⟨cvjordy⟩ /tˢ,v,ordy/ 'hard', Lyosik writes цьвёрды ⟨c'vjordy⟩; for дзве ⟨dzve⟩ /dᶻ,v,e/ 'two (feminine)', Lyosik has дзьве ⟨dz've⟩; for ёсць ⟨josc'⟩ /jos,tˢ,/ 'is, are', he has ёсьць ⟨jos'c'⟩. This is readily interpretable as a simplification of the rule in (3.8), which can be rewritten to describe Lyosik's spelling conventions as follows:
(3.10) ∅ → ь ⟨'⟩ / C[+palatal] ___ {#, C}

(That is, the soft sign is inserted after a palatalized consonant, except before a vowel.) The table for the feminine form of 'two' shown above in (3.9) now becomes:
(3.11)
            'two' (feminine)
UL          dv,e
ORL         dᶻ,v,e
Γ           дзьве ⟨dz've⟩
Surface     dᶻ,v,e
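The difference between (3.8) and (3.10) amounts to a single condition on the following segment. The hypothetical Python sketch below spells palatalized forms in transliteration, writing ' for the soft sign, under either convention. The encoding and the small transliteration table are assumptions made for the example.

```python
import re

VOWELS = set("aeiouɨ")
TRANSLIT = {"tˢ": "c", "dᶻ": "dz"}  # affricates as ⟨c⟩, ⟨dz⟩; other letters as is

def segments(form):
    return re.findall(r"tˢ,?|dᶻ,?|[^,],?", form)

def soft_signs(form, lyosik=False):
    """Transliterate a form, inserting "'" by rule (3.8) or, if lyosik, (3.10)."""
    segs = segments(form)
    out = []
    for i, seg in enumerate(segs):
        base = seg.rstrip(",")
        out.append(TRANSLIT.get(base, base))
        if not seg.endswith(","):
            continue
        nxt = segs[i + 1] if i + 1 < len(segs) else None  # None: word boundary
        before_vowel = nxt in VOWELS
        before_palatal = nxt is not None and nxt.endswith(",")
        if lyosik:
            insert = not before_vowel                       # rule (3.10)
        else:
            insert = not (before_vowel or before_palatal)   # rule (3.8)
        if insert:
            out.append("'")
    return "".join(out)

print(soft_signs("jos,tˢ,"), soft_signs("jos,tˢ,", lyosik=True))  # josc' jos'c'
print(soft_signs("dᶻ,v,e"), soft_signs("dᶻ,v,e", lyosik=True))    # dzve dz've
```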
3.1.3 Lexical marking in Russian and other issues
Despite the regularity of Russian spelling, there are cases where one must assume lexical marking of orthographic information, and we will examine a couple of these here. We will start, however, by considering a case that might appear to involve marked orthography, but which in fact involves marked phonology, the orthography itself being perfectly regular.
In the Slavic-derived vocabulary, sequences of hard consonants followed by non-palatalized /e/ do not occur. Basically, either one gets a palatalized consonant (e.g., ot'ec 'father', with a palatalized /t'/), or else one finds a hard (in this case partially palatalized) consonant followed by /je/: otjezd 'departure'. There are, however, a large number of borrowed words that have such sequences, particularly those involving dental consonants. Consider the following examples, in each of which a hard consonant precedes /e/: seks 'sex', test 'test', arterioskl'eroz 'arteriosclerosis', g'eteromorfizm 'heteromorphism', dekagramm 'decagram'. In each of these cases, the vowel /e/ is spelled with е ⟨e⟩: thus dekagramm is spelled декаграмм ⟨dekagramm⟩, and seks is spelled секс ⟨seks⟩. From the point of view of the reader of Russian, these cases involve an irregular usage of the written vowel е ⟨e⟩. But there is in fact nothing irregular in the spelling here: what is irregular is the phonology. The fact that
the vowel is spelled with е ⟨e⟩ follows from general constraints on Russian orthography. This is because the only other way that the vowel could be spelled would be as э ⟨e⟩, but this vowel symbol is generally disallowed in non-syllable-initial position. As we shall suggest in Section 3.5, this constraint is best expressed as a surface orthographic constraint of Russian. What this entails, then, is that the unusual phonological structures in the cases we have been considering are spelled in the only way they can be, with е ⟨e⟩. Note that the spelling rules that we presented in (3.2) already capture these facts, since postconsonantal /e/ is rewritten by these rules as е ⟨e⟩.
One genuine case of orthographic lexical marking involves the genitive endings -evo and -ovo, which are always written его ⟨ego⟩ and ого ⟨ogo⟩. Thus for the word bol'šovo 'big' (masculine/neuter genitive singular) it will be necessary only to specify that the /v/ is written as г ⟨g⟩.
Another and somewhat more troubling case that requires lexical specification under the present theory involves prefixes such as raz- or bez-, which assimilate in voicing to the following consonant, following regular principles of Russian phonology. Basically, word-final obstruents in Russian are always voiceless, and obstruents assimilate in voicing to a following obstruent in a cluster; this assimilation applies across words as well as within words.13 With one class of exceptions, assimilation and devoicing of obstruents are never reflected in the orthography of Russian. Thus город ⟨gorod⟩ 'city' is written with ⟨d⟩, even though the final /d/ is actually pronounced /t/; similarly, the phrase без пальца ⟨bez pal'ca⟩ 'without a finger' is written with ⟨z⟩, even though the final /z/ of bez assimilates in voicing to the following /p/, and is thus really /s/. The systematic class of exceptions to this generalization are the aforementioned prefixes (along with v(o)z- and iz-; see (Wade, 1992, page 16)). Thus we have безбожный ⟨bezbožnyj⟩ 'godless', but беспалый ⟨bespalyj⟩ 'fingerless'; раздумье ⟨razdum'e⟩ 'thought', but расписка ⟨raspiska⟩ 'receipt'. Note that this exceptional spelling is only found with prefixes ending in (underlying) voiced obstruents. Those such as s- that are underlyingly voiceless retain their expected spellings when preceding voiced obstruents (in which case their final consonants become phonologically voiced): thus sdavat' /zdʌvat,/ 'to hand in' is written сдавать ⟨sdavat'⟩, not *здавать ⟨zdavat'⟩.
In addition to voicing assimilation, akan'je is also represented in the orthography for the prefix raz-: indeed, the underlying form of raz- is really roz-. Thus, under stress it is spelled роз ⟨roz⟩ (or рос ⟨ros⟩): compare россыпь ⟨rossyp'⟩ 'mineral deposit' versus рассыпать ⟨rassypat'⟩ 'spill, scatter'. The alternation is not marked for bez-. Thus one never finds the spelling *биз ⟨biz⟩: in беспалый ⟨bespalyj⟩ 'fingerless', the vowel in the prefix is /ɪ/, not /e/, despite the spelling.
The behavior of these prefixes would appear to require a relaxation of the Consistency assumption, since they would seem to involve an ORL that is much later than the ORL that we have been assuming for the rest of the Russian vocabulary. On the face of it, the facts are reminiscent of Pesetsky's (1979) unpublished but influential analysis of Russian lexical phonology, wherein he argues that prefixes are phonologically on a later cycle than suffixes. Might it be, then, that the orthography of prefixes is similarly handled at a later level, and thus reflects vowel and consonant changes that have not
13 The facts are identical in Belarusian, and the discussion here would carry over rather directly to Belarusian.
taken place at the point at which the spelling of stems and suffixes is handled? Under this analysis we would have to abandon Consistency in favor of a kind of Constrained Inconsistency. The problem with this move is that, as we noted above, the spelling irregularities are by no means common to all prefixes: prefixes ending in voiceless consonants, such as s- or ot-, do not undergo these changes. Furthermore, ot- is always spelled от ⟨ot⟩ in Russian, even when the /o/ is reduced to /ə/ by akan'je. Thus the inconsistency would be constrained indeed, applying to just a few lexical entries.
The cleanest solution to the problem within the current framework is to assume that this class of prefixes has a special set of spelling rules that is sensitive to the voicing of the following consonant and (for roz-) the stress on the prefix. Thus, we assume according to the Consistency hypothesis that the ORL representation of беспалый ⟨bespalyj⟩ is /bezpalyj/. The normal spelling rule for /z/ would of course give з ⟨z⟩, but if we assume a rule such as that in (3.12), the /z/ will instead be written с ⟨s⟩. Similarly, the ORL representation of рассыпать ⟨rassypat'⟩ is /rossypat,/, and the spelling of /o/ as а ⟨a⟩ is accomplished by the rule in (3.13):
(3.12) z → с ⟨s⟩ / ___ [−voiced]
       Condition: /z/ is in one of the prefixes bez-, roz-, v(o)z- or iz-

(3.13) o → а ⟨a⟩
       Condition: /o/ is unstressed in the prefix roz-
The unusual spelling of underlying /z/ and /o/ in these cases is thus due to a form of lexical marking, this time a lexical condition on a rule. We are thus able to preserve Consistency, though at the cost of a redundant spelling rule.
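Rules (3.12) and (3.13) are ordinary rewrite rules with a lexical condition attached, which is easy to mimic by letting the identity of the prefix gate each rule. The Python sketch below is hypothetical: the input is a transliterated prefix plus the first segment of the following stem, and the set of voiceless segments is an assumption made for the example.

```python
# Hypothetical transliterated input; output is Cyrillic.
LETTERS = {"b": "б", "e": "е", "z": "з", "r": "р", "o": "о", "v": "в",
           "i": "и", "s": "с", "t": "т"}
VOICELESS = set("ptkfsxc")                      # assumed [-voiced] segments
RULE_312_PREFIXES = {"bez", "roz", "vz", "voz", "iz"}

def spell_prefix(prefix, next_segment, prefix_stressed=False):
    """Spell a prefix, applying the lexically conditioned rules (3.12)-(3.13)."""
    out = []
    for i, seg in enumerate(prefix):
        final_z = seg == "z" and i == len(prefix) - 1
        if final_z and prefix in RULE_312_PREFIXES and next_segment in VOICELESS:
            out.append("с")  # (3.12): z -> с before a voiceless segment
        elif seg == "o" and prefix == "roz" and not prefix_stressed:
            out.append("а")  # (3.13): unstressed o -> а in roz-
        else:
            out.append(LETTERS[seg])
    return "".join(out)

print(spell_prefix("bez", "p"), spell_prefix("bez", "b"))  # бес без
print(spell_prefix("roz", "s"), spell_prefix("roz", "s", prefix_stressed=True))  # рас рос
print(spell_prefix("s", "d"), spell_prefix("ot", "b"))     # с от
```

Prefixes outside the lexically specified set, such as s- and ot-, simply fall through to the regular letter mapping, which is the behavior the text describes.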
3.1.4 Summary of Russian and Belarusian
The previous discussion has offered a comparative analysis of a portion of Russian and Belarusian orthography, and of the relationship of those orthographic systems to the phonologies of the languages. Of course, a full evaluation of the model's applicability to these writing systems awaits a complete description of the orthography, as well as a complete description of the relevant phonological phenomena, something that is certainly beyond the scope of this work. Nonetheless, the data presented here are at least consistent with the Consistency hypothesis, which is what we aimed to show. Thus we conclude that the ORLs in Russian and Belarusian are Consistent levels, and furthermore that there is a great similarity in the two systems of spelling (M_ORL→Γ). The main difference between Russian and Belarusian orthography lies in the depth of the ORL.
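The notion of ORL depth can be made concrete by treating a derivation as a cascade of rules and the ORL as the output of some initial portion of it. In the toy Python sketch below, the two rules (a simplified akan'je and final devoicing) are hypothetical stand-ins rather than analyses of either language. With depth 0 the ORL is the UL, as assumed for Russian город ⟨gorod⟩; with depth 1 it reflects vowel reduction but not devoicing, as in Belarusian горад ⟨gorad⟩.

```python
VOWELS = "aeiouɨ"
DEVOICE = {"b": "p", "d": "t", "g": "k", "v": "f", "z": "s"}

def akanje(form):
    """Toy reduction: every unstressed /o/ becomes /a/."""
    out, stress_seen, stressed_done = [], False, False
    for ch in form:
        if ch == "ˈ":
            stress_seen = True
        elif ch in VOWELS and stress_seen and not stressed_done:
            stressed_done = True  # the first vowel after "ˈ" keeps its quality
        elif ch == "o":
            ch = "a"
        out.append(ch)
    return "".join(out)

def final_devoicing(form):
    """Toy devoicing of a word-final obstruent."""
    return form[:-1] + DEVOICE.get(form[-1], form[-1]) if form else form

CASCADE = [akanje, final_devoicing]

def stages(ul, rules):
    """All levels of a derivation; stage 0 is the underlying level."""
    levels = [ul]
    for rule in rules:
        levels.append(rule(levels[-1]))
    return levels

def orl(ul, rules, depth):
    """The ORL is the stage reached after the first `depth` rules."""
    return stages(ul, rules)[depth]

print(orl("ˈgorod", CASCADE, 0))      # ˈgorod  (Russian-style ORL)
print(orl("ˈgorod", CASCADE, 1))      # ˈgorad  (Belarusian-style ORL)
print(stages("ˈgorod", CASCADE)[-1])  # ˈgorat  (surface)
```

On this view, the suggestion made above for jakan'je-B amounts to placing that rule after the cut point, so that it shapes the pronunciation without being represented in the spelling.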
3.2 English
As we have seen, the orthographies of Russian and Belarusian are both quite regular (i.e., in the sense of "rule-governed"), the only difference being in the level of abstractness of the ORL: in Russian, one represents in the written form a level of phonological representation that is closer to an "underlying" representation than what one represents in Belarusian. Under that assumption, relatively little orthographic information needs to be lexically marked. In contrast, if we assumed a shallower ORL for Russian, then a large portion of the vocabulary, particularly those words with lexical /o/ or /a/, both of which surface as /ə/ or /ʌ/ in unstressed positions, would require orthographic information to be marked. That is, if the spelling города ⟨goroda⟩ is taken to represent a phonological representation such as /gərʌˈda/, then the /ə/ and the /ʌ/ will each need to be lexically marked as being written as о ⟨o⟩, since either one could equally well have been written а ⟨a⟩, yielding the same pronunciation. Of course, our assumption does not allow us to avoid lexical marking in Russian (or Belarusian) completely: for instance, we considered the irregularly pronounced г ⟨g⟩ /v/ in the genitive endings его/ого ⟨ego⟩/⟨ogo⟩. But such items would require lexical marking of some kind in any case, since they patently fall outside the regular system of Russian spelling.
The relatively clear status of Russian as a "deep" orthography brings us to the question of how to characterize the orthography of Modern English, another phonographic writing system that has been described as "deep". Of course, even a cursory knowledge of English spelling leads one immediately to the conclusion that the system of English spelling is a great deal more chaotic than that of Russian, or indeed of almost any other language that uses a script whose original design was purely phonological,14 a fact that has not gone unnoticed by scores of spelling reformers from the seventeenth century to the present day (Venezky, 1970, pages 19ff.). Nonetheless, this observation has not prevented various authors from attempting to find regularity in the system. One such enterprise was the classic work of Venezky (1970), who argued that the relation between spelling and pronunciation (he was primarily interested in the mapping in this direction, rather than the other way around) was governed by clear sets of ordered rules.15 In Venezky's system, a spelling such as ⟨social⟩ was mapped to the morphophonemic representation {sosIæl} by a set of grapheme-to-morphophoneme rules (page 46), and thence to a surface pronunciation by phonological rules. Rules in Venezky's system were arranged in ordered blocks. Thus one block states that initial ⟨h⟩ corresponds to morphophonemic {∅} in heir, (American English) herb, honest, honor and hour; that medial preconsonantal and final ⟨h⟩ is {∅}; and that ⟨h⟩ is elsewhere {h} (page 74). Interestingly, Venezky says relatively little of a systematic nature about
14 One exception is the orthography of Manx Gaelic, a system that is based on the orthography of English, and whose apparent arbitrariness approaches that of English in some respects; we will discuss the writing system of Manx later on in Section 6.1.
15 Another important work that presents a rule-based account of English spelling is Cummings's (1988) treatment of American English spelling. Cummings's work is relatively unusual in the literature on English orthography in that he deals, as we do in this section, with the problem of predicting spelling from pronunciation rather than pronunciation from spelling; as we have just noted, Venezky's work, for instance, dealt with the problem of inferring pronunciation from spelling.
3.2. ENGLISH 79
the influence of morphology: the vowel-shift-related alternations such as those exemplified in extreme–extremity, which were to play so central a role in the early development of generative phonology, were treated only briefly (pages 108–109) in Venezky's discussion.
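Venezky's ordered treatment of ⟨h⟩ can be illustrated with a short sketch in which the clauses are tried in order and the first match wins. This is a hypothetical reconstruction from the summary above, not Venezky's own formalism. It ignores digraphs such as ⟨th⟩ and ⟨sh⟩, which his other rules would handle, lists only the exception words mentioned here, and writes ∅ for a silent ⟨h⟩.

```python
# Words named above as having a silent initial <h> (herb: American English).
SILENT_INITIAL_H = {"heir", "herb", "honest", "honor", "hour"}
VOWEL_LETTERS = set("aeiouy")

def h_values(word):
    """Morphophonemic value of each <h> in a lowercase word, in order."""
    values = []
    for i, ch in enumerate(word):
        if ch != "h":
            continue
        if i == 0:
            # initial <h>, with lexically listed exceptions
            values.append("∅" if word in SILENT_INITIAL_H else "h")
        elif i == len(word) - 1 or word[i + 1] not in VOWEL_LETTERS:
            # medial preconsonantal or final <h>
            values.append("∅")
        else:
            # elsewhere
            values.append("h")
    return values

for w in ["hour", "house", "john", "ah", "behave"]:
    print(w, h_values(w))
```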
It is tempting to classify Venezky's model in our terms as being one where the ORL (his morphophonemic level) corresponds more or less to an underlying morphophonemic representation. But this is not strictly accurate: Venezky was operating within a pre-generative American structuralist set of assumptions,16 within which such notions as underlying representation were not available.
Within the generative phonological tradition, however, such notions as deep or surface structure are explicitly assumed (or at least were until about a decade ago), and it was within this set of assumptions that arguably the most radical statement about the nature of English spelling was made. Chomsky and Halle (1968) argue that English orthography is, despite appearances, a near-optimal spelling system, the key assumption being that what is represented is not a surface phonemic representation, but rather a quite abstract lexical representation. The claim was made largely on the basis of such alternations as assign–assignation. For this pair, the surface phonemic representation /əˈsaIn/–/æsIgˈneIšən/ gives no clue as to why there should be a ⟨g⟩ in the spelling of assign, or why ⟨i⟩ should represent two such different vowels. The deep representation, which would be something like /æˈsīgn/ and /æsīgˈnatyon/ respectively, makes this clear, as both forms have a /g/ (removed by subsequent rules in assign), and both forms have the same /ī/, which undergoes vowel shift in assign and trisyllabic laxing in assignation. Chomsky and Halle go so far as to claim that English orthography, far from being the unfortunate system it is usually taken to be, is in fact close to an ideal system of orthography. This is because "the fundamental principle of orthography is that phonetic variation is not indicated where it is predictable" and "an optimal orthography would have one representation for each lexical entry" (1968, page 49) (also cited in (Sampson, 1985, page 200)).
Needless to say, few scholars of writing systems would agree with Chomsky and Halle's rather prescriptive statement about orthographic principles, a statement that is neither obviously true as a statistical generalization about writing systems cross-linguistically nor interpretable as anything other than a statement of personal taste about how writing systems should be designed.17 Furthermore, and more directly relevant to our current discussion, several authors have taken issue with the specific claim about English. For example, Sampson (1985) notes several rather serious problems with Chomsky and Halle's position. One significant problem is that even if English spelling is assumed to represent a deep level, it can hardly be said to be consistent in its representation. For instance, while assign–assignation show retention of the same spelling across different derived forms, some other presumably morphologically related pairs do not, even in cases where there is no change in pronunciation: consider the alternation in spelling for the vowel /i/ in ⟨speech⟩–⟨speak⟩; or the alternation in spelling for the pair ⟨collide⟩–⟨collision⟩, where the pronunciation of underlying /d/ as /ž/ should be predictable in this morphophonological environment, so that one ought in principle to be able to spell the word collision as ⟨collidion⟩ (Sampson, 1985, page 201); or consider such minimal pairs as ⟨race⟩–⟨racial⟩ versus ⟨space⟩–⟨spatial⟩, where the phonological alternation (/s/ versus /š/) is identical, as are the morphological environments, and yet in one case the spelling ⟨c⟩ is retained in the -ial derivative, whereas in the other the /š/ is spelled as ⟨t⟩.

16 He does not even cite The Sound Pattern of English, even though that work appeared two years prior to the publication of his monograph.
17 Having said that, it is certainly true that there seems to be a "tension" between what one might term phonological faithfulness on the one hand, and morphological faithfulness on the other. That is, writing systems often face a choice between representing a word in a form that reflects its (surface) pronunciation, and representing the morphemes of a word in a fashion consistent with their spelling in other related words. This is hardly a new observation, and linguists have made it in various forms for decades. Russian orthography can be said to have addressed both problems rather elegantly, in the sense that morphologically related forms (at least those that are related by fairly regular and general phonological alternations) are consistently spelled, yet going from the spelling to the pronunciation is also quite straightforward, given of course that one has certain lexical information in hand.
One might be tempted to state the notions of phonological faithfulness and morphological faithfulness as soft constraints, and to explain various spellings by means of different rankings of these constraints in the manner of Optimality Theory: indeed what we have termed morphological faithfulness is quite similar in
After discussingvariousotherapproachesto Englishspelling,Sampsonproposes(page203):
We may seeanotherkind of methodin the apparentmadnessof ourspelling,though,if we avoid letting ourselvesbeobsessedby thephono-graphicoriginsof theRomanalphabetandthink of Englishspellingasatleastpartly logographic.
The proposalthat Englishspellinghaslogographicpropertiesis certainlya widelyexpressedone: for example,as Bloomfield notes(Bloomfield and Barnhart,1961,page27) (usingthetermword writing for logographic):
Now someonemay askwhetherthe spellingof knit with k doesnotserve to distinguishthis word from nit âthe egg of a louseâ. Of courseit does,andthis is exactly whereour writing lapsesfrom the alphabeticprinciplebackinto theolderschemeof wordwriting. Alphabeticwriting,which indicatesall thesignificantspeechsoundsof eachword, is just asclearasactualspeech,whichmeansthatit is clearenough.Word writing,on theotherhand,providesa separatecharacterfor eachandevery word. . .Ourspellingtheverbknit with anextrak (andthenounnit without thisextrak) is astepin thedirectionof word writing.
While thereis certainlysomemerit in thisview, I feel thatit is importantto distinguishwriting systemswith a truelogographiccomponent,suchasChineseor AncientEgyp-tian, from theratherhaphazardâpseudologographicâpropertiesof Englishwriting.
On the onehand,many of the logographiccomponents(the so-calledâsemanticradicalsâ) of Chinesecharactersseemto representsemanticaspectsof morphemesin a surprisinglyconsistentway. As we have notedelsewhere(Sproatet al., 1996),many semanticradicalsin Chinesearequite consistentin the semanticinformation
spirit to paradigmuniformity, a principle that hasbeenproposedasa phonologicalconstraintby Steriade(1999)andothers.I think, however, thattheseâprinciplesâ maybea little too vagueandfuzzy to allow forthis treatment,at leastatpresent.
3.2. ENGLISH 81
thatthey mark. Thusin thelists presentedin (Wieger, 1965,pages773â776),254outof 263(97%)characterswith thesemanticradical
U Ăš INSECT Ăť denotecrawling orinvertebrateanimals;for ~ Ăš GHOST Ăť (page808),21 out of 22 (95%)denoteghostsor spirits. The semanticinformationprovidedby theselogographicelementsis thusstrikingly consistent.As wealreadyproposedin Section1.2.2,for aChinesewordlikeďż˝ Ăš INSECT+CHAN Ăť chan âcicadaâ, the insectradical
U Ú INSECT Ý representsaportionof thesemanticfeaturesetfor themorpheme,whereasthephonologicalportion(in thiscase�tÚ CHAN Ý ) representsthepronunciationassociatedwith themorpheme.
On the other hand, it is hard to find any such consistency in the "logographic" aspects of English orthography: for instance, the set of words orthographically distinguished from other words by an initial silent ⟨k⟩ (knit, know, knight, knave. . .) forms no natural class, and the most we can say about the ⟨k⟩ here is that it is an orthographic element with no corresponding phonological element. Thus the word knit might be represented as in (3.14), repeated in part from (1.6a): note again that while the representation of /n/ as ⟨kn⟩ is idiosyncratic, the remaining spelling ⟨it⟩ is in this case predictable given the phonological form, and does not need to be specified:
(3.14)  [ PHON  /n₁It/ ]
        [ ORTH  ⟨kn⟩₁  ]
It would of course be completely unmotivated to say that the ⟨k⟩, or ⟨kn⟩, here corresponds to any non-phonological content of the feature structure of this word, and so we really have no parallel here to the Chinese case.
All of which still leaves us with the question of how best to characterize English orthography in terms of the model we are developing. The interesting question from our perspective is not how to deal with irregularities such as the ⟨k⟩ in knit, but rather how to characterize the phonological level of representation that is represented by the regular spelling. In our terms: how deep is the ORL for English spelling? A definitive answer to this question would require a complete analysis of the spelling of a large portion of the English vocabulary, something akin to Nunn's (1998) treatment of Dutch spelling: one which systematically analyzes how well the standard orthography of, say, American English corresponds both to the standard (surface) pronunciation and to a plausible proposal for the underlying representation of each word.18
In this section I will describe a small analysis of a kind that should eventually be done on a more complete scale. I selected 1169 words from an on-line dictionary with their (American) spellings and phonemic representation for a standard American pronunciation. These words consist mostly of forms that are at least in part Latin- or Greek-derived and show alternations of the kind that were central to the arguments for vowel shift, laxing, velar softening, and some other consonant alternations in (Chomsky and Halle, 1968) (SPE). Thus we find examples like abound–abundance or heliocentric–heliocentricity–heliocentricism. In addition to the dictionary-derived surface phonemic representation, I also reconstructed an SPE-style underlying representation. Thus the underlying forms of abound and abundance are assumed to be, respectively, /æˈbūnd/ and /æˈbūndans/; similarly, the underlying forms of electric and electricity are, respectively, /ɛˈlɛktrIk/ and /ɛlɛkˈtrIkIti/. Here and elsewhere /ū/ is used to represent underlying /u/ that diphthongizes to /aU/; tense /u/ as in super, which does not diphthongize, is represented without the overbar; /k/ represents a /k/ that undergoes velar softening (similarly /g/). Where I perhaps depart from SPE is in only positing abstract forms in cases where there is plausible evidence for an alternation. Thus since there is no evidence for the first /s/ in cervix alternating with anything else, I represent this word underlyingly as /ˈsɛrvIks/ (but note the penultimate /k/, as evidenced by the form cervicitis, where the /k/ has undergone velar softening). As a general rule, I assume that there are no schwas in underlying representation, only full vowels: since the posited full vowels are generally posited on the basis of the orthography, this necessarily biases the analysis, and this point should be borne in mind. The complete list of words along with their surface and posited underlying forms is given in Appendix 3.A.1 at the end of this chapter.

18 Alternatively one might consider a data-oriented approach for measuring the relation between the spelling and various proposals for the ORL. Van den Bosch and his colleagues (van den Bosch et al., 1994) investigated three data-oriented learning methods for measuring the relative complexity of English, French and Dutch orthographies, more specifically the complexity of the relation between the spelling and the surface phonemic representation that one would find in a dictionary. They proposed the inverse of the various models' performance as a measure of the complexity of each system. Of course, one weakness of their approach from our point of view is that they took it for granted that the surface phonemic pronunciation is the correct level to relate the spelling to. Our thesis that the depth of the ORL differs among languages suggests that their assumption is not necessarily valid.
What other assumptions do we then need to make in order to predict the spelling from each of these levels? More specifically:
• What rules do we need to assume for the mapping from each ORL to the spelling?
• And what lexical marking of orthographic properties must we assume?
Needless to say, there are various ways in which one could juggle these two kinds of devices, and what I present here should be understood as just one possible way, and not necessarily the best one.
Lexical specifications needed for each form are given in Appendix 3.A.1. In that list, orthographic specifications are given by subscripted letters in angle brackets. Thus /æˈbū⟨u⟩ndans/ denotes the fact that the /ū/ is spelled as ⟨u⟩. The notation ε⟨l⟩ denotes a case where the letter ⟨l⟩ corresponds to no phonological material. (In some cases one could alternatively have coalesced the added letter with the spellout of a preceding or following phoneme, as was done with ⟨knit⟩ above.) The two other main devices are the (subscripted) feature [+db], used to mark a consonant that is to be represented orthographically as double;19 and the feature [+gk], which is used to mark words that have the Greek spellings ⟨ph⟩ for /f/ and ⟨rh⟩ for (initial) /r/. In some cases, particularly with plural +es, morpheme boundary information is needed in order to predict the appropriate spelling; the morpheme boundary is marked as "+". The second and third columns of the table in Appendix 3.A.1 constitute proposals for deep and shallow ORLs for these English words.
19 It might be believed that doubled consonants in English spelling are predictable, but in fact this is not the case. It is certainly true that in general a double consonant is an indicator that the preceding vowel is lax (cf. (Venezky, 1970, pages 106–107), inter alia). But the implication does not go the other way.
The (ordered) rules corresponding to the deep and shallow ORLs are given in Appendix 3.A.2 and Appendix 3.A.3 respectively. In many cases the import of the particular rule should be clear: in cases where it is not, some commentary is added. Both of these rule sets have been tested with their corresponding ORLs, to verify that the rule sets applied to their ORLs do indeed derive the correct spellings for all words.20
To illustrate the difference between the assumption of a deep versus a shallow ORL, consider the word audacity, the AVM representations of which are given in (3.15a) (deep) and (3.15b) (shallow). In this word, only a shallow ORL would require a lexical marking, namely the specification of ⟨c⟩ for the spelling of the /s/, indicated in Appendix 3.A.1 as a subscripted ⟨c⟩.
(3.15) (a) [ PHON /ɔˈdakIti/, ORTH ⟨ ⟩ ]
(b) [ PHON /ɔˈdæs₁Iti/, ORTH ⟨c⟩₁ ]
The rules needed to account for the spelling given these two ORLs are listed in (3.16). The numbers in the last two columns indicate the rule number for the Deep ORL (Appendix 3.A.2) and Shallow ORL (Appendix 3.A.3), respectively. Note that in some cases slightly different rules are needed for the two ORL levels. No rule is needed for the ⟨c⟩ spelling for the Shallow ORL, since this spelling is lexically specified.
(3.16)  Spelling   Rule              Deep Rule #   Shallow Rule #
        ⟨au⟩       ɔ → ⟨au⟩          1             2
        ⟨d⟩        d → ⟨d⟩           22            24
        ⟨a⟩        a → ⟨a⟩           45            –
                   æ → ⟨a⟩           –             48
        ⟨c⟩        k → ⟨c⟩           21            –
        ⟨i⟩        I → ⟨i⟩           52            57
        ⟨t⟩        t → ⟨t⟩           33            35
        ⟨y⟩        i → ⟨y⟩ / __#     58            50
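The contrast in (3.16) can be made concrete with a toy spell-out function. This is a minimal sketch, not the rule sets of Appendices 3.A.2 and 3.A.3: it assumes that each phoneme has a single default spelling (context-free, except for word-final /i/ → ⟨y⟩), omits stress, and encodes a lexical mark as a hypothetical (phoneme, letters) pair that bypasses the rules. Segment symbols follow (3.15).

```python
# Toy phoneme-to-letter rules drawn from (3.16). The real rule sets are ordered
# and context-sensitive; these dictionaries are a deliberate simplification.
DEEP_RULES = {"ɔ": "au", "d": "d", "a": "a", "k": "c", "I": "i", "t": "t"}
SHALLOW_RULES = {"ɔ": "au", "d": "d", "æ": "a", "I": "i", "t": "t"}


def spell(segments, rules):
    """Spell a list of segments; a (phoneme, letters) tuple is a lexical mark."""
    letters = []
    for position, segment in enumerate(segments):
        if isinstance(segment, tuple):
            letters.append(segment[1])          # lexically specified spelling
        elif segment == "i" and position == len(segments) - 1:
            letters.append("y")                 # i -> <y> / __#
        else:
            letters.append(rules[segment])      # default rule
    return "".join(letters)


def count_marks(segments):
    """Number of lexical marks a representation needs."""
    return sum(isinstance(segment, tuple) for segment in segments)


# audacity: deep /ɔdakIti/ versus shallow /ɔdæsIti/ with /s/ marked as <c>
deep = ["ɔ", "d", "a", "k", "I", "t", "i"]
shallow = ["ɔ", "d", "æ", ("s", "c"), "I", "t", "i"]

print(spell(deep, DEEP_RULES), count_marks(deep))           # audacity 0
print(spell(shallow, SHALLOW_RULES), count_marks(shallow))  # audacity 1
```

Under these assumptions the deep representation needs no mark because k → ⟨c⟩ is handled by rule, while the shallow one pays for surface /s/ with one lexical mark; tallying such marks over the whole fragment is what the following discussion does.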
We turn now to a discussion of the fragment presented in Appendix 3.A. First of all, there is a clear difference in the number of rules needed in each case, with 58 rules for the deep ORL, and 69 for the shallow ORL. (Some of these rules could have been combined; this would change the overall counts, but not the relative sizes of the two sets.) More interesting are the lexical markings given in Appendix 3.A.1. We discount the [+gk] and [+db] markings, which are generally needed under either assumption of the depth of the ORL. For the deep ORL, 389 (33%) of the words require lexical marking, with 509 total marks being needed. For the shallow ORL, in contrast, 892/1169 (76%) of the words require some lexical marking, with 1452 total marks being used. So the shallow ORL is certainly a more costly assumption, particularly with respect to the amount of lexical marking, but also to some extent with respect to the number of required rules. This much supports Chomsky and Halle's position. However, when one considers the distribution of the marks, the situation is less convincing. The ten most common lexical specifications in the shallow ORL, covering 1311/1452 (90%) of the cases, are given in Table 3.1. Similarly, the ten most common lexical specifications in the deep ORL, covering 453/509 (89%) of the cases, are given in Table 3.2. Among the markings for the shallow ORL, four relate to the spelling of reduced vowels, /ə/ and /I/ (as ⟨e⟩); one involves the spelling of /s/ as ⟨c⟩, as in electricity; two involve the irregular representation (mostly in Greek-derived words) of /I/ and /aI/ as ⟨y⟩; one involves writing /i/ as ⟨i⟩ (rather than the more normal ⟨e⟩). Finally, we have specifications of /k/ as ⟨ch⟩ (in Greek words) and /z/ as ⟨s⟩. Of these, five do not occur in some form among the top ten for the deep ORL markings: the four reduced vowel marks, and the specification of /s/ as ⟨c⟩. In contrast, ⟨y⟩ spellings for varieties of /i/, ⟨ch⟩ spellings for /k/, ⟨s⟩ for /z/ and ⟨i⟩ for /i/ are needed as lexically specified markings even under the assumption of a deep ORL. The need to mark the spelling of reduced vowels under the assumption of a shallow ORL is of course unsurprising: I believe it is necessary to assume that in English, as in Russian, the ORL corresponds at least to a phonological representation that contains full rather than reduced vowels. Similarly, the necessity of marking the ⟨c⟩ spelling of velar-softening-derived /s/ in a shallow ORL would appear to be some evidence in favor of an SPE-style deep level for the ORL.

20 The system was developed and tested using the lextools finite-state linguistic analysis toolkit developed at AT&T Bell Labs, and described in (Sproat, 1997b; Sproat, 1997a).
On the other hand, some other aspects of deep structure, which were important in the analysis in SPE, turn out to have much less importance than one might expect. It makes little difference, for example, that there is a vowel alternation in the pair chaste–chastity, an alternation that is abstracted away from in the underlying representation: in the case of a deep ORL, the underlying vowel of the stem /a/ is mapped to ⟨a⟩; with a shallow ORL, we simply have rules that map the two distinct vowels /æ/ and /eI/ to ⟨a⟩. No lexical marking is required. Similarly, while an alternation like assign–assignation does require lexical marking for the shallow ORL (since we must simply mark the fact that in the word assign, the sequence /aIn/ is spelled ⟨ign⟩), there are only six such cases in our list, not an overwhelming amount of lexical marking.
Interestingly, one clear case that requires lexical marking with a deep ORL but not with a shallow ORL involves alternations such as abound–abundance. The underlying representation of the second vowel in this pair of words is presumably uniformly /ū/, so one would expect a consistent spelling, e.g. ⟨ou⟩. Yet in this case, the spellings, which alternate in parallel with the phonological vowel alternations, are more consistent with a shallow ORL than with a deep ORL, contrary to what one might expect from SPE.
What do we conclude from all of this? There seems to be some evidence for the English ORL being relatively deep, something that is hardly surprising. On the other hand, with the exception of ⟨c⟩ in velar softening cases, the considerations that figured prominently in Chomsky and Halle's discussion of English spelling do not in fact seem to be of such great importance. One further point should be borne in mind: we have considered about 1100 words, carefully selected to exhibit the kinds of alternations under discussion in SPE. This is hardly a representative sample of the English vocabulary, either in terms of the raw count, or in terms of the properties the words exhibit. Indeed, an examination of a larger fragment of the vocabulary would probably make the argument for a deep ORL less convincing: the majority of English words simply do not participate in the kinds of alternations exhibited by the subset considered here.

Phoneme   Orthographic mark   Number of cases
/ə/       ⟨o⟩                 395
/s/       ⟨c⟩                 242
/ə/       ⟨a⟩                 170
/aI/      ⟨y⟩                 123
/I/       ⟨y⟩                 112
/I/       ⟨e⟩                  67
/i/       ⟨i⟩                  63
/ə/       ⟨i⟩                  61
/k/       ⟨ch⟩                 47
/z/       ⟨s⟩                  31

Table 3.1: The ten most frequent lexical markings for the shallow ORL in the English fragment.

Phoneme   Orthographic mark   Number of cases
/ī/       ⟨y⟩                 180
/s/       ⟨c⟩                  84
/I/       ⟨y⟩                  70
/k/       ⟨ch⟩                 47
/z/       ⟨s⟩                  27
/u/       ⟨u⟩                  12
/i/       ⟨i⟩                   9
/k/       ⟨k⟩                   8
/i/       ⟨e⟩                   8
/u/       ⟨eu⟩                  8

Table 3.2: The ten most frequent lexical markings for the deep ORL in the English fragment.
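As an arithmetic check on the figures cited in this section, the following sketch recomputes the coverage of the ten most frequent markings from the counts in Tables 3.1 and 3.2, and the proportions of lexically marked words. It uses only numbers stated in the text; the assumption is that the reported percentages are rounded to the nearest integer.

```python
# Counts transcribed from Tables 3.1 (shallow ORL) and 3.2 (deep ORL).
SHALLOW_TOP10 = [395, 242, 170, 123, 112, 67, 63, 61, 47, 31]
DEEP_TOP10 = [180, 84, 70, 47, 27, 12, 9, 8, 8, 8]

TOTAL_WORDS = 1169
SHALLOW_MARKS, DEEP_MARKS = 1452, 509              # total marks per ORL
SHALLOW_WORDS_MARKED, DEEP_WORDS_MARKED = 892, 389


def percent(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


shallow_top = sum(SHALLOW_TOP10)
deep_top = sum(DEEP_TOP10)
print(shallow_top, percent(shallow_top, SHALLOW_MARKS))   # 1311 90
print(deep_top, percent(deep_top, DEEP_MARKS))            # 453 89
print(percent(SHALLOW_WORDS_MARKED, TOTAL_WORDS),
      percent(DEEP_WORDS_MARKED, TOTAL_WORDS))            # 76 33
```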
To reiterate the caveats that we have already presented, one naturally must take the analysis presented here with at least a small grain of salt: in particular, if one makes different assumptions about the underlying representations, then one would arrive at different results. Still, I should be surprised if they turned out to be too different, and I would expect the basic conclusion to remain the same: with the exception of the orthographic representation of reduced vowels, which is more elegantly handled if one assumes a relatively deep ORL, the evidence for a deep "morphological" ORL in English is equivocal.
3.3 The Orthographic Representation of Serbo-Croatian Consonant Devoicing
An interesting prima facie counterexample to the Consistency hypothesis is found in Serbo-Croatian, and involves the spelling of dental obstruents before /s/ and /š/.21 According to the standard description of Serbo-Croatian, obstruent clusters agree in voicing, with the voicing of the cluster being determined by the final member of the cluster. Thus alongside svezati 'to bind', one finds sveska 'notebook'; beside redak 'line', one finds the genitive singular form retka. This much is in common with other Slavic languages such as Russian. What is unusual about Serbo-Croatian is that these voicing assimilations are reflected in the orthography, so that a /b/ that has become devoiced to a /p/, for instance, is written as ⟨p⟩ rather than ⟨b⟩. The modern Serbo-Croatian orthography, due to Vuk Karadžić (1787–1864), is often cited as an instance of a "shallow" orthography (see Chapter 5 for some further discussion of this point), and one of the features of this "shallowness" is that it spells words according to their surface phonetic realization: in popular parlance, Serbo-Croatian is written "as it sounds".
If this were the entire story, then Serbo-Croatian would be handleable under the present theory without further comment: the ORL would simply be a level at which voicing assimilation in obstruent clusters had already applied. There is, however, a systematic exception to the spelling principle we have just outlined: underlying /d/ when followed by /s/ or /š/ retains its spelling as ⟨d⟩, even though it is described as being voiceless. Note that in other environments (e.g., before /k/ or /p/) devoiced /d/ is spelled as ⟨t⟩, and that obstruents other than /d/ are spelled as their voiceless counterparts before /s/ or /š/. Thus prefix od- before pad- 'fall' yields otpad 'trash'; srb- 'Serb' yields srpski 'Serbian'. But grad- 'city' yields gradski 'urban', and od- plus šteta 'damage' yields odšteta 'compensation'.
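The spelling principle just described, together with its exception, can be sketched as a right-to-left pass over a concatenation of morphemes written in Latin letters. This is an illustration under simplifying assumptions, not a proposal from the text: only the letter pairs b/p, d/t, g/k, z/s, ž/š alternate; digraphs such as dž are not handled; v is treated as neither a trigger nor a target; and the function name spell is hypothetical.

```python
# Voiced -> voiceless letter pairs that alternate in this toy model.
DEVOICE = {"b": "p", "d": "t", "g": "k", "z": "s", "ž": "š"}
VOICE = {voiceless: voiced for voiced, voiceless in DEVOICE.items()}
VOICELESS_OBSTRUENTS = set("ptksšcčćfh")
VOICED_OBSTRUENTS = set("bdgzžđ")      # v deliberately excluded


def spell(*morphemes):
    """Write cluster voicing as pronounced, except that <d> is kept before s/š."""
    letters = list("".join(morphemes))
    # Right to left: the last obstruent of a cluster sets the cluster's voicing.
    for i in range(len(letters) - 2, -1, -1):
        current, following = letters[i], letters[i + 1]
        if following in VOICELESS_OBSTRUENTS and current in DEVOICE:
            if current == "d" and following in ("s", "š"):
                continue                # the systematic exception: keep <d>
            letters[i] = DEVOICE[current]
        elif following in VOICED_OBSTRUENTS and current in VOICE:
            letters[i] = VOICE[current]
    return "".join(letters)


for parts in [("od", "pad"), ("srb", "ski"), ("grad", "ski"), ("od", "šteta")]:
    print("+".join(parts), "->", spell(*parts))
# od+pad -> otpad
# srb+ski -> srpski
# grad+ski -> gradski
# od+šteta -> odšteta
```

Note that the exception has to be stated in terms of underlying /d/, which is precisely the information that a late, post-assimilation ORL would lack without some additional device.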
On the face of it, then, we would appear to have a problem: for most obstruents in most environments the evidence would appear to favor placing the ORL after obstruent voicing assimilation. But just in case we have an underlying /d/ preceding an /s/ or /š/, we seem to need to place the ORL earlier. In order to maintain Consistency, we would have to resort to one of two possible strategies, neither of which is palatable:
• Assume a late (post-voicing-assimilation) ORL, but mark underlying /d/ before /s/ or /š/ with a diacritic, so that the spelling rules can see the fact that it was /d/, and spell it accordingly.
• Assume an early (pre-voicing-assimilation) ORL. In this case one has access to the underlying segments, so there is no problem spelling underlying /d/ as ⟨d⟩ in gradski. Unfortunately, however, in most cases the obstruent is spelled according to its surface phonetic realization, meaning that one would in effect be duplicating, in the orthographic rules, the effects of voicing assimilation that are already handled in the phonology.

21 I am grateful to Wayles Browne for bringing this example to my attention.
The seemingly exceptional spelling of underlying /ds/ and /dš/ sequences is not merely a problem for Consistency, however. It is, in fact, a puzzle more generally. Why did Karadžić fail to spell the voiced stop as its voiceless counterpart in just this one case? Could it be, in fact, that such underlying /d/'s sound voiced, despite the standard description? If so, this would suggest, among other things, that obstruent voicing assimilation is not a unitary phenomenon, but applies to varying degrees under different conditions.
Support for the non-uniformity of obstruent voicing assimilation is already given by Browne (1993, page 317), who notes that assimilation to a voiced cluster-final obstruent and assimilation to a voiceless cluster-final obstruent behave differently with respect to the phonological rule of cluster breaking in nominal genitive plurals. Consonants that have become voiced by voicing assimilation remain voiced after being separated from the consonant that triggered voicing. Thus primetiti 'to remark' yields primedba 'comment' (noun); in the genitive plural primedaba, the acquired voicing on the /d/ is retained. On the other hand, the devoiced /z/ in sveska 'notebook' shows up again as a /z/ in the genitive plural form svezaka. Evidently, [+voice] assimilation is deeper in the phonology than [−voice] assimilation, insofar as a traditional analysis would order the former before cluster breaking, and the latter after.
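Browne's ordering argument can be checked with a small derivation sketch. It rests on simplifying assumptions that are not from the text: the stems primetb- and svezk- are hypothetical underlying forms, cluster breaking is modeled as inserting a before the stem-final consonant, and both the nominative singular and the genitive plural ending are written simply as -a. The point is only that ordering [+voice] assimilation before cluster breaking and [−voice] assimilation after it yields the attested forms.

```python
# Voiced -> voiceless letter pairs; VOICE is the inverse mapping.
DEVOICE = {"b": "p", "d": "t", "g": "k", "z": "s", "ž": "š"}
VOICE = {voiceless: voiced for voiced, voiceless in DEVOICE.items()}


def assimilate(form, value):
    """One-directional regressive voicing assimilation.
    value '+': voice a voiceless obstruent before a voiced one (a DEVOICE key).
    value '-': devoice a voiced obstruent before a voiceless one (a VOICE key)."""
    chars = list(form)
    for i in range(len(chars) - 2, -1, -1):
        current, following = chars[i], chars[i + 1]
        if value == "+" and current in VOICE and following in DEVOICE:
            chars[i] = VOICE[current]
        elif value == "-" and current in DEVOICE and following in VOICE:
            chars[i] = DEVOICE[current]
    return "".join(chars)


def break_cluster(stem):
    """Genitive plural cluster breaking: C1C2# -> C1aC2#."""
    return stem[:-1] + "a" + stem[-1]


def nominative_singular(stem):
    return assimilate(assimilate(stem, "+"), "-") + "a"


def genitive_plural(stem):
    form = assimilate(stem, "+")      # ordered before cluster breaking
    form = break_cluster(form)
    form = assimilate(form, "-")      # ordered after cluster breaking
    return form + "a"


for stem in ("primetb", "svezk"):
    print(nominative_singular(stem), genitive_plural(stem))
# primedba primedaba
# sveska svezaka
```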
To explain the orthographic facts, however, we are interested in an even finer-grained question: is there some reason to believe that [−voice] assimilation is less complete in the sequences /ds/ and /dš/ than it is in other sequences? As a preliminary answer to this question we conducted a pilot study of [−voice] assimilation in the speech of a single Croatian speaker. In this study we addressed the following questions:
1. Are underlying /d/ and /t/ before /s/ phonetically distinct with respect to their voicing profiles, contrary to standard descriptions?
2. Are underlying /b/ and /p/ before /s/ (both spelled ⟨p⟩) phonetically distinct?
3. How do underlying /d/ and /t/ (both spelled ⟨t⟩) before a non-sibilant obstruent (/k/ in our data) compare to these stops in the pre-/s/ position?
The study and its results are described in the two sections that follow.
3.3.1 Methods and materials
A list of Croatian words was prepared that covered the environments of interest for the questions above. Specifically, these words covered:
1. Underlying /d/ before /s/: akadski 'Acadian', gradski 'urban'.
2. Underlying /t/ before /s/: anegdotski 'anecdotal', hrvatski 'Croatian'.
3. Underlying /b/ before /s/: arapski 'Arab', mikropski 'microbial', ropski 'slavish', srpski 'Serbian'.
4. Underlying /p/ before /s/: mikroskopski 'microscopic'.
5. Underlying /d/ before /k/: glatka 'smooth', lutka 'doll', otpatke 'refuse (noun)', votka 'vodka'.
These words were printed, along with filler material, in four iterations, each with a different random order.
The subject of the experiment, a researcher at AT&T Labs, is a male native speaker of a Dalmatian dialect of Croatian. He has lived for over a decade in English-speaking countries, but his Croatian speech is self-described as normal for that dialect, and not affected by his exposure to English. He was not informed of the purpose of the experiment.22
The speaker was asked to read the words on the printed list at a normal rate of speed. His speech was recorded to DAT using a Bruel and Kjær microphone in a quiet room. The data were subsequently uploaded to a Silicon Graphics workstation, and high-pass filtered at 40 Hz to remove low-frequency noise. The speech was then segmented into words using the Entropic Research Laboratories waves+ package. Prediction of voicing was computed for each file using the Entropic get_f0 utility (Talkin, 1995). Note that the voicing profile for a speech file produced by get_f0 is a time series with two values, namely 1 for voiced and 0 for unvoiced. The individual files were then hand-labeled using waves+ and the xmarks utility for the following features:
• Onset of the pre-/s/ or pre-/k/ stop.
• Offset of the voicing within the stop.
• Onset of the following segment (/s/ or /k/).
The first and third of these were labeled based on visual inspection of the waveform and the spectrogram. The second was labeled based on the voicing profile. A typical waveform and voicing profile for the word gradski is shown in Figure 3.1.
3.3.2 Results
There are at least two plausible measures of the degree of voicing of a stop, given the voicing profile: one measure is the absolute duration of the interval between the onset of the stop and the offset of the voicing; another is the proportion or percentage of the stop that is voiced. As it turns out, both measures yield similar results in this study.
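The two measures can be stated precisely in terms of the three labels described above. The sketch below is an illustration, not the procedure used in the study: it derives the voicing offset automatically as the first unvoiced frame at or after the stop onset (in the study this point was hand-labeled from the profile), takes all times in integer milliseconds, and uses a made-up profile and frame step.

```python
def voicing_measures(profile, frame_ms, stop_onset_ms, next_onset_ms):
    """Return (voiced duration in ms, proportion of the stop that is voiced).

    profile: sequence of 0/1 voicing decisions; frame k is taken at k * frame_ms.
    The voicing offset is the first unvoiced frame at or after the stop onset;
    if the stop is voiced throughout, it is the onset of the following segment.
    """
    offset = next_onset_ms
    k = -(-stop_onset_ms // frame_ms)          # first frame inside the stop
    while k < len(profile) and k * frame_ms < next_onset_ms:
        if profile[k] == 0:
            offset = k * frame_ms
            break
        k += 1
    voiced = offset - stop_onset_ms
    return voiced, voiced / (next_onset_ms - stop_onset_ms)


# Invented profile: voiced up to 125 ms, unvoiced from 130 ms, 5 ms frames.
profile = [1] * 26 + [0] * 20
print(voicing_measures(profile, 5, 100, 180))  # (30, 0.375)
```

Because the proportion divides by the closure duration, two stops with the same absolute amount of voicing can differ in proportion when their closures differ in length, a point that becomes relevant for the /p/ versus /b/ comparison below.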
Let us deal first with the least surprising result: /d/, written as ⟨t⟩ before /k/, is clearly voiceless, essentially throughout. The mean absolute duration of the voiced region of the stop is 5 msec, and the mean proportion of the stop that is voiced is 0.06. Thus these underlying /d/'s really are /t/'s, hence their spelling.
Turning now to the case of /p,b/ before /s/ (both written ⟨p⟩), the first thing that we note is that voicing is generally found in, on average, the first 25 msec of the stop, which is greater than the amount we observed in the case of /tk/. Between underlying /p/ and /b/ there was no significant difference: for /p/ (4 samples) the average duration of voicing was 24 msec; for /b/ (18 samples) it was 26 msec. A t-test showed this small difference to be non-significant (t = …, p = …). Looking at the proportion of voicing, we do find a mildly significant difference: the mean proportion for underlying /p/ was 0.55 and for underlying /b/ was 0.36 (t = …, p = …), but note that the difference is not in the expected direction, since the underlying /p/ behaves as more voiced than underlying /b/ by this measure. This result might at least in part be explained by the fact that the underlying /p/'s had shorter durations (mean 44 msec) compared to underlying /b/'s (mean 72 msec). If there is a tendency to keep a constant duration of voicing, this would result in a larger proportion of voicing for the underlying /p/ cases. All in all, though, there seems to be no convincing evidence that underlying /b/ and /p/ before /s/ behave differently with respect to their surface realization.

22 The subject was asked to read the first page of the text a second time at the end of the recording session, so that we have five rather than four repetitions of some words.

Figure 3.1: Waveform and voicing profile for one utterance of gradski 'urban'. The closure for the /d/ is labeled as "dcl", the voicing offset is labeled as "v", and the start of the /s/ is labeled as "s". The voicing profile is the third plot from the top in the second window, labeled as prob_voice.

[Bar plot titled "-dski (d) vs. -tski (t)"; y-axis: proportion of voiced closure, 0.0 to 0.8; ten bars for /d/ and ten for /t/.]
Figure 3.2: Bar plot showing the proportions of voicing for all samples of underlying /d/ (black bars), versus /t/ (shaded bars).
With /t/ and /d/ before /s/ the story is very different. First of all, consider absolute duration, which averaged 14 msec for /t/ (10 samples) and 34 msec for /d/ (10 samples). This difference is significant (t = …, p = …). The proportion of voicing also shows a significant difference, with a mean of 0.25 for /t/ and 0.46 for /d/ (t = …, p = …). The proportions of voicing for all samples of /d/ and /t/ are shown in the bar plot in Figure 3.2. While there is clearly overlap between the two categories, the conclusion seems unequivocal: contrary to the standard description, /d/ before /s/ in words like gradski or akadski has a greater propensity toward being voiced than /t/ in the same position. This is different from the case of /d/ before /k/, which is unequivocally voiceless, and it is different from the case of underlying /p,b/ before /s/, where we found no reliable difference in voicing behavior.
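The text does not state which form of the t-test was used, and the reported test statistics are not recoverable here. As a hedged illustration only, the sketch below computes a Welch (unequal-variance) two-sample t statistic and its approximate degrees of freedom with Python's standard statistics module; the sample values are toy numbers, not the study's data, and obtaining a p-value would additionally require the t distribution (for instance from a statistics package), which the standard library does not provide.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)           # squared standard error of mean(a)
    vb = variance(b) / len(b)           # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy values for illustration only (not measurements from the study).
group_a = [4.0, 5.0, 6.0]
group_b = [1.0, 2.0, 3.0]
t, df = welch_t(group_a, group_b)
print(f"t = {t:.3f}, df = {df:.1f}")    # t = 3.674, df = 4.0
```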
Should /d/ in words like gradski be considered voiced as opposed to voiceless? This depends upon what one means by "voiced". In Serbo-Croatian, voiced obstruent clusters show clear voicing throughout, whereas /d/ in gradski is never voiced throughout. Perhaps for this reason this underlying /d/ should be considered voiceless. But whatever the correct answer to that question may be, one point seems unequivocal from the data we have presented here: [−voice] assimilation is not a simple across-the-board phenomenon. It happens to different degrees in different environments. Evidently, it applies in the least complete fashion where /d/ precedes /s/, and this fact is reflected in the orthography: such /d/'s sound more voiced, and hence are written as ⟨d⟩. So, far from being a problem for Consistency, Serbo-Croatian lends rather detailed support to the notion of a uniform ORL.
Needless to say, the results of this preliminary experiment need to be corroborated by a more thorough study of a wider range of speakers. Nonetheless, I believe the burden of proof now lies with those who would stand by the traditional description, which presents obstruent cluster devoicing in Serbo-Croatian as a simple across-the-board phenomenon that applies equally in all cases.
One final question needs to be addressed: is it possible that the speaker in this experiment was influenced by the orthography, and was thus producing spelling pronunciations? More generally, might literate Serbo-Croatian speakers be influenced in their application of obstruent devoicing by the very spelling that we are attempting to explain? This would suggest that voicing assimilation was complete in Karadžić's time, and that, because of his (once again, peculiar) spelling of underlying /d/ as ⟨d⟩ before /s/, subsequent generations of speakers have been influenced by the spelling and now differentiate the degree of assimilation as we have observed. This is certainly possible, but if it is so, then once again it would appear to support the notion of Consistency: one might invent a writing system that fails to observe Consistency, but there will be a strong tendency on the part of users of that system either to adjust the writing system to make it more Consistent, or else to (unconsciously) adjust their speech to bring it more in line with the orthographic representation. In any case, then, the Consistency hypothesis appears to be supported by the Serbo-Croatian data we have presented in this section.
3.4 Cyclicity in Orthography
Traditional models of Generative Phonology, including classical SPE-style phonology, and later, more articulated theories such as Lexical Phonology (Mohanan, 1986), include the familiar mechanism of cyclicity. Phonological rules that apply cyclically do so by applying in tandem with the morphology, so that a set of phonological rules is applied as each affix is attached. Cyclicity is not much in favor in present-day phonological theories.
We are interested here, though, not in cyclicity in phonology but rather in orthography. Perhaps not surprisingly, given the dearth of formal analyses of orthographic systems, very little evidence has been adduced in the literature for cyclicity in orthography. There is, however, one such potential instance in Dutch, discussed by Nunn (1998, pages 102–103),23 involving the interaction between Orthographic Consonant Degemination and Orthographic Syllabification. Orthographic Consonant Degemination, roughly speaking, simplifies doubled consonants that occur within the same orthographic syllable: thus verbrand+d (burn+ed) 'burnt' (adjective) is spelled ⟨verbrand⟩. Orthographic Syllabification is a relatively complex rule in Nunn's analysis, but one result of the rule is to split up intervocalic geminate consonants if the right-hand member of the pair can possibly be syllabified to the right: thus wasster 'washerwoman' is syllabified as [was]σ[ster]σ. Nunn gives a number of arguments that, despite their similarity to phonological degemination and syllabification, these two processes are in fact orthographically based. I will not repeat her arguments here, but refer the reader to her discussion, in particular in Chapters 3 and 5.
As examples like wasster show, Orthographic Syllabification can block Consonant Degemination: since the two ⟨s⟩'s are separated into two syllables, the rule of Orthographic Consonant Degemination is no longer applicable. On the other hand, forms like wijste 'wisest', which is morphologically wijs+st+e (wise+Superlative+Inflection), show that in some cases Syllabification seems not to block Degemination: here one would have expected the syllabification [wijs]σ[ste]σ, and the spelling *⟨wijsste⟩. Nunn suggests that such examples can be handled if we assume that Syllabification and Degemination apply cyclically. Thus in wijste, on the inner cycle we have wijs+st. Syllabification has nothing to do here (there being only one orthographic vowel, namely ⟨ij⟩), and Degemination applies to yield wijst. On the next cycle e is added, and Syllabification applies to yield [wij]σ[ste]σ.
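The ordering argument can be made concrete with a small Python sketch. This is not Nunn's rule system: it assumes a toy vowel inventory and a hypothetical onset list ONSETS, and it collapses Orthographic Syllabification into a single check (a doubled consonant counts as split when the letters from its second member up to the next vowel form a legal onset). Degemination is then applied either once to the whole word or once per morpheme, innermost first.

```python
# Toy inventories: assumptions for illustration, not Nunn's rule statements.
VOWELS = ("ij", "a", "e", "i", "o", "u")
ONSETS = {"s", "st", "str", "t", "tr", "d", "dr"}


def vowel_at(s: str, k: int) -> bool:
    """True if an orthographic vowel starts at position k."""
    return any(s.startswith(v, k) for v in VOWELS)


def split_by_syllabification(s: str, i: int) -> bool:
    """Toy Orthographic Syllabification: the geminate s[i]s[i+1] is split
    if the letters from s[i+1] up to the next vowel form a legal onset."""
    j = i + 1
    while j < len(s) and not vowel_at(s, j):
        j += 1
    return j < len(s) and s[i + 1:j] in ONSETS


def degeminate(s: str) -> str:
    """Orthographic Consonant Degemination: simplify a doubled consonant
    unless syllabification has put its two halves in different syllables."""
    i = 0
    while i < len(s) - 1:
        if (s[i] == s[i + 1] and not vowel_at(s, i)
                and not split_by_syllabification(s, i)):
            s = s[:i + 1] + s[i + 2:]
        else:
            i += 1
    return s


def spell_cyclic(morphemes: list[str]) -> str:
    """Apply the rules once per cycle, attaching one morpheme at a time."""
    form = ""
    for m in morphemes:
        form = degeminate(form + m)
    return form


print(degeminate("wasster"))              # wasster: syllabification blocks degemination
print(degeminate("wijsste"))              # wijsste: one pass over wijs+st+e
print(spell_cyclic(["wijs", "st", "e"]))  # wijste: cyclic application
```

Applied in one pass, the sketch reproduces the unwanted *⟨wijsste⟩; applied cyclically, it yields ⟨wijste⟩ while still leaving monomorphemic ⟨wasster⟩ intact.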
Is cyclic application in orthography a problem for Consistency? It would be if one could show that the orthographic cycles were built in tandem with phonological cycles. In that case, one could no longer speak of a consistent level for the ORL: rather there would be multiple levels, one for each cycle. However, Nunn's evidence does not seem to require this assumption: precisely because we are dealing here with the cyclic behavior of two orthographic rules, we have no evidence for a crucial dependence upon phonological cyclicity.
Nunn assumes that her phoneme-to-grapheme rules (the first stage in the mapping from the ORL) map from a somewhat abstract representation of morphologically complex words: her presumed underlying spelling ⟨wijsste⟩ can only be derived from a phonological representation where one represents both the /s/ of the root and the /s/ of the suffix ([[[weIs]st]ə]), rather than a more surface phonological representation that represents the effects of phonological degemination (/weIstə/). We could therefore map in one step from a phonological representation including morphological constituency information, such as [[[weIs]st]ə], into an orthographic representation [[[wijs]st]e], which also includes morphological constituency information. A cyclic application of orthographic rules could then proceed on this orthographic representation, independently of whatever goes on in the phonology. Thus Nunn's example need not be a problem for Consistency, since under this scenario the ORL is indeed a single level of representation (in this case a phonologically abstract one), and the cyclicity of the orthographic rules is entirely internal to the orthography.

23 I thank Anneke Neijt for pointing me to this example.
It should be stressed that Nunn's case is the only phenomenon that seems to require a cyclic treatment in her analysis of the orthographic system of Dutch, an analysis that includes thirteen autonomous spelling rules (of which these are two) and a couple of hundred phoneme-to-grapheme rules. There may of course be further evidence for cyclicity in orthography in Dutch, or in other writing systems, but at present the evidence is at best sparse, and it is therefore hard to conclude much from it.
3.5 Surface Orthographic Constraints
While many aspects of spelling are best thought of in terms of a mapping from some level of linguistic structure to written form, there are others which seem to be purely orthographic in nature. Venezky (1970, pages 59–62) terms these graphemic alternations. Nunn (1998), as we have already noted, distinguishes phoneme-to-grapheme conversion rules (our $\mathcal{M}_{encode}$) and a set of purely orthographic autonomous spelling rules (our $\mathcal{M}_{spell}$). Nunn identifies a number of phenomena in Dutch spelling that she argues are best described in terms of rules that refer only to orthographic information. One such phenomenon involves the orthographic representation of phonologically long vowels: these are generally represented as double vowels in closed syllables, as in maan 'moon', but as single vowels in open syllables, as in manen 'moons'. Nunn states this generalization as a rule that deletes the second of an identical pair of vowels preceding an orthographic syllable boundary.
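As a rough illustration, Nunn's vowel rule can be written as a string rewrite. The sketch below assumes, purely for illustration, that an orthographic syllable boundary follows a doubled vowel exactly when one consonant letter and then a vowel come next (V.CV); digraphs such as ⟨ij⟩ and real Dutch syllabification are ignored, and the function name is hypothetical.

```python
VOWELS = set("aeiou")  # toy inventory; digraphs such as <ij> are ignored


def vowel_degeminate(s: str) -> str:
    """Delete the second of two identical vowel letters when an orthographic
    syllable boundary follows them (toy test: one consonant, then a vowel)."""
    out = []
    i = 0
    while i < len(s):
        c = s[i]
        before_boundary = (
            c in VOWELS
            and i + 3 < len(s)
            and s[i + 1] == c
            and s[i + 2] not in VOWELS
            and s[i + 3] in VOWELS
        )
        out.append(c)
        i += 2 if before_boundary else 1  # skip the second vowel if deleted
    return "".join(out)


for form in ("maan", "maan" + "en", "maanden"):
    print(form, "->", vowel_degeminate(form))
```

Here maan keeps its double vowel in a closed syllable, maan+en surfaces as manen, and maanden 'months', where two consonants follow the vowel pair, is left alone.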
Assuming Nunn's analysis of Dutch vowel degemination is correct, we must assume, as indeed we have, that $\mathcal{M}_{spell}$ in general implements a relation, since it includes rewrite rules that change properties of the orthographic representation. However, there is good reason to assume that at least some purely orthographic phenomena are best described as constraints. As we suggested in Section 1.2.3.1, one can then view $\mathcal{M}_{spell}$ as breaking down into two components, one consisting of a regular relation $\mathcal{M}_{spell_{rel}}$, and one consisting of a regular language $\mathcal{M}_{spell_{constr}}$. We will have nothing further to say about $\mathcal{M}_{spell_{rel}}$ here: the reader is referred to Nunn for extensive argumentation for such autonomous spelling rules in Dutch. Rather, we will focus here on a few examples of surface orthographic constraints.
A simple example is afforded by the alternation of ⟨i⟩ and ⟨y⟩ in Malagasy. Both letters represent the vowel /i/, but they are in complementary distribution, ⟨y⟩ occurring only at the ends of words, ⟨i⟩ only in non-final position.²⁴ For example, from ⟨omby⟩ 'cattle', one can derive the reduplicative form ⟨tsiombiomby⟩ (a children's game in which the children play the role of cattle) (Rajemisa-Raolison, 1971, page 19), where the first copy of the stem omby spells the /i/ with ⟨i⟩, since it is word-internal.

24 It is tempting to think that this purely orthographic restriction may have been borrowed from English, which has the same restriction, at least when you discount words borrowed from Greek, Latin and other sources; see Venezky's discussion (page 59). This is not an implausible suggestion, since it was British missionaries, invited in 1817 by King Radama I, who introduced the Roman alphabet to Madagascar, replacing the older Arabic-derived orthography.
Such a restriction is easily modeled by a surface filter (part of $\mathcal{M}_{spell_{constr}}$) that disallows word-final ⟨i⟩ and non-word-final ⟨y⟩ (the word boundary being denoted with '#'):
(3.17) $\neg(\Sigma^{*}\,\langle i\rangle\,\#\,\Sigma^{*}) \;\cap\; \neg(\Sigma^{*}\,\langle y\rangle\,(\Sigma - \{\#\})\,\Sigma^{*})$

Assume now that $\mathcal{M}_{encode}$ contains the following rule, which maps /i/ to either ⟨i⟩ or ⟨y⟩:

(3.18) $i \rightarrow \langle i\rangle \cup \langle y\rangle$

Then the mapping $\mathcal{M}_{ORL\to\Gamma} = \mathcal{M}_{encode} \circ \mathcal{M}_{spell}$ will have the desired result of mapping /i/ to ⟨i⟩ and ⟨y⟩, and restricting these to the appropriate positions.
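Here is a minimal Python sketch of that composition, using finite sets of strings instead of real finite-state machines. Composing a relation with the identity on a regular language amounts to keeping only the outputs the acceptor accepts, so the filter is written as a predicate built from regular expressions. The input representation is an assumption made to isolate the ⟨i⟩/⟨y⟩ choice: Malagasy letters are kept as spelled, except that the phoneme /i/ is written i and left unresolved.

```python
import itertools
import re


def encode(orl: str) -> set[str]:
    """Toy version of rule (3.18): /i/ may be spelled <i> or <y>;
    every other symbol maps to itself."""
    options = [("i", "y") if ch == "i" else (ch,) for ch in orl]
    return {"".join(p) for p in itertools.product(*options)}


# Toy version of filter (3.17), with '#' as the word boundary.
WORD_FINAL_I = re.compile(r"i(?=#|$)")
NON_FINAL_Y = re.compile(r"y(?!#|$)")


def spell_constr(s: str) -> bool:
    """Accept s only if it has no word-final <i> and no non-final <y>."""
    return not WORD_FINAL_I.search(s) and not NON_FINAL_Y.search(s)


def orl_to_spelling(orl: str) -> set[str]:
    """Composition with a constraint: keep the outputs the filter accepts."""
    return {s for s in encode(orl) if spell_constr(s)}


print(orl_to_spelling("ombi"))         # {'omby'}
print(orl_to_spelling("tsiombiombi"))  # {'tsiombiomby'}
```

Of the eight candidate spellings generated for the reduplicated form, only the one with ⟨y⟩ in final position survives the filter.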
Another straightforward example of a surface filter involves positional variants of graphemes, which are found in many languages; many writers would term these allographs, though Daniels (1991a; 1991b) has argued against this term on theoretical grounds. One example is the f-like 'long ⟨s⟩', which occurred exclusively in non-word-final position in various Roman scripts dating from the Half Uncials of the fourth century (Knight, 1996), as well as in later printed forms; the 'round ⟨s⟩' occurred only in word-final position. Clearly this distribution can be modeled in exactly the same way as the distribution of ⟨i⟩ and ⟨y⟩ in Malagasy. The identical distribution is found with Greek ⟨σ⟩ (non-final only) and ⟨ς⟩ (final only). Comparable examples are found in Hebrew, and in Arabic, which has initial, medial and final forms for most letters.
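The same generate-and-filter pattern extends to any letter with a word-final variant. The Python sketch below takes the abstract letter and its two variants as parameters; the function name and the unaccented toy Greek form are illustrative assumptions, and it handles a single word, so running text would first be split at word boundaries.

```python
import itertools


def spell_positional(word: str, letter: str, nonfinal: str, final: str) -> set[str]:
    """Generate every choice of variant for `letter`, then keep only the
    candidates where `final` is word-final and `nonfinal` is not."""
    options = [(nonfinal, final) if ch == letter else (ch,) for ch in word]
    last = len(word) - 1
    keep = set()
    for cand in itertools.product(*options):
        ok = all(
            not (c == final and i != last) and not (c == nonfinal and i == last)
            for i, c in enumerate(cand)
            if word[i] == letter  # only check positions that came from `letter`
        )
        if ok:
            keep.add("".join(cand))
    return keep


print(spell_positional("σοφοσ", "σ", "σ", "ς"))  # Greek sigma: {'σοφος'}
print(spell_positional("sus", "s", "ſ", "s"))    # long s vs round s: {'ſus'}
```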
It is also possible to consider the prohibition on internal ⟨э⟩ in Russian to be a surface orthographic constraint. However, the statement of the constraint is certainly more complex than the case we have examined in Malagasy. For one thing, the statement would clearly have to restrict ⟨э⟩ not to word-initial position, but rather to syllable-initial position, since there are numerous cases where one finds word-internal, though syllable-initial, ⟨э⟩:
(3.19) антиэлектрон ⟨antielektron⟩ 'antielectron'
аэрофлот ⟨aeroflot⟩ 'Aeroflot'
дуэт ⟨duet⟩ 'duet'
пируэт ⟨piruet⟩ 'pirouette'
In some cases, as in the case of 'antielectron', the syllable boundary also corresponds to a morpheme boundary, but this is not always the case, as the other examples show. The restriction on the distribution of ⟨э⟩ can thus be stated in terms of syllable structure, and more specifically as a prohibition on ⟨э⟩ occurring anywhere but right-adjacent to an orthographic syllable boundary.²⁵ Even so, there are still lexical exceptions which would have to be marked as such, and which would have to be able to override this surface constraint: acronyms, not surprisingly, regularly do so (e.g. нэп ⟨nep⟩ for novaja ekonomičeskaja politika 'New Economic Policy'); and there are a handful of borrowed words that do not obey the principle (e.g. рэкет ⟨reket⟩ 'racket'). It might therefore be better to consider this not an absolute constraint, but rather a soft constraint, one which could be implemented with a weighted rather than unweighted finite-state acceptor.²⁶ The constraint would allow non-initial ⟨э⟩, but only at some cost. If a lexical item is marked as having ⟨э⟩ in a non-syllable-initial position, then it will be allowed. In all other cases, both non-initial ⟨э⟩ and ⟨е⟩ will be allowed, but ⟨э⟩ will not be selected, since it will be a more costly analysis.

25 As the reader will not fail to have observed, all of the examples in (3.19) are borrowed words. But this is not surprising, since ⟨э⟩ (with the exception of some high-frequency words such as это ⟨eto⟩ 'this', and its derivatives) is mostly found in borrowed words.
26 An alternative would be to assume priority union (Karttunen, 1998).
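A weighted version of the Russian constraint can be sketched in Python, with an integer cost and a minimum-cost choice standing in for a weighted acceptor and its best path. Every detail here is a simplifying assumption rather than the analysis in the text: the placeholder E for /e/, the rule that only ⟨э⟩ is offered after a vowel or word-initially, the hypothetical `marked` flag for lexical exceptions, and the approximation of 'syllable-initial' as 'word-initial or after a vowel letter'.

```python
import itertools

# Cyrillic vowel letters, plus the placeholder "E" used for /e/ in the input (assumption).
VOWELS = set("аеёиоуыэюя") | {"E"}


def syllable_initial(s: str, i: int) -> bool:
    """Toy test: position i follows an orthographic syllable boundary if it
    is word-initial or preceded by a vowel letter."""
    return i == 0 or s[i - 1] in VOWELS


def encode(orl: str, marked: bool = False) -> list[str]:
    """Hypothetical M_encode: 'E' stands for /e/. Syllable-initially only э is
    offered (assumption: е there would be read as /je/); elsewhere both э and е
    are offered, unless the item is lexically marked for э."""
    options = []
    for i, ch in enumerate(orl):
        if ch != "E":
            options.append((ch,))
        elif syllable_initial(orl, i) or marked:
            options.append(("э",))
        else:
            options.append(("э", "е"))
    return ["".join(p) for p in itertools.product(*options)]


def cost(spelling: str) -> int:
    """Soft constraint: each э not right-adjacent to a syllable boundary costs 1."""
    return sum(1 for i, c in enumerate(spelling)
               if c == "э" and not syllable_initial(spelling, i))


def best_spelling(orl: str, marked: bool = False) -> str:
    """Pick the cheapest candidate, as a best path through a weighted acceptor would."""
    return min(encode(orl, marked), key=cost)


for orl, marked in [("дуEт", False), ("аEрофлот", False), ("лEс", False), ("рEкет", True)]:
    print(orl, "->", best_spelling(orl, marked))
```

With no lexical marking, лEс surfaces with ⟨е⟩ because ⟨э⟩ would incur a cost; the marked item рэкет still surfaces with ⟨э⟩, since the soft constraint penalizes rather than excludes it.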
3.A English Deep and Shallow ORL's
3.A.1 Lexical representations
deep shallow
abound ĂŚĂ°bund Š�ðbaUndabundance ĂŚĂ°bu ù�ò�ó ndĂŚns Š�ðbĂ´ ndŠ\ù�þ\Ăł nsacademe Ă°ĂŚkĂŚĂśdem Ă°ĂŚkŠYù�þ5ĂłĂĂśdimacademicals ÜÌkĂŚĂ°demIkĂŚls ÜÌkŠYù�þ5ĂłĂĂ°dá mIk ŠYù�þ5Ăł lzacademicism ÜÌkĂŚĂ°demI ĂśkIsm ÜÌkŠ ù�þ5Ăł Ă°dá mI Ăśsù�ø�ó IzŠ macademic ÜÌkĂŚĂ°demIk ÜÌkŠ ù�þ5Ăł Ă°dá mIkacetone Ă°ĂŚsù�ø�ó á>Ăś ton Ă°ĂŚsù�ø�ó I ù�Ú;Ăł Ăś toUnacetonic ÜÌsù�ø�ó á>Ă° tonIk ÜÌsù�ø�ó I ù�Ú;Ăł Ă° tonIkacetylene ĂŚĂ°sù�ø�ó etI Ăą(Ăş,Ăł Ăś len Š�ðsù�ø�ó á t Š Ăą(Ăş(Ăł Ăś linacetylenic ĂŚĂśsù�ø�ó etI Ăą(Ăş,Ăł Ă° lenIk Š�Üsù�ø�ó á t Š Ăą(Ăş(Ăł Ă° l á nIkachondrite aĂ°k ù�øaĂť,Ăł ondrÄąt eI Ă°k ù�øaÝ�ó ondraItachondritic Ăś akù�øaĂť,Ăł onĂ°drÄątIk ĂśeIk ù�øaÝ�ó onĂ°drItIkacidophile Ă°ĂŚsù�ø�ó IdoĂś fÄąl Ăź ý�Þ;ÿ�� Ă°ĂŚsù�ø�ó IdoU Ăś faI l Ăź ý�Þ;ÿ��acidophilic ÜÌsù�ø�ó IdoĂ° fÄąl Ik Ăź ý�Þ;ÿ�� ÜÌsù�ø�ó IdoU Ă° f I l Ik Ăź ý�Þ;ÿ��aconite Ă°ĂŚkoĂśnÄąt Ă°ĂŚkŠYù��#ĂłĂĂśnaItaconitic ÜÌkoĂ°nÄątIk ÜÌkŠYù��#ĂłĂĂ°nItIkactinomycete ÜÌktInomi ù�ú,ó�ðket ÜÌktInoUmaI ù�ú,ó�ðsù�ø�ó itactinomycin ÜÌktInoĂ°mi ù�ú,Ăł kIn ÜÌktInoU Ă°maI ù�ú,Ăł sù�ø�ó Inactinomycosis ÜÌktInomi ù�ú,Ăł Ă°kosIs ÜÌktInoUmaI ù�ú,Ăł Ă°koUsIsadvocation ÜÌdvoĂ°katyon ÜÌdvŠ ù��5Ăł Ă°keIsŠ naeruginous á ù�þ�Ú;Ăł Ă° rugInos I ù�þ�Ú;Ăł Ă° ruËInŠ saerugo á ù�þ�Ú;Ăł Ă° rugo I ù�þ�Ú;Ăł Ă° rugoU
agnosticism ĂŚgĂ°nostI ĂśkIsm ĂŚgĂ°nostI Ăśsù�ø�ó IzŠ magnostic ĂŚgĂ°nostIk ĂŚgĂ°nostIkalbite Ă°ĂŚlbÄąt Ă°ĂŚlbaItalbitic ĂŚlĂ°bÄątIk ĂŚlĂ°bItIkalcoholicity ÜÌlkohoĂ° l IkIti ÜÌlkŠYù��5Ăł hďż˝Yù��#Ăł!Ă° l Isù�ø�ó Itialcoholic ÜÌlkoĂ°holIk ÜÌlkŠYù��5ĂłĂĂ°hďż˝Yù��#Ăł l Ikalkaline Ă°ĂŚlkĂą Ăż Ăł ĂŚĂś l Äąn Ă°ĂŚlkĂą Ăż Ăł Š\ù�þ\óïÜ laInalkalinity ÜÌlkĂą Ăż Ăł ĂŚĂ° l InIti ÜÌlkĂą Ăż Ăł Š\ù�þ\óïð l InItiallophone Ă°ĂŚlĂź ý���� oĂś fonĂź ý�Þ;ÿ�� Ă°ĂŚlĂź ý������ĂŠ ù��#Ăł Ăś foUn Ăź ý�Þ;ÿ��allophonic Ăś Ă°ĂŚlĂź ý���� oĂ° f Ă° onIk Ăź ý�Þ;ÿ�� ÜÌlĂź ý������ĂŠ ù��#Ăł Ă° fonIk Ăź ý�Þ;ÿ��allotrope Ă°ĂŚlĂź ý���� oĂś trop Ă°ĂŚlĂź ý������ĂŠ ù��#Ăł Ăś troUpallotropic ÜÌlĂź ý���� oĂ° tropIk ÜÌlĂź ý������ĂŠ ù��#Ăł Ă° tropIkammonite Ă°ĂŚmĂź ý������ oĂśnÄąt Ă°ĂŚmĂź ý����ĂŠ ù��5Ăł ĂśnaItammonitic ÜÌmĂź ý������ oĂ°nItIk ÜÌmĂź ý����ĂŠ ù��5Ăł Ă°nItIkamortization ÜÌmortÄą Ă°zatyon ÜÌmŠ ù��5Ăł rtI Ă°zeIsŠ namortize Ă°ĂŚmorĂś tÄąz Ă°ĂŚmŠ ù��5Ăł r Ăś taIzanabolite ĂŚĂ°nĂŚboĂś l Äąt Š�ðnĂŚbŠYù��5ó�Ü laItanabolitic ĂŚĂśnĂŚboĂ° l ÄątIk Š�ÜnĂŚbŠYù��5ó�ð l ItIkanecdote Ă°ĂŚná k Ăśdot Ă°ĂŚnI ù�Ú;Ăł k ĂśdoUtanecdotic ÜÌná k Ă°dotIk ÜÌnI ù�Ú;Ăł k Ă°dotIkangelic anĂ°gá l Ik ĂŚnĂ°Ë Ăˇ l Ikangel Ă° angá l Ă°eInË ÂŠ ù�Ú�ó lannounce ĂŚĂ°n Ăź ý���� uns Š�ðn Ăź ý������ aUns
annunciate ĂŚĂ°n Ăź ý������ u Ăą ò�ó nsù�ø�ó I Ăś at Š�ðn Ăź ý������tĂ´ nsù�ø�ó i Ăą ďż˝ Ăł!ĂśeItannunciation ĂŚĂśn Ăź ý������ u Ăą ò�ó nsù�ø�ó I Ă° atyon Š�Ün Ăź ý������tĂ´ nsù�ø�ó i Ăą ďż˝ Ăł!Ă°eIsŠ nanorthite ĂŚnĂ°orďż˝ Äąt ĂŚnĂ° ďż˝ r ďż˝ aItanorthitic ÜÌnorĂ° ďż˝ ÄątIk ÜÌnďż˝ r Ă° ďż˝ ItIkanthracene Ă°ĂŚnďż˝ rĂŚĂśken Ă°ĂŚnďż˝ r Š ù�þ5Ăł Ăśsù�ø�ó inanthracite Ă°ĂŚnďż˝ rĂŚĂśkÄąt Ă°ĂŚnďż˝ r Š ù�þ5Ăł Ăśsù�ø�ó aItanthracitic ÜÌnďż˝ rĂŚĂ°kÄątIk ÜÌnďż˝ r Š ù�þ5Ăł Ă°sù�ø�ó ItIkanthracoid Ă°ĂŚnďż˝ rĂŚĂśkoId Ă°ĂŚnďż˝ r Š ù�þ5Ăł ĂśkoIdanticyclone ÜÌntI Ă°sù�ø�ó i Ăą(Ăş,Ăł klon ÜÌntiù�� Ăł Ă°sù�ø�ó aI Ăą(Ăş,Ăł kloUnanticyclonic ÜÌntIsù�ø�ó i Ăą(Ăş(Ăł Ă°klonIk ÜÌntiù�� Ăł sù�ø�ó aI Ăą(Ăş,Ăł Ă°klonIkantique ĂŚnĂ° teù��vĂł k ù��\ò,Ăł ĂŚnĂ° ti ù�� Ăł k ù��\ò,Ăłantiquity ĂŚnĂ° teù��vĂł kwIti ĂŚnĂ° tIkwItiantitype Ă°ĂŚntI Ăś ti Ăą(Ăş(Ăł p Ă°ĂŚntiù�� ó�Ü taI Ăą(Ăş,Ăł pantitypic ÜÌntI Ă° ti Ăą(Ăş(Ăł pIk ÜÌntiù�� ó�ð tI ù�ú,Ăł pIkapical Ă°ĂŚpIkĂŚl Ă°ĂŚpIk ŠYù�þ\Ăł lapices Ă°ĂŚpIk ďż˝ ez Ă°ĂŚpIsù�ø�ó�� izaplite Ă°ĂŚplÄąt Ă°ĂŚplaItaplitic ĂŚpĂ° l ÄątIk ĂŚpĂ° l ItIkappeal ĂŚĂ°p Ăź ý������ eù�Ú>Ăľ\Ăł l Š�ðp Ăź ý������ i ù�Ú>Ăľ\Ăł lappellation ÜÌpĂź ý������ná>Ă° l Ăź ý������ atyon ÜÌpĂź ý�����nŠ�ð l Ăź ý����� eIsŠ nappendicectomy ĂŚĂśp Ăź ý������ná ndI Ă°k á ktomI Š�Üp Ăź ý������ á ndI Ă°sù�ø�ó á kt Š ù��5Ăł miappendicitis ĂŚĂśp Ăź ý������ná ndI Ă°kÄątIs Š�Üp Ăź ý������ á ndI Ă°sù�ø�ó aItIsappendicle ĂŚĂ°p Ăź ý������ná ndIkl Š�ðp Ăź ý������ á ndIk Š�� l ù�� Ú�óappendix ĂŚĂ°p Ăź ý������ná ndIks Š�ðp Ăź ý������ á ndIksarchangelic ÜÌrkù�øaĂť,Ăł anĂ°gá l Ik Ăś ďż˝ rk ù�øfÝ�ó ĂŚnĂ°Ë Ăˇ l Ikarchangel Ă°ĂŚrkù�øaĂť,ĂłĂĂś angá l Ă° ďż˝ rk ù�øfÝ�óĂĂśeInË ÂŠYù�Ú;Ăł larenite Ă°ĂŚrá>ĂśnÄąt Ă°ĂŚrŠ�ÜnaItarenitic ÜÌrá>Ă°nÄątIk ÜÌrŠ�ðnItIkargillite Ă°ĂŚrgI Ăś l Ăź ý���� Äąt Ă° ďż˝ r Š\ù�� ĂłĂĂś l Ăź ý������ aItargillitic ÜÌrgI Ă° l Ăź ý���� ÄątIk Ăś ďż˝ r Š ù�� Ăł Ă° l Ăź ý������ ItIkasceticism ĂŚĂ°sďż˝ ù�ø�ó á tI ĂśkIsm Š�ðsďż˝ ù�ø�ó á tI Ăśsù�ø�ó IzŠ mascetic ĂŚĂ°sďż˝ ù�ø�ó á tIk Š�ðsďż˝ ù�ø�ó á 
tIkasinine Ă°ĂŚsI ĂśnÄąn Ă°ĂŚsŠ ù�� Ăł ĂśnaInasininity ÜÌsI Ă°nÄąnIti ÜÌsŠ ù�� Ăł Ă°nInItiasparagine ĂŚĂ°spĂŚrĂŚĂśgeù�� Ăł n Š�ðspĂŚrŠ ù�þ5Ăł ĂśËi ù�� Ăł nasparagus ĂŚĂ°spĂŚrĂŚgUs Š�ðspĂŚrŠYù�þ5Ăł gŠYù�ò,Ăł sassignation ÜÌsĂź ý������ ÄągĂ°natyon ÜÌsĂź ý����� IgĂ°neIsŠ nassign ĂŚĂ°sĂź ý������ Äągn Š�ðsĂź ý������ aI ù�� Ăž Ăł nasymptote Ă°ĂŚsI ù�ú,Ăł m��ù��(ĂłĂĂś tot Ă°ĂŚsI Ăą(Ăş,Ăł m��ù��,ĂłĂĂś toUtasymptotic ÜÌsI ù�ú,Ăł m��ù��(ĂłĂĂ° totIk ÜÌsI Ăą(Ăş,Ăł m��ù��,ĂłĂĂ° totIkathlete Ă°ĂŚďż˝ let Ă°ĂŚďż˝ litathletic Ì��ð letIk Ì��ð l á tIkatone ĂŚĂ° ton Š�ð toUnatonic aĂ° tonIk eI Ă° tonIkatrocious ĂŚĂ° trokyos Š�ð troUsŠ s
atrocity ĂŚĂ° trokIti Š�ð trosù�ø�ó Itiaudacious ��ðdakyos ��ðdeIsŠ saudacity ��ðdakIti ��ðdĂŚsù�ø�ó Itiaugite Ă° ďż˝ gÄąt Ă° ďż˝ gaItaugitic ��ðgÄątIk ��ðgItIkaustenite Ă° ďż˝ stá?ĂśnÄąt Ă° ďż˝ stŠ�ÜnaItaustenitic Ăś ďż˝ stá?Ă°nÄątIk Ăś ďż˝ stŠ�ðnItIkaustralopithecine ��ÜstraloĂ°pI ďż˝;á>ĂśkeĂą ďż˝ Ăł n ��ÜstreI loU Ă°pI ďż˝ I ù�Ú;Ăł Ăśsù�ø�ó i ù��vĂł naustralopithecus ďż˝ sĂś traloĂ°pI ďż˝;á kUs ďż˝ sĂś treI loU Ă°pI ďż˝ I ù�Ú;Ăł k Š ù�ò�ó sauthenticity Ăś ���;á nĂ° tIkIti Ăś ����á nĂ° tIsù�ø�ó Itiauthentic ��ð ďż˝;á ntIk ��ð ��á ntIkauthorization Ăś ��� orÄą Ă°zatyon Ăś ����Š ù��5Ăł rI Ă°zeIsŠ nauthorize Ă° ��� oĂś rÄąz Ă° ����ŠYù��5óïÜraIzautomate Ă° ďż˝ toĂśmat Ă° ďż˝ t ŠYù��5óïÜmeItautomatic Ăś ďż˝ toĂ°mĂŚtIk Ăś ďż˝ t ŠYù��5óïðmĂŚtIkautophyte Ă° ďż˝ toĂś fi Ăą(Ăş,Ăł t Ăź Ă˝ĂÞ�ÿ�� Ă° ďż˝ t ŠYù��5óïÜ faI Ăą(Ăş(Ăł t Ăź ý�Þ;ÿ��autophytic Ăś ďż˝ toĂ° fi Ăą(Ăş,Ăł tIk Ăź ý�Þ;ÿ�� Ăś ďż˝ t ŠYù��5óïð f I Ăą(Ăş,Ăł tIk Ăź ý�Þ;ÿ��autotype Ă° ďż˝ toĂś ti Ăą(Ăş,Ăł p Ă° ďż˝ t ŠYù��5óïÜ taI Ăą(Ăş(Ăł pautotypic Ăś ďż˝ toĂ° ti Ăą(Ăş,Ăł pIk Ăś ďż˝ t Š ù��5Ăł Ă° tI Ăą(Ăş,Ăł pIkavocation ÜÌvoĂ°katyon ÜÌvŠ ù��#Ăł Ă°keIsŠ nazeotrope ĂŚĂ°zI ù�Ú;Ăł oĂś trop Š�ðzi Š ù��#Ăł Ăś troUpazeotropic Ăś azI ù�Ú�ó oĂ° tropIk ĂśeIzi Š ù��5Ăł Ă° tropIkbacteriophage bĂŚkĂ° t á rIoĂś fagĂź ý�Þ�ÿ�� bĂŚkĂ° tI ù�Ú;Ăł r.i Ăą ďż˝ Ăł Š ù��5Ăł Ăś feIË Ăź ý�Þ�ÿ��bacteriophagic bĂŚkĂś t á rIoĂ° fagIk Ăź ý�Þ;ÿ�� bĂŚkĂś tI ù�Ú;Ăł r.i Ăą ďż˝ Ăł Š ù��5Ăł Ă° fĂŚ Ik Ăź ý�Þ�ÿ��balance Ă°bĂŚlĂŚns Ă°bĂŚlŠ\ù�þ\Ăł nsbale Ă°bal Ă°beI lbaroscope Ă°bĂŚroĂśskop Ă°bĂŚrŠ\ù��#ĂłĂĂśskoUpbaroscopic ĂśbĂŚroĂ°skopIk ĂśbĂŚrŠ\ù��#ĂłĂĂ°skopIkbasicity baĂ°sIkIti beI Ă°sIsù�ø�ó Itibasic Ă°basIk Ă°beIsIkbeneficence bá>Ă°ná f Ik á ns bŠ�ðná f Isù�ø�ó Š nsbeneficent bá>Ă°ná f Ik á nt bŠ�ðná f Isù�ø�ó Š ntbenefic bá>Ă°ná f Ik bŠ�ðná f Ikbiconcave bÄą Ă°konkav baI Ă°konkeIvbiconcavity ĂśbÄąkonĂ°kavIti ĂśbaIkonĂ°kĂŚvItibiophysicist ĂśbÄąoĂ° f I ù�ú,Ăł z ù��>Ăł IkIstĂź ý�Þ;ÿ�� ĂśbaIoU Ă° f I Ăą(Ăş(Ăł z ù��>Ăł Isù�ø�ó IstĂź ý�Þ;ÿ��biophysics 
ĂśbÄąoĂ° f I ù�ú,Ăł z ù��>Ăł Ik ďż˝ sĂź Ă˝ĂÞ�ÿ�� ĂśbaIoU Ă° f I Ăą(Ăş(Ăł z ù��>Ăł Ik ďż˝ sĂź ý�Þ;ÿ��biotite Ă°bÄąoĂś tÄąt Ă°baI Š\ù��#ĂłĂĂś taItbiotitic ĂśbÄąoĂ° tÄątIk ĂśbaI Š\ù��#ĂłĂĂ° tItIkbiotype Ă°bÄąoĂś ti Ăą(Ăş,Ăł p Ă°baI Š\ù��#ĂłĂĂś taI Ăą(Ăş,Ăł pbiotypic ĂśbÄąoĂ° ti Ăą(Ăş,Ăł pIk ĂśbaI Š\ù��#ĂłĂĂ° tI ù�ú,Ăł pIkbiquadrate bÄą Ă°kwĂŚdrat baI Ă°kwodreItbiquadratic ĂśbÄąkwĂŚĂ°dratIk ĂśbaIkwoĂ°drĂŚtIkbreve Ă°brev Ă°brivbrevity Ă°brevIti Ă°brá vIti
bromide Ă°bromÄąd Ă°broUmaIdbromidic broĂ°mÄądIk broU Ă°mIdIkbronchoscope Ă°bronkù�øaĂť,Ăł oĂśskop Ă°broďż˝ k ù�øaĂť,Ăł Š\ù��#Ăł!ĂśskoUpbronchoscopic Ăśbronkù�øaĂť,Ăł oĂ°skopIk Ăśbroďż˝ k ù�øaĂť,Ăł Š\ù��#Ăł!Ă°skopIkbryophyte Ă°bri ù�ú,Ăł oĂś fi ù�ú,Ăł t Ăź ý�Þ;ÿ�� Ă°braI ù�ú,Ăł Š ù��#Ăł Ăś faI ù�ú,Ăł t Ăź Ă˝ĂÞ�ÿ��bryophytic Ăśbri ù�ú,Ăł oĂ° fi ù�ú,Ăł tIk Ăź ý�Þ�ÿ�� ĂśbraI ù�ú,Ăł Š ù��#Ăł Ă° f I Ăą(Ăş(Ăł tIk Ăź ý�Þ�ÿ��calcination ĂśkĂŚlkI Ă°natyon ĂśkĂŚlsù�ø�ó I Ă°neIsŠ ncalcine Ă°kĂŚlkÄąn Ă°kĂŚlsù�ø�ó aIncalcite Ă°kĂŚlĂśkÄąt Ă°kĂŚlĂśsù�ø�ó aItcalcitic kĂŚlĂ°kItIk kĂŚlĂ°sù�ø�ó ItIkcalices Ă°kĂŚlI Ăśk ďż˝ ez Ă°kĂŚlI Ăśsù�ø�ó ďż˝ izcalicle Ă°kĂŚlIkl Ă°kĂŚlIk Š lcalyces Ă°kĂŚlI Ăą(Ăş,ĂłĂĂśk ďż˝ ez Ă°kĂŚlI Ăą(Ăş,ĂłĂĂśsù�ø�ó�� izcalycine Ă°kĂŚlI Ăą(Ăş,Ăł kIn��ù�Ú;Ăł Ă°kĂŚlI Ăą(Ăş,Ăł sù�ø�ó In��ù�Ú;Ăłcalycle Ă°kĂŚlI Ăą(Ăş,Ăł kl Ă°kĂŚlI Ăą(Ăş,Ăł k Š lcapacious kĂŚĂ°pakyos k ŠYù�þ\Ăł!Ă°peIsŠ scapacity kĂŚĂ°pakIti k ŠYù�þ\Ăł!Ă°pĂŚsù�ø�ó Iticapitalization ĂśkĂŚpItĂŚlÄą Ă°zatyon ĂśkĂŚpIt ŠYù�þ\Ăł l I Ă°zeIsŠ ncapitalize Ă°kĂŚpItĂŚĂś l Äąz Ă°kĂŚpIt Š ù�þ\Ăł Ăś laIzcapitation ĂśkĂŚpI Ă° tatyon ĂśkĂŚpI Ă° teIsŠ ncaput Ă°kapUt Ă°keIpŠ Ăą ò�ó tcarbonization ĂśkĂŚrbonÄą Ă°zatyon Ăśk ďż˝ rbŠ ù��#Ăł nI Ă°zeIsŠ ncarbonize Ă°kĂŚrboĂśnÄąz Ă°k ďż˝ rbŠ ù��#Ăł ĂśnaIzcathode Ă°kĂŚďż˝ od Ă°kĂŚďż˝ oUdcathodic kĂŚĂ° ďż˝ odIk kĂŚĂ° ďż˝ odIkcatholicity ĂśkĂŚďż˝ oĂ° l IkIti ĂśkÌ��Š ù��#Ăł Ă° l Isù�ø�ó Iticatholic Ă°kĂŚďż˝ olIk Ă°kÌ��ŠYù��#Ăł l Ikcausticity k ��ðstIkIti k ��ðstIsù�ø�ó Iticaustic Ă°k ďż˝ stIk Ă°k ďż˝ stIkcave Ă°kav Ă°keIvcavity Ă°kavIti Ă°kĂŚvIticease Ă°sù�ø�ó eù�Ú>Ăľ\Ăł s Ă°sù�ø�ó i ù�Ú>Ăľ\Ăł scenobite Ă°sù�ø�ó enoĂśbÄąt Ă°sù�ø�ó in Š ù��5Ăł ĂśbaItcenobitic Ăśsù�ø�ó enoĂ°bÄątIk Ăśsù�ø�ó in Š ù��5Ăł Ă°bItIkcenocyte Ă°sù�ø�ó enoĂśsù�ø�ó i ù�ú,Ăł t Ă°sù�ø�ó in Š ù��5Ăł Ăśsù�ø�ó aI Ăą(Ăş,Ăł tcenocytic Ăśsù�ø�ó enoĂ°sù�ø�ó i ù�ú,Ăł tIk Ăśsù�ø�ó in Š ù��5Ăł Ă°sù�ø�ó I ù�ú,Ăł tIkcentricity sù�ø�ó á nĂ° trIkIti sù�ø�ó á nĂ° trIsù�ø�ó Iticentric Ă°sù�ø�ó á ntrIk Ă°sù�ø�ó á ntrIkcentrosome 
Ă°sù�ø�ó á ntroĂśsom Ă°sù�ø�ó á ntrŠ ù��5Ăł ĂśsoUmcentrosomic Ăśsù�ø�óïá ntroĂ°somIk Ăśsù�ø�óïá ntrŠYù��5óïðsomIkcercopithecid Ăśsù�ø�óïá rkopI Ă° ďż˝ ekId Ăśsù�ø�ó r.koUpI Ă° ďż˝ is ù�ø�ó Idcercopithecoid Ăśsù�ø�óïá rkopI Ă° ďż˝ ekoId Ăśsù�ø�ó r.koUpI Ă° ďż˝ ikoIdcervical Ă°sù�ø�óïá rvIkĂŚl Ă°sù�ø�ó r.vIk ŠYù�þ5Ăł lcervicitis Ăśsù�ø�óïá rvI Ă°kÄątIs Ăśsù�ø�ó r.vI Ă°sù�ø�ó aItIscervix Ă°sù�ø�ó á rvIks Ă°sù�ø�ó r.vIks
cessation sù�ø�ó eĂ°sĂź ý���� atyon sù�ø�óïá>Ă°sĂź ý����� eIsŠ ncharacterization Ăśk ù�øaÝ�ó ĂŚrĂŚktá rÄą Ă°zatyon Ăśk ù�øaĂť,Ăł ĂŚrI ù�þ\Ăł ktr.I Ă°zeIsŠ ncharacterize Ă°k ù�øaÝ�ó ĂŚrĂŚktá>Ăś rÄąz Ă°k ù�øaĂť,Ăł ĂŚrI ù�þ\Ăł kt Š�ÜraIzchaste Ă° cast Ă° ceIstchastity Ă° castIti Ă° cĂŚstItichondrite Ă°k ù�øaÝ�ó ondrÄąt Ă°k ù�øaĂť,Ăł ondraItchondritic k ù�øaÝ�ó onĂ°drÄątIk k ù�øaĂť,Ăł onĂ°drItIkchromate Ă°k ù�øaÝ�ó romat Ă°k ù�øaĂť,Ăł roUmeItchromaticism k ù�øaÝ�ó roĂ°matI ĂśkIsm k ù�øaĂť,Ăł roU Ă°mĂŚtI Ăśsù�ø�ó IzŠ mchromatic k ù�øaÝ�ó roĂ°matIk k ù�øaĂť,Ăł roU Ă°mĂŚtIkchronicity k ù�øaÝ�ó roĂ°nIkIti k ù�øaĂť,Ăł roĂ°nIsù�ø�ó Itichronic Ă°k ù�øaÝ�ó ronIk Ă°k ù�øaĂť,Ăł ronIkchronoscope Ă°k ù�øaÝ�ó ronoĂśskop Ă°k ù�øaĂť,Ăł ronŠYù��5ĂłĂĂśskoUpchronoscopic Ăśk ù�øaÝ�ó ronoĂ°skopIk Ăśk ù�øaĂť,Ăł ronŠYù��5ĂłĂĂ°skopIkchrysolite Ă°k ù�øaÝ�ó rI Ăą(Ăş(Ăł soĂś l Äąt Ă°k ù�øaĂť,Ăł rI Ăą(Ăş,Ăł sŠYù��#ó�Ü laItchrysolitic Ăśk ù�øaÝ�ó rI Ăą(Ăş(Ăł soĂ° l ÄątIk Ăśk ù�øaĂť,Ăł rI Ăą(Ăş,Ăł sŠYù��#ó�ð l ItIkcivilization Ăśsù�ø�ó IvI l Äą Ă°zatyon Ăśsù�ø�ó Iv ŠYù�� Ăł l I Ă°zeIsŠ ncivilize Ă°sù�ø�ó IvI Ăś l Äąz Ă°sù�ø�ó Iv ŠYù�� ó�Ü laIzclassicism Ă°klĂŚsĂź ý���� I ĂśkIsm Ă°klĂŚsĂź ý������ I Ăśsù�ø�ó IzŠ mclassicist Ă°klĂŚsĂź ý���� IkIst Ă°klĂŚsĂź ý������ Isù�ø�ó Istclassic Ă°klĂŚsĂź ý���� Ik Ă°klĂŚsĂź ý������ Ikclone Ă°klon Ă°kloUnclonic Ă°klonIk Ă°klonIkcognizance Ă°kognÄązĂŚns Ă°kognIzŠ ù�þ\Ăł nscognize Ă°kognÄąz Ă°kognaIzcoincidence koĂ° Insù�ø�ó Äądá ns koU Ă° Insù�ø�ó IdŠ nscoincide ĂśkoInĂ°sù�ø�ó Äąd ĂśkoUInĂ°sù�ø�ó aIdcolic Ă°kolIk Ă°kolIkcollotype Ă°kol Ăź ý������ oĂś ti Ăą(Ăş,Ăł p Ă°kol Ăź ý����� Š\ù��#ó�Ü taI ù�ú,Ăł pcollotypic Ăśkol Ăź ý������ oĂ° ti Ăą(Ăş,Ăł pIk Ăśkol Ăź ý����� Š\ù��#ó�ð tI Ăą(Ăş,Ăł pIkcolonic koĂ° lonIk koU Ă° lonIkcolonization ĂśkolonÄą Ă°zatyon Ăśkol Š ù��#Ăł nI Ă°zeIsŠ ncolonize Ă°koloĂśnÄąz Ă°kol Š ù��#Ăł ĂśnaIzcolon Ă°kolon Ă°koUl Š ù��5Ăł ncombination ĂśkombI Ă°natyon ĂśkombŠ ù�� Ăł Ă°neIsŠ ncombine komĂ°bÄąn k Š ù��#Ăł mĂ°baIncommode koĂ°m Ăź ý���� od k 
Š ù��#Ăł Ă°m Ăź ý���� oUdcommodity koĂ°m Ăź ý���� odIti k Š ù��#Ăł Ă°m Ăź ý���� odIticompilation ĂśkompÄą Ă° latyon ĂśkompŠ ù�� Ăł Ă° leIsŠ ncompile komĂ°pÄąl k ŠYù��#Ăł mĂ°paI lconcave konĂ°kav konĂ°keIvconcavity konĂ°kavIti konĂ°kĂŚvIticonceal konĂ°sù�ø�ó eù�Ú>Ăľ\Ăł l k ŠYù��#Ăł nĂ°sù�ø�ó i ù�Ú>Ăľ\Ăł lcone Ă°kon Ă°koUnconfidence Ă°konfÄądá ns Ă°konfIdŠ ns
confide konĂ° fÄąd k ŠYù��5Ăł nĂ° faIdcongeal konĂ°geù�Ú>Ăľ\Ăł l k ŠYù��5Ăł nĂ°Ëi ù�Ú^Ăľ5Ăł lcongelation ĂśkongeĂ° latyon Ăśkon Š�ð leIsŠ nconic Ă°konIk Ă°konIkconsignation ĂśkonsÄągĂ°natyon ĂśkonsIgĂ°neIsŠ nconsign konĂ° sÄągn k Š ù��5Ăł nĂ°saI ù�� Ăž Ăł nconsolation ĂśkonsoĂ° latyon ĂśkonsŠ ù��5Ăł Ă° leIsŠ ncontravene ĂśkontrĂŚĂ°ven ĂśkontrŠ ù�þ5Ăł Ă°vincontravention ĂśkontrĂŚĂ°v á ntyon ĂśkontrŠ ù�þ5Ăł Ă°v á ncŠ nconvene konĂ°ven k Š ù��5Ăł nĂ°vinconvention konĂ°v á ntyon k Š ù��5Ăł nĂ°v á ncŠ nconvocation ĂśkonvoĂ°katyon Ăśkonv Š\ù��#ó�ðkeIsŠ nconvoke konĂ°vok k ŠYù��5Ăł nĂ°voUkcormophyte Ă°kormoĂś fi ù�ú,Ăł t Ăź ý�Þ�ÿ�� Ă°k ďż˝ rmŠYù��#ĂłĂĂś faI ù�ú,Ăł t Ăź ý�Þ;ÿ��cormophytic ĂśkormoĂ° fi ù�ú,Ăł tIk Ăź ý�Þ;ÿ�� Ăśk ďż˝ rmŠYù��#ĂłĂĂ° f I Ăą(Ăş,Ăł tIk Ăź Ă˝ĂÞ�ÿ��creophagous krI ù�Ú;ĂłĂĂ°ofagosĂź ý�Þ;ÿ�� kri Ă°of ŠYù�þ\Ăł gŠ sĂź ý�Þ;ÿ��creophagy krI ù�Ú;Ăł Ă°ofagI Ăź Ă˝ĂÞ�ÿ�� kri Ă°of Š ù�þ\Ăł Ëi Ăź ý�Þ;ÿ��creosote Ă°krI ù�Ú;Ăł oĂśsot Ă°kri Š ù��5Ăł ĂśsoUtcreosotic ĂśkrI ù�Ú;Ăł oĂ°sotIk Ăśkri Š ù��5Ăł Ă°sotIkcriticism Ă°krItI ĂśkIsm Ă°krItI Ăśsù�ø�ó IzŠ mcriticize Ă°krItI ĂśkÄąz Ă°krItI Ăśsù�ø�ó aIzcritic Ă°krItIk Ă°krItIkcrocein Ă°krokeIn Ă°kroUsù�ø�ó i Incrocus Ă°krokUs Ă°kroUk ŠYù�ò,Ăł scryoscope Ă°kri ù�ú,Ăł oĂśskop Ă°kraI ù�ú,ó�ŠYù��#Ăł!ĂśskoUpcryoscopic Ăśkri ù�ú,Ăł oĂ°skopIk ĂśkraI ù�ú,ó�ŠYù��#Ăł!Ă°skopIkcrystallite Ă°krI Ăą(Ăş,Ăł stĂŚĂś l Ăź ý������ Äąt Ă°krI Ăą(Ăş,Ăł stŠYù�þ\óïÜ l Ăź ý���� aItcrystallitic ĂśkrI Ăą(Ăş,Ăł stĂŚĂ° l Ăź ý������ ÄątIk ĂśkrI Ăą(Ăş,Ăł stŠYù�þ\óïð l Ăź ý���� ItIkcyanite Ă°sù�ø�ó i Ăą(Ăş,Ăł ĂŚĂśnÄąt Ă°sù�ø�ó aI Ăą(Ăş,Ăł Š ù�þ\Ăł ĂśnaItcyanitic Ăśsù�ø�ó i Ăą(Ăş,Ăł ĂŚĂ°nÄątIk Ăśsù�ø�ó aI Ăą(Ăş,Ăł Š ù�þ\Ăł Ă°nItIkcyclone Ă°sù�ø�ó i Ăą(Ăş,Ăł klon Ă°sù�ø�ó aI Ăą(Ăş,Ăł kloUncyclonic sù�ø�ó i Ăą(Ăş,Ăł Ă°klonIk sù�ø�ó aI Ăą(Ăş,Ăł Ă°klonIkcynicism Ă°sù�ø�ó I ù�ú,Ăł nI ĂśkIsm Ă°sù�ø�ó I ù�ú,Ăł nI Ăśsù�ø�ó IzŠ mcynic Ă°sù�ø�ó I ù�ú,Ăł nIk Ă°sù�ø�ó I ù�ú,Ăł nIkcystoscope Ă°sù�ø�ó I ù�ú,Ăł stoĂśskop Ă°sù�ø�ó I ù�ú,Ăł stŠ ù��5Ăł 
ĂśskoUpcystoscopic Ăśsù�ø�ó I ù�ú,Ăł stoĂ°skopIk Ăśsù�ø�ó I ù�ú,Ăł stŠ ù��5Ăł Ă°skopIkdeclination Ăśdá klÄą Ă°natyon Ăśdá kl ŠYĂą ďż˝ Ăł!Ă°neIsŠ ndecline dá>Ă°klÄąn dI Ă°klaIndendrite Ă°dá ndrÄąt Ă°dá ndraItdendritic dá nĂ°drÄątIk dá nĂ°drItIkdenounce dá>Ă°nuns dI Ă°naUnsdenunciate dá>Ă°nu ù�ò,Ăł nsù�ø�ó I Ăś at dI Ă°nĂ´ nsù�ø�ó i ù�� Ăł ĂśeItdenunciation dá>Ăśnu ù�ò,Ăł nsù�ø�ó I Ă° atyon dI ĂśnĂ´ nsù�ø�ó i ù�� Ăł Ă°eIsŠ ndeprave dá>Ă°prav dI Ă°preIvdepravity dá>Ă°pravIti dI Ă°prĂŚvIti
deprivation Ăśdá prÄą Ă°vatyon Ăśdá prŠYù�� ĂłĂĂ°veIsŠ ndeprive dá>Ă°prÄąv dI Ă°praIvderivation Ăśdá rÄą Ă°vatyon Ăśdá r ŠYĂą ďż˝ Ăł!Ă°veIsŠ nderive dá>Ă°rÄąv dI Ă°raIvdermatome Ă°dá rmÜÌtom Ă°dr.mŠ ù�þ5Ăł Ăś toUmdermatomic Ăśdá rmĂŚĂ° tomIk Ăśdr.mŠ ù�þ5Ăł Ă° tomIkdermatophyte Ă°dá rmĂŚtĂśofi Ăą(Ăş,Ăł t Ăź ý�Þ;ÿ�� Ă°dr.mŠ ù�þ5Ăł t Š ù��#Ăł Ăś faI ù�ú,Ăł t Ăź ý�Þ;ÿ��dermatophytic Ăśdá rmĂŚtoĂ° fi Ăą(Ăş,Ăł tIk Ăź ý�Þ;ÿ�� Ăśdr.mŠ ù�þ5Ăł t Š ù��#Ăł Ă° f I Ăą(Ăş,Ăł tIk Ăź ý�Þ;ÿ��desensitization deĂśsá nsItI Ă°zatyon di Ăśsá nsItI Ă°zeIsŠ ndesensitize deĂ°sá nsI Ăś tÄąz di Ă°sá nsI Ăś taIzdesignation Ăśdá sÄągĂ°natyon Ăśdá z ù��>Ăł IgĂ°neIsŠ ndesign dá>Ă° sÄągn dI Ă°z ù��>Ăł aI ù�� Ăž Ăł ndeuteranope Ă°duù�Ú?ò,Ăł t á rĂŚĂśnop Ă°duù�Ú?ò�ó tr. Š\ù�þ\ó�ÜnoUpdeuteranopic Ăśduù�Ú?ò,Ăł t á rĂŚĂ°nopIk Ăśduù�Ú?ò�ó tr. Š\ù�þ\ó�ðnopIkdiaphone Ă°dĹÌÜ fonĂź Ă˝ĂÞ�ÿ�� Ă°daI ŠYù�þ\ĂłĂĂś foUn Ăź ý�Þ�ÿ��diaphonic ĂśdĹÌð fonIk Ăź ý�Þ;ÿ�� ĂśdaI ŠYù�þ\ĂłĂĂ° fonIk Ăź ý�Þ;ÿ��dibasicity ĂśdÄąbaĂ°sIkIti ĂśdaIbeI Ă°sIsù�ø�ó Itidibasic dÄą Ă°basIk daI Ă°beIsIkdichroite Ă°dÄąk ù�øaĂť,Ăł roĂś Äąt Ă°daIk ù�øaÝ�ó roU ĂśaItdichroitic ĂśdÄąk ù�øaĂť,Ăł roĂ° ÄątIk ĂśdaIk ù�øaÝ�ó roU Ă° ItIkdichromate dÄą Ă°k ù�øaĂť,Ăł romat daI Ă°k ù�øaÝ�ó roUmeItdichromaticism ĂśdÄąk ù�øaĂť,Ăł roĂ°matI ĂśkIsm ĂśdaIk ù�øaÝ�ó roU Ă°mĂŚtI Ăśsù�ø�ó IzŠ mdichromatic ĂśdÄąk ù�øaĂť,Ăł roĂ°matIk ĂśdaIk ù�øaÝ�ó roU Ă°mĂŚtIkdichroscope Ă°dÄąk ù�øaĂť,Ăł roĂśskop Ă°daIk ù�øaÝ�ó r Š ù��#Ăł ĂśskoUpdichroscopic ĂśdÄąk ù�øaĂť,Ăł roĂ°skopIk ĂśdaIk ù�øaÝ�ó r Š ù��#Ăł Ă°skopIkdiorite Ă°dÄąoĂśrÄąt Ă°daI Š ù��5Ăł Ăś raItdioritic ĂśdÄąoĂ°rItIk ĂśdaI ŠYù��5ĂłĂĂ° rItIkdiscommode ĂśdIskoĂ°m Ăź ý������ od ĂśdIskŠ\ù��#ĂłĂĂ°m Ăź ý����� oUddiscommodity ĂśdIskoĂ°m Ăź ý������ odIti ĂśdIskŠ\ù��#ĂłĂĂ°m Ăź ý����� odItidisinclination ĂśdIsInklÄą Ă°natyon ĂśdIsInklI Ă°neIsŠ ndisincline ĂśdIsInĂ°klÄąn ĂśdIsInĂ°klaIndivination ĂśdIvÄą Ă°natyon ĂśdIv Š Ăą ďż˝ Ăł Ă°neIsŠ ndivine dI Ă°vÄąn dI ù�� Ăł Ă°vaIndivinity dI Ă°vÄąnIti dI ù�� Ăł Ă°vInItidolerite 
Ă°dolá>ĂśrÄąt Ă°dolŠ�ÜraItdoleritic Ăśdolá>Ă°rÄątIk ĂśdolŠ�ðrItIkdramatization ĂśdrĂŚmĂŚtÄą Ă°zatyon ĂśdrĂŚmŠ ù�þ\Ăł tI Ă°zeIsŠ ndramatize Ă°drĂŚmĂŚĂś tÄąz Ă°drĂŚmŠ ù�þ\Ăł Ăś taIzdynamite Ă°di ù�ú,Ăł nĂŚĂśmÄąt Ă°daI Ăą(Ăş(Ăł nŠ ù�þ5Ăł ĂśmaItdynamitic Ăśdi ù�ú,Ăł nĂŚĂ°mÄątIk ĂśdaI Ăą(Ăş(Ăł nŠYù�þ5Ăł!Ă°mItIkecclesiastical á>Ăśk Ăź ý������ lezù��>Ăł I Ă°ĂŚstIkĂŚl I ù�Ú�ó�Ük Ăź ý������ liz ù��>Ăł i ù�� ó�ðÌstIk ŠYù�þ\Ăł lecclesiasticism á>Ăśk Ăź ý������ lezù��>Ăł I Ă°ĂŚstI ĂśkIsm I ù�Ú�ó�Ük Ăź ý������ liz ù��>Ăł i ù�� ó�ðÌstI Ăśsù�ø�ó IzŠ mecclesiastic á>Ăśk Ăź ý������ lezù��>Ăł I Ă°ĂŚstIk I ù�Ú�ó�Ük Ăź ý������ liz ù��>Ăł i ù�� ó�ðÌstIkeclecticism á>Ă°kl á ktI ĂśkIsm I ù�Ú�ó�ðkl á ktI Ăśsù�ø�ó IzŠ meclectic á>Ă°kl á ktIk I ù�Ú�ó Ă°kl á ktIk
ecotype Ă° á koĂś ti ù�ú,Ăł p Ă° á k Š\ù��#ó�Ü taI ù�ú,Ăł pecotypic Ăś á koĂ° ti ù�ú,Ăł pIk Ăś á k Š\ù��#ó�ð tI Ăą(Ăş,Ăł pIkectoparasite Ăś á ktoĂ°pĂŚrĂŚĂś sÄąt Ăś á ktoU Ă°pĂŚrŠYù�þ5ĂłĂĂśsaItectoparasitic Ăś á ktoĂśpĂŚrĂŚĂ° sÄątIk Ăś á ktoU ĂśpĂŚrŠYù�þ5ĂłĂĂ°sItIkedacious á>Ă°dakyos I ù�Ú;Ăł Ă°deIsŠ sedacity á>Ă°dakIti I ù�Ú;Ăł Ă°dĂŚsù�ø�ó Itielasticity á lĂŚĂ°stIkIti I ù�Ú;Ăł lĂŚĂ°stIsù�ø�ó Itielasticize á>Ă° lĂŚstI ĂśkÄąz I ù�Ú;Ăł Ă° lĂŚstI Ăśsù�ø�ó aIzelastic á>Ă° lĂŚstIk I ù�Ú;Ăł Ă° lĂŚstIkelectrical á>Ă° l á ktrIkĂŚl I ù�Ú;Ăł Ă° l á ktrIk Š ù�þ5Ăł lelectricity á l á k Ă° trIkIti I ù�Ú;Ăł l á k Ă° trIsù�ø�ó Itielectric á>Ă° l á ktrIk I ù�Ú;Ăł Ă° l á ktrIkelectrolyte á>Ă° l á ktroĂś li ù�ú,Ăł t I ù�Ú;óïð l á ktr ŠYù��5Ăł!Ăś laI Ăą(Ăş,Ăł telectrolytic á>Ăś l á ktroĂ° li ù�ú,Ăł tIk I ù�Ú;óïÜ l á ktr ŠYù��5Ăł!Ă° l I ù�ú,Ăł tIkelectrophone á>Ă° l á ktroĂś fonĂź Ă˝ĂÞ�ÿ�� I ù�Ú;óïð l á ktr ŠYù��5Ăł!Ăś foUn Ăź ý�Þ;ÿ��electrophonic á>Ăś l á ktroĂ° fonIk Ăź Ă˝ĂÞ�ÿ�� I ù�Ú;óïÜ l á ktr ŠYù��5Ăł!Ă° fonIk Ăź ý�Þ;ÿ��electroscope á>Ă° l á ktroĂśskop I ù�Ú;óïð l á ktr ŠYù��5Ăł!ĂśskoUpelectroscopic á>Ăś l á ktroĂ°skopIk I ù�Ú;óïÜ l á ktr ŠYù��5Ăł!Ă°skopIkelliptical á>Ă° l Ăź ý������ IptIkĂŚl I ù�Ú;Ăł Ă° l Ăź ý���� IptIk Š ù�þ\Ăł lellipticity á l Ăź ý������ IpĂ° tIkIti I ù�Ú;Ăł l Ăź ý���� IpĂ° tIsù�ø�ó Itiempiricism á mĂ°pIrI ĂśkIsm á mĂ°pIrI Ăśsù�ø�ó IzŠ mempiric á mĂ°pIrIk á mĂ°pIrIkendoparasite Ăś á ndoĂ°pĂŚrĂŚĂś sÄąt Ăś á ndoU Ă°pĂŚrŠ ù�þ5Ăł ĂśsaItendoparasitic Ăś á ndoĂśpĂŚrĂŚĂ° sÄątIk Ăś á ndoU ĂśpĂŚrŠ ù�þ5Ăł Ă°sItIkendophyte Ă° á ndoĂś fi Ăą(Ăş,Ăł t Ăź Ă˝ĂÞ�ÿ�� Ă° á ndŠ ù��#Ăł Ăś faI ù�ú,Ăł t Ăź Ă˝ĂÞ�ÿ��endophytic Ăś á ndoĂ° fi Ăą(Ăş,Ăł tIk Ăź ý�Þ;ÿ�� Ăś á ndŠ ù��#Ăł Ă° f I Ăą(Ăş(Ăł tIk Ăź ý�Þ�ÿ��endoscope Ă° á ndoĂśskop Ă° á ndŠYù��#Ăł!ĂśskoUpendoscopic Ăś á ndoĂ°skopIk Ăś á ndŠYù��#Ăł!Ă°skopIkenounce á>Ă°nuns I ù�Ú;óïðnaUnsentophyte Ă° á ntoĂś fi Ăą(Ăş(Ăł t Ăź ý�Þ;ÿ�� Ă° á ntŠYù��5óïÜ faI Ăą(Ăş,Ăł t Ăź ý�Þ�ÿ��entophytic Ăś á ntoĂ° fi Ăą(Ăş(Ăł tIk Ăź ý�Þ;ÿ�� Ăś á 
ntŠYù��5óïð f I ù�ú,Ăł tIk Ăź ý�Þ;ÿ��enunciable á>Ă°nu Ăą ò�ó nsù�ø�ó IĂŚbI l I ù�Ú;Ăł Ă°nĂ´ nsù�ø�ó i Ăą ďż˝ Ăł Š ù�þ\Ăł bŠ lenunciate á>Ă°nu Ăą ò�ó nsù�ø�ó I Ăś at I ù�Ú;Ăł Ă°nĂ´ nsù�ø�ó i Ăą ďż˝ Ăł ĂśeItenunciation á>Ăśnu Ăą ò�ó nsù�ø�ó I Ă° atyon I ù�Ú;Ăł ĂśnĂ´ nsù�ø�ó i Ăą ďż˝ Ăł Ă°eIsŠ nepidote Ă° á pI Ăśdot Ă° á pI ĂśdoUtepidotic Ăś á pI Ă°dotIk Ăś á pI Ă°dotIkepiphyte Ă° á pI Ăś fi ù�ú,Ăł t Ăź ý�Þ;ÿ�� Ă° á pŠ ù�� Ăł Ăś faI Ăą(Ăş,Ăł t Ăź ý�Þ;ÿ��epiphytic Ăś á pI Ă° fi ù�ú,Ăł tIk Ăź ý�Þ�ÿ�� Ăś á pŠ ù�� Ăł Ă° f I ù�ú,Ăł tIk Ăź ý�Þ;ÿ��episode Ă° á pI Ăśsod Ă° á pŠ ù�� Ăł ĂśsoUdepisodic Ăś á pI Ă°sodIk Ăś á pŠ\ù�� óïðsodIkequivocation á>ĂśkwIvoĂ°katyon I ù�Ú;óïÜkwIv ŠYù��#ĂłĂĂ°keIsŠ nequivoke Ă° á kwI Ăśvok Ă° á kw Š\ù�� óïÜvoUkeremite Ă° á r á>ĂśmÄąt Ă° á r Š�ÜmaIteremitic Ăś á r á>Ă°mÄątIk Ăś á r Š�ðmItIkeroticism á>Ă° rotI ĂśkIsm Š ù�Ú;Ăł Ă°rotI Ăśsù�ø�ó IzŠ m
erotic á?Ă°rotIk ŠYù�Ú;ó�ð rotIkerythrocyte á?Ă°rI ù�ú,ó�� roĂśsù�ø�ó i Ăą(Ăş,Ăł t I ù�Ú;ó�ð rI ù�ú,ó�� r ŠYù��5Ăł!Ăśsù�ø�ó aI Ăą(Ăş,Ăł terythrocytic á?ĂśrI ù�ú,ó�� roĂ°sù�ø�ó i Ăą(Ăş,Ăł tIk I ù�Ú;ó�Ü rI ù�ú,ó�� r ŠYù��5Ăł!Ă°sù�ø�ó I ù�ú,Ăł tIkesophageal á?ĂśsofaĂ°gI ù�Ú;Ăł ĂŚlĂź ý�Þ�ÿ�� I ù�Ú;ó�ÜsofŠYù�þ5ĂłĂĂ°Ëi ŠYù�þ\Ăł l Ăź ý�Þ�ÿ��esophagus á?Ă°sofagUsĂź ý�Þ;ÿ�� I ù�Ú;Ăł Ă°sofŠ ù�þ5Ăł gŠ Ăą ò�ó sĂź ý�Þ�ÿ��esthete Ă° á sďż˝ et Ă° á sďż˝ itestheticism á sĂ° ďż˝ etI ĂśkIsm á sĂ° ďż˝;á tI Ăśsù�ø�ó IzŠ mesthetic á sĂ° ďż˝ etIk á sĂ° ďż˝;á tIkethicize Ă° � I ĂśkÄąz Ă° � I Ăśsù�ø�ó aIzethic Ă° � Ik Ă° � Ikevocation Ăś á voĂ°katyon Ăś á v Š ù��5Ăł Ă°keIsŠ nevoke á?Ă°vok I ù�Ú;ó�ðvoUkexegete Ă° á ksá?Ăśget Ă° á ksI ù�Ú;ó�ÜËitexegetic Ăś á ksá?Ă°getIk Ăś á ksI ù�Ú;Ăłďż˝Ă°Ë Ăˇ tIkexile Ă° á gzÄąl Ă° á gzaI lexilic á?Ă°gzÄąl Ik á>Ă°gzI l Ikextreme á?Ă°kstrem I ù�Ú;Ăł Ă°kstrimextremity á?Ă°kstremIti I ù�Ú;Ăł Ă°kstrá mItifalciform Ă° fĂŚlkI Ăś form Ă° fĂŚlsù�ø�ó Š ù�� Ăł Ăś f ďż˝ rmfalcon Ă° fĂŚlkon Ă° fĂŚlkŠ ù��5Ăł nfanaticism fĂŚĂ°nĂŚtI ĂśkIsm f Š ù�þ\Ăł Ă°nĂŚtI Ăśsù�ø�ó IzŠ mfanaticize fĂŚĂ°nĂŚtI ĂśkÄąz f Š ù�þ\Ăł Ă°nĂŚtI Ăśsù�ø�ó aIzfanatic fĂŚĂ°nĂŚtIk f ŠYù�þ\Ăł!Ă°nĂŚtIkfasciation Ăś fas��ù�ø�ó I Ă° atyon Ăś fĂŚsù��gø�ó i Ăą ďż˝ Ăł!Ă°eIsŠ nfascia Ă° fas��ù�ø�ó IĂŚ Ă° feIsù��gø�ó i ù��vĂł!ŠYù�þ5Ăłfascination Ăś fĂŚs��ù�ø�ó I Ă°natyon Ăś fĂŚs��ù�ø�ó�ŠYù�� óïðneIsŠ nfascine fĂŚĂ°s��ù�ø�ó eù�� Ăł n fĂŚĂ°s��ù�ø�ó i Ăą ďż˝ Ăł nfederalization Ăś f á dá rĂŚlĂ° Izatyon Ăś f á dr ù�Ú! ?Ăł!ŠYù�þ\Ăł l ŠYù��vó�ðzeIsŠ nfederalize Ă° f á dá r ÜÌlÄąz Ă° f á dr ù�Ú! 
?Ăł Š ù�þ\Ăł Ăś laIzfelsite Ă° f á lsÄąt Ă° f á lsaItfelsitic f á l Ă° sÄątIk f á l Ă°sItIkferocious f á>Ă° rokyos f Š�ð roUsŠ sferocity f á>Ă° rokIti f Š�ð rosù�ø�ó Itiferroelectricity Ăś f á r Ăź ý������ oá l á k Ă° trIkIti Ăś f á r Ăź ý������ oUI ù�Ú�ó l á k Ă° trIsù�ø�ó Itiferroelectric Ăś f á r Ăź ý������ oá?Ă° l á ktrIk Ăś f á r Ăź ý������ oUI ù�Ú�ó Ă° l á ktrIkfertilization Ăś f á rtI l Äą Ă°zatyon Ăś fr.t Š Ăą ďż˝ Ăł l I Ă°zeIsŠ nfertilize Ă° f á rt Ăś I l Äąz Ă° fr.t ŠYĂą ďż˝ ó�Ü laIzfinance f I Ă°nĂŚns f I Ă°nĂŚnsfinance Ă° fÄą ĂśnĂŚns Ă° faI ĂśnĂŚnsfluoroscope Ă°flu ù�ò��5Ăł roĂśskop Ă°flu Ăą ò�#Ăł r. ŠYù��5Ăł!ĂśskoUpfluoroscopic Ăśflu ù�ò��5Ăł roĂ°skopIk Ăśflu Ăą ò�#Ăł r. ŠYù��5Ăł!Ă°skopIkfugacious fugĂ° akyos fyu Ă°geIsŠ sfugacity fugĂ° akIti fyu Ă°gĂŚsù�ø�ó Itifumarole Ă° fumĂŚrĂś ol Ă° fyumŠ ù�þ\Ăł ĂśroUlfumarolic Ăś fumĂŚrĂ° olIk Ăś fyumŠ ù�þ\Ăł Ă°rolIk
3.A. ENGLISH DEEP AND SHALLOW ORL'S 105
deep shallow
fungicide Ă° f Ă´ ngI ĂśkÄąd Ă° f Ă´ nËI Ăśsù�ø�ó aIdfungic Ă° f Ă´ ngIk Ă° f Ă´ nËIkgalvanoscope Ă°gĂŚlvĂŚnoĂśskop Ă°gĂŚlvŠYù�þ\Ăł nŠYù��#Ăł!ĂśskoUpgalvanoscopic ĂśgĂŚlvĂŚnoĂ°skopIk ĂśgĂŚlvŠ ù�þ\Ăł nŠ ù��#Ăł Ă°skopIkgastronome Ă°gĂŚstroĂśnom Ă°gĂŚstrŠ ù��#Ăł ĂśnoUmgastronomic ĂśgĂŚstroĂ°nomIk ĂśgĂŚstrŠ ù��#Ăł Ă°nomIkgastroscope Ă°gĂŚstroĂśskop Ă°gĂŚstrŠ ù��#Ăł ĂśskoUpgastroscopic ĂśgĂŚstroĂ°skopIk ĂśgĂŚstrŠ ù��#Ăł Ă°skopIkgeneralization ĂśË Ăˇ ná rĂŚlĂ° Äązatyon ĂśË Ăˇ nr ù�Ú! ?Ăł Š ù�þ\Ăł l Š Ăą ďż˝ Ăł Ă°zeIsŠ ngeneralize Ă°Ë Ăˇ ná r ÜÌlÄąz Ă°Ë Ăˇ nr ù�Ú! ?Ăł Š ù�þ\Ăł Ăś laIzgeneticist ËeĂ°ná tIkIst Š�ðná tIsù�ø�ó Istgenetic ËeĂ°ná tIk Š�ðná tIkgene Ă°Ëen Ă°Ëingenic Ă°ËenIk Ă°Ë Ăˇ nIkgenotype Ă°ËenoĂś ti ù�ú,Ăł p Ă°Ë Ăˇ nŠYù��5Ăł!Ăś taI Ăą(Ăş,Ăł pgenotypicity ĂśËenotiù�ú,Ăł Ă°pIkIti ĂśË Ăˇ nŠYù��5Ăł tI ù�ú,Ăł!Ă°pIsù�ø�ó Itigenotypic ĂśËenoĂ° ti ù�ú,Ăł pIk ĂśË Ăˇ nŠ ù��5Ăł Ă° tI ù�ú,Ăł pIkgeode Ă°Ëeod Ă°ËioUdgeodic ËeĂ° odIk Ëi Ă°odIkgeophagism ËeĂ°ofaĂśgIsmĂź ý�Þ;ÿ�� Ëi Ă°of Š ù�þ\Ăł ĂśËIzŠ m Ăź ý�Þ;ÿ��geophagous ËeĂ°ofagosĂź ý�Þ;ÿ�� Ëi Ă°of Š ù�þ\Ăł gŠ sĂź ý�Þ�ÿ��geophagy ËeĂ°ofagI Ăź ý�Þ�ÿ�� Ëi Ă°of Š ù�þ\Ăł Ëi Ăź ý�Þ;ÿ��geophyte Ă°ËeoĂś fi Ăą(Ăş(Ăł t Ăź ý�Þ�ÿ�� Ă°Ëi ŠYù��5ĂłĂĂś faI ù�ú,Ăł t Ăź Ă˝ĂÞ�ÿ��geophytic ĂśËeoĂ° fi Ăą(Ăş(Ăł tIk Ăź ý�Þ;ÿ�� ĂśËi ŠYù��5ĂłĂĂ° f I Ăą(Ăş(Ăł tIk Ăź ý�Þ�ÿ��gibbose Ă°gIb Ăź ý���� os Ă°gIb Ăź ý����� oUsgibbosity gI Ă°b Ăź ý���� osIti gI Ă°b Ăź ý����� osItiglauconite Ă°gl ďż˝ koĂśnÄąt Ă°gl ďż˝ k ŠYù��5ó�ÜnaItglauconitic Ăśgl ďż˝ koĂ°nÄątIk Ăśgl ďż˝ k Š ù��5Ăł Ă°nItIkglobose Ă°gloĂśbos Ă°gloU ĂśboUsglobosity gloĂ°bosIti gloU Ă°bosItiglucose Ă°glukos Ă°glukoUsglucosic gluĂ°kosIk gluĂ°kosIkglucoside Ă°glukoĂś sÄąd Ă°glukŠ ù��#Ăł ĂśsaIdglucosidic ĂśglukoĂ° sÄądIk ĂśglukŠ ù��#Ăł Ă°sIdIkglycine Ă°gli Ăą(Ăş,Ăł keù��vĂł n Ă°glaI ù�ú,Ăł sù�ø�ó i Ăą ďż˝ Ăł nglycoside Ă°gli Ăą(Ăş,Ăł koĂś sÄąd Ă°glaI ù�ú,Ăł k ŠYù��5ó�ÜsaIdglycosidic Ăśgli Ăą(Ăş,Ăł koĂ° sÄądIk ĂśglaI ù�ú,Ăł k ŠYù��5ó�ðsIdIkgrandiose Ă°grĂŚndI Ăś os Ă°grĂŚndiù��vĂł 
ĂśoUsgrandiosity ĂśgrĂŚndI Ă° osIti ĂśgrĂŚndiù��vĂł Ă°osItigranulite Ă°grĂŚnUl Ăś Äąt Ă°grĂŚny Š�Ü laItgranulitic ĂśgrĂŚnUl Ă° ÄątIk ĂśgrĂŚny Š�ð l ItIkgranulocyte Ă°grĂŚnUlosù�ø�ó Ăś i Ăą(Ăş(Ăł t Ă°grĂŚny Š loU Ăśsù�ø�ó aI Ăą(Ăş,Ăł tgranulocytic ĂśgrĂŚnUloĂ°sù�ø�ó i Ăą(Ăş,Ăł tIk ĂśgrĂŚny Š loU Ă°sù�ø�ó I ù�ú,Ăł tIkgrave Ă°grav Ă°greIvgravity Ă°gravIti Ă°grĂŚvIti
106 CHAPTER 3. ORL DEPTH AND CONSISTENCY
gyroscope Ă°Ëi Ăą(Ăş,Ăł roĂśskop Ă°ËaI Ăą(Ăş(Ăł r ŠYù��5Ăł!ĂśskoUpgyroscopic ĂśËi Ăą(Ăş,Ăł roĂ°skopIk ĂśËaI Ăą(Ăş(Ăł r ŠYù��5Ăł!Ă°skopIkhagioscope Ă°hĂŚgIoĂśskop Ă°hĂŚgiù��vĂł Š\ù��#ó�ÜskoUphagioscopic ĂśhĂŚgIoĂ°skopIk ĂśhĂŚgiù��vĂł Š\ù��#ó�ðskopIkhalophyte Ă°hĂŚloĂś fi Ăą(Ăş,Ăł t Ăź Ă˝ĂÞ�ÿ�� Ă°hĂŚlŠ ù��#Ăł Ăś faI Ăą(Ăş(Ăł t Ăź ý�Þ;ÿ��halophytic ĂśhĂŚloĂ° fi Ăą(Ăş,Ăł tIk Ăź ý�Þ;ÿ�� ĂśhĂŚlŠ ù��#Ăł Ă° f I Ăą(Ăş,Ăł tIk Ăź ý�Þ;ÿ��haplite Ă°hĂŚplÄąt Ă°hĂŚplaIthaplitic hĂŚpĂ° l ÄątIk hĂŚpĂ° l ItIkhelical Ă°há l IkĂŚl Ă°há l Ik Š ù�þ\Ăł lhelices Ă°há l I Ăśk ďż˝ ez Ă°há l I Ăśsù�ø�ó ďż˝ izheliocentricism ĂśhelIoĂ°sù�ø�ó á ntrI ĂśkIsm Ăśhili ù�� Ăł oU Ă°sù�ø�ó á ntrI Ăśsù�ø�ó IzŠ mheliocentricity ĂśhelIosù�ø�ó á nĂ° trIkIti Ăśhili ù�� Ăł oUsù�ø�ó á nĂ° trIsù�ø�ó Itiheliocentric ĂśhelIoĂ°sù�ø�óĂá ntrIk Ăśhili ù�� Ăł oU Ă°sù�ø�ó�á ntrIkheliotrope Ă°helIoĂś trop Ă°hili ù�� óÊ\ù��#Ăł!Ăś troUpheliotropic ĂśhelIoĂ° tropIk Ăśhili ù�� óÊ\ù��#Ăł!Ă° tropIkheliotype Ă°helIoĂś ti ù�ú,Ăł p Ă°hili ù�� óÊ\ù��#Ăł!Ăś taI Ăą(Ăş(Ăł pheliotypic ĂśhelIoĂ° ti ù�ú,Ăł pIk Ăśhili ù�� óÊ\ù��#Ăł!Ă° tI Ăą(Ăş,Ăł pIkhematite Ă°há mĂŚĂś tÄąt Ă°há mŠYù�þ\ĂłĂĂś taIthematitic Ăśhá mĂŚĂ° tÄątIk Ăśhá mŠ ù�þ\Ăł Ă° tItIkhemitrope Ă°há mI Ăś trop Ă°há mI Ăś troUphemitropic Ăśhá mI Ă° tropIk Ăśhá mI Ă° tropIkhemophile Ă°hemoĂś fÄąl Ăź ý�Þ;ÿ�� Ă°himŠ ù��#Ăł Ăś faI l Ăź ý�Þ;ÿ��hemophilic ĂśhemoĂ° fÄąl Ik Ăź ý�Þ;ÿ�� ĂśhimŠ ù��#Ăł Ă° f I l Ik Ăź Ă˝ĂÞ�ÿ��heteroclite Ă°há t á roĂśklÄąt Ă°há tr. Š ù��5Ăł ĂśklaItheteroclitic Ăśhá t á roĂ°klÄątIk Ăśhá tr. 
Š ù��5Ăł Ă°kl ItIkhistiocyte Ă°hIstIoĂśsù�ø�ó i ù�ú,Ăł t Ă°hIsti ù�� Ăł Š ù��#Ăł Ăśsù�ø�ó aI Ăą(Ăş(Ăł thistiocytic ĂśhIstIoĂ°sù�ø�ó i ù�ú,Ăł tIk ĂśhIsti ù�� óÊ\ù��#ó�ðsù�ø�ó I Ăą(Ăş,Ăł tIkhistoricism hI Ă°storI ĂśkIsm hI Ă°stďż˝ rI Ăśsù�ø�ó IzŠ mhistoric hI Ă°storIk hI Ă°stďż˝ rIkholophyte Ă°holoĂś fi ù�ú,Ăł t Ăź ý�Þ�ÿ�� Ă°holŠYù��5ó�Ü faI ù�ú,Ăł t Ăź ý�Þ;ÿ��holophytic ĂśholoĂ° fi ù�ú,Ăł tIk Ăź ý�Þ;ÿ�� ĂśholŠYù��5ó�ð f I Ăą(Ăş,Ăł tIk Ăź Ă˝ĂÞ�ÿ��holotype Ă°holoĂś ti ù�ú,Ăł p Ă°holŠ ù��5Ăł Ăś taI ù�ú,Ăł pholotypic ĂśholoĂ° ti ù�ú,Ăł pIk ĂśholŠ ù��5Ăł Ă° tI Ăą(Ăş,Ăł pIkhomologize hoĂ°moloĂś gÄąz hŠ ù��5Ăł Ă°molŠ ù��#Ăł ĂśËaIzhomologous hoĂ°mologos hŠ ù��5Ăł Ă°molŠ ù��#Ăł gŠ shomology hoĂ°mologI hŠ ù��5Ăł Ă°molŠ ù��#Ăł Ëihomophile Ă°homoĂś fÄąl Ăź ý�Þ;ÿ�� Ă°hoUmŠ ù��#Ăł Ăś faI l Ăź Ă˝ĂÞ�ÿ��homophone Ă°homoĂś fonĂź ý�Þ�ÿ�� Ă°homŠ\ù��#Ăł Ăś foUn Ăź ý�Þ;ÿ��homophonic ĂśhomoĂ° fonIk Ăź ý�Þ;ÿ�� ĂśhomŠ\ù��#Ăł Ă° fonIk Ăź ý�Þ;ÿ��homophyllic ĂśhomoĂ° f I ù�ú,Ăł l Ăź ý������ Ik Ăź Ă˝ĂÞ�ÿ�� ĂśhoUmŠYù��#ó�ð f I Ăą(Ăş,Ăł l Ăź ý������ Ik Ăź Ă˝ĂÞ�ÿ��homozygote ĂśhomoĂ°zi ù�ú,Ăł got ĂśhoUmŠYù��#ó�ðzaI Ăą(Ăş,Ăł goUthomozygotic Ăśhomoziù�ú,Ăł Ă°gotIk ĂśhoUmŠYù��#Ăł zaI Ăą(Ăş,ó�ðgotIkhoplite Ă°hoplÄąt Ă°hoplaIthoplitic hopĂ° l ÄątIk hopĂ° l ItIkhoroscope Ă°horoĂśskop Ă°hďż˝ r Š ù��#Ăł ĂśskoUp
horoscopic ĂśhoroĂ°skopIk Ăśhďż˝ r Š\ù��#ĂłĂĂ°skopIkhospitalization ĂśhospItĂŚlÄą Ă°zatyon ĂśhospIt ŠYù�þ5Ăł l I Ă°zeIsŠ nhospitalize Ă°hospItĂŚĂś l Äąz Ă°hospIt ŠYù�þ5Ăł Ăś laIzhumane humĂ° an hyuĂ°meInhumanity humĂ°ĂŚnIti hyuĂ°mĂŚnItihydroelectricity Ăśhi Ăą(Ăş,Ăł droá l á k Ă° trIkIti ĂśhaI ù�ú,Ăł droUI ù�Ú;Ăł l á k Ă° trIsù�ø�ó Itihydroelectric Ăśhi Ăą(Ăş,Ăł droá>Ă° l á ktrIk ĂśhaI ù�ú,Ăł droUI ù�Ú;Ăł Ă° l á ktrIkhydrolyte Ă°hi Ăą(Ăş,Ăł droĂś li Ăą(Ăş(Ăł t Ă°haI ù�ú,Ăł drŠ ù��#Ăł Ăś laI ù�ú,Ăł thydrolytic hi Ăą(Ăş,Ăł droĂ° li ù�ú,Ăł tIk haI ù�ú,Ăł drŠ ù��#Ăł Ă° l I Ăą(Ăş,Ăł tIkhydrophyte Ă°hi Ăą(Ăş,Ăł droĂś fi Ăą(Ăş(Ăł t Ăź ý�Þ�ÿ�� Ă°haI ù�ú,Ăł drŠ ù��#Ăł Ăś faI ù�ú,Ăł t Ăź ý�Þ;ÿ��hydrophytic Ăśhi Ăą(Ăş,Ăł droĂ° fi Ăą(Ăş(Ăł tIk Ăź ý�Þ;ÿ�� ĂśhaI ù�ú,Ăł drŠ ù��#Ăł Ă° f I Ăą(Ăş,Ăł tIk Ăź ý�Þ;ÿ��hydroscope Ă°hi Ăą(Ăş,Ăł droĂśskop Ă°haI ù�ú,Ăł drŠ ù��#Ăł ĂśskoUphydroscopic Ăśhi Ăą(Ăş,Ăł droĂ°skopIk ĂśhaI ù�ú,Ăł drŠYù��#Ăł!Ă°skopIkhygroscope Ă°hi Ăą(Ăş,Ăł groĂśskop Ă°haI ù�ú,Ăł grŠYù��#Ăł!ĂśskoUphygroscopic Ăśhi Ăą(Ăş,Ăł groĂ°skopIk ĂśhaI ù�ú,Ăł grŠYù��#Ăł!Ă°skopIkhypersthene Ă°hi Ăą(Ăş,Ăł pá r Ăśsďż˝ en Ă°haI ù�ú,Ăł pr. Ăśsďż˝ inhypersthenic Ăśhi Ăą(Ăş,Ăł pá r Ă°sďż˝ enIk ĂśhaI ù�ú,Ăł pr. 
Ă°s��á nIkhypogene Ă°hi Ăą(Ăş,Ăł poĂśËen Ă°haI ù�ú,Ăł pŠYù��5Ăł!ĂśËinhypogenic Ăśhi Ăą(Ăş,Ăł poĂ°ËenIk ĂśhaI ù�ú,Ăł pŠ ù��5Ăł Ă°Ë Ăˇ nIkichthyolite Ă° Ik ù�øaÝ�ó ďż˝ I ù�ú,Ăł oĂś l Äąt Ă° Ik ù�øaĂť,Ăł ďż˝ i Ăą(Ăş(Ăł Š ù��#Ăł Ăś laItichthyolitic Ăś Ik ù�øaÝ�ó ďż˝ I ù�ú,Ăł oĂ° l ÄątIk Ăś Ik ù�øaĂť,Ăł ďż˝ i Ăą(Ăş(Ăł Š ù��#Ăł Ă° l ItIkichthyophagous Ăś Ik ù�øaÝ�ó ďż˝ I ù�ú,Ăł Ă°ofagosĂź ý�Þ;ÿ�� Ăś Ik ù�øaĂť,Ăł ďż˝ i Ăą(Ăş(Ăł Ă°of Š ù�þ5Ăł gŠ sĂź Ă˝ĂÞ�ÿ��ichthyophagy Ăś Ik ù�øaÝ�ó ďż˝ I ù�ú,Ăł Ă°ofagI Ăź Ă˝ĂÞ�ÿ�� Ăś Ik ù�øaĂť,Ăł ďż˝ i Ăą(Ăş(Ăł Ă°of Š ù�þ5Ăł Ëi Ăź ý�Þ;ÿ��iconomaticism Äą ĂśkonoĂ°mĂŚtI ĂśkIsm aI ĂśkonŠ ù��#Ăł Ă°mĂŚtI Ăśsù�ø�ó IzŠ miconomatic Äą ĂśkonoĂ°mĂŚtIk aI ĂśkonŠYù��#ó�ðmĂŚtIkidiophone Ă° IdIoĂś fonĂź ý�Þ;ÿ�� Ă° Idi Ăą ďż˝ óÊYù��5Ăł!Ăś foUn Ăź ý�Þ;ÿ��idiophonic Ăś IdIoĂ° fonIk Ăź ý�Þ;ÿ�� Ăś Idi Ăą ďż˝ óÊYù��5Ăł!Ă° fonIk Ăź ý�Þ;ÿ��imide Ă° ImÄąd Ă° ImaIdimidic I Ă°mÄądIk I Ă°mIdIkimpastation Ăś ImpasĂ° tatyon Ăś ImpĂŚsĂ° teIsŠ nimpaste ImĂ°past ImĂ°peIstimpolite Ăś ImpoĂ° l Äąt Ăś ImpŠ ù��#Ăł Ă° laItimpolitic ImĂ°polÄątIk ImĂ°polItIkinane I Ă°nan I Ă°neIninanity I Ă°nanIti I Ă°nĂŚnItiinclination Ăś InklI Ă°natyon Ăś Inkl Š ù��vĂł Ă°neIsŠ nincline InĂ°klÄąn InĂ°klaInincommode Ăś InkoĂ°m Ăź ý������ od Ăś InkŠYù��5ó�ðm Ăź ý���� oUdincommodity Ăś InkoĂ°m Ăź ý������ odIti Ăś InkŠYù��5ó�ðm Ăź ý���� odItiindigene Ă° IndI ĂśËen Ă° IndI ĂśËinindigenity Ăś IndI Ă°ËenIti Ăś IndI Ă°Ë Ăˇ nItiindignation Ăś IndÄągĂ°natyon Ăś IndIgĂ°neIsŠ nindignity InĂ°dÄągnIti InĂ°dIgnItiindign InĂ°dÄągn InĂ°daI ù�� Ăž Ăł ninefficacious Ăś Iná f Ăź ý���� I Ă°kakyos Ăś Iná f Ăź ý�����tŠ ù�� Ăł Ă°keIsŠ s
inefficacity Ăś Iná f Ăź ý���� I Ă°kakIti Ăś Iná f Ăź ý����nŠYù�� óïðkĂŚsù�ø�ó Itiinelasticity Ăś Iná lĂŚsĂ° tIkIti Ăś InI ù�Ú;Ăł lĂŚsĂ° tIsù�ø�ó Itiinelastic Ăś Iná>Ă° lĂŚstIk Ăś InI ù�Ú;Ăł!Ă° lĂŚstIkiniquity I Ă°nIkwIti I Ă°nIkwItiinsane InĂ°san InĂ°seIninsanity InĂ°sanIti InĂ°sĂŚnItiintervene Ăś Intá r Ă°ven Ăś Intr. Ă°vinintervention Ăś Intá r Ă°v á ntyon Ăś Intr. Ă°v á ncŠ ninurbane Ăś InUr Ă°ban Ăś Inr. ù�ò� ?Ăł Ă°beIninurbanity Ăś InUr Ă°bĂŚnIti Ăś Inr. ù�ò� ?Ăł Ă°bĂŚnItiinvitation Ăś InvÄą Ă° tatyon Ăś InvI Ă° teIsŠ ninvite InĂ°vÄąt InĂ°vaItinvocation Ăś InvoĂ°katyon Ăś Inv Š\ù��#ĂłĂĂ°keIsŠ ninvoke InĂ°vok InĂ°voUkionization Ăś ÄąonÄą Ă°zatyon ĂśaI ŠYù��#Ăł nŠ\ù�� óïðzeIsŠ nionize Ă° ÄąoĂśnÄąz Ă°aI ŠYù��#Ăł!ĂśnaIzisocline Ă° ÄąsoĂśklÄąn Ă°aIsŠYù��5Ăł!ĂśklaInisoclinic Ăś ÄąsoĂ°klÄąnIk ĂśaIsŠYù��5Ăł!Ă°kl InIkisotone Ă° ÄąsoĂś ton Ă°aIsŠ ù��5Ăł Ăś toUnisotonic Ăś ÄąsoĂ° tonIk ĂśaIsŠ ù��5Ăł Ă° tonIkisotope Ă° ÄąsoĂś top Ă°aIsŠ ù��5Ăł Ăś toUpisotopic Ăś ÄąsoĂ° topIk ĂśaIsŠ ù��5Ăł Ă° topIkkaleidoscope k Ăą Ăż Ăł ĂŚĂ° li ù�Ú"ďż˝ Ăł doĂśskop k Ăą Ăż Ăł Š ù�þ5Ăł Ă° laI ù�Ú"ďż˝ Ăł dŠ ù��#Ăł ĂśskoUpkaleidoscopic k Ăą Ăż Ăł ĂŚĂś li ù�Ú"ďż˝ Ăł doĂ°skopIk k Ăą Ăż Ăł Š ù�þ5Ăł Ăś laI ù�Ú"ďż˝ Ăł dŠ ù��#Ăł Ă°skopIkkaryotype Ă°k Ăą Ăż Ăł ĂŚrI ù�ú,Ăł oĂś ti Ăą(Ăş,Ăł p Ă°k Ăą Ăż Ăł ĂŚriĂą(Ăş(Ăł Š ù��#Ăł Ăś taI ù�ú,Ăł pkaryotypic Ăśk Ăą Ăż Ăł ĂŚrI ù�ú,Ăł oĂ° ti Ăą(Ăş,Ăł pIk Ăśk Ăą Ăż Ăł ĂŚriĂą(Ăş(Ăł Š ù��#Ăł Ă° tI Ăą(Ăş(Ăł pIkkyanite Ă°k Ăą Ăż Ăł i ù�ú,Ăł ĂŚĂśnÄąt Ă°k Ăą Ăż Ăł aI ù�ú,óÊYù�þ\ó�ÜnaItkyanitic Ăśk Ăą Ăż Ăł i ù�ú,Ăł ĂŚĂ°nÄątIk Ăśk Ăą Ăż Ăł aI ù�ú,óÊYù�þ\ó�ðnItIklaccolite Ă° lĂŚkĂź ý������ oĂś l Äąt Ă° lĂŚkĂź ý������tŠYù��5ó�Ü laItlaccolitic Ăś lĂŚkĂź ý������ oĂ° l ItIk Ăś lĂŚkĂź ý������tŠYù��5ó�ð l ItIklachrymose Ă° lĂŚkù�øfÝ�ó rI Ăą(Ăş,ó�Ümos Ă° lĂŚkù�øfÝ�ó r ŠYĂą(Ăş(Ăł!ĂśmoUslachrymosity Ăś lĂŚkù�øfÝ�ó rI Ăą(Ăş,Ăł Ă°mosIti Ăś lĂŚkù�øfÝ�ó r Š Ăą(Ăş(Ăł Ă°mosItilactone Ă° lĂŚkton Ă° lĂŚktoUnlactonic lĂŚkĂ° tonIk lĂŚkĂ° tonIklanose Ă° lanos Ă° leInoUslanosity 
laĂ°nosIti leI Ă°nosItilanuginous lĂŚĂ°nugInos l Š ù�þ5Ăł Ă°nu Š ù�� Ăł nŠ slanugo lĂŚĂ°nugo l Š ù�þ5Ăł Ă°nugoU
laryngoscope lÌðrI ù�ú,ó ngoÜskop l ŠYù�þ5ó�ð rI ù(ú,ó�� gŠYù��5ó!ÜskoUplaryngoscopic lÌÜrI ù�ú,ó ngoðskopIk l ŠYù�þ5ó�Ü rI ù(ú,ó�� gŠYù��5ó!ðskopIklaterite ð lÌtá>Ü rĹt ð lÌtŠ�Ü raItlateritic Ü lÌtá>ð rĹtIk Ü lÌtŠ�ð rItIklavation lÌðvatyon lÌðveIsŠ nlenticellate Ü l á ntI ðk á l ß ý������ at Ü l á ntI ðsù�ø�ómá l ß ý���� I ù�þ\ó t ��ù�Ú;ólenticel ð l á ntI Ük á l ð l á ntI Üsù�ø�ó á l
lentic Ă° l á ntIk Ă° l á ntIkleucite Ă° lu ù�Ú?ò�ó kÄąt Ă° lu ù�Ú?ò�ó sù�ø�ó aItleucitic lu ù�Ú?ò�óĂĂ°kÄątIk lu ù�Ú?ò�óĂĂ°sù�ø�ó ItIkleukocyte Ă° lu ù�Ú?ò�ó k Ăą Ăż Ăł oĂśsù�ø�ó i ù�ú,Ăł t Ă° lu ù�Ú?ò�ó k Ăą Ăż óÊYù��5Ăł!Ăśsù�ø�ó aI Ăą(Ăş,Ăł tleukocytic Ăś lu ù�Ú?ò�ó k Ăą Ăż Ăł oĂ°sù�ø�ó i ù�ú,Ăł tIk Ăś lu ù�Ú?ò�ó k Ăą Ăż Ăł Š ù��5Ăł Ă°sù�ø�ó I ù�ú,Ăł tIklignite Ă° l IgnÄąt Ă° l IgnaItlignitic l IgĂ°nÄątIk l IgĂ°nItIklimonite Ă° l ÄąmoĂśnÄąt Ă° laImŠ ù��#Ăł ĂśnaItlimonitic Ăś l ÄąmoĂ°nÄątIk Ăś laImŠ ù��#Ăł Ă°nItIklithophyte Ă° l I ďż˝ oĂś fi ù�ú,Ăł t Ăź ý�Þ;ÿ�� Ă° l I ��Š ù��5Ăł Ăś faI Ăą(Ăş,Ăł t Ăź Ă˝ĂÞ�ÿ��lithophytic Ăś l I ďż˝ oĂ° fi ù�ú,Ăł tIk Ăź ý�Þ�ÿ�� Ăś l I ��Š ù��5Ăł Ă° f I ù�ú,Ăł tIk Ăź ý�Þ�ÿ��logicism Ă° logI ĂśkIsm Ă° loËI Ăśsù�ø�ó IzŠ mlogic Ă° logIk Ă° loËIkloquacious loĂ°kwakyos loU Ă°kweIsŠ sloquacity loĂ°kwĂŚkIti loU Ă°kwĂŚsù�ø�ó Itilycanthrope Ă° li Ăą(Ăş,Ăł kĂŚnĂś ďż˝ rop Ă° laI Ăą(Ăş,Ăł k ŠYù�þ5Ăł nĂś ďż˝ roUplycanthropic Ăś li Ăą(Ăş,Ăł kĂŚnĂ° ďż˝ ropIk Ăś laI Ăą(Ăş,Ăł k Š ù�þ5Ăł nĂ° ďż˝ ropIklymphocyte Ă° l I ù�ú,Ăł mfoĂśsù�ø�ó i Ăą(Ăş,Ăł t Ăź ý�Þ;ÿ�� Ă° l I ù�ú,Ăł mf Š ù��#Ăł Ăśsù�ø�ó aI Ăą(Ăş(Ăł t Ăź ý�Þ;ÿ��lymphocytic Ăś l I ù�ú,Ăł mfoĂ°sù�ø�ó i Ăą(Ăş,Ăł tIk Ăź Ă˝ĂÞ�ÿ�� Ăś l I ù�ú,Ăł mf Š ù��#Ăł Ă°sù�ø�ó I Ăą(Ăş,Ăł tIk Ăź Ă˝ĂÞ�ÿ��lyricism Ă° l I ù�ú,Ăł rI ĂśkIsm Ă° l I ù�ú,Ăł rI Ăśsù�ø�ó IzŠ mlyricist Ă° l I ù�ú,Ăł rIkIst Ă° l I ù�ú,Ăł rIsù�ø�ó Istlyric Ă° l I ù�ú,Ăł rIk Ă° l I ù�ú,Ăł rIkmacrocyte Ă°mĂŚkroĂśsù�ø�ó i ù�ú,Ăł t Ă°mĂŚkrŠ ù��#Ăł Ăśsù�ø�ó aI Ăą(Ăş,Ăł tmacrocytic ĂśmĂŚkroĂ°sù�ø�ó i ù�ú,Ăł tIk ĂśmĂŚkrŠ ù��#Ăł Ă°sù�ø�ó I ù�ú,Ăł tIkmacrophage Ă°mĂŚkroĂś fagĂź ý�Þ;ÿ�� Ă°mĂŚkrŠYù��#ó�Ü feIË Ăź Ă˝ĂÞ�ÿ��macrophagic ĂśmĂŚkroĂ° fagIk Ăź ý�Þ�ÿ�� ĂśmĂŚkrŠYù��#ó�ð fĂŚ Ik Ăź Ă˝ĂÞ�ÿ��magnetite Ă°mĂŚgná>Ăś tÄąt Ă°mĂŚgnI ù�Ú;ó�Ü taItmagnetitic ĂśmĂŚgná>Ă° tÄątIk ĂśmĂŚgnI ù�Ú;ó�ð tItIkmagnificence mĂŚgĂ°nIf Ik á ns mĂŚgĂ°nIf Isù�ø�ó Š nsmagnificent mĂŚgĂ°nIf Ik á nt mĂŚgĂ°nIf Isù�ø�ó Š ntmagnific mĂŚgĂ°nIf Ik mĂŚgĂ°nIf Ikmalignity mĂŚĂ° l ÄągnIti 
mŠ ù�þ5Ăł Ă° l IgnItimalign mĂŚĂ° l Äągn mŠ ù�þ5Ăł Ă° laI ù�� Ăž Ăł nmartensite Ă°mĂŚrtá nĂśz ù��>Ăł Äąt Ă°mďż˝ rt á nĂśz ù��>Ăł aItmartensitic ĂśmĂŚrtá nĂ°z ù��>Ăł ÄątIk Ăśmďż˝ rt á nĂ°z ù��>Ăł ItIkmatrices Ă°matrI Ăśk ďż˝ ez Ă°meItrI Ăśsù�ø�ó ďż˝ izmatrix Ă°matrIks Ă°meItrIksmedicine Ă°má dIkIn��ù�Ú;Ăł Ă°má dIsù�ø�ó In��ù�Ú;Ăłmedic Ă°má dIk Ă°má dIkmegaphone Ă°má gĂŚĂś fonĂź ý�Þ;ÿ�� Ă°má gŠYù�þ\ĂłĂĂś foUn Ăź ý�Þ;ÿ��megaphonic Ăśmá gĂŚĂ° fonIk Ăź ý�Þ;ÿ�� Ăśmá gŠYù�þ\ĂłĂĂ° fonIk Ăź ý�Þ;ÿ��mendacious má nĂ°dakyos má nĂ°deIsŠ smendacity má nĂ°dakIti má nĂ°dĂŚsù�ø�ó Itimesophyte Ă°má z ù��>Ăł oĂś fi Ăą(Ăş(Ăł t Ăź ý�Þ;ÿ�� Ă°má z ù��>Ăł Š ù��#Ăł Ăś faI ù�ú,Ăł t Ăź ý�Þ;ÿ��mesophytic Ăśmá z ù��>Ăł oĂ° fi Ăą(Ăş(Ăł tIk Ăź Ă˝ĂÞ�ÿ�� Ăśmá z ù��>Ăł Š ù��#Ăł Ă° f I Ăą(Ăş,Ăł tIk Ăź Ă˝ĂÞ�ÿ��
mesothoracic Ăśmá z ù��>Ăł oďż˝ oĂ° rĂŚkIk Ăśmá z ù��>óÊYù��5Ăł#����ð rĂŚsù�ø�ó Ikmesothorax Ăśmá z ù��>Ăł oĂ° ďż˝ orĂŚks Ăśmá z ù��>óÊYù��5óïð ��� rĂŚksmetaphysicist Ăśmá tĂŚĂ° f I Ăą(Ăş,Ăł z ù��>Ăł IkIstĂź ý�Þ�ÿ�� Ăśmá t ŠYù�þ5Ăł!Ă° f I Ăą(Ăş,Ăł z ù��>Ăł Isù�ø�ó IstĂź ý�Þ;ÿ��metaphysic Ăśmá tĂŚĂ° f I Ăą(Ăş,Ăł z ù��>Ăł Ik Ăź Ă˝ĂÞ�ÿ�� Ăśmá t ŠYù�þ5Ăł!Ă° f I Ăą(Ăş,Ăł z ù��>Ăł Ik Ăź ý�Þ�ÿ��metathoracic Ăśmá tĂŚďż˝ oĂ°rĂŚkIk Ăśmá t Š ù�þ5Ăł ����ðrĂŚsù�ø�ó Ikmetathorax Ăśmá tĂŚĂ° ďż˝ orĂŚks Ăśmá t Š ù�þ5Ăł Ă° ��� rĂŚksmeteorite Ă°meteoĂśrÄąt Ă°miti Š ù��#Ăł Ăś raItmeteoritic ĂśmeteoĂ°rÄątIk Ăśmiti Š ù��#Ăł Ă° rItIkmetronome Ă°má troĂśnom Ă°má tr Š ù��#Ăł ĂśnoUmmetronomic Ăśmá troĂ°nomIk Ăśmá tr Š ù��#Ăł Ă°nomIkmicrocyte Ă°mÄąkroĂśsù�ø�ó i Ăą(Ăş,Ăł t Ă°maIkr Š ù��#Ăł Ăśsù�ø�ó aI Ăą(Ăş,Ăł tmicrocytic ĂśmÄąkroĂ°sù�ø�ó i Ăą(Ăş,Ăł tIk ĂśmaIkr Š ù��#Ăł Ă°sù�ø�ó I ù�ú,Ăł tIkmicroparasite ĂśmÄąkroĂ°pĂŚrĂŚĂś sÄąt ĂśmaIkroU Ă°pĂŚrŠ\ù�þ\Ăł ĂśsaItmicroparasitic ĂśmÄąkroĂśpĂŚrĂŚĂ° sÄątIk ĂśmaIkroU ĂśpĂŚrŠ\ù�þ\Ăł Ă°sItIkmicrophone Ă°mÄąkroĂś fonĂź ý�Þ;ÿ�� Ă°maIkr ŠYù��#ó�Ü foUn Ăź ý�Þ;ÿ��microphonic ĂśmÄąkroĂ° fonIk Ăź ý�Þ;ÿ�� ĂśmaIkr ŠYù��#ó�ð fonIk Ăź Ă˝ĂÞ�ÿ��microphyte Ă°mÄąkroĂś fi ù�ú,Ăł t Ăź ý�Þ�ÿ�� Ă°maIkr ŠYù��#ó�Ü faI Ăą(Ăş,Ăł t Ăź ý�Þ;ÿ��microphytic ĂśmÄąkroĂ° fi ù�ú,Ăł tIk Ăź ý�Þ;ÿ�� ĂśmaIkr ŠYù��#ó�ð f I ù�ú,Ăł tIk Ăź ý�Þ;ÿ��microscope Ă°mÄąkroĂśskop Ă°maIkr Š ù��#Ăł ĂśskoUpmicroscopic ĂśmÄąkroĂ°skopIk ĂśmaIkr Š ù��#Ăł Ă°skopIkmicrotome Ă°mÄąkroĂś tom Ă°maIkr Š ù��#Ăł Ăś toUmmicrotomic ĂśmÄąkroĂ° tomIk ĂśmaIkr Š ù��#Ăł Ă° tomIkmime Ă°mÄąm Ă°maImmimic Ă°mÄąmIk Ă°mImIkmisanthrope Ă°mIsĂŚnĂś ďż˝ rop Ă°mIsŠ ù�þ5Ăł nĂś ďż˝ roUpmisanthropic ĂśmIsĂŚnĂ° ďż˝ ropIk ĂśmIsŠ ù�þ5Ăł nĂ° ďż˝ ropIkmispronounce ĂśmIsproĂ°nuns ĂśmIsprŠYù��#ĂłĂĂ°naUnsmispronunciation ĂśmIsproĂśnu ù�ò,Ăł nsù�ø�ó I Ă° atyon ĂśmIsprŠYù��#ĂłĂĂśnĂ´ nsù�ø�ó i Ăą ďż˝ Ăł!Ă°eIsŠ nmithridate Ă°mI ďż˝ rI Ăśdat Ă°mI ďż˝ r Š\ù�� Ăł!ĂśdeItmithridatic ĂśmI ďż˝ rI Ă°datIk ĂśmI ďż˝ rI Ă°dĂŚtIkmonasticism moĂ°nĂŚstI ĂśkIsm 
mŠYù��5Ăł!Ă°nĂŚstI Ăśsù�ø�ó IzŠ mmonastic moĂ°nĂŚstIk mŠ ù��5Ăł Ă°nĂŚstIkmonochromate ĂśmonoĂ°k ù�øaĂť,Ăł romat ĂśmonŠ ù��#Ăł Ă°k ù�øaÝ�ó roUmeItmonochromatic Ăśmonokù�øaĂť,Ăł roĂ°matIk ĂśmonŠ ù��#Ăł k ù�øaÝ�ó roU Ă°mĂŚtIkmonocline Ă°monoĂśklÄąn Ă°monŠ ù��#Ăł ĂśklaInmonoclinic ĂśmonoĂ°klÄąnIk ĂśmonŠ ù��#Ăł Ă°kl InIkmonocyte Ă°monoĂśsù�ø�ó i Ăą(Ăş,Ăł t Ă°monŠ ù��#Ăł Ăśsù�ø�ó aI Ăą(Ăş(Ăł tmonocytic ĂśmonoĂ°sù�ø�ó i Ăą(Ăş,Ăł tIk ĂśmonŠ ù��#Ăł Ă°sù�ø�ó I Ăą(Ăş,Ăł tIkmonotype Ă°monoĂś ti ù�ú,Ăł p Ă°monŠ ù��#Ăł Ăś taI Ăą(Ăş(Ăł pmonotypic ĂśmonoĂ° ti ù�ú,Ăł pIk ĂśmonŠYù��#ó�ð tI Ăą(Ăş,Ăł pIkmonzonite Ă°monzoĂśnÄąt Ă°monzŠYù��5ó�ÜnaItmonzonitic ĂśmonzoĂ°nÄątIk ĂśmonzŠYù��5ó�ðnItIkmordacious morĂ°dakyos mďż˝ r Ă°deIsŠ smordacity morĂ°dakIti mďż˝ r Ă°dĂŚsù�ø�ó Itimucose Ă°mukos Ă°myukoUs
mucosity mukĂ° osIti myuĂ°kosItimyope Ă°mi Ăą(Ăş,Ăł op Ă°maI Ăą(Ăş,Ăł oUpmyopic mi Ăą(Ăş,ó�ð opIk maI Ăą(Ăş,ĂłĂĂ°opIkmysticism Ă°mI ù�ú,Ăł stI ĂśkIsm Ă°mI ù�ú,Ăł stI Ăśsù�ø�ó IzŠ mmystic Ă°mI ù�ú,Ăł stIk Ă°mI ù�ú,Ăł stIkneoclassicism ĂśneoĂ°klĂŚsĂź ý������ I ĂśkIsm ĂśnioU Ă°klĂŚsĂź ý���� I Ăśsù�ø�ó IzŠ mneoclassic ĂśneoĂ°klĂŚsĂź ý������ Ik ĂśnioU Ă°klĂŚsĂź ý���� Ikneophyte Ă°neoĂś fi Ăą(Ăş,Ăł t Ăź ý�Þ;ÿ�� Ă°ni Š ù��5Ăł Ăś faI Ăą(Ăş,Ăł t Ăź ý�Þ�ÿ��neophytic ĂśneoĂ° fi Ăą(Ăş,Ăł tIk Ăź ý�Þ;ÿ�� Ăśni Š ù��5Ăł Ă° f I ù�ú,Ăł tIk Ăź ý�Þ;ÿ��neuroticism nuù�Ú?ò,Ăł Ă°rotI ĂśkIsm nuù�Ú?ò,Ăł Ă°rotI Ăśsù�ø�ó IzŠ mneurotic nuù�Ú?ò,Ăł Ă°rotIk nuù�Ú?ò,Ăł Ă°rotIknoctiluca ĂśnoktI Ă° lukĂŚ ĂśnoktŠ Ăą ďż˝ Ăł Ă° luk Šnoctilucent ĂśnoktI Ă° luk á nt ĂśnoktŠYĂą ďż˝ ó�ð lusù�ø�ó!Š ntnodose Ă°noĂśdos Ă°noU ĂśdoUsnodosity noĂ°dosIti noU Ă°dosItinummulite Ă°nĂ´ m Ăź ý���� Ul Ăś Äąt Ă°nĂ´ m Ăź ý���� y Š�Ü laItnummulitic ĂśnĂ´ m Ăź ý���� U Ă° l ÄątIk ĂśnĂ´ m Ăź ý���� y Š�ð l ItIkobligee ĂśoblI Ă°ge ĂśoblŠYù�� ĂłĂĂ°Ëiobligor ĂśoblI Ă°gor ĂśoblŠ ù�� Ăł Ă°gďż˝ roblique oĂ°bleù��vĂł k ù��\ò,Ăł Š ù��5Ăł Ă°bli Ăą ďż˝ Ăł k ù��\ò�óobliquity oĂ°bleù��vĂł kwIti Š ù��5Ăł Ă°blIkwItiobscene obĂ°sďż˝ ù�ø�ó en Š ù��5Ăł bĂ°sďż˝ ù�ø�ó inobscenity obĂ°sďż˝ ù�ø�ó enIti Š ù��5Ăł bĂ°sďż˝ ù�ø�ó á nItiomnificent omĂ°nIf Ik á nt omĂ°nIf Isù�ø�ó Š ntomnific omĂ°nIf Ik omĂ°nIf Ikomophagous oĂ°mofagosĂź Ă˝ĂÞ�ÿ�� oU Ă°mofŠYù�þ\Ăł gŠ sĂź ý�Þ�ÿ��omophagy oĂ°mofĂŚgI Ăź ý�Þ�ÿ�� oU Ă°mofŠYù�þ\Ăł Ëi Ăź ý�Þ;ÿ��oolite Ă° ooĂś l Äąt Ă°oU ŠYù��5ĂłĂĂś laItoolitic Ăś ooĂ° l ÄątIk ĂśoU Š ù��5Ăł Ă° l ItIkoophyte Ă° ooĂś fi ù�ú,Ăł t Ăź ý�Þ;ÿ�� Ă°oU Š ù��5Ăł Ăś faI ù�ú,Ăł t Ăź Ă˝ĂÞ�ÿ��oophytic Ăś ooĂ° fi ù�ú,Ăł tIk Ăź Ă˝ĂÞ�ÿ�� ĂśoU Š ù��5Ăł Ă° f I Ăą(Ăş(Ăł tIk Ăź ý�Þ�ÿ��opacity oĂ°pakIti oU Ă°pĂŚsù�ø�ó Itiopaque oĂ°pakù��\ò�ó oU Ă°peIk ù��\ò�óoperate Ă°opá>Ăś rat Ă°opŠ�Ü reItoperatic Ăśopá>Ă° ratIk ĂśopŠ�ð rĂŚtIkophthalmoscope of Ă° ďż˝ ĂŚlmoĂśskopĂź ý�Þ;ÿ�� of Ă° ďż˝ ĂŚlmŠ ù��#Ăł ĂśskoUp Ăź Ă˝ĂÞ�ÿ��ophthalmoscopic of Ăś ďż˝ 
ĂŚlmoĂ°skopIk Ăź ý�Þ;ÿ�� of Ăś ďż˝ ĂŚlmŠ\ù��#ĂłĂĂ°skopIk Ăź Ă˝ĂÞ�ÿ��organicism orĂ°gĂŚnI ĂśkIsm ďż˝ r Ă°gĂŚnI Ăśsù�ø�ó IzŠ morganic orĂ°gĂŚnIk ďż˝ r Ă°gĂŚnIkorganization ĂśorgĂŚnI Ă°zatyon Ăś ďż˝ rgŠYù�þ\Ăł nI Ă°zeIsŠ norganize Ă°orgĂŚĂśnÄąz Ă° ďż˝ rgŠYù�þ\ĂłĂĂśnaIzorthoscope Ă°orďż˝ oĂśskop Ă° ďż˝ r ��Š ù��#Ăł ĂśskoUporthoscopic Ăśorďż˝ oĂ°skopIk Ăś ďż˝ r ��Š ù��#Ăł Ă°skopIkosteophyte Ă°osteoĂś fi Ăą(Ăş,Ăł t Ăź ý�Þ;ÿ�� Ă°ostiŠ ù��5Ăł Ăś faI Ăą(Ăş,Ăł t Ăź ý�Þ�ÿ��osteophytic ĂśosteoĂ° fi Ăą(Ăş,Ăł tIk Ăź ý�Þ;ÿ�� ĂśostiŠ ù��5Ăł Ă° f I ù�ú,Ăł tIk Ăź ý�Þ;ÿ��
otiose Ă° otI Ăś os Ă°oUsĂąďż˝$^Ăł i ù��vó�ÜoUsotiosity Ăś otI Ă° osIti ĂśoUsĂąďż˝$^Ăł i ù��vó�ðosItiotoscope Ă° otoĂśskop Ă°oUt Š\ù��#ĂłĂĂśskoUpotoscopic Ăś otoĂ°skopIk ĂśoUt Š\ù��#ĂłĂĂ°skopIkoxidase Ă°oksÄą Ăśdas Ă°oksI ĂśdeIsoxidasic ĂśoksI Ă°dasIk ĂśoksI Ă°dĂŚsIkoxidation ĂśoksÄą Ă°datyon ĂśoksI Ă°deIsŠ noxide Ă°oksÄąd Ă°oksaIdozone Ă° ozon Ă°oUzoUnozonic oĂ°zonIk oU Ă°zonIkpalindrome Ă°pĂŚlInĂśdrom Ă°pĂŚlInĂśdroUmpalindromic ĂśpĂŚlInĂ°dromIk ĂśpĂŚlInĂ°dromIkpantomime Ă°pĂŚntoĂśmÄąm Ă°pĂŚntŠ\ù��#ĂłĂĂśmaImpantomimic ĂśpĂŚntoĂ°mÄąmIk ĂśpĂŚntŠ\ù��#ĂłĂĂ°mImIkparasite Ă°pĂŚrĂŚĂś sÄąt Ă°pĂŚrŠYù�þ\ĂłĂĂśsaItparasiticide ĂśpĂŚrĂŚĂ° sÄątI Ăśsù�ø�ó Äąd ĂśpĂŚrŠYù�þ\ĂłĂĂ°sItI Ăśsù�ø�ó aIdparasitic ĂśpĂŚrĂŚĂ° sÄątIk ĂśpĂŚrŠYù�þ\ĂłĂĂ°sItIkparoxytone pĂŚĂ° roksI ù�ú,ó�Ü ton pŠ\ù�þ\ó�ðroksI Ăą(Ăş,ó�Ü toUnparoxytonic ĂśpĂŚroksI ù�ú,Ăł Ă° tonIk ĂśpĂŚroksI ù�ú,Ăł Ă° tonIkpasteurization ĂśpĂŚstyù�Ú;Ăł Ur Ă° Izatyon ĂśpĂŚsc Ăąďż˝$^Ăł Š ù�Ú?ò�ó rI Ă°zeIsŠ npasteurize Ă°pĂŚstyù�Ú;Ăł ĂśUrÄąz Ă°pĂŚsc Ăąďż˝$^Ăł Š ù�Ú?ò�ó ĂśraIzpathogene Ă°pĂŚďż˝ oĂśËen Ă°pÌ��Š ù��5Ăł ĂśËinpathogenic ĂśpĂŚďż˝ oĂ°ËenIk ĂśpÌ��Š ù��5Ăł Ă°Ë Ăˇ nIkpearlite Ă°pá ù�Ú>Ăľ\Ăł rlÄąt Ă°pr. ù�Ú>Ăľďż˝ ?Ăł laItpearlitic pá ù�Ú>Ăľ\Ăł r Ă° l ÄątIk pr. ù�Ú>Ăľďż˝ �ó Ă° l ItIkpedicel Ă°pá dIk á l Ă°pá dIsù�ø�ó Š ù�Ú;Ăł lpedicle Ă°pá dIkl Ă°pá dIk Š lpegmatite Ă°pá gmĂŚĂś tÄąt Ă°pá gmŠYù�þ\ĂłĂĂś taItpegmatitic Ăśpá gmĂŚĂ° tÄątIk Ăśpá gmŠYù�þ\ĂłĂĂ° tItIkpeptone Ă°pá pton Ă°pá ptoUnpeptonic pá pĂ° tonIk pá pĂ° tonIkperidotite Ăśpá rI Ă°dotÄąt Ăśpá rI Ă°doUtaItperidotitic Ăśpá rIdoĂ° tÄątIk Ăśpá rIdoU Ă° tItIkperiscope Ă°pá rI Ăśskop Ă°pá rI ĂśskoUpperiscopic Ăśpá rI Ă°skopIk Ăśpá rI Ă°skopIkperlite Ă°pá rlÄąt Ă°pr.laItperlitic pá r Ă° l ÄątIk pr. 
Ă° l ItIkperspicacious Ăśpá rspI Ă°kakyos Ăśpr.spŠ ù�� Ăł Ă°keIsŠ sperspicacity Ăśpá rspI Ă°kakIti Ăśpr.spŠ ù�� Ăł Ă°kĂŚsù�ø�ó Itipertinacious Ăśpá rtI Ă°nakyos Ăśpr.t ŠYĂą ďż˝ ó�ðneIsŠ spertinacity Ăśpá rtI Ă°nakIti Ăśpr.t ŠYĂą ďż˝ ó�ðnĂŚsù�ø�ó Itiphagocyte Ă° fĂŚgoĂśsù�ø�ó i ù�ú,Ăł t Ăź Ă˝ĂÞ�ÿ�� Ă° fĂŚgŠYù��5ĂłĂĂśsù�ø�ó aI ù�ú,Ăł t Ăź ý�Þ�ÿ��phagocytic Ăś fĂŚgoĂ°sù�ø�ó i ù�ú,Ăł tIk Ăź ý�Þ;ÿ�� Ăś fĂŚgŠYù��5ĂłĂĂ°sù�ø�ó I Ăą(Ăş(Ăł tIk Ăź ý�Þ;ÿ��phallicism Ă° fĂŚlĂź ý���� I ĂśkIsmĂź ý�Þ;ÿ�� Ă° fĂŚlĂź ý���� I Ăśsù�ø�ó IzŠ m Ăź ý�Þ;ÿ��phallic Ă° fĂŚlĂź ý���� Ik Ăź ý�Þ�ÿ�� Ă° fĂŚlĂź ý���� Ik Ăź ý�Þ;ÿ��
pharmacal Ă° fĂŚrmĂŚkĂŚlĂź ý�Þ�ÿ�� Ă° f ďż˝ rmŠ\ù�þ\Ăł k ŠYù�þ5Ăł l Ăź Ă˝ĂÞ�ÿ��pharmacist Ă° fĂŚrmĂŚkIstĂź ý�Þ;ÿ�� Ă° f ďż˝ rmŠ\ù�þ\Ăł sù�ø�ó IstĂź ý�Þ;ÿ��pharmacy Ă° fĂŚrmĂŚkI Ăź ý�Þ;ÿ�� Ă° f ďż˝ rmŠ\ù�þ\Ăł sù�ø�ó i Ăź Ă˝ĂÞ�ÿ��pharyngoscope fĂŚĂ°rI ù�ú,Ăł ngoĂśskopĂź Ă˝ĂÞ�ÿ�� f ŠYù�þ\Ăł!Ă° rI Ăą(Ăş(ó�� gŠYù��5ó�ÜskoUp Ăź Ă˝ĂÞ�ÿ��pharyngoscopic fĂŚĂśrI ù�ú,Ăł ngoĂ°skopIk Ăź Ă˝ĂÞ�ÿ�� f Š ù�þ\Ăł Ăś rI Ăą(Ăş(Ăł ďż˝ gŠ ù��5Ăł Ă°skopIk Ăź Ă˝ĂÞ�ÿ��phenotype Ă° fenoĂś ti Ăą(Ăş(Ăł p Ăź ý�Þ;ÿ�� Ă°fin Š ù��5Ăł Ăś taI Ăą(Ăş,Ăł p Ăź Ă˝ĂÞ�ÿ��phenotypic Ăś fenoĂ° ti Ăą(Ăş(Ăł pIk Ăź ý�Þ;ÿ�� Ăśfin Š ù��5Ăł Ă° tI ù�ú,Ăł pIk Ăź ý�Þ�ÿ��philhellene f I l Ă°há l Ăź ý����� enĂź Ă˝ĂÞ�ÿ�� f I l Ă°há l Ăź ý������ in Ăź ý�Þ;ÿ��philhellenic Ăś f I lh á>Ă° l Ăź ý����� enIk Ăź ý�Þ�ÿ�� Ăś f I lh á>Ă° l Ăź ý������ná nIk Ăź ý�Þ;ÿ��phonolite Ă° fonoĂś l Äąt Ăź Ă˝ĂÞ�ÿ�� Ă° foUnŠ ù��5Ăł Ăś laIt Ăź ý�Þ;ÿ��phonolitic Ăś fonoĂ° l ÄątIk Ăź ý�Þ;ÿ�� Ăś foUnŠ ù��5Ăł Ă° l ItIk Ăź ý�Þ;ÿ��phonotype Ă° fonoĂś ti Ăą(Ăş(Ăł p Ăź ý�Þ;ÿ�� Ă° foUnŠ ù��5Ăł Ăś taI Ăą(Ăş,Ăł p Ăź Ă˝ĂÞ�ÿ��phonotypic Ăś fonoĂ° ti Ăą(Ăş(Ăł pIk Ăź ý�Þ�ÿ�� Ăś foUnŠYù��5ĂłĂĂ° tI ù�ú,Ăł pIk Ăź ý�Þ�ÿ��phosphate Ă° fosfatĂź ý�Þ;ÿ�� Ă° fosfeIt Ăź Ă˝ĂÞ�ÿ��phosphatic fosĂ° fatIk Ăź ý�Þ;ÿ�� fosĂ° fĂŚtIk Ăź Ă˝ĂÞ�ÿ��phosphorite Ă° fosfoĂś rÄąt Ăź ý�Þ;ÿ�� Ă° fosfŠ\ù��#ĂłĂĂśraIt Ăź ý�Þ;ÿ��phosphoritic Ăś fosfoĂ° rÄątIk Ăź ý�Þ;ÿ�� Ăś fosfŠ\ù��#ĂłĂĂ°rItIk Ăź Ă˝ĂÞ�ÿ��photoelectricity Ăś fotoá l á k Ă° trIkItI Ăź ý�Þ�ÿ�� Ăś foUtoUI ù�Ú;Ăł l á k Ă° trIsù�ø�ó Iti Ăź ý�Þ;ÿ��photoelectric Ăś fotoá>Ă° l á ktrIk Ăź ý�Þ�ÿ�� Ăś foUtoUI ù�Ú;Ăł Ă° l á ktrIk Ăź Ă˝ĂÞ�ÿ��photogene Ă° fotoĂśËenĂź Ă˝ĂÞ�ÿ�� Ă° foUt Š ù��#Ăł ĂśËin Ăź ý�Þ;ÿ��photogenic Ăś fotoĂ°ËenIk Ăź ý�Þ;ÿ�� Ăś foUt Š ù��#Ăł Ă°Ë Ăˇ nIk Ăź Ă˝ĂÞ�ÿ��phototype Ă° fotoĂś ti ù�ú,Ăł p Ăź ý�Þ�ÿ�� Ă° foUt Š ù��#Ăł Ăś taI Ăą(Ăş,Ăł p Ăź ý�Þ�ÿ��phototypic Ăś fotoĂ° ti ù�ú,Ăł pIk Ăź ý�Þ;ÿ�� Ăś foUt Š ù��#Ăł Ă° tI ù�ú,Ăł pIk Ăź ý�Þ;ÿ��phyllite Ă° f I Ăą(Ăş,Ăł l Ăź ý������ Äąt Ăź ý�Þ;ÿ�� Ă° f I ù�ú,Ăł l Ăź ý���� aIt Ăź 
ý�Þ�ÿ��phyllitic f I Ăą(Ăş,Ăł Ă° l Ăź ý������ ÄątIk Ăź ý�Þ;ÿ�� f I ù�ú,Ăł Ă° l Ăź ý����� ItIk Ăź ý�Þ;ÿ��phyllome Ă° f I Ăą(Ăş,Ăł l Ăź ý������ omĂź Ă˝ĂÞ�ÿ�� Ă° f I ù�ú,Ăł l Ăź ý���� oUm Ăź ý�Þ;ÿ��phyllomic f I Ăą(Ăş,ĂłĂĂ° l Ăź ý������ omIk Ăź ý�Þ;ÿ�� f I ù�ú,Ăł!Ă° l Ăź ý����� omIk Ăź Ă˝ĂÞ�ÿ��phylogenesis Ăś fi ù�ú,Ăł lo Ă°Ëená sIsĂź ý�Þ;ÿ�� Ăś faI Ăą(Ăş,Ăł l ŠYù��#Ăłďż˝Ă°Ë Ăˇ nI ù�Ú;Ăł sIsĂź ý�Þ;ÿ��phylogenic Ăś fi ù�ú,Ăł lo Ă°ËenIk Ăź ý�Þ;ÿ�� Ăś faI Ăą(Ăş,Ăł l ŠYù��#Ăłďż˝Ă°Ë Ăˇ nIk Ăź ý�Þ;ÿ��physical Ă° f I Ăą(Ăş,Ăł z ù��>Ăł IkĂŚlĂź ý�Þ;ÿ�� Ă° f I ù�ú,Ăł z ù��>Ăł Ik Š\ù�þ\Ăł l Ăź ý�Þ;ÿ��physicist Ă° f I Ăą(Ăş,Ăł z ù��>Ăł IkIstĂź ý�Þ;ÿ�� Ă° f I ù�ú,Ăł z ù��>Ăł Isù�ø�ó IstĂź ý�Þ;ÿ��physics Ă° f I Ăą(Ăş,Ăł z ù��>Ăł Ik ďż˝ sĂź ý�Þ;ÿ�� Ă° f I ù�ú,Ăł z ù��>Ăł Ik ďż˝ sĂź ý�Þ�ÿ��physic Ă° f I Ăą(Ăş,Ăł z ù��>Ăł Ik Ăź ý�Þ�ÿ�� Ă° f I ù�ú,Ăł z ù��>Ăł Ik Ăź ý�Þ;ÿ��phytophagous fi ù�ú,Ăł Ă° tofagosĂź Ă˝ĂÞ�ÿ�� faI Ăą(Ăş,Ăł Ă° tof Š ù�þ\Ăł gŠ sĂź ý�Þ;ÿ��phytophagy fi ù�ú,Ăł Ă° tofa I Ăź ý�Þ�ÿ�� faI Ăą(Ăş,Ăł Ă° tof Š ù�þ\Ăł Ëi Ăź ý�Þ;ÿ��pilose Ă°pÄąlos Ă°paI loUspilosity pÄą Ă° losIti paI Ă° losItipisolite Ă°pÄąsoĂś l Äąt Ă°paIsŠ ù��#Ăł Ăś laItpisolitic ĂśpÄąsoĂ° l ÄątIk ĂśpaIsŠYù��#Ăł!Ă° l ItIkplasmagene Ă°plĂŚzù��>Ăł mĂŚĂśËen Ă°plĂŚzù��>Ăł mŠYù�þ\Ăł!ĂśËinplasmagenic ĂśplĂŚzù��>Ăł mĂŚĂ°ËenIk ĂśplĂŚzù��>Ăł mŠYù�þ\Ăł!Ă°Ë Ăˇ nIkplasticity plĂŚĂ°stIkIti plĂŚĂ°stIsù�ø�ó Itiplasticize Ă°plĂŚstI ĂśkÄąz Ă°plĂŚstI Ăśsù�ø�ó aIzplastic Ă°plĂŚstIk Ă°plĂŚstIkpleasance Ă°pleù�Ú>Ăľ\Ăł z ù��>Ăł ĂŚns Ă°pl á ù�Ú>Ăľ\Ăł z ù��>Ăł Š ù�þ\Ăł ns
please Ă°pleù�Ú>Ăľ\Ăł z ù��>Ăł Ă°pli ù�Ú^Ăľ5Ăł z ù��>Ăłplumose Ă°plumos Ă°plumoUsplumosity pluĂ°mosIti pluĂ°mosItipodsolization ĂśpodsolÄą Ă°zatyon ĂśpodsŠYù��5Ăł l I Ă°zeIsŠ npodsolize Ă°podsoĂś l Äąz Ă°podsŠ ù��5Ăł Ăś laIzpodzolization ĂśpodzolÄą Ă°zatyon ĂśpodzŠ ù��5Ăł l I Ă°zeIsŠ npodzolize Ă°podzoĂś l Äąz Ă°podzŠ ù��5Ăł Ăś laIzpoeticize poĂ° á tI ĂśkÄąz poU Ă° á tI Ăśsù�ø�ó aIzpoetic poĂ° á tIk poU Ă° á tIkpolarization ĂśpolĂŚrI Ă°zatyon ĂśpoUl Š ù�þ\Ăł rI Ă°zeIsŠ npolarize Ă°polĂŚĂś rÄąz Ă°poUl Š ù�þ\Ăł Ăś raIzpolemicist poĂ° l á mIkIst pŠ ù��5Ăł Ă° l á mIsù�ø�ó Istpolemic poĂ° l á mIk pŠYù��5ĂłĂĂ° l á mIkpolite poĂ° l Äąt pŠYù��5ĂłĂĂ° laItpolitical poĂ° l ÄątIkĂŚl pŠYù��5ĂłĂĂ° l ItIk ŠYù�þ\Ăł lpoliticize poĂ° l ÄątI ĂśkÄąz pŠYù��5ĂłĂĂ° l ItI Ăśsù�ø�ó aIzpolitic Ă°polÄątIk Ă°polItIkpolyphone Ă°polI Ăą(Ăş,ó�Ü fonĂź ý�Þ;ÿ�� Ă°poli Ăą(Ăş,ó�Ü foUn Ăź ý�Þ�ÿ��polyphonic ĂśpolI Ăą(Ăş,Ăł Ă° fonIk Ăź Ă˝ĂÞ�ÿ�� Ăśpoli Ăą(Ăş,Ăł Ă° fonIk Ăź ý�Þ;ÿ��porcine Ă°porkÄąn Ă°pďż˝ rsù�ø�ó aInpork Ă°porkĂą Ăż Ăł Ă°pďż˝ rk Ăą Ăż Ăłposterity poĂ°stá rIti poĂ°stá rItiposter Ă°postr Ă°poUstr.precocious prá>Ă°kokyos prI ù�Ú;Ăł Ă°koUsŠ sprecocity prá>Ă°kokIti prI Ă°kosù�ø�ó Itipredaceous prá>Ă°daky ù�Ú;Ăł os prI Ă°deIsù�øfĂš;Ăł Š spredacious prá>Ă°dakyos prI Ă°deIsŠ spredacity prá>Ă°dĂŚkIti prI Ă°dĂŚsù�ø�ó Itiprevocational ĂśprevoĂ°katyonĂŚl ĂśprivoU Ă°keIsŠ nŠYù�þ\Ăł lproctoscope Ă°proktoĂśskop Ă°proktŠYù��5Ăł ĂśskoUpproctoscopic ĂśproktoĂ°skopIk ĂśproktŠYù��5Ăł Ă°skopIkprodigal Ă°prodIgĂŚl Ă°prodŠ ù�� Ăł gŠ ù�þ\Ăł lprodigy Ă°prodIgI Ă°prodŠ ù�� Ăł Ëiprodrome Ă°prodrom Ă°proUdroUmprodromic proĂ°dromIk proU Ă°dromIkprofane proĂ° fan prŠ ù��#Ăł Ă° feInprofanity proĂ° fanIti prŠ ù��#Ăł Ă° fĂŚnItiprofound proĂ° fund prŠYù��#ĂłĂĂ° faUndprofundity proĂ° fu Ăą ò�ó ndIti prŠYù��#ĂłĂĂ° f Ă´ ndItipronounce proĂ°nuns prŠYù��#ĂłĂĂ°naUnspronunciation proĂśnu Ăą ò�ó nsù�ø�ó I Ă° atyon prŠYù��#ĂłĂĂśnĂ´ nsù�ø�ó i Ăą ďż˝ Ăł!Ă°eIsŠ nprosaicism proĂ°z ù��>Ăł aI ĂśkIsm proU Ă°z ù��>Ăł eI I Ăśsù�ø�ó IzŠ 
mprosaic proĂ°z ù��>Ăł aIk proU Ă°z ù��>Ăł eI Ikprototype Ă°protoĂś ti Ăą(Ăş,Ăł p Ă°proUt Š ù��5Ăł Ăś taI ù�ú,Ăł pprototypic ĂśprotoĂ° ti Ăą(Ăş,Ăł pIk ĂśproUt Š ù��5Ăł Ă° tI Ăą(Ăş,Ăł pIk
providence Ă°provIdá ns Ă°provIdŠ nsprovide proĂ°vÄąd prŠYù��#Ăł!Ă°vaIdprovocation ĂśprovoĂ°katyon Ăśprov Š\ù��#Ăł Ă°keIsŠ nprovoke proĂ°vok prŠYù��#Ăł!Ă°voUkpsammite Ă° ďż˝ ù��(Ăł sĂŚmĂź ý������ Äąt Ă° ďż˝ ù��,Ăł sĂŚmĂź ý������ aItpsammitic ďż˝ ù��(Ăł sĂŚĂ°m Ăź ý���� ÄątIk ďż˝ ù��,Ăł sĂŚĂ°m Ăź ý������ ItIkpsephite Ă° ďż˝ ù��(Ăł sefÄąt Ăź ý�Þ;ÿ�� Ă° ďż˝ ù��,Ăł sifaIt Ăź ý�Þ;ÿ��psephitic ďż˝ ù��(Ăł seĂ° fÄątIk Ăź ý�Þ�ÿ�� ďż˝ ù��,Ăł siĂ° f ItIk Ăź ý�Þ;ÿ��pteridophyte ďż˝ ù��(Ăł t á?Ă°rIdoĂś fi ù�ú,Ăł t Ăź ý�Þ;ÿ�� ďż˝ ù��,Ăł t Š�ð rIdŠ ù��#Ăł Ăś faI Ăą(Ăş(Ăł t Ăź ý�Þ;ÿ��pteridophytic ďż˝ ù��(Ăł t á?ĂśrIdoĂ° fi ù�ú,Ăł tIk Ăź ý�Þ�ÿ��%ďż˝ ù��,Ăł t Š�Ü rIdŠ ù��#Ăł Ă° f I Ăą(Ăş,Ăł tIk Ăź ý�Þ;ÿ��publicist Ă°pĂ´ blIkIst Ă°pĂ´ blIsù�ø�ó Istpublicity pĂ´/Ă°blIkIti pĂ´/Ă°blIsù�ø�ó Itipublicize Ă°pĂ´ blI ĂśkÄąz Ă°pĂ´ blI Ăśsù�ø�ó aIzpublic Ă°pĂ´ blIk Ă°pĂ´ blIkpugnacious pĂ´ gĂ°nakyos pĂ´ gĂ°neIsŠ spugnacity pĂ´ gĂ°nakIti pĂ´ gĂ°nĂŚsù�ø�ó Itipyrite Ă°pi Ăą(Ăş(Ăł rÄąt Ă°paI Ăą(Ăş,Ăł raItpyritic pi Ăą(Ăş(Ăł!Ă°rÄątIk paI Ăą(Ăş,ĂłĂĂ° rItIkpyroelectricity Ăśpi Ăą(Ăş(Ăł roá l á k Ă° trIkIti ĂśpaI Ăą(Ăş,Ăł roUI ù�Ú;Ăł l á k Ă° trIsù�ø�ó Itipyroelectric Ăśpi Ăą(Ăş(Ăł roá>Ă° l á ktrIk ĂśpaI Ăą(Ăş,Ăł roUI ù�Ú;Ăł Ă° l á ktrIkpyrrole Ă°pI Ăą(Ăş,Ăł r Ăź ý������ ol Ă°pI ù�ú,Ăł r Ăź ý����� oUlpyrrolic pI Ăą(Ăş,Ăł Ă° r Ăź ý������ olIk pI ù�ú,Ăł Ă° r Ăź ý����� olIkradioisotope Ăś radIoĂ° Äąsotop ĂśreIdi ù��vĂł oU Ă°aIsŠ ù��#Ăł toUpradioisotopic Ăś radIoĂś ÄąsoĂ° topIk ĂśreIdi ù��vĂł oU ĂśaIsŠ ù��#Ăł Ă° topIkradiopacity Ăś radIoĂ°pakIti ĂśreIdi ù��vĂł oU Ă°pĂŚsù�ø�ó Itiradiopaque Ăś radIoĂ°pakù��\ò�ó ĂśreIdi ù��vĂł oU Ă°peIk ù��\ò,Ăłradiophone Ă° radIo Ăź &�Þ;ÿ�� Ăś fonĂź ý�Þ;ÿ�� Ă°reIdi ù��vĂł oU Ăź &�Þ�ÿ��ĂĂś foUn Ăź ý�Þ;ÿ��radiophonic Ăś radIo Ăź &�Þ;ÿ�� Ă° fonIk Ăź ý�Þ;ÿ�� ĂśreIdi ù��vĂł oU Ăź &�Þ�ÿ��ĂĂ° fonIk Ăź ý�Þ;ÿ��radioscope Ă° radIoĂśskop Ă°reIdi ù��vĂł oU ĂśskoUpradioscopic Ăś radIoĂ°skopIk ĂśreIdi ù��vĂł oU Ă°skopIkradiotelephone Ăś radIo Ăź &�Þ;ÿ�� Ă° t á l á>Ăś fonĂź ý�Þ�ÿ�� 
ĂśreIdi ù��vĂł oU Ăź &�Þ�ÿ��ĂĂ° t á l Š�Ü foUn Ăź ý�Þ�ÿ��radiotelephonic Ăś radIo Ăź &�Þ;ÿ�� Ăś t á l á>Ă° fonIk Ăź ý�Þ;ÿ�� ĂśreIdi ù��vĂł oU Ăź &�Þ�ÿ��ĂĂś t á l Š�ð fonIk Ăź ý�Þ;ÿ��rapacious rĂŚĂ°pakyos r Š ù�þ\Ăł Ă°peIsŠ srapacity rĂŚĂ°pĂŚkIti r Š ù�þ\Ăł Ă°pĂŚsù�ø�ó Itirealization Ăś reĂŚlI Ă°zatyon Ăśri Š ù�þ\Ăł l I Ă°zeIsŠ nrealize Ă° reĂŚĂś l Äąz Ă°ri Š ù�þ\Ăł Ăś laIzrecitation Ăś r á sù�ø�ó Äą Ă° tatyon Ăśr á sù�ø�ó I Ă° teIsŠ nrecite r á>Ă°sù�ø�ó Äąt rI Ă°sù�ø�ó aItreclination Ăś r á klÄą Ă°natyon Ăśr á kl Š ù�� Ăł Ă°neIsŠ nrecline r á>Ă°klÄąn rI Ă°klaInregale r á>Ă°gal rI Ă°geI lregality r á>Ă°galIti rI Ă°gĂŚlItirenounce r á>Ă°nuns rI Ă°naUnsrenunciation r á>Ăśnu ù�ò�ó nsù�ø�ó I Ă° atyon rI ĂśnĂ´ nsù�ø�ó i Ăą ďż˝ Ăł!Ă°eIsŠ nreorganization Ăś reorgĂŚnI Ă°zatyon Ăśri ďż˝ rgŠ ù�þ\Ăł nI Ă°zeIsŠ n
reorganize reĂ°orgĂŚĂśnÄąz ri Ă° ďż˝ rgŠ\ù�þ\ĂłĂĂśnaIzresidence Ă°r á z ù��>Ăł Äądá ns Ă° r á z ù��>Ăł IdŠ nsreside r á?Ă°z ù��>Ăł Äąd rI Ă°z ù��>Ăł aIdresignation Ăśr á sÄągĂ°natyon Ăś r á z ù��>Ăł IgĂ°neIsŠ nresign r á?Ă° sÄągn rI Ă°z ù��>Ăł aI Ăą ďż˝ Ăž Ăł nreveal r á?Ă°veù�Ú>Ăľ\Ăł l rI Ă°vi ù�Ú>Ăľ5Ăł lrevelation Ăśr á veĂ° latyon Ăś r á v Š�ð leIsŠ nrevile r á?Ă°vÄąl rI Ă°vaI lrevocation Ăśr á voĂ°katyon Ăś r á v Š ù��#Ăł Ă°keIsŠ nrevoke r á?Ă°vok rI Ă°voUkrhetoric Ă°retorIk Ăź Ă˝ĂÞ�ÿ�� Ă° r á t Š ù��#Ăł rIk Ăź ý�Þ;ÿ��rhetor Ă°retorĂź ý�Þ;ÿ�� Ă° rit Š ù��5Ăł r Ăź ý�Þ�ÿ��rhyolite Ă°ri Ăą(Ăş(Ăł oĂś l Äąt Ăź ý�Þ�ÿ�� Ă° raI ù�ú,ó�ŠYù��#ó�Ü laIt Ăź ý�Þ;ÿ��rhyolitic Ăśri Ăą(Ăş(Ăł oĂ° l ÄątIk Ăź ý�Þ;ÿ�� Ăś raI ù�ú,ó�ŠYù��#ó�ð l ItIk Ăź ý�Þ;ÿ��rhythmicity rI Ăą(Ăş,Ăł('#Ă°mIkItI Ăź ý�Þ;ÿ�� rI Ăą(Ăş(Ăłďż˝'5Ă°mIsù�ø�ó Iti Ăź ý�Þ;ÿ��rhythmics Ă°rI Ăą(Ăş,Ăłďż˝' mIk ďż˝ sĂź ý�Þ;ÿ�� Ă° rI Ăą(Ăş(Ăł ' mIk ďż˝ sĂź ý�Þ�ÿ��rimose Ă°rÄąmos Ă° raImoUsrimosity rÄą Ă°mosIti raI Ă°mosItiromanticism roĂ°mĂŚntI ĂśkIsm roU Ă°mĂŚntI Ăśsù�ø�ó IzŠ mromanticist roĂ°mĂŚntIkIst roU Ă°mĂŚntIsù�ø�ó Istromanticize roĂ°mĂŚntI ĂśkÄąz roU Ă°mĂŚntI Ăśsù�ø�ó aIzromantic roĂ°mĂŚntIk roU Ă°mĂŚntIkrugose Ă°rugos Ă° rugoUsrugosity ruĂ°gosIti ruĂ°gosItirusticity r Ă´/Ă°stIkIti r Ă´/Ă°stIsù�ø�ó Itirustic Ă°r Ă´ stIk Ă° r Ă´ stIksabulose Ă°sĂŚbUl Ăś os Ă°sĂŚbyŠ�Ü loUssabulosity ĂśsĂŚbU Ă° losIti ĂśsĂŚbyŠ�ð losItisagacious sĂŚĂ°gakyos sŠYù�þ5Ăł!Ă°geIsŠ ssagacity sĂŚĂ°gakIti sŠYù�þ5Ăł!Ă°gĂŚsù�ø�ó Itisalacious sĂŚĂ° lakyos sŠYù�þ5Ăł!Ă° leIsŠ ssalacity sĂŚĂ° lakIti sŠ ù�þ5Ăł Ă° lĂŚsù�ø�ó Itisalicine Ă°sĂŚlIkInďż˝ ù�Ú;Ăł Ă°sĂŚlIsù�ø�ó Inďż˝ ù�Ú;Ăłsalicin Ă°sĂŚlIkIn Ă°sĂŚlIsù�ø�ó Insalic Ă°sĂŚlIk Ă°sĂŚlIksane Ă°san Ă°seInsanity Ă°sanIti Ă°sĂŚnItisaprolite Ă°sĂŚproĂś l Äąt Ă°sĂŚprŠ ù��#Ăł Ăś laItsaprolitic ĂśsĂŚproĂ° l ÄątIk ĂśsĂŚprŠ ù��#Ăł Ă° l ItIksaprophyte Ă°sĂŚproĂś fi Ăą(Ăş,Ăł t Ăź Ă˝ĂÞ�ÿ�� Ă°sĂŚprŠYù��#ĂłĂĂś faI Ăą(Ăş(Ăł t Ăź ý�Þ;ÿ��saprophytic ĂśsĂŚproĂ° fi Ăą(Ăş,Ăł tIk Ăź ý�Þ;ÿ�� ĂśsĂŚprŠYù��#ĂłĂĂ° f I 
Ăą(Ăş,Ăł tIk Ăź ý�Þ;ÿ��satellite Ă°sĂŚtá>Ăś l Ăź ý����� Äąt Ă°sĂŚtŠ�Ü l Ăź ý������ aItsatellitic ĂśsĂŚtá>Ă° l Ăź ý����� ÄątIk ĂśsĂŚtŠ�ð l Ăź ý������ ItIksaturnine Ă°sĂŚtUr ĂśnÄąn Ă°sĂŚtŠYĂą ò�ó r ĂśnaInsaturninity Ă°sĂŚtUr ĂśnÄąnIti Ă°sĂŚtŠ Ăą ò�ó r ĂśnInIti
3.A. ENGLISH DEEP AND SHALLOW ORL'S 117
saxophone Ă°sĂŚksoĂś fonĂź ý�Þ;ÿ�� Ă°sĂŚksŠYù��5ó�Ü foUn Ăź ý�Þ;ÿ��saxophonic ĂśsĂŚksoĂ° fonIk Ăź ý�Þ;ÿ�� ĂśsĂŚksŠYù��5ó�ð fonIk Ăź Ă˝ĂÞ�ÿ��schizomycete Ăśskù�øaÝ�ó Izomi Ăą(Ăş,ĂłĂĂ°ket Ăśskù�øaĂť,Ăł IzoUmaI Ăą(Ăş,ĂłĂĂ°sù�ø�ó itschizomycetic Ăśskù�øaÝ�ó Izomi Ăą(Ăş,ĂłĂĂ°ketIk Ăśskù�øaĂť,Ăł IzoUmaI Ăą(Ăş,ĂłĂĂ°sù�ø�óïá tIkschizophyte Ă°skù�øaÝ�ó IzoĂś fi ù�ú,Ăł t Ăź ý�Þ;ÿ�� Ă°skù�øaĂť,Ăł IzŠ ù��#Ăł Ăś faI Ăą(Ăş(Ăł t Ăź ý�Þ;ÿ��schizophytic Ăśskù�øaÝ�ó IzoĂ° fi ù�ú,Ăł tIk Ăź ý�Þ;ÿ�� Ăśskù�øaĂť,Ăł IzŠ ù��#Ăł Ă° f I Ăą(Ăş,Ăł tIk Ăź ý�Þ;ÿ��scholasticism skù�øaÝ�ó oĂ° lĂŚstI ĂśkIsm skù�øaĂť,Ăł Š ù��5Ăł Ă° lĂŚstI Ăśsù�ø�ó IzŠ mscholastic skù�øaÝ�ó oĂ° lĂŚstIk skù�øaĂť,Ăł Š ù��5Ăł Ă° lĂŚstIkseismoscope Ă°si ù�Ú"ďż˝ Ăł z ù��>Ăł moĂśskop Ă°saI ù�Ú"ďż˝ Ăł z ù��>Ăł mŠ ù��#Ăł ĂśskoUpseismoscopic Ăśsi ù�Ú"ďż˝ Ăł z ù��>Ăł moĂ°skopIk ĂśsaI ù�Ú"ďż˝ Ăł z ù��>Ăł mŠ ù��#Ăł Ă°skopIksemen Ă°semá n Ă°simŠ nsemination Ăśsemá Ăą ďż˝ Ăł Ă°natyon Ăśsá mŠ ù�� Ăł Ă°neIsŠ nsemiparasite Ăśsá mI Ă°pĂŚrĂŚĂś sÄąt Ăśsá mi Ăą ďż˝ Ăł!Ă°pĂŚrŠYù�þ\ĂłĂĂśsaItsemiparasitic Ăśsá mI ĂśpĂŚrĂŚĂ° sÄątIk Ăśsá mi Ăą ďż˝ Ăł!ĂśpĂŚrŠYù�þ\ĂłĂĂ°sItIksepticemia Ăśsá ptI Ă°kemIĂŚ Ăśsá ptI Ă°sù�ø�ó imi ù�� Ăł!Šsepticidal Ăśsá ptI Ă°kÄądĂŚl Ăśsá ptI Ă°sù�ø�ó aIdŠ\ù�þ\Ăł lsepticity sá pĂ° tIkIti sá pĂ° tIsù�ø�ó Itiseptic Ă°sá ptIk Ă°sá ptIksequacious sá>Ă°kwakyos sI ù�Ú;Ăł Ă°kweIsŠ ssequacity sá>Ă°kwakIti sI ù�Ú;Ăł Ă°kwĂŚsù�ø�ó Itiserene sá>Ă° ren sŠ�ðrinserenity sá>Ă° renIti sŠ�ðr á nItisiderite Ă°sIdá?ĂśrÄąt Ă°sIdŠ�Ü raItsideritic ĂśsIdá?Ă°rÄątIk ĂśsIdŠ�ð rItIksigmoidoscope sIgĂ°moIdoĂśskop sIgĂ°moIdŠ ù��5Ăł ĂśskoUpsigmoidoscopic sIgĂśmoIdoĂ°skopIk sIgĂśmoIdŠ ù��5Ăł Ă°skopIksilicic sI Ă° l IkIk sI Ă° l Isù�ø�ó Iksilicide Ă°sI l I ĂśkÄąd Ă°sI l I Ăśsù�ø�ó aIdsiliciferous ĂśsI l I Ă°kIf á ros ĂśsI l I Ă°sù�ø�ó Ifr. Š ssilicify sI Ă° l IkI Ăś fÄą sI Ă° l Isù�ø�ó!ŠYĂą ďż˝ ó�Ü faI
silicon ðsI l Ikon ðsI l Ik ŠYù��#ó nsomite ðsomĹt ðsoUmaItsomitic soðmĹtIk soU ðmItIksone ðson ðsoUnsonic ðsonIk ðsonIkspecifiable ðspá sù�ø�ó I Ü fĹÌbI l ðspá sù�ø�ó Š ù�� ó Ü faI Š ù�þ\ó bŠ lspecification Üspá sù�ø�ó If I ðkatyon Üspá sù�ø�ó Š ù�� ó f Š ù�� ó ðkeIsŠ nspecificative ðspá sù�ø�ó If I ÜkatIv � ù�Ú;ó ðspá sù�ø�ó Š ù�� ó f Š ù�� ó ÜkeItIv � ù�Ú;óspecificity Üspá sù�ø�ó I ð fĹkIti Üspá sù�ø�ó Š ù�� ó ð f Isù�ø�ó Itispecify ðspá sù�ø�ó I Ü fĹ ðspá sù�ø�ó�ŠYù�� óïÜ faI
specimen Ă°spá sù�ø�ó Imá n Ă°spá sù�ø�ó�ŠYù�� Ăł mŠ nspectrohelioscope Ăśspá ktroĂ°helIoĂśskop Ăśspá ktr ŠYù��#ĂłĂĂ°hili ù��vóÊYù��#ó�ÜskoUpspectrohelioscopic Ăśspá ktroĂśhelIoĂ°skopIk Ăśspá ktr ŠYù��#ĂłĂĂśhili ù��vóÊYù��#ó�ðskopIkspectroscope Ă°spá ktroĂśskop Ă°spá ktr ŠYù��#ĂłĂĂśskoUpspectroscopic Ăśspá ktroĂ°skopIk Ăśspá ktr Š ù��#Ăł Ă°skopIk
118 CHAPTER 3. ORL DEPTH AND CONSISTENCY
sphericity sfeĂ°rIkItI Ăź Ă˝ĂÞ�ÿ�� sfá>Ă° rIsù�ø�ó Iti Ăź ý�Þ;ÿ��spherics Ă°sferIk ďż˝ sĂź ý�Þ;ÿ�� Ă°sfá rIk ďż˝ sĂź Ă˝ĂÞ�ÿ��spinose Ă°spÄą Ăśnos Ă°spaI ĂśnoUsspinosity spÄą Ă°nosIti spaI Ă°nosItisporophyte Ă°sporoĂś fi Ăą(Ăş,Ăł t Ăź ý�Þ;ÿ�� Ă°spďż˝ r Š ù��#Ăł Ăś faI Ăą(Ăş(Ăł t Ăź ý�Þ;ÿ��sporophytic ĂśsporoĂ° fi Ăą(Ăş,Ăł tIk Ăź ý�Þ;ÿ�� Ăśspďż˝ r Š ù��#Ăł Ă° f I Ăą(Ăş,Ăł tIk Ăź ý�Þ;ÿ��state Ă°stat Ă°steItstatic Ă°statIk Ă°stĂŚtIkstaurolite Ă°stďż˝ roĂś l Äąt Ă°stďż˝ ù�þ;ò�ó r Š ù��5Ăł Ăś laItstaurolitic Ăśstďż˝ roĂ° l ÄątIk Ăśstďż˝ ù�þ;ò�ó r Š ù��5Ăł Ă° l ItIkstauroscope Ă°stďż˝ roĂśskop Ă°stďż˝ ù�þ;ò�ó r Š ù��5Ăł ĂśskoUpstauroscopic Ăśstďż˝ roĂ°skopIk Ăśstďż˝ ù�þ;ò�ó r Š ù��5Ăł Ă°skopIksteatite Ă°steĂŚĂś tÄąt Ă°stiŠ\ù�þ\óïÜ taItsteatitic ĂśsteĂŚĂ° tÄątIk ĂśstiŠ\ù�þ\óïð tItIksteatopyga ĂśsteĂŚtoĂ°pi Ăą(Ăş,Ăł gĂŚ ĂśstiŠ\ù�þ\Ăł toU Ă°paI ù�ú,Ăł gŠsteatopygia ĂśsteĂŚtoĂ°pi Ăą(Ăş,Ăł gIĂŚ ĂśstiŠ\ù�þ\Ăł toU Ă°paI ù�ú,Ăł Ëi ù��vóÊsteatopygous ĂśsteĂŚtoĂ°pi Ăą(Ăş,Ăł gos ĂśstiŠ ù�þ\Ăł toU Ă°paI ù�ú,Ăł gŠ sstenotype Ă°stá noĂś ti Ăą(Ăş,Ăł p Ă°stá nŠ ù��#Ăł Ăś taI ù�ú,Ăł pstenotypic Ăśstá noĂ° ti Ăą(Ăş,Ăł pIk Ăśstá nŠ ù��#Ăł Ă° tI Ăą(Ăş,Ăł pIkstereoscope Ă°stá reoĂśskop Ă°stá ri Š ù��5Ăł ĂśskoUpstereoscopic Ăśstá reoĂ°skopIk Ăśstá ri Š ù��5Ăł Ă°skopIkstereotype Ă°stá reoĂś ti ù�ú,Ăł p Ă°stá ri Š ù��5Ăł Ăś taI Ăą(Ăş(Ăł pstereotypic Ăśstá reoĂ° ti ù�ú,Ăł pIk Ăśstá ri Š ù��5Ăł Ă° tI Ăą(Ăş,Ăł pIksterilization Ăśstá rI l I Ă°zatyon Ăśstá r ŠYù�� Ăł l I Ă°zeIsŠ nsterilize Ă°stá rI Ăś l Äąz Ă°stá r ŠYù�� óïÜ laIzstethoscope Ă°st� oĂśskop Ă°stá���ŠYù��#ó�ÜskoUpstethoscopic Ăśst� oĂ°skopIk Ăśstá���ŠYù��#ó�ðskopIkstoicism Ă°stoI ĂśkIsm Ă°stoUI Ăśsù�ø�ó IzŠ mstoic Ă°stoIk Ă°stoUIkstrobilation ĂśstrobI Ă° latyon ĂśstroUbŠ ù�� Ăł Ă° leIsŠ nstrobila stroĂ°bÄąlĂŚ stroU Ă°baI l Šstroboscope Ă°stroboĂśskop Ă°stroUbŠ ù��#Ăł ĂśskoUpstroboscopic ĂśstroboĂ°skopIk ĂśstroUbŠ ù��#Ăł Ă°skopIkstromatolite stroĂ°mĂŚtoĂś l Äąt stroU Ă°mĂŚtŠ ù��#Ăł Ăś laItstromatolitic stroĂśmĂŚtoĂ° l ÄątIk stroU ĂśmĂŚtŠ 
ù��#Ăł Ă° l ItIkstylite Ă°sti Ăą(Ăş(Ăł l Äąt Ă°staI ù�ú,Ăł laItstylitic sti Ăą(Ăş(Ăł Ă° l ÄątIk staI ù�ú,Ăł Ă° l ItIkstylolite Ă°sti Ăą(Ăş(Ăł lo Ăś l Äąt Ă°staI ù�ú,Ăł l ŠYù��#Ăł!Ăś laItstylolitic Ăśsti Ăą(Ăş(Ăł lo Ă° l ÄątIk ĂśstaI ù�ú,Ăł l ŠYù��#Ăł!Ă° l ItIkstypticity stI Ăą(Ăş,Ăł pĂ° tIkIti stI Ăą(Ăş(Ăł pĂ° tIsù�ø�ó Itistyptic Ă°stI Ăą(Ăş,Ăł ptIk Ă°stI Ăą(Ăş(Ăł ptIksublime sU Ă°blÄąm sŠYĂą ò�óïðblaImsublimity sU Ă°blImIti sŠ Ăą ò�ó Ă°blImItisubvene sUbĂ°ven sŠ Ăą ò�ó bĂ°vinsubvention sUbĂ°ventyon sŠ Ăą ò�ó bĂ°v á ncŠ n
sulfite Ă°sĂ´ lf Äąt Ă°sĂ´ lfaItsulfitic sĂ´ l Ă° fÄątIk sĂ´ l Ă° f ItIksupervene Ăśsupá r Ă°ven Ăśsupr. Ă°vinsupervention Ăśsupá r Ă°ventyon Ăśsupr. Ă°v á ncŠ nsybarite Ă°sI Ăą(Ăş,Ăł bĂŚĂś rÄąt Ă°sI ù�ú,Ăł bŠ ù�þ\Ăł ĂśraItsybaritic ĂśsI Ăą(Ăş,Ăł bĂŚĂ° rÄątIk ĂśsI ù�ú,Ăł bŠ ù�þ\Ăł Ă°rItIksyenite Ă°si ù�ú,Ăł á>ĂśnÄąt Ă°saI Ăą(Ăş,Ăł Š�ÜnaItsyenitic Ăśsi ù�ú,Ăł á>Ă°nÄątIk ĂśsaI Ăą(Ăş,Ăł Š�ðnItIksynagogical ĂśsI Ăą(Ăş,Ăł nĂŚĂ°gogIkĂŚl ĂśsI ù�ú,Ăł nŠ ù�þ\Ăł Ă°go Ik Š ù�þ5Ăł lsynagogue Ă°sI Ăą(Ăş,Ăł nĂŚĂśgogďż˝ Ăą ò�ó ďż˝ ù�Ú�ó Ă°sI ù�ú,Ăł nŠ ù�þ\Ăł Ăśgogďż˝ ù�ò�ó ďż˝ ù�Ú;Ăłsyndrome Ă°sI Ăą(Ăş,Ăł ndrom Ă°sI ù�ú,Ăł ndroUmsyndromic sI Ăą(Ăş,Ăł nĂ°dromIk sI ù�ú,Ăł nĂ°dromIktachistoscope tĂŚĂ°k ù�øaÝ�ó IstoĂśskop t Š\ù�þ\ó�ðk ù�øfÝ�ó IstŠYù��5óïÜskoUptachistoscopic tĂŚĂśk ù�øaÝ�ó IstoĂ°skopIk t Š\ù�þ\ó�Ük ù�øfÝ�ó IstŠYù��5óïðskopIktachylite Ă° tĂŚkù�øaÝ�ó I ù�ú,Ăł!Ăś l Äąt Ă° tĂŚkù�øaĂť,óÊYĂą(Ăş(Ăł!Ăś laIttachylyte Ă° tĂŚkù�øaÝ�ó I ù�ú,Ăł!Ăś li ù�ú,Ăł t Ă° tĂŚkù�øaĂť,óÊYĂą(Ăş(Ăł!Ăś laI Ăą(Ăş(Ăł ttachylytic Ăś tĂŚkù�øaÝ�ó I ù�ú,Ăł Ă° li ù�ú,Ăł tIk Ăś tĂŚkù�øaĂť,Ăł Š Ăą(Ăş(Ăł Ă° l I Ăą(Ăş,Ăł tIktelescope Ă° t á l á>Ăśskop Ă° t á l Š�ÜskoUptelescopic Ăś t á l á>Ă°skopIk Ăś t á l I ù�Ú;Ăł Ă°skopIktenacious t á>Ă°nakyos t Š�ðneIsŠ stenacity t á>Ă°nĂŚkIti t Š�ðnĂŚsù�ø�ó Ititephrite Ă° t á frÄąt Ăź ý�Þ;ÿ�� Ă° t á fraIt Ăź ý�Þ;ÿ��tephritic t á f Ă°rÄątIk Ăź ý�Þ;ÿ�� t á f Ă° rItIk Ăź ý�Þ;ÿ��tetrabasicity Ăś t á trĂŚbaĂ°sIkIti Ăś t á tr Š ù�þ5Ăł beI Ă°sIsù�ø�ó Ititetrabasic Ăś t á trĂŚĂ°basIk Ăś t á tr ŠYù�þ5Ăł!Ă°beIsIkthallophyte Ă° ďż˝ ĂŚlĂź ý���� oĂś fi ù�ú,Ăł t Ăź ý�Þ;ÿ�� Ă° ďż˝ ĂŚlĂź ý������tŠYù��5ó�Ü faI Ăą(Ăş,Ăł t Ăź ý�Þ;ÿ��thallophytic Ăś ďż˝ ĂŚlĂź ý���� oĂ° fi ù�ú,Ăł tIk Ăź ý�Þ�ÿ�� Ăś ďż˝ ĂŚlĂź ý������tŠYù��5ó�ð f I ù�ú,Ăł tIk Ăź ý�Þ;ÿ��theodolite ďż˝ eĂ°odoĂś l Äąt ďż˝ i Ă°odŠYù��#ó�Ü laIttheodolitic ďż˝ eĂśodoĂ° l ÄątIk ďż˝ i ĂśodŠYù��#ó�ð l ItIkthermoelectricity Ăś ďż˝;á rmoá l á k Ă° trIkIti Ăś ďż˝ r.moUI ù�Ú;Ăł l á k Ă° trIsù�ø�ó Itithermoelectric Ăś ďż˝;á rmoá>Ă° l á 
ktrIk Ăś ďż˝ r.moUI ù�Ú;Ăł Ă° l á ktrIkthermoplasticity Ăś ďż˝;á rmoplĂŚĂ°stIkIti Ăś ďż˝ r.mŠ ù��#Ăł plĂŚĂ°stIsù�ø�ó Itithermoplastic Ăś ďż˝;á rmoĂ°plĂŚstIk Ăś ďż˝ r.mŠ ù��#Ăł Ă°plĂŚstIkthermoscope Ă° ďż˝;á rmĂśoskop Ă° ďż˝ r.mŠ ù��#Ăł ĂśskoUpthermoscopic Ăś ďż˝;á rmoĂ°skopIk Ăś ďż˝ r.mŠ ù��#Ăł Ă°skopIkthoracic ďż˝ oĂ° rĂŚkIk ����ðrĂŚsù�ø�ó Ikthorax Ă° ďż˝ orĂŚks Ă° ��� rĂŚksthrombocyte Ă° ďż˝ romboĂśsù�ø�ó i ù�ú,Ăł t Ă° ďż˝ rombŠYù��#ó�Üsù�ø�ó aI Ăą(Ăş(Ăł tthrombocytic Ăś ďż˝ romboĂ°sù�ø�ó i ù�ú,Ăł tIk Ăś ďż˝ rombŠYù��#ó�ðsù�ø�ó I Ăą(Ăş,Ăł tIktone Ă° ton Ă° toUntonic Ă° tonIk Ă° tonIktope Ă° top Ă° toUptopic Ă° topIk Ă° topIktorose Ă° toros Ă° t ďż˝ roUstorosity toĂ° rosIti t ��ðrosIti
toxicity toĂ°ksIkIti toĂ°ksIsù�ø�ó Ititoxic Ă° toksIk Ă° toksIktoxophilite toĂ°ksofI Ăś l Äąt Ăź ý�Þ;ÿ�� toĂ°ksofŠYĂą ďż˝ ó�Ü laIt Ăź ý�Þ;ÿ��toxophilitic toĂśksofI Ă° l ÄątIk Ăź ý�Þ;ÿ�� toĂśksofŠYĂą ďż˝ ó�ð l ItIk Ăź Ă˝ĂÞ�ÿ��trephination Ăś tr á fÄą Ă°natyonĂź Ă˝ĂÞ�ÿ�� Ăś tr á f Š Ăą ďż˝ Ăł Ă°neIsŠ n Ăź ý�Þ;ÿ��trephine tr á>Ă° fÄąn Ăź ý�Þ;ÿ�� trI ù�Ú;Ăł Ă° faIn Ăź Ă˝ĂÞ�ÿ��triazole Ă° trĹÌÜzol Ă° traI Š ù�þ\Ăł ĂśzoUltriazolic Ăś trĹÌðzolIk Ăś traI Š ù�þ\Ăł Ă°zolIktrichite Ă° trIk ù�øaÝ�ó Äąt Ă° trIk ù�øfÝ�ó aIttrichitic trI Ă°k ù�øaÝ�ó ÄątIk trI Ă°k ù�øfÝ�ó ItIktrilobite Ă° trÄąlo ĂśbÄąt Ă° traI l Š ù��#Ăł ĂśbaIttrilobitic Ăś trÄąlo Ă°bÄątIk Ăś traI l Š ù��#Ăł Ă°bItIktroglodyte Ă° trogloĂśdi ù�ú,Ăł t Ă° troglŠYù��#ó�ÜdaI Ăą(Ăş,Ăł ttroglodytic Ăś trogloĂ°di ù�ú,Ăł tIk Ăś troglŠYù��#ó�ðdI ù�ú,Ăł tIktrope Ă° trop Ă° troUptropic Ă° tropIk Ă° tropIktropophyte Ă° tropoĂś fi Ăą(Ăş(Ăł t Ăź Ă˝ĂÞ�ÿ�� Ă° tropŠYù��5ó�Ü faI ù�ú,Ăł t Ăź Ă˝ĂÞ�ÿ��tropophytic Ăś tropoĂ° fi Ăą(Ăş(Ăł tIk Ăź ý�Þ;ÿ�� Ăś tropŠYù��5ó�ð f I Ăą(Ăş(Ăł tIk Ăź ý�Þ�ÿ��trypanosome Ă° trI Ăą(Ăş,Ăł pĂŚnoĂśsom Ă° trI Ăą(Ăş,Ăł pŠ ù�þ5Ăł nŠ ù��5Ăł ĂśsoUmtrypanosomic Ăś trI Ăą(Ăş,Ăł pĂŚnoĂ°somIk Ăś trI Ăą(Ăş,Ăł pŠ ù�þ5Ăł nŠ ù��5Ăł Ă°somIktuberose Ă° tubá>Ăś ros Ă° tubŠ�ÜroUstuberosity Ăś tubá>Ă° rosIti Ăś tubŠ�ðrosItiultramicroscope Ăś Ă´ ltrĂŚĂ°mÄąkroĂśskop Ăś Ă´ ltr Š ù�þ5Ăł Ă°maIkr Š ù��#Ăł ĂśskoUpultramicroscopic Ăś Ă´ ltrĂŚĂśmÄąkroĂ°skopIk Ăś Ă´ ltr Š ù�þ5Ăł ĂśmaIkr Š ù��#Ăł Ă°skopIkunchaste Ăś Ă´ nĂ° cast Ăś Ă´ nĂ° ceIstunchastity Ăś Ă´ nĂ° castIti Ăś Ă´ nĂ° cĂŚstItiuralite Ă°yUrĂŚĂś l Äąt Ă°yUr. ŠYù�þ\óïÜ laIturalitic ĂśyUrĂŚĂ° l ÄątIk ĂśyUr. ŠYù�þ\óïð l ItIkuranite Ă°yUraĂśnÄąt Ă°yUr. ŠYù�þ\óïÜnaIturanitic ĂśyUraĂ°nÄątIk ĂśyUr. ŠYù�þ\óïðnItIkurbane Ur Ă°ban r. ù�ò �ó!Ă°beInurbanity Ur Ă°banIti r. ù�ò �ó Ă°bĂŚnItiurbanization ĂśUrbĂŚnÄą Ă°zatyon Ăś r. ù�ò �ó bŠ ù�þ\Ăł nI Ă°zeIsŠ nurbanize Ă°UrbÜÌnÄąz Ă° r. 
ù�ò �ó bŠ ù�þ\Ăł ĂśnaIzvaccination ĂśvĂŚkù�ø�ó sù�ø�ó I Ă°natyon ĂśvĂŚkù�ø�ó sù�ø�ó Š ù��vĂł Ă°neIsŠ nvaccine vĂŚĂ°k ù�ø�ó sù�ø�ó eĂą ďż˝ Ăł n vĂŚĂ°k ù�ø�ó sù�ø�ó i Ăą ďż˝ Ăł nvaporization ĂśvaporÄą Ă°zatyon ĂśveIpŠ ù��#Ăł rI Ă°zeIsŠ nvaporize Ă°vapoĂś rÄąz Ă°veIpŠ ù��#Ăł ĂśraIzvaricose Ă°vĂŚrI Ăśkos Ă°vĂŚrŠ Ăą ďż˝ Ăł ĂśkoUsvaricosity ĂśvĂŚrI Ă°kosIti ĂśvĂŚrŠYĂą ďż˝ ĂłĂĂ°kosItivariolite Ă°vĂŚrIoĂś l Äąt Ă°v á;ù�þ5Ăł r.i ù�� ó�ŠYù��#Ăł!Ăś laItvariolitic ĂśvĂŚrIoĂ° l ÄątIk Ăśv á;ù�þ5Ăł r.i ù�� ó�ŠYù��#Ăł!Ă° l ItIkvaticination ĂśvĂŚtIkI Ă°natyon ĂśvĂŚtIsù�ø�ó I Ă°neIsŠ nvatic Ă°vĂŚtIk Ă°vĂŚtIkventricose Ă°v á ntrI Ăśkos Ă°v á ntrŠ ù�� Ăł ĂśkoUs
ventricosity Ăśv á ntrI Ă°kosIti Ăśv á ntrŠYù�� ĂłĂĂ°kosItiveracious v á>Ă° rakyos v Š�ð reIsŠ sveracity v á>Ă° rakIti v Š�ð rĂŚsù�ø�ó Itiverbose v á r Ă°bos vr. Ă°boUsverbosity v á r Ă°bosIti vr. Ă°bosItiverrucose Ă°v á r Ăź ý������ U Ăśkos Ă°v á r Ăź ý�����tŠ ù�ò,Ăł ĂśkoUsverrucosity Ăśv á r Ăź ý������ U Ă°kosIti Ăśv á r Ăź ý�����tŠ ù�ò,Ăł Ă°kosItivertical Ă°v á rtIkĂŚl Ă°vr.tIk Š ù�þ\Ăł lvertices Ă°v á rt Ăś Ik ďż˝ ez Ă°vr.tI Ăśsù�ø�ó ďż˝ izvideophone Ă°vIdeoĂś fonĂź ý�Þ;ÿ�� Ă°vIdi Š ù��5Ăł Ăś foUn Ăź ý�Þ;ÿ��videophonic ĂśvIdeoĂ° fonIk Ăź ý�Þ�ÿ�� ĂśvIdi Š ù��5Ăł Ă° fonIk Ăź ý�Þ;ÿ��vinosity vI Ă°nosIti vI Ă°nosItivinous Ă°vÄąnos Ă°vaInŠ sviscose Ă°vIskos Ă°vIskoUsviscosity vIsĂ°kosIti vIsĂ°kosItivivacious vI Ă°vakyos vI Ă°veIsŠ svivacity vI Ă°vĂŚkIti vI Ă°vĂŚsù�ø�ó Itivocational voĂ°katyonĂŚl voU Ă°keIsŠ nŠYù�þ5Ăł lvocation voĂ°katyon voU Ă°keIsŠ nvoracious voĂ° rakyos v ��ð reIsŠ svoracity voĂ° rakIti v ��ð rĂŚsù�ø�ó Itivortical Ă°vortIkĂŚl Ă°v ďż˝ rtIk Š ù�þ5Ăł lvortices Ă°vortI Ăśk ďż˝ ez Ă°v ďż˝ rtI Ăśsù�ø�ó ďż˝ izvorticism Ă°vortI ĂśkIsm Ă°v ďż˝ rtI Ăśsù�ø�ó IzŠ mxerophyte Ă°z Ăąďż˝)/Ăł á roĂś fi Ăą(Ăş(Ăł t Ăź ý�Þ;ÿ�� Ă°z Ăąďż˝)/Ăł I ù�Ú�ó r. Š ù��#Ăł Ăś faI Ăą(Ăş(Ăł t Ăź ý�Þ;ÿ��xerophytic Ăśz Ăąďż˝)/Ăł á roĂ° fi Ăą(Ăş(Ăł tIk Ăź ý�Þ�ÿ�� Ăśz Ăąďż˝)/Ăł I ù�Ú�ó r. 
Š ù��#Ăł Ă° f I Ăą(Ăş,Ăł tIk Ăź ý�Þ;ÿ��xylophone Ă°z Ăąďż˝)/Ăł i ù�ú,Ăł lo Ăś fonĂź ý�Þ;ÿ�� Ă°z Ăąďż˝)/Ăł aI Ăą(Ăş,Ăł l ŠYù��5óïÜ foUn Ăź ý�Þ;ÿ��xylophonic Ăśz Ăąďż˝)/Ăł i ù�ú,Ăł lo Ă° fonIk Ăź ý�Þ;ÿ�� Ăśz Ăąďż˝)/Ăł aI Ăą(Ăş,Ăł l ŠYù��5óïð fonIk Ăź Ă˝ĂÞ�ÿ��zeolite Ă°zeoĂś l Äąt Ă°zi Š\ù��#ĂłĂĂś laItzeolitic ĂśzeoĂ° l ÄątIk Ăśzi Š\ù��#ĂłĂĂ° l ItIkzoophile Ă°zooĂś fÄąl Ăź ý�Þ�ÿ�� Ă°zoU ŠYù��#ĂłĂĂś faI l Ăź Ă˝ĂÞ�ÿ��zoophilic ĂśzooĂ° fÄąl Ik Ăź ý�Þ�ÿ�� ĂśzoU Š ù��#Ăł Ă° f I l Ik Ăź ý�Þ�ÿ��zoophyte Ă°zooĂś fi ù�ú,Ăł t Ăź ý�Þ;ÿ�� Ă°zoU Š ù��#Ăł Ăś faI Ăą(Ăş(Ăł t Ăź ý�Þ;ÿ��zoophytic ĂśzooĂ° fi ù�ú,Ăł tIk Ăź Ă˝ĂÞ�ÿ�� ĂśzoU Š ù��#Ăł Ă° f I Ăą(Ăş,Ăł tIk Ăź ý�Þ;ÿ��zygote Ă°zi Ăą(Ăş,Ăł got Ă°zaI ù�ú,Ăł goUtzygotic zi Ăą(Ăş,Ăł Ă°gotIk zaI ù�ú,Ăł Ă°gotIk
3.A.2 Rules for the deep ORL
In the next two subappendices I give the rules needed for the two different ORLs. I also use the symbol "*" to mark rules that are needed for the deep ORL but not the shallow ORL, and vice versa. Deciding which rules are shared is not as trivial as it might seem. For instance, a rule that converts an underlying /u/ into ⟨ou⟩ is really equivalent to a rule that converts surface /aU/ into ⟨ou⟩, since the latter phonological representation is supposed to be derived from the former. Such cases are counted as matching. On the other hand, a single underlying phoneme is sometimes represented in several possible ways: thus /yu/ surfaces as /yu/, /yU/ and /yə/. In such cases only one of the corresponding shallow ORL rules is counted as matching the deep ORL rule.
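The counting procedure just described can be sketched as a small matching routine. This is a minimal illustration, not the procedure actually used for the counts. It makes three assumptions: each rule is reduced to a (phonemes, graphemes) pair with its context ignored; a hypothetical map `surface_variants` lists the surface forms that each underlying string can take; and each deep-ORL rule is paired with at most one shallow-ORL rule. Under these assumptions, of the three shallow rules that spell /yu/, /yU/ and /yə/ as ⟨u⟩, only one counts as matching.

```python
def compare_orls(deep_rules, shallow_rules, surface_variants):
    """Pair deep-ORL rules with shallow-ORL rules, using each shallow rule once.

    deep_rules, shallow_rules: lists of (phoneme string, grapheme string).
    surface_variants: underlying phoneme string -> list of surface forms it may
    take; a string with no entry is assumed to surface unchanged.
    Returns (shared deep rules, deep-only rules, shallow-only rules).
    """
    unused = list(shallow_rules)
    shared, deep_only = [], []
    for phon, graph in deep_rules:
        for surface in surface_variants.get(phon, [phon]):
            if (surface, graph) in unused:
                unused.remove((surface, graph))  # a shallow rule matches only once
                shared.append((phon, graph))
                break
        else:
            deep_only.append((phon, graph))
    return shared, deep_only, unused


deep = [("kw", "qu"), ("gz", "x"), ("yu", "u")]
shallow = [("kw", "qu"), ("gz", "x"), ("yu", "u"), ("yə", "u"), ("yU", "u")]
variants = {"yu": ["yu", "yU", "yə"]}
shared, deep_only, shallow_only = compare_orls(deep, shallow, variants)
print(len(shared), deep_only, shallow_only)
# 3 [] [('yə', 'u'), ('yU', 'u')]
```

The two shallow rules left over correspond to the kind of rules that the shallow list below marks with "*" as needed only for the shallow ORL.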
1 ɔ → ⟨au⟩
2 tyon → ⟨tion⟩
3 kyos → ⟨cious⟩ / #
4 os → ⟨ous⟩ / #
5 (k ∪ k)s → ⟨x⟩
6 kw → ⟨qu⟩
7 gz → ⟨x⟩
8 īz → ⟨es⟩ / + #
(plural /iz/ spelled ⟨es⟩)
9 il → ⟨le⟩ / (a ∪ i)b #
10 r → ⟨er⟩ / C #
11 ∅ → ⟨e⟩ / [+tense]C__#
(Add a "silent" ⟨e⟩ after tense vowels)
12 e → ∅ / īgn #
13 e → ∅ / ī 3547698;: [+cor, +cont] #
(⟨ea⟩ requires no "silent" ⟨e⟩ except with intervening ⟨s⟩)
14 l → ⟨le⟩ / C #
15 ī → ⟨ee⟩ / ˈC < #
16 c → ⟨ch⟩
17 ð → ⟨th⟩
18 g → ⟨g⟩
19 θ → ⟨th⟩
20 b → ⟨b⟩
21 k → ⟨c⟩
22 d → ⟨d⟩
23 f → ⟨ph⟩ / . . . [+gk]
24 r → ⟨rh⟩ / # . . . [+gk]
25 f → ⟨f⟩
26 g → ⟨g⟩
27 h → ⟨h⟩
28 l → ⟨l⟩
29 m → ⟨m⟩
30 n → ⟨n⟩
31 p → ⟨p⟩
32 r → ⟨r⟩
33 t → ⟨t⟩
34 v → ⟨v⟩
35 w → ⟨w⟩
36 yu → ⟨u⟩ / #
37 y → ⟨y⟩
38 z → ⟨z⟩
39 s → ⟨ce⟩ / n #
40 s → ⟨s⟩
41 u → ⟨ou⟩
42 oI → ⟨oi⟩
43 ʌ → ⟨u⟩
44 a → ⟨a⟩
45 a → ⟨a⟩
46 e → ⟨e⟩
47 ī → ⟨i⟩
48 o → ⟨o⟩
49 u → ⟨u⟩
50 u → ⟨u⟩
51 o → ⟨o⟩
52 I → ⟨i⟩
53 e → ⟨e⟩
54 ǰ → ⟨g⟩ / >?,A@ďż˝-B.ďż˝,ACD-B.ďż˝,FE;-HG
55 ǰ → ⟨j⟩
56 k → ⟨k⟩ / >?,A@ďż˝-B.ďż˝,ACD-B.ďż˝,FE;-HG
57 k → ⟨c⟩
58 i → ⟨y⟩ / #
(this of course could be modeled as a surface constraint; see Section 3.5)
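The following minimal Python sketch shows how a rule list like this one can spell a deep form as an ordered rewrite cascade. It assumes an application regime that the list itself does not specify: the input is scanned left to right, and at each position the first listed rule whose input matches is the one that fires. Only a handful of the rules above are included, stress marks are omitted from the input, and the only context honored is word-final "#". The deep forms come from the word list earlier in this appendix with stress removed; the first vowel of "veracity" is assumed to be e, since it is garbled there.

```python
# Each rule is (deep phonemes, graphemes, context). The context "#" means the
# match must end at the end of the word; None means no context restriction.
# Rule numbers refer to the deep-ORL list above; only a subset is included.
RULES = [
    ("tyon", "tion", None),   # rule 2
    ("kyos", "cious", "#"),   # rule 3
    ("os", "ous", "#"),       # rule 4
    ("kw", "qu", None),       # rule 6
    ("k", "c", None),         # rule 21
    ("r", "r", None),         # rule 32
    ("t", "t", None),         # rule 33
    ("v", "v", None),         # rule 34
    ("a", "a", None),         # rule 44
    ("e", "e", None),         # rule 46
    ("o", "o", None),         # rule 48
    ("I", "i", None),         # rule 52
    ("i", "y", "#"),          # rule 58
]


def spell(deep: str, rules=RULES) -> str:
    """Rewrite a stress-free deep form into a spelling, scanning left to right.

    At each position the first listed rule whose input matches (and whose
    word-final context, if any, is satisfied) fires; an unmatched symbol
    raises ValueError.
    """
    out, pos = [], 0
    while pos < len(deep):
        for phon, graph, ctx in rules:
            end = pos + len(phon)
            if deep.startswith(phon, pos) and (ctx != "#" or end == len(deep)):
                out.append(graph)
                pos = end
                break
        else:
            raise ValueError(f"no rule applies at {deep[pos:]!r}")
    return "".join(out)


for form in ("vokatyon", "vIvakyos", "verakIti"):
    print(form, "->", spell(form))
# vokatyon -> vocation
# vIvakyos -> vivacious
# verakIti -> veracity
```

Listing multi-symbol rules such as "tyon" before single-symbol rules such as "t" is what lets the earlier, more specific rules win in this regime.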
3.A.3 Rules for the shallow ORL
* 1 ɔ → ⟨o⟩ / r
2 ɔ → ⟨au⟩
3 (c ∪ s)ən → ⟨tion⟩ / (əl)? #
4 səs → ⟨cious⟩ / #
* 5 o → ⟨a⟩ / w
6 ks → ⟨x⟩
7 gz → ⟨x⟩
8 kw → ⟨qu⟩
* 9 I → ⟨e⟩ / # pr ∪ r ∪ d
* 10 aIz → ⟨ize⟩ / #
* 11 z → ⟨s⟩ / #
12 r → ⟨er⟩ / C #
* 13 zəm → ⟨sm⟩ / #
14 əs → ⟨ous⟩ / #
15 əl → ⟨le⟩ / #
16 ∅ → ⟨e⟩ / [+tense]C__#
17 e → ∅ / Ĺ ù�IKJ�ó Ÿ [+cor, +cont] #
18 l → ⟨le⟩ / C #
19 i → ⟨ee⟩ / ˈCà #
20 c → ⟨ch⟩
21 ð → ⟨th⟩
22 θ → ⟨th⟩
23 b → ⟨b⟩
24 d → ⟨d⟩
25 f → ⟨ph⟩ / . . . [+gk]
26 r → ⟨rh⟩ / # . . . [+gk]
27 f → ⟨f⟩
28 g → ⟨g⟩
29 h → ⟨h⟩
30 l → ⟨l⟩
31 m → ⟨m⟩
32 n → ⟨n⟩
33 p → ⟨p⟩
34 r → ⟨r⟩
35 t → ⟨t⟩
36 v → ⟨v⟩
37 w → ⟨w⟩
38 yu → ⟨u⟩ / #
* 39 yə → ⟨u⟩
* 40 yU → ⟨u⟩
41 y → ⟨y⟩
42 z → ⟨z⟩
43 s → ⟨ce⟩ / n #
44 s → ⟨s⟩
* 45 aU → ⟨ou⟩
46 oI → ⟨oi⟩
47 ʌ → ⟨u⟩
48 æ → ⟨a⟩
49 eI → ⟨a⟩
50 i → ⟨y⟩ / #
51 i → ⟨e⟩
52 aI → ⟨i⟩
53 oU → ⟨o⟩
54 u → ⟨u⟩
55 U → ⟨u⟩
56 o → ⟨o⟩
57 I → ⟨i⟩
58 e → ⟨e⟩
* 59 r̩ → ⟨r⟩ / V
60 r̩ → ⟨er⟩
* 61 ɑ → ⟨a⟩
* 62 ŋ → ⟨n⟩ / [+velar]
* 63 ə → ⟨a⟩ / #
* 64 ə → ⟨a⟩ / #
* 65 ə → ⟨e⟩
66 ǰ → ⟨g⟩ / žaÂĄBMç Ă/ÂĄONH§ Ă/ÂĄBPq§&Ă
67 ǰ → ⟨j⟩
68 k → ⟨k⟩ / žaÂĄBMç Ă/ÂĄONH§ Ă/ÂĄBPq§&Ă
69 k → ⟨c⟩
Chapter 4
Linguistic Elements
Un petit d'un petit
S'étonne aux Halles
Un petit d'un petit
Ah! degrés te fallent
Indolent qui ne sort cesse
Indolent qui ne se mène
Qu'importe un petit d'un petit
Tout Gai de Reguennes.
van Rooten, Luis d'Antin. 1967. Mots d'Heures: Gousses, Rames. The d'Antin Manuscript, page I. Penguin Books, New York, NY.
In Section 1.2 we made a number of specific assumptions about what kinds of linguistic elements written symbols represent. More specifically and more formally, we assumed that both the phonological and the semantic portions of an AVM representing a morpheme or word can in principle license graphical elements. This assumption naturally raises the question of the range of linguistic elements that written symbols can represent in the world's writing systems, and that question is the topic of this chapter. It is also the single most investigated issue in the study of writing systems. Gelb (1963) is normally credited with being the first to investigate the matter systematically, and every extensive discussion of the topic since has presented a classification of writing systems based on which linguistic elements each writing system supposedly represents.
I start the discussion (Section 4.1) with a review of some of the more influential taxonomies of writing systems. As we shall see, these taxonomies are mostly arboreal (tree-structured). I will end the section with a proposal for a non-arboreal, two-dimensional taxonomy that takes as one dimension the type of phonography encoded by the writing system, and as the other dimension the degree of logography of the system.
Chinese writing was introduced in Chapter 1 as a mixed system that generally involves both phonographic and semantic (or logographic) elements, a position argued most forcefully by DeFrancis (1984; 1989). We further justify this assumption here in Section 4.2. As we have also argued previously, the phonographic and semantic elements represented graphically in a Chinese character are in a relation of overlap. This formal property, along with Axiom 1.3, makes an interesting and correct prediction about the written representation of disyllabic morphemes in Chinese.
We turn in Section 4.3 to a discussion of Japanese writing, in particular with respect to its use of Chinese characters. Japanese writing is surely the most complex modern writing system, and the hardest to force into any taxonomic mold. The properties of the Chinese script as it is used in the Chinese writing system contrast rather dramatically with the use of the same basic script (kanji) in Japanese, a point that Sampson (1985; 1994) shows convincingly in his discussion of this topic. As we shall argue, the Japanese use of kanji is logographic to a greater degree than the use of the superficially similar set of symbols in Chinese. The characterization of the Japanese writing system as a whole is a rather different matter, however: it appears best characterized as a basically phonographic system, but with significant amounts of logography.
We end the chapter (Section 4.4) with a discussion of a few esoteric graphical devices used in some writing systems, and present an analysis of each within the present framework.
4.1 Taxonomies of Writing Systems: A Brief Overview
Our purpose here is not to be exhaustive but rather to present a small sample of some of the more influential taxonomies of writing systems; a more balanced review can be found in (Coulmas, 1994) (see also (DeFrancis, 1989, pages 56–64)).
4.1.1 Gelb
Gelb's taxonomy of writing systems is generally viewed as the starting point for all subsequent taxonomies. Gelb's purpose in his classification was largely teleological. He viewed the segmental phonographic alphabet as the evolutionary high point of writing systems, so that all other writing systems could be viewed as falling on a continuum from pictographic non-writing to alphabetic writing. A linear presentation therefore seems most appropriate for Gelb's taxonomy, and this is what is presented in Figure 4.1. Note that Gelb classified Mayan writing in the "limited systems" subcategory of the forerunners of writing. This is now known to be wrong: Mayan writing is a full writing system containing both logographic and phonographic elements; see (Macri, 1996), inter alia. There is also general disagreement with Gelb's classification of the consonantal Semitic writing systems as syllabic.
4.1.2 Sampson
A less teleological view of writing is presented in (Sampson, 1985); see Figure 4.2. One important innovation of Sampson's system is the primary division between "glottographic" writing, where the symbols represent linguistic elements, and "semasiographic" writing, where the symbols represent concepts but provide no specification of a linguistic form to express those concepts. Sampson (pages 28–29) tentatively presents a Yukaghir "love letter" as an example of semasiographic writing, and he gives a few other instances as well, for example a set of pictographic instructions for starting a Ford car (page 30). On the whole, however, the evidence for semasiography as a viable category of writing systems is tenuous. Indeed, Sampson's primary interest appears to be merely to suggest that a fully communicative system of semasiography might in principle be possible, rather than to argue that such a system has ever existed.

Figure 4.1: The taxonomy of Gelb (1963), along with examples of writing systems that belong to each case. [Figure: a linear sequence of stages. Forerunners of writing: pictographs and other "limited systems" (Mayan); Word-syllabic systems (Sumerian, Chinese); Syllabic writing (Aegean syllabaries, Kana, West Semitic); The Alphabet (Greek, etc.).]

Figure 4.2: The taxonomy of Sampson (1985). [Figure: a tree dividing writing into Semasiographic systems (Yukaghir [?]) and Glottographic systems. Glottographic systems divide into Logographic systems (Chinese) and Phonographic systems. Phonographic systems divide into Syllabic (Linear B), Segmental (Alphabetic) and Featural (Hankul) systems, with Segmental systems including Consonantal ones (West Semitic) as well as Greek.]
Among glottographic systems, Sampson takes the relatively traditional view that Chinese writing is logographic. In his view, Chinese characters do not encode phonological information. Rather, he claims, Chinese characters directly represent morphemes, so that the Chinese reader must simply learn which character goes with which morpheme, and ultimately with which pronunciation. This is of course a fairly standard definition of logography.¹ It differs from the view we are assuming here (see Section 1.2.2 and the discussion below in Section 4.2). On our view, any component of a writing system has a logographic function if it formally encodes a portion of non-phonological linguistic structure, whether that is a whole morpheme or merely some semantic portion of that morpheme. The latter kind of encoding might perhaps be called "semasiographic", but for various reasons I prefer to avoid that term.
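On the definition just given, whether a component has a logographic function depends only on which part of the AVM it licenses. The minimal sketch below models AVM paths as tuples whose first element is either "phon" or "sem". This is a simplification of the AVMs of Section 1.2.2, and the path names and the two-component decomposition of the INSECT+CHAN 'cicada' character are hypothetical illustrations.

```python
from typing import Dict, FrozenSet, Tuple

Path = Tuple[str, ...]  # e.g. ("sem", "class") or ("phon", "syllable")


def is_logographic(licensed: FrozenSet[Path]) -> bool:
    """A component is logographic if it licenses any non-phonological path."""
    return any(path[0] != "phon" for path in licensed if path)


def classify_components(character: Dict[str, FrozenSet[Path]]) -> Dict[str, str]:
    """Label each graphical component of a character by the function it serves."""
    return {
        name: "logographic" if is_logographic(paths) else "phonographic"
        for name, paths in character.items()
    }


# Hypothetical decomposition of the semantic-phonetic character INSECT+CHAN.
cicada = {
    "INSECT": frozenset({("sem", "class")}),
    "CHAN": frozenset({("phon", "syllable")}),
}
print(classify_components(cicada))
# {'INSECT': 'logographic', 'CHAN': 'phonographic'}
```

Because the test looks at any licensed path rather than at whole morphemes, a component that encodes only part of a morpheme's meaning still counts as logographic. That is the point on which this view departs from the standard definition.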
In addition to adopting the common view of Chinese writing, Sampson also takes the more innovative (and controversial) position that Korean Hankul is featural, an issue that we will return to below.
4.1.3 DeFrancis
DeFrancis (1989) takes issue with a number of Sampson's claims.
4.1.3.1 No full writing system is semasiographic
First, and most importantly, he argues against the existence, or even the possibility, of semasiographic writing. His major attack consists of demonstrating that the Yukaghir "love letter" Sampson cites was not an instance of a system of written communication at all. Rather, it was a prop in a kind of "party game": it was never intended to be understood by a reader, but was instead interpreted for others by its author (pages 24–35). Still, as Sampson (1994) correctly observes, the argument is not really fair. Even though the Yukaghir "letter" turns out not to be an instance of semasiographic writing, DeFrancis ignores the other instances of semasiographic writing (e.g. the iconic Ford instruction manual) that Sampson (1985) had previously discussed. Indeed, DeFrancis cannot deny the existence of iconic symbology that communicates ideas without representing any specifically linguistic information. On the other hand, DeFrancis is correct in observing that such systems are always limited in what they are capable of expressing. Nobody has shown the existence of a writing system that is entirely semasiographic, relying on no linguistic basis in order to communicate ideas, and that allows people to write to one another on any topic they choose. It seems fair to say that the burden of proof lies with those who claim that semasiographic writing is possible: it is up to them to demonstrate that such a system exists. For DeFrancis, then, all full writing is glottographic.
4.1.3.2 All full writing is phonographic
But DeFrancis makes an even stronger claim: all full writing is largely phonographic. According to him, a purely logographic system is impossible, and Sampson's classification of Chinese writing as logographic is therefore incorrect. DeFrancis' basic argument is simple. The vast majority of Chinese characters created throughout history are so-called semantic-phonetic compounds, such as the character ⟨INSECT+CHAN⟩ chan 'cicada' discussed in Section 1.2.2. In such a character, one element gives a hint of the meaning and the other gives a hint of the pronunciation. The exact percentage depends upon the size of the character set being considered. Of the 9,353 characters that had been developed up to the 2nd century AD, about 82% were semantic-phonetic compounds; of the entire set of 48,641 characters recorded by the 18th century, 97% were (DeFrancis, 1989, page 99). This means that essentially all of the characters created between the 2nd and 18th centuries were of the semantic-phonetic type. No explicit estimate is given for the percentage of such compounds in the written vocabulary of the average Chinese reader, who can be expected to know between 5,000 and 7,000 characters, but there is no question that they will be the vast majority. Thus, for DeFrancis, Chinese writing is not primarily logographic at all, but what he terms morphosyllabic: it is basically a phonographic writing system, with additional logographic information encoded.

¹ Note that the basic division among glottographic systems between logographic systems and phonographic systems corresponds exactly to what Haas (1983) terms pleremic and cenemic, respectively. See also (Coulmas, 1989).
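The claim that essentially all characters created after the 2nd century are semantic-phonetic follows from the two percentages by simple subtraction. The minimal check below assumes that the earlier set of 9,353 characters is contained in the later set of 48,641, and that the published percentages are rounded to the nearest whole percent.

```python
def implied_share_of_new(early_total, early_pct, late_total, late_pct):
    """Share of the characters added between two dates that are semantic-phonetic.

    Assumes the early character set is a subset of the late one.
    """
    early_sp = early_total * early_pct   # semantic-phonetic characters, early set
    late_sp = late_total * late_pct      # semantic-phonetic characters, late set
    return (late_sp - early_sp) / (late_total - early_total)


# Figures quoted from DeFrancis (1989, page 99).
nominal = implied_share_of_new(9353, 0.82, 48641, 0.97)
# Least favourable rounding: 82% could be up to 82.5%, 97% as low as 96.5%.
lower = implied_share_of_new(9353, 0.825, 48641, 0.965)
print(f"{nominal:.4f} {lower:.4f}")  # 1.0057 0.9983
```

The nominal value slightly exceeds 1, which only reflects rounding in the published percentages. Even the least favourable rounding leaves more than 99.8% of the added characters as semantic-phonetic compounds.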
4.1.3.3 Hankul is not featural
Finally, he argues that Korean Hankul is basically segmental, although some featural considerations clearly went into its design (pages 186–200); note that this is also the position of (King, 1996). The major reason DeFrancis gives, echoed in (Coulmas, 1994), is that Korean children learning to read typically memorize the syllable-sized groupings of elements as wholes, and that Korean readers are certainly unaware of the featural relationships between the symbols. This argument, it seems to me, is rather shaky: what readers are taught, or are explicitly aware of, in their writing system is often at odds with what a careful analysis tells us is true of that system. For example, Flesch (1981) cites several instances of American readers of English taught by the so-called "whole-word" method who were unaware that English writing is basically segmental, with particular letters or combinations representing particular sounds.
Nevertheless, DeFrancis is probably correct in asserting that Hankul is basically segmental rather than featural. To see this, it is worth considering a truly featural script, namely Bell's "Visible Speech" (Bell, 1867; MacMahon, 1996), which was developed for use as a universal phonetic alphabet. The construction of the individual glyphs in this system encodes articulatory features in a consistent, iconic fashion. For instance, consonant place of articulation is indicated by the orientation of the basic consonant glyph, a C-shaped symbol. Thus the glyph for /x/ has its bowl pointing leftwards, indicating a constriction at the back of the mouth, while the glyph for /ɸ/ has a rightward-pointing bowl representing a bilabial constriction. Stops (closures) are indicated by closing off the open part of the glyph, which gives the glyphs for /k/ and /p/. Voicing is indicated by a bar that is iconic for the near closure of the glottis during voicing, as in the glyphs for /g/ and /b/. Nasality is indicated by turning half of the closure bar into a wavy line, which indicates the lowered soft palate (MacMahon, 1996, page 838), as in the glyphs for /ŋ/ and /m/. Other features are indicated in a similarly consistent fashion.

In contrast, Hankul is by no means as consistent in its representation of features. Some features are represented consistently by properties of the script, others only inconsistently, and still others not at all. Consider the basic Hankul segmental elements presented in (Sampson, 1985, page 124) and shown here in Figure 4.3. As we have noted, certain phonological features do have a consistent representation in the elements of the Hankul script. Thus, for instance, the feature bundle [+tense, −aspirated] (fifth row of Hankul symbols) is represented by doubling the basic symbol used for the lax stop with the corresponding place of articulation. Similarly, the sibilants (third column) share the basic symbol ⟨s⟩, which represents a tooth (Sampson, 1985, page 125), in other words an indication of the place of articulation of the consonants in question. On the other hand, labiality is not consistently represented: /m/, /b/ and /p/ share a common shape (the square representing a mouth), but /pʰ/ is different. And some features are not represented at all. There is no representation of the feature nasal: /m/, /n/ and /s/ appear to be on a par, each involving the "basic" glyph for its place of articulation. There is similarly no consistent representation of the feature [voiced] (lax, in Sampson's classification) for stops. The apical and sibilant voiced elements have an overbar, but in the case of /b/ we find not an overbar but an extension of the two vertical sides of the square of /m/. The overbar for /g/, according to Sampson, represents the roof of the mouth touching the palate, and thus is presumably not the same as the overbar for the apical and sibilant glyphs. Finally, there is no consistent representation of aspiration. In some symbols (/cʰ/, /h/) it seems to be represented as a dot over the basic glyph (third row), and in others (/tʰ/, /kʰ/) it appears as a horizontal bar inside the basic glyph.

Figure 4.3: Featural representation of Korean Hankul, from (Sampson, 1985, page 124), Figure 19. (Presented with permission of Routledge/Stanford University Press.)

Figure 4.4: DeFrancis' classification of writing systems. [Figure: a tree dividing writing systems into Syllabic and Segmental. Syllabic divides into Pure Syllabic (kana, Cherokee) and Morphosyllabic (Chinese, Sumerian). Segmental divides into Consonantal and Alphabetic. Consonantal divides into Pure Consonantal (W. Semitic) and Morphoconsonantal (Egyptian); Alphabetic divides into Pure Phonemic (Greek) and Morphophonemic (English, Korean).]
None of this should be interpreted as denigrating the Hankul script: it is probably the most scientific script in common use today. There is no question that the basic design of Hankul is phonetically motivated to a highly sophisticated degree. After all, many of the shapes depict modes of articulation, and are iconic in a way similar to Bell's Visible Speech. But Hankul falls short of being a featural script in the way that Visible Speech is; it is better viewed as an intelligently constructed segmental alphabet. This conclusion should not be surprising, since even unequivocally segmental systems contain symbols that seem to encode individual features. In Russian, for instance, the soft sign ⟨ь⟩ marks palatalized consonants, and thus might be viewed as encoding the feature [+high] for consonants. Hankul clearly has more featural aspects than Russian orthography, yet the difference is probably more a matter of degree than of kind.
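The difference between a featural script like Visible Speech and Hankul can be stated as a consistency condition: switching a feature on should always add the same glyph component, whatever the other features are. The sketch below is a toy model built on three assumptions. Only two places of articulation and the features stop and voiced are modeled. The component names are invented descriptions. And `hankul_like_parts` is a deliberately simplified caricature of the voicing facts discussed above (an overbar in one place, extended sides at the lips), not an encoding of actual Hankul.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Consonant:
    place: str    # "back" (velar) or "lips" (bilabial); only two places modeled
    stop: bool    # True for a stop, False for a fricative
    voiced: bool


INVENTORY = [Consonant(p, s, v)
             for p, s, v in product(("back", "lips"), (False, True), (False, True))]


def visible_speech_parts(c):
    """Visible-Speech-style glyph: each feature value contributes its own part."""
    parts = {"bowl points left" if c.place == "back" else "bowl points right"}
    if c.stop:
        parts.add("open side closed off")
    if c.voiced:
        parts.add("voicing bar")
    return frozenset(parts)


def hankul_like_parts(c):
    """Toy caricature: the voicing mark depends on place of articulation."""
    parts = {"basic shape, " + c.place}
    if c.stop:
        parts.add("closure mark")
    if c.voiced:
        parts.add("extended sides" if c.place == "lips" else "overbar")
    return frozenset(parts)


def consistently_featural(inventory, encode, features=("stop", "voiced")):
    """True if turning each feature on always adds and removes the same parts."""
    all_features = ("place", "stop", "voiced")
    for feat in features:
        effects = set()
        for a, b in product(inventory, repeat=2):
            same_elsewhere = all(getattr(a, f) == getattr(b, f)
                                 for f in all_features if f != feat)
            if same_elsewhere and not getattr(a, feat) and getattr(b, feat):
                ea, eb = encode(a), encode(b)
                effects.add((eb - ea, ea - eb))  # (parts added, parts removed)
        if len(effects) > 1:
            return False
    return True


print(consistently_featural(INVENTORY, visible_speech_parts))  # True
print(consistently_featural(INVENTORY, hankul_like_parts))     # False
```

A script that fails this check can still have many feature-motivated shapes. This is the sense in which the difference is one of degree rather than kind.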
The three views we have just discussed, taken together, lead DeFrancis to propose a classification of true writing systems as depicted in Figure 4.4.
Somecommentsarein order. DeFrancisâbasicdivision is accordingto the typeof phonologicalunit represented:syllabicversussegmental,andwithin thelattercon-sonantal(representingmostlyor only consonants)versusalphabetic(representingallsegments).Within eachcategoryheassumestwo variants,namelyapurevariantwhereonly phonologicalinformation is representedin the system;and a morpho-variantwhere,additionally, morphologicalinformationis represented.Thisis obviousenoughfor Chinese,SumerianandEgyptian,whereeachof thesewriting systemshassemanticelementsthatrepresentmeaning-relatedpropertiesof morphemes.EnglishandKorean
4.1. TAXONOMIES 135
Hankul are similarly classified, since in both cases the writing systems fail to be fully phonemic. We shall argue in the next subsection that this classification represents a category error: English and Hankul are not on a par with Chinese or Egyptian.
4.1.4 A new proposal
While I accept several basic assumptions of DeFrancis' classification scheme, there are nonetheless several areas where I feel his scheme is deficient. These I enumerate below:
1. Despite the prominent position given to syllabaries in DeFrancis' taxonomy (as well as most other schemes), it is important to realize that full syllabaries, that is, systems where all syllables of the language are represented by (one or more) single symbols, are actually very rare. Though scholars of writing systems have undoubtedly been aware of this point for a long while, it was to my knowledge first made explicitly by Bill Poser in a presentation at the Linguistic Society of America in 1992. Poser's basic point was simple. In the majority of systems that are called "syllabic" (among them Japanese kana, Linear B, the phonological component of Sumerian writing, and the phonological component of Mayan writing) one does not find a symbol for every full syllable of the language. Instead, what one finds are symbols for simple "core" (C)V syllables, possibly augmented to include onglides (CGV); more complex syllables are represented in writing by combining the core syllable symbols with symbols either representing single phonemes, or else other symbols representing core syllables. A few examples will serve to illustrate the point.
• Kana symbols represent either V, CV or CGV. To represent a CVV syllable, one must combine a basic CV symbol with a V symbol. Thus a syllable /nai/ would be represented as ⟨na⟩⟨i⟩. Japanese orthography might therefore be described as moraic (cf. Horodeck, 1987, page 33).
• In Linear B (Miller, 1994; Bennett, 1996), symbols represent (C)V, CGV, or in a few cases CCV. More complex syllables are (partially) represented using combinations of these basic units. Thus the (disyllabic) word /ksenwion/ was represented as ⟨ke⟩⟨se⟩⟨ni⟩⟨wi⟩⟨jo⟩ (with no representation of the final /n/), and the (tetrasyllabic) word /mnasiwergos/ was represented ⟨ma⟩⟨na⟩⟨si⟩⟨we⟩⟨ko⟩ (with no orthographic representation of the /r/ and final /s/) (Miller, 1994, pages 18–22).
• In Sumerian, complex syllables were often represented by "syllable telescoping" (DeFrancis, 1989, pages 81–82), whereby a CV graphemic unit and a VC graphemic unit were combined to represent a CVC phonological syllable. Thus ⟨ki⟩ + ⟨ir⟩ would represent /kir/.
In one sense, of course, such systems are syllabaries: the phonological units represented by the simple glyphs are generally well-formed syllables of the language. But this is misleading. When one speaks of alphabetic symbols it is
taken for granted that there are symbols available to represent every phonemic segment of the language. So the term syllabary ought similarly to imply that every syllable of the language has a graphemic symbol associated with it. Most so-called syllabaries do not meet that requirement.2 For lack of a better term, I will henceforth term such systems core syllabaries.3
Still, full syllabaries certainly do occur: Chinese is one such example, though of course it is not a pure phonological system. Another example seems to be the Yi syllabary (DeFrancis, 1989; Shi, 1996), which in its classic form was a morphosyllabic system like Chinese (and may have been influenced by Chinese writing), but which in its modern form, at least in the popular standardized form recommended by the government (Shi, 1996, page 241), is a purely phonographic syllabary. Yi syllable structure is exceedingly simple, with basically only CV syllables (including some diphthongs) allowed. However, there are 44 consonants (including the empty onset), 10 vowels and 3 lexical tones, resulting in a syllabary of 819 characters once all legal C+V+Tone combinations are considered. Of course, the syllable structure represented by Yi syllabograms is no more complex than that represented by typical core syllabaries: but in Yi each distinct full syllable is represented by a separate glyph, unlike, for example, the case in Japanese.
Note that to say that true syllabaries are rarer than usually supposed is not, of course, to deny the importance of the syllable as an organizational unit in a great many writing systems; we have noted (as have others) the importance of syllables in Hankul, Devanagari and Pahawh Hmong, and many other systems could be cited. We even argued for the importance of syllables in the Russian writing system (Section 3.5).
2. "Morphophonemic" systems such as English or Korean are parallel in DeFrancis' taxonomy to morphosyllabic systems like Chinese or Sumerian, and to morphoconsonantal systems like Egyptian. This is a category error.
What makes Korean and English less than fully "phonemic" relates not to what is represented by the basic symbols of the script, but to the phonological depth of what is represented, and the amount of lexical marking one must assume: in other words, it relates to the depth of the ORL, and other issues discussed in Chapter 3. This issue was explicitly discussed for English in that chapter; relevant discussion on Korean can be found in (Sampson, 1985, pages 135ff.), who describes a set of rules to predict the actual surface pronunciation of a string of Korean Hankul, given regular (morpho)phonological processes of the language. As I also discussed in Section 3.2, it is particularly a mistake to equate
2 Of course, this argument is not entirely fair, since in many alphabetic systems certain phonemes are only represented by combinations of basic symbols, such as digraphs: so /c/ in Spanish is only representable by ⟨ch⟩, and /θ/ and /ð/ in English are only representable by ⟨th⟩. But polygraphs are typically the minority in alphabetic systems, and there are many segmental systems that do not have polygraphs. In contrast, polygraphic representation of complex syllables in so-called syllabaries appears to be the norm.
3 Fischer (1997a; 1997b) terms them "open syllabaries", but this term is suboptimal: CVV syllables are after all "open", though they tend not to be represented with single symbols in core syllabaries.
the lexical orthographic marking of English (e.g. the marked spelling of /n/ in knit) with the logographic components of Chinese writing, an equation that is implicit in DeFrancis' classification.
3. Calling Egyptian "consonantal", and thus equating it with Semitic writing systems, obscures one unique property of Egyptian, namely the existence of bi- and triliterals, standing for two and three consonants respectively (Ritner, 1996). In fact these make up the majority of the system: "uniliteral" symbols consist of only about 25 symbols; biliterals about 80; and triliterals 70. Egyptian writing might therefore better be described as "polyconsonantal".
4. While I accept DeFrancis' basic hypothesis that one cannot construct a full writing system on completely logographic principles without recourse to phonography, logography is nonetheless an important aspect of many writing systems, a point which no scholar would presumably deny.4
On the other hand, as DeFrancis has argued, for a writing system to be extensible it must have a robust phonographic component: one cannot efficiently develop written representations for neologisms if one is restricted to purely logographic means.5 Thus no matter how large the amount of logography a particular writing system has, logography is clearly not on a par with phonography, and should therefore not be represented as part of the same arboreal taxonomy as it is, for example, in Sampson's system.
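The difference drawn under point 1 between core syllabaries and full syllabaries can be made concrete with a small sketch. It assumes Latin transliterations rather than real sign lists, and the helper names (kana_style, telescope, full_syllabary_bound) are hypothetical: a core syllabary spells complex syllables by combining simple signs (⟨na⟩⟨i⟩ for /nai/, telescoped ⟨ki⟩⟨ir⟩ for /kir/), whereas a full syllabary such as modern Yi needs one glyph per legal syllable.

```python
# Hypothetical sketch: only the spelling rules are modelled, not real sign inventories.
VOWELS = set("aeiou")


def kana_style(syllable: str) -> list[str]:
    """Moraic spelling of a (C)VV syllable: one (C)V sign, then one V sign per extra vowel.

    Assumes a Latin transliteration with at most one onset consonant, e.g. /nai/ -> <na><i>.
    """
    first_vowel = next(k for k, ch in enumerate(syllable) if ch in VOWELS)
    core, extra = syllable[: first_vowel + 1], syllable[first_vowel + 1 :]
    return [core] + list(extra)


def telescope(cvc: str) -> list[str]:
    """Sumerian-style syllable telescoping: CVC -> CV sign + VC sign, e.g. /kir/ -> <ki><ir>.

    Assumes a three-letter transliteration: one onset, one vowel, one coda letter.
    """
    c1, v, c2 = cvc
    return [c1 + v, v + c2]


def full_syllabary_bound(onsets: int, vowels: int, tones: int) -> int:
    """Glyphs a full syllabary would need if every onset+vowel+tone combination were legal."""
    return onsets * vowels * tones


if __name__ == "__main__":
    print(kana_style("nai"))                # ['na', 'i']
    print(telescope("kir"))                 # ['ki', 'ir']
    print(full_syllabary_bound(44, 10, 3))  # 1320
```

For Yi, 44 × 10 × 3 = 1,320 is only an upper bound; the 819 glyphs reported above reflect the fact that not every combination is a legal syllable.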
The last point motivates us to abandon the traditional arboreal classification of writing systems in favor of a two-dimensional arrangement where the type of phonography used represents the primary dimension and the amount of logography used represents the second. This scheme is represented in Figure 4.5. Naturally the degree of logography is tricky to estimate (though I believe it can be estimated), and the arrangement of particular writing systems in this second dimension is largely impressionistic. But it is important to realize that all writing systems probably have some degree of logography. So written English contains numerous symbols and letter sequences that can only be construed logographically: ⟨&⟩, ⟨lb⟩ and ⟨$⟩ are just three examples. In the taxonomy, alphasyllabaries such as Devanagari (Section 2.3.2) are classified as alphabets. The status of "onset-rime" scripts like Pahawh Hmong (Section 2.3.3) is unclear: they are almost segmental, but their symbols for whole rimes show them not to be completely so. One might consider setting up a special category for such scripts; in the current scheme I classify Pahawh Hmong as falling somewhere between alphabets and core syllabaries.
4 Indeed, as Sampson (1994, page 122) cogently points out, logography is in no way anomalous once one observes that "any natural language has units at many levels, and in particular that all human languages exhibit a 'double articulation' into units carrying meaning, on the one hand, and phonological units . . . on the other. . . . It is at least logically possible, therefore, that a glottographic script might assign distinctive symbols to elements of the first rather than of the second articulation."
5 As we shall see in our discussion of Japanese below, kokuji (Japanese-invented Chinese characters) are instances of purely logographic constructions invented to represent words that did not previously have a written representation. But there are no more than a couple of hundred of these, whereas the new words that have representations in the kana core syllabary number in the thousands.
Figure 4.5: A non-arboreal classification of writing systems. On Perso-Aramaic, see Section 6.1. (Diagram not reproduced. Horizontal axis: type of phonography (Alphabetic, Core Syllabic, Syllabic, Consonantal, Polyconsonantal); vertical axis: amount of logography. Systems placed: W. Semitic; Perso-Aramaic; Egyptian; Modern Yi; Chinese; Sumerian, Mayan, Japanese; Linear B; English, Greek, Korean, Devanagari; Pahawh Hmong.)
There is of course no reason to stop at two dimensions, though it is more convenient to do so for simplicity of presentation. An additional dimension would relate to the depth of the ORL, and other topics discussed in Chapter 3; in this dimension, English and Korean would pattern differently from, say, Greek. This is of course the sense in which DeFrancis meant that English and Korean are "morphophonemic", unlike Greek: but the dimension on which they differ is orthogonal to the dimension on which, say, Chinese and Modern Yi differ.
Yet another dimension would be the degree to which complex planar arrangements have a significant function in a writing system, the topic of Chapter 2: Korean and Devanagari would pattern differently from English on this dimension (Faber, 1992).
4.1.5 Summary
We have suggested a view of writing systems where logography, defined as the graphical encoding of non-phonological linguistic information, is a dimension orthogonal to phonography: writing systems can thus be classified minimally in a two-dimensional space according to what types of phonological elements are encoded, and how much logography they have. Encoded phonological elements represent a range of possibilities, as is well known, but normally the maximum size of such elements is a core-syllabic CV or VC unit: in particular, one rarely finds a purely phonographic system that represents each possible syllable of the language with a distinct element. Egyptian represents an apparently unique polyconsonantal system.
The orthogonality of logography and phonography is entirely in keeping with the formal model we presented in Section 1.2.2: assuming that logographic elements represent information related to the SYNSEM attribute, then, as we previously observed, the linguistic information encoded by logographic elements is not in a hierarchical relationship with information encoded by phonographic elements. Therefore it should in principle be possible for writing systems to select different mixes of logographic and phonographic encodings, and to exhibit both in the same system. For different types of phonography this ability to mix is less natural: syllables dominate segment-sized units, and so if a system is core-syllabic it is natural for it to choose most or all of its elements at that level of the hierarchy; mixes could happen, but one in fact rarely if ever finds systems that have both a large collection of graphemes that denote core syllables and another collection of graphemes that denote segments. Systems tend to choose one phonological level to encode.
DeFrancis' main thesis is that Chinese is a writing system that is basically phonographic (as he claims all writing systems are), but with a large logographic component. The conclusion of the preceding paragraph, that such systems are expected given the formal apparatus of Section 1.2.2, is of course consistent with DeFrancis' thesis. But the formal model of Chinese semantic-phonetic characters presented there still raises the question of whether there is compelling evidence that Chinese writing really behaves that way: does it actually buy you anything to assume that the ⟨INSECT⟩ portion of chan 'cicada' encodes a portion of the SYNSEM field, whereas the putatively phonographic portion ⟨CHAN⟩ encodes phonological information? We will argue that it does, both in the next section and in Chapter 5, where we address psycholinguistic evidence for readers' online processing of characters. In the next section, in particular, we will argue not only that the phonographic portion of Chinese characters plays an important role in encoding phonological information, but also that the formal model presented earlier makes an interesting prediction about the encoding of disyllabic morphemes in Chinese.
4.2 Chinese Writing
The traditional Chinese classification of characters divides them into six groups, the so-called liu shu, or Six Categories of Characters. Of these, four relate to the structural properties of the characters, and two to their usage (Wieger, 1965, page 10). It is the structural properties that will concern us here, the four categories of interest being:
• Pictographs (xiangxing): e.g. 人 ⟨PERSON⟩ ren 'person', 龜 ⟨TURTLE⟩ gui 'turtle'.
• Indicative symbols (zhishi): e.g. 下 ⟨DOWN⟩ xia 'downwards', 上 ⟨UP⟩ shang 'upwards'.
• Semantic-semantic compounds (huiyi): e.g. 好 ⟨FEMALE+CHILD⟩ hao 'good', 苗 ⟨GRASS+FIELD⟩ miao 'sprout'.
• Semantic-phonetic compounds (xingsheng): e.g. 蟬 ⟨INSECT+CHAN⟩ chan 'cicada', 橡 ⟨TREE+XIANG⟩ xiang 'oak'.
The first three cases are reasonably uncontroversial: there is no question that these three groups of signs, which in total number no more than about 1,500 in the largest
dictionary (DeFrancis, 1989, page 99), are logographs, without any representation of phonetic information. Of course, as Coulmas (1989, page 50) notes, logographic symbols, which theoretically should map directly to a non-phonological portion of the morphological level of representation, do, in the minds of skilled readers, also map directly to phonological representation: a skilled English reader, for example, will unconsciously map ⟨lb⟩ to /paʊnd/, and equivalent facts hold for Chinese.6 The controversial characters are of course the fourth category, the semantic-phonetic group, which DeFrancis insists is basically phonetic, whereas Sampson has argued (and many others have merely assumed) that it is logographic.
The problem with the categorization of the semantic-phonetic category revolves around the fact that the phonological information provided by the phonetic component is sometimes perfect, frequently only partial, and in some cases completely useless. An example of each case is given below:
Char.   Analysis         Phon. Component       Actual Pron.   Gloss
橡      ⟨TREE+XIANG⟩     象 xiang ('elephant')   xiang          'oak'
鴨      ⟨BIRD+JIA⟩       甲 jia ('cuirass')      ya             'duck'
猜      ⟨DOG+QING⟩       青 qing ('green')       cai            'guess'
The distribution of these three types is quite skewed: there are a small number of the first category (perfect match), a few of the last category (completely useless), with most falling into the second category (somewhat helpful). Some phonetic components are in general more useful than others: for example, all characters having 皇 huang ('emperor') as a phonetic component have the pronunciation huang, matching the base character down to the level of the tone; see (DeFrancis, 1989, e.g., pages 102–103) for a range of other examples.7
Now, if the phonetic component were always a perfect indicator of the pronunciation of the character, then there would presumably be no contention: everyone would agree that most Chinese characters are basically phonetic symbols, with additional logographic information (the semantic component). But because of the imperfections in the representation of the phonological information, most authors have assumed that
6 This is presumably the basis of the use of characters purely for their pronunciation, a practice that has been followed for centuries to transliterate foreign words, whether they be Sanskrit terms from Buddhist tracts, or present-day foreign names like kelindun 'Clinton'. In Modern Chinese, the particular characters that are used for this kind of "phonetic transcription" are a more-or-less closed class; see (Sproat et al., 1996) for some discussion. See also Section 4.3 for a discussion of the equivalent Japanese ateji.
7 What is the reason for inexact matches? In some cases the reason is historical sound change. For example, many characters with the phonetic component 行 xing/hang are pronounced xing or hang (Wieger, 1965, page 443) (as is the base character). These two syllables are the result of a historical split in Mandarin. There is no question that the phonetic components were more useful in the past than they are in Modern Mandarin (and may be more useful even today in other Chinese languages, such as Cantonese, though I have not seen an investigation of this topic). Even Sampson (1994) admits that the Chinese system may have been a much more phonographic system at one time. See (Baxter, 1992) for a comprehensive discussion of the phonology of early Chinese and its relationship with the phonetic components.
In other cases, the historical argument is less convincing: 衍 ⟨WATER+XING⟩ 'overflow' is pronounced yan, which presumably was never historically derivable from xing/hang. Presumably this way of writing the character was chosen since the phonetic component was deemed similar enough to the intended reading, and since 行 in its reading xing means 'go', it may also have contributed some semantic information to the composite character.
the system is no longer phonographic. Although the comparison with English is not entirely fair (English orthography is never as irregular as Chinese), it is interesting to note that precisely the same assumptions have been popularly made about English. Indeed, the misguided assumption that English is not basically a phonetic but rather a logographic writing system has had a significant impact on the teaching of reading in the United States, as lamented and attacked in (Bloomfield and Barnhart, 1961; Flesch, 1981); this assumption stems in large measure from the fact that English spelling is not optimal for cuing the reader to the pronunciation of the word, meaning that the spelling and pronunciation of some words must simply be learned.8
DeFrancis' argument, however, is not that Chinese writing is a good phonographic system: indeed, he stresses that it is a lousy one. However, it is much more useful to view it as an imperfect phonographic system with additional logographic attributes than it is to view it as a wholly logographic system. Apart from the distributional reasons that DeFrancis discusses, there are other reasons for assuming that Chinese is largely phonographic, and that in particular the phonographic information resides in the phonetic component, when that is present. Among these reasons:
• The evidence for the psychological reality of the phonetic component, as discussed in the next chapter.
• The common-sense observation that Chinese readers, when encountering an unfamiliar character, will attempt to guess its pronunciation from the phonetic component. Indeed, with a completely unfamiliar character, they have no choice but to adopt this strategy. An instance of this is the character 鱈 ⟨FISH+XUE⟩ xue 'cod'. Apparently this character was a Japanese invention, a kokuji (Section 4.3), where the second element 雪 was used not for its pronunciation xue, but for its meaning 'snow' (the flesh of cooked cod being snowy white). Thus the correct analysis for Japanese would be ⟨FISH+SNOW⟩, a typical semantic-semantic construction common in kokuji. When this character was borrowed back into Chinese, Chinese readers interpreted the 雪 component as a phonetic component, thus assigning the character the pronunciation xue.
• The development of some simplified characters in the Mainland involving the substitution of a different phonetic component for the one used in the traditional script. The main motivation in character simplification was the reduction of the number of strokes needed to write the character, with the goal of making Chinese writing easier to learn; see (DeFrancis, 1984, inter alia). The majority of simplifications involved stroke reductions in components of characters, without actually changing the components used. However, a small percentage involved actually substituting an easier-to-write component (usually a phonetic component) for a more complex traditional component. In such cases, a substituted phonetic component was more often than not a closer phonetic match
8 Sampson (1985), as we have noted elsewhere, makes a similar assumption about English spelling, though it should be stressed that he cannot be accused of the same kind of naiveté as American educators who have subscribed to this view.
to the pronunciation of the whole character than the traditional component it replaced. For instance, a count of phonetic-component substitutions from the list of simplified/traditional character pairs in one dictionary (Nanyang Siang Pau, 1984) revealed 74 characters (64%) where the pronunciation of the substituted phonetic component is as close a match to the pronunciation of the whole character as, or a closer match than, that of the traditional component, and 42 (36%) where the substituted component is actually a worse match. Thus, 進 ⟨GOING+ZHUI⟩ jin 'enter' has been replaced in the simplified script by 进 ⟨GOING+JING⟩.9 Similarly, traditional 塊 ⟨EARTH+GUI⟩ kuai 'lump' has been replaced by 块 ⟨EARTH+GUAI⟩. An instance where the pronunciation of the substituted component is worse is 動 ⟨FORCE+ZHONG⟩ dong 'move', where the (left-hand) phonetic component 重 zhong is replaced to form 动 ⟨FORCE+YUN⟩.
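As a quick arithmetic check on the dictionary count above, both percentages are shares of the 116 phonetic-component substitutions counted in total:

$$
74 + 42 = 116, \qquad \frac{74}{116} \approx 0.638 \approx 64\%, \qquad \frac{42}{116} \approx 0.362 \approx 36\%
$$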
Given these observations, it makes sense to assume, as we have, that the phonetic component is licensed by the phonological information of the syllable that it encodes. What then of the semantic component, which we have assumed is licensed separately by a portion of the SEM attribute's value? For 蟬 ⟨INSECT+CHAN⟩ chan 'cicada', we had assumed an AVM as in (1.8), repeated here as (4.1), and an annotation graph as in (1.10), repeated here as (4.2).
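(4.1)
$$
\left[\begin{array}{ll}
\textrm{PHON} & \left[\begin{array}{ll}\textrm{SYL} & \boxed{1}\left[\begin{array}{ll}\textrm{SEG} & \left[\begin{array}{ll}\textrm{ONS} & \textrm{ch}\\ \textrm{RIME} & \textrm{an}\end{array}\right]\\ \textrm{TONE} & 2\end{array}\right]\end{array}\right]\\
\textrm{SYNSEM} & \left[\begin{array}{ll}\textrm{CAT} & \ldots\\ \textrm{SEM} & \boxed{2}\ \mathit{cicada}\end{array}\right]\\
\textrm{ORTH} & \left\langle \langle\textrm{INSECT}\rangle_{\boxed{2}},\ \langle\textrm{CHAN}\rangle_{\boxed{1}} \right\rangle
\end{array}\right]
$$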
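(4.2)
$$
\begin{array}{ll}
\textrm{SEM}: & \mathit{cicada} : \langle\textrm{INSECT}\rangle\\
\textrm{TONE}: & 2\\
\textrm{SYL}: & \sigma : \langle\textrm{CHAN}\rangle\\
\textrm{ONS-RIME}: & \textrm{ch}\quad \textrm{an}
\end{array}
$$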
The two licensing components in (4.2) overlap, and Axiom 1.3 tells us that they must catenate with each other: the specific catenation operator chosen is (in this case) predictable by rule, following the discussion in Section 2.3.4.
9 Note that the syllables jin and jing are homophonous for many Mandarin speakers.
Note, however, that since most Chinese morphemes are monosyllabic (DeFrancis, 1984), it would seem hard to distinguish between the somewhat elaborate theory presented here and the seemingly simpler theory that states that the phonetic component ⟨CHAN⟩ is indeed licensed by the syllable, but that the semantic component ⟨INSECT⟩ is simply some excess baggage that happens to be associated with the syllable for this particular word. In other words, we would like to distinguish our proposal from the alternative theory that states that the so-called semantic component is not licensed by the semantic portion of the AVM at all.
Crucial evidence comes from the orthographic representation of disyllabic morphemes. Some well-known examples of disyllabic morphemes include hudie 'butterfly', putao 'grape' and binlang 'betel': as far as historical records allow us to determine, these words do not derive from morphologically complex forms, and there is certainly no synchronic evidence of morphological complexity. While various sources discuss disyllabic morphemes, one rarely gets a clear sense of how many of these morphemes there are: DeFrancis (1984; 1989), for instance, only discusses a few such cases, and such cursory treatment is the norm.10 In fact, disyllabic morphemes probably number around a hundred: 74 are listed in Tables 4.1 and 4.2, and this is by no means a complete list.
Before we consider those lists, however, let us see what the theory predicts about the orthographic representation. Consider binlang 'betel nut', which is written using the ⟨TREE⟩ radical 木 and the phonetic components 賓 and 郎, representing, respectively, the two syllables bin and lang, and thus could be transcribed as ⟨TREE+BINLANG⟩. The two phonetic symbols are obviously licensed by the individual syllables. Given that the SYNSEM attribute is associated with the entire morpheme rather than with the individual syllables, the theory states that the ⟨TREE⟩ radical is associated with the whole morpheme rather than with the individual syllables. The AVM for this morpheme would be as in (4.3) and the annotation graph as in (4.4):
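(4.3)
$$
\left[\begin{array}{ll}
\textrm{PHON} & \left[\begin{array}{ll}\textrm{SYLS} & \left\langle \boxed{1}\left[\begin{array}{ll}\textrm{SEG} & \left[\begin{array}{ll}\textrm{ONS} & \textrm{b}\\ \textrm{RIME} & \textrm{in}\end{array}\right]\\ \textrm{TONE} & 1\end{array}\right],\ \boxed{2}\left[\begin{array}{ll}\textrm{SEG} & \left[\begin{array}{ll}\textrm{ONS} & \textrm{l}\\ \textrm{RIME} & \textrm{ang}\end{array}\right]\\ \textrm{TONE} & 2\end{array}\right]\right\rangle\end{array}\right]\\
\textrm{SYNSEM} & \left[\begin{array}{ll}\textrm{CAT} & \ldots\\ \textrm{SEM} & \boxed{3}\ \mathit{betel}\end{array}\right]\\
\textrm{ORTH} & \left\langle \langle\textrm{TREE}\rangle_{\boxed{3}},\ \langle\textrm{BIN}\rangle_{\boxed{1}},\ \langle\textrm{LANG}\rangle_{\boxed{2}} \right\rangle
\end{array}\right]
$$
10 Plausibly, the reason for the relative neglect of disyllabic morphemes comes from the fact that traditional Chinese dictionaries are organized around monosyllabic characters, not around words or morphemes. Since meanings are traditionally listed in the dictionary as entries for these characters, this obscures the fact that many characters are in fact meaningless unless combined with a specific second character.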
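(4.4)
$$
\begin{array}{lll}
\textrm{SEM}: & \multicolumn{2}{l}{\mathit{betel} : \langle\textrm{TREE}\rangle}\\
\textrm{TONE}: & 1 & 2\\
\textrm{SYL}: & \sigma_1 : \langle\textrm{BIN}\rangle & \sigma_2 : \langle\textrm{LANG}\rangle\\
\textrm{ONS-RIME}: & \textrm{b}\quad\textrm{in} & \textrm{l}\quad\textrm{ang}
\end{array}
$$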
Given the representation in (4.4), how are the semantic and phonetic components to be realized relative to one another? First observe that both of the following overlap statements are true, where σ1 and σ2 are the first and second syllables:
1. betel overlaps σ1
2. betel overlaps σ2
From Axiom 1.3 we would then expect that the orthographic realization of betel must catenate with that of σ1, and that the realization of betel must also catenate with that of σ2; in other words, the semantic radical must show up on both components of the written representation of the morpheme. This prediction is correct: the written form is 檳榔, with the ⟨TREE⟩ component 木 on both components.
Of course, in addition to overlapping with the individual syllables, the semantic information betel also overlaps with the whole phonological word binlang. In that case we would say that betel overlaps binlang, and we might then expect to find a written form such as *檳郎, with the ⟨TREE⟩ component catenated to the left of the pair of syllables, thus in effect showing up on the first component. Note that this obeys the catenation formula in which the realization of betel is catenated with the realization of binlang, which can be derived from the overlap of betel and binlang by Axiom 1.3. However, recall that the combination of semantic and phonetic radicals in Chinese frequently involves a different catenation operator from the macroscopic operator. In general, such a formula would require choosing, at the word level, an operator different from the macroscopic operator. This in turn violates the constraint that the SLU is the syllable in Chinese; see Section 2.3.4. Therefore the only option, given the theory and the writing-system-particular constraints, is to duplicate the semantic element across both syllables. This duplication has been noted elsewhere in the literature (e.g. in DeFrancis, 1989; Law and Caramazza, 1995, among others) but never, to my knowledge, explained.
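The derivation just given can be mimicked with a toy sketch. It is not the book's formalism: the Morpheme record, the label strings and the functions overlaps and realize are hypothetical, and the SLU constraint is hard-coded by allowing radical-plus-phonetic combination only inside a single syllable's character.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morpheme:
    """Hypothetical stand-in for the AVM: a SEM gloss, its radical, one phonetic per syllable."""
    gloss: str                  # stands in for the SEM value, e.g. "betel"
    radical: str                # grapheme licensed by SEM, e.g. "TREE"
    phonetics: tuple[str, ...]  # graphemes licensed by the syllables, e.g. ("BIN", "LANG")


def overlaps(m: Morpheme) -> list[tuple[str, int]]:
    """The SEM element overlaps every syllable sigma_i (statements 1 and 2 above)."""
    return [(m.gloss, i) for i in range(1, len(m.phonetics) + 1)]


def realize(m: Morpheme) -> list[str]:
    """Catenate the radical with each syllable it overlaps: one character per syllable.

    A single word-level radical (as in *TREE+BIN LANG) would need a character-internal
    operator above the syllable, which the SLU constraint rules out, so the sketch
    never builds that form.
    """
    return [f"{m.radical}+{m.phonetics[i - 1]}" for _, i in overlaps(m)]


if __name__ == "__main__":
    betel = Morpheme("betel", "TREE", ("BIN", "LANG"))
    print(realize(betel))  # ['TREE+BIN', 'TREE+LANG']
```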
The duplication of semantic radicals across both components of a disyllabic morpheme predicted by the theory holds quite generally. One problem, of course, is to define what one means by a disyllabic morpheme: one cannot in general rely on dictionaries, since standard Chinese dictionaries are arranged by character, and under the multicharacter subentries they rarely distinguish "compounds" of two or more characters that are morphologically simplex from compounds that are morphologically complex. One can, however, collect disyllabic morphemes from corpora, if we make the reasonable assumption that a disyllabic morpheme should consist of two elements that are statistically highly associated with each other, but not with anything else. Two such
lists are presented here, based on 20 million characters of Chinese text.11 The first list, presented in Table 4.1, consists of pairs of characters that occur at least twice and only cooccur with each other: i.e. the token count of the pair is identical to the token counts of the individual characters. The second list, Table 4.2, consists of other pairs of characters not in the first list, which have high mutual information (Fano, 1961).12 In this second table, at least one of the component characters can occur elsewhere. In some cases this is purely an orthographic fact: for instance, the 咖 ka of 咖啡 kafei 'coffee' is also used to write gali 'curry'. In other cases, as with the 蝶 die of 蝴蝶 hudie, what was originally part of a disyllabic morpheme has taken on a "life of its own", and can be used nowadays in new derivatives (as well as in personal names); see (Sproat and Shih, 1995) for some discussion. But what is clear about all these cases is that, both on the basis of distributional evidence and on the basis of native-speaker judgments, these pairs of characters all seem to form single indivisible units. The most striking feature of these lists, of course, is the predicted duplication of the semantic radicals in each of the cases.
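The two selection criteria behind Tables 4.1 and 4.2 can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the code used to build the tables: it counts adjacent character bigrams in one string, applies the "token count of the pair equals the token count of each character" test, and computes pointwise mutual information with the maximum likelihood estimates given in footnote 12. The toy corpus and function names are hypothetical.

```python
import math
from collections import Counter


def counts(text: str) -> tuple[Counter, Counter]:
    """Unigram and adjacent-bigram token counts over a string of characters."""
    return Counter(text), Counter(zip(text, text[1:]))


def exclusive_pairs(text: str, min_count: int = 2) -> set[tuple[str, str]]:
    """Table 4.1-style pairs: seen at least min_count times, and neither character occurs elsewhere."""
    uni, bi = counts(text)
    return {
        (a, b)
        for (a, b), n in bi.items()
        if a != b and n >= min_count and uni[a] == n and uni[b] == n
    }


def pmi(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """log2 P(x,y) / (P(x) P(y)), estimating each probability as frequency / corpus size n."""
    return math.log2((f_xy / n) / ((f_x / n) * (f_y / n)))


if __name__ == "__main__":
    toy = "檳榔樹檳榔咖啡咖喱"  # invented 9-character toy corpus
    print(exclusive_pairs(toy))  # {('檳', '榔')}
    uni, bi = counts(toy)
    # 咖 also occurs outside 咖啡, so the pair fails the Table 4.1 test but can still score high PMI
    print(pmi(bi[("咖", "啡")], uni["咖"], uni["啡"], len(toy)))  # log2(4.5), about 2.17
```

In practice a frequency threshold would also be needed for the PMI list, since pairs seen only once get inflated scores; the sketch leaves that choice open.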
Beyond distributional evidence, such examples reflect what is in fact a very strong intuition of Chinese readers that characters that belong to such relatively inseparable constructions must be written with the same radical. Indeed, it is possible to trace the history of some of these cases, and show that the semantic radical shared by both characters was added in accordance with this "same-radical constraint". For instance, 徜徉 ⟨STEP+SHANGYANG⟩ changyang 'roam leisurely' has in the past been written in various ways, including 常羊 (changyang, the literal interpretation 'everyday sheep' being irrelevant) (Ci Hai, 1979). Over time, the same-radical constraint forced the word into its present shape.
It is worth noting at this juncture that one also finds a few pairs of disyllabic morphemes with identical pronunciations and phonetic components, differing only in the semantic components. The meaning differences may be subtle or stark, but in any case there is a strong sense that one is dealing with different senses of the same word, with different spellings for the different senses. One pair where the semantic difference is rather stark is 枇杷 ⟨TREE+PIBA⟩ pipa 'loquat' and 琵琶 ⟨MUSIC+PIBA⟩ pipa 'Chinese lute'. But despite the difference in meaning, the words are clearly related: the Chinese lute (or "pipa") is a loquat-shaped instrument. A more subtle pair is 崎嶇 ⟨MOUNTAIN+QIQU⟩ qiqu and ⟨FOOT+QIQU⟩ qiqu, both with the sense of 'rugged', though the former seems to emphasize the terrain, and the latter the journey or path over a rugged terrain. A crucial point about such cases is that while there is some freedom to choose the semantic radical, mixing the semantic radicals across the two characters results in an ill-formed result: *⟨FOOT+QI MOUNTAIN+JU⟩ is not a possible way to write
11 This in turn consists of the 10-million-character ROCLING (R.O.C. Computational Linguistics Society) corpus, plus ten million characters of another corpus from United Informatics, Inc.
12 Mutual information is defined for two events x1 and x2, and their cooccurrence x1, x2, as follows:
$$
I(x_1; x_2) = \log_2 \frac{P(x_1, x_2)}{P(x_1)\,P(x_2)}
$$
Here we estimate the probabilities by the maximum likelihood estimate, $\hat{P}(x) = f(x)/N$, where $f(x)$ is the frequency of $x$ and $N$ is the size of the corpus. Mutual information has been used in many computational linguistic applications for computing the strength of association between lexical items: see, e.g., (Church and Hanks, 1989), and also (Sproat and Shih, 1990) for an application to Chinese word segmentation.
this word. Clearly these examples are consistent with a model where the semantic radical is a property of the morpheme: it is independent of the phonetic component, in that there is some freedom to choose alternative radicals while keeping the phonetics constant, while at the same time the same radical must be used for both characters used to write the syllables of the morpheme.
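The same-radical constraint described above amounts to a simple well-formedness check. In this hedged sketch a character is just a hypothetical (radical, phonetic) pair of labels taken from the transcriptions in the text, not an entry in a real character database; the mixed spelling is an invented ill-formed example of the kind the text rules out.

```python
# Hypothetical model: a character is a (semantic radical, phonetic component) pair of labels.
Character = tuple[str, str]


def same_radical(chars: list[Character]) -> bool:
    """True if all characters spelling the morpheme share exactly one semantic radical."""
    return len({radical for radical, _ in chars}) == 1


def respelled(chars: list[Character], radical: str) -> list[Character]:
    """Choose another radical for the whole morpheme while keeping the phonetics constant."""
    return [(radical, phonetic) for _, phonetic in chars]


if __name__ == "__main__":
    loquat = [("TREE", "PI"), ("TREE", "BA")]  # pipa 'loquat'
    lute = respelled(loquat, "MUSIC")          # pipa 'Chinese lute'
    mixed = [("TREE", "PI"), ("MUSIC", "BA")]  # hypothetical ill-formed mixed spelling
    print(same_radical(loquat), same_radical(lute), same_radical(mixed))  # True True False
```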
While the appropriate representation of xingsheng characters seems clear, we need to say something about the representation of the other classes (pictographs, indicative symbols and semantic-semantic compounds), which formally lack a semantic or phonetic component. Taking as our example the pictograph 人 ren 'person', we will assume a representation such as that in (4.5), where the grapheme is licensed by the SEM entry. From a purely formal point of view 人 ren is a logograph: it does not, strictly speaking, encode phonological information. Nonetheless, as we have observed above, even in such cases of formal logography, skilled readers phonologically "recode" the symbol (Coulmas, 1989), which in turn suggests a second licensing from the phonological portion of the AVM. Both these licensings are indicated here:
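(4.5)
$$
\begin{bmatrix}
\textsc{phon} & \begin{bmatrix}
  \textsc{syl} & \begin{bmatrix}
    \textsc{seg} & \langle\, [\textsc{ons}\ \mathrm{r}],\ [\textsc{rime}\ \mathrm{en}] \,\rangle \\
    \textsc{tone} & 2
  \end{bmatrix}
\end{bmatrix}_{\boxed{1}} \\
\textsc{synsem} & \begin{bmatrix}
  \textsc{cat} & \mathrm{n} \\
  \textsc{sem} & \mathrm{person}_{\boxed{2}}
\end{bmatrix} \\
\textsc{orth} & \langle\, \text{人}_{\boxed{1},\,\boxed{2}} \,\rangle
\end{bmatrix}
$$
4.3 Japanese Writing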
Treatments of writing systems need to say something about Japanese writing, widely considered to be the most complex writing system in use today. I therefore briefly treat Japanese writing here, and in Section 5.2.2. In this section I provide an informal treatment of the degree to which Japanese writing is logographic, a sizeable bone of contention among scholars of writing systems. In Section 5.2.2 I discuss psycholinguistic evidence that, no matter how formally logographic Japanese may be, Japanese readers and writers seem to treat it as a phonographic system: that is, the kind of phonological recoding suggested by Coulmas (1989) does indeed seem to occur in Japanese (and also in Chinese, as we will argue in Section 5.2.1).
Japanese apparently exhibits the most extensive use of logography of any modern writing system. The reasons for this are well documented elsewhere and will only be briefly sketched here; see (Sampson, 1985; DeFrancis, 1989; Sampson, 1994, among others). When the Japanese adapted Chinese writing for representing their own language, they experimented with various ways of using Chinese characters (kanji) to represent Japanese words. One way was to use Chinese characters to represent Japanese words with roughly the same meaning but (of course) with quite different pronunciations: thus 人 'person' would be used to represent hito 'person' (Mandarin
ren) and 魚 'fish' would be used to represent sakana 'fish' (Mandarin yu). Characters used in this way to represent native Japanese words are said to have their kun ('instruction') reading. In some cases a sequence of characters with the appropriate interpretation would be used to represent a monomorphemic Japanese word: thus 何時 for itsu 'when' (Mandarin heshi), or 貴方 anata 'you' (Mandarin guifang). In these cases, termed jukujikun, the character sequence behaves in effect as a single complex character.
Right from the start, Chinese characters were also used phonetically to represent Japanese syllables, and in this usage the characters took on pronunciations corresponding (roughly) to their pronunciation in Chinese. Thus, 乃 (Mandarin nai) would be used to represent the Japanese syllable no (Sampson, 1985, page 175). Over time, a conventional set of these so-called man'yōgana, reduced in form, evolved into the two modern kana core syllabaries. But in early Japanese texts one found admixtures of Chinese characters functioning as phonetic elements along with characters to be read with a kun pronunciation.
To add to the complexity, Japanese borrowed not only Chinese writing, but also a large number of Chinese vocabulary items. Naturally these were written as they would be in Chinese, and they were also pronounced approximately as in Chinese. Significantly, such Sino-Japanese readings are called on, meaning 'sound' (Chinese 音 yin); the inherent 'sound' of a character is its Chinese pronunciation, and this is consistent with the use of the Chinese pronunciation in man'yōgana. As is well documented in Sampson's (1985) discussion, further complications arose due to the fact that Chinese vocabulary was borrowed into Japanese at different times and from different parts of China, resulting in various 'Chinese' pronunciations for many characters. It is not unusual for a character to have, in addition to a kun pronunciation, three or four distinct on pronunciations. Typically these different pronunciations are restricted to different words: thus 定 'definite' is pronounced tei in 定価 teika 'fixed price' (Mandarin dingjia), but as jō in 定連 jōren 'regular customer' (which would be dinglian in Mandarin if this were a Chinese word). Such facts guarantee that for most Chinese characters, the Japanese reader has little choice other than to memorize the association between lexical entries and their written form, with little or no useful recourse to phonological information: in other words, these characters are logographic.
The logographic nature of Chinese characters, as they are used in Japanese, is underscored in a different way by kokuji (literally 'domestic characters'), Chinese characters that were invented in Japan to represent Japanese words (Lehman and Faust, 1951; Coulmas, 1989; Daniels and Bright, 1996). A sample of these is given in Table 4.3. The most striking feature of kokuji is the overwhelming prevalence of semantic-semantic constructions, and the relatively small number of semantic-phonetic constructs; particularly rare are semantic-phonetic constructs involving native kun pronunciations. Alexander's compilation of kokuji (reported in (Lehman and Faust, 1951)) includes 249 examples, of which approximately 184 have a clear etymology and are not simply contractions of multicharacter expressions. (For instance, jinrikisha 'rickshaw' is clearly derived from the character sequence 人力車 jin riki sha (human power car).) Of these 184, 72% are semantic-semantic constructs. The remainder
are semantic-phonetic compounds, with 20% based on an on pronunciation, and the remaining 8% on a kun pronunciation. Some examples of each of these categories can be seen in Table 4.3.¹³ The prevalence of semantic-semantic formations among Japanese character innovations is striking in that it so strongly contrasts with the situation in Chinese: there, as we have already noted, semantic-phonetic constructs were overwhelmingly the preferred means of forming new characters. The Japanese situation also contrasts with another Chinese-based script, namely the Chữ Nôm writing system of Vietnam. Exclusively Vietnamese character innovations were found in Chữ Nôm, but these were apparently all semantic-phonetic constructions (Nguyen, 1959).
A couple of factors might seem to explain the low percentage of semantic-phonetic constructs in kokuji. Both of these explanations depend upon the observation that many of the kokuji were invented to write words that have only a kun pronunciation. However, as we shall see, neither of these explanations really works.
The first idea is that the most obvious source of the phonetic component for kun-only characters would be a character with the same or a similar kun pronunciation as the intended target. But since the 'sound' (on) of a Chinese character is its Sino-Japanese pronunciation (or pronunciations), using a kun pronunciation in this way might have been disfavored. However, as we have noted, 8% of the kokuji were formed in this way, so there cannot have been an absolute prohibition on using kun pronunciations in phonetic components. Furthermore, such a prohibition would not have ruled out the more widespread use of on pronunciations to represent both on pronunciations (20% of the cases) and kun pronunciations. Indeed, some instances of the latter type do occur, as in 鱚 ⟨FISH+KI⟩ kisu (kun) 'sillago' (Alexander's entry 230), where the phonetic component 喜 ki is an on reading.
The second potential explanation relates to the length of kun pronunciations. In Chinese, and also in Vietnamese, characters almost exclusively represent single syllables. Given the relatively simple syllable structures of these languages, there is a high degree of homophony. Thus in inventing a new character to represent a morpheme with a given pronunciation, there are usually many identically or similarly pronounced characters to choose from to act as a phonetic component. Kun pronunciations in Japanese, in contrast, are often polysyllabic (three-syllable native morphemes are not unusual), and therefore the degree of potential homophony is reduced. Again there may be a grain of truth to this explanation, but it cannot represent more than a tendency. First, polysyllabic homophones do exist in Japanese, and this fact is apparently made use of in forming some of the kokuji: see, for instance, 裄 ⟨CLOTHING+YUKI⟩ yuki 'sleeve length' in Table 4.3, which is homophonous with (among other things) yuki 'go' (written 行), a homophony exploited in forming this character. Secondly, there seems to be no general requirement that the homophony be particularly close. Thus the 214th entry in Alexander's (1951) list is 鯐 ⟨FISH+HASHI⟩ subashiri 'young of grey mullet', which is apparently derived using 走 hashi 'run' as a phonetic; here the target is
13 Note that the semantic elements used in kokuji do not always correspond to traditional semantic elements in Chinese: thus, for instance, 定 'definite' is not a traditional semantic element, though it is used as such in the character for shika 'clearly' in Table 4.3.
quite different in pronunciation (even allowing for the well-known /h/ ~ /b/ alternation in Japanese) from that of the phonetic component.
The only reasonable guess as to the high incidence of semantic-semantic kokuji, in my view, is that there was simply a preference among the users of the Japanese writing system for creating these kinds of 'visual puns'. This may in part be due to the fact that throughout much of history writing was an elite skill in Japan (as in much of the rest of the world) (Sampson, 1985), and the people who possessed that skill had time for what may be viewed as practically oriented language games. But the spread of literacy has by no means killed this kind of creativity: Alexander explicitly excludes from discussion more recent, widely known formations like ⟨FEMALE+UP+DOWN⟩ erebētā gāru 'elevator girl' precisely because they are puns and are not seriously considered part of the writing system. But the difference in kind between this example and the genuine kokuji 峠 ⟨MOUNTAIN+UP+DOWN⟩ touge 'mountain pass' is in fact minimal.
As a logographic system (or, more properly, a logographic subset of a writing system), semantic-semantic kokuji exemplify the creative limits of logography. But what do they tell us about the nature of Japanese writing?
And secondly, what do they tell us about the possibility of developing an entire writing system based on logography, something that Sampson (1985), it will be recalled, claims already exists in the case of Chinese?
In answer to the first question, as we noted in the introduction to this section, the amount of logography that Japanese readers must face is large, more than in any other modern writing system. But it is also clear that this percentage has been on the decline within the last century, as the use of the system moved out of the circle of literati into the general population; as Smith (1996, page 210) notes, the use of kanji in a wide variety of functions declined steadily throughout the 20th century. With the decrease in the use of kanji there has necessarily been a concomitant increase in the use of the phonologically based kana scripts. Japanese writing has always involved a mixture of logographic and phonetic elements; it is, and always has been, a 'mixed script', as Sampson (1985) terms it: one with a large logographic core, but where phonologically based devices are available, and widely and productively used. The mix has simply shifted more and more to the phonologically based methods.
Over and above this, one must make a clear distinction between the purely formal characterization of the script and how the script is actually used by fluent readers of Japanese. Large numbers of logographic elements clearly exist in Japanese, but recall that even logographic elements can be recoded so as to represent phonological elements directly, as we discussed in Section 4.2. Thus we would assume that the kokuji 鱈 tara 'cod' has a representation like that of Chinese 人 ren 'person' in (4.5), given in (4.6) below:
(4.6)
$$
\begin{bmatrix}
\textsc{phon} & \langle \mathrm{ta}, \mathrm{ra} \rangle_{\boxed{1}} \\
\textsc{synsem} & \begin{bmatrix} \textsc{cat} & \mathrm{n} \\ \textsc{sem} & \mathrm{cod}_{\boxed{2}} \end{bmatrix} \\
\textsc{orth} & \langle\, \text{鱈}_{\boxed{1},\,\boxed{2}} \,\rangle
\end{bmatrix}
$$
There are two kinds of evidence that this has happened in Japanese. First of all, there is psycholinguistic evidence from Horodeck (1987) and Matsunaga (1994) demonstrating that readers of Japanese access phonological representations when they read kanji; this evidence will be discussed in Section 5.2. Secondly, kanji (like characters in Chinese) may be used purely for their phonological value, ignoring their semantic value: in this usage they are called ateji. (Of course, this is precisely the way in which they were used in early Japanese man'yōgana.) An example is 珈琲 kōhī 'coffee' (Smith, 1996, page 210), where the component characters 珈 kō ('ornamental hatpin') and 琲 hī ('string of many pearls') are used purely for phonological reasons, the independent meanings of the characters being irrelevant. Thus we may assume a purely phonographic analysis for 珈琲, as in (4.7):
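(4.7)
$$
\begin{bmatrix}
\textsc{phon} & \begin{bmatrix} \textsc{syls} & \langle \mathrm{k\bar{o}}_{\boxed{1}},\ \mathrm{h\bar{\imath}}_{\boxed{2}} \rangle \end{bmatrix} \\
\textsc{synsem} & \begin{bmatrix} \textsc{cat} & \mathrm{n} \\ \textsc{sem} & \mathrm{coffee} \end{bmatrix} \\
\textsc{orth} & \langle\, \text{珈}_{\boxed{1}},\ \text{琲}_{\boxed{2}} \,\rangle
\end{bmatrix}
$$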
Ateji really involve exactly the same process by which one can write the English sentence I see you forgot that as ⟨i c u 4got that⟩: the difference is that specific ateji are an accepted standard part of Japanese writing.
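The phonographic use of symbols in ateji, and in the English rebus spelling just mentioned, can be sketched as a lookup that ignores the meaning of the chosen symbol. The following Python sketch is a hypothetical illustration, not part of the model's formalism: the function name `rebus_spell`, the syllabifications and both lookup tables are assumptions made for the example.

```python
def rebus_spell(words, table):
    """Spell a phonological form using sound-alike symbols.

    `words` is a list of words, each a list of syllables (an assumed,
    simplified phonological representation). Each syllable is replaced by
    the symbol that `table` lists for its sound, if any; the symbol's own
    meaning plays no role. Syllables of a word are written together and
    words are separated by spaces.
    """
    return " ".join(
        "".join(table.get(syllable, syllable) for syllable in word)
        for word in words
    )


# Hypothetical sound-to-symbol tables.
ENGLISH_REBUS = {"I": "i", "see": "c", "you": "u", "for": "4"}
ATEJI = {"kō": "珈", "hī": "琲"}  # kō 'ornamental hatpin', hī 'string of pearls'

print(rebus_spell([["I"], ["see"], ["you"], ["for", "got"], ["that"]], ENGLISH_REBUS))
print(rebus_spell([["kō", "hī"]], ATEJI))
```

The first call yields "i c u 4got that" and the second 珈琲. On this view, what sets ateji apart is only that their table entries are fixed by convention, whereas the English table is ad hoc.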
Turning to the second question, semantic-semantic characters in Japanese (and in Chinese also) certainly give some indication of what a purely logographic system might look like. Now, in their reply to Sampson (1994), DeFrancis and Unger (1994) argue against the possibility of a learnable purely logographic system by citing the case of military codes. In such codes (as distinct from ciphers), words are randomly substituted for each other, so that battleship might be transmitted as grapefruit and attack might be transmitted as fallacious. Such systems are indeed unlearnable (nobody, presumably, has sufficient memory), but the example is not entirely fair either: the system has no structure, which is of course why it is so effective for its intended purpose. Semantic-semantic characters provide what seems like a more reasonable model: there would be a limited set of primitives (in the case of Chinese and Japanese writing, the components of the characters), and there would be a calculus that defines how they are to be combined. Of course there would be no phonological cues to the learner: rather, the learner would need to learn to associate collections of purely semantic information with intended words or morphemes. It seems fair to guess that such a system
would be extremely difficult to design, which is presumably part of the reason such a system has never been designed. And it seems fair to guess that such a system would also be difficult to learn, though presumably not as difficult as a military code. For the latter reason alone, such a system would not serve the needs of a society in which reading is taken as a basic skill to be mastered as rapidly as possible by a large number of people: most people in a society have little time for complex linguistic games. To serve those needs, a writing system must have a significant phonographic component.
4.4 Some Further Examples
In this chapter we have given an overview of some of the kinds of linguistic information that may be encoded by orthographic elements. We have proposed a taxonomy of writing systems based on the kind of phonographic elements used in the system, and the amount of logography present in the system. However, unequivocally phonographic elements, and the kinds of semantically motivated logographic elements that we have considered, by no means exhaust the possible functions of orthographic devices. We close this chapter with three mildly esoteric kinds of functions: an orthographic plural marker in Syriac; reduplication markers; and cancellation signs. In each case we give a formal description in terms of our model.
4.4.1 Syriac syame
The Syriac syame is a pair of dots that marks plurality in nouns and adjectives in Syriac and some other Aramaic dialects (Daniels, 1996a, page 507). For example, the plural of ⟨mlkʾ⟩ /malkā/ 'king' is written ⟨mlk¨ʾ⟩ /malkē/ (with syame transliterated as '¨'). In unvocalized text, the syame are often the only mark of plurality, so one can plausibly analyze this device as logographic, in this case being licensed by the SYNSEM feature [+PL]. A representation for /malkē/ 'kings' is given in (4.8). (Ultimately, the syame will catenate with the orthographic expression of the word as a whole; therefore the letters in the word are grouped here, using '⟨ ⟩', separately from the syame.)
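(4.8)
$$
\begin{bmatrix}
\textsc{phon} & \langle \mathrm{m}_{\boxed{1}}\ \mathrm{a}\ \mathrm{l}_{\boxed{2}}\ \mathrm{k}_{\boxed{3}}\ \bar{\mathrm{e}}_{\boxed{4}} \rangle \\
\textsc{orth} & \big\langle\, \langle \mathrm{m}_{\boxed{1}}\ \mathrm{l}_{\boxed{2}}\ \mathrm{k}_{\boxed{3}}\ \text{ʾ}_{\boxed{4}} \rangle\ \ \langle \text{syame}_{\boxed{5}} \rangle \,\big\rangle \\
\textsc{synsem} & \begin{bmatrix} \textsc{cat} & \mathrm{n} \\ \textsc{pl} & +_{\boxed{5}} \end{bmatrix}
\end{bmatrix}
$$
Since the PL attribute overlaps with the phonological information in the word, the syame, as the orthographic representation of [+PL], catenates (more specifically, downwards catenates) with the orthographic representation of the phonological information in the word, in this case the letters ⟨m⟩, ⟨l⟩, ⟨k⟩ and ⟨ʾ⟩.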
The exact placement of the syame above the word depends upon the particular letters in the word. If ⟨r⟩ is present, the syame are attracted to it and are written
as a second dot over the ⟨r⟩. Otherwise the syame are preferably written near the end of the word, avoiding the letters ⟨l⟩ and ⟨ʾ⟩, which have long ascenders.
The categorical requirements that the syame appear above the word, and that they appear above ⟨r⟩ if there is one, are captured by the rules in (4.9) and (4.10). The first places the syame, as the graphical expression of [+PL], above the graphical expression γ(Λ) of the noun or adjective Λ. The second reassociates the syame to the position of the ⟨r⟩ in the word, if there is one; note that if more than one ⟨r⟩ is present, the syame can occur on either one. In (4.10), '·' denotes catenation. Other details of syame placement we assume to be stylistic.
(4.9) $\gamma(\Lambda[+\textsc{pl}]) \rightarrow \mathrm{above}(\text{syame},\ \gamma(\Lambda))$
(4.10) Let $\gamma(\Lambda)_{1\ldots i-1}$ denote the first $i-1$ letters of $\gamma(\Lambda)$, $\gamma(\Lambda)_i$ the $i$th letter, and $\gamma(\Lambda)_{i+1\ldots n}$ the remaining letters. If $\gamma(\Lambda)_i$ is ⟨r⟩, then:
$$ \mathrm{above}(\text{syame},\ \gamma(\Lambda)) \rightarrow \gamma(\Lambda)_{1\ldots i-1} \cdot \mathrm{above}(\text{syame},\ \gamma(\Lambda)_i) \cdot \gamma(\Lambda)_{i+1\ldots n} $$
Note that we would say that in Syriac the SLU is the word.
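The categorical rules (4.9) and (4.10), together with the stylistic preference described above, can be approximated procedurally. This Python sketch works on transliterated letters rather than Syriac script; the function names, the choice of the first ⟨r⟩ when there are several, and the encoding of the syame as a combining diaeresis are all assumptions for illustration.

```python
# Letters the syame should avoid because of their long ascenders
# (transliterated lamadh and alaph), per the description above.
ASCENDERS = {"l", "\u02be"}  # "\u02be" is ʾ (alaph)
SYAME = "\u0308"             # combining diaeresis, standing in for the syame


def syame_host(letters):
    """Return the index of the letter that carries the syame.

    Categorical part, cf. rule (4.10): if the word contains ⟨r⟩, the syame
    goes over an ⟨r⟩ (either one is allowed; this sketch picks the first).
    Stylistic part (an assumed preference): otherwise choose the letter
    nearest the end of the word that is not an ascender, falling back to
    the last letter if every letter is an ascender.
    """
    if "r" in letters:
        return letters.index("r")
    for i in range(len(letters) - 1, -1, -1):
        if letters[i] not in ASCENDERS:
            return i
    return len(letters) - 1


def place_syame(letters):
    """Write the transliterated word with the syame after its host letter."""
    host = syame_host(letters)
    return "".join(
        letter + (SYAME if i == host else "") for i, letter in enumerate(letters)
    )


print(place_syame(["m", "l", "k", "\u02be"]))  # mlkʾ with the syame over k
```

Keeping the categorical ⟨r⟩ rule separate from the fallback mirrors the distinction in the text between what rules (4.9) and (4.10) require and what is merely stylistic.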
4.4.2 Reduplication markers
A number of writing systems, including earlier forms of Malay and Bahasa Indonesia as well as Khmer, have markers that indicate repetition of preceding material. Khmer, for instance, has a sign that marks the repetition of the preceding word or word group (Schiller, 1996, page 472). (In Malay and Bahasa Indonesia a raised '2' was used.) Such signs might appear to constitute a counterexample to regularity: in order to identify where such signs can be written, the mapping $\mathcal{M}_{\mathrm{ORL}\rightarrow\Gamma}$ would have to identify copied stretches of linguistic material, something that cannot be handled by finite-state devices for unbounded copy lengths. However, as far as I have been able to ascertain, these devices are not used to mark arbitrary copies of surface strings, but rather only copies that arise from some form of morphological reduplication. We can reasonably assume that the lexical representation of a reduplicated form indicates that a given stretch of linguistic material has been copied: for example, standard autosegmental analyses of reduplication (Marantz, 1982, and much subsequent work) assume that the base that is copied is affixed or compounded with a morpheme that is lexically phonologically empty, but which derives its surface phonological material by copying from the base. Clearly such reduplicating morphemes must be marked as such in the lexical representation of constructs that contain them, and we need only assume that this information is indicated as part of the representation at the ORL.
For example, let's say that we have a form like oraŋ oraŋ, where we will assume for the sake of argument that the first portion is the base, and the second is the copy. The information about which is the base and which the copy is known to the morphology, and presumably could be lexically marked as such. For instance, one might imagine a representation such as that in (4.11), where the copy is marked with labeled brackets:
(4.11) $\text{oraŋ}\ [_{\textsc{copy}}\ \text{oraŋ}\ ]_{\textsc{copy}}$
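Assume that the writing system in question marks reduplicated constituents using a dedicated reduplication symbol, written here as ⟨R⟩. Then one can write a rule that simply states that the image under $\mathcal{M}_{\mathrm{ORL}\rightarrow\Gamma}$ of any span Λ bracketed by $[_{\textsc{copy}}$ and $]_{\textsc{copy}}$ is simply ⟨R⟩:
(4.12) $\mathcal{M}_{\mathrm{ORL}\rightarrow\Gamma}\big([_{\textsc{copy}}\ \Lambda\ ]_{\textsc{copy}}\big) = \langle R \rangle$
Thus, assuming a spelling of ⟨orang⟩ for the base, the spelling of $\text{oraŋ}\ [_{\textsc{copy}}\ \text{oraŋ}\ ]_{\textsc{copy}}$ would become ⟨orang R⟩.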
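The point of rule (4.12) is that the copy is identified by lexical brackets, not by comparing strings. This Python sketch makes that concrete; the bracket tokens "[copy" and "]copy", the function names and the toy ŋ-to-⟨ng⟩ spelling rule are hypothetical, and the Malay-style raised 2 is used as the sign.

```python
def spell_with_reduplication(tokens, spell_base, redup_sign):
    """Spell a lexical representation in which reduplicated copies are
    bracketed, in the manner of (4.11).

    `tokens` is a flat list of morph strings and the hypothetical bracket
    tokens "[copy" and "]copy". Material outside the brackets is spelled
    with `spell_base`; a bracketed copy of any length is spelled as the
    single `redup_sign`, as in rule (4.12). No string comparison is needed,
    which is why (for non-nested brackets) a finite-state transducer could
    perform the same mapping.
    """
    out = []
    inside_copy = False
    for tok in tokens:
        if tok == "[copy":
            inside_copy = True
            out.append(redup_sign)
        elif tok == "]copy":
            inside_copy = False
        elif not inside_copy:
            out.append(spell_base(tok))
    return "".join(out)


def malay_spelling(morph):
    """Assumed toy spelling rule: write the velar nasal ŋ as <ng>."""
    return morph.replace("ŋ", "ng")


print(spell_with_reduplication(["oraŋ", "[copy", "oraŋ", "]copy"], malay_spelling, "²"))
```

This prints "orang²". Because the bracketed material is simply discarded, the length of the copy never matters, which is how the analysis keeps reduplication markers within the regular relations.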
4.4.3 Cancellation signs
A number of scripts have cancellation signs, used to mark symbols that are not pronounced. For example, Syriac marks letters that are not pronounced in some dialects with a diagonal bar (called mbaṭlana) under the letter (George Kiraz, personal communication). Similarly, Thai (Diller, 1996) uses a cancellation sign to indicate letters that are not to be pronounced, mostly in words derived from Sanskrit, which are spelled etymologically in Modern Thai orthography.
Cancellation signs thus mark graphemes that are not licensed by any linguistic material: more formally, they mark graphemes σ where the image of σ under the inverse of $\mathcal{M}_{\mathrm{ORL}\rightarrow\Gamma}$ (which we will denote here as $\mathcal{M}^{-1}$) is empty: $\mathcal{M}^{-1}(\sigma) = \emptyset$. Thus, given a cancellation sign ξ, we want to rewrite σ as ξ(σ) just in case $\mathcal{M}^{-1}(\sigma) = \emptyset$:
(4.13) For an orthographic symbol σ and cancellation sign ξ, if $\mathcal{M}^{-1}(\sigma) = \emptyset$, then $\sigma \rightarrow \xi(\sigma)$.
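Rule (4.13) can be sketched as a pass over an aligned spelling. In this Python sketch the licensing of each grapheme, standing in for its inverse image under the spelling mapping, is simply given as input (computing that inverse is not shown); the function name and licensing labels are hypothetical, and the Thai example uses the cancellation sign U+0E4C.

```python
THANTHAKHAT = "\u0e4c"  # Thai cancellation sign, encoded after its letter


def mark_cancellations(graphemes, sign=THANTHAKHAT):
    """Apply rule (4.13) to a spelling.

    `graphemes` is a list of (grapheme, licensing) pairs, where `licensing`
    holds hypothetical labels for the linguistic material that licenses the
    grapheme. A grapheme with empty licensing is rewritten as sign(grapheme);
    since Unicode encodes Thai marks after their base letter, that is the
    grapheme followed by the sign.
    """
    out = []
    for grapheme, licensing in graphemes:
        out.append(grapheme if licensing else grapheme + sign)
    return "".join(out)


# เสาร์ 'Saturday': the final ร licenses nothing, so it gets the sign.
SAO = [("เ", ("aw",)), ("ส", ("s",)), ("า", ("aw",)), ("ร", ())]
print(mark_cancellations(SAO))  # prints เสาร์
```

Only the emptiness of the licensing matters here, which matches the formulation of (4.13): the sign is conditioned on the absence of any linguistic material, not on what the grapheme would otherwise represent.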
Analysis | Pronunciation | Gloss
SURROUND+LINGWU | lingyu | 'imprisoned'
SURROUND+WULUN | hulun | 'swallow whole'
CART+LIAOGE | jiuge | 'entwined'
CAVE+YOUTIAO | yaotiao | 'graceful'
DEMON+WANGLIANG | wangliang | 'roaming ghost'
FEMALE+ZHOULI | zhouli | 'sisters-in-law'
FOOD+KUNTUN | huntun | 'wonton'
FOOT+CUOTUO | cuotuo | 'procrastinate'
FOOT+LANGQIANG | langqiang | 'hobble'
FOOT+ROULIN | roulin | 'trample'
FOOT+CHOUZHU | chouchu | 'hesitate'
FOOT+ZHISHU | zhizhu | 'hesitate'
GAS+YINYUN | yinyun | 'misty atmosphere'
GOING+XIEHOU | xiehou | 'encounter'
GOING+YILI | yili | 'trailing'
GRASS+BOQI | biqi | 'water chestnut'
GRASS+GUAJU | woju | 'lettuce'
GRASS+HANXIAN | handan | 'lotus'
GRASS+JIANJIA | jianjia | 'type of reed'
GRASS+MUSU | musu | 'clover'
HAND+YEYU | yeyu | 'tease'
HEAD+MANHAN | manhan | 'muddleheaded'
HEART+CONGYONG | songyong | 'egg on'
HEART+NIUNI | niuni | 'coy'
HEART+YINQIN | yinqin | 'attentively'
INSECT+BIANFU | bianfu | 'bat'
INSECT+FUYOU | fuyou | 'mayfly'
INSECT+QIUYIN | qiuyin | 'earthworm'
JADE+CUICAN | cuican | 'brilliant'
JADE+DAIMAO | daimao | 'tortoiseshell'
LEATHER+QIUQIAN | qiuqian | 'swing'
OLD+MAOZHI | maodie | 'old people'
OVERHANGING+YINI | yini | 'fluttering'
PERSON+KONGZONG | kongzong | 'busy'
SICKNESS+GEDA | geda | 'cyst, boil'
STEP+PANGHUANG | panghuang | 'roam aimlessly'
STEP+SHANGYANG | changyang | 'roam leisurely'
TEETH+JUWU | juyu | 'bickering'
TREE+PIBA | pipa | 'loquat'
TREE+NINGMENG | ningmeng | 'lemon'
WINE+MINGDING | mingding | 'drunk'
WINE+TIHU | tihu | 'clear wine, butterfat'
WRAP+PUFU | pufu | 'crawl'
Table 4.1: Disyllabic morphemes collected from the ROCLING corpus (10 million characters) and 10 million characters of the United Informatics corpus. This set consists of pairs of characters occurring at least twice, where each member of the pair only cooccurs with the other.
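The selection criterion in the caption of Table 4.1 can be approximated as follows. This Python sketch is not the procedure actually used to build the table: treating "cooccurs" as immediate adjacency within a text, the function name `exclusive_pairs` and the toy corpus are all assumptions.

```python
from collections import Counter


def exclusive_pairs(texts, min_count=2):
    """Find adjacent character pairs xy that occur at least `min_count`
    times and whose members occur only in that pair: every x is followed
    by y and every y is preceded by x (assumed reading of the criterion).
    """
    unigrams = Counter()
    bigrams = Counter()
    for text in texts:
        unigrams.update(text)
        bigrams.update(text[i:i + 2] for i in range(len(text) - 1))
    return sorted(
        pair
        for pair, n in bigrams.items()
        if n >= min_count and unigrams[pair[0]] == n and unigrams[pair[1]] == n
    )


corpus = ["枇杷很甜", "我買了枇杷", "很好"]  # toy corpus, not ROCLING
print(exclusive_pairs(corpus))
```

This prints ['枇杷']: 枇 and 杷 each occur twice, always together, whereas 很 occurs with two different neighbors and so fails the exclusivity test.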
Analysis | Pronunciation | Gloss
BIRD+YUANYANG | yuanyang | 'mandarin duck'
DOG+JIAOHUA | jiaohua | 'cunning'
GRASS+FANSHU | fanshu | 'yam'
GRASS+HULU | hulu | 'gourd'
GRASS+LUOFU | luobo | 'daikon'
GRASS+PUTAO | putao | 'grape'
HEART+GUANGHU | huanghu | 'illusionarily'
HEART+KANGJI | kangkai | 'generous'
INSECT+HUDIE | hudie | 'butterfly'
INSECT+MAYI | mayi | 'ant'
INSECT+PANGXIE | pangxie | 'crab'
INSECT+ZHANGLANG | zhanglang | 'cockroach'
JADE+HUBO | hupo | 'amber'
JADE+LINLANG | linlang | 'kind of jade'
JADE+PILI | boli | 'glass'
LAME+JIANJIE | ganga | 'awkward'
MOUTH+PAOXIAO | paoxiao | 'roar'
MOUTH+HOULONG | houlong | 'throat'
MOUTH+HAISU | kesou | 'cough'
MOUTH+JUJUE | jujue | 'chew'
MOUTH+JIAFEI | kafei | 'coffee'
MOUTH+LABA | laba | 'trumpet, speaker'
MOUTH+XIXU | xixu | 'sniffling'
PERSON+GUILEI | kuilei | 'puppet'
PERSON+KANGLI | kangli | 'couple'
ROOF+YUZHOU | yuzhou | 'universe'
SHELL+YOULV | huiluo | 'bribe'
STEP+FEIHUI | paihui | 'going to and fro'
TREE+BINLANG | binlang | 'betel nut'
TREE+GANLAN | ganlan | 'olive'
WINE+YUNXIANG | yunniang | 'brewing (i.e. trouble...)'
Table 4.2: Further disyllabic morphemes collected from the ROCLING corpus (10 million characters) and 10 million characters of the United Informatics corpus. This set consists of other pairs of characters that do not exclusively occur with each other, but where there is nonetheless a high mutual information for the pair. Note that LV indicates that the phonetic component in question occurs 9 out of 38 times in characters pronounced with initial /l/ followed by some vowel.
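The association score behind Table 4.2 (see footnote 12) can be computed directly from counts. This is a minimal Python sketch, assuming a single unsegmented string as the corpus and using the corpus size N for the pair probability as well; the function name is hypothetical.

```python
import math
from collections import Counter


def pair_mutual_information(text):
    """Mutual information of each adjacent character pair xy in `text`:
    I(x;y) = log2(p(x,y) / (p(x) p(y))), with maximum likelihood estimates
    p(x) = f(x)/N and p(x,y) = f(x,y)/N, where N is the corpus size.
    """
    n = len(text)
    unigrams = Counter(text)
    bigrams = Counter(text[i:i + 2] for i in range(n - 1))
    return {
        pair: math.log2((f_xy / n) / ((unigrams[pair[0]] / n) * (unigrams[pair[1]] / n)))
        for pair, f_xy in bigrams.items()
    }


scores = pair_mutual_information("abab")
print(scores)
```

For "abab", the pair "ab" scores log2(0.5 / 0.25) = 1 and "ba" scores 0. A pair whose members never occur apart, as in Table 4.1, has f(x) = f(y) = f(x,y), giving log2(N / f(x,y)), the highest score possible at that frequency; high mutual information thus relaxes the exclusivity criterion of Table 4.1.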
Alex. # | Kokuji | Analysis | Phonetic | Kun | On | Gloss
10 | 働 | PERSON+MOVE | | hataraki | dō | 'effort'
12 | 凪 | WIND+STOP | | nagi | | 'lull, calm'
33 | 峠 | MOUNTAIN+UP+DOWN | | touge | | 'mountain pass'
37 | 怺 | HEART+FOREVER | | kore | | 'endure'
74 | 毟 | FEW+HAIR | | mushi | | 'pluck'
124 | 聢 | EAR+CERTAIN | | shika | | 'clearly'
160 | 躾 | BODY+BEAUTIFUL | | shitsuke | | 'upbringing'
198 | 颪 | DOWN+WIND | | oroshi | | 'mountain wind'
240 | 鴫 | FIELD+BIRD | | shigi | | 'snipe'
249 | 嬶 | FEMALE+NOSE | | kaka | | 'wife'
138 | 蓙 | GRASS+ZA | 座 za (on) | | goza | 'matting'
51 | 柾 | TREE+MASA | 正 masa (kun) | masa | | 'straight grain'
147 | 裄 | CLOTHING+YUKI | 行 yuki (kun) | yuki | | 'sleeve length'
Table 4.3: A sample of Japanese kokuji (second column), with their componential analysis (third column). The first column is the entry number in Alexander's list (Lehman and Faust, 1951). The fourth column lists the phonetic, if any. The fifth column lists the kun pronunciation, and the sixth column the on pronunciation, if any: in one case, goza, there is no kun pronunciation. The last three kokuji shown are formed as semantic-phonetic constructs, with the last two being based on the kun pronunciation; note that the phonetic component of masa also means 'straight', so it is possible that this one is also a semantic-semantic construct.
Chapter 5
Psycholinguistic Evidence
There is to date a large literature on the psycholinguistics of reading and writing, which deals with the question of how humans extract linguistic information from written text, and how they compose written text given a mental linguistic representation. Some useful general collections include (Frost and Katz, 1992; de Gelder and Morais, 1995; Perfetti, Rieben, and Fayol, 1997) and (Balota, Flores d'Arcais, and Rayner, 1990); there has also been a large amount of work on reading and writing Chinese script, including the papers collected in (Chen and Tzeng, 1992) and (Wang, Inhoff, and Chen, 1999).
Since we are proposing a computational model of writing systems and their relation to linguistic structure, it makes sense to ask what 'psychological reality' there is in the model that we have proposed. It is not my intention here to review the various psycholinguistic models, many of them mutually inconsistent, that have been proposed. Rather, the approach taken will be to examine the model and see if there is any support in the psycholinguistic literature for some of the properties of the model.
Clearly such an approach requires caution. What do we mean by 'psychologically realistic'? This is a term which unfortunately has been much abused in the history of linguistics and computational linguistics. In the context of the present discussion we need to be rather precise about the level of granularity at which we would want to investigate the psychological reality of our model of writing systems. For example, there are specific computational devices (finite-state transducers in particular) that have been proposed as plausible computational mechanisms for mapping between written form and linguistic representation: it seems unlikely that these specific devices are plausible models of what goes on inside a reader's (or writer's) head.
On the other hand, there are more macroscopic properties of the model that do make sense to compare with the results of psycholinguistic research. I will focus on two such properties here:
• Architectural Uniformity: the same model of the relation between orthography and linguistic form is proposed for all writing systems.
• Dual Routes: the model makes a distinction between spelling rules and the lexical specifications, possibly including marked orthographic information, that these rules operate on. It is assumed that in normal reading orthographic representations are mapped to lexical elements (i.e., the ORLs of morphemes, words and phrases), and thence to pronunciations. However, in most writing systems, for most words, only partial lexical orthographic specifications are required, the bulk of the spelling being predictable by spelling rules. Inverted, these spelling rules can serve as rules for inferring an ORL representation from spelling; if one then composes these inverted spelling rules with whatever rules or principles of the language predict the actual pronunciation from the ORL, one can derive pronunciations for spelled words without actually 'consulting' the lexicon. Thus we have an additional rule-based path to pronunciation that bypasses the lexicon.
One of the main topics in the literature on reading has been the question of the number of routes by which a reader can get from a written word to a phonological representation. As we describe in more detail below, the most common assumption in the literature is that there are at least two such routes, which may be characterized broadly as via the lexicon or via 'grapheme-to-phoneme correspondences'.
Within this framework, one question that has received a great deal of attention is whether writing systems differ in which of these two routes is taken. The so-called Orthographic Depth Hypothesis (henceforth ODH) claims, in its strongest form, that some languages (the so-called orthographically 'deep' languages, of which English is an oft-cited example) require readers to go via the lexicon, whereas orthographically 'shallow' languages (Serbo-Croatian is supposedly such a case) only make use of the grapheme-to-phoneme route. One might be tempted to equate this notion of orthographic depth with the notion of the depth of the ORL, discussed in Chapter 3. But there is a crucial difference: we claim that languages differ in the depth of their orthography, and in the regularity of their 'grapheme-phoneme' correspondences, but not in the manner in which one maps from orthography to linguistic representation, or ultimately to pronunciation.
It seems that one can draw two conclusions from the literature (though it would be disingenuous to suggest that there is anything like a consensus on these points). Both of these conclusions are consistent with the properties of the model that we outlined above:
• Multiple routes from written form to pronunciation are available.
• The ODH, at least in its strongest form, is incorrect: all writing systems can be shown to make use of both a 'lexical' and a 'phonological' (i.e., rule-based) route.
The remainder of this chapter is organized as follows. In Section 5.1 we will outline the evidence for multiple routes, discuss the ODH, and give some of the evidence that has been presented both in support of and against this hypothesis. Section 5.2 continues this discussion with some evidence from Chinese and Japanese (two writing systems that would appear on the face of it to be unequivocally 'deep'), showing that even there one finds evidence of 'shallow' processing. Finally, not
all psycholinguists support the hypothesis of multiple routes, and the most vocal advocates of an alternative single-route approach have been the connectionists. A somewhat dated, though still influential, work in this mold is (Seidenberg and McClelland, 1989). Section 5.3 gives a brief critique of this work.
5.1 Multiple Routes and the Orthographic Depth Hypothesis
Two kinds of experiments figure prominently in the psycholinguistic work on reading. One involves lexical decision, and the other naming.
In a lexical decision paradigm, subjects are presented with a written stimulus (usually on a CRT screen), and are asked to answer (e.g., by pressing a button on a keyboard) whether or not the stimulus in question is a word of their language. Their reaction time is measured, as is the correctness of their responses.
In the naming paradigm, subjects are again presented with a written stimulus, but this time they are asked to pronounce the stimulus aloud, that is, to 'name' the word that is on the screen. In this case what is normally measured is the time between the presentation of the stimulus and the onset of vocalization.
The ODH has implications both for naming and for lexical decision, but it is perhaps easiest to illustrate the idea behind the hypothesis in the context of a model of naming. One such model is presented schematically in Figure 5.1; this model is adapted, with simplifications, from (Besner and Smith, 1992, Figure 1), and the presentation of the ODH draws heavily on their discussion of this topic.
The model in Figure 5.1 allows for three routes to naming. The simplest is labeled 'A' in the figure and involves the application of 'grapheme-to-phoneme' rules. In that scheme, input text is fed into a block of rules, and a phonological representation is derived solely via that block of rules. Crucially, there is no lexical access involved. To take an example, the string ⟨peat⟩ in English can be pronounced by applying the rules ⟨p⟩ → p, ⟨ea⟩ → i, and ⟨t⟩ → t, deriving the pronunciation /pit/. Since this route involves assembling a phonological representation on the fly, it is often termed the assembled route.
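Route A can be sketched as a greedy longest-match scan over a rule table, with no lexicon consulted anywhere in the loop. This is only a toy illustration under stated assumptions. The three rules for ⟨peat⟩ come from the text. The rules for ⟨e⟩ and ⟨a⟩ are hypothetical additions, included to show why the longer grapheme ⟨ea⟩ has to be tried first. The function name is likewise illustrative.

```python
# Route A (assembled route): grapheme-to-phoneme rules only, no lexical access.
# Only the <p>, <ea>, <t> rules come from the text; <e> and <a> are hypothetical.
RULES = {"p": "p", "ea": "i", "t": "t", "e": "ɛ", "a": "æ"}
MAX_LEN = max(len(g) for g in RULES)


def assemble(spelling: str) -> str:
    """Build a phonemic transcription by greedy longest-match rule application."""
    phones = []
    i = 0
    while i < len(spelling):
        # Try longer graphemes first, so <ea> is preferred over <e> followed by <a>.
        for size in range(min(MAX_LEN, len(spelling) - i), 0, -1):
            grapheme = spelling[i:i + size]
            if grapheme in RULES:
                phones.append(RULES[grapheme])
                i += size
                break
        else:
            # No rule covers this position: the assembled route simply fails.
            raise ValueError(f"no rule for {spelling[i]!r} in {spelling!r}")
    return "/" + "".join(phones) + "/"


print(assemble("peat"))  # /pit/
print(assemble("teap"))  # /tip/ (a non-word, handled by the same rules)
print(assemble("pet"))   # /pɛt/
```

Because nothing is looked up, the function names the non-word ⟨teap⟩ as readily as the word ⟨peat⟩. That property is what makes route A the only candidate for naming non-words.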
The second and third routes do involve lexical access, to varying degrees. The route labeled 'B–D' involves the so-called orthographic input lexicon, which stores words in their orthographic forms, presumably with associated phonological information; it corresponds almost exactly to the orthographic lexical entry in the ORL in our model. Naming via route 'B–D' thus involves lexical access, but of a fairly shallow kind, in that only the formal properties of the word are addressed. Under this scheme ⟨peat⟩ would be pronounced by matching the string ⟨p⟩, ⟨e⟩, ⟨a⟩, ⟨t⟩ against the lexical entry for peat in the orthographic input lexicon, and retrieving the stored pronunciation /pit/.
The third route, 'C–D', is the deepest. It too involves the orthographic input lexicon, but it also involves accessing the meaning of the word. In this case, semantic attributes of the lexical entry of peat would be accessed, and from there one would derive a pronunciation for the word associated with that set of attributes.¹ In normal readers under normal conditions, accessing the semantics should not in principle yield a different result from the 'B–D' route. Different results may be obtained, however, in neurologically impaired patients, as we shall see momentarily. Routes involving lexical access derive pronunciations for written words by addressing a lexical representation, and hence are often termed addressed routes.

[Figure 5.1: boxes labeled TEXT, Orthographic Input Lexicon, Grapheme to Phoneme Correspondences, Semantic System, Phonological Output Lexicon, Phonemic Buffer and SPEECH, linked by routes labeled A, B, C and D.]
Figure 5.1: A model of reading a word aloud, simplified from (Besner and Smith, 1992).
As Besner and Smith note (page 47), there is evidence for the existence of each of these routes in readers of deep orthographies, like that of English. Some of the most compelling evidence comes from patients with various kinds of brain lesions that impair their ability to read aloud in various ways. Specifically:
• One class of patients finds it easier to name words whose spellings are 'more regular' given their pronunciations. For example, cave follows the rules of English spelling better than have does, and such patients find it easier to correctly name cave than have. Plausibly, such patients have been damaged in such a way that the grapheme-to-phoneme rule path A is the only one left open to them.
• At the other extreme, some patients make semantic errors when asked to name: for ⟨tulip⟩ they may answer crocus, for example. A reasonable explanation is that for these patients the semantic access route C–D has become favored (and this only imperfectly).
• In the middle are patients who have no particular problems naming ordinary words (either have or cave), and don't tend to make semantic errors. Yet they are impaired in that they are unable to read non-words. This suggests that they are using neither a grapheme-to-phoneme strategy (route A) nor a semantic strategy (route C–D). Rather, they are forced by their impairment into route B–D. This correctly predicts that they will be able to read words that are already in the lexicon, but not novel words.
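The three patient profiles can be mimicked by 'lesioning' a toy multiple-route reader, that is, by removing all but one route. The sketch below is only an illustration, and most of its pieces are hypothetical: the lexicon, the meaning labels, the single spelling rule, and all function names. The damaged semantic route is modeled crudely: the reader names the first word, alphabetically, that shares the target's meaning class.

```python
import re
from typing import Callable, Optional

# Toy data, entirely hypothetical: spellings, transcriptions and meaning labels.
ORTHOGRAPHIC_LEXICON = {"cave": "keɪv", "have": "hæv", "tulip": "tulɪp", "crocus": "kroʊkəs"}
MEANING = {"cave": "HOLLOW", "have": "POSSESS", "tulip": "FLOWER", "crocus": "FLOWER"}
CONSONANTS = {"c": "k", "h": "h", "t": "t", "v": "v", "p": "p", "k": "k"}

Route = Callable[[str], Optional[str]]


def route_a(word: str) -> Optional[str]:
    """Assembled route: one 'silent e' rule, C a C e -> C eɪ C; no lexicon involved."""
    m = re.fullmatch(r"([a-z])a([a-z])e", word)
    if m and m.group(1) in CONSONANTS and m.group(2) in CONSONANTS:
        return CONSONANTS[m.group(1)] + "eɪ" + CONSONANTS[m.group(2)]
    return None


def route_bd(word: str) -> Optional[str]:
    """Addressed route B-D: retrieve the stored pronunciation of a known spelling."""
    return ORTHOGRAPHIC_LEXICON.get(word)


def route_cd_damaged(word: str) -> Optional[str]:
    """Damaged semantic route C-D: only the meaning class survives, so the reader
    names the alphabetically first word in that class (a crude model of semantic errors)."""
    meaning = MEANING.get(word)
    if meaning is None:
        return None
    first = sorted(w for w, m in MEANING.items() if m == meaning)[0]
    return ORTHOGRAPHIC_LEXICON[first]


def name(word: str, routes: list[Route]) -> Optional[str]:
    """Try the surviving routes in order and return the first pronunciation produced."""
    for route in routes:
        pronunciation = route(word)
        if pronunciation is not None:
            return pronunciation
    return None


for word in ["cave", "have", "tave", "tulip"]:
    print(word,
          name(word, [route_a]),           # only route A spared
          name(word, [route_bd]),          # only route B-D spared
          name(word, [route_cd_damaged]))  # only a damaged route C-D left
```

With only route A, have is regularized to rhyme with cave, and the non-word tave is still named. With only route B–D, both real words come out correctly, but tave receives no pronunciation. The damaged C–D route names tulip as crocus. Together these reproduce the three patterns described above.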
We turn now to the ODH. Two flavors of this hypothesis have been proposed in the literature, the strong form and the weak form. The strong ODH can be stated as follows:
(5.1) Orthographic depth hypothesis (strong form):
Readers of languages that have completely regular grapheme-phoneme correspondences lack an orthographic input lexicon.
In other words, route A is the only route available to such readers. In the literature on the ODH, the most often cited instance of a shallow orthography is probably Serbo-Croatian.²
¹ This third, semantic, route is the one that has no direct correspondent in our model: it would of course be easy enough to add an additional layer of semantic processing whereby lexical entries at the ORL map to a semantic representation, and thence back to phonological entries.
² Note, however, that Serbo-Croatian orthography does not mark lexical accent, which is determined by (unwritten) lexical properties of the word much as in the case of Russian stress (Section 1.2.1) (Seidenberg, 1990, pages 50–51; Wayles Browne, personal communication). This is the case whether we are talking about Croatian (written in the Roman alphabet) or Serbian (written in the Cyrillic alphabet). Thus it is by no means possible to predict every aspect of the pronunciation of a word in Serbo-Croatian. This differs from the case of Spanish, where almost without exception one can predict the pronunciation of a written form without consideration of what lexical form it may represent.

A (significantly) weaker version of the hypothesis, one supported, for example, by Katz and Frost (1992), states that all written languages allow for both a grapheme-to-phoneme correspondence route (route A) and a lexical access route (routes B–D, or perhaps C–D), but that the cost of each route relates directly to the type of orthography (deep or shallow) involved. In shallow orthographies, the grapheme-to-phoneme route is usually cheaper in naming, though there may be instances in which lexical access is involved. Conversely, in a deep orthography, lexical access will typically be cheaper to use in naming, though there will be instances where the grapheme-to-phoneme route might be used.

Insofar as it makes a far stronger claim about the mental process of reading, the strong ODH is more interesting than the weak ODH. We shall therefore start by outlining the evidence that has been marshalled both for and against this version of the hypothesis. Our conclusion will be that the evidence against the strong ODH seems on the whole more compelling than the evidence for it, and that there therefore seems to be no reason to accept that readers of different orthographies have fundamentally different mental architectures. Rather, the evidence seems more consistent with a model (possibly like the weak ODH) where readers of all orthographies use fundamentally the same model, though of course differences among orthographies will inevitably lead to differences in how the mental resources are allocated.

5.1.1 Evidence for the Orthographic Depth Hypothesis

According to the strong ODH, the processing of shallow orthographies in naming involves pathway A in Figure 5.1. Thus, it bypasses both of the lexical pathways B–D and C–D. This would appear to make the rather clear prediction that readers of shallow orthographies should fail to show effects of lexical access in naming tasks. In contrast, readers of deep orthographies should show such effects, since in general pathway A is not sufficient to correctly name written forms, and one of the lexical routes must be used.

Two widely reported lexical effects are the effect of word frequency and lexical priming. The lexical frequency effect relates the frequency of particular lexical items to the speed with which they can be retrieved from the lexicon: other things being equal, more frequent words are retrieved more quickly. The lexical priming effect relates the speed with which a word will be retrieved to the presence of a semantically related word: if the word couch has been used in a previous context, semantically related sofa will be retrieved faster than if a semantically related word had not been used. The speed of lexical retrieval is often measured using a lexical decision paradigm, and in this paradigm both priming and lexical frequency effects have been demonstrated, in languages that have deep orthographies as well as in those that have shallow orthographies (Besner and Smith, 1992, page 50).

Given these observations, it would appear to be strong confirmation of the ODH that priming and word frequency effects were not observed in naming tasks for Serbo-Croatian, a language with a supposedly shallow orthography (Katz and Feldman, 1983; Frost, Katz, and Bentin, 1987). In these experiments, subjects were asked to name both real words and plausible non-words; the expected priming and frequency effects did not obtain for the real-word stimuli. In contrast, readers of deep orthographies, like that of English, do show these lexical access effects in similarly constructed naming tasks (Besner and Smith, 1992).
Still, Besner and Smith observe (page 50):
. . . in contrast to the large number of papers showing priming and frequency effects in deep orthographies, the attempt to prove the null hypothesis of no priming and no frequency effects in the oral reading of shallow orthographies rests upon a very narrow database. There have been only two reports that a related context does not facilitate naming relative to an unrelated contexts (Frost, Katz & Bentin, 1987; Katz & Feldman, 1983), and only one report that word frequency does not affect naming (Frost et al., 1987)
As Besner and Smith note, one critical design feature of both the Frost et al. and the Katz and Feldman experiments is that, as we have already described, they used both words and non-words as stimuli. Presumably non-words can only be pronounced via the assembled route: they have, after all, no lexical representations. Could this then not simply bias subjects to always use the assembled route? After all, in a shallow orthography this will nearly always work. So what these experiments report may reflect not what readers of shallow orthographies do in reading normal text (where the majority of words will be known), but simply a strategy that subjects have adopted under the conditions of these experiments.
5.1.2 Evidence against the Orthographic Depth Hypothesis
Besner and Smith discuss several pieces of evidence that would appear to undermine the conclusions reached in the Katz and Feldman and the Frost et al. papers, including data from Serbo-Croatian, Persian (Farsi), and Japanese written in kana. For Serbo-Croatian, experiments were performed where only real words were presented to subjects. In this case, both lexical frequency and priming effects were found.
The Persian results were originally reported in (Baluch and Besner, 1991). Persian orthography is an Arabic-derived abjad (Kaye, 1996) (see Section 6.1 for an explanation of the term abjad): for many words the phonological information provided by the written form is incomplete, in particular information about the vowels. However, as in Arabic, the consonant letters ⟨w⟩, ⟨y⟩ and alif can function as vowels (/u/, /i/ and /a/, respectively), and some words written with these symbols happen to be complete in their phonological specifications. Thus Persian provides both cases where lexical access is necessary to name a written form, and cases where lexical access is in principle not necessary. The ODH would predict lexical access effects (word frequency and priming effects) for those words that are relatively 'deep', and no such effects for 'shallow' words. Baluch and Besner's data support this expectation, but only when a significant proportion of non-words were included among the stimuli. When such non-word stimuli were not presented, lexical access effects were obtained for both 'shallow' and 'deep' words. This, then, supports the contention that the reported lack of such lexical access effects in previous work on Serbo-Croatian may be due to a strategy adopted by subjects when given a task where the assembled route is often required. When non-words are removed, the assembled route is no longer automatically adopted, and subjects behave as if they are uniformly using an addressed route.
The experiment on reading of Japanese kana reported in (Besner and Hildebrandt, 1987) leads to a similar conclusion. Japanese has an even more extreme case of a mixed orthography than Persian. It uses both Chinese characters (kanji), many of which function logographically as we have seen (Section 4.3, though see Section 5.2.2 below), and two kana syllabaries, hiragana and katakana, which are fairly phonemic in their representation.³ Besner and Hildebrandt presented subjects with stimuli written in katakana, which is normally used to write foreign loanwords.⁴ The stimuli were of two types, namely words that are normally written in katakana, and words that would normally be written in kanji. The latter group was thus written in an unfamiliar way, whereas the former group was orthographically familiar. However, if the ODH were correct, this familiarity should have no effect on naming speed, since katakana is in any event a shallow orthography. Registering a form as 'familiar' or 'unfamiliar' presumes that one is matching a written form against a lexical entry, yet if one presumes, following the ODH, that kana is read using only pathway A from Figure 5.1, then no matching against lexical entries can be involved. In fact, Besner and Hildebrandt's results show definite effects of familiarity, with words that are not normally written in katakana (unfamiliar orthographic forms) taking significantly longer to name than words that are normally written in katakana (familiar orthographic forms). This suggests that lexical access must be involved in reading katakana, contrary to the expectations of the ODH.
5.2 'Shallow' Processing in 'Deep' Orthographies
The previous section has examined some of the psycholinguistic evidence surrounding the ODH, which in its strong form claims that readers of shallow orthographies largely bypass lexical access when reading aloud. The bulk of the evidence does not seem to support that radical conclusion. Rather, there seems to be evidence that readers of both shallow and deep orthographies do perform lexical access when naming, except under experimental conditions that favor adopting a uniform assembled route.
Yet surely there is a sense that 'deep' orthographies, such as English or Chinese, typically require lexical access that is 'deeper' than one would expect for a shallow orthography? For example, while naming a Spanish form like cocer 'to cook' may after all usually involve lexical access, presumably the whole lexical entry doesn't need to be retrieved, but rather just the phonological information, which corresponds fairly straightforwardly to the orthographic form. In contrast, to read a Chinese word like ma 'horse', where there seems to be no indication of the pronunciation in the orthographic form, presumably one has to retrieve the whole lexical entry: indeed, as we have noted elsewhere, it has often been supposed that Chinese writing is primarily logographic in that each character represents not a phonological unit at all, but rather a word or morpheme. In this section we discuss evidence that in Chinese and Japanese, two canonical examples of deep orthographies, rapid access to the phonology without (complete) lexical access is possible. This then provides evidence of a complementary nature to what was presented in the previous section: a 'deep' orthography can nonetheless show shallow processing effects.

³ Note, however, that pitch accent, which is lexically distinctive in Japanese, is not marked in the kana scripts.
⁴ Hiragana is reserved mostly for grammatical morphemes.
5.2.1 Phonological access in Chinese
In an experiment reported by Angela Tzeng (Tzeng, 1994), Chinese readers were shown a series of Chinese characters in rapid succession, possibly containing some intervening character-like nonsense material.⁵ The task for the subjects was simply to write down the characters that they were presented with. The stimuli were presented with an interval of between 90 and 110 milliseconds, fast enough to result in an effect of repetition blindness under appropriate conditions. Repetition blindness, first reported in (Kanwisher, 1987), denotes a situation where two tokens of a particular type are presented in rapid succession, and where subjects fail to note that more than one token was presented. In the context of Tzeng's experiment, presentation of two identical characters (e.g., two instances of sheng 'win') resulted in a mean accuracy rate in subjects' performance of about 51%. In contrast, presentation of a control sequence of two distinct and non-homophonous characters (e.g., sheng and di) resulted in a higher accuracy (around 61%). Crucially, presentation of two graphically dissimilar but homophonous characters (e.g., sheng 'win' and sheng 'holy') resulted in a mean accuracy of 52%, essentially the same as the rate for identical characters.⁶
The critical factor in this experiment is that the homophonous pairs chosen were graphically distinct, so it is not plausible that the subjects were simply confusing the characters at a visual level. Neither is it possible that the subjects were doing full lexical access and confusing the two instances at a lexical level. Putting aside the implausibility of doing lexical access in as little time as 90–110 milliseconds (most experiments are more consistent with lexical access requiring on the order of a few hundred milliseconds, especially for lower frequency items), full lexical access could not be involved, since the characters in question correspond to different morphemes: sheng 'win' and sheng 'holy' certainly must have different lexical entries, and if the subjects were doing lexical access then they surely would have registered the fact that they were dealing with a succession of distinct characters. The only solution, it seems, is to conclude that Chinese characters map, in the initial stages of processing, to a level of representation that is basically phonological. Put another way, while Chinese characters certainly contain non-phonological information, it is nonetheless the case that skilled Chinese readers have learned an association between characters and their corresponding syllables that allows for very rapid access to the phonological form, in effect bypassing the rest of lexical access.

⁵ The 'nonsense' material used was Korean Hankul syllable glyphs, which are of course meaningless to Chinese readers who do not know Korean, but have the useful property that they look somewhat like Chinese characters.
⁶ The behavior for high and low frequency characters was different, with high frequency homophonic pairs showing a higher accuracy than repeated characters, though a lower accuracy than different and non-homophonic pairs; for low frequency characters the performance for homophonic pairs was actually significantly worse than the performance for repeated identical character pairs.
Tzeng's results are consistent with other, more recent findings. For example, Perfetti and Tan (1998) report results of a priming experiment where subjects were presented with a character prime followed immediately by a target, which the subjects were then asked to read aloud as quickly and accurately as possible. The time difference between the start of the prime and the start of the target (the so-called Stimulus Onset Asynchrony, or SOA) was varied, as was the nature of the prime: the prime could be graphically similar, homophonous, semantically related (either vaguely or 'precisely'), or an unrelated control. A stronger priming effect resulted in a shorter and generally more accurate naming of the target. With the shortest SOAs (43 msec) the strongest priming was obtained from graphically similar characters, but as the SOA increased to 57 msec, the graphic similarity effect attenuated. Across the longer SOA conditions, homophonous primes consistently had a stronger effect than semantically similar primes. In other words, the naming of target characters is facilitated more by a prime that sounds the same than by a prime that has a related meaning.
In the context of the computational model, a sensible interpretation of this class of results would appear to be that skilled readers of Chinese, in addition to knowing which characters represent which lexical entries, have also learned a set of 'grapheme-to-phoneme' correspondences by which they know, for example, that the character for sheng 'win' maps to sheng. In terms of the discussion in Section 4.2, this amounts to saying that the relation between the syllable sheng and the entire character, implicit in the representation, has been extracted as a rule by the skilled Chinese reader; we return to this point in Section 5.2.4.
5.2.2 Phonological access in Japanese
Tzeng's results for Chinese are mirrored by the results obtained for Japanese kanji by two studies, (Horodeck, 1987) and (Matsunaga, 1994).
Horodeck's goal was to refute the widespread view that Chinese characters are ideographic in the sense that they directly represent ideas in the mind of the reader; this view has of course been heavily attacked by others, most notably DeFrancis (1984; 1989). To this end, Horodeck conducted two studies, one involving writing and the other reading. In the writing study, spontaneously written short essays from 2410 Japanese speakers with a variety of occupations and educational backgrounds were studied for spelling errors involving kanji. Horodeck classified the errors along three dimensions:
• whether the errorful character had the right sound, i.e., was a homophone of the correct character;
• whether the errorful character had the right form, i.e., shared a major structural component with the correct character; and
• whether the errorful character had the right meaning, i.e., was similar enough in its sense to the correct character.
For Horodeck's purposes, the most useful kinds of errors were those involving either characters with the right sound but the wrong form and wrong meaning, or characters with the wrong sound and wrong form but the right meaning. All other categories of error are either ambiguous, or else could be explained purely on the basis of formal similarities between the error and the target. In Horodeck's corpus there were 136 right-sound/wrong-form/wrong-meaning errors; among these errors 127 involved on (Sino-Japanese) readings and 9 involved kun (native) readings. In contrast, there were a total of 14 wrong-sound/wrong-form/right-meaning errors. Thus, in spontaneous writing one is much more likely to make an error on the basis of sound than on the basis of meaning.
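The coding scheme amounts to three binary features per error, and only two combinations of them are diagnostic. The sketch below assumes each error is coded as a triple of booleans; the class and function names are hypothetical. The only figures it uses are the counts reported above.

```python
from typing import NamedTuple


class KanjiError(NamedTuple):
    """One misspelled kanji, coded on three dimensions (True = matches the target)."""
    right_sound: bool
    right_form: bool
    right_meaning: bool


def diagnostic_category(err: KanjiError) -> str:
    """Classify an error into one of the two informative categories, if either applies."""
    if err.right_sound and not err.right_form and not err.right_meaning:
        return "sound-based"
    if not err.right_sound and not err.right_form and err.right_meaning:
        return "meaning-based"
    # Ambiguous, or explicable purely by formal similarity to the target.
    return "uninformative"


# Counts reported for Horodeck's corpus.
sound_based = {"on": 127, "kun": 9}
meaning_based = 14
total_sound = sum(sound_based.values())  # 136
print(f"{total_sound} sound-based vs {meaning_based} meaning-based errors, "
      f"about {total_sound / meaning_based:.1f} to 1")
```

Errors that share the target's form are set aside by design, since they could stem from visual confusion alone. On the remaining, diagnostic errors, sound-based errors outnumber meaning-based ones by roughly ten to one.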
Horodeck's second experiment involved a reading test where kanji with inappropriate meanings were inserted in a text, and where the object was to measure how often these errors were detected. All of the errors in this portion of the study involved multicharacter compounds with on readings: kanji occur much more frequently in these constructions than they do either with kun readings or as single characters with on readings ('on-isolates'), and it was therefore easier to construct stimuli using multicharacter on constructions. For the stimulus texts, newspaper headlines were chosen, since these have a higher density of kanji than normal running prose. The error stimuli used were of two types: right-sound/right-form/wrong-meaning and wrong-sound/right-form/wrong-meaning. Readers on average detected only 40.5% of the former kind of stimulus, as opposed to 54.3% of the latter kind. This difference was statistically significant, and demonstrated that errors homophonous with their targets are harder to detect than errors that are non-homophonous.
Matsunaga's (1994) experiment, like Horodeck's second experiment, involved homophonous and non-homophonous kanji errors. However, rather than asking readers to mark errors in newspaper headlines, she instead measured readers' eye movements as they read full sentences containing such errors: the assumption here is that errors, when detected, will disrupt the reader's reading and will translate into fixations on the location of the error, which in turn will show up in the eye-tracking data. Matsunaga found that the rate of fixations per error was significantly higher in the case of non-homophonic errors than in the case of homophonic errors. In other words, non-homophonic errors were easier to detect, a result that replicates Horodeck's second study.
The studies of Horodeck and Matsunaga thus lead to the same conclusion for Japanese reading of kanji as does Tzeng's study of Chinese. Users of both writing systems more readily miss errors that are homophonous with their targets, and they more readily miss repeated characters if they are homophonous with a previously presented character. All of these studies therefore support the idea that Chinese characters map directly to phonological representations in the minds of fluent readers.
5.2.3 Evidence for the function of phonetic components in Chinese
Psycholinguistic evidence for low-level phonological processing of Chinese, of the kind discussed in Section 5.2.1, seems on the face of it to be direct confirmation of the claim of DeFrancis (1984; 1989), also discussed in Chapter 4, that Chinese writing is essentially phonographic in design, though obviously imperfectly so. As will be recalled, the most powerful evidence for this claim is the large number of semantic-phonetic characters, where the pronunciation is indicated, to a greater or lesser degree, by a phonetic radical. The importance of phonetic information in the development of the Chinese writing system is unequivocal, and it is even plausible to suppose that the phonological information provided by the phonetic component is an aid to learning the writing system. What Tzeng's experiments show is that skilled Chinese readers have internalized the written symbols as a kind of alphabet, so that they can retrieve pronunciations for each of the symbols in the absence of any further lexical information. But one must be careful about drawing a connection between the results of this experiment and the evidence adduced by DeFrancis. To see this, consider the following thought experiment. Suppose that the Chinese writing system were, counterfactually, completely arbitrary in its mapping between orthographic symbols and their pronunciation: that is, there would be no equivalent to a 'phonetic' component, and therefore no way to look at a novel character and guess its pronunciation. Suppose furthermore that someone had mastered this writing system as well as literate Chinese readers master the real Chinese writing system. One might expect that the results of Tzeng's experiments on this pseudo-Chinese would be identical to what she demonstrated with real Chinese. If that turned out to be the case, then one would have to conclude that a skilled reader of any writing system is likely to 'phonologically recode' the system, in that they would be able to map between written symbols and pronunciation without performing full lexical access.⁷
So Tzeng's experiment might not relate directly to DeFrancis' evidence at all. We must then ask what the evidence is that readers of Chinese actually make use of the phonetic information provided in the majority of Chinese characters.
In fact, there is such evidence, one relevant experiment being that of (Hung, Tzeng, and Tzeng, 1992). In that experiment a Stroop picture-word interference paradigm was used to test subjects' abilities to name a picture when a single-character word of varying degrees of congruence to the picture was simultaneously presented. For example, suppose a picture of a basket is presented. A completely congruent word would be the word lan 'basket'; following Hung, Tzeng and Tzeng's terminology, we will call this word/character CC, for 'completely congruent'. An example of a completely incongruent (CI) word would be ding 'nail'. Partially congruent words were:
• A homophonous (but semantically distinct) character having the same phonetic component as CC, or having CC as a phonetic component. For example, lan 'blue'. (SGSS: 'similar graph, same sound')
• A homophonous and structurally distinct character: lan 'orchid'. (DGSS: 'different graph, same sound')
• A non-homophonous character sharing a component with CC: jian 'jail'. (SGDS: 'similar graph, different sound')
• A pseudocharacter (PC), where the CC served as the phonetic component of a non-existent character.

⁷ Coulmas (1989, page 50) makes exactly this point when he notes that a skilled reader of Chinese can equally well map between a character such as the one for bi 'pen' and its phonological (bi) and lexical ('pen') values.
Subjects were asked to name the pictures, and their reaction times were recorded, along with their error rates. Not surprisingly, the CC and CI conditions showed the fastest and slowest reaction times, respectively, as well as the best and worst error rates. The other conditions listed above were arranged as follows, ordered from fastest/lowest error to slowest/highest error: PC < SGSS < SGDS < DGSS. As Hung, Tzeng and Tzeng argue, there are two independent effects, one of graphic similarity to the target (CC) character, and one of phonological similarity. Taken together, these results first of all support the later work of Tzeng (1994) in underscoring the importance of phonological information in Chinese, and they also show that the phonetic component is both accessible to and used by Chinese readers, since the two non-CC conditions (PC, SGSS) where the phonetic component is the same as that of CC were the ones where subjects performed best.⁸
5.2.4 Summary
There appears to be evidence that phonological information is both available to and used by readers of Chinese and Japanese. Furthermore, at least for readers of Chinese, information in the phonetic component of the character, when present, is used. When a useful phonetic component does not exist, we assume, as we did in the previous chapter, that the orthographic entry for the morpheme is linked simultaneously to both the semantic and phonological entries, and that the character thus serves as its own 'phonetic component.' In the discussion in the previous chapter, we assumed a static representation whereby the orthographic symbol is simply listed as part of the lexical entry of the morpheme, with indices indicating which portions of the symbol correspond to the semantic and phonological fields. What the experimental evidence presented in this section suggests is that skilled Chinese readers have advanced one step further than this static knowledge. So rather than merely representing the Chinese character ⟨BIRD+JIA⟩ ya 'duck' as in (5.2), they have formulated a spelling rule as in (5.3), which would be lexically marked to apply only to this morpheme.
⁸ The importance of the phonetic component is further underscored by several studies cited in (Hung, Tzeng, and Tzeng, 1992, page 127), where it was shown that the phonological consistency of a phonetic component was negatively correlated with the naming latency for characters having that phonetic component.
One might also suppose that Japanese readers make use of the phonetic component when it is both present and useful. Clearly the phonetic component will not generally be useful for kun readings, but for on readings the phonetic component would in many cases have approximately the same utility as it would for the same character in Chinese. One study that seems to support the utility of the phonetic component in Japanese on readings is (Flores d'Arcais, Saito, and Kawakami, 1995).
(5.2)
  [ PHON    [1] [ SYL [ SEG  ⟨ONS y, RIME a⟩
                        TONE 1 ] ]
    SYNSEM  [ CAT  N
              SEM  [2] 'duck' ]
    ORTH    ⟨BIRD[2] + JIA[1]⟩ ]

(5.3) ya → ⟨BIRD + JIA⟩ (lexically marked: applies only to the morpheme 'duck')

(5.4) ⟨BIRD + JIA⟩ → ya

Inverted, as in (5.4), this rule will map directly between the character ⟨BIRD+JIA⟩ and ya. The presence of the phonetic component ⟨JIA⟩ on the left-hand side of several such inverted rules will, given the phonological similarity (though certainly not identity in this case), tend to reinforce the salience of the phonetic contribution of that component to the pronunciations of the characters containing it as a phonetic element. And the existence of such inverted rules yields the result that it is possible to map directly between a character and its pronunciation, without lexical access.
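The step from (5.3) to (5.4) can be sketched with a tiny rule table, as can the way several inverted rules that share a phonetic component pool evidence about it. This is a toy under stated assumptions. A character is reduced to a (semantic, phonetic) pair, which simplifies the ORTH field. Only the 'duck' rule comes from the text; the two placeholder rules and all function names are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class SpellingRule(NamedTuple):
    """Lexically marked rule: for one morpheme, syllable -> semantic + phonetic component."""
    morpheme: str
    syllable: str
    semantic: str
    phonetic: str


# Only the 'duck' rule, (5.3), comes from the text; the other two are hypothetical
# placeholders standing for further characters that contain the phonetic component JIA.
RULES = [
    SpellingRule("duck", "ya", "BIRD", "JIA"),
    SpellingRule("placeholder-1", "jia", "SEM1", "JIA"),
    SpellingRule("placeholder-2", "jia", "SEM2", "JIA"),
]


def invert(rules: list[SpellingRule]) -> dict[tuple[str, str], str]:
    """Invert each rule, as in (5.4): character (semantic, phonetic) -> syllable."""
    return {(r.semantic, r.phonetic): r.syllable for r in rules}


def phonetic_evidence(rules: list[SpellingRule]) -> dict[str, list[str]]:
    """Group, per phonetic component, the syllables of the characters that contain it."""
    evidence = defaultdict(list)
    for r in rules:
        evidence[r.phonetic].append(r.syllable)
    return dict(evidence)


char_to_syllable = invert(RULES)
print(char_to_syllable[("BIRD", "JIA")])  # ya, without consulting the lexical entry
print(phonetic_evidence(RULES)["JIA"])    # ['ya', 'jia', 'jia']
```

The inverted table answers the character-to-syllable question by a single lookup keyed on the written form alone. The grouped evidence is the raw material from which a reader could come to treat ⟨JIA⟩ as a salient cue to pronunciation.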
The situation in Chinese is really no different in kind from the lexically marked spellings in English: for the word light, for example, one must mark the spelling ⟨igh⟩ in the lexicon, since there is no way to predict that spelling from general principles. However, one can certainly extract from the set of words spelled with ⟨igh⟩ the useful generalization that this grapheme sequence is generally pronounced /aI/.⁹ Imputing a rule that maps the character ⟨BIRD+JIA⟩ to ya from a lexical representation that states that 'duck' is written with the ⟨BIRD⟩ radical and a phonetic component ⟨JIA⟩ is merely an instance of the same phenomenon.
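Extracting such a generalization from lexically marked spellings can be sketched as a frequency count over grapheme-phoneme alignments. The sketch assumes the lexicon has already been aligned. The alignments and the function name are hypothetical, and a real lexicon would of course be far larger and noisier.

```python
from collections import Counter
from typing import Optional

# Hypothetical grapheme-phoneme alignments for a handful of English words.
ALIGNED = {
    "light": [("l", "l"), ("igh", "aI"), ("t", "t")],
    "night": [("n", "n"), ("igh", "aI"), ("t", "t")],
    "high":  [("h", "h"), ("igh", "aI")],
    "sigh":  [("s", "s"), ("igh", "aI")],
}


def extract_rule(grapheme: str,
                 lexicon: dict[str, list[tuple[str, str]]]) -> Optional[tuple[str, float]]:
    """Generalize over lexically marked spellings: return the most frequent
    pronunciation of `grapheme` and the share of its occurrences it covers."""
    counts = Counter(ph for pairs in lexicon.values() for g, ph in pairs if g == grapheme)
    if not counts:
        return None
    phoneme, n = counts.most_common(1)[0]
    return phoneme, n / sum(counts.values())


print(extract_rule("igh", ALIGNED))  # ('aI', 1.0)
```

The returned share indicates how 'general' the extracted rule is. A value below 1.0 would signal that the grapheme also has other, lexically marked pronunciations.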
5.3 Connectionist Models: The Seidenberg-McClelland Model
As we discussed above, most studies of reading have assumed a dual-route, or multiple-route, model. By definition such models presume a strict distinction between stored lexical information on the one hand, and rules on the other. In separating rules from static lexical information, such psychological models are of course taking a fairly traditional stance, one that is in accord with traditional linguistic models.
As is well known, since the mid 1980s an alternative view of language has emerged which eschews a formal distinction between rules and lexicon. This is the connectionist view, so called because the belief is that complex systems of behavior can be modeled using large numbers of simple but massively interconnected units (sometimes called 'neurons'). Phenomena that have been termed 'rules' or 'lexical entries' are merely emergent properties of such interconnected networks. Probably the most famous application of this idea to a problem in natural language is Rumelhart and McClelland's (1986) oft-cited simulation of the learning of the English past tense. Their basic claim was that there was no difference between English regular past tense verbs, which add some variant of /-d/, and irregular verbs (mostly historically 'strong' verbs), which involve a change of the stem vowel (along with other changes in some cases): both types of verbs are learned by their network in the same way, and the network is, to some extent, able to generalize what it has 'learned' to new cases. Thus there is no need to posit a formal distinction between rules and stored lexical items, as both 'kinds' of knowledge are 'learned' by the network in the same way. An effective rebuttal to this paper can be found in (Pinker and Prince, 1988).

⁹ Or, if one prefers to assume a deep ORL for English (see Section 3.2), that it maps to /ī/, which subsequently changes to /aI/ by phonological rule. Note that sets of such inverted rules serve as the basis for linguistically informed approaches to the teaching of reading such as (Bloomfield and Barnhart, 1961).
The classic connectionist approach to reading is the system of Seidenberg and McClelland (1989). Naturally, one of the main claims of their theory is that dual-route models are not necessary: in particular, regularly spelled words, such as ate, which could in principle be pronounced without reference to lexical information, are learned in the same way as irregularly spelled words (plaid), where lexical access seems to be required. Indeed, in subsequent work (Seidenberg, 1990) the model is presented as providing an alternative to traditional "dual route" models (though see (Seidenberg, 1992) for a slightly modified position). The work is also cited (Seidenberg, 1997) as a viable alternative to symbolic rule- or principle-based approaches of the kind familiar from generative linguistics. There are more recent and arguably more sophisticated connectionist approaches to reading (see, for example, the work of Van Orden and colleagues (Van Orden, Pennington, and Stone, 1990; Stone and Van Orden, 1994)), but Seidenberg and McClelland's paper seems to present the most detailed discussion of a computational simulation of a connectionist model of reading, as well as the most detailed discussion comparing that model's behavior to experiments on human subjects.
It is not our purpose here to give an extensive review of Seidenberg and McClelland's model. Rather, a very brief summary will be given, and a few of the weaknesses of the approach will be pointed out. The main conclusion will be that Seidenberg and McClelland have failed to provide convincing evidence that their model has learned the task that it is claimed to have learned; thus there is little reason to accept their conclusion that more traditional kinds of models have been superseded.
5.3.1 Outline of the model
[Diagram: orthography (e.g. MAKE), phonology (e.g. /meIk/), meaning and context as interconnected components of lexical processing.]
Figure 5.2: The Seidenberg and McClelland model of lexical processing (Seidenberg and McClelland, 1989, page 526, Figure 1). Used with permission of the American Psychological Association, Inc.

[Diagram: 400 orthographic units and 460 phonological units, linked through 100/200 hidden units.]
Figure 5.3: The implemented portion of Seidenberg and McClelland's model of lexical processing (Seidenberg and McClelland, 1989, page 527, Figure 2). Used with permission of the American Psychological Association, Inc.

Seidenberg and McClelland have in mind a complete model of lexical processing relating orthographic, semantic, phonological and contextual information. Their model is diagrammed in Figure 5.2.10 The portion of the model that is actually implemented in the 1989 paper is depicted in Figure 5.3. This system was trained on a set of 2,884 word-pronunciation pairs consisting of all monosyllables of three or more letters from the Kučera and Francis word list of English (1967), from which they removed "proper nouns, words we judged to be foreign, abbreviations, and morphologically complex words that were formed from the addition of a final -s or -ed inflection" (page 530). The training was divided into epochs, and words were presented to the system in each epoch with a probability proportional to their frequency of occurrence in the Kučera/Francis database. Input (orthography) and output (pronunciation) were coded using a "Wickelgren triple" letter/phoneme trigram scheme similar to that used in (Rumelhart and McClelland, 1986): thus ⟨cat⟩ would be coded as {#ca, cat, at#} (where # is the boundary symbol). Each input and output unit is sensitive to exactly one of these triples.

10Oddly, morphological information, so crucial for the correct pronunciation of words in many languages, is missing from their conception of the lexical processing system. It is unclear, for instance, where the morphologically determined stress information in Russian (Section 1.2.1) that is crucial for correct vowel pronunciation would fit into the model: would that be part of "phonology" or "meaning"?
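The trigram coding described in this section can be sketched in a few lines. This is a minimal illustration, not Seidenberg and McClelland's code: the helper name `wickel_triples` is hypothetical, and it assumes `#` as the boundary symbol and plain letter strings as input.

```python
def wickel_triples(word: str, boundary: str = "#") -> list[str]:
    """Split a word into overlapping letter trigrams ("Wickelgren triples").

    The word is padded with a boundary symbol on both sides so that the
    first and last letters also appear in a triple of their own.
    """
    padded = f"{boundary}{word}{boundary}"
    # Slide a three-character window across the padded string.
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


if __name__ == "__main__":
    print(wickel_triples("cat"))   # ['#ca', 'cat', 'at#']
    print(wickel_triples("make"))  # ['#ma', 'mak', 'ake', 'ke#']
```

Since each unit responds to one triple, a word activates an unordered set of units; it is the boundary symbol that records which letters begin and end the word.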
The testing methodology is described by Seidenberg and McClelland as follows (page 532):
The phonological output computed for each word was compared to all of the target patterns that could be created by replacing a single phoneme with some other phoneme. For the word HOT, for example, the computed output was compared to the correct code, /hot/, and to all of the strings in the set formed by /Xot/, /hXt/, and /hoX/, where X was any phoneme. We then determined the number of cases for which the best fit (smallest error score) was provided by the correct code or one of the alternatives.
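The comparison procedure in the quotation can be made concrete with a toy scorer. This is a sketch under stated assumptions, not the original simulation: phonemes are single characters, the model's output is a mapping from phoneme triples to activations, the target code sets each of its own triples to 1 and everything else to 0, and the error score is a sum of squared differences. All function names are hypothetical.

```python
def phoneme_triples(phones: str, boundary: str = "#") -> set[str]:
    """Phoneme trigrams of a transcription, padded with boundary symbols."""
    padded = f"{boundary}{phones}{boundary}"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def error_score(output: dict[str, float], target: str) -> float:
    """Squared error between output activations and the 0/1 target code."""
    code = phoneme_triples(target)
    features = set(output) | code
    return sum(
        (output.get(f, 0.0) - (1.0 if f in code else 0.0)) ** 2 for f in features
    )


def correct_is_best(output: dict[str, float], target: str, inventory: str) -> bool:
    """True if the correct code beats every single-phoneme substitution."""
    alternatives = {
        target[:i] + x + target[i + 1:]
        for i in range(len(target))
        for x in inventory
        if x != target[i]
    }
    correct = error_score(output, target)
    # Strict comparison: a tie with some alternative counts as a failure.
    return all(correct < error_score(output, alt) for alt in alternatives)


if __name__ == "__main__":
    inventory = "abdehiopstu"  # hypothetical phoneme inventory
    near_hot = {t: 0.9 for t in phoneme_triples("hot")}
    near_hat = {t: 0.9 for t in phoneme_triples("hat")}
    print(correct_is_best(near_hot, "hot", inventory))  # True
    print(correct_is_best(near_hat, "hot", inventory))  # False
```

Note that this criterion only asks whether the correct code is closer than its one-phoneme neighbours; an output can pass while still being far from any well-formed pronunciation.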
The system was tested on the training set, and the error rate of the trained system on this set was 2.7%. Among the errors reported, some are plausible regularization errors (e.g. /bruc/ for ⟨brooch⟩); others are less plausible (e.g. /h rs/ for ⟨hearth⟩).

The remainder of the paper is devoted mostly to demonstrating equivalences between experimental data on human subjects and the behavior of the model on the kind of data reported in the experiments. The typical comparison made is between human naming latencies (the amount of time it takes for a human to pronounce a given word aloud) and the phonological error score of the model (computed as described above) for the correct pronunciation of the given word. For example, in a study reported in (Taraban and McClelland, 1987), subjects showed slower naming latencies for low-frequency words than for high-frequency words; they also showed slower naming latencies for exceptionally spelled words than for regularly spelled words, but this difference was only significant among low-frequency words. This pattern of behavior is apparently replicated in the phonological error rates of the model: low-frequency words show higher phonological error rates than high-frequency words; and among the low-frequency words, exceptionally spelled words showed significantly higher error rates than regularly spelled words.
5.3.2 What is wrong with the model
The Seidenberg-McClelland model is not the first connectionist model that was applied to the problem of reading aloud: it was predated by several years by the NETtalk system of Sejnowski and Rosenberg (eventually reported in a journal article in (Sejnowski and Rosenberg, 1987)). Sejnowski and Rosenberg were only marginally interested in psychology, being concerned instead with an engineering problem: how could one design a computational device that "learns" to correctly pronounce words given a training corpus consisting of text with aligned orthography and pronunciation? Rather than restrict themselves to a few thousand monosyllables, Sejnowski and Rosenberg exposed their system to English words of various structures, aligned with their pronunciations, taken from running text. The results of Sejnowski and Rosenberg's experiment were clearly not acceptable for a real application (reported error rates were about 8% by phoneme), but were promising enough to spawn a great deal of subsequent research in self-organizing methods for learning word pronunciations: see Section 6.6.
Compared with Sejnowski and Rosenberg's system, the Seidenberg-McClelland system seems rather weak, even if the results do on the face of it appear to be backed up by experimental evidence. Consider that the model has been trained and tested only on a few thousand monosyllables (and further tested on possibly a few hundred more examples in the various replications of experiments). By restricting oneself to monosyllables, one naturally avoids one of the most difficult problems in learning to read English, namely predicting where the stress is placed. Linguistically motivated models of pronunciation, such as those typically used in good text-to-speech systems, model stress placement by some combination of lexical marking and phonological rules that are sensitive to morphological structure. Seidenberg and McClelland's model provides no answer to how the learner learns to appropriately stress words when reading aloud.11
11Similarly, by avoiding names, the model avoids another complex area that mature readers of English learn to deal with. Because names (personal names in particular) often come from languages other than English, the pronunciation of names does not always follow the general conventions of English words.

As we noted previously, some of the errors produced by the system are bizarre, at least if one is considering the system to be a model of a normal mature reader of English. Some errors that fall into this category are given in (5.5):

(5.5)  chew → cw          frappe → frl       plewd → lid
       mow → ml           ouch → eIc         plume → plom
       swarm → swlrm      angst → ondst      breadth → breba
       czar → var         feud → flud        garb → garg
       nerse → mers       nymph → mimf       sphinx → spinks
       taps → tats        tsar → tar         zip → vip

As Pinker and Prince (1988) observe about a similar set of errors in the Rumelhart-McClelland (1986) model, this does not appear to be the behavior of a mature system.

[Plot: mean naming latency in msec (experiment, axis 520 to 540) and mean squared error (simulation, axis 4.0 to 6.0) for Regular and Reg Inc words.]
Figure 5.4: Replication of the (Seidenberg, McRae, and Jared, 1988) experiment, from (Seidenberg and McClelland, 1989, page 545, Figure 19). Used with permission of the American Psychological Association, Inc.

What, then, of the replications of the various psycholinguistic experiments that Seidenberg and McClelland discuss? Several of these depend upon analogizing between subjects' naming latencies and the error rate of the model: whether this is a meaningful comparison is unclear, though one might accept it if enough examples show parallel behaviors between these two measures. The problem is, however, that some of the supposed parallels are highly misleading. The best example of this is shown in Seidenberg and McClelland's Figure 19, reproduced here in Figure 5.4. This is a replication of a study reported in (Seidenberg, McRae, and Jared, 1988), which compared naming latencies for regularly pronounced English words and regularly pronounced English words that belong to an inconsistently pronounced class (Reg Inc in the figure). For example hone is regularly pronounced (/hon/), but there are (frequent) words sharing the letter sequence ⟨one⟩ that have pronunciations for that sequence that are inconsistent with the pronunciation in hone: gone, done. The experimental results demonstrated a 13 millisecond naming latency difference between the regular and regular inconsistent classes, as shown in Figure 5.4. Also shown in that figure are the mean squared phonological error scores for the model on the same stimuli, which according to Seidenberg and McClelland "also provide a good fit to the latency data." But this can hardly be described as an honest comparison. Note that there is no prescribed formula for mapping between latency differences and (mean-squared) phonological error scores of the model. In other words, given a line representing latency differences for human subjects, there is no prescribed formula which, given that line, allows one to derive the slope and intercept of the expected phonological error scores of the model. Thus, with only two data points, Seidenberg and McClelland could have chosen to give the line corresponding to the model's performance any slope and intercept that they chose. In particular, they could have matched the line exactly to the experimental data; but this would have looked too good, and would have emphasized the point that the comparison is meaningless. Instead they presented an almost exact match, a tactic that is incredibly effective, unless one is paying close attention.
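The two-point argument can be checked directly. The sketch below, with a hypothetical helper `line_through`, computes the unique affine rescaling that carries any two distinct model error scores onto any two latency values. The numbers in the example are invented for illustration (they are not the published values); only the 13 msec gap echoes the experiment.

```python
def line_through(src: tuple[float, float], dst: tuple[float, float]) -> tuple[float, float]:
    """Slope and intercept of the line mapping src[0]->dst[0] and src[1]->dst[1]."""
    (x1, x2), (y1, y2) = src, dst
    if x1 == x2:
        raise ValueError("the two source values must differ")
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


if __name__ == "__main__":
    model_errors = (4.5, 5.0)    # hypothetical mean squared errors (Regular, Reg Inc)
    latencies = (527.0, 540.0)   # hypothetical latencies in msec, 13 msec apart
    slope, intercept = line_through(model_errors, latencies)
    print(slope, intercept)                                # 26.0 410.0
    print([slope * e + intercept for e in model_errors])   # [527.0, 540.0]
```

Because such a map exists for any pair of distinct scores, an exact or near-exact overlap of two two-point lines can always be produced by the choice of axis scaling; it says nothing beyond whether the two measures differ in the same direction.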
To sum up: Seidenberg and McClelland present their model as an alternative to standard psychological theories of reading. But it is hard to accept that conclusion. What we are presented with is a toy system, one that is shown to perform well (and even then with bizarre exceptions) only on a very small portion of the problem. The comparisons offered with real experimental data range from the plausible to the highly misleading.
In the fields of computational linguistics or speech technology, nobody would accept that a model provided a useful alternative solution to a problem if that model had only been tested on a small and carefully selected subset of the relevant examples. Neither should one believe similar claims in psychology.
5.4 Summary
The computational model of orthography that we have presented implicitly assumes a "dual route" for mapping between written forms and their pronunciation: on the one hand we have the mapping to the ORL, the level of lexical representation relevant for orthographic encoding; on the other, the mapping between the ORL and the orthography, M_{ORL→Γ}, is a set of spelling rules which, inverted and composed with the map between the ORL and the pronunciation, serve also as a set of "letter-to-sound" rules.
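The inversion and composition of mappings described above can be illustrated with finite relations represented as sets of pairs. This is a sketch under simplifying assumptions: the theory treats these mappings as regular relations (see the overview of finite-state transducers in Chapter 1), whereas here they are tiny hand-listed sets, and the ORL labels `read.PRES`, `read.PAST` and `reed` are hypothetical stand-ins for lexical representations.

```python
Relation = set[tuple[str, str]]


def invert(rel: Relation) -> Relation:
    """Swap the two sides of every pair."""
    return {(b, a) for (a, b) in rel}


def compose(r: Relation, s: Relation) -> Relation:
    """x (r then s) z iff x r y and y s z for some shared y."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}


# Hypothetical ORL -> spelling and ORL -> pronunciation relations.
spelling = {("read.PRES", "read"), ("read.PAST", "read"), ("reed", "reed")}
pronunciation = {("read.PRES", "rid"), ("read.PAST", "rɛd"), ("reed", "rid")}

# "Letter-to-sound" rules: the inverted spelling relation, then pronunciation.
letter_to_sound = compose(invert(spelling), pronunciation)

if __name__ == "__main__":
    for written, spoken in sorted(letter_to_sound):
        print(written, "->", spoken)
```

The written form read comes out ambiguous between /rid/ and /rɛd/: the composed relation is not a function, and choosing between its outputs requires information beyond the spelling itself.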
The model also has Architectural Uniformity in that it assumes the same architecture for all writing systems: one reads Chinese or Japanese by the same mechanisms as one reads English or Serbo-Croatian.
These two properties are well supported by the psycholinguistic literature on reading. Connectionist proposals notwithstanding, there is evidence for dual (or multiple) routes during reading. And the same mechanisms appear to be available to readers of a variety of scripts. In particular there is evidence both for "deep" (lexical) processing in supposedly shallow orthographies, and also for "shallow" processing in supposedly deep orthographies. Of course different experimental conditions may favor certain strategies with certain scripts: presenting nonsense words to readers of a relatively shallow orthography will certainly favor an "assembled" route. But on the whole, whether one finds evidence for deep or shallow processing depends, it seems, more upon the task that is being examined than upon the orthographic system one is studying.
So, while I would not go so far as to claim "psychological reality" for the computational model, we can state with some confidence that the overall architecture is at least not at odds with what we know about human orthographic processing.
Chapter 6
Further Issues
Needless to say, there are many problems in the analysis of writing systems that are left unresolved by this discussion. This chapter will address four of these issues.
First of all, in Section 6.1, I examine the adaptation of writing systems, and ask what it means to adapt an orthography to a new language.
In Section 6.2 I consider spelling reforms, and in particular the 1995 reform of Dutch spelling: as we shall see in this case, contrary to popular notions of what spelling reforms should be like, extant examples of spelling reforms (as in the Dutch case) are often more concerned with "morphological faithfulness" than with making the writing system more "phonetic", though they may not be particularly successful in either goal.
Throughout most of this book I have considered what one might term "core writing", that is, the ordinary spelling of words in the normal orthography for a language. Largely left out of this discussion have been a large number of types of symbols that are rampant in written language: abbreviations, special symbols (%, &, etc.), and numerical symbols among them. How do such things fit into the general model that we have developed here? This question is addressed, at least in a preliminary fashion, in Section 6.3 with an examination of written numerals and their relation to number names, and in Section 6.4 with a short discussion of abbreviatory devices.
Finally, implicit in the approach adopted here has been the Bloomfieldian (Bloomfield, 1933, page 21) maxim that written language is "merely a way of recording language by means of visible marks". This view, while assumed in a lot of work on writing systems, is by no means a universally popular one, and there is a long tradition that takes the view that written language is on a par with spoken language, and that there are a lot of features of the former that are not best understood by appealing to the latter. In fact, I believe that there is no fundamental incompatibility between the two views, and I will argue this point in Section 6.5.
6.1 Adaptation of Writing Systems: the Case of Manx Gaelic
Almost all of the literate cultures in the world today originally borrowed their script from another culture. The only clear exceptions to this generalization are the very few cultures whose writing system was developed totally indigenously: Chinese is the most obvious example of this, but others might include Semitic (depending upon one's views of the origins of Semitic scripts; see (O'Connor, 1996)), and of course writing systems, such as Pahawh Hmong, that were developed in recent times by inspired individuals in previously illiterate cultures.
It is worth making a distinction (one not often made) between the adaptation of scripts and the adaptation of writing systems. The distinction can be illustrated perfectly by the various adaptations of Semitic scripts, notably Hebrew and Arabic (Hary, 1996; Aronson, 1996; Kaye, 1996). As is well known, the most notable feature of Semitic writing systems is their systematic lack of representation of short vowels, and their imperfect representation of long vowels.1 Also systematically lacking are representations of certain consonantal properties such as gemination in Standard Arabic. Daniels (1996b) terms this kind of writing system an abjad. As is also well known, Semitic languages have a characteristic "root-and-pattern" morphology, where morphologically related words share a common consonantal root, and differ mainly in the pattern of vowels and gemination (in Arabic) or spirantization (in Hebrew) of root consonants. The Semitic writing systems are often claimed to be well adapted to Semitic languages since, by omitting symbols for vowels and alterations of root consonants, the root is rendered more transparent across whole sets of derivationally related forms.
Whatever the merits of this argument, it is noteworthy that when Semitic writing systems have been adapted to other languages, two rather divergent courses have been taken. One course is simply to borrow the entire writing system: that is, the symbol set is borrowed (with possible augmentations), the symbols are used to denote (roughly) the same phonemes, and the way in which the script is used to represent linguistic form is more or less identical to the original. This was the course taken by the Arabic-based abjads for Persian and Urdu, and the Hebrew-based abjad of Judeo-Arabic. In these cases, the resulting system is the same as the original writing system in that, for instance, short vowels tend not to be orthographically expressed. There are other senses in which a writing system can be borrowed, and we return to these momentarily.
1Vowel symbols are of course available in the form of "points" written above or below the consonant symbols, but such symbols are generally reserved for pedagogical uses, to ensure that the learner pronounces the vowels correctly, and for religious texts for the same reason.

The second way in which Semitic scripts have been adapted is for the borrowing language to borrow the script, but not the way in which it is used. That is, the symbol set is borrowed (again with possible augmentation), the symbols are (again) used to represent (roughly) the same phonemes, but the resulting system is not an abjad. Examples are the Arabic-based Kurdish and Uyghur alphabets, and the Hebrew-based alphabets for Yiddish and Judezmo (Ladino). Indeed, in these cases more than just the symbol set and the grapheme-to-phoneme correspondences were borrowed: one also finds positional preferences for certain symbols reflecting certain sounds borrowed into the new system. For instance, in Judezmo (Bunis, 1975) the Hebrew consonant ⟨ʾ⟩ is used to represent /a/ in initial and medial position; however in final position ⟨h⟩ is used, since Hebrew words ending in /a/ are most frequently spelled with final ⟨h⟩. Thus kada 'each' is spelled ⟨qʾdh⟩. Similarly, ⟨ʾ⟩ doubles as a "support" for syllable-initial vowels besides /a/, so that i 'and' is written ⟨ʾy⟩, and es 'is' is written ⟨ʾys⟩; note also that ⟨y⟩ is used for both /i/ and /e/. Again, this follows Hebrew practice, since ⟨ʾ⟩, etymologically /ʔ/, is one way of representing an empty syllable onset in Hebrew, and ⟨y⟩ may represent either (long) /e/ or /i/. But despite this inheritance, the system used in Judezmo is clearly an alphabet, not an abjad, since all vowels are represented in the orthography.
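The positional conventions just described can be stated as a small rule set. The sketch below is illustrative only: Hebrew letters are transliterated (ʾ for alef, h for he, y for yod, q for qof), the consonant table covers only the consonants of the three example words, a vowel counts as syllable-initial if it is word-initial or follows another vowel (a simplifying assumption), and the function name `judezmo_spell` is hypothetical.

```python
ALEF = "ʾ"
VOWELS = set("aeiou")
# Hypothetical consonant table: only what the examples kada, i, es need.
CONSONANTS = {"k": "q", "d": "d", "s": "s"}
# Both /i/ and /e/ are written with yod.
NON_A_VOWELS = {"i": "y", "e": "y"}


def judezmo_spell(word: str) -> str:
    letters = []
    for i, seg in enumerate(word):
        if seg not in VOWELS:
            letters.append(CONSONANTS[seg])
        elif seg == "a":
            # /a/: alef initially and medially, he in final position.
            letters.append("h" if i == len(word) - 1 else ALEF)
        elif seg in NON_A_VOWELS:
            # An onsetless vowel other than /a/ gets alef as a "support".
            if i == 0 or word[i - 1] in VOWELS:
                letters.append(ALEF)
            letters.append(NON_A_VOWELS[seg])
        else:
            raise ValueError(f"vowel /{seg}/ is not covered by this sketch")
    return "".join(letters)


if __name__ == "__main__":
    for word in ("kada", "i", "es"):
        print(word, judezmo_spell(word))
```

Running it reproduces the three spellings cited above: ⟨qʾdh⟩, ⟨ʾy⟩ and ⟨ʾys⟩. Note that the rule for /a/ is stated purely in terms of position in the word, which is exactly the kind of borrowed positional preference the text describes.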
Let us now return to the notion of borrowing a writing system, rather than merely a script. What does this notion entail? Clearly the interpretation of this concept hinges crucially upon one's understanding of what specific kinds of linguistic information are represented by a given writing system, and how they are represented. For example, we have noted earlier (Section 3.2, Footnote 17) that there is a tension between "phonological faithfulness" and "morphological faithfulness": writing systems often face a choice between representing a word in a form that is representative of its (surface) pronunciation, and representing the morphemes of a word in a fashion consistent with their spelling in other related words. Semitic writing systems have addressed this tension in a rather interesting fashion: due to their peculiar properties, they are able, to a large extent, to consistently represent morphological roots, abstracting away from a variety of surface pronunciations of those roots; but at the same time, precisely because vowels and certain other features are not generally represented, Semitic writing systems represent words in a manner consistent with, if not particularly informative of, the actual pronunciation. (In other words, Semitic writing systems have incomplete coverage; see Section 1.2.1.) For a Semitic written form such as ⟨mlk⟩ 'king', one could interpret this as representing either a particular surface pronunciation (e.g. /malik/ in Arabic) or a particular root morpheme.
Carried one step further, one might imagine "forgetting" the phonetic basis of a string of graphemes like ⟨mlk⟩ and taking this to be a logographic representation of the morpheme 'king'. For a particular word derived from 'king', one might consider any additional graphemes to be phonetic cues as to the particular derivative of 'king' in question. This abstraction would appear to be the source of the heterograms found in adaptations of Aramaic writing systems to Persian languages (Skjærvø, 1996). Heterograms are words or morphemes that are spelled exactly as they would be spelled in their Aramaic source language, but are intended to be read as a Persian word: often the word, in addition to its Aramaic core, has additional graphemic material to reflect, for example, Persian inflectional endings, and these are spelled according to Persian, not Aramaic, pronunciation. To give an example from Middle Persian (Skjærvø, 1996, page 523), a word spelled ⟨YHWWNd⟩ consisted of an Aramaic core ⟨YHWWN⟩ 'be, become' (the Aramaic stem being /yhwwn/), and the final ⟨d⟩ representing a Persian inflectional ending. The word was to be pronounced /bawand/, with the pronunciation /bawan/ 'become' corresponding to the heterogram ⟨YHWWN⟩. One could cast this as a reinterpretation of the mapping M_{ORL→Γ} for a given written sign: rather than mapping from the value of the PHON attribute to spelling, it is reinterpreted as being a mapping from a value of the SYNSEM attribute. Aramaic heterograms thus have a strong family resemblance to the adaptation of Chinese script to the writing of native words in Japanese (Section 4.3), where the phonographic basis of the Chinese character was lost.
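The reinterpretation can be sketched as a reader that looks up the Aramaic core by meaning rather than by sound, and then reads any remaining letters phonetically. This toy assumes a one-entry heterogram table taken from the example above and a hypothetical table of phonetically spelled endings; `read_middle_persian` is a made-up name.

```python
# Heterogram core -> (gloss, Middle Persian pronunciation). The core's
# Aramaic sound value /yhwwn/ plays no role in reading it.
HETEROGRAMS = {"YHWWN": ("be, become", "bawan")}
# Hypothetical table of endings spelled by (Persian) sound value.
PHONETIC_ENDINGS = {"d": "d"}


def read_middle_persian(spelling: str) -> str:
    # Try longer cores first so that a shorter core cannot shadow a longer one.
    for core in sorted(HETEROGRAMS, key=len, reverse=True):
        if spelling.startswith(core):
            _gloss, pronunciation = HETEROGRAMS[core]
            ending = spelling[len(core):]
            return pronunciation + "".join(PHONETIC_ENDINGS[c] for c in ending)
    raise KeyError(f"no heterogram core found in {spelling!r}")


if __name__ == "__main__":
    print(read_middle_persian("YHWWNd"))  # bawand
    print(read_middle_persian("YHWWN"))   # bawan
```

The design mirrors the two halves of a heterogram: the core is a key into meaning (and thence a Persian word), while the ending is decoded letter by letter.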
Turning to another case, consider how a writing system might look if it were an adaptation of the English writing system. Consider first what particular properties are essential features of English writing. Two important properties come to mind:
• A particular association between phonological structure and grapheme sequences (M_{ORL→Γ})
• A large amount of lexical marking of orthographic features; see Section 3.2
One writing system that is adapted from that of English and that seems to have adopted the above-mentioned two features is that of Manx Gaelic. Unlike Irish and Scots Gaelic, which preserved a written tradition dating back to the 7th century (see (McManus, 1996) for a concise outline of the history of Gaelic orthography), the Gaelic speakers on the Isle of Man completely lost touch with that literary tradition. So when Bishop Phillips, the Welsh Bishop of Sodor and Man, undertook to translate the Anglican Book of Common Prayer into Manx sometime between 1605 and 1610 (Thomson, 1969), he was forced to invent an orthography for the language. This he did, with a system that represented the consonants as in English and the vowels (apparently) at least in part based on Welsh. This first attempt to introduce an orthography for Manx was not very successful, however. In the early eighteenth century the first printed book in Manx appeared, a bilingual version of Bishop Thomas Wilson's Principles and Duties of Christianity, and it was the orthographic scheme used here that, with some minor changes, became the standard orthography for Manx Gaelic.
This later orthography, unlike Phillips', was based almost wholly on that of English. This much is generally accepted as being clearly borrowed:
• The values of the vowels: thus, for instance, ⟨ee⟩ represented /i/ and ⟨oo⟩ represented /u/.
• The use of "silent" ⟨e⟩ to mark vowel length. Thus lane /l{:n/ 'full'.
• Doubled consonants marking short vowels. Thus moddey */mɔdə/,2 balley /bæl}[ə/ 'town, farm'.
• ⟨gh⟩ is used to represent /x/ (word internally), as it was in various dialects of English at the time.
2The pronunciation here does not reflect the pronunciation of Late Spoken Manx (Broderick, 1984b), which would be */mɔːðə/ for this word: between the time that the orthography was invented and the early 20th century, several innovations had taken place in the Manx sound system, including the lenition of intervocalic stops, and a process of lengthening of /a/ and /ɔ/ in stressed open syllables of disyllabic words.
To be sure, some choices for spelling certain sounds do not make a great deal of sense given an English model. So ⟨y⟩ generally represents /ə/, something that might perhaps be a holdover from Phillips' earlier orthography, since ⟨y⟩ is used to represent /ə/ in Welsh. Equally puzzling from an English (or a Welsh) point of view is the use of ⟨ey⟩ to represent /ə/, particularly in final position: ushtey /uscə/ 'water', carrey /kærə/ 'friend'.3 Also not apparently from English (though very reminiscent of traditional Gaelic spelling) was the sporadic use of ⟨i⟩ before a consonant to represent palatalization of that consonant. But on the whole, the English provenance of most features of Manx spelling is quite clear.
Not only did Manx borrow a large number of its spelling-sound correspondences from English, but it also apparently borrowed the propensity of English for irregular spelling; in more technical terms, it adopted the tendency of English for lexical marking of orthographic features. One interesting instance is the use of ⟨h⟩ after initial consonants which, with only four exceptions, would appear to correlate with no phonological distinction. The four exceptions are ⟨gh⟩ (representing /ɣ/), ⟨ch⟩ (representing both /x/ and /c/), ⟨ph⟩ (representing /f/) and ⟨sh⟩ (representing /ʃ/).4
But ⟨h⟩ can also occur after initial ⟨b⟩, ⟨d⟩, ⟨f⟩, ⟨k⟩, ⟨l⟩, ⟨m⟩, ⟨n⟩, ⟨r⟩, and ⟨t⟩,5 and in none of these cases does the spelling with ⟨h⟩ apparently correspond to a different phonological form from the spelling without it.6 Consider, for example, what Cregeen in his classic dictionary (1835, page vi) states concerning ⟨lh⟩, the only ⟨h⟩ spelling that he explicitly comments on:
L. Some say that this letter admits of no aspiration, and is pronounced as l (in English) in law, live, love; as LAUE, LIOAR, LANE; but I think there is a distinction between lie or ly in English, and LHIE in Manks; and had the words LOO, LOOR, etc. been spelled or written LHOO and LHOOR, they would have answered the Manks pronunciation better; for without the h the sound is too narrow, except to those who know that they require that sound.

3One possible explanation of this particular puzzle is that the ⟨ey⟩ spelling was motivated by a final reduced vowel other than /ə/, namely /I/. At least some of the words spelled with final ⟨ey⟩ and pronounced with /ə/ in Late Spoken Manx may have had an /I/-like vowel in 18th century Manx, as evidenced by the quasi-phonetic transcriptions collected in Edward Lhuyd's Geiriau Manaweg ('Manx Words') (Ifans and Thomson, 1979). Thus we find: wystee for ushtey 'water'; ylee for eoylley 'mud'; maji for maidjey 'wood'; lomyr yn kyrri for loamrey 'n cheyrrey 'fleece'; fani for fahney 'wart'. For at least some of these there is etymological evidence, in that cognates in Irish or Scots Gaelic have a palatalized consonant before the final reduced vowel, which could plausibly result in a higher /I/-like reduced vowel. Thus (palatalized consonants underlined): (Irish) uisce 'water' (cf. ushtey), maide 'stick' (cf. maidjey), and (Scots Gaelic) foinne 'wart' (cf. fahney); note that in Gaelic spelling, palatalization of consonants is indicated by ⟨e⟩ or ⟨i⟩ adjacent to the consonant cluster (and if possible on both sides). Given that ⟨y⟩ was at least sometimes used to represent /ə/ in other positions, the use of ⟨ey⟩ to represent this /I/ would have been reasonable, especially since the pronunciation of final /i/ (as in chimney) was most likely /I/ in nearby (Lancashire) dialects of English. (Indeed, Geoffrey Sampson, personal communication, notes that such final high vowels are lax in present-day Received Pronunciation; see also (Wells, 1982, page 119).) It is conceivable that ⟨ey⟩ was then generalized to represent all final reduced vowels.
4Note that in Phillips' earlier orthography ⟨ch⟩ was not ambiguous, as it represented only /c/. In that system /x/ was represented as ⟨gh⟩ in all positions.
5One also finds ⟨vh⟩ to represent initial lenition of words spelled with ⟨bh⟩ or ⟨mh⟩.
6Of course, ⟨bh⟩, ⟨mh⟩, ⟨fh⟩, ⟨th⟩ and ⟨dh⟩ have an overt similarity to Gaelic spellings for lenited /b/ (/v/ or /w/), /m/ (/v/ or /w/), /f/ (∅), /t/ (/h/) and /d/ (/ɣ/). But it seems unlikely that traditional Gaelic orthography is the source of these spellings, since there is no evidence that the developers of the orthography were aware of Gaelic orthographic traditions, so there would have been little opportunity for them to attempt to give Manx a superficial similarity to Gaelic. Besides, a Gaelic source could not directly explain the most common ⟨h⟩ spelling, ⟨lh⟩, nor could it explain ⟨kh⟩, ⟨nh⟩ or ⟨rh⟩, since none of these sequences occur initially in Gaelic spelling.
Though it is hard to say what Cregeen is describing here, it is evident that, at least in the Manx of the early 19th century when his dictionary was compiled, the pronunciation of /l/ in Manx was distinct from that of English /l/, and the spelling with ⟨h⟩ was intended to answer this difference.7 However, as is also evident from Cregeen's comments, other words that were spelled with plain ⟨l⟩ also had this non-English /l/, so that the ⟨h⟩, if indeed it served the function of marking the consonant as distinct from the English pronunciation, at least did not do so consistently.8
Did ⟨h⟩ serve to distinguish homophones or close homophones? It clearly did at least partly serve this function, as the following close minimal pairs (from Cregeen's dictionary) show:
beill 'mouths'          bheill 'grind'
leih 'forgiveness'      lheih 'place'
lott 'lot'              lhott 'wound'
meeley 'soft'           mheeley 'mile'
taal 'flow'             thaal 'adze'
tie 'the ill'           thie 'house'
Indeed, in one case Cregeen himself explicitly notes this function: commenting on the word mhill 'spoil' and on the alternative spelling mill, he notes (page 126) that "for the better sound's sake and a difference from Mill (honey), the h is inserted."9 However, providing a means of orthographically distinguishing homophones seems only to have been a minor function of postconsonantal ⟨h⟩. Table 6.1 shows the total counts in Cregeen's dictionary of words spelled with initial ⟨Ch⟩ (for C a consonant), and the number of those words that are minimal pairs with homophonic or close-homophonic words spelled without the ⟨h⟩. (In these calculations, I discounted derived compounds: thus thie 'house' is counted, but not thie lhionney 'alehouse'.)
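The percentages in Table 6.1 follow directly from its counts, and the same counts give the overall rate of homophone-distinguishing ⟨h⟩. The sketch below recomputes both. It assumes the table rounds halves upward (needed for ⟨mh⟩, where 3/24 is exactly 12.5%); Python's built-in `round` would give 12 there, because it rounds halves to even.

```python
import math

# (spelling, words with initial <Ch>, of which in minimal pairs), from Table 6.1
COUNTS = [
    ("bh", 14, 1), ("dh", 32, 1), ("fh", 2, 0),
    ("kh", 6, 2), ("lh", 186, 8), ("mh", 24, 3),
    ("nh", 3, 0), ("rh", 23, 0), ("th", 90, 5),
]


def percent_half_up(part: int, whole: int) -> int:
    """Whole-number percentage, rounding .5 upward (assumed convention)."""
    return math.floor(100 * part / whole + 0.5)


def table_percentages() -> dict[str, int]:
    return {s: percent_half_up(h, t) for s, t, h in COUNTS}


def overall() -> tuple[int, int, float]:
    total = sum(t for _, t, _ in COUNTS)
    homophones = sum(h for _, _, h in COUNTS)
    return total, homophones, 100 * homophones / total


if __name__ == "__main__":
    print(table_percentages())
    total, homophones, pct = overall()
    print(f"{homophones} of {total} words ({pct:.1f}%)")
```

Over all nine spellings, only 20 of 380 words (about 5%) take part in such a pair, which is the arithmetic behind calling homophone-distinguishing a minor function of ⟨h⟩.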
Spelling   Total number   Homophones   Percentage
⟨bh⟩                 14            1           7%
⟨dh⟩                 32            1           3%
⟨fh⟩                  2            0           0%
⟨kh⟩                  6            2          33%
⟨lh⟩                186            8           4%
⟨mh⟩                 24            3          13%
⟨nh⟩                  3            0           0%
⟨rh⟩                 23            0           0%
⟨th⟩                 90            5           6%

Table 6.1: Total counts in Cregeen's dictionary of words spelled with initial ⟨Ch⟩, where C denotes a consonant, and the number and percentages of those words that are minimal pairs with homophonic or close-homophonic words spelled without the ⟨h⟩.

7Robert Thomson suggests (personal communication) that Cregeen may have heard a dark /l/ in Manx in contrast to the light /l/ one would expect to get in English in initial position.
8One might be tempted to suppose that ⟨lh⟩ represents palatal ("slender") /l/, since for a number of words spelled with ⟨lh⟩, the corresponding Irish or Scots Gaelic forms have palatalized /l/: thus lhaih 'read' corresponds to Scots Gaelic leugh (where the ⟨e⟩ serves to mark a palatalized /l/). However, palatalized /l/ is also marked in other ways, especially by ⟨i⟩: lioar 'book' (Irish leabhar). And there are many instances of words spelled with ⟨lh⟩ that are not palatalized in Gaelic: thus lhag 'weak' (Irish lag). This is also confirmed by Late Spoken Manx pronunciations as catalogued by Broderick (1984a): for example, the word lhon (Irish lon) 'blackbird' has two attested pronunciations, one of them /lɔn/, neither of them with palatalized /l/.
9It is unclear what he means by "the better sound's sake."

About the only consistent function that ⟨h⟩ in these spellings seems to have is that it serves to make Manx spelling irregular: that is, one must simply list for a word like bheill 'grind' the fact that there is an ⟨h⟩ in the orthography. Thus we might assume a representation along the lines of (6.1), where the ⟨bh⟩ spelling is (irregularly) licensed by /b/:
(6.1) [attribute-value matrix for bheill 'grind': a PHON value and an ORTH value, in which the ORTH segment ⟨bh⟩ is linked to the PHON segment /b/]
This is, needless to say, highly reminiscent of English, where large amounts of such lexical marking are necessary. The particular use of ⟨h⟩ is of course apparently not borrowed from English: that is distinctively Manx. However, the property of irregularity itself plausibly is borrowed. One can imagine the original developers of the Manx writing system, being intimately familiar with English orthography, consciously or unconsciously importing the property that words may have orthographically marked lexical entries. Occasionally this irregularity would be used to distinguish homophones (as in English road/rode), but more often it would be used as a lexical marking with no apparent other function. Put another way, the developers of Manx orthography, given their experience with English, were not particularly motivated to provide a consistent spelling system for Manx.10
What does it mean to borrow a writing system? Apparently it can mean much more than merely adapting the mapping between linguistic representation and spelling to a new language. In some cases, as in Perso-Aramaic heterograms, it can involve a reinterpretation of what that mapping is mapping between. In the case of Manx, what was borrowed (apart from the particular "letter-to-sound" correspondences) is the property of having rampant lexical marking of orthographic properties.
10 As Robert Thomson notes (personal communication), besides the idiosyncratic use of postconsonantal ⟨h⟩, there are many other instances of idiosyncratic spellings in Manx. For instance, the words leigh 'law', leih 'forgive', lheiy 'calf' and lhiy 'colt' are homophones or near homophones, which are kept distinct in somewhat arbitrary ways in spelling.
186 CHAPTER 6. FURTHER ISSUES
6.2 Orthographic Reforms: the Case of Dutch
English is one of the few major languages that has been blessed not to have had any large-scale formally sanctioned spelling reforms during its history, this despite the numerous attempts on the part of various individuals over the past three hundred years. Not surprisingly, the major intention of all spelling reforms proposed for English is to render English spelling "more phonetic", or in other words to make it more phonologically faithful. An Anglocentric viewpoint would thus assume that spelling reforms in general should aim for phonological faithfulness. In fact, this is not usually the case, and morphological faithfulness, a property that English orthography already has to some extent (see Section 3.2), can often play a role in the redesign of spelling systems. We examine here the case of the 1995 spelling reform for Dutch, which illustrates this point.
In 1995 a new revision of Dutch spelling was formulated (Instituut voor Nederlandse Lexicologie, 1995); this new spelling became official in the fall of 1996 in the Netherlands and Belgium. Various changes proposed in the 1995 spelling system have been the source of much linguistic debate; see (Neijt and Nunn, 1997) for a comprehensive review of this and previous spelling changes for Dutch.
In this discussion we will concern ourselves with only one issue, namely the spelling of two of the so-called "linking morphemes" in nominal compounds, those that are spelled ⟨e⟩ or ⟨en⟩, both of which are pronounced /ə/. Some examples, using the conventions of the pre-1995 (1954) spelling, are shown below, with the linking morpheme glossed as "LM":11
(6.2) (a) slangebeet (snake+LM+bite) 'snakebite'
          paardebloem (horse+LM+flower) 'dandelion'
          kattevel (cat+LM+skin) 'catskin'
          forellevangst (trout+LM+catch) 'trout catch'
      (b) bessenjam (berry+LM+jam) 'berry jam'
          boekenkast (book+LM+case) 'bookcase'
          paardenvolk (horse+LM+people) 'cavalry'
          kreeftenvangst (crab+LM+catch) 'crab catch'
6.2.1 The 1954 spelling rules
Under the 1954 spelling conventions, the decision on which form to use was based largely on whether the lefthand member is interpreted as plural. A common plural suffix for Dutch nouns is written ⟨en⟩ and pronounced /ə/. If the lefthand member of the compound has a plural in ⟨en⟩ (not all nouns do), and if the interpretation of the lefthand member in the compound in question is plausibly plural, spell the linking morpheme as ⟨en⟩; otherwise spell it as ⟨e⟩. Thus, in principle one should write bessenjam for 'berry jam' because the word for berry (bes) has a plural in ⟨en⟩, and because one normally makes jam out of multiple berries. On the other hand one writes slangebeet for 'snakebite', because even though the plural of slang is slangen, a snakebite typically only involves one snake. As Neijt and Nunn note (1997, pages 11–12), and as one might expect, things were by no means uniformly so simple. Sometimes the principles were applied rather arbitrarily: why is it kreeftenvangst 'crab catch', implying the catch of more than one crab, but forellevangst 'trout catch', implying the catch of just one trout?12 Furthermore, some portions of the vocabulary apparently licensed categorical overrides of the general principle. For instance, if the lefthand term denoted a person or persons, ⟨en⟩ was always used, even if a singular interpretation might be plausible: weduwenpensioen 'widow's pension'.13 However, if a particular individual is intended, then ⟨e⟩ is written: koninginnedag (queen+LM+day) 'Queen's day'.
11 The -e and -en forms are only two of the five possible ways of linking elements of nominal compounds in Dutch. Of the other three ways, the most common is simply to have no linking morpheme: rundvlees (ox+meat) 'beef'. Less common is -s (lamsvlees (lamb+S+meat) 'lamb (meat)'), and even rarer is -er (rundergehakt (ox+ER+chopped) 'ground beef'). As Schreuder et al. (1998) note (from whom these particular examples were taken), all of the non-zero linking morphemes are relics of an obsolete medieval Dutch nominal inflection system.
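The 1954 procedure summarized above can be written as a decision over the writer's judgments. The following Python sketch is a simplification under stated assumptions: the parameter names are hypothetical, the person-noun override is assumed to apply only to nouns that have an -en plural, and the genuinely hard step, judging whether a plural reading is plausible, is simply passed in as a boolean.

```python
def linking_morpheme_1954(has_en_plural: bool,
                          plural_reading_plausible: bool,
                          denotes_persons: bool = False,
                          particular_individual: bool = False) -> str:
    """Spell the /ə/ linking morpheme under a simplified 1954 procedure.

    Every argument stands for a judgment the writer must supply; none of
    them can be read off the written form of the compound alone.
    """
    if not has_en_plural:
        return "e"  # no -en plural, so <en> is never licensed
    if denotes_persons:
        # Person nouns override the plurality judgment, except when a
        # particular individual is meant (koninginnedag).
        return "e" if particular_individual else "en"
    # General case: <en> only when a plural reading is plausible.
    return "en" if plural_reading_plausible else "e"


# Examples from the text, with the judgments the text attributes to them:
assert linking_morpheme_1954(True, True) == "en"                          # bessenjam
assert linking_morpheme_1954(True, False) == "e"                          # slangebeet
assert linking_morpheme_1954(True, False, denotes_persons=True) == "en"   # weduwenpensioen
assert linking_morpheme_1954(True, False, True, True) == "e"              # koninginnedag
```

Nothing in the function distinguishes kreeftenvangst from forellevangst except the plausibility flag the writer supplies, which is exactly where the arbitrariness noted by Neijt and Nunn enters.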
But ignoring these (somewhat large) nits in the system, and working under the assumption that the 1954 spelling conventions were more or less consistent, what is the best analysis of the mapping between linguistic representation and orthography in terms of the theory under development here? The spelling conventions stated that one should write ⟨en⟩ if the linking morpheme was pronounced /ə/, if the intended interpretation of the lefthand member was plural, and if the noun in question had a plural in -en. They did not actually make the linguistic claim that in such instances the linking morpheme is the plural morpheme. However, several linguists (e.g. Booij (1996), and Schreuder and colleagues (1998), who provide experimental data supporting Booij's claim) have made precisely this argument, and indeed it seems to result in the most succinct description if we make this assumption. If this is the case, then we can assume that nouns that have a plural in -en select the linking morpheme in (6.3a) (spelled ⟨en⟩, and marked [+PL]), which is in fact just the -en plural morpheme; in all other cases the form in (6.3b), which is unspecified for plurality, is selected:
(6.3) (a) [ SYNSEM [+PL]
            PHON   /ə/
            ORTH   ⟨en⟩ ]
      (b) [ SYNSEM [ ]
            PHON   /ə/
            ORTH   ⟨e⟩ ]
12 As a reviewer has pointed out to me, the answer might be that "crabs are typically caught in numbers, while trout are caught individually on a line." On the other hand, a trout fisherman may capture multiple trout on a single fishing trip, and yet under the old spelling system, this would still have to be written forellevangst.
13 Note that the English translation has a singular form widow's, corresponding to the Dutch plural form weduwen.
6.2.2 The 1995 spelling rules
For all of their relative consistency, the 1954 conventions suffer from one major problem that makes them ideal fodder for spelling reformers. Deciding whether to write ⟨en⟩ or ⟨e⟩ requires one to judge whether a lefthand member of a compound is plausibly plural in interpretation. Since this may differ from compound instance to compound instance, the 1954 conventions had the disadvantage that one could not guarantee a consistent spelling of a given compound, or of a given class of similar compounds, since in some instances a plural interpretation might seem appropriate, in others a singular interpretation. Thus, from the point of view of those who prefer a superficially consistent spelling system, the 1954 design is rather poor.
Under the 1995 conventions, one is no longer required to decide whether a plural meaning is more appropriate to a given context. Rather, the rule for using ⟨en⟩ and ⟨e⟩ depends, at least in its simplest form, on what the plural form of the lefthand noun is (Instituut voor Nederlandse Lexicologie, 1995, page 25, my translation):
Write an -n- when the first part of the compound is an independent noun which has a plural exclusively in -(e)n.14
The connection between the ⟨en⟩ form of the linking morpheme and the plural is thus drawn in the 1995 conventions as in the 1954 conventions, but here the semantics of the compound do not enter into the decision. On the face of it, then, the 1995 conventions would appear to be a simplification over the earlier conventions.
However, there are some exceptions to the main rule, which significantly complicate the new spelling principle. Among these:
1. The first part denotes a person or thing that in the given context is a unique type: zonneschijn 'sunshine'
2. The first part is an animal name, and the second part is a botanical term: paardebloem (horse+LM+flower) 'dandelion'
3. The first part denotes a body part, and the whole compound is a fossilized construction: kakebeen (jaw+LM+bone[?]) 'jawbone', ruggespraak (back+LM+speech) 'consultation'
The first of these exceptions is of course almost identical to the stipulation of the 1954 conventions relating to compounds with lefthand members denoting persons, and for this class of cases the writer is still forced to judge whether the lefthand member is appropriately interpreted as unique given the context. The "flora-fauna rule" is apparently a concession to the fact that most such compounds had previously been spelled with ⟨e⟩, and there was a desire to minimize the number of spelling changes required by the new conventions (Neijt and Nunn, 1997, page 22). The third seems perhaps the most difficult to apply, since it requires one to determine whether the construction in question is "fossilized" (the Dutch term used here is versteende samenstelling 'petrified compound'), presumably meaning that it is semantically opaque. Ruggespraak 'consultation' clearly counts as opaque, but kakebeen 'jawbone' is much less obviously so, and only after some rather circuitous reasoning does it become clear that one should probably consider it to be opaque after all: been means either 'bone' or 'leg'; the plural for 'bone' is beenderen, for 'leg' benen; the plural of kakebeen is kakebenen, suggesting that the been here has, etymological considerations aside, the 'leg' reading. On that reasoning, then, kakebeen should certainly be considered opaque.
14 Besides the exceptions to the rule to be discussed below, there are a couple of additional amendments. For example, if the singular of the lefthand noun does not end in /ə/ and can form a plural both in /s/ and /en/, then the linking morpheme should also be spelled with ⟨en⟩.
What is an appropriate formal analysis for the 1995 convention on ⟨e⟩ versus ⟨en⟩? Since the basic rule is no longer based on the semantics of the situation, but rather on a morphological property of the lefthand noun (whether or not it has a plural in -en), it no longer makes any sense to assume a linking-morpheme entry with a [+PL] semantic specification (as in (6.3a)). Simpler would be to assume a single linking morpheme that is unspecified for orthographic and semantic information:
(6.4) [ PHON /ə/
        ORTH ⟨ ⟩ ]
In order to predict the appropriate spelling, we need to assume a morphological feature [+en], which marks nouns that have a plural exclusively in /en/. Then we can write spelling rules as in (6.5) to capture the basic rule of the 1995 conventions, and fill in a value for the ORTH attribute in (6.4):
(6.5) /ə/ → ⟨en⟩ / [+en] ___
      /ə/ → ⟨e⟩
For the exceptional classes there are two possible routes. Firstly, one could prespecify the exceptional ⟨e⟩ spelling in the orthographic field of the compounds fitting the terms of the exception: this seems perhaps the most reasonable route for the third class of exceptions given above. Secondly, one could assume an additional rule introducing ⟨e⟩ in certain semantically specified contexts. The flora-fauna exception could be handled by the rule in (6.6), which would apply so as to bleed the first of the rules in (6.5):
(6.6) /ə/ → ⟨e⟩ / [+fauna] ___ [+flora]
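Rule ordering is the crux here: (6.6) has to apply before (6.5) so that, in an animal-plus-plant compound, it bleeds the [+en] rule. The Python sketch below encodes [+en], [+fauna] and [+flora] as booleans on a hypothetical Compound record, and treats the third class of exceptions by lexical prespecification, the first of the two routes just described. It illustrates the analysis under these assumptions and is not a transcription of the official rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Compound:
    left_en: bool                       # [+en]: lefthand noun pluralizes only in -(e)n
    left_fauna: bool = False            # [+fauna]: lefthand noun is an animal name
    right_flora: bool = False           # [+flora]: righthand noun is a botanical term
    prespecified: Optional[str] = None  # lexically listed ORTH of the linking morpheme


def linking_morpheme_1995(c: Compound) -> str:
    """Fill in ORTH for the /ə/ linking morpheme of (6.4)."""
    if c.prespecified is not None:      # lexical listing (exception class 3)
        return c.prespecified
    if c.left_fauna and c.right_flora:  # rule (6.6), ordered first: bleeds (6.5)
        return "e"
    if c.left_en:                       # rule (6.5), first line: /ə/ -> <en> after [+en]
        return "en"
    return "e"                          # rule (6.5), second line: elsewhere /ə/ -> <e>


paardebloem = Compound(left_en=True, left_fauna=True, right_flora=True)
paardenvolk = Compound(left_en=True, left_fauna=True)
kakebeen = Compound(left_en=True, prespecified="e")
print(linking_morpheme_1995(paardebloem),
      linking_morpheme_1995(paardenvolk),
      linking_morpheme_1995(kakebeen))  # e en e
```

Swapping the order of the two checks for (6.6) and (6.5) would wrongly yield paardenbloem, which is what "bleeding" rules out.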
But however the exceptions are to be treated, what is clear is that the 1995 proposal gives us a system that is more complex than the conventions it supplants.
What more reasonable approach could the 1995 proposal have taken with regard to the linking morpheme? One approach would simply have been to leave it unchanged from the 1954 conventions. This is more or less what Booij (1996) suggests. Of course, this does leave some ambiguity in some cases: depending upon what one means, one could have either schapevlees or schapenvlees (sheep+LM+meat) for 'mutton'. But, as Booij rightly asks (page 133): "what is wrong with that?"
A second approach would have achieved complete consistency, and would at the same time have been much simpler to state: since the linking morpheme is invariably pronounced /ə/ no matter how it is spelled, one could simply always write it ⟨e⟩, and eliminate the ⟨en⟩ spelling entirely. Alternatively, one could have chosen to spell the linking morpheme exclusively with ⟨en⟩ (eliminating ⟨e⟩ entirely); in the latter case, the pronunciation of ⟨en⟩ as /ə/ would be consistent with the pronunciation of the plural suffix ⟨en⟩, whose spelling has not been changed under the 1995 reform. Either of these reforms, had they been adopted, would have made Dutch spelling slightly more "phonologically faithful". Instead, what has been adopted is a system that attempts to be as "morphologically faithful" as the 1954 conventions but at the same time drains the morphological faithfulness of any semantic sense: rather than depending upon the semantics of the situation, the 1995 conventions require that one consider a purely formal property of the lefthand noun (whether it exclusively forms its plural in -en). In principle this could of course guarantee more consistent spellings than the 1954 conventions, since the writer would merely have to reflect on the morphological properties of the base noun, and would not have to determine possibly subtle semantic nuances. Needless to say, whatever benefit might have been gained by this new convention has been effectively eliminated by the additional stipulated exceptions.
6.3 Other Forms of Notation: Numerical Notation and its Relation to Number Names
In our discussion of writing systems we have thus far focused exclusively on what might be termed the core of writing systems, where written symbols clearly represent some sort of linguistic object, be it phonological or lexical. But written language contains many forms that cannot be so described, the most prominent and widespread of these being numerical notation. Here we will concern ourselves mostly with the Hindu-Arabic numeral system, which has become practically universal.
There have of course been numerous written representations of numbers developed throughout history by various cultures speaking various languages; for an overview see (Pettersson, 1996). In some cases, the system, in addition to serving as a representation for numerals, also served as a reasonable written representation of the associated number names. Such is the case with traditional Chinese numerals. Thus a numeral representation such as 三千六百八十四 sān qiān liù bǎi bā shí sì '3 $10^{3}$ 6 $10^{2}$ 8 $10^{1}$ 4' serves simultaneously as a representation for the number "3,684" and as a specification of how the number is actually read; indeed there is no other way to represent the number name for "3,684" in Chinese than by the string of characters given.15 And
numerical representation schemes developed by a particular culture tend, not surprisingly, to have properties that are influenced by the linguistic facts of the language spoken by that culture: thus Ancient Mayan numerical representation is basically vigesimal, reflecting the vigesimal system used in number name construction in Mayan languages.
Thus some numerical representation schemes are at least partly glottographic in design, in that they reflect at least some aspects of the structure of the linguistic system of number names of the language spoken by the designers of the system. In contrast, the Hindu-Arabic system is decidedly non-glottographic in design, even for the speakers of the South Asian languages (whichever they may have been) who developed it around 600 AD from an earlier, more glottographic system (Pettersson, 1996, page 804). Rather, it is a purely mathematically motivated "positional" representation (Harris, 1995; Pettersson, 1996), where powers of the base (10) are represented by the position of digits in a grid starting from the rightmost position, and the digits themselves represent multipliers of the power of the base. Thus a number such as ⟨3,684⟩ represents straightforwardly (omitting $10^{0}$):
(6.7) $3 \times 10^{3} + 6 \times 10^{2} + 8 \times 10^{1} + 4$
The Hindu-Arabic system is now used to represent numbers in the written representation of nearly all languages, and the systems of number names in these languages cover a wide spectrum of possibilities. A sample of the range of possibilities for the example "3,684" is given below in (6.8). English (6.8a) is a fairly straightforward decimal system where there is a close one-to-one mapping between the words in the number name and the multipliers and multiplicands in the factorized representation in (6.7). German (6.8b) is similarly straightforward, with the exception that, as in most other Germanic languages (Modern English being the notable exception), the digits and tens are presented in the reverse of their "logical" order. In Malagasy (6.8c) (Rajemisa-Raolison, 1971), the entire number name is presented in the reverse of its "logical" order. Finally, in the case of Basque (6.8d) we find a partially decimal-vigesimal system where numbers below 100 are regularly represented in terms of sums of products of powers of 20, followed by units or "ten" plus a unit.16
(6.8) (a) three thousand six hundred eighty four
      (b) dreitausendsechshundertvierundachtzig
          (three+thousand+six+hundred+four+and+eighty)
          $3 \times 10^{3} + 6 \times 10^{2} + 4 + 8 \times 10^{1}$
      (c) efatra amby valopolo sy eninjato sy telo arivo
          (four and eight+ten and six+hundred and three thousand)
          $4 + 8 \times 10^{1} + 6 \times 10^{2} + 3 \times 10^{3}$
      (d) hiru mila seirehun da laurogeita lau
          (three thousand six+hundred four+score four)
          $3 \times 10^{3} + 6 \times 10^{2} + 4 \times 20^{1} + 4$
15 Business and accounting variants of the standard number name characters do exist: thus 貳 èr instead of 二 èr for '2'. But these are merely contextually determined graphical variants of the standard forms.
For serious mathematical calculations, standard Chinese numerals are not very convenient, and other numerical representations were invented for such purposes; see (Needham, 1959; Pettersson, 1996).
16 See (Hurford, 1975) and (Stampe, 1976) for surveys and linguistic models of number name systems and (Brandt Corstius, 1968) for some early grammatical models of number names.
The relationship between the blatantly non-glottographic Hindu-Arabic numeral system and the number name systems of the various languages in which it is used would be of largely academic interest were it not for the fact that converting between the two representations is something that literate speakers do routinely, and something that automatic text-to-speech conversion systems must also be capable of. This immediately raises the question of what kind of mapping such speakers perform and how the model of this mapping relates to the theory of writing systems that we have been developing.
On first consideration, our model of writing systems would appear to have little to say about this mapping, since the two most prominent assumptions that one must make seem to directly contradict what we know to be true of the Hindu-Arabic numeral system, and what is claimed to be true of number name systems:
• A numerical representation such as "3,684" maps directly to a linguistic level of representation, in this case the lexical representation of the number name itself.17
• The mapping between the two levels is regular.
The first assumption patently contradicts the mathematical design of the system, which was clearly non-glottographic. The second assumption is also clearly false in general, since the "alphabet" of powers of ten is infinite: by definition one cannot have a regular relation that involves an infinite alphabet.18
But these two points are misleading. First of all, the original design of a written representation system should not confuse the issue of how the system is actually used by readers of a language that uses that system: there is no reason why the Hindu-Arabic numeral system as used, say, in English could not have a dual function, namely as a mathematically motivated representation of the number, but also as a crude logographic representation of the number names of the language. Secondly, the observation that the alphabet of powers of ten is infinite misses the important point that there is a limit to the length of a digit string that will be read by a human reader as a number name (as opposed to merely a string of digits). The limit of course varies from reader to reader, and presumably depends upon the level of literacy and mathematical acumen of the person involved. But there clearly is such a limit: while most readers of American English would have no problem reading "1,000,000" as one million, fewer would be so confident about one quadrillion for "1,000,000,000,000,000"; and presumably none would be able to translate (without the aid of pencil and paper, and possibly a dictionary) a number such as "1,000,000,000,000,000,000,000,000,000". In practical situations, such numbers are typically either represented in scientific notation ($10^{27}$), which has a totally different mode of reading aloud, or else (at least with smaller numbers) partly in words (e.g. American English 1 trillion for "1,000,000,000,000"). Given these observations, we can proceed to develop a finite-state model for the conversion of Hindu-Arabic numerals into number names for a given language; this model is the one used in the Bell Labs multilingual TTS system, as described in (Sproat, 1997b; Sproat, 1997a).
17 We take it for granted that numerical representations do not generally represent phonological information.
18 Number-name systems themselves have been argued to be mildly context-sensitive, hence not regular; see (Radzinski, 1991).
[Figure 6.1 diagram: states 0 to 5; digit arcs copy their input (1:1 to 9:9 out of the start state, 0:0 to 9:9 elsewhere), and ε-input arcs insert the powers (ε:10^2, ε:10^1).]
Figure 6.1: A numeral factorization transducer for numbers up to 999.
The problem is best understood by factoring it into two components. The first component, which we shall term factorization, is a mechanism for expanding a Hindu-Arabic numeral sequence into a representation in terms of sums of products of powers of ten. The second component maps from this factorized representation into the sequence of words that make up the number name corresponding to the particular numeral sequence; let us term this latter component the number name generator. Both of these operations can be handled using finite-state transducers. For example, a simple transducer that factors numerals up to multiples of $10^{2}$ is shown in Figure 6.1. Generating a number name from a numeral string then consists of composing the string with the factorization transducer, composing the result with the number name generator, and then computing the projection of the output, or formally:
$\textit{number name} = \pi_{2}(\textit{numeral} \circ \textit{factorization} \circ \textit{number name generator})$
The number name generator is obviously language-specific, since not only are the lexical items involved specific to a given language, but also various aspects of their combination: it is a language-specific fact of English, for example, that one may (in some dialects must) use the word and between hundred and following material in the number name, but not after thousand, million, etc.; similarly, in Russian, complex case and gender agreement is required between elements of the number name (Wade, 1992).19
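The two-component design can be illustrated without any finite-state machinery. In the Python sketch below (all names hypothetical), ordinary functions stand in for the transducers, so nothing is actually composed and this is not how the Bell Labs system is built: factorize plays the role of the factorization transducer of Figure 6.1 for blocks of up to three digits, and generate_block plays the role of an English number name generator, with an optional and after hundred but never after thousand. Restricting coverage to numerals below one million is an assumption of the sketch.

```python
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def factorize(block: str) -> list[tuple[int, int]]:
    """Factorization: '684' -> [(6, 2), (8, 1), (4, 0)], i.e. 6*10^2 + 8*10^1 + 4.

    Handles one to three digits (cf. Figure 6.1); zero digits contribute no term.
    """
    if not (block.isdigit() and 1 <= len(block) <= 3):
        raise ValueError(f"expected 1-3 digits, got {block!r}")
    n = len(block)
    return [(int(d), n - 1 - i) for i, d in enumerate(block) if d != "0"]


def generate_block(terms: list[tuple[int, int]], use_and: bool = False) -> list[str]:
    """Number name generation (English) for one factorized block below 1000."""
    coef = {power: digit for digit, power in terms}
    words = []
    if 2 in coef:
        words += [ONES[coef[2]], "hundred"]
    tens, units = coef.get(1, 0), coef.get(0, 0)
    if words and (tens or units) and use_and:
        words.append("and")         # 'and' only after hundred, within a block
    if tens == 1:
        words.append(TEENS[units])  # 10-19 are lexical exceptions
    else:
        if tens:
            words.append(TENS[tens])
        if units:
            words.append(ONES[units])
    return words


def number_name(numeral: str, use_and: bool = False) -> str:
    """Chain factorization and generation for numerals below one million."""
    digits = numeral.replace(",", "")
    if not digits.isdigit() or int(digits) >= 1_000_000:
        raise ValueError(f"unsupported numeral {numeral!r}")
    value = int(digits)
    if value == 0:
        return "zero"
    thousands, rest = divmod(value, 1000)
    words = []
    if thousands:
        words += generate_block(factorize(str(thousands)), use_and) + ["thousand"]
    if rest:
        words += generate_block(factorize(str(rest)), use_and)
    return " ".join(words)


print(number_name("3,684"))                # three thousand six hundred eighty four
print(number_name("3,684", use_and=True))  # three thousand six hundred and eighty four
```

Because the and-insertion lives entirely in the generator, the factorization step can be shared by dialects that differ only on this point.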
The factorization transducer is also language-specific, or one might better say, language-area-specific. This is in part because languages differ in the way they logically factorize a long number name. Most (decimal) number name systems have distinct words for $10^{1}$, $10^{2}$ and $10^{3}$, but differ significantly on higher powers. The first five powers of ten for which there are separate lexical items in American English, Chinese (along with several other East Asian languages) and Hindi (along with most South Asian languages) are given in (6.9):20
(6.9) English  $10^{1}$, $10^{2}$, $10^{3}$, $10^{6}$, $10^{9}$
      Chinese  $10^{1}$, $10^{2}$, $10^{3}$, $10^{4}$, $10^{8}$
      Hindi    $10^{1}$, $10^{2}$, $10^{3}$, $10^{5}$, $10^{7}$
For the "missing" powers, languages revert to an analytic strategy, so that $1 \times 10^{4}$ in English is expressed as ten thousand and $1 \times 10^{5} + 1 \times 10^{4}$ as a hundred and ten thousand. Thus a number like "12345678" would be factored differently in these three languages. Here, the analytic blocks are underlined:
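(6.10) English  $\underline{(1 \times 10^{1} + 2)} \times 10^{6} + \underline{(3 \times 10^{2} + 4 \times 10^{1} + 5)} \times 10^{3} + 6 \times 10^{2} + 7 \times 10^{1} + 8$
       Chinese  $\underline{(1 \times 10^{3} + 2 \times 10^{2} + 3 \times 10^{1} + 4)} \times 10^{4} + 5 \times 10^{3} + 6 \times 10^{2} + 7 \times 10^{1} + 8$
       Hindi    $\underline{1} \times 10^{7} + \underline{(2 \times 10^{1} + 3)} \times 10^{5} + \underline{(4 \times 10^{1} + 5)} \times 10^{3} + 6 \times 10^{2} + 7 \times 10^{1} + 8$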
19 In a working system, such as the Bell Labs TTS system, such linguistic facts can be handled in part by rewrite rules compiled into finite-state transducers.
20 The system for American English differs, of course, from that used traditionally in British English, and currently in other Western European languages.
21 The symbol is ',' in English, Chinese and Hindi, as it happens. In many European languages other than English, one uses either '.' or simply a space, the ',' being used to represent what is represented with a decimal point in English.
22 The surface form of the numerals in Hindi and other Indian languages (and also Arabic) is different from that used in English or Chinese, but the system otherwise works exactly the same.
One point that is not often noted in discussions of numerical representations and their relation to number names relates to the positioning of commas or other aids to interpretation that are typically inserted into long numerals.21 Where one finds a comma written depends exactly upon which powers of ten the language has distinct words for. For English, the comma is written in positions corresponding to the end of the first and second underlined blocks in (6.10): thus one writes ⟨12,345,678⟩. In Chinese, the comma is also written after the first underlined block, this time resulting in ⟨1234,5678⟩. Finally, in Hindi, one writes the comma after the first, second and third underlined blocks, resulting in ⟨1,23,45,678⟩.22 Thus the placement of the comma corresponds exactly to the (right) edge of analytic number name constructions. It is hard to interpret the comma in any other way than as an aid for the reader in the mapping between the numerical representation and the number name. In other words, the numerical representation is being treated in the writing system not only as a representation of mathematical objects, but simultaneously as an orthographic representation of linguistic objects.
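The link between comma placement and the lexicalized powers in (6.9) can be made concrete. In the Python sketch below, the GROUPING table (a hypothetical name) lists, for each language, the sizes of digit groups counted from the right; repeating the last size for still longer numerals is an assumption that goes beyond the examples in (6.10).

```python
# Digit-group sizes read from the right, following the lexicalized powers in (6.9):
# English 10^3, 10^6, 10^9 (groups of 3); Chinese 10^4, 10^8 (groups of 4);
# Hindi 10^3, 10^5, 10^7 (a group of 3, then groups of 2).
GROUPING = {"english": [3], "chinese": [4], "hindi": [3, 2]}


def insert_commas(digits: str, language: str) -> str:
    """Place commas at the right edges of the language's analytic blocks."""
    sizes = GROUPING[language]
    groups = []
    end = len(digits)
    k = 0
    while end > 0:
        size = sizes[min(k, len(sizes) - 1)]  # the last listed size repeats
        groups.append(digits[max(0, end - size):end])
        end -= size
        k += 1
    return ",".join(reversed(groups))


for lang in ("english", "chinese", "hindi"):
    print(lang, insert_commas("12345678", lang))
```

The same digit string thus receives three different punctuations, each tracking where the language stops having a dedicated word for the next power of ten.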
For some languages, additional mechanisms are required in the factorization step. In German, for example, digits and decades occur in the number name in the reverse of their "logical" order, as noted in (6.8b). This "decade flop" can be handled by a finite-state transducer, but only at some cost, since transducers can only perform string reversals (for a finite set of strings) by enumerating all strings paired with their reversed form. (This is exactly what is done in the German version of the Bell Labs TTS system; see (Sproat, 1997b).) So the mapping between numerals and their associated factorizations is still regular, but not elegantly so. It is thus not surprising that speakers of German and Dutch (which has an equivalent number name system) have some difficulty in reading number names from their numerical representation.23
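The cost of the decade flop shows up clearly if the relation is enumerated explicitly. The Python sketch below assumes a simple token notation for factorized numerals (3 10^3 6 10^2 8 10^1 4 for 3,684), and uses a dictionary to stand in for the finite relation that a transducer would have to list pair by pair. The function names are hypothetical, and the teens, such as elf and zwölf, would still need lexical treatment in the generator.

```python
def build_flop_table() -> dict[str, str]:
    """Enumerate every tens+units factorization with its flopped counterpart,
    e.g. '8 10^1 4' -> '4 8 10^1' (German vierundachtzig)."""
    table = {}
    for tens in range(1, 10):
        for units in range(1, 10):
            table[f"{tens} 10^1 {units}"] = f"{units} {tens} 10^1"
    return table


FLOP = build_flop_table()  # 81 explicitly listed pairs


def apply_flop(factorized: str) -> str:
    """Rewrite the final tens+units sequence, if any, into German order."""
    tokens = factorized.split()
    tail = " ".join(tokens[-3:])
    if len(tokens) >= 3 and tail in FLOP:
        return " ".join(tokens[:-3] + [FLOP[tail]])
    return factorized


print(len(FLOP))                              # 81
print(apply_flop("3 10^3 6 10^2 8 10^1 4"))  # 3 10^3 6 10^2 4 8 10^1
```

Nothing here generalizes: every one of the 81 pairs has to be spelled out, which is the sense in which the mapping stays regular but not elegantly so.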
For Malagasy, which has a complete reversal of the logical order (6.8c), it makes more sense to assume that readers, when faced with a number, shift their attention to the end of the numeral string and read (within that string) from right to left, temporarily overriding the normal left-to-right order of reading. Thus, a simple regular mapping between numeral and number name can be maintained, the only added assumption being the low-level processing strategy just described. A similar strategy must in any case be assumed for Hebrew and Arabic, where the script runs from right to left, but Hindu-Arabic numerals run from left to right as they do in other languages.24
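The reading strategy just described can be sketched in Python as follows (function names hypothetical). Scanning the digit string from its right end yields the terms directly in Malagasy order, which is the same sequence obtained by reversing a left-to-right factorization, but no reversal needs to be built into the mapping itself.

```python
def factorize_left_to_right(numeral: str) -> list[tuple[int, int]]:
    """(digit, power of ten) terms in 'logical' order: 3684 -> 3*10^3 ... 4."""
    digits = numeral.replace(",", "")
    n = len(digits)
    return [(int(d), n - 1 - i) for i, d in enumerate(digits) if d != "0"]


def factorize_right_to_left(numeral: str) -> list[tuple[int, int]]:
    """Scan from the right end: the k-th digit from the right multiplies 10^k."""
    digits = numeral.replace(",", "")
    return [(int(d), k) for k, d in enumerate(reversed(digits)) if d != "0"]


print(factorize_right_to_left("3,684"))  # [(4, 0), (8, 1), (6, 2), (3, 3)]
assert factorize_right_to_left("3,684") == factorize_left_to_right("3,684")[::-1]
```

The output order, four, eight tens, six hundreds, three thousands, is the order of the Malagasy number name in (6.8c).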
23 Harald Baayen, personal communication. The evidence is currently only anecdotal, and this claim certainly needs to be supported by experimental evidence.
24 It is interesting to note that when Malagasy was written in the Arabic script (before the early 19th century), this low-level processing step need not have been assumed. The script ran from right to left, of course, but numerals would have been represented, as in Arabic or in present-day Malagasy, from left to right. Thus a reader of Arabic-script-based Malagasy could maintain a right-to-left reading order throughout both the ordinary text and (as in modern Malagasy) the number.
Turning to vigesimal systems, we note that there are two possible strategies for dealing with the mapping between decimal-based Hindu-Arabic numerals and number names. The first would be to insert into the factorization step a step that performs the base conversion between 10 and 20. For numerical representations of a finite size, this can be handled by a regular relation. Once this factorization into powers of 20 is accomplished, the number name generator would work in a vigesimal system precisely as in a decimal system. As a practical matter, however, there seem to be relatively few extant vigesimal systems that have distinct words for anything above a small power of 20. In Basque, for example, the system becomes decimal above $10^{2}$. Merrifield (1968) reports on the Macro-Mayan language Ch'ol, which has words for $20^{1}$, $20^{2}$ and $20^{3}$, thus reflecting an old Mayan purely vigesimal system; but he also notes that the system for the larger numbers has essentially given way to the decimal system of Spanish. Furthermore, there are no data on Ch'ol speakers reading numbers from a Hindu-Arabic decimal representation (assuming it would be possible for them to do this). So for the point at hand, namely the conversion of decimal Hindu-Arabic numerals into vigesimal number names, we cannot deduce anything from the existence of such a system of number names. For simpler (one might even say semi-fossilized) decimal-vigesimal systems such as Basque, it is a relatively straightforward matter to list the vigesimal-based words in the number lexicon, associating them with decimal rather than vigesimal factorizations. The solution is not entirely elegant, but it is not totally unreasonable either, especially since, as was noted above, Basque number names are decimal above $10^{2}$; numbers under $10^{2}$ are thus lexical exceptions to the general pattern of the number name system of Modern Basque.
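The lexical-listing solution for Basque amounts to a small finite table that pairs each decimal tens digit with a number of scores and an optional residual ten. The Python sketch below (names hypothetical; Basque word forms deliberately left out) builds such a table and checks that, for every number below 100, the lookup agrees with genuine base-20 conversion.

```python
# Decimal tens digit -> (number of scores, residual ten); e.g. 8 -> (4, 0), 9 -> (4, 10).
TENS_TO_VIGESIMAL = {t: (t // 2, 10 * (t % 2)) for t in range(10)}


def vigesimal_below_100(n: int) -> tuple[int, int]:
    """Return (scores, remainder) for 0 <= n < 100 by lookup on decimal digits."""
    if not 0 <= n < 100:
        raise ValueError("the listing only covers numbers below 10^2")
    tens, units = divmod(n, 10)
    scores, residual_ten = TENS_TO_VIGESIMAL[tens]
    return scores, residual_ten + units


print(vigesimal_below_100(84))  # (4, 4): 'four+score four', cf. (6.8d)
print(vigesimal_below_100(95))  # (4, 15): four scores plus 'ten' plus five
assert all(vigesimal_below_100(n) == divmod(n, 20) for n in range(100))
```

Because the table has only ten entries and never needs to extend past $10^{2}$, it is an ordinary finite lexical listing rather than a general base-conversion step.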
It should be stressed, if it is not already clear, that what I have presented is not a linguistic theory of number names, but rather a model of the mapping between a numerical representation (the Hindu-Arabic system) and the number name systems of various languages. Number name systems themselves can be quite complex: Stampe, for example, gives an example of an exotic system employed in Sora, a Munda language of India (Stampe, 1976, page 601):
Most Munda languages have decimal-vigesimal counting: they count 10, 20, 20 + 10, 2 x 20, (2 x 20) + 10. Sora changed from a decimal to a duodecimal (12) base within this vigesimal structure. Soras therefore add units to 12 to reach 19 miggəl-gulji (12 + 7); then count 20 b -kor.i (1 x 20) and add units to reach 32 b -kor.i-miggəl ((1 x 20) + 12), to which are added units to reach 39 b -kor.i-miggəl-gulji ((1 x 20) + 12 + 7); 40 is ba-kor.i (2 x 20), and so on, in a Stravinskian alternation of twelves and eights unparalleled in any known language.
The current theory would be able to provide a model for the reading of Sora number names from a Hindu-Arabic decimal representation, but of course it would not really account for the form of the number names themselves, a topic that falls rather under the domain of a linguistic theory of number names.
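The arithmetic skeleton behind the Sora pattern in the quotation is easy to compute, which illustrates the division of labor just described: the Python sketch below supplies none of the Sora vocabulary or morphology, which belongs to a linguistic theory of number names. The function name and the restriction to numbers below 100 are assumptions of the sketch, not claims about Sora beyond Stampe's examples.

```python
def sora_skeleton(n: int) -> list[str]:
    """Arithmetic structure of 1 <= n < 100 in the Sora pattern quoted from Stampe:
    scores (k x 20), then 12 if the remainder reaches twelve, then units."""
    if not 1 <= n < 100:
        raise ValueError("sketch covers 1..99 only")
    scores, rest = divmod(n, 20)
    terms = []
    if scores:
        terms.append(f"{scores} x 20")
    if rest >= 12:
        terms.append("12")  # twelve as an intermediate base inside each score
        rest -= 12
    if rest:
        terms.append(str(rest))
    return terms


for n in (19, 32, 39, 40):
    print(n, " + ".join(sora_skeleton(n)))
```

The four printed decompositions correspond to Stampe's 19, 32, 39 and 40; mapping them onto actual Sora words is exactly the part the sketch leaves out.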
In summary, it is indisputable that the primary representational function of the Hindu-Arabic numeral system is mathematical, not linguistic. Indeed, Hindu-Arabic numerals are often held up as an archetypal example of a patently non-linguistically motivated written representation. However, the mapping between numerals and their associated number names in a large variety of languages can, somewhat surprisingly, be handled by a model that is consistent with the more general model of writing systems that has been developed in the remainder of this book.
6.4 Abbreviatory Devices
The literature on writing and writing systems contains very little discussion of abbreviations, acronyms, and other shortening devices falling under the general rubric of "initialisms". This is perhaps unsurprising given the heavy focus in that literature on what linguistic objects written symbols represent, and how they represent them in the spelling of "ordinary words". Yet it is at the same time somewhat of an oversight, since abbreviatory devices of various kinds have a history that is as old as that of writing itself (Cannon, 1989; Romer, 1994).
Abbreviations, as defined below, are of particular practical importance in the development of TTS systems since the system must decide on how to read them, given
6.4. ABBREVIATORY DEVICES 197
that they typically do not obey the normal "pronunciation rules" of the language. For standardized cases such as ⟨Blvd⟩ for Boulevard, this is less of a problem since such cases can be catalogued. (There is, however, the problem that many abbreviations can conventionally stand for more than one thing, as in ⟨St⟩ (Street, Saint) or ⟨Dr⟩ (Drive, Doctor, drachma); or else are confusable with ordinary words, as in ⟨Ave⟩ (Avenue, or ave as in ave maria). These issues are amenable to sense disambiguation techniques, such as those of Yarowsky (1996).) But creatively coined abbreviations are not at all uncommon, and in certain genres, such as real estate advertisements, they are rife. Consider the example in (6.11), taken from the New York Times real estate ads for January 12, 1999:
(6.11) 2400' REALLY! HI CEILS, 18' KIT, MBR/Riv vu, mds, clsts galore! $915K.
Here we find ⟨CEILS⟩ (ceilings), ⟨KIT⟩ (kitchen), ⟨MBR⟩ (master bedroom), ⟨Riv vu⟩ (river view), ⟨mds⟩ (maids (room)?) and ⟨clsts⟩ (closets), none of which are standard abbreviations, at least not in general written English. While human readers (usually) have no problem reconstructing the intended words, these are a serious problem for TTS systems, which generally will fail to correctly expand the abbreviation in such cases. Clearly, though, abbreviation is a productive process that must be modeled in any theory of the relation between language and writing.
The purpose of this section is to propose how abbreviatory devices might fit into the theory that we have developed. Before we proceed, however, it is necessary to define some of our terminology, since terms such as abbreviation, acronym and so forth are used in different ways by different people; see (Cannon, 1989) for a discussion of some of this terminological quagmire.
For the purposes of the present discussion, we will distinguish three categories. The first category, abbreviations, constitutes all cases where the normal spelling of a construction (typically, though by no means always, a single word) is shortened either by deleting letters (e.g. ⟨St.⟩, ⟨Dr.⟩, ⟨kg⟩, ⟨CEILS⟩ or ⟨clsts⟩), or by substituting a shorter string of symbols which is synchronically unrelated to the target word: the latter cases include ⟨lb⟩ for pound, ⟨%⟩ for percent, ⟨&⟩ for and, and ⟨$⟩ for dollar. Note that my use of the term abbreviation differs from Cannon's, on which see below. What all of these cases have in common, and what sets them apart from the other two categories to be discussed momentarily, is that they all involve shortened forms where it is nonetheless intended that one read the full form of the word. Thus, when encountering the abbreviation ⟨lb⟩, one would normally read pound or pounds, but not l. b.
The second category, which we shall term letter sequences, behaves differently: in this case the intention is that one read them as sequences of letters, irrespective of what they stand for. Thus ⟨CIA⟩, ⟨USA⟩ and ⟨ACL⟩ are to be read as sequences of letters, despite the fact that they stand for (among other things) Central Intelligence Agency, United States of America and Association for Computational Linguistics. Note that Cannon includes cases such as these under the rubric of abbreviation, though these really differ in kind from what I have termed abbreviations above, since letter sequences are not generally to be expanded into a word or set of words. On the other hand, what I term letter sequences are often popularly called acronyms, a term which is more properly used to name the third category. To avoid these potential confusions, then, I suggest the term letter sequence. Typically a letter sequence is formed from the initial letters of the words of the phrase being abbreviated, though function words are often omitted in this computation (as in ⟨USA⟩). Periods may be used within the letter sequence, though these seem never to be required; see (Cannon, 1989) for further details.
The third category are acronyms, which can be thought of as letter sequences that are to be read as words. Well-known examples are ⟨NATO⟩, ⟨UNESCO⟩ and ⟨AIDS⟩. The formation principles of acronyms are similar to those of initial letter sequences, but there are differences. Acronyms are more likely than letter sequences to have additional letters added beyond the initial letters of the constituent non-function words; Cannon (page 114) cites examples such as ⟨APEX⟩ from advance purchase excursion. And acronyms can be longer than letter sequences: the initial letter sequences in Cannon's corpus had maximally five letters, whereas acronyms could have as many as eight letters (pages 110–113).
Unlike abbreviations, both letter sequences and acronyms are derived most often from multi-word phrases.
How are these various classes of initialisms accounted for within the current model? Let us start with abbreviations, which are the easiest to describe. For standardized abbreviations it makes sense to simply assume that they are listed as an alternative orthographic entry for the word, or words, that they are associated with. This would yield a representation such as the one in (6.12) for ⟨Dr⟩ representing Doctor:
(6.12)  [ PHON  the phonological form of doctor
          ORTH  ⟨D r⟩ ]
Note that there are some words for which there is no standard non-abbreviated form: in English these include Mrs (missus is a possible full spelling, but this is in fact hardly ever used), and Ms. In these cases one assumes there is simply no full orthographic entry.
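One minimal way to picture (6.12) computationally, assuming a lexicon is nothing more than a mapping from a lexical item to its list of written forms, is to invert that mapping into an index from written form to candidate items. The item labels and the `LEXICON` table below are hypothetical. Ambiguous abbreviations such as ⟨Dr⟩ and ⟨St⟩ then show up as tokens with more than one candidate, which is the point at which a sense-disambiguation step would have to choose; no such step is sketched here.

```python
from collections import defaultdict

# Hypothetical toy lexicon: lexical item -> its orthographic forms.
# A standardized abbreviation is just one more ORTH entry; an item such
# as "missus" has no full orthographic entry, only <Mrs>.
LEXICON = {
    "doctor": ["Doctor", "Dr"],
    "drive": ["Drive", "Dr"],
    "street": ["Street", "St"],
    "saint": ["Saint", "St"],
    "boulevard": ["Boulevard", "Blvd"],
    "missus": ["Mrs"],
}


def invert(lexicon):
    """Index each written form to the set of items it can spell."""
    index = defaultdict(set)
    for item, forms in lexicon.items():
        for form in forms:
            index[form].add(item)
    return index


ORTH_INDEX = invert(LEXICON)


def candidate_items(token):
    """All lexical items a token could be read as, sorted for stability."""
    return sorted(ORTH_INDEX.get(token, ()))


print(candidate_items("Dr"))    # ['doctor', 'drive']
print(candidate_items("Blvd"))  # ['boulevard']
```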
For novel abbreviations (cases like ⟨clsts⟩ in (6.11) above) we must assume a device whereby the abbreviation may be derived productively from the normal spelling of the word. It is not entirely clear what the constraints on abbreviation formation are: clearly in English vowels are particularly prone to being deleted, and there seems to be a tendency to delete non-initial consonants too, but beyond this it is hard to say clearly what makes a good versus an unacceptable abbreviation. However, it seems likely that whatever the constraints are, they can be described in terms of regular relations. This being so, we can model productively-formed abbreviations by composing an additional abbreviation transducer $\mathcal{A}$ onto the output of $\mathcal{M}_{ORL\rightarrow\Gamma}$: $\mathcal{M}_{ORL\rightarrow\Gamma} \circ \mathcal{A}$. This predicts that abbreviations, as we have defined them, can be formed purely on the basis of the orthographic form. It must therefore be possible to recast apparent phonological influences on abbreviation (if any) in purely orthographic terms. Whether this is true or not remains to be seen.
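A rough Python sketch of this architecture follows, with finite relations (sets of pairs) standing in for transducers. It assumes a toy spelling relation `M` from hypothetical lexical labels to standard spellings, and a deliberately permissive abbreviation relation `A` that keeps the first letter and may delete any of the others; the real constraints, which the text notes are not yet understood, would narrow `A` considerably. Composing the two and reading the result backwards recovers the intended word for several of the forms in (6.11).

```python
from collections import defaultdict
from itertools import combinations


def deletions(spelling, min_len=2):
    """Toy abbreviation relation: keep the first letter, then keep any
    ordered subset of the remaining letters (the rest are deleted)."""
    first, rest = spelling[0], spelling[1:]
    out = set()
    for k in range(len(rest) + 1):
        for kept in combinations(range(len(rest)), k):
            form = first + "".join(rest[i] for i in kept)
            if len(form) >= min_len:
                out.add(form)
    return out


def compose(r, s):
    """Relational composition: (x, z) whenever (x, y) in r and (y, z) in s."""
    by_input = defaultdict(set)
    for y, z in s:
        by_input[y].add(z)
    return {(x, z) for x, y in r for z in by_input[y]}


# Hypothetical spelling relation: lexical label -> standard spelling.
M = {("closet.PL", "closets"), ("maid.PL", "maids"),
     ("ceiling.PL", "ceilings"), ("kitchen", "kitchen"), ("view", "view")}
# Abbreviation relation, restricted to the spellings that M can produce.
A = {(y, a) for _, y in M for a in deletions(y)}
M_THEN_A = compose(M, A)


def expand(abbrev):
    """Lexical labels whose spelling can be abbreviated to abbrev."""
    return sorted(x for x, z in M_THEN_A if z == abbrev.lower())


print(expand("clsts"), expand("CEILS"), expand("vu"))
```

Here ⟨clsts⟩, ⟨mds⟩, ⟨CEILS⟩ and ⟨KIT⟩ all expand correctly, but ⟨vu⟩ for view is not reachable by deletion alone, so under this sketch it would be one of the apparently phonological cases that the text says would have to be recast in orthographic terms.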
For acronyms and letter sequences the model must be different. Acronyms like ⟨NATO⟩ and letter sequences like ⟨CIA⟩ certainly represent, respectively, North Atlantic Treaty Organization and Central Intelligence Agency (or Culinary Institute of America). But they are not generally to be read as such. Rather ⟨NATO⟩, for instance, is the orthographic rendition of a lexical item that happens to denote the same as North Atlantic Treaty Organization, but is pronounced /neIto/:
(6.13)  [ PHON  /neIto/
          ORTH  ⟨N A T O⟩ ]
A similar analysis would be given for ⟨CIA⟩, where the syllables /si aI eI/ are represented orthographically by the letters with the corresponding names:
(6.14)  [ PHON  /si aI eI/
          ORTH  ⟨C I A⟩ ]
Thus, whereas abbreviations merely constitute shorter ways of writing existing lexical items, acronyms and initial letter sequences correspond to new lexical items. The creation of acronyms and initial letter sequences is thus a type of word formation (Aronoff, 1976; Cannon, 1989), applying to the orthographic representation of the word, rather than (as would normally be the case) to the phonological representation. Once one has the new orthographic form, the pronunciation can be derived in one of two ways. In the case of acronyms, the relation that maps between phonology and orthographic form ("spelling rules") can be inverted ("grapheme-to-phoneme rules") to produce a phonological representation. In the case of initial letter sequences, the phonological representation is formed out of the normal names of the letters; note here, though, that other devices such as descriptions of the letter sequence involved (Triple A for ⟨AAA⟩) are possible.
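The letter-sequence route lends itself to a very small sketch. The Python below assumes English letter names written in ordinary spelling rather than in a phonemic transcription (the `LETTER_NAMES` table is an illustration, not a lexicon from the text), ignores the optional periods, and can optionally describe runs of identical letters in the manner of Triple A. The acronym route, which inverts the spelling relation, is not modelled.

```python
# Illustrative English letter names, in ordinary spelling (an assumption).
LETTER_NAMES = {
    "A": "ay", "B": "bee", "C": "cee", "D": "dee", "E": "ee", "F": "ef",
    "G": "gee", "H": "aitch", "I": "eye", "J": "jay", "K": "kay", "L": "el",
    "M": "em", "N": "en", "O": "oh", "P": "pee", "Q": "cue", "R": "ar",
    "S": "ess", "T": "tee", "U": "you", "V": "vee", "W": "double-u",
    "X": "ex", "Y": "wye", "Z": "zee",
}
RUN_WORDS = {2: "double", 3: "triple"}


def read_letter_sequence(token, describe_runs=False):
    """Read a letter sequence by naming its letters in order.

    Anything that is not a letter A-Z (e.g. optional periods) is skipped.
    With describe_runs=True, runs of two or three identical letters are
    read as 'double X' / 'triple X'.
    """
    letters = [c for c in token.upper() if c in LETTER_NAMES]
    words, i = [], 0
    while i < len(letters):
        j = i
        while j < len(letters) and letters[j] == letters[i]:
            j += 1  # find the end of the run of identical letters
        run = j - i
        if describe_runs and run in RUN_WORDS:
            words += [RUN_WORDS[run], LETTER_NAMES[letters[i]]]
        else:
            words += [LETTER_NAMES[letters[i]]] * run
        i = j
    return " ".join(words)


print(read_letter_sequence("CIA"))                      # cee eye ay
print(read_letter_sequence("AAA", describe_runs=True))  # triple ay
```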
While the range of abbreviatory devices possible for English seems to be widely available for many written languages, the distribution of the varying types seems to differ from language to language. (I am not aware of any cross-linguistic surveys of the distributions of types of abbreviatory devices.) In some languages, indeed, certain types seem to be essentially lacking. For example, Chinese seems to have very few abbreviations in the sense that I have used this term, and it will be instructive to digress for a moment and consider the case of Chinese, since it offers an interesting example of how different properties of the writing system can result in different possibilities for abbreviatory devices.
In Chinese, acronyms (termed suoxie, "shrunken writing") abound: thus one finds many standard examples like 北大 bei da for 北京大学 beijing daxue "Beijing University"; 邓选 dengxuan for 邓小平文选 dengxiaoping wenxuan (Deng Xiaoping selected-works) "the selected works of Deng Xiaoping"; and 文革 wenge for 文化大革命 wenhua dageming "Cultural Revolution" (Wang, 1996). An examination of the examples given will reveal that the pieces selected for the suoxie construction need not come from the initial of the corresponding constituent, unlike what one would almost invariably find in English. Nonetheless, Chinese suoxie are like English acronyms in that they are shortened forms of longer constructions which, crucially, are read in their shortened form, not expanded into the construction from which they were derived. Now, as the astute reader will have already noted, the nature of the Chinese writing system makes it impossible to determine whether suoxie is more correctly equated with acronyms, as we have heretofore assumed, or with letter sequences. The key distinction is in how these two kinds of constructions are read: acronyms are pronounced by applying pronunciation rules to the sequence of symbols; initial letter sequences are pronounced by naming the letters in sequence. In Chinese, these two routes yield the same result, since the pronunciation of a character is also the name of that character.
As we noted, Chinese basically lacks abbreviations in the sense that we have defined. With the exception of special symbols like ⟨%⟩ and ⟨$⟩, which are expanded into the corresponding expressions for "percent", "dollar", etc., there are essentially no other cases where a shortened form is expanded during reading: this even applies to borrowed forms, like ⟨kg⟩ or ⟨cm⟩, which are obligatorily treated as abbreviations in English, but which Chinese readers spell out as sequences of letters: thus ⟨kg⟩ is read literally as k. g. (Chilin Shih, personal communication). The apparent avoidance of treating borrowed graphical elements as abbreviations to be expanded when reading could be explained by the fact that Chinese has historically lacked abbreviations. But why did it lack abbreviations?
I believe the explanation may be due to a conspiracy between properties of Chinese morphology, the nature of the Chinese script, and the function of abbreviations. First, Chinese words are typically short, one- and two-syllable words being the overwhelming majority. (This is with the notable exception of nominal compounds, which can be quite long.) In the earliest forms of Chinese, it is commonly conjectured that words were largely monosyllabic (see various chapters in (Packard, 1998) for discussion), and even in later Classical Chinese, which was the written standard up until the early part of this century, monosyllabic words made up a larger proportion of words in a typical text than they would in present-day spoken Mandarin. Second, Chinese characters phonologically almost always represent single syllables. Third, as we observed above, abbreviations are most commonly used to abbreviate single words (though abbreviations of phrases certainly do occur). In Chinese, then, all one could hope to gain in most cases would be the shortening of a word that would be written with two characters (a two-syllable word) into a one-character abbreviation, something that would not have afforded much of a savings. There was therefore little to be gained by introducing graphical shortening devices in the form of abbreviations.25
Abbreviations, as we have defined them, are a purely graphical device intended to shorten the form of written words. Acronyms and initial letter sequences are somewhat more complex than this, but they too depend upon the written form of words. As such, all of these forms of "initialisms" have a place in a complete model of writing systems.
25 Note that other graphical shortening devices were employed, such as a special symbol to indicate reduplicated characters, similar in function to the reduplication markers discussed in Section 4.4.2.
In this section we have made some tentative steps towards fitting these devices into the proposed model.
6.5 Non-Bloomfieldian Views on Writing
It has often been noted that scholars of language have been divided in their attitudes about writing along two partly independent dimensions. The primary dimension relates to whether the study of writing is even interesting: Bloomfield is generally cited as the source of the view that writing itself is not interesting (since written language is merely at best a crude approximation of spoken language, the true object of study), and this view has to a large extent survived in modern generative linguistics.26 The second dimension, assuming one at least accepts that writing systems might be interesting, is how written language relates to language. Specifically, is it in Bloomfield's terms a "way of recording [spoken] language by means of visible marks", or is it indeed a separate form of communication that need not relate to spoken language at all?27
The Praguians, most notably Vachek (e.g. (Vachek, 1973)), have been among the most staunch defenders of the view that written language should be treated separately from spoken language, but a number of British linguists, including Sampson (1985) and Harris (1995), have challenged the essentially "glottocentric" Bloomfieldian view. For example, as we discussed earlier, Sampson distinguishes between glottographic writing systems, where the written symbols represent some aspect of specifically linguistic information, and "semasiographic" writing systems, where the written symbols directly represent the "meaning" of the intended message, giving no information about how one would actually express this meaning linguistically. Crucially, for Sampson both of these kinds of systems count as writing, and this view is echoed by Harris (1995).
Perhaps the clearest instance of the opposing view is offered by DeFrancis (1989), who defines "writing system" as synonymous with "glottographic writing system".28 One central reason is that if one insists that writing systems are exactly those graphical systems of communication wherein one can express any message expressible in spoken language, then the only extant systems that meet this requirement are glottographic ones. To be sure, there are many highly complex notational systems that are arguably not glottographic, and which allow for a rich set of expressions: mathematical notation, dance and other movement notation systems, music (see various chapters in (Daniels and Bright, 1996)), and even the icon-based messages one frequently sees in (especially European) instruction manuals (Sampson, 1985, pages 31–32). All of these examples involve symbology that is conventional to a greater or lesser degree; all of them are clearly systems of communication that are complex to a greater or lesser degree; and all of them are able to communicate messages that can be quite complex, especially if one had to put them into words. But all of them are highly restricted in the domain to which they apply, and this is, to take DeFrancis' position, the crux of the matter. To make the same point in a different way, while it may be painful to precisely express a complex mathematical expression using ordinary written English, this is something that could be done, just as one could read the expression aloud and have it be understood by someone with sufficient mathematical knowledge. In contrast, it would be hard to see how one could represent the American Declaration of Independence using only the symbology of mathematics. Thus glottographic systems are general, whereas semasiographic systems are arguably restricted.

26 It would be a mistake, however, to view this as any kind of religious dogma in generative linguistics that is inculcated in succeeding generations of disciples. When I was a graduate student at MIT in the early 80's, I do not recall writing systems being discussed in any form, either positively or negatively. Rather, the lack of attention to writing systems among generative linguists arises, I believe, simply because few in that tradition have thought much about the topic, and none have been encouraged to do so.

27 Of course, as Harris rightly observes (1995, page 45), the existence of Braille shows that writing does not have to be visibly arranged, merely spatially arranged.

28 Indeed, as we have seen, he goes much further than this, insisting that full writing systems must not only be glottographic, but at least to some degree phonographic.
Should non-glottographic systems be considered writing? On the face of it this would appear to be purely a matter of definition, and hardly worth arguing about. However, there is one important property that non-glottographic systems such as mathematical notation share with (glottographic) written language, which neither of them shares with speech, and that is the use of a two-dimensional surface: speech is produced and processed over time, and therefore could be considered to be a one-dimensional signal; written forms, whether glottographic or otherwise, usually have two dimensions at their disposal, and frequently make use of them in ways that have no parallel in speech.
For example, as Harris notes (pages 141–144), crucial use has been made in diverse mathematical traditions of tabular arrangements of symbols: multiplication tables and logarithm tables are just two modern instances of these. The tabular arrangement is crucial for representing the relevant mathematical concepts: for example, in a multiplication table, one understands the entry in each cell of the table as representing the product of the number heading the relevant column and the number heading the relevant row. Speech cannot adequately represent this two-dimensional structure. As Harris observes (page 144): "If mathematics had had to rely on speech as its cognitive mode, we should still be living in a primitive agricultural society."29
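A small Python illustration of the contrast: in the table, each cell's meaning is fixed by its position under a column heading and beside a row heading, whereas a one-dimensional rendering has to restate both headings for every cell. The left-to-right, top-to-bottom order and the function names are assumptions made for the sketch; it is not a model of any particular reading system.

```python
def multiplication_table(n):
    """n x n table; the cell in row r, column c holds r * c (1-based)."""
    return [[r * c for c in range(1, n + 1)] for r in range(1, n + 1)]


def linearize(table):
    """Flatten the table for a one-dimensional channel such as speech.

    Position carries no meaning in a linear signal, so each cell is
    rendered together with the row and column headings it depends on.
    """
    spoken = []
    for r, row in enumerate(table, start=1):
        for c, value in enumerate(row, start=1):
            spoken.append(f"{r} times {c} is {value}")
    return spoken


print(linearize(multiplication_table(2)))
# ['1 times 1 is 1', '1 times 2 is 2', '2 times 1 is 2', '2 times 2 is 4']
```

The table itself stores only the products; everything that makes them interpretable in the linear version has to be added back explicitly.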
Similar uses of two-dimensional layout can also be found, of course, in glottographic writing. One case that Harris points to is pattern poetry, where words in the poem are arranged so as to evoke a picture; there are also instances of "pattern prose", the most famous of these in English being, perhaps, the mouse's tale/tail in Lewis Carroll's Alice's Adventures in Wonderland. A modern instance is the e-mail signature block, a made-up (but perfectly realistic) example of which is given in Figure 6.2; here we see the use of two-dimensional arrangement in the separation of the postal address from the name in the left-hand column, the e-mail address in the top right-hand column, and the integration of the verbiage from the company logo ("Lucent Technologies, Bell Labs Innovations") into the rest of the design. There are many other examples that could be given.

[Figure 6.2: Two-dimensional layout in an e-mail signature]

29 Having said this, it should also be pointed out that there are situations where one is forced to represent tabular information in speech: such is the case of reading systems for the blind. One ingenious technical solution that circumvents some of the limitations of normal spoken language was developed by T. V. Raman (1994) in his text-preprocessing system named Aster. Raman's system renders LaTeX documents into speech (using the DECTalk TTS system), and includes methods for rendering various levels of document structure, as well as mathematical expressions, including matrices, which are of course a kind of tabular representation. Tables are read left-to-right and top to bottom, with the position in the table being mimicked by the perceived position of the voice in the auditory field.
So non-glottographic forms of "writing" share with glottographic forms the property of using two-dimensional space in ways that have no direct counterpart in (one-dimensional) speech. Furthermore, this "layout analysis" (to borrow and somewhat adapt a term from document image processing) is clearly a field worthy of study in its own right. But what are we to make of this observation? Does it force us to view (e.g.) mathematical notation and ordinary written English as being two instances of the same class of object? And must the term "writing" apply to both? I fail to see why: presumably one could restrict the term "writing" to glottographic representational systems, and use a separate term to denote forms of symbolic representation that make crucial use of two dimensions. Glottographic writing systems, in their full glory, would be instances of both; mathematical and other non-glottographic systems would be instances only of the latter. It comes down, after all, to a matter of definition.
Of course this view does entail that there are interesting aspects of (glottographic) writing that go beyond the way in which writing represents speech. This conclusion seems incontrovertible, but it is important to realize that this in no way contradicts the theory of writing systems presented in this book, which deals solely with the mapping between written and spoken form.
6.6 Postscript
I have presented a formal theory of orthography, making specific proposals about what linguistic objects are represented, what level of linguistic representation (what we have termed the ORL) may be represented, and what the constraints on the mapping between linguistic and graphical representation are. The question of what specific kinds of linguistic objects are represented is, of course, a topic that has occupied much of the literature on writing systems; the level of linguistic representation has only been discussed extensively in the psycholinguistic literature, and there only in superficial terms; constraints on the mapping between linguistic and written form have hardly been discussed at all. There is therefore some reason to believe that the current work is the most systematic formal proposal for a theory of writing systems presented to date. It is, nonetheless, only a beginning, and it is hoped that this work will serve as a stimulus for developing a much more complete theory of writing systems by a much wider group of researchers.
Such a theory is clearly necessary for a variety of reasons. Consider, for example, that orthographic evidence has occasionally been used by generative linguists to support one or another (usually) phonological theory. We have already discussed Chomsky and Halle's views on English spelling and its relation to their model of English phonology; one could add to this Steriade's (1982) and subsequently Miller's (1994) use of evidence from Linear B to support a model of syllable structure for Greek. Miller's study is broad in scope and systematic, but most use of orthographic evidence that one finds in the literature, including Steriade's and Chomsky and Halle's, is limited, and mostly ad hoc. In some cases (plausibly Steriade's) the analysis may turn out to be correct; in others (Chomsky and Halle's) it is suspect. But the real point here is that a fortuitously selected orthographic fact held up as evidence for a particular linguistic claim cannot be readily evaluated in the absence of a serious theory of orthography. There is nothing special about orthography in this regard: in an entirely similar vein, a phonological factoid brought in as evidence for a particular syntactic analysis should not be taken seriously without a good understanding of the relation between syntax and phonology. Orthography deserves the same level of respect.
A coherent theory of the relation between writing and linguistic form is also needed in speech technology, which was the starting point of our discussion. Many speech technology researchers, both in text-to-speech and automatic speech recognition, implicitly view the standard orthography for a language like English as a poor kind of phonetic transcription. Thus one hears terms like "letter-to-sound rules" used as if somehow the sequence of letters ⟨enough⟩ was simply a lousy phonetic transcription of the sequence of sounds that would (in standard IPA) be represented by /ənʌf/. Recognizing that for a complex orthography like English the development of a letter-to-sound component is a major undertaking, there has been, over the past decade or so, a large amount of interest in automatic methods for acquiring letter-to-sound systems: (Sejnowski and Rosenberg, 1987; Luk and Damper, 1993; Adamson and Damper, 1996; Luk and Damper, 1996; Daelemans and van den Bosch, 1997) are some of the better known instances of these. Such systems can automatically "learn" the context-dependent transductions needed for a system like that of English, so that one might expect rough, trough and through to be correctly pronounced. (To date, though, nobody has yet demonstrated performance, for English, at the level of a more traditionally designed system including a dictionary plus morphological or phonological rules.) In so doing, such systems take advantage of a crucial property of English orthography: while the mapping between a particular letter and a given sound is highly complex, one can almost always find a good answer by looking at the context within a fairly small window (say plus or minus four letters) around the target letter.
Even so, there is still a large amount of indeterminacy. Alternations like produce (noun) versus produce (verb), or axes /ˈæksəz/ (plural of ax) versus axes /ˈæksiz/ (plural of axis), demonstrate that one has to be prepared to make use of information not found in the letter string alone: in general one must use lexical, grammatical or semantic information that can only be inferred from examining a wider context than just the individual word. Letter strings in English do encode pronunciation, but only in combination with other information that cannot be computed from the letter string alone.
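The window-based idea, and the indeterminacy just described, can both be seen in a few lines of Python. The sketch assumes a window of four letters on either side, padded with `_` at word edges, and a toy training set of (word, letter index, vowel label) triples whose labels are hand-assigned for illustration; it is not the feature set of any of the cited systems. Because the two words spelled ⟨axes⟩ produce the same window around ⟨e⟩, a table keyed on the window alone is left with two labels and no way to choose between them.

```python
from collections import defaultdict

PAD = "_"


def window(word, i, k=4):
    """The letter word[i] with k letters of context on each side."""
    padded = PAD * k + word + PAD * k
    return padded[i : i + 2 * k + 1]


def train(examples, k=4):
    """Map each context window to the set of labels seen with it."""
    table = defaultdict(set)
    for word, i, label in examples:
        table[window(word, i, k)].add(label)
    return table


# Hypothetical hand labels for the vowel letter <e> (index 2) in <axes>.
examples = [
    ("axes", 2, "ə"),  # plural of ax:   /ˈæksəz/
    ("axes", 2, "i"),  # plural of axis: /ˈæksiz/
]
table = train(examples)
print(window("axes", 2), sorted(table[window("axes", 2)]))
# __axes___ ['i', 'ə']
```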
So, to the extent that automatic methods for word pronunciation presume that all of the information needed to pronounce a word is found in its letter string, they are missing a basic point about how writing systems represent linguistic information. Orthography is not phonetic transcription; rather it is a guide to the native reader of the language that frequently gives a large amount of information about how to pronounce words, but also invariably assumes that the reader has other linguistic knowledge to bring to bear on the problem of decoding the message. This is a key reason why automatic TTS conversion is so hard to do right: most of the linguistic knowledge that humans bring to bear on the task of reading is simply missing from TTS systems.
To further drive home this point, consider again the case of Russian orthography. As we have seen, Russian spelling is highly regular (much more so than English), but there is one crucial piece of information missing from the standard spelling, namely the lexical stress placement, without which one cannot predict the quality of various vowels in the word: in Russian, lexically-determined stress placement of the kind illustrated by English produce/produce is rampant. Stress placement information can be predicted from morphological information, and if such information were added to the strings (either by some dictionary-plus-rule-based procedure, or by some as-yet-to-be-developed high accuracy automatic inference procedure), then of course the various automatic schemes that have been proposed should have no trouble learning the relation between these annotated orthographic strings and the pronunciation. But this exercise would largely defeat the stated purpose of most work on automatic learning of "letter-to-sound" rules, in that there would not be much savings of labor. On the one hand, developing the morphological analysis tools for Russian such that one can predict the appropriate morphological features to add to a given string is itself a major undertaking;30 and once one has this portion of the system, developing the "letter-to-sound" rules is relatively straightforward. It should be added that to date there are no known methods by which the morphological system of a language as morphologically complex as Russian could be automatically learned; automatic methods thus fail to save labor precisely where the savings is needed most.
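A toy Python sketch of why the stress annotation matters: it works on a Latin transliteration and applies a single simplified rule, that an unstressed ⟨o⟩ is pronounced [a]. Real Russian vowel reduction distinguishes further degrees (for instance [ɐ] versus [ə] depending on position) and affects other vowels as well, so this is an illustration under that assumption, not a Russian letter-to-sound component; the function name `toy_reduce` is hypothetical. Given the stress index, the same spelling zamok yields different outputs for zámok "castle" and zamók "lock".

```python
VOWELS = set("aeiouy")


def toy_reduce(translit, stressed):
    """Apply a toy version of Russian akanye to a transliterated word.

    translit: lowercase Latin transliteration of the word
    stressed: index of the stressed vowel letter (the information the
              standard spelling leaves out)
    Every <o> other than the stressed one is rendered as 'a'.
    """
    if translit[stressed] not in VOWELS:
        raise ValueError("stress must be on a vowel letter")
    return "".join(
        "a" if ch == "o" and i != stressed else ch
        for i, ch in enumerate(translit)
    )


print(toy_reduce("zamok", 1))   # zamak  (zámok 'castle')
print(toy_reduce("zamok", 3))   # zamok  (zamók 'lock')
print(toy_reduce("moloko", 5))  # malako (molokó 'milk')
```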
It should be clear that my goal is not to argue against the investigation of automatic methods for learning word pronunciation: on the contrary, investigation of these and similar machine-learning problems is an interesting and important line of inquiry. But such investigations should be grounded in a proper understanding of the phenomenon that one is attempting to investigate, and this understanding is frequently lacking in the speech technology community. As a result, any regular attendee of speech technology conferences will be subjected to a series of quite surprising claims to the effect that since such-and-such an automatic method performs with (say) a 10% error rate on English word pronunciation (English being well known to be the hardest language, or so the supposition goes), the same technique can be applied to any other language, thus obviating the need for manual linguistic labor. Worse, since most regular attendees at speech conferences do not know any better, such claims do not raise the number of eyebrows that they ought to.

30 In our experience, several months at least is required: see (Sproat, 1997b).
It is certainly optimistic to assume that a well-articulated formal theory of writing systems will ipso facto raise the general level of awareness of orthography and its relation to linguistic form. But it is also certain that without such a theory, writing systems will not generally be deemed worthy of serious study by theoretical linguists, nor will much attention be paid to their properties by speech technologists.
REFERENCES
Adamson, Martin and Robert Damper. 1996. A recurrent network that learns to pronounce English text. In Proceedings of the Fourth International Conference on Spoken Language Processing, volume 3, pages 1704–1707, Philadelphia, PA. ICSLP.
Allen, Jonathan, M. Sharon Hunnicutt, and Dennis Klatt. 1987. From Text to Speech: the MITalk System. Cambridge University Press, Cambridge.
Aronoff, Mark. 1976. Word Formation in Generative Grammar. MIT Press, Cambridge, MA.
Aronoff, Mark. 1985. Orthography and linguistic theory. Language, 61(1):28–72.
Aronson, Howard. 1996. Yiddish. In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 735–742.
Avanesov, R. I. 1974. Russkaya dialektnaya fonetika (Russian Dialect Phonetics). Prosvescenie, Moscow.
Avanesov, R. I., editor. 1983. Orfoepicheskiy slovar' russkogo yazyka (Orthoepic Dictionary of the Russian Language). Russkiy Yazyk, Moscow.
Balota, David, Giovanni Flores d'Arcais, and Keith Rayner, editors. 1990. Comprehension Processes in Reading. Lawrence Erlbaum Associates, Hillsdale, NJ.
Baluch, B. and Derk Besner. 1991. Visual word recognition: Evidence for strategic control of lexical and nonlexical routines in oral reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17:644–652.
Baxter, William. 1992. A Handbook of Old Chinese Phonology. Number 64 in Trends in Linguistics: Studies and Monographs. Mouton de Gruyter, Berlin.
Bell, Alexander Melville. 1867. Visible Speech: The Science of Universal Alphabetics; or Self-Interpreting Physiological Letters, for the Writing of All Languages in One Alphabet. Simpkin, Marshall, London.
Bennett, Emmett. 1996. Aegean scripts. In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 125–133.
Besner, Derk and D. Hildebrandt. 1987. Orthographic and phonological codes in the oral reading of Japanese kana. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13:335–343.
Besner, Derk and Marilyn Chapnik Smith. 1992. Basic processes in reading: Is the orthographic depth hypothesis sinking? In Ram Frost and Leonard Katz, editors, Orthography, Phonology, Morphology and Meaning, number 94 in Advances in Psychology. North-Holland, Amsterdam, pages 45–66.
Bird, Steven. 1995. Computational Phonology. Cambridge University Press, Cambridge.
Bird, Steven. 1999. Strategies for representing tone in African languages: A critical review. Written Language and Literacy, 2(1). To appear.
Bird, Steven and T. Mark Ellison. 1994. One-level phonology: Autosegmental representations and rules as finite automata. Computational Linguistics, 20(1):55–90.
Bird, Steven and Ewan Klein. 1994. Phonological analysis in typed feature structures. Computational Linguistics, 20:455–491.
Bird, Steven and Mark Liberman. 1999. A formal framework for linguistic annotation. Technical Report MS-CIS-99-01, Department of Computer and Information Science, University of Pennsylvania, Philadelphia.
Bloomfield, Leonard. 1933. Language. Holt, Rinehart and Winston, New York.
Bloomfield, Leonard and Clarence Barnhart. 1961. Let's Read: A Linguistic Approach. Wayne State University Press, Detroit, MI.
Bonfante, Larissa. 1996. The scripts of Italy. In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 297–311.
Booij, Geert. 1996. Verbindingsklanken in samenstellingen en de nieuwe spellingregeling. Nederlandse Taalkunde, 2:126–134.
Brandt Corstius, Hugo, editor. 1968. Grammars for Number Names. Number 7 in Foundations of Language, Supplementary Series. D. Reidel, Dordrecht.
Bright, William. 1996. The Devanagari script. In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 384–390.
Broderick, George. 1984a. A Handbook of Late Spoken Manx: Dictionary, volume 2. Max Niemeyer Verlag, Tübingen.
Broderick, George. 1984b. A Handbook of Late Spoken Manx: Grammar and Texts, volume 1. Max Niemeyer Verlag, Tübingen.
Browman, Catherine and Louis Goldstein. 1989. Articulatory gestures as phonological units. Phonology, 6:201–251.
Browne, Wayles. 1993. Serbo-Croat. In Bernard Comrie and Greville Corbett, editors, The Slavonic Languages. Routledge, London, pages 306–387.
Bunis, David. 1975. A Guide to Reading and Writing Judezmo. The Judezmo Society, Brooklyn, NY.
Cannon, Garland. 1989. Abbreviations and acronyms in English word-formation. American Speech, 64:99–127.
Carlton, Terence. 1990. Introduction to the Phonological History of the Slavic Languages. Slavica, Columbus, OH.
Chen, Hsuan-Chih and Ovid Tzeng, editors. 1992. Language Processing in Chinese. Number 90 in Advances in Psychology. North-Holland, Amsterdam.
Chomsky, Noam and Morris Halle. 1968. The Sound Pattern of English. Harper and Row, New York.
Chou, Phil. 1989. Recognition of equations using a two-dimensional stochastic context-free grammar. Technical report, AT&T Bell Laboratories, Murray Hill, NJ, August.
Church, Kenneth. 1980. On memory limitations in natural language processing. Master's thesis, Massachusetts Institute of Technology, Cambridge, MA.
Church, Kenneth and Patrick Hanks. 1989. Word association norms, mutual information and lexicography. In 27th Annual Meeting of the Association for Computational Linguistics, pages 76–83, Morristown, NJ. Association for Computational Linguistics.
Ci Hai. 1979. Ci Hai. Shanghai Lexicon Publishing Society, Shanghai.
Clements, G. Nick. 1985. The geometry of phonological features. In C. Ewen and J. Anderson, editors, Phonology Yearbook 2. Cambridge University Press, pages 225–252.
Coleman, John. 1998. Phonological Representations: Their Names, Forms and Powers. Number 85 in Cambridge Studies in Linguistics. Cambridge.
Coulmas, Florian. 1989. The Writing Systems of the World. Blackwell, Oxford.
Coulmas, Florian. 1994. Typology of writing systems. In Hugo Steger and Herbert Ernst Wiegand, editors, Schrift und Schriftlichkeit/Writing and its Use, volume 2. Walter de Gruyter, Berlin, chapter 118, pages 1380–1387.
Cregeen, Archibald. 1835. Fockleyr ny Gaelgey (A Dictionary of Manx). Yn Cheshaght Ghailckagh, Douglas, Isle of Man. Reprinted in 1971.
Cubberley, Paul. 1996. The Slavic alphabets. In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 346–363.
Cummings, D. W. 1988. American English Spelling: An Informal Description. Johns Hopkins University Press, Baltimore, MD.
Daelemans, Walter and Antal van den Bosch. 1997. Language-independent data-oriented grapheme-to-phoneme conversion. In Jan van Santen, Richard Sproat, Joseph Olive, and Julia Hirschberg, editors, Progress in Speech Synthesis. Springer, New York, NY, pages 77–89.
Daniels, Peter. 1991a. Is a structural graphemics possible? In LACUS Forum, volume 18, pages 528–37.
Daniels, Peter. 1991b. Reply to Herrick. In LACUS Forum, volume 21, pages 425–31.
Daniels, Peter. 1996a. Aramaic scripts for Aramaic languages. In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 499–514.
Daniels, Peter. 1996b. The study of writing systems. In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 3–17.
Daniels, Peter and William Bright. 1996. The World's Writing Systems. Oxford University Press, New York, NY.
de Gelder, Beatrice and José Morais, editors. 1995. Speech and Reading. Erlbaum (UK) Taylor and Francis, Hove.
DeFrancis, John. 1984. The Chinese Language: Fact and Fantasy. University of Hawaii Press, Honolulu, HI.
DeFrancis, John. 1989. Visible Speech: The Diverse Oneness of Writing Systems. University of Hawaii Press, Honolulu, HI.
DeFrancis, John and J. Marshall Unger. 1994. Rejoinder to Geoffrey Sampson: “Chinese script and the diversity of writing systems”. Linguistics, 32:549–554.
Diller, Anthony. 1996. Thai and Lao writing. In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 457–466.
Dutoit, Thierry. 1997. An Introduction to Text-to-Speech Synthesis. Kluwer, Dordrecht.
Faber, Alice. 1992. Phonemic segmentation as epiphenomenon: Evidence from the history of alphabetic writing. In Pamela Downing, Susan Lima, and Michael Noonan, editors, The Linguistics of Literacy. John Benjamins, Amsterdam, pages 111–34.
Fano, R. 1961. Transmission of Information. MIT Press, Cambridge, MA.
Fant, Gunnar. 1960. Acoustic Theory of Speech Production. Mouton, The Hague.
Fischer, Steven. 1997a. Glyphbreaker. Copernicus, New York.
Fischer, Steven. 1997b. Rongorongo: The Easter Island Script: History, Traditions, Texts. Number 14 in Oxford Studies in Anthropological Linguistics. Oxford University Press, Oxford.
Flesch, Rudolf. 1981. Why Johnny Still Can't Read: A New Look at the Scandal of our Schools. Harper Collins, New York.
Flores d'Arcais, Giovanni, Hirofumi Saito, and Masahiro Kawakami. 1995. Phonological and semantic activation in reading kanji characters. Journal of Experimental Psychology, 21(1):34–42.
Frost, Ram and Leonard Katz, editors. 1992. Orthography, Phonology, Morphology and Meaning. Number 94 in Advances in Psychology. North-Holland, Amsterdam.
Frost, Ram, Leonard Katz, and Shlomo Bentin. 1987. Strategies for visual word recognition and orthographical depth: A multilingual comparison. Journal of Experimental Psychology: Human Perception and Performance, 13:104–115.
Fujimura, Osamu and R. Kagaya. 1969. Structural patterns of Chinese characters. In Proceedings of the International Conference on Computational Linguistics, pages 131–148, Sånga-Säby, Sweden. International Conference on Computational Linguistics.
Gardiner, Alan. 1982. Egyptian Grammar. Griffith Institute, Ashmolean Museum, Oxford, third edition.
Gelb, Ignace. 1963. A Study of Writing. Chicago University Press, 2nd edition.
Giammarresi, Dora and Antonio Restivo. 1997. Two-dimensional languages. In Grzegorz Rozenberg and Arto Salomaa, editors, Handbook of Formal Languages. Springer-Verlag, Berlin, pages 215–267.
Haas, W. 1983. Determining the level of a script. In Florian Coulmas and K. Ehlich, editors, Writing in Focus. Mouton, Berlin, pages 15–29.
Haile, Getatchew. 1996. Ethiopic writing. In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 569–576.
Harbaugh, Rick. 1998. Chinese Characters: A Genealogy and Dictionary. Zhongwen.com.
Harris, Roy. 1995. Signs of Writing. Routledge, London.
Harrison, Michael. 1978. Introduction to Formal Language Theory. Addison-Wesley, Reading, MA.
Hary, Benjamin. 1996. Adaptations of Hebrew script. In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 727–734.
Hopcroft, John and Jeffrey Ullman. 1979. Introduction to Automata Theory, Languages and Computation. Addison-Wesley, Reading, MA.
Horodeck, Richard. 1987. The Role of Sound in Reading and Writing Kanji. Ph.D. thesis, Cornell University, Ithaca, NY.
Hung, Daisy, Ovid Tzeng, and Angela Tzeng. 1992. Automatic activation of linguistic information in Chinese character recognition. In Ram Frost and Leonard Katz, editors, Orthography, Phonology, Morphology and Meaning, number 94 in Advances in Psychology. North-Holland, Amsterdam, pages 119–130.
Hurford, James. 1975. The Linguistic Theory of Numerals. Cambridge University Press, Cambridge.
Ifans, Dafydd and Robert Thomson. 1979. Edward Lhuyd's Geirieu Manaweg. Studia Celtica, 14:127–167.
Instituut voor Nederlandse Lexicologie. 1995. Woordenlijst Nederlandse Taal. Sdu Uitgevers, The Hague.
Johnson, C. Douglas. 1972. Formal Aspects of Phonological Description. Mouton, The Hague.
Kanwisher, N. 1987. Repetition blindness: Type recognition without token individuation. Cognition, 27:117–143.
Kaplan, Ronald and Martin Kay. 1994. Regular models of phonological rule systems. Computational Linguistics, 20:331–378.
Karttunen, Lauri. 1995. The replace operator. In 33rd Annual Meeting of the Association for Computational Linguistics, pages 16–23, Cambridge, MA. ACL.
Karttunen, Lauri. 1998. The proper treatment of optimality in computational phonology. In Lauri Karttunen and Kemal Oflazer, editors, FSMNLP'98: Proceedings of the International Workshop on Finite State Methods in Natural Language Processing, pages 1–12, Bilkent University, Ankara.
Karttunen, Lauri and Kenneth Beesley. 1992. Two-level rule compiler. Technical Report P92-00149, Xerox Palo Alto Research Center.
Katz, Leonard and Laurie Feldman. 1983. Relation between pronunciation and recognition of printed words in deep and shallow orthographies. Journal of Experimental Psychology: Learning, Memory and Cognition, 9:157–166.
Katz, Leonard and Ram Frost. 1992. The reading process is different for different orthographies: the orthographic depth hypothesis. In Ram Frost and Leonard Katz, editors, Orthography, Phonology, Morphology and Meaning, number 94 in Advances in Psychology. North-Holland, Amsterdam, pages 67–84.
Kaye, Alan. 1996. Adaptations of Arabic script. In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 743–762.
King, Ross. 1996. Korean hankul. In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 218–227.
Kiraz, George. 1999. Computational Approach to Non-Linear Morphology. Cambridge University Press, Cambridge.
Klima, Edward. 1972. How alphabets might reflect language. In James Kavanagh and Ignatius Mattingly, editors, Language by Ear and by Eye: The Relationships between Speech and Reading. MIT Press, Cambridge, MA, pages 57–80.
Knight, Stan. 1996. The roman alphabet. In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 312–332.
Koskenniemi, Kimmo. 1983. Two-Level Morphology: a General Computational Model for Word-Form Recognition and Production. Ph.D. thesis, University of Helsinki, Helsinki.
Krivickij, A. and A. Podluzhnyj. 1994. Uchebnik Belorusskogo Yazyka. Vyshejshaya Shkola, Minsk.
Kučera, H. and W. Francis. 1967. Computational Analysis of Present-Day American English. Brown University Press, Providence.
Law, Sam Po and Alfonso Caramazza. 1995. Cognitive processes in writing Chinese characters: Basic issues and some preliminary data. In Beatrice de Gelder and José Morais, editors, Speech and Reading. Erlbaum (UK) Taylor and Francis, Hove, pages 143–190.
Lehman, Winifred and Lloyd Faust. 1951. A Grammar of Formal Written Japanese, chapter Supplement: Kokuji, by R. P. Alexander. Harvard University Press, Cambridge, MA.
Lejeune, Michel. 1974. Manuel de la Langue Vénète. Carl Winter, Heidelberg.
Levin, Esther and Roberto Pieraccini. 1991. Dynamic planar warping and planar hidden Markov modeling: from speech to optical character recognition. Technical report, AT&T Bell Laboratories, Murray Hill, NJ, November.
Lewis, Harry and Christos Papadimitriou. 1981. Elements of the Theory of Computation. Prentice-Hall, Englewood Cliffs, NJ.
Luk, Robert and Robert Damper. 1993. Experiments with silent-e and affix correspondences in stochastic phonographic transduction. In Proceedings of the Third European Conference on Speech Communication and Technology, volume 2, pages 917–920, Berlin. ESCA.
Luk, Robert and Robert Damper. 1996. Stochastic phonographic transduction for English. Computer Speech and Language, 10:133–153.
Lyosik, Yazep. 1926. Gramatyka Belaruskae Movy: Fonetyka. Published by the Author, Minsk.
MacMahon, Michael. 1996. Phonetic notation. In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 821–846.
Macri, Martha. 1996. Maya and other Mesoamerican scripts. In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 172–182.
Maksymiuk, Jan. 1999. An orthography on trial in Belarus. Written Language and Literacy, 2(1):141–144.
Marantz, Alec. 1982. Rereduplication. Linguistic Inquiry, 13:435–482.
Mastroianni, M. and Bob Carpenter. 1994. Constraint-based morpho-phonology. In Proceedings of the First ACL SIGPHON Workshop, Las Cruces, NM. ACL.
Matsunaga, Sachiko. 1994. The Linguistic and Psycholinguistic Nature of Kanji: Do Kanji Represent and Trigger only Meanings? Ph.D. thesis, University of Hawaii, Honolulu, HI.
McManus, Damian. 1996. Celtic languages. In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 655–660.
Merrifield, William. 1968. Number names in four languages of Mexico. In Hugo Brandt Corstius, editor, Grammars for Number Names, number 7 in Foundations of Language, Supplementary Series. D. Reidel, Dordrecht, pages 91–102.
Miller, D. Gary. 1994. Ancient Scripts and Phonological Knowledge. Number 116 in Current Issues in Linguistic Theory. John Benjamins, Amsterdam.
Mohanan, K. P. 1986. The Theory of Lexical Phonology. D. Reidel, Dordrecht.
Mohri, Mehryar. 1994. Syntactic analysis by local grammars automata: an efficient algorithm. In Papers in Computational Lexicography: COMPLEX'94, pages 179–191, Budapest. Research Institute for Linguistics, Hungarian Academy of Sciences.
Mohri, Mehryar. 1997. Finite-state transducers in language and speech processing. Computational Linguistics, 23(2).
Mohri, Mehryar and Richard Sproat. 1996. An efficient compiler for weighted rewrite rules. In 34th Annual Meeting of the Association for Computational Linguistics, pages 231–238, Santa Cruz, CA. ACL.
Mountford, John. 1996. A functional classification. In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 627–632.
Myers, James. 1996. Prosodic structure in Chinese characters. Presented as a Poster at the 5th International Conference on Chinese Linguistics, Tsing Hua University, Taiwan, June.
Nanyang Siang Pau. 1984. Learner's Chinese English Dictionary. Nanyang Siang Pau, Umum Publisher, Singapore.
Needham, J. 1959. Science and Civilisation in China: Vol. 3, Mathematics and the Sciences of the Heavens and the Earth. Cambridge University Press, Cambridge. With the collaboration of Wang Ling.
Neijt, Anneke and Anneke Nunn. 1997. The recent history of Dutch orthography: Problems solved and created. Leuvense Bijdragen, 86:1–26.
Nguyen, Dinh Hoa. 1959. Chữ nôm: The demotic system of writing in Vietnam. Journal of the American Oriental Society, 79(4):270–274.
Nunberg, Geoffrey. 1995. The Linguistics of Punctuation. CSLI (University of Chicago Press), Chicago, IL.
Nunn, Anneke. 1998. Dutch Orthography: A Systematic Investigation of the Spelling of Dutch Words. Number 6 in LOT International Series. Holland Academic Graphics, The Hague.
O'Connor, M. 1996. Epigraphic Semitic scripts. In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 88–107.
Packard, Jerome, editor. 1998. New Approaches to Chinese Word Formation. Number 105 in Trends in Linguistics, Studies and Monographs. Mouton de Gruyter, Berlin.
Perfetti, Charles and Li Hai Tan. 1998. The time course of graphic, phonological and semantic activation in Chinese character identification. Journal of Experimental Psychology: Learning, Memory and Cognition, 24(1):101–118.
Perfetti, Charles A., Laurence Rieben, and Michel Fayol, editors. 1997. Learning to Spell: Research, Theory and Practice across Languages. Lawrence Erlbaum Associates, Mahwah, NJ.
Pesetsky, David. 1979. Russian morphology and lexical theory. MS. Massachusetts Institute of Technology.
Pettersson, John Sören. 1996. Numerical notation. In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 795–806.
Pinker, Steven and Alan Prince. 1988. On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. In Steven Pinker and Jacques Mehler, editors, Connections and Symbols. MIT Press, pages 73–193. Cognition special issue.
Prince, Alan and Paul Smolensky. 1993. Optimality theory. Technical Report 2, Rutgers University, Piscataway, NJ.
Radzinski, Daniel. 1991. Chinese number-names, tree adjoining languages, and mild context-sensitivity. Computational Linguistics, 17(3):277–300.
Rajemisa-Raolison, Régis. 1971. Grammaire Malgache. Centre de Formation Pédagogique, Fianarantsoa, Madagascar. 7th edition.
Raman, T. V. 1994. Audio System for Technical Readings. Ph.D. thesis, Cornell University.
Ratliff, Martha. 1996. The Pahawh Hmong script. In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 619–624.
Ritner, Robert. 1996. Egyptian writing. In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 73–84.
Römer, Jürgen. 1994. Abkürzungen. In Hugo Steger and Herbert Ernst Wiegand, editors, Schrift und Schriftlichkeit/Writing and its Use, volume 2. Walter de Gruyter, Berlin, chapter 135, pages 1506–1515.
Rumelhart, David and James McClelland. 1986. On learning the past tense of English verbs. In James McClelland and David Rumelhart, editors, Parallel Distributed Processing. MIT Press, Cambridge, MA, pages 216–271. Volume 2.
Sagey, Elizabeth. 1986. The Representation of Features and Relations in Non-Linear Phonology. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA.
Salomon, Richard. 1996. South Asian writing systems (introduction). In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 371–372.
Sampson, Geoffrey. 1985. Writing Systems. Stanford University Press, Stanford, CA.
Sampson, Geoffrey. 1994. Chinese script and the diversity of writing systems. Linguistics, 32:117–132.
Schiller, Eric. 1996. Khmer writing. In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 467–473.
Schreuder, Robert, Anneke Neijt, Femke van der Weide, and R. Harald Baayen. 1998. Regular plurals in Dutch compounds: Linking graphemes or morphemes. Language and Cognitive Processes, 13:551–573.
Seidenberg, M., K. McRae, and D. Jared. 1988. Frequency and consistency of spelling-sound correspondences in naming. Presented at the 29th annual meeting of the Psychonomic Society, Chicago, November.
Seidenberg, Mark. 1990. Lexical access: Another theoretical soupstone? In David Balota, Giovanni Flores d'Arcais, and Keith Rayner, editors, Comprehension Processes in Reading. Lawrence Erlbaum Associates, Hillsdale, NJ, pages 33–71.
Seidenberg, Mark. 1992. Beyond orthographic depth in reading: Equitable division of labor. In Ram Frost and Leonard Katz, editors, Orthography, Phonology, Morphology and Meaning, number 94 in Advances in Psychology. North-Holland, Amsterdam, pages 85–118.
Seidenberg, Mark. 1997. Language acquisition and use: Learning and applying probabilistic constraints. Science, 275:1599–1603, March 14.
Seidenberg, Mark and James McClelland. 1989. A distributed, developmental model of visual word recognition and naming. Psychological Review, 96:523–568.
Sejnowski, Terence and C. Rosenberg. 1987. Parallel networks that learn to pronounce English text. Complex Systems, 1:145–168.
Serianni, Luca. 1989. Grammatica Italiana: Italiano comune e lingua letteraria. UTET Libreria, Turin.
Shi, Dingxu. 1996. The Yi script. In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 239–243.
Skjærvø, Oktor. 1996. Aramaic scripts for Iranian languages. In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 515–535.
Smalley, William, Chia Koua Vang, and Gnia Yee Yang. 1990. Mother of Writing: The Origin and Development of a Hmong Messianic Script. University of Chicago Press, Chicago, IL.
Smith, Janet. 1996. Japanese writing. In Peter Daniels and William Bright, editors, The World's Writing Systems. Oxford University Press, New York, NY, pages 209–217.
Sproat, Richard. 1992. Morphology and Computation. MIT Press, Cambridge, MA.
Sproat, Richard. 1997a. Multilingual text analysis for text-to-speech synthesis. Journal of Natural Language Engineering, 2(4):369–380.
Sproat, Richard, editor. 1997b. Multilingual Text to Speech Synthesis: The Bell Labs Approach. Kluwer Academic Publishers, Boston, MA.
Sproat, Richard and Chilin Shih. 1990. A statistical method for finding word boundaries in Chinese text. Computer Processing of Chinese and Oriental Languages, 4:336–351.
Sproat, Richard and Chilin Shih. 1995. A corpus-based analysis of Mandarin nominal root compounds. Journal of East Asian Linguistics, 4(1):1–23.
Sproat, Richard, Chilin Shih, William Gale, and Nancy Chang. 1996. A stochastic finite-state word-segmentation algorithm for Chinese. Computational Linguistics, 22:377–404.
Stampe, David. 1976. Cardinal number systems. In Papers from the 12th Regional Meeting, Chicago Linguistic Society, pages 594–609, Chicago, IL. Chicago Linguistic Society.
Steriade, Donca. 1982. Greek Prosodies and the Nature of Syllabification. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA.
Steriade, Donca. 1999. Paradigm uniformity and the phonetics-phonology boundary. In Michael Broe and Janet Pierrehumbert, editors, Papers in Laboratory Phonology V. Cambridge University Press.
Stone, Gregory and Guy Van Orden. 1994. Building a resonance framework for word recognition using design and system principles. Journal of Experimental Psychology: Human Perception and Performance, 20(6):1248–1268.
Talkin, David. 1995. A robust algorithm for pitch tracking (RAPT). In W. Kleijn and K. K. Paliwal, editors, Speech Coding and Synthesis. Elsevier, New York, NY.
Taraban, R. and James McClelland. 1987. Conspiracy effects in word recognition. Journal of Memory and Language, 26:608–631.
Thomson, Robert. 1969. The study of Manx Gaelic. Proceedings of the British Academy, 60:179–210. Sir John Rhys Memorial Lecture.
Tzeng, Angela Ku-Yuan. 1994. Comparative Studies on Word Perception of Chinese and English: Evidence Against an Orthographic-Specific Hypothesis. Ph.D. thesis, University of California, Riverside.
Vachek, Josef. 1973. Written Language: General Problems and Problems of English. Mouton, The Hague.
van den Bosch, Antal, Alain Content, Walter Daelemans, and Beatrice de Gelder. 1994. Measuring the complexity of writing systems. Journal of Quantitative Linguistics, 1:178–188.
Van Orden, Guy, Bruce Pennington, and Gregory Stone. 1990. Word identification in reading and the promise of subsymbolic psycholinguistics. Psychological Review, 97(4):488–522.
Venezky, Richard. 1970. The Structure of English Orthography. Number 82 in Janua Linguarum. Mouton, The Hague.
Voutilainen, Atro. 1994. Three Studies of Grammar-Based Surface Parsing of Unrestricted English Text. Ph.D. thesis, University of Helsinki, Helsinki. Published as Publications of the Department of General Linguistics, University of Helsinki, no. 24.
Wade, Terence. 1992. A Comprehensive Russian Grammar. Blackwell, Oxford.
Wang, Jason. 1983. Toward a Generative Grammar of Chinese Character Structure and Stroke Order. Ph.D. thesis, University of Wisconsin, Madison, WI.
Wang, Jian, Albrecht Inhoff, and Hsuan-Chih Chen, editors. 1999. Reading Chinese Script. Lawrence Erlbaum Associates, Mahwah, NJ.
Wang, Kuijing. 1996. Xiandai Hanyu Suolueyu Cidian (A Dictionary of Present-Day Chinese Abbreviations). Shangwu Printing House, Beijing.
Wells, J. C. 1982. Accents of English 1: An Introduction. Cambridge University Press, Cambridge.
Wieger, L. 1965. Chinese Characters. Dover, New York. Republication of second edition, published 1927 by Catholic Mission Press.
Yarowsky, David. 1996. Homograph disambiguation in text-to-speech synthesis. In Jan van Santen, Richard Sproat, Joseph Olive, and Julia Hirschberg, editors, Progress in Speech Synthesis. Springer, New York, pages 157–172.