Data available from LAOS:
Example: LAOS text #341, NLS Ms 34.4.3
Year:
The From Inglis To Scots (FITS) Project and Older Scots phonology
FITS (AHRC grant number AH/L004542/1) is a four-year project at the Angus McIntosh Centre for Historical Linguistics.
Focus: the sound/spelling history of early Scots as evidenced in root morphemes of Germanic origin
Main RQ: What phonological facts underlie the diversity of spelling attested in Scots of the period 1380-1500?
Main output: a freely available, fully searchable online database which establishes, quantifies and visualises relations between units of sound and their spellings.
Possible user-defined questions:
• What sound(s) did the digraph <ch> represent in 15th-century Scots?
• When and where is theta-hardening ([θ] > [t]) attested in early Scots spellings?
• What are the reflexes of Old English /f/ in 15th-century Scots?
Historical corpus phonology: can it be done?
Variation in non-standardised alphabetic systems, such as those of pre-modern Europe, has long been exploited to reconstruct diachronic and diatopic alternants in phonological histories (e.g. McIntosh 1956; Laing & Lass 2003). However, electronic corpora for the history of language are rarely built with phonological questions in mind. Historical sound substance is mediated by a graphic system which makes it difficult to interpret the basic facts. The building of historical phonological corpora, while possible, requires a fair degree of preliminary analysis in order to establish the potential sound-spelling mappings of the language. While this may be a painstaking first step, it is surprising that no bespoke tools have thus far been developed to assist in the process.
The original dataset: A Linguistic Atlas of Older Scots (‘LAOS’, Williamson 2008)
• c. 1,250 ‘local documents’: burgh records, charters, deeds, wills, etc.
• c. 400,000 words
• Mostly localised and dated 1380-1500
• Diplomatically transcribed and lexico-grammatically tagged
Bibliography
Aitken, A. J. & Caroline Macafee. 2002. The Older Scots vowels: A history of the stressed vowels of Older Scots from the beginnings to the eighteenth century. Edinburgh: The Scottish Text Society.
Alcorn, Rhona, Benjamin Molineaux, Joanna Kopaczyk, Vasilios Karaiskos, Bettelou Los & Warren Maguire. 2017. ‘The emergence of Scots: Clues from Germanic *a reflexes’. In J. Cruickshank and R. McColl Millar (eds.) Before the Storm: Papers from the Forum for Research on the Languages of Scotland and Ulster triennial meeting, Ayr 2015, pp. 1-32. Aberdeen: FRLSU.
CoNE. 2013. A Corpus of Narrative Etymologies from Proto-Old English to Early Middle English and accompanying Corpus of Changes. Compiled by Roger Lass, Margaret Laing, Rhona Alcorn & Keith Williamson [http://www.lel.ed.ac.uk/ihd/CoNE/CoNE.html]. Edinburgh: Version 1.1, 2013-, © The University of Edinburgh.
Johnston, Paul. 1997. ‘Older Scots phonology and its regional variation’. In Charles Jones (ed.) The Edinburgh history of the Scots language, 47-111. Edinburgh: Edinburgh University Press.
Kopaczyk, Joanna, Benjamin Molineaux, Vasilios Karaiskos, Rhona Alcorn, Bettelou Los & Warren Maguire. 2018. ‘Towards a grapho-phonologically parsed corpus of medieval Scots: Database design and technical solutions’. Corpora 13(2).
Laing, Margaret & Roger Lass. 2003. ‘Tales of 1001 nists: The phonological implications of litteral substitution sets in some thirteenth-century South-West Midland texts’. English Language and Linguistics 7(2), pp. 257-278.
LAOS. 2008. A Linguistic Atlas of Older Scots, Phase 1: 1380-1500. Compiled by Keith Williamson. Retrieved from http://www.lel.ed.ac.uk/ihd/laos1/laos1.html. The University of Edinburgh.
Maguire, Warren, Rhona Alcorn, Benjamin Molineaux, Joanna Kopaczyk, Daisy Smith, Vasilios Karaiskos & Bettelou Los. Forthcoming. ‘Investigating evidence for final [v]-devoicing in Older Scots’.
McIntosh, Angus. 1956. ‘The analysis of Middle English texts’. Transactions of the Philological Society 55(1): 26-55.
Molineaux, Benjamin, Joanna Kopaczyk, Warren Maguire, Rhona Alcorn, Vasilios Karaiskos & Bettelou Los. 2016. ‘Tracing L-vocalisation in early Scots’. Papers in Historical Phonology 1, pp. 187-217.
Molineaux, Benjamin, Joanna Kopaczyk, Warren Maguire, Rhona Alcorn, Vasilios Karaiskos & Bettelou Los. Forthcoming. ‘An emergent 15c Scots spelling norm: contrastive voicing in dental fricatives’.
Grapho-phonological parsing: Mapping spellings to sounds
We assume that our source materials were set down by scribes “capable of sophisticated and subtle linguistic analysis” (Laing & Lass 2003: 258), so we expect there to be a systematic connection (albeit not necessarily a one-to-one match) between orthographic choices and underlying sound systems. Each variant spelling of the root-morphemes in the LAOS corpus is broken up into a sequence of graphemic units, preserving their morphological/graphological context.
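As a rough illustration of what breaking a spelling into graphemic units can look like, the sketch below uses a greedy longest-match over a small inventory of multi-letter units. The inventory, the function name `segment` and the matching strategy are assumptions made for this example only; the poster does not describe the procedure at this level of detail (see Kopaczyk et al. 2018), and the actual parsing may well rely on analyst judgement rather than an automatic rule.

```python
# Hypothetical inventory of multi-letter graphemic units (an assumption
# for this sketch; the project's real inventory is not given here).
MULTI_LETTER_UNITS = {"ch", "th"}


def segment(spelling, units=MULTI_LETTER_UNITS):
    """Split a spelling into graphemic units, preferring the longest match."""
    longest = max((len(u) for u in units), default=1)
    result = []
    i = 0
    while i < len(spelling):
        # Try the longest candidate first, falling back to a single letter.
        for size in range(min(longest, len(spelling) - i), 0, -1):
            chunk = spelling[i:i + size]
            if size == 1 or chunk in units:
                result.append(chunk)
                i += size
                break
    return result


print(segment("dochter"))
print(segment("thief"))
```

A greedy rule like this cannot decide ambiguous cases (for example, whether a letter sequence is one unit or two), which is one reason to treat it as a first pass rather than a finished parse.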
Each grapheme is then assigned a plausible sound value by triangulating on a number of factors (see Kopaczyk et al. 2018 for details):
The Medusa: Grapho-phonological sets visualisation*
Geographical pinpointing of attestations
Viewing attestations in context (texts)
Mapping sounds to sources: The diachronic dimension
Since a sizeable amount of well-described data is available for the Germanic sources of Older Scots (Old English, Norse and Middle Dutch), we can identify most of the likely historical antecedents of our target morphemes. This allows us to pinpoint particular diachronic trajectories for sounds and morphemes, helping us also improve the accuracy of our proposed sound values for the Older Scots period. We attempt to match each sound in the Older Scots layer to the attested form in the relevant (usually northern) dialects of Old English, as well as Norse and Middle Dutch. Where there is a mismatch between the source and the corpus form, we propose a change, drawing on existing literature and the general distribution of our attested variants.
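The mismatch step can be pictured with a small sketch that compares a source form and a corpus form segment by segment and lists the positions where they differ. It assumes the two forms are already aligned and of equal length, which real data will often not be (insertions and deletions need a proper alignment). The function name and the schematic segment lists are invented for illustration and are not attested forms or the project's actual procedure.

```python
def propose_changes(source_segments, corpus_segments):
    """List (position, source_sound, corpus_sound) for every mismatch.

    Assumes the two segment lists are already aligned one-to-one.
    """
    if len(source_segments) != len(corpus_segments):
        raise ValueError("this sketch only handles pre-aligned, equal-length forms")
    return [
        (i, src, tgt)
        for i, (src, tgt) in enumerate(zip(source_segments, corpus_segments))
        if src != tgt
    ]


# Schematic example: an initial [θ] in the source matched to [t] in the
# corpus layer, the kind of mismatch that would call for a change
# such as theta-hardening ([θ] > [t]).
source = ["θ", "a", "n", "k"]
corpus = ["t", "a", "n", "k"]
print(propose_changes(source, corpus))
```

Each reported mismatch would then need a named, documented change before the mapping counts as explained.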
What can you do with a grapho-phonologically parsed corpus?
The corpus allows for a fine-grained examination of the phonotactic and morphotactic distribution of individual sound-spelling pairings as well as variation in their values over time, space and text. It further allows users to:
• Select specific sound, orthographic and grammatical environments
• Define temporal and spatial domains for search results
• Trace etymological sources morpheme-by-morpheme and sound-by-sound
• Link etymological sources to corpus attestations via a Corpus of Changes
• Further investigate forms via links to the online Dictionary of the Scots Language and OED
• Access full source texts for context-checking and creating scribal profiles
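To make the first two search options concrete, here is a minimal sketch of filtering attestations by grapheme, sound, date range and place. The `Attestation` record, its field names and the `search` function are hypothetical and do not reflect the FITS database schema. The spellings and sound values come from the <ch> examples on this poster, but the years and place labels are invented placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attestation:
    form: str       # spelling as attested
    grapheme: str   # graphemic unit of interest
    sound: str      # proposed sound value
    year: int       # invented for this sketch
    place: str      # invented placeholder label


RECORDS = [
    Attestation("aucht", "ch", "x", 1450, "place_A"),
    Attestation("nicht", "ch", "ç", 1390, "place_B"),
    Attestation("chalys", "ch", "ʧ", 1480, "place_A"),
]


def search(records, grapheme=None, sound=None, years=None, place=None):
    """Return records matching every criterion that is not None."""
    hits = []
    for rec in records:
        if grapheme is not None and rec.grapheme != grapheme:
            continue
        if sound is not None and rec.sound != sound:
            continue
        # years is an inclusive (start, end) pair
        if years is not None and not (years[0] <= rec.year <= years[1]):
            continue
        if place is not None and rec.place != place:
            continue
        hits.append(rec)
    return hits


for hit in search(RECORDS, grapheme="ch", years=(1440, 1500), place="place_A"):
    print(hit.form, hit.sound, hit.year)
```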
*Medusa II is under development and will allow mapping of source segments to attestations in our corpus
We start with individual tokens and establish patterns across the entire dataset. As more data is entered in the database, the initial assumptions are reevaluated. Gradually, we establish a network of relationships between the graphemic units and their plausible underlying sounds. We use a bespoke visualisation tool called Medusa.
• Sound substitution sets identify sounds associated with a specific grapheme
• Graphemic substitution sets identify graphemes associated with a particular sound.
Many sounds and graphemes belong to more than one set.
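The two kinds of set are inverse views of the same grapheme-sound pairings, which the following sketch tries to show. The pairings are limited to ones mentioned on this poster (<ch> with six sound values, <th> and <y> for the dental fricatives). The function name and data layout are assumptions for illustration, not the Medusa implementation.

```python
from collections import defaultdict

# (grapheme, sound) pairings mentioned on this poster.
PAIRINGS = [
    ("ch", "x"), ("ch", "ç"), ("ch", "θ"),
    ("ch", "ʧ"), ("ch", "k"), ("ch", "ð"),
    ("th", "θ"), ("th", "ð"),
    ("y", "ð"),
]


def build_sets(pairings):
    """Return (sound substitution sets, graphemic substitution sets)."""
    sound_sets = defaultdict(set)      # grapheme -> sounds it can spell
    graphemic_sets = defaultdict(set)  # sound -> graphemes that spell it
    for grapheme, sound in pairings:
        sound_sets[grapheme].add(sound)
        graphemic_sets[sound].add(grapheme)
    return sound_sets, graphemic_sets


sound_sets, graphemic_sets = build_sets(PAIRINGS)
print(sorted(sound_sets["ch"]))       # sounds associated with <ch>
print(sorted(graphemic_sets["ð"]))    # graphemes associated with [ð]
```

Because [ð] appears under <ch>, <th> and <y>, it belongs to several sets at once, matching the observation that many sounds and graphemes belong to more than one set.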
A Corpus of Changes
Following the example in CoNE (Lass et al. 2013), we give a detailed description of each one of the changes invoked to map the proposed source form to the plausible FITS sound value.
A graphemic substitution set: [ð] & [θ], morpheme initially
The Corpus of Changes:
What have we found out so far?
1. Our period has few localisable innovations, as might be expected from a relatively new dialect (Alcorn et al. 2017).
2. Changes described elsewhere as quick and complete (such as L-vocalisation) may progress slowly over time, phonological environments and the lexicon (Molineaux et al. 2016).
3. Older Scots as a whole innovated distinct spelling conventions such as <y> for [ð] vs. <th> for [θ] (Molineaux et al. forthcoming).
4. Some changes advanced during our period and later probably reversed (especially in the face of Anglicisation), such as the case of pre-inflectional devoicing of fricatives (Maguire et al. forthcoming).
Proportion of medial <y> (orange), <th> (grey) and <þ> (yellow) for etymological [ð] by decade. Black line = data density.
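A chart of this kind rests on a simple tally: group tokens by decade, then compute each spelling's share and the total token count (the data density). The sketch below shows one way to do that. The function name is hypothetical and the token list is invented purely to exercise the code, so it says nothing about the real distribution.

```python
from collections import Counter, defaultdict

# Invented (year, grapheme) tokens, for illustration only.
TOKENS = [
    (1402, "þ"), (1405, "th"),
    (1461, "y"), (1463, "y"), (1468, "th"), (1469, "y"),
]


def summarise_by_decade(tokens):
    """Return (proportions per decade, token count per decade)."""
    counts = defaultdict(Counter)
    for year, grapheme in tokens:
        decade = year // 10 * 10  # e.g. 1463 -> 1460
        counts[decade][grapheme] += 1
    proportions = {}
    density = {}
    for decade, counter in sorted(counts.items()):
        total = sum(counter.values())
        density[decade] = total
        proportions[decade] = {g: n / total for g, n in counter.items()}
    return proportions, density


proportions, density = summarise_by_decade(TOKENS)
for decade in proportions:
    print(decade, proportions[decade], "n =", density[decade])
```

Reporting the density alongside the proportions matters because a share based on a handful of tokens is far less reliable than one based on hundreds.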
leader trailer
A sound substitution set: <ch>
Table view (+ data extraction)
Based on all FITS morphemes with <ch>
• [x] = aucht, dochter, loch …
• [ç] = nicht, echt, hech …
• [θ] = bach, lench, muoch, strencht …
• [ʧ] = chalys, cheike, chekin, cheis …
• [k] = chorn, chynde, chechyne …
• [ð] = worchy, nechtir, skachlaß …
Data capture tool
Grapho-phonological parsing: Corpus annotation for historical phonology
B. MOLINEAUX¹, J. KOPACZYK², V. KARAISKOS¹, D. SMITH¹, W. MAGUIRE¹, R. ALCORN¹ & B. LOS¹
¹ The University of Edinburgh; ² The University of Glasgow
[ð]-morphemes: thus, there, those, thence, etc.
[θ]-morphemes: three, thief, think, thaw, thank, etc.
The FITS Toolbox
spellings
sounds