Jurgis Pakerys (Vilnius University)
Measuring morphological productivity
Graduate School of Linguistics, Philosophy and Semiotics (GSLPS)
Tartu University, March 20, 2017
Outline
1. Productivity and frequency
2. Measuring productivity
2.1. Sources of measurements
2.2. Realized productivity
2.3. Hapax-based measures of productivity
2.3.1. Expanding productivity
2.3.2. Potential productivity
4. Summary
5. References
1. Productivity and frequency
Morphological processes related to lexemes:
• Composition
• Derivation
• Assignment to inflectional classes (= declensions, conjugations)
• Grammatical forms
Frequency vs. productivity
• Frequent = abundant = affects many members
• Productive = alive = attracts/produces many NEW members
Understanding frequency
• Token frequency = number of times a lexeme occurs in the corpus
• Type frequency = number of different lexemes in the corpus that show a given morphological process
Type vs. token, artificial example
• Token frequency of mängi-mine is 567 = the various forms of this noun occur 567 times in a given corpus
• Type frequency of -mine is 14232 = the suffix -mine is found 14232 times in the list of lexemes (not their forms!) of a given corpus
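The difference between the two counts can be sketched in a few lines of Python. This is only an illustration: the tiny corpus, the (word form, lemma) pair format, and the helper name `type_frequency` are assumptions made for the example, and a plain string match on the ending stands in for real morphological analysis.

```python
from collections import Counter

# Hypothetical lemmatized corpus: one (word form, lemma) pair per running word.
corpus = [
    ("mängimine", "mängimine"),
    ("mängimise", "mängimine"),
    ("mängimist", "mängimine"),
    ("lugemine", "lugemine"),
    ("lugemist", "lugemine"),
    ("maja", "maja"),
]

# Token frequency of a lexeme: how often any of its forms occurs in the corpus.
token_freq = Counter(lemma for _form, lemma in corpus)


def type_frequency(suffix, lemma_counts):
    """Type frequency of a suffix: how many different lexemes end in it."""
    return sum(1 for lemma in lemma_counts if lemma.endswith(suffix))


print(token_freq["mängimine"])             # 3
print(type_frequency("mine", token_freq))  # 2
```

The three forms of mängimine add up to one token frequency, while the type count looks only at the list of lexemes.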
Combinations of frequency and productivity
1. Frequent and Productive
• High type frequency
• Attracts new members
2. Frequent and Non-Productive
• High type frequency
• Does not attract new members
3. Productive and Non-Frequent
• Attracts new members
• Low type frequency
4. Non-Productive and Non-Frequent
• Does not attract new members
• Low type frequency
2. Measuring productivity
2.1. Sources of measurements
• Dictionaries
• Corpora
• Questionnaires, tests
  – Open-ended coinage tests, judgment tasks (see, for example, Bolozky 1999)
2.2. Realized productivity
• Number of members of the morphological process in a dictionary/corpus
• Realized productivity, extent of use (Baayen 2009: 904)
• Frequency =/≠ productivity
• Neologisms!
Doing it:
• Get a traditional dictionary or a list of all lemmas of the corpus
• Filter by affix (+ any additional parameters available); what about compounds?
• Clean the data manually (synchronically non-derived items, non-affixes, etc.)
• Delete inner derivational cycles (optional), cf. English:
  – decompos-able < de-compose < compose
  – de- should count as a derivational affix in decomposable
• But cf. Gaeta & Ricca (2006: 79-83) on inner derivational cycles: not so important!
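The filtering and cleaning steps above can be sketched as follows. The lemma list, the exclusion set, and the function name `realized_productivity` are hypothetical; in practice the exclusions come from the manual check, which no string match can replace.

```python
# Hypothetical lemma list of a corpus (lexemes, not word forms).
lemmas = ["readable", "washable", "decomposable", "table", "cable", "teacher"]

# Items that end in the same string but are not synchronically derived.
# In a real study this set is the result of manual cleaning.
manual_exclusions = {"table", "cable"}


def realized_productivity(suffix, lemma_list, exclusions=frozenset()):
    """Return the number of cleaned lemma types with the suffix, plus the types."""
    members = {lemma for lemma in lemma_list if lemma.endswith(suffix)}
    members -= set(exclusions)
    return len(members), sorted(members)


count, members = realized_productivity("able", lemmas, manual_exclusions)
print(count, members)  # 3 ['decomposable', 'readable', 'washable']
```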
Example (Gaeta & Ricca 2006)
• Corpus study (La Stampa, 1996-98, 75M tokens)
• Counting types, V(N), vertical axis
• Counting tokens, N, horizontal axis
1. -mente: adverb
2. -mento, -(t)ura, -nza: action noun
Criticizing it:
• Realized productivity shows how productive a morphological process was in the PAST
• What processes are attracting new members NOW? What about the FUTURE?
2.3. Hapax-based measures of productivity
• Hapax (legomenon)
• Attested only once in a corpus
• Sometimes ignored as rubbish (numbers, typos, crazy character sequences, etc.)
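A minimal sketch of how hapaxes might be pulled out of a token list. The token list is invented, and the junk filter (keep purely alphabetic strings) is a crude assumption: it drops numbers and stray character sequences but cannot detect typos.

```python
from collections import Counter

# Hypothetical lemmatized tokens of a very small corpus.
tokens = ["maja", "maja", "lugemine", "mängimine", "mängimine",
          "1996", "xxqz##", "sõbralik"]

freq = Counter(tokens)

# Hapaxes: items attested exactly once.
hapaxes = [word for word, n in freq.items() if n == 1]

# Rough junk filter (assumption): keep purely alphabetic strings only.
clean_hapaxes = [word for word in hapaxes if word.isalpha()]

print(hapaxes)        # ['lugemine', '1996', 'xxqz##', 'sõbralik']
print(clean_hapaxes)  # ['lugemine', 'sõbralik']
```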
• Correlation between hapaxes and new formations/new borrowings
• Do not just believe it, let's think: why are new words rare?
• Note: not all hapaxes are new words, but that is fine, they are just a good statistical indicator! (cf. Baayen 2009: 906)
• Size matters: the bigger, the better (?) (see Baayen 1993: 189, 2009: 905)
Two hapax-based measures
• Expanding productivity
• Potential productivity
• See Baayen 1993, 2009: 905-907
2.3.1. Expanding productivity
• V(1,N), the number of (derivationally transparent) hapaxes with the affix X
• V(1), the total number of hapaxes of the corpus
P* = V(1,N) / V(1)
• P* shows the market share of the affix in the market of hapaxes (= possibly new words) (Baayen 2009: 902, 905)
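The formula can be tried out on a toy list. The hapax list, the suffix test, and the function name `expanding_productivity` are illustrative assumptions; the list is taken to be already cleaned by hand.

```python
def expanding_productivity(hapaxes, has_affix):
    """P* = V(1,N) / V(1): share of affix X among all hapaxes of the corpus."""
    v1 = len(hapaxes)                                   # V(1)
    if v1 == 0:
        raise ValueError("the corpus has no hapaxes")
    v1_affix = sum(1 for word in hapaxes if has_affix(word))  # V(1,N)
    return v1_affix / v1


# Hypothetical, already cleaned list of all hapaxes of a corpus.
hapaxes = ["mõtisklemine", "ümberhindamine", "kassilik", "pooltoores"]

p_star = expanding_productivity(hapaxes, lambda word: word.endswith("mine"))
print(p_star)  # 0.5
```

Two of the four hapaxes carry -mine, so the suffix holds half of this (tiny) hapax market.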
Doing it:
• Get the list of hapaxes of a given corpus (DIY or ask for help)
• A lemmatized list of hapaxes helps a lot for a language like Estonian
• Filter the items you are interested in (according to the affixes, etc.)
• Manually clean the lists (see above on realized productivity)
• Calculate P* values
• Rank the morphological processes (affixes, etc.) according to P*
• Q: is division by the total number of hapaxes of the corpus necessary?
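One way to explore the question is to rank a few affixes both ways. All counts below are invented for illustration. Within a single corpus V(1) is the same for every affix, so dividing by it rescales the values without changing their order; the division turns a raw count into a share.

```python
# Hypothetical hapax counts per suffix, V(1,N), all taken from ONE corpus.
hapax_counts = {"-mine": 420, "-lik": 150, "-ja": 230}
total_hapaxes = 4000  # hypothetical V(1) of the same corpus

# P* for every suffix.
p_star = {affix: n / total_hapaxes for affix, n in hapax_counts.items()}

# Rank once by P* and once by the raw hapax counts.
rank_by_p_star = sorted(p_star, key=p_star.get, reverse=True)
rank_by_raw = sorted(hapax_counts, key=hapax_counts.get, reverse=True)

print(rank_by_p_star)                 # ['-mine', '-ja', '-lik']
print(rank_by_p_star == rank_by_raw)  # True
```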
Criticizing it:
• Some processes (affixes, etc.) get extremely high numbers of hapaxes, but they do not seem to be as productive as those numbers suggest
• Example: the Italian deverbal agent suffix -(t)ore (male/generic) has twice as many hapaxes as -trice (female) (Gaeta & Ricca 2006: 73-74)
• Not fair!
• Variable corpus approach (Gaeta & Ricca 2006)
• Count hapaxes for equal numbers of tokens of a given process
• For this, the sizes of the subcorpora will be different (= variable corpus)
• Weakness: some affixes do not reach the token frequency needed (then: binomial interpolation, extrapolation)
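The idea of counting hapaxes at equal token numbers can be sketched as below. This is a simplified illustration, not the procedure of Gaeta & Ricca (2006): the corpus is a made-up sequence of lemmas, the subcorpus is simply the stretch from the start of the corpus up to the point where the affix reaches the target, and the function name `hapaxes_at_equal_tokens` is hypothetical. No interpolation or extrapolation is attempted; an affix that never reaches the target just yields `None`.

```python
from collections import Counter


def hapaxes_at_equal_tokens(corpus, has_affix, target_tokens):
    """Scan the corpus until `target_tokens` tokens with the affix are seen.

    Returns (number of affix hapaxes, size of the subcorpus scanned),
    or None if the affix never reaches the target token count.
    """
    counts = Counter()
    seen = 0
    for position, lemma in enumerate(corpus, start=1):
        if has_affix(lemma):
            counts[lemma] += 1
            seen += 1
            if seen == target_tokens:
                n_hapaxes = sum(1 for c in counts.values() if c == 1)
                return n_hapaxes, position
    return None


# Hypothetical sequence of lemmas in running text.
corpus = ["lettore", "casa", "lettore", "lettrice", "pittore", "cane",
          "lettore", "scrittore", "pittrice", "sole", "scultrice"]

tore = hapaxes_at_equal_tokens(corpus, lambda w: w.endswith("tore"), 3)
trice = hapaxes_at_equal_tokens(corpus, lambda w: w.endswith("trice"), 3)

print(tore)   # (1, 5)
print(trice)  # (3, 11)
```

In this toy data -tore reaches three tokens after 5 words with one hapax, while -trice needs 11 words and shows three hapaxes, so the two subcorpora differ in size although the affix token counts are equal.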
• P* and inflection class (IC) productivity?
• Wurzel 1989: 149 on new formations/loans as indicators of productive ICs
• See esp. Gaeta 2009 on using the variable corpus approach to measure productivity in inflectional morphology
2.3.2. Potential productivity
• V(1,N), the number of hapaxes with the affix X
• N, the number of forms of lexemes with the affix X (tokens, lexeme frequency)
P = V(1,N) / N
• Higher value of P:
  – the forms of lexemes with the affix X are (still) comparatively rare
  – the affix X has the potential to get a larger share of the onomasiological market (Baayen 2009: 902, 906)
• Alternative: variable corpus approach (count P for equal numbers of tokens of a given affix)
• Example, Dutch (Baayen 2009: 905-907)
• -ster (deverbal agent, female)
• ver- (verbal prefix)
• -ster should be more productive (intuitively)
• Types (42M corpus): 370 (-ster) vs. 985 (ver-)
• Hapaxes: 161 (-ster) vs. 274 (ver-)
• Potential prod.: 0.031 (-ster) vs. 0.001 (ver-)
Doing it:
• Get the list of lexemes with token frequency data, filter the relevant ones, clean the list manually, count the total token frequency
• Get the list of hapaxes (filter the first list, frequency = 1), filter the relevant items, clean the list manually
• Calculate the P value, rank the affixes according to it
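The steps above can be sketched on a toy frequency list. The lexemes, their frequencies, and the function name `potential_productivity` are invented for illustration, and the list is assumed to be already cleaned by hand.

```python
def potential_productivity(lexeme_freq, has_affix):
    """P = V(1,N) / N for the lexemes selected by `has_affix`."""
    relevant = {lex: f for lex, f in lexeme_freq.items() if has_affix(lex)}
    n_tokens = sum(relevant.values())                         # N
    n_hapaxes = sum(1 for f in relevant.values() if f == 1)   # V(1,N)
    if n_tokens == 0:
        raise ValueError("no tokens with this affix")
    return n_hapaxes / n_tokens


# Hypothetical cleaned list: lexeme -> token frequency in the corpus.
lexeme_freq = {
    "lugemine": 12,
    "mängimine": 6,
    "mõtisklemine": 1,
    "ümberhindamine": 1,
    "maja": 40,
}

p = potential_productivity(lexeme_freq, lambda lex: lex.endswith("mine"))
print(p)  # 0.1
```

Here the four -mine lexemes have 20 tokens in total and two of them are hapaxes, so P = 2 / 20.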
Summary
• Realized productivity
• Hapax-based measures
  – Expanding productivity (hapaxes with affix X : all hapaxes)
  – Potential productivity (hapaxes with affix X : tokens with affix X)
• Variable corpus approach
References and further reading
• Website of R. H. Baayen: http://www.sfs.uni-tuebingen.de/~hbaayen/
• Baayen 1993. On frequency, transparency, and productivity. In Booij, G. E., and Marle, J. van (Eds.), Yearbook of Morphology 1992, Kluwer Academic Publishers, Dordrecht, 181-208.
• Baayen 2009. Corpus linguistics in morphology: morphological productivity. In Lüdeling, A., and Kytö, M. (Eds.), Corpus Linguistics. An international handbook. Mouton De Gruyter, Berlin, 900-919.
• Bolozky 1999. Measuring productivity in word formation: the case of Israeli Hebrew. Leiden: Brill.
• Gaeta 2009. Inflectional morphology and productivity: Considering qualitative and quantitative approaches. In P. O. Steinkrüger & M. Krifka (eds.), On Inflection, Berlin, Mouton de Gruyter, 45-68.
• Gaeta & Ricca 2006. Productivity in Italian word formation: A variable-corpus approach. Linguistics 44-1, 57-89.
• Gaeta & Ricca 2015. Productivity. In P. O. Müller, I. Ohnheiser, S. Olsen, F. Rainer (eds.), Word-Formation. An International Handbook of the Languages of Europe, Vol. 2, Berlin/New York: Mouton de Gruyter, 841-858.
• Wurzel 1989. Inflectional Morphology and Naturalness, Dordrecht: Kluwer.