Jurgis Pakerys (Vilnius University)

Measuring morphological productivity

jurgis.pakerys@flf.vu.lt

Graduate School of Linguistics, Philosophy and Semiotics (GSLPS)

Tartu University, March 20, 2017

Outline
1. Productivity and frequency
2. Measuring productivity
2.1. Sources of measurements
2.2. Realized productivity
2.3. Hapax-based measures of productivity
2.3.1. Expanding productivity
2.3.2. Potential productivity
4. Summary
5. References

1. Productivity and frequency

Morphological processes related to lexemes:
• Composition
• Derivation
• Assignment to inflectional classes (= declensions, conjugations)
• Grammatical forms

Frequency vs. productivity
• Frequent = abundant = affects many members
• Productive = alive = attracts/produces many NEW members

Understanding frequency

• Token frequency = number of times a lexeme (in any of its forms) occurs in the corpus

• Type frequency = number of lexemes in the corpus in which a morphological process is found

Type vs. token, artificial example

• Token frequency of mängi-mine is 567 = various forms of this noun occur 567 times in a given corpus

• Type frequency of -mine is 14232 = suffix -mine is found 14232 times in the list of lexemes (not their forms!) of a given corpus
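
The difference between the two counts can be sketched in a few lines of Python. This is only an illustration: the tiny corpus of (form, lemma) pairs and the function names are made up for the example, and matching the end of the lemma string is a crude stand-in for a real morphological analysis.

```python
# Hypothetical lemmatized corpus: one (word form, lemma) pair per running word.
tagged_tokens = [
    ("mängimine", "mängimine"),
    ("mängimist", "mängimine"),
    ("mängimise", "mängimine"),
    ("lugemine", "lugemine"),
    ("lugemist", "lugemine"),
    ("kirjutamine", "kirjutamine"),
    ("raamat", "raamat"),
]


def token_frequency(tagged, lemma):
    """Number of running words in the corpus that are forms of `lemma`."""
    return sum(1 for _form, lem in tagged if lem == lemma)


def type_frequency(tagged, suffix):
    """Number of distinct lemmas (not their forms) that end in `suffix`."""
    return len({lem for _form, lem in tagged if lem.endswith(suffix)})


print(token_frequency(tagged_tokens, "mängimine"))  # forms of one lexeme
print(type_frequency(tagged_tokens, "mine"))        # lexemes with the suffix
```

In this toy corpus both calls happen to return 3, but they count different things: three running words of one lexeme versus three different lexemes carrying the suffix.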

Combinations of frequency and productivity
1. Frequent and Productive
• High type frequency
• Attracts new members

2. Frequent and Non-Productive
• High type frequency
• Does not attract new members

3. Productive and Non-Frequent
• Attracts new members
• Low type frequency

4. Non-Productive and Non-Frequent
• Does not attract new members
• Low type frequency

2. Measuring productivity

2.1. Sources of measurements
• Dictionaries
• Corpora
• Questionnaires, tests
– Open-ended coinage tests, judgment tasks (see, for example, Bolozky 1999)

2.2. Realized productivity

• Number of the members of the morphological process in a dictionary/corpus

• Realized productivity, extent of use (Baayen 2009: 904)

• Frequency =/≠ productivity
• Neologisms!

Doing it:
• Get a traditional dictionary or a list of all lemmas of the corpus

• Filter by affix (+ any additional parameters available); what about compounds?

• Clean the data manually (synchronically non-derived items, non-affixes, etc.)

• Delete inner derivational cycles (optional), cf. English:

• decompos-able < de-compose < compose
• de- should count as a derivational affix in decomposable

• But cf. Gaeta & Ricca (2006: 79-83) on inner derivational cycles: not so important!
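
The filter-and-clean steps above can be sketched as follows. This is a minimal illustration, assuming a plain list of lemmas; the lemma list, the exclusion set and the function name are hypothetical, and the exclusion set stands in for the manual cleaning that cannot be automated.

```python
def realized_productivity(lemmas, suffix, exclude=frozenset()):
    """Count distinct lemmas ending in `suffix`, skipping manually excluded items.

    Returns the type count and the sorted list of members so the list can be
    inspected by hand.
    """
    members = {lem for lem in lemmas if lem.endswith(suffix) and lem not in exclude}
    return len(members), sorted(members)


# Hypothetical lemma list of a corpus (duplicates are harmless).
lemmas = ["readable", "washable", "decomposable", "table", "cable", "reader", "readable"]

# Result of the manual clean-up: items that only look like -able derivatives.
not_derived = {"table", "cable"}

count, members = realized_productivity(lemmas, "able", not_derived)
print(count, members)
```

Returning the member list next to the count is deliberate: the count is only as good as the cleaned list behind it.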

Example (Gaeta & Ricca 2006)

• Corpus study (La Stampa, 1996-98, 75M)
• Counting types, V(N), vertical axis
• Counting tokens, N, horizontal axis

1. -mente: adverb
2. -mento, -(t)ura, -nza: action noun
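
A curve of types V(N) against tokens N, as in the axes described above, can be sketched like this. The token stream is invented for the example and is far too small to say anything about Italian; the function name is likewise an assumption.

```python
def growth_curve(affix_tokens):
    """Return (N, V(N)) pairs: after N tokens of an affix, how many types were seen."""
    seen = set()
    curve = []
    for n, lemma in enumerate(affix_tokens, start=1):
        seen.add(lemma)
        curve.append((n, len(seen)))
    return curve


# Hypothetical lemmas of all tokens carrying one affix, in corpus order.
mente_tokens = [
    "veramente", "solamente", "veramente", "certamente",
    "veramente", "solamente", "lentamente",
]

curve = growth_curve(mente_tokens)
print(curve)
```

Plotting the second element of each pair against the first gives the kind of curve the example refers to; a curve that keeps rising at large N suggests that new types are still being added.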

Criticizing it:
• Realized productivity shows how productive a morphological process was in the PAST

• What processes are attracting new members NOW? What about the FUTURE?

2.3. Hapax-based measures of productivity

• Hapax (legomenon)
• Attested only once in a corpus

• Sometimes ignored as rubbish (numbers, typos, crazy character sequences, etc.)
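
Extracting hapaxes and discarding the obvious rubbish can be sketched as below. The token list and the `looks_like_word` filter are assumptions made for illustration; a real corpus would need a language-specific filter, and typos cannot be caught this way.

```python
from collections import Counter


def hapaxes(tokens):
    """Return the sorted list of items that occur exactly once in `tokens`."""
    counts = Counter(tokens)
    return sorted(word for word, freq in counts.items() if freq == 1)


def looks_like_word(token):
    """Crude rubbish filter: letters only, internal hyphens allowed."""
    return token.replace("-", "").isalpha()


# Hypothetical token list of a very small corpus.
tokens = ["the", "cat", "sat", "the", "blogosphere", "2017", "xq#z", "cat", "e-mailable"]

all_hapaxes = hapaxes(tokens)
clean_hapaxes = [word for word in all_hapaxes if looks_like_word(word)]
print(all_hapaxes)
print(clean_hapaxes)
```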

• Correlation between hapaxes and new formations/new borrowings

• Do not just believe it, let's think: why are new words rare?

• Note: not all hapaxes are new words, but it is fine, they are just a good statistical indicator! (cf. Baayen 2009: 906)

• Size matters: the bigger, the better (?) (see Baayen 1993: 189, 2009: 905)

Two hapax-based measures
• Expanding productivity
• Potential productivity

• See Baayen 1993, 2009: 905-907

2.3.1. Expanding productivity

• V(1,N), the number of (derivationally transparent) hapaxes with the affix X

• V(1), the total number of hapaxes of the corpus

P* = V(1,N) / V(1)

• P* shows the market share of the affix in the market of hapaxes (= possibly new words) (Baayen 2009: 902, 905)

Doing it:
• Get the list of hapaxes of a given corpus (DIY or ask for help)

• A lemmatized list of hapaxes helps a lot for a language like Estonian

• Filter the items you are interested in (according to the affixes, etc.)

• Manually clean the lists (see above on realized productivity)

• Calculate P* values
• Rank the morphological processes (affixes, etc.) according to P*

• Q: is division by the total number of hapaxes of the corpus necessary?
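
The calculation and ranking can be sketched as follows, using P* = V(1,N) / V(1). The lemma stream, the suffix list and the function name are hypothetical, and suffix matching on lemmas replaces the manual cleaning step, so this is an illustration rather than a finished tool.

```python
from collections import Counter


def expanding_productivity(lemma_tokens, suffixes):
    """Return {suffix: P*}, where P* = V(1,N) / V(1)."""
    counts = Counter(lemma_tokens)
    hapax_list = [lem for lem, freq in counts.items() if freq == 1]
    total_hapaxes = len(hapax_list)  # V(1): all hapaxes of the corpus
    if total_hapaxes == 0:
        return {suffix: 0.0 for suffix in suffixes}
    scores = {}
    for suffix in suffixes:
        # V(1,N): hapaxes that carry the affix in question
        hapaxes_with_affix = sum(1 for lem in hapax_list if lem.endswith(suffix))
        scores[suffix] = hapaxes_with_affix / total_hapaxes
    return scores


# Hypothetical lemmatized corpus, one lemma per running word.
corpus = [
    "kindness", "kindness", "darkness", "awkwardness", "boldness",
    "warmth", "growth", "growth", "width", "cat", "cat", "dog",
]

scores = expanding_productivity(corpus, ["ness", "th"])
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranking)
```

One observation that bears on the question above: within a single corpus V(1) is the same number for every affix, so ranking by the raw hapax counts gives the same order as ranking by P*.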

Criticizing it:
• Some processes (affixes, etc.) get extremely high numbers of hapaxes, but they do not seem to be as productive

• Example: Italian deverbal agent suffix -(t)ore (male/generic) has 2x more hapaxes than -trice (female) (Gaeta & Ricca 2006: 73-74)

• Not fair!

• Variable corpus approach (Gaeta & Ricca 2006)

• Count hapaxes for equal numbers of tokens of a given process

• For this, the sizes of the subcorpora will be different (= variable corpus)

• Weakness: some affixes do not reach the token frequency needed (then: binomial interpolation, extrapolation)
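
The core idea can be sketched as below: read the corpus in order until an affix has reached a fixed number of tokens, and count its hapaxes at that point. This is a simplified reading of the approach, not the procedure of Gaeta & Ricca (2006) itself; the lemma stream, the suffix strings and the function name are assumptions, and the interpolation step for affixes that fall short is not implemented.

```python
from collections import Counter


def hapaxes_at_equal_tokens(lemma_stream, suffix, n_tokens):
    """Count hapaxes with `suffix` among the first `n_tokens` tokens of that suffix.

    Returns None if the affix never reaches `n_tokens` tokens in the corpus
    (the weakness noted above).
    """
    affix_tokens = [lem for lem in lemma_stream if lem.endswith(suffix)]
    if len(affix_tokens) < n_tokens:
        return None
    counts = Counter(affix_tokens[:n_tokens])
    return sum(1 for freq in counts.values() if freq == 1)


# Hypothetical lemmatized corpus in running order.
stream = [
    "lettore", "scrittore", "lettore", "lettrice", "pittore", "lettore",
    "scrittrice", "lettrice", "scrittore", "pittrice", "vincitore",
]

tore_at_4 = hapaxes_at_equal_tokens(stream, "tore", 4)
trice_at_4 = hapaxes_at_equal_tokens(stream, "trice", 4)
trice_at_5 = hapaxes_at_equal_tokens(stream, "trice", 5)
print(tore_at_4, trice_at_4, trice_at_5)
```

The subcorpus needed for four -tore tokens ends earlier in the stream than the one needed for four -trice tokens, which is what makes the corpus "variable".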

• P* and inflection class (IC) productivity?
• Wurzel 1989: 149 on new formations/loans as indicators of productive ICs

• See esp. Gaeta 2009 on using the variable corpus approach to measure inflectional morphology

2.3.2. Potential productivity

• V(1,N), the number of hapaxes with the affix X
• N, the number of forms of lexemes with the affix X (tokens, lexeme frequency)

P = V(1,N) / N

• Higher value of P:
– the forms of lexemes with the affix X are (still) comparatively rare
– the affix X has the potential to get a larger share of the onomasiological market (Baayen 2009: 902, 906)

• Alternative: variable corpus approach (count P for equal numbers of tokens of a given affix)

• Example, Dutch (Baayen 2009: 905-907)
• -ster (deverbal agent, female)
• ver- (verbal prefix)
• -ster should be more productive (intuitively)

• Types (42M corpus): 370 (-ster) vs. 985 (ver-)
• Hapaxes: 161 (-ster) vs. 274 (ver-)
• Potential prod.: 0.031 (-ster) vs. 0.001 (ver-)
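
The token counts behind these P values are not quoted, but rearranging P = V(1,N) / N to N = V(1,N) / P gives a rough idea of their size. The sketch below does only this arithmetic; because the P values above are rounded, the results are order-of-magnitude estimates and not figures from the source.

```python
# Hapax counts and (rounded) potential productivity as quoted above.
figures = {
    "-ster": {"hapaxes": 161, "P": 0.031},
    "ver-": {"hapaxes": 274, "P": 0.001},
}


def implied_tokens(hapaxes, p):
    """Rearrange P = V(1,N) / N to estimate N = V(1,N) / P."""
    return round(hapaxes / p)


estimates = {
    affix: implied_tokens(values["hapaxes"], values["P"])
    for affix, values in figures.items()
}
print(estimates)
```

The estimates come out at roughly 5,200 tokens for -ster and roughly 274,000 for ver-. On this reading ver- has more hapaxes in absolute terms, but they are spread over a far larger number of tokens, which is why its P is so much lower.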

Doing it:
• Get the list of lexemes with token frequency data, filter the relevant ones, clean the list manually, count the total token frequency

• Get the list of hapaxes (filter the first list, frequency = 1), filter the relevant items, clean the list manually

• Calculate the P value, rank the affixes according to it
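
These steps can be sketched as follows, starting from a lexeme list with token frequencies and using P = V(1,N) / N. The frequency list, the membership tests and the function name are hypothetical; in practice the membership test would be replaced by the manually cleaned list of relevant lexemes.

```python
def potential_productivity(lexeme_freqs, is_member):
    """P = V(1,N) / N for the lexemes selected by `is_member`.

    lexeme_freqs maps each lexeme to its token frequency in the corpus.
    """
    selected = {lex: freq for lex, freq in lexeme_freqs.items() if is_member(lex)}
    total_tokens = sum(selected.values())                          # N
    hapax_count = sum(1 for freq in selected.values() if freq == 1)  # V(1,N)
    return hapax_count / total_tokens if total_tokens else 0.0


# Hypothetical lexeme list with token frequencies.
freqs = {
    "kindness": 40, "darkness": 8, "awkwardness": 1, "boldness": 1,
    "warmth": 30, "growth": 119, "width": 1, "table": 50,
}

p_values = {
    "-ness": potential_productivity(freqs, lambda lex: lex.endswith("ness")),
    "-th": potential_productivity(freqs, lambda lex: lex.endswith("th")),
}
ranking = sorted(p_values, key=p_values.get, reverse=True)
print(p_values)
print(ranking)
```

In this invented data -ness has 2 hapaxes among 50 tokens and -th has 1 hapax among 150 tokens, so -ness ranks higher.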

Summary

• Realized productivity
• Hapax-based measures
– Expanding productivity (hapaxes with affix X : all hapaxes)
– Potential productivity (hapaxes with affix X : tokens with affix X)

• Variable corpus approach

References and further reading
• Website of R. H. Baayen: http://www.sfs.uni-tuebingen.de/~hbaayen/

• Baayen 1993. On frequency, transparency, and productivity. In Booij, G. E., and Marle, J. van (Eds.), Yearbook of Morphology 1992, Kluwer Academic Publishers, Dordrecht, 181-208.

• Baayen 2009. Corpus linguistics in morphology: morphological productivity. In Lüdeling, A., and Kytö, M. (Eds.), Corpus Linguistics. An international handbook. Mouton De Gruyter, Berlin, 900-919.

• Bolozky 1999. Measuring productivity in word formation: the case of Israeli Hebrew. Leiden: Brill.

• Gaeta 2009. Inflectional morphology and productivity: Considering qualitative and quantitative approaches, in P. O. Steinkrüger & M. Krifka (eds.), On Inflection, Berlin, Mouton de Gruyter, 2009, 45-68.

• Gaeta & Ricca 2006. Productivity in Italian word formation: A variable-corpus approach. Linguistics 44-1, 57–89.

• Gaeta & Ricca 2015. Productivity, in P. O. Müller, I. Ohnheiser, S. Olsen, F. Rainer (eds.), Word-Formation. An International Handbook of the Languages of Europe, Vol. 2, Berlin/New York: Mouton de Gruyter, 2015, 841-858.

• Wurzel 1989. Inflectional Morphology and Naturalness, Dordrecht: Kluwer.
