
The particle placement alternation in World Englishes: an experimental validation of probabilistic corpus models?

Laura Rosseel, Benedikt Szmrecsanyi & Jason Grafmiller

RG Quantitative Lexicology and Variational Linguistics

introduction

ExAPP, Münster 27.09.2019


which of these social psychological measures could be useful to study the social meaning of language variation?

Relational Responding Task


observational data: corpus-based regression models of language variation

experimental data: rating tasks gauging speaker intuitions about language variation

previous work

• Rosenbach (2003)

• Bresnan (2007)

• Bresnan & Ford (2010)

• Klavan & Divjak (2016)

• Klavan & Veismann (2017)

• …

human raters and statistical models trained on corpus data give similar predictions of language variation

can we replicate these findings? and if so, what does that tell us?


case study



particle placement alternation in World Englishes


Exploring probabilistic grammar(s) in varieties of English around the world

Benedikt Szmrecsanyi, Jason Grafmiller, Laura Rosseel, Melanie Röthlisberger, Benedikt Heller

case study


(1) continuous variant (verb – particle – object)

My 6yo old actually picked up the book and started reading it! <GloWbE:NZ>

(2) split variant (verb – object – particle)

I have not picked the book up in 2 years. <GloWbE:CA>

Grafmiller & Szmrecsanyi (2018)

case study



• British English
• Canadian English
• Irish English
• New Zealand English
• Hong Kong English
• Indian English
• Jamaican English
• Philippine English
• Singapore English

modelling the PP alternation

• corpus study

• N = 11,454 datapoints from ICE and GloWbE

• factors conditioning the variation:
– direct object word length
– givenness
– definiteness
– concreteness
– thematicity
– presence of a post-modifying directional PP
– structural persistence
– CV alternation (phonetics)
– rhythm (stressed syllables)
– semantic compositionality of the particle verb
– information content of verb and particle
– information content of the particle
– variety
– genre

Grafmiller & Szmrecsanyi (2018)


• comparative sociolinguistics (e.g. Tagliamonte 2013):

regression modelling across multiple datasets

Response ~ Register + DirObjLength + Semantics + DirObjConcreteness + DirObjGivenness + DirObjDefiniteness + DirObjThematicity + DirectionalPP + CV.binary + Surprisal.P + Surprisal.V + Rhythm + PrimeType + (1|Verb) + (1|Particle) + (1|VerbPart) + (1|Genre)
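As a rough illustration of what such a model delivers per corpus token, the sketch below assumes a binary logistic model (continuous vs. split) and turns a few predictors plus a by-verb random intercept into a predicted probability of the split variant. All coefficient values, the dummy-coded predictor names, and the function names are hypothetical and are not taken from the corpus study.

```python
import math


def inv_logit(x: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical coefficients on the log-odds scale (NOT from the corpus study).
INTERCEPT = -1.0
FIXED = {
    "DirObjLength": -0.6,          # longer objects disfavour the split variant
    "DirObjGivenness_given": 0.8,  # hypothetical dummy coding
    "DirectionalPP_yes": 0.5,      # hypothetical dummy coding
}
# Hypothetical by-verb intercept adjustments, standing in for (1|Verb).
VERB_INTERCEPTS = {"pick": 0.3, "set": -0.2}


def predict_split_probability(features: dict, verb: str) -> float:
    """Predicted probability of the split variant for one token."""
    log_odds = INTERCEPT + VERB_INTERCEPTS.get(verb, 0.0)
    for name, value in features.items():
        log_odds += FIXED[name] * value
    return inv_logit(log_odds)


short_given = predict_split_probability(
    {"DirObjLength": 1, "DirObjGivenness_given": 1, "DirectionalPP_yes": 0}, "pick"
)
long_new = predict_split_probability(
    {"DirObjLength": 5, "DirObjGivenness_given": 0, "DirectionalPP_yes": 0}, "pick"
)
```

With these made-up values, a short, given direct object receives a higher predicted probability of the split variant than a long, new one.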



• results show high uniformity between inner circle / native varieties (e.g. British English, Canadian English, New Zealand English)

• but dissimilarities across outer circle / non-native varieties (e.g. Indian English, Singaporean English) as well as between inner and outer circle varieties

e.g. role of direct object word length


how do model predictions compare to human raters?


rating experiment

experimental design


task


• online experiment
• naturalness ratings
• 30 items
• inspired by Bresnan & Ford (2010)

stimuli

• random selection of 30 corpus items

• from 5 probability bins based on corpus model
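The selection procedure above can be sketched as stratified random sampling. The sketch assumes equal-width bins over the predicted probability and six items per bin; the slides only state 30 items from 5 bins, so both choices, the function name, and the toy corpus are assumptions.

```python
import random


def sample_by_probability_bin(items, n_bins=5, per_bin=6, seed=1):
    """Draw per_bin items at random from each equal-width probability bin.

    items: list of (item_id, predicted_probability) pairs.
    """
    rng = random.Random(seed)
    bins = [[] for _ in range(n_bins)]
    for item_id, prob in items:
        # min() keeps a probability of exactly 1.0 inside the last bin
        index = min(int(prob * n_bins), n_bins - 1)
        bins[index].append((item_id, prob))
    sample = []
    for bin_items in bins:
        sample.extend(rng.sample(bin_items, per_bin))
    return sample


# Toy corpus: 500 hypothetical items with evenly spread model probabilities.
corpus_items = [(f"item{i}", (i + 0.5) / 500) for i in range(500)]
stimuli = sample_by_probability_bin(corpus_items)
```

Sampling within bins ensures that the stimuli cover the whole range of model predictions rather than clustering where corpus tokens are most frequent.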


[figure: stimuli, version A and version B]

stimuli



• divided over two versions: 15 target items per version to avoid fatigue

• 15 fillers

• controlled for genre

• items presented in a semi-random fixed order
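One possible way to build the two versions is sketched below. It assumes that targets are split evenly within each probability bin, that both versions share the same 15 fillers, and that "semi-random fixed order" means an order randomised once and then shown identically to every participant. None of these details are stated on the slides, and the item ids are hypothetical.

```python
import random


def build_versions(targets, fillers, seed=2019):
    """Split targets over versions A and B and fix one presentation order each.

    targets: list of (item_id, bin_index) pairs; fillers: list of filler ids.
    """
    rng = random.Random(seed)
    by_bin = {}
    for item_id, bin_index in targets:
        by_bin.setdefault(bin_index, []).append(item_id)
    versions = {"A": [], "B": []}
    for bin_index in sorted(by_bin):
        # alternate within each bin so both versions cover all bins equally
        for position, item_id in enumerate(by_bin[bin_index]):
            versions["A" if position % 2 == 0 else "B"].append(item_id)
    orders = {}
    for name, target_ids in versions.items():
        order = target_ids + list(fillers)
        rng.shuffle(order)  # randomised once, then reused for all participants
        orders[name] = order
    return orders


# Hypothetical ids: 6 targets in each of 5 bins, plus 15 fillers.
targets = [(f"t{b}_{i}", b) for b in range(5) for i in range(6)]
fillers = [f"f{i}" for i in range(15)]
orders = build_versions(targets, fillers)
```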


participants


British participants
- N = 60 (N_A = 32, N_B = 28)
- recruited via Amazon Mechanical Turk
- English as native language
- grew up in the UK
- highly educated
- regionally diverse

Indian participants
- N = 55 (N_A = 27, N_B = 28)
- recruited via personal network
- diverse set of native languages
- grew up in India
- highly educated
- regionally diverse

results


macro level

on the whole, corpus predictions and ratings in agreement
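One simple way to quantify this kind of agreement is a correlation between the model's predicted probability for each item and the mean rating that item received. The slides do not say which measure was used, so the Pearson correlation below is an assumption, and the numbers (including the 0 to 100 rating scale) are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-item values: model probability and mean naturalness rating.
model_prob = [0.05, 0.25, 0.50, 0.75, 0.95]
mean_rating = [12.0, 30.0, 48.0, 71.0, 90.0]
r = pearson(model_prob, mean_rating)
```

A value of r close to 1 would indicate that items the model considers likely are also rated as natural.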


[figure: panels GB and IN]


but what does this actually tell us?


focus on individual variation:

categoriality


[figure: panels GB and IN]




categoriality


• quantification: what range of the rating scale is used?

• slight difference between both groups

• why?
– to do with the difference between L1 and L2?
– significant correlation between use of the rating scale and self-reported level of and confidence in English among Indian participants
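The quantification above can be sketched per participant as the distance between the highest and lowest rating given. Using max minus min as the measure, the 0 to 100 scale, and the two example participants are all assumptions made for illustration.

```python
def scale_range_used(ratings):
    """Width of the part of the rating scale that a participant used."""
    return max(ratings) - min(ratings)


# Hypothetical ratings on a 0-100 scale.
ratings_by_participant = {
    "p1": [0, 100, 50, 100, 0],   # categorical rater: uses the extremes
    "p2": [40, 60, 55, 45, 50],   # gradient rater: stays near the middle
}
ranges = {p: scale_range_used(r) for p, r in ratings_by_participant.items()}
```

A participant with a wide range rates more categorically, one with a narrow range more cautiously; these per-participant values could then be related to self-reported proficiency.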

focus on single linguistic factor:

end weight

• corpus study shows differences between varieties regarding the role of certain predictors of the variable

cf. Bresnan & Ford (2010)





• direct object word length more important predictor for GB than IN

• do we find evidence of this in the experimental ratings?
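A minimal sketch of how this question could be approached: standardise the ratings and estimate, for each group, how steeply the rating of the split variant changes with direct object length. The ratings below are invented, and the simple least-squares slope stands in for whatever model was actually fitted to the experimental data.

```python
import statistics


def standardize(values):
    """Convert values to z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Hypothetical ratings of the split variant for objects of increasing length.
lengths = [1, 2, 3, 5, 8]
ratings = {"GB": [80, 70, 55, 40, 20], "IN": [75, 68, 57, 43, 25]}
slopes = {group: slope(lengths, standardize(r)) for group, r in ratings.items()}
```

If direct object length mattered more for GB than for IN raters, the GB slope would be clearly steeper; near-identical slopes would point to uniformity across groups.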


end weight


[figure: predicted rating (standardized) by direct object word length, shown for IN and GB and for the continuous and split variants]


end weight

• surprising uniformity (cf. Labov 2009)

• why can we not replicate the end weight effect from the corpus study?




– time gap: corpus data mainly from the 1990s

– how comparable are the language users from the corpus and our participants?

– register

– naturalistic corpus data vs. rating task


so…


corpus data vs. rating data

• predictive capacities of corpus models and raters align globally, but not on the level of the individual predictor

• interesting patterns of individual variation to be explored further

• how can experimental rating data and corpus data be combined to inform linguistic analyses?

• to do? additional varieties: Philippine and Jamaican English


thank you!


