The particle placement alternation in World Englishes:
an experimental validation of probabilistic corpus models?
Laura Rosseel, Benedikt Szmrecsanyi & Jason Grafmiller
RG Quantitative Lexicology and Variational Linguistics
introduction
ExAPP, Münster 27.09.2019
which of these social psychological measures could be useful to study the social meaning of language variation?
Relational Responding Task
observational data: corpus-based regression models of language variation
experimental data: rating tasks gauging speaker intuitions about language variation
previous work
• Rosenbach (2003)
• Bresnan (2007)
• Bresnan & Ford (2010)
• Klavan & Divjak (2016)
• Klavan & Veismann (2017)
• …
human raters and statistical models trained on corpus data give similar predictions of language variation
can we replicate these findings? and if so, what does that tell us?
case study
particle placement alternation in world Englishes
Exploring probabilistic grammar(s) in varieties of English around the world
Benedikt Szmrecsanyi, Jason Grafmiller, Laura Rosseel, Melanie Röthlisberger, Benedikt Heller
(1) continuous variant (verb – particle – object)
My 6yo old actually picked up the book and started reading it! <GloWbE:NZ>
(2) split variant (verb – object – particle)
I have not picked the book up in 2 years. <GloWbE:CA>
Grafmiller & Szmrecsanyi (2018)
British English, Canadian English, Irish English, New Zealand English, Hong Kong English, Indian English, Jamaican English, Philippine English, Singapore English
modelling the PP alternation
• corpus study
• N = 11,454 datapoints from ICE and GloWbE
• factors conditioning the variation:
– direct object word length
– givenness
– definiteness
– concreteness
– thematicity
– presence of a post-modifying directional PP
– variety
– structural persistence
– CV alternation (phonetics)
– rhythm (stressed syllables)
– semantic compositionality of the particle verb
– information content of verb and particle
– information content of the particle
– genre
Grafmiller & Szmrecsanyi (2018)
• comparative sociolinguistics (e.g. Tagliamonte 2013):
regression modelling across multiple datasets
Response ~ Register + DirObjLength + Semantics + DirObjConcreteness + DirObjGivenness + DirObjDefiniteness + DirObjThematicity + DirectionalPP + CV.binary + Surprisal.P + Surprisal.V + Rhythm + PrimeType + (1|Verb) + (1|Particle) + (1|VerbPart) + (1|Genre)
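The formula above appears to be an lme4-style mixed-effects logistic regression: fixed effects for the conditioning factors plus random intercepts for verb, particle, verb-particle combination and genre. As a minimal sketch of how such a model turns a token's predictor values into a probability, the Python below sums weighted predictors into a log-odds value and applies the inverse logit. It assumes the response is coded so that the model predicts the split variant; the coefficient values, the numeric coding of givenness (given = 1) and the names `predict_split` and `inv_logit` are hypothetical, not the study's estimates.

```python
import math


def inv_logit(eta: float) -> float:
    """Map a log-odds value onto a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-eta))


def predict_split(intercept, coefs, predictors, random_intercepts=()):
    """Probability of the split variant for one token (hypothetical coding).

    intercept, coefs: fixed-effect estimates on the log-odds scale
    predictors: this token's predictor values, keyed like coefs
    random_intercepts: by-group adjustments, e.g. for its verb and particle
    """
    eta = intercept + sum(coefs[name] * predictors[name] for name in coefs)
    eta += sum(random_intercepts)
    return inv_logit(eta)


# Hypothetical estimates, NOT the fitted values from the corpus study.
coefs = {"DirObjLength": -0.8, "DirObjGivenness": 0.6}
short_given = {"DirObjLength": 1, "DirObjGivenness": 1}  # one-word, given object
long_new = {"DirObjLength": 4, "DirObjGivenness": 0}     # four-word, new object

p_short = predict_split(0.0, coefs, short_given)
p_long = predict_split(0.0, coefs, long_new)
print(f"{p_short:.3f} {p_long:.3f}")  # → 0.450 0.039
```

With these made-up numbers the short, given object gets a much higher predicted probability of the split variant than the long, new one, in line with end weight.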
• results show high uniformity among inner-circle/native varieties (e.g. British English, Canadian English, New Zealand English)
• but dissimilarities among outer-circle/non-native varieties (e.g. Indian English, Singapore English), as well as between inner- and outer-circle varieties
– e.g. in the role of direct object word length
how do model predictions compare to human raters?
rating experiment
experimental design
task
• inspired by Bresnan & Ford (2010)
• 30 items
• naturalness ratings
• online experiment
stimuli
• random selection of 30 corpus items
• from 5 probability bins based on corpus model
• divided over two versions (15 target items each) to avoid fatigue
• 15 fillers
• controlled for genre
• items presented in a fixed, semi-random order
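The item selection described above can be sketched as stratified random sampling. This sketch assumes five equal-width bins over the corpus model's predicted probability, six items drawn per bin, and three items per bin going to each version, which yields 30 target items and 15 per version; the slides do not say how the bins were defined or whether each version received the same number of items per bin. Fillers, genre control and presentation order are left out, and `sample_stimuli`, `bin_index` and the seed are hypothetical.

```python
import random


def bin_index(p: float, n_bins: int = 5) -> int:
    """Equal-width bin (0 .. n_bins - 1) for a model-predicted probability."""
    return min(int(p * n_bins), n_bins - 1)


def sample_stimuli(items, n_bins=5, per_bin=6, seed=2019):
    """Draw per_bin items from each probability bin and split them over two versions.

    items: (item_id, predicted_probability) pairs scored by the corpus model.
    Returns (version_a, version_b), each holding per_bin // 2 items per bin.
    """
    rng = random.Random(seed)  # fixed seed so the selection can be reproduced
    bins = [[] for _ in range(n_bins)]
    for item_id, p in items:
        bins[bin_index(p, n_bins)].append(item_id)
    version_a, version_b = [], []
    half = per_bin // 2
    for members in bins:
        chosen = rng.sample(members, per_bin)  # ValueError if a bin is too small
        version_a.extend(chosen[:half])
        version_b.extend(chosen[half:])
    return version_a, version_b


# Hypothetical pool of 100 corpus tokens with evenly spread predictions.
pool = [(f"token{i:03d}", i / 100) for i in range(100)]
version_a, version_b = sample_stimuli(pool)
print(len(version_a), len(version_b))  # → 15 15
```

Sampling within bins, rather than from the whole pool, guarantees that both versions cover the full range of model predictions, from near-categorical to genuinely variable contexts.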
participants
British participants
- N = 60 (version A: 32, version B: 28)
- recruited via Amazon Mechanical Turk
- English as native language
- grew up in the UK
- highly educated
- regionally diverse
Indian participants
- N = 55 (version A: 27, version B: 28)
- recruited via personal network
- diverse set of native languages
- grew up in India
- highly educated
- regionally diverse
results
macro level
on the whole, corpus predictions and ratings in agreement
[figure, panels: GB, IN]
but what does this actually tell us?
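One way to quantify such macro-level agreement is sketched below, under stated assumptions: ratings are z-scored within each participant (to neutralise differences in how people use the scale), averaged per item, and correlated with the corpus model's predicted probability for the rated variant. The slides do not state which agreement measure was used; the Pearson correlation, the data layout, the toy data and all function names here are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def zscore_by_participant(ratings):
    """{participant: {item: raw rating}} -> same layout, z-scored within each participant."""
    out = {}
    for participant, by_item in ratings.items():
        values = list(by_item.values())
        m, s = mean(values), stdev(values)
        out[participant] = {item: (r - m) / s for item, r in by_item.items()}
    return out


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def macro_agreement(ratings, predicted):
    """Correlate per-item means of standardized ratings with corpus-model predictions."""
    z = zscore_by_participant(ratings)
    items = sorted(predicted)
    item_means = [mean(z[p][item] for p in z if item in z[p]) for item in items]
    return pearson(item_means, [predicted[item] for item in items])


# Hypothetical raw ratings from two raters who use the scale very differently
ratings = {
    "gb01": {"i1": 1, "i2": 4, "i3": 7},
    "gb02": {"i1": 50, "i2": 60, "i3": 90},
}
# Hypothetical corpus-model probabilities for the rated variant
predicted = {"i1": 0.1, "i2": 0.5, "i3": 0.9}
print(f"r = {macro_agreement(ratings, predicted):.2f}")
```

A high coefficient of this kind only shows that raters and model rank items similarly overall; it says nothing about whether individual predictors carry the same weight, which is exactly the question raised above.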
focus on individual variation:
categoriality
[figure, panels: GB, IN]
• quantification: what range of the rating scale is used?
• slight difference between the two groups
• why? to do with the difference between L1 and L2 speakers?
– significant correlation between usage of the rating scale and self-reported level of, and confidence in, English among the Indian participants
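The "range of the rating scale used" can be operationalised in several ways. A minimal sketch, assuming it is each participant's highest minus lowest rating, computes that range and correlates it with self-reported proficiency. The 0-100 rating scale, the 1-5 proficiency scale, the toy data and the use of Pearson's r are all assumptions; the sketch reports only the coefficient and does not test significance.

```python
from math import sqrt
from statistics import mean


def scale_range(ratings):
    """How much of the rating scale one participant used: highest minus lowest rating."""
    return max(ratings) - min(ratings)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical participants: (ratings on an assumed 0-100 scale,
# self-reported English level on an assumed 1-5 scale)
participants = {
    "in01": ([40, 45, 50, 55, 60], 2),
    "in02": ([30, 40, 50, 70, 80], 3),
    "in03": ([10, 30, 50, 80, 95], 5),
    "in04": ([35, 50, 55, 60, 65], 2),
}
ranges = [scale_range(r) for r, _ in participants.values()]
levels = [level for _, level in participants.values()]
r = pearson(ranges, levels)
print(ranges, f"r = {r:.2f}")
```

Max minus min is sensitive to a single extreme rating; the standard deviation of a participant's ratings would be a more robust alternative measure of scale usage.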
focus on a single linguistic factor:
end weight
• corpus study shows differences between varieties regarding the role of certain predictors of the variable
cf. Bresnan & Ford (2010)
• direct object word length is a more important predictor for GB than for IN
• do we find evidence of this in the experimental ratings?
[figure: predicted rating (standardized) by direct object word length; continuous vs. split variant; GB vs. IN]
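To check whether the ratings show the stronger end-weight effect that the corpus models found for GB, one simple approach is to regress the rating difference between the split and the continuous variant on direct object length, separately for each group, and compare the slopes. This is an ordinary least-squares sketch, not necessarily the model behind the figure; it assumes each item was rated in both variants on a standardized scale, and the toy data and function names are invented for illustration.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def end_weight_gap(rows):
    """Slope of (split rating - continuous rating) on direct object length.

    rows: (object length in words, z-rating of split, z-rating of continuous) per item.
    A more negative slope means longer objects hurt the split variant more,
    i.e. a stronger end-weight effect.
    """
    lengths = [length for length, _, _ in rows]
    gaps = [split - cont for _, split, cont in rows]
    return ols_slope(lengths, gaps)


# Invented toy data, NOT the experimental results
gb_rows = [(1, 0.8, -0.2), (2, 0.4, 0.0), (3, -0.1, 0.3), (4, -0.5, 0.5)]
in_rows = [(1, 0.3, -0.1), (2, 0.2, -0.1), (3, 0.1, 0.0), (4, 0.0, 0.0)]
print(f"GB slope {end_weight_gap(gb_rows):.2f}, IN slope {end_weight_gap(in_rows):.2f}")
```

The toy numbers are constructed so that the two slopes differ; they do not reflect the experimental findings. A full analysis would also need to test whether the difference between the group slopes is reliable.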
• surprising uniformity (cf. Labov 2009)
• why can we not replicate the end weight effect from the corpus study?
– time gap: corpus data mainly from the 1990s
– how comparable are the language users from the corpus and our participants?
– register
– naturalistic corpus data vs. rating task
so…
corpus data vs. rating data
• predictive capacities of corpus models and raters align globally, but not on the level of the individual predictor
• interesting patterns of individual variation to be explored further
• how can experimental rating data and corpus data be combined to inform linguistic analyses?
• to do: additional varieties (Philippine and Jamaican English)