Behavioral paradigms to directly study linguistic universals
EVELIN 2012, Psycholinguistics and Grammar, Lecture 3
Florian Jaeger, Human Language Processing Lab, http://www.hlp.rochester.edu/
Artificial language learning
• Participants learn miniature languages through exposure (2 minutes to several days, depending on the complexity of the language)
• Originally used to study what infants and adults can learn, e.g., can we pick up on statistical contingencies in the input language? [e.g. Saffran et al., 1996; Wonnacott et al., 2008]
The Procedure
4-day procedure:
I. Noun learning & Test
II. Sentence training (80 trials)
III. Noun Test
IV. Sentence comprehension (80 trials)
V. Sentence production (80 trials)
Choose the match to the heard sentence. Describe the video in the alien language.
Three measures to assess biases in acquisition within the ALL paradigm
• Accuracy of learned language
• Generalization to novel stimuli
• Deviation from input (bias)
Artificial Language Learning
• Learners of artificial languages generally match the statistics in the input [e.g. Hudson Kam & Newport, 2005, 2009]
Artificial Language Learning
• When they deviate from the input (e.g., because they generalize patterns that are variable in the input), these deviations tend to reflect patterns that are typologically more frequent [e.g. Culbertson & Smolensky, 2010; Hudson Kam & Newport, 2005, 2009; Fedzechkina, Jaeger, & Newport, 2011, submitted]
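The contrast between matching input statistics and regularizing them can be illustrated with a toy simulation. This is only an illustrative sketch, not any of the models from the studies cited here; the `strength` parameter and both learner functions are my own hypothetical simplifications:

```python
import random

def probability_matcher(p_majority, n_productions, rng):
    """Reproduce the input proportion of the majority variant."""
    return sum(rng.random() < p_majority for _ in range(n_productions)) / n_productions

def regularizer(p_majority, n_productions, rng, strength=0.5):
    """Shift production probability toward the majority variant."""
    p = p_majority + strength * (1.0 - p_majority)  # exaggerate the majority
    return sum(rng.random() < p for _ in range(n_productions)) / n_productions

rng = random.Random(42)
match_rate = probability_matcher(0.60, 10_000, rng)
regular_rate = regularizer(0.60, 10_000, rng)
print(f"matcher:     {match_rate:.2f}")   # close to the 60% input
print(f"regularizer: {regular_rate:.2f}")  # well above the 60% input
```

Comparing a learner's output proportions against the input proportions is exactly the third measure above (deviation from input as a signature of bias).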
Part 1
Generalizing beyond the input
[Hudson‐Kam and Newport, 2005, 2009]
The input language(s)
• 51 words:
– 36 nouns
– 7 intransitive verbs
– 5 transitive verbs
– 1 negative (neg)
– 2 determiners (det)
• Two conditions: the two classes were arbitrary or mass/count
• Four conditions for determiner distribution: 45%, 60%, 75%, 100%
• 13,200 possible sentences
Procedure
• Six exposure sessions and 1 test session, scheduled over 7 visits to the lab over 7–10 days, each lasting 25 to 29 min.
• All presentation of the language was auditory.
• Participants were seated in front of a video monitor on which they watched a scene or event.
• Exposure to 230 sentences and their corresponding visual scenes (out of 13,200 possible ones)
Tests
• Vocabulary test (to make sure they learned the nouns)
• Picture description (sentence completion: the first word of a picture description is given)
• Grammaticality
– 4-point scale
– Forced choice
Results
[data by Hudson-Kam & Newport, 2005; figure taken from Tily et al., 2011]
Part 2
Can one replicate word order preferences with ALL?
[Tily, Frank, & Jaeger, 2011]
A Flash applet (by Hal Tily)
Video + sound + (optionally) written stimuli at the top
Noun learning
Sentence learning
Comprehension test
Production
Experiment 1: Attempt to replicate [Tily et al., 2011]
• Constructed language contains:
– 6 animate (3 male, 3 female) and 2 inanimate referents
– 2 intransitive and 4 transitive actions
• Each subject was taught a different randomly generated language, varying:
– Proportion of nouns preceded by a determiner (none, 1/3, 2/3, or all)
– Determiner type by noun (arbitrary vs. natural gender)
[taken from Tily et al., 2011]
Comparison
• Hudson Kam and Newport (2005):
– 40 participants
– Took weeks/months to run the experiment
– Required a lab manager, RAs, and a graduate student to schedule and run participants and to score data
• Tily, Frank, and Jaeger (2011):
– 1–2 days to get 134 participants ($.50 to $1 per participant)
– No subject scheduling; no RA time required to run the experiment
– Automatic scoring possible
Experiment 2: Word order acquisition bias
Testing word order universals
• Greenberg's (1963) Universal 1: In a dominant word order, the subject precedes the object
• Greenberg's (1963) Universal 3: Languages with dominant VSO order are always prepositional
• Greenberg's (1963) Universal 4: Languages with normal SOV order are usually postpositional
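Implicational universals like Universals 3 and 4 can be stated as logical implications over word-order feature inventories. A minimal sketch of that logic; the toy "language" records below are hypothetical illustrations, not data from the studies discussed here:

```python
# Each toy language is a dict of word-order features (hypothetical examples).
languages = {
    "Irish-like":    {"order": "VSO", "adposition": "preposition"},
    "Japanese-like": {"order": "SOV", "adposition": "postposition"},
    "English-like":  {"order": "SVO", "adposition": "preposition"},
}

def universal_3(lang):
    """Greenberg's Universal 3: dominant VSO order implies prepositions."""
    return lang["order"] != "VSO" or lang["adposition"] == "preposition"

def universal_4(lang):
    """Greenberg's Universal 4 (a statistical tendency): SOV order
    usually goes with postpositions."""
    return lang["order"] != "SOV" or lang["adposition"] == "postposition"

for name, lang in languages.items():
    print(name, universal_3(lang), universal_4(lang))
```

Note that an implication is vacuously satisfied by any language without the antecedent order, which is why testing such universals requires counting languages (or families) that actually instantiate each order.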
Design
• 12 languages, randomly assigned to participants
• Basic word order (6):
– SOV / SVO / VSO / VOS / OVS / OSV
crossed with
• Determiner-noun order (2):
– Det N / N Det
[taken from Tily et al., 2011]
• Participants: 285 in 1‐3 days.
Argument × Determiner Order [taken from Tily et al., 2011]
Conclusions
• Mixed results:
– Overall verb-argument order results replicate typological patterns, but no evidence for an advantage of consistent headedness
– Not enough exposure? The experiment was very short compared to previous experiments in the lab.
– Perhaps the relevant dependencies aren't long enough in this artificial language? In that case, Hawkins does not predict an effect.
Part 3
Consistency of headedness
[Christiansen, 2000]
The two input languages
[Table 8.2 in Christiansen et al., 2002]
Design & Procedure
• Trained subjects on sentences (represented as consonant strings) from the two grammars. Exposure & repeat aloud.
• Training and test materials were controlled for:
– length
– bigram and trigram frequencies
• After training, subjects judged novel strings as to whether they belong to the language they were exposed to (forced‐choice)
Results
• Found only a very weak effect (and they used the wrong statistical test, t(38)=2.54, p<.02):

Correct classification   Consistent   Inconsistent
Grammatical              67.8%        65.8%
Ungrammatical            58.1%        51.7%
Why such weak results?
• Recent evidence from typology actually casts doubt on the existence of Greenbergian word order universals! [Dunn et al., 2011, Nature; for discussion, see Croft, Bhattacharya, Kleinschmidt, Smith, & Jaeger, 2011]
Method [Figure 1, Dunn et al., 2011]
• Four language families
• Model uncertainty about genetic relations between languages based on word lists and statistics from evolutionary biology (trait continuous-time Markov chain models)
[Figure 1, Croft et al., 2011]
Results
• No implicational universals: some word order implications hold within language families but not across families.
[Figure 2, Dunn et al., 2011]
Caveat
• Typological approaches are bound to suffer from data sparsity – specifically, the sparsity of independent data points (and it's the number of independent observations that determines the power of a statistical test to detect effects). [Tily & Jaeger, 2011]
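The independence point can be demonstrated with a small Monte Carlo sketch (my own illustration, with made-up parameters, not an analysis from any of the cited papers): if languages within a family share a family-level trait value, a test that treats every language as an independent observation rejects a true null far more often than its nominal alpha.

```python
import random, math

def simulate_false_positives(n_families=4, langs_per_family=25,
                             family_sd=1.0, lang_sd=0.2,
                             n_sims=2000, seed=1):
    """Under a true null (population mean 0), count how often a naive
    z-test over all languages rejects at alpha = .05, ignoring that
    languages within a family share a family-level trait value."""
    rng = random.Random(seed)
    rejections = 0
    n = n_families * langs_per_family
    for _ in range(n_sims):
        values = []
        for _ in range(n_families):
            family_mean = rng.gauss(0.0, family_sd)
            values += [family_mean + rng.gauss(0.0, lang_sd)
                       for _ in range(langs_per_family)]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / (n - 1)
        z = mean / math.sqrt(var / n)  # treats all n languages as independent
        if abs(z) > 1.96:
            rejections += 1
    return rejections / n_sims

rate = simulate_false_positives()
print(f"naive false-positive rate: {rate:.2f}")  # far above the nominal .05
```

With only 4 families, the effective sample size is closer to 4 than to 100, which is exactly why the within- vs. across-family distinction in Dunn et al. matters.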
Part 4
Consistency of headedness, Take 2
[Culbertson, 2010; Culbertson and Smolensky, 2010]
[following slides generously provided by Jenny Culbertson, but modified somewhat by me – so don't blame her ;)]
The empirical pattern: Greenberg’s Universal 18
Greenberg (1963): Adjective-Noun order & Numeral-Noun order
1. ADJ-NOUN & NUM-NOUN ‘red house’ & ‘two houses’
2. NOUN-ADJ & NOUN-NUM ‘house red’ & ‘houses two’
3. NOUN-ADJ & NUM-NOUN ‘house red’ & ‘two houses’
4. *ADJ-NOUN & NOUN-NUM ‘red houses’ & ‘houses two’
Mixture-Shift Paradigm: experiment conditions
70% this order, 30% the opposite order
E.g. if in condition 1:
If adjective Adj-N 70%, opposite (N-Adj) 30%
If numeral Num-N 70%, opposite (N-Num) 30%
Each phrase contains either an adjective or a numeral
Variation is probabilistic, not lexically specific
One additional condition – “random” with all orders equiprobable
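The input generation for these conditions can be sketched as follows. This is a hypothetical reimplementation of the sampling scheme described above (function and variable names are my own, not from the experiment code):

```python
import random

def generate_input(condition, n_phrases=100, p_majority=0.70, seed=0):
    """Sample modifier-noun orders for one Mixture-Shift condition.
    Conditions 1-4 are the four patterns of Greenberg's U18 table;
    'random' makes all orders equiprobable."""
    majority = {1: ("Adj-N", "Num-N"), 2: ("N-Adj", "N-Num"),
                3: ("N-Adj", "Num-N"), 4: ("Adj-N", "N-Num")}
    opposite = {"Adj-N": "N-Adj", "N-Adj": "Adj-N",
                "Num-N": "N-Num", "N-Num": "Num-N"}
    rng = random.Random(seed)
    phrases = []
    for _ in range(n_phrases):
        # Each phrase contains either an adjective or a numeral.
        modifier = rng.choice(["adjective", "numeral"])
        if condition == "random":
            order = rng.choice(["Adj-N", "N-Adj"] if modifier == "adjective"
                               else ["Num-N", "N-Num"])
        else:
            adj_major, num_major = majority[condition]
            major = adj_major if modifier == "adjective" else num_major
            order = major if rng.random() < p_majority else opposite[major]
        phrases.append((modifier, order))
    return phrases

sample = generate_input(1, n_phrases=1000)
adj = [o for m, o in sample if m == "adjective"]
print(f"Adj-N share in condition 1: {adj.count('Adj-N') / len(adj):.2f}")
```

Because the 70/30 split is sampled per phrase rather than tied to particular lexical items, the variation is probabilistic, not lexically specific, as stated above.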
Hypotheses and predictions: regularization and U18
         N-Adj   Adj-N
Num-N    3       1
N-Num    2       4
Hypothesis 1: unbiased statistical learning?
Hypothesis 2: regularization bias only?
Hypothesis 3: typology-based predictions?
Harmonic bias: if majority pattern is 1 or 2, regularization in that direction
*Adj-N, N-Num bias: if majority pattern is 4, no regularization, potentially less matching of the input pattern as well
N-Adj, Num-N: if majority pattern is 3, regularization, less than harmonic
Other predictions? Native language bias: more regularization of pre-nominal order
The lexicon & phrase types
• 10 novel nouns, 5 adjectives (blue, green, fuzzy, big, small), 5 numerals (2-6)
• Nouns are two or three syllables
• Adjectives and numerals:
– single-syllable nonce words, each with a unique onset and coda
– matched on average English neighborhood density (Levenshtein distance)
• Phrases: comprised of two words, {Adj,N} or {Num,N}
“nerka cherg”   “fush nerka”   “grifta kez”   “glawb grifta”
 N Adj           Adj N          N Num          Num N
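The neighborhood-density matching mentioned above rests on Levenshtein (edit) distance. A standard dynamic-programming implementation (a generic sketch, not the authors' script; the example words below are from the lexicon above plus English neighbors I chose for illustration):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

# Words at edit distance 1 count as neighbors for density estimates.
print(levenshtein("fush", "fish"))   # 1 substitution
print(levenshtein("glawb", "glob"))
```

A word's neighborhood density is then the number (or frequency-weighted sum) of real English words within distance 1, averaged over the stimulus set.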
Experiment training phases
I. Noun Training: exposure to noun vocabulary (50 trials)
II. Noun Testing: production test of nouns (50 trials, 75% correct, 2 tries to pass)
III. Noun+Modifier Training: exposure to Noun+Modifier combinations (80 trials)
IV. Comprehension Testing: picture matching test of Noun+Mod (80 trials)
Experiment testing phase
Production Testing: production of Noun+Mod combinations
• Informant response generated randomly according to condition, independent of the subject's response
• If subject response order = informant response order: 5 pts
Participants
Participants: 65 Native English speakers (13 per condition)
16 male, 39 female
18–30 years old
Some participants were bilingual/fluent in other languages
8 additional subjects were removed (did not pass Noun Test)
Experiment 1: Production Test results
only trials with correct vocab (= 88% of trials)
Input level
[figure: per-condition significance annotations – p<0.05, p<0.05, p<0.05, p<0.29]
Part 5a
Word Order and Case
[Fedzechkina, Jaeger, and Newport, 2011]
Questions
• Is there a bias to reduce uncertainty about the intended meaning? I.e., if the input language contains systemic ambiguity, do learners reduce that ambiguity/uncertainty?
• Experiment 1: Does the acquisition of word order and case-marking exhibit biases observed cross-linguistically? Specifically, do word order fixing and case-marking trade off?
• Experiment 2: In languages with optional case-marking, are less expected grammatical function assignments more likely to be marked by case?
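"Uncertainty about the intended meaning" can be made precise as the entropy of the grammatical-function assignment given the surface string. A toy calculation of the logic (my own illustration; the 60/40 figure echoes the word-order proportions used in these experiments, but the scenario is simplified):

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# In a verb-final language with variable order and no case, hearing
# "Noun Noun Verb" leaves two parses: SOV or OSV. With 60/40 order
# statistics, the listener's uncertainty about who did what is:
h_no_case = entropy([0.60, 0.40])

# With obligatory object case-marking, the parse is fully determined:
h_case = entropy([1.0])

print(f"uncertainty without case: {h_no_case:.2f} bits")
print(f"uncertainty with case:    {h_case:.2f} bits")
```

On this view, fixing word order and adding case-marking are two routes to the same end (driving the conditional entropy toward zero), which is one way to motivate the predicted trade-off.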
Experiment 1
• Four artificial languages
– Each learned over four days; convergence after day 3–4
– 8 participants per language
• Structure:
– Verb-final
– SOV vs. OSV dominant (63%)
– Case vs. no case (on the object only)
• Lexicon:
– 6 nouns
– 8 verbs forming 4 verb groups (different word order biases)
Example stimulus (SOV, no case marker)
zamper (poke)   slergin (kick)
Day 4:
Part 5b
Optional case-marking
[Fedzechkina, Jaeger, and Newport, submitted]
Experiment 1
• Are atypical objects (NPs that are unlikely to be objects) more likely to be case‐marked than typical objects?
• Object typicality correlated with:
– person: 1st, 2nd > 3rd
– definiteness: personal pronoun > proper name > other
– animacy: human > animate > inanimate
[cf. Fry, 2001; Lee, 2006; Kurumada & Jaeger, 2010]
Lexicon and Structure
• Lexicon:
– 15 nouns:
• 5 human referents (always subject)
• 5 human referents (always object)
• 5 inanimate referents (always object)
– 8 verbs
• Structure:
– Word order variation: 60% SOV / 40% OSV
– Case marking:
• Subject: no
• Object: 60% yes, 40% no; independent of animacy
The Procedure
4-day procedure:
I. Noun learning & Test
II. Sentence training (80 trials)
III. Noun Test
IV. Sentence comprehension (80 trials)
V. Sentence production (80 trials)
Choose the match to the heard sentence. Describe the video in the alien language.
Example Stimuli
[figure: example stimuli crossing case marking (yes/no) with word order (SOV/OSV)]
• Learned over four days; convergence after day 2
• 20 participants, 2 excluded because they used case‐marking always or never.
Overall word order and case distributions
Result
Summary
• Consistent with the hypothesis that learners show a bias to mark the unexpected / to distribute information more uniformly, we find that atypical objects are more likely to be case-marked
• These results are obtained although the native language of our participants has no / only remnants of case-marking
• Replicated with different SOV/OSV proportions
Prediction
• If the observed effect is indeed due to a bias to mark the unexpected, we should see the opposite trend in optional subject case-marking [cf. Fry, 2001; Lee, 2006]
Experiment 2
• Lexicon: same 8 verbs and 15 nouns
– 5 human referents (always subject)
– 5 inanimate referents (always subject)
– 5 inanimate referents (always object)
• Structure:
– Word order variation: 60% SOV / 40% OSV
– Case marking:
• Subject: 60% yes, 40% no; independent of animacy
• Object: no
• 20 participants, 6 excluded because they always/never used case-marking
Result
Part 6
Wrapping up
Summary
• ALL and IALL can be used to study linguistic universals, although much work remains to be done:
– Avoiding native speaker biases
– Testing against speakers with many different language backgrounds
– Do results extend to infants or at least young kids, who are the more relevant target population if we're talking about language change? So far this has not been confirmed, although ALL studies for other questions have been conducted on infants and kids, generally producing similar or the same results as on adults
An important methodological innovation
• ALL and IALL can be conducted over the web
– Reducing the time necessary to run these studies by 95–99%
– Reducing costs by 50–75%
• Expanding the (use of this) paradigm
– ALL/IALL: different language backgrounds
– IALL: reducing simplifying assumptions
[diagram: iterated learning chains, L0 → L1 → … → LN]