Filled pauses and L2 proficiency: Finnish Australians speaking English

Timo Lauttamus, John Nerbonne and Wybo Wiersma

University of Oulu, Rijksuniversiteit Groningen

timo.lauttamus@oulu.fi

5 June 2009

In this talk we present

A method for detecting syntactic differences, and our findings on pausing

3 sub-questions about the method

1 What did your corpus look like?

2 What is permutation statistics?

3 How do you apply it to syntax?

3 sub-questions about the results

1 What general differences did you find?

2 How much pausing is there, and by whom?

3 What does this tell us about the speakers?

Outline of the Talk

Introduction
  • In this Talk
  • Outline of the Talk
The Method
  • Our Corpus
  • Permutation Statistics
  • Applying it to Syntax
Results
  • General Differences
  • Pausing
  • Analysis
Conclusion
Questions
References

The Method

• Detect a wide range of syntactic differences, and detect them as:
  • significant differences
  • aggregate differences
  • relative differences

• This would make it possible to measure the syntactic part of the total impact of one language on another:

"No easy way of measuring or characterizing the total impact of one language on another in the speech of bilinguals has been, or probably can be devised. The only possible procedure is to describe the various forms of interference and to tabulate their frequency."

- Weinreich, Languages in Contact

The Method

We also want to do this automatically and computationally, in order to be able to:

• Mine for differences in syntax between:
  • learners and native speakers
  • speakers of different dialects
  • writers from different discourses

• Test dialectological and other linguistic hypotheses

• Note over- and underuse, instead of judging usage as right or wrong

The Method

We did it in four steps:

1 Tag two or more collections of comparable material (using an automatic POS tagger)

2 Extract n-grams (2- to 5-grams) of POS tags

3 Statistically compare their frequencies

4 Sort the significant POS n-grams by the extent of the difference

Aarts and Granger (1998) did this without the statistics in "Tag sequences in learner corpora: a key to interlanguage grammar and discourse".
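Steps 2 and 3 operate on counts of POS n-grams. Below is a minimal Python sketch of extracting and counting them from one tagged utterance. The tag labels are simplified, hypothetical stand-ins and not the exact tagset used on this corpus. `pos_ngrams` and `count_ngrams` are illustrative names.

```python
from collections import Counter


def pos_ngrams(tags, n):
    """Return all contiguous n-grams of a POS-tag sequence as tuples."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def count_ngrams(tags, sizes=range(2, 6)):
    """Count POS n-grams of every requested size (2- to 5-grams by default)."""
    counts = Counter()
    for n in sizes:
        counts.update(pos_ngrams(tags, n))
    return counts


# Simplified, hypothetical tags for "and uh I uh snow-skied" (cf. example (4)).
tags = ["Conj", "Interj", "Pron", "Interj", "V"]
trigrams = pos_ngrams(tags, 3)
all_counts = count_ngrams(tags)
print(trigrams)
```

In a real comparison, the counts from each collection would then be compared statistically (step 3) rather than read off directly.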

Our Corpus

Origins:

• 20,000 Finns immigrated to Australia

• Working-class background, limited education

• Aged 25-40 on arrival

Corpus collected 1995-1998 by Greg Watson of the University of Joensuu, Finland:

• two age groups: adults and juveniles

• 350,000 words in total, of which 305,000 words are free conversation

Our Corpus

Adults:

• over 18 years old on arrival, 30 on average

• 58 on average at the time of interview

• 60 interviews of 65-70 min each (221,000 words)

Juveniles:

• under 16 years old on arrival, 6 on average

• 36 on average at the time of interview

• 30 interviews of 65-70 min each (84,000 words)

Our Corpus

In preparation, we part-of-speech tagged it with:

• Trigrams 'n' Tags (TnT), a statistical POS tagger

• made by Thorsten Brants (Universität des Saarlandes)

It achieves an accuracy of:

• 96.7% on the Penn Treebank

• 85.1%-90.5% on our spoken material

Accuracy is of course lower for n-grams, since every tag in the n-gram has to be correct:

• 2-grams 74%, 3-grams 65%, 4-grams 58%, ...
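The drop in n-gram accuracy comes from the fact that a single tagging error spoils every n-gram that contains it. As a simplification, assume that errors are independent. Then a per-tag accuracy p gives roughly p^n for an n-gram. The Python sketch below uses 0.865 as the per-tag value, a hypothetical choice within the 85.1%-90.5% range above.

```python
def ngram_accuracy(tag_accuracy, n):
    """Probability that all n tags of an n-gram are correct, under the
    simplifying assumption that tagging errors are independent."""
    return tag_accuracy ** n


# 0.865 is a hypothetical per-tag accuracy within the 85.1%-90.5% range.
for n in (2, 3, 4):
    print(f"{n}-grams: {ngram_accuracy(0.865, n):.0%}")
```

Under these assumptions the sketch prints 75%, 65% and 56% for 2-, 3- and 4-grams. These are close to the figures above but not identical. Real tagging errors are not independent, so this is only a rough check.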

Permutation Statistics

It differs from parametric (normal-theory) statistics:

• It makes inferences about the data at hand, not about a population:
  • no need for normality
  • no need for homoscedasticity (equal variances across groups)
  • no strict need for random sampling

• Still, permutation statistics do rely on:
  • random assignment and independence of observations
  • in practice these pose no problems for linguistic/dialect data

As a statistical method, it is therefore well suited to linguistics.
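The slides illustrating the permutation procedure itself did not survive in this transcript. As a stand-in, here is a minimal Python sketch of a generic two-sample permutation test that shuffles group labels. The test statistic (absolute difference of means), the data and the Monte Carlo sampling of permutations are all assumptions for illustration, not the authors' exact setup.

```python
import random


def permutation_test(group_a, group_b, statistic, n_perm=10_000, seed=0):
    """Monte Carlo two-sample permutation test.

    statistic(a, b) returns how different two groups are (larger = more
    different). The p-value is the share of random relabellings of the
    pooled observations whose statistic is at least the observed one.
    """
    rng = random.Random(seed)
    observed = statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    k = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of observations to groups
        if statistic(pooled[:k], pooled[k:]) >= observed:
            extreme += 1
    # Counting the observed labelling as one permutation avoids p = 0.
    return (extreme + 1) / (n_perm + 1)


def abs_mean_diff(a, b):
    return abs(sum(a) / len(a) - sum(b) / len(b))


# Hypothetical per-speaker rates of some feature in two groups.
adults = [12.1, 10.4, 11.8, 13.0, 9.9]
juveniles = [6.2, 7.1, 5.8, 6.9, 7.4]
p = permutation_test(adults, juveniles, abs_mean_diff, n_perm=2000)
print(f"p = {p:.4f}")
```

In the talk, the observations being permuted are the interviewees. The statistic is computed on their per-mille vectors rather than on a single number per speaker.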

Applying it to Syntax

First, one needs something to permute:

• We permuted interviewees:
  • this is more conservative than permuting 3-grams
  • and also easier than permuting sentences (which we did earlier)

• For each interview:
  • we took the 3-grams (and other n-grams) of POS tags
  • we then calculated the per-mille frequency ("promillage") of every 3-gram type, i.e. occurrences of the type per 1,000 3-gram tokens

• These per-mille vectors were then summed per group after each permutation
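Below is a minimal Python sketch of this step. It assumes each interview is available as a list of POS tags. The sketch computes one per-mille vector per interview. It then performs a single permutation by shuffling interviewees and summing the vectors per group. Function names and tags are hypothetical.

```python
import random
from collections import Counter


def trigram_permille(tags):
    """Per-mille vector of one interview: occurrences of each POS-trigram
    type per 1,000 trigram tokens in that interview."""
    trigrams = [tuple(tags[i:i + 3]) for i in range(len(tags) - 2)]
    if not trigrams:
        return {}
    total = len(trigrams)
    return {t: 1000 * c / total for t, c in Counter(trigrams).items()}


def group_sum(vectors):
    """Sum the per-mille vectors (dicts) of all interviews in a group."""
    summed = Counter()
    for v in vectors:
        summed.update(v)  # adds values key by key
    return dict(summed)


# Hypothetical tagged interviews: the first two belong to group A originally.
interviews = [
    ["Pron", "V", "Interj", "Art", "N"],
    ["Pron", "Interj", "V", "Art", "N"],
    ["Pron", "V", "Art", "N", "Prep"],
    ["Pron", "V", "Art", "Adj", "N"],
]
vectors = [trigram_permille(t) for t in interviews]

# One permutation: shuffle interviewees, re-split into groups of the
# original sizes, and sum each group's vectors.
rng = random.Random(1)
order = list(range(len(vectors)))
rng.shuffle(order)
perm_a = group_sum(vectors[i] for i in order[:2])
perm_b = group_sum(vectors[i] for i in order[2:])
```

Repeating the shuffle many times yields the distribution of group differences expected when group membership is irrelevant.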

Applying it to Syntax

Second, one needs a measure of how extreme a difference is:

• Both for the groups as a whole and for each individual POS 3-gram

• We used r-square and summed r-square (we also tried cosine and summed r)

• R-square is the square of the difference (r) in a POS 3-gram's per-mille frequency between the two groups

• Summed r-square is the sum of r-square over all 3-grams
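Written out, let a_t and b_t be the per-mille values of trigram type t in the two groups. Then r_t = a_t - b_t, and summed r-square is the sum of r_t^2 over all t. Here r is the slides' name for this difference, not a correlation coefficient. The minimal Python sketch below assumes group vectors are dicts mapping trigram types to per-mille values, and it treats a type missing from a group as 0.

```python
def r_square_per_trigram(group_a, group_b):
    """r**2 for each trigram type, where r is the difference between the
    two groups' per-mille values (a type missing from a group counts as 0)."""
    types = set(group_a) | set(group_b)
    return {t: (group_a.get(t, 0.0) - group_b.get(t, 0.0)) ** 2 for t in types}


def summed_r_square(group_a, group_b):
    """Aggregate measure: the sum of r**2 over all trigram types."""
    return sum(r_square_per_trigram(group_a, group_b).values())


# Hypothetical group vectors (per-mille values).
a = {("Interj", "Pron", "Interj"): 4.0, ("Pron", "V", "Art"): 10.0}
b = {("Pron", "V", "Art"): 13.0, ("Ex", "V", "Art"): 2.0}
print(summed_r_square(a, b))
```

Squaring discards the sign. The direction of a difference (which group overuses a trigram) therefore has to be read off r itself when sorting trigrams for over- and underuse.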

Applying it to Syntax

Third, one needs to normalize for:

• Text size per subject (for each subject):
  • divide by the subject's total number of 3-grams (yielding the promillages)
  • to eliminate differences in text size between speakers

• Frequencies of 3-gram types (for each 3-gram type):
  • divide by the corpus-wide total of the 3-gram type
  • to eliminate differences in frequency (optional)

• Group size (for both groups, across permutations):
  • divide by the average frequency of 3-grams in the group
  • to correctly detect over- and underuse
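Below is a minimal Python sketch of the two normalizations beyond per-mille conversion. It assumes per-mille vectors are stored as dicts. The group-size step implements one reading of the slide: each value in a group's summed vector is divided by that group's mean value per trigram type. The authors' exact formula may differ. Function names are hypothetical.

```python
def normalize_type_frequency(vectors):
    """Optional: divide each subject's value for a trigram type by that type's
    corpus-wide total (assumed positive), so rare types weigh as much as
    frequent ones."""
    totals = {}
    for v in vectors:
        for t, x in v.items():
            totals[t] = totals.get(t, 0.0) + x
    return [{t: x / totals[t] for t, x in v.items()} for v in vectors]


def normalize_group_size(group_vector):
    """Divide a group's summed vector by its mean value per trigram type
    (one reading of the slide); assumes a non-empty vector."""
    mean = sum(group_vector.values()) / len(group_vector)
    return {t: x / mean for t, x in group_vector.items()}


# Hypothetical summed vectors: one speaker versus two identical speakers.
one_speaker = {"x": 300.0, "y": 700.0}
two_speakers = {"x": 600.0, "y": 1400.0}
print(normalize_group_size(one_speaker), normalize_group_size(two_speakers))
```

Under this reading, two groups end up with identical normalized vectors: one with two speakers of identical profile, and one with a single such speaker. That is what keeps group size alone from producing a difference.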

Applying it to Syntax

Normalizations are needed to:

• prevent false significance arising from differences in text and group sizes

• increase the weight of less frequent 3-grams at the group level (as noted, this is optional)

• allow one to sort 3-grams by whether they are more or less typical of each group, relative to group size

Results and Analysis

• The following slides summarise some of the material in Lauttamus, Nerbonne and Wiersma (2007, 2009)

• The evidence from the data of the two groups shows that there are statistically significant syntactic differences between the adult and juvenile groups

• We argue that some of the significant differences found in the data can be ascribed to the adults' lower level of language proficiency

General Differences

Some of the syntactic differences found in the data can be described in the most general terms as follows (all for the adult group):

1 Overuse of hesitation phenomena

2 Overuse of parataxis

3 Underuse of contracted forms

4 Reduced repertoire of discourse markers

5 Avoidance of complex verbal structures

6 Avoidance of prepositional and phrasal verbs

7 Underuse of the existential there

General Differences

• The adults demonstrate features of disfluent speech, such as (filled) pauses, repeats, false starts, and incomplete or false syntactic structures, arising from difficulties in speech processing, particularly in lexical access

• We argue that the statistical evidence obtained from our data reflects the syntactic distance between the two varieties of L2 English

• and, consequently, the aggregate effects of the differences in the two groups' English proficiency

• We will now look further into pausing

Pausing

We applied the computational technique described earlier to examine:

1 whether the adults and juveniles show a differential use of pausing (filled pauses, FPs), and

2 how such a difference can be analysed and explained

Thus:

• We will now look at the 308 POS trigrams typical of the adult (L1) speakers' syntax

• First we look at the top 200 and compare them with the juveniles' POS trigrams; then we look at all 308

Fig. 1: Percentage of FPs, adults top-200

Pausing

• These are the top 200 POS-trigram types that most characteristically distinguish the adults from the juveniles

• They are significant at the p ≤ 0.05 level

• 42.5% (85 out of 200) include at least one filled pause, as in (1) and (2)

• 6% (12 out of 200) include at least two filled pauses, as in (3) and (4)

• In addition, there is one trigram consisting only of filled pauses
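Counts like these can be derived from a list of significant trigram types with a few lines of Python. The sketch assumes filled pauses can be recognised by their tag. In examples (1)-(4) of this talk they carry Interj, but treating every Interj as a filled pause is an assumption, so the set of FP tags is a parameter.

```python
def fp_summary(trigram_types, fp_tags):
    """Count trigram types with >= 1 FP, >= 2 FPs, and only FPs."""
    at_least_one = at_least_two = only_fps = 0
    for tri in trigram_types:
        k = sum(tag in fp_tags for tag in tri)  # number of FP slots
        at_least_one += k >= 1
        at_least_two += k >= 2
        only_fps += k == len(tri)
    return at_least_one, at_least_two, only_fps


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Examples (1)-(4) plus two further hypothetical types; treating "Interj"
# as the FP tag is an assumption.
types = [
    ("Interj", "Conj(subord)", "Art(def)"),
    ("V(cop,pres,encl)", "Interj", "Adv(inten)"),
    ("Interj", "Interj", "Conj(subord)"),
    ("Interj", "Pron(pers,sing)", "Interj"),
    ("Interj", "Interj", "Interj"),
    ("Pron(pers,sing)", "V(past)", "Art(def)"),
]
print(fp_summary(types, {"Interj"}))
print(pct(85, 200), pct(12, 200), pct(117, 308), pct(14, 308), pct(3, 792))
```

The arithmetic is consistent with the reported figures. 85/200, 12/200, 117/308, 14/308 and 3/792 round to 42.5%, 6%, 38.0%, 4.5% and 0.4% respectively.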

Pausing

(1) Interj Conj(subord) Art(def)

politically | uh when the | liberals

(2) V(cop,pres,encl) Interj Adv(inten)

I | ’m ah very | sick

(3) Interj Interj Conj(subord)

and | uh uh because | in the morning

(4) Interj Pron(pers,sing) Interj

and | uh I uh | snow-skied

Pausing

• For the juveniles, there are 792 POS-trigram types in which they use the sequence of POS tags more frequently than the adults

• Again significant at the p ≤ 0.05 level

• But only 0.4% (3 out of 792) include one filled pause

• None include more than one FP

Fig. 2: Percentage of FPs, adults all 308

Pausing

• These are all 308 POS-trigram types typical of the adults, i.e. all POS trigrams significant at the p ≤ 0.05 level

• 38.0% (117 out of 308) include at least one filled pause

• 4.5% (14 out of 308) include two filled pauses

• And again, there is one trigram type consisting of FPs only

Analysis

• Both figures show the same trend. The highly skewed distribution of the filled pauses across the two groups of Finnish Australian English speakers conclusively shows that

• the juveniles have a much more varied syntactic repertoire (measured in terms of POS trigrams) than the adults, and

• the adults have much more limited and idiosyncratic (ungrammatical or substandard) syntactic patterns at their disposal than the juveniles

Analysis

• The large number of filled pauses found in the adults' speech, as opposed to the juveniles', is in agreement with the evidence in Paananen-Porkka (2007: 234), who argues that native speakers of Finnish show longer pauses on average in English than in Finnish

• The statistically significant differential use of filled pauses by the adults can be explained in terms of the adults' lower proficiency in their L2 (particularly at the level of speech planning) and, consequently, their lower L2 fluency

Analysis

• Eliminating all FPs from the data has little effect on the significance values for the tag sets

• Running the scripts again without the FPs showed that there are still:
  • 729 statistically significant trigram types for the juveniles
  • as opposed to 220 for the adults
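One way to rerun the analysis without FPs is to delete FP tokens from the tagged text and re-extract the trigrams. The minimal Python sketch below assumes (word, tag) pairs and a hypothetical list of FP word forms. The slides do not say whether FPs were removed before or after tagging. Here they are removed after tagging, so the remaining tags stay unchanged.

```python
def remove_fps(tokens, fp_words):
    """Drop filled-pause tokens from a list of (word, tag) pairs."""
    return [(w, t) for w, t in tokens if w.lower() not in fp_words]


def tag_trigrams(tokens):
    """POS trigrams of a list of (word, tag) pairs."""
    tags = [t for _, t in tokens]
    return [tuple(tags[i:i + 3]) for i in range(len(tags) - 2)]


FP_WORDS = {"uh", "ah", "um", "er", "erm"}  # hypothetical FP inventory

# Example (4), with hypothetical tags for "and" and "snow-skied".
tokens = [
    ("and", "Conj(coord)"),
    ("uh", "Interj"),
    ("I", "Pron(pers,sing)"),
    ("uh", "Interj"),
    ("snow-skied", "V(past)"),
]
before = tag_trigrams(tokens)
after = tag_trigrams(remove_fps(tokens, FP_WORDS))
print(len(before), after)
```

Removing FPs does not only delete trigrams. It also creates new ones across the gap, as with "and I snow-skied" here. The FP-less counts are therefore not simply the original counts minus the FP trigrams.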

Analysis

• An examination of the top 200 FP-less trigram types produced by the adults showed that about 38% of the trigram types are ungrammatical, and that some of the remaining trigram types are substandard

• (e.g. omission of an obligatory article or preposition, omission of an obligatory copula or primary verb be or have, omission of the subject, use of a redundant article with proper nouns, etc.; cf. Lauttamus et al. 2007)

Conclusion

• The uneven distribution of the filled pauses across the two groups of Finnish Australian English speakers conclusively shows that

• the adults used many more filled pauses than the juveniles, and

• the adults have much more limited and idiosyncratic syntactic patterns at their disposal

• The statistically significant differential use of filled pauses by the adults can be explained in terms of the adults' lower proficiency compared with that of the juveniles

Concluding Remarks

There is room for fine-tuning the method:

• Find the optimum size for data sets

• Try out and evaluate different measures

The method as it stands can easily be applied to many data sets:

• It works on untagged corpora of spoken language

• It can empirically buttress theses

Software to do this and to pre-process corpora is freely available:

• the ComLinToo: http://old.logilogi.org/ComLinToo

Questions

Any questions?

References

Jan Aarts and Sylviane Granger. Tag sequences in learner corpora: A key to interlanguage grammar and discourse. In Sylviane Granger, editor, Learner English on Computer, pages 132-141. Longman, London, 1998.

Alan Agresti. An Introduction to Categorical Data Analysis. Wiley, New York, 1996.

Thorsten Brants. TnT - a statistical part-of-speech tagger. In 6th Applied Natural Language Processing Conference, pages 224-231. ACL, Seattle, 2000.

Eugenio Coseriu. Probleme der kontrastiven Grammatik. Schwann, Düsseldorf, 1970.

Kees de Bot, Wander Lowie, and Marjolijn Verspoor. Second Language Acquisition: An Advanced Resource Book. Routledge, London, 2005.

Charles Fillmore and Paul Kay. Grammatical constructions and linguistic generalizations: the What's X doing Y? construction. Language, 75, 1999.

Roger Garside, Geoffrey Leech, and Tony McEnery. Corpus Annotation: Linguistic Information from Computer Text Corpora. Longman, London/New York, 1997.

Phillip Good. Permutation Tests. Springer, New York, 1995 (2nd, corr. ed.).

Brett Kessler. The Significance of Word Lists. CSLI Press, Stanford, 2001.

T. Lauttamus, J. Nerbonne, and W. Wiersma. Detecting syntactic contamination in emigrants: The English of Finnish Australians. SKY Journal of Linguistics, 20, 2007.

Chris Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, 1999.

J. Nerbonne and W. Wiersma. A measure of aggregate syntactic distance. In J. Nerbonne and E. Hinrichs, editors, Linguistic Distances, pages 82-90. ACL, Stroudsburg, PA, 2006.

C.C.E. Oomen and A. Postma. Effects of divided attention on the production of filled pauses and repetitions. Journal of Speech, Language, and Hearing Research, 44, 2001.

M. Paananen-Porkka. Speech Rhythm in an Interlanguage Perspective: Finnish Adolescents Speaking English. Pragmatics, Ideology and Contact Monographs. University of Helsinki, Helsinki, 2007.

Shana Poplack and David Sankoff. Borrowing: the synchrony of integration. Linguistics, 22, 1984.

Shana Poplack, David Sankoff, and Christopher Miller. The social correlates and linguistic processes of lexical borrowing and assimilation. Linguistics, 26, 1988.

William C. Ritchie and Tej K. Bhatia, editors. Handbook of Child Language Acquisition. Academic Press, San Diego, 1998.

Peter Sells. Lectures on Contemporary Syntactic Theories. CSLI, Stanford, 1982.

Sarah Thomason and Terrence Kaufman. Language Contact, Creolization, and Genetic Linguistics. University of California Press, Berkeley, 1988.

Frans van Coetsem. Loan Phonology and the Two Transfer Types in Language Contact. Publications in Language Sciences 27. Foris Publications, Dordrecht, 1988.

Greg Watson. The Finnish-Australian English corpus. ICAME Journal: Computers in English Linguistics, 20, 1996.

Uriel Weinreich. Languages in Contact. Mouton, The Hague, 1953 (page numbers from 2nd ed., 1968).

Copyleft

Copyright Wybo Wiersma, John Nerbonne and Timo Lauttamus, available under the Creative Commons BY-SA license

• Thanks to the OpenClipart archive; Carlitos for the landscape, and unknown authors for the bomb and the frogs.

http://creativecommons.org/licenses/by-sa/2.5/