Qiufang Wen
The National Research Center for Foreign Language Education, BFSU
Chinese learner corpora and second language research
The 2006 International Symposium of Computer-Assisted Language Learning
June 2-4, 2006, Beijing
Topics to be addressed
•English corpora of Chinese learners
•Corpus-based studies on English learners in mainland China
•Several corpus-based studies on English learners’ interlanguage by myself or together with my colleagues
•Advantages and disadvantages of corpus-based studies on the interlanguage
Topic One
English corpora of
Chinese learners
•Chinese learner English Corpus (CLEC)
•College Learners’ Spoken English
Corpus (COLSEC)
•Spoken and Written Corpus of Chinese
Learners (SWECCL)
–Version 1
–Version 2 (under construction)
•Bilingual Corpus of Chinese English
Learners (BICCEL): under construction
1. Chinese Learner English Corpus (CLEC) by Gui & Yang in 2003
•Written corpus: 1 million
•Timed and untimed compositions
•Levels of proficiency
– Middle school students
– Non-English major (Band 4)
– Non-English major (Band 6)
– English majors (Band 4)
– English majors (Band 8)
•Error-tagged
Two Types of English Learners in University
English majors (Year 1 to Year 4): Band 4, Band 8
Non-English majors (Year 1 to Year 4): Band 2, Band 4, Band 6
2. College Learners’ Spoken English Corpus (COLSEC) by Yang & Wei in 2005
•Tokens: 0.7 million
•Source: National spoken English test for non-English majors
•Test items
– Teacher-student conversation
– Student-student discussion
– Teacher-student discussion
•Data format: written transcripts
3. Spoken and Written Corpus of Chinese Learners (SWECCL) by Wen, Wang & Liang in 2005 (Version 1)
SWECCL
WECCL          SECCL
1.18 million   1.46 million
Spoken (SECCL)
•Source of data
– National spoken English test: 1996-2002
– Second-year English majors
•Data format
– Digital sounds as well as transcripts of the
speeches
National spoken English test for English majors, Band 4
•Test format
– Test in a lab
•The number of testees annually
– 2006: more than 16,000
– Expected to reach 50,000 in the future
•Scoring procedures
– A random sample (30-35 tapes)
– Two raters scoring one tape independently
•Number of subjects
– 6 groups from each year (1996-2002)
– 42 groups (30-35 students each) = about 1,400 students
– About 230 hours of speech
•Testing items
Testing items
Task        Content                        Preparation time
Retelling   A story                        Listen twice but no preparation   3 min.
Monologue   Personal experience            3 min.                            3 min.
Role play   About an issue in daily life   3 min.                            4 min.
The structure of SECCL
SECCL
Text
Tagged
Raw
Special
Article
Past Tense
Whole
Task
Year
Task A
Task B
Task C
Sound files (1996-2002)
The written component
Written
Year 1 Year 2 Year 3 Year 4
The written component
•Source of data
– Timed compositions in class (40 minutes,
no less than 300 words)
– Take-home compositions (no word limit)
•Types of compositions
– Argumentative (a list of topics provided)
– Narrative
SWECCL in 2007 (Version 2)
SWECCL
WECCL         SECCL
Two million   Two million
SECCL (Version 2)
•2003-2006 National Spoken English Test for second-year English majors (Band 4)
•2000-2006 National Spoken English Test for 4th-year English majors, Band 8 (Task 3)
•Longitudinal data (2001-2004)
Spoken (Band 8)
•Testing item (Task C)
– Make a comment on a given
topic
•Data format
– Digital sounds as well as
transcripts of the speeches
Spoken (Longitudinal)
•72 students 56 students
•40 hours’ speech
                       Year 1   Year 2   Year 3   Year 4
Data collection time   2001     2002     2003     2004
Tasks
•Reading aloud
•Retelling a story
•Talking on a given topic (Narrative)
•Talking on a given topic (argumentative)
•Conversation (Role play)
•Discussion on a given topic
4. Bilingual Corpus of Chinese English Learners (BICCEL)
BICCEL
Spoken Written
E-C C-E E-C C-E
0.5 million 0.5 million 0.5 million 0.5 million
Spoken component of BICCEL
•National Oral English test, Band 8
– The 4th year English majors
– Interpreting from English to Chinese (Task A)
– Interpreting from Chinese to English (Task B)
– 2001-2005: 1100 testees
Written component of BICCEL
•Source of data: in-class assignment
–E-C and C-E translation
–Across the 3rd and 4th years
–30 universities across the country
Topic Two
A brief review of corpus-based studies on Chinese learner English
Sources
•China National Knowledge Infrastructure (CNKI) (on-line journals)
•Digital dissertation database
Corpus-based studies in mainland China
Year    Articles   Dissertations
2006    9          7
2005    40         28
2004    29         17
2003    8          5
2002    6          5
2001    6          1
2000    1          0
Total   99         63
Research areas
Area           Articles   Dissertations   Total
Phonological   5          1               6
Lexical        43         48              91
Grammatical    27         8               35
Discourse      8          2               10
Others         16         4               20
Total          99         63              162
Conferences & workshops
•The International Conference on “Corpus Linguistics”, 25-27 October, 2003
•The First National Symposium on Corpus Linguistics and ELT Education, 11-13 October, 2004
•Workshop on the use of corpus in teaching and research, 17-19 March, 2006
Topic Three
Several corpus-based studies on English learners’ interlanguage by myself or together with my colleagues
Study One
Features of oral style in English compositions of advanced Chinese EFL learners
(Wen, Q.F., Ding, Y.R. & Wang, W.Y. 2003. Foreign Language Teaching & Research (4): 268-274.)
Study Two
A Study on Frequency Adverbs Used by Advanced English Learners in China
Wen, Q. F. & Ding, Y. R. 2004. Modern Foreign Languages (2): 141-147.
Study Three
An analysis of English majors’ abstracting abilities through their English compositions
Wen, Q.F. & Liu, R.Q. 2006. Foreign Languages (2)
Study Four
•A longitudinal study on the developmental features of speaking vocabulary by English majors in mainland China
Wen, Q. F. 2006. Foreign Language Teaching and Research (3).
Study Five
•A comparison of developmental features of Speaking and Writing vocabulary by English majors
•Wen, Q. F. 2006. Foreign languages and Foreign Language Teaching (4)
Study Six
Patterns of change in speaking vocabulary development by English majors
Study Two
A Study on Frequency Adverbs Used by Advanced English Learners in China
Wen, Q. F. & Ding, Y. R. 2004. Modern Foreign Languages (2): 141-147.
Frequency Adverbs
•Adverbs used for
describing “how often”
something happens
•never, sometimes, usually,
always
Top Twenty Frequency Adverbs
•Most frequently used by native speakers according to the analyses of the British National Corpus (BNC) by Leech, Rayson and Wilson (2001)
Top Twenty Frequency Adverbs (TTFAs)
Level of vocabulary   Frequency adverbs   No.
1000-word level       never, always, often, ever, *sometimes, usually, once, generally, hardly, no longer, increasingly, *twice, in general, occasionally, mostly   15
2000-word level       frequently, rarely, regularly   3
Academic word list    normally, constantly   2
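Counting the twenty adverbs in a transcript or composition could be sketched roughly as follows. This is only an illustration: the function name `count_ttfas` and the plain regular-expression matching are assumptions, not the procedure used in the study (the closing slides mention WordSmith as an example of corpus software). A string match cannot tell frequency uses from other senses, for example "once" in "once upon a time".

```python
import re
from collections import Counter

# The twenty adverbs from the TTFA table (Leech et al. 2001), grouped by level.
TTFAS = [
    # 1000-word level
    "never", "always", "often", "ever", "sometimes", "usually", "once",
    "generally", "hardly", "no longer", "increasingly", "twice",
    "in general", "occasionally", "mostly",
    # 2000-word level
    "frequently", "rarely", "regularly",
    # Academic word list
    "normally", "constantly",
]


def count_ttfas(text):
    """Count whole-word, case-insensitive matches of each TTFA in text.

    Hypothetical helper for illustration; no sense disambiguation is done.
    """
    lowered = text.lower()
    counts = Counter()
    for adverb in TTFAS:
        # Word boundaries keep "ever" from matching inside "never" or "every";
        # multiword items such as "no longer" may be separated by any whitespace.
        words = [re.escape(word) for word in adverb.split()]
        pattern = r"\b" + r"\s+".join(words) + r"\b"
        counts[adverb] = len(re.findall(pattern, lowered))
    return counts
```

Raw counts obtained this way would still need to be normalized by corpus size before corpora of different sizes are compared.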
Common features
•All high-frequency words
•Different frequencies in speech and writing except sometimes and twice
(Leech et al. 2001)
A comparison of TTFAs in speech and writing
•The overall difference: TTFAs are more likely to occur in writing than in speech.
•The specific differences
– Speech: never, always, ever, normally
– Neutral: sometimes, twice
– Writing: 14 words
Previous corpus-based studies
•e.g. Altenberg & Granger, 2001; Cobb, 2002; Ringbom, 1998; Wen, Ding, & Wang, 2003
•Conflicting finding one: overuse vs. underuse
Examples
•Overuse high-frequency words in writing (Cobb, 2001)
•Overuse modal verbs (Aijmer, 2002)
•Underuse adverbial connectors (Altenberg & Tapper, 1998)
•No study on frequency adverbs
Conflicting finding two
•Tend to use written style features in their speech
•Tend to use a mixed register in either speech or in writing
•Tend to use oral style features in their writing
•Did not compare the use of high-frequency words in speech with writing
General purposes of this study
Whether Chinese EFL learners simply overuse the TTFAs or they overuse some while underusing others
Whether they use the TTFAs similarly or differently when their speech is compared with their writing
Research questions
• Do they overuse or underuse the TTFAs differently between speech and writing?
• Do they differ more from native speakers in writing or in speaking with regard to the use of the TTFAs?
• Do they demonstrate a similar pattern of writing-speaking difference as native speakers in the use of the TTFAs?
Data for analysis
The learner corpus: the corpus of English majors in China (955,043 words)
– Spoken (SECCL): 473,408 words
– Written (CLEC): 481,635 words
The native-speaker corpus: the British National Corpus (BNC) (100 million words)
– Spoken (BNCS): 10 million words
– Written (BNCW): 90 million words
Data analysis
Four comparisons
• Learners’ speech and native speakers’ speech: SECCL vs. BNCS
• Learners’ writing and native speakers’ writing: CLEC vs. BNCW
• Difference between learners’ and native speakers’ speech, and difference between learners’ and native speakers’ writing: SECCL vs. BNCS and CLEC vs. BNCW
• Difference between learners’ speech and writing, and difference between native speakers’ speech and writing: SECCL vs. CLEC and BNCS vs. BNCW
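Because the four corpora differ greatly in size, comparisons of this kind are usually made on normalized frequencies. The sketch below assumes a per-million-words normalization and uses the two-corpus log-likelihood statistic, a common choice in corpus linguistics. The slides do not say which measure the study used, so both are assumptions, and the helper names are hypothetical. Only the corpus sizes come from the slides.

```python
import math

# Corpus sizes in words, as given on the "Data for analysis" slide.
CORPUS_SIZE = {
    "SECCL": 473_408,      # learners' speech
    "CLEC": 481_635,       # learners' writing
    "BNCS": 10_000_000,    # native speakers' speech
    "BNCW": 90_000_000,    # native speakers' writing
}


def per_million(raw_count, corpus):
    """Normalize a raw count to occurrences per million words."""
    return raw_count / CORPUS_SIZE[corpus] * 1_000_000


def log_likelihood(count_a, size_a, count_b, size_b):
    """Two-corpus log-likelihood (G2) for one word; 0 means identical rates."""
    total = count_a + count_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def compare(raw_learner, learner, raw_native, native):
    """Return (per-million difference, G2); a positive difference suggests overuse."""
    diff = per_million(raw_learner, learner) - per_million(raw_native, native)
    g2 = log_likelihood(raw_learner, CORPUS_SIZE[learner],
                        raw_native, CORPUS_SIZE[native])
    return diff, g2
```

With raw counts for a given adverb, `compare(count_in_seccl, "SECCL", count_in_bncs, "BNCS")` would cover the first comparison, and the same call with `"CLEC"` and `"BNCW"` the second.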
Results (1)
TTFA use in learners’ spoken corpus (SECCL)
Tendency   Words
Overuse    always, once, often, sometimes, usually, hardly (6 words/407 occurrences)
Underuse   normally, never, ever, twice, generally, in general, occasionally, no longer, constantly, increasingly (10 words/48 occurrences)
Results (2)
TTFA use in learners’ written corpus (CLEC)
Tendency   Words
Overuse    always, sometimes, usually, no longer, never, once, often, generally, mostly (9 words/125 occurrences)
Underuse   constantly, occasionally, ever, regularly, rarely, frequently, twice, increasingly, normally (9 words/37 occurrences)
Results (3)
Comparison of learners’ speech with their writing in TTFA use (Overuse)
Tendency                    Words   Frequency difference
SECCL BNCS (Spoken) (6)     always, once, often, sometimes, usually, hardly   407
CLEC BNCW (Written) (9)     always, sometimes, usually, no longer, never, once, often, generally, mostly   125
Results (3)
Comparison (Underuse)
Tendency                    Words   Frequency difference
SECCL BNCS (Spoken) (10)    normally, never, ever, twice, generally, in general, occasionally, no longer, constantly, increasingly   -48
CLEC BNCW (Written) (9)     normally, increasingly, twice, frequently, rarely, regularly, ever, occasionally, constantly   -37
Results (3)
Comparison (identical or similar)
Tendency                    Words   Frequency difference
SECCL BNCS (Spoken) (4)     frequently, regularly, rarely, mostly   -4
CLEC BNCW (Written) (2)     in general, hardly   3
Results (4)
Speaking-writing differences in TTFA use in the CEMIC and the BNC
Register-neutral
BNC: twice, sometimes (2)
CEMIC: constantly, never, regularly, rarely, increasingly, normally (6)
Spoken-register sensitive
BNC: never, always, normally, ever (4)
CEMIC: always, once, often, sometimes, hardly (5)
Results (4)
Speaking-writing differences in TTFA use in the CEMIC and the BNC
Written-register sensitive
BNC: often, once, no longer, generally, increasingly, usually, frequently, hardly, rarely, regularly, constantly, in general, occasionally, mostly (14)
CEMIC: no longer, generally, usually, in general, ever, mostly, occasionally, frequently, twice (9)
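The three register labels used in these tables can be illustrated with a small classification sketch. The criterion below, comparing per-million rates in the spoken and written subcorpora with a tolerance band, is an assumption for illustration. The slides do not state how neutrality was decided, and the function name and the 10% default tolerance are hypothetical.

```python
def register_label(spoken_count, spoken_size, written_count, written_size,
                   tolerance=0.1):
    """Label a word by comparing its normalized rates in speech and writing.

    tolerance is an assumed relative difference below which the word is
    treated as register-neutral; it is not a value taken from the study.
    """
    spoken_rate = spoken_count / spoken_size * 1_000_000
    written_rate = written_count / written_size * 1_000_000
    larger = max(spoken_rate, written_rate)
    # A word absent from both subcorpora, or with nearly equal rates, is neutral.
    if larger == 0 or abs(spoken_rate - written_rate) / larger <= tolerance:
        return "register-neutral"
    if spoken_rate > written_rate:
        return "spoken-register sensitive"
    return "written-register sensitive"
```

Applied once to the learner subcorpora (SECCL and CLEC) and once to the BNC subcorpora (BNCS and BNCW), a function of this kind would give the two rows being compared for each adverb.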
Summary (1)
•English majors in China tend to overuse and underuse certain TTFAs in their speech and writing. The overuse tendency is stronger than the underuse tendency in both speech and writing.
Summary (2)
•The overuse tendency is more marked in their speech than in their writing while the underuse tendency is also slightly stronger in speech than in writing. Some of the overused or underused TTFAs in speech are the same as those in writing but others are different.
Summary (3)
•Chinese English majors demonstrate a pattern of speaking-writing difference that is opposite to that shown in the native speakers’ corpus: they tend to use more TTFAs in their speech than in their writing while native speakers tend to use more TTFAs in their writing than in their speech. This shows that Chinese EFL learners use TTFAs without awareness of their register differences.
Possible reasons
•Limited vocabulary (Table 1b)
•Use them as “time buyers”
•Without equivalents readily available in Chinese
Topic Four
Advantages and disadvantages of corpus-based studies on SLA
Advantage One
•A large sample stored
electronically and open to the
public
– Validity and reliability
(replicable)
– Possible for a diachronic study
Advantage Two
•Using computer software such as WordSmith
– Effectiveness and efficiency
Advantage Three
•Understand the learner language from a different perspective
– Correct vs. incorrect
– More acceptable vs. less acceptable
– Frequency
• Overuse
• Underuse
• Non-use
Disadvantages
Can              Cannot
Product          Process
Productive       Receptive
Group patterns   Individual differences
Language use     Language knowledge
Closing Remark
•The number of researchers is increasing
•Constructing different types of corpora
•Carrying out corpus-based studies
•Findings useful for textbook writers as well as for practitioners
Thank you!!!