Assessment tools and learner corpora
Angel Chan
• Assessment Tools
– Mandarin Receptive Vocabulary Test
• Learner Corpora
– L2 spoken Mandarin Chinese corpus
– potential to build clinical corpora featuring Chinese language pathology
A New Assessment Tool for
Child Mandarin Receptive Vocabulary
Angel Chan1, Kathy Lee2 & Virginia Yip3
1 Dept of Chinese & Bilingual Studies, HK Polytechnic University
2 Division of Speech Therapy, Dept of Otorhinolaryngology,
Head & Neck Surgery, Faculty of Medicine, CUHK
3 Childhood Bilingualism Research Centre,
Dept of Linguistics and Modern Languages, CUHK
Outline
• HK children’s exposure to Mandarin in
kindergartens
• Mandarin Receptive Vocabulary Test
• Results
• Summary and significance
HK Kindergartens with Mandarin exposure
• Total no. of kindergartens: 965
• No. of kindergartens
- with Mandarin exposure = 831 (86.1%)
- without Mandarin exposure = 46 (4.8%)
- information unavailable = 88 (9.1%)
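The percentages on this slide follow directly from the three counts. A minimal sketch of that arithmetic, assuming the counts are exactly as reported (the variable names are illustrative only):

```python
# Counts of Hong Kong kindergartens as reported on the slide.
counts = {
    "with Mandarin exposure": 831,
    "without Mandarin exposure": 46,
    "information unavailable": 88,
}

# The three categories should add up to the reported total of 965.
total = sum(counts.values())

# Share of each category, rounded to one decimal place as on the slide.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {counts[label]} ({share}%)")
```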
Children’s Mandarin exposure in
Hong Kong kindergartens
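[Figure: bar chart of the number of kindergartens (y-axis "No. of KGs", 0 to 400) by Mandarin exposure in minutes per week (x-axis "Min/week": 0 min, 1-20 min, 21-60 min, 61-150 min, >150 min), with separate bars for K1, K2 and K3. Exposure bands labelled Low Exposure, Average Exposure, High Exposure and High High Exposure.]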
Growing importance of Mandarin in
Hong Kong kindergartens
• Over 80% of HK kindergartens provide regular
exposure to Mandarin, though with varying
amounts of input.
• There is a lack of research-based understanding
of HK children’s developmental profiles in
Mandarin.
Available tools to assess early
Mandarin vocabulary
Based on native Mandarin-speaking children in Taiwan:
• Lu L, Liu H. (1988). Revised Peabody Picture Vocabulary Test: Mandarin Chinese Version 修訂畢保德圖畫詞彙測驗 Psychological Publishing Co Ltd. Taipei, Taiwan.
Based on native Mandarin-speaking children in Beijing:
• Tardif, T., Fletcher, P., Zhang, Z.X., Liang, W.L., & Zuo, Q.H. (2008). The Chinese Communicative Development Inventory (Putonghua and Cantonese versions): Manual, Forms, and Norms. Peking University Medical Press.
• Hao, M.L., Shu, H., Xing, A.L., & Li, P. (2008). Early vocabulary inventory for Mandarin Chinese. Behavior Research Methods 40.3: 728-733.
Lack of assessment tools
• No standardized tools to assess the Mandarin
proficiency of Hong Kong preschool children.
• Lack of assessment tools even for monolingual
Mandarin children in China and Taiwan.
Mandarin Receptive Vocabulary Test 普通話詞彙理解測驗
• Early vocabulary inventory for Mandarin Chinese (Hao et al. 2008)
http://brm.psychonomic-journals.org/content/40/3/728/suppl/DC1
• Data from 884 Chinese families in Beijing
• Infants and toddlers from 12 to 30 months
• Checklist and norms available via the internet
• Words in the 90th percentile of comprehension vocabulary for 30-month-olds were chosen for item construction
Mandarin Receptive Vocabulary Test 普通話詞彙理解測驗
98 target words belonging to 14 semantic categories
Target children:
– Preschool children aged 3-6
Quick to administer: 10-20 minutes
Easy to administer: Each child is shown four pictures at a time, and asked to point to the named picture
Target: 杯子 /bei1 zi/ ‘glass’
Phonological distracter: 被子 /bei4 zi/ ‘quilt’
Semantic distracter: 碗 /wan3/ ‘bowl’
Unrelated distracter: 枕頭 /zhen3 tou/ ‘pillow’
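The four-picture design means every wrong answer can be assigned to one of three error types. A minimal sketch of how one item and its scoring could be represented, using the example item from this slide; the `Item` class and `classify_response` function are hypothetical names, not part of the published test:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One test item: a target word plus three distracter pictures."""
    target: str
    phonological: str
    semantic: str
    unrelated: str


def classify_response(item: Item, chosen: str) -> str:
    """Return 'correct' or the error type of the picture the child pointed to."""
    if chosen == item.target:
        return "correct"
    if chosen == item.phonological:
        return "phonological"
    if chosen == item.semantic:
        return "semantic"
    if chosen == item.unrelated:
        return "unrelated"
    raise ValueError(f"{chosen!r} is not one of the four pictures for this item")


# The example item from the slide: target 'glass' with its three distracters.
item = Item(target="杯子", phonological="被子", semantic="碗", unrelated="枕頭")
print(classify_response(item, "被子"))
```

Tallying these labels per child is what makes the later error analysis by distracter type possible.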
Subjects
• 1163 Hong Kong children (age 3-6, L1 Cantonese) who learn Mandarin as an L2.
– come from 4 input condition groups, which differ in the amount of Mandarin exposure time children regularly receive in school,
– ranging from 15-20 minutes to more than 150 minutes per week
• 288 L1 Mandarin children in Beijing (age 3-6)
Input condition group N
LE (1-20 min) 468
AE (21-60 min) 312
HE (61-150 min) 280
HHE (>150 min) 103
L1 288
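The grouping in this table can be read as a simple binning of weekly exposure time. A minimal sketch, assuming the minute boundaries are inclusive as printed in the table; the function name is illustrative, and the handling of 0 minutes is an assumption, since the table lists no tested group for it:

```python
from typing import Optional


def input_condition(minutes_per_week: int) -> Optional[str]:
    """Map weekly Mandarin exposure (minutes) to the input condition group."""
    if minutes_per_week <= 0:
        return None  # assumption: no tested group corresponds to 0 minutes
    if minutes_per_week <= 20:
        return "LE"   # low exposure, 1-20 min
    if minutes_per_week <= 60:
        return "AE"   # average exposure, 21-60 min
    if minutes_per_week <= 150:
        return "HE"   # high exposure, 61-150 min
    return "HHE"      # high high exposure, >150 min


# Group sizes as reported in the table; together they give the L2 sample size.
group_sizes = {"LE": 468, "AE": 312, "HE": 280, "HHE": 103}
l2_total = sum(group_sizes.values())
print(l2_total)
```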
Major findings
• input condition is the strongest factor influencing the test score (p < .05, effect size: 0.655), demonstrating that input quantity influences child L2 competence (De Houwer 2011).
Major findings
Error analysis on average error percentage
• A 3-way repeated-measures ANOVA was applied to investigate the general error patterns.
• Significant effects included:
– Distracter main effect (p < 0.001)
– Distracter*age group interaction effect (p < 0.001)
– Distracter*input condition group interaction effect (p < 0.001)
– Distracter*age group*input condition group interaction effect (p < 0.001)
Major finding
• Error analyses reveal a significant interaction between distracter type, age group and input condition group (p < 0.001), with L2 and L1 children showing distinct profiles in how the distribution of error types changes across age.
– L2 children: Semantic and phonological errors are both frequent at younger ages; as children grow older, semantic errors diminish but certain phonological errors (especially tone errors) still persist at ages 5 and 6
– L1 children: phonology is not a big problem across age
Average Error Percentage (L1)
Age group (no. of error items a) | Phonological (P) | Semantic (S) | Unrelated (U) | Pairwise comparison with Bonferroni correction
3;00 - 3;05 (319) | 29.4 | 51.2 | 19.4 | S > U **
3;06 - 3;11 (220) | 28.6 | 47.0 | 24.4 | NS
4;00 - 4;05 (178) | 28.9 | 50.9 | 20.2 | S > U *
4;06 - 4;11 (200) | 33.3 | 48.5 | 18.2 | S > U **
5;00 - 5;05 (134) | 36.0 | 47.9 | 16.1 | NS
5;06 - 5;11 (109) | 20.5 | 49.8 | 29.7 | NS
a Numbers in brackets indicate the total number of error items.
> denotes statistically significantly larger than; ** p < 0.01, * p < 0.05, NS denotes not statistically significant
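As a quick consistency check on the L1 table, each age group's three error percentages should sum to 100, and the largest share can be read off per group. A minimal sketch using only the figures printed above (the data structure is illustrative):

```python
# Average error percentages for L1 children: (phonological, semantic, unrelated).
l1_errors = {
    "3;00-3;05": (29.4, 51.2, 19.4),
    "3;06-3;11": (28.6, 47.0, 24.4),
    "4;00-4;05": (28.9, 50.9, 20.2),
    "4;06-4;11": (33.3, 48.5, 18.2),
    "5;00-5;05": (36.0, 47.9, 16.1),
    "5;06-5;11": (20.5, 49.8, 29.7),
}
labels = ("P", "S", "U")

# Each row is a distribution over error types, so it should total 100%.
row_sums = {age: sum(row) for age, row in l1_errors.items()}

# Error type with the largest share in each age group.
dominant = {
    age: labels[row.index(max(row))] for age, row in l1_errors.items()
}

for age in l1_errors:
    print(age, round(row_sums[age], 1), dominant[age])
```

Note that a larger share is descriptive only; the table's pairwise comparisons show which differences reached statistical significance.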
Significance
• offers researchers and clinicians a useful screening test and an alternative to parental checklists such as the Chinese Communicative Development Inventory (Tardif et al 2008) and the early vocabulary inventory for Mandarin Chinese (Hao et al. 2008) to assess receptive vocabulary competence in Mandarin.
Significance of findings
–For researchers:
• What are the optimal input conditions for
acquisition in terms of quantity and quality
of input?
• What are the common semantic and
phonological errors?
• What do these errors tell us about the child’s
developing semantic and phonological
systems?
Significance of findings
–For clinical & educational practitioners and parents:
• How to create optimal input conditions in terms of the quantity and quality of input to support balanced bilingual/trilingual development?
• A baseline profile for typically developing bilingual children needs to be established for comparison with the atypically developing counterparts
• How to attend to semantic distinctions and phonological distinctions in therapy and pedagogy?
Key References
• Hao, M.L., Shu, H., Xing, A.L., & Li, P. (2008).
Early vocabulary inventory for Mandarin Chinese.
Behavior Research Methods 40.3: 728-733.
• Lee, K.Y.S., Lee, L.W.T., & Cheung, P.S.P. (1996).
Hong Kong Cantonese Receptive Vocabulary Test.
Hong Kong: The Hong Kong Society for Child
Health and Development.
Acknowledgments
• Research grants: “Constructing a Blueprint of a New
Assessment Tool for Child L2 Mandarin Receptive Vocabulary”
HKPU Ref No. 1-ZV8K and “From Lexicon to Syntax in
Childhood Bilingualism” RGC Ref. No. CUHK 453808
• We thank Yang Wenchun, Angela He, Jacqueline Lai, Sunny
Park, Kelly Shum, Alice Tse, Eunice Wong, Hinny Wong,
Reace Wong, Zhu Xin, Wang Jiao, Liu Chang, Wang Zheng,
Claire Au and Joffee Lam for their participation.
A New Multimedia Shared L2 Spoken Mandarin Chinese Corpus:
Construction and Linguistic Analyses
Angel CHAN1, Zhen-Hui FENG2, Wen-Chun YANG1
1The Hong Kong Polytechnic University, 2Lingnan University
Data Sharing
• a growing commitment to data-sharing
• basing replicable empirical and theoretical analyses on openly shared data
• initiatives to share learner language corpora through internet interfaces for the international research community have become more common
– However, thus far mostly limited to featuring European languages as the target languages
Existing Corpora in the Focus Area “SLABank” of Talkbank
No. | Name of Corpus | Target L2 Language | L1 | Contributors
1 | BELC | English | Spanish | Research team at the Department of English of the University of Barcelona
2 | Connolly | English | Japanese | Steve Connolly (Tokyo)
3 | CUHK | English | Chinese | Brian MacWhinney (Department of Psychology, Carnegie Mellon University)
4 | DiazRodriguez | Spanish | German/Swedish/Icelandic/Korean/Chinese | Lourdes Diaz Rodriguez (Universitat Pompeu Fabra, Spain)
5 | Dresden | English/French/Czech | German | Angelika Kubanek-German (University of Braunschweig)
6 | ESF | Dutch/English/French/German/Swedish | Arabic/Finnish/Punjabi/Spanish/Turkish | Wolfgang Klein, Clive Perdue (Max Planck Institute)
7 | FLLOC | French | English | Florence Myles (University of Southampton)
Existing Corpora in the Focus Area “SLABank” of Talkbank (cont’d)
No. | Name of Corpus | Target L2 Language | L1 | Contributors
8 | Køge | Danish | Turkish | Jens Normann Jørgensen (University of Copenhagen)
9 | Langman | Hungarian | Chinese | Juliet Langman (University of Texas at San Antonio)
10 | Liceras | Spanish | English | Liceras, Juana (University of Ottawa)
11 | PAROLE | English, French, Italian | English, French | Languages research team (Laboratoire LLS) at the Université de Savoie (Chambéry, France)
12 | Qatar | English | Arabic | Yun Zhao (Carnegie Mellon University)
13 | Reading | French | English | Brian Richards (University of Reading)
14 | SPLLOC | Spanish | English | A team of researchers in Southampton, Newcastle, and York universities
15 | TCD | French | English | Seán Devitt (School of Education, Trinity College, Dublin)
Existing Corpora in the Focus Area “BilingBank” of Talkbank
No. | Name of Corpus | Target Languages | Contributors
1 | Bangor-Pilot | Welsh-English | Margaret Deuchar (University of Wales)
2 | Bangor (Welsh-English) Siarad | Welsh-English | Margaret Deuchar (Bangor University)
3 | BlumSnow | Hebrew-English | Shoshana Blum-Kulka (Hebrew University), Catherine Snow (Harvard Graduate School of Education)
4 | Eppler | German-English | Eva Eppler (University of Surrey Roehampton)
5 | Gardner-Chloros | Greek-English | Dr. P.H. Gardner-Chloros (Birkbeck College)
6 | Hatzidaki | Greek-French | Aspa Hatzidaki
7 | Køge | Turkish-Danish | Jens Normann Jørgensen (University of Copenhagen)
Existing Corpora in the Focus Area “Clinical Corpora” of Talkbank
No. | Name of Corpus | Language | Age Range | N | Contributors
1 | Bliss | English | 3;0–11;8; 2;3–11;8 | 8 normal; 7 impaired | Lynn S. Bliss (Wayne State University)
2 | Bol / Kuiken | Dutch | 4;1.16–8;1.17; 8-18; 3-9; 1;7-3;7 | 20; 20; 20; 47 | Gerard Bol (University of Groningen)
3 | Bol / Pool | Dutch | 6-7 | 6 | Gerard Bol (University of Groningen)
4 | Chiat | English | 5;0-5;8 | 3 | Shula Chiat
5 | Conti-Ramsden 1 | English | 4;0–9;0 | 4+4 | Gina Conti-Ramsden (The University of Manchester)
6 | Conti-Ramsden 2 | English | 1;11–5;8 | 3+3 | Gina Conti-Ramsden (The University of Manchester)
7 | Conti-Ramsden 3 | English | 2-4 | 4 | Gina Conti-Ramsden (The University of Manchester)
Existing Corpora in the Focus Area “Clinical Corpora” of Talkbank
No. | Name of Corpus | Language | Age Range | N | Contributors
8 | Conti-Ramsden 4 | English | 13-15 | 19, 99 | Gina Conti-Ramsden (The University of Manchester)
9 | CORDIS | Spanish | 10-21 | 52 | Teresa Fernández de Vega Losada, et al.
10 | Feldman | English | 1;2–3;0; xxx | 4 sets of twins | Heidi Feldman (Children’s Hospital)
11 | Flusberg | English | xxx | 6 Autism; 6 Down |
12 | Foudon | French | 3;9-9;2 | 8 autism | Nadège Foudon (Institut des Sciences Cognitives)
13 | Fujiki / Brinton | English | 24–77 years | 42 | Bonnie Brinton, Martin Fujiki (Brigham Young University)
14 | Hargrove | English | 3;0–6;0 | 6 | Patricia Hargrove (Mankato State University)
15 | Hooshyar | English | 1;4–2;11; 3;2–11;6; 2;8–5;9 | 40 normal; 31 Downs; 21 impaired | Nahid Hooshyar
Existing Corpora in the Focus Area “Clinical Corpora” of Talkbank
No. | Name of Corpus | Languages | Age Range | N | Contributors
16 | Le Normand-SLI | French | | 6 | Dr. Marie-Thérèse Le Normand
17 | LeNormand – Apraxia | French | | 3 | Dr. Marie-Thérèse Le Normand
18 | Levy | Hebrew | 1;10–8;4 | 14 | Yonata Levy (Hebrew University)
19 | Malakoff / Mayes | English | 2;0–2;22 | 76 | Marguerite E. Malakoff (Harvey Mudd College)
20 | MOC | Spanish | 1;6-5;5 | 1 | Ignacio Moreno-Torres, Santiago Torres, Rafael Santana (University of Málaga, University of Las Palmas)
21 | Nadig | English, French | 3-7 | 12 English, 8 French | Janet Bang, Aparna Nadig
22 | Nicholas | English | 12-48 | 90 | Johanna Nicholas
Existing Corpora in the Focus Area “Clinical Corpora” of Talkbank
No. | Name of Corpus | Target Languages | Age Range | N | Contributors
23 | Oviedo | Spanish | 7–8 | 2 | Eliseo Diez-Itza (Universidad de Oviedo)
24 | Rollins | English | 2;2–3;1 | 5 | Pamela Rosenthal Rollins (University of Texas at Dallas)
25 | Rondal | English | 3;0–12;1 | 21 Downs; 21 controls | Jean Rondal (Laboratoire de Psychologie)
26 | Serra | Spanish | 3;9-5;1 | 10 | Miquel Serra (Universitat de Barcelona)
27 | Ulm | German | 3;0–7;5 | 165 | Andrea Haege (University of Ulm)
28 | Weismer | English | 2;6, 3;6, 4;6, and 5;6 | 138 | Weismer, Susan Ellis (San Diego State University)
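Existing corpora on L2 Chinese
• Only a handful of SLA corpora feature Chinese as the target language, with a recent emerging trend to share these learner corpora of Chinese on the internet:
– (i) 暨南大學留學生書面語語料庫 (Jinan University written-language corpus of international students)
– (ii) 暨南大學華文學院留學生口語語料庫 (Jinan University College of Chinese Language and Culture spoken-language corpus of international students)
– (iii) 北京語言大學「HSK 動態作文語料庫」 (Beijing Language and Culture University "HSK Dynamic Composition Corpus")
– (iv) 北京語言大學漢語仲介語語料庫 (Beijing Language and Culture University Chinese Interlanguage Corpus)
• Most corpora feature only written data rather than spoken data
– except (ii)
• The web interfaces of all these corpora are in Chinese
– may not be user-friendly for researchers who do not read Chinese and who would like to conduct cross-linguistic comparisons involving Chinese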
This project
• constructed a web-accessible and video-linked Second Language (L2) spoken Mandarin Chinese corpus in a common international interchange format
– using the frog story commonly used in cross-linguistic research (Mayer 1969, Berman & Slobin 1994)
– featuring 14 L2 adult participants (First Language (L1): English) and 6 L1 adult participants as controls
– aiming to share the corpus through the international TalkBank database platform (MacWhinney et al 2004; MacWhinney 2007; http://talkbank.org/)
Background information of the 14 L2 Subjects (L1 English)
No. | Subject | Age | Gender | Education Level | Age at which learning of Chinese started | Contexts of acquisition | Other languages
1 | Mi | 24 | F | Master | 18 | Classroom/Conversation/Reading | German, French
2 | Pa | 48 | M | Master | 28 | Classroom/Conversation/Self-learning | German, French
3 | Ga | 42 | M | Doctor | 19 | Conversation/Reading/Watching TV and movies | Spanish
4 | Je | 38 | M | Doctor | 30 | Conversation | French
5 | Jo | 37 | M | High School | 34 | Classroom/Conversation | None
6 | Ta | 36 | F | High School | 33 | Classroom/Conversation | None
7 | Aa | 23 | M | Bachelor | 22 | Classroom/Conversation | French, Swedish
8 | Al | 34 | M | Bachelor | 30 | Classroom/Conversation/Reading/Self-learning | None
9 | Ba | 20 | M | Bachelor | 18 | Classroom/Conversation | Spanish, German
10 | Ge | 34 | M | Master | 27 | Conversation | Spanish
11 | Ja | 31 | M | Master | 24 | Classroom/Conversation/Self-learning | Spanish, French
12 | Mo | 70 | M | Doctor | 22 | Classroom/Conversation/Reading/Self-learning | French
13 | Na | 23 | M | Bachelor | 20 | Classroom/Conversation | French
14 | Pi | 24 | M | Bachelor | 20 | Classroom/Conversation/Self-learning | None
Background Information of the 6 L1 Mandarin Subjects
No. | Participant | Age | Education Level | Gender | L2 Matching Subject
1 | Ya | 24 | Master | F | Mi
2 | Qi | 49 | Bachelor | M | Pa
3 | Gu | 38 | Bachelor | M | Ga
4 | Do | 35 | Doctor | M | Je
5 | Wu | 35 | Doctor | M | Jo
6 | Zha | 32 | Master | F | Ta
Corpus construction
follow the TalkBank format
• videotape and collect speech samples from each participant
• orthographically transcribe the speech samples according to the standard CHAT format
• link each transcribed utterance to the original video data
• conduct inter-person reliability checks of
– the transcription format
– the video-linking and synchronization of the data
• perform automatic part-of-speech and English tagging of the transcriptions using the CLAN software in the TalkBank system
• manually disambiguate the automatic tagging
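To make the video-linking step concrete, here is a minimal sketch of reading one transcribed utterance together with its media time span. It rests on assumptions that should be checked against the CHAT manual: that a main-tier line starts with `*`, a speaker code, a colon and a tab, and that the media link at the end of the line is a millisecond range `start_end` wrapped in the control character U+0015. The sample utterance, the speaker code `PAR` and the timestamps are hypothetical, not taken from the corpus.

```python
import re
from typing import NamedTuple, Optional

BULLET = "\x15"  # assumed delimiter of the media-link time span

MAIN_TIER = re.compile(
    r"^\*(?P<speaker>[A-Z0-9]+):\t(?P<text>.*?)"
    rf"(?:\s*{BULLET}(?P<start>\d+)_(?P<end>\d+){BULLET})?$"
)


class Utterance(NamedTuple):
    speaker: str
    text: str
    start_ms: Optional[int]  # None when the line carries no media link
    end_ms: Optional[int]


def parse_main_tier(line: str) -> Optional[Utterance]:
    """Parse one main-tier line; return None for headers and dependent tiers."""
    match = MAIN_TIER.match(line)
    if match is None:
        return None
    start, end = match.group("start"), match.group("end")
    return Utterance(
        speaker=match.group("speaker"),
        text=match.group("text"),
        start_ms=int(start) if start is not None else None,
        end_ms=int(end) if end is not None else None,
    )


# Hypothetical line: speaker PAR, one utterance, linked to 1.2 s - 3.4 s of video.
sample = f"*PAR:\t他 走 进 来 了 . {BULLET}1200_3400{BULLET}"
utterance = parse_main_tier(sample)
print(utterance)
```

In practice the CLAN tools handle this linking; the sketch only illustrates what "linking each utterance to the video" amounts to as data.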
Story Script for Frog, Where Are You? by Mercer Mayer, 1969
1 There once was a boy who had a dog and a pet frog. He kept the frog in a large jar in his bedroom.
2 One night while he and his dog were sleeping, the frog climbed out of the jar. He jumped out of an open window.
3 When the boy and the dog woke up the next morning, they saw that the jar was empty.
4 The boy looked everywhere for the frog. The dog looked for the frog too. When the dog tried to look in the jar, he got his head stuck.
5 The boy called out the open window, “Frog, where are you?” The dog leaned out the window with the jar still stuck on his head.
6 The jar was so heavy that the dog fell out of the window headfirst!
7 The boy picked up the dog to make sure he was ok. The dog wasn’t hurt but the jar was smashed.
8 - 9 The boy and the dog looked outside for the frog. The boy called for the frog.
10 He called down a hole in the ground while the dog barked at some bees in a beehive.
11 A gopher popped out of the hole and bit the boy right on his nose. Meanwhile, the dog was still bothering the bees, jumping up on the tree and barking at them.
12 The beehive fell down and all of the bees flew out. The bees were angry at the dog for ruining their home.
13 The boy wasn’t paying any attention to the dog. He had noticed a large hole in a tree. So he climbed up the tree and called down the hole.
Story Script for Frog, Where Are You? by Mercer Mayer, 1969
14 All of a sudden an owl swooped out of the hole and knocked the boy to the ground.
15 The dog ran past the boy as fast as he could because the bees were chasing him.
16 The owl chased the boy all the way to a large rock.
17 The boy climbed up on the rock and called again for his frog. He held onto some branches so he wouldn’t fall.
18 But the branches weren’t really branches! They were deer antlers. The deer picked up the boy on his head.
19 The deer started running with the boy still on his head. The dog ran along too. They were getting close to a cliff.
20-21 The deer stopped suddenly and the boy and the dog fell over the edge of the cliff.
22 There was a pond below the cliff. They landed with a splash right on top of one another.
23 They heard a familiar sound.
24 The boy told the dog to be very quiet.
25 They crept up and looked behind a big log.
26 There they found the boy’s pet frog. He had a mother frog with him.
27 They had some baby frogs and one of them jumped towards the boy.
28-29 The baby frog liked the boy and wanted to be his new pet. The boy and the dog were happy to have a new pet frog to take home. As they walked away the boy waved and said “goodbye” to his old frog and his family.
A Sample Video-linked Transcript
Adding Tagging using the CLAN software
Key characteristics of this corpus
• Will be made openly accessible on the internet
• Having video-linked (hence audio-linked too) transcribed oral data
L2 acquisition of Chinese directional complements
(Feng 2011)
Directional Complements in Chinese (Wu 2011)
Six types of directional complement constructions in Chinese (Wu 2011)
Type Example
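1. Simple DCs 他 走 到 了。 ‘He has arrived (on foot).’
2. Complex DCs 他 走 进 来 了。 ‘He walked in (towards the speaker).’
3. Simple DCs with Object NPs 他 搬 出 了 一张大桌子。 ‘He moved out a big table.’
4. Simple DCs with Place NPs 他 走 回 宿舍 了。 ‘He walked back to the dormitory.’
5. Complex DCs with Object NPs 他 搬 出 一张大桌子 来 了。 ‘He moved a big table out (towards the speaker).’
6. Complex DCs with Place NPs 他 走 回 宿舍 来 了。 ‘He walked back to the dormitory (towards the speaker).’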
L2 acquisition of Chinese directional complements (Feng 2011)
• L1-L2 comparisons on:
- frequency of use
- accuracy
- productivity of verb usage (verb type frequency)
• Major finding:
- L1-L2 difference especially with constructions that are structurally more complex
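Of the three comparison measures, productivity of verb usage is the most mechanical: it counts how many distinct verbs (types) a speaker combines with a construction, as opposed to how many times the construction is used (tokens). A minimal sketch of that count; the verb lists below are hypothetical illustrations, not data from Feng (2011):

```python
from collections import Counter
from typing import Dict, List


def verb_productivity(verbs: List[str]) -> Dict[str, int]:
    """Return token count (frequency of use) and type count (distinct verbs)."""
    counts = Counter(verbs)
    return {"tokens": len(verbs), "types": len(counts)}


# Hypothetical main verbs found in one construction type for each group.
l1_verbs = ["走", "搬", "跑", "走", "跳"]
l2_verbs = ["走", "走", "走", "搬"]

l1_stats = verb_productivity(l1_verbs)
l2_stats = verb_productivity(l2_verbs)
print(l1_stats, l2_stats)
```

A group can use a construction often (many tokens) while relying on only a few verbs (few types), which is why frequency and productivity are reported separately.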
Significance
• could raise the visibility of SLA learner corpora featuring Chinese as the target language
• has the potential to be further expanded and developed into a multi-mother-tongue corpus of learner Chinese
• featuring a variety of first languages as well as a variety of Chinese languages as the second languages
• could be expanded to clinical corpora featuring Chinese
Thank You!