feedback types - arbeitsbereichedm/handouts/amaral-meurers-calico-08-03... · feedback types...

7
On the identification and interpretation of tokens in ICALL Luiz Amaral & Detmar Meurers Introduction Research Context The TAGARELA system Feedback Types Feedback Examples System Architecture NLP Analysis Modules Annotation-based NLP Little Things with Big Effects Accents Interpretation Portuguese Properties Mismatches in the interpretation of tokens Solution Tokenization Interpretation Portuguese Properties Mismatches in the identification of tokens Solution Conclusion Little Things With Big Effects: On the identification and interpretation of tokens for error diagnosis in ICALL Luiz Amaral University of Victoria Detmar Meurers The Ohio State University CALICO 2008 Automatic Analysis of Learner Language Workshop March 19, 2008 1/23 On the identification and interpretation of tokens in ICALL Luiz Amaral & Detmar Meurers Introduction Research Context The TAGARELA system Feedback Types Feedback Examples System Architecture NLP Analysis Modules Annotation-based NLP Little Things with Big Effects Accents Interpretation Portuguese Properties Mismatches in the interpretation of tokens Solution Tokenization Interpretation Portuguese Properties Mismatches in the identification of tokens Solution Conclusion Introduction ICALL systems often focus on providing automatic feedback on learner input. Providing feedback involves identifing linguistic properties of the input and interpreting them in terms of likely (mis)conceptions of the learner. For written learner input, this includes orthographic, morphological, syntactic, and semantic properties, all of which require a first step: the identification and interpretation of the words (tokens). Based on our experience with TAGARELA, we discuss the identification and interpretation of tokens in the system how students identify and interpret foreign language tokens problems arising from mismatches between the two our solutions to these problems 2/23 On the identification and interpretation of tokens in ICALL Luiz Amaral & Detmar Meurers Introduction Research Context The TAGARELA system Feedback Types Feedback Examples System Architecture NLP Analysis Modules Annotation-based NLP Little Things with Big Effects Accents Interpretation Portuguese Properties Mismatches in the interpretation of tokens Solution Tokenization Interpretation Portuguese Properties Mismatches in the identification of tokens Solution Conclusion Research Context: The TAGARELA system TAGARELA is an intelligent web-based workbook for learners of Portuguese (Amaral and Meurers 2005, 2006). It addresses real-life needs identified in interviews with OSU foreign language instructors (Amaral 2004, 2007). It provides opportunities for students to practice their listening, reading, and writing skills. Focus: on-the-spot feedback for activity types considered useful but typically assigned as homework. It offers six types of activities: listening comprehension reading comprehension picture description fill-in-the-blank rephrasing vocabulary 3/23 On the identification and interpretation of tokens in ICALL Luiz Amaral & Detmar Meurers Introduction Research Context The TAGARELA system Feedback Types Feedback Examples System Architecture NLP Analysis Modules Annotation-based NLP Little Things with Big Effects Accents Interpretation Portuguese Properties Mismatches in the interpretation of tokens Solution Tokenization Interpretation Portuguese Properties Mismatches in the identification of tokens Solution Conclusion Feedback Types TAGARELA provides on-the-spot feedback on orthographic errors non-words spacing capitalization punctuation semantic errors missing concepts extra concepts word choice syntactic errors nominal agreement verbal agreement 4/23

Upload: phamhanh

Post on 09-Nov-2018

237 views

Category:

Documents


0 download

TRANSCRIPT

On the identificationand interpretation of

tokens in ICALLLuiz Amaral & Detmar Meurers

Introduction

Research ContextThe TAGARELA system

Feedback Types

Feedback Examples

System Architecture

NLP Analysis Modules

Annotation-based NLP

Little Things withBig EffectsAccents

Interpretation

Portuguese Properties

Mismatches in theinterpretation of tokens

Solution

Tokenization

Interpretation

Portuguese Properties

Mismatches in theidentification of tokens

Solution

Conclusion

Little Things With Big Effects:On the identification and interpretation of tokens

for error diagnosis in ICALL

Luiz AmaralUniversity of Victoria

Detmar MeurersThe Ohio State University

CALICO 2008Automatic Analysis of Learner Language Workshop

March 19, 2008

1 / 23

On the identificationand interpretation of

tokens in ICALLLuiz Amaral & Detmar Meurers

Introduction

Research ContextThe TAGARELA system

Feedback Types

Feedback Examples

System Architecture

NLP Analysis Modules

Annotation-based NLP

Little Things withBig EffectsAccents

Interpretation

Portuguese Properties

Mismatches in theinterpretation of tokens

Solution

Tokenization

Interpretation

Portuguese Properties

Mismatches in theidentification of tokens

Solution

Conclusion

Introduction

I ICALL systems often focus on providing automaticfeedback on learner input.

I Providing feedback involves identifing linguistic propertiesof the input and interpreting them in terms of likely(mis)conceptions of the learner.

I For written learner input, this includes orthographic,morphological, syntactic, and semantic properties, all ofwhich require a first step:

I the identification and interpretation of the words (tokens).

I Based on our experience with TAGARELA, we discussI the identification and interpretation of tokens in the systemI how students identify and interpret foreign language tokensI problems arising from mismatches between the twoI our solutions to these problems

2 / 23

On the identificationand interpretation of

tokens in ICALLLuiz Amaral & Detmar Meurers

Introduction

Research ContextThe TAGARELA system

Feedback Types

Feedback Examples

System Architecture

NLP Analysis Modules

Annotation-based NLP

Little Things withBig EffectsAccents

Interpretation

Portuguese Properties

Mismatches in theinterpretation of tokens

Solution

Tokenization

Interpretation

Portuguese Properties

Mismatches in theidentification of tokens

Solution

Conclusion

Research Context: The TAGARELA system

I TAGARELA is an intelligent web-based workbook forlearners of Portuguese (Amaral and Meurers 2005, 2006).

I It addresses real-life needs identified in interviews withOSU foreign language instructors (Amaral 2004, 2007).

I It provides opportunities for students to practice theirlistening, reading, and writing skills.

I Focus: on-the-spot feedback for activity types considereduseful but typically assigned as homework.

I It offers six types of activities:I listening comprehensionI reading comprehensionI picture descriptionI fill-in-the-blankI rephrasingI vocabulary

3 / 23

On the identificationand interpretation of

tokens in ICALLLuiz Amaral & Detmar Meurers

Introduction

Research ContextThe TAGARELA system

Feedback Types

Feedback Examples

System Architecture

NLP Analysis Modules

Annotation-based NLP

Little Things withBig EffectsAccents

Interpretation

Portuguese Properties

Mismatches in theinterpretation of tokens

Solution

Tokenization

Interpretation

Portuguese Properties

Mismatches in theidentification of tokens

Solution

Conclusion

Feedback Types

I TAGARELA provides on-the-spot feedback onI orthographic errors

I non-wordsI spacingI capitalizationI punctuation

I semantic errorsI missing conceptsI extra conceptsI word choice

I syntactic errorsI nominal agreementI verbal agreement

4 / 23

On the identificationand interpretation of

tokens in ICALLLuiz Amaral & Detmar Meurers

Introduction

Research ContextThe TAGARELA system

Feedback Types

Feedback Examples

System Architecture

NLP Analysis Modules

Annotation-based NLP

Little Things withBig EffectsAccents

Interpretation

Portuguese Properties

Mismatches in theinterpretation of tokens

Solution

Tokenization

Interpretation

Portuguese Properties

Mismatches in theidentification of tokens

Solution

Conclusion

Feedback on Spelling Error

5 / 23

On the identificationand interpretation of

tokens in ICALLLuiz Amaral & Detmar Meurers

Introduction

Research ContextThe TAGARELA system

Feedback Types

Feedback Examples

System Architecture

NLP Analysis Modules

Annotation-based NLP

Little Things withBig EffectsAccents

Interpretation

Portuguese Properties

Mismatches in theinterpretation of tokens

Solution

Tokenization

Interpretation

Portuguese Properties

Mismatches in theidentification of tokens

Solution

Conclusion

Feedback on Capitalization Error

6 / 23

On the identificationand interpretation of

tokens in ICALLLuiz Amaral & Detmar Meurers

Introduction

Research ContextThe TAGARELA system

Feedback Types

Feedback Examples

System Architecture

NLP Analysis Modules

Annotation-based NLP

Little Things withBig EffectsAccents

Interpretation

Portuguese Properties

Mismatches in theinterpretation of tokens

Solution

Tokenization

Interpretation

Portuguese Properties

Mismatches in theidentification of tokens

Solution

Conclusion

Feedback on Missing Word

7 / 23

On the identificationand interpretation of

tokens in ICALLLuiz Amaral & Detmar Meurers

Introduction

Research ContextThe TAGARELA system

Feedback Types

Feedback Examples

System Architecture

NLP Analysis Modules

Annotation-based NLP

Little Things withBig EffectsAccents

Interpretation

Portuguese Properties

Mismatches in theinterpretation of tokens

Solution

Tokenization

Interpretation

Portuguese Properties

Mismatches in theidentification of tokens

Solution

Conclusion

Feedback on Word Choice

8 / 23

On the identificationand interpretation of

tokens in ICALLLuiz Amaral & Detmar Meurers

Introduction

Research ContextThe TAGARELA system

Feedback Types

Feedback Examples

System Architecture

NLP Analysis Modules

Annotation-based NLP

Little Things withBig EffectsAccents

Interpretation

Portuguese Properties

Mismatches in theinterpretation of tokens

Solution

Tokenization

Interpretation

Portuguese Properties

Mismatches in theidentification of tokens

Solution

Conclusion

Feedback on Agreement Error

9 / 23

On the identificationand interpretation of

tokens in ICALLLuiz Amaral & Detmar Meurers

Introduction

Research ContextThe TAGARELA system

Feedback Types

Feedback Examples

System Architecture

NLP Analysis Modules

Annotation-based NLP

Little Things withBig EffectsAccents

Interpretation

Portuguese Properties

Mismatches in theinterpretation of tokens

Solution

Tokenization

Interpretation

Portuguese Properties

Mismatches in theidentification of tokens

Solution

Conclusion

Overall system architecture of TAGARELA

Expert Module

Linguistic Analysis sub-modules

Strategic Analysis sub-modules

task strategiestask appropriatenesstransfer

Ana

lysi

s M

anag

er•L

ingu

istic

Ana

lysi

s (fo

rm a

nd c

onte

nt)

•Stra

tegi

c A

naly

sis

Activity Model

Error Taxonomy

Instruction Model

Personal information

Interaction Preferences

Student Model

Language Competence

Feed

back

Man

ager

(ped

agog

ical

mod

ules

)•E

rror

Filt

erin

g•R

anki

ng•S

tude

nt a

naly

sis

•Fee

dbac

k se

lect

ion

FeedbackGeneration

Form Analysis:tokenizerspell-checkerlexical look-updisambiguatorparser

Content Analysis:difflibcorrect answertoken matchercanonic matcherpos matcher

Web InterfaceStudent Input Feedback Message

10 / 23

On the identificationand interpretation of

tokens in ICALLLuiz Amaral & Detmar Meurers

Introduction

Research ContextThe TAGARELA system

Feedback Types

Feedback Examples

System Architecture

NLP Analysis Modules

Annotation-based NLP

Little Things withBig EffectsAccents

Interpretation

Portuguese Properties

Mismatches in theinterpretation of tokens

Solution

Tokenization

Interpretation

Portuguese Properties

Mismatches in theidentification of tokens

Solution

Conclusion

Linguistic Analysis Modules in TAGARELA

I Tokenizer: takes into account specifics of Portuguese(cliticization, contractions, abbreviations)

I Lexical/Morphological Lookup: returns multipleanalyses based on CURUPIRA lexicon (Martins et al. 2006)

I Finite State Disambiguation: narrows down lexicalinformation, in the spirit of Constraint Grammar(Karlsson et al. 1995; Bick 2000, 2004)

I Bottom-Up Chart Parser: establishes relations to checkagreement, case and global well-formedness

I Content Assessment using shallow semantic matchingbetween answer and target (Bailey and Meurers 2006)

11 / 23

On the identificationand interpretation of

tokens in ICALLLuiz Amaral & Detmar Meurers

Introduction

Research ContextThe TAGARELA system

Feedback Types

Feedback Examples

System Architecture

NLP Analysis Modules

Annotation-based NLP

Little Things withBig EffectsAccents

Interpretation

Portuguese Properties

Mismatches in theinterpretation of tokens

Solution

Tokenization

Interpretation

Portuguese Properties

Mismatches in theidentification of tokens

Solution

Conclusion

Annotation-based Processing

I NLP analysis = a process of enriching the learner inputwith annotations (parallel to corpus annotation).

I The same data structure is used throughout:I the learner input annotated with information.

12 / 23

On the identificationand interpretation of

tokens in ICALLLuiz Amaral & Detmar Meurers

Introduction

Research ContextThe TAGARELA system

Feedback Types

Feedback Examples

System Architecture

NLP Analysis Modules

Annotation-based NLP

Little Things withBig EffectsAccents

Interpretation

Portuguese Properties

Mismatches in theinterpretation of tokens

Solution

Tokenization

Interpretation

Portuguese Properties

Mismatches in theidentification of tokens

Solution

Conclusion

Interpreting tokens: Accents (I)

13 / 23

On the identificationand interpretation of

tokens in ICALLLuiz Amaral & Detmar Meurers

Introduction

Research ContextThe TAGARELA system

Feedback Types

Feedback Examples

System Architecture

NLP Analysis Modules

Annotation-based NLP

Little Things withBig EffectsAccents

Interpretation

Portuguese Properties

Mismatches in theinterpretation of tokens

Solution

Tokenization

Interpretation

Portuguese Properties

Mismatches in theidentification of tokens

Solution

Conclusion

Interpreting tokens: Accents (II)

14 / 23

On the identificationand interpretation of

tokens in ICALLLuiz Amaral & Detmar Meurers

Introduction

Research ContextThe TAGARELA system

Feedback Types

Feedback Examples

System Architecture

NLP Analysis Modules

Annotation-based NLP

Little Things withBig EffectsAccents

Interpretation

Portuguese Properties

Mismatches in theinterpretation of tokens

Solution

Tokenization

Interpretation

Portuguese Properties

Mismatches in theidentification of tokens

Solution

Conclusion

Properties of PortugueseAccents and their importance for lexical distinctions

I Accents in Portuguese encode important linguisticdistinctions.

I Part-of-speech differences:I pronoun vs. verb

I esta (this) – esta (is)I conjunction vs. verb

I e (and) – e (is)I verb vs. noun

I para (stop) – Para (state’s name)

I Other differences:I gender

I avo (grandfather) – avo (grandmother)I meaning

I coco (coconut) – coco (poop)

15 / 23

On the identificationand interpretation of

tokens in ICALLLuiz Amaral & Detmar Meurers

Introduction

Research ContextThe TAGARELA system

Feedback Types

Feedback Examples

System Architecture

NLP Analysis Modules

Annotation-based NLP

Little Things withBig EffectsAccents

Interpretation

Portuguese Properties

Mismatches in theinterpretation of tokens

Solution

Tokenization

Interpretation

Portuguese Properties

Mismatches in theidentification of tokens

Solution

Conclusion

Mismatches in the interpretation of accents

I Learner Input: O vaso esta em cima de mesa.

I System’s interpretation:I The word esta in the learner input is a determiner.I There is no form of the verb estar in the answer.⇒ The student did not include the main verb.

I Student’s interpretation:I I included esta as a form of the verb estar.

I (The correct spelling is esta.)I There is a verb in the sentence.⇒ The lack of an accent is a spelling error.

16 / 23

On the identificationand interpretation of

tokens in ICALLLuiz Amaral & Detmar Meurers

Introduction

Research ContextThe TAGARELA system

Feedback Types

Feedback Examples

System Architecture

NLP Analysis Modules

Annotation-based NLP

Little Things withBig EffectsAccents

Interpretation

Portuguese Properties

Mismatches in theinterpretation of tokens

Solution

Tokenization

Interpretation

Portuguese Properties

Mismatches in theidentification of tokens

Solution

Conclusion

Addressing the Interpretation of Accents

I Learners perceive the unaccented and accentedversions of a character as orthographically similar andin consequence confuse linguistically unrelated forms.

I The system needs to capture the confusability ofaccented with unaccented characters.

I Treat accented and unaccented characters parallel tocommon L1-transfer phonological confusions.

I esta and esta are confused just likeI liver and river are by Japanese learners of English

⇒ Develop a module that compares whether(un)accentuated variants of input words are more likely.

I Where this is the case, provide dedicated feedbackalerting learner of this confusion.

17 / 23

On the identificationand interpretation of

tokens in ICALLLuiz Amaral & Detmar Meurers

Introduction

Research ContextThe TAGARELA system

Feedback Types

Feedback Examples

System Architecture

NLP Analysis Modules

Annotation-based NLP

Little Things withBig EffectsAccents

Interpretation

Portuguese Properties

Mismatches in theinterpretation of tokens

Solution

Tokenization

Interpretation

Portuguese Properties

Mismatches in theidentification of tokens

Solution

Conclusion

Identifying tokens (I)

18 / 23

On the identificationand interpretation of

tokens in ICALLLuiz Amaral & Detmar Meurers

Introduction

Research ContextThe TAGARELA system

Feedback Types

Feedback Examples

System Architecture

NLP Analysis Modules

Annotation-based NLP

Little Things withBig EffectsAccents

Interpretation

Portuguese Properties

Mismatches in theinterpretation of tokens

Solution

Tokenization

Interpretation

Portuguese Properties

Mismatches in theidentification of tokens

Solution

Conclusion

Identifying tokens (II)

19 / 23

On the identificationand interpretation of

tokens in ICALLLuiz Amaral & Detmar Meurers

Introduction

Research ContextThe TAGARELA system

Feedback Types

Feedback Examples

System Architecture

NLP Analysis Modules

Annotation-based NLP

Little Things withBig EffectsAccents

Interpretation

Portuguese Properties

Mismatches in theinterpretation of tokens

Solution

Tokenization

Interpretation

Portuguese Properties

Mismatches in theidentification of tokens

Solution

Conclusion

Properties of PortugueseTokenization

I Certain Portuguese words are syntactically complex.

I Contraction: preposition + determiner/pronounI no = em (in) + o (the)

I nela = em (in) + ela (it)

I destes = de (of) + estes (these)

I as = a (to) + as (the)

I Encliticization:I compra-lo = comprar (to buy) + o (it)

I compram-nas = compram (buy) + as (them)

I comprei-a = comprei (bought) + a (it)

20 / 23

On the identificationand interpretation of

tokens in ICALLLuiz Amaral & Detmar Meurers

Introduction

Research ContextThe TAGARELA system

Feedback Types

Feedback Examples

System Architecture

NLP Analysis Modules

Annotation-based NLP

Little Things withBig EffectsAccents

Interpretation

Portuguese Properties

Mismatches in theinterpretation of tokens

Solution

Tokenization

Interpretation

Portuguese Properties

Mismatches in theidentification of tokens

Solution

Conclusion

Mismatches in the identification of tokens

I Learner input: O Amazonas fica no regiao norte.

I System’s interpretation: no = em + oI tokenized input: [em, o, regiao, norte]I syntactically analyzed: [PP em [NP omasc , regiaofem, norte]]⇒ Agreement error between o and regiao.

I Student’s interpretation:I There is no o regiao norte in the sentence I wrote.I I used the ‘preposition’ no.⇒ So no seems to be the wrong preposition?

21 / 23

On the identificationand interpretation of

tokens in ICALLLuiz Amaral & Detmar Meurers

Introduction

Research ContextThe TAGARELA system

Feedback Types

Feedback Examples

System Architecture

NLP Analysis Modules

Annotation-based NLP

Little Things withBig EffectsAccents

Interpretation

Portuguese Properties

Mismatches in theinterpretation of tokens

Solution

Tokenization

Interpretation

Portuguese Properties

Mismatches in theidentification of tokens

Solution

Conclusion

Addressing the Identification of Tokens

I The system needs to connect the surface form providedby the student with the system analysis of this input.

I An annotation-based NLP architecture readily supportsthis with multiple parallel layers of annotation for thelearner input.

I The tokenization mismatch can be adressed byrepresenting both surface and deep tokenizations of thelearner input, and the mapping between the two.

I Refer to surface form when generating the feedback.

22 / 23

On the identificationand interpretation of

tokens in ICALLLuiz Amaral & Detmar Meurers

Introduction

Research ContextThe TAGARELA system

Feedback Types

Feedback Examples

System Architecture

NLP Analysis Modules

Annotation-based NLP

Little Things withBig EffectsAccents

Interpretation

Portuguese Properties

Mismatches in theinterpretation of tokens

Solution

Tokenization

Interpretation

Portuguese Properties

Mismatches in theidentification of tokens

Solution

Conclusion

ConclusionI In an ICALL system, problems can arise from

mismatches between:I the identification and interpretation of the learner input

by the systemI how the learner perceives and conceptualize the input

I Where such mismatches arise, the feedback producedby the system is inadequate.

I We discussed two such mismatches for Portuguesetokens in TAGARELA:

I interpretation of tokens: accented charactersI identification of tokens: contraction, encliticization

I We argued that these problems can be adressedI by treating accented and unaccented characters parallel

to common L1-transfer phonological confusions.I using an annotation-based NLP processing architecture

supporting a rich representation of the learner input,including surface and deep tokenizations.

23 / 23

On the identificationand interpretation of

tokens in ICALLLuiz Amaral & Detmar Meurers

Introduction

Research ContextThe TAGARELA system

Feedback Types

Feedback Examples

System Architecture

NLP Analysis Modules

Annotation-based NLP

Little Things withBig EffectsAccents

Interpretation

Portuguese Properties

Mismatches in theinterpretation of tokens

Solution

Tokenization

Interpretation

Portuguese Properties

Mismatches in theidentification of tokens

Solution

Conclusion

References

Amaral, Luiz (2007). Designing Intelligent Language Tutoring Systems: integratingNatural Language Processing technology into foreign language teaching.Ph.D. thesis, The Ohio State University.

Amaral, Luiz and Detmar Meurers (2006). Where does ICALL Fit into ForeignLanguage Teaching? CALICO Conference. May 19, 2006. University of Hawaii.

Amaral, Luiz Alexandre (2004). ICALL Specifications. Ms. Department ofLinguistics & Department of Spanish and Portuguese. Ohio State University.

Amaral, Luiz Alexandre and Walt Detmar Meurers (2005). Towards Bridging theGap between the Needs of Foreign Language Teaching and NLP in ICALL. InAntonio Pedros-Gascon (ed.), Electronic Proceedings of the 8th AnnualSymposium on Hispanic and Luso-Brazilian Literatures, Linguistics, andCultures.

Bailey, Stacey and Detmar Meurers (2006). Exercise-driven selection of contentmatching methodologies. EUROCALL Conference. September 6, 2006.University of Granada.

Bick, Eckhard (2000). The Parsing System “Palavras”: Automatic GrammaticalAnalysis of Portuguese in a Constraint Grammar Framework . AarhusUniversity Press.

Bick, Eckhard (2004). PaNoLa: Integrating Constraint Grammar and CALL. InHenrik Holmboe (ed.), Nordic Language Technology, Arbog for NordiskSprogteknologisk Forskningsprogram 2000-2004 (Yearbook 2003),Copenhagen: Museum Tusculanum, pp. 183–190.

23 / 23

On the identificationand interpretation of

tokens in ICALLLuiz Amaral & Detmar Meurers

Introduction

Research ContextThe TAGARELA system

Feedback Types

Feedback Examples

System Architecture

NLP Analysis Modules

Annotation-based NLP

Little Things withBig EffectsAccents

Interpretation

Portuguese Properties

Mismatches in theinterpretation of tokens

Solution

Tokenization

Interpretation

Portuguese Properties

Mismatches in theidentification of tokens

Solution

Conclusion

Karlsson, Fred, Atro Voutilainen, Juha Heikkila and Arto Anttila (eds.) (1995).Constraint Grammar: A Language-Independent System for ParsingUnrestricted Text . No. 4 in Natural Language Processing. Berlin and New York:Mouton de Gruyter.

Martins, Ronaldo, Ricardo Hasegawa and Maria das Gracas Nunes (2006).Curupira: a functional parser for Brazilian Portuguese. In ComputationalProcessing of the Portuguese Language, 6th International Workshop,PROPOR. Lecture Notes in Computer Science 2721. Faro, Portugal: Springer.

23 / 23