
UIMA-based Systems for Language Learning

Björn Rudzewitz

Tübingen University

May 17, 2017

Rudzewitz (Tübingen University) UIMA Systems CALICO, May 17, 2017 1 / 46


Plan

1 Introduction

2 FeedBook

3 TAGARELA

4 VIEW/WERTi


Our Perspective

NLP = Natural Language Processing

NLP applicable to all language variations, e.g. social media, literature, clinical data, ...

our perspective: analysis of natural language for language learning purposes

diverging levels of evidence for insights into language acquisition processes

NLP for analysis, pre-processing, feedback generation, input enhancement, ...

following: presentation of UIMA-based NLP systems and hands-on session


FeedBook

ICALL system for English as a second language

developing an enhanced online version of a paper-based workbook for 7th grade English

Application of NLP:

(currently:) correction aid for teachers
(work in progress:) scaffolding feedback for students on

1 spectrum from open to closed activities
2 form and meaning tasks


FeedBook - Step 1

fully functional online version of certified schoolbook with 55 exercises

four activity types:
1 short answers
2 fill-in-the-blanks
3 true/false
4 mapping

different UIMA pipelines for different task types and expected inputs
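The idea of one pipeline per activity type can be sketched as follows. This is a minimal illustration in plain Python, not FeedBook's actual UIMA code: a dict stands in for the UIMA CAS, and the annotator functions, the `PIPELINES` table, and `run_pipeline` are hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List

# A plain dict stands in for the UIMA CAS (assumption for illustration only).
Analysis = Dict[str, object]
Annotator = Callable[[Analysis], None]


def tokenize(cas: Analysis) -> None:
    """Add a whitespace tokenization of the raw text."""
    cas["tokens"] = str(cas["text"]).split()


def normalize(cas: Analysis) -> None:
    """Add a lowercased, whitespace-collapsed view of the raw text."""
    cas["normalized"] = " ".join(str(cas["text"]).lower().split())


def parse_boolean(cas: Analysis) -> None:
    """Interpret the input of a true/false item."""
    cas["value"] = str(cas["text"]).strip().lower() in {"true", "t", "yes"}


# Hypothetical mapping from the four activity types to annotator sequences.
PIPELINES: Dict[str, List[Annotator]] = {
    "short_answer": [normalize, tokenize],
    "fill_in_the_blanks": [normalize],
    "true_false": [parse_boolean],
    "mapping": [normalize],
}


def run_pipeline(activity_type: str, text: str) -> Analysis:
    """Run the annotators registered for the activity type, in order."""
    cas: Analysis = {"text": text}
    for annotator in PIPELINES[activity_type]:
        annotator(cas)  # each step enriches the shared analysis
    return cas
```

Keeping the mapping in one table means a new activity type only needs a new entry, which mirrors the modularity the slides attribute to UIMA pipelines.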


The FeedBook System

platform-independent multi-layer web application

first system version for collecting student answer and teacher feedback data

workflow:

students select and work on exercises
teachers correct student answers
students inspect the teacher's results
teachers inspect statistics about results


FeedBook - Student Lobby


Input Interface for Students (1)

Example: listening comprehension with sentences to write

each subtask displayed on one page

each page contains all necessary information and media


Input Interface for Students (2)

Example: fill-in-the-blanks

frontend as in the paper workbook

student can save or submit exercise


FeedBook - Teacher Lobby

system shows overview of corrected tasks and tasks still to be corrected

teacher selects exercise for correction


Correction Interface for Teachers

correction interface shows:

exercise
student answers
target answers

task of the teacher:

mark and categorize errors

optionally comment them

give global comment and rating

correction aid by system


Correction Aid for Teachers

system shows green checkmark = answer correct

system visually marks differences between student & target answer


Error Annotation by Teachers

teacher selects part of the student answer and chooses error category

optional:

comment for student
correct solution (automatic)

FeedBook remembers corrections

→ “Feedback Memory”


Error Annotation by Teacher

Language form errors: phrasing, agreement, determiner, preposition, grammar, spelling, pronoun, tense, clause structure, word choice, missing word, word order, punctuation

Content errors: problematic understanding, missing information, wrong information, lack of understanding, extra information, alternate answer

Table: Error types in FeedBook


Feedback Memory - Step 1

system identifies identical answers automatically

system automatically detects deviations from target answer

→ automatic creation of annotations
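One way to detect deviations from a target answer is a token-level diff. The sketch below uses Python's standard `difflib` module. It is an assumption about how such a comparison could look, not a description of FeedBook's implementation, and the function name `find_deviations` is hypothetical.

```python
import difflib
from typing import List, Tuple


def find_deviations(student: str, target: str) -> List[Tuple[str, str, str]]:
    """Return (operation, student_part, target_part) for each token-level difference.

    The operation is one of 'replace', 'delete' or 'insert', as reported by
    difflib.SequenceMatcher; an empty list means the answers are identical
    at the token level.
    """
    s_tokens, t_tokens = student.split(), target.split()
    matcher = difflib.SequenceMatcher(a=s_tokens, b=t_tokens, autojunk=False)
    deviations = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            deviations.append(
                (op, " ".join(s_tokens[i1:i2]), " ".join(t_tokens[j1:j2]))
            )
    return deviations
```

Each returned tuple could seed an automatically created annotation that the teacher then labels with an error type.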


Feedback Memory - Step 2

teacher

selects error type

can modify the information

automatic feedback distinguished from manual feedback via colors


Feedback Memory - Step 3

errors corrected once are reloaded

system corrects known errors automatically

teacher can change or modify the information
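The three steps of the feedback memory can be sketched as a lookup table keyed by task and normalized answer. This is a minimal sketch under stated assumptions: the class `FeedbackMemory`, its method names, and the normalization (lowercasing, collapsing whitespace) are hypothetical and not taken from FeedBook.

```python
from typing import Dict, Optional, Tuple


class FeedbackMemory:
    """Stores teacher corrections keyed by task and normalized student answer."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Dict[str, str]] = {}

    @staticmethod
    def _key(task_id: str, answer: str) -> Tuple[str, str]:
        # Assumed normalization: case-insensitive, whitespace-insensitive.
        return task_id, " ".join(answer.lower().split())

    def remember(self, task_id: str, answer: str,
                 error_type: str, comment: str = "") -> None:
        """Record (or overwrite) a manual correction made by the teacher."""
        self._store[self._key(task_id, answer)] = {
            "error_type": error_type,
            "comment": comment,
            "source": "manual",
        }

    def recall(self, task_id: str, answer: str) -> Optional[Dict[str, str]]:
        """Return a stored correction, marked as automatic, or None."""
        hit = self._store.get(self._key(task_id, answer))
        if hit is None:
            return None
        return {**hit, "source": "automatic"}
```

The `source` field plays the role of the color coding on the slides: recalled feedback is marked as automatic so the teacher can tell it apart and still change it by calling `remember` again.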


Result Interface for Students


Diagnostics Interface for Teachers

teachers can group errors and visualize them

grouping by task possible, grouping by student planned

status of a task also visualized


FeedBook - Step 2 (work in progress)

UIMA-based NLP deep analysis of answers for fully automatic scaffolding feedback for students

incremental, task-informed processing for multi-level comparison of student and target answer

type of divergence and convergence on different levels for deriving diagnosis

parallel processing of high number of utterances
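A multi-level comparison can be illustrated by checking student and target answer under increasingly abstract views and reporting the first level at which they converge. The levels below (surface, whitespace, case, punctuation, token bag) are simplified stand-ins chosen for this sketch; the slides do not list the levels FeedBook uses, and a real system would presumably add linguistic levels such as lemmas or part-of-speech tags.

```python
import string
from typing import Callable, List, Optional, Tuple


def _strip_punct(text: str) -> str:
    """Remove ASCII punctuation characters."""
    return text.translate(str.maketrans("", "", string.punctuation))


# Views ordered from most to least strict; the names are illustrative only.
LEVELS: List[Tuple[str, Callable[[str], str]]] = [
    ("surface", lambda s: s),
    ("whitespace", lambda s: " ".join(s.split())),
    ("case", lambda s: " ".join(s.lower().split())),
    ("punctuation", lambda s: " ".join(_strip_punct(s.lower()).split())),
    ("token_bag", lambda s: " ".join(sorted(_strip_punct(s.lower()).split()))),
]


def first_matching_level(student: str, target: str) -> Optional[str]:
    """Return the strictest level at which both answers converge, else None."""
    for name, view in LEVELS:
        if view(student) == view(target):
            return name
    return None
```

The level at which the answers first converge hints at the diagnosis: a match only at `token_bag`, for example, suggests that the words are right but their order is not.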


FeedBook - Step 2

creation of feedback from diagnosis based on

task factors
student history/competences
general language factors

transfer feedback technology from teacher to student interface

question for NLP: type of feedback generation for activities based on position in spectrum from task-constrained to general language, i.e. where does it make sense to perform analyses?


FeedBook - Links

FeedBook info page: http://feedbook.website

FeedBook web app: https://feedbook.schule


TAGARELA

TAGARELA ICALL system [Amaral et al., 2011] for learning Portuguese

feedback on form and certain aspects of meaning

accompanies regular course material

feedback generation based on language, task, and user factors


TAGARELA Example


Activities

six activity types

different UIMA pipelines/NLP tools for different activity types

→ generation of diagnosis


TAGARELA

diagnosis → feedback

error taxonomy based on authentic data

error diagnosis: non-words, orthography, agreement, missing concept, extra concept, word order, word choice

feedback module

prioritizes errors according to task (meaning/form) and student model (error history)
generates verbalization of diagnosis (feedback message)
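The two jobs of the feedback module can be sketched as a sort followed by a message template. Everything below is a hypothetical illustration: the split of the diagnosis categories into form and meaning errors, the ranking rule, and the message wording are assumptions, not TAGARELA's actual logic.

```python
from typing import Dict, List

# Assumed grouping of the diagnosis categories (not stated on the slides).
FORM_ERRORS = {"non-word", "orthography", "agreement", "word order", "word choice"}
MEANING_ERRORS = {"missing concept", "extra concept"}


def prioritize(errors: List[str], task_focus: str,
               error_history: Dict[str, int]) -> List[str]:
    """Order diagnosed errors for feedback.

    Errors matching the task focus ('meaning' or 'form') come first; within
    each group, errors the student made more often in the past come first,
    with alphabetical order as a tie-breaker.
    """
    focus_set = MEANING_ERRORS if task_focus == "meaning" else FORM_ERRORS
    return sorted(
        errors,
        key=lambda e: (e not in focus_set, -error_history.get(e, 0), e),
    )


def verbalize(error: str) -> str:
    """Turn a diagnosis label into a placeholder feedback message."""
    return f"Please check your answer: there seems to be a problem with {error}."
```

A system that shows one message at a time would verbalize only the first element of the prioritized list.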


TAGARELA - Feedback


TAGARELA - UIMA requirement

Portuguese as a challenging case for NLP

multi-level language analyses: contracted tokens, enclisis

learner language analysis

task-specific pipelines, modular architecture

matching on various levels of abstraction


TAGARELA - UIMA for Portuguese

parallel representation and incremental enrichment of analysis helpful

information from one layer can help make decisions on another layer

example: tokenization and part of speech tagging


UIMA

Example: da = de + a

tokenizer decision informs part of speech tagging level (preposition + article)

token annotation as hierarchical structure of surface form (text) and deep forms (split tokens)

parallel surface augmentation representation
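The hierarchical token annotation can be pictured as a surface token that carries its split forms as children, all sharing the same character offsets. The sketch below is a plain-Python approximation, not TAGARELA's UIMA type system: the `Token` class, the `CONTRACTIONS` table, and the tag names `PREP` and `ART` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Token:
    surface: str   # text as written (or the split form, for parts)
    begin: int     # character offset where the surface token starts
    end: int       # character offset where the surface token ends
    pos: str = ""  # part-of-speech tag, if already known
    parts: List["Token"] = field(default_factory=list)  # deep forms


# Small illustrative table of Portuguese contractions (far from complete).
CONTRACTIONS: Dict[str, List[Tuple[str, str]]] = {
    "da": [("de", "PREP"), ("a", "ART")],
    "do": [("de", "PREP"), ("o", "ART")],
    "das": [("de", "PREP"), ("as", "ART")],
}


def tokenize(text: str) -> List[Token]:
    """Whitespace-tokenize and attach split forms to contracted tokens."""
    tokens: List[Token] = []
    offset = 0
    for word in text.split():
        begin = text.index(word, offset)
        end = begin + len(word)
        token = Token(word, begin, end)
        # The tokenizer's split decision already fixes the tags of the parts.
        for form, pos in CONTRACTIONS.get(word.lower(), []):
            token.parts.append(Token(form, begin, end, pos))
        tokens.append(token)
        offset = end
    return tokens
```

Because the parts keep the offsets of the surface token, later components can work on either layer while still pointing back to the same span of learner text.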


Complex Token

Figure: Example of a complex Token


TAGARELA - Links

info page and web app: http://sifnos.sfs.uni-tuebingen.de/tagarela/index.py/main


VIEW/WERTi

VIEW: Visual Input Enhancement of the Web

input enhancement tool for language learners

based on the WERTi system


WERTi

WERTi system [Meurers et al., 2010]

WERTi: Working with English Real Texts

tool for enhancing web pages

users can enhance web pages of their own choice

goal: raising awareness of linguistic constructions


WERTi - Input Enhancement

input enhancement [Smith, 1993]

communicative activities with (incidental) focus on form [Long, 1998]

exposure to linguistic data with certain constructions (potentially) highlighted

language learning not possible without noticing of forms → input enhancement


WERTi

high motivation/relevant material via free choice by learners

contextualized learning by enhancing web pages (including images, videos, ...)

supplements regular course material/self-learning


WERTi - Constructs

modular system architecture allows flexibility in components

constructs included:

determiners
prepositions
gerunds
infinitives
wh-questions
phrasal verbs

3 activity types:

color highlighting of constructs
clicking (with color feedback)
gapped text
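Two of the activity types, color highlighting and gapped text, can be sketched for prepositions with a few lines of Python. This is a deliberately naive illustration: it relies on a tiny hand-picked word list instead of the NLP analysis WERTi performs (so, for instance, infinitival "to" is treated as a preposition), and the function names and the HTML markup are assumptions.

```python
import re
from typing import List, Tuple

# Tiny illustrative word list; a real system would rely on POS tagging.
PREPOSITIONS = {"in", "on", "at", "to", "from", "with", "of", "for", "by"}
_WORD = re.compile(r"\b[A-Za-z]+\b")


def highlight(text: str, color: str = "orange") -> str:
    """Wrap each preposition in a colored <span> element."""
    def wrap(match: "re.Match[str]") -> str:
        word = match.group(0)
        if word.lower() in PREPOSITIONS:
            return f'<span style="color:{color}">{word}</span>'
        return word
    return _WORD.sub(wrap, text)


def gap_text(text: str, gap: str = "____") -> Tuple[str, List[str]]:
    """Replace each preposition with a gap and return the expected answers."""
    answers: List[str] = []

    def blank(match: "re.Match[str]") -> str:
        word = match.group(0)
        if word.lower() in PREPOSITIONS:
            answers.append(word)
            return gap
        return word

    return _WORD.sub(blank, text), answers
```

The clicking activity would reuse the same detection step and compare the learner's clicks against the detected spans.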


WERTi - Example: Color Highlighting


WERTi - Example: Clicking


WERTi - Example: Gapped Text


VIEW/WERTi

VIEW: extension of WERTi to multiple languages (EN, DE, ES, RU)

implementation as a browser plugin

more activities (multiple choice) and constructs implemented


VIEW - Example (1)


VIEW - Example (2)


VIEW/WERTi - Links

VIEW page: http://sifnos.sfs.uni-tuebingen.de/VIEW/

WERTi page: http://sifnos.sfs.uni-tuebingen.de/WERTi/index.jsp?content=home


References

Please note: all URLs were last accessed on May 8, 2017.


Luiz Amaral, Detmar Meurers, and Ramon Ziai. Analyzing learner language: towards a flexible natural language processing architecture for intelligent language tutors. Computer Assisted Language Learning, 24(1):1–16, 2011.

Ana Díaz-Negrillo, Detmar Meurers, Salvador Valera, and Holger Wunsch. Towards interlanguage POS annotation for effective learner corpora in SLA and FLT. In Language Forum, volume 36, pages 139–154, 2010.

David Ferrucci and Adam Lally. UIMA: an architectural approach to unstructured information processing in the corporate research environment. Natural Language Engineering, 10(3-4):327–348, 2004.

Michael H. Long and Peter Robinson. Focus on form: Theory, research, and practice. Focus on Form in Classroom Second Language Acquisition, 15:15–41, 1998.

Detmar Meurers, Ramon Ziai, Luiz Amaral, Adriane Boyd, Aleksandar Dimitrov, Vanessa Metcalf, and Niels Ott. Enhancing authentic web pages for language learners. In Proceedings of the NAACL HLT 2010 Fifth Workshop on Innovative Use of NLP for Building Educational Applications, pages 10–18. Association for Computational Linguistics, 2010.


Philip Ogren and Steven Bethard. Building test suites for UIMA components. In Proceedings of the Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing (SETQA-NLP 2009), pages 1–4, Boulder, Colorado, June 2009. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/W/W09/W09-1501.

Marwa Ragheb and Markus Dickinson. Defining Syntax for Learner Language Annotation. In COLING (Posters), pages 965–974, 2012.

Michael Sharwood Smith. Input enhancement in instructed SLA. Studies in Second Language Acquisition, 15(02):165–179, 1993.

Sylvie Thouesny. Increasing the reliability of a part-of-speech tagging tool for use with learner language. In Presentation given at the Automatic Analysis of Learner Language (AALL'09) workshop on automatic analysis of learner language: from a better understanding of annotation needs to the development and standardization of annotation schemes, 2009.

Bertus Van Rooy and Lande Schäfer. An evaluation of three POS taggers for the tagging of the Tswana Learner English Corpus. In Proceedings of the Corpus Linguistics 2003 conference, volume 16, pages 835–844. Citeseer, 2003.