UIMA-based Systems for Language Learning
Björn Rudzewitz
Tübingen University
May 17, 2017
Rudzewitz (Tübingen University) UIMA Systems CALICO, May 17, 2017 1 / 46
Plan
1 Introduction
2 FeedBook
3 TAGARELA
4 VIEW/WERTi
Our Perspective
NLP = Natural Language Processing
NLP applicable to all language variations, e.g. social media, literature, clinical data, ...
our perspective: analysis of natural language for language learning purposes
diverging levels of evidence for insights into language acquisition processes
NLP for analysis, pre-processing, feedback generation, input enhancement, ...
following: presentation of UIMA-based NLP systems and hands-on session
FeedBook
ICALL system for English as a second language
developing an enhanced online version of a paper-based workbook for 7th grade English
Application of NLP:
(currently:) correction aid for teachers
(work in progress:) scaffolding feedback for students on
1 spectrum from open to closed activities
2 form and meaning tasks
FeedBook - Step 1
fully functional online version of certified schoolbook with 55 exercises
four activity types:
1 short answers
2 fill-in-the-blanks
3 true/false
4 mapping
different UIMA pipelines for different task types and expected inputs
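The idea of routing each activity type to its own pipeline can be sketched roughly as follows. This is a minimal Python illustration, not FeedBook's UIMA code: the step functions, the registry name PIPELINES, and the choice of steps per activity type are all assumptions made for the example.

```python
from typing import Callable, Dict, List

Step = Callable[[str], str]

# Hypothetical analysis steps; a real system would plug in UIMA annotators here.
def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.strip().lower().split())

def strip_punctuation(text: str) -> str:
    """Drop every character that is neither alphanumeric nor whitespace."""
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())

# One ordered list of steps per activity type named on the slide.
# Which steps belong to which type is an assumption for illustration.
PIPELINES: Dict[str, List[Step]] = {
    "short_answer": [normalize, strip_punctuation],
    "fill_in_the_blanks": [normalize],
    "true_false": [normalize],
    "mapping": [normalize],
}

def run_pipeline(activity_type: str, answer: str) -> str:
    """Apply the steps registered for the activity type, in order."""
    for step in PIPELINES[activity_type]:
        answer = step(answer)
    return answer
```

Keeping the pipelines in a registry means a new activity type only needs a new entry, which is one plausible reading of the modular design described here.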
The FeedBook System
platform-independent multi-layer web application
first system version for collecting student answer and teacher feedback data
workflow:
students select and work on exercises
teachers correct student answers
students inspect the teacher's results
teachers inspect statistics about results
FeedBook - Student Lobby
Input Interface for Students (1)
Example: listening comprehension with sentences to write
each subtask displayed on one page
each page contains all necessary information and media
Input Interface for Students (2)
Example: fill-in-the-blanks
frontend as in the paper workbook
student can save or submit exercise
FeedBook - Teacher Lobby
system shows overview of corrected tasks and tasks to be corrected
teacher selects exercise for correction
Correction Interface for Teachers
correction interface shows:
exercise
student answers
target answers
task of the teacher:
mark and categorize errors
optionally comment on them
give global comment and rating
correction aid by system
Correction Aid for Teachers
system shows
green checkmark = answer correct
system marks visually
differences between student & target answer
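One way to compute the differences that such a correction aid could mark is a token-level diff. The sketch below uses Python's standard difflib module. It is an assumption about how this might be done, not a description of FeedBook's implementation, and the function name token_differences is made up for the example.

```python
import difflib
from typing import List, Tuple

def token_differences(student: str, target: str) -> List[Tuple[str, str, str]]:
    """Return (operation, student_part, target_part) for each differing region.

    The operation is one of difflib's opcode tags: 'replace', 'delete', 'insert'.
    An empty result means the answers match token by token.
    """
    s_tokens, t_tokens = student.split(), target.split()
    matcher = difflib.SequenceMatcher(a=s_tokens, b=t_tokens, autojunk=False)
    diffs = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            diffs.append((op, " ".join(s_tokens[i1:i2]), " ".join(t_tokens[j1:j2])))
    return diffs
```

An empty list would correspond to the green checkmark case, and each returned tuple to a region that gets visual marking.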
Error Annotation by Teachers
teacher selects part of the student answer and chooses an error category
optional:
comment for student
correct solution (automatic)
FeedBook remembers corrections
→ “Feedback Memory”
Error Annotation by Teacher
Language form errors: phrasing, agreement, determiner, preposition, grammar, spelling, pronoun, tense, clause structure, word choice, missing word, word order, punctuation
Content errors: problematic understanding, missing information, wrong information, lack of understanding, extra information, alternate answer
Table: Error types in FeedBook
Feedback Memory - Step 1
system identifies identical answers automatically
system automatically detects deviations from target answer
→ automatic creation of annotations
Feedback Memory - Step 2
teacher
selects error type
can modify the information
automatic feedback distinguished from manual feedback via colors
Feedback Memory - Step 3
errors corrected once are reloaded
system corrects known errors automatically
teacher can change or modify the information
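The three Feedback Memory steps can be pictured as a lookup table of earlier corrections. The following Python sketch is only an illustration under assumptions: the class name FeedbackMemory, the key (task id plus whitespace- and case-normalized answer), and the annotation fields are hypothetical and not taken from FeedBook.

```python
from typing import Dict, List, Tuple

class FeedbackMemory:
    """Stores teacher annotations keyed by task and normalized student answer."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], List[dict]] = {}

    @staticmethod
    def _key(task_id: str, answer: str) -> Tuple[str, str]:
        # Assumption: answers count as identical after case/whitespace normalization.
        return task_id, " ".join(answer.lower().split())

    def remember(self, task_id: str, answer: str, error_type: str, comment: str = "") -> None:
        """Record a manual correction made by the teacher."""
        annotation = {"error_type": error_type, "comment": comment, "automatic": False}
        self._store.setdefault(self._key(task_id, answer), []).append(annotation)

    def recall(self, task_id: str, answer: str) -> List[dict]:
        """Return copies of stored annotations, flagged as automatically applied."""
        stored = self._store.get(self._key(task_id, answer), [])
        return [dict(annotation, automatic=True) for annotation in stored]
```

The automatic flag stands in for the color distinction between automatic and manual feedback. Because recall returns copies, a teacher can change a reloaded annotation without altering the stored original.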
Result Interface for Students
Diagnostics Interface for Teachers
teachers can group errors and visualize them
grouping by task possible, grouping by student planned
status of a task also visualized
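Grouping annotated errors for such a diagnostics view amounts to counting error types per group. A minimal Python sketch follows, assuming each annotation is a dictionary with hypothetical "task", "student", and "error_type" fields.

```python
from collections import Counter
from typing import Dict, Iterable

def group_errors(annotations: Iterable[dict], key: str) -> Dict[str, Counter]:
    """Count error types per value of `key`, e.g. "task" or "student"."""
    groups: Dict[str, Counter] = {}
    for annotation in annotations:
        groups.setdefault(annotation[key], Counter())[annotation["error_type"]] += 1
    return groups
```

The same function covers both grouping criteria mentioned on the slide by switching the key, and the resulting counts could feed a bar chart per task or per student.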
FeedBook - Step 2 (work in progress)
UIMA-based deep NLP analysis of answers for fully automatic scaffolding feedback for students
incremental, task-informed processing for multi-level comparison of student and target answer
type of divergence and convergence on different levels for deriving a diagnosis
parallel processing of high number of utterances
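A multi-level comparison can be illustrated by checking student and target answer at increasingly abstract levels and reporting the first level at which they converge. The levels below (surface, lowercase, punctuation-free, bag of words) are simple stand-ins chosen for this sketch. The actual system presumably uses linguistic levels such as lemmas or part-of-speech tags, which are not modeled here.

```python
import string
from typing import Callable, List, Optional, Tuple

def surface(text: str) -> str:
    return text

def lowercase(text: str) -> str:
    return text.lower()

def no_punctuation(text: str) -> str:
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())

def bag_of_words(text: str) -> str:
    return " ".join(sorted(no_punctuation(text).split()))

# Ordered from most specific to most abstract (illustrative levels only).
LEVELS: List[Tuple[str, Callable[[str], str]]] = [
    ("surface", surface),
    ("lowercase", lowercase),
    ("no_punctuation", no_punctuation),
    ("bag_of_words", bag_of_words),
]

def first_matching_level(student: str, target: str) -> Optional[str]:
    """Return the name of the first level at which both answers are equal."""
    for name, abstraction in LEVELS:
        if abstraction(student) == abstraction(target):
            return name
    return None
```

The level of convergence hints at the type of divergence. A match only at the lowercase level suggests a capitalization problem, and a match only at the bag-of-words level suggests a word order problem.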
FeedBook - Step 2
creation of feedback from diagnosis based on
task factors
student history/competences
general language factors
transfer feedback technology from teacher to student interface
question for NLP: type of feedback generation for activities based on position in spectrum from task-constrained to general language
i.e. where does it make sense to perform analyses?
FeedBook - Links
FeedBook info page: http://feedbook.website
FeedBook web app: https://feedbook.schule
TAGARELA
TAGARELA ICALL system [Amaral et al., 2011] for learning Portuguese
feedback on form and certain aspects of meaning
accompanies regular course material
feedback generation based on language, task, and user factors
TAGARELA Example
Activities
six activity types
different UIMA pipelines/NLP tools for different activity types
→ generation of diagnosis
TAGARELA
diagnosis → feedback
error taxonomy based on authentic data
error diagnosis: non-words, orthography, agreement, missing concept, extra concept, word order, word choice
feedback module
prioritizes errors according to task (meaning/form) and student model (error history)
generates verbalization of diagnosis (feedback message)
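The two jobs of the feedback module, prioritizing and verbalizing, might look roughly like this in Python. The split of diagnosis labels into meaning and form errors, the rule that frequent past errors come first, and the message wording are all assumptions for illustration, not TAGARELA's actual strategy.

```python
from typing import Dict, List, Tuple

# Hypothetical classification of diagnosis labels as meaning-related.
# Every other label is treated as form-related.
MEANING_ERRORS = {"missing concept", "extra concept", "word choice"}

def prioritize(diagnoses: List[str], task_focus: str,
               error_history: Dict[str, int]) -> List[str]:
    """Order errors: those matching the task focus first, then by past frequency."""
    def rank(error: str) -> Tuple[int, int]:
        is_meaning = error in MEANING_ERRORS
        matches_focus = is_meaning == (task_focus == "meaning")
        return (0 if matches_focus else 1, -error_history.get(error, 0))
    return sorted(diagnoses, key=rank)

def verbalize(error: str) -> str:
    """Turn a diagnosis label into a (placeholder) feedback message."""
    return f"There seems to be a problem with {error} in your answer."
```

Here task_focus is either "meaning" or "form", and error_history maps each error label to how often the student has made that error before.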
TAGARELA - Feedback
TAGARELA - UIMA requirement
Portuguese as a challenging case for NLP
multi-level language analyses: contracted tokens, enclisis
learner language analysis
task-specific pipelines, modular architecture
matching on various levels of abstraction
TAGARELA - UIMA for Portuguese
parallel representation and incremental enrichment of analysis helpful
information from one layer can help make decisions on another layer
example: tokenization and part of speech tagging
UIMA
Example: da = de + a
tokenizer decision informs part of speech tagging level (preposition + article)
token annotation as hierarchical structure of surface form (text) and deep forms (split tokens)
parallel surface augmentation representation
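The hierarchical token idea can be sketched as a data structure in which a surface token carries its deep forms as child tokens that share its character offsets. This Python sketch is not the UIMA type system used in TAGARELA. The Token class, the small CONTRACTIONS lexicon, and the whitespace tokenizer are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Token:
    text: str                 # surface form (or deep form for parts)
    begin: int                # character offset in the original text
    end: int
    pos: str = ""             # part of speech, if already known
    parts: List["Token"] = field(default_factory=list)  # deep forms

# Tiny illustrative lexicon of Portuguese contractions.
CONTRACTIONS: Dict[str, List[Tuple[str, str]]] = {
    "da": [("de", "preposition"), ("a", "article")],
    "das": [("de", "preposition"), ("as", "article")],
    "do": [("de", "preposition"), ("o", "article")],
}

def tokenize(text: str) -> List[Token]:
    """Whitespace tokenizer that attaches deep forms to contracted tokens."""
    tokens, offset = [], 0
    for word in text.split():
        begin = text.index(word, offset)
        end = begin + len(word)
        token = Token(word, begin, end)
        # Deep forms keep the offsets of the surface token they came from.
        for form, pos in CONTRACTIONS.get(word.lower(), []):
            token.parts.append(Token(form, begin, end, pos))
        tokens.append(token)
        offset = end
    return tokens
```

Because the split already records preposition and article, a later tagging step can reuse the tokenizer's decision instead of guessing a single tag for the contracted form.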
Complex Token
Figure: Example of a complex Token
TAGARELA - Links
info page and web app: http://sifnos.sfs.uni-tuebingen.de/tagarela/index.py/main
VIEW/WERTi
VIEW: Visual Input Enhancement of the Web
input enhancement tool for language learners
based on the WERTi system
WERTi
WERTi system [Meurers et al., 2010]
WERTi: Working with English Real Texts
tool for enhancing web pages
users can enhance web pages of their own choice
goal: raising awareness of linguistic constructions
WERTi - Input Enhancement
input enhancement [Smith, 1993]
communicative activities with (incidental) focus on form [Long, 1998]
exposure to linguistic data with certain constructions (potentially) highlighted
language learning not possible without noticing of forms → input enhancement
WERTi
high motivation/relevant material via free choice by learners
contextualized learning by enhancing web pages (including images, videos, ...)
supplements regular course material/self-learning
WERTi - Constructs
modular system architecture allows flexibility in components
constructs included:
determiners
prepositions
gerunds
infinitives
wh-questions
phrasal verbs
3 activity types:
color highlighting of constructs
clicking (with color feedback)
gapped text
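The three activity types can be illustrated with a small function that rewrites plain text into HTML. This is a rough Python sketch and not WERTi's implementation: the fixed preposition word list stands in for real NLP-based construct detection, and the markup (class name, data attribute, input element) is invented for the example. The input is assumed to be plain text without HTML markup.

```python
import re
from typing import Set

# Illustrative word list; a word list cannot tell preposition "to" from the
# infinitive marker "to", which is one reason real systems use NLP analysis.
PREPOSITIONS: Set[str] = {"in", "on", "at", "to", "with", "from"}

def enhance(text: str, activity: str) -> str:
    """Return HTML for the 'highlight', 'click', or 'gap' activity."""
    def render(match: "re.Match[str]") -> str:
        word = match.group(0)
        is_target = word.lower() in PREPOSITIONS
        if activity == "highlight":
            return f'<span class="hit">{word}</span>' if is_target else word
        if activity == "click":
            # Every word is clickable; the attribute tells the page which are correct.
            return f'<span data-target="{str(is_target).lower()}">{word}</span>'
        if activity == "gap":
            return '<input type="text" size="4">' if is_target else word
        raise ValueError(f"unknown activity: {activity}")
    return re.sub(r"[A-Za-z]+", render, text)
```

All three activities share the same detection step and differ only in rendering, which fits the modular architecture the slide describes.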
WERTi - Example: Color Highlighting
WERTi - Example: Clicking
WERTi - Example: Gapped Text
VIEW/WERTi
VIEW: extension of WERTi to multiple languages (EN, DE, ES, RU)
implementation as a browser plugin
more activities (multiple choice) and constructs implemented
VIEW - Example (1)
VIEW - Example (2)
VIEW/WERTi - Links
VIEW page: http://sifnos.sfs.uni-tuebingen.de/VIEW/
WERTi page: http://sifnos.sfs.uni-tuebingen.de/WERTi/index.jsp?content=home
References
Please note: all URLs were last accessed on May 8, 2017.
Luiz Amaral, Detmar Meurers, and Ramon Ziai. Analyzing learner language: towards a flexible natural language processing architecture for intelligent language tutors. Computer Assisted Language Learning, 24(1):1–16, 2011.
Ana Díaz-Negrillo, Detmar Meurers, Salvador Valera, and Holger Wunsch. Towards interlanguage POS annotation for effective learner corpora in SLA and FLT. In Language Forum, volume 36, pages 139–154, 2010.
David Ferrucci and Adam Lally. UIMA: an architectural approach to unstructured information processing in the corporate research environment. Natural Language Engineering, 10(3-4):327–348, 2004.
Michael H. Long and Peter Robinson. Focus on form: Theory, research, and practice. Focus on form in classroom second language acquisition, 15:15–41, 1998.
Detmar Meurers, Ramon Ziai, Luiz Amaral, Adriane Boyd, Aleksandar Dimitrov, Vanessa Metcalf, and Niels Ott. Enhancing authentic web pages for language learners. In Proceedings of the NAACL HLT 2010 Fifth Workshop on Innovative Use of NLP for Building Educational Applications, pages 10–18. Association for Computational Linguistics, 2010.
Philip Ogren and Steven Bethard. Building test suites for UIMA components. In Proceedings of the Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing (SETQA-NLP 2009), pages 1–4, Boulder, Colorado, June 2009. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/W/W09/W09-1501.
Marwa Ragheb and Markus Dickinson. Defining Syntax for Learner Language Annotation. In COLING (Posters), pages 965–974, 2012.
Michael Sharwood Smith. Input enhancement in instructed SLA. Studies in Second Language Acquisition, 15(02):165–179, 1993.
Sylvie Thouesny. Increasing the reliability of a part-of-speech tagging tool for use with learner language. In Presentation given at the Automatic Analysis of Learner Language (AALL’09) workshop on automatic analysis of learner language: from a better understanding of annotation needs to the development and standardization of annotation schemes, 2009.
Bertus Van Rooy and Lande Schafer. An evaluation of three POS taggers for the tagging of the Tswana Learner English Corpus. In Proceedings of the Corpus Linguistics 2003 conference, volume 16, pages 835–844. Citeseer, 2003.