Language Processing for e-Learning
Dr. Plaban Kumar Bhowmick
Assistant Professor
Centre for Educational Technology
About ET61002
Course Description & Objectives
Course Description
  Use of basic NLP tools and techniques to address challenges in text-based e-learning systems or to develop text-based e-learning applications
Course Objectives
  To identify challenges in text-mediated e-learning systems
  To identify NLP techniques to deal with the challenges
  To apply NLP tools to address the challenges
Grading Policy
End semester exam 50 marks
Mid semester exam 30 marks
TA evaluation
  Project work
    ▪ 20 marks (mid-term and end-term evaluation)
    ▪ Previous year achievements
      ▪ Finalist of Samsung Innovation Awards 2014
      ▪ Paper submission at IUI’2015
Course Materials
Books
  Speech and Language Processing, Jurafsky and Martin
  Handbook of Automated Essay Evaluation, Shermis and Burstein
  Automated Grammatical Error Detection for Language Learners, Leacock et al.
Research articles
  ACL, COLING, Computational Linguistics and others
Similar Courses
Educational Natural Language Processing, Tutorial at COLING 2008
Natural Language Processing and eLearning
  Prof. Dr. Iryna Gurevych, Dr. Delphine Bernhard, Ubiquitous Knowledge Processing, Technische Universität Darmstadt
Natural Language Processing for Educational Applications
  Rada Mihalcea, University of North Texas
Course Webpage
http://www.cel.iitkgp.ernet.in/~plaban/courses/ET61002/ET61002_2015.html
Evolution in Education Space
Source: http://joyreactor.com/post/500058
Distance Education
[Figure: timeline with milestones labelled 1728, 1913, 1930, 1996, 1990s, 2000s]
Source: EdTechReview
Multi-Dimensional Evolution
[Figure: dimensions of evolution: Uniformity, Reachability, Knowledge space]
No affective interaction
Restricted assessment
De-personalization
Non-adaptive
Information overload
Increased search time
Interoperability
Large scale data
Big Question
How can different computational techniques and systems automate different processes in teaching and learning?
e-Learning
E-learning (or eLearning): the use of electronic media and information and communication technologies (ICT) in education
Synonymous with multimedia learning, technology-enhanced learning (TEL), computer-based instruction (CBI), computer-based training (CBT), computer-assisted instruction or computer-aided instruction (CAI), and many others
E-learning: Early Days
Pre-packaged multimedia course materials with restricted type and set of assessment items.
Not learner-centric
Mostly focused on cognitive aspects of learning, ignoring social and affective dimensions altogether.
E-learning 2.0 = E-learning + Web 2.0
Web 2.0 is not a new web
Web 2.0: a new way of designing participation, hosting services, and web-based communities, promoting creativity and information sharing.
It covers specific technologies like wikis and blogs, a new way of creating web pages like mash-ups, and a massive use of descriptors or tags in what have been defined as folksonomies.
Six Big Ideas behind Web 2.0
Individual production and User Generated Content
Harness the power of the crowd
Data on an epic scale
Architecture of Participation
Openness
Web 2.0 in e-learning
A wiki is essentially a website constructed in such a way as to allow users to change content on the site.
Key features:
  ▪ Hypertextual structure
  ▪ Social authoring - collaborative production
  ▪ Dynamic document - always under construction
Educational use:
  ▪ To support collaborative work, substituting old .doc or .pdf documents.
  ▪ To produce a course or study corpus in cooperation with all academic stakeholders: lecturers, students, …
  ▪ To distribute information to students, in order to facilitate the updating of materials by the professor.
Web 2.0 in e-learning
A blog is a way of distributing news
Key features:
  ▪ There are one or several authors that produce entries
  ▪ Visitors can add comments
  ▪ New entries and comments do not substitute older ones
  ▪ It is possible to subscribe in order to receive news via email or through RSS readers.
Educational use:
  ▪ Teachers have used blogs as an easy way to produce dynamic learning environments without previous knowledge of HTML.
  ▪ Students have used blogs as an alternative digital portfolio or as a learning log.
  ▪ Ultimately, blogs have been used as support for collaborative work.
Web 2.0 in e-learning
Online office: Google Drive, MS SharePoint, etc.
Social bookmarking
Video repositories and online videos
Social networks
New Learning Paradigms
Study at any place, any time
  Several devices may be used for learning: computer, iPod, PDA, etc.
Authority in educational systems is distributed: collective intelligence and wisdom of the crowds
  Learn not only from teachers and instructors, but also from peers
New forms of knowledge organization: tags and folksonomies
Rise of the MOOCs
Massive - enrolment numbers
Open - no mandatory qualifications
Online - fully
Course - structured, temporal
Courses designed in short (~10 min) modules
Low study hours per week - modules, not degree programmes
Certificates of completion rather than credit
Some learners are not students of universities
The MOOC Platform
Content: Video lecture
Assessment: Multiple Choice Quiz (MCQ), Peer Assessment, ‘Robot’ grading
Communication: Threaded discussion forum, Video group discussion, Live webcasts or Hangouts
Text as Media
We know it’s important even in this digital age.
Textual discourses in education
  Learning materials in web
  Wiki, blog
  Assessment items
  Formative feedback
Educational NLP
Use of Natural Language Processing (NLP) techniques to assist current e-learning and MOOC-based platforms through
  automated text analysis
  assessment item generation
  automated grading and feedback generation
  text adaptation
  tutoring dialog processing
  content metadata extraction
  etc.
Educational Natural Language Processing
eLearning: computer-aided learning/instruction
NLP: computational analysis of language
Intelligent Computer Assisted Language Learning (ICALL)
Definition of CALL from Levy 1997:
  “Computer-Assisted Language Learning (CALL) may be defined as the search for and the study of applications of the computer in language teaching and learning”
Intelligent CALL (ICALL), or more appropriately parser-based CALL, from Holland et al. 1993:
  ICALL relies on parsing, “a technique that enables the computer to encode complex grammatical knowledge such as humans use to assemble sentences, recognize errors and make corrections”
From Markus Dickinson's slides
Importance of Language Forms
Research since the 90s has shown that awareness of language forms and rules is important for an adult learner to successfully acquire a foreign language.
Real Life Constraints
The time a student can spend with an instructor/tutor typically is very limited.
  Work on form and grammar is often deemphasized and confined to homework.
The downside
  The learner has relatively few opportunities to gain awareness of forms and rules and receive individual feedback on errors.
An Opportunity for CALL
Excellent opportunity for developing Computer-Aided Language Learning (CALL) tools to provide individual feedback on learner errors and foster learner awareness of relevant language forms and categories.
But CALL systems which offer exercises
  typically are limited to multiple choice, or simple form filling
  feedback usually is limited to yes/no or letter-by-letter matching of the string with a pre-stored answer.
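The letter-by-letter matching mentioned above can be sketched as follows. This is only an illustrative sketch using Python's standard difflib module; the function name letter_feedback is hypothetical and does not come from any particular CALL system.

```python
from difflib import SequenceMatcher


def letter_feedback(answer: str, expected: str) -> list[str]:
    """List character-level edits that turn the learner's answer into the stored answer."""
    notes = []
    matcher = SequenceMatcher(None, answer, expected)
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag != "equal":
            # tag is one of 'replace', 'delete', 'insert'
            notes.append(f"{tag}: {answer[a0:a1]!r} -> {expected[b0:b1]!r}")
    return notes


# A correct answer yields no notes; a misspelling yields surface edits only.
print(letter_feedback("went", "went"))
print(letter_feedback("wnt", "went"))
```

Such feedback says where the strings differ but nothing about why the form is wrong, which is the limitation the slide points to.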
From CALL to ICALL
Linguistic modeling is needed to improve on this situation:
  tokenization: identify words
  morphological analysis: identify/interpret morphemes
  syntactic analysis: identify selection, government and agreement relations and word order requirements
  formal pragmatic analysis: identify coreference relations, information structure partitioning, . . .
Computational tools identifying such linguistic properties need to be integrated into CALL systems to obtain language-aware “Intelligent” CALL (ICALL).
Intelligent CALL
Intelligent CALL (ICALL) focuses on using linguistics and natural language processing to make CALL better.
When building a full-blown ICALL system, several issues arise:
  Diagnosing and accounting for user errors
    Parser-based CALL
  Modeling the system on particular (kinds of) users
  Presenting useful feedback to the user
Parser-based CALL
We want to use parsers to:
  Assign an analysis to a learner’s input
  Identify erroneous spots in the input
  Provide an analysis of what the error signifies
But parsers are designed for well-formed constructs
  When learners type in incorrect sentences, parsers have to be modified to correctly analyze the incorrectness.
Finding Errors
Use mal-rules = rules which are added to the grammar to handle error cases.
  e.g., A singular noun and a plural verb are allowed to combine, but it is marked as an error.
Constraint relaxation: a parser can be reworked to handle ill-formed input.
  Parsers normally just “die” when handling bad input.
  So, we allow some constraints (e.g., that a subject and verb must match in number) to be relaxed while parsing.
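A toy sketch of the idea, assuming a two-word "noun verb" grammar and a tiny hand-made lexicon (both are illustrative assumptions, not part of any real ICALL parser): in strict mode a number mismatch makes the parse fail, while in relaxed mode the parse succeeds and the mismatch is recorded as an error.

```python
# Toy lexicon: word -> (category, number). All entries are illustrative assumptions.
LEXICON = {
    "dog": ("N", "sg"), "dogs": ("N", "pl"),
    "barks": ("V", "sg"), "bark": ("V", "pl"),
}


def parse_sentence(words, relax=False):
    """Parse a two-word 'N V' sentence.

    Strict mode rejects number mismatches (returns None).
    Relaxed mode accepts them but records an error, as a mal-rule would.
    """
    if len(words) != 2 or any(w not in LEXICON for w in words):
        return None
    (cat1, num1), (cat2, num2) = LEXICON[words[0]], LEXICON[words[1]]
    if (cat1, cat2) != ("N", "V"):
        return None
    if num1 == num2:
        return {"rule": "S -> N V", "errors": []}
    if relax:
        message = f"subject '{words[0]}' is {num1} but verb '{words[1]}' is {num2}"
        return {"rule": "S -> N V (agreement relaxed)", "errors": [message]}
    return None


print(parse_sentence(["dog", "bark"]))              # strict parser gives up
print(parse_sentence(["dog", "bark"], relax=True))  # relaxed parser reports the error
```

The relaxed analysis is what makes feedback possible: the system knows both that the sentence was intended as "N V" and which constraint was violated.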
Second Language Acquisition and ICALL
Given multiple possible analyses for a sentence, which one is most likely, based on:
  the stage of acquisition of the learner
  the first language of the learner
  the focus of the exercise
What kinds of errors are we interested in and do we expect?
  We would like an error typology = a classification of errors into different groups.
  People are starting to look at error-tagged corpora to find the most common errors.
Feedback in ICALL
Feedback = response to the learner based on their input.
Purpose of feedback:
  Reinforcement: feedback can act as a reinforcer to learn a particular concept (behaviorism)
  Learning processes need feedback to know right from wrong (cognitivism)
Things to keep in mind when designing a system:
  Feedback needs to be accurate.
  Displaying more than one error message at a time is not helpful (Heift 2001).
  Explanations should be short.
NLP Needs in ICALL: Summary
Learner Error Corpus
  Error taxonomy
  L1-specific corpora collection and annotation
Automatic detection of grammatical errors
  Parser-based methods
  Statistical methods
Grammatical error correction
  Correcting individual errors
  Correcting whole sentences
Evaluation of error detection and correction
Readability Level Assessment
Readability
  The feature of language that makes it easy to read is called readability.
Benefits of Readability
Greater readability increases
  Comprehension (Understanding)
  Retention (Memory)
  Reading speed
  Persistence (reading more of the text)
What happens when the text is too difficult?
Readers feel frustrated.
Most often, they stop reading without even thinking about it.
They may seek help or call support.
They go to some other task.
Readability Variables
The reader
  Prior knowledge
  Reading skill
  Interest
  Motivation
The text
  Content
  Style
  Design
  Organization
How do we measure readability?
Look at
  Words and their formations
  Average sentence length
  Is the text coherent?
Readability formulae
  They make some hard assumptions
  They do not consider semantic features
Word’s readability scores for these two texts are almost identical. Yet they are clearly different in nature and effect on the reader.
Reading Ease = 65.9 Reading Ease = 65.4
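The "Reading Ease" figures above are presumably Flesch Reading Ease scores, which is an assumption since the slides do not name the formula. A minimal sketch of that formula follows; the syllable counter is a crude vowel-group heuristic chosen for brevity, so its scores will not match Word's exactly.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (a rough heuristic, not a dictionary lookup)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))


print(flesch_reading_ease("The cat sat. The dog ran."))
print(flesch_reading_ease("Hypertension risk factors include obesity."))
```

Only sentence length and syllable counts enter the score, which is why two texts that differ sharply in meaning and coherence can receive nearly the same value.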
Reading Comprehension
What is constructed when we comprehend a sentence?
  Propositional representation
Do words have a role in comprehension?
  Mental lexicon
How do we resolve different ambiguities?
  A thief shot a cop in the park
How does comprehension relate to reading time?
  Reading time experiments
  Eye tracking experiments
Text Adaptation
What if a particular text segment is not readable?
Sources of difficulty
  ▪ Lexical: Meaning of technical terms, entities
  ▪ Syntactic difficulty
  ▪ Discourse level difficulty
Encyclopaedic Annotation
Entity linking in text
  Linking name mentions in text with their referent entities in a knowledge base
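A minimal sketch of entity linking by greedy longest-match lookup. The knowledge base here is a hypothetical two-entry dictionary with made-up identifiers; real systems also have to disambiguate mentions that match several entities, which this sketch does not attempt.

```python
def link_entities(text, kb):
    """Link token spans to knowledge-base entries using greedy longest match."""
    if not kb:
        return []
    tokens = text.split()
    max_len = max(len(name.split()) for name in kb)
    links, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n]).strip(".,;:!?")
            if span.lower() in kb:
                links.append((span, kb[span.lower()]))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return links


# Hypothetical knowledge base: surface form -> entity identifier.
TOY_KB = {
    "mark twain": "KB:Mark_Twain",
    "mississippi river": "KB:Mississippi_River",
}
print(link_entities("Mark Twain grew up near the Mississippi River.", TOY_KB))
```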
Wikify
Mark Twain
“I notice that you use plain, simple language, short words, and brief sentences. That is the way to write English—it is the modern way and the best way. Stick to it; and don’t let the fluff and flowers and verbosity creep in.
“When you catch an adjective, kill it. No, I don’t mean utterly, but kill most of them—then the rest will be valuable. They weaken when close together. They give strength when they are wide apart.”
—Mark Twain, in a letter to a 12-year-old boy.
Text Simplification
Lexical Simplification
A subtask of text simplification
  Replacing words or short phrases by simpler variants in a context-aware fashion
Motivation
  To reach out to a wider range of readers having limited vocabulary
    ▪ Children
    ▪ People with low literacy level or cognitive disability
    ▪ Second language learners
Examples
Technical Medical Language
  Hypertension risk factors include obesity,...
  High blood pressure risk factors include excessive weight,...
Legal Language
  The Products transacted through the Service are...
  The Products managed through the Service are...
Low Literacy Readers
  Hitler committed terrible atrocities during the second World War
  Hitler committed terrible cruelties during the second World War
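The substitutions in these examples can be sketched with a plain lookup table. This is a deliberately naive sketch: the table SIMPLER is built only from the word pairs shown above, and the function ignores context, whereas lexical simplification proper has to check that a replacement fits the sentence.

```python
import re

# Complex word -> simpler variant, taken from the examples above.
SIMPLER = {
    "hypertension": "high blood pressure",
    "obesity": "excessive weight",
    "atrocities": "cruelties",
}


def simplify(sentence: str, table=SIMPLER) -> str:
    """Replace each listed word with its simpler variant, preserving initial capitalisation."""
    def swap(match):
        word = match.group(0)
        replacement = table.get(word.lower())
        if replacement is None:
            return word
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", swap, sentence)


print(simplify("Hypertension risk factors include obesity."))
```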
Text Simplification
The readability of a text can be improved by transforming it into a simpler text
Characteristics of manually simplified texts
  shorter sentences
  fewer and shorter phrases
  fewer adjectives, adverbs and coordinating conjunctions
  nouns are less often replaced with pronouns
Original text: Congress gave Yosemite the money to repair damage from the 1997 flood.
Abridged text: Congress gave the money after the 1997 flood.
Automated Test Item Generation
Writing test questions is an extremely difficult and time consuming task for teachers
Use of NLP to automatically generate test items for language learning comprehension test
Automated Test Item Generation
Multiple-Choice-Questions
Who was voted the best international footballer for 2004?
(a) Henry (b) Beckham (c) Ronaldinho (d) Ronaldo
Parts of an item: stem (the question), key (the correct option), distractors (the incorrect options)
Fill-in-the-blanks cloze questions
Automated Test Item Generation
Challenges
  Identifying candidate sentence
  Identifying key and stem
  Identifying distractors
  Evaluation of item generation systems
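Once a candidate sentence, key and distractors have been chosen, assembling the item itself is simple. The sketch below assumes those three inputs are already given, so it sidesteps the hard selection problems listed above; the function name make_cloze_item and the fixed random seed are illustrative choices.

```python
import random


def make_cloze_item(sentence, key, distractors, seed=0):
    """Build a fill-in-the-blank item: blank out the key and shuffle it among the distractors."""
    if key not in sentence:
        raise ValueError("key must occur in the sentence")
    stem = sentence.replace(key, "_____", 1)
    options = [key] + list(distractors)
    random.Random(seed).shuffle(options)  # seeded so the item is reproducible
    return {"stem": stem, "options": options, "answer": options.index(key)}


item = make_cloze_item(
    "Ronaldinho was voted the best international footballer for 2004.",
    key="Ronaldinho",
    distractors=["Henry", "Beckham", "Ronaldo"],
)
print(item["stem"])
print(item["options"])
```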
Automated Test Item Generation
Why question generation?
  Idealistic vision: learners ask questions to deal with knowledge deficit
  Reality:
    ▪ Trouble in identifying knowledge gap
    ▪ Shyness
    ▪ A typical student asks less than 0.2 questions per hour in a classroom (Graesser and Person, 1994)
    ▪ Train learners to ask deep questions (such as why, why not, how, what-if, what-if-not)
Automated Test Item Generation
Text-to-Question Generation
Chinese words are made up of many little marks instead of letters. These words are called characters. There is no alphabet, so children do not need to learn to spell. However, they have to learn at least 3,500 different characters before they can read a simple book. Hundreds of years ago, when the Chinese began to write, each word was a picture. The word for umbrella still looks like an open umbrella.
1. What are the Chinese words are made up of?
2. Are the Chinese words made up of letters?
3. What are Chinese words called?
4. Why do the Chinese children not need to learn to spell?
5. Do Chinese children need to learn to spell?
6. How does the word for umbrella look like in Chinese?
Question Generation Challenges
Lexical
  Question depends on the semantics of the answer
    ▪ Herman Melville wrote Moby Dick
      Who wrote Moby Dick? What wrote Moby Dick?
  Non-compositionality of words or phrases
    ▪ Multiword expressions, phrasal verbs
    ▪ When Russia invaded Finland in 1808, Helsinki was again burned to the ground
      What was Helsinki again burned to?
Syntactic
  Imprecision of syntactic parsers
  Constraints on WH-movements
    ▪ John thought Mary said that James wanted Susan to like Peter
    ▪ Who thought Mary said that James wanted Susan to like Peter? (answer: John)
    ▪ Who did John think Mary said that wanted Susan to like Peter? (answer: James)
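The first lexical challenge (choosing "Who" versus "What") can be illustrated with a toy subject-question generator. It assumes the subject span and its semantic type are supplied by hand; in a real system both would come from parsing and named-entity recognition, and the type labels used here are hypothetical.

```python
# Semantic type of the answer -> question word (labels are illustrative).
WH_WORD = {"person": "Who", "thing": "What"}


def subject_question(sentence: str, subject: str, subject_type: str) -> str:
    """Turn a declarative sentence into a question about its sentence-initial subject."""
    if not sentence.startswith(subject):
        raise ValueError("this sketch only handles sentence-initial subjects")
    rest = sentence[len(subject):].rstrip(".").strip()
    return f"{WH_WORD[subject_type]} {rest}?"


# The right semantic type gives a natural question, the wrong one does not.
print(subject_question("Herman Melville wrote Moby Dick.", "Herman Melville", "person"))
print(subject_question("Herman Melville wrote Moby Dick.", "Herman Melville", "thing"))
```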
Question Generation Challenges
Discourse Challenges
  Pronoun anaphora
    ▪ Abraham Lincoln was the 16th president. He was assassinated by John Wilkes Booth.
      Who was he assassinated by?
  Vague noun phrases
    ▪ . . . The show boosted the studio to the top of the TV cartoon field . . . .
      What boosted the studio to the top of the TV cartoon field?
  Implicit discourse relations and world knowledge
    ▪ Booth jumped into the box and aimed a single-shot, round-slug .44 caliber Henry Deringer at his head, firing at point-blank range
      Who killed Abraham Lincoln?
Automated Essay/Answer Evaluation (AEE)
Automated essay evaluation
  “The process of evaluating and scoring written prose via computer programs” (Shermis and Burstein)
  Use NLP techniques to provide
    ▪ Holistic scoring
    ▪ Formative feedback
Why automatic essay scoring?
  To reduce laborious human effort
    ▪ Software systems do the task fully automatically
    ▪ Computer-generated scores match human accuracy
History of AEE
Ellis Page, 1966
  Grading burden, a significant impediment to the improvement of overall writing capacity
Project Essay Grade (PEG), 1973
History of AEE
Evolution of commercial AEE
  Intelligent Essay Assessor (Pearson Education)
    ▪ Latent Semantic Analysis
  E-rater® (ETS)
    ▪ NLP to extract linguistic information and text characteristics (grammatical errors, discourse analysis)
  C-rater (ETS)
    ▪ NLP to extract content information for short-answer scoring
Writing Features
Grammatical errors
  Subject-verb agreement
  Verb form
  Pronoun errors
Discourse structure
  Development of thesis statements, main points, support, conclusion
Topic-relevant word usage
Sophistication
  Idioms, metaphor, style
Expectation from NLP
Grammaticality
  Syntactic processing
Word usage
  Collocation
    ▪ Powerful tea vs strong computer
Content coverage
  Computational text similarity
Text coherence
  Discourse processing
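As one simple instance of computational text similarity for content coverage, the sketch below compares a student answer with a reference answer using cosine similarity over bag-of-words counts. This is an illustrative baseline only, not the method used by any of the systems named earlier; it ignores synonyms and word order.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the two word-count vectors (0.0 if either text is empty)."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


reference = "Plants make food from sunlight, water and carbon dioxide."
print(cosine_similarity(reference, "Plants use sunlight and water to make food."))
print(cosine_similarity(reference, "The French Revolution began in 1789."))
```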
Dialogue-based Intelligent Tutoring System
Intelligent Tutoring System
Dialog-based ITS
present challenging problems and questions to the learner
the learner types in or utters answers
there is a lengthy multi-turn dialogue as complete solutions or answers evolve
Automated Dialog Processing
Dialog-based ITS
AutoTutor
  Tutoring Research Group (TRG) at the University of Memphis
ATLAS/ANDES
  University of Pittsburgh
BEETLE
  University of Edinburgh
Topics of Study
Dialog and conversational agents
Architecture of dialog-based ITSs