Language Processing for e-Learning. Dr. Plaban Kumar Bhowmick, Assistant Professor, Centre for Educational Technology

Post on 22-Dec-2015


TRANSCRIPT

Page 1: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Language Processing for e-Learning

Dr. Plaban Kumar Bhowmick, Assistant Professor, Centre for Educational Technology

Page 2: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

About ET61002

Page 3: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Course Description & Objective

Course Description
• Use of basic NLP tools and techniques to address challenges in text-based e-learning systems or to develop text-based e-learning applications

Course Objectives
• To identify challenges in text-mediated e-learning systems
• To identify NLP techniques to deal with the challenges
• To apply NLP tools to address the challenges

Page 4: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Grading Policy

• End semester exam: 50 marks
• Mid semester exam: 30 marks
• TA evaluation: project work
▪ 20 marks (mid term and end term evaluation)
▪ Previous year achievement
  ▪ Finalist of Samsung Innovation Awards 2014
  ▪ Paper submission at IUI’2015

Page 5: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Course Materials

Books
• Speech and Language Processing, Jurafsky and Martin
• Handbook of Automated Essay Evaluation, Shermis and Burstein
• Automated Grammatical Error Detection for Language Learners, Leacock et al.

Research articles
• ACL, COLING, Computational Linguistics and others

Page 6: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Similar Courses

• Educational Natural Language Processing, tutorial at COLING 2008
• Natural Language Processing and eLearning: Prof. Dr. Iryna Gurevych and Dr. Delphine Bernhard, Ubiquitous Knowledge Processing, Technische Universität Darmstadt
• Natural Language Processing for Educational Applications: Rada Mihalcea, University of North Texas

Page 7: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Course Webpage

http://www.cel.iitkgp.ernet.in/~plaban/courses/ET61002/ET61002_2015.html

Page 8: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Evolution in Education Space

Page 9: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Source: http://joyreactor.com/post/500058

Page 10: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Evolution in Education Space

Page 11: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Evolution in Education Space

Page 12: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Evolution in Education Space

Page 13: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Evolution in Education Space

Page 14: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Evolution in Education Space

Page 15: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Evolution in Education Space

1728 1913 1930 1996 1990s 2000s

Distance Education

Source: EdTechReview

Page 16: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Evolution in Education Space

Page 17: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Evolution in Education Space

Uniformity

Page 18: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Multi Dimensional Evolution

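[Figure: three dimensions of evolution labelled Uniformity, Reachability and Knowledge space, annotated with the following issues]

• No affective interaction
• Restricted assessment
• De-personalization
• Non-adaptive
• Information overload
• Increased search time
• Interoperability
• Large scale data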

Page 19: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Big Question

How can different computational techniques and systems automate different processes in teaching and learning?

Page 20: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

e-Learning

E-learning (or eLearning)
• The use of electronic media and information and communication technologies (ICT) in education
• Synonymous with
▪ multimedia learning
▪ technology-enhanced learning (TEL)
▪ computer-based instruction (CBI)
▪ computer-based training (CBT)
▪ computer-assisted instruction or computer-aided instruction (CAI)
▪ and many others

Page 21: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

E-learning: Early Days

• Pre-packaged multimedia course materials with a restricted type and set of assessment items.
• Not learner-centric
• Mostly focused on the cognitive aspects of learning, ignoring social and affective dimensions altogether.

Page 22: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

E-learning 2.0 = E-learning + Web 2.0

• Web 2.0 is not a new web
• Web 2.0: a new way of designing participation, hosting services, and web-based communities, promoting creativity and information sharing.
▪ Specific technologies like wikis and blogs, a new way of creating web pages like mash-ups, and a massive use of descriptors or tags in what have been called folksonomies.

Page 23: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Six Big Ideas behind Web 2.0

• Individual production and User Generated Content
• Harness the power of the crowd
• Data on an epic scale
• Architecture of Participation
• Openness

Page 24: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Web 2.0 in e-learning

A wiki is essentially a website constructed in such a way as to allow users to change content on the site.
• Key features:
▪ Hypertextual structure
▪ Social authoring: collaborative production
▪ Dynamic document: always under construction
• Educational use:
▪ To support collaborative work, substituting old .doc or .pdf documents.
▪ To produce a course or study corpus in cooperation with all academic stakeholders: lecturers, students, …
▪ To distribute information to students, in order to facilitate the updating of materials by the professor.

Page 25: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Web 2.0 in e-learning

A blog is a way of distributing news.
• Key features:
▪ There are one or several authors that produce entries
▪ Visitors can add comments
▪ New entries and comments do not substitute older ones
▪ It is possible to subscribe in order to receive news via email or through RSS readers.
• Educational use:
▪ Teachers have used blogs as an easy way to produce dynamic learning environments without previous knowledge of HTML.
▪ Students have used blogs as an alternative digital portfolio or as a learning log.
▪ Ultimately, blogs have been used as support for collaborative work.

Page 26: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Web 2.0 in e-learning

• Online office: Google Drive, MS SharePoint, etc.
• Social bookmarking
• Video repositories and online videos
• Social networks

Page 27: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

New Learning Paradigms

• Study at any place, any time
▪ Several devices may be used for learning: computer, iPod, PDA, etc.
• Authority in educational systems is distributed: collective intelligence and wisdom of the crowds
▪ Learn not only from teachers and instructors, but also from peers
• New forms of knowledge organization: tags and folksonomies

Page 28: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Rise of the MOOCs

• Massive: enrolment numbers
• Open: no mandatory qualifications
• Online: fully
• Course: structured, temporal

• Course designed in short (~10 min) modules
• Low study hours per week: modules, not degree programmes
• Certificates of completion rather than credit…
• Some learners are not students of universities

Page 29: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

The MOOC Platform

Content Assessment Communication

Video lecture Multiple Choice Quiz (MCQ) Threaded discussion forum

Video group discussion Peer Assessment

‘Robot’ grading

• Live webcasts or Hangouts

• Twitter

Page 30: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Text as Media

We know it’s important even in this digital age.

Textual discourses in education
• Learning materials on the web
• Wiki, blog
• Assessment items
• Formative feedback

Page 31: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Educational NLP

Use of Natural Language Processing (NLP) techniques to assist current e-learning and MOOC-based platforms through
• automated text analysis
• assessment item generation
• automated grading and feedback generation
• text adaptation
• tutoring dialog processing
• content metadata extraction
• etc.

Page 32: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Educational Natural Language Processing

[Figure: Educational Natural Language Processing shown between eLearning (computer-aided learning/instruction) and NLP (computational analysis of language)]

Page 33: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Intelligent Computer Assisted Language Learning (ICALL)

• Definition of CALL from Levy 1997:
▪ “Computer-Assisted Language Learning (CALL) may be defined as the search for and the study of applications of the computer in language teaching and learning”
• Intelligent CALL (ICALL), or more appropriately parser-based CALL, from Holland et al. 1993:
▪ ICALL relies on parsing, “a technique that enables the computer to encode complex grammatical knowledge such as humans use to assemble sentences, recognize errors and make corrections”

From Markus Dickinson's slides

Page 34: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Importance of Language Forms

Research since the 90s has shown that awareness of language forms and rules is important for an adult learner to successfully acquire a foreign language.

Page 35: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Real Life Constraints

• The time a student can spend with an instructor/tutor is typically very limited.
▪ Work on form and grammar is often de-emphasized and confined to homework
• The downside
▪ The learner has relatively few opportunities to gain awareness of forms and rules and receive individual feedback on errors.

Page 36: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

An Opportunity for CALL

• Excellent opportunity for developing Computer-Aided Language Learning (CALL) tools to provide individual feedback on learner errors and foster learner awareness of relevant language forms and categories.
• But CALL systems which offer exercises
▪ are typically limited to multiple choice or simple form filling
▪ feedback is usually limited to yes/no or letter-by-letter matching of the string with a pre-stored answer.

Page 37: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

From CALL to ICALL

• Linguistic modeling is needed to improve on this situation:
▪ tokenization: identify words
▪ morphological analysis: identify/interpret morphemes
▪ syntactic analysis: identify selection, government and agreement relations and word order requirements
▪ formal pragmatic analysis: identify coreference relations, information structure partitioning, . . .
• Computational tools identifying such linguistic properties need to be integrated into CALL systems to obtain language-aware “Intelligent” CALL (ICALL).

Page 38: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Intelligent CALL

Intelligent CALL (ICALL) focuses on using linguistics and natural language processing to make CALL better.

When building a full-blown ICALL system, several issues arise:
• Diagnosing and accounting for user errors
▪ Parser-based CALL
• Modeling the system on particular (kinds of) users
• Presenting useful feedback to the user

Page 39: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Parser-based CALL

• We want to use parsers to:
▪ Assign an analysis to a learner’s input
▪ Identify erroneous spots in the input
▪ Provide an analysis of what the error signifies
• But parsers are designed for well-formed constructs
▪ When learners type in incorrect sentences, parsers have to be modified to correctly analyze the incorrectness.

Page 40: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Finding Errors

• Use mal-rules = rules which are added to the grammar to handle error cases.
▪ e.g., a singular noun and a plural verb are allowed to combine, but the combination is marked as an error.
• Constraint relaxation: a parser can be reworked to handle ill-formed input.
▪ Parsers normally just “die” when handling bad input.
▪ So, we allow some constraints (e.g., that a subject and verb must match in number) to be relaxed while parsing
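
The mal-rule idea can be illustrated with a minimal sketch. The lexicon, the function name `parse_subject_verb` and the two-word "grammar" are hypothetical and chosen only for illustration; a real ICALL parser would work over a full grammar.

```python
# Toy lexicon (hypothetical): each word maps to (category, number).
LEXICON = {
    "dog": ("N", "sg"), "dogs": ("N", "pl"),
    "barks": ("V", "sg"), "bark": ("V", "pl"),
}

def parse_subject_verb(noun, verb):
    """Analyse a two-word sentence with one normal rule and one mal-rule.

    Returns (parsed, error_message). The normal rule S -> N V requires
    number agreement; the mal-rule accepts a mismatch but flags it.
    """
    n_cat, n_num = LEXICON[noun]
    v_cat, v_num = LEXICON[verb]
    if (n_cat, v_cat) != ("N", "V"):
        return False, "not a noun followed by a verb"
    if n_num == v_num:
        # Normal rule: subject and verb agree in number.
        return True, None
    # Mal-rule: the parse still succeeds, but the analysis carries an error tag.
    return True, f"subject-verb agreement: '{noun}' is {n_num}, '{verb}' is {v_num}"

print(parse_subject_verb("dog", "barks"))
print(parse_subject_verb("dog", "bark"))
```

The point of the design is that ill-formed input does not make the parser fail: it still receives an analysis, and the attached error tag is what a feedback component could later turn into a message for the learner.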

Page 41: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Second Language Acquisition and ICALL

• Given multiple possible analyses for a sentence, which one is most likely, based on:
▪ the stage of acquisition of the learner
▪ the first language of the learner
▪ the focus of the exercise
• What kinds of errors are we interested in and do we expect?
▪ We would like an error typology = a classification of errors into different groups.
▪ People are starting to look at error-tagged corpora to find the most common errors

Page 42: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Feedback in ICALL

• Feedback = response to the learner based on their input.
• Purpose of feedback:
▪ Reinforcement: feedback can act as a reinforcer to learn a particular concept (behaviorism)
▪ Learning processes need feedback to know right from wrong (cognitivism)
• Things to keep in mind when designing a system:
▪ Feedback needs to be accurate.
▪ Displaying more than one error message at a time is not helpful (Heift 2001).
▪ Explanations should be short.

Page 43: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

NLP Needs in ICALL: Summary

• Learner error corpus
▪ Error taxonomy
▪ L1-specific corpora collection and annotation
• Automatic detection of grammatical errors
▪ Parser-based methods
▪ Statistical methods
• Grammatical error correction
▪ Correcting individual errors
▪ Correcting whole sentences
• Evaluation of error detection and correction

Page 44: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Readability Level Assessment

Readability: the feature of language that makes it easy to read is called readability.

Page 45: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Benefits of Readability

Greater readability increases
• Comprehension (understanding)
• Retention (memory)
• Reading speed
• Persistence (reading more of the text)

Page 46: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

What happens when the text is too difficult?

• Readers feel frustrated.
• Most often, they stop reading without even thinking about it.
• They may seek help or call support.
• They go to some other task.

Page 47: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Readability Variables

• The reader
▪ Prior knowledge
▪ Reading skill
▪ Interest
▪ Motivation
• The text
▪ Content
▪ Style
▪ Design
▪ Organization

Page 48: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

How do we measure readability?

• Look at
▪ Words and their formations
▪ Average sentence length
▪ Is the text coherent?
• Readability formulas
▪ They make some hard assumptions
▪ They do not consider semantic features

Page 49: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Word’s readability scores for these two texts are almost identical. Yet they are clearly different in nature and effect on the reader.

Reading Ease = 65.9 Reading Ease = 65.4
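
A "Reading Ease" score of this kind can be sketched with the Flesch Reading Ease formula, 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word). The slide does not say how the scores were computed beyond naming Word, so treat the code below as an illustration only; in particular, the vowel-group syllable counter is an assumed rough heuristic and will not reproduce Word's numbers exactly.

```python
import re

def count_syllables(word):
    """Rough syllable estimate: number of vowel groups (assumed heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text):
    """Flesch Reading Ease from surface counts only (no semantics)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    avg_sentence_length = len(words) / len(sentences)
    avg_syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * avg_sentence_length - 84.6 * avg_syllables_per_word

easy = "The cat sat. The dog ran."
hard = "Institutional heterogeneity complicates interoperability considerations."
print(flesch_reading_ease(easy) > flesch_reading_ease(hard))
```

Because the formula sees only sentence length and syllable counts, two texts that differ greatly in meaning and effect can still receive nearly the same score.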

Page 50: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Reading Comprehension

• What is constructed when we comprehend a sentence?
▪ Propositional representation
• Do words have a role in comprehension?
▪ Mental lexicon
• How do we resolve different ambiguities?
▪ A thief shot a cop in the park
• How does comprehension relate to reading time?
▪ Reading time experiments
▪ Eye tracking experiments

Page 51: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Text Adaptation

• What if a particular text segment is not readable?
• Sources of difficulty
▪ Lexical: meaning of technical terms, entities
▪ Syntactic difficulty
▪ Discourse level difficulty

Page 52: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Encyclopaedic Annotation

Page 53: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Encyclopaedic Annotation

• Entity linking in text
▪ Linking name mentions in text with their referent entities in a knowledge base
• Wikify

Page 54: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Mark Twain

“I notice that you use plain, simple language, short words, and brief sentences. That is the way to write English—it is the modern way and the best way. Stick to it; and don’t let the fluff and flowers and verbosity creep in.

“When you catch an adjective, kill it. No, I don’t mean utterly, but kill most of them—then the rest will be valuable. They weaken when close together. They give strength when they are wide apart.”

—Mark Twain, in a letter to a 12-year-old boy.

Text Simplification

Page 55: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Lexical Simplification

• A subtask of text simplification
▪ Replacing words or short phrases by simpler variants in a context-aware fashion
• Motivation
▪ To reach out to a wider range of readers having limited vocabulary
  ▪ Children
  ▪ People with low literacy level or cognitive disability
  ▪ Second language learners

Page 56: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Examples

• Technical medical language
▪ Hypertension risk factors include obesity,...
▪ High blood pressure risk factors include excessive weight,...
• Legal language
▪ The Products transacted through the Service are...
▪ The Products managed through the Service are...
• Low literacy readers
▪ Hitler committed terrible atrocities during the second World War
▪ Hitler committed terrible cruelties during the second World War
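
The substitutions in these examples can be mimicked with a minimal lookup sketch. The table `SIMPLER` and the function `simplify` are hypothetical and built only from the slide examples; unlike real lexical simplification, this sketch is not context-aware and simply replaces every listed word.

```python
import re

# Hypothetical substitution table taken from the examples above.
SIMPLER = {
    "hypertension": "high blood pressure",
    "obesity": "excessive weight",
    "atrocities": "cruelties",
}

def simplify(sentence):
    """Replace each listed word by its simpler variant, ignoring context."""
    def swap(match):
        word = match.group(0)
        replacement = SIMPLER.get(word.lower())
        if replacement is None:
            return word
        # Keep sentence-initial capitalisation.
        return replacement.capitalize() if word[0].isupper() else replacement
    return re.sub(r"[A-Za-z]+", swap, sentence)

print(simplify("Hypertension risk factors include obesity"))
```

A context-aware system would additionally check that the candidate fits the sentence, for example that the sense of the word matches, before substituting.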

Page 57: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Text Simplification

• The readability of a text can be improved by transforming it into a simpler text
• Characteristics of manually simplified texts
▪ shorter sentences
▪ fewer and shorter phrases
▪ fewer adjectives, adverbs and coordinating conjunctions
▪ nouns are less often replaced with pronouns

Original text: Congress gave Yosemite the money to repair damage from the 1997 flood.
Abridged text: Congress gave the money after the 1997 flood

Page 58: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Automated Test Item Generation

Writing test questions is an extremely difficult and time consuming task for teachers

Use of NLP to automatically generate test items for language learning comprehension test

Page 59: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Automated Test Item Generation

Multiple-choice questions

Who was voted the best international footballer for 2004?
(a) Henry (b) Beckham (c) Ronaldinho (d) Ronaldo

[Figure labels: the question is the stem; the options consist of the key and the distractors]

Fill-in-the-blank (cloze) questions

Page 60: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Automated Test Item Generation

Challenges
• Identifying candidate sentence
• Identifying key and stem
• Identifying distractors
• Evaluation of item generation systems
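
Once a candidate sentence, key and distractors have been chosen, assembling a cloze item is mechanical, as the minimal sketch below shows. The function `make_cloze` and the distractor names are hypothetical; the hard parts listed above (choosing the sentence, the key and plausible distractors) are not solved here.

```python
import re

def make_cloze(sentence, key, distractors):
    """Build a fill-in-the-blank item: blank out `key` and mix it with distractors."""
    pattern = re.compile(r"\b" + re.escape(key) + r"\b")
    if not pattern.search(sentence):
        raise ValueError("key does not occur in the sentence")
    # Blank only the first occurrence of the key to form the stem.
    stem = pattern.sub("_____", sentence, count=1)
    # Sort the options so the key's position does not depend on input order.
    options = sorted(set(distractors) | {key})
    return {"stem": stem, "options": options, "key": key}

item = make_cloze(
    "Herman Melville wrote Moby Dick.",
    key="Herman Melville",
    distractors=["Mark Twain", "Jane Austen"],  # hand-picked, hypothetical
)
print(item["stem"])
print(item["options"])
```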

Page 61: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Automated Test Item Generation

Why question generation?
• Idealistic vision: learners ask questions to deal with knowledge deficit
• Reality:
▪ Trouble in identifying knowledge gap
▪ Shyness
▪ A typical student asks fewer than 0.2 questions per hour in a classroom (Graesser and Person, 1994)
▪ Train learners to ask deep questions (such as why, why not, how, what-if, what-if-not)

Page 62: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Automated Test Item Generation

Text-to-Question Generation

Page 63: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Text-to-Question Generation

Chinese words are made up of many little marks instead of letters. These words are called characters. There is no alphabet, so children do not need to learn to spell. However, they have to learn at least 3,500 different characters before they can read a simple book. Hundreds of years ago, when the Chinese began to write, each word was a picture. The word for umbrella still looks like an open umbrella.

1. What are the Chinese words are made up of?
2. Are the Chinese words made up of letters?
3. What are Chinese words called?
4. Why do the Chinese children not need to learn to spell?
5. Do Chinese children need to learn to spell?
6. How does the word for umbrella look like in Chinese?

Page 64: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Question Generation Challenges

• Lexical
▪ Question depends on the semantics of the answer
  ▪ Herman Melville wrote Moby Dick
  ▪ Who wrote Moby Dick? What wrote Moby Dick?
▪ Non-compositionality of words or phrases
  ▪ Multiword expressions, phrasal verbs
  ▪ When Russia invaded Finland in 1808, Helsinki was again burned to the ground
  ▪ What was Helsinki again burned to?
• Syntactic
▪ Imprecision of syntactic parsers
▪ Constraints on WH-movements
  ▪ John thought Mary said that James wanted Susan to like Peter
  ▪ Who thought Mary said that James wanted Susan to like Peter? (answer: John)
  ▪ Who did John think Mary said that wanted Susan to like Peter? (answer: James)

Page 65: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Question Generation Challenges

Discourse challenges
• Pronoun anaphora
▪ Abraham Lincoln was the 16th president. He was assassinated by John Wilkes Booth.
▪ Who was he assassinated by?
• Vague noun phrases
▪ . . . The show boosted the studio to the top of the TV cartoon field . . . .
▪ What boosted the studio to the top of the TV cartoon field?
• Implicit discourse relations and world knowledge
▪ Booth jumped into the box and aimed a single-shot, round-slug .44 caliber Henry Deringer at his head, firing at point-blank range
▪ Who killed Abraham Lincoln?

Page 66: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Automated Essay/Answer Evaluation (AEE)

• Automated essay evaluation
▪ “The process of evaluating and scoring written prose via computer programs” (Shermis and Burstein)
▪ Use NLP techniques to provide
  ▪ Holistic scoring
  ▪ Formative feedback
• Why automatic essay scoring?
▪ To reduce laborious human effort
  ▪ Software systems do the task fully automatically
  ▪ Computer generated scores match human accuracy

Page 67: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

History of AEE

• Ellis Page, 1966
▪ Grading burden, a significant impediment to the improvement of overall writing capacity
• Project Essay Grade (PEG), 1973

Page 68: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

History of AEE

Evolution of commercial AEE
• Intelligent Essay Assessor (Pearson Education)
▪ Latent Semantic Analysis
• E-rater® (ETS)
▪ NLP to extract linguistic information and text characteristics (grammatical errors, discourse analysis)
• C-rater (ETS)
▪ NLP to extract content information for short-answer scoring

Page 69: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Writing Features

• Grammatical errors
▪ Subject-verb agreement
▪ Verb form
▪ Pronoun errors
• Discourse structure
▪ Development of thesis statements, main points, support, conclusion
• Topic-relevant word usage
• Sophistication
▪ Idioms, metaphor, style

Page 70: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Expectation from NLP

• Grammaticality
▪ Syntactic processing
• Word usage
▪ Collocation: powerful tea vs strong computer
• Content coverage
▪ Computational text similarity
• Text coherence
▪ Discourse processing
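
As one simple illustration of computational text similarity for content coverage, the sketch below compares a student answer with a reference answer using bag-of-words cosine similarity. This is a deliberately basic stand-in chosen for the example, not the method of any system named in these slides; the function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lower-cased word counts; punctuation and digits are ignored."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between the word-count vectors of two texts."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

reference = "Chinese words are made up of many little marks"
answer = "Chinese words are made of little marks"
print(round(cosine_similarity(reference, answer), 2))
```

Word overlap alone cannot tell a correct paraphrase from an answer that reuses the same words wrongly, which is one reason the other items in the list (syntax, collocation, discourse) matter as well.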

Page 71: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Dialogue-based Intelligent Tutoring System

Intelligent Tutoring System

Page 72: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Dialog-based ITS

• Present challenging problems and questions to the learner
• The learner types in or utters answers
• There is a lengthy multi-turn dialogue as complete solutions or answers evolve

[Figure label: Automated Dialog Processing]

Page 73: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Dialog-based ITS

• AutoTutor: Tutoring Research Group (TRG) at the University of Memphis
• ATLAS/ANDES: University of Pittsburgh
• BEETLE: University of Edinburgh

Page 74: Dr. Plaban Kumar Bhowmick Assistant Professor Centre for Educational Technology

Topics of Study

• Dialog and conversational agent
• Architecture of dialog-based ITSs