CHEMISTRY STUDIO: AN INTELLIGENT TUTORING SYSTEM
Ankit Kumar, Abhishek Kar, Ashish Gupta, Akshay Mittal
Mentors:
  Dr. Sumit Gulwani (MSR, Redmond)
  Dr. Ashish Tiwari (SRI Intl.)
  Dr. Amey Karkare (IIT Kanpur)

Upload: jennifer-burby

Post on 16-Dec-2015



Introduction

Aim: build an intelligent tutoring system for the Periodic Table domain (Chemistry)

Solves problems by emulating the thought processes and lines of reasoning employed by students

Much more than a problem solver – aids learning by generating hints and intelligent problems

System Overview

System divided into two components:
Natural Language Component
  Translates natural language input to an intermediate logical representation
Problem Solving Component
  Solves problems, generates hints and new problems of graded difficulty
  More info: Problem Solving team

Natural Language Component

[Figure: pipeline diagram of the Natural Language Component. Stages: Lexer, Option Parsing, Parser Tier 1, Parser Tier 2. Data labels: Input Problem, Domain information, Tokens, Terms in logic, Full logical representation.]

An Example - Lexer

Which element in group 2 has the maximum metallic property?
  i) Be  ii) Mg  iii) Ca  iv) Sr
Lexer input: Which element in Group 2 has the maximum metallic character?
Remaining input as the lexer consumes it:
  Group 2 has the maximum metallic character?
  2 has the maximum metallic character?
  maximum metallic character?
  metallic character?
Tokens: Group 2 Max MetallicProperty

Parser – Tier 1

Input tokens: Group 2 Max MetallicProperty
[Figure: partial term trees with unfilled holes. Node labels in order: Same, Group 2, Hole; $1; Max, Hole, Hole; MetallicProperty.]

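Parser – Tier 2

[Figure: the partial trees from Tier 1 are combined into one tree. Node labels in order: Max, Hole, Hole; Same, Group 2, Hole; then Max, MetallicProperty, Same, Group 2, $1; MetallicProperty, $1.]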

Introduction of Variables

Implicit introduction of free variables needed to formulate a valid logical formula
Example: Alkali metals belong to Group 1
Intelligently guess the requirement of a variable
Two situations:
  Hole (of type elem) present. Not satisfied by tokens in the unused list (even after replication)
  Hole (of type elem) present. No tokens left in the unused list. No replicated original token satisfies it
Introduce a new variable!
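
The two situations above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the system's implementation: the `Token` and `VariableSupply` types, the `fill_hole` helper, and the type labels `elem`, `pred`, and `group` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    text: str  # e.g. "AlkaliMetal", "Group 1", "$1"
    type: str  # assumed type label: "elem", "pred", "group", ...


class VariableSupply:
    """Hands out fresh variable tokens $1, $2, ... of type elem."""

    def __init__(self) -> None:
        self._next = 1

    def fresh(self) -> Token:
        tok = Token(f"${self._next}", "elem")
        self._next += 1
        return tok


def fill_hole(hole_type: str, unused: List[Token], used: List[Token],
              supply: VariableSupply) -> Token:
    # First choice: an unused token of the required type.
    for tok in unused:
        if tok.type == hole_type:
            unused.remove(tok)
            used.append(tok)
            return tok
    # Second choice: replicate a token that was already used elsewhere.
    for tok in used:
        if tok.type == hole_type:
            return tok
    # Nothing fits. For an elem hole this covers both situations listed
    # above, so a new variable is introduced and remembered for reuse.
    if hole_type == "elem":
        var = supply.fresh()
        used.append(var)
        return var
    raise ValueError(f"no token can fill a hole of type {hole_type!r}")


# "Alkali metals belong to Group 1": no token of type elem exists.
unused = [Token("AlkaliMetal", "pred"), Token("Group 1", "group")]
used: List[Token] = []
supply = VariableSupply()
first = fill_hole("elem", unused, used, supply)   # new variable
second = fill_hole("elem", unused, used, supply)  # replicated variable
print(first.text, second.text)
```

Recording the new variable in the used list lets later holes of type elem replicate it, so both predicates in the example end up talking about the same element.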

Handling Quantifiers

Universal Quantifiers: General scheme – ∀x: A(x) ⇒ B(x)
Existential Quantifiers: General scheme – ∃x: A(x) ∧ B(x)
Assumptions:
  Quantification over a single variable
  No nesting of quantifiers

Universal Quantification

Problems
  Finding the position of implication
  Finding the antecedent and consequent
Example – Alkali metals show metallic character
Solution – ForAll($1, Implies(AlkaliMetal($1), Metallic($1)))
Position of implication ≈ Position of verb
Deciding the antecedent and consequent is more complicated
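
A minimal Python sketch of the "implication at the verb" idea follows. It assumes the sentence has already been reduced to `(name, tag)` pairs, and it makes the simplifying assumption that predicates to the left of the verb form the antecedent, which only holds for active-voice sentences. The function names and the `PRED`/`VERB` tags are hypothetical.

```python
from typing import List, Tuple


def conjoin(preds: List[str], var: str) -> str:
    """Apply each predicate to var and join the atoms with And."""
    atoms = [f"{p}({var})" for p in preds]
    formula = atoms[0]
    for atom in atoms[1:]:
        formula = f"And({formula}, {atom})"
    return formula


def forall_from_tagged(tagged: List[Tuple[str, str]], var: str = "$1") -> str:
    """Split the tagged tokens at the first verb and build a ForAll term."""
    verb_positions = [i for i, (_, tag) in enumerate(tagged) if tag == "VERB"]
    if not verb_positions:
        raise ValueError("no verb found, cannot place the implication")
    split = verb_positions[0]
    # Assumption: left of the verb is the antecedent, right is the consequent.
    left = [name for name, tag in tagged[:split] if tag == "PRED"]
    right = [name for name, tag in tagged[split + 1:] if tag == "PRED"]
    if not left or not right:
        raise ValueError("need a predicate on each side of the verb")
    return f"ForAll({var}, Implies({conjoin(left, var)}, {conjoin(right, var)}))"


# "Alkali metals show metallic character"
sentence = [("AlkaliMetal", "PRED"), ("show", "VERB"), ("Metallic", "PRED")]
print(forall_from_tagged(sentence))
```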

ForAll Resolution Algorithm

Active vs. Passive Voice (Stanford CoreNLP)
  Alkali metals show metallic character
  Metallic character is shown by alkali metals
  Both have the same translation!

Assertion Based Questions

Assert facts
Pose questions
Span multiple sentences
Example – An element A forms a covalent bond with oxygen. It has high electronegativity and belongs to group 13. What is its atomic number?
Problem – Anaphora Resolution!
Solution – Use Stanford CoreNLP to get the coreference graph

Assertion Based Questions

Method for translating assertion based questions
  Construct the logical formula corresponding to each sentence independently
  Use the coreference graph to find variables referring to the same entity
  Construct the formula A1(x) ∧ A2(x) ∧ … ∧ An(x), where Ai(x) = logical formula of the ith sentence
  Quantify over the free variable(s)
    Questions typically ask about a single entity
    Existential quantification suffices
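
The method can be illustrated with a short Python sketch. It assumes each sentence has already been translated into a list of `(predicate, mention)` pairs and that the coreference graph has been reduced to clusters of mentions; the predicate names below are hypothetical, and the final question ("What is its atomic number?") is left out because it is the query rather than an asserted fact.

```python
from typing import Dict, List, Tuple

Sentence = List[Tuple[str, str]]  # (predicate name, mention it applies to)


def translate(sentences: List[Sentence], clusters: List[List[str]]) -> str:
    # One variable per coreference cluster: every mention in a cluster
    # refers to the same entity, so it maps to the same variable.
    var_of: Dict[str, str] = {}
    for k, cluster in enumerate(clusters, start=1):
        for mention in cluster:
            var_of[mention] = f"${k}"

    # A1(x) ∧ A2(x) ∧ ... ∧ An(x): conjoin the atoms of all sentences.
    atoms = [f"{pred}({var_of[mention]})"
             for sentence in sentences
             for pred, mention in sentence]
    body = atoms[0]
    for atom in atoms[1:]:
        body = f"And({body}, {atom})"

    # Existentially quantify over each variable, $1 outermost.
    for k in range(len(clusters), 0, -1):
        body = f"Exists(${k}, {body})"
    return body


sentences = [
    [("FormsCovalentBondWithOxygen", "A")],
    [("HighElectronegativity", "It"), ("InGroup13", "It")],
]
clusters = [["A", "It", "its"]]
formula = translate(sentences, clusters)
print(formula)
```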

Negations

Non-: Which of the following non-metals is a gas at STP?
  Couple non with the predicate immediately next to it
  And(IsGasAtSTP($1), Not(Metallic($1)))
Not: Not all alkali metals form basic oxides.
  Negation of the statement to the right of not
  Not(ForAll($1, Implies(AlkaliMetal($1), BasicOxide($1))))

Negations

No: No halogen is metallic in nature.
  Natural interpretation of no as “there does not exist”
  Not(Exists($1, And(Halogen($1), Metallic($1))))
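
The three negation patterns can be written as small term builders. The Python below is a sketch for illustration only: the helper names (`non_prefix`, `not_all`, `no_quantifier`) are hypothetical, and formulas are represented as plain strings rather than as the typed terms a real parser would use.

```python
def atom(pred: str, var: str) -> str:
    return f"{pred}({var})"


def not_(f: str) -> str:
    return f"Not({f})"


def and_(a: str, b: str) -> str:
    return f"And({a}, {b})"


def implies(a: str, b: str) -> str:
    return f"Implies({a}, {b})"


def forall(var: str, f: str) -> str:
    return f"ForAll({var}, {f})"


def exists(var: str, f: str) -> str:
    return f"Exists({var}, {f})"


def non_prefix(main: str, negated: str, var: str = "$1") -> str:
    """'non-X': negate only the predicate right after 'non'."""
    return and_(atom(main, var), not_(atom(negated, var)))


def not_all(antecedent: str, consequent: str, var: str = "$1") -> str:
    """'Not all A are B': negate the whole statement to the right of 'not'."""
    return not_(forall(var, implies(atom(antecedent, var), atom(consequent, var))))


def no_quantifier(restrictor: str, body: str, var: str = "$1") -> str:
    """'No A is B': read 'no' as 'there does not exist'."""
    return not_(exists(var, and_(atom(restrictor, var), atom(body, var))))


print(non_prefix("IsGasAtSTP", "Metallic"))
print(not_all("AlkaliMetal", "BasicOxide"))
print(no_quantifier("Halogen", "Metallic"))
```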

Ranking Algorithm

Need to rank the different representation trees generated
Heuristics
  Greater cover, greater confidence
  Higher confidence for filling a hole with a token closer to its parent in the English sentence
  Penalize when:
    Tokens are replicated – larger tokens, more penalization
    Handcrafted tokens are inserted – And, Or, Implies
    Tokens are left unused – greater proportion of unused tokens, more penalization
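
One way to combine these heuristics into a single score is sketched below in Python. The slides do not give the actual scoring formula, so the `Candidate` fields, the linear combination, and every weight here are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Candidate:
    name: str
    used_tokens: int   # tokens of the sentence that appear in the tree
    total_tokens: int  # tokens produced by the lexer
    # For each filled hole: distance in words between filler and parent.
    hole_distances: List[int] = field(default_factory=list)
    # Size of each replicated token.
    replicated_sizes: List[int] = field(default_factory=list)
    # Number of inserted And / Or / Implies tokens.
    handcrafted: int = 0


def score(c: Candidate) -> float:
    cover = c.used_tokens / c.total_tokens
    # Closer fillers contribute more confidence.
    closeness = sum(1.0 / (1 + d) for d in c.hole_distances)
    # Hypothetical weights; larger replicated tokens cost more.
    penalty = (0.2 * sum(c.replicated_sizes)
               + 0.3 * c.handcrafted
               + 1.0 * (1.0 - cover))
    return cover + 0.5 * closeness - penalty


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Best representation tree first."""
    return sorted(candidates, key=score, reverse=True)


tree_a = Candidate("A", used_tokens=4, total_tokens=4, hole_distances=[1, 1])
tree_b = Candidate("B", used_tokens=3, total_tokens=4, hole_distances=[3],
                   replicated_sizes=[2], handcrafted=1)
print([c.name for c in rank([tree_b, tree_a])])
```

In this simplified model the cover and the unused-token proportion are complementary, so a tree that leaves tokens unused is hit twice: once through lower cover and once through the explicit penalty.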

Evaluation

Currently able to solve 70 of the 126 problems (about 56%) collected from the Tata McGraw Hill textbook for Grade XI

More problems can be solved by modeling more chemistry-specific predicates; this just corresponds to adding domain knowledge to our system

Another evaluation metric could be the ratio of the number of rules encoded to the size of the corpus of problems solved

We encode 173 predicates/entities/functions in our algorithm (of which 118 are names of elements)

Conclusions

While contemporary works focus on analyzing languages by learning, we hypothesize that for a simpler structured domain like Chemistry, a much simpler type-theoretic approach armed with some heuristics observed from the domain can achieve similar, if not better, success.

During the later phase of the project, we tried to use some techniques of learning to improve upon our system and were successful in doing so.

In conclusion, we feel that a combination of such a type-theoretic approach and the standard machine learning techniques can achieve good success for a well structured domain like Chemistry.

Future Work

Disambiguate – At, As, In (names of elements)
1 = 1st = first (Stanford CoreNLP NER Tool)
And(And(x,y),z) = And(And(x,z),y)
Model electronic configuration
Better modelling of conjunctions – “Alkali metals belong to group 1 and are metallic in nature”
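
The And equivalence listed above could be recognized by rewriting terms into a canonical form before comparing them. The Python below is one possible sketch, not something the slides describe: it assumes terms are nested tuples such as `("And", x, y)` with strings as atoms, flattens nested And nodes, sorts the operands, and rebuilds a left-nested term.

```python
def flatten_and(term):
    """Collect the operands of a (possibly nested) And into a flat list."""
    if isinstance(term, tuple) and term[0] == "And":
        operands = []
        for arg in term[1:]:
            operands.extend(flatten_and(arg))
        return operands
    return [term]


def canonical(term):
    """Return a form that is identical for all reorderings of And operands."""
    if not isinstance(term, tuple):
        return term  # atom
    if term[0] == "And":
        operands = sorted((canonical(t) for t in flatten_and(term)), key=repr)
        result = operands[0]
        for operand in operands[1:]:
            result = ("And", result, operand)
        return result
    # Any other connective: keep the head, canonicalize the arguments.
    return (term[0],) + tuple(canonical(t) for t in term[1:])


left = ("And", ("And", "x", "y"), "z")
right = ("And", ("And", "x", "z"), "y")
print(canonical(left) == canonical(right))
```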

Stanford CoreNLP

Collection of commonly used NLP tools – POS tagging, parsing, coreference analysis, NER
Problem – Integrating a Java package with C#
  Command line interface is slow – needs to load large data models (17 secs per question!)
Solution – Query the online demo and get an XML response
  http://nlp.stanford.edu:8080/corenlp/

Thank You