Question Classification (cont’d)


Question Classification (cont’d)

Ling573 NLP Systems & Applications
April 12, 2011

Upcoming Events

Two seminars, Friday 3:30:
Linguistics seminar: Janet Pierrehumbert (Northwestern), Example-Based Learning and the Dynamics of the Lexicon
AI seminar: Patrick Pantel (MSR), Associating Web Queries with Strongly-Typed Entities

Roadmap

Question classification variations:
SVM classifiers
Sequence classifiers
Sense information improvements
Question series

Question Classification with Support Vector Machines

Hacioglu & Ward 2003
Same taxonomy, training, and test data as Li & Roth
Approach:
Shallow processing
Simpler features
Strong discriminative classifiers

Features & Processing

Contrast (Li & Roth): POS and chunk information; NE tagging; other sense information
Preprocessing: letters only, converted to lower case, stopped, stemmed
Terms: the 2,000 most informative word n-grams; IdentiFinder NE tags (7 or 9 tags)
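To make the setup concrete, here is a minimal sketch of a Hacioglu & Ward-style classifier in scikit-learn. The toy questions and labels are illustrative, chi-squared selection stands in for the paper’s “most informative n-grams” filter, and the IdentiFinder NE tags are omitted (that tagger is proprietary).

```python
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import LinearSVC

def preprocess(q):
    # Letters only, lowercased (the paper also stops and stems; omitted here).
    return re.sub(r"[^a-z ]+", " ", q.lower())

# Toy stand-ins for the Li & Roth training questions and labels.
questions = ["Who wrote Hamlet ?", "How much does a rhino weigh ?",
             "Who is the CEO of IBM ?", "How many dogs pull a sled ?"]
labels = ["HUM:ind", "NUM:weight", "HUM:ind", "NUM:count"]

vectorizer = CountVectorizer(preprocessor=preprocess, ngram_range=(1, 2))
X = vectorizer.fit_transform(questions)

# Keep only the most informative n-grams (~2,000 in the paper; capped for toy data).
selector = SelectKBest(chi2, k=min(2000, X.shape[1]))
X_selected = selector.fit_transform(X, labels)

svm = LinearSVC().fit(X_selected, labels)
new_q = vectorizer.transform(["Who invented the radio ?"])
print(svm.predict(selector.transform(new_q)))  # expect a HUM label
```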

Classification & Results

Employs support vector machines for classification
Best results: bigrams, 7 NE classes
Better than Li & Roth with POS + chunk features, but without semantics
Fewer NE categories work better: more categories, more errors

Enhanced Answer Type Inference … Using Sequential Models

Krishnan, Das, and Chakrabarti 2005
Improves question classification with CRF extraction of ‘informer spans’
Intuition: humans identify the answer type from a few tokens, with little syntax:
Who wrote Hamlet?
How many dogs pull a sled at the Iditarod?
How much does a rhino weigh?
The informer is a single contiguous span of tokens:
How much does a rhino weigh?
Who is the CEO of IBM?

Informer Spans as Features

Sensitive to question structure: What is Bill Clinton’s wife’s profession?
Idea: augment the question classifier’s word n-grams with informer-span (IS) information
Informer span features: IS n-grams
Informer n-gram hypernyms: generalize over words or compounds
WSD? No
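A minimal sketch of how such features could be generated, assuming the informer span is already known (the function name, feature strings, and example indices are illustrative, not the authors’ code):

```python
# Requires the WordNet corpus: nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def informer_features(tokens, span):
    i, j = span                      # informer span boundaries, assumed given
    informer = tokens[i:j]
    feats = ["IS=" + "_".join(informer)]
    feats += ["IS_uni=" + w for w in informer]                         # IS unigrams
    feats += ["IS_bi=%s_%s" % p for p in zip(informer, informer[1:])]  # IS bigrams
    # Hypernyms generalize over words/compounds; no WSD, so take every noun sense.
    for w in informer:
        for synset in wn.synsets(w, pos=wn.NOUN):
            for hyper in synset.hypernyms():
                feats.append("IS_hyper=" + hyper.name())
    return feats

tokens = "What is Bill Clinton 's wife 's profession ?".split()
print(informer_features(tokens, (7, 8)))  # informer span: 'profession'
```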

Effect of Informer Spans

Classifier: linear SVM, multiclass
Notable improvement for informer-span hypernyms
Better than using all hypernyms: filtering limits sources of noise
Biggest improvements for ‘what’ and ‘which’ questions

Perfect vs. CRF Informer Spans

(results table from the original slides not preserved in this transcript)

Recognizing Informer Spans

Idea: informer spans are contiguous and syntactically governed
Use a sequential learner with syntactic information
Tag spans with B(egin), I(nside), O(utside)
Employ syntax to capture long-range factors
Matrix of features derived from the parse tree
Cell x[i,l]: i is the token position, l is the depth in the parse tree (only 2 levels)
Values:
Tag: the POS or constituent label at that position
Num: the number of preceding chunks with the same tag
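A toy reconstruction of that feature matrix under the description above (POS at level 1, chunk label at level 2; the paper’s exact encoding may differ):

```python
# Assumed data shapes: tokens_pos is [(token, POS)], chunks is [(label, start, end)].
def feature_matrix(tokens_pos, chunks):
    n = len(tokens_pos)
    matrix = [[None] * n for _ in range(2)]  # rows are depths l = 1, 2
    for i, (_, pos) in enumerate(tokens_pos):
        matrix[0][i] = {"tag": pos, "num": 0}  # level 1: POS tag of token i
    seen = {}  # counts of preceding chunks per label, for the Num value
    for label, start, end in chunks:
        for i in range(start, end):
            matrix[1][i] = {"tag": label, "num": seen.get(label, 0)}
        seen[label] = seen.get(label, 0) + 1
    return matrix

toks = [("What", "WP"), ("is", "VBZ"), ("the", "DT"), ("capital", "NN")]
chunks = [("NP", 0, 1), ("VP", 1, 2), ("NP", 2, 4)]
for row in feature_matrix(toks, chunks):
    print(row)
```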

Parser Output

(parse-tree figure from the original slides not preserved in this transcript)

Parse Tabulation

Encoding and table: (figure from the original slides not preserved in this transcript)

CRF Indicator Features

Cell features: IsTag, IsNum, e.g. y4 = 1 and x[4,2].tag = NP
Also IsPrevTag, IsNextTag
Edge features: IsEdge(u,v): y_{i-1} = u and y_i = v; also IsBegin, IsEnd
All features improve accuracy
Question accuracy: oracle informer spans 88%; CRF spans 86.2%
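A hedged sketch of the BIO tagger using sklearn-crfsuite as a stand-in for the paper’s CRF toolkit. The dictionary keys mirror the IsTag, IsNum, IsPrevTag, and IsNextTag features; the edge features correspond to the transition weights the CRF learns on its own.

```python
import sklearn_crfsuite

def token_features(cells, i):
    # cells[i] is the level-2 column for position i, e.g. {"tag": "NP", "num": 0}.
    return {
        "tag": cells[i]["tag"],                                          # IsTag
        "num": str(cells[i]["num"]),                                     # IsNum
        "prev_tag": cells[i - 1]["tag"] if i > 0 else "BEGIN",           # IsPrevTag / IsBegin
        "next_tag": cells[i + 1]["tag"] if i < len(cells) - 1 else "END",  # IsNextTag / IsEnd
    }

# One toy question with the last token as the informer.
cells = [{"tag": "WP", "num": 0}, {"tag": "VBZ", "num": 0},
         {"tag": "DT", "num": 0}, {"tag": "NN", "num": 0}]
X = [[token_features(cells, i) for i in range(len(cells))]]
y = [["O", "O", "O", "B"]]  # B(egin), I(nside), O(utside)

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X))
```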

Question Classification Using Headwords and Their Hypernyms

Huang, Thint, and Qin 2008
Questions:
Why didn’t WordNet/hypernym features help in Li & Roth?
Li & Roth’s best results used ~200,000 features, with only ~700 active; can we do as well with fewer features?
Approach: refine the features:
Restrict the use of WordNet to headwords
Employ WSD techniques
SVM and MaxEnt classifiers

Head Word Features

Head words: chunks and spans can be noisy
E.g., Bought a share in which baseball team?
Type: HUM:group (not ENTY:sport); the head word is more specific
Employ rules over parse trees to extract head words
Issue: vague heads
E.g., What is the proper name for a female walrus? Head = ‘name’?
Apply fix patterns to extract the sub-head (e.g., walrus)
Also, simple regular expressions for other feature types
E.g., ‘what is’ as a cue to the definition type
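A simplified sketch of the idea. The head rule below is far cruder than the Collins-style rules such systems typically apply, and the fix pattern is one illustrative regular expression, not the paper’s pattern set.

```python
import re
from nltk import Tree

VAGUE_HEADS = {"name", "kind", "type"}  # assumed list, for illustration

def head_word(tree):
    if isinstance(tree, str):
        return tree
    # Crude rule: prefer a noun-ish child, otherwise descend into the last child.
    for child in tree:
        label = child.label() if isinstance(child, Tree) else ""
        if label.startswith("N"):
            return head_word(child)
    return head_word(tree[-1])

def fixed_head(question, head):
    if head in VAGUE_HEADS:
        # Fix pattern: "proper name for a female walrus" -> sub-head 'walrus'.
        m = re.search(r"\bfor an? (?:\w+ )*?(\w+)\s*\?", question)
        if m:
            return m.group(1)
    return head

q = "What is the proper name for a female walrus ?"
parse = Tree.fromstring(
    "(SBARQ (WHNP (WP What)) (SQ (VBZ is) (NP (NP (DT the) (JJ proper) (NN name))"
    " (PP (IN for) (NP (DT a) (JJ female) (NN walrus))))))")
print(fixed_head(q, head_word(parse)))  # -> walrus
```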

WordNet Features

Hypernyms:
Enable generalization: dog -> … -> animal
Can generate noise: also dog -> … -> person
Adding low-noise hypernyms:
Which senses? Restrict to the matching WordNet POS
Which word senses? Use the Lesk algorithm: overlap between the question and the WordNet gloss
How deep? Based on a validation set: 6
Question-type similarity: compute the similarity between the headword and each type; use the type as a feature
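A small sketch of the low-noise hypernym features using NLTK’s WordNet interface and its classic Lesk implementation (requires the WordNet corpus; the feature strings are illustrative):

```python
# Requires the WordNet corpus: nltk.download("wordnet")
from nltk.corpus import wordnet as wn
from nltk.wsd import lesk  # classic Lesk: overlap between context and gloss

def hypernym_features(question_tokens, head, depth=6):
    # Depth 6 is the value the paper picked on its validation set.
    sense = lesk(question_tokens, head, pos=wn.NOUN)  # restrict to noun senses
    feats = []
    for _ in range(depth):
        if sense is None or not sense.hypernyms():
            break
        sense = sense.hypernyms()[0]  # climb one hypernym level per step
        feats.append("hyper=" + sense.name())
    return feats

q = "What is the proper name for a female walrus ?".split()
print(hypernym_features(q, "walrus"))
```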

Other Features

Question wh-word: what, which, who, where, when, how, why, and rest
N-grams: unigrams, bigrams, trigrams
Word shape (case features): all upper, all lower, mixed, all digit, other
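These three feature types are simple enough to sketch directly (toy code; the feature-string formats are illustrative):

```python
def wh_word(tokens):
    wh = {"what", "which", "who", "where", "when", "how", "why"}
    first = tokens[0].lower()
    return first if first in wh else "rest"

def word_shape(w):
    if w.isupper():  return "all_upper"
    if w.islower():  return "all_lower"
    if w.isdigit():  return "all_digit"
    if w.isalpha():  return "mixed"    # alphabetic but mixed case
    return "other"

def ngrams(tokens, n):
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

toks = "Who is the CEO of IBM ?".split()
feats = (["wh=" + wh_word(toks)]
         + ["shape=" + word_shape(w) for w in toks]
         + ngrams(toks, 1) + ngrams(toks, 2) + ngrams(toks, 3))
print(feats)
```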

Results

Per-feature-type results: (table from the original slides not preserved in this transcript)

Results: Incremental

Additive improvement: (table from the original slides not preserved in this transcript)

Error Analysis

Inherent ambiguity:
What is mad cow disease? ENTY:disease or DESC:def?
Inconsistent labeling:
What is the population of Kansas? NUM:other
What is the population of Arcadia, FL? NUM:count
Parser errors

Question Classification: Summary

Issue: integrating rich features and deeper processing
Errors in processing introduce noise
Noise in added features increases error
Large numbers of features can be problematic for training
Alternative solutions:
Use more accurate shallow processing and a better classifier
Restrict the addition of features to informer spans or headwords
Filter the features to be added

Question Series

TREC 2003 and later
Target: PERS, ORG, …
Assessors create a series of questions about the target
Intended to model interactive Q&A, but often stilted
Introduces pronouns and anaphora

Handling Question Series

Given a target and a series, how do we handle reference?
Shallowest approach: concatenation: add the ‘target’ to the question
Shallow approach: replacement: replace all pronouns with the target
Least shallow approach: heuristic reference resolution
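The two shallow strategies are easy to sketch (illustrative code; the pronoun list and formatting are assumptions, not any team’s exact implementation):

```python
import re

PRONOUNS = r"\b(he|she|it|they|him|her|them|his|its|their)\b"

def concatenate(target, question):
    # Shallowest approach: just prepend the target string.
    return f"{target} {question}"

def replace_pronouns(target, question):
    # Shallow approach: swap every pronoun for the target.
    return re.sub(PRONOUNS, target, question, flags=re.IGNORECASE)

target = "Nirvana"
print(concatenate(target, "What is their biggest hit?"))
print(replace_pronouns(target, "What is their biggest hit?"))
# Failure mode from the slides: no pronoun here, so replacement
# leaves the definite description 'the band' unresolved.
print(replace_pronouns(target, "When was the band formed?"))
```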

Question Series Results

No clear winning strategy
The questions are all largely about the target, so there is no big win for anaphora resolution
With bag-of-words features in search, concatenation works fine
The ‘replacement’ strategy can be problematic
E.g., target = Nirvana: What is their biggest hit? When was the band formed?
Replacement wouldn’t resolve ‘the band’
Most teams concatenate