Chair of Software Engineering for Business Information Systems (sebis), Faculty of Informatics, Technische Universität München, wwwmatthes.in.tum.de
Using Multitask Deep Learning for Question Answering
A use case on the InsuranceQA dataset
Iman Jundi, 09.11.2018, Master Thesis Final Presentation
Advisor: M.Sc. Ahmed Elnaggar
Supervisor: Prof. Dr. Florian Matthes
© sebis181109 Jundi Multitask Deep Learning for Question Answering 2
● Introduction & Motivation
● Research Questions
● Approach
● Models & Results
● Analysis & Samples
● Conclusion
Outline
Introduction & Motivation
complex: ultimately pass the Turing Test
useful: many applications in many domains
Advances in recent years, e.g. IBM's Watson winning Jeopardy against all-time champions
Introduction & Motivation | Research Questions | Approach | Models & Results | Analysis & Samples | Conclusion
Using Multitask Deep Learning for Question Answering
Question → System → Answer
Text Retrieval
Answer Selection
Reading Comprehension
Natural Language Generation
Involves varying tasks in IR and NLP
Text Retrieval
Introduction & Motivation
many layers → deep
train with a huge amount of data
learn complex representations
hard for some domains or tasks, e.g. insurance
task
related task
Benefit from related tasks with more data
No feature engineering
varying architecture
architecture engineering per task
Share layers between tasks
better/generic representation
Using Multitask Deep Learning for Question Answering
Question → System → Answer
Answer Selection
Reading Comprehension
Natural Language Generation
promote generic architectures
Go-to tool in recent years
Use Case: Answer Selection - InsuranceQA Dataset [Feng15]
Question
instant response
fewer resources
17,000 questions + 27,500 answers
http://www.insurancelibrary.com
[Feng15] Feng, M., Xiang, B., Glass, M.R., Wang, L. and Zhou, B., 2015. Applying deep learning to answer selection: A study and an open task. In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp. 813-820. IEEE.
Expert Answers
Question + Pool of Answers → System → Answer
Similar Question
Select the correct answer to a question from an answer pool
InsuranceQA Dataset:
refer
client company
Related Task: Reading Comprehension - SQuAD Dataset [Rajpurkar16]
paragraph 100,000+ Q&A
Rajpurkar, P., Zhang, J., Lopyrev, K. and Liang, P., 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (pp. 2383-2392).
question
Paragraph + Question → System → Answer Span
500+ Wikipedia articles
Selected from the paragraph by crowd workers
vs. InsuranceQA 17,000 Questions
Answer comprehension questions on a paragraph (similar to humans)
● related task → multitask works
● active research → benefit from
● bigger dataset: ~6x questions
● clean dataset: manually created
Stanford Question Answering Dataset (SQuAD):
SQuAD?
Answer Selection | Reading Comprehension | Answer Selection + Reading Comprehension
● Which Deep Neural Network model can be used for Multitask Learning of both Reading Comprehension and Answer Selection?
Research Questions
Question + Pool of Answers → Model → Answer
Paragraph + Question → Model → Answer Span
? One Model for both:
Paragraph + Question → Answer Span
Question + Pool of Answers → Answer
model
performance with single-task training vs. performance with multitask training
● How well does this network perform on the Answer Selection task on the InsuranceQA dataset when trained only on this task?
● Would Answer Selection performance improve when the model is trained jointly on both this task and Reading Comprehension?
Research Approach
Question + Pool of Answers → Model → Answer
Paragraph + Question → Model → Answer Span
? One Model for both:
Paragraph + Question → Answer Span
Question + Pool of Answers → Answer
model
performance with single-task training vs. performance with multitask training
idea
implementation
evaluation. Performance Metrics: InsuranceQA: Accuracy & Top-5 Accuracy; SQuAD: F1 & EM (Exact Match) on words
related work or observations from previous iterations
online implementations of related work
improve
analyze
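The four metrics above can be sketched in a few lines of Python (an illustrative sketch, not the thesis code; the function names and the whitespace tokenization are my simplifying assumptions, the official SQuAD script also normalizes punctuation and articles):

```python
from collections import Counter

def exact_match(prediction, ground_truths):
    # EM: 1.0 if the prediction matches any reference exactly after tokenization
    return float(any(prediction.split() == gt.split() for gt in ground_truths))

def f1_on_words(prediction, ground_truth):
    # F1 over the word overlap between prediction and one reference
    pred_tokens = prediction.split()
    gt_tokens = ground_truth.split()
    common = Counter(pred_tokens) & Counter(gt_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)

def top_k_accuracy(ranked_answer_ids, correct_id, k=5):
    # InsuranceQA-style: correct answer anywhere in the top-k of the ranked pool
    return float(correct_id in ranked_answer_ids[:k])
```

With k=1 the last function reduces to plain accuracy, which is why the slides report both Accuracy and Top-5 Accuracy from the same ranking.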
Answer Selection + Reading Comprehension
Multitask Deep Learning Approach
Model:
Paragraph + Question → Answer Span
Question + Pool of Answers → Answer
RC layers AS layers
shared layers
multitask training: iterate between tasks per batch
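The per-batch alternation between tasks can be sketched as follows (illustrative Python, not the thesis code; `interleave_batches` and the cycling schedule are my assumptions, other schedules such as proportional sampling are equally possible):

```python
from itertools import cycle, islice

def interleave_batches(as_batches, rc_batches):
    """Yield (task_name, batch) pairs, alternating between Answer Selection (AS)
    and Reading Comprehension (RC) one batch at a time. The shorter list of
    batches is cycled so the larger dataset is consumed in full."""
    longer = max(len(as_batches), len(rc_batches))
    as_stream = islice(cycle(as_batches), longer)
    rc_stream = islice(cycle(rc_batches), longer)
    for as_batch, rc_batch in zip(as_stream, rc_stream):
        yield ("AS", as_batch)   # update shared + AS layers on this step
        yield ("RC", rc_batch)   # update shared + RC layers on this step
```

Each yielded step would run one forward/backward pass with the corresponding task-specific output layer, so the shared layers see gradients from both tasks in strict alternation.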
Approach & Problem Definition of the Tasks
Paragraph + Question → start probability, end probability
Question + Answer → probability
Question + Answer → similarity
encode
compare
train to make sim(Q,A+) > sim(Q,A-) by a margin (hinge margin loss)
encode
binary (+/-) classification problem (cross entropy loss)
classification problem over the paragraph words (cross entropy loss)
Answer Selection Reading Comprehension
Question + Pool of Answers → Model → Answer
Paragraph + Question → Model → Answer Span
Paragraph + Question → Model → Answer Span
predict score for each answer; predict word index for start, end
encode
Multitask Learningdoesn’t work
compatiblelearning objective
Word → Vector
vector representation of words in paragraph
based on question
common approach in related work
similarity function, e.g. cosine
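The cosine similarity and the hinge margin objective described on this slide might look like this in code (a minimal pure-Python sketch, not the thesis implementation; in practice Q, A+ and A- would be dense encoder outputs rather than hand-written vectors):

```python
import math

def cosine_similarity(u, v):
    # sim(u, v) = u·v / (|u| |v|)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def hinge_margin_loss(q, a_pos, a_neg, margin=0.2):
    # Zero loss once sim(Q, A+) exceeds sim(Q, A-) by at least the margin,
    # so training pushes correct answers above incorrect ones in the ranking.
    return max(0.0, margin - cosine_similarity(q, a_pos) + cosine_similarity(q, a_neg))
```

At inference time the same similarity score is used directly to rank the answer pool.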
Models
Also variations:
- additional separate bi-GRU
- CNN in output [Tan15]
Multitask Learning bad on InsuranceQA
[Tan15] Tan, M., Santos, C.D., Xiang, B. and Zhou, B., 2015. LSTM-based deep learning models for non-factoid answer selection. arXiv preprint arXiv:1511.04108.
[Wang17] Wang, W., Yang, N., Wei, F., Chang, B. and Zhou, M., 2017. Gated self-matching networks for reading comprehension and question answering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics.
[Wang16] Wang, S. and Jiang, J., 2016. A compare-aggregate model for matching text sequences. arXiv preprint arXiv:1611.01747.
[Yu18] Yu, A.W., Dohan, D., Le, Q., Luong, T., Zhao, R. and Chen, K., 2018. Fast and accurate reading comprehension by combining self-attention and convolution. In International Conference on Learning Representations.
more advanced models
Shared bi-RNN (AS similarity) | Shared bi-RNN (AS probability) | mtlRNet | mtlQANet
Compatible learning objective → Multitask learning starts working
adapt 2 SQuAD models: R-Net [Wang17] and QANet [Yu18]
Multitask Learning QANet [Yu18] - mtlQANet
InsuranceQA Accuracy
SQuAD F1
Yu, A.W., Dohan, D., Le, Q., Luong, T., Zhao, R. and Chen, K., 2018. Fast and accurate reading comprehension by combining self-attention and convolution. In International Conference on Learning Representations.
more relevant → more weight
word → vector (pretrained word embeddings)
word representation with information from context
representation of words in (answer/paragraph) based on the question
QANet [Yu18]: RC on SQuAD, adapted for AS on InsuranceQA
task-specific output layer
classificationprobability output
further encoding
better with multitask learning
Final Results
Focus on optimizing similarity function
Shift the focus to a more generic solution
similarity implicit in attention
* mtlRNet and mtlQANet with Multitask Training reach 93%, 93.3% top-5 accuracy overall
mtlQANet – Multitask Learning Ablation Study
InsuranceQA Accuracy
more sharing → better performance
few AS task-specific parameters (400)
InsuranceQA Prediction Samples – correct top answer
focused attention on only the relevant words
SQuAD pinpoints answer?
single-task
multitask
Question: how do I drop my Health Insurance
Ground Truth (predicted rank=1): how drop your health insurance in every case it involve write notice of your desire cancel your coverage on an individual basis you will write a note the insurance company customer service department and send it there for a business you will write a memo your hr cancel or wave coverage please note : if marry you may be ask include your spouse signature too..
a2q attention visualization: top prediction = ground truth
multitask
InsuranceQA Prediction Samples – incorrect top answer
attention on “qualify” for criteria words
single-task
Question: how do I qualify for government Health Insurance
Ground Truth, predicted rank=8 (single/multi-task): depending on where you live your age , income and other factor - you may qualify for Medicaid , MRMIB , Medicare , etc. the 2 big enchaladas of healthcare reform guarantee insurability and subsidy come into effect 1/1/14 ( open enrollment be 10/1/13 3/31/14 so...
top prediction (single/multi-task): when you say government health insuranc if you be refer to Healthcare Reform or Obamacare the rate depend on many factor such as age who be on the policy , zip code which plan you buy , etc. there will be subsidy as well so it depend if you be eligible for a subsidy in reality the actual rate..
more words in commonthan in ground truth
a2q attention visualization: top prediction
multitask
InsuranceQA Prediction Samples – incorrect top answer
single-task
Question: how do I qualify for government Health Insurance
Ground Truth, predicted rank=8 (single/multi-task): depending on where you live your age , income and other factor - you may qualify for Medicaid , MRMIB , Medicare , etc. the 2 big enchaladas of healthcare reform guarantee insurability and subsidy come into effect 1/1/14 ( open enrollment be 10/1/13 3/31/14 so...
top prediction (single/multi-task): when you say government health insuranc if you be refer to Healthcare Reform or Obamacare the rate depend on many factor such as age who be on the policy , zip code which plan you buy , etc. there will be subsidy as well so it depend if you be eligible for a subsidy in reality the actual rate..
a2q attention visualization: ground truth
same effect for ground truth
not perfect
multitask
InsuranceQA Prediction Samples – incorrect top answer (only single-task)
single-task
Question: can you buy auto insurance out of state
Ground Truth, predicted rank=2 (single-task), predicted rank=1 (multitask): you must purchase your auto insurance policy in the state in which the vehicle be register due to the regulatory framework of the insurance industry every state have regulatory agency that approve rate and coverage for company operate within that state if you own multiple vehicle in multiple state you will need buy separate policy within each state..
top prediction (single-task): liability coverage on auto be typically require in all state limit be usually in the $20,000 per person $40,000 per occurrence range with additional property damage benefit of course some state have low requirement and other have high requirement the cost can vary depending on your age , your driving record , your zip code and the type of vehicle you drive monthly premium can range from $35..
a2q attention visualization: ground truth
focused attention as before
Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi’s Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the ”golden anniversary” with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as ”Super Bowl L”), so that the logo could prominently feature the Arabic numerals 50.
● What year did the Denver Broncos secure a Super Bowl title for the third time?
ground truth: 2015, 2016, 2015
prediction: 2016
em=1.00 f1=1.00
● What does AFC stand for?
ground truth: American Football Conference, American Football Conference, American Football Conference
prediction: American Football Conference
em=1.00 f1=1.00
● Which NFL team represented the NFC at Super Bowl 50?
ground truth: Carolina Panthers, Carolina Panthers, Carolina Panthers
prediction: Denver Broncos
em=0.00 f1=0.00
● What color was used to emphasize the 50th anniversary of the Super Bowl?
ground truth: gold, gold, gold
prediction: golden
em=0.00 f1=0.00
SQuAD Prediction Samples
Answer Selection | Reading Comprehension | Answer Selection + Reading Comprehension
Conclusion
Question + Answers → Model → Answer
Paragraph + Question → Model → Answer Span
? One Model for both:
Paragraph + Question → Answer Span
Question + Answers → Answer
model
performance with single-task training vs. performance with multitask training
adapt 2 SQuAD models with minimal adjustment → improves
● Which Deep Neural Network model can be used for Multitask Learning of both Reading Comprehension and Answer Selection?
● How well does this network perform on the Answer Selection task on the InsuranceQA dataset when trained only on this task?
● Would Answer Selection performance improve when the model is trained jointly on both this task and Reading Comprehension?
more generic models perform well
good
almost fully-shared
only task-specific output layer
compatible learning objective
competitive results
more generic; no AS-specific optimization
SQuAD 2.0 [Rajpurkar18]
utilize optimized similarity functions
but more generic, e.g. add to the combined representation
+ questions without answer; AS: A- ↔ RC: no answer?
other (more) tasks & multitask models
e.g. NLP Decathlon, MQAN [McCann18]
[Rajpurkar18] Rajpurkar, P., Jia, R. and Liang, P., 2018. Know What You Don't Know: Unanswerable Questions for SQuAD. arXiv preprint arXiv:1806.03822.
[McCann18] McCann, B., Keskar, N.S., Xiong, C. and Socher, R., 2018. The natural language decathlon: Multitask learning as question answering. arXiv preprint arXiv:1806.08730.
[Devlin18] Devlin, J., Chang, M.W., Lee, K. and Toutanova, K., 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805.
? pre-trained language representation model, e.g. BERT [Devlin18]: Multitask Learning still useful?
QA
Shared bi-RNN Model
Similarity measure
Classification
probability of each word being the start or end of the answer
Model: Q, A+ → S(Q,A+); Q, A- → S(Q,A-)
Training with Margin Loss → more similarity with A+
SQuAD F1
InsuranceQA Accuracy
Variations:
- additional separate bi-GRU
- CNN in output [Tan15]
Multitask learning still bad on InsuranceQA
Used as score
Not compatible with RC approach?
Tan, M., Santos, C.D., Xiang, B. and Zhou, B., 2015. LSTM-based deep learning models for non-factoid answer selection. arXiv preprint arXiv:1511.04108.
Backup: Models & Results
Shared bi-RNN Model
InsuranceQA Accuracy
binary classification
Probability output trained with cross entropy loss
similar to SQuAD
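The cross-entropy training of this probability output can be sketched as follows (illustrative, not the thesis code; `binary_cross_entropy` and the clamping epsilon are my own naming and a common numerical-stability convention):

```python
import math

def binary_cross_entropy(p, label):
    # label is 1 for a correct (Q, A) pair and 0 for an incorrect one;
    # p is the model's predicted probability that the pair is correct.
    eps = 1e-12  # clamp p away from 0 and 1 to avoid log(0)
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))
```

Because the start/end prediction in Reading Comprehension is also trained with cross entropy, this formulation gives both tasks the compatible learning objective the earlier slides call for.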
More advanced models
Backup: Models & Results
Multitask Learning R-Net [Wang17] - mtlRNet
InsuranceQA Accuracy
SQuAD F1
Wang, W., Yang, N., Wei, F., Chang, B. and Zhou, M., 2017. Gated self-matching networks for reading comprehension and question answering. In Proceedings of the 55th Annual Meeting of the Association for Computational LinguisticsWang, S. and Jiang, J., 2016. A compare-aggregate model for matching text sequences. arXiv preprint arXiv:1611.01747.
Answer to Question Attention
Self-Attention
Aggregate CNN [Wang16]
Backup: Models & Results
Multitask Learning QANet [Yu18] - mtlQANet
InsuranceQA Accuracy
SQuAD F1
Yu, A.W., Dohan, D., Le, Q., Luong, T., Zhao, R. and Chen, K., 2018. Fast and accurate reading comprehension by combining self-attention and convolution. In International Conference on Learning Representations.
Encoder (CNN + Att) [Yu18]
Move away from Recurrent Nets: replace RNN with CNN + Attention
Backup: Models & Results