Chair of Software Engineering for Business Information Systems (sebis), Faculty of Informatics, Technische Universität München, wwwmatthes.in.tum.de
Using Multitask Deep Learning for Question Answering
A use case on the InsuranceQA dataset
Iman Jundi, 09.11.2018, Master Thesis Final Presentation
Advisor: M.Sc. Ahmed Elnaggar
Supervisor: Prof. Dr. Florian Matthes
© sebis181109 Jundi Multitask Deep Learning for Question Answering 2
● Introduction & Motivation
● Research Questions
● Approach
● Models & Results
● Analysis & Samples
● Conclusion
Outline
Introduction & Motivation
complex: ultimately pass the Turing Test
useful: many applications in many domains
Advances in recent years, e.g. IBM's Watson winning Jeopardy against all-time champions
Introduction & Motivation | Research Questions | Approach | Models & Results | Analysis & Samples | Conclusion
Using Multitask Deep Learning for Question Answering
Question → System → Answer
Text Retrieval
Answer Selection
Reading Comprehension
Natural Language Generation
Involves varying tasks in IR and NLP
Text Retrieval
Introduction & Motivation
many layers → deep
train with a huge amount of data
learn complex representations
hard for some domains or tasks, e.g. insurance
task
related task
Benefit from related tasks with more data
No feature engineering
varying architecture
architecture engineering per task
Share layers between tasks
better/generic representation
Using Multitask Deep Learning for Question Answering
Question → System → Answer
Answer Selection
Reading Comprehension
Natural Language Generation
promote generic architectures
Go-to tool in recent years
Use Case: Answer Selection - InsuranceQA Dataset [Feng15]
Question
instant response
fewer resources
17,000 questions + 27,500 answers
http://www.insurancelibrary.com
[Feng15] Feng, M., Xiang, B., Glass, M.R., Wang, L. and Zhou, B., 2015. Applying deep learning to answer selection: A study and an open task. In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp. 813-820. IEEE.
Expert Answers
Question + Pool of Answers → System → Answer
Similar Question
Select the correct answer to a question from an answer pool
InsuranceQA Dataset:
refer
client company
Related Task: Reading Comprehension - SQuAD Dataset [Rajpurkar16]
paragraph 100,000+ Q&A
Rajpurkar, P., Zhang, J., Lopyrev, K. and Liang, P., 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (pp. 2383-2392).
question
Paragraph + Question → System → Answer Span
500+ Wikipedia articles
Selected from the paragraph by crowd workers
vs. InsuranceQA 17,000 Questions
Answer comprehension questions on a paragraph (similar to humans)
● related task → multitask works
● active research → benefit from
● bigger dataset: ~6x questions
● clean dataset: manually created
Stanford Question Answering Dataset (SQuAD):
SQuAD?
Answer Selection | Reading Comprehension | Answer Selection + Reading Comprehension
● Which Deep Neural Network model can be used for Multitask Learning of both Reading Comprehension and Answer Selection?
Research Questions
Question + Pool of Answers → Model → Answer
Paragraph + Question → Model → Answer Span
? One Model for both:
Paragraph + Question → Answer Span
Question + Pool of Answers → Answer
model
performance with single-task training vs. performance with multitask training
● How well does this network perform on the Answer Selection task on the InsuranceQA dataset when trained only on this task?
● Would Answer Selection performance improve when the model is trained jointly on both this task and Reading Comprehension?
Research Approach
Question + Pool of Answers → Model → Answer
Paragraph + Question → Model → Answer Span
? One Model for both:
Paragraph + Question → Answer Span
Question + Pool of Answers → Answer
model
performance with single-task training vs. performance with multitask training
idea
implementation
evaluation. Performance Metrics: InsuranceQA: Accuracy & Top-5 Accuracy; SQuAD: F1 & EM (Exact Match) on words
related work or observations from previous iterations
online implementations of related work
improve
analyze
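The four metrics above can be sketched in a few lines of Python (an illustrative sketch, not the thesis code; the function names and the whitespace tokenization are my simplifying assumptions, the official SQuAD script also normalizes punctuation and articles):

```python
from collections import Counter

def exact_match(prediction, ground_truths):
    # EM: 1.0 if the prediction matches any reference exactly after tokenization
    return float(any(prediction.split() == gt.split() for gt in ground_truths))

def f1_on_words(prediction, ground_truth):
    # F1 over the word overlap between prediction and one reference
    pred_tokens = prediction.split()
    gt_tokens = ground_truth.split()
    common = Counter(pred_tokens) & Counter(gt_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)

def top_k_accuracy(ranked_answer_ids, correct_id, k=5):
    # InsuranceQA-style: correct answer anywhere in the top-k of the ranked pool
    return float(correct_id in ranked_answer_ids[:k])
```

With k=1 the last function reduces to plain accuracy, which is why the slides report both Accuracy and Top-5 Accuracy from the same ranking.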
Answer Selection + Reading Comprehension
Multitask Deep Learning Approach
Model:
Paragraph + Question → Answer Span
Question + Pool of Answers → Answer
RC layers AS layers
shared layers
multitask training: iterate between tasks per batch
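The per-batch alternation between tasks can be sketched as follows (illustrative Python, not the thesis code; `interleave_batches` and the cycling schedule are my assumptions, other schedules such as proportional sampling are equally possible):

```python
from itertools import cycle, islice

def interleave_batches(as_batches, rc_batches):
    """Yield (task_name, batch) pairs, alternating between Answer Selection (AS)
    and Reading Comprehension (RC) one batch at a time. The shorter list of
    batches is cycled so the larger dataset is consumed in full."""
    longer = max(len(as_batches), len(rc_batches))
    as_stream = islice(cycle(as_batches), longer)
    rc_stream = islice(cycle(rc_batches), longer)
    for as_batch, rc_batch in zip(as_stream, rc_stream):
        yield ("AS", as_batch)   # update shared + AS layers on this step
        yield ("RC", rc_batch)   # update shared + RC layers on this step
```

Each yielded step would run one forward/backward pass with the corresponding task-specific output layer, so the shared layers see gradients from both tasks in strict alternation.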
Approach & Problem Definition of the Tasks
Paragraph + Question → start probability, end probability
Question + Answer → probability
Question + Answer → similarity
encode
compare
train to make sim(Q,A+) > sim(Q,A-) by a margin (hinge margin loss)
encode
binary (+/-) classification problem (cross entropy loss)
classification problem over the paragraph words (cross entropy loss)
Answer Selection Reading Comprehension
Question + Pool of Answers → Model → Answer
Paragraph + Question → Model → Answer Span
Paragraph + Question → Model → Answer Span
predict score for each answer; predict word index for start, end
encode
Multitask Learningdoesn’t work
compatiblelearning objective
Word → Vector
vector representation of words in paragraph
based on question
common approach in related work
similarity function, e.g. cosine
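The cosine similarity and the hinge margin objective described on this slide might look like this in code (a minimal pure-Python sketch, not the thesis implementation; in practice Q, A+ and A- would be dense encoder outputs rather than hand-written vectors):

```python
import math

def cosine_similarity(u, v):
    # sim(u, v) = u·v / (|u| |v|)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def hinge_margin_loss(q, a_pos, a_neg, margin=0.2):
    # Zero loss once sim(Q, A+) exceeds sim(Q, A-) by at least the margin,
    # so training pushes correct answers above incorrect ones in the ranking.
    return max(0.0, margin - cosine_similarity(q, a_pos) + cosine_similarity(q, a_neg))
```

At inference time the same similarity score is used directly to rank the answer pool.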
Models
Also variations:
- additional separate bi-GRU
- CNN in output [Tan15]
Multitask Learning bad on InsuranceQA
[Tan15] Tan, M., Santos, C.D., Xiang, B. and Zhou, B., 2015. LSTM-based deep learning models for non-factoid answer selection. arXiv preprint arXiv:1511.04108.
[Wang17] Wang, W., Yang, N., Wei, F., Chang, B. and Zhou, M., 2017. Gated self-matching networks for reading comprehension and question answering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics.
[Wang16] Wang, S. and Jiang, J., 2016. A compare-aggregate model for matching text sequences. arXiv preprint arXiv:1611.01747.
[Yu18] Yu, A.W., Dohan, D., Le, Q., Luong, T., Zhao, R. and Chen, K., 2018. Fast and accurate reading comprehension by combining self-attention and convolution. In International Conference on Learning Representations.
more advanced models
Shared bi-RNN (AS similarity) | Shared bi-RNN (AS probability) | mtlRNet | mtlQANet
Compatible learning objective → Multitask learning starts working
adapt 2 SQuAD models: R-Net [Wang17] and QANet [Yu18]
Multitask Learning QANet [Yu18] - mtlQANet
InsuranceQA Accuracy
SQuAD F1
Yu, A.W., Dohan, D., Le, Q., Luong, T., Zhao, R. and Chen, K., 2018. Fast and accurate reading comprehension by combining self-attention and convolution. In International Conference on Learning Representations.
more relevant → more weight
word → vector (pretrained word embeddings)
word representation with information from context
representation of words in (answer/paragraph) based on the question
QANet [Yu18]: RC on SQuAD, adapted for AS on InsuranceQA
task-specific output layer
classificationprobability output
further encoding
better with multitask learning
Final Results
Focus on optimizing similarity function
Shift the focus to a more generic solution
similarity implicit in attention
* mtlRNet and mtlQANet with Multitask Training reach 93%, 93.3% top-5 accuracy overall
mtlQANet – Multitask Learning Ablation Study
InsuranceQA Accuracy
more sharing → better performance
few AS task-specific parameters (400)
InsuranceQA Prediction Samples – correct top answer
focused attention on only the relevant words
SQuAD pinpoints answer?
single-task
multitask
Question: how do I drop my Health Insurance
Ground Truth (predicted rank=1): how drop your health insurance in every case it involve write notice of your desire cancel your coverage on an individual basis you will write a note the insurance company customer service department and send it there for a business you will write a memo your hr cancel or wave coverage please note : if marry you may be ask include your spouse signature too..
a2q attention visualization: top prediction = ground truth
multitask
InsuranceQA Prediction Samples – incorrect top answer
attention on “qualify” for criteria words
single-task
Question: how do I qualify for government Health Insurance
Ground Truth, predicted rank=8 (single/multi-task): depending on where you live your age , income and other factor - you may qualify for Medicaid , MRMIB , Medicare , etc. the 2 big enchaladas of healthcare reform guarantee insurability and subsidy come into effect 1/1/14 ( open enrollment be 10/1/13 3/31/14 so...
top prediction (single/multi-task): when you say government health insuranc if you be refer to Healthcare Reform or Obamacare the rate depend on many factor such as age who be on the policy , zip code which plan you buy , etc. there will be subsidy as well so it depend if you be eligible for a subsidy in reality the actual rate..
more words in commonthan in ground truth
a2q attention visualization: top prediction
multitask
InsuranceQA Prediction Samples – incorrect top answer
single-task
Question: how do I qualify for government Health Insurance
Ground Truth, predicted rank=8 (single/multi-task): depending on where you live your age , income and other factor - you may qualify for Medicaid , MRMIB , Medicare , etc. the 2 big enchaladas of healthcare reform guarantee insurability and subsidy come into effect 1/1/14 ( open enrollment be 10/1/13 3/31/14 so...
top prediction (single/multi-task): when you say government health insuranc if you be refer to Healthcare Reform or Obamacare the rate depend on many factor such as age who be on the policy , zip code which plan you buy , etc. there will be subsidy as well so it depend if you be eligible for a subsidy in reality the actual rate..
a2q attention visualization: ground truth
same effect for ground truth
not perfect
multitask
InsuranceQA Prediction Samples – incorrect top answer (only single-task)
single-task
Question: can you buy auto insurance out of state
Ground Truth, predicted rank=2 (single-task), predicted rank=1 (multitask): you must purchase your auto insurance policy in the state in which the vehicle be register due to the regulatory framework of the insurance industry every state have regulatory agency that approve rate and coverage for company operate within that state if you own multiple vehicle in multiple state you will need buy separate policy within each state..
top prediction (single-task): liability coverage on auto be typically require in all state limit be usually in the $20,000 per person $40,000 per occurrence range with additional property damage benefit of course some state have low requirement and other have high requirement the cost can vary depending on your age , your driving record , your zip code and the type of vehicle you drive monthly premium can range from $35..
a2q attention visualization: ground truth
focused attention as before
Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi’s Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the ”golden anniversary” with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as ”Super Bowl L”), so that the logo could prominently feature the Arabic numerals 50.
● What year did the Denver Broncos secure a Super Bowl title for the third time?
ground truth: 2015, 2016, 2015
prediction: 2016
em=1.00 f1=1.00
● What does AFC stand for?
ground truth: American Football Conference, American Football Conference, American Football Conference
prediction: American Football Conference
em=1.00 f1=1.00
● Which NFL team represented the NFC at Super Bowl 50?
ground truth: Carolina Panthers, Carolina Panthers, Carolina Panthers
prediction: Denver Broncos
em=0.00 f1=0.00
● What color was used to emphasize the 50th anniversary of the Super Bowl?
ground truth: gold, gold, gold
prediction: golden
em=0.00 f1=0.00
SQuAD Prediction Samples
Answer Selection | Reading Comprehension | Answer Selection + Reading Comprehension
Conclusion
Question + Answers → Model → Answer
Paragraph + Question → Model → Answer Span
? One Model for both:
Paragraph + Question → Answer Span
Question + Answers → Answer
model
performance with single-task training vs. performance with multitask training
adapt 2 SQuAD models with minimal adjustment → improves
● Which Deep Neural Network model can be used for Multitask Learning of both Reading Comprehension and Answer Selection?
● How well does this network perform on the Answer Selection task on the InsuranceQA dataset when trained only on this task?
● Would Answer Selection performance improve when the model is trained jointly on both this task and Reading Comprehension?
more generic models perform well
good
almost fully-shared
only task-specific output layer
compatible learning objective
competitive results
more generic; no AS-specific optimization
SQuAD 2.0 [Rajpurkar18]
utilize optimized similarity functions
but more generic, e.g. add to the combined representation
+ questions without answer; AS: A- ↔ RC: no answer?
other (more) tasks & multitask models
e.g. NLP Decathlon, MQAN [McCann18]
[Rajpurkar18] Rajpurkar, P., Jia, R. and Liang, P., 2018. Know What You Don't Know: Unanswerable Questions for SQuAD. arXiv preprint arXiv:1806.03822.
[McCann18] McCann, B., Keskar, N.S., Xiong, C. and Socher, R., 2018. The natural language decathlon: Multitask learning as question answering. arXiv preprint arXiv:1806.08730.
[Devlin18] Devlin, J., Chang, M.W., Lee, K. and Toutanova, K., 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805.
? pre-trained language representation model, e.g. BERT [Devlin18]: Multitask Learning still useful?
QA
Shared bi-RNN Model
Similarity measure
Classification
probability of each word being the start or end of the answer
Model: Q, A+ → S(Q,A+); Q, A- → S(Q,A-)
Training with Margin Loss → more similarity with A+
SQuAD F1
InsuranceQA Accuracy
Variations:
- additional separate bi-GRU
- CNN in output [Tan15]
Multitask learning still bad on InsuranceQA
Used as score
Not compatible with RC approach?
Tan, M., Santos, C.D., Xiang, B. and Zhou, B., 2015. LSTM-based deep learning models for non-factoid answer selection. arXiv preprint arXiv:1511.04108.
Backup: Models & Results
Shared bi-RNN Model
InsuranceQA Accuracy
binary classification
Probability output trained with cross entropy loss
similar to SQuAD
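The cross-entropy training of this probability output can be sketched as follows (illustrative, not the thesis code; `binary_cross_entropy` and the clamping epsilon are my own naming and a common numerical-stability convention):

```python
import math

def binary_cross_entropy(p, label):
    # label is 1 for a correct (Q, A) pair and 0 for an incorrect one;
    # p is the model's predicted probability that the pair is correct.
    eps = 1e-12  # clamp p away from 0 and 1 to avoid log(0)
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))
```

Because the start/end prediction in Reading Comprehension is also trained with cross entropy, this formulation gives both tasks the compatible learning objective the earlier slides call for.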
More advanced models
Backup: Models & Results
Multitask Learning R-Net [Wang17] - mtlRNet
InsuranceQA Accuracy
SQuAD F1
Wang, W., Yang, N., Wei, F., Chang, B. and Zhou, M., 2017. Gated self-matching networks for reading comprehension and question answering. In Proceedings of the 55th Annual Meeting of the Association for Computational LinguisticsWang, S. and Jiang, J., 2016. A compare-aggregate model for matching text sequences. arXiv preprint arXiv:1611.01747.
Answer to Question Attention
Self-Attention
Aggregate CNN [Wang16]
Backup: Models & Results
Multitask Learning QANet [Yu18] - mtlQANet
InsuranceQA Accuracy
SQuAD F1
Yu, A.W., Dohan, D., Le, Q., Luong, T., Zhao, R. and Chen, K., 2018. Fast and accurate reading comprehension by combining self-attention and convolution. In International Conference on Learning Representations.
Encoder (CNN + Att) [Yu18]
Move away from Recurrent Nets: replace RNN with CNN + Attention
Backup: Models & Results