SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents
TRANSCRIPT
SummaRuNNer
Ramesh Nallapati, Feifei Zhai, Bowen Zhou
Presented by :
Sharath T.S
Shubhangi Tandon
Contributions of this paper
● SummaRuNNer, a simple recurrent network based sequence classifier that outperforms or matches state-of-the-art models for extractive summarization
● The simple formulation of model facilitates interpretable visualization of its decisions
● A novel training mechanism that allows our extractive model to be trained end-to-end using abstractive summaries.
SummaRuNNer
● Treats extractive summarization as a sequence classification problem
● Each sentence is visited sequentially in the original document order
● A binary include/exclude decision is made for each sentence, taking previous decisions into account
● A GRU-based RNN is the basic building block of the sequence classifier
● The GRU is a recurrent unit with two gates, u (the update gate) and r (the reset gate)
Recurrent Neural Networks: LSTMs
● Input gate: decides what fraction of the new input flowing into the LSTM cell is written into the cell state.
LSTMs - Continued
● Forget gate: decides what fraction of the current cell state to discard before the new, input-gated information is added in.
LSTMs - Continued
● Output gate: evaluates the new cell state and decides what parts of the information have to be output.
Refer: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
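The three gates above can be sketched as a single LSTM step. This is a toy scalar version (the weight names in the `w` dict are illustrative stand-ins for learned parameters, not the paper's notation):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell(x, h_prev, c_prev, w):
    """One LSTM step with scalar input/state (toy sizes for illustration)."""
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])  # input gate: fraction of new candidate let in
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])  # forget gate: fraction of old cell state kept
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])  # output gate: fraction of cell state exposed
    c_tilde = math.tanh(w["wc"] * x + w["uc"] * h_prev + w["bc"])  # candidate cell content
    c = f * c_prev + i * c_tilde   # forget part of the old state, add gated new information
    h = o * math.tanh(c)           # output gate filters the squashed cell state
    return h, c
```

Note the separate `f` and `i` gates and the separate `(h, c)` pair; these are exactly the pieces the GRU later merges.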
GRUs
Modifications compared to LSTMs:
● The forget (f) and input (i) gates are combined into a single update gate.
● The cell state and hidden state are merged into one state.
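Both simplifications can be seen in one GRU step, following the paper's formulation where u is the update gate and r the reset gate (again a toy scalar version with illustrative weight names):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_cell(x, h_prev, w):
    """One GRU step with scalar input/state (toy sizes for illustration)."""
    u = sigmoid(w["wux"] * x + w["wuh"] * h_prev + w["bu"])  # update gate
    r = sigmoid(w["wrx"] * x + w["wrh"] * h_prev + w["br"])  # reset gate
    # candidate state: the reset gate decides how much of h_prev to consult
    h_cand = math.tanh(w["whx"] * x + w["whh"] * (r * h_prev) + w["bh"])
    # a single interpolation does the work of the LSTM's input + forget gates,
    # and the one state h replaces the LSTM's (h, c) pair
    return (1.0 - u) * h_cand + u * h_prev
```

When u saturates near 1 the old state is carried through unchanged; near 0 the state is overwritten by the candidate.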
The Model
SummaRuNNer
Model:
● Two-layer bi-directional GRU-RNN. The first layer runs at the word level and computes hidden state representations at each word position; a second word-level RNN runs backwards from the last word to the first.
● The second layer is a bi-directional RNN that runs at the sentence level and accepts the average-pooled, concatenated hidden states of the word-level RNNs as its input.
● Document representation: a non-linear transformation of the average of the concatenated sentence-level hidden states,
  d = \tanh\Big( W_d \frac{1}{N_d} \sum_{j=1}^{N_d} [h_j^f, h_j^b] + b \Big)
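The pooling and document-representation steps above can be sketched in a few lines. This is a pure-Python toy: the weight arguments `W_d` and `b_d` stand in for learned parameters, and vectors are plain lists:

```python
import math

def avg_pool_concat(fwd_states, bwd_states):
    """Sentence input for the sentence-level RNN: concatenate the word-level
    forward and backward hidden states at each word position, then
    average-pool over the words of the sentence."""
    n = len(fwd_states)
    concat = [f + b for f, b in zip(fwd_states, bwd_states)]  # list + list = concatenation
    dim = len(concat[0])
    return [sum(v[k] for v in concat) / n for k in range(dim)]

def document_rep(sent_states, W_d, b_d):
    """d = tanh(W_d * mean_j(h_j) + b_d) over sentence-level hidden states h_j."""
    n = len(sent_states)
    dim = len(sent_states[0])
    mean = [sum(h[k] for h in sent_states) / n for k in range(dim)]
    return [math.tanh(sum(W_d[i][k] * mean[k] for k in range(dim)) + b_d[i])
            for i in range(len(W_d))]
```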
Computing Posterior - Logistic loss

P(y_j = 1 \mid h_j, s_j, d) = \sigma\big( W_c h_j + h_j^{\top} W_s d - h_j^{\top} W_r \tanh(s_j) + w_{ap} p_j^{a} + w_{rp} p_j^{r} + b \big) \qquad (6)

The terms model, in order: content, salience with respect to the document d, (negative) redundancy with respect to the running summary s_j, and absolute and relative positional importance. Training minimizes the negative log-likelihood of the extractive labels y_j:

l(W, b) = -\sum_{d=1}^{N} \sum_{j=1}^{N_d} \Big[ y_j^d \log P(y_j^d = 1 \mid h_j^d, s_j^d, d_d) + (1 - y_j^d) \log\big(1 - P(y_j^d = 1 \mid h_j^d, s_j^d, d_d)\big) \Big] \qquad (7)
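The interpretable structure of the posterior can be made concrete as follows. Each argument is the already-computed scalar contribution of one term of Eq. 6 (the function name and signature are illustrative, not from the paper):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def extract_prob(content, salience, novelty, abs_pos, rel_pos, bias=0.0):
    """P(y_j = 1) from the scalar contributions of Eq. 6: content (W_c h_j),
    salience (h_j^T W_s d), novelty (h_j^T W_r tanh(s_j)), and the two
    positional terms."""
    # novelty enters with a negative sign: redundancy with the running
    # summary s_j lowers the probability of extracting the sentence
    return sigmoid(content + salience - novelty + abs_pos + rel_pos + bias)
```

Because each summand is a separate named quantity, the model's decision for a sentence can be decomposed and visualized term by term, which is what enables the interpretability claimed earlier.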
Extractive Summary labels - Greedy Algorithm
Why is it needed?
● Most summarization corpora only contain human-written abstractive summaries as ground truth, so extractive 0/1 labels must be generated.
● Algorithm
○ Greedily select the sentences from the document that maximize the ROUGE score with respect to the gold summary.
○ Stop when none of the remaining candidate sentences improves the ROUGE score when added.
● Train the network on the resulting labelled data.
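The greedy labelling loop can be sketched as below. For self-containment this toy uses unigram recall as a stand-in for ROUGE (the real algorithm scores candidates with ROUGE against the gold summary):

```python
def overlap_score(selected_sents, reference):
    """Toy stand-in for ROUGE: unigram recall of the reference summary
    against the currently selected sentences."""
    ref = reference.lower().split()
    sel = set(w for s in selected_sents for w in s.lower().split())
    return sum(1 for w in ref if w in sel) / len(ref)

def greedy_labels(doc_sents, reference):
    """Greedily add the sentence that most improves the score; stop when no
    remaining sentence gives an improvement. Returns 0/1 extractive labels."""
    selected, labels = [], [0] * len(doc_sents)
    best = 0.0
    while True:
        gains = [(overlap_score(selected + [s], reference), i)
                 for i, s in enumerate(doc_sents) if not labels[i]]
        if not gains:
            break
        score, i = max(gains)
        if score <= best:          # no candidate improves the score: stop
            break
        best, labels[i] = score, 1
        selected.append(doc_sents[i])
    return labels
```

The resulting binary labels are then used as the targets y_j in the logistic loss of Eq. 7.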
Abstractive training - Decoder
● Apart from the sigmoid classifier that computes the class of each sentence, the decoder additionally does the following:
○ Takes the embedding of the previously emitted word as input x_k, together with s_{-1}, the summary representation computed at the last sentence of the sentence-level RNN (Equation 7).
○ Computes a softmax over the vocabulary to output the most probable next word.
○ Is trained to optimize the log-likelihood of the word distribution of the abstractive summaries (with context captured by the RNN).
○ At test time, predicts using the weights W alone, without the decoder.
Decoder - Continued
How does it work?
● The summary representation s_{-1} acts as an information channel between the SummaRuNNer model and the decoder.
● Maximizing the probability of the abstractive summary words as computed by the decoder requires the model to learn a good summary representation, which in turn depends on accurate estimates of the extractive probabilities P(y_j).
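The channel itself is just a probability-weighted sum of sentence states, which is why gradients flowing from the decoder shape the extractive probabilities. A minimal sketch (plain-list vectors; the function name is illustrative):

```python
def summary_rep(hidden_states, probs):
    """Running summary representation: the sum of sentence-level hidden
    states h_i weighted by their extractive probabilities P(y_i = 1).
    Accumulated over all sentences this yields s_{-1}, the vector handed
    to the decoder."""
    dim = len(hidden_states[0])
    s = [0.0] * dim
    for h, p in zip(hidden_states, probs):
        s = [sk + p * hk for sk, hk in zip(s, h)]
    return s
```

A sentence with near-zero extractive probability contributes almost nothing to s_{-1}, so a decoder that needs that sentence's content pushes its probability up during training.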
SummaRuNNer Visualisation
Corpus used
● Daily Mail (Cheng & Lapata): 200k train, 12k validation, 10k test
● Daily Mail/CNN (Nallapati): 286k train, 13k validation, 11k test
● DUC 2002: 567 documents (out-of-domain testing)
● Average statistics
○ 28 sentences/doc
○ 3-4 sentences in the reference summary
○ 802 words/doc
● Training Data Constraints
○ Vocab size: 150k
○ Maximum sentences/doc: 100
○ Max sentence length: 50 words
○ Model hidden state: 200
○ Batch size: 64
Experiments and Results : Daily Mail Corpus
Experiments and Results : Daily Mail /CNN data
Experiments and Results : DUC 2002 data
Future Work
● Pre-train the extractive model using abstractive training.
● Construct a joint extractive-abstractive model where the predictions of the extractive component form stochastic intermediate units to be consumed by the abstractive component.