CHAPTER 6
DOCUMENT SUMMARIZATION BASED ON SENTENCE
RANKING USING VECTOR SPACE MODEL
The WWW is a repository of a large collection of information available in the form of
unstructured documents. Selecting the documents of interest from such a huge document
pool is a challenging task. To speed up the process of document retrieval, text
summarization techniques are used. Documents are ranked based on the summary or
abstract provided by their authors, but this is not always possible, as not all documents
come with an abstract or summary. Also, when different summarization tools are used to
summarize a document, not all the topics covered within the document are reflected in its
summary. In this chapter, a method to automate the process of text document
summarization is proposed, based on the term frequency within the document at two
levels: paragraph and sentence. To summarize the document, the similarity between
paragraphs, and between sentences within a paragraph, is computed using the Vector
Space Model. Evaluation of the proposed system on the standard reference corpus from
DUC-2002 using the ROUGE package indicates avg. Recall, avg. Precision and avg.
F-measure comparable to existing summarization tools (Copernic, SweSum, Extractor,
MSWord AutoSummarizer, Intelligent, Brevity, Pertinence), taking the DUC-2002
(100 words) human summary as the baseline summary.
6.1 Introduction
Since the sixties, large numbers of scientific papers and books have been stored digitally.
At first, however, the storage media needed for such large databases were very expensive,
so the concept of automatically shortening texts was introduced to store information about
papers and books in limited storage space. Now, due to advances in technology, storage
media are no longer expensive, and bulk information fits easily into today's large
databases. But with the increased use of the Internet and the large amount of information
available on the web, there is a need to represent each document by its summary, to save
the time and effort of searching for the right information. Automatic document
summarization is extremely helpful in tackling such information overload problems. It is
the technique of identifying the most important pieces of information in a document,
omitting irrelevant information and minimizing details, to generate a compact, coherent
summary document.
There are different types of summarization approaches [11], [31], [37], depending on what
the summarization method focuses on to make the summary of the text.
i. Abstract vs. Extract summary - Abstraction is the process of paraphrasing sections
of the source document, whereas extraction is the process of picking a subset of
sentences from the source document and presenting them to the user as a summary
that provides an overall sense of the document's content.
ii. Generic vs. Query-based summary - A generic summary does not target any
particular group; it addresses a broad community of readers. Query- or
topic-focused summaries are tailored to the specific needs of an individual or a
particular group and represent a particular topic.
iii. Single vs. Multi-document summary - A single-document summary provides the
user with the most relevant information contained in a single document, helping the
user decide whether the document is related to the topic of interest. A
multi-document summary helps to identify redundancy across documents and
computes the summary of a set of related documents of a corpus such that it covers
the major details of the events in the documents, taking into account some of the
major issues [61]: the need to carefully eliminate redundant information from
multiple documents and achieve high compression ratios; information about
document and passage similarities, and weighting different passages accordingly;
the importance of temporal information; and co-reference among entities and facts
occurring across documents. Kumar et al. in [82] studied a risk minimization
framework for sentence extraction to produce generic multi-document summaries.
Automatic text summarization approaches [41] are also classified as:
i. Vector based approach - The summary generated for each document consists of
sentences extracted from it using the Vector Space Model (VSM). After the
preprocessing step, each text element (a sentence, in the case of text
summarization) is considered as an N-dimensional vector [68]. The sentences are
then ranked using the VSM according to their similarity within the document.
ii. Fuzzy based approach - All the rules needed for summarization are included in the
knowledge base of the fuzzy system [17]. Different characteristics of the text, such
as sentence length, location in the paragraph, similarity to keywords, etc., are given
as input to the fuzzy system. A value from zero to one is obtained as output for
each sentence, based on the sentence characteristics and the rules available in the
knowledge base. The value obtained in the output determines the degree of
importance of the sentence in the final summary.
iii. Genetic algorithm based approach - In a Genetic Algorithm, the solutions are called
individuals or chromosomes. After the initial population is generated randomly,
selection and variation functions are executed in a loop until some termination
criterion is reached. Each run of the loop is called a generation. The selection
operator is intended to improve the average quality of the population by giving
individuals of higher quality a higher probability of being copied into the next
generation. The quality of an individual is measured by a fitness function. Fattah
and Ren [48] proposed an automatic text summarizer using several feature score
functions, such as sentence position, positive and negative keywords, sentence
centrality, etc., to train genetic algorithm and mathematical regression models to
obtain a suitable combination of feature weights.
iv. Neural Network based approach - A neural network is trained on a corpus of
documents. The neural network is then modified, through feature fusion, to
produce a summary of highly ranked sentences in the document. Through feature
fusion, the network discovers the importance (and unimportance) of the various
features used to determine the summary-worthiness of each sentence [76]. The
input to the neural network can be either real or binary vectors.
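Since the proposed system builds on the vector-based approach, its core idea can be sketched as follows. This is an illustrative toy, not the chapter's exact implementation: it builds plain term-frequency vectors over whitespace tokens and ranks sentences by cosine similarity to the whole document's vector.

```python
import math
from collections import Counter

def cosine(u, v):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def rank_sentences(sentences):
    # Each sentence becomes an N-dimensional term-frequency vector;
    # sentences are ranked by similarity to the whole-document vector.
    vectors = [Counter(s.lower().split()) for s in sentences]
    doc_vector = sum(vectors, Counter())
    scores = [cosine(v, doc_vector) for v in vectors]
    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)

sentences = ["the storm moved west",
             "residents were alerted",
             "the storm brought heavy rain"]
order = rank_sentences(sentences)  # indices, most similar first
```

Here the sentence sharing the fewest frequent terms with the rest of the document ("residents were alerted") ranks last; an extract summary would keep the top-ranked indices.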
Based on the summarization approaches discussed above and summarization techniques
discussed in section 2.4, different automation tools for summarization have been
developed to generate fixed length or variable size summary of text documents. Some of
the automated summarization tools producing fixed length (100 words), generic, extract-
based single text document summary are briefly discussed below:
i. Brevity Text Summarizer – Brevity1 works by comparing a document to a set of
similar documents. It stores this document information in a Summary Dictionary.
Several dictionaries, designed for general categories of documents, are included
with Brevity. For example, to summarize a newsfeed of political news, it compares
the text to other political news stories of the same type.
ii. Copernic Summarizer - Copernic summarization2 technologies implement a wide
range of heuristics to isolate sentences, bulleted lists, and special strings such as
e-mail addresses and scientific formulas. In addition, they tokenize each and every
word according to its context in order to identify actions, people, places and
things. The set of concepts associated with the document's main topic forms the
core information extracted by intelligent summarization. The more a sentence
exhibits pertinent concepts, the more it is suited to developing important ideas, and
consequently, the more likely it is to be retained for inclusion in the summary.
iii. Extractor Text Summarizer – Extractor3 is a software text summarization engine. It
can work on documents of different types (text, HTML, email) and, using a
patented genetic extraction algorithm (GenEx), analyzes the recurrence of words
and phrases, their proximity to one another, and the uniqueness of the words to a
particular document. The engine returns a list of key words and phrases found in
the document, together with their relative ranking (how many times the
word/phrase was found in the document), along with contextual links back to the
position of the key word/phrase in the document itself.
1[http://www.lextek.com/manuals/brevity/functions.html]
2[http://www.copernic.com/data/pdf/summarization-whitepapereng.pdf]
3[http://www.componentsource.com/products/dbi-extractor/summary.html]
iv. Intelligent Text Summarizer – It generates two summaries [17]. An initial summary
is generated by a fuzzy swarm module and given as input to a swarm diversity
module, which uses the input sentences as initial centroids for a clustering process;
the final summary is generated after scoring these input sentences, filtering out
similar sentences and selecting the most diverse ones.
v. MSWord AutoSummarizer - The AutoSummary tool in Microsoft Office Word
2007 analyzes a document to identify keywords and then assigns a score to each
word. Sentences containing the most frequent words in the document, and hence
the highest scores, are then selected for inclusion in the summary [55].
vi. SweSum Text Summarizer – SweSum4 is an automatic text summarizer based on
statistical, linguistic and heuristic methods, in which the summarization system
calculates how often certain key words (the Swedish system has 700,000 possible
Swedish entries pointing at 40,000 Swedish base key words) appear in the
document. The key words belong to the so-called open-class words. The
summarization system calculates the frequency of key words in the text, which
sentences they are present in, and where these sentences are in the text. It also
considers whether the text is tagged with a bold-text tag, a first-paragraph tag or
numerical values. All this information is compiled and used to summarize the
original text. SweSum is available for Swedish, Danish, Norwegian, English,
Spanish, French, Italian, Greek, Farsi (Persian) and German texts.
vii. Pertinence Summarizer - Pertinence Summarizer5 performs linguistic processing
over a document and evaluates the pertinence (the relevance) of its sentences. The
process takes into account not only general and/or specialized linguistic markers,
depending on the nature of the document analyzed, but also the user's keywords
and, optionally, terminological bases, to enhance the relevance of the selected
sentences.
4[http://people.dsv.su.se/~hercules/textsammanfattningeng.html]
5[http://www.pertinence.net/produits_en.html]
After reviewing the different summarization approaches and automation tools, a new
generic, extract-based, single-document summarization approach is proposed in section
6.2, based on statistical heuristics using the Vector Space Model.
6.2 Proposed Method
[Figure: flattened architecture diagram. A text document first passes through
Preprocessing (special character elimination, stopword removal against a stopword list,
stemming via the Porter Stemming Algorithm, tokenizer). The Summarizer then performs
Restructuring/Reorganizing (construct sentence, paragraph and document term vectors;
score sentences; order sentences) followed by Synthesizing (select sentences according to
the % summary required), producing the final Summary.]
Figure 6.1: Proposed System Architecture
The proposed automatic summarization process has three phases:
a) Analyzing the source text (Preprocessing)
In the first phase, preprocessing of the text document is done to obtain a structured
representation of the original text. The preprocessing step includes:
i. Stop-word elimination – common words that carry no semantics and do not
contribute relevant information to the task (e.g., "this", "is") are eliminated.
ii. Case folding - all the characters are converted to the same letter case, i.e., either
upper case or lower case.
iii. Stemming – all syntactically similar words, such as plurals, verbal variations,
etc., are reduced to their stems.
After the preprocessing step, each text element (a sentence) is considered as an
N-dimensional vector.
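A minimal sketch of this preprocessing phase is shown below. The stop-word set is a tiny illustrative subset, and crude_stem is a simplified suffix-stripping stand-in for the Porter stemming algorithm the system actually uses:

```python
STOPWORDS = {"this", "is", "a", "an", "the", "of", "and", "to", "in", "are"}

def crude_stem(word):
    # Simplified suffix stripping; the real system uses the Porter algorithm.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(sentence):
    # Case folding, punctuation/special-character removal, stop-word
    # elimination, and stemming, in that order.
    tokens = [w.strip(".,;:!?\"'()") for w in sentence.lower().split()]
    return [crude_stem(w) for w in tokens if w and w not in STOPWORDS]

terms = preprocess("The storms are moving westward.")
```

The resulting term lists are what get turned into the sentence, paragraph and document vectors in the next phase.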
b) Determining the salient features (Restructuring/Reorganizing)
Sentences in the document are ranked according to their relevance to the document and
to the paragraph containing the corresponding sentence. Steps to rank the sentences in
the document include:
i. Compute the sentence index term vector, <ti,fi>
• Compute the frequency of occurrence (fi) of each term (ti) appearing in
the sentence
ii. Compute the paragraph index term vector, <ti,csi>
• Select highest frequency occurrence index term(s) from each sentence
of the paragraph to be included in paragraph index term vector
• Compute frequency of occurrence (csi) of each selected term (ti) as
equal to sum of number of member sentences of the paragraph
containing term ti in their index term vector
iii. If the number of index terms shared between any two paragraphs is greater than
or equal to the smaller of their paragraph index term vector sizes, then merge
the two paragraphs and re-compute the merged paragraph's index term vector,
keeping the index terms with an occurrence frequency greater than one
iv. Compute the document index term vector, <ti,cpi>
• Select index term(s) (ti) having highest frequency occurrence (csi) in
each paragraph of the document
• Compute frequency of occurrence (cpi) of each selected term (ti) as
equal to sum of number of paragraphs containing (ti) as index term in
its index term vector
v. Repeat until the required number of sentences/words is included in the final
summary
• Obtain the next highest unique value of term frequency occurrence
(cpx) from the document index term vector
• Select term(s) {ti}T from the document index term vector having
frequency occurrence greater than or equal to (cpx)
• The new score of each sentence in the document is computed as
(6.1)
where,
Sk is the score of the kth sentence in the document
T is the total number of selected terms
cpi is the frequency of occurrence of term ti in the document index term vector
A is an adjustment factor defined as:
(6.2)
vi. Arrange the sentences according to their rank score in decreasing order
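The construction of the sentence, paragraph and document index term vectors (steps i, ii and iv above) can be sketched as follows. Because Eq. 6.1 and the adjustment factor A of Eq. 6.2 appear only as images in the source, the scoring here is simplified to the sum of the document-level frequencies cpi of the terms occurring in a sentence; treat it as an illustration of the data flow, not the exact formula (paragraph merging, step iii, is also omitted):

```python
from collections import Counter

def index_vectors(paragraphs):
    # paragraphs: list of paragraphs; each paragraph is a list of
    # preprocessed sentences, and each sentence is a list of terms.
    sentence_vecs = [[Counter(sent) for sent in para] for para in paragraphs]
    # Paragraph index term vector <ti, csi>: take the highest-frequency
    # term(s) of each member sentence; csi = number of member sentences
    # whose index term vector contains ti.
    para_vecs = []
    for para in sentence_vecs:
        selected = set()
        for sv in para:
            top = max(sv.values())
            selected.update(t for t, f in sv.items() if f == top)
        para_vecs.append({t: sum(1 for sv in para if t in sv) for t in selected})
    # Document index term vector <ti, cpi>: take the highest-csi term(s)
    # of each paragraph; cpi = number of paragraphs containing ti.
    doc_terms = set()
    for pv in para_vecs:
        top = max(pv.values())
        doc_terms.update(t for t, c in pv.items() if c == top)
    doc_vec = {t: sum(1 for pv in para_vecs if t in pv) for t in doc_terms}
    return sentence_vecs, para_vecs, doc_vec

def score_sentences(paragraphs, doc_vec):
    # Simplified stand-in for Eq. 6.1: sum of cpi over the distinct
    # terms present in the sentence (no adjustment factor A).
    return [[sum(doc_vec.get(t, 0) for t in set(sent)) for sent in para]
            for para in paragraphs]

paragraphs = [[["storm", "wind", "storm"], ["wind", "rain"]],
              [["storm", "coast"]]]
_, para_vecs, doc_vec = index_vectors(paragraphs)
scores = score_sentences(paragraphs, doc_vec)
```

On this toy input, "storm" ends up with the highest document-level weight (it is an index term of both paragraphs), so sentences containing it score highest.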
c) Synthesizing an appropriate output (Filtering)
Sentences with higher score are included in the final summary in the same order of their
occurrence as in the original text document to retain their semantic meaning.
Sentences are included in the final summary based on the following rules:
Rule 1- Sentences are selected to be included in the summary according to their highest to
lowest rank score value.
Rule 2- If more than one sentence in the same paragraph has the same rank score, then the
sentence appearing earlier in the paragraph is given preference over the sentence appearing
later, to generate a fixed-length summary.
Rule 3- Sentences having same rank score are selected based on their relative order of
occurrence within the original document.
Rule 4- If two or more sentences from the same paragraph have non-zero sentence scores
and also share some index terms between their index term vectors, then the rank score of
each sentence appearing later in the paragraph is modified by subtracting from its initial
sentence score the sum of the products of the shared terms and their frequencies of
occurrence in the document index term vector.
Rule 5- If sentences of similar paragraphs show non-zero rank scores, then consider the
similar paragraphs as a single merged paragraph, and select sentence(s) from this merged
paragraph following Rules 1, 2 and 4.
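Rules 1-3 above (highest rank score first, earlier position in case of ties, final output restored to document order) can be sketched as below; the score adjustments of Rules 4 and 5 are omitted for brevity:

```python
def select_sentences(sentences, scores, k):
    # Rule 1: pick sentences from highest to lowest rank score.
    # Rules 2-3: on equal scores, the sentence occurring earlier wins,
    # since Python's sort is stable and we break ties on index i.
    order = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    # Selected sentences are emitted in their original document order
    # to retain semantic coherence.
    return [sentences[i] for i in sorted(order[:k])]

summary = select_sentences(["s1", "s2", "s3", "s4"], [2, 5, 5, 1], 2)
```

Here "s2" and "s3" tie at the top score, the earlier one is kept first, and both appear in the summary in document order.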
The tasks discussed above are implemented on the DUC-2002 dataset to generate an
extracted summary of 100 words. The intermediate steps of the algorithm applied to a
DUC-2002 document are explained in section 6.3.2. Comparative results of the summary
generated by the proposed summarizer against existing extract-based summarization tools
are discussed in section 6.3.
6.3 Experiments
6.3.1 Experimental setup
The new proposed single document summarization system was evaluated on the standard
reference corpus from DUC-2002. The DUC-2002 corpus included a single-document
summarization task, in which 13 systems participated. 2002 is the last version of DUC that
included single-document summarization evaluation of informative summaries. The DUC-
2002 corpus used for the task contains 567 documents from different sources; 10 assessors
were used to provide two 100-word human summaries for each document. We refer to
these two model summaries as H1 and H2. The human summary H2 is used as the
benchmark to measure the quality of our method's summary, while the human summary
H1 is used as the reference summary. In addition to the results of the 13 participating
systems, the
DUC organizers also distributed baseline summaries (the first 100 words of a document).
The coverage of all the summaries was assessed by humans [124].
Automatic evaluation measures are used to assess the performance of the automatic
summarization tools and the quality of the generated summary. There are different
approaches to evaluate the overall quality of a summarization system [74], [100], [117]. In
general, there are two categories of evaluation: intrinsic and extrinsic. In intrinsic
approaches, the quality of the summarization is evaluated based on analysis of the content
of a summary. In extrinsic approaches, the quality of the summary is measured based on
task-based setting, determining their usefulness as part of an information browsing and
access interface.
We used ROUGE (Recall-Oriented Understudy for Gisting Evaluation) toolkit for
comparing our system to other single document, extract-based summarization tools –
Copernic, SweSum, Extractor, MSWord AutoSummary tool, Intelligent, Brevity,
Pertinence, taking the DUC-2002 (100 word) human summary as the baseline summary.
ROUGE 1.5.5 is an intrinsic summarization evaluation toolkit. It is used to calculate the
degree to which a tested summary (TEST) overlaps a model summary (MODEL). In
ROUGE, a human reference summary is taken as the MODEL, and a peer summary
generated by machine or human is taken as the TEST. The ROUGE evaluation measure6
(version 1.5.5) generates three scores for each summary: avg. Recall, avg. Precision and
avg. F-measure. These measures help to quantify how closely the system's extract
corresponds to the human's [9], [89].
ROUGE is the main metric in the DUC text summarization evaluations. Certain ROUGE
configurations have been shown to correlate well with DUC coverage [100]. The measure
is computed by counting the number of overlapping words between the computer-
generated summary to be evaluated and the ideal summaries created by humans. ROUGE
has different variants. In our experiment, we use ROUGE-N and ROUGE-L to compute
avg. Recall, avg. Precision and avg. F-measure. ROUGE-N is an n-gram measure between
a candidate summary and a set of reference summaries.
6[http://www.isi.edu/~cyl/ROUGE]
The reason for selecting this measure is that ROUGE-N works well for single-document
summarization. N takes values from 1 to 8: ROUGE-1 is unigram-based, ROUGE-2 is
bigram-based, and so on. Unigram recall reflects the proportion of words in X (a reference
summary sentence) that are also present in Y (a candidate summary sentence), while
unigram precision is the proportion of words in Y that are also in X. ROUGE-L is defined
as the longest common subsequence (LCS) with maximum length in the two given
sequences X and Y. Unigram recall and precision count all co-occurring words regardless
of their order, while ROUGE-L counts only in-sequence co-occurrences.
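For intuition, the unigram-overlap and LCS computations described above can be sketched as a toy re-implementation (the actual experiments use the ROUGE-1.5.5 toolkit itself, with stemming and stop-word removal applied beforehand):

```python
from collections import Counter

def rouge_1(reference, candidate):
    # Unigram overlap: recall = overlap / |reference words|,
    # precision = overlap / |candidate words|.
    ref, cand = Counter(reference.split()), Counter(candidate.split())
    overlap = sum((ref & cand).values())  # clipped co-occurrence counts
    return overlap / sum(ref.values()), overlap / sum(cand.values())

def lcs_len(x, y):
    # Longest common subsequence length over token sequences (ROUGE-L core):
    # counts only in-sequence co-occurrences, unlike unigram overlap.
    x, y = x.split(), y.split()
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            dp[i + 1][j + 1] = dp[i][j] + 1 if xi == yj else max(dp[i][j + 1],
                                                                 dp[i + 1][j])
    return dp[len(x)][len(y)]

r, p = rouge_1("the storm moved west", "the storm hit the coast")
l = lcs_len("the storm moved west", "the storm hit the coast")
```

On this pair, two of the four reference words co-occur (recall 0.5), two of the five candidate words do (precision 0.4), and the in-order common subsequence is "the storm" (length 2).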
For data preprocessing, ROUGE-1.5.5's input-format can be SEE, SPL and ISI or
SIMPLE. We used SPL input-format for evaluating the generated summaries.
When running ROUGE, the following evaluation setup is used:
i. Both model and system summaries are stemmed using Porter Stemmer before
computing various statistics.
ii. Stop words are removed in model and system summaries before computing various
statistics.
iii. ROUGE-N (N=1 to 8) and ROUGE-L are computed.
iv. All systems are evaluated at 95% confidence interval.
v. Model average scoring formula is used to compute Recall, Precision and F-
measure since there were 2 model summaries for each document set.
vi. Average ROUGE is computed by averaging sentence (unit) ROUGE scores.
ROUGE is freely downloadable for research purposes at:
http://www.isi.edu/~cyl/ROUGE.
6.3.2 Experimental Results
ROUGE-N (N=1 to 8) and ROUGE-L values obtained by applying the different
summarization tools to the DUC-2002 dataset, for analyzing avg. Recall, avg. Precision
and avg. F-measure, are shown below.
Table 6.1: Comparison of different Summarization tools: average recall using ROUGE-1 to 8 at the 95%-confidence interval
ROUGE-1 ROUGE-2 ROUGE-3 ROUGE-4 ROUGE-5 ROUGE-6 ROUGE-7 ROUGE-8
Proposed 0.40160 0.22365 0.13257 0.07335 0.04269 0.02871 0.01582 0.00630
SweSum 0.33847 0.15899 0.08302 0.04148 0.02120 0.01210 0.00426 0.00000
Copernic 0.41421 0.20637 0.11124 0.06474 0.03732 0.02357 0.01093 0.00169
Extractor 0.38345 0.19179 0.10402 0.06227 0.03732 0.02357 0.01093 0.00169
Brevity 0.33814 0.15650 0.08279 0.04148 0.02120 0.01210 0.00426 0.00000
Intelligent 0.33975 0.15260 0.07810 0.04326 0.02398 0.01400 0.00696 0.00169
MSWord 0.45635 0.23939 0.13475 0.07495 0.04339 0.02908 0.01416 0.00632
Pertinence 0.25456 0.09172 0.03768 0.01631 0.01110 0.00567 0.00000 0.00000
Figure 6.2: Comparative chart for Recall scores obtained by different summarization tools
[Bar chart: Recall measure (y-axis, 0 to 0.45) per text summarizer (x-axis), for ROUGE-1 to ROUGE-8.]
Table 6.2: List different Summarization tools in decreasing order of average recall measures for ROUGE-1 to 8 at the 95%-confidence interval
ROUGE-1 ROUGE-2 ROUGE-3 ROUGE-4 ROUGE-5 ROUGE-6 ROUGE-7 ROUGE-8
MSWord MSWord MSWord MSWord MSWord MSWord Proposed MSWord
Copernic Proposed Proposed Proposed Proposed Proposed MSWord Proposed
Proposed Copernic Copernic Copernic Copernic Copernic Copernic Copernic
Extractor Extractor Extractor Extractor Extractor Extractor Extractor Extractor
Intelligent SweSum SweSum Intelligent Intelligent Intelligent Intelligent Intelligent
SweSum Brevity Brevity Brevity Brevity Brevity Brevity Brevity
Brevity Intelligent Intelligent SweSum SweSum SweSum SweSum Pertinence
Pertinence Pertinence Pertinence Pertinence Pertinence Pertinence Pertinence SweSum
Table 6.3: Comparison of different Summarization tools: average Precision using ROUGE-1 to 8 at the 95%-confidence interval
ROUGE-1 ROUGE-2 ROUGE-3 ROUGE-4 ROUGE-5 ROUGE-6 ROUGE-7 ROUGE-8
Proposed 0.41799 0.23141 0.13640 0.07420 0.04195 0.02803 0.01523 0.00577
SweSum 0.37573 0.17809 0.09396 0.04705 0.02362 0.01354 0.00492 0.00000
Copernic 0.43132 0.21493 0.11613 0.06771 0.03895 0.02481 0.01181 0.00187
Extractor 0.43079 0.21794 0.11798 0.06995 0.03810 0.02420 0.01145 0.00183
Brevity 0.37427 0.17515 0.09501 0.04991 0.02692 0.01602 0.00643 0.00000
Intelligent 0.35308 0.15650 0.08021 0.04461 0.02451 0.01433 0.00721 0.00173
MSWord 0.39189 0.20389 0.11441 0.06397 0.03736 0.02502 0.01224 0.00555
Pertinence 0.25054 0.08747 0.03586 0.01627 0.01107 0.00565 0.00000 0.00000
Figure 6.3: Comparative chart for Precision scores obtained by different summarization tools
[Bar chart: Precision measure (y-axis, 0 to 1) per text summarizer (x-axis), for ROUGE-1 to ROUGE-8.]
Table 6.4: List different Summarization tools in decreasing order of average Precision measures for ROUGE-1 to 8 at the 95%-confidence interval
ROUGE-1 ROUGE-2 ROUGE-3 ROUGE-4 ROUGE-5 ROUGE-6 ROUGE-7 ROUGE-8
Copernic Proposed Proposed Proposed Proposed Proposed Proposed Proposed
Extractor Extractor Extractor Extractor Copernic MSWord MSWord MSWord
Proposed Copernic Copernic Copernic Extractor Copernic Copernic Copernic
MSWord MSWord MSWord MSWord MSWord Extractor Extractor Extractor
SweSum SweSum Brevity Brevity Brevity Brevity Intelligent Intelligent
Brevity Brevity SweSum SweSum Intelligent Intelligent Brevity Brevity
Intelligent Intelligent Intelligent Intelligent SweSum SweSum SweSum Pertinence
Pertinence Pertinence Pertinence Pertinence Pertinence Pertinence Pertinence SweSum
Table 6.5: Comparison of different Summarization tools: average F-measure using ROUGE-1 to 8 at the 95%-confidence interval
ROUGE-1 ROUGE-2 ROUGE-3 ROUGE-4 ROUGE-5 ROUGE-6 ROUGE-7 ROUGE-8
Proposed 0.40837 0.22676 0.13402 0.07353 0.04220 0.02828 0.01547 0.00602
SweSum 0.35477 0.16729 0.08775 0.04386 0.02223 0.01269 0.00452 0.00000
Copernic 0.42191 0.21017 0.11339 0.06604 0.03803 0.02412 0.01134 0.00178
Extractor 0.39893 0.19968 0.10803 0.06440 0.03766 0.02385 0.01118 0.00176
Brevity 0.35343 0.16431 0.08782 0.04485 0.02341 0.01359 0.00509 0.00000
Intelligent 0.34560 0.15435 0.07908 0.04389 0.02423 0.01416 0.00709 0.00171
MSWord 0.42099 0.21985 0.12354 0.06892 0.04011 0.02687 0.01312 0.00591
Pertinence 0.25131 0.08923 0.03665 0.01627 0.01107 0.00565 0.00000 0.00000
Figure 6.4: Comparative chart for F-measure scores obtained by different summarization tools
[Bar chart: F-measure (y-axis, 0 to 1) per text summarizer (x-axis), for ROUGE-1 to ROUGE-8.]
Table 6.6: List different Summarization tools in decreasing order of average F-measure for ROUGE-1 to 8 at the 95%-confidence interval
ROUGE-1 ROUGE-2 ROUGE-3 ROUGE-4 ROUGE-5 ROUGE-6 ROUGE-7 ROUGE-8
Copernic Proposed Proposed Proposed Proposed Proposed Proposed Proposed
MSWord MSWord MSWord MSWord MSWord MSWord MSWord MSWord
Proposed Copernic Copernic Copernic Copernic Copernic Copernic Copernic
Extractor Extractor Extractor Extractor Extractor Extractor Extractor Extractor
SweSum SweSum Brevity Brevity Intelligent Intelligent Intelligent Intelligent
Brevity Brevity SweSum Intelligent Brevity Brevity Brevity Brevity
Intelligent Intelligent Intelligent SweSum SweSum SweSum SweSum Pertinence
Pertinence Pertinence Pertinence Pertinence Pertinence Pertinence Pertinence SweSum
Table 6.7: Comparison of different Summarization tools: ROUGE-L at the 95%-confidence interval

              Avg Recall   Avg Precision   Avg F-measure
Proposed      0.38617      0.40184         0.39264
SweSum        0.32148      0.35763         0.33728
Copernic      0.39835      0.41507         0.40588
Extractor     0.36729      0.41377         0.38240
Brevity       0.32116      0.35531         0.33564
Intelligent   0.32422      0.33785         0.33024
MSWord        0.42567      0.36599         0.39295
Pertinence    0.23355      0.22809         0.22966

According to the Recall results obtained for the different ROUGE-N values, given in
Tables 6.1 and 6.2, MSWord AutoSummarizer outperforms all the other summarization
tools, including our proposed method, for all ROUGE-N (N=1 to 8) values except
ROUGE-7. Our proposed summarizer shows the highest avg. Recall measure for
ROUGE-7 when compared to all the other summarizer tools. For ROUGE-2 to ROUGE-6
and ROUGE-8, our text summarizer shows the next highest avg. Recall value after
MSWord AutoSummarizer. As shown in figure 6.2, the difference in Recall values
between our proposed summarizer and MSWord AutoSummarizer narrows considerably
(differences appear only at the third decimal place) for ROUGE-2 to ROUGE-8. This
slight variation may occur because the length of the generated summary is not exactly
100 words but varies between 95 and 112 words, depending on the length of the last
sentence included in the final summary. Also, the human summaries given in DUC-2002
and those produced by MSWord AutoSummarizer are not purely extractive and sometimes
merge more than one sentence from the original document into a single summary
sentence, while the summary generated by our proposed summarizer contains original
sentences from the given input text document.
It is observed that our proposed method shows the highest avg. Precision value for
ROUGE-2 to ROUGE-8, as given in tables 6.3 and 6.4. Similarly, analyzing the avg.
F-measure results shown in tables 6.5 and 6.6, the proposed method outperforms all the
other summarization tools for ROUGE-2 to ROUGE-8. MSWord AutoSummarizer shows
a lower avg. Precision value than our proposed method even for ROUGE-1. The
comparative results of avg. Precision and avg. F-measure obtained for the different
summarizer tools are shown in figures 6.3 and 6.4 respectively. From table 6.7, the
Copernic summarizer shows the highest ROUGE-L values for avg. Precision and avg.
F-measure, and MSWord AutoSummarizer shows the highest ROUGE-L value for avg.
Recall, when compared to the other summarization tools. Our proposed method shows
slightly lower ROUGE-L values (the difference is less than 0.1) compared to Copernic and
MSWord AutoSummarizer.
To summarize, the Copernic Summarizer is the highest-scored summarization tool among
all the discussed automation tools in terms of ROUGE-L for all three measures. Similarly,
MSWord AutoSummarizer outperforms all the summarization tools in terms of ROUGE-N
for avg. Recall. Our proposed method shows the best results for avg. Precision and avg.
F-measure compared to the other listed summarization tools. Also, the proposed method
shows ROUGE-L values comparable to the Copernic summarizer and avg. Recall values
comparable to MSWord AutoSummarizer.
To further evaluate the quality of the summary obtained by our proposed text summarizer,
we compared the summaries obtained by MSWord (which shows the best Recall measure)
and by our text summarizer on 30 documents related to different topics (computer, mobile,
festivals, etc.). On analyzing the summaries, it is found that our text summarizer includes
more meaningful information.
6.3.3 Discussion
A. A sample document from DUC-2002, d061j (AP880911-0016), and its 100-word
summaries obtained by the different summarizing tools (our proposed summarizing
model, the DUC-2002 human summaries (H1, H2), and the Copernic, SweSum,
Extractor, MSWord AutoSummary, Intelligent, Brevity and Pertinence text
summarizers) are given below for reference. The bold and italic text in each summary
shows the lines of the respective text summarizer that match the human-generated
summary given in DUC-2002.
Hurricane Gilbert swept toward the Dominican Republic Sunday, and the Civil
Defense alerted its heavily populated south coast to prepare for high winds, heavy
rains and high seas. The storm was approaching from the southeast with sustained
winds of 75 mph gusting to 92 mph.
``There is no need for alarm,'' Civil Defense Director Eugenio Cabral said in a
television alert shortly before midnight Saturday.
Cabral said residents of the province of Barahona should closely follow Gilbert's
movement. An estimated 100,000 people live in the province, including 70,000 in
the city of Barahona, about 125 miles west of Santo Domingo.
Tropical Storm Gilbert formed in the eastern Caribbean and strengthened into a
hurricane Saturday night. The National Hurricane Center in Miami reported its
position at 2 a.m. Sunday at latitude 16.1 north, longitude 67.5 west, about 140 miles
south of Ponce, Puerto Rico, and 200 miles southeast of Santo Domingo.
The National Weather Service in San Juan, Puerto Rico, said Gilbert was moving
westward at 15 mph with a ``broad area of cloudiness and heavy weather'' rotating
around the center of the storm.
The weather service issued a flash flood watch for Puerto Rico and the Virgin
Islands until at least 6 p.m. Sunday.
Strong winds associated with the Gilbert brought coastal flooding, strong southeast
winds and up to 12 feet feet to Puerto Rico's south coast. There were no reports of
casualties.
San Juan, on the north coast, had heavy rains and gusts Saturday, but they subsided
during the night.
On Saturday, Hurricane Florence was downgraded to a tropical storm and its
remnants pushed inland from the U.S. Gulf Coast. Residents returned home, happy
to find little damage from 80 mph winds and sheets of rain.
Florence, the sixth named storm of the 1988 Atlantic storm season, was the second
hurricane. The first, Debby, reached minimal hurricane strength briefly before
hitting the Mexican coast last month.
Figure 6.5: DUC-2002 d061j (AP880911-0016) Original Text Document
Tropical Storm Gilbert in the eastern Caribbean strengthened into a hurricane
Saturday night.
The National Hurricane Center in Miami reported its position at 2 a.m. Sunday to be
about 140 miles south of Puerto Rico and 200 miles southeast of Santo Domingo.
It is moving westward at 15 mph with a broad area of cloudiness and heavy weather
with sustained winds of 75 mph gusting to 92 mph.
The Dominican Republic's Civil Defense alerted that country's heavily populated
south coast and the National Weather Service in San Juan, Puerto Rico issued a flood
watch for Puerto Rico and the Virgin Islands until at least 6 p.m. Sunday.
Figure 6.6: d061j (AP880911-0016) Human Summary (H1)
Hurricane Gilbert is moving toward the Dominican Republic, where the residents
of the south coast, especially the Barahona Province, have been alerted to prepare
for heavy rains, and high winds and seas.
Tropical Storm Gilbert formed in the eastern Caribbean and became a hurricane on
Saturday night.
By 2 a.m. Sunday it was about 200 miles southeast of Santo Domingo and moving
westward at 15 mph with winds of 75 mph.
Flooding is expected in Puerto Rico and the Virgin Islands.
The second hurricane of the season, Florence, is now over the southern United
States and downgraded to a tropical storm.
Figure 6.7: d061j (AP880911-0016) Human Summary (H2)
** In the original document, shadowed and underlined text marked the words matching
between the H1 and H2 summaries. The two human-generated summaries have very few
lines of the original document in common.
Hurricane Gilbert swept toward the Dominican Republic Sunday, and the Civil
Defense alerted its heavily populated south coast to prepare for high winds, heavy
rains and high seas .
The storm was approaching from the southeast with sustained winds of 75 mph
gusting to 92 mph .
``There is no need for alarm,'' Civil Defense Director Eugenio Cabral said in a television
alert shortly before midnight Saturday. Cabral said residents of the province of
Barahona should closely follow Gilbert's movement. An estimated 100,000 people live
in the province, including 70,000 in the city of Barahona, about 125 miles west of
Santo Domingo.
Figure 6.8: d061j (AP880911-0016) Summary obtained from Brevity text Summarizer
Hurricane Gilbert swept toward the Dominican Republic Sunday, and the Civil
Defense alerted its heavily populated south coast to prepare for high winds, heavy
rains and high seas.
The storm was approaching from the southeast with sustained winds of 75 mph
gusting to 92 mph.
Tropical Storm Gilbert formed in the eastern Caribbean and strengthened into a
hurricane Saturday night.
The National Weather Service in San Juan, Puerto Rico, said Gilbert was moving
westward at 15 mph with a ``broad area of cloudiness and heavy weather'' rotating
around the center of the storm.
Figure 6.9: d061j (AP880911-0016) Summary obtained from Copernic text Summarizer
Hurricane Gilbert swept toward the Dominican Republic Sunday, and the Civil
Defense alerted its heavily populated south coast to prepare for high winds, heavy
rains and high seas.
The storm was approaching from the southeast with sustained winds of 75 mph
gusting to 92 mph.
Tropical Storm Gilbert formed in the eastern Caribbean and strengthened into a
hurricane Saturday night.
The National Weather Service in San Juan, Puerto Rico, said Gilbert was moving
westward at 15 mph with a ``broad area of cloudiness and heavy weather'' rotating
around the center of the storm.
Figure 6.10: d061j (AP880911-0016) Summary obtained from Extractor text Summarizer
Hurricane Gilbert swept toward the Dominican Republic Sunday, and the Civil
Defense alerted its heavily populated south coast to prepare for high winds, heavy
rains and high seas.
The storm was approaching from the southeast with sustained winds of 75 mph
gusting to 92 mph.
Tropical Storm Gilbert formed in the eastern Caribbean and strengthened into a
hurricane Saturday night.
The National Hurricane Center in Miami reported its position at 2 a.m. Sunday at
latitude 16.1 north, longitude 67.5 west, about 140 miles south of Ponce, Puerto
Rico, and 200 miles southeast of Santo Domingo.
The National Weather Service in San Juan, Puerto Rico, said Gilbert was moving
westward at 15 mph with a ``broad area of cloudiness and heavy weather'' rotating
around the center of the storm.
Figure 6.11: d061j (AP880911-0016) Summary obtained from Intelligent text Summarizer
Hurricane Gilbert swept toward the Dominican Republic Sunday, and the Civil
Defense alerted its heavily populated south coast to prepare for high winds, heavy
rains and high seas.
The storm was approaching from the southeast with sustained winds of 75 mph
gusting to 92 mph.
Tropical Storm Gilbert formed in the eastern Caribbean and strengthened into a
hurricane Saturday night. Strong winds associated with the Gilbert brought coastal
flooding, strong southeast winds and up to 12 feet feet to Puerto Rico's south coast.
Florence, the sixth named storm of the 1988 Atlantic storm season, was the second
hurricane.
Figure 6.12: d061j (AP880911-0016) Summary obtained from MSWord AutoSummarizer
Hurricane Gilbert swept toward the Dominican Republic Sunday, and the Civil
Defense alerted its heavily populated south coast to prepare for high winds, heavy
rains and high seas.
The storm was approaching from the southeast with sustained winds of 75 mph
gusting to 92 mph.
``There is no need for alarm,'' Civil Defense Director Eugenio Cabral said in a
television alert shortly before midnight Saturday.
Cabral said residents of the province of Barahona should closely follow Gilbert's
movement.
An estimated 100,000 people live in the province, including 70,000 in the city of
Barahona, about 125 miles west of Santo Domingo.
Figure 6.13: d061j (AP880911-0016) Summary obtained from SweSum text Summarizer
``There is no need for alarm,'' Civil Defense Director Eugenio Cabral said in a
television alert shortly before midnight Saturday. Cabral said residents of the province
of Barahona should closely follow Gilbert's movement.
An estimated 100,000 people live in the province, including 70,000 in the city of
Barahona, about 125 miles west of Santo Domingo. Tropical Storm Gilbert formed in
the eastern Caribbean and strengthened into a hurricane Saturday night.
On Saturday, Hurricane Florence was downgraded to a tropical storm and its remnants
pushed inland from the U.S. Gulf Coast. Residents returned home, happy to find little
damage from 80 mph winds and sheets of rain.
Figure 6.14: d061j (AP880911-0016) Summary obtained from Pertinence text Summarizer
Hurricane Gilbert swept toward the Dominican Republic Sunday, and the Civil
Defense alerted its heavily populated south coast to prepare for high winds, heavy
rains and high seas.
The storm was approaching from the southeast with sustained winds of 75 mph
gusting to 92 mph.
Tropical Storm Gilbert formed in the eastern Caribbean and strengthened into a
hurricane Saturday night.
The National Weather Service in San Juan, Puerto Rico, said Gilbert was moving
westward at 15 mph with a ``broad area of cloudiness and heavy weather'' rotating
around the center of the storm.
On Saturday, Hurricane Florence was downgraded to a tropical storm and its
remnants pushed inland from the U.S. Gulf Coast.
Florence, the sixth named storm of the 1988 Atlantic storm season, was the
second hurricane.
Figure 6.15: d061j (AP880911-0016) Summary obtained from our Proposed Text Summarizer
** Comparing the summaries obtained from the different summarization tools and the two
human-generated summaries H1 and H2 (matches shown in bold and italics), it is observed
that sentences 1, 2 and 8 appear with the maximum frequency, among all sixteen sentences
of the given document, in the above listed summaries. This shows that sentences 1, 2 and 8
are the most relevant sentences of the given document and should be included in the final
summary. The summary generated by our proposed summarizer includes all three
sentences 1, 2 and 8, as shown in figure 6.15, while sentence 8 is missing from the
summary obtained from MSWord AutoSummarizer.
To generate the extracted summary of 100 words, our proposed text summarizer computes
the document, paragraph and sentence index term vectors shown below:
Document Name: d061j-AP880911-0016
Document Index term vector - < {storm, hurrican, coast}3, {wind, weather, tropic, mph,
heavi}2 > ** The subscripted integer represents the term frequency of the
corresponding term in the index term vector.
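The construction of such an index term vector can be sketched as follows. This is a minimal Python illustration: the tokenizer, stopword list and suffix-stripping stemmer are simplified stand-ins (the actual system presumably uses a full stemmer such as Porter's), so the exact stems may differ slightly from those shown in the tables.

```python
from collections import Counter
import re

# Illustrative stopword list only; a real system would use a full list.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "was", "with", "for", "at", "its"}

def stem(word):
    # Stand-in for a real stemmer (e.g. Porter); crude suffix stripping only.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def index_term_vector(text):
    """Return an index term vector: {stemmed term: frequency}, stopwords removed."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(stem(t) for t in tokens if t not in STOPWORDS)

sentence = ("The storm was approaching from the southeast with sustained "
            "winds of 75 mph gusting to 92 mph.")
vec = index_term_vector(sentence)
print(vec["mph"])  # 2 -- 'mph' occurs twice, mirroring < {mph}2, ... > in Table 6.8
```

The same function applies unchanged at the sentence, paragraph and document levels; only the text span passed in differs.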
Table 6.8: Sentence Index term vector for document d061j-AP880911-0016

| Para. No. | Sent. No. | Sentence Index Term Vector | Sentence Score | 1st Iteration Modified Score | No. of Words | Included in Summary (Y/N) |
|---|---|---|---|---|---|---|
| 1 | 1 | < {heavi}2, {wind, swept, south, sea, republ, rain, prepar, popul, hurrican, gilbert, dominican, defens, coast, civil, alert}1 > | 6 | 6 | 28 | Y |
| 2 | 2 | < {mph}2, {wind, sustain, storm, southeast, gust, approach}1 > | 3 | 3 | 17 | Y |
| 3 | 3 | < {televis, shortli, midnight, eugenio, director, defens, civil, cabral, alert, alarm}1 > | 0 | 0 | - | N |
| 4 | 4 | < {resid, provinc, movement, gilbert, follow, close, cabral, barahona}1 > | 0 | 0 | - | N |
| 4 | 5 | < {west, santo, provinc, peopl, mile, live, includ, estim, domingo, citi, barahona}1 > | 0 | 0 | - | N |
| 5 | 6 | < {tropic, strengthen, storm, night, hurrican, gilbert, form, eastern, caribbean}1 > | 6 | 6 | 15 | Y |
| 5 | 7 | < {mile}2, {west, southeast, south, santo, rico, report, puerto, posit, ponc, north, nation, miami, longitud, latitud, hurrican, domingo, center}1 > | 3 | 2 | - | N |
| 6 | 8 | < {weather}2, {westward, storm, servic, san, rotat, rico, puerto, nation, mph, move, juan, heavi, gilbert, cloudi, center}1 > | 3 | 3 | 38 | Y |
| 7 | 9 | < {weather, watch, virgin, servic, rico, puerto, issu, island, flood, flash}1 > | 0 | 0 | - | N |
| 8 | 10 | < {wind, strong, feet}2, {southeast, south, rico, puerto, gilbert, flood, coastal, coast, brought, associ}1 > | 3 | 3 | 23 | N |
| 8 | 11 | < {report, casualti}1 > | 0 | 0 | - | N |
| 9 | 12 | < {subsid, san, rain, north, night, juan, heavi, gust, coast}1 > | 3 | 2 | 18 | N |
| 10 | 13 | < {tropic, storm, remnant, push, inland, hurrican, gulf, florenc, downgrad, coast}1 > | 9 | 9 | 20 | Y |
| 10 | 14 | < {wind, sheet, return, resid, rain, mph, littl, home, happi, damag}1 > | 0 | 0 | - | N |
| 11 | 15 | < {storm}2, {sixth, season, name, hurrican, florenc, atlant}1 > | 9 | 9 | 15 | Y |
| 11 | 16 | < {strength, reach, month, minim, mexican, hurrican, hit, debbi, coast, briefli}1 > | 6 | 5 | 15 | N |
Table 6.9: Paragraph Index term vector for document d061j-AP880911-0016

| Para. No. | Sentences per Para. | Paragraph Index Term Vector | Similar to Para. No. |
|---|---|---|---|
| 1 | 1 | < {heavi}1 > | 9 |
| 2 | 1 | < {mph}1 > | - |
| 3 | 1 | < {alarm, alert, cabral, civil, defens, director, eugenio, midnight, shortli, televis}1 > | - |
| 4 | 2 | < {barahona, provinc}2, {west, santo, resid, peopl, movement, mile, live, includ, gilbert, follow, estim, domingo, close, citi, cabral}1 > | - |
| 5 | 2 | < {caribbean, eastern, form, gilbert, hurrican, mile, night, storm, strengthen, tropic}1 > | - |
| 6 | 1 | < {weather}1 > | - |
| 7 | 1 | < {flash, flood, island, issu, puerto, rico, servic, virgin, watch, weather}1 > | - |
| 8 | 2 | < {casualti, feet, report, strong, wind}1 > | - |
| 9 | 1 | < {coast, gust, heavi, juan, night, north, rain, san, subsid}1 > | 1 |
| 10 | 2 | < {coast, damag, downgrad, florenc, gulf, happi, home, hurrican, inland, littl, mph, push, rain, remnant, resid, return, sheet, storm, tropic, wind}1 > | - |
| 11 | 2 | < {briefli, coast, debbi, hit, hurrican, mexican, minim, month, reach, storm, strength}1 > | - |
Document d061j-AP880911-0016 contains 11 paragraphs, 16 sentences and 317 words in
total. To generate a summary of 100 words, the proposed algorithm computes the
document, paragraph and sentence index term vectors. The score of each sentence is
computed using equations (6.1 and 6.2), and the selection of non-zero-score sentences
for the final summary follows Rules 1 to 5 listed in section 6.2. Paragraph nos. 1 and 9
are merged, as they are found to be similar paragraphs, as shown in table 6.9. The
sentence scores are then modified based on Rule 4.
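The paragraph-similarity test that leads to merging paragraphs, such as 1 and 9, which share the term "heavi", can be sketched with the standard Vector Space Model cosine measure over the paragraph index term vectors. The similarity threshold that triggers a merge is not restated in this excerpt, so this is an illustrative sketch only.

```python
import math

def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two term-frequency vectors given as dicts."""
    dot = sum(freq * vec_b.get(term, 0) for term, freq in vec_a.items())
    norm_a = math.sqrt(sum(f * f for f in vec_a.values()))
    norm_b = math.sqrt(sum(f * f for f in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Paragraph index term vectors of paragraphs 1 and 9 of d061j (Table 6.9).
p1 = {"heavi": 1}
p9 = {"coast": 1, "gust": 1, "heavi": 1, "juan": 1, "night": 1,
      "north": 1, "rain": 1, "san": 1, "subsid": 1}
sim = cosine_similarity(p1, p9)
print(round(sim, 3))  # 0.333 -- non-zero overlap on the shared term 'heavi'
```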
From the sentence rank scores shown in table 6.8, sentence nos. 13 and 15 show the
highest rank score value of 9, since all three highest-frequency terms of the
document index term vector <storm, hurrican, coast> match their sentence index term
vectors.
Using equations (6.1 and 6.2), the final score of sentence no. 13 is computed as shown below:
Score(sentence 13) = cp_storm + cp_hurrican + cp_coast = 3 + 3 + 3 = 9
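Assuming the sentence score sums the document-vector frequencies of the top document terms that occur in the sentence, which is consistent with the value 9 computed above, the scoring step can be sketched as follows; the function name and the list of top terms are illustrative.

```python
def sentence_score(sentence_vec, doc_vec, top_terms):
    """Score a sentence by summing the document-vector frequencies of the
    highest-frequency document terms that appear in the sentence vector."""
    return sum(doc_vec[t] for t in top_terms if t in sentence_vec)

# Document index term vector of d061j (frequencies from the text above).
doc_vec = {"storm": 3, "hurrican": 3, "coast": 3,
           "wind": 2, "weather": 2, "tropic": 2, "mph": 2, "heavi": 2}
top_terms = ["storm", "hurrican", "coast"]

# Sentence 13 of d061j contains 'storm', 'hurrican' and 'coast' (Table 6.8).
s13 = {"tropic": 1, "storm": 1, "remnant": 1, "push": 1, "inland": 1,
       "hurrican": 1, "gulf": 1, "florenc": 1, "downgrad": 1, "coast": 1}
print(sentence_score(s13, doc_vec, top_terms))  # 3 + 3 + 3 = 9
```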
The next highest rank score value, 6, is shown by sentence nos. 1, 6 and 16; the score of
sentence no. 16 is modified to 5 under Rule 4, so sentences 1 and 6 are selected for the
final summary. The total number of words in the selected sentences is then only 78
(20 + 15 + 28 + 15), which is less than the required summary length of 100 words, so
sentences with the next highest rank score value, i.e. 3, are selected following Rules 1-5.
Sentence no. 10, although its sentence score equals 3, is not included in the final
summary, following Rule 3. Sentence no. 7, belonging to paragraph 5, initially shows a
rank score value of 3, but the value is modified following Rule 4, since it shares one term
(hurrican) with sentence 6 of the same paragraph. The modified score of sentence no. 7 is
computed as shown below:
Score(sentence 7) = 3 - cp_hurrican = 3 - 3 = 0
Sentences with a sentence score of 0 are rejected from the document summary.
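The selection procedure described above can be sketched as a greedy loop: sentences are taken in descending score order until the summary reaches the target length, and zero-score sentences are always rejected. Eligibility under Rules 1-5 is abstracted into a boolean flag here, since the rules themselves are defined in section 6.2 and not restated; the flag values below are assumptions taken from the reported final summary.

```python
def select_sentences(sentences, word_limit=100):
    """Greedily add non-zero-score sentences in descending score order until
    the summary reaches the target length. `sentences` is a list of
    (sent_no, score, n_words, eligible); `eligible` stands in for Rules 1-5."""
    chosen, total = [], 0
    for sent_no, score, n_words, eligible in sorted(sentences, key=lambda s: -s[1]):
        if score == 0 or total >= word_limit:
            break  # zero-score sentences rejected; stop once length is reached
        if eligible:
            chosen.append(sent_no)
            total += n_words
    return sorted(chosen), total

# Modified scores and word counts from Table 6.8; `eligible` marks sentences
# not ruled out by Rules 1-5 (assumed here from the reported final summary).
table = [(1, 6, 28, True), (2, 3, 17, True), (6, 6, 15, True),
         (8, 3, 38, True), (10, 3, 23, False), (13, 9, 20, True),
         (15, 9, 15, True), (16, 5, 15, False)]
print(select_sentences(table))  # ([1, 2, 6, 8, 13, 15], 133)
```

The output matches the sentences of the proposed summarizer's summary in figure 6.15; the total exceeds 100 words because sentences are added until the budget is first reached.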
B. Document-A (Topic: Plastic Bags)
Document-A on the topic "Plastic Bags" is shown below, along with its summaries
obtained through our summarizing model and MSWord AutoSummarizer.
Every once in a while the government here passes out an order banning shop keepers
from providing plastic bags to customers for carrying their purchases, with little lasting
effect. Plastic bags are very popular with both retailers as well as consumers because
they are cheap, strong, lightweight, functional, as well as a hygienic means of carrying
food as well as other goods. Even though they are one of the modern conveniences that
we seem to be unable to do without, they are responsible for causing pollution, killing
wildlife, and using up the precious resources of the earth.
About a hundred billion plastic bags are used each year in the US alone. And then, when
one considers the huge economies and populations of India, China, Europe, and other
parts of the world, the numbers can be staggering. The problem is further exacerbated by
the developed countries shipping off their plastic waste to developing countries like
India.
Here are some of the harmful effects of plastic bags:
Plastic bags litter the landscape. Once they are used, most plastic bags go into landfill,
or rubbish tips. Each year more and more plastic bags are ending up littering the
environment. Once they become litter, plastic bags find their way into our waterways,
parks, beaches, and streets. And, if they are burned, they infuse the air with toxic fumes.
Plastic bags kill animals. About 100,000 animals such as dolphins, turtles, whales and
penguins are killed every year due to plastic bags. Many animals ingest plastic bags,
mistaking them for food, and therefore die. And worse, the ingested plastic bag remains
intact even after the death and decomposition of the animal. Thus, it lies around in the
landscape where another victim may ingest it.
Plastic bags are non-biodegradable. And one of the worst environmental effects of
plastic bags is that they are non-biodegradable. The decomposition of plastic bags takes
about 1000 years.
Petroleum is required to produce plastic bags. As it is, petroleum products are
diminishing and getting more expensive by the day, since we have been using this non-
renewable resource increasingly. Petroleum is vital for our modern way of life. It is
187
necessary for our energy requirements – for our factories, transport, heating, lighting,
and so on. Without viable alternative sources of energy yet on the horizon, if the supply
of petroleum were to be turned off, it would lead to practically the whole world grinding
to a halt. Surely, this precious resource should not be wasted on producing plastic bags,
should it?
So, What Can be Done about the Use of Plastic Bags?
Single-use plastic bags have become such a ubiquitous way of life that it seems as if we
simply cannot do without them. However, if we have the will, we can start reducing
their use in small ways. A tote bag can make a good substitute for holding the shopping.
You can keep the bag with the cashier, and then put your purchases into it instead of the
usual plastic bag. Recycling the plastic bags you already have is another good idea.
These can come into use for various purposes, like holding your garbage, instead of
purchasing new ones.
Figure 6.16: Original Document-A "Plastic Bags"
Here are some of the harmful effects of plastic bags:
Plastic bags litter the landscape. Once they are used, most plastic bags go into
landfill, or rubbish tips. Plastic bags kill animals. Many animals ingest plastic
bags, mistaking them for food, and therefore die. Plastic bags are non-
biodegradable. The decomposition of plastic bags takes about 1000 years.
Petroleum is required to produce plastic bags. Surely, this precious resource
should not be wasted on producing plastic bags, should it?
So, What Can be Done about the Use of Plastic Bags?
Recycling the plastic bags you already have is another good idea.
Figure 6.17: Document-A Summary Obtained from MSWord AutoSummarizer
Every once in a while the government here passes out an order banning shop keepers
from providing plastic bags to customers for carrying their purchases, with little
lasting effect.
About a hundred billion plastic bags are used each year in the US alone.
Here are some of the harmful effects of plastic bags:
Plastic bags litter the landscape.
Plastic bags kill animals.
Plastic bags are non-biodegradable.
Petroleum is required to produce plastic bags.
So, What Can be Done about the Use of Plastic Bags?
Single-use plastic bags have become such a ubiquitous way of life that it seems as if
we simply cannot do without them.
Figure 6.18: Document-A Summary Obtained from our Proposed text Summarizer
Consider the following two sentences in the original document shown in figure 6.16:
"Plastic bags kill animals."
"Many animals ingest plastic bags, mistaking them for food, and therefore die."
Both lines convey the same meaning, and therefore only one of the two should be included
in the summary. The summary generated by MSWord AutoSummarizer shown in figure
6.17 includes both lines, while our proposed text summarizer avoids such redundancy and
includes only the second line in the final summary, as shown in figure 6.18. This improves
the quality of the generated summary, as more information fits into the fixed-length
summary.
6.4 Conclusion of this chapter
In this chapter, a new generic, extract-based, single-document summarization approach
based on statistical heuristics using the Vector Space Model is proposed. The summary
clearly depicts the different topics covered in the text document and shows the linkage
between the sentences of the summary. The method is independent of the structure of the
text document and of the position of a sentence within the document: a sentence appearing
later in the document can be included in the summary according to its importance within
its paragraph. The method also reduces redundancy among the sentences and paragraphs,
and hence allows more information to be included in the fixed-length generated summary.
The proposed summarization approach is evaluated on the DUC-2002 corpus and shows
satisfactory results compared to all the reported summarization tools in terms of avg.
Recall, avg. Precision and avg. F-measure for different values of ROUGE-N (N = 1 to 8)
and ROUGE-L. Our proposed method shows the best results for avg. Precision and avg.
F-measure among the discussed summarization tools, ROUGE-L values comparable to the
Copernic summarizer, and avg. Recall values comparable to MSWord AutoSummarizer.
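For reference, the ROUGE-N recall underlying the reported evaluation can be sketched as a simple n-gram overlap. This is a simplified illustration with whitespace tokenization and a single reference; the actual ROUGE package additionally supports stemming, stopword removal and multiple reference summaries.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """ROUGE-N recall: fraction of reference n-grams matched by the candidate."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

# Illustrative sentences only, not actual DUC-2002 system output.
ref = "tropical storm gilbert formed in the eastern caribbean"
cand = "tropical storm gilbert strengthened in the caribbean"
print(round(rouge_n(cand, ref, 1), 3))  # 0.75 -- 6 of 8 reference unigrams matched
```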