Intelligent Database Systems Lab
Presenter: JIAN-REN CHEN
Authors: Rafael Ferreira, Luciano De Souza Cabral, Rafael Dueire Lins,
Gabriel Pereira E Silva, Fred Freitas, George D.C. Cavalcanti, Rinaldo Lima,
Steven J. Simske, Luciano Favaro
Expert Systems with Applications (ESA), 2013
Assessing sentence scoring techniques for extractive text summarization
Outline
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
Motivation
Due to the huge volume of information on the Internet, it has
become infeasible to efficiently sift useful information from the
mass of available documents.
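Text summarization approaches:
- Extractive: selects the most relevant sentences of the original text
- Abstractive: generates new sentences that convey the content of the text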
Objectives
• Describe 15 sentence scoring methods (6 word-based, 6 sentence-based and 3 graph-based) and assess each of them for extractive text summarization.
Methodology – Word scoring
• Word frequency
• TF/IDF
• Upper case
• Proper noun
• Word co-occurrence
• Lexical similarity
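$Score(s) = \sum_{w \in s} Score(w)$ (each word receives a score; the sentence score is the sum of the scores of its words)
Word co-occurrence is computed over n-grams.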
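The word-scoring idea can be illustrated with a minimal Python sketch. It assumes that a sentence's score is the sum of its word scores, uses a crude regular-expression tokenizer, and computes IDF over the sentences of a single document (each sentence treated as a "document"); the function names are hypothetical and the paper's exact weighting may differ.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep alphabetic runs only (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def word_frequency_scores(sentences):
    """Score(s) = sum over words w in s of the frequency of w in the whole document."""
    tokens = [tokenize(s) for s in sentences]
    freq = Counter(w for toks in tokens for w in toks)
    return [sum(freq[w] for w in toks) for toks in tokens]

def tfidf_scores(sentences):
    """Score(s) = sum over words w in s of tf(w, s) * idf(w).

    Assumption: idf(w) = log(n / df(w)), where n is the number of sentences
    and df(w) is the number of sentences containing w.
    """
    tokens = [tokenize(s) for s in sentences]
    n = len(tokens)
    df = Counter(w for toks in tokens for w in set(toks))
    scores = []
    for toks in tokens:
        tf = Counter(toks)
        scores.append(sum(count * math.log(n / df[w]) for w, count in tf.items()))
    return scores
```

For example, with the sentences "the cat sat", "the cat ran" and "a dog barked", word_frequency_scores returns [5, 5, 3]: frequent words lift the first two sentences, while TF/IDF ranks the third sentence highest because all of its words are rare in the document.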
Methodology – Sentence scoring
• Cue-phrases
• Sentence inclusion of numerical data
• Sentence length
• Sentence position
• Sentence centrality
• Sentence resemblance to the title
Cue-phrase examples: "in summary", "in conclusion", "our investigation", "the best", "the most important", "according to the study", "significantly", "important", "in particular", "hardly", "impossible"
Score(s) =
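Sentence position:
$S_p(S_i) = \begin{cases} 1 & \text{if } S_i \text{ is among the first } N \text{ sentences} \\ 0 & \text{otherwise} \end{cases}$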
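Several of the sentence-level features can be sketched in Python as follows. The position score follows the slide's definition (1 for the first N sentences, 0 otherwise); the length, numerical-data, cue-phrase and title-resemblance scores are hypothetical simplified variants (raw word count, a binary digit check, a cue-phrase count, and Jaccard word overlap) chosen for illustration, not necessarily the paper's exact formulas.

```python
import re

# Cue phrases listed on the slide; matched case-insensitively on word boundaries.
CUE_PHRASES = [
    "in summary", "in conclusion", "our investigation", "the best",
    "the most important", "according to the study", "significantly",
    "important", "in particular", "hardly", "impossible",
]

def words(text):
    """Lowercased alphabetic tokens (crude, for illustration only)."""
    return re.findall(r"[a-z]+", text.lower())

def position_score(index, n_first):
    """S_p(S_i): 1 if the 0-based sentence index is within the first N, else 0."""
    return 1 if index < n_first else 0

def length_score(sentence):
    """Hypothetical raw length feature: number of whitespace-separated words."""
    return len(sentence.split())

def numerical_data_score(sentence):
    """Hypothetical binary feature: 1 if the sentence contains any digit."""
    return 1 if re.search(r"\d", sentence) else 0

def cue_phrase_score(sentence, cues=CUE_PHRASES):
    """Number of cue phrases found in the sentence.

    Nested phrases such as 'the most important' and 'important' are each counted.
    """
    lowered = sentence.lower()
    return sum(1 for cue in cues
               if re.search(r"\b" + re.escape(cue) + r"\b", lowered))

def title_resemblance_score(sentence, title):
    """Hypothetical title resemblance: Jaccard overlap of the two word sets."""
    s, t = set(words(sentence)), set(words(title))
    union = s | t
    return len(s & t) / len(union) if union else 0.0
```

These raw features are on different scales, so a practical system would normalize them before comparing or combining them; that step is left out of the sketch.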
Methodology – Graph scoring
• TextRank
• Bushy path of the node
• Aggregate similarity
Bushy path: $Score(s) = \#(\text{branches connected to the node of } s)$
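Aggregate similarity: $Score(s) = \sum_{s'} w(s, s')$, the sum of the weights of the branches connected to the node of s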
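The three graph-based methods can be sketched in Python on a sentence-similarity graph. As assumptions for illustration, sentences are bags of words, edge weights are cosine similarities, an edge is kept only above a hypothetical threshold of 0.1, and TextRank uses the usual weighted PageRank iteration with damping factor 0.85; the paper's graph construction may differ in detail.

```python
import math
import re
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def similarity_graph(sentences, threshold=0.1):
    """Weighted undirected adjacency matrix; edge kept when similarity > threshold."""
    bags = [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    n = len(bags)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(bags[i], bags[j])
            if sim > threshold:
                w[i][j] = w[j][i] = sim
    return w

def bushy_path_scores(w):
    """Score(s) = number of branches (edges) connected to the node of s."""
    return [sum(1 for x in row if x > 0) for row in w]

def aggregate_similarity_scores(w):
    """Score(s) = sum of the weights of the edges connected to the node of s."""
    return [sum(row) for row in w]

def textrank_scores(w, d=0.85, iterations=50):
    """Weighted PageRank iteration in the style of TextRank.

    WS(i) = (1 - d) + d * sum_j w[j][i] / out_weight[j] * WS(j)
    """
    n = len(w)
    out_weight = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - d) + d * sum(w[j][i] / out_weight[j] * scores[j]
                              for j in range(n) if w[j][i] > 0)
            for i in range(n)
        ]
    return scores
```

For example, for "the cat sat on the mat", "the cat ate the fish" and "stock markets fell sharply", only the first two sentences are linked, so the bushy path scores are [1, 1, 0] and the isolated third sentence receives the minimum TextRank score of 1 - d = 0.15.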
Experiments - Datasets, Evaluation
Datasets:
                  CNN    Blog                  SUMMAC
Documents         400    100 posts, 50 blogs   183
Summary length    2-4    7                     2-7

Evaluation:
- ROUGE
- Quantitative assessment
- Qualitative assessment
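ROUGE compares a system summary with a human reference summary by counting overlapping n-grams. The sketch below computes ROUGE-N recall for a single reference using whitespace tokenization; it is a simplification of the official ROUGE toolkit, which adds options such as stemming, stop-word removal and multiple references.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: matched n-grams / total n-grams in the reference summary.

    Each reference n-gram is matched at most as many times as it occurs
    in the candidate (clipped counts).
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[g]) for g, count in ref.items())
    return overlap / total
```

For example, the candidate "the cat sat on the mat" against the reference "the cat was on the mat" matches 5 of 6 reference unigrams (ROUGE-1 recall 5/6) and 3 of 5 reference bigrams (ROUGE-2 recall 0.6).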
Experiments - CNN
- Word scoring: TF/IDF
- Sentence scoring: Sentence position 1
- Graph scoring: TextRank score
Experiments - Blog
- Word scoring: TF/IDF
- Sentence scoring: Sentence length
- Graph scoring: TextRank score
Experiments - SUMMAC
- Word scoring: TF/IDF
- Sentence scoring: Resemblance to the title
- Graph scoring: TextRank score
Improving sentence scoring results
• Morphological transformation: truncation, stemming, lemmatization
• Stop words
• Similar semantics: WordNet, lexical chains
• Co-reference: word frequency features
• Ambiguity: lexical chains
• Redundancy: sentence fusion
Examples:
- lights: light, lights, lighting, lit
- colleg*: college, colleges, collegium, collegial
- col*r: color, colour, colander
- be: is, am, are
- car, wheel, seat, passenger => automobile topic
- John will travel tomorrow. He bought the ticket yesterday. ("He" refers to "John")
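Two of the preprocessing ideas on this slide, stop-word removal and truncation (wildcard) matching, can be sketched with the Python standard library. The stop-word list and vocabulary are tiny hypothetical examples; wildcard matching uses fnmatch.fnmatchcase, and the col*r pattern shows how truncation can over-match (it also catches "colander").

```python
from fnmatch import fnmatchcase

# Tiny, hypothetical stop-word list; real systems use much larger lists.
STOP_WORDS = {"a", "an", "the", "of", "to", "is", "am", "are",
              "he", "she", "it", "will"}

def remove_stop_words(tokens):
    """Drop tokens (compared case-insensitively) that carry little content."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

def truncation_matches(pattern, vocabulary):
    """Return the vocabulary words matched by a pattern such as 'colleg*' or 'col*r'."""
    return [w for w in vocabulary if fnmatchcase(w, pattern)]

# Hypothetical vocabulary built from the slide's examples plus one non-match.
VOCABULARY = ["college", "colleges", "collegium", "collegial",
              "color", "colour", "colander", "cold"]
```

For example, truncation_matches("colleg*", VOCABULARY) returns the four "colleg" words, and remove_stop_words("He bought the ticket yesterday".split()) keeps only "bought", "ticket" and "yesterday".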
Conclusions
• Word Frequency, TF/IDF, Lexical Similarity, Sentence Length and TextRank Score were identified as providing good results.
- Computationally intensive: TF/IDF
- Good balance between results and execution time: Word Frequency, Sentence Length
Comments
• Advantages
- Helps to understand the basic methods and their differences
• Applications
- Text summarization