8/10/2019 Human Aided Text Summarizer “SAAR” using Reinforcement Learning
http://slidepdf.com/reader/full/human-aided-text-summarizer-saar-using-reinforcement-learning 1/31
Paper ID: ISCMI2014-1-031E
Human Aided Text Summarizer “SAAR” using Reinforcement Learning
By: Chandra Prakash
ABV-IIITM Gwalior
&
Dr. Anupam Shukla
Professor, ABV-IIITM, Gwalior
2014 Intl. Conference on Soft Computing & Machine Intelligence (ISCMI 2014)
Approach
Problem Definition
Motivation
Literature survey
Scope of Project
Methodology/Approach
Tools used
Result
Introduction
Real-Life Problem
Imagine:
You download 1000+ papers and now want a summary of them.
You have a list of emails about a sports event and want a one-paragraph summary of those emails.
You have to study many books for an exam, and the summarizer gives you the key concepts of the books as a few pages of notes.
Value for researchers:
“Get me everything the papers say about Automatic Text Summarization.”
Definition
Automatic Summaries
• An active research area in which a computer automatically summarizes text from single or multiple documents
• A short summary, which conveys the essence of the document
• Should be less than half of the original text
• Can be extractive or abstractive
• May be produced from single or multiple documents
Dipanjan Das, André F. T. Martins (2007). A Survey on Automatic Text Summarization. Literature Survey for the Language and Statistics II course at CMU, Pittsburgh.
Problem definition
With the advent of the information revolution (WWW), electronic documents are becoming a principal medium of business and academic information.
Thousands of electronic documents are produced and made available on the internet each day.
It is not easy to read each and every document.
Information Access Agent:
Search engines: Google, Yahoo, etc.
The volume of information retrieved is far greater than a user can handle and manage.
The user has to analyze the search results one by one until satisfied; this is time-consuming and inefficient.
What could the possible solution be, then?
Problem definition (cont..)
Existing text summarization does not follow the user's specification.
Generating one generic summary is not possible, as the desired summary changes from user to user.
Even two humans cannot generate the same summary from a given document.
Internal factors (background, education, etc.) play a vital role in generating a summary.
What could the possible solution be now?
Solution: Human Aided Text Summarization
Benefits of summarization include:
Saves reading time
Value for researchers
Abstracts for scientific and other articles
Facilitates fast literature searches
Facilitates classification of articles and other written data
Improves search engine indexing efficiency for web pages
Allows text to be stored in much less space
Generates a heading for a given article/document
News summarization
Opinion mining and sentiment analysis
Enables cell phones to access Web information
With human feedback: a user-oriented summary
Previous Approaches:
1958: Automatic creation of literature abstracts was proposed by H. P. Luhn at IBM.
Text Mining: the discovery of patterns, trends and associations among entities in a document. It consists of three steps: text preparation, text processing and text analysis.
Text Summarization methods:
Extraction: construct the summary by taking the most important sentences.
Abstraction: construct the summary by paraphrasing sections of the original document.
Types of Techniques:
Statistical techniques:
Based on term frequency.
Stop-word filtering: removes unwanted noise words.
Stemming or Lemmatization: maps different forms of the same word to a common form.
Determine term importance:
Term Frequency/Inverse Document Frequency (TF-IDF) weighting scheme, etc.
Linguistic techniques:
Look for text semantics.
Linguistic techniques extract sentences using Natural Language Processing (NLP); parsing and part-of-speech tagging are among the starting steps.
Scope of Project
Problem Definition: Extractive Text Summarization
Single Document
Fully Automated Summarization (FAS)
Human Aided Machine Summarization (HAMS)
Machine Learning
Reinforcement Learning
Tools used:
Matlab
Java
Earlier Methodology proposed (FAS)
Chandra Prakash, Anupam Shukla, “Automated summary generation from single document using information gain”, Springer, Contemporary Computing, Communications in Computer and Information Science, Volume 94, pp. 152–159, 2010.
Methodology proposed (HAMS)
Keyword Significant Factor
Solution
Approach for the Problem
Input: a document containing text is fed into the system.
Preprocessing:
Tokenization: divides the character sequence into words; sentence splitting further divides sequences of words into sentences, and so on.
Stemming or Lemmatization
Stop-word filtering
Feature Extraction:
Sentence Ranking: Machine Learning
Human Feedback
Output/Result: the generated summary (an abstract).
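The preprocessing stage listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not SAAR's own code: the stop-word list is a tiny hypothetical sample, sentence splitting is a naive regular expression, and `crude_stem` is a toy suffix stripper standing in for a real stemmer or lemmatizer. All helper names are hypothetical.

```python
import re

# Hypothetical, deliberately tiny stop-word list (a real system would use a full list).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "on", "for"}


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitting: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    """Lower-cased word tokens consisting of ASCII letters only."""
    return re.findall(r"[a-z]+", sentence.lower())


def crude_stem(word: str) -> str:
    """Toy suffix stripper standing in for a real stemmer (e.g. Porter's)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[list[str]]:
    """Split into sentences, tokenize, drop stop words and stem: one term list per sentence."""
    return [
        [crude_stem(tok) for tok in tokenize(sent) if tok not in STOP_WORDS]
        for sent in split_sentences(text)
    ]


print(preprocess("The agent learns weights. Users are reading documents!"))
# [['agent', 'learn', 'weight'], ['user', 'read', 'document']]
```

Each inner list holds the filtered, stemmed terms of one sentence, which is the input that the term-weighting step works on.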
Methodology Steps
The methodology for text summarization involves term selection using preprocessing:
Tokenization or Segmentation
Stop word Filtering
Stemming or Lemmatization
Term weighting
Term Frequency (TF):
$$W_i(T_j) = f_{ij}$$
where $f_{ij}$ is the frequency of the $j$th term in sentence $i$.
Inverse Sentence Frequency (ISF):
$$W_i(T_j) = f_{ij} \log\frac{N}{n_j}$$
where $N$ = number of sentences in the collection and $n_j$ = number of sentences in which term $j$ appears.
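To make the two weighting formulas concrete, the following Python sketch computes $W_i(T_j) = f_{ij}\log(N/n_j)$ for every term of every sentence, the sentence-level analogue of TF-IDF. It assumes each sentence is already a list of preprocessed terms and uses the natural logarithm, since the slide does not state the base; the function name `tf_isf` is hypothetical. Dropping the logarithmic factor gives the plain TF weight $W_i(T_j) = f_{ij}$.

```python
import math
from collections import Counter


def tf_isf(sentences: list[list[str]]) -> list[dict[str, float]]:
    """Return, for each sentence i, a map term j -> W_i(T_j) = f_ij * log(N / n_j).

    f_ij: frequency of term j in sentence i
    N:    number of sentences in the collection
    n_j:  number of sentences in which term j appears
    """
    N = len(sentences)
    n = Counter(term for sent in sentences for term in set(sent))  # n_j
    weights = []
    for sent in sentences:
        f = Counter(sent)  # f_ij for this sentence
        weights.append({term: f[term] * math.log(N / n[term]) for term in f})
    return weights


sents = [["agent", "learn", "weight"], ["user", "read", "document"], ["agent", "read"]]
w = tf_isf(sents)
print(round(w[1]["user"], 4))   # term in 1 of 3 sentences: log(3) ≈ 1.0986
print(round(w[0]["agent"], 4))  # term in 2 of 3 sentences: log(3/2) ≈ 0.4055
```

A term that occurs in every sentence gets weight 0, because $\log(N/N) = 0$, so ISF suppresses terms that do not discriminate between sentences.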
Methodology Steps (cont…)
Information Gain is calculated as
$$\mathrm{IG} = (\mathrm{TFW})_i + \mathrm{ISFS}(T_j)_i + (\mathrm{NSL})_i + (\mathrm{SPS})_i + (\mathrm{PNS})_i$$
where $i$ is the sentence and $j$ is the term.
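Term-Sentence Matrix (TSM) after IG:
$$TSM = \begin{pmatrix} IG(W_{11}) & IG(W_{12}) & \cdots & IG(W_{1n}) \\ IG(W_{21}) & IG(W_{22}) & \cdots & IG(W_{2n}) \\ \vdots & \vdots & \ddots & \vdots \\ IG(W_{m1}) & IG(W_{m2}) & \cdots & IG(W_{mn}) \end{pmatrix}$$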
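The information-gain score and the term-sentence matrix can be sketched as below. The slides do not define how the five features (TFW, ISFS, NSL, SPS, PNS) are computed, so this sketch takes them as precomputed inputs; it also assumes that TFW and ISFS vary per term $j$ within sentence $i$, while NSL, SPS and PNS are sentence-level values shared by all terms of sentence $i$. The function name `build_tsm` and the example numbers are hypothetical.

```python
def build_tsm(term_feats: list[list[tuple[float, float]]],
              sent_feats: list[tuple[float, float, float]]) -> list[list[float]]:
    """Build the m x n term-sentence matrix with entries IG(W_ij).

    term_feats[i][j] = (TFW, ISFS) for term j in sentence i   (assumed per-term features)
    sent_feats[i]    = (NSL, SPS, PNS) for sentence i         (assumed per-sentence features)
    Each entry is the sum TFW + ISFS + NSL + SPS + PNS.
    """
    if len(term_feats) != len(sent_feats):
        raise ValueError("need one feature tuple per sentence")
    tsm = []
    for row, (nsl, sps, pns) in zip(term_feats, sent_feats):
        tsm.append([tfw + isfs + nsl + sps + pns for tfw, isfs in row])
    return tsm


# Two sentences (m = 2) and two terms (n = 2), hypothetical feature values.
term_feats = [[(1.0, 0.4), (0.0, 0.0)],
              [(2.0, 1.1), (1.0, 0.4)]]
sent_feats = [(0.5, 1.0, 0.0),
              (0.8, 0.5, 1.0)]
tsm = build_tsm(term_feats, sent_feats)
print([[round(x, 2) for x in row] for row in tsm])  # [[2.9, 1.5], [5.4, 3.7]]
```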
Elements of Reinforcement Learning
Agent: intelligent program
Environment: external conditions
Policy: defines the agent's behavior at a given time; a mapping from states to actions, implemented as lookup tables or simple functions
An agent learns behavior through trial-and-error interactions with a dynamic environment.
[Figure: agent-environment interaction loop (state, reward, action, policy)]
Methodology Steps (cont…)
Processing Step:
Sentence scoring using Reinforcement Learning
Action selection policy: ε-greedy
In our approach we consider:
State: sentences
Action: updating the term weight
Policy: update the term weights to maximize the sentence rank
Reward: scalar value of the term (IG)
Q-Learning
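ε-greedy action selection:
$$a_t = \begin{cases} a_t^{*} & \text{with probability } 1 - \varepsilon \\ \text{random action} & \text{with probability } \varepsilon \end{cases}$$
where $a_t^{*}$ is the action with the highest estimated value at time $t$.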
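The ε-greedy rule above, together with a Q-learning update, can be sketched as follows. The slides name Q-learning but do not give its update rule, so this uses the standard one-step Q-learning update $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$. The learning rate `alpha`, the discount `gamma`, the integer indexing of states (sentences) and actions (candidate term-weight updates), and the function names are assumptions for illustration, not SAAR's published settings.

```python
import random


def epsilon_greedy(q_row: list[float], epsilon: float, rng: random.Random) -> int:
    """Pick the greedy action a_t* with probability 1 - epsilon, else a uniformly random action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))                   # explore
    return max(range(len(q_row)), key=lambda a: q_row[a])  # exploit (first maximum on ties)


def q_update(Q: list[list[float]], s: int, a: int, reward: float, s_next: int,
             alpha: float = 0.1, gamma: float = 0.9) -> None:
    """Standard one-step Q-learning update, applied to Q in place."""
    target = reward + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


rng = random.Random(0)
# Hypothetical setup: 2 states (sentences) x 3 actions (candidate term-weight updates).
Q = [[0.0, 0.0, 0.0] for _ in range(2)]
a = epsilon_greedy(Q[0], epsilon=0.1, rng=rng)
q_update(Q, s=0, a=a, reward=0.8, s_next=1)  # reward: e.g. the IG of the chosen term
print(round(Q[0][a], 4))  # 0.1 * (0.8 + 0.9 * 0.0 - 0.0) = 0.08
```

With ε = 0 the rule is purely greedy; a small positive ε keeps occasionally trying other term updates, which is what lets the agent discover weightings it has not yet rated highly.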
Processing Step:
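Matrix Q: learning matrix.
Updated term-sentence matrix:
$$TSM_{updated} = \begin{pmatrix} IG(W_{11})_{updated} & IG(W_{12})_{updated} & \cdots & IG(W_{1n})_{updated} \\ IG(W_{21})_{updated} & IG(W_{22})_{updated} & \cdots & IG(W_{2n})_{updated} \\ \vdots & \vdots & \ddots & \vdots \\ IG(W_{m1})_{updated} & IG(W_{m2})_{updated} & \cdots & IG(W_{mn})_{updated} \end{pmatrix}$$
Original term-sentence matrix:
$$TSM = \begin{pmatrix} IG(W_{11}) & IG(W_{12}) & \cdots & IG(W_{1n}) \\ IG(W_{21}) & IG(W_{22}) & \cdots & IG(W_{2n}) \\ \vdots & \vdots & \ddots & \vdots \\ IG(W_{m1}) & IG(W_{m2}) & \cdots & IG(W_{mn}) \end{pmatrix}$$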
Summary Generation:
Sentence selection: Euclidean n-space
$P = (p_1, p_2, \ldots, p_n)$
$Q = (q_1, q_2, \ldots, q_n)$
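The Euclidean n-space step could be sketched as below, treating each sentence's row of the term-sentence matrix as a point $P = (p_1, \ldots, p_n)$ and measuring $d(P, Q) = \sqrt{\sum_{k=1}^{n}(p_k - q_k)^2}$. The slides do not say which point $Q$ a sentence is compared against or exactly how many sentences are kept. This sketch assumes $Q$ is the origin, so sentences with larger overall weight lie farther away and are preferred, and it keeps about 30% of the sentences to mirror the compression ratio used in the evaluation. `select_sentences` and these choices are hypothetical.

```python
import math


def euclidean(p: list[float], q: list[float]) -> float:
    """d(P, Q) = sqrt(sum_k (p_k - q_k)^2) for two points in the same n-space."""
    if len(p) != len(q):
        raise ValueError("P and Q must have the same dimension n")
    return math.sqrt(sum((pk - qk) ** 2 for pk, qk in zip(p, q)))


def select_sentences(tsm: list[list[float]], ratio: float = 0.3) -> list[int]:
    """Keep about ratio * m sentences farthest from the origin, in document order.

    Assumption: the reference point Q is the zero vector.
    """
    origin = [0.0] * len(tsm[0])
    dist = [euclidean(row, origin) for row in tsm]
    k = max(1, round(ratio * len(tsm)))  # nearest whole number of sentences
    best = sorted(range(len(tsm)), key=lambda i: dist[i], reverse=True)[:k]
    return sorted(best)


print(euclidean([3.0, 4.0], [0.0, 0.0]))  # 5.0
tsm = [[1.0, 0.0], [3.0, 4.0], [0.0, 1.0], [2.0, 2.0], [0.0, 0.0]]
print(select_sentences(tsm))  # [1, 3]
```

Sorting the chosen indices back into document order keeps the extract readable, since an extractive summary reuses the original sentences verbatim.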
Dataset: articles from “The Hindu” (June 2013); DUC'06 sets of documents:
12 document sets
Number of documents in each set: 25
Average number of sentences: 32
300 document summaries
Evaluation
Evaluation Techniques
$$\text{Precision } (P) = \frac{r}{K_m} \times 100, \qquad \text{Recall } (R) = \frac{r}{K_h} \times 100$$
$$F\text{-score} = \frac{2PR}{P + R} = \frac{2r}{K_m + K_h} \times 100$$
where $r$ is the number of common sentences, $K_m$ is the length of the machine-generated summary and $K_h$ is the length of the human-generated summary (in sentences).
Available automated text summarizers: Open Text Summarizer (OTS), Pertinence Summarizer (PS), and Extractor Test Summarizer Software (ETSS).
The compression ratio is 30%.
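The evaluation measures can be computed directly from the three counts $r$, $K_m$ and $K_h$. The Python sketch below (function names are hypothetical) also checks the closed form $F = 2r \cdot 100/(K_m + K_h)$ against $2PR/(P+R)$.

```python
def f_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall: F = 2PR / (P + R)."""
    return 2 * p * r / (p + r) if p + r else 0.0


def evaluate(r: int, k_m: int, k_h: int) -> tuple[float, float, float]:
    """Precision, recall and F-score (in %) from sentence counts.

    r:   number of sentences common to both summaries
    k_m: length of the machine-generated summary (sentences)
    k_h: length of the human-generated summary (sentences)
    """
    precision = 100 * r / k_m
    recall = 100 * r / k_h
    return precision, recall, f_score(precision, recall)


p, rec, f = evaluate(r=9, k_m=10, k_h=12)
print(p, rec, round(f, 2))            # 90.0 75.0 81.82
print(round(200 * 9 / (10 + 12), 2))  # closed form: 81.82
print(round(f_score(90, 85), 2))      # 87.43
```

With P = 90 and R = 85 (the SAAR row of the results table), `f_score` returns about 87.43, which matches the reported 87.42 up to rounding.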
Comparison of generated text summaries for HAMS
Comparison of Recall, Precision and F-score for HAMS

Method                  Precision (P)   Recall (R)   F-score
SAAR (user feedback)    90              85           87.42
IG summary              75              65           70.57
OTS                     75              60           66.66
PS                      75              60           66.66
ETSS                    75              60           66.66
Result
[Figure: bar chart of Precision Value (P), Recall Value (R) and F-score for SAAR Based, IG Summary, OTS, PS and ETSS]
Compared with some available automated text summarizers: Open Text Summarizer (OTS), Pertinence Summarizer (PS) and Extractor Test Summarizer Software (ETSS).
Conclusion and future scope
A novel approach to human-aided summarization of a single document using user feedback.
This extractive summary is good enough for a reader to understand the main idea of a document, though its understandability might not be as good as that of an abstractive summary.
As future work, this approach can be extended to multi-document summarization using machine learning.
The concept of multiple agents can be introduced into the system. This should increase its speed as well as make the summary or abstract more generic.
References
1. Verma R., Chen P., “Integrating Ontology Knowledge into a Query-based Information Summarization System”, DUC 2007, Rochester, NY, 2007.
2. Luhn H. P., “The Automatic Creation of Literature Abstracts”, IBM Journal of Research and Development, vol. 2, pp. 159–165, 1958.
3. Edmundson H. P., “New Methods in Automatic Extracting”, Journal of the ACM (JACM), vol. 16, no. 2, pp. 264–285, 1969.
4. Salton G., Buckley C., “Term-Weighting Approaches in Automatic Text Retrieval”, Information Processing & Management, vol. 24, pp. 513–523, 1988.
5. Luhn H. P., “A Statistical Approach to Mechanized Encoding and Searching of Literary Information”, IBM Journal of Research and Development, pp. 309–317, 1957.
6. Salton G., Buckley C., “Term-Weighting Approaches in Automatic Text Retrieval”, Information Processing & Management, vol. 24, pp. 513–523, 1988.
7. Kupiec J. et al., “A Trainable Document Summarizer”, In Proceedings of SIGIR, 1995.
8. Conroy J. M., O'Leary D. P., “Text Summarization via Hidden Markov Models”, In Proceedings of SIGIR '01, pp. 406–407, New York, NY, USA, 2001.
9. Agarwal N., Ford K. H., Shneider M., “Sentence Boundary Detection using a MaxEnt Classifier”.
10. García-Hernández R. A., Ledeneva Y., “Word Sequence Models for Single Text Summarization”, 2009 Second International Conferences on Advances in Computer-Human Interactions, pp. 44–48, 2009.
11. The Hindu [http://www.hinduonnet.com/], accessed on 23rd June 2009.
12. Van Rijsbergen C. J., Information Retrieval, 2nd edition, Dept. of Computer Science, University of Glasgow, 1979.
13. Yatsko V. A., Vishnyakov T. N., “A Method for Evaluating Modern Systems of Automatic Text Summarization”, 2006.
14. Hariharan S., Srinivasan R., “Investigations in Single Document Summarization by Extraction Method”, 2008.
15. García-Hernández R. A., Ledeneva Y., “Word Sequence Models for Single Text Summarization”, 2009.
16. Kyoomarsi F., Khosravi H., Eslami E., Dehkordy P. K., Tajoddin A., “Optimizing Text Summarization Based on Fuzzy Logic”, In Proceedings of Computer and Information Science (ICIS 08), 2008.
17. Sparck-Jones K., “Automatic Summarizing: Factors and Directions”, In Mani I., Maybury M. (eds.), Advances in Automatic Text Summarization, The MIT Press, pp. 1–12, 1999.
18. Hovy E., Lin C.-Y., “Automated Text Summarization in SUMMARIST”, In Proceedings of the ACL97/EACL97 Workshop on Intelligent Scalable Text Summarization, Madrid, Spain, 1997.
19. Mani I., Maybury M. T. (eds.), Advances in Automatic Text Summarization, MIT Press, Cambridge, MA, 1999.
20. Sparck-Jones K., “Automatic Summarizing: Factors and Directions”, In Mani I., Maybury M. T. (eds.), Advances in Automatic Text Summarization, The MIT Press, pp. 1–13, 1999.
21. Lin C.-Y., Hovy E., “The Automated Acquisition of Topic Signatures for Text Summarization”, In Proceedings of the 18th COLING Conference, Saarbrücken, Germany, 2000.
22. Baldwin B., Donaway R., Hovy E., Liddy E., Mani I., Marcu D., McKeown K., Mittal V., Moens M., Radev D., Sparck-Jones K., Sundheim B., Teufel S., Weischedel R., White M., “An Evaluation Road Map for Summarization Research”, 2000. http://www-nlpir.nist.gov/projects/duc/papers/summarization.roadmap.doc