Summarization of Multiple, Metadata Rich, Product Reviews
Fotis Kokkoras, Efstratia Lampridou, Konstantinos Ntonas, Ioannis Vlahavas
Department of Informatics – Aristotle University of Thessaloniki
LPIS Group: http://lpis.csd.auth.gr
MSoDa '08
ECAI 2008 Workshop on Mining Social Data

Introduction
- Modern, successful on-line shops allow consumers to express their opinions on the products and services they have purchased. These reviews are valuable to new customers.
- When a single product has dozens, or even hundreds, of reviews, reading them all is time-consuming.
- The need for automatically generated summaries of these reviews is obvious.

Summarization Background
- Types of summary:
  - Extractive: uses sentences from the original text
  - Abstractive: reuses sentence fragments to construct new sentences
- Text features usually used: frequency and location of words, sentence location in the article, syntactic rules, dictionaries of important words
- Various techniques/approaches:
  - Machine learning techniques
  - LSA (Latent Semantic Analysis)
  - Lexical chains
  - Cluster-based
- They perform well on article-style texts.

The Special Nature of Reviews
- On-line product reviews in e-shops are quite different from article-style texts:
  - They are usually short and do not obey strict syntactic rules.
  - They convey only the subjective opinion of each reviewer, and there are a lot of reviewers!
  - They include a lot of repeated content.
  - There are usually too many reviews.

What is the problem?
- Traditional summarization techniques do not work very well on such data. Why?
  - Summarizers that work at the sentence level can report a frequently mentioned problem many times in the summary.
  - Reusing sentence fragments to construct new sentences is risky, because reviews are short and have weak syntax.
  - It is difficult to detect biased reviews based on their text only.

Motivation
- On-line reviews are usually accompanied by various metadata, such as:
  - the buyer's technology level,
  - ownership of the product,
  - an overall judgment of the product or service, on some scale,
  - labeled (positive or negative) or unlabeled comments,
  - the usefulness of the review to other customers, etc.
- How can these metadata help in summarization?

Our Approach
- ReSum algorithm (Review Summarizer):
  - Creates an extractive summary
  - Uses a dictionary of important words and metadata
  - Is applied separately to the positive (+) and negative (-) comments
  - For each product, two summaries are created
- How it works:
  - Scores the sentences based on their words
  - Adjusts the initial score based on the metadata
  - Selects sentences while avoiding repetition of concepts
- Tested on newegg.com

Requirements
- A dictionary D of important words for the domain, automatically created from a few thousand reviews of the domain in question:
  - concatenation of the reviews
  - removal of common English words (500)
  - selection of the top 150 most frequent words
- Access to the reviews (and their metadata):
  - we use DEiXTo, a web content extraction system developed in-house
  - HTML/DOM-based extraction rules
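
The dictionary construction described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the tiny `COMMON_WORDS` set stands in for the list of about 500 common English words, and the regular-expression tokenizer and the function name `build_dictionary` are assumptions.

```python
import re
from collections import Counter

# Hypothetical stand-in for the list of ~500 common English words.
COMMON_WORDS = {"the", "a", "is", "it", "and", "of", "to", "this", "i", "very"}

def build_dictionary(reviews, common_words=COMMON_WORDS, top_n=150):
    """Return the top_n most frequent non-common words across all reviews."""
    text = " ".join(reviews).lower()        # concatenate the reviews
    words = re.findall(r"[a-z]+", text)     # crude tokenizer (assumption)
    counts = Counter(w for w in words if w not in common_words)
    return [word for word, _ in counts.most_common(top_n)]

# Toy reviews, invented for the example.
reviews = [
    "The picture is great and the price is great",
    "Great picture, weak stand",
    "The stand is weak and the price is high",
]
dictionary = build_dictionary(reviews, top_n=3)
print(dictionary)
```

With real data the same function would be run once per domain over a few thousand reviews, keeping the default `top_n=150`.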

ReSum – Initial Scoring
- Step 1:
  - Concatenate all positive (or negative) comments and divide them into separate sentences.
  - Remove stop words, punctuation, numbers, etc.
  - Count the frequency f_v of every word v.
- Step 2: Score every sentence i based on its words and the dictionary D:

$$R_i = 2 \sum_{v_j \in D} f_{v_j} + \sum_{v_j \notin D} f_{v_j}$$

where v_j ranges over the words of sentence i.
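
Steps 1 and 2 can be sketched as follows. The sketch assumes that a sentence's score adds the overall frequency f_v of each distinct word it contains, counted twice when the word belongs to D. The sentence splitter, the small `STOP_WORDS` set and all function names are illustrative assumptions.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "is", "and", "it", "but"}  # hypothetical stop list

def split_sentences(comments):
    """Concatenate the comments and split them into sentences (naive)."""
    text = " ".join(comments)
    return [s.strip() for s in re.split(r"[.!?;]+", text) if s.strip()]

def tokenize(sentence):
    """Keep lower-case words only: drops punctuation, numbers, stop words."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOP_WORDS]

def initial_scores(sentences, dictionary):
    """Assumed form of R_i: 2*f_v for words in D, f_v for all other words."""
    tokens = [tokenize(s) for s in sentences]
    freq = Counter(w for t in tokens for w in t)  # f_v over all sentences
    scores = [sum(2 * freq[w] if w in dictionary else freq[w] for w in set(t))
              for t in tokens]
    return scores, freq

# Toy comments and dictionary, invented for the example.
comments = ["Great picture. The stand is weak.",
            "Picture is great but the stand wobbles."]
sentences = split_sentences(comments)
scores, freq = initial_scores(sentences, dictionary={"picture", "stand"})
for sentence, score in zip(sentences, scores):
    print(score, sentence)
```

Summing over distinct words rather than over every occurrence is a design choice made here for simplicity; the slides do not say which variant is intended.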

ReSum – Metadata Contribution
- Metadata used:
  - Reviewer's technology level (w1)
  - Ownership duration of the product (w2)
  - Usefulness of a review to other users (w3)
- Step 3: The initial score R_i is adjusted based on the metadata, in a weighted fashion:

$$S_i = R_i + R_i \sum_{k=1}^{3} w_k$$

- The weights are initialized using multicriteria techniques (explained later).
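
As a small worked example of Step 3, the sketch below assumes that the adjustment boosts R_i by the sum of the three metadata weights, i.e. S_i = R_i (1 + w1 + w2 + w3). The function name and the sample weight values are hypothetical.

```python
def adjusted_score(r_i, weights):
    """Assumed Step 3: S_i = R_i + R_i * (w1 + w2 + w3)."""
    return r_i + r_i * sum(weights)

# Hypothetical sentence with R_i = 10 whose review metadata give
# w1 = 0.14, w2 = 0.0 and w3 = 0.31 (values invented for the example).
s_i = adjusted_score(10.0, (0.14, 0.0, 0.31))
print(round(s_i, 2))
```

Because the weights multiply R_i, a sentence from a review with no favorable metadata (all weights zero) simply keeps its initial score.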
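
ReSum – Redundancy Elimination
- Step 4:
  - Select the sentence with the highest score S.
  - Penalize the remaining sentences that share common words with the selected one. This eliminates redundancy.

$$S'_i = S_i - \Bigl( 2 \sum_{v_j \in D} f_{v_j} + \sum_{v_j \notin D} f_{v_j} \Bigr)$$

where v_j ranges over the words that sentence i shares with the selected sentence.
- The step is repeated until the desired number of sentences is reached.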
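
The selection loop of Step 4 can be sketched as a greedy procedure. The penalty used here (the score contribution of the shared words, with dictionary words counted twice) is an assumption, as are the function name and the toy data.

```python
def select_sentences(tokens, scores, freq, dictionary, n):
    """Greedy selection: take the best sentence, penalize overlapping ones."""
    scores = list(scores)                      # work on a copy
    remaining = set(range(len(tokens)))
    chosen = []
    while remaining and len(chosen) < n:
        best = max(remaining, key=lambda i: scores[i])
        chosen.append(best)
        remaining.remove(best)
        selected_words = set(tokens[best])
        for i in remaining:
            shared = selected_words & set(tokens[i])
            # Assumed penalty: 2*f_v for shared words in D, f_v otherwise.
            scores[i] -= sum(2 * freq[w] if w in dictionary else freq[w]
                             for w in shared)
    return chosen

# Toy input, invented for the example: sentences 0 and 1 repeat each other.
tokens = [["great", "picture"],
          ["great", "picture", "quality"],
          ["weak", "stand"]]
freq = {"great": 2, "picture": 2, "quality": 1, "weak": 1, "stand": 1}
dictionary = {"picture", "stand"}
scores = [6, 7, 3]
chosen = select_sentences(tokens, scores, freq, dictionary, n=2)
print(chosen)
```

In this toy input, sentence 0 has the second-highest score but loses all of it once sentence 1 is selected, so the lower-scored but novel sentence 2 is picked instead.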

Weight Initialization (1/3)
- Subjective task: we need a consistent way to initialize the weights.
- Analytic Hierarchy Process (AHP, Saaty '99):
  - a multicriteria method
  - provides a methodology for calculating consistent weights for selection criteria, according to the importance we assign to them
  - importance values are selected from a predefined scale (defined by AHP)
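
Weight Initialization (2/3)

Subjective Importance Values we used

|            | Tech level | Ownership | Usefulness |
|------------|------------|-----------|------------|
| Tech level | 1          | 1/2       | 3/2        |
| Ownership  | 2          | 1         | 2/3        |
| Usefulness | 3          | 2         | 1          |

Fundamental Scale of AHP

| Value           | Interpretation                                      |
|-----------------|-----------------------------------------------------|
| 1               | Criteria a and b are of the same importance.        |
| 2               | Criterion a is very slightly more important than b. |
| 3               | Criterion a is a little more important than b.      |
| 5               | Criterion a is considerably more important than b.  |
| etc. (up to 9)  | etc.                                                |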
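
To illustrate how AHP turns pairwise importance values into weights, here is a minimal sketch using the common column-normalization approximation (AHP is often presented with the principal eigenvector instead; the slides do not say which variant was used). The matrix `A` is an illustrative reciprocal example and an assumption, not a transcription of the table above.

```python
def ahp_weights(matrix):
    """Approximate AHP priorities: normalize each column, average each row."""
    n = len(matrix)
    col_sums = [sum(row[j] for row in matrix) for j in range(n)]
    return [sum(row[j] / col_sums[j] for j in range(n)) / n for row in matrix]

# Illustrative reciprocal matrix (assumption): entry [a][b] says how much
# more important criterion a is than b, and [b][a] is its reciprocal.
# Criteria order: tech level, ownership, usefulness.
A = [
    [1.0, 1 / 2, 1 / 4],
    [2.0, 1.0, 1 / 3],
    [4.0, 3.0, 1.0],
]
weights = ahp_weights(A)
print([round(w, 2) for w in weights])  # → [0.14, 0.24, 0.62]
```

The resulting weights always sum to 1, so they can be read directly as the relative importance of the three criteria.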
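
Weight Initialization (3/3)
- Calculated weights: w'_1 = 0.14, w'_2 = 0.24, w'_3 = 0.62
- The initial weights were further adjusted based on the metadata values:

$$w_1 = \begin{cases} w'_1 & \text{if TechLevel is high} \\ 0 & \text{otherwise} \end{cases}$$

$$w_2 = \begin{cases} w'_2 & \text{if Ownership is more than a year} \\ 0 & \text{otherwise} \end{cases}$$

$$w_3 = g(w'_3, \delta_v)$$

- g is a sigmoid function of δ_v built around the term 1/(1 + e^(-0.2 δ_v)) and the constant 1.24.

[Plot of g: horizontal axis from -40 to 40, vertical axis from -0.8 to 0.8]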
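
The per-review weight adjustment can be sketched as below. The rules for w1 and w2 follow the slide. The exact form of g is not recoverable here, so the sketch assumes a logistic curve centered at zero, g = w'_3 (2 / (1 + e^(-0.2 δ_v)) - 1), which equals 1.24 / (1 + e^(-0.2 δ_v)) - 0.62 for w'_3 = 0.62. It further assumes that δ_v is the number of "helpful" votes minus the number of "not helpful" votes. Function and parameter names are hypothetical.

```python
import math

# Calculated AHP weights w'_1, w'_2, w'_3 from the slide.
BASE = {"tech": 0.14, "ownership": 0.24, "usefulness": 0.62}

def adjusted_weights(tech_level, ownership_years, delta_v, base=BASE):
    """Return (w1, w2, w3) for one review, given its metadata."""
    w1 = base["tech"] if tech_level == "high" else 0.0
    w2 = base["ownership"] if ownership_years > 1 else 0.0
    # ASSUMPTION: zero-centered logistic curve, bounded by -w'_3 and +w'_3.
    w3 = base["usefulness"] * (2.0 / (1.0 + math.exp(-0.2 * delta_v)) - 1.0)
    return w1, w2, w3

print(adjusted_weights("high", 2.0, 0))     # neutral usefulness votes
print(adjusted_weights("low", 0.5, 40))     # strongly "helpful" review
print(adjusted_weights("low", 0.5, -40))    # strongly "not helpful" review
```

Under this assumed form, a review that most readers found unhelpful receives a negative w3, which lowers the scores of its sentences.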

Experimental Results (1/2)
- Dataset:
  - 1587 reviews from newegg.com
  - 3 domains (monitors, printers, CPU coolers)
  - 9 products (3 from each domain)
- Reference summary: manually generated by 3 human experts
- Comparison systems:
  - Two commercial summarizers: TextAnalyst (Megaputer Intelligence Inc) and Copernic (Copernic Inc)
  - Naive ReSum: the contribution of metadata (step 3) was removed

Experimental Results (2/2)
- Average Recall: 91.7 (78.8), 69.5, 54
- Average Precision: 73.3 (62.8), 58.3, 53.3

[Bar chart: Precision % (0 to 100) for Monitor A, Monitor B, Monitor C, Printer A, Printer B, Printer C, Cooler A, Cooler B, Cooler C; series: ReSum, Naïve ReSum, Copernic, TextAnalyst]

[Bar chart: Recall % (0 to 100) for the same nine products and the same four series]
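
For readers unfamiliar with the two measures, the sketch below shows one way precision and recall could be computed against a reference summary. The slides do not state the unit of comparison, so the sketch assumes each summary is reduced to a set of concept labels. The labels and the function name are hypothetical.

```python
def precision_recall(system_items, reference_items):
    """Precision and recall (in %) of a system summary against a reference."""
    system, reference = set(system_items), set(reference_items)
    hits = len(system & reference)
    precision = 100.0 * hits / len(system) if system else 0.0
    recall = 100.0 * hits / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical concept labels for one product (invented for the example).
system = {"resolution", "price", "brightness", "hdmi", "speakers"}
reference = {"resolution", "price", "brightness", "dead pixels"}
precision, recall = precision_recall(system, reference)
print(precision, recall)
```

Precision penalizes concepts the system reports that the experts did not, while recall penalizes expert concepts the system missed.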

Interesting Facts in our Summaries
- Neither biased nor abusive comments appeared
  - this did happen in the other 3 systems
- Comments with low frequency but significant meaning were included
  - this was not the case for the other 3 systems
- Repetition of concepts was minimal or absent
  - thanks to the redundancy elimination step
  - that is why naive ReSum performed so well
  - repetition in Copernic and TextAnalyst was evident

Conclusions
- Metadata can contribute to a better summary.
- We proposed an algorithm for summarizing on-line, metadata rich, product reviews. It:
  - is statistical in nature
  - assumes labeled comments (pros & cons)
  - works at the sentence level:
    - ranks sentences based on an "importance" measure and selects the N most important of them
    - uses metadata to produce a "good" ranking

Future Work
- Generalize our methodology to adapt to the availability (or not) of the various metadata.
  - the scoring algorithm is modular: weights/metadata can easily be added or removed
- Remove the requirement for categorized reviews (positive and negative).
Thank you!

Monitor A - ReSum

PROS
1. Great resolution, clear picture, very very good price, 24in monitors are gigantic, widescreen aspect ratio makes dvds look awesome
2. Very, VERY bright, HDMI, no dead pixels, looks much nicer than online photos, unbeatable viewing angle
3. Excellent color reproduction; fantastic image and text quality; very good brightness and contrast; HDMI input; unbeatable value
4. Several things stood out above all other monitors I'd considered: Almost non-existent issues of dead/stuck pixels
5. Resolution & sharpness is amazing In my opinion, sleek design Functional speakers (not the best) Audio output is available Multiple inputs

CONS
1. So when Windows power management turns off the monitor signal, instead of turning off the monitor goes to bluescreen and says "no signal" on the HDMI input
2. no height or rotation adjustments; flimsy base; awkward location of OSD buttons; no DVI connection (no DVI to HDMI cable included)
3. Weak stand, awful menu controls, no audio out, no USB ports, low buzzing sound when brightness turned down
4. This monitor is so darn tall it strains my neck a bit to view it - but that's simply a natural consequence of its size
5. Doesn't come with a DVI to HDMI cable that you will need to run this with a computer to get a good picture (don't use the vga port)